NLP Case Study
Business Challenge
For commercial transactions, in addition to filing state taxes at various levels like state, county, and city, there can be special tax jurisdictions (STJs) for applicable to entities like Fire, Police, Health, Library, Transportation, and others. The boundaries of special tax jurisdictions can be arbitrary, and may not be related to any other geographic boundary, such as city limits or zip codes.
Since there is no standardized master data set for these jurisdictions across the government agencies, and the industry, extracting and mapping these STJ elements are very error prone and requires a lot of manual effort.
Our client, a large commercial tax service company, needed a way to extract the jurisdictions from various reports and invoices, and map them to its own target master set so it could apply taxes correctly, without extensive offline intervention.
EQengineered’s Approach
The EQengineered data team developed an approach to identify special jurisdictions quickly and accurately from various source systems and map them to a target set, with minimal manual intervention.
Some of the highlights of the approach are:
Use of NLP and fuzzy logic to automatically map from one set of jurisdiction names to another.
Use of synonym dictionaries, weighting criteria, and state specific business rules to improve the overall accuracy of the mapping process.
High-degree of configurability to adapt the solution to different sources of data.
Business Results | Outcomes
The mapping solution achieved mapping accuracy greater than 90% for most of the states that appeared in the source data sets.
This work provides a foundation for building a production-level solution including integration with multiple tax systems, both internal and external, periodic refresh of the special jurisdiction dictionary and configurations, and mapping business rules.