Fuzzy Text Matching
The output of an OCR process often contains many characters
that have not been recognized correctly. This is especially true in
postal sorting applications and in scans of old, degraded, partially
defective, or simply low-quality documents. Our fuzzy matching solution
relies on highly optimized brute-force string matching, and it runs on standard
hardware. For every input string, FuzzServer is able to
retrieve the most similar record from millions of candidates in a
database in less than one tenth of a second. That's fast!
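To make the idea of brute-force nearest-string search concrete, here is a minimal sketch in Python. It is not FuzzServer's implementation: the function names `edit_distance` and `nearest_record` are hypothetical, plain Levenshtein distance is assumed as the only similarity measure, and no optimization is attempted, so it will not reach the speeds quoted above.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def nearest_record(query: str, records: Iterable[str]) -> str:
    """Brute force: compare the query with every record, keep the closest."""
    return min(records, key=lambda record: edit_distance(query, record))


# An OCR engine has misread "m" as "rn".
cities = ["Amsterdam", "Rotterdam", "Utrecht"]
print(nearest_record("Amsterdarn", cities))
```

Every candidate is scored, so the search cannot miss the best record; the cost is linear in the size of the database, which is why a production engine needs a heavily optimized comparison.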
Intelligent alignment-based string mapping
During the nearest-string search, FuzzServer automatically aligns the strings using many similarity metrics. The result is a shortlist of aligned string matches, to which powerful validation and classification algorithms can be applied. These algorithms can be tuned to deliver confidence scores that suit any desired trade-off between precision and recall.
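As an illustration of alignment and scoring, the sketch below aligns two strings with a standard edit-distance table and derives a simple normalized similarity from it. This is an assumption-laden example, not the FuzzServer algorithm: the names `align`, `similarity`, and `accept` are hypothetical, only one metric (Levenshtein) is used, and the threshold stands in for the tunable validation step.

```python
def align(a: str, b: str) -> tuple[int, str, str]:
    """Return (distance, aligned_a, aligned_b); '-' marks a gap.

    Assumes '-' does not occur in the inputs themselves.
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    # Walk back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and a[i - 1] == b[j - 1] else 1
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return d[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def similarity(a: str, b: str) -> float:
    """Normalized score in [0, 1]; 1.0 means the strings are identical."""
    distance, _, _ = align(a, b)
    return 1.0 - distance / max(len(a), len(b), 1)


def accept(a: str, b: str, threshold: float) -> bool:
    """Hypothetical validation step: accept a match at or above the threshold."""
    return similarity(a, b) >= threshold


distance, top, bottom = align("Amsterdarn", "Amsterdam")
print(distance)
print(top)
print(bottom)
```

Raising the threshold passed to `accept` rejects more borderline matches, which favors precision; lowering it accepts more of them, which favors recall.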
The FuzzServer engine is an OEM product that can be integrated into Text Mining, Search Engines, Product and Customer ID Search, Directory Search and Database cleaning products. The technology is used in many other Textkernel products to power mapping to taxonomies and information extraction from noisy OCR data.
Some examples of Textkernel Text Matching solutions are:
- Postal Sorting
- Table recognition in Fax Orders
- Database deduplication
- Citation extraction and matching
- Text mining from OCR'd archives

