Fuzzy Text Matching

Textkernel's FuzzServer software delivers high-speed fuzzy text matching to any application. It was built to match very noisy OCR output against databases containing many millions of records.

The output of an OCR process often contains many characters that have not been recognized correctly. This is especially true in postal sorting applications and for scans of old, degraded, partially defective, or simply low-quality images. Our fuzzy matching solution relies on highly optimized brute-force string matching, and it runs on normal hardware. For every input string, FuzzServer can retrieve the most similar record from millions of candidates in a database in less than one tenth of a second. That's fast!
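
To give a feel for what brute-force nearest-string search means, here is a minimal Python sketch. It is an illustration only and not FuzzServer's implementation: the choice of Levenshtein edit distance as the similarity measure, and the names edit_distance and nearest_record, are assumptions made for this example. A production engine would need heavy optimization to reach the speeds quoted above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def nearest_record(query: str, records: list[str]) -> tuple[str, int]:
    """Compare the query against every record and return the closest one.

    Hypothetical helper: scans all records (brute force), no index involved.
    Raises ValueError if records is empty.
    """
    best = min(records, key=lambda record: edit_distance(query, record))
    return best, edit_distance(query, best)


if __name__ == "__main__":
    # "Amstcrdam" simulates an OCR error: "e" was misread as "c".
    print(nearest_record("Amstcrdam", ["Rotterdam", "Amsterdam", "Utrecht"]))
```

In this sketch, the noisy input "Amstcrdam" is mapped to the record "Amsterdam" at an edit distance of 1.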

Intelligent alignment-based string mapping

During the nearest-string search, FuzzServer automatically aligns the strings using many similarity metrics. Powerful validation and classification algorithms are then applied to the resulting shortlist of aligned matches, and they can be tuned to deliver confidence scores for any desired trade-off between precision and recall.
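
The idea of aligning strings and then scoring a shortlist can be sketched with Python's standard difflib module. This is a simplified stand-in, not the metrics or classifiers FuzzServer uses: the helper names align and score_shortlist and the threshold value of 0.8 are assumptions chosen for illustration. Raising the threshold favors precision, lowering it favors recall.

```python
from difflib import SequenceMatcher


def align(query: str, candidate: str) -> list[tuple[str, str, str]]:
    """Return the differing segments of the alignment between two strings.

    Each item is (operation, query_segment, candidate_segment), where the
    operation is "replace", "delete", or "insert".
    """
    matcher = SequenceMatcher(None, query, candidate, autojunk=False)
    return [
        (tag, query[i1:i2], candidate[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def score_shortlist(query: str, shortlist: list[str], threshold: float = 0.8):
    """Score every shortlisted candidate and flag those above the threshold.

    Returns (candidate, score, accepted) tuples, best score first.
    The score is difflib's similarity ratio in the range 0.0 to 1.0.
    """
    scored = []
    for candidate in shortlist:
        score = SequenceMatcher(None, query, candidate, autojunk=False).ratio()
        scored.append((candidate, score, score >= threshold))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(align("Amstcrdam", "Amsterdam"))
    for candidate, score, accepted in score_shortlist(
        "Amstcrdam", ["Rotterdam", "Amsterdam"]
    ):
        print(f"{candidate}: {score:.2f} accepted={accepted}")
```

Here the alignment shows a single replaced character ("c" for "e"), and only "Amsterdam" passes the assumed threshold.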

The FuzzServer engine is an OEM product that can be integrated into text mining, search engine, product and customer ID search, directory search, and database cleaning products. The technology is also used in many other Textkernel products to power mapping to taxonomies and information extraction from noisy OCR data.

Some examples of Textkernel Text Matching solutions are:

  • Postal Sorting
  • Table recognition in Fax Orders
  • Database deduplication
  • Citation extraction and matching
  • Text mining from OCR'd archives

More info or demo?