Ginstrom IT Solutions (GITS)

subdist

subdist is a fast Python module for finding fuzzy substring matches. It uses a modified version of the Levenshtein distance algorithm, described in the blog article listed under Quick Links.
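A common way to adapt Levenshtein distance to substring search is to let a match start and end anywhere in the longer string. The first DP row is all zeros, so no cost is paid for skipping a prefix of the haystack. The answer is the minimum of the last row, so no cost is paid for skipping a suffix. The sketch below shows that general technique in plain Python. It is not taken from subdist's source and may differ from the module's actual modification. The function names `substring_distance` and `substring_score` are hypothetical, and the 0.0 to 1.0 normalization in `substring_score` is only an assumed formula, not necessarily what subdist's `get_score` computes.

```python
def substring_distance(needle: str, haystack: str) -> int:
    """Hypothetical helper: minimum edit distance between needle and
    any substring of haystack (general technique, not subdist's code)."""
    if not needle:
        return 0
    # Row 0 is all zeros: a match may begin at any haystack position for free.
    prev = [0] * (len(haystack) + 1)
    for i, nc in enumerate(needle, start=1):
        # Column 0: the first i needle chars matched against nothing cost i.
        curr = [i] + [0] * len(haystack)
        for j, hc in enumerate(haystack, start=1):
            cost = 0 if nc == hc else 1
            curr[j] = min(
                prev[j] + 1,         # needle char left unmatched (deletion)
                curr[j - 1] + 1,     # extra haystack char inside the match (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    # A match may end anywhere, so take the best value in the final row.
    return min(prev)


def substring_score(needle: str, haystack: str) -> float:
    """Hypothetical normalization to 0.0..1.0 (assumed formula).
    The distance never exceeds len(needle), so the result stays in range."""
    if not needle:
        return 1.0
    return 1.0 - substring_distance(needle, haystack) / len(needle)
```

For example, `substring_distance("abd", "xxabcxx")` returns 1 because one substitution turns "abd" into the substring "abc". The unmatched "xx" on either side costs nothing. Keeping only the previous row gives O(len(haystack)) memory. A C extension would typically speed up this same inner loop.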

Version History

0.1.1
Initial release
0.2
Added get_score function, which returns a score between 0.0 and 1.0 based on the edit distance.
0.2.1
Improved docstrings, refactored code, and slightly sped up the distance function.

Quick Links

A blog article describes the algorithm, and a separate article describes the C extension.