subdist Python module

subdist

subdist is a fast Python module for finding fuzzy substring matches. It uses a modified version of the Levenshtein distance algorithm (described here).

Version History

0.1.1: Initial release.
0.2: Added get_score function, which returns a score between 0.0 and 1.0 based on the edit distance.
0.2.1: Improved docstrings, refactored, added slight speedup to distance function.

Usage

Get the Levenshtein (edit) distance of a substring

    import subdist

    needle = u"short string"
    haystack = u"This is a long string"
    distance = subdist.substring(needle, haystack)
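For readers curious what a substring edit distance computes, here is a minimal pure-Python sketch. It is not subdist's implementation (subdist is a C extension, and its exact modification of the Levenshtein algorithm is not spelled out here). It assumes the common approach of letting a match start and end anywhere in the haystack at no cost. The function name substring_distance is hypothetical and is not part of the subdist API.

```python
def substring_distance(needle, haystack):
    """Smallest edit distance between needle and any substring of haystack.

    Illustrative sketch only; an assumption about the general technique,
    not the code used by subdist.
    """
    # Row for the empty needle prefix: all zeros, because a match may
    # begin at any position in the haystack without penalty.
    previous = [0] * (len(haystack) + 1)

    for i, n_char in enumerate(needle, start=1):
        # Column 0: the first i needle characters against an empty
        # substring cost i edits.
        current = [i]
        for j, h_char in enumerate(haystack, start=1):
            substitution = previous[j - 1] + (n_char != h_char)
            skip_needle_char = previous[j] + 1
            skip_haystack_char = current[j - 1] + 1
            current.append(min(substitution, skip_needle_char, skip_haystack_char))
        previous = current

    # A match may end at any position, so take the best value in the last row.
    return min(previous)
```

An exact occurrence of the needle anywhere in the haystack gives a distance of 0, and the result never exceeds the length of the needle.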

Get the fuzzy match score (0.0 to 1.0) of a substring

    import subdist

    needle = u"short string"
    haystack = u"This is a long string"
    score = subdist.get_score(needle, haystack)
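The exact formula behind get_score is not given in this README. As a rough illustration of how an edit distance can be mapped onto the 0.0 to 1.0 range, the sketch below assumes the distance is normalized by the needle length. The function score_from_distance and that normalization are assumptions, not subdist's documented behavior.

```python
def score_from_distance(distance, needle_length):
    """Map an edit distance to a score in [0.0, 1.0].

    Hypothetical normalization: 1.0 means an exact match, 0.0 means
    every needle character had to be edited.
    """
    if needle_length == 0:
        # An empty needle trivially matches anything.
        return 1.0
    # float() keeps the division exact under Python 2 as well.
    score = 1.0 - float(distance) / needle_length
    # Clamp in case the caller passes a distance larger than the needle.
    return max(0.0, min(1.0, score))
```

With this assumed normalization, one edit in a four-character needle gives 0.75, and a distance equal to the needle length gives 0.0.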

Quick Links

Here is a blog article describing the algorithm. Here is an article describing the C extension.