Extending Python with C: A case study

Near-100x speedup with a C extension

I recently wrote about an algorithm for fuzzy matching of substrings implemented in Python. This is a feature that I needed for a piece of software I'm currently developing.
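That earlier post has the full listing; as a sketch, the algorithm is the standard Levenshtein distance DP with free deletions at the start and end of the haystack (Sellers' variant), roughly like this (my reconstruction, not the exact original code):

```python
def fuzzy_substring(needle, haystack):
    """Minimum Levenshtein distance between `needle` and any
    substring of `haystack`.  The first DP row is all zeros, so a
    match may start anywhere; the answer is the minimum of the last
    row, so it may end anywhere."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    prev = [0] * (n + 1)  # matching zero needle chars costs 0 anywhere
    for i in range(1, m + 1):
        curr = [i]        # i needle chars against an empty haystack prefix
        for j in range(1, n + 1):
            cost = 0 if needle[i - 1] == haystack[j - 1] else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return min(prev)
```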

When I started using the fuzzy_substring function on some test cases, however, it was unacceptably slow. Using a modestly large test corpus and about 1,000 search terms, the function was taking about 30 seconds to run. Since this needs to be run in response to a user query, 30 seconds was just too long to wait.

First attempt

The first thing I tried to get this running faster was psyco. Merely by sticking a psyco.full() at the top of my script, the run time went down to 11 seconds. A 3x speedup for zero effort is pretty cool!

Psyco rocks, but 11 seconds was still too slow. So, I decided to bite the bullet and write this function as an extension module in C.

The C extension

I'd never written a pure-C extension module before, and hadn't touched C in about 10 years, so it was with some trepidation that I read the official docs on extending and embedding and an online tutorial by Michael Hudson.

It turned out to be incredibly easy. I simply copied an example C file and setup.py from the docs, filled in my method, ran setup.py, and pow! A shiny new subdist.pyd file! Since I'd already written the function in Python, it only took me about an hour to write it in C, and I could use the same unit tests to verify it.
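For reference, the setup.py for a one-file extension amounts to little more than this (the module and file names follow the post; the exact options are my guess, and distutils was the build tool of the era):

```python
# Minimal distutils build script for a single-file C extension.
from distutils.core import setup, Extension

setup(
    name="subdist",
    version="0.1",
    ext_modules=[Extension("subdist", sources=["subdist.c"])],
)
```

Running "python setup.py build" (or "install") compiles the .c file and produces the .pyd/.so module.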

Update: I've set up a Web page for this module with code & binaries.


I ran a benchmark on the C extension versus the pure-Python version (with and without psyco). For the benchmark, I took 1,000 unique words from the Python 2.5 README file as my needles, and the 100th to 200th lines in this same file as my haystacks.

text = unicode(open("/python25/readme.txt").read())  # the Python 2.5 README
sentences = text.splitlines()[100:200]               # haystacks: 100 lines
words = list(set(text.split()))[:1000]               # needles: 1,000 unique words

I then got the fuzzy substring match for each word against each of my "sentences."

import time

def time_func(func, words, sentences):
    start = time.time()
    for sentence in sentences:
        for word in words:
            func(word, sentence)
    return time.time() - start

The speedup

Each function was run 10 times; here are the low, high, and median run times:

                  Low        Median     High
Python            53.6410 s  54.6560 s  55.2190 s
Python (psyco)    17.6100 s  17.7960 s  18.0620 s
C                  0.7030 s   0.7190 s   0.7340 s
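A minimal way to collect those low/median/high statistics over the ten runs (my sketch; the original harness isn't shown in the post):

```python
def summarize(times):
    """Return (low, median, high) for a list of run times.
    With an even count this takes the upper-middle value as the
    median, which is fine for eyeballing ten benchmark runs."""
    ts = sorted(times)
    return ts[0], ts[len(ts) // 2], ts[-1]
```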


Ratios of median run times:

Python (psyco) to Python: 0.3256
C to Python (psyco): 0.0404
C to Python: 0.0132

So the C extension is about 75 times faster than the pure Python function, and nearly 25 times faster than Python with psyco, all for an hour's work. Not too shabby!

There are actually a couple more tricks I could have used in the C code to get a further 10% to 30% speedup (e.g. not calculating the lower left and upper right corners), but it's already fast enough for my purposes. If I need to squeeze out more speed in the future, I'll add the somewhat obfuscating optimizations then.
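For context, the unoptimized core of such an extension is just the same DP as the Python version, moved onto plain C arrays. This is my reconstruction of the technique, not the actual subdist source; the "corners" trick mentioned above would additionally skip matrix cells that cannot affect the final minimum:

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Fuzzy substring distance over wide strings, two-row Levenshtein DP.
 * The first row is all zeros (a match may start anywhere in the
 * haystack) and the result is the minimum of the last row (it may
 * end anywhere). */
size_t fuzzy_substring(const wchar_t *needle, size_t m,
                       const wchar_t *haystack, size_t n)
{
    size_t *prev = malloc((n + 1) * sizeof *prev);
    size_t *curr = malloc((n + 1) * sizeof *curr);
    size_t i, j, best;

    for (j = 0; j <= n; j++)
        prev[j] = 0;                 /* deletions before the match are free */

    for (i = 1; i <= m; i++) {
        size_t *tmp;
        curr[0] = i;                 /* needle prefix vs. empty haystack */
        for (j = 1; j <= n; j++) {
            size_t cost = (needle[i - 1] != haystack[j - 1]);
            size_t best3 = prev[j] + 1;              /* deletion     */
            if (curr[j - 1] + 1 < best3)
                best3 = curr[j - 1] + 1;             /* insertion    */
            if (prev[j - 1] + cost < best3)
                best3 = prev[j - 1] + cost;          /* substitution */
            curr[j] = best3;
        }
        tmp = prev; prev = curr; curr = tmp;
    }

    best = prev[0];                  /* deletions after the match are free */
    for (j = 1; j <= n; j++)
        if (prev[j] < best)
            best = prev[j];

    free(prev);
    free(curr);
    return best;
}
```

The extension wrapper around this would just unpack the two Unicode arguments with PyArg_ParseTuple and return the result as a Python int.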


Compiling extensions in C is incredibly easy, much easier than I expected. Pure C is especially suited to cases like this, where you can immediately get away from Python objects and work with pure C data types. Writing my fuzzy string matching algorithm as a C extension turned an algorithm that was sort of interesting into something that will actually be useful.

It's fantastic to be able to get this kind of speedup on performance bottlenecks, yet still write 99% of my code using pure Python.


Downloads are available here. All code and binaries are released under the MIT license.


from subdist import substring
print substring(u"needle", u"Find the needle in the haystack")

Caveat: The function only accepts Unicode strings, and will raise an error if non-Unicode strings are passed in.

9 comments to Extending Python with C: A case study

  • nice work. libdistance implements these fuzzy string matching algorithms in C and has python bindings:


    levenshtein, hamming, jaccard, and a few others.

    i hope you find it useful!

  • Thanks a lot for the link! I especially appreciate the very liberal license.

    I haven’t seen any implementations of the substring-match specialization, which is why I wrote this one. Actually I thought I had invented it myself, so I was a little disappointed to see it already described on the wikipedia page for Levenshtein distance 🙂

  • Story added…

    Your story has been submitted to fsdaily.com! Come and promote your article by voting for it here: http://www.fsdaily.com/HighEnd/Extending_Python_with_C_A_case_study

  • You’ve probably already seen it,
    but the Programming Lang Shootout has some interesting
    any lang vs. any lang benchmarks…

    just browsing the code can be interesting too, to see
    how they optimize stuff in python etc.

  • @alex

    Yes, I don’t think anyone will dispute that languages like C are much faster than VHLLs like Python at these micro-benchmarks. I want to use Python wherever possible, and dip down to a lower-level language to break performance bottlenecks. My goal is to write something that’s 99% Python, yet still 99% as fast as a pure C program 🙂

    Thanks for the idea about browsing the Python source to look for optimization ideas. The Python developers really are wizards about writing tight code. I like their motto: “We read Knuth so you don’t have to.”

  • ben

    The benchmarks at the shootout site above don’t mean anything. For example, Perl is an interpreted language written in C, and yet Perl is supposed to be faster than C on some of those benchmarks. Since the Perl interpreter is a C program and the benchmark is a C program, all it means is that the C code used in the benchmark isn’t as good as the C code in Perl. It’s testing the programmer’s skill, not the language itself. The only thing we can conclude is that Larry Wall and co are better C programmers than the person who wrote the benchmark.

    Computer “science” is riddled with this kind of pseudo science.

  • Chris

    I just wanted to mention that implementing min as a function is not C-esque. Remember that every time you call the function, a stack frame needs to be built up. Therefore, people usually define it as a preprocessor macro:

    #define MIN2(a, b) ((a)
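    (The comparison operator was cut off above, presumably because the blog software read the less-than sign as an HTML tag. The standard idiom the comment is pointing at, reconstructed from common C practice rather than the commenter's exact header, is:)

```c
/* Classic min macros: expanded inline by the preprocessor, so there
 * is no call overhead.  Each argument is parenthesized against
 * precedence surprises, but arguments are still evaluated more than
 * once, so avoid side effects like MIN2(i++, j). */
#define MIN2(a, b) ((a) < (b) ? (a) : (b))
#define MIN3(a, b, c) MIN2(MIN2((a), (b)), (c))
```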

  • Chris

    Your blog software might have eaten my comment,
    you can find MIN2, MIN3 at http://untergrund.bewaff.net/~chris/min.h

  • @Chris:

    Thanks for the comment. Yes, as I noted in the article, there are a few more things I can do with the code to get a 10% to 30% speedup.

    Actually though, I don’t know if you really have a function-call overhead for something like min. I don’t have a lot of experience with gcc, but the MS compiler would almost certainly inline that for me. I guess you’d have to look at the generated assembly language to know for sure.
