The partial rewrite

I haven't been doing much blogging lately. Instead, I've been busy working and hacking. On the hacking side of things, I've been doing a partial rewrite of my big application, a translation-memory application written in C++.

The application is now about 10 years old, and over several releases it has grown more and more difficult to add new features without introducing bugs. I have in fact rewritten this application once already. But as features began to accrete again, maintenance and improvement started to drag once more.

The definition of insanity

The definition of insanity is doing the same thing over and over and expecting different results.

— Benjamin Franklin

The first time I rewrote the application, maintenance and improvement got a lot easier (for a couple of years at least), so my first thought was to rewrite the thing again. But that obviously didn't work out so well in the long run, or I wouldn't be considering another rewrite six years later. Meanwhile, during the intervening years, lots of new ideas about development have been bumping around in my head.

One is the idea that rewrites usually aren't a very good idea: even if they succeed, their goals could have been met more efficiently. Another is the idea of improving code bases through refactoring, as detailed brilliantly in Working Effectively with Legacy Code. Finally, I've been turned on to the great power of working with higher-level languages as much as possible.

What's worth doing is worth doing halfway

So this time around, I decided to do things a little differently. Instead of rewriting the whole application, I would rewrite parts of it in a higher-level language, refactoring the rest. Since I'm a big fan of Python, I decided to use it for my high-level language.

Since the existing application already uses COM heavily, I decided to use COM as the communication mechanism between C++ and Python code. The next question was, what parts should I rewrite?

Some parts were a no-brainer. My application hosts the IE browser, and I had written a custom templating library in C++ to generate content. I would be happy to ditch that and generate the HTML content in Python, using one of the many available libraries.
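
To give a feel for how little code this takes, here is a minimal sketch using only the Python standard library. The template, the `render_matches` function, and the shape of the match records (`score`, `source`, `target`) are all made up for illustration; they are not the application's real code, and a full templating library would do more.

```python
from html import escape
from string import Template

# Hypothetical template for one row of a match table.
MATCH_ROW = Template("<tr><td>$score%</td><td>$source</td><td>$target</td></tr>")


def render_matches(matches):
    """Render a list of match dicts as an HTML table.

    Each dict is assumed to have a 'score' between 0 and 1 plus
    'source' and 'target' strings. Text is escaped so that segments
    containing '<' or '&' cannot break the markup.
    """
    rows = [
        MATCH_ROW.substitute(
            score=int(round(m["score"] * 100)),
            source=escape(m["source"]),
            target=escape(m["target"]),
        )
        for m in matches
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(render_matches([{"score": 0.87, "source": "a < b", "target": "x & y"}]))
```

The resulting string could then be handed to the hosted browser control like any other HTML.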

It also seemed obvious that a lot of the new features would be done in Python. One of the main reasons for my frustration with the existing code base was my desire to add networking support. This is quite a pain to do in C++, but gloriously simple in Python.

Some of the decisions took a little more of a leap. One example was the core matching engine for the memory database. There are lots of improvements I wanted to make to the engine, but the going was too slow and difficult with the existing code base. But on the other hand, I was scared about sacrificing performance by rewriting it in Python [insert witty joke about premature optimization here — Ed].

This was my worry as I researched the issue, trying various approaches and doing a lot of profiling. And indeed, my first naive implementations in Python were about 30x slower than the C++ code. I was disheartened but kept at it, profiling, optimizing bottlenecks, and rewriting in C the parts that I couldn't get any faster.
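
For anyone who hasn't profiled Python before, the loop looks roughly like the sketch below, which uses the standard `cProfile` and `pstats` modules. The `naive_score` and `find_best` functions are hypothetical stand-ins for a slow first implementation, not the real matching engine.

```python
import cProfile
import io
import pstats


def naive_score(query, candidate):
    """Hypothetical slow scorer: fraction of query characters found in candidate."""
    shared = sum(1 for ch in query if ch in candidate)
    return shared / max(len(query), len(candidate), 1)


def find_best(query, candidates):
    """Return the candidate with the highest naive score."""
    return max(candidates, key=lambda c: naive_score(query, c))


def profile_call(func, *args, limit=10):
    """Run func under cProfile; return its result and a text report.

    The report lists up to `limit` functions, sorted by cumulative time,
    which is usually where to start looking for bottlenecks.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    best, report = profile_call(find_best, "hello", ["help", "world", "hello there"])
    print(best)
    print(report)
```

Whatever sits at the top of that report is the first candidate for a better algorithm, and only after that for a rewrite in C.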

I was thus pleasantly surprised when my first serious pass at a replacement — written in a mixture of Python and C extensions — was 3 to 10 times faster than the pure C++ code, depending on the test data! I did this by being eager in some cases (caching results that would be needed repeatedly), lazy in others (avoiding calculations that might not be necessary), and writing the speed-critical portions in C.
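
Here is a small sketch of the eager/lazy idea. It is not the actual engine: the scoring is a plain Jaccard similarity on word sets, chosen only because it is short, and the names `tokenize` and `find_matches` are invented for the example. The eager part caches each segment's token set, since stored segments get compared over and over. The lazy part checks a cheap upper bound first and skips the real comparison when the bound already rules a candidate out.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def tokenize(segment):
    """Eager: compute a segment's token set once and cache it for reuse."""
    return frozenset(segment.lower().split())


def find_matches(query, memory, threshold=0.5):
    """Lazily yield (segment, score) pairs scoring at least `threshold`.

    The score is Jaccard similarity, |A & B| / |A | B|. It can never
    exceed min(|A|, |B|) / max(|A|, |B|), so when that cheap bound is
    already below the threshold the set operations are skipped.
    """
    q = tokenize(query)
    for segment in memory:
        c = tokenize(segment)
        if not q or not c:
            continue
        bound = min(len(q), len(c)) / max(len(q), len(c))
        if bound < threshold:
            continue  # lazy: this candidate cannot reach the threshold
        score = len(q & c) / len(q | c)
        if score >= threshold:
            yield segment, score


if __name__ == "__main__":
    memory = [
        "open the file",
        "close the file now please thanks a lot",
        "open a file",
    ]
    for segment, score in find_matches("open the file", memory):
        print(f"{score:.2f}  {segment}")
```

Because `find_matches` is a generator, a caller that only wants the first acceptable match can stop early and never pay for the rest of the memory.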

I've heard the Python gurus preach this very thing for years, and thought that I believed them. But it took seeing this first-hand to truly realize that starting with Python and optimizing bottlenecks can give you faster code in the end, because you're more likely to choose better algorithms.

With the exception of a few functions written in C, the code for the matching engine — including several "tight" loops — is now pure Python. If the performance gains I've already won turn out to not be enough, I could rewrite more of the code in C, but right now it's good enough, and I have a much better base on which to build my desired improvements.

The happy ending

There were a couple of hairy moments early on, when I wondered whether I was adding more complexity than I was taking out. After all, I was going from a monolithic C++ application to a mixture of C++, Python, and C. But I persevered, and turned the corner. It's so gratifying now to see my code get cleaner, smaller, better tested, and even faster (!) each day.

2 comments to The partial rewrite

  • salamander

    Fully agree that carefully rewriting is better than throwing away. Which TM did you work on?

  • I’m working on the next version of TransAssist (a rather minor TM).

    _Working Effectively with Legacy Code_ was really what made it possible to build on the existing code base instead of throwing it out. I didn’t figure out ways to unit test GUI code until after the last rewrite, and until I used the techniques in the book, modifying GUI code was so error prone and time consuming that rewriting seemed more attractive. (No connection to the book or author, just a happy user)
