Aim high

Alex Martelli has a great quote on optimization in Python in a Nutshell:

Start by designing, coding, and testing your application in Python, using available extension modules if they save you work. This takes much less time than it would with a classic compiled language. Then benchmark the application to find out if the resulting code is fast enough. Often it is, and you're done — congratulations! Ship it!

Since much of Python itself is coded in highly optimized C, as are many of its standard and extension modules, your application may even turn out to be already faster than typical C code. However, if the application is too slow, you need to reexamine your algorithms and data structures. Check for bottlenecks due to application architecture, network traffic, database access, and operating system interactions. For typical applications, each of these factors is more likely than language choice to cause slowdowns. Tinkering with large-scale architectural aspects can often speed up an application dramatically, and Python is an excellent medium for such experimentation.

If your program is still too slow, profile it to find out where the time is going. Applications often exhibit computational bottlenecks: small areas of the source code, often between 10 and 20 percent, which account for 80 percent or more of the running time. Then optimize the bottlenecks…

I write just about all my code in Python first. If it's not fast enough, I profile it to find the bottlenecks. First, I try optimizing the algorithms behind those bottlenecks. If it's still too slow, I optimize the code itself: hoisting method lookups out of loops, replacing slower idioms like repeated string concatenation with list joins, and so on.
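To make those two code-level tweaks concrete, here is a minimal sketch. The function names are made up for illustration, and how much each version gains depends on the Python version and the workload, so treat it as something to measure rather than a guaranteed win.

```python
def squares_naive(n):
    result = []
    for i in range(n):
        # result.append is looked up again on every iteration
        result.append(i * i)
    return result


def squares_hoisted(n):
    result = []
    append = result.append  # method lookup hoisted out of the loop
    for i in range(n):
        append(i * i)
    return result


def concat_naive(words):
    text = ""
    for word in words:
        # repeated concatenation may copy the growing string each time
        text += word
    return text


def concat_join(words):
    # collect the pieces and join them once at the end
    return "".join(words)
```

Both pairs return identical results; only the way the work is done differs. Timing them with the standard `timeit` module on realistic input is the way to find out whether the change is worth the slightly less obvious code.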

If it's still too slow, I'll rewrite the remaining bottlenecks in a lower-level language. I continue to profile to ensure that I'm speeding up what I think I am.
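For the profiling step, something along these lines is usually enough to see where the time goes. It is only a sketch using the standard `cProfile` and `pstats` modules; `slow_part`, `fast_part`, `main`, and `profile_report` are hypothetical stand-ins for a real application.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # stand-in for a real bottleneck
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(n):
    # stand-in for code that is already cheap
    return n * 2


def main():
    slow_part(200_000)
    fast_part(10)


def profile_report(func, limit=5):
    """Run func under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


report = profile_report(main)
print(report)
```

Rerunning the same report after each change, including after moving a function to a lower-level language, shows whether the function I meant to speed up actually dropped down the list.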

Even when I have a pretty good idea that I'm going to have to implement a certain function in a lower-level language, I'll still write it in Python first. Having the high-level implementation makes writing the lower-level one a lot easier. Plus the resulting code is likely to be cleaner and have fewer bugs.

As an example, I once wrote a file parser in Python. It was too slow, so I rewrote some parts of it in C++. Overall, development took me about two days: a day for the Python, and a day for the C++.

Some time before, I had written a different file parser entirely in C++. The level of complexity was about the same — but this parser took me about a week to write, and I was finding bugs in it for a couple of months.

The Python-based parser had far fewer bugs; it was also a lot easier to maintain and extend, which has been of huge value. But the real kicker was that my hybrid Python/C++ parser performed as well as or better than the pure-C++ version (a direct comparison is impossible, because they were parsing different things).

So my motto now is to aim high — put everything into a high-level language (usually Python in my case), dipping down to lower-level languages only when necessary.
