Archive for the 'python' Category

Aim high

Alex Martelli has a great quote on optimization in his book Python in a Nutshell:
Start by designing, coding, and testing your application in Python, using available extension modules if they save you work. This takes much less time than it would with a classic compiled language. Then benchmark the application to find out if the resulting code [...]
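The "write it in Python first, then measure" workflow from the quote can be sketched with the standard library's cProfile and pstats modules. The helper name profile_call is hypothetical, not something from the book; it runs one call under the profiler and returns the result together with a report sorted by cumulative time, which is usually enough to see where the hot spots are before rewriting anything.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile (hypothetical helper).

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    # pstats.Stats accepts a Profile object directly; send output to the buffer.
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Only once the report points at a specific function does it make sense to optimize it in Python or move it to a C extension.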

The past, present, and future of optimization

I have a relative ("Dan") who used to earn a living optimizing code in the late 70s and early 80s. Around then, a new-fangled high-level language named "C" was starting to catch on, but companies didn't like all the wasted cycles in C programs due to the under-optimized assembly code that their C compilers were [...]

Extending Python with C: A case study

Near-100x speedup with a C extension
I recently wrote about an algorithm for fuzzy matching of substrings implemented in Python. This is a feature that I needed for a piece of software I'm currently developing.
When I started using the fuzzy_substring function on some test cases, however, it was unacceptably slow. Using a modestly large test corpus [...]

Fuzzy substring matching with Levenshtein distance in Python

Levenshtein distance is a well-known technique for fuzzy string matching. With a couple of modifications, it's also possible to use Levenshtein distance to do fuzzy matching of substrings.
Let's take a simple example just to show what I mean.
needle: "aba"
haystack: "c abba c"
We can intuitively see that "aba" should match up against "abba." Here's a [...]
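A minimal sketch of one common way to adapt the Levenshtein dynamic program to substrings, assuming the "couple of modifications" are the usual ones: the first row is all zeros, so a match may start anywhere in the haystack at no cost, and the answer is the minimum of the last row, so a match may end anywhere. It is an illustration and may differ from the post's own fuzzy_substring.

```python
def fuzzy_substring(needle: str, haystack: str) -> int:
    """Smallest edit distance between needle and any substring of haystack."""
    n = len(haystack)
    # Row for the empty needle prefix: zeros, so a match may start at any position.
    prev = [0] * (n + 1)
    for i in range(1, len(needle) + 1):
        # curr[0] = i: matching needle[:i] against nothing costs i deletions.
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if needle[i - 1] == haystack[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # needle char missing from haystack
                curr[j - 1] + 1,     # extra char in haystack
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    # Minimum over the last row: the match may end at any position.
    return min(prev)
```

With the example above, fuzzy_substring("aba", "c abba c") returns 1, since deleting one "b" from "abba" gives "aba".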

Parsing multilingual email with Python

The email module in the Python standard library provides just about everything you need to parse multilingual emails with Python. There are a few traps, however, that can catch the unaware and unwary.
Parsing an email message
The email module provides a couple of handy functions for parsing email: message_from_string and message_from_file. Both functions return a Message [...]
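A minimal Python 3 sketch of getting decoded text out of a message. It uses message_from_bytes, the bytes counterpart of message_from_string, together with decode_header and make_header for RFC 2047 encoded headers such as a Japanese Subject line. The helper name parse_multilingual and the "us-ascii" fallback charset are assumptions, and the specific traps the post goes on to discuss are not covered here.

```python
import email
from email.header import decode_header, make_header


def parse_multilingual(raw):
    """Return (subject, texts) for raw message bytes, decoded to str (sketch)."""
    msg = email.message_from_bytes(raw)
    # decode_header() splits RFC 2047 encoded-words into (bytes, charset) chunks;
    # make_header() reassembles them so str() yields a single Unicode string.
    subject = str(make_header(decode_header(msg.get("Subject", ""))))
    texts = []
    for part in msg.walk():  # walk() also visits the top-level message
        if part.get_content_maintype() != "text":
            continue  # skip non-text parts: multipart containers, images, etc.
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        charset = part.get_content_charset() or "us-ascii"  # assumed fallback
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # charset name that Python does not recognize
            texts.append(payload.decode("us-ascii", errors="replace"))
    return subject, texts
```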

Fixing JIS mojibake with Python

JIS (iso-2022-jp) is a Japanese text encoding. The beginning and end of a JIS sequence are marked by escape sequences:
Beginning: 1B $ B (or 1B $ @)
Ending: 1B ( J (or 1B ( B)
This encoding is often used in email. Unfortunately, some email programs (and even mail routers) strip out the escape characters. This doesn't [...]
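One way to repair such text, assuming only the ESC bytes (0x1B) were lost and the rest of each escape sequence survived, is to walk the bytes and put ESC back in front of each bare "$B"/"$@" (switch into two-byte JIS) and "(B"/"(J" (switch back). This is a hypothetical heuristic, not necessarily the post's method: a literal "$B" in plain ASCII text would be misread as an escape sequence, and "(B" is only recognized on a two-byte character boundary.

```python
# Escape-sequence tails left behind when a program strips the ESC byte.
_TO_JIS = (b"$B", b"$@")    # ESC $ B / ESC $ @: switch to two-byte JIS
_TO_ROMAN = (b"(B", b"(J")  # ESC ( B / ESC ( J: switch back to one-byte text


def restore_jis_escapes(data: bytes) -> bytes:
    """Re-insert missing ESC (0x1B) bytes into ISO-2022-JP text (heuristic)."""
    out = bytearray()
    i = 0
    in_jis = False  # True while inside a run of two-byte JIS characters
    while i < len(data):
        pair = data[i:i + 2]
        if data[i] == 0x1B:
            # Escape sequence still intact: copy its three bytes unchanged.
            seq = data[i:i + 3]
            out += seq
            in_jis = seq[1:] in _TO_JIS
            i += 3
        elif not in_jis and pair in _TO_JIS:
            out += b"\x1b" + pair
            in_jis = True
            i += 2
        elif in_jis and pair in _TO_ROMAN:
            out += b"\x1b" + pair
            in_jis = False
            i += 2
        elif in_jis:
            out += pair  # one two-byte JIS character
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

After repair, the bytes decode normally with restore_jis_escapes(data).decode("iso-2022-jp").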

Honyaku mailing-list code open sourced

I've open sourced the code for the Honyaku mailing-list archive and posted it to Google Code (I named it ml-archive because I plan to make it a generic mailing-list archive site). The site is written in Python using the Django web framework, and it's released under the MIT license.
One of the challenges I'm facing [...]

Custom Django template filter: Sanitize emails

Still working on my Honyaku mailing-list archive site, I needed a template filter to sanitize emails. The filter itself was very simple; most of my research went into finding out how to write a template filter in the first place.
The filter code
Here's the code for the filter. As I said, pretty simple stuff.

#coding: UTF8
"""
A template filter [...]

Splitting queries into search terms with Python

I'm currently writing a site to host the archives for the Honyaku mailing list. I needed to split a search query into individual terms so that, for example, the query "懸念 risk" would retrieve all posts containing both "懸念" and "risk", not necessarily adjacent or in that order. I also wanted to support quoted [...]
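Python's standard shlex module already understands quoted phrases, so one possible approach (not necessarily the one the post takes) is a thin wrapper around shlex.split. The function name split_query is hypothetical, and two details are assumptions added for this setting: the ideographic space U+3000, which Japanese input methods often produce, is treated as an ordinary space, and an unbalanced quote (for example the apostrophe in "don't") falls back to plain whitespace splitting instead of raising ValueError.

```python
import shlex


def split_query(query):
    """Split a search query into terms, keeping quoted phrases together (sketch)."""
    # shlex only treats ASCII whitespace as a separator, so normalize U+3000.
    query = query.replace("\u3000", " ")
    try:
        return shlex.split(query)
    except ValueError:  # unbalanced quote, e.g. an apostrophe in "don't"
        return query.split()
```

For example, split_query('"risk assessment" 懸念') gives ['risk assessment', '懸念'].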

Python introspection with the inspect module

Here are some fun and potentially useful tricks you can do using the inspect module.
Get a list of the modules imported into the current global namespace:

import inspect

# Snapshot the global namespace, then keep only the module objects.
myglobals = dict(globals())
modules = [value
           for value in myglobals.values()
           if inspect.ismodule(value)]
Get a list of classes in a module (by Marc 'BlackJack' Rintsch):

import module
from inspect import [...]
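The rest of that snippet is cut off here. As a separate illustration, here is a minimal sketch of the same idea using inspect.getmembers with the inspect.isclass predicate; it may differ from Rintsch's version, and classes_in and its defined_here flag are hypothetical names. The flag drops classes that a module merely imports from elsewhere.

```python
import inspect
import json


def classes_in(module, defined_here=False):
    """Names of the classes found as attributes of `module` (sketch).

    With defined_here=True, keep only classes whose __module__ is the
    module itself, dropping classes it imported from other modules.
    """
    return [name
            for name, obj in inspect.getmembers(module, inspect.isclass)
            if not defined_here or obj.__module__ == module.__name__]


# json re-exports JSONDecoder from json.decoder, so it only shows up
# when defined_here is False.
print(classes_in(json))
print(classes_in(json, defined_here=True))
```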
