Speeding up search on Honyaku archive site

Last summer, I launched a new archive site for the Honyaku mailing list.
The site is written in Python using the Django framework, with MySQL as the database. I chose MySQL because my tests showed that it was much faster than PostgreSQL at text searching.
Lately, however, the searches have been taking a huge amount of time. […]

Do the math

I don't like doing so-called "native checks," or proofing other translators' work in general. It just turns into a bad experience way too often.
I'm candid about this with clients. I tell them I prefer not to do that sort of work. Sometimes they ask anyway, and if they're good clients (i.e. they send […]

What price elegance?

In a recent post, I gave some code for counting the top n most frequent words in an arbitrary text file using itertools.groupby.
The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a bit slower than an imperative […]
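To make the trade-off concrete, here is a rough sketch of the two styles side by side. Neither function is the code from the original post: the names top_words_groupby and top_words_loop are made up for illustration, and both work on a string rather than a file to keep the example short.

```python
from itertools import groupby

def top_words_groupby(text, n):
    """Functional style: sort the words, group equal neighbours, rank by count."""
    words = sorted(text.lower().split())
    counts = [(word, len(list(group))) for word, group in groupby(words)]
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))[:n]

def top_words_loop(text, n):
    """Imperative style: tally words in a dict with an explicit loop."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:n]

sample = "the cat and the dog and the bird"
print(top_words_groupby(sample, 2))  # [('the', 3), ('and', 2)]
print(top_words_loop(sample, 2))     # [('the', 3), ('and', 2)]
```

The groupby version has to sort every word before it can count anything, while the loop makes a single pass and only sorts the distinct words at the end. That is one plausible reason for a speed gap, though the timeit module is the way to check on your own data.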

Counting occurrences in a sequence with itertools.groupby

itertools.groupby is a great tool for counting the number of occurrences of each item in a sequence.
Here are some examples from the interactive interpreter.
A list of numbers

>>> # Create a random list of numbers
>>> from random import random
>>> numbers = [int(random() * 10) for x in range(20)]
>>> numbers
[8, 0, 3, 2, 3, 9, 8, 2, 8, 3, 0, […]
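
A minimal sketch of the counting idea, assuming the sequence is sorted first (groupby only merges adjacent equal items). The helper name count_occurrences and the sample list are illustrative, not taken from the session above.

```python
from itertools import groupby

def count_occurrences(items):
    """Return (item, count) pairs for a sequence, in sorted item order."""
    # groupby only groups *consecutive* equal items, so sort first.
    return [(key, len(list(group))) for key, group in groupby(sorted(items))]

sample = [3, 1, 3, 2, 1, 3]
print(count_occurrences(sample))  # [(1, 2), (2, 1), (3, 3)]
```

Without the sorted() call, the same value showing up in two separate runs would be reported twice, once per run.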

Making the robot dance

Some time around 1980, my elementary school classroom got a computer. While most of the other kids fooled around playing Hunt the Wumpus, my friend and I found the BASIC manual that came with the computer. We laboriously copied in the code to make a "robot" appear on the screen. After a lot of typos, […]

Using chardet to convert arbitrary byte strings to Unicode

chardet is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a BOM to pretty reliably turn them into Unicode.
Edit: Thanks to Kirit's comment below, I added code to check for UTF-32.

import chardet
def bytes2unicode(bytes, errors='replace'):
    """Convert a byte string into Unicode.
    First checks […]
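
For readers who just want the general shape of the approach, here is a rough Python 3 sketch of "check for a BOM, then ask chardet". It is not the function from the post: decode_bytes, the BOMS table, and the fallback parameter are names invented for this example, and treating chardet as an optional import is an assumption made so the sketch still runs where chardet is not installed.

```python
import codecs

try:
    import chardet  # third-party; optional in this sketch
except ImportError:
    chardet = None

# Check UTF-32 before UTF-16: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_bytes(data, errors="replace", fallback="utf-8"):
    """Decode bytes to str: BOM first, then chardet's guess, then a fallback."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # Strip the BOM so it does not appear in the decoded text.
            return data[len(bom):].decode(encoding, errors)
    # chardet.detect returns a dict whose "encoding" entry may be None.
    guess = chardet.detect(data)["encoding"] if chardet and data else None
    return data.decode(guess or fallback, errors)

print(decode_bytes(codecs.BOM_UTF8 + "héllo".encode("utf-8")))  # héllo
```

The order of the BOM table matters: if UTF-16 LE were tested first, UTF-32 LE input would be misidentified, because its BOM starts with the same two bytes.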

Delivering the bad news

A few weeks ago, a translation agency I occasionally work for called me in a panic. It seems that a major client had rejected one of their Japanese-to-English translations, calling it "unreadable," and providing another translation as a sample of the quality they were after.
The agency wanted to pay me to review their translation, and […]

Python GUI programming platforms for Windows

[Edit]
By popular demand, I've added a section on PyGTK. See bottom of post.
There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to the source.
Tkinter
Tkinter is the ubiquitous […]

Possibly the best name for a programming language ever

Coq.
I would love to see this adopted at the Enterprise level. Could you imagine slipping sentences like these into your next corporate-drivelspeak document?
The vendors have really bent over backwards to make introducing Coq as painless as possible.
We don't believe in shoving Coq down people's throats. In fact, we've found that once people have had a […]

Translating maru batsu into English

Japanese has a very handy shorthand for rating things:

symbol   pronunciation   meaning
◎        nijuu maru      excellent
○        maru            good
△        sankaku         fair
×        batsu           poor

I avoid using these symbols in my English translations. Even if a legend is included, I think they're too "foreign" to be easily understood by non-Japanese speakers.
The "translations" I use depend on the context. If the full range of symbols is used for ratings, I often […]
