Archive for the 'programming' Category

Counting words (etc.) in an HTML file with Python

In a previous post, I wrote about how to count words, characters, and Asian characters using Python.
In this post I want to pull that together with code to get a word count from an HTML file.
What needs counting
What needs counting depends to some extent on what you need the word count for, but here I'm [...]
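As a rough illustration of the idea, here is a minimal sketch (not the code from the post) that pulls the visible text out of an HTML string with the standard library's html.parser and counts whitespace-separated words. The names TextExtractor and count_html_words are made up for this example, the choice to skip script and style contents is an assumption, and the plain split() stands in for the counting code from the earlier post. It is written for Python 3.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}  # assumption: these never count as words

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_html_words(html):
    """Return the number of whitespace-separated words in the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    # Joining with a space treats every tag boundary as a word break.
    return len(" ".join(parser.chunks).split())
```

Because chunks are joined with a space, text such as `foo<b>bar</b>` counts as two words. Whether that is right depends on what the count is for.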

Speeding up search on Honyaku archive site

Last summer, I launched a new archive site for the Honyaku mailing list.
The site is written in Python using the Django framework, with MySQL as the database. I chose MySQL because my tests showed that it was much faster than PostgreSQL at text searching.
Lately, however, the searches have been taking a huge amount of time. [...]

What price elegance?

In a recent post, I gave some code for counting the top n most frequent words in an arbitrary text file using itertools.groupby.
The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a bit slower than an imperative [...]
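To make the comparison concrete, here is a sketch of the two styles side by side. It is not the code from either post. The function names are invented for this example, and ties are broken alphabetically so that both versions return the same result. One plausible reason for a speed gap is that the groupby version has to sort every word first, while the loop makes a single pass over the input.

```python
from itertools import groupby


def top_words_groupby(words, n):
    """Functional style: sort, group equal neighbours, then rank by count."""
    counts = (
        (word, sum(1 for _ in group))
        for word, group in groupby(sorted(words))
    )
    # Highest count first; alphabetical order breaks ties.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))[:n]


def top_words_loop(words, n):
    """Imperative style: tally into a dict in one pass, then rank by count."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:n]
```

Both functions take any iterable of words, for example the result of text.split().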

Counting occurrences in a sequence with itertools.groupby

itertools.groupby is a great tool for counting the number of occurrences of each item in a sequence.
Here are some examples from the interactive interpreter.
A list of numbers

>>> # Create a random list of numbers
>>> from random import random
>>> numbers = [int(random() * 10) for x in range(20)]
>>> numbers
[8, 0, 3, 2, 3, 9, 8, 2, 8, 3, 0, [...]
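One detail worth spelling out: groupby only merges adjacent equal items, so counting total occurrences requires sorting first. The sketch below uses invented helper names and is written for Python 3. It is an illustration, not the post's own code. It shows both behaviours.

```python
from itertools import groupby


def run_lengths(items):
    """Return (item, length) for each run of adjacent equal items."""
    return [(key, len(list(group))) for key, group in groupby(items)]


def count_occurrences(items):
    """Return (item, count) pairs; sorting makes equal items adjacent."""
    return run_lengths(sorted(items))
```

For the list [3, 1, 3, 3], run_lengths reports 3 twice because the two runs of 3 are separated by a 1, while count_occurrences reports each number once.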

Making the robot dance

Some time around 1980, my elementary school classroom got a computer. While most of the other kids fooled around playing Hunt the Wumpus, my friend and I found the BASIC manual that came with the computer. We laboriously copied in the code to make a "robot" appear on the screen. After a lot of typos, [...]

Using chardet to convert arbitrary byte strings to Unicode

chardet is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a BOM to pretty reliably turn them into Unicode.
Edit: Thanks to Kirit's comment below, I added code to check for UTF-32.

import chardet
def bytes2unicode(bytes, errors='replace'):
    """Convert a byte string into Unicode.
    First checks [...]
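The BOM check on its own can be sketched with the standard library's codecs constants. This is an illustration rather than the function above: decode_by_bom and the BOMS table are names made up for this example, and it is written for Python 3. The UTF-32 entries come first because the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.

```python
import codecs

# Longer BOMs first: the UTF-32 LE BOM starts with the UTF-16 LE BOM bytes.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_by_bom(data, errors="replace"):
    """Return decoded text if data starts with a known BOM, else None."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # Strip the BOM, then decode the rest with the matching codec.
            return data[len(bom):].decode(encoding, errors)
    return None
```

When this returns None, a caller would fall back to chardet's guess. chardet.detect returns a dict whose 'encoding' entry names the most likely encoding.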

Python GUI programming platforms for Windows

[Edit]
By popular demand, I've added a section on PyGTK. See bottom of post.
There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to the source.
Tkinter
Tkinter is the ubiquitous [...]

Intermediate Python: Pythonic file searches

It's very easy to get up and running with Python, but programmers coming from other more verbose or procedural languages tend to write code that's not very pythonic — that is, it doesn't use Python idioms that experienced programmers use.
The problems with un-pythonic code are that it tends to be more verbose, more difficult to [...]

The partial rewrite

I haven't been doing much blogging lately. Instead, I've been busy working and hacking. On the hacking side of things, I've been doing a partial rewrite of my big application, a translation-memory application written in C++.
The application is now about 10 years old, and over time and several releases it's grown more and more difficult [...]

Text speak conversion in Python!

Want to write in txt spk like all the cool mobile-toting kids, but tired of figuring out which letters to leave out? No problem — just run your text through the handy to_txt_spk function!

>>> def to_txt_spk(words):
...     return "".join(c for c in words if c not in "aeiou")
...
>>> to_txt_spk("Impress your friends with your text speak [...]
