<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.2" -->
<rss version="0.92">
<channel>
	<title>The GITS Blog</title>
	<link>http://ginstrom.com/scribbles</link>
	<description>Random scribbling about programming, translation, and Japan</description>
	<lastBuildDate>Sat, 17 May 2008 00:53:04 +0000</lastBuildDate>
	<docs>http://backend.userland.com/rss092</docs>
	<language>en</language>
	
	<item>
		<title>Counting words (etc.) in an HTML file with Python</title>
		<description>In a previous post, I wrote about how to count words, characters, and Asian characters using Python.

In this post I want to pull that together with code to get a word count from an HTML file.

What needs counting

What needs counting depends to some extent on what you need the word ...</description>
		<link>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/</link>
			</item>
	<item>
		<title>The invisible translator</title>
		<description>By the nature of our profession, translators are generally invisible when they're doing their jobs right.

I say "generally" because this isn't quite a universal truth. For example, unlike the United States, Japan is a country where a movie subtitle translator (and arguably not even a stellar one) can become ...</description>
		<link>http://ginstrom.com/scribbles/2008/05/15/the-invisible-translator/</link>
			</item>
	<item>
		<title>Speeding up search on Honyaku archive site</title>
		<description>Last summer, I launched a new archive site for the Honyaku mailing list.

The site is written in Python using the Django framework, with MySQL as the database. I chose MySQL because my tests showed that it was much faster than PostgreSQL at text searching.

Lately, however, the searches have been taking ...</description>
		<link>http://ginstrom.com/scribbles/2008/04/29/speeding-up-search-on-honyaku-archive-site/</link>
			</item>
	<item>
		<title>Do the math</title>
		<description>I don't like doing so-called "native checks," or proofing other translators' work in general. It just turns into a bad experience way too often.

I'm candid about this with clients. I tell them I prefer not to do that sort of work. Sometimes they ask anyway, and if they're ...</description>
		<link>http://ginstrom.com/scribbles/2008/03/23/do-the-math/</link>
			</item>
	<item>
		<title>What price elegance?</title>
		<description>In a recent post, I gave some code for counting the top n most frequent words in an arbitrary text file using itertools.groupby.

The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a ...</description>
		<link>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/</link>
			</item>
	<item>
		<title>Counting occurrences in a sequence with itertools.groupby</title>
		<description>itertools.groupby is a great tool for counting the number of occurrences of each item in a sequence.

Here are some examples from the interactive interpreter.

A list of numbers

&#62;&#62;&#62; # Create a random list of numbers
&#62;&#62;&#62; from random import random
&#62;&#62;&#62; numbers = [int(random() * 10) for x in range(20)]
&#62;&#62;&#62; numbers
[8, 0, 3, 2, 3, 9, ...</description>
		<link>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/</link>
			</item>
	<item>
		<title>Making the robot dance</title>
		<description>Some time around 1980, my elementary school classroom got a computer. While most of the other kids fooled around playing Hunt the Wumpus, my friend and I found the BASIC manual that came with the computer. We laboriously copied in the code to make a "robot" appear on the screen. ...</description>
		<link>http://ginstrom.com/scribbles/2008/03/13/making-the-robot-dance/</link>
			</item>
	<item>
		<title>Using chardet to convert arbitrary byte strings to Unicode</title>
		<description>chardet is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a BOM to pretty reliably turn them into Unicode.

Edit: Thanks to Kirit's comment below, I added code to check for UTF-32.


import chardet

def bytes2unicode(bytes, errors='replace'):
    """Convert a ...</description>
		<link>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/</link>
			</item>
	<item>
		<title>Delivering the bad news</title>
		<description>A few weeks ago, a translation agency I occasionally work for called me in a panic. It seems that a major client had rejected one of their Japanese-to-English translations, calling it "unreadable," and providing another translation as a sample of the quality they were after.

The agency wanted to pay me ...</description>
		<link>http://ginstrom.com/scribbles/2008/03/01/delivering-the-bad-news/</link>
			</item>
	<item>
		<title>Python GUI programming platforms for Windows</title>
		<description>[Edit]
By popular demand, I've added a section on PyGTK. See bottom of post.

There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to ...</description>
		<link>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/</link>
			</item>
	<item>
		<title>Possibly the best name for a programming language ever</title>
		<description>Coq.

I would love to see this adopted at the Enterprise level. Could you imagine slipping sentences like these into your next corporate-drivelspeak document?

The vendors have really bent over backwards to make introducing Coq as painless as possible.

We don't believe in shoving Coq down people's throats. In fact, we've found that ...</description>
		<link>http://ginstrom.com/scribbles/2008/02/22/possibly-the-best-name-for-a-programming-language-ever/</link>
			</item>
	<item>
		<title>Translating maru batsu into English</title>
		<description>Japanese has a very handy shorthand for rating things:


symbol   pronunciation   meaning

◎        nijuu maru      excellent
〇        maru            good
△        sankaku         fair
×        batsu           poor


I avoid using these symbols in my English translations. Even if a legend is included, I think they're too "foreign" to be easily understood by non-Japanese speakers.

The "translations" I use depend on the ...</description>
		<link>http://ginstrom.com/scribbles/2008/02/19/translating-maru-batsu-into-english/</link>
			</item>
	<item>
		<title>Intermediate Python: Pythonic file searches</title>
		<description>It's very easy to get up and running with Python, but programmers coming from other more verbose or procedural languages tend to write code that's not very pythonic -- that is, it doesn't use Python idioms that experienced programmers use.

The problems with un-pythonic code are that it tends to be ...</description>
		<link>http://ginstrom.com/scribbles/2008/02/14/intermediate-python-pythonic-file-searches/</link>
			</item>
	<item>
		<title>The partial rewrite</title>
		<description>I haven't been doing much blogging lately. Instead, I've been busy working and hacking. On the hacking side of things, I've been doing a partial rewrite of my big application, a translation-memory application written in C++.

The application is now about 10 years old, and over time and several releases it's ...</description>
		<link>http://ginstrom.com/scribbles/2008/02/13/the-partial-rewrite/</link>
			</item>
	<item>
		<title>Text speak conversion in Python!</title>
		<description>Want to write in txt spk like all the cool mobile-toting kids, but tired of figuring out which letters to leave out? No problem -- just run your text through the handy to_txt_spk function!


&#62;&#62;&#62; def to_txt_spk(words):
	return "".join(c for c in words if c not in "aeiou")

&#62;&#62;&#62; to_txt_spk("Impress your friends with ...</description>
		<link>http://ginstrom.com/scribbles/2008/02/07/text-speak-conversion-in-python/</link>
			</item>
	<item>
		<title>Version 0.2 of subdist module released</title>
		<description>Just a quick note that I've released version 0.2 of my subdist module.

What is subdist?

subdist is a C Python extension that calculates fuzzy substring matches, based on Levenshtein distance.

subdist works purely with Unicode strings; calling one of its functions with a non-Unicode string will raise an error.

What's new in version ...</description>
		<link>http://ginstrom.com/scribbles/2007/12/16/version-02-of-subdist-module-released/</link>
			</item>
	<item>
		<title>Aim high</title>
		<description>Alex Martelli has a great quote on optimization in Python in a Nutshell:

Start by designing, coding, and testing your application in Python, using available extension modules if they save you work. This takes much less time than it would with a classic compiled language. Then benchmark the application to find ...</description>
		<link>http://ginstrom.com/scribbles/2007/12/14/aim-high/</link>
			</item>
	<item>
		<title>The past, present, and future of optimization</title>
		<description>I have a relative ("Dan") who used to earn a living optimizing code in the late 70s and early 80s. Around then, a new-fangled high-level language named "C" was starting to catch on, but companies didn't like all the wasted cycles in C programs due to the under-optimized assembly code ...</description>
		<link>http://ginstrom.com/scribbles/2007/12/11/the-past-present-and-future-of-optimization/</link>
			</item>
	<item>
		<title>The machine-translation pipe dream</title>
		<description>An article in Sankei News about NEC putting machine translation onto mobile phones (Japanese) has created a bit of buzz on Honyaku (a mailing list for J&#60;&#62;E translators).

Every time some new development in the machine translation world comes out, translators start to worry about whether they're going to be ...</description>
		<link>http://ginstrom.com/scribbles/2007/12/06/the-machine-translation-pipe-dream/</link>
			</item>
	<item>
		<title>Extending Python with C: A case study</title>
		<description>Near-100x speedup with a C extension

I recently wrote about an algorithm for fuzzy matching of substrings implemented in Python. This is a feature that I needed for a piece of software I'm currently developing.

When I started using the fuzzy_substring function on some test cases, however, it was unacceptably slow. Using ...</description>
		<link>http://ginstrom.com/scribbles/2007/12/02/extending-python-with-c-a-case-study/</link>
			</item>
</channel>
</rss>
