<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>The GITS Blog</title>
	<link>http://ginstrom.com/scribbles</link>
	<description>Random scribbling about programming, translation, and Japan</description>
	<pubDate>Sat, 17 May 2008 00:53:04 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.2</generator>
	<language>en</language>
			<item>
		<title>Counting words (etc.) in an HTML file with Python</title>
		<link>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/</link>
		<comments>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/#comments</comments>
		<pubDate>Sat, 17 May 2008 00:50:38 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/</guid>
		<description><![CDATA[In a previous post, I wrote about how to count words, characters, and Asian characters using Python.
In this post I want to pull that together with code to get a word count from an HTML file.
What needs counting
What needs counting depends to some extent on what you need the word count for, but here I'm [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous post, I wrote about <a href="/scribbles/2007/10/06/counting-words-characters-and-asian-characters-with-python/">how to count words, characters, and Asian characters using Python</a>.</p>
<p>In this post I want to pull that together with code to get a word count from an HTML file.</p>
<h2>What needs counting</h2>
<p>What needs counting depends to some extent on what you need the word count for. Here I'm assuming that it will be used to measure billable/localizable content.</p>
<p>In that scenario, you've got to count the text in the title tag, as well as the visible text in the body, and certain other localizable content: <code>img</code> <code>alt</code> attributes, <code>a</code> <code>title</code> attributes, and <code>input</code> <code>value</code> attributes (am I missing any?).</p>
<h2>The Code</h2>
<p>The code for counting the text itself is in the post linked above. Here we need code to extract the text from the HTML file, and to accumulate the counts for all the chunks we've extracted.</p>
<p>Here's the Segment class for accumulating counts:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">class</span> Segment<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Represents a text segment.<br />
&nbsp; &nbsp; (For bookkeeping)<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, text=<span class="st0">&quot;&quot;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot; text is the segment of text we will calculate.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Leave it empty if this will be a master count for a document<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; @param text: The text of the segment<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">characters</span> = <span class="kw2">len</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; num_spaces = <span class="kw2">len</span><span class="br0">&#40;</span><span class="br0">&#91;</span>x <span class="kw1">for</span> x <span class="kw1">in</span> text <span class="kw1">if</span> x.<span class="me1">isspace</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">chars_no_spaces</span> = <span class="kw2">self</span>.<span class="me1">characters</span> - num_spaces<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">asian_chars</span> = <span class="kw2">len</span><span class="br0">&#40;</span><span class="br0">&#91;</span>x <span class="kw1">for</span> x <span class="kw1">in</span> text <span class="kw1">if</span> is_asian<span class="br0">&#40;</span>x<span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">non_asian_words</span> = non_j_len<span class="br0">&#40;</span>text<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">words</span> = <span class="kw2">self</span>.<span class="me1">non_asian_words</span> + <span class="kw2">self</span>.<span class="me1">asian_chars</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> accumulate<span class="br0">&#40;</span><span class="kw2">self</span>, seg<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Add the stats from &lt;seg&gt; to this one.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Use this to keep a count for the entire document;<br />
&nbsp; &nbsp; &nbsp; &nbsp; use another for the whole batch of documents<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; @param seg: The segment to accumulate<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg = Segment(u&quot;</span><span class="st0">&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg2 = Segment(u&quot;</span>abc<span class="st0">&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.accumulate(seg2)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.words<br />
&nbsp; &nbsp; &nbsp; &nbsp; 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.characters<br />
&nbsp; &nbsp; &nbsp; &nbsp; 3<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">words</span> += seg.<span class="me1">words</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">characters</span> += seg.<span class="me1">characters</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">chars_no_spaces</span> += seg.<span class="me1">chars_no_spaces</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">asian_chars</span> += seg.<span class="me1">asian_chars</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">non_asian_words</span> += seg.<span class="me1">non_asian_words</span></div>
<p>Next, the code for extracting (segmenting) the text from an HTML file. For this, you'll need <a href="http://www.crummy.com/software/BeautifulSoup/">the excellent Beautiful Soup module</a>.</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="co1">#coding: UTF8</span><br />
<span class="st0">&quot;&quot;</span><span class="st0">&quot;Html segmenter&quot;</span><span class="st0">&quot;&quot;</span></p>
<p><span class="kw1">from</span> BeautifulSoup <span class="kw1">import</span> BeautifulSoup as bsoup<br />
<span class="kw1">from</span> BeautifulSoup <span class="kw1">import</span> BeautifulStoneSoup<br />
<span class="kw1">import</span> <span class="kw3">re</span></p>
<p><span class="kw1">def</span> normalize<span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Normalize whitespace in C{text}.<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &gt;&gt;&gt; normalize(u&quot;</span> &nbsp; spam\\n\\tspam &nbsp; SPAM<span class="st0">&quot;)<br />
&nbsp; &nbsp; u'spam spam SPAM'<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> u<span class="st0">' '</span>.<span class="me1">join</span><span class="br0">&#40;</span>text.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">class</span> Segmenter<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Html segmenter<br />
&nbsp; &nbsp; Retrieves the editable/translatable text from an HTML document.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Set up various regular expressions for splitting the text&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">pre_parse_stripper</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">&quot;|&quot;</span>.<span class="me1">join</span><span class="br0">&#40;</span><span class="br0">&#91;</span>u<span class="st0">&quot;&lt;body[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/body&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;a[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/a&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;img[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/img&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;input[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/input&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;script*?&gt;[<span class="es0">\s</span><span class="es0">\S</span>]*?&lt;/script&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;form[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/form&gt;&quot;</span><span class="br0">&#93;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Strip out unsightly tags before heading to the splitter&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">splitter</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">'|'</span>.<span class="me1">join</span><span class="br0">&#40;</span><span class="br0">&#91;</span>u<span class="st0">&quot;&lt;p(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/p&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;div(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/div&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;td(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/td&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;li(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/li&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;h<span class="es0">\d</span>(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/h<span class="es0">\d</span>&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;dd(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/dd&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;dt(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/dt&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;br<span class="es0">\s</span>*/?&gt;&quot;</span><span class="br0">&#93;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Split segments by certain tags (removing tags in bargain)<br />
&nbsp; &nbsp; &nbsp; &nbsp; These tags indicate a segment boundary&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">charset_finder</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">'[<span class="es0">\s</span><span class="es0">\S</span>]*&lt;meta[<span class="es0">\s</span><span class="es0">\S</span>]*?charset<span class="es0">\s</span>*=<span class="es0">\s</span>*([<span class="es0">\S</span>]+)&quot;[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;[<span class="es0">\s</span><span class="es0">\S</span>]*'</span>, <span class="kw3">re</span>.<span class="me1">I</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Find the charset if necessary&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">soup</span> = <span class="kw2">None</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__str__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;So we can tell which segger we have (assuming multiple segmenter classes)&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="st0">&quot;HTML&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> get_chunks<span class="br0">&#40;</span><span class="kw2">self</span>, html_text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Extract the text from the HTML file&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">soup</span> = bsoup<span class="br0">&#40;</span>html_text, fromEncoding=<span class="kw2">self</span>.<span class="me1">getEncoding</span><span class="br0">&#40;</span>html_text<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># document title</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">head</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; title = <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">head</span>.<span class="me1">title</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> title:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> title.<span class="kw3">string</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># image alt attributes, anchor title attributes, input value attributes</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> tag, attr <span class="kw1">in</span> <span class="br0">&#40;</span><span class="br0">&#40;</span>u<span class="st0">&quot;img&quot;</span>, u<span class="st0">&quot;alt&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>u<span class="st0">&quot;a&quot;</span>, u<span class="st0">&quot;title&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>u<span class="st0">&quot;input&quot;</span>, u<span class="st0">&quot;value&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> <span class="kw2">self</span>.<span class="me1">getAttributes</span><span class="br0">&#40;</span>tag, attr<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw3">chunk</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> <span class="kw3">chunk</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># Parse the body text</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">body</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; text = <span class="kw2">self</span>.<span class="me1">pre_parse_stripper</span>.<span class="me1">sub</span><span class="br0">&#40;</span>u<span class="st0">&quot;&quot;</span>, <span class="kw2">unicode</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">body</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> <span class="kw2">self</span>.<span class="me1">splitter</span>.<span class="me1">split</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; normal = normalize<span class="br0">&#40;</span>html2plain<span class="br0">&#40;</span><span class="kw3">chunk</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> normal:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> normal<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> getAttributes<span class="br0">&#40;</span><span class="kw2">self</span>, tagName, attrName<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get all attrName values for tagName tags&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; attrs = <span class="br0">&#91;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; tags = <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">findAll</span><span class="br0">&#40;</span>tagName<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> tag <span class="kw1">in</span> tags:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">try</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; attr = tag<span class="br0">&#91;</span>attrName<span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> attr:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; attrs.<span class="me1">append</span><span class="br0">&#40;</span>attr<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">except</span> <span class="kw2">KeyError</span>, e:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">#print &quot;Tag %s does not have attribute %s&quot; % (tagName, attrName)</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">pass</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> attrs<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> getEncoding<span class="br0">&#40;</span><span class="kw2">self</span>, text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Retrieve the encoding META tag, if present&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; m = <span class="kw2">self</span>.<span class="me1">charset_finder</span>.<span class="me1">match</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> m:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> m.<span class="me1">groups</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">None</span></p>
<p>
TAG_STRIPPER = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">&quot;&lt;[!<span class="es0">\w</span>/][<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;&quot;</span>, <span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> strip_tags<span class="br0">&#40;</span>line<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;strip the HTML tags from the line<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &gt;&gt;&gt; strip_tags(u&quot;</span>&lt;b&gt;spam&lt;/b&gt;<span class="st0">&quot;)<br />
&nbsp; &nbsp; u'spam'<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> TAG_STRIPPER.<span class="me1">sub</span><span class="br0">&#40;</span>u<span class="st0">&quot;&quot;</span>, line<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> html2plain<span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Strips out tags from HTML text<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &gt;&gt;&gt; html2plain('spam&amp;nbsp;&lt;b&gt;eggs&lt;/b&gt;')<br />
&nbsp; &nbsp; u'spam<span class="es0">\\</span>xa0eggs'<br />
&nbsp; &nbsp; &gt;&gt;&gt; html2plain('--&gt;')<br />
&nbsp; &nbsp; u'--&gt;'<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; entities = BeautifulStoneSoup.<span class="me1">HTML_ENTITIES</span><br />
&nbsp; &nbsp; text = <span class="kw2">unicode</span><span class="br0">&#40;</span>BeautifulStoneSoup<span class="br0">&#40;</span>strip_tags<span class="br0">&#40;</span>text<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; convertEntities=entities<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> text.<span class="me1">replace</span><span class="br0">&#40;</span>u<span class="st0">&quot;&amp;gt;&quot;</span>, <span class="st0">&quot;&gt;&quot;</span><span class="br0">&#41;</span>.<span class="me1">replace</span><span class="br0">&#40;</span>u<span class="st0">&quot;&amp;lt;&quot;</span>, <span class="st0">&quot;&lt;&quot;</span><span class="br0">&#41;</span></div>
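<p>The segmenter above targets Python 2 and the Beautiful Soup 3 import style. If you want to try the same idea on a current Python 3 without any third-party modules, something along these lines might do as a starting point. It is only a sketch: <code>ChunkExtractor</code> and this standalone <code>get_chunks</code> are names I made up for illustration, it swaps the regex splitting for the standard library's <code>html.parser</code>, and it makes no attempt at encoding detection.</p>

```python
from html.parser import HTMLParser

# Attributes with localizable text, keyed by tag (same list as in the post).
LOCALIZABLE_ATTRS = {"img": "alt", "a": "title", "input": "value"}
# Tags treated as segment boundaries (the post's splitter tags, plus title).
BLOCK_TAGS = {"title", "p", "div", "td", "li", "dd", "dt", "br",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose content is never counted.
SKIP_TAGS = {"script", "style"}


class ChunkExtractor(HTMLParser):
    """Hypothetical helper: collects countable text chunks from HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._buffer = []
        self._skip_depth = 0

    def flush(self):
        # Normalize whitespace, then keep the chunk only if it is non-empty.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.chunks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        if tag in BLOCK_TAGS:
            self.flush()
        wanted = LOCALIZABLE_ATTRS.get(tag)
        if wanted:
            value = dict(attrs).get(wanted)
            if value:
                self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._buffer.append(data)


def get_chunks(html_text):
    """Return the title, localizable attributes, and visible text chunks."""
    parser = ChunkExtractor()
    parser.feed(html_text)
    parser.close()
    parser.flush()  # pick up any text left after the last block tag
    return parser.chunks
```

<p>Attribute values are emitted as soon as their tag is seen, so the order of chunks can differ from the Beautiful Soup version. That makes no difference to the totals.</p>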
<p>And here's some code to get the actual wordcount:</p>
<div class="dean_ch" style="white-space: nowrap;">
&nbsp; &nbsp; wordcount = docstats.<span class="me1">Segment</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; segger = htmlseg.<span class="me1">Segmenter</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> segger.<span class="me1">get_chunks</span><span class="br0">&#40;</span><span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;thefile.html&quot;</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; wordcount.<span class="me1">accumulate</span><span class="br0">&#40;</span>docstats.<span class="me1">Segment</span><span class="br0">&#40;</span><span class="kw3">chunk</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
<p>Here are the <a href="/code/html_wordcount.tar.gz">docstats and htmlseg modules</a>, and here is an <a href="http://felix-cat.com/tools/wordcount/">online tool using the code for the HTML word counts</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The invisible translator</title>
		<link>http://ginstrom.com/scribbles/2008/05/15/the-invisible-translator/</link>
		<comments>http://ginstrom.com/scribbles/2008/05/15/the-invisible-translator/#comments</comments>
		<pubDate>Thu, 15 May 2008 03:29:03 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/05/15/the-invisible-translator/</guid>
		<description><![CDATA[By the nature of our profession, translators are generally invisible when they're doing their jobs right.
I say "generally" because this isn't quite a universal truth. For example, unlike in the United States, Japan is a country where a movie subtitle translator (and arguably not even a stellar one) can become a television celebrity. But that's [...]]]></description>
			<content:encoded><![CDATA[<p>By the nature of our profession, translators are generally invisible when they're doing their jobs right.</p>
<p>I say "generally" because this isn't quite a universal truth. For example, unlike in the United States, Japan is a country where a <a href="http://en.wikipedia.org/wiki/Natsuko_Toda">movie subtitle translator</a> (and arguably <a href="http://search.japantimes.co.jp/member/member.html?fd20030112tc.htm">not even a stellar one</a>) can become a television celebrity. But that's a post for a different time (and a different blogger!).</p>
<p>So let's assume that a translator who has done her job is effectively invisible &#8212; there is no awareness of her presence between the author and reader. The implication is that despite the ubiquity of translation, the general public has a poor awareness of it.</p>
<p>This makes customer education pretty difficult, to say the least. Yes, we can sometimes find savvy consumers of translation, but most of the time they're in turn beholden to an ignorant consumer of translation down the line.</p>
<p>It also makes marketing translation as a profession a losing proposition.</p>
<p>Not to be a pessimist, but I don't see the market changing any time in the near future. Translation has been around a long time, and attitudes haven't really changed. The best we can do as translators is insulate ourselves from the ignorant consumers by surrounding ourselves with the savvy ones, while slowly, slowly educating our existing customers and culling out the bad ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/05/15/the-invisible-translator/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Speeding up search on Honyaku archive site</title>
		<link>http://ginstrom.com/scribbles/2008/04/29/speeding-up-search-on-honyaku-archive-site/</link>
		<comments>http://ginstrom.com/scribbles/2008/04/29/speeding-up-search-on-honyaku-archive-site/#comments</comments>
		<pubDate>Tue, 29 Apr 2008 11:28:48 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/04/29/speeding-up-search-on-honyaku-archive-site/</guid>
		<description><![CDATA[Last summer, I launched a new archive site for the Honyaku mailing list.
The site is written in Python using the Django framework, with MySQL as the database. I chose MySQL because my tests showed that it was much faster than PostgreSQL at text searching.
Lately, however, the searches have been taking a huge amount of time. [...]]]></description>
			<content:encoded><![CDATA[<p>Last summer, I launched a <a href="http://honyaku-archive.org/">new archive site</a> for the <a href="http://groups.google.com/group/honyaku">Honyaku mailing list</a>.</p>
<p>The site is written in Python using the <a href="http://www.djangoproject.com/">Django framework</a>, with MySQL as the database. I chose MySQL because my tests showed that it was much faster than PostgreSQL at text searching.</p>
<p>Lately, however, the searches have been taking a huge amount of time. Sometimes they would even time out. It makes sense, since I've got more than 216,000 emails in there now, and <code>body__icontains</code> isn't exactly a speed demon.</p>
<p>But it was also taking forever just to get the posts for a given day. That was pretty easy to solve, though: duh, create an index on the <code>date_sent</code> field. So simple I never thought about it until the system was bogging down like a <a href="http://www.daylife.com/photo/02V857Ddoe3Gu">Golden Week traffic jam</a>.</p>
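<p>To see the effect of an index like that without touching a production database, here is a small self-contained sketch. Note the assumptions: it uses Python's built-in <code>sqlite3</code> as a stand-in for MySQL, and the <code>email</code> table and <code>idx_email_date_sent</code> index are made-up names, not the archive's real schema.</p>

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email (id INTEGER PRIMARY KEY, date_sent TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO email (date_sent, body) VALUES (?, ?)",
    [("2008-04-%02d" % (i % 28 + 1), "message %d" % i) for i in range(1000)],
)

# The one-line fix: index the column used to look up posts by day.
conn.execute("CREATE INDEX idx_email_date_sent ON email (date_sent)")


def query_plan(sql, params=()):
    """Return SQLite's query plan for the statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(str(col) for row in rows for col in row)


plan = query_plan(
    "SELECT id, body FROM email WHERE date_sent = ?", ("2008-04-15",)
)
# The plan should mention the index instead of a full table scan.
print(plan)
```

<p>MySQL has its own <code>EXPLAIN</code>, so the same sanity check is possible there.</p>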
<p>That solved the date problem, but my text search problem remained. In the end, I had to create a full-text index for the <a href="http://honyaku-archive.org/search/">simple search</a>. This solved the speed problem &#8212; queries take a second or two now &#8212; but the problem with MySQL's full-text index is that it has lousy support for Japanese text (which isn't delimited by spaces). For that reason, I kept the old, slow search method for the <a href="http://honyaku-archive.org/advanced-search/">advanced search</a>. If you use that, I recommend narrowing the search rather than just entering some body text.</p>
<p>In the end, I'm going to have to bite the bullet and install some kind of n-gram indexing scheme that will support Japanese. Right now, though, I simply don't have the time.</p>
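<p>To make the n-gram idea concrete, here is a minimal sketch of character-bigram indexing, which does not depend on spaces between words. It is an illustration only, not the scheme the archive uses: it assumes Python 3, and <code>char_ngrams</code>, <code>build_index</code>, and <code>search</code> are hypothetical names.</p>

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs, n=2):
    """Map each n-gram to the set of document ids that contain it.

    docs is a hypothetical {doc_id: body_text} dictionary.
    """
    index = {}
    for doc_id, body in docs.items():
        for gram in char_ngrams(body, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents that contain every n-gram of the query.

    These are candidates only: a real search would still confirm
    that the n-grams appear contiguously in each document.
    """
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result
```

<p>Because the index is keyed on character pairs rather than space-delimited words, a Japanese query finds its matches without any word segmentation.</p>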
<p>As a stopgap measure, I added a <a href="http://www.google.com/coop/cse?cx=001297244641614827125%3Ashsg2vj5xwk">Google search for the Honyaku archive</a>. Google doesn't seem to have indexed the site yet (I just took the main archive out of the robots.txt file), but when it does it'll be a quick way to search with good Japanese support. They even have a gadget that I can put on the Honyaku archive site, but I can't get it to keep the height I set for it.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/04/29/speeding-up-search-on-honyaku-archive-site/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Do the math</title>
		<link>http://ginstrom.com/scribbles/2008/03/23/do-the-math/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/23/do-the-math/#comments</comments>
		<pubDate>Sat, 22 Mar 2008 15:01:54 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/23/do-the-math/</guid>
		<description><![CDATA[I don't like doing so-called "native checks," or proofing other translators' work in general.  It just turns into a bad experience way too often. 
I'm candid about this with clients. I tell them I prefer not to do that sort of work. Sometimes they ask anyway, and if they're good clients (i.e. they send [...]]]></description>
			<content:encoded><![CDATA[<p>I don't like doing so-called "<a href="/scribbles/2007/09/04/why-i-hate-doing-native-checks/">native checks</a>," or proofing other translators' work in general.  It just turns into a bad experience way too often. </p>
<p>I'm candid about this with clients. I tell them I prefer not to do that sort of work. Sometimes they ask anyway, and if they're good clients (i.e. they send me lots of the kind of work that I like), then sometimes I'm not too busy to do it.</p>
<p>The other day, a client I occasionally work for (the one that was having <a href="/scribbles/2008/03/01/delivering-the-bad-news/">quality issues with another translator</a>) called me and asked me to proof their new translator's work.</p>
<p>Feeling vaguely sorry for them because of their lousy past translations, I said I'd do it. They asked me to quote a rate for doing this proofing work, and I quoted roughly one third my translation rate, softy that I am (a lot of people charge half).</p>
<p>Now, this client obviously knows what I charge them for translation. From my turnaround, they should also have a decent idea of what I earn per day. And they know that I don't like proofing. So when I gave them my rate, did they thank me for the great deal I was cutting them? No. Instead, I got the sucking of teeth and a request to work for an hourly rate instead of the per-word rate I quoted, which would amount to less than a third of what I can make translating.</p>
<p>Let's see: translate for some other client doing work I enjoy, or work for less than a third of the money doing something I hate. Tough call!</p>
<p>Come to think of it, the reason for this company's quality problems has now become a bit clearer.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/23/do-the-math/feed/</wfw:commentRss>
		</item>
		<item>
		<title>What price elegance?</title>
		<link>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/#comments</comments>
		<pubDate>Fri, 21 Mar 2008 04:03:03 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/21/what-price-elegance/</guid>
		<description><![CDATA[In a recent post, I gave some code for counting the top n most frequent words in an arbitrary text file using itertools.groupby.
The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a bit slower than an imperative [...]]]></description>
			<content:encoded><![CDATA[<p><a href="/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/">In a recent post</a>, I gave some code for counting the top n most frequent words in an arbitrary text file using <a href="http://docs.python.org/lib/itertools-functions.html#l2h-1064">itertools.groupby</a>.</p>
<p>The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a bit slower than an imperative style using <a href="http://docs.python.org/lib/defaultdict-objects.html">collections.defaultdict</a>.</p>
<p>Here are the two functions:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
<span class="kw1">from</span> <span class="kw3">collections</span> <span class="kw1">import</span> defaultdict</p>
<p><span class="kw1">def</span> get_top_freqs_gb<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using itertools.groupby<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> get_top_freqs_dd<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using collections.defaultdict<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freq_dict = defaultdict<span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> word <span class="kw1">in</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; freq_dict<span class="br0">&#91;</span>word<span class="br0">&#93;</span> += <span class="nu0">1</span><br />
&nbsp; &nbsp; freqs =<span class="br0">&#91;</span><span class="br0">&#40;</span>v, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, v <span class="kw1">in</span> freq_dict.<span class="me1">iteritems</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></div>
<p>Here are the helper functions:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">re</span></p>
<p><span class="kw1">def</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the words from filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; split = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>r<span class="st0">&quot;<span class="es0">\b</span><span class="es0">\w</span>+<span class="es0">\b</span>&quot;</span><span class="br0">&#41;</span>.<span class="me1">findall</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span>word<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> line <span class="kw1">in</span> <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> word <span class="kw1">in</span> split<span class="br0">&#40;</span>line.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></div>
<p>The groupby version is shorter than the defaultdict version, and I'd say that it's simpler and more readable as well. Because it's shorter, the groupby version is less likely to contain bugs. In particular, the defaultdict version has a mutable local variable (used as an accumulator in the for loop), which is a classic source of bugs. The groupby version is also likely to be easier to maintain because it's shorter and simpler.</p>
<p>But the defaultdict version of the function winds up being considerably faster.</p>
<p>The times it took to run these functions 10 times on my computer, retrieving the top 50 most frequent words for "/python25/readme.txt", are as follows (seconds rounded to 4 decimal places).</p>
<table>
<tr>
<th>&nbsp;</th>
<th>Without psyco</th>
<th>With psyco</th>
</tr>
<tr>
<th align="left">groupby version</th>
<td align="center"><font color="red">0.3133 s</font></td>
<td align="center"><font color="red">0.2193 s</font></td>
</tr>
<tr>
<th align="left">defaultdict version</th>
<td align="center"><font color="green">0.2852 s</font></td>
<td align="center"><font color="green">0.1818 s</font></td>
</tr>
<tr>
<th align="left">groupby / defaultdict</th>
<td align="center">1.41</td>
<td align="center">1.58</td>
</tr>
</table>
<p>The defaultdict version is 1.4x faster than the groupby version. This gap grows even further when psyco is used, making the defaultdict version nearly 1.6x as fast. I'd say that most of the reason for the slowness is that the groupby version of the function performs two sorts, compared to one sort in the defaultdict version.</p>
<p>(The psyco speedup for the defaultdict version comes from the for loop; changing <code>get_words</code> to return a generator expression eliminates the speedup. The speedup for the groupby version comes from the <code>freqs</code> <a href="http://docs.python.org/tut/node7.html#SECTION007140000000000000000">list comprehension</a>; changing this to a generator expression eliminates its speedup.)</p>
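<p>For comparison, here is a sketch of a third approach that avoids sorting the full list of frequencies. It assumes Python 3 and uses <code>collections.Counter</code>, which was added to the standard library after this post was written, together with <code>heapq.nlargest</code>. The name <code>top_freqs</code> is hypothetical, and no timings are claimed for it here.</p>

```python
import heapq
from collections import Counter


def top_freqs(words, num):
    """Return the num most frequent words as (word, freq) tuples.

    Ties are broken by word in descending order, which matches the
    reversed(sorted(freqs)) ordering used by get_top.
    """
    counts = Counter(words)
    # nlargest picks the top num pairs without sorting every pair
    return heapq.nlargest(num, counts.items(),
                          key=lambda pair: (pair[1], pair[0]))
```

<p>Calling <code>top_freqs(get_words(filename), 50)</code> would stand in for either function above, provided <code>get_words</code> is available.</p>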
<h3>So which one should I use?</h3>
<p>It's pretty common for Python code written in a functional style to be slower than equivalent code written in an imperative style. Nevertheless, I tend to prefer the more functional style of programming, switching to a more imperative style (or <a href="/scribbles/2007/12/02/extending-python-with-c-a-case-study/">other forms of optimization</a>) if performance isn't satisfactory.</p>
<blockquote><p>It is easier to optimize correct code, than correct optimized code.
</p></blockquote>
<p align="right"><em>&#8211;Yves Deville</em></p>
<p>A big question here is how to tell if the functional version is fast enough. My general rule of thumb is that the user would be prepared to wait up to two seconds for a typical "grovel through these files and tell me something interesting" command that's performed infrequently (how frequently do you need to get word frequencies from files?). For a more common action, the wait time should be under a second, with &lt; 0.5 seconds being optimal (this includes GUI responsiveness but not Web page loading).</p>
<p>Given the times above, and assuming that the user will search no more than 50 files of sizes comparable to <a href="http://svn.python.org/view/python/branches/release25-maint/README?rev=59483">Python's README file</a>, then either version of the function is sufficient. If we assume that the user will search up to 100 files, or files substantially larger than the README file, then only the imperative version is acceptable (and we may need to optimize this further if our demands are higher than this).</p>
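<p>The estimate above can be checked with a little arithmetic. The sketch below assumes Python 3, takes the 10-run times from the table, and assumes that the time scales linearly with the number of README-sized files. <code>estimated_seconds</code> is a hypothetical helper, and the two-second budget is the rule of thumb for an infrequent command.</p>

```python
# Times for 10 runs, taken from the table above (seconds)
TIMES_10_RUNS = {
    ("groupby", "no psyco"): 0.3133,
    ("groupby", "psyco"): 0.2193,
    ("defaultdict", "no psyco"): 0.2852,
    ("defaultdict", "psyco"): 0.1818,
}
BUDGET = 2.0  # seconds; rule-of-thumb wait for an infrequent command


def estimated_seconds(total_for_10_runs, num_files):
    """Scale the measured time linearly to num_files README-sized files."""
    return total_for_10_runs / 10 * num_files


for (version, mode), total in sorted(TIMES_10_RUNS.items()):
    for num_files in (50, 100):
        est = estimated_seconds(total, num_files)
        verdict = "ok" if est <= BUDGET else "too slow"
        print("%-11s %-8s %3d files: %.2f s (%s)"
              % (version, mode, num_files, est, verdict))
```

<p>Under these assumptions, every combination fits the budget at 50 files, while at 100 files only the defaultdict version with psyco stays under two seconds.</p>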
<p>That's why it's so important to profile and test Python programs from the very beginning. I keep a suite of test cases that I profile with every build (performed at least daily), noting trends in performance and optimizing when the code has gelled and bottlenecks remain.</p>
<p>Here is the test code:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> <span class="kw3">time</span> <span class="kw1">import</span> clock</p>
<p><span class="kw1">def</span> time_func<span class="br0">&#40;</span>func, iterations, *args, **kwargs<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Return the time it takes to execute func<br />
&nbsp; &nbsp; iterations times.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; start = clock<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span>iterations<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func<span class="br0">&#40;</span>*args, **kwargs<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> clock<span class="br0">&#40;</span><span class="br0">&#41;</span> - start</p>
<p><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; filename = <span class="st0">&quot;/python25/readme.txt&quot;</span><br />
&nbsp; &nbsp; top_gb = get_top_freqs_gb<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; top_dd = get_top_freqs_dd<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">assert</span> top_gb == top_dd<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;With psyco&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">import</span> psyco<br />
&nbsp; &nbsp; psyco.<span class="me1">full</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; main<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
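<p>The test code above is Python 2. On Python 3, <code>time.clock</code> is no longer available (it was removed in Python 3.8), so a variant of <code>time_func</code> could lean on the standard <code>timeit</code> module instead. This is a sketch under that assumption, not a drop-in replacement for the whole script:</p>

```python
import timeit


def time_func(func, iterations, *args, **kwargs):
    """Return the seconds it takes to call func iterations times."""
    # timeit.timeit accepts a callable and runs it `number` times
    return timeit.timeit(lambda: func(*args, **kwargs), number=iterations)
```

<p>It is called the same way as the original, for example <code>time_func(get_top_freqs_dd, 10, filename, 50)</code>.</p>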
<p>The whole shebang:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="co1">#coding: UTF8</span><br />
<span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
Testing functional programming stuff<br />
&quot;</span><span class="st0">&quot;&quot;</span></p>
<p><span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
<span class="kw1">from</span> <span class="kw3">collections</span> <span class="kw1">import</span> defaultdict<br />
<span class="kw1">import</span> <span class="kw3">re</span><br />
<span class="kw1">from</span> <span class="kw3">time</span> <span class="kw1">import</span> clock</p>
<p><span class="kw1">def</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the words from filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; split = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>r<span class="st0">&quot;<span class="es0">\b</span><span class="es0">\w</span>+<span class="es0">\b</span>&quot;</span><span class="br0">&#41;</span>.<span class="me1">findall</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span>word<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> line <span class="kw1">in</span> <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> word <span class="kw1">in</span> split<span class="br0">&#40;</span>line.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top_freqs_gb<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using itertools.groupby<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> get_top_freqs_dd<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using collections.defaultdict<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freq_dict = defaultdict<span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> word <span class="kw1">in</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; freq_dict<span class="br0">&#91;</span>word<span class="br0">&#93;</span> += <span class="nu0">1</span><br />
&nbsp; &nbsp; freqs =<span class="br0">&#91;</span><span class="br0">&#40;</span>v, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, v <span class="kw1">in</span> freq_dict.<span class="me1">iteritems</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> time_func<span class="br0">&#40;</span>func, iterations, *args, **kwargs<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Return the time it takes to execute func<br />
&nbsp; &nbsp; iterations times.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; start = clock<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span>iterations<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func<span class="br0">&#40;</span>*args, **kwargs<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> clock<span class="br0">&#40;</span><span class="br0">&#41;</span> - start</p>
<p><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; filename = <span class="st0">&quot;/python25/readme.txt&quot;</span><br />
&nbsp; &nbsp; top_gb = get_top_freqs_gb<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; top_dd = get_top_freqs_dd<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">assert</span> top_gb == top_dd<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;With psyco&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">import</span> psyco<br />
&nbsp; &nbsp; psyco.<span class="me1">full</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; main<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Counting occurrences in a sequence with itertools.groupby</title>
		<link>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 05:29:38 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/</guid>
		<description><![CDATA[itertools.groupby is a great tool for counting the number of occurrences of each item in a sequence.
Here are some examples from the interactive interpreter.
A list of numbers

&#62;&#62;&#62; # Create a random list of numbers
&#62;&#62;&#62; from random import random
&#62;&#62;&#62; numbers = &#91;int&#40;random&#40;&#41; * 10&#41; for x in range&#40;20&#41;&#93;
&#62;&#62;&#62; numbers
&#91;8, 0, 3, 2, 3, 9, 8, 2, 8, 3, 0, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://docs.python.org/lib/itertools-functions.html#l2h-1064">itertools.groupby</a> is a great tool for counting the number of occurrences of each item in a sequence.</p>
<p>Here are some examples from the interactive interpreter.</p>
<h3>A list of numbers</h3>
<div class="dean_ch" style="white-space: nowrap;">
&gt;&gt;&gt; <span class="co1"># Create a random list of numbers</span><br />
&gt;&gt;&gt; <span class="kw1">from</span> <span class="kw3">random</span> <span class="kw1">import</span> <span class="kw3">random</span><br />
&gt;&gt;&gt; numbers = <span class="br0">&#91;</span><span class="kw2">int</span><span class="br0">&#40;</span><span class="kw3">random</span><span class="br0">&#40;</span><span class="br0">&#41;</span> * <span class="nu0">10</span><span class="br0">&#41;</span> <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">range</span><span class="br0">&#40;</span><span class="nu0">20</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; numbers<br />
<span class="br0">&#91;</span><span class="nu0">8</span>, <span class="nu0">0</span>, <span class="nu0">3</span>, <span class="nu0">2</span>, <span class="nu0">3</span>, <span class="nu0">9</span>, <span class="nu0">8</span>, <span class="nu0">2</span>, <span class="nu0">8</span>, <span class="nu0">3</span>, <span class="nu0">0</span>, <span class="nu0">2</span>, <span class="nu0">3</span>, <span class="nu0">8</span>, <span class="nu0">6</span>, <span class="nu0">5</span>, <span class="nu0">3</span>, <span class="nu0">6</span>, <span class="nu0">1</span>, <span class="nu0">8</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># Now create a dictionary of numbers and numbers </span><br />
&gt;&gt;&gt; <span class="co1"># of occurrences. Feed generator expression of </span><br />
&gt;&gt;&gt; <span class="co1"># (number, frequency) pairs to dict().</span><br />
&gt;&gt;&gt; <span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
&gt;&gt;&gt; valdict = <span class="kw2">dict</span><span class="br0">&#40;</span><span class="br0">&#40;</span>k, <span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>numbers<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; <span class="kw1">for</span> key, val <span class="kw1">in</span> valdict.<span class="me1">items</span><span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">print</span> key, <span class="st0">&quot;:&quot;</span>, val</p>
<p>&nbsp; &nbsp; <br />
<span class="nu0">0</span> : <span class="nu0">2</span><br />
<span class="nu0">1</span> : <span class="nu0">1</span><br />
<span class="nu0">2</span> : <span class="nu0">3</span><br />
<span class="nu0">3</span> : <span class="nu0">5</span><br />
<span class="nu0">5</span> : <span class="nu0">1</span><br />
<span class="nu0">6</span> : <span class="nu0">2</span><br />
<span class="nu0">8</span> : <span class="nu0">5</span><br />
<span class="nu0">9</span> : <span class="nu0">1</span></div>
<p>And a function that does this for any iterable:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby</p>
<p><span class="kw1">def</span> count_occurrences<span class="br0">&#40;</span>iterable<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;return a dictionary with items and numbers of occurrences<br />
&nbsp; &nbsp; in iterable&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">dict</span><span class="br0">&#40;</span><span class="br0">&#40;</span>item, <span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>group<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> item, group<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>iterable<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
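<p>The call to <code>sorted</code> matters, because <code>groupby</code> only groups items that are adjacent. A small sketch (assuming Python 3, with a made-up list) shows the difference:</p>

```python
from itertools import groupby

numbers = [8, 0, 3, 8, 3]

# Without sorting, equal items that are not adjacent form separate groups
unsorted_groups = [(k, len(list(g))) for k, g in groupby(numbers)]

# With sorting, each distinct item forms exactly one group
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(numbers))]

print(unsorted_groups)
print(sorted_groups)
```

<p>Feeding the unsorted groups to <code>dict()</code> would keep only the last count seen for each item, so the totals would come out too low.</p>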
<h3>Top 20 most frequent words in a file</h3>
<div class="dean_ch" style="white-space: nowrap;">
&gt;&gt;&gt; <span class="co1"># get a wordlist from the Python README</span><br />
&gt;&gt;&gt; text = <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;/python25/readme.txt&quot;</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; words = text.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; words<span class="br0">&#91;</span>:<span class="nu0">5</span><span class="br0">&#93;</span><br />
<span class="br0">&#91;</span><span class="st0">'this'</span>, <span class="st0">'is'</span>, <span class="st0">'python'</span>, <span class="st0">'version'</span>, <span class="st0">'2.5.2'</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># get the frequency list, using DSU to sort top words</span><br />
&gt;&gt;&gt; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp;<span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>words<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># sort the freqs, get last 20, and reverse </span><br />
&gt;&gt;&gt; <span class="co1"># to put most frequent first</span><br />
&gt;&gt;&gt; <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">-20</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s %s&quot;</span> % <span class="br0">&#40;</span>b.<span class="me1">ljust</span><span class="br0">&#40;</span><span class="nu0">7</span><span class="br0">&#41;</span>, <span class="kw2">str</span><span class="br0">&#40;</span>a<span class="br0">&#41;</span>.<span class="me1">rjust</span><span class="br0">&#40;</span><span class="nu0">3</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <br />
the &nbsp; &nbsp; <span class="nu0">442</span><br />
to &nbsp; &nbsp; &nbsp;<span class="nu0">227</span><br />
<span class="kw1">is</span> &nbsp; &nbsp; &nbsp;<span class="nu0">127</span><br />
<span class="kw1">and</span> &nbsp; &nbsp; <span class="nu0">127</span><br />
you &nbsp; &nbsp; <span class="nu0">118</span><br />
a &nbsp; &nbsp; &nbsp; <span class="nu0">117</span><br />
of &nbsp; &nbsp; &nbsp;<span class="nu0">110</span><br />
<span class="kw1">in</span> &nbsp; &nbsp; &nbsp;<span class="nu0">107</span><br />
<span class="kw1">for</span> &nbsp; &nbsp; &nbsp;<span class="nu0">94</span><br />
python &nbsp; <span class="nu0">81</span><br />
on &nbsp; &nbsp; &nbsp; <span class="nu0">79</span><br />
<span class="kw1">if</span> &nbsp; &nbsp; &nbsp; <span class="nu0">77</span><br />
this &nbsp; &nbsp; <span class="nu0">72</span><br />
<span class="kw1">or</span> &nbsp; &nbsp; &nbsp; <span class="nu0">62</span><br />
be &nbsp; &nbsp; &nbsp; <span class="nu0">58</span><br />
with &nbsp; &nbsp; <span class="nu0">56</span><br />
it &nbsp; &nbsp; &nbsp; <span class="nu0">53</span><br />
are &nbsp; &nbsp; &nbsp;<span class="nu0">53</span><br />
that &nbsp; &nbsp; <span class="nu0">52</span><br />
as &nbsp; &nbsp; &nbsp; <span class="nu0">47</span></div>
<p>Here's a function that will do this.</p>
<div class="dean_ch" style="white-space: nowrap;">
<p><span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby</p>
<p><span class="kw1">def</span> get_top_freqs<span class="br0">&#40;</span>filename, num=<span class="nu0">20</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; text = <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; words = text.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; freqs = <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span> <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>words<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>-num:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></div>
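<p>For comparison, here is a sketch of the same idea using <code>collections.Counter</code> instead of <code>groupby</code>. This is a swapped-in technique, not the approach used above: <code>Counter</code> only exists in Python 2.7 and later, so it was not an option for Python 2.5. The function name <code>top_freqs_from_text</code> and the sample sentence are hypothetical, and the function takes the text directly rather than a filename so that it is easy to try out.</p>

```python
from collections import Counter

def top_freqs_from_text(text, num=20):
    """Return the num most frequent words in text as (word, freq) tuples."""
    words = text.lower().split()
    # most_common sorts by count, highest first
    return Counter(words).most_common(num)

# Hypothetical sample text
sample = "the cat and the dog and the bird"
top_two = top_freqs_from_text(sample, 2)
print(top_two)
```

<p>One difference to keep in mind: the <code>groupby</code> version breaks ties between equally frequent words by reverse alphabetical order, while <code>Counter</code> makes no such promise, so tied words can come out in a different order.</p>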
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Making the robot dance</title>
		<link>http://ginstrom.com/scribbles/2008/03/13/making-the-robot-dance/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/13/making-the-robot-dance/#comments</comments>
		<pubDate>Wed, 12 Mar 2008 14:33:39 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/13/making-the-robot-dance/</guid>
		<description><![CDATA[Some time around 1980, my elementary school classroom got a computer. While most of the other kids fooled around playing Hunt the Wumpus, my friend and I found the BASIC manual that came with the computer. We laboriously copied in the code to make a "robot" appear on the screen. After a lot of typos, [...]]]></description>
			<content:encoded><![CDATA[<p>Some time around 1980, my elementary school classroom got a computer. While most of the other kids fooled around playing <a href="http://en.wikipedia.org/wiki/Hunt_the_Wumpus">Hunt the Wumpus</a>, my friend and I found the BASIC manual that came with the computer. We laboriously copied in the code to make a "robot" appear on the screen. After a lot of typos, we finally got the program working: a crudely drawn robot appeared on the side of the screen, and moved to the center.</p>
<p>Then my friend found the section on modifying the code to make the robot "dance" (basically move its arms up and down a few times after it got to the middle of the screen). I thought that was just about the coolest thing I'd ever seen. I started tweaking the program in various ways, turning the program into a mini disco, with flashing colors and sounds.</p>
<p>I was hooked. I started staying in at lunch so I could have time with the computer. I also pestered my mom to buy me a computer. And she did: a Tandy PC with 3KB of memory for programs, and a tape cassette deck for memory storage. It was great. One of my more ambitious projects was tweaking the horse race game that came with the computer to have odds and varying payoffs like at the tracks. I also made a "lemonade stand" clone. I think the only non-game I made was a "conversation" program that would chat with you and say different things depending on what you typed. It was very simple, of course: like if you typed "HELLO," it would respond, "HELLO, HOW ARE YOU?" (For some reason, I think the BASIC environment only had capital letters.)</p>
<p>In college, I would go on to learn more advanced programming languages, and fancy concepts like finite state theory and formal logic. But that same joy was always there: the joy of getting the computer to do my bidding. At one point I wrote a program to parse English text, and just like when I was 11 and wrote that "conversation" program, it was just so neat to have the computer respond to my input.</p>
<p>Now, I'm a technical translator, but I also program. Although I do program professionally to some extent, most of my programming is just for the pure enjoyment of it. I actually think that's better than being paid to program, because I have a lot more freedom about what I do.</p>
<p>I've often tried to pin down just what is so interesting about controlling a computer. My gut feeling is that it's just intrinsically fun &#8212; who wouldn't want to program computers? But of course that's not true: most people think that programming computers is really boring. What makes programming interesting to some and boring to others? Maybe it's a certain way of thinking. Or maybe it's ability: some people "get" programming, and since we tend to like what we're good at, we just gravitate to it.</p>
<p>Of course, there was always the tedious side of programming. Back when I was 11, it was copying programs out of books using hunt and peck, and loading in my programs from cassette tapes. Now, it's fiddling with user interfaces and installers and interoperability. But it's all worth it when I run the program and get to see the robot dance.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/13/making-the-robot-dance/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Using chardet to convert arbitrary byte strings to Unicode</title>
		<link>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/#comments</comments>
		<pubDate>Sat, 08 Mar 2008 02:24:36 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/</guid>
		<description><![CDATA[chardet is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a BOM to pretty reliably turn them into Unicode.
Edit: Thanks to Kirit's comment below, I added code to check for UTF-32.

import chardet
def bytes2unicode&#40;bytes, errors='replace'&#41;:
&#160; &#160; &#34;&#34;&#34;Convert a byte string into Unicode.
&#160; &#160; First checks [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://chardet.feedparser.org/">chardet</a> is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a <a href="http://en.wikipedia.org/wiki/Byte_Order_Mark">BOM</a> to pretty reliably turn them into Unicode.</p>
<p><strong>Edit:</strong> Thanks to Kirit's comment below, I added code to check for UTF-32.</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> chardet</p>
<p><span class="kw1">def</span> bytes2unicode<span class="br0">&#40;</span>bytes, errors=<span class="st0">'replace'</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Convert a byte string into Unicode.<br />
&nbsp; &nbsp; First checks for a BOM, and if one is found returns<br />
&nbsp; &nbsp; the Unicode text minus the BOM. If there is no BOM,<br />
&nbsp; &nbsp; falls back to chardet.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; encoding_map = <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ef<span class="es0">\x</span>bb<span class="es0">\x</span>bf'</span>, <span class="st0">'utf-8'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ff<span class="es0">\x</span>fe<span class="es0">\0</span><span class="es0">\0</span>'</span>, <span class="st0">'UTF-32LE'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\0</span><span class="es0">\0</span><span class="es0">\x</span>fe<span class="es0">\x</span>ff'</span>, <span class="st0">'UTF-32BE'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ff<span class="es0">\x</span>fe'</span>, <span class="st0">'UTF-16LE'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>fe<span class="es0">\x</span>ff'</span>, <span class="st0">'UTF-16BE'</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> bom, encoding <span class="kw1">in</span> encoding_map:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> bytes.<span class="me1">startswith</span><span class="br0">&#40;</span>bom<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">unicode</span><span class="br0">&#40;</span>bytes<span class="br0">&#91;</span><span class="kw2">len</span><span class="br0">&#40;</span>bom<span class="br0">&#41;</span>:<span class="br0">&#93;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;encoding,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;errors=errors<span class="br0">&#41;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="co1"># No BOM found, so use chardet</span><br />
&nbsp; &nbsp; detection = chardet.<span class="me1">detect</span><span class="br0">&#40;</span>bytes<span class="br0">&#41;</span><br />
&nbsp; &nbsp; encoding = detection.<span class="me1">get</span><span class="br0">&#40;</span><span class="st0">'encoding'</span><span class="br0">&#41;</span> <span class="kw1">or</span> <span class="st0">'utf-16'</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">unicode</span><span class="br0">&#40;</span>bytes, encoding, errors=errors<span class="br0">&#41;</span></div>
<p>Usage:</p>
<div class="dean_ch" style="white-space: nowrap;">
text = bytes2unicode<span class="br0">&#40;</span><span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span>, <span class="st0">'replace'</span><span class="br0">&#41;</span></div>
<h3>Discussion: Why check for a BOM?</h3>
<p>You might ask why I check for a BOM when chardet already does. Although chardet uses the BOM to detect the encoding, it doesn't tell you that it found one, so you won't know to chop it off before processing the text. In most cases, then, you'd have to check for a BOM anyway.</p>
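<p>To see the problem concretely, here is a minimal sketch, written for Python 3 (an assumption; the code above is Python 2). It uses the BOM constants from the standard <code>codecs</code> module. The helper name <code>decode_with_bom</code> and its <code>default</code> parameter are hypothetical; the plain fallback stands in for the chardet call so that the example has no third-party dependency.</p>

```python
import codecs

# BOMs to check, UTF-32 before UTF-16 so that the UTF-32 LE BOM
# (which starts with the UTF-16 LE BOM) is not misidentified
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

def decode_with_bom(data, default="utf-8", errors="replace"):
    """Decode data, stripping a leading BOM if one is present."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding, errors)
    # Hypothetical fallback; the original post uses chardet here
    return data.decode(default, errors)

raw = codecs.BOM_UTF8 + "hello".encode("utf-8")
naive = raw.decode("utf-8")    # the BOM survives as U+FEFF
clean = decode_with_bom(raw)   # the BOM is stripped
print(repr(naive))
print(repr(clean))
```

<p>Knowing the right encoding is not enough on its own: decoding as plain UTF-8 leaves the BOM at the front of the text as a stray U+FEFF character.</p>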
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Delivering the bad news</title>
		<link>http://ginstrom.com/scribbles/2008/03/01/delivering-the-bad-news/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/01/delivering-the-bad-news/#comments</comments>
		<pubDate>Sat, 01 Mar 2008 03:01:22 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/01/delivering-the-bad-news/</guid>
		<description><![CDATA[A few weeks ago, a translation agency I occasionally work for called me in a panic. It seems that a major client had rejected one of their Japanese-to-English translations, calling it "unreadable," and providing another translation as a sample of the quality they were after.
The agency wanted to pay me to review their translation, and [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago, a translation agency I occasionally work for called me in a panic. It seems that a major client had rejected one of their Japanese-to-English translations, calling it "unreadable," and providing another translation as a sample of the quality they were after.</p>
<p>The agency wanted to pay me to review their translation, and the sample provided by the client, and point out specifically what the quality problems were. They wanted to feed this back to their translator, who had been with them for several years and never had any quality complaints.</p>
<p>Being kind of a sucker, I agreed.</p>
<p>The original translation was indeed a big steaming pile. It looked like the (native Japanese-speaker) translator had spent a lot of time translating Japanese into English, but had never actually seen an English document.</p>
<p>The first thing that jumped out was that the file was full of double-byte characters. There were double-byte spaces mixed in, and instead of using a space and parenthesis ("hello (world)"), the translator had used double-byte parentheses ("hello（world）").</p>
<p>It was like a demonstration of <a href="/translation/tech_writing.php">all the things</a> I have said to <a href="/translation/translation_pitfalls.php">avoid in J-E translation</a>. The English text itself was, as the client complained, unreadable. It was as if the translator had semi-randomly chosen an English translation for each Japanese word out of a dictionary, and mashed them together into semi-grammatical sentences. If it had ever gone through a "<a href="/scribbles/2007/09/04/why-i-hate-doing-native-checks/">native checker</a>," that person had done a lousy job, because the translation was still rife with basic errors such as subject-verb disagreement. Completely worthless as a translation.</p>
<p>I've seen a lot in this biz, but I was shocked by this. How could such an obviously unqualified translator have made it so long without complaint? My first thought was that the translator had been using a very good checker/rewriter, and for some reason, the translator had been unable to get the checker's services and had turned in the translation as-is. But then how had the agency missed it?</p>
<p>Then I saw what "Mika Jz" said in <a href="http://honyaku-archive.org/posts/220942/">this post</a> to <a href="http://groups.google.com/group/honyaku/">the Honyaku mailing list</a> regarding <a href="http://honyaku-archive.org/posts/220935/">strange English in an email received from a client</a>:</p>
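<blockquote><p>Even among Japanese people who can read English,<br />
I suspect many would look at it and think,<br />
"This is well written; it's good English."</p>
<p>As for what is odd about it,<br />
I think that probably doesn't get through to them at all.</p></blockquote>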
<p>And another possibility occurred to me: maybe the agency and its clients had been using "native Japanese speakers with good English skills" to proof this translator's translations. And since every Japanese word was translated, and every translation could be found in a bilingual dictionary, the proofreaders must have thought that the translations were fine.</p>
<p>So I had a couple of possible explanations, but the question was how to deliver the bad news. If someone had asked me to simply evaluate the translation, I'd have said it was unreadable and useless, end of story. But the agency wanted feedback, presumably so the translator could improve. What to do? I personally don't think that this translator will be up to producing professional-level English text for many years (if ever), but I had to put it in a somewhat more diplomatic way. </p>
<p>So I wrote up a report, comparing several passages of the translation with the client's sample (which was actually quite good), pointing out errors or poor style (e.g. write "the team investigated the issue," not "the investigation of the issue was conducted by the issue-investigation team"), and finally stating that writing natural English requires an extremely advanced grasp of English that takes many, many years to acquire. Hopefully they'll get the hint.</p>
<p>Or at least make sure to get a "native checker" to rewrite all future translations by this translator.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/01/delivering-the-bad-news/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Python GUI programming platforms for Windows</title>
		<link>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/</link>
		<comments>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/#comments</comments>
		<pubDate>Tue, 26 Feb 2008 06:00:57 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/</guid>
		<description><![CDATA[[Edit]
By popular demand, I've added a section on PyGTK. See bottom of post.
There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to the source.
Tkinter
Tkinter is the ubiquitous [...]]]></description>
			<content:encoded><![CDATA[<p><b>[Edit]</b><br />
By popular demand, I've added a section on PyGTK. See bottom of post.</p>
<p>There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to the source.</p>
<h2>Tkinter</h2>
<p>Tkinter is the ubiquitous GUI toolkit for Python. It's cross platform and easy to use, but it looks non-native on just about every platform. There are various add-ons and improvements you can find to improve the look and feel, but the basic problem is that the toolkit implements its own widgets, rather than using the native ones provided on the platform.</p>
<h3>Pros</h3>
<ul>
<li>Most portable GUI toolkit for Python</li>
<li>Very easy to use, with pythonic API</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Non-native look and feel out of the box</li>
</ul>
<p>Hello world example <a href="http://www.shido.info/py/tkinter1.html" title="source of code snippet">(code source)</a>:<br />
<img src="/img/hello-tkinter.png" border="0"/></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">Tkinter</span> as Tk<br />
la = Tk.<span class="me1">Label</span><span class="br0">&#40;</span><span class="kw2">None</span>, text=<span class="st0">'Hello World!'</span>, font=<span class="br0">&#40;</span><span class="st0">'Times'</span>, <span class="st0">'18'</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
la.<span class="me1">pack</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
la.<span class="me1">mainloop</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>wxPython</h2>
<p><a href="http://www.wxpython.org/">wxPython</a> is probably the most popular GUI toolkit for Python. It's a wrapper for the <a href="http://www.wxwidgets.org/">wxWidgets</a> C++ toolkit, and as such it has a few unpythonic rough edges (like lumpy case, getters and setters, and funky C++ errors cropping up occasionally). There are a few pythonification efforts on top of wxPython, such as <a href="http://dabodev.com/">dabo</a> and (the now apparently moribund) <a href="http://sourceforge.net/projects/waxgui">wax</a>.</p>
<h3>Pros</h3>
<ul>
<li>Highly cross platform</li>
<li>Relatively mature and robust</li>
<li>Uses native Windows widgets for authentic look and feel</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Must include large wx runtime when packaging with py2exe (adds ~7 MB)</li>
<li>Cross platform nature makes accessing some native platform features (like ActiveX) difficult to impossible</li>
</ul>
<p>Hello world example <a href="http://www.goldb.org/goldblog/PermaLink,guid,d109ef8a-c3ea-4a2b-8ab7-9081c4dcc912.aspx" title="snippet source">(code source)</a>:<br />
<img src="/img/hello-wxpython.png" border=0 /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> wx</p>
<p><span class="kw1">class</span> Application<span class="br0">&#40;</span>wx.<span class="me1">Frame</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, parent<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; wx.<span class="me1">Frame</span>.<span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, parent, <span class="nu0">-1</span>, <span class="st0">'My GUI'</span>, size=<span class="br0">&#40;</span><span class="nu0">300</span>, <span class="nu0">200</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; panel = wx.<span class="me1">Panel</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; sizer = wx.<span class="me1">BoxSizer</span><span class="br0">&#40;</span>wx.<span class="me1">VERTICAL</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; panel.<span class="me1">SetSizer</span><span class="br0">&#40;</span>sizer<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; txt = wx.<span class="me1">StaticText</span><span class="br0">&#40;</span>panel, <span class="nu0">-1</span>, <span class="st0">'Hello World!'</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; sizer.<span class="me1">Add</span><span class="br0">&#40;</span>txt, <span class="nu0">0</span>, wx.<span class="me1">TOP</span>|wx.<span class="me1">LEFT</span>, <span class="nu0">20</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Centre</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Show</span><span class="br0">&#40;</span><span class="kw2">True</span><span class="br0">&#41;</span></p>
<p>app = wx.<span class="me1">App</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="br0">&#41;</span><br />
Application<span class="br0">&#40;</span><span class="kw2">None</span><span class="br0">&#41;</span><br />
app.<span class="me1">MainLoop</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<h2>.NET with IronPython</h2>
<p><a href="http://www.codeplex.com/IronPython">IronPython</a> is a .NET implementation of Python. As of 1.0 it has full support for Python 2.4 features, and the 2.0 version will duplicate the Python 2.5 feature set. Although there are many CPython libraries/modules that won't run under IronPython (namely, the ones relying on compiled extensions that have not yet been ported), this lack is partially made up by the huge .NET library. </p>
<p>One cool thing about IronPython is that you can easily create lightweight .exe files that you can ship off to your friends &#8212; although you pay for this with a dependency on the .NET runtime, which you can't count on random Windows users to have installed.</p>
<p>Of course, when you go the IronPython route, you take all that comes with it: the good things, like access to .NET libraries and possibly the easiest/cleanest optimization path of any Python implementation (C#); and the bad things, like dependence on the .NET runtime and danger of getting caught on the MS upgrade treadmill.</p>
<p>Another way of getting at the .NET libraries is <a href="http://pythonnet.sourceforge.net/">Python.NET</a>, which adds two files to your Python directory to enable you to call the CLR from CPython.</p>
<h3>Pros</h3>
<ul>
<li>Leverage .NET libraries</li>
<li>Easily create .exe files</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Depends on .NET runtime</li>
</ul>
<p>Hello world example <a href="http://www.voidspace.org.uk/ironpython/winforms/part2.shtml" title="snippet source">(code source)</a>:<br />
<img src="/img/hello-ipy.png" border=0 /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">sys</span><br />
<span class="kw3">sys</span>.<span class="me1">path</span>.<span class="me1">append</span><span class="br0">&#40;</span>r<span class="st0">'C:<span class="es0">\P</span>ython24<span class="es0">\L</span>ib'</span><span class="br0">&#41;</span></p>
<p><span class="kw1">import</span> clr<br />
clr.<span class="me1">AddReference</span><span class="br0">&#40;</span><span class="st0">&quot;System.Windows.Forms&quot;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">from</span> System.<span class="me1">Windows</span>.<span class="me1">Forms</span> <span class="kw1">import</span> Application, Form</p>
<p><span class="kw1">class</span> HelloWorldForm<span class="br0">&#40;</span>Form<span class="br0">&#41;</span>:</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Text</span> = <span class="st0">'Hello World'</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Name</span> = <span class="st0">'Hello World'</span></p>
<p>form = HelloWorldForm<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
Application.<span class="me1">Run</span><span class="br0">&#40;</span>form<span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>PyQT</h2>
<p><a href="http://www.riverbankcomputing.co.uk/pyqt/">PyQT</a> is probably the third most widely used GUI toolkit, after wxPython and Tkinter. It has a dual commercial/GPL license (<ins datetime="2008-02-27T22:23:05+00:00">Edit: but it does let you use other open-source licenses; see comments below</ins>). I have to admit that this made it a non-starter for me: I don't want to pay for my toolkit when there are others just as good or better that are free; <del datetime="2008-02-27T22:23:05+00:00">and when I do release open-source software, I want to choose my own license</del>. For others, the GPL might be a non-issue or a plus, so I've left it off my pro/con list.</p>
<h3>Pros</h3>
<ul>
<li>Highly cross platform</li>
<li>Very easy to use</li>
<li>Highly mature</li>
<li>Decent looking widgets</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Somewhat non-native look and feel (though much better than Tkinter)</li>
<li>Must include large runtime when packaging with py2exe</li>
</ul>
<p>Hello world example (from PyQT docs):</p>
<div><img src="/img/hello-qt.png" alt="PyQT screen shot" /></div>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">sys</span><br />
<span class="kw1">from</span> PyQt4 <span class="kw1">import</span> QtGui</p>
<p>app = QtGui.<span class="me1">QApplication</span><span class="br0">&#40;</span><span class="kw3">sys</span>.<span class="me1">argv</span><span class="br0">&#41;</span></p>
<p>hello = QtGui.<span class="me1">QPushButton</span><span class="br0">&#40;</span><span class="st0">&quot;Hello world!&quot;</span><span class="br0">&#41;</span><br />
hello.<span class="me1">resize</span><span class="br0">&#40;</span><span class="nu0">100</span>, <span class="nu0">30</span><span class="br0">&#41;</span></p>
<p>hello.<span class="me1">show</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p><span class="kw3">sys</span>.<span class="me1">exit</span><span class="br0">&#40;</span>app.<span class="me1">exec_</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
<h2>Pyglet</h2>
<p><a href="http://www.pyglet.org/">Pyglet</a> is kind of the new kid on the block in terms of GUI toolkits, but it sure made a splash. It implements its own windowing system, but with no dependencies other than Python (for Python 2.5 users). You will need <a href="http://www.opengl.org/">OpenGL</a> to do decent 3D graphics, but that's hardly a black mark for pyglet &#8212; other libraries would love to make it this easy.</p>
<h3>Pros</h3>
<ul>
<li>High degree of freedom for GUI creation</li>
<li>Only depends on Python</li>
<li>Large number of widgets</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Purposely doesn't duplicate the native platform look and feel</li>
<li>Although there are a lot of widgets, you'll have to roll your own for many things the platform gives you for free.</li>
</ul>
<p>Hello world example (slightly modified from <a href="http://www.pyglet.org/doc/programming_guide/hello_world.html">code source</a>):<br />
<img src="/img/hello-pyglet.png" alt="hello world with pyglet screenshot" border="0" /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> pyglet <span class="kw1">import</span> font<br />
<span class="kw1">from</span> pyglet <span class="kw1">import</span> window</p>
<p>win = window.<span class="me1">Window</span><span class="br0">&#40;</span>width=<span class="nu0">300</span>, height=<span class="nu0">150</span>, caption=<span class="st0">&quot;Hello World&quot;</span><span class="br0">&#41;</span></p>
<p>ft = font.<span class="me1">load</span><span class="br0">&#40;</span><span class="st0">'Arial'</span>, <span class="nu0">36</span><span class="br0">&#41;</span><br />
text = font.<span class="me1">Text</span><span class="br0">&#40;</span>ft, <span class="st0">'Hello, World!'</span><span class="br0">&#41;</span></p>
<p><span class="kw1">while</span> <span class="kw1">not</span> win.<span class="me1">has_exit</span>:<br />
&nbsp; &nbsp; win.<span class="me1">dispatch_events</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; win.<span class="me1">clear</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; text.<span class="me1">draw</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; win.<span class="me1">flip</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>Win32 with ctypes</h2>
<p>Of course, all you really need to write GUI applications on Windows with Python is your trusty ctypes module and a well-worn copy of <a href="http://www.charlespetzold.com/pw5/">Petzold</a>. The benefit of this style is that you're working right down at the system API level, with nothing to get in your way. The disadvantage is that you're working right down at the system API level, with nothing to relieve you from all that boilerplate (unless you write your own abstraction layer on top; see Venster, below&#8230;).</p>
<h3>Pros</h3>
<ul>
<li>Enables high level of control</li>
<li>Straightforward if familiar with Win32 API</li>
<li>No added complexity or buried functionality due to need to be cross-platform</li>
<li>Lightest of all Windows GUI programming methods using Python</li>
</ul>
<h3>Cons</h3>
<ul>
<li>All the complexity and inconsistency of Win32 API in gory detail</li>
<li>Lack of high-level libraries (have to write more code)</li>
</ul>
<p>Hello world example (long, ain't it?):<br />
<img src="/img/hello-win32.png" alt="Win32 GUI screen shot" /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> ctypes <span class="kw1">import</span> *<br />
<span class="kw1">import</span> win32con</p>
<p>WNDPROC = WINFUNCTYPE<span class="br0">&#40;</span>c_long, c_int, c_uint, c_int, c_int<span class="br0">&#41;</span></p>
<p>NULL = c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span><br />
_user32 = windll.<span class="me1">user32</span></p>
<p><span class="kw1">def</span> ErrorIfZero<span class="br0">&#40;</span>handle<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">if</span> handle == <span class="nu0">0</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> WinError<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> handle</p>
<p>CreateWindowEx = _user32.<span class="me1">CreateWindowExW</span><br />
CreateWindowEx.<span class="me1">argtypes</span> = <span class="br0">&#91;</span>c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_wchar_p,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_wchar_p,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#93;</span><br />
CreateWindowEx.<span class="me1">restype</span> = ErrorIfZero</p>
<p>
<span class="kw1">class</span> WNDCLASS<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'style'</span>, c_uint<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpfnWndProc'</span>, WNDPROC<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'cbClsExtra'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'cbWndExtra'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hInstance'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hIcon'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hCursor'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hbrBackground'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpszMenuName'</span>, c_wchar_p<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpszClassName'</span>, c_wchar_p<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;wndProc,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;style=win32con.<span class="me1">CS_HREDRAW</span> | win32con.<span class="me1">CS_VREDRAW</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;clsExtra=<span class="nu0">0</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;wndExtra=<span class="nu0">0</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;menuName=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;className=u<span class="st0">&quot;PythonWin32&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;instance=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;icon=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;cursor=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;background=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="br0">&#41;</span>:</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> instance:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance = windll.<span class="me1">kernel32</span>.<span class="me1">GetModuleHandleW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> icon:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; icon = _user32.<span class="me1">LoadIconW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#40;</span>win32con.<span class="me1">IDI_APPLICATION</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> cursor:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cursor = _user32.<span class="me1">LoadCursorW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#40;</span>win32con.<span class="me1">IDC_ARROW</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> background:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; background = windll.<span class="me1">gdi32</span>.<span class="me1">GetStockObject</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">WHITE_BRUSH</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpfnWndProc</span>=wndProc<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">style</span>=style<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">cbClsExtra</span>=clsExtra<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">cbWndExtra</span>=wndExtra<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hInstance</span>=instance<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hIcon</span>=icon<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hCursor</span>=cursor<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hbrBackground</span>=background<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpszMenuName</span>=menuName<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpszClassName</span>=className</p>
<p><span class="kw1">class</span> RECT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'left'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'top'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'right'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'bottom'</span>, c_long<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, left=<span class="nu0">0</span>, top=<span class="nu0">0</span>, right=<span class="nu0">0</span>, bottom=<span class="nu0">0</span> <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">left</span> = left<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">top</span> = top<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">right</span> = right<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">bottom</span> = bottom</p>
<p><span class="kw1">class</span> PAINTSTRUCT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'hdc'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fErase'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'rcPaint'</span>, RECT<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fRestore'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fIncUpdate'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'rgbReserved'</span>, c_wchar * <span class="nu0">32</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">class</span> POINT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'x'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'y'</span>, c_long<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span> <span class="kw2">self</span>, x=<span class="nu0">0</span>, y=<span class="nu0">0</span> <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">x</span> = x<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">y</span> = y<br />
&nbsp; &nbsp;<br />
<span class="kw1">class</span> MSG<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'hwnd'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'message'</span>, c_uint<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'wParam'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lParam'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'time'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'pt'</span>, POINT<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp;<br />
<span class="kw1">def</span> pump_messages<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Calls message loop&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; msg = MSG<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; pMsg = pointer<span class="br0">&#40;</span>msg<span class="br0">&#41;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">while</span> _user32.<span class="me1">GetMessageW</span><span class="br0">&#40;</span>pMsg, NULL, <span class="nu0">0</span>, <span class="nu0">0</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">TranslateMessage</span><span class="br0">&#40;</span>pMsg<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">DispatchMessageW</span><span class="br0">&#40;</span>pMsg<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> msg.<span class="me1">wParam</span></p>
<p>
<span class="kw1">class</span> Window<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Wraps an HWND handle&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, hwnd=NULL<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hwnd</span> = hwnd<br />
&nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>._event_handlers = <span class="br0">&#123;</span><span class="br0">&#125;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># Register event handlers</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> key <span class="kw1">in</span> <span class="kw2">dir</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; method = <span class="kw2">getattr</span><span class="br0">&#40;</span><span class="kw2">self</span>, key<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">hasattr</span><span class="br0">&#40;</span>method, <span class="st0">&quot;win32message&quot;</span><span class="br0">&#41;</span> <span class="kw1">and</span> <span class="kw2">callable</span><span class="br0">&#40;</span>method<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>._event_handlers<span class="br0">&#91;</span>method.<span class="me1">win32message</span><span class="br0">&#93;</span> = method<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> GetClientRect<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; rect = RECT<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">GetClientRect</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span>, byref<span class="br0">&#40;</span>rect<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> rect<br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> Create<span class="br0">&#40;</span><span class="kw2">self</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; exStyle=<span class="nu0">0</span> , &nbsp; &nbsp; &nbsp; &nbsp;<span class="co1"># &nbsp;DWORD dwExStyle</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; className=u<span class="st0">&quot;WndClass&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; windowName=u<span class="st0">&quot;Window&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; style=win32con.<span class="me1">WS_OVERLAPPEDWINDOW</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; x=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; y=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; width=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; height=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; menu=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; lparam=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hwnd</span> = CreateWindowEx<span class="br0">&#40;</span>exStyle,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; className,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; windowName,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; style,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; x,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; y,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; width,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; height,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; menu,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; lparam<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">self</span>.<span class="me1">hwnd</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> Show<span class="br0">&#40;</span><span class="kw2">self</span>, flag<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> _user32.<span class="me1">ShowWindow</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span>, flag<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> Update<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> _user32.<span class="me1">UpdateWindow</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> WinError<span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> WndProc<span class="br0">&#40;</span><span class="kw2">self</span>, hwnd, message, wParam, lParam<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; &nbsp; &nbsp; event_handler = <span class="kw2">self</span>._event_handlers.<span class="me1">get</span><span class="br0">&#40;</span>message, <span class="kw2">None</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> event_handler:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> event_handler<span class="br0">&#40;</span>message, wParam, lParam<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> _user32.<span class="me1">DefWindowProcW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>hwnd<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span>message<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span>wParam<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span>lParam<span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p><span class="co1">## Lifted shamelessly from WCK (effbot)'s wckTkinter.bind</span><br />
<span class="kw1">def</span> EventHandler<span class="br0">&#40;</span>message<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Decorator for event handlers&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> decorator<span class="br0">&#40;</span>func<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func.<span class="me1">win32message</span> = message<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> func<br />
&nbsp; &nbsp; <span class="kw1">return</span> decorator</p>
<p><span class="kw1">class</span> HelloWindow<span class="br0">&#40;</span>Window<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;The application window&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; @EventHandler<span class="br0">&#40;</span>win32con.<span class="me1">WM_PAINT</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> OnPaint<span class="br0">&#40;</span><span class="kw2">self</span>, message, wParam, lParam<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Draw 'Hello World' in center of window&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; ps = PAINTSTRUCT<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; hdc = _user32.<span class="me1">BeginPaint</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span><span class="br0">&#41;</span>, byref<span class="br0">&#40;</span>ps<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; rect = <span class="kw2">self</span>.<span class="me1">GetClientRect</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; flags = win32con.<span class="me1">DT_SINGLELINE</span>|win32con.<span class="me1">DT_CENTER</span>|win32con.<span class="me1">DT_VCENTER</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">DrawTextW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>hdc<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; u<span class="st0">&quot;Hello, world!&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span><span class="nu0">-1</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; byref<span class="br0">&#40;</span>rect<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; flags<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">EndPaint</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span><span class="br0">&#41;</span>, byref<span class="br0">&#40;</span>ps<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">0</span></p>
<p>&nbsp; &nbsp; @EventHandler<span class="br0">&#40;</span>win32con.<span class="me1">WM_DESTROY</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> OnDestroy<span class="br0">&#40;</span><span class="kw2">self</span>, message, wParam, lParam<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Quit app when window is destroyed&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">PostQuitMessage</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">0</span></p>
<p><span class="kw1">def</span> RunHello<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Create window and start message loop&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># two-stage creation for Win32 windows</span><br />
&nbsp; &nbsp; hello = HelloWindow<span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># register window class…</span><br />
&nbsp; &nbsp; wndclass = WNDCLASS<span class="br0">&#40;</span>WNDPROC<span class="br0">&#40;</span>hello.<span class="me1">WndProc</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; wndclass.<span class="me1">lpszClassName</span> = u<span class="st0">&quot;HelloWindow&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> _user32.<span class="me1">RegisterClassW</span><span class="br0">&#40;</span>byref<span class="br0">&#40;</span>wndclass<span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> WinError<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="co1"># …then create Window</span><br />
&nbsp; &nbsp; hello.<span class="me1">Create</span><span class="br0">&#40;</span> className=wndclass.<span class="me1">lpszClassName</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance=wndclass.<span class="me1">hInstance</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; windowName=u<span class="st0">&quot;Hello World&quot;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># Show Window</span><br />
&nbsp; &nbsp; hello.<span class="me1">Show</span><span class="br0">&#40;</span>win32con.<span class="me1">SW_SHOWNORMAL</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; hello.<span class="me1">Update</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; pump_messages<span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>RunHello<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<h2>Venster</h2>
<p><a href="http://venster.sourceforge.net/htdocs/index.html">Venster</a> was a very promising wrapper over the Win32 API that borrowed heavily from the windowing techniques of WTL and ATL. Unfortunately, the project hasn't been updated in several years and doesn't support the latest versions of Python, especially now that ctypes.com has been dropped.</p>
<h3>Pros</h3>
<ul>
<li>Rational abstraction layer on top of Win32</li>
<li>Can be used to write native, relatively lightweight GUI applications</li>
<li>Supports most of the cool Win32 tricks, such as hosting ActiveX controls and Coolbars</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Out of date; not updated in several years</li>
</ul>
<p>Hello world example (<a href="http://venster.sourceforge.net/htdocs/tutorial.html">code source</a>):<br />
<img src="/img/hello-venster.png" alt="Venster GUI screenshot" /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> venster.<span class="me1">windows</span> <span class="kw1">import</span> *<br />
<span class="kw1">from</span> venster.<span class="me1">wtl</span> <span class="kw1">import</span> *</p>
<p><span class="kw1">from</span> venster <span class="kw1">import</span> gdi</p>
<p><span class="kw1">class</span> MyWindow<span class="br0">&#40;</span>Window<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _window_title_ = <span class="st0">&quot;Hello World&quot;</span><br />
&nbsp; &nbsp; _window_background_ = gdi.<span class="me1">GetStockObject</span><span class="br0">&#40;</span>WHITE_BRUSH<span class="br0">&#41;</span><br />
&nbsp; &nbsp; _window_class_style_ = CS_HREDRAW | CS_VREDRAW</p>
<p>&nbsp; &nbsp; <spa