<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>The GITS Blog &#187; programming</title>
	<link>http://ginstrom.com/scribbles</link>
	<description>Random scribbling about programming, translation, and Japan</description>
	<pubDate>Sat, 17 May 2008 00:53:04 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.2</generator>
	<language>en</language>
			<item>
		<title>Counting words (etc.) in an HTML file with Python</title>
		<link>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/</link>
		<comments>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/#comments</comments>
		<pubDate>Sat, 17 May 2008 00:50:38 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/</guid>
		<description><![CDATA[In a previous post, I wrote about how to count words, characters, and Asian characters using Python.
In this post I want to pull that together with code to get a word count from an HTML file.
What needs counting
What needs counting depends to some extent on what you need the word count for, but here I'm [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous post, I wrote about <a href="/scribbles/2007/10/06/counting-words-characters-and-asian-characters-with-python/">how to count words, characters, and Asian characters using Python</a>.</p>
<p>In this post I want to pull that together with code to get a word count from an HTML file.</p>
<h2>What needs counting</h2>
<p>What needs counting depends to some extent on what you need the word count for. Here I'll assume that it will be used to count billable/localizable content.</p>
<p>In that scenario, you've got to count the text in the title tag, as well as the visible text in the body, and certain other localizable content: <code>img</code> <code>alt</code> attributes, <code>a</code> <code>title</code> attributes, and <code>input</code> <code>value</code> attributes (am I missing any?).</p>
<h2>The Code</h2>
<p>The code for counting the actual text is in the above link. Here we need code to extract the text from the HTML file, and to accumulate the counts for all the chunks we've extracted.</p>
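<p>The <code>Segment</code> class below calls two helpers from that earlier post, <code>is_asian</code> and <code>non_j_len</code>. As a rough, hypothetical sketch of what such helpers could look like (the Unicode ranges are a simplified assumption, the code targets Python 3, and it is not the code from the linked post):</p>

```python
# Hypothetical stand-ins for the helpers from the earlier post.
# The ranges are a simplified assumption, not an exhaustive list.
ASIAN_RANGES = (
    range(0x3040, 0x3100),  # hiragana and katakana
    range(0x3400, 0x4DC0),  # CJK extension A
    range(0x4E00, 0xA000),  # CJK unified ideographs
    range(0xAC00, 0xD7B0),  # hangul syllables
)


def is_asian(char):
    """Return True if char falls in one of the ranges above."""
    code = ord(char)
    return any(code in block for block in ASIAN_RANGES)


def non_j_len(text):
    """Count whitespace-delimited words after blanking out Asian characters."""
    blanked = "".join(" " if is_asian(c) else c for c in text)
    return len(blanked.split())
```

<p>With helpers like these, each Asian character counts as one word and each space-delimited run of other characters counts as one word, which is how <code>Segment</code> arrives at its <code>words</code> total.</p>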
<p>Here's the Segment class for accumulating counts:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">class</span> Segment<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Represents a text segment.<br />
&nbsp; &nbsp; (For bookkeeping)<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, text=<span class="st0">&quot;&quot;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot; text is the segment of text we will calculate.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Leave it empty if this will be a master count for a document<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; @param text: The text of the segment<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">characters</span> = <span class="kw2">len</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; num_spaces = <span class="kw2">len</span><span class="br0">&#40;</span><span class="br0">&#91;</span>x <span class="kw1">for</span> x <span class="kw1">in</span> text <span class="kw1">if</span> x.<span class="me1">isspace</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">chars_no_spaces</span> = <span class="kw2">self</span>.<span class="me1">characters</span> - num_spaces<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">asian_chars</span> = <span class="kw2">len</span><span class="br0">&#40;</span><span class="br0">&#91;</span>x <span class="kw1">for</span> x <span class="kw1">in</span> text <span class="kw1">if</span> is_asian<span class="br0">&#40;</span>x<span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">non_asian_words</span> = non_j_len<span class="br0">&#40;</span>text<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">words</span> = <span class="kw2">self</span>.<span class="me1">non_asian_words</span> + <span class="kw2">self</span>.<span class="me1">asian_chars</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> accumulate<span class="br0">&#40;</span><span class="kw2">self</span>, seg<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Add the stats from &lt;seg&gt; to this one.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Use this to keep a count for the entire document;<br />
&nbsp; &nbsp; &nbsp; &nbsp; use another for the whole batch of documents<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; @param seg: The segment to accumulate<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg = Segment(u&quot;</span><span class="st0">&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg2 = Segment(u&quot;</span>abc<span class="st0">&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.accumulate(seg2)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.words<br />
&nbsp; &nbsp; &nbsp; &nbsp; 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.characters<br />
&nbsp; &nbsp; &nbsp; &nbsp; 3<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">words</span> += seg.<span class="me1">words</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">characters</span> += seg.<span class="me1">characters</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">chars_no_spaces</span> += seg.<span class="me1">chars_no_spaces</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">asian_chars</span> += seg.<span class="me1">asian_chars</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">non_asian_words</span> += seg.<span class="me1">non_asian_words</span></div>
<p>Next, the code for extracting (segmenting) the text from an HTML file. For this, you'll need <a href="http://www.crummy.com/software/BeautifulSoup/">the excellent Beautiful Soup module</a>.</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="co1">#coding: UTF8</span><br />
<span class="st0">&quot;&quot;</span><span class="st0">&quot;Html segmenter&quot;</span><span class="st0">&quot;&quot;</span></p>
<p><span class="kw1">from</span> BeautifulSoup <span class="kw1">import</span> BeautifulSoup as bsoup<br />
<span class="kw1">from</span> BeautifulSoup <span class="kw1">import</span> BeautifulStoneSoup<br />
<span class="kw1">import</span> <span class="kw3">re</span></p>
<p><span class="kw1">def</span> normalize<span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Normalize whitepace in C{text}.<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &gt;&gt;&gt; normalize(u&quot;</span> &nbsp; spam\\n\\tspam &nbsp; SPAM<span class="st0">&quot;)<br />
&nbsp; &nbsp; u'spam spam SPAM'<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> u<span class="st0">' '</span>.<span class="me1">join</span><span class="br0">&#40;</span>text.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">class</span> Segmenter<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Html segmenter<br />
&nbsp; &nbsp; Retrieves the editable/translatable text from an HTML document.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Set up various regular expressions for splitting the text&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">pre_parse_stripper</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">&quot;|&quot;</span>.<span class="me1">join</span><span class="br0">&#40;</span><span class="br0">&#91;</span>u<span class="st0">&quot;&lt;body*?&gt;|&lt;/body&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;a[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/a&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;img[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/img&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;input[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/input&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;script*?&gt;[<span class="es0">\s</span><span class="es0">\S</span>]*?&lt;/script&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;form[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/form&gt;&quot;</span><span class="br0">&#93;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Strip out unsightly tags before heading to the splitter&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">splitter</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">'|'</span>.<span class="me1">join</span><span class="br0">&#40;</span><span class="br0">&#91;</span>u<span class="st0">&quot;&lt;p*?&gt;|&lt;/p&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;div*?&gt;|&lt;/div&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;td*?&gt;|&lt;/td&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;li*?&gt;|&lt;/li&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;h<span class="es0">\d</span>*?&gt;|&lt;/h<span class="es0">\d</span>&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;dd*?&gt;|&lt;/dd&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;dt*?&gt;|&lt;/dt&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;br*?&gt;&quot;</span><span class="br0">&#93;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Split segments by certain tags (removing tags in bargain)<br />
&nbsp; &nbsp; &nbsp; &nbsp; These tags indicate a segment boundary&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">charset_finder</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">'[<span class="es0">\s</span><span class="es0">\S</span>]*&lt;meta[<span class="es0">\s</span><span class="es0">\S</span>]*?charset<span class="es0">\s</span>*=<span class="es0">\s</span>*([<span class="es0">\S</span>]+)&quot;[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;[<span class="es0">\s</span><span class="es0">\S</span>]*'</span>, <span class="kw3">re</span>.<span class="me1">I</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Find the charset if necessary&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">soup</span> = <span class="kw2">None</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__str__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;So we can tell which segger we have (assuming multiple segmenter classes)&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="st0">&quot;HTML&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> get_chunks<span class="br0">&#40;</span><span class="kw2">self</span>, html_text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Extract the text from the HTML file&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">soup</span> = bsoup<span class="br0">&#40;</span>html_text, fromEncoding=<span class="kw2">self</span>.<span class="me1">getEncoding</span><span class="br0">&#40;</span>html_text<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># document title</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">head</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; title = <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">head</span>.<span class="me1">title</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> title:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> title.<span class="kw3">string</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># image alt attributes, anchor title attributes, input value attributes</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> tag, attr <span class="kw1">in</span> <span class="br0">&#40;</span><span class="br0">&#40;</span>u<span class="st0">&quot;img&quot;</span>, u<span class="st0">&quot;alt&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>u<span class="st0">&quot;a&quot;</span>, u<span class="st0">&quot;title&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>u<span class="st0">&quot;input&quot;</span>, u<span class="st0">&quot;value&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> <span class="kw2">self</span>.<span class="me1">getAttributes</span><span class="br0">&#40;</span>tag, attr<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw3">chunk</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> <span class="kw3">chunk</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># Parse the body text</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">body</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; text = <span class="kw2">self</span>.<span class="me1">pre_parse_stripper</span>.<span class="me1">sub</span><span class="br0">&#40;</span>u<span class="st0">&quot;&quot;</span>, <span class="kw2">unicode</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">body</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> <span class="kw2">self</span>.<span class="me1">splitter</span>.<span class="me1">split</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; normal = normalize<span class="br0">&#40;</span>html2plain<span class="br0">&#40;</span><span class="kw3">chunk</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> normal:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> normal<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> getAttributes<span class="br0">&#40;</span><span class="kw2">self</span>, tagName, attrName<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get all attrName values for tagName tags&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; attrs = <span class="br0">&#91;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; tags = <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">findAll</span><span class="br0">&#40;</span>tagName<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> tag <span class="kw1">in</span> tags:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">try</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; attr = tag<span class="br0">&#91;</span>attrName<span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> attr:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; attrs.<span class="me1">append</span><span class="br0">&#40;</span>attr<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">except</span> <span class="kw2">KeyError</span>, e:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">#print &quot;Tag %s does not have attribute %s&quot; % (tagName, attrName)</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">pass</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> attrs<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> getEncoding<span class="br0">&#40;</span><span class="kw2">self</span>, text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Retrieve the encoding META tag, if present&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; m = <span class="kw2">self</span>.<span class="me1">charset_finder</span>.<span class="me1">match</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> m:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> m.<span class="me1">groups</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">None</span></p>
<p>
TAG_STRIPPER = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">&quot;&lt;[!<span class="es0">\w</span>/][<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;&quot;</span>, <span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> strip_tags<span class="br0">&#40;</span>line<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;strip the HTML tags from the line<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &gt;&gt;&gt; strip_tags(u&quot;</span>&lt;b&gt;spam&lt;/b&gt;<span class="st0">&quot;)<br />
&nbsp; &nbsp; u'spam'<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> TAG_STRIPPER.<span class="me1">sub</span><span class="br0">&#40;</span>u<span class="st0">&quot;&quot;</span>, line<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> html2plain<span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Strips out tags from HTML text<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &gt;&gt;&gt; html2plain('spam&amp;nbsp;&lt;b&gt;eggs&lt;/b&gt;')<br />
&nbsp; &nbsp; u'spam<span class="es0">\\</span>xa0eggs'<br />
&nbsp; &nbsp; &gt;&gt;&gt; html2plain('--&gt;')<br />
&nbsp; &nbsp; u'--&gt;'<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; entities = BeautifulStoneSoup.<span class="me1">HTML_ENTITIES</span><br />
&nbsp; &nbsp; text = <span class="kw2">unicode</span><span class="br0">&#40;</span>BeautifulStoneSoup<span class="br0">&#40;</span>strip_tags<span class="br0">&#40;</span>text<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; convertEntities=entities<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> text.<span class="me1">replace</span><span class="br0">&#40;</span>u<span class="st0">&quot;&amp;#38;gt;&quot;</span>, <span class="st0">&quot;&gt;&quot;</span><span class="br0">&#41;</span>.<span class="me1">replace</span><span class="br0">&#40;</span>u<span class="st0">&quot;&amp;#38;lt;&quot;</span>, <span class="st0">&quot;&lt;&quot;</span><span class="br0">&#41;</span></div>
<p>And here's some code to get the actual wordcount:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> docstats<br />
<span class="kw1">import</span> htmlseg<br />
<br />
wordcount = docstats.<span class="me1">Segment</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
segger = htmlseg.<span class="me1">Segmenter</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
<br />
<span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> segger.<span class="me1">get_chunks</span><span class="br0">&#40;</span><span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;thefile.html&quot;</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; wordcount.<span class="me1">accumulate</span><span class="br0">&#40;</span>docstats.<span class="me1">Segment</span><span class="br0">&#40;</span><span class="kw3">chunk</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
<p>Here are the <a href="/code/html_wordcount.tar.gz">docstats and htmlseg modules</a>, and here is an <a href="http://felix-cat.com/tools/wordcount/">online tool using the code for the HTML word counts</a>.</p>
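<p>For comparison, the attribute-extraction step can be sketched with only the Python 3 standard library. This is a minimal sketch under assumptions, not the code in the download above: <code>AttributeCollector</code> and <code>get_attribute_chunks</code> are hypothetical names, and <code>html.parser</code> stands in for Beautiful Soup.</p>

```python
from html.parser import HTMLParser

# Tag/attribute pairs the post treats as localizable.
LOCALIZABLE_ATTRS = {"img": "alt", "a": "title", "input": "value"}


class AttributeCollector(HTMLParser):
    """Collect localizable attribute values (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        wanted = LOCALIZABLE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.chunks.append(value)


def get_attribute_chunks(html_text):
    """Return the non-empty alt/title/value attributes in document order."""
    collector = AttributeCollector()
    collector.feed(html_text)
    collector.close()
    return collector.chunks
```

<p>Feeding it a page's markup returns the matching attribute values in document order, ready to be wrapped in <code>Segment</code> objects and accumulated.</p>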
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Speeding up search on Honyaku archive site</title>
		<link>http://ginstrom.com/scribbles/2008/04/29/speeding-up-search-on-honyaku-archive-site/</link>
		<comments>http://ginstrom.com/scribbles/2008/04/29/speeding-up-search-on-honyaku-archive-site/#comments</comments>
		<pubDate>Tue, 29 Apr 2008 11:28:48 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/04/29/speeding-up-search-on-honyaku-archive-site/</guid>
		<description><![CDATA[Last summer, I launched a new archive site for the Honyaku mailing list.
The site is written in Python using the Django framework, with MySQL as the database. I chose MySQL because my tests showed that it was much faster than PostgreSQL at text searching.
Lately, however, the searches have been taking a huge amount of time. [...]]]></description>
			<content:encoded><![CDATA[<p>Last summer, I launched a <a href="http://honyaku-archive.org/">new archive site</a> for the <a href="http://groups.google.com/group/honyaku">Honyaku mailing list</a>.</p>
<p>The site is written in Python using the <a href="http://www.djangoproject.com/">Django framework</a>, with MySQL as the database. I chose MySQL because my tests showed that it was much faster than PostgreSQL at text searching.</p>
<p>Lately, however, the searches have been taking a huge amount of time. Sometimes they would even time out. It makes sense, since I've got more than 216,000 emails in there now, and <code>body__icontains</code> isn't exactly a speed demon.</p>
<p>But it was also taking forever just to get the posts for a given day. That was pretty easy to solve, though: duh, create an index on the <code>date_sent</code> field. So simple I never thought about it until the system was bogging down like a <a href="http://www.daylife.com/photo/02V857Ddoe3Gu">Golden Week traffic jam</a>.</p>
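<p>The effect of such an index is easy to see on a toy table. The sketch below uses Python 3's built-in <code>sqlite3</code> module as a stand-in for MySQL; apart from <code>date_sent</code>, the table, column, and index names are made up for illustration.</p>

```python
import sqlite3

# sqlite3 stands in for MySQL here; every name except date_sent
# is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, date_sent TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO post (date_sent, body) VALUES (?, ?)",
    [("2008-04-%02d" % (i % 28 + 1), "message %d" % i) for i in range(1000)],
)

QUERY = "EXPLAIN QUERY PLAN SELECT id FROM post WHERE date_sent = ?"


def plan(connection):
    """Return the query plan text for the by-date lookup."""
    rows = connection.execute(QUERY, ("2008-04-15",)).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " ".join(row[-1] for row in rows)


before = plan(conn)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_post_date_sent ON post (date_sent)")
after = plan(conn)   # the lookup now goes through idx_post_date_sent
```

<p>On the site itself the equivalent change would be made in MySQL (or declared on the Django model field), but the idea is the same: the by-date lookup stops scanning every row.</p>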
<p>That solved the date problem, but my text search problem remained. In the end, I had to create a full-text index for the <a href="http://honyaku-archive.org/search/">simple search</a>. This solved the speed problem (queries take a second or two now), but MySQL's full-text index has lousy support for Japanese text, which isn't delimited by spaces. For that reason, I kept the old, slow search method for the <a href="http://honyaku-archive.org/advanced-search/">advanced search</a>. If you use that, I recommend narrowing the search rather than just entering some body text.</p>
<p>Eventually, I'm going to have to bite the bullet and install some kind of n-gram indexing scheme that will support Japanese. Right now, though, I simply don't have the time.</p>
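<p>To give an idea of what n-gram indexing involves, here is a toy in-memory bigram index in Python 3. It is only a sketch: <code>BigramIndex</code> and <code>bigrams</code> are hypothetical names, and a real deployment would keep the postings in the database rather than in a dictionary.</p>

```python
from collections import defaultdict


def bigrams(text):
    """Return the set of overlapping two-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


class BigramIndex:
    """Toy in-memory bigram index (hypothetical, for illustration only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # bigram -> set of document ids
        self.docs = {}                    # document id -> full text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in bigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing query as a substring."""
        grams = bigrams(query)
        if not grams:
            # Queries shorter than two characters fall back to a full scan.
            candidates = set(self.docs)
        else:
            candidates = set.intersection(
                *(self.postings.get(gram, set()) for gram in grams)
            )
        # Bigram hits are only candidates; confirm with a substring test.
        return sorted(d for d in candidates if query in self.docs[d])
```

<p>Because every two-character window is indexed, a query such as 翻訳 finds documents without needing spaces between words; the final substring check removes false positives whose bigrams match but are not adjacent.</p>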
<p>As a stopgap measure, I added a <a href="http://www.google.com/coop/cse?cx=001297244641614827125%3Ashsg2vj5xwk">Google search for the Honyaku archive</a>. Google doesn't seem to have indexed the site yet (I just took the main archive out of the robots.txt file), but when it does it'll be a quick way to search with good Japanese support. They even have a gadget that I can put on the Honyaku archive site, but I can't get it to keep the height I set for it.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/04/29/speeding-up-search-on-honyaku-archive-site/feed/</wfw:commentRss>
		</item>
		<item>
		<title>What price elegance?</title>
		<link>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/#comments</comments>
		<pubDate>Fri, 21 Mar 2008 04:03:03 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/21/what-price-elegance/</guid>
		<description><![CDATA[In a recent post, I gave some code for counting the top n most frequent words in an arbitrary text file using itertools.groupby.
The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a bit slower than an imperative [...]]]></description>
			<content:encoded><![CDATA[<p><a href="/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/">In a recent post</a>, I gave some code for counting the top n most frequent words in an arbitrary text file using <a href="http://docs.python.org/lib/itertools-functions.html#l2h-1064">itertools.groupby</a>.</p>
<p>The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a bit slower than an imperative style using <a href="http://docs.python.org/lib/defaultdict-objects.html">collections.defaultdict</a>.</p>
<p>Here are the two functions:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
<span class="kw1">from</span> <span class="kw3">collections</span> <span class="kw1">import</span> defaultdict</p>
<p><span class="kw1">def</span> get_top_freqs_gb<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using itertools.groupby<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> get_top_freqs_dd<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using collections.defaultdict<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freq_dict = defaultdict<span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> word <span class="kw1">in</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; freq_dict<span class="br0">&#91;</span>word<span class="br0">&#93;</span> += <span class="nu0">1</span><br />
&nbsp; &nbsp; freqs =<span class="br0">&#91;</span><span class="br0">&#40;</span>v, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, v <span class="kw1">in</span> freq_dict.<span class="me1">iteritems</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></div>
<p>Here are the helper functions:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">re</span></p>
<p><span class="kw1">def</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the words from filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; split = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>r<span class="st0">&quot;<span class="es0">\b</span><span class="es0">\w</span>+<span class="es0">\b</span>&quot;</span><span class="br0">&#41;</span>.<span class="me1">findall</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span>word<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> line <span class="kw1">in</span> <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> word <span class="kw1">in</span> split<span class="br0">&#40;</span>line.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></div>
<p>The groupby version is shorter than the defaultdict version, and I'd say that it's simpler and more readable as well. Because it's shorter, the groupby version is less likely to contain bugs. In particular, the defaultdict version has a mutable local variable (used as an accumulator in the for loop), which is a classic source of bugs. The groupby version is also likely to be easier to maintain because it's shorter and simpler.</p>
<p>But the defaultdict version of the function winds up being considerably faster.</p>
<p>The times it took to run these functions 10 times on my computer, retrieving the top 50 most frequent words for "/python25/readme.txt", are as follows (seconds rounded to 4 decimal places).</p>
<table>
<tr>
<th>&nbsp;</th>
<th>Without psyco</th>
<th>With psyco</th>
</tr>
<tr>
<th align="left">groupby version</th>
<td align="center"><font color="red">0.3133 s</font></td>
<td align="center"><font color="red">0.2193 s</font></td>
</tr>
<tr>
<th align="left">defaultdict version</th>
<td align="center"><font color="green">0.2852 s</font></td>
<td align="center"><font color="green">0.1818 s</font></td>
</tr>
<tr>
<th align="left">groupby / defaultdict</th>
<td align="center">1.41</td>
<td align="center">1.58</td>
</tr>
</table>
<p>The defaultdict version is 1.4x faster than the groupby version. This gap grows even further when psyco is used, making the defaultdict version nearly 1.6x as fast. I'd say that most of the reason for the slowness is that the groupby version of the function performs two sorts, compared to one sort in the defaultdict version.</p>
<p>(The psyco speedup for the defaultdict version comes from the for loop; changing <code>get_words</code> to return a generator expression eliminates the speedup. The speedup for the groupby version comes from the <code>freqs</code> <a href="http://docs.python.org/tut/node7.html#SECTION007140000000000000000">list comprehension</a>; changing this to a generator expression eliminates its speedup.)</p>
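<p>For readers on Python 3, here is a minimal sketch of the same comparison. It is not the code that produced the timings above: it assumes Python 3, counts an in-memory word list (the made-up <code>WORDS</code> sample) instead of reading a file, uses <code>timeit</code> for the timing, and renames the functions to <code>top_freqs_gb</code> and <code>top_freqs_dd</code>. The <code>get_top</code> helper is restated as a descending sort, which gives the same ordering as the sort, slice, and reverse used above.</p>

```python
from collections import defaultdict
from itertools import groupby
import timeit


def get_top(freqs, num):
    """Return the num most frequent (word, freq) pairs, most frequent first."""
    return [(word, freq) for freq, word in sorted(freqs, reverse=True)[:num]]


def top_freqs_gb(words, num):
    """Count with itertools.groupby: sort the words, then measure each run."""
    freqs = [(len(list(group)), word)
             for word, group in groupby(sorted(words))]
    return get_top(freqs, num)


def top_freqs_dd(words, num):
    """Count with collections.defaultdict: one pass, no sort of the words."""
    freq_dict = defaultdict(int)
    for word in words:
        freq_dict[word] += 1
    freqs = [(freq, word) for word, freq in freq_dict.items()]
    return get_top(freqs, num)


# Made-up sample text standing in for the README (an assumption)
WORDS = ("the quick brown fox jumps over the lazy dog the end " * 200).split()

if __name__ == "__main__":
    # Both versions must agree before their timings are worth comparing
    assert top_freqs_gb(WORDS, 3) == top_freqs_dd(WORDS, 3)
    for func in (top_freqs_gb, top_freqs_dd):
        seconds = timeit.timeit(lambda: func(WORDS, 3), number=10)
        print("%s: %.4f" % (func.__name__, seconds))
```

<p>The absolute numbers will differ by machine and Python version, so treat any ratio this prints as a local measurement only.</p>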
<h3>So which one should I use?</h3>
<p>It's pretty common for Python code written in a functional style to be slower than equivalent code written in an imperative style. Nevertheless, I tend to prefer the more functional style of programming, switching to a more imperative style (or <a href="/scribbles/2007/12/02/extending-python-with-c-a-case-study/">other forms of optimization</a>) if performance isn't satisfactory.</p>
<blockquote><p>It is easier to optimize correct code than to correct optimized code.
</p></blockquote>
<p align="right"><em>&#8211;Yves Deville</em></p>
<p>A big question here is how to tell if the functional version is fast enough. My general rule of thumb is that the user would be prepared to wait up to two seconds for a typical "grovel through these files and tell me something interesting" command that's performed infrequently (how frequently do you need to get word frequencies from files?). For a more common action, the wait time should be under a second, with under 0.5 seconds being optimal (this includes GUI responsiveness but not Web page loading).</p>
<p>Given the times above, and assuming that the user will search no more than 50 files of sizes comparable to <a href="http://svn.python.org/view/python/branches/release25-maint/README?rev=59483">Python's README file</a>, then either version of the function is sufficient. If we assume that the user will search up to 100 files, or files substantially larger than the README file, then only the imperative version is acceptable (and we may need to optimize this further if our demands are higher than this).</p>
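<p>The arithmetic behind that estimate can be sketched as follows. This is a rough sketch under stated assumptions: the time is assumed to scale linearly with the number of files, every file is assumed to be about the size of the README, and the "With psyco" column of the table is assumed to be the one in play. The names <code>seconds_for_files</code>, <code>acceptable</code>, <code>BUDGET</code> and <code>TIMINGS</code> are made up for the illustration.</p>

```python
BUDGET = 2.0  # seconds: the rule of thumb for an infrequent command

# "With psyco" timings for 10 runs on one file, copied from the table
TIMINGS = {"groupby": 0.2193, "defaultdict": 0.1818}


def seconds_for_files(seconds_for_10_runs, num_files):
    """Estimate the wait for num_files files, assuming linear scaling."""
    return seconds_for_10_runs / 10 * num_files


def acceptable(num_files):
    """Map each version to whether it fits in the budget for num_files."""
    return dict((name, BUDGET >= seconds_for_files(seconds, num_files))
                for name, seconds in TIMINGS.items())


if __name__ == "__main__":
    for num_files in (50, 100):
        print(num_files, acceptable(num_files))
```

<p>With these inputs both versions fit the two-second budget at 50 files, and only the defaultdict version fits at 100 files. Plugging in the "Without psyco" column instead (0.3133 and 0.2852), both versions fit at 50 files and neither fits at 100.</p>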
<p>That's why it's so important to profile and test Python programs from the very beginning. I keep a suite of test cases that I profile with every build (performed at least daily), noting trends in performance and optimizing when the code has gelled and bottlenecks remain.</p>
<p>Here is the test code:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> <span class="kw3">time</span> <span class="kw1">import</span> clock</p>
<p><span class="kw1">def</span> time_func<span class="br0">&#40;</span>func, iterations, *args, **kwargs<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Return the time it takes to execute func<br />
&nbsp; &nbsp; iterations times.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; start = clock<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span>iterations<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func<span class="br0">&#40;</span>*args, **kwargs<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> clock<span class="br0">&#40;</span><span class="br0">&#41;</span> - start</p>
<p><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; filename = <span class="st0">&quot;/python25/readme.txt&quot;</span><br />
&nbsp; &nbsp; top_gb = get_top_freqs_gb<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; top_dd = get_top_freqs_dd<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">assert</span> top_gb == top_dd<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;With psyco&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">import</span> psyco<br />
&nbsp; &nbsp; psyco.<span class="me1">full</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; main<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
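<p>The test code above is Python 2. On Python 3, <code>xrange</code> is gone and <code>time.clock</code> was removed in Python 3.8, so here is a minimal sketch of the timing helper for Python 3, assuming <code>time.perf_counter</code> is an acceptable replacement clock. The psyco half is left out because psyco is not available for Python 3, and the <code>sorted</code> call is only a stand-in workload.</p>

```python
from time import perf_counter


def time_func(func, iterations, *args, **kwargs):
    """Return the seconds it takes to call func iterations times."""
    start = perf_counter()
    for _ in range(iterations):
        func(*args, **kwargs)
    return perf_counter() - start


if __name__ == "__main__":
    # sorted() on a small list is just a stand-in workload
    seconds = time_func(sorted, 10, [3, 1, 2])
    print("%s: %.6f" % (sorted.__name__, seconds))
```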
<p>The whole shebang:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="co1">#coding: UTF8</span><br />
<span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
Testing functional programming stuff<br />
&quot;</span><span class="st0">&quot;&quot;</span></p>
<p><span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
<span class="kw1">from</span> <span class="kw3">collections</span> <span class="kw1">import</span> defaultdict<br />
<span class="kw1">import</span> <span class="kw3">re</span><br />
<span class="kw1">from</span> <span class="kw3">time</span> <span class="kw1">import</span> clock</p>
<p><span class="kw1">def</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the words from filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; split = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>r<span class="st0">&quot;<span class="es0">\b</span><span class="es0">\w</span>+<span class="es0">\b</span>&quot;</span><span class="br0">&#41;</span>.<span class="me1">findall</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span>word<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> line <span class="kw1">in</span> <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> word <span class="kw1">in</span> split<span class="br0">&#40;</span>line.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top_freqs_gb<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using itertools.groupby<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> get_top_freqs_dd<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using collections.defaultdict<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freq_dict = defaultdict<span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> word <span class="kw1">in</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; freq_dict<span class="br0">&#91;</span>word<span class="br0">&#93;</span> += <span class="nu0">1</span><br />
&nbsp; &nbsp; freqs =<span class="br0">&#91;</span><span class="br0">&#40;</span>v, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, v <span class="kw1">in</span> freq_dict.<span class="me1">iteritems</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> time_func<span class="br0">&#40;</span>func, iterations, *args, **kwargs<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Return the time it takes to execute func<br />
&nbsp; &nbsp; iterations times.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; start = clock<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span>iterations<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func<span class="br0">&#40;</span>*args, **kwargs<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> clock<span class="br0">&#40;</span><span class="br0">&#41;</span> - start</p>
<p><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; filename = <span class="st0">&quot;/python25/readme.txt&quot;</span><br />
&nbsp; &nbsp; top_gb = get_top_freqs_gb<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; top_dd = get_top_freqs_dd<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">assert</span> top_gb == top_dd<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;With psyco&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">import</span> psyco<br />
&nbsp; &nbsp; psyco.<span class="me1">full</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; main<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Counting occurrences in a sequence with itertools.groupby</title>
		<link>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 05:29:38 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/</guid>
		<description><![CDATA[itertools.groupby is a great tool for counting the number of occurrences in a sequence.
Here are some examples from the interactive interpreter.
A list of numbers

&#62;&#62;&#62; # Create a random list of numbers
&#62;&#62;&#62; from random import random
&#62;&#62;&#62; numbers = &#91;int&#40;random&#40;&#41; * 10&#41; for x in range&#40;20&#41;&#93;
&#62;&#62;&#62; numbers
&#91;8, 0, 3, 2, 3, 9, 8, 2, 8, 3, 0, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://docs.python.org/lib/itertools-functions.html#l2h-1064">itertools.groupby</a> is a great tool for counting the number of occurrences in a sequence.</p>
<p>Here are some examples from the interactive interpreter.</p>
<h3>A list of numbers</h3>
<div class="dean_ch" style="white-space: nowrap;">
&gt;&gt;&gt; <span class="co1"># Create a random list of numbers</span><br />
&gt;&gt;&gt; <span class="kw1">from</span> <span class="kw3">random</span> <span class="kw1">import</span> <span class="kw3">random</span><br />
&gt;&gt;&gt; numbers = <span class="br0">&#91;</span><span class="kw2">int</span><span class="br0">&#40;</span><span class="kw3">random</span><span class="br0">&#40;</span><span class="br0">&#41;</span> * <span class="nu0">10</span><span class="br0">&#41;</span> <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">range</span><span class="br0">&#40;</span><span class="nu0">20</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; numbers<br />
<span class="br0">&#91;</span><span class="nu0">8</span>, <span class="nu0">0</span>, <span class="nu0">3</span>, <span class="nu0">2</span>, <span class="nu0">3</span>, <span class="nu0">9</span>, <span class="nu0">8</span>, <span class="nu0">2</span>, <span class="nu0">8</span>, <span class="nu0">3</span>, <span class="nu0">0</span>, <span class="nu0">2</span>, <span class="nu0">3</span>, <span class="nu0">8</span>, <span class="nu0">6</span>, <span class="nu0">5</span>, <span class="nu0">3</span>, <span class="nu0">6</span>, <span class="nu0">1</span>, <span class="nu0">8</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># Now create a dictionary of numbers and numbers </span><br />
&gt;&gt;&gt; <span class="co1"># of occurrences. Feed generator expression of </span><br />
&gt;&gt;&gt; <span class="co1"># (number, frequency) pairs to dict().</span><br />
&gt;&gt;&gt; <span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
&gt;&gt;&gt; valdict = <span class="kw2">dict</span><span class="br0">&#40;</span><span class="br0">&#40;</span>k, <span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>numbers<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; <span class="kw1">for</span> key, val <span class="kw1">in</span> valdict.<span class="me1">items</span><span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">print</span> key, <span class="st0">&quot;:&quot;</span>, val</p>
<p>&nbsp; &nbsp; <br />
<span class="nu0">0</span> : <span class="nu0">2</span><br />
<span class="nu0">1</span> : <span class="nu0">1</span><br />
<span class="nu0">2</span> : <span class="nu0">3</span><br />
<span class="nu0">3</span> : <span class="nu0">5</span><br />
<span class="nu0">5</span> : <span class="nu0">1</span><br />
<span class="nu0">6</span> : <span class="nu0">2</span><br />
<span class="nu0">8</span> : <span class="nu0">5</span><br />
<span class="nu0">9</span> : <span class="nu0">1</span></div>
<p>And a function that does this for any iterable:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby</p>
<p><span class="kw1">def</span> count_occurrences<span class="br0">&#40;</span>iterable<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;return a dictionary with items and numbers of occurrences<br />
&nbsp; &nbsp; in iterable&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">dict</span><span class="br0">&#40;</span><span class="br0">&#40;</span>item, <span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>group<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> item, group<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>iterable<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
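<p>The <code>sorted()</code> call matters because <code>groupby</code> only groups items that sit next to each other. The sketch below, written for Python 3, shows what happens without it on the same list of numbers, and cross-checks the result against <code>collections.Counter</code>. <code>Counter</code> was added to the standard library after Python 2.5, so it is shown here only as a check, not as part of the code above.</p>

```python
from collections import Counter
from itertools import groupby


def count_occurrences(iterable):
    """Return a dict mapping each item to its number of occurrences."""
    return dict((item, len(list(group)))
                for item, group in groupby(sorted(iterable)))


numbers = [8, 0, 3, 2, 3, 9, 8, 2, 8, 3, 0, 2, 3, 8, 6, 5, 3, 6, 1, 8]

# Without sorted(), groupby only sees runs of adjacent equal items,
# so the leading 8 forms a group of its own.
first_key, first_group = next(groupby(numbers))
first_run = (first_key, len(list(first_group)))

counts = count_occurrences(numbers)

if __name__ == "__main__":
    print(first_run)
    print(counts == dict(Counter(numbers)))
```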
<h3>Top 20 most frequent words in a file</h3>
<div class="dean_ch" style="white-space: nowrap;">
&gt;&gt;&gt; <span class="co1"># get a wordlist from the Python README</span><br />
&gt;&gt;&gt; text = <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;/python25/readme.txt&quot;</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; words = text.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; words<span class="br0">&#91;</span>:<span class="nu0">5</span><span class="br0">&#93;</span><br />
<span class="br0">&#91;</span><span class="st0">'this'</span>, <span class="st0">'is'</span>, <span class="st0">'python'</span>, <span class="st0">'version'</span>, <span class="st0">'2.5.2&#8242;</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># get the frequency list, using DSU to sort top words</span><br />
&gt;&gt;&gt; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp;<span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>words<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># sort the freqs, get last 20, and reverse </span><br />
&gt;&gt;&gt; <span class="co1"># to put most frequent first</span><br />
&gt;&gt;&gt; <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">-20</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s %s&quot;</span> % <span class="br0">&#40;</span>b.<span class="me1">ljust</span><span class="br0">&#40;</span><span class="nu0">7</span><span class="br0">&#41;</span>, <span class="kw2">str</span><span class="br0">&#40;</span>a<span class="br0">&#41;</span>.<span class="me1">rjust</span><span class="br0">&#40;</span><span class="nu0">3</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <br />
the &nbsp; &nbsp; <span class="nu0">442</span><br />
to &nbsp; &nbsp; &nbsp;<span class="nu0">227</span><br />
<span class="kw1">is</span> &nbsp; &nbsp; &nbsp;<span class="nu0">127</span><br />
<span class="kw1">and</span> &nbsp; &nbsp; <span class="nu0">127</span><br />
you &nbsp; &nbsp; <span class="nu0">118</span><br />
a &nbsp; &nbsp; &nbsp; <span class="nu0">117</span><br />
of &nbsp; &nbsp; &nbsp;<span class="nu0">110</span><br />
<span class="kw1">in</span> &nbsp; &nbsp; &nbsp;<span class="nu0">107</span><br />
<span class="kw1">for</span> &nbsp; &nbsp; &nbsp;<span class="nu0">94</span><br />
python &nbsp; <span class="nu0">81</span><br />
on &nbsp; &nbsp; &nbsp; <span class="nu0">79</span><br />
<span class="kw1">if</span> &nbsp; &nbsp; &nbsp; <span class="nu0">77</span><br />
this &nbsp; &nbsp; <span class="nu0">72</span><br />
<span class="kw1">or</span> &nbsp; &nbsp; &nbsp; <span class="nu0">62</span><br />
be &nbsp; &nbsp; &nbsp; <span class="nu0">58</span><br />
with &nbsp; &nbsp; <span class="nu0">56</span><br />
it &nbsp; &nbsp; &nbsp; <span class="nu0">53</span><br />
are &nbsp; &nbsp; &nbsp;<span class="nu0">53</span><br />
that &nbsp; &nbsp; <span class="nu0">52</span><br />
as &nbsp; &nbsp; &nbsp; <span class="nu0">47</span></div>
<p>Here's a function that will do this.</p>
<div class="dean_ch" style="white-space: nowrap;">
<p><span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby</p>
<p><span class="kw1">def</span> get_top_freqs<span class="br0">&#40;</span>filename, num=<span class="nu0">20</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; text = <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; words = text.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; freqs = <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span> <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>words<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></div>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Making the robot dance</title>
		<link>http://ginstrom.com/scribbles/2008/03/13/making-the-robot-dance/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/13/making-the-robot-dance/#comments</comments>
		<pubDate>Wed, 12 Mar 2008 14:33:39 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/13/making-the-robot-dance/</guid>
		<description><![CDATA[Some time around 1980, my elementary school classroom got a computer. While most of the other kids fooled around playing Hunt the Wumpus, my friend and I found the BASIC manual that came with the computer. We laboriously copied in the code to make a "robot" appear on the screen. After a lot of typos, [...]]]></description>
			<content:encoded><![CDATA[<p>Some time around 1980, my elementary school classroom got a computer. While most of the other kids fooled around playing <a href="http://en.wikipedia.org/wiki/Hunt_the_Wumpus">Hunt the Wumpus</a>, my friend and I found the BASIC manual that came with the computer. We laboriously copied in the code to make a "robot" appear on the screen. After a lot of typos, we finally got the program working: a crudely drawn robot appeared on the side of the screen, and moved to the center.</p>
<p>Then my friend found the section on modifying the code to make the robot "dance" (basically move its arms up and down a few times after it got to the middle of the screen). I thought that was just about the coolest thing I'd ever seen. I started tweaking the program in various ways, turning the program into a mini disco, with flashing colors and sounds.</p>
<p>I was hooked. I started staying in at lunch so I could have time with the computer. I also pestered my mom to buy me a computer. And she did: a Tandy PC with 3KB of memory for programs, and a tape cassette deck for memory storage. It was great. One of my more ambitious projects was tweaking the horse race game that came with the computer to have odds and varying payoffs like at the tracks. I also made a "lemonade stand" clone. I think the only non-game I made was a "conversation" program that would chat with you and say different things depending on what you typed. It was very simple, of course: like if you typed "HELLO," it would respond, "HELLO, HOW ARE YOU?" (For some reason, I think the BASIC environment only had capital letters.)</p>
<p>In college, I would go on to learn more advanced programming languages, and fancy concepts like finite state theory and formal logic. But that same joy was always there: the joy of getting the computer to do my bidding. At one point I wrote a program to parse English text, and just like when I was 11 and wrote that "conversation" program, it was just so neat to have the computer respond to my input.</p>
<p>Now, I'm a technical translator, but I also program. Although I do program professionally to some extent, most of my programming is just for the pure enjoyment of it. I actually think that's better than being paid to program, because I have a lot more freedom about what I do.</p>
<p>I've often tried to pin down just what is so interesting about controlling a computer. My gut feeling is that it's just intrinsically fun &#8212; who wouldn't want to program computers? But of course that's not true: most people think that programming computers is really boring. What makes programming interesting to some and boring to others? Maybe it's a certain way of thinking. Or maybe it's ability: some people "get" programming, and since we tend to like what we're good at, we just gravitate to it.</p>
<p>Of course, there was always the tedious side of programming. Back when I was 11, it was copying programs out of books using hunt and peck, and loading in my programs from cassette tapes. Now, it's fiddling with user interfaces and installers and interoperability. But it's all worth it when I run the program and get to see the robot dance.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/13/making-the-robot-dance/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Using chardet to convert arbitrary byte strings to Unicode</title>
		<link>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/#comments</comments>
		<pubDate>Sat, 08 Mar 2008 02:24:36 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/</guid>
		<description><![CDATA[chardet is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a BOM to pretty reliably turn them into Unicode.
Edit: Thanks to Kirit's comment below, I added code to check for UTF-32.

import chardet
def bytes2unicode&#40;bytes, errors='replace'&#41;:
&#160; &#160; &#34;&#34;&#34;Convert a byte string into Unicode.
&#160; &#160; First checks [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://chardet.feedparser.org/">chardet</a> is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a <a href="http://en.wikipedia.org/wiki/Byte_Order_Mark">BOM</a> to pretty reliably turn them into Unicode.</p>
<p><strong>Edit:</strong> Thanks to Kirit's comment below, I added code to check for UTF-32.</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> chardet</p>
<p><span class="kw1">def</span> bytes2unicode<span class="br0">&#40;</span>bytes, errors=<span class="st0">'replace'</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Convert a byte string into Unicode.<br />
&nbsp; &nbsp; First checks for a BOM, and if one is found returns<br />
&nbsp; &nbsp; the Unicode text minus the BOM. If there is no BOM,<br />
&nbsp; &nbsp; falls back to chardet.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; encoding_map = <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ef<span class="es0">\x</span>bb<span class="es0">\x</span>bf'</span>, <span class="st0">'utf-8'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ff<span class="es0">\x</span>fe<span class="es0">\0</span><span class="es0">\0</span>'</span>, <span class="st0">'UTF-32LE'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\0</span><span class="es0">\0</span><span class="es0">\x</span>fe<span class="es0">\x</span>ff'</span>, <span class="st0">'UTF-32BE'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ff<span class="es0">\x</span>fe'</span>, <span class="st0">'UTF-16LE'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>fe<span class="es0">\x</span>ff'</span>, <span class="st0">'UTF-16BE'</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> bom, encoding <span class="kw1">in</span> encoding_map:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> bytes.<span class="me1">startswith</span><span class="br0">&#40;</span>bom<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">unicode</span><span class="br0">&#40;</span>bytes<span class="br0">&#91;</span><span class="kw2">len</span><span class="br0">&#40;</span>bom<span class="br0">&#41;</span>:<span class="br0">&#93;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;encoding,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;errors=errors<span class="br0">&#41;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="co1"># No BOM found, so use chardet</span><br />
&nbsp; &nbsp; detection = chardet.<span class="me1">detect</span><span class="br0">&#40;</span>bytes<span class="br0">&#41;</span><br />
&nbsp; &nbsp; encoding = detection.<span class="me1">get</span><span class="br0">&#40;</span><span class="st0">'encoding'</span><span class="br0">&#41;</span> <span class="kw1">or</span> <span class="st0">'utf-16'</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">unicode</span><span class="br0">&#40;</span>bytes, encoding, errors=errors<span class="br0">&#41;</span></div>
<p>Usage:</p>
<div class="dean_ch" style="white-space: nowrap;">
text = bytes2unicode<span class="br0">&#40;</span><span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span>, <span class="st0">'replace'</span><span class="br0">&#41;</span></div>
<h3>Discussion: Why check for a BOM?</h3>
<p>You might ask, why check for a BOM if chardet already does this? Although chardet will correctly detect the BOM, it won't tell you that it found one, so you won't know to chop it off before processing the text. In most cases, that means you'd have to check for a BOM anyway.</p>
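<p>As a rough illustration of the point, here is a minimal Python 3 sketch (the code above is Python 2) that uses only the BOM constants from the standard <code>codecs</code> module. The names <code>BOMS</code> and <code>strip_bom</code> are hypothetical helpers for this example, and the chardet fallback is left out.</p>

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
BOMS = (
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
)


def strip_bom(data):
    """Return (encoding, payload) if data starts with a known BOM,
    otherwise (None, data)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data


# Decoding without stripping leaves the BOM in the text as U+FEFF.
leftover = (codecs.BOM_UTF8 + b'hi').decode('utf-8')

# Stripping first gives clean text.
raw = codecs.BOM_UTF16_LE + 'hello'.encode('utf-16-le')
encoding, payload = strip_bom(raw)
text = payload.decode(encoding)
```

<p>The stray U+FEFF in <code>leftover</code> is invisible when printed, which is why it tends to surface later as a mysterious mismatch in string comparisons or word counts.</p>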
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Python GUI programming platforms for Windows</title>
		<link>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/</link>
		<comments>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/#comments</comments>
		<pubDate>Tue, 26 Feb 2008 06:00:57 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/</guid>
		<description><![CDATA[[Edit]
By popular demand, I've added a section on PyGTK. See bottom of post.
There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to the source.
Tkinter
Tkinter is the ubiquitous [...]]]></description>
			<content:encoded><![CDATA[<p><b>[Edit]</b><br />
By popular demand, I've added a section on PyGTK. See bottom of post.</p>
<p>There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to the source.</p>
<h2>Tkinter</h2>
<p>Tkinter is the ubiquitous GUI toolkit for Python. It's cross-platform and easy to use, but it looks non-native on just about every platform. Various add-ons can improve the look and feel, but the basic problem is that the toolkit implements its own widgets rather than using the native ones provided by the platform.</p>
<h3>Pros</h3>
<ul>
<li>Most portable GUI toolkit for Python</li>
<li>Very easy to use, with pythonic API</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Non-native look and feel out of the box</li>
</ul>
<p>Hello world example <a href="http://www.shido.info/py/tkinter1.html" title="source of code snippet">(code source)</a>:<br />
<img src="/img/hello-tkinter.png" border="0"/></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">Tkinter</span> as Tk<br />
la = Tk.<span class="me1">Label</span><span class="br0">&#40;</span><span class="kw2">None</span>, text=<span class="st0">'Hello World!'</span>, font=<span class="br0">&#40;</span><span class="st0">'Times'</span>, <span class="st0">'18'</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
la.<span class="me1">pack</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
la.<span class="me1">mainloop</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>wxPython</h2>
<p><a href="http://www.wxpython.org/">wxPython</a> is probably the most popular GUI toolkit for Python. It's a wrapper for the <a href="http://www.wxwidgets.org/">wxWidgets</a> C++ toolkit, and as such it betrays a few unpythonic edges (like lumpy case, getters and setters, and funky C++ errors creeping up occasionally). There are a few pythonification efforts on top of wxPython, such as <a href="http://dabodev.com/">dabo</a> and (the now apparently moribund) <a href="http://sourceforge.net/projects/waxgui">wax</a>.</p>
<h3>Pros</h3>
<ul>
<li>Highly cross platform</li>
<li>Relatively mature and robust</li>
<li>Uses native Windows widgets for authentic look and feel</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Must include large wx runtime when packaging with py2exe (adds ~7 MB)</li>
<li>Cross platform nature makes accessing some native platform features (like ActiveX) difficult to impossible</li>
</ul>
<p>Hello world example <a href="http://www.goldb.org/goldblog/PermaLink,guid,d109ef8a-c3ea-4a2b-8ab7-9081c4dcc912.aspx" title="snippet source">(code source)</a>:<br />
<img src="/img/hello-wxpython.png" border=0 /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> wx</p>
<p><span class="kw1">class</span> Application<span class="br0">&#40;</span>wx.<span class="me1">Frame</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, parent<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; wx.<span class="me1">Frame</span>.<span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, parent, <span class="nu0">-1</span>, <span class="st0">'My GUI'</span>, size=<span class="br0">&#40;</span><span class="nu0">300</span>, <span class="nu0">200</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; panel = wx.<span class="me1">Panel</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; sizer = wx.<span class="me1">BoxSizer</span><span class="br0">&#40;</span>wx.<span class="me1">VERTICAL</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; panel.<span class="me1">SetSizer</span><span class="br0">&#40;</span>sizer<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; txt = wx.<span class="me1">StaticText</span><span class="br0">&#40;</span>panel, <span class="nu0">-1</span>, <span class="st0">'Hello World!'</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; sizer.<span class="me1">Add</span><span class="br0">&#40;</span>txt, <span class="nu0">0</span>, wx.<span class="me1">TOP</span>|wx.<span class="me1">LEFT</span>, <span class="nu0">20</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Centre</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Show</span><span class="br0">&#40;</span><span class="kw2">True</span><span class="br0">&#41;</span></p>
<p>app = wx.<span class="me1">App</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="br0">&#41;</span><br />
Application<span class="br0">&#40;</span><span class="kw2">None</span><span class="br0">&#41;</span><br />
app.<span class="me1">MainLoop</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<h2>.NET with IronPython</h2>
<p><a href="http://www.codeplex.com/IronPython">IronPython</a> is a .NET implementation of Python. As of 1.0 it has full support for Python 2.4 features, and the 2.0 version will duplicate the Python 2.5 feature set. Although there are many CPython libraries/modules that won't run under IronPython (namely, the ones relying on compiled extensions that have not yet been ported), this lack is partially made up by the huge .NET library. </p>
<p>One cool thing about IronPython is that you can easily create lightweight .exe files that you can ship off to your friends &#8212; although you pay for this with a dependency on the .NET runtime, which you can't count on random Windows users to have installed.</p>
<p>Of course, when you go the IronPython route, you take all that comes with it: the good things, like access to .NET libraries and possibly the easiest/cleanest optimization path of any Python implementation (C#); and the bad things, like dependence on the .NET runtime and danger of getting caught on the MS upgrade treadmill.</p>
<p>Another way of getting at the .NET libraries is <a href="http://pythonnet.sourceforge.net/">Python.NET</a>, which adds two files to your Python directory to enable you to call the CLR from CPython.</p>
<h3>Pros</h3>
<ul>
<li>Leverage .NET libraries</li>
<li>Easily create .exe files</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Depends on .NET runtime</li>
</ul>
<p>Hello world example <a href="http://www.voidspace.org.uk/ironpython/winforms/part2.shtml" title="snippet source">(code source)</a>:<br />
<img src="/img/hello-ipy.png" border=0 /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">sys</span><br />
<span class="kw3">sys</span>.<span class="me1">path</span>.<span class="me1">append</span><span class="br0">&#40;</span>r<span class="st0">'C:<span class="es0">\P</span>ython24<span class="es0">\L</span>ib'</span><span class="br0">&#41;</span></p>
<p><span class="kw1">import</span> clr<br />
clr.<span class="me1">AddReference</span><span class="br0">&#40;</span><span class="st0">&quot;System.Windows.Forms&quot;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">from</span> System.<span class="me1">Windows</span>.<span class="me1">Forms</span> <span class="kw1">import</span> Application, Form</p>
<p><span class="kw1">class</span> HelloWorldForm<span class="br0">&#40;</span>Form<span class="br0">&#41;</span>:</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Text</span> = <span class="st0">'Hello World'</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Name</span> = <span class="st0">'Hello World'</span></p>
<p>form = HelloWorldForm<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
Application.<span class="me1">Run</span><span class="br0">&#40;</span>form<span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>PyQt</h2>
<p><a href="http://www.riverbankcomputing.co.uk/pyqt/">PyQt</a> is probably the third most widely used GUI toolkit, after wxPython and Tkinter. It has a dual commercial/GPL license (<ins datetime="2008-02-27T22:23:05+00:00">Edit: but it does let you use other open-source licenses; see comments below</ins>). I have to admit that this made it a non-starter for me: I don't want to pay for my toolkit when there are others just as good or better that are free; <del datetime="2008-02-27T22:23:05+00:00">and when I do release open-source software, I want to choose my own license</del>. For others, the GPL might be a non-issue or a plus, so I've left it off my pro/con list.</p>
<h3>Pros</h3>
<ul>
<li>Highly cross platform</li>
<li>Very easy to use</li>
<li>Highly mature</li>
<li>Decent looking widgets</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Somewhat non-native look and feel (though much better than Tkinter)</li>
<li>Must include large runtime when packaging with py2exe</li>
</ul>
<p>Hello world example (from PyQt docs):</p>
<div><img src="/img/hello-qt.png" alt="PyQt screenshot" /></div>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">sys</span><br />
<span class="kw1">from</span> PyQt4 <span class="kw1">import</span> QtGui</p>
<p>app = QtGui.<span class="me1">QApplication</span><span class="br0">&#40;</span><span class="kw3">sys</span>.<span class="me1">argv</span><span class="br0">&#41;</span></p>
<p>hello = QtGui.<span class="me1">QPushButton</span><span class="br0">&#40;</span><span class="st0">&quot;Hello world!&quot;</span><span class="br0">&#41;</span><br />
hello.<span class="me1">resize</span><span class="br0">&#40;</span><span class="nu0">100</span>, <span class="nu0">30</span><span class="br0">&#41;</span></p>
<p>hello.<span class="me1">show</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p><span class="kw3">sys</span>.<span class="me1">exit</span><span class="br0">&#40;</span>app.<span class="me1">exec_</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
<h2>Pyglet</h2>
<p><a href="http://www.pyglet.org/">Pyglet</a> is kind of the new kid on the block in terms of GUI toolkits, but it sure made a splash. It implements its own windowing system, but with no dependencies other than Python (for Python 2.5 users). You will need <a href="http://www.opengl.org/">OpenGL</a> to do decent 3D graphics, but that's hardly a black mark for pyglet &#8212; other libraries would love to make it this easy.</p>
<h3>Pros</h3>
<ul>
<li>High degree of freedom for GUI creation</li>
<li>Only depends on Python</li>
<li>Large number of widgets</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Purposely doesn't duplicate the native platform look and feel</li>
<li>Although there are a lot of widgets, you'll have to roll your own for many things the platform gives you for free.</li>
</ul>
<p>Hello world example (slightly modified from <a href="http://www.pyglet.org/doc/programming_guide/hello_world.html">code source</a>):<br />
<img src="/img/hello-pyglet.png" alt="hello world with pyglet screenshot" border=0 /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> pyglet <span class="kw1">import</span> font<br />
<span class="kw1">from</span> pyglet <span class="kw1">import</span> window</p>
<p>win = window.<span class="me1">Window</span><span class="br0">&#40;</span>width=<span class="nu0">300</span>, height=<span class="nu0">150</span>, caption=<span class="st0">&quot;Hello World&quot;</span><span class="br0">&#41;</span></p>
<p>ft = font.<span class="me1">load</span><span class="br0">&#40;</span><span class="st0">'Arial'</span>, <span class="nu0">36</span><span class="br0">&#41;</span><br />
text = font.<span class="me1">Text</span><span class="br0">&#40;</span>ft, <span class="st0">'Hello, World!'</span><span class="br0">&#41;</span></p>
<p><span class="kw1">while</span> <span class="kw1">not</span> win.<span class="me1">has_exit</span>:<br />
&nbsp; &nbsp; win.<span class="me1">dispatch_events</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; win.<span class="me1">clear</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; text.<span class="me1">draw</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; win.<span class="me1">flip</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>Win32 with ctypes</h2>
<p>Of course, all you really need to write GUI applications on Windows with Python is your trusty ctypes module and a well worn copy of <a href="http://www.charlespetzold.com/pw5/">Petzold</a>. The benefit of this style is that you're working right down at the system API level, with nothing to get in your way. The disadvantage is that you're working right down at the system API level, with nothing to relieve you from all that boilerplate (unless you write your own abstraction layer on top; see Venster, below&#8230;).</p>
<h3>Pros</h3>
<ul>
<li>Enables high level of control</li>
<li>Straightforward if familiar with Win32 API</li>
<li>No added complexity or buried functionality due to need to be cross-platform</li>
<li>Lightest of all Windows GUI programming methods using Python</li>
</ul>
<h3>Cons</h3>
<ul>
<li>All the complexity and inconsistency of Win32 API in gory detail</li>
<li>Lack of high-level libraries (have to write more code)</li>
</ul>
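<p>Much of the boilerplate in this style comes from declaring C structures by hand. As a small platform-independent sketch of what that looks like, the example below defines a made-up <code>POINT</code>-style structure with ctypes. The class name and the <code>offset</code> helper are illustrative assumptions, not part of any Windows header binding.</p>

```python
import ctypes


class POINT(ctypes.Structure):
    # Field order and types must match the C declaration exactly,
    # because ctypes lays the fields out in memory in this order.
    _fields_ = [('x', ctypes.c_long),
                ('y', ctypes.c_long)]

    def offset(self, dx, dy):
        """Return a new POINT moved by (dx, dy)."""
        return POINT(self.x + dx, self.y + dy)


pt = POINT(10, 20)
moved = pt.offset(5, -5)
coords = (moved.x, moved.y)
size = ctypes.sizeof(POINT)
```

<p>Adding ordinary Python methods to a <code>Structure</code> subclass, as here, is one cheap way to start building the abstraction layer mentioned above.</p>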
<p>Hello world example (long, ain't it?):<br />
<img src="/img/hello-win32.png" alt="Win32 GUI screen shot" /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> ctypes <span class="kw1">import</span> *<br />
<span class="kw1">import</span> win32con</p>
<p>WNDPROC = WINFUNCTYPE<span class="br0">&#40;</span>c_long, c_int, c_uint, c_int, c_int<span class="br0">&#41;</span></p>
<p>NULL = c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span><br />
_user32 = windll.<span class="me1">user32</span></p>
<p><span class="kw1">def</span> ErrorIfZero<span class="br0">&#40;</span>handle<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">if</span> handle == <span class="nu0">0</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> WinError<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> handle</p>
<p>CreateWindowEx = _user32.<span class="me1">CreateWindowExW</span><br />
CreateWindowEx.<span class="me1">argtypes</span> = <span class="br0">&#91;</span>c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_wchar_p,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_wchar_p,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#93;</span><br />
CreateWindowEx.<span class="me1">restype</span> = ErrorIfZero</p>
<p>
<span class="kw1">class</span> WNDCLASS<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'style'</span>, c_uint<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpfnWndProc'</span>, WNDPROC<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'cbClsExtra'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'cbWndExtra'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hInstance'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hIcon'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hCursor'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hbrBackground'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpszMenuName'</span>, c_wchar_p<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpszClassName'</span>, c_wchar_p<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;wndProc,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;style=win32con.<span class="me1">CS_HREDRAW</span> | win32con.<span class="me1">CS_VREDRAW</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;clsExtra=<span class="nu0">0</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;wndExtra=<span class="nu0">0</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;menuName=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;className=u<span class="st0">&quot;PythonWin32&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;instance=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;icon=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;cursor=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;background=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="br0">&#41;</span>:</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> instance:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance = windll.<span class="me1">kernel32</span>.<span class="me1">GetModuleHandleW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> icon:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; icon = _user32.<span class="me1">LoadIconW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#40;</span>win32con.<span class="me1">IDI_APPLICATION</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> cursor:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cursor = _user32.<span class="me1">LoadCursorW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#40;</span>win32con.<span class="me1">IDC_ARROW</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> background:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; background = windll.<span class="me1">gdi32</span>.<span class="me1">GetStockObject</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">WHITE_BRUSH</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpfnWndProc</span>=wndProc<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">style</span>=style<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">cbClsExtra</span>=clsExtra<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">cbWndExtra</span>=wndExtra<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hInstance</span>=instance<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hIcon</span>=icon<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hCursor</span>=cursor<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hbrBackground</span>=background<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpszMenuName</span>=menuName<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpszClassName</span>=className</p>
<p><span class="kw1">class</span> RECT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'left'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'top'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'right'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'bottom'</span>, c_long<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, left=<span class="nu0">0</span>, top=<span class="nu0">0</span>, right=<span class="nu0">0</span>, bottom=<span class="nu0">0</span> <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">left</span> = left<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">top</span> = top<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">right</span> = right<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">bottom</span> = bottom</p>
<p><span class="kw1">class</span> PAINTSTRUCT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'hdc'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fErase'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'rcPaint'</span>, RECT<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fRestore'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fIncUpdate'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'rgbReserved'</span>, c_char * <span class="nu0">32</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">class</span> POINT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'x'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'y'</span>, c_long<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span> <span class="kw2">self</span>, x=<span class="nu0">0</span>, y=<span class="nu0">0</span> <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">x</span> = x<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">y</span> = y<br />
&nbsp; &nbsp;<br />
<span class="kw1">class</span> MSG<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'hwnd'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'message'</span>, c_uint<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'wParam'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lParam'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'time'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'pt'</span>, POINT<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp;<br />
<span class="kw1">def</span> pump_messages<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Calls message loop&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; msg = MSG<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; pMsg = pointer<span class="br0">&#40;</span>msg<span class="br0">&#41;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">while</span> _user32.<span class="me1">GetMessageW</span><span class="br0">&#40;</span>pMsg, NULL, <span class="nu0">0</span>, <span class="nu0">0</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">TranslateMessage</span><span class="br0">&#40;</span>pMsg<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">DispatchMessageW</span><span class="br0">&#40;</span>pMsg<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> msg.<span class="me1">wParam</span></p>
<p>
<span class="kw1">class</span> Window<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Wraps an HWND handle&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, hwnd=NULL<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hwnd</span> = hwnd<br />
&nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>._event_handlers = <span class="br0">&#123;</span><span class="br0">&#125;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># Register event handlers</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> key <span class="kw1">in</span> <span class="kw2">dir</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; method = <span class="kw2">getattr</span><span class="br0">&#40;</span><span class="kw2">self</span>, key<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">hasattr</span><span class="br0">&#40;</span>method, <span class="st0">&quot;win32message&quot;</span><span class="br0">&#41;</span> <span class="kw1">and</span> <span class="kw2">callable</span><span class="br0">&#40;</span>method<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>._event_handlers<span class="br0">&#91;</span>method.<span class="me1">win32message</span><span class="br0">&#93;</span> = method<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> GetClientRect<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; rect = RECT<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">GetClientRect</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span>, byref<span class="br0">&#40;</span>rect<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> rect<br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> Create<span class="br0">&#40;</span><span class="kw2">self</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; exStyle=<span class="nu0">0</span> , &nbsp; &nbsp; &nbsp; &nbsp;<span class="co1"># &nbsp;DWORD dwExStyle</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; className=u<span class="st0">&quot;WndClass&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; windowName=u<span class="st0">&quot;Window&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; style=win32con.<span class="me1">WS_OVERLAPPEDWINDOW</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; x=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; y=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; width=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; height=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; menu=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; lparam=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hwnd</span> = CreateWindowEx<span class="br0">&#40;</span>exStyle,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; className,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; windowName,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; style,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; x,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; y,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; width,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; height,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; menu,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; lparam<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">self</span>.<span class="me1">hwnd</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> Show<span class="br0">&#40;</span><span class="kw2">self</span>, flag<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> _user32.<span class="me1">ShowWindow</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span>, flag<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> Update<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> _user32.<span class="me1">UpdateWindow</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> WinError<span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> WndProc<span class="br0">&#40;</span><span class="kw2">self</span>, hwnd, message, wParam, lParam<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; &nbsp; &nbsp; event_handler = <span class="kw2">self</span>._event_handlers.<span class="me1">get</span><span class="br0">&#40;</span>message, <span class="kw2">None</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> event_handler:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> event_handler<span class="br0">&#40;</span>message, wParam, lParam<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> _user32.<span class="me1">DefWindowProcW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>hwnd<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span>message<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span>wParam<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span>lParam<span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p><span class="co1">## Lifted shamelessly from WCK (effbot)'s wckTkinter.bind</span><br />
<span class="kw1">def</span> EventHandler<span class="br0">&#40;</span>message<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Decorator for event handlers&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> decorator<span class="br0">&#40;</span>func<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func.<span class="me1">win32message</span> = message<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> func<br />
&nbsp; &nbsp; <span class="kw1">return</span> decorator</p>
<p><span class="kw1">class</span> HelloWindow<span class="br0">&#40;</span>Window<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;The application window&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; @EventHandler<span class="br0">&#40;</span>win32con.<span class="me1">WM_PAINT</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> OnPaint<span class="br0">&#40;</span><span class="kw2">self</span>, message, wParam, lParam<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Draw 'Hello World' in center of window&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; ps = PAINTSTRUCT<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; hdc = _user32.<span class="me1">BeginPaint</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span><span class="br0">&#41;</span>, byref<span class="br0">&#40;</span>ps<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; rect = <span class="kw2">self</span>.<span class="me1">GetClientRect</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; flags = win32con.<span class="me1">DT_SINGLELINE</span>|win32con.<span class="me1">DT_CENTER</span>|win32con.<span class="me1">DT_VCENTER</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">DrawTextW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>hdc<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; u<span class="st0">&quot;Hello, world!&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span><span class="nu0">-1</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; byref<span class="br0">&#40;</span>rect<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; flags<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">EndPaint</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span><span class="br0">&#41;</span>, byref<span class="br0">&#40;</span>ps<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">0</span></p>
<p>&nbsp; &nbsp; @EventHandler<span class="br0">&#40;</span>win32con.<span class="me1">WM_DESTROY</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> OnDestroy<span class="br0">&#40;</span><span class="kw2">self</span>, message, wParam, lParam<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Quit app when window is destroyed&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">PostQuitMessage</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">0</span></p>
<p><span class="kw1">def</span> RunHello<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Create window and start message loop&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># two-stage creation for Win32 windows</span><br />
&nbsp; &nbsp; hello = HelloWindow<span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># register window class…</span><br />
&nbsp; &nbsp; wndclass = WNDCLASS<span class="br0">&#40;</span>WNDPROC<span class="br0">&#40;</span>hello.<span class="me1">WndProc</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; wndclass.<span class="me1">lpszClassName</span> = u<span class="st0">&quot;HelloWindow&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> _user32.<span class="me1">RegisterClassW</span><span class="br0">&#40;</span>byref<span class="br0">&#40;</span>wndclass<span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> WinError<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="co1"># …then create Window</span><br />
&nbsp; &nbsp; hello.<span class="me1">Create</span><span class="br0">&#40;</span> className=wndclass.<span class="me1">lpszClassName</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance=wndclass.<span class="me1">hInstance</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; windowName=u<span class="st0">&quot;Hello World&quot;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># Show Window</span><br />
&nbsp; &nbsp; hello.<span class="me1">Show</span><span class="br0">&#40;</span>win32con.<span class="me1">SW_SHOWNORMAL</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; hello.<span class="me1">Update</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; pump_messages<span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>RunHello<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
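<p>The EventHandler decorator and the registration loop in Window.__init__ do not depend on Win32, so the dispatch mechanism can be tried on any platform. Here is a minimal sketch under that assumption. FakeWindow, Demo, and the dispatch method are hypothetical stand-ins that are not part of the code above, and the message constants are hard-coded instead of being taken from win32con.</p>

```python
WM_DESTROY = 0x0002  # same values as win32con.WM_DESTROY and WM_PAINT
WM_PAINT = 0x000F


def EventHandler(message):
    """Decorator: tag a method with the message it handles"""
    def decorator(func):
        func.win32message = message
        return func
    return decorator


class FakeWindow(object):
    """Hypothetical stand-in for Window, without any Win32 calls"""

    def __init__(self):
        self._event_handlers = {}
        # Collect every callable attribute tagged by EventHandler
        for key in dir(self):
            method = getattr(self, key)
            if callable(method) and hasattr(method, "win32message"):
                self._event_handlers[method.win32message] = method

    def dispatch(self, message, wParam=0, lParam=0):
        """Mimics WndProc: call the handler, or return None where
        the real code would fall through to DefWindowProcW"""
        handler = self._event_handlers.get(message)
        if handler:
            return handler(message, wParam, lParam)
        return None


class Demo(FakeWindow):
    @EventHandler(WM_PAINT)
    def OnPaint(self, message, wParam, lParam):
        return "painted"


demo = Demo()
```

<p>With this in place, demo.dispatch(WM_PAINT) reaches OnPaint, while demo.dispatch(WM_DESTROY) returns None because Demo registers no handler for that message.</p>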
<h2>Venster</h2>
<p><a href="http://venster.sourceforge.net/htdocs/index.html">Venster</a> was a very promising wrapper over the Win32 API, borrowing heavily from WTL and ATL windowing techniques. Unfortunately, the project hasn't been updated in several years, and doesn't support the latest versions of Python (especially after ctypes.com was dropped). </p>
<h3>Pros</h3>
<ul>
<li>Rational abstraction layer on top of Win32</li>
<li>Can be used to write native, lightweight (relatively speaking) GUI applications</li>
<li>Has most of the cool Win32 tricks like hosting ActiveX and Coolbars</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Out of date; not updated in several years</li>
</ul>
<p>Hello world example (<a href="http://venster.sourceforge.net/htdocs/tutorial.html">code source</a>):<br />
<img src="/img/hello-venster.png" alt="Venster GUI screen shot" /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> venster.<span class="me1">windows</span> <span class="kw1">import</span> *<br />
<span class="kw1">from</span> venster.<span class="me1">wtl</span> <span class="kw1">import</span> *</p>
<p><span class="kw1">from</span> venster <span class="kw1">import</span> gdi</p>
<p><span class="kw1">class</span> MyWindow<span class="br0">&#40;</span>Window<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _window_title_ = <span class="st0">&quot;Hello World&quot;</span><br />
&nbsp; &nbsp; _window_background_ = gdi.<span class="me1">GetStockObject</span><span class="br0">&#40;</span>WHITE_BRUSH<span class="br0">&#41;</span><br />
&nbsp; &nbsp; _window_class_style_ = CS_HREDRAW | CS_VREDRAW</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> OnPaint<span class="br0">&#40;</span><span class="kw2">self</span>, event<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; ps = PAINTSTRUCT<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; hdc = <span class="kw2">self</span>.<span class="me1">BeginPaint</span><span class="br0">&#40;</span>ps<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; rc = <span class="kw2">self</span>.<span class="me1">GetClientRect</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; msg = <span class="st0">&quot;Hello World&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; gdi.<span class="me1">TextOut</span><span class="br0">&#40;</span>hdc, rc.<span class="me1">width</span> / <span class="nu0">2</span>, rc.<span class="me1">height</span> / <span class="nu0">2</span>, msg, <span class="kw2">len</span><span class="br0">&#40;</span>msg<span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">EndPaint</span><span class="br0">&#40;</span>ps<span class="br0">&#41;</span><br />
&nbsp; &nbsp; msg_handler<span class="br0">&#40;</span>WM_PAINT<span class="br0">&#41;</span><span class="br0">&#40;</span>OnPaint<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> OnDestroy<span class="br0">&#40;</span><span class="kw2">self</span>, event<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; PostQuitMessage<span class="br0">&#40;</span>NULL<span class="br0">&#41;</span><br />
&nbsp; &nbsp; msg_handler<span class="br0">&#40;</span>WM_DESTROY<span class="br0">&#41;</span><span class="br0">&#40;</span>OnDestroy<span class="br0">&#41;</span></p>
<p>myWindow = MyWindow<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
application = Application<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
application.<span class="me1">Run</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
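<p>The msg_handler(WM_PAINT)(OnPaint) lines in the Venster example apply a handler-marking function by hand, the spelling that predates the @ decorator syntax added in Python 2.4. The sketch below shows the two spellings side by side. It uses a hypothetical tag function in place of Venster's msg_handler, whose internals are not shown here, so whether the @ form works with Venster itself is an assumption: it is only safe if the marking function returns the function it was given.</p>

```python
def tag(message):
    """Hypothetical stand-in for a handler-marking decorator factory"""
    def decorator(func):
        func.message = message
        return func  # returning func is what makes the @ form safe
    return decorator


class Explicit(object):
    def OnPaint(self, event):
        return "paint"
    tag(0x000F)(OnPaint)      # explicit call, as in the Venster example


class Decorated(object):
    @tag(0x000F)              # decorator syntax, Python 2.4 and later
    def OnPaint(self, event):
        return "paint"
```

<p>Both classes end up with an OnPaint method carrying the same message attribute; the explicit form simply discards the return value.</p>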
<h2>PyGTK</h2>
<p>PyGTK seems to have a lot going for it as a cross-platform toolkit. It's also licensed under the <a href="http://en.wikipedia.org/wiki/GNU_Lesser_General_Public_License">LGPL</a>, which I like a lot more than the <a href="http://en.wikipedia.org/wiki/GNU_General_Public_License">GPL</a> of PyQt. Unfortunately, it doesn't use native Windows widgets; it does a pretty good job of imitating them, but a PyGTK app still stands out in a way that a Win32, .NET, or wxPython app wouldn't.</p>
<h3>Pros</h3>
<ul>
<li>Cross platform</li>
<li>Lots of widgets</li>
<li>Voluminous (if somewhat disorganized) documentation</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Native Win32 widgets not used (looks good, but not quite all the way there)</li>
<li>Must include large runtime when packaging with py2exe</li>
</ul>
<p>Hello world example (<a href="http://www.pygtk.org/pygtk2tutorial/examples/helloworld.py">code source</a>):<br />
<img src="/img/hello-gtk.png" alt="PyGTK screen shot" /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> pygtk<br />
pygtk.<span class="me1">require</span><span class="br0">&#40;</span><span class="st0">'2.0'</span><span class="br0">&#41;</span><br />
<span class="kw1">import</span> gtk</p>
<p><span class="kw1">class</span> HelloWorld:</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> hello<span class="br0">&#40;</span><span class="kw2">self</span>, widget, data=<span class="kw2">None</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;Hello World&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> delete_event<span class="br0">&#40;</span><span class="kw2">self</span>, widget, event, data=<span class="kw2">None</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;delete event occurred&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">False</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> destroy<span class="br0">&#40;</span><span class="kw2">self</span>, widget, data=<span class="kw2">None</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;destroy signal occurred&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; gtk.<span class="me1">main_quit</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span> = gtk.<span class="me1">Window</span><span class="br0">&#40;</span>gtk.<span class="me1">WINDOW_TOPLEVEL</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span>.<span class="me1">connect</span><span class="br0">&#40;</span><span class="st0">&quot;delete_event&quot;</span>, <span class="kw2">self</span>.<span class="me1">delete_event</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span>.<span class="me1">connect</span><span class="br0">&#40;</span><span class="st0">&quot;destroy&quot;</span>, <span class="kw2">self</span>.<span class="me1">destroy</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span>.<span class="me1">set_border_width</span><span class="br0">&#40;</span><span class="nu0">10</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">button</span> = gtk.<span class="me1">Button</span><span class="br0">&#40;</span><span class="st0">&quot;Hello World&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">button</span>.<span class="me1">connect</span><span class="br0">&#40;</span><span class="st0">&quot;clicked&quot;</span>, <span class="kw2">self</span>.<span class="me1">hello</span>, <span class="kw2">None</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">button</span>.<span class="me1">connect_object</span><span class="br0">&#40;</span><span class="st0">&quot;clicked&quot;</span>, gtk.<span class="me1">Widget</span>.<span class="me1">destroy</span>, <span class="kw2">self</span>.<span class="me1">window</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span>.<span class="me1">add</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">button</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">button</span>.<span class="me1">show</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span>.<span class="me1">show</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> main<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; gtk.<span class="me1">main</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; hello = HelloWorld<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; hello.<span class="me1">main</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
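<p>The button in the PyGTK example has two handlers on "clicked": one attached with connect, which receives the button itself, and one attached with connect_object, which receives the window instead. The sketch below imitates that behaviour in plain Python. MiniButton is a hypothetical stand-in and not a GTK class; it assumes handlers run in the order they were connected.</p>

```python
class MiniButton(object):
    """Hypothetical stand-in for a widget; not a GTK class"""

    def __init__(self):
        self._handlers = []

    def connect(self, signal, handler, *data):
        # handler will receive the emitting widget first
        self._handlers.append((signal, handler, None, data))

    def connect_object(self, signal, handler, obj, *data):
        # handler will receive obj in place of the emitting widget
        self._handlers.append((signal, handler, obj, data))

    def emit(self, signal):
        for name, handler, obj, data in self._handlers:
            if name == signal:
                handler(self if obj is None else obj, *data)


log = []
button = MiniButton()
window = "the window"   # placeholder for a real window object
button.connect("clicked", lambda widget, data: log.append("hello"), None)
button.connect_object("clicked",
                      lambda target: log.append("destroy " + target),
                      window)
button.emit("clicked")
```

<p>After the emit call, log holds the "hello" entry followed by the "destroy" entry, mirroring how the real example prints its message and then destroys the window.</p>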
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Intermediate Python: Pythonic file searches</title>
		<link>http://ginstrom.com/scribbles/2008/02/14/intermediate-python-pythonic-file-searches/</link>
		<comments>http://ginstrom.com/scribbles/2008/02/14/intermediate-python-pythonic-file-searches/#comments</comments>
		<pubDate>Thu, 14 Feb 2008 06:14:34 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/02/14/intermediate-python-pythonic-file-searches/</guid>
		<description><![CDATA[It's very easy to get up and running with Python, but programmers coming from other more verbose or procedural languages tend to write code that's not very pythonic &#8212; that is, it doesn't use Python idioms that experienced programmers use.
The problems with un-pythonic code are that it tends to be more verbose, more difficult to [...]]]></description>
			<content:encoded><![CDATA[<p>It's very easy to get up and running with Python, but programmers coming from more verbose or procedural languages tend to write code that's not very <a href="http://faassen.n--tree.net/blog/view/weblog/2005/08/06/0">pythonic</a>: that is, code that doesn't use the idioms experienced Python programmers rely on.</p>
<p>Un-pythonic code tends to be more verbose, harder to understand, and may even run slower. Here's a naive implementation of a function that finds every line containing a specified string in the file with the supplied name. It returns a list of (line_num, line) tuples.</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">def</span> naive_way<span class="br0">&#40;</span>to_find, filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Find string to_find in file filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; file_handle = <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; line_number = <span class="nu0">0</span><br />
&nbsp; &nbsp; lines = <span class="br0">&#91;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; done = <span class="kw2">False</span><br />
&nbsp; &nbsp; <span class="kw1">while</span> done ==