<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>The GITS Blog &#187; python</title>
	<link>http://ginstrom.com/scribbles</link>
	<description>Random scribbling about programming, translation, and Japan</description>
	<pubDate>Sat, 17 May 2008 00:53:04 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.2</generator>
	<language>en</language>
			<item>
		<title>Counting words (etc.) in an HTML file with Python</title>
		<link>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/</link>
		<comments>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/#comments</comments>
		<pubDate>Sat, 17 May 2008 00:50:38 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/</guid>
		<description><![CDATA[In a previous post, I wrote about how to count words, characters, and Asian characters using Python.
In this post I want to pull that together with code to get a word count from an HTML file.
What needs counting
What needs counting depends to some extent on what you need the word count for, but here I'm [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous post, I wrote about <a href="/scribbles/2007/10/06/counting-words-characters-and-asian-characters-with-python/">how to count words, characters, and Asian characters using Python</a>.</p>
<p>In this post I want to pull that together with code to get a word count from an HTML file.</p>
<h2>What needs counting</h2>
<p>What needs counting depends to some extent on what you need the word count for, but here I'll assume that the count is used to measure billable/localizable content.</p>
<p>In that scenario, you've got to count the text in the title tag, as well as the visible text in the body, and certain other localizable content: <code>img</code> <code>alt</code> attributes, <code>a</code> <code>title</code> attributes, and <code>input</code> <code>value</code> attributes (am I missing any?).</p>
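<p>As a rough illustration of that list, here is a minimal sketch that collects the title text and the three kinds of attribute values. It is not the code used in this post: it assumes Python 3 and the standard library's <code>html.parser</code> rather than Beautiful Soup, and the names <code>LocalizableText</code> and <code>LOCALIZABLE_ATTRS</code> are made up for the example. It does not gather the visible body text.</p>

```python
from html.parser import HTMLParser

# (tag, attribute) pairs treated as localizable, taken from the list above.
LOCALIZABLE_ATTRS = {("img", "alt"), ("a", "title"), ("input", "value")}


class LocalizableText(HTMLParser):
    """Hypothetical helper: collect title text and localizable attributes."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []   # pieces of text found inside the title tag
        self.attr_values = []   # alt/title/value strings, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "title":
            self.in_title = True
        for name, value in attrs:
            if (tag, name) in LOCALIZABLE_ATTRS and value:
                self.attr_values.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)
```

<p>To use it, pass the document text to <code>feed()</code>, call <code>close()</code>, and then read <code>title_parts</code> and <code>attr_values</code>.</p>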
<h2>The Code</h2>
<p>The code for counting the actual text is in the post linked above. Here we need code to extract the text from the HTML file, and to accumulate the counts for all the chunks we've extracted.</p>
<p>Here's the Segment class for accumulating counts:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">class</span> Segment<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Represents a text segment.<br />
&nbsp; &nbsp; (For bookkeeping)<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, text=<span class="st0">&quot;&quot;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot; text is the segment of text we will calculate.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Leave it empty if this will be a master count for a document<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; @param text: The text of the segment<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">characters</span> = <span class="kw2">len</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; num_spaces = <span class="kw2">len</span><span class="br0">&#40;</span><span class="br0">&#91;</span>x <span class="kw1">for</span> x <span class="kw1">in</span> text <span class="kw1">if</span> x.<span class="me1">isspace</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">chars_no_spaces</span> = <span class="kw2">self</span>.<span class="me1">characters</span> - num_spaces<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">asian_chars</span> = <span class="kw2">len</span><span class="br0">&#40;</span><span class="br0">&#91;</span>x <span class="kw1">for</span> x <span class="kw1">in</span> text <span class="kw1">if</span> is_asian<span class="br0">&#40;</span>x<span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">non_asian_words</span> = non_j_len<span class="br0">&#40;</span>text<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">words</span> = <span class="kw2">self</span>.<span class="me1">non_asian_words</span> + <span class="kw2">self</span>.<span class="me1">asian_chars</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> accumulate<span class="br0">&#40;</span><span class="kw2">self</span>, seg<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Add the stats from &lt;seg&gt; to this one.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Use this to keep a count for the entire document;<br />
&nbsp; &nbsp; &nbsp; &nbsp; use another for the whole batch of documents<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; @param seg: The segment to accumulate<br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg = Segment(u&quot;</span><span class="st0">&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg2 = Segment(u&quot;</span>abc<span class="st0">&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.accumulate(seg2)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.words<br />
&nbsp; &nbsp; &nbsp; &nbsp; 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.characters<br />
&nbsp; &nbsp; &nbsp; &nbsp; 3<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">words</span> += seg.<span class="me1">words</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">characters</span> += seg.<span class="me1">characters</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">chars_no_spaces</span> += seg.<span class="me1">chars_no_spaces</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">asian_chars</span> += seg.<span class="me1">asian_chars</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">non_asian_words</span> += seg.<span class="me1">non_asian_words</span></div>
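<p>The class above relies on <code>is_asian</code> and <code>non_j_len</code> from the earlier post, which aren't reproduced here. To show how the accumulation works end to end, here is a condensed sketch for Python 3. The two helpers are simplified, hypothetical stand-ins (they only know about kana and the main CJK ideograph block), so treat the numbers as illustrative rather than as what the real helpers would report.</p>

```python
def is_asian(char):
    """Hypothetical stand-in: True for kana and common CJK ideographs."""
    code = ord(char)
    return (code in range(0x3040, 0x3100)      # hiragana and katakana
            or code in range(0x4E00, 0xA000))  # CJK unified ideographs


def non_j_len(text):
    """Hypothetical stand-in: count whitespace-separated non-Asian words."""
    spaced = "".join(" " if is_asian(ch) else ch for ch in text)
    return len(spaced.split())


class Segment(object):
    """Same bookkeeping as the class above, condensed."""

    FIELDS = ("characters", "chars_no_spaces", "asian_chars",
              "non_asian_words", "words")

    def __init__(self, text=""):
        self.characters = len(text)
        self.chars_no_spaces = len([c for c in text if not c.isspace()])
        self.asian_chars = len([c for c in text if is_asian(c)])
        self.non_asian_words = non_j_len(text)
        # Each Asian character counts as one word.
        self.words = self.non_asian_words + self.asian_chars

    def accumulate(self, seg):
        """Add every count from seg to this segment."""
        for field in self.FIELDS:
            setattr(self, field, getattr(self, field) + getattr(seg, field))


# An empty Segment acts as the running total for a document.
total = Segment()
# The second chunk is three kanji, a space, and the word "text".
for chunk in ("spam and eggs", "\u65e5\u672c\u8a9e text"):
    total.accumulate(Segment(chunk))
```

<p>With these stand-ins, the first chunk contributes three words and the second contributes four (three Asian characters plus one non-Asian word).</p>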
<p>Next, the code for extracting (segmenting) the text from an HTML file. For this, you'll need <a href="http://www.crummy.com/software/BeautifulSoup/">the excellent Beautiful Soup module</a>.</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="co1">#coding: UTF8</span><br />
<span class="st0">&quot;&quot;</span><span class="st0">&quot;Html segmenter&quot;</span><span class="st0">&quot;&quot;</span></p>
<p><span class="kw1">from</span> BeautifulSoup <span class="kw1">import</span> BeautifulSoup as bsoup<br />
<span class="kw1">from</span> BeautifulSoup <span class="kw1">import</span> BeautifulStoneSoup<br />
<span class="kw1">import</span> <span class="kw3">re</span></p>
<p><span class="kw1">def</span> normalize<span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Normalize whitespace in C{text}.<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &gt;&gt;&gt; normalize(u&quot;</span> &nbsp; spam\\n\\tspam &nbsp; SPAM<span class="st0">&quot;)<br />
&nbsp; &nbsp; u'spam spam SPAM'<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> u<span class="st0">' '</span>.<span class="me1">join</span><span class="br0">&#40;</span>text.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">class</span> Segmenter<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Html segmenter<br />
&nbsp; &nbsp; Retrieves the editable/translatable text from an HTML document.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Set up various regular expressions for splitting the text&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">pre_parse_stripper</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">&quot;|&quot;</span>.<span class="me1">join</span><span class="br0">&#40;</span><span class="br0">&#91;</span>u<span class="st0">&quot;&lt;body[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/body&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;a[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/a&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;img[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/img&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;input[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/input&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;script*?&gt;[<span class="es0">\s</span><span class="es0">\S</span>]*?&lt;/script&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;form[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/form&gt;&quot;</span><span class="br0">&#93;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Strip out unsightly tags before heading to the splitter&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">splitter</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">'|'</span>.<span class="me1">join</span><span class="br0">&#40;</span><span class="br0">&#91;</span>u<span class="st0">&quot;&lt;p(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/p&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;div(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/div&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;td(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/td&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;li(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/li&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;h<span class="es0">\d</span>(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/h<span class="es0">\d</span>&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;dd(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/dd&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;dt(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/dt&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;br(?:<span class="es0">\s</span>[^&gt;]*)?/?&gt;&quot;</span><span class="br0">&#93;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Split segments by certain tags (removing tags in bargain)<br />
&nbsp; &nbsp; &nbsp; &nbsp; These tags indicate a segment boundary&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">charset_finder</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">'[<span class="es0">\s</span><span class="es0">\S</span>]*&lt;meta[<span class="es0">\s</span><span class="es0">\S</span>]*?charset<span class="es0">\s</span>*=<span class="es0">\s</span>*([<span class="es0">\S</span>]+)&quot;[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;[<span class="es0">\s</span><span class="es0">\S</span>]*'</span>, <span class="kw3">re</span>.<span class="me1">I</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Find the charset if necessary&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">soup</span> = <span class="kw2">None</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__str__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;So we can tell which segger we have (assuming multiple segmenter classes)&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="st0">&quot;HTML&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> get_chunks<span class="br0">&#40;</span><span class="kw2">self</span>, html_text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Extract the text from the HTML file&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">soup</span> = bsoup<span class="br0">&#40;</span>html_text, fromEncoding=<span class="kw2">self</span>.<span class="me1">getEncoding</span><span class="br0">&#40;</span>html_text<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># document title</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">head</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; title = <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">head</span>.<span class="me1">title</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> title:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> title.<span class="kw3">string</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># image alt attributes, anchor title attributes, input value attributes</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> tag, attr <span class="kw1">in</span> <span class="br0">&#40;</span><span class="br0">&#40;</span>u<span class="st0">&quot;img&quot;</span>, u<span class="st0">&quot;alt&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>u<span class="st0">&quot;a&quot;</span>, u<span class="st0">&quot;title&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>u<span class="st0">&quot;input&quot;</span>, u<span class="st0">&quot;value&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> <span class="kw2">self</span>.<span class="me1">getAttributes</span><span class="br0">&#40;</span>tag, attr<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw3">chunk</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> <span class="kw3">chunk</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># Parse the body text</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">body</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; text = <span class="kw2">self</span>.<span class="me1">pre_parse_stripper</span>.<span class="me1">sub</span><span class="br0">&#40;</span>u<span class="st0">&quot;&quot;</span>, <span class="kw2">unicode</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">body</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> <span class="kw2">self</span>.<span class="me1">splitter</span>.<span class="me1">split</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; normal = normalize<span class="br0">&#40;</span>html2plain<span class="br0">&#40;</span><span class="kw3">chunk</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> normal:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> normal<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> getAttributes<span class="br0">&#40;</span><span class="kw2">self</span>, tagName, attrName<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get all attrName values for tagName tags&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; attrs = <span class="br0">&#91;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; tags = <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">findAll</span><span class="br0">&#40;</span>tagName<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> tag <span class="kw1">in</span> tags:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">try</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; attr = tag<span class="br0">&#91;</span>attrName<span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> attr:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; attrs.<span class="me1">append</span><span class="br0">&#40;</span>attr<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">except</span> <span class="kw2">KeyError</span>, e:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">#print &quot;Tag %s does not have attribute %s&quot; % (tagName, attrName)</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">pass</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> attrs<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">def</span> getEncoding<span class="br0">&#40;</span><span class="kw2">self</span>, text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Retrieve the encoding META tag, if present&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; m = <span class="kw2">self</span>.<span class="me1">charset_finder</span>.<span class="me1">match</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> m:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> m.<span class="me1">groups</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">None</span></p>
<p>
TAG_STRIPPER = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">&quot;&lt;[!<span class="es0">\w</span>/][<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;&quot;</span>, <span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> strip_tags<span class="br0">&#40;</span>line<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;strip the HTML tags from the line<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &gt;&gt;&gt; strip_tags(u&quot;</span>&lt;b&gt;spam&lt;/b&gt;<span class="st0">&quot;)<br />
&nbsp; &nbsp; u'spam'<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> TAG_STRIPPER.<span class="me1">sub</span><span class="br0">&#40;</span>u<span class="st0">&quot;&quot;</span>, line<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> html2plain<span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Strips out tags from HTML text<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &gt;&gt;&gt; html2plain('spam&amp;nbsp;&lt;b&gt;eggs&lt;/b&gt;')<br />
&nbsp; &nbsp; u'spam<span class="es0">\\</span>xa0eggs'<br />
&nbsp; &nbsp; &gt;&gt;&gt; html2plain('--&gt;')<br />
&nbsp; &nbsp; u'--&gt;'<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; entities = BeautifulStoneSoup.<span class="me1">HTML_ENTITIES</span><br />
&nbsp; &nbsp; text = <span class="kw2">unicode</span><span class="br0">&#40;</span>BeautifulStoneSoup<span class="br0">&#40;</span>strip_tags<span class="br0">&#40;</span>text<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; convertEntities=entities<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> text.<span class="me1">replace</span><span class="br0">&#40;</span>u<span class="st0">&quot;&amp;#38;gt;&quot;</span>, <span class="st0">&quot;&gt;&quot;</span><span class="br0">&#41;</span>.<span class="me1">replace</span><span class="br0">&#40;</span>u<span class="st0">&quot;&amp;#38;lt;&quot;</span>, <span class="st0">&quot;&lt;&quot;</span><span class="br0">&#41;</span></div>
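<p>For readers on Python 3, where the Beautiful Soup 3 import above isn't available, the strip-then-decode step can be approximated with the standard library alone. This is a sketch under that assumption, not a drop-in replacement: <code>html_to_plain</code> is a name invented for the example, it reuses the tag-stripping pattern from the listing, it swaps in <code>html.unescape</code> for the entity conversion, and it folds the whitespace normalization into the same function.</p>

```python
import html
import re

# Same idea as TAG_STRIPPER above: drop anything shaped like a tag or comment.
TAG_STRIPPER = re.compile(r"<[!\w/][\s\S]*?>")


def html_to_plain(fragment):
    """Strip tags, decode entities, and collapse runs of whitespace."""
    without_tags = TAG_STRIPPER.sub("", fragment)
    # Tags go first, so an escaped angle bracket is never mistaken for markup.
    decoded = html.unescape(without_tags)   # named and numeric entities
    # str.split() with no arguments also splits on non-breaking spaces.
    return " ".join(decoded.split())
```

<p>One difference from the listing: because the result is normalized in the same step, a non-breaking space comes out as an ordinary space instead of being preserved.</p>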
<p>And here's some code to get the actual wordcount:</p>
<div class="dean_ch" style="white-space: nowrap;">
&nbsp; &nbsp; wordcount = docstats.<span class="me1">Segment</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; segger = htmlseg.<span class="me1">Segmenter</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> segger.<span class="me1">get_chunks</span><span class="br0">&#40;</span><span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;thefile.html&quot;</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; wordcount.<span class="me1">accumulate</span><span class="br0">&#40;</span>docstats.<span class="me1">Segment</span><span class="br0">&#40;</span><span class="kw3">chunk</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
<p>Here are the <a href="/code/html_wordcount.tar.gz">docstats and htmlseg modules</a>, and here is an <a href="http://felix-cat.com/tools/wordcount/">online tool using the code for the HTML word counts</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/feed/</wfw:commentRss>
		</item>
		<item>
		<title>What price elegance?</title>
		<link>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/#comments</comments>
		<pubDate>Fri, 21 Mar 2008 04:03:03 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/21/what-price-elegance/</guid>
		<description><![CDATA[In a recent post, I gave some code for counting the top n most frequent words in an arbitrary text file using itertools.groupby.
The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a bit slower than an imperative [...]]]></description>
			<content:encoded><![CDATA[<p><a href="/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/">In a recent post</a>, I gave some code for counting the top n most frequent words in an arbitrary text file using <a href="http://docs.python.org/lib/itertools-functions.html#l2h-1064">itertools.groupby</a>.</p>
<p>The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a bit slower than an imperative style using <a href="http://docs.python.org/lib/defaultdict-objects.html">collections.defaultdict</a>.</p>
<p>Here are the two functions:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
<span class="kw1">from</span> <span class="kw3">collections</span> <span class="kw1">import</span> defaultdict</p>
<p><span class="kw1">def</span> get_top_freqs_gb<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using itertools.groupby<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> get_top_freqs_dd<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using collections.defaultdict<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freq_dict = defaultdict<span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> word <span class="kw1">in</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; freq_dict<span class="br0">&#91;</span>word<span class="br0">&#93;</span> += <span class="nu0">1</span><br />
&nbsp; &nbsp; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span>v, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, v <span class="kw1">in</span> freq_dict.<span class="me1">iteritems</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></div>
<p>Here are the helper functions:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">re</span></p>
<p><span class="kw1">def</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the words from filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; split = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>r<span class="st0">&quot;<span class="es0">\b</span><span class="es0">\w</span>+<span class="es0">\b</span>&quot;</span><span class="br0">&#41;</span>.<span class="me1">findall</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span>word<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> line <span class="kw1">in</span> <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> word <span class="kw1">in</span> split<span class="br0">&#40;</span>line.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></div>
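<p>One edge case in <code>get_top</code> is worth knowing about: when <code>num</code> is 0, the slice <code>[num*-1:]</code> becomes <code>[0:]</code> and keeps every pair instead of none. The sketch below (Python 3 syntax) shows this with made-up sample data; <code>get_top_guarded</code> is a hypothetical variant, not part of the original code.</p>

```python
def get_top(freqs, num):
    """Same logic as the helper above: freqs holds (count, word) pairs."""
    return [(b, a) for a, b in reversed(sorted(freqs)[num * -1:])]


def get_top_guarded(freqs, num):
    """Hypothetical variant that returns an empty list when num is 0 or less."""
    if num <= 0:
        return []
    return get_top(freqs, num)


freqs = [(3, "the"), (1, "python"), (2, "to")]   # made-up sample data
top_two = get_top(freqs, 2)
top_zero = get_top(freqs, 0)              # the slice [0:] keeps every pair
guarded_zero = get_top_guarded(freqs, 0)  # explicit guard returns nothing
```

<p>Whether the guard is worth having depends on whether callers can ever pass 0; for the fixed values used in this post it makes no difference.</p>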
<p>The groupby version is shorter than the defaultdict version, and I'd say that it's simpler and more readable as well. Because it's shorter, the groupby version is less likely to contain bugs. In particular, the defaultdict version has a mutable local variable (used as an accumulator in the for loop), which is a classic source of bugs. The groupby version is also likely to be easier to maintain because it's shorter and simpler.</p>
<p>But the defaultdict version of the function winds up being measurably faster.</p>
<p>The times it took to run these functions 10 times on my computer, retrieving the top 50 most frequent words for "/python25/readme.txt", are as follows (seconds rounded to 4 decimal places).</p>
<table>
<tr>
<th>&nbsp;</th>
<th>Without psyco</th>
<th>With psyco</th>
</tr>
<tr>
<th align="left">groupby version</th>
<td align="center"><font color="red">0.3133 s</font></td>
<td align="center"><font color="red">0.2193 s</font></td>
</tr>
<tr>
<th align="left">defaultdict version</th>
<td align="center"><font color="green">0.2852 s</font></td>
<td align="center"><font color="green">0.1818 s</font></td>
</tr>
<tr>
<th align="left">groupby / defaultdict</th>
<td align="center">1.10</td>
<td align="center">1.21</td>
</tr>
</table>
<p>Going by these timings, the defaultdict version is about 1.1x as fast as the groupby version. The gap widens when psyco is used, making the defaultdict version about 1.2x as fast. I'd say that most of the reason for the slowness is that the groupby version of the function performs two sorts, compared to one sort in the defaultdict version.</p>
<p>(The psyco speedup for the defaultdict version comes from the for loop; changing <code>get_words</code> to return a generator expression eliminates the speedup. The speedup for the groupby version comes from the <code>freqs</code> <a href="http://docs.python.org/tut/node7.html#SECTION007140000000000000000">list comprehension</a>; changing this to a generator expression eliminates its speedup.)</p>
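<p>If the sorting does turn out to be the bottleneck, one option that is not part of the timings above is to pick the top entries with <code>heapq.nlargest</code> instead of sorting the whole frequency list. This is only a sketch in Python 3 syntax; <code>top_freqs_heap</code> and the sample word list are made up for illustration, and I have not timed it against the two versions.</p>

```python
from collections import defaultdict
from heapq import nlargest


def top_freqs_heap(words, num):
    """Count words with a defaultdict, then pick the num largest
    (count, word) pairs without sorting the full list."""
    freq_dict = defaultdict(int)
    for word in words:
        freq_dict[word] += 1
    pairs = [(count, word) for word, count in freq_dict.items()]
    # nlargest compares the (count, word) tuples, largest first
    return [(word, count) for count, word in nlargest(num, pairs)]


words = "the cat and the hat and the bat".split()  # made-up sample input
top = top_freqs_heap(words, 2)
```

<p>Because it compares the same (count, word) tuples, ties should come out in the same order as with <code>get_top</code>.</p>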
<h3>So which one should I use?</h3>
<p>It's pretty common for Python code written in a functional style to be slower than equivalent code written in an imperative style. Nevertheless, I tend to prefer the more functional style of programming, switching to a more imperative style (or <a href="/scribbles/2007/12/02/extending-python-with-c-a-case-study/">other forms of optimization</a>) if performance isn't satisfactory.</p>
<blockquote><p>It is easier to optimize correct code, than correct optimized code.
</p></blockquote>
<p align="right"><em>&#8211;Yves Deville</em></p>
<p>A big question here is how to tell if the functional version is fast enough. My general rule of thumb is that the user would be prepared to wait up to two seconds for a typical "grovel through these files and tell me something interesting" command that's performed infrequently (how frequently do you need to get word frequencies from files?). For a more common action, the wait time should be under a second, with under 0.5 seconds being optimal (this includes GUI responsiveness but not Web page loading).</p>
<p>Given the times above, and assuming that the user will search no more than 50 files of sizes comparable to <a href="http://svn.python.org/view/python/branches/release25-maint/README?rev=59483">Python's README file</a>, then either version of the function is sufficient. If we assume that the user will search up to 100 files, or files substantially larger than the README file, then only the imperative version is acceptable (and we may need to optimize this further if our demands are higher than this).</p>
<p>That's why it's so important to profile and test Python programs from the very beginning. I keep a suite of test cases that I profile with every build (performed at least daily), noting trends in performance and optimizing when the code has gelled and bottlenecks remain.</p>
<p>Here is the test code:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> <span class="kw3">time</span> <span class="kw1">import</span> clock</p>
<p><span class="kw1">def</span> time_func<span class="br0">&#40;</span>func, iterations, *args, **kwargs<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Return the time it takes to execute func<br />
&nbsp; &nbsp; iterations times.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; start = clock<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span>iterations<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func<span class="br0">&#40;</span>*args, **kwargs<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> clock<span class="br0">&#40;</span><span class="br0">&#41;</span> - start</p>
<p><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; filename = <span class="st0">&quot;/python25/readme.txt&quot;</span><br />
&nbsp; &nbsp; top_gb = get_top_freqs_gb<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; top_dd = get_top_freqs_dd<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">assert</span> top_gb == top_dd<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;With psyco&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">import</span> psyco<br />
&nbsp; &nbsp; psyco.<span class="me1">full</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; main<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
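<p>The harness above targets Python 2.5. On current Python 3, <code>time.clock</code> no longer exists (it was removed in Python 3.8) and psyco is no longer maintained, so a rough equivalent of <code>time_func</code> could use <code>time.perf_counter</code> instead. This is a sketch only; <code>count_words</code> is a stand-in workload, not one of the functions from this post.</p>

```python
from time import perf_counter


def time_func(func, iterations, *args, **kwargs):
    """Return the seconds it takes to call func iterations times."""
    start = perf_counter()
    for _ in range(iterations):
        func(*args, **kwargs)
    return perf_counter() - start


def count_words(text):
    """Stand-in workload: number of whitespace-separated words."""
    return len(text.split())


seconds = time_func(count_words, 10, "the quick brown fox " * 1000)
print("%s: %.4f s" % (count_words.__name__, seconds))
```

<p>The standard library's <code>timeit</code> module is another option when you want repeated runs handled for you.</p>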
<p>The whole shebang:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="co1">#coding: UTF8</span><br />
<span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
Testing functional programming stuff<br />
&quot;</span><span class="st0">&quot;&quot;</span></p>
<p><span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
<span class="kw1">from</span> <span class="kw3">collections</span> <span class="kw1">import</span> defaultdict<br />
<span class="kw1">import</span> <span class="kw3">re</span><br />
<span class="kw1">from</span> <span class="kw3">time</span> <span class="kw1">import</span> clock</p>
<p><span class="kw1">def</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the words from filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; split = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>r<span class="st0">&quot;<span class="es0">\b</span><span class="es0">\w</span>+<span class="es0">\b</span>&quot;</span><span class="br0">&#41;</span>.<span class="me1">findall</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span>word<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> line <span class="kw1">in</span> <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> word <span class="kw1">in</span> split<span class="br0">&#40;</span>line.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top_freqs_gb<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using itertools.groupby<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> get_top_freqs_dd<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using collections.defaultdict<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freq_dict = defaultdict<span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> word <span class="kw1">in</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; freq_dict<span class="br0">&#91;</span>word<span class="br0">&#93;</span> += <span class="nu0">1</span><br />
&nbsp; &nbsp; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span>v, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, v <span class="kw1">in</span> freq_dict.<span class="me1">iteritems</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> time_func<span class="br0">&#40;</span>func, iterations, *args, **kwargs<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Return the time it takes to execute func<br />
&nbsp; &nbsp; iterations times.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; start = clock<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span>iterations<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func<span class="br0">&#40;</span>*args, **kwargs<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> clock<span class="br0">&#40;</span><span class="br0">&#41;</span> - start</p>
<p><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; filename = <span class="st0">&quot;/python25/readme.txt&quot;</span><br />
&nbsp; &nbsp; top_gb = get_top_freqs_gb<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; top_dd = get_top_freqs_dd<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">assert</span> top_gb == top_dd<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;With psyco&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">import</span> psyco<br />
&nbsp; &nbsp; psyco.<span class="me1">full</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; main<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Counting occurrences in a sequence with itertools.groupby</title>
		<link>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 05:29:38 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/</guid>
		<description><![CDATA[itertools.groupby is a great tool for counting the numbers of occurrences in a sequence.
Here are some examples from the interactive interpreter.
A list of numbers

&#62;&#62;&#62; # Create a random list of numbers
&#62;&#62;&#62; from random import random
&#62;&#62;&#62; numbers = &#91;int&#40;random&#40;&#41; * 10&#41; for x in range&#40;20&#41;&#93;
&#62;&#62;&#62; numbers
&#91;8, 0, 3, 2, 3, 9, 8, 2, 8, 3, 0, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://docs.python.org/lib/itertools-functions.html#l2h-1064">itertools.groupby</a> is a great tool for counting the numbers of occurrences in a sequence.</p>
<p>Here are some examples from the interactive interpreter.</p>
<h3>A list of numbers</h3>
<div class="dean_ch" style="white-space: nowrap;">
&gt;&gt;&gt; <span class="co1"># Create a random list of numbers</span><br />
&gt;&gt;&gt; <span class="kw1">from</span> <span class="kw3">random</span> <span class="kw1">import</span> <span class="kw3">random</span><br />
&gt;&gt;&gt; numbers = <span class="br0">&#91;</span><span class="kw2">int</span><span class="br0">&#40;</span><span class="kw3">random</span><span class="br0">&#40;</span><span class="br0">&#41;</span> * <span class="nu0">10</span><span class="br0">&#41;</span> <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">range</span><span class="br0">&#40;</span><span class="nu0">20</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; numbers<br />
<span class="br0">&#91;</span><span class="nu0">8</span>, <span class="nu0">0</span>, <span class="nu0">3</span>, <span class="nu0">2</span>, <span class="nu0">3</span>, <span class="nu0">9</span>, <span class="nu0">8</span>, <span class="nu0">2</span>, <span class="nu0">8</span>, <span class="nu0">3</span>, <span class="nu0">0</span>, <span class="nu0">2</span>, <span class="nu0">3</span>, <span class="nu0">8</span>, <span class="nu0">6</span>, <span class="nu0">5</span>, <span class="nu0">3</span>, <span class="nu0">6</span>, <span class="nu0">1</span>, <span class="nu0">8</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># Now create a dictionary of numbers and numbers </span><br />
&gt;&gt;&gt; <span class="co1"># of occurrences. Feed generator expression of </span><br />
&gt;&gt;&gt; <span class="co1"># (number, frequency) pairs to dict().</span><br />
&gt;&gt;&gt; <span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
&gt;&gt;&gt; valdict = <span class="kw2">dict</span><span class="br0">&#40;</span><span class="br0">&#40;</span>k, <span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>numbers<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; <span class="kw1">for</span> key, val <span class="kw1">in</span> valdict.<span class="me1">items</span><span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">print</span> key, <span class="st0">&quot;:&quot;</span>, val</p>
<p>&nbsp; &nbsp; <br />
<span class="nu0">0</span> : <span class="nu0">2</span><br />
<span class="nu0">1</span> : <span class="nu0">1</span><br />
<span class="nu0">2</span> : <span class="nu0">3</span><br />
<span class="nu0">3</span> : <span class="nu0">5</span><br />
<span class="nu0">5</span> : <span class="nu0">1</span><br />
<span class="nu0">6</span> : <span class="nu0">2</span><br />
<span class="nu0">8</span> : <span class="nu0">5</span><br />
<span class="nu0">9</span> : <span class="nu0">1</span></div>
<p>And a function that does this for any iterable:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby</p>
<p><span class="kw1">def</span> count_occurrences<span class="br0">&#40;</span>iterable<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;return a dictionary with items and numbers of occurrences<br />
&nbsp; &nbsp; in iterable&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">dict</span><span class="br0">&#40;</span><span class="br0">&#40;</span>item, <span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>group<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> item, group<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>iterable<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
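<p>As a quick check, here is the same function in Python 3 syntax, run on the list of numbers from the session above and compared against <code>collections.Counter</code> from the standard library. Counter is not what this post uses; I mention it because it only needs hashable items, whereas <code>sorted()</code> needs items that can be ordered against each other.</p>

```python
from collections import Counter
from itertools import groupby


def count_occurrences(iterable):
    """Return a dictionary of items and their numbers of occurrences."""
    return dict((item, len(list(group)))
                for item, group in groupby(sorted(iterable)))


# The list of numbers from the interpreter session above
numbers = [8, 0, 3, 2, 3, 9, 8, 2, 8, 3, 0, 2, 3, 8, 6, 5, 3, 6, 1, 8]
by_groupby = count_occurrences(numbers)
by_counter = dict(Counter(numbers))
```

<p>Both dictionaries hold the same counts; only the route to them differs.</p>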
<h3>Top 20 most frequent words in a file</h3>
<div class="dean_ch" style="white-space: nowrap;">
&gt;&gt;&gt; <span class="co1"># get a wordlist from the Python README</span><br />
&gt;&gt;&gt; text = <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;/python25/readme.txt&quot;</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; words = text.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; words<span class="br0">&#91;</span>:<span class="nu0">5</span><span class="br0">&#93;</span><br />
<span class="br0">&#91;</span><span class="st0">'this'</span>, <span class="st0">'is'</span>, <span class="st0">'python'</span>, <span class="st0">'version'</span>, <span class="st0">'2.5.2'</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># get the frequency list, using DSU to sort top words</span><br />
&gt;&gt;&gt; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp;<span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>words<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># sort the freqs, get last 20, and reverse </span><br />
&gt;&gt;&gt; <span class="co1"># to put most frequent first</span><br />
&gt;&gt;&gt; <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">-20</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s %s&quot;</span> % <span class="br0">&#40;</span>b.<span class="me1">ljust</span><span class="br0">&#40;</span><span class="nu0">7</span><span class="br0">&#41;</span>, <span class="kw2">str</span><span class="br0">&#40;</span>a<span class="br0">&#41;</span>.<span class="me1">rjust</span><span class="br0">&#40;</span><span class="nu0">3</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <br />
the &nbsp; &nbsp; <span class="nu0">442</span><br />
to &nbsp; &nbsp; &nbsp;<span class="nu0">227</span><br />
<span class="kw1">is</span> &nbsp; &nbsp; &nbsp;<span class="nu0">127</span><br />
<span class="kw1">and</span> &nbsp; &nbsp; <span class="nu0">127</span><br />
you &nbsp; &nbsp; <span class="nu0">118</span><br />
a &nbsp; &nbsp; &nbsp; <span class="nu0">117</span><br />
of &nbsp; &nbsp; &nbsp;<span class="nu0">110</span><br />
<span class="kw1">in</span> &nbsp; &nbsp; &nbsp;<span class="nu0">107</span><br />
<span class="kw1">for</span> &nbsp; &nbsp; &nbsp;<span class="nu0">94</span><br />
python &nbsp; <span class="nu0">81</span><br />
on &nbsp; &nbsp; &nbsp; <span class="nu0">79</span><br />
<span class="kw1">if</span> &nbsp; &nbsp; &nbsp; <span class="nu0">77</span><br />
this &nbsp; &nbsp; <span class="nu0">72</span><br />
<span class="kw1">or</span> &nbsp; &nbsp; &nbsp; <span class="nu0">62</span><br />
be &nbsp; &nbsp; &nbsp; <span class="nu0">58</span><br />
with &nbsp; &nbsp; <span class="nu0">56</span><br />
it &nbsp; &nbsp; &nbsp; <span class="nu0">53</span><br />
are &nbsp; &nbsp; &nbsp;<span class="nu0">53</span><br />
that &nbsp; &nbsp; <span class="nu0">52</span><br />
as &nbsp; &nbsp; &nbsp; <span class="nu0">47</span></div>
<p>Here's a function that will do this.</p>
<div class="dean_ch" style="white-space: nowrap;">
<p><span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby</p>
<p><span class="kw1">def</span> get_top_freqs<span class="br0">&#40;</span>filename, num=<span class="nu0">20</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; text = <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; words = text.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; freqs = <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span> <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>words<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></div>
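<p>Note that <code>split()</code> breaks the text on whitespace only, so punctuation stays attached to the words (which is why <code>'2.5.2'</code> shows up as a single word above). A regular expression such as <code>\b\w+\b</code> gives different tokens. A small comparison in Python 3 syntax, using a made-up sample sentence:</p>

```python
import re

# Made-up sample text, not taken from the README
sample = "This is Python version 2.5.2. Python is free; see the README."

# Whitespace splitting keeps trailing punctuation on each token
split_words = sample.lower().split()

# The regular expression keeps only runs of word characters
regex_words = re.findall(r"\b\w+\b", sample.lower())
```

<p>Which behaviour you want depends on what you count as a word; the frequencies for common words can differ between the two.</p>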
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Using chardet to convert arbitrary byte strings to Unicode</title>
		<link>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/#comments</comments>
		<pubDate>Sat, 08 Mar 2008 02:24:36 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/</guid>
		<description><![CDATA[chardet is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a BOM to pretty reliably turn them into Unicode.
Edit: Thanks to Kirit's comment below, I added code to check for UTF-32.

import chardet
def bytes2unicode&#40;bytes, errors='replace'&#41;:
&#160; &#160; &#34;&#34;&#34;Convert a byte string into Unicode.
&#160; &#160; First checks [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://chardet.feedparser.org/">chardet</a> is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a <a href="http://en.wikipedia.org/wiki/Byte_Order_Mark">BOM</a> to pretty reliably turn them into Unicode.</p>
<p><strong>Edit:</strong> Thanks to Kirit's comment below, I added code to check for UTF-32.</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> chardet</p>
<p><span class="kw1">def</span> bytes2unicode<span class="br0">&#40;</span>bytes, errors=<span class="st0">'replace'</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Convert a byte string into Unicode.<br />
&nbsp; &nbsp; First checks for a BOM, and if one is found returns<br />
&nbsp; &nbsp; the Unicode text minus the BOM. If there is no BOM,<br />
&nbsp; &nbsp; falls back to chardet.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="co1"># The BOM is stripped before decoding, so name the byte order explicitly</span><br />
&nbsp; &nbsp; encoding_map = <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ef<span class="es0">\x</span>bb<span class="es0">\x</span>bf'</span>, <span class="st0">'utf-8'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ff<span class="es0">\x</span>fe<span class="es0">\0</span><span class="es0">\0</span>'</span>, <span class="st0">'utf-32-le'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\0</span><span class="es0">\0</span><span class="es0">\x</span>fe<span class="es0">\x</span>ff'</span>, <span class="st0">'UTF-32BE'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ff<span class="es0">\x</span>fe'</span>, <span class="st0">'utf-16-le'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>fe<span class="es0">\x</span>ff'</span>, <span class="st0">'UTF-16BE'</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> bom, encoding <span class="kw1">in</span> encoding_map:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> bytes.<span class="me1">startswith</span><span class="br0">&#40;</span>bom<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">unicode</span><span class="br0">&#40;</span>bytes<span class="br0">&#91;</span><span class="kw2">len</span><span class="br0">&#40;</span>bom<span class="br0">&#41;</span>:<span class="br0">&#93;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;encoding,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;errors=errors<span class="br0">&#41;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="co1"># No BOM found, so use chardet</span><br />
&nbsp; &nbsp; detection = chardet.<span class="me1">detect</span><span class="br0">&#40;</span>bytes<span class="br0">&#41;</span><br />
&nbsp; &nbsp; encoding = detection.<span class="me1">get</span><span class="br0">&#40;</span><span class="st0">'encoding'</span><span class="br0">&#41;</span> <span class="kw1">or</span> <span class="st0">'utf-16'</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">unicode</span><span class="br0">&#40;</span>bytes, encoding, errors=errors<span class="br0">&#41;</span></div>
<p>Usage:</p>
<div class="dean_ch" style="white-space: nowrap;">
text = bytes2unicode<span class="br0">&#40;</span><span class="kw2">open</span><span class="br0">&#40;</span>filename, <span class="st0">'rb'</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span>, <span class="st0">'replace'</span><span class="br0">&#41;</span></div>
<h3>Discussion: Why check for a BOM?</h3>
<p>You might ask why you should check for a BOM when chardet already does so. Although chardet uses the BOM to detect the encoding, it doesn't tell you that it found one, so you won't know to chop it off before processing the text. In most cases, that means you have to check for a BOM yourself anyway.</p>
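<p>As a rough illustration of the point above, here is a minimal standalone sketch of the BOM check. It is written for Python 3 and uses the BOM constants from the standard <code>codecs</code> module rather than the hand-written byte strings in the function above; the names <code>BOMS</code> and <code>strip_bom</code> are made up for this example. It also shows that plain UTF-8 decoding leaves the BOM in the text as U+FEFF, which is why it has to be chopped off.</p>

```python
import codecs

# Hypothetical table for this sketch. Longer BOMs are listed first,
# because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
BOMS = (
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
)


def strip_bom(data):
    """Return (payload, encoding); encoding is None when no BOM is found."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):], encoding
    return data, None


# A UTF-16-LE byte string with its BOM: strip the BOM, then decode the rest.
raw = codecs.BOM_UTF16_LE + 'h\xe9llo'.encode('utf-16-le')
payload, encoding = strip_bom(raw)
text = payload.decode(encoding)

# Decoding without stripping keeps the BOM as a leading U+FEFF character.
leftover = (codecs.BOM_UTF8 + b'abc').decode('utf-8')
```

<p>The UTF-32 entries come before the UTF-16 ones because the little-endian UTF-32 BOM starts with the same two bytes as the little-endian UTF-16 BOM, so checking UTF-16 first would misidentify UTF-32 data.</p>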
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Python GUI programming platforms for Windows</title>
		<link>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/</link>
		<comments>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/#comments</comments>
		<pubDate>Tue, 26 Feb 2008 06:00:57 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/</guid>
		<description><![CDATA[[Edit]
By popular demand, I've added a section on PyGTK. See bottom of post.
There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to the source.
Tkinter
Tkinter is the ubiquitous [...]]]></description>
			<content:encoded><![CDATA[<p><b>[Edit]</b><br />
By popular demand, I've added a section on PyGTK. See bottom of post.</p>
<p>There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to the source.</p>
<h2>Tkinter</h2>
<p>Tkinter is the ubiquitous GUI toolkit for Python. It's cross platform and easy to use, but it looks non-native on just about every platform. There are various add-ons and improvements you can find to improve the look and feel, but the basic problem is that the toolkit implements its own widgets, rather than using the native ones provided on the platform.</p>
<h3>Pros</h3>
<ul>
<li>Most portable GUI toolkit for Python</li>
<li>Very easy to use, with pythonic API</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Non-native look and feel out of the box</li>
</ul>
<p>Hello world example <a href="http://www.shido.info/py/tkinter1.html" title="source of code snippet">(code source)</a>:<br />
<img src="/img/hello-tkinter.png" border="0"/></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">Tkinter</span> as Tk<br />
la = Tk.<span class="me1">Label</span><span class="br0">&#40;</span><span class="kw2">None</span>, text=<span class="st0">'Hello World!'</span>, font=<span class="br0">&#40;</span><span class="st0">'Times'</span>, <span class="st0">'18'</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
la.<span class="me1">pack</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
la.<span class="me1">mainloop</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>wxPython</h2>
<p><a href="http://www.wxpython.org/">wxPython</a> is probably the most popular GUI toolkit for Python. It's a wrapper for the <a href="http://www.wxwidgets.org/">wxWidgets</a> C++ toolkit, and as such it betrays a few unpythonic edges (like lumpy case, getters and setters, and funky C++ errors creeping up occasionally). There are a few pythonification efforts on top of wxPython, such as <a href="http://dabodev.com/">dabo</a> and (the now apparently moribund) <a href="http://sourceforge.net/projects/waxgui">wax</a>.</p>
<h3>Pros</h3>
<ul>
<li>Highly cross platform</li>
<li>Relatively mature and robust</li>
<li>Uses native Windows widgets for authentic look and feel</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Must include large wx runtime when packaging with py2exe (adds ~7 MB)</li>
<li>Cross platform nature makes accessing some native platform features (like ActiveX) difficult to impossible</li>
</ul>
<p>Hello world example <a href="http://www.goldb.org/goldblog/PermaLink,guid,d109ef8a-c3ea-4a2b-8ab7-9081c4dcc912.aspx" title="snippet source">(code source)</a>:<br />
<img src="/img/hello-wxpython.png" border=0 /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> wx</p>
<p><span class="kw1">class</span> Application<span class="br0">&#40;</span>wx.<span class="me1">Frame</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, parent<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; wx.<span class="me1">Frame</span>.<span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, parent, <span class="nu0">-1</span>, <span class="st0">'My GUI'</span>, size=<span class="br0">&#40;</span><span class="nu0">300</span>, <span class="nu0">200</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; panel = wx.<span class="me1">Panel</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; sizer = wx.<span class="me1">BoxSizer</span><span class="br0">&#40;</span>wx.<span class="me1">VERTICAL</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; panel.<span class="me1">SetSizer</span><span class="br0">&#40;</span>sizer<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; txt = wx.<span class="me1">StaticText</span><span class="br0">&#40;</span>panel, <span class="nu0">-1</span>, <span class="st0">'Hello World!'</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; sizer.<span class="me1">Add</span><span class="br0">&#40;</span>txt, <span class="nu0">0</span>, wx.<span class="me1">TOP</span>|wx.<span class="me1">LEFT</span>, <span class="nu0">20</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Centre</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Show</span><span class="br0">&#40;</span><span class="kw2">True</span><span class="br0">&#41;</span></p>
<p>app = wx.<span class="me1">App</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="br0">&#41;</span><br />
Application<span class="br0">&#40;</span><span class="kw2">None</span><span class="br0">&#41;</span><br />
app.<span class="me1">MainLoop</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<h2>.NET with IronPython</h2>
<p><a href="http://www.codeplex.com/IronPython">IronPython</a> is a .NET implementation of Python. As of 1.0 it has full support for Python 2.4 features, and the 2.0 version will duplicate the Python 2.5 feature set. Although many CPython libraries/modules won't run under IronPython (namely, the ones relying on compiled extensions that have not yet been ported), this lack is partially made up for by the huge .NET library.</p>
<p>One cool thing about IronPython is that you can easily create lightweight .exe files that you can ship off to your friends &#8212; although you pay for this with a dependency on the .NET runtime, which you can't count on random Windows users to have installed.</p>
<p>Of course, when you go the IronPython route, you take all that comes with it: the good things, like access to .NET libraries and possibly the easiest/cleanest optimization path of any Python implementation (C#); and the bad things, like dependence on the .NET runtime and danger of getting caught on the MS upgrade treadmill.</p>
<p>Another way of getting at the .NET libraries is <a href="http://pythonnet.sourceforge.net/">Python.NET</a>, which adds two files to your Python directory to enable you to call the CLR from CPython.</p>
<h3>Pros</h3>
<ul>
<li>Leverage .NET libraries</li>
<li>Easily create .exe files</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Depends on .NET runtime</li>
</ul>
<p>Hello world example <a href="http://www.voidspace.org.uk/ironpython/winforms/part2.shtml" title="snippet source">(code source)</a>:<br />
<img src="/img/hello-ipy.png" border=0 /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">sys</span><br />
<span class="kw3">sys</span>.<span class="me1">path</span>.<span class="me1">append</span><span class="br0">&#40;</span>r<span class="st0">'C:<span class="es0">\P</span>ython24<span class="es0">\L</span>ib'</span><span class="br0">&#41;</span></p>
<p><span class="kw1">import</span> clr<br />
clr.<span class="me1">AddReference</span><span class="br0">&#40;</span><span class="st0">&quot;System.Windows.Forms&quot;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">from</span> System.<span class="me1">Windows</span>.<span class="me1">Forms</span> <span class="kw1">import</span> Application, Form</p>
<p><span class="kw1">class</span> HelloWorldForm<span class="br0">&#40;</span>Form<span class="br0">&#41;</span>:</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Text</span> = <span class="st0">'Hello World'</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Name</span> = <span class="st0">'Hello World'</span></p>
<p>form = HelloWorldForm<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
Application.<span class="me1">Run</span><span class="br0">&#40;</span>form<span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>PyQT</h2>
<p><a href="http://www.riverbankcomputing.co.uk/pyqt/">PyQT</a> is probably the third most widely used GUI toolkit, after wxPython and Tkinter. It has a dual commercial/GPL license (<ins datetime="2008-02-27T22:23:05+00:00">Edit: but it does let you use other open-source licenses; see comments below</ins>). I have to admit that this made it a non-starter for me: I don't want to pay for my toolkit when there are others just as good or better that are free; <del datetime="2008-02-27T22:23:05+00:00">and when I do release open-source software, I want to choose my own license</del>. For others, the GPL might be a non-issue or a plus, so I've left it off my pro/con list.</p>
<h3>Pros</h3>
<ul>
<li>Highly cross platform</li>
<li>Very easy to use</li>
<li>Highly mature</li>
<li>Decent looking widgets</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Somewhat non-native look and feel (though much better than Tkinter)</li>
<li>Must include large runtime when packaging with py2exe</li>
</ul>
<p>Hello world example (from PyQT docs):</p>
<div><img src="/img/hello-qt.png" alt="PyQT screen shot" /></div>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> <span class="kw3">sys</span><br />
<span class="kw1">from</span> PyQt4 <span class="kw1">import</span> QtGui</p>
<p>app = QtGui.<span class="me1">QApplication</span><span class="br0">&#40;</span><span class="kw3">sys</span>.<span class="me1">argv</span><span class="br0">&#41;</span></p>
<p>hello = QtGui.<span class="me1">QPushButton</span><span class="br0">&#40;</span><span class="st0">&quot;Hello world!&quot;</span><span class="br0">&#41;</span><br />
hello.<span class="me1">resize</span><span class="br0">&#40;</span><span class="nu0">100</span>, <span class="nu0">30</span><span class="br0">&#41;</span></p>
<p>hello.<span class="me1">show</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p><span class="kw3">sys</span>.<span class="me1">exit</span><span class="br0">&#40;</span>app.<span class="me1">exec_</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
<h2>Pyglet</h2>
<p><a href="http://www.pyglet.org/">Pyglet</a> is kind of the new kid on the block in terms of GUI toolkits, but it sure made a splash. It implements its own windowing system, but with no dependencies other than Python (for Python 2.5 users). You will need <a href="http://www.opengl.org/">OpenGL</a> to do decent 3D graphics, but that's hardly a black mark for pyglet &#8212; other libraries would love to make it this easy.</p>
<h3>Pros</h3>
<ul>
<li>High degree of freedom for GUI creation</li>
<li>Only depends on Python</li>
<li>Large number of widgets</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Purposely doesn't duplicate the native platform look and feel</li>
<li>Although there are a lot of widgets, you'll have to roll your own for many things the platform gives you for free.</li>
</ul>
<p>Hello world example (slightly modified from <a href="http://www.pyglet.org/doc/programming_guide/hello_world.html">code source</a>):<br />
<img src="/img/hello-pyglet.png" alt="hello world with pyglet screenshot" border=0 /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> pyglet <span class="kw1">import</span> font<br />
<span class="kw1">from</span> pyglet <span class="kw1">import</span> window</p>
<p>win = window.<span class="me1">Window</span><span class="br0">&#40;</span>width=<span class="nu0">300</span>, height=<span class="nu0">150</span>, caption=<span class="st0">&quot;Hello World&quot;</span><span class="br0">&#41;</span></p>
<p>ft = font.<span class="me1">load</span><span class="br0">&#40;</span><span class="st0">'Arial'</span>, <span class="nu0">36</span><span class="br0">&#41;</span><br />
text = font.<span class="me1">Text</span><span class="br0">&#40;</span>ft, <span class="st0">'Hello, World!'</span><span class="br0">&#41;</span></p>
<p><span class="kw1">while</span> <span class="kw1">not</span> win.<span class="me1">has_exit</span>:<br />
&nbsp; &nbsp; win.<span class="me1">dispatch_events</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; win.<span class="me1">clear</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; text.<span class="me1">draw</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; win.<span class="me1">flip</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>Win32 with ctypes</h2>
<p>Of course, all you really need to write GUI applications on Windows with Python is your trusty ctypes module and a well worn copy of <a href="http://www.charlespetzold.com/pw5/">Petzold</a>. The benefit of this style is that you're working right down at the system API level, with nothing to get in your way. The disadvantage is that you're working right down at the system API level, with nothing to relieve you from all that boilerplate (unless you write your own abstraction layer on top; see Venster, below&#8230;).</p>
<h3>Pros</h3>
<ul>
<li>Enables high level of control</li>
<li>Straightforward if familiar with Win32 API</li>
<li>No added complexity or buried functionality due to need to be cross-platform</li>
<li>Lightest of all Windows GUI programming methods using Python</li>
</ul>
<h3>Cons</h3>
<ul>
<li>All the complexity and inconsistency of Win32 API in gory detail</li>
<li>Lack of high-level libraries (have to write more code)</li>
</ul>
<p>Hello world example (long, ain't it?):<br />
<img src="/img/hello-win32.png" alt="Win32 GUI screen shot" /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> ctypes <span class="kw1">import</span> *<br />
<span class="kw1">import</span> win32con</p>
<p>WNDPROC = WINFUNCTYPE<span class="br0">&#40;</span>c_long, c_int, c_uint, c_int, c_int<span class="br0">&#41;</span></p>
<p>NULL = c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span><br />
_user32 = windll.<span class="me1">user32</span></p>
<p><span class="kw1">def</span> ErrorIfZero<span class="br0">&#40;</span>handle<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">if</span> handle == <span class="nu0">0</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> WinError<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> handle</p>
<p>CreateWindowEx = _user32.<span class="me1">CreateWindowExW</span><br />
CreateWindowEx.<span class="me1">argtypes</span> = <span class="br0">&#91;</span>c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_wchar_p,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_wchar_p,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#93;</span><br />
CreateWindowEx.<span class="me1">restype</span> = ErrorIfZero</p>
<p>
<span class="kw1">class</span> WNDCLASS<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'style'</span>, c_uint<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpfnWndProc'</span>, WNDPROC<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'cbClsExtra'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'cbWndExtra'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hInstance'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hIcon'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hCursor'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hbrBackground'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpszMenuName'</span>, c_wchar_p<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpszClassName'</span>, c_wchar_p<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;wndProc,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;style=win32con.<span class="me1">CS_HREDRAW</span> | win32con.<span class="me1">CS_VREDRAW</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;clsExtra=<span class="nu0">0</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;wndExtra=<span class="nu0">0</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;menuName=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;className=u<span class="st0">&quot;PythonWin32&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;instance=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;icon=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;cursor=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;background=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="br0">&#41;</span>:</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> instance:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance = windll.<span class="me1">kernel32</span>.<span class="me1">GetModuleHandleW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> icon:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; icon = _user32.<span class="me1">LoadIconW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#40;</span>win32con.<span class="me1">IDI_APPLICATION</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> cursor:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cursor = _user32.<span class="me1">LoadCursorW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#40;</span>win32con.<span class="me1">IDC_ARROW</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> background:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; background = windll.<span class="me1">gdi32</span>.<span class="me1">GetStockObject</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">WHITE_BRUSH</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpfnWndProc</span>=wndProc<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">style</span>=style<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">cbClsExtra</span>=clsExtra<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">cbWndExtra</span>=wndExtra<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hInstance</span>=instance<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hIcon</span>=icon<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hCursor</span>=cursor<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hbrBackground</span>=background<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpszMenuName</span>=menuName<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpszClassName</span>=className</p>
<p><span class="kw1">class</span> RECT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'left'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'top'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'right'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'bottom'</span>, c_long<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, left=<span class="nu0">0</span>, top=<span class="nu0">0</span>, right=<span class="nu0">0</span>, bottom=<span class="nu0">0</span> <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">left</span> = left<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">top</span> = top<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">right</span> = right<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">bottom</span> = bottom</p>
<p><span class="kw1">class</span> PAINTSTRUCT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'hdc'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fErase'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'rcPaint'</span>, RECT<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fRestore'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fIncUpdate'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'rgbReserved'</span>, c_wchar * <span class="nu0">32</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">class</span> POINT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'x'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'y'</span>, c_long<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span> <span class="kw2">self</span>, x=<span class="nu0">0</span>, y=<span class="nu0">0</span> <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">x</span> = x<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">y</span> = y<br />
&nbsp; &nbsp;<br />
<span class="kw1">class</span> MSG<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'hwnd'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'message'</span>, c_uint<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'wParam'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lParam'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'time'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'pt'</span>, POINT<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp;<br />
<span class="kw1">def</span> pump_messages<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Calls message loop&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; msg = MSG<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; pMsg = pointer<span class="br0">&#40;</span>msg<span class="br0">&#41;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">while</span> _user32.<span class="me1">GetMessageW</span><span class="br0">&#40;</span>pMsg, NULL, <span class="nu0">0</span>, <span class="nu0">0</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">TranslateMessage</span><span class="br0">&#40;</span>pMsg<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">DispatchMessageW</span><span class="br0">&#40;</span>pMsg<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> msg.<span class="me1">wParam</span></p>
<p>
<span class="kw1">class</span> Window<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Wraps an HWND handle&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, hwnd=NULL<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hwnd</span> = hwnd<br />
&nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>._event_handlers = <span class="br0">&#123;</span><span class="br0">&#125;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># Register event handlers</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> key <span class="kw1">in</span> <span class="kw2">dir</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; method = <span class="kw2">getattr</span><span class="br0">&#40;</span><span class="kw2">self</span>, key<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">hasattr</span><span class="br0">&#40;</span>method, <span class="st0">&quot;win32message&quot;</span><span class="br0">&#41;</span> <span class="kw1">and</span> <span class="kw2">callable</span><span class="br0">&#40;</span>method<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>._event_handlers<span class="br0">&#91;</span>method.<span class="me1">win32message</span><span class="br0">&#93;</span> = method<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> GetClientRect<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; rect = RECT<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">GetClientRect</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span>, byref<span class="br0">&#40;</span>rect<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> rect<br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">def</span> Create<span class="br0">&#40;</span><span class="kw2">self</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; exStyle=<span class="nu0">0</span> , &nbsp; &nbsp; &nbsp; &nbsp;<span class="co1"># &nbsp;DWORD dwExStyle</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; className=u<span class="st0">&quot;WndClass&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; windowName=u<span class="st0">&quot;Window&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; style=win32con.<span class="me1">WS_OVERLAPPEDWINDOW</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; x=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; y=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; width=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; height=win32con.<span class="me1">CW_USEDEFAULT</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; menu=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; lparam=NULL,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hwnd</span> = CreateWindowEx<span class="br0">&#40;</span>exStyle,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; className,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; windowName,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; style,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; x,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; y,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; width,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; height,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parent,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; menu,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; lparam<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">self</span>.<span class="me1">hwnd</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> Show<span class="br0">&#40;</span><span class="kw2">self</span>, flag<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> _user32.<span class="me1">ShowWindow</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span>, flag<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> Update<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> _user32.<span class="me1">UpdateWindow</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> WinError<span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> WndProc<span class="br0">&#40;</span><span class="kw2">self</span>, hwnd, message, wParam, lParam<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp;<br />
&nbsp; &nbsp; &nbsp; &nbsp; event_handler = <span class="kw2">self</span>._event_handlers.<span class="me1">get</span><span class="br0">&#40;</span>message, <span class="kw2">None</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> event_handler:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> event_handler<span class="br0">&#40;</span>message, wParam, lParam<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> _user32.<span class="me1">DefWindowProcW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>hwnd<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span>message<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span>wParam<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span>lParam<span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p><span class="co1">## Lifted shamelessly from WCK (effbot)'s wckTkinter.bind</span><br />
<span class="kw1">def</span> EventHandler<span class="br0">&#40;</span>message<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Decorator for event handlers&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> decorator<span class="br0">&#40;</span>func<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func.<span class="me1">win32message</span> = message<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> func<br />
&nbsp; &nbsp; <span class="kw1">return</span> decorator</p>
<p><span class="kw1">class</span> HelloWindow<span class="br0">&#40;</span>Window<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;The application window&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; @EventHandler<span class="br0">&#40;</span>win32con.<span class="me1">WM_PAINT</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> OnPaint<span class="br0">&#40;</span><span class="kw2">self</span>, message, wParam, lParam<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Draw 'Hello World' in center of window&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; ps = PAINTSTRUCT<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; rect = <span class="kw2">self</span>.<span class="me1">GetClientRect</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; hdc = _user32.<span class="me1">BeginPaint</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span><span class="br0">&#41;</span>, byref<span class="br0">&#40;</span>ps<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; rect = <span class="kw2">self</span>.<span class="me1">GetClientRect</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; flags = win32con.<span class="me1">DT_SINGLELINE</span>|win32con.<span class="me1">DT_CENTER</span>|win32con.<span class="me1">DT_VCENTER</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">DrawTextW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>hdc<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; u<span class="st0">&quot;Hello, world!&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; c_int<span class="br0">&#40;</span><span class="nu0">-1</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; byref<span class="br0">&#40;</span>rect<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; flags<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">EndPaint</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">hwnd</span><span class="br0">&#41;</span>, byref<span class="br0">&#40;</span>ps<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">0</span></p>
<p>&nbsp; &nbsp; @EventHandler<span class="br0">&#40;</span>win32con.<span class="me1">WM_DESTROY</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> OnDestroy<span class="br0">&#40;</span><span class="kw2">self</span>, message, wParam, lParam<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Quit app when window is destroyed&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">PostQuitMessage</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="nu0">0</span></p>
<p><span class="kw1">def</span> RunHello<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Create window and start message loop&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># two-stage creation for Win32 windows</span><br />
&nbsp; &nbsp; hello = HelloWindow<span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># register window class…</span><br />
&nbsp; &nbsp; wndclass = WNDCLASS<span class="br0">&#40;</span>WNDPROC<span class="br0">&#40;</span>hello.<span class="me1">WndProc</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; wndclass.<span class="me1">lpszClassName</span> = u<span class="st0">&quot;HelloWindow&quot;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> _user32.<span class="me1">RegisterClassW</span><span class="br0">&#40;</span>byref<span class="br0">&#40;</span>wndclass<span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> WinError<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="co1"># …then create Window</span><br />
&nbsp; &nbsp; hello.<span class="me1">Create</span><span class="br0">&#40;</span> className=wndclass.<span class="me1">lpszClassName</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance=wndclass.<span class="me1">hInstance</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; windowName=u<span class="st0">&quot;Hello World&quot;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># Show Window</span><br />
&nbsp; &nbsp; hello.<span class="me1">Show</span><span class="br0">&#40;</span>win32con.<span class="me1">SW_SHOWNORMAL</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; hello.<span class="me1">Update</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; pump_messages<span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>RunHello<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
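<p>The handler registration in the listing above can be tried out without touching Win32 at all. The following is a minimal sketch, not the code from the listing: <code>Dispatcher</code>, <code>Demo</code>, and <code>dispatch</code> are hypothetical names made up for illustration, and the string return values stand in for the real calls. It shows only the mechanism: the decorator tags a function with a message ID, and the constructor scans <code>dir(self)</code> for tagged methods.</p>

```python
def event_handler(message):
    """Decorator that tags a function with the message ID it handles."""
    def decorator(func):
        func.win32message = message
        return func
    return decorator

# Numeric values of the corresponding Win32 message constants
WM_DESTROY = 0x0002
WM_SIZE = 0x0005
WM_PAINT = 0x000F

class Dispatcher(object):
    """Hypothetical stand-in for Window: collects tagged methods."""

    def __init__(self):
        self._event_handlers = {}
        for name in dir(self):
            method = getattr(self, name)
            # Bound methods expose attributes set on the underlying function
            if callable(method) and hasattr(method, "win32message"):
                self._event_handlers[method.win32message] = method

    def dispatch(self, message, wparam=0, lparam=0):
        handler = self._event_handlers.get(message)
        if handler is None:
            # The real WndProc falls back to DefWindowProcW here
            return "default"
        return handler(message, wparam, lparam)

class Demo(Dispatcher):
    @event_handler(WM_PAINT)
    def on_paint(self, message, wparam, lparam):
        return "painted"

    @event_handler(WM_DESTROY)
    def on_destroy(self, message, wparam, lparam):
        return "destroyed"

demo = Demo()
print(demo.dispatch(WM_PAINT))
print(demo.dispatch(WM_DESTROY))
print(demo.dispatch(WM_SIZE))  # no handler registered, so the fallback is used
```

<p>Because the lookup table is built per instance from bound methods, subclasses only need to decorate their handlers; no explicit registration call is required.</p>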
<h2>Venster</h2>
<p><a href="http://venster.sourceforge.net/htdocs/index.html">Venster</a> was a very promising wrapper over the Win32 API, borrowing heavily from WTL and ATL windowing techniques. Unfortunately, the project hasn't been updated in several years, and doesn't support the latest versions of Python (especially after ctypes.com was dropped). </p>
<h3>Pros</h3>
<ul>
<li>Rational abstraction layer on top of Win32</li>
<li>Can be used to write native, (relatively) lightweight GUI applications</li>
<li>Has most of the cool Win32 tricks like hosting ActiveX and Coolbars</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Out of date; not updated in several years</li>
</ul>
<p>Hello world example (<a href="http://venster.sourceforge.net/htdocs/tutorial.html">code source</a>):<br />
<img src="/img/hello-venster.png" alt="Venster GUI screen shot" /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">from</span> venster.<span class="me1">windows</span> <span class="kw1">import</span> *<br />
<span class="kw1">from</span> venster.<span class="me1">wtl</span> <span class="kw1">import</span> *</p>
<p><span class="kw1">from</span> venster <span class="kw1">import</span> gdi</p>
<p><span class="kw1">class</span> MyWindow<span class="br0">&#40;</span>Window<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _window_title_ = <span class="st0">&quot;Hello World&quot;</span><br />
&nbsp; &nbsp; _window_background_ = gdi.<span class="me1">GetStockObject</span><span class="br0">&#40;</span>WHITE_BRUSH<span class="br0">&#41;</span><br />
&nbsp; &nbsp; _window_class_style_ = CS_HREDRAW | CS_VREDRAW</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> OnPaint<span class="br0">&#40;</span><span class="kw2">self</span>, event<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; ps = PAINTSTRUCT<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; hdc = <span class="kw2">self</span>.<span class="me1">BeginPaint</span><span class="br0">&#40;</span>ps<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; rc = <span class="kw2">self</span>.<span class="me1">GetClientRect</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; msg = <span class="st0">&quot;Hello World&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; gdi.<span class="me1">TextOut</span><span class="br0">&#40;</span>hdc, rc.<span class="me1">width</span> / <span class="nu0">2</span>, rc.<span class="me1">height</span> / <span class="nu0">2</span>, msg, <span class="kw2">len</span><span class="br0">&#40;</span>msg<span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">EndPaint</span><span class="br0">&#40;</span>ps<span class="br0">&#41;</span><br />
&nbsp; &nbsp; msg_handler<span class="br0">&#40;</span>WM_PAINT<span class="br0">&#41;</span><span class="br0">&#40;</span>OnPaint<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> OnDestroy<span class="br0">&#40;</span><span class="kw2">self</span>, event<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; PostQuitMessage<span class="br0">&#40;</span>NULL<span class="br0">&#41;</span><br />
&nbsp; &nbsp; msg_handler<span class="br0">&#40;</span>WM_DESTROY<span class="br0">&#41;</span><span class="br0">&#40;</span>OnDestroy<span class="br0">&#41;</span></p>
<p>myWindow = MyWindow<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
application = Application<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
application.<span class="me1">Run</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<h2>PyGTK</h2>
<p>PyGTK seems to have a lot going for it as a cross-platform toolkit. It's also licensed under the <a href="http://en.wikipedia.org/wiki/GNU_Lesser_General_Public_License">LGPL</a>, which I like a lot more than the <a href="http://en.wikipedia.org/wiki/GNU_General_Public_License">GPL</a> of PyQt. Unfortunately, it doesn't use native Windows widgets; it does a pretty good job of faking them, but a PyGTK app still stands out in a way that a Win32, .NET, or wxPython app wouldn't. </p>
<h3>Pros</h3>
<ul>
<li>Cross platform</li>
<li>Lots of widgets</li>
<li>Voluminous (if somewhat disorganized) documentation</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Native Win32 widgets not used (looks good, but not quite all the way there)</li>
<li>Must include large runtime when packaging with py2exe</li>
</ul>
<p>Hello world example (<a href="http://www.pygtk.org/pygtk2tutorial/examples/helloworld.py">code source</a>):<br />
<img src="/img/hello-gtk.png" alt="PyGTK screen shot" /></p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">import</span> pygtk<br />
pygtk.<span class="me1">require</span><span class="br0">&#40;</span><span class="st0">'2.0'</span><span class="br0">&#41;</span><br />
<span class="kw1">import</span> gtk</p>
<p><span class="kw1">class</span> HelloWorld:</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> hello<span class="br0">&#40;</span><span class="kw2">self</span>, widget, data=<span class="kw2">None</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;Hello World&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> delete_event<span class="br0">&#40;</span><span class="kw2">self</span>, widget, event, data=<span class="kw2">None</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;delete event occurred&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">False</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> destroy<span class="br0">&#40;</span><span class="kw2">self</span>, widget, data=<span class="kw2">None</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;destroy signal occurred&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; gtk.<span class="me1">main_quit</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span> = gtk.<span class="me1">Window</span><span class="br0">&#40;</span>gtk.<span class="me1">WINDOW_TOPLEVEL</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span>.<span class="me1">connect</span><span class="br0">&#40;</span><span class="st0">&quot;delete_event&quot;</span>, <span class="kw2">self</span>.<span class="me1">delete_event</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span>.<span class="me1">connect</span><span class="br0">&#40;</span><span class="st0">&quot;destroy&quot;</span>, <span class="kw2">self</span>.<span class="me1">destroy</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span>.<span class="me1">set_border_width</span><span class="br0">&#40;</span><span class="nu0">10</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">button</span> = gtk.<span class="me1">Button</span><span class="br0">&#40;</span><span class="st0">&quot;Hello World&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">button</span>.<span class="me1">connect</span><span class="br0">&#40;</span><span class="st0">&quot;clicked&quot;</span>, <span class="kw2">self</span>.<span class="me1">hello</span>, <span class="kw2">None</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">button</span>.<span class="me1">connect_object</span><span class="br0">&#40;</span><span class="st0">&quot;clicked&quot;</span>, gtk.<span class="me1">Widget</span>.<span class="me1">destroy</span>, <span class="kw2">self</span>.<span class="me1">window</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span>.<span class="me1">add</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">button</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">button</span>.<span class="me1">show</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">window</span>.<span class="me1">show</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> main<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; gtk.<span class="me1">main</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; hello = HelloWorld<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; hello.<span class="me1">main</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Intermediate Python: Pythonic file searches</title>
		<link>http://ginstrom.com/scribbles/2008/02/14/intermediate-python-pythonic-file-searches/</link>
		<comments>http://ginstrom.com/scribbles/2008/02/14/intermediate-python-pythonic-file-searches/#comments</comments>
		<pubDate>Thu, 14 Feb 2008 06:14:34 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/02/14/intermediate-python-pythonic-file-searches/</guid>
		<description><![CDATA[It's very easy to get up and running with Python, but programmers coming from other more verbose or procedural languages tend to write code that's not very pythonic &#8212; that is, it doesn't use Python idioms that experienced programmers use.
The problems with un-pythonic code are that it tends to be more verbose, more difficult to [...]]]></description>
			<content:encoded><![CDATA[<p>It's very easy to get up and running with Python, but programmers coming from other more verbose or procedural languages tend to write code that's not very <a href="http://faassen.n--tree.net/blog/view/weblog/2005/08/06/0">pythonic</a> &#8212; that is, it doesn't use Python idioms that experienced programmers use.</p>
<p>The problems with un-pythonic code are that it tends to be more verbose, more difficult to understand, and even slower to run. Here's a naive implementation of a function to find every line in a given file that contains a specified string. It returns a list of (line_num, line) tuples.</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">def</span> naive_way<span class="br0">&#40;</span>to_find, filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Find string to_find in file filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; file_handle = <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; line_number = <span class="nu0">0</span><br />
&nbsp; &nbsp; lines = <span class="br0">&#91;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; done = <span class="kw2">False</span><br />
&nbsp; &nbsp; <span class="kw1">while</span> done == <span class="kw2">False</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; line = file_handle.<span class="kw3">readline</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> line:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; done = <span class="kw2">True</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; line_number += <span class="nu0">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; index = line.<span class="me1">find</span><span class="br0">&#40;</span>to_find<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> index &gt; <span class="nu0">-1</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; lines.<span class="me1">append</span><span class="br0">&#40;</span><span class="br0">&#40;</span>line_number, line<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> lines</div>
<p>This code is fairly readable and it gets the job done, but we can do better. Notice all these variables lying around? They clutter up the function (making its intent harder to see), and they actually slow down the code. Statements like "line_number += 1" are more costly than you might expect, because each increment is executed as interpreted bytecode on every pass through the loop, and it rebinds the name to a different integer object each time.</p>
<p>We can get rid of "<code>done</code>" and "<code>file_handle</code>" by iterating over the file rather than using the low-level <code>readline()</code> method. We can avoid the code to increment "<code>line_number</code>" by using the built-in <code>enumerate</code> function. Finally, we can get rid of "<code>index</code>" by using the "<code>in</code>" operator.</p>
<p>Here's a more pythonic version of the above function:</p>
<div class="dean_ch" style="white-space: nowrap;">
<span class="kw1">def</span> pythonic_way<span class="br0">&#40;</span>to_find, filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Find string to_find in file filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; lines = <span class="br0">&#91;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> line_num, line <span class="kw1">in</span> <span class="kw2">enumerate</span><span class="br0">&#40;</span><span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> to_find <span class="kw1">in</span> line:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; lines.<span class="me1">append</span><span class="br0">&#40;</span><span class="br0">&#40;</span>line_num<span class="nu0">+1</span>, line<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> lines</div>
<p>Remember what I said about pythonic code being faster? Here are the times I got for running these functions 100 times, searching for "your system" in "/python25/readme.txt" (rounded to three decimal places):</p>
<table>
<tr>
<th>&nbsp;</th>
<th>Without psyco</th>
<th>With psyco</th>
</tr>
<tr>
<th align="left">naive_way</th>
<td class="number">0.411 s</td>
<td class="number">0.213 s</td>
</tr>
<tr>
<th align="left">pythonic_way</th>
<td class="number">0.116 s</td>
<td class="number">0.082 s</td>
</tr>
</table>
<p>Psyco manages to narrow the gap a bit (probably by optimizing away those object creations), but even with psyco the pythonic function is about 2.6x faster (versus about 3.5x without it), not to mention more readable (to a Python programmer, at least!). And since the number of bugs tends to grow with the number of lines of source code, it's likely to have fewer bugs as well.</p>
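<p>For anyone who wants to repeat the comparison, here is one way it might be set up with the standard <code>timeit</code> module. This is a sketch rather than the harness used for the table above: the generated sample file is an assumption standing in for "/python25/readme.txt", psyco is left out, and the numbers will differ from machine to machine.</p>

```python
import os
import tempfile
import timeit

def naive_way(to_find, filename):
    """Loop with readline() and manual bookkeeping, as in the first listing."""
    file_handle = open(filename)
    line_number = 0
    lines = []
    done = False
    while done == False:
        line = file_handle.readline()
        if not line:
            done = True
        else:
            line_number += 1
            if line.find(to_find) != -1:
                lines.append((line_number, line))
    file_handle.close()
    return lines

def pythonic_way(to_find, filename):
    """Iterate over the file with enumerate, as in the second listing."""
    lines = []
    for line_num, line in enumerate(open(filename)):
        if to_find in line:
            lines.append((line_num + 1, line))
    return lines

# Build a sample file (an assumption; any text file will do)
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    for i in range(2000):
        if i % 10 == 0:
            out.write("line %d mentions your system\n" % i)
        else:
            out.write("line %d\n" % i)

# Sanity check: both functions should find the same lines
same = naive_way("your system", path) == pythonic_way("your system", path)

# Time 100 runs of each function
results = {}
for func in (naive_way, pythonic_way):
    timer = timeit.Timer(lambda: func("your system", path))
    results[func.__name__] = timer.timeit(number=100)

os.remove(path)

for name in sorted(results):
    print("%-14s %.3f s" % (name, results[name]))
```

<p>Checking that both functions return identical results before timing them guards against comparing a fast function with a wrong one.</p>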
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/02/14/intermediate-python-pythonic-file-searches/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The partial rewrite</title>
		<link>http://ginstrom.com/scribbles/2008/02/13/the-partial-rewrite/</link>
		<comments>http://ginstrom.com/scribbles/2008/02/13/the-partial-rewrite/#comments</comments>
		<pubDate>Wed, 13 Feb 2008 06:57:20 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/02/13/the-partial-rewrite/</guid>
		<description><![CDATA[I haven't been doing much blogging lately. Instead, I've been busy working and hacking. On the hacking side of things, I've been doing a partial rewrite of my big application, a translation-memory application written in C++.
The application is now about 10 years old, and over time and several releases it's grown more and more difficult [...]]]></description>
			<content:encoded><![CDATA[<p>I haven't been doing much blogging lately. Instead, I've been busy working and hacking. On the hacking side of things, I've been doing a partial rewrite of <a href="http://www.transassist.com/english/">my big application</a>, a <a href="http://en.wikipedia.org/wiki/Translation_memory">translation-memory</a> application written in C++.</p>
<p>The application is now about 10 years old, and over time and several releases it's grown more and more difficult to add new features without introducing bugs. I have in fact rewritten this application once already. But as features again began to accrete, maintenance and improvement started to drag again.</p>
<h2>The definition of insanity</h2>
<blockquote><p>The definition of insanity is doing the same thing over and over and expecting different results.</p></blockquote>
<p align="right"><em>&#8211; attributed to Benjamin Franklin</em></p>
<p>The first time I rewrote the application, maintenance and improvement got a lot easier (for a couple of years at least), so my first thought was to rewrite the thing again. But the last rewrite obviously didn't hold up in the long run, or I wouldn't be considering another one six years later. Meanwhile, during the intervening years, lots of new ideas about development have been bumping around in my head.</p>
<p>One is the idea that rewrites usually <a href="http://www.joelonsoftware.com/articles/fog0000000069.html">aren't a very good idea</a>: even if they succeed, <a href="http://chadfowler.com/2006/12/27/the-big-rewrite">their goals could have been met more efficiently</a>. Another is the idea of improving code bases through refactoring, as detailed brilliantly in <a href="http://www.amazon.com/Working-Effectively-Legacy-Robert-Martin/dp/0131177052">Working Effectively with Legacy Code</a>. Finally, I've been turned on to the great power of working with higher-level languages as much as possible.</p>
<h2>What's worth doing is worth doing halfway</h2>
<p>So this time around, I decided to do things a little differently. Instead of rewriting the whole application, I would rewrite parts of it in a higher-level language and refactor the rest. Since I'm a big fan of <a href="http://www.python.org/">Python</a>, I decided to use it as my high-level language.</p>
<p>Since the existing application already uses COM heavily, I decided to use COM as the communication mechanism between C++ and Python code. The next question was, what parts should I rewrite?</p>
<p>Some parts were a no-brainer.