<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The GITS Blog &#187; python</title>
	<atom:link href="http://ginstrom.com/scribbles/tag/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://ginstrom.com/scribbles</link>
	<description>Random scribbling about programming, translation, and Japan</description>
	<lastBuildDate>Thu, 05 Aug 2010 13:07:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>mailer version 0.5 released</title>
		<link>http://ginstrom.com/scribbles/2009/05/28/mailer-version-05-released/</link>
		<comments>http://ginstrom.com/scribbles/2009/05/28/mailer-version-05-released/#comments</comments>
		<pubDate>Thu, 28 May 2009 01:02:05 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[mailer]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=1094</guid>
		<description><![CDATA[I've released version 0.5 of my mailer python module for sending emails. Thanks to a patch from Douglas Mayle, this version makes it possible to send HTML emails with attachments (previous versions only let you do one or the other). Project homepage pypi page]]></description>
			<content:encoded><![CDATA[<p>I've released version 0.5 of my <a href="http://pypi.python.org/pypi/mailer/">mailer python module</a> for sending emails. Thanks to a patch from Douglas Mayle, this version makes it possible to send HTML emails with attachments (previous versions only let you do one or the other).</p>
<ul>
<li><a href="/code/mailer.html">Project homepage</a></li>
<li><a href="http://pypi.python.org/pypi/mailer/">pypi page</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2009/05/28/mailer-version-05-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Converting kanji numbers to integers with Python</title>
		<link>http://ginstrom.com/scribbles/2009/04/28/converting-kanji-numbers-to-integers-with-python/</link>
		<comments>http://ginstrom.com/scribbles/2009/04/28/converting-kanji-numbers-to-integers-with-python/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 05:19:25 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[conversion]]></category>
		<category><![CDATA[kanji]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=1008</guid>
		<description><![CDATA[A question on StackOverflow about converting kanji numbers (e.g. "五十五") into integers in C++ got me interested in solving this using Python. The result is my kanjinums module, with a function kanji2num that will convert a string containing a kanji num to a Python integer. Download the source distribution (kanjinums-0.1.zip) Download the Windows installer (kanjinums-0.1.win32.exe) [...]]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://stackoverflow.com/questions/795868/how-to-parse-kanji-numeric-characters-using-icu" rel="nofollow">question on StackOverflow</a> about converting kanji numbers (e.g. "五十五") into integers in C++ got me interested in solving this using Python.</p>
<p>The result is my kanjinums module, with a function kanji2num that will convert a string containing a kanji num to a Python integer.</p>
<p><a href="/code/kanjinums-0.1.zip">Download the source distribution (kanjinums-0.1.zip)</a><br />
<a href="/code/kanjinums-0.1.win32.exe">Download the Windows installer (kanjinums-0.1.win32.exe)</a><br />
<a href="/code/test_kanjinums.zip">Download the unit tests (test_kanjinums.zip)</a></p>
<p>Examples:</p>
<div class="dean_ch" style="white-space: wrap;">
&gt;&gt;&gt; <span class="kw1">import</span> kanjinums<br />
&gt;&gt;&gt; kanjinums.<span class="me1">kanji2num</span><span class="br0">&#40;</span><span class="st0">&quot;五百十一&quot;</span>, <span class="st0">&quot;sjis&quot;</span><span class="br0">&#41;</span><br />
<span class="nu0">511</span><br />
&gt;&gt;&gt; kanjinums.<span class="me1">kanji2num</span><span class="br0">&#40;</span><span class="st0">&quot;三万十五&quot;</span>, <span class="st0">&quot;sjis&quot;</span><span class="br0">&#41;</span><br />
<span class="nu0">30015</span></div>
<p>Here's the full code:</p>
<div class="dean_ch" style="white-space: wrap;"><span class="co1">#coding: UTF8</span><br />
<span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
Converts kanji numbers into integers</p>
<p>Can convert numbers up to 9,999,999,999,999,999<br />
(九千九百九十九兆九千九百九十九億九千九百九十九万九千九百九十九)</p>
<p>Released under MIT license.<br />
&quot;</span><span class="st0">&quot;&quot;</span><br />
__version__ = <span class="st0">&quot;0.1&quot;</span><br />
__author__ &nbsp;= <span class="st0">&quot;Ryan Ginstrom&quot;</span><br />
__license__ = <span class="st0">&quot;MIT&quot;</span><br />
__description__ = <span class="st0">&quot;A module to convert kanji numbers into Python integers&quot;</span></p>
<p>NUMS = <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="nu0">1</span>, u<span class="st0">&quot;一&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">2</span>, u<span class="st0">&quot;二&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">3</span>, u<span class="st0">&quot;三&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">4</span>, u<span class="st0">&quot;四&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">5</span>, u<span class="st0">&quot;五&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">6</span>, u<span class="st0">&quot;六&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">7</span>, u<span class="st0">&quot;七&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">8</span>, u<span class="st0">&quot;八&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">9</span>, u<span class="st0">&quot;九&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">10</span>, u<span class="st0">&quot;十&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">100</span>, u<span class="st0">&quot;百&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">1000</span>, u<span class="st0">&quot;千&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">10000</span>, u<span class="st0">&quot;万&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">100000000</span>, u<span class="st0">&quot;億&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="nu0">1000000000000</span>, u<span class="st0">&quot;兆&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>KANJIS = <span class="kw2">dict</span><span class="br0">&#40;</span><span class="br0">&#40;</span>kanji, num<span class="br0">&#41;</span> <span class="kw1">for</span> <span class="br0">&#40;</span>num, kanji<span class="br0">&#41;</span> <span class="kw1">in</span> NUMS<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> _break_down_nums<span class="br0">&#40;</span>nums<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; first, second, third, rest = nums<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>, nums<span class="br0">&#91;</span><span class="nu0">1</span><span class="br0">&#93;</span>, nums<span class="br0">&#91;</span><span class="nu0">2</span><span class="br0">&#93;</span>, nums<span class="br0">&#91;</span><span class="nu0">3</span>:<span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> first &lt; third <span class="kw1">or</span> third &lt; second:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span>first+second, third<span class="br0">&#93;</span> + rest<br />
&nbsp; &nbsp; <span class="kw1">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span>first, second*third<span class="br0">&#93;</span> + rest</p>
<p><span class="kw1">def</span> kanji2num<span class="br0">&#40;</span>kanji, enc=<span class="st0">&quot;utf-8&quot;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Convert the kanji number to a Python integer.<br />
&nbsp; &nbsp; Supply `kanji` as a unicode string, or a byte string<br />
&nbsp; &nbsp; with the encoding specified in `enc`.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> <span class="kw2">isinstance</span><span class="br0">&#40;</span>kanji, <span class="kw2">unicode</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; kanji = <span class="kw2">unicode</span><span class="br0">&#40;</span>kanji, enc<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># get the string as list of numbers</span><br />
&nbsp; &nbsp; nums = <span class="br0">&#91;</span>KANJIS<span class="br0">&#91;</span>x<span class="br0">&#93;</span> <span class="kw1">for</span> x <span class="kw1">in</span> kanji<span class="br0">&#93;</span></p>
<p>&nbsp; &nbsp; num = <span class="nu0">0</span><br />
&nbsp; &nbsp; <span class="kw1">while</span> <span class="kw2">len</span><span class="br0">&#40;</span>nums<span class="br0">&#41;</span> &gt; <span class="nu0">1</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; first, second, rest = nums<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>, nums<span class="br0">&#91;</span><span class="nu0">1</span><span class="br0">&#93;</span>, nums<span class="br0">&#91;</span><span class="nu0">2</span>:<span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> second &lt; first: <span class="co1"># e.g. [10, 3, ...]</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> any<span class="br0">&#40;</span>x &gt; first <span class="kw1">for</span> x <span class="kw1">in</span> rest<span class="br0">&#41;</span>: <span class="co1"># e.g. [500, 3, 10000, ...]</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; nums = _break_down_nums<span class="br0">&#40;</span>nums<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span>: <span class="co1"># e.g. [500, 3, 10, ...]</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; num += first<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; nums = <span class="br0">&#91;</span>second<span class="br0">&#93;</span> + rest<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span>: <span class="co1"># e.g. [3, 10, ...]</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; nums = <span class="br0">&#91;</span>first*second<span class="br0">&#93;</span> + rest</p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> num + <span class="kw2">sum</span><span class="br0">&#40;</span>nums<span class="br0">&#41;</span></div>
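<p>The listing above targets Python 2 (it relies on the <code>unicode</code> built-in). As a rough sketch only, here is how the same left-to-right reduction might look in Python 3, where every <code>str</code> is already Unicode and no encoding argument is needed. The name <code>kanji_to_int</code> and the dictionary literal are assumptions made for this illustration; they are not part of the kanjinums module.</p>

```python
# Python 3 sketch of the kanji2num algorithm shown above.
# kanji_to_int is a hypothetical name used for illustration only.
KANJIS = {
    "一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
    "六": 6, "七": 7, "八": 8, "九": 9,
    "十": 10, "百": 100, "千": 1000,
    "万": 10**4, "億": 10**8, "兆": 10**12,
}


def _break_down_nums(nums):
    # Called only when a larger unit (e.g. 万) appears later in the list,
    # so the leading values must be combined before multiplying by it.
    first, second, third, rest = nums[0], nums[1], nums[2], nums[3:]
    if first < third or third < second:
        return [first + second, third] + rest
    return [first, second * third] + rest


def kanji_to_int(kanji):
    """Convert a kanji number given as a str to an int."""
    nums = [KANJIS[ch] for ch in kanji]
    total = 0
    while len(nums) > 1:
        first, second, rest = nums[0], nums[1], nums[2:]
        if second < first:  # e.g. [10, 3, ...]
            if any(x > first for x in rest):
                nums = _break_down_nums(nums)
            else:
                total += first
                nums = [second] + rest
        else:  # e.g. [3, 10, ...]
            nums = [first * second] + rest
    return total + sum(nums)


print(kanji_to_int("五百十一"))  # 511
print(kanji_to_int("三万十五"))  # 30015
```

<p>Like the original, this sketch raises <code>KeyError</code> for any character that is not in the table.</p>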
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2009/04/28/converting-kanji-numbers-to-integers-with-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Conditional &#8220;tee&#8221; with Python</title>
		<link>http://ginstrom.com/scribbles/2009/03/02/conditional-tee-with-python/</link>
		<comments>http://ginstrom.com/scribbles/2009/03/02/conditional-tee-with-python/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 01:21:38 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[conditional]]></category>
		<category><![CDATA[functional]]></category>
		<category><![CDATA[generators]]></category>
		<category><![CDATA[iterators]]></category>
		<category><![CDATA[tee]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=858</guid>
		<description><![CDATA[This post describes the conditional tee ("ctee") module I wrote to split a sequence into two generators, according to a filter function. The problem David Beazley has a great article about generator pipelining using Python. This is a technique for handling (potentially very large) streams of data in a flexible yet efficient way. As an [...]]]></description>
			<content:encoded><![CDATA[<p>This post describes the <a href="/code/ctee.zip">conditional tee ("ctee") module</a> I wrote to split a sequence into two generators, according to a filter function.</p>
<h3>The problem</h3>
<p>David Beazley has <a href="http://www.dabeaz.com/generators/">a great article about generator pipelining</a> using Python. This is a technique for handling (potentially very large) streams of data in a flexible yet efficient way. As an example, here's a code snippet he gives for summing the total bytes from a log file:</p>
<div class="dean_ch" style="white-space: wrap;">
wwwlog = <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;access-log&quot;</span><span class="br0">&#41;</span><br />
bytecolumn = <span class="br0">&#40;</span>line.<span class="me1">rsplit</span><span class="br0">&#40;</span><span class="kw2">None</span>,<span class="nu0">1</span><span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">1</span><span class="br0">&#93;</span> <span class="kw1">for</span> line <span class="kw1">in</span> wwwlog<span class="br0">&#41;</span><br />
bytes = <span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#40;</span>x<span class="br0">&#41;</span> <span class="kw1">for</span> x <span class="kw1">in</span> bytecolumn <span class="kw1">if</span> x != <span class="st0">'-'</span><span class="br0">&#41;</span></p>
<p><span class="kw1">print</span> <span class="st0">&quot;Total&quot;</span>, <span class="kw2">sum</span><span class="br0">&#40;</span>bytes<span class="br0">&#41;</span></div>
<p>The code above first opens a log file, then gets the byte column for each entry. The byte value (if any) is then calculated for each row. Finally, the generator is consumed (or "pumped"), yielding the sum.</p>
<p>Since the entire file is never loaded into active memory, you could run this on quite huge log files, or even add a few steps and run it on collections of log files, without blowing up your memory. Another feature of this technique is that it's very flexible: you can add steps, combine steps into atomic actions, rearrange them, and so on.</p>
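<p>To try the idea without a real log file, here is a small Python 3 sketch of the same pipeline. The sample log lines and the helper name <code>total_bytes</code> are made up for this illustration; only the two generator expressions mirror the snippet above.</p>

```python
# Python 3 sketch of the byte-summing pipeline, fed from an in-memory
# list instead of a real "access-log" file. The sample lines are made up.
sample_log = [
    '127.0.0.1 - - [01/Mar/2009] "GET /index.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [01/Mar/2009] "GET /missing HTTP/1.1" 404 -',
    '127.0.0.1 - - [01/Mar/2009] "GET /logo.png HTTP/1.1" 200 2048',
]


def total_bytes(lines):
    # Each stage is lazy: nothing is processed until sum() pulls items through.
    bytecolumn = (line.rsplit(None, 1)[1] for line in lines)
    sizes = (int(x) for x in bytecolumn if x != "-")
    return sum(sizes)


print("Total", total_bytes(sample_log))  # Total 2560
```

<p>Because <code>total_bytes</code> accepts any iterable of lines, the same function should work unchanged on an open file object.</p>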
<p>This works great, as long as your pipe doesn't branch. If you want to split your pipe (say, dividing a stream of integers into one stream of even numbers and another of odds), things get a little complicated. One elegant way to handle this situation is with the <code>itertools.tee</code> function. <code>tee</code> takes an iterable and returns <em>n</em> independent iterators over it (two by default), each of which yields every item of the original.</p>
<h3>Using <code>itertools.tee</code></h3>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> <span class="kw3">itertools</span></p>
<p>lines = <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;numbers.txt&quot;</span><span class="br0">&#41;</span><br />
numbers = <span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#40;</span>line<span class="br0">&#41;</span> <span class="kw1">for</span> line <span class="kw1">in</span> lines<span class="br0">&#41;</span></p>
<p>first, second = <span class="kw3">itertools</span>.<span class="me1">tee</span><span class="br0">&#40;</span>numbers<span class="br0">&#41;</span></p>
<p>evens = <span class="br0">&#40;</span>i <span class="kw1">for</span> i <span class="kw1">in</span> first <span class="kw1">if</span> <span class="kw1">not</span> i % <span class="nu0">2</span><span class="br0">&#41;</span><br />
odds = <span class="br0">&#40;</span>i <span class="kw1">for</span> i <span class="kw1">in</span> second <span class="kw1">if</span> i % <span class="nu0">2</span><span class="br0">&#41;</span></p>
<p><span class="kw1">print</span> <span class="st0">&quot;Evens total:&quot;</span>, <span class="kw2">sum</span><span class="br0">&#40;</span>evens<span class="br0">&#41;</span><br />
<span class="kw1">print</span> <span class="st0">&quot;Odds total:&quot;</span>, <span class="kw2">sum</span><span class="br0">&#40;</span>odds<span class="br0">&#41;</span></div>
<p>The code first opens a file containing a bunch of random integers, and creates a generator that's a stream of integers. It then uses <code>itertools.tee</code> to make two copies of that generator (first and second), and applies generator expressions to create two streams: one of even numbers, and one of odd numbers. The built-in <code>sum</code> function is then used to consume each tee.</p>
<p><a href="/code/numbers.txt">Here's the number file</a> that I used for this code. It's a list of 1,000 random integers between 1 and 1,000,000.</p>
<p>That's fine in this case, where the filter expression is relatively inexpensive. But what if we have an expensive filter, like testing whether the number is prime, or making a database query? It could really hurt our performance if we have to perform the same test twice. Ideally, we'd just like to perform the test once for each element in our pipeline.</p>
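<p>To make the cost concrete, here is a small Python 3 sketch that counts how often the filter runs when the stream is split with <code>itertools.tee</code>. The counter and the <code>is_even</code> helper are stand-ins invented for this example; imagine a far more expensive test in their place.</p>

```python
import itertools

calls = 0


def is_even(n):
    # Stand-in for an expensive test; counts how often it runs.
    global calls
    calls += 1
    return n % 2 == 0


numbers = iter(range(10))
first, second = itertools.tee(numbers)

evens = [n for n in first if is_even(n)]
odds = [n for n in second if not is_even(n)]

print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]
print("filter calls:", calls)  # filter calls: 20
```

<p>Ten items cost twenty filter calls, because each copy of the stream has to test every item for itself.</p>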
<p>There are lots of ways to handle that situation. One common way is the <a href="http://en.wikipedia.org/wiki/Continuation-passing_style">"continuation-passing" style</a>, where you pass some data, a filter condition, and one or more functions to perform depending on the results of the test.</p>
<p>This works, but it disrupts the pipeline. That costs us the flexibility and dynamic nature of the generator paradigm.</p>
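<p>For comparison, a continuation-passing version of the even/odd split might look roughly like the Python 3 sketch below. The function name <code>split_cps</code> and its parameters are assumptions made for this illustration.</p>

```python
def split_cps(items, condition, on_true, on_false):
    # Test each item exactly once, then hand it to one of two callbacks.
    for item in items:
        if condition(item):
            on_true(item)
        else:
            on_false(item)


evens, odds = [], []
split_cps(range(10), lambda n: n % 2 == 0, evens.append, odds.append)

print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]
```

<p>Each item is tested only once, but the loop consumes the whole input on the spot, so nothing downstream can treat the two halves as lazy streams.</p>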
<p>I wrote the conditional tee (ctee) module for cases when you want to use a generator pipeline, but you need to split the sequence into two generators, and the filter condition is expensive. It creates a pair of instances of the <code>ConditionalTee</code> class, which are linked to each other.</p>
<h3>Using conditional tee</h3>
<p>Here's the meat of the code. The module can be <a href="/code/ctee.zip">downloaded here (ctee.zip)</a>.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> <span class="kw3">Queue</span> <span class="kw1">import</span> <span class="kw3">Queue</span></p>
<p><span class="kw1">class</span> ConditionalTee<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;A conditional tee class&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, sequence, condition<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">sequence</span> = sequence<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">condition</span> = condition<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">othertee</span> = <span class="kw2">None</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">q</span> = <span class="kw3">Queue</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> next<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; Get the next item that matches the condition.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Adds items to the queue of the other sequence until<br />
&nbsp; &nbsp; &nbsp; &nbsp; one matching this condition is reached.<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> <span class="kw2">self</span>.<span class="me1">q</span>.<span class="me1">empty</span><span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">self</span>.<span class="me1">q</span>.<span class="me1">get</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; item = <span class="kw2">self</span>.<span class="me1">sequence</span>.<span class="me1">next</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">while</span> <span class="kw1">not</span> <span class="kw2">self</span>.<span class="me1">condition</span><span class="br0">&#40;</span>item<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">othertee</span>.<span class="me1">q</span>.<span class="me1">put</span><span class="br0">&#40;</span>item<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; item = <span class="kw2">self</span>.<span class="me1">sequence</span>.<span class="me1">next</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> item</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__iter__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;We are an iterator&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">self</span></p>
<p><span class="kw1">def</span> ctee<span class="br0">&#40;</span>sequence, condition<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Creates two sequences from sequence: one where<br />
&nbsp; &nbsp; condition holds, and the other where it doesn't<br />
&nbsp; &nbsp; sequence -&gt; (x for x in sequence if condition(x)),<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (x for x in sequence if not condition(x))<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; yes_iter = ConditionalTee<span class="br0">&#40;</span>sequence, condition<span class="br0">&#41;</span><br />
&nbsp; &nbsp; nocond = <span class="kw1">lambda</span> x : <span class="kw1">not</span> condition<span class="br0">&#40;</span>x<span class="br0">&#41;</span><br />
&nbsp; &nbsp; no_iter = ConditionalTee<span class="br0">&#40;</span>sequence, nocond<span class="br0">&#41;</span><br />
&nbsp; &nbsp; yes_iter.<span class="me1">othertee</span> = no_iter<br />
&nbsp; &nbsp; no_iter.<span class="me1">othertee</span> = yes_iter</p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> yes_iter, no_iter</div>
<p>The <code>ConditionalTee</code> class takes a sequence and a filter condition as arguments to its <code>__init__</code> method. The <code>__init__</code> method also creates an empty queue member, and an <code>othertee</code> member that's initialized to None.</p>
<p>When the <code>next</code> method of a <code>ConditionalTee</code> instance is called, it first checks its own queue. If the queue is not empty, it returns the item at the front. Otherwise, it pulls items from the underlying sequence, adding each item that doesn't match its condition to the queue of its <code>othertee</code> member, until it either finds a matching item (which it returns) or the sequence is exhausted and <code>StopIteration</code> propagates to the caller.</p>
<p>The <code>ctee</code> function also takes a sequence and a filter condition as arguments. It creates two <code>ConditionalTee</code> instances, and sets their <code>othertee</code> members to each other, then returns the two instances as a pair.</p>
<p>Here's some sample code using ctee:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> ctee<br />
lines = <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;numbers.txt&quot;</span><span class="br0">&#41;</span><br />
nums = <span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#40;</span>line<span class="br0">&#41;</span> <span class="kw1">for</span> line <span class="kw1">in</span> lines<span class="br0">&#41;</span><br />
iseven = <span class="kw1">lambda</span> x : <span class="kw1">not</span> x % <span class="nu0">2</span><br />
evens, odds = ctee.<span class="me1">ctee</span><span class="br0">&#40;</span>nums, iseven<span class="br0">&#41;</span></div>
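<p>The module shown above is written for Python 2 (<code>Queue</code>, <code>next</code> methods). As a sketch only, a Python 3 adaptation could look like the following. It swaps in <code>collections.deque</code> for the queue and calls <code>iter()</code> on the input, both of which are my assumptions rather than features of the downloadable module. The call counter at the end checks that the filter runs once per item.</p>

```python
from collections import deque


class ConditionalTee:
    """Python 3 sketch of the conditional tee (uses __next__ and a deque)."""

    def __init__(self, sequence, condition):
        self.sequence = sequence
        self.condition = condition
        self.othertee = None
        self.q = deque()

    def __iter__(self):
        return self

    def __next__(self):
        # Serve anything the other tee has already set aside for us.
        if self.q:
            return self.q.popleft()
        # Otherwise pull from the shared source; StopIteration propagates.
        item = next(self.sequence)
        while not self.condition(item):
            self.othertee.q.append(item)
            item = next(self.sequence)
        return item


def ctee(sequence, condition):
    """Return (matching, non_matching) iterators over sequence."""
    source = iter(sequence)  # assumption: accept any iterable
    yes_iter = ConditionalTee(source, condition)
    no_iter = ConditionalTee(source, lambda x: not condition(x))
    yes_iter.othertee = no_iter
    no_iter.othertee = yes_iter
    return yes_iter, no_iter


calls = []


def is_even(n):
    calls.append(n)  # record every filter call
    return n % 2 == 0


evens, odds = ctee(range(10), is_even)
even_items = list(evens)  # draining evens first queues up every odd number
odd_items = list(odds)

print(even_items)  # [0, 2, 4, 6, 8]
print(odd_items)   # [1, 3, 5, 7, 9]
print(len(calls))  # 10
```

<p>Ten items cost ten filter calls. Note that draining <code>evens</code> completely before touching <code>odds</code> parks every odd number in the other tee's queue.</p>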
<p>There's still a problem if you "pump" each of these generators in succession, though: while you consume the first one, every item it rejects piles up in the other tee's queue, so with a large amount of data that queue is going to grow huge. It would be better to pump the two generators alternately, taking an item from each in turn and processing it, in order to avoid building up a big queue.</p>
<p>Here's a function that'll do that:</p>
<h3>Pumping generators alternately instead of consecutively</h3>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">def</span> diagonalize<span class="br0">&#40;</span>sequences<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Takes each sequence in turn, retrieving one item from that<br />
&nbsp; &nbsp; sequence and performing action on it, until all sequences<br />
&nbsp; &nbsp; are exhausted.<br />
&nbsp; &nbsp; sequences is a sequence of (iterable, action) pairs.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; sequences = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">iter</span><span class="br0">&#40;</span>s<span class="br0">&#41;</span>, a<span class="br0">&#41;</span> <span class="kw1">for</span> <span class="br0">&#40;</span>s, a<span class="br0">&#41;</span> <span class="kw1">in</span> sequences<span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">while</span> sequences:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> sequence, action <span class="kw1">in</span> sequences:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">try</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; item = sequence.<span class="me1">next</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; action<span class="br0">&#40;</span>item<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">except</span> <span class="kw2">StopIteration</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># remove the exhausted sequence from the list</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sequences = <span class="br0">&#91;</span><span class="br0">&#40;</span>s, a<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span>s, a<span class="br0">&#41;</span> <span class="kw1">in</span> sequences<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> s != sequence<span class="br0">&#93;</span></div>
<p>This takes a sequence of (sequence, action) pairs. It iterates through each pair, taking the next item in the sequence and applying the action to it. If the sequence raises <code>StopIteration</code>, it's removed from the list of sequences. The list comprehension at the start of the function turns <code>sequences</code> into a list, so that it tests False once it is empty, and ensures that each sequence in it is an iterator (i.e. supports <code>next</code>).</p>
<p>Here's an example of using this function:</p>
<div class="dean_ch" style="white-space: wrap;">
lines = <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;numbers.txt&quot;</span><span class="br0">&#41;</span><br />
numbers = <span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#40;</span>line<span class="br0">&#41;</span> <span class="kw1">for</span> line <span class="kw1">in</span> lines<span class="br0">&#41;</span></p>
<p>evens, odds = ctee<span class="br0">&#40;</span>numbers, <span class="kw1">lambda</span> x : <span class="kw1">not</span> x % <span class="nu0">2</span><span class="br0">&#41;</span></p>
<p>evenout = <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;evens.txt&quot;</span>, <span class="st0">&quot;w&quot;</span><span class="br0">&#41;</span><br />
oddout = <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;odds.txt&quot;</span>, <span class="st0">&quot;w&quot;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> writeline<span class="br0">&#40;</span>out, item<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">print</span> &gt;&gt; out, item</p>
<p>evenaction = <span class="kw1">lambda</span> x : writeline<span class="br0">&#40;</span>evenout, x<span class="br0">&#41;</span><br />
oddaction = <span class="kw1">lambda</span> x : writeline<span class="br0">&#40;</span>oddout, x<span class="br0">&#41;</span></p>
<p>diagonalize<span class="br0">&#40;</span><span class="br0">&#40;</span><span class="br0">&#40;</span>evens, evenaction<span class="br0">&#41;</span>, <span class="br0">&#40;</span>odds, oddaction<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
<p>This code will write all the even numbers to "evens.txt", and all the odd numbers to "odds.txt".</p>
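<p>For readers on Python 3, here is a sketch of the same round-robin idea using the built-in <code>next()</code>. It is a variant rather than a line-by-line port: the action is called outside the <code>try</code> block, so a <code>StopIteration</code> raised inside an action is not mistaken for an exhausted sequence. The sample inputs and the <code>log</code> list are made up for the demonstration.</p>

```python
def diagonalize(sequences):
    """Round-robin over (iterable, action) pairs until all are exhausted."""
    pairs = [(iter(s), a) for (s, a) in sequences]
    while pairs:
        remaining = []
        for iterator, action in pairs:
            try:
                item = next(iterator)
            except StopIteration:
                continue  # drop the exhausted iterator
            action(item)
            remaining.append((iterator, action))
        pairs = remaining


log = []
diagonalize([
    ("abc", lambda x: log.append(("letters", x))),
    ([1, 2], lambda x: log.append(("numbers", x))),
])
print(log)
```

<p>The log shows the two inputs being consumed alternately until the shorter one runs out, after which only the longer one is pumped.</p>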
<p>You might ask how this is different from the continuation-passing style. And you'd have a point; this is essentially continuation passing.</p>
<p>The thing is that here, the pumping only happens at the end. You can still go on wrapping all sorts of other filtering and transforming generators around your two conditional tees; the sequence won't actually be processed until you start pumping the generators at the end, so you won't build up enormous queues of data.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2009/03/02/conditional-tee-with-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Coming from C++/Java to Python should bend your mind</title>
		<link>http://ginstrom.com/scribbles/2009/02/03/coming-from-cjava-to-python-should-bend-your-mind/</link>
		<comments>http://ginstrom.com/scribbles/2009/02/03/coming-from-cjava-to-python-should-bend-your-mind/#comments</comments>
		<pubDate>Tue, 03 Feb 2009 02:30:38 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[c++]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[pythonic]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=839</guid>
		<description><![CDATA[Coming from C++ or Java to Python should bend your mind. If it doesn't, then you haven't learned Python yet &#8212; you're just writing C++ or Java in Python. If you only knew a statically typed and compiled language like C++ or Java before, and learning Python hasn't changed the way you think about programming, [...]]]></description>
			<content:encoded><![CDATA[<p>Coming from C++ or Java to Python should bend your mind. If it doesn't, then you haven't learned Python yet &#8212; you're just writing C++ or Java in Python.</p>
<p>If you only knew a statically typed and compiled language like C++ or Java before, and learning Python hasn't changed the way you think about programming, then check your programs for lots of type-checking code, inheritance, array indexing, and mutable variables. Look at some <a href="http://effbot.org/">excellent</a> <a href="http://www.aleax.it/python_mat_en.html">Python</a> <a href="http://code.activestate.com/recipes/">code</a>, and identify where your code diverges from it.</p>
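<p>As one small illustration of the array-indexing habit, here is a sketch in Python 3 syntax. The pet names and ages are invented for the example; the point is only the contrast between the two loops, which produce the same result.</p>

```python
# Hypothetical sample data, for illustration only.
names = ["Bear", "Lady", "Mikey"]
ages = [3, 5, 2]

# The C++/Java habit: index into parallel arrays.
lines_by_index = []
for i in range(len(names)):
    lines_by_index.append(names[i] + " is " + str(ages[i]))

# The Pythonic version: iterate over the values directly.
lines_by_zip = ["%s is %d" % (name, age) for name, age in zip(names, ages)]
```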
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2009/02/03/coming-from-cjava-to-python-should-bend-your-mind/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>An easy way to write XML in Python</title>
		<link>http://ginstrom.com/scribbles/2009/01/07/an-easy-way-to-write-xml-in-python/</link>
		<comments>http://ginstrom.com/scribbles/2009/01/07/an-easy-way-to-write-xml-in-python/#comments</comments>
		<pubDate>Wed, 07 Jan 2009 02:41:17 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=766</guid>
		<description><![CDATA[David Mertz's gnosis utilities include the module gnosis.xml.objectify, which makes parsing XML in python as simple as could be. from gnosis.xml import objectify xmltext = &#34;&#34;&#34;&#60;?xml version=&#34;1.0&#34; encoding=&#34;UTF-8&#34;?&#62; &#60;root&#62;&#60;first canned=&#34;true&#34; yummy=&#34;false&#34;&#62;spam&#60;/first&#62;&#60;second&#62;egg&#60;/second&#62;&#60;/root&#62;&#34;&#34;&#34; inst = objectify.make_instance&#40;xmltext&#41; print &#34;first:&#34;, inst.first.PCDATA print &#34; &#160;canned:&#34;, inst.first.canned print &#34; &#160;yummy:&#34;, inst.first.yummy print &#34;second:&#34;, inst.second.PCDATA Output: first: spam &#160; canned: true &#160; [...]]]></description>
			<content:encoded><![CDATA[<p>David Mertz's <a href="http://gnosis.cx/download/">gnosis utilities</a> include the module gnosis.xml.objectify, which makes parsing XML in Python as simple as could be.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> gnosis.<span class="kw3">xml</span> <span class="kw1">import</span> objectify</p>
<p>xmltext = <span class="st0">&quot;&quot;</span><span class="st0">&quot;&lt;?xml version=&quot;</span><span class="nu0">1.0</span><span class="st0">&quot; encoding=&quot;</span>UTF<span class="nu0">-8</span><span class="st0">&quot;?&gt;<br />
&lt;root&gt;&lt;first canned=&quot;</span>true<span class="st0">&quot; yummy=&quot;</span>false<span class="st0">&quot;&gt;spam&lt;/first&gt;&lt;second&gt;egg&lt;/second&gt;&lt;/root&gt;&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>inst = objectify.<span class="me1">make_instance</span><span class="br0">&#40;</span>xmltext<span class="br0">&#41;</span><br />
<span class="kw1">print</span> <span class="st0">&quot;first:&quot;</span>, inst.<span class="me1">first</span>.<span class="me1">PCDATA</span><br />
<span class="kw1">print</span> <span class="st0">&quot; &nbsp;canned:&quot;</span>, inst.<span class="me1">first</span>.<span class="me1">canned</span><br />
<span class="kw1">print</span> <span class="st0">&quot; &nbsp;yummy:&quot;</span>, inst.<span class="me1">first</span>.<span class="me1">yummy</span><br />
<span class="kw1">print</span> <span class="st0">&quot;second:&quot;</span>, inst.<span class="me1">second</span>.<span class="me1">PCDATA</span></div>
<p>Output:</p>
<div class="dean_ch" style="white-space: wrap;">
first: spam<br />
&nbsp; canned: true<br />
&nbsp; yummy: false<br />
second: egg</div>
<p>I wanted a similarly simple way to <em>write</em> XML, so I wrote a little module named <a href="/code/xmlwriter.zip">xmlwriter (zip file)</a> to do it.</p>
<p>Usage:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> xmlwriter <span class="kw1">import</span> XmlNode, xmlify<br />
<span class="kw1">from</span> <span class="kw3">StringIO</span> <span class="kw1">import</span> <span class="kw3">StringIO</span></p>
<p>root = XmlNode<span class="br0">&#40;</span>u<span class="st0">&quot;root&quot;</span><span class="br0">&#41;</span></p>
<p>first = root.<span class="me1">first</span><br />
first.<span class="me1">val</span> = u<span class="st0">&quot;spam&quot;</span><br />
first<span class="br0">&#91;</span><span class="st0">&quot;yummy&quot;</span><span class="br0">&#93;</span> = u<span class="st0">&quot;false&quot;</span><br />
first<span class="br0">&#91;</span><span class="st0">&quot;canned&quot;</span><span class="br0">&#93;</span> = u<span class="st0">&quot;true&quot;</span></p>
<p>root.<span class="me1">second</span>.<span class="me1">val</span> = u<span class="st0">&quot;egg&quot;</span></p>
<p>out = <span class="kw3">StringIO</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
xmlify<span class="br0">&#40;</span>root, out<span class="br0">&#41;</span><br />
<span class="kw1">print</span> &nbsp;out.<span class="me1">getvalue</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<p>Output:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="sc3"><span class="re1">&lt;?xml</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">encoding</span>=<span class="st0">&quot;UTF-8&quot;</span><span class="re2">?&gt;</span></span><br />
<span class="sc3"><span class="re1">&lt;root<span class="re2">&gt;</span></span></span><span class="sc3"><span class="re1">&lt;first</span> <span class="re0">canned</span>=<span class="st0">&quot;true&quot;</span> <span class="re0">yummy</span>=<span class="st0">&quot;false&quot;</span><span class="re2">&gt;</span></span>spam<span class="sc3"><span class="re1">&lt;/first<span class="re2">&gt;</span></span></span><span class="sc3"><span class="re1">&lt;second<span class="re2">&gt;</span></span></span>egg<span class="sc3"><span class="re1">&lt;/second<span class="re2">&gt;</span></span></span><span class="sc3"><span class="re1">&lt;/root<span class="re2">&gt;</span></span></span></div>
<p>It's still pretty basic. For example, you can't have sibling nodes with the same tag name. But it's a very simple way to do some quick and dirty XML writing.</p>
<p>Here's the xmlwriter module code.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> <span class="kw3">StringIO</span> <span class="kw1">import</span> <span class="kw3">StringIO</span></p>
<p><span class="kw1">class</span> XmlNode<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Pythonic representation of an XML node.</p>
<p>&nbsp; &nbsp; Expects values and attributes to be in Unicode</p>
<p>&nbsp; &nbsp; Example:<br />
&nbsp; &nbsp; root = XmlNode(u&quot;</span>root<span class="st0">&quot;)<br />
&nbsp; &nbsp; root.node.val = u&quot;</span>value<span class="st0">&quot;<br />
&nbsp; &nbsp; root[&quot;</span>attr<span class="st0">&quot;] = u&quot;</span>name<span class="st0">&quot;<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, tag=<span class="kw2">None</span>, value=<span class="kw2">None</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>._tag = tag<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">val</span> = value<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>._attrs = <span class="br0">&#123;</span><span class="br0">&#125;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__getattr__</span><span class="br0">&#40;</span><span class="kw2">self</span>, attr<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Add nodes on the fly&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="kw4">__dict__</span><span class="br0">&#91;</span>attr<span class="br0">&#93;</span> = XmlNode<span class="br0">&#40;</span>attr<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">self</span>.<span class="kw4">__dict__</span><span class="br0">&#91;</span>attr<span class="br0">&#93;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># dictionary access</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__getitem__</span><span class="br0">&#40;</span><span class="kw2">self</span>, key<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">self</span>._attrs<span class="br0">&#91;</span>key<span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__setitem__</span><span class="br0">&#40;</span><span class="kw2">self</span>, key, val<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>._attrs<span class="br0">&#91;</span>key<span class="br0">&#93;</span> = val</p>
<p><span class="kw1">def</span> write_open_tag<span class="br0">&#40;</span>node, out<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Writes the opening tag for node, including attributes<br />
&nbsp; &nbsp; out is a file-like object<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; out.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;&lt;%s&quot;</span> % node._tag<span class="br0">&#41;</span><br />
&nbsp; &nbsp; out.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;&quot;</span>.<span class="me1">join</span><span class="br0">&#40;</span><span class="br0">&#91;</span><span class="st0">' %s=&quot;%s&quot;'</span> % <span class="br0">&#40;</span>k, v.<span class="me1">encode</span><span class="br0">&#40;</span><span class="st0">&quot;utf-8&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, v <span class="kw1">in</span> node._attrs.<span class="me1">items</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; out.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;&gt;&quot;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> xmlify<span class="br0">&#40;</span>root, out<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Takes the root, and recursively goes<br />
&nbsp; &nbsp; down printing out the tags, attributes, and values.<br />
&nbsp; &nbsp; out is a file-like object<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># write XML header if this is the first node</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> out.<span class="me1">pos</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> &gt;&gt; out, <span class="st0">&quot;&quot;</span><span class="st0">&quot;&lt;?xml version=&quot;</span><span class="nu0">1.0</span><span class="st0">&quot; encoding=&quot;</span>UTF<span class="nu0">-8</span><span class="st0">&quot;?&gt;&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># opening tag</span><br />
&nbsp; &nbsp; write_open_tag<span class="br0">&#40;</span>root, out<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># value</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> root.<span class="me1">val</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; out.<span class="me1">write</span><span class="br0">&#40;</span>root.<span class="me1">val</span>.<span class="me1">encode</span><span class="br0">&#40;</span><span class="st0">&quot;utf-8&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># sub-nodes</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> item <span class="kw1">in</span> <span class="kw2">dir</span><span class="br0">&#40;</span>root<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; attr = <span class="kw2">getattr</span><span class="br0">&#40;</span>root, item<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">isinstance</span><span class="br0">&#40;</span>attr, XmlNode<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; xmlify<span class="br0">&#40;</span>attr, out<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># closing tag</span><br />
&nbsp; &nbsp; out.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;&lt;/%s&gt;&quot;</span> % root._tag<span class="br0">&#41;</span></div>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2009/01/07/an-easy-way-to-write-xml-in-python/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Iterating over a window in Python</title>
		<link>http://ginstrom.com/scribbles/2008/12/07/iterating-over-a-window-in-python/</link>
		<comments>http://ginstrom.com/scribbles/2008/12/07/iterating-over-a-window-in-python/#comments</comments>
		<pubDate>Sun, 07 Dec 2008 05:36:37 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[deque]]></category>
		<category><![CDATA[functional]]></category>
		<category><![CDATA[itertools]]></category>
		<category><![CDATA[window]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=685</guid>
		<description><![CDATA[This weekend, I had a fairly common problem: given a sequence of elements and an element in that sequence, I needed to get the next element; if the element was the last in the sequence, then I needed the first element. My first stab at a solution was pretty straightforward: def item_after&#40;elements, item&#41;: &#160; &#160; [...]]]></description>
			<content:encoded><![CDATA[<p>This weekend, I had a fairly common problem: given a sequence of elements and an element in that sequence, I needed to get the next element; if the element was the last in the sequence, then I needed the first element.</p>
<p>My first stab at a solution was pretty straightforward:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">def</span> item_after<span class="br0">&#40;</span>elements, item<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">for</span> i <span class="kw1">in</span> <span class="kw2">range</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span>elements<span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> elements<span class="br0">&#91;</span>i<span class="br0">&#93;</span> == item:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">try</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> elements<span class="br0">&#91;</span>i<span class="nu0">+1</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">except</span> <span class="kw2">IndexError</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> elements<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span></div>
<p>This works for the intended purpose, although the C-style looping doesn't sit too well with me. But this code has a major flaw: it doesn't work for generators.</p>
<p>After a little more thought, I decided that what I wanted was to iterate over a window of two items in the sequence, and if the first item matched, return the second one.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> chain</p>
<p><span class="kw1">def</span> by_pairs<span class="br0">&#40;</span>iterable<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; sequence = <span class="kw2">iter</span><span class="br0">&#40;</span>iterable<span class="br0">&#41;</span><br />
&nbsp; &nbsp; previous = sequence.<span class="me1">next</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">while</span> <span class="kw2">True</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; current = sequence.<span class="me1">next</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> previous, current<br />
&nbsp; &nbsp; &nbsp; &nbsp; previous = current</div>
<p>To illustrate how this works:</p>
<div class="dean_ch" style="white-space: wrap;">
&gt;&gt;&gt; <span class="kw1">for</span> pair <span class="kw1">in</span> by_pairs<span class="br0">&#40;</span><span class="kw2">range</span><span class="br0">&#40;</span><span class="nu0">6</span><span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">print</span> pair</p>
<p>
<span class="br0">&#40;</span><span class="nu0">0</span>, <span class="nu0">1</span><span class="br0">&#41;</span><br />
<span class="br0">&#40;</span><span class="nu0">1</span>, <span class="nu0">2</span><span class="br0">&#41;</span><br />
<span class="br0">&#40;</span><span class="nu0">2</span>, <span class="nu0">3</span><span class="br0">&#41;</span><br />
<span class="br0">&#40;</span><span class="nu0">3</span>, <span class="nu0">4</span><span class="br0">&#41;</span><br />
<span class="br0">&#40;</span><span class="nu0">4</span>, <span class="nu0">5</span><span class="br0">&#41;</span></div>
<p>The new item_after function:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">def</span> item_after2<span class="br0">&#40;</span>elements, item<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; pairs = by_pairs<span class="br0">&#40;</span>elements<span class="br0">&#41;</span><br />
&nbsp; &nbsp; first, second = pairs.<span class="me1">next</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> a, b <span class="kw1">in</span> chain<span class="br0">&#40;</span><span class="br0">&#91;</span><span class="br0">&#40;</span>first, second<span class="br0">&#41;</span><span class="br0">&#93;</span>, pairs<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> a == item:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> b<br />
&nbsp; &nbsp; <span class="kw1">return</span> first</div>
<p>In the new <code>item_after</code> function, I peel off the first pair so I can cache the first element, then glom it back onto the front of the sequence using <code>itertools.chain</code>.</p>
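<p>If you're on Python 3, the same approach needs a couple of adjustments: iterators are advanced with the built-in <code>next()</code>, and a <code>StopIteration</code> escaping from inside a generator is turned into a <code>RuntimeError</code> (as of Python 3.7). Here is a sketch of how the two functions might look under those assumptions; like the version above, it expects at least two elements.</p>

```python
from itertools import chain

def by_pairs(iterable):
    sequence = iter(iterable)
    try:
        previous = next(sequence)
    except StopIteration:
        return  # empty input: yield nothing
    for current in sequence:
        yield previous, current
        previous = current

def item_after(elements, item):
    pairs = by_pairs(elements)
    # Peel off the first pair to remember the first element.
    first, second = next(pairs)
    # Put the peeled pair back on the front of the sequence.
    for a, b in chain([(first, second)], pairs):
        if a == item:
            return b
    # item was the last element (or absent): wrap around.
    return first
```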
<p>After a little more thought, I realized that the <code>by_pairs</code> function can be made generic to handle a window of any size.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> <span class="kw3">collections</span> <span class="kw1">import</span> deque</p>
<p><span class="kw1">def</span> window_iter<span class="br0">&#40;</span>elements, n<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; sequence = <span class="kw2">iter</span><span class="br0">&#40;</span>elements<span class="br0">&#41;</span><br />
&nbsp; &nbsp; window = deque<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">while</span> <span class="kw2">True</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; element = sequence.<span class="me1">next</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; window.<span class="me1">append</span><span class="br0">&#40;</span>element<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">len</span><span class="br0">&#40;</span>window<span class="br0">&#41;</span> == n:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> window<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; window.<span class="me1">popleft</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
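<p>A possible variation, sketched here in Python 3 syntax: <code>deque</code> accepts a <code>maxlen</code> argument (Python 2.6 and later), which discards the oldest element automatically. This version also yields a tuple snapshot of each window, so it's safe to collect the windows with <code>list()</code>.</p>

```python
from collections import deque

def window_iter(elements, n):
    # A bounded deque drops its oldest item when a new one is appended.
    window = deque(maxlen=n)
    for element in elements:
        window.append(element)
        if len(window) == n:
            # Yield a copy so later appends don't change earlier results.
            yield tuple(window)
```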
<p>The item_after function now looks like this:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">def</span> item_after3<span class="br0">&#40;</span>elements, item<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; pairs = window_iter<span class="br0">&#40;</span>elements, <span class="nu0">2</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; first, second = pairs.<span class="me1">next</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> a, b <span class="kw1">in</span> chain<span class="br0">&#40;</span><span class="br0">&#91;</span><span class="br0">&#40;</span>first, second<span class="br0">&#41;</span><span class="br0">&#93;</span>, pairs<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> a == item:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> b<br />
&nbsp; &nbsp; <span class="kw1">return</span> first</div>
<p>And with a little data: one of my pets gets a treat every day. I know who got the treat yesterday; who gets it next?</p>
<div class="dean_ch" style="white-space: wrap;">
pets = <span class="br0">&#91;</span>pet <span class="kw1">for</span> pet <span class="kw1">in</span> <span class="st0">&quot;Bear Lady Mikey Jerry Kiri&quot;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
<span class="kw1">print</span> <span class="st0">&quot;Pet after Kiri is&quot;</span>, item_after3<span class="br0">&#40;</span>pets, <span class="st0">&quot;Kiri&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">print</span> <span class="st0">&quot;Pet after Lady is&quot;</span>, item_after3<span class="br0">&#40;</span>pets, <span class="st0">&quot;Lady&quot;</span><span class="br0">&#41;</span></div>
<p>The output:</p>
<div class="dean_ch" style="white-space: wrap;">
Pet after Kiri is Bear<br />
Pet after Lady is Mikey</div>
<p>I can think of a lot of potential uses for this window iterator. One is in natural language processing, when calculating the "stickiness" of words in a corpus. You could iterate over a window of three words, calculating the collocations of words before and after the middle word.</p>
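<p>To make the word-window idea a little more concrete, here is a rough sketch in Python 3 syntax. The <code>neighbors</code> function and the sample sentence are made up for illustration, and it only tallies raw neighbor counts rather than any real "stickiness" measure. The window iterator is repeated (in a <code>maxlen</code>-based form) so the example stands on its own.</p>

```python
from collections import Counter, defaultdict, deque

def window_iter(elements, n):
    window = deque(maxlen=n)
    for element in elements:
        window.append(element)
        if len(window) == n:
            yield tuple(window)

def neighbors(words):
    """Count which words appear directly before and after each middle word."""
    before = defaultdict(Counter)
    after = defaultdict(Counter)
    for left, middle, right in window_iter(words, 3):
        before[middle][left] += 1
        after[middle][right] += 1
    return before, after

# Hypothetical toy corpus.
words = "the cat sat on the mat and the cat slept".split()
before, after = neighbors(words)
```

<p>Note that the first and last words never sit in the middle of a window, so they are only counted as neighbors.</p>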
<p>Here's a rather contrived example: calculating the average temperature on days when the previous two days were 80 degrees or higher.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">def</span> ave_temp_after<span class="br0">&#40;</span>temp, temps<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; after_temps = <span class="br0">&#91;</span>z <span class="kw1">for</span> x, y, z <span class="kw1">in</span> window_iter<span class="br0">&#40;</span>temps, <span class="nu0">3</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> x &gt;= temp &lt;= y<span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">sum</span><span class="br0">&#40;</span>after_temps<span class="br0">&#41;</span> / <span class="kw2">float</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span>after_temps<span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>tempdata = <span class="st0">&quot;99 101 85 91 71 64 67 90 81 101&quot;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
temps = <span class="br0">&#40;</span><span class="kw2">float</span><span class="br0">&#40;</span>x<span class="br0">&#41;</span> <span class="kw1">for</span> x <span class="kw1">in</span> tempdata<span class="br0">&#41;</span><br />
temp = ave_temp_after<span class="br0">&#40;</span><span class="nu0">80</span>, temps<span class="br0">&#41;</span></p>
<p><span class="kw1">print</span> <span class="st0">&quot;Average temperature after two days of 80+:&quot;</span>, temp</div>
<p>Output:</p>
<div class="dean_ch" style="white-space: wrap;">
Average temperature after two days of 80+: 87.0</div>
<p>It strikes me that this would be great for those baseball statistics guys:</p>
<blockquote><p>This is just the third time since 1963 that a left-handed pinch hitter has broken a bat on three consecutive Wednesdays.</p></blockquote>
<p>I'm waiting for a call from The Show any day now.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/12/07/iterating-over-a-window-in-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Excerpt from Expert Python Programming: Writing a Package in Python</title>
		<link>http://ginstrom.com/scribbles/2008/11/26/excerpt-from-expert-python-programming-writing-a-package-in-python/</link>
		<comments>http://ginstrom.com/scribbles/2008/11/26/excerpt-from-expert-python-programming-writing-a-package-in-python/#comments</comments>
		<pubDate>Wed, 26 Nov 2008 05:28:36 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[book]]></category>
		<category><![CDATA[excerpt]]></category>
		<category><![CDATA[expert]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=666</guid>
		<description><![CDATA[I'm currently reading Expert Python Programming by Tarek Ziadé. I plan on writing a review once I'm done, but in the meantime the publishers over at Packt have kindly sent me an excerpt from the book to publish here. Download "Writing a Package in Python" (PDF)]]></description>
			<content:encoded><![CDATA[<div style="float:left;"><a title="Expert Python Programming book web page" href="http://www.packtpub.com/expert-python-programming/book"><img src="/img/expert_python_prog_cover.png" alt="Expert Python Programming" /></a></div>
<p>I'm currently reading <a href="http://www.packtpub.com/expert-python-programming/book">Expert Python Programming</a> by <a href="http://www.packtpub.com/author_view_profile/id/241">Tarek Ziadé</a>. I plan on writing a review once I'm done, but in the meantime the publishers over at <a href="http://www.packtpub.com/index">Packt</a> have kindly sent me an excerpt from the book to publish here.<br />
<br clear="all" /></p>
<p><a href="/docs/Writing_a_Package_in_Python.pdf" title="Download PDF file"><img src="/images/pdf.gif" alt="PDF File" style="padding:0px; border:none;" /> Download "Writing a Package in Python" (PDF)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/11/26/excerpt-from-expert-python-programming-writing-a-package-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Notes for using Unicode with Python 2.x</title>
		<link>http://ginstrom.com/scribbles/2008/11/16/notes-for-using-unicode-with-python-2x/</link>
		<comments>http://ginstrom.com/scribbles/2008/11/16/notes-for-using-unicode-with-python-2x/#comments</comments>
		<pubDate>Sat, 15 Nov 2008 16:04:23 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[doctests]]></category>
		<category><![CDATA[idle]]></category>
		<category><![CDATA[Unicode]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=575</guid>
		<description><![CDATA[Python is very Unicode friendly, but there are still a few quirks that people new to the language (or not so new!) need to assimilate in order to use Unicode effectively. To avoid going over old ground, for a primer please see this excellent article on using Unicode with Python. Here, I want to talk [...]]]></description>
			<content:encoded><![CDATA[<p>Python is very Unicode friendly, but there are still a few quirks that people new to the language (or not so new!) need to assimilate in order to use Unicode effectively.</p>
<p>To avoid going over old ground, for a primer please see this <a href="http://www.amk.ca/python/howto/unicode">excellent article on using Unicode with Python</a>. Here, I want to talk about some of the corner cases remaining after you've absorbed the great advice in that article.</p>
<p>This is not, of course, to say that Unicode support in Python is in any way buggy. Nay, Python's Unicode support is a unique snowflake, perfect in its own special way. It's just us flawed humans who have trouble appreciating fully its snowy beauty, especially if we're not Dutch.</p>
<p>And of course, all strings are Unicode in Python 3.0. That and the <a href="http://www.python.org/dev/peps/pep-3132/">new syntax for extended iterable unpacking</a> are the two main reasons I'm looking forward to Python 3.0. But alas, we'll have to enjoy the unique aspects of Unicode in Python 2.x for a bit longer.</p>
<h3>Input</h3>
<p>I like to keep my programs as bastions of sanity, where all text is handled as Unicode. I thus try to put gatekeepers on all code accepting input, passing it on to the rest of the program logic as Unicode.</p>
<p>Programs that fail to do this often break on text input that their authors were sure would be plain ASCII. One example is file paths. Programmers generally expect paths to consist of nice ASCII characters, which is why their scripts often break when I run them on my Japanese system. For example, on my system the path to the Desktop folder contains Japanese characters:</p>
<div class="dean_ch" style="white-space: wrap;">
C:\Documents and Settings\Ryan Ginstrom\デスクトップ\</div>
<p>When a random Python script breaks after being run from my Desktop folder, I peek inside, and it's invariably because the programmer never expected the path to contain characters that couldn't be expressed as ASCII.</p>
<p>So convert input to Unicode as soon as you get it.</p>
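<p>A gatekeeper can be as small as one function. The following is only a sketch, written in Python 3 syntax (where the text type is <code>str</code> and raw input is <code>bytes</code>); the name <code>to_text</code> and the default encoding are my assumptions here, not something from a library.</p>

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as text, decoding it first if it arrived as bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value

# Bytes coming in from outside the program, e.g. part of a path.
raw = "デスクトップ".encode("utf-8")
folder = to_text(raw)
```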
<p>As mentioned in the article above, the <a href="http://www.python.org/doc/2.5.2/lib/module-codecs.html">codecs</a> module makes reading text files as Unicode very simple:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> <span class="kw3">codecs</span><br />
unitext = <span class="kw3">codecs</span>.<span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;/data.txt&quot;</span>, encoding=<span class="st0">&quot;utf-8&quot;</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<p>There are just a couple of twists to watch out for when using the <code>codecs</code> module.</p>
<ol>
<li>It obviously can't guess the encoding; you've got to figure this out yourself.</li>
<li>When you decode as UTF-8, <code>open()</code> does not strip the UTF-8 byte-order mark (BOM) ('\xef\xbb\xbf'); it comes through as the BOM character (u'\ufeff') at the start of the text. The "utf-16" codec, by contrast, consumes the BOM, whether little- or big-endian. This might not be what you expected.</li>
</ol>
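<p>The BOM behavior is easy to check without touching a file, since <code>codecs.open()</code> uses the same codecs as <code>.decode()</code>. The snippet below is only an illustration with made-up sample data. It also shows the "utf-8-sig" codec (available since Python 2.5), which strips the UTF-8 BOM when you already know the file is UTF-8.</p>

```python
import codecs

text = u"abc"

# Build sample byte strings that start with a BOM
utf8_with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
utf16_with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Plain "utf-8" keeps the BOM as the character U+FEFF
kept = utf8_with_bom.decode("utf-8")

# "utf-8-sig" drops a leading UTF-8 BOM if there is one
stripped = utf8_with_bom.decode("utf-8-sig")

# "utf-16" uses the BOM to pick the byte order, then discards it
utf16 = utf16_with_bom.decode("utf-16")
```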
<p>Because of these <del datetime="2008-11-15T14:52:00+00:00">shortcomings</del> unique aspects of the <code>codecs</code> module, I normally use the <a href="http://chardet.feedparser.org/">chardet</a> module in a custom function to get a random (i.e. user-supplied) text file as Unicode:</p>
<div class="dean_ch" style="white-space: wrap;">
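<span class="kw1">import</span> <span class="kw3">codecs</span><br />
<span class="kw1">import</span> chardet<br />
<br />
<span class="kw1">def</span> bytes2unicode<span class="br0">&#40;</span>bytes, errors=<span class="st0">'replace'</span><span class="br0">&#41;</span>:<br />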
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Convert a byte string into Unicode</p>
<p>&nbsp; &nbsp; Have to chop off the BOM by hand.<br />
&nbsp; &nbsp; Usage:<br />
&nbsp; &nbsp; text = bytes2unicode(open(&quot;</span>somefile.<span class="me1">txt</span><span class="st0">&quot;, &quot;</span>rb<span class="st0">&quot;).read())<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw3">encodings</span> = <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw3">codecs</span>.<span class="me1">BOM_UTF8</span>, <span class="st0">&quot;utf-8&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="kw3">codecs</span>.<span class="me1">BOM_UTF16_LE</span>, <span class="st0">&quot;utf-16-le&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="kw3">codecs</span>.<span class="me1">BOM_UTF16_BE</span>, <span class="st0">&quot;UTF-16BE&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> bom, enc <span class="kw1">in</span> <span class="kw3">encodings</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> bytes.<span class="me1">startswith</span><span class="br0">&#40;</span>bom<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">unicode</span><span class="br0">&#40;</span>bytes<span class="br0">&#91;</span><span class="kw2">len</span><span class="br0">&#40;</span>bom<span class="br0">&#41;</span>:<span class="br0">&#93;</span>, enc, errors=errors<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># No BOM found, so use chardet</span><br />
&nbsp; &nbsp; encoding = chardet.<span class="me1">detect</span><span class="br0">&#40;</span>bytes<span class="br0">&#41;</span>.<span class="me1">get</span><span class="br0">&#40;</span><span class="st0">'encoding'</span><span class="br0">&#41;</span> <span class="kw1">or</span> <span class="st0">'ascii'</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">unicode</span><span class="br0">&#40;</span>bytes, encoding, errors=errors<span class="br0">&#41;</span></div>
<h3>Output</h3>
<p>As I mentioned, I like to get my text into Unicode as early as possible, and keep it as Unicode as late as possible. Ideally, I'd like to just output my text as Unicode, and let the output stream take care of the encoding (if any).</p>
<p>That's why when I need to output Unicode as a stream of bytes, I use the codecs module for files, and wrap the output stream otherwise. This is needed, for example, when using <a href="http://www.python.org/doc/2.5.2/lib/module-cStringIO.html">cStringIO</a>, which chokes on Unicode.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="co1">#coding: UTF8</span><br />
<span class="kw1">import</span> <span class="kw3">cStringIO</span></p>
<p>myval = u<span class="st0">&quot;日本語&quot;</span></p>
<p>out = <span class="kw3">cStringIO</span>.<span class="kw3">StringIO</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
<span class="kw1">print</span> &gt;&gt; out, myval</div>
<p>Error message:</p>
<div class="dean_ch" style="white-space: wrap;">
Traceback (most recent call last):<br />
&nbsp; File &quot;C:\workspace\SpamTest\uni2.py&quot;, line 8, in &lt;module&gt;<br />
&nbsp; &nbsp; print &gt;&gt; out, myval<br />
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)</div>
<p>I can fix this by wrapping <code>out</code> with a class that intercepts the <code>write()</code> method, and converts Unicode strings to the specified encoding just before writing.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">class</span> OutStreamEncoder<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Wraps a stream with an encoder</p>
<p>&nbsp; &nbsp; usage:<br />
&nbsp; &nbsp; out = OutStreamEncoder(out, &quot;</span>utf<span class="nu0">-8</span><span class="st0">&quot;)<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, outstream, encoding<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">out</span> = outstream<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">encoding</span> = encoding</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> write<span class="br0">&#40;</span><span class="kw2">self</span>, obj<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; Wraps the output stream, encoding Unicode<br />
&nbsp; &nbsp; &nbsp; &nbsp; strings with the specified encoding<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">isinstance</span><span class="br0">&#40;</span>obj, <span class="kw2">unicode</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">out</span>.<span class="me1">write</span><span class="br0">&#40;</span>obj.<span class="me1">encode</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">encoding</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">out</span>.<span class="me1">write</span><span class="br0">&#40;</span>obj<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__getattr__</span><span class="br0">&#40;</span><span class="kw2">self</span>, attr<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Delegate everything but 'write' to the stream&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">getattr</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">out</span>, attr<span class="br0">&#41;</span></div>
<p>Now the example above works:</p>
<div class="dean_ch" style="white-space: wrap;">
myval = u<span class="st0">&quot;日本語&quot;</span></p>
<p>out = <span class="kw3">cStringIO</span>.<span class="kw3">StringIO</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
out = OutStreamEncoder<span class="br0">&#40;</span>out, <span class="st0">&quot;utf-8&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">print</span> &gt;&gt; out, myval</div>
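<p>For comparison, the standard library has a similar wrapper: <code>codecs.getwriter()</code> returns a stream-writer class that encodes on <code>write()</code>. Here is a small sketch, using <code>io.BytesIO</code> as a stand-in byte stream so that it runs on both Python 2.6+ and Python 3.</p>

```python
import codecs
import io

# A byte-oriented stream, standing in for a file or socket
raw = io.BytesIO()

# Wrap it in a StreamWriter that encodes Unicode to UTF-8 on write()
out = codecs.getwriter("utf-8")(raw)

# The three Japanese characters from the example above, plus a newline
out.write(u"\u65e5\u672c\u8a9e\n")

encoded = raw.getvalue()
```

<p>The trade-off is that the stream writer expects Unicode only. It does not pass byte strings through untouched the way <code>OutStreamEncoder</code> does, which matters when some of the code writing to the stream still produces encoded bytes.</p>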
<h3>IDLE</h3>
<p><a href="http://www.python.org/idle/doc/idle2.html">IDLE</a> has its own peculiarities regarding Unicode. It actually handles Unicode like a champ, but it assumes that everything you type at the command prompt is in the file-system encoding. Since I'm on a Japanese system, this is "mbcs." You can thus get into some odd states:</p>
<div class="dean_ch" style="white-space: wrap;">
&gt;&gt;&gt; <span class="co1"># A unicode string of multibyte chars as bytes&#8230;</span><br />
&gt;&gt;&gt; u<span class="st0">&quot;日本語&quot;</span><br />
u<span class="st0">'<span class="es0">\x</span>93<span class="es0">\x</span>fa<span class="es0">\x</span>96{<span class="es0">\x</span>8c<span class="es0">\x</span>ea'</span><br />
&gt;&gt;&gt; <span class="co1"># This is what it should be</span><br />
&gt;&gt;&gt; <span class="kw2">unicode</span><span class="br0">&#40;</span><span class="st0">&quot;日本語&quot;</span>, <span class="st0">&quot;mbcs&quot;</span><span class="br0">&#41;</span><br />
u<span class="st0">'<span class="es0">\u</span>65e5<span class="es0">\u</span>672c<span class="es0">\u</span>8a9e'</span></div>
<p>The general way to avoid these problems in IDLE is to decode what you type explicitly, using <code>sys.getfilesystemencoding()</code>:</p>
<div class="dean_ch" style="white-space: wrap;">
&gt;&gt;&gt; <span class="kw1">import</span> <span class="kw3">sys</span><br />
&gt;&gt;&gt; <span class="kw1">print</span> <span class="kw2">unicode</span><span class="br0">&#40;</span><span class="st0">&quot;日本語&quot;</span>, <span class="kw3">sys</span>.<span class="me1">getfilesystemencoding</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
日本語</div>
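<p>The odd state above can be reproduced outside IDLE. In this sketch, "cp932" stands in for "mbcs" on a Japanese Windows system (an assumption, made so that the snippet runs on any platform), and "latin-1" mimics what happens when each byte is taken as one character.</p>

```python
# The three Japanese characters, written with escapes
text = u"\u65e5\u672c\u8a9e"

# What the keyboard input looks like as bytes in the Japanese code page
raw = text.encode("cp932")

# Treating each byte as one character gives the mangled six-character string
mangled = raw.decode("latin-1")

# Decoding with the right code page recovers the original three characters
repaired = raw.decode("cp932")
```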
<h3>Doctests</h3>
<p><a href="http://docs.python.org/lib/module-doctest.html">doctest</a> is so full of snow-flaky uniqueness, I could put cherry syrup on it and call it a snow cone. Note in the example below that my "is_asian" function's doctests contain a Japanese character (日).</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="co1">#coding: UTF8</span></p>
<p><span class="co1"># 0x3000 is ideographic space (i.e. double-byte space)</span><br />
IDEOGRAPHIC_SPACE = 0x3000</p>
<p><span class="kw1">def</span> is_asian<span class="br0">&#40;</span>char<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Is the character Asian?</p>
<p>&nbsp; &nbsp; &gt;&gt;&gt; is_asian(u'a')<br />
&nbsp; &nbsp; False<br />
&nbsp; &nbsp; &gt;&gt;&gt; is_asian(u'日')<br />
&nbsp; &nbsp; True<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">ord</span><span class="br0">&#40;</span>char<span class="br0">&#41;</span> &gt; IDEOGRAPHIC_SPACE</div>
<p>Running doctest on this gives a rather cryptic error:</p>
<div class="dean_ch" style="white-space: wrap;">
Failed example:<br />
&nbsp; &nbsp; is_asian(u'日')<br />
Exception raised:<br />
&nbsp; &nbsp; Traceback (most recent call last):<br />
&nbsp; &nbsp; &nbsp; File &quot;C:\Python25\lib\doctest.py&quot;, line 1228, in __run<br />
&nbsp; &nbsp; &nbsp; &nbsp; compileflags, 1) in test.globs<br />
&nbsp; &nbsp; &nbsp; File &quot;&lt;doctest __main__.is_asian[1]&gt;&quot;, line 1, in &lt;module&gt;<br />
&nbsp; &nbsp; &nbsp; &nbsp; is_asian(u'日')<br />
&nbsp; &nbsp; &nbsp; File &quot;C:\workspace\SpamTest\uni1.py&quot;, line 15, in is_asian<br />
&nbsp; &nbsp; &nbsp; &nbsp; return ord(char) &gt; IDEOGRAPHIC_SPACE<br />
&nbsp; &nbsp; TypeError: ord() expected a character, but string of length 3 found</div>
<p>It turns out that doctest can't handle non-ASCII characters in a plain (byte-string) docstring. It's making the same "string of UTF-8 bytes as Unicode characters" mistake as IDLE, and thus interpreting one character ("日") as three.</p>
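<p>The "string of length 3" in the traceback is simply the UTF-8 encoding of that one character. A quick illustration (nothing here is specific to doctest; the "latin-1" step just imitates the bytes-as-characters mistake):</p>

```python
# The Japanese character from the doctest, written as an escape
char = u"\u65e5"

# One character, but three bytes once encoded as UTF-8
utf8_bytes = char.encode("utf-8")

# Taking each byte as a character yields a three-character string,
# which ord() rejects just as in the traceback
as_three_chars = utf8_bytes.decode("latin-1")
try:
    ord(as_three_chars)
    error_raised = False
except TypeError:
    error_raised = True
```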
<p>So we have to trick doctest by taking the repr value of the Unicode text (I usually stick the actual characters in a comment above it). Here's a repaired version, which runs without errors:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">def</span> is_asian<span class="br0">&#40;</span>char<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Repaired version of doctests</p>
<p>&nbsp; &nbsp; &gt;&gt;&gt; is_asian(u'a')<br />
&nbsp; &nbsp; False<br />
&nbsp; &nbsp; &gt;&gt;&gt; # u'日'<br />
&nbsp; &nbsp; &gt;&gt;&gt; is_asian(u'<span class="es0">\u</span>65e5')<br />
&nbsp; &nbsp; True<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">ord</span><span class="br0">&#40;</span>char<span class="br0">&#41;</span> &gt; IDEOGRAPHIC_SPACE</div>
<p>To see the silver lining in this, at least it encourages you to keep your complicated tests in unit tests, and save doctests for simple, illustrative purposes.</p>
<h3>Conclusion</h3>
<p>Unicode support in Python is actually quite good &#8212; much better than in most languages. And it will get even better with Python 3.0. In the meantime, however, there are a few gotchas to look out for when using Unicode in Python.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/11/16/notes-for-using-unicode-with-python-2x/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Generic adapter class in Python</title>
		<link>http://ginstrom.com/scribbles/2008/11/06/generic-adapter-class-in-python/</link>
		<comments>http://ginstrom.com/scribbles/2008/11/06/generic-adapter-class-in-python/#comments</comments>
		<pubDate>Thu, 06 Nov 2008 02:35:36 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[adapter]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=554</guid>
		<description><![CDATA[The adapter pattern is often used in programming when you need to adapt one interface to another. Here's a simple generic adapter class that can adapt just about any interface to just about any other. class Adapter&#40;object&#41;: &#160; &#160; &#34;&#34;&#34; &#160; &#160; Adapts an object by replacing methods. &#160; &#160; Usage: &#160; &#160; dog = [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://en.wikipedia.org/wiki/Adapter_pattern">adapter pattern</a> is often used in programming when you need to adapt one interface to another. Here's a simple generic adapter class that can adapt just about any interface to just about any other.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">class</span> Adapter<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Adapts an object by replacing methods.<br />
&nbsp; &nbsp; Usage:<br />
&nbsp; &nbsp; dog = Dog()<br />
&nbsp; &nbsp; dog = Adapter(dog, dict(make_noise=dog.bark))<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, obj, adapted_methods<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;We set the adapted methods in the object's dict&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">obj</span> = obj<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="kw4">__dict__</span>.<span class="me1">update</span><span class="br0">&#40;</span>adapted_methods<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__getattr__</span><span class="br0">&#40;</span><span class="kw2">self</span>, attr<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;All non-adapted calls are passed to the object&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">getattr</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">obj</span>, attr<span class="br0">&#41;</span></div>
<p>This adapter can be used to adapt many objects with different interfaces into a single, unified interface.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> <span class="kw3">os</span></p>
<p><span class="kw1">class</span> Dog<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">name</span> = <span class="st0">&quot;Dog&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> bark<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="st0">&quot;woof!&quot;</span></p>
<p><span class="kw1">class</span> Cat<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">name</span> = <span class="st0">&quot;Cat&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> meow<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="st0">&quot;meow!&quot;</span></p>
<p><span class="kw1">class</span> Human<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">name</span> = <span class="st0">&quot;Human&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> speak<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="st0">&quot;'hello'&quot;</span></p>
<p><span class="kw1">class</span> Car<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">name</span> = <span class="st0">&quot;Car&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> make_noise<span class="br0">&#40;</span><span class="kw2">self</span>, octane_level<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="st0">&quot;vroom%s&quot;</span> % <span class="br0">&#40;</span><span class="st0">&quot;!&quot;</span> * octane_level<span class="br0">&#41;</span></p>
<p><span class="kw1">class</span> Adapter<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Adapts an object by replacing methods.<br />
&nbsp; &nbsp; Usage:<br />
&nbsp; &nbsp; dog = Dog()<br />
&nbsp; &nbsp; dog = Adapter(dog, dict(make_noise=dog.bark))<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, obj, adapted_methods<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;We set the adapted methods in the object's dict&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">obj</span> = obj<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="kw4">__dict__</span>.<span class="me1">update</span><span class="br0">&#40;</span>adapted_methods<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__getattr__</span><span class="br0">&#40;</span><span class="kw2">self</span>, attr<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;All non-adapted calls are passed to the object&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">getattr</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">obj</span>, attr<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; objects = <span class="br0">&#91;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; dog = Dog<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; objects.<span class="me1">append</span><span class="br0">&#40;</span>Adapter<span class="br0">&#40;</span>dog, <span class="kw2">dict</span><span class="br0">&#40;</span>make_noise=dog.<span class="me1">bark</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; cat = Cat<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; objects.<span class="me1">append</span><span class="br0">&#40;</span>Adapter<span class="br0">&#40;</span>cat, <span class="kw2">dict</span><span class="br0">&#40;</span>make_noise=cat.<span class="me1">meow</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; human = Human<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; objects.<span class="me1">append</span><span class="br0">&#40;</span>Adapter<span class="br0">&#40;</span>human, <span class="kw2">dict</span><span class="br0">&#40;</span>make_noise=human.<span class="me1">speak</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; car = Car<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; car_noise = <span class="kw1">lambda</span> : car.<span class="me1">make_noise</span><span class="br0">&#40;</span><span class="nu0">3</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; objects.<span class="me1">append</span><span class="br0">&#40;</span>Adapter<span class="br0">&#40;</span>car, <span class="kw2">dict</span><span class="br0">&#40;</span>make_noise=car_noise<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> obj <span class="kw1">in</span> objects:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;A&quot;</span>, obj.<span class="me1">name</span>, <span class="st0">&quot;goes&quot;</span>, obj.<span class="me1">make_noise</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; main<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<p>Output:</p>
<div class="dean_ch" style="white-space: wrap;">
A Dog goes woof!<br />
A Cat goes meow!<br />
A Human goes 'hello'<br />
A Car goes vroom!!!</div>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/11/06/generic-adapter-class-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t overuse classes in Python</title>
		<link>http://ginstrom.com/scribbles/2008/10/06/dont-overuse-classes-in-python/</link>
		<comments>http://ginstrom.com/scribbles/2008/10/06/dont-overuse-classes-in-python/#comments</comments>
		<pubDate>Mon, 06 Oct 2008 09:46:46 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[classes]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=386</guid>
		<description><![CDATA[Unlike some mainstream languages like Java, you don't have to package everything into a class in Python. A class is a good tool when you want to package up state and behavior, but when all you've got is a bundle of related functionality, the module is the natural unit of packaging in Python. In my [...]]]></description>
			<content:encoded><![CDATA[<p>Unlike some mainstream languages like Java, you don't have to package everything into a class in Python. A class is a good tool when you want to package up state and behavior, but when all you've got is a bundle of related functionality, the module is the natural unit of packaging in Python.</p>
<p>In my opinion, <a href="http://homepage.mac.com/s_lott/iblog/architecture/C551260341/E20081005191603/index.html">this article</a> is an egregious example of overuse of classes. I don't want to pick on the author in particular, but it illustrates my point so well that I want to examine the article's code here.</p>
<p>The article was about using Python for exploratory programming, but I think that the class-heavy style makes things more complicated than they need to be. The classes in the code essentially have no state. The one exception is <code>TopRowsWBZipContent</code>, where state is passed into the <code>__init__</code> method, but is only used in one method and could just as easily have been passed in there. The author also uses extensive inheritance to get the various methods onto the class instances, where if vanilla functions were used, that could all be eliminated.</p>
<p>Here, I want to post the code from the article, and below that my rewrite using plain functions.</p>
<p>First, the article's code (I've made some of the interspersed text into comments):</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="co1"># Let's look at the first class definition.</span><br />
<span class="co1"># It isn't very interesting, but it shows the design pattern.</span><br />
<span class="kw1">class</span> Operation<span class="br0">&#40;</span> <span class="kw2">object</span> <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> processList<span class="br0">&#40;</span> <span class="kw2">self</span>, files <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> fileName <span class="kw1">in</span> files:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">processFile</span><span class="br0">&#40;</span> fileName <span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> processFile<span class="br0">&#40;</span> <span class="kw2">self</span>, fileName <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">pass</span></p>
<p><span class="co1"># Here's a subclass that provides that process.</span><br />
<span class="kw1">class</span> ZipContent<span class="br0">&#40;</span> Operation <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> processFile<span class="br0">&#40;</span> <span class="kw2">self</span>, fileName <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">zip</span>= <span class="kw3">zipfile</span>.<span class="me1">ZipFile</span><span class="br0">&#40;</span> fileName <span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> member <span class="kw1">in</span> <span class="kw2">zip</span>.<span class="me1">infolist</span><span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span> fileName,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; member.<span class="me1">filename</span> <span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">examineMember</span><span class="br0">&#40;</span> <span class="kw2">zip</span>, member <span class="br0">&#41;</span></p>
<p><span class="co1"># Here's the next subclass.</span><br />
<span class="co1"># It opens each zip archive member as a workbook,</span><br />
<span class="co1"># using the xlrd module.</span><br />
<span class="kw1">class</span> WBZipContent<span class="br0">&#40;</span> ZipContent <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> examineMember<span class="br0">&#40;</span> <span class="kw2">self</span>, zipFile, member <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; contents= zipFile.<span class="me1">read</span><span class="br0">&#40;</span> member.<span class="me1">filename</span> <span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; wb= xlrd.<span class="me1">open_workbook</span><span class="br0">&#40;</span> file_contents=contents,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; filename=member.<span class="me1">filename</span> <span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> sheet <span class="kw1">in</span> wb.<span class="me1">sheets</span><span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">examineSheet</span><span class="br0">&#40;</span> wb, sheet <span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> examineSheet<span class="br0">&#40;</span> <span class="kw2">self</span>, wb, sheet <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;&gt; &nbsp;Sheet %s %d rows&quot;</span> % <span class="br0">&#40;</span>sheet.<span class="me1">name</span>, sheet.<span class="me1">nrows</span> <span class="br0">&#41;</span></p>
<p><span class="co1"># Exploring the Workbook sheets</span><br />
<span class="co1"># Here's sprint three of the application.</span><br />
<span class="co1"># This is yet another subclass.</span><br />
<span class="kw1">class</span> TopRowsWBZipContent<span class="br0">&#40;</span> WBZipContent <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span> <span class="kw2">self</span>, topnRows=<span class="nu0">5</span> <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">super</span><span class="br0">&#40;</span> TopRowsWBZipContent, <span class="kw2">self</span> <span class="br0">&#41;</span>.<span class="kw4">__init__</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">topnRows</span>= topnRows<br />
&nbsp; &nbsp; <span class="kw1">def</span> examineSheet<span class="br0">&#40;</span> <span class="kw2">self</span>, wb, sheet <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;&gt; &nbsp;Sheet %s %d rows&quot;</span> % <span class="br0">&#40;</span>sheet.<span class="me1">name</span>, sheet.<span class="me1">nrows</span> <span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">self</span>.<span class="me1">topnRows</span> <span class="kw1">is</span> <span class="kw2">None</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; limit= sheet.<span class="me1">nrows</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; limit= <span class="kw2">min</span><span class="br0">&#40;</span> <span class="kw2">self</span>.<span class="me1">topnRows</span>, sheet.<span class="me1">nrows</span> <span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> r <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span>limit<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; row= sheet.<span class="me1">row</span><span class="br0">&#40;</span>r<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> r, <span class="br0">&#91;</span> c.<span class="me1">value</span> <span class="kw1">for</span> c <span class="kw1">in</span> row <span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> manual<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Change the options manually.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <span class="co1">#op= ZipContent() # What's in the ZIP files?</span><br />
&nbsp; &nbsp; <span class="co1"># What does the data look like?</span><br />
&nbsp; &nbsp; <span class="co1">#op= TopRowsWBZipContent( topnRows=5 )</span><br />
&nbsp; &nbsp; op= ExtractCSVWBZipContent<span class="br0">&#40;</span><span class="st0">&quot;../data&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; files = <span class="kw3">glob</span>.<span class="kw3">glob</span><span class="br0">&#40;</span> <span class="st0">&quot;../data/*.zip&quot;</span> <span class="br0">&#41;</span><br />
&nbsp; &nbsp; op.<span class="me1">processList</span><span class="br0">&#40;</span> files <span class="br0">&#41;</span></div>
<p>Here's my rewrite using vanilla functions. The code is now a lot shorter, and I think easier to understand. (It's also easier to test. Yes, I believe in unit testing exploratory code, at least once it settles down a bit.) I've been a bit snooty and used more Pythonic coding conventions while I was at it.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> <span class="kw3">glob</span><br />
<span class="kw1">import</span> <span class="kw3">zipfile</span><br />
<span class="kw1">import</span> xlrd</p>
<p><span class="kw1">def</span> process_files<span class="br0">&#40;</span>filenames<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Process a list of files.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> filename <span class="kw1">in</span> filenames:<br />
&nbsp; &nbsp; &nbsp; &nbsp; process_file<span class="br0">&#40;</span>filename<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> process_file<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Examine a zipped file.<br />
&nbsp; &nbsp; Configure &quot;</span>examine_member<span class="st0">&quot; below to<br />
&nbsp; &nbsp; customize behavior.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; zipped = <span class="kw3">zipfile</span>.<span class="me1">ZipFile</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> member <span class="kw1">in</span> zipped.<span class="me1">infolist</span><span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s : %s&quot;</span> % <span class="br0">&#40;</span>filename,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; member.<span class="me1">filename</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; examine_member<span class="br0">&#40;</span>zipped, member<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> examine_workbook<span class="br0">&#40;</span>zipped, member<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Examine a workbook. Open up and process each<br />
&nbsp; &nbsp; sheet in the workbook using the xlrd module.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; contents = zipped.<span class="me1">read</span><span class="br0">&#40;</span>member.<span class="me1">filename</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">try</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; wb = xlrd.<span class="me1">open_workbook</span><span class="br0">&#40;</span>file_contents=contents,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; filename=member.<span class="me1">filename</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">except</span> xlrd.<span class="me1">biffh</span>.<span class="me1">XLRDError</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;Not an Excel file&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> sheet <span class="kw1">in</span> wb.<span class="me1">sheets</span><span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; examine_sheet<span class="br0">&#40;</span>wb, sheet<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> examine_sheet<span class="br0">&#40;</span>wb, sheet, top_n_rows=<span class="nu0">5</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Examine a worksheet. Print top_n_rows, or<br />
&nbsp; &nbsp; all rows in the sheet if top_n_rows is 0/None.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;&gt; &nbsp;Sheet %s %d rows&quot;</span> % <span class="br0">&#40;</span>sheet.<span class="me1">name</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sheet.<span class="me1">nrows</span><span class="br0">&#41;</span><br />
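&nbsp; &nbsp; <span class="co1"># never read past the last row of a short sheet</span><br />
&nbsp; &nbsp; limit = <span class="kw2">min</span><span class="br0">&#40;</span>top_n_rows <span class="kw1">or</span> sheet.<span class="me1">nrows</span>, sheet.<span class="me1">nrows</span><span class="br0">&#41;</span><br />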
&nbsp; &nbsp; <span class="kw1">for</span> r <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span>limit<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; row = sheet.<span class="me1">row</span><span class="br0">&#40;</span>r<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> r, <span class="br0">&#91;</span>c.<span class="me1">value</span> <span class="kw1">for</span> c <span class="kw1">in</span> row<span class="br0">&#93;</span></p>
<p><span class="co1"># configure behavior like this</span><br />
examine_member = examine_workbook</p>
<p><span class="kw1">def</span> manual<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; Run when called as main. Gets all<br />
&nbsp; &nbsp; the zip files in an arbitrary folder<br />
&nbsp; &nbsp; and processes them.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; filenames = <span class="kw3">glob</span>.<span class="kw3">glob</span><span class="br0">&#40;</span><span class="st0">&quot;stuff/*.zip&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; process_files<span class="br0">&#40;</span>filenames<span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; manual<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<p>Note that the function-based version is even better suited to exploratory programming than the class-based code: there is no class machinery to write, and we can swap functions in and out without inheritance or other abuses.</p>
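<p>Here is a minimal sketch of swapping functions in and out; it is not part of the original script. Instead of rebinding a module-level <code>examine_member</code>, the examiner is passed to <code>process_file</code> as an argument. The sketch assumes Python 3 (the listings above are Python 2) and uses only the standard library. <code>list_member</code>, <code>read_first_line</code>, and <code>make_sample_zip</code> are hypothetical helpers invented for illustration, and the examiners return strings rather than printing so the results are easy to check.</p>
<pre>
```python
import io
import zipfile


def list_member(zipped, member):
    """Hypothetical examiner: describe a member without opening it."""
    return "%s (%d bytes)" % (member.filename, member.file_size)


def read_first_line(zipped, member):
    """Hypothetical examiner: return the first line of a member as text."""
    data = zipped.read(member.filename)
    lines = data.decode("utf-8", "replace").splitlines()
    return lines[0] if lines else ""


def process_file(file_or_name, examine_member=list_member):
    """Apply examine_member to each member of a zip archive.

    The examiner is an ordinary argument rather than a module-level
    name, so callers (and tests) can swap it per call.
    """
    results = []
    with zipfile.ZipFile(file_or_name) as zipped:
        for member in zipped.infolist():
            results.append(examine_member(zipped, member))
    return results


def make_sample_zip():
    """Build a small in-memory zip so the sketch needs no files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zipped:
        zipped.writestr("a.csv", "id,name\n1,spam\n")
        zipped.writestr("b.txt", "hello\nworld\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Default examiner: list what is in the archive.
    print(process_file(make_sample_zip()))
    # Swap in a different examiner for this call only.
    print(process_file(make_sample_zip(), examine_member=read_first_line))
```
</pre>
<p>Because each examiner is just a function taking <code>(zipped, member)</code>, a test can hand in a stub or a lambda together with an in-memory archive, and no subclass is needed.</p>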
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/10/06/dont-overuse-classes-in-python/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
