<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The GITS Blog &#187; localization</title>
	<atom:link href="http://ginstrom.com/scribbles/category/localization/feed/" rel="self" type="application/rss+xml" />
	<link>http://ginstrom.com/scribbles</link>
	<description>Random scribbling about programming, translation, and Japan</description>
	<lastBuildDate>Thu, 05 Aug 2010 13:07:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Japanese/Western mobile website aesthetics</title>
		<link>http://ginstrom.com/scribbles/2009/12/20/japanesewestern-mobile-website-aesthetics/</link>
		<comments>http://ginstrom.com/scribbles/2009/12/20/japanesewestern-mobile-website-aesthetics/#comments</comments>
		<pubDate>Sun, 20 Dec 2009 07:25:12 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[localization]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=1436</guid>
		<description><![CDATA[About a year ago, I wrote about the differences in Website aesthetics between Japan and the West. I was recently translating the review of a redesigned mobile website, and found a similar aesthetic. The mobile site was for the Japanese subsidiary of a major European brand/fashion corporation. The company had changed the site from a [...]]]></description>
			<content:encoded><![CDATA[<p>About a year ago, I wrote about the <a href="/scribbles/2009/01/05/differences-in-japanese-and-western-website-aesthetics/">differences in Website aesthetics between Japan and the West</a>. I was recently translating the review of a redesigned mobile website, and found a similar aesthetic.</p>
<p>The mobile site was for the Japanese subsidiary of a major European brand/fashion corporation. The company had changed the site from a cutesy, plastered-with-cartoon-animals design to a clean, stylish design inspired by the iPhone, and in fact designed specifically for compatibility with the iPhone. As an example, the page background was changed from white to a wood-inspired, shaded brown. The amount of text on each page was pared down considerably as well.</p>
<p>The company hired to evaluate the site did a focus group-style study with the target audience, and found that the subjects almost universally preferred the old, "busy" site, and found the new site 殺風景 ("drab").</p>
<p>Part of this dreariness may have been an attempt to appeal less to young women, who give lots of eyeballs but not much revenue, and more to older women, who might actually buy some of that overpriced stuff. But the study showed that even women in their 30s and 40s liked the old site design better, cutesy cartoon kittens and all.</p>
<p>On the one hand, the company wants to maintain a consistent international image (one remark was that the new site design conforms to the global brand image). But I saw this as further proof that you need to design websites for your audience, not necessarily according to what looks good to you, especially when marketing in different cultures.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2009/12/20/japanesewestern-mobile-website-aesthetics/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Great resource for translating software docs into English</title>
		<link>http://ginstrom.com/scribbles/2009/12/17/great-resource-for-translating-software-docs-into-english/</link>
		<comments>http://ginstrom.com/scribbles/2009/12/17/great-resource-for-translating-software-docs-into-english/#comments</comments>
		<pubDate>Thu, 17 Dec 2009 02:52:02 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[localization]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=1429</guid>
		<description><![CDATA[Often when translating software documentation from Japanese to English, I'll have to find the exact corresponding English names for various OS and other software components. These are things that you can't just make up, because the user will be looking for that actual text on her computer. I recently discovered a site that makes this [...]]]></description>
			<content:encoded><![CDATA[<p>Often when translating software documentation from Japanese to English, I'll have to find the exact corresponding English names for various OS and other software components. These are things that you can't just make up, because the user will be looking for that actual text on her computer.</p>
<p>I recently discovered a site that makes this a lot easier: <a href="http://screenshots.modemhelp.net/">http://screenshots.modemhelp.net/</a>. This site has screen shots for Windows and Mac operating systems, as well as popular software, all organized and labeled.</p>
<p>Say, for example, you need to get the names of <a href="http://screenshots.modemhelp.net/screenshots/Windows_2000/Control_Panel/Index.shtml">Control Panel icons from Windows 2000</a>, or the menu items on the <a href="http://screenshots.modemhelp.net/screenshots/Windows_Vista/Desktop/Vista/%28Recycle_Bin%29.shtml">context menu for the Recycle bin in Windows Vista</a>, or the <a href="http://screenshots.modemhelp.net/screenshots/Macintosh_OS_v10.1/System_Preferences/Index.shtml">System Preferences icons for Mac OS X</a>. Yep, it's all there.</p>
<p>Unfortunately, the site doesn't appear to have screen shots for Windows 7 yet, but at least those are substantially similar to Vista's.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2009/12/17/great-resource-for-translating-software-docs-into-english/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Localization: Just &#8220;translating the words&#8221; doesn&#8217;t cut it</title>
		<link>http://ginstrom.com/scribbles/2008/12/21/localization-just-translating-the-words-doesnt-cut-it/</link>
		<comments>http://ginstrom.com/scribbles/2008/12/21/localization-just-translating-the-words-doesnt-cut-it/#comments</comments>
		<pubDate>Sun, 21 Dec 2008 02:12:21 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[localization]]></category>
		<category><![CDATA[translation]]></category>
		<category><![CDATA[english]]></category>
		<category><![CDATA[Japanese]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/?p=736</guid>
		<description><![CDATA[This month I've been getting ready to make the trip to IJET-20 in Sydney, Australia. I booked my flight to Australia online via Jetstar. Using amazing high-tech IP-geolocation techniques, Jetstar figured out that I was in Japan and decided to treat me to its Japanese-language site. Fair enough; but if you're going to foist off [...]]]></description>
			<content:encoded><![CDATA[<p>This month I've been getting ready to make the trip to <a href="http://ijet.jat.org/ijet-20/">IJET-20</a> in Sydney, Australia.</p>
<p>I booked my flight to Australia online via <a href="http://www.jetstar.com/">Jetstar</a>. Using amazing high-tech IP-geolocation techniques, Jetstar figured out that I was in Japan and decided to treat me to its Japanese-language site. Fair enough; but if you're going to foist off your localized site, you ought to make sure you get it right.</p>
<p>And Jetstar didn't quite. The most egregious example was their confirmation email, which started like this:</p>
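<div class="dean_ch" style="white-space: wrap;"><p>お客さまはジェットメールに登録されました。</p>
<p>&nbsp;&nbsp; &nbsp; Ryanさま</p>
<p>ジェットメール/ジェットテキストのご登録ありがとうございます。 –ジェットスターは「オールデイ・エブリデイ・ローフェア」でエアラインの新常識を提供していく航空会社です。</p></div>
<p>(In English: "You have been registered for Jet Mail. / Ryan-sama / Thank you for registering for Jet Mail/Jet Text. Jetstar is an airline bringing a new standard to air travel with 'All Day, Every Day, Low Fares.'")</p>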
<p>Imagining for a moment that I am Japanese (Jetstar did, so why can't you?), the use of my first name followed by the Hiragana "sama" is really out of place. As a customer, I'd expect my last name (Ginstrom) to be properly written in Japanese characters (ジンストロム), followed by the Kanji character for "sama" (様).</p>
<p>Although using the first name is fairly common in the English-speaking world, it's a big no-no in Japanese customer communication.</p>
<h3>A word about templates</h3>
<p>The problem stems from the use of templates for emails and dynamic Web pages. I imagine that the original email template went something like this:</p>
<div class="dean_ch" style="white-space: wrap;">
<p>&nbsp; &nbsp; Dear $name:</p>
<p>Blah blah blah JETSTAR blah blah blah&#8230;</p></div>
<p>Where "$name" will be replaced dynamically from the database with the first name.</p>
<p>The site developers must have just passed off this template to be translated. The translator has no control over what gets written for the "$name" value, which the computer is going to fill in with "Ryan," or more normally, "Hanako" or "Taro." The translator, faced with a no-win situation, probably opted for the hiragana "sama" (さま) because it looks less strange with a first name in Roman characters than the Kanji "sama" (様) would.</p>
<div class="dean_ch" style="white-space: wrap;">
<p>&nbsp; &nbsp; $nameさま</p>
<p>ジェットメール/ジェットテキストのご登録ありがとうございます。&#8230;</p></div>
<h3>Localization isn't (just) translation</h3>
<p>What they needed to do was modify the template so that the customer's last name would be entered in Japanese characters (Kanji for most Japanese customers, Katakana for a name like mine). This would have made it possible to create a proper Japanese-language email template. Whether Jetstar failed to do this out of ignorance or cheapness (not wanting to incur the development costs), it's still a localization fail.</p>
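<p>Here is a rough sketch of what "modifying the template" could mean in code. It assumes the customer record carries a separate family-name field in Japanese script. Each locale gets its own template, and the Japanese one uses that field with the kanji honorific instead of the first name. The field names <code>first_name</code> and <code>family_name_ja</code> and the helper <code>render_greeting</code> are hypothetical. The example uses Python's standard <code>string.Template</code>.</p>

```python
from string import Template

# Hypothetical per-locale greeting templates. The Japanese template asks for a
# different field (family name in Japanese script) and uses the kanji honorific.
TEMPLATES = {
    "en": Template("Dear $first_name:"),
    "ja": Template("${family_name_ja}様"),
}


def render_greeting(locale, customer):
    """Fill the locale's template from a customer record (a plain dict here)."""
    # substitute() raises KeyError if the record lacks a field the template
    # needs, so a missing Japanese name fails loudly instead of falling back.
    return TEMPLATES[locale].substitute(customer)


customer = {"first_name": "Ryan", "family_name_ja": "ジンストロム"}
print(render_greeting("en", customer))  # Dear Ryan:
print(render_greeting("ja", customer))  # ジンストロム様
```

<p>Because <code>substitute()</code> raises <code>KeyError</code> when a field is missing, a record without a Japanese name fails loudly rather than quietly producing "Ryanさま".</p>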
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/12/21/localization-just-translating-the-words-doesnt-cut-it/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>My solution to the localization horror story</title>
		<link>http://ginstrom.com/scribbles/2007/10/11/my-solution-to-the-localization-horror-story/</link>
		<comments>http://ginstrom.com/scribbles/2007/10/11/my-solution-to-the-localization-horror-story/#comments</comments>
		<pubDate>Thu, 11 Oct 2007 08:55:01 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
				<category><![CDATA[localization]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2007/10/11/my-solution-to-the-localization-horror-story/</guid>
		<description><![CDATA[The localization horror story in this CPAN article about the Locale::Maketext module tells of the combinatory explosion of translation "rules" required when localizing text with variables (placements) into multiple languages. Since I translate (and localize) Japanese to English, this is a problem that really strikes home with me. Here's a simple example of this problem [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://search.cpan.org/dist/Locale-Maketext/lib/Locale/Maketext/TPJ13.pod#A_Localization_Horror_Story:_It_Could_Happen_To_You">localization horror story</a> in this <a href="http://search.cpan.org/dist/Locale-Maketext/lib/Locale/Maketext/TPJ13.pod">CPAN article about the Locale::Maketext module</a> tells of the combinatory explosion of translation "rules" required when localizing text with variables (placements) into multiple languages.</p>
<p>Since I translate (and localize) Japanese to English, this is a problem that really strikes home with me. Here's a simple example of this problem in Japanese:</p>
<p><span style="color: blue">%s件のコメントを削除しました</span></p>
<p>Japanese lacks a plural marker, so this would be translated as "<span style="color: green">Deleted 1 comment</span>" or "<span style="color: green">Deleted <em>n</em> comments</span>," depending on the value assigned to the variable "%s" &#8212; the Japanese is the same for both the singular and plural cases.</p>
<p>There's an obvious conundrum here: we have to make a distinction in the translation that doesn't exist in the source text. How can you localize this text without changing the source code of the software (often not feasible) or teaching the developers enough English to code for these differences (even less likely to happen)?</p>
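<p>To make the conundrum concrete, here is a minimal sketch of the usual one-string-in, one-string-out lookup. The names <code>catalog</code> and <code>naive_gettext</code> are illustrative. The Japanese source is identical for every count, so a single English entry has to cover both cases, and the singular comes out wrong.</p>

```python
# A plain one-to-one catalog: one source string, one translation.
catalog = {"%s件のコメントを削除しました": "Deleted %s comments"}


def naive_gettext(msgid, *args):
    """Look up the translation (falling back to the source), then fill it in."""
    return catalog.get(msgid, msgid) % args


print(naive_gettext("%s件のコメントを削除しました", 2))  # Deleted 2 comments
print(naive_gettext("%s件のコメントを削除しました", 1))  # Deleted 1 comments (wrong)
```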
<h3>The article's proposed solution</h3>
<p>The authors propose a solution to this problem in the form of a rules-based system. (One example is <span style="color: green">"You have [quant,_1,piece] of new mail."</span>). I see two big problems with this system: (1) you're hard-coding your rules into your code, and relatedly (2) you'll likely have to modify the actual code (or at least page template) every time you add a new language.</p>
<p>Getting all the possible rules for any language would pretty much mean writing a machine translation system, and we all know how successful they are.</p>
<p>This is actually a pretty common beginner's solution. You realize that simple string substitution doesn't work for translating sentences, so you think: I know, I'll add some rules. In the end, though, this never works &#8212; with every new sentence and language you'll be adding new rules. And languages tend to vary along different axes, so your rule system grows increasingly complex.</p>
<p>Look at <a href="http://www.google.com/translate_t">Google Translate</a>.  That's based on Systran, a rule-based system developed in the late 1960s. You can probably guess that in the 40-odd years since, they still haven't got the rules right.</p>
<p>If you only have a fixed number of sentences in a fixed number of languages to translate, the rules approach could work. But the problem that the authors are trying to address is arbitrarily many languages &#8212; and I'd add the probability that you'll have more text to translate later.</p>
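<p>To see where the rules end up living, here is a rough Python imitation of the bracket notation quoted above. It is not Maketext's actual (Perl) implementation; <code>quant</code>, <code>RULES</code>, and <code>expand</code> are illustrative names. The English-only pluralization logic sits in a function, so a language with different plural behavior needs new code rather than new data.</p>

```python
import re


def quant(num, singular, plural=None):
    """Rough imitation of a quant-style rule: a number plus a singular or plural noun.
    Defaulting the plural to singular + "s" is an English-only assumption."""
    num = int(num)
    noun = singular if num == 1 else (plural or singular + "s")
    return "%d %s" % (num, noun)


# The rule table lives in code: every new rule means a code change.
RULES = {"quant": quant}


def expand(template, *args):
    """Replace [rule,_N,arg,...] brackets by calling the named rule function."""
    def call(match):
        name, *params = match.group(1).split(",")
        # "_1" refers to the first argument, "_2" to the second, and so on.
        values = [args[int(p[1:]) - 1] if p.startswith("_") else p for p in params]
        return str(RULES[name](*values))
    return re.sub(r"\[([^\]]+)\]", call, template)


print(expand("You have [quant,_1,piece] of new mail.", 1))  # You have 1 piece of new mail.
print(expand("You have [quant,_1,piece] of new mail.", 3))  # You have 3 pieces of new mail.
```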
<h3>My solution</h3>
<p>I propose that instead of hard-coding rules, we produce concrete translations for the exceptions. This follows a general principle of moving vectors of change out of the code whenever possible.</p>
<p>Then we write an "improved" gettext that first fills in the variable and checks for a matching translation of the filled-in text. If there isn't one, it falls back to the translation with the placeholder. Here's how a Python version might look:</p>
<div class="dean_ch" style="white-space: wrap;">
<p><span class="co1">#coding: UTF8</span></p>
<p>trans_dict = <span class="br0">&#123;</span>u<span class="st0">&quot;スパム&quot;</span> : <span class="st0">&quot;spam&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; u<span class="st0">&quot;1件のコメントを削除しました&quot;</span> : <span class="st0">&quot;Deleted 1 comment&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; u<span class="st0">&quot;%s件のコメントを削除しました&quot;</span> : <span class="st0">&quot;Deleted %s comments&quot;</span> <span class="br0">&#125;</span></p>
<p><span class="kw1">def</span> get_translation<span class="br0">&#40;</span>msgid<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">return</span> trans_dict.<span class="me1">get</span><span class="br0">&#40;</span>msgid<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> get_and_fill<span class="br0">&#40;</span>msgid, *args<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the translation, then fill in the variables&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; trans = get_translation<span class="br0">&#40;</span>msgid<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> trans:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> trans % args<br />
&nbsp; &nbsp; <span class="co1">#If we didn't find any translation&#8230;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> msgid % args</p>
<p><span class="kw1">def</span> fill<span class="br0">&#40;</span>msgid, *args<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;See if we have a filled-in translation,<br />
&nbsp; &nbsp; otherwise get the translation and fill it&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; trans = get_translation<span class="br0">&#40;</span>msgid % args<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> trans <span class="kw1">or</span> get_and_fill<span class="br0">&#40;</span>msgid, *args<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> <span class="kw3">gettext</span><span class="br0">&#40;</span>msgid, *args<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Improved gettext that checks for filled-in versions&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># Base case &#8212; no variables</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> args:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> get_translation<span class="br0">&#40;</span>msgid<span class="br0">&#41;</span> <span class="kw1">or</span> msgid<br />
&nbsp; &nbsp; <span class="co1"># Try filling it in</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> fill<span class="br0">&#40;</span>msgid, *args<span class="br0">&#41;</span></p>
<p><span class="kw1">print</span><span class="br0">&#40;</span><span class="kw3">gettext</span><span class="br0">&#40;</span>u<span class="st0">&quot;スパム&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
<span class="kw1">print</span><span class="br0">&#40;</span><span class="kw3">gettext</span><span class="br0">&#40;</span>u<span class="st0">&quot;%s件のコメントを削除しました&quot;</span>, <span class="nu0">1</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
<span class="kw1">print</span><span class="br0">&#40;</span><span class="kw3">gettext</span><span class="br0">&#40;</span>u<span class="st0">&quot;%s件のコメントを削除しました&quot;</span>, <span class="nu0">2</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
<span class="kw1">print</span><span class="br0">&#40;</span><span class="kw3">gettext</span><span class="br0">&#40;</span>u<span class="st0">&quot;ハム&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p></div>
<p>Notice that the code is fault tolerant &#8212; if a translation isn't found, it will just return the source text. In real life, we could (and probably should) log any source text with no translations.</p>
<p>Our output:</p>
<div class="dean_ch" style="white-space: wrap;">
spam<br />
Deleted 1 comment<br />
Deleted 2 comments<br />
ハム</div>
<p>Yay, it works! Here's how. We have a dictionary of 3 terms:<br />
<span style="color: green">スパム</span> : <span style="color: blue">spam</span><br />
<span style="color: green">1件のコメントを削除しました</span> : <span style="color: blue">Deleted 1 comment</span><br />
<span style="color: green">%s件のコメントを削除しました</span> : <span style="color: blue">Deleted %s comments</span></p>
<p>We call <code>gettext</code> with <code>msgid</code> <span style="color: green">スパム</span>. There are no variables, so we do a straight lookup and get "<span style="color: blue">spam</span>."</p>
<p>Next, we call it with <code>msgid</code> <span style="color: green">%s件のコメントを削除しました</span> and a variable of 1. We fill in the string, and get <span style="color: green">1件のコメントを削除しました</span>.</p>
<p><code>get_translation</code> finds the translation of <span style="color: green">1件のコメントを削除しました</span> and returns the translation "<span style="color: blue">Deleted 1 comment</span>."</p>
<p>Next, we call it with <code>msgid</code> <span style="color: green">%s件のコメントを削除しました</span> and a variable of 2. We fill in the string, and get <span style="color: green">2件のコメントを削除しました</span>.</p>
<p><code>get_translation</code> doesn't find the translation of <span style="color: green">2件のコメントを削除しました</span>, so we look it up without the variable filled in.</p>
<p><code>get_translation</code> finds the translation of <span style="color: green">%s件のコメントを削除しました</span>, and returns the translation "<span style="color: blue">Deleted %s comments</span>."</p>
<p>We then supply the variable 2 to <span style="color: blue">Deleted %s comments</span>, and get <span style="color: blue">Deleted 2 comments</span>. Simple!</p>
<p>Finally, we call <code>gettext</code> with <code>msgid</code> <span style="color: green">ハム</span>. We don't find a translation, so we just return <span style="color: green">ハム</span> back.</p>
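<p>The fallback described above silently returns the source text. Here is a minimal sketch of the logging the post recommends. It uses Python's standard <code>logging</code> module and the same lookup order as the code above. The <code>missing</code> set and <code>report_missing</code> helper are hypothetical additions so that each untranslated string is reported only once.</p>

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("l10n")

trans_dict = {
    "スパム": "spam",
    "1件のコメントを削除しました": "Deleted 1 comment",
    "%s件のコメントを削除しました": "Deleted %s comments",
}

# Source strings already reported, so each one is logged only once.
missing = set()


def get_translation(msgid):
    return trans_dict.get(msgid)


def report_missing(msgid):
    """Log an untranslated source string the first time we see it."""
    if msgid not in missing:
        missing.add(msgid)
        log.warning("No translation for %r", msgid)


def gettext(msgid, *args):
    """Filled-in entry first, then the placeholder entry, then the source text."""
    if args:
        filled = get_translation(msgid % args)
        if filled:
            return filled
    trans = get_translation(msgid)
    if trans is None:
        # Only a genuine miss is logged; a placeholder hit is the normal case.
        report_missing(msgid)
        trans = msgid
    return trans % args if args else trans


print(gettext("%s件のコメントを削除しました", 2))  # Deleted 2 comments
print(gettext("ハム"))  # logs a warning, then prints ハム
```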
<h3>Potential complication</h3>
<p>One potential problem with this approach is a rule that would produce infinitely many "concrete" (filled-in) translations. The Russian example given in the article seems to fall into this category. Even so, it should be possible to supply translations up to some reasonable number (say, 10,000 directories). The localizer should be able to generate them for us automatically. As long as there aren't many such sentences, this shouldn't hurt lookup times much or take up too much disk space. At any rate, I think this still beats hard-coding exception rules for each new language.</p>
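<p>Here is a rough illustration of how a localizer might generate those filled-in entries automatically. It assumes the standard CLDR plural categories for Russian integers (one, few, many). The Russian strings, <code>RU_FORMS</code>, and <code>generate_concrete</code> are illustrative, not taken from the article.</p>

```python
def russian_plural_category(n):
    """Russian plural category for a non-negative integer count (CLDR one/few/many)."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


# Illustrative Russian renderings of "Deleted %s comments", one per category.
RU_FORMS = {
    "one": "Удалён %s комментарий",
    "few": "Удалено %s комментария",
    "many": "Удалено %s комментариев",
}


def generate_concrete(msgid, forms, category, limit):
    """Expand one placeholder message into filled-in entries for counts 0..limit."""
    return {msgid % n: forms[category(n)] % n for n in range(limit + 1)}


ru_dict = generate_concrete("%s件のコメントを削除しました", RU_FORMS,
                            russian_plural_category, 10000)

print(len(ru_dict))  # 10001
print(ru_dict["21件のコメントを削除しました"])  # Удалён 21 комментарий
print(ru_dict["12件のコメントを削除しました"])  # Удалено 12 комментариев
```

<p>Counts above the limit get no filled-in entry. That is the trade-off the paragraph above accepts.</p>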
<p>If you do add a rule system, though, I'd make a separate one for each language, with a separate rule for each sentence. The localizer engine for each language could load its rule set along with its dictionary, check for a matching rule first, and fall back to the sequence above if none applies. So for Russian, I'd have a rule for the equivalent of "Deleted %s comments," and generate translations for just that sentence.</p>
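<p>Here is a minimal sketch of such a per-language engine. It assumes rules are plain functions keyed by the source sentence. The engine checks for a sentence-specific rule first and otherwise falls back to the dictionary sequence from earlier. The <code>Localizer</code> class and its English rule are hypothetical; a Russian rule would have the same shape, choosing among its plural forms.</p>

```python
class Localizer:
    """Hypothetical per-language engine: a dictionary plus optional per-sentence rules."""

    def __init__(self, dictionary, rules=None):
        self.dictionary = dictionary
        # Maps a source msgid to a function that builds the translation from the args.
        self.rules = rules or {}

    def gettext(self, msgid, *args):
        # 1. A sentence-specific rule, if this language defines one, wins.
        rule = self.rules.get(msgid)
        if rule is not None:
            return rule(*args)
        # 2. No variables: a plain lookup, falling back to the source text.
        if not args:
            return self.dictionary.get(msgid, msgid)
        # 3. Filled-in entry, then the placeholder entry, then the source text.
        filled = self.dictionary.get(msgid % args)
        if filled is not None:
            return filled
        return self.dictionary.get(msgid, msgid) % args


# An English engine with a rule for one sentence only.
en = Localizer(
    {"スパム": "spam"},
    rules={"%s件のコメントを削除しました":
           lambda n: "Deleted %d comment%s" % (n, "" if n == 1 else "s")},
)
print(en.gettext("スパム"))                            # spam
print(en.gettext("%s件のコメントを削除しました", 1))  # Deleted 1 comment
print(en.gettext("%s件のコメントを削除しました", 5))  # Deleted 5 comments
print(en.gettext("ハム"))                              # ハム
```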
<h3>Conclusion</h3>
<p>The beauty of this approach is that as we add localized languages, we only have to add more localized dictionaries &#8212; there's no need to change the code or our existing localized dictionaries.</p>
<p>Note that this example only works when you have one variable at most. And I think that as a rule of thumb, you should stick to no more than one variable per sentence.</p>
<p>With two or more variables, you'd have to use named placeholders filled from a dict (with **kwargs instead of *args), since the order of the variables is likely to change in translation. So the sentence above would become, for example, <span style="color: green">%(num_comments)s件のコメントを削除しました</span>. Other programming languages use numbered placeholders such as "{1}" and "{2}", which work the same way.</p>
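<p>Here is a sketch of the same lookup sequence with named placeholders. It assumes the dictionary keys use Python's <code>%(name)s</code> syntax and the variables arrive as keyword arguments. The two-variable message (with a hypothetical <code>folder</code> variable) shows why names matter: the folder comes first in the Japanese but last in the English.</p>

```python
trans_dict = {
    "%(num_comments)s件のコメントを削除しました": "Deleted %(num_comments)s comments",
    "1件のコメントを削除しました": "Deleted 1 comment",
    # Hypothetical two-variable message: the variables swap order in English.
    "%(folder)sから%(num)s件のコメントを削除しました": "Deleted %(num)s comments from %(folder)s",
}


def gettext(msgid, **kwargs):
    """Filled-in entry first, then the placeholder entry, then the source text."""
    if not kwargs:
        return trans_dict.get(msgid, msgid)
    filled = trans_dict.get(msgid % kwargs)
    if filled is not None:
        return filled
    return trans_dict.get(msgid, msgid) % kwargs


print(gettext("%(num_comments)s件のコメントを削除しました", num_comments=1))  # Deleted 1 comment
print(gettext("%(num_comments)s件のコメントを削除しました", num_comments=4))  # Deleted 4 comments
print(gettext("%(folder)sから%(num)s件のコメントを削除しました", folder="Spam", num=3))
# Deleted 3 comments from Spam
```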
<p><strong>Edit</strong>: More explanation on why a rule-based approach isn't a good idea, and fleshed out when and how rules could be added for outliers.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2007/10/11/my-solution-to-the-localization-horror-story/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
