Ginstrom IT Solutions (GITS)


Translation memory (TM) is becoming commonplace in the translation industry. More and more translation agencies and direct clients are demanding that their translators use translation-memory software, often naming specific applications. It is not hard to envision a future in which owning and using translation memory will be taken for granted, just as using a word processor to write translations is today.

In this article, I explore the questions of just what translation memory is, how it works, and what it means for us translators.

What is translation memory?

The purpose of translation memory is to allow translators to leverage previous translations. A translation-memory application is built around a sophisticated database that "remembers" what the translator has translated before, and uses that knowledge to help the translator translate future text.

Quite some time ago, translators (or more precisely, the people paying them) realized that a lot of translation work was being duplicated. A manual would be slightly revised in line with a product upgrade, and the entire manual would be re-translated from scratch.

There were a lot of problems with this approach. For one, it took a lot of time, and it cost a lot of money. Second, consistency suffered: what was called a gizmo in the original manual was now being called a doohickey in the new one. Sometimes, translations even lacked internal consistency; a sprocket was being called a gizmo on page 3, and a doohickey on page 17.

The solution was translation memory: as the translator translates the text, a translation-memory application "remembers" each segment of source text and its translation. Then when the translator translates a new segment of text, the application searches back in its memory for identical or similar segments, and offers up their corresponding translations.

A translation memory is basically a great big database. And like other databases, it has many uses, limited primarily by the imagination of its implementor. One example of another use for a translation memory is a concordance function: a feature that shows the matches for a given string in the translation memory. This allows you to check back on how you (or other translators) have translated a given term or phrase in the past, even if the memory has no segments similar to the current segment you are working on.
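The concordance function described above can be sketched in a few lines. This is only an illustration: the function name `concordance`, the list-of-pairs memory layout, and the sample entries are assumptions made for the example, not the design of any particular application.

```python
def concordance(memory, term):
    """Return every (source, translation) pair whose source contains term.

    memory is assumed to be a list of (source, translation) tuples.
    The comparison ignores case so that "Sprocket" also finds "sprocket".
    """
    needle = term.casefold()
    return [(source, translation)
            for source, translation in memory
            if needle in source.casefold()]


# Hypothetical sample memory (English to French).
memory = [
    ("Tighten the sprocket.", "Serrez le pignon."),
    ("Remove the cover.", "Retirez le couvercle."),
    ("Inspect the Sprocket for wear.", "Vérifiez l'usure du pignon."),
]

for source, translation in concordance(memory, "sprocket"):
    print(source, "->", translation)
```

Note that the search works on substrings of the stored source segments, so it can find a term even when no stored segment resembles the current segment as a whole.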

How does translation memory work?

At its core, a translation-memory application is a database of source-text segments and their translations. Every time the translator translates a segment of text, a new entry is stored in the database.
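As a rough picture of that database, here is a minimal in-memory sketch. The class name `TranslationMemory` and its methods are hypothetical, and a real application would use persistent storage and keep more information per entry; the point is only that each translated segment adds one record.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TranslationMemory:
    """A toy store of (source segment, translation) entries."""
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, source: str, translation: str) -> None:
        # Each translated segment becomes a new entry.
        self.entries.append((source, translation))

    def exact_matches(self, source: str) -> List[str]:
        # Return the stored translations whose source is identical.
        return [t for s, t in self.entries if s == source]


tm = TranslationMemory()
tm.add("Press the power button.", "電源ボタンを押してください。")
print(tm.exact_matches("Press the power button."))
```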

When a translation memory is presented with a new source-text segment, it searches through the database for similar segments. It assigns a score to each entry based on the similarity of its source segment (and possibly other criteria), and presents entries with scores passing a predefined threshold to the translator. Note that an essential element in this process is the translator. Unlike machine translation, translation memory is a form of machine-assisted human translation; the role of the human translator is dominant, and the software plays an assisting, or facilitating, role.
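The score-and-threshold step might look something like the sketch below. The scoring here uses `difflib.SequenceMatcher.ratio()` from the Python standard library purely as a stand-in similarity measure; it is not the algorithm of any particular translation-memory tool. The function name `find_matches`, the 0.7 threshold, and the sample segments are likewise assumptions for illustration.

```python
from difflib import SequenceMatcher


def find_matches(memory, segment, threshold=0.7):
    """Return (score, source, translation) tuples scoring at or above
    threshold, best match first.

    memory is assumed to be a dict mapping source segments to translations.
    """
    scored = []
    for source, translation in memory.items():
        # Stand-in similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            scored.append((score, source, translation))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical sample memory (English to French).
memory = {
    "Open the front cover.": "Ouvrez le capot avant.",
    "Close the front cover.": "Fermez le capot avant.",
    "Replace the toner cartridge.": "Remplacez la cartouche de toner.",
}

matches = find_matches(memory, "Open the rear cover.")
for score, source, translation in matches:
    print(f"{score:.2f}  {source}  ->  {translation}")
```

The function only proposes candidates; choosing, editing, or rejecting them is left to the translator.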

Crucial to deciding how similar two segments of text are is the "matching" or "similarity" algorithm. Many translation-memory applications keep their similarity algorithms closely guarded secrets. At least one actually includes a prohibition against reverse engineering its algorithms in its license agreement.

One common similarity algorithm, however, is edit distance (also known as Levenshtein distance). This is the core of the similarity algorithm used by Felix. Edit distance is the number of "edits," or changes, that need to be made to convert one string into another. This can be combined with the lengths of the respective strings to obtain a crude similarity metric. There are many descriptions of edit distance online.
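For readers who want to see it concretely, the following is a textbook implementation of Levenshtein distance, followed by one simple way of factoring in string length. The normalization used in `similarity` (one minus the distance divided by the longer length) is an assumption chosen for illustration; the article does not say which formula Felix or any other tool actually uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Crude similarity between 0.0 and 1.0 (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))
print(similarity("This color is black", "This color is red"))
```

Dividing by the longer length keeps the score in the range 0 to 1, so that a single changed character matters less in a long segment than in a short one.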

Of course, there are many ways that the edit-distance algorithm can be made more sophisticated and useful. For instance, penalties can be given for differences in such metadata as formatting. Similarly, penalties can be reduced if two characters differ only in case, or only in being half-width or full-width, or if two words share the same base form and are merely inflected differently.
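One of these refinements, a reduced penalty for characters that differ only in case or width, can be sketched as a weighted edit distance. The penalty value of 0.2 and the function names are assumptions made for this example. Width differences are folded with Unicode NFKC normalization, which maps full-width Latin letters and half-width katakana to their ordinary forms.

```python
import unicodedata


def fold(char: str) -> str:
    # NFKC removes width differences; casefold removes case differences.
    return unicodedata.normalize("NFKC", char).casefold()


def substitution_cost(a: str, b: str, soft_penalty: float = 0.2) -> float:
    if a == b:
        return 0.0
    if fold(a) == fold(b):
        return soft_penalty  # differs only in case or width
    return 1.0


def weighted_edit_distance(a: str, b: str, soft_penalty: float = 0.2) -> float:
    """Edit distance in which case-only or width-only substitutions
    cost soft_penalty instead of a full edit."""
    previous = [float(j) for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        current = [float(i)]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1.0,        # deletion
                current[j - 1] + 1.0,     # insertion
                previous[j - 1] + substitution_cost(char_a, char_b, soft_penalty),
            ))
        previous = current
    return previous[-1]


print(weighted_edit_distance("Print", "print"))  # case differs in one character
print(weighted_edit_distance("ＡＢＣ", "ABC"))    # full-width versus half-width
print(weighted_edit_distance("cat", "dog"))      # three ordinary substitutions
```

Handling inflected word forms would require language-specific stemming or morphological analysis, which this sketch does not attempt.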

Although each translation-memory application will have its own unique algorithm, given the same target (providing a similarity metric) and source materials (segments of text), I doubt that the basic methods differ too greatly.

Once the translation-memory application has its list (possibly empty) of candidates, it has to show it to the user, and allow the user to select, and possibly edit, one of the suggestions. Of course presentation is important: the TM must make it easy for the translator to tell how similar the two segments are, and just how they are different. One way to mark differences is by highlighting the parts of the two strings that are different, something like this:

This color is black
This color is red

The translator can then choose to retrieve the translation for the similar segment, editing the parts that are different if need be, or to ignore the suggestion and type in his own translation.
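The difference marking described above can be approximated in plain text by wrapping the differing words in square brackets. The sketch below compares the two segments word by word using `difflib.SequenceMatcher` from the Python standard library; the function name `mark_differences` and the bracket notation are assumptions for illustration, and a real tool would more likely use color or bold type.

```python
from difflib import SequenceMatcher


def mark_differences(query: str, match: str):
    """Return both segments with their differing words in [brackets]."""
    a, b = query.split(), match.split()
    out_a, out_b = [], []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            out_a.extend(a[i1:i2])
            out_b.extend(b[j1:j2])
        else:
            # "replace", "delete", or "insert": bracket whichever side has words.
            if i1 < i2:
                out_a.append("[" + " ".join(a[i1:i2]) + "]")
            if j1 < j2:
                out_b.append("[" + " ".join(b[j1:j2]) + "]")
    return " ".join(out_a), " ".join(out_b)


marked_query, marked_match = mark_differences(
    "This color is black", "This color is red")
print(marked_query)   # This color is [black]
print(marked_match)   # This color is [red]
```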

What are the benefits of translation memory?

Translation memory potentially allows translations to be done more cheaply, faster, and with better quality.

Translations are cheaper

Since repeated segments can be recycled in new translations, less of the translator's time (i.e., money) is required to do the translation. Who benefits from this cost saving is discussed below.

How much cheaper translation memory can make a translation depends on the amount of text that can be recycled, which depends greatly on the nature of the document being translated, and the size of the translation memory. On one end of the scale, you might have 5 years' worth of manuals for a given product, and have the task of translating the latest version, in which only 5% or so of the text is new or modified. On the other end of the scale, you might not have any translation memories, and have the task of translating a document with very little internal repetition. In the first case, it would be possible to work several times faster than without translation memory; in the second case, there would be almost no time savings.
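To make the two ends of that scale concrete, here is a back-of-the-envelope model. It rests on an assumption that is not from the article: that checking a recycled segment costs some fixed fraction of the effort of translating it from scratch (`review_cost`, set to 0.2 below purely for illustration). Real figures will vary with the text, the tool, and the translator.

```python
def estimated_speedup(new_fraction: float, review_cost: float) -> float:
    """Estimated speedup relative to translating everything from scratch.

    new_fraction: share of the text that is new or modified (0 to 1).
    review_cost:  assumed effort to check a recycled segment, as a share
                  of the effort to translate it from scratch (0 to 1).
    """
    effort = new_fraction + (1.0 - new_fraction) * review_cost
    if effort <= 0:
        raise ValueError("total effort must be greater than zero")
    return 1.0 / effort


# 5% new text, recycled segments assumed to cost 20% of a fresh translation.
print(round(estimated_speedup(0.05, 0.2), 2))  # 4.17

# No reusable text at all: no speedup.
print(round(estimated_speedup(1.0, 0.2), 2))   # 1.0
```

Under these assumed numbers, the first case comes out roughly four times faster, which is in line with "several times faster," while the second shows no gain.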

Translations are faster

Since the translator is able to recycle past translations, the translation will naturally go faster. Especially when consistency needs to be retained between document versions, this saves not only translation time but also time spent referring back to old versions. The amount by which the translation time is shortened also depends greatly on the nature of the document and existing memories.

Translations have better quality

There are two main ways that translation memory improves the quality of a translation: consistency and prevention of omissions (dropped text). Translations using translation memory are more consistent because features like concordance and automated glossary lookup give the translator a heads-up reference to terminology and expression. This is especially handy when there is a client-supplied glossary that the translator is supposed to follow. Without automated lookup, the translator is almost guaranteed to miss some terms if the list is longer than a few hundred items.
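Automated glossary lookup can be as simple as checking each glossary term against the current segment. The sketch below assumes a glossary stored as a dict of source terms to required translations and uses plain case-insensitive substring matching; the function name `glossary_hits` and the sample terms are hypothetical, and real tools may match on word boundaries or inflected forms instead.

```python
def glossary_hits(segment: str, glossary: dict) -> dict:
    """Return the glossary entries whose source term occurs in segment."""
    lowered = segment.casefold()
    return {term: target
            for term, target in glossary.items()
            if term.casefold() in lowered}


# Hypothetical client glossary (English to French).
glossary = {
    "toner cartridge": "cartouche de toner",
    "front cover": "capot avant",
    "paper tray": "bac à papier",
}

segment = "Open the front cover and remove the Toner Cartridge."
for term, target in glossary_hits(segment, glossary).items():
    print(term, "->", target)
```

Because the check runs against every entry for every segment, the length of the glossary stops being a burden on the translator's memory.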

Omitted (dropped) text is a common problem for translators. It is quite easy to skip over a sentence or entire passage of text. While quality-conscious translators will check a translation carefully for omissions (preferably with at least two sets of eyes), this costs time, and hence money. Translation memory helps prevent dropped text by presenting the translator with each segment of text in turn; it becomes very difficult for the translator to inadvertently skip over text.

Unlike the benefits of cost and speed, the quality benefit is not greatly dependent on the nature of the document or the existence of memories. Thus even with types of document lacking a great deal of repetition and without a vast store of past translations in memory, translation memory can still benefit translators.

What does translation memory mean for translators?

As mentioned above, translation memory has a number of benefits. It allows translations to be done faster, and more consistently. And even if you have not translated any very similar texts before, translation memory can still save you work and improve your consistency with features like concordance and automated glossary lookup.

The question that the translator needs to ask, however, is who reaps these benefits? It is very natural that the person paying for the translation would like to be able to pay only for the new parts of the document. The translator, on the other hand, wants to recoup his investment in the translation-memory tool, and may also view his translation memories as an asset to be exploited by him, not some client.

These issues are still fairly new, because although the idea of translation memory has been around for some time, personal computers have only become powerful enough to make such applications practical for individual translators in the past decade or so. The issue is even newer for translators working into or out of Japanese, as most major tools have only recently begun to be able to cope with languages using so-called double-byte character sets. In the end, I imagine that all parties will get a piece of the benefits of translation memory, with the size of each party's slice probably being decided more by the power relationships between translator and client than anything else.