Whence and whither machine translation

As a translator, I always want to keep an eye on machine translation. As a dropout from a PhD in computational linguistics, I also find it a fascinating topic. In this post, I want to discuss the past, present and future of machine translation, with a bit of a focus on its implications for us translators.

The distant past

People were dreaming about machine translation almost before there were even computers. According to "Fifty years of the computer and translation," the first patents relating to machine translation were awarded in the 1930s.

The first machine-translation systems were just mechanical (and later electronic) glossaries and phrase books. The first public demonstration of a machine-translation system, in 1954, was a Russian/English phrase book containing 250 sentences.

Not much distance passed

As computers became more powerful and the limitations of the phrase-book approach became more evident, scientists began turning to rule-based systems. There were two basic approaches: the "direct" approach, whereby source sentences were transformed directly into translations; and the "interlingua" approach, whereby the source was first converted into an intermediate form, which was then transformed into the translation.
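
To make the difference between the two approaches concrete, here is a toy sketch. Everything in it is an assumption made up for illustration: the two-word Spanish lexicons, the concept labels like "CAT", and the function names. No real system was this simple.

```python
# Toy illustration only: hypothetical two-word lexicons, not a real MT system.

# Direct approach: one source-to-target table per language pair.
DIRECT_ES_EN = {"gato": "cat", "negro": "black"}

def translate_direct(words, table):
    """Map each source word straight to a target word; unknown words pass through."""
    return [table.get(w, w) for w in words]

# Interlingua approach: analyze the source into language-neutral concepts,
# then generate the target from those concepts.
ES_TO_CONCEPT = {"gato": "CAT", "negro": "BLACK"}
CONCEPT_TO_EN = {"CAT": "cat", "BLACK": "black"}
CONCEPT_TO_FR = {"CAT": "chat", "BLACK": "noir"}

def analyze(words, to_concept):
    """Source words -> intermediate concepts."""
    return [to_concept.get(w, w) for w in words]

def generate(concepts, from_concept):
    """Intermediate concepts -> target words."""
    return [from_concept.get(c, c) for c in concepts]

source = ["gato", "negro"]
direct_en = translate_direct(source, DIRECT_ES_EN)
concepts = analyze(source, ES_TO_CONCEPT)
inter_en = generate(concepts, CONCEPT_TO_EN)
inter_fr = generate(concepts, CONCEPT_TO_FR)  # same analysis, new target language
```

In the interlingua version, adding French only needed one more generation table, while the direct version would need a whole new Spanish-to-French table. Note also that neither version reorders words, so both give "cat black" rather than "black cat", which hints at how quickly the rules pile up.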

Systran, arguably the most successful commercial machine-translation system ever (and the granddaddy of a lot of the Web-based systems), began with the "direct" approach, then bolted on some "interlingua" features as its developers ran up against the limitations of the direct approach.

At the time, there was a lot of optimism about the rule-based approach. Scientists thought that if they could just define all the "rules" of language, they'd be able to get "perfect" machine translation. It was widely believed that language was a largely mechanistic affair, easily amenable to reproduction in vacuum tubes (and later transistors). Areas of the brain specific to language processing, like Broca's area and Wernicke's area, had long been known; as Noam Chomsky led the charge to a more scientific study of language, it was widely believed that we would soon find all the "language organs" of the brain.

Researchers eventually found, however, that the brain isn't neatly divided into organs, and that language processing is spread all over the brain. Machine-translation researchers likewise found that every faculty of human knowledge and intelligence is needed in order to translate human language. Machine translation was thus found to be an AI-complete problem, meaning that it could only be accomplished by human-level intelligence, the only current exemplar of which is the human brain.

These systems can hardly be said to have taken work away from translators. The only real use they have is for gisting or triaging documents, and in that sense they may have actually created more work for us (by bringing more translatable documents to the attention of translation consumers). On the other hand, ubiquitous (bad) machine translation has cheapened our profession to some degree. Free, even when it's really really bad, still puts downward pressure on the price tag of good.

Giving knowledge to computers

One way of expressing this need to draw on all the faculties of human intelligence is to say that machine-translation systems need to have "knowledge of the world." At first, researchers tried to hand-code rules about the world, such as the facts that cows are alive, that they moo, and that they turn into steaks. They soon realized, however, that coding all the knowledge and expertise available to a human was just too enormous a task.

So of course, the next idea was that if we can't explicitly teach machines what humans know, let them learn it for themselves.

Statistical machine translation (SMT) is a very primitive implementation of this idea. This is the approach used by Google's MT system, for example. The basic idea of statistical machine translation is that you take huge corpora of texts in the source and target languages, and create a kind of semantic map. You then take known translations between the two languages, and use them as a kind of "landmark": this point in the source language space is equivalent to that point in the target language space; you can (in theory) then extrapolate to the entire semantic spaces of the two languages.
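
For a feel of how a machine can learn translations from known translations alone, here is a minimal sketch. It is not the "semantic map" described above and certainly not Google's system; I've swapped in a much simpler classic technique, word alignment by expectation-maximization in the style of IBM Model 1 (without the usual NULL word). The three-sentence German/English corpus and all the names are assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical parallel corpus of (source, target) sentence pairs.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

def train_word_table(pairs, iterations=20):
    """Estimate t[(tgt, src)], roughly P(tgt word | src word), by EM."""
    # Start with the same score for every word pair that co-occurs.
    t = {(tw, sw): 1.0 for src, tgt in pairs for sw in src for tw in tgt}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (tgt, src) link
        total = defaultdict(float)  # expected count of links from each src word
        for src, tgt in pairs:
            for tw in tgt:
                # E-step: share one unit of credit for tw among the source
                # words in the same sentence, in proportion to current scores.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize so the scores for each source word sum to 1.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t

table = train_word_table(corpus)

# For each source word, pick the target word with the highest score.
best = {}
for (tw, sw), p in table.items():
    if sw not in best or p > table[(best[sw], sw)]:
        best[sw] = tw
```

Nobody told the program that "haus" means "house". It works it out because "das" keeps showing up with "the" in another sentence, which leaves "house" as the best explanation for "haus". That is pattern-matching over counts, not understanding.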

But SMT still doesn't understand language in any meaningful way, so it still suffers from the fatal flaws of rule-based systems.

The holy grail would be to create an expert system that could teach itself, which leads to the possible next stage in the evolution of machine translation.

Moving forward

The current state of the art in machine translation is still at this impasse. Rule-based and statistical approaches make incremental improvements, but without human-level knowledge and expertise, MT won't progress much beyond current quality (if at all).

It's anybody's guess when human-level or higher machine intelligence will arrive, but my guess is that it'll be within my lifetime. Within that time, we should be able to simulate an entire human brain in silicon, down to the last neuron and synapse; assuming that the brain is just the sum of its cells and their interactions, that'll push us over the threshold. And when we have that level of machine intelligence, we'll have human-level machine translation (as well as human-level everything else).

I think that this could happen in one of two scenarios: a "big bang" scenario, or a "gradual refinement" scenario.

Big bang scenario

In the big bang scenario, some group announces that they've created an AI (or more scarily, it announces itself). Google co-founder Sergey Brin has stated that Google is trying to create an AI. So it's conceivable that at some point in the not-so-distant future, we'll have a full-blown artificial intelligence among us.

At first, such an AI would be incredibly expensive to run (like taking enough power to supply a medium-sized city), and would be too busy curing cancer, solving world hunger, and whatnot to bother with translating widget assembly manuals.

Eventually, however, Moore's Law would bring down the cost of the AI to the point where it would be cheap and easy to use machines for any human task requiring intelligence. At that point, all professional translators would be out of a job, along with members of every other profession, including medicine, law, mathematics, and engineering. What such a world would look like is anyone's guess: utopia? we all become Buddhas and meld with the universe? ennui and nihilism? slavery or extermination of the human race? That's why they call it a Singularity: you can't make any reliable predictions about what comes after.

Gradual refinement scenario

In this scenario, computers just keep gradually getting smarter and more powerful. This scenario looks slightly better for professional translators over the short run: instead of machines replacing us, they'll simply become better and better at helping us translate. Think today's CAT tools on steroids:

Computer: "Ryan, I think you should reconsider the wording of that second paragraph. It doesn't seem to hang together with the rest of the translation."

There would be a gradual shakeout of the bottom rungs of our profession, as low-level translation becomes more and more doable by the unskilled. Highly skilled translators, however, would still be in demand until the end. This is kind of like how things like Microsoft Excel and Visual Basic allowed analysts and engineers to get rid of low-level code monkeys, but highly skilled developers were and still are in high demand.

Eventually, machine intelligence would equal and then surpass human intelligence, and we'd be in the "big bang" scenario, but the gradual scenario would at least allow for a softer landing.

Will this stuff really happen?

Who knows? Maybe we'll have an extinction event first, like a meteor strike, nuclear war, or global-scale bioterror. Maybe Moore's Law will peter out around 2020, just shy of enabling us to simulate a brain in silicon. Maybe we have some fundamental misconceptions about the way brains work.

But I think the safe money is on AI in our lifetimes. There's nothing we can really do to prepare for it, however, so my advice is simply to relax and enjoy the ride.

1 comment to Whence and whither machine translation

  • Thanks! This is the best overview of machine translation I have seen so far. People always ask me what I think of machine translation, and I always say I’ll wait till it happens to worry about it, because it’s pretty much impossible to predict what will happen and how it will affect my work and my life as a translator. So I’m doing as you say, relaxing and enjoying the ride.
