Word crunching

As many who translate scientific texts know, this is one of the very few professions where the world of arts and letters is condemned to cooperate with and understand the world of science and engineering. Let’s face it, the average fine arts undergraduate doesn’t usually hang out with the chemical engineers on campus at the weekend. Nevertheless, years later when they’re working on articles for scientific journals they suddenly find that they have to get on. This is perhaps the reason why so little seems to be understood about the possibilities for the future of machine translation, and even machine interpreting, as we shall now see.

Brute-force number-crunching methods, from Monte Carlo simulation to exhaustive search, have long been used to find mathematical answers. A classic example is Deep Blue, which examined millions of possible positions every second to decide on its next chess move when pitted against Garry Kasparov. I suspect many scientists were hoping the computer would win, especially those who created the beast, but most humans probably had a secret desire to see the human come out triumphant. In fact, the Russian won one game and drew three, but lost two. Six games are far too few to be statistically significant, but Kasparov, with the “disadvantage” of being human, was clearly unable to generate billions of potential moves like his opponent and yet still managed to win. How was this possible? I believe the answer, and the analogy, can be applied to machine translation.
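To put a number on “far too few”, one can run an exact sign test on the decisive games. This test ignores the draws and assumes, purely for illustration, that the two players were evenly matched, with a 50% chance of winning each decisive game. The sketch below uses that textbook test and is not a serious analysis of the match:

```python
from math import comb

def sign_test_p_value(wins, losses):
    """Exact two-sided sign test on decisive games, ignoring draws.

    Assumes (for illustration) that each decisive game is a 50/50 coin flip.
    """
    n = wins + losses
    k = min(wins, losses)
    # Probability of a split at least as lopsided as the observed one, in one direction.
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double it for the two-sided test, capping the result at 1.
    return min(1.0, 2 * one_tail)

# Kasparov v Deep Blue, 1997: one win and two losses (the three draws are ignored).
print(sign_test_p_value(wins=1, losses=2))  # 1.0
```

With only three decisive games, every possible outcome is at least as extreme as a 2-1 split, so the test returns 1.0. By contrast, ten wins out of ten decisive games would give roughly 0.002.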

Take the translation tool provided by a well-known search engine. With the astronomical number of words passing through their servers every day, they certainly have plenty of text to crunch through their translation software (if indeed they do). By choosing the mode, that is to say the “most used word” on the Internet, the machine can select what it considers the most likely translation of a given word by comparing and aligning texts in different languages. (Incidentally, read the terms of service carefully before setting up a mail account with such large internet service providers and you’ll understand why you should never send your translated texts through them: you may simply be adding to the company’s huge translation memories, or TMs.) This is not so different from the world’s most widely used dictionaries surveying how many people use a word before officially accepting it. However, this approach has two weak spots, and this is where we human translators can undoubtedly outwit the machine, just as Kasparov did. The machine trips up on exceptions (Kasparov did the unexpected by not always choosing the “best” move) and on poetic licence. A metaphor, for example, can derail it.
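The frequency-based choice described above can be sketched in a few lines of Python. This is a simplified illustration and not how any particular search engine actually works. It assumes we already have word-level alignments (the hypothetical `aligned_pairs` list below) and simply returns the mode, meaning the target word seen most often alongside a given source word:

```python
from collections import Counter

def most_frequent_translation(aligned_pairs, source_word):
    """Return the mode of the target words aligned with source_word, or None."""
    candidates = [tgt for src, tgt in aligned_pairs if src == source_word]
    if not candidates:
        return None
    return Counter(candidates).most_common(1)[0][0]

# Hypothetical word alignments harvested from parallel English-Spanish texts.
aligned_pairs = [
    ("bank", "banco"), ("bank", "banco"), ("bank", "orilla"),
    ("bank", "banco"), ("river", "río"),
]
print(most_frequent_translation(aligned_pairs, "bank"))  # banco
```

“Bank” comes out as “banco” whatever the surrounding text says. A sentence about a riverbank, or a metaphorical use, gets the majority answer regardless, and this is exactly where the machine trips up.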

Clearly, we human translators must take enormous care with such tools and only ever use them as an aid. For example, after translating a text you may find it useful to run a section (never the whole text) through one to see which words the computer chooses; like a thesaurus, it may suggest ideas that had not occurred to you. But even then, one should never forget that the most frequent choice is not always the correct one. One has only to look at the examples in human history of landslide majorities voting for thuggish dictators to realise that the majority can often be wrong. A word being used more frequently does not make it correct; one has to check the official sources and consider the specific context. There is also the threat of plagiarism: a scientific author with a new patent will not be pleased to find their closely guarded secret floating around a famous search engine’s TM.
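One way to picture “consider the specific context” is to count candidate translations only within texts of the same kind, and to fall back on the overall majority only when no such texts exist. The sketch below hypothetically assumes that each aligned observation is tagged with a subject domain. The data and domain labels are invented for illustration:

```python
from collections import Counter

# Hypothetical observations of English "bank": (subject domain, Spanish translation).
OBSERVATIONS = [
    ("finance", "banco"), ("finance", "banco"),
    ("news", "banco"), ("news", "banco"),
    ("geology", "orilla"), ("geology", "orilla"),
]

def mode(words):
    """Most frequent item in a non-empty list."""
    return Counter(words).most_common(1)[0][0]

def choose_translation(observations, domain=None):
    """Prefer the majority within the given domain; otherwise use the overall majority."""
    if domain is not None:
        in_domain = [word for d, word in observations if d == domain]
        if in_domain:
            return mode(in_domain)
    return mode([word for _, word in observations])

print(choose_translation(OBSERVATIONS))             # banco
print(choose_translation(OBSERVATIONS, "geology"))  # orilla
```

The overall majority (“banco”, by four votes to two) is wrong for a geology paper about riverbanks. Restricting the count to that domain gets it right, although a translator still has to check the result against official sources.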

Incidentally, computers can also be used to intelligently “generate” their own literature. In his classic book “Fooled by Randomness”, Nassim Nicholas Taleb tells of how he used Andrew C. Bulhak’s Dada Engine to come up with phrases like this: “Many narratives concerning the role of the writer as observer may be revealed. It could be said that if cultural narrative holds, we have to choose between the dialectic paradigm of narrative and neoconceptual Marxism. Sartre’s analysis of cultural narrative holds that society, paradoxically, has objective value.” Such pseudo-intellectual drivel may sound familiar to anyone who has translated for low-brow art critics, but it undoubtedly sounds human.
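The Dada Engine generates text by expanding recursive grammars. The toy generator below imitates that idea in Python. Its rule format and its vocabulary (borrowed from the quotation above) are illustrative assumptions and do not reflect the Dada Engine’s actual syntax:

```python
import random

# Toy grammar (illustrative, not Dada Engine syntax): each non-terminal maps to
# a list of alternatives; any token not in GRAMMAR is output as-is.
GRAMMAR = {
    "SENTENCE": [
        ["it could be said that", "CLAIM"],
        ["CLAIM"],
    ],
    "CLAIM": [
        ["NOUN_PHRASE", "holds that", "NOUN_PHRASE", "has objective value"],
        ["we have to choose between", "NOUN_PHRASE", "and", "NOUN_PHRASE"],
    ],
    "NOUN_PHRASE": [
        ["cultural narrative"],
        ["society"],
        ["neoconceptual Marxism"],
        ["the dialectic paradigm of", "NOUN_PHRASE"],  # recursive rule
    ],
}

def expand(symbol, rng, depth=0, max_depth=5):
    """Recursively replace a non-terminal with one randomly chosen alternative."""
    if symbol not in GRAMMAR:
        return symbol
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Beyond the depth limit, keep only alternatives made purely of terminals
        # (when there are any), so recursive rules such as NOUN_PHRASE stop expanding.
        options = [o for o in options if all(t not in GRAMMAR for t in o)] or options
    choice = rng.choice(options)
    return " ".join(expand(token, rng, depth + 1, max_depth) for token in choice)

def generate(seed=None):
    """Return one capitalised pseudo-sentence; the same seed gives the same sentence."""
    text = expand("SENTENCE", random.Random(seed))
    return text[0].upper() + text[1:] + "."

for seed in range(3):
    print(generate(seed))
```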

Then we have the tantalising prospect of machine interpreting, which may not be as far-fetched as many still believe. Whenever you talk on the phone, your voice is digitised before it reaches the receiver, and this has been the case for many years now. It’s not your mother you hear; it’s a computer copying her. As demonstrated at a ProZ.com conference in Barcelona, a computer can “learn” an individual’s voice and reproduce it speaking new sentences of our own choosing. The “Terminator” films may start to ring a bell. This may be old hat to James Bond or the CIA, which leads one to think that the next step can now be taken: for the computer to reproduce the speaker’s voice saying not the sentence just spoken into the phone or microphone, but its translation. To do so, the computer obviously has to be able to identify and store the sentence it has just “heard” in order to translate it.

In fact, I believe the BBC are well on the way to doing just this, albeit unwittingly. The corporation has been using live subtitling for years. To produce the subtitles, one may employ either an extremely fast typist or… a computer. Basically, the computer recognises the voice and flashes the words it has understood onto the screen. From what I have seen, I’d say it gets about 90% right, which in my opinion is quite impressive, especially given the range of accents it has to cope with.

Imagine a Glaswegian interviewee saying “I cannae. D’you see? D’you ken?” The machine may well understand this as “A can o’ juicy chicken,” for example. However, this hiccup can also be overcome. After crunching thousands of interviews recorded in Glasgow, the computer has only to be told where it is, or whom it is translating, to get the gist. Taking another leap forward, a GPS or satnav device could automatically tell the computer when it is in Glasgow or Los Angeles, so that it can adjust its voice recognition and vocabulary accordingly.
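The idea of telling the recogniser where it is can be sketched as rescoring. A speech recogniser proposes several candidate transcriptions, and a region-specific vocabulary decides between them. Everything here is a hypothetical illustration. The `LEXICONS` table, its made-up word probabilities, the acoustic scores and the region names are all assumptions. The region string stands in for whatever a GPS lookup or the user might supply. Real recognisers use far richer acoustic and language models.

```python
import math

FLOOR = 1e-6  # probability assumed for words missing from a lexicon

# Hypothetical unigram probabilities per region; the numbers are invented.
LEXICONS = {
    "default": {
        "i": 0.02, "a": 0.02, "can": 0.005, "o'": 0.001,
        "juicy": 0.0005, "chicken": 0.0005, "ken": 0.0001,
    },
    "glasgow": {
        "i": 0.02, "cannae": 0.005, "d'you": 0.005, "ken": 0.005,
        "a": 0.02, "can": 0.002, "o'": 0.001, "juicy": 0.0001, "chicken": 0.0002,
    },
}

def lm_score(words, lexicon):
    """Sum of log probabilities under a simple unigram model."""
    return sum(math.log(lexicon.get(w, FLOOR)) for w in words)

def best_transcription(candidates, region):
    """Pick the candidate with the highest acoustic + language-model score.

    candidates: list of (text, acoustic_log_score) pairs from a recogniser.
    region: a place name, e.g. supplied by the user or a (hypothetical) GPS lookup.
    """
    lexicon = LEXICONS.get(region, LEXICONS["default"])
    return max(candidates, key=lambda c: c[1] + lm_score(c[0].split(), lexicon))[0]

candidates = [
    ("i cannae d'you ken", -10.0),
    ("a can o' juicy chicken", -9.0),
]
print(best_transcription(candidates, "los angeles"))  # a can o' juicy chicken
print(best_transcription(candidates, "glasgow"))      # i cannae d'you ken
```

Outside Glasgow, the slightly better acoustic match wins and we get the juicy chicken. With the Glasgow vocabulary loaded, the dialect words become likely enough to tip the balance.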

So there we have it: our hand-held interpreter of the not-so-far future.

Comments on this article

Knowledgebase Contributions Related to this Article

mode, not median (Posted by Gary Smith Lawson on 09/13/2010)

I've just re-read this and realised that I meant "mode", not "median", as the way machine translators find the most probable word (Mode = the most frequent value; Median = the "middle" value in an ordered sequence, usually of numbers).

TRANSTAC (Posted by Gary Smith Lawson on 06/16/2010)

http://www.darpa.mil/ipto/programs/transtac/transtac.asp

ProZ.com Translation Article Knowledgebase
