Robot translators decipher mountains of enemy messages
Wed Mar 16, 2:55 PM ET
By Robert S. Boyd, Knight Ridder Newspapers
WASHINGTON - Somewhere in a vast jumble of documents in a Baghdad warehouse or in the constant buzz of electronic signals in the sky, a few ominous words or phrases may be hidden:
"Explosives." "Nerve gas." "Convoy." "Airport arrival." "The president."
The words, however, are in Arabic, Farsi, Pashto or some other language that few Americans understand. The messages urgently need to be translated, but there aren't enough expert linguists to handle the flood.
The time for robot translators has arrived, according to a panel of language specialists at a meeting of the American Association for the Advancement of Science in Washington last month.
"The Defense Department doesn't have enough human translators," said Melissa Holland, an expert at the Army Research Laboratory in Arlington, Va.
"The backlog of untranslated documents is a hindrance to the war on international terrorism," said Mohammad Shihadah, the founder of Applications Technology, a small firm in suburban McLean, Va., that sells Arabic-to-English translation software to the government.
Since Sept. 11, 2001, the Defense Department, the CIA and other intelligence agencies have been pouring money and effort into what's known as "machine translation," or MT for short.
MT uses computers to translate messages from one language to another - such as turning "Good Morning" into "Buenos Dias" or "Auf Wiedersehen" into "Au Revoir" - with little or no human intervention.
Computer scientists have labored to perfect machine translation since the 1950s with only modest success. But the terrorist attacks and the wars in Afghanistan and Iraq have given the technology a boost.
Today's robot-linguists are far from perfect, but they can give soldiers in the field the gist of a document, a poster or a possible threat scrawled on a wall.
"Soldiers can get a sense of what a document is about - not a perfect translation," Holland said. Accuracy is still less than 50 percent, Clare Voss, another Army researcher, acknowledged.
Equipped with a handheld PDA (a Personal Digital Assistant, such as the popular BlackBerry), a digital camera and a laptop computer in the back of a Hummer, a GI can quickly decide if a message needs human attention.
"Expectations for speed and accuracy are not always met - it's not the Queen's English," admitted William McClellan, a machine translation systems manager at Booz Allen Hamilton, a technology consulting firm in McLean. "But it's a way to find the needle in the haystack without translating every straw."
The elimination process is called "triage."
"Knowing what to translate first out of thousands of documents is a problem faced daily by our military and intelligence officers," McClellan said. "Thousands of documents can be automatically screened, and those meeting certain criteria can be ... automatically routed to linguists and domain specialists."
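The screening step McClellan describes can be illustrated with a minimal sketch. It assumes a rough machine-translated gist is already available for each document and that "certain criteria" means matching a watch list of terms; the watch list, function names and threshold below are hypothetical and are not taken from any system mentioned in the article.

```python
# Hypothetical watch list, borrowed from the example words quoted in the article.
WATCH_TERMS = ("explosives", "nerve gas", "convoy", "airport arrival", "the president")


def triage_score(gist: str, terms=WATCH_TERMS) -> int:
    """Count how many watch-list terms appear in a rough machine-translated gist."""
    text = gist.lower()
    return sum(1 for term in terms if term in text)


def route(documents: dict, threshold: int = 1) -> list:
    """Return ids of documents to send to human linguists, highest score first."""
    scored = [(triage_score(gist), doc_id) for doc_id, gist in documents.items()]
    flagged = [(score, doc_id) for score, doc_id in scored if score >= threshold]
    # Sort by descending score, then by id so the order is deterministic.
    flagged.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in flagged]


if __name__ == "__main__":
    docs = {
        "a": "The convoy leaves for the airport arrival gate",
        "b": "Recipe for lentil soup",
        "c": "Shipment of nerve gas",
    }
    print(route(docs))
```

Only the flagged documents reach a human; the rest of the "haystack" is set aside without a full translation.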
The volumes of material to be translated are "enormous," said Mark Turner, an MT expert at CACI, an information technology organization in Lanham, Md.
In Baghdad, "we found warehouses with billions of documents in bags, boxes, binders and books," he said. "There are tons of paper and terabytes (trillions of bytes or letters) of electronic media."
People who use machine translation often find it frustrating, quirky and unreliable. "MT is a useful tool for triage, but it doesn't replace human linguists," Turner said.
For decades, machine translation systems labored to make computers understand traditional rules of grammar - subjects, verbs, objects and so on. Progress was slow, thanks to the tremendous ambiguity and complexity of human language.
The word "get," for example, has 24 possible meanings listed in Webster's New College dictionary. One of them is "kill" - as in "I'll get you for this."
In the 1990s, however, a new technique came along, applying statistical analysis to huge databases of previously translated texts. By comparing a new, unknown message to millions of stored sentences, phrases and words, researchers could quickly find the most likely translation.
This method, also known as "data-driven machine translation," works like this: The computer scans a sentence, lists each possible meaning of each word and arranges them in every possible order, most of them nonsensical, until it finds one that most nearly matches a good translation.
For example: "bites man dog," "dog man bites," "man bites dog," and, finally, "dog bites man." A long sentence can produce millions of variations.
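The reordering step can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method of any system named here: it covers only word order (not the choice among word meanings), and it scores each candidate by summing how often its adjacent word pairs appear in a tiny invented sample of English text, where real systems use probabilities estimated from millions of sentences. The sample text and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import factorial

# Invented stand-in for a large store of previously seen English sentences.
CORPUS = [
    "dog bites man",
    "the dog bites the man",
    "dog bites postman",
    "man walks dog",
]


def bigram_counts(sentences):
    """Count adjacent word pairs, with markers for sentence start and end."""
    counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts


def score(order, counts):
    """Sum the corpus counts of every adjacent pair in a candidate ordering."""
    words = ["<s>", *order, "</s>"]
    return sum(counts[pair] for pair in zip(words, words[1:]))


def best_order(words, counts):
    """Try every ordering of the words and keep the highest-scoring one."""
    return max(permutations(words), key=lambda order: score(order, counts))


if __name__ == "__main__":
    counts = bigram_counts(CORPUS)
    print(" ".join(best_order(["bites", "man", "dog"], counts)))
    # Number of orderings of a 10-word sentence: 10! = 3,628,800.
    print(factorial(10))
```

With this sample text, "dog bites man" scores highest among the six orderings. Because the number of orderings grows as the factorial of the sentence length, trying all of them is feasible only for very short inputs, which is why this exhaustive search should be read as an illustration rather than a practical design.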
Statistical machine translation "was a huge leap in the state of the art - very high accuracy, very fast," said Daniel Marcu, a co-founder of Language Weaver Inc., a commercial MT company in Marina del Rey, Calif.
Marcu claimed that his company's system can translate 5,000 words per minute, 24 hours a day, seven days a week. Five years ago, he said, the best that could be done was one 1,000-word document a day.
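For scale, the two figures Marcu cites can be put on the same footing. The short calculation below assumes the claimed rate is sustained around the clock with no downtime, which is the article's framing rather than a measured result.

```python
WORDS_PER_MINUTE = 5_000        # rate claimed for the current system
MINUTES_PER_DAY = 60 * 24       # running 24 hours a day
OLD_WORDS_PER_DAY = 1_000       # one 1,000-word document a day, five years earlier

words_per_day = WORDS_PER_MINUTE * MINUTES_PER_DAY
speedup = words_per_day / OLD_WORDS_PER_DAY

print(words_per_day)  # 7200000
print(speedup)        # 7200.0
```

Taken at face value, that is 7.2 million words a day, about 7,200 times the earlier figure.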
According to Marcu, the system can record a broadcast from al Jazeera, the Arabic-language network that carries Osama bin Laden's taped messages, and translate it automatically.
"With a one-minute delay, you can see what al Jazeera reported," he said.
Machine translation is also gaining ground in international commerce, according to Stephen Richardson, a former IBM researcher who now heads the Machine Translation Project at Microsoft in Redmond, Wash.
"Companies are facing increasingly difficult and costly challenges of localizing their products and services in the global marketplace," Richardson said.
Human translation is very expensive - 20 to 50 cents per word, he said. Older, rule-based machine translation systems cost as much as a million dollars to create and maintain.
Microsoft has used the new, data-driven method to translate its customer support database into four foreign languages "at a substantial cost savings," Richardson said. The machine translations still need a final polishing by a human editor, but the total cost is 35 percent less than it used to be.
Nevertheless, the prime mover for machine translation is the war on terror, and the urgent need to understand what potential enemies are saying.
"You can't expect the president to speak Pashto," said Benson Margulies, the chief technical officer at Basic Technologies Inc., a language processing provider in Cambridge, Mass., referring to one of the languages of Afghanistan.