Speech-to-Speech (S2S) technology seems to have finally stepped out of the realm of science fiction, yet it is not ready for prime time. In its report published earlier this year, the Translation Automation User Society (TAUS) identifies this as the paradox in which the technology currently finds itself.
The report outlines the current status, future directions, challenges, and opportunities of speech translation. It also includes interviews with 13 people representing research institutes and companies working in this field. We present highlights from the report.
New directions and possibilities
Ike Sagie of Lexifone believes that existing Machine Translation (MT) and Speech Recognition (SR) engines cannot be used as-is; optimization layers and other modifications are also required. Since people speak continuously, an acoustic solution must cut the flow into sentences or segments and pass the output to an audio optimization layer. A linguistic optimization stage is then needed to ensure translation accuracy, such as making sure interrogative sentences are annotated with question marks.
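The two stages Sagie describes, cutting a continuous word stream into segments and then restoring punctuation, can be sketched in a few lines. This is only an illustration of the idea: the pause threshold, the question-word heuristic, and the function names are assumptions for the example, not Lexifone's actual method.

```python
# Illustrative sketch of an acoustic segmentation step plus a linguistic
# optimization step. All thresholds and heuristics here are assumed.

PAUSE_THRESHOLD = 0.5  # seconds of silence treated as a segment boundary (assumed)
QUESTION_STARTERS = {"who", "what", "when", "where", "why", "how",
                     "is", "are", "do", "does", "can"}

def segment_stream(words):
    """Split a recognized word stream into sentence-like segments on pauses.

    `words` is a list of (token, start_time, end_time) tuples, as a speech
    recognizer might emit them.
    """
    segments, current, prev_end = [], [], None
    for token, start, end in words:
        if prev_end is not None and start - prev_end > PAUSE_THRESHOLD:
            segments.append(current)
            current = []
        current.append(token)
        prev_end = end
    if current:
        segments.append(current)
    return segments

def annotate(segment):
    """Linguistic optimization: mark interrogative segments with a question mark."""
    text = " ".join(segment)
    mark = "?" if segment and segment[0].lower() in QUESTION_STARTERS else "."
    return text.capitalize() + mark

stream = [("where", 0.0, 0.3), ("is", 0.35, 0.5), ("the", 0.55, 0.7),
          ("station", 0.75, 1.1),
          ("thank", 1.9, 2.2), ("you", 2.25, 2.4)]
sentences = [annotate(seg) for seg in segment_stream(stream)]
# e.g. ["Where is the station?", "Thank you."]
```

A production system would of course segment on acoustic features rather than recognizer timestamps alone, and would use a trained punctuation model instead of a keyword list.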
Chris Wendt of Microsoft/Skype states that SR, MT, and Text-to-Speech (TTS) on their own are not enough to make a translated conversation work. Because translation requires clean input, elements of spontaneous language (hesitations, repetitions, corrections, and so on) must be cleaned up between automatic SR and MT. For this purpose, Microsoft built a function called TrueText that turns what you said into what you wanted to say. Because it is trained on real-world data, it works best on the most common mistakes, Wendt says.
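TrueText itself is proprietary and learned from real conversational data, but the kind of cleanup Wendt describes can be sketched with simple rules. The filler list, the repetition handling, and the self-correction pattern below are illustrative assumptions, not Microsoft's implementation.

```python
import re

# Hedged sketch: rule-based cleanup of spontaneous-speech artifacts between
# SR and MT. A system like TrueText learns these patterns from data; the
# rules here are assumptions chosen to illustrate the idea.

HESITATIONS = {"um", "uh", "er", "hmm"}  # assumed filler list

def clean_transcript(text):
    tokens = text.lower().split()
    # 1. Drop hesitation fillers.
    tokens = [t for t in tokens if t not in HESITATIONS]
    # 2. Collapse immediate word repetitions ("the the" -> "the").
    deduped = []
    for t in tokens:
        if not deduped or deduped[-1] != t:
            deduped.append(t)
    joined = " ".join(deduped)
    # 3. Keep the corrected phrase after an explicit self-correction marker
    #    ("to london i mean to paris" -> "to paris") -- a crude assumption.
    joined = re.sub(r"\b(\w+ \w+) i mean (\w+ \w+)\b", r"\2", joined)
    return joined
```

For example, `clean_transcript("um I I want to go to london I mean to paris")` reduces the utterance to the intended sentence before it reaches the MT engine.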
According to Chengqing Zong from the Chinese Academy of Sciences, future advancements in S2S technology may also include different means of evaluating quality than current automatic techniques such as BLEU scores. In the future, Zong says, “We’ll rely more on human judgment. Work on neural networks will continue, despite problems with speed and data sparseness.”
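For readers unfamiliar with the metric Zong refers to, BLEU scores a candidate translation by its n-gram overlap with a reference, scaled by a brevity penalty. A minimal single-reference sketch (simplified smoothing; real implementations such as NLTK's differ in detail):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        overlap = sum((ngrams(cand, n) & ngrams(ref, n)).values())
        total = max(sum(ngrams(cand, n).values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # crude zero smoothing
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages translations shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean
```

An exact match scores 1.0, while a truncated or reworded candidate scores lower; the weakness Zong alludes to is that a fluent, accurate translation phrased differently from the reference is penalized just the same.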