Alon Lavie, who heads up the Amazon Machine Translation Research and Development Group, said in a recent Globally Speaking podcast that neural machine translation “makes very, very strange types of mistakes … because it’s not a direct matching between the source language and the target language in terms of the words and the sequences of words…” Thing is, the strangeness can get masked by the smoothness.
Recently I found myself feeding this line of text through Google Translate.
For these products, please use 視覚化 not 可視化 based on the definition at the following URL:
It was aimed at linguists, instructing them to use the term shikakuka instead of kashika depending on context. Both words mean “visualization” but have slightly different nuances. And the results were so humorous I thought it would be a shame not to share them.
The neural cogs went whizzing and immediately gave me this:
You might expect that the two Japanese terms 視覚化 and 可視化 in the source would make it through to the target text. After all, there’s no need to translate them. But no. Instead, it produced 視覚障害, a big red flag. Just hit the reverse translate button (always a good idea to do that) to see what it means…
Okay, is this even close to what I wanted to say? Avoid visual impairment? Did I want to discriminate against someone? Of course not. Problem is, the Japanese text is so fluent that it reads like I really mean to be really mean.
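Since fluency can't be trusted, a mechanical check helps. Here is a minimal sketch in Python, assuming you have the source and the MT output as plain strings. The function name `missing_terms` is made up for illustration, and the `target` string is only a placeholder: the one thing taken from the actual output is the term 視覚障害.

```python
def missing_terms(source: str, target: str, protected: list[str]) -> list[str]:
    """Return the protected terms that appear in the source but not in the target."""
    return [term for term in protected if term in source and term not in target]


source = ("For these products, please use 視覚化 not 可視化 "
          "based on the definition at the following URL:")

# Placeholder, NOT the real MT output. Only 視覚障害 comes from the article.
target = "... 視覚障害 ..."

print(missing_terms(source, target, ["視覚化", "可視化"]))
# → ['視覚化', '可視化']
```

A plain substring test is enough here because the terms should pass through untouched. It says nothing about whether the rest of the sentence is right, so the reverse translate button is still worth a click.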
Alon was absolutely right. Very strange. What is going on inside those neural networks of theirs? Can I pre-edit the source to help the MT produce better output? Maybe that unnatural colon at the end of the sentence is wreaking some sort of unexpected havoc? Let’s change it to a period.
Nope. It’s still giving us that problematic 視覚障害 but followed by some different wording.
Uh, yeah. So changing two dots ( : ) to one ( . ) takes us from “avoid visual impairment” to “confirm that there is no visual impairment”. Help!
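If you want to repeat the colon-versus-period experiment without retyping, something like the following could work. It is a rough sketch under stated assumptions: `translate` is whatever function you use to call your MT engine (nothing here calls Google Translate), and `toy_translate` is a stand-in that simply drops the two terms so the example runs on its own. All the names are hypothetical.

```python
from typing import Callable


def trailing_punctuation_variants(text: str,
                                  endings: tuple[str, ...] = (":", ".", "")) -> list[str]:
    """Return copies of the text that differ only in their final punctuation."""
    stem = text.rstrip().rstrip(":.")
    return [stem + ending for ending in endings]


def compare_variants(variants: list[str],
                     translate: Callable[[str], str],
                     protected: list[str]) -> dict[str, list[str]]:
    """Translate each variant and list the protected terms that went missing."""
    report = {}
    for variant in variants:
        output = translate(variant)
        report[variant] = [t for t in protected if t in variant and t not in output]
    return report


def toy_translate(text: str) -> str:
    """Toy stand-in for an MT engine (an assumption): it just drops both terms."""
    return text.replace("視覚化", "").replace("可視化", "")


source = ("For these products, please use 視覚化 not 可視化 "
          "based on the definition at the following URL:")
variants = trailing_punctuation_variants(source)
for variant, lost in compare_variants(variants, toy_translate, ["視覚化", "可視化"]).items():
    print(repr(variant[-5:]), "lost:", lost)
```

Swap `toy_translate` for a real call and the report shows at a glance whether any pre-edit actually lets the terms through.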