Google admits 'garbage in, garbage out' translation problem



LegalTransform  Identity Verified
United States
Local time: 12:26
Member (2002)
Spanish to English
+ ...
Wow! I predicted this on ProZ.com three years ago Feb 8, 2014

I even used the same phrase: Garbage In, Garbage Out.
See this post:
http://www.proz.com/forum/machine_translation_mt/186784-the_future_of_google_translate.html

[Edited at 2014-02-08 00:30 GMT]



Orrin Cummins  Identity Verified
Japan
Local time: 01:26
Japanese to English
+ ...
As the old saying goes Feb 8, 2014

You get what you pay for, I guess.

 

Claudia Cherici  Identity Verified
Italy
Local time: 18:26
Member (2010)
English to Italian
+ ...
well spotted Feb 8, 2014

Well done, Jeff. You spotted the exact problem with the Google Translate system, and using even the exact wording is rather impressive, I must say.

 

Samuel Murray  Identity Verified
Netherlands
Local time: 18:26
Member (2006)
English to Afrikaans
+ ...
The comment about watermarking is more interesting than the so-called admission Feb 8, 2014

The original video

The exact words that were spoken, and the question that prompted it, can be heard here:
http://www.livestream.com/niac2014/video?clipId=pla_d5da38fb-0dfb-4dab-8f03-57e6de1ef672
(at minute 51 to minute 53)

There was no admission, however. The man from Google simply "said it" -- he did not "admit to it". I can understand that a news editor might use "admit" in a heading because it is shorter than "acknowledge", but for the news writer to persist in referring to the statement as an "admission" throughout the news report is bad journalism, in my opinion.

The question was not about garbage in general but about a specific type of garbage, namely content that was translated by Google itself and left unedited. The question was about the danger of Google using content that it itself had translated, to improve its machine translation system. The Google man's answer is that they are aware of that danger but don't think that it is a threat at this time. In other words, while we can speculate about a worst case scenario, the engineers at Google Translate are not blind to this issue and do actually keep an eye on it. This does not make me trust Google Translate any less.

On watermarking

The Google man described one experimental method that they used in order to recognise translations that were produced by Google. They don't use that method any more, but may use it again later. It involves classifying each word in a language as "even" or "odd". When a translation is about to be generated and multiple valid word sequences are available for that text, Google would favour a sequence that produces "all even words" or "all odd words" in a phrase. The human reader won't notice the difference, but Google will be able to spot large chunks of all even-classified or all odd-classified words in the web sites that they scrape, and know that the text is therefore more likely to be a machine translation. Very clever, IMO.
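To make the even/odd idea concrete, here is a minimal sketch in Python. It is only an illustration of the principle as described in the talk, not Google's actual method: the function names are hypothetical, and the hash-based `word_parity` classifier is purely an assumption, since the real word classification was not disclosed.

```python
import hashlib
from typing import Callable, List, Sequence

def word_parity(word: str) -> int:
    """Hypothetical classifier (an assumption): 0 = "even" word, 1 = "odd" word."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return digest[0] % 2

def uniformity(words: Sequence[str], parity: Callable[[str], int] = word_parity) -> float:
    """Share of words belonging to the majority parity class (1.0 = all the same)."""
    if not words:
        return 0.0
    odd = sum(parity(w) for w in words)
    return max(odd, len(words) - odd) / len(words)

def choose_candidate(candidates: Sequence[List[str]],
                     parity: Callable[[str], int] = word_parity) -> List[str]:
    """Generation side: among equally valid phrasings, prefer the most uniform one."""
    return max(candidates, key=lambda seq: uniformity(seq, parity))

def longest_same_parity_run(words: Sequence[str],
                            parity: Callable[[str], int] = word_parity) -> int:
    """Detection side: length of the longest stretch of words sharing one parity."""
    best = current = 0
    previous = None
    for w in words:
        p = parity(w)
        current = current + 1 if p == previous else 1
        previous = p
        best = max(best, current)
    return best
```

In ordinary human text, long same-parity runs should be rare, so an unusually long run on a scraped page would hint at machine output. Any threshold for "unusually long" would have to be tuned on real data.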

The fact that Google Translate inserts non-printing control characters into its translations may also be a form of watermarking. If you do a translation in Google, copy/paste it into MS Word and enable display of non-printing characters, you will sometimes see those characters show up as grey blocks. They are not printed or visible under normal circumstances (e.g. on web sites or PDFs or other files translated with Google Translate) but they are there and can be detected. In fact, you can search for them in MS Word... their code is ChrW(8203), i.e. the zero-width space, U+200B.
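Outside MS Word, the same character can be found with a few lines of Python. This is a small sketch; the helper names are my own, and the only fact it relies on is that code 8203 is the zero-width space (U+200B).

```python
ZERO_WIDTH_SPACE = chr(8203)  # same character as ChrW(8203) in Word/VBA, i.e. U+200B

def find_zero_width_spaces(text: str) -> list:
    """Return the index of every zero-width space in the text."""
    return [i for i, ch in enumerate(text) if ch == ZERO_WIDTH_SPACE]

def strip_zero_width_spaces(text: str) -> str:
    """Return the text with all zero-width spaces removed."""
    return text.replace(ZERO_WIDTH_SPACE, "")

if __name__ == "__main__":
    sample = "machine" + ZERO_WIDTH_SPACE + "translation"
    print(find_zero_width_spaces(sample))   # positions of the hidden characters
    print(strip_zero_width_spaces(sample))  # cleaned text
```

Finding such characters does not prove that a text came from Google Translate, since other tools and web pages also use zero-width spaces; it is only a hint.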

With regard to what the Google man said about evaluating the quality of the content, I did notice that about a year or two ago Google Translate changed its output so that it is deliberately poor, from a typesetting point of view. Many translated phrases now start with a lowercase letter even if the source text started with an uppercase letter, or vice versa, and the translated text contains spacing errors next to certain types of punctuation that "good quality" authors would never permit or commit.


[Edited at 2014-02-08 10:30 GMT]


 
LilianNekipelov  Identity Verified
United States
Local time: 12:26
Russian to English
+ ...
All their translations are odd, anyhow, Feb 8, 2014

so why do they even bother? The spacing problem: yes, no surprise. The spacing problem becomes more and more annoying even when you personally, not a machine, are typing. Also, some letters are often skipped or reversed. It is a real pain when you try to type directly on the internet these days.




[Edited at 2014-02-08 11:58 GMT]


 

DLyons  Identity Verified
Ireland
Local time: 17:26
Spanish to English
+ ...
Some sites need to be filtered. Feb 8, 2014

Of course, sites such as Alibaba should be ignored by translators (or, better, filtered out of Google hits). But that's a different problem from Google self-training - watermarking may help Google to recognize and eliminate its own translations from its training material.

 

Maxime Bujakov  Identity Verified
France
Local time: 18:26
Member (2006)
French to English
+ ...
Machine translation Feb 10, 2014

Funny! I love that saying too - junk in, junk out. (Even shorter and more rhythmic.)

Now, seriously, I recently researched Machine translation by participating in several projects:
1 - I post-edited machine translation to train the machine,
2 - got a machine translation, post-edited it while clocking my time, then another editor (very picky) was given my text without any notion of MT involved, and approved my output in 99% of the word count.

The verdict: 20% increase in my productivity due to the MT, no loss of quality (1% of corrections would still appear if I worked from scratch).


 

LegalTransform  Identity Verified
United States
Local time: 12:26
Member (2002)
Spanish to English
+ ...
Yes, but... Feb 13, 2014

did the editor actually read each line of the source text and compare it to the translation? Or did the "editor" just read the translation until they found something that didn't look right?

I find that in most cases, the MT editor does not really read the source text at all, but just accepts everything until something looks strange. While this may work with human translation, assuming that the human translator is a good one and has necessarily read the source text, you cannot make this assumption with MT, and therefore, each and every line of the source text must be read and compared with the translation. In other words, even if it could be shown that it takes less time to "post-edit" MT translation, the editing process should take twice as long as a standard editing job, thus nullifying most of the time savings.

What you end up getting is a text where no one has read the majority of the original source document, and no one can be sure that the "translation" that now sounds all good and grammatical is in fact a translation at all.


Maxime Bujakov wrote:

Funny! I love that saying too - junk in, junk out. (Even shorter and more rhythmic.)

Now, seriously, I recently researched Machine translation by participating in several projects:
1 - I post-edited machine translation to train the machine,
2 - got a machine translation, post-edited it while clocking my time, then another editor (very picky) was given my text without any notion of MT involved, and approved my output in 99% of the word count.

The verdict: 20% increase in my productivity due to the MT, no loss of quality (1% of corrections would still appear if I worked from scratch).


 

Maxime Bujakov  Identity Verified
France
Local time: 18:26
Member (2006)
French to English
+ ...
20% increase in my productivity due to the M Feb 18, 2014

Jeff Whittaker wrote:

did the editor actually read each line of the source text and compare it to the translation? Or did the "editor" just read the translation until they found something that didn't look right?

I find that in most cases, the MT editor does not really read the source text at all, but just accepts everything until something looks strange. While this may work with human translation, assuming that the human translator is a good one and has necessarily read the source text, you cannot make this assumption with MT, and therefore, each and every line of the source text must be read and compared with the translation. In other words, even if it could be shown that it takes less time to "post-edit" MT translation, the editing process should take twice as long as a standard editing job, thus nullifying most of the time savings.

What you end up getting is a text where no one has read the majority of the original source document, and no one can be sure that the "translation" that now sounds all good and grammatical is in fact a translation at all.


Jeff, of course, as the translator in charge I read it all; that's why I still spent 80% of my regular typing time.

The editor must have tracked the source as well, and also served as a good check that my writing style did not deteriorate.

MT is also surprisingly good at suggesting very appropriate words in some of the most difficult cases - like when you sit and think for minutes over one single word.

Finally, when it comes to someone's personal business operations in an unfamiliar language environment, MT has revolutionized life. I can practically read and write in Lithuanian, the oldest European language, having just a basic idea of the language structure.
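As a side note on the figures in this thread, "time spent" and "productivity increase" are not the same number, and the small sketch below shows how they relate. The function names are only illustrative, and it assumes that productivity simply means words produced per hour.

```python
def throughput_gain(time_fraction: float) -> float:
    """Gain in output per hour when a job takes `time_fraction` of the usual time."""
    if time_fraction <= 0:
        raise ValueError("time_fraction must be positive")
    return 1.0 / time_fraction - 1.0

def time_fraction_for_gain(gain: float) -> float:
    """Share of the usual time needed to achieve a given gain in output per hour."""
    if gain <= -1:
        raise ValueError("gain must be greater than -1")
    return 1.0 / (1.0 + gain)

if __name__ == "__main__":
    # Spending 80% of the usual time on the same text
    print(f"{throughput_gain(0.8):.0%} more words per hour")
    # A 20% increase in words per hour
    print(f"{time_fraction_for_gain(0.2):.1%} of the usual time")
```

So whether "20%" refers to time saved or to extra output per hour makes a modest difference to the result, but either way the saving stays well below half of the usual working time.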


 

