Human Evaluation of Machine Translation
Thread poster: Juan Martín Fernández Rowda

Juan Martín Fernández Rowda  Identity Verified
United States
Local time: 14:47
English to Spanish
+ ...
Jul 29, 2016

Sharing some information and tips for those interested in learning how to evaluate MT quality:

https://www.linkedin.com/pulse/ebay-mt-language-specialists-series-human-evaluation-fernández-rowda?trk=mp-reader-card


LilianNekipelov  Identity Verified
United States
Local time: 17:47
Russian to English
+ ...
I thought MT had to do more with devolution Jul 30, 2016

... certainly devolution of language, through simplification, inaccuracy and repetition, all of which is bad for style.

Daryo
United Kingdom
Local time: 22:47
Serbian to English
+ ...
totally depressing and moronic! Jul 30, 2016

Using MT at the present stage of development is like trying to calculate the trajectory of a rocket aimed at Jupiter, or to make a weather forecast, using a Babbage machine instead of a supercomputer - simply a con job on unsuspecting clients.

The idea that to evaluate the depth of the garbage produced by a non-human translation automaton you should try to avoid human participation at all costs is simply beyond belief - you couldn't make it up!

Have any of these geniuses noticed that translation is part of the communication process between humans, and that the intended recipients would presumably be the ones best placed to evaluate what is fed to them?

If I'm presented with MT output from a language I don't know at all, and can't make head or tail of that so-called "translation", it's really going to be a great help and consolation if another automaton tells me that the level of garbage isn't that deep and that I MUST understand at least part of it!

Looks like MT lunatics are firmly taking control of the asylum ...

The "evaluation method" advocated in this article is more or less the equivalent of asking blind people to evaluate paintings generated by automatons, or deaf people to rate computer-generated music!

[Edited at 2016-07-31 14:09 GMT]



Juan Martín Fernández Rowda  Identity Verified
United States
Local time: 14:47
English to Spanish
+ ...
TOPIC STARTER
It's about having a human in the loop, not the other way around Aug 2, 2016

Daryo wrote:

The idea that to evaluate the depth of the garbage produced by a non-human translation automaton you should try to avoid human participation at all costs is simply beyond belief - you couldn't make it up!



Daryo, with all due respect, I think you may have missed the point of the article. Perhaps a more formal paper will help you get a clearer picture - this is a good place to start: https://www.cs.cmu.edu/~alavie/papers/AMTA-10-Denkowski.pdf

Regards,
Juan


Daryo
United Kingdom
Local time: 22:47
Serbian to English
+ ...
No machines evaluating machines mentioned? Really? Aug 5, 2016

... As we discussed in our article on quality estimation, machine translation output can be evaluated automatically, using methods like BLEU and NIST, or by human judges.

https://www.linkedin.com/pulse/ebay-mt-language-specialists-series-human-evaluation-fernández-rowda

Quality Estimation is a method used to automatically provide a quality indication for machine translation output without depending on human reference translations. In more simple terms, it’s a way to find out how good or bad are the translations produced by an MT system, without human intervention.

https://www.linkedin.com/pulse/ebay-mt-language-specialists-series-basics-quality-fernández-rowda?trk=mp-author-card

===
a way to find out how good or bad are the translations produced by an MT system, without human intervention
===

Is it there or not? If anyone had told me I would ever have the opportunity to read that sentence in a serious article about translating, I would have considered it an improbable bad joke.

Yes, there is also a mention of humans evaluating MT output [NOT translations], but "machines evaluating machines" IS presented as a current practice, as something perfectly normal and acceptable!

Well, in my rule-book it's not, and the idea of "machines evaluating MT machines" is one of the best illustrations of what is wrong with MT as it is practised now.



Juan Martín Fernández Rowda  Identity Verified
United States
Local time: 14:47
English to Spanish
+ ...
TOPIC STARTER
Estimation vs Evaluation Aug 5, 2016

Daryo wrote:

Well, in my rule-book it's not, and the idea of "machines evaluating MT machines" is one of the best illustrations of what is wrong with MT as it is practised now.



Daryo, it's not my intention to start an endless argument on MT. It's not about Human vs. Machine. Of course humans are better - but in certain scenarios, human translation alone is not realistic.

I believe you are mixing up the concepts of evaluation and estimation. BLEU, for example, uses human reference translations to evaluate MT output. These methods have been used for years and were developed by really smart people, scientists. I wouldn't discard all this knowledge without investing some time in learning about it.

I appreciate you taking the time to read my articles, and I encourage you to keep doing so if you are interested in the subject, even if you are against it! Knowledge can never hurt. And feedback is always welcome!
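To illustrate the estimation-versus-evaluation distinction in code: quality estimation scores MT output without consulting any human reference translation. The sketch below is a deliberately crude, invented heuristic - the function name and the two features are my own assumptions for illustration, not a real QE method; production QE systems (e.g. QuEst++) are trained models over many such features.

```python
def toy_quality_estimate(source, mt_output):
    """A crude, reference-free quality signal in [0, 1].

    This is an invented toy: it only illustrates that quality
    ESTIMATION never looks at a human reference translation,
    unlike evaluation metrics such as BLEU.
    """
    src, tgt = source.split(), mt_output.split()
    if not src or not tgt:
        return 0.0
    # Feature 1: translations of wildly different length are suspicious.
    length_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt))
    # Feature 2: source words copied verbatim can signal untranslated text.
    copied = sum(1 for w in tgt if w in src) / len(tgt)
    return length_ratio * (1 - 0.5 * copied)

print(toy_quality_estimate("el gato", "the cat"))   # no reference needed
```

A real QE model would be trained on human quality judgments, but the key point survives even in this toy: the score is computed from the source and the MT output alone.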


Daryo
United Kingdom
Local time: 22:47
Serbian to English
+ ...
I have no doubts that Aug 7, 2016

automated processing of information will have a growing importance in translation; there is no way back.

What I object to is the continuing trend of feeding the unsuspecting public marketing BS instead of serious information, with predictable results: Google Translate getting indigestion by gorging on its own output, or professional translators being "corrected" by any passing ignoramus on the grounds that Google Translate says otherwise!



Juan Martín Fernández Rowda  Identity Verified
United States
Local time: 14:47
English to Spanish
+ ...
TOPIC STARTER
Agreed Aug 9, 2016

I'm with you on that. Machine translation is just a tool: it works great for some things and just doesn't work for others. Forks are amazing but a bit of a pain to eat soup with, right? There are many people out there using this technology in the wrong way or for the wrong purpose.

As you say, most pieces about MT that we get to read are marketing BS, pure advertisement and no substance. That's why I started publishing these posts: to share knowledge with the translation community. I wish I had had some of this information when I started working with MT. I had exactly the same feeling when I was trying to learn about CAT tools more than 14 years ago - there was no information, only people complaining that they were getting paid less for TM matches.

I'm OK with you considering my article moronic, and I'm sure some will find it interesting; that diversity of opinion is healthy. Many translators feel strongly about MT, and that's fine, but we need to be informed - we need to know what's going on and how the technology is evolving, and then make informed choices.

Just my opinion.

Have a good day!
Juan



Samuel Murray  Identity Verified
Netherlands
Local time: 23:47
Member (2006)
English to Afrikaans
+ ...
@Daryo Aug 9, 2016

Daryo wrote:
The idea that to evaluate the depth of the garbage produced by a non-human translation automaton you should try to avoid human participation at all costs is simply beyond belief...


Yes, but no-one is saying that.

Even in "automated" evaluation systems, humans are involved. In automatic systems, e.g. BLEU, the translation quality is graded using strict metrics instead of opinions. The article in the first post relates to methods that use human opinion when evaluating machine translation. And the problem with "opinion" is that it is very difficult to standardise, as I'm sure you'll agree.

In systems like BLEU, for example, the machine translation is compared to human translations of the same text. The machine's translation is graded in terms of how similar it is to the human translation. The grading (i.e. the comparison) is done by a computer.
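The comparison described above can be sketched in a few lines of Python. This is a simplified, unsmoothed illustration of BLEU's clipped n-gram precision with a brevity penalty - not a reference implementation; real toolkits add smoothing, corpus-level aggregation and other refinements.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram's count is capped
    at the maximum count seen in any reference, so repeating a word
    cannot inflate the score."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / max(sum(cand_counts.values()), 1)

def sentence_bleu(candidate, references, max_n=2):
    """Geometric mean of 1..max_n precisions, times a brevity penalty
    that punishes candidates shorter than the closest reference."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    closest = min((len(r) for r in references),
                  key=lambda length: abs(length - len(candidate)))
    bp = 1.0 if len(candidate) > closest else math.exp(1 - closest / len(candidate))
    return bp * math.exp(log_avg)

mt = "the cat sat on on the mat".split()       # machine output (repeats "on")
human = ["the cat sat on the mat".split()]     # human reference translation
print(round(sentence_bleu(mt, human), 3))      # ≈ 0.845
```

Note that the human contribution here is the reference translation itself; the scoring of the candidate against it is purely mechanical, which is exactly the division of labour Samuel describes.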






