Tags diminishing machine translation results
Thread poster: Thijs Vissia

Thijs Vissia
Netherlands
Mar 2, 2019

I was wondering about the results I’m getting from machine translation (MT) through OmegaT. Currently I’m using one service that is available free of charge, MyMemory (Machine). I’m noticing that many idioms are not recognised and translated accordingly when tags fall in between the words that make up an idiom.

For example, I was translating the following sentence (in Dutch):

“Ieder instituut gaat beschikken over meer geld en meer gebouwen.”


With some tags/formatting strewn in, this became: “Ieder <f0>instituut gaat beschikken</f0><f1> </f1><f2>over meer geld</f2> en meer gebouwen.”

In there is the widespread idiom “beschikken over” (meaning “have access to” or “have at one’s disposal”). I don’t know about the quality of the machine translation at MyMemory, but an idiom this widespread should be recognised and translated properly.

When the tags were in there, this was returned as:
“Everyone <f0> institute </f0><f1></f1><f2> about more money </f2> and more buildings.”

The tag broke the connection between “ieder” and “instituut”, so instead of “every institute” the MT rendered this as “everyone institute”. Similarly, the composite verb “beschikken over” was interrupted by a tag, so the MT treated each piece separately and apparently left out the verb entirely.

However, after creating a new file without tags, it came back as:

“Every institute will have more money and more buildings.”

This may not be my phrasing of choice, but it is otherwise a fine translation.

So I was wondering why the tags are sent to the MT services in the first place. Is it so that the formatting doesn’t need to be put back in manually afterwards? Wouldn’t it be almost as easy to strip the tags from the strings before sending the query to the MT service?
Considering that OmegaT already needs to recognise tags as such (to treat them differently in the editor pane), wouldn't it be possible to make sending them to the MT service optional?

It seems to me that sending the tags along is seriously reducing the quality of the MT results.
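To illustrate the stripping asked about in this post, here is a minimal Python sketch. It assumes the inline tags follow the short pattern seen in this thread (<f0>, </f0>, and possibly a self-closing form such as <f1/>); the regex and the function name `strip_tags` are hypothetical and are not part of OmegaT or of the MyMemory service.

```python
import re

# Assumption: inline tags look like the short tags shown in this thread,
# e.g. <f0>, </f0>, or a self-closing form such as <f1/>.
TAG_RE = re.compile(r"</?[A-Za-z]+\d+/?>")


def strip_tags(segment: str) -> str:
    """Remove inline tags from a segment and collapse leftover whitespace."""
    without_tags = TAG_RE.sub("", segment)
    return re.sub(r"\s+", " ", without_tags).strip()


source = ("Ieder <f0>instituut gaat beschikken</f0><f1> </f1>"
          "<f2>over meer geld</f2> en meer gebouwen.")
print(strip_tags(source))
# → Ieder instituut gaat beschikken over meer geld en meer gebouwen.
```

One caveat of this simple approach: if a tag sits directly between two words with no space on either side, stripping it joins the words, so a real implementation would need to decide whether to insert a space there.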


[Edited at 2019-03-03 10:37 GMT]

Milan Condak
English to Czech
Translator can remove tags before translation Apr 1, 2019

Thijs Vissia wrote:

It seems to me that sending the tags along is seriously reducing the quality of the MT results.


A translator can remove tags before pretranslating against a TMX or with MT,

http://www.condak.cz/nove/2019-03/31/en/00.html

and put them back after pretranslation.

Milan


 

Thijs Vissia
Netherlands
TOPIC STARTER
ah Apr 1, 2019

Milan Condak wrote:

A translator can remove tags before pretranslating against a TMX or with MT, (...)
and put them back after pretranslation.

Milan


Hi Milan,
Ah, thank you for the clarification. I didn't realize you could put them back afterwards by toggling the option again, but of course the source file isn't changed. I somehow assumed this worked the same way as tagwipe, which does modify the source file.

I think the documentation, or even the option in Preferences, could be a bit clearer about this: 'Remove tags' sounds rather definitive.

But this clearly solves my problem: I can translate with MT and put the tags back manually afterwards.

cheers,
Thijs
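When tags are removed for MT and then put back by hand, it is easy to miss one. Below is a minimal stand-alone sketch (not OmegaT's own tag validation) that lists the tags in a source segment that do not appear in the edited target. It assumes the same short <f0>/</f0> tag pattern seen in this thread; `missing_tags` and the sample target sentence are hypothetical.

```python
import re
from collections import Counter

# Assumption: same short inline tags as in this thread (<f0>, </f0>, <f1/>).
TAG_RE = re.compile(r"</?[A-Za-z]+\d+/?>")


def missing_tags(source: str, target: str) -> list[str]:
    """List source tags (in source order) that are absent from the target."""
    available = Counter(TAG_RE.findall(target))
    missing = []
    for tag in TAG_RE.findall(source):
        if available[tag] > 0:
            available[tag] -= 1  # this occurrence was put back
        else:
            missing.append(tag)
    return missing


source = ("Ieder <f0>instituut gaat beschikken</f0><f1> </f1>"
          "<f2>over meer geld</f2> en meer gebouwen.")
# Hypothetical target in which the translator forgot to restore <f2>...</f2>.
target = "Every <f0>institute will have</f0><f1> </f1>more money and more buildings."
print(missing_tags(source, target))
# → ['<f2>', '</f2>']
```

This only checks that each tag is present, not that it wraps the right words or keeps its original order; that part remains a manual check.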


 

Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Fixed post (your membership fee will never buy fixed forum software) Apr 2, 2019

Thijs Vissia wrote:
I was wondering about the results I’m getting from machine translation (MT) through OmegaT. Currently I’m using one service that is available free of charge, MyMemory (Machine). I’m noticing that many idioms are not recognised and translated accordingly when tags fall in between the words that make up an idiom.

For example, I was translating the following sentence (in Dutch):
Ieder instituut gaat beschikken over meer geld en meer gebouwen.

With some tags/formatting strewn in, this became:
Ieder <f0>instituut gaat beschikken</f0><f1> </f1><f2>over meer geld</f2> en meer gebouwen.

In there is the widespread idiom “beschikken over” (meaning “have access to” or “have at one’s disposal”). I don’t know about the quality of the machine translation at MyMemory, but an idiom this widespread should be recognised and translated properly.

When the tags were in there, this was returned as:
Everyone <f0> institute </f0><f1></f1><f2> about more money </f2> and more buildings.

The tag broke the connection between “ieder” and “instituut”, so instead of “every institute” the MT rendered this as “everyone institute”. Similarly, the composite verb “beschikken over” was interrupted by a tag, so the MT treated each piece separately and apparently left out the verb entirely.

However, after creating a new file without tags, it came back as:
Every institute will have more money and more buildings.
This may not be my phrasing of choice, but it is otherwise a fine translation.

So I was wondering why the tags are sent to the MT services in the first place. Is it so that the formatting doesn’t need to be put back in manually afterwards? Wouldn’t it be almost as easy to strip the tags from the strings before sending the query to the MT service?

Considering that OmegaT already needs to recognise tags as such (to treat them differently in the editor pane), wouldn't it be possible to make sending them to the MT service optional?

It seems to me that sending the tags along is seriously reducing the quality of the MT results.


[Edited at 2019-04-02 05:54 GMT]


 

