matching rate
Thread poster: big_fish

big_fish  Identity Verified
Polish to English
Dec 14, 2012

Hello,
Thank you again for everyone's suggestions and help.
I'm working on a new job and new curiosities puzzle me.
There is a segment:
Laws and regulations
TM match:
and
the match rate: 64%
Horribly overrated to the naked eye, isn't it?
How come?
Thank you in advance for any suggestions.
Best regards,
Krzysztof


 

Grzegorz Gryc  Identity Verified
Local time: 17:06
French to Polish
The algorithm is faulty Jan 4, 2013

big_fish wrote:

Thank you again for everyone's suggestions and help.
I'm working on a new job and new curiosities puzzle me.
There is a segment:
Laws and regulations
TM match:
and
the match rate: 64%
Horribly overrated to the naked eye, isn't it?
How come?
Thank you in advance for any suggestions.


memoQ has serious matching problems for short segments.
E.g. when one word differs in a two-word sentence (e.g. "It rains" and "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.
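
To illustrate where the "simplest solution" of 50% comes from, here is a minimal Python sketch of a word-level match rate based on edit distance over whole words. It is not memoQ's algorithm (which is not described in this thread); the function names `word_edit_distance` and `naive_match_rate` are made up for this example.

```python
def word_edit_distance(a, b):
    """Levenshtein distance computed over word tokens instead of characters."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # keep or substitute a word
        prev = curr
    return prev[-1]


def naive_match_rate(source, tm_source):
    """Percentage of words that match, relative to the longer segment.

    Assumption: words are whitespace-separated and compared case-insensitively.
    """
    a, b = source.lower().split(), tm_source.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - word_edit_distance(a, b) / longest)


print(naive_match_rate("It rains", "It happens"))    # 50.0
print(naive_match_rate("Laws and regulations", "and"))
```

Under this naive measure the segment from the first post ("Laws and regulations" against "and") keeps only one of three words, so it scores far below the 64% reported.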

It's a very old bug.
AFAIR I pointed it out two years ago.

Cheers
GG

[Edited at 2013-01-04 10:31 GMT]


 

big_fish  Identity Verified
Polish to English
TOPIC STARTER
Good to know for jobs with short-segments and low match rates Jan 4, 2013

Grzegorz Gryc wrote:

E.g. when one word differs in a two-word sentence (e.g. "It rains" and "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.

It's a very old bug.
AFAIR I pointed it out two years ago.

Cheers
GG

[Edited at 2013-01-04 10:31 GMT]


Thank you Grzegorz!
It's important to know, especially as some jobs may get horribly underpaid this way.
You have to keep an eye on the rates for hits between 60 and 70% for short segments.
In fact the example you give ("It rains" and "It happens") in Polish (and I believe in very many other languages too) is no match, as these would be two distinct sentences with nothing in common.
It's important to keep this glitch in mind in these cost-saving times.
Best regards,

Krzysztof


 

LEXpert  Identity Verified
United States
Local time: 10:06
Member (2008)
Croatian to English
Dates and numbers Jan 4, 2013

I've noticed that numbers and dates will often be treated as relatively high matches for other numbers or dates, or even each other, and even if the number of digits, formatting, separators, etc. is completely different.

 

Grzegorz Gryc  Identity Verified
Local time: 17:06
French to Polish
Text recognition algorithms again... Jan 4, 2013

big_fish wrote:

Grzegorz Gryc wrote:

E.g. when one word differs in a two-word sentence (e.g. "It rains" and "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.

It's a very old bug.
AFAIR I pointed it out two years ago.


It's important to know, especially as some jobs may get horribly underpaid this way.

Generally, you should know one is very often underpaid according to the memoQ wordcount :)
I.e. in your language pairs it's not a very big problem, the word number difference is usually negligible, but e.g. for FR-PL it may reach approx. 15% by default.
The problem is the word definition in memoQ. It corresponds to a Word-like wordcount, i.e. a word is a character chain between spaces (or equivalent), while most tools use some GMX-V-like word definition, i.e. additional word separators are used (apostrophes, dashes etc.).
E.g., for memoQ, 1-Chloro-2,4-dinitrobenzene is one word while most tools would show more, e.g. 4 words in Trados or 2 words in Déjà Vu (DVX doesn't count numerals as words).
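
To make the difference between the two word definitions concrete, here is a rough Python sketch. It is not the actual counting code of memoQ, Trados or Déjà Vu: the function names and the separator set (hyphens and apostrophes, but not commas) are assumptions, chosen only because they happen to reproduce the 1 / 4 / 2 counts quoted above.

```python
import re


def count_whitespace_words(text):
    """Word-like count: a word is any character chain between whitespace."""
    return len(text.split())


def count_separator_words(text, skip_numerals=False):
    """Separator-based count: hyphens and apostrophes also split words.

    Assumption: with skip_numerals=True, tokens made only of digits,
    dots and commas are not counted as words.
    """
    tokens = [t for t in re.split(r"[\s\-']+", text) if t]
    if skip_numerals:
        tokens = [t for t in tokens if not re.fullmatch(r"[\d.,]+", t)]
    return len(tokens)


name = "1-Chloro-2,4-dinitrobenzene"
print(count_whitespace_words(name))                     # 1
print(count_separator_words(name))                      # 4
print(count_separator_words(name, skip_numerals=True))  # 2
```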

Nonetheless, IMO, unlike many Trados "features", it's not an intent to cheat, it's just a fundamental error in the memoQ design.

E.g., this kind of word definition makes memoQ barely usable for some types of jobs, such as segments containing chemical compound names like:
1-Chloro-2,4-dinitrobenzene
1-Chloro-3,4-dinitrobenzene
will not be recognized as similar by memoQ even if you lower the threshold to 10% (sic!, ten percent).
Of course, it will also screw up the match level for larger segments but it will be less visible.
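
A small Python sketch of why this can happen, assuming the similarity is computed over word tokens (an assumption made for illustration; memoQ's real matching algorithm is not documented in this thread). It uses the standard `difflib.SequenceMatcher`; the helper `token_similarity` is a made-up name.

```python
import re
from difflib import SequenceMatcher


def token_similarity(a, b, pattern):
    """Similarity ratio (0..1) over tokens obtained by splitting on `pattern`."""
    ta = [t for t in re.split(pattern, a) if t]
    tb = [t for t in re.split(pattern, b) if t]
    return SequenceMatcher(None, ta, tb).ratio()


A = "1-Chloro-2,4-dinitrobenzene"
B = "1-Chloro-3,4-dinitrobenzene"

# Whitespace-only tokens: each name is one "word", and the two words differ.
whitespace_only = token_similarity(A, B, r"\s+")

# Hyphens as separators: three of the four tokens are identical.
with_separators = token_similarity(A, B, r"[\s\-]+")

# Character-level comparison: a single character differs.
char_level = SequenceMatcher(None, A, B).ratio()

print(whitespace_only)       # 0.0
print(with_separators)       # 0.75
print(round(char_level, 2))  # 0.96
```

With whitespace-only tokens the two names share nothing, so no threshold, however low, would bring up the match.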

You have to keep an eye on the rates for hits between 60 and 70% for short segments.

Frankly speaking, almost everything below 70% should be considered (i.e. paid) as no match...
So why does Trados Studio artificially pump up the wordcount, i.e. why is its match rate usually approx. 30% higher (relative value) than the old Trados match rates?
E.g. when two words differ in a five-word sentence, the old Trados shows a 60% match, while the new one claims it's a 72 or 73% match, which is obviously absurd for sentences like "The Silence of the Lambs" and "The Voice of the Martyrs"...

In fact the example you give ("It rains" and "It happens") in Polish (and I believe in very many other languages too) is no match, as these would be two distinct sentences with nothing in common.

Yep, obviously.
E.g. in French it corresponds to "Il pleut" and "Ça arrive" etc.

It's important to keep this glitch in mind in these cost-saving times.

Most people don't care about algorithms but it's useful :)

Cheers
GG


 

Grzegorz Gryc  Identity Verified
Local time: 17:06
French to Polish
And tags... Jan 4, 2013

Rudolf Vedo CT wrote:

I've noticed that numbers and dates will often be treated as relatively high matches for other numbers or dates, or even each other, and even if the number of digits, formatting, separators, etc. is completely different.


The same with tags.
I didn't analyze it thoroughly, i.e. I'm unable to quantify it, but it seems memoQ follows in some way the Trados behaviour, where the weight of a numeral is twice the weight of a word.
Trados pollutes reason.

Cheers
GG


 

Dr. Matthias Schauen  Identity Verified
Germany
Local time: 17:06
Member
English to German
Apparently no improvement yet Feb 24, 2015

This is an update saying that Kilgray seem to have made no progress regarding these problems right in the very heart of their software. I see a similar behavior in memoQ 2014 R2:

DE: Summe
DE TM: Indian Summer
Match: 69%

EN: Meta-analysis (2)
EN TM: -5
Match: 73%

EN: Effector T Cell
EN TM: T:
Match: 65%

On the other hand, memoQ gives only a 90% match rate for a 33-word (168-character) sentence from the TM identical to the source segment except for one different symbol/letter and four different formatting tag pairs.

You can find forum discussions on the internet dating from 2011 where someone from Kilgray admits to the severity of this problem, saying that they have an "extreme bottleneck for any TM engine related development/bugfixing" and that they will try to fix this as soon as possible.
It is a pity that this hasn't happened yet. I was so hoping that I could get rid of this behavior when recently changing from another industry-leading CAT tool to memoQ.


 

big_fish  Identity Verified
Polish to English
TOPIC STARTER
over-reliance on technology Feb 24, 2015

I did not expect to revisit this discussion years after the initial post.
Even though translation aids have brought a new quality to the way we work, the over-reliance on technology has not brought any improvement in the quality of translation output.
The software does not make translators richer either.
Translators are cogs in a machine. Translation companies are supervisors. Customers watch their bills.
Who's happier because of this?


 

