matching rate
Thread poster: big_fish
big_fish  Identity Verified
Polish to English
Dec 14, 2012

Hello,
Thank you again for everyone's suggestions and help.
I'm working on a new job and new curiosities puzzle me.
There is a segment:
Laws and regulations
TM match:
and
the match rate: 64%
Horribly overrated to the naked eye, isn't it?
How come?
Thank you in advance for any suggestions.
Best regards,
Krzysztof



Grzegorz Gryc  Identity Verified
Local time: 07:50
French to Polish
The algorithm is faulty Jan 4, 2013

big_fish wrote:

Thank you again for everyone's suggestions and help.
I'm working on a new job and new curiosities puzzle me.
There is a segment:
Laws and regulations
TM match:
and
the match rate: 64%
Horribly overrated to the naked eye, isn't it?
How come?
Thank you in advance for any suggestions.


memoQ has serious matching problems for short segments.
E.g. when one word differs in a two-word sentence (e.g. "It rains" and "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.
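
For reference, the "simplest solution" can be sketched as a word-level edit distance normalized by the longer segment. This is only a minimal illustration under that assumption: `word_edit_distance` and `word_match_rate` are hypothetical helpers, and nothing here describes how memoQ or any other tool actually computes its match rates.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_match_rate(source, tm_segment):
    """Hypothetical naive match rate in percent, based on whole words only."""
    a, b = source.split(), tm_segment.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty segments are treated as identical
    return 100.0 * (1 - word_edit_distance(a, b) / longest)


print(word_match_rate("It rains", "It happens"))  # 50.0
```

With whole words as the unit, one changed word out of two gives 50%, which is the figure suggested above.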

It's a very old bug.
AFAIR I pointed it out two years ago.

Cheers
GG

[Edited at 2013-01-04 10:31 GMT]


big_fish  Identity Verified
Polish to English
TOPIC STARTER
Good to know for jobs with short segments and low match rates Jan 4, 2013

Grzegorz Gryc wrote:

E.g. when one word differs in a two-word sentence (e.g. "It rains" and "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.

It's a very old bug.
AFAIR I pointed it out two years ago.

Cheers
GG

[Edited at 2013-01-04 10:31 GMT]


Thank you Grzegorz!
It's important to know, especially as some jobs may get horribly underpaid this way.
You have to keep an eye on the rates for hits between 60 and 70% for short segments.
In fact the example you give ("It rains" and "It happens") in Polish (and I believe in very many other languages too) is no match, as these would be two distinct sentences with nothing in common.
It's important to keep this glitch in mind in these cost-saving times.
Best regards,

Krzysztof



LEXpert  Identity Verified
United States
Local time: 00:50
Member (2008)
Croatian to English
Dates and numbers Jan 4, 2013

I've noticed that numbers and dates will often be treated as relatively high matches for other numbers or dates, or even each other, and even if the number of digits, formatting, separators, etc. is completely different.


Grzegorz Gryc  Identity Verified
Local time: 07:50
French to Polish
Text recognition algorithms again... Jan 4, 2013

big_fish wrote:

Grzegorz Gryc wrote:

E.g. when one word differs in a two-word sentence (e.g. "It rains" and "It happens"), memoQ says the match rate is 65%, while the simplest solution, i.e. 50%, seems logical.
I.e. for some specific projects, such as parts lists, the analysis may be completely screwed up.

It's a very old bug.
AFAIR I pointed it out two years ago.


It's important to know, especially as some jobs may get horribly underpaid this way.

Generally, you should know one is very often underpaid according to the memoQ wordcount.
I.e. in your language pairs it's not a very big problem, the difference in word count is usually negligible, but e.g. for FR-PL it may reach approx. 15% by default.
The problem is the word definition in memoQ: it corresponds to a Word-like wordcount, i.e. a word is a character string between spaces (or equivalent), whereas most tools use some GMX-V-like word definition, i.e. word separators (apostrophes, dashes etc.) are taken into account as well.
E.g., for memoQ, 1-Chloro-2,4-dinitrobenzene is one word while most tools would show more, e.g. 4 words in Trados or 2 words in Déjà Vu (DVX doesn't count numerals as words).
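
The difference between the two word definitions can be illustrated with a small sketch. It assumes a deliberately simplified model: a "Word-like" count splits on whitespace only, while a "separator-based" count also splits on hyphens and apostrophes. The function names are made up for illustration, and the real tools' rules are certainly more elaborate.

```python
import re


def count_words_whitespace(text):
    """Word-like count: a word is any run of characters between whitespace."""
    return len(text.split())


def split_on_separators(text):
    """Simplified separator-based tokens: also split on hyphens and apostrophes."""
    return [tok for tok in re.split(r"[\s\-']+", text) if tok]


def count_words_separators(text, count_numerals=True):
    """Count separator-based tokens, optionally skipping purely numeric ones."""
    tokens = split_on_separators(text)
    if not count_numerals:
        # A token such as "1" or "2,4" has no letters and is treated as a numeral.
        tokens = [tok for tok in tokens if any(ch.isalpha() for ch in tok)]
    return len(tokens)


name = "1-Chloro-2,4-dinitrobenzene"
print(count_words_whitespace(name))                         # 1
print(count_words_separators(name))                         # 4
print(count_words_separators(name, count_numerals=False))   # 2
```

Under these assumptions the counts happen to line up with the figures quoted above (1, 4 and 2), but that does not prove the tools work this way internally.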

Nonetheless, IMO, unlike many Trados "features", it's not an attempt to cheat, it's just a fundamental error in the memoQ design.

E.g., this kind of word definition makes memoQ barely usable for some types of jobs, such as segments containing chemical compound names like:
1-Chloro-2,4-dinitrobenzene
1-Chloro-3,4-dinitrobenzene
will not be recognized as similar by memoQ even if you lower the threshold to 10% (sic!, ten percent).
Of course, it will also screw up the match level for larger segments but it will be less visible.
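
To see why the tokenization matters for matching, here is a rough sketch that compares the two names token by token using Python's `difflib.SequenceMatcher`. It is an illustration under stated assumptions (the two tokenizers and `token_similarity` are hypothetical), not a reconstruction of any tool's matching engine.

```python
import re
from difflib import SequenceMatcher


def whitespace_tokens(text):
    """Tokens as runs of characters between whitespace."""
    return text.split()


def hyphen_tokens(text):
    """Simplified alternative: also split on hyphens."""
    return [tok for tok in re.split(r"[\s\-]+", text) if tok]


def token_similarity(a, b, tokenize):
    """SequenceMatcher ratio (2 * matched tokens / total tokens) in percent."""
    return 100.0 * SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


first = "1-Chloro-2,4-dinitrobenzene"
second = "1-Chloro-3,4-dinitrobenzene"
print(token_similarity(first, second, whitespace_tokens))  # 0.0
print(token_similarity(first, second, hyphen_tokens))      # 75.0
```

When each name is a single token, the two segments share no token at all, so no threshold will help. Once the names are split on hyphens, three of the four tokens match.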

You have to keep an eye on the rates for hits between 60 and 70% for short segments.

Frankly speaking, almost everything below 70% should be considered (i.e. paid) as no match...
That's why Trados Studio artificially pumps up the wordcount, i.e. the match rate is usually approx. 30% higher (relative value) than the old Trados match rates.
E.g. when two words differ in a 5-word sentence, the old Trados shows a 60% match, while the new one claims it's a 72 or 73% match, which is obviously absurd for sentences like "The Silence of the Lambs" and "The Voice of the Martyrs"...

In fact the example you give ("It rains" and "It happens") in Polish (and I believe in very many other languages too) is no match, as these would be two distinct sentences with nothing in common.

Yep, obviously.
E.g. in French it corresponds to "Il pleut" and "Ça arrive" etc.

It's important to keep this glitch in mind in these cost-saving times.

Most people don't care about algorithms, but it's useful.

Cheers
GG



Grzegorz Gryc  Identity Verified
Local time: 07:50
French to Polish
And tags... Jan 4, 2013

Rudolf Vedo CT wrote:

I've noticed that numbers and dates will often be treated as relatively high matches for other numbers or dates, or even each other, and even if the number of digits, formatting, separators, etc. is completely different.


The same with tags.
I didn't analyze it thoroughly, i.e. I'm unable to quantify it, but it seems memoQ in some way follows the Trados behaviour, where a numeral carries twice the weight of a word.
Trados pollutes reason.

Cheers
GG



Dr. Matthias Schauen  Identity Verified
Germany
Local time: 07:50
Member
English to German
Apparently no improvement yet Feb 24, 2015

This is an update saying that Kilgray seem to have made no progress regarding these problems right in the very heart of their software. I see a similar behavior in memoQ 2014 R2:

DE: Summe
DE TM: Indian Summer
Match: 69%

EN: Meta-analysis (2)
EN TM: -5
Match: 73%

EN: Effector T Cell
EN TM: T:
Match: 65%

On the other hand, memoQ gives only a 90% match rate for a 33-word (168-character) sentence from the TM identical to the source segment except for one different symbol/letter and four different formatting tag pairs.

You can find forum discussions on the internet dating from 2011 where someone from Kilgray admits to the severity of this problem, saying that they have an "extreme bottleneck for any TM engine related development/bugfixing" and that they will try to fix this as soon as possible.
It is a pity that this hasn't happened yet. I was so hoping that I could get rid of this behavior when recently changing from another industry-leading CAT tool to memoQ.


big_fish  Identity Verified
Polish to English
TOPIC STARTER
over-reliance on technology Feb 24, 2015

I did not expect to revisit this discussion years after the initial post.
Even though translation aids have brought a new quality to the way we work, the over-reliance on technology has not brought any improvement in the quality of translation output.
The software does not make translators richer either.
Translators are cogs in a machine. Translation companies are supervisors. Customers watch their bills.
Who's happier because of this?



