How one word of three becomes a 70% match
Thread poster: Cristian Sălăjan

Cristian Sălăjan  Identity Verified
Romania
Local time: 11:42
French to Romanian
+ ...
Apr 28, 2016

Dear colleagues,

I have the following example:
Rechenbeispiel Teleskoparm eingefahren (Saved in the TM)
Daumenrad Teleskoparm einfahren (new entry that Trados says is a 70% match to the one above)

Any ideas as to why the match percentage is so high when, in theory, I would need to translate 2 words out of 3? Should the match not be something like 33%? (In practice, of course, it was quicker to rewrite the entire sentence.)

Is it normal for the match to be calculated on the similarity within a single word (e.g. einfahren vs. eingefahren), and should I really offer a discount for such useless matches?

Thank you for any insights.



[Edited at 2016-04-28 07:34 GMT]
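
For reference, the 33% expectation in the question corresponds to a plain word-level count: how many words of the new segment already appear verbatim in the TM segment. The sketch below is only an illustration of that reasoning; `word_overlap` is a hypothetical helper, not anything Trados exposes, and it is not claimed to be how Studio scores matches.

```python
def word_overlap(tm_segment: str, new_segment: str) -> float:
    """Share of words in new_segment that also occur verbatim in tm_segment.

    Illustrative word-level measure only (an assumption for this example),
    not the scoring used by any particular CAT tool.
    """
    tm_words = set(tm_segment.split())
    new_words = new_segment.split()
    if not new_words:
        return 0.0
    shared = sum(1 for word in new_words if word in tm_words)
    return shared / len(new_words)


tm_segment = "Rechenbeispiel Teleskoparm eingefahren"
new_segment = "Daumenrad Teleskoparm einfahren"

score = word_overlap(tm_segment, new_segment)
print(f"{score:.0%}")  # prints 33%
```

Only "Teleskoparm" is shared, so one word out of three gives 33%. A whole-word count treats "einfahren" and "eingefahren" as completely different.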


 

Erik Freitag  Identity Verified
Germany
Local time: 10:42
Member (2006)
Dutch to German
+ ...
Two words out of three Apr 28, 2016

Cristian Sălăjan wrote:

Dear colleagues,

I have the following example:

Rechenbeispiel teleskoparm aingefahren (Saved in the TM)
Daumenrad Teleskoparm einfahren (new entry that Trados says is a 70% match to the one above)

Any ideas as to why the match percentage is so high when in theory I would need to translate 2 words of 3? Should the match not be something like 33%? (In practice of course it was quicker to rewrite the entire sentence)...

Is it normal for the match to be calculated as a percentage of one single word e.g. einfahren vs engefahren and that I should offer a discount for such useless matches?)

Thank you for any insights..



Well, in your example, the algorithm isn't doing such a bad job after all, is it? "Teleskoparm" is the same word (only misspelt in one case), and although "aingefahren" is misspelt too, "einfahren" is closely related. So I think that evaluating this as a 70% match is not too far off the mark. Whether you save 70% of the time needed to translate this segment is an entirely different question, of course.

Generally speaking though, the fuzzy matching algorithm of Studio is indeed quite useless, certainly for short segments. There are a lot of revealing examples in other threads about this topic.

Bottom line: You can't and shouldn't offer discounts for fuzzy matches.



[Edited at 2016-04-28 07:30 GMT]


 

Roy Oestensen  Identity Verified
Norway
Local time: 10:42
Member (2010)
English to Norwegian (Bokmal)
+ ...
I believe it counts characters Apr 28, 2016

I suspect the fuzziness is calculated on the basis of how many characters the new segment shares with the segment in the TM, in which case you would get 70%.

But please note that it's quite customary to count anything less than 75% as no match, i.e. paid at the full (100%) rate. But I agree, the number doesn't make much sense in this case.

Roy
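
The character-based guess above can be tried out with Python's standard `difflib.SequenceMatcher`, which scores two strings by the characters they have in common. This is only a rough sketch under the assumption that some character-level comparison is involved; Studio's actual algorithm is not described in this thread, and `char_similarity` is a name made up for the example.

```python
from difflib import SequenceMatcher


def char_similarity(tm_segment: str, new_segment: str) -> float:
    """Character-level similarity: 2 * matched characters / total characters.

    Uses difflib's ratio as a stand-in; not the scoring of any CAT tool.
    """
    return SequenceMatcher(None, tm_segment, new_segment).ratio()


tm_segment = "Rechenbeispiel Teleskoparm eingefahren"
new_segment = "Daumenrad Teleskoparm einfahren"

ratio = char_similarity(tm_segment, new_segment)
print(f"{ratio:.0%}")  # prints 70%
```

Here 24 of the 69 characters in the two segments line up on each side (" Teleskoparm ein", "fahren" and "en"), giving 48/69, or roughly 70%. That is consistent with a character-based calculation but does not prove that this is what Trados does.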


 

LEXpert  Identity Verified
United States
Local time: 03:42
Member (2008)
Croatian to English
+ ...
All algorithms fall apart on short segments Apr 28, 2016

Unfortunately, all fuzzy matching algorithms fall apart with short segments. I've had them try to tell me that dates from different centuries, numbers of entirely different orders of magnitude that start with different digits, or segments of one, two or three words whose reformatting or tag insertion takes as much time as rewriting the segment, are 98% or 99% matches. CAT grids would probably be much fairer to the translator if a universal standard for a minimum fuzzy match length were agreed, e.g., anything below 3 words that isn't 100% or a repetition is automatically a no match.
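
A minimum-length rule of this kind, combined with the 75% cut-off mentioned earlier in the thread, could be written down roughly as follows. This is a hypothetical sketch: the function name, the band labels and the default values are assumptions for illustration, not an agreed industry standard or a feature of any CAT tool.

```python
def billable_band(match_percent: int, source_words: int,
                  is_repetition: bool = False,
                  min_words: int = 3, no_match_below: int = 75) -> str:
    """Assign a segment to a pricing band under the proposed rules.

    min_words and no_match_below are illustrative defaults taken from
    the suggestions in this thread.
    """
    if is_repetition:
        return "repetition"
    if match_percent == 100:
        return "100%"
    # Short segments and low fuzzies are charged at the full rate.
    if source_words < min_words or match_percent < no_match_below:
        return "no match"
    return "fuzzy"


print(billable_band(70, 3))   # the three-word example from this thread
print(billable_band(99, 2))   # two-word segment with a high fuzzy score
print(billable_band(85, 10))  # longer segment with a usable fuzzy score
```

Under these assumptions the 70% segment from the original question counts as a no match because it falls below the 75% cut-off, and a two-word "99%" match counts as a no match because of its length.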

 

SDL Community  Identity Verified
United Kingdom
Local time: 10:42
English
Maybe also note... Apr 28, 2016

... that in the current version it's not 70%. I was just curious (just copied source to target - no idea what this really translates to and it's a good example of where MT may or may not help. You still need to know!):



Regards

Paul
Why not try the new SDL Community


 

Cristian Sălăjan  Identity Verified
Romania
Local time: 11:42
French to Romanian
+ ...
TOPIC STARTER
I am learning my lesson here the hard way Apr 28, 2016

I think fuzzy match discounts should be avoided especially for German.
I also agree that short segments should be excluded, since their match scores are meaningless. I am curious, however, whether there are any statistical studies showing some kind of overall compensation that makes up for this imbalanced count.

I always say fuzzy discounts are misleading, but here I am agreeing to offer them for an interesting project and a first-time client... I will definitely have learned my lesson after this experience.


@Erik Freitag: spelling errors corrected


 

Cristian Sălăjan  Identity Verified
Romania
Local time: 11:42
French to Romanian
+ ...
TOPIC STARTER
because of misspelling Apr 28, 2016

SDL Community wrote:

... that in the current version it's not 70%. I was just curious (just copied source to target - no idea what this really translates to and it's a good example of where MT may or may not help. You still need to know!):



Regards

Paul
Why not try the new SDL Community


That's because there was a misspelling in the text I originally posted. I have corrected it now.


 

