How one word of three becomes a 70% match

Dear colleagues,

I have the following example:
Rechenbeispiel Teleskoparm eingefahren (Saved in the TM)
Daumenrad Teleskoparm einfahren (new entry that Trados says is a 70% match to the one above)

Any ideas as to why the match percentage is so high when in theory I would need to translate 2 words of 3? Should the match not be something like 33%? (In practice of course it was quicker to rewrite the entire sentence)...

Is it normal for the match to be calculated as a percentage of one single word (e.g. einfahren vs eingefahren), and should I offer a discount for such useless matches?

Thank you for any insights.

 Two words out of three Apr 28, 2016

Well, in your example, the algorithm isn't doing such a bad job after all, is it? "Teleskoparm" is similar (only misspelt in one case), and although "aingefahren" is misspelt too, "einfahren" is closely related. So after all, I think that evaluating this as a 70% match is not too far off the mark. Whether you save 70% of the time needed to translate this segment is an entirely different question, of course.

Generally speaking though, the fuzzy matching algorithm of Studio is indeed quite useless, certainly for short segments. There are a lot of revealing examples in other threads about this topic.

Bottom line: You can't and shouldn't offer discounts for fuzzy matches.

 I believe it counts characters Apr 28, 2016

I suspect the fuzziness is counted on the basis of how many characters are similar in the TM, in which case you would get 70%.
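To illustrate the difference between counting words and counting characters, here is a minimal sketch using Python's standard `difflib`. This is an assumption for illustration only: it is not Studio's actual algorithm, and the function names `word_similarity` and `char_similarity` are made up for this example.

```python
from difflib import SequenceMatcher

tm_segment = "Rechenbeispiel Teleskoparm eingefahren"  # stored in the TM
new_segment = "Daumenrad Teleskoparm einfahren"        # segment to translate


def word_similarity(a: str, b: str) -> float:
    """Similarity over whole words: only identical words count as matches."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def char_similarity(a: str, b: str) -> float:
    """Similarity over characters: shared substrings count as matches."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Note: this is difflib's ratio, not the scoring used by any CAT tool.
word_score = word_similarity(tm_segment, new_segment)
char_score = char_similarity(tm_segment, new_segment)

print(f"word-level:      {word_score:.0%}")
print(f"character-level: {char_score:.0%}")
```

With these two segments the word-level score is one word in three, while the character-level score comes out close to the 70% reported, because "Teleskoparm" and most of "eingefahren"/"einfahren" are shared character for character.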

But please note that it's quite customary to count anything less than 75% as no match, i.e. charged at the full (100%) rate. But I agree, the number doesn't make much sense in this case.

Roy

 All algorithms fall apart on short segments Apr 28, 2016

Unfortunately, all fuzzy matching algorithms fall apart with short segments. I've had them try to tell me that dates from different centuries, numbers of entirely different orders of magnitude that start with different digits, or segments of one to three words whose reformatting or tag insertion takes as much time as rewriting the segment, are 98% or 99% matches. CAT grids would probably be much fairer to the translator if a universal standard for a minimum fuzzy match length were agreed, e.g., anything below 3 words that isn't 100% or a repeat is automatically a no match.
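As a rough sketch of what such a rule could look like, combined with the customary 75% floor mentioned earlier in the thread: the function name `billable_match` and its parameters are hypothetical, and no CAT tool is claimed to work this way.

```python
def billable_match(score: int, word_count: int,
                   min_words: int = 3, floor: int = 75) -> int:
    """Return the match percentage to use for pricing, or 0 for 'no match'.

    Hypothetical rule: the thresholds are assumptions, not an agreed standard.
    """
    if score >= 100:
        return 100  # exact matches and repetitions are kept as they are
    if word_count < min_words:
        return 0    # too short for a fuzzy score to mean anything
    if score < floor:
        return 0    # below the customary fuzzy floor
    return score


print(billable_match(99, 2))   # short segment, treated as no match
print(billable_match(70, 3))   # the example from this thread, below the floor
print(billable_match(85, 12))  # ordinary fuzzy match, kept
```

Under these assumed thresholds, the 70% match from the original question would be paid at the full rate, as would any one- or two-word segment that is not an exact match.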

 Maybe also note... Apr 28, 2016

... that in the current version it's not 70%. I was just curious (just copied source to target - no idea what this really translates to and it's a good example of where MT may or may not help. You still need to know!):

Regards

Paul

 I am learning my lesson here the hard way Apr 28, 2016

I think fuzzy match discounts should especially be avoided for German.
I also agree that short sentences should be excluded since they are irrelevant. I am curious, however, whether there are any statistical studies showing some kind of overall compensation that makes up for this imbalanced count.

I always say fuzzy discounts are misleading, but here I am agreeing to offer them for an interesting project and a first-time client. I will definitely have learned my lesson after this experience.

 because of misspelling Apr 28, 2016

That's because there was a misspelling in the initial text I wrote. I have corrected it now.

