The skinny on fuzzy matches
Thread poster: Bernhard Sulzer

Bernhard Sulzer  Identity Verified
United States
Local time: 22:59
English to German
+ ...
Sep 18, 2015

I decided to open this thread because even though we're discussing this topic currently in a technical help forum, I believe it deserves a greater audience. The technical thread is about internal fuzzy matches, which prompted me to think: What's next? Rudimentary internal fuzzy matches?!


To be clear, I am very critical of the value of any fuzzy matches for the so-called leverage during translation. And if you aren't, that's your business. But hear me out.

An 85% match isn't always an 85% in the target text.

I am even more critical of discount schemes based on fuzzy matches.

Simple example of a fuzzy match, English>>German, approx. 85%: expected the fence to come down >> rechnete damit, dass der Zaun auf sie stürzen würde.

TM from previous project: Eloise had crashed into the fence and was hurled out of the car. She was pinned under the fence. She expected the fence to come down on her.
German: Sie rechnete damit, dass der Zaun auf sie stürzen würde.

New source text, approx. 85% match: Miriam had waited patiently at the border. She had expected the fence to come down. But no chance.
German: Sie hatte erwartet, dass der Zaun abmontiert werden würde.

As you can see, the two translations for expected the fence to come down are quite different.
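
To make the percentage concrete, here is a minimal sketch that scores the two source sentences and the two German translations against each other. It assumes a plain character-level similarity from Python's standard `difflib` module; real CAT tools use their own (often undisclosed) algorithms, so the function name `similarity_percent` and the numbers it prints are only illustrative.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    """Character-level similarity of two segments, as a percentage.

    Assumption: difflib's ratio stands in for a CAT tool's fuzzy score.
    """
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Segments taken from the example above.
old_source = "She expected the fence to come down on her."
new_source = "She had expected the fence to come down."
old_target = "Sie rechnete damit, dass der Zaun auf sie stürzen würde."
new_target = "Sie hatte erwartet, dass der Zaun abmontiert werden würde."

print(f"source similarity: {similarity_percent(old_source, new_source):.0f}%")
print(f"target similarity: {similarity_percent(old_target, new_target):.0f}%")
```

Under this measure the source sentences score noticeably higher than the translations do, which is the gap the example is about: the analysis only ever looks at the source side.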


Even though the value of fuzzy percentage matches is often very questionable, matches per se may hold some value and help the translator, but there's no guarantee. Too much confidence in matches, and too much use of them, can ruin a text that's supposed to be homogeneous and not a patchwork of "fixed fuzzy matches."

Certain agencies' practice of demanding arbitrary discounts per word no matter what text, no matter what algorithm calculated the match and no matter what the context or field is, is completely unacceptable to me.

And still, there are plenty of colleagues who can't wait to "be discounted."

So, what's really in a fuzzy match? What is its definition? Can you pin down the value to a standard 85% or whatever percentage across all fields? I don't think you can. Heisenberg comes to mind.

And why should you discount for it? A better or faster translation doesn't automatically equate to a discount. And the matches themselves don't "make" the translation. You have to look at them and decide if they're good or if you can use them and how, based on context and field. And what about matches in the source text only, with no previous TM? More likely a match? Maybe. But not necessarily.

If you simply accept an agency's demand to discount arbitrarily, solely based on some questionable segment analysis, you are simply being exploited.

I recommend reading these four other threads, most recent and ongoing one (from the technical forum) first:

http://www.proz.com/forum/cat_tools_technical_help/290012-internal_fuzzy_matching_wfp_34_x_trados_2011_x_memoq_62.html

http://www.proz.com/forum/business_issues/282463-a_large_agencys_new_pricing_structure_for_translations_utilizing_cat_tools.html

http://www.proz.com/forum/money_matters/262210-what_is_grid_for_fuzzy_matches.html

http://www.proz.com/forum/money_matters/285292-weighted_rates_are_these_ones_normal.html

My suggestion: don't let agencies dictate what you charge for your hard work, especially based on fuzzy and 100% matches! Why? There's no solid basis for it. The "value" of any percentage match for translation isn't easily or automatically pinned down.

[Edited at 2015-09-18 02:17 GMT] edited for typo.

[Edited at 2015-09-18 03:47 GMT]

[Edited at 2015-09-18 04:27 GMT]



Soonthon LUPKITARO(Ph.D.)  Identity Verified
Thailand
Local time: 10:59
Member (2004)
English to Thai
+ ...
Weaker freelance translators Sep 18, 2015

Bernhard Sulzer wrote:

I decided to open this thread because even though we're discussing this topic currently in a technical help forum, I believe it deserves a greater audience. The technical thread is about internal fuzzy matches, which prompted me to think: What's next? Rudimentary internal fuzzy matches?!


To be clear, I am very critical of the value of any fuzzy matches for the so-called leverage during translation. And if you aren't, that's your business. But hear me out.

An 85% match isn't always an 85% in the target text.

I am even more critical of discount schemes based on fuzzy matches.

If you simply accept an agency's demand to discount arbitrarily, solely based on some questionable segment analysis, you are simply being exploited.



[Edited at 2015-09-18 02:14 GMT]


Exploitation of translators is a permanent issue. The business question is how to make freelance translators stronger in negotiation. Collective bargaining is one possible way, but how would we organize it?
Fuzzy matches are quoted by agencies that are familiar with CAT tools, but many agencies apply CAT tools wrongly. For example, they accept prices quoted on the target text, which is not reasonable: the target text is the output of our work, so we never know its length beforehand.
Why not educate agencies on fair use of CAT tools?

Soonthon L.



Jenae Spry  Identity Verified
United States
Local time: 19:59
French to English
Different scale and it's not personal Sep 18, 2015

Your examples of fuzzy matches are valid and, to an extent, I agree with you. It's true that the percentage is often off. However, I notice you didn't mention that it's often off in the other direction as well. I have come across several fuzzy matches which are actually 100% matches but the source had a misspelling or missing period, which was fixed in the translation. The agency is thinking in averages over hundreds of thousands or, hopefully, millions of words and you're thinking by the word or project. As a result, it stands to reason that you come to a radically different conclusion on the idea of discounting for CAT analyses.

If a client provides me with a TM, and that TM makes my job easier and/or faster, then it does, in my opinion, stand to reason that they deserve a discount for providing that service. I am benefiting from the work they have acquired and had processed through their vendors and processes.

I'd like to also comment on the idea that this practice involves exploitation of translators. I think this viewpoint comes from not thinking like a scalable business, which makes sense because we, as freelancers, are not scalable businesses. The goal of a business is to provide a service or product and turn a profit. If the individuals running that business are not thinking about how to increase their profit margins as much as possible, then that business is unlikely to succeed. This brings me to another point -- it's not personal.

The agency is not out to harm you (ok...some are bad businesses, but let's not throw the baby out with the bathwater), they are trying to get the most bang for their buck. It's not personal. If increasing their profit margin through the use of tools and processes is exploitation, then by that definition, any business is exploiting the people it pays since any good business will have the goal of paying as little as possible to get what it needs without going below a given standard, ideally.

Smaller agencies might be in a position to hire better translators and pay higher prices, but this is only because they probably don't have the processes in place to deal with the results of going below that price and/or they have not yet scaled and thus been forced to dig deep into their pool of translators.

I'm aware that the foregoing may not be a popular opinion, but I think it's a realistic one. It's difficult for translators to understand how agencies work without having worked at one or run one but I honestly don't think they are the enemy.



Samuel Murray  Identity Verified
Netherlands
Local time: 04:59
Member (2006)
English to Afrikaans
+ ...
Text type and subject field, and a suggestion Sep 18, 2015

Bernhard Sulzer wrote:
Eloise had crashed into the fence and was hurled out of the car. She was pinned under the fence. She expected the fence to come down on her.

Miriam had waited patiently at the border. She had expected the fence to come down. But no chance.


Yes, how much value an 85% match has depends a great deal on the type of text and on the subject field. And on whether you're using a targeted TM or a big mama TM.

Looking at these two samples, I find it unlikely that you'll get many of these matches between the two texts. This means that although 10 words will have been discounted by 50%, most of the text would not be, and payment by word count is always a game of averages.
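
The "game of averages" can be put in numbers with a small sketch. The 10 words discounted by 50% come from the paragraph above; the total of 1,000 words and the band names `new` and `fuzzy` are assumptions made up for the illustration, not figures from any real job.

```python
def weighted_word_count(counts, weights):
    """Sum the word counts per match band, each scaled by its pay weight."""
    return sum(counts[band] * weights[band] for band in counts)


# Hypothetical job: 1,000 words, of which 10 fall in a fuzzy band paid at 50%.
counts = {"new": 990, "fuzzy": 10}
weights = {"new": 1.0, "fuzzy": 0.5}

payable = weighted_word_count(counts, weights)
total = sum(counts.values())
print(f"{payable:.0f} payable words out of {total}")
print(f"overall reduction: {100 * (1 - payable / total):.1f}%")
```

With these assumed numbers the one questionable match moves the total by half a percent; the picture changes, of course, when most of a file falls into discounted bands.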

Using this example is like trying to prove that word rates are a bad idea by showing an example with mostly very long words in it, or by trying to prove that machine text conversion is unreliable by showing an example with a metaphor or idiom in it.

Certain agencies' practice of demanding arbitrary discounts per word no matter what text, no matter what algorithm calculated the match and no matter what the context or field is, is completely unacceptable to me.


In many of your posts you talk about agencies "demanding" things or "forcing" translators to do things, but those words can only apply if the translator is desperate to work for that particular agency.

And still, there are plenty of colleagues who can't wait to "be discounted."


That may be because they make more money per hour when doing discounted work than when they insist on doing only work that involves no discounts. What matters is not how much money you get for the individual words, but how much money you make at the end of a day. If offering/accepting discounts leads to lower income, then don't be stupid, and stop it.

So, what's really in a fuzzy match? What is its definition? Can you pin down the value to a standard 85% or whatever percentage across all fields?


You're right that evaluating the fuzziness of a text on a sentence basis is risky. Perhaps CAT tool developers should come up with something called "context fuzzy matching", and offer three variants:

* "paragraph context fuzzy matching":
a fuzzy match is only regarded as a fuzzy match if the two sentences before and after it also contain at least one fuzzy match of at least the same match percentage (otherwise the match is either disregarded or penalised, at the user's option).
* "file context fuzzy matching":
a fuzzy match is only regarded as a fuzzy match if the rest of the file contains at least one other fuzzy match of at least the same match percentage. This would apply mostly to internal fuzzy matching or to texts that use highly targeted TMs.
* "segment context fuzzy matching":
a fuzzy match is only regarded as a fuzzy match if the overlapping phrase from that segment also occurs elsewhere in the file. (I'm not sure if this one is practical, since many fuzzy match systems now use distance systems, which are usually character-based and may not match full phrases.)

This would go some way to making fuzzy matching more relevant, I think.
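
The first variant, "paragraph context fuzzy matching", could be sketched roughly as follows. This is one possible reading of the proposal, not an existing CAT tool feature: the function name, the 75% fuzzy threshold and the idea of passing one best-match score per segment are all assumptions for illustration.

```python
def paragraph_context_filter(scores, window=2, threshold=75, penalty=None):
    """Keep a fuzzy match only if a nearby segment matches at least as well.

    scores    -- best TM match percentage per segment, in document order
    window    -- how many segments before and after count as context
    threshold -- lowest percentage treated as a fuzzy match (assumed value)
    penalty   -- None to disregard an isolated match (score becomes 0),
                 or a number of points to subtract from it instead
    """
    adjusted = []
    for i, score in enumerate(scores):
        is_fuzzy = threshold <= score < 100
        if not is_fuzzy:
            adjusted.append(score)  # no match or 100% match: left untouched
            continue
        start = max(0, i - window)
        neighbours = scores[start:i] + scores[i + 1:i + 1 + window]
        if any(n >= score for n in neighbours):
            adjusted.append(score)  # supported by its context
        elif penalty is None:
            adjusted.append(0)  # isolated match is disregarded
        else:
            adjusted.append(max(score - penalty, 0))  # isolated match is penalised
    return adjusted


segment_scores = [0, 85, 0, 0, 90, 88, 0]
print(paragraph_context_filter(segment_scores))
print(paragraph_context_filter(segment_scores, penalty=10))
```

Read literally, "at least the same match percentage" is strict: in the sample list the 90% segment is treated as isolated because its only matching neighbour scores 88%. A real implementation would probably want some tolerance there.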

But who are we kidding... only CafeTran is likely to implement this (and neither of us uses CafeTran).

If you simply accept an agency's demand to discount arbitrarily, solely based on some questionable segment analysis, you are simply being exploited.


Of course. But it is a mistake to believe that most translators who offer/accept discounts are doing so under extreme duress.

You also seem to think that "arbitrary" means something other than what the rest of us think (or are you, perhaps, deliberately applying the word as a makeshift type of hyperbole to help your argument sound more desperate?).

Samuel


[Edited at 2015-09-18 08:31 GMT]



Christine Andersen  Identity Verified
Denmark
Local time: 04:59
Member (2003)
Danish to English
+ ...
Don't assume you are a victim - negotiate assertively Sep 18, 2015

Some of my clients work out detailed rates based on different degrees of fuzziness - but I have reached the point where the TMs they provided, plus my own with earlier work for them, are a real help, as Jenae Spry points out.
I ALWAYS look at the overall fee they propose for each job, and sometimes adjust it. I may adjust it down, too - it happens!

I have simply set my agreed basic rate to allow for it.

Just because you, the translator, are an individual and by definition a 'small' company, that does not automatically make you open to exploitation.

Translators also hold strong cards in the game. Without their services, the agencies could not run a business either. Ignore the undercutting cheap-jacks and concentrate on why you are better.

There are other parameters to compete over besides price.

We should shift the focus - and in fact the best agencies do - to the various aspects of quality. On any market, buyers are willing to pay good money for a reliable product, tailored to their needs. They are perfectly aware that the cheapest cannot be the best as well.

The same applies to translation.

Instead of defensively assuming translators are being exploited, it is a far better marketing tactic to 'hold your head high' and let clients know what you have to offer. That you are not afraid of cheaper rivals, if it comes to the point.

Emphasise what you actually do, and propose your own rate. Don't mention words like cheap, discount, reduction. Go for clients who want a quality product.

My rate for new words is at least Euro 0.XX, and the rate for fuzzy matches is 0.YY.

Of course, they want value for money, but if you shift the focus to the value instead of haggling about deductions, you should be able to agree on a reasonable overall fee for the job.

Talk rates and quality up instead of down... While there still ARE good clients and translators around!


Texte Style
Local time: 04:59
French to English
fuzzy wuzzy Sep 18, 2015

Jenae Spry wrote:

Your examples of fuzzy matches are valid and, to an extent, I agree with you. It's true that the percentage is often off. However, I notice you didn't mention that it's often off in the other direction as well. I have come across several fuzzy matches which are actually 100% matches but the source had a misspelling or missing period, which was fixed in the translation.


OK, so sometimes that does happen. When it happens with a client of mine, more often than not the source has been corrected because I flagged the error to the client when doing the previous translation. I consider flagging source errors to be going above and beyond my brief to translate, so I really do deserve to be paid extra for having to deal with that segment yet again.

So what should really be a 100% is not and I waste time reading source and target, sussing out why it is not a 100%. Those pesky accents don't leap out and hit me square in my third eye after all, and even when the CAT tool is flagging it for me, it does so in the right-hand corner, my eyes only stray that way when I'm checking on the time.
I also inevitably run a mental check for any other issues in the segment (have I changed my mind on any of the terminology? am I sure I checked the spelling for that word? maybe if I switched the two parts of the sentence round I could make it sound more natural?)

As a result, I'm actually mulling over charging more for these segments.

I've just spent three days proofreading total garbage (PM only admitted it was machine translated when I asked her how a bilingual human being could possibly translate "coffre-fort" [a safe] as "case-strong"). I'm sure that I would have spent less time translating it from scratch, simply because of wasting time checking whether any of the syllables of garbage I was sent could possibly be salvaged.
(I sent a comparison file to show exactly how much had been changed, there wasn't a single sentence left intact in the entire 5-page file)


Jenae Spry wrote:

It's difficult for translators to understand how agencies work without having worked at one or run one but I honestly don't think they are the enemy.


I worked in two. One had a dishonest boss but honest employees who would make sure translators didn't get cheated by their boss. It was an upmarket agency and the PMs worked hard to keep their talented, well-paid translators happy, ensuring continuity of style for regular customers. I couldn't count the number of times a translator would say "OK. I'm only saying OK because it's you".

In the other, PMs were obliged to make sure the translator only got 50% of what the agency billed. When they couldn't find a translator with a low enough rate, they would tweak the wordcount to squeeze some free work out of the translator. They didn't bother to make out POs for small jobs, telling the translator that they were doing it for free for the customer, then billing the customer regardless. They didn't bother making out POs for in-house translators, who then got fired for not being productive enough. They would also routinely pass fuzzies off as "segments to be proofread" so that it would cost less. And the previous translations were often so abysmal I ended up completely redoing them so that the entire translation would read well, not just the bits I translated.



Samuel Murray  Identity Verified
Netherlands
Local time: 04:59
Member (2006)
English to Afrikaans
+ ...
Blame the CAT tool Sep 18, 2015

Texte Style wrote:
So what should really be a 100% is not and I waste time reading source and target, sussing out why it is not a 100%. Those pesky accents don't leap out and hit me square in my third eye after all, and even when the CAT tool is flagging it for me, it does so in the right-hand corner, my eyes only stray that way when I'm checking on the time.


It sounds to me like you're using a badly designed CAT tool.

Yes, if you can't see what the difference between the TM source and the current source text is, then those 95% matches will end up taking more time than 50% matches. This is why I don't readily accept fuzzy match discounts for files where the fuzzy match was automatically filled in, or where the client's specialised CAT tool doesn't show me the differences, because when the CAT tool only tells you "it's a 95% match", you end up spending *more* time, not less. Or, I adjust my rate upward.
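
What "showing the differences" amounts to can be sketched with a word-level comparison of the TM source and the new source. This is only an illustration built on Python's standard `difflib`; the function name `source_differences` and the sample sentences are made up, and actual CAT tools have their own diff displays.

```python
from difflib import SequenceMatcher


def source_differences(tm_source, new_source):
    """List the word-level changes between a TM source and a new source.

    Returns (operation, old words, new words) tuples for every span
    that is not identical in both segments.
    """
    old_words = tm_source.split()
    new_words = new_source.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


# Hypothetical segments: a single word differs.
for change in source_differences("Press the red button", "Press the green button"):
    print(change)
```

A tool that only reports "95%" leaves the translator to do this comparison by eye, which is where the extra time goes.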



Inese Poga-Smith  Identity Verified
Canada
Local time: 22:59
Latvian to English
+ ...
Discounts for matches is advantage taking Sep 18, 2015

Discounts for matches certainly work to some extent in English. When it comes to some other languages, they are pure exploitation. I work mostly from Latvian into English and from English into Latvian. Latvian is one of those languages where you cannot take a word out of a sentence and simply replace it with another: the entire structure changes, and all the endings of verbs, pronouns, nouns and adjectives, as well as the word order, are different once we change one element in a sentence. I'm not even mentioning the so-called previously accepted TMs. Language cannot be viewed as something static that remains the same for decades. First of all, so many new terms are developed and accepted for devices which we never had before, and when it comes to my field, which is medical texts, the number of new discoveries, new approaches, new techniques, etc. is just mind-blowing. New terms usually come from English, and it takes us a while to find an equivalent in a language like Latvian. That does not happen overnight, or even within months. Creating a usable term that suits all aspects of its use can be tricky. Therefore, the term we used a year ago may no longer be good when we do the next translation. As you may know, medical texts have to comply with lots of requirements, regulations, rules, provisions and restrictions. Using the same medical TM after 2 years means rewriting it to a large extent, sometimes completely.
Poor agencies, indeed! I still remember those ancient times when we had a rate, a text, requirements and a deadline. Nobody cared how I did it as long as the translation quality met their standards or demands. Besides, the same agency that paid me 0.16 USD per word in 2006 is asking me to do medical jobs for 0.05 USD per word in 2015, plus they'd love me to use some old TM so that they could discount even more for something which will never be a match in Latvian.
As we see in the support forums, translators run into some kind of problem with CAT tools very frequently. I sometimes catch myself thinking: I'd have done the job already if it weren't for this terrible specific CAT tool they require. I am 100% sure that anything which is not a 100% repetition of the same text within the same job MUST be viewed as new text. I certainly do not accept jobs which go like this: new words 1200; 100% matches 4500; fuzzy matches 2600; and the total pay comes down to the new words plus some 5 bucks for all the other work.
It is total advantage-taking of translators. Translators should always have a choice: use some CAT tool, not use any, or use their own preferred one, because at present one needs to be proficient with at least 3 different CAT tools to be able to handle all kinds of work. Translators should not be pressured into agreeing to use any translation tools in order to boost companies' competitiveness and profits.
Therefore, I'm sorry, I cannot agree that some people use bad CAT tools, or that some people are not educated enough to use them as such, or that they don't know how to handle matches, etc. I would love for this to be left to the translator. REALITY is quite different: if you don't agree to discounts for matches, you do not get the job. If you do not feel like using an old TM, you do not get the job. There are only rare occasions when the translator is not asked to give any discounts for matches, and that is when there isn't a single match.



Bernhard Sulzer  Identity Verified
United States
Local time: 22:59
English to German
+ ...
TOPIC STARTER
Regarding exploitation Sep 18, 2015

Christine Andersen wrote:

Instead of defensively assuming translators are being exploited, it is a far better marketing tactic to 'hold your head high' and let clients know what you have to offer. That you are not afraid of cheaper rivals, if it comes to the point.


They are being exploited every day. Many don't know it. This is for them.


Texte Style
Local time: 04:59
French to English
Blame the CAT tool Sep 18, 2015

Samuel Murray wrote:

Texte Style wrote:
So what should really be a 100% is not and I waste time reading source and target, sussing out why it is not a 100%. Those pesky accents don't leap out and hit me square in my third eye after all, and even when the CAT tool is flagging it for me, it does so in the right-hand corner, my eyes only stray that way when I'm checking on the time.


It sounds to me like you're using a badly designed CAT tool.

Yes, if you can't see what the difference between the TM source and the current source text is, then those 95% matches will end up taking more time than 50% matches. This is why I don't readily accept fuzzy match discounts for files where the fuzzy match was automatically filled in, or where the client's specialised CAT tool doesn't show me the differences, because when the CAT tool only tells you "it's a 95% match", you end up spending *more* time, not less. Or, I adjust my rate upward.



Blame the CAT tool? I certainly do!



Thayenga  Identity Verified
Germany
Local time: 04:59
Member (2009)
English to German
+ ...
The crux with matches Sep 18, 2015

German is a rather complex language that oftentimes requires completely different words to get the same meaning across. One example is the word engagement. The agency might think it's a 100% match, repeated several times throughout a text, and therefore demands (are they my service providers?) a 100% discount = 0.00 of any given currency. Yet off the top of my head I can come up with 12 different translations for this "one and the same" word in German.

For example, the word engagement appears 20 times within a text. Does the agency expect me to use whatever term comes up first regardless of its different meanings within the context? Or do they, by any chance, expect me to use the correct translation within the sentence/context? Of course without paying me for it. They want a flawless translation, but refuse to pay for the actual work. I've often said that one way to teach them the true value of "fuzzy and/or 100% matches" would be to do as described above: just take the first term that pops up and leave it in place. Chances are that one of the translations of the word engagement might be correct, or perhaps none at all.

If the client provides an up-to-date and flawless TM, then discounts can be granted (solely at the discretion of the actual service provider, the translator!), provided that the sentence and grammatical structures require no changes. If the client demands a specific CAT tool, usually Trados, then the question is: who is to pay for that particular CAT tool (the only one acceptable to the client)? Just ask an agency to pay for your CAT tool in order for them to force discounts on you, and all you can hope for is that they don't reply.

In short, the translator is supposed to buy any given CAT tool (perhaps even more than one) an agency requires out of his/her own pocket, only to then be faced with discount demands from the client for, at times, illogical fuzzy matches, and also be expected to restructure the sentences, find the correct term, make all necessary grammatical adjustments, and all this for free or, at best, for peanuts.

Based on this, my policy as a service provider is simply that any discounts are to be granted at my sole discretion. I fully understand the agency's need to make a living and their desire to maximize their profits. Therefore, I hope that the agency can understand my need to make a living and my desire to make some profit.

[Edited at 2015-09-18 18:26 GMT]



