A statistically true CAT analysis
Thread poster: IanDhu

IanDhu  Identity Verified
France
Local time: 07:04
Member (2005)
French to English
Apr 22, 2013

This is intended as a tool for assessing the fairness of a client's analysis.

Principle: take the arithmetic mean of the lower and upper limits of a fuzzy-match band and divide it by 100, so that it is a ratio rather than a percentage (e.g. (99 + 95) / 2 / 100 = 0.97).
Subtract this from 1 and use the result as the weighting for that band: 1 - 0.97 = 0.03.


Match type       Weighting
Context TM       0
Repetitions      0.1
100%             0
95-99%           0.03
85-94%           0.105
75-84%           0.205
50-74%           0.38
No match         1
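The principle can be checked against the fuzzy-match rows of the table with a short Python sketch. The function name `band_weighting` is illustrative only. The Context TM, Repetitions and 100% rows are set directly in the table rather than derived from the formula, so they are not computed here.

```python
def band_weighting(low: int, high: int) -> float:
    """Weighting for a fuzzy-match band: 1 minus the band's mid-point
    expressed as a ratio (mid-point in percent divided by 100)."""
    midpoint_ratio = (low + high) / 2 / 100
    # Round to 3 decimals to hide floating-point noise such as 0.10499999...
    return round(1 - midpoint_ratio, 3)


# Fuzzy-match bands from the table (lower and upper limit, in percent)
bands = [(95, 99), (85, 94), (75, 84), (50, 74)]

for low, high in bands:
    print(f"{low}-{high}%: {band_weighting(low, high)}")
```

Each printed value should match the corresponding row of the table.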

Compare the weighted word count (WWC) obtained from these weightings (the sum, over all bands, of each band's weighting times its word count) with the WWC in your client's analysis.
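As a sketch of that calculation in Python: the weightings below are copied from the table, while the word counts are a made-up analysis for illustration only, and `weighted_word_count` is a hypothetical helper name.

```python
# Weightings from the table (match type -> weighting)
WEIGHTINGS = {
    "Context TM": 0.0,
    "Repetitions": 0.1,
    "100%": 0.0,
    "95-99%": 0.03,
    "85-94%": 0.105,
    "75-84%": 0.205,
    "50-74%": 0.38,
    "No Match": 1.0,
}


def weighted_word_count(word_counts: dict[str, int], weightings: dict[str, float]) -> float:
    """Sum of (weighting x word count) over every band in the analysis."""
    return sum(weightings[band] * count for band, count in word_counts.items())


# Hypothetical CAT analysis (word counts per band), for illustration only
analysis = {
    "Context TM": 500,
    "Repetitions": 1000,
    "100%": 800,
    "95-99%": 600,
    "85-94%": 400,
    "75-84%": 300,
    "50-74%": 200,
    "No Match": 3000,
}

print(round(weighted_word_count(analysis, WEIGHTINGS), 1))
```

For this invented analysis the benchmark WWC works out at 3297.5 words, which is the figure to set against the WWC on the client's purchase order.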

For any difference in the client's favour of more than 10%, renegotiation should be considered.

A 15% difference or more is excessive.
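The two thresholds can be expressed as a small Python check. This is only a sketch under two assumptions not stated in the post: the difference is measured relative to the benchmark WWC, and only a shortfall (client's WWC lower than the benchmark) counts as being in the client's favour. The function name is hypothetical.

```python
def assess_client_wwc(benchmark_wwc: float, client_wwc: float) -> str:
    """Classify a client's WWC against a benchmark WWC.

    Assumption: the gap is the client's shortfall as a fraction of the
    benchmark; a client WWC above the benchmark is never flagged.
    """
    shortfall = (benchmark_wwc - client_wwc) / benchmark_wwc
    if shortfall >= 0.15:
        return "excessive"  # 15% or more in the client's favour
    if shortfall > 0.10:
        return "consider renegotiation"  # more than 10%
    return "acceptable"


print(assess_client_wwc(1000.0, 880.0))
```

Here a client WWC of 880 against a benchmark of 1000 is a 12% shortfall, so it falls in the "consider renegotiation" range.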

I have a table implementing these weightings, if required.

I hope this may help.

With kind regards,

Adam Warren (IanDhu - 41189)



Mikhail Kropotov  Identity Verified
Russian Federation
Local time: 09:04
Member (2005)
English to Russian
Why so low? Apr 22, 2013

You severely underestimate the effort required to translate fuzzy matches. Your weights are way, WAY too low. My scheme is as follows:

Context TM       0
Repetitions      0.3
100%             0.3
95-99%           0.5
85-94%           0.75
75-84%           0.75
50-74%           1.0
No match         1.0
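To see how far apart the two schemes are in practice, here is a Python sketch that applies both to the same analysis. The word counts are an invented 10,000-word project, not real data; the helper name `wwc` is hypothetical.

```python
# The two weighting schemes posted in this thread (match type -> weighting)
IANDHU = {
    "Context TM": 0.0, "Repetitions": 0.1, "100%": 0.0, "95-99%": 0.03,
    "85-94%": 0.105, "75-84%": 0.205, "50-74%": 0.38, "No Match": 1.0,
}
KROPOTOV = {
    "Context TM": 0.0, "Repetitions": 0.3, "100%": 0.3, "95-99%": 0.5,
    "85-94%": 0.75, "75-84%": 0.75, "50-74%": 1.0, "No Match": 1.0,
}


def wwc(word_counts: dict[str, int], weightings: dict[str, float]) -> float:
    """Weighted word count: sum of weighting x word count over all bands."""
    return sum(weightings[band] * count for band, count in word_counts.items())


# Hypothetical 10,000-word analysis, for illustration only
analysis = {
    "Context TM": 0, "Repetitions": 1000, "100%": 2000, "95-99%": 1500,
    "85-94%": 1000, "75-84%": 1000, "50-74%": 500, "No Match": 3000,
}

for name, scheme in (("IanDhu", IANDHU), ("Mikhail Kropotov", KROPOTOV)):
    print(f"{name}: {wwc(analysis, scheme):.0f} weighted words")
```

On this made-up analysis the first scheme gives 3645 weighted words and the second 6650, a gap far larger than the 10-15% tolerance proposed in the opening post.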



IanDhu  Identity Verified
France
Local time: 07:04
Member (2005)
French to English
TOPIC STARTER
No agency client of mine is quite as generous as your weightings Apr 22, 2013

And there is even one agency (and one of its subsidiaries) that pays nothing at all for repetitions (weighting = 0).
My matrix is not meant to be "profitable"; it seeks to be neutral, a benchmark for testing agencies' weighted word counts.
Is there any factor specific to Russian/English language pairs that would warrant such high weightings?

With kind regards,

Adam Warren (IanDhu - 41189)



Katalin Horváth McClure  Identity Verified
United States
Local time: 01:04
Member (2002)
English to Hungarian
Way too low Apr 22, 2013

I don't think it is language-dependent, at least not at the figures you proposed.
Do you really think you need 0 time to review 100% matches from the TM?
Don't you need to at least proofread them to make sure they fit the current context?
Do you really think that 3% of the time is enough to review and edit a 95-99% fuzzy match?
Again, even proofreading takes longer.

***Let me put it into perspective. Let's say you get 2500 words of text (10 standard pages) that comes up as 95-99% matches. Do you think it is a fair estimate to say that finishing up the translation of this text (reviewing, adding what is missing, editing the surroundings, etc.) takes the same amount of time as translating 75 words? (3% of 2500 is 75.) Translating 75 words takes about 15 minutes (assuming someone translates 300 words per hour).***
Can you translate 10 pages of 95-99% fuzzy matched text in 15 minutes?

(P.S.: the paragraph between the *** marks is 79 words long.)
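The arithmetic behind this objection can be written as a tiny Python function: a band weighting, applied to a word count and a translation speed, implies how many minutes the client is actually paying for. The function name is illustrative only, and 300 words per hour is the same assumption as in the post.

```python
def equivalent_minutes(words: int, weighting: float, words_per_hour: int) -> float:
    """Minutes of new-translation time that a weighting implicitly pays for."""
    equivalent_words = words * weighting          # e.g. 2500 x 0.03 = 75 words
    return equivalent_words / words_per_hour * 60  # 75 words at 300 words/hour


print(equivalent_minutes(2500, 0.03, 300))
```

With the 0.03 weighting this gives the 15 minutes quoted above; with Mikhail Kropotov's 0.5 for the same band it gives roughly 250 minutes.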



Sheila Wilson  Identity Verified
Spain
Local time: 06:04
Member (2007)
English
Suicidal, IMO Apr 22, 2013

I'm not too good with maths, but I personally would never give any discount at all for 50-74% fuzzy matches. I'd like to make a surcharge for them, actually, as they can look editable, so you start off editing, then find out it would have been quicker to have just deleted them - so editing can take longer in the end.

I charge repetitions and 100% matches at my proofreading rate, no lower.



Katalin Horváth McClure  Identity Verified
United States
Local time: 01:04
Member (2002)
English to Hungarian
Agreed Apr 22, 2013

Sheila Wilson wrote:

I'm not too good with maths, but I personally would never give any discount at all for 50-74% fuzzy matches. I'd like to make a surcharge for them, actually, as they can look editable, so you start off editing, then find out it would have been quicker to have just deleted them - so editing can take longer in the end.

I charge repetitions and 100% matches at my proofreading rate, no lower.



Steven Segaert  Identity Verified
Estonia
Local time: 08:04
Member (2012)
English to Dutch
Only internal repetitions Apr 23, 2013

I personally only give discounts for 101% segment (sentence) repetitions that are internal to the text or the project (which I usually exclude from the count), or for 95%+ matches from a TM I have created myself (to which I apply a proofreading rate).

One big exception is copy that consists of short phrases or single words. I don't apply discounts on these, because the context can change things dramatically.

In any other case, agencies using a scheme will propose a price on that basis. I assess that price in view of the time I think I need for the translation as a whole, and then I either agree or disagree with that price.

I once spent over an hour counting and re-counting files for a job that I didn't win in the end. That wasn't very productive, so I won't repeat it.

In short: a CAT scheme is a good basis to start talking about effort and price, and otherwise only makes sense if you are updating a set of already proofed materials.



Samuel Murray  Identity Verified
Netherlands
Local time: 07:04
Member (2006)
English to Afrikaans
Don't mix two percentages Apr 23, 2013

IanDhu wrote:
This is intended as a tool for assessing the fairness of a client's analysis.

... is the weighting for that band.


Sorry, but this is actually based on a classic beginners' mistake (made by both beginner clients and beginner translators). The mistake is to assume that there is a straight-line relation between the percentage of a fuzzy match and the [inverse] percentage of money to be paid for it. You'll see your error once you keep in mind that the amount of money relates to the amount of time required for that segment.

The scheme shown by Mikhail Kropotov may seem mathematically unsound but it is actually a fairly accurate reflection of the amount of time required for the translation.

In fact, matches from the band 50-70% are often so useless that they require as much time or even more time than it would take to translate a 0% match (unless you have very, very long sentences).

In terms of your system, translating a 95% match should take about 3% as long as translating a 0% match. I'm afraid that that is often not the case -- have you timed yourself to see if you can actually accomplish that?

If you want to fine-tune your maths, you should also not assume that a 100% match represents the lower boundary of time to translate. A 100% match does not take 0% time to translate. Most translators will want to review the 100% match to ensure that it is compatible with the surrounding sentences, and so a 100% match will require at least a proofreading amount of time (let's say, for argument, 20% as long as a 0% translation). It is not greed that makes the translator grade a 100% match as a 0.2 instead of a 0.0, but necessity.



[Edited at 2013-04-23 08:25 GMT]



Samuel Murray  Identity Verified
Netherlands
Local time: 07:04
Member (2006)
English to Afrikaans
Please explain your reasoning for repetitions Apr 23, 2013

IanDhu wrote:
Match type       Weighting
Context TM       0
Repetitions      0.1
100%             0


Why do you pay 0.1 for repetitions but 0.0 for 100% matches? Shouldn't it be the other way round? In normal paragraph text, a repetition requires less work than a 100% match.



IanDhu  Identity Verified
France
Local time: 07:04
Member (2005)
French to English
TOPIC STARTER
The system is workable, but I concede the figures need tuning. Apr 23, 2013

In the light of your remarks, I concede that the figures need tuning, but the principle, with the inclusion of the time required, should be workable. As I said earlier, the framework is all right, but the figures are under discussion.

By the way, what CAT package are people using? Studio is far more helpful than Trados 2007, even using TagEditor.

With kind regards,

Adam Warren (IanDhu - 41189)


[Edited at 2013-04-23 09:22 GMT]



Samuel Murray  Identity Verified
Netherlands
Local time: 07:04
Member (2006)
English to Afrikaans
Change your upper and lower figures a bit Apr 23, 2013

IanDhu wrote:
In the light of your remarks, I concede that the figures need tuning, but the principle, with the inclusion of the time required, should be workable.


Well, the whole point of your post is how to calculate the figures, am I right? I think your method shows potential, in the sense that it attempts to calculate rates regardless of the tool's division of match categories.

To make this work, and if you're going to use traditional analyses that do not take segment length into account, you should not assume that 0.0 is the lowest figure and 1.0 is the highest figure. Instead, 0.2 should be the lowest figure (e.g. for 100% matches and repetitions) and 2.0 should be the highest figure (for 50-75% matches, if the client does not want the translator to ignore such matches). If the client does not require the translator to take matches below 75% into account, then 0-75% can be charged at a figure of 1.0.

In a sense, you're already lost because you're using the traditional CAT tool analysis as your basis. The fuzzy match percentage is only one aspect of speed, and contrary to what some people might think, it is not even a very important aspect of it. For example, one translator might excel at short segments whereas another might find longer sentences faster to translate. Yet for some reason the traditional CAT tool analysis does not include information about segment length. Another thing that the traditional analyses do not include is the degree to which segments are affected by surrounding text (but I don't think you would be able to measure that anyway). These things affect how fast a translator can do the work.

