A statistically true CAT analysis

IanDhu
France
Member (2005)
French to English
 Apr 22, 2013

This is intended as a tool for assessing the fairness of a client's analysis.

Principle: take the arithmetic mean of the fuzzy-match band's bounds and divide it by 100, so that it is expressed as a ratio rather than a percentage (e.g. (99 + 95) / 2 / 100 = 0.97).
Subtract this from 1 and use the result as the weighting for the relevant band: 1 - 0.97 = 0.03.
0.03 is thus the weighting for the 95-99% band.

Match type          Weighting
Context TM          0
Repetitions         0.1
100%                0
95-99%              0.03
85-94%              0.105
75-84%              0.205
50-74%              0.38
No Match            1
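As a rough sketch, the midpoint rule above can be written as a small Python function. The name `band_weighting` is hypothetical, and the function only covers the fuzzy bands; the Context TM, Repetitions, 100% and No Match values are fixed entries in the table rather than outputs of the formula.

```python
def band_weighting(low: int, high: int) -> float:
    """Weighting for a fuzzy-match band under the midpoint rule.

    The band's midpoint (a percentage) is turned into a ratio and
    subtracted from 1, e.g. 95-99% -> 1 - 0.97 = 0.03.
    """
    midpoint_ratio = (low + high) / 2 / 100
    # Round to 3 decimals to hide floating-point noise such as 0.030000000000000027.
    return round(1 - midpoint_ratio, 3)


# The four fuzzy bands from the table.
for low, high in [(95, 99), (85, 94), (75, 84), (50, 74)]:
    print(f"{low}-{high}%: {band_weighting(low, high)}")
```

The printed values should match the 0.03, 0.105, 0.205 and 0.38 entries in the table.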

Compare the weighted word count obtained from this table (the sum, over all bands, of each band's weighting times its word count) with the weighted word count (WWC) in your client's analysis.

For any difference in the client's favour of more than 10%, renegotiation should be considered.

A 15% difference or more is excessive.
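A minimal sketch of that comparison in Python, assuming the difference is measured relative to the benchmark weighted word count (the post does not say which figure the percentage is taken from). The dictionary keys, function names and sample word counts are hypothetical.

```python
# Benchmark weightings taken from the table above.
BENCHMARK = {
    "Context TM": 0.0,
    "Repetitions": 0.1,
    "100%": 0.0,
    "95-99%": 0.03,
    "85-94%": 0.105,
    "75-84%": 0.205,
    "50-74%": 0.38,
    "No Match": 1.0,
}


def weighted_word_count(counts: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of each band's weighting times its word count."""
    return sum(weights[band] * words for band, words in counts.items())


def assess(client_wwc: float, benchmark_wwc: float) -> str:
    """Classify how far the client's WWC falls below the benchmark.

    Assumption: the difference is expressed as a fraction of the benchmark WWC.
    """
    shortfall = (benchmark_wwc - client_wwc) / benchmark_wwc
    if shortfall >= 0.15:
        return "excessive"
    if shortfall > 0.10:
        return "consider renegotiation"
    return "acceptable"


# Hypothetical analysis: word counts per band.
counts = {"Repetitions": 500, "95-99%": 1000, "No Match": 2000}
benchmark_wwc = weighted_word_count(counts, BENCHMARK)  # 50 + 30 + 2000
print(benchmark_wwc, assess(1800, benchmark_wwc))
```

The 10% and 15% thresholds are the ones proposed above; a client WWC of 1800 against a benchmark of 2080 is a shortfall of about 13.5%.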

I have a table implementing these weightings, if required.

I hope this may help.

With kind regards,

Mikhail Kropotov
Russian Federation
Member (2005)
English to Russian
 Why so low? Apr 22, 2013

You severely underestimate the effort required to translate fuzzy matches. Your weights are way, WAY too low. My scheme is as follows:

Context TM          0
Repetitions         0.3
100%                0.3
95-99%              0.5
85-94%              0.75
75-84%              0.75
50-74%              1.0
No Match            1.0
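To see how far apart the two proposals are, both sets of weightings can be applied to the same analysis. The Python sketch below does this; the word counts are invented purely for illustration and the names are hypothetical.

```python
# Weightings proposed in the opening post and in the reply above.
IANDHU = {
    "Context TM": 0.0, "Repetitions": 0.1, "100%": 0.0, "95-99%": 0.03,
    "85-94%": 0.105, "75-84%": 0.205, "50-74%": 0.38, "No Match": 1.0,
}
KROPOTOV = {
    "Context TM": 0.0, "Repetitions": 0.3, "100%": 0.3, "95-99%": 0.5,
    "85-94%": 0.75, "75-84%": 0.75, "50-74%": 1.0, "No Match": 1.0,
}

# Hypothetical analysis of a 6,000-word job.
analysis = {
    "Context TM": 0, "Repetitions": 400, "100%": 600, "95-99%": 1000,
    "85-94%": 800, "75-84%": 500, "50-74%": 700, "No Match": 2000,
}


def weighted_word_count(counts: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of each band's weighting times its word count."""
    return sum(weights[band] * words for band, words in counts.items())


total = sum(analysis.values())
for name, scheme in [("IanDhu", IANDHU), ("Kropotov", KROPOTOV)]:
    wwc = weighted_word_count(analysis, scheme)
    print(f"{name}: {wwc:.1f} weighted words ({wwc / total:.0%} of {total})")
```

With these made-up counts the first scheme yields 2522.5 weighted words and the second 4475, so the choice of weightings matters far more than the arithmetic.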

IanDhu
France
Member (2005)
French to English
TOPIC STARTER
 No agency client of mine is quite as generous as your weightings Apr 22, 2013

And there is even one agency with one subsidiary that doesn't give any credit (weighting = 0) for repetitions.
My matrix is not meant to be "profitable"; it seeks to be neutral, a benchmark for testing agency weighted word counts.
Is there any factor in the Russian/English set of language pairs that would warrant such high weightings?

With kind regards,

Katalin Horváth McClure
United States
Member (2002)
English to Hungarian
 Way too low Apr 22, 2013

I don't think it is language-dependent, at least not at the figures you proposed.
Do you really think you need 0 time to review 100% matches from the TM?
Don't you need to at least proofread them to make sure they fit the current context?
Do you really think that 3% of time is enough to review and edit a 95-99% fuzzy match?

***Let me put it into perspective. Let's say you get 2500 words of text (10 standard pages) that comes up as 95-99% match. Do you think it is a fair estimate to say that finishing up the translation of this text (reviewing, adding what is missing, editing the surroundings, etc.) takes the same amount of time as translating 75 words? (3% of 2500 is 75.) Translating 75 words takes about 15 minutes (assuming someone translates 300 words per hour).***
Can you translate 10 pages of 95-99% fuzzy matched text in 15 minutes?
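The arithmetic in the paragraph above can be checked with a few lines of Python. The function name is hypothetical; the 300 words-per-hour speed is the assumption stated in the post.

```python
def equivalent_minutes(words: int, weighting: float, words_per_hour: float = 300) -> float:
    """Minutes of new-translation time that a band weighting implicitly pays for."""
    equivalent_words = words * weighting  # e.g. 3% of 2500 words
    return equivalent_words / words_per_hour * 60


# 10 standard pages (2500 words) of 95-99% matches at a weighting of 0.03.
minutes = equivalent_minutes(2500, 0.03)
print(f"{minutes:.0f} minutes for 2500 words")
```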

(P.S.: the paragraph between the *** marks is 79 words.)

Sheila Wilson
Spain
Member (2007)
English
 Suicidal, IMO Apr 22, 2013

I'm not too good with maths, but I personally would never give any discount at all for 50-74% fuzzy matches. I'd like to make a surcharge for them, actually, as they can look editable, so you start off editing, then find out it would have been quicker to have just deleted them - so editing can take longer in the end.

I charge repetitions and 100% matches at my proofreading rate, no lower.

Katalin Horváth McClure
United States
Member (2002)
English to Hungarian
 Agreed Apr 22, 2013

Sheila Wilson wrote:

I'm not too good with maths, but I personally would never give any discount at all for 50-74% fuzzy matches. I'd like to make a surcharge for them, actually, as they can look editable, so you start off editing, then find out it would have been quicker to have just deleted them - so editing can take longer in the end.

I charge repetitions and 100% matches at my proofreading rate, no lower.

Steven Segaert
Estonia
Member (2012)
English to Dutch
 Only internal repetitions Apr 23, 2013

I personally only give discounts for 101% segment (sentence) repetitions that are internal to the text or the project (which I usually exclude from the count), or for 95%+ matches coming from a TM I have created myself (for which I apply a proofreading rate).

One big exception is copy that consists of short phrases or single words. I don't apply discounts on these, because the context can change things dramatically.

In any other case, agencies using a scheme will propose a price on that basis. I assess that price in view of the time I think I need for the translation as a whole, and then I either agree or disagree with that price.

I once spent over an hour counting and re-counting files for a job that I didn't win in the end. That wasn't very productive, so I won't repeat it.

In short: a CAT scheme is a good basis to start talking about effort and price, and otherwise only makes sense if you are updating a set of already proofed materials.

Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
 Don't mix two percentages Apr 23, 2013

IanDhu wrote:
This is intended as a tool for assessing the fairness of a client's analysis.

... is the weighting for that band.

Sorry, but this is actually based on a classic beginners' mistake (made by beginner clients and beginner translators alike). The mistake is to assume that there is a straight-line relation between the percentage of a fuzzy match and the [inverse] percentage of money to be paid for it. You'll see your error once you keep in mind that the amount of money relates to the amount of time required for that segment.

The scheme shown by Mikhail Kropotov may seem mathematically unsound but it is actually a fairly accurate reflection of the amount of time required for the translation.

In fact, matches from the band 50-70% are often so useless that they require as much time or even more time than it would take to translate a 0% match (unless you have very, very long sentences).

In terms of your system, translating a 95% match should take about 3% as long as translating a 0% match. I'm afraid that that is often not the case -- have you timed yourself to see if you can actually accomplish that?

If you want to fine-tune your maths, you should also not assume that a 100% match represents the lower boundary of time to translate. A 100% match does not take 0% time to translate. Most translators will want to review the 100% match to ensure that it is compatible with the surrounding sentences, and so a 100% match will require at least a proofreading amount of time (let's say, for argument, 20% as long as a 0% translation). It is not greed that makes the translator grade a 100% match as a 0.2 instead of a 0.0, but necessity.
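One way to fold this suggestion into the midpoint rule is to treat the proofreading effort as a floor. The sketch below assumes the illustrative 0.2 figure from this post and simply applies it as a lower bound; `floored_weighting` is a hypothetical name, and a floor alone does not address the non-linearity described earlier in this post.

```python
PROOFREADING_FLOOR = 0.2  # "let's say, for argument, 20% as long as a 0% translation"


def floored_weighting(low: int, high: int, floor: float = PROOFREADING_FLOOR) -> float:
    """Midpoint-rule weighting for a band, but never below the proofreading floor."""
    raw = 1 - (low + high) / 2 / 100
    return max(floor, round(raw, 3))


for low, high in [(100, 100), (95, 99), (85, 94), (75, 84), (50, 74)]:
    print(f"{low}-{high}%: {floored_weighting(low, high)}")
```

With this floor the 100%, 95-99% and 85-94% bands all end up at 0.2, while the lower bands keep their midpoint values.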

[Edited at 2013-04-23 08:25 GMT]

Samuel Murray
Netherlands
Member (2006)
English to Afrikaans

IanDhu wrote:
Match Types Weighting
Context TM_____________0
Repetitions____________0.1
100%_________________0

Why do you pay 0.1 for repetitions but 0.0 for 100% matches? Shouldn't it be the other way round? In normal paragraph text, a repetition requires less work than a 100% match.

IanDhu
France
Member (2005)
French to English
TOPIC STARTER
 The system is workable, but I concede the figures need tuning. Apr 23, 2013

In the light of your remarks, I concede that the figures need tuning, but the principle, with the inclusion of the time required, should be workable. As I said earlier, the framework is all right, but the figures are under discussion.

By the way, what CAT package are people using? Studio is far more helpful than Trados 2007, even using TagEditor.

With kind regards,

[Edited at 2013-04-23 09:22 GMT]

Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
 Change your upper and lower figures a bit Apr 23, 2013

IanDhu wrote:
In the light of your remarks, I concede that the figures need tuning, but the principle, with the inclusion of the time required, should be workable.

Well, the whole point of your post is how to calculate the figures, am I right? I think your method shows potential, in the sense that it attempts to calculate rates regardless of the tool's division of match categories.

To make this work, and if you're going to use traditional analyses that do not take segment length into account, you should not assume that 0.0 is the lowest figure and 1.0 is the highest figure. Instead, 0.2 should be the lowest figure (for e.g. 100% matches and repetitions) and 2.0 should be the highest figure (for 50-75% matches, if the client wants the translator to not ignore such matches). If the client does not require the translator to take into account matches below 75%, then 0-75% can be charged at a figure of 1.0.

In a sense, you're already lost because you're using the traditional CAT tool analysis as your basis. The fuzzy match percentage is only one aspect of speed, and contrary to what some people might think, it is not even a very important aspect of it. For example, one translator might excel at short segments whereas another might find longer sentences faster to translate. Yet for some reason the traditional CAT tool analysis does not include information about segment length. Another thing that the traditional analyses do not include is the degree to which segments are affected by surrounding text (but I don't think you would be able to measure that anyway). These things affect how fast a translator can do the work.
