
Why? : Why do match counts change when files are analyzed together vs. separately?
Thread poster: lkopplin
lkopplin
English
Feb 4, 2008

I have a batch of 30 files. I'm using Workbench (Trados 6.5) and the same TM to analyze the files in 2 ways.

When I analyze the files in 3 separate batches of 10 files each, then add the readouts together, I get the following:

Reps: 4117
100%: 5788
95-99%: 384
85-94%: 477
75-84%: 427
50-74%: 129
No Match: 12,805

When I analyze the files together (as a single batch of 30), the readout is as follows:

Reps: 4848
100%: 5788
95-99%: 286
85-94%: 387
75-84%: 344
50-74%: 112
No Match: 12,362

Repetitions go up when the files are analyzed together. 100% matches stay the same. Fuzzy matches go down.
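As a quick sanity check on the two readouts, the Python arithmetic below uses only the numbers posted above. It confirms that both analyses cover the same total (24,127) and that the 731 extra repetitions are exactly balanced by 288 fewer fuzzy matches and 443 fewer No Match items. In other words, No Match drops too, not only the fuzzy bands. The readouts don't say whether the figures are words or segments; the check works the same either way.

```python
# Counts copied from the two analysis readouts in the post above.
separate = {  # three batches of 10 files, readouts added together
    "Reps": 4117, "100%": 5788, "95-99%": 384, "85-94%": 477,
    "75-84%": 427, "50-74%": 129, "No Match": 12805,
}
together = {  # one batch of 30 files
    "Reps": 4848, "100%": 5788, "95-99%": 286, "85-94%": 387,
    "75-84%": 344, "50-74%": 112, "No Match": 12362,
}

FUZZY = ["95-99%", "85-94%", "75-84%", "50-74%"]

# Change per category when moving from three batches to one batch.
delta = {band: together[band] - separate[band] for band in separate}

total_separate = sum(separate.values())
total_together = sum(together.values())
fuzzy_drop = -sum(delta[band] for band in FUZZY)
no_match_drop = -delta["No Match"]

print("Total (separate):", total_separate)   # 24127
print("Total (together):", total_together)   # 24127
print("Reps gained:", delta["Reps"])          # 731
print("Fuzzy lost:", fuzzy_drop)              # 288
print("No Match lost:", no_match_drop)        # 443
```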

My question: Why do the match counts differ?
Can someone explain why fuzzy matches go down? And why 100% matches stay the same?

Here is my theory, but perhaps I am wrong: When files are analyzed together, there are more occurrences of the same fuzzy match than there would be if the files were analyzed separately. When there are multiple occurrences of the same fuzzy match, Trados would count the fuzzy match once as a fuzzy match, then count the remaining occurrences as repetitions. So repetitions go up, fuzzy matches go down.
But if that theory is correct, why wouldn't it apply to 100% matches as well? Is it because a 100% match is the maximum benefit you can achieve through TM, so there is no need to re-assign a recurring 100% match as a repetition?

Any help or insight anyone can provide would be greatly appreciated!

Thanks,
Lauren Nemec
Marketing Manager
Translatus, Inc.

[Edited at 2008-02-04 17:09]



Claudia Digel
Germany
English to German
Your theory is correct Feb 4, 2008

Hi Lauren,

Your theory is correct. If you analyze all files in one batch, a recurring fuzzy match will be counted as a fuzzy match just once; all other occurrences will be counted as repetitions.

The number of 100% matches does not change because a 100% match is different from a repetition. A 100% match is an identical match from the TM: the sentence is already in the TM before you start your translation. All occurrences of this sentence in your files will be counted as 100% matches, not as repetitions, no matter how often the sentence appears in the files. Since you use the same TM for both analyses, the number of 100% matches doesn't change.

A repetition is a 'new 100% match' which you generate from within your translation files, i.e. the sentence is not in the TM before you start the translation. (There might be a fuzzy match for the sentence but not a 100% match.) Of course, this new sentence can occur in several files, which means you get cross-file repetitions. This is why the number of repetitions rises when you analyze more files in one batch.
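To make the counting order concrete, here is a minimal Python sketch of a batch analysis under stated assumptions. It follows the explanation above (TM exact match first, then repetition of a sentence already seen in the batch, then fuzzy band for the first occurrence only), but it is not Trados's actual matching algorithm. The helpers `analyze` and `fuzzy_band` are hypothetical, it counts segments rather than words, and `difflib.SequenceMatcher.ratio()` is only a stand-in for a real fuzzy-match score.

```python
from difflib import SequenceMatcher

# Hypothetical, simplified model of a batch analysis. It follows the
# explanation above, not Trados's actual (unpublished) matching algorithm.
# difflib's ratio() is only a stand-in for a real fuzzy-match score.

BANDS = [(95, "95-99%"), (85, "85-94%"), (75, "75-84%"), (50, "50-74%")]


def fuzzy_band(segment, tm):
    """Return the fuzzy band of the best TM match, or 'No Match'."""
    best = max((SequenceMatcher(None, segment, source).ratio() for source in tm),
               default=0.0)
    score = int(best * 100)
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "No Match"


def analyze(batch, tm):
    """Count segments of a batch (a list of files, each a list of segments)."""
    counts = {}
    seen = set()  # sentences already met earlier in this batch
    for file_segments in batch:
        for seg in file_segments:
            if seg in tm:
                band = "100%"  # TM exact match: every occurrence stays 100%
            elif seg in seen:
                band = "Reps"  # repeats a new sentence seen earlier in the batch
            else:
                band = fuzzy_band(seg, tm)  # first occurrence in the batch only
            seen.add(seg)
            counts[band] = counts.get(band, 0) + 1
    return counts


# Usage: two identical made-up files and a tiny made-up TM.
tm = {"Press the power button.", "Close the cover."}
file_a = ["Press the power button.", "Close the front cover.", "Remove the battery."]
file_b = list(file_a)

separate_a = analyze([file_a], tm)
separate_b = analyze([file_b], tm)
together = analyze([file_a, file_b], tm)
print(separate_a, separate_b)
print(together)
```

Analyzed separately, neither file has any repetitions, so each non-TM sentence is classified twice as a fuzzy or no-match item. Analyzed together, the second file's two non-TM sentences become repetitions, while the 100% count stays at 2 in both cases, which mirrors the pattern in the readouts above.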

Hope this helps.

Best regards,
Claudia



ViktoriaG
Canada
English to French
You are right Feb 4, 2008

Your theory is right, and Claudia explains it well.

I just wanted to add that the best way to count words when using CAT tools is to analyse files in a way that takes into account the batches that go to each translator. So, if you are splitting a project of six files into two (three files each), don't analyse them one by one, but don't analyse them in one batch either. If you send files 1, 2 and 3 to translator A and files 4, 5 and 6 to translator B, then analyse files 1, 2 and 3 together for translator A and so on.

This helps leverage the TM to the maximum, and it is also fairer to each translator. Of course, use this analysis method to quote a rate to the end client as well.
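A small Python sketch of this per-translator approach, assuming six made-up files that contain placeholder sentence IDs and counting only exact repetitions (TM matches are deliberately ignored). The helpers `repetitions` and `reps_for` are hypothetical. Analyzing file by file finds no repetitions, analyzing each translator's files together finds 4, and analyzing the whole project finds 5. The extra one in the whole-project count is "s1" in file 4: translator B never sees the other occurrences, which go to translator A, so for B it is a new sentence.

```python
from collections import Counter

# Hypothetical illustration: count repetitions for three ways of batching
# the same files. Only exact repeats of sentences are counted here;
# TM matches are deliberately left out.


def repetitions(batch):
    """Occurrences beyond the first of each sentence within one batch."""
    counts = Counter(seg for file_segments in batch for seg in file_segments)
    return sum(n - 1 for n in counts.values())


def reps_for(files, groups):
    """Total repetitions when each group of file numbers is analyzed as one batch."""
    return sum(repetitions([files[name] for name in group]) for group in groups)


# Six made-up files with placeholder sentence IDs.
# Files 1-3 go to translator A, files 4-6 to translator B.
files = {
    1: ["s1", "s2"], 2: ["s1", "s3"], 3: ["s2", "s4"],
    4: ["s1", "s5"], 5: ["s5", "s6"], 6: ["s6", "s7"],
}

one_by_one = reps_for(files, [[n] for n in files])
per_translator = reps_for(files, [[1, 2, 3], [4, 5, 6]])
whole_project = reps_for(files, [list(files)])
print(one_by_one, per_translator, whole_project)  # prints: 0 4 5
```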

[Edited at 2008-02-04 19:37]


lkopplin
English
TOPIC STARTER
Thank you! Feb 6, 2008

Thank you Claudia and Viktoria for your help. Everything is clear now.
