How do different CAT tools do discount analyses?
Thread poster: Samuel Murray

Samuel Murray  Identity Verified
Netherlands
Local time: 04:21
Member (2006)
English to Afrikaans
+ ...
Oct 15, 2009

G'day everyone

[Sorry for calling it "discount analyses"... I mean, of course, the type of statistics that shows how many segments fall into which fuzzy-match band against a TM or against some standard.]

How do different CAT tools count those matches? I'm not concerned here with whether a match may be 55% in one CAT tool but 45% in another, but with whether the CAT tool counts internal fuzzy matches or not. In other words, if there is no TM, but many segments in the source file are similar to each other, what does the CAT tool's analysis say?

Not all CAT tools produce such statistics, but for those that do, I'd like you to help me see how the different tools do the analyses. I only have Wordfast and OmegaT, so I can contribute only those two, but I hope that other ProZians with other tools can tell me what their CAT tools' analyses look like.

Please do this:

1. Create a file with the following text in it:

The quick brown fox jumps over the lazy dog. That quick brown fox jumps over the lazy dog. The slow brown fox jumps over the lazy dog. The quick green fox jumps over the lazy dog. The quick brown cat jumps over the lazy dog. The quick brown fox sails over the lazy dog. The quick brown fox jumps under the lazy dog. The quick brown fox jumps over one lazy dog. The quick brown fox jumps over the dead dog. The quick brown fox jumps over the lazy fish. The rain in Spain falls mainly on the plains. Little rain in Spain falls mainly on the plains. The snow in Spain falls mainly on the plains. The rain on Spain falls mainly on the plains. The rain in Mars falls mainly on the plains. The rain in Spain drops mainly on the plains. The rain in Spain falls largely on the plains. The rain in Spain falls mainly under the plains. The rain in Spain falls mainly on those plains. The rain in Spain falls mainly on the trees.


2. Do an analysis against no TM (or against an empty TM if your CAT tool doesn't allow you to do an analysis without a TM). Tell me what the statistics are (A).

3. Translate the first sentence (or put the first sentence in the TM), and do the analysis again. Tell me what the statistics are (B).
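
To make the difference between the two kinds of counting concrete, here is a rough Python sketch of what such an analysis does. It is not how any particular CAT tool computes its figures: the word-level `difflib` ratio, the band boundaries, and the `analyse` function with its `internal` switch are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Bands loosely modelled on the ranges quoted in this thread (assumed, not any tool's defaults).
BANDS = [(100, "100%"), (95, "95%-99%"), (85, "85%-94%"), (75, "75%-84%"), (50, "50%-74%")]


def similarity(a, b):
    """Word-level similarity in percent (difflib ratio; an assumption, not a CAT tool's formula)."""
    return int(SequenceMatcher(None, a.split(), b.split()).ratio() * 100)


def band(score):
    """Map a percentage to the first band whose lower bound it reaches."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "No match"


def analyse(segments, tm, internal=False):
    """Count segments per band.

    With internal=True, earlier segments of the same file are also match
    candidates, which is what "internal fuzzy matches" means here.
    """
    counts = {}
    seen = []
    for seg in segments:
        candidates = list(tm) + (seen if internal else [])
        best = max((similarity(seg, c) for c in candidates), default=0)
        label = band(best)
        counts[label] = counts.get(label, 0) + 1
        seen.append(seg)
    return counts


segments = [
    "The quick brown fox jumps over the lazy dog.",
    "That quick brown fox jumps over the lazy dog.",
    "The slow brown fox jumps over the lazy dog.",
    "The rain in Spain falls mainly on the plains.",
]

print(analyse(segments, tm=[]))                 # analysis A: empty TM
print(analyse(segments, tm=[segments[0]]))      # analysis B: first sentence in TM
print(analyse(segments, tm=[], internal=True))  # empty TM, internal matches counted
```

With these four sentences, the run against an empty TM reports everything as no match, whereas the run with `internal=True` puts the two near-copies of the first fox sentence in the 85%-94% band even though the TM is empty. A tool that behaves like the second run is counting internal fuzzy matches.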

I hope the results are interesting. Thanks!



Samuel Murray  Identity Verified
TOPIC STARTER
Wordfast 5.5 (WFC) Oct 15, 2009

For Wordfast Classic (WFC) 5.5:

A. Empty TM

Analogy       Segments   Words   Characters   Percentage
---------------------------------------------------------
Repetitions          0       0            0           0%
100%                 0       0            0           0%
95%-99%              0       0            0           0%
85%-94%              0       0            0           0%
50%-84%              0       0            0           0%
00%-49%             20     180          899         100%
Total               20     180          899

B. One segment in TM

Analogy       Segments   Words   Characters   Percentage
---------------------------------------------------------
Repetitions          0       0            0           0%
100%                 1       9           44           5%
95%-99%              0       0            0           0%
85%-94%              4      36          177          20%
50%-84%              5      45          221          25%
00%-49%             10      90          457          50%
Total               20     180          899



Samuel Murray  Identity Verified
TOPIC STARTER
OmegaT 2.0.5 Oct 15, 2009

For OmegaT 2.0.5:

A. Empty TM

Match          Segments   Words   Characters (without spaces)   Characters (including spaces)
Repetitions:          0       0                             0                               0
Exact match:          0       0                             0                               0
95%-100%:             0       0                             0                               0
85%-94%:              0       0                             0                               0
75%-84%:              0       0                             0                               0
50%-74%:              0       0                             0                               0
No match:            20     180                           739                             899

B. One segment in TM

Match          Segments   Words   Characters (without spaces)   Characters (including spaces)
Repetitions:          0       0                             0                               0
Exact match:          1       9                            36                              44
95%-100%:             0       0                             0                               0
85%-94%:              9      81                           326                             398
75%-84%:              0       0                             0                               0
50%-74%:             10      90                           377                             457
No match:             0       0                             0                               0
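
The character totals differ between tools (899 in some tables, 739 in others) because some count spaces and some do not. A minimal Python sketch of the three counts, assuming simple whitespace splitting for words (real tools may tokenise differently; `count_units` is a made-up helper name):

```python
def count_units(segment):
    """Return (words, characters without spaces, characters including spaces)."""
    words = len(segment.split())                        # whitespace-delimited words
    chars_with_spaces = len(segment)                    # every character, spaces included
    chars_without_spaces = len(segment.replace(" ", ""))
    return words, chars_without_spaces, chars_with_spaces


print(count_units("The quick brown fox jumps over the lazy dog."))
# prints (9, 36, 44)
```

Those three numbers are the same as the "Exact match" row for the first sentence in the table above.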



Vito Smolej
Germany
Local time: 04:21
Member (2004)
English to Slovenian
+ ...
On Trados (Freelance 2007) Oct 15, 2009

Hi Samuel:

Samuel Murray wrote:
2. Do an analysis against no TM (or against an empty TM if your CAT tool doesn't allow you to do an analysis without a TM). Tell me what the statistics are (A).

20 no-matches.

Samuel Murray wrote:
3. Translate the first sentence (or put the first sentence in the TM), and do the analysis again. Tell me what the statistics are (B).

1 100%
9 85% - 94%
10 no-matches

I'll charge you nothing for 100%, but we will have to discuss the 9 in the 85% - 94% slot (g).

How's the stemming in OmegaT doing on the subject of quick foxes and lazy dogs? Any changes in the ranking of matches (should try it myself...)?

Regards



Samuel Murray  Identity Verified
TOPIC STARTER
@Vito Oct 15, 2009

Vito Smolej wrote:
How's the stemming in OmegaT doing on the subject of quick foxes and lazy dogs? Any changes in the ranking of matches (should try it myself...)?


OmegaT's discount analysis system does take tags into account (hence my test here with a plaintext file) but does not take stemming into account (so says the developer on the OmegaT mailing list).

[Edited at 2009-10-15 09:26 GMT]



Boris Sigalov
Local time: 05:21
English to Russian
MemoQ 3.0.29 Oct 15, 2009

A. Empty TM

Type         Segments   Source words   Source chars   Percent
All                20            180            739       100
Repetition          0              0              0         0
101%                0              0              0         0
100%                0              0              0         0
95%-99%             0              0              0         0
85%-94%             0              0              0         0
75%-84%             0              0              0         0
50%-74%             0              0              0         0
No match           20            180            739       100

B. One segment in TM

Type         Segments   Source words   Source chars   Percent
All                20            180            739       100
Repetition          0              0              0         0
101%                1              9             36         5
100%                0              0              0         0
95%-99%             0              0              0         0
85%-94%             0              0              0         0
75%-84%             9             81            326        45
50%-74%             0              0              0         0
No match           10             90            377        50

