Similar/identical segment analysis?
Thread poster: pj-ffm

pj-ffm
Local time: 03:46
German to English
Jun 13, 2011

Hi all,

Is there a feature in Wordfast (or any other tool) that will give me an idea of how many segments in a document are similar?

Background:
Large job, tight rates. I'd like to know beforehand whether there are a lot of repeated or similar segments, so that I can judge whether I can quote lower...

Ideally, I'd like a tool that gives me a statistical breakdown, but anything that will help me quote and plan my time would be good.

thanks for your help!
cheers,
Peter.

[Edited at 2011-06-13 11:46 GMT]

Alex Lago  Identity Verified
Spain
Local time: 03:46
Member (2009)
English to Spanish
Which version of wordfast? Jun 14, 2011

Both Wordfast Classic and Pro can do that, but the way to do it differs between them.

Which version do you have?


 

pj-ffm
Local time: 03:46
German to English
TOPIC STARTER
WF Classic 5.92 Jun 14, 2011

Hi Alex,

I use Wordfast Classic 5.92m with Word 2007

cheers,
Peter.


 

Dominique Pivard  Identity Verified
Local time: 04:46
Finnish to French
Look at memoQ homogeneity analysis Jun 14, 2011

pj-ffm wrote:
is there a feature in Wordfast (or any other tool) that will allow me to get an idea of how many segments in a document are similar?

The analysis report produced by Wordfast (Classic or Pro) will only tell you about segments that are identical within the document and not present in the TM (they will be listed as repetitions) and about segments that are identical or similar to segments found in the TM (they will be listed as full or fuzzy matches). It will not tell you about similar segments within the document. memoQ has a feature called homogeneity analysis that will give you that information. I'm not aware of other tools that offer such a feature.
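For illustration, here is a minimal Python sketch of the repetition count described above: a segment is counted as a repetition only if exactly the same text appeared earlier in the document. This is not Wordfast's algorithm; the function name `repetition_stats` is hypothetical, segments are compared as exact strings, and words are counted by splitting on whitespace (real tools also apply their own rules for tags, numbers and punctuation).

```python
def repetition_stats(segments):
    """Return (repeated segments, repeated words) for a list of segments.

    A segment counts as a repetition if exactly the same string occurred
    earlier in the list; its first occurrence is not counted. Words are
    counted by splitting on whitespace (a simplifying assumption).
    """
    seen = set()
    rep_segments = 0
    rep_words = 0
    for seg in segments:
        if seg in seen:
            rep_segments += 1
            rep_words += len(seg.split())
        else:
            seen.add(seg)
    return rep_segments, rep_words


# Hypothetical example document, already split into segments.
doc = [
    "Press the red button.",
    "Wait for the green light.",
    "Press the red button.",
    "Press the red button!",  # differs by one character: not a repetition
    "Wait for the green light.",
]

print(repetition_stats(doc))  # (2, 9)
```

"Press the red button!" is not counted here because it differs from an earlier segment by a single character. Catching near-duplicates like that requires a fuzzy comparison between segments of the same document, which is exactly the gap described above.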


 

pj-ffm
Local time: 03:46
German to English
TOPIC STARTER
How do I interpret the WF "Analyse" results? Jun 15, 2011

Dominique Pivard wrote:

pj-ffm wrote:
is there a feature in Wordfast (or any other tool) that will allow me to get an idea of how many segments in a document are similar?

The analysis report produced by Wordfast (Classic or Pro) will only tell you about segments that are identical within the document and not present in the TM (they will be listed as repetitions) and about segments that are identical or similar to segments found in the TM (they will be listed as full or fuzzy matches). It will not tell you about similar segments within the document. memoQ has a feature called homogeneity analysis that will give you that information. I'm not aware of other tools that offer such a feature.


Hi Dominique,

Thanks for the tip (I don't know how I'd missed the "Analyse" feature...)

As an example, I've just run it on the original (unsegmented) version of a document that I have since partially translated, i.e. some of its segments are already in the TM. This is what I get:

Code:

Match values   Segments   Words    Chars     %
----------------------------------------------
Repetitions        3399   14263   104688   36%
100%               4637   12422    83281   32%
95%-99%               9      70      449    0%
85%-94%              74     369     2604    1%
75%-84%              54     164     1267    0%
00%-74%            1388   11855    91625   30%
Total              9561   39143   283914
0 tags



Just to be clear, does this mean that 36% of the (potential) segments are identical and that, from the existing TM, it could find 32% identical matches, etc. (i.e. I've already translated 32% of the doc)?

If so, this is pretty much what I'm looking for!

cheers,
Pete.

[Edited at 2011-06-15 07:20 GMT]


 

Dominique Pivard  Identity Verified
Local time: 04:46
Finnish to French
Analysis reports are useful Jun 15, 2011

pj-ffm wrote:
Just to be clear, does this mean that 36% of the (potential) segments are identical and that, from the existing TM, it could find 32% identical matches, etc. (i.e. I've already translated 32% of the doc)?

The percentages are not based on the number of segments; they take segment length into account. Otherwise you would get a misleading picture if all the repetitions and matches came from very short segments.
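Peter's report can be used to check this. The Python sketch below restates the segment and word counts from his table and computes each band's share of all segments and of all words. On these figures the word-based shares round to 36, 32, 0, 1, 0 and 30, which matches the report's % column, whereas the segment-based share of 100% matches is close to half: the 100% matches here average under three words per segment, against more than eight for the 00%-74% band. That Wordfast weights by words rather than characters is an inference from this one report only; the names `REPORT` and `shares` are hypothetical.

```python
# Segment and word counts per match band, copied from the Wordfast Classic
# analysis report posted above.
REPORT = {
    "Repetitions": (3399, 14263),
    "100%": (4637, 12422),
    "95%-99%": (9, 70),
    "85%-94%": (74, 369),
    "75%-84%": (54, 164),
    "00%-74%": (1388, 11855),
}


def shares(report):
    """Return {band: (% of all segments, % of all words)} for each band."""
    total_segs = sum(segs for segs, _ in report.values())
    total_words = sum(words for _, words in report.values())
    return {
        band: (100 * segs / total_segs, 100 * words / total_words)
        for band, (segs, words) in report.items()
    }


for band, (by_segs, by_words) in shares(REPORT).items():
    print(f"{band:12} {by_segs:5.1f}% of segments  {by_words:5.1f}% of words")
```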

Please note, however, that the analysis report won't detect segments such as:

The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog!
The quick brown fox jumps over the lazy dog
The quick red fox jumps over the lazy dog
The quick brown fox jumps over the lazy cat

unless a similar one is already in the TM, in which case they would all be rated as high fuzzy matches.

They are very similar to each other, differing merely by punctuation or a single word, and once you have translated one, you will get high fuzzy matches for all the rest, but the analysis report won't see that. memoQ's homogeneity analysis, OTOH, will detect them.
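To show what a within-document similarity check involves, here is a rough Python sketch that compares every pair of segments in the fox example and flags pairs above a similarity threshold. It uses the standard library's `difflib.SequenceMatcher.ratio()`, a character-based score, as a stand-in; memoQ's homogeneity analysis and Wordfast's fuzzy matching use their own scoring, which this does not reproduce. The 0.85 threshold and the function name `internal_fuzzy_pairs` are hypothetical choices.

```python
from difflib import SequenceMatcher
from itertools import combinations

segments = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog!",
    "The quick brown fox jumps over the lazy dog",
    "The quick red fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy cat",
]

THRESHOLD = 0.85  # hypothetical cut-off for "similar"


def internal_fuzzy_pairs(segs, threshold=THRESHOLD):
    """Return (i, j, score) for each pair of non-identical segments scoring >= threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(segs), 2):
        if a == b:
            continue  # exact duplicates are repetitions, counted separately
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


for i, j, score in internal_fuzzy_pairs(segments):
    print(f"segment {i} ~ segment {j}: {score:.0%}")
```

On these five segments every pair scores above 0.85, so all of them are flagged as near-duplicates even though none is an exact repetition. Note that this brute-force loop compares every pair, which for a document of about 9,500 segments means roughly 45 million comparisons.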


 

