Differences between Repetitions / Cross File Repetitions / 100% Match
Thread poster: H Habraliova
Apr 9, 2015

Hello,

I'd appreciate it if someone could explain the differences between Repetitions, Cross-file Repetitions and 100% Match in Trados Studio 2011.

I am acting as a customer hiring translators for a localization project. I analyzed my source files in Trados and got the following statistics:
Files: 262

Match type                Segments   Words   Characters
PerfectMatch                     0       0            0
Context Match                    0       0            0
Repetitions                    240    1131         6436
Cross-file Repetitions        1574    7374        37334
100%                           366     377         1226
95% - 99%                        4       7           57
85% - 94%                        0       0            0
75% - 84%                        0       0            0
50% - 74%                        0       0            0
New                           2136   14748        87224
Total                         4320   23637       132277

Question 1: We've agreed with the translator that we're not paying for repetitions, only for new segments. That means that cross-file repetitions and 100% matches should not be paid for either. Am I right in assuming so?
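To make Question 1 concrete, here is a rough sketch of the arithmetic in Python. The word counts are copied from the analysis above; the set of unpaid categories is only the assumption stated in the question (repetitions, cross-file repetitions and 100% matches unpaid, everything else paid in full), not something Trados itself calculates.

```python
# Word counts copied from the Trados analysis report above.
words = {
    "PerfectMatch": 0,
    "Context Match": 0,
    "Repetitions": 1131,
    "Cross-file Repetitions": 7374,
    "100%": 377,
    "95% - 99%": 7,
    "85% - 94%": 0,
    "75% - 84%": 0,
    "50% - 74%": 0,
    "New": 14748,
}

# Assumption from Question 1: these categories are not paid at all.
unpaid = {"Repetitions", "Cross-file Repetitions", "100%"}

total_words = sum(words.values())
unpaid_words = sum(n for name, n in words.items() if name in unpaid)
payable_words = total_words - unpaid_words

print(total_words, unpaid_words, payable_words)
```

Under that assumption, 14755 of the 23637 words would be billed: the 14748 new words plus the 7 words in the 95% - 99% band.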

Question 2: The project package already includes a translation memory with a small number of translated segments. I was surprised to see so few fuzzy matches in the report. I know the content of these source files and I think there should be more fuzzy matches.
Am I right in assuming that Trados will identify fuzzy matches only when I run the same analysis after the translation has been completed and the translation memory has been populated with translations?

I'm sorry if this has been discussed before. If so, please point me to the relevant discussions.



Roy Oestensen
Norway
Member (2010)
English to Norwegian (Bokmal)
Crossfile repetitions: Repetitions between two or more documents Apr 9, 2015

I understand crossfile repetitions to be segments that are identical in two different documents. So if you send both documents to the same translator, I think it is fine to take that into account. If, on the other hand, you split the documents between two or more translators, they won't get the benefit of this type of repetition, and then it would be right to pay in full.

This could be important, because I have had the experience that an agency I work for analysed all files in one go and did not take into account that a substantial part was cross-file repetitions. The result was that the analysis I got had a substantially lower number of repetitions, which made for some disagreements.

Fuzzy matches are segments that are similar to whatever is already in the TM. So if the TM is almost empty, Studio will not find much. Of course you would have fuzzy propagation, but I don't think CAT tools take that into account.

Running an analysis after the translation is finished should result in less fuzziness and more 100% hits, since Studio then would find the segments in the TM, I would think. I may be wrong there, though.
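The distinction can be sketched roughly in Python. This is only an illustration of the idea, under my own assumptions: the function name classify_repetitions and the counting rules (a segment already seen in the same file counts as a repetition, one first seen in an earlier file counts as a cross-file repetition) are hypothetical and are not taken from Studio, which may count differently.

```python
from collections import Counter

def classify_repetitions(files):
    """files: dict mapping a file name to its source segments, in order.

    Hypothetical counting rules, for illustration only.
    """
    seen_in_earlier_files = set()
    counts = Counter()
    for name, segments in files.items():
        seen_in_this_file = set()
        for segment in segments:
            if segment in seen_in_this_file:
                counts["repetition"] += 1             # repeated inside the same file
            elif segment in seen_in_earlier_files:
                counts["cross_file_repetition"] += 1  # first seen in another file
            else:
                counts["new"] += 1
            seen_in_this_file.add(segment)
        seen_in_earlier_files |= seen_in_this_file
    return counts

example = {
    "a.txt": ["Save", "Cancel", "Save"],
    "b.txt": ["Save", "Help"],
}
counts = classify_repetitions(example)
print(dict(counts))
```

In this toy example there are three new segments, one repetition (the second "Save" in a.txt) and one cross-file repetition ("Save" in b.txt). A translator who only receives b.txt would see "Save" as new, which is why splitting files between translators changes the picture.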


H Habraliova
TOPIC STARTER
Differences between Repetitions / Cross File Repetitions / 100% Match Apr 9, 2015

Thanks for your reply, Roy.

All files go to the same translator, so you have confirmed my assumption that it's OK not to pay for cross-file repetitions in this case.

Running an analysis after the translation is finished should result in less fuzziness and more 100% hits, since Studio then would find the segments in the TM,
Yes, when I think more about this, I think you're right.

What I meant initially is that the segments in my source files are very similar to each other. Most of the segments in my case would be like this:

Jane went to the movies on Monday.
Jane went to the movies on Sunday.
Mike went to the movies on Monday.
John went to the theatre on Thursday. etc...
(these are not actual segments from my project, just random sentences to illustrate the degree of similarity).

At first I thought Trados should recognize this similarity, instead of treating them as completely unique segments. So, it would be fair to get a discount for translating them, even though the TM is still almost empty.

Does this make sense? Or am I being too stingy?

[Edited at 2015-04-09 18:19 GMT]



Rossana Triaca
Uruguay
Member (2002)
English to Spanish
Internal fuzziness... Apr 9, 2015

The option you are looking for is "Report Internal Fuzzy Match Analysis" (when running an Analyze File(s) task). This will give you the internal fuzziness of the text, in addition to the matches already present in the TM.

That being said, not paying for repetitions, context matches and 100% matches, for me as a translator implies that I'll lock them out and not even see them during translation. This is OK if you trust the TM and you're having a reviewer check and unify everything afterwards, but if that's not the case then it's really a terrible approach for consistency and cohesion (that's why most companies do pay for repeats/100% matches, to cover this "reviewer" task performed by the translator, albeit at a reasonably reduced rate).

Running an analysis after a translation is finished is rather pointless, since it should report that all rows are CMs.



Gerard de Noord
France
Member (2003)
German to Dutch
Internal fuzzies are our slice of luck Apr 10, 2015

You're asking translators? Outsourcers do not react very often to forum questions like this.

If you haven't agreed on discounts for internal fuzzies, your current Trados wordcount is the one you should adhere to.

And, yes, I think it's stingy to ask for a discount for virtual matches. Why should we be paid less because the outsourcer has found an extra option to tick before pushing the Analyze button?

Cheers,
Gerard


H Habraliova
TOPIC STARTER
Thanks Rossana! Apr 10, 2015

Rossana Triaca wrote:

The option you are looking for is "Report Internal Fuzzy Match Analysis"...

Thanks. I tried this and that's exactly what I was looking for.

That being said, not paying for repetitions, context matches and 100% matches, for me as a translator implies that I'll lock them out and not even see them during translation. This is OK if you trust the TM and you're having a reviewer check and unify everything afterwards, but if that's not the case then it's really a terrible approach for consistency and cohesion (that's why most companies do pay for repeats/100% matches, to cover this "reviewer" task performed by the translator, albeit at a reasonably reduced rate).


I see your point. However, in my case repetitions are typically very short segments consisting of one or a few words: website labels, buttons, etc. And the translator knows the site being localized inside and out. Reviewing will be paid for separately, by the hour. I think in our case this solution will provide the best cost/time/quality ratio.

Running an analysis after a translation is finished is rather pointless, since it should report that all rows are CMs.
Yes, I figured that out already.

[Edited at 2015-04-10 08:52 GMT]



Emma Goldsmith
Spain
Member (2010)
Spanish to English
Agree with Rossana Apr 10, 2015

Rossana Triaca wrote:

The option you are looking for is "Report Internal Fuzzy Match Analysis" (when running an Analyze File(s) task). This will give you the internal fuzziness of the text, in addition to the matches already present in the TM.

That being said, not paying for repetitions, context matches and 100% matches, for me as a translator implies that I'll lock them out and not even see them during translation. This is OK if you trust the TM and you're having a reviewer check and unify everything afterwards, but if that's not the case then it's really a terrible approach for consistency and cohesion (that's why most companies do pay for repeats/100% matches, to cover this "reviewer" task performed by the translator, albeit at a reasonably reduced rate).

Running an analysis after a translation is finished is rather pointless, since it should report that all rows are CMs.


Agree with absolutely everything that Rossana says here.

I wrote more about fuzzies and ethical issues here:
http://signsandsymptomsoftranslation.com/2015/03/06/fuzzy-matches/


H Habraliova
TOPIC STARTER
opinion of the opposite party Apr 10, 2015




Thanks for your post. I guess we're on the opposite sides here, so disagreement is natural. Even though it might be weird for you to see such questions addressed to translators, I found your reply and all other replies in this thread very useful. And it was refreshing for me to see the opinion of the other party.


H Habraliova
TOPIC STARTER
I will check out your post Apr 10, 2015


I wrote more about fuzzies and ethical issues here:
http://signsandsymptomsoftranslation.com/2015/03/06/fuzzy-matches/

Thanks, I will check out your post.


564354352
Denmark
Danish to English
Fuzzy match values Apr 10, 2015

H Habraliova wrote:

What I meant initially, is that segments in my source files are very similar to each other. Most of the segments in my case would be like this:

1. Jane went to the movies on Monday.
2. Jane went to the movies on Sunday.
3. Mike went to the movies on Monday.
4. John went to the theatre on Thursday. etc...
(these are not actual segments from my project, just random sentences to illustrate the degree of similarity).

At first I thought Trados should recognize this similarity, instead of treating them as completely unique segments. So, it would be fair to get a discount for translating them, even though the TM is still almost empty.



At the risk of stating the obvious: fuzzy matches are based on the number of matching words, not on the meaning of sentences. In your imagined example, there are seven words in each sentence.
Segment 1 is obviously a 'no match'.
Segment 2 repeats 6 out of 7 words from segment 1, i.e. an 85 % match.
Segment 3 repeats 6 out of 7 words from segment 1, i.e. an 85 % match, or 5 out of 7 words from segment 2, i.e. a 71 % match.
Segment 4 repeats 4 out of 7 words from segment 1, i.e. a 57 % match.

So, although the sentences are very similar, the matches may be lower than you expect...
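The counting above can be sketched in a few lines of Python. This reproduces only the simple word-by-word comparison described in this post; the function name word_overlap is made up for the illustration, and this is not a description of how Studio actually scores fuzzy matches.

```python
def word_overlap(candidate, reference):
    """Share of word positions where both sentences have the same word.

    Hypothetical helper for illustration; not Studio's algorithm.
    """
    a, b = candidate.split(), reference.split()
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / max(len(a), len(b))

s1 = "Jane went to the movies on Monday."
s2 = "Jane went to the movies on Sunday."
s3 = "Mike went to the movies on Monday."
s4 = "John went to the theatre on Thursday."

for label, cand, ref in [("2 vs 1", s2, s1), ("3 vs 1", s3, s1),
                         ("3 vs 2", s3, s2), ("4 vs 1", s4, s1)]:
    print(label, f"{word_overlap(cand, ref):.1%}")
```

With these sentences the shares are 6/7, 6/7, 5/7 and 4/7 respectively, which matches the rough percentages listed above.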



Rossana Triaca
Uruguay
Member (2002)
English to Spanish
Editing Distance Apr 10, 2015

In your imagined example, there are seven words in each sentence.
Segment 1 is obviously a 'no match'.
Segment 2 repeats 6 out of 7 words from segment 1, i.e. an 85 % match.
Segment 3 repeats 6 out of 7 words from segment 1, i.e. an 85 % match, or 5 out of 7 words from segment 2, i.e. a 71 % match.
Segment 4 repeats 4 out of 7 words from segment 1, i.e. a 57 % match.

So, although the sentences are very similar, the matches may be lower than you expect...


Actually... the fuzzy algorithm is much more complex; editing distance is calculated using character substitution - i.e., how many operations you have to make char by char to convert one sentence into the other (roughly put).

So, if you ran an analysis with the default settings, it would show that out of the 4 segments, #2 and #3 are in the 85%-94% fuzzy bracket (specifically, they are both 91% matches), and the first and the last segments are entirely new content. Needless to say, in a real project penalties and other factors influence the final calculation, but this is a good example of how we may perceive segment 4 as a "linguistic" fuzzy (solely because we humans find it similar to previous content) when in fact, by a formal metric, it is so different that it's not considered a match at all. This bodes well for the translator too: full rate on this new sentence, as it should be.
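For anyone curious what a character-level edit distance looks like, here is a minimal sketch using plain Levenshtein distance in Python. This is a textbook stand-in chosen for illustration, not Studio's actual scoring, which also applies penalties and other factors, so the percentages it prints should not be expected to match the figures Studio reports. The function names are mine.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

def similarity(a, b):
    """Naive score: 1 minus edit distance over the longer length (an assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

s1 = "Jane went to the movies on Monday."
s2 = "Jane went to the movies on Sunday."
s3 = "Mike went to the movies on Monday."
s4 = "John went to the theatre on Thursday."

for label, other in [("segment 2", s2), ("segment 3", s3), ("segment 4", s4)]:
    print(label, levenshtein(s1, other), f"{similarity(s1, other):.0%}")
```

Against segment 1, segments 2 and 3 are only 2 and 3 character edits away, while segment 4 needs many more, which is the intuition behind it falling out of the fuzzy bands altogether.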

p.s. you summoned me by saying "segment" thrice!

