Analysis discrepancy 2011/2014
Thread poster: philwhite

philwhite
Local time: 01:22
German to English
Jun 3, 2014

We currently have a massive problem with a project.

The customer has prepared FrameMaker V10 files as SDLXLIFF files and sent us the SDLXLIFF files and the TM.

Their analysis showed the following (words):

PerfectMatch: 0
Context Match: 30439
Repetitions: 580
Cross-file Repetitions: 27
100%: 3928
95% - 99%: 283
85% - 94%: 1795
75% - 84%: 1840
50% - 74%: 504
New: 7473
Total: 46869


Our analysis showed the following:

PerfectMatch: 0
Context Match: 25813
Repetitions: 646
Cross-file Repetitions: 27
100%: 5406
95% - 99%: 2110
85% - 94%: 3098
75% - 84%: 1856
50% - 74%: 425
New: 7487
Total: 46868


As you can see, there is a huge discrepancy in the 85% and 95% fuzzies.
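To see where the words moved, the two analyses can be subtracted band by band. The sketch below is only an illustration using the figures quoted above; the variable and function names are made up for this example and are not part of Studio.

```python
# Word counts per match band, copied from the two analyses quoted above.
BANDS = [
    "PerfectMatch", "Context Match", "Repetitions", "Cross-file Repetitions",
    "100%", "95% - 99%", "85% - 94%", "75% - 84%", "50% - 74%", "New",
]
customer_2011 = [0, 30439, 580, 27, 3928, 283, 1795, 1840, 504, 7473]
ours_2014 = [0, 25813, 646, 27, 5406, 2110, 3098, 1856, 425, 7487]


def band_deltas(baseline, other):
    """Return {band: other - baseline} for every match band."""
    return {band: o - b for band, b, o in zip(BANDS, baseline, other)}


deltas = band_deltas(customer_2011, ours_2014)
for band, delta in deltas.items():
    # A positive delta means the 2014 analysis puts more words in that band.
    print(f"{band:<24}{delta:+6d}")
```

On these figures, Context Match loses 4626 words in 2014, while the 100%, 95% - 99% and 85% - 94% bands gain 1478, 1827 and 1303 words respectively.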

After an hour of telephoning, we established that the customer is using Studio 2011 and we are using 2014. When I analyzed the same files with the same TM in 2011 on my machine, I got the same results as the customer.

Can anyone explain and preferably resolve this discrepancy?

Since the files are SDLXLIFF, the problem is not with the file type or tagging.

No penalties or filters have been applied.

The results were confirmed on two separate machines.

The discrepancy is in excess of 10%, and the biggest problem is not in the analysis, but in processing the file, which will give us considerably more work.



SDL Community  Identity Verified
United Kingdom
Local time: 01:22
English
Some changes... Jun 4, 2014

Hi Phil,

Did you check that you have the same settings for the recognizers? We introduced the acronyms recognizer in SP1. The release notes say this:

Studio 2014 includes some changes related to recognized tokens (also referred to as
placeables), for instance in the area of acronyms.
To ensure optimum matching and avoid creating duplicate translation units when you edit
existing 100% matches, SDL recommends that you re-index all your translation memories
to recalculate all recognized tokens.

To re-index translation memories:

  • Select Translation Memories view and open your translation memory.
  • Click Home tab and select Settings under the Tasks group.
  • Under Performance and Tuning, select Re-index Translation Memory.
  • Wait for the process to finish as this may take a while.

If for any reason you are still seeing matching issues and/or unwanted translation unit
duplicates, consider exporting/importing the translation units into a new TM.


In addition, some of the matching strategies were changed because customers reported that some of the match values were too high.

Without isolated files it's difficult to say anything more than this.



philwhite
Local time: 01:22
German to English
TOPIC STARTER
Some partial success Jun 5, 2014

Thanks Paul.

To recap:

The original TM was created in Studio 2014.

The customer used this TM to create and analyze the project in 2011 (with the results shown above).

We analyzed the same SDLXLIFF files with the same TM in Studio 2014 with the wildly discrepant results shown above.

I re-indexed the TM (in 2014) and then got the following results:

PerfectMatch: 0
Context Match: 27341
Repetitions: 597
Cross-file Repetitions: 27
100%: 5680
95% - 99%: 1658
85% - 94%: 1830
75% - 84%: 1823
50% - 74%: 425
New: 7487
Total: 46868

These results are better, but we are still missing about 1000 words in 100% matches (i.e. CM plus 100%).
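As a rough check of the "CM plus 100%" comparison, the two bands can be added up for each analysis. This is only a sketch based on the figures posted in this thread; the names are illustrative.

```python
# (Context Match, 100%) word counts from the analyses posted in this thread.
customer_2011 = (30439, 3928)
ours_2014_before_reindex = (25813, 5406)
ours_2014_after_reindex = (27341, 5680)


def exact_words(cm_and_100):
    """Words that need no editing: Context Match plus 100% matches."""
    return sum(cm_and_100)


shortfall_before = exact_words(customer_2011) - exact_words(ours_2014_before_reindex)
shortfall_after = exact_words(customer_2011) - exact_words(ours_2014_after_reindex)

print("Shortfall before re-indexing:", shortfall_before)
print("Shortfall after re-indexing: ", shortfall_after)
```

On the posted figures, re-indexing reduces the shortfall from 3148 to 1346 words.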


I then created a new TM in Studio 2011 and re-imported the original TM. These are the results:

PerfectMatch: 0
Context Match: 24895
Repetitions: 580
Cross-file Repetitions: 30
100%: 9467
95% - 99%: 330
85% - 94%: 1830
75% - 84%: 1823
50% - 74%: 425
New: 7488
Total: 46868

These are the best results that can be achieved in 2014, and are very close to the results in 2011, with the exception that about 5500 words have shifted from CM to 100%. For us, this would mean that we would have to review these 5500 words, but the customer would not pay us for doing so.
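The shift from CM to 100% can be checked against the customer's 2011 analysis in the same way. Again, this is just a sketch on the posted numbers, with names chosen for illustration.

```python
# Context Match and 100% word counts from the posted analyses.
customer_2011 = {"Context Match": 30439, "100%": 3928}
new_tm_2014 = {"Context Match": 24895, "100%": 9467}

# Change per band when moving from the customer's analysis to the new TM.
cm_shift = new_tm_2014["Context Match"] - customer_2011["Context Match"]
hundred_shift = new_tm_2014["100%"] - customer_2011["100%"]

# Net change in words that are CM or 100% taken together.
net_change = cm_shift + hundred_shift

print("Context Match:", cm_shift)
print("100%:         ", hundred_shift)
print("Net:          ", net_change)
```

The two shifts almost cancel (-5544 and +5539), so the combined CM plus 100% total differs from the customer's by only 5 words; the difference lies in how the words are split between the two bands.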

Of course, since we have the customer's SDLXLIFF files, their CMs and 100% hits are already in place, so the review work will not actually be increased unless we were to override what is already there.

It remains, however, a serious issue that the customer has already encountered on a far smaller scale (without knowing the resolution you proposed above). My contact tells me that this is the reason that they have no intention of migrating to 2014 in the near future, which means that we shall have to struggle with it in the future.

We shall do the job in 2011 as I have already lost 5 or 6 hours over this.


I would be happy to send the TM and a sample file if it would help.



SDL Community  Identity Verified
United Kingdom
Local time: 01:22
English
It would be interesting... Jun 5, 2014

... to understand why CMs dropped to 100%, so please do. If you put them on Dropbox or something similar, we can take a look.

Regards

Paul
pfilkin@sdl.com



