Very different analysis results of same source for different languages
Thread poster: Mikhail Kropotov

Mikhail Kropotov  Identity Verified
Russian Federation
Local time: 05:49
Member (2005)
English to Russian
Mar 9, 2015

I'm using memoQ 2013 R2 to coordinate translations of the same source file into four languages: Russian, French, German and Spanish. The source file contains software strings in .po format for localization.

I had the file completely translated into all four languages, and all translations were saved to their respective TMs. The development team then changed a couple of strings in the source file. When I run statistics on the updated file, which is essentially the same as before, I get two very different kinds of analysis results.

For Russian, Spanish and French I get:

Type          Segments   Source words
=====================================
All                782           2962
Repetition           0              0
101%               765           2896
100%                12             35
95%-99%              0              0
85%-94%              0              0
75%-84%              2              7
50%-74%              2              9
No match             1             15

But for German, on the same source file, I get:

Type          Segments   Source words
=====================================
All                728           2698
Repetition          11             25
101%               227            807
100%               302            881
95%-99%             18             42
85%-94%             26            136
75%-84%             65            346
50%-74%            112            519
No match            21            206
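
As a sanity check on the figures above, the category rows of each table can be summed and compared with its "All" row. The sketch below is only arithmetic on the numbers as posted; the dictionary and function names are made up for illustration. Summed this way, both tables come to 782 segments and 2962 words, so the German "All" row (728 / 2698) does not agree with its own category rows, which may point to a copying slip rather than to a real difference in segment count.

```python
# Figures copied from the two analysis tables as posted: (segments, source words).
ROWS_RU_ES_FR = {
    "Repetition": (0, 0),
    "101%": (765, 2896),
    "100%": (12, 35),
    "95%-99%": (0, 0),
    "85%-94%": (0, 0),
    "75%-84%": (2, 7),
    "50%-74%": (2, 9),
    "No match": (1, 15),
}

ROWS_DE = {
    "Repetition": (11, 25),
    "101%": (227, 807),
    "100%": (302, 881),
    "95%-99%": (18, 42),
    "85%-94%": (26, 136),
    "75%-84%": (65, 346),
    "50%-74%": (112, 519),
    "No match": (21, 206),
}


def totals(rows):
    """Return (total segments, total source words) over all category rows."""
    segments = sum(seg for seg, _ in rows.values())
    words = sum(word for _, word in rows.values())
    return segments, words


if __name__ == "__main__":
    print("RU/ES/FR:", totals(ROWS_RU_ES_FR))
    print("DE:      ", totals(ROWS_DE))
```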

Could someone please explain this drastic difference, or maybe tell me where to look next to understand what causes it?

Thank you in advance for any ideas.

[Edited at 2015-03-09 15:11 GMT]



Rossana Triaca  Identity Verified
Uruguay
Local time: 23:49
Member (2002)
English to Spanish
Segmentation Rules
Mar 10, 2015

First thing that came to mind, given the different number of segments/words, is an issue with the segmentation rules.

Are you sure the same segmentation rules are applied to the source file in all the analyses? They are usually determined by the source language, but if you opened or edited the file and changed the language (or the encoding, or the line wrapping) midway, that could explain the difference.
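
To illustrate why the rules matter, here is a minimal Python sketch in which the same strings are split under two different rule sets and yield different segment counts. The sample strings and both regular expressions are hypothetical; they are not memoQ's actual segmentation rules, only a stand-in for "rule set A" versus "rule set B".

```python
import re

# Hypothetical sample strings; not taken from the actual .po file.
STRINGS = [
    "Save changes? Unsaved data will be lost.",
    "File not found: check the path.",
    "Done",
]

# Rule set A (assumed): break only after sentence-final punctuation plus whitespace.
RULES_A = r"(?<=[.?!])\s+"
# Rule set B (assumed): additionally break after a colon plus whitespace.
RULES_B = r"(?<=[.?!:])\s+"


def segment(text, pattern):
    """Split text wherever the break pattern matches, dropping empty pieces."""
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


def count_segments(strings, pattern):
    """Total number of segments produced for all strings under one rule set."""
    return sum(len(segment(s, pattern)) for s in strings)


if __name__ == "__main__":
    print("Rule set A:", count_segments(STRINGS, RULES_A))
    print("Rule set B:", count_segments(STRINGS, RULES_B))
```

If the segment totals of two analyses differ for an identical source file, comparing the segmented source text side by side is usually the quickest way to see which rule fired differently.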



USTranslation
United States
Local time: 20:49
English
Homogeneity/reps take precedence
Mar 10, 2015

Hello, Mikhail!

Differences in analysis numbers may come from several things:

First, double-check that your TM really contains the source segments you are analyzing: use "Export to TMX", then view the exported file with Okapi Olifant (http://okapi.sourceforge.net/downloads.html).
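
If you prefer to check the export with a script instead of a viewer, something along these lines might work. It is only a sketch: it assumes a standard TMX layout (tu / tuv / seg elements, with the language in xml:lang or the older lang attribute), and the function names and the tiny sample TMX are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny made-up TMX sample so the sketch runs on its own.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="en-US"><seg>Save</seg></tuv>
  <tuv xml:lang="de-DE"><seg>Speichern</seg></tuv>
</tu>
</body></tmx>"""


def tmx_source_segments(tmx_text, source_lang):
    """Collect the text of every seg whose tuv language starts with source_lang."""
    root = ET.fromstring(tmx_text)
    found = set()
    for tuv in root.iter("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        if lang.lower().startswith(source_lang.lower()):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also picks up text inside inline tags.
                found.add("".join(seg.itertext()).strip())
    return found


def missing_from_tm(segments, tmx_text, source_lang="en"):
    """Return the source segments that have no exact counterpart in the TMX."""
    in_tm = tmx_source_segments(tmx_text, source_lang)
    return [s for s in segments if s.strip() not in in_tm]


if __name__ == "__main__":
    print(missing_from_tm(["Save", "Cancel"], SAMPLE_TMX))
```

For a real export you would read the .tmx file from disk and pass its contents in place of SAMPLE_TMX; exact string comparison will not account for tags or formatting differences that the CAT tool itself normalizes.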

If all your translations are there, check the following possibilities:

The "Project TMs and corpora" checkbox may be unchecked, causing the analysis to bypass the TM
The "Homogeneity" checkbox may be checked, enabling fuzzy matches from within the project with no TM
The "Repetitions take precedence over 100% matches" checkbox may be unchecked, causing all repetitions to be counted as 100% matches

Make sure "Project TMs and corpora" is checked. "Homogeneity" should be unchecked. "Repetitions take precedence over 100% matches" should be checked. "Disable cross-file repetitions" should also be checked.
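
To show how the repetitions setting alone can shift the numbers, here is a deliberately simplified Python model. It is not memoQ's actual algorithm: it only knows exact TM hits and in-document repeats, and the function name, the flag name and the sample data are assumptions for illustration.

```python
from collections import Counter


def analyse(segments, tm_sources, repetitions_take_precedence):
    """Classify each segment as "Repetition", "100%" or "No match" (toy model)."""
    seen = set()
    counts = Counter()
    for seg in segments:
        is_repeat = seg in seen      # already occurred earlier in the document
        in_tm = seg in tm_sources    # exact hit in the translation memory
        seen.add(seg)
        if is_repeat and (repetitions_take_precedence or not in_tm):
            counts["Repetition"] += 1
        elif in_tm:
            counts["100%"] += 1
        else:
            counts["No match"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample data.
    segments = ["OK", "Cancel", "OK", "Help", "Help"]
    tm = {"OK", "Cancel"}
    print("Reps take precedence:", dict(analyse(segments, tm, True)))
    print("TM takes precedence: ", dict(analyse(segments, tm, False)))
```

In this toy model the total stays the same either way; only the split between the "Repetition" and "100%" rows changes, which is the kind of shift to look for when comparing two analyses.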

Some of these options may only be available in memoQ 2014 R2 but I'm not sure.

Good luck!

Nick Lambson
U.S. Translation Company

