Very different analysis results of same source for different languages
Thread poster: Mikhail Kropotov

Mikhail Kropotov
Russian Federation
Member (2005)
English to Russian
Mar 9, 2015

I'm using memoQ 2013 R2 to coordinate translations of the same source file into four languages: Russian, French, German and Spanish. The source file contains software strings in .po format for localization.

I've had the file completely translated into all four languages, and all translations have been saved to their respective TMs. Then the development team had to change a couple of strings in the source file. Running statistics on the updated file, which is essentially the same as before, I get two very different sets of analysis results.

For Russian, Spanish and French I get:

Type          Segments   Source words
All                782           2962
Repetition           0              0
101%               765           2896
100%                12             35
95%-99%              0              0
85%-94%              0              0
75%-84%              2              7
50%-74%              2              9
No match             1             15

But for German, on the same source file, I get:

Type          Segments   Source words
All                728           2698
Repetition          11             25
101%               227            807
100%               302            881
95%-99%             18             42
85%-94%             26            136
75%-84%             65            346
50%-74%            112            519
No match            21            206

Could someone please explain this drastic difference, or maybe tell me where to look next to understand what causes it?

Thank you in advance for any ideas.

[Edited at 2015-03-09 15:11 GMT]


Rossana Triaca
Member (2002)
English to Spanish
Segmentation Rules
Mar 10, 2015

First thing that came to mind, given the different number of segments/words, is an issue with the segmentation rules.

Are you sure you are using the same segmentation rules for the source file in all the analyses? These are usually determined by the source language, but if you opened or edited the file and changed the language (or the encoding, or the wrapping) midway, this could explain the difference.
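As a quick sanity check on the segment and word counts, the rows of each table in the first post can be summed and compared with the "All" row. The sketch below is plain arithmetic on the quoted figures; the variable and function names are made up for illustration.

```python
# Rows copied from the two analysis tables in the first post:
# (segments, source words) per match category.
rus_fra_spa = {
    "Repetition": (0, 0),
    "101%": (765, 2896),
    "100%": (12, 35),
    "95%-99%": (0, 0),
    "85%-94%": (0, 0),
    "75%-84%": (2, 7),
    "50%-74%": (2, 9),
    "No match": (1, 15),
}
german = {
    "Repetition": (11, 25),
    "101%": (227, 807),
    "100%": (302, 881),
    "95%-99%": (18, 42),
    "85%-94%": (26, 136),
    "75%-84%": (65, 346),
    "50%-74%": (112, 519),
    "No match": (21, 206),
}

def totals(rows):
    """Sum segments and source words over all match categories."""
    segments = sum(s for s, _ in rows.values())
    words = sum(w for _, w in rows.values())
    return segments, words

print(totals(rus_fra_spa))  # (782, 2962)
print(totals(german))       # (782, 2962)
```

The German rows add up to 782 segments and 2962 words, the same as for the other three languages, whereas the German "All" row as posted says 728 and 2698. If that row is just a transposition typo, the segmentation would be identical and only the match distribution would differ, but that is a guess from the numbers alone.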


United States
Homogeneity/reps take precedence
Mar 10, 2015

Hello, Mikhail!

Differences in analysis numbers may come from several things:

First, you could double-check that your TM really contains the source segments you are analyzing: use "Export to TMX", then view the exported file with Okapi Olifant.
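If you would rather check the exported TMX with a script, here is a minimal sketch that counts how many translation units contain each language. It assumes a standard TMX layout (`<tu>` elements holding `<tuv xml:lang="...">` variants); the function name and the tiny sample are invented for illustration, and a real export would be read from a file and may use regional codes such as `de-de`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def count_units_per_language(tmx_text):
    """Count translation units (<tu>) that contain a <tuv> for each language."""
    root = ET.fromstring(tmx_text)
    counts = Counter()
    for tu in root.iter("tu"):
        # A set, so a language is counted once per unit.
        langs = {tuv.get(XML_LANG, "").lower() for tuv in tu.iter("tuv")}
        counts.update(langs)
    return counts

# Tiny made-up sample standing in for a real TMX export.
SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save file</seg></tuv>
      <tuv xml:lang="de"><seg>Datei speichern</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Cancel</seg></tuv>
      <tuv xml:lang="de"><seg>Abbrechen</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(count_units_per_language(SAMPLE))
```

Running this on the export of each TM and comparing the counts would show whether the German TM holds noticeably fewer units than the others.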

If all your translations are there, you could check the following:

The "Project TMs and corpora" checkbox may be unchecked, causing the analysis to bypass the TM
The "Homogeneity" checkbox may be checked, enabling fuzzy matches from within the project itself, without a TM
The "Repetitions take precedence over 100% matches" checkbox may be unchecked, causing all repetitions to be counted as 100% matches

Make sure "Project TMs and corpora" is checked. "Homogeneity" should be unchecked. "Repetitions take precedence over 100%" should be checked. "Disable cross-file repetitions" should also be checked.
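To illustrate why the "Repetitions take precedence" setting shifts the numbers, here is a deliberately simplified toy model. It is not memoQ's actual algorithm: the `classify` function, its parameters, and the sample strings are all assumptions made up for this sketch, and it only handles exact matches.

```python
from collections import Counter

def classify(segments, tm_sources, repetitions_take_precedence):
    """Toy analysis: bucket each segment as Repetition, 100% or No match."""
    seen = set()
    counts = Counter()
    for text in segments:
        is_repetition = text in seen   # already occurred earlier in the file
        in_tm = text in tm_sources     # exact match exists in the TM
        seen.add(text)
        if is_repetition and (repetitions_take_precedence or not in_tm):
            counts["Repetition"] += 1
        elif in_tm:
            counts["100%"] += 1
        else:
            counts["No match"] += 1
    return counts

segments = ["OK", "Cancel", "OK", "Help"]
tm_sources = {"OK", "Cancel"}

print(classify(segments, tm_sources, True))
print(classify(segments, tm_sources, False))
```

With the option on, the second "OK" lands in the Repetition row; with it off, the same segment is reported as a 100% match, so the same file and TM produce a different breakdown.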

Some of these options may only be available in memoQ 2014 R2 but I'm not sure.

Good luck!

Nick Lambson
U.S. Translation Company

