Pages in topic:   [1 2] >
memoQ's inconsistent statistics
Thread poster: Epameinondas Soufleros

Epameinondas Soufleros  Identity Verified
Greece
Local time: 06:27
Member (2008)
English to Greek
May 25

For more than three years, memoQ had a bug in Statistics: the count function reported a different total number of words than the analysis function.

I reported this problem twice, most recently a few days ago. They said they had resolved the bug and closed my support ticket.

Today they released memoQ 7.8.175, which is supposed to contain the fix, and both totals (from the count and analysis functions) are now the same.

But they have introduced a new bug: repetitions are now reported inconsistently, as you can see in the image:

[Image: screenshot of the Statistics results showing the inconsistent repetition figures]

Update: They claim that this is expected behaviour, since my project has already been translated and what used to be repetitions have become context matches. So, according to them, a repetition is not always a repetition. To me, this does not make sense.

[Edited at 2018-05-25 13:32 GMT]


 

Tomasz Sienicki  Identity Verified
Denmark
Local time: 05:27
Member (2007)
Danish to Polish
Check the settings May 25

Note the setting "Repetitions take precedence over 100%" in the Statistics dialogue. Have you by any chance unchecked the box?

 

Epameinondas Soufleros  Identity Verified
Greece
Local time: 06:27
Member (2008)
English to Greek
TOPIC STARTER
Repetitions do take precedence May 25

No, I have not: the box is checked.

 

Mirko Mainardi  Identity Verified
Italy
Local time: 05:27
Member
English to Italian
Homogeneity May 25

If that concerns "homogeneity" (aka internal repetitions and fuzzies), then that would actually be a good thing...

P.S. At any rate, I'm still (very negatively) amazed that (supposedly) professional (and expensive) software like this "needs" dozens of patches after its official release...
This obviously doesn't apply only to MQ (which I use every day).

[Edited at 2018-05-25 12:45 GMT]


 

Epameinondas Soufleros  Identity Verified
Greece
Local time: 06:27
Member (2008)
English to Greek
TOPIC STARTER
Homogeneity May 25

What does homogeneity have to do with it?

Is it a good thing to have a bug affecting word counts, on which quotes and payments are based? Is that what you are suggesting?


 

Mirko Mainardi  Identity Verified
Italy
Local time: 05:27
Member
English to Italian
Homogeneity May 25

Epameinondas Soufleros wrote:

What does homogeneity have to do with it?

Is it a good thing to have a bug affecting word counts, on which quotes and payments are based? Is that what you are suggesting?


IF (please note the word...) the first count doesn't use any TM, then I suppose the reps can only come from homogeneity. And IF that's the case, then the tool not counting them is not such a bad thing for us (translators)...

In other words, I was half-joking, also considering that "internal fuzzies" have become a thing with agencies and some end clients "thanks" to our dear CAT tool developers, something that hasn't exactly worked in our favor as service providers...


 

Epameinondas Soufleros  Identity Verified
Greece
Local time: 06:27
Member (2008)
English to Greek
TOPIC STARTER
Nothing to do with homogeneity May 25

Again, homogeneity plays no part here. And homogeneity affects only the analysis, not the count. I still can't see what you mean.

 

Philippe Etienne  Identity Verified
Spain
Local time: 05:27
Member
English to French
A repetition becomes a 101% match after translation May 25

I fail to see any use for the "Counts" part in practice when you also run an analysis. Pricing is based on the analysis, not the "Counts".
You can untick the option "Show counts" in the Analysis dialog. I've never noticed anything wrong with the "counts" part because I don't display it.

From the 2013 R2 version help:
Show counts: Check this check box to display the number of source segments, source words and source characters and the source-wordcount based percentage for all translatable segments, repetition and, if the check box labeled Include locked rows is checked, all locked segments within the given scope

Epameinondas Soufleros wrote:
...Update: They claim that this is expected behaviour, since my project has already been translated and what used to be repetitions have become context matches. So, according to them, a repetition is not always a repetition. To me, this does not make sense.

A repetition is a source sentence that is not in the TM and appears at least twice in a document.
The first occurrence of the repeated sentence is a 0% match. Subsequent occurrences are repetitions.
If the sentence already has a fuzzy match in the TM, it doesn't qualify as a repetition: all its occurrences are fuzzy matches with the same match percentage.

When the doc is translated, provided you committed all segments to the TM, everything is a 101% match.
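A minimal Python sketch of the rule described above, to show how the same document can report repetitions before translation and none after. It is a simplified, hypothetical model, not memoQ's actual algorithm: the TM is a plain dict of source to target strings, the fuzzy percentage is difflib's similarity ratio, the 50% fuzzy threshold is arbitrary, and context (101%) matches are not distinguished from 100% matches.

```python
from collections import Counter
from difflib import SequenceMatcher


def tm_match_percent(segment: str, tm: dict[str, str]) -> int:
    """Best similarity (0-100) between a segment and any TM source segment.

    Hypothetical scoring via difflib's ratio(); real CAT tools score differently.
    """
    best = max((SequenceMatcher(None, segment, src).ratio() for src in tm), default=0.0)
    return round(best * 100)


def classify(segments: list[str], tm: dict[str, str], fuzzy_threshold: int = 50) -> list[str]:
    """Label segments per the rule in the post: TM lookups first, then repetitions."""
    labels = []
    seen = set()  # segments that already occurred without any TM match
    for seg in segments:
        if seg in tm:
            labels.append("exact")  # 100%/101% lumped together: no context check here
            continue
        pct = min(tm_match_percent(seg, tm), 99)  # non-identical text is never exact
        if pct >= fuzzy_threshold:
            labels.append(f"fuzzy {pct}%")  # every occurrence gets the same percentage
        elif seg in seen:
            labels.append("repetition")  # second and later occurrences
        else:
            labels.append("no match")  # first occurrence counts as a 0% match
            seen.add(seg)
    return labels


doc = ["Press Start.", "Close the lid.", "Press Start.", "Press Start."]

# Before translation: empty TM, so the repeated sentence yields repetitions.
before = Counter(classify(doc, tm={}))
# After translation, with every segment committed to the TM.
after = Counter(classify(doc, tm={s: "<translation>" for s in doc}))
print(before["repetition"], after["repetition"])  # 2 0
```

The order of the checks is what matters: because the TM lookup runs before the repetition check, committing the translations to the TM turns every former repetition into a TM match, which mirrors the explanation memoQ support gave in the first post.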

Philippe

EDIT: On a finished real-world project, I was surprised to see that there was still a non-zero figure for repetitions. From my understanding above, there shouldn't have been any, since everything was confirmed (green). So I clicked "Confirm and update rows" and ticked the "Confirmed" box beside the "Edited" box. Now there are no more repetitions in the analysis of the finished file.

[Edited at 2018-05-25 15:01 GMT]


 

Epameinondas Soufleros  Identity Verified
Greece
Local time: 06:27
Member (2008)
English to Greek
TOPIC STARTER
A repetition is a repetition May 25

A repetition is a concept that has nothing to do with a TM or any other external resource. Repetitions are internal to a document or internal to a collection of documents.

By defining a repetition differently before and after translation, we introduce undesirable side-effects, which produce unexpected results. The notion of a repetition should remain constant at all times. Even if a segment in a document is similar to a segment in a TM, all other segments in the document that are identical to it are repetitions of such segment (and most probably will be filled in by propagating the translation entered in such segment).

Given the same input, a function should yield the same output every single time. Otherwise, it is unreliable and quirky, hence unfit for business purposes.

[Edited at 2018-05-25 14:59 GMT]


 

Philippe Etienne  Identity Verified
Spain
Local time: 05:27
Member
English to French
Definition of a repetition May 25

Epameinondas Soufleros wrote:
...By defining a repetition differently before and after translation...

There are no "repetitions" after translation.
A repetition that's translated is no longer a repetition, but a 101% match.
If you want to see the repetitions present before translation, look at the "Counts" part. There are NO repetitions, repeat NO repetitions, in the analysis of a finished translation (once you run the "Confirm and update rows" dialog with "Confirmed" ticked, as noted above).

Epameinondas Soufleros wrote:
The notion of a repetition should remain constant at all times.

It is. See my previous post.

Epameinondas Soufleros wrote:
...Even if a segment in a document is similar to a segment in a TM, all other segments in the document that are identical to it are repetitions of such segment...

No. All are xx% matches.

Either you handle the set of docs with "Counts" to see repetitions ("Counts" seem to be "statistics" rather than "analysis"), or you handle the set of docs with the "Analysis" to see the leverage from the TM.
I think you're mixing up both aspects here.

Philippe


 

mikhailo
Local time: 06:27
English to Russian
about before and after translation statistics May 25

Epameinondas Soufleros wrote:

A repetition is a concept that has nothing to do with a TM or any other external resource. Repetitions are internal to a document or internal to a collection of documents.

By defining a repetition differently before and after translation, we introduce undesirable side-effects, which produce unexpected results. The notion of a repetition should remain constant at all times. Even if a segment in a document is similar to a segment in a TM, all other segments in the document that are identical to it are repetitions of such segment (and most probably will be filled in by propagating the translation entered in such segment).

Given the same input, a function should yield the same output every single time. Otherwise, it is unreliable and quirky, hence unfit for business purposes.

[Edited at 2018-05-25 14:59 GMT]


You're forgetting about segment merging/splitting, which can change the statistics.

I think the cause of the problem is BAD DOCUMENTATION. MQ is one of the BIG 3 and has no FULL MANUAL (the help is fine as a reference guide, but not as a manual).
Another idea: separate internal statistics (project documents only: repetitions and no matches) from external statistics (project documents against resources such as TMs, TBs and LiveDocs: matches and no repetitions).
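A rough Python sketch of this proposal, which also fits Epameinondas's point that repetitions are internal to a document or a collection of documents. Everything here is hypothetical: the function names, the exact-match-only TM lookup (a set of source strings, no fuzzy scoring) and the whitespace-based word count are simplifications, not how memoQ works. The point is that the internal figures depend only on the documents, so they come out the same before and after translation.

```python
def word_count(segment: str) -> int:
    # Hypothetical: whitespace splitting; real tools tokenize more carefully.
    return len(segment.split())


def internal_stats(documents: list[list[str]]) -> dict[str, int]:
    """Project documents only: repetitions, never TM matches.

    A segment is a repetition if an identical source segment occurred earlier
    anywhere in the collection, so the result depends on the documents alone.
    """
    stats = {"unique": 0, "repetition": 0}
    seen = set()
    for doc in documents:
        for seg in doc:
            key = "repetition" if seg in seen else "unique"
            stats[key] += word_count(seg)
            seen.add(seg)
    return stats


def external_stats(documents: list[list[str]], tm: set[str]) -> dict[str, int]:
    """Documents against a TM: matches, never repetitions (exact lookup only)."""
    stats = {"tm_match": 0, "no_match": 0}
    for doc in documents:
        for seg in doc:
            stats["tm_match" if seg in tm else "no_match"] += word_count(seg)
    return stats


docs = [["Press Start.", "Close the lid."], ["Press Start.", "Open the lid."]]
committed_tm = {seg for doc in docs for seg in doc}  # TM after translation

# The internal report is identical before and after translation.
print(internal_stats(docs))                   # {'unique': 8, 'repetition': 2}
print(external_stats(docs, tm=set()))         # {'tm_match': 0, 'no_match': 10}
print(external_stats(docs, tm=committed_tm))  # {'tm_match': 10, 'no_match': 0}
```

Keeping the two reports apart means a filled TM can only change the external figures; the repetition count stays a pure function of the documents.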


 

Epameinondas Soufleros  Identity Verified
Greece
Local time: 06:27
Member (2008)
English to Greek
TOPIC STARTER
A repetition is a repetition May 25

Repetitions are not defined against a TM or any other resource external to a document or a collection of documents.

After translating a document, you still see the symbols denoting repetitions in memoQ's grid, don't you? (I mean the green dotted line and the green dotted arrows.) They don't go away, because those segments continue to be repetitions. A bunch of segments within a document or a collection of documents keep the same relation among them, no matter what external resources are attached to a project.

A TM match is a TM match and a repetition is a repetition. It's as simple as that. In other words, a repetition is a 100% internal match.

If analysis before translation compares a document's segments both within a document and against a TM, but analysis after translation only compares a document's segments against a TM, then those two analyses should be offered under different names: pre-translation analysis and post-translation analysis.

[Edited at 2018-05-25 15:28 GMT]

[Edited at 2018-05-25 15:38 GMT]


 

Mikhail Kropotov  Identity Verified
Russian Federation
Local time: 06:27
Member (2005)
English to Russian
Analyze repetitions before starting work May 25

That's why you should, ideally, create an analysis and save it as a log BEFORE you start working.

If you would like to calculate how many repetitions an already translated project had, you can create a new project, import the source files, and analyze it against an empty TM. Granted, that's some extra hassle, but it's still possible.

[Edited at 2018-05-25 15:28 GMT]


 

Mirko Mainardi  Identity Verified
Italy
Local time: 05:27
Member
English to Italian
OK May 25

Epameinondas Soufleros wrote:

Again, homogeneity plays no part here. And homogeneity affects only the analysis, not the count. I still can't see what you mean.


Mmmh... if homogeneity counts "internal" matches, and repetitions are necessarily internal, then why are repetitions not part of "homogeneity"? And, is there a difference between single- and cross-file repetitions?

And by the way, Epameinondas, I'm not trying to piss you off or counter what you are saying in any way. Just curious, as I'm a regular MQ user, but (fortunately) I don't have to count reps/fuzzies at the moment...


 

Epameinondas Soufleros  Identity Verified
Greece
Local time: 06:27
Member (2008)
English to Greek
TOPIC STARTER
Repetitions should be under homogeneity May 25

Mirko Mainardi wrote:
if homogeneity counts "internal" matches, and repetitions are necessarily internal, then why are repetitions not part of "homogeneity"?


Repetitions are essentially a homogeneity concept, but they existed before memoQ had the idea of homogeneity statistics (which Studio calls "internal fuzzies"), and that's why they're presented along with TM matches. The problem is the algorithm memoQ uses, which produces inconsistent results. If a + b = c, then, given the same values for a and b, c should always be the same. But memoQ thinks that c should vary over time. This is wrong and unexpected. And unexpected behaviour is a type of bug.

[Edited at 2018-05-25 15:45 GMT]


 