MemoQ - Why should it count less words than Trados?
Thread poster: Tomás Cano Binder, BA, CT

Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 14:22
Member (2005)
English to Spanish
+ ...
Sep 21, 2009

I am evaluating MemoQ extensively and am very happy with the tool in general, but I spotted something that does not seem right: apparently, MemoQ counts 2-5% fewer words than Trados. Yes, this could be great news for agencies, but for freelancers it could have an important bottom-line impact.

I initially spotted this on TTX files, but a simulation done with an actual file reveals that --as far as I can see-- MemoQ does grab everything from the presegmented TTX file: if I copy source to target, confirm everything, and export to TMX, the memory in Trados does translate the full TTX file in TagEditor, with some slight differences in segmentation. So it looks like I do get all segments to translate in MemoQ.

Now I am seeing that it happens with all sorts of files: MemoQ counts fewer words than Trados in medium to large files.

Do the following experiment with freely accessible files:
- Go to any medium-to-large Wikipedia article. Save as an HTML file.
- Analyse with Trados.
- Add the file to a project in MemoQ, select it and use the Statistics function.

The difference for the files I have tried (TTX, Word, PowerPoint, HTML) ranges from approximately 2% to 5%. For instance, for the "Spain" article in the English Wikipedia, I count 20,807 words in Trados, but only 19,275 words in MemoQ. That is a difference of 1,532 words, which is quite a lot. For business reasons, this leaves me no option but to use Trados for counting, which adds a step to every project and costs me time.

Is there a reason why the wordcount should be so different between Trados and MemoQ?



Grzegorz Gryc  Identity Verified
Local time: 14:22
French to Polish
+ ...
Words with apostrophe Sep 21, 2009

Tomás Cano Binder, CT wrote:


Now I am seeing that it happens with all sorts of files: MemoQ counts fewer words than Trados in medium to large files.

MemoQ counts fewer words in very small files too.

E.g. create a document with one short sentence like "I'm stuck."
Word counts 2 words.
Trados counts 3 words.
MemoQ counts 2 words, like Word.

Is there a reason why the wordcount should be so different between Trados and MemoQ?

The words with apostrophe are counted in a different way.
IMHO Trados is more logical here.
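
The "I'm stuck." example can be reproduced with a minimal sketch. It assumes the only difference between the two styles is whether an apostrophe acts as a word separator; the function names are illustrative and neither tool's real tokenizer is public.

```python
import re

def count_whitespace_words(text: str) -> int:
    """Count tokens separated by whitespace (the Word/MemoQ-style result above)."""
    return len(text.split())

def count_apostrophe_split_words(text: str) -> int:
    """Count tokens when apostrophes also separate words (the Trados-style result above)."""
    # Split on runs of whitespace or apostrophes (straight or typographic), drop empty pieces.
    return len([t for t in re.split(r"[\s'’]+", text) if t])

sentence = "I'm stuck."
print(count_whitespace_words(sentence))        # 2
print(count_apostrophe_split_words(sentence))  # 3
```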

Cheers
GG

[Edited at 2009-09-21 08:44 GMT]



Gergely Vandor
Hungary
Local time: 14:22
English to Hungarian
what is a word anyway? :) Sep 21, 2009

Hello Tomas,

Grzegorz has already pointed out one of the many causes. The most fundamental problem is actually counting words, which in fact isn't trivial. The problem is not between Trados and MemoQ, but between basically any two pieces of software that count words. I don't think you can produce identical statistics in two translation tools for any average document.

For example, "I'm" or "a/c" is considered one word in MemoQ, but I believe both are two words for Trados.

There is also the fact that the two tools might not extract exactly the same text content from documents. For example, Trados TagEditor skips number-only segments, which is very wrong in my opinion, because in many cases the source and target languages use different number formats. Tools may or may not import a generated table of contents or an index. Import options and their defaults can also affect this.

Best regards,
Gergely Vandor

Kilgray



Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 14:22
Member (2005)
English to Spanish
+ ...
TOPIC STARTER
OK, but there must be other reasons Sep 21, 2009

Grzegorz Gryc wrote:
Word counts 2 words.
Trados counts 3 words.
MemoQ counts 2 words, like Word.


I see! Thanks a lot for the note, Grzegorz. However, the "Spain" article in Wikipedia contains about 290 words with apostrophes, so there is still a difference of about 1,200 words unexplained. There must be other reasons.
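
The arithmetic behind that estimate, as a short worked sketch using the figures from this thread. It assumes each word with an apostrophe adds exactly one extra word to the Trados count, which is a simplification. The gap for this particular article comes out at about 7.4% of the Trados count.

```python
trados_words = 20_807     # Trados count for the "Spain" article
memoq_words = 19_275      # MemoQ count for the same article
apostrophe_words = 290    # approximate number of words with apostrophes

gap = trados_words - memoq_words
# Assumption: one extra Trados word per apostrophe word.
unexplained = gap - apostrophe_words
gap_pct = 100 * gap / trados_words

print(gap)                # 1532
print(unexplained)        # 1242
print(round(gap_pct, 1))  # 7.4
```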

I wonder whether it would be possible for MemoQ to emulate Trados' way of counting if we select the TRADOS-like radio button in the Statistics dialog? That way we would not need to use Trados to analyse files for our customers using Trados!



Grzegorz Gryc  Identity Verified
Local time: 14:22
French to Polish
+ ...
Dashes, hyphens... Sep 21, 2009

Tomás Cano Binder, CT wrote:

Grzegorz Gryc wrote:
Word counts 2 words.
Trados counts 3 words.
MemoQ counts 2 words, like Word.


I see! Thanks a lot for the note, Grzegorz. However, the "Spain" article in Wikipedia contains about 290 words with apostrophes, so there is still a difference of about 1,200 words unexplained. There must be other reasons.

Dashes.
E.g., the Catalan donar-te'ls-hi is 1 (one) word for Word and MemoQ and 4 (four) for Trados.
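
A rough sketch of how the choice of separators produces these numbers. It assumes a counter that splits on whitespace plus a configurable set of extra characters; `count_words` and `extra_separators` are hypothetical names, not anything from Trados or MemoQ.

```python
import re

def count_words(text: str, extra_separators: str = "") -> int:
    """Count tokens split on whitespace plus any character in extra_separators."""
    # Build a character class such as [\s'\-]+ from the requested separators.
    pattern = "[" + r"\s" + re.escape(extra_separators) + "]+"
    return len([t for t in re.split(pattern, text) if t])

word = "donar-te'ls-hi"
print(count_words(word))                          # 1 (whitespace only)
print(count_words(word, extra_separators="'-"))   # 4 (donar, te, ls, hi)
```

With only hyphens as extra separators, a word like in-bound counts as two words, which matches the hyphen behaviour described in the help text quoted below.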

I wonder whether it would be possible for MemoQ to emulate Trados' way of counting if we select the TRADOS-like radio button in the Statistics dialog? That way we would not need to use Trados to analyse files for our customers using Trados!

Just ask the developers to correct the bug
It should work...

Quoting MemoQ help

Word counts area:

· MemoQ: Check this checkbox to display MemoQ word counts.

Note: In MemoQ, similarly to Microsoft® Excel®, every string or character that is between whitespaces is counted as a word. Therefore in MemoQ mode you always count numbers as a single word and hyphenated words like in-bound are also considered to be a single word.

· TRADOS-like: Check this checkbox to display Trados-like word counts. SDL Trados® is another CAT tool on the market that handles word counts differently.

Note: In Trados®, numbers are only counted as words when they are within a segment, hyphenated words are counted as two words, and a number of other rules apply. In Trados®, segmentation is a factor in word count, i.e. you can get a different word count if the same text appears in one or two lines. Trados® segmentation rules are not public, therefore there is usually a small discrepancy between the word counts of Trados® and Trados-mode MemoQ. In most of the cases, this discrepancy does not exceed 1.5%. We suggest that you only use Trados-like word counts if your client explicitly requires you to do so.


Cheers
GG



Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 14:22
Member (2005)
English to Spanish
+ ...
TOPIC STARTER
Looks like a bug indeed! Sep 21, 2009

Grzegorz Gryc wrote:
Just ask the developers to correct the bug
It should work...

Indeed I will. I certainly believe this is a bug in my edition: counting a large file in MemoQ style and in Trados style yields the same total word count, whereas the two counts should differ in a medium-to-large file.

Thank you so much Grzegorz!



Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 14:22
Member (2005)
English to Spanish
+ ...
TOPIC STARTER
It's about business, not about translation effort Sep 21, 2009

Gergely Vandor wrote:
Grzegorz has already pointed out one of the many causes. The most fundamental problem is actually counting words, which in fact isn't trivial. The problem is not between Trados and MemoQ, but between basically any two pieces of software that count words. I don't think you can produce identical statistics in two translation tools for any average document.

I entirely agree Gergely and I appreciate your information.

To me this is really a business matter, as some of my customers want me to send them an analysis of the files. I would very much prefer to send them figures as Trados calculates them, which are higher and will match what my TRADOS-based customers get with their tool.

If MemoQ could mimic Trados' counting practices as much as possible (when the TRADOS-like wordcount is enabled in the Statistics dialog box), it would help me, and probably a lot of other users who have TRADOS-based customers.



Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 14:22
Member (2005)
English to Spanish
+ ...
TOPIC STARTER
A reply from Kilgray's support Sep 21, 2009

I emailed Kilgray's support about this at 12:08 today. The reply arrived at 13:08. That is one hour. Even if the matter is not resolved immediately, I think this is great support. Thank you very much!


Grzegorz Gryc  Identity Verified
Local time: 14:22
French to Polish
+ ...
Non breaking spaces... LOL... :) Sep 21, 2009

Gergely Vandor wrote:

Grzegorz has already pointed out one of the many causes. The most fundamental problem is actually counting words, which in fact isn't trivial. The problem is not between Trados and MemoQ, but between basically any two pieces of software that count words. I don't think you can produce identical statistics in two translation tools for any average document.

True.
The wordcount may be different even in different versions of the same software.

PS.
The case of non-breaking spaces is funny.
In MemoQ and Trados 2006, a text like "Pussy cat" with nbsp is reported as one word.
Trados 2007 Suite counts it already as two words...
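
The same ambiguity is easy to reproduce in code. In this sketch (Python, with illustrative function names), the built-in `str.split()` treats the non-breaking space U+00A0 as whitespace, while a splitter limited to ordinary spaces, tabs and line breaks does not. Which of the two a given CAT tool version does internally is not something this sketch can tell you.

```python
import re

NBSP = "\u00a0"
text = f"Pussy{NBSP}cat"

def count_nbsp_as_separator(s: str) -> int:
    """str.split() with no arguments splits on all Unicode whitespace, NBSP included."""
    return len(s.split())

def count_nbsp_as_glue(s: str) -> int:
    """Split only on ordinary spaces, tabs and line breaks, so NBSP joins words."""
    return len([t for t in re.split(r"[ \t\r\n]+", s) if t])

print(count_nbsp_as_separator(text))  # 2
print(count_nbsp_as_glue(text))       # 1
```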

Cheers
GG



Gergely Vandor
Hungary
Local time: 14:22
English to Hungarian
does memoQ count less or more? Sep 30, 2009

Hello Tomás,

Tomás Cano Binder, CT wrote:

If MemoQ could mimic Trados' counting practices as much as possible (when the TRADOS-like wordcount is enabled in the Statistics dialog box), it would help me, and probably a lot of other users who have TRADOS-based customers.


You could use Trados for quoting if you absolutely prefer the Trados results. I've just seen an earlier complaint from a prospective customer about memoQ consistently counting more words than Word or Trados, for the material they tested.

Counting words is far from obvious; every tool does it differently. I'm not sure I see how a Trados-like count could be "better". MemoQ in fact takes a simplistic approach, and basically defines a word as a string of characters separated by whitespace. This can be fewer or more than what Trados counts. This leads to another question: isn't it way easier to measure effort by counting characters instead?

Also, I find it alarming how readily people accept these "CAT wordcounts" as "precise" basis for a quotation, or payment to a translator. The quality of the TMs and terminology (if there is any) is just as important. Not to mention the quality of the source material: how many tags will I encounter? Is it over-formatted? Are there many typos in the source? Are there editing problems preventing sane segmentation? Is it easy to see where and how the segments will turn up in the final document?

Best regards,
Gergely



EJZ
Local time: 14:22
Polish to English
+ ...
counting characters indeed Oct 1, 2009

Gergely Vandor wrote:
This leads to another question: isn't it way easier to measure effort by counting characters instead?
Gergely


Precisely, counting characters is obviously an easier and more equitable way of measuring effort (for one, words can have anything from 1 to 'n' characters and as such are not comparable units); a variation of this is exactly what we predominantly use in Poland - specifically the work calculation unit is a page (a started page usually counts as a full page - this problem could of course easily be avoided by applying character count alone) defined as a specific number of characters depending on the nature of the work (certified or not, specialist or general, etc.).

Cheers
Eryk



Grzegorz Gryc  Identity Verified
Local time: 14:22
French to Polish
+ ...
Polish standard Oct 2, 2009

EJZ wrote:

Gergely Vandor wrote:
This leads to another question: isn't it way easier to measure effort by counting characters instead?


Precisely, counting characters is obviously an easier and more equitable way of measuring effort (for one, words can have anything from 1 to 'n' characters and as such are not comparable units); a variation of this is exactly what we predominantly use in Poland

Unless Trados is used...
As Trados counts chars without spaces, it's more difficult for project managers to have a reliable char count.
Of course, you may have an approximate count if you add the number of words but the addition is a complex operation
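
That approximation could look like the sketch below. It assumes words are separated by single spaces, so a segment of n words contains n - 1 spaces and the estimate overshoots by one per segment. The function name and the sample sentence are made up for illustration.

```python
def estimate_chars_with_spaces(chars_without_spaces: int, words: int) -> int:
    """Approximate a with-spaces character count from two figures an analysis reports."""
    # Assumption: about one space per word; this overshoots by one per segment.
    return chars_without_spaces + words

sample = "The cat sat on the mat."
chars_without_spaces = len(sample.replace(" ", ""))  # 18
words = len(sample.split())                          # 6

print(estimate_chars_with_spaces(chars_without_spaces, words))  # 24 (estimate)
print(len(sample))                                               # 23 (actual)
```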

- specifically the work calculation unit is a page

According to the Polish Standard, 1800 chars (including blank chars).

(a started page usually counts as a full page

It's obligatory only for sworn translations.
For "normal" ones, other solutions may be used (e.g. fractions of page).

- this problem could of course easily be avoided by applying character count alone) defined as a specific number of characters depending on the nature of the work (certified or not,

For sworn translations, the 1125-character page is used (defined by law).

specialist or general, etc.).

Here, some translation offices try to "cheat" on the prices and impose some "irregular" pages, e.g. 1500 with spaces or 1800 without spaces.
So, in the last case (or similar), a common joke is to deliver (or negotiate to deliver) the text without spaces.
The standard is clear enough and I don't think it's necessary to multiply page definitions.

BTW, the rate per line (mainly 55 characters) used in some countries is based on a similar principle but the rounding is different (according to the Polish Standard, a standard typewritten page has 30 lines x 60 chars).
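
A minimal sketch of the page arithmetic described above, assuming the character count already includes spaces. The function name, the `round_up` switch (for the "started page counts as a full page" rule) and the sample figure of 4,500 characters are illustrative only.

```python
import math

LINES_PER_PAGE = 30
CHARS_PER_LINE = 60
STANDARD_PAGE = LINES_PER_PAGE * CHARS_PER_LINE  # 1800 characters including spaces

def page_count(char_count: int, page_size: int = STANDARD_PAGE, round_up: bool = False) -> float:
    """Convert a character count into pages, optionally billing a started page as a full one."""
    exact = char_count / page_size
    return math.ceil(exact) if round_up else exact

print(STANDARD_PAGE)                     # 1800
print(page_count(4500))                  # 2.5
print(page_count(4500, round_up=True))   # 3
```

Other page definitions can be passed as `page_size`, which makes the effect of an "irregular" page on the final figure easy to compare.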

Cheers
GG

[Edited at 2009-10-02 08:44 GMT]



Grzegorz Gryc  Identity Verified
Local time: 14:22
French to Polish
+ ...
Honest quotations vs wordcounts... Trados... Oct 2, 2009

Gergely Vandor wrote:


Counting words is far from obvious; every tool does it differently. I'm not sure I see how a Trados-like count could be "better". MemoQ in fact takes a simplistic approach, and basically defines a word as a string of characters separated by whitespace. This can be fewer or more than what Trados counts. This leads to another question: isn't it way easier to measure effort by counting characters instead?

Seconded.
See my previous post.

Also, I find it alarming how readily people accept these "CAT wordcounts" as "precise" basis for a quotation, or payment to a translator. The quality of the TMs and terminology (if there is any) is just as important.

Seconded.
No comments.

Not to mention the quality of the source material: how many tags will I encounter? Is it over-formatted?

I try to explain to my students that they should always take the formatting into account.
IMHO the tags should be paid by default but it's difficult to force the TOs to accept it...

Are there many typos in the source? Are there editing problems preventing sane segmentation? Is it easy to see where and how the segments will turn up in the final document?

Seconded.

But the problem is the Trados wordcount became a de facto standard and some Trados compatibility is simply convenient for most of us.

Cheers
GG



Gergely Vandor
Hungary
Local time: 14:22
English to Hungarian
does Trados Studio count the same as Trados 2007 Oct 3, 2009

Grzegorz Gryc wrote:

But the problem is the Trados wordcount became a de facto standard and some Trados compatibility is simply convenient for most of us.



Which version of Trados is the standard? Does Studio count the same as 2007?

Let's not forget that this "de facto standard" is officially dead now, if we are talking about the "old" Trados. All those that pretend nothing has happened and go on with the old Trados without a plan for the future are making a big mistake in my opinion.

Gergely



Grzegorz Gryc  Identity Verified
Local time: 14:22
French to Polish
+ ...
So called Trados wordcount... Oct 3, 2009

Gergely Vandor wrote:

Grzegorz Gryc wrote:

But the problem is the Trados wordcount became a de facto standard and some Trados compatibility is simply convenient for most of us.


Which version of Trados is the standard? Does Studio count the same as 2007?

I dunno.
Probably not.
I can check it for you using some fancy examples.
It may take some time (normally, when I'm posting here, it's not because I have nothing to do, I just try to stand my migraines... the flame wars are easier than translation...).

Let's not forget that this "de facto standard" is officially dead now, if we are talking about the "old" Trados.

Of course.
We're talking about the "old" Trados.
In my neighborhood, no translation office uses Trados 2009.
But the Trados 2007 series will remain alive for a long time in the "translation ecosystem".
Sorry for quoting SDL marketing language

All those that pretend nothing has happened and go on with the old Trados without a plan for the future are making a big mistake in my opinion.

You're right but the reality bites.
A "Trados like" wordcount (using apostrophes, hyphens, nbsp etc. as separators) is just convenient and it will remain convenient for at least the next 2-3 years.
Then, we'll see.

I agree, the Trados approach is obsolete, but you still have a bug in the "Trados like" wordcount.

Cheers
GG

