Word count in memoQ produces 377 words less than word count in CafeTran!!!
Thread poster: Michael Beijer

Michael Beijer  Identity Verified
United Kingdom
Local time: 11:37
Member (2009)
Dutch to English
+ ...
Jun 19, 2014

I just did a word count on the same text with 4 different tools (higher number is better):

CafeTran: 2,641 words
Word: 2,615 words
AnyCount: 2,587 words

memoQ: 2,264 words


Quite interesting, wouldn’t you say? What the %$£* just happened between the first three programs (which are all more or less similar) and memoQ?

2,641 (CafeTran) − 2,264 (memoQ) = 377 words

That is, if I count the text in memoQ, rather than in CafeTran (my CAT tool), I get paid 377 words less!

Michael

[Edited at 2014-06-19 22:03 GMT]



Samuel Murray  Identity Verified
Netherlands
Local time: 12:37
Member (2006)
English to Afrikaans
+ ...
Impossible to comment without seeing the file Jun 20, 2014

Michael Beijer wrote:
I just did a word count on the same text with 4 different tools...


It is impossible to give any sort of response to this without seeing the file that you did a word count for.



Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 12:37
Member (2005)
English to Spanish
+ ...
Normal... Jun 20, 2014

I'm afraid this is normal. Each tool counts the words in a slightly different way. For instance, I think memoQ counts words separated with a slash ("pcs/month") as one word, while other tools count two words.
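To make the slash example concrete, here is a minimal Python sketch of two tokenising rules. Both functions are hypothetical illustrations and do not reproduce the actual algorithm of memoQ or any other tool; the sample sentence is invented.

```python
import re

def count_whitespace(text: str) -> int:
    """Count every run of non-whitespace characters as one word."""
    return len(text.split())

def count_split_slash(text: str) -> int:
    """Like count_whitespace, but a slash also separates words."""
    return len([tok for tok in re.split(r"[\s/]+", text) if tok])

# Invented sample sentence containing two slash-joined pairs.
sample = "Output is 500 pcs/month and/or more"

print(count_whitespace(sample))   # "pcs/month" and "and/or" count once each
print(count_split_slash(sample))  # each slash pair counts as two words
```

Under these assumptions the same sentence gives two different totals, and the gap grows with every slash-joined pair in the text.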

In another forum about this same matter, another colleague once asked "What is a word?" Although I found the question a bit shocking at the beginning, I now believe that it makes sense if you consider it from the point of view of CAT tools.

What you need is to agree the method of counting with your customer. If they do not count the files themselves, then the wordcount of your usual CAT tool is probably the simplest for you in the long run.

[Edited at 2014-06-20 05:45 GMT]



neilmac  Identity Verified
Spain
Local time: 12:37
Spanish to English
+ ...
Swings and roundabouts Jun 20, 2014

I think it's par for the course. Some you win, some you lose.

Yesterday, a client sent me a PDF to translate. When I converted it to Word, a message came up saying "unknown source code" or something like that, and the text appeared scrambled on screen. With the MS Word counter, the number of words came up as roughly 2100. I told the client and explained that the translation would take much longer if I had to fix the format/appearance of the text, and asked them if they could send it to me again. The client duly agreed and shortly afterwards sent me a decent looking copy of the text in MS Word to work with; however, I was surprised to find that this time the word count came up as roughly 1800 words. As I have plenty of work on these days, as far as I'm concerned, the shorter the texts for translation are, the better. However, I can see that it might be prejudicial for someone who was desperate for work or money. Luckily for me, right now that isn't an issue.



Thomas Rebotier  Identity Verified
Local time: 03:37
English to French
well known Jun 20, 2014

yep... add to this so-so segmentation and ugly tags. Memo-Q has been targeted at agencies not freelancers...


Meta Arkadia
Local time: 17:37
English to Indonesian
+ ...
Yep Jun 20, 2014

Thomas Rebotier wrote:
Memo-Q has been targeted at agencies not freelancers...

First you samsung DejaVu, then you samsung SDL Trados.

Cheers,

Hans



Michael Beijer  Identity Verified
United Kingdom
Local time: 11:37
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
@Samuel: Jun 20, 2014

Samuel Murray wrote:

Michael Beijer wrote:
I just did a word count on the same text with 4 different tools...


It is impossible to give any sort of response to this without seeing the file that you did a word count for.


Hi Samuel,

You are right of course. Because I want to get to the bottom of this I have selected a sample Word document I found online and am going to test count it in every tool I have and do a comparison. Feel free to do the same. I propose we test this file:

http://www.unece.org/stats/documents/ece/ces/ge.30/2011/4.e.doc

-----------------------------------------*

Incidentally, I also posted this on the memoQ mailing list, and Michał Skoczyński recommended reading Paul Filkin’s very interesting blog article on this subject:

http://multifarious.filkin.com/2012/11/13/wordcount/ (well worth a read!)

I just re-ran the word count referred to in my first post in AnyCount, but this time selected ‘Skip numbers’ (available under Settings), and I now get a count that is almost as low as memoQ’s count:

CafeTran: 2,641 words
Word: 2,615 words
AnyCount: 2,587 words
AnyCount (Skip numbers): 2,348 words
memoQ: 2,264 words
Client’s count (Studio): 2,264 words
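As a rough illustration of what these differences mean in practice, the following Python sketch computes how far each count falls short of the CafeTran count, in words and as a percentage. The figures are the ones listed above; the function name `shortfall` is just a hypothetical helper.

```python
from typing import Tuple

# Word counts reported in this post for the same text.
counts = {
    "CafeTran": 2641,
    "Word": 2615,
    "AnyCount": 2587,
    "AnyCount (Skip numbers)": 2348,
    "memoQ": 2264,
    "Client's count (Studio)": 2264,
}

def shortfall(reference: int, other: int) -> Tuple[int, float]:
    """Return the difference in words and as a percentage of the reference."""
    diff = reference - other
    return diff, 100.0 * diff / reference

base = counts["CafeTran"]
for tool, n in counts.items():
    diff, pct = shortfall(base, n)
    print(f"{tool}: {n} words, {diff} fewer than CafeTran ({pct:.1f}%)")
```

For the memoQ count this gives the 377-word difference from the first post, which is a little over 14% of the CafeTran total.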


-----------------------------------------*

Although I translate in CafeTran these days, up until yesterday I was still regularly doing all my word counts in memoQ (because I like its handy Export as Trados-compatible .csv feature*, which I use as input for the CATCount program in my invoicing tool TO3000). Well, not anymore.

From now on, all my word counts are going to be done either in CafeTran, or AnyCount. I am also going to put a notice on my website that I no longer accept memoQ or Studio counts as they do not reflect the amount of work I have to put into a text.

‘Skip numbers’? Are they crazy? Numbers might not always take as long as words, but they do need to be translated, and every single one of them needs to be manually checked. Not charging for them is therefore idiotic. It might be acceptable to implement some kind of ‘Number weight’ feature, similar to memoQ’s relatively new ‘Tag weight’ idea (See http://kilgray.com/memoq/60/help-en/index.html?statistics_dialog.html for info on what that is), although I think all numbers should just be counted normally.

Strangely, I just read the following on the page I just referred to, which would seem to indicate that memoQ does count numbers:

Note: In memoQ, similarly to Microsoft® Excel®, every string or character that is between whitespaces is counted as a word. Therefore in memoQ mode you always count numbers as a single word and hyphenated words like in-bound are also considered to be a single word.

However, the only way I can explain the very low word counts I am getting with memoQ is that it is not counting numbers.
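The effect of skipping numbers, and of the hypothetical ‘Number weight’ idea mentioned above, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: words are whitespace-separated strings (as in the memoQ note quoted above), a "number" is any token that contains a digit and no letters, and `number_weight` is an invented parameter, not a setting in any real tool.

```python
def is_number(token: str) -> bool:
    """Assumption: a number is a token with at least one digit and no letters."""
    has_digit = any(ch.isdigit() for ch in token)
    has_letter = any(ch.isalpha() for ch in token)
    return has_digit and not has_letter

def weighted_count(text: str, number_weight: float = 1.0) -> float:
    """Count whitespace-separated tokens; numbers contribute number_weight each."""
    total = 0.0
    for token in text.split():
        total += number_weight if is_number(token) else 1.0
    return total

# Invented sample with three numeric tokens and one hyphenated word.
sample = "In 2011 the index rose 4.5 points to 1,234 for in-bound goods"

print(weighted_count(sample, 1.0))  # numbers counted as ordinary words
print(weighted_count(sample, 0.0))  # equivalent to skipping numbers
print(weighted_count(sample, 0.5))  # numbers charged at half a word
```

With a weight of 1.0 every number counts as one word, a weight of 0.0 behaves like ‘Skip numbers’, and anything in between is the compromise described above.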

-----------------------------------------*

Anyway, so I suggest we all run some test counts on the file I mentioned (http://www.unece.org/stats/documents/ece/ces/ge.30/2011/4.e.doc ) and try to figure out what is going on here.

Michael

* Statistics > Export > CSV (Per-file, Trados compatible)



Michael Beijer  Identity Verified
United Kingdom
Local time: 11:37
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
Idiotic, LSP-oriented software Jun 20, 2014

Tomás Cano Binder, CT wrote:

I'm afraid this is normal. Each tool counts the words in a slightly different way. For instance, I think memoQ counts words separated with a slash ("pcs/month") as one word, while other tools count two words.

In another forum about this same matter, another colleague once asked "What is a word?" Although I found the question a bit shocking at the beginning, I now believe that it makes sense if you consider it from the point of view of CAT tools.

What you need is to agree the method of counting with your customer. If they do not count the files themselves, then the wordcount of your usual CAT tool is probably the simplest for you in the long run.

[Edited at 2014-06-20 05:45 GMT]


Hi Tomás,

Two words separated with a slash are two words, and ought to be counted as such.

As I mentioned above, I am going to do a comparison, figure out which CAT or word counting tool produces a count that most accurately reflects the amount of work that needs to be done, by me, the translator, and use and accept only that. Luckily, these days I’m in a position to pick and choose my jobs and clients and thus free to choose not to use idiotic, LSP-oriented tools.

I used to really love memoQ, but they have been treading on slippery ground recently.

Michael



Michael Beijer  Identity Verified
United Kingdom
Local time: 11:37
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
my results so far … Jun 20, 2014

Samuel Murray wrote:

Michael Beijer wrote:
I just did a word count on the same text with 4 different tools...


It is impossible to give any sort of response to this without seeing the file that you did a word count for.


Here are my results so far for the following Word document: http://www.unece.org/stats/documents/ece/ces/ge.30/2011/4.e.doc

Word counts (highest first):

CafeTran:                6954 total, 378 repetitions
AnyCount:                6950 total
PractiCount:             6901 total
MSWord:                  6899 total
memoQ (memoQ):           6880 total, 333 repetitions
memoQ (TRADOS-like):     6854 total, 328 repetitions
AnyCount (Skip numbers): 6701 total
EmEditor:                6315 total

I installed OmegaT (3.1.1, update 1) and tried to run a word count (Tools > Statistics & Tools > Match Statistics), but don’t quite know how to interpret the results:

‘Total’ (7760)
‘Remaining’ (7760)
‘Unique’ (6629)
‘Unique Remaining’ (6629)

Repetitions: 1131
No match: 6593
Total: 7760


Michael

[Edited at 2014-06-20 22:33 GMT]

[Edited at 2014-06-20 22:34 GMT]



Michael Beijer  Identity Verified
United Kingdom
Local time: 11:37
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
I know which tool *I* will be using to do my word counts from now on… Jun 20, 2014

…CafeTran.

I would also like to quote a reply from a fellow CafeTran user in the CafeTran mailing list today:

Dear Michael,

I think that memoQ and Studio are written with the LSP in mind whereas CafeTran is written by a translator for fellow translators.

And no, I don't think I'm exaggerating here. I think that this difference is exactly the reason why memoQ and Studio both are bloatware, whereas important, basic features are missing (e.g. auto-assembling in Studio). But hey, you can make a 15-language project with them in a second, leaving out all numbers, cross matches, perfect matches, and god knows what other matches.

Kilgray is trying to have a nice, sympathetic image (with fests and declarations of independence etc.). But the truth is that it is copying SDL in its focus on LSPs and in calculation schemes that aren't in the interest of freelancers. Not only the counting algorithms are LSP oriented (which isn't per se end customer oriented) but also the way of delivery of translation jobs. Just push a new job in your server edition and let the freelancer log in and try to download your 4600 words project with only 23 new words from your sloooow server. Not your problem that the freelancer loses valuable time with this transfer shit. What happened to good old e-mail and sending links to XLIFF files stored on Dropbox?

There's a war going on out there. Just read what one of the most vocal fans of memoQ writes about SDL. Personally I think that memoQ has gone the wrong direction too. "We never supported shared folders", is the reply one gets when one complains about broken support for Parallels. The same person at Kilgray wrote enthusiastic stories about memoQ well-behaving in Parallels before. But hey, LSPs don't ask for Parallels support. Only some lunatic freelancer who wants to use a Mac.

Thank god that we have an alternative.

Cheers,

Hans

Source: https://groups.google.com/forum/?fromgroups=#!topic/cafetranslators/b_iP_XcIgdw


Michael



Alex Lago  Identity Verified
Spain
Local time: 12:37
Member (2009)
English to Spanish
+ ...
Let's have the LSPs do the selling Jun 22, 2014

Imagine an imaginary world in which CAT tool vendors face a dilemma:

How do they get translators to use their product?

Option 1: make superb software and sell it at reasonable prices

Option 2: make mediocre bloatware and sell it at high prices

Option 2 sounds best to them because mediocre software takes less programming than superb software and selling at high prices is better than selling at reasonable prices.

How can they do this, though? If their competition offers better software at lower prices, how can they make people choose their software?

Eureka, they had an epiphany, they decided to make their software ideal for big LSPs, those that have thousands of translators, and make them force the translators to use the software.


I'm glad I don't live in that imaginary world, or do I?



Michael Beijer  Identity Verified
United Kingdom
Local time: 11:37
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
… a baroque monstrosity such as SDL Studio or Across … Jun 22, 2014

Alex Lago wrote:

Imagine an imaginary world in which CAT tool vendors face a dilemma:

How do they get translators to use their product?

Option 1: make superb software and sell it at reasonable prices

Option 2: make mediocre bloatware and sell it at high prices

Option 2 sounds best to them because mediocre software takes less programming than superb software and selling at high prices is better than selling at reasonable prices.

How can they do this, though? If their competition offers better software at lower prices, how can they make people choose their software?

Eureka, they had an epiphany, they decided to make their software ideal for big LSPs, those that have thousands of translators, and make them force the translators to use the software.


I'm glad I don't live in that imaginary world, or do I?


Hi Alex,

I remember hearing somewhere just how much of Kilgray’s earnings derive from their Server customers, compared to how much they make off of their memoQ Pro product (aimed at freelancers). That’s the problem.

Once a CAT tool has gained a certain number of users, it becomes logical to go after the big boys, the LSPs and the large organisations. However, as more and more LSP-centric functionality is added, the tool will start to morph into a baroque monstrosity such as SDL Studio or Across. All kinds of things will start to change, and none of them will benefit us, the actual translators. However, the more money they earn, the more money they will have to spend on advertising. Ads for their bloated monsters will pop up everywhere, tricking naïve translators into buying into their myth, their narrative. ‘Buy our tool because without it you will be missing out on most of the jobs.’ ‘Our tool is necessary, as it is the de facto standard.’ Etc. Etc. Etc. All lies, but very easy to believe if you are a person who is just starting out and who doesn’t have much work yet, and especially if you are not very tech savvy. People want to believe. Especially if you’ve managed to charge them a ridiculous amount of money for your software, which will mean that the poor person will have even more of an incentive to stick with your terrible program, or so they tell themselves. Head in the sand and ‘Oh, I suppose I just need to fork out for yet another ‘Gold Star SDL Training Package’ … maybe I’m just not committed enough to mastering this amazing, complex tool...’.

Interesting (smaller is better):

– Across Personal Edition: god knows
– SDL Studio 2014: 350 MB
– Wordfast Pro: 164 MB
– memoQ 2014: 67.81 MB
– CafeTran: 3.5 MB

Michael



[Edited at 2014-06-22 16:39 GMT]



Samuel Murray  Identity Verified
Netherlands
Local time: 12:37
Member (2006)
English to Afrikaans
+ ...
WFP, WFC, OmegaT, Trados 2007 Jun 22, 2014



WFP:

Repetitions: 350
Internal fuzzies
95%-99%: 117
85%-94%: 53
75%-84%: 111
50%-74%: 0
No Match: 6161
Total: 6792

WFC:

Repetitions: 428
00%-49%: 6372
Total: 6800

OmegaT:

Repetitions: 451
No match: 6533
Total: 6984

Trados 2007 (as DOC):

Repetitions: 417
No match: 6247
Total: 6664

Trados 2007 (as TTX):

Repetitions: 403
No match: 6426
Total: 6829

Michael Beijer wrote:
I installed OmegaT and tried to run a word count, but don’t quite know how to interpret the results:

‘Total’ (7760)
‘Remaining’ (7760)
‘Unique’ (6629)
‘Unique Remaining’ (6629)

Repetitions: 1131
No match: 6593
Total: 7760


What I find interesting is that you and I both did a wordcount in OmegaT and yet got two different word counts -- you got 7760 and I got 6984. The "unique" count is the count without repetitions. The "remaining" count is the number of words that haven't been translated yet (the non-remaining count is the total, including both segments that are already translated and those that must still be translated).
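The relationship between "total", "unique" and "repetitions" described above can be sketched in Python as follows. This is a simplified illustration, not OmegaT's actual implementation: it assumes a repetition is a segment whose text is identical to an earlier segment, and that words are whitespace-separated. The sample segments are invented.

```python
def word_stats(segments):
    """Return total, unique and repeated word counts for a list of segments."""
    seen = set()
    total = 0
    unique = 0
    for segment in segments:
        words = len(segment.split())
        total += words
        if segment not in seen:
            # First occurrence of this segment: its words count as unique.
            seen.add(segment)
            unique += words
    return {"total": total, "unique": unique, "repetitions": total - unique}

segments = [
    "Click OK to continue.",
    "Enter your name.",
    "Click OK to continue.",
    "Click OK to continue.",
]

print(word_stats(segments))
```

The same arithmetic holds for the figures quoted above: 7760 total minus 6629 unique leaves 1131 words in repetitions. Tools that segment the text differently, or that define a repetition differently, will report different splits for the same file.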

[Edited at 2014-06-22 18:37 GMT]



Milan Condak  Identity Verified
Local time: 12:37
English to Czech
OmegaT Jun 22, 2014

Samuel Murray wrote:



OmegaT:

Repetitions: 451
No match: 6533
Total: 6984

Michael Beijer wrote:
I installed OmegaT and tried to run a word count, but don’t quite know how to interpret the results:

‘Total’ (7760)
‘Remaining’ (7760)
‘Unique’ (6629)
‘Unique Remaining’ (6629)

Repetitions: 1131
No match: 6593
Total: 7760


What I find interesting is that you and I both did a wordcount in OmegaT and yet got two different word counts -- you got 7760 and I got 6984. The "unique" count is the count without repetitions. The "remaining" count is the number of words that haven't been translated yet (the non-remaining count is the total, including both segments that are already translated and those that must still be translated).

[Edited at 2014-06-22 18:37 GMT]


My wordcount in OmegaT without TM:

            Total   Unique   Repetitions
CafeTran    6954    6576     378
OmegaT      7760    6629     1131
No Match    6593
Match       36

I think that my OmegaT counts the numbers. The difference in unique words in segments is 6593 - 6576 = 17 words.

Milan



Michael Beijer  Identity Verified
United Kingdom
Local time: 11:37
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
It pays to be informed. Jun 22, 2014

This just goes to show how important it is to be aware of the myriad ways of counting words and fuzzies. So far, it looks like CafeTran and OmegaT produce the highest counts, which is obviously in our favour.

Michael

