Word word count calculating method?
Thread poster: N.M. Eklund

N.M. Eklund  Identity Verified
France
Local time: 14:41
Member (2005)
French to English
+ ...
Aug 7, 2008

Not really an essential question, but I'm curious.

Have any of you ever noticed that when you check the word count in Word, for a brief moment it shows some statistics that are then changed to the final count? It may be more noticeable for large docs.
Does anyone know why it does this or what it's calculating to get this first number? Or what it's not calculating to get the final number?

I started thinking about this when planning to outsource a heavy doc, but at first I only had the PDF. My 'outsourcee' had a PDF word count application that estimated the doc at around 20k words.
When I finally got the Word version, I ran the word count and the final count was 18k words... BUT for a brief moment before displaying the final count, it had shown around 20k words.

I was just wondering...

[Edited at 2008-08-07 14:26]



N.M. Eklund  Identity Verified
France
Local time: 14:41
Member (2005)
French to English
+ ...
TOPIC STARTER
Understanding calculating methods Aug 8, 2008

Oh dear, did I finally stump everyone?

I understand it may seem like an unimportant question, but with so many word count programs out there that give varying results, it seemed interesting to try to understand the basic process used in the most widely accepted one, MS Word.

Try it for yourself! Use a big doc (e.g. 70 pages) with some images in it to give the tool a bit of work to do, and you'll notice a preliminary calculation that's completely different from the revised final calculation. Is it including the images as words during the first calculation? Does it run two processes before winding up with the final amount?

I think Word just counts anything as a word if it is preceded by a space.
Take, for example, a French document that uses French punctuation (which places a space before : ; . ! , etc.)
MS Word will count the punctuation as a word.
For example, copy and paste the following into a .doc and then run the word count: the . TIJ (2 real words)
The count result is 3 words.

At the same time all contractions are viewed as one word: Can't, Don't, Wouldn't, etc. For example, try: Est-il à l'univers (5 real words)
The count result is 3 words.
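
For what it's worth, the space-delimited guess above is easy to try outside Word. Here is a minimal Python sketch, assuming a "word" is simply any run of characters between whitespace. This is only the hypothesis from this post, not Word's actual (closed-source) method, and `count_words` is a made-up name:

```python
def count_words(text):
    """Count whitespace-delimited tokens (hypothetical model, not Word's real code)."""
    # str.split() with no arguments splits on any run of whitespace
    # and ignores leading/trailing whitespace.
    return len(text.split())


# Stand-alone punctuation becomes its own token:
print(count_words("the . TIJ"))           # 3
# Hyphens and apostrophes do not split tokens:
print(count_words("Est-il à l'univers"))  # 3
```

Both results match the counts reported above, which is at least consistent with a space-based rule.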

Makes you wonder which word count program really uses the most accurate calculating process.



Jenny Forbes  Identity Verified
Local time: 13:41
Member (2006)
French to English
+ ...
I'm curious too Aug 8, 2008

I've often noticed (and wondered about) this too. I mean the fact that Word gives a large figure which immediately changes to a smaller (and probably more accurate) one.
I didn't answer not because I wasn't interested but because I don't know how word counting systems work either. Can anyone explain?
Kind regards
Jenny


Ryan Ginstrom
Local time: 21:41
Japanese to English
I think some kind of heuristics Aug 8, 2008

N.M. Eklund wrote:
Oh dear, did I finally stump everyone?


You probably did. It would be very hard to know how MS Word comes up with this preliminary estimate, since the code is closed. I think they obviously use some kind of heuristic measure, maybe number of characters with sampling to get an idea... but the method has to be very quick, so it doesn't delay startup of Word by too much.

Then when you actually ask for a word count, it does the real thing for you. With Word 2007, I've noticed that this can take a couple of seconds for a moderately long document.
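
To illustrate the kind of shortcut I mean (purely a guess, since nobody outside Microsoft can see the code), here is a small Python sketch. It assumes the quick estimate samples only the start of the text to get an average characters-per-word figure and then scales that by the total character count. The names `estimate_words` and `exact_words` and the `sample_size` parameter are invented for the example:

```python
def exact_words(text):
    """Full pass: count every whitespace-delimited token."""
    return len(text.split())


def estimate_words(text, sample_size=1000):
    """Cheap guess: average word length in a sample, scaled to the whole text."""
    sample = text[:sample_size]
    sample_words = len(sample.split())
    if sample_words == 0:
        return 0
    chars_per_word = len(sample) / sample_words
    return round(len(text) / chars_per_word)


# A document whose beginning is not representative of the rest:
doc = "ab " * 1000 + "abcdefgh " * 1000
print(exact_words(doc))     # 2000
print(estimate_words(doc))  # larger than 2000, because the sample has short words
```

An estimate like this is fast but can overshoot when the sampled part of the document differs from the rest, so a first figure that is later corrected downwards would not be surprising under this assumption.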



Tony M  Identity Verified
France
Local time: 14:41
Member
French to English
+ ...
I have a theory... Aug 8, 2008

...although since I know diddly-squat about computer programming and software, I may be way off the mark.

I too have noticed some odd behaviour of the Word word counter, in lots of different ways; on one occasion, I copied some text out of a PDF file, and it came up with 2 entirely different word counts for it; what was odd was that one of the counts was greatly over-inflated, and it was the lower count that was actually the correct one.

I can't help wondering if it isn't something to do with either 'undo' information, or 'track changes' — or both!

I noticed on one particularly large document that was too big to go through as an e-mail attachment, where the client had deleted all the photos, that it still came out pretty large (file size, I mean, now) — because of course the deleted photos were only 'hidden', not actually hard deleted, and so still used up file space. This set me thinking, as to whether or not track changes might affect the word count; on the face of it, it doesn't seem to: hidden text is not counted, unless you show it.

Then what about Wordfast, for example? The uncleaned doc contains both source and target segments (which already creates an annoying inflation of the pagination) — does it affect the word count too?

My theory is that the fleeting figure one sees at the start of a wordcount is possibly the stored wordcount from the last time the document + undo information was (say) saved, and is only updated when you force a fresh word count.
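
One thing that can be checked, at least for the newer .docx format: the file is a zip archive, and its `docProps/app.xml` part carries a `Words` value saved along with the document. Whether that saved value is the figure that flashes up on screen is only my guess. A minimal Python sketch for reading it (the function name `stored_word_count` is made up, and this does not apply to old binary .doc files):

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the extended-properties part in a .docx package.
EXT_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"


def stored_word_count(docx_file):
    """Return the word count saved in a .docx file's properties, or None if absent."""
    with zipfile.ZipFile(docx_file) as archive:
        try:
            xml_bytes = archive.read("docProps/app.xml")
        except KeyError:
            return None  # the package has no extended-properties part
    root = ET.fromstring(xml_bytes)
    words = root.find(f"{{{EXT_NS}}}Words")
    if words is None or not (words.text or "").strip():
        return None
    return int(words.text)
```

Comparing this saved number with a fresh count in Word, after editing a document without re-running the word count, would be one way to test the theory.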

It would be great to know for sure!

