ProZ.com global directory of translation services
Thread poster: suesimons
How can I count words in PDF files?

suesimons  Identity Verified
Local time: 04:58
Member (2003)
Portuguese to English
Apr 12, 2006

I'm sure this has been asked before but how do I count the words in a .pdf document?

[Subject edited by staff or moderator 2006-04-12 18:48]



Giles Watson  Identity Verified
Italy
Local time: 05:58
Member
Italian to English
It certainly has... Apr 12, 2006

... and there are plenty of links here:

http://www.proz.com/post/326526#326526

You can find more relevant messages by typing "pdf" or "pdf count" in the "Search forums" box in the top right-hand corner of this page.

HTH

Giles



Kristine Lielause  Identity Verified
Latvia
Local time: 06:58
Member (2005)
English to Latvian
+ ...
It depends.... Apr 12, 2006


suesimons wrote:

I'm sure this has been asked before but how do I count the words in a .pdf document?

[Subject edited by staff or moderator 2006-04-12 18:48]


If the file was created directly as a .pdf document, one option is to copy the text into Word; another is to use a dedicated word count program.
But if the file was created as a picture (the text was scanned and then saved as a .pdf document), the only option is manual counting.
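
For readers who prefer a script to pasting into Word, here is a minimal sketch of the counting step. It assumes the text has already been copied or extracted from the PDF into a string. The function names and the definition of a "word" are illustrative assumptions, so the result may differ slightly from Word's own count.

```python
import re

# Assumption: a "word" is a run of letters or digits, optionally joined
# by apostrophes or hyphens (so "isn't" and "to-the-word" count once each).
WORD_RE = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")


def count_words(text: str) -> int:
    """Count words in text copied or extracted from a PDF."""
    return len(WORD_RE.findall(text))


def looks_image_only(text: str) -> bool:
    """Return True when no words were found, which suggests a scanned PDF."""
    return count_words(text) == 0


if __name__ == "__main__":
    sample = "How do I count the words in a .pdf document?"
    print(count_words(sample))
    print(looks_image_only("   \n"))
```

If `looks_image_only` returns True for text copied from the file, the PDF is probably a scanned image and this route will not work.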

Regards,
Kristine



Marisa Condurso de Nohara  Identity Verified
Argentina
Local time: 00:58
English to Spanish
+ ...
Word and ABBy Apr 12, 2006


Kristine Lielause wrote:

... one option is copying text to Word.....

But if the file has been created as a picture.....

Kristine


I would like to add something to Kristine's suggestion:

I usually do it by copying and pasting into Word, but be careful! Some words may become joined, and objects containing text won't be taken into account. So before clicking on "word count", look over the whole Word document to separate any joined words, and treat objects separately.


Secondly, when it has been created as a picture, you could use AbbyFinder to transform "objects" (one pdf page copied) into "words", but truth to tell, I am not sure what happens when the PDF's text is very long. I habitually use AbbyF when dealing with individual pictures with words in between.

Hope it helps!
McN



ddelvecchio
Local time: 05:58
English to Italian
+ ...
Practicount Apr 22, 2006

Hello!!

If the file isn't an image, I use Practicount&Invoice, a really nice and simple program that counts words in every type of document.
It also generates invoices and does many other things.

You can download a shareware version here:
http://www.practiline.com/download.htm

Bye!!
Davide



aitteam
Ukraine
Local time: 06:58
English to Ukrainian
Word count in pdf, images, and 30 more file formats Jul 28, 2009

Hello,

We have just released a new version of our word count software. It is called AnyCount and is used by more than 5000 people worldwide. I am sure colleagues on the forum may give their opinion on its pros and cons.

I will only mention a new feature of version 7: word count in BMP, JPG, PNG, and GIF files.

Best,
Vladimir.



Carvallo
Mexico
Local time: 22:58
Member (2006)
English to Spanish
Try this one Jul 29, 2009

http://www.globalrendering.com/download.html

It is a good tool.




Samuel Murray  Identity Verified
Netherlands
Local time: 05:58
Member (2006)
English to Afrikaans
+ ...
Here's how Jul 29, 2009


suesimons wrote:
How do I count the words in a .pdf document?


1. In your PDF viewer, press Ctrl+A and Ctrl+C, and then in MS Word, press Ctrl+V. If you can see the text, count it. If you can't see the text, go to step 2.

2. Use a good, expensive OCR program to convert the PDF into MS Word, and then use CompleteWordCount to count the text. If you don't want to use OCR, go to step 3.

3. Count the way we counted in the old days, by counting a few average lines and then multiplying the average by the average number of lines per page and the number of pages.

http://www.shaunakelly.com/word/CompleteWordCount/
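
The manual estimate in step 3 can be sketched as a small calculation. This is only an illustration: the function name and the sample figures are assumptions, and the result is an estimate, not an exact count.

```python
from statistics import mean


def estimate_word_count(sample_line_counts, lines_per_page, pages):
    """Estimate total words: mean words per sampled line
    x average lines per page x number of pages."""
    if not sample_line_counts:
        raise ValueError("need at least one sampled line")
    return round(mean(sample_line_counts) * lines_per_page * pages)


if __name__ == "__main__":
    # Hypothetical figures: three sampled lines of 10, 12 and 11 words,
    # about 40 lines per page, 5 pages.
    print(estimate_word_count([10, 12, 11], lines_per_page=40, pages=5))
```

Sampling more lines, from different pages, should make the average more representative.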



Michael GREEN  Identity Verified
France
Local time: 05:58
English to French
Agree with Samuel Jul 30, 2009

... on all points.

I would just add that if the file is an image, I usually print it and then scan it using the OCR function of my scanner.

In any event, pdf files necessarily mean extra time taken to prepare the source files, and I invoice that extra time to my customers (having made it clear that this is how I work before the order is confirmed).



Tony M  Identity Verified
France
Local time: 05:58
Member
French to English
+ ...
Only for text-based PDFs? Jul 31, 2009


Tadzio Carvallo wrote:
Try this one:
http://www.globalrendering.com/download.html


Yes, but as far as I can ascertain from that website, it still only seems to count words in PDF files created directly from native text formats; so it still can't solve the problem of what to do when the PDF is in fact an image from some scanned document etc.

Like Michael G., I have occasionally had to resort to printing out the file and then OCRing it, which really does seem a roundabout way of doing things! It is also a problem with poorer-quality originals, particularly those with fine print. However, the absolute accuracy of the OCR is fairly unimportant, as long as on average it produces about the right number of words. In my experience, it's a case of 'swings and roundabouts', and the end result is usually accurate enough. After all, it is hardly cost-effective to waste a lot of time producing a to-the-word accurate wordcount, since any discrepancy is likely to be fairly small.

As I translate mainly from FR > EN, I sometimes agree with the customer to base my charging on target word count + a percentage. Generally, 10% seems about right for FR > EN, though in a statistical analysis I once did of quite a large number of files, I noticed variations from –16% to +5% in the FR > EN wordcount difference, so the variability is quite large! But I find most customers don't argue with 10% (they can see for themselves that the EN takes up less space!), and it's not really worthwhile wasting time trying to get greater accuracy.
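
The "target word count + a percentage" arrangement amounts to simple arithmetic, sketched below. The function name and the example counts are hypothetical. The 10% default is the figure mentioned above for FR > EN, and other language pairs would presumably need a different value.

```python
def billable_words(target_words: int, uplift_percent: float = 10.0) -> int:
    """Return the target word count plus an agreed percentage uplift."""
    return round(target_words * (1 + uplift_percent / 100))


if __name__ == "__main__":
    # Hypothetical job: the EN translation comes to 1000 words.
    print(billable_words(1000))        # with the default 10% uplift
    print(billable_words(2000, 5.0))   # with a 5% uplift instead
```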

In passing, I'd just like to mention one customer who requested specifically that I not reduce my EN translation by more than 5% compared to the FR, for DTP reasons! Better still, this particular customer pays me by target wordcount anyway! However, in the particular field I was working in, it was actually extremely difficult to comply!



Igor Moshkin
Russian Federation
Local time: 11:58
English to Russian
+ ...
FineCount Jul 31, 2009

Try FineCount - http://www.tilti.com/tilti-com.software.finecount?pc_code=F97961DA6D40A&ver=2.5.1.1766
It's free, though it requires registration. In addition to the word count, this software provides plenty of other useful information, including invoices.



Yang Min  Identity Verified
Local time: 11:58
Chinese to English
+ ...
OCR Aug 13, 2009


Michael GREEN wrote:
... on all points.

I would just add that if the file is an image, I usually print it and then scan it using the OCR function of my scanner.

In any event, pdf files necessarily mean extra time taken to prepare the source files, and I invoice that extra time to my customers (having made it clear that this is how I work before the order is confirmed).


This is what I wanted to say. Actually, a single piece of OCR software, such as Shocr 7.0, is enough. Usually I first save the PDF file as a TIFF file, then open the saved TIFF files in Shocr 7.0 and convert them into text. Finally, I copy the text into a Word file and count.



Pierre Fleutot
Argentina
Local time: 00:58
English to French
+ ...
Excellent Dec 29, 2009


Tadzio Carvallo wrote:

http://www.globalrendering.com/download.html

It is a good tool.



SO easy to use (no install). Ten out of ten!



nguyentotam2002
Vietnam
Local time: 10:58
English to Vietnamese
+ ...
count words in PDF Jan 22, 2010


PierreF wrote:


Tadzio Carvallo wrote:

http://www.globalrendering.com/download.html

It is a good tool.



SO easy to use (no install). Ten out of ten!


Tried it! That's OK and easy.



Virginia Anderson  Identity Verified
United States
Local time: 20:58
French to English
+ ...
Second vote for PractiCount Mar 26, 2010

We have been using PractiCount for a couple years and love it. It's easy to use. It's versatile and customizable. And the counts are quite accurate (even for Asian chars, lines, pages, etc.). It counts almost every file format I have needed: Word, Excel, PPT, PDF, HTML..... PractiCount also offers the flexibility to export reports or professional-looking invoices, if you need those features.

A customer just sent us a new PDF document complete with embedded CAD drawings.
I tried the standard route of Adobe Acrobat's save-as function. Low-ball word count because most of the images remained images - not editable text.

Next I tried the OCR tool ABBYY PDF Transformer (another tool I love!!). Fair results. At least ABBYY converted most of the images to text, but it still looked incomplete for estimating purposes.

Then I resorted to PractiCount. Somehow PractiCount came up with a count 2,000 words higher than either of the other two approaches.


Note: Over the years, I have found that the success of OCR tools varies with the nature of the image and layout. ABBYY seems to be among the best (especially for foreign or multi-language docs and for retaining layout that a translator can use). But not always. Sometimes OmniPage or another OCR tool simply has better luck for a creative design layout. It seems to be a matter of trial and error with those scans or embedded images.

Good luck,
- Virginia Anderson

Oregon Translation, LLC
Building cooperative relationships with translators.
Apply as a translator here: www.oregontranslation.com

