https://www.proz.com/forum/translator_resources/44923-how_can_i_count_words_in_pdf_files.html

How can I count words in PDF files?
Thread poster: suesimons
suesimons
suesimons  Identity Verified
Local time: 23:39
Portuguese to English
Apr 12, 2006

I'm sure this has been asked before but how do I count the words in a .pdf document?

[Subject edited by staff or moderator 2006-04-12 18:48]


 
Giles Watson
Giles Watson  Identity Verified
Italy
Local time: 00:39
Italian to English
In memoriam
It certainly has... Apr 12, 2006

... and there are plenty of links here:

http://www.proz.com/post/326526#326526

You can find more relevant messages by typing "pdf" or "pdf count" in the "Search forums" box in the top righthand corner of this page.

HTH

Giles


 
Kristine Sprula (Lielause)
Kristine Sprula (Lielause)  Identity Verified
Latvia
Local time: 01:39
Member (2005)
English to Latvian
+ ...
It depends.... Apr 12, 2006

suesimons wrote:

I'm sure this has been asked before but how do I count the words in a .pdf document?

[Subject edited by staff or moderator 2006-04-12 18:48]


If the file was created directly as a .pdf document, one option is to copy the text into Word; another is to use a dedicated word-count program.
But if the file was created as a picture (the text was scanned and then saved as a .pdf document), the only option is manual counting.

Regards,
Kristine


 
Marisa Condurso de Nohara
Marisa Condurso de Nohara  Identity Verified
Argentina
Local time: 19:39
English to Spanish
+ ...
Word and ABBy Apr 12, 2006

Kristine Lielause wrote:

... one option is copying text to Word.....

But if the file has been created as a picture.....

Kristine


I would like to add something to Kristine's suggestion:

I usually do it by copying and pasting into Word, but be careful! Some words may become joined, and objects containing text won't be taken into account. So before clicking on "word count", look over the whole Word document to separate any joined words, and treat objects separately.
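The check for joined words can be partly automated. Below is a minimal Python sketch, assuming the pasted text is already available as a plain string; the function names and the 20-letter threshold are illustrative choices, not part of Word or any tool mentioned in this thread. It counts whitespace-separated tokens and lists unusually long tokens that may be several words fused together.

```python
import re

def count_words(text: str) -> int:
    """Count whitespace-separated tokens in the pasted text."""
    return len(text.split())

def suspect_joined(text: str, max_len: int = 20) -> list:
    """Return runs of letters longer than max_len (an assumed threshold),
    which may be words fused together during copy and paste."""
    return [tok for tok in re.findall(r"[^\W\d_]+", text) if len(tok) > max_len]

# Hypothetical pasted text in which four words have been joined.
pasted = "The quarterlyreportshowsthatrevenue grew by ten percent"
print(count_words(pasted))     # 6
print(suspect_joined(pasted))  # ['quarterlyreportshowsthatrevenue']
```

Any token this flags still needs checking by eye, since long legitimate words (or languages with long compounds) will also show up.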


Secondly, when the PDF has been created as a picture, you could use AbbyFinder to transform "objects" (one PDF page copied) into "words", but truth be told, I am not sure what happens when the PDF's text is very long. I habitually use AbbyF when dealing with individual pictures that have words in between.

Hope it helps!
McN


 
ddelvecchio
ddelvecchio
Local time: 00:39
English to Italian
+ ...
Practicount Apr 22, 2006

Hello!!

If the file isn't an image, I use Practicount&Invoice, a really nice, simple program that counts words in every type of document.
It also generates invoices and does many other things.

You can download a shareware version here:
http://www.practiline.com/download.htm

Bye!!
Davide


 
aitteam
aitteam
Ukraine
Local time: 01:39
English to Ukrainian
Word count in pdf, images, and 30 more file formats Jul 28, 2009

Hello,

We have just released a new version of our word count software. It is called AnyCount and is used by more than 5000 people worldwide. I am sure colleagues on the forum can give their opinion on its pros and cons.

I will only mention a new feature of version 7: word count in BMP, JPG, PNG, and GIF files.

Best,
Vladimir.


 
Anna Villegas
Anna Villegas
Mexico
Local time: 16:39
English to Spanish
Try this one Jul 29, 2009

http://www.globalrendering.com/download.html

It is a good tool.



 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 00:39
Member (2006)
English to Afrikaans
+ ...
Here's how Jul 29, 2009

suesimons wrote:
How do I count the words in a .pdf document?


1. In your PDF viewer, press Ctrl+A and Ctrl+C, and then in MS Word, press Ctrl+V. If you can see the text, count it. If you can't see the text, go to step 2.

2. Use a good, expensive OCR program to convert the PDF into MS Word, and then use CompleteWordCount to count the text. If you don't want to use OCR, go to step 3.

3. Count the way we counted in the old days, by counting a few average lines and then multiplying the average by the average number of lines per page and the number of pages.

http://www.shaunakelly.com/word/CompleteWordCount/
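
The arithmetic in step 3 can be sketched in a few lines of Python. This is only an illustration under assumed figures: the function name, the sampled line counts, the lines per page and the page count are all hypothetical.

```python
def estimate_word_count(sample_line_counts, lines_per_page, pages):
    """Estimate total words as
    (average words per sampled line) x (average lines per page) x (pages)."""
    if not sample_line_counts:
        raise ValueError("need at least one sampled line")
    avg_words_per_line = sum(sample_line_counts) / len(sample_line_counts)
    return round(avg_words_per_line * lines_per_page * pages)

# Hypothetical figures: five sampled lines, about 40 lines per page, 12 pages.
estimate = estimate_word_count([11, 13, 12, 10, 14], lines_per_page=40, pages=12)
print(estimate)  # 5760
```

Sampling lines from several different pages should make the average less sensitive to one unusually dense or sparse page.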


 
Michael GREEN
Michael GREEN  Identity Verified
France
Local time: 00:39
English to French
Agree with Samuel Jul 30, 2009

... on all points.

I would just add that if the file is an image, I usually print it and then scan it using the OCR function of my scanner.

In any event, pdf files necessarily mean extra time taken to prepare the source files, and I invoice that extra time to my customers (having made it clear that this is how I work before the order is confirmed).


 
Tony M
Tony M
France
Local time: 00:39
Member
French to English
+ ...
SITE LOCALIZER
Only for text-based PDFs? Jul 31, 2009

Tadzio Carvallo wrote:
Try this one:
http://www.globalrendering.com/download.html


Yes, but as far as I can ascertain from that website, it still only seems to count words in PDF files created directly from native text formats; so it still can't solve the problem of what to do when the PDF is in fact an image from some scanned document etc.

Like Michael G., I have occasionally had to resort to printing out the file and then OCRing it, which really does seem a roundabout way of doing things! It is also a problem with poorer-quality originals, particularly those with fine print. However, the absolute accuracy of the OCR is fairly unimportant, as long as on average it produces about the right number of words. In my experience, it's a case of 'swings and roundabouts', and the end result is usually accurate enough. After all, it is hardly cost-effective to waste a lot of time producing a to-the-word accurate word count, since any discrepancy is likely to be fairly small.

As I translate mainly from FR > EN, I sometimes agree with the customer to base my charging on target word count + a percentage. Generally, 10% seems about right for FR > EN, though in a statistical analysis I once did of quite a large number of files, I noticed variations from –16% to +5% in the FR > EN word count difference, so the variability is quite large! But I find most customers don't argue with 10% (they can see for themselves that the EN takes up less space!), and it's not really worthwhile wasting time trying to get greater accuracy.
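
As a rough illustration of charging on "target word count + a percentage", here is a small Python sketch. The function name and the 2000-word figure are hypothetical, and it assumes the agreed percentage is simply added on top of the EN target count.

```python
def billable_words(target_words, uplift=0.10):
    """Return the target word count plus an agreed percentage (10% by default)."""
    return round(target_words * (1 + uplift))

# Hypothetical job: an EN translation that comes out at 2000 words.
print(billable_words(2000))  # 2200
```

The uplift is a parameter because the right figure depends on the language pair and on what has been agreed with the customer.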

In passing, I'd just like to mention one customer who requested specifically that I not reduce my EN translation by more than 5% compared to the FR, for DTP reasons! Better still, this particular customer pays me by target wordcount anyway! However, in the particular field I was working in, it was actually extremely difficult to comply!


 
Igor Moshkin
Igor Moshkin
Russian Federation
Local time: 05:39
English to Russian
+ ...
FineCount Jul 31, 2009

Try FineCount - http://www.tilti.com/tilti-com.software.finecount?pc_code=F97961DA6D40A&ver=2.5.1.1766
It's free, though it requires registration. In addition to the word count, this software provides plenty of other useful information, including invoices.


 
CHEN-Ling
CHEN-Ling  Identity Verified
Local time: 06:39
Chinese to English
+ ...
OCR Aug 13, 2009

Michael GREEN wrote:
... on all points.

I would just add that if the file is an image, I usually print it and then scan it using the OCR function of my scanner.

In any event, pdf files necessarily mean extra time taken to prepare the source files, and I invoice that extra time to my customers (having made it clear that this is how I work before the order is confirmed).


This is what I wanted to say. Actually, a single OCR program, such as Shocr7.0, is enough. Usually I first save the PDF file as a TIFF file, then I open the saved TIFF files in Shocr 7.0 and convert them into text. Finally, I copy the text into a Word file and count.


 
Pierre Fleutot
Pierre Fleutot
Argentina
Local time: 19:39
English to French
+ ...
Excellent Dec 29, 2009

Tadzio Carvallo wrote:

http://www.globalrendering.com/download.html

It is a good tool.



SO easy to use (no install). Ten out of ten!


 
Tam Nguyen
Tam Nguyen
Vietnam
Local time: 05:39
English to Vietnamese
+ ...
count words in PDF Jan 22, 2010

PierreF wrote:

Tadzio Carvallo wrote:

http://www.globalrendering.com/download.html

It is a good tool.



SO easy to use (no install). Ten out of ten!


Tried it! That's OK and easy.


 
Virginia canvas
Virginia canvas
United States
Local time: 15:39
French to English
+ ...
Second vote for PractiCount Mar 26, 2010

We have been using PractiCount for a couple years and love it. It's easy to use. It's versatile and customizable. And the counts are quite accurate (even for Asian chars, lines, pages, etc.). It counts almost every file format I have needed: Word, Excel, PPT, PDF, HTML..... PractiCount also offers the flexibility to export reports or professional-looking invoices, if you need those features.

A customer just sent us a new PDF document complete with embedded CAD drawings.
I tried the standard route of Adobe Acrobat's save-as function. Low-ball word count because most of the images remained images - not editable text.

Next I tried the OCR tool ABBYY PDF Transformer (another tool I love!!). Fair results. At least ABBYY converted most of the images to text, but it still looked incomplete for estimating purposes.

Then I resorted to PractiCount. Somehow PractiCount came up with 2000 words higher than either of the other two approaches.


Note: Over the years, I have found that the success of OCR tools varies with the nature of the image and layout. ABBYY seems to be among the best (especially for foreign or multi-language docs and for retaining layout that a translator can use). But not always. Sometimes OmniPage or another OCR tool simply has better luck for a creative design layout. It seems to be a matter of trial and error with those scans or embedded images.

Good luck,
- Virginia Anderson

Oregon Translation, LLC
Building cooperative relationships with translators.
Apply as a translator here: www.oregontranslation.com


 