Extracting Words from PDF for word count
Thread poster: Kian Ting

Kian Ting
English
Feb 1, 2006

I have come across assignments from clients where it is really not easy to get the word count automatically, especially when the PDF is an image converted to PDF format, so the text cannot be selected and can only be copied as an image.

Does anyone know where I can get software that converts the image characters into words that any word-counting software can count? Even better if they can be counted in MS Word.

Thanks!


 

Heinrich Pesch  Identity Verified
Finland
Member (2003)
Finnish to German
search earlier fora Feb 1, 2006

This has been discussed about once a week for at least the last five years. I use ABBYY FineReader 8.
Regards
Heinrich


 

Balttext  Identity Verified
Latvia
English to Latvian
counting vs. converting Feb 1, 2006

You might want to make this decision first. If you already know that you will translate this PDF, then it is worth converting it with OCR software (I also use ABBYY FineReader 8.0 for this).

However, if you need the word count just to make an offer to the client, it is probably not wise to spend valuable time reading and correcting the OCR output. It would be easier to use counting software (PractiCount or similar), unless the PDF was made from a scanned document, in which case OCR is the only choice.

Hope it helps,


Uldis


 

Ines Burrell  Identity Verified
United Kingdom
Member (2004)
English to Latvian
OCR Feb 1, 2006

Balttext wrote:

However, if you need the word count for just making an offer to the client, it is probably not wise to spend valuable time on reading and correcting the text with OCR.


Not necessarily. To get a preliminary word count, I usually run the PDF through OCR (FineReader) and send the text to Word without correcting it. Correcting the text would not change the count much anyway. When I start working on a project, I process the document with FineReader again and count the words before translating it; this is the figure I invoice clients for. All in all, it takes about three minutes to get the preliminary word count for quoting purposes.

Cheers,
Ines
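A rough sketch of the preliminary count described above, assuming the uncorrected OCR output has already been saved or pasted as plain text. It splits on whitespace, which only approximates what MS Word reports (Word's handling of punctuation and special characters may differ slightly). The sample sentences are illustrative, and `count_words` is a hypothetical helper, not part of FineReader or Word. It also shows why skipping correction barely affects the figure: a merged word like "wordsbefore" shifts the count by one.

```python
def count_words(text: str) -> int:
    """Approximate a word count by splitting on any run of whitespace.

    This is only an approximation of MS Word's count; Word's own rules
    for punctuation and special characters may differ slightly.
    """
    return len(text.split())


# Uncorrected OCR output often merges or splits a few words,
# e.g. "wordsbefore" instead of "words before".
raw_ocr = "I process the document again and count the wordsbefore translating"
corrected = "I process the document again and count the words before translating"

print(count_words(raw_ocr))    # 10
print(count_words(corrected))  # 11
```

For a quote, the small difference between the raw and corrected counts is usually acceptable; the corrected count is the one to invoice.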


 

m_Chanoine
French to Spanish
editing your source text? Feb 3, 2006


When I start working on a project, I process the document with FineReader again and count the words before translating the doc


Do you mean that you are fixing the source text before translating it?


 

Ines Burrell  Identity Verified
United Kingdom
Member (2004)
English to Latvian
Yes, of course Feb 3, 2006

m_Chanoine wrote:


When I start working on a project, I process the document with FineReader again and count the words before translating the doc


Do you mean that you are fixing the source text before translating it?


I do not fix it if I only need to know the word count. However, before I actually start translating the text, I go through all the necessary steps (fixing, spellchecking, etc.).


 