Extracting Words from PDF for word count
Thread poster: Kian Ting
Kian Ting
Local time: 08:58
English
Feb 1, 2006

I have come across assignments from clients where it is really not easy to get the word count automatically, especially when the PDF is a scanned image converted to PDF format, so the text is not selectable and can only be copied as an image.

Does anyone know where I can get software that converts the image characters into text that any word-counting software can count? Better still if the words can be counted in MS Word.

Thanks!



Heinrich Pesch  Identity Verified
Finland
Local time: 03:58
Member (2003)
Finnish to German
+ ...
search earlier fora Feb 1, 2006

This has been discussed once a week for at least the last 5 years. I use ABBYY FineReader 8.
Regards
Heinrich



Balttext  Identity Verified
Latvia
Local time: 03:58
English to Latvian
+ ...
counting vs. converting Feb 1, 2006

You might want to make this decision first. If you already know that you will translate this PDF, then it is worth converting it with OCR software (I also use ABBYY FineReader 8.0 for this).

However, if you need the word count just to make an offer to the client, it is probably not wise to spend valuable time reading and correcting the text with OCR. You would find it easier to use counting software (PractiCount or similar). The exception is a PDF made from a scanned document: there, OCR is the only choice.

Hope it helps,


Uldis
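
A rough way to make the "counting vs. converting" decision automatically is to check whether the PDF has a usable text layer at all. The sketch below assumes you have already pulled the text of each page out of the PDF with some extraction tool; the function name `needs_ocr` and the threshold `min_words_per_page` are made up for illustration and are not part of PractiCount, FineReader, or any other product mentioned here.

```python
def needs_ocr(page_texts, min_words_per_page=5):
    """Guess whether a PDF is a scan, given the extracted text of each page.

    page_texts: list of strings, one per page, as returned by whatever
    PDF text extraction tool you use (an assumption of this sketch).
    min_words_per_page: hypothetical threshold; tune it to your documents.
    """
    if not page_texts:
        # No pages with text at all: treat it as a scan.
        return True
    total_words = sum(len(text.split()) for text in page_texts)
    # A scanned PDF usually yields little or no text per page.
    return total_words < min_words_per_page * len(page_texts)


if __name__ == "__main__":
    scanned = ["", "   "]
    typed = ["This page has a normal text layer with plenty of words."]
    print(needs_ocr(scanned))  # True
    print(needs_ocr(typed))    # False
```

If the function returns False, a counting tool can work on the text layer directly; if it returns True, the file has to go through OCR first.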



Burrell  Identity Verified
United Kingdom
Local time: 01:58
Member (2004)
English to Latvian
+ ...
OCR Feb 1, 2006

Balttext wrote:

However, if you need the word count for just making an offer to the client, it is probably not wise to spend valuable time on reading and correcting the text with OCR.


Not necessarily. To get the [preliminary] word count I usually run OCR (FineReader) and send the text to Word without correcting it. Correcting the text barely changes the word count. When I start working on a project, I process the document with FineReader again and count the words before translating the doc; this is the figure I invoice the client for. So all in all it takes about 3 minutes to get the preliminary word count for quoting purposes.

Cheers,
Ines
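
To illustrate why an uncorrected OCR count is usually close enough for a quote, here is a minimal sketch that counts words in raw and corrected text and reports the relative difference. The counting rule (whitespace-separated tokens containing at least one letter or digit) is an assumption of this example and will not match MS Word or any commercial counter exactly; the function names are made up for illustration.

```python
import re


def count_words(text):
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def relative_difference(raw_text, corrected_text):
    """Relative gap between the raw OCR count and the corrected count."""
    raw = count_words(raw_text)
    fixed = count_words(corrected_text)
    if fixed == 0:
        return 0.0 if raw == 0 else float("inf")
    return abs(raw - fixed) / fixed


if __name__ == "__main__":
    # Misread characters and a stray full stop do not change the count.
    raw = "The qu1ck brown f0x jumps over the 1azy dog ."
    corrected = "The quick brown fox jumps over the lazy dog."
    print(count_words(raw), count_words(corrected))  # 9 9
    print(relative_difference(raw, corrected))       # 0.0
```

Misread characters leave the count unchanged; only words that OCR splits or merges shift it, which is why the final count is still taken from the cleaned-up text.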


m_Chanoine
Local time: 02:58
French to Spanish
+ ...
editing your source text? Feb 3, 2006


When I start working on a project, I process the document with FineReader again and count the words before translating the doc


Do you mean that you are fixing the source text before translating it?



Burrell  Identity Verified
United Kingdom
Local time: 01:58
Member (2004)
English to Latvian
+ ...
Yes, of course Feb 3, 2006

m_Chanoine wrote:


When I start working on a project, I process the document with FineReader again and count the words before translating the doc


Do you mean that you are fixing the source text before translating it?


I do not fix it if I only need to know the word count. However, before I actually start translating the text, I go through all the necessary motions (fixing, spellchecking, etc.).

