Extracting Words from PDF for word count
Thread poster: Kian Ting

Kian Ting
English
Feb 1, 2006

I have come across assignments from clients where it's really not easy to get the word count automatically, especially when the PDF was made from an image, so the text is not selectable and can only be copied as an image.

Does anyone know where I can get software that converts the characters in the image into words that a word-counting program can count? Better still if they can be counted in MS Word.

Thanks!


 

Heinrich Pesch
Finland
Member (2003)
Finnish to German
search earlier fora Feb 1, 2006

This has been discussed about once a week for at least the last 5 years. I use ABBYY FineReader 8.
Regards
Heinrich


 

Balttext
Latvia
English to Latvian
counting vs. converting Feb 1, 2006

You might want to make this decision first. If you already know that you will translate this PDF, then it is worth converting it with OCR software (I also use ABBYY FineReader 8.0 for this).

However, if you only need the word count to make an offer to the client, it is probably not wise to spend valuable time reading and correcting the OCR output. You would find it easier to use counting software (PractiCount or similar), unless the PDF was made from a scanned document, in which case OCR is the only choice.

Hope it helps,


Uldis


 

Ines Burrell
United Kingdom
Member (2004)
English to Latvian
OCR Feb 1, 2006

Balttext wrote:

However, if you need the word count for just making an offer to the client, it is probably not wise to spend valuable time on reading and correcting the text with OCR.


Not necessarily. To get the [preliminary] word count, I usually run OCR (FineReader) and send the text to Word without correcting it. The word count essentially cannot change much. When I start working on a project, I process the document with FineReader again and count the words before translating it; this is the figure I invoice the client for. All in all, it takes about 3 minutes to get the preliminary word count for quoting purposes.

Cheers,
Ines
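The preliminary count described above (OCR the PDF, export the uncorrected text, count it) can also be done outside Word. Below is a minimal Python sketch, assuming the OCR output has been saved as a plain-text file. The function names are hypothetical, and the whitespace-based rule only approximates MS Word's word count, so treat the result as a quoting estimate.

```python
def count_words(text: str) -> int:
    """Approximate word count of plain text (e.g. uncorrected OCR output).

    Splits on any run of whitespace and counts a token only if it contains
    at least one letter or digit, so stray OCR debris such as "-" or a
    bullet is not counted. MS Word's own rules for dashes, slashes and
    punctuation may give a slightly different figure.
    """
    count = 0
    for token in text.split():  # split() with no argument collapses all whitespace
        # str.isalnum() is Unicode-aware, so letters like "ā" or "ö" count too
        if any(ch.isalnum() for ch in token):
            count += 1
    return count


def count_words_in_file(path: str, encoding: str = "utf-8") -> int:
    """Read a plain-text file exported from the OCR tool and count its words."""
    with open(path, encoding=encoding) as f:
        return count_words(f.read())


# Example: a preliminary count for quoting purposes
sample = "Word count - a preliminary figure for the 2006 quote •"
print(count_words(sample))  # prints 9
```

For a real job you would call something like `count_words_in_file("ocr_output.txt")`, where the file name is only an example. Skipping punctuation-only tokens is a design choice: uncorrected OCR output often contains isolated symbols that would otherwise inflate the count.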


 

m_Chanoine
French to Spanish
editing your source text? Feb 3, 2006


When I start working on a project, I process the document with FineReader again and count the words before translating the doc


Do you mean that you are fixing the source text before translating it?


 

Ines Burrell
United Kingdom
Member (2004)
English to Latvian
Yes, of course Feb 3, 2006

m_Chanoine wrote:


When I start working on a project, I process the document with FineReader again and count the words before translating the doc


Do you mean that you are fixing the source text before translating it?


I do not fix it if I only need to know the word count. However, before I actually start translating the text, I go through all the necessary steps (fixing, spellchecking, etc.).


 

