Extracting Words from PDF for word count
Thread poster: Kian Ting
Kian Ting
Local time: 16:34
English
Feb 1, 2006

I have come across assignments from clients where it is really not easy to get the word count automatically, especially when the PDF is a scanned image saved in PDF format, so the text cannot be selected as text and can only be copied as an image.

Does anyone know where I can get software that converts the image characters into text that any word-counting software can count? Better still if it can be counted in MS Word.

Thanks!



Heinrich Pesch  Identity Verified
Finland
Local time: 10:34
Member (2003)
Finnish to German
+ ...
search earlier fora Feb 1, 2006

This has been discussed about once a week for at least the last 5 years. I use ABBYY FineReader 8.
Regards
Heinrich



Balttext  Identity Verified
Latvia
Local time: 10:34
English to Latvian
+ ...
counting vs. converting Feb 1, 2006

You might want to make this decision first. If you already know that you will translate this PDF, then it is worth converting it with OCR software (I also use ABBYY FineReader 8.0 for this).

However, if you need the word count just to make an offer to the client, it is probably not wise to spend valuable time on reading and correcting the OCR text. You would find it easier to use counting software (PractiCount or the like), unless the PDF was made from a scanned document, in which case OCR is the only choice.

Hope it helps,


Uldis



Burrell  Identity Verified
United Kingdom
Local time: 08:34
Member (2004)
English to Latvian
+ ...
OCR Feb 1, 2006

Balttext wrote:

However, if you need the word count just to make an offer to the client, it is probably not wise to spend valuable time on reading and correcting the OCR text.


Not necessarily. To get the preliminary word count I usually use OCR (FineReader) and send the text to Word without correcting it. Correcting the text later will not change the word count much. When I start working on a project, I process the document with FineReader again and count the words before translating the doc; this is the figure I invoice the client for. So all in all it takes about 3 minutes to get the preliminary word count for quoting purposes.
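As a rough illustration of that counting step outside Word, here is a minimal Python sketch. It assumes the uncorrected OCR output has already been saved as plain text; the function names and the sample string are made up for this example and are not part of FineReader, Word, or any other tool mentioned in this thread. It counts whitespace-separated tokens, and optionally skips tokens with no letter or digit, since uncorrected OCR often leaves stray marks behind.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simple approximation of a word count)."""
    return len(text.split())


def count_words_lenient(text: str) -> int:
    """Count only tokens containing at least one letter or digit,
    so stray OCR marks such as '|' or '~' are not counted as words."""
    return sum(1 for token in text.split() if any(ch.isalnum() for ch in token))


def relative_difference(preliminary: int, final: int) -> float:
    """Fraction by which the preliminary count differs from the final count."""
    if final == 0:
        raise ValueError("final count must be greater than zero")
    return abs(final - preliminary) / final


if __name__ == "__main__":
    # Hypothetical uncorrected OCR output: misread letters and two stray marks.
    raw_ocr = "Tbe qu1ck brown fox | jumps ~ over the lazy dog."
    print(count_words(raw_ocr))          # 11
    print(count_words_lenient(raw_ocr))  # 9
    print(relative_difference(11, 9))
```

Misread letters do not change the count at all; only stray marks and wrongly split or merged words do, which is why an uncorrected count is usually close enough for a quote.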

Cheers,
Ines


m_Chanoine
Local time: 09:34
French to Spanish
+ ...
editing your source text? Feb 3, 2006


When I start working on a project, I process the document with FineReader again and count the words before translating the doc


Do you mean that you are fixing the source text before translating it?



Burrell  Identity Verified
United Kingdom
Local time: 08:34
Member (2004)
English to Latvian
+ ...
Yes, of course Feb 3, 2006

m_Chanoine wrote:


When I start working on a project, I process the document with FineReader again and count the words before translating the doc


Do you mean that you are fixing the source text before translating it?


I do not fix it if I only need to know the word count. However, before I actually start translating the text, I go through all the necessary steps (fixing, spellchecking, etc.).



