How to extract text from Acrobat Reader
Thread poster: Céline Graciet

Céline Graciet
Local time: 15:31
English to French
Jan 15, 2003

Hi everyone, I'm trying to extract text from a PDF file, but to no avail. It won't let me highlight or select any of it. I even tried scanning it and sending the picture to Word, but the result was major mumbo jumbo. Hope the more technically minded amongst you will come to my rescue!


Natalie  Identity Verified
Poland
Local time: 16:31
Member (2002)
English to Russian
+ ...

MODERATOR
Hi Celine, Jan 15, 2003

I am pretty sure that your PDF file is in fact a graphical (image-only) PDF, so the best way would be to open it in an OCR application capable of reading PDFs (for example, FineReader 6) and convert it to text. You may contact me privately if you need any technical help.



Best,

Natalia



Endre Both  Identity Verified
Germany
Local time: 16:31
Member (2002)
English to German
No way to get around scanning Jan 15, 2003

...or, more precisely, reading the PDF into character recognition (OCR) software, if your PDF is an all-graphics file (indicated by the impossibility of highlighting text).
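A quick way to guess whether a PDF is all graphics before reaching for OCR is to look for font resources in the file. What follows is only a rough sketch in Python, assuming the PDF's object dictionaries are stored uncompressed (newer PDFs that use object streams can hide them, so treat the answer as a hint). The function name `looks_image_only` is made up for illustration.

```python
import re


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes mention images but no fonts.

    A page with selectable text needs at least one font resource, so a
    file with image objects and no /Font entries is probably a scan.
    """
    # \b keeps /Font from matching longer names such as /FontDescriptor.
    has_font = re.search(rb"/Font\b", pdf_bytes) is not None
    has_image = re.search(rb"/Subtype\s*/Image\b", pdf_bytes) is not None
    return has_image and not has_font
```

To try it on a real file, read it in binary mode, for example `looks_image_only(open("brochure.pdf", "rb").read())`, where the file name is a placeholder.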



The results of course depend on your OCR software and the settings you apply before recognition.



In any case, the procedure is likely to involve a lot of work (I've just spent a few hours on a similar task) and only pays off if the text contains lots of repetitions and you can use a CAT tool afterwards. Otherwise, just use a printout and type the translation into Word.



A basic rule of mine, BTW: no discounts for repetitions in PDF texts.



Feel free to get in touch with me directly if you think I can help you.



Endre

EB Communications


xxxTService  Identity Verified
Local time: 16:31
English to German
Three possible reasons. Jan 16, 2003

1) There are some kinds of protected PDFs around, allowing you to view the contents only and preventing any attempt to copy.

Solution: Request an unprotected version.



2) Some PDFs cannot be opened correctly with the free version of Acrobat Reader.

Solution: Get Acrobat 5 - but it's quite costly.



3) Some PDFs just show "garbage" when copied and pasted into another application.

Solution: Contact me; I wrote a tiny algorithm to decode that "garbage" using MS Access.
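
The post does not describe that algorithm. One plausible cause of such "garbage" is a font whose character codes do not correspond to standard text, so every copied character comes out consistently replaced by another. The sketch below is in Python (not MS Access) and assumes a simple one-to-one substitution whose mapping you recover from a passage you can read on screen. The names `build_table` and `decode` are hypothetical.

```python
def build_table(garbled_sample: str, known_plain: str) -> dict:
    """Build a garbled-to-plain character map from an aligned sample."""
    if len(garbled_sample) != len(known_plain):
        raise ValueError("sample and known text must have the same length")
    table = {}
    for g, p in zip(garbled_sample, known_plain):
        # A garbled character must always stand for the same plain one.
        if table.setdefault(g, p) != p:
            raise ValueError(f"inconsistent mapping for {g!r}")
    return table


def decode(text: str, table: dict) -> str:
    """Replace every mapped character; leave unknown characters as-is."""
    return "".join(table.get(ch, ch) for ch in text)
```

Characters that never occur in the sample stay garbled, so a longer sample gives a more complete table. If the garbage is not a consistent substitution, this approach will not help.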



monitor  Identity Verified
Local time: 16:31
English to German
+ ...
more than one solution Jan 16, 2003

Hi Céline

- First you should try to find out whether your PDF is copy protected. If it is, save the file under a new file name, which in most cases removes the protection. To do so you need Adobe Acrobat, not just the Reader.

- In Adobe Acrobat you can save the text directly by exporting it to an RTF file.

- You should also consider Gemini solo, a file / image extraction tool from inceni.com. It can be downloaded as a free trial version (restricted usage), and it works.

Hope this is all fine for you

Kind Regards

Marcel

The protection cannot be bypassed using Acrobat Reader!
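
To find out whether a PDF is protected, as the first point suggests, one option is to look for the encryption entry in the file itself. This is a minimal sketch in Python, assuming the standard PDF security handler: protected files carry an `/Encrypt` entry, and the copy/extract permission is bit 5 (value 16) of the permissions integer `/P`. The function names are made up for illustration, and reading the `/P` value out of the file is left to you or to a PDF library.

```python
def has_encrypt_entry(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw file mentions an /Encrypt entry."""
    return b"/Encrypt" in pdf_bytes


def copy_allowed(p_value: int) -> bool:
    """True if the permissions integer /P allows copying text.

    Bit 5 (counting from 1) is the copy/extract permission, so it is
    tested with the mask 1 << 4, which is 16.
    """
    return bool(p_value & (1 << 4))
```

For example, `copy_allowed(-4)` is true (all permissions granted), while `copy_allowed(-3904)` is false. A file with no `/Encrypt` entry that still refuses selection is more likely an image-only PDF.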

[This message was edited on 2003-01-16 09:17]



Céline Graciet
Local time: 15:31
English to French
TOPIC STARTER
thanks! Jan 16, 2003

Following some of your advice, I downloaded a freeware OCR program. OK, it didn't work (it wouldn't save my document as a Word doc), but it was good to try! It's called WebOCR and seems really good, if you can make it work...

dkalinic
Local time: 16:31
Croatian to German
+ ...
Abbyy FineReader works fine with PDF files Jan 16, 2003

You might try using Abbyy FineReader. It reads PDF files and exports them as Word documents. The graphics are preserved too.



Greetings,

Davor



monitor  Identity Verified
Local time: 16:31
English to German
+ ...
Abbyy is it!!! Jan 17, 2003

After the last comment I went to the bookstore, bought FineReader, and installed it on my notebook.

I took a 24-page corporate brochure in PDF and had it imported and extracted into Word 2000.

Wow!!! Never seen that before. Buy version 6.0 with that new feature and you are set, once and for all.

Marcel


Simona Oliva
France
Local time: 16:31
French to Italian
+ ...
click on a button Feb 10, 2003

Hi Celine,



This reply might come too late, but I just found a button in Acrobat Reader called "select a text" (there is a T and a small square on the right-hand side). If you click on it, you will be able to highlight the text you need, then right-click to copy it and paste it into a Word doc.

Hope it helps.

Simona



Matthew Coulson  Identity Verified
Albanian to English
+ ...
PDF tools Feb 11, 2003

pdf2txt converts the text of a PDF to a plain text file. This can be helpful, but you lose all formatting when doing this. It is fairly inexpensive at $38.00 for a license. There is a free trial as well. For more info see:

http://www.verypdf.com/pdf2txt/pdf2txt.htm



You can also use pstotext. It is a bit more difficult to use, so if you aren't very tech-savvy it probably isn't for you. You need to install GhostScript and GhostView (both free) on your system, then pstotext, and then run the extraction. This doesn't handle every type of PDF, but it will handle many of them. You can find out more about it at:

http://www.research.compaq.com/SRC/virtualpaper/pstotext.html
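
Once pstotext is installed, the extraction step could be scripted roughly as follows. This is a sketch in Python under several assumptions: the `pstotext` executable is on your PATH, it accepts the file name as its only argument, it writes the text to standard output, and that output is Latin-1 encoded. Check the tool's own documentation before relying on any of this; the helper names are made up for illustration.

```python
import shutil
import subprocess


def pstotext_command(pdf_path: str, exe: str = "pstotext") -> list:
    """Build the command line: the executable followed by the input file."""
    return [exe, pdf_path]


def extract_with_pstotext(pdf_path: str) -> str:
    """Run pstotext on one file and return whatever text it prints."""
    if shutil.which("pstotext") is None:
        raise FileNotFoundError("pstotext was not found on the PATH")
    result = subprocess.run(
        pstotext_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if pstotext fails
    )
    # Assumption: the output is Latin-1; adjust if your build differs.
    return result.stdout.decode("latin-1")
```

An empty result usually means the PDF has no text layer, in which case OCR is the only option.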



A list of other tools can be found at:

http://www.pdfzone.com/toolbox/toolfilter.html

This page tells you all you wanted to know about PDFs but would rather never have to learn.



Every tool that does a word count from a PDF, including Adobe Acrobat, has one weakness: a PDF can be nothing more than a scanned page without any OCR. Such a PDF is just a picture, so there is no way to extract a word count from it without running an OCR program yourself.
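
Whichever extraction tool you use, a quick sanity check on its output tells you whether you got real text or hit a scanned PDF. The sketch below, in Python, counts words in already extracted text and flags suspiciously empty results. The threshold of 5 words per page is an arbitrary assumption, and both function names are made up for illustration.

```python
import re


def word_count(text: str) -> int:
    """Count words, keeping internal apostrophes and hyphens in one word."""
    return len(re.findall(r"\w+(?:['’-]\w+)*", text))


def probably_needs_ocr(text: str, pages: int, min_words_per_page: int = 5) -> bool:
    """True if the extracted text is too sparse for the number of pages."""
    if pages <= 0:
        raise ValueError("pages must be a positive number")
    return word_count(text) < pages * min_words_per_page
```

A 24-page brochure that yields only a handful of words is almost certainly a picture-only PDF, and the word count for a quote will have to come from OCR output instead.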

