Extracting Text from a PDF File Thread poster: CHENOUMI (X)
| CHENOUMI (X) English to French + ...
Hi! What's the easiest way to extract a PDF file? Any tips will be much appreciated. TIA, Sandra | | | Acrobat or wordfast | Jul 31, 2003 |
IF the text can be extracted at all (sometimes it's not real text but an image, like a picture of the words), Wordfast (even the free trial version) will do the job quite nicely. So will Acrobat (not *Reader*), but personally I have always obtained better results with WF (check out the relevant options available in Pandora's Box). The quality and direct usability (without much reformatting) of the output depend on the formatting of the original text that has been pdf'ed. So sometimes there are just no quick ways to retrieve the text. And sometimes, ay que llorar (you just have to cry). HTH
PS Extraction (whatever the method) will lose the formatting of the text (less so if you enable OptimalPDF or something similar in Pandora's Box). If the format (or even time!) is an issue, you may be better off scanning the printed PDF page and running it through OCR.
CHENOUMI wrote: Hi! What's the easiest way to extract a PDF file? Any tips will be much appreciated. TIA, Sandra
[Edited at 2003-07-31 16:17] | | | With Adobe Acrobat (Full version) | Jul 31, 2003 |
Hello, with the full version, you can save a PDF as RTF or as text (if it was distilled from a native file format). If you have a scanned file (all the text is bitmap images), you can use OCR software. You may first have to convert your pages to TIF format (Save as TIF in the full version) if your OCR software cannot work with PDF files. Maybe it is easier to ask your client for the source files?! Hans | | | Andrzej Lejman Poland Local time: 23:25 Member (2004) German to Polish + ... Select, copy and paste | Jul 31, 2003 |
Dear Sandra, there are already a lot of threads on this topic. Look for earlier postings; it does not make much sense to keep discussing the same thing over and over. Regards Andrzej | |
In Adobe Acrobat Reader, under "Edit", choose "Copy File to Clipboard" or "Select All". Open a new word processing document and paste. The rest is all a matter of formatting: eliminating hyphens within words and the paragraph marks that appear at the end of every line, and eliminating running headers and/or footers. If the document has footnotes, there is more work to do. If you introduce an extra line break between every paragraph and list element, and before and after every title (if it's not there already), then you can eliminate all the superfluous line breaks in three easy steps.
1. Change all double line breaks to a unique string (###, say).
2. Change all single line breaks to a single space.
3. Change all instances of your unique string back to a double line break.
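For anyone who would rather script the three-step replacement than do it by hand in a word processor, here is a minimal Python sketch of the same idea. The function names (`unwrap_lines`, `join_hyphenated`) and the sample text are made up for illustration. The hyphen-joining step is an extra assumption on my part: it treats every hyphen at the end of a line as a soft hyphen, so it will also merge genuine compounds that happened to break there (e.g. "well-" / "known"), and the result needs a manual check.

```python
import re

PLACEHOLDER = "###"  # hypothetical marker; it must not occur in the text itself


def join_hyphenated(text: str) -> str:
    """Rejoin words split by a hyphen at the end of a line (assumed soft hyphens)."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def unwrap_lines(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Remove line-end breaks but keep paragraph breaks, using the three steps."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text; pick another one")
    # Normalise Windows and old Mac line endings to plain "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 1: turn every paragraph break (two or more line breaks) into the marker.
    text = re.sub(r"\n{2,}", placeholder, text)
    # Step 2: turn every remaining single line break into a single space.
    text = text.replace("\n", " ")
    # Step 3: turn the marker back into a double line break.
    return text.replace(placeholder, "\n\n")


if __name__ == "__main__":
    pasted = "The docu-\nment was pasted\nfrom a PDF.\n\nSecond para-\ngraph here."
    print(unwrap_lines(join_hyphenated(pasted)))
```

As in the manual method, this only works if paragraphs, list items and titles are already separated by a blank line before you run it; running headers, footers and footnotes still have to be removed by hand.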
[Edited at 2003-07-31 15:33] | | | Sorry, my post did double up... | Jul 31, 2003 |
...so I deleted the content of the second one!
[Edited at 2003-07-31 16:18] | | | Carlos Moreno Colombia Local time: 16:25 English to Spanish + ... Adobe Acrobat Reader 6.0 | Jul 31, 2003 |
The latest version of the free reader from Adobe, which has changed its name from Adobe Acrobat Reader to Adobe Reader 6.0, can help you. This program can read PDF documents as well as e-books, since it now incorporates the Adobe E-book reader, which used to be a separate program. It can also help you make PDFs for free, and even read books aloud! If the file you need is text, not an image of text, and document content extraction is allowed (you can check this by clicking the little arrow above the scroll bar), you can simply click "File - Save as Text" or "Edit - Copy to Clipboard". For now the Reader is available only in English (15 MB). The versions in other languages that appear on the download page refer to the older Acrobat 5.1. The download address is http://www.adobe.com/products/acrobat/readstep2.html To be clear, I do not work for Adobe or any of its subsidiaries. And enjoy your work, as I do mine! | | | achisholm United Kingdom Local time: 22:25 Italian to English + ... Some OCR programs | Jul 31, 2003 |
allow you to do this (one may have come bundled with your scanner). I like OmniPage, but FineReader 6 is also OK. | |
Nigel Skipper (X) Local time: 23:25 Swedish to English Freeware PDF 995 | Jul 31, 2003 |
If you use a PC, this freeware is a very useful item to have. It allows you to create PDFs from inside existing applications, and the editing version, PDFedit 995, allows you to extract text from an existing PDF to a Word or text file. You can download it free of charge from www.pdf995.com //Regards, Nigel | | | Lia Fail (X) Spain Local time: 23:25 Spanish to English + ... Various methods | Aug 1, 2003 |
I have recently been downloading PDF material and, since I have limited funds, have used a couple of possibilities.
1. If the PDF document toolbar has a little T on it, click on this and it will allow you to copy; then select the entire text all at once (scrolling down) or page by page.
2. If this doesn't work, there is an online conversion facility. See this site: http://www.adobe.com/products/acrobat/access_email.html
You may want to clean up the text and save it as Word, although you will lose a lot of the features of the original format. Use the find & replace function to remove unwanted paragraph breaks and white spaces. Finally, watch out for headings, tables, boxes and similar inserts, as these are relocated arbitrarily in the copied version. You will need to check that no sentences or words are cut off. And sometimes excess hyphenation may appear, which you will also have to correct manually. | | | CHENOUMI (X) English to French + ... TOPIC STARTER Thank you All! | Aug 3, 2003 |
Thanks to each one of you for your time and tips! I'm used to using the Editing tool from Acrobat, cutting and pasting the text, then reconverting it into PDF format. To Muja, DSC, GoodWords, Carlos, Alexander, Nigel, Ailish: I'll certainly put your advice and recommendations to good use next time. Alexander: I have used the OCR option not for PDF files but in Word. Since I rarely receive requests for extracting PDF files, I'm faced with the decision whether or not to purchase the whole Adobe Acrobat program now. Have a nice week, and thank you again! S.:)
[Edited at 2003-08-03 21:12]