https://www.proz.com/forum/translator_resources/118207-can_you_recommend_a_pdf_converter_and_or_ocr.html

Can you recommend a PDF converter and/or OCR?
Thread poster: Pristine
Pristine
Local time: 12:00
English to German
Oct 16, 2008

I need it mainly for English and German documents.

The program should
1) Convert PDF documents into Word or Wordpad files.

The OCR should
1) Read German
2) Read English

And they should not cost an arm and a leg.

Any links to freeware and shareware would be nice but I looked and have not found anything yet.

Thanks in advance!

Kindly,

Pristine


 
Kevin Lossner  Identity Verified
Portugal
Local time: 19:00
German to English
Try using the search function in the forums Oct 16, 2008

There is a lot of information available for those who take a few seconds to look.

For a QA checklist for documents converted by OCR, take a look on the "How To" tab of my profile. There is a link there titled "Post-processing of OCR text files".


 
Roberto Bertuol  Identity Verified
United Kingdom
Local time: 19:00
Member (2007)
Italian to English
PDF to Word converter Oct 16, 2008

Hi,

Here is a link to the Able2Doc Professional converter:
http://www.investintech.com/order_a2d_pro.htm
It costs $69.96 and has an OCR reader, though I'm not sure about German.
It works fine for me, although it really depends on the quality of the PDF document, i.e. whether it is a scanned document or a Word document converted into PDF.
Hope it helps.


 
Rimma Kehr
Germany
Local time: 20:00
German to English
PDF to any Text Tool (Word, FrameMaker, PageMaker, InDesign, etc.) Oct 16, 2008

If you have Adobe Illustrator, you can open your PDF file, then save every page in *.ai or *.eps format. Then you can copy the text and paste it into your text tool.
If you have Adobe Reader 8.0 or 9.0, you can copy the text and paste it into any text tool.

Hope this helps.

Rimma


 
Tomas Forro  Identity Verified
Poland
Local time: 20:00
English to Slovak
pdf to word never works well Oct 16, 2008

Hi,
I've tried all kinds of PDF-to-Word converters, both cheap and expensive, and they never work very well.
With plain text, usually only a few formatting corrections are needed (but that usually works even without a converter, with simple drag & drop).
However, the more complicated the original formatting, the worse the conversion result. The very worst are pictures and text over graphical elements.
What I would suggest is to find some freeware converters, or ones with a free trial period, uninstall them when they expire, then try the next one, and so on. I think there is really no point in investing in any "professional" converter at this stage.

Or is any of you using a PDF-to-Word converter that you can say is really, really good?

For freeware converters, simply google words like "pdf to doc download free" or something similar and you'll get plenty of them.

As for OCR, I used ABBYY FineReader a couple of years ago and it was an absolutely excellent tool for all the languages. I'd usually get about 97% accuracy in the converted text, even from poor-quality copies and books.
The link for ABBYY:
http://finereader.abbyy.com/
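
For anyone wanting to check an accuracy figure like this on their own documents, character accuracy can be computed by comparing raw OCR output against a hand-corrected reference. The following is only a minimal sketch and not how FineReader or any other product reports accuracy: the function names `levenshtein` and `character_accuracy` are hypothetical, and defining accuracy as 1 minus the edit distance divided by the reference length is an assumption (one common convention among several).

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Assumed definition: 1 - edit_distance / len(reference), clamped at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Illustrative strings only; substitute a corrected sample page and its OCR output.
    reference = "Die Prüfung der Unterlagen"
    ocr_output = "Die Prufung der Unterlagcn"
    print(f"{character_accuracy(reference, ocr_output):.1%}")
```

Running the same corrected sample page through each trial version gives a rough like-for-like comparison between products.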


 
Tomas Forro  Identity Verified
Poland
Local time: 20:00
English to Slovak
Actually, the latest ABBYY OCR converter has built-in pdf2word, too Oct 16, 2008

It's 150 euros, but these guys really know what they're doing (in OCR technology; for PDF, well, who knows?)

 
Viktoria Gimbe  Identity Verified
Canada
Local time: 14:00
English to French
Price shouldn't be an issue Oct 16, 2008

I use Nuance OmniPage Pro. It reads German, among many other languages. I know, it is the most expensive of all such software - but considering the return on investment and the amount of time it helps you to save (which, of course, depends on usage), it is worth every penny.

When looking for a tool, I don't think it is so important to watch the price. If you consider investing money into something that will help boost your productivity, that already means that the tool in question will put money in your pocket. So the couple hundred dollars difference isn't an issue, in my opinion.


 
José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 15:00
English to Portuguese
In memoriam
Another approach Oct 16, 2008

If you need it more to preserve formatting while editing than to use CAT tools, have a look at this one: http://www.iceni.com/infix.htm . It allows you to actually edit PDF files.

 
Allesklar  Identity Verified
Australia
Local time: 03:30
English to German
PDF Converter 4 Oct 17, 2008

I use PDF Converter 4 for English and German texts as well as Infix, when it's not a scanned document and preserving the formatting is important, as José mentioned.

PDF Converter is not ideal for poor-quality scans, but it works well enough for most of what I get. I tried the demo versions of ABBYY and OmniPage one or two years ago and wasn't that impressed, so I went for the budget tool. Maybe I should have another look at them.


 
achisholm
United Kingdom
Local time: 19:00
Italian to English
Not all PDF files are the same Oct 17, 2008

Many PDF "converters" just capture the text in a PDF and make it available for other uses - rather like using the text selection tool "|" in Acrobat and copying to the clipboard, only a bit more sophisticated.

Unfortunately, many of the PDF files I work with are graphics files, i.e. PDFs produced from a scanned image. This is the method typically used by EU offices and regulatory bodies to store the documents submitted to them. Hence, these files don't contain any text to be converted.

The only way to deal with such files is to OCR the images. This is why I prefer to use OCR software for this type of task.
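
Whether a file falls into this category can be checked before choosing a tool. A minimal sketch, assuming the text of each page has already been pulled out with whatever PDF text extractor is at hand (that step is not shown here): pages that yield almost no characters are probably scanned images and need OCR. The function name `pages_needing_ocr` and the threshold of 20 characters are illustrative assumptions, not part of any product.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based numbers of pages whose extracted text is too short
    to be a real text layer (assumed threshold: min_chars letters/digits)."""
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Count only letters and digits so stray whitespace or form feeds
        # left behind by an extractor are not mistaken for real text.
        visible = sum(1 for ch in (text or "") if ch.isalnum())
        if visible < min_chars:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    # Hypothetical extractor output: page 1 has a text layer, pages 2 and 3 do not.
    sample = ["This page has a proper text layer with words.", "", " \n\x0c"]
    print(pages_needing_ocr(sample))
```

A document where every page is flagged is almost certainly a pure scan; a mixed result suggests only some pages need to go through OCR.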

I currently use OmniPage 16, although I have used FineReader in the past, and both give good results. Recognition accuracy is high, the capture language can be selected, and spelling can be checked in the desired language.

Hope this helps.


 
Pristine
Local time: 12:00
English to German
TOPIC STARTER
PDF converter and OCR Oct 17, 2008

Thanks to all of you for your kind responses and advice.

Best regards,

Pristine


 
Oleksandr Ivanov  Identity Verified
Ukraine
Local time: 21:00
Member (2008)
English to Ukrainian
A nice tool from ABBYY (PDF Transformer 2.0) Oct 17, 2008

Alexander Chisholm wrote:
Unfortunately, many of the PDF files I work with are graphics files, i.e. PDFs produced from a scanned image. This is the method typically used by EU offices and regulatory bodies to store the documents submitted to them. Hence, these files don't contain any text to be converted.

The only way to deal with such files is to OCR the images. This is why I prefer to use OCR software for this type of task.


It is exactly for this reason that I use PDF Transformer 2.0 from ABBYY. It lets you process PDF files as either text or scanned images and converts the output into RTF or XLS format (although it gives the RTF file a DOC extension, which I find a bit misleading). It also lets you choose the areas to convert (three different area types: text, table or image). It is relatively cheap (I bought mine for USD 30 two years ago). It does not put paragraph marks or line breaks at the line ends within paragraphs. You can also choose from a number of languages for the output file (almost all EU languages, Russian, Ukrainian, Turkish and Kurdish).
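
For output from tools that do leave a line break at the end of every line, the breaks can be removed before the text goes into a CAT tool. A minimal sketch, assuming plain-text output in which blank lines separate paragraphs; `unwrap_paragraphs` is a hypothetical helper, and its hyphen rule is a rough heuristic that can wrongly join genuine hyphenated compounds split across lines, so the result still needs a read-through.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join the lines inside each paragraph; blank lines still separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for paragraph in paragraphs:
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        merged = ""
        for line in lines:
            if merged.endswith("-") and line[:1].islower():
                # Likely a word hyphenated at the line end: rejoin its two halves.
                merged = merged[:-1] + line
            elif merged:
                merged += " " + line
            else:
                merged = line
        joined.append(merged)
    return "\n\n".join(joined)


if __name__ == "__main__":
    # Illustrative input only.
    wrapped = "Die Unter-\nlagen wurden\ngeprüft.\n\nSecond paragraph\nhere."
    print(unwrap_paragraphs(wrapped))
```

Keeping one paragraph per line matters mostly for segmentation: a sentence broken by a hard return is otherwise split into two segments.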


 
Pablo Bouvier  Identity Verified
Local time: 20:00
German to Spanish
Nuance PDF Converter Oct 18, 2008

IMHO, good PDF converters do not exist. They more or less satisfy your needs, but nothing more.

And if you are a translator, you should take into account that CAT tools not based on MS Word will read all the rogue codes remaining in your converted documents. Segmentation and propagation will suffer too because of these rogue codes.

My first choice would be Nuance PDF Converter:

http://www.nuance.com/pdfconverter/

The main reason is that it has a built-in text OCR that can read text in graphical formats like BMP, which other PDF converters cannot. Formatting is also preserved quite well in layouts that are not too complex.

Pricing:

https://www.nuancestore.com/dr/sat5/ec_MAIN.Entry11?SP=10034&PN=0&xid=19198&trackingid=view-quickbuy&CUR=840

You can download a free 30-day trial version to test it.

http://www.nuance.com/pdfconverter/trial/spectrum/

Take into account that it is not a production tool: some pages won't be converted or will have a watermark.

My second choice would be SolidPDFConverter.

http://www.soliddocuments.com/products.htm?product=SolidConverterPDF

It has no built-in text OCR, but its real page conversion preserves the formatting impressively well. The problem: MS Word does not read text in text boxes. This issue can easily be overcome with a werecat macro: http://www.volny.cz/ddaduc/werecat.html


 
Per Magnus  Identity Verified
Local time: 20:00
English to Norwegian
Price and quality Oct 18, 2008

Viktoria Gimbe wrote:
I use Nuance OmniPage Pro. I know, it is the most expensive of all such software - but considering the return on investment and the amount of time it helps you to save (which, of course, depends on usage), it is worth every penny.


I agree that price is not the main issue here, but results are. I have used Nuance OmniPage Pro and PDF Converter Pro for several years, and I am not that impressed with their results. I have, however, heard a lot of praise for ABBYY FineReader. I have also noticed that there are great differences between the different versions of all the products discussed here.

Does anybody know of an independent test of the relevant products? There should be something out there; personally, I encounter PDF files several times a month.


 
Tom in London
United Kingdom
Local time: 19:00
Member (2008)
Italian to English
If you're going to spend money Oct 18, 2008

... why not go the whole hog and purchase Adobe Acrobat Professional? It's a good investment if you foresee that you'll be working on lots of pdf files.

It offers numerous useful functions including OCR, save to Word etc.


 