Converting PDF to Word - best tool?
Thread poster: John Fossey

John Fossey  Identity Verified
Canada
Local time: 04:10
Member (2008)
French to English
Nov 18, 2016

Is there a PDF to Word conversion tool that does a good job, including OCR, correct layout, etc.?

I have been using ABBYY PDF Transformer 2.0 for years. It allows you to specify which part of the page is text, table or image and you can tell it to put text on top of an image. But it has been unavailable for new installs for some years.

ABBYY suggested I upgrade to 3.0 but I found it not very good.

Microsoft Word 2016 claims to convert PDF to Word, but I found the results completely useless.

Surely there must be some new and improved technology?



Bernhard Sulzer  Identity Verified
United States
Local time: 04:10
English to German
+ ...
Thoughts Nov 18, 2016

John Fossey wrote:

Is there a PDF to Word conversion tool that does a good job, including OCR, correct layout, etc.?

I have been using ABBYY PDF Transformer 2.0 for years. It allows you to specify which part of the page is text, table or image and you can tell it to put text on top of an image. But it has been unavailable for new installs for some years.

ABBYY suggested I upgrade to 3.0 but I found it not very good.

Microsoft Word 2016 claims to convert PDF to Word, but I found the results completely useless.

Surely there must be some new and improved technology?


This is not really a direct answer, but from experience, also involving CAT tools, if I feel I need a Word version, I ask the client to provide it (when they are an agency). That's a safeguard in case things go awry later when it comes to replacing text or using that Word file in a CAT tool. I have had quite a few bad experiences with converted PDF files.



Tom in London
United Kingdom
Local time: 09:10
Member (2008)
Italian to English
I agree with Bernhard Nov 18, 2016

Bernhard Sulzer wrote:

This is not really a direct answer, but from experience, also involving CAT tools, if I feel I need a Word version, I ask the client to provide it (when they are an agency). That's a safeguard in case things go awry later when it comes to replacing text or using that Word file in a CAT tool. I have had quite a few bad experiences with converted PDF files.


My experience is the same as Bernhard's. The best conversion tool is... the client!

I became convinced of this after one particularly nasty conversion job that threw the pagination of a long illustrated document into complete chaos. My translation was very good (as I think it always is) but I just could not fix the pagination and wasted a lot of time trying to do that.

There ensued an almighty row with that client and no more work from them for many months. Eventually they came back to me because I always do a good job on the translations, but now I always insist that the client provide me with a Word conversion and, very importantly, I check through the conversion before accepting the job.


wotswot  Identity Verified
France
Local time: 10:10
Member (2011)
French to English
Try Nuance or Solid PDF Tools Nov 18, 2016

Both these do quite a decent job for what I call "clean" PDFs, i.e. PDFs created by Adobe software or by Word 2013/2016 (Save as, PDF).
But in my experience none of these tools do an acceptable job for "dirty" PDFs, i.e. scans, photocopies, etc.



John Fossey  Identity Verified
Canada
Local time: 04:10
Member (2008)
French to English
TOPIC STARTER
Client's conversion usually just as bad Nov 18, 2016

Well, the problem I frequently have is that the client's conversion is just as bad, so I end up reverting to PDF Transformer 2.0 to get a decent conversion. I was hoping there was progress in newer tools.

It seems that modern conversion tools don't let you set the job up, but depend on automatically formatting the output, which the tool inevitably gets wrong.



Tony M  Identity Verified
France
Local time: 10:10
Member
French to English
+ ...
My own experience Nov 18, 2016

I often receive PDF > DOC conversions, and most agencies are delighted with the facsimile formatting, but don't appreciate all the nightmarish 'fudges' used to achieve it - until I send them back my translation and the bill!

I recently had a need to do this myself, so I thought I'd experiment.

First of all I tried 2 of the free online services; the first one yielded results that were worse than useless - the amount of additional work needed would have been more than re-creating the original document from scratch.
The second one couldn't handle my large document (51 pages) and simply got stuck... I left it running for about an hour, but to no avail.
Another online service offered a free trial, but limited to a file size too small for my immediate needs.

I then downloaded a trial version of the Nitro software (fully functional, for 14 days), and so far, the results seem promising. I opted for the 'middle road': partially formatted text supplied in a column, but without attempting to create a facsimile of the original layout. My 2 large documents took some time to process on my ancient slow PC (I think it was built by Brunel!), but it did get there in the end. I didn't find it terribly intuitive to use; that said, I did manage to do what I wanted without needing to read the instructions!
The results were not bad at all; most of the OCR was spot on, even on pages where the text was seriously askew; the partial formatting was not a lot of help, but also not too much of a hindrance. The worst thing for me was that it used multiple spaces to simulate justification, which plays havoc later with CAT; it DOES enable you to get rid of line returns, and I was able to do a few global search-and-replace passes to put the most obvious things right, as well as globally changing the font, spacing, etc.

Overall, I'm very pleased with it, and may well end up buying this one.
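
For anyone who prefers to script the kind of global search-and-replace cleanup described above (runs of spaces used to fake justification, hard line returns inside paragraphs), here is a minimal sketch in Python. It is only an illustration, not something Nitro or any other tool mentioned in this thread provides. The function name `clean_converted_text` is made up, and it assumes the converted text has been saved as plain text with blank lines separating paragraphs. It does not try to rejoin words hyphenated across line breaks.

```python
import re


def clean_converted_text(text: str) -> str:
    """Collapse fake-justification spaces and join hard line returns.

    Hypothetical helper: assumes paragraphs are separated by blank lines.
    """
    # Normalise Windows/Mac line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split on blank lines so that real paragraph breaks are preserved.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for para in paragraphs:
        # Join the hard line returns inside one paragraph with a space.
        joined = " ".join(line.strip() for line in para.split("\n"))
        # Collapse runs of spaces, tabs and non-breaking spaces to one space.
        joined = re.sub(r"[ \t\u00a0]+", " ", joined).strip()
        if joined:
            cleaned.append(joined)

    # Put the paragraphs back together with a single blank line between them.
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "The  quick   brown\nfox  jumps.\n\nNext    paragraph."
    print(clean_converted_text(sample))
```

Running the cleanup on a plain-text copy before loading the file into a CAT tool keeps the original conversion untouched in case the result needs checking against it.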



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 09:10
Member (2009)
Dutch to English
+ ...
ABBYY FineReader 12 Nov 18, 2016

The absolute best is: ABBYY FineReader 12. I've tried and tested them all.

I don't really think asking the client is a good solution, because every time I tried that in the past, they supplied me with a worse job than I could have done myself.

(if using ABBYY FineReader 12, remember to manually scan through the file in the program, and select the fields, such as Table, Text, Image: this will really help improve the final quality)



John Fossey  Identity Verified
Canada
Local time: 04:10
Member (2008)
French to English
TOPIC STARTER
Thanks for the info Nov 18, 2016

Michael Joseph Wdowiak Beijer wrote:

(if using ABBYY FineReader 12, remember to manually scan through the file in the program, and select the fields, such as Table, Text, Image: this will really help improve the final quality)


That's what I was hoping to hear - the manual selection of layout is what most of the other programs miss.



Samuel Murray  Identity Verified
Netherlands
Local time: 10:10
Member (2006)
English to Afrikaans
+ ...
Hmm... Nov 18, 2016

John Fossey wrote:
Is there a PDF to Word conversion tool that does a good job, including OCR, correct layout, etc.?


If you want OCR, then... no, can't help. But if the PDF is editable, then try Trados 2015 or Wordfast Pro 4. I was pleasantly surprised recently to discover that their conversions to Word/RTF were very good. In both cases, simply load the PDF as a translatable, and the CAT tool will generate a DOC/X somewhere.

I also recall earlier versions of OCR software allowed me to select boxes myself, but the latest versions all auto-select, and although I can then adjust the boxes, I can no longer specify the sequence in which the boxes are read/saved.



Bernhard Sulzer  Identity Verified
United States
Local time: 04:10
English to German
+ ...
More thoughts Nov 18, 2016

Michael Joseph Wdowiak Beijer wrote:

The absolute best is: ABBYY FineReader 12. I've tried and tested them all.

I don't really think asking the client is a good solution, because every time I tried that in the past, they supplied me with a worse job than I could have done myself.

(if using ABBYY FineReader 12, remember to manually scan through the file in the program, and select the fields, such as Table, Text, Image: this will really help improve the final quality)


Asking the client for the Word file has one reason: not to get blamed if I screw up because I used my own file.
Have I converted files myself? Yes, I have.
Did I have a lot of great experiences? Not really.
It will depend on the structure and formatting of the PDF file and on the software used to create the original file from which the PDF was made (1st conversion), before that PDF is converted again to Word (2nd conversion). There are certainly files that can be converted more easily, but there are many that, once converted from PDF, are hardly manageable in a CAT tool and will hardly allow you to create another PDF file for the client that looks like the one you received from them.

I personally don't really depend on a conversion tool - and if I did, I would charge the client for using it.



Artem Vakhitov  Identity Verified
Estonia
English to Russian
+ ...
ABBYY PDF Transformer 2.0 or new-ish FineReader Nov 18, 2016

FineReader is good if you need, for example, to correct a skewed original or set more complex image recognition options. I have FR 11 Pro and it produces good results, but sometimes, sadly, worse than PDF Transformer 2.0. So I personally need them both. In addition, I like PDF Transformer's GUI better in that it's uncluttered. Some things are best addressed by directly copying and pasting from the PDF file if it's not a scanned one.


Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 11:10
Member (2008)
English to Russian
+ ...
Only FR + your head will work well Nov 18, 2016

I have been using ABBYY PDF Transformer 2.0 for years.

This tool is just the "Scan&Read" button of the bigger product, FineReader. It is fully automatic and it is not always smart, of course.

Get the full version of FineReader. Manual segmentation is much better, especially with tables and rich formatting: you can change options and methods and see the result on the fly.

It's like with kids — worth doing yourself.

[Edited at 2016-11-18 20:39 GMT]



Robert Rietvelt  Identity Verified
Local time: 10:10
Member (2006)
Spanish to Dutch
+ ...
What do you need it for? Nov 18, 2016

I haven't got the solution, but when I receive a PDF file, I can't really use it in Studio: you can import it, but the results are horrible. What I do (possible with most PDFs I receive) is copy the text (and sometimes the pictures) and paste it into Word. The results are reasonable and, above all, workable! Works for me.

Hence my question: what do you need it for?



Tony M  Identity Verified
France
Local time: 10:10
Member
French to English
+ ...
Impossible with scanned 'image' PDF files Nov 18, 2016

Robert Rietvelt wrote:

What I do is copy the text and paste it into Word.


That's fine when it is a PDF created directly from a native-format file.

The problem we are discussing here is when the PDF originates from a scanned document — i.e. is in the form of an image — where the only solution is to process it using OCR; or else to re-create the entire document manually from scratch!

Many people — myself certainly included! — feel more comfortable working with an editable document in the source language, into which we can enter our target translated text, deleting as we go.

Or of course we might want to process it using a CAT tool, which naturally requires access to the source text in order to be able to function.



Robert Rietvelt  Identity Verified
Local time: 10:10
Member (2006)
Spanish to Dutch
+ ...
That is why I said ..... Nov 18, 2016

Tony M wrote:

Robert Rietvelt wrote:

What I do is copy the text and paste it into Word.


That's fine when it is a PDF created directly from a native-format file.

The problem we are discussing here is when the PDF originates from a scanned document — i.e. is in the form of an image — where the only solution is to process it using OCR; or else to re-create the entire document manually from scratch!

Many people — myself certainly included! — feel more comfortable working with an editable document in the source language, into which we can enter our target translated text, deleting as we go.

Or of course we might want to process it using a CAT tool, which naturally requires access to the source text in order to be able to function.


.... most PDFs I receive.

Apart from those, I have never come across a tool that converts a PDF 1-to-1 correctly.

[Edited at 2016-11-18 22:09 GMT]

