The best way to convert scanned PDF to Microsoft Word
Thread poster: Karolina Petkuviene

Karolina Petkuviene  Identity Verified
Lithuania
Local time: 16:42
English to Lithuanian
+ ...
Oct 19, 2012

I have ABBYY FineReader 9.0, but I am not satisfied. Formatting the text after conversion takes too much time. Could somebody suggest better software? Thank you in advance.

 

564354352  Identity Verified
Denmark
Local time: 15:42
Danish to English
+ ...
Latest version of Adobe Acrobat Oct 20, 2012

Apparently, Adobe Acrobat XI, which has just been released, can convert PDF files directly into Word, Excel and PowerPoint.

I've seen their demo, and it looks excellent. Expensive, but what a time-saver.

[Edited at 2012-10-20 06:05 GMT]


 

Tony M  Identity Verified
France
Local time: 15:42
Member
French to English
+ ...
Previous discussions Oct 20, 2012

I think you'll find that this subject has been discussed at some length in previous threads, and I feel you may find some helpful comments there.

I can't say I have very wide experience of different software, but it is my highly personal observation that ABBYY does an extremely good job at the character recognition part.

However, as far as document formatting is concerned, it suffers from the same problem that is surely inevitable with any such program: there is no way it can intelligently determine what the original document formatting was.

Generally, and depending on the exact format of the original document, I find the best solution is to choose the conversion option that attempts the least formatting possible. It is then relatively easy to re-apply the original document formatting manually at the end, depending of course on your customer's requirements. Very often, my customers are happy to receive plain text, which can then be readily re-formatted by a skilled word-processing operator.

[Edited at 2012-10-20 12:02 GMT]


 

V S Rawat
India
Local time: 19:12
English to Hindi
+ ...
Gitte! it was about image to text conversion, not text to text formatting. Oct 20, 2012

Gitte Hovedskov Hansen wrote:

Apparently, Adobe Acrobat XI, which has just been released, can convert PDF files directly into Word, Excel and PowerPoint.

I've seen their demo, and it looks excellent. Expensive, but what a time-saver.

[Edited at 2012-10-20 06:05 GMT]


Hi Gitte Hovedskov Hansen,

Adobe converts only text PDF files to other formats. Adobe doesn't do OCR to convert text from images to text files, and I think that is what the O.P. is asking about. ABBYY is OCR software.

In any case, if you open a text PDF file in any PDF reader, you can simply select the entire text and copy it into Word or any text editor, provided the PDF file is not protected, so that is not an issue.

Thanks.
--
Rawat


 

neilmac  Identity Verified
Spain
Local time: 15:42
Spanish to English
+ ...
Two methods Oct 20, 2012

I use SolidConverter (mine is version 4) for converting PDFs into Word and vice versa. I find it very good and easy to use for most things, but it doesn't work with some scanned texts. You can download a free demo version for evaluation here:
http://www.soliddocuments.com/pdf/-to-word-converter/304/1

I've also just acquired a more powerful OCR program (OmniPage), which my colleague says is very good for converting to and from PDF, even scanned ones, but I haven't had the chance to try it out yet.


 

Agnes Lenkey  Identity Verified
German to Spanish
+ ...
Previous thread Oct 20, 2012

Hi mairapt,

Here is one of the previous threads about this issue:

http://www.proz.com/forum/software_applications/232135-converting_pdf_files_to_word_advice_needed.html

And here is another one:

http://www.proz.com/forum/software_applications/234410-rotating_a_pdf.html

Best regards,

Agnes


 

Siegfried Armbruster  Identity Verified
Germany
Local time: 15:42
Member (2004)
English to German
+ ...
I am happy with Finereader 8 Oct 20, 2012

mairapt wrote:
I have ABBYY FineReader 9.0, but I am not satisfied. Formatting the text after conversion takes too much time.


Are you using the software optimally, i.e. not using autorecognition for the format but telling the software which parts are text, which parts are tables, which parts are images, etc.?

FineReader 8 gives very good results when you tell the software what to do and how to handle the text/formatting. Afterwards there is relatively little work to do in Word, just general adaptation of margins and font/character spacing.

I have converted very large documents this way and always found the result OK. However, if you tell your client that you will charge for the time you need to convert the scanned PDF, you will find that very often they find the original in a format you can process directly.



[Edited at 2012-10-20 09:53 GMT]


 

Karolina Petkuviene  Identity Verified
Lithuania
Local time: 16:42
English to Lithuanian
+ ...
TOPIC STARTER
Hello Siegfried Oct 20, 2012

I am using autorecognition, and then I have to recheck all the text and make many changes. Now I have over 100 pages.

 

Heinrich Pesch  Identity Verified
Finland
Local time: 16:42
Member (2003)
Finnish to German
+ ...
Consider typing Oct 20, 2012

If conversion gives you bad results and editing the file takes too much time, you could just type the translation into Word while reading from the PDF. For long files, using a typist could be cost-effective. S/he would type the source text and you could translate it using your favorite tool.

 

Tom in London
United Kingdom
Local time: 14:42
Member (2008)
Italian to English
Not the translator's job Oct 20, 2012

Siegfried Armbruster wrote:

.....if you tell your client that you will charge for your time you need for converting the scanned PDF, you will find that very often they find the original in a format you can process directly.


Precisely. The translator's job is to translate. Not anything else.

If you're given a PDF to translate, just make sure you also deliver your finished translation as a PDF.

That'll teach 'em!

[Edited at 2012-10-20 09:32 GMT]


 

Alexander Chisholm  Identity Verified
Italian to English
+ ...
Abbyy Finereader is the best I've tried so far Oct 20, 2012

and it will handle PDF files prepared from scanned files (the vast majority of the work I am given).

I use Build 8. onwards (Mac) and it is fairly fast and very reliable.
The only problem may be formatting with very large documents.
The formatting the program outputs is pretty good, but in my experience, if the text runs over a page, the formatting will be screwed up when you translate sentences crossing a page.
It's very often worthwhile doing some pre-translation "clean-up" to avoid this problem and basically give yourself an uncluttered body of text to translate, and if this is too time-consuming then that's the price you pay. I still think it worthwhile because I can then still use Trados etc., which means I have access to TMs for speed and consistency.
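
As a rough illustration of what such a pre-translation clean-up can look like when the OCR result is exported as plain text, here is a minimal Python sketch. It is not a feature of FineReader or Trados; the function name clean_ocr_text and the assumption that page breaks show up as form-feed characters are purely illustrative, and the rules are simple heuristics that will need adjusting for a real document.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Heuristically join lines and pages that OCR split mid-sentence.

    Hypothetical helper for illustration only; it assumes a plain-text
    export in which page breaks appear as form feeds ("\f").
    """
    # Treat page breaks as ordinary line breaks.
    text = raw.replace("\f", "\n")
    # Re-join words hyphenated across a line break: "for-\nmatting" -> "formatting".
    # Caveat: a genuine hyphen at a line end ("time-\nconsuming") is also removed.
    text = re.sub(r"(\w)-\n+(\w)", r"\1\2", text)
    # Join a break that does not follow sentence-ending punctuation and is
    # followed by a lowercase letter, i.e. a sentence that simply ran on.
    text = re.sub(r"(?<![.!?:\n])\n+(?=[a-z])", " ", text)
    # Collapse runs of spaces and tabs left over from the layout.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    sample = (
        "The formatting is pretty good, but the text\n"
        "\fruns over a page and the for-\nmatting breaks.\n"
        "Next paragraph."
    )
    print(clean_ocr_text(sample))
```

The idea is only to hand the CAT tool whole sentences instead of fragments cut at line and page ends; headings, tables and lists would still need checking by eye.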

Tom, I understand your point, but if the PO says:
Hand-off - PDF file
Delivery - Word file

then you accept that when you accept the job - basic and pragmatic fact of life.


 

KawhiLau
United States
PDFelement for Mac - Powerful Tool to Edit / Convert / Create PDFs Feb 1

PDFelement for Mac - Powerful Tool to Edit / Convert / Create PDFs (including scanned PDF files)

youtu.be/WRH6Q3oBqks


 

VIP9N
Local time: 16:42
Russian to English
+ ...
Wondershare PDF element is not a pure OCR tool Feb 1

KawhiLau wrote:
PDFelement for Mac - Powerful Tool to Edit / Convert / Create PDFs (including scanned PDF files)
youtu.be/WRH6Q3oBqks


This application has the OCR option as one of its features (https://pdf.wondershare.com/pdfelement-mac/). Thus, it cannot "beat" a pure OCR programme.

Another thing is that there are versions for other operating systems, including Windows.

However, the question put by the topic starter has remained the same on all translators' web forums for decades. People always want to get a fully equivalent, editable document from formats whose nature is totally different, including PDF. They are unhappy to learn that the best way with any OCR software is to export clean, unformatted text and do some basic formatting themselves. They want to get it ready to go for their CAT tools.

Therefore, after OCR-ing, they export the results as so-called "resembling" or "similar" text. Sometimes that is enough. But in 99% of cases we then see ten more questions here about how to get rid of hundreds of "rogue" codes in the document being translated, or why the sentences are chopped, or why the text is "drifting" from its intended position, or what happened to the original fonts, or whatever else.

It is strange to read those questions: is it so difficult to understand that each and every OCR programme uses its own invisible codes to build the resemblance of an OCR-ed document? Today there are plenty of such applications on the market, which are more or less good at OCR-ing, but as long as the principle remains the same, there will be codes and an approximate similarity.

Good luck.


 

Diana Obermeyer  Identity Verified
United Kingdom
Local time: 14:42
Member (2013)
German to English
+ ...
Nitro Feb 1

At least it doesn't create random text boxes and typically leaves tables intact.

 

