https://www.proz.com/forum/office_applications/235631-the_best_way_to_convert_scanned_pdf_to_microsoft_word.html

The best way to convert scanned pdf to Microsoft word
Thread poster: Karolina Petkuviene
Karolina Petkuviene  Identity Verified
Lithuania
Local time: 14:43
English to Lithuanian
+ ...
Oct 19, 2012

I have ABBYY FineReader 9.0, but I am not satisfied. Formatting the text after conversion takes too much time. Could somebody suggest better software? Thank you in advance.

 
564354352 (X)  Identity Verified
Denmark
Local time: 13:43
Danish to English
+ ...
Latest version of Adobe Acrobat Oct 20, 2012

Apparently, Adobe Acrobat XI, which has just been released, can convert PDF files directly into Word, Excel and PowerPoint.

I've seen their demo, and it looks excellent. Expensive, but what a time-saver.

[Edited at 2012-10-20 06:05 GMT]


 
Tony M
France
Local time: 13:43
Member
French to English
+ ...
SITE LOCALIZER
Previous discussions Oct 20, 2012

I think you'll find that this subject has been discussed at some length in previous threads, and I feel you may find some helpful comments there.

I can't say I have very wide experience of different software, but it is my personal observation that ABBYY does do an extremely good job at the character recognition part.

However, as far as document formatting is concerned, it suffers from the same problem that is surely inevitable with any such program: there is no way it can intelligently determine what the original document formatting was.

Generally, and depending on the exact format of the original document, I find the best solution is to choose the conversion option that attempts the least formatting possible; it is then relatively easy to re-apply the original document formatting manually at the end — depending of course on your customer's requirements; very often, my customers are happy to receive plain text which can then be readily re-formatted by a skilled word-processing operator.

[Edited at 2012-10-20 12:02 GMT]


 
V S Rawat
India
Local time: 17:13
English to Hindi
+ ...
Gitte! it was about image to text conversion, not text to text formatting. Oct 20, 2012

Gitte Hovedskov Hansen wrote:

Apparently, Adobe Acrobat XI, which has just been released, can convert PDF files directly into Word, Excel and PowerPoint.

I've seen their demo, and it looks excellent. Expensive, but what a time-saver.

[Edited at 2012-10-20 06:05 GMT]


Hi Gitte Hovedskov Hansen,

Adobe converts only text PDF files to other formats. Adobe doesn't do OCR to convert text in images into text files, and I think that is what the o.p. is asking about. ABBYY is OCR software.

In any case, if you open a text PDF file in any PDF reading software, you can just select the entire text and copy it to Word or any text editor, provided the PDF file is not protected, so that is not an issue.

Thanks.
--
Rawat


 
neilmac
Spain
Local time: 13:43
Spanish to English
+ ...
Two methods Oct 20, 2012

I use SolidConverter (mine is version 4) for converting PDFs into Word and vice versa. I find it very good and easy to use for most things, but it doesn't work with some scanned texts. You can download a free demo version for evaluation here:
http://www.soliddocuments.com/pdf/-to-word-converter/304/1

I've also just acquired a more powerful OCR program (Omnipage) which my colleague says is very good for converting to and from PDF, even scanned ones, but I haven't had the chance to try it out yet.


 
Agnes Lenkey  Identity Verified
German to Spanish
+ ...
Previous thread Oct 20, 2012

Hi mairapt,

Here is one of the previous threads about this issue:

http://www.proz.com/forum/software_applications/232135-converting_pdf_files_to_word_advice_needed.html

And here is another one:

http://www.proz.com/forum/software_applications/234410-rotating_a_pdf.html

Best regards,

Agnes


 
Siegfried Armbruster  Identity Verified
Germany
Local time: 13:43
English to German
+ ...
In memoriam
I am happy with Finereader 8 Oct 20, 2012

mairapt wrote:
I have Abby fine reader 9.0. but I am not satisfied. It takes too much time while formatting the text after converting.


Are you using the software optimally, i.e. not using autorecognition for the format but telling the software which parts are text, which parts are tables, which parts are images, etc.?

FineReader 8 gives very good results when you tell the software what to do and how to handle the text and formatting. Afterwards there is relatively little work to do in Word, just general adjustment of margins and font/character spacing.

I have converted very large documents this way and always found the result OK. However, if you tell your client that you will charge for the time you need to convert the scanned PDF, you will find that very often they find the original in a format you can process directly.



[Edited at 2012-10-20 09:53 GMT]


 
Karolina Petkuviene  Identity Verified
Lithuania
Local time: 14:43
English to Lithuanian
+ ...
TOPIC STARTER
Hello Siegfried Oct 20, 2012

I am using autorecognition and then I have to recheck all the text and to make many changes. Now I have over 100 pages.

 
Heinrich Pesch  Identity Verified
Finland
Local time: 14:43
Member (2003)
Finnish to German
+ ...
Consider typing Oct 20, 2012

If conversion gives you bad results and editing the file takes too much time, you could just type the translation into Word while reading from the PDF. For long files, using a typist could be cost-effective: s/he would type the source text and you could translate it using your favorite tool.

 
Tom in London
United Kingdom
Local time: 12:43
Member (2008)
Italian to English
Not the translator's job Oct 20, 2012

Siegfried Armbruster wrote:

.....if you tell your client that you will charge for your time you need for converting the scanned PDF, you will find that very often they find the original in a format you can process directly.


Precisely. The translator's job is to translate. Not anything else.

If you're given a PDF to translate, just make sure you also deliver your finished translation as a PDF.

That'll teach 'em!

[Edited at 2012-10-20 09:32 GMT]


 
achisholm
United Kingdom
Local time: 12:43
Italian to English
+ ...
Abbyy Finereader is the best I've tried so far Oct 20, 2012

and it will handle PDF files prepared from scanned files (the vast majority of the work I am given).

I use Build 8. onwards (Mac) and it is fairly fast and very reliable.
The only problem may be formatting with very large documents.
The formatting the program outputs is pretty good, but in my experience, if the text runs over a page, the formatting will be screwed up when you translate sentences crossing a page.
It's very often worthwhile doing some pre-translation "clean-up" to avoid this problem and basically give yourself an uncluttered body of text to translate, and if this is too time-consuming then that's the price you pay. I still think it worthwhile because I can then still use Trados etc., which means I have access to TMs for speed and consistency.
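
As a rough illustration of the kind of pre-translation clean-up described above, the sketch below rejoins sentences that an OCR export has cut at a page break. It assumes the OCR program exported plain text with a form feed character between pages; that separator, the function name `merge_pages` and the punctuation heuristic are illustrative assumptions, not features of FineReader or Trados.

```python
import re

# A page is assumed to end on a complete sentence if its last character is
# sentence punctuation, optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile(r'[.!?:]["\')\]]?$')


def merge_pages(ocr_text: str, page_sep: str = "\f") -> str:
    """Join OCR pages so a sentence cut by a page break becomes whole again."""
    # Split on the (assumed) page separator and drop empty pages.
    pages = [p.strip() for p in ocr_text.split(page_sep)]
    pages = [p for p in pages if p]
    if not pages:
        return ""
    merged = pages[0]
    for page in pages[1:]:
        if SENTENCE_END.search(merged):
            # Previous page ended on a complete sentence: start a new paragraph.
            merged += "\n\n" + page
        else:
            # Previous page stopped mid-sentence: continue on the same line.
            merged += " " + page
    return merged


if __name__ == "__main__":
    sample = "The contract is valid\ffor two years.\fNext clause."
    print(merge_pages(sample))
```

The heuristic is deliberately simple: headings or list items without final punctuation will be glued to the next page, so the result still needs a quick read-through before it goes into a CAT tool.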

Tom, I understand your point, but if the PO says:
Hand-off - PDF file
Delivery - Word file

then you accept that when you accept the job - basic and pragmatic fact of life.


 
KawhiLau
United States
PDFelement for Mac - Powerful Tool to Edit / Convert / Create PDFs Feb 1, 2018

PDFelement for Mac - Powerful Tool to Edit / Convert / Create PDFs (including scanned PDF files)

youtu.be/WRH6Q3oBqks


 
VIP9N
Local time: 14:43
Russian to English
+ ...
Wondershare PDFelement is not a pure OCR tool Feb 1, 2018

KawhiLau wrote:
PDFelement for Mac - Powerful Tool to Edit / Convert / Create PDFs (including scanned PDF files)
youtu.be/WRH6Q3oBqks


This application has the OCR option as one of its features (https://pdf.wondershare.com/pdfelement-mac/). Thus, it cannot "beat" a pure OCR programme.

Another thing is that there are versions for other OSes, including Windows.

However, the question put by the topic starter has remained the same on all translators' web forums for decades. People always want to get a fully equivalent and editable document out of formats whose nature is totally different, PDF included. They are unhappy to learn that the best way with any OCR software is to export clean, unformatted text and do some basic formatting themselves. They want it ready to go for their CAT tools.

Therefore, after OCR-ing, they export the results as so-called "resembling" or "similar" text. Sometimes that is enough. But in 99% of cases we then see ten follow-up questions here about how to get rid of hundreds of "rogue" codes in the document being translated, or why the sentences are chopped, or why the text is "drifting" from its intended position, or what happened to the original fonts, or whatever else.

It is strange to read those questions: is it so difficult to understand that each and every OCR programme uses its own invisible codes to build the resemblance of an OCR-ed document? Today there are plenty of such applications on the market, which are more or less good at OCR-ing, but as long as the principle remains the same, there will be codes and only an approximate similarity.
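
To make the plain-text route concrete, here is a minimal sketch of the "basic formatting" step done by a script instead of by hand: it turns hard-wrapped OCR lines into one line per paragraph and rejoins words hyphenated at a line end. The function name `unwrap_paragraphs` and the blank-line paragraph rule are assumptions for illustration; genuine compounds that happen to break at a hyphen (e.g. "time-" / "consuming") would be merged wrongly and still need a manual check.

```python
import re


def unwrap_paragraphs(plain_text: str) -> str:
    """Turn hard-wrapped OCR lines into one line per paragraph."""
    paragraphs = []
    # A blank line (possibly containing spaces) is treated as a paragraph boundary.
    for block in re.split(r"\n\s*\n", plain_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        text = ""
        for line in lines:
            if text.endswith("-") and line[:1].islower():
                # Word hyphenated across a line break: drop the hyphen and rejoin.
                text = text[:-1] + line
            elif text:
                # Ordinary wrapped line: join with a single space.
                text += " " + line
            else:
                text = line
        if text:
            paragraphs.append(text)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "The supplier shall de-\nliver the goods\nwithin 30 days.\n\nSecond para-\ngraph here."
    print(unwrap_paragraphs(sample))
```

With one paragraph per line and no layout codes, a CAT tool can segment the text by sentence rather than by OCR line.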

Good luck.


 
Diana Obermeyer  Identity Verified
United Kingdom
Local time: 12:43
Member (2013)
German to English
+ ...
Nitro Feb 1, 2018

At least it doesn't create random text boxes and typically leaves tables intact.

 

