Thread poster: Roy Williams
Converting scans to word with OCR

Roy Williams  Identity Verified
Austria
Local time: 08:31
Member (2011)
German to English
Jan 17

Hi Everyone,

I have a hard copy of a document that I would like to scan and convert to a Word file with OCR software.
The document is several pages long, and as I've never tried this before, I'm wondering what the best way is to do this without ending up with a separate Word file for each converted page.

Any suggestions?

Thanks in advance,

Roy

[Edited at 2012-01-17 13:53 GMT]



Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 09:31
Member (2008)
English to Russian
Press F1 in your OCR application Jan 17

...


Roy Williams  Identity Verified
Austria
Local time: 08:31
Member (2011)
German to English
TOPIC STARTER
Test version Jan 20

Hi,

Thanks for responding. I'm currently using ABBYY FineReader in trial mode and therefore can save more than one page.

I've tried some free OCR programs, but the results have been disappointing, to say the least. Which OCR program do you use?



Samuel Murray  Identity Verified
Netherlands
Local time: 08:31
Member (2006)
English to Afrikaans
ABBYY FineReader 8.0 Pro Jan 20


Roy Williams wrote:
The document is several pages long, and as I've never tried this before, I'm wondering what the best way is to do this without ending up with a separate Word file for each converted page.


I have ABBYY FineReader 8.0 Pro, and when you save the batch, it gives you the option of saving the entire batch as a single file, saving the pages of each source file in a single file named after the source file, or saving the individual pages using a naming scheme.

If you're scanning and OCR'ing in one operation, then the OCR program might OCR only the currently scanned page. My suggestion is to scan all the pages to JPG and then use the OCR program to process all of them at once.
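
One detail to watch when processing a whole batch: files named page1.jpg, page2.jpg, page10.jpg do not sort in page order alphabetically. The following is only a minimal Python sketch, assuming the scans are saved as numbered JPG files in one folder (the function names and file names are hypothetical, not part of any OCR program), that lists the pages in the right order before you load them:

```python
import re
from pathlib import Path


def natural_key(name: str):
    """Split a file name into text and number chunks so page10 sorts after page2."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def pages_in_order(folder: str, pattern: str = "*.jpg"):
    """Return the scanned page images in `folder`, sorted in page order.

    The folder layout and the .jpg extension are assumptions for illustration.
    """
    return sorted(Path(folder).glob(pattern), key=lambda p: natural_key(p.name))
```

Calling pages_in_order("scans") would then give the files in the order in which they should be added to the batch.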


I've tried some free OCRs but the results have been disappointing to say the least.


There are no *acceptably good* free OCR systems that I know of. Certain versions of MS Word have it built in (not sure about that), and some CAT tools also have it (I think WFA has it).



Tomás Cano Binder, CT  Identity Verified
Spain
Local time: 08:31
Member (2005)
English to Spanish
ABBYY FineReader will do nicely Jan 20

We use ABBYY FineReader in the office.

It works fine for simple documents, but if you have documents with tons of little cells, tables, and diagrams... I am afraid no tool will yield a perfect result if you don't want to do formatting work yourself.

ABBYY FineReader allows you to save different types of Word documents. Try them all and see which one works best for you. At times, it is best to OCR simple, unformatted text and format it yourself if you know how to use Microsoft Word.

If your target language is usually longer than the source language, you might have to enlarge the boxes ABBYY creates, and that always means manual work.

If the document is very complex, with many bits and pieces, images, and stamps spread over the page, trying to deliver a document that looks like the original will prove to be quite cumbersome, so make sure you include an extra formatting charge in your invoice/quotation.

[Edited at 2012-01-20 12:55 GMT]



Carvallo
Mexico
Local time: 01:31
Member (2006)
English to Spanish
Microsoft Office Document Imaging Jan 20

Totally free if you have MS Office:

Scan the images and save them, one by one, in TIFF format (select "Save as" from the "File" menu and choose TIFF as the file format).

Navigate to the "Start" menu and select "Programs," "Microsoft Office Tools" and "Microsoft Office Document Imaging."

From the "File" menu, select "Open" to open your scanned document that has been saved in the *.TIFF format. You can import each image, one by one, until the full batch is complete.

From the "Tools" menu, select "Send Text to Word." Alternatively, you can manually select the text to be converted. Click "OK" to confirm. Depending on your computer's speed, the process will take anywhere from a few moments to a minute or two.

When the process is done, Microsoft Word will automatically load your document(s), which you can edit and format as you please.
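
If you still end up with one file per page, the pages can be joined afterwards. This is only a rough Python sketch, assuming each page's recognized text has been saved as a plain-text (UTF-8) file; the function name and file names are hypothetical, and any formatting would be lost:

```python
from pathlib import Path


def merge_pages(page_files, output_file, separator="\n\n"):
    """Concatenate per-page text files, in the order given, into one file.

    Returns the number of pages merged. Plain text only: this does not
    preserve layout, tables, or images.
    """
    texts = [Path(p).read_text(encoding="utf-8").strip() for p in page_files]
    Path(output_file).write_text(separator.join(texts) + "\n", encoding="utf-8")
    return len(texts)
```

For example, merge_pages(["page1.txt", "page2.txt"], "document.txt") writes both pages to a single file, which can then be opened in Word and formatted there.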




Roy Williams  Identity Verified
Austria
Local time: 08:31
Member (2011)
German to English
TOPIC STARTER
Can MemoQ? Jan 31

Can MemoQ merge documents?
