Scanned files in PDF
Thread poster: Alicia Casal

Alicia Casal  Identity Verified
Argentina
Local time: 03:07
Member (2005)
English to Spanish
+ ...
Sep 15, 2006

Would you use ABBYY to digitize scanned files sent in PDF format into Word documents, to be used afterwards with Trados?

I see certain mistakes when I use this process.

Thanks.

[Subject edited by staff or moderator 2006-09-15 11:26]



ViktoriaG  Identity Verified
Canada
Local time: 01:07
English to French
+ ...
It depends on the document Sep 15, 2006

I have used OCR on scanned documents, sometimes even handwritten ones, with excellent results, and I have had experiences with PDFs that were much less successful.

The rule of thumb is that the more colours and shapes a document contains, the lower your chances of success. With such documents, I take snapshots of parts of the document so as to exclude as many images and shapes as possible. This usually helps a lot, although it takes a little more time.

I have also noticed that sometimes the resolution of the original you are trying to OCR does not match the settings the software requires. In such cases, open the original scanned document in an image editor and change its resolution to match the required setting (I believe this is usually 96 dpi). Even if your original resolution is lower than this, and even though increasing it will not make your scan look any better, the OCR software somehow recognizes characters better: when you zoom in, the fonts are much smoother and thus easier for the OCR's eye to read.
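To illustrate the arithmetic behind changing the resolution, here is a minimal sketch. It assumes the physical size of the page stays the same, so the pixel dimensions scale by the ratio of the target dpi to the current dpi. The function name `resampled_size` and the 72 dpi starting value are hypothetical, chosen only for illustration; the 96 dpi target is the figure mentioned above, so check your OCR software's documentation for the value it actually requires.

```python
def resampled_size(width_px, height_px, current_dpi, target_dpi):
    """Return the pixel size an image needs at target_dpi to keep its physical size.

    Hypothetical helper for illustration; it only does the arithmetic and
    does not touch any image file.
    """
    if current_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    # Physical size (inches) = pixels / dpi, so pixels scale with target/current.
    new_width = round(width_px * target_dpi / current_dpi)
    new_height = round(height_px * target_dpi / current_dpi)
    return new_width, new_height


# Example: a 600 x 800 pixel scan saved at 72 dpi, resampled to 96 dpi.
print(resampled_size(600, 800, 72, 96))
```

The resulting width and height are the values you would enter in the image editor's resize dialog before running the OCR again.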

Finally, it is always good to use this piece of software: http://digital.hollmen.dk/products/autounbreak/index.htm

Give it a try (read its instructions first to fully understand what it does). I bet you will find it useful. I use it together with OCR and PDFs, and I can't live without it anymore.

Be good!



Vito Smolej
Germany
Local time: 07:07
Member (2004)
English to Slovenian
+ ...
charge the client in any case for pdf->DOC/RTF step Sep 15, 2006

Would you use ABBYY to digitize scanned files sent in pdf ...?
Of course;)

In such cases I ask the client to send the files first before I say anything. I've had absolutely horrible experiences with all kinds of Post-its, fax copies etc. disguised as a respectable PDF file.

[Edited at 2006-09-15 06:22]



Victor Dewsbery  Identity Verified
Germany
Local time: 07:07
German to English
+ ...
Tweaking and post-editing Sep 15, 2006

Alicia Casal wrote:
Would you use ABBYY to digitize scanned files sent in PDF format into Word documents, to be used afterwards with Trados?
I see certain mistakes when I use this process.
Thanks.


Hi Alicia,
Yes, I always use it (in my case in preparation for DVX), and yes, there are some mistakes.

In Abbyy there are settings to define the parts of the page to scan and whether they are normal text, tables etc. And there are options to straighten crooked lines, "despeckle" text blocks etc.
So I almost always draw a frame around each block, and if necessary I call up other options by right clicking.
Usually the results are pretty good, although of course they depend on the visual quality of the source ("GIGO").

And before I feed the resulting Word doc into DVX, I spellcheck and check for bogus line breaks, manual hyphenation marks etc. If I miss these, there are still workarounds within DVX (probably in Trados, too), but things run more smoothly if I get a clean source text first.
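As a rough illustration of the line-break and hyphenation check, here is a minimal sketch in Python. It assumes the OCR result has been saved as plain text with blank lines between paragraphs; the function name `clean_ocr_text` is hypothetical and not part of any of the tools mentioned in this thread. It only handles unaccented lower-case ASCII letters around a hyphen, and it will also merge genuinely hyphenated words that happen to break at a line end (e.g. "well-" / "known"), so the output still needs a read-through.

```python
import re


def clean_ocr_text(text: str) -> str:
    """Remove bogus line breaks and end-of-line hyphenation from OCR output."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop soft hyphens (U+00AD), which sometimes survive in extracted text.
    text = text.replace("\u00ad", "")
    # Rejoin words split by a hyphen at a line end: "docu-\nment" -> "document".
    text = re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)
    # Turn single line breaks inside a paragraph into spaces;
    # blank lines (paragraph breaks) are left alone.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs left over from the joins.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


sample = (
    "The docu-\nment was scanned\nat low resolution.\n\n"
    "Second para-\ngraph here."
)
print(clean_ocr_text(sample))
```

Running a pass like this before the spellcheck leaves fewer broken segments for the CAT tool to stumble over.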


Manuel Rossetti
Local time: 06:07
pdf files Sep 16, 2006

Vito Smolej wrote:

Would you use ABBYY to digitize scanned files sent in pdf ...?
Of course;)

In such cases I ask the client to send the files first before I say anything. I've had absolutely horrible experiences with all kinds of Post-its, fax copies etc. disguised as a respectable PDF file.

[Edited at 2006-09-15 06:22]


I've seen some scribble scrabble from kids that is better than some PDF files I've received.



Tony M  Identity Verified
France
Local time: 07:07
Member
French to English
+ ...
Basic info required for novice, please! Sep 18, 2006

This sounds just like what I often need to do!

I already have OCR software (some variety of Read IRIS, I believe) bundled with my scanner, but I don't know much about using it.

Ought I to be able to use it on PDFs, and if not, what's this Abbyy you're talking about?

Any info and tips gratefully received!



Victor Dewsbery  Identity Verified
Germany
Local time: 07:07
German to English
+ ...
Abbyy FineReader 8.0 Sep 19, 2006

Tony -Dusty- wrote:
This sounds just like what I often need to do!
I already have OCR software (some variety of Read IRIS, I believe) bundled with my scanner, but I don't know much about using it.
Ought I to be able to use it on PDFs, and if not, what's this Abbyy you're talking about?
Any info and tips gratefully received!


You could check the help file of your Read IRIS to see if it handles PDFs. Freebies are usually old versions which don't handle all the latest tricks, but you may be lucky.

Otherwise, the two market leaders in the field are Abbyy FineReader 8.0 (otherwise known as FR8) and OmniPage (I don't know the version number). Neither is free: I think I paid almost 100 EUR to upgrade to the latest FR8, and I think a new licence was somewhere in the order of 150-200 EUR. I believe it is more expensive in the US of A, and I seem to remember that OmniPage is more expensive anyway, although that may be out of date now.
The word "on the street" is that Abbyy is more responsive to user queries (they certainly helped me with a couple of queries that I had early on). But both programs should do a decent job. As to features, I believe they leapfrog each other in their latest versions, so both are pretty much "state of the art" in their latest incarnations.
http://www.abbyy.com/finereader_ocr/ will give you the latest on FR8. There is a "Try&Buy" option (I can't remember whether it has a time limit or a set number of free scan sessions).
Perhaps somebody can give the link for OmniPage.



