Problems with OCR and small text
Thread poster: James Greenfield

James Greenfield  Identity Verified
United Kingdom
Local time: 23:20
French to English
Nov 29, 2015

Hi,

I am currently translating a dead PDF. I managed to OCR the document and the results were fine apart from the bibliography section at the end, which is in very small print. The results for this section make no sense. Using ABBYY FineReader I tried to increase the resolution, but the results were equally bad. Does anyone have any advice? This is the first time I have had this problem. Perhaps someone with ABBYY FineReader could guide me as to how to properly increase the resolution. When I try to do this, the image size automatically becomes smaller and it is still unable to recognise the text. Many thanks for any advice.
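
One possible reading of the shrinking image, offered only as an assumption since the thread does not say what the software did internally: if a tool changes the resolution value without resampling, the pixel count stays the same and only the nominal print size changes. The sketch below shows that arithmetic; the function name and the page dimensions are made up for illustration.

```python
def print_size_inches(width_px, height_px, dpi):
    """Nominal print size of an image: pixel count divided by dots per inch."""
    return (width_px / dpi, height_px / dpi)


# Hypothetical page scan of 2400 x 3000 pixels.
at_300 = print_size_inches(2400, 3000, 300)   # (8.0, 10.0)
at_600 = print_size_inches(2400, 3000, 600)   # (4.0, 5.0)

# Same pixels, higher dpi value: the page is reported at half the size,
# but no detail has been added for the OCR engine to work with.
print(at_300, at_600)
```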


 

Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 01:20
Member (2008)
English to Russian
can you Nov 29, 2015

send me the file?

also, if it is raster, then all you have is all you have.


 

James Greenfield  Identity Verified
United Kingdom
Local time: 23:20
French to English
TOPIC STARTER
email Nov 29, 2015

Sergei Leshchinsky wrote:

send me the file?

also, if it is raster, then all you have is all you have.


Thanks, I've just sent you an email.


 

James Greenfield  Identity Verified
United Kingdom
Local time: 23:20
French to English
TOPIC STARTER
could anyone help? Nov 29, 2015

I don't suppose anyone has really powerful OCR software and would be prepared to do me a massive favour? I can't manage to OCR the bibliography, which is in small text, and hand-typing the 64 entries is going to take me a long time. Thanks very much.

 

Melissa McMahon  Identity Verified
Australia
Local time: 10:20
Member (2006)
French to English
Not sure if post-facto solutions will help Nov 29, 2015

Hi James,

I'm not an expert, but I think if the scan of the original document was not a high enough resolution, then attempts to increase the resolution of the scan won't help, because the "raw material" is inadequate. If I take a blurry photo of something, no amount of fiddling with the sharpness or resolution of the photo will give me a clear photo. I think the only alternative to typing out the text is to get a better scan.

Good luck!
Melissa
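
The point about inadequate "raw material" can be illustrated with a small sketch. It assumes the simplest kind of enlargement (nearest-neighbour pixel repetition) on a made-up 2 x 3 grayscale grid; real software usually interpolates instead, but it still computes every new pixel from the existing ones.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a 2-D grid by repeating every pixel `factor` times each way."""
    out = []
    for row in pixels:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def downscale_by_sampling(pixels, factor):
    """Keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in pixels[::factor]]


scan = [[0, 255, 0],
        [255, 0, 255]]          # hypothetical low-resolution scan
big = upscale_nearest(scan, 3)  # 6 rows x 9 columns

# The enlarged image is three times bigger each way, yet sampling it
# gives back exactly the original: nothing new was learned about the page.
print(len(big), len(big[0]), downscale_by_sampling(big, 3) == scan)
```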


 

James Greenfield  Identity Verified
United Kingdom
Local time: 23:20
French to English
TOPIC STARTER
Thanks Nov 29, 2015

Hi Melissa,

Yes, I think that's right. This section is in English anyway, so I have decided not to include it. I thought about including it, as it is the bibliography and the French text refers to these English journals, but as you say there is no way of increasing the resolution, and typing it out by hand would take me an awfully long time.

James


 

Anton Konashenok  Identity Verified
Czech Republic
Local time: 00:20
English to Russian
Do you really need to type it? Nov 30, 2015

If the list of references is already in the target language anyway, it makes sense to ask the client if they'd accept it as a pasted image instead of text. If so, you can just copy it using the Snapshot tool of Adobe Reader, then paste it into your target document.

 

esperantisto  Identity Verified
Local time: 02:20
Member (2006)
English to Russian
Convert to black and white Nov 30, 2015

In my experience, increasing the resolution above 300 dpi has no noticeable effect on recognition results even for small print. However, there is one setting (off by default) that can be useful: Tools → Options → General → More options… → Convert color/gray-scale images to black and white (I am translating these menu items from the Russian UI of FR 8.0, so they may be different in your case). Try it switched on.

Also, if the sections in question are French only, do select French only for the language and (re)recognize.
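
For anyone curious what a black-and-white conversion amounts to, here is a minimal sketch using a fixed threshold. This is an assumption for illustration only: how FineReader actually performs the conversion is not stated here, and the threshold of 128 and the sample values are made up.

```python
def to_black_and_white(gray, threshold=128):
    """Map each grayscale value (0-255) to pure black (0) or pure white (255)."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray]


# Hypothetical patch of a scan: light paper, darker ink, some grey smudging.
gray = [[250, 180, 90],
        [200, 120, 30]]
bw = to_black_and_white(gray)

# Every in-between grey is forced to one side, which can sharpen the
# boundary between ink and paper before recognition.
print(bw)
```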


 

Tom in London
United Kingdom
Local time: 23:20
Member (2008)
Italian to English
No problem Nov 30, 2015

James Greenfield wrote:

Hi,

I am currently translating a dead PDF. I managed to OCR the document and the results were fine apart from the bibliography section at the end, which is in very small print. The results for this section make no sense. Using ABBYY FineReader I tried to increase the resolution, but the results were equally bad. Does anyone have any advice? This is the first time I have had this problem. Perhaps someone with ABBYY FineReader could guide me as to how to properly increase the resolution. When I try to do this, the image size automatically becomes smaller and it is still unable to recognise the text. Many thanks for any advice.


I don't know about you, James, but my ABBYY FineReader for macOS outputs to plain text. The resulting file can then be opened in Word and saved as a .doc file. Then you can alter the text any way you want to. I do this all the time.

[Edited at 2015-11-30 07:51 GMT]


 

Rolf Keller
Germany
Local time: 00:20
English to German
Enlarge the picture externally Dec 1, 2015

esperantisto wrote:

In my experience, increasing the resolution above 300 dpi has no noticeable effect on recognition results even for small print.


Ack.

Convert color/gray-scale images to black and white


Ack.

Plus plan C:
Enlarge the picture beforehand.

If need be, go to a copy shop, make an enlarged copy, try different contrast settings etc., then scan/export the result onto a USB stick. The shop staff will help you with this.

Back in your office, OCR the file on the stick.
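
Why a physical enlargement can help where a software one does not comes down to simple arithmetic: the scanner captures more pixels per letter. The sketch below assumes the standard 72 points per inch; the 6 pt type size and 200% enlargement are hypothetical figures, not taken from the document in question.

```python
def glyph_height_px(point_size, scan_dpi, enlargement=1.0):
    """Nominal type height in pixels after scanning an (optionally enlarged) copy."""
    return point_size * scan_dpi * enlargement / 72.0


# Hypothetical 6 pt bibliography scanned at 300 dpi.
original = glyph_height_px(6, 300)          # 25.0
enlarged = glyph_height_px(6, 300, 2.0)     # 50.0, after a 200% photocopy

# The enlarged paper copy gives the scanner twice as many pixels per letter
# at the same dpi setting.
print(original, enlarged)
```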


 

