Using OCR with my scanner - chunks of text missing
Thread poster: Wendy Cummings

Wendy Cummings  Identity Verified
United Kingdom
Local time: 23:12
Member (2006)
Spanish to English
+ ...
Mar 8, 2009

I have an HP Scanjet 4850 and it came with OCR software. Great, I thought, a solution to all my pdf problems.

However, when scanning a good quality document, with all the text of even size, the same font and same quality, the OCR just "misses" out whole paragraphs.

I cannot see any reason why it would do so, since it picks up other paragraphs perfectly - without any errors whatsoever - even though these paragraphs are, to my eyes, identical in quality to the ones it missed.

It's not even a case of garbled text - the paragraphs simply aren't there.

Is there a reason why the software would do this? And, more importantly, can it be fixed?



Uldis Liepkalns  Identity Verified
Latvia
Local time: 01:12
Member (2003)
English to Latvian
+ ...
Can't tell without knowing what this OCR software is Mar 8, 2009

However, my scanner also came with some OCR software; I think it was I.R.I.S.

Compared with FineReader, which I already had, it was no use at all. It recognises some text, but nothing like FineReader does.

Uldis


Wendy Leech wrote:
Is there a reason why the software would do this? And, more importantly, can it be fixed?



Bogdan Burghelea  Identity Verified
Romania
Local time: 01:12
English to German
+ ...
Possible explanation Mar 8, 2009

Wendy Leech wrote:

However, when scanning a good quality document, with all the text of even size, the same font and same quality, the OCR just "misses" out whole paragraphs.

I cannot see any reason why it would do so, since it picks up other paragraphs perfectly - without any errors whatsoever - even though these paragraphs are, to my eyes, identical in quality to the ones it missed.

Is there a reason why the software would do this? And, more importantly, can it be fixed?



It would help if you provided the name of the OCR software. They might all quack like ducks, but not all of them are ducks.

Secondly, it might be because the software decides all by itself which parts of the scanned image are going to be recognised. You should therefore check which parts of the scanned image are "active", i.e. due to be read.
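
To picture what an "active" area means, here is a small Python sketch. It assumes the OCR program works from a list of rectangular recognition zones and only reads text that falls inside one of them. The names `Box`, `covers` and `unread_paragraphs`, and all the coordinates, are invented for illustration and do not come from any particular OCR product.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in page pixels (left, top, right, bottom)."""
    left: int
    top: int
    right: int
    bottom: int


def covers(zone: Box, para: Box) -> bool:
    """True if the paragraph lies entirely inside the recognition zone."""
    return (zone.left <= para.left and zone.top <= para.top
            and zone.right >= para.right and zone.bottom >= para.bottom)


def unread_paragraphs(zones, paragraphs):
    """Return the names of paragraphs that no zone fully covers."""
    return [name for name, box in paragraphs.items()
            if not any(covers(z, box) for z in zones)]


# Hypothetical page: auto-detection found two zones and skipped the middle.
zones = [Box(50, 50, 550, 200), Box(50, 420, 550, 600)]
paragraphs = {
    "para 1": Box(60, 60, 540, 190),
    "para 2": Box(60, 220, 540, 400),
    "para 3": Box(60, 430, 540, 590),
}
print(unread_paragraphs(zones, paragraphs))
```

Under this assumption, a paragraph that appears in the list would be absent from the output rather than garbled, which matches the symptom described.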



Wendy Cummings  Identity Verified
United Kingdom
Local time: 23:12
Member (2006)
Spanish to English
+ ...
TOPIC STARTER
version Mar 8, 2009

It's the freeware OCR that came as part of the scanner software, but being HP I would have hoped that it would be OK.

As I said, I could understand if it was producing garbled text, but not that it misses out whole paragraphs completely.



Wendy Cummings  Identity Verified
United Kingdom
Local time: 23:12
Member (2006)
Spanish to English
+ ...
TOPIC STARTER
active sections Mar 8, 2009

Bogdan Burghelea wrote:
Secondly, it might be because the software decides all by itself which parts of the scanned image are going to be recognised. You should therefore check which parts of the scanned image are "active", i.e. due to be read.


How do I do this?



Uldis Liepkalns  Identity Verified
Latvia
Local time: 01:12
Member (2003)
English to Latvian
+ ...
Manual recognition Mar 8, 2009

Even a freeware OCR program should have an option to draw (mark) recognition areas manually, and that should do the job. OTOH, in my experience these "complimentary" programs are of little practical use.

Although I have not tried it myself, I have heard that Office XP to 2007 already contains a built-in OCR feature (as well as speech recognition, which is not widely known, but I can certify that the latter does indeed work).

You might want to Google "OCR in Office".

Uldis

Wendy Leech wrote:

It's the freeware OCR that came as part of the scanner software, but being HP I would have hoped that it would be OK.

As I said, I could understand if it was producing garbled text, but not that it misses out whole paragraphs completely.



Russell Jones  Identity Verified
United Kingdom
Local time: 23:12
Italian to English
HP Scanjet uses Omnia OCR Mar 8, 2009

HP Scanjet uses Omnia OCR


Uldis Liepkalns  Identity Verified
Latvia
Local time: 01:12
Member (2003)
English to Latvian
+ ...
Not all Mar 8, 2009

My scanner is also an HP, but its OCR was certainly not Omnia.

Uldis

Russell Jones wrote:

HP Scanjet uses Omnia OCR



elzbieta jatowt  Identity Verified
France
Local time: 00:12
Member (2007)
French to Polish
+ ...
try it Mar 8, 2009

Hi Wendy
If you have Vista + Word, you can try this:
Use the preview function to select the part of the text to be scanned, then scan it. When the image file appears in the folder, double-click it to open it. At the top right of the window there is a "save" function: click it and save the file in *.TIFF format. Double-click the saved file to open it. At the top right there is an "open" function: use it to open the file in Microsoft Office Document Imaging. On the toolbar at the top, click the 8th button and then the 9th. It works for me. Sorry for my English.
Franela


xxxBrandis
Local time: 00:12
English to German
+ ...
ABBYY 9.0 Mar 8, 2009

Hi! That, I must admit, is a great piece of software. With the scan resolution set at 200 dpi, you capture almost all the content in one step; for the rest, you may have to do some copy editing here and there. For a translator, I find the best combination is Acrobat 9 (with plug-ins) and ABBYY 9.0. BR Brandis


Wendy Cummings  Identity Verified
United Kingdom
Local time: 23:12
Member (2006)
Spanish to English
+ ...
TOPIC STARTER
language Mar 8, 2009

franela wrote:
Sorry for my English.


It is a little hard to follow your instructions. I see French is one of your languages; write in French if it is easier.



elzbieta jatowt  Identity Verified
France
Local time: 00:12
Member (2007)
French to Polish
+ ...
Much simpler Mar 9, 2009

So, to scan, you need to carefully select the part of the text that the OCR will then process, using the "preview" function at the bottom of the "new scan" window. For a newspaper article, this could be a single column. Once scanned, your image will be in the "Scanned Documents" folder. Double-click to open your file. Once your text is on screen, you will see several functions at the top of the page. At the far right is the "save as" function: click it. A new window opens; choose the *.TIFF option. Save again in the same folder as before (you have only changed the format). Open this file. At the top of the page, at the far right, there is the "open" function: click it. The opening options will drop down; choose Microsoft Office Document Imaging. On the toolbar, click the 8th button (recognise text using OCR). Once the text has been recognised, click the 9th button (to save your document in Word).


Jing Nie
China
Local time: 06:12
Member (2011)
English to Chinese
+ ...
I often run into the same problem. Mar 9, 2009

I have found that it is due to background colour or background images.
To improve the OCR quality, you can adjust the colour of the scanned images in "Microsoft Office Picture Manager" before running OCR. It has an "auto adjust" function, which will improve the contrast of your images. Other free programs such as GIMP can do the same.
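
As a rough illustration of what an "auto adjust" contrast step does, here is a minimal Python sketch of linear contrast stretching on greyscale pixel values. The function name `stretch_contrast` and the sample values are made up for illustration. This is not what Picture Manager or GIMP actually runs, only the general idea of spreading a narrow range of greys across the full 0-255 range so that text stands out from a tinted background.

```python
def stretch_contrast(pixels, low=0, high=255):
    """Linearly rescale greyscale values so the darkest maps to `low`
    and the brightest maps to `high`."""
    darkest = min(pixels)
    brightest = max(pixels)
    if darkest == brightest:
        # A flat image has no contrast to stretch; return it unchanged.
        return list(pixels)
    span = brightest - darkest
    return [round(low + (p - darkest) * (high - low) / span) for p in pixels]


# Hypothetical scan line: grey text (90) on a tinted background (160).
row = [160, 160, 90, 90, 160, 104]
stretched = stretch_contrast(row)
print(stretched)
```

After stretching, the background becomes white (255) and the text black (0), which is the kind of separation an OCR engine generally copes with better.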



Piotr Bienkowski  Identity Verified
Poland
Local time: 00:12
Member (2005)
English to Polish
+ ...
Recognize PDF as image? Mar 9, 2009

If the OCR software is based on FineReader in any way, you might be able to try these two options:

* extract text from PDF, if available
* recognize PDF as image

and see which one yields better results.

HTH

Piotr



Anna Sylvia Villegas Carvallo
Mexico
Local time: 17:12
English to Spanish
Also... Mar 9, 2009

You may wish to take a look at the link below. If not using PDF, save your scanned copies as "TIFF", and do as the video says.

http://www.proz.com/videos/ocr


