https://www.proz.com/forum/software_applications/129512-using_ocr_with_my_scanner_chunks_of_text_missing.html

Using OCR with my scanner - chunks of text missing
Thread poster: Wendy Cummings
Wendy Cummings
United Kingdom
Spanish to English
Mar 8, 2009

I have an HP Scanjet 4850 and it came with OCR software. Great, I thought, a solution to all my PDF problems.

However, when scanning a good quality document, with all the text of even size, the same font and same quality, the OCR just "misses" out whole paragraphs.

I cannot see any reason why it would do so, since it picks up other paragraphs perfectly - without any errors whatsoever - even though these paragraphs are, to my eyes, identical in quality to the ones it missed.

It's not even a case of garbled text - the paragraphs simply aren't there.

Is there a reason why the software would do this? And, more importantly, can it be fixed?


 
Uldis Liepkalns
Latvia
Member (2003)
English to Latvian
Can't tell without knowing what this OCR software is Mar 8, 2009

My scanner also came with some OCR software; I think it was I.R.I.S.

Compared with FineReader, which I already had, it was no use at all. It recognises some text, but nothing like FineReader does.

Uldis


Wendy Leech wrote:
Is there a reason why the software would do this? And, more importantly, can it be fixed?


 
Bogdan Burghelea
Romania
English to German
Possible explanation Mar 8, 2009

Wendy Leech wrote:

However, when scanning a good quality document, with all the text of even size, the same font and same quality, the OCR just "misses" out whole paragraphs.

I cannot see any reason why it would do so, since it picks up other paragraphs perfectly - without any errors whatsoever - even though these paragraphs are, to my eyes, identical in quality to the ones it missed.

Is there a reason why the software would do this? And, more importantly, can it be fixed?



It would help if you provided the name of the OCR software. They might all quack like ducks, but not all of them are ducks.

Secondly, it might be because the software decides all by itself which parts of the scanned text are going to be recognised. You should therefore check which parts of the scanned image are "active", i.e. due to be read.
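
One way to picture this check is to compare the rows of the page that contain ink with the zones the OCR software actually selected. The sketch below is only an illustration under stated assumptions: the page is taken to be a grid of 0/1 values (1 = dark pixel), the recognition zones are taken to be known as (top, bottom) row ranges, and `find_text_bands` and `uncovered_bands` are hypothetical helper names, not functions of the HP software or of any OCR package.

```python
def find_text_bands(page, min_ink=1):
    """Return inclusive (top, bottom) row ranges that contain dark pixels."""
    bands = []
    start = None
    for y, row in enumerate(page):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new band of text begins
        elif not has_ink and start is not None:
            bands.append((start, y - 1))   # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(page) - 1))
    return bands


def uncovered_bands(bands, zones):
    """Return the text bands that do not overlap any recognition zone."""
    missed = []
    for top, bottom in bands:
        covered = any(z_top <= bottom and top <= z_bottom
                      for z_top, z_bottom in zones)
        if not covered:
            missed.append((top, bottom))
    return missed


# Toy page: three "paragraphs" separated by blank rows.
page = [
    [0, 1, 1, 0],  # row 0: paragraph 1
    [0, 1, 0, 0],  # row 1
    [0, 0, 0, 0],  # row 2: blank
    [1, 1, 1, 1],  # row 3: paragraph 2
    [0, 0, 0, 0],  # row 4: blank
    [0, 1, 1, 1],  # row 5: paragraph 3
]
zones = [(0, 1), (5, 5)]  # hypothetical zones chosen by the OCR software

bands = find_text_bands(page)
missed = uncovered_bands(bands, zones)
print(missed)
```

Any band reported as missed is a paragraph that sits outside every active zone, which would match the symptom of text that is simply absent rather than garbled.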


 
Wendy Cummings
United Kingdom
Spanish to English
TOPIC STARTER
version Mar 8, 2009

It's the freeware OCR that came as part of the scanner software, but it being HP I would have hoped that it would be OK.

As I said, I could understand if it was producing garbled text, but not that it misses out whole paragraphs completely.


 
Wendy Cummings
United Kingdom
Spanish to English
TOPIC STARTER
active sections Mar 8, 2009

Bogdan Burghelea wrote:
Secondly, it might be because the software decides all by itself which parts of the scanned text are going to be recognised. You should therefore check which parts of the scanned image are "active", i.e. due to be read.


How do I do this?


 
Uldis Liepkalns
Latvia
Member (2003)
English to Latvian
Manual recognition Mar 8, 2009

Even freeware OCR should have an option to draw (mark) the recognition areas manually, and it should then recognise those areas. OTOH, in my experience these "complimentary" programs are not of much practical use.

Although I have not tried it myself, I've heard that Office XP to 2007 already includes a built-in OCR feature (as well as speech recognition, which is not widely known, but I can confirm that the latter does indeed work).

You might want to Google "OCR in Office".

Uldis

Wendy Leech wrote:

It's the freeware OCR that came as part of the scanner software, but it being HP I would have hoped that it would be OK.

As I said, I could understand if it was producing garbled text, but not that it misses out whole paragraphs completely.


 
Russell Jones
United Kingdom
Italian to English
HP Scanjet uses Omnia OCR Mar 8, 2009

HP Scanjet uses Omnia OCR

 
Uldis Liepkalns
Latvia
Member (2003)
English to Latvian
Not all Mar 8, 2009

My scanner is also an HP, but its OCR certainly was not Omnia.

Uldis

Russell Jones wrote:

HP Scanjet uses Omnia OCR


 
elzbieta jatowt
France
French to Polish
try it Mar 8, 2009

Hi Wendy
If you have Vista and Word, you can try this:
Use the preview function to select carefully the part of the text you want to scan, then scan it. When the image file appears in the folder, double-click it to open it. At the top of the page, on the right, there is a "save" function: click it and save the file in *.TIFF format. Double-click the new file to open it. At the top, on the right, there is an "open" function: use it to open the file in Microsoft Office Document Imaging. On the toolbar at the top, click the 8th button and then the 9th button. It works for me. Sorry for my English.
Franela


 
Brandis (X)
English to German
ABBYY 9.0 Mar 8, 2009

Hi! I must admit that it is a great piece of software. With the scan resolution set to 200 dpi you get almost all the content in one step; for the rest, you may have to do some copy editing here and there. The best combination for a translator, I find, is Acrobat 9 (with plug-ins) and ABBYY 9.0. BR Brandis

 
Wendy Cummings
United Kingdom
Spanish to English
TOPIC STARTER
language Mar 8, 2009

franela wrote:
Sorry for my English.


It is a little hard to follow your instructions. I see French is one of your languages; write in French if it is easier.


 
elzbieta jatowt
France
French to Polish
Much simpler Mar 9, 2009

To scan, first use the "preview" function at the bottom of the "new scan" window to select carefully the part of the text that the OCR should then process. For a newspaper article, that might be a single column. Once scanned, your image will be in the "Scanned Documents" folder. Double-click the file to open it. With the text on screen, you will see several functions at the top of the page. At the far right is "save as": click it. A new window opens; choose the *.TIFF option. Save again in the same folder as before (you have only changed the format). Open this file. At the top of the page, at the far right, there is an "open" function: click it. A list of opening options drops down; choose Microsoft Office Document Imaging. On the toolbar, click the 8th button (recognise text using OCR). Once the text has been recognised, click the 9th button (to save your document in Word).


 
Jing Nie
China
Member (2011)
English to Chinese
I often run into the same problem. Mar 9, 2009

I have found that it is due to a background color or background images.
To improve the OCR quality, you can adjust the colors of the scanned images in "Microsoft Office Picture Manager" before running the OCR; it has an "auto adjust" function that improves the contrast of your images. Other free programs, such as GIMP, can do the same.
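
To illustrate what a contrast adjustment does to a scan, here is a minimal sketch of a linear contrast stretch on grey values. It is an assumption that this resembles what an "auto adjust" button does; the actual method used by Picture Manager or GIMP is not stated in this thread. `stretch_contrast` is a hypothetical helper, and the pixel values are made up for the example.

```python
def stretch_contrast(pixels, low=0, high=255):
    """Linearly rescale grey values so the darkest maps to `low`
    and the lightest maps to `high`."""
    darkest = min(pixels)
    lightest = max(pixels)
    if darkest == lightest:
        return list(pixels)  # flat image: nothing to stretch
    span = lightest - darkest
    return [round(low + (p - darkest) * (high - low) / span) for p in pixels]


# Grey text (90) on a tinted background (160): a low-contrast scan.
scan = [160, 160, 90, 90, 160, 104]
stretched = stretch_contrast(scan)
print(stretched)
```

After stretching, the tinted background becomes white and the grey text becomes black, which gives the OCR a much clearer separation between text and paper.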


 
Piotr Bienkowski
Poland
English to Polish
Recognize PDF as image? Mar 9, 2009

If the OCR software is based on FineReader in any way, you might be able to try these two options:

* extract text from PDF, if available
* recognize PDF as image

and see which one yields better results.

HTH

Piotr
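
A quick way to compare the results of the two options is to list the paragraphs that appear in one output but not in the other. The sketch below assumes both outputs have been saved as plain text with blank lines between paragraphs; `missing_paragraphs` is a hypothetical helper and the sample strings are invented for illustration. It matches text exactly (ignoring case and spacing), so a paragraph containing OCR errors would also be reported as missing.

```python
def missing_paragraphs(reference, candidate):
    """Return paragraphs of `reference` whose normalised text
    does not occur anywhere in `candidate`."""
    def normalise(text):
        # Lower-case and collapse all whitespace to single spaces.
        return " ".join(text.lower().split())

    candidate_text = normalise(candidate)
    missing = []
    for paragraph in reference.split("\n\n"):
        cleaned = normalise(paragraph)
        if cleaned and cleaned not in candidate_text:
            missing.append(paragraph.strip())
    return missing


# Invented sample outputs from the two options.
extracted = ("First paragraph of the contract.\n\n"
             "Second paragraph with the price.\n\n"
             "Third paragraph.")
recognised = "First paragraph of the contract.\n\nThird paragraph."

dropped = missing_paragraphs(extracted, recognised)
print(dropped)
```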


 
Anna Villegas
Mexico
English to Spanish
Also... Mar 9, 2009

You may wish to take a look at the link below. If not using PDF, save your scanned copies as "TIFF", and do as the video says.

http://www.proz.com/videos/ocr



 