Online OCR - for free
Thread poster: Alison Schwitzgebel

Alison Schwitzgebel
France
Local time: 05:00
Member (2002)
German to English
+ ...
Nov 18, 2002

Check out http://docmorph.nlm.nih.gov/docmorph/ if you have to convert tifs, jpegs, etc. into text files (so that you can work with them electronically). You have to register with the site (I know that means some junk mail ;-( ), but then you just upload your file from your computer and the site runs the OCR over it. You can also specify the language for the source file to make the OCR work more accurately.



Alison



Terry Gilman  Identity Verified
Germany
Local time: 05:00
Member (2003)
German to English
+ ...
So does the site save the data ...? Nov 18, 2002

Hi Alison,



This sounds eminently useful, but what about client confidentiality issues?



Probably revealing my techno ignorance here.



Terry



Alison Schwitzgebel
France
Local time: 05:00
Member (2002)
German to English
+ ...
TOPIC STARTER
Saving data.... Nov 18, 2002

The site only retains the data while you are actually "there". As I understood the general blurb, as soon as you click out of the site the OCR version and your original file are both lost.



Of course I wouldn't send confidential stuff anywhere over the Internet without heaps of protection/encryption and the customer's express consent, but for translators with newspaper articles, scanned birth certificates, and other general blurb to translate it seems like a great tool.



Alison



Terry Gilman  Identity Verified
Germany
Local time: 05:00
Member (2003)
German to English
+ ...
Yes, we typically have articles in .tif.... Nov 18, 2002

...that would be nicer to do if scanned in.



Thanks for the info.

Terry



schmurr  Identity Verified
Local time: 05:00
Italian to German
+ ...
but it's PDF! :-((( Nov 18, 2002

They say they convert into PDF, but that means image files, as our secretary says: she always has to scan them. Kim Metzger told me that his Acrobat 5 has a column button for copying text, but mine doesn't :-((

I'm for forbidding PDF and putting its inventors in jail…

[ This Message was edited by: on 2002-11-19 09:52 ]



Mats Wiman  Identity Verified
Sweden
Local time: 05:00
Member (2000)
German to Swedish
+ ...

MODERATOR
Excellent result! Nov 18, 2002

Thanks Alison!



I just tested it and it seems pretty flawless - and VERY FAST.



Great service for non-confidential texts.



I'll ask them what safeguards they have.



Mats



Sven Petersson  Identity Verified
Sweden
Local time: 05:00
English to Swedish
+ ...
Brill! Nov 18, 2002

Thanks Alison!



It's brill!



Sven.



Nikita Kobrin  Identity Verified
Lithuania
Local time: 06:00
Member (2010)
English to Russian
+ ...
Very poor results! Unfortunately :-( Nov 18, 2002

Dear Alison,



Thanks a lot for sharing this resource with us. I was highly enthusiastic about it, especially after Mats' positive remark, but after trying it myself I became very pessimistic.



First of all, DocMorph and MyMorph do not process GIF and PDF files, though a great many documents for translation come in just these formats.



Then I tried JPEG files. For 3 files out of 5 I got the following notification as a result: "No text found on this page" (in fact there WAS text there).



The results of processing the other 2 files were extremely poor. Just have a look:



DOCUMENT:



RESULT:

IBM Microdri,e diskmod tile D~,I.,oed at lh~ JB%lA1nt.&. R~,~ ... ~h Artrooon,~d 9 S,, 19M to to ol. mg P,.,i&d to. Ctont,urc. Hill., dilpl., it Sp 11198 J)j Dr. ( urric Nlunc~ of IBNI r.pa,jjj 340,11bites t2 side-,

-34,1 ..... .& Libroop,dish \'d Sm, 42.8 , 36.4 , 5 rout. Rotational speed 4M0 IJJINI

k ren I densitt. ~ 3 \'10 \' bitskilAnd,



DOCUMENT:



RESULT:

Paq~odoqul ~Vvj ii; m Pievoo e qffinou~ (sn\") amOais juawaBEUeN S ~OeJN Aq MuSS]o PuP purqu ew Si Assuar MaN GPIOU001 u Al D WQA ASiN P WOO JnQq UO PaP3001 APIDel a4l

saie oossS Pue saeAoidwe saiwoisno OP jo ~iaq jam aqj ol Pag wwoo Bu umwai ~q Owes

a~ PS SOW SP04jaw aguodsoi PUS UP PJSAwd uoqnllod ds 1 o jo ssauaiSmE uaqP6uojP; ol s uo ssiw speeqo Oil ds majew snopiezeq See Po jo culum aqj joi sanbLupol PuR saomap doloAap ol Sue Bu Isal aA parqo Ponpum ol aoe[d aps A EjUaWUW AUa US SOP Aoid ki oej aql (sadAl o ainImadwag saAem) suo lipuoa PJaWU0J AUB p oijum japun 10 LP M IUBWUOJ AU8

auuBw e ui papnpumio aq uBa 5u u ml puB iBasm

6ujPSijjuawdinbaasucmJsoj dsloaEos n;ajaqm 411Pq AIUO OHI 9 (~~SINHO) MuEl Isai EPJOwu0J Au~ pie nw S sprjaPEA snOPIEZOH Pue 0



The only point of Mats' appraisal I can agree with is that DocMorph is really VERY FAST!



Sincerely,

Nikita Kobrin



P.S. Perhaps I did something wrong? You can take the above documents (http://www-db.stanford.edu/pub/voy/museum/pictures/display/1-MD-IBM-text.jpg and http://www.ohmsett.com/media/Home%20Page%20Paragraph%20Text.jpg) and try it yourself. If you manage to get better results, please let me know.

[ This Message was edited by: on 2002-11-18 18:45 ]



Natalie  Identity Verified
Poland
Local time: 05:00
Member (2002)
English to Russian
+ ...

MODERATOR
Hello Nikita, everything is okay Nov 18, 2002

The image submitted to the DocMorph site has only 72 dpi (like most images on web pages), and OCR software hates images of that quality.

After adjusting the resolution to 200 dpi I submitted one of your images, and here is the result:



Think answers. Oil and Hazardous Materials Simulated EnvIronmental Test Tank (OHMSETT) is " only facility whens full-scale oil spill response equipment testing, research, and training can be conducted In a marna environment with oil under controlled environmental conditions (waves, temperature, oil types). The facility provides an environmentally safe plane to conduct objective testing and to develop devices and techniques for the control of oil and hazardous material spills. Ohmsed's mission is to strengthen awareness of oil spill pollution prevention and response methods, while at the mine thne remaining cornmifted to the well being of Its customers, employees, and associates. The facility, located an hour south of New York City, In Leonardo, New Jersey, is maintained and operated by the Minerals Management Servior (MIMS) through a contrad Win MAR. Incorporated.



As you can see, the result is not so bad.
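
For anyone who wants to work out the numbers before resampling, here is a minimal Python sketch of the arithmetic behind going from 72 dpi to 200 dpi while keeping the same printed size. The function name `upscaled_size` and the 720 x 360 pixel example image are made up for illustration, and the actual resampling would still have to be done in an image editor or library of your choice.

```python
def upscaled_size(width_px, height_px, src_dpi=72, dst_dpi=200):
    """Return the pixel size that keeps the same physical size at dst_dpi.

    The function name and default values are illustrative assumptions;
    the thread only mentions going from 72 dpi to 200 dpi.
    """
    if src_dpi <= 0 or dst_dpi <= 0:
        raise ValueError("dpi values must be positive")
    # Physical size in inches is pixels / dpi, so the new pixel count is
    # pixels * dst_dpi / src_dpi (multiply first to limit rounding error).
    new_width = round(width_px * dst_dpi / src_dpi)
    new_height = round(height_px * dst_dpi / src_dpi)
    return new_width, new_height


if __name__ == "__main__":
    # Hypothetical 720 x 360 pixel web image tagged as 72 dpi.
    print(upscaled_size(720, 360))
```

Note that enlarging an image this way only interpolates between existing pixels. It adds no real detail, although it can still give the OCR engine larger glyphs to work with.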






Nikita Kobrin  Identity Verified
Lithuania
Local time: 06:00
Member (2010)
English to Russian
+ ...
Oh, that's great Natalie, thanks! Nov 18, 2002

Now it's quite another story! I think DocMorph should instruct users on resolution aspects.



But why do you say that "the image submitted to the DocMorph site has only 72 dpi"? I have saved the image to my PC and opened it in MS Photo Editor. In "Properties" I see that its resolution is 300 dpi.



Am I doing something wrong again?
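
One possible explanation, offered as an assumption since nothing in the thread confirms it: the dpi figure of a JPEG is only a tag stored in the file header, and a program may report a different value if it reads another tag or falls back to its own default. The Python sketch below reads the density recorded in a JFIF header using only the standard library. The function name `jfif_density` is hypothetical, and files that carry their resolution only in EXIF data are not handled.

```python
import struct


def jfif_density(data: bytes):
    """Return (units, x_density, y_density) from a JFIF APP0 segment, or None.

    Hypothetical helper for illustration; it ignores EXIF resolution tags.
    """
    # Every JPEG file starts with the SOI marker FF D8.
    if data[:2] != b"\xff\xd8":
        return None
    unit_names = {0: "aspect ratio only", 1: "dpi", 2: "dots per cm"}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None
        marker = data[pos + 1]
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE0 and payload[:5] == b"JFIF\x00" and len(payload) >= 12:
            # Layout after the identifier: version (2 bytes), units (1 byte),
            # X density (2 bytes), Y density (2 bytes).
            units, x_density, y_density = struct.unpack(">BHH", payload[7:12])
            return unit_names.get(units, "unknown"), x_density, y_density
        if marker == 0xDA:
            # Start of scan: image data follows, no more header segments.
            return None
        pos += 2 + length
    return None


if __name__ == "__main__":
    import sys

    # Usage: python jfif_density.py picture.jpg
    with open(sys.argv[1], "rb") as handle:
        print(jfif_density(handle.read()))
```

Whatever the tag says, the pixel dimensions are what the OCR engine actually has to work with, so comparing width and height in pixels is a more reliable check than comparing dpi values.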



Rossana Triaca  Identity Verified
Uruguay
Local time: 00:00
Member (2002)
English to Spanish
Alison, thanks a lot! Nov 19, 2002

I was honestly impressed after a few tests. From now on this is my OCR of choice for non-confidential, unformatted texts.



Thanks again,

Rossana



Sven Petersson  Identity Verified
Sweden
Local time: 05:00
English to Swedish
+ ...
Solutions Nov 19, 2002

Dear Nikita,



You can overcome your format problem (pdf and others) by screen capture (http://www.mirekw.com/winfreeware/mwsnap.html) followed by conversion into a suitable format (http://www.irfanview.com/).



Best of luck!



Sven.




Alison Schwitzgebel
France
Local time: 05:00
Member (2002)
German to English
+ ...
TOPIC STARTER
Nikita: It could be that the site is thinking for you.... Nov 19, 2002

It could be that if you enter, for example, a Russian e-mail address, the site will automatically select your chosen scanning language as Russian - which would give you a garbled result. I think you have to be careful to select the correct text language to get the best result.



HTH



Alison



Nikita Kobrin  Identity Verified
Lithuania
Local time: 06:00
Member (2010)
English to Russian
+ ...
:-) & :-( Nov 29, 2002

Sven Petersson: "You can overcome your format problem (pdf and others) by screen capture (http://www.mirekw.com/winfreeware/mwsnap.html) followed by conversion into a suitable format (http://www.irfanview.com/)."



Thank you, Sven, for the nice links. MWSnap is a very handy programme. I have already tried it and it works perfectly. As for IrfanView, I haven't downloaded it, as I have a better programme for this kind of format conversion: PolyView (http://www.polybytes.com).



With the MyMorph site everything is still very bad: I've tried again and the only result I got was the notification "No text found on this page". What the heck?



Alison: "It could be that if you enter, for example, a Russian e-mail address, the site will automatically select your chosen scanning language as Russian - which would give you a garbled result."



No, Alison, the Russian e-mail address is not the reason. BTW, "Russian e-mail addresses" don't exist at all: all e-mail addresses in Russia use ONLY Roman characters! Here is mine, for example: kobrin@takas.lt (though it's not in Russia).



Besides, at MyMorph they ask you to choose the "primary language in file".



So, to my deepest regret, I still can't use this potentially great feature.



Cheers,

Nikita Kobrin

[ This Message was edited by: on 2002-11-30 23:32 ]



