How to convert jpeg into doc or pdf?
Thread poster: Renata Forgacs

Renata Forgacs  Identity Verified
United Kingdom
Local time: 02:23
English to Hungarian
Feb 22, 2010

Dear Colleagues,

I’d like to ask for your help with the following, please.

Is there a way to convert jpeg images into doc or pdf or any other format that Studio can handle? I have 18 pages of really interesting text to translate but unfortunately my client only has the hard copies.

He scanned the pages for me and sent them by email, so now they are JPEGs, but that doesn’t really help me, as I still can’t open them in Studio. This is a shame, as they are fairly repetitive with lots of formatting, so Studio could do a nice job with them.

I downloaded the free version of Kleptomania, but when I choose 'Select from Scanned/Faxed Image', I get the message 'Sorry, this is not a true OCR. You need to use the Real OCR like Omni Page.' I haven't found any free true OCRs, so it seems like I'm stuck here.

Perhaps Zamzar could convert it, but my text is highly confidential, so uploading it is not an option...

Has anyone got any ideas as to what I could do? All your advice is very much appreciated...

Many thanks,

Renata



Miro Pollak  Identity Verified
Slovakia
Local time: 03:23
English to Slovak
Well, Renata, Feb 22, 2010

if you can't upload it, I think your only option is to get the "real OCR". You can either get it online or use someone else's computer that has the required software installed.

Not much of a help, I know.

Miro



umcuz  Identity Verified
Russian Federation
Local time: 06:23
English to Russian
Maybe, this would help? Feb 22, 2010

This is a try-and-buy version of a real OCR program, ABBYY FineReader 10 Professional Edition, enough for 50 pages: http://finereader.abbyy.com/trial/

Best regards



Erik Freitag  Identity Verified
Germany
Local time: 03:23
Member (2006)
Dutch to German
FineReader Feb 22, 2010

I suggest you buy ABBYY FineReader. It is widely recognized as one of the best OCR software packages, and it's quite cheap (only £89).


Renata Forgacs  Identity Verified
United Kingdom
Local time: 02:23
English to Hungarian
TOPIC STARTER
Thanks a lot! Feb 22, 2010

efreitag wrote:

I suggest you buy ABBYY FineReader.



Thank you so much to all of you for your advice. I will download ABBYY FineReader and consider buying it if it does the job...



DZiW
Ukraine
English to Russian
MS Office Feb 22, 2010

As far as I remember, MS Office did provide some OCR tools, but they weren't always installed by default. Once I had to convert all the tagged images to TIFF (or something similar) before it could read them. It's not as versatile as third-party tools, and there were some misreads in the final document even with 400 dpi images, but I could easily spot them with MS Word itself.

Just check your Office package and add the component via Add/Remove if necessary. At least Office XP/2003 had it.

Cheers


yakky  Identity Verified
China
Local time: 10:23
English to Chinese
You need a high-quality picture Feb 22, 2010

Just a reminder: make sure these JPG files are of good quality, or you may waste time correcting the OCR results.
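One rough way to check scan quality before starting OCR is to read each JPG's pixel dimensions and work out the effective resolution. The Python sketch below uses only the standard library to pull width and height from the JPEG start-of-frame marker. It assumes each file is a full-width A4 page (210 mm) and uses 300 dpi as a rule-of-thumb threshold; both the page size and the threshold are assumptions, not figures from this thread.

```python
import struct
import sys

A4_WIDTH_INCHES = 210 / 25.4  # assumption: each JPG is a full-width A4 page
MIN_DPI = 300                 # assumption: common rule of thumb for OCR input

# SOFn markers carry the frame size; C4 (DHT), C8 (JPG) and CC (DAC) do not.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) in pixels from a JPEG's start-of-frame segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"corrupt marker at byte {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker; skip it
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers without a length
            pos += 2
            continue
        if marker == 0xDA:  # start of scan reached before any frame header
            raise ValueError("no start-of-frame marker before scan data")
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Segment layout: length(2) precision(1) height(2) width(2) ...
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length  # skip the marker and its segment
    raise ValueError("no start-of-frame marker found")


def effective_dpi(width_px: int, page_width_in: float = A4_WIDTH_INCHES) -> float:
    """Pixels across the page divided by the assumed page width in inches."""
    return width_px / page_width_in


def check_scan(path: str) -> None:
    with open(path, "rb") as f:
        width, height = jpeg_size(f.read())
    dpi = effective_dpi(width)
    verdict = "probably fine" if round(dpi) >= MIN_DPI else "may be too low for reliable OCR"
    print(f"{path}: {width}x{height} px, ~{dpi:.0f} dpi -> {verdict}")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        check_scan(name)
```

The dpi figure is only meaningful if the scan covers the whole page width; a cropped scan will look lower-resolution than it is. It says nothing about blur, skew or contrast, which also affect OCR accuracy.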


Samuel Murray  Identity Verified
Netherlands
Local time: 03:23
Member (2006)
English to Afrikaans
Type it or OCR it Feb 22, 2010

Renata Forgacs wrote:
Is there a way to convert jpeg images into doc or pdf or any other format that Studio can handle? I have 18 pages of really interesting text to translate but unfortunately my client only has the hard copies.


You can try an OCR program like FineReader, but the quality of the conversion depends on the quality of the JPG.

Alternatively, get a typist you like to work with, have her sign a confidentiality agreement, and give her the job. Pay her between 10% and 20% of your translation rate. A typist who does 90 words per minute (a very common figure) would take roughly an hour and a half to type your 18 pages (assuming 400 words per page, that is 7,200 words, or 80 minutes of pure typing time).
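The typist estimate is simple arithmetic, sketched here in Python so you can plug in your own figures. The page count, words per page, typing speed and the 10-20% share come from the post; the per-word translation rate in the example run is a hypothetical placeholder, not a figure from this thread.

```python
def typing_minutes(pages: int, words_per_page: int, wpm: float) -> float:
    """Pure keyboard time, ignoring breaks, proofreading and layout work."""
    return pages * words_per_page / wpm


def typist_fee_range(words: int, rate_per_word: float,
                     low_share: float = 0.10,
                     high_share: float = 0.20) -> tuple[float, float]:
    """Typist's fee as 10-20% of what the translation itself would earn."""
    base = words * rate_per_word
    return base * low_share, base * high_share


if __name__ == "__main__":
    pages, words_per_page, wpm = 18, 400, 90  # figures from the post
    words = pages * words_per_page
    minutes = typing_minutes(pages, words_per_page, wpm)
    print(f"{words} words at {wpm} wpm: {minutes:.0f} min ({minutes / 60:.1f} h)")
    # Hypothetical rate of 0.10 per word, in whatever currency you charge.
    low, high = typist_fee_range(words, rate_per_word=0.10)
    print(f"typist fee: {low:.2f} to {high:.2f}")
```

The result is a lower bound on turnaround: a heavily formatted original will take the typist longer than straight typing speed suggests.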

Renata Forgacs wrote:
I haven't found any free true OCRs, so it seems like I'm stuck here.


The only free true OCR program I know of is GOCR. It's a command-line tool, and it requires fairly good scans to work.

Home page: http://www-e.uni-magdeburg.de/jschulen/ocr/
Download: http://www-e.uni-magdeburg.de/jschulen/ocr/gocr048.exe
User manual: http://www-e.uni-magdeburg.de/jschulen/ocr/gocr-0.48.tar.gz (in the "doc" subfolder, use gocr.html)

If you can't unzip tar.gz files, get 7-zip: http://www.7-zip.org/

You may have to convert the JPG files to PNM before using GOCR. For that, use "djpeg", which is available from this site: http://www.seeingwithsound.com/ocr.htm
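For 18 pages, the two steps above (JPG to PNM with djpeg, then PNM to text with GOCR) can be scripted. The Python sketch below assumes both `djpeg` and `gocr` are installed and on your PATH. It uses djpeg's `-pnm` and `-outfile` options and gocr's `-i` (input) and `-o` (output) options; check these against the manuals of the versions you actually install. Everything runs locally, so nothing confidential is uploaded.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_commands(jpg: Path) -> tuple[list[str], list[str]]:
    """Return the djpeg and gocr command lines for one scanned page."""
    pnm = jpg.with_suffix(".pnm")
    txt = jpg.with_suffix(".txt")
    djpeg_cmd = ["djpeg", "-pnm", "-outfile", str(pnm), str(jpg)]
    gocr_cmd = ["gocr", "-i", str(pnm), "-o", str(txt)]
    return djpeg_cmd, gocr_cmd


def ocr_pages(folder: Path) -> None:
    """Convert every *.jpg in folder to PNM, then OCR it to a .txt file."""
    for tool in ("djpeg", "gocr"):
        if shutil.which(tool) is None:
            sys.exit(f"{tool} not found on PATH")
    for jpg in sorted(folder.glob("*.jpg")):
        djpeg_cmd, gocr_cmd = build_commands(jpg)
        subprocess.run(djpeg_cmd, check=True)  # JPG -> PNM
        subprocess.run(gocr_cmd, check=True)   # PNM -> plain text
        print(f"{jpg.name} -> {jpg.with_suffix('.txt').name}")


if __name__ == "__main__" and len(sys.argv) > 1:
    ocr_pages(Path(sys.argv[1]))
```

Run it with the folder of scans as its only argument (the script name is up to you); each `page.jpg` produces a `page.txt` next to it. GOCR outputs plain text, so the original formatting has to be recreated before the file goes into Studio. On case-sensitive file systems, files named `*.JPG` will not be matched by the `*.jpg` pattern.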



Renata Forgacs  Identity Verified
United Kingdom
Local time: 02:23
English to Hungarian
TOPIC STARTER
FineReader works like a charm! Feb 22, 2010

Samuel Murray wrote:

You can try an OCR program like FineReader, but the quality of the conversion depends on the quality of the JPG.


Brilliant! As you all suggested I tried FineReader first; the quality of the JPG was good so it worked like a charm! And I have learnt something new... Thanks guys!



Pablo Bouvier  Identity Verified
Local time: 03:23
German to Spanish
How to convert jpeg into doc or pdf? Feb 22, 2010

Renata Forgacs wrote:


Brilliant! As you all suggested I tried FineReader first; the quality of the JPG was good so it worked like a charm! And I have learnt something new... Thanks guys!



Happy to hear you were able to convert the file. However, what I don't understand is why your client sent you JPEG files, as the standard graphic format for OCR is TIFF. And TIFF files can be opened and managed in MS Office Document Imaging and exported to MS Word with the same tool.



Samuel Murray  Identity Verified
Netherlands
Local time: 03:23
Member (2006)
English to Afrikaans
On TIFF and stuff Feb 22, 2010

Pablo Bouvier wrote:
However, what I don't understand is why your client sent you JPEG files, as the standard graphic format for OCR is TIFF.


1. I don't think there is a standard graphic format for OCR or for scanning. None of the scanners that I have had so far offered TIFF as the default scan format, and I have never had to convert an image to TIFF before feeding it to an OCR program.

2. The OP's client didn't send her files specifically "to be OCR'ed". He simply sent her the files. He scanned it and saved it and sent it. TIFF does not come into play here.

Pablo Bouvier wrote:
And TIFF files can be opened and managed in MS Office Document Imaging and exported to MS Word with the same tool.


In order for MS Word to export a file to TIFF, the file has to be in MS Word format first. If I understand the OP correctly, this is precisely what the client did not have. The client had hardcopy only.



Pablo Bouvier  Identity Verified
Local time: 03:23
German to Spanish
How to convert jpeg into doc or pdf? Feb 23, 2010

Samuel Murray wrote:

Pablo Bouvier wrote:
However, what I don't understand is why your client sent you JPEG files, as the standard graphic format for OCR is TIFF.


1. I don't think there is a standard graphic format for OCR or for scanning. None of the scanners that I have had so far offered TIFF as the default scan format, and I have never had to convert an image to TIFF before feeding it to an OCR program.

2. The OP's client didn't send her files specifically "to be OCR'ed". He simply sent her the files. He scanned it and saved it and sent it. TIFF does not come into play here.

Pablo Bouvier wrote:
And TIFF files can be opened and managed in MS Office Document Imaging and exported to MS Word with the same tool.


In order for MS Word to export a file to TIFF, the file has to be in MS Word format first. If I understand the OP correctly, this is precisely what the client did not have. The client had hardcopy only.



Maybe it's not a standard. But as far as I know, TIFF is a multipage document-imaging file format (now owned by Adobe), used not only by scanners but, some time ago, also by fax machines. This file format can be read directly by OCR software such as MS Office Document Imaging (a standard application bundled with MS Office) or by MT software such as ProMT itself.

There is no need for a hardware scanner, as MS Office can index the text embedded in TIFF files, with or without OCR:

http://office.microsoft.com/en-us/help/HP030812361033.aspx?pid=CH010000951033

Maybe I did not understand the question well. If the client only had hard copies, that's fine. But if he had to scan a document anyway, it might have been better to send TIFF files, as they are far more easily read and indexed by OCR software than JPEG or other graphic formats, for the reason explained above. As far as I remember, what I wrote before is that a TIFF file was easy to export to MS Word, not the other way round.

By the way, I am not questioning the solution you offered. It worked very well and solved the problem. I was just wondering why JPEG had been chosen rather than another graphic file format.

[Edited at 2010-02-23 21:04 GMT]


Miroslav Jeftic  Identity Verified
Local time: 03:23
English to Serbian
:) Feb 23, 2010

If it's a good scan, it usually doesn't matter which format it's in; it will be OCR-ed correctly. If it's fax-like quality, it's (almost) completely useless, again regardless of the file format.


Renata Forgacs  Identity Verified
United Kingdom
Local time: 02:23
English to Hungarian
TOPIC STARTER
Yes, my client had hardcopy only Feb 24, 2010

Samuel Murray wrote:

If I understand the OP correctly, this is precisely what the client did not have. The client had hardcopy only.



You understood me correctly, Samuel. And thank you for your help; I thought FineReader was such a smart little program that I am now considering buying it.

Pablo, thanks for your advice, too. I will remember what you said about TIFF files next time my client is scanning something.



