Converting text in jpeg format to Word or rtf?
Thread poster: Peter Beauclerk DTCM, CPT

Peter Beauclerk DTCM, CPT
United States
Local time: 03:27
French to English
Nov 24, 2007

I have received several pages for translation in JPEG format. Does anyone know how I can convert them into an editable text format?


Mulyadi Subali  Identity Verified
Indonesia
Local time: 17:27
English to Indonesian
+ ...
OCR Nov 24, 2007

Assuming it's not handwritten, you can use any OCR program. Many recommend ABBYY FineReader.

xxxBrandis
Local time: 12:27
English to German
+ ...
few steps and copy editing Nov 24, 2007

Hi! Clients generally never mention this, or do not know it themselves. The steps are:
1. Convert the .jpeg to .pdf.
2. OCR the .pdf into Word, copy edit the result against the original .jpeg, and translate. I normally convert the target file into a .jpeg and send it back.

But this is a separate process and not related to translation at all. There are quite a few data conversion companies that do a professional job.

Brandis



Samuel Murray  Identity Verified
Netherlands
Local time: 12:27
Member (2006)
English to Afrikaans
+ ...
Print and type Nov 24, 2007

Peter Beauclerk MS LAc wrote:
I have received several pages for translation in JPEG format. Does anyone know how I can convert them into an editable text format?


Print them out and let someone type them in.



a2ztranslate  Identity Verified
New Zealand
Local time: 23:27
English to Japanese
+ ...
depends if text has been outlined first Nov 24, 2007

It probably has been, before the JPEG was created. Try OCR, but be really careful when double-checking, as OCR will often miss content, especially if there are graphics within graphics.

For smaller files it is often just quicker to retype the master.



Bogdan Burghelea  Identity Verified
Romania
Local time: 13:27
English to German
+ ...
optical character recognition Nov 24, 2007

Peter Beauclerk MS LAc wrote:

I have received several pages for translation in JPEG format. Does anyone know how I can convert them into an editable text format?


Use an OCR program. I highly recommend ABBYY FineReader, even in the free Sprint version. Better still, the Professional version.



Peter Linton  Identity Verified
Local time: 11:27
Member (2002)
Swedish to English
+ ...
Nuance OmniPage Nov 24, 2007

I use an OCR product called Nuance OmniPage (currently version 15 or 16). It is very effective and converts JPEG directly to DOC. In fact, it converts a wide range of graphics files to text, including PDF.

It is similar to ABBYY FineReader. A recent computer magazine survey found them both very good. It is an essential tool these days and gives you a competitive edge. You may also be able to charge separately for the process, by emphasizing the benefits of handling everything electronically. I have certainly recouped the cost of the software.

But as others have mentioned, the text needs to be carefully checked.



Igor Indruch  Identity Verified
Czech Republic
Local time: 12:27
English to Czech
OCR Nov 24, 2007

Recent versions of OCR programs work very well. I have IRIS, which is bundled with some HP scanners, and it is OK. Recently I needed to OCR some TIFFs and did not have my computer with me, so I downloaded this:

topocr

It is shareware, so usage is limited (in the number of uses, not in functions). It is a small package, so the download is quick. It is very easy to use, with no need to read the manual. It did the job.


Haiyang Ai  Identity Verified
United States
Local time: 05:27
English to Chinese
+ ...
Manually type them out Nov 24, 2007

Samuel Murray wrote:
Print them out and let someone type them in.


That's probably the easiest way to do it. If you type it yourself, can you write that in the invoice?

Regards,
Haiyang



ViktoriaG  Identity Verified
Canada
Local time: 06:27
English to French
+ ...
Ditto Nov 24, 2007

Peter Linton wrote:

I use an OCR product called Nuance OmniPage (currently version 15 or 16).


It's currently at version 16 (came out a few months ago).

Many people think that OCR is tricky because the software will try to recognize text in images, or even the images themselves (resulting in the usual useless garble we OCR users are familiar with). I just wanted to say that OmniPage lets you process documents manually, telling the software which part of a page should be processed in what way. You can "zone" images so they are left out of the recognition process (you can do this with any type of content, so if you want to exclude footers, for example, it is easy to do). This ensures that images that are not meant to be recognized are simply carried forward into the results without any recognition attempt, which eliminates the above-mentioned garble. This is handy with documents that contain lots of images, such as software manuals, but even more so with PDFs in image format (scanned or even handwritten documents). OmniPage also recognizes handwriting, although, like any other software, it cannot be perfect in that field.

Also, when you need to recognize text within images, OmniPage lets you adjust the images (sharpness, contrast, etc.) before recognition to make them easier to handle and to improve recognition performance (I usually make all images grayscale and increase the contrast). There's even a built-in text editor that lets you correct OCR results before OmniPage exports them into the final format (for our purposes, this would usually be Word .doc).
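
For anyone curious what "grayscale plus more contrast" amounts to, here is a minimal sketch in plain Python. It is not how OmniPage works internally; the function names, the BT.601 luminance weights, and the simple linear contrast stretch are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def to_grayscale(pixels: List[Pixel]) -> List[int]:
    """Convert RGB pixels to 0-255 gray levels using BT.601 luma weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


def stretch_contrast(gray: List[int]) -> List[int]:
    """Linearly rescale gray levels so the darkest becomes 0 and the lightest 255."""
    lo, hi = min(gray), max(gray)
    if lo == hi:
        # A flat image has no contrast to stretch; return it unchanged.
        return list(gray)
    return [round((g - lo) * 255 / (hi - lo)) for g in gray]


# Example: a washed-out scan whose gray levels only span 100-200.
washed_out = [100, 120, 200]
print(stretch_contrast(washed_out))
```

In practice you would let an image library do this on real files; with Pillow, for instance, `ImageOps.autocontrast(image.convert("L"))` performs a comparable grayscale conversion and contrast stretch.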

However, the learning curve of OmniPage is steep. It takes some time to get the hang of basic operation, and then some more to learn the tricks that will let you get the most out of the software. But I definitely think that once you get past Nuance's lousy customer service and the learning period, OmniPage truly is OCR software that is at least one step ahead of the bunch. It offers unique features not found in other software (advanced zoning is one of them: you can precisely determine which parts of a page will be recognized and how they will be processed, and zoning typically doesn't take long), and, handy unique features aside, it is really great at what it was initially invented for, which is recognizing text.

The investment is sizable (on the website, these babies go for around USD 500) but if you are willing to tame the software, the benefits can far outweigh the investment.

[Edited at 2007-11-24 15:45]



Samuel Murray  Identity Verified
Netherlands
Local time: 12:27
Member (2006)
English to Afrikaans
+ ...
Typing included in invoice Nov 24, 2007

Haiyang Ai wrote:
If you type it yourself, can you write that in the invoice?


If you want to... but you must tell the client beforehand that you'll charge extra for the typing. And some clients might try to save that money by self-OCR'ing it and the result will be rubbish.

So my advice is to quote higher for such texts that are not editable, and then pay the typist from your fees. If the client wants you to sign an NDA that excludes using a typist, refuse the job or get the typist to sign the NDA too.

You can easily pay a typist 10% of your own translation rate per word (just an example).
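
As a rough illustration of the arithmetic (the word count and rate below are made-up figures, and the function name is hypothetical), quoting higher and paying the typist out of the fee could look like this:

```python
from decimal import Decimal


def quote_with_typing(words: int, rate_per_word: Decimal,
                      typist_share: Decimal = Decimal("0.10")):
    """Return (client_quote, typist_fee, translation_fee) for a non-editable text.

    typist_share is the typist's pay as a fraction of the translation rate
    (10% here, following the example figure above).
    """
    translation_fee = words * rate_per_word
    typist_fee = translation_fee * typist_share
    client_quote = translation_fee + typist_fee
    return client_quote, typist_fee, translation_fee


# Assumed figures: 3,000 words at 0.10 per word.
quote, typist, translation = quote_with_typing(3000, Decimal("0.10"))
print(quote, typist, translation)
```

With those assumed figures the translation fee is 300, the typist gets 30, and the client is quoted 330.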



Peter Beauclerk DTCM, CPT
United States
Local time: 03:27
French to English
TOPIC STARTER
Thanks for all responses- good solution found Nov 27, 2007

Abby Scan to Word Wizard worked very nicely & was easy to use- when set for jpeg it rendered the text plus graphics faithfully & at $40 it won't bust my budget.
cheers,
Peter



