ProZ.com global directory of translation services
Thread poster: DavidCanek
What is a good OCR software for Japanese?

DavidCanek
Czech Republic
Local time: 04:25
English to Czech
Feb 9

Hi,

Could anyone recommend a good software product to OCR a bunch of PDFs in Japanese?

Thanks,
David



DZiW
Local time: 05:25
English to Russian
+ ...
FineReader Feb 9

Hello David.

The problem with Japanese characters is that they usually need scans at 300 DPI or higher...

So ABBYY FineReader might help. I don't use it for East Asian languages myself, but it has proved to be one of the best OCR packages at the moment.



D. B. Slavenskoj
United States
Russian to English
+ ...
Tesseract Feb 9

Tesseract has language data files for Japanese, and can be trained as well. See: http://en.wikipedia.org/wiki/Tesseract_(software)
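
For anyone taking the Tesseract route, a command-line run could be scripted roughly like this in Python. This is only a sketch under a few assumptions: Tesseract 4 or later is installed with the Japanese data (`jpn`), and each PDF page has already been exported as an image, since Tesseract does not read PDFs directly. The function names and file names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="jpn", dpi=300):
    """Build the argument list for one Tesseract CLI run.

    lang="jpn" selects the Japanese language data ("jpn_vert" is the
    variant for vertical text). --dpi tells Tesseract the scan resolution
    when the image file itself does not record it.
    """
    return [
        "tesseract",
        str(image_path),
        str(output_base),  # Tesseract appends ".txt" to this base name
        "-l", lang,
        "--dpi", str(dpi),
    ]


def ocr_page(image_path, output_base, lang="jpn", dpi=300):
    """Run Tesseract on one page image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang, dpi),
        check=True,  # raise if Tesseract exits with an error
    )
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `ocr_page("page-001.png", "page-001")` would write `page-001.txt` next to the script and return its contents. How good the result is still depends heavily on scan quality.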


Kirti Vashee
United States
Local time: 19:25
Free OCR Feb 9

This project sounded promising, but it is important to remember that source files scanned at less than 300 dpi are unlikely to be recognized successfully by most OCR packages.

http://www.reviewmylife.co.uk/blog/2010/06/08/free-japanese-ocr-translation/
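
As a rough way to check the 300 dpi rule of thumb before running any OCR, the resolution can be worked out from the pixel size of a page image and the physical page size. The sketch below assumes A4 pages; the function names and the sample pixel sizes are only illustrative.

```python
def effective_dpi(pixels, inches):
    """Resolution implied by a pixel dimension and the physical page size."""
    return pixels / inches


def meets_ocr_minimum(width_px, height_px, width_in, height_in, minimum=300):
    """True if both axes reach the minimum DPI, rounded to the nearest integer."""
    return (
        round(effective_dpi(width_px, width_in)) >= minimum
        and round(effective_dpi(height_px, height_in)) >= minimum
    )


# A4 is about 8.27 x 11.69 inches.
A4_WIDTH_IN, A4_HEIGHT_IN = 8.27, 11.69

print(meets_ocr_minimum(2480, 3508, A4_WIDTH_IN, A4_HEIGHT_IN))  # True (about 300 dpi)
print(meets_ocr_minimum(1240, 1754, A4_WIDTH_IN, A4_HEIGHT_IN))  # False (about 150 dpi)
```

If a page comes out well under 300 dpi, rescanning at a higher resolution is usually a better bet than switching OCR packages.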



Kirti Vashee
United States
Local time: 19:25
More OCR options for JA Feb 9

First, there's 読んde!!ココ v.13. (Windows only.) Here's the basic info:

Main Page:
http://ai2you.com/ocr/

More Details:
http://ai2you.com/ocr/product/koko13/workings.asp

Free Trial:
http://ai2you.com/ocr/product/koko13/trial01.asp

Buy it here:
http://ai2you.com/shopai2you/ocr/koko13.asp

It works with TWAIN and WIA scanners, and will play nicely with Windows all the way up to Win 7 64-bit.

It claims to handle smudged kanji and underlined words, and has a learning mode. Plus, it has a bunch of built-in dictionaries to help with recognition. It also claims to handle kana, kanji, and alphanumeric text on the same page, something that ReadIris choked on frequently when I used it.

If it does what it claims to, then it would be a heck of a lot better than anything IRIS puts out, for a lot cheaper. ~13,000 yen for the full download version. 20,000 yen if you want a box. Interface is all Japanese.


The other software I'm looking at is e.Typist (Windows only; supports Mac via Boot Camp... I think. It's vague about Mac support):

Main Page-- details along the sidebar links:
http://mediadrive.jp/products/et/index.html

Try the Eval version here:
http://mediadrive.jp/products/et/index8.html

Buy the Download version for cheap here:
http://shop.mediadrive.jp/item_list.htm … p;request=

It looks pretty similar to 読んde!!ココ feature-wise, with a few notable exceptions. First, you can buy the Neo edition, which only does EN and JP, for 9,800 yen (download), or the standard edition, which does a bunch of languages, for ~13,000 yen (download). If you buy at a store, expect to pay 13,000/20,000 yen. The discounts for downloading are nice here, just like with 読んde!!ココ. The Neo edition is nice if you don't care about other languages.

Otherwise, it seems to do just about everything that 読んde!!ココ does, with a couple of differences. It has a "preview mode" that superimposes what it thinks it sees over the scanned text, so you can correct it. It doesn't say whether it supports WIA scanners, though. It says it supports Win 7 64, but it's kind of sketchy about which scanners it supports. I guess I'll try the eval version first to see if it likes my Brother MFC 7840.

Both handle image files, PDFs, scans, photos, and various input devices, and will output to txt, rtf, Excel, Word, etc., with some variations between the two. Check the websites to see if your flavors are supported.

Both have large dictionaries, and it looks like both support learning modes for Japanese, which ReadIris does not.

And if you want Free Japanese OCR, there's this thread here:
http://forum.koohii.com/viewtopic.php?id=2608



Kirti Vashee
United States
Local time: 19:25
Japanese Options Feb 9

These are options with a Japanese user interface:

http://search.vector.co.jp/vsearch/vsearch.php?query=OCR



DavidCanek
Czech Republic
Local time: 04:25
English to Czech
TOPIC STARTER
Thanks! Feb 9

Thanks for all the tips!

David



