Character Recognition Program that's Word-Compatible
Thread poster: BrianHayden
BrianHayden
United States
Russian to English
Jan 2, 2014

Is there any way I could scan the pages of a dictionary and then convert them into a (massive) Word file? If so, what would be the cheapest and simplest way?


Vadim Kadyrov  Identity Verified
Ukraine
Local time: 07:02
Member (2011)
English to Russian
+ ...
Yes, you can Jan 2, 2014

The best application (I believe) is ABBYY FineReader (you can use version 8, which should be much cheaper than the newest one). You just scan the pages into JPEG files and then use the application to OCR the images.

Still, this is an extremely time-consuming task. Even the best OCR applications won't be able to perfectly reproduce the layout of dictionary pages.


BrianHayden
United States
Russian to English
TOPIC STARTER
More detail... Jan 2, 2014

I should probably explain my plan better, feasible or unfeasible though it may be. I like Microsoft Word, and I think it's fairly straightforward to use. I've been keeping a dictionary of idioms as a Word file, adding new entries as I encounter new idioms. Keeping a dictionary of idioms and phrases in a Word file is especially convenient, since you can do a Ctrl + F search for any word within the phrase. That is easier than looking up each word of an idiom separately in a standard dictionary, which still may not list the idiom. I've recently found an especially good dictionary with a lot of idioms, and I wanted to scan it in and add it to the Word file somehow. Hand-typing the entries from the dictionary would be murderous. Anything less laborious than hand-typing is okay in my book.
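For anyone who ends up with the entries as plain text rather than a Word file, the same "search for any word inside the phrase" idea can be sketched in a few lines of Python. This is only an illustration: the `IDIOMS` list and the `find_entries` helper are made up for the example, and the one-entry-per-line format is an assumption, not something taken from the dictionary in question.

```python
# Hypothetical sample entries; a real list would come from the OCR output,
# assumed here to be one "idiom - translation" entry per line.
IDIOMS = [
    "бить баклуши - to twiddle one's thumbs",
    "водить за нос - to lead someone by the nose",
    "остаться с носом - to be left with nothing",
]


def find_entries(word, entries):
    """Return every entry containing `word`, ignoring upper/lower case."""
    needle = word.casefold()
    return [entry for entry in entries if needle in entry.casefold()]


# Substring search, like Ctrl + F: "нос" also matches the inflected "носом".
matches = find_entries("нос", IDIOMS)
for entry in matches:
    print(entry)
```

Like Ctrl + F in Word, this is a plain substring match, so it finds inflected forms only when they happen to contain the search string.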

I forgot to mention that I need a program that can read Cyrillic and, since this is a dictionary, Cyrillic with accent marks. Does ABBYY FineReader do that? And is it user-friendly?

[Edited at 2014-01-02 08:38 GMT]

[Edited at 2014-01-02 08:39 GMT]



Rolf Keller
Germany
Local time: 06:02
English to German
OCR needs know-how Jan 2, 2014

Vadim Kadyrov wrote:

You just scan pages into jpeg files and then use this application to OCR the images.


This is possible, but it must be done with care. JPEG files can be (and, with default settings, are) compressed lossily, which degrades the OCR results. BTW, any OCR application should be able to take scanner input directly, so there is no need to scan to files beforehand.

Even the best OCR applications won't be able to perfectly reproduce the layout of dictionary pages.


??? For this purpose, you probably don't want to reproduce the original layout but rather to produce a clean table (one table row per dictionary entry).

In the worst case you have to mark up the columns manually (in the OCR software) and ignore everything else. Such markup takes about 30 seconds per page, so 240 pages take 2 hours. In many cases the OCR software will do that automatically, though.
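The estimate is easy to rerun for a dictionary of a different size. A trivial Python sketch, using only the two figures from the post (the variable names are just for illustration):

```python
SECONDS_PER_PAGE = 30  # manual column markup time per page, from the post
PAGES = 240            # page count used in the post

total_seconds = SECONDS_PER_PAGE * PAGES
total_hours = total_seconds / 3600
print(f"{total_seconds} s = {total_hours:g} h")  # 7200 s = 2 h
```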

Depending on the dictionary you might have to write a Word macro that tidies up the resulting Word table. This might take one hour or one day.
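What such a tidy-up does depends entirely on the dictionary, so the following is only a rough sketch of the kind of cleanup meant here, written in Python rather than as a Word macro. It assumes the OCR result has already been exported as rows of cells (headword, translation); the helper names `tidy_rows` and `to_tsv` and the sample data are invented for the example.

```python
import re


def tidy_rows(rows):
    """Collapse runs of whitespace in every cell and drop fully empty rows."""
    cleaned = []
    for row in rows:
        cells = [re.sub(r"\s+", " ", cell).strip() for cell in row]
        if any(cells):  # keep the row only if at least one cell has text
            cleaned.append(cells)
    return cleaned


def to_tsv(rows):
    """Join cells with tabs and rows with newlines."""
    return "\n".join("\t".join(cells) for cells in rows)


# Hypothetical OCR output: stray spaces, a line break inside a cell, an empty row.
raw = [
    ["  бить   баклуши ", "to twiddle\none's thumbs"],
    ["", "   "],
    ["водить за нос", " to lead someone by the nose "],
]
tidy = tidy_rows(raw)
print(to_tsv(tidy))
```

Tab-separated text of this kind can be pasted into Word and turned into a table with Word's "Convert Text to Table" command.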



Vadim Kadyrov  Identity Verified
Ukraine
Local time: 07:02
Member (2011)
English to Russian
+ ...
The thing I suggested Jan 2, 2014


What I suggested is a general scenario, with the details to be discussed (or suggested) later on. When I read the topic starter's message, I assumed he wanted to reproduce the hard copy of the dictionary in electronic form (say, some old and really precious edition of this dictionary).

If he only wants some entries from the dictionary digitized, the task becomes much easier, of course.

A few words about JPEG images: if the resolution is high, I believe the quality issues of this file type no longer matter.

But these are details. I think the topic starter has already seen the "path".


esperantisto  Identity Verified
Local time: 08:02
Member (2006)
English to Russian
+ ...
No Jan 2, 2014

BrianHayden wrote:

Keeping a dictionary of idioms and phrases in a Word file is especially convenient, since you can do a Ctrl + F search


If you use one dictionary, this may be fine. However, a translator normally needs more than one dictionary. In such a case, using a dictionary shell is a better solution. My favorite is GoldenDict.

program that can read Cyrillic with accent marks. Does Abby FineReader do that?


No, FineReader can't produce good output for accented Cyrillic letters. Versions 8 and 9 simply produce unaccented letters; the later versions 10 and 11 produce recognition errors.

[Edited at 2014-01-02 11:38 GMT]


BrianHayden
United States
Russian to English
TOPIC STARTER
Dictionary Shell? Jan 2, 2014


What is a dictionary shell?


BrianHayden
United States
Russian to English
TOPIC STARTER
Accent marks... Jan 2, 2014

esperantisto wrote:

No, FineReader can't produce good output for accented Cyrillic letters. Versions 8 and 9 simply produce unaccented letters; the later versions 10 and 11 produce recognition errors.

Is there any way around that? It seems that a product that complicated would have some sort of way of dealing with that, especially since in Russian accent marks are occasionally used to disambiguate words in everyday, non-dictionary texts (think of за́мок, замо́к).


esperantisto  Identity Verified
Local time: 08:02
Member (2006)
English to Russian
+ ...
Answers Jan 3, 2014

BrianHayden wrote:

What is a dictionary shell?


Well, a dictionary program. A program used to access dictionaries.

BrianHayden wrote:

Is there any way around that? It seems that a product that complicated would have some sort of way of dealing with that, especially since in Russian accent marks are occasionally used to disambiguate words in everyday, non-dictionary texts (think of за́мок, замо́к).


No idea. FineReader can be trained to recognize specific languages with specific characters, but I don’t know if it’s applicable to Russian accents as there are no pre-composed accented Cyrillic letters in Unicode.
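The Unicode side of this can be checked with Python's standard `unicodedata` module. The sketch below makes no claim about what FineReader does internally. It only shows that a stressed Cyrillic vowel stays a base letter plus U+0301 (COMBINING ACUTE ACCENT) even after NFC normalization, while Latin "a" plus the same mark composes into a single character. The `strip_stress` helper is invented for the example and shows one way to make stressed entries findable by an unaccented search.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT, the usual stress mark

# NFC composes a base letter and a mark only if a pre-composed character exists.
latin = unicodedata.normalize("NFC", "a" + ACUTE)          # Latin a
cyrillic = unicodedata.normalize("NFC", "\u0430" + ACUTE)  # Cyrillic а
print(len(latin), len(cyrillic))  # 1 2


def strip_stress(text):
    """Remove only the acute stress mark, leaving letters such as й and ё intact."""
    return text.replace(ACUTE, "")


castle = "за" + ACUTE + "мок"  # stress on the first syllable
lock = "замо" + ACUTE + "к"    # stress on the second syllable
print(strip_stress(castle) == strip_stress(lock))  # True
```

Removing only U+0301 is deliberate: stripping every combining mark after decomposition would also turn й into и and ё into е.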



Emma Goldsmith  Identity Verified
Spain
Local time: 06:02
Member (2010)
Spanish to English
Russian is in the drop-down list of languages in Abbyy Jan 3, 2014

esperantisto wrote:

No idea. FineReader can be trained to recognize specific languages with specific characters, but I don’t know if it’s applicable to Russian accents as there are no pre-composed accented Cyrillic letters in Unicode.


I've got no idea either, but Russian is definitely included in the list of languages that ABBYY FineReader will recognise (version 11.0).

You can also add a host of symbols/letters as a "user language". For example, I've added µ, α and β because FineReader doesn't recognise them out of the box.



