OCR to DTP software
Thread poster: Jabberwock

Jabberwock
Poland
Member (2004)
English to Polish
Jan 6, 2005

Is there any OCR software that can save to one of the popular DTP software formats?

In particular, the following features would be nice:
- separation of graphics to externally linked files
- control over text flows
- designation of special text areas (headers, running heads, page numbers)

I know it is possible to get that functionality by combining several techniques (saving to HTML for graphics, rereading frames for flows, etc.), but it takes a lot of time...



Doru Voin
Romania
English to Romanian
There are some Jan 6, 2005

Jabberwock wrote:

Is there any OCR software that can save to one of the popular DTP software formats?


Try ScanSoft OmniPage or ABBYY FineReader. For more info, search the ProZ.com website; the issue has been discussed several times before.

Regards,
Doru Voin



Jabberwock
Poland
Member (2004)
English to Polish
TOPIC STARTER
I did some searching... Jan 6, 2005

I did some searching on the subject, but with no results. I would be grateful for any pointers on such discussions.

FineReader cannot save to Corel Ventura, PageMaker, Quark or any other popular DTP format. It also cannot separate graphics (except in HTML), and it does not provide frame flow control (or I have not found that feature).

I don't know about OmniPage, as it has no demo. However, its feature list does not indicate that it can do the things I expect.



PAS
English to Polish
Omnipage - maybe Jan 6, 2005

I had a chance to use OmniPage 12 for a while, and it can save OCR'd documents as FrameMaker (MIF), Ventura Publisher (DOC) and PageMaker (DOC) files.
Yes, I know - most people do not consider these to be true DTP software.

However, I never tried it, so I don't know how well it does the job.

Good luck
Pawel Skalinski

[Edited at 2005-01-06 14:34]


Ken Cox
German to English
suggestions Jan 7, 2005

I suspect that there's not much demand for this in the professional DTP world, so it's unlikely that a commercial product is available that can do what you want.
A possible DIY solution that would do at least part of what you want would be to export the OCR document in RTF and use a tool to convert the RTF document to tagged text for input to XPress or InDesign. If you are a good programmer (or can find someone who is), it should be possible to make such a tool, although I don't think it's a trivial task. You could also try looking for shareware that can do this (other people may have had the same idea).
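As a rough illustration of the RTF-to-tagged-text idea, here is a minimal Python sketch of the last step only. It assumes the RTF has already been parsed into (style name, paragraph text) pairs by some other tool, which is the hard part and is not shown. The `<ASCII-WIN>` header and `<ParaStyle:...>` tags follow InDesign's tagged text format as I understand it, so check them against the Adobe documentation before relying on them. The function names and style names are made up for the example, and non-ASCII characters are not handled.

```python
def escape_tagged(text: str) -> str:
    """Escape characters that a tagged-text importer would read as markup."""
    # Backslash goes first so the escapes added afterwards are not doubled.
    for ch in ("\\", "<", ">"):
        text = text.replace(ch, "\\" + ch)
    return text


def to_tagged_text(paragraphs, header="<ASCII-WIN>"):
    """Turn (style_name, text) pairs into one tagged-text string.

    `paragraphs` is a hypothetical intermediate format, assumed to come
    from whatever reads the RTF exported by the OCR program.
    """
    lines = [header]
    for style, text in paragraphs:
        # One paragraph per line, prefixed with its paragraph style tag.
        lines.append(f"<ParaStyle:{style}>{escape_tagged(text)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("Heading", "Chapter 1"), ("Body", "If a < b, swap them.")]
    print(to_tagged_text(sample), end="")
```

The style names only need to match the paragraph styles defined in the DTP document, so the layout itself stays under the control of the DTP program.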
Another possibility would be to export the OCR document as PDF and use a tool to convert it directly to XPress or InDesign format. As PDF is becoming a popular output/transfer format in the DTP world, there may be commercial products available that can do this, but they would be expensive.



Jabberwock
Poland
Member (2004)
English to Polish
TOPIC STARTER
Thanks for the suggestions! Jan 8, 2005

I will keep searching for the solution...

The problem is that the OCR software needs to be designed specifically to extract the needed information. The exact format conversion is of secondary importance.

I have thought of a simple example that illustrates my needs: a document which has two columns in two languages on each page. In DTP it would be quite natural to define two text flows, one for each language, so the text can be edited and formatted independently. Getting that from a paper text is quite difficult: the workaround would be to recognize the first column, save it and then recognize the other column. Then any additional frames, tables or pictures might be extracted.
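To make the two-column example concrete, here is a small Python sketch of splitting recognized text blocks into two flows. It assumes the OCR program can export each block with its page coordinates, which not every program does. The `Block` structure and the rule "left or right of the page centre" are illustrative assumptions, not a feature of any particular OCR package.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One recognized text block with its position on the page (hypothetical format)."""
    text: str
    left: float
    top: float
    right: float


def split_columns(blocks, page_width):
    """Assign each block to the left or right flow, keeping top-to-bottom order."""
    mid = page_width / 2
    left_flow, right_flow = [], []
    # Sort by vertical position so that each flow reads from top to bottom.
    for block in sorted(blocks, key=lambda b: b.top):
        centre = (block.left + block.right) / 2
        target = left_flow if centre < mid else right_flow
        target.append(block.text)
    return left_flow, right_flow


if __name__ == "__main__":
    page = [
        Block("Second paragraph, language B", 320, 40, 580),
        Block("First paragraph, language A", 20, 10, 280),
        Block("First paragraph, language B", 320, 10, 580),
        Block("Second paragraph, language A", 20, 40, 280),
    ]
    flow_a, flow_b = split_columns(page, page_width=600)
    print(flow_a)
    print(flow_b)
```

Headings that span both columns, tables and pictures would need their own rules; this only covers the plain two-column case described above.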

But maybe it is true that such features are not really in demand... DTP usually has the original source files at hand.



Doru Voin
Romania
English to Romanian
OmniPage Jan 11, 2005

Jabberwock wrote:

I have thought of a simple example that illustrates my needs: a document which has two columns in two languages on each page. In DTP it would be quite natural to define the two text flows for each language, so the text might be edited and formatted independently. Getting that from a paper text is quite difficult: the workaround would be to recognize the first column, save it and then recognize the other column. Then any additional frames, tables or pictures might be extracted.



I suggest you give OmniPage Pro from ScanSoft a try. With this tool, currently at version 14, you can, for instance, define zones and extract the corresponding text into separate documents or into the same document, set up automatic flows, etc.

Regards,
Doru Voin



Roberta Anderson
Italy
Member (2001)
English to Italian
my Acrobat approach Jan 14, 2005

I use Acrobat a lot, and I would use it to do what you describe in the following way. (I have the Professional version; I do not know to what extent this would be possible with the cheaper Standard version.)

1. Open the scanned document and use Paper Capture (Acrobat's OCR feature, included in Acrobat 6 as a standard command, available as a free plug-in for Acrobat 5) to convert from bitmap to text.

2. Use the Article tool to define the different text flows/threads (easy - just drag boxes around the text in the "reading" sequence). In your case, one article that covers the first language over the various pages, then a second article to cover the second language.

3. Use Iceni Gemini (export plug-in, not included in Acrobat; there may be other similar plug-ins too) to export the separate articles.

4. Use Acrobat's image extraction feature to extract the images.

But I'm sure there are other ways too.

cheers,
Roberta

