OmegaT/InDesign compatibility
Thread poster: Roy Williams

Roy Williams  Identity Verified
Austria
Local time: 19:27
German to English
Oct 20, 2008

Hello all,

I've been using Wordfast up until now (for MS Office formats) and have started experimenting with OmegaT so that I can work with other file formats. There was no mention of this in any of the documentation I've looked through, but would anyone know if OmegaT can be used with InDesign and/or PDF formats as well?



Didier Briel  Identity Verified
France
Local time: 19:27
Member (2007)
English to French
+ ...
InDesign through Rainbow Oct 20, 2008

WilRoy wrote:
I've been using Wordfast up until now (for MS Office formats) and have started experimenting with OmegaT so that I can work with other file formats. There was no mention of this in any of the documentation I've looked through, but would anyone know if OmegaT can be used with InDesign

First, InDesign should be exported to the INX format.
Then Rainbow (Okapi) can be used to create an OmegaT project using an intermediate format.

and/or PDF formats as well?

What do you call "PDF formats"?
OmegaT cannot read PDF files directly; the content must be extracted or converted (by OCR) first.

Didier



Roy Williams  Identity Verified
Austria
Local time: 19:27
German to English
TOPIC STARTER
INX Oct 21, 2008

By PDF format I meant PDF files. What is INX?


Didier Briel  Identity Verified
France
Local time: 19:27
Member (2007)
English to French
+ ...
INX is an export/import format Oct 21, 2008

WilRoy wrote:
By PDF format I meant PDF files.

PDF files can contain either text or images.
If they contain text, it can be extracted by copying and pasting into Word, for instance. Some reformatting will usually be needed to get rid of the excess linefeeds.
If they contain images, no CAT tool can translate them. The images must first be converted to text using OCR software.
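
For the "excess linefeeds" step, a small script can do most of the reformatting. The following is only a minimal sketch in Python, assuming the pasted text uses single linefeeds for line wraps and blank lines between paragraphs; the function name `unwrap_linefeeds` is made up for illustration, and words hyphenated across lines are not repaired.

```python
import re

def unwrap_linefeeds(text: str) -> str:
    """Join hard-wrapped lines; treat blank lines as paragraph breaks."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split into paragraphs wherever there is at least one blank line.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, collapse every run of whitespace
    # (including the excess linefeeds) into a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)

sample = "OmegaT cannot read PDF\nfiles directly.\n\nThe content must be\nextracted first.\n"
print(unwrap_linefeeds(sample))
```

If the PDF viewer puts a hard return after every line and no blank lines at all, this approach merges everything into one paragraph, so some manual checking is still needed.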

What is INX?

An XML intermediate format that allows documents to be exported from and imported into InDesign.

Didier



Samuel Murray  Identity Verified
Netherlands
Local time: 19:27
Member (2006)
English to Afrikaans
+ ...
Erm... Oct 21, 2008

WilRoy wrote:
By PDF format I meant PDF files.


I know of no CAT tool that can translate PDF files. Not even the mighty Trados can do it. You may be able to translate text extracted from PDF files, and if you're clever you can put the text back yourself using a PDF editor (search the forums), but I know of no CAT tool that offers both extraction and putting it back.

What is INX?


Tell me, how do you translate InDesign files at the moment?


esperantisto  Identity Verified
Local time: 21:27
Member (2006)
English to Russian
+ ...
Neither do I, but… Oct 21, 2008

Samuel Murray wrote:

I know of no CAT tool that can translate PDF files.


I vaguely remember some bright new wannabe program claiming to support PDF as input, or something like that. Maybe they implemented PDF-to-something conversion on the fly? In any case, that program did not look interesting, and I can't even remember its name.



Roy Williams  Identity Verified
Austria
Local time: 19:27
German to English
TOPIC STARTER
Pdf Oct 21, 2008

The Wordfast documentation claims that it can translate PDFs, but also states that the process is "uncertain", as PDFs were designed to be write-protected. I reasoned that if Wordfast could make such a claim, maybe there could be a better tool. I have not had to work with PDFs, so I don't know if WF can actually do it.

As for InDesign, the company where I work has only recently started using it. At present most of the documentation is still in .doc files, from which PDFs are created after translation. So to answer your question, Sam: at the moment I don't translate in InDesign. But with its increasing use, I thought it would be prudent to find a tool to process such files.



Samuel Murray  Identity Verified
Netherlands
Local time: 19:27
Member (2006)
English to Afrikaans
+ ...
Not Wordfast Oct 21, 2008

WilRoy wrote:
The Wordfast documentation claims that it can translate PDFs, but also states that the process is "uncertain", as PDFs were designed to be write-protected. I reasoned that if Wordfast could make such a claim, maybe there could be a better tool. I have not had to work with PDFs, so I don't know if WF can actually do it.


The Wordfast manual makes no such claims. Can you quote from it? The PlusTools manual does have a section on its PDF conversion functionality. I quote it here in full:

PDF

This pane offers two features: 1. extract textual contents from a PDF document currently opened with Acrobat Reader in the background, and 2. convert text from a currently opened document (typewriter-style, where all lines end with a paragraph mark) into regular text with whole paragraphs.

Both tasks are uncertain. The PDF format was created at first to be a read-only format, this is why it is CAT tool-unfriendly. Extracting text from Acrobat Reader is therefore uncertain.

Re-creating whole paragraphs in a document where each line ends with a paragraph mark (carriage return) is also an uncertain task for a machine, since it supposes an understanding of the text. A 90% success rate is usually achieved, however.


As for InDesign, the company where I work has only recently started using it. ... So to answer your question, Sam: at the moment I don't translate in InDesign.


Get your hands on a copy of it and find out how to export and import INX files. Then show the graphic people how to do it.



Roy Williams  Identity Verified
Austria
Local time: 19:27
German to English
TOPIC STARTER
OK PlusTools then Oct 22, 2008

Samuel Murray wrote:

PDF

This pane offers two features: 1. extract textual contents from a PDF document currently opened with Acrobat Reader in the background, and 2. convert text from a currently opened document (typewriter-style, where all lines end with a paragraph mark) into regular text with whole paragraphs.

Both tasks are uncertain. The PDF format was created at first to be a read-only format, this is why it is CAT tool-unfriendly. Extracting text from Acrobat Reader is therefore uncertain.


The text you quoted is actually what I was referring to when talking about PDF files. I keep the PlusTools and Wordfast documentation and the training manual in a folder I call "wordfast docs". Because it seemed like a somewhat time-intensive process with "uncertain" results, I haven't tried it. So I thought that if one tool had a method of working with PDF, however uncertain, perhaps there was another one that could do it with solid results.

Didier, thanks for answering; the information you provided has proven most useful.


[Edited at 2008-10-22 05:51]



Samuel Murray  Identity Verified
Netherlands
Local time: 19:27
Member (2006)
English to Afrikaans
+ ...
Okay, let me put it this way Oct 22, 2008

WilRoy wrote:
So I thought that if one tool had a method of working with PDF, however uncertain, perhaps there was another one that could do it with solid results.


It is my understanding that any non-OCR method of extracting text from a PDF will be flawed, because the paragraph reorderiser has to guess, based on certain rules made up by the programmer.

I have used the PlusTools method a number of times and I'm quite happy with the results, especially when the PDF is fairly simple. For shorter documents, I prefer to select and copy text by hand, for more control.
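
To illustrate what "guessing based on rules" can look like, here is a rough sketch in Python. It is not how PlusTools works internally (I have no knowledge of that); the rule, the function name `rebuild_paragraphs` and the `full_width_ratio` threshold are all assumptions made up for this example. The rule: a line ends a paragraph if it finishes with sentence-ending punctuation and is clearly shorter than the longest line.

```python
def rebuild_paragraphs(lines, full_width_ratio=0.8):
    """Guess paragraph ends in text where every line ends with a hard return.

    Assumed rule (not PlusTools' logic): a line closes a paragraph when it
    ends with sentence-ending punctuation AND is noticeably shorter than the
    longest line, or when it is followed by an empty line.
    """
    lines = [ln.strip() for ln in lines]
    longest = max((len(ln) for ln in lines), default=0)
    paragraphs, current = [], []
    for ln in lines:
        if not ln:
            # An empty line always closes the paragraph collected so far.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        ends_sentence = ln.endswith((".", "!", "?", ":"))
        is_short = len(ln) < full_width_ratio * longest
        if ends_sentence and is_short:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs

typewriter_lines = [
    "The PDF format was created at first to be a",
    "read-only format. Extracting text from it is",
    "therefore uncertain.",
    "Re-creating whole paragraphs is also an uncertain",
    "task for a machine.",
]
for paragraph in rebuild_paragraphs(typewriter_lines):
    print(paragraph)
    print()
```

A rule like this goes wrong whenever a paragraph's last line happens to fill the full width, or a sentence ends on a short line in the middle of a paragraph (a heading or list item, say), which is why the result always needs checking.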



Roy Williams  Identity Verified
Austria
Local time: 19:27
German to English
TOPIC STARTER
Hmm Oct 23, 2008

OK, so I tried extracting text from a PDF with PlusTools, and the extraction itself is not as time-intensive as I thought after reading the manual. The problem, though, is that none of the formatting was preserved; all the text (table of contents, text from tables, etc.) was simply left-justified. Is that the limit of PlusTools' ability for this particular task?


Samuel Murray  Identity Verified
Netherlands
Local time: 19:27
Member (2006)
English to Afrikaans
+ ...
Tables etc Oct 23, 2008

WilRoy wrote:
The problem, though, is that none of the formatting was preserved; all the text (table of contents, text from tables, etc.) was simply left-justified. Is that the limit of PlusTools' ability for this particular task?


Yes. PlusTools may in certain circumstances retain the character formatting, but not layout formatting. That is too difficult to guess correctly. Tables etc... forget about it. If you want a file with tables intact, pay your $9 per month here:

http://www.freepdfconvert.com/membership.asp

But even there you still need to do some post-formatting (e.g. removing superfluous tabs, superfluous hard returns, etc.).
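
If the converted file is saved as plain text, part of that post-formatting can be scripted. A minimal sketch in Python follows; the function name `tidy_converted_text` is hypothetical, and it assumes that any run of tabs stands for a single column break and that one blank line between blocks is enough.

```python
import re

def tidy_converted_text(text: str) -> str:
    """Remove superfluous tabs and hard returns from converted text."""
    # Collapse each run of tabs (with any spaces around it) into one tab.
    text = re.sub(r"[ \t]*\t[ \t]*", "\t", text)
    # Drop tabs and spaces left dangling at the end of a line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Reduce three or more consecutive hard returns to one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip("\n")

converted = "Item\t\t\tPrice\t\nWidget \t 9\n\n\n\nTotal\t\t9\n"
print(tidy_converted_text(converted))
```

In a real table, an empty cell also shows up as two tabs in a row, so collapsing tabs blindly can shift columns; check the tables by eye afterwards.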

