Trouble with DocBook
Thread poster: Jan Cerny

Jan Cerny
Austria
Feb 11, 2015

Hi,

for the past few hours I have been trying, without success, to import various DocBook XML documents into OmegaT 3.1.8.

In some cases OmegaT tells me that no files in a supported format are included in my project.
Other XML files lead to an error message in which OmegaT informs me that a path in the file I want to import contains "invalid characters".
Another type of file leads to the error message "Illegal character in path at index 9".

All in all I am not making much progress here. I have tried using files belonging to my own documentation environment (which I can open fine in XMetaL & which are based on DocBook 4.2). I also tried out various DocBook XML files which are included in other tools or which I found on websites.

I even downloaded a test project from an OmegaT bug report:
http://sourceforge.net/p/omegat/bugs/636/
Importing the project attached in that bug report also leads to the "contains invalid characters" error though it plainly must have worked for another user at some time.

I fear I am missing something pretty basic here :) Any suggestions would be greatly appreciated.

Or can someone perhaps provide me with a working DocBook example which I can import? I could have a look at the syntax to see why my own files do not work.



Didier Briel
France
The OmegaT documentation translation kit is in DocBook Feb 11, 2015

Jan Cerny wrote:
In some cases OmegaT tells me that no files in a supported format are included in my project.

That could be the case if the DocBook header is not what is expected.
The pattern we use for DocBook 4 is -//OASIS//DTD DocBook.*
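To check a file against that pattern before importing it, something along these lines might help. This is only a sketch: the pattern is the one quoted above, but the helper name `looks_like_docbook4`, the regular expression used to pull the public identifier out of the DOCTYPE, and the assumption that the pattern is matched against that identifier are all illustrative, not a description of how OmegaT does it internally.

```python
import re

# Pattern quoted in this post for DocBook 4 public identifiers.
DOCBOOK4_PATTERN = re.compile(r"-//OASIS//DTD DocBook.*")

# Hypothetical helper regex: captures the PUBLIC identifier of a DOCTYPE declaration.
DOCTYPE_PUBLIC = re.compile(r"<!DOCTYPE\s+\S+\s+PUBLIC\s+([\"'])(.*?)\1")


def looks_like_docbook4(xml_text):
    """Return True if the DOCTYPE public identifier matches the DocBook 4 pattern."""
    found = DOCTYPE_PUBLIC.search(xml_text)
    return bool(found and DOCBOOK4_PATTERN.match(found.group(2)))


sample = (
    '<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" '
    '"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">'
)
print(looks_like_docbook4(sample))
print(looks_like_docbook4("<book><title>No header</title></book>"))
```

A file for which this returns False has either no DOCTYPE with a public identifier or one that does not start with the expected string, which would be consistent with the "no files in a supported format" message.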

Other XML files lead to an error message where OmegaT informs me that a path in the file I want to import contains "invalid characters".
Another type of file leads to the error message "Illegal character in path at index 9".

Just to be sure: try to have a very short path (e.g., c:\test\) and with no "exotic" character in the filename. That's not something specific to OmegaT, but a combination of Java and operating system limitations.

All in all I am not making much progress here. I have tried using files belonging to my own documentation environment (which I can open fine in XMetaL & which are based on DocBook 4.2).

The OmegaT documentation is based on 4.5, but that shouldn't be very different.

Or can someone perhaps provide me with a working DocBook example which I can import? I could have a look at the syntax to see why my own files do not work.

You are lucky: the OmegaT documentation translation kit contains precisely that:
https://sourceforge.net/projects/omegat/files/Other%20-%20Localization%20projects/OmegaT%203.1.8/
"Minimal" contains just one DocBook document, "Full" contains the complete documentation.

Didier



Jan Cerny
Austria
TOPIC STARTER
Entity problems Feb 12, 2015

Hi,

thank you very much for your answer. The path really was the cause of the problem. It contained a [ and a ]. When I removed the square brackets, it started working.
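A quick way to spot such characters before creating a project could look like the sketch below. The set of "safe" characters is a deliberately conservative assumption of mine, not the actual rule applied by Java or OmegaT, and `find_risky_chars` is a made-up helper name.

```python
import string

# Assumption: a conservative whitelist, not the real rule used by Java or OmegaT.
SAFE_CHARS = set(string.ascii_letters + string.digits + "_-./\\:")


def find_risky_chars(path):
    """Return (index, character) pairs for characters outside SAFE_CHARS."""
    return [(i, ch) for i, ch in enumerate(path) if ch not in SAFE_CHARS]


print(find_risky_chars(r"c:\docs\[draft]\guide.xml"))  # flags the two brackets
print(find_risky_chars(r"c:\test\guide.xml"))          # nothing flagged
```

Spaces and accented letters are flagged as well, which is stricter than necessary but matches the advice to keep the path short and free of "exotic" characters.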

I have now begun to successfully load more and more files of my documentation environment into OmegaT.

One new problem has arisen, concerning entities:

Documentation Environment including the DTD
When I import the documentation environment including the DTD, the value of a text entity is displayed in the editor as a normal, translatable text string (e.g., "Coca Cola"). Is there a way to display only the entity (e.g., "&ProductName;") instead and make it write-protected?

When I generate the translated documents, they contain the DTD file content in the translated document (same problem as in the bug report I linked in my first post). Also, the output files again contain the "Coca Cola" value instead of the entity.

Documentation Environment without DTD
If I only import XML files, entities are represented as blanks. This is also not ideal, because you miss them while translating. The output files also contain blanks where the entities should be.

Is there a way around these problems?



Didier Briel
France
Entities Feb 13, 2015

Jan Cerny wrote:
One new problem has arisen, concerning entities:

Documentation Environment including the DTD
When I import the documentation environment including the DTD, the value of a text entity is displayed within the editor as a normal text string (e.g "Coca Cola") and translatable. Is there a way to only display the entity (e.g "&ProductName;") instead and make it write-protected?

No (I'm speaking from a user's point of view).

When I generate the translated documents, they contain the DTD file content in the translated document (same problem as in the bug report I linked in my first post).

If the bug report is still open, that's because it is still valid.

Also, the output files again contain the "Coca Cola" value instead of the entity.

What you see in the Editor is what you will get in the target document.

Documentation Environment without DTD
If I only import XML files, entities are represented as blanks. This is also not ideal, because you miss them while translating. The output files also contain blanks where the entities should be.

Yes, that's not a very satisfying solution.

Is there a way around these problems?

I cannot think of any that doesn't involve pre- and post-processing the documents.
For instance, replace &my-entity; with #my-entity; in source documents, and do the reverse operation in target documents.

By doing so, you can have your "entities" identified as tags in OmegaT. In Options > Tag Validation, enter #.*?; as the regular expression for custom tags.
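The pre- and post-processing could be scripted roughly as follows. This is a minimal sketch under a few assumptions of my own: the function names are invented, the five predefined XML entities (such as &amp;) are left untouched, and numeric character references (such as &#169;) are skipped in both directions.

```python
import re

# Predefined XML entities that should stay as they are.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Named entity reference such as &my-entity; (numeric references do not match).
ENTITY = re.compile(r"&([A-Za-z_][\w.-]*);")

# Placeholder such as #my-entity; that is not part of a reference like &#x41;.
PLACEHOLDER = re.compile(r"(?<!&)#([A-Za-z_][\w.-]*);")


def protect_entities(text):
    """Run on source documents: turn &name; into #name; before importing."""
    def swap(match):
        name = match.group(1)
        return match.group(0) if name in PREDEFINED else "#" + name + ";"
    return ENTITY.sub(swap, text)


def restore_entities(text):
    """Run on target documents: turn #name; back into &name;."""
    return PLACEHOLDER.sub(r"&\1;", text)


source = "<para>Drink &ProductName; &amp; relax &#169;</para>"
protected = protect_entities(source)
print(protected)
print(restore_entities(protected) == source)

# The custom tag expression suggested above picks up the placeholder.
print(re.findall(r"#.*?;", "Drink #ProductName; now"))
```

One caveat: if a source document already contains literal text of the form #word;, the restore step will turn it into an entity reference, so it is worth searching the files for that pattern first.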

Didier

