Trouble with DocBook
Thread poster: Jan Cerny

Jan Cerny
Austria
Feb 11, 2015

Hi,

For the past few hours I have been trying, without success, to import various DocBook XML documents into OmegaT 3.1.8.

In some cases OmegaT tells me that no files in a supported format are included in my project.
Other XML files lead to an error message where OmegaT informs me that a path in the file I want to import contains "invalid characters".
Yet another type of file leads to the error message "Illegal character in path at index 9".

All in all I am not making much progress here. I have tried using files belonging to my own documentation environment (which I can open fine in XMetaL & which are based on DocBook 4.2). I also tried out various DocBook XML files which are included in other tools or which I found on websites.

I even downloaded a test project from an OmegaT bug report:
http://sourceforge.net/p/omegat/bugs/636/
Importing the project attached to that bug report also leads to the "contains invalid characters" error, although it must have worked for another user at some point.

I fear I am missing something pretty basic here. Any suggestions would be greatly appreciated.

Or can someone perhaps provide me with a working DocBook example which I can import? I could have a look at the syntax to see why my own files do not work.


 

Didier Briel
France
The OmegaT documentation translation kit is in DocBook Feb 11, 2015

Jan Cerny wrote:
In some cases OmegaT tells me that no files in a supported format are included in my project.

That could be the case if the DocBook header is not what is expected.
The pattern we use for DocBook 4 is -//OASIS//DTD DocBook.*
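To check a file against that pattern before importing it, something like the following might help. It is only a rough sketch: the pattern is the one quoted above, but how OmegaT actually applies it is not described here, so the DOCTYPE extraction and the function name `looks_like_docbook4` are assumptions. It only handles public identifiers in double quotes.

```python
import re

# Pattern quoted in the post above; everything around it is an assumption.
DOCBOOK4_PUBLIC_ID = re.compile(r"-//OASIS//DTD DocBook.*")

def looks_like_docbook4(xml_text: str) -> bool:
    """Return True if the DOCTYPE public identifier matches the pattern."""
    # Pull the public identifier out of <!DOCTYPE root PUBLIC "..." ...>
    doctype = re.search(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]+)"', xml_text)
    return bool(doctype and DOCBOOK4_PUBLIC_ID.match(doctype.group(1)))

sample = '''<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<book><title>Test</title></book>'''

print(looks_like_docbook4(sample))
```

A file with no DOCTYPE declaration, or with a different public identifier, would fail this check.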

Other XML files lead to an error message where OmegaT informs me that a path in the file I want to import contains "invalid characters".
Yet another type of file leads to the error message "Illegal character in path at index 9".

Just to be sure: try using a very short path (e.g., c:\test\) with no "exotic" characters in the filename. This is not specific to OmegaT; it results from a combination of Java and operating system limitations.
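If it is not obvious which character is the culprit, a small script can list the candidates. This is a minimal sketch: the whitelist below is a deliberately conservative assumption of mine, not a rule taken from OmegaT or Java, and `suspicious_chars` is a hypothetical helper name. The bracketed folder name is just an illustration.

```python
import re

# Conservative whitelist (an assumption, not an OmegaT or Java rule):
# ASCII letters, digits, dot, underscore, hyphen, path separators, drive colon.
SAFE = re.compile(r"[A-Za-z0-9._\-\\/:]")

def suspicious_chars(path: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs for characters outside the whitelist."""
    return [(i, ch) for i, ch in enumerate(path) if not SAFE.fullmatch(ch)]

# Hypothetical project paths, for illustration only.
print(suspicious_chars(r"c:\test\manual.xml"))
print(suspicious_chars(r"c:\docs[v2]\manual.xml"))
```

An empty list means the path passes this (strict) check; anything reported is worth renaming before trying the import again.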

All in all I am not making much progress here. I have tried using files belonging to my own documentation environment (which I can open fine in XMetaL & which are based on DocBook 4.2).

The OmegaT documentation is based on 4.5, but that shouldn't be very different.

Or can someone perhaps provide me with a working DocBook example which I can import? I could have a look at the syntax to see why my own files do not work.

You are lucky: the OmegaT documentation translation kit contains precisely that:
https://sourceforge.net/projects/omegat/files/Other%20-%20Localization%20projects/OmegaT%203.1.8/
"Minimal" contains just one DocBook document, "Full" contains the complete documentation.

Didier


 

Jan Cerny
Austria
TOPIC STARTER
Entity problems Feb 12, 2015

Hi,

Thank you very much for your answer. The path really was the cause of the problem. It contained a [ and a ]. When I removed the square brackets, it started working.

I have now begun to successfully load more and more files of my documentation environment into OmegaT.

One new problem has arisen, concerning entities:

Documentation Environment including the DTD
When I import the documentation environment including the DTD, the value of a text entity is displayed in the editor as a normal, translatable text string (e.g., "Coca Cola"). Is there a way to display only the entity (e.g., "&ProductName;") instead and make it write-protected?

When I generate the translated documents, they contain the content of the DTD file (the same problem as in the bug report I linked in my first post). Also, the output files again contain the "Coca Cola" value instead of the entity.

Documentation Environment without DTD
If I only import XML files, entities are represented as blanks. This is also not ideal, because you miss them while translating. The output files also contain blanks where the entities should be.

Is there a way around these problems?


 

Didier Briel
France
Entities Feb 13, 2015

Jan Cerny wrote:
One new problem has arisen, concerning entities:

Documentation Environment including the DTD
When I import the documentation environment including the DTD, the value of a text entity is displayed in the editor as a normal, translatable text string (e.g., "Coca Cola"). Is there a way to display only the entity (e.g., "&ProductName;") instead and make it write-protected?

No (I'm talking from a user point of view).

When I generate the translated documents, they contain the content of the DTD file (the same problem as in the bug report I linked in my first post).

If the bug report is still open, that's because it is still valid.

Also, the output files again contain the "Coca Cola" value instead of the entity.

What you see in the Editor is what you will get in the target document.

Documentation Environment without DTD
If I only import XML files, entities are represented as blanks. This is also not ideal, because you miss them while translating. The output files also contain blanks where the entities should be.

Yes, that's not a very satisfying solution.

Is there a way around these problems?

I cannot think of any that doesn't involve pre- and post-processing the documents.
For instance, replace &my-entity; with #my-entity; in source documents, and do the reverse operation in target documents.

By doing so, you can have your "entities" identified as tags in OmegaT. In Options > Tag Validation, enter #.*?; as the regular expression for custom tags.
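The pre- and post-processing could be scripted along these lines. This is only a sketch under a few assumptions: the function names `protect_entities` and `restore_entities` are made up for illustration, the five predefined XML entities are deliberately left alone, and numeric character references such as `&#xA0;` are skipped. It also assumes the documents contain no literal text of the form `#name;`, which the restore step would otherwise turn into an entity.

```python
import re

# Named entity references such as &ProductName; or &my-entity;
ENTITY = re.compile(r"&([A-Za-z_][\w.-]*);")
# Placeholders such as #ProductName; (but not the tail of &#xA0;)
PLACEHOLDER = re.compile(r"(?<!&)#([A-Za-z_][\w.-]*);")
# Predefined XML entities are left untouched (an assumption of this sketch).
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def protect_entities(text: str) -> str:
    """Pre-processing: turn &name; into #name; in source documents."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        return match.group(0) if name in PREDEFINED else f"#{name};"
    return ENTITY.sub(repl, text)

def restore_entities(text: str) -> str:
    """Post-processing: turn #name; back into &name; in target documents."""
    return PLACEHOLDER.sub(r"&\1;", text)

source = "<para>Install &ProductName; before use (size &lt; 5 MB).</para>"
protected = protect_entities(source)
print(protected)
# The custom tag expression from the post picks up the placeholder:
print(re.findall(r"#.*?;", protected))
print(restore_entities(protected) == source)
```

Run `protect_entities` on the source files before loading them into the project, and `restore_entities` on the generated target files afterwards.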

Didier


 

