OmegaT tmx parsing
Thread poster: pmrozik
pmrozik
Ireland
Local time: 00:00
English to Polish
Sep 14, 2005

I'm working on my first large translation job, and I was happy to find that OmegaT is released under the GNU General Public License. I'd use Wordfast, but I can't afford to buy it at the moment. Below you'll find some background on the problem, the problem itself, and the workaround:

The company that gave me the translation job uses MetaTexis to create its TMX files. I've never used MetaTexis, but here are some facts about the TMX file:

-It uses UTF-16 encoding
-It is in accordance with TMX 1.1

Somewhere on this forum I found a post mentioning that OmegaT does not support UTF-16 encoding, so I converted the document to UTF-8 using SC UniPad, but that still didn't make it work.

I experimented a bit and noticed that OmegaT suddenly came to life when I switched the tag of the MetaTexis TMX file from:



to



because that's the way it is written in all TMX files that OmegaT creates.

Now, I'm not exactly sure why it needs to be pl-PL. Is the first the keyboard layout and second the ISO code for the language? Or do ISO codes have two parts to them?

Either way, either OmegaT is not parsing it correctly or MetaTexis is not following the standard.

Anyway, if anyone else comes across this problem, the solution is:

1. Convert the file to UTF-8 encoding
2. Replace tags to

That's all it takes. The OmegaT parser needs a major overhaul.
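
For anyone who would rather script step 1 than use an editor, here is a minimal Python sketch. It assumes the TMX really is UTF-16 and starts with an XML declaration; the function name is made up for illustration and nothing here is part of OmegaT or MetaTexis. It only re-encodes the data and updates the declared encoding; it does not touch the language tags from step 2.

```python
import re


def utf16_tmx_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 TMX bytes as UTF-8 and update the XML declaration."""
    # The "utf-16" codec honours a byte-order mark if one is present.
    text = data.decode("utf-16")
    # Rewrite encoding="..." only inside the leading <?xml ...?> declaration.
    text = re.sub(
        r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        r"\g<1>\g<2>UTF-8\g<2>",
        text,
        count=1,
    )
    return text.encode("utf-8")
```

To use it on a file, read the bytes, pass them through the function, and write the result to a new file, keeping the original as a backup.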

Paweł



Rodolfo Raya  Identity Verified
Local time: 20:00
English to Spanish
OmegaT does not support TMX standard Sep 14, 2005

pmrozik wrote:

Either way, either OmegaT is not parsing it correctly or MetaTexis is not following the standard.


Hi Pawel,

OmegaT does not have a real TMX parser. It reads TMX files as plain text, not as XML, and does a lousy job. If you have a TMX 1.1 file with inline information OmegaT either drops it or crashes.

Additionally, the tags generated by OmegaT are completely invalid and you can't reuse the translation memories from OmegaT in other TMX-compliant applications because all inline information appears as garbage in the middle of the translatable text.

OmegaT can only be used if you don't need to deliver a translation memory in TMX format together with the translated documents.

Regards,
Rodolfo


Sonja Tomaskovic  Identity Verified
Germany
Local time: 01:00
English to German
+ ...
.. Sep 15, 2005

Rodolfo Raya wrote:

Additionally, the tags generated by OmegaT are completely invalid and you can't reuse the translation memories from OmegaT in other TMX-compliant applications because all inline information appears as garbage in the middle of the translatable text.


Sorry, this is not completely true. You can remove internal tags from OmegaT with the help of a macro, and then deliver your TMX to your client.



OmegaT can only be used if you don't need to deliver a translation memory in TMX format together with the translated documents.


Again, not completely true. I have already reused OmT TMX files with other applications, and had no problem whatsoever.

But - to get back to your initial problem - I assume that you are using the latest OmT release. We recently found a bug in the way OmT imports TMX files from third-party applications.

If you need to reuse the memory from your client, I suggest you try version 1.4.5 Beta 1. You can find it on SourceForge or in the files section of the OmT user group at Yahoo.

HTH.

Regards,
Sonja


xxxMarc P  Identity Verified
Local time: 01:00
German to English
+ ...
Clarification Sep 15, 2005

Rodolfo Raya wrote:

OmegaT does not have a real TMX parser. It reads TMX files as plain text, not as XML


Whether OmegaT has a "real TMX parser" or not depends upon your definition of "real TMX parser". OmegaT can of course read TMX files, and whether the TMX parser is a "real" one is really only of interest to programmers.

As many users will have noticed, OmegaT's functionality has been improved considerably in recent months. In the course of these improvements, a bug found its way into the parser which may prevent TMX files produced by other CAT tools from being read. This bug will hopefully be fixed shortly, but since OmegaT is maintained by volunteers, it may take us longer to fix bugs than it takes commercial vendors, especially those with product names beginning with "H".

The last version without this bug was 1.4.5 Beta 1, which can be downloaded from the OmegaT user group site on Yahoo!. OmegaT users having problems importing TMX files from other CAT tools may prefer to use this version. Obviously, it is missing some of the features of more recent versions, notably sentence-level segmenting, though this feature can be obtained by using an external utility such as Sentseg.

you can't reuse the translation memories from OmegaT in other TMX-compliant applications because all inline information appears as garbage in the middle of the translatable text.


This is not correct. Firstly, it is true that if you use OmegaT TMX files in other CAT tools "as is", superfluous tags will appear in the matches. This does not, however, prevent you from using them! You can still read the textual information without difficulty, which is what translators are interested in. What you cannot do is exploit *formatting* information in legacy TMX files from other CAT tools.

Secondly, if the inline "garbage" troubles you, an external utility written by Sonja Tomaskovic can be used to delete it.
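
As a rough illustration of what such a clean-up involves (this is not Sonja's utility, whose internals are not described here), the Python sketch below strips placeholder tags from segment text. It assumes the tags look like <f0>, </f0> or <br1/>, i.e. letters followed by a number; check a sample of your own TMX before relying on that pattern. It also assumes the segment text has already been unescaped, for example by an XML parser, since in a raw TMX file the angle brackets would normally appear as entities.

```python
import re

# Assumed tag shape: <f0>, </f0>, <br1/> (letters followed by digits).
_PLACEHOLDER_TAG = re.compile(r"</?[A-Za-z]+[0-9]+/?>")


def strip_placeholder_tags(segment: str) -> str:
    """Remove placeholder tags of the assumed shape from unescaped segment text."""
    return _PLACEHOLDER_TAG.sub("", segment)
```

Ordinary angle brackets in running text are left alone, because the pattern requires a name followed by digits.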

OmegaT can only be used if you don't need to deliver a translation memory in TMX format together with the translated documents.


Again, this is not correct. If you deliver an OmegaT TMX memory, other applications which support TMX will be able to read it, and the information it contains will be useful to other translators. Problems arise only if your customer requires a TMX Level 2 memory, i.e. one containing formatting information.

In short, OmegaT is a CAT tool which is designed for practising translators who want to have their past translations at their fingertips, rather than for translation buyers who wish to be able to have as much of a new text as possible translated automatically.

Marc



liciamilo
Local time: 00:00
English to Italian
+ ...
problems importing TM into OmegaT Nov 30, 2011

Hi,

I was sent a tmx file (SDL, level 1, utf-8 encoding) alongside a text to translate. I moved it into the /tm folder, made sure that the languages were called the same in the tmx file and in my project, but still OT doesn't seem able to connect to the tmx file and search it.
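
One way to double-check the language codes is to list what the file actually contains. The Python sketch below is only an illustration under some assumptions: the TMX is well-formed XML, language codes sit on the tuv elements, and they are stored either in xml:lang or in the older lang attribute. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_languages(data: bytes) -> Counter:
    """Count the language codes found on <tuv> elements in TMX bytes."""
    root = ET.fromstring(data)
    counts = Counter()
    for tuv in root.iter("tuv"):
        # Prefer xml:lang, fall back to a plain lang attribute (assumed older style).
        code = tuv.get(XML_LANG) or tuv.get("lang")
        counts[code] += 1
    return counts
```

If the codes printed from the result differ from the project's source and target languages, even only in case or in the region part, that is worth ruling out first. If parsing fails outright, the file is probably not well-formed XML.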

I must be making a mistake (I'm quite new to the business) but I can't figure out what it is.

Any ideas/suggestions?

Thank you!

Li



Didier Briel  Identity Verified
France
Local time: 01:00
Member (2007)
English to French
+ ...
Check the log file, ask in the support group Dec 1, 2011

You dug up an old post!

liciamilo wrote:
I was sent a tmx file (SDL, level 1, utf-8 encoding) alongside a text to translate. I moved it into the /tm folder, made sure that the languages were called the same in the tmx file and in my project, but still OT doesn't seem able to connect to the tmx file and search it.

I must be making a mistake (I'm quite new to the business) but I can't figure out what it is.

Any ideas/suggestions?

Do you have an error message?
TMXs produced by SDL/Trados often contain illegal XML characters.

What version of OmegaT are you using?

If running 2.3, you could check the log file (look at the user manual to find the location on your system) to see whether your TMX is loaded.
(In 2.5, TMX loading information is temporarily missing in the log.)

Ultimately, the most practical solution is to subscribe to the OmegaT Yahoo support group, where people will be able to give further advice and analyse your TMX if needed.

Didier


esperantisto  Identity Verified
Local time: 02:00
Member (2006)
English to Russian
+ ...
Is it… Dec 2, 2011

liciamilo wrote:

I was sent a tmx file (SDL, level 1, utf-8 encoding)


really UTF-8? As I remember, SDL exports to UTF-16. Make sure that a) it's real UTF-8, and b) the encoding is correctly declared in the file header. To do that, open the file in any good text editor.
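
If a text editor is not conclusive, the same two checks can be sketched in Python. This is only an illustration with a made-up function name: it looks at the byte-order mark at the start of the data and at the encoding named in the XML declaration, and returns both so that you can compare them. A UTF-16 file without a byte-order mark will not be recognised by this sketch.

```python
import codecs
import re


def sniff_tmx_encoding(data: bytes):
    """Return (encoding implied by the BOM, encoding declared in the XML header)."""
    if data.startswith(codecs.BOM_UTF8):
        bom = "utf-8"
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        bom = "utf-16"
    else:
        bom = None
    # Decode only the first bytes, leniently, just to read the declaration.
    head = data[:200]
    text = head.decode("utf-16" if bom == "utf-16" else "utf-8", errors="replace")
    match = re.search(r'<\?xml[^>]*encoding=["\']([^"\']+)["\']', text)
    declared = match.group(1).lower() if match else None
    return bom, declared
```

A result such as ("utf-16", "utf-8") would mean the file is stored as UTF-16 while its header claims UTF-8, which is exactly the kind of mismatch to look for.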



Samuel Murray  Identity Verified
Netherlands
Local time: 01:00
Member (2006)
English to Afrikaans
+ ...
Check for illegal characters Dec 2, 2011

Didier Briel wrote:
TMXs produced by SDL/Trados often contain illegal XML characters.


Yes. Here is a little program that fixes some of them:
http://wikisend.com/download/314158/tmxfixerbasic.zip
Run your TM through the TMXfixer (use the silent one first), and see if OmegaT accepts it.
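
For readers curious what "illegal XML characters" means in practice, here is a small Python sketch. It is not the TMXfixer code and makes no claim about how that program works; it simply removes characters outside the range XML 1.0 allows, plus numeric character references that point to such characters. The function names are made up for illustration.

```python
import re

# Anything outside the XML 1.0 "Char" production is illegal in a document.
_ILLEGAL = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
# Numeric character references such as &#x1F; or &#31;
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def _is_legal(cp: int) -> bool:
    """True if the code point is allowed in XML 1.0."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def strip_illegal_xml(text: str) -> str:
    """Drop illegal characters and references to illegal code points."""
    text = _ILLEGAL.sub("", text)

    def fix_ref(match):
        ref = match.group(1)
        cp = int(ref[1:], 16) if ref[0] == "x" else int(ref)
        return match.group(0) if _is_legal(cp) else ""

    return _CHAR_REF.sub(fix_ref, text)
```

Work on a copy of the TM, since removing characters changes the segment text.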

