Fixing a tmx to create a Muse - help needed
Thread poster: Olly Pekelharing

Olly Pekelharing  Identity Verified
Netherlands
Local time: 21:38
Member (2009)
Dutch to English
Mar 18, 2013

I have a large TMX of some 65000 entries, generated by Trados. I can import it into Trados (old and new) without errors and also generate an autosuggest dictionary from it. I can also import it into MemoQ without errors but I cannot use it to create a Muse. Olifant also fails to import it fully (xml errors). Anyone recommend a program to repair this tmx (so I can use it to create a Muse)? I'm not savvy enough to repair it manually, even if I could open it.

Thanks,

Olly


 

Joakim Braun  Identity Verified
Sweden
Local time: 21:38
German to Swedish
+ ...
Re-export Mar 18, 2013

Try re-exporting it from Trados. That might generate a correct TMX file.

You can open and edit TMX files with any plaintext editor, by the way.
Depending on what the XML errors say there might be a simple fix.

[Edited at 2013-03-18 09:34 GMT]


 

Olly Pekelharing  Identity Verified
Netherlands
Local time: 21:38
Member (2009)
Dutch to English
TOPIC STARTER
Re: reimport Mar 18, 2013

I have already tried importing and exporting it several times through all the tools at my disposal (Trados, Wordfast, MemoQ, Olifant). The file is too large to be edited with a standard plaintext editor, and anyway I wouldn't know what to edit.

Regards,

Olly


 

Samuel Murray  Identity Verified
Netherlands
Local time: 21:38
Member (2006)
English to Afrikaans
+ ...
Try my TMX fixer script Mar 18, 2013

Olly Pekelharing wrote:
Anyone recommend a program to repair this tmx (so I can use it to create a Muse)? I'm not savvy enough to repair it manually, even if I could open it.


Try my TMX fixer script:
http://leuce.com/autoit/tmxfixerbasic.zip

You must have AutoIt installed to use it. A TMX file produced by Trados is likely in UTF16LE format. Let me know if it works for you (or not).


 

FarkasAndras  Identity Verified
Local time: 21:38
English to Hungarian
+ ...
Options Mar 18, 2013

65000 entries is not that big. Notepad++ would almost certainly be able to handle it.
Ideally, the error message would tell you what to edit. If all it says is "XML error", then you're out of luck. If it says something like "XML error: tu tag not closed at line XXX" or "XML error: character YYY is not UTF-8 at line ZZZ", you know where to look.
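
If none of the tools gives a usable message, a generic XML parser can be asked for the position of the first error. Below is a minimal sketch using Python's standard expat bindings; the function names are made up for illustration, and it only reports the first problem, so it would have to be rerun after each fix.

```python
import xml.parsers.expat as expat


def _describe(err):
    # lineno is 1-based; offset is the column expat reports within that line
    return err.lineno, err.offset, expat.ErrorString(err.code)


def first_xml_error(data: bytes):
    """Return (line, column, message) for the first error in data, or None."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except expat.ExpatError as err:
        return _describe(err)
    return None


def first_xml_error_in_file(path: str):
    """Same check for a file on disk, streamed so it is not loaded whole."""
    parser = expat.ParserCreate()
    try:
        with open(path, "rb") as handle:
            parser.ParseFile(handle)
    except expat.ExpatError as err:
        return _describe(err)
    return None
```

Calling `first_xml_error_in_file("memory.tmx")` (a placeholder file name) should return `None` for a well-formed file, or the line, column and message of the first error otherwise.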
If importing to and exporting from Studio doesn't help, you could try the same with apsic xbench. An xbench import-export roundtrip strips everything from the TMX except for the text itself, so it has a good chance of fixing the problem.
In a similar vein, you could try the TMX_to_tabbed utility in my "grab bag" software package at sourceforge.net/projects/aligner, then use the TMX maker in the aligner package to generate a new tmx. If you upload the TMX to dropbox or rapidshare and post a link here, I'll run this conversion for you and you can see if it fixes the problem.

What's a 'muse', by the way?


 

Olly Pekelharing  Identity Verified
Netherlands
Local time: 21:38
Member (2009)
Dutch to English
TOPIC STARTER
Will let you know Mar 18, 2013

Thanks Samuel. Will give it a go and let you know.

Olly


 

Olly Pekelharing  Identity Verified
Netherlands
Local time: 21:38
Member (2009)
Dutch to English
TOPIC STARTER
tmxfixer Mar 18, 2013

Hi Samuel,

I ran the silent version (utf16le), which reported no rogues found. The output file is identical in size to the original. After a few minutes I get an Autoit warning: "error allocating memory".

Olly

[Edited at 2013-03-18 11:22 GMT]


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 20:38
Member (2009)
Dutch to English
+ ...
Hi Olly, Mar 18, 2013

My experience is that you can very often fix faulty TMXs by running them through Xbench.

Project > Properties > Add > TMX memory

then:

Tools > Export items

Michael


 

Olly Pekelharing  Identity Verified
Netherlands
Local time: 21:38
Member (2009)
Dutch to English
TOPIC STARTER
@Farkas Mar 18, 2013

Sorry, I meant 650000 entries. A Muse is an autosuggest function in MemoQ.

Olly


 

Piotr Bienkowski  Identity Verified
Poland
Local time: 21:38
Member (2005)
English to Polish
+ ...
UltraEdit Mar 18, 2013

It will be a somewhat large chunk for this piece of software, but it will parse the XML and take you to the place where the error is. You may have to parse it several times until the file is free of errors.

Regards,

Piotr


 

Samuel Murray  Identity Verified
Netherlands
Local time: 21:38
Member (2006)
English to Afrikaans
+ ...
Some more notes Mar 18, 2013

Olly Pekelharing wrote:
I ran the silent version (utf16le), which reported no rogues found. The output file is identical in size to the original. After a few minutes I get an Autoit warning: "error allocating memory".


Okay, I've discovered that the memory error occurred because the script was very wasteful with memory: it loaded the TMX file into memory about eight times when only about twice was really necessary. I did not realise this because my own "large" TMX files were small enough.

I'm putting the finishing touches on a new version of the script that will remove TUs that contain invalid characters, as soon as I figure out why the regex won't work, heh-heh.

Your TMX file (which you sent to me, thanks) definitely contains invalid XML characters. They are not difficult to locate but it is a cumbersome process if you have to do it one by one. Here's how I find them manually:

1. Try to open the TMX file in Virtaal. Virtaal is very fussy (e.g. if your UTF8 file has a BOM, it will refuse to open it). Virtaal tells you in an error message where the error occurs. The error message given when I try to open your file is this: "Could not open file. Premature end of data in tag seg line 920042, line 920042, column 233." (The important detail is the position: line 920042, column 233.)

2. Open the TMX file in Akelpad (a small Unicode editor with very few extra features). In Akelpad, press Ctrl+G (which means "go to"). Type in "920042:233" (which means character 233 of line 920042) and press OK. Akelpad will take the cursor to that position.

You won't be able to see what's wrong, because Akelpad doesn't have a glyph for the invalid character (it displays it as a comma, I think). If you want to see what character it is, copy a portion of the text from the cursor position to a new file (created in Akelpad, too), and then open that tiny file in a hex editor (I use Brooks Younce's Tiny Hex Editor).
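
The same look-up can be done without an editor and a hex editor. This is only a sketch in Python: the function names are invented, the "utf-16" encoding is an assumption based on the file coming from Trados, and a parser may count columns in bytes rather than characters, which is why it prints a few positions either side of the target.

```python
def codepoints_at(text: str, line: int, column: int, width: int = 3):
    """List (column, 'U+XXXX') pairs around a 1-based line:column position."""
    row = text.split("\n")[line - 1]
    start = max(column - 1 - width, 0)
    stop = min(column + width, len(row))
    return [(i + 1, f"U+{ord(row[i]):04X}") for i in range(start, stop)]


def codepoints_in_file(path, line, column, encoding="utf-16", width=3):
    """Read the file line by line and inspect only the requested line."""
    with open(path, encoding=encoding, errors="replace") as handle:
        for number, row in enumerate(handle, start=1):
            if number == line:
                return codepoints_at(row, 1, column, width)
    return []  # the file has fewer lines than requested
```

For the position in the error message above, the call would be `codepoints_in_file("memory.tmx", 920042, 233)`, with "memory.tmx" standing in for the real file name. Any code point below U+0020 other than U+0009, U+000A and U+000D is a likely culprit.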

The invalid character at position 920042:233 of your TMX is \x1A. In the hex editor it shows up as "00 1A". XML 1.0 does not allow this character at all, not even written as a character reference, so it has to be removed (and it is easier for me to just delete it and the whole TU that comes with it, if the TMX file is going to be used for reference purposes only).
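
Doing this one character at a time is cumbersome, so here is a rough Python sketch of the "delete the whole TU" approach. It is not the AutoIt script mentioned above; the names are invented, it loads the whole text into memory, and it assumes the `<tu>` tags themselves are intact.

```python
import re

# Anything outside the XML 1.0 "Char" production counts as invalid
INVALID_XML_CHAR = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# One translation unit; \b stops "<tu" from also matching "<tuv"
TU_BLOCK = re.compile(r"<tu\b.*?</tu>", re.DOTALL)


def strip_invalid_chars(text: str) -> str:
    """Delete only the invalid characters and keep the surrounding text."""
    return INVALID_XML_CHAR.sub("", text)


def drop_bad_units(tmx_text: str):
    """Remove every <tu>...</tu> that contains an invalid character.

    Returns the cleaned text and the number of units removed.
    """
    dropped = 0

    def keep_or_drop(match):
        nonlocal dropped
        if INVALID_XML_CHAR.search(match.group(0)):
            dropped += 1
            return ""
        return match.group(0)

    return TU_BLOCK.sub(keep_or_drop, tmx_text), dropped
```

To use it on a Trados export, the file would be read with something like `open(path, encoding="utf-16").read()` and the result written back with the same encoding; both the path and the encoding are assumptions to check against the actual file.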

Invalid characters are one reason why a TMX file might fail. Others are a stray greater-than or less-than character somewhere in a segment, a stray ampersand (such characters must be written as entities), or a missing quote character inside a TMX tag.
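
Of these, the stray ampersand is the easiest to repair automatically. A small Python sketch (the function name is made up): it escapes every ampersand that does not already start one of the five predefined entities or a numeric character reference. Stray less-than and greater-than characters are not handled, because they cannot reliably be told apart from real tags.

```python
import re

# An ampersand NOT followed by a predefined entity name or a numeric reference
STRAY_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_stray_ampersands(text: str) -> str:
    """Turn bare '&' into '&amp;' and leave existing entities untouched."""
    return STRAY_AMP.sub("&amp;", text)
```

Because existing entities are skipped, running the function twice on the same text gives the same result as running it once.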


[Edited at 2013-03-18 20:11 GMT]


 

Olly Pekelharing  Identity Verified
Netherlands
Local time: 21:38
Member (2009)
Dutch to English
TOPIC STARTER
no fix yet Mar 19, 2013

Thanks all, I've tried all your suggestions but to no avail. When I have time I will try Samuel's last suggestion. @Farkas: I also tried your tools and successfully produced a new tmx but this I can't import into MemoQ at all. I don't feel comfortable about posting a link to this client TM here.

Regards,

Olly


 

Olly Pekelharing  Identity Verified
Netherlands
Local time: 21:38
Member (2009)
Dutch to English
TOPIC STARTER
@Farkas Mar 19, 2013

One thing I did notice is that when I was using your tmx maker it suggested that both languages I was using were English (where you enter the language code). I entered the right language codes anyway, so I assume this didn't cause the tmx to fail in MemoQ.

Regards,

Olly


 

FarkasAndras  Identity Verified
Local time: 21:38
English to Hungarian
+ ...
TMX langcodes Mar 19, 2013

The TMX maker has no way to guess what your languages are so it defaults to English. You are supposed to enter the correct language code yourself. If you entered codes that MQ doesn't support (EN instead of EN-GB or whatever) then that might be the reason why the import failed. It could be something else as well, of course. It's impossible to tell for sure without seeing the file. Best of luck fixing this.
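
To see which language codes actually ended up in a TMX, something like the Python sketch below could be used. It only works on a file that is already well-formed, the function name is invented, and whether MemoQ accepts a particular code is not something it can tell you.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree expands the xml: prefix to this namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_codes(tmx_source):
    """Count the language code of every <tuv> in a TMX file or file object."""
    counts = Counter()
    for _event, elem in ET.iterparse(tmx_source, events=("end",)):
        if elem.tag == "tuv":
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute
            code = elem.get(XML_LANG) or elem.get("lang") or "(missing)"
            counts[code] += 1
        elif elem.tag == "tu":
            elem.clear()  # free the finished unit to keep memory use low
    return counts


# Tiny made-up sample standing in for a real file
sample = io.BytesIO(
    b'<tmx version="1.4"><header/><body><tu>'
    b'<tuv xml:lang="nl-NL"><seg>Hallo</seg></tuv>'
    b'<tuv xml:lang="en-GB"><seg>Hello</seg></tuv>'
    b"</tu></body></tmx>"
)
print(language_codes(sample))
```

With a real file, the call would be `language_codes("memory.tmx")`, where "memory.tmx" is a placeholder. A code that appears only a handful of times, or a "(missing)" entry, would be worth a closer look.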

By the way, have you tried an import-export roundtrip in MemoQ? If this Muse functionality is in MemoQ, surely it should accept TMX files that were generated by MemoQ itself???


 

Olly Pekelharing  Identity Verified
Netherlands
Local time: 21:38
Member (2009)
Dutch to English
TOPIC STARTER
@Farkas Mar 19, 2013

Yes, I entered the correct codes. As for the round trip, I tried it with Studio and the output was error-free, but funnily enough when I try it with MemoQ it goes awry (even though MemoQ reports no errors on the import).

 