Can your CAT tool open my test TMX file?
Thread poster: Samuel Murray

Samuel Murray  Identity Verified
Netherlands
Local time: 17:55
Member (2006)
English to Afrikaans
+ ...
Mar 19, 2013

G'day everyone

I'd like to know if my test TMX file (3 segments) can be opened in several other CAT tools. If you're willing to help me, please download the test TMX file and check whether your CAT tool can read it and whether you can access all three segments (they have the numbers 1, 2 and 3 at the end). I booby-trapped segment #2 and I want to see which CAT tools fall for it (it won't harm your computer). Also, if you have TMX validation tools or converters, I'd love to know whether they had any problems with the file.

http://wikisend.com/download/460404/commatest2%20WfMemory.zip

Thanks
Samuel



[Edited at 2013-03-19 17:02 GMT]


 

Bernard Lieber  Identity Verified
Local time: 17:55
English to French
+ ...
CAT Tools Mar 19, 2013

Hi Samuel,

Which CAT tools have you already tested it with?

First test was with the latest build of DéjàVu X2: no problem, it imports all 3 segments without any issue. I can't test with Studio 2011, as this is not my language combination.

[Edited at 2013-03-19 19:17 GMT]


 

Joakim Braun  Identity Verified
Sweden
Local time: 17:55
German to Swedish
+ ...
Yes Mar 19, 2013

Yes, my application Xoterm can read this.

When opened in a text editor it also looks like a perfectly good TMX file to me.
What do you mean, "booby-trapped"?


 

Anna Sylvia Villegas Carvallo
Mexico
Local time: 10:55
English to Spanish
SDL Trados 2007 Mar 19, 2013

The rain in Spain, falls mainly on the plains1.
Die reent in Spanje, val saggies op die blanje.

The rain in Spain, falls mainly on the plains2.
Die reent in Spanje‚ val saggies op die blanje.

The rain in Spain, falls mainly on the plains3.
Die reent in Spanje, val saggies op die blanje.

Is this right?



 

xxxnrichy
France
Local time: 17:55
French to Dutch
+ ...
Studio 2011 Mar 19, 2013

Opens without problems and without indicating the language codes. Same result as for Carvallo.

WFC 6.03t in Word 2010: no problem.

20130319~175434 SM 0 EN-ZA The rain in Spain, falls mainly on the plains1. AF-ZA Die reent in Spanje, val saggies op die blanje.
20130319~175422 N 0 EN-ZA The rain in Spain, falls mainly on the plains2. AF-ZA Die reent in Spanje‚ val saggies op die blanje.
20130319~175422 N 0 EN-ZA The rain in Spain, falls mainly on the plains3. AF-ZA Die reent in Spanje, val saggies op die blanje

WP Pro 3.1.3: I had to indicate language codes (English for South Africa and Afrikaans for South Africa). Same results as above.

Does this help?


 

FarkasAndras
Local time: 17:55
English to Hungarian
+ ...
Passes Mar 19, 2013

The TMXCheck TMX verifier, which IIRC I got from LISA's now defunct website, doesn't find fault with it.
What's the trick BTW? It looks like a totally plain & straightforward TMX to me.


 

Paz González  Identity Verified
Chile
English to Spanish
Fluency Translation 2013 Mar 19, 2013

Hi Samuel,

I could open all of them on the third attempt. The first time it opened just 1 segment, the second time just 2, and the third time all 3 of them. I must tell you that I'm still learning to work with CAT tools, so this was a great test for me. I don't know if I did it right the first and second times; maybe I didn't, and that's why it only worked the third time.

With Fluency you can import the file without problem, but to open it you have to indicate the language pair.

Does this help?


 

Samuel Murray  Identity Verified
Netherlands
Local time: 17:55
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
Thanks, everyone Mar 19, 2013

Samuel Murray wrote:
I'd like to know if my test TMX file (3 segments) can be opened in several other CAT tools.


Thanks, everyone.

I have since learnt that the problem I had is likely specific to the one tool I had it with. The tool in question is Virtaal. The comma in the target field of the second segment in the TMX is not a comma, but it looks like one. It is a character whose Unicode number is 201A. Virtaal thinks it's character number 001A. Virtaal also thinks that character number 221A is character 001A (and this probably applies to other characters whose numbers end in "1A" as well). It is a fairly serious bug, IMO, because it causes Virtaal to think that the file ends at that character. And I have no idea when it is going to be fixed.
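For anyone who wants to confirm which character is hiding in segment 2, here is a minimal Python sketch. The helper name describe_non_ascii is made up for illustration, and the segment text is retyped from the thread with U+201A written as an explicit escape rather than read from the actual TMX file.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]

# Target text of segment 2, retyped from the thread (assumption: the only
# unusual character is the comma look-alike after "Spanje").
segment_2 = "Die reent in Spanje\u201a val saggies op die blanje."

for index, code_point, name in describe_non_ascii(segment_2):
    print(index, code_point, name)  # 19 U+201A SINGLE LOW-9 QUOTATION MARK
```

Running the same helper on segments 1 and 3, which use an ordinary comma, should return an empty list.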


 

xxxvictor_lo
Local time: 23:55
Heartsome Translation Studio 8.2.0 Mar 20, 2013

Heartsome Translation Studio 8.2.0 imports all 3 segments successfully.

However, the language code af-ZA is not included by default, so I had to add it in Tools | Options | Languages. Also reported to their tech support, hope they will fix it later.


 

Ambrose Li  Identity Verified
Canada
Local time: 11:55
Chinese to English
+ ...
Little-endian UCS-2 files Mar 20, 2013

The file in question is little-endian UCS-2 (or UTF-16), so I would say that, technically speaking, Virtaal is not treating U+201A as U+001A. It is seeing the byte 0x1A (ASCII code 0x1A), period. It does not even know what the next byte (the second half of the 16-bit code unit) is. In short, this piece of software is not Unicode compliant.
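The byte-level point can be illustrated with a short Python sketch. The function naive_byte_read is a hypothetical stand-in for a reader that works on raw bytes and stops at 0x1A (the ASCII SUBSTITUTE character, which some legacy text handling treats as end-of-file); it is not a claim about how Virtaal is actually implemented.

```python
SUB = 0x1A  # ASCII SUBSTITUTE (Ctrl-Z), an end-of-file marker in some legacy text handling

def naive_byte_read(data):
    """Hypothetical byte-oriented reader: stop at the first 0x1A byte."""
    cut = data.find(bytes([SUB]))
    return data if cut == -1 else data[:cut]

# In little-endian UTF-16 the low byte comes first, so both characters
# mentioned in the thread start with the byte 0x1A.
print("\u201a".encode("utf-16-le").hex())  # 1a20
print("\u221a".encode("utf-16-le").hex())  # 1a22

# For contrast, the UTF-8 form of U+201A contains no 0x1A byte at all.
print("\u201a".encode("utf-8").hex())  # e2809a

encoded = "Spanje\u201a val".encode("utf-16-le")
truncated = naive_byte_read(encoded)
print(truncated.decode("utf-16-le"))  # Spanje
```

Everything after "Spanje" is lost, which matches the symptom of the file appearing to end at that character.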

 

Samuel Murray  Identity Verified
Netherlands
Local time: 17:55
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
Again, please Mar 20, 2013

G'day everyone

Okay, I wanted to test if the Unicode character U+201A is a problem for CAT tools, and it isn't.

However, if you would be so kind, I would also like to know how CAT tools respond to the Unicode character U+001A in a TMX file. This is an invalid character in XML.

Please download this second TMX file:
http://wikisend.com/download/981020/substitute%20test%20WfMemory.zip
and open it in your CAT tool, and if it successfully opens, export it to TMX again, to see if the exported TMX contains the same content as the original TMX file. The character U+001A is present in all three segments, just before the word "saggies", in three different forms.

So far, I've tested it in XML Validator, TMX Validator, Wordfast Pro, Virtaal, OmegaT, Wordfast Classic, and Trados 2007.

Only Wordfast Classic and Trados 2007 accept the file. Wordfast Classic removed the entities from the first two segments but retained the invalid character in the third segment. When I exported a TMX from Wordfast again, none of the segments contained the invalid character. I can't see inside Trados's TM, but when I exported a TMX from Trados again, none of the segments contained the invalid character. Wordfast Classic and Trados 2007 are forgiving, then, when reading.

So far, this experiment taught me that XML is intolerant of invalid characters even in their traditional entity form.

Thanks
Samuel
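The conclusion about invalid characters can be reproduced with Python's standard XML parser. This is only a sketch: the thread does not show the exact three forms used in the test file, so the raw character, the hexadecimal reference and the decimal reference below are assumptions, and is_well_formed is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Assumed forms of U+001A, placed just before "saggies" as in the test file.
raw_char = "<seg>val \x1asaggies</seg>"    # the literal control character
hex_ref = "<seg>val &#x1A;saggies</seg>"   # hexadecimal character reference
dec_ref = "<seg>val &#26;saggies</seg>"    # decimal character reference

# U+201A from the first test file, for comparison.
lookalike = "<seg>Spanje\u201a val saggies</seg>"

print(is_well_formed(raw_char))   # False
print(is_well_formed(hex_ref))    # False
print(is_well_formed(dec_ref))    # False
print(is_well_formed(lookalike))  # True
```

XML 1.0 excludes U+001A from its set of legal characters, and a character reference must also refer to a legal character, which is why writing it as a reference does not help.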


 

Bernard Lieber  Identity Verified
Local time: 17:55
English to French
+ ...
DéjàVux2 Mar 20, 2013

I imported your TMX as a project file and not as a TM. It displays the following: Die reent in Spanje, val saggies op die blanje. with an arrow after "val". When exporting, the arrow is replaced by SUB ("val SUB") in each segment.

I also tried Alchemy Publisher 3.0, attaching your TMX as a TM, and the three segments are displayed correctly. Sorry, I had forgotten to uncheck your first TMX (which provides a 98% match); Publisher can't load your second TMX at all.

[Edited at 2013-03-20 12:00 GMT]

[Edited at 2013-03-20 14:24 GMT]


 

Ambrose Li  Identity Verified
Canada
Local time: 11:55
Chinese to English
+ ...
WfA Mar 20, 2013

Wordfast Anywhere reports “Uploaded memory has no valid translation units”.

 

Samuel Murray  Identity Verified
Netherlands
Local time: 17:55
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
@Li Mar 21, 2013

Ambrose Li wrote:
Wordfast Anywhere reports “Uploaded memory has no valid translation units”.


I wonder if WFA rejected the entire TM or only the three segments that were invalid (as there were only three segments in it, all would be rejected in this case).


 

Ambrose Li  Identity Verified
Canada
Local time: 11:55
Chinese to English
+ ...
WFA details Mar 21, 2013

I did a little test involving one of my real TMs. WFA basically imported the whole test TM, but with the U+001A character removed.

So I’m now a little perplexed as to why it had refused to import your test TM.
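A "forgiving" importer of the kind described in this thread, one that drops U+001A instead of rejecting the file, could be approximated by cleaning the text before parsing. The sketch below is not how WFA, Wordfast Classic or Trados actually work (the thread does not say); strip_invalid_xml_chars and its helpers are hypothetical names, and the range check follows the legal character ranges of XML 1.0.

```python
import re
import xml.etree.ElementTree as ET

# Numeric character references such as &#x1A; or &#26;
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def _is_xml_char(cp):
    """True if the code point is a legal XML 1.0 character."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def _drop_bad_ref(match):
    """Keep a character reference only if it points at a legal character."""
    hex_digits, dec_digits = match.groups()
    cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
    return match.group(0) if _is_xml_char(cp) else ""

def strip_invalid_xml_chars(xml_text):
    """Remove illegal literal characters, then illegal character references."""
    cleaned = "".join(ch for ch in xml_text if _is_xml_char(ord(ch)))
    return _CHAR_REF.sub(_drop_bad_ref, cleaned)

dirty = "<seg>val \x1asaggies &#x1A;op &#26;die&#x201A; blanje</seg>"
cleaned = strip_invalid_xml_chars(dirty)
print(cleaned)  # <seg>val saggies op die&#x201A; blanje</seg>
print(ET.fromstring(cleaned).text == "val saggies op die\u201a blanje")  # True
```

Note that this works on the raw text, so it would also alter anything that merely looks like a character reference inside a comment or CDATA section, and it changes the data silently. That may be acceptable for a quick import but not for a validator.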


 

