Wordfast Classic: my .tmx exports appear corrupted in Trados
Thread poster: Céline Mélard
FarkasAndras  Identity Verified
Local time: 16:27
English to Hungarian
+ ...
generate correct TMX files Oct 17, 2012

Céline Mélard wrote:


I really appreciate that you tried to help me but I can't possibly take any more of your time for this. If it persists I think I'll purchase the hotline help and get them to figure out the problem.

That's understandable. Encoding can be incredibly frustrating - trust me, you've only just scratched the surface. In any case, this appears to be a bug in Wordfast (the TMX files are not corrupted after the fact, nor are they misread by another CAT tool; they come out of Wordfast wrong).
I would still recommend that you try to generate TMX files in a different encoding. Don't open them in a text editor and try to fix the encoding there; instead, look for a setting in Wordfast that generates them in a different encoding (hopefully correctly) in the first place.

I don't know Wordfast, but esperantisto wrote this suggestion:
"Just make sure your TMX files are saved to UTF-16 (Unicode) encoding when exporting, and everything will be fine. And to make it, keep your TMs in Unicode. In the TM Editor, choose the special filter of rewrite to Unicode or something like this. Refer to the user manual."


 
esperantisto  Identity Verified
Local time: 18:27
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
Bold statement Oct 17, 2012

FarkasAndras wrote:

this appears to be a bug in Wordfast…

I don't know Wordfast


Then don’t claim there’s a bug. What if someone says: I don’t use and have never used LF Aligner, but it’s certainly a bag of bugs?

I would still recommend that you try to generate TMX files in a different encoding.


Wordfast does not let you choose. My TMs are in UTF-16, so Wordfast produces TMX files in UTF-16, though with no encoding declaration. They are read fine by OmegaT. So one should keep WF TMs in UTF-16, that’s it.

Don't open them in a text editor and try to fix the encoding there


Why? It helps, provided that you know what you’re doing.


 
FarkasAndras  Identity Verified
Local time: 16:27
English to Hungarian
+ ...
Let's stick to the facts Oct 17, 2012

esperantisto wrote:

FarkasAndras wrote:

this appears to be a bug in Wordfast…

I don't know Wordfast


Then don’t claim there’s a bug. What if someone says: I don’t use and have never used LF Aligner, but it’s certainly a bag of bugs?

Well, here the original poster reports generating a TMX file with Wordfast, opening it in a text editor right away without doing anything else to the file, and seeing corrupted characters in it. In theory it could be user error (i.e. the account of what happened is inaccurate), a bug in the text editor or a problem with the source files, but these are all unlikely. By far the most likely explanation is that WF doesn't work correctly on the OP's computer. It's generating corrupted TMX files, i.e. it's buggy. It could be an incompatibility between WF and the OP's system or Java settings, a corrupted install or some internal WF bug, perhaps related to the source file or TM format. It's impossible to tell, but it's almost certainly in WF. If the OP can share a raw, unmodified TMX as exported by WF, we'll know a bit more.
And yes, if a user reported that LF Aligner is generating corrupted files (when used in accordance with the user guide), then I'd conclude that it's probably a bug in LF Aligner. I've fixed a few bugs already based on user bug reports so I wouldn't be shocked if it happened again.


esperantisto wrote:
Wordfast does not let you choose. My TMs are in UTF-16, so Wordfast produces TMX files in UTF-16, though with no encoding declaration. They are read fine by OmegaT. So one should keep WF TMs in UTF-16, that’s it.

Well, if WF allows you to convert existing TMs to UTF-16, then this is still a possible solution.

esperantisto wrote:
Don't open them in a text editor and try to fix the encoding there

Why? It helps, provided that you know what you’re doing.

Because it generally doesn't help. If you see corrupted characters in a text editor, you're usually SOL, especially if you're not a huge geek with some background knowledge on encodings, like myself. Text editors generally autodetect the encoding, so if the file uses a consistent encoding correctly all the way through, it will show up correctly. Generally, files will only show up incorrectly if they contain illegal characters, parts in mismatched encodings, or a BOM or XML header encoding declaration that doesn't match the file. The header posted by the OP doesn't contain an encoding declaration, so the encoding was definitely autodetected by the text editor. Illegal characters or mismatched encodings are most likely.
You can sometimes override the autodetected encoding in a text editor (force it to "read the file as" UTF-8 or as ISO-8859-1 or whatever) and fix the problem fairly simply, but that's pretty rare and it requires a text editor that has this feature.
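
For anyone who wants to check these three things on the raw bytes rather than trusting an editor's autodetection, here is a minimal Python sketch. The function name sniff_encoding is made up for illustration and has nothing to do with Wordfast or any CAT tool; it only reports which BOM (if any) the file starts with, which encoding the XML header declares (if any), and whether the bytes are valid UTF-8. It ignores UTF-32 and can only read the declaration in ASCII-compatible encodings.

```python
import codecs
import re

# Byte order marks this sketch recognises (UTF-32 is deliberately ignored).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches encoding="..." inside an XML declaration at the very start of the data.
DECLARATION = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def sniff_encoding(data: bytes) -> dict:
    """Report BOM, declared encoding and UTF-8 validity of raw file bytes."""
    bom = next((name for mark, name in BOMS if data.startswith(mark)), None)

    # Skip a UTF-8 BOM before looking for the declaration.
    body = data[len(codecs.BOM_UTF8):] if data.startswith(codecs.BOM_UTF8) else data
    match = DECLARATION.match(body)
    declared = match.group(1).decode("ascii").lower() if match else None

    try:
        data.decode("utf-8")  # strict decoding: fails on any illegal byte
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    return {"bom": bom, "declared": declared, "valid_utf8": valid_utf8}


if __name__ == "__main__":
    sample = '<?xml version="1.0" ?>\n<tmx><body><seg>café</seg></body></tmx>'
    print(sniff_encoding(sample.encode("utf-8")))
    print(sniff_encoding(sample.encode("latin-1")))
    print(sniff_encoding(codecs.BOM_UTF16_LE + sample.encode("utf-16-le")))
```

A mismatch between the BOM and the declaration, or valid_utf8 being False in a file that is supposed to be UTF-8, points at the kinds of problems described above. The "read the file as" override corresponds to decoding the same bytes with a different codec: for example, "é" stored as UTF-8 and read as ISO-8859-1 shows up as "Ã©".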

[Edited at 2012-10-17 15:28 GMT]


 
Céline Mélard
Local time: 16:27
English to French
+ ...
TOPIC STARTER
Copy of a TM Oct 18, 2012

Hi everyone,

After reading your conversation last night, I went looking for an example of a full TM in .tmx. I finally found a small one.

I copied it here: https://docs.google.com/document/d/1GnVgHzEvEuq7vYUC790khqtGv2b1UuGLS7A_Rb5P1TM/edit

(I tried copying it here but the tags were a problem).

I left it raw; I did nothing to it except export it as .tmx.

I hope it helps!


 
FarkasAndras  Identity Verified
Local time: 16:27
English to Hungarian
+ ...
upload Oct 18, 2012

Well, that does show the character corruption, but not its cause. When you copy-paste the text into Google Docs, it is re-encoded in whatever encoding Google uses. To let us see the original encoding, you'd need to upload the original TMX file itself to Rapidshare, Dropbox, Google Drive or some other such service. You can even zip it to make sure nothing interferes with it.
To share with Google Drive, click the red upload button (to the right of "Create"), then right-click the file, click share, make it public and post the link here.
It'll probably turn out to be a corrupted UTF-8 file, in which case you would have two options: try and convert your TMs to UTF-16 to get around it or report the bug to Wordfast and hope they offer a fix.
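
If the original file does turn out to be corrupted UTF-8, a rough way to see where the damage sits is to list the byte offsets that fail strict UTF-8 decoding. This is only a sketch under that assumption; the function name invalid_utf8_spans and the limit parameter are made up for illustration, and it simply re-slices the data after each error, which is fine for a small TMX but slow for a large one.

```python
def invalid_utf8_spans(data: bytes, limit: int = 20) -> list:
    """Return (offset, offending_bytes) pairs for bytes that are not valid UTF-8."""
    spans = []
    pos = 0
    while pos < len(data) and len(spans) < limit:
        try:
            data[pos:].decode("utf-8")  # strict decoding of the remainder
            break  # the rest of the file is clean
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded
            start, end = pos + err.start, pos + err.end
            spans.append((start, data[start:end]))
            pos = end  # continue right after the bad bytes
    return spans


if __name__ == "__main__":
    # "caf" + a lone ISO-8859-1 "é" byte, followed by a correctly encoded "é"
    mixed = b"caf\xe9 ok " + "é".encode("utf-8")
    for offset, raw in invalid_utf8_spans(mixed):
        print(offset, raw.hex())
```

An empty result means the bytes are valid UTF-8, in which case the corruption happened before encoding (or the file is in another encoding altogether). A handful of isolated single bytes in the 0x80 to 0xFF range would suggest legacy-encoded text mixed into a UTF-8 file.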


 