Wordfast Classic: my .tmx exports appear corrupted in Trados | Thread poster: Céline Mélard
| generate correct TMX files | Oct 17, 2012 |
Céline Mélard wrote: I really appreciate that you tried to help me but I can't possibly take any more of your time for this. If it persists I think I'll purchase the hotline help and get them to figure out the problem.

That's understandable. Encoding can be incredibly frustrating - trust me, you've only just scratched the surface. In any case, this appears to be a bug in Wordfast (the TMXes are not corrupted after the fact and they are not read incorrectly by another CAT, they come out of Wordfast wrong). I would still recommend you to try and generate TMX files in a different encoding. Don't open them in a text editor and try to fix the encoding there; try to find a setting in Wordfast for generating them in a different encoding (hopefully correctly) in the first place.

I don't know Wordfast, but esperantisto wrote this suggestion: "Just make sure your TMX files are saved to UTF-16 (Unicode) encoding when exporting, and everything will be fine. And to make it, keep your TMs in Unicode. In the TM Editor, choose the special filter of rewrite to Unicode or something like this. Refer to the user manual."

| | | esperantisto Local time: 18:27 Member (2006) English to Russian + ... SITE LOCALIZER Bold statement | Oct 17, 2012 |
FarkasAndras wrote: this appears to be a bug in Wordfast… I don't know Wordfast

Then don’t claim there’s a bug. What if someone says: I don’t use and have never used LF Aligner, but it’s certainly a bag of bugs?

FarkasAndras wrote: I would still recommend you to try and generate TMX files in a different encoding.

Wordfast does not let you choose. My TMs are in UTF-16, thus Wordfast produces TMX files in UTF-16, however, with no encoding declaration. And they are read fine by OmegaT. So, one should keep WF TMs in UTF-16, that’s it.

FarkasAndras wrote: Don't open them in a text editor and try to fix the encoding there

Why? It helps, provided that you know what you’re doing.

| | | Let's stick to the facts | Oct 17, 2012 |
esperantisto wrote: FarkasAndras wrote: this appears to be a bug in Wordfast… I don't know Wordfast Then don’t claim there’s a bug. What if someone says: I don’t use and have never used LF Aligner, but it’s certainly a bag of bugs?

Well, here the original poster says he generated a TMX file with Wordfast, then opened it with a text editor right away without doing anything else to the file and saw corrupted characters in it. In theory it could be user error (i.e. his account of what he did is inaccurate), a bug in the text editor or a problem with the source files, but these are all unlikely. By far the most likely explanation is that WF doesn't work correctly on the OP's computer. It's generating corrupted TMX files, i.e. it's buggy. It could be an incompatibility in WF with the OP's system or Java settings, a corrupted install or some internal WF bug, perhaps related to the source file or TM format. It's impossible to tell but it's almost certainly in WF. If the OP can share a raw, unmodified TMX as exported by WF, we'll know a bit more.

And yes, if a user reported that LF Aligner is generating corrupted files (when used in accordance with the user guide), then I'd conclude that it's probably a bug in LF Aligner. I've fixed a few bugs already based on user bug reports so I wouldn't be shocked if it happened again.

esperantisto wrote: Wordfast does not let you choose. My TMs are in UTF-16, thus Wordfast produces TMX files in UTF-16, however, with no encoding declaration. And they are read fine by OmegaT. So, one should keep WF TMs in UTF-16, that’s it.

Well, if WF allows you to convert existing TMs to UTF-16, then this is still a possible solution.

esperantisto wrote: Don't open them in a text editor and try to fix the encoding there Why? It helps, provided that you know what you’re doing.

Because it generally doesn't help.
If you see corrupted characters in a text editor, you're usually SOL, especially if you're not a huge geek with some background knowledge on encodings, like myself. Text editors generally autodetect the encoding, so if the file uses a consistent encoding correctly all the way through, it will show up correctly. Generally, files will only show up incorrectly if they contain illegal characters, parts in mismatched encodings, or a BOM or XML header encoding declaration that doesn't match the file. The header posted by the OP doesn't contain an encoding declaration, so the encoding was definitely autodetected by the text editor. Illegal characters or mismatched encodings are most likely. You can sometimes override the autodetected encoding in a text editor (force it to "read the file as" UTF-8 or as ISO-8859-1 or whatever) and fix the problem fairly simply, but that's pretty rare and it requires a text editor that has this feature.
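To make the BOM and header declaration check a bit more concrete, here is a minimal Python sketch. It is only an illustration, not something Wordfast or any text editor is known to do, and the helper names (`sniff_bom`, `sniff_declaration`, `check_header`) are made up. It looks at the first bytes of a file for a byte order mark and for an encoding declaration in the XML header, and flags the case where the two disagree. The name comparison is deliberately loose, so treat the result as a rough hint.

```python
import codecs
import re

# Known byte order marks. UTF-32 comes first because its little-endian
# BOM starts with the same two bytes as the UTF-16 little-endian BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches encoding="..." inside the <?xml ... ?> header.
DECL_RE = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_declaration(data: bytes):
    """Return the encoding named in the XML header, lowercased, or None."""
    # Dropping NUL bytes lets an ASCII pattern match a UTF-16/32 header.
    head = data[:200].replace(b"\x00", b"")
    match = DECL_RE.search(head)
    return match.group(1).decode("ascii").lower() if match else None


def check_header(data: bytes):
    """Return (bom_encoding, declared_encoding, mismatch)."""
    bom = sniff_bom(data)
    declared = sniff_declaration(data)
    # Loose comparison: "utf-16-le" from the BOM agrees with "utf-16".
    mismatch = bool(bom and declared and not bom.startswith(declared))
    return bom, declared, mismatch
```

For a UTF-16 file with a BOM and no encoding declaration, which is the situation esperantisto describes, `check_header` reports the BOM encoding, `None` for the declaration, and no mismatch. A file that passes this check can still contain illegal characters or mixed encodings further down.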
[Edited at 2012-10-17 15:28 GMT] | | | Céline Mélard Local time: 16:27 English to French + ... TOPIC STARTER
Well, that does show the character corruption, but not its cause. When you copy-paste the text to Google Docs it is re-encoded in whatever encoding Google uses. To allow us to see the original encoding, you'd need to upload the original TMX file itself to Rapidshare, Dropbox, Google Drive or some other such service. You can even zip it to make sure nothing interferes with it. To share with Google Drive, click the red upload button (to the right of "Create"), then right-click the file, click share, make it public and post the link here. It'll probably turn out to be a corrupted UTF-8 file, in which case you would have two options: try and convert your TMs to UTF-16 to get around it or report the bug to Wordfast and hope they offer a fix.
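One rough way to test the "corrupted UTF-8" guess on the raw, unmodified file is to look for byte sequences that are not legal UTF-8. Below is a minimal Python sketch, assuming the file has been read as bytes; the function name and the file path in the comment are hypothetical.

```python
def find_invalid_utf8(data: bytes, limit: int = 10):
    """Return up to `limit` byte offsets where UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data) and len(offsets) < limit:
        try:
            # Decode the remainder; success means no more bad bytes.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice.
            offsets.append(pos + err.start)
            pos += err.end
    return offsets


# Possible usage on an exported file (hypothetical path):
# with open("export.tmx", "rb") as f:
#     print(find_invalid_utf8(f.read()))
```

An empty list means the bytes are valid UTF-8 (plain ASCII included). It does not rule out text that is valid UTF-8 but was garbled before it was written. A UTF-16 file will normally report offset 0 right away, because its BOM bytes are not legal UTF-8, which is itself a useful hint about what the file really is.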