MS Glossaries to Trados: Unicode problems
Thread poster: Éric Cléach
Éric Cléach  Identity Verified
France
Member (2005)
English to French
Nov 14, 2005

Hi all,

I am trying to build a Trados TM with EN>FR MS glossaries and apparently I am experiencing Unicode problems.

I have 213 glossary files in CSV. I have converted all of them to TXT with a small program called MSGloss2TWB. When I open these files in Notepad2 (it should be the same with Windows Notepad anyway), they look fine, i.e. all accented letters appear as they should.

As you cannot import several TXT files at once in Trados, I decided to merge all these files into one single TXT file. For that purpose, I used TextMerger (http://www.maliska.net/mal/) and obtained a huge 340 MB TXT file. I cannot open such a large file in Notepad, but the merging should not cause any Unicode-related problems.
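
Merging is not always encoding-neutral, though: if each converted file starts with a byte-order mark (BOM), a plain concatenation leaves copies of that mark in the middle of the merged file, and a file whose last line has no line break gets glued to the first line of the next file. The thread does not say whether MSGloss2TWB writes a BOM or what TextMerger does with one, so the following is only a minimal sketch, assuming UTF-8 input files and GNU sed; the function name `merge_without_bom` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: concatenate text files into one output file.
# For each input it (1) drops a UTF-8 byte-order mark (bytes EF BB BF) at
# the very start and (2) adds a final line break if the file lacks one,
# so no BOM lands mid-file and no two lines get glued together.
# Assumes GNU sed; LC_ALL=C makes sed match raw bytes.
merge_without_bom() {
    local out="$1" f
    shift
    : > "$out"    # create or truncate the output file
    for f in "$@"; do
        LC_ALL=C sed -e '1s/^\xEF\xBB\xBF//' -e '$a\' "$f" >> "$out"
    done
}

# Usage: merge_without_bom merged.txt glossary_*.txt
```

Write the output to a name that the input pattern does not match, otherwise the merged file would be read back in as one of its own inputs.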

I tried to import this huge TXT file into Trados 7 and got an error. Apparently, MSGloss2TWB does not create a header in the TXT files. To solve this problem, I created an empty TM in Trados and exported it as TXT. The resulting TXT file contains only the header (RTF preamble), which I managed to merge with my huge TXT TM file.

This works: I've been able to import around 750,000 TUs into the TM... except that the French accented letters do not appear correctly in Trados.
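
One way to narrow this down is to check whether the exported header and the merged body are in the same encoding: UTF-8 text read as a single-byte code page (or the reverse) garbles exactly the accented letters and leaves plain ASCII intact. A minimal sketch using `iconv`, assuming a Linux-like shell; `check_utf8` and the file names in the usage line are hypothetical.

```bash
#!/bin/bash
# Hypothetical check: report, for each file, whether it is valid UTF-8.
# iconv exits with a non-zero status when the input contains a byte
# sequence that is invalid in the source encoding, so a UTF-8 -> UTF-8
# conversion into /dev/null works as a validity test.
# Note: pure ASCII text is valid UTF-8 *and* valid Windows-1252.
check_utf8() {
    local f status=0
    for f in "$@"; do
        if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
            echo "$f: valid UTF-8"
        else
            echo "$f: NOT valid UTF-8 (e.g. Windows-1252 or UTF-16)"
            status=1
        fi
    done
    return "$status"
}

# Usage: check_utf8 header.txt merged.txt
```

Because ASCII passes either way, the result is only informative for files that actually contain accented letters.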

What have I done wrong?

Thanks for your help!

[Edited at 2005-11-14 17:48]


xxxBrandis
English to German
I have tried the same thing for EN-DE Nov 14, 2005

with Trados and the MS Glossaries; it didn't work. Then I imported them into MultiTerm as tab-delimited files, and the import took a whole day and a half. All that remains is to load the termbase.

Best,
Brandis

xxxczechtran
English to Czech
special application for import Nov 17, 2005

There is a neat little program for importing MS glossaries into Trados. It is free to download at www.globalready.com. Click on Downloads at the top; the program is called MSGloss2TWB. Hope it helps!

Éric Cléach  Identity Verified
France
Member (2005)
English to French
TOPIC STARTER
I have used this program Nov 17, 2005

czechtran wrote:

There is a neat little program for importing MS glossaries into Trados. It is free to download at www.globalready.com. Click on Downloads at the top; the program is called MSGloss2TWB. Hope it helps!


Thanks for your help... but as I mentioned in my first post, I have already used this program.

My problem is not with the conversion of the files but with the conversion of the characters (probably a Unicode issue).

Thanks anyway.
I still haven't solved this issue... so if anyone can help... Thanks!



Rodolfo Raya  Identity Verified
English to Spanish
Use CSVConverter Nov 17, 2005

Hi,

You can get CSVConverter from http://www.maxprograms.com and use it to convert MS Glossaries to TMX format. You can then import the TMX files into your Trados TM databases.

Hope this helps,
Rodolfo


Éric Cléach  Identity Verified
France
Member (2005)
English to French
TOPIC STARTER
Yes, but... Nov 17, 2005

Yes, I know CSVConverter, but the *big* problem is that, as far as I know, it can only convert one file at a time...

xxxczechtran
English to Czech
Sorry, for some reason I didn’t notice Nov 17, 2005

you are already using MSGloss2TWB. Importing a large merged file never worked for me either, and I had to import all the files one by one. Any attempt to change, resave or merge the files created by MSGloss2TWB resulted in an error message in Trados.
Even after importing the files one by one, there are still some character distortions in my TM. I would like to try again with the updated version of MSGloss2TWB, so I would be curious to see the outcome of this discussion!


Éric Cléach  Identity Verified
France
Member (2005)
English to French
TOPIC STARTER
it should work... Nov 17, 2005

czechtran wrote:

you are already using MSGloss2TWB. Importing a large merged file never worked for me either, and I had to import all the files one by one. Any attempt to change, resave or merge the files created by MSGloss2TWB resulted in an error message in Trados.
Even after importing the files one by one, there are still some character distortions in my TM. I would like to try again with the updated version of MSGloss2TWB, so I would be curious to see the outcome of this discussion!


Well, apart from the "strange characters", everything worked perfectly for me: as I said, I converted 213 MS glossaries from CSV into Trados TXT, then merged all these TXT files with a small program called TextMerger, and finally imported the resulting 350 MB TXT file into Trados 7 (although this process hogged 99% of my laptop's resources for about an hour!).
Now I have this great Trados Microsoft TM, but... all the French accentuated letters are "corrupted"...



Niina Lahokoski  Identity Verified
Finland
Member (2008)
English to Finnish
Same problem + a solution(?) Nov 22, 2005

Hi,

I converted my EN-FI glossary with MSGloss2TWB, and in my case too the accented characters are corrupted.

Right now I'm trying to fix it using the Maintenance tool in TWB. First I check which symbol string (it usually consists of more than one character) appears in place of each corrupted character, then I copy those symbols into the Find field and put the correct character in the Replace field. Then I click Search and select Change All Translation Units from the menu. It takes some time, though, and the same must be done for every corrupted character (luckily I have only two that need fixing).

I once had the corrupted character problem with an import TM and I fixed it this way, so I think it will work in this case, too.
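
The same find-and-replace idea can also be applied to the TXT file before import rather than inside the TM. The sketch below assumes the common pattern where UTF-8 text was misread as Windows-1252, so that é shows up as Ã© and ä as Ã¤; the actual strings should first be checked against what Trados displays, as described in the post above. `fix_accents` is a hypothetical name, and the list covers only a few French and Finnish letters.

```bash
#!/bin/bash
# Hypothetical repair table, assuming UTF-8 text was misread as Windows-1252:
# each accented letter then appears as "Ã" followed by a second symbol.
# Each -e expression replaces one corrupted sequence with the intended letter.
# Order does not matter: every pattern starts with Ã and no replacement contains Ã.
fix_accents() {
    sed -e 's/Ã©/é/g' -e 's/Ã¨/è/g' -e 's/Ãª/ê/g' -e 's/Ã§/ç/g' \
        -e 's/Ã¢/â/g' -e 's/Ã´/ô/g' -e 's/Ã®/î/g' -e 's/Ã¹/ù/g' \
        -e 's/Ã»/û/g' -e 's/Ã¤/ä/g' -e 's/Ã¶/ö/g' \
        "$1" > "$2"
}

# Usage: fix_accents merged.txt merged_fixed.txt
```

Like the Maintenance-tool approach, this only fixes the sequences that are listed, so any other corrupted character needs its own entry.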



Hynek Palatin  Identity Verified
Czech Republic
English to Czech
Unicode Nov 22, 2005

How about converting the glossaries to ANSI first? You could use "Save As" in Notepad if there's no better tool available...
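
Assuming the glossary files are UTF-8, the same conversion to "ANSI" (on Western European Windows, code page 1252) can be done from the command line with `iconv`. Windows-1252 is the safer target for French than ISO-8859-1, because only Windows-1252 contains œ, Œ and Ÿ. `to_ansi` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: convert a UTF-8 text file to "ANSI", meaning here
# Windows code page 1252 (the ANSI code page of Western European Windows).
# Unlike ISO-8859-1, Windows-1252 also contains the French letters œ, Œ and Ÿ.
# Without -c, iconv stops with an error at the first character that has no
# Windows-1252 equivalent instead of silently dropping it.
to_ansi() {
    iconv -f UTF-8 -t WINDOWS-1252 "$1" > "$2"
}

# Usage, for a whole folder (input assumed to be UTF-8):
#   mkdir -p ansi && for f in *.txt; do to_ansi "$f" "ansi/$f"; done
```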

Éric Cléach  Identity Verified
France
Member (2005)
English to French
TOPIC STARTER
Batch conversion Nov 23, 2005

Does anyone know of a program that can convert the encoding of a batch of files at once?

Thanks!



Robert Tucker
United Kingdom
German to English
iconv Nov 23, 2005

Éric Cléach wrote:


Does anyone know of a program that can convert the encoding of a batch of files at once?

iconv can convert between more encodings than I can list here. It's native software on Linux, but I believe it's also available for Windows.

For batch conversion on Linux I would write a simple shell script, something like:

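#!/bin/bash
# Convert every .txt file in src from ISO-8859-1 to UTF-8, writing the copies to dst
src=/home/my_username/file_folder
dst=/home/my_username/file_folder_utf
mkdir -p "$dst"
for path in "$src"/*.txt
do
    iconv -f ISO-8859-1 -t UTF-8 "$path" > "$dst/$(basename "$path")"
done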

for example.

Don't know how easy or otherwise it is to write a similar batch file for Windows, I'm afraid.



Rodolfo Raya  Identity Verified
English to Spanish
Merge text Nov 23, 2005

Éric Cléach wrote:

Does anyone know of a program that can convert the encoding of a batch of files at once?

Thanks!


Hi,

Merge the CSV files and use CSVConverter.

Be careful: most MS Glossaries have 8 columns but not all. You may need to convert 2 or 3 manually.
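
Before merging, the files with an unexpected number of columns can be located with a short awk script. This is a sketch under assumptions: the files are comma-separated, fields containing commas are wrapped in double quotes, no field spans several lines, and the encoding is UTF-8 or a single-byte code page. `check_columns` is a hypothetical name; 8 is the usual column count mentioned above.

```bash
#!/bin/bash
# Hypothetical pre-merge check: list every CSV line whose field count is not 8.
# Commas inside double-quoted fields are not counted as separators; a doubled
# quote ("") toggles the in-quotes state twice, so it leaves it unchanged.
# Limitation: a quoted field that spans several lines is not handled.
check_columns() {
    awk -v want=8 '{
        n = 1; inq = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\"") inq = !inq
            else if (c == "," && !inq) n++
        }
        if (n != want) { print FILENAME ": line " FNR ": " n " fields"; bad = 1 }
    } END { exit bad }' "$@"
}
```

For example, `check_columns *.csv` prints one line per suspicious row and returns a non-zero status if any were found; only the files it reports would then need manual conversion.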

Rodolfo

