How to convert Microsoft Glossaries into a TM?
Thread poster: Stanislaw Czech, MCIL

Stanislaw Czech, MCIL  Identity Verified
United Kingdom
Local time: 08:32
Member (2006)
English to Polish
+ ...
Dec 3, 2009

I have comma-separated files containing the Microsoft Glossaries. I would like to use them to create a TM for reference. I have already found a program that is meant to help with this task, MSGloss2TWB, and I have converted the original files into txt.

However, when I import them into my TM, all Polish diacritical characters are corrupted. Strangely, these characters look perfectly fine in the txt files, so it is probably an encoding problem.

Do you have any suggestions as to what I could do?

Best Regards
Stanislaw



Pablo Bouvier  Identity Verified
Local time: 09:32
German to Spanish
+ ...
How to convert Microsoft Glossaries into a TM? Dec 3, 2009

Stanislaw Czech wrote:

I have comma-separated files containing the Microsoft Glossaries. I would like to use them to create a TM for reference. I have already found a program that is meant to help with this task, MSGloss2TWB, and I have converted the original files into txt.

However, when I import them into my TM, all Polish diacritical characters are corrupted. Strangely, these characters look perfectly fine in the txt files, so it is probably an encoding problem.

Do you have any suggestions as to what I could do?

Best Regards
Stanislaw


Dear Stanislaw: Which TM tool do you use? Nearly all modern TM tools let you specify the encoding of the file you import. In the meantime, you can try this procedure:

1) Download EditPad Lite. EditPad Lite is free for non-commercial use.
2) Open your text memory with EditPad Lite.
3) Use the menu path Convert > Text Encoding, select one of the two options and the encoding you want, and click OK.
4) Try to load the converted memory to your TM.
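If you would rather script the conversion than click through an editor, the same idea can be sketched in Python. This is only an illustration, not what EditPad Lite does internally: the function names `sniff_encoding` and `convert_file` are made up here, the fallback to Windows-1250 is an assumption that suits Polish text, and the UTF-16 target is just one choice to try.

```python
import codecs

# Byte-order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(raw: bytes, fallback: str = "cp1250") -> str:
    """Guess the encoding of raw bytes: BOM first, then strict UTF-8, then fallback."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: a legacy single-byte file from a Central European system.
        return fallback


def convert_file(src: str, dst: str, target: str = "utf-16") -> None:
    """Re-encode src into dst, keeping the original line endings."""
    with open(src, "rb") as f:
        raw = f.read()
    text = raw.decode(sniff_encoding(raw))
    # newline="" stops Python from translating \r\n or \n on output.
    with open(dst, "w", encoding=target, newline="") as f:
        f.write(text)
```

Calling `convert_file("glossary.txt", "glossary_utf16.txt")` (both file names are placeholders) writes a UTF-16 copy with a byte-order mark; pass `target="cp1250"` instead if your TM tool expects a legacy code page.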

Please tell us whether this procedure worked for you.

Regards,
Pablo Bouvier

[Edited at 2009-12-03 17:46 GMT]



Claudia Alvis  Identity Verified
Peru
Local time: 02:32
Spanish
+ ...
ApSIC Xbench Dec 3, 2009

I used ApSIC Xbench to convert my Microsoft glossaries into a TM. You need to create a new project, add the glossary files and export them as TMX files. It's very easy to use.

ApSIC downloads website

[Edited at 2009-12-03 18:27 GMT]


FarkasAndras
Local time: 09:32
English to Hungarian
+ ...
Don't install anything Dec 3, 2009

Well, you might as well install xbench as it's a great little program, but I don't think you need it for this.

In all likelihood, your MSGloss2TWB program produced a UTF-8 file, and Trados has "interesting" support for UTF-8. Just open the txt in Notepad, click File/Save as and see what the encoding box at the bottom says. If it's UTF-8, try changing it to ANSI and saving under a different name. I don't know if ANSI covers all the Polish characters... if the file contains characters that can't be reproduced in ANSI, you will get a warning.
If the encoding is big endian Unicode or some other exotic variety, try UTF-8 first and then ANSI if UTF-8 doesn't work.
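For anyone who wants to check this without Notepad, here is a rough Python sketch of the same UTF-8 to ANSI step. It rests on an assumption: "ANSI" in Notepad means the system's default code page, which is Windows-1250 (`cp1250`) on a Central European Windows and Windows-1252 (`cp1252`) on a Western one. The helper names are invented for this example. Windows-1250 does contain all the Polish letters, while Windows-1252 does not, so the check below reports what would be lost before anything is written.

```python
def unencodable_chars(text: str, encoding: str = "cp1250") -> set:
    """Return the characters in text that the target code page cannot represent."""
    bad = set()
    for ch in set(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.add(ch)
    return bad


def utf8_to_ansi(src: str, dst: str, encoding: str = "cp1250") -> None:
    """Convert a UTF-8 file (with or without BOM) to a legacy code page."""
    # "utf-8-sig" silently drops a leading BOM if there is one.
    with open(src, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    bad = unencodable_chars(text, encoding)
    if bad:
        # Same idea as Notepad's warning: refuse rather than lose characters.
        raise ValueError(f"not representable in {encoding}: {sorted(bad)}")
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(text)
```

Running `unencodable_chars(text, "cp1252")` on Polish text lists letters such as ą and ł, which is why saving as "ANSI" on a Western system may not be enough.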



Pablo Bouvier  Identity Verified
Local time: 09:32
German to Spanish
+ ...
How to convert Microsoft Glossaries into a TM? Dec 3, 2009

FarkasAndras wrote:

Well, you might as well install xbench as it's a great little program, but I don't think you need it for this.

In all likelihood, your MSGloss2TWB program produced a UTF-8 file, and Trados has "interesting" support for UTF-8. Just open the txt in Notepad, click File/Save as and see what the encoding box at the bottom says. If it's UTF-8, try changing it to ANSI and saving under a different name. I don't know if ANSI covers all the Polish characters... if the file contains characters that can't be reproduced in ANSI, you will get a warning.
If the encoding is big endian Unicode or some other exotic variety, try UTF-8 first and then ANSI if UTF-8 doesn't work.


I agree that the method you propose is probably the quickest, simplest and most effective. But, as you said, it is not certain that it can convert all Polish diacritical characters.

The method I described earlier can convert between all variants of the Unicode, Windows, ISO, DOS, KOI8 and EBCDIC encodings, and can also save text in UNIX, Macintosh and Windows formats.



Hynek Palatin  Identity Verified
Czech Republic
Local time: 09:32
English to Czech
+ ...
Convert to ANSI before importing Dec 3, 2009

FarkasAndras wrote:

Just open the txt in Notepad, click File/Save as and see what the encoding box at the bottom says. If it's UTF-8, try changing it to ANSI and saving under a different name.


This is it. It works with Czech (no pun intended), should work with Polish too.


FarkasAndras
Local time: 09:32
English to Hungarian
+ ...
True enough Dec 3, 2009

Pablo Bouvier wrote:

I agree that the method you propose is probably the quickest, simplest and most effective. But, as you said, it is not certain that it can convert all Polish diacritical characters.

The method I described earlier can convert between all variants of the Unicode, Windows, ISO, DOS, KOI8 and EBCDIC encodings, and can also save text in UNIX, Macintosh and Windows formats.


I'd still try Notepad and any preinstalled text editors or even word processors first. It's faster and more convenient than downloading and installing a new editor just for a (hopefully) trivial encoding conversion. If it turns out to be an encoding problem these can't solve, it's time to go text editor hunting. You will only have lost 5 minutes.



Piotr Bienkowski  Identity Verified
Poland
Local time: 09:32
Member (2005)
English to Polish
+ ...
If you try Apsic Xbench... Dec 4, 2009

... then it will generate a TMX file, which will be encoded either in utf-8 or utf-16 and Workbench will have no problems importing it - hopefully.
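For anyone curious what such a TMX file looks like inside, here is a minimal Python sketch that builds one from source/target pairs. It is not how Xbench produces its output; the tool name `glossary-sketch`, the language codes and the function names are all placeholders chosen for this example. The header attributes are the ones TMX 1.4 requires.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en-US", tgt_lang="pl-PL"):
    """Build a TMX 1.4 tree with one translation unit per (source, target) pair."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "glossary-sketch",   # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "phrase",                 # glossary entries are short phrases
        "o-tmf": "csv",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


def write_tmx(pairs, path, encoding="utf-8"):
    """Write the pairs to path with an XML declaration naming the encoding."""
    build_tmx(pairs).write(path, encoding=encoding, xml_declaration=True)
```

Because the XML declaration states the encoding explicitly, the importing tool does not have to guess it, which is the main advantage over a plain txt file.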

Regards,

Piotr


Claudio Porcellana  Identity Verified
Italy
WinLexic 2005 Dec 4, 2009

Clearly, WinLexic doesn't convert anything to TMX, but it lets you browse all the FTP glossaries that are no longer available on the MS website.

Claudio

[Edited at 2009-12-04 23:45 GMT]



Joel Earnest
Local time: 09:32
Swedish to English
What about converting them into a termbase? Dec 13, 2009

I came across the Microsoft glossaries a few years back, but found it more useful to convert the file into a MultiTerm termbase.
I opened it in Word and then used Search and Replace and Convert Text to Table to put it into a table layout (with Swedish and English in my case). After pasting the table into Excel, I followed the usual procedure, beginning with MultiTerm Convert, to generate the required XML file, and then imported that into a fresh termbase in MultiTerm.

Here are step-by-step instructions I prepared some years ago that retain the three Swedish special letters (just substitute your languages where appropriate):

TRANSFERRING TERMS FROM EXCEL TO MULTITERM
These simplified instructions were prepared using TRADOS 7 with Windows XP but are basically the same for both older and newer software versions.


Step 1. Prepare the Excel file
In the Excel file, delete all columns but two – one with “Swedish” (no quotation marks) in Row 1, Column A, with the Swedish terms beneath, and one with “English” (no quotation marks) in Row 1, Column B, with the English terms beneath.
Save and close the file.
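If the glossary is still a comma-separated file, Step 1 can also be done with a short script instead of deleting columns by hand. The sketch below is only an assumption-laden example: the column positions, the `skip_header` switch and the function name are hypothetical and depend on how your glossary file is laid out. It writes a two-column CSV with the language names in the first row, which Excel can open and save as a workbook.

```python
import csv


def two_column_sheet(src, dst, col_a=0, col_b=1,
                     headers=("Swedish", "English"), skip_header=False):
    """Copy two columns of a CSV into a new file with a language header row."""
    with open(src, newline="", encoding="utf-8-sig") as fin, \
         open(dst, "w", newline="", encoding="utf-8-sig") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(headers)          # Row 1: language names, no quotation marks
        if skip_header:
            next(reader, None)            # drop the glossary's own header row
        for row in reader:
            if len(row) <= max(col_a, col_b):
                continue                  # row too short to hold both terms
            a, b = row[col_a].strip(), row[col_b].strip()
            if a and b:                   # skip entries with a missing term
                writer.writerow([a, b])
```

The output is written as UTF-8 with a byte-order mark because Excel generally uses the mark to recognise UTF-8, which should keep the special letters intact.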



Step 2. Converting the Excel file into TRADOS XML format
Open MultiTerm Convert.

MultiTerm Convert opens with a conversion wizard at step 1/7.

1/7 Click Next.

2/7 Select: New conversion session and Save conversion session.
Click Save as…
Give the file a name and save it somewhere convenient (the file extension is XCD).
Click Next.

3/7 Select Microsoft Excel format.
Click Next.

4/9 Browse to your Excel input file.
Click Next.

5/9 Select Swedish from Available column header fields: and then click Index field. Select Swedish (Sweden) from the dropdown list.
Select English from Available column header fields: and then click Index field. Select English (United States), or the appropriate sublanguage, from the dropdown list.
Click Next.

6/9 Click Next.

7/9 Select Convert immediately.
Click Next.

8/9 Wait until the conversion is complete.
Click Next.

9/9 Click Finish to close MultiTerm Convert.






Step 3. Creating a new termbase
Open MultiTerm.

Select Create termbase… from the Termbase menu.
Browse to the folder where you would like to store your termbase and click OK. (Selecting a folder in your My Documents folder will usually make it easier to back up the termbase or move it to a new computer when the time comes.)

The Termbase Wizard will now open at step 1/5.
Click Next.

1/5 Select Load an existing termbase file.
Browse to the XDT file created during the conversion process described in Step 2 and click Open.
Click Next.


2/5 Enter a termbase name.
Click Next.

3/5 Click Next.

4/5 Click Next.

5/5 Click Next.

Click Finish.




Step 4. Importing the entries from your Excel file
Choose Import Entries… from the Termbase menu.
Click Process… in the Termbase Catalogue dialog box.

2/8 Browse to the XML file created in MultiTerm Convert in Step 2, and click Open.
Select Fast import.
Click Next.

3/8 Click Save As… to save the exclusion file.
Click Next.

7/8 Click Next.

8/8 Wait until the import has been processed.
Click Next.

Click Finish.

Click OK.



[Edited at 2009-12-13 19:45 GMT]

[Edited at 2009-12-13 19:49 GMT]


yakky  Identity Verified
China
Local time: 15:32
English to Chinese
+ ...
Only need Word Dec 29, 2009

Stanislaw Czech wrote:

I have comma-separated files containing the Microsoft Glossaries. I would like to use them to create a TM for reference. I have already found a program that is meant to help with this task, MSGloss2TWB, and I have converted the original files into txt.

However, when I import them into my TM, all Polish diacritical characters are corrupted. Strangely, these characters look perfectly fine in the txt files, so it is probably an encoding problem.

Do you have any suggestions as to what I could do?

Best Regards
Stanislaw


1) Create a new Word document.
2) Copy your file content into this document.
3) If the text looks like "source word A->target word A, source word B->target word B,......", press Ctrl+H, type "->" in the "Find" field and "" in the "Replace with" field, then click the "Replace All" button.
Now the text should become "source word Atarget word A, source word Btarget word B......"
4) Press Ctrl+H again, type ", " in the "Find" field and "" in the "Replace with" field, then click the "Replace All" button.
Now the text should change to:
source word Atarget word Asource word Btarget word Bsource word Atarget word Asource word Btarget word Bsource word Ztarget word Z
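The same two replacements can be sketched in Python. One assumption needs stating: the replacement strings are not visible in the steps above, so this example guesses that "->" was meant to become a tab and ", " a line break, which would give one tab-delimited pair per line. The function name is made up for illustration.

```python
def to_tab_delimited(text: str, pair_sep: str = "->", entry_sep: str = ", ") -> str:
    """Turn 'src->tgt, src->tgt' into one 'src<TAB>tgt' pair per line."""
    lines = []
    for entry in text.split(entry_sep):
        source, found, target = entry.partition(pair_sep)
        if found:  # ignore fragments that contain no '->' separator
            # Assumed replacements: tab between terms, newline between entries.
            lines.append(f"{source.strip()}\t{target.strip()}")
    return "\n".join(lines)
```

Note that a term which itself contains ", " would be split in the wrong place, both here and with Word's Replace All, so it is worth spot-checking the result.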

