MultiTerm Convert: non-ASCII chars lost
Thread poster: Jussi Rosti

Jussi Rosti  Identity Verified
Finland
Local time: 19:42
Member (2005)
English to Finnish
Jan 5, 2007

I'm trying to create an MT termbase from a tab-delimited text file. Everything else goes well, but non-ASCII characters are lost.

Example:

"Lisää tähän" becomes "Lis thn"

I'd appreciate your help!



Jussi Rosti  Identity Verified
Finland
Local time: 19:42
Member (2005)
English to Finnish
TOPIC STARTER
Figured out a workaround Jan 5, 2007

1) I replaced the Finnish characters in the txt file with a tag (e.g. ä with -aaaaa)
2) ran the conversion
3) reversed the tagging in the resulting XML file, after which it was OK
4) imported the XML into MT

This solved the problem.
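
The four steps above can be sketched in a few lines of Python. This is only an illustration of the tag-and-restore idea, not the exact procedure used in the post: the placeholder tokens and function names are hypothetical, and the tokens must be strings that never occur in the glossary itself.

```python
# Hypothetical placeholder tokens: pure ASCII, so they survive a converter
# that drops non-ASCII characters. They must not occur in the real data.
PLACEHOLDERS = {
    "ä": "-aaaaa-",
    "Ä": "-AAAAA-",
    "ö": "-ooooo-",
    "Ö": "-OOOOO-",
}


def encode_placeholders(text: str) -> str:
    """Step 1: replace each special character with its ASCII token."""
    for char, token in PLACEHOLDERS.items():
        text = text.replace(char, token)
    return text


def decode_placeholders(text: str) -> str:
    """Step 3: turn the tokens back into the original characters."""
    for char, token in PLACEHOLDERS.items():
        text = text.replace(token, char)
    return text


original = "Lisää tähän"
tagged = encode_placeholders(original)   # run this on the txt file before converting
assert tagged.isascii()                  # nothing left for the converter to lose
restored = decode_placeholders(tagged)   # run this on the XML file after converting
assert restored == original
```

The approach scales poorly once many characters are involved, since every one of them needs its own token.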

Still, any hints on how to fix the problem in the first place?



Piotr Bienkowski  Identity Verified
Poland
Local time: 18:42
Member (2005)
English to Polish
Encoding of the text file is the answer Jan 5, 2007

Jussi Rosti wrote:

I'm trying to create a MT from a tab delimited text file. Otherwise all goes well, but non-ASCII characters are lost.

Example:

"Lisää tähän" becomes "Lis thn"

I'd appreciate your help!


Hi Jussi,

You need to know the encoding of the source txt file. If you are not sure, you can install the jEdit text editor (it's free) and open the file in it; it shows the encoding of the open file in the bottom right corner.
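
If you would rather check from a script than from an editor, a rough heuristic is to look for a byte order mark and then try a strict UTF-8 decode. The sketch below is only a guess-maker, not a reliable detector: the function name is made up, and the cp1252 fallback is an assumption that fits Western European Windows exports. The last line also shows how the reported symptom can arise if the bytes are read as plain ASCII with undecodable bytes dropped (whether MultiTerm Convert does exactly that is an assumption).

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return a best-guess codec name for the raw bytes of a text file."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("utf-8")  # strict decode: fails on typical cp1252 text
        return "utf-8"
    except UnicodeDecodeError:
        return fallback       # assumption: a legacy Windows code page


sample = "Lisää tähän".encode("cp1252")         # what a Windows export might contain
print(guess_encoding(sample))                    # cp1252
print(sample.decode("ascii", errors="ignore"))   # Lis thn
```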

When you use Multiterm Convert, you should specify the encoding of the source file.

If it still does not work, be aware that MultiTerm Convert prefers Unicode (UTF-16), so you could try converting the file to Unicode first with another free utility, Rainbow (see http://okapi.sourceforge.net/Release/Rainbow/ReadMe.htm), and only then feed it to MultiTerm Convert.
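
For those who prefer a script to a separate utility, the same re-encoding can be sketched in Python. This is not Rainbow, just a minimal stand-in; the file names are hypothetical and the source encoding (cp1252 here) is an assumption you have to replace with whatever your file really uses.

```python
import tempfile
from pathlib import Path


def reencode(src: Path, dst: Path,
             src_encoding: str = "cp1252",
             dst_encoding: str = "utf-16") -> None:
    """Rewrite a text file in another encoding, leaving line endings untouched."""
    text = src.read_bytes().decode(src_encoding)
    # Python's "utf-16" codec writes a byte order mark at the start of the output.
    dst.write_bytes(text.encode(dst_encoding))


# Small self-contained demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "glossary_cp1252.txt"   # hypothetical file names
    dst = Path(tmp) / "glossary_utf16.txt"
    src.write_bytes("add here\tLisää tähän\r\n".encode("cp1252"))
    reencode(src, dst)
    assert dst.read_bytes().decode("utf-16") == "add here\tLisää tähän\r\n"
```

Passing a different `dst_encoding`, such as "utf-8", works the same way.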

Hope this helps.

Piotr



Jussi Rosti  Identity Verified
Finland
Local time: 19:42
Member (2005)
English to Finnish
TOPIC STARTER
How to specify the encoding in MT Convert? Jan 5, 2007

Dziękuję za rady (thanks for the advice), Piotr!

Piotr Bienkowski wrote:
You must know what encoding is the source txt file.


Hmm... Windows, I guess (it's a standard Excel export).


When you use Multiterm Convert, you should specify the encoding of the source file.


How do I do that? There are very few options that can be set.
This was my first thought, too.



If it still does not work, you should be aware that Multiterm convert likes the Unicode (UTF-16) encoding best, so you could try converting the file to Unicode, using another free utility, Rainbow, see: http://okapi.sourceforge.net/Release/Rainbow/ReadMe.htm, and only then feed it to Multiterm Convert


My second attempt was to export the file as Unicode text. This didn't change anything.



Piotr Bienkowski  Identity Verified
Poland
Local time: 18:42
Member (2005)
English to Polish
Correction Jan 5, 2007

Just checked and there is no way to specify a code page in Multiterm Convert.

So either your TXT file should be in Unicode, or you should convert it to an Excel file and go from there.

I have yet another approach, which I use for bilingual glossaries: I use a custom-made Perl script to convert the txt file into MultiTerm's XML import format, and then I convert the XML file to Unicode, which MultiTerm accepts smoothly.

I did not find a way to make the Perl script write directly to Unicode.
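
For comparison, in a language with built-in Unicode handling the two steps can be merged. The following is a minimal Python sketch, not the Perl script mentioned above. The element names (glossary, entry, source, target) are placeholders rather than MultiTerm's actual import schema, the function names are hypothetical, and the input is assumed to be already decoded two-column tab-delimited text.

```python
import csv
import io
import xml.etree.ElementTree as ET


def glossary_to_xml(tsv_text: str) -> str:
    """Build an XML string from two-column tab-delimited text."""
    root = ET.Element("glossary")  # placeholder names, not MultiTerm's schema
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "source").text = row[0]
        ET.SubElement(entry, "target").text = row[1]
    # encoding="unicode" returns a str and keeps ä, ö, ł etc. as real characters
    return ET.tostring(root, encoding="unicode")


def write_utf16_xml(xml_body: str, path: str) -> None:
    """Write the XML directly as UTF-16 (with BOM) in a single step."""
    with open(path, "w", encoding="utf-16", newline="\n") as handle:
        handle.write('<?xml version="1.0" encoding="UTF-16"?>\n')
        handle.write(xml_body)


xml_body = glossary_to_xml("add here\tLisää tähän\n")
write_utf16_xml(xml_body, "glossary.xml")  # hypothetical output file name
```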

Regards,

Piotr



Jussi Rosti  Identity Verified
Finland
Local time: 19:42
Member (2005)
English to Finnish
TOPIC STARTER
Thanks for your help Jan 5, 2007

Piotr Bienkowski wrote:

Just checked and there is no way to specify a code page in Multiterm Convert.


So either your TXT file should be in Unicode, or you should convert it to an Excel file and go from there.


I also tried exporting in Unicode, but apparently it didn't work for me...

Since MT is so inflexible, I guess my workaround is good enough for my Finnish purposes. It's quite easy, since I only need to encode and decode four characters (upper- and lower-case ä and ö).

As for languages like Polish, with a larger number of "foreign" characters, a Perl script may be a good way to handle the conversion... thanks for the idea! Since leaving the software business, I sometimes forget what a perfect tool Perl is for a linguist...


dnitzpon
Germany
Local time: 18:42
Dutch to German
UTF-8 works in MT2009 May 23, 2014

I don't want to dig up old threads, but this one was on the first page of search results when I encountered the problem myself, so maybe someone else finds this useful:

I had the same problem with Windows encoding and German special characters, but saving the export file as UTF-8 let MT Convert 2009 process the file without problems.

It is... err, strange, however, that MT Convert apparently neither detects the encoding correctly nor offers any option to specify it...



