Thread poster: zabat
Corruption of UCS-2 Little Endian in Notepad?

zabat
Kazakhstan
Member (2010)
English to Kazakh
+ ...
Mar 24, 2011

Hello everyone,

This week I received an offer to edit a txt-file encoded in UCS-2 Little Endian with the help of a text editor such as Notepad. The language of the file was Kazakh, which uses Cyrillic characters.

When I submitted the edited file, the translation agency told me that the file was corrupted and proposed that I use Notepad++. I edited the file again in Notepad++, but apparently to no avail. The agency said my file was still corrupted.

I then sent the file to some colleagues, who told me that they could open and read the edited file in Notepad without any problems.

Since I am new to using Notepad, I do not know what has happened here. Does anyone on the forum know? Any comments or advice would be greatly appreciated.



Michael Grant
Japan
Japanese to English
Questions... Jul 29, 2011

All I can think of is that you may have used a different character set setting from the one the file uses.
In Notepad you have to select the font as well as the character set. For example, the Verdana font has at least seven character set settings available for it (Format menu > Font), only one of which is Cyrillic...

I do not have Notepad++ installed, so I can't tell you how that works...

It would be very helpful to have the actual file or, if that is impossible, then I would want to know:

1) What application was used to create the original text file?
2) What font was used?
3) What does the agency mean by "corrupted"?
- Does the file not open?
- Does it open but display badly?
4) What application is the agency using to open the file?

A lot of questions need to be answered before we can troubleshoot this without having the file ourselves...
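
One check that needs nothing but the file itself is to look at its first few bytes. A file saved as little-endian UCS-2/UTF-16 normally starts with the byte-order mark FF FE, so a different mark (or none) after editing would suggest the editor re-saved it in another encoding. Below is a minimal Python sketch of that check; the function name `sniff_bom` and the file name in the usage note are made up for illustration, and a missing BOM does not by itself prove the file is damaged.

```python
import codecs

# Byte-order marks, with UTF-32 listed first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> str:
    """Return a label for the BOM at the start of data, or 'no BOM'."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "no BOM"
```

To try it on a real file, read the first four bytes in binary mode, for example `sniff_bom(open("edited.txt", "rb").read(4))`, and compare the result for the original file and the edited one.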

Afterthought:
UCS-2 is an older character encoding that, according to the Unicode Consortium website, was replaced by UTF-16 in version 2.0 of the Unicode Standard.
(Source: http://www.unicode.org/faq/basic_q.html#14)
If the file does not have to be used with some obscure, older database system or application, then maybe they can get a UTF-16 or UTF-8 version of it...?
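
For Kazakh Cyrillic the UCS-2 versus UTF-16 distinction should not matter in practice: those letters all sit in the Basic Multilingual Plane, where both encodings use the same two bytes per character. If a UTF-8 copy is acceptable, the conversion is short. Here is a rough Python sketch, assuming the input really is little-endian UTF-16; the function name `utf16le_to_utf8` and the sample text are only illustrative.

```python
import codecs


def utf16le_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 LE bytes as UTF-8, dropping a leading BOM if present."""
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    # Strict decoding raises UnicodeDecodeError if the bytes are not valid
    # UTF-16 LE (for example, an odd number of bytes).
    text = data.decode("utf-16-le")
    return text.encode("utf-8")


# Illustrative round trip with a short Kazakh sample.
sample = "Қазақ тілі"
raw = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")
converted = utf16le_to_utf8(raw)
```

If the decode step fails on the edited file but works on the original, that would at least narrow down what "corrupted" means.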

What does the agency mean by the word "corrupted"...? In what way is the file you sent to them unusable?

MGrant



zabat
Kazakhstan
Member (2010)
English to Kazakh
+ ...
TOPIC STARTER
corrupted: strange characters appear Jul 31, 2011

Hello Michael,

Thank you for your assistance. That was a while ago, so I have already forgotten the details of the case. I just remember that they told me to work in UCS-2 Little Endian because the file was created in that font. I did. Then they asked me to install Notepad++ and start the project from scratch. I did. Nothing helped. The target language was Kazakh, so I suspect something was wrong with the Kazakh font versions. Anyway, I have lost the project.

Regards

