Shouldn’t we all use Unicode?
Thread poster: Mihai Badea (X)
Mihai Badea (X)
Mihai Badea (X)  Identity Verified
Luxembourg
English to Romanian
+ ...
May 13, 2005

We are now having a discussion on the Romanian Forum about what encoding would be best to use. Most of us use Central European, but the problem is with questions in some language pairs. For instance, it’s impossible to see both the French and Romanian special characters on the same page, in case of a question in the French – Romanian pair. But, it seems there are other people in the same situation. I have just taken a look at a French - Russian question http://www.proz.com/kudoz/511536 and, regardless of the encoding used, it was impossible for me to display the page correctly.

In this situation, wouldn’t Unicode be the solution? If we all used Unicode, both when writing and when reading, all the special characters should display correctly on the same page. Or am I wrong?


[Edited at 2005-05-13 09:05]
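
The problem described in this post can be sketched in a few lines of Python. This is only an illustration, not a description of how Proz stores its pages: the sample text is made up, and the choice of code pages (cp1250 for Central European, cp1252 for Western European, cp1251 for Cyrillic) is an assumption made for the example.

```python
# Hypothetical sample: French, Romanian and Russian in one string.
sample = "Français: déjà vu / Română: mănâncă / Русский: привет"

def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Try three legacy code pages and one Unicode encoding.
for enc in ("cp1250", "cp1252", "cp1251", "utf-8"):
    print(enc, can_encode(sample, enc))
```

Each legacy code page lacks at least one of the characters in the sample, while UTF-8 can represent all of them.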


 
Balasubramaniam L.
Balasubramaniam L.  Identity Verified
India
Local time: 03:29
Member (2006)
English to Hindi
+ ...
SITE LOCALIZER
I am all for it May 13, 2005

Unicode all the way.

I am from India, and the last Census recorded more than 4,000 languages here, of which 17 are official languages of the country. Each of these has a vibrant print industry, and each has a plethora of fonts following different code sets. The only way to tame this madness is to go for Unicode, which promotes a common code set for all the languages of the world and facilitates searching and sorting. We can then say goodbye to font and display problems, and life would become much simpler for many around here.

On the negative side, Unicode is a still-evolving standard, and codes have not been finalized for some of the less widely used languages, but I think all the major languages are covered.

Another issue is the vast amount of data that already exists in non-Unicode formats. Converting all of that into Unicode may be prohibitively costly and, in certain cases, not even possible.

Also, I have heard that databases tend to increase in size when Unicode is used, because some Unicode encodings (UTF-16, for example) use two bytes for most letters, whereas ANSI code pages use a single byte. These are issues that programmers would have to work around, but I am all for Unicode.
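
How much extra space Unicode text needs depends on which encoding is chosen. The sketch below compares UTF-8 and UTF-16 byte counts for a few sample words; the words themselves are arbitrary examples, and real database overhead will also depend on the database engine, which is not modelled here.

```python
samples = {
    "English": "translation",
    "French": "déjà",
    "Hindi": "हिन्दी",
}

def encoded_sizes(text):
    """Return the number of bytes text needs in UTF-8 and UTF-16."""
    return {
        "characters": len(text),
        "utf-8": len(text.encode("utf-8")),
        # utf-16-le is used so the 2-byte byte-order mark is not counted
        "utf-16": len(text.encode("utf-16-le")),
    }

for language, text in samples.items():
    print(language, encoded_sizes(text))
```

In UTF-8, plain English text stays at one byte per letter; only the accented and Devanagari characters take more.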


 
PRAKASH SHARMA
PRAKASH SHARMA  Identity Verified
India
Local time: 03:29
English to Hindi
+ ...
Why only unicode? May 13, 2005

This is not fair to those who don't or can't use Unicode fonts. As far as my language pair is concerned, I too face problems because I lack knowledge of Unicode typing in Hindi or Nepali, but as far as translations are concerned, they are satisfactory for most of my clients, even if I use only the Kruti Dev font!

Why put so much emphasis on the use of Unicode? It's simply a matter of choice for the agency/outsourcer or the translator! Moreover, how many Hindi translators are there who know even normal Hindi typing? Think it over!

That's all!

PRAKASH SHARMA
FREELANCE TRANSLATOR OF NEPALI, HINDI, SANSKRIT AND ENGLISH TO FOUR OF THE SAME
+977 56 530738


 
Balasubramaniam L.
Balasubramaniam L.  Identity Verified
India
Local time: 03:29
Member (2006)
English to Hindi
+ ...
SITE LOCALIZER
Unicode has nothing to do with typing or fonts May 13, 2005

PRAKAASH wrote:

This is not fair to those who don't or can't use Unicode fonts...


Unicode has nothing to do with fonts or typing. It is a standard. It simply standardizes the code position of each character (that is, each letter or sign in the alphabet set) of a language. Using these standardized code positions, font makers develop fonts. Since they all use the same code map, fonts developed by different people become fully compatible. You will often have faced the problem, while using Kruti Dev (which you said you use for Hindi), that matter typed in this font cannot be read in another font, say Shusha. This is because these fonts use different code pages. You would not face such problems if you switched over to Unicode fonts.
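
That a character's code position belongs to the standard rather than to any font can be checked with Python's standard `unicodedata` module. A small sketch; the Devanagari letter KA is chosen only as an example.

```python
import unicodedata

# Devanagari letter KA, written as an escape so no font is needed to read the source.
ka = "\u0915"

def describe(char):
    """Return the code position (as hex) and the standard name of a character."""
    return hex(ord(char)), unicodedata.name(char)

print(describe(ka))
```

The result is the same whichever font is installed; the font only decides what shape is drawn for that code position.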

As far as typing goes, Unicode fully supports the standardized Hindi keyboard layout called InScript, and if you know how to type on this keyboard, you can merrily type in any Unicode font.

Unicode is a standard, not a font; this distinction is important.

Today Microsoft is fully backing Unicode, and Windows, MS Office, SQL Server and all their other software fully support Unicode. Other operating systems like Linux and Unix are also not far behind.

So give up your diffidence about Unicode and switch over to it; it is the thing of the future.


 
Marc P (X)
Marc P (X)  Identity Verified
Local time: 23:59
German to English
+ ...
Shouldn’t we all use Unicode? May 13, 2005

Balasubramaniam wrote:

Other operating systems like Linux and Unix are also not far behind.


SuSE Linux sets the system encoding to Unicode by default. The OpenDocument word processor standard, which is the native file format of OpenOffice.org, is an XML file format and hence Unicode. So is TMX, the open standard for translation memory files.
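
What "XML and hence Unicode" means in practice can be sketched as follows: an XML file may be stored in various byte encodings, provided it declares them, but a parser hands back the same Unicode text either way. The snippet is a made-up illustration in which a single `<seg>` element stands in for a full TMX file.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; {enc} is filled in with the declared encoding.
segment = '<?xml version="1.0" encoding="{enc}"?><seg>café</seg>'

def parse_segment(encoding):
    """Encode the sample document in the given encoding and parse it back."""
    raw = segment.format(enc=encoding).encode(encoding)
    return ET.fromstring(raw).text

# The stored bytes differ, the parsed text does not.
print(parse_segment("UTF-8") == parse_segment("ISO-8859-1"))
```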

Marc


 
Balasubramaniam L.
Balasubramaniam L.  Identity Verified
India
Local time: 03:29
Member (2006)
English to Hindi
+ ...
SITE LOCALIZER
We are going off topic May 13, 2005

I am afraid we are going slightly off topic. Mihai probably wanted us to discuss whether we should adopt Unicode encoding for this site, whereas the discussion has drifted into a debate on the pros and cons of Unicode. I am partly responsible for this, and I request that we come back to the original topic of whether or not we should switch to Unicode encoding for this site.

[Edited at 2005-05-14 11:21]


 
Mihai Badea (X)
Mihai Badea (X)  Identity Verified
Luxembourg
English to Romanian
+ ...
TOPIC STARTER
Unicode on Proz May 13, 2005

Dear Balasubramaniam,

I'm no expert on the subject and that's why I find your information very useful. You're right, I’d like to discuss whether we should adopt Unicode on Proz. And I think that Marc’s comments are just supporting the idea that adopting Unicode should be possible.


 
Marc P (X)
Marc P (X)  Identity Verified
Local time: 23:59
German to English
+ ...
Unicode everywhere May 13, 2005

Mihai Badea wrote:

I think that Marc’s comments are just supporting the idea that adopting Unicode should be possible.


I'd like to see Unicode adopted everywhere, Mihai. That includes Proz.com.

Marc


 
Balasubramaniam L.
Balasubramaniam L.  Identity Verified
India
Local time: 03:29
Member (2006)
English to Hindi
+ ...
SITE LOCALIZER
It shouldn't be a problem for the English part of the site but... May 13, 2005

It shouldn't be a problem for the English part of the site (and for all those languages that follow the ANSI code set, such as French?), because the ANSI standard is a subset of the Unicode standard; that is, Unicode is an extension of it. In other words, the first 128 code positions of Unicode are identical to those of ANSI, and the later ones are used for other languages.
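
The overlap between the older code sets and Unicode can be checked directly. The sketch below assumes that "ANSI" here means the Windows-1252 code page (cp1252 in Python) and compares it, together with ASCII and ISO-8859-1, against the Unicode numbering.

```python
def same_as_unicode(encoding, limit):
    """Return True if bytes 0..limit-1 decode to the code points 0..limit-1."""
    return all(
        bytes([b]).decode(encoding) == chr(b)
        for b in range(limit)
    )

print(same_as_unicode("ascii", 128))    # 7-bit ASCII
print(same_as_unicode("cp1252", 128))   # Windows-1252, lower half
print(same_as_unicode("latin-1", 256))  # ISO-8859-1, all 256 positions

# In the upper half, Windows-1252 does not always follow the Unicode numbering:
print(hex(ord(b"\x80".decode("cp1252"))))  # euro sign
print(hex(ord(b"\xe9".decode("cp1252"))))  # e with acute accent
```

So the French accented letters keep their numbers, but their byte representation in UTF-8 still differs from the single byte used by the legacy code page.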

Anyway, it is pointless to discuss it by ourselves unless the site owners and administrators are prepared to make the shift to Unicode, which could involve a major disruption for them for a while.


 
Balasubramaniam L.
Balasubramaniam L.  Identity Verified
India
Local time: 03:29
Member (2006)
English to Hindi
+ ...
SITE LOCALIZER
Probably the site is already unicode compatible May 14, 2005

It is also possible that the site is already Unicode compatible, for I have been able to upload English-Hindi glossaries to the site. This wouldn't have been possible if the site didn't have a database that can handle Unicode-encoded text.

The problem you are referring to, of not being able to see properly what some people write in other languages, has more to do with the settings in your browser and with selecting the correct encoding there.

The other possibility is that they are using a non-Unicode font which you don't have on your system.
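
The effect of a wrong browser encoding setting can be imitated in Python by encoding text in one code page and decoding the bytes in another. The sample word and the two code pages (cp1250 for Central European, cp1252 for Western European) are assumptions made for the illustration.

```python
def view_as(text, written_in, viewed_as):
    """Encode text in one code page and decode the bytes in another."""
    return text.encode(written_in).decode(viewed_as, errors="replace")

romanian = "mănâncă"  # hypothetical sample word

print(view_as(romanian, "cp1250", "cp1250"))  # matching setting
print(view_as(romanian, "cp1250", "cp1252"))  # Western European setting
```

With the mismatched setting, the bytes are unchanged but the Romanian letters are shown as different accented letters.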

[Edited at 2005-05-14 11:24]


 
Robert Tucker (X)
Robert Tucker (X)
United Kingdom
Local time: 22:59
German to English
+ ...
Javascript May 14, 2005

Looking at:

http://www-306.ibm.com/software/globalization/topics/javascript/unicode.jsp

it would seem that Proz will receive all (Javascript) postings in UTF-16.

If you view the source code for a Proz page the content is given as 'text/css' without anything given for 'charset'.
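
For reference, a content type may or may not carry a `charset` parameter, and this can be checked with Python's standard `email.message` module. The header values below are hypothetical examples, not copied from Proz's actual responses.

```python
from email.message import Message

def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

print(declared_charset("text/css"))                  # no charset given
print(declared_charset("text/html; charset=UTF-8"))  # charset declared
```

When no charset is declared, the browser has to fall back on a default or on whatever encoding the user selects.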

Referring to:

http://en.selfhtml.org/css/eigenschaften/schrift_datei.htm

it would seem that it is possible to define a font for a specific range of Unicode, but there are no Unicode ranges in Proz's CSS style sheet:

http://www.proz.com/css/v4.css

So how it encodes a page from the UTF-16 is still a mystery to me.

At one time I understood the encoding for Proz was ISO-8859-1 or ISO-8859-15, and maybe I've read that it was changed, if I got the right information in the first place.

If you set the encoding to ISO-8859-1 or ISO-8859-15 (or Unicode UTF-8) to view the page given by Mihai, you cannot read the Cyrillic, and if you set it to Cyrillic (Windows-1251), you get the Cyrillic but lose the accented e's in the French. No setting on my (Linux) computer will allow proper reading of both.
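
One possible explanation, offered only as a guess, is that the page contains bytes written under two different code pages. The sketch below builds such a page by hand (the words and the two code pages are assumptions) and shows that no single setting decodes both parts.

```python
# Hypothetical page body: French typed under a Western setting,
# Russian typed under a Cyrillic setting, stored as raw bytes side by side.
page = "été".encode("latin-1") + b" / " + "лето".encode("cp1251")

def render(raw, encoding):
    """Decode the raw page the way a browser set to `encoding` would."""
    return raw.decode(encoding, errors="replace")

for setting in ("latin-1", "cp1251", "utf-8"):
    print(setting, render(page, setting))
```

Had both parts been stored as UTF-8, a single setting would display the French and the Russian together.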

Some tech input might be useful here!


 
Balasubramaniam L.
Balasubramaniam L.  Identity Verified
India
Local time: 03:29
Member (2006)
English to Hindi
+ ...
SITE LOCALIZER
Getting the users to switch over is another ball game May 14, 2005

That leaves the users. There are, I think, 100,000 registered users of proz.com, of whom 20,000 are active. They probably represent every major language in the world. Many of them would be using legacy fonts and would be unwilling to change over. Prakash has already said that he wouldn't want to. There would be many more like him out there. There would also be people who can't, like those who have old DOS-based computers (believe me, DOS is still used; there are millions of them in India) and those who have Windows 98 installed (which does not fully support Unicode). In India at least, Windows 98 is the most popular version of Windows and is going to hold strong for a few more years.

What can we do in this situation? Probably all we can do is create maximum awareness of the advantages of Unicode and hope for the best. Unicode will eventually get established, there is no doubt about that, but it may take a couple of years to get everyone out there to switch over.


 

