Polish forum to be converted to Unicode Thread poster: Andrew Wright (X)
| Andrew Wright (X) United States Local time: 09:50 English
Hello forum goers,

We will be converting this forum to use Unicode as the default character set some time later today. There won't be any need to change your browser's default encoding; it should recognize the page's character set automatically once the conversion takes place.

Once this is done, you may notice that some older posts are not legible. If you view these posts you will see a link reading "Is the text in this post garbled? Click here", or an icon of a speech bubble containing a question mark. Clicking either of these will bring you to a dialogue that will ask you a few questions to determine how to convert that post into Unicode.

For more information on Unicode, see the recently revised Character Set and Localization FAQ: http://www.proz.com/faq/localization
For more information on the conversion tool, see the about page: http://www.proz.com/?sp=charset_issues

If there is anything else that is not fully explained, please let me know.

Andrew Wright
Site Staff
| Andrew Wright (X) United States Local time: 09:50 English TOPIC STARTER Auto converter enabled | Feb 8, 2006 |
I've enabled an auto converter for old posts in this forum. If everything worked right, it should automatically convert most older posts into Unicode, and any that didn't work can still be converted manually via the conversion tool. Of course, I don't actually speak Polish, so I need someone here to tell me if everything looks OK.

-Andrew Wright
| Magda Dziadosz Poland Local time: 15:50 Member (2004) English to Polish + ...
Hi Andrew, I've made a random check of past threads and it looks very good! I noticed only one post that was still a bit garbled, but I was able to convert it manually with no problem. We will let you know if there are any problems (hopefully not).

Best, Magda
| Jaroslaw Michalak Poland Local time: 15:50 Member (2004) English to Polish SITE LOCALIZER
I have checked several popular topics at random... While most of it looks OK, there are many "š" letters, which do not appear in Polish (s with caron, not to be confused with "ś", s with breve, which is certainly Polish!). It should be converted to "ą" (a with ogonek). See, for example:
http://www.proz.com/topic/39408
http://www.proz.com/topic/39355
http://www.proz.com/topic/37594
Jaroslaw Michalak wrote: there are many "š" letters, which do not appear in Polish

This is what happens when text encoded in CP-1250 is read by a browser that applies the ISO-8859-2 standard. Other affected letters include ź and ą. Unrelatedly, ś is not s with breve, but s with acute.
| Magda Dziadosz Poland Local time: 15:50 Member (2004) English to Polish + ... Converting them manually | Feb 8, 2006 |
Hi Jabber, I've seen these as well, and they need to be converted manually by following the "Is this post garbled..." link. It seems that the Windows-1250 encoding needs to be selected each time. Strange that your post here seems garbled, too...?

Magda
| Jaroslaw Michalak Poland Local time: 15:50 Member (2004) English to Polish SITE LOCALIZER The basis for conversion? | Feb 8, 2006 |
Well, it means that the pages were automatically converted from ISO-8859-2. However, it seems that it had been agreed (see: http://www.proz.com/post/176328#176328 ) that the Polish forum would use Win-1250, so I suppose _most_ of the posts will be in that encoding. Therefore, I think that Win-1250 should be used as the conversion base. Of course, it would be even better to detect which encoding was originally used (it should not be that hard, as some of the codes are exclusive...).
| Andrew Wright (X) United States Local time: 09:50 English TOPIC STARTER Default switch | Feb 8, 2006 |
OK, I just switched the default conversion base from ISO-8859-2 to Windows-1250. Let me know if that works out better or worse.

As a side note, detecting the differences between most single-byte character sets (which includes all the ISO-8859-# and windows-125# sets) is a difficult or impossible task for a computer. In Windows-1250 a certain character might be displayed as 'š' and in another set it might display as 'ś'. But to the computer all this looks like is "10011011"; without a character set, the computer doesn't know what letter this byte is supposed to be. There are ways to guess based on analysis of the text versus the linguistic properties of the language, but those are too complex to implement here.

Andrew Wright
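The ambiguity described above can be made concrete. What follows is only a minimal Python sketch, not the site's conversion tool: it relies on Python's built-in `cp1250` and `iso-8859-2` codecs, and the helper name `decode_as` is hypothetical. It takes the single byte 0xB9 and decodes it under each of the two character sets discussed in this thread.

```python
import unicodedata

# One byte, two readings: the byte alone does not say which letter it is.
raw = bytes([0xB9])  # bit pattern 10111001


def decode_as(data: bytes, charset: str) -> str:
    """Decode raw bytes under an explicitly assumed character set (hypothetical helper)."""
    return data.decode(charset)


as_windows = decode_as(raw, "cp1250")   # reading the byte as Windows-1250
as_iso = decode_as(raw, "iso-8859-2")   # reading the same byte as ISO-8859-2

# Print the Unicode names so the output is unambiguous on any terminal.
print(f"{raw[0]:08b} as Windows-1250: {unicodedata.name(as_windows)}")
print(f"{raw[0]:08b} as ISO-8859-2:   {unicodedata.name(as_iso)}")
```

Under Windows-1250 the byte is "ą" (a with ogonek); under ISO-8859-2 it is "š" (s with caron), which is the substitution reported earlier in this topic.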
| Jaroslaw Michalak Poland Local time: 15:50 Member (2004) English to Polish SITE LOCALIZER No text analysis, just codes... | Feb 8, 2006 |
It's just a number of codes; it should be fairly easy to detect which are converted incorrectly, assuming that the text was written in Polish (which is a safe assumption for this forum). For example, if after conversion from Win 1250 a given post contains the characters ± or ¶, it should instead have been converted from ISO. In other words, if the Polish post contains the character with the code 177 (shown as ± under Win 1250), it was written in ISO. On the other hand, if it contains the character with the code 185 (shown as š under ISO), we can be fairly sure that it was written in Win 1250. (I hope I got the codes right! Only got win character map here to check...) I have selected this pair because it represents the quite frequent Polish letter "ą" in both code pages. There are other pairs, as well...
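The code-based check proposed here could be sketched roughly as follows. This is an illustration under stated assumptions, not the converter the site actually uses: the text is assumed to be Polish, only Windows-1250 and ISO-8859-2 are considered, and the names `DIFFERING_LETTERS`, `guess_polish_charset` and `to_unicode` are hypothetical. The telltale byte values are taken from Python's own codecs instead of being typed in by hand.

```python
# Polish letters whose byte values differ between the two encodings.
# All other Polish letters share the same byte in both, so they carry no evidence.
DIFFERING_LETTERS = "ąĄśŚźŹ"
WIN_BYTES = set(DIFFERING_LETTERS.encode("cp1250"))
ISO_BYTES = set(DIFFERING_LETTERS.encode("iso-8859-2"))


def guess_polish_charset(raw: bytes, default: str = "cp1250") -> str:
    """Guess the charset of a Polish post by counting telltale bytes."""
    win_hits = sum(1 for b in raw if b in WIN_BYTES)
    iso_hits = sum(1 for b in raw if b in ISO_BYTES)
    if win_hits > iso_hits:
        return "cp1250"
    if iso_hits > win_hits:
        return "iso-8859-2"
    return default  # no evidence either way: fall back to the configured default


def to_unicode(raw: bytes) -> str:
    """Decode a legacy post using the guessed charset; undefined bytes become U+FFFD."""
    return raw.decode(guess_polish_charset(raw), errors="replace")


legacy_post = "Są źle zakodowane wiadomości".encode("iso-8859-2")
print(guess_polish_charset(legacy_post))
```

Counting hits on both sides, instead of stopping at the first telltale byte, keeps a stray "±" in an otherwise Windows-1250 post from flipping the guess. Posts with no telltale bytes at all fall back to the default, so a manual conversion link would still be needed for the remaining cases.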
[Edited at 2006-02-08 18:13]
| The full story | Feb 9, 2006 |
You can find the full listing of the Polish letter codes (plus the "section sign") in five encodings at http://www.republika.pl/elgec/m2.htm Only CP1250 (aka Windows EE) and ISO-8859-2 (aka ISO Latin 2) are used nowadays, though.
[Edited at 2006-02-09 11:18]
| Andrew Wright (X) United States Local time: 09:50 English TOPIC STARTER Addition to the system | Feb 9, 2006 |
Hello again,

Just now I've added a new feature to the migration system. At the moment, what is happening in this forum is that old data is pulled from the database and converted to Unicode before it is sent to your browser for display. This is nice, because most older posts remain visible. However, the downside to this approach is that old data remains in non-Unicode format in the database, which means that it will not match a search if the search is done in Unicode.

So we needed a way to tell the system that the posts that are automatically converted correctly are legible. But we also didn't want to make this process tedious for whoever was doing it. So today I've added a link next to the "Is this text garbled?" link that reads "Is the text in this post legible?". Clicking this link uses Javascript to let the conversion system know that the automatic conversion was correct, and thanks to the Javascript the user won't actually have to leave the page they are viewing to do it.

If clicking on the link generates any errors on the page, please copy the error and post it in this topic or email it directly to me.

Thanks,
Andrew Wright
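The search problem mentioned above comes down to comparing different byte sequences. Here is a small Python sketch of it, under assumptions that are not confirmed in this topic: that legacy posts are stored as Windows-1250 bytes, that confirmed posts are re-saved as UTF-8, and that search compares bytes directly. The variable names are hypothetical.

```python
word = "książka"  # Polish for "book"; contains ą and ż

stored = word.encode("cp1250")   # legacy bytes as they might sit in the database
query = word.encode("utf-8")     # bytes of a search term submitted in Unicode

# Byte-level search against unconverted data: the same word, different bytes.
found_before = query in stored

# Once a post is confirmed legible, it can be re-saved in Unicode.
migrated = stored.decode("cp1250").encode("utf-8")
found_after = query in migrated

print(found_before, found_after)
```

Converting only at display time leaves `stored` untouched, which is why a confirmation step that allows the post to be re-saved is needed before Unicode searches can match it.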