Russian characters get corrupted after closing a segment
Thread poster: Stefan Gentz
English to German
Jul 22, 2004

I have the following problem with Russian RTFs (FrameMaker S-Tagger RTFs, actually) that I got back from translation: I need to go through the unclean RTF and check each segment. The bad part: when I close a segment, all Russian characters get corrupted (probably a wrong code page). They show up correctly before I open a segment and while the segment is open. Only when I close the segment do they get corrupted.
I have written a very simple Word VBA macro (here) that changes the wrong characters back to the correct Cyrillic ones at the end. Nevertheless, this is not very satisfying.
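
For readers without the macro, the kind of repair it performs can be sketched roughly as follows. This is Python rather than VBA, and it assumes the corruption is the common case of CP 1251 bytes being displayed through the CP 1252 table; the thread does not confirm the exact mapping, and `repair_cyrillic` is a hypothetical name, not the macro linked above.

```python
def repair_cyrillic(text: str) -> str:
    """Re-read characters that were shown via CP 1252 as CP 1251 instead.

    Assumption: the text was stored as CP 1251 bytes and wrongly decoded
    as CP 1252. Anything that does not fit this pattern is left untouched.
    """
    repaired = []
    for ch in text:
        try:
            raw = ch.encode("cp1252")
        except UnicodeEncodeError:
            repaired.append(ch)  # not a CP 1252 character (e.g. real Cyrillic)
            continue
        if raw[0] < 0x80:
            repaired.append(ch)  # plain ASCII such as tags and digits
            continue
        try:
            repaired.append(raw.decode("cp1251"))
        except UnicodeDecodeError:
            repaired.append(ch)  # byte has no meaning in CP 1251, keep as-is
    return "".join(repaired)


print(repair_cyrillic("Ïðèâåò"))  # the bytes CF F0 E8 E2 E5 F2 read as CP 1251
```

Because the function cannot tell a corrupted Cyrillic letter from a genuine Western one (a real "ä" would become "д"), it should only be run on target segments.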

Question: Why does this happen?
I have tried the following steps so far, without success:
* In TMW > File > Setup Fonts: Target Default Font set to "Arial CYR"
* "Translate into target default for all other fonts" activated.
* "Microsoft Office 2003 Language Settings" (the tool found in the Start menu under "Microsoft Office Tools") shows support for Russian in the "Enabled Languages" list box
* WordLang.exe in TRADOS *TT folder shows (of course) the same
* I also started Word + TMW via MS AppLocale with CP 1251 emulation (CYR)

None of these steps was successful, and of course I have tried all possible combinations of them. I changed the font of the style "Normal" to Arial in Word and back to Times New Roman. I also checked the font versions of my Arial (version 2.98, the latest version with CP 1251 support) and Times New Roman (version 2.97, the latest version with CP 1251 support).

Of course, my very first approach was to open the unclean RTFs in TagEditor (as I prefer tag fixing in TE anyway, since it's MUCH more convenient and faster!). But the RTFs have so many errors that I cannot open them in TE (several tw4winMark markers seem to be corrupted as well).

Besides, I have the same problem with Greek, with exactly the same effect.

Frankly, I'm quite at my wits' end.

I'm hoping that someone has a solution for this. I tried to search the forum but did not really get any good matches.

[Edited at 2004-07-23 14:00]


 
Natalie
Poland
Member (2002)
English to Russian
Moderator of this forum
Hi Stefan Jul 22, 2004

Please repost in the Russian forum. This problem was discussed there, but in Russian, which is why the search was of no help.

Best,
Natalia


 
Stefan Gentz
English to German
TOPIC STARTER
Good tip, but I don't speak Russian :-( Jul 22, 2004

Hi Natalie,

thanks so much for your quick reply. Unfortunately, I don't speak Russian at all (I have to work on the files in a technical, not linguistic, way: tag verification and fixing). As the Russian forum is, as far as I can see, completely in Russian, I don't understand a single word or know how to post a question in Russian.
Maybe you could give me a brief summary of what they figured out there? Or post a link if there is an answer in English that might help me? This would be much appreciated!

Thanks again for your help!


 
Victor Sidelnikov
Russian Federation
English to Russian
Check your registry Jul 23, 2004

Stefan, start Regedit and check your registry: HKEY_LOCAL_MACHINE | SYSTEM | CurrentControlSet | Control | Nls | CodePage. If you see the row "1252 - cp_1252.nls", correct it in the following way:
for Win98, change it to "cp_1251.nls"; for Win NT/2000/XP, change it to "c_1251.nls".

Maybe this will help you.

[Edited at 2004-07-23 04:53]


 
Natalie
Poland
Member (2002)
English to Russian
Moderator of this forum
Stefan Jul 23, 2004

I did not mean posting in Russian. I meant posting in the Russian forum, but in English.

 
Stefan Gentz
English to German
TOPIC STARTER
Registry Patch Jul 23, 2004

Hello Victor,

thanks for the registry patch tip. But I'm wondering if you are aware of what it does? It maps all Western characters (CP 1252) to the Russian code page, and it does this for the whole Windows environment. That is anything but acceptable: not only do all Windows applications now show Western special characters wrong, but, more importantly, it results in wrong characters in the German source segments. All German special characters are mapped to Cyrillic characters, which is acceptable neither for me nor for the client.
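
To illustrate the side effect, here is a small Python sketch. It only simulates the remapping by reading CP 1252 bytes through the CP 1251 table; it does not touch the registry, and `reinterpret` is a hypothetical helper name chosen for this example.

```python
def reinterpret(text: str, stored_as: str = "cp1252", read_as: str = "cp1251") -> str:
    """Encode text with one code page and decode the same bytes with another."""
    return text.encode(stored_as).decode(read_as)


# German source text whose umlauts and sharp s live in the upper half of CP 1252
print(reinterpret("Größe, Übung, Käse"))
```

Every byte above 0x7F gets a Cyrillic meaning, so the German special characters come out as Cyrillic letters while plain ASCII letters are unaffected.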

I'm really angry about this bug in TRADOS now, and I'm wondering whether TRADOS really provides no better solution.

Nevertheless, your help and the tip are much appreciated.


 
Ralf Lemster
Germany
English to German
Contacted Trados Support? Jul 23, 2004

Hi Stefan,
Did you contact Trados Support? If not, I will alert them to this thread - just don't want to duplicate things.

Best regards, Ralf


 
Stefan Gentz
English to German
TOPIC STARTER
TRADOS Support Jul 23, 2004

Hi Ralf,

no, I have not contacted them yet. My hunch is that they will tell me to make the registry change. But as mentioned, this is not an acceptable solution due to the negative side effects. Nevertheless, it might be worth a try. Yes, if you could contact them regarding this thread, that would be great.


 
arterm
Serbia
English to Russian
Solution that works for us: change all fonts to Times New Roman Jul 23, 2004

Somehow this bug is resolved in all Trados versions up to 6 just by changing all of the fonts to Times New Roman.
Just select all the text (Select All) and change the font.
You can change it back when you need to.
This tip works with Russian, and I have just tried it again.
However, there is no guarantee it will work on your particular system.
It works well with all tagged texts because the font really does not matter in this kind of document.

[Edited at 2004-07-23 19:02]


 
Ralf Lemster
Germany
English to German
TTX? Jul 26, 2004

Hi again, Stefan,
I got feedback from Trados Support.
Have you tried using TagEditor, but generating TTX files (as opposed to RTF)? Apparently, there's a setting in S-Tagger (which I'm not using myself, so I cannot be more precise here) that defines the output format for the STF file.

Please let me know if that works. If not, please contact me through my profile, so we can exchange files.

Best, Ralf


 
Stefan Gentz
English to German
TOPIC STARTER
Good old Trados support ... Jul 26, 2004

... never knows an answer that isn't out of the box. Odd, but somehow I knew in advance that they would not be able to provide a solution. Though I know quite a few people who have this problem with Russian and Greek, TRADOS' only answer is to use TE.
Obviously they did not even make the effort to read this thread, because otherwise they would have read in my initial posting:

Of course, my very first approach was to open the unclean RTFs in TagEditor (as I prefer tag fixing in TE anyway, since it's MUCH more convenient and faster!). But the RTFs have so many errors that I cannot open them in TE (several tw4winMark markers seem to be corrupted as well).


Anyway, I have gone through the files now and didn't care whether the Cyrillic segments got corrupted after closing the segment. In the end I ran my macro (link posted above) and changed the corrupted CYR characters back to the correct ones.
And yes, of course, using LSP 6.5 I always create TTX files, as I hate getting dirty RTF files back from translation. Unfortunately, not all translators have TE, and others are not willing to work with it. Personally, I absolutely prefer TE and the TTX file format.

Well, conclusion: the problem still exists (even though I was able to fix the broken characters with a self-written macro).
TRADOS Support solution: none.
As always when it goes a little beyond what the user manual tells you anyway.

Anyway, thanks again, Ralf and the others who tried to find a solution, for your help and dedication. Much appreciated!


 
Ralf Lemster
Germany
English to German
You were referring to RTFs... Jul 26, 2004

Hi Stefan,
Your initial post specifically referred to RTFs, which is why I find your reaction (and conclusion) a bit unfair.

Trados offered to check the files if I send them in; since you didn't specify your system details (OS, Office), a remote diagnosis is a bit difficult.

I suggest you contact them directly.

Thanks, Ralf


 
Jerzy Czopik
Germany
Member (2003)
Polish to German
Have you checked such obvious things Jul 26, 2004

as the language of the RTF file you are working with?
If it is set to any language other than Russian (I mean the style "Standard"), this may cause severe problems.
Did you check whether the file possibly has some "Asian" formatting?
Have you set the compatibility of the file to your current Word version?

Font and format changes in the RTF do not affect the target format in any way, so you can change them as you wish.

Does this happen to all segments in that file? Or are only some segments affected? I do not know why that happens, but sometimes it is really easier to work with the file, forget about the character changes, and fix them all afterwards using a macro such as the one you suggest.
This happens very often when I get Word files generated in Asia...

If I were you, I would go for Ralf's solution and send the files to Trados. They do take care and do provide some solutions that are not out of the box. Just give it a try...

Regards
Jerzy


 
Stefan Gentz
English to German
TOPIC STARTER
What I have isolated until now ... Jul 26, 2004

Hi Jerzy and Ralf,

I have not sent the file to TRADOS yet, because I heard from a friend that TRADOS has no solution for this problem. Also, the only suggestion they currently have is to apply the registry code page patch (they even have a support document for this online), which is not acceptable for me.

Jerzy, many thanks for your suggestions. Good points. I have now done the following:

* Set Paragraph Language of Style "Normal" in translated RTF to Russian
* Set Paragraph Language of Style "Normal" in TRADOS6.dot to Russian
* Set Paragraph Language of Style "Normal" in normal.dot to Russian
* Set ActiveDoc Compatibility to Word 2003

None of this solved the problem.

But on closer inspection I noticed the following:

1. If there is only one TU in a paragraph, the characters do not get corrupted, but

2. if you open the same TU again, the characters get corrupted in the open TU (and not only after closing the TU).

3. If there is more than one TU (e.g. two full sentences), opening and closing the first segment does not corrupt the characters. After opening and closing the second segment, all CYR characters in both TUs get corrupted.


Asian characters do not appear (although I know that problem, too! It is an issue I reported to TRADOS Support some time ago. Their answer (Mr. Luis Lopes, April 21, 2004) was: "We don't have this problem here. I cannot explain why this problem happens." Yeah, that's what I call great support!).

Anyway, I finally tried the simplest thing of all: I opened the original source-language RTFs that were sent to the Russian translator and retranslated them with TMW. Unbelievable, but in these RTFs everything worked fine, with no character corruption at all. So something must have happened to the internal encoding of the files during translation.

I created a new empty document, copied and pasted the content (Ctrl-A) from the translated RTF into it, and saved the new one. Surprise: in this RTF I was able to work properly, and open, close, and reopen TUs without any problems. No more character corruption at all! This is bizarre, isn't it?

My hunch now was that the font encoding information in the RTF had somehow been corrupted. I went ahead and checked the RTF header in the source code of the RTF. As expected, the font table ({\fonttbl *** }) contained tons of fonts, including several Asian fonts like MS Mincho. I used "Save As" to write the RTF to a new file in order to clean up the RTF header a little, which reduced the file size from about 600 KB to about 300 KB.
After this I simply copied the complete fonttbl section from a "good" RTF into the "bad" RTF. Then I opened the problematic RTF again and opened and closed a few segments. You can imagine how surprised I was to see that everything worked fine now and that there was no more character corruption at all.
I have created a readable RTF font table header that shows the differences: RTF Header Compare.
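
For anyone who wants to inspect a font table without reading the raw RTF, a minimal Python sketch along these lines may help. It is not the comparison linked above; `extract_group`, `list_fonts`, and the sample RTF are made up for illustration, and the sketch assumes each font entry is wrapped in its own braces, as Word writes them. The charset numbers are the standard `\fcharset` values (0 Western, 161 Greek, 204 Russian, 128 Japanese).

```python
import re

CHARSETS = {0: "ANSI (Western)", 128: "Shift-JIS (Japanese)",
            161: "Greek", 204: "Russian", 238: "Eastern European"}


def extract_group(rtf: str, start: int) -> str:
    """Return the brace-delimited group that begins at rtf[start] == '{'."""
    depth, i = 0, start
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it introduces
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return rtf[start:i + 1]
        i += 1
    raise ValueError("unbalanced braces in RTF")


def list_fonts(rtf: str):
    """Return (font number, charset or None, font name) per font table entry."""
    start = rtf.find("{\\fonttbl")
    if start == -1:
        return []
    table = extract_group(rtf, start)
    fonts, pos = [], len("{\\fonttbl")
    while (pos := table.find("{", pos)) != -1:
        entry = extract_group(table, pos)
        pos += len(entry)
        number = re.search(r"\\f(\d+)", entry)
        if number is None:
            continue
        charset = re.search(r"\\fcharset(\d+)", entry)
        name = re.sub(r"\{[^{}]*\}", "", entry[1:-1])    # drop nested groups
        name = re.sub(r"\\[a-z]+-?\d* ?", "", name)      # drop control words
        fonts.append((int(number.group(1)),
                      int(charset.group(1)) if charset else None,
                      name.strip().rstrip(";")))
    return fonts


sample = (
    r"{\rtf1\ansi\deff0"
    r"{\fonttbl"
    r"{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}"
    r"{\f1\fswiss\fcharset204\fprq2 Arial CYR;}"
    r"{\f2\fmodern\fcharset128\fprq1 MS Mincho;}"
    r"}"
    r"\pard\f1 text\par}"
)
for number, charset, name in list_fonts(sample):
    print(number, charset, CHARSETS.get(charset, "other"), name)
```

Running the same listing on a "good" and a "bad" RTF and comparing the two outputs shows which fonts carry an unexpected charset.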


Conclusion

Besides this very in-depth, technical approach at the RTF source code level, probably the easiest solution is to create a new empty document, copy and paste the complete content from the problem RTF into it, and save the new document as RTF. I have tested it with a couple of files and it worked like a charm.

Please also note that I have only tested this on a German Windows XP with US Word 2003. It might not work with earlier Word versions or on other operating systems. You might give it a try, though.

Cheers


 

