SDL Trados 2011 not recognizing the correct Romanian special characters
Thread poster: Daniel Grigoras
Daniel Grigoras
Daniel Grigoras  Identity Verified
Romania
Romanian to English
+ ...
Apr 12, 2013

Hello,

I hope SDL Trados will take note of and fix the error that I am just about to report.

Background information

Prior to Windows Vista, the Romanian special characters "ș" and "ț" were wrongly rendered with a cedilla, as "ş" and "ţ", instead of with a comma below. This error and its correction are explained on Wikipedia:

"Many printed and online texts still wrongfully use "s with cedilla" and "t with cedilla"... The lack of support for the comma diacritics has been corrected in current versions of major operating systems: Windows Vista or newer... As mandated by the European Union, Microsoft released a font update to correct this deficiency in Windows XP in early 2007, soon after Romania joined the European Union [...] Writing letters ș and ț with a cedilla instead of a comma is considered incorrect by the Romanian Academy. Romanian writings, including books created to teach children to write, treat the comma and cedilla as a variation in font." (http://en.wikipedia.org/wiki/Romanian_alphabet ; http://en.wikipedia.org/wiki/Romanian_alphabet#Comma-below_.28.C8.99_and_.C8.9B.29_versus_cedilla_.28.C5.9F_and_.C5.A3.29)


So although the European Union, acting on the notifications it received, demanded that this error be fixed, many software products (e.g. MemoQ) still lack support.
On Windows XP, in order to use, or at least see, these characters in their correct form, one has to install the "European Union Expansion Font Update", which can be found here: http://www.microsoft.com/en-us/download/details.aspx?id=16083.
This should enable you to see the proper characters.
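
For anyone who wants to check which variant a given text actually contains, here is a minimal Python sketch. It relies only on the standard `unicodedata` module and on the Unicode code points of the four cedilla letters and their comma-below counterparts; the helper names `describe` and `find_cedilla_forms` are made up for this illustration.

```python
import unicodedata

# Legacy cedilla forms and the comma-below forms that are correct for Romanian.
CEDILLA = "\u015f\u0163\u015e\u0162"      # ş ţ Ş Ţ
COMMA_BELOW = "\u0219\u021b\u0218\u021a"  # ș ț Ș Ț


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def find_cedilla_forms(text):
    """Return (index, character) for every legacy cedilla letter in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in CEDILLA]


if __name__ == "__main__":
    # Print the code point and official Unicode name of each letter.
    for ch in CEDILLA + COMMA_BELOW:
        print(ch, describe(ch))
```

The two forms look almost identical in many fonts, so comparing code points is more reliable than comparing them by eye.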

If one would also like to be able to input these characters in Windows XP, then after installing that Expansion Font Update one has to install an additional driver in order to use the correct "Standard Romanian Keyboard" instead of the incorrect one, called "Legacy". To do this, download and install kbdro_2.3.exe from http://www.secarica.ro/html/ro_kbd_winxp.html and then go to Control Panel -> Date, Time, Language, and Regional Options -> Regional and Language Options -> Languages -> Details -> Add -> and here choose as "Input language:" Romanian, and as "Keyboard Layout/IME:" Romanian (Standard).

If you are using Windows Vista or newer (I use Windows 7 x64), then all you have to do in order to use the correct Romanian characters is to go to Control Panel -> Clock, Language, and Regional -> Change keyboards or other input methods -> Change keyboards -> Add -> and here choose Romanian (Standard).

The issue

Well, to put it briefly, SDL Trados 2011 has not taken note of this issue, and because I am using the proper keyboard layout and I insert the correct Romanian characters, the words that contain the correct form of "ș" and "ț" are underlined as misspelled.

SDL, please show respect to your paying customers and fix this simple but crippling issue.

[Edited at 2013-04-13 07:30 GMT]


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 05:36
French to Polish
+ ...
hunspell encoding problem Apr 13, 2013

Darius Daniel Grigoras wrote:

(...)

The issue

Well, to put it briefly, SDL Trados 2011 has not taken note of this issue, and because I am using the proper keyboard layout and I insert the correct Romanian characters, the words that contain the correct form of "ș" and "ț" are underlined as misspelled.

SDL, please show respect to your paying customers and fix this simple but crippling issue.


By default, Studio uses hunspell spellchecker files encoded in an outdated code page, i.e. ISO 8859-2.
This setup will never work correctly unless the files are recoded as UTF-8.
You can try it yourself: recode the ro_RO.aff and ro_RO.dic files as UTF-8 (you must also declare UTF-8 in the first line of the ro_RO.aff file; some other steps may be necessary), then globally replace t-cedilla with t-comma, etc.
So it should be relatively easy to fix, but nobody cares.
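
The limitation can be seen directly in Python: its standard `iso8859_2` codec can encode the cedilla letters but not the comma-below ones, which is presumably why a dictionary stored in that code page cannot list the correct forms. This is only a small illustration, and the helper name `can_encode` is invented for it.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    samples = {
        "s with cedilla": "\u015f",
        "s with comma below": "\u0219",
        "t with cedilla": "\u0163",
        "t with comma below": "\u021b",
    }
    # Compare the legacy code page with UTF-8 for each letter.
    for label, ch in samples.items():
        print(label, can_encode(ch, "iso8859_2"), can_encode(ch, "utf-8"))
```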

BTW, it's a common problem with Romanian in hunspell; e.g. I'm sure the earlier memoQ versions had exactly the same problem. It was reported by some people 3 years ago, AFAIR.
I don't know if it's solved in memoQ now (my Romanian is too poor, I never tried to use CAT tools for Romanian, so I didn't test it).
Generally, I'm still amazed that the CAT technology providers have no basic idea of some very common language-related problems.

As a workaround, you can use the MS Word spellchecker instead.
Tools, Options, Editor, Spelling, Active Spell Checker.

Of course, it will work only if one has the Romanian spellchecker module in Word, but I suppose you have it.

Cheers
GG


 
Daniel Grigoras
Daniel Grigoras  Identity Verified
Romania
Romanian to English
+ ...
TOPIC STARTER
Hunspell Apr 13, 2013

Thanks Grzegorz


I've tried "re-encoding" (actually manually editing ro_RO.aff and Search&Replacing ro_RO.dic with Notepad++), but I received the following error:



Using the MS Word dictionary seems to work though.

About MemoQ, I remember trialing it in November last year, because I was working on a large project and my main CAT tool back then, Fluency, was not up to the job. Fluency had serious problems importing documents with complex formatting, mainly because of MS Word junk or rogue code (maybe meant to make it incompatible with Open/LibreOffice), which I later found out can be cleaned using a macro called CodeZapper, which Atril's Déjà Vu employs.
Nonetheless, I was in the middle of a very large project all on my own, converting dead PDFs to MS Word, formatting, then importing into Fluency to translate, but as I said, Fluency was not up to the job. MemoQ worked fine, but when I exported my translated documents, the special characters s-comma and t-comma were replaced with squares. Now that could easily have been rectified with Search-Replace, but the squares were all the same, so I had worked in vain.
Anyway, I ended up being very disappointed with MemoQ because of this.
So it was a really sad story. Fluency even stopped working after a short power outage: when I restarted my computer, it wouldn't start. I contacted Fluency support and asked how this could be possible, and they told me that there was probably some locked DLL. I had to uninstall and reinstall Fluency.
Because of the fiasco Fluency caused me, and because MemoQ's trial version also failed to help back then, I recently decided to buy SDL Trados, which is far better, but could be better still.

[Edited at 2013-04-13 08:15 GMT]


 
Alina - Maria Chiteala
Alina - Maria Chiteala  Identity Verified
Romania
Local time: 06:36
Member (2011)
English to Romanian
+ ...
Issues Apr 13, 2013

Having tested MemoQ, Fluency, Wordfast, Across, and many others, I still find Trados to be better than all of them, even if some colleagues might dispute that.
So use Word spelling and you should be just fine.


 
Daniel Grigoras
Daniel Grigoras  Identity Verified
Romania
Romanian to English
+ ...
TOPIC STARTER
LibreOffice's spellchecker Apr 13, 2013

I agree Maria.

As I also own a Wordfast Studio license, I just checked its spellchecker and found out that it uses the correct Romanian characters. Replacing SDL Trados' ro_RO.aff and ro_RO.dic with Wordfast's turned out to work just fine with Trados' Hunspell.

So much for the re-encoding.

I also just now checked LibreOffice's spellchecking files, and I think they might be the source of Wordfast's spellchecker. Anyway, LibreOffice's dictionary seems to be about 30,000 words (actually lines) larger, so I replaced Trados's spellchecking files again, this time with LibreOffice's, and I haven't received any errors so far.

ro_RO.dic - lines in Notepad++ (lines may not be quite the same thing as words):

  • SDL Trados': 33,277

  • Wordfast's: 153,526

  • LibreOffice's: 180,889

  • Fluency's: uses that of LibreOffice

  • Adobe's: uses that of LibreOffice but only has 170,039
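
A rough way to reproduce such counts without Notepad++ is sketched below. It assumes the usual hunspell layout, in which the first line of the .dic file holds the (approximate) number of entries and each following non-empty line is one entry; `count_dic_entries` is a hypothetical helper name, and the sample text is invented.

```python
def count_dic_entries(dic_text):
    """Count entry lines in the text of a hunspell .dic file.

    The leading line is skipped when it is just a number, because it is
    the declared entry count rather than a word.
    """
    lines = dic_text.splitlines()
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]
    return sum(1 for line in lines if line.strip())


if __name__ == "__main__":
    sample = "3\nloc/A\nzvon\nzvonindu\n"
    print(count_dic_entries(sample))
```

To use it on a real file, read the file with the encoding declared in the matching .aff file and pass the resulting string to the function.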



[Edited at 2013-04-13 08:16 GMT]


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 05:36
French to Polish
+ ...
Recipe Apr 13, 2013

Darius Daniel Grigoras wrote:

I've tried "re-encoding" (actually manually editing ro_RO.aff and Search&Replacing ro_RO.dic with Notepad++), but I received the following error:



You did something wrong, probably the encoding or the encoding statement is incorrect.
It works here.



You should open both files in Notepad++, select Encoding, Convert to UTF-8 without BOM, then perform the global search and replace in both files.

In the ro_RO.aff file, the first line should be:
SET UTF-8
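
The recipe above can also be scripted. The following is only a sketch under a few assumptions: the original files really are ISO 8859-2, the aff file declares its encoding on a line starting with `SET`, and all function names are made up for this example. For the aff file it rewrites line endings as LF. Keep a backup of the original files before trying it.

```python
from pathlib import Path

# Map the legacy cedilla letters to the comma-below letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # ş -> ș
    "\u0163": "\u021b",  # ţ -> ț
    "\u015e": "\u0218",  # Ş -> Ș
    "\u0162": "\u021a",  # Ţ -> Ț
})


def declare_utf8(aff_text):
    """Replace the first SET line with 'SET UTF-8', or add one at the top."""
    lines = aff_text.splitlines()
    for i, line in enumerate(lines):
        if line.split()[:1] == ["SET"]:
            lines[i] = "SET UTF-8"
            break
    else:
        lines.insert(0, "SET UTF-8")
    return "\n".join(lines) + "\n"


def convert_bytes(raw, is_aff):
    """Decode ISO 8859-2 bytes, fix the letters, return UTF-8 without BOM."""
    text = raw.decode("iso8859_2").translate(CEDILLA_TO_COMMA)
    if is_aff:
        text = declare_utf8(text)
    return text.encode("utf-8")  # the plain utf-8 codec writes no BOM


def convert_file(src, dst):
    """Convert one hunspell file; .aff files also get the SET line fixed."""
    src, dst = Path(src), Path(dst)
    dst.write_bytes(convert_bytes(src.read_bytes(), src.suffix == ".aff"))
```

A call such as `convert_file("ro_RO.aff", "ro_RO.utf8.aff")` writes the converted copy next to the original, which can then be checked in Notepad++ before replacing anything.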

The same operation will enable the correct handling of Romanian characters in memoQ.
Like the Trados guys, they were unable to find this simple solution for at least 4 years....

About MemoQ, I remember trialing it in November last year, (...)

It was a bad period for them.
They introduced a lot of important changes and improvements but the software got somewhat unstable.

Cheers
GG

[Edited at 2013-04-13 09:28 GMT]


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 05:36
French to Polish
+ ...
The best one :) Apr 13, 2013

Alina - Maria Chiteala wrote:

Having tested MemoQ, Fluency, Wordfast, Across, and many others, I still find Trados to be better than all of them, even if some colleagues might dispute that.

Indeed
DVX2 is the best

Cheers
GG


 
Daniel Grigoras
Daniel Grigoras  Identity Verified
Romania
Romanian to English
+ ...
TOPIC STARTER
Notepad++ Apr 13, 2013

Thanks Grzegorz

I knew about the encoding, and I knew that Notepad++ sometimes doesn't need to be told what encoding to use because it automatically picks the encoding needed to preserve the correct characters. That was true for ro_RO.dic, because there I did not make the replacements in the file itself but created a new file with the correct characters, whose encoding Notepad++ automatically set to "ANSI as UTF-8" (but probably not without BOM).
But the conversion is the only solution for ro_RO.aff and probably the best overall.

About MemoQ, indeed, besides the character corruption, I also remember receiving numerous strange and crippling errors that made me think very badly of MemoQ.

Regarding Déjà Vu, I haven't tried it yet, but the developer of CodeZapper seems to prefer it.

Anyway, I don't quite see now the use of conversion, as LibreOffice's spell-checking files seem to work very well with Trados. Nonetheless, I followed your instructions and tested it: it works!

I now wonder whether I should use LibreOffice's spell-checking files with SDL Trados even for English, not only Romanian.

[Edited at 2013-04-13 10:07 GMT]


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 05:36
French to Polish
+ ...
Replacing hunspell files... Apr 13, 2013

Darius Daniel Grigoras wrote:

As I also own a Wordfast Studio license, I just checked its spellchecker and found out that it uses the correct Romanian characters. Replacing SDL Trados' ro_RO.aff and ro_RO.dic with Wordfast's turned out to work just fine with Trados' Hunspell.


Yes, these files can be replaced with no problem; the hunspell engine is basically the same and should handle 'em correctly.
Nonetheless, it's somewhat annoying that the same files are stored several times on the HDD.

ro_RO.dic - lines in Notepad++ (lines may not be quite the same thing as words):

  • SDL Trados': 33,277

  • Wordfast's: 153,526

  • LibreOffice's: 180,889 (...)



Basically, one line is one word, but some lines may contain "irregular" forms which can't be created using the rules in the aff file.
The LibreOffice dictionary is simply bigger.

Cheers
GG

[Edited at 2013-04-13 10:43 GMT]


 
Daniel Grigoras
Daniel Grigoras  Identity Verified
Romania
Romanian to English
+ ...
TOPIC STARTER
(Ir)regular forms Apr 13, 2013

Why would SDL Trados' Hunspell be unable to use the rules in the *.aff file I've copied from LibreOffice?

 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 05:36
French to Polish
+ ...
BOM, LibreOffice and some off topics... Apr 13, 2013

Darius Daniel Grigoras wrote:

I knew about the encoding, and I knew that Notepad++ sometimes doesn't need to be told what encoding it needs to use because it automatically uses the encoding needed to preserve the correct characters.

The problem is that the hunspell files use the Linux convention, i.e. the files have no BOM.
Normally a Windows program can't automatically detect this kind of encoding (the encoding detection is based on the BOM).
That's why it's important to handle it manually.
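
Whether a file starts with a BOM is easy to check from the raw bytes. Here is a minimal sketch using the `BOM_UTF8` constant from Python's standard `codecs` module; the function names are invented for this example.

```python
import codecs


def has_utf8_bom(raw):
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return raw.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(raw):
    """Return the bytes without a leading UTF-8 BOM, if there is one."""
    return raw[len(codecs.BOM_UTF8):] if has_utf8_bom(raw) else raw


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + b"SET UTF-8\n"
    print(has_utf8_bom(with_bom), strip_utf8_bom(with_bom))
```

Reading the file in binary mode (for example with `open(path, "rb")`) and passing the bytes to `has_utf8_bom` shows whether an editor has added a BOM when saving.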

About MemoQ, indeed, besides the character corruption, I also remember receiving numerous strange and crippling errors that made me think very badly of MemoQ.

Well, memoQ probably has the best term recognition algorithms for languages like Romanian (postponed article; e.g. in Trados it's impossible to get a hit for "locuri" when "loc" is defined in the termbase, whereas memoQ finds it).
If one relies heavily on terminology, it may make sense.
As memoQ is my secondary tool (after DVX) and it represents max. 10% of my work time (Studio is approx. 5%), I can't really judge whether it's more or less stable than Studio; it's basically the same for me.

Regarding Déjà Vu, I haven't tried it yet, but the developer of CodeZapper seems to prefer it.

Because it's the best
Seriously speaking, its main advantages are a very high automation level, a machine translation approach (in fact, DVX is a simple machine translation engine), multiple instances (e.g. I can simultaneously translate and pretranslate the same file) and easy networking (I work with my wife; we simply point to the same project on the LAN and can work on the same file).
Of course, every tool has its quirks, which is why I also use memoQ, the old Trados or Studio.

Anyway, I don't quite see now the use of conversion, as LibreOffice's spell-checking files seem to work very well with Trados.

In fact, it's the easiest solution...
I should have thought about it earlier...

I now wonder whether I should use LibreOffice's spell-checking files with SDL Trados even for English, not only Romanian.

In fact, anything is better than these outdated hunspell dictionaries shipped with Studio.
I just checked: the latest hunspell version shipped with Studio 2011 SP2R is exactly the same as the version shipped initially with Studio 2009, so it's at least 4 years old, probably far more.
IMO you should use the LibreOffice version.
Fără ezitare (without hesitation)

Cheers
GG


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 05:36
French to Polish
+ ...
Rules Apr 13, 2013

Darius Daniel Grigoras wrote:

Why would SDL Trados' Hunspell be unable to use the rules in the *.aff file I've copied from LibreOffice?


No, it will use it.

Probably I was imprecise.
I would say the general problem is that some aff rules are too simple.
Let's say "domn" and "doamna" are basically the same word for me, because the o/oa alternation is obvious in Romanian, but hunspell doesn't handle this kind of alternation (in the stem), which is why both forms are included as separate entries.

This example is maybe stupid, but it shows that implementing some rules is too difficult and it's simply easier to list a series of "difficult" forms explicitly; e.g. at the end of the LO dictionary you have a long series of gerund (gerunziu) forms which the Studio dictionary lacks.
E.g., as the rules in the original aff file from Studio are... huh... primitive, by default Studio will be unable to create "zvonindu" from "zvon" and will flag it as a spelling error.
The LO dictionary explicitly contains "zvonindu", and LO will not throw a spelling error although it can't generate this form using the aff rules.
I selected "zvonindu" because it's the last entry in the Studio hunspell dictionary; I was too lazy to find better examples.
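
To make the idea concrete, here is a toy model in Python. It is not hunspell's real algorithm or file format: the flag "P", the single suffix rule and the three entries are all hypothetical, chosen only to show that a form is accepted either because a rule generates it from a listed word or because it is listed explicitly.

```python
# Hypothetical rules: flag -> list of (suffix to strip, suffix to add).
RULES = {"P": [("", "uri")]}  # e.g. loc -> locuri

# Hypothetical dictionary: word -> flags attached to it.
ENTRIES = {"loc": "P", "zvon": "", "zvonindu": ""}


def accepted_forms(entries, rules):
    """Return every form that is listed or generated by a suffix rule."""
    forms = set()
    for word, flags in entries.items():
        forms.add(word)  # listed words are always accepted
        for flag in flags:
            for strip, add in rules.get(flag, []):
                if word.endswith(strip):
                    stem = word[: len(word) - len(strip)]
                    forms.add(stem + add)
    return forms


if __name__ == "__main__":
    forms = accepted_forms(ENTRIES, RULES)
    for candidate in ("locuri", "zvonindu", "zvonind"):
        print(candidate, candidate in forms)
```

In this toy setup "locuri" is accepted through the rule and "zvonindu" only because it is listed; "zvonind" is rejected because no rule produces it and it is not listed.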

In a few words, the dictionaries from LO are far better.

Cheers
GG


 
Daniel Grigoras
Daniel Grigoras  Identity Verified
Romania
Romanian to English
+ ...
TOPIC STARTER
Déjà Vu & memoQ Apr 13, 2013

I admit that when I trialled memoQ I was surprised by its suggestion feature and spell-checker. I remember this very well. Nonetheless, the character corruption upon exporting the translated document, and some subsequent errors that prevented me from continuing work made me change my mind. But that was back then. MemoQ is still an impressive tool.

It's sad though that I find so late that Atril's Déjà Vu would be the best. Anyway, SDL Trados is required by one of my clients, and many other potential ones.

With Fluency I had a very bad experience. I ended up defaulting on a large project and lost $2600. Well, actually I lost that client (my best client), so I lost a lot more money. The project itself was larger in value, so I managed to send the first instalment, but I gave up on the last two, as I was mentally exhausted by all the work and especially by the stress caused by the numerous errors. I thought of suing Western Standard, the Utah-based company that developed Fluency for its in-house needs but then thought of competing with the other CAT tools on the market.
Word itself wasn't better (as Fluency and MemoQ failed to help me, I had to work in Word). I had numerous similar tables that shared a common, though large, heading, and if I used copy-paste, the highly complex table would be garbled no matter the paste option, even "Keep text only." So thanks to Microsoft and Western Standard I lost a great deal of money.

About "zvonindu," that's not actually a proper word. The proper gerund would be "zvonind" (with the noun "zvon" as its root), but "zvonindu" is used to form contracted forms like "zvonindu-i," "zvonindu-le," the part after the hyphen being the pronoun. "Zvonindu" cannot stand on its own.

[Edited at 2013-04-13 12:48 GMT]


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 05:36
French to Polish
+ ...
Jingle bells :) Apr 13, 2013

Darius Daniel Grigoras wrote:

I admit that when I trialled memoQ I was surprised by its suggestion feature and spell-checker. I remember this very well. Nonetheless, the character corruption upon exporting the translated document, (...)

In my normal workflow, I always test the export at the very beginning.
If something goes wrong, I know it immediately and I can try to find some workaround before it gets critical.
It's a good practice, independently of the CAT tool.

It's sad though that I find so late that Atril's Déjà Vu would be the best.
Anyway, SDL Trados is required by one of my clients, and many other potential ones.

In fact, Trados (the "classic" one and Studio) represents approx. 85%, maybe 90% of my jobs.
That's why I have a Trados license, in order to process 'em properly.
But I simply reimport the Trados files and translate 'em in DVX.

With Fluency I had a very bad experience. (...)

I tested it only very briefly, so I can't really comment on it.
It was not the way I would work, so I dropped it very rapidly.

About "zvonindu," that's not actually a proper word. The proper gerund would be "zvonind" (with the noun "zvon" as its root), but "zvonindu" is used to form contracted forms like "zvonindu-i," "zvonindu-le," the part after the hyphen being the pronoun. "Zvonindu" cannot stand on its own.

You know it better
I never learned Romanian grammar; I just use it "as is" on a very low level, i.e. spoken Romanian is a PITA for me but I read it pretty well "comparandu-le cu" (comparing them with) other Romance languages.
At least I learned something today.
BTW, the Slavic background helps a lot, e.g. in Polish, "zvon" corresponds to "dzwon", i.e. bell, with some similar connotations (ringing, rumour, noise...).

Cheers
GG

[Edited at 2013-04-13 13:58 GMT]


 

