Hungarian/Estonian translations in Trados
Thread poster: Studio Moderna

Studio Moderna
Slovenia
Local time: 15:39
Apr 26, 2013

Dear translators,

I am writing on behalf of a company that is interested in using the Trados translation tool. We would use it for localizing texts from English into 21 different languages.
Apparently there is a problem using Trados for Hungarian and Estonian (agglutinative languages). Does anyone have experience translating in Trados with those languages?

Thank you for your answers.


 

Agris Koppel  Identity Verified
Local time: 16:39
English to Estonian
+ ...
Estonian Apr 26, 2013

Hi!
What do you mean by problems? I translate into Estonian every day with different CAT tools, including Trados, and have not found any special problems. Of course, there are issues regarding sentence structure or the use of glossaries, but they just need to be clarified with the client.
Therefore, the fear regarding Estonian is not justified :)
With best regards,
Agris


 

Grzegorz Gryc  Identity Verified
Local time: 15:39
French to Polish
+ ...
Terminology recognition... Apr 26, 2013

SMtest wrote:

I am writing on behalf of a company that is interested in using the Trados translation tool. We would use it for localizing texts from English into 21 different languages.
Apparently there is a problem using Trados for Hungarian and Estonian (agglutinative languages). Does anyone have experience translating in Trados with those languages?


The problem arises when one translates from Estonian, Finnish, Hungarian, etc.
The terminology recognition basically doesn't work because the term recognition algorithms in MultiTerm are not suitable for agglutinative languages.
So, e.g., the QA for terminology will not work properly and will throw gazillions of false alarms.
It's one of the reasons why many Finnish/Hungarian translators prefer to use e.g. Wordfast Classic or memoQ, where the terminology recognition is based on the stemming principle.

Cheers
GG


 

Heinrich Pesch  Identity Verified
Finland
Local time: 16:39
Member (2003)
Finnish to German
+ ...
That is not a problem Apr 26, 2013

You need a translator of those languages to use the software. Did you think it could be done automatically? You can use MT, but the output will not be perfect, not for any language combination. There is no problem using translation environment tools like Trados or Studio etc. with Estonian, Hungarian or Finnish.
False alarms with term recognition in QA? There is nothing you can do about it. Translation is about meaning, not single words.

[Edited at 2013-04-26 10:24 GMT]


 

Studio Moderna
Slovenia
Local time: 15:39
TOPIC STARTER
not automatic Apr 26, 2013

Heinrich Pesch wrote:

You need a translator of those languages to use the software. Did you think it could be done automatically? You can use MT, but the output will not be perfect, not for any language combination. There is no problem using translation environment tools like Trados or Studio etc. with Estonian, Hungarian or Finnish.
False alarms with term recognition in QA? There is nothing you can do about it. Translation is about meaning, not single words.

[Edited at 2013-04-26 10:24 GMT]



Of course we are perfectly aware we need an actual translator for Trados. We just want to make sure Trados is the best way to go with these languages.

Regards


 

Veronika Varep
Estonia
Local time: 16:39
Hungarian to Estonian
+ ...
I haven't noticed any problems Apr 26, 2013

I use Trados Studio 2011 for translating EN-ET/ET-EN and HU-ET/ET-HU and have never faced any problems.

I strongly agree with Heinrich: the software never does the job automatically. You need a translator to do this.


 

Heinrich Pesch  Identity Verified
Finland
Local time: 16:39
Member (2003)
Finnish to German
+ ...
Depends on the glossary Apr 27, 2013

The solution is not to enter complete words into the glossary/multiterm, but only the root. Example:

en building
fi rakennus

If you enter these into your glossary and want to make sure the translation always has "rakennus" wherever the English source has "building", you will fail, because you'll catch only the nominative case. If the source has "in the building", the Finnish will be "rakennuksessa", so your search will not find "rakennus" and will display an error.

So you should enter only "rakennu" instead of "rakennus".

But this trick solves only part of the problem. "Main building" would be "päärakennus", and your search might not find it. But this applies also to German.
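
A minimal Python sketch of the root trick described in this post, assuming the terminology check is nothing more than a case-insensitive substring search. That is an assumption made for illustration, since real QA checkers are more elaborate, and the function name `target_has_term` is hypothetical.

```python
def target_has_term(target_segment: str, term: str) -> bool:
    """Naive QA check: does the target text contain the glossary entry?"""
    return term.lower() in target_segment.lower()


# Full nominative form: the inessive case "rakennuksessa" is missed.
print(target_has_term("Hän on rakennuksessa.", "rakennus"))    # False

# Truncated root: the inflected form is found.
print(target_has_term("Hän on rakennuksessa.", "rakennu"))     # True

# A pure substring search also finds the compound "päärakennus".
# A checker that only compares from the start of each word would not.
print(target_has_term("Päärakennus on suljettu.", "rakennu"))  # True
```

Whether a compound such as "päärakennus" is found therefore depends on whether the tool searches inside words or only from their beginning.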


 

Grzegorz Gryc  Identity Verified
Local time: 15:39
French to Polish
+ ...
Algorithms Apr 27, 2013

Heinrich Pesch wrote:

The solution is not to enter complete words into the glossary/multiterm, but only the root. Example:

en building
fi rakennus

If you enter these into your glossary and want to make sure the translation always has "rakennus" wherever the English source has "building", you will fail, because you'll catch only the nominative case.

MultiTerm may catch this kind of term, but it depends on the fuzziness level and on the term length.
E.g. it will not work for shorter words: the MultiTerm algorithm is trigram based and must have enough trigrams in order to start working correctly.
That's the reason why the MultiTerm guys were unable to detect 2-letter words (e.g. acronyms); this feature was added very recently (as an exact search, i.e. the search bypasses the standard algorithm).
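
To illustrate why a trigram-based comparison struggles with short words and long inflected forms, here is a rough Python sketch. It is not MultiTerm's actual algorithm, which is not documented in this thread. It simply scores two words with a Dice coefficient over their character trigrams, and the function names are hypothetical.

```python
def trigrams(word: str) -> set:
    """Return the set of character trigrams in a word (no padding)."""
    w = word.lower()
    return {w[i:i + 3] for i in range(len(w) - 2)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient over trigram sets; 0.0 if either word has none."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


# A 2-letter word produces no trigrams at all, so nothing can be compared.
print(trigrams("EU"))  # set()

# The longer the inflected form, the lower the score against the base form.
print(dice_similarity("rakennus", "rakennuksen"))
print(dice_similarity("rakennus", "rakennuksessa"))
```

With any fixed similarity threshold, a sufficiently long agglutinated form eventually drops below it, even though it is clearly the same term to a human reader.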

If in the source is "in the building", in Finnish it will be "rakennuksessa", so your search will not find "rakennus" and display an error.

That's why I said memoQ is far better here.
In memoQ, you can enter rakennu|s in your glossary and memoQ will catch'em all 'cause the pipe separates the invariable and variable part of the term.
I deliberately don't use linguistic terms because the invariable and variable parts do not always correspond to the stem/endings/suffixes etc.

So you should enter only "rakennu" instead of "rakennus".

But this trick solves only part of the problem. "Main building" would be "päärakennus", and your search might not find it. But this applies also to German.

Wordfast Classic does, AFAIK.
You can enter *rakennu|s and a word like "päärakennuksessa" should also be recognized.
It's one of the very few CAT tools able to detect prefixed terms.
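
The pipe and asterisk notations mentioned above can be sketched in Python as follows. This is only an illustration under assumed semantics (text before `|` must match literally, anything may follow it, and a leading `*` allows anything before it). It is not how memoQ or Wordfast Classic are implemented, and `entry_to_regex` and `find_term` are hypothetical names. The sketch handles single-word entries only.

```python
import re


def entry_to_regex(entry: str) -> "re.Pattern":
    """Translate an entry such as 'rakennu|s' or '*rakennu|s' into a regex."""
    allow_prefix = entry.startswith("*")
    if allow_prefix:
        entry = entry[1:]
    if "|" in entry:
        stem = entry.split("|", 1)[0]   # invariable part
        tail = r"\w*"                   # variable part: any ending
    else:
        stem, tail = entry, ""          # no pipe: exact word only
    head = r"\w*" if allow_prefix else ""
    return re.compile(r"\b" + head + re.escape(stem) + tail + r"\b",
                      re.IGNORECASE)


def find_term(entry: str, text: str):
    """Return the matched word, or None if the entry is not recognized."""
    m = entry_to_regex(entry).search(text)
    return m.group(0) if m else None


print(find_term("rakennus", "Hän on rakennuksessa."))       # None
print(find_term("rakennu|s", "Hän on rakennuksessa."))      # rakennuksessa
print(find_term("rakennu|s", "Hän on päärakennuksessa."))   # None
print(find_term("*rakennu|s", "Hän on päärakennuksessa."))  # päärakennuksessa
```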

If one heavily uses glossaries, a proper terminology recognition is a must.
That's why most HU/FI translators I know personally prefer memoQ or Wordfast Classic: MultiTerm is basically not suitable for agglutinative languages, as the algorithm was designed for inflecting languages like German.
I.e. if the word to be detected is approx. 55% longer than the word in the termbase (which is very frequent for agglutinative languages...), MultiTerm will not find it.

BTW, most of these EN/FI translators I know also have a Trados license and are "Trados compatible", but they simply prefer to work in a more "tuned" environment if they have a choice.
Very often, they're the CAT hoppers.

Cheers
GG


 

Meta Arkadia
Local time: 20:39
English to Indonesian
+ ...
Bahasa Indonesia May 8, 2013

Grzegorz (or others, but for the moment I put my hopes on you), the algorithms you mention, should they be "built-in" in the CAT tool? And the rules for stemming in a particular language, can I find them somewhere? I speak Bahasa Indonesia rather fluently, but since the written language is quite different from the spoken version, I wouldn't think of translating from BI. My daughter - yes, Meta - is a native speaker of BI though, and I'm teaching her CafeTran. So far, I never worried about stemming, but in the case of BI, it seems to make sense, a lot of sense. In the (outdated) CT handbook, it says:

Prefix matching
When this option is selected, CafeTran will analyze the beginnings of words (here called prefixes) and discard any endings responsible for inflection of words. It is an option which significantly increases the number of hits for highly inflected languages. The length of prefixes is set by a percentage number. The bigger the percentage, the longer the prefix of words which the program will analyze. The minimal prefix length option (menu Edit | Options | Memory | Minimal prefix length) lets you set the minimal allowed length of prefixes. The length can also be fixed, when the "fixed" option is selected, instead of a set percentage length. It means that all the words will have the minimal prefix length, no matter their actual length.

Custom prefixes
If the inflection of a word is too high for automatic prefix matching you can enter your terms to the memory determining the prefix of a word manually. This is done by inserting the pipe character | at the end of a prefix in a word. For example, the Polish phrase "piękny dzień" (a beautiful day) has a highly inflected word "dzień" occurring in a number of various cases (dnia, dni, dniom). If you insert the pipe characters at the following positions - "pięk|ny d|zień", CafeTran will also recognize other forms of the phrase (pięknego dnia, pięknych dni etc.). Note that inserting the pipe character at the first word in the phrase - "pięk|ny" is optional since its inflection is quite regular and CafeTran should recognize its prefix automatically.
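
The handbook passages quoted above can be read roughly as in the Python sketch below. This is a guess at the behaviour, not CafeTran's code: the function names, the default values of `percent` and `min_len`, and the idea that the prefix is taken from the glossary term and compared with the start of the text word are all assumptions.

```python
import math


def auto_prefix(word: str, percent: int = 70, min_len: int = 4,
                fixed: bool = False) -> str:
    """Return the leading part of a word that would be compared."""
    if fixed:
        n = min_len  # "fixed" option: same length regardless of the word
    else:
        n = max(min_len, math.ceil(len(word) * percent / 100))
    return word[:n].lower()


def prefix_match(term: str, word: str, **options) -> bool:
    """True if the text word starts with the term's computed prefix."""
    return word.lower().startswith(auto_prefix(term, **options))


# Regular inflection: the automatic prefix is enough.
print(auto_prefix("rakennus"))                    # rakenn
print(prefix_match("rakennus", "rakennuksessa"))  # True

# Irregular inflection: "dzień" -> "dnia" shares only the first letter,
# which is why the handbook suggests marking the prefix by hand ("d|zień").
print(auto_prefix("dzień"))                       # dzie
print(prefix_match("dzień", "dnia"))              # False
```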


But those Polish examples mean nothing to me, nor do those minimal length settings. An Indonesian example:

pukul = hit, clock
memukul = to hit
dipukul = be hit (deliberately)
terpukul = be hit (by accident)
pukulan = knock, smack
pemukul = beater
pukul-memukul = fighting
and probably a dozen or so more based on pukul.

TIA,

Hans

[Edited at 2013-05-08 04:01 GMT]


 

Grzegorz Gryc  Identity Verified
Local time: 15:39
French to Polish
+ ...
Stemming etc. May 8, 2013

Meta Arkadia wrote:

Grzegorz (or others, but for the moment I put my hopes on you),


I'll answer in a more detailed way tomorrow, I have a tight deadline...

(...) CafeTran. So far, I never worried about stemming, but in the case of BI, it seems to make sense, a lot of sense. In the (outdated) CT handbook, it says:

Custom prefixes
If the inflection of a word is too high for automatic prefix matching you can enter your terms to the memory determining the prefix of a word manually. This is done by inserting the pipe character | at the end of a prefix in a word. For example, the Polish phrase "piękny dzień" (a beautiful day) has a highly inflected word "dzień" occurring in a number of various cases (dnia, dni, dniom). If you insert the pipe characters at the following positions - "pięk|ny d|zień", CafeTran will also recognize other forms of the phrase (pięknego dnia, pięknych dni etc.). Note that inserting the pipe character at the first word in the phrase - "pięk|ny" is optional since its inflection is quite regular and CafeTran should recognize its prefix automatically.


In fact, this example is bad, i.e. it catches too many false positives.
E.g. "piękn|y d|zień" also matches "piękna dupa", "piękna dłoń", "piękna dama" etc., which have a completely different meaning (i.e. beautiful arse, beautiful hand, beautiful lady).
I still have no time to dig into CafeTran, but in memoQ I would use a "cluster" of terms, i.e.:
- piękny dzień (forced exact matching for Nom./Acc. sg.)
- piękn|ego dni|a (Gen. sg., stemming matching working for all the other cases).
The idea of "clusters" is very effective for languages with very complex morphology and for suppletive forms.
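
The false positives and the "cluster" workaround can be reproduced with a small Python sketch. It assumes that, for each word of an entry, the part before the pipe must match literally and any ending may follow, while words without a pipe must match exactly. These semantics are an assumption for illustration, not the documented behaviour of CafeTran or memoQ, and the function names are hypothetical.

```python
import re


def phrase_regex(entry: str) -> "re.Pattern":
    """Build a regex for a multi-word entry written in pipe notation."""
    parts = []
    for word in entry.split():
        if "|" in word:
            stem = word.split("|", 1)[0]
            parts.append(re.escape(stem) + r"\w*")  # stem + any ending
        else:
            parts.append(re.escape(word))           # exact form only
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def matches(entry: str, text: str) -> bool:
    return phrase_regex(entry).search(text) is not None


loose = "piękn|y d|zień"                       # single, very permissive entry
cluster = ["piękny dzień", "piękn|ego dni|a"]  # exact Nom./Acc. + stemmed rest

for phrase in ["piękny dzień", "pięknego dnia", "pięknych dni",
               "piękna dama", "piękna dłoń"]:
    in_loose = matches(loose, phrase)
    in_cluster = any(matches(e, phrase) for e in cluster)
    print(f"{phrase:15} loose={in_loose} cluster={in_cluster}")
```

In this sketch the loose entry accepts all five phrases, including the two unrelated ones, while the cluster accepts only the three forms of "piękny dzień".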
BTW, CafeTran would probably be one of my very first choices as a CAT tool, after DVX, of course :)

But those Polish examples mean nothing to me, nor do those minimal length settings.
An Indonesian example:

pukul = hit, clock
memukul = to hit
dipukul = be hit (deliberately)
terpukul = be hit (by accident)
pukulan = knock, smack
pemukul = beater
pukul-memukul = fighting
and probably a dozen or so more based on pukul.


As I understand it, prefixes like di- or ter- are used to modify the meaning of the basic word (in fact, just like in Polish), so the method proposed by CafeTran (or e.g. memoQ) will not work here, except for "pukulan" and similar.
The beginning of the word differs, and only Wordfast Classic would do, AFAIK.
IMO you should suggest to Igor that he implement a rule for recognizing pairs like PL robić/zrobić (to do, imperfective vs. perfective); it would solve the recognition problems with pukul/dipukul etc.
If it's not already done :)
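
A quick Python check on the word list quoted above shows the difference between matching from the start of the word and matching the base form anywhere in the word. It is only a string comparison under assumed semantics, not a model of any particular CAT tool.

```python
words = ["pukul", "memukul", "dipukul", "terpukul",
         "pukulan", "pemukul", "pukul-memukul"]
stem = "pukul"

# Stem-first matching (the pipe-style approach): suffixed forms only.
starts = [w for w in words if w.startswith(stem)]

# Stem anywhere in the word (a leading-wildcard approach): prefixed forms too.
contains = [w for w in words if stem in w]

# Forms where the base itself changes are missed by both approaches.
missed = [w for w in words if stem not in w]

print(starts)    # ['pukul', 'pukulan', 'pukul-memukul']
print(contains)  # ['pukul', 'dipukul', 'terpukul', 'pukulan', 'pukul-memukul']
print(missed)    # ['memukul', 'pemukul']
```

Note that "memukul" and "pemukul" do not contain the string "pukul" at all, because the initial p of the base changes after these prefixes, so even a leading wildcard would need an additional rule or a separate glossary entry for them.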

Cheers
GG

[Edited at 2013-05-08 09:36 GMT]


 

Orsolya Bugar-Buday  Identity Verified
Hungary
Local time: 15:39
Member (2013)
English to Hungarian
+ ...
No problem with Hungarian... May 8, 2013

Hello,

I don't really see why the use of Trados should be a problem for agglutinative languages. Perhaps you meant MultiTerm? I've been using Trados from and into Hungarian for almost 10 years, and the problems I experienced had nothing to do with the languages themselves.

Best regards,
Orsolya


Studio Moderna wrote:

Dear translators,

I am writing on behalf of a company that is interested in using the Trados translation tool. We would use it for localizing texts from English into 21 different languages.
Apparently there is a problem using Trados for Hungarian and Estonian (agglutinative languages). Does anyone have experience translating in Trados with those languages?

Thank you for your answers.


 

