Studio 2011, question on "Pre-Translate" behaviour
Thread poster: owhisonant (X)

owhisonant (X)  Identity Verified
Germany
Local time: 13:09
German to English
Nov 17, 2013

Hello,

This is not urgent, just a question out of general interest:

A project I am currently working on includes an extensive translation memory. I ran “pre-translation” on both (Word 2007) files in the project, which produced a good number of fuzzy matches (75% and above, as per my setting in Preferences).

So far, so good.

Here’s a source phrase from one of the files:

Nullstellung geschlossen (NC)

After pre-translation, this phrase shows a fuzzy match of 78% with the target phrase:

closed (NC) without power

So far, so good…

However, when I run a concordance search (F3) on "Nullstellung geschlossen", it shows me a 100% match in the TM:

Zero position closed,

which is obviously more correct and missing only the (NC) addition.

I can find other examples of this sort of thing throughout the file, and I find it curious that Trados would choose a 78% match in pre-translation instead of the close-to-100% match that is obviously contained in the TM. Any ideas, at your convenience? In particular, is there a way to change this behaviour?

Thanks in advance,

OW


 

SDL Community  Identity Verified
United Kingdom
Local time: 13:09
English
I think you're missing some info... Nov 17, 2013

... what was the source TU found in the lookup and in the concordance?

Regards

Paul


 

owhisonant (X)  Identity Verified
Germany
Local time: 13:09
German to English
TOPIC STARTER
Source for lookup and concordance... Nov 17, 2013

SDL Support wrote:

... what was the source TU found in the lookup and in the concordance?

Regards

Paul


I see what you're getting at...

in pre-translation the source was the entire text within that segment:

Nullstellung geschlossen (NC)

for which pre-translation returned a 78% match with "closed (NC) without power" and inserted it into the corresponding target segment.

For the concordance search, I selected only

Nullstellung geschlossen

minus the (NC), and received 100% with "zero position closed".

Call me clueless, but I still don't quite understand the algorithm's selection criteria in this case.

Best, OW

[Edited at 2013-11-17 23:37 GMT]


 

SDL Community  Identity Verified
United Kingdom
Local time: 13:09
English
I guess I'm misunderstanding you... Nov 18, 2013

... but I'm still not sure what was in the source TU from your TM in both cases. The TM lookup will always be based on the entire segment whereas the concordance is only on the text you select.

So, for example, a TM lookup on this:

"One Two three four"

Would return a 25% match (... ish) on a TM containing this in the source:

"One"

But if you run a concordance search on "One", it will return a 100% match. I would not expect this to take priority over the 25% match, because I would equally get a 100% concordance match when searching on "One" for this:

"One little, two little, three little indians, four little indian braves."

This is why I wondered what your source text was in the TM result as this is what we are comparing... not the target result.
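
The difference between the two kinds of search can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Studio actually scores matches: the function names `segment_score` and `concordance_score`, the whitespace tokenizer, and the word-count scoring are all made up for the example.

```python
from difflib import SequenceMatcher

def tokens(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()

def segment_score(segment, tm_source):
    # Whole-segment lookup: words in common, divided by the longer word count.
    a, b = tokens(segment), tokens(tm_source)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b))

def concordance_score(selection, tm_source):
    # Concordance: how much of the *selected* text occurs as one run in the TM source.
    sel, src = tokens(selection), tokens(tm_source)
    matcher = SequenceMatcher(None, sel, src, autojunk=False)
    run = matcher.find_longest_match(0, len(sel), 0, len(src))
    return run.size / len(sel)

rhyme = "One little, two little, three little indians, four little indian braves."
print(segment_score("One Two three four", "One"))  # 0.25
print(concordance_score("One", rhyme))              # 1.0
```

In this toy model the concordance score says nothing about how well the rest of the TM source fits the segment, which is why a 100% concordance hit and a low lookup score can coexist for the same TM entry.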

Regards

Paul


 

owhisonant (X)  Identity Verified
Germany
Local time: 13:09
German to English
TOPIC STARTER
Looking at the TM... Nov 18, 2013

...I hope I can make this clear.

The entire segment being looked up in "pre-translate" is

Nullstellung geschlossen (NC)

The pre-translation selected, with a weighting of 78%

closed (NC) without power

which is based on the source phrase (in the TM) of

stromlos geschlossen (NC)

As far as I can see, based on characters,

stromlos geschlossen (NC)

has 15 that match with

Nullstellung geschlossen (NC)

for a match of 55.5% (I have no idea where it comes up with 78%).

In concordance search, I looked up

Nullstellung geschlossen

which resulted in a 100% match, to the source phrase in the TM, "Nullstellung geschlossen". This corresponds to a target phrase in the TM of "zero point closed".

I'm certainly not an expert in computational linguistics, but it seems to me that the phrase that was already in the TM

Nullstellung geschlossen

is a closer match to the entire translation segment, i.e.

"Nullstellung geschlossen (NC)"

(I make it 85% based on character matches), than the phrase (see above) that was selected as a 78% match (I make it 55.5%) by the pre-translate function. This is the selection behavior that I am trying to understand (and possibly modify).
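
A character-based comparison like the one described above can be reproduced with Python's standard `difflib`. This is a minimal sketch under assumptions: `char_overlap` is a hypothetical helper, and dividing by the longer string is just one possible way of counting, so the percentages need not agree with the hand counts above or with the 78% Studio reported.

```python
from difflib import SequenceMatcher

def char_overlap(segment, tm_source):
    # Characters in common blocks, divided by the length of the longer string.
    matcher = SequenceMatcher(None, segment, tm_source, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(segment), len(tm_source))

segment = "Nullstellung geschlossen (NC)"
for tm_source in ("stromlos geschlossen (NC)", "Nullstellung geschlossen"):
    print(f"{tm_source!r}: {char_overlap(segment, tm_source):.0%}")
```

Counted this way, "Nullstellung geschlossen" comes out ahead of "stromlos geschlossen (NC)", which is the ordering expected above on a purely character-based measure.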

Thanks, OW


 

Meta Arkadia
Local time: 19:09
English to Indonesian
The old problem Nov 18, 2013

I don't use Trados (among other reasons, it doesn't run natively on a Mac), but it seems that, like most (all?) CAT tools, Trados uses the longest match available in your TMs to auto-translate (in your case the one that includes "(NC)") rather than the best match. There is not much developers can do about it, it seems. My main trouble with it is that it can also affect your QA, so you have to be extremely careful when you select the TM/glossary/termbase you use for the QA.

Cheers,

Hans

[Edited at 2013-11-18 12:44 GMT]


 

owhisonant (X)  Identity Verified
Germany
Local time: 13:09
German to English
TOPIC STARTER
Agreed... Nov 18, 2013

Meta Arkadia wrote:

I don't use Trados (among other reasons, it doesn't run natively on a Mac), but it seems that, like most (all?) CAT tools, Trados uses the longest match available in your TMs to auto-translate (in your case the one that includes "(NC)") rather than the best match. There is not much developers can do about it, it seems. My main trouble with it is that it can also affect your QA, so you have to be extremely careful when you select the TM/glossary/termbase you use for the QA.

Cheers,

Hans

[Edited at 2013-11-18 12:44 GMT]


Hi,

Thanks for the input. I run Trados in a VM on a Mac (maybe that's my problem).

Anyway, if you look at the word matches, both source phrases in the TM match two words (assuming that (NC) is treated as a word) in the segment to be translated.

If you look at the character matches, there are actually more in 'Nullstellung geschlossen' (85%) than in 'stromlos geschlossen (NC)' (55%).

Either way, pre-translate has chosen the shorter match, not the longer one, and certainly not the better one. This seems a bit random, although logically speaking it can't be, so I'm still interested in what drove the selection...
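
The word count mentioned above is easy to check. A minimal sketch, assuming that "(NC)" counts as one word and that words are simply split on whitespace; `word_overlap` is a hypothetical helper, not anything Studio exposes:

```python
def word_overlap(segment, tm_source):
    # Shared whitespace-separated words (order ignored), divided by the longer word count.
    a, b = segment.lower().split(), tm_source.lower().split()
    shared = len(set(a) & set(b))
    return shared / max(len(a), len(b))

segment = "Nullstellung geschlossen (NC)"
candidates = ["stromlos geschlossen (NC)", "Nullstellung geschlossen"]
scores = {c: word_overlap(segment, c) for c in candidates}
for candidate, score in scores.items():
    print(f"{candidate!r}: {score:.2f}")
```

Under this naive count both TM entries share two of three words with the segment, so a plain word count alone would not explain the choice. Whatever Studio does beyond that is outside what this sketch can show.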


 

Meta Arkadia
Local time: 19:09
English to Indonesian
Not the characters Nov 18, 2013

owhisonant wrote:
If you look at the character matches

I think the number of "words" (parts, whatever) is relevant here, not the number of characters. The CAT tool looks for the "longest" string for the three components of the pretranslation, whereas you "cheat" by leaving out "(NC)" with your concordance search.

Cheers,

Hans


 

