Studio 2011, question on "Pre-Translate" behaviour
Thread poster: xxxowhisonant
xxxowhisonant  Identity Verified
Germany
Local time: 03:38
German to English
+ ...
Nov 17, 2013

Hello,

This is not urgent, just out of general interest:

A project I am currently working on includes an extensive translation memory. I ran “pre-translation” on both (Word 2007) files in the project, resulting in a good deal of fuzzy matches (75% and above, as per my setting in Preferences).

So far, so good.

Here’s a source phrase from one of the files:

Nullstellung geschlossen (NC)

After pre-translation, this phrase shows a fuzzy match of 78% with the target phrase:

closed (NC) without power

So far, so good…

However, when I run a concordance search (F3) on "Nullstellung geschlossen", it shows me a 100% match in the TM:

Zero position closed,

which is obviously more correct and missing only the (NC) addition.

I can find other examples of this sort of thing throughout the file, and I find it curious that Trados would choose a 78% match in pre-trans instead of the close to 100% match which is obviously contained in the TM. At your convenience, any ideas for me, as in: Is there a way to change this behaviour?

Thanks in advance,

OW



SDL Community  Identity Verified
United Kingdom
Local time: 03:38
English
I think you're missing some info... Nov 17, 2013

... what was the source TU found in the lookup and in the concordance?

Regards

Paul


xxxowhisonant  Identity Verified
Germany
Local time: 03:38
German to English
+ ...
TOPIC STARTER
Source for lookup and concordance... Nov 17, 2013

SDL Support wrote:

... what was the source TU found in the lookup and in the concordance?

Regards

Paul


I see what you're getting at...

in pre-translation the source was the entire text within that segment:

Nullstellung geschlossen (NC)

for which pre-translation returned a 78% match with "closed (NC) without power" and inserted it into the corresponding target segment.

For the concordance search, I selected only

Nullstellung geschlossen

minus the (NC), and received 100% with "zero position closed".

Call me clueless, but I still don't quite understand the algorithm's selection criteria in this case.

Best, OW

[Edited at 2013-11-17 23:37 GMT]



SDL Community  Identity Verified
United Kingdom
Local time: 03:38
English
I guess I'm misunderstanding you... Nov 18, 2013

... but I'm still not sure what was in the source TU from your TM in both cases. The TM lookup will always be based on the entire segment whereas the concordance is only on the text you select.

So, for example, a TM lookup on this:

"One Two three four"

would return a 25% match (-ish) on a TM containing this in the source:

"One"

But if you run a concordance search on "One", it will return a 100% match. I would not expect this to take priority over the 25% match because, similarly, a concordance search on "One" would also return a 100% match for this:

"One little, two little, three little indians, four little indian braves."

This is why I wondered what your source text was in the TM result as this is what we are comparing... not the target result.
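
As a rough illustration of the distinction drawn above, here is a minimal Python sketch. The scoring function is a toy word-overlap measure invented for this example; it is an assumption, not Studio's actual fuzzy-match algorithm, and both function names are hypothetical.

```python
def segment_score(segment: str, tm_source: str) -> float:
    """Toy whole-segment score: shared words / word count of the longer string.

    Assumption for illustration only; Studio's real scoring is not shown here.
    """
    seg_words = segment.lower().split()
    tm_words = tm_source.lower().split()
    shared = len(set(seg_words) & set(tm_words))
    return shared / max(len(seg_words), len(tm_words))


def concordance_hit(selection: str, tm_source: str) -> bool:
    """Toy concordance check: is the selected text found inside the TM source?"""
    return selection.lower() in tm_source.lower()


segment = "One Two three four"
short_tu = "One"
long_tu = ("One little, two little, three little indians, "
           "four little indian braves.")

# Lookup compares the whole segment against the whole TM source.
print(segment_score(segment, short_tu))   # 0.25

# Concordance only asks whether the selection occurs in the TM source,
# so the same selection "hits" both a short and a long translation unit.
print(concordance_hit("One", short_tu))   # True
print(concordance_hit("One", long_tu))    # True
```

The point of the sketch is only that the two searches answer different questions: a full hit on a selected fragment says nothing about how well the containing translation unit matches the whole segment.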

Regards

Paul


xxxowhisonant  Identity Verified
Germany
Local time: 03:38
German to English
+ ...
TOPIC STARTER
Looking at the TM... Nov 18, 2013

...I hope I can make this clear.

The entire segment being looked up in "pre-translate" is

Nullstellung geschlossen (NC)

The pre-translation selected, with a weighting of 78%

closed (NC) without power

which is based on the source phrase (in the TM) of

stromlos geschlossen (NC)

Far as I can see, based on characters

stromlos geschlossen (NC)

has 15 that match with

Nullstellung geschlossen (NC)

for a match of 55.5% (I have no idea where it comes up with 78%).

In concordance search, I looked up

Nullstellung geschlossen

which resulted in a 100% match to the source phrase in the TM, "Nullstellung geschlossen". This corresponds to a target phrase in the TM of "zero position closed".

I'm certainly not an expert in computational linguistics, but it seems to me that the phrase that was already in the TM

Nullstellung geschlossen

is a closer match to the entire translation segment, i.e.

"Nullstellung geschlossen (NC)"

(I make it 85% based on character matches), than the phrase (see above) that was selected as a 78% match (I make it 55.5%) by the pre-translate function. This is the selection behavior that I am trying to understand (and possibly modify).
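
The character arithmetic above can be reproduced with a short Python sketch. It assumes the count ignores spaces and credits every character of a word that appears in both strings; that reading is inferred from the figures quoted in this post and is not a description of how Studio scores matches. The function name is hypothetical.

```python
def shared_char_ratio(segment: str, tm_source: str) -> float:
    """Share of the segment's non-space characters that sit in words
    also present in the TM source (toy measure, not Studio's algorithm)."""
    seg_tokens = segment.split()
    tm_tokens = set(tm_source.split())
    shared = sum(len(tok) for tok in seg_tokens if tok in tm_tokens)
    total = sum(len(tok) for tok in seg_tokens)
    return shared / total


segment = "Nullstellung geschlossen (NC)"   # 27 non-space characters

# "geschlossen" + "(NC)" = 15 of 27 characters
print(round(shared_char_ratio(segment, "stromlos geschlossen (NC)"), 3))  # 0.556

# "Nullstellung" + "geschlossen" = 23 of 27 characters
print(round(shared_char_ratio(segment, "Nullstellung geschlossen"), 3))   # 0.852
```

Under these assumptions the two candidates come out at roughly 55.6% and 85.2%, which lines up with the 55.5% and 85% figures given above. Neither number explains the 78% that pre-translation reported.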

Thanks, OW



Meta Arkadia
Local time: 08:38
English to Indonesian
+ ...
The old problem Nov 18, 2013

I don't use Trados (among other things, it doesn't run natively on a Mac), but it seems that, like most (all?) CAT tools, Trados uses the longest match available in your TMs to auto-translate - in your case the one that includes "(NC)" - rather than the best match. There is not much developers can do about it, it seems. My main trouble with it is that it can also affect your QA, so you'll have to be extremely careful when you select the TM/glossary/termbase you use for the QA.

Cheers,

Hans

[Edited at 2013-11-18 12:44 GMT]


xxxowhisonant  Identity Verified
Germany
Local time: 03:38
German to English
+ ...
TOPIC STARTER
Agreed... Nov 18, 2013

Meta Arkadia wrote:

I don't use Trados (among other things, it doesn't run natively on a Mac), but it seems that, like most (all?) CAT tools, Trados uses the longest match available in your TMs to auto-translate - in your case the one that includes "(NC)" - rather than the best match. There is not much developers can do about it, it seems. My main trouble with it is that it can also affect your QA, so you'll have to be extremely careful when you select the TM/glossary/termbase you use for the QA.

Cheers,

Hans

[Edited at 2013-11-18 12:44 GMT]


Hi,

Thanks for the input. I run Trados in a VM on a Mac (maybe that's my problem).

Anyway, if you look at the word matches, both source phrases in the TM match two words (assuming that (NC) is treated as a word) in the segment to be translated.

If you look at the character matches, there are actually more in 'Nullstellung geschlossen' (85%) than in 'stromlos geschlossen (NC)' (55%).

Either way, pre-translate has chosen the shorter match, not the longer one, and certainly not the better one. This seems a bit random, although it can't be, logically speaking, so I'm still interested in what drove the selection...
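
The word-level observation in this post (both TM sources share two words with the segment) can be checked with a small Python sketch that computes an edit distance over whole words rather than characters. This is a generic Levenshtein distance applied to tokens, used here purely as an illustration; it is an assumption, not Studio's documented scoring, and the function name is hypothetical.

```python
def token_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance where the unit of editing is a whole word."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # substitute a word
        prev = curr
    return prev[-1]


segment = "Nullstellung geschlossen (NC)".split()

# One word substituted: "stromlos" instead of "Nullstellung"
print(token_edit_distance(segment, "stromlos geschlossen (NC)".split()))  # 1

# One word missing: "(NC)"
print(token_edit_distance(segment, "Nullstellung geschlossen".split()))   # 1
```

On this simple word-level measure the two candidates tie at one edit each, so whatever made pre-translation prefer one over the other has to come from something beyond a plain word count.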



Meta Arkadia
Local time: 08:38
English to Indonesian
+ ...
Not the characters Nov 18, 2013

owhisonant wrote:
If you look at the character matches

I think the number of "words" (parts, whatever) is relevant here, not the number of characters. The CAT tool looks for the "longest" string for the three components of the pretranslation, whereas you "cheat" by leaving out "(NC)" with your concordance search.

Cheers,

Hans

