TU not found in Studio 2011
Thread poster: Jonathan Hopkins
Jonathan Hopkins
Germany
German to English
Nov 23, 2011

Has anyone ever experienced the case of Studio not producing a fuzzy match for an existing TU? Here's an example of what I mean:

I have a segment with the following text: "Office Suites Antibakterielle Microban-Fußstütze"
In my TM I have a nearly perfect match: "Office Suites Antibakterielle Microban Fußstütze"

As you can see, the only difference is the hyphen between Microban and Fußstütze. However, Studio can't find it in the Editor (Ctrl+Shift+T). The TU is only found when I run either a concordance search or a wildcard search in the TM view.

At first I suspected tags were the culprit, but the TM view shows there are none. Some screenshots to illustrate:

Here Studio offers me zero hits:



And here you can see that I get a little luckier when doing a concordance search:



And when I look for it in the TM-view:



Any ideas what I could look for to solve the riddle?

Thanks in advance for your replies,
Jonathan


 
Erik Freitag
Germany
Member (2006)
Dutch to German
Known problem Nov 23, 2011

Jonathan,

This is a behaviour I know quite well. After quite a bit of detective work with support staff, it turned out that in my case it has to do with the way project TMs are populated from the master TM. To my surprise, staff told me that project TMs should generally be avoided unless they are absolutely needed for collaboration with other translators. This is a piece of information I'd like to see in the software documentation, but as there isn't even anything like real documentation ...

The problem indeed seems to be gone since I don't use project TMs anymore.

So - do you use project TMs? If yes - try without.

Kind regards,
Erik



[Edited on 2011-11-23 17:07 GMT]


 
Jonathan Hopkins
Germany
German to English
TOPIC STARTER
I wouldn't want to work without the project TM Nov 23, 2011

Hi Erik,

Thanks for your reply.

efreitag wrote:
So - do you use project TMs? If yes - try without.


Since Studio doesn't have an automatic backup or autosave function for the sdlxliff files, I'd rather not do without a project TM, as that is the only real backup I have. At least to my knowledge, master TMs are not automatically updated when you confirm a segment (Ctrl+Enter); only project TMs are. (Or is this merely a setting I could change, so that the master TMs are always updated automatically after a segment is confirmed?)

Don't you worry that you may lose work, should Studio crash? How do you secure your work?

Cheers,
Jon


 
Jonathan Hopkins
Germany
German to English
TOPIC STARTER
No change after disabling the project TM Nov 23, 2011

Hi Erik,

I disabled the Project TM (and incidentally it asked me if I wanted to update the master TM, so that answered my earlier question), but that didn't give me the desired result, unfortunately.

I've since logged a support case with SDL. If they provide me with anything useful I'll pass it on.

Cheers,
Jonathan


 
Erik Freitag
Germany
Member (2006)
Dutch to German
Match value issue? Nov 23, 2011

Jonathan,

Ok, next thing to consider is that the problem you're experiencing might be based on another well-known bug (or, depending on your point of view, feature) of Trados, present at least since 2009: what a human perceives as a near-100% match (say, 99%) is often only a 70% match for Studio (I'm making up the numbers, obviously). Studio's match algorithm is quite useless.

For us humans, there's only a hyphen missing; for Studio, there are two separate words (Microban Fußstütze) in the TM, while there's only one long word (Microban-Fußstütze) in your text. As far as Studio's match algorithm is concerned, the two are completely unrelated.
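Erik's point about words can be illustrated with a minimal sketch. It assumes a purely word-based comparison that splits segments on whitespace only; that tokenizer is an assumption for illustration, since Studio's actual tokenization and scoring are not documented in this thread. Python's standard `difflib.SequenceMatcher` stands in for the scoring (its ratio is 2 × matching items / total items).

```python
from difflib import SequenceMatcher


def word_tokens(segment: str) -> list[str]:
    # Assumed naive tokenizer: split on whitespace only,
    # so a hyphenated compound stays one single token.
    return segment.split()


def word_similarity(a: str, b: str) -> float:
    # difflib ratio computed over token lists instead of characters.
    return SequenceMatcher(None, word_tokens(a), word_tokens(b)).ratio()


new_segment = "Office Suites Antibakterielle Microban-Fußstütze"
tm_segment = "Office Suites Antibakterielle Microban Fußstütze"

print(word_tokens(new_segment))  # 4 tokens, last one is the hyphenated compound
print(word_tokens(tm_segment))   # 5 tokens
print(f"{word_similarity(new_segment, tm_segment):.0%}")
```

Only the first three tokens match, so this assumed word-level measure drops to about 67% although a single character differs. Note that this alone does not explain why Studio found no match even at a 30% threshold.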

You might want to play with your fuzzy match value settings a bit: If you set them low enough, your TM segment might be proposed as a fuzzy match - the trade-off being that you'll get a lot of noise then.

Kind regards,
Erik

Edit: Links to earlier discussions:

http://glg.proz.com/forum/sdl_trados_support/183991-how_to_have_trados_concentrate_on_relevant_text_rather_than_tag_material_for_finding_matches.html

http://www.proz.com/forum/cat_tools_technical_help/196156-match_algorithm_expectations.html



[Edited on 2011-11-23 17:58 GMT]

[Edited on 2011-11-23 18:00 GMT]


 
Jonathan Hopkins
Germany
German to English
TOPIC STARTER
No change even for 30% threshold Nov 23, 2011

Hi Erik,

I actually already responded to this message several hours ago, but it would appear that something is amiss and the post never appeared. Hence this message.

efreitag wrote:
You might want to play with your fuzzy match value settings a bit: If you set them low enough, your TM segment might be proposed as a fuzzy match ...


This was the first experiment I tried. I set the threshold down to the lowest possible value (30%) and received a hit (53%), which was nothing like the segment in question. Unfortunately, the TU that is exactly the same, save the hyphen, still wasn't found by Studio.

Btw, thanks for the links to the other threads.

Cheers,
Jonathan


 
Dominique Pivard
Finnish to French
Some matching algorithms are more "human" than others... Nov 24, 2011

efreitag wrote:
Ok, next thing to consider is that the problem you're experiencing might be based on another well-known bug (or, depending on your point of view, feature) of Trados, present at least since 2009: what a human perceives as a near-100% match (say, 99%) is often only a 70% match for Studio (I'm making up the numbers, obviously). Studio's match algorithm is quite useless.

FWIW, here is how some other tools would have rated Jonathan's fuzzy match:

1) TWB 8.3: 97%
2) Wordfast Classic 6: 87%
3) memoQ 5: 68%


 
Jonathan Hopkins
Germany
German to English
TOPIC STARTER
TU found! Nov 24, 2011

OK, so now I not only lowered the threshold a few notches but also checked the box "Search both project and main translation memories", and that produced the appropriate TU.



Even so, I'd expect a sentence that is a perfect match save one character to be much closer to 100%, i.e. 47 matching characters divided by a total of 48 characters = approx. 98% match.
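Jonathan's arithmetic can be checked with a short sketch. The positional count below only works because both strings happen to have the same length; `difflib.SequenceMatcher.ratio()` is added as a common character-level measure for comparison. Neither is claimed to be what Studio computes.

```python
from difflib import SequenceMatcher

new_segment = "Office Suites Antibakterielle Microban-Fußstütze"
tm_segment = "Office Suites Antibakterielle Microban Fußstütze"

# Back-of-the-envelope estimate: identical characters at the same position
# divided by segment length (valid here only because the lengths are equal).
matching = sum(1 for x, y in zip(new_segment, tm_segment) if x == y)
naive = matching / len(new_segment)

# Character-level alternative: difflib's ratio, 2*M / (len(a) + len(b)).
char_ratio = SequenceMatcher(None, new_segment, tm_segment).ratio()

print(len(new_segment), matching)
print(f"naive: {naive:.1%}, difflib: {char_ratio:.1%}")
```

Both measures give about 97.9% for this pair, in line with the estimate above.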

Cheers,
Jonathan


 
Anne Bohy
France
English to French
May be related to particular characters in the sentence Dec 1, 2011

I have just experienced a similar problem. I was trying to retrieve 100% matches from a TM (English to French) and discovered that some were found, some not. I realized quite quickly that the sentences which didn't work were those containing contractions (aren't, isn't, etc.), that is, all sentences containing a single quote.
Changing the project settings as you indicated, to search both project and main translation memories helped.
HOWEVER, I still see that IDENTICAL TUs are not considered 100% matches when there are quotes in them. For instance, the word "they're" is struck through and replaced by "they'" (in front) and "re" (behind). Because of this, the match rate drops to 92%.
Obviously, there are two pieces of Studio 2011 code which do not handle words the same way!
In my opinion, the hyphen in your TUs may lead to the same problem. I wonder whether some piece of code removes hyphens and joins the strings before and after them. Try to see if you can find the concatenated word (with no hyphen) in your TM.


 
Jonathan Hopkins
Germany
German to English
TOPIC STARTER
I wouldn't consider those exactly 100% matches either Dec 1, 2011

Hi Bohy,

bohy wrote:

I have just experienced a similar problem. I was trying to retrieve 100% matches from a TM (English to French) and discovered that some were found, some not. I realized quite quickly that the sentences which didn't work were those containing contractions (aren't, isn't, etc.), that is, all sentences containing a single quote.


So, were the 92% matches not found at all? Or do you just mean that instead of a 100% match, Studio gave you 92% matches?

bohy wrote:

HOWEVER, I still see that IDENTICAL TUs are not considered 100% matches when there are quotes in them... For instance the word "they're" is striked and replaced by "they'" (in front) and "re" (behind)... Because of this, the match rate is down to 92%.


I wouldn't have expected Studio to consider these 100% matches either. Depending on how many characters the source segment has, a lower value can be justified: with only 12 characters, for example, a difference of a single character could reasonably bring the value down to roughly 92%. However, I think Studio's algorithms work on a word-matching basis, so if even one character differs, Studio considers the entire word a mismatch (even when the difference is just an apostrophe or hyphen). That would explain why the fuzzy value seems way off, especially for short segments.

Bohy wrote:

I wonder if some piece of code suppresses hyphens and concatenates the strings before and after the hyphen? Try to see if you can find the concatenated word (with no hyphen) in your TM.


Please see above. I've already included screenshots showing the word in a concordance search and in the TM view.

Cheers,
Jonathan


 
Jonathan Hopkins
Germany
German to English
TOPIC STARTER
Comic relief Dec 1, 2011

So, if you have a short sentence and the old source segment differs from the new one by nothing more than a hyphen, you may not get a match at all. But if your sentence has only a few words, let's say 4, and two of them happen to match at the same position, you may get an incredibly high match:



So "Zipper compartment in lid" (old segment) is a 73% match for "On-screen display (OSD) menu", simply because the preposition and definite article ("... auf dem ...", i.e. "on the") match at the same position.

I think this example shows well the fallacy of weighting the position of words so heavily. Especially in short sentences, the language structure and word order will often be similar, and prepositions and articles will be in similar places, as in the example above, so why weight position so highly?

On the other hand, if I have a segment like:

Gel Handgelenkauflage

and then

Gel-Handgelenkauflage

or

Gelhandgelenkauflage

Neither of the latter two spellings will match the first segment at all: 0% match.
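One way to see that these three spellings are really the same term is to compare them after a normalization step. The sketch below is a hypothetical pre-processing check (`compound_key` is an illustrative name, not a Studio feature); it assumes that differences in spacing, hyphenation and letter case are irrelevant for the comparison, which will not hold for every text.

```python
import re


def compound_key(segment: str) -> str:
    # Hypothetical normalization: remove whitespace and hyphens, then case-fold,
    # so spelling variants of a German compound collapse to one key.
    return re.sub(r"[\s\-]+", "", segment).casefold()


variants = ["Gel Handgelenkauflage", "Gel-Handgelenkauflage", "Gelhandgelenkauflage"]
keys = {compound_key(v) for v in variants}
print(keys)  # → {'gelhandgelenkauflage'}
```

A check like this only flags candidates for a human to review; it is not a replacement for a fuzzy match score.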

But if you happen to be translating some kind of bullet-point list in a PPT file, you'll get all kinds of nonsense matches just because the segments have a similar number of words and a colon and one two-letter word happen to be in the same position:



*sigh*


 
Dr. Matthias Schauen
Germany
Member (2007)
English to German
That's why I changed back to Trados 2007 Dec 1, 2011

I experienced the same behaviour as you, Jonathan, and since, because of its new matching algorithm, Trados Studio gives me too little leverage from my TMs, I changed back to Trados 2007. For me, investing in Studio 2009 was useless. I posted examples similar to yours elsewhere in this forum, for instance:
New segment: Condensate pump (A and B)
TM hit 1: Reference plot (A and B) [74%]
TM hit 3: Condensate pump A [62%]
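To make the contrast concrete, the sketch below scores this example twice with `difflib.SequenceMatcher`: once over whitespace-separated words and once over characters. Neither score is Studio's, and the tokenization is an assumption; the point is only that the two measures rank the hits in opposite order, and that the word-based one agrees with the ranking Studio produced here.

```python
from difflib import SequenceMatcher


def word_ratio(a: str, b: str) -> float:
    # Compare whitespace-separated tokens (assumed tokenization).
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def char_ratio(a: str, b: str) -> float:
    # Compare individual characters.
    return SequenceMatcher(None, a, b).ratio()


new = "Condensate pump (A and B)"
hits = ["Reference plot (A and B)", "Condensate pump A"]

for hit in hits:
    print(f"{hit!r}: words {word_ratio(new, hit):.0%}, chars {char_ratio(new, hit):.0%}")
```

Over words, "Reference plot (A and B)" wins because three trailing tokens match; over characters, "Condensate pump A" wins because its first 16 characters are identical to the new segment.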


 
Anne Bohy
France
English to French
Eureka: different characters Dec 1, 2011

I tried to reproduce the problem with a simple testcase.
After many unsuccessful attempts, I finally realized what had happened.
The text in my translation memory was actually coming from an Excel file.
The new text that I wanted to translate was a Word file.
The problem is that Word and Excel don't use the same character for the apostrophe: when you hit the single quote key, Excel produces a (vertical) single quote and Word produces an apostrophe (slanted or curly, depending on the language context).
The same may happen with your dash. Have you checked that it is the same dash? There are short ones, and longer ones...
The dash is treated as a character inside the word, so a different type of dash makes the whole compound word appear different...
Although there is an explanation to this strange behavior, it is something that we would like SDL to address in a rational way!
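Anne's diagnosis suggests a quick check before blaming the match algorithm: look at which characters a segment actually contains. The sketch below uses Python's standard `unicodedata` module to list non-ASCII punctuation with its code point and Unicode name, and applies an assumed mapping of look-alike apostrophes and dashes to plain ASCII. The mapping (`LOOKALIKES`) and the helper names are illustrative; whether such normalization is appropriate depends on the project and the target language.

```python
import unicodedata

# Assumed mapping of look-alike characters to a plain ASCII form.
LOOKALIKES = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (typographic apostrophe)
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2013": "-",  # EN DASH
}


def non_ascii_punctuation(text: str) -> list[tuple[str, str]]:
    # List each non-ASCII punctuation character (Unicode category P*)
    # as (code point, Unicode name).
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "?"))
        for ch in text
        if ord(ch) > 127 and unicodedata.category(ch).startswith("P")
    ]


def normalize_lookalikes(text: str) -> str:
    # Replace every mapped look-alike character; leave all others unchanged.
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)


excel_text = "they're"      # straight apostrophe, U+0027
word_text = "they\u2019re"  # curly apostrophe, U+2019

print(non_ascii_punctuation(word_text))  # → [('U+2019', 'RIGHT SINGLE QUOTATION MARK')]
print(excel_text == word_text)           # → False
print(normalize_lookalikes(word_text) == excel_text)  # → True
```

Running the same check on "Microban-Fußstütze" from the opening post would reveal whether its hyphen is a plain ASCII hyphen-minus or one of the look-alikes.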


 
Jonathan Hopkins
Germany
German to English
TOPIC STARTER
Thanks for those examples Dec 1, 2011

Hello Matthias,

Dr. Matthias Schauen wrote:

I experienced the same behaviour as you, Jonathan, and since - because of its new matching algorithm - Trados Studio gives me too little leverage of my TMs, I changed back to Trados 2007. To me, investing in Studio 2009 was useless. I posted examples similar to yours elsewhere in this forum, for instance:
New segment: Condensate pump (A and B)
TM hit 1: Reference plot (A and B) [74%]
TM hit 3: Condensate pump A [62%]


Thanks for those examples. I've never used older versions of Trados. My experience with CAT tools goes back only about 4 years and is limited to trial versions of Déjà Vu, Wordfast Classic, memoQ, Swordfish and some others. The only proprietary tools I've used (and used the most extensively) are Transit and now Studio. I like the Studio environment and really appreciate a lot of its features (AutoSuggest, including the method for inserting dictionary entries, to name just one or two advantages). I just wish SDL would fix some of these really annoying issues. Surely the developers at SDL must admit that something is amiss here.

Has Paul or anyone else from SDL given a response to complaints of this kind in other threads?

Kind regards,
Jonathan


 
Dr. Matthias Schauen
Germany
Member (2007)
English to German
Responses? Yes, but... Dec 1, 2011

Jonathan Hopkins wrote:

Has Paul or anyone else from SDL given a response to complaints of this kind in other threads?


What I found here from people working for or associated with SDL, in response to reports of this problem, all goes in the same direction:

We have a single algorithm which is optimized to deliver appropriate scores in most situations and this is more likely to be the reason why most users don't complain about it.

...a need to understand specific cases and how best to use the software to suit your needs as you can only cater for the majority of situations with the default settings.

Wrong or right - the whole world is imperfect


 