How to have Trados concentrate on relevant text rather than tag material for finding matches
Thread poster: Alexandre Oberlin
Alexandre Oberlin
Alexandre Oberlin  Identity Verified
France
Local time: 17:59
English to French
+ ...
Oct 27, 2010

Hi all,

Apparently tags are treated just like any other text by the Trados matching algorithms.

When translating heavily tagged (DTP) documents, Trados gives a better score to TUs with similar tags, while under-scoring or even discarding those where the relevant text is the same but the tags differ more.

A particular case is when you change a translation in a new project using the same TM. You might not be able to retrieve your change later when the exact same phrase comes up with even slightly different tags. In fact, you may well get a 100% match showing the older wording that you wanted to override everywhere in the new project. If this is still fresh in your memory, you will remember that you changed it and run a concordance search to find the new phrasing. If you don't, or if you are not the person who decided to change the translation, you won't be able to change the phrasing consistently.

I find this very annoying, but I could not find out how to change that behavior. The penalty settings do not seem to have much effect on this, even though the project attributes are typically different. Some other translation tools show fuzzy matches even when a 100% match is found, which does help, but Trados seems to consider that once a 100% match is found, all issues are solved...

There *must* be someone who has already experienced this!

Cheers,

Alexandre Oberlin






[Edited at 2010-10-27 19:39 GMT]


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 17:59
French to Polish
+ ...
Big Trados swindle... Oct 27, 2010

Alexandre Oberlin wrote:

Apparently tags are treated just like any other text by the Trados matching algorithms.

You're wrong.
The weight of a tag is TWO times the weight of a "human" word.
It's a well-known (?) Trados swindle.
In some cases (especially in short sentences) you may receive "matches" with no matching words...
It has been discussed here before, just google it.

I find this very annoying, but I could not find out how to change that behavior.

You can't do it.
It's hardcoded in the algorithm.

There *must* be someone who has already experienced this!

Yep.
That's why I switched to DV many years ago.

Cheers
GG


 
RWS Community
RWS Community
United Kingdom
Local time: 17:59
English
Can we look at this sensibly? Oct 27, 2010

Hi Grzegorz,

A strong choice of words, but I think it might be fairer to look at some specific examples so we can try to explain the logic.

Tags trigger penalties, while words are counted relative to the segment length. In short segments the tags may (relatively) outweigh the word-based score reductions, but specific examples would help.

I don't think we'd call it a swindle..!

Regards

Paul


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 17:59
French to Polish
+ ...
The truth is upon us ;P Oct 27, 2010

SDL Support wrote:

A strong choice of words, but I think it might be fairer to look at some specific examples so we can try to explain the logic.

Just a quick Google search:
http://fra.proz.com/forum/sdl_trados_support/163876-studio_sp2:_fuzzy_matches_make_no_sense.html
http://mac.proz.com/forum/sdl_trados_support/176063-trados_studio_inserts_translations_below_the_desired_threshold_level_.html
http://arm.proz.com/forum/sdl_trados_support/163876-studio_sp2:_fuzzy_matches_make_no_sense.html
I can provide more examples.
Please give me some time.

Tags trigger penalties, while words are counted relative to the segment length. In short segments the tags may (relatively) outweigh the word-based score reductions, but specific examples would help.

The short-segment test is really useful, as it shows the logic of the algorithm.
For longer segments the effect is less visible, so most people don't notice it and don't complain about it.
But you can't deny that the weight of a tag is two times the weight of a "human" word.
In the Trados world, we're paid per word, not per tag...

I don't think we'd call it a swindle..!

Well.
An error may happen.
Perseverance in sin will be punished by flame wars ;P

Cheers
GG


 
RWS Community
RWS Community
United Kingdom
Local time: 17:59
English
The root of all evil... Oct 27, 2010

Hello Grzegorz,

Thank you for today's sermon; it's always a pleasure.

I'll gather up your examples; they are really useful, and I will use them to help shape an internal discussion on the matching algorithms. We have a single algorithm which is optimized to deliver appropriate scores in most situations, and this is more likely the reason why most users don't complain about it. It avoids the too-low scores of “naïve scoring” and the too-high scores of “direct dice scoring”, but still takes the amount of difference in punctuation, tags, and whitespace into account.

The common recommendation, as you know, is that if users feel they get too few matches or “TM Silence”, they can lower the minscore. If they get too much noise, they can increase the minscore. Obviously, lowering the minscore may lead to more noise (i.e. reduced precision), while increasing it may lead to silence (i.e. reduced recall). This is an inherent trade-off decision with all information retrieval systems.
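To make that trade-off concrete, here is a toy Python sketch. The candidate segments and scores below are invented, not output from Studio or any real TM; it only shows how the same list of fuzzy hits looks at two different minimum-score thresholds.

# Hypothetical TM candidates for one source segment, with made-up fuzzy scores.
candidates = [
    ("Press the red button to start.", 92),
    ("Press the button to start.", 78),
    ("Pull the red lever to stop.", 64),
]

def hits(min_score):
    """Return only the candidates at or above the minimum-score threshold."""
    return [text for text, score in candidates if score >= min_score]

print(hits(min_score=60))  # low threshold: more hits, more noise (lower precision)
print(hits(min_score=85))  # high threshold: fewer hits, risk of "TM silence" (lower recall)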

As with any scoring algorithm, users will sometimes “feel” that the score is too high, and sometimes they will “feel” it is too low. This is an “emotional quality”, though, which is difficult to capture in an algorithm, particularly as it also depends on the case at hand. There are other times when you might find particular examples (not made-up ones to prove a point) where the score clearly doesn't seem useful at all.

Practically speaking it may not be worthy of too much attention, as many of the links you quote suggest (in between your lengthy sermons) that overall productivity is the main criterion. But I do take your point, and will use the information in these posts as a discussion point on what could be improved without degrading the overall results.

Regards

Paul


 
RWS Community
RWS Community
United Kingdom
Local time: 17:59
English
Just a simple test Oct 28, 2010

I wanted to take a closer look at this as mentioned earlier, so I first took a simple example to see how short and long sentences, and simple tags, are handled in various tools. I created a Word document like this:


The tags in segments #3 and #4 are simple formatting tags, and the tags in segments #5 and #6 are bookmarks. Then I just made up some text to extend the sentences for #7 through to #12.

I then opened the document in each of the tools I tested, translated segments #1 and #7, and confirmed them to the Translation Memory. Then I simply looked at the matching to see, out of interest, how a few of the desktop tools we see mentioned in this forum performed. The results were these:


I'll leave you to draw your own conclusions from this, but I think it's clear from this example that Studio is not performing as you stated and only applies simple penalties for tags, in the same way as all the rest. So, for example, in segment #3 we see a pair of formatting tags. This is a matching pair, so we apply a single penalty point. In segment #5 the bookmark tags are two different tags (start and end), so we apply two penalty points.
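To make the arithmetic explicit, here is a minimal Python sketch of that penalty-point idea, assuming the text itself is a 100% match, one point per matching tag pair and one point per unpaired start/end tag. The resulting 99% and 98% are the figures quoted for these segments elsewhere in this thread; the real scoring engine is of course more involved than this.

def tag_penalised_score(matching_tag_pairs, unpaired_tags, base_score=100):
    """Subtract one penalty point per matching tag pair and one per unpaired tag."""
    return base_score - matching_tag_pairs - unpaired_tags

# Segment #3: one matching pair of formatting tags -> 99
print(tag_penalised_score(matching_tag_pairs=1, unpaired_tags=0))
# Segment #5: two different bookmark tags (start and end) -> 98
print(tag_penalised_score(matching_tag_pairs=0, unpaired_tags=2))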

I'm also aware this is a very simplified example, as are so many of your examples, and we could probably dream up more to make any tool look bad in a particular situation. But I think the message should be that everything is explainable once we have clear examples of what the material is, and we are happy to take off forum any examples that are cause for real concern. Then we can make a reasoned decision on whether we should be changing anything or not. Unless it's a frequent occurrence, it could be more costly to investigate and fix safely than it would be to complete the correct translation and move on. I think changes in this area carry a lot of risk and need a lot of testing to ensure that we don't fix the few at the expense of the many.

In the meantime, we will look at the real examples we have from these threads as promised.

On a final note, I thought I'd post the editing screenshots for interest so you can see I'm not making them up (apart from our own legacy products, as you've probably seen these before). For memoQ and DVX I had to take the scores from a different window, as they didn't show up in the same pane I worked in, so you can't see them here. Maybe an expert user would make a better job of that, but I'm sure of the matching.

Studio


memoQ


Wordfast


DVX


Regards

Paul


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 17:59
French to Polish
+ ...
Levenshtein distance Oct 28, 2010

SDL Support wrote:

I'll leave you to draw your own conclusions from this, but I think it's clear from this example that Studio is not performing as you stated and only applies simple penalties for tags, in the same way as all the rest.

You selected the simplest formatting tags instead of numbers or more complex placeables.
For these tags, the analysis swindle is different.

So for example in segment #3 we see a pair of formatting tags. This is a matching pair so we apply a single penalty point. In segment #5 the book mark tags are two different tags (start and end) so we apply two penalty points.

This is another face of the flaw in the Trados algorithms.
You use absolute values instead of weighted ones.
In this way, you once again underestimate the translator's work needed to make the changes in the segment.
If you compare the results of your test to the well-known Levenshtein distance, i.e. the number of strokes (edits) needed to transform the source version proposed by the TM into the source version found in the segment, you'll see that Trados 2009 always proposes scores that are too high, i.e. the translator is not paid as he should be if discounts apply.
E.g.:
- for segment #3, you have 99% instead of approx. 94% (16/17).
- for segment #5, you have 98% instead of approx. 88% (16/18).
I counted one tag (or paired tag) as one stroke, just like you.
Of course, the results for longer sentences will be closer to reality.

Although the Levenshtein algorithm is the best way to evaluate the difference between strings, it needs a damn lot of calculation, which is why it is not used in the real CAT world.
Nonetheless, it should be used as a reference for the simplified/approximate algorithms.
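As a reference point, a minimal Python sketch of that Levenshtein yardstick, working on tokens so that each word and each tag (or paired tag) counts as one stroke. The example segments below are invented, since the actual test text is only visible in Paul's screenshots.

def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Made-up token lists: words plus a "<tag/>" placeholder, each counted as one stroke.
tm_source  = ["The", "cat", "is", "black", "."]
new_source = ["The", "<tag/>", "cat", "is", "black", ".", "<tag/>"]

dist = levenshtein(tm_source, new_source)
print(dist)                                        # 2 strokes
print((len(new_source) - dist) / len(new_source))  # ~0.71, i.e. roughly a 71% weighted match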

I'm also aware this is a very simplified example (...)

The simple examples are beautiful

PS
I hope I didn't make a mistake when counting letters; I'm really poor above 3.

Cheers
GG


 
Miguel Carmona
Miguel Carmona  Identity Verified
United States
Local time: 08:59
English to Spanish
You can't beat good logic Oct 29, 2010

Grzegorz: Your solid logic is unbeatable.

Paul: Bad logic.


Conclusion: "Swindle" stands.


 
Stanislav Pokorny
Stanislav Pokorny  Identity Verified
Czech Republic
Local time: 17:59
English to Czech
+ ...
A question Oct 29, 2010

Hi GG,
I'm completely ignorant as regards maths, so I thought I would ask you for an explanation:

I entered two sentences in a Levenshtein distance calculator:
The cat is black.
The acata is black. The two "a's" stand in for the tags in Paul's example. The Levenshtein distance is 2, i.e. a 98% match.

Why do you think it should be 88%? Dividing the number of characters in the first sentence by the number of characters in the second sentence seems to me... uhm, different than the calculation of the Levenshtein distance. I'm not arguing that it is wrong, but it's different. It may well only be my ignorance that leads me to my conclusion, but currently I simply think you're mixing apples with pears. I'd be interested to hear why you suggest/prefer the division method over the Levenshtein algorithm.


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 17:59
French to Polish
+ ...
Weighted value Oct 29, 2010

Stanislav Pokorny wrote:

I'm completely ignorant as regards maths, so I thought I would ask you for an explanation:

I entered two sentences in a Levenshtein distance calculator:
The cat is black.
The acata is black. The two "a's" stand in for the tags in Paul's example. The Levenshtein distance is 2, i.e. a 98% match.

Why do you think it should be 88%?

You should use a formula like:
match level = (target length - Levenshtein distance) / target length
If you have 19 letters in the target sentence, 2 letters are not 2%.

Dividing the number of characters in the first sentence by the number of characters in the second sentence seems to me... uhm, different than the calculation of the Levenshtein distance.

It's a weighted value.
The problem with the "pure" Levenshtein distance is that, say, the distance between "Cat" and "Cat(tag)" is 1, just as it is between two 100-word sentences with only one different tag.
A hundred "Cat"/"Cat(tag)"-like sentences give you 100 tags to change/add/delete, while one 100-word sentence needs only one stroke.
So you must take the sentence/segment length into account to make them comparable.
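Applied to Stanislav's example, a minimal sketch: the "target" here is the 19-character sentence with the two extra "a"s, and the distance of 2 is the value Stanislav quoted.

target_len = len("The acata is black.")   # 19 characters
distance = 2                              # the Levenshtein distance Stanislav calculated

print(100 - distance)                                       # 98: each edit read as one percentage point
print(round(100 * (target_len - distance) / target_len))    # 89: weighted by segment length, close to the ~88% above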

I'm not arguing that it is wrong, but it's different. It may well only be my ignorance that leads me to my conclusion, but currently I simply think you're mixing apples with pears. I'd be interested to hear why you suggest/prefer the division method over the Levenshtein algorithm.

As above.
I'm not being very precise, but I suppose the main point is clear.

Of course, the results for long sentences are more or less OK, but the short-sentence matching level is obviously wrong.
As you can see in Paul's table, it's not only a Trados problem.

BTW, the classic Trados word count was rather OK many years ago, but today it's clearly obsolete as it doesn't take tags etc. into account, at least not in a transparent, explicit way.
That's why standards like GMX-V are being proposed.
http://www.lisa.org/fileadmin/standards/GMX-V.html
But almost nobody cares.
Nobody wants to pay us per tag.
Probably we should start delivering translations without tags, saying "as it doesn't cost a penny, DIY".

Cheers
GG


 
Stanislav Pokorny
Stanislav Pokorny  Identity Verified
Czech Republic
Local time: 17:59
English to Czech
+ ...
Thank you Oct 29, 2010

Hi GG,
thank you for your explanation; now it makes more sense (even to me).


 
Daniel García
Daniel García
English to Spanish
+ ...
But tags are accounted for, aren't they? Oct 30, 2010

Grzegorz Gryc wrote:

As you can see in Paul's table, it's not only a Trados problem.

BTW, the classic Trados word count was rather OK many years ago, but today it's clearly obsolete as it doesn't take tags etc. into account, at least not in a transparent, explicit way.
That's why standards like GMX-V are being proposed.
http://www.lisa.org/fileadmin/standards/GMX-V.html
But almost nobody cares.
Nobody wants to pay us per tag.
Probably we should start delivering translations without tags, saying "as it doesn't cost a penny, DIY".


Very interesting discussion.

Two things, though.

In Trados 8 and previous versions, you could allocate a higher penalty to placeables. This made tag-loaded files more expensive to translate than clean files.

Another nice feature of the old Trados analysis is that, together with the word and sentence counts, you got a count of placeables. You could very easily calculate the average number of tags per sentence and take it into account when preparing your invoice.

The technology was there. Whether people used it to make realistic effort estimation is another matter.
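As a rough illustration of that back-of-the-envelope calculation, a short Python sketch; all the counts and rates below are invented, not taken from any real analysis report.

# Invented figures standing in for the word/segment/placeable counts of an analysis report.
words = 12000
segments = 1500
placeables = 3000

tags_per_segment = placeables / segments
print(f"average tags per segment: {tags_per_segment:.1f}")   # 2.0

# One possible way to account for the extra effort: add a surcharge per placeable.
word_rate = 0.10          # hypothetical rate per word
tag_surcharge = 0.02      # hypothetical surcharge per placeable
print(f"invoice estimate: {words * word_rate + placeables * tag_surcharge:.2f}")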

Aren't these two features present in Studio?

Daniel


 
Alexandre Oberlin
Alexandre Oberlin  Identity Verified
France
Local time: 17:59
English to French
+ ...
TOPIC STARTER
But those penalties can indeed work! Oct 30, 2010

Hi again,

Congratulations on your very interesting tests and analyses.

As a general rule, documents full of short phrases need significantly more work and are underestimated at a standard word rate. In any case, this behavior of Trados with tags does not seem to alleviate that bias.

Concerning my particular problem of changing small phrases in related projects, I finally figured out something very basic: the filter setting can help! I think I had forgotten the true use of those filters somewhere along the way (Trados is not my preferred tool) and had naively come to think that applying filters would just filter out the TUs that have different fields. That of course was not what I wanted, so I did not activate the filters. Today I read the fine manual and realized that the filters are (please tell me if I'm wrong) just a precondition for the text/attribute penalties to take effect.

So now at least I can trust the 100% matches, even if there are far fewer of them...

AO


 
RWS Community
RWS Community
United Kingdom
Local time: 17:59
English
Increasing Penalties Oct 31, 2010

Daniel García wrote:

Very interesting discussion.

Two things, though.

In Trados 8 and previous versions, you could allocate a higher penalty to placeables. This made tag-loaded files more expensive to translate than clean files.

Another nice feature of the old Trados analysis is that, together with the word and sentence counts, you got a count of placeables. You could very easily calculate the average number of tags per sentence and take it into account when preparing your invoice.

The technology was there. Whether people used it to make realistic effort estimation is another matter.

Aren't these two features present in Studio?

Daniel


Hi Daniel,

Yes, all of these features are there in Studio. This part of the discussion is of course based on the defaults, so it is perfectly possible to change them to reflect whatever you think is more appropriate. You can also report separately on placeables and tags in the analysis (this is all calculated by default).

The problem is always whether you agree with the weightings applied by others.

Regards

Paul


 
RWS Community
RWS Community
United Kingdom
Local time: 17:59
English
More complex placeables Oct 31, 2010

Grzegorz Gryc wrote:

You selected the simplest formatting tags instead of numbers or more complex placeables.
For these tags, the analysis swindle is different.

Cheers
GG


Hi Grzegorz,

As I'm waiting for a flight I thought I'd take another look at some of these posts. This quote is interesting because numbers, dates etc. are all autolocalised, so they shouldn't really be a problem at all. You can of course apply an increased penalty for autolocalisation, or even switch it off, so the options are probably there for this too if you feel the default settings just aren't adequate for your needs on a particular project with lots of tags.

I wanted to see if I could create a few examples to look at this but I'm struggling to see why I would complain about the defaults when they can be changed.

The more I look at this, the less I see a swindle; rather, a need to understand specific cases and how best to use the software to suit your needs, since the default settings can only cater for the majority of situations.

Regards

Paul


 