Studio SP2: fuzzy matches make no sense
Thread poster: Adam Bojan

Adam Bojan  Identity Verified
Poland
Local time: 20:41
Dutch to Polish
+ ...
Apr 29, 2010

Hello,
What am I doing wrong that the fuzzy matches shown in the Translation Results window are so inaccurate? In other CAT tools a 70% match is quite a nice one. Here I get "Stop the MFT" as a 69% match for "Solve the problem".

[img]http://yfrog.com/g4fuzzymatchesj[/img]
Is it another bug in the software, or something with my TM or settings? I haven't had this problem in Trados 8 or SDLX, however. I have done some research and found this . So I am not the only one. I wonder if this also happens with the statistics (I am afraid it does). If so, you have to be very careful about accepting "Studio discounts" based on its fuzzy matches. So far none of my clients use Studio, but maybe they will start ... anyway, that's one of the reasons I bought this thing. What do you think?



Selcuk Akyuz  Identity Verified
Turkey
Local time: 21:41
Member (2006)
English to Turkish
+ ...
90% Apr 29, 2010

In Trados 2007 I use 70% and in DVX 30-40% but 80 or 90% is required for Studio.


Adam Bojan  Identity Verified
Poland
Local time: 20:41
Dutch to Polish
+ ...
TOPIC STARTER
Strange Apr 29, 2010

So it is not a bug. It is meant to be so, and even required, as you say. Strange. And what about the analysis report? Will it make sense to give discounts for fuzzy matches under, say, 80%? And why is it so? By the way, the concordance search seems to be more reliable.


Jerzy Czopik  Identity Verified
Germany
Local time: 20:41
Member (2003)
Polish to German
+ ...
You cannot say that Apr 29, 2010

Selcuk - there is no absolute truth in the world, so statements like yours are inappropriate and misplaced.
In fact I do use 40% for Studio and 30% for Workbench, 60% for Transit.
Depending on the kind of text, the translation memory used and the software's matching algorithm, the results differ very much - but there is NO general setting that could apply to every user in the world.
This is a very individual setting. If you're unhappy with 69%, go higher to 75%, but do not tell me "90% is required for Studio". Nothing is "required".



Selcuk Akyuz  Identity Verified
Turkey
Local time: 21:41
Member (2006)
English to Turkish
+ ...
Thanks for confirming Apr 29, 2010

Jerzy Czopik wrote:

In fact I do use 40% for Studio and 30% for Workbench, 60% for Transit.


So it means that fuzzy matches suggested by Studio are not as reliable as those suggested by Trados (2007).

And I repeat my statement adding my language pair:

In Trados 2007 I use 70% and in DVX 30-40% but 80 or 90% is required for Studio in English>Turkish translations.



Jerzy Czopik  Identity Verified
Germany
Local time: 20:41
Member (2003)
Polish to German
+ ...
Fuzzy matches in Studio are different Apr 29, 2010

But in fact I can confirm that setting the minimum match value as low as in T2007 makes no sense for Studio.
But still - even in your language pair, such a statement should not be worded as absolute truth. What works for you may not work (or may work better) for other people.
So 90% is required in Studio for Selcuk.
Studio allows Jerzy to set the minimum match value and many other parameters according to his needs. The values Jerzy sets in Studio do not matter to the community, due to the very specific way Jerzy works.
This would be my statement.



Adam Bojan  Identity Verified
Poland
Local time: 20:41
Dutch to Polish
+ ...
TOPIC STARTER
What about quotes? Apr 29, 2010

Jerzy Czopik wrote:
This is a very individual setting. If you're unhappy with 69%, go higher to 75%, but do not tell me "90% is required for Studio". Nothing is "required".


This stops being only an individual setting if you are asked to offer discounts based on fuzzy matches. I have no problem with discounts for 70-85% matches in Trados, but I will be very careful if one asks me to apply the same rates for Studio. Do you, will you, Jerzy? Have you guys been offered a job based on a Studio analysis? And how does this analysis compare to that of Trados?



Jerzy Czopik  Identity Verified
Germany
Local time: 20:41
Member (2003)
Polish to German
+ ...
My fuzzy rating only goes down to 85% Apr 29, 2010

And that has been the case since T2007.
Regardless of the real match situation, matches under 70% are always to be considered as no match. And I must admit I did not pay much attention to match values in Studio, as it rewards me with other features, so I can still use some of my 40-50% matches.
I never thought about running an experiment to compare the matches.
What worries me much more is the way Studio presents the changes in the current segment compared to the previously translated segment.
T2007 uses a very clear and easy-to-understand system of just 3 background colours, while Studio tries to show changes the way Word does when the track changes feature is enabled. But this way of presenting changes does not really help.
You have already supported that idea - thanks.



Grzegorz Gryc  Identity Verified
Local time: 20:41
French to Polish
+ ...
SDL Match Maker Apr 29, 2010

Adam Bojan wrote:

What am I doing wrong that the fuzzy matches shown in the Translation Results window are so inaccurate? In other CAT tools a 70% match is quite a nice one. Here I get "Stop the MFT" as a 69% match for "Solve the problem".

http://yfrog.com/g4fuzzymatchesj
Is it another bug in the software, or something with my TM or settings? I haven't had this problem in Trados 8 or SDLX, however.

Sometimes it was possible to get comparable results with Trados "classic".
It was not so flagrant, though.

The problem is that the Trados algorithms may be completely unreliable for short sentences.
For example, in Studio I get the 70% match below on the strength of a single common one-letter word, "w" (and a dash).
With a 1% penalty for formatting.
Text:
"TagKołaTag - w ruch"
(wheels... are turning, a fragment of a famous Polish poem for children about a locomotive)
TM:
"w - częstość wymuszenia"
(w - forcing frequency, in my case, a parameter of Newmark analysis).

If you have matching tags and some other elements (e.g. dashes, like above), the results may be even worse, because the tags are included in the matching; you may receive "SDL matches" with no human correspondence at all...

For the tag weight, see e.g.
"CF tagUchCF tag - jak gorąco"
"CF tagPuffCF tag - jak gorąco"
The same poem
Oof, how she's burning,
Puff, how she's burning,
A "human" value for Polish would be something like 75%, let's say.
Merrily counted as 97% by T2009 SP2.
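
To illustrate why counting tags as ordinary tokens can inflate a score, here is a minimal sketch. It is not Studio's algorithm (the thread does not document that); it uses Python's difflib.SequenceMatcher as a stand-in similarity measure. The <cf>...</cf> tag spelling and the helper names tokens and similarity are assumptions made only for this illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical tag syntax: <cf> ... </cf>. Real TM tools store tags differently.
TAG = re.compile(r"</?\w+>")

def tokens(segment, keep_tags):
    """Split a segment into tags, words and punctuation marks."""
    parts = re.findall(r"</?\w+>|\w+|[^\w\s]", segment)
    if not keep_tags:
        parts = [p for p in parts if not TAG.fullmatch(p)]
    return parts

def similarity(a, b, keep_tags):
    """Stand-in fuzzy score: difflib ratio over token lists (0.0 to 1.0)."""
    return SequenceMatcher(None, tokens(a, keep_tags), tokens(b, keep_tags)).ratio()

new_segment = "<cf>Uch</cf> - jak gorąco"
tm_segment = "<cf>Puff</cf> - jak gorąco"

with_tags = similarity(new_segment, tm_segment, keep_tags=True)
without_tags = similarity(new_segment, tm_segment, keep_tags=False)

print(f"tags counted as tokens: {with_tags:.0%}")
print(f"tags ignored:           {without_tags:.0%}")
```

With this stand-in measure, the two identical tags add matching tokens on both sides, so the score with tags comes out higher than the score without them, even though the only real word that differs is the same in both cases.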

I wonder if this also happens with the statistics (I am afraid it does).

They're simply false.

If so, you have to be very careful about accepting "Studio discounts" based on its fuzzy matches. So far none of my clients use Studio, but maybe they will start ... anyway, that's one of the reasons I bought this thing. What do you think?

The algorithms were tailored for DTP, methinks.
They make it possible to reduce the translation costs...

Cheers
GG

[Edited at 2010-04-29 18:49 GMT]



Stefan de Boeck  Identity Verified
Belgium
Local time: 20:41
English to Dutch
+ ...
spot on Apr 29, 2010

Adam Bojan wrote:
This stops being only an individual setting if you are asked to offer discounts based on fuzzy matches. I have no problem with discounts for 70-85% matches in Trados, but I will be very careful if one asks me to apply the same rates for Studio.

At 70% Studio will happily return utter garbage as some sort of a match – so you're absolutely right.

Which also implies that it never was some kind of "individual setting" or language pair related thing – it's after all source that's compared to source.



Selcuk Akyuz  Identity Verified
Turkey
Local time: 21:41
Member (2006)
English to Turkish
+ ...
Any volunteers? Apr 29, 2010

Can anybody make a fuzzy match test with Trados 2007 and Studio? And if Studio matches are not reliable, translators may consider revision of the "Trados Rates" from agencies using Studio, if any.


Jerzy Czopik  Identity Verified
Germany
Local time: 20:41
Member (2003)
Polish to German
+ ...
What is reliable? Apr 29, 2010

And how do you want to compare?
Taking just the match itself is just half the truth.
More important is how long you need to translate the same text with the same TM using Studio and using T2007. Provided that Studio utilizes both MultiTerm and AutoSuggest, I will be faster with Studio. And this is what matters to me. I do not really care if I get a 40% match returned that is usable (which was sometimes the case with T2007) or an 80% match. What counts is what I get in the end.
I will not try to persuade anyone to use Studio. But discussing this will achieve nothing except to once again give the impression that something is wrong with Studio. But when I compare the amount of work I have to invest in a translation with T2007, Transit or Studio, Studio wins. That's my opinion.
But I have revised my weighted word counts, because those are not "Trados rates", as you try to call them. Weighted word counts are also used by companies utilizing Transit - without any exception. All my Transit customers demand weighted counting - AFTER the translation has been done, to also get "discounts" for internal fuzzy matches.
So this is not just a "Trados rate", to make that clear.
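
For readers unfamiliar with weighted word counts, a minimal sketch of the arithmetic follows. The match bands, the weights and the word counts are all made-up assumptions for illustration; they are not taken from any tool's analysis report or from any rate sheet mentioned in this thread.

```python
def weighted_word_count(words_per_band, weight_per_band):
    """Sum each band's word count multiplied by the share of the full rate paid for it."""
    return sum(words * weight_per_band[band] for band, words in words_per_band.items())

# Hypothetical analysis result: number of words falling into each match band.
analysis = {"100%": 200, "85-99%": 100, "70-84%": 400, "no match": 300}

# Hypothetical discount scheme: share of the full word rate paid per band.
discounted = {"100%": 0.25, "85-99%": 0.5, "70-84%": 0.75, "no match": 1.0}

# Same scheme, but with the 70-84% band paid in full (treated as no match).
no_low_fuzzy_discount = dict(discounted, **{"70-84%": 1.0})

print(weighted_word_count(analysis, discounted))             # 700.0
print(weighted_word_count(analysis, no_low_fuzzy_discount))  # 800.0
```

The difference between the two totals shows how much depends on whether the lower fuzzy band can be trusted, which is why the reliability of the match percentages matters for quotes.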



Adam Bojan  Identity Verified
Poland
Local time: 20:41
Dutch to Polish
+ ...
TOPIC STARTER
translations results vs concordance Apr 29, 2010

Just one more example of how badly the fuzzy search function works.
First I translate the sentence:
(1) "Check again the elevator operational cycle" (the translation goes to the TM)
then, after a few segments, I have:
(2) "Check the elevator operational cycle again" and no fuzzy match! I only get it after setting the search to 40%, as the 12th result down the list.
The fuzzy match for (2) based on (1) is rated at only 53%, while e.g.
(3) "Check the No. 23 BACK UP (10 A) fuse again."
is presented at the top as a 67% match. Unbelievable. All the more so because, if I select sentence (2) and perform a concordance search, it shows 100% compared to (1).
Actually, there should be an option to run the concordance search automatically for each whole new segment.
Let me say it once again. In Studio, sentences like "I give the apple to John" and "I give John the apple" are effectively not matches, while "I give the apple to John" and "I get the idea to sleep" are apparently very alike.
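
To make the word-order point concrete, here is a minimal sketch that scores the same sentences in two ways: an order-sensitive comparison and an order-insensitive "bag of words" overlap. Neither is Studio's actual fuzzy or concordance algorithm (those are not documented in this thread); difflib.SequenceMatcher and a Dice coefficient are stand-ins, and the function names words, sequence_score and bag_score are assumptions for illustration.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

def words(sentence):
    """Lower-cased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", sentence.lower())

def sequence_score(a, b):
    """Order-sensitive: difflib ratio over the two word lists."""
    return SequenceMatcher(None, words(a), words(b)).ratio()

def bag_score(a, b):
    """Order-insensitive: Dice coefficient over word multisets."""
    wa, wb = words(a), words(b)
    shared = sum((Counter(wa) & Counter(wb)).values())
    return 2 * shared / (len(wa) + len(wb))

s1 = "Check again the elevator operational cycle"
s2 = "Check the elevator operational cycle again"
x1 = "I give the apple to John"
x2 = "I get the idea to sleep"

for label, a, b in [("reordered", s1, s2), ("unrelated", x1, x2)]:
    print(f"{label}: sequence={sequence_score(a, b):.0%} bag={bag_score(a, b):.0%}")
```

In this sketch the reordered pair scores 100% on the bag-of-words measure, since it contains exactly the same words, and still scores higher than the unrelated pair on the order-sensitive measure. That is the ranking a translator would expect, which is the opposite of the behaviour described above.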



Grzegorz Gryc  Identity Verified
Local time: 20:41
French to Polish
+ ...
Pilate saith unto him, What is truth? Apr 29, 2010

And when he had said this, he went out again unto the Jews, and saith unto them, I find in him no fault at all.

Sorry, I can't resist...

Jerzy Czopik wrote:

And how do you want to compare?

Probably with a collection of tricky sentences.
But it may give us a biased view.
We probably need a corpus of, let's say, at least 2 x 100 sentences (different lengths, with and without tags, etc.) in order to get even roughly representative results.
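
Once such a corpus has been run through two tools, the comparison itself is simple. The sketch below assumes the match percentages have already been collected by hand into two lists, one per tool, in the same sentence order; the function name compare_scores and the sample numbers are hypothetical and are not results from any real test.

```python
from statistics import mean

def compare_scores(scores_a, scores_b):
    """Summarise per-sentence differences (tool B minus tool A) in match percentage."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both tools must be scored on the same sentences")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    return {
        "mean_difference": mean(diffs),
        "largest_gap": max(diffs, key=abs),
        "pairs_where_b_higher": sum(d > 0 for d in diffs),
    }

# Made-up placeholder values, NOT measurements from Trados, Studio or DVX.
tool_a = [60, 80, 90]
tool_b = [70, 75, 100]

summary = compare_scores(tool_a, tool_b)
print(summary)
```

A mean difference alone can hide outliers, so the sketch also reports the largest single gap, which is where the "tricky sentences" would show up.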

IMO the short sentences test is interesting, see my post above.
I never received such stupid results in DVX, so something is rotten in the kingdom of Trados.

On the other hand, Selcuk has a good idea. It's interesting to see how different tools analyse the sentences and what the statistical difference is.
Because these differences exist.
The algorithms are not the same, that's evident.
That's why I often pretranslate in Trados before I import bilingual files into DVX. DVX sometimes misses something.

Taking just the match itself is just half the truth.
More important is how long you need to translate the same text with the same TM using Studio and using T2007. Provided that Studio utilizes both MultiTerm and AutoSuggest, I will be faster with Studio.

I think so too.
But we discuss here pure TM matches 'cause they're used for the ratings.

And this is what matters to me. I do not really care if I get a 40% match returned that is usable (which was sometimes the case with T2007) or an 80% match. What counts is what I get in the end.

Well, true.
I suppose that for some kinds of texts I may be faster with no matches, using AutoAssemble in DVX/MQ, than a lot of Trados users with, let's say, 60% matches.
Provided I have huge terminology/portion databases that the majority of Trados users don't have.
And I get no clear "return on investment" for the terminology.
At least not in the word counts.
So, to some extent, you're incontestably right.
The wordcount is a treacherous god.

I will not try to persuade anyone to use Studio. But discussing this will achieve nothing except to once again give the impression that something is wrong with Studio.

But something is really wrong in Studio.
See my examples above.

Cheers
GG



Jerzy Czopik  Identity Verified
Germany
Local time: 20:41
Member (2003)
Polish to German
+ ...
Wrong or right - the whole world is imperfect Apr 29, 2010

And maybe this is why I have learned to live with Trados, Transit, Word, the computer, Windows and all the other tools I use. None of those is perfect, not even the Mac.
The only perfect part of this world is my wife.

But coming back to the topic - your idea of a corpus of a hundred or so mixed sentences is a good one for getting a comparison between tools. Looking at single matches (or mismatches) does not give enough of an impression of how the tool really works. The test could also consider the usage of variables - I do not know how far those are used in other tools. For example, at first glance Transit seems to be good with terminology, because it can replace words in matches when they are found in the termbase. It looks like a good idea, but I, for example, cannot make use of it: too much work changing the declension of such a word, relocating it and so on. So even if we start such a test, the results will be highly subjective - due to the different way each of us works. But it would still be interesting to know how long it takes to translate such a test corpus using different tools. Even if this will never be really objective, it could give interesting results if they turned out to be similar. But I really can imagine myself translating faster in Studio than in Transit, and for example Antonin Otahal being much faster in Transit, or you in DV - simply because of personal preferences.

