SDL Trados Studio 2009 Auto Suggest Limit?
Thread poster: AZTranslations

AZTranslations  Identity Verified
Germany
Local time: 06:32
German to English
+ ...
Sep 23, 2009

Hello all,
I have finally gotten around to creating an auto suggest glossary, and yes, I have more than enough translation units in the relevant TM to create one. However:
Once I start the process, everything goes fine (TM export into a TMX file, extraction of phrases, coding of the file), but after 195300 processed phrases the calculation of probabilities starts, regardless of how large the underlying TM is?!
Anyone know why that is and how it could be circumvented?
Thanks,
Anke



SDL Community  Identity Verified
United Kingdom
Local time: 06:32
English
Why circumvent? Sep 23, 2009

Hello Anke,

I wondered why you wanted to circumvent something here? Have you run into a problem at this point or are you just trying to see if the creation process for an AutoSuggest Dictionary can be made faster in some way?

The technical reason for this is that the process is RAM-intensive, so we apply a maximum number of processed TUs to prevent the process from "hogging" your machine. The assumption we made is that we should allow half a GB of RAM to compute the AS Dict for 100.000 TUs. At the moment we use a static number, 1000, for the available memory to ensure that the process is not too RAM-intensive for the majority of users, so 1000/512*100.000 = 195313. We will probably look at this in the future, to base the calculation on the real amount of RAM you have available and this should speed the process up for users with higher spec. machines.
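
To make the arithmetic concrete, here is a minimal Python sketch of the calculation as described above. The function and constant names are hypothetical and not taken from Studio; only the figures (512 MB per 100.000 TUs, and the static value of 1000) come from the explanation. Rounding up is an assumption, chosen because it reproduces the quoted 195313.

```python
import math

MB_PER_BLOCK = 512        # from the post: half a GB of RAM ...
TUS_PER_BLOCK = 100_000   # ... allowed per 100.000 translation units
STATIC_MEMORY_MB = 1000   # static "available memory" figure quoted in the post


def tu_limit(memory_mb: float) -> int:
    """Maximum number of processed TUs for a given memory figure.

    Rounding up is an assumption; the post does not say how the
    fractional result (195312.5) is turned into 195313.
    """
    return math.ceil(memory_mb / MB_PER_BLOCK * TUS_PER_BLOCK)


print(tu_limit(STATIC_MEMORY_MB))
```

With the static value of 1000 this prints 195313, which is in line with the roughly 195300 processed phrases reported at the start of the thread.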

Is this a problem for you at the moment given you do not do this very often?

Kind regards

Paul
SDL Support



AZTranslations  Identity Verified
Germany
Local time: 06:32
German to English
+ ...
TOPIC STARTER
No, not a problem per se Sep 23, 2009

I was just wondering why the number of processed phrases was limited and if there was a way to process a TM without such a limit to get the most out of a very large TM, for example.
I just thought larger TMs should yield more auto suggest probabilities, and smaller ones fewer. I do understand, however, how that could bog down a PC without sufficient RAM to work with.

BTW: Thank you for the prompt answer, Paul!



Andrei Vybornov  Identity Verified
Russian Federation
Local time: 09:32
Member (2008)
English to Russian
+ ...
Does it mean that Studio will use only 100.000 TUs to create an AS Dict and throw out the rest? Sep 26, 2009

Paul, this is not very reassuring. I guess Anke simply wanted to know whether her entire TM is used to create an AS Dict or not. It is a matter not of speed but of volume. If I have a TM with 200.000 units in it, I want all of them to be processed and not just 100.000. What happens to the rest? Do you simply ignore them to prevent the machine from crashing, or do you process them in another go?
Your answer does not clarify that, although I assume that Studio does process the entire TM, no matter how big it is, to create an AS Dictionary.
SDL recently posted a few European Union AutoSuggest Dictionaries (http://www.translationzone.com/en/landing/autosuggest-download.asp). The source TMs must have been really huge, and I guess SDL did not use only a tiny portion of them in the posted AutoSuggest dictionaries.

Kind regards,
Andrei



SDL Community  Identity Verified
United Kingdom
Local time: 06:32
English
How much of the TM is used Sep 26, 2009

Hi Andrei,

This is a good question, and on re-reading my response I can see your point. I think the easiest way to look at this is to think of your TM as a cylinder. The surface area at the top of the cylinder represents the entire TM as it is presented to you, and the depth at which you go down this cylinder is the amount of analysis used to create the AS Dictionary.

The restriction on how much RAM is used in this analysis is based on making the best use of the information obtained to ensure the sort of productivity gains users are reporting, while at the same time not taking all the available RAM on a user's PC.

I hope this is a little clearer? I also think the reported messages during this process could use a little tidying up as they are somewhat misleading.

Kind regards

Paul
SDL Support



Andrei Vybornov  Identity Verified
Russian Federation
Local time: 09:32
Member (2008)
English to Russian
+ ...
“Yes, we use entire TM” or “No, we use only 100.000 TUs” would be easier to understand Sep 26, 2009

Hi Paul,

Thank you for your very quick response.

I am afraid that even now, with all this 3D analogy, I don’t get a clear picture from your explanations. After all, the TM has just one dimension – its size in TUs.

I guess what you meant to say is, yes, the entire TM is analyzed, but not more than 195300 phrases (i.e. AutoSuggest Dictionary units) can be extracted from each 100.000 TUs of the original TM. Is that correct or am I totally wrong?

Kind regards,
Andrei

[Edited at 2009-09-26 11:54 GMT]



SDL Community  Identity Verified
United Kingdom
Local time: 06:32
English
Yes, we use the entire TM Sep 26, 2009

Hi Andrei,

The entire TM is used, but how much depth of analysis we put in to generate all the possible combinations in the TM is controlled. The calculation I quoted and the figures in the error message are misleading because these only reflect the way we control it and not what is happening. I tried to explain the reason for the figures and seem to have only confused you. Sorry about that.

The problem is not only that all the RAM may be taken, but also that the process may completely stall due to swapping (i.e. pagefile use) when the AutoSuggest Dictionary is computed on large TMs. That is the "real" reason for limiting the input size by available RAM: the process may run for days and not make any progress, which is hardly in the user's best interest.

I hope this is a little more explanatory and reassures you that the entire TM is being used and you will see more benefit from a 600,000 TU memory than you would from a 25,000 TU memory for example.

Perhaps some of the users who are already making use of this feature could share their experiences. I think this would be the real test... does it increase productivity or not?

Regards

Paul
SDL Support



Andrei Vybornov  Identity Verified
Russian Federation
Local time: 09:32
Member (2008)
English to Russian
+ ...
That’s what we all wanted to hear… Sep 26, 2009

Thank you very much for your explanation, Paul. Very consoling.

I use AutoSuggest dictionaries myself, but that does not mean that I know how they are built. I, and I guess Anke too, just wanted to make sure that the whole TM is analyzed in the process of creating an AS dict and that no part of it is sacrificed for the sake of preventing the machine from crashing.
Otherwise it would be a disappointment as it is with the concordance search, for example. Studio is VERY slow in that respect. Performing a concordance search takes ages and may very well outweigh any ‘productivity gains’ one may get with the AutoSuggest function. Yes, I can make it work faster with the ‘Performance and tuning’, but the price for it will be lower accuracy, which is not acceptable to me.

Thank you again for your prompt answers.

Kind regards,
Andrei



Joel Earnest
Local time: 06:32
Swedish to English
Using AutoSuggest Sep 26, 2009

SDL Support wrote:

Perhaps some of the users who are already making use of this feature could share their experiences. I think this would be the real test... does it increase productivity or not?



I've been using the AutoSuggest feature for a few months now and have mixed feelings about it.
I'm typing along when suddenly suggestions appear. I then have to stop for an instant and decide whether or not to use one of the displayed suggestions.
If something looks good, I have to use the down arrow key to move to the correct suggestion, if it's not at the top of the list, and hit return to enter it. I've streamlined this a bit by normally just choosing a suggestion if it's at the top of the list (omitting the "trip" to the down arrow key). My initial impression was that it'd be good with a foot pedal controller for hitting enter. With time, I've trained the little finger on my right hand to make the stretch, which I probably should have been doing all along.

I'm still using this feature regularly but I'm not sure whether it increases speed unless the suggestion is a phrase of two or three words, due to it interrupting my typing flow. It also means having to keep a close watch on what I'm typing. Normally, my attention is pretty much evenly divided between the source and target (but maybe a bit more on the source). Sometimes it feels like I'm watching a ping-pong match.

I may need another month or two before I've reached a final decision on this. It could well be that I stop using it as intended but refer to it when encountering those "hard" terms to see if it puts up anything of use. In these cases, the typing flow is already interrupted.

[Edited at 2009-09-26 13:44 GMT]


Edric Barbosa Filho
Local time: 03:32
English to Portuguese
I would never have noticed that limitation... Sep 26, 2009

Hi

AutoSuggest significantly boosted my productivity, from about 3,000 words per day (no-match) up to about 4,000 w/d, sometimes even close to 5,000 depending on AutoSuggest hits. Of course, that performance depends heavily on the job you are working on: such a boost may not happen with other kinds of documents/memories.

I would never have known about that limit, because the feature worked so well: I managed to finish a rather large technical job (530 XML files with 145,000 words, 96,000 of them no-match, using a memory with 410,000 translation units) in less than 30 days...

I have experienced wonderful gains even when combining several memories of related subjects, when the memory specific to the job had fewer than 25,000 TUs.

I have been using Studio with Beta SP1 for the last 30 days, and AutoSuggest has become so fundamental and natural to me that, when I had to finish a quick job in both Word and Tag Editor using Freelance 2007 the day before yesterday, I caught myself becoming upset because "I was typing without any suggestions at the cursor": you get addicted to that feature!!!

Regards

Edric

[Edited at 2009-09-26 16:42 GMT]



Joel Earnest
Local time: 06:32
Swedish to English
AutoSuggest Sep 26, 2009

Thanks for sharing your experiences, Edric! I'll keep using it until I've developed my keyboard technique a bit more.


Jerzy Czopik  Identity Verified
Germany
Local time: 06:32
Member (2003)
Polish to German
+ ...
Extremely positive impression in a "complicated" language pair Sep 26, 2009

I have been using AutoSuggest since the beginning of the beta testing and must say that I'm still deeply impressed.
AutoSuggest helps me very much, as I am a bad typist, so it is easier for me to choose an entry than to type it from scratch.
In the meantime I have created more than one AutoSuggest dictionary, the latest literally yesterday, extracted from EU TMs for EN-PL.
My main AS dictionary is the result of my TM of > 1 million units and was created in the early beta phase. It is German-Polish and the quality of prediction is really very high, given that Polish uses very different grammatical forms from German.
And I must also admit that I did not notice ANY limits. I simply did not observe the process of generating the AS dictionary, but left the machine alone.
It worked on machines with 4 and 3 GB RAM, both Windows XP.

Best regards
Jerzy



AZTranslations  Identity Verified
Germany
Local time: 06:32
German to English
+ ...
TOPIC STARTER
Thank you Andrej and Paul Sep 27, 2009

@ Andrej: Thanks for understanding my question and posing it better! Especially since I wasn't on the PC these last 2 days...

@Paul: Thanks for clarifying the process and alleviating my worries! I guess I monitor so much of what the machine does because weird things have happened before when I wasn't paying attention.



Grzegorz Gryc  Identity Verified
Local time: 06:32
French to Polish
+ ...
Forcing the Windows default... Sep 28, 2009

SDL Support wrote:


The technical reason for this is that the process is RAM-intensive, so we apply a maximum number of processed TUs to prevent the process from "hogging" your machine. The assumption we made is that we should allow half a GB of RAM to compute the AS Dict for 100.000 TUs. At the moment we use a static number, 1000, for the available memory to ensure that the process is not too RAM-intensive for the majority of users, so 1000/512*100.000 = 195313.

IMHO a static number used by default makes no sense at all.
I know the process may be time- and resource-consuming, but that is the reason why I have separate machine(s) for this kind of task, and I would use all available resources.
I have no problem running this kind of analysis at night, when I'm asleep.

We will probably look at this in the future, to base the calculation on the real amount of RAM you have available and this should speed the process up for users with higher spec. machines.

You should use the 32-bit Windows default (i.e. 2 GB); at the very least you should allow users to force it if necessary.

IMHO, a power user's standard new machine has at least 3 GB of RAM, so if you reserve 1 GB for the system and Co, you have approx. 2 GB for the AS dictionary creator acting as a single application.
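
For comparison, here is a rough Python sketch of what the quoted ratio would give under larger memory budgets. It assumes the limit scales linearly with memory, exactly as in the quoted formula (100.000 TUs per 512 MB); whether Studio would actually scale this way is not confirmed anywhere in this thread, and the function name is hypothetical.

```python
def tu_limit_for(memory_mb: int) -> int:
    """TU limit implied by the quoted ratio of 100.000 TUs per 512 MB.

    Assumption: linear scaling with the memory budget. Integer
    arithmetic is used so whole-GB budgets give exact results.
    """
    return memory_mb * 100_000 // 512


# Budgets discussed above: roughly 1 GB versus the 2 GB process default.
for memory_mb in (1024, 2048):
    print(f"{memory_mb} MB -> {tu_limit_for(memory_mb)} TUs")
```

Under that assumption, a 2 GB budget would raise the limit to 400000 TUs, about twice the current static figure.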

IMHO you underestimate your users' intelligence.

Cheers
GG

[Edited at 2009-09-28 17:06 GMT]


jimshanks
Local time: 05:32
Dutch to English
Using AutoSuggest Oct 2, 2009

I think it is fantastic. I have been using it in a test environment, but based on my huge TM of over 500,000 TUs. The suggestions are spot on, even in context. I am a little worried about being so positive about all this; you will start thinking I am on the staff!
