Strategy for managing TMs
Thread poster: Rob Grayson

Rob Grayson  Identity Verified
United Kingdom
Local time: 03:53
Member
French to English
Dec 30, 2007

Hi,

Having been freelancing for 18 months now, nearly all of which time I've been using a CAT tool, I have built up a sizeable translation memory of around 45k translation units. My "strategy" for managing TMs and termbases (aka glossaries) is very simple - I have a single TM which I use for all projects, and then I have multiple termbases, one for each major subject area and occasionally for each individual client or even project. Although this strategy is simple, that does not necessarily make it the best one - it's just the one that I decided to use when I first started out, without any great understanding of what different approaches were available and what the relative advantages and drawbacks might be.

The main disadvantage I now find myself facing is that my one TM is becoming very large. While this hasn't yet slowed it down to an unmanageable level, I wonder how far off this point is. Would anyone like to let me know how large their TMs are for comparison purposes?

Assuming I want to get away from using a single all-encompassing TM, I'd be really interested to know what strategies fellow translators use for managing their TMs, and how effective these strategies are - and by "effective", I mean how good they are at enhancing productivity by speeding up the overall process.

Thanks in advance,

Rob



Steven Capsuto  Identity Verified
United States
Local time: 22:53
Spanish to English
+ ...
Scalability Dec 30, 2007

Rob Grayson wrote:
The main disadvantage I now find myself facing is that my one TM is becoming very large. While this hasn't yet slowed it down to an unmanageable level, I wonder how far off this point is. Would anyone like to let me know how large their TMs are for comparison purposes?
Rob


My largest Trados TM is about 50,000 segments and it works fine. Matches pop up immediately and concordance searches are very fast. I know people who have 80,000-segment TMs and they work fine as well.

[Edited at 2007-12-30 18:21]



Peter Linton  Identity Verified
Local time: 03:53
Member (2002)
Swedish to English
+ ...
Concordance Dec 30, 2007

This is an issue which gets more complex and difficult as the years roll by.

When I started (5 years ago) I had one TM per customer. I rapidly switched to your approach, one large TM. This worked fine for some years, even with very large TMs.

But that caused another unexpected and more insidious problem. After a while, you begin to realise that in many ways the Concordance is the most useful facility, often more useful than the plain TM. You want to see words and phrases in context. But the bigger your TM, the more hits in the Concordance -- so you get an awful lot of similar and often unhelpful instances -- and sometimes very long sentences that are more of a hindrance than a help.

So there is a lot to be gained by having a clean high-quality TM - and weeding it, or at least preventing it from growing out of control. I have, for instance, deleted from the TM all the translations I did 4 or 5 years ago - on the grounds that anything I did then is slightly suspect, and not worth keeping.
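For anyone who would rather weed by date outside the CAT tool, here is a minimal Python sketch of the idea. It assumes the TM has been exported to TMX and that each `<tu>` element carries a `creationdate` attribute in the TMX timestamp format (YYYYMMDDThhmmssZ); not every tool writes the date there, so check your own export first. The function name `drop_old_units` is made up for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def drop_old_units(tmx_text: str, cutoff: datetime) -> str:
    """Return the TMX text without the units created before `cutoff`."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    for tu in list(body.findall("tu")):
        # Assumption: the creation date sits on the <tu> element itself.
        stamp = tu.get("creationdate")
        if stamp is None:
            continue  # no date recorded: keep the unit rather than guess
        created = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")
        created = created.replace(tzinfo=timezone.utc)
        if created < cutoff:
            body.remove(tu)
    return ET.tostring(root, encoding="unicode")
```

Units without a date are kept on purpose, so nothing is thrown away silently. Run it on a copy of the export, never on the only one.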

So I tried a different strategy -- having TMs for broad topics, like (in my case) Technical, Business, Finance, and also a few for particular customers. I also keep my original large TM, but run it in the background (as you can do with the later versions of Trados). In some ways that is the best of both worlds.

I have added other refinements. If a particular translation is not likely to produce any worthwhile new terminology or sentences, I load a TM called Dump. That TM can grow very big, that's OK. I know the data is there (and of course I keep track of which TM I use for each translation in TO3000, so I can find the right one later on). But it doesn't clutter up my quality TMs.

If a translation is likely to produce some worthwhile new terminology or sentences, I load a TM called "Possibles". After completing the translation, I export Possibles and look through for any useful sentences, which I put into a small import file and import into the appropriate TM. More work, but important for maintaining quality and preventing the accumulation of rubbish.

I still don't know what the best strategy is, but I'm getting more and more convinced about the need to minimise GIGO (Garbage In, Garbage Out) and that it is the Concordance facility that should determine your strategy.

I hope to hear about other novel strategies here.


Charlie Bavington  Identity Verified
Local time: 03:53
French to English
Good question Dec 30, 2007

I use Wordfast, FWIW, and a 2-year old laptop running XP.
I was recently supplied with a TM of well over 120,000 units, which (apart from being slightly dodgy in terms of content) was a little slow - it would sometimes take over a second to move to the next, fresh, segment in the document while it checked the TM.

I guess what we really all need is some reliable way to have available everything we've done before that might be useful, while minimising the pollution from irrelevant stuff.

Until recently, I too had one big TM for (more or less) everything, which at least covers the first bit of my supposition.

But I noticed that its usefulness was limited - cross-client matches are rare - they always come, if at all, from previous jobs for the same client. So I'm now (more or less) on a one TM per client strategy.

In terms of ensuring I have everything "available", I use a (free) tool called Apsic Xbench. This can find terms/phrases in any text file, TMX file, Wordfast glossary or TM, a range of Trados file types that mean nowt to me, some SDLX files... If the file is included in a "project", it whizzes through looking for the term, and returns all possible hits. The drawback is that the process is somewhat manual: you have to remember to 'ask' it to look. But it works fine for me, and I use computerised glossary files less and less these days (although I have one of them per client too!).

But each to their own.

I only really switched strategy because I didn't like the way the big TM had everything in - if it became corrupt, lost, deleted, I would have been stymied big time. I was also thinking that, super long term (say 10 years hence) there would eventually come a point when it would become too big to use properly. So why not start now, even if I didn't need to? Apsic takes care of the rest (i.e. not having everything available in one place). And if Apsic didn't, Google Desktop would handle most of what Apsic does now, just less quickly.



Anna Branicka  Identity Verified
Poland
Local time: 04:53
Member (2007)
English to Polish
+ ...
customer-oriented TM Dec 30, 2007

My strategy is to have either customer-oriented or topic-oriented TMs and so far it works.
The largest I have is about 1,200,000 units. It works fine but it takes ages to reorganise it - about a day. I have talked to people from SDL about the size of the TM and they say they have customers who have over 2,000,000 units and everything goes smoothly.



Vito Smolej
Germany
Local time: 04:53
Member (2004)
English to Slovenian
+ ...
Death to GIGO and Concordance rules OK Dec 31, 2007

Peter Linton wrote:

I still don't know what the best strategy is, but I'm getting more and more convinced about the need to minimise GIGO (Garbage In, Garbage Out) and that it is the Concordance facility that should determine your strategy.



I can second that.

One problem, not yet mentioned: how to convince the client that their Terminology and TM need a thorough overhaul? Especially so if they show the whole trail from the initially horrible to later substandard and latest acceptable quality? It is relatively easy to quarantine this kind of material and/or improve on it (I do it again and again). But the tough moment is when the client starts bitching and moaning about the latest translation: "... we did send you the MultiTerm file and TM, so that we have consistent terminology..." Of course they did receive my improved material (my extra mile for the client) but there's something called cognitive dissonance involved here: they just can't accept the fact that they have all kinds of skeletons and corpses in their closets.

Regards

Vito

PS: my largest TMs (aptly called language pair_Full monty.tmw) are all about 50k segments big and show no signs of instability. They are used only to dump things in and then run concordance on them.

[Edited at 2007-12-31 08:57]



KSL Berlin  Identity Verified
Portugal
Local time: 03:53
Member (2003)
German to English
+ ...
Concordances rule Dec 31, 2007

I agree completely with Peter and others that the concordance value of a TM generally exceeds its usefulness for ordinary matching purposes. However, I would be hesitant to toss out material just because it is a few years old. I often find useful terminology from segments I translated as long as 7 years ago when I first started using these technologies.

As far as size is concerned, I've got a Trados WB TM for one client with over 200,000 segments and its performance is just fine. God only knows how many segments are in my Déjà Vu TMs, but as long as I don't use too many of them at once for reference, I haven't had any big issues with speed. Part of my TM management strategy does include a "master TM" with all but obviously useless content being written to it. However, regardless of whether I'm using Trados or DV, I make it a point to use attributes for client and subject area to enable me to do filtered exports of the content whenever I find it useful to do so (like when an agency manages to lose all the TM records for a particular end customer whose material I translate).

If you apply these attributes consistently, it should be relatively easy to break up the TM into smaller, more manageable reference units and avoid getting lost in the jungle of an over-full TM.
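To make the filtered-export idea concrete outside any particular CAT tool, here is a rough Python sketch that works on a TMX export. It assumes the client and subject attributes end up as `<prop>` elements inside each `<tu>`; the property names used in the usage note (`x-Client`, `x-Subject`) are placeholders, since each tool writes its own names, and `filter_by_field` is just a name invented for this example.

```python
import xml.etree.ElementTree as ET


def filter_by_field(tmx_text: str, field: str, wanted: str) -> str:
    """Keep only the units whose `field` property has the value `wanted`."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    for tu in list(body.findall("tu")):
        # A unit may carry several <prop> elements of the same type,
        # for example more than one subject value on a single TU.
        values = {
            (prop.text or "").strip()
            for prop in tu.findall("prop")
            if prop.get("type") == field
        }
        if wanted not in values:
            body.remove(tu)
    return ET.tostring(root, encoding="unicode")
```

Calling `filter_by_field(text, "x-Client", "Acme")` would give a smaller TMX holding only that client's units, which can then be imported into a fresh TM. It only works if the attributes were applied consistently in the first place.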



ViktoriaG  Identity Verified
Canada
Local time: 22:53
English to French
+ ...
My strategy Dec 31, 2007

Even though I have tons of TMs of different sizes and kinds, I always make up a new TM for each job. Before I start using it, I set up my text fields. I have a Client, Project and Subject field. Each TU produced therefore is labeled with this information. When I am done translating, I either keep the TM separate from the others, or I incorporate it into a larger one, by subject. I have a huge mechanical engineering TM, which contains smaller TMs from projects for different clients. I typically use this TM as reference TM. When I find matches in this TM, I always know where the TU came from and can thus decide if I can trust it in the present circumstance. I frequently have 100 concordance matches, and it is nice to be able to see at a glance which match comes from where.

Because of the way I set up text fields, I have complete freedom over how I create super TMs: I can create a TM for a particular client which contains all work done for them or a TM that contains all my TUs for a specific subject. In the subject field, I always put more than one value (e.g., user guide; environmental; risk assessment). This allows me to compile all my TMs for user guides, no matter what the subject otherwise was, so I can concentrate on user guide terminology, but I can also create a TM of environmental TUs that will be more general, as well as create a TM that is specific to risk assessments.

In any case, each and every project has its own TM and I always keep these individual TMs (on an external hard drive for the most part) so that I can mix and match them to match a particular project. I think it is important to resist the temptation to make a super TM, unless you copy the original TMs first. Always keep a backup so you can go back in time.

I find that since I've started using text fields, my TMs are much more orderly and I get to leverage a great deal more of my previous terminology, and when I get too many matches, I have the critical information that will help me choose among the results.

Edit: I forgot to mention that I also use ApSic XBench to browse my TMs externally. What's nice about this is that I can combine as many TMs as I want and XBench will also show me which search result came from which TM, but it goes even farther because I can add termbases and bilingual documents to the mix. Then, I can specify in what order these results should be displayed (for example, I want the termbase to be on top because it is my most reliable resource) so I don't have to scroll through thousands of results. For bilingual documents, you can even see the context (the 10 segments before and the 10 segments after the result segment). When you search for a word, you can search for partial words with wildcards, so if you look for do, it will also return done and does, for example. With Trados, a segment has to be a minimum 30% match for Trados to pick it up, and this can be a real pain in the butt with abbreviations, for example. XBench shows you every single segment where the search string occurs, no matter the match type or rate. Finally, XBench has Power Search, which is basically advanced search. It lets you search source strings, target strings or even both. It lets you search for several words that occur in the same segment but not necessarily in that order (if you search for risk and assess, it will for example find a segment that contains the assessment of such a risk). If you couple this with the use of text fields, you will always manage to find the correct TUs and always be equipped for deciding which one to pick.
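For readers who want to see the principle of that kind of search, here is a small Python sketch. It is not how XBench works internally, just an illustration under simple assumptions: the TMs are available as TMX text, a hit is any segment that contains every search term as a substring (in any order, ignoring case), and each hit is reported with the name of the TM it came from. The function name `search_memories` and the file names in the usage note are invented for the example.

```python
import xml.etree.ElementTree as ET


def search_memories(memories: dict[str, str], terms: list[str]) -> list[tuple[str, list[str]]]:
    """Return (TM name, segment texts) for each unit where one segment holds all terms."""
    needles = [term.lower() for term in terms]
    hits = []
    for name, tmx_text in memories.items():
        root = ET.fromstring(tmx_text)
        for tu in root.iter("tu"):
            # Collect the plain text of every language version in this unit.
            segments = ["".join(seg.itertext()) for seg in tu.iter("seg")]
            # All terms must occur in the same segment, source or target.
            if any(all(n in s.lower() for n in needles) for s in segments):
                hits.append((name, segments))
    return hits
```

With two TMs loaded as `{"engineering.tmx": ..., "dump.tmx": ...}`, searching for `["risk", "assess"]` would return a unit containing "the assessment of such a risk" together with the name of the TM it was found in. Because it matches substrings, there is no minimum match percentage to get in the way of short strings such as abbreviations.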

[Edited at 2007-12-31 17:02]



Rob Grayson  Identity Verified
United Kingdom
Local time: 03:53
Member
French to English
TOPIC STARTER
Thanks! Jan 4, 2008

Thank you to all for your extremely useful and interesting insights. I already felt that I needed to move away from a single mammoth TM, and thanks to your responses I now have a reasonably good idea how to do that. Since I'm in the process of (probably) changing CAT tool, now is a great time to implement a new strategy.

Special thanks to Viktoria for your detailed explanation.

Apsic Xbench sounds great - unfortunately both the CAT tool I've used up to now and the one I may be about to switch to use their own proprietary formats for TMs and glossaries, which are not recognised by Xbench. So I'm not sure whether it's worth converting every TM into a TMX or text file in order to be able to use it...

Thanks again to all,

Rob



Gregory Flanders  Identity Verified
France
Local time: 04:53
French to English
+ ...
And for managing glossaries? Jan 12, 2008

I also found the above post by Viktoria to be extremely helpful -- my question is how do you organize your glossaries then?

Do you have a glossary for each client, or just by subject?

