Clean up of translation memories?
Thread poster: Peter Berntsen

Peter Berntsen  Identity Verified
Local time: 09:32
English to Swedish
Feb 5, 2015

Does anyone have any ideas on how to go about cleaning up a large TM that contains many old terms and incorrect translations? Where do you start?


Vadim Shirkozhukhov
Russian Federation
Local time: 11:32
English to Russian
Just Edit/Delete Specific Words Feb 5, 2015

If I had to clean my translation memory (e.g., if someone paid me to do it), I would search for specific words/phrases that I consider wrong (incorrectly translated), find all instances of them in my TM and either edit or delete them manually. I would do this directly in my TM environment; in my experience, both SDL Trados and memoQ allow it.
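Outside a CAT tool, the same term search can be run on a TMX export of the TM. The Python sketch below uses only the standard library and just lists matching units so you can decide what to edit or delete. The sample TMX, the English/Swedish language codes (which must match the codes in your file exactly) and the "wrong" term "fajl" are made-up assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute used by TMX 1.4.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit TM; in practice use ET.parse("export.tmx").getroot().
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence" adminlang="en"
          creationtool="demo" creationtoolversion="1" o-tmf="demo"/>
  <body>
    <tu><tuv xml:lang="en"><seg>Open the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Öppna filen.</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Save the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Spara fajlen.</seg></tuv></tu>
  </body>
</tmx>"""


def segments(tu):
    """Map each language code in a <tu> to the plain text of its <seg>."""
    result = {}
    for tuv in tu.findall("tuv"):
        # Fall back to the older TMX "lang" attribute if xml:lang is absent.
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        # itertext() concatenates all text, including any inside inline elements such as <ph>.
        result[lang] = "".join(seg.itertext()) if seg is not None else ""
    return result


def find_term(root, term, src="en", tgt="sv"):
    """Return (source, target) pairs whose target text contains term, ignoring case."""
    needle = term.casefold()
    hits = []
    for tu in root.iter("tu"):
        segs = segments(tu)
        if needle in segs.get(tgt, "").casefold():
            hits.append((segs.get(src, ""), segs.get(tgt, "")))
    return hits


root = ET.fromstring(SAMPLE_TMX)
matches = find_term(root, "fajl")
for source, target in matches:
    print(f"{source}  =>  {target}")
```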

However, as I said, it would take someone paying me to do that. Normally, cleaning a large TM seems unproductive to me. Even cleaning out specific words takes a significant amount of time, and cleaning the whole thing would take forever: depending on what you consider large, just looking through a large TM (again, in my TM environment) could take months. And the benefits of a clean TM (one without incorrect translations) seem so small that, in my view, they do not justify the effort.


Local time: 09:32
English to Hungarian
Open ended question Feb 5, 2015

It depends on what you want/need to do. If there are specific (incorrect) terms that you need to get rid of, and simply deleting the affected segments is an acceptable solution, then you may be able to do it fairly painlessly. If you want to fix poor translations of varying types, that will take a lot of work.

For scenario 1, you could use a TM manager to batch-delete segments. You can also do it with the newest version of TMLookup (link in the TMLookup thread), but I would suggest using a TM editor instead. I believe Heartsome TMX Editor and Olifant are popular choices. I don't use them, so I can't give specific instructions, but someone else surely will. You will probably need to 1) export the TM to TMX, 2) open that TMX in your TM editor of choice, make the automated or manual changes you want and save the TMX, and 3) create a new TM in your CAT tool and import the modified TMX.
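As an alternative to a dedicated TM editor, the automated part of step 2 can be sketched in Python with the standard library: delete every translation unit whose Swedish target contains a given term, then write the result back out as TMX, ready to import in step 3. This is not how Olifant or Heartsome work internally; the sample file, language codes and the term "fajl" are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute used by TMX 1.4.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical export; in practice use ET.parse("export.tmx").getroot().
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence" adminlang="en"
          creationtool="demo" creationtoolversion="1" o-tmf="demo"/>
  <body>
    <tu><tuv xml:lang="en"><seg>Open the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Öppna filen.</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Save the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Spara fajlen.</seg></tuv></tu>
  </body>
</tmx>"""


def target_text(tu, tgt):
    """Plain text of the <seg> in the target language, or "" if missing."""
    for tuv in tu.findall("tuv"):
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        if lang == tgt and seg is not None:
            return "".join(seg.itertext())
    return ""


def delete_units_with_term(root, term, tgt="sv"):
    """Remove every <tu> whose target contains term (case-insensitive); return the count."""
    body = root.find("body")
    removed = 0
    for tu in list(body.findall("tu")):  # iterate over a copy because we mutate body
        if term.casefold() in target_text(tu, tgt).casefold():
            body.remove(tu)
            removed += 1
    return removed


root = ET.fromstring(SAMPLE_TMX)
removed = delete_units_with_term(root, "fajl")
print(removed, "unit(s) removed")

# Write the cleaned TM; pass a path such as "cleaned.tmx" instead of the buffer to save to disk.
out = io.BytesIO()
ET.ElementTree(root).write(out, encoding="utf-8", xml_declaration=True)
```

Deleting whole units keeps the sketch simple; editing the text inside `<seg>` elements that contain inline tags needs more care, which is one reason a real TMX editor is worth using.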


Silvio Picinini  Identity Verified
United States
Local time: 01:32
English to Portuguese
Criteria to decide cleanup Aug 18, 2017

Farkas above has pointed you to how to do it. Okapi Olifant is great; I don't know Heartsome, but it should work. Cleaning TMs inside CAT tools is usually painful, so prefer a TM editor. However, before the "how" comes the "should I do it?". How do you know whether you have lots of errors? I am interested in this topic if people want to share ideas.
First, I would run some of the QA checks available in QA tools like Verifika and Xbench, or in the QA features of CAT tools. You can find out whether you have many inconsistent segments, whether you are following your glossary, and a variety of other things. You may also have specific regular expressions for mandatory rules from your customer (like "our slogan should not be translated"). You can apply them to the TM and find segments that do not comply, perhaps because they were created before the rule was established.
Once you find these errors, you can count them. Then consider whether that number is significant and decide whether the cleanup is needed.
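Two of these checks, inconsistent translations (the same source with different targets) and a customer rule expressed as a regular expression, can be sketched on a TMX export in Python to get the counts mentioned above. This is a rough stand-in for what Verifika or Xbench do, not their logic; the sample units and the slogan "Acme: Work Anywhere" are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree's name for the xml:lang attribute used by TMX 1.4.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TM: one source with two different targets, and one translated slogan.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence" adminlang="en"
          creationtool="demo" creationtoolversion="1" o-tmf="demo"/>
  <body>
    <tu><tuv xml:lang="en"><seg>Open the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Öppna filen.</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Open the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Öppna dokumentet.</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Acme: Work Anywhere</seg></tuv>
        <tuv xml:lang="sv"><seg>Acme: Arbeta var som helst</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Welcome to Acme: Work Anywhere.</seg></tuv>
        <tuv xml:lang="sv"><seg>Välkommen till Acme: Work Anywhere.</seg></tuv></tu>
  </body>
</tmx>"""

# Hypothetical customer rule: the slogan must stay untranslated.
SLOGAN = re.compile(r"Acme: Work Anywhere")


def segments(tu):
    """Map each language code in a <tu> to the plain text of its <seg>."""
    result = {}
    for tuv in tu.findall("tuv"):
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        result[lang] = "".join(seg.itertext()).strip() if seg is not None else ""
    return result


def inconsistent_sources(units, src="en", tgt="sv"):
    """Sources that occur with more than one distinct target in the TM."""
    targets = defaultdict(set)
    for segs in units:
        targets[segs.get(src, "")].add(segs.get(tgt, ""))
    return {s: sorted(t) for s, t in targets.items() if len(t) > 1}


def rule_violations(units, pattern, src="en", tgt="sv"):
    """Sources that match pattern while their target has no match."""
    return [segs.get(src, "") for segs in units
            if pattern.search(segs.get(src, "")) and not pattern.search(segs.get(tgt, ""))]


units = [segments(tu) for tu in ET.fromstring(SAMPLE_TMX).iter("tu")]
inconsistent = inconsistent_sources(units)
violations = rule_violations(units, SLOGAN)
print(f"{len(units)} units, {len(inconsistent)} inconsistent source(s), "
      f"{len(violations)} rule violation(s)")
```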

I am also interested in criteria that are specific to TMs (as opposed to the checks above, which can also be applied to content you have just translated). I wonder whether purging older segments from the TM is a good idea. Also, if you (actually, your end client) have TMs for obsolete products, should you recommend that they remove those segments from the TM?
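Whether an age-based purge is wise is the open question here, but the mechanics are simple if the TM stores dates: TMX 1.4 defines optional `creationdate` and `changedate` attributes on `<tu>` in the UTC format YYYYMMDDThhmmssZ. The Python sketch below removes units last touched before a cutoff and keeps undated units for manual review. The 2012 cutoff and the sample units are arbitrary assumptions, and whether your CAT tool exports these dates should be checked in your own file.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# TMX date attributes use this UTC format, e.g. 20150205T093200Z.
TMX_DATE = "%Y%m%dT%H%M%SZ"

# Hypothetical TM: an old unit, an old unit edited later, and an undated unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence" adminlang="en"
          creationtool="demo" creationtoolversion="1" o-tmf="demo"/>
  <body>
    <tu creationdate="20090312T101500Z">
        <tuv xml:lang="en"><seg>Old model A</seg></tuv><tuv xml:lang="sv"><seg>Gammal modell A</seg></tuv></tu>
    <tu creationdate="20090312T101500Z" changedate="20160101T080000Z">
        <tuv xml:lang="en"><seg>Model B</seg></tuv><tuv xml:lang="sv"><seg>Modell B</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Model C</seg></tuv><tuv xml:lang="sv"><seg>Modell C</seg></tuv></tu>
  </body>
</tmx>"""


def last_touched(tu):
    """Latest of changedate/creationdate as a datetime, or None if the unit is undated."""
    stamps = [tu.get("changedate"), tu.get("creationdate")]
    dates = [datetime.strptime(s, TMX_DATE) for s in stamps if s]
    return max(dates) if dates else None


def purge_older_than(root, cutoff):
    """Remove units last touched before cutoff; undated units are kept."""
    body = root.find("body")
    removed = 0
    for tu in list(body.findall("tu")):  # copy, since we remove while looping
        touched = last_touched(tu)
        if touched is not None and touched < cutoff:
            body.remove(tu)
            removed += 1
    return removed


root = ET.fromstring(SAMPLE_TMX)
removed = purge_older_than(root, datetime(2012, 1, 1))
remaining = len(root.findall("body/tu"))
print(f"removed {removed}, kept {remaining}")
```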

I would appreciate hearing about it.



CafeTran Training
Local time: 09:32
Correct your TM on the fly Aug 20, 2017

Peter Berntsen wrote:

Does anyone have any ideas on how to go about cleaning up a large TM that contains many old terms and incorrect translations? Where do you start?

As others have written here, correcting a TM can be very time-consuming.

CafeTran has a nice feature that lets you use Find and Replace to make changes simultaneously in the project and in the translation memories attached to that project.

Whenever I encounter a wrong term, typo, etc. in my legacy TM (and in the project segments that were populated from it), I place the cursor in the incorrect word, press Cmd+F (Ctrl+F), type the correct replacement word and make sure the correct Scope radio buttons are selected.

CafeTran automatically adapts the case of the replacement string (when the corresponding checkbox is selected).
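To illustrate what adapting the case of the replacement means, here is a hypothetical Python sketch of a case-preserving find and replace. It is not CafeTran's code; the three rules (all caps, initial capital, otherwise as typed) are an assumption about reasonable behavior. It works on plain strings, so using it on a TM would mean applying it to each target segment.

```python
import re


def match_case(found, replacement):
    """Give replacement the same casing pattern as the text it replaces."""
    if found.isupper():
        return replacement.upper()  # FAJL -> FIL
    if found[:1].isupper():
        return replacement[:1].upper() + replacement[1:]  # Fajl -> Fil
    return replacement  # fajl -> fil


def replace_keep_case(text, old, new):
    """Case-insensitive find and replace that adapts the case of each replacement."""
    pattern = re.compile(re.escape(old), re.IGNORECASE)
    return pattern.sub(lambda m: match_case(m.group(0), new), text)


print(replace_keep_case("Fajlen sparas. Öppna fajlen. FAJL", "fajl", "fil"))
# prints: Filen sparas. Öppna filen. FIL
```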


While working in the translation project, I can also remove different translations of the same source segment (to condense the TM and avoid using different target terms).
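CafeTran lets you pick the translation to keep interactively. A batch alternative on a TMX export can be sketched in Python, keeping one unit per source text. The rule used here, "the most recently changed unit wins", is an assumption: the newest translation is not always the correct one, so reviewing the removed units is advisable. The sample units are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute used by TMX 1.4.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TM: two targets for "Open the file." with different dates.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence" adminlang="en"
          creationtool="demo" creationtoolversion="1" o-tmf="demo"/>
  <body>
    <tu creationdate="20090101T000000Z"><tuv xml:lang="en"><seg>Open the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Öppna fajlen.</seg></tuv></tu>
    <tu creationdate="20100101T000000Z"><tuv xml:lang="en"><seg>Open the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Öppna filen.</seg></tuv></tu>
    <tu creationdate="20120101T000000Z"><tuv xml:lang="en"><seg>Save the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Spara filen.</seg></tuv></tu>
  </body>
</tmx>"""


def seg_text(tu, lang):
    """Plain text of the <seg> for lang in this <tu>, or "" if absent."""
    for tuv in tu.findall("tuv"):
        code = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        if code == lang and seg is not None:
            return "".join(seg.itertext()).strip()
    return ""


def condense(root, src="en"):
    """Keep one unit per source text (the latest one); return how many remain."""
    body = root.find("body")
    kept = {}  # source text -> (timestamp, tu)
    for tu in list(body.findall("tu")):
        source = seg_text(tu, src)
        # TMX dates sort chronologically as strings; an undated unit ("") loses to any dated one.
        stamp = tu.get("changedate") or tu.get("creationdate") or ""
        previous = kept.get(source)
        if previous is None:
            kept[source] = (stamp, tu)
        elif stamp > previous[0]:
            body.remove(previous[1])
            kept[source] = (stamp, tu)
        else:
            body.remove(tu)  # on a tie, the unit that appears first in the file is kept
    return len(kept)


root = ET.fromstring(SAMPLE_TMX)
count = condense(root)
targets = [seg_text(tu, "sv") for tu in root.iter("tu")]
print(count, targets)
```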


CafeTran also offers a full-fledged TMX editor that lets you run QA tasks; two of them are especially useful here.


In this QA mode you can also run a spell check, or check for forbidden words and for the correct use of terminology from a glossary.
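A forbidden-word and glossary check of this kind can be sketched in Python on a TMX export. It is not CafeTran's implementation: it uses plain case-insensitive substring matching (so "file" would also match "profile" and inflected forms are not handled), and the glossary, forbidden list and sample units are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree's name for the xml:lang attribute used by TMX 1.4.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TM with one unit that breaks both rules below.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence" adminlang="en"
          creationtool="demo" creationtoolversion="1" o-tmf="demo"/>
  <body>
    <tu><tuv xml:lang="en"><seg>Open the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Öppna filen.</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Save the file.</seg></tuv>
        <tuv xml:lang="sv"><seg>Spara fajlen.</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Delete the folder.</seg></tuv>
        <tuv xml:lang="sv"><seg>Ta bort mappen.</seg></tuv></tu>
  </body>
</tmx>"""

GLOSSARY = {"file": "fil"}  # hypothetical: English term -> required Swedish term
FORBIDDEN = ["fajl"]        # hypothetical: words that must not appear in targets


def seg_text(tu, lang):
    """Plain text of the <seg> for lang in this <tu>, or "" if absent."""
    for tuv in tu.findall("tuv"):
        code = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        if code == lang and seg is not None:
            return "".join(seg.itertext()).strip()
    return ""


def qa_issues(root, src="en", tgt="sv"):
    """List (source, problem) pairs for glossary misses and forbidden words."""
    issues = []
    for tu in root.iter("tu"):
        source, target = seg_text(tu, src), seg_text(tu, tgt)
        src_lc, tgt_lc = source.casefold(), target.casefold()
        for s_term, t_term in GLOSSARY.items():
            if s_term in src_lc and t_term not in tgt_lc:
                issues.append((source, f"glossary: '{s_term}' should be '{t_term}'"))
        for word in FORBIDDEN:
            if word in tgt_lc:
                issues.append((source, f"forbidden word: '{word}'"))
    return issues


issues = qa_issues(ET.fromstring(SAMPLE_TMX))
for source, problem in issues:
    print(f"{source}: {problem}")
```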


CafeTran Training
Local time: 09:32
Some more suggestions Aug 21, 2017

See also:
