Anticipating terminology needs before dispatching a document to different translators
Apr 8, 2009

Hi all,

A colleague and I are about to translate a 200-page document (Word). My colleague will translate the main text (pages 1 to 94) and I, the Annexes (94 to 187).

I would like to know in advance whether we're likely to encounter the same terminology.

Should we create a kind of PDF index to check whether, for example, term A appears on pages 3 and 125? Is there any existing terminology extractor capable of doing that? Any ideas?

Any help is much appreciated.

Best regards,



PlusTools / +Extract Apr 8, 2009

Hi Mathieu,

The +Extract function in PlusTools can be a good starting point. It allows you to extract all word combinations that appear several times in the text. You can fine-tune it to some extent, so that words like "and", "or", "in", "for", "if" etc. are ignored. You will still get a fairly long list, which you may need to trim manually. It will take some time, but probably not an excessive amount.
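
For anyone who wants to see the principle without a CAT tool, here is a rough Python sketch. It is not how +Extract works internally, just the same general idea: count word combinations, skip those that start or end with a stop word, and keep only the ones found in both halves of the document. The function names and the stop-word list are my own assumptions, and the Word file is assumed to have been saved as plain text first.

```python
import re
from collections import Counter

# Assumed stop-word list; extend it for the language of your source text.
STOP_WORDS = {"and", "or", "in", "for", "if", "the", "a", "an", "of", "to"}

def candidate_terms(text, max_words=3):
    """Count word combinations of 1..max_words words, skipping stop-word edges."""
    words = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Ignore combinations that start or end with a stop word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts

def shared_terms(main_text, annex_text, max_words=3):
    """Return the candidates that occur in both the main text and the annexes."""
    main = candidate_terms(main_text, max_words)
    annex = candidate_terms(annex_text, max_words)
    return sorted(set(main) & set(annex))
```

Calling shared_terms() on the text of pages 1 to 94 and the text of the Annexes gives the list of candidates that both translators will run into. Expect plenty of noise; the list still has to be trimmed by hand.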

SDL's Term Extract does more or less the same (I have no experience with it), and several other CAT tools, including Deja Vu, offer similar functions. You may find useful hints in this thread.

I would also strongly advise you to exchange TM and terminology on a daily basis with your colleague.

Kind regards,


Manual term extraction Apr 8, 2009

I have posted about this several times in this forum. I have tried several different methods to do what you want to do, but I have come to the conclusion that no term extraction software is powerful and smart enough to automate this, and you thus need to basically do it manually. This may seem like a lot of complicated work, but it isn't so bad and it is really worth it.

A great way to establish a list of terms that are used frequently in a set of documents is to use a concordancer (I use AntConc - it's free and the best of its kind for this application) to build a list of terms. Concordancers will let you organize frequently occurring words by order of frequency. Then, you can query the concordancer for each word to find frequent terms that include that word (for terms composed of more than one word). The result of this will be a monolingual term list which you can then translate and turn into a termbase. This is what I base my termbases on.
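
To make the two steps concrete, here is a minimal Python imitation of them. It is not AntConc and makes no claim about how AntConc is implemented; the function names are hypothetical and the input is assumed to be plain text. The first function builds the frequency-ordered word list, the second finds the frequent word sequences that contain a given word.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase words (letters, inner hyphens and apostrophes)."""
    return re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())

def word_frequencies(text):
    """Step 1: list every word with its count, most frequent first."""
    return Counter(tokenize(text)).most_common()

def clusters(text, keyword, size=2):
    """Step 2: count word sequences of `size` words that contain `keyword`."""
    words = tokenize(text)
    keyword = keyword.lower()
    counts = Counter()
    for i in range(len(words) - size + 1):
        gram = words[i:i + size]
        if keyword in gram:
            counts[" ".join(gram)] += 1
    return counts.most_common()
```

You would run word_frequencies() once, pick the promising words from the top of the list, and then call clusters() on each of them with size 2 and 3 to spot the multi-word terms.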

In your case, though, since you are only working with one colleague, it may be more straightforward to create a blank termbase or glossary and add to it as you go. It is important that this termbase or glossary be maintained and updated by only one person, who will then distribute the updated version to the rest of the team. Otherwise, there may be duplicates, two very different translations for the same term, and ultimately a lot of confusion that will hinder rather than aid your work.

I think that exchanging TMs on a daily basis is not a good idea. If the units in a TM haven't been reviewed, it isn't safe to distribute it. Rather, I propose that you update the glossary or termbase and distribute that on a daily basis.

One last thing. If you do create a glossary or termbase, make sure to keep track of changes between versions. When you send an updated glossary or termbase to your colleague, send a file containing all terms added or changed since the last version as well. Then, ask the translator to quickly scan through that file and then implement any terminology changes in the translated portion of the document s/he is working on before resuming translation. Otherwise, you will hardly benefit from the glossary or termbase.
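
If the glossary is kept as a simple source-to-target list, the "what changed since the last version" file can be produced mechanically. A small sketch in Python, assuming each version has been loaded into a dictionary mapping source term to translation (the function name and that layout are my assumptions, not a feature of any particular termbase tool):

```python
def glossary_changes(old, new):
    """Compare two glossary versions given as {source term: translation} dicts."""
    # Terms that exist only in the new version.
    added = {term: tr for term, tr in new.items() if term not in old}
    # Terms whose translation differs: map to (old translation, new translation).
    changed = {
        term: (old[term], tr)
        for term, tr in new.items()
        if term in old and old[term] != tr
    }
    # Terms that were dropped from the new version.
    removed = sorted(term for term in old if term not in new)
    return added, changed, removed
```

The "changed" entries are the ones your colleague needs to search for in the pages already translated.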


A wiki Apr 8, 2009

If you create a wiki for your common termbase or glossary, both of you can access it easily, both of you can work on it, the changes are recorded, and it's always up-to-date.

suggestion Apr 8, 2009

Mathieu Jacquet wrote:

a colleague and I are about to translate a 200 page document (Word). My colleague will translate the main text (pages 1 to 94) and I, the Annexes (94 to 187).

I would add the following to the other suggestions made here: both of you should start recording the terms you have decided to use (a simple Word or Excel list will do). If they happen to be identical, fine; if not, you will at least have a collection of items that you have to decide upon and eventually correct.

My experience with up-front preparation (PlusTools +Extract, for instance): it eventually boils down to maybe one or two hundred items that are unknown, questionable, or open to discussion. This can turn out to be a lot of hard work. Besides doing it top-down (with brute-force extraction), the alternative is to do it bottom-up, as I have suggested above.
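
Comparing the two lists can be done by eye, but a few lines of Python will do it too. This is only a sketch under the assumption that each list has been exported to a dictionary of source term to chosen translation; the function name is made up for the example.

```python
def compare_term_lists(mine, theirs):
    """Split the terms both translators recorded into agreed ones and conflicts."""
    agreed, conflicts = {}, {}
    for term in sorted(set(mine) & set(theirs)):
        # Compare translations loosely: ignore case and surrounding spaces.
        if mine[term].strip().lower() == theirs[term].strip().lower():
            agreed[term] = mine[term]
        else:
            conflicts[term] = (mine[term], theirs[term])
    return agreed, conflicts
```

The conflicts dictionary is the collection of items you still have to decide upon together.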



For your case, do it manually in real time Apr 8, 2009

Mathieu Jacquet wrote:
A colleague and I are about to translate a 200 page document (Word). My colleague will translate the main text (pages 1 to 94) and I, the Annexes (94 to 187).

The problem with term extraction tools is that they're based mostly on frequency. Sure, some of them have smart crap-filtering systems (e.g. PlusTools' Extract will ignore words with X number of synonyms in a thesaurus), but mostly the terms you'll extract won't be particularly useful to you unless you're an inexperienced translator who has yet to develop a gut feeling for words.

Term extraction is only useful for very, very large projects or highly technical texts or for projects with a long deadline and a full-time term guy whose job it is to sift through reams of useless output. That is just my opinion based on my limited experience.

What you need to do is to capture all terms that you believe may be recurring terms, and write down their translations, and share these terms with each other every hour, via e-mail. For this, you need a program that makes capturing terminology easier. If you use a CAT tool for the translation, perhaps the CAT tool has a useful built-in glossary tool.

If you don't use CAT tools, I've written a tiny little program to make adding terms to a text file easier. If you want it, send me a mail.
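
To give an idea of how little is needed, here is a generic Python sketch of that kind of helper. It is not the program mentioned above, only an illustration; the function name and the tab-separated file layout are assumptions.

```python
from pathlib import Path

def add_term(path, source, target):
    """Append 'source<TAB>target' to the file unless the source term is already listed.

    Returns the translation that is on file for the source term.
    """
    file = Path(path)
    existing = {}
    if file.exists():
        for line in file.read_text(encoding="utf-8").splitlines():
            if "\t" in line:
                src, tgt = line.split("\t", 1)
                existing[src] = tgt
    if source in existing:
        # Keep the first recorded translation so the list stays consistent.
        return existing[source]
    with file.open("a", encoding="utf-8") as handle:
        handle.write(f"{source}\t{target}\n")
    return target
```

A tab-separated text file like this opens directly in Excel and is easy to e-mail back and forth.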


