Glossary extraction tool
Thread poster: Emmanuel V
Jan 19, 2010

I am trying to take the first step into building glossaries and came across an article that mentioned "term extraction from source documents".

Can anyone shed some light on whether there really are "automated" term extraction tools/software, and how they actually "choose" which words should go into a glossary?

Thank you


 

Attila Piróth  Identity Verified
France
Member
English to Hungarian
Monolingual term extraction in PlusTools Jan 19, 2010

Hi Emmanuel,

There are several such tools, including the +Extract feature of PlusTools, Wordfast's free downloadable add-on.

The tool uses a statistical approach: it selects all word combinations that occur at least a certain number of times in the document. This threshold can be set by the user -- just like the maximum length of the combination ("look for all combinations of 10 words or fewer that occur at least twice in the document").
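
To make the counting idea concrete, here is a minimal Python sketch of this kind of frequency-based extraction. It is not the +Extract code; the function name extract_candidates, its parameters, and the simple word-splitting rule are assumptions chosen purely for illustration.

```python
import re
from collections import Counter


def extract_candidates(text, max_len=10, min_count=2):
    """Return {phrase: count} for every sequence of 1..max_len words
    that occurs at least min_count times in the text.

    Hypothetical helper, not the PlusTools implementation.
    """
    # Lowercase and split into words; hyphens and apostrophes inside a word are kept.
    words = re.findall(r"\w+(?:[-']\w+)*", text.lower())

    counts = Counter()
    for n in range(1, max_len + 1):
        # Slide a window of n words over the text.
        # Simplification: windows may span sentence boundaries.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1

    # Keep only the combinations that reach the repetition threshold.
    return {phrase: c for phrase, c in counts.items() if c >= min_count}
```

Calling extract_candidates(source_text, max_len=10, min_count=2) corresponds to the setting quoted above. On a long document the result is large, which is why the list has to be trimmed afterwards.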

This purely statistical approach usually produces a long list that you need to trim manually. You can get slightly better results by excluding trivial words such as "of, and, or, with, on, in, the, for, about, ..."; the list of such stop words is also customizable. But even if you use such a list, the automatically obtained results still need to be checked and weeded.
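
One way a stop-word list can be applied is sketched below in Python. The rule used here (drop a candidate if its first or last word is a stop word) is a common heuristic and only an assumption; the thread does not say exactly how +Extract applies its list. The names STOP_WORDS and drop_stop_word_candidates are hypothetical.

```python
# Stop words taken from the examples in the post; extend as needed.
STOP_WORDS = {"of", "and", "or", "with", "on", "in", "the", "for", "about"}


def drop_stop_word_candidates(candidates, stop_words=STOP_WORDS):
    """Filter a {phrase: count} mapping of candidate terms.

    Assumed rule: a candidate is discarded when it begins or ends with
    a stop word. Stop words in the middle ("term of art") are allowed.
    """
    kept = {}
    for phrase, count in candidates.items():
        words = phrase.lower().split()
        if not words:
            continue  # ignore empty entries
        if words[0] in stop_words or words[-1] in stop_words:
            continue  # e.g. "of the", "the translation", "memory of"
        kept[phrase] = count
    return kept
```

A filter like this removes the most obvious noise, but the remaining candidates still have to be reviewed by hand.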

Some ballpark figures: if your source text is about 30,000 words, and you set the minimum number of repetitions as low as 2, you may end up with an automatic list of 1,500-2,000 candidate terms. Depending on your computer, the program takes between 5 and 20 minutes to produce this first list. After manual trimming, about 10% of these candidates (up to about 200 terms) are kept. This manual part takes about 2 hours. (These figures are just very rough estimates.)

Whether this investment of time is justified depends heavily on the specific details of the project. In my experience, it is often justified for team projects, where having the vocabulary established in advance saves time in the long run. (If terminology unification is done after translators have produced their first version, reworking their translations will be more time-consuming.) If you work alone, there is a good case for skipping this step and building the terminology database on the fly.

Kind regards,
Attila


 

Emmanuel V
TOPIC STARTER
Thanks Jan 19, 2010

Thank you very much for the input and the extra information.
Much appreciated.


 

Pablo Bouvier  Identity Verified
German to Spanish
Glossary extraction tool Jan 19, 2010

Emmanuel V wrote:

I am trying to take the first step into building glossaries and came across an article that mentioned "term extraction from source documents".

Can anyone shed some light on whether there really are "automated" term extraction tools/software, and how they actually "choose" which words should go into a glossary?

Thank you



Automatic, monolingual, free (German and English only): Beosphere

Automatic, bilingual, paid: Synchroterm

Manual, bilingual, paid: TermiDOG

[Edited at 2010-01-19 17:23 GMT]


 

