Automatic Term Extraction functionality
Thread poster: Mark Smith

Mark Smith  Identity Verified
United States
Local time: 11:30
German to English
+ ...
Jun 7, 2005

Hi,

I am writing an appraisal of different CAT software for the European Space Agency, which is interested in acquiring one of DVX, SDLX, TWB or Wordfast for their in-house translation team.

One of the things that they listed in their brief as being important was an "automatic term extraction" feature. Does this exist in any of the current generation of software?

Kind regards,
Mark Smith


alain raymond
Canada
Local time: 11:30
English to French
Term Extraction : Fusion "Terminology" Jun 7, 2005

Hello Mark,

I realize you didn't mention us, but Fusion does have term extraction and generation of possible translations. These functions are located in the "Terminology" module.

In Fusion you can quickly extract expressions from your TMs (open as many TMs as you need) or from other files in a few clicks of the mouse. Once the source expression list has been generated, a few more clicks will generate a list of possible translations found in the target text. Once you have verified and adjusted the proposed translations, you can send them to your terminology database with a few more clicks.
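
For readers wondering what "generating a list of possible translations" might involve, here is a minimal Python sketch of one naive approach: counting which target-side words turn up in the TM segments whose source side contains the term. It is an illustration only, not a description of how Fusion works; the function name `candidate_translations`, the sample segments and the scoring rule are all assumptions made up for the example.

```python
from collections import Counter

def candidate_translations(tm_pairs, source_term, top_n=3):
    """Rank target words by the share of their occurrences that fall in
    segments whose source side contains source_term (substring match)."""
    term = source_term.lower()
    with_term = Counter()   # segments containing the term that also contain the word
    overall = Counter()     # all segments that contain the word
    for source, target in tm_pairs:
        words = set(target.lower().split())
        overall.update(words)
        if term in source.lower():
            with_term.update(words)
    scored = [(word, count / overall[word], count)
              for word, count in with_term.items()]
    # Best ratio first, then most frequent, then alphabetical for stable output.
    scored.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return [word for word, _, _ in scored[:top_n]]

# Hypothetical three-segment TM (English source, French target).
tm = [
    ("Open the valve", "Ouvrez la vanne"),
    ("Close the valve", "Fermez la vanne"),
    ("Open the hatch", "Ouvrez la trappe"),
]
suggestions = candidate_translations(tm, "valve", top_n=2)
print(suggestions)
```

On this toy TM the first suggestion for "valve" is the right one, but the second is just a word that happened to sit in a matching segment, which is why proposals of this kind still have to be verified and adjusted by hand.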

Does this qualify as "automatic term extraction" for the ESA? For more information on these functions, don't hesitate to get in touch with us directly at fusion-support@proz.com; we'll be glad to give you more details.

Warm regards,


Alain
Fusion Team
GMT - 5:00


ENGSOL
German to English
+ ...
SDLPhraseFinder™ 2005 Jun 7, 2005

Mark,

Mark Smith wrote:

One of the things that they listed in their brief as being important was an "automatic term extraction" feature. Does this exist in any of the current generation of software?




I've been an enthusiastic and very satisfied SDLX user for a number of years now. I can't offer you first-hand experience of SDL's new SDLPhraseFinder™ 2005 (yet!), but here's some information on the program's automatic terminology features from the SDL website anyway:


Automating Terminology Extraction

SDLPhraseFinder is new, patent-pending, multilingual terminology extraction software that uses powerful linguistic algorithms to accurately identify terms in existing business content. It is tightly integrated with SDLTermBase™ Online, enabling up-to-date terminology definitions and translations to be maintained as a centralized resource for the business. This has a direct impact on the quality and consistency of multilingual communications.

SDLPhraseFinder outperforms statistical methods of source text analysis, because it makes use of structure of language and uses grammatical information to more accurately identify term candidates and their translations. Linguistic resources can focus on the appropriate classification of terms rather than wasting time and money filtering out poor candidates and looking for correct translations.

http://www.sdl.com/products-translation/products/sdlphrasefinder-desktop.htm

Product brief:
http://www.sdl.com/files/pdfs/SDLPhraseFinder/sdlphrasefinder-2005-product-brief-desktop.pdf

HTH

Thomas

I should maybe add: SDLPhraseFinder™ 2005 is a separate application (not part of SDLX), but it integrates with the SDL family of products, i.e. SDLX, SDLTermBase™ Online etc.

According to the product brief:

"SDLPhraseFinder can analyze a single file or a collection of files in different formats, including RTF, HTML, TXT, SDL Translation Memory and the SDL ITD format [i.e. SDLX - my note]. This latter option enables any file format to be processed by applying an SDLX file filter to create the ITD format prior to analysis by SDLPhraseFinder."

[Edited at 2005-06-07 20:39]



Mark Smith  Identity Verified
United States
Local time: 11:30
German to English
+ ...
TOPIC STARTER
Fusion / SDL Phrase Finder 2005 Jun 7, 2005

Hi both,

Thanks for your advice. From a personal point of view I am interested in hearing about this functionality in Fusion, although I cannot discuss it in the appraisal, as it is a dissertation for my MA Translation Studies and consequently has a set brief.

As for SDL PhraseFinder 2005, funnily enough I came upon it in my research shortly after posting the above. It sounds like a useful tool, and I will definitely mention it. It's all very confusing though, with the "big 3" (or whatever the number is) all having so many different versions and parts to their packages.

Thanks again,
Mark


Sonja Tomaskovic  Identity Verified
Germany
Local time: 17:30
English to German
+ ...
Wordfast + PlusTools Jun 8, 2005

Wordfast does not have this feature "built-in". However, there is a utility called PlusTools, created by the same programmer, that is capable of extracting terms from a document.

IMHO, the term extraction of PlusTools is not very good. But that is only my experience; I haven't tried any other tool for that and can't compare PT's usefulness with that of any other tool.

HTH.

Sonja



Victor Dewsbery  Identity Verified
Germany
Local time: 17:30
German to English
+ ...
Automatic what? Jun 8, 2005

I wonder what is meant by "automatic" term extraction.
How does such a program decide which terms are worth extracting?

For **automatic** extraction it presumably needs a list of words to be excluded - and a separate list for every source language involved. And a "one size fits all" exclusion list would be difficult: we can all agree that we don't need "is", "are", "and" and "but", but it's not so easy when it comes to common words such as "bank" (as noun or verb), "wheel", "body", "wall" etc., and an exclusion list for multi-word phrases would be rather tricky to define (is "dry wall" a combination of trivial words or a specialist technical term?).
And if it is to do any automatic matching between a source and target language, it will need an enormous bilingual dictionary for every language pair that may be needed.
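
To make the exclusion-list problem concrete, here is a minimal Python sketch. It is not how any of the tools in this thread work; the word lists and the rule "drop a phrase only if every word in it is excluded" are assumptions chosen for illustration.

```python
# Hypothetical, deliberately tiny exclusion list for English.
STOP_WORDS = {"is", "are", "and", "but", "the", "a", "of", "to", "in"}

def keep_candidate(phrase, stop_words=STOP_WORDS):
    """Keep a phrase unless every word in it is on the exclusion list."""
    words = phrase.lower().split()
    return bool(words) and not all(word in stop_words for word in words)

candidates = ["and", "is to", "bank", "dry wall", "the wall"]

# With only function words excluded, general vocabulary slips through.
kept = [c for c in candidates if keep_candidate(c)]

# Adding common words to the list removes the noise, and "dry wall" with it.
strict = STOP_WORDS | {"bank", "dry", "wall"}
kept_strict = [c for c in candidates if keep_candidate(c, strict)]

print(kept)
print(kept_strict)
```

The short list lets "bank" and "the wall" through; the longer list gets rid of them but also silently discards "dry wall", whether or not it is a technical term in the text at hand.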

For the record, the tool I use when I need to extract any terminology is the "lexicon" function in DVX. This can produce a list of all of the words in the source text for the job (or for a single file in the job). I can define the maximum number of words per entry. The entries can be listed in order of frequency.
Normally, I then go through the lexicon and fill in any target language equivalents which I consider useful for the job, then delete all entries for which I have not entered an equivalent.
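
For anyone who wants to see the principle outside a CAT tool, a frequency-sorted word and phrase list of this kind can be sketched in a few lines of Python. This is only an approximation of the idea, not DVX's implementation; the function name `build_lexicon`, the tokenising rule and the sample sentence are assumptions.

```python
import re
from collections import Counter

def build_lexicon(text, max_words=2):
    """List every word sequence of up to max_words words, most frequent first."""
    # Runs of letters only; digits and punctuation are treated as separators.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            # Note: this simple version lets phrases run across sentence breaks.
            counts[" ".join(tokens[i:i + n])] += 1
    # Sort by descending frequency, then alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

sample = "The solar array powers the payload. The solar array is deployed."
lexicon = build_lexicon(sample, max_words=2)
for entry, frequency in lexicon[:5]:
    print(frequency, entry)
```

As with the real lexicon, the raw list is mostly noise ("the", "the solar"), and the useful entries ("solar array") only emerge once someone goes through it and decides what to keep.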

Once I needed to do a much larger terminology list, so I created my lexicon, exported it to Word and manually deleted everything that was too general for the very technical work of the client. I eventually came up with a 19-page glossary, with target entries drawn partly from specialist dictionaries, partly from information provided by the client and partly from my own research. It took me a couple of days, but it impressed the client and provided a good general basis for jobs in that technical field.



Victor Dewsbery  Identity Verified
Germany
Local time: 17:30
German to English
+ ...
Multilingual databases Jun 8, 2005

One feature of DVX which may be relevant to your appraisal is that you can hold several languages in the same database. Depending on the way you create an entry, it is possible to store the equivalents for a specific term or sentence in several languages in the same record, so if you have defined your English term in French, German, Spanish, Italian etc. in one multilingual record, you can then also draw on that entry for a German to Spanish translation.

Two remarks on this:
1. I have never tried it myself (no need, I only handle German and English).
2. As we all know, language is often not that simple, and multilingual entries can easily leave us up the creek without a paddle.

But in a highly regulated technical environment such as ESA, there may be some benefit in this feature.



Samuel Murray  Identity Verified
Netherlands
Local time: 17:30
Member (2006)
English to Afrikaans
+ ...
ExtPhr32 by Tim Craven + Wordfast by Yves Jun 10, 2005

Mark Smith wrote:
I am writing an appraisal of different CAT software for the European Space Agency, which is interested in acquiring one of DVX, SDLX, TWB or Wordfast for their in-house translation team.

One of the things that they listed in their brief as being important was an "automatic term extraction" feature. Does this exist in any of the current generation of software?


PlusTools, which accompanies Wordfast, has a term extraction tool which can extract frequent words and phrases from text, but it is rather slow. One option frequently mentioned on other forums is to use Tim Craven's ExtPhr32 (freeware, Windows 95+), which does pretty much the same, but much, much faster. The resulting word list or phrase list can then be exported and manually translated.

Wordfast has no option to "automatically" create a bilingual list, though. But I don't think any tool can do that... it would have to be exceedingly clever. The user must always be present.

Your company seems to be rich enough, but for a cheaper solution one might combine the above-mentioned with a program like Wilbur (GPL from Redtree.com), which is basically a Desktop Search tool with advanced features (such as regex, proximity searches, and the ability to display only lines in results with match words), as follows:

* If you already have bilingual TMs, you can export the TM to a tab-delimited format (such as the one Wordfast uses). Then index it with Wilbur, and do a "preview" search so that only the lines of text in which the search word occurs are shown on screen.

* If you only have monolingual material, also use Wilbur but possibly use the full view so that you can see more context.
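
For those comfortable with a little scripting, the first option (showing only the TM lines that contain the search word) can also be sketched in Python. This is not Wilbur and not a Wordfast feature; it assumes a simplified export with the source segment in the first tab-separated column and the target segment in the second, which a real export may not match.

```python
import csv
import io

def search_tm(tm_file, word):
    """Return the (source, target) pairs in which either side contains word."""
    needle = word.lower()
    # QUOTE_NONE: treat quotation marks in segments as ordinary characters.
    reader = csv.reader(tm_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        (row[0], row[1])
        for row in reader
        if len(row) >= 2 and any(needle in cell.lower() for cell in row[:2])
    ]

# Hypothetical two-line export; a real one would be opened with open(path, encoding=...).
sample = io.StringIO(
    "Open the valve\tOuvrez la vanne\n"
    "Close the hatch\tFermez la trappe\n"
)
hits = search_tm(sample, "valve")
for source, target in hits:
    print(source, "=>", target)
```

Seeing source and target side by side for every hit is what makes it practical to pick out the translation of a term by eye.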



Simon Sobrero  Identity Verified
United Kingdom
Local time: 16:30
Italian to English
+ ...
SDL Phrasefinder Nov 11, 2007

Hi Mark,

Have a look at my posting on:

http://www.proz.com/topic/72526?pg=e

Basically, I have bought SDL Phrasefinder but I have been unhappy with its bilingual terminology extraction.

Do you have any more information from your studies? I would love to hear from you!

Thanks,
Simon


David Turner  Identity Verified
Local time: 17:30
French to English
+ ...
All under the same roof Nov 19, 2007

Victor Dewsbery wrote:
For the record, the tool I use when I need to extract any terminology is the "lexicon" function in DVX. This can produce a list of all of the words in the source text for the job (or for a single file in the job). I can define the maximum number of words per entry. The entries can be listed in order of frequency.


And the DVX Lexicon Builder is a built-in feature in the normal translation environment.
It's not a separate add-on like the other tools mentioned.
BR,
David



ViktoriaG  Identity Verified
Canada
Local time: 11:30
English to French
+ ...
Good term extraction - free! Nov 25, 2007

I have found a good way to extract terms from monolingual documents when preparing for translation. I use across. It is free for freelancers on the condition that the freelancer is willing to be added to the software vendor's database, which is in turn used by outsourcers to find freelancers for their projects (outsourcers, on the other hand, pay a hefty sum for the software).

across has a handy term extraction feature. It is very simple to use and I find it much more efficient than PlusTools (I find that most of its results are garbage). First, you load the translatable document (create a translation project and make sure you specify that terminology work needs to be done before starting translation). Then, the first task proposed by across will be term extraction. When you run this, across automatically finds all term candidates in the file (it takes a little while - I did the test on a 40K-word Word document and it took about ten minutes). Then, you review the list and check all terms that you want to keep - the rest will be discarded. Once you are done with this task, the next one that across proposes is term translation. You get the list of terms you defined earlier and you can enter the translation for each - a termbase will be created based on this data. The neat thing is that you can ask to see the context within the translatable document for each term in the list of terms, and you can see all this in the same window. Once the termbase is created, you can export it as a CSV or TBX file, so you can either use the CSV as a reference document or create a termbase for use with other CAT tools.
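
The "see each term in context" step is easy to picture with a small Python sketch. It only illustrates the general idea of a context view and says nothing about how across implements it; the function name `contexts`, the `width` parameter and the sample text are assumptions.

```python
import re

def contexts(text, term, width=30):
    """Return each occurrence of term with up to width characters on either side."""
    snippets = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end])
    return snippets

document = "Check the fuel valve. Then close the fuel valve slowly."
snippets = contexts(document, "fuel valve", width=10)
for snippet in snippets:
    print("...", snippet, "...")
```

Having the surrounding words in view is what lets you decide quickly whether a candidate is really a term and how it should be translated.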

I find that the results are awesome! The software really seems to look for meaningful phrases, not just count the number of times any string (meaningful or meaningless) appears in a document. So far, this is the fastest and most straightforward way I've found to build lists of meaningful terms.

[Edited at 2007-11-25 18:41]



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 16:30
Member (2009)
Dutch to English
+ ...
term extraction in Across Sep 7, 2012

Hello Viktoria,

Can you perhaps elaborate on how one goes about 'specifying that terminology work needs to be done before starting translation'?

I installed Across to try out its term extraction, but can't seem to figure out how and where in Across to start the term extraction module...

Michael



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 16:30
Member (2009)
Dutch to English
+ ...
figured it out already! Sep 7, 2012

'Choose from Several Workflow-Templates

The workflow template determines which task types Across identifies for a text, for example, "terminology work", "translation", and "correction." Before you check in new documents, check that the correct workflow template has been defined under Tools -> Profile Settings -> Project Wizard settings under the entry Workflow template.' (http://www.doku.info/doku_article_332.html )

... you need to make sure that you select the right 'workflow' before you create a project: the one called something like 'terminology > translation'. Then, when you create a new project and open it, the first step Across will present you with is term extraction!

Michael



Sone-Ngole Alvin Ngole  Identity Verified
Local time: 16:30
English to French
+ ...
I tried Across too, it works well Dec 9, 2012

I just tried Across too, and I am quite satisfied with it.
Some troubles: the installation file is very heavy (1GB). After installation, Across took several minutes to open for the first time. I had already started uninstalling it when the window finally opened.
It opened quite fast the second time.



Sadie Scapillato
Local time: 11:30
French to English
There's a way to update the Workflow type Mar 13, 2013

Michael Beijer wrote:


'Choose from Several Workflow-Templates

The workflow template determines which task types Across identifies for a text, for example, "terminology work", "translation", and "correction." Before you check in new documents, check that the correct workflow template has been defined under Tools -> Profile Settings -> Project Wizard settings under the entry Workflow template.' (http://www.doku.info/doku_article_332.html )

... you need to make sure that you select the right 'workflow' before you create a project: the one called something like 'terminology > translation'. Then, when you create a new project and open it, the first step Across will present you with is term extraction!

Michael


I stumbled upon a way to update your workflow type. Under Projects (not My Projects) in the left-hand column, select one of your tasks, then click on Properties. In the window that opens, there's a dropdown menu called Workflow. I changed my original selection to Terminology work and Translation. It worked!

(I'm using Across Personal Edition v5.5)



