A free term extraction tool?
Thread poster: eileenPL

eileenPL
Local time: 00:52
Polish to English
Jan 28

Hi all,

could anyone recommend a free extraction tool for terms and phrases that can be used offline?

Regards,
E.



[Edited at 2018-01-28 21:21 GMT]


 

Jason_P
South Korea
Local time: 07:22
English to Korean
It's not for online use, though, Jan 29

https://www.proz.com/forum/sdl_trados_support/322396-word_cloud.html

 

Danesh
Local time: 02:22
English to Farsi (Persian)
+ ...
Okapi Rainbow Jan 29

Rainbow is a GUI application for launching various utilities related to translation and localization tasks, such as text extraction (to XLIFF, OmegaT projects, RTF, etc.) and merging, pre-translation, encoding conversion, term extraction, file format conversion, quality verification, translation comparison, search and replace on filtered text, pseudo-translation, and much more. Using the framework's pipeline mechanism, you can use Rainbow to create chains of steps that perform a custom set of tasks specific to your needs.
http://okapiframework.org/wiki/index.php?title=Rainbow


 

Samuel Murray  Identity Verified
Netherlands
Local time: 00:52
Member (2006)
English to Afrikaans
+ ...
@Eileen Jan 29

eileenPL wrote:
Could anyone recommend a free extraction tool for terms and phrases...?


From what data sources will you be extracting?


 

Rossana Triaca  Identity Verified
Uruguay
Local time: 19:52
Member (2002)
English to Spanish
In addition to Okapi... Jan 29

you can also use the old free version of Xbench.

But the tool I like the most is AntConc; it's not exactly a term extraction tool, but the word list and, particularly, the "keyness" lists are great for this purpose, and the video tutorials are easy for newcomers to corpus analysis to follow.

For specialized texts, I find that having a baseline like the Brown corpus is invaluable!
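To give a rough idea of what a "keyness" list does, here is a minimal Python sketch that ranks words by log-likelihood against a reference corpus. It is only an illustration under my own assumptions, not AntConc's code: the function name `keyness`, the `min_freq` parameter and the toy token lists are made up for the example, and a real baseline would be something the size of the Brown corpus.

```python
import math
from collections import Counter

def keyness(target_tokens, reference_tokens, min_freq=2):
    """Rank words by log-likelihood keyness (target text vs. reference corpus)."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_t = sum(target.values())      # size of the target text in tokens
    n_r = sum(reference.values())   # size of the reference corpus in tokens
    scores = {}
    for word, a in target.items():
        if a < min_freq:
            continue
        b = reference.get(word, 0)
        # Expected frequencies if the word were equally common in both corpora.
        e_t = n_t * (a + b) / (n_t + n_r)
        e_r = n_r * (a + b) / (n_t + n_r)
        # Keep only words that are over-represented in the target text.
        if a <= e_t:
            continue
        ll = 2 * (a * math.log(a / e_t) + (b * math.log(b / e_r) if b else 0.0))
        scores[word] = ll
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: a "specialized" text and a tiny general baseline.
target = "the valve the pump the valve seal valve".split()
reference = "the cat sat on the mat and the dog saw the cat".split()
for word, score in keyness(target, reference):
    print(f"{word}\t{score:.2f}")
```

Words that are frequent in your text but rare in the baseline float to the top, which is why common words like "the" end up with a score close to zero.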


 

eileenPL
Local time: 00:52
Polish to English
TOPIC STARTER
.doc files Jan 29

Samuel Murray wrote:

eileenPL wrote:
Could anyone recommend a free extraction tool for terms and phrases...?


From what data sources will you be extracting?


I'd like to extract phrases from .doc files. I'm a Trados user, but when I think of manually copying terms from files into the translation memory... it'd take ages!



Thanks for all the replies!

[Edited at 2018-01-29 20:07 GMT]


 

Jason_P
South Korea
Local time: 07:22
English to Korean
try mine. Jan 30

eileenPL wrote:
I'd like to extract phrases from .doc files. I'm a Trados user, but ...

It works well with SDL Trados Studio project files, which means you can use it for all the file formats that SDL Trados Studio accepts.

regards


 

David Turner  Identity Verified
Local time: 00:52
French to English
+ ...
You could try my PhraseMiner tool: Jan 30

eileenPL wrote:

Could anyone recommend a free extraction tool for terms and phrases...?


http://asap-traduction.com/PhraseMiner

From a Word document, it extracts "internal" or "intra-document" fuzzy matches, sentences containing 5 or more common consecutive words, sentences that are subsegments of longer sentences, terms of two or more words that appear two or more times (filtered using stop-word lists), sentences containing two or more of such extracted terms, etc.
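For anyone curious how the multi-word term step can work in principle, here is a minimal Python sketch. It is not PhraseMiner's actual implementation, just an illustration under stated assumptions: the function name `candidate_terms`, its parameters and the tiny `STOP_WORDS` set are all hypothetical, and the text is assumed to have been saved from Word as plain text first.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or",
              "is", "are", "for", "with", "be", "must"}

def candidate_terms(text, min_words=2, max_words=4, min_count=2):
    """Return multi-word candidates that occur at least min_count times."""
    counts = Counter()
    # Split into sentences so candidates never span a sentence boundary.
    for sentence in re.split(r"[.!?;:\n]+", text.lower()):
        words = re.findall(r"\w+(?:[-']\w+)*", sentence)
        for n in range(min_words, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # A term may contain stop words inside, but not at either end.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    return {term: c for term, c in counts.items() if c >= min_count}

sample = ("The pressure relief valve is closed. "
          "Check the pressure relief valve daily. "
          "The valve seat is worn.")
for term, count in sorted(candidate_terms(sample).items()):
    print(count, term)
```

Rejecting candidates that start or end with a stop word is a simple way to keep phrases like "pressure relief valve" while dropping fragments like "the pressure" or "valve is".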

David Turner


 

Arianne Farah  Identity Verified
Canada
Local time: 18:53
Member (2008)
English to French
There's a new free app in the SDL store Jan 30

It works quite well. It takes a little tweaking at first to remove common words, but you can save those custom exclusion dictionaries and reuse them, so you'll only have to exclude 'that', 'which', 'will', 'allow', etc. once. You set the minimum word length and the minimum number of times a word must be repeated, and it generates a word cloud, which you can then whittle down before creating an sdlxliff file from it to populate a glossary. It sounds like a lot of steps, but it's quite painless.


http://appstore.sdl.com/language/app/projecttermextract/817/
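The filtering idea behind that kind of word list (minimum word length, minimum number of repetitions, reusable exclusion list) can be sketched in a few lines of Python. This is only an illustration and has nothing to do with the app's own code; `frequent_words` and its parameter names are hypothetical.

```python
import re
from collections import Counter

def frequent_words(text, min_length=4, min_count=2, exclusions=frozenset()):
    """List single words that pass the length, frequency and exclusion filters."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(
        w for w in words if len(w) >= min_length and w not in exclusions
    )
    # most_common() sorts by descending frequency.
    return [(w, c) for w, c in counts.most_common() if c >= min_count]

# The exclusion set plays the role of a saved exclusion dictionary.
common = {"that", "which", "will", "allow"}
sample = "The gasket will allow flow. Replace the gasket that will leak. Gasket flow rate."
print(frequent_words(sample, exclusions=common))
```

Because it counts single words only, a sketch like this cannot find multi-word phrases.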


 

Jason_P
South Korea
Local time: 07:23
English to Korean
functionality limited. Jan 31

Arianne Farah wrote:

It works quite well - takes a little tweaking at first to remove common words, but you can ...
http://appstore.sdl.com/language/app/projecttermextract/817/


It cannot extract multi-word terms (two- or three-word phrases).
It only sees single words.

[Edited at 2018-01-31 10:18 GMT]


 

stevebpdx
United States
'Bilingual' Term Extraction? - Rainbow May 1

Does Rainbow do 'Bilingual' Term Extraction?

It doesn't appear that it does. Just source term extraction.


 

DZiW
Ukraine
English to Russian
+ ...
https://www.wordfast.net/wiki/PlusTools May 1

Besides AntConc, for .doc files I prefer the free and flexible PlusTools, although it also requires filtering by weighted words and collocations.

David, it's not very convenient to have to request software by email, but does it support Russian (Unicode/UTF)?
TY


 

