What maximum TMX corpus size for term extraction with MultiTerm Extract?
Thread poster: Happylion
Jan 6, 2010

Hi all,
I want to run 'massive' term extraction on huge TMX files (containing between 80,000 and 150,000 TUs and weighing around 50 to 85 MB) using MultiTerm Extract 2007.
The software seems to stop working after completing around 29% of the extraction pass on such big TMXs.
After several attempts, progress always stalls at the same stage, for different TMXs of roughly the same size.
Can anyone advise on the maximum recommended size of TMX/corpus/bilingual files for TE?
Or suggest a different input format that is more likely to let TE run to completion?
Or suggest a tool to split a TMX so that TE can be run on the smaller files?

Note that I can afford to wait several hours for a long TE process to complete, but I don't want to wait 24 hours only to find that the process still hasn't finished...

Any suggestions most welcome!
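On the question of splitting a TMX: a short script can cut one large file into several smaller ones. The sketch below is not a MultiTerm feature and has not been tested with MultiTerm Extract 2007; it simply assumes that smaller files are more likely to get through extraction. It uses only Python's standard `xml.etree.ElementTree`, copies the original `<tmx>` attributes and `<header>` into every part, and writes UTF-8 output (an assumption: check that your MultiTerm version accepts UTF-8 TMX). The function name `split_tmx`, the default of 20,000 TUs per file, and the `_part001.tmx` naming are all hypothetical choices.

```python
import copy
import math
import xml.etree.ElementTree as ET
from pathlib import Path


def split_tmx(src, out_dir, tus_per_file=20000):
    """Split a TMX file into smaller TMX files (hypothetical helper).

    Each output file keeps the original <tmx> attributes and <header>
    and receives at most `tus_per_file` <tu> elements from <body>.
    Returns the list of written file paths.
    """
    tree = ET.parse(src)  # loads the whole file into memory
    root = tree.getroot()
    header = root.find("header")
    body = root.find("body")
    if body is None:
        raise ValueError("not a TMX file: no <body> element")
    tus = body.findall("tu")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(src).stem
    n_chunks = max(1, math.ceil(len(tus) / tus_per_file))

    written = []
    for i in range(n_chunks):
        chunk = tus[i * tus_per_file:(i + 1) * tus_per_file]
        # New <tmx> root with the same attributes (e.g. version="1.4").
        new_root = ET.Element("tmx", dict(root.attrib))
        if header is not None:
            new_root.append(copy.deepcopy(header))
        new_body = ET.SubElement(new_root, "body")
        new_body.extend(chunk)
        out_path = out_dir / f"{stem}_part{i + 1:03d}.tmx"
        # xml:lang attributes are written back with the "xml:" prefix.
        ET.ElementTree(new_root).write(out_path, encoding="utf-8",
                                       xml_declaration=True)
        written.append(out_path)
    return written
```

For example, `split_tmx("big.tmx", "parts", tus_per_file=20000)` would turn a 150,000-TU file into eight parts that can be fed to term extraction one by one. Parsing the whole file at once is the simplest approach; for files of 50 to 85 MB it should fit in memory on most machines, but a streaming parser would be the safer choice for much larger corpora.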




Yasmina Ait Ali
English to French
Hi Vince Oct 24, 2012

I have the same problem. Did you, or anyone else, find a solution?

Thanks in advance.


Nope, no solution Oct 24, 2012

Hi Yasmina,
I was very surprised to receive a follow-up to my rather old post. To be honest, I didn't even remember posting the question.
Unfortunately, I never received an answer from anyone.
So instead of automating term harvesting from the big corpus, I manually extracted a few hundred terms.
This is easy to do after converting the TMX into an XLS file. First, delete very short segments that contain no valuable information; then sort the remaining segments alphabetically and go through the list, extracting the terms manually.
This may take a few hours, but reviewing the output of automated term extraction takes a while too. And with manual extraction, you can be sure that every term you pick is relevant.
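The manual workflow above can be partly scripted. The sketch below is one possible approach using only Python's standard library: it reads the TMX directly, drops segments shorter than a hypothetical `min_words` threshold (3 words by default, an arbitrary choice), sorts the pairs alphabetically by source text, and writes a CSV file that Excel can open, instead of a native XLS. The function names, the language-prefix matching (`"en"` matches `EN-US`), and the threshold are illustrative assumptions. Text inside inline tags such as `<ph>` or `<bpt>` is skipped, because in TMX those usually hold formatting codes rather than translatable text.

```python
import csv
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg):
    """Text of a <seg>, skipping content stored inside inline tags."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")  # text that follows the inline tag
    return " ".join("".join(parts).split())  # normalise whitespace


def tmx_to_sorted_rows(src, src_lang, tgt_lang, min_words=3):
    """Return (source, target) pairs without short segments, sorted A-Z."""
    rows = []
    for tu in ET.parse(src).getroot().iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older TMX 1.1 files use plain "lang".
            code = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                texts[code] = seg_text(seg)
        source = next((t for c, t in texts.items()
                       if c.startswith(src_lang.lower())), "")
        target = next((t for c, t in texts.items()
                       if c.startswith(tgt_lang.lower())), "")
        if len(source.split()) >= min_words:  # drop very short segments
            rows.append((source, target))
    rows.sort(key=lambda r: r[0].casefold())  # case-insensitive A-Z
    return rows


def write_csv(rows, out_path, src_lang, tgt_lang):
    # utf-8-sig adds a BOM so Excel displays accented characters correctly.
    with open(out_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow([src_lang, tgt_lang])
        writer.writerows(rows)
```

Usage would look like `write_csv(tmx_to_sorted_rows("big.tmx", "en", "fr"), "big.csv", "en", "fr")`. Sorting also groups near-duplicate segments together, which makes the manual pass through the list quicker.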

