What max size of TMX corpus to run term extract with Multiterm Extract
Thread poster: Happylion
Jan 6, 2010

Hi all,
I'd like to run 'massive' term extraction on huge TMX files (containing between 80,000 and 150,000 TUs and weighing around 50 to 85 MB) using Multiterm Extract 2007.
The software seems to stop working after processing around 29% of the extraction pass on such big TMXs.
Over several attempts, the progress has always stalled at the same stage, for different TMXs of around the same size.
Can anyone advise on the maximum recommended size of TMX/corpus/bilingual files for term extraction (TE)?
Or suggest a different format with a better chance of running TE to the end?
Or suggest a tool to split a TMX, so that TE can be run on the split files?

Note that I can afford to wait several hours for a long TE process to complete, but I don't want to wait 24 hours only to find that the process has still not finished...

Any suggestions most welcome!
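
As a rough illustration of the splitting idea raised above, here is a minimal Python sketch that cuts a TMX into smaller files using only the standard library. The function names (`split_tmx`, `split_tmx_file`), the 20,000-TU chunk size and the output file naming are assumptions made for this example, not something Multiterm Extract requires or documents. It assumes a standard layout (`<tmx>` containing a `<header>` and a `<body>` of `<tu>` elements), loads the whole file into memory, and does not carry over any DOCTYPE declaration.

```python
import copy
import xml.etree.ElementTree as ET


def split_tmx(root, max_tus):
    """Return a list of new <tmx> root elements holding at most max_tus <tu> each."""
    header = root.find("header")
    tus = root.find("body").findall("tu")
    parts = []
    for start in range(0, len(tus), max_tus):
        # Each part keeps the original <tmx> attributes (e.g. version).
        part = ET.Element("tmx", dict(root.attrib))
        if header is not None:
            # Copy the header so every part is a complete TMX on its own.
            part.append(copy.deepcopy(header))
        body = ET.SubElement(part, "body")
        body.extend(tus[start:start + max_tus])
        parts.append(part)
    return parts


def split_tmx_file(path, max_tus=20000):
    """Split the TMX at `path` and write numbered part files next to it."""
    root = ET.parse(path).getroot()
    out_paths = []
    for i, part in enumerate(split_tmx(root, max_tus), start=1):
        out = f"{path}.part{i:03d}.tmx"  # hypothetical naming scheme
        ET.ElementTree(part).write(out, encoding="utf-8", xml_declaration=True)
        out_paths.append(out)
    return out_paths
```

Calling `split_tmx_file("corpus.tmx", max_tus=20000)` on a 100,000-TU file would write five part files. Whether parts of that size get past the 29% stall is untested; the chunk size would need some trial and error.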


Yasmina Ait Ali
Hi Vince Oct 24, 2012

I have the same problem. Did you or somebody else find a solution for this problem?

Thanks in advance.


Nope, no solution Oct 24, 2012

Hi Yasmina,
I'm very surprised to receive a follow-up to my quite old post. To be honest, I did not even remember having posted such a question.
Unfortunately, I did not get an answer from anyone.
So instead of automating term harvesting from a big corpus, I manually extracted a few hundred terms.
This can be done easily after converting the TMX into an XLS file. First you delete the very short segments that do not contain any valuable information, then you sort the rest in alphabetical order and go through the list to extract the terms manually.
This may take a few hours, but in the end, revising the output of automated term extraction tools takes a while too. And with manual extraction, you can be sure the terms are well chosen.
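
For anyone who wants to script the preparation steps described above (convert the TMX to a spreadsheet, drop very short segments, sort alphabetically), here is a minimal Python sketch. It writes a CSV file, which Excel can open, rather than a true XLS. The function names, the `min_chars` threshold of 4 and the language-code prefix matching are assumptions made for this example. Inline tags inside a segment are simply flattened to their text content.

```python
import csv
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tuv_text(tu, lang):
    """Return the segment text of the first <tuv> whose language starts with `lang`."""
    for tuv in tu.findall("tuv"):
        # Older TMX versions use a plain "lang" attribute instead of xml:lang.
        code = tuv.get(XML_LANG) or tuv.get("lang") or ""
        seg = tuv.find("seg")
        if seg is not None and code.lower().startswith(lang.lower()):
            return "".join(seg.itertext()).strip()
    return None


def tmx_to_rows(root, src_lang, tgt_lang, min_chars=4):
    """Collect (source, target) pairs, skip very short sources, sort alphabetically."""
    rows = []
    for tu in root.iter("tu"):
        src = tuv_text(tu, src_lang)
        tgt = tuv_text(tu, tgt_lang)
        if src and tgt and len(src) >= min_chars:
            rows.append((src, tgt))
    return sorted(rows, key=lambda row: row[0].lower())


def write_csv(rows, path):
    """Write the pairs to a CSV file; utf-8-sig helps Excel detect the encoding."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "target"])
        writer.writerows(rows)
```

A typical run would be `write_csv(tmx_to_rows(ET.parse("corpus.tmx").getroot(), "en", "fr"), "corpus.csv")`. Going through the sorted list to pick out the terms is still done by hand.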
