Maximum TMX corpus size for term extraction with MultiTerm Extract?
Thread poster: Happylion
Jan 6, 2010

Hi all,
I want to run 'massive' term extraction (TE) on huge TMX files (80,000 to 150,000 TUs each, weighing around 50 to 85 MB) using MultiTerm Extract 2007.
The software seems to stop working after processing around 29% of the extraction pass on such big TMXs.
Across several attempts, the progress stalled at the same stage for different TMXs of around the same size.
Can anyone advise on the maximum recommended size of a TMX/corpus/bilingual file for TE?
Or suggest a different format that has a better chance of running TE through to the end?
Or suggest a tool to split a TMX so that I can run TE on the split files?

Note that I can afford to wait several hours for a long TE process to complete, but I don't want to wait 24 hours only to find that the process still has not finished.

Any suggestions most welcome!

Regards.

Vince
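
On the last question (splitting a TMX so that TE can be run on smaller files), here is a minimal sketch in Python using only the standard library. It is not a feature of MultiTerm Extract; the function name `split_tmx`, the default chunk size of 20,000 TUs, and the output file naming are all assumptions made for illustration. It also assumes the TMX has the usual `<tmx><header/><body><tu>...` layout and fits in memory, which should be plausible for files in the 50 to 85 MB range. Any DOCTYPE declaration in the original file is not copied to the output.

```python
import copy
import xml.etree.ElementTree as ET


def split_tmx(src_path, chunk_size=20000, prefix="part"):
    """Write consecutive groups of <tu> elements to separate TMX files.

    chunk_size and prefix are illustrative defaults, not recommended values.
    Returns the list of files written.
    """
    root = ET.parse(src_path).getroot()
    header = root.find("header")
    units = root.findall("./body/tu")
    out_paths = []
    for start in range(0, len(units), chunk_size):
        # Rebuild a small TMX document: same root attributes, same header.
        new_root = ET.Element("tmx", dict(root.attrib))
        if header is not None:
            new_root.append(copy.deepcopy(header))
        body = ET.SubElement(new_root, "body")
        body.extend(units[start:start + chunk_size])
        out_path = f"{prefix}_{start // chunk_size + 1:03d}.tmx"
        ET.ElementTree(new_root).write(
            out_path, encoding="utf-8", xml_declaration=True
        )
        out_paths.append(out_path)
    return out_paths
```

Calling `split_tmx("corpus.tmx", chunk_size=20000)` on a hypothetical `corpus.tmx` would write `part_001.tmx`, `part_002.tmx`, and so on. Whether a given chunk size gets past the point where the extraction stalls would have to be found by trial.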


 
Yasmine Aitali (X)
Canada
English to French
Hi Vince Oct 24, 2012

I have the same problem. Did you or somebody else find a solution for this problem?

Thanks in advance.


 
Happylion
TOPIC STARTER
Nope, no solution Oct 24, 2012

Hi Yasmine,
I was very surprised to receive a follow-up to my quite old post. To be honest, I did not even remember having posted such a question.
Unfortunately, I did not receive an answer from anyone.
So instead of automating term harvesting from a big corpus, I manually extracted a few hundred terms.
This can be done easily after converting the TMX into an XLS file. First you delete very short segments that do not contain any valuable info, then you sort the rest in alphabetical order and go through the list to extract the terms manually.
This may take a few hours, but in the end, revising the output of automated term extraction tools takes a while too. And you can be sure that manually extracted terms are well chosen.
Cheers.
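
The preparation steps described in this post (convert the TMX to a spreadsheet, drop very short segments, sort alphabetically) could be scripted roughly as follows. This is only a sketch under stated assumptions: it writes CSV rather than XLS, it treats the first `<tuv>` of each TU as the source and the second as the target, and the `min_chars=4` cutoff for "very short" is an arbitrary illustrative value. The function names `tmx_to_rows` and `write_csv` are hypothetical. Inline tags inside `<seg>` are discarded and only their text is kept.

```python
import csv
import xml.etree.ElementTree as ET


def tmx_to_rows(tmx_path, min_chars=4):
    """Return sorted, de-duplicated (source, target) pairs from a TMX file.

    Assumes the first <tuv> is the source and the second is the target.
    min_chars is an illustrative threshold for "very short" segments.
    """
    rows = set()
    for tu in ET.parse(tmx_path).getroot().iter("tu"):
        segs = []
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text nested inside inline tags.
            text = "".join(seg.itertext()).strip() if seg is not None else ""
            segs.append(text)
        if len(segs) >= 2 and len(segs[0]) >= min_chars:
            rows.add((segs[0], segs[1]))
    # Case-insensitive alphabetical order on the source segment.
    return sorted(rows, key=lambda r: (r[0].lower(), r))


def write_csv(rows, csv_path):
    """Write the pairs to a CSV file that a spreadsheet program can open."""
    # utf-8-sig adds a BOM so that Excel detects the encoding.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)
```

For example, `write_csv(tmx_to_rows("corpus.tmx"), "corpus.csv")` on a hypothetical `corpus.tmx` would produce a two-column list ready for manual review. The term selection itself remains a manual step, as described above.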


 

