Reducing the size of a Translation Memory (TMX)
Thread poster: Nelson Yemeli

Nelson Yemeli  Identity Verified
France
Member (2016)
English to French
Aug 1, 2015

Hello dearest colleagues!
I have a translation memory made up of more than one million translation units, with a total size of almost 5 GB. The TM is very rich, but it is also almost unusable because of its size. When I want to import the TMX or create AutoSuggest dictionaries, the processes are too slow. So I wonder whether there is any way to reduce the size of the TMX file and make it easier to work with.
I am using SDL Trados Studio 2014 on Windows 7.
Thanks in advance.



Umang Dholabhai  Identity Verified
India
Local time: 15:58
Member
English to Gujarati
+ ...
At Openexchange Aug 1, 2015

OpenExchange has something that may help you. You may have to spend 35 euros, because it is free only up to 50,000 units.

Here is the link:
http://www.translationzone.com/openexchange/app/sdltmconvert-522.html



Nelson Yemeli  Identity Verified
France
Member (2016)
English to French
TOPIC STARTER
Humm! Aug 1, 2015

Thanks a million, Umang! But I think 35 euros is even bigger than my TM! :-)
I hope there is a cheaper solution.

[Edited at 2015-08-01 08:31 GMT]



Soonthon LUPKITARO(Ph.D.)  Identity Verified
Thailand
Local time: 17:28
Member (2004)
English to Thai
+ ...
TM Maintenance Aug 1, 2015

Nelson Yemeli wrote:

I have a translation memory made up of more than one million translation units, with a total size of almost 5 GB. The TM is very rich, but it is also almost unusable because of its size. When I want to import the TMX or create AutoSuggest dictionaries, the processes are too slow. So I wonder whether there is any way to reduce the size of the TMX file and make it easier to work with.
I am using SDL Trados Studio 2014 on Windows 7.


Trados provides TM maintenance functions for filtering by field values, deleting TUs, and so on. Please refer to the page below to reduce the TM size:

http://producthelp.sdl.com/sdl_trados_studio_2014/client_en/tm_view/TM_Data/TM_Overview_Managing_Translation_Memory_Data.htm

Soonthon L.



Nelson Yemeli  Identity Verified
France
Member (2016)
English to French
TOPIC STARTER
I need to reduce before opening Aug 1, 2015

Soonthon LUPKITARO(Ph.D.) wrote:

Nelson Yemeli wrote:

I have a translation memory made up of more than one million translation units, with a total size of almost 5 GB. The TM is very rich, but it is also almost unusable because of its size. When I want to import the TMX or create AutoSuggest dictionaries, the processes are too slow. So I wonder whether there is any way to reduce the size of the TMX file and make it easier to work with.
I am using SDL Trados Studio 2014 on Windows 7.


Trados provides TM maintenance functions for filtering by field values, deleting TUs, and so on. Please refer to the page below to reduce the TM size:

http://producthelp.sdl.com/sdl_trados_studio_2014/client_en/tm_view/TM_Data/TM_Overview_Managing_Translation_Memory_Data.htm

Soonthon L.


Dear Soonthon, I don't think I can edit a TM in Trados without opening it first, and it takes hours just to open it. Whenever I try, Trados tells me that I must upgrade the TM, and upgrading also takes hours.



Meta Arkadia
Local time: 17:28
English to Indonesian
+ ...
5 GB is large... Aug 2, 2015

Nelson Yemeli wrote:
I need to reduce before opening


... too large for most CAT tools to handle. It's also way too large for "only" a million segments, unless there are heaps of metadata or, more likely, lots of languages.

I suppose you need only one language pair, and (almost) no metadata. In that case, I suggest using either András' free TMLookup or the free version of CafeTran to extract those two languages. Both can import the two languages you need into an SQLite database. You would then need a tool that opens the SQLite database and exports the table to a format that a CAT tool can import, like CSV or Excel, so that you can get a TMX file again. I use SQLite Browser - again free - for that purpose.
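
For anyone who prefers a script to the SQLite detour, here is a minimal sketch of the same idea: stream through the TMX and write only the two languages you need to a CSV file. It is not how TMLookup or CafeTran work internally; the function name `extract_pair` is made up for illustration, and it assumes a plain TMX with `<tu>`, `<tuv xml:lang="...">` and `<seg>` elements. It matches languages on the primary subtag only (so "en" also matches "EN-GB") and drops inline tags inside segments.

```python
import csv
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def primary_lang(tuv):
    """Return the primary language subtag of a <tuv>, e.g. 'en' for 'EN-GB'."""
    code = tuv.get(XML_LANG) or tuv.get("lang") or ""
    return code.lower().split("-")[0]


def extract_pair(tmx_path, src, tgt, csv_path):
    """Stream a TMX file and write (source, target) rows for one language pair.

    Assumption: segments are taken as plain text; inline tags are dropped.
    Returns the number of rows written.
    """
    src, tgt = src.lower(), tgt.lower()
    written = 0
    body = None
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for event, elem in ET.iterparse(tmx_path, events=("start", "end")):
            if event == "start":
                if elem.tag == "body":
                    body = elem
                continue
            if elem.tag != "tu":
                continue
            segs = {}
            for tuv in elem.iter("tuv"):
                seg = tuv.find("seg")
                if seg is not None:
                    segs[primary_lang(tuv)] = "".join(seg.itertext()).strip()
            # Keep the unit only if both languages are present and non-empty.
            if segs.get(src) and segs.get(tgt):
                writer.writerow([segs[src], segs[tgt]])
                written += 1
            if body is not None:
                body.clear()  # release processed units so memory stays flat
    return written
```

Something like `extract_pair("big.tmx", "en", "fr", "en-fr.csv")` (file names are placeholders) never loads the whole file, so the 5 GB size should matter for run time rather than for RAM.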



This all looks rather complicated (though it's not that bad, really), so I hope somebody will come up with an easier solution.


Cheers,

Hans

[Edited at 2015-08-02 02:20 GMT]



Siegfried Armbruster  Identity Verified
Germany
Local time: 11:28
Member (2004)
English to German
+ ...
Large TMX files, AutoSuggest, Studio TM and Splitting the TMX Aug 2, 2015

Dear Nelson,
I don’t know anything about your TM, but I know a bit about converting large TMX files into Studio TMs and AutoSuggest dictionaries. In the past few weeks I created AutoSuggest files from the DGT TMs (https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory) in more than 300 language pairs. They can be found here: https://alexandria-translation-resources.com/resources-for-translation-providers/autosuggest-dictionaries/dgt-tm/


To create the AutoSuggest files, we used the following process:

- Download the DGT files.
- Extract the language pair you want.

This gives you a TMX with about 3 million entries, about 2 GB in size, in your language pair.

- Use Xbench or Olifant to remove duplicates from the TMX.

This will result in a TMX with about 2.1 million entries, about 1.2 GB in size.

- Use AutoSuggest Creator to produce an AutoSuggest dictionary directly from your reduced TMX file.

Creating the AutoSuggest file takes about 3 hours on a decent computer (8 GB RAM, i5 processor).

This is much faster than creating a Studio TM from the reduced TMX file, which on the same computer might take up to 12 hours.
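
If Xbench or Olifant is not at hand, the duplicate-removal step can be approximated with a short script. This is only a sketch under my own assumptions, not what either tool does: a "duplicate" here means a TU whose segment text is identical in every language after collapsing whitespace, and only a 16-byte digest per unit is kept in memory. The names `tu_key` and `dedupe_tmx` are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tu_key(tu):
    """Digest of every (language, whitespace-normalized text) pair in one TU."""
    parts = []
    for tuv in tu.iter("tuv"):
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        parts.append(lang + "\x1f" + " ".join(text.split()))
    joined = "\x1e".join(sorted(parts))
    return hashlib.blake2b(joined.encode("utf-8"), digest_size=16).digest()


def dedupe_tmx(src_path, dst_path):
    """Copy a TMX, keeping only the first occurrence of each identical TU.

    Assumption: the file has the usual tmx > header + body > tu layout.
    Returns (kept, dropped).
    """
    seen = set()
    kept = dropped = 0
    body = None
    with open(dst_path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        for event, elem in ET.iterparse(src_path, events=("start", "end")):
            if event == "start":
                if elem.tag == "tmx":
                    out.write('<tmx version="%s">\n' % elem.get("version", "1.4"))
                elif elem.tag == "body":
                    body = elem
                    out.write("<body>\n")
            elif elem.tag == "header":
                elem.tail = "\n"
                out.write(ET.tostring(elem, encoding="unicode"))
            elif elem.tag == "tu":
                key = tu_key(elem)
                if key in seen:
                    dropped += 1
                else:
                    seen.add(key)
                    elem.tail = "\n"
                    out.write(ET.tostring(elem, encoding="unicode"))
                    kept += 1
                if body is not None:
                    body.clear()  # free units that were already handled
        out.write("</body>\n</tmx>\n")
    return kept, dropped
```

With a few million units the set of digests stays well within the RAM of an ordinary PC, which is the reason for hashing instead of storing the full segment text.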

To split your TMX into smaller files, you could use Olifant. How to do it is described here:
https://groups.yahoo.com/neo/groups/okapitools/conversations/topics/3678


xxx2nl  Identity Verified
Netherlands
Local time: 11:28
Very nice posting! Aug 2, 2015

Meta Arkadia wrote:

This all looks rather complicated (though it's not that bad, really), so I hope somebody will come up with an easier solution.


Cheers,

Hans


Very nice posting, Hans!



Nelson Yemeli  Identity Verified
France
Member (2016)
English to French
TOPIC STARTER
Let's go back to the source! Aug 2, 2015

Hello!
I thank each and every one of you for all these interesting suggestions. I thought it might be interesting or necessary for you to know where I got such a weird TM. Here is the link: http://opus.lingfil.uu.se/MultiUN.php
The original TM was in a zipped file. When I unzipped it, I obtained a TMX file of about 5 GB.



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 10:28
Member (2009)
Dutch to English
+ ...
it's just a big TMX Aug 2, 2015

Nelson Yemeli wrote:

Hello!
I thank each and every one of you for all these interesting suggestions. I thought it might be interesting or necessary for you to know where I got such a weird TM. Here is the link: http://opus.lingfil.uu.se/MultiUN.php
The original TM was in a zipped file. When I unzipped it, I obtained a TMX file of about 5 GB.


Aha, thanks for the additional info. I just downloaded and had a look at the TMX, and it is bilingual, and contains zero metadata, so its size is purely due to … its size

I'm no Studio specialist, but you might need to cut the TMX up into smaller chunks to get it into Studio. I think András Farkas has a nice little tool to cut up big TMXs, which I think might be in his "Grab Bag", or his LF Aligner package on sourceforge.
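
In case it helps, cutting a TMX into chunks is also easy to script. The sketch below is not András' tool, just an illustration under the assumption of a standard tmx > header + body > tu layout; `split_tmx` and the `part_0001.tmx` naming scheme are made up. Each chunk gets its own copy of the original header so that it is a complete TMX file in its own right.

```python
import xml.etree.ElementTree as ET


def _open_part(path, version, header_xml):
    """Start a new chunk file with the XML prolog, a copied header and <body>."""
    out = open(path, "w", encoding="utf-8")
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<tmx version="%s">\n%s<body>\n' % (version, header_xml))
    return out


def _close_part(out):
    out.write("</body>\n</tmx>\n")
    out.close()


def split_tmx(src_path, out_prefix, tus_per_file):
    """Stream src_path and write chunks of at most tus_per_file units.

    Returns the list of chunk paths that were written.
    """
    paths, out, in_part = [], None, 0
    version, header_xml, body = "1.4", "", None
    for event, elem in ET.iterparse(src_path, events=("start", "end")):
        if event == "start":
            if elem.tag == "tmx":
                version = elem.get("version", "1.4")
            elif elem.tag == "body":
                body = elem
        elif elem.tag == "header":
            elem.tail = "\n"
            header_xml = ET.tostring(elem, encoding="unicode")
        elif elem.tag == "tu":
            if out is None:
                paths.append("%s_%04d.tmx" % (out_prefix, len(paths) + 1))
                out = _open_part(paths[-1], version, header_xml)
            elem.tail = "\n"
            out.write(ET.tostring(elem, encoding="unicode"))
            in_part += 1
            if in_part >= tus_per_file:
                _close_part(out)
                out, in_part = None, 0
            if body is not None:
                body.clear()  # drop units that were already written
    if out is not None:
        _close_part(out)
    return paths
```

A call such as `split_tmx("MultiUN.tmx", "part", 500000)` (file name and chunk size are just examples) would produce `part_0001.tmx`, `part_0002.tmx`, and so on, which can then be imported one at a time.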

Michael



Siegfried Armbruster  Identity Verified
Germany
Local time: 11:28
Member (2004)
English to German
+ ...
One of these monsters Aug 2, 2015

Nelson Yemeli wrote:
I thought it might be interesting or necessary for you to know where I got such a weird TM. Here is the link: http://opus.lingfil.uu.se/MultiUN.php


Ah, one of these monsters. We are working on it. I'll let you know when we have it processed. This might take a bit.


Richard Foulkes  Identity Verified
United Kingdom
Local time: 10:28
German to English
+ ...
1 million TUs not unmanageable...? Aug 3, 2015

I've routinely used TMs of that size in Studio in recent years. Is the performance of your computer the issue maybe? Obviously Studio is pretty heavy in terms of memory (RAM) usage. If you can't open a TM of 1m units, I don't think the problem is the TM - unless it's corrupted.

One thing I did do a while ago was to 'prune' one of my big TMs by filtering and deleting all TUs that hadn't been used for over 10 years. It trimmed down the TM and I'm sure I haven't missed them.
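
I did my pruning with the filter inside Studio, but on a TMX export the same check could be scripted as a dry run first. The sketch below only counts how many TUs would be dropped; it assumes the TMX carries the optional TMX 1.4 date attributes on `<tu>` (`lastusagedate`, `changedate`, `creationdate`, in the `YYYYMMDDThhmmssZ` format), and the names `latest_date` and `count_stale` are made up. Units without any usable date are treated as "keep", so a file with no metadata at all would report zero stale units.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Optional TMX 1.4 date attributes on <tu>, most relevant first.
DATE_ATTRS = ("lastusagedate", "changedate", "creationdate")


def latest_date(tu):
    """Return the most recent parseable date on a <tu>, or None if undated."""
    dates = []
    for name in DATE_ATTRS:
        value = tu.get(name)
        if value:
            try:
                dates.append(datetime.strptime(value, "%Y%m%dT%H%M%SZ"))
            except ValueError:
                pass  # assumption: unparseable dates are ignored, not fatal
    return max(dates) if dates else None


def count_stale(tmx_path, cutoff):
    """Dry run: count TUs whose newest date is older than cutoff.

    Returns (stale, total). Undated units are never counted as stale.
    """
    stale = total = 0
    body = None
    for event, elem in ET.iterparse(tmx_path, events=("start", "end")):
        if event == "start":
            if elem.tag == "body":
                body = elem
        elif elem.tag == "tu":
            total += 1
            newest = latest_date(elem)
            if newest is not None and newest < cutoff:
                stale += 1
            if body is not None:
                body.clear()  # keep memory use flat on large files
    return stale, total
```

Running `count_stale("export.tmx", datetime(2005, 8, 3))` (a placeholder file name and a cutoff ten years before this post) shows how much a ten-year pruning would actually remove before anything is deleted.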



Nelson Yemeli  Identity Verified
France
Member (2016)
English to French
TOPIC STARTER
The TM is actually gigantic!!! Aug 3, 2015

Richard Foulkes wrote:

I've routinely used TMs of that size in Studio in recent years. Is the performance of your computer the issue maybe? Obviously Studio is pretty heavy in terms of memory (RAM) usage. If you can't open a TM of 1m units, I don't think the problem is the TM - unless it's corrupted.

One thing I did do a while ago was to 'prune' one of my big TMs by filtering and deleting all TUs that hadn't been used for over 10 years. It trimmed down the TM and I'm sure I haven't missed them.

Dear Richard, I made a mistake: the TM actually contains more than 10 million TUs. It's a "monster" indeed, as Siegfried said.



Nelson Yemeli  Identity Verified
France
Member (2016)
English to French
TOPIC STARTER
I found the solution: it is patience! Aug 3, 2015

I thank you all for your help. I have tried almost everything, but in the end I think the solution is to start a process and be patient. My PC has been running for three days now. I only hibernate it when I go to bed, and I am really hopeful. Tomorrow, my AutoSuggest dictionary may be ready for use. I need it, so I have to wait for it!

Richard Foulkes  Identity Verified
United Kingdom
Local time: 10:28
German to English
+ ...
I'd agree 10m TUs is a bit on the big side :) Aug 3, 2015

I'm not surprised you can barely open it! I'd delete old TUs a year at a time and break up what's left if need be. Also consider that all the time you spend saving TUs you'll probably never use is hours of your life you'll never get back! I must have wasted a lot of time maintaining TMs down the years.

Good luck.



