Splitting up gigantic TMs
Thread poster: Hans Lenting

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
Apr 17

I'm referring to this posting:

https://cafetran.freshdesk.com/support/discussions/topics/6000053707

SDL's Paul Filkin advises using the Heartsome TMX Editor:
https://community.sdl.com/product-groups/translationproductivity/f/160/t/9898

One could also try chopping up the gigantic TMX file in UltraEdit (Mac and Windows).

However, how about a new feature for CafeTran Espresso 2018: Split up TMX files?

A dedicated feature could read the TMX file line by line and write it out to new TMX files of, say, 300,000 TUs each.
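To make the idea concrete, here is a minimal sketch in Python of such a splitter. It is not how CafeTran works or would work. It uses a streaming XML parser instead of literally reading lines, so it does not depend on how the TMX file is laid out, and it assumes a plain TMX 1.4 file without XML namespaces. The function name `split_tmx`, the output naming scheme and the default chunk size are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def split_tmx(src_path, out_prefix, tus_per_file=100_000):
    """Stream src_path and write numbered TMX parts; return the part paths."""
    part_paths = []
    out = None
    count = 0
    tmx_version = "1.4"   # fallback if the root element has no version
    header_xml = ""
    body = None

    def open_part():
        # Hypothetical naming scheme: <prefix>_001.tmx, <prefix>_002.tmx, ...
        path = f"{out_prefix}_{len(part_paths) + 1:03d}.tmx"
        part_paths.append(path)
        f = open(path, "w", encoding="utf-8")
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(f"<tmx version={quoteattr(tmx_version)}>\n{header_xml}\n<body>\n")
        return f

    def close_part(f):
        f.write("</body>\n</tmx>\n")
        f.close()

    for event, elem in ET.iterparse(src_path, events=("start", "end")):
        if event == "start":
            if elem.tag == "tmx":
                tmx_version = elem.get("version", tmx_version)
            elif elem.tag == "body":
                body = elem
        elif elem.tag == "header":
            # Copy the original header into every part.
            elem.tail = None
            header_xml = ET.tostring(elem, encoding="unicode")
        elif elem.tag == "tu":
            if out is None:
                out = open_part()
            elem.tail = None
            out.write(ET.tostring(elem, encoding="unicode") + "\n")
            count += 1
            if count % tus_per_file == 0:
                close_part(out)
                out = None
            # Drop the units already written so memory use stays flat.
            body.clear()

    if out is not None:
        close_part(out)
    return part_paths
```

Called as `split_tmx("big.tmx", "big_part", 300_000)`, this would write `big_part_001.tmx`, `big_part_002.tmx` and so on, each with a copy of the original header.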

BTW: I wouldn't be surprised if the 1.5 GB from the article mentioned above could be cut by 50% by removing all the extra info that Studio stores in its TMs and that CT can easily do without.
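As a rough illustration of that kind of slimming, the sketch below strips metadata from a single translation unit. It assumes the extra info sits in `<prop>` and `<note>` elements and in the standard TMX bookkeeping attributes (creation and change dates, user IDs, usage counts). Which of these a given tool really needs is an assumption to check before deleting anything, and the name `strip_metadata` and the two `DROP_` sets are invented for this example.

```python
import xml.etree.ElementTree as ET

# Assumed to be dispensable; adjust both sets before using on real data.
DROP_TAGS = {"prop", "note"}
DROP_ATTRS = {"creationdate", "creationid", "changedate", "changeid",
              "lastusagedate", "usagecount"}


def strip_metadata(tu_xml):
    """Return one <tu> element (given as a string) without the assumed extras."""
    tu = ET.fromstring(tu_xml)
    # Metadata can hang off the <tu> itself or off each <tuv>.
    for parent in [tu, *tu.iter("tuv")]:
        for child in list(parent):
            if child.tag in DROP_TAGS:
                parent.remove(child)
        for name in DROP_ATTRS & set(parent.attrib):
            del parent.attrib[name]
    return ET.tostring(tu, encoding="unicode")
```

The segments themselves and the `xml:lang` attributes are left untouched. How much space this saves depends entirely on the TM, so the 50% figure stays a guess.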


 

Igor Kmitowski  Identity Verified
Poland
Local time: 12:25
Member (2016)
English to Polish
Split translation memory in TMX edit mode Apr 17

You can split your large TM in CafeTran (2018 Akua version) as follows:

1. Open your TMX memory file in edit mode.
2. In the Search field, type a range for the split (e.g. 1-100000) and press Enter.
3. Save the filtered segments to a new TMX file via Project > Export and Exchange > To TMX memory...
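
For a TM with many segments, the ranges to type in step 2 can be listed up front. This is only a small helper sketch: the function name `split_ranges` is made up, and it assumes that follow-up ranges use the same inclusive "first-last" form as the 1-100000 example.

```python
def split_ranges(total_units, chunk_size=100_000):
    """Return 'first-last' strings covering 1..total_units in chunk_size steps."""
    return [f"{start}-{min(start + chunk_size - 1, total_units)}"
            for start in range(1, total_units + 1, chunk_size)]


print(split_ranges(250_000))
# → ['1-100000', '100001-200000', '200001-250000']
```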


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Small RAM Apr 17

Igor Kmitowski wrote:

You can split your large TM in CafeTran (2018 Akua version) as follows:

1. Open your TMX memory file in the edit mode.


I think that loading the entire TMX file into RAM is a problem with gigantic TMX files on computers with little RAM.

I'm not a developer so I don't know how this works, but:

With a simple MS Word macro that reads a big file line by line, I could extract useful bits from glossaries and TMX files very quickly.

Isn't something like that possible for CT too? Not loading the entire TM into RAM, but just processing it line by line?
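
For what it's worth, this is roughly what line-by-line processing looks like in Python. It says nothing about CafeTran's internals. The sketch counts the translation units in a TMX file while holding only one line in memory at a time. The name `count_tus` is invented, the encoding is assumed to be UTF-8, and a file without any line breaks would still be read as one huge line.

```python
import re

# Matches "<tu>" or "<tu " (and "<tu" at a line end), but not "<tuv".
TU_OPEN = re.compile(r"<tu[\s>]")


def count_tus(path, encoding="utf-8"):
    """Count <tu> elements without loading the whole file into memory."""
    total = 0
    with open(path, encoding=encoding) as f:
        for line in f:  # the file object hands over one line at a time
            total += len(TU_OPEN.findall(line))
    return total
```

Knowing the count first makes it easy to decide how many parts a split will produce.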


 

FarkasAndras  Identity Verified
Local time: 12:25
English to Hungarian
yes Apr 19

Hans Lenting wrote:

Igor Kmitowski wrote:

You can split your large TM in CafeTran (2018 Akua version) as follows:

1. Open your TMX memory file in the edit mode.


I think that loading the entire TMX file into RAM is a problem with gigantic TMX files on computers with little RAM.


True. There is a fairly primitive TMX chopper in my random collection of scripts:
https://sourceforge.net/projects/aligner/files/grab_bag_1.7-random_tools_for_translators.zip/download

It strips various tags from the TMX, but it should work on files of any size.


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Solution in AppleScript Apr 23

Thanks, Andras. I'm sorry to report that I couldn't get your Perl script to work on my Mac.

Here's a great solution using AppleScript:

https://www.proz.com/forum/apple_mac_operating_systems/324749-split_gigantic_tmx_files.html

Here's part of the DGT, split into parts of 100,000 translation units each:

[Two screenshots: b453xodk2yqnj4djhjap.png, vddgyopitylz6gyns1zh.png]

[Edited at 2018-04-23 16:18 GMT]


 

