Splitting up gigantic TMs
Thread poster: Hans Lenting

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
+ ...
Apr 17

I'm referring to this posting:

https://cafetran.freshdesk.com/support/discussions/topics/6000053707

SDL's Paul Filkin advises using the Heartsome TMX Editor:
https://community.sdl.com/product-groups/translationproductivity/f/160/t/9898

One could also try chopping up the gigantic TMX file in UltraEdit (Mac and Windows).

However, how about a new feature for CafeTran Espresso 2018: Split up TMX files?

A dedicated feature could read the TMX file line by line and write it out to new TMX files of 300,000 TUs each.
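
A rough sketch of what such a splitter might look like, written in Python rather than taken from CafeTran. It assumes a well-formed TMX file with un-namespaced `tmx`, `header`, `body` and `tu` elements. The function name `split_tmx`, the `part_001.tmx` naming scheme and the default chunk size are illustrative choices. It parses the file as a stream instead of literally line by line, because a single TU may span several lines.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.sax.saxutils import quoteattr


def split_tmx(src, out_dir, tus_per_file=300_000):
    """Stream `src` and write numbered TMX parts of at most `tus_per_file` TUs each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []          # paths of the part files written so far
    root_attrs = ""     # attributes of the <tmx> root, copied to every part
    header_xml = ""     # serialized <header>, copied to every part
    body = None         # the <body> element, emptied as we go to keep memory flat
    out = None          # currently open output file, if any
    in_part = 0         # TUs written to the current part

    for event, elem in ET.iterparse(src, events=("start", "end")):
        if event == "start":
            if elem.tag == "tmx":
                root_attrs = "".join(
                    f" {k}={quoteattr(v)}" for k, v in elem.attrib.items()
                )
            elif elem.tag == "body":
                body = elem
            continue
        if elem.tag == "header":
            elem.tail = "\n"
            header_xml = ET.tostring(elem, encoding="unicode")
        elif elem.tag == "tu":
            if out is None:
                # Open the next part lazily, so no empty trailing file is created.
                path = out_dir / f"part_{len(parts) + 1:03d}.tmx"
                parts.append(path)
                out = open(path, "w", encoding="utf-8")
                out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                out.write(f"<tmx{root_attrs}>\n{header_xml}<body>\n")
            elem.tail = "\n"
            out.write(ET.tostring(elem, encoding="unicode"))
            in_part += 1
            body.clear()  # detach TUs from <body> so they can be freed
            if in_part == tus_per_file:
                out.write("</body>\n</tmx>\n")
                out.close()
                out, in_part = None, 0
    if out is not None:
        out.write("</body>\n</tmx>\n")
        out.close()
    return parts
```

Calling `split_tmx("big.tmx", "parts")` (both names hypothetical) would write `parts/part_001.tmx`, `parts/part_002.tmx` and so on, each repeating the original header. Because only a handful of TUs are held at any time, memory use should stay roughly constant regardless of the size of the input.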

BTW: I wouldn't be surprised if the 1.5 GB from the article mentioned above could be reduced by 50% by removing all the extra info that Studio stores in its TMs and that CT can easily do without.
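
To get a feel for how much such extra info weighs, here is a small Python sketch that slims down a single TU. Which metadata is actually dispensable is an assumption: the sketch drops `<prop>` and `<note>` children plus the bookkeeping attributes listed in `DROP_ATTRS` (all of them attribute names from the TMX 1.4 format), and the helper name `slim_tu` is made up for illustration. The real saving depends entirely on the TM in question.

```python
import xml.etree.ElementTree as ET

# Assumption: these bookkeeping attributes are not needed by the target tool.
DROP_ATTRS = ("creationdate", "creationid", "changedate", "changeid",
              "usagecount", "lastusagedate")


def slim_tu(tu):
    """Remove <prop>/<note> children and bookkeeping attributes from a <tu>, in place."""
    for node in [tu, *tu.iter("tuv")]:
        for child in list(node):
            if child.tag in ("prop", "note"):
                node.remove(child)
        for name in DROP_ATTRS:
            node.attrib.pop(name, None)
    return tu


# Illustrative TU; the prop type "x-example" is a placeholder, not a real Studio field.
sample = ET.fromstring(
    '<tu creationdate="20180417T101500Z" creationid="HL" usagecount="3">'
    '<prop type="x-example">some value</prop>'
    '<tuv xml:lang="de"><seg>Haus</seg></tuv>'
    '<tuv xml:lang="nl"><seg>huis</seg></tuv>'
    '</tu>'
)
before = len(ET.tostring(sample))
after = len(ET.tostring(slim_tu(sample)))
print(f"{before} bytes before, {after} bytes after")
```

The segments and their `xml:lang` attributes are left untouched, so the slimmed TU remains usable as a translation unit.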



Igor Kmitowski  Identity Verified
Poland
Local time: 01:37
Member (2016)
English to Polish
+ ...
Split translation memory in TMX edit mode Apr 17

You can split your large TM in CafeTran (2018 Akua version) as follows:

1. Open your TMX memory file in the edit mode.
2. In the Search field, type a range for the split (e.g. 1-100000) and press Enter.
3. Save the filtered segments to a new TMX file via Project > Export and Exchange > To TMX memory...



Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
+ ...
TOPIC STARTER
Small RAM Apr 17

Igor Kmitowski wrote:

You can split your large TM in CafeTran (2018 Akua version) as follows:

1. Open your TMX memory file in the edit mode.


I think that loading the entire TMX file into RAM is a problem with gigantic TMX files on computers with little RAM.

I'm not a developer so I don't know how this works, but:

With a simple MS Word macro that reads a big file line by line, I could extract useful bits from glossaries and TMX files very quickly.

Isn't something like that possible for CT too? Not loading the entire TM into RAM, but just processing it line by line?
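
For what it's worth, line-by-line processing of this kind is easy to sketch outside CT. The Python snippet below counts the TUs in a TMX file while holding only one line in memory at a time. The function name `count_tus` and the default UTF-8 encoding are assumptions, and the approach only stays frugal if the file actually contains line breaks: a TMX written as one enormous line would still be read in one go.

```python
import re

# Matches the start of a <tu> tag, but not <tuv>.
TU_START = re.compile(r"<tu[\s>]")


def count_tus(path, encoding="utf-8"):
    """Count <tu> start tags by reading the file one line at a time."""
    total = 0
    with open(path, encoding=encoding) as f:
        for line in f:  # only the current line is held in memory
            total += len(TU_START.findall(line))
    return total
```

The same loop could write selected lines to another file instead of counting them, which is essentially what the macro described above does.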



FarkasAndras
Local time: 01:37
English to Hungarian
+ ...
yes 11:38

Hans Lenting wrote:

Igor Kmitowski wrote:

You can split your large TM in CafeTran (2018 Akua version) as follows:

1. Open your TMX memory file in the edit mode.


I think that loading the entire TMX file to RAM is a problem with gigantic TMX files and computers with small RAM.


True. There is a fairly primitive TMX chopper in my random collection of scripts:
https://sourceforge.net/projects/aligner/files/grab_bag_1.7-random_tools_for_translators.zip/download

It strips various tags from the TMX, but it should work on any file size.

