How to search very large TMX files on a Mac?
Thread poster: xxxwilhelm_zwo

xxxwilhelm_zwo
Netherlands
Local time: 13:05
German to Dutch
Jul 18, 2013

What is a good way to search very large TMX files (3 GB and up) on a Mac, for concordance purposes?

WII


 

Fernando Toledo  Identity Verified
Germany
Local time: 13:05
German to Spanish
I use Jul 18, 2013

wilhelm_zwo wrote:

What is a good way to search very large TMX files (3 GB and up) on a Mac, for concordance purposes?

WII


I use TextWrangler for this, but I think any editor will do.


 

Meta Arkadia
Local time: 18:05
English to Indonesian
+ ...
Nope Jul 18, 2013

Fernando Toledo wrote:
[I use] for this TextWrangler

No you don't. TextWrangler can't open files larger than around 350 MB.

but I think any Editor will do it.


Nope. As far as I can see, only Java-based text editors can open such large files.

Cheers,

Hans


 

Martin Skara, PhD.  Identity Verified
Slovakia
Local time: 13:05
French to Slovak
+ ...
Vim Jul 19, 2013

Try MacVim: http://macvim.org/OSX/index.php

 

xxxwilhelm_zwo
Netherlands
Local time: 13:05
German to Dutch
TOPIC STARTER
UltraEdit vs. TextWrangler Jul 19, 2013

Fernando Toledo wrote:

wilhelm_zwo wrote:

What is a good way to search very large TMX files (3 GB and up) on a Mac, for concordance purposes?


for this TextWrangler but I think any Editor will do it.


I'm sorry, but these DGT TMs are way too large for good old TextWrangler. Luckily, the new UltraEdit 4.1 can handle them. As a matter of fact, its Find in Files function can search all TMs in a folder very quickly, even when they are as gigantic as the DGT. I'm still looking for the optimal solution, though. (Since UE isn't integrated in my CAT tool CafeTran on my Mac.)

Joakim, when will your revamped TMX searcher be relaunched?


 

Meta Arkadia
Local time: 18:05
English to Indonesian
+ ...
Heap? Jul 19, 2013

Martin Skara, PhD. wrote:
try http://macvim.org/OSX/index.php

MacVim should be able to open large files, but I think it requires increasing the Java heap. I can't. The setting isn't in a *.plist or *.config file (as far as I can see), and I stay away from the Terminal if I don't know what I'm doing, which is most of the time.

Advice welcome. Martin?

Cheers,

Hans


 

John Moran  Identity Verified
Ireland
Local time: 12:05
Member (2004)
German to English
+ ...
OmegaT Jul 19, 2013

Assuming you have more than 4 GB of RAM, OmegaT has no problems with 3 GB TMs, but you have to tell the Java Virtual Machine to reserve enough memory for the file, as the default is too small.

To do this, go to Applications/Utilities and open Terminal.

Then type:

cd /Applications/OmegaT.app/Contents/

and then

open .

Drag and drop the file "Info.plist" into a text editor (I use TextWrangler).

Look for the VMOptions key and change the -Xmx value to something above 3 GB. I have 8 GB of RAM, so I use:


VMOptions
-Xmx6024M

Then create a project with a small dummy file (a docx with one dummy sentence) and place the TMX file in the /tm directory. You can then use Ctrl+F to search the TMX file. The search also uses lemmatisation, so "dog" will find "dogs".
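
To double-check that the -Xmx edit was saved as intended, the plist can be read back with a few lines of Python. This is only a sketch: it assumes Info.plist is a standard plist that Python's plistlib can parse, and it makes no assumption about where the VMOptions key sits, so it simply searches the whole file for it. The helper names (find_key, max_heap, read_heap_setting) are made up for this example.

```python
import plistlib


def find_key(node, key):
    """Depth-first search for `key` in a parsed plist (nested dicts and lists)."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def max_heap(vm_options):
    """Return the value after -Xmx (e.g. '6024M'), or None if it is not set."""
    if vm_options is None:
        return None
    if isinstance(vm_options, str):
        # VMOptions may be one string holding several space-separated options.
        vm_options = vm_options.split()
    for option in vm_options:
        if option.startswith("-Xmx"):
            return option[len("-Xmx"):]
    return None


def read_heap_setting(plist_path):
    """Parse the plist at `plist_path` and report its -Xmx value, if any."""
    with open(plist_path, "rb") as handle:
        return max_heap(find_key(plistlib.load(handle), "VMOptions"))
```

Calling read_heap_setting("/Applications/OmegaT.app/Contents/Info.plist") should then report the value that was typed in, or None if no -Xmx option was found.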


 

Meta Arkadia
Local time: 18:05
English to Indonesian
+ ...
It's a worry Jul 20, 2013

John Moran wrote:
Assuming you have more than 4GB RAM OmegaT has no problems with 3GB TM's

I'm pretty sure der Wilhelm is well aware of that solution. Like me, he uses CafeTran. Loading and searching large files in CT is no problem, and you can even run two instances of CT at the same time. And if that isn't enough, you can load a huge TMX file as an "external" database, in which case it uses very little RAM.

So let me rephrase Wilhelm's question:

How can I search, and index, large TMX (and other) files on a Mac, outside my CAT tool?

There are two problems with that:

 You can't open documents (not files in general) exceeding around 350 MB on a Mac with apps that don't run under Java (I don't know if there are other solutions, but I doubt it)
 Spotlight/SpotInside cannot search TMX files

So to search those files, you'll need a Java application, or you (still) need a Java application to open the TMX file, convert it to TXT, and split it into files OS X can handle, i.e. smaller than 300 MB, to make them searchable in Spotlight/SpotInside.
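
For what it's worth, the convert-and-split step could also be sketched in Python, which reads the TMX as a stream instead of opening it as one document. The sketch below assumes the file is well-formed TMX (tu / tuv / seg elements with an xml:lang or lang attribute). The function names, the output file naming and the 250 MB default are illustrative choices, not anything taken from a tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_lines(source):
    """Yield one tab-separated text line per translation unit, streaming the file."""
    body = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "body":
                body = elem
            continue
        if elem.tag != "tu":
            continue
        parts = []
        for tuv in elem.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text around inline tags; collapse whitespace.
                text = " ".join("".join(seg.itertext()).split())
                parts.append(f"{lang}: {text}")
        yield "\t".join(parts)
        # Drop units that were already processed so memory use stays flat.
        if body is not None:
            body.clear()
        else:
            elem.clear()


def split_to_text(source, out_dir, max_bytes=250 * 1024 * 1024, stem="tm_part"):
    """Write UTF-8 text files of at most max_bytes each; return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths, out, written = [], None, 0
    try:
        for line in iter_lines(source):
            data = (line + "\n").encode("utf-8")
            # Start a new part when this line would push the current one over the limit.
            if out is None or written + len(data) > max_bytes:
                if out is not None:
                    out.close()
                path = out_dir / f"{stem}_{len(paths) + 1:04d}.txt"
                out = path.open("wb")
                paths.append(path)
                written = 0
            out.write(data)
            written += len(data)
    finally:
        if out is not None:
            out.close()
    return paths
```

Each output line holds one translation unit, so a hit in Spotlight/SpotInside shows source and target together. A single unit longer than max_bytes would still be written to a part of its own.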

I still don't know how to do it. I tried Martin's solution (above), but a 1.5 GB TMX file didn't open in MacVim. I tried to increase the Java heap for MacVim, to no avail, mainly because MacVim isn't a Java app.

Der Wilhelm suggested UltraEdit (Java). The new beta can split files, it seems, so that could be a solution. I downloaded the latest build, which can't split files...

I have spent so many hours trying to solve this issue that I could have learned the contents of those databases by heart. I'm sick of it. But I'm sure everybody knows we're talking about the EU files (DGT and Eurobook), and I happen to translate EU notifications. What's worse, from two source languages, ENG and GER, into DUT. I need those big files. Searching them, and even auto-assembling from them in CafeTran, is not a problem, but I want to be able to search the DGT/Eurobook files of the other source language. And for non-EU texts, I want to be able to search them without attaching them to my current project.
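
One possible way to search such a file without attaching it to a project is a small streaming script. The sketch below is an assumption-laden illustration, not a tested tool: it assumes well-formed TMX, does a plain case-insensitive substring match (no lemmatisation), and the names iter_units and concordance are invented here.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_units(source):
    """Stream a TMX file and yield {language code: segment text} per unit."""
    body = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "body":
                body = elem
            continue
        if elem.tag != "tu":
            continue
        segments = {}
        for tuv in elem.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() keeps the text around inline tags such as <ph/>.
                segments[lang] = "".join(seg.itertext())
        yield segments
        # Free the units already handled so a multi-GB file does not fill RAM.
        if body is not None:
            body.clear()
        else:
            elem.clear()


def concordance(source, query, lang=None, limit=20):
    """Return up to `limit` units containing `query` (case-insensitive).

    If `lang` is given (e.g. "de"), only segments whose language code
    starts with it are searched; the whole unit is returned either way.
    """
    needle = query.casefold()
    prefix = lang.lower() if lang else None
    hits = []
    for segments in iter_units(source):
        searched = [text for code, text in segments.items()
                    if prefix is None or code.startswith(prefix)]
        if any(needle in text.casefold() for text in searched):
            hits.append(segments)
            if len(hits) >= limit:
                break
    return hits
```

A call like concordance("/path/to/some.tmx", "Notifizierung", lang="de") (the path is a placeholder) reads the file front to back on every search, so it is slow on very large files but needs very little memory.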

Cheers,

Hans


[Edited at 2013-07-20 00:44 GMT]

[Edited at 2013-07-21 04:42 GMT]


 

Meta Arkadia
Local time: 18:05
English to Indonesian
+ ...
Well, integrate it then Jul 20, 2013

wilhelm_zwo wrote:
Since UE isn't integrated in my CAT tool CafeTran on my Mac.

Write an Automator Service to be able to search from within CafeTran (or any other app), or ask the UE developer to write it.

Cheers,

Hans


 

Heartsome Support
Local time: 19:05
Import server-based database Aug 30, 2013

This TMX is too big to open, even with a text editor. Theoretically, you can import the TMX into a server-based database supported by Heartsome, such as MySQL, PostgreSQL or Oracle, and search it on a Mac. In that case you have to translate your files in Heartsome, because Heartsome does not provide a standalone TM program.
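
The import-into-a-database idea can also be tried without a server. The sketch below swaps in SQLite (via Python's built-in sqlite3 module) for the MySQL/PostgreSQL/Oracle options named above, so it is not the Heartsome workflow. The table layout and the function names are assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_rows(source):
    """Stream a TMX file and yield (unit number, language, text) rows."""
    body, unit_no = None, 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "body":
                body = elem
            continue
        if elem.tag != "tu":
            continue
        unit_no += 1
        for tuv in elem.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                yield unit_no, lang, "".join(seg.itertext())
        # Release processed units so memory use stays flat on huge files.
        if body is not None:
            body.clear()
        else:
            elem.clear()


def build_index(source, db_path):
    """Import every segment into an SQLite database and return the connection."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS segments (tu INTEGER, lang TEXT, text TEXT)")
    with con:  # one transaction for the whole import
        con.executemany("INSERT INTO segments VALUES (?, ?, ?)", iter_rows(source))
        con.execute("CREATE INDEX IF NOT EXISTS idx_segments_tu ON segments (tu)")
    return con


def search(con, query, limit=20):
    """Return (lang, text, other lang, other text) for segments containing query.

    LIKE ignores case for ASCII letters only, and % or _ in the query
    act as wildcards. Each search still scans the whole table.
    """
    sql = ("SELECT a.lang, a.text, b.lang, b.text "
           "FROM segments a JOIN segments b ON a.tu = b.tu AND a.lang <> b.lang "
           "WHERE a.text LIKE ? LIMIT ?")
    return con.execute(sql, ("%" + query + "%", limit)).fetchall()
```

The import only has to run once per TMX file. After that, sqlite3.connect on the same database file gives a connection that search() can use directly. A full-text index would make lookups faster, but whether one is available depends on how SQLite was built, so it is left out here.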

 

