Best CAT tools for large files
Thread poster: Hulda Knigge

Hulda Knigge  Identity Verified
United States
Local time: 18:09
English to Portuguese
Mar 10, 2015

Hello,

I would like to know which CAT tools are best for large files. Is there one that can re-use large TMs from previous projects?

I read that MemoQ allows re-use of resources? How many megabytes?

Also, are there any free CAT tools for large projects (TMs) that can be re-used in other new projects?


Thanks,

Hulda


 

Siegfried Armbruster  Identity Verified
Germany
Local time: 01:09
Member (2004)
English to German
How long is your piece of string Mar 10, 2015

Hi Hulda,
could you please define what you mean by large files and large TMs?

For me, a large file starts at 25,000 words and a large TM at about 100 MB. With Trados Studio 2011/2014 I definitely have no problems with files of this size, or with projects containing several files of this size, and I regularly use several TMs of between 100 MB and 1.5 GB each in a single project.

I assume that all modern CAT tools deliver similar performance, so what exactly do you want to know?


 

2nl (X)  Identity Verified
Netherlands
Local time: 01:09
CafeTran can deal with really large projects and TMs Mar 10, 2015

CafeTran can deal with really large projects (like 200,000 words) and TMs (like 4 million segments as TMX file in RAM or 40 GB as indexed database) and also with gigantic glossaries (like 1 million entries, plain text).

You can keep it simple (plain text glossaries, tab-delimited, plain text TMX files, no import needed) or use very fast SQL databases (e.g. for large DGT TMs).
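As a rough illustration of how little machinery a plain-text, tab-delimited glossary needs, here is a minimal Python sketch. The sample entries and the `load_glossary` helper are made up for this example; this is not CafeTran's own code or file format specification.

```python
import csv
import io

# Hypothetical sample data: one "source<TAB>target" pair per line.
SAMPLE_GLOSSARY = "translation memory\tmemória de tradução\nsegment\tsegmento\n"


def load_glossary(lines):
    """Read tab-delimited source/target pairs into a dict."""
    glossary = {}
    # QUOTE_NONE keeps quotation marks inside terms as literal characters.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:  # skip blank or malformed lines
            glossary[row[0].strip()] = row[1].strip()
    return glossary


glossary = load_glossary(io.StringIO(SAMPLE_GLOSSARY))
print(glossary["segment"])
```

With a real file you would pass an open file object instead of the `io.StringIO` wrapper; the whole glossary then sits in memory as an ordinary dictionary.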

CafeTran comes with tons of really useful features, not for project managers but for translators.

Free support for one year is included.


 

FarkasAndras
Local time: 01:09
English to Hungarian
TMs Mar 10, 2015

Siegfried Armbruster wrote:

Hi Hulda,
could you please define what you mean by large files and large TMs?

For me, a large file starts at 25,000 words and a large TM at about 100 MB. With Trados Studio 2011/2014 I definitely have no problems with files of this size, or with projects containing several files of this size, and I regularly use several TMs of between 100 MB and 1.5 GB each in a single project.

I assume that all modern CAT tools deliver similar performance, so what exactly do you want to know?

I'm not sure about large translatable files, but there are certainly significant differences between CATs when it comes to handling large TMs. Studio (2011, I have no info on 2014) is towards the middle of the field: It can handle 1 million TUs reasonably well, but importing takes ages. I never managed to import more than about 1.5M TUs into a single TM. The process just grinds to a halt. AFAIK Wordfast fares much worse, while MemoQ does significantly better (it can reasonably use 5-10M TUs in a project, I'm told).
There could easily be similar differences with regard to large translatable files, but I have no info on that.


 

Meta Arkadia
Local time: 06:09
English to Indonesian
The Empirical Approach Mar 11, 2015

For lack of a huge (millions of words) source file, I created a CafeTran project with a DGT memory as the source. It's the procedure used to edit TMX files, so it's 100% comparable with a regular project. It consists of 1,979,863 segments (about two million), and I think I can safely say that that means around 10 million words or more in the source language (5 words per segment gives about 9.9 million, and since this is EU stuff, the real average is probably a lot higher), and a similar number in the target language. On my old (late 2009) iMac, loading took less than 2m37s. I used the timer of my screencast app to arrive at that time, and it includes a bit more than the time needed for loading the file in the project. You can watch the exciting video here: DGT as Project
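The word-count estimate in the post can be checked with a few lines of Python. The segment count comes from the post; the 5 words per segment is the poster's own conservative assumption, not a measured average.

```python
segments = 1_979_863        # segment count quoted in the post
words_per_segment = 5       # the post's conservative assumption

# Estimated source-language word count at 5 words per segment
estimated_words = segments * words_per_segment
print(f"{estimated_words:,}")  # 9,899,315

# Average segment length at which the estimate reaches 10 million words
break_even = 10_000_000 / segments
print(round(break_even, 2))
```

At exactly 5 words per segment the estimate lands just under 10 million, so the "more than 10 million" figure relies on the average being slightly above 5, which is plausible for EU legal text.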

The trick CafeTran uses is loading both the project file and the TMX file(s) into RAM. RAM is heaps faster than an HDD, and even ten times faster than an SSD. In other words, you need RAM (I assigned 8 GB to CafeTran).
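To make the "load the TMX into RAM" idea concrete, here is a minimal Python sketch that parses a TMX document into an in-memory dictionary. It is only an illustration of the general approach, not how CafeTran works internally; the `load_tmx` helper and the tiny sample TMX (with a simplified header) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit TMX; a real file carries a fuller <header>.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world</seg></tuv>
      <tuv xml:lang="pt"><seg>Olá mundo</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Good morning</seg></tuv>
      <tuv xml:lang="pt"><seg>Bom dia</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_tmx(xml_text, src="en", tgt="pt"):
    """Return a dict mapping source segments to target segments."""
    memory = {}
    root = ET.fromstring(xml_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text around inline tags
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
        # Language codes are matched exactly here (an assumption;
        # real files often use codes such as "en-GB").
        if src in segs and tgt in segs:
            memory[segs[src]] = segs[tgt]
    return memory


memory = load_tmx(SAMPLE_TMX)
print(memory["Hello world"])
```

Once the dictionary is built, every lookup is served from memory, which is why the approach trades RAM for speed.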

If the file size really gets out of hand, you can use CafeTran's indexed database for blistering fast searches.

Cheers,

Hans

[Edited at 2015-03-11 01:36 GMT]


 

Dominique Pivard  Identity Verified
Local time: 02:09
Finnish to French
Some ideas Mar 11, 2015

Hulda Knigge wrote:
I read that MemoQ allows re-use of resources?

All CAT tools do this.
Hulda Knigge wrote:
How many megabytes?

Most CAT tools should be able to handle your resources, especially if you're starting from scratch.
Hulda Knigge wrote:
Also, are there any Free CAT tools for large projects(TM), that can be re-use in other new projects?

There are not so many free CAT tools, so why don't you give them a try? Here are a few that come to mind: Across (free for freelancers), Heartsome Translation Suite, Memsource Personal, OmegaT, Wordfast Anywhere.


 

Hulda Knigge  Identity Verified
United States
Local time: 18:09
English to Portuguese
TOPIC STARTER
Thank you Mar 11, 2015

Thank you everyone for your input.
I have a better idea now of the sizes of TMs, etc. I plan to buy a CAT tool, but I need one that allows me to re-use large TMs. I will check CafeTran.

Hulda


 

Dominique Pivard  Identity Verified
Local time: 02:09
Finnish to French
Large TMs Mar 12, 2015

Hulda Knigge wrote:
I plan to buy a CAT tool but I need one that allows me to re-use the large TMs.

I'm still puzzled as to why you think the ability to re-use large TMs would be a differentiator between various CAT tools. Do you think your particular TMs would be way larger than those of other translators?

Re-using TMs is one of the key purposes (though not the only one) of any CAT tool. I'm not aware that some CAT tools would perform very poorly with "large" TMs (whatever large means), while others would shine in comparison. They all rely on some sort of indexing, which means retrieving translations doesn't really depend on the size of the TM. There may be differences in the time it takes to import a large TMX, for instance, but once it's in, it's in.
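The indexing point can be sketched with SQLite from Python's standard library. The table layout, the index name and the generated sample data below are assumptions for illustration only; no particular CAT tool is claimed to store its TM this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tm (source TEXT, target TEXT)")
# The index lets SQLite find a source segment without scanning every row.
conn.execute("CREATE INDEX idx_tm_source ON tm (source)")

# Hypothetical TM content: 100,000 generated translation units.
conn.executemany(
    "INSERT INTO tm VALUES (?, ?)",
    ((f"Sentence number {i}.", f"Frase número {i}.") for i in range(100_000)),
)


def lookup(source):
    """Exact-match lookup; returns the target text or None."""
    row = conn.execute(
        "SELECT target FROM tm WHERE source = ?", (source,)
    ).fetchone()
    return row[0] if row else None


print(lookup("Sentence number 99999."))
```

With a B-tree index the cost of an exact lookup grows only logarithmically with the number of rows, so it stays nearly flat in practice. Note that this covers exact matches only; fuzzy matching and concordance search need different index structures, and import time still grows with the size of the TM.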

There are people fond of collecting humongous TMs (available from public sources) with tens of millions of entries. Not all CAT tools can deal with such TMs, but it's not really a problem: they will be primarily useful as reference material and you can use a separate tool like TMLookup to search them.


 

2nl (X)  Identity Verified
Netherlands
Local time: 01:09
A matter of taste Mar 12, 2015

Dominique Pivard wrote:

There are people fond of collecting humongous TMs (available from public sources) with tens of millions of entries. Not all CAT tools can deal with such TMs, but it's not really a problem: they will be primarily useful as reference material and you can use a separate tool like TMLookup to search them.


I agree! It's all a matter of personal preference. Some still enjoy working in MS Word, others prefer an integrated translation environment, with integrated features to access large resources (rather than running several tools to access their data).

Hans
CafeTran for Mac user


 

Meta Arkadia
Local time: 06:09
English to Indonesian
Ins and outs of TMs Mar 12, 2015

Dominique Pivard wrote:
but once it's in, it's in.

True enough, but it doesn't mean that when it's in it's usable, say for auto-assembly or even concordance search.

Cheers,

Hans


 

Dominique Pivard  Identity Verified
Local time: 02:09
Finnish to French
Auto-assembly Mar 13, 2015

Meta Arkadia wrote:
True enough, but it doesn't mean that when it's in it's usable, say for auto-assembly or even concordance search.

I know a small minority of translators swear by auto-assembly: if that feature matters to the original poster, then, yes, she should consider only tools that "auto-assemble". If she's merely interested in "re-using" TMs, then I believe there are far more important aspects to consider.

As to concordance search, well, it's a basic feature supported by more or less all tools. Can you provide examples of usable vs. useless implementations of concordance search?

[Edited at 2015-03-13 07:38 GMT]


 

Meta Arkadia
Local time: 06:09
English to Indonesian
Speed also matters Mar 13, 2015

Dominique Pivard wrote:
If she's merely interested in "re-using" TMs, then I believe there are far more important aspects to consider.

Auto-Assembly is re-using TMs, re-using them optimally. It works better if the "units" (usually words and phrases) are not subject to (much) change. So English is a very suitable source language, German less so (compounds), Finnish probably not at all. The larger the TM(s), the better the chance AA comes up with useful suggestions.
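For readers unfamiliar with the idea, auto-assembly can be pictured as stitching a draft together from known fragments. The Python sketch below uses a simple greedy longest-match over a hypothetical fragment dictionary; it is a toy illustration of the principle, not the algorithm CafeTran or any other tool actually uses.

```python
def auto_assemble(sentence, fragments):
    """Greedy longest-match assembly from a fragment dictionary.

    Unmatched words are left in the source language.
    """
    words = sentence.split()
    # Longest fragment (in words) that is worth trying at each position
    max_len = max((len(key.split()) for key in fragments), default=1)
    out = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + n])
            if chunk in fragments:
                out.append(fragments[chunk])
                i += n
                break
        else:
            out.append(words[i])  # no fragment matched this word
            i += 1
    return " ".join(out)


# Hypothetical fragments, as might be harvested from TMs and glossaries.
FRAGMENTS = {
    "translation memory": "memória de tradução",
    "the": "a",
    "is": "é",
    "large": "grande",
}

draft = auto_assemble("the translation memory is large", FRAGMENTS)
print(draft)  # a memória de tradução é grande
```

The sketch matches exact word forms only, which is why the approach suits source languages whose words change little and struggles with compounds or heavy inflection: the more fragments the TMs supply, the more of the sentence gets covered.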

Dominique Pivard wrote:
Can you provide examples of usable vs. useless implementations of concordance search?

Yes. And a very general one, directly related to Hulda's question. A larger TM will take longer to search. The longer the search takes, the less useful it is. Or rather, you're not going to use it if the search takes too long. "Too long" here is measured in seconds. If a concordance search takes longer than a few seconds, you will either not use it at all, or only in a few instances. That's not taking advantage of your large TMs optimally. If you have to wait a long time before your large TM has been loaded, and if you then have to wait a long time before you see concordance search results, chances are you're not going to use your large TM at all.

Some time ago, I compared concordance search in a 2-million-segment TM (DGT), imported into CafeTran as a TMX file (RAM) and as an indexed database (then still H2, now SQLite) file. A search in the TMX file is still "doable," but if you are only interested in a concordance search, I bet you will actually use the database version.
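The difference between scanning a TM and searching an index can be sketched in a few lines of Python. Everything here (the sample segments, the `tokenize`, `scan_search`, `build_index` and `indexed_search` helpers) is hypothetical and only illustrates the general technique; it does not reproduce CafeTran's TMX or database search.

```python
from collections import defaultdict

# Hypothetical TM source segments.
SEGMENTS = [
    "The Commission shall adopt the measures.",
    "Member States shall inform the Commission.",
    "This Regulation shall enter into force.",
]


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    tokens = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [t for t in tokens if t]


def scan_search(segments, phrase):
    """Linear concordance: every segment is checked on every search."""
    needle = phrase.lower()
    return [i for i, s in enumerate(segments) if needle in s.lower()]


def build_index(segments):
    """Inverted index: word -> set of segment numbers containing it."""
    index = defaultdict(set)
    for i, s in enumerate(segments):
        for w in tokenize(s):
            index[w].add(i)
    return index


def indexed_search(segments, index, phrase):
    """Use the index to shortlist segments, then confirm the phrase.

    Unlike scan_search, this only finds whole-word matches.
    """
    words = tokenize(phrase)
    if not words:
        return []
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    needle = phrase.lower()
    return sorted(i for i in candidates if needle in segments[i].lower())


INDEX = build_index(SEGMENTS)
print(scan_search(SEGMENTS, "the Commission"))            # [0, 1]
print(indexed_search(SEGMENTS, INDEX, "the Commission"))  # [0, 1]
```

The scan touches every segment on every query, so its cost grows with the TM; the index is built once and each query then inspects only the segments that contain all the query words.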

Linking doesn't seem to work, so: http://www.screencast.com/t/95NPzHRJJIpn

Since I only use CafeTran, I cannot offer a comparison with other tools. Can anybody provide one? I used my good old late 2009 iMac: 3 GHz Core 2 Duo, 8 GB assigned to CafeTran, rotational HDD.

Cheers,

Hans


 

