Max size of DV TM
Thread poster: Matthias Brombach

Matthias Brombach
Germany
Member (2007)
Dutch to German
May 8, 2012

Dear colleagues,

I'd like to ask about your experience: what is the maximum size a DV (Déjà Vu) TM can have? I recently tried to convert a Studio TM of more than 300 MB into a DV TM by importing a .txt file created with Trados 2007, but after hours (really!) of processing without any noticeable progress, I terminated the process. Converting Studio TMs of "normal" size always works fine, so I wonder whether one of you knows a trick for dealing with this.
Best regards,
Matthias

[Edited at 2012-05-08 09:01 GMT]


Klaus Herrmann
Germany
Member (2002)
English to German
Moin Matthias! May 8, 2012

I'm sure someone will come up with the exact TM size limit, but in the meantime, here's how I go about big TXT TMs:
- Most importantly, make sure to UNCHECK "delete duplicate TM entries" in the import options.
- Split the TXT file into 5-6 smaller chunks of 50k
- Import the chunks
- Buy two or three 6-packs of your favorite beer
- Open the DVX TM and use DVX's function to delete duplicates.
- Open the first bottle...
- If you're out of beer, your TM will be ready.
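The "split into chunks" step can also be scripted. Below is a minimal Python sketch, assuming a Trados 2007 text export in which each translation unit runs from a `<TrU>` marker to a `</TrU>` marker, and that everything before the first `<TrU>` (the RTF preamble) is a header that every chunk should repeat. Chunk size is counted in translation units here, which is only one reading of "50k". The function name `split_trados_txt` is hypothetical, and reading and writing the files with the correct encoding is left to the caller.

```python
import re
from typing import List

# Assumption: each TU in a Trados 2007 TXT export is delimited by <TrU> ... </TrU>.
_TU_PATTERN = re.compile(r"<TrU>.*?</TrU>[ \t]*(?:\r?\n)?", re.DOTALL)


def split_trados_txt(text: str, units_per_chunk: int) -> List[str]:
    """Split a Trados TXT export into chunks of at most `units_per_chunk` TUs.

    Everything before the first <TrU> (e.g. the RTF preamble) is treated as a
    header and copied to the top of every chunk.
    """
    if units_per_chunk < 1:
        raise ValueError("units_per_chunk must be at least 1")
    first = text.find("<TrU>")
    if first == -1:
        return [text]  # no translation units found, nothing to split
    header = text[:first]
    units = _TU_PATTERN.findall(text, first)  # search starts at the first <TrU>
    chunks = []
    for start in range(0, len(units), units_per_chunk):
        chunks.append(header + "".join(units[start:start + units_per_chunk]))
    return chunks
```

For the roughly 255,000 segments Matthias reports later in the thread, `units_per_chunk=50_000` would give six files: five with 50,000 units and one with about 5,000.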

(In my experience, TXT import is pretty reliable compared to TMX import.)

Regards
Klaus


Matthias Brombach
Germany
Member (2007)
Dutch to German
TOPIC STARTER
Unfortunately... May 8, 2012

...step 4 wouldn't involve Alt beer (simply because you have to drink it where it comes from), but the other steps sound fine, if not very promising.

Moin Klaus,

thanks, I'll try it overnight, simply because it's not yet time for step 6... ;-)

Best regards to Düsseldorf,

Matthias

[Edited at 2012-05-08 11:03 GMT]


David Turner
French to English
Not sure whether you're using DVX1 or DVX2... May 8, 2012

... but DVX2 will usually import .tmx TMs of that size in a matter of minutes.

Matthias Brombach
Germany
Member (2007)
Dutch to German
TOPIC STARTER
Also .tmx files? May 8, 2012

Hi David,
so you skip the intermediate step of creating a .txt file with Trados 2007 first?
I'm asking because when I import .tmx files into a DVX TM (I use the latest build), DVX doesn't recognize the language combinations; that's why I still stick to importing Studio TMs as .txt. But I think Klaus's tip of unchecking "delete duplicate TM entries" first may help. I'll try again later today. Thanks anyway.
Best regards,
Matthias


Matthias Brombach
Germany
Member (2007)
Dutch to German
TOPIC STARTER
Step 7 completed after approx. 2 h (255,000 segments, 1 Jever Pils), but... May 9, 2012

...the problems are still (nearly) the same:
Whereas the source TM runs very smoothly in Studio without exhausting my PC's resources (2.1 GHz, 4 GB RAM), the converted TM really slows DVX2 down. Klaus, I know you as an experienced DVX user; do you still stick with it? And what PC configuration do you recommend for running AutoSuggest in DVX2? Even with smaller projects, TMs and termbases, performance is sometimes slow and you can hear the hard disk working, working, working... Or do you also tend to use Studio? Yesterday I was about to try using Studio for the translation itself as well (until now I only used it to prepare the translated .sdlxliff files for my customer), but my version (Studio 2009 Freelance) doesn't offer AutoSuggest (more precisely, it can't create its own AutoSuggest files). And only once SDL implements reasonable shortcuts for term handling (as DV has) will I think (but just think!) about working in Studio. Any suggestions? Thanks!

Best regards,
Matthias


MikeTrans
Germany
Italian to German
Size of TMs in DVX2... May 30, 2012

Hi Matthias,

I don't know the maximum size for TMs, but here are some reference figures for large DTBs:

EMEA, FR-DE, a medical DTB, 300,000+ segments = 1.08 GB
DGT, FR-DE, Union Européenne, 333,000 segments = 1.47 GB

Response times are very fast once the first segment has been looked up.

For performance, it's VERY important to run a Repair on your DTBs and projects regularly: after importing or deleting lots of segments or documents in your project, and especially before and after removing duplicates.

I have private DTBs containing chunk segments of the two "big mammas" above (for displaying concordances). Both contain 3,400,000+ segments with the same short translation "CHUNK_xxx".
For practical reasons, I've split them into two parts of 1.8 GB each. No problems whatsoever.

Greets,
Mike

[Edited at 2012-05-30 18:55 GMT]


Matthias Brombach
Germany
Member (2007)
Dutch to German
TOPIC STARTER
Maybe a batch repair process available? May 31, 2012

Hi Mike,

thanks, I get your point, and I'll do it more often than I have so far. What a pity there's no batch routine to run the repair on all (or a selection) of one's projects, TMs and termbases.

Best regards,

Matthias


MikeTrans
Germany
Italian to German
Repair & Compact May 31, 2012

Matthias,

There are actually two options: Tools > Repair and Tools > Compact.

Personally, I use Compact after removing lots of files from a project, or after deleting or importing a considerable number of segments in a TM.

After a power failure, a program crash or removing duplicates, on the other hand, it's advisable to choose Repair: unlike Compact, it re-indexes the TM, which takes much longer.

Mike


Grzegorz Gryc
French to Polish
2 GB Jul 5, 2012

Matthias Brombach wrote:

I'd like to ask about your experience: what is the maximum size a DV (Déjà Vu) TM can have?

Theoretically 2 GB per file, as for MS Jet 4.0 databases.
Note that a DVX TM consists of several files, so the complete set may reach several GB.
As the language-specific files are usually larger than the main one, I think something like 1.5 GB for the main file is realistic.
It's hard to say how many segments it can hold, because that depends on segment length; I suppose 1.5 million TUs is a good approximation.
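The estimate above is simple arithmetic: usable file size divided by the average size of a TU. The Python sketch below makes that explicit. The average TU size is an assumption, not a documented DVX value: 1.5 million TUs in 1.5 GB implies roughly 1 KB per TU, whereas Mike's DTB figures earlier in the thread (which cover a whole DTB, not only the main file) work out to roughly 3.9 to 4.7 KB per segment. "GB" is taken as 2^30 bytes here, and the function names are hypothetical.

```python
GIB = 2 ** 30  # assumption: "GB" in the thread means 2**30 bytes


def bytes_per_tu(total_bytes: float, tu_count: int) -> float:
    """Average storage per translation unit for a database of known size."""
    return total_bytes / tu_count


def estimated_capacity(limit_bytes: float, avg_tu_bytes: float) -> int:
    """Rough number of TUs that fit under a file-size limit (ignores overhead)."""
    return int(limit_bytes // avg_tu_bytes)


# ~1.5 GB usable for the main file, below the 2 GB MS Jet 4.0 cap
main_file_budget = 1.5 * GIB
# 1.5 million TUs in that budget implies roughly 1 KB per TU
print(round(bytes_per_tu(main_file_budget, 1_500_000)))  # 1074

# Mike's DTB figures (whole DTB; "300,000+" is a lower bound on the count)
print(round(bytes_per_tu(1.08 * GIB, 300_000)))  # 3865
print(round(bytes_per_tu(1.47 * GIB, 333_000)))  # 4740

# Capacity of the main-file budget at ~1 KB per TU
print(estimated_capacity(main_file_budget, 1074))  # 1499639, about 1.5 million
```

Because the result scales inversely with the average TU size, long segments with many tags shrink the capacity considerably, which is why no single segment limit can be given.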

I recently tried to convert a Studio TM of more than 300 MB into a DV TM by importing a .txt file created with Trados 2007, but after hours (really!) of processing without any noticeable progress, I terminated the process. Converting Studio TMs of "normal" size always works fine, so I wonder whether one of you knows a trick for dealing with this.


Generally, if you run into problems, split the file and compact the TM after importing each part.
This may also help if the Trados TM contains "tricky" segments; DVX may, for example, fail on extremely long segments with many tags.

Cheers
GG

[Edited at 2012-07-05 08:57 GMT]


Matthias Brombach
Germany
Member (2007)
Dutch to German
TOPIC STARTER
Thank you (dziękuję)... Jul 5, 2012

...for your answers. Yes, I now understand that compacting regularly and splitting the .tmx file before import would be wise. Do you perhaps also know how to export a .tmx from Studio 2009 in portions of a given size? Also, when I import big .tmx files from Studio, the "language codes" do not appear in DVX2, so I'm still forced to import a .txt file created in an intermediate step with Trados 2007, which makes the whole process more time-consuming. My customer keeps sending projects with the same Studio TM, only updated, which is why I'd like to import it, or rather update it, in DVX2 with minimal effort.
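One way to cut the effort of re-importing an updated customer TM is to import only the units that are new since the previous export. Below is a minimal Python sketch of that idea, assuming both exports are well-formed TMX without a default XML namespace and treating a TU's identity as its set of (language, segment text) pairs. This is not a built-in DVX or Studio feature, and the function names are hypothetical; edited TUs show up as new, and TUs deleted from the customer's TM are simply ignored.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tu_key(tu: ET.Element) -> Tuple[Tuple[str, str], ...]:
    """Identify a TU by its (language, segment text) pairs, ignoring metadata."""
    pairs = []
    for tuv in tu.findall("tuv"):
        # TMX 1.4 uses xml:lang; older TMX versions used a plain lang attribute
        lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        pairs.append((lang, text))
    return tuple(sorted(pairs))


def new_tus(old_tmx: str, new_tmx: str) -> List[ET.Element]:
    """Return TUs from new_tmx whose language/text pairs are absent from old_tmx."""
    old_keys = {tu_key(tu) for tu in ET.fromstring(old_tmx).iter("tu")}
    return [tu for tu in ET.fromstring(new_tmx).iter("tu")
            if tu_key(tu) not in old_keys]
```

The returned elements could then be wrapped in a TMX with the original header and imported on their own, which keeps each import small.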

Best regards,

Matthias


Grzegorz Gryc
French to Polish
Invalid TMX... Jul 5, 2012

Matthias Brombach wrote:

...for your answers. Yes, I now understand that compacting regularly and splitting the .tmx file before import would be wise. Do you perhaps also know how to export a .tmx from Studio 2009 in portions of a given size?

You can use filters when exporting, but IMO it's faster to use a decent text editor and split them manually.
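As a scripted alternative to splitting in a text editor, here is a minimal Python sketch that splits a TMX into documents of a fixed number of TUs and copies the `<header>` into each. It assumes the TMX is well-formed (so any invalid characters must be fixed first) and has no default XML namespace. `split_tmx` is a hypothetical helper name; ElementTree drops any DOCTYPE line, so each chunk should be written out with its own XML declaration.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List


def split_tmx(tmx_text: str, tus_per_chunk: int) -> List[str]:
    """Split a TMX document into TMX documents of at most tus_per_chunk TUs each."""
    if tus_per_chunk < 1:
        raise ValueError("tus_per_chunk must be at least 1")
    root = ET.fromstring(tmx_text)
    header = root.find("header")
    body = root.find("body")
    if body is None:
        raise ValueError("no <body> element found")
    tus = body.findall("tu")
    chunks = []
    for start in range(0, len(tus), tus_per_chunk):
        # New <tmx> root carrying the original attributes (e.g. version)
        chunk_root = ET.Element(root.tag, dict(root.attrib))
        if header is not None:
            chunk_root.append(copy.deepcopy(header))  # every part gets the header
        chunk_body = ET.SubElement(chunk_root, "body")
        chunk_body.extend(tus[start:start + tus_per_chunk])
        chunks.append(ET.tostring(chunk_root, encoding="unicode"))
    return chunks
```

Each returned string can be saved as its own .tmx file in UTF-8, preceded by `<?xml version="1.0" encoding="UTF-8"?>`, and imported one part at a time with a Compact in between.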

Also, when I import big .tmx files from Studio, the "language codes" do not appear in DVX2,

That happens when DVX considers the TMX malformed (invalid characters, an invalid header, etc.).

That's not always the cause, though; it may also be related to DVX filter errors, e.g. on extremely large segments with a gazillion tags, something like 300 words and 200 tags.
Segments like this are sometimes found in incorrectly prepared DTP jobs where the Trados segmentation rules were screwed up.
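To find such segments before importing, one option is to scan the TMX and report segments above a word or tag threshold. Below is a minimal Python sketch, assuming well-formed TMX without a default XML namespace. The default thresholds simply echo the "300 words and 200 tags" example above and are not documented DVX limits. It counts the TMX inline elements `bpt`, `ept`, `it`, `ph`, `ut` and `hi` as tags, and it does not count the native formatting code inside the first five as words (text in nested `sub` elements is skipped as well).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# TMX inline elements whose content is native formatting code, not text
CODE_TAGS = {"bpt", "ept", "it", "ph", "ut"}
INLINE_TAGS = CODE_TAGS | {"hi"}


def visible_text(elem: ET.Element) -> str:
    """Segment text with the native code inside bpt/ept/it/ph/ut left out."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_TAGS:
            parts.append(visible_text(child))  # e.g. <hi> wraps real text
        parts.append(child.tail or "")  # text after the inline element
    return "".join(parts)


def flag_tricky_segments(tmx_text: str, max_words: int = 300,
                         max_tags: int = 200) -> List[Tuple[int, str, int, int]]:
    """Return (tu_index, language, words, tags) for segments over either threshold."""
    flagged = []
    root = ET.fromstring(tmx_text)
    for index, tu in enumerate(root.iter("tu")):
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is None:
                continue
            words = len(visible_text(seg).split())
            tags = sum(1 for e in seg.iter() if e.tag in INLINE_TAGS)
            if words > max_words or tags > max_tags:
                lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
                flagged.append((index, lang, words, tags))
    return flagged
```

The flagged units can then be fixed or removed in the source TM (or in a TMX editor) before the import.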

so I'm still forced to import a .txt file created in an intermediate step with Trados 2007, which makes the whole process more time-consuming. My customer keeps sending projects with the same Studio TM, only updated, which is why I'd like to import it, or rather update it, in DVX2 with minimal effort.

Yep, I understand...
As Studio is totally marginal for me (one job a year...), I haven't paid attention, but it seems its TMX may contain invalid characters.
Try opening the TMX file in Olifant and fixing the invalid characters before you import it into DVX.
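If Olifant isn't at hand, the most common kind of invalid character can also be stripped with a short script. Below is a minimal Python sketch, assuming the problem is characters outside the XML 1.0 `Char` range (e.g. control characters such as U+0001), either raw or written as numeric character references like `&#x1F;`. It does not repair other malformations such as a broken header, and `strip_invalid_xml_chars` is a hypothetical name. It works on the raw text, so it can run before any XML parser sees the file; since it deletes characters, the affected segments are worth a quick check afterwards.

```python
import re

# Characters NOT allowed by the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_RAW = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")
# Numeric character references such as &#x1F; or &#31;
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def _is_xml_char(code_point: int) -> bool:
    return (code_point in (0x9, 0xA, 0xD)
            or 0x20 <= code_point <= 0xD7FF
            or 0xE000 <= code_point <= 0xFFFD
            or 0x10000 <= code_point <= 0x10FFFF)


def _drop_invalid_ref(match: "re.Match") -> str:
    ref = match.group(1)
    code_point = int(ref[1:], 16) if ref.startswith("x") else int(ref)
    # Keep valid references untouched, drop the ones XML 1.0 forbids
    return match.group(0) if _is_xml_char(code_point) else ""


def strip_invalid_xml_chars(text: str) -> str:
    """Remove raw and referenced characters that are illegal in XML 1.0."""
    text = _INVALID_RAW.sub("", text)
    return _CHAR_REF.sub(_drop_invalid_ref, text)
```

For example, `strip_invalid_xml_chars("a\x01b&#x1F;c")` removes both the raw control character and the reference, leaving `"abc"`.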

Cheers
GG

