DGT translation memories
Thread poster: Dominique Pivard

Dominique Pivard  Identity Verified
Local time: 13:51
Finnish to French
Feb 6, 2013

Old news already, but here is how to create DGT TM's in any language pair (23 EU languages available):

http://wordfast.fi/blog/cat-tools/2013/02/06/how-to-create-dgt-translation-memories/
or
http://youtu.be/GNj07W2ZqhQ?hd=1

The sample Finnish-Slovenian TMX used in the video has more than 2 million translation units (though probably lots of duplicates).
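The duplicate guess can be checked without loading the whole file into a CAT tool. What follows is only a minimal sketch: it assumes a plain TMX file with no XML namespaces, and it treats two units as duplicates when the text of all their segments matches exactly. The name `count_units` is made up for illustration and is not part of any tool mentioned here.

```python
import hashlib
import xml.etree.ElementTree as ET


def count_units(tmx_path):
    """Return (total, distinct) translation-unit counts for a TMX file."""
    seen = set()  # one 20-byte SHA-1 digest per distinct unit
    total = 0
    # iterparse reads the file as a stream, so a multi-million-unit TMX
    # never has to be held in memory as one tree.
    for _event, elem in ET.iterparse(tmx_path, events=("end",)):
        if elem.tag != "tu":
            continue
        # Assumption: a unit's identity is the text of its <seg> elements,
        # in document order (inline tags inside a segment are ignored).
        segs = ["".join(seg.itertext()) for seg in elem.iter("seg")]
        digest = hashlib.sha1("\x1f".join(segs).encode("utf-8")).digest()
        seen.add(digest)
        total += 1
        elem.clear()  # drop the unit's content to keep memory use low
    return total, len(seen)
```

Calling `count_units` on a TMX path returns the total number of units and the number of distinct ones; the difference between the two is the number of exact duplicates.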



Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 12:51
Member (2005)
English to Spanish
+ ...
2 million in English-Spanish Feb 6, 2013

Thanks a lot, Dominique! This is great information and I appreciate it. I downloaded the package and made an English-Spanish TM. It contains nearly 2 million segments as well. I also plan to make one for my other main pair, German-Spanish.

I am keeping this memory as background information for EU related translations in our memoQ server here.

The extraction took just over an hour on my machine.



Dominique Pivard  Identity Verified
Local time: 13:51
Finnish to French
TOPIC STARTER
memoQ Feb 6, 2013

Tomás Cano Binder, CT wrote:
The extraction took just over an hour on my machine.

You must have a much faster computer than the one I used (a three-year-old laptop with an AMD processor and 4 GB of RAM)!

Let me know how the import in memoQ goes, because when I tried, I wasn't able to complete it. Not a problem for me, because I'm searching the DGT (and other very large TM's) with dtSearch, but I think memoQ (and probably several other tools) may have problems dealing with TM's that big.



Meta Arkadia
Local time: 18:51
English to Indonesian
+ ...
No problems, and problems Feb 6, 2013

Dominique Pivard wrote:
but I think memoQ (and probably several other tools) may have problems dealing with TM's that big.

I use the DGT for GER>DUT as one of three TMs in CafeTran without problems. I assigned 6 GB of RAM to Java, and DGT "pre-translates" (another strange "Igor term" which means auto-assemble) as a database with a low priority. I also set it to Read Only.
Searching within DGT provides instant results.

Not that I don't have problems, though: http://www.proz.com/forum/apple_mac_operating_systems/242687-automated_search_help_needed.html but they have nothing to do with DGT, and everything with searching in databases.

Cheers,

Hans



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 11:51
Member (2009)
Dutch to English
+ ...
re: importing large (DGT) TMXs into memoQ Feb 6, 2013

In order to get the really big ones into memoQ you need to cut them in half in a text editor and import them in two goes. For really big files, I recommend EmEditor (which can handle sizes that even UltraEdit can't).
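For anyone who would rather script the split than do it by hand in an editor, here is a minimal sketch of the same idea. It assumes a plain TMX file with no XML namespaces and a root element that carries only simple attributes such as `version`. The names `split_tmx`, `out_prefix` and `units_per_file` are made up for illustration. The original header is copied into every part unchanged, so any header problem in the source file is carried over too.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def split_tmx(src_path, out_prefix, units_per_file):
    """Write src_path out as several smaller, self-contained TMX files.

    Each part holds at most units_per_file <tu> elements plus a copy of
    the original <header>. Returns the list of file names written.
    """
    parts = []          # names of the files written so far
    out = None          # the part currently being written, if any
    in_part = 0         # units written to the current part
    tmx_open = "<tmx>"  # replaced once the real root element is seen
    header_xml = ""

    for event, elem in ET.iterparse(src_path, events=("start", "end")):
        if event == "start":
            if elem.tag == "tmx":
                # Rebuild the root start tag with its original attributes.
                attrs = "".join(
                    f" {k}={quoteattr(v)}" for k, v in elem.attrib.items()
                )
                tmx_open = f"<tmx{attrs}>"
            continue
        if elem.tag == "header":
            elem.tail = "\n"
            header_xml = ET.tostring(elem, encoding="unicode")
        elif elem.tag == "tu":
            if out is None:
                # Start a new part with its own declaration, root and header.
                name = f"{out_prefix}_{len(parts) + 1}.tmx"
                out = open(name, "w", encoding="utf-8")
                out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                out.write(f"{tmx_open}\n{header_xml}<body>\n")
                parts.append(name)
            elem.tail = "\n"  # the parsed tail is not reliable at "end" time
            out.write(ET.tostring(elem, encoding="unicode"))
            in_part += 1
            elem.clear()  # drop the unit's content to keep memory use low
            if in_part >= units_per_file:
                out.write("</body>\n</tmx>\n")
                out.close()
                out, in_part = None, 0

    if out is not None:  # close the last, partly filled part
        out.write("</body>\n</tmx>\n")
        out.close()
    return parts
```

To get two halves, set `units_per_file` to about half the total number of units in the file. The parts are written as UTF-8 whatever the encoding of the source file.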

Michael


http://www.emeditor.com/

[Edited at 2013-02-06 10:06 GMT]



Dominique Pivard  Identity Verified
Local time: 13:51
Finnish to French
TOPIC STARTER
Other tools? Feb 6, 2013

Michael Beijer wrote:
In order to get the really big ones into memoQ you need to cut them in half in a text editor and import them in two goes. For really big files, I recommend EmEditor (which can handle sizes that even UltraEdit can't).

Yes, now that you mention it, I remember you talked about splitting the TMX before importing it into memoQ. Do you remember how long it took to import each half? Did you import the 2nd half into the same TM as the 1st half, or into a separate memoQ TM? Do you find you get useful LSC hits from the DGT TM's?

Have you tried importing the DGT TMX into other tools, eg. DVX2 or Studio 2011? If so, how long did it take (for instance compared to memoQ)?



Stanislav Pokorny  Identity Verified
Czech Republic
Local time: 12:51
English to Czech
+ ...
Studio positive Feb 6, 2013

Dominique Pivard wrote:
Have you tried importing the DGT TMX into other tools, eg. DVX2 or Studio 2011?


No problems with Studio; took about four hours on an older i3 2.5 GHz machine with 3 GB RAM.


FarkasAndras
Local time: 12:51
English to Hungarian
+ ...
Studio struggles above 2M Feb 6, 2013

Stanislav Pokorny wrote:

Dominique Pivard wrote:
Have you tried importing the DGT TMX into other tools, eg. DVX2 or Studio 2011?


No problems with Studio; took about four hours on an older i3 2.5 GHz machine with 3 GB RAM.


I've done a couple of tests with Studio (2009 only). It slows down as the number of segments goes up. So it might do 100,000 segments in 2 minutes and 1 million segments in an hour and a half (random figure), but it will take six hours to import two million. In my experience, about two million is the upper limit. I tried to import 6 million TUs once, and killed it after sixteen hours. It was not even halfway done, IIRC. Maybe 2011 brought improvements in this regard; I will test it soon.
I'm not sure if lookup performance is better with multiple smaller TMs, but I suspect it might be.
In any case, the size of the DGT-TM is right about where Studio starts to crap out.

I asked about this in a separate thread here: http://www.proz.com/forum/cat_tools_technical_help/237113-very_large_tms_~10_million_tu.html



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 11:51
Member (2009)
Dutch to English
+ ...
as far as I can remember... Feb 6, 2013

Dominique Pivard wrote:

Yes, now that you mention it, I remember you talked about splitting the TMX before importing it into memoQ. Do you remember how long it took to import each half? Did you import the 2nd half into the same TM as the 1st half, or into a separate memoQ TM? Do you find you get useful LSC hits from the DGT TM's?

Have you tried importing the DGT TMX into other tools, eg. DVX2 or Studio 2011? If so, how long did it take (for instance compared to memoQ)?


Hi Dominique,

1. I can't remember exactly how long it took for each half (of 330MB), maybe around 40 minutes or so each (on a 64-bit desktop with a 3.07 GHz i7, 16GB of RAM and an SSD).

2. I imported the 2nd half into the same TM as the 1st half.

3. I have LSC (longest substring concordance) switched off. I find it never has anything useful to report. Incidentally, I also have Predictive Typing & AutoPick (and the Muse) switched off, as I find they just get in my way when translating.

4. I tried importing it into Déjà Vu X2, but gave up after 8 hours.

Michael

[Edited at 2013-02-06 13:54 GMT]



Grzegorz Gryc  Identity Verified
Local time: 12:51
French to Polish
+ ...
DVX Feb 6, 2013

Michael Beijer wrote:

4. I tried importing it into Déjà Vu X2, but gave up after 8 hours.


As a big DVX fan, I can only confirm that a large TMX import in DVX is a PITA.
I don't remember exactly after some months, but the header of the DGT TMX is/was incorrect and the file can't be imported "as is"; it had to be edited in a decent text editor first.
A good practice is to import a smaller TMX, compact the DVMDB, then import another smaller TMX, compact the DVMDB, etc.

Cheers
GG



Meta Arkadia
Local time: 18:51
English to Indonesian
+ ...
A very short screencast of DGT GER-DUT Feb 7, 2013

in CafeTran. I go to the next segment, DGT (and my other TMs and glossaries) Auto-Assembles. Next, I select a word to search in DGT (and other resources).
The screencast is short because, er, CT doesn't take much time to arrive at the desired results…

http://www.screencast.com/t/E4IDfKcMueF

Cheers,

Hans


xxxtrhanslator
No indexing? Feb 7, 2013

Do you mean that CafeTran with no indexing of the TM is that fast?

How about opening the TMX file, how many hours did that take?



Meta Arkadia
Local time: 18:51
English to Indonesian
+ ...
Seconds Feb 7, 2013

trhanslator wrote:
Do you mean that CafeTran with no indexing of the TM is that fast?

Well, yes. But it's set to "pre-translate", and that explains the very fast auto-assemble results. However, searching within the DGT is fast as well, as you can see in my miserably short screencast.

How about opening the TMX file, how many hours did that take?


Seconds. With 6 GB of RAM assigned to Java. And CafeTran loads TMs in RAM.

Cheers,

Hans



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 11:51
Member (2009)
Dutch to English
+ ...
@Hans (Meta Arkadia): Feb 7, 2013

And how about the amount of TMs that CafeTran can access in a project simultaneously? In memoQ I have around 8,000,000 segments across all of my connected TMs and experience no slowdowns. How does this work in CT?

Michael

[Edited at 2013-02-07 11:06 GMT]



Meta Arkadia
Local time: 18:51
English to Indonesian
+ ...
Eight million? WOW! Feb 7, 2013

Michael Beijer wrote:
And how about the amount of TMs that CafeTran can access in a project simultaneously?

I never tried more than three TMs (.tmx) and two glossaries (tab-delimited .txt) at the same time, Michael. And that doesn't present any problems. However, the total number of TUs never came close to 8 million. I don't think I can even try it, because I probably don't have that number of TUs in one language pair. I hope somebody else can answer your question.

Cheers,

Hans

