Free TM in 24 languages! The DGT TM
Thread poster: Fi2 n Co

Fi2 n Co  Identity Verified
Portugal
Local time: 15:19
Member (2013)
English to French
+ ...
Jun 9

Hello to all,

I'm creating this thread for anyone who doesn't know yet that the EU has made a large translation memory available to the public: the DGT TM. It covers 24 languages and, depending on the languages, merging all the entries released in 2017 gives you on average over 400 000 TUs per language pair. If you also use earlier releases, it will be even bigger.

Getting it takes three steps, which I describe in a video I've posted on YouTube: https://youtu.be/wVeU9NKEYjM
In the video description I have put direct download links to TMs already extracted from the DGT TM for a few language pairs.

This TM was suggested by Milan Condak, many thanks 😉 ( http://www.proz.com/profile/37344 ), in a thread I created about two other free assets, the VLTM TM and the IATE glossary, which can be connected in Wordfast Pro 5: http://www.proz.com/forum/wordfast_support/314785-new_working_feature_in_wordfast_pro_5_connect_to_free_remote_tms_and_remote_glossaries.html#2651147

Feel free to discuss it further here!
Best wishes to all! 😊



Tatiana Grehan  Identity Verified
United States
Local time: 10:19
English to Russian
+ ...
Thanks for the information! Jun 9

Does it include a TM for the Russian language?


Milan Condak  Identity Verified
Local time: 16:19
English to Czech
No Jun 9

Tatiana Grehan wrote:

Does it include a TM for the Russian language?


The EU has 28 member states and only 24 official languages. After Brexit, English will remain only for part of Cyprus, Gibraltar, ...

(I suggest replacing English with Czech: it is a Slavonic (en_US: Slavic) language that uses Latin-2, and its encoding is similar to that of the Baltic languages. There are now three EU working languages: EN, DE and FR, i.e. Germanic, Germanic and Romance. This makes no sense.)

Milan



Milan Condak  Identity Verified
Local time: 16:19
English to Czech
Old presentations Jun 9

Fi2 n Co wrote:

This TM was suggested by Milan Condak, many thanks 😉, ( http://www.proz.com/profile/37344 )



Here are links to some of my presentations on the DGT TM (in Czech):

1. http://www.condak.cz/nove/2016-05/29/cs/00.html

Searching the DGT database (Vyhledávání v databázi DGT)

Wordfast Classic a Wordfast Server, 29.05.2016
--
2. http://www.condak.net/tmx/tmx-dgt-pctrans/cs/00.html

DGT TM in PC Translator 2012 (TM DGT v PC Translatoru 2012), 23.10.2011
--
3. http://www.condak.cz/archiv-cz/2009-01/01-25/cs/00.html

PC Translator 2009 + HU, 25.01.2009
--
Before the DGT TM, the JRC-Acquis database was available.

Resources for non-EU languages can be found in, e.g., OPUS, the open parallel corpus:

http://opus.lingfil.uu.se/index.php

One section contains the DGT files: http://opus.lingfil.uu.se/DGT.php

==
But all of these need some work: downloading and extracting the TMX is only the beginning of the story.

Milan

[Edited at 2017-06-09 19:17 GMT]



Fi2 n Co  Identity Verified
Portugal
Local time: 15:19
Member (2013)
English to French
+ ...
TOPIC STARTER
List of languages Jun 9

Tatiana Grehan wrote:

Does it include a TM for the Russian language?


Hi,

The DGT TM page has a list of all released languages with statistics. You can see it here: http://optima.jrc.it/Resources/DGT-TM_Statistics.pdf

Best regards



Nuno Oliveira  Identity Verified
Portugal
English to Portuguese
TMXtract.jar not executing solution Jun 12

First of all, let me thank you for sharing this with us! This is great stuff.

If someone can't get the TMXtract.jar file to execute, my advice is to just download and run Jarfix. It will fix it.
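
Jarfix repairs the Windows file association for .jar files, which only matters when the file is started by double-click. Starting the file through the Java launcher with `java -jar` does not depend on that association. A small Python sketch of this, assuming `java` is on the PATH; the helper names `jar_command` and `launch_jar` are made up for this example, and no TMXtract-specific arguments are assumed:

```python
import shutil
import subprocess


def jar_command(jar_path, java=None):
    """Build the command that starts a .jar via the Java launcher."""
    # Look up the launcher on PATH unless the caller names one explicitly.
    java = java or shutil.which("java")
    if java is None:
        raise RuntimeError("No 'java' executable found on PATH; install or repair Java first.")
    return [java, "-jar", jar_path]


def launch_jar(jar_path):
    """Run the .jar and return the exit code of the Java process."""
    return subprocess.run(jar_command(jar_path), check=False).returncode
```

Calling `launch_jar("TMXtract.jar")` from the folder that contains the file should then start it; if it fails here too, the problem is probably in the Java installation rather than in the file association.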



Fi2 n Co  Identity Verified
Portugal
Local time: 15:19
Member (2013)
English to French
+ ...
TOPIC STARTER
Thanks Jun 13

Nuno Oliveira wrote:

First of all, let me thank you for sharing this with us! This is great stuff.

If someone can't get the TMXtract.jar file to execute, my advice is to just download and run Jarfix. It will fix it.


Thanks, Nuno. Yes, I hope it will help many people, since free TMs like the VLTM sometimes have few entries in some language pairs.

Thank you for this tip. I haven't tried it because usually the problem is solved by reinstalling Java, and I haven't seen the issue in Windows 10.
It will be interesting to get feedback from others to see whether this gives them a permanent fix. It looks promising!

Best regards

[Edited at 2017-06-13 11:33 GMT]



Rolf Keller
Germany
Local time: 16:19
English to German
Java :-( Jun 14

Fi2 n Co wrote:

I haven't tried it because usually the problem is solved by reinstalling Java, and I haven't seen the issue in Windows 10.


On my Windows 10 PC the .jar file doesn't work. Moreover, I have to restart the PC after trying the .jar, because the CPU load goes up to 50 percent and never goes down again.

On my old Vista PC it works, though.



Rolf Keller
Germany
Local time: 16:19
English to German
Getting the most from the EU's glossaries Jun 14

You don't need any CAT software to benefit from the EU's glossaries. All these (and other) .tbx and .tmx files can be searched with Omni-Lookup. At the same time you can search several other glossaries on the web or offline (e.g. Acolada or Excel), with one click for your preferred portfolio of resources. See www.omni-lookup.de.


Fi2 n Co  Identity Verified
Portugal
Local time: 15:19
Member (2013)
English to French
+ ...
TOPIC STARTER
Thanks for sharing this experience Jun 14

Rolf Keller wrote:

Fi2 n Co wrote:

I haven't tried it because usually the problem is solved by reinstalling Java, and I haven't seen the issue in Windows 10.


On my Windows 10 PC the .jar file doesn't work. Moreover, I have to restart the PC after trying the .jar, because the CPU load goes up to 50 percent and never goes down again.

On my old Vista PC it works, though.


Hi Rolf,

Sorry it didn't work on your Windows 10 machine. Did you try Jarfix to solve this?
Did you get the 1607 or 1703 update? When a major update arrives, you could uninstall Java before restarting to install the update, then install Java again afterwards. This may help with some issues.

Best regards

Windows 10 versions are listed here: https://en.wikipedia.org/wiki/Windows_10_version_history

[Edited at 2017-06-14 14:03 GMT]



Niann-Tsyr
Netherlands
Local time: 16:19
Dutch to English
+ ...
Error with importing tmx file into MemoQ Jun 29

So I followed the steps and extracted a TMX file for the language pair I wanted, then imported it into memoQ via the TM tab of the Resource Console. The import took a long time (I didn't time it) and then apparently finished; the window closed on finish, so I didn't see what happened (I should have unchecked that box). But when I looked at the TM info, it showed zero entries -__- The TMX files are supposed to be aligned already, right? Or do I still need to do that? Also, are there better programs meant just for termbases and translation memories (free or paid)?
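
One way to check whether an extracted file really contains aligned units before importing it is to count its `<tu>` elements and the language codes on its `<tuv>` elements. A minimal Python sketch, assuming a standard TMX layout (`tu` containing `tuv` containing `seg`); the function name `tmx_language_stats` is made up for this example:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# xml:lang lives in the XML namespace; older TMX files may use a plain "lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_language_stats(path):
    """Stream through a TMX file; return (number of <tu>, Counter of <tuv> languages)."""
    tu_count = 0
    langs = Counter()
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "tuv":
            lang = elem.get(XML_LANG) or elem.get("lang") or "?"
            langs[lang.upper()] += 1
        elif elem.tag == "tu":
            tu_count += 1
            # Drop the unit's children so large files do not pile up in memory.
            elem.clear()
    return tu_count, langs
```

For an aligned pair, something like `tmx_language_stats("en-nl.tmx")` (the file name is only an example) should report roughly the same count for each of the two languages. If the counts look fine but the import still ends with zero entries, a mismatch between the language codes in the file and those of the target TM is one possible cause worth checking.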


Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 15:19
Member (2009)
Dutch to English
+ ...
Yes there are. Here are some of the best: Jun 30

Niann-Tsyr wrote:

So I followed the steps and extracted a TMX file for the language pair I wanted, then imported it into memoQ via the TM tab of the Resource Console. The import took a long time (I didn't time it) and then apparently finished; the window closed on finish, so I didn't see what happened (I should have unchecked that box). But when I looked at the TM info, it showed zero entries -__- The TMX files are supposed to be aligned already, right? Or do I still need to do that? Also, are there better programs meant just for termbases and translation memories (free or paid)?


http://www.farkastranslations.com/tmlookup.php (amazing TM/terminology concordancer)
https://www.xbench.net/ (great all-round terminology tool)
https://github.com/heartsome/tmxeditor8 (Heartsome TMX editor)
http://prdownloads.sourceforge.net/okapi/Olifant-R00022.zip?download (Olifant TMX editor)

Michael

[Edited at 2017-06-30 09:01 GMT]


FarkasAndras
Local time: 16:19
English to Hungarian
+ ...
Xbench Jun 30

How does xbench do with absurdly large datasets, such as tens of millions of segments?
The last time I checked it, it just loaded everything into RAM, which is fast and convenient for small TMs but it obviously doesn't work if your TM file is bigger than your RAM, and it generally starts to get unworkable long before that.
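
For comparison, a TMX file can also be searched without loading it into RAM by streaming through it one unit at a time. The following Python sketch is only an illustration of that idea, not how Xbench or TMLookup work internally; it assumes a standard TMX layout with the units inside `<body>`, and the function name `search_tmx` is made up:

```python
import xml.etree.ElementTree as ET


def search_tmx(path, query, limit=20):
    """Yield the segment texts of each unit that contains `query` (case-insensitive)."""
    needle = query.lower()
    body = None
    found = 0
    for event, elem in ET.iterparse(path, events=("start", "end")):
        if event == "start":
            if elem.tag == "body":
                body = elem  # remember the parent so finished units can be dropped
            continue
        if elem.tag != "tu":
            continue
        segments = ["".join(seg.itertext()) for seg in elem.iter("seg")]
        # Discard finished units so memory use does not grow with file size.
        if body is not None:
            body.clear()
        else:
            elem.clear()
        if any(needle in s.lower() for s in segments):
            yield segments
            found += 1
            if found >= limit:
                return
```

A linear scan like this stays small in memory but reads the whole file for every query, so it gets slow on tens of millions of segments; tools built for that size typically build an index first.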

[Edited at 2017-06-30 09:51 GMT]



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 15:19
Member (2009)
Dutch to English
+ ...
yes, that RAM is a problem Jun 30

FarkasAndras wrote:

How does xbench do with absurdly large datasets, such as tens of millions of segments?
The last time I checked it, it just loaded everything into RAM, which is fast and convenient for small TMs but it obviously doesn't work if your TM file is bigger than your RAM, and it generally starts to get unworkable long before that.

[Edited at 2017-06-30 09:51 GMT]


I only use Xbench for glossaries, because of the RAM-loading limitation you mentioned. I use TMLookup for TMs.


Noe Tessmann  Identity Verified
Local time: 16:19
English to German
+ ...
Updates for your EU TM alignments? Jun 30

Hi Andras,

Another question, concerning your EU TM alignments: are there any updates? What happened to the project? I still use your highly valuable TMs.

Kind regards

Noe



FarkasAndras wrote:

How does xbench do with absurdly large datasets, such as tens of millions of segments?
The last time I checked it, it just loaded everything into RAM, which is fast and convenient for small TMs but it obviously doesn't work if your TM file is bigger than your RAM, and it generally starts to get unworkable long before that.

[Edited at 2017-06-30 09:51 GMT]

