Free TM in 24 languages! The DGT TM
Thread poster: Fi2 n Co

Fi2 n Co  Identity Verified
Portugal
Local time: 00:31
English to French
+ ...
Jun 9, 2017

Hello to all,

I'm creating this thread for anyone who doesn't know yet that the EU has made a large translation memory available to the public for free use. It's called the DGT TM. This TM covers 24 languages and, depending on the languages, if you merge all the entries released in 2017 you will have on average over 400 000 TUs per language pair. If you also use earlier releases, it will be even bigger.

Getting it takes three steps, which I describe in a video I've posted on YouTube: https://youtu.be/wVeU9NKEYjM
In the video description I have put direct download links to TMs already extracted from the DGT TM for a few language pairs.

This TM was suggested by Milan Condak, many thanks 😉 ( http://www.proz.com/profile/37344 ), in a thread I created about two other free assets that can be connected in Wordfast Pro 5, the VLTM TM and the IATE glossary: http://www.proz.com/forum/wordfast_support/314785-new_working_feature_in_wordfast_pro_5_connect_to_free_remote_tms_and_remote_glossaries.html#2651147

Feel free to discuss it further here!
My bests to all! 😊


 

Tatiana Grehan  Identity Verified
United States
Local time: 19:31
English to Russian
+ ...
Thanks for the information! Jun 9, 2017

Does it include a TM for the Russian language?

 

Milan Condak  Identity Verified
Local time: 01:31
English to Czech
No Jun 9, 2017

Tatiana Grehan wrote:

Does it include a TM for the Russian language?


In the EU there are 31 countries and only 24 official languages. After Brexit, English will remain for part of Cyprus, Gibraltar, ...

(I suggest replacing English with Czech: it is a Slavonic (en_US: Slavic) language written in Latin-2, and its encoding is similar to that of the Baltic languages. At present there are three EU working languages: EN, DE, FR; Germanic, Germanic, Romance. This makes no sense.)

Milan


 

Milan Condak  Identity Verified
Local time: 01:31
English to Czech
Old presentations Jun 9, 2017

Fi2 n Co wrote:

This TM was suggested by Milan Condak, many thanks 😉, ( http://www.proz.com/profile/37344 )



Here are links to some of my presentations on the DGT TM, in Czech:

1. http://www.condak.cz/nove/2016-05/29/cs/00.html

Searching the DGT database (Vyhledávání v databázi DGT)

Wordfast Classic and Wordfast Server, 29.05.2016
--
2. http://www.condak.net/tmx/tmx-dgt-pctrans/cs/00.html

DGT TM in PC Translator 2012, 23.10.2011
--
3. http://www.condak.cz/archiv-cz/2009-01/01-25/cs/00.html

PC Translator 2009 + HU, 25.01.2009
--
Before the DGT TM, the JRC-Acquis database was available.

Other resources for non-EU languages can be found, e.g., in OPUS, the open parallel corpus:

http://opus.lingfil.uu.se/index.php

One section contains the DGT files: http://opus.lingfil.uu.se/DGT.php

==
But all of them need some work: downloading and extracting the TMX is only the beginning of the story.

Milan

[Edited at 2017-06-09 19:17 GMT]


 

Fi2 n Co  Identity Verified
Portugal
Local time: 00:31
English to French
+ ...
TOPIC STARTER
List of languages Jun 9, 2017

Tatiana Grehan wrote:

Does it include a TM for the Russian language?


Hi,

There's a list of all released languages, with statistics, on the DGT TM page. You can see it here: http://optima.jrc.it/Resources/DGT-TM_Statistics.pdf

My bests


 

Nuno Oliveira  Identity Verified
Portugal
English to Portuguese
Solution for TMXtract.jar not executing Jun 12, 2017

First of all, let me thank you for sharing this with us! This is great stuff.

If someone can't get the TMXtract.jar file to execute, my advice is to just download and run Jarfix. It will fix it.


 

Fi2 n Co  Identity Verified
Portugal
Local time: 00:31
English to French
+ ...
TOPIC STARTER
Thanks Jun 13, 2017

Nuno Oliveira wrote:

First of all, let me thank you for sharing this with us! This is great stuff.

If someone can't get the TMXtract.jar file to execute, my advice is to just download and run Jarfix. It will fix it.


Thanks Nuno. Yes, I hope it will help many people, since free TMs like the VLTM sometimes have a limited number of entries in some language pairs. 🙂

Thank you for this tip. I haven't tried it because the problem is usually solved by reinstalling Java, and I haven't seen the issue in Windows 10.
It will be interesting to get feedback from others to see whether this gave them a permanent fix. It looks promising!

My bests



[Edited at 2017-06-13 11:33 GMT]


 

Rolf Keller
Germany
Local time: 01:31
English to German
Java :-( Jun 14, 2017

Fi2 n Co wrote:

I haven't tried it because the problem is usually solved by reinstalling Java, and I haven't seen the issue in Windows 10.


On my Windows 10 the .jar file doesn't work. Moreover, I have to restart the PC after trying the .jar, because the CPU load goes up to 50 percent and never goes down again. :-(

On my old Vista PC it works, though.


 

Rolf Keller
Germany
Local time: 01:31
English to German
Getting the most from the EU's glossaries Jun 14, 2017

You don't need any CAT software to benefit from the EU's glossaries. All these (and other) .tbx and .tmx files can be searched with Omni-Lookup. At the same time you can search several other glossaries on the web or offline (e.g. Acolada or Excel). One click for your preferred portfolio of resources. See www.omni-lookup.de.

 

Fi2 n Co  Identity Verified
Portugal
Local time: 00:31
English to French
+ ...
TOPIC STARTER
Thanks for sharing this experience Jun 14, 2017

Rolf Keller wrote:

Fi2 n Co wrote:

I haven't tried it because the problem is usually solved by reinstalling Java, and I haven't seen the issue in Windows 10.


On my Windows 10 the .jar file doesn't work. Moreover, I have to restart the PC after trying the .jar, because the CPU load goes up to 50 percent and never goes down again. :-(

On my old Vista PC it works, though.


Hi Rolf,

Sorry it didn't work on yours (W10). Did you try Jarfix to solve this?
Did you get the 1607 or 1703 update? When you get a major update, you could try uninstalling Java before restarting for the update to install, then installing Java again afterwards. This may help with some issues.

My bests

Win 10 versions here: https://en.wikipedia.org/wiki/Windows_10_version_history

[Edited at 2017-06-14 14:03 GMT]


 

Niann-Tsyr
Netherlands
Local time: 01:31
Dutch to English
+ ...
Error importing TMX file into MemoQ Jun 29, 2017

So I followed the steps and extracted a TMX file with the language pair I wanted. I imported it into MemoQ via the TM tab of the Resource Console. It took a long time (I didn't time it). Then it was apparently done (the window closed on finish; I should have unchecked that box, since I didn't see what happened), but when I looked at the TM info there were zero entries -__-

The TMX files are supposed to be aligned, right? Or do I still need to do that? Also, are there better programs dedicated just to termbases and translation memories? (Free or paid license)
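A quick way to narrow down a "zero entries" result is to check the extracted file itself, outside any CAT tool. The Python sketch below is only an illustration and not part of the DGT tooling: the function name is made up, and it assumes the file is a regular TMX in which each <tu> holds one <tuv> per language, with the language code in either an `xml:lang` or a `lang` attribute. It counts the translation units and lists the language codes actually present.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def count_tmx_units(path):
    """Return (number of <tu> elements, Counter of language codes on <tuv>).

    Hypothetical helper for illustration; it streams the file so that
    large TMX files do not have to fit in memory as a full tree.
    """
    tu_count = 0
    langs = Counter()
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "tuv":
            # Some TMX files use xml:lang, others a plain lang attribute.
            code = elem.get(XML_LANG) or elem.get("lang") or "?"
            langs[code.upper()] += 1
        elif elem.tag == "tu":
            tu_count += 1
            elem.clear()  # drop the unit's children once it has been counted
    return tu_count, langs
```

Calling `count_tmx_units("EN-NL.tmx")` (a placeholder file name) returns the unit count and the codes found. If the count is not zero, the file already contains paired segments and no separate alignment step is needed. In that case it may be worth comparing the codes in the file (for example EN-GB) with the languages set on the target TM, since a mismatch is one possible reason for an import that ends with nothing added.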

 

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:31
Member (2009)
Dutch to English
+ ...
Yes there are. Here are some of the best: Jun 30, 2017

Niann-Tsyr wrote:

So I followed the steps and extracted a TMX file with the language pair I wanted. I imported it into MemoQ via the TM tab of the Resource Console. It took a long time (I didn't time it). Then it was apparently done (the window closed on finish; I should have unchecked that box, since I didn't see what happened), but when I looked at the TM info there were zero entries -__- The TMX files are supposed to be aligned, right? Or do I still need to do that? Also, are there better programs dedicated just to termbases and translation memories? (Free or paid license)


http://www.farkastranslations.com/tmlookup.php (amazing TM/terminology concordancer)
https://www.xbench.net/ (great all-round terminology tool)
https://github.com/heartsome/tmxeditor8 Heartsome (TMX editor)
http://prdownloads.sourceforge.net/okapi/Olifant-R00022.zip?download (Olifant TMX editor)

Michael

[Edited at 2017-06-30 09:01 GMT]


 

FarkasAndras
Local time: 01:31
English to Hungarian
+ ...
Xbench Jun 30, 2017

How does xbench do with absurdly large datasets, such as tens of millions of segments?
The last time I checked it, it just loaded everything into RAM, which is fast and convenient for small TMs but it obviously doesn't work if your TM file is bigger than your RAM, and it generally starts to get unworkable long before that.

[Edited at 2017-06-30 09:51 GMT]


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:31
Member (2009)
Dutch to English
+ ...
yes, that RAM is a problem Jun 30, 2017

FarkasAndras wrote:

How does xbench do with absurdly large datasets, such as tens of millions of segments?
The last time I checked it, it just loaded everything into RAM, which is fast and convenient for small TMs but it obviously doesn't work if your TM file is bigger than your RAM, and it generally starts to get unworkable long before that.

[Edited at 2017-06-30 09:51 GMT]


I only use Xbench for glossaries, because of the RAM-loading limitation you mentioned. I use TMLookup for TMs.
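The RAM limitation discussed above comes from loading the whole TM at once. As a rough illustration of the alternative, here is a minimal Python sketch that reads a TMX file one translation unit at a time and discards each unit after checking it. It is not how Xbench or TMLookup work internally; the function name and the `limit` parameter are made up for this example, and it assumes plain TMX with the language code in `xml:lang` or `lang`.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def search_tmx(path, term, limit=10):
    """Yield up to `limit` dicts {language code: segment text} for every
    translation unit in which any segment contains `term` (case-insensitive).

    Hypothetical helper for illustration: a plain linear scan, no index.
    """
    term = term.lower()
    found = 0
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag != "tu":
            continue
        segments = {}
        for tuv in elem.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "?").upper()
            seg = tuv.find("seg")
            # itertext() also collects text inside inline tags within <seg>.
            segments[lang] = "".join(seg.itertext()) if seg is not None else ""
        elem.clear()  # release this unit's children before moving on
        if any(term in text.lower() for text in segments.values()):
            yield segments
            found += 1
            if found >= limit:
                return
```

Memory use stays far below that of loading the full tree, but every search rereads the file from the start, so lookups get slower as the TM grows. Tools meant for very large TMs typically build an index or a database first to avoid that.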


 

Noe Tessmann  Identity Verified
Local time: 01:31
English to German
+ ...
Updates for your EU TM alignments? Jun 30, 2017

Hi Andras,

Another question concerning your EU TM alignments: are there any updates? What happened to the project? I still use your highly valuable TMs.

Kind regards

Noe



FarkasAndras wrote:

How does xbench do with absurdly large datasets, such as tens of millions of segments?
The last time I checked it, it just loaded everything into RAM, which is fast and convenient for small TMs but it obviously doesn't work if your TM file is bigger than your RAM, and it generally starts to get unworkable long before that.

[Edited at 2017-06-30 09:51 GMT]


 