Multi UN parallel corpora TM taking days to upload
Thread poster: rincgom

rincgom
United States
Nov 5, 2015

Hi all,

I was wondering if anyone has experienced problems when trying to add a TM from a source like the MultiUN parallel corpus at http://opus.lingfil.uu.se/MultiUN.php

The English to Spanish TM has around 9 million sentence pairs, so it is a big file. It comes in .tmx format, and when I tried to use it in SDL Trados Studio 2014 SP2 the import simply got stuck halfway through. I have not been able to use the TM because the process takes more than a day to complete (I work at a university where the computers are needed by others, so I have very little time to wait). I have also converted the .tmx to .sdltm and tried to add it that way, but it still does not load completely.
Has anyone else experienced this? If so, what would you recommend I do to troubleshoot it?
Also, I wanted to know if there is an add-in or option that can convert my finished translations into .docx format. If a thread on this already exists, please point me to it.

Thanks,

Francisco


 

Dmitry Pakidov
Russian Federation
Out of pure interest… Nov 6, 2015

…I downloaded the EN-RU TMX (9.5M segments) and imported it into an empty TM. I launched the import yesterday at 6 PM and it was finished by 9:30 AM today. Both the TMX and the target TM were on an SSD, so the import will presumably take much longer on an ordinary HDD (or with a slower CPU). Here’s a screenshot of the resulting TM’s settings window: http://i.imgur.com/fQo2oRY.png
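
For a rough sense of scale, the figures above can be turned into an import rate. This is only back-of-the-envelope arithmetic on the numbers reported in this post (6 PM to 9:30 AM, 9.5M segments); the calendar dates are inferred from the post date, and the function name import_rate is made up for illustration.

```python
from datetime import datetime

# Times as reported in the post; dates inferred from the Nov 6, 2015 post date.
start = datetime(2015, 11, 5, 18, 0)   # "yesterday at 6 PM"
end = datetime(2015, 11, 6, 9, 30)     # "9:30 AM today"
segments = 9_500_000                   # "9.5M segments"


def import_rate(start, end, segments):
    """Return (elapsed hours, segments imported per second)."""
    seconds = (end - start).total_seconds()
    return seconds / 3600, segments / seconds


hours, rate = import_rate(start, end, segments)
print(f"{hours:.1f} h, about {rate:.0f} segments per second")
```

That works out to 15.5 hours at roughly 170 segments per second on the SSD setup listed below, so an import of this size is unlikely to fit into a short session on a shared machine.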

My specs:

i5 750 @ 2.67 GHz
8 GB RAM
OCZ ARC 100 @ 100 GB

On a side note, some source segments in the resulting TM do not correspond to their target segments (example: http://i.imgur.com/TSuuHQH.png). Even if you don’t know Russian, it’s pretty easy to see the mismatch: the source segment contains a list of organisations, while the target segment is half the size, is structured differently, and contains a hyperlink that is not present in the source. I’m not sure whether the problem lies with Trados or whether the original TMX is messy. I tried opening the TMX in Olifant to check, but even opening it takes ages, with Olifant using all available RAM.
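
One way to check whether the TMX itself is messy, without loading it whole into an editor, is to stream through it and flag units whose target is much shorter or longer than the source. The following is only a sketch under several assumptions: the file is standard TMX (tu / tuv / seg elements with an xml:lang or lang attribute), the language codes start with "en" and "ru", and a length ratio outside 0.5 to 2.0 counts as suspicious. The function name and thresholds are invented for illustration, and a length ratio is a crude heuristic, not a real alignment check.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def suspicious_units(source, src_lang="en", tgt_lang="ru",
                     min_ratio=0.5, max_ratio=2.0):
    """Yield (unit number, source text, target text) for odd length ratios.

    `source` is a file path or a binary file object. The thresholds are
    arbitrary assumptions and need tuning per language pair.
    """
    body = None
    index = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "body":
                body = elem  # remember the container of all <tu> elements
            continue
        if elem.tag != "tu":
            continue
        index += 1
        texts = {}
        for tuv in elem.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text inside inline tags
                texts[lang[:2]] = "".join(seg.itertext()).strip()
        src, tgt = texts.get(src_lang), texts.get(tgt_lang)
        if src and tgt:
            ratio = len(tgt) / len(src)
            if ratio < min_ratio or ratio > max_ratio:
                yield index, src, tgt
        if body is not None:
            body.clear()  # discard processed units so memory use stays small


# Usage with a hypothetical file name:
# for number, src, tgt in suspicious_units("en-ru.tmx"):
#     print(number, len(src), len(tgt))
```

Because the file is read as a stream and processed units are discarded, memory use should stay small even for millions of segments, though a full pass over a file this size will still take a while.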

What exactly are you referring to by “convert […] finished translations into a .docx file format”? If your source document is a docx, then use the “Save Target As” function to generate your translated (single-language) docx file.

[Edited at 2015-11-06 12:41 GMT]
