Multi UN parallel corpora TM taking days to upload
Thread poster: rincgom

rincgom
United States
Nov 5, 2015

Hi all,

I was wondering if anyone has experienced problems when trying to add a TM from a source like the Multi UN parallel corpora from http://opus.lingfil.uu.se/MultiUN.php.

The English-to-Spanish TM contains around 9 million sentence pairs, so it is a big file. It comes in .tmx format, and when I tried to use it in SDL Trados Studio 2014 SP2, the import simply got stuck halfway through. I have not been able to use the TM, since the process takes more than a day to complete (I work at a university where the computers are in demand, so I have very little time to wait). I also converted the .tmx to .sdltm and tried to add it that way, but it still does not load completely.
Has anyone else experienced this? If so, what would you recommend I do to troubleshoot it?
Also, I wanted to know whether there is an add-in or option that can convert my finished translations into .docx format. Please let me know if a thread on this already exists so I can visit it.

Thanks,

Francisco


 

Dmitry Pakidov
Russian Federation
Out of pure interest… Nov 6, 2015

…I’ve downloaded the EN-RU TMX (9.5M segments) and tried to import it into an empty TM. I launched the import yesterday at 6 PM and it had finished by 9:30 AM today. Both the TMX and the target TM were located on an SSD, so one has to assume that the conversion will take much longer on an ordinary HDD (and with a slower CPU). Here’s a screenshot of the resulting TM’s settings window: http://i.imgur.com/fQo2oRY.png

My specs:

i5 750 @ 2.67 GHz
8 GB RAM
OCZ ARC 100 @ 100 GB
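
For a rough sense of scale, the timing above can be turned into a throughput figure. The sketch below is only arithmetic on the numbers quoted in this thread (about 9.5M segments, 6 PM to 9:30 AM); the function name is made up for illustration, and the projection for a 9M-segment TM assumes the import rate stays constant on the same hardware, which may not hold on an HDD.

```python
from datetime import datetime


def import_rate(segments: int, start: datetime, end: datetime) -> float:
    """Return the average number of segments imported per second."""
    elapsed_seconds = (end - start).total_seconds()
    return segments / elapsed_seconds


# Figures quoted in the post: ~9.5M segments, 6 PM one day to 9:30 AM the next.
start = datetime(2015, 11, 5, 18, 0)
end = datetime(2015, 11, 6, 9, 30)
rate = import_rate(9_500_000, start, end)

# Assumption: a ~9M-segment TM imports at the same average rate.
hours_for_9m = 9_000_000 / rate / 3600

print(f"{rate:.0f} segments/s, about {hours_for_9m:.1f} h for 9M segments")
```

On these numbers, an import of this size is an overnight job even on an SSD, so an import that runs for many hours is not necessarily stuck.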

On a side note, I’m not sure whether the problem lies with Trados or whether the original TMX is messy. (I’ve tried opening it in Olifant, but even opening it takes ages, with Olifant using all the available RAM.) Either way, in the resulting TM some source segments do not correspond to their targets, e.g.: http://i.imgur.com/TSuuHQH.png. Even if you don’t know Russian, it’s pretty easy to see that the source segment contains a list of organisations, while the target segment is half the size of the source, is structured differently, and contains a hyperlink that is not present in the source.
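
One way to check the TMX itself without loading it all into RAM is to stream it and flag translation units whose source and target differ wildly in length. The following is a minimal sketch, not a tested tool for this corpus: the function name `suspicious_pairs` and the 3x length threshold are arbitrary choices for illustration, and it assumes plain `<tu>/<tuv>/<seg>` elements with language codes that match exactly (e.g. `en`, not `en-US`).

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def suspicious_pairs(source, src_lang, tgt_lang, max_ratio=3.0):
    """Yield (source_text, target_text) pairs whose lengths differ by more than max_ratio."""
    body = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "body":
                body = elem  # remember the parent so finished units can be dropped
            continue
        if elem.tag != "tu":
            continue
        texts = {}
        for tuv in elem.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                texts[lang] = "".join(seg.itertext()).strip()
        src, tgt = texts.get(src_lang), texts.get(tgt_lang)
        if src and tgt:
            longer, shorter = max(len(src), len(tgt)), min(len(src), len(tgt))
            if longer / shorter > max_ratio:
                yield src, tgt
        if body is not None:
            body.clear()  # discard processed units so memory use stays flat


# Tiny in-memory demo; a real run would pass the path of the .tmx file instead.
demo = (
    '<tmx version="1.4"><header/><body>'
    '<tu><tuv xml:lang="en"><seg>Hello world</seg></tuv>'
    '<tuv xml:lang="es"><seg>Hola mundo</seg></tuv></tu>'
    '<tu><tuv xml:lang="en"><seg>A very long list of organisations and bodies</seg></tuv>'
    '<tuv xml:lang="es"><seg>Lista</seg></tuv></tu>'
    '</body></tmx>'
).encode("utf-8")
flagged = list(suspicious_pairs(io.BytesIO(demo), "en", "es"))
print(len(flagged), "suspicious pair(s)")
```

A length ratio is only a crude heuristic: it will miss misaligned pairs of similar length and may flag legitimate ones, so the output is a list of candidates to inspect rather than proof that the file is broken.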

What exactly are you referring to by “convert […] finished translations into a .docx file format”? If your source document is a docx, then use the “Save Target As” function to generate your translated (single-language) docx file.

[Edited at 2015-11-06 12:41 GMT]


 

