Conversion of a large terminological db from tbx to multiterm db
Thread poster: Rossano Rossi

Rossano Rossi  Identity Verified
Local time: 19:12
English to Italian
+ ...
Mar 14

My system is Windows 10 (64-bit system with 8 GB RAM and Intel Core i3-3240 CPU [3.40 GHz]).

I am trying to convert a tbx file with 5268366 entries to Multiterm format.

However, SDL MultiTerm 2015 Convert aborts after several minutes with an exception of type "System.OutOfMemoryException".

I have extracted 10000 entries from the db (and reconstructed a tbx-compliant file) and the process was completed without errors (conversion to .mtf.xml, generation of .xdt and finally import into a Multiterm termbase).

Are there limits on the size (number of entries) of a tbx file that Multiterm Convert can convert?

Also, are there limits on number of entries a Multiterm sqlite db can contain?

TIA,

Rossano



[Edited at 2019-03-15 08:13 GMT]


 

DZiW
Ukraine
English to Russian
+ ...
SQLITE_MAX_LENGTH = 1'000'000'000 Mar 14

Hello Rossano. It's not so much about hard limits as about the database abstraction layer and the implementation. In practice, it is limited by the hardware and architecture; a single file works without performance issues somewhere in the 50-300 GB range. As a rule of thumb, it should take up less than 60% of the storage partition.

How big is your file, and could you make sure it's not corrupted? If you're a DBA or techy, check the memory usage and the system log to see what else could trigger the error.


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 18:12
Member (2009)
Dutch to English
+ ...
hmm Mar 14

Rossano Rossi wrote:

My system is Windows 10 (64-bit system with 8 GB RAM and Intel Core i3-3240 CPU [3.40 GHz]).

I am trying to convert a tmx file with 5268366 entries to Multiterm format.

However SDL MultiTerm 2015 Convert interrupts, after several minutes, the execution with an exception of type "SystemOutOfMemoryException".

I have extracted 10000 entries from the db (and reconstructed a tmx-compliant file) and the process was completed without errors (conversion to .mtf.xml, generation of .xdt and finally import into a Multiterm termbase).

Are there limits on the size (number of entries) of a tmx file that Multiterm Convert can convert?

Also, are there limits on number of entries a Multiterm sqlite db can contain?

TIA,

Rossano



What's the structure of the data in the TMX?

How big is the TMX?

How about splitting the big TMX into smaller chunks and trying them?

If there is no metadata, or you don't care if it gets mangled, Xbench will import the TMX, and you can export all the entries into e.g. .xlsx. Then open that and divide it into smaller pieces and try importing these.

Michael

[Edited at 2019-03-14 21:18 GMT]


 

DZiW
Ukraine
English to Russian
+ ...
1 Mar 14

Michael, Excel is not a real database: XLS (97-2003) is limited to 65'536 rows by 256 columns, whereas XLSX (2007+) allows 1'048'576 rows by 16'384 columns, with up to 32'767 characters per cell.

I did a quick search, and the usual recommendation is to keep an SQL file under 5-25 GB on Windows x64 and up to 200 GB on *nix.


 

Rossano Rossi  Identity Verified
Local time: 19:12
English to Italian
+ ...
TOPIC STARTER
Format of db Mar 15

A correction to my post above: tmx should be tbx. I have edited the post to avoid misleading other readers.
This is the header and one entry of my cleaned-up tbx file:
---------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE martif SYSTEM "TBXcoreStructV02.dtd">
<martif type="TBX-Default" xml:lang="en">
  <martifHeader>
    <fileDesc>
      <sourceDesc>
        <p>This is a TBX file downloaded from the IATE website. Address any enquiries to iate@cdt.europa.eu.</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <p type="XCSURI">TBXXCS.xcs</p>
    </encodingDesc>
  </martifHeader>
  <text>
    <body>      
      <termEntry id="IATE-84">
        <langSet xml:lang="de">
          <tig>
            <term>Zuständigkeit der Mitgliedstaaten</term>
          </tig>
        </langSet>
        <langSet xml:lang="en">
          <tig>
            <term>competence of the Member States</term>
          </tig>
        </langSet>
      </termEntry>
--------------------------------------------------------------------------------
There are 6701542 lines & 488245 entries (termEntry). The file is 168 MB.
SDL MultiTerm 2015 Convert cannot convert it to .mtf.xml ("System.OutOfMemoryException").
However, it can convert a subset with 10000 entries.
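
For anyone who wants to check figures like the ones above on their own file, here is a minimal sketch in Python. It reads the file line by line, so memory use stays flat regardless of file size. The function name `count_lines_and_entries` is made up for illustration, and the sketch assumes that every entry opens with a literal `<termEntry` tag, as in the sample above.

```python
def count_lines_and_entries(lines):
    """Count lines and opening <termEntry> tags in an iterable of text lines."""
    n_lines = 0
    n_entries = 0
    for line in lines:
        n_lines += 1
        # Closing tags start with "</termEntry", so they are not counted here.
        n_entries += line.count("<termEntry")
    return n_lines, n_entries
```

It can be fed an open file directly, e.g. `with open("my.tbx", encoding="utf-8") as f: print(count_lines_and_entries(f))`, where `my.tbx` is a placeholder file name.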


[Edited at 2019-03-15 08:14 GMT]

[Edited at 2019-03-15 08:15 GMT]


 

DZiW
Ukraine
English to Russian
+ ...
https://gateway.sdl.com/CommunityHome Mar 15

SDL generally recommends using the latest version to mitigate glitches and memory leaks, but they also say the error might be caused by ampersands, even escaped ones (&amp;), in some fields.

Little wonder that even a machine with 32+ GB of RAM is no guarantee.
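
One way to see whether ampersands are even present in a file is to count them before converting. The following is a rough Python sketch, not anything SDL provides: the names `BARE_AMP`, `ESCAPED_AMP` and `count_ampersands` are invented here, and the patterns only approximate XML's reference syntax.

```python
import re

# An "&" that does not start an entity or character reference such as &lt; or &#38;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")
# An ampersand written in escaped form: &amp; or its numeric equivalents
ESCAPED_AMP = re.compile(r"&(?:amp|#38|#x26);")


def count_ampersands(lines):
    """Return (bare, escaped) ampersand counts for an iterable of text lines."""
    bare = 0
    escaped = 0
    for line in lines:
        bare += len(BARE_AMP.findall(line))
        escaped += len(ESCAPED_AMP.findall(line))
    return bare, escaped
```

A non-zero first number means the file is not well-formed XML to begin with; a non-zero second number only shows that escaped ampersands occur, which may or may not be related to the error.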


 

Rossano Rossi  Identity Verified
Local time: 19:12
English to Italian
+ ...
TOPIC STARTER
TBX to multiterm DB Mar 15

I have upped the ante to a subset of 100,000 entries. MultiTerm Convert can handle a chunk of that size, and MultiTerm can import it. This seems a reasonable solution for the overall set of 488245 entries (i.e. five chunks of up to 100,000 entries each).
Thanks for your support.
Rossano
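
For reference, the chunking described above could be scripted along these lines. This is only a sketch in Python using the standard library, not the method actually used in this thread. The header is copied from the sample posted earlier; the footer is an assumption (the sample was cut off before the closing tags), and the function and constant names are made up for illustration. The parser streams the file, so the whole TBX is never held in memory at once.

```python
import xml.etree.ElementTree as ET

# Header copied from the sample in this thread; the footer is assumed.
TBX_HEADER = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE martif SYSTEM "TBXcoreStructV02.dtd">
<martif type="TBX-Default" xml:lang="en">
  <martifHeader>
    <fileDesc>
      <sourceDesc>
        <p>This is a TBX file downloaded from the IATE website. Address any enquiries to iate@cdt.europa.eu.</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <p type="XCSURI">TBXXCS.xcs</p>
    </encodingDesc>
  </martifHeader>
  <text>
    <body>
"""
TBX_FOOTER = """
    </body>
  </text>
</martif>
"""


def iter_entry_chunks(source, chunk_size):
    """Yield lists of serialized <termEntry> elements, at most chunk_size each."""
    batch = []
    stack = []  # currently open elements, so each entry's parent is known
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()
        if elem.tag != "termEntry":
            continue
        batch.append(ET.tostring(elem, encoding="unicode").strip())
        if stack:
            stack[-1].remove(elem)  # drop the processed entry to free memory
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:
        yield batch  # last, possibly smaller, chunk


def write_tbx_chunks(source, chunk_size, out_pattern,
                     header=TBX_HEADER, footer=TBX_FOOTER):
    """Write each chunk to out_pattern.format(n) and return the paths written."""
    paths = []
    chunks = iter_entry_chunks(source, chunk_size)
    for number, batch in enumerate(chunks, start=1):
        path = out_pattern.format(number)
        with open(path, "w", encoding="utf-8") as out:
            out.write(header)
            out.write("\n".join(batch))
            out.write(footer)
        paths.append(path)
    return paths
```

Something like `write_tbx_chunks("iate.tbx", 100000, "iate_part_{:02d}.tbx")` (placeholder file names) would then produce five files for 488245 entries, each of which can go through MultiTerm Convert separately. Note that the standard parser does not preserve the original indentation inside entries beyond what is in the file, and it does not validate against the DTD.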


[Edited at 2019-03-15 15:55 GMT]


 

