Conversion of a large terminological db from tbx to multiterm db
Thread poster: Rossano Rossi

Rossano Rossi  Identity Verified
Local time: 09:39
English to Italian
+ ...
Mar 14

My system is Windows 10 (64-bit system with 8 GB RAM and Intel Core i3-3240 CPU [3.40 GHz]).

I am trying to convert a tbx file with 5268366 entries to Multiterm format.

However, after several minutes SDL MultiTerm 2015 Convert aborts with an exception of type "System.OutOfMemoryException".

I have extracted 10000 entries from the db (and reconstructed a tbx-compliant file) and the process was completed without errors (conversion to .mtf.xml, generation of .xdt and finally import into a Multiterm termbase).

Are there limits on the size (number of entries) of a tbx file that Multiterm Convert can convert?

Also, are there limits on the number of entries a Multiterm sqlite db can contain?

TIA,

Rossano



[Edited at 2019-03-15 08:13 GMT]


 

DZiW
Ukraine
English to Russian
+ ...
SQLITE_MAX_LENGTH = 1'000'000'000 Mar 14

Hello Rossano--It's less a matter of hard limits than of the database abstraction layer and how it is implemented. In practice the ceiling is set by the hardware and architecture; a single file somewhere between 50 and 300 GB still works without performance issues. As a rule of thumb, the file should take up less than 60% of the storage partition.

How big is your file, and could you make sure it's not corrupted? If you're a DBA or technically minded, check the memory usage and the system log to see what else could be triggering the error.
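
For anyone who wants a number rather than a rule of thumb, here is a minimal sketch that asks a SQLite build for its own file-size ceiling (page size times maximum page count). It queries the SQLite library bundled with Python, which is an assumption: the build MultiTerm ships with may be configured differently, and the function name is made up for illustration.

```python
import sqlite3

def sqlite_size_ceiling(conn):
    """Return the largest database size, in bytes, that this connection allows."""
    # Both PRAGMAs are standard SQLite; their values depend on the build and the file.
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    max_pages = conn.execute("PRAGMA max_page_count").fetchone()[0]
    return page_size * max_pages

conn = sqlite3.connect(":memory:")
print(f"ceiling: {sqlite_size_ceiling(conn) / 1024**3:.0f} GiB")
```

This only reports the theoretical ceiling of the file format; the practical limit is still RAM, disk and the application on top.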


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 08:39
Member (2009)
Dutch to English
+ ...
hmm Mar 14

Rossano Rossi wrote:

My system is Windows 10 (64-bit system with 8 GB RAM and Intel Core i3-3240 CPU [3.40 GHz]).

I am trying to convert a tmx file with 5268366 entries to Multiterm format.

However SDL MultiTerm 2015 Convert interrupts, after several minutes, the execution with an exception of type "SystemOutOfMemoryException".

I have extracted 10000 entries from the db (and reconstructed a tmx-compliant file) and the process was completed without errors (conversion to .mtf.xml, generation of .xdt and finally import into a Multiterm termbase).

Are there limits on the size (number of entries) of a tmx file that Multiterm Convert can convert?

Also, are there limits on number of entries a Multiterm sqlite db can contain?

TIA,

Rossano



What's the structure of the data in the TMX?

How big is the TMX?

How about splitting the big TMX into smaller chunks and trying them?

If there is no metadata, or you don't care if it gets mangled, Xbench will import the TMX, and you can export all the entries into e.g. .xlsx. Then open that and divide it into smaller pieces and try importing these.

Michael

[Edited at 2019-03-14 21:18 GMT]


 

DZiW
Ukraine
English to Russian
+ ...
1 Mar 14

Michael, Excel is not a real database: XLS (97-2003) is limited to 65'536 rows by 256 columns, whereas XLSX (2007+) allows 1'048'576 rows by 16'384 columns, with up to 32'767 characters per cell.

I did a quick search, and the recommendations I found are to keep an SQL database file under 5-25 GB on Windows x64 and up to 200 GB on *nix.
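
To put the row limits next to the entry count from the first post, here is a small sketch of the arithmetic. It assumes one entry per spreadsheet row, which is a guess about how an export would be laid out; the helper name is invented for the example.

```python
import math

XLS_MAX_ROWS = 65_536       # Excel 97-2003 (.xls)
XLSX_MAX_ROWS = 1_048_576   # Excel 2007+ (.xlsx)

def sheets_needed(entries, max_rows, header_rows=0):
    """Number of sheets required if every entry takes exactly one row."""
    usable = max_rows - header_rows
    return math.ceil(entries / usable)

entries = 5_268_366  # figure quoted in the first post
print(sheets_needed(entries, XLSX_MAX_ROWS))  # 6
print(sheets_needed(entries, XLS_MAX_ROWS))   # 81
```

So even XLSX would need the data split over several sheets or files.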


 

Rossano Rossi  Identity Verified
Local time: 09:39
English to Italian
+ ...
TOPIC STARTER
Format of db Mar 15

A correction to my post above: "tmx" should have been "tbx". I have edited the post to avoid misleading other readers.
This is the header and one entry of my cleaned-up tbx file:
---------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE martif SYSTEM "TBXcoreStructV02.dtd">
<martif type="TBX-Default" xml:lang="en">
  <martifHeader>
    <fileDesc>
      <sourceDesc>
        <p>This is a TBX file downloaded from the IATE website. Address any enquiries to iate@cdt.europa.eu.</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <p type="XCSURI">TBXXCS.xcs</p>
    </encodingDesc>
  </martifHeader>
  <text>
    <body>      
      <termEntry id="IATE-84">
        <langSet xml:lang="de">
          <tig>
            <term>Zuständigkeit der Mitgliedstaaten</term>
          </tig>
        </langSet>
        <langSet xml:lang="en">
          <tig>
            <term>competence of the Member States</term>
          </tig>
        </langSet>
      </termEntry>
--------------------------------------------------------------------------------
There are 6701542 lines & 488245 entries (termEntry). The file is 168 MB.
SDL MultiTerm 2015 Convert cannot convert it to .mtf.xml (it fails with "System.OutOfMemoryException").
However, it can convert a subset with 10000 entries.
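
For reference, the entry count can be checked without loading the whole file into memory. This is a minimal sketch using Python's standard library; it assumes the file has no XML namespace, so the tag is literally `termEntry` as in the excerpt above, and the function name is only illustrative.

```python
import io
import xml.etree.ElementTree as ET

def count_term_entries(source):
    """Count <termEntry> elements while streaming through the file."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "termEntry":
            count += 1
            elem.clear()  # drop the entry's children so memory stays low
    return count

# In-memory demo; for a real file pass its path instead of a BytesIO object.
demo = io.BytesIO(b"<body><termEntry/><termEntry/></body>")
print(count_term_entries(demo))
```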


[Edited at 2019-03-15 08:14 GMT]

[Edited at 2019-03-15 08:15 GMT]


 

DZiW
Ukraine
English to Russian
+ ...
https://gateway.sdl.com/CommunityHome Mar 15

While SDL generally recommends using the latest version to mitigate glitches and memory leaks, they say the error might be caused by ampersands, even escaped ampersands (&amp;), in some fields.

Little wonder that even a machine with 32+ GB of RAM is no guarantee.
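
If ampersands are the suspect, a quick scan shows which lines contain them. Below is a minimal sketch; the regular expression treats the five predefined XML entities and numeric character references as "escaped" and everything else as "bare", and the function name is made up for the example.

```python
import re

# An ampersand is considered escaped when it starts a predefined entity
# or a decimal/hexadecimal character reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);|#\d+;|#x[0-9A-Fa-f]+;)")

def scan_ampersands(lines):
    """Return two lists of 1-based line numbers: (bare, escaped)."""
    bare, escaped = [], []
    for number, line in enumerate(lines, start=1):
        if BARE_AMP.search(line):
            bare.append(number)
        elif "&" in line:
            escaped.append(number)
    return bare, escaped
```

Called on an open file, e.g. `scan_ampersands(open("file.tbx", encoding="utf-8"))` with a placeholder file name, it reads line by line, so the file size is not a problem.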


 

Rossano Rossi  Identity Verified
Local time: 09:39
English to Italian
+ ...
TOPIC STARTER
TBX to multiterm DB Mar 15

I have upped the ante to a subset of 100,000 entries. MultiTerm Convert is able to deal with a chunk of that size, and MultiTerm is able to import it. This seems a reasonable solution for an overall set of 488245 entries (i.e. five chunks of at most 100,000 entries each).
Thanks for your support.
Rossano
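
For anyone who needs to repeat the chunking, here is a minimal sketch of one way to do it in Python. It is not the procedure used in this thread, which was not described. It assumes the TBX has no namespace and that every chunk can reuse a fixed wrapper modelled on the header quoted earlier; `HEADER`, `FOOTER` and the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical wrapper, modelled on the header quoted earlier in the thread.
HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE martif SYSTEM "TBXcoreStructV02.dtd">\n'
    '<martif type="TBX-Default" xml:lang="en">\n'
    '  <martifHeader>\n'
    '    <fileDesc><sourceDesc><p>Chunk of a larger TBX file.</p></sourceDesc></fileDesc>\n'
    '    <encodingDesc><p type="XCSURI">TBXXCS.xcs</p></encodingDesc>\n'
    '  </martifHeader>\n'
    '  <text>\n    <body>\n'
)
FOOTER = '    </body>\n  </text>\n</martif>\n'

def iter_chunks(source, chunk_size):
    """Yield complete TBX documents (str), each with at most chunk_size entries."""
    entries = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "termEntry":
            continue
        entries.append(ET.tostring(elem, encoding="unicode"))
        elem.clear()  # release the entry's children once serialized
        if len(entries) == chunk_size:
            yield HEADER + "".join(entries) + FOOTER
            entries = []
    if entries:  # last, possibly shorter, chunk
        yield HEADER + "".join(entries) + FOOTER

def write_chunks(source, chunk_size, prefix):
    """Write each chunk to prefix_001.tbx, prefix_002.tbx, ... and return the paths."""
    paths = []
    for index, document in enumerate(iter_chunks(source, chunk_size), start=1):
        path = f"{prefix}_{index:03d}.tbx"
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(document)
        paths.append(path)
    return paths
```

With a chunk size of 100,000, a file of 488245 entries would come out as five files, the last one holding the remaining 88,245 entries. Each file can then go through MultiTerm Convert separately.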


[Edited at 2019-03-15 15:55 GMT]


 

