Multiterm 2009 and TBX import
Thread poster: Grzegorz Gryc

Grzegorz Gryc  Identity Verified
Local time: 07:10
French to Polish
+ ...
Nov 2, 2009

Hi

After some not very exhaustive testing of data migration and consolidation in Multiterm 2009 I'm really amazed by my findings.

E.g.
The TBX import doesn't work properly.

I have a set of TBX files exported from Idiom Worldserver 9 (an SDL brand now); MT Convert stops because it can't find the DTD file (TBXcdv04.dtd).
It's rather easy to find this standard file (e.g. it's provided with Swordfish) and continue but I don't understand why the Multiterm team is unable to include the basic TBX related files in their distro while Rodolfo does it easily in Swordfish.

It seems the MT guys are aware only of the TBXcdv02.dtd version (as they use it in the TBX files exported from MT) and they don't have any idea about the way their colleagues from SDL Idiom division generate their TBX files.
In the same way, I suppose they have a poor idea about the way Multiterm interoperates with Trados.
E.g. see the recent threads:
http://www.proz.com/forum/sdl_trados_support/149675-trados_studio_2009_sp1_f12_to_save_a_new_term.html
http://www.proz.com/forum/sdl_trados_support/149689-sdl_studio_2009_random_term_recognition.html
http://www.proz.com/forum/sdl_trados_support/148976-error_message_when_trying_to_open_termbase_in_trados_studio_2009.html
etc.

PS.
The TBXcdv04.dtd specification was published in 1999, AFAIK.
Some amendments were made in 2000.
No comments.

Cheers
GG

[Edited at 2009-11-02 10:23 GMT]



Laurent KRAULAND  Identity Verified
France
Local time: 07:10
French to German
+ ...
My best guess Nov 4, 2009

Grzegorz Gryc wrote:

I have a set of TBX files exported from Idiom Worldserver 9 (an SDL brand now); MT Convert stops because it can't find the DTD file (TBXcdv04.dtd).
It's rather easy to find this standard file (e.g. it's provided with Swordfish) and continue but I don't understand why the Multiterm team is unable to include the basic TBX related files in their distro while Rodolfo does it easily in Swordfish.



Want to know my best guess, Grzegorz?
No offence to the SDL developer teams here (I'd just love to run an SDL product that would work in the same fashion as e.g. Swordfish/Heartsome)...

There it is: because SDL Trados have not developed their .SDLTBX format, which would be compatible with standard .TBX? (This being said because Studio uses .sdlxliff instead of the standard, open-source .XLF format).

[Edited at 2009-11-04 21:16 GMT]



Grzegorz Gryc  Identity Verified
Local time: 07:10
French to Polish
+ ...
TOPIC STARTER
Total cost of bug ownership... Nov 7, 2009

Laurent KRAULAND wrote:

Grzegorz Gryc wrote:

I have a set of TBX files exported from Idiom Worldserver 9 (an SDL brand now); MT Convert stops because it can't find the DTD file (TBXcdv04.dtd).
It's rather easy to find this standard file (e.g. it's provided with Swordfish) and continue but I don't understand why the Multiterm team is unable to include the basic TBX related files in their distro while Rodolfo does it easily in Swordfish.


Want to know my best guess, Grzegorz?
No offence to the SDL developer teams here (I'd just love to run an SDL product that would work in the same fashion as e.g. Swordfish/Heartsome)...

There it is: because SDL Trados have not developed their .SDLTBX format, which would be compatible with standard .TBX? (This being said because Studio uses .sdlxliff instead of the standard, open-source .XLF format).


IMHO you're right.
And probably their current internal data structures were not designed correctly and the entire software should be rewritten.
E.g., you have a similar problem with CSV, Multiterm is simply unable to produce correct CSV files by default if your termbase is not a simple 1:1 term list.
According to all the Time-to-Market related books, it's cheaper to maintain this bug than to make the software work well.

It's a pity because it's one of the few terminology management tools which theoretically meets my expectations, but I simply can't use it because of its unreliable import/export filters, which are crucial for me.

BTW.
Try to convert a 1000000 (one million) software string list from MT5 to MT7.
Hell will freeze.
In MT, I killed the conversion after 5 days.
DVX does it in one night, AFAIR.

Cheers
GG

[Edited at 2009-11-08 10:18 GMT]



Laurent KRAULAND  Identity Verified
France
Local time: 07:10
French to German
+ ...
Curiosity killed the CAT... Nov 7, 2009

but while we speak about basic / open-source formats, have you ever tried importing an .sdlxliff file to Heartsome or Swordfish? I am curious to know whether this would work smoothly... As for the .csv files, I am now warned and will not make any attempt in that direction.

[Edited at 2009-11-07 17:49 GMT]



Grzegorz Gryc  Identity Verified
Local time: 07:10
French to Polish
+ ...
TOPIC STARTER
SDLX :) Nov 7, 2009

Laurent KRAULAND wrote:

but while we speak about basic / open-source formats, have you ever tried importing an .sdlxliff file to Heartsome or Swordfish? I am curious to know whether this would work smoothly...

Yes, some time ago (in June?).
Swordfish deleted all the SDL specific extensions and the file became unusable.
But the Swordfish world is rapidly changing.
I have no time to test it now, but I suppose the solution already exists or will be published shortly.

PS.
SDLX does it almost perfectly.
With some additional processing it's a decent alternative.
I never tested it thoroughly but I received some confirmation it works even for very complex documents.
When you have an ITD, you have TTX.
When you have TTX, the door is wide open.

Cheers
GG

[Edited at 2009-11-07 21:48 GMT]



Grzegorz Gryc  Identity Verified
Local time: 07:10
French to Polish
+ ...
TOPIC STARTER
CSV case explanation Nov 7, 2009

Laurent KRAULAND wrote:

As for the .csv files, I am now warned and will not make any attempt in that direction.
For the explanation, see my posts in the thread:
http://www.proz.com/forum/déjà_vu_support/117860-exporting_multiterm_7_termbases_to_dvx.html

In Multiterm 2009 SP1, it's still the same.

BTW.
The tab delimited file definition according to Multiterm guys is just beautiful.
You put a tab somewhere, you have a correct tab delimited file.
Hereby, I post my "a" delimited file definition.
You put an "a" char anywhere, you have a correct "a" delimited file.
Both definitions are true/absurd at (almost) the same level.

Use the default CSV export definition and export the "Sample" termbase provided with Multiterm.
Before you take a look at the results, finish your coffee/drink etc., or you may harm your screen or keyboard.

^-^

[Edited at 2009-11-07 21:49 GMT]



Jonathan Hopkins  Identity Verified
Germany
Local time: 07:10
German to English
+ ...
Did you ever find a way to import TMX in MT Convert? Nov 25, 2011

Hi Grzegorz,

Grzegorz Gryc wrote:
I have a set of TBX files exported from Idiom Worldserver 9 (an SDL brand now); MT Convert stops because it can't find the DTD file (TBXcdv04.dtd).
It's rather easy to find this standard file (e.g. it's provided with Swordfish) and continue but I don't understand why the Multiterm team is unable to include the basic TBX related files in their distro while Rodolfo does it easily in Swordfish.



Did you ever find a way to import your TBX file using MultiTerm Convert, or did you just use Swordfish to do the trick? I just ran into the same problem. I'll likely have the agency just export Star Transit's TermStar dictionary as TMX, import that into memoQ, which can then export an SDL compatible xml file...quite the work around, which I won't be able to perform until tomorrow (provided I get the tmx file). So if you happen to know how to use MultiTerm Convert to import a TBX, I'd appreciate any tips.

Thanks,
Jonathan



Grzegorz Gryc  Identity Verified
Local time: 07:10
French to Polish
+ ...
TOPIC STARTER
Varia... CSV workaround... Nov 25, 2011

Jonathan Hopkins wrote:

Grzegorz Gryc wrote:
I have a set of TBX files exported from Idiom Worldserver 9 (an SDL brand now); MT Convert stops because it can't find the DTD file (TBXcdv04.dtd).
It's rather easy to find this standard file (e.g. it's provided with Swordfish) and continue but I don't understand why the Multiterm team is unable to include the basic TBX related files in their distro while Rodolfo does it easily in Swordfish.

Did you ever find a way to import your TBX file using MultiTerm Convert, or did you just use Swordfish to do the trick?

I simply provided the correct dtd file.
AFAIR I copied it in the directory where the TBX files were stored or in the Multiterm Convert directory.
BTW, this file is provided in MT 2009 SP4 and in MT 2011, so theoretically it should work out of the box (I didn't test it).
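
The workaround above (putting the DTD next to the TBX files) can be scripted when there are many folders to prepare. This is only a minimal sketch: the function name and the paths are hypothetical, and it assumes the converter looks for the DTD in the same directory as the TBX file, which is what worked in the case described here.

```python
import shutil
from pathlib import Path


def place_dtd_next_to_tbx(dtd_path, tbx_dir):
    """Copy the DTD into tbx_dir if the folder holds .tbx files and lacks the DTD.

    Returns the path where the DTD is expected to sit after the call.
    """
    dtd_path = Path(dtd_path)
    tbx_dir = Path(tbx_dir)
    target = tbx_dir / dtd_path.name

    # Only touch folders that actually contain TBX files.
    has_tbx = any(p.suffix.lower() == ".tbx" for p in tbx_dir.iterdir())
    if has_tbx and not target.exists():
        shutil.copy2(dtd_path, target)
    return target
```

Usage would be something like `place_dtd_next_to_tbx("TBXcdv04.dtd", "exports")`, where `exports` is a placeholder for the folder holding the TBX files.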

I just ran into the same problem. I'll likely have the agency just export Star Transit's TermStar dictionary as TMX, import that into memoQ, which can then export an SDL compatible xml file... quite the work around, which I won't be able to perform until tomorrow (provided I get the tmx file).

If necessary:
- for the TBX files, you can also use Across and export 'em as CSV (tab separated),
- for the TMX-CSV conversion, you have a lot of possibilities, e.g. XBench, Olifant etc.
CSV is handled by MT Convert (Spreadsheet or database exchange format).

So if you happen to know how to use MultiTerm Convert to import a TBX, I'd appreciate any tips.

If you have the most recent Multiterm version, it should work.
If not, it's a bug.

Cheers
GG

[Edited at 2011-11-25 08:07 GMT]



Jonathan Hopkins  Identity Verified
Germany
Local time: 07:10
German to English
+ ...
Looks like it was a bug Nov 25, 2011

Thanks for that very quick response,

Grzegorz Gryc wrote:
I simply provided the correct dtd file.
AFAIR I copied it in the directory where the TBX files were stored or in the Multiterm Convert directory.
BTW, this file is provided in MT 2009 SP4 and in MT 2011, so theoretically it should work out of the box (I didn't test it).


I guess it is some kind of a bug. I have the latest version of MT 2011. I found the dtd file in the program files saved along with the MultiTerm Convert program files, copied it into the same directory where the TBX file is located, and that seemed to work. Only now I get a new error:



Do you know if this means that the TBX file is corrupt, or is it yet another bug in MT Convert?

Thanks again for the quick reply
Jonathan



Jonathan Hopkins  Identity Verified
Germany
Local time: 07:10
German to English
+ ...
Strange characters in TBX file Nov 25, 2011

Transit NXT may actually be the culprit this time. I'm finding all kinds of strange markings in the TBX file, which are causing the conversion to fail.



Hopefully there won't be all too many of them.

Cheers,
Jonathan



Grzegorz Gryc  Identity Verified
Local time: 07:10
French to Polish
+ ...
TOPIC STARTER
Corrupt TBX Nov 25, 2011

Jonathan Hopkins wrote:

Thanks for that very quick response,

I guess it is some kind of a bug. I have the latest version of MT 2011. I found the dtd file in the program files saved along with the MultiTerm Convert program files, copied it into the same directory where the TBX file is located, and that seemed to work. Only now I get a new error:



Do you know if this means that the TBX file is corrupt, or is it yet another bug in MT Convert?


Yep, the TBX is corrupt.
I.e. it contains an invalid character you should fix (e.g. delete or replace).
If it's an isolated case, a decent text editor is enough (you can use, let's say, Notepad++).
If you spot multiple errors, see e.g. XML Marker.

Multiterm has never handled corrupt XML files correctly, i.e. it's unable to skip a unit, so the entire conversion fails.
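
To find where the file breaks before opening it in an editor, a generic XML parser can report the position of the first well-formedness error. This is a rough sketch using Python's standard `xml.parsers.expat` module; the function name is made up for illustration, and it only reports the first error, so it has to be re-run after each fix.

```python
import xml.parsers.expat


def first_xml_error(data: bytes):
    """Return (line, column, message) for the first well-formedness error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None
```

Reading the TBX file in binary mode and passing its contents to `first_xml_error` gives a line number to jump to in Notepad++ or a similar editor.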

Catspeed
GG



Grzegorz Gryc  Identity Verified
Local time: 07:10
French to Polish
+ ...
TOPIC STARTER
Transit bug... XBench... Nov 25, 2011

Jonathan Hopkins wrote:

Transit NXT may actually be the culprit this time. I'm finding all kinds of strange markings in the TBX file, which are causing the conversion to fail.

Some tools are simply unable to produce valid XML (TMX/TBX) files.
It's probably a well known Transit bug (I remember vaguely a friend of mine complained a lot about it).



Hopefully there won't be all too many of them.

You can try some global search and replace operations.
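
If the strange markings turn out to be control characters that XML 1.0 does not allow, one such global replace could look like the sketch below. It is an assumption that this is the kind of corruption involved (the actual characters are not shown in this thread); the pattern follows the "Char" definition of XML 1.0, and it will not repair other problems such as unescaped `&` or `<`.

```python
import re

# Matches any character outside the XML 1.0 "Char" ranges:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text, replacement=""):
    """Delete (or replace) every character that XML 1.0 forbids."""
    return _INVALID_XML_CHARS.sub(replacement, text)
```

Work on a copy of the file, and decode it with the encoding declared in its XML header before applying the function.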

If your file is strictly bilingual, try XBench instead.
It should work well for your language pairs (XBench corrupts chars if the source and target code pages don't match).

Cheers
GG

[Edited at 2011-11-25 08:45 GMT]



Jonathan Hopkins  Identity Verified
Germany
Local time: 07:10
German to English
+ ...
One bug resolved, another rears its ugly head Nov 25, 2011

Ok, so I cleaned up the TBX file with Notepad++ (a great tool which I discovered not that long ago), and went ahead with the process. However, at step 7 of 10, MultiTerm would appear to represent the TB levels on their head:



Compare that to a normal MultiTerm termbank structure:



And what is worse, MultiTerm doesn't seem to let me add any fields into any other level but the Entry level. Whenever I click on any of the respective levels on the left, and then select the field that I want to add, both monitor screens flicker and I can't add the field.

*sigh*

So looking forward to just using memoQ on the weekend to do the job with a few clicks. If only the agency would hurry up and give me the tmx files.

Cheers,
Jonathan



Grzegorz Gryc  Identity Verified
Local time: 07:10
French to Polish
+ ...
TOPIC STARTER
Primitive Multiterm limitations... CSV workaround... Nov 25, 2011

Jonathan Hopkins wrote:

Ok, so I cleaned up the TBX file with Notepad++ (a great tool which I discovered not that long ago), and went ahead with the process. However, at step 7 of 10, MultiTerm would appear to represent the TB levels on their head:

(...)
Compare that to a normal MultiTerm termbank structure:
(...)

Indeed, I remember the import was not perfect but I cared only about the terms, not attributes (my structure was really simple, a GUI glossary).
It's a general problem with Multiterm, it expects exactly the same structure (names etc.) for the attributes.
It's simply stupid.
Multiterm should permit field mapping, like any decent tool.

And what is worse, MultiTerm doesn't seem to let me add any fields into any other level but the Entry level. Whenever I click on any of the respective levels on the left, and then select the field that I want to add, both monitor screens flicker and I can't add the field.

*sigh*

So looking forward to just using memoQ on the weekend to do the job with a few clicks. If only the agency would hurry up and give me the tmx files.


Try other workarounds based on TBX-CSV conversion.
The CSV mapping in Multiterm is easier.
You should rename the column headers in the CSV file, they should match the Multiterm database structure.
You can also save CSV as Excel, it's probably easier to edit complex data.
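
The header renaming can also be done with a small script instead of by hand. A minimal sketch follows, assuming a tab-separated file; the function name and the column names in the usage example are hypothetical, and the real target names must be taken from your own termbase definition.

```python
import csv
import io


def rename_csv_headers(csv_text, mapping, delimiter="\t"):
    """Rename columns in the first row according to mapping; data rows are untouched."""
    rows = list(csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter))
    if not rows:
        return csv_text

    # Columns missing from the mapping keep their original name.
    rows[0] = [mapping.get(name, name) for name in rows[0]]

    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()
```

For example, `rename_csv_headers(text, {"en": "English", "pl": "Polish"})` turns a header row of `en` and `pl` into `English` and `Polish`.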

Cheers
GG



FarkasAndras
Local time: 07:10
English to Hungarian
+ ...
Oh my... Nov 25, 2011

Good Lord, that's a veritable mess.
Can't the translation industry produce and implement standards that work? TMX isn't exactly working flawlessly, and TBX looks like it's a lot worse.
Anyone know for a fact who is at fault here? Is it SDL or is the standard broken?

[Edited at 2011-11-25 10:06 GMT]

