Cross-compatible terminology database format
Thread poster: Matt Train
Matt Train
United Kingdom
Member (2007)
English
Jul 23, 2009

Hi Everyone,

Our agency wants to store and provide terminology databases in a format that can be imported into any CAT tool - so that people can work in the tool of their choice.

It is proving difficult to work out what the most universal format is - I thought it would be CSV, but one of our translation team suggested that TXT might be better. We are hoping that TXT or CSV will work, but understand that we may be wrong!

Does anyone know if there is a universal format that can be imported into the terminology component of every CAT tool?

Thanks!

Matt


Laurent KRAULAND  Identity Verified
France
French to German
How about TBX or open-source formats? Jul 23, 2009

Hello Matt,
you could have a look at this:
http://www.lisa.org/Term-Base-eXchange.32.0.html and at that: http://en.wikipedia.org/wiki/Tbx#TBX

HTH
Laurent K.

[Edited at 2009-07-23 11:50 GMT]


Matt Train
United Kingdom
Member (2007)
English
TOPIC STARTER
Does export to TBX exist in all CAT tools? Jul 23, 2009

Hi Laurent,

Thanks for your message, really helpful.

When I look in our tool here, MemoQ (v.3.5.22), I cannot see an option to export to TBX format; I can only export to CSV or Multiterm XML format. MemoQ also only appears to allow import of termbases in CSV or TMX format. Just wondering: which formats do the most common tools import/export?

Thanks for your help!

Matt


Laurent KRAULAND  Identity Verified
France
French to German
Most commonly imported/exported format Jul 23, 2009

Matt Train wrote:
just wondering what the most common tools will import/export?



Hi again, Matt, glad I could help in some way. AFAIK the most commonly imported/exported format is TMX (Translation Memory eXchange), but it is a translation memory format, not a terminology database format. I hope other colleagues can add their input to mine.

Laurent K.


FarkasAndras
English to Hungarian
CSV and TXT are basically the same thing Jul 23, 2009

Matt Train wrote:

It is proving difficult to work out what the most universal format is - I thought it would be CSV, but one of our translation team suggested that TXT might be better. We are hoping that TXT or CSV will work, but understand that we may be wrong!

Does anyone know if there is a universal format that can be imported into all terminology tools in all CAT tools?

Thanks!

Matt


A CSV is a comma separated txt file. Now, I have no idea how CSV could possibly work, as terminology data often contains commas of its own, which would screw it all up horribly. I'm sure there is a solution for that issue, but why bother when you can use tab separated? Comma separated and tab separated TXTs are almost the same thing, but tab separated is a bit more user friendly, I think. For starters, you can copy-paste between a tab separated txt and a spreadsheet with zero adjustment or trickery; they morph into each other by default.

So, the only really good solution that I can see is tab separated TXT and/or Excel tables. (TXT wins on compatibility, but XLS is more familiar and easier to manage for most users.)
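A minimal Python sketch of the tab separated approach, using only the standard `csv` module. The column names (`en`, `de`, `definition`, `note`) and the sample entries are hypothetical, chosen for illustration. It shows that a term containing a comma needs no quoting when the delimiter is a tab, and that extra fields are simply extra columns.

```python
import csv
import io

# Hypothetical glossary columns and entries (illustration only).
FIELDS = ["en", "de", "definition", "note"]
ENTRIES = [
    {"en": "bolt, hexagon head", "de": "Sechskantschraube",
     "definition": "bolt with a six-sided head", "note": ""},
    {"en": "bank", "de": "Ufer",
     "definition": "edge of a river", "note": "not the financial sense"},
]

def write_tab_glossary(rows, fieldnames):
    """Serialise rows as tab separated text with a header line."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def read_tab_glossary(text):
    """Parse tab separated text back into dicts keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text, newline=""), delimiter="\t"))

tsv = write_tab_glossary(ENTRIES, FIELDS)
print(tsv, end="")

# The comma inside "bolt, hexagon head" survives without any quoting.
assert read_tab_glossary(tsv) == ENTRIES
```

A tab or line break inside a field would still force the `csv` module to add quotes, but glossary entries normally need neither.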


Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
Any flat file is best Jul 23, 2009

Matt Train wrote:
It is proving difficult to work out what the most universal format is - I thought it would be CSV, but one of our translation team suggested that TXT might be better.


Some CAT tools simply don't have the facility to import a simple format. There is no format that every tool can import. But your best bet is probably something tab delimited. If the translation team member meant "Trados TXT", then he's got it wrong -- the only tool that can read Trados TXT is, well, Trados. But if he meant a tab delimited file with a TXT file extension, then it's spot on.

TBX was designed (by some guy and his mates, over a cup of coffee perhaps) as a universal format, but so far very few tools can read and/or write it.
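For comparison, a rough sketch of what a TBX file looks like, generated with Python's `xml.etree.ElementTree`. The element names (`martif`, `martifHeader`, `termEntry`, `langSet`, `tig`, `term`) follow the core TBX structure, but this is a simplified assumption: the output has not been validated against the TBX schema, and a given tool may expect more header information or data categories. The term pairs and the header text are made up.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this namespace with the standard "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tbx(pairs, src="en", tgt="fr"):
    """Build a minimal TBX-style document from (source, target) term pairs.

    Simplified sketch only; not validated against the TBX schema.
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: src})
    file_desc = ET.SubElement(ET.SubElement(martif, "martifHeader"), "fileDesc")
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Hypothetical sample glossary"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for i, (src_term, tgt_term) in enumerate(pairs, start=1):
        # One concept per termEntry, one langSet per language.
        entry = ET.SubElement(body, "termEntry", {"id": f"c{i}"})
        for lang, term in ((src, src_term), (tgt, tgt_term)):
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")

tbx = build_tbx([("terminology database", "base terminologique"),
                 ("translation memory", "mémoire de traduction")])
print(tbx)
```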

TMX may work, but the problem with TMX is that there are only two fields, whereas with a tab delimited TXT file or a CSV file you can have as many fields as you can dream of.
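To illustrate the two-field limitation, a rough Python sketch that turns multi-column glossary rows into TMX 1.4 translation units with `xml.etree.ElementTree`. Only the two language columns end up in `<seg>` elements; every other column is dropped. (TMX does define optional `<note>` and `<prop>` elements, which this sketch deliberately ignores.) The row data, column names and header values such as `creationtool` are placeholders.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical multi-column glossary row, e.g. read from a tab delimited file.
ROWS = [
    {"en": "terminology database", "fr": "base terminologique",
     "definition": "collection of approved terms", "status": "approved"},
]

def rows_to_tmx(rows, src="en", tgt="fr"):
    """Convert glossary rows to TMX, keeping only the src/tgt columns."""
    root = ET.Element("tmx", {"version": "1.4"})
    # Placeholder values for the header attributes TMX 1.4 requires.
    ET.SubElement(root, "header", {
        "creationtool": "glossary-sketch", "creationtoolversion": "0.1",
        "segtype": "phrase", "o-tmf": "tab-delimited", "adminlang": "en",
        "srclang": src, "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for row in rows:
        tu = ET.SubElement(body, "tu")
        for lang in (src, tgt):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = row[lang]
        # "definition", "status" and any other columns are not carried over.
    return ET.tostring(root, encoding="unicode")

tmx = rows_to_tmx(ROWS)
print(tmx)
```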

CSV is not a good choice because different tools generate different dialects of CSV that are not all mutually intelligible.
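A small Python illustration of the dialect problem, using the standard `csv` module: the same hypothetical term pair written with a comma delimiter, with a semicolon delimiter (as spreadsheet programs in some locales produce) and with every field quoted. A reader that assumes the comma dialect silently splits the semicolon version in the wrong place.

```python
import csv
import io

ROW = ["bolt, hexagon head", "Sechskantschraube"]  # hypothetical entry

def to_csv(fields, **fmt):
    """Write one record using the given csv format parameters."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n", **fmt).writerow(fields)
    return buf.getvalue()

comma_style = to_csv(ROW)                        # quotes only where needed
semicolon_style = to_csv(ROW, delimiter=";")     # comma is no longer special
all_quoted = to_csv(ROW, quoting=csv.QUOTE_ALL)  # quotes on every field

for text in (comma_style, semicolon_style, all_quoted):
    print(text, end="")

# A reader that assumes the default comma dialect misreads the semicolon file:
misread = next(csv.reader(io.StringIO(semicolon_style, newline="")))
print(misread)  # → ['bolt', ' hexagon head;Sechskantschraube']
```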

[Edited at 2009-07-23 12:20 GMT]


Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
How to handle commas in CSV Jul 23, 2009

FarkasAndras wrote:
A CSV is a comma separated txt file. Now, I have no idea how CSV could possibly work as terminology data often contains commas of its own, which would screw it all up horribly.


Ah, but CSV is not simply a comma separated file: it is a comma separated file with some extras to make it comma compatible. If a field contains a comma, simply put quotes on either side of the field. If a field contains a quote, double it and put quotes around the field as well. With CSV, your fields can also contain tabs and even line breaks (a field containing a line break must be quoted too). However, some CSV programs do not accept a CSV file with superfluous quotes, and others generate quotes whether they are strictly necessary or not, so you have a recipe for disaster.
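The quoting rules above, sketched with Python's standard `csv` module. Its default dialect uses minimal quoting: a field is quoted only if it contains the delimiter, the quote character or a line break, and embedded quotes are doubled. The sample entry is hypothetical.

```python
import csv
import io

# Hypothetical entry with a comma, double quotes, a tab and a line break.
ROW = ["bolt, hexagon head", 'so-called "fixing"', "col A\tcol B", "line one\nline two"]

buf = io.StringIO(newline="")
csv.writer(buf).writerow(ROW)  # default "excel" dialect, minimal quoting
encoded = buf.getvalue()
print(repr(encoded))

# Reading it back restores the original fields exactly.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))
assert decoded == ROW
```

Note that the tab needs no quoting at all in this dialect. Passing `quoting=csv.QUOTE_ALL` would also wrap `col A\tcol B` in quotes, which is the kind of "superfluous" quoting some importers may reject.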

That said, I don't think CSV is simple enough for people to edit by hand. Tab delimited is simpler and more human-editable.

FarkasAndras wrote:
For starters, you can copy-paste between a tab separated txt and a spreadsheet with zero adjustment or trickery...


Agreed. You can do slightly more with Microsoft Office than with OpenOffice.org, but basically a tab delimited file shows the most promise.


[Edited at 2009-07-23 12:29 GMT]


Matt Train
United Kingdom
Member (2007)
English
TOPIC STARTER
Thanks! Jul 23, 2009

Thanks, Samuel and Andras.

Interesting that a simple standard format does not exist in practice (although hopefully TBX will solve that in future!).

Now that we know there is no one-size-fits-all solution, we can act accordingly.

Thanks for your input!

