Formats for term database exchange
Thread poster: langu2
Oct 30, 2013

Which file formats are usually used by translators for the exchange of term databases?


Heartsome Support
Local time: 03:39
txt-based Oct 31, 2013

File formats such as TBX, Excel and TXT are commonly used for this. It depends on which file formats your CAT tool supports for import and export.

langu2
TOPIC STARTER
Terminology exchange formats Oct 31, 2013

Thanks for your reply. I am aware there are dozens of possible exchange formats, but which are mostly used by translators working with a professional terminology tool (not Excel)?


Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 20:39
Member (2009)
Dutch to English
CafeTran Oct 31, 2013

Depends on what you mean by 'professional terminology tool'. For me, CafeTran (my CAT tool) is a 'professional terminology tool', and it stores its terminology databases as tab-delimited UTF-8 text files, which, if you ask me, is the best format to store, edit, maintain and share term data.

Michael



Selcuk Akyuz  Identity Verified
Turkey
Local time: 22:39
Member (2006)
English to Turkish
xml Oct 31, 2013

TBX was created for this purpose but no, it is not used by translators.

Plain text files, comma- or tab-separated text files, and Excel/Word files are what translators use. Agencies or end clients sometimes send XML files (and all the other relevant files) if MultiTerm is used.

MultiTerm is not the ideal tool for simple bilingual (source = target) term lists. But it is the best terminology management tool for storing all the metadata, and XML is the only way to share these 'real' databases.

CafeTran? Michael, it was a good CAT tool for any translator but now I am lost in it. And it seems to me that the developer adds new features only for you.

Selcuk


FarkasAndras
Local time: 21:39
English to Hungarian
professional Oct 31, 2013

langu2 wrote:

Thanks for your reply. I am aware there are dozens of possible exchange formats, but which are mostly used by translators working with a professional terminology tool (not Excel)?

The term "professional terminology tool" makes me laugh. There is no such thing. MultiTerm is the most widely used terminology tool and 'professional' isn't the adjective that comes to mind.
Anyway, the TB exchange situation is a complete mess. There is no widespread standard shared by the overwhelming majority of tools. A lot of them can import XLS, so that's a reasonably good option. TBX was designed to be a universal standard (like TMX for TMs), but it never really took off. Tab-separated TXT is a decent option too. MultiTerm has an XML export format, but only MultiTerm and a handful of other tools can read it.

In short, decide what you want to use those TBs for and who you want to give them to, and choose a format based on that. The default generic option is, like it or not, xls.
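For anyone curious what TBX actually looks like compared with a flat term list, here is a minimal sketch that turns simple bilingual (source, target) pairs into a TBX-style document using Python's standard `xml.etree.ElementTree`. It follows the general `martif` / `termEntry` / `langSet` / `tig` / `term` nesting of TBX v2, but it is an illustration only: the function name `to_tbx`, the default language codes and the header text are assumptions, and the output is not validated against the official TBX schema.

```python
import xml.etree.ElementTree as ET

# ElementTree maps this namespace to the reserved "xml" prefix, giving xml:lang="..."
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tbx(pairs, src_lang="nl-NL", tgt_lang="en-GB"):
    """Build a minimal TBX-style XML string from (source, target) term pairs.

    Hypothetical helper for illustration; not schema-validated TBX.
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: src_lang})
    header = ET.SubElement(martif, "martifHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Converted from a bilingual term list"

    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for i, (src, tgt) in enumerate(pairs, start=1):
        # One concept per termEntry, with one langSet per language
        entry = ET.SubElement(body, "termEntry", {"id": f"t{i}"})
        for lang, term in ((src_lang, src), (tgt_lang, tgt)):
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")


print(to_tbx([("huis", "house"), ("boom", "tree")]))
```

Even this stripped-down version shows why many translators stick with XLS or tab-separated text: every term pair becomes several nested elements, and the receiving tool has to understand that structure to import it.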


langu2
TOPIC STARTER
Tab delimited-text files Oct 31, 2013

It is interesting to learn that tab-delimited text files are so popular.


Is the information in each row (tab-delimited files) in a specific order or does this vary from file to file?



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 20:39
Member (2009)
Dutch to English
MultiTerm is a hideous monster of a tool Oct 31, 2013

Hi Selcuk,

CafeTran was a good tool, and it is getting better every day.

Igor has added all kinds of great new features that benefit everyone. Not just me. It's a shame that you think only I benefit from them. If you had a careful look at the latest release, I think you would agree that there are all kinds of new things that anyone can benefit from, such as:

– source and target-side synonyms
– export CT segment notes straight to Word documents
– new Quick Term Editor
– term prioritising based on term fields (Subject/Client)
– selectable metadata lists in glossaries (in TXT file stored on your computer)
– regular expressions in source terms in TXT Glossaries
– export project as bilingual document for review purposes (like memoQ's RTFs + DVX's word tables)

(taken from: http://cafetran.wikidot.com/pre-release-version )

I also (strongly) disagree: MultiTerm is not the best tool for real termbases with metadata. It's clunky and very difficult to use. How many people do you know who own, use and love MultiTerm?

I have 8 different fields in my tab-delimited UTF-8 text file glossaries in CT, and can easily edit/maintain them in a good CSV editor, such as Ron's Editor.

Here is what my glossary header looks like:

#nl-NL #en-GB #Context #Subject #Client #Note #Definition #Usage example #Source #URL
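To show how little is needed to read a glossary like this outside the CAT tool, here is a minimal Python sketch. It assumes the first line holds the `#`-prefixed field names separated by tabs (as in the header above), that the file is UTF-8, and that quote characters in terms should be kept literally rather than treated as CSV quoting. The function name `read_glossary` is hypothetical.

```python
import csv
import io


def read_glossary(stream):
    """Read a tab-delimited glossary whose first row holds '#'-prefixed field names.

    Returns one dict per entry, keyed by field name without the leading '#'.
    """
    # QUOTE_NONE: keep quote characters in terms as literal text
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.lstrip("#") for name in next(reader)]
    entries = []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        # Pad short rows so every field is present, even if empty
        row = row + [""] * (len(header) - len(row))
        entries.append(dict(zip(header, row)))
    return entries


sample = "#nl-NL\t#en-GB\t#Subject\t#Usage example\nhuis\thouse\tconstruction\n"
print(read_glossary(io.StringIO(sample)))
```

For a real file, something like `open(path, encoding="utf-8-sig", newline="")` can be passed in instead of the `StringIO` sample; the `utf-8-sig` codec also copes with files that start with a byte-order mark.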

XML is definitely not the only way to share these 'real' databases. Tab-delimited UTF-8 text files are the most transparent and interoperable format that exists.

Michael

[Edited at 2013-11-01 09:12 GMT]



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 20:39
Member (2009)
Dutch to English
@langu2: Oct 31, 2013

langu2 wrote:

It is interesting to learn that tab-delimited text files are so popular.

Is the information in each row (tab-delimited files) in a specific order or does this vary from file to file?


It can be in any order you like, and there can be as many rows as you need. You can also open them in any UTF-8-aware text editor or CSV editor. Also, if you use a good CSV editor, you can filter on headers in order to work with your data. Kind of like what you can do in Excel, but without worrying about all of the problems Excel has with character corruption.
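Because each column is identified by its header, a script can find fields by name, so the column order in a given file does not matter. The minimal sketch below filters entries on one field, much like filtering on a header in a CSV editor. The function name `filter_by_field`, the field names and the client name "Acme" are made-up examples, and the same assumptions apply: `#`-prefixed tab-separated headers, UTF-8 text, no CSV quoting.

```python
import csv
import io


def filter_by_field(stream, field, value):
    """Return entries whose `field` column equals `value`, looked up by header name."""
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Drop the leading '#' so callers can ask for "Client" instead of "#Client"
    reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
    # Missing trailing cells come back as None, so normalise them to ""
    return [row for row in reader if (row.get(field) or "").strip() == value]


# Two files with the same data but different column orders
file_a = "#nl-NL\t#en-GB\t#Client\nhuis\thouse\tAcme\nboom\ttree\tOther\n"
file_b = "#Client\t#en-GB\t#nl-NL\nAcme\thouse\thuis\nOther\ttree\tboom\n"

print(filter_by_field(io.StringIO(file_a), "Client", "Acme"))
print(filter_by_field(io.StringIO(file_b), "Client", "Acme"))
```

Both calls find the same entry, which is the practical upshot of the answer above: whatever order the columns are in, the header row tells the tool (or script) what each one means.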

For example, my glossaries consist of these fields:

#nl-NL: Dutch
#en-GB: English
#Context: Contextual Priority: http://cafetran.wikidot.com/using-context-aware-auto-assembling
#Subject: can be used for auto-assembly
#Client: can be used for auto-assembly
#Note:
#Definition
#Usage example
#Source: where I found the term / author of the term
#URL: clickable in CafeTran


Michael



Meta Arkadia
Local time: 02:39
English to Indonesian
Features kill Oct 31, 2013

Selcuk Akyuz wrote:
CafeTran? Michael, it was a good CAT tool for any translator but now I am lost in it. And it seems to me that the developer adds new features only for you.

I agree with Selçuk, and not for the first time. Without an underlying philosophy, heaps of features will kill any CAT tool - any product actually - including CafeTran. And this comes from a CafeTran fanboy.

langu2 wrote: I am aware there are dozens of possible exchange formats, but which are mostly used by translators working with a professional terminology tool (not Excel)?


Impossible to answer, methinks. TMX is supposed to be the industry standard, CSV (often as an Excel file, sorry for that, langu2) is probably the most used format, and I think using a DBMS would be the most professional approach. Those answers don't agree with your criteria.

Cheers,

Hans



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 20:39
Member (2009)
Dutch to English
TMX -> TBX Nov 1, 2013

Meta Arkadia wrote:

langu2 wrote: I am aware there are dozens of possible exchange formats, but which are mostly used by translators working with a professional terminology tool (not Excel)?


Impossible to answer, methinks. TMX is supposed to be the industry standard, CSV (often as an Excel file, sorry for that, langu2) is probably the most used format, and I think using a DBMS would be the most professional approach. Those answers don't agree with your criteria.

Cheers,

Hans


I suppose you meant TBX, right? TMX was never designed for terminology.

Michael



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 20:39
Member (2009)
Dutch to English
stop complaining about features in general, and start complaining about particular features Nov 1, 2013

Hi Hans,

Meta Arkadia wrote:

Selcuk Akyuz wrote:
CafeTran? Michael, it was a good CAT tool for any translator but now I am lost in it. And it seems to me that the developer adds new features only for you.

I agree with Selçuk, and not for the first time. Without an underlying philosophy, heaps of features will kill any CAT tool - any product actually - including CafeTran. And this comes from a CafeTran fanboy.

Cheers,

Hans


It's all very well complaining, but perhaps you're not really being fair, as I know there are features among all the new ones that you actually do like and use. That is, stop complaining about features in general, and start complaining about particular features. How's that for a 'philosophy'?

And anyway, Igor does have a philosophy, and constantly saying that he doesn't is actually an insult to his intelligence. Just because it differs from your philosophy (which is what, by the way?) doesn't mean it isn’t one.

Michael


langu2
TOPIC STARTER
Excel files Nov 1, 2013

And which tools can handle tab-delimited files + Excel files well?


Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 20:39
Member (2009)
Dutch to English
@langu2: Nov 2, 2013

Before we continue, could you maybe explain exactly what it is you are trying to do? That is, are you looking for a terminology tool to actually use (and if so, for what), or are you doing some sort of a study, etc.? This might enable us to better answer your questions.

Also, do you mean a full-blown tool to create a dictionary, like the TLex Dictionary Production Software Suite (http://tshwanedje.com/ ) or Unilex (http://www.acolada.de/unilex.htm ), something to manage terminology, like tlTerm (http://tshwanedje.com/terminology/ ), or a CAT tool with a built-in terminology system, or something entirely different?

Michael

[Edited at 2013-11-02 19:14 GMT]

