Definitive list of domains
Thread poster: Reed James
Reed James
Chile
Local time: 19:57
Member (2005)
Spanish to English
Apr 4, 2007

Hello again. I am an avid terminology database compiler. I do this because it is advantageous to have ready-made lists for my CAT tools, and it is a good way of organizing terminology all in one place.

One of my biggest dilemmas in building these extensive lists of terms is deciding how to categorize them (e.g. what should be a business term and what should be a legal term) and how many categories and subcategories I should have (should electrical engineering be its own domain or just a subdomain of engineering?).

What I would like to know is whether there is an official list of domains that I can follow. Any input on this matter will be greatly appreciated. Thanks.

Reed


 
mediamatrix (X)
Local time: 19:57
Spanish to English
UNESCO Thesaurus Apr 4, 2007

There is not - and of course never could be - a complete classification of all fields of human knowledge. But that hasn't stopped some people from trying ...

One of the least-used books in my paper library is:

UNESCO Thesaurus - A structured list of descriptors for indexing and retrieving literature in the fields of education, science, social and human science, culture, communication and information.

ISBN: 92-3-0031009-3

It is tri-lingual, en/es/fr.

My edition is dated 1995. It might be worth visiting the UNESCO website to see if they've put it online in the meantime...

MediaMatrix


 
Henrik Pipoyan
Local time: 03:57
Member (2004)
English to Armenian
Dictionary abbreviations lists Apr 5, 2007

Most dictionaries are built that way. They specify the domains next to the definitions of the words, and since the domain names are repeated, they are usually abbreviated. You can find the list of abbreviations at the beginning or at the end of the dictionary. These are mostly technical fields, but I think it can serve as a good start.

 
helentrans (X)
English to Chinese
Refer to the categories from websites for translators Apr 5, 2007

I think the categories you saw when registering on the major translation websites may give you some clues.

 
Marie-Céline GEORG
France
Local time: 01:57
German to French
DEWEY or UDC, maybe? Apr 5, 2007

Hi,
Are you looking for something like the Dewey classification or the Universal Decimal Classification used in libraries (e.g. http://en.wikipedia.org/wiki/Universal_Decimal_Classification, the article also gives links to other systems)?

Personally I'm using a customized list of domains/clients derived from the UDC used in the DejaVu X CAT Tool, but the idea of using the dictionary domain abbreviations is good, especially if you need precise subcategories in a particular field.

HTH
Marie-Céline


 
Reed James
Chile
Local time: 19:57
Member (2005)
Spanish to English
TOPIC STARTER
Dewey sounds the best to me Apr 6, 2007

Marie-Céline GEORG wrote:

Hi,
Are you looking for something like the Dewey classification or the Universal Decimal Classification used in libraries (e.g. http://en.wikipedia.org/wiki/Universal_Decimal_Classification, the article also gives links to other systems)?


I am a little disappointed to learn that there is no official or standard list for all translators. To me, terminology is such an important aspect of what I do.

Nevertheless, I like the Dewey Decimal System because I grew up with it on my forays to the library, it has been around for a long time and it is precise.

Does anyone have any preference for assembling one termbase per subdomain or larger ones made up of several subdomains? Thanks.

Reed


 
mediamatrix (X)
Local time: 19:57
Spanish to English
Cart before the horse? Apr 7, 2007

Reed D. James wrote:

Nevertheless, I like the Dewey Decimal System because I grew up with it on my forays to the library, it has been around for a long time and it is precise.

Does anyone have any preference for assembling one termbase per subdomain or larger ones made up of several subdomains? Thanks.


Some years ago I found myself involved in a project requiring the classification of information in the entire field of radio and television broadcasting. That's how I ended up with a copy of the UNESCO Thesaurus mentioned earlier ...

This apparently simple task proved to be impossible - by which I mean we couldn't satisfy anyone even part of the time!

Part of our problem was that the system had to be used by people for whom 'broadcasting' is a social science concerned with certain aspects of the art of human communication, whilst for others it is a 'mere' technology. The associative structures in each model are different - what seems to be a natural flow of subject dependencies for one group is confusing, or downright stupid, to the other. Certain colleagues went so far as to argue that it was unnecessary to provide for a distinction between 'radio' and 'television' ...

Anyway, we ended up with a system containing a vast amount of information, but if you wanted to extract something it was helpful (often essential) to know who had classified the item you were looking for, since the 3-level classification hierarchy could not be 'laid flat' when searching. Consequently, if you were searching, say, for the 'French law on TV advertising', you could never be sure whether you would find it under 'TV - France - Advertising law' or 'Law - France - TV advertising'. Thus any search could only be considered complete after the user had diligently searched all possible combinations of likely keywords; and even if the user found an apparent 'hit', (s)he should always complete the search sequence in case there was another document available elsewhere - perhaps a newer edition of the first hit. There were instances, too, where a specific item of EU legislation, for example, was classified one way for the English text and in a quite different way for the French version, owing to inadequate coordination between classifiers.
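
To make the 'laid flat' point concrete, here is a minimal Python sketch, not the system we actually built. The catalogue entries, titles and function names are hypothetical, chosen only to mirror the example above. It contrasts a lookup that needs the exact ordered path with one that treats the path as an unordered bag of keywords.

```python
# Hypothetical catalogue: each item is filed under an ordered three-level path.
CATALOGUE = [
    {"title": "French law on TV advertising (1st ed.)",
     "path": ("TV", "France", "Advertising law")},
    {"title": "French law on TV advertising (2nd ed.)",
     "path": ("Law", "France", "TV advertising")},
    {"title": "German radio licensing rules",
     "path": ("Radio", "Germany", "Licensing")},
]


def keywords(path):
    """Flatten an ordered path into a set of lower-case keywords."""
    words = set()
    for level in path:
        words.update(level.lower().split())
    return words


def search_by_path(catalogue, path):
    """Hierarchical lookup: only an exact, ordered path matches."""
    return [item["title"] for item in catalogue if item["path"] == path]


def search_flat(catalogue, query):
    """Order-independent lookup: every query word must occur somewhere in the path."""
    wanted = set(query.lower().split())
    return [item["title"] for item in catalogue
            if wanted <= keywords(item["path"])]
```

With this toy data, `search_by_path(CATALOGUE, ("TV", "France", "Advertising law"))` finds only the first edition, whereas `search_flat(CATALOGUE, "france tv advertising law")` finds both editions, whichever way round the classifier happened to file them.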

Of course what actually happened was that users used the full-text search on document content, or on the document summary content, to find what they needed, and the classification system ended up as a big white elephant.

That system was concerned mainly with documentation of all kinds (anything from historical documents to EU legislation to videotapes); each item had to be examined before classification and the process triggered perhaps 100 classification decisions per person per working day. The labour overhead relating to the actual decision-making process - i.e. deciding which is the best classification for this document and picking that choice from the hierarchy - was relatively small. In a terminology database the overhead is likely to represent a far greater proportion of the overall workload...

A further problem was that in many respects - mainly in the technology area, but in other areas too - the documentation was ground-breaking stuff for which there was no relevant classification. So we spent much time re-jigging the classification to make things fit. That was not only frustrating and time-consuming for the system developers, but also very confusing for system users because things ended up being shifted from one classification to another.

Now let's imagine what effort would be necessary in any attempt to expand that (apparently simple) example to cover 'all human knowledge'. Indeed, it is not without reason that there have been very few attempts at such an undertaking - and most of them have been mentioned already in this thread.

So, long before trying to decide how to structure the database (one termbase per subdomain or larger ones made up of several subdomains, etc.) I would suggest that you validate your choice of the DDC system. My personal 'feeling' is that it is not a good choice, unless you are happy to have a somewhat superficial classification (not to say 'artificial' if your termbases include any 21st century domains such as technology, medicine, etc.).

Before selecting any one of those systems as the basis for your terminology database, it would be very instructive to do a paper exercise using, for example, the DDC, UDC, a dictionary or encyclopedia classification, and the UNESCO Thesaurus to classify the first 100 terms that are to go into the chosen system.

Factors to be considered during the exercise would include:

- does this classification cater for this term?
- how long does it take to find the best classification?
- will the chosen classification satisfy all experts having that term in their every-day vocabulary?
- is the chosen classification the one that a non-specialist user would intuitively use when searching for that term?
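
If you would rather keep the tally in a script than on paper, the first two factors (coverage and lookup time) could be recorded along these lines. This is only a sketch under my own assumptions: the record layout, the function name and every figure in the sample data are invented for illustration and say nothing about how the DDC or UDC actually perform.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records: (scheme, term, category_found, seconds_taken).
# category_found is None when no suitable class was found for the term.
# All values below are invented placeholders, not real measurements.
TRIALS = [
    ("DDC", "term-1", "class-A", 40),
    ("DDC", "term-2", None, 95),
    ("UDC", "term-1", "class-B", 35),
    ("UDC", "term-2", "class-C", 50),
]


def summarize(trials):
    """Return {scheme: (coverage, mean_seconds)} for the recorded trials."""
    by_scheme = defaultdict(list)
    for scheme, _term, category, seconds in trials:
        by_scheme[scheme].append((category is not None, seconds))
    return {
        scheme: (
            sum(found for found, _ in rows) / len(rows),  # share of terms catered for
            mean(seconds for _, seconds in rows),         # average time per decision
        )
        for scheme, rows in by_scheme.items()
    }
```

The last two factors (expert acceptance and what a non-specialist would intuitively pick) are judgment calls and are deliberately left out of the sketch.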

Then, having selected a system - DDC, UDC or whatever - you will need to consider two key user interactions:
- classification of terms by the terminologist;
- retrieval of terms by the end user.

Of those two, the end user is by far the most important. If the end user cannot find what (s)he is looking for, or if there's a chance that (s)he will find only part of the relevant data, then the system itself will need to be classified under DDC code 165.

MediaMatrix

PS A year or so ago there was quite a lot of useful discussion on this subject in the (now defunct) www.wikiwords.org. You might find it worthwhile looking at the relevant forum postings there (search the forums for 'taxonomy').


 
Reed James
Chile
Local time: 19:57
Member (2005)
Spanish to English
TOPIC STARTER
Will give Wikiwords a try... Apr 9, 2007

Mediamatrix,

Hi. This terminology classification is for my own personal use. I am not overly worried about misplacing terms or the complexities of taxonomy as I am the only one using the termbase and I am able to search for the right term no matter what category it is placed in.

I had forgotten about Wikiwords. I will look into the discussion group. Thanks.

Reed


 
Viktoria Gimbe
Canada
Local time: 19:57
English to French
Why not try Wikipedia? Apr 23, 2007

Try this URL: http://en.wikipedia.org/wiki/Lists_of_basic_topics

Take a look and see if it suits you. I have looked through tons of taxonomies/lists/directories, and they are all different, perhaps for reasons covered in mediamatrix's post above. But I find this one straightforward and personally like the logic used. For me, it is hard to "get lost" in this list. Maybe this is because Wikipedia is crafted by the collective consciousness, so it's an "average", if I may use this word, of our collective logics.

Give it a try and tell me what you think!


 

