Idea for doctoral thesis: Encyclo tags
Arnaud HERVE



Jan 13, 2003


Very often we translators are asked to translate texts on subjects in which we are not professionals. Glossaries and the like are useful, but the www gives us more possibilities: for example, reading documents written by the professionals themselves, people who should be our models for the correct use of jargon in a given field.

In most cases I check documents of that kind before writing: the more invisible my translation, the better. By the way, I have also learned to avoid relying on professional documents when doing a test for agencies, because examiners tend to apply corrections that are valid in the internal world of translators, official glossaries, etc., and these can sometimes deviate from what is fluent in 'real life'.

However, when I became a translator, part of my interest was also to get in touch with professions outside the faculty of arts, since our activity gives us a unique right to frequent so many different trades. Of course, that slant can give people the impression that you're a "dilettante", but there are ways to correct involuntary amateurism through better documentation.

The best documents turn up when the job is for a company that has a multilingual website, and we find the pages for the previous version of the product we are dealing with in both the source and target languages, so that most of the vocabulary is already there. That's an ideal situation, and I call it a "Rosetta Stone situation".

I have sometimes regretted that there is no HTML tag made especially for us translators, one that would help us reproduce Rosetta Stone situations quickly. The tag would look like this:

<META NAME="Rosette" SOURCE="source_language_page.html" LANGUAGE="fr" VALIDATED="yes">

in which NAME would indicate the kind of meta tag, SOURCE would of course point to the original, LANGUAGE would indicate the version I'm looking for (I don't translate into anything other than French, fr), and VALIDATED would be the crème de la crème: has it been proofread by a person of the trade who is not a translator?
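To make the idea concrete, here is a minimal Python sketch of how a search tool might pick up such a tag from a page. It assumes the tag exists exactly as proposed above, which no browser or standard defines; the class and function names (RosetteFinder, find_rosette_links) are made up for the illustration.

```python
from html.parser import HTMLParser


class RosetteFinder(HTMLParser):
    """Collect the hypothetical <META NAME="Rosette" ...> tags from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        if tag != "meta":
            return
        fields = dict(attrs)
        if (fields.get("name") or "").lower() == "rosette":
            self.links.append(
                {
                    "source": fields.get("source"),
                    "language": fields.get("language"),
                    # Treat anything other than "yes" as not validated.
                    "validated": fields.get("validated") == "yes",
                }
            )


def find_rosette_links(html_text):
    """Return one dict per Rosette tag found in the given HTML text."""
    finder = RosetteFinder()
    finder.feed(html_text)
    return finder.links


# Sample page using the tag exactly as proposed in the post.
page = (
    "<html><head>"
    '<META NAME="Rosette" SOURCE="source_language_page.html" '
    'LANGUAGE="fr" VALIDATED="yes">'
    "</head><body>...</body></html>"
)

print(find_rosette_links(page))
```

A crawler could run this over every page of a company site and keep only the pairs whose LANGUAGE matches the translator's target language.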

No doubt the insertion of such fields by companies or professional sectors would help us, using a search tool similar to Copernic, but it would also encourage them to take steps to help us, instead of complaining that we don't know the job as well as a professional.

After reflecting, I came to think that this principle of new tags could be developed further: for instance, a QUOTE tag

<NAME="quote" author="Bertrand Russell" book="Principia Mathematica">

so you can tell everyone in the world who is interested in Bertrand Russell that you've quoted him here.

or a DEFINITION tag:

<NAME="definition" object="runabout" field="maritime">

so you can help the world by giving your own definition of what a runabout boat is, which helps make the www a free, open-source dictionary.

Thus I came to the conclusion that it was best to give up the META nature of such tags, and to name them ENCYCLO tags:


I am now planning to write a doctoral thesis about this, and I'm looking for a professor who would agree to supervise it. I need your feedback.

Please destroy my proposal if you think it's worthless or has already been done.

Arnaud the day-dreamer

gianfranco




XML does it Jan 13, 2003

A web site could be entirely written in XML with content tags and it would be usable exactly as you dream.

The only problem is that web site developers and their customers either do not know, or have no funds, or no interest in doing it.

XML is indeed used for some content-providing web sites, a small minority so far.
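As a rough illustration of what "content tags" in XML could look like, here is a small Python sketch. The element names quote and definition and their attributes are taken from Arnaud's examples; the page wrapper element and the list_content_tags function are assumptions made for the sketch, not part of any published DTD.

```python
import xml.etree.ElementTree as ET

# A hypothetical page marked up with content tags. The element text is a
# placeholder; only the tag names and attributes matter here.
document = """\
<page lang="en">
  <quote author="Bertrand Russell" book="Principia Mathematica">quoted text goes here</quote>
  <definition object="runabout" field="maritime">your own definition goes here</definition>
</page>
"""


def list_content_tags(xml_text):
    """Return (tag name, attributes) for every quote or definition element."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() walks the tree in document order, including nested elements.
    for element in root.iter():
        if element.tag in ("quote", "definition"):
            found.append((element.tag, dict(element.attrib)))
    return found


for tag, attributes in list_content_tags(document):
    print(tag, attributes)
```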


Luca Tutino




A dream on the dot. Jan 13, 2003

Dear Arnaud,

I like your dream very much, and I think it is a great subject for a doctorate. I feel it is feasible, in the same way that in 1988 I thought something like Trados had to be possible, and in 1994 I felt the need for exactly what ProZ is today.

Your specific examples, although apparently obvious after reading them, are actually very interesting per se.

The content and keyword meta tags are widely implemented because of the search engines and the hit-count fever... These are already changing our relationship with culture. Before 1992 I used to go to a local public library twice a week just to keep up to date and run some accumulated checks on my editing issues. Today I probably enter a public library only a few times a year, and most of my queries find their answers on the web or via email.

The main problem with your tags, I believe, is the time and effort required to use them. Their real benefit would come from their consistent presence across a good many web pages. But who would take the time to fill them in?

The language tags, even if implemented by some official organization, would have a big chance of getting overlooked and misused. The citation and definition tags would be used, at best, by a small community.

My first suggestion would be to give some thought to motivation and automation. On the other hand, I would personally be quite happy to take part in these studies and experiments.


Henry Dotterer

Great topic Jan 13, 2003

Great topic for a thesis!

I agree with Gianfranco; you have to use XML. I have not seen any of your specific tags used in practice, but the second and third tags are what XML is made for. Look for XML DTDs published or being drafted for reporters, researchers, etc.; I bet you will find quote and definition tags.

As for your first idea, you might be able to get some ideas from version control methodologies (thinking of a document as the French version of an English doc). Ted Nelson did work on version control years ago...

Also, the localization technology vendors--GlobalSight, Idiom, etc.--must have a way of referencing translations from within documents, and all of them use XML, so they may be doing this already. GlobalSight has spearheaded some XML standards that may be relevant.

For your thesis, you might take the approach not of defining your own "encyclo" DTD, but rather of assuming that all online documents are labelled in this way. Now what? Lots of possibilities... citation counting to assess importance, knowledge extraction (for example, in our field, automatic generation of TMs so that a machine can help a person translate), etc.
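Citation counting, for instance, could be sketched in a few lines of Python if pages really were labelled this way. This is only an illustration under that assumption: the quote element and its author attribute follow Arnaud's example, while the sample pages, the second author name and the count_quoted_authors function are invented for the sketch.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_quoted_authors(pages):
    """Count how often each author is quoted across a collection of pages."""
    counts = Counter()
    for xml_text in pages:
        root = ET.fromstring(xml_text)
        # iter("quote") finds quote elements at any depth in the page.
        for quote in root.iter("quote"):
            author = quote.get("author")
            if author:
                counts[author] += 1
    return counts


# Invented sample data: two pages carrying hypothetical quote tags.
pages = [
    '<page><quote author="Bertrand Russell" book="Principia Mathematica">...</quote></page>',
    '<page><quote author="Bertrand Russell">...</quote>'
    '<quote author="Another Author">...</quote></page>',
]

for author, total in count_quoted_authors(pages).most_common():
    print(author, total)
```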

Anyway, there will be lots to work with there. Best of luck!

Arnaud HERVE



Thanks ! Jan 13, 2003

Thank you to the three of you. It is an encouragement indeed. I am preparing some clever answer.

In the meantime, I have noticed that my "tags", which appeared in IE and Mozilla this afternoon, no longer appear tonight. Yet I did check the 'disable HTML tags' box when posting.

Could that be repaired somehow? Otherwise people won't understand what I'm talking about.

Henry Dotterer

Use the ISO 3-letter code Jan 14, 2003

The ISO has a 3-letter code to identify languages. Some people add an underscore followed by an ISO 2-letter country code to indicate dialect. (So fra_fr is French as spoken in France.)
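A small Python sketch of that convention, for what it is worth: it only checks the shape of the codes (three letters, underscore, two letters) and does not look them up in any ISO list. The function names language_label and split_label are made up for the example.

```python
def language_label(language_code, country_code=None):
    """Build a label such as 'fra_fr' from a 3-letter language code
    and an optional 2-letter country code. Only the shape is checked."""
    language = language_code.strip().lower()
    if len(language) != 3 or not language.isalpha():
        raise ValueError("expected a 3-letter language code")
    if country_code is None:
        return language
    country = country_code.strip().lower()
    if len(country) != 2 or not country.isalpha():
        raise ValueError("expected a 2-letter country code")
    return f"{language}_{country}"


def split_label(label):
    """Split 'fra_fr' back into ('fra', 'fr'); the country may be None."""
    language, _, country = label.partition("_")
    return language, (country or None)


print(language_label("fra", "FR"))
print(split_label("fra_fr"))
```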

And sorry to have squashed your tags when editing your title for the home page...I repaired...
