Pages in topic:   < [1 2]
Starting the KOG clean-up process
Thread poster: Kim Metzger
Claudia Luque Bedregal
Claudia Luque Bedregal  Identity Verified
Italy
Local time: 00:24
English to Spanish
Agree with Viktoria Nov 6, 2006

Hi Kim, you're right, it's time to clean up the KOG.
I like Viktoria's suggestion posted as "A simple solution". It seems like a good system.
Regards,
Claudia


 
Kim Metzger
Kim Metzger  Identity Verified
Mexico
Local time: 16:24
German to English
TOPIC STARTER
Immediate interim solution Nov 6, 2006

Hi Claudia - I think Viktoria has some good ideas on how to go about the final clean-up, but my current proposal is that we start collecting bad KudoZ entries right away.

With all the projects the ProZ.com staff is working on, I assume it will be quite some time before they get down to brass tacks for the ultimate clean-up. But since hundreds of members use the KOG every day for their translation work (I certainly do), I'm proposing that we channel this potential energy into a database that the lexicographers to be named in the future can use once we've decided on the strategy.

Whoever is designated to work on the clean-up will then already have hundreds of terms to start working on rather than having to start the process from scratch.


 
Denyce Seow
Denyce Seow  Identity Verified
Singapore
Local time: 06:24
Member (2004)
Chinese to English
Questions... Nov 6, 2006

Kim Metzger wrote:

...my current proposal is that we start collecting bad KudoZ entries right away.


What do we do after we collect the bad Kudoz entries? Send them to a database as you have mentioned in your first posting? Who will set up this database?

I know Henry and the others are really busy right now, especially with the Edinburgh conference coming up. However, we need them to set something up, at least a system, so that we have proper instructions and guidelines to carry out this project. What do you think?

Denyce


 
Kim Metzger
Kim Metzger  Identity Verified
Mexico
Local time: 16:24
German to English
TOPIC STARTER
Database Nov 6, 2006

Denyce Seow wrote:

I know Henry and the others are really busy right now, especially with the Edinburgh conference coming up. However, we need them to set something up, at least a system, so that we have proper instructions and guidelines to carry out this project. What do you think?

Denyce


Yes, Denyce. I know they're extremely busy. I'm just hoping that when they all get back they might think setting up a database makes sense. It's just an interim step that will take advantage of something that is already taking place every day: KudoZ glossary searches. And you're right, too, that we would need guidelines, etc.


 
KathyT
KathyT  Identity Verified
Australia
Local time: 08:24
Japanese to English
Why not set up a Google spreadsheet? Nov 7, 2006

I agree that this is a great idea and the sooner things can get underway, the better. In the Japanese language SCs also, we have way too many entries of the "FYI"-type variety.

What about Google spreadsheets as a starting point?
See http://www.google.com/googlespreadsheets/tour1.html

From the Wikipedia entry on Google spreadsheets:
(http://en.wikipedia.org/wiki/Writely)
Google Docs & Spreadsheets is a Web-based word processor and spreadsheet application offered by Google. It allows users to create and edit documents and spreadsheets online while collaborating in real-time with other users. Docs & Spreadsheets is the result of two services, Writely and Spreadsheets, and they were merged on October 10, 2006 into a single product.

It seems fairly straightforward, and willing volunteers could have their email address added to the list of people granted access so that they can add to or update entries as time permits.
If necessary, this could be limited to a handful of people per language pair, OR alternatively, separate spreadsheets could be maintained for each SC, in anticipation of merging them later when the matter gets to the top of the Proz.com staff "To Do" list.
Too awkward?


 
Gina W
Gina W
United States
Local time: 18:24
Member (2003)
French to English
Regarding moderators and glossary entries Nov 7, 2006

SzIwonka wrote:

I am not sure who should decide about the quality and correctness of a gloss entry. Moderators? It would mean putting more work on them.


True, but also the moderator could actually be wrong. Just because they volunteer as site moderators does not inherently mean they know each and every term correctly.

So what would be the basis for deciding whether or not an entry is "bad"? Some answers might work in one context but in no other context. There are also dialects of target languages - particularly, British English or American English - not to mention the source languages, that may make someone think that an answer is "bad", but it may not really be.

Just wondering what the criteria will be, then, for deciding whether or not an entry is "bad".


 
mediamatrix (X)
mediamatrix (X)
Local time: 18:24
Spanish to English
What the KOG is - what it ain't - and how to handle the KOG problem Nov 7, 2006

I think there is a general misunderstanding here of what the KOG is - as opposed to what its name implies it should be.

Many of the problems with the KOG would simply go away if 'KOG' were changed to 'KO-something else' (your choice...) which would be a better representation of what it actually is and its value to professional translators in particular and Mankind in general.

Although the KOG purports to be a container for glossary data, in fact what it contains is a summary (and often a very poorly-prepared summary) of the contributions in response to a question about a particular term or expression that proved to be obscure for the asker. In many cases a KOG entry does not actually contain any answer that correctly fits the term of the question as posed. Or if it does, then it often gives a translation which is very specific to the obscure usage of the question and thus is not directly transferable to better-written source material. In very few cases does it actually give a straightforward translation of a straightforward term - which is what most people would expect of a glossary.

Additionally, KOG entries are devoid of what matters most in any worthwhile glossary: not only a term but also a definition and an array of other essential data without which the entries cannot be properly and unambiguously interpreted: part of speech, language variant, register, etc. etc. (we all know the theory...).

This has become blatantly obvious in WikiWords which, in principle, is a multilingual dictionary - i.e. a data repository sharing many characteristics with a multilingual glossary. When staff decided to import thousands of KOG entries into WikiWords, actually they only imported the terms (separately in each Proz language pair, leaving users to merge them where appropriate); and they imported all these terms in bulk - duplicates, rubbish and all (into a dictionary which is supposed to be 'concept'-based, i.e. where the definition is the primary key, but that's another story). For all the other clues about usage - including the actual meaning of the term - WikiWords users have to click a link back to the KOG with the original KudoZ entry.

Even if the WikiWords interface provided data fields for register, language variants, etc. - which it doesn't, despite numerous requests for these features - all KOG-sourced WikiWords entries now need to be revised and expanded by drafting dictionary-style definitions for each and every one of them.

In contrast to Proz.com, WikiWords.org was at least built from the outset to handle dictionary/glossary data. That is its vocation - and hopefully one day it will have a full set of tools allowing it to fulfil that vocation. Of these two websites, WikiWords - despite its present shortcomings - is the more appropriate place for the re-use (note, I don't say 'conversion', less still 'clean-up') of Kudoz/KOG data in a dictionary/glossary context, subject to the availability of unlimited (wo)manpower to restructure and re-write everything and fill in the missing data.

I would suggest that, rather than trying to clean up the KOG it should be left alone, renamed and taken for what it really is: a rough and ready summary of answers to KudoZ questions. The only clean-up that can be justified, within the KOG itself and given the multitude of other projects in hand at Proz.com at present (which seem to have stalled WikiWords development altogether, incidentally), is that which could be done with a few lines of SQL - i.e. the automatic elimination of fully-matching duplicate entries mentioned already by Valentin.
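
As a rough illustration of the 'few lines of SQL' idea, here is a minimal sketch. It assumes a hypothetical table called kog_entries with columns id, language_pair, source_term and target_term; the real KOG schema is not described in this thread, so all of these names are invented. It is run from Python against an in-memory SQLite database purely for demonstration, and it deletes only rows that fully match an earlier row.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical kog_entries table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kog_entries ("
        "id INTEGER PRIMARY KEY, "
        "language_pair TEXT, source_term TEXT, target_term TEXT)"
    )
    conn.executemany(
        "INSERT INTO kog_entries (language_pair, source_term, target_term) "
        "VALUES (?, ?, ?)",
        [
            ("de-en", "Haus", "house"),
            ("de-en", "Haus", "house"),  # fully-matching duplicate
            ("de-en", "Haus", "home"),   # different translation, not a duplicate
            ("es-en", "casa", "house"),
        ],
    )
    conn.commit()
    return conn


def remove_exact_duplicates(conn):
    """Keep the lowest id in each group of identical rows; delete the rest.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM kog_entries
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM kog_entries
            GROUP BY language_pair, source_term, target_term
        )
        """
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    demo = build_demo_db()
    print("rows removed:", remove_exact_duplicates(demo))
```

Note that this touches only exact matches. Entries that differ in spelling, capitalisation or context would survive, which is consistent with the point that anything beyond this needs human judgement.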

Any other effort - including the proposals here to tag duff KOG content for hypothetical 'future action' - will do nothing to help sort out the current mess in WikiWords, since any such tagging will not be transferable to WikiWords where terms have already been merged, expanded and in some cases partially corrected - rendering them independent of the original KOG content.

MediaMatrix


 
Gina W
Gina W
United States
Local time: 18:24
Member (2003)
French to English
Good suggestion Nov 14, 2006

mediamatrix wrote:

I would suggest that, rather than trying to clean up the KOG it should be left alone, renamed and taken for what it really is: a rough and ready summary of answers to KudoZ questions. The only clean-up that can be justified, within the KOG itself and given the multitude of other projects in hand at Proz.com at present (which seem to have stalled WikiWords development altogether, incidentally), is that which could be done with a few lines of SQL - i.e. the automatic elimination of fully-matching duplicate entries mentioned already by Valentin.

Any other effort - including the proposals here to tag duff KOG content for hypothetical 'future action' - will do nothing to help sort out the current mess in WikiWords, since any such tagging will not be transferable to WikiWords where terms have already been merged, expanded and in some cases partially corrected - rendering them independent of the original KOG content.

MediaMatrix


Well put. I am not in favor of "cleaning up" the KOG because I may not agree with the criteria for deciding which entries are "bad", and which entries are acceptable. When I use the ProZ.com Term Search, I look at the entire question and do not simply use the translation listed on the search results. So it really doesn't matter what answer an Asker did or did not choose, since I make up my own mind which translation I will use in the context of my document.


 
mediamatrix (X)
mediamatrix (X)
Local time: 18:24
Spanish to English
Proposals Nov 14, 2006

With the creation of WikiWords, Henry & Co. provided themselves - and us - with prototype tools for managing multilingual dictionary/glossary data.

By transferring KOG data in bulk from here to WikiWords, however, they have done two things which are extremely counter-productive:

- they have doubled the quantity of bad data to be cleaned up;

- they have made it very easy for users to make one copy of that data (WikiWords) even worse, by adding extra languages to unclear, undefined concepts, etc., as mentioned earlier;

Additionally, users can continue to transfer KOG data (their personal glossaries) to WikiWords, without any clean-up - and that, alas, is exactly what many WikiWords users have done and continue to do on a daily basis.

It was suggested long ago in the WikiWords forum that users should not be allowed to bulk transfer their KOG glossaries with their eyes shut. Instead, they should be required to review each entry, adapt it to meet the criteria governing WikiWords content, and provide the minimum additional data not found in the KOG: definition, example sentence, part of speech, etc. A 'review dialogue' of this kind has not been implemented, thus far.
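
To make the idea of a 'review dialogue' a little more concrete, here is a minimal sketch of the check it might apply before accepting an entry. The field names are assumptions made up for illustration (neither the KOG nor WikiWords data model is specified in this thread); the list simply mirrors the minimum data mentioned above: definition, example sentence and part of speech, on top of the source and target terms.

```python
# Hypothetical minimum fields for a transferable entry (names are assumptions).
REQUIRED_FIELDS = (
    "source_term",
    "target_term",
    "definition",
    "example_sentence",
    "part_of_speech",
)


def missing_fields(entry):
    """Return the required fields that are absent or blank in a draft entry."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(entry.get(field) or "").strip()
    ]


def ready_for_transfer(entry):
    """An entry may be transferred only when no required field is missing."""
    return not missing_fields(entry)


if __name__ == "__main__":
    raw_kog_entry = {"source_term": "Haus", "target_term": "house"}
    print("still missing:", missing_fields(raw_kog_entry))
```

A raw KOG entry carrying only the two terms would be held back until the user fills in the rest, which is the opposite of a bulk transfer with eyes shut.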

Meanwhile, the quantity of data to be cleaned up here in the KOG is growing day-by-day...

It would make sense, I believe, to concentrate staff manpower on the development of synergies between Kudoz and Wikiwords rather than on building systems to 'fix' the KOG. And, for all those who are demonstrating willingness to contribute to the building of worthwhile dictionaries and glossaries - here and in WikiWords - it would be appropriate if we were able to concentrate our expert (wo)man power where it is most beneficial: in drafting good basic dictionary/glossary data that can serve as the basis for expansion into multiple languages.

With these objectives in mind I suggest the following modifications to the Kudoz system and WikiWords, drawing on the strengths of each system (such as they are):

In Kudoz
- When a user is invited to make an entry in the KOG, they should be offered the opportunity (and given the necessary tools) to adapt the Kudoz data into the form required for a WikiWords concept. If they do that, the data would be transferred into WikiWords, arriving there as a well-formed bi-lingual concept ready for the addition of extra languages.

Regardless of whether or not the user does that, the 'raw' Q/A can still go in the KOG, as now, on the understanding that it will probably never get cleaned up. No matter - as WikiWords becomes the primary dictionary resource, the KOG will eventually become redundant, obsolescent, obsolete - and die a natural death.

In WikiWords
- Add to the home-page a list of recent concepts arriving from the KOG, as a replacement for the 'Concept of the Day' feature, to encourage WikiWords users/enthusiasts to work on enhancing and expanding these fresh concepts. That way, a translator's problem posed today in Kudoz may blossom into a properly drafted and validated 20-language entry in WikiWords in the space of 24 hours - rather than ending up as yet another dollop of junk in the KOG.

- Allow users to import data from the KOG only via a revision dialogue, as described above.

- Delete all existing KOG data from WikiWords, except that which has already been modified/expanded within WikiWords - including all content imported by WikiWords users from their personal glossaries in KOG which have not been adapted to meet WikiWords criteria.

....

There are other problems that could be alleviated or resolved as a by-product of these modifications.

One is that WikiWords would receive a steady stream of input in the whole range of languages appearing in Kudoz questions, and this would help to counter the present day over-representation of half a dozen languages - and English in particular.

Another is the Kudoz points system. There have been numerous discussions here as to whether it is the asker or one of the answerers who should get the points. If the above proposal is adopted - and to encourage all users to contribute meaningfully to the transfer of valid data to WikiWords - I suggest that there should be points only for data that is correctly reformatted for WikiWords. No-one should get points for merely dumping junk into the KOG.

MediaMatrix


 