Witten-Bell vs Kneser-Ney smoothing
Thread poster: agusia147

agusia147
Poland
Jul 16, 2015

Could you please help me? I am new to machine translation (MT). I am interested in why Witten-Bell smoothing sometimes returns better results than Kneser-Ney. In which situations is it better to use each of them? Do you have any good references?

 

Benno Groeneveld  Identity Verified
United States
Local time: 10:20
English to Dutch
+ ...
I don't even want to know Jul 18, 2015

what Kneser-Ney smoothing is, if someone tries to explain it online thusly:

Kneser-Ney evolved from absolute-discounting interpolation, which makes use of both higher-order (i.e., higher-n) and lower-order language models, reallocating some probability mass from 4-grams or 3-grams to simpler unigram models. The formula for absolute-discounting smoothing as applied to a bigram language model is presented below:

P_abs(w_i | w_{i−1}) = max(c(w_{i−1} w_i) − δ, 0) / Σ_{w′} c(w_{i−1} w′) + α · p_abs(w_i)

Here δ refers to a fixed discount value, and α is a normalizing constant. The details of this smoothing are covered in Chen and Goodman (1999).

Source: http://www.foldl.me/2014/kneser-ney-smoothing/ and on that page the formula looks even smoothier.
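For anyone who does want to know: the formula is less scary in code. Here is a minimal sketch of absolute-discounting interpolation for a bigram model (my own illustration of the quoted formula, with the discount d and a toy tokenised corpus as assumptions, not anything from that page):

```python
from collections import Counter

def make_abs_discount_bigram(corpus, d=0.75):
    """Bigram model with absolute-discounting interpolation:
    P(w | prev) = max(c(prev, w) - d, 0) / c(prev, *) + alpha(prev) * P_uni(w),
    where alpha(prev) = d * (distinct continuations of prev) / c(prev, *),
    chosen so the probabilities over the vocabulary sum to 1.
    """
    bigram, unigram = Counter(), Counter()
    for sent in corpus:
        unigram.update(sent)
        bigram.update(zip(sent, sent[1:]))
    total = sum(unigram.values())
    ctx_count, ctx_types = Counter(), Counter()  # c(prev, *) and distinct followers
    for (prev, _w), c in bigram.items():
        ctx_count[prev] += c
        ctx_types[prev] += 1

    def p(w, prev):
        p_uni = unigram[w] / total
        c_ctx = ctx_count[prev]
        if c_ctx == 0:                           # unseen context: back off fully
            return p_uni
        discounted = max(bigram[(prev, w)] - d, 0) / c_ctx
        alpha = d * ctx_types[prev] / c_ctx      # probability mass freed by the discount
        return discounted + alpha * p_uni

    return p
```

On a toy corpus such as `[["a","b","a","c"], ["a","b","b","a"]]`, the probabilities `p(w, "a")` over the vocabulary sum to 1, which is exactly what the normalising constant α is for.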

Wikipedia has even more interesting formulas:

https://en.wikipedia.org/wiki/Kneser–Ney_smoothing

I wonder how that would come out after a Machine Translation.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 16:20
Member (2006)
English to Afrikaans
+ ...
. Jul 18, 2015

agusia147 wrote:
I am new in MT translation.


Do you study MT translation at college or university, and if so, what course are you studying?

I am interested why sometimes Witten-Bell smoothing returns better results than Kneser-Ney? In what situations it is better to use each of them? Do you have any good references?


No, I have no idea, but it was interesting to find out what "smoothing" is :-) and what the different types of smoothing are:
http://www.speech.sri.com/projects/srilm/manpages/pdfs/chen-goodman-tr-10-98.pdf
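If it helps, the one idea that distinguishes Kneser-Ney from simpler schemes fits in a few lines: its lower-order model scores a word by how many distinct contexts it follows, not by how often it occurs. A minimal sketch (my own illustration, not taken from the paper):

```python
from collections import Counter

def make_continuation_prob(corpus):
    """Kneser-Ney's lower-order distribution: instead of asking
    'how frequent is w?', ask 'how many distinct contexts does w
    complete?', normalised by the total number of distinct bigram types."""
    bigram_types = {pair for sent in corpus for pair in zip(sent, sent[1:])}
    contexts_of = Counter(w for _prev, w in bigram_types)  # distinct left-contexts of w
    total_types = len(bigram_types)
    return lambda w: contexts_of[w] / total_types
```

The classic motivating example: in text full of "San Francisco", the word "Francisco" is frequent but follows only one context, so its continuation probability stays low, whereas a plain unigram model would overrate it.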

Why do you say that Witten-Bell sometimes returns better results? Did you actually experience this when you applied it, or did someone (and if so, who?) tell you that it does that?


 

neilmac  Identity Verified
Spain
Local time: 16:20
Spanish to English
+ ...
Facepalm Jul 19, 2015

...

 

Samuel Murray  Identity Verified
Netherlands
Local time: 16:20
Member (2006)
English to Afrikaans
+ ...
Why? Jul 19, 2015

neilmac wrote:
Facepalm


Why?


 

Christine Andersen  Identity Verified
Denmark
Local time: 16:20
Member (2003)
Danish to English
+ ...
Most of us here haven't a clue Jul 19, 2015

Many of us haven't the faintest idea of what Witten-Bell or Kneser-Ney are, and quite honestly don't care!

I can sometimes understand why people keep trying to find algorithms for language and get MT to work. Like the old alchemists, they may discover something useful along the way, even if it is not exactly what they were looking for.

Way back in the 1970s I was involved with the purely empirical end of a monolingual project with recognition of keywords and strings for indexing the library at a research institution. There was not even a screen on the computer, and we librarians with dusty fingers were not allowed near it! (Singular - a university or a big research institute might have a computer, but not everyone else...) We worked with huge heaps of blue-striped Leporello printout.

The manual indexes on dusty cardboard cards were getting far too big to handle, and the mechanical 'peek-a-boo' punched card systems and whatever were not a lot better, so the institute decided to embrace the brave new world of computers and construct magnetic tape indexes. Only a couple of years before, at library school, one of our teachers had told us he could not really believe that computers would be much use in libraries, but we needed to know enough about them to keep them out.

Well, the rest is history - now every time anyone types a searchword into Google, it churns through databases more vast than we can imagine, and turns up millions of results in seconds.

Somewhere inside me the old librarian is still muttering that the now defunct AltaVista could find ten absolutely spot-on hits that were worth more than the millions Google finds. Often I only need one anyway... But OK, the chances are that Google finds it among the first 20 hits. Everything else it finds is usually totally useless.

_______________________

A lot of members on this site - including me - have seen the output from all those fancy algorithms, and of course you will find some quite good results if you choose a single sentence or a passage that is not too ambiguous. The grammar may (or may not) be seductively correct, but it does NOT necessarily convey the meaning of the source text, and often it still sounds strangely 'foreign'. MT cannot handle syntax, idioms and sentence structures reliably. It often has trouble with gender, singular and plural, and other factors which are indicated differently in different languages.

We working translators are still at a similar stage to my teacher at library school. We know the phenomenon of human speech and writing, and some have studied linguistics and semantics and etymology, deixis and semiotics... (most of that is Greek to me, but I have studied a little pragmatics...)

Some of us have tried post-editing machine translation (PEMT).
I can assure you, I am not impressed.
That unmanageable library was to all intents and purposes monolingual, and covered hydraulics in the wide sense of the word - there were departments of Tribology, Aeronautics, and about a dozen others. I worked with harbours, dams and offshore installations. It was just one branch of science, with encroaching smatterings of law and mathematics, for instance, but excluded most of human knowledge, theoretical science, the arts, philosophy etc.

I still don't think MT would be very useful if anyone wanted the material translated.
________________________

I did learn that you should never say never, but I am certain I will not live to see MT replacing translators.

The extra dimension of a second language, and all the problems that arise from homonyms in different domains - not to mention the fact that languages are constantly developing...

I cannot imagine MT coping with the complexity of human language, for all the talk of artificial intelligence and neural networks.

But good luck if you enjoy playing with them, and let us humans do the translating!

______________________________________

That is my somewhat longer take on what neilmac can express in eight letters:

Facepalm!

:-) :-) :-)

[Edited at 2015-07-19 16:20 GMT]


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 15:20
Member (2009)
Dutch to English
+ ...
Try stackexchange.com/stackoverflow.com, etc. Jul 19, 2015

agusia147 wrote:

Could you please help me? I am new to machine translation (MT). I am interested in why Witten-Bell smoothing sometimes returns better results than Kneser-Ney. In which situations is it better to use each of them? Do you have any good references?


Probably not the best place to ask this question, although there are a few people here who might know about this stuff. Jeff Allen or Kevin Dias, are you reading this?

I think you'd probably be better off asking it on one of the Stack Exchange sites (http://stackexchange.com/about).

See e.g.:

http://stackoverflow.com/questions/7694888/how-to-smooth-unigrams
http://stackoverflow.com/questions/238541/looking-for-books-on-information-science-information-retrieval/238677#238677
http://stackoverflow.com/questions/15697623/training-and-evaluating-bigram-trigram-distributions-with-ngrammodel-in-nltk-us
http://stackoverflow.com/questions/3017455/ngram-idf-smoothing/3020955#3020955



 

Jeff Allen  Identity Verified
France
Local time: 16:20
Multiple languages
+ ...
NLP metrics for MT Jul 19, 2015

agusia147 wrote:

Could you please help me? I am new to machine translation (MT). I am interested in why Witten-Bell smoothing sometimes returns better results than Kneser-Ney. In which situations is it better to use each of them? Do you have any good references?


Michael Beijer wrote:

Probably not the best to place to ask this question, although there are a few people here who might know about this stuff. Jeff Allen or Kevin Dias, are you reading this?


Thanks Michael for tagging me. I had seen this thread come up in my notifications.

Agusia, although I've been doing the MT gig for two decades, I find that all the studies on such metrics are stale and dry, as they are often done in isolation from the people who spend their time translating.
Some publication outlets for such studies would be the ACL, NAACL, EAMT, AMTA, and the MT Marathon.
However, most of what I've read from such sessions really has not been that helpful for deciding how to integrate an MT system in an effective and efficient way, and how to keep translators producing translated output at a level above what they were doing before using MT. These are the real, burning questions which should always be asked and resolved.
And these are the types of questions which I asked as a panelist at the recent webinar: Machine Translation Past, Present and Future - An Interview with Philipp Koehn
http://www.asiaonline.net/EN/Resources/Webinars/#Webinars21

Jeff


 

Christine Andersen  Identity Verified
Denmark
Local time: 16:20
Member (2003)
Danish to English
+ ...
I do not mean to be sarcastic Jul 20, 2015

To augusia147

It struck me that my remarks about alchemists could sound rude and sarcastic; I'm sorry.

The background is that I have recently been to a fascinating lecture about how alchemy actually laid the foundations for modern chemistry, and the alchemists were NOT simply dreaming about the impossible all the time.

They did not know it was impossible to make gold, and in fact it was not always their immediate aim anyway. Some of them experimented, noted results and designed new experiments, in principle just as modern scientists do. Research for its own sake has produced a great many valuable results all through history, so don't be put off by comments on this site about MT.

The theory and understanding of language is always fascinating, even if we can't put it directly to use in the next translation.

We feel that MT is still a lottery, no matter how strongly you tip the odds, and in many situations guessing is not good enough, or only as a starting point for checking things out. In terms of alchemy, we are still aiming for gold, or as close as you can get ... Humans make errors too, but so far an expert human and another to check the text still seem to be the safest bets in many cases.

All the best with your research!


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 15:20
Member (2009)
Dutch to English
+ ...
Thanks for the link about the webinar! Jul 20, 2015

Jeff Allen wrote:

And these are the types of questions which I asked as a panelist at the recent webinar: Machine Translation Past, Present and Future - An Interview with Philipp Koehn
http://www.asiaonline.net/EN/Resources/Webinars/#Webinars21


Thanks for the link to the interview with Philipp Koehn. Will definitely watch that!


 

