What are trigrams (machine translation)?
Thread poster: Samuel Murray

Samuel Murray  Identity Verified
Local time: 01:59
Member (2006)
English to Afrikaans
Apr 24, 2006

G'day everyone!

There is an open-source machine translation program called PMST at http://www.geocities.com/bryanmceleney/psmt.htm. It mentions trigrams, which are required to make the program learn a new language. Does anyone know of any resources on trigrams in a machine translation context?


(PS. Where's Jeff?)


Gad Harel
Local time: 02:59
English to German
+ ...
What are trigrams Apr 24, 2006

Hi Samuel,

Thanks once again for your last reply about LocStudio.

Now, to your question about trigrams: have a look
at several pages








Jennifer Baldwin  Identity Verified
Local time: 16:59
Member (2005)
French to English
+ ...
Trigrams Apr 24, 2006

A trigram is a sequence of three words, and a trigram model is a way of training a statistical (corpus-taught) natural language system, in this case MT. The idea is that, given the first two words of any three-word sequence, the system can better predict the third. A simpler approach is the bigram model, which looks only at two-word sequences, whereby the first word in the sequence can help predict the second.

For example, in the bigram "laptop computer," you can see how "laptop" clues us in to the following word "computer." "Computer" has a high probability of following "laptop" in a text, much more so than "rabbit" or "book," or even other parts of speech - "happy," "quickly," "forever," etc. A trigram model simply goes a step deeper, looking at two words before predicting the third.
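To make the "laptop computer" idea concrete, here is a minimal sketch of how such probabilities can be estimated by counting. The toy corpus and the function names are invented for illustration; a real system would use a far larger corpus and some form of smoothing for unseen sequences.

```python
from collections import Counter

# Toy corpus invented for illustration; real training data would be much larger.
corpus = (
    "i bought a laptop computer . "
    "the laptop computer is new . "
    "she carried a laptop bag ."
).split()

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

unigram_counts = Counter(ngrams(corpus, 1))
bigram_counts = Counter(ngrams(corpus, 2))
trigram_counts = Counter(ngrams(corpus, 3))

def bigram_prob(prev, word):
    """Estimate P(word | prev) as count(prev word) / count(prev)."""
    total = unigram_counts[(prev,)]
    return bigram_counts[(prev, word)] / total if total else 0.0

def trigram_prob(w1, w2, word):
    """Estimate P(word | w1 w2) as count(w1 w2 word) / count(w1 w2)."""
    total = bigram_counts[(w1, w2)]
    return trigram_counts[(w1, w2, word)] / total if total else 0.0

print(bigram_prob("laptop", "computer"))        # "computer" after "laptop"
print(bigram_prob("laptop", "rabbit"))          # never seen in the toy corpus
print(trigram_prob("the", "laptop", "computer"))
```

In this toy corpus "computer" follows "laptop" in two of its three occurrences, while "rabbit" never does, so the counts reproduce the intuition described above.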

The probabilities we get from a bigram or trigram model are mostly obvious to humans who know the language, but they are tremendously important in helping a computer to "understand" language and grammar through statistical examples in a corpus.

In machine translation, trigrams add a layer of context. Rather than translate word-for-word, software can use trigrams to select the best word (probabilistically), given the previous two words in the trigram sequence. This improves overall accuracy.
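As a rough sketch of that selection step, suppose a source word has several possible translations and the system must pick one given the two target words already produced. Everything here is an assumption made for illustration: the toy corpus, the `pick_word` helper, and the candidate list (imagine the German word "Bank", which can mean either "bank" or "bench"). It does not describe how any particular MT package works.

```python
from collections import Counter

# Toy target-language corpus, invented for illustration.
target_corpus = (
    "he sat on the bench . "
    "she sat on the bench . "
    "he went to the bank . "
    "she put money in the bank ."
).split()

# Count every three-word sequence in the corpus.
trigram_counts = Counter(
    zip(target_corpus, target_corpus[1:], target_corpus[2:])
)

def pick_word(prev_two, candidates):
    """Return the candidate seen most often after prev_two.

    Ties (including the case where no candidate was ever seen)
    go to whichever candidate is listed first.
    """
    w1, w2 = prev_two
    return max(candidates, key=lambda word: trigram_counts[(w1, w2, word)])

# Hypothetical candidate translations for one ambiguous source word.
candidates = ["bank", "bench"]
print(pick_word(("on", "the"), candidates))
print(pick_word(("to", "the"), candidates))
```

The same candidate list yields a different choice depending on the two preceding words, which is the extra layer of context a word-for-word approach lacks.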

Offhand, I don't know of any sources online to help you, but I do suggest searching for "bigram" or "n-gram" in addition to "trigram." It's all the same concept. Daniel Jurafsky and James Martin's book "Speech and Language Processing" has a well-written chapter on n-grams. (It's a well-known book in the field, so I would expect most libraries to carry it.)

I would expect that the software needs a large training corpus (in the target language) from which it will extract trigrams on its own.
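I have not seen how this program does it, but extracting trigrams from raw text might look roughly like the sketch below. The tokenisation rule, the sentence splitting and the `extract_trigrams` name are my own assumptions, not taken from the software.

```python
import re
from collections import Counter

def extract_trigrams(text):
    """Count word trigrams in text, without crossing sentence boundaries."""
    counts = Counter()
    # Very rough sentence split on ., ! and ? (an assumption for this sketch).
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"\w+", sentence)
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts

# Sample text invented for illustration; a real corpus would be far larger.
sample = "The cat sat on the mat. The dog sat on the rug. The cat sat down."
counts = extract_trigrams(sample)

# List the most frequent trigrams with their counts.
for trigram, count in counts.most_common(3):
    print(" ".join(trigram), count)
```

With a corpus this small most trigrams occur only once, which is why a large training corpus matters: the counts only become useful as probabilities when sequences are seen many times.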

xxxLia Fail  Identity Verified
Local time: 01:59
Spanish to English
+ ...
See this link Apr 24, 2006

Hi Samuel,

I came across this program a long time ago when looking for concordancers.

I wonder would it be of interest?


kfNgram is a free stand-alone Windows program for linguistic research which generates lists of n-grams in text and HTML files. Here n-gram is understood as a sequence of either n words, where n can be any positive integer, also known as lexical bundles, chains, wordgrams, and, in WordSmith, clusters, or else of n characters, also known as chargrams. When not further specified here, n-gram refers to wordgrams. kfNgram also produces and displays lists of "phrase-frames", i.e. groups of wordgrams identical but for a single word.
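To illustrate the three notions in that description (wordgrams, chargrams and phrase-frames), here is a small sketch. It is only my reading of the definitions quoted above, with made-up function names and sample text; it is not how kfNgram itself is implemented.

```python
from collections import defaultdict

def word_ngrams(tokens, n):
    """Wordgrams: every run of n consecutive words."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n):
    """Chargrams: every run of n consecutive characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def phrase_frames(grams):
    """Group wordgrams that are identical except for one word.

    Each frame is a wordgram with one slot replaced by "*";
    the value is the set of words seen in that slot.
    """
    frames = defaultdict(set)
    for gram in set(grams):
        for i in range(len(gram)):
            frame = gram[:i] + ("*",) + gram[i + 1:]
            frames[frame].add(gram[i])
    # Keep only frames whose slot is filled by at least two different words.
    return {f: words for f, words in frames.items() if len(words) > 1}

# Sample text invented for illustration.
tokens = "at the end of the day at the start of the day".split()
frames = phrase_frames(word_ngrams(tokens, 4))

for frame, words in sorted(frames.items()):
    print(" ".join(frame), sorted(words))
print(char_ngrams("gram", 3))
```

Here "at the * of" is a phrase-frame filled by both "end" and "start".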

See also http://www.kwicfinder.com/KWiCFinder.html


Jeff Allen  Identity Verified
Local time: 01:59
+ ...
n-grams and MT Apr 24, 2006

Samuel Murray wrote:
Does anyone know of some resources on trigrams in machine translation context for me?

(PS. Where's Jeff?)

Hi Samuel,
Sorry for my absence. I indicated in another forum on ProZ a couple of weeks ago that I've been preparing the company I work for for a major audit of the entire R&D division.
Also had a bunch of other deadlines recently in parallel.
I've been popping in and out of ProZ when I have a few minutes, but that has been little lately.
Shame that there isn't a "busy" or "gone" status button on ProZ profiles.

Jennifer Baldwin's explanation above in this thread is a good intro to n-gram analysis. I've used it mainly for speech data processing, and only somewhat for MT systems, since most MT packages I work with are rule-based systems locked up in a commercial package. Statistical and example-based MT use these n-gram methods a lot.
Search on Andy Way at Dublin City University and Michael Carl at the IAI at Saarbrücken with regard to example-based MT. They surely give plenty of examples of their use of n-grams.


Jeff Allen, PhD

