Statistical Machine Translation and Example-based Machine Translation
Copyright © ProZ.com, 1999-2015. All rights reserved.
This comparative study of machine translation (henceforth, MT) is focussed on corpus-based approaches, in particular, Statistical Machine Translation (SMT) and Example-based Machine Translation (EBMT).
1.1 Corpus-based approaches in MT history
In the early days of MT (1950s and 1960s) there were two contrasting approaches (Hutchins, 2006:380), usually dubbed the ‘empiricists’ and the ‘rationalists’. The empiricists’ methodologies included elements of statistical techniques in analysing texts to derive dictionary rules, whilst the rationalists took a strictly linguistic approach, perhaps due to the inadequate computer facilities at the time. In the quiet years after the 1966 ALPAC report, research was severely underfunded and only a few systems survived. Systran, a rationalist, rule-based MT (RBMT) system, was one of them, and consequently RBMT approaches were dominant in the 1970s and 1980s, seemingly viewed as the ‘safe’ option. Although corpora had previously been used indirectly in RBMT systems for database compilation and other statistical information, systems introduced in the 1990s exploited corpora directly in the analysis, transfer and generation of translations (Hutchins, 1993:6) and required little or no linguistic programming. New life was breathed into the early empiricist models, and corpus-based approaches were deemed “a new departure in MT research” (Hutchins, 1993:6), providing “an alternative to the intractable complexity of rule-based approaches” (Hutchins, 1994:12). Such approaches had been theorised and awaited for decades, but were only made possible by faster computers, large bilingual corpora, alignment tools and other technologies, which have furnished MT with a wealth of possibilities. Since then, corpus-based approaches have gone from strength to strength and now form the backbone of many commercial MT systems (e.g. Google Translate is statistical).
An example-based approach was first proposed by Makoto Nagao in 1981 to translate abstracts in English from Japanese (Carl and Way, 2003:viii), but his vision was only realised towards the end of the decade, principally by research groups in Japan. Later, pleased with the efficacy of stochastic techniques in speech recognition, a group from IBM decided to revive statistical methods in MT. The results from their MT system ‘Candide’ were presented by Peter Brown at the 1988 TMI conference, and SMT emerged as a success, with over half of the translations acceptable. The time was characterised by Frederick Jelinek’s infamous statement, “every time I fire a linguist, my system’s performance improves”, an overt disparagement of rule-based approaches (Carl and Way, 2003:3). Since the 1990s, large bilingual corpora have been built in almost every significant domain and language pair, and the success of corpus-based approaches looks set to increase with them.
1.2 Example-based MT
The founding principle of EBMT is to “translate a source sentence by imitating the translation of a similar sentence already in the database” (Sato and Nagao, 1990:1). EBMT requires a bilingual corpus of translation pairs (bitexts), normally aligned at sentence level. Exact matches are rare, and the probability of retrieving one decreases as sentence length and complexity increase, so systems facilitate partial matching (compositional EBMT). An algorithm is employed to match SL input strings against SL strings in the corpus. The corresponding TL segments are then selected and recombined into a translation of the original sentence, once the language model has verified that they form an acceptable sentence in the TL.
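The matching step can be illustrated with a minimal sketch. It is not taken from any of the systems cited: the three-sentence example base, the function names and the choice of Python's standard `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, and a real EBMT system would go on to adapt and recombine the retrieved fragments.

```python
from difflib import SequenceMatcher

# Hypothetical toy example base: (SL sentence, TL sentence) pairs.
EXAMPLES = [
    ("the cat sleeps on the rug", "le chat dort sur le tapis"),
    ("the dog sleeps on the floor", "le chien dort sur le sol"),
    ("she reads a book", "elle lit un livre"),
]

def similarity(a, b):
    """Word-level similarity in [0, 1] between two sentences."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def best_match(source, examples=EXAMPLES):
    """Return (score, SL example, TL example) for the closest stored example."""
    return max((similarity(source, s), s, t) for s, t in examples)

# No exact match exists for this input, so the closest example is retrieved.
score, src, tgt = best_match("the cat sleeps on the bed")
print(f"{score:.2f}  {src!r} -> {tgt!r}")
```

Here the retrieved translation still contains ‘tapis’ where the input has ‘bed’; replacing that fragment is the adaptation stage, which this sketch leaves out.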
Although the core processes of EBMT - matching, retrieval, adaptation - distinguish it from other paradigms (such as RBMT), it has many ancillary processes reminiscent of these approaches. Thesauri and dictionaries can provide a similarity measure between potentially polysemous words in context and so facilitate ‘fuzzy’ matching, and the system allows for variables such as proper names, dates and numerals by recognising structural matches from parsed strings (Hutchins, 2005b; Carl and Way, 2003:xix).
1.3 Statistical MT
SMT is founded on the theory that every source language segment (S) has any number of possible translations (T), and the most appropriate is the translation that is assigned the highest probability by the system. It requires a bilingual corpus for each language pair, a monolingual corpus for each target language (TL), a language modeller and a decoder.
The bilingual corpus is aligned and tagged to correspond at word level. The alignment is described in terms of fertility, that is, the number of TL words a source-language (SL) word can give rise to. The system allows for crossed links caused by word order differences (known as distortion), and varying degrees of fertility, such as one-to-one equivalence (fertility=1), one-to-many equivalence (fertility of 2 or more) and one-to-zero equivalence (fertility=0).
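A small sketch may make fertility and crossed links concrete. The sentence pair, the alignment links and the function names below are hypothetical, chosen only so that fertilities of 0, 1 and 2 and one crossing all appear; real systems estimate alignments statistically rather than receiving them by hand.

```python
from collections import Counter

# Hypothetical aligned sentence pair. Each link is (SL index, TL index).
source = ["I", "do", "not", "swim"]
target = ["je", "ne", "nage", "pas"]
links = [(0, 0), (2, 1), (2, 3), (3, 2)]  # "do" has no TL counterpart

def fertilities(source_words, alignment):
    """Number of TL words linked to each SL word."""
    counts = Counter(s for s, _ in alignment)
    return [(word, counts[i]) for i, word in enumerate(source_words)]

def has_crossing(alignment):
    """True if two links cross, i.e. SL order and TL order disagree."""
    return any(s1 < s2 and t1 > t2
               for s1, t1 in alignment
               for s2, t2 in alignment)

print(fertilities(source, links))  # [('I', 1), ('do', 0), ('not', 2), ('swim', 1)]
print(has_crossing(links))         # True: "not" -> "pas" crosses "swim" -> "nage"
```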
The statistical translation model decomposes sentences in the bilingual corpus into smaller chunks and then calculates the probability of each TL phrase being a translation of its parallel SL phrase. The probabilities are stored as ‘phrase tables’, in which multiple TL phrases are listed as possible translations of an SL phrase, each with a different probability (Callison-Burch and Koehn). A language model analyses the monolingual TL corpus in order to ‘learn’ a sense of grammaticality (e.g. word order), based on n-gram statistics (usually bigrams or trigrams), and then calculates the probability of one word following another in the TL. The probabilities are calculated during the preparation stage and stored.
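Both sets of probabilities can be sketched in a few lines. The following is an illustration under stated assumptions rather than a description of any cited system: it assumes phrase pairs have already been extracted, uses plain relative frequencies with no smoothing, and all data and function names are invented for the example.

```python
from collections import Counter, defaultdict

def phrase_table(phrase_pairs):
    """Relative-frequency estimate of P(TL phrase | SL phrase)."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(src for src, _ in phrase_pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return dict(table)

def bigram_model(sentences):
    """Unsmoothed maximum-likelihood estimate of P(word | previous word)."""
    bigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(words[:-1])             # every word that has a successor
        bigrams.update(zip(words, words[1:]))   # adjacent word pairs
    def prob(prev, word):
        return bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0
    return prob

# Hypothetical extracted phrase pairs and monolingual TL corpus.
pairs = [("la maison", "the house"), ("la maison", "the house"),
         ("la maison", "the home"), ("bleue", "blue")]
table = phrase_table(pairs)
prob = bigram_model(["the house is blue", "the house is small", "the home is blue"])

print(table["la maison"])    # "the house" is twice as likely as "the home"
print(prob("the", "house"))  # 2 of the 3 occurrences of "the" precede "house"
print(prob("house", "the"))  # unseen word order receives probability 0.0
```

Because the model is unsmoothed, any bigram absent from the corpus receives zero probability; practical language models add smoothing to avoid this.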
When presented with a new text to translate, the system segments each SL sentence into smaller phrases. These are matched against the SL phrases derived from the corpus and their translations harvested by the decoder. As the search space is theoretically infinite, the decoder uses a heuristic search algorithm to harvest and select appropriate translations: “we build an English sentence incrementally ... keep a stack of partial translation hypotheses. At each point, we extend these hypotheses ... then prune the stack” (Manning and Schütze, 1999:487). The most probable translation is constructed and then verified by the language model as a valid TL sentence. If it is not valid, the process must be repeated. The result is a global maximum of the product of the two probabilities (ten Hacken, 2001:10), and is calculated with Bayes’ Theorem, a probability inference equation (Quah, 2006:77).
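The product of the two probabilities can be written out explicitly. The formula below is the standard noisy-channel formulation of SMT, given here as a reminder rather than quoted from the sources cited; it uses S and T as defined at the start of this section, and the hat notation for the selected translation is an added convention.

```latex
\hat{T} = \arg\max_{T} P(T \mid S)
        = \arg\max_{T} \frac{P(S \mid T)\, P(T)}{P(S)}
        = \arg\max_{T} P(S \mid T)\, P(T)
```

The denominator P(S) can be dropped because it is the same for every candidate T. In this formulation the translation model supplies P(S | T) and the language model supplies P(T).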
1.4 Pertinent comparisons
Turcato and Popowich (2003:1) suggest three questions for classifying MT approaches, which give an excellent framework in which to discuss the similarities and differences between SMT and EBMT. As corpus-based approaches are so linguistically light, I have rephrased “linguistic information” as simply “information”, giving:
1. What information is used, i.e. linguistic or non-linguistic?
2. From where is information acquired?
3. When is information acquired?
“A fundamental distinction exists between systems that use linguistic knowledge and those that do not. Statistical MT falls into the latter class” (ibid., p3). SMT uses purely statistical methods in the analysis and generation of texts, whereas EBMT uses a variety of linguistic resources, such as dictionaries and thesauri for the similarity measure, a bilingual lexicon for substitutions, and even parsing or morphological analysis at the analysis stage (ibid., p1).
With regard to the source of information, both SMT and EBMT use a corpus as their main data source. However, in SMT the corpus is simply the raw material from which the main data source - the probabilities - is created, whilst EBMT takes information directly from the corpus.
EBMT uses information from the corpus at run-time, i.e. when a new translation has been requested. In SMT, there is no consultation of the corpus at run-time, as all data is derived in advance of the translation process. From this, Turcato and Popowich make a distinction between implicit and explicit knowledge: EBMT leaves the information unexpressed (implicit) in the corpus to be used when requested, whilst SMT extracts it (making it explicit) in the form of probabilities. When given a choice between time and space, SMT has chosen time (the probabilities do not need to be derived at run-time, but more storage is needed to hold them) and EBMT has chosen space (all matching and selection must be performed at run-time, but no long-term storage of derived data is required).
Both EBMT and SMT systems suffer from ‘boundary friction’, when the putative sentences contain syntactic and morphological discrepancies. SMT uses n-gram statistics in the language model to remedy the problem, but EBMT systems approach this issue in a variety of ways: ‘disalignment’ (Carl and Way, 2003:28), in which the new TL sentence is compared to TL sentences in the database in the same way as the ST is at the matching stage; checking which translation candidate occurs most often on the World Wide Web (Kristiansen, 2006:476); or the syntactic parsing or part-of-speech tagging of the input and of the examples in the database (Hutchins, 2005b).
By definition, corpus-based approaches are primarily reliant on a bilingual corpus, with which there may be scalability issues (Quah, 2006:84). If the corpus is too small or the subject matter too theoretical, then close matches will be hard to retrieve. Conversely, if it is too large, the repetition of examples may have adverse effects on performance. Some consider corpus-based approaches to be best suited to sublanguages (Carl and Way 2003:9), although this may be an accidental assumption caused by the predefined domain types of corpora used (e.g. parliamentary debates, EU financial documentation).
SMT is currently seen as the dominant MT approach (Hutchins 2007), whilst EBMT has struggled to find its identity. There are so many variations of the approach, such as the inclusion of rule-based methods (e.g. parsing) and statistical analysis, that there is “no clear consensus on what EBMT is or isn’t” (Hutchins, 2005c:1). The result is the assimilation of EBMT methodologies into other paradigms, leading to hybrid EBMT-RBMT or EBMT-SMT systems (Hutchins, 2005c:2). It is now accepted that a single approach will not maximise output quality, and hybrids are viewed as the best option. The best of all approaches can be combined to maximise results, for example using statistical methods for transfer and generation with a syntactic and morphological base for analysis (Hutchins, 2007:14). Perhaps neither approach has established a new paradigm, but they have brought fresh ideas to the MT table and enhanced existing systems. They are now an integral part of the machine translation process.
Brown, P. et al. (1993). ‘The Mathematics of Statistical Machine Translation: Parameter Estimation’. Computational Linguistics 19 (2).
Callison-Burch, C. and Koehn, P., “Introduction to Statistical Machine Translation”, (Read online, December 2008: http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/esslli-slides-day1.pdf)
Carl, M. and Way, A., (Eds) (2003) Recent Advances in Example-based Machine Translation. Springer.
Hutchins, J. and Somers, H., (1992). An Introduction to Machine Translation. California: Academic Press.
Hutchins, J. (1993). Latest developments in machine translation technology: beginning a new era in MT research. In MT Summit IV Proceedings, Kobe, Japan, 20-22 July 1993. (Read online, December 2008: http://www.hutchinsweb.me.uk/MTS-1993.pdf)
Hutchins, J. (1994) Machine Translation: Ten Years On, 12-14 November 1994. Organised by Cranfield University in conjunction with the Natural Language Translation Specialist Group of the British Computer Society (BCS-NLTSG). (Read online, December 2008: http://www.hutchinsweb.me.uk/Cranfield-1994.pdf)
Hutchins, J. (1995a). Reflections on the history and present state of machine translation. In MT Summit V Proceedings, Luxemburg, 10-13 July 1995. (Read online, December 2008: http://www.hutchinsweb.me.uk/MTS-1995.pdf)
Hutchins, J. (1995b). A new era in machine translation research. Aslib Proceedings 47 (1): 211 - 219. (Read online, December 2008: http://www.hutchinsweb.me.uk/AslibProc-1995.pdf)
Hutchins, J. (2001). Machine translation over fifty years. Histoire, Epistemologie, Langage: Vol. 23 (1): 7-31. (Read online, December 2008: http://www.hutchinsweb.me.uk/HEL-2001.pdf)
Hutchins, J. (2003) Machine Translation: A General Overview. In Mitkov, R. (Ed.)(2003.) The Oxford Handbook of Computational Linguistics. Oxford: University Press. pp.501 - 11. (Read online, December 2008: http://www.hutchinsweb.me.uk/Mitkov-2003.pdf)
Hutchins, J. (2005b) Example-based Machine Translation: A review and commentary. Machine Translation 19: 197 - 211. (Read online, December 2008: http://www.hutchinsweb.me.uk/MTJ-2005.pdf)
Hutchins, J. (2005c). Towards a definition of Example-based machine translation. In MT Summit X Proceedings, Phuket, Thailand. 16 September 2005. (Read online, December 2008: http://www.hutchinsweb.me.uk/MTS-2005.pdf)
Hutchins, J. (2006). Machine translation: history of research and use. In Brown, K. (Ed.) (2006) Encyclopedia of Languages and Linguistics: 2nd Edition. Oxford: Elsevier. (Read online, December 2008: http://www.hutchinsweb.me.uk/EncLangLing-2006.pdf)
Hutchins, J. (2007). Machine Translation: A concise history. To be published in Chan Sin Wai (Ed.) (2007) Computer aided translation: Theory and practice. China: Chinese University of Hong Kong. (Read online, December 2008: http://www.hutchinsweb.me.uk/CUHK-2006.pdf)
Kristiansen, G., (2006) Cognitive Linguistics. Walter de Gruyter.
Manning, C. D. and Schütze, H., (1999) Foundations of Statistical Natural Language Processing. MIT Press.
Sato, S. and Nagao, M. (1990) “Toward Memory-based Translation”, Proceedings of COLING 1990, Finland. (Read online, December 2008: http://portal.acm.org/citation.cfm?id=991190&dl=)
Somers, H., (1999). Review Article: Example-based Machine Translation, Machine Translation 14: 113-157 (Read online, December 2008: http://www.springerlink.com/content/k00lw822j783503t/fulltext.pdf)
Somers, H. (2001). EBMT Seen as Case-based Reading. In MT Summit VIII Workshop Proceedings on Example-based Machine Translation. (Read online, December 2008: http://www.iai.uni-sb.de/~carl/ebmt-workshop/hs.pdf)
Somers, H. (Ed.) (2003) Computers and Translation: A Translator’s Guide. John Benjamins Publishing Company.
ten Hacken, P., (2001). Has There Been a Revolution in Machine Translation? Machine Translation 16: 1-19
ten Hacken, P., (Michaelmas Term 2008) Lecture notes.
ten Hacken, P., (November 2008). Conversation regarding SMT.
Turcato, D. and Popowich, F. (2003). “What is Example-Based Machine Translation?” (Read online, December 2008: http://www.iai.uni-sb.de/~carl/ebmt-workshop/dt.pdf)
Quah, C. K. (2006) Translation and Technology. London: Palgrave Macmillan