Brief survey on machine translation use, input appreciated
Thread poster: Jared Tabor

Jared Tabor
Local time: 05:04
Sep 20, 2011

Hello all,

I would like to invite you to provide your input through a brief survey on the use of machine translation:

The results of this and other surveys will be shared at the upcoming virtual events series,




Katalin Horváth McClure  Identity Verified
United States
Local time: 04:04
Member (2002)
English to Hungarian
+ ...
You should ask for the language pairs Sep 21, 2011

Hi Jared,
In the survey, if a person answers Yes to using MT, I think you should ask for the language pairs, as it is an important piece of info. The usefulness of currently available MT systems greatly depends on the language pair they are being used for.
Without this info, I think the survey results may be hard to interpret correctly.
Another piece of info you may want to ask for is whether the respondent is a freelancer or an agency.

[Edited at 2011-09-21 17:14 GMT]


Jared Tabor
Local time: 05:04
Thanks Katalin Sep 21, 2011

Hi Katalin,

Thanks for the feedback. I agree that the language pair is an important factor in how useful MT will be for some tasks.



Jeff Allen  Identity Verified
Local time: 10:04
+ ...
language directions for MT Sep 21, 2011

It is actually the language direction, and not just the language pair, that affects the level of MT. Here are a few examples of factors that enter into the quality equation.

1) Market need:

MT systems and software aimed at translation for publication (outward bound translation) -- basically what most professional translators produce translations for -- will usually be better in language directions FROM English as the source language to other target languages. This is because a significant amount of content is written in English. English to FIGS (French, Italian, German, Spanish) + RU directions tend to be the strongest.

MT systems aimed at content gisting (known as inward bound translation, and also as requests for draft translations for understanding-only needs), for languages in general (including less widely used languages), will be better from other languages INTO English. These tend to be used for intelligence community content filtering and gisting. These language directions usually include FIGS, Asian languages, Russian, & Middle Eastern languages (+ any language which is on the radar for the intelligence community) INTO English.

2) MT system type:

- Rule and dictionary based MT systems (RBMT) have traditionally been developed per language "direction" based on the needs mentioned above. These depend on the availability of grammar rules to enter into the system for both source and target language and on the creation of a built-in bilingual glossary/dictionary.

It is usually the less commonly used languages which are developed the least for such MT systems, because it takes a fair amount of upfront development (usually about 2 years as a starter) to get these systems to maturity, so it requires a clearly defined business need and accompanying budget.

- Statistical MT systems usually try to start out as language independent from the technical system side (because they are looking at combinations of characters and other forms), and yet they mainly depend on the availability of significant "quantities" of source and target language "content". These are usually abundant for FIGES, Russian and Asian languages, and much less available for any other language that does not fit in the vague group of "major world languages".

Hybrid MT systems are now trying to address these concerns.

3) Maturity of the language direction + maturity of the language resource content:

- Rule-based MT language directions that have been around for 30+ years (such as EN-FR) will in general provide better quality output than brand new language direction projects (such as EN-Swedish) until the new projects stabilize over time.

- Statistical MT systems are dependent on both the quality of the texts that are used to train the system and the quantity of texts that are available.

4) language typologies and language complexity

When a language direction is developed for languages with the same general language typology (language structure similarity), there is a general tendency to produce better MT quality. This can be seen in that EN-FR and EN-ES are better than EN-DE.
Language directions involving significantly different typologies, such as EN and JP, are a next level of challenge.
This is why "Knowledge-based MT systems (KBMT)", based on underlying semantic analysis, have produced better quality for such language directions than rule-based or even statistical systems. The downside is that KBMT systems are very time-consuming (and costly) to create.

5) MT language pivoting

Any language direction that falls "outside" of being paired with English (or possibly French in the case of Systran) -- as in language X <> English -- risks using English as a pivot language to provide the language direction (for example: Swedish > English > Norwegian, Korean > English > Japanese).
This is not very different from Language Service Providers/translation agencies who run into the same problem on major projects with multiple language pairs when they cannot find professional translators who can do some or all language directions directly, so they sometimes pass through English as a pivot language. We all know that this requires more time, much more quality care, and more money. Using a pivot language for MT suffers from the same problem and is even more risky.
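
To make the pivoting risk concrete, here is a minimal sketch and not how any real MT system works. It assumes toy word-for-word glossaries (FR_TO_EN, EN_TO_ES and FR_TO_ES are invented for illustration) and uses French > English > Spanish only because the ambiguity of English "spring" is easy to see. The direct route keeps the season, while the route through English picks the mechanical spring.

```python
from typing import Dict, List

Glossary = Dict[str, str]

# Toy single-sense glossaries, invented for illustration only.
FR_TO_EN: Glossary = {"printemps": "spring", "arrive": "arrives"}
# The English pivot word "spring" is ambiguous; this glossary happens to
# store the mechanical sense ("muelle") rather than the season.
EN_TO_ES: Glossary = {"spring": "muelle", "arrives": "llega"}
FR_TO_ES: Glossary = {"printemps": "primavera", "arrive": "llega"}


def translate(words: List[str], glossary: Glossary) -> List[str]:
    """Word-for-word lookup; unknown words pass through unchanged."""
    return [glossary.get(word, word) for word in words]


def pivot_translate(words: List[str], first_leg: Glossary,
                    second_leg: Glossary) -> List[str]:
    """Translate via a pivot language by chaining two directions."""
    return translate(translate(words, first_leg), second_leg)


source = ["printemps", "arrive"]
direct = translate(source, FR_TO_ES)
pivoted = pivot_translate(source, FR_TO_EN, EN_TO_ES)
print("direct: ", direct)
print("pivoted:", pivoted)
```

Any sense lost or mistaken on the first leg cannot be recovered on the second leg, which is why errors tend to compound when pivoting.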

6) Comparative timeline
A number of years back we conducted a comparative timeline study of developing both MT and speech systems for several language pairs and presented it in:

Many of the issues we described in that presentation/article are still valid today in the cases of points 1-4 above.

Thus language pairs (and I would even say language directions) are quite important when looking at evaluating MT quality.
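
As a rough sketch of why this matters for a survey, the snippet below groups ratings first by direction and then by unordered pair. The RESPONSES data and the 1-5 usefulness scale are hypothetical and are not taken from the actual survey. Grouping by pair merges EN > FR with FR > EN and hides the difference between them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical answers: (source language, target language, rating 1-5).
RESPONSES = [
    ("EN", "FR", 4),
    ("EN", "FR", 5),
    ("FR", "EN", 3),
    ("EN", "JA", 2),
    ("JA", "EN", 3),
]


def mean_by_direction(responses):
    """Average rating per ordered (source, target) direction."""
    buckets = defaultdict(list)
    for source, target, rating in responses:
        buckets[(source, target)].append(rating)
    return {key: mean(values) for key, values in buckets.items()}


def mean_by_pair(responses):
    """Average rating per unordered pair; direction is discarded."""
    buckets = defaultdict(list)
    for source, target, rating in responses:
        buckets[frozenset((source, target))].append(rating)
    return {key: mean(values) for key, values in buckets.items()}


by_direction = mean_by_direction(RESPONSES)
by_pair = mean_by_pair(RESPONSES)
print(by_direction[("EN", "FR")], by_direction[("FR", "EN")])
print(by_pair[frozenset(("EN", "FR"))])
```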



Jared Tabor
Local time: 05:04
Thanks, Jeff Sep 21, 2011

Excellent stuff, thanks Jeff!



Neil Coffey  Identity Verified
United Kingdom
Local time: 09:04
French to English
+ ...
Also.. Sep 22, 2011

As well as the factors Jeff mentions, another factor is simply what parallel corpora are available for training in the language pair in question. This applies to statistical systems, but in reality current systems -- notably Google Translate -- tend to be of the statistical type.

