Thread poster: Barnaby Capel-Dunn
Is this the future? Automatic simultaneous translation within 5 years?

Victor Dewsbery  Identity Verified
Germany
Local time: 03:21
German to English
+ ...
Stupid box of wires! Oct 10, 2005

Try throwing a typo at the compruter and then seehing whatt coms ouit in the wash.

You can spot the deliberate typos in the first sentence (after all, you're a language expert). Your spellchecker will probably spot them, too (although it won't ***really*** know what to do with them). But just try feeding that sentence to your favourite MT system!
Ten points to the MT system that gives you a cheeky look and says "Will you please stop pulling my leg?".

Of course, the MT aficionados will tell you that you must proofread everything in the source language before you feed it through their system, and that you should be gentle with idioms (so my first sentence fails on that count, too).
In other words, the systems are "challenged" if you give them real text to cope with. (When did you last have a source text that didn't contain typos and loosely used language?)

Does anybody here feel threatened?


 

Peter Linton  Identity Verified
United Kingdom
Local time: 02:21
Member (2002)
Swedish to English
+ ...
Example-based machine translation Oct 10, 2005

Barnaby Capel-Dunn raises an important point (see his discussion of MT and CAT) about Internet-based translation, or what is often called Example-based machine translation. That, I agree, provides a promising avenue for advanced CAT.

It will perhaps not provide good MT itself, because of the problem of evaluating the quality of the translations found. Nevertheless, an interesting area to watch.


 
Barnaby Capel-Dunn  Identity Verified
France
Local time: 03:21
French to English
TOPIC STARTER
One Hundred Years of Solitude Oct 10, 2005

Point taken, Gabriel. But what proportion of professional translations fall into this category? My guess is that most texts we are required to translate are only too amenable to the sort of advances in technology we have been talking about.

 

Williamson  Identity Verified
United Kingdom
Local time: 02:21
Flemish to English
+ ...
Technological progress Oct 10, 2005

I forgot to mention technological progress in the field of robotics. Didn't the Japanese make a female human-like robot recently? In that field, too, progress will be made. Once a machine can "understand" language, it can also be programmed to interpret or translate much faster than a human being.

 
RobinB  Identity Verified
Germany
Local time: 03:21
German to English
EBMT Oct 10, 2005


Peter Linton wrote:
Internet-based translation, or what is often called Example-based machine translation...

It will perhaps not provide good MT itself, because of the problem of evaluating the quality of the translations found.


Well, the concept of EBMT has been around for quite a while now, but the Internet has certainly given it an impetus because of the huge number of parallel corpora now available.

But as you more or less say yourself, the problem here is the age-old one: GIGO. Isn't the UN University trying to develop an EBMT system (possibly together with Google)? How on earth are they going to validate the quality of the source and target documents? Of course, they can't do that, it would take decades, if not centuries.

EBMT might, conceivably, work for extremely narrowly defined subject areas, in a small number of language pairs, where the source texts for translation display a high degree of homogeneity in terms of structure and content (and we're moving towards controlled language here). But you'd still need a dozen or so person-years to validate the reference documents. And by the time you'd done that, the documents would be obsolete...

[Edited at 2005-10-10 21:01]


 
RobinB  Identity Verified
Germany
Local time: 03:21
German to English
MT and CAT redux Oct 10, 2005


Barnaby Capel-Dunn wrote:
If we accept that CAT is basically a question of harnessing the contents of our hard disk for the purposes of translation, why shouldn't a similar approach covering the whole of the Internet be used in the years ahead? Strides have already been made in this direction, e.g. Transearch.


Surely the problem with the current generation of CAT (i.e. TM) applications is that they're essentially dumb systems. Basically, they're nothing more than glorified pattern recognition and matching systems, and their USPs revolve around user-friendliness and price, rather than any significant technological advantages. To put it another way: there aren't any killer apps.
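To make the "pattern matching" point concrete, here is a minimal sketch of a fuzzy TM lookup. It is only an illustration: the two stored segments, the German translations, the 0.75 threshold and the function name best_match are all made up for the example, and no commercial CAT tool is claimed to work exactly this way. It uses Python's standard difflib module to score surface similarity.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
TM = {
    "Press the red button to stop the engine.":
        "Drücken Sie den roten Knopf, um den Motor zu stoppen.",
    "Check the oil level before starting.":
        "Prüfen Sie vor dem Start den Ölstand.",
}

def best_match(segment, memory, threshold=0.75):
    """Return (score, source, target) for the closest stored segment,
    or None if nothing reaches the (assumed) fuzzy-match threshold."""
    best = None
    for source, target in memory.items():
        # Pure character-level similarity; no grammar, no meaning.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None

match = best_match("Press the green button to stop the engine.", TM)
if match:
    score, source, target = match
    print(f"{score:.0%} match: {target}")
```

The lookup returns the stored "red button" translation as a high fuzzy match for the "green button" sentence. Nothing in it knows which word changed or how to adapt the target, which is the sense in which such systems are "dumb".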

Adding morphology to 1G TM systems would appear to be the logical way forward, but there don't seem to be any serious efforts in this direction (Trados did have such a project, but abandoned it a few years back). Of course, this means developing language pair-specific systems (or modules), which would up the price. But provided the cost/benefit remained at a reasonable level, there would surely be a market (2000 euros for a TM system with integrated grammar rules, anybody?).

Systems like TranSearch are no doubt useful, but they depend on having high quality corpora in the first place. And correct me if I'm wrong, but I'd have thought that the cross-domain usefulness would be rather limited.

Perhaps we should be demanding more of the CAT manufacturers than merely recycling/repackaging what's essentially 1980s technology.


 
Tsu Dho Nimh
United States
Local time: 19:21
English
Yes ... certainly, Your Billness! Oct 11, 2005


Barnaby Capel-Dunn wrote:

Herewith a translation of a short article appearing in the reputable French online journal "Génération Nouvelles Technologies" (http://www.generation-nt.com/actualites/).

"... Microsoft's founder thinks that within 5 years, the keyboard-voice-writing on screen combination should revolutionise our way of working, in the same way as SIMULTANEOUS TRANSLATION WHICH SHOULD COME INTO EFFECT AT THIS TIME (my capitals)".
Oh well, we have been warned!

[Subject edited by staff or moderator 2005-10-10 13:30]


This from the company that is how many years late with their release of Longhorn/Vista?

Until the grammar checker in MSFT Word can match the average 5th-grader's grammar skill, we have nothing to fear from Microsoft.


 

Andrew Steel  Identity Verified
Spain
Local time: 03:21
Spanish to English
Is this the future? Automatic simultaneous translation within 5 years? Oct 11, 2005

My feeling is that there is no need for translators to worry about being usurped, though we will need to worry about keeping up with technology.

I don't think anybody who has looked closely at MT and the predictions for the future believes that these systems will eliminate human intervention.

Let's take a look at the impact of technology on other professions, such as architects and accountants. CAD software and accountancy packages have changed their work enormously. However, even though a client could buy the programs from their local dealer, they would be unlikely to achieve the same results as the professionals.

It is the knowledge of how to fully exploit the technology for a specific purpose, not the technology per se, that clients will pay for.

What technology has done for architects and accountants is enable them to provide a better quality service (3D virtual tours of their designs, or elimination of mathematical errors) much more rapidly than they could in the past, essentially offering a better price-to-quality ratio.

However, I don't see any evidence of architects and accountants being made obsolete.

The key for translators, like the aforementioned professions, is that we need to view our work primarily in terms of rate per hour, not per unit of work performed.

Therefore, taking a hypothetical case, if technology enables us to produce an average of 1,000 words per hour instead of the current benchmark of, let's say, 500, even though the rate per word may halve, we are still earning the same rate per hour.

The problem/opportunity arises because there will be a period in which not everyone will be able to average 1,000 words/hour as they will not have adopted and mastered the technology. Early adopters will earn higher than average hourly rates, late adopters will earn below average rates.
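The hypothetical case above comes down to simple arithmetic, sketched here in Python. The price of 10 cents per word is purely an assumption for illustration (no per-word price is given in this post), and working in whole cents is just a way to keep the numbers exact.

```python
def hourly_earnings(words_per_hour, cents_per_word):
    """Earnings per hour in cents: output speed times price per word."""
    return words_per_hour * cents_per_word

OLD_RATE = 10             # hypothetical: 10 cents per word (assumed figure)
NEW_RATE = OLD_RATE // 2  # "the rate per word may halve"

scenarios = {
    "today (500 w/h, old rate)": hourly_earnings(500, OLD_RATE),
    "everyone caught up (1,000 w/h, halved rate)": hourly_earnings(1000, NEW_RATE),
    "early adopter (1,000 w/h, old rate)": hourly_earnings(1000, OLD_RATE),
    "late adopter (500 w/h, halved rate)": hourly_earnings(500, NEW_RATE),
}

for label, cents in scenarios.items():
    print(f"{label}: {cents / 100:.2f} per hour")
```

Under these assumed figures, doubling output while halving the word rate leaves the hourly figure unchanged, the early adopter earns double it during the transition, and the late adopter earns half.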

The fact that, under this hypothetical case, clients will potentially receive twice the volume of translation within the same deadline, and for the same price, will be a huge benefit for them, and is likely to lead to them translating more volume. This factor, combined with the ever-increasing volume of translation work required worldwide, means that competent translators are still likely to have enough well-paid work to make remaining in the profession worthwhile.

So, under this scenario, translation is subject to a cycle seen in many other professions/industries:

- practitioners adopt technology that enables them to increase output whilst maintaining quality.
- early adopters of the right technology have an advantage for 3-5 years until everyone else catches up.
- practitioners adopt new technology that enables them to increase output whilst maintaining quality.

In conclusion, I don't see acceptable MT being achieved within 5 or even 10 years, but I do see the benchmark for words/hour rising significantly from the current average of 500 (or whatever it is for each language combination), whilst price per word remains steady or falls.

Just a few thoughts,


Andrew


 
Barnaby Capel-Dunn  Identity Verified
France
Local time: 03:21
French to English
TOPIC STARTER
The voice of wisdom Oct 11, 2005

I absolutely agree with everything you write, Andrew. As you say, the problem/challenge/opportunity is to keep abreast of developments and this is not always an easy task, especially as one gets older (my case!). But I wonder how many people would agree with me that they get almost as much pleasure out of dealing with the technical environment surrounding our profession as from the process of translation itself. When one translates in industrial quantities - as we have to do if we are to make a living - the "joy" of translating inevitably palls a little but I feel that the "technical" aspect (keeping up to date with developments in CAT and other software, Internet search, etc.) can go some way towards offsetting this phenomenon.
What do you think?


 

Riccardo Schiaffino  Identity Verified
United States
Local time: 19:21
Member (2002)
English to Italian
+ ...
Your examples through actual MT Oct 12, 2005


Peter Linton wrote:
The monkey ate the banana because it was hungry.
The monkey ate the banana because it was ripe.
The monkey ate the banana because it was time for tea.


Peter, if I run your examples through an actual MT system, this is what I get (into Italian):

La scimmia ha mangiato la banana perché era affamata.
[correct]
La scimmia ha mangiato la banana perché era matura.
[correct]
La scimmia ha mangiato la banana perché era tempo per tè.
[not correct: should be "l'ora del tè", but easily intelligible]

Of course, both "scimmia" and "banana" are feminine in Italian, so I replaced the monkey with a lion:

Il leone ha mangiato la banana perché era affamato.
[correct]
Il leone ha mangiato la banana perché era maturo.
[not correct: translated as if "it" referred to the lion, not the banana]
Il leone ha mangiato la banana perché era tempo per tè.
[not correct: should be "l'ora del tè", but easily intelligible]

This test was done using a free demo of an MT program that uses the Systran engine.
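The agreement problem in the lion examples can be sketched in a few lines of Python. This is a toy under loud assumptions: the mini-lexicon, the "kind" labels and the function agree are invented for illustration, and the naive "attach the pronoun to the subject" rule is only a guess at a behaviour that would produce the output above, not a description of how the tested engine works.

```python
# Hypothetical mini-lexicon: grammatical gender and a coarse semantic class.
NOUNS = {
    "scimmia": {"gender": "f", "kind": "animal"},
    "leone": {"gender": "m", "kind": "animal"},
    "banana": {"gender": "f", "kind": "fruit"},
}

# Adjective stems and the kind of noun each can plausibly describe.
ADJECTIVES = {
    "hungry": {"stem": "affamat", "applies_to": "animal"},
    "ripe": {"stem": "matur", "applies_to": "fruit"},
}

def agree(adjective, subject, obj, use_semantics):
    """Inflect the Italian adjective for whichever noun "it" is taken to mean."""
    entry = ADJECTIVES[adjective]
    antecedent = subject  # naive default: "it" refers to the subject
    if use_semantics:
        # Prefer the noun whose semantic class fits the adjective.
        for noun in (subject, obj):
            if NOUNS[noun]["kind"] == entry["applies_to"]:
                antecedent = noun
                break
    ending = "a" if NOUNS[antecedent]["gender"] == "f" else "o"
    return entry["stem"] + ending

print(agree("ripe", "scimmia", "banana", use_semantics=False))
print(agree("ripe", "leone", "banana", use_semantics=False))
print(agree("ripe", "leone", "banana", use_semantics=True))
```

With two feminine nouns the naive rule gets "matura" right by accident; with the lion as subject it produces "maturo", and only the semantic lookup recovers "matura". Hand-coding such a lookup for two adjectives is easy, but doing it for open-ended text is the hard part.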

The fact that the program does not "understand" the context, and therefore cannot distinguish one "it" from another, is, I think, irrelevant to most realistic uses of MT:

1) when used for gisting, or to give a rough translation of something, for which the level of quality in examples such as those above is more than enough;
2) when used as another tool with which a human translator can improve his or her productivity.





[Edited at 2005-10-13 22:10]


 

Jeff Allen  Identity Verified
France
Local time: 03:21
French to English
+ ...
multiple glosses in MT systems Oct 12, 2005


Nick Lingris wrote:

Is anyone familiar with a machine translation system that has, say, two glosses for one term and knows when to use which?


yes, I worked on and implemented such an MT system at Caterpillar 10 years ago. We had some examples with up to 8 glosses per term for the technical domain, and we put into place a semantic model to semi-automate the choices based on empirical choices made by technical writers and translators over the years.
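As a rough illustration of what choosing between glosses and "semi-automating" the choice can mean, here is a minimal Python sketch. Everything in it is hypothetical: the term, the French glosses, the cue words and the function choose_gloss are invented for the example and are not taken from the Caterpillar system, whose semantic model was far richer than a word-overlap count.

```python
# Hypothetical gloss table: one source term, several target glosses,
# each with context words that suggest it.
GLOSSES = {
    "nut": [
        {"target": "écrou", "cues": {"bolt", "tighten", "wrench", "thread"}},
        {"target": "noix", "cues": {"eat", "tree", "shell", "food"}},
    ],
}

def choose_gloss(term, sentence):
    """Return (choice, candidates). The choice is made automatically only
    when one gloss clearly wins; otherwise it is None and the candidates
    are handed to a human."""
    words = set(sentence.lower().replace(".", " ").replace(",", " ").split())
    scored = [(len(g["cues"] & words), g["target"]) for g in GLOSSES[term]]
    best_score = max(score for score, _ in scored)
    winners = [target for score, target in scored if score == best_score]
    if best_score == 0 or len(winners) > 1:
        return None, [target for _, target in scored]
    return winners[0], []

print(choose_gloss("nut", "Tighten the nut with a wrench."))
print(choose_gloss("nut", "Inspect the nut."))
```

The first sentence carries enough context for an automatic choice; the second does not, so the decision is left to a person, which is the "semi" in semi-automatic as this sketch interprets it.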


Jeff
http://www.geocities.com/mtpostediting/

[Edited at 2005-10-12 20:35]


 

Jeff Allen  Identity Verified
France
Local time: 03:21
French to English
+ ...
MT disambiguation and MT case studies Oct 12, 2005


RobinB wrote:
Disambiguation is the "silver bullet" of MT, and nobody's come even remotely close to cracking that particular problem using MT. But there again, most humans can't hack it either


On the contrary, disambiguation was the key focus of the MT system that we implemented at Caterpillar.

Also, last year (Sept 2004), I presented an MT case study on translation productivity with a focus on dictionary building and disambiguation configuration. It concerns two key pre-sales and post-sales documents for key major customer (big money) contracts -- not cheap or low-quality translation projects. The documents were translated and edited using MT systems in record time, compared to traditional human translation without the use of TM tools, validated by several depts for textual accuracy and acceptability, and were used and accepted by the end customers. And the customers came back to buy more of the software and services, so the translations were simply "exactly what the customer expected and required".

The full text (6 pages) of this case study is available at:
http://www.proz.com/post/211575#211575

Another set of productivity measurements for MT appears in a 2-page article that is listed and available at:
http://www.proz.com/post/212760#212760

And I'm currently working on a new article based on a 10-page internal technical report (including detailed logs of time spent on all stages of involvement in the project) for an MT project I conducted in Spring 2005. This report shows high-level translation quality achieved with translation productivity that takes less time (total 19 hours) than recorded on any similar project in the past. This is based on new special techniques of text analysis and dictionary building.
And this was done using commercial MT systems to translate marketing press release texts, something for which I know few people would even choose to use MT.

Jeff
-----
Jeff Allen, Ph.D.
Paris, France
http://www.geocities.com/jeffallenpubs/


[Edited at 2006-02-14 23:25]


 

Jeff Allen  Identity Verified
France
Local time: 03:21
French to English
+ ...
EBMT doesn't necessarily take several person years to build up Oct 12, 2005


RobinB wrote:
EBMT might, conceivably, work for extremely narrowly defined subject areas, in a small number of language pairs, where the source texts for translation display a high degree of homogeneity in terms of structure and content (and we're moving towards controlled language here). But you'd still need a dozen or so person-years to validate the reference documents. And by the time you'd done that, the documents would be obsolete...


I ran the translation lab of a multilingual EBMT project for several typologically different languages (Croatian, Haitian Creole, Korean, Arabic, French, Spanish) in which we demonstrated and published papers on our methods for rapidly building up EBMT databases for multi-topic and multi-domain translation.
Haitian Creole took the longest, because of issues regarding literacy levels of users/translators, and yet it took only 1 year to build up the database for the translation system (including a two-level editing process for all texts).
Our methods for the translation team are described in detail in the following paper:
ALLEN, Jeffrey and Christopher HOGAN. 1998. Expanding lexical coverage of parallel corpora for the Example-Based Machine Translation approach. In Proceedings of the First International Language Resources and Evaluation Conference (LREC98), 28-30 May 1998, Granada, Spain. Vol. 2, pp. 747-754.
available at: http://www.geocities.com/mtpostediting/

Jeff
http://www.geocities.com/jeffallenpubs/about-jeffallen.htm


 

Victor Dewsbery  Identity Verified
Germany
Local time: 03:21
German to English
+ ...
Which of the many many links relates to disambiguation? Oct 13, 2005


Jeff Allen wrote:

Nick Lingris wrote:
Is anyone familiar with a machine translation system that has, say, two glosses for one term and knows when to use which?

yes, I worked on and implemented such an MT system at Caterpillar 10 years ago. We had some examples with up to 8 glosses per term for the technical domain, and we put into place a semantic model to semi-automate the choices based on empirical choices made by technical writers and translators over the years.
Jeff
http://www.geocities.com/mtpostediting/



Hi Jeff,
Couldn't find any info about disambiguation on the page quoted - as the URL implies, it seemed to be mainly about (manual?) post-editing. I didn't check the content - there must be 50 or more links on that page, and none of the titles were obviously about semantic disambiguation.

Your subsequent posting seems to concentrate on productivity statistics. (Reading between the lines, these statistics seem to be specific to one particular language pair and heavily dependent on the expertise and experience of the specific post-editor used in the process - were you that post-editor?)

Is there anything you can tell us about your disambiguation process (including what is actually meant by "semi-automate", the breadth of the subject field involved and the effort involved in ascertaining and validating the "empirical choices made ... over the years")?

[Edited at 2005-10-13 00:23]


 

Jeff Allen  Identity Verified
France
Local time: 03:21
French to English
+ ...
more on MT disambiguation Oct 13, 2005


Nick Lingris wrote:
Is anyone familiar with a machine translation system that has, say, two glosses for one term and knows when to use which?



Jeff Allen wrote:
yes, I worked on and implemented such an MT system at Caterpillar 10 years ago. We had some examples with up to 8 glosses per term for the technical domain, and we put into place a semantic model to semi-automate the choices based on empirical choices made by technical writers and translators over the years.




Victor Dewsbery wrote:

Hi Jeff,
Couldn't find any info about disambiguation on the page quoted - as the URL implies, it seemed to be mainly about (manual?) post-editing. I didn't check the content - there must be 50 or more links on that page, and none of the titles were obviously about semantic disambiguation.

Your subsequent posting seems to concentrate on productivity statistics. (Reading between the lines, these statistics seem to be specific to one particular language pair and heavily dependent on the expertise and experience of the specific post-editor used in the process - were you that post-editor?)

Is there anything you can tell us about your disambiguation process (including what is actually meant by "semi-automate", the breadth of the subject field involved and the effort involved in ascertaining and validating the "empirical choices made ... over the years")?


Hi Victor,
Sure, let me explain a few points.

Avoiding the term semantic disambiguation:
None of the titles of my writings explicitly indicate semantic disambiguation. About 10 years ago, they might have. But I was catering to more of a research-oriented or academic audience at that time. Since 2000, all of my articles and presentations have gone from 10+ pages down to 1-3 pages. And the titles have tended to become more user-friendly.

Productivity statistics articles:
My subsequent posting containing the productivity stats (the telecom documentation paper) involved the implementation of lexical coding and selection techniques that I've been working on for many years with various commercial MT software packages. I did not explain in the article how it was done since I provide courses on this as an officially certified MT dictionary developer. Nor could I publish the telecom texts because they are confidential customer documentation for a multi-million dollar project.
However, I have shown the texts in person to several translators on different occasions, who have stated that it was significantly better MT output than they have ever seen with any other generic MT system.

My previous post mentioned a new upcoming article. This new article is, in fact, based on texts for which I first obtained authorization by the copyright holders to quote the source texts and the translated results for my articles and conference presentations. This removes the text confidentiality issue which often impedes MT specialists from showing the resulting texts (usually when done in technical fields for major customers).

Methods: 1 person and 1 language pair?:
Yes, I was the person in the telecom translation project. It involved 2 key stages, these being a) dictionary building and b) post-editing.
The methods used are not only restricted to a single language pair, nor are they used only by myself. Lorena Guerra, another Proz member, completed her MA thesis (Human Translation versus Machine Translation and Full Post-Editing of Raw Machine Translation Output) on the topic by using the same basic elements of this methodology. Her thesis is available online at:
http://www.geocities.com/mtpostediting/lorena-guerra-masters.pdf

A recently completed 10-page internal technical report with supporting documentation proves my dictionary building methodology on a translation project (mentioned in previous post) concerning marketing press release texts. This subsequently led to my official certification in summer 2005 as an expert dictionary builder for a commercial MT software package.

Disambiguation strategies:
As for the disambiguation strategies that were used on the Caterpillar project, the following papers and thesis provide all the details. Several of the articles mention "interactive disambiguation" which was conducted by the source language authors in the beginning stages of the implementation. I was the main user trainer for the deployment among 200 users. One of the papers gives statistics on the corpus size and types of automatic choices used for semantic domain modeling attachment decisions.

Baker, Franz, Jordan, Mitamura and Nyberg (1994)
"Coping With Ambiguity in a Large-Scale Machine Translation System"
Proceedings of COLING-94
http://www.lti.cs.cmu.edu/Research/Kant/PDF/ambig.pdf

Baker, Franz and Jordan (1994)
"Coping With Ambiguity in Knowledge-based Natural Language Analysis"
Proceedings of FLAIRS-94
http://www.lti.cs.cmu.edu/Research/Kant/PDF/flairs.pdf

Mitamura, Nyberg, Torrejon and Igo (1999)
"Multiple Strategies for Automatic Disambiguation in Technical Translation"
Proceedings of TMI-99
http://www.lti.cs.cmu.edu/Research/Kant/PDF/tmi99.pdf

Mitamura (1999)
"Controlled Language for Multilingual Machine Translation" (invited paper)
Proceedings of MT Summit, 1999
http://www.lti.cs.cmu.edu/Research/Kant/PDF/MTSummit99.pdf

Eric Crestan, 2001. Improvement of French generation for the KANT machine translation system (Caterpillar sponsored project). Diplôme de Recherche Technologique. Laboratoire d'Informatique d'Avignon (LIA) of the Université d'Avignon.
http://www.mail-archive.com/mt-list@eamt.org/msg00259.html
(Jeff Allen, Chairperson of the thesis defense committee)


Jeff
http://www.geocities.com/jeffallenpubs/


[Edited at 2005-10-14 07:42]


 