Statistical machine translation
Thread poster: Barnaby Capel-Dunn
Barnaby Capel-Dunn
Barnaby Capel-Dunn  Identity Verified
Local time: 10:03
French to English
Nov 8, 2006

For those of us interested in this field, there's a very interesting article here:
http://www.nature.com/news/2006/061106/full/061106-6.html

I personally do believe that statistical translation, or maybe a combination of statistical and grammar-based machine translation, is the thing of the future. And I don't write this in a provocative or threatening vein.
What do you think?
Incidentally, I would just like to take my hat off to Viktoria Gimbe, whose contributions to these forums are unfailingly illuminating. Thanks Viktoria, if you read this!


 
Williamson
Williamson  Identity Verified
United Kingdom
Local time: 09:03
Flemish to English
The professional editor Nov 8, 2006

Which is what I have always said: within 10-15 years, the "professional translator" will be reduced to a "professional editor", editing translated texts.

 
Barnaby Capel-Dunn
Barnaby Capel-Dunn  Identity Verified
Local time: 10:03
French to English
TOPIC STARTER
And why not.... Nov 8, 2006

Williamson wrote:

Which is what I have always said: within 10-15 years, the "professional translator" will be reduced to a "professional editor", editing translated texts.


...when you come to think about it!


 
Jeff Allen
Jeff Allen  Identity Verified
France
Local time: 10:03
Multiple languages
Merging of rule, example and statistics based MT Nov 11, 2006

Barnaby Capel-Dunn wrote:

For those of us interested in this field, there's a very interesting article here:
http://www.nature.com/news/2006/061106/full/061106-6.html

I personally do believe that statistical translation, or maybe a combination of statistical and grammar-based machine translation, is the thing of the future. And I don't write this in a provocative or threatening vein.
What do you think?


As I've discussed lately with colleagues and customers in meetings, the rule-based MT (RBMT) providers are moving in the direction of statistical analysis (SBMT), and the SBMT providers are making moves in the opposite direction. The same is true of Translation Memory (TM) providers (example-based MT), whose approach sits midway between the two.
The future will possibly bring more systems of the type we were working on at Carnegie Mellon in the late 90s: Multi-Engine MT (MEMT).

Jeff Allen, PhD
http://www.geocities.com/jeffallenpubs/


 
Barnaby Capel-Dunn
Barnaby Capel-Dunn  Identity Verified
Local time: 10:03
French to English
TOPIC STARTER
Thanks for the information, Jeff! Nov 11, 2006

It's very interesting to have the considered judgement of an insider like you.
Perhaps it is unfair to ask you this in view of your current responsibilities, but have you noticed an improvement in the performance of MT in recent years? I did actually try out Systran whe it was in its infancy about 10 years or more ago. I didn't pursue it at the time, not because I didn't like it (I thought it was rather impressive actually) but because a) my computer wasn't equal to the task and b) I was
... See more
It's very interesting to have the considered judgement of an insider like you.
Perhaps it is unfair to ask you this in view of your current responsibilities, but have you noticed an improvement in the performance of MT in recent years? I did actually try out Systran when it was in its infancy about 10 years or more ago. I didn't pursue it at the time, not because I didn't like it (I thought it was rather impressive actually) but because a) my computer wasn't equal to the task and b) I was feeling my way as a translator and found the input required rather daunting.
Would you recommend Systran or the like to the ordinary freelance translator today?
I think a lot of us would like to know what an expert like you thinks - but perhaps that puts you in an impossible position?


 
Jeff Allen
Jeff Allen  Identity Verified
France
Local time: 10:03
Multiple languages
performance of MT systems Nov 11, 2006

Barnaby Capel-Dunn wrote:

It's very interesting to have the considered judgement of an insider like you.
Perhaps it is unfair to ask you this in view of your current responsibilities, but have you noticed an improvement in the performance of MT in recent years? I did actually try out Systran when it was in its infancy about 10 years or more ago. I didn't pursue it at the time, not because I didn't like it (I thought it was rather impressive actually) but because a) my computer wasn't equal to the task and b) I was feeling my way as a translator and found the input required rather daunting.
Would you recommend Systran or the like to the ordinary freelance translator today?
I think a lot of us would like to know what an expert like you thinks - but perhaps that puts you in an impossible position?


Barnaby,

Thanks for your comments.

Your inquiry about "improvement in the performance of MT" is actually quite a vast one, because it covers many areas. For example, this can refer to any of the following points:
* translation processing speed on the server
* display speed on the local workstation
* translation speed via API when requests are passed through different servers (and proxies)
* compatibility and stability when embedded in other software programs (such as plug-in icons with Microsoft Office and Internet Explorer applications)
* increase of translation grammar rules (per language direction)
* increase of translation lexicons/dictionaries (per language direction)
* introduction of new features which improve productivity from the standpoint of the user and their work environment
* accuracy of automatic intuitive coding by MT dictionary manager module

As you can see, there is a range of points, covering everything from software integration of one product into another, to system speed and processing, to items which affect the linguistic structure of target sentences.

As time goes on, things become more complex. 10 years ago, we were all living through the transition to MS Win95 from the earlier MS-DOS operating systems (up to DOS 6.22, as I recall), and on the way toward Win 98. The internet was just catching on, but I remember back then that my DOS updates were very few.
Compare that to today, with System Administrators who harass me every 2 weeks to set up the daily schedule of automatic updates available from Microsoft for everything MS-related on my system.
I also recall back in 1995, when I was providing daily training and mentoring support to all translators at Caterpillar, that I created icons on their desktops to access the anti-virus software, for which we received updates here and there. Now if you don't do daily updates, you catch everything.

This greatly accelerated time-to-market for software applications now makes it necessary to spend an enormous amount of effort aligning one's own software with all of those updates.

This makes things complicated, because in the past it was easier to state that new 3rd party products would be supported only in the next version of one's own software. But what do you do when support for MS IE 7 is already planned for the next version of your software, and then you discover at the last minute that MS has announced IE 7 as an automatically installable critical update on the computers of millions of people? Now you must also support it in the current version of your product on the market. These types of things affect "performance" in general, because a software plug-in which no longer works in the updated version of the web browser is "zero performance" for the user.

The challenge over the past decade has been to line up with such rapid changes, and to juggle the prioritization and amount of effort to handle such issues compared with increasing the number of grammar rules and master dictionary entries.

On the other hand, 10 years ago, it took months to collect the signatures needed to obtain 50 MB of text from a major customer in order to conduct corpus analyses on their data. Now it is possible to conduct analyses on many times that amount of data, which is available on the internet in many languages. The internet has become a way to test and measure the usage of expressions. In an article I wrote about 2 years ago, I compared the frequency of the variant forms "wreak havoc", "wreak havok", "wreck havoc" and "wreck havok" with several internet search engines. These tools and the availability of data now make it possible to determine shifts in the usage of terms and expressions among the population in general, rather than just assuming that the form in the officially published dictionary is the one that people "really" use.
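
A minimal sketch of that kind of variant comparison, assuming a local text sample stands in for search-engine hit counts. The sample sentences and the function names count_variants and relative_frequencies are made up for illustration and are not taken from the article mentioned above:

```python
import re
from collections import Counter

# The four variant forms compared in the post above.
VARIANTS = ["wreak havoc", "wreak havok", "wreck havoc", "wreck havok"]


def count_variants(text, variants=VARIANTS):
    """Count case-insensitive, whole-phrase occurrences of each variant."""
    counts = Counter()
    for variant in variants:
        # Allow any whitespace between the words; \b stops partial-word matches.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in variant.split()) + r"\b"
        counts[variant] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def relative_frequencies(counts):
    """Turn raw counts into each variant's share of all variant occurrences."""
    total = sum(counts.values())
    if total == 0:
        return {variant: 0.0 for variant in counts}
    return {variant: n / total for variant, n in counts.items()}


# Hypothetical sample text, invented purely to exercise the functions.
sample = (
    "Storms wreak havoc on the coast. The update may wreck havoc on plug-ins. "
    "Viruses wreak havoc daily."
)
counts = count_variants(sample)
shares = relative_frequencies(counts)
for variant in VARIANTS:
    print(f"{variant}: {counts[variant]} ({shares[variant]:.0%})")
```

On real data the text would come from a much larger corpus, and the shares could be tracked over time to see whether a non-dictionary form is gaining ground.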

So yes, performance has improved on many different axes, and there is still more to do. As we progress, more external challenges come along which require us to keep reconsidering how to implement, or re-implement, things.

Linguistic quality is a never-ending target.

As for recommending commercial MT products to professional translators today, I've been answering a number of these kinds of points in a few other ProZ posts as well.

MT is the parent of TM
http://www.proz.com/post/440750#440750

online MT translation portals versus desktop software & corporate MT solutions
http://www.proz.com/post/440786#440786

replace or aid
http://www.proz.com/post/439868#439868

I suggest you look up the term "productivity" with my name in the ProZ forums, and you will come across several previous posts which specifically discuss different sectors/fields where I've trained users or have used MT tools and have produced concrete results.

The post below has a question which I still need to find the time to answer in detail. I need to go into my archives for a project I worked on 1 1/2 years ago and extract the results so that I can post them.

Babelfish versus the pay for Systran version
http://www.proz.com/post/441602#441602


Jeff


 