Pages in topic:   < [1 2 3 4] >
How big of a threat is Google translate?
Thread poster: Tim Drayton
Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 00:04
Member (2005)
English to Spanish
Entirely agree... Feb 7, 2009

José Henrique Lamensdorf wrote:
The bottom line is that translators (and musicians) who want to hold their jobs as such must never stop improving, to preserve some safe distance from that line below which machine translation is viable. This line will be chasing them up forever.


Absolutely. I would never expect my young sons to make a living by taking over my work as a translator in its current form. Many things will change in 20 years' time, and in my opinion by then I will probably be selling my expertise instead of producing millions of translated words.

As in so many other trades and industries, the bulk of translation will be taken over by machines, but that does not mean translation expertise will not be needed. In theory, someone who has studied electrical engineering knows how to design and build a generator, but in practice engineers only install, maintain, sell, or repair them. Electrical engineers are needed, even if motors are made by machines. This same level of expertise will be needed in the translation industry, for sure.

[Edited at 2009-02-07 14:58 GMT]


 
Tim Drayton  Identity Verified
Cyprus
Local time: 01:04
Turkish to English
TOPIC STARTER
Impressive Feb 7, 2009

Alistair Gainey wrote:

It’s not a threat –yet. But we shouldn’t forget how quickly things change on the Internet. Look at how it has affected the entertainment industry. In my language combination (Russian-English), GT already produces texts that, in some cases, are almost as good as, if not better than, ones produced by a non-native speaker of English. E.g.:

“According to information posted on the site of the organizing committee "Sochi-2014", XXII Olympic Winter Games will begin on Friday, February 7 and will culminate in Sunday, February 23, 2014. Thus, prior to the Olympic Winter Games in Sochi is exactly five years.
According to the Chairman of the Coordinating Commission of the International Olympic Committee (IOC), Jean-Claude Killy in Sochi have all the conditions, "to hold the exclusive Olympic Games." During the meeting with Prime Minister Vladimir Putin, which took place in late January of this year, a representative of the IOC said that "in fact, no obstacles to be held in Sochi these Olympic Games, which surprised the world."
The IOC delegation, who visited Sochi, were satisfied with the progress of preparations for the Olympics-2014. As Jean-Claude Killy, the overall impression is very good, gradually fell into place. " Prime Minister Vladimir Putin, in turn, said that by the year 2012 "should be to ensure that the Olympic facilities in Sochi for a test of competition." According to him, Russia is preparing Olympic infrastructure "on the planned schedule, and already this year at all the sites will begin installation and construction work," recalls agency Interfax.”

(The original is at http://www.vesti.ru/doc.html?id=250930.)

That said, it’s all very well looking at a paragraph or so of a GT output and thinking ‘Blimey! Not bad –especially for something free!’. But while one or two errors in a paragraph may be acceptable if you’re just trying to get the gist of something, imagine reading pages and pages where you’re constantly having to make an extra effort to understand the text.
As translators, we do not translate mere sentences or paragraphs. We translate documents. We translate manuals, brochures, reports, legal texts, instructions, financial statements, patents, and so on. Many of these require specific terminology. A lot are confidential. Others involve particular formatting requirements. I believe that the type of people willing to rely on GT is not the type of people who would be willing to pay good money for a decent translation. Consequently, I am not worried about it at the moment. However, from what I can see, it does have greater potential for improvement than previous machine translation programmes, and I’ll be interested to see how it develops in the next few years.


The example you quote is impressive. I will follow developments in my own main language pair with interest.


 
Daniel Grau  Identity Verified
Argentina
Member (2008)
English to Spanish
No threat: the better it gets, the more money I make Feb 7, 2009

Driven by the impossibility of using Wordfast plus an MT program on the Mac (a Windows-only functionality), I developed my own MT macro, which I've been using since last August. Whenever Wordfast gets a 0% match, the macro calls an AppleScript that in turn runs Unix commands to submit text segments to Google Translate and fetch the results.

While this translation service fails spectacularly with complex texts and twisted structures, it is extremely amenable to being edited when translating simpler texts of a general nature. Witness the following example:

English original:

Why should people be concerned about lead?
• It is a toxic substance that is particularly harmful to children and unborn babies.
• Lead can damage a child’s brain. This can affect a child’s ability to learn, lead to behavioral problems, and may even cause mental retardation. Lead can also damage vision, stunt growth, and cause other serious health problems. Some of the damage done by lead is irreversible.
• Children with lead poisoning do not always look or act sick or have symptoms readily associated with lead. The only way to find out if a child is exposed is to have her/him tested for lead.
• Lead can also cause serious health and reproductive problems in adults.

Raw Google Spanish:

¿Por qué la gente debería preocuparse por el plomo?
• Es una sustancia tóxica que es particularmente perjudicial para los niños y bebés.
• El plomo puede dañar el cerebro de un niño. Esto puede afectar la habilidad del niño de aprender, dar lugar a problemas de comportamiento, e incluso puede causar retraso mental. El plomo también puede dañar la visión, reducen el crecimiento, y causar otros problemas graves de salud. Algunos de los daños causados por el plomo es irreversible.
• Los niños con envenenamiento por plomo no parecen siempre actuar o enfermo o tiene síntomas fácilmente asociado con plomo. La única manera de saber si un niño está expuesto es que le realizarán las pruebas de plomo.
• El plomo también puede causar graves problemas reproductivos y de salud en adultos.

The way it works is this: I edit each segment as it is suggested, then read it before adding it to the TM and moving on to the next segment. If the segment is no good, I retranslate it. In other words, I proceed the same way whether the text comes from the TM (fuzzy and 100% matches) or from Google (0% matches).
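The routing rule described here (use the TM when it has a match, fall back to machine translation only on a 0% match, then review and store the edited segment) can be sketched in a few lines of Python. This is a hedged, minimal sketch, not Daniel's macro: `machine_translate` is a hypothetical stand-in for whatever actually sends text to Google Translate (his macro used AppleScript and Unix commands, not shown), and both the `difflib` similarity score and the 0.75 cut-off are assumptions; Wordfast computes its own fuzzy-match scores.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Tuple


def best_tm_match(segment: str, tm: Dict[str, str]) -> Tuple[float, str]:
    """Return (similarity 0..1, stored target) for the closest TM source segment."""
    best_score, best_target = 0.0, ""
    for source, target in tm.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def suggest(segment: str, tm: Dict[str, str],
            machine_translate: Callable[[str], str],
            threshold: float = 0.75) -> Tuple[str, str]:
    """Offer a TM suggestion if it clears the (hypothetical) threshold,
    otherwise fall back to machine translation."""
    score, target = best_tm_match(segment, tm)
    if score >= threshold:
        return "TM", target
    return "MT", machine_translate(segment)


def confirm(segment: str, edited: str, tm: Dict[str, str]) -> None:
    """Store the human-reviewed translation so later segments can match it."""
    tm[segment] = edited


def fake_mt(text: str) -> str:
    """Placeholder for a real MT call; it just tags the input."""
    return f"[MT] {text}"


if __name__ == "__main__":
    tm: Dict[str, str] = {}
    src = "Lead can damage a child's brain."
    print(suggest(src, tm, fake_mt))   # empty TM, so the segment goes to MT
    confirm(src, "El plomo puede dañar el cerebro de un niño.", tm)
    print(suggest(src, tm, fake_mt))   # now served from the TM
```

Note that every segment, whatever its origin, passes through `confirm` only after a human has edited it, which mirrors the "proceed the same way" point above.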

I just timed myself for one hour of work with simple text (the above is a random sample extracted from it), in order to obtain a precise productivity figure. Over the course of that hour, Wordfast suggested units from the TM in just two instances, as I had just started this new project with no TM available. And in that hour, I translated 1074 English words (1256 Spanish).

As usual, when I reach the end of this 9000-word file, I will spell-check and read the whole thing through, globally eliminate spurious spaces, check the format, etc., so the time I spend doing that will detract from the high productivity I obtained so far. Nevertheless, my point is that a CAT tool can in certain cases be used to the translator's advantage.
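The trade-off described here (fast MT-assisted drafting, followed by a whole-file read-through that lowers the overall rate) is simple arithmetic. The Python sketch below uses the measured 1,074 words per hour and the 9,000-word file from the post; the review speed of 3,000 words per hour is a made-up figure for illustration only.

```python
def effective_rate(words: int, draft_wph: float, review_wph: float) -> float:
    """Words per hour over the whole job: drafting time plus one full review pass."""
    draft_hours = words / draft_wph
    review_hours = words / review_wph
    return words / (draft_hours + review_hours)


words = 9000        # size of the file (from the post)
draft_wph = 1074    # measured MT-assisted drafting speed (from the post)
review_wph = 3000   # HYPOTHETICAL speed of the final read-through

print(f"drafting: {words / draft_wph:.1f} h")    # about 8.4 h
print(f"review:   {words / review_wph:.1f} h")   # 3.0 h
print(f"overall:  {effective_rate(words, draft_wph, review_wph):.0f} words/hour")  # about 791
```

Plugging in a real review speed instead of the assumed one shows how much of the drafting gain survives the final pass.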

Regards,

Daniel


 
Anmol
Local time: 04:34
Can you stop the march of technological progress? Feb 8, 2009

I recall walking into a bank in India in 1983. The bank official was a friend of my father, and we struck up a conversation. I casually mentioned that the bank should computerize its operations. The gentleman turned red, and with furrowed brow, thundered at me: "Do you want us to lose our jobs? I will tell your father about your poor behavior!" And he did. About my behavior, that is. Poor or otherwise. My father was both bemused and amused.

Fast forward 25 years. Now bank officials here cannot do without their computers. And work levels have multiplied, not come down, due to the use of computers.

I do not believe Google is a threat at all. It could turn out to be a blessing in some way by relieving us of some of the more mundane tasks in translation. In the grander scheme of things, we will evolve to adapt to our circumstances. Perhaps of necessity!

[Edited at 2009-02-08 06:23 GMT]


 
Anmol
Local time: 04:34
VSO or VOS - not a major obstacle Feb 8, 2009

Tim Drayton wrote:

It doesn't have to be Turkish. There are many VOS languages out there which would probably require the same kind of transformation before any sort of statistical correlations would make sense: Japanese, Hindi, Farsi to name but three. I would be interested to hear what translators from other such languages into English think.


I don't think the VOS-VSO structure of any given language is an obstacle to automatic translation. Incidentally, Hindi is an SOV language, with the verb routinely (though not obligatorily) appearing at the very end of the sentence (unlike German, where the finite verb moves to the end only in subordinate clauses, such as those beginning with daß).

These sorts of transformations are no obstacle at all to automatic translation, as I understand it.

A Yale computer scientist came up with a theory of semantic dependencies, in which every sentence is broken down into semantic concepts. Another interesting theory breaks sentences down into a classification that includes the action being performed (the verb), the actors in the sentence (nouns in the nominative, accusative, dative and genitive cases), and the conditions under which the action is performed (instrumental and locative).

Once a sentence has been classified into categories, translating it into another language essentially consists of recombining the concepts in the word order of the target language, be it VSO, VOS, SOV or other, using the terms and case structure of the target language.
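The two-step picture described here (label the semantic roles, then re-emit them in the target language's order) can be illustrated with a deliberately tiny Python sketch. The role names and the fixed order table are simplifying assumptions; the sketch takes the role labels as given, which is exactly the step the post says automated programs get wrong, and it ignores case marking entirely.

```python
from typing import Dict

# Canonical clause orders for the three core roles. Adjuncts (instrument,
# location) are left out because their position varies even within a language.
ORDERS = {
    "SVO": ("subject", "verb", "object"),
    "SOV": ("subject", "object", "verb"),
    "VSO": ("verb", "subject", "object"),
    "VOS": ("verb", "object", "subject"),
}


def linearize(roles: Dict[str, str], order: str) -> str:
    """Emit already-translated role fillers in the target language's word order."""
    return " ".join(roles[r] for r in ORDERS[order] if r in roles)


# Toy frame with English glosses; in a real system each filler would already
# hold target-language words with the right case marking.
frame = {"subject": "Ram", "verb": "ate", "object": "an apple"}
for order in ("SVO", "SOV", "VSO"):
    print(order, "->", linearize(frame, order))
```

Reordering is the cheap part once the frame exists; everything hard lives in building `frame` correctly.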

Where automated programs falter is in correctly pigeonholing words into the different categories. And of course, in applying the correct case structure! Which is what is likely to keep us in business for a while!


 
Jeff Allen  Identity Verified
France
Local time: 00:04
Multiple languages
Finally, another person showing productivity results in using MT Feb 8, 2009

Daniel Grau wrote:

Driven by the impossibility of using Wordfast plus an MT program on the Mac (a Windows-only functionality), I developed my own MT macro, which I've been using since last August. Whenever Wordfast gets a 0% match, the macro calls an AppleScript that in turn runs Unix commands to submit text segments to Google Translate and fetch the results.

The way it works is this: I edit each segment as it is suggested, then read it before adding it to the TM and moving on to the next segment. If the segment is no good, I retranslate it. In other words, I proceed the same way whether the text comes from the TM (fuzzy and 100% matches) or from Google (0% matches).

I just timed myself for one hour of work with simple text (the above is a random sample extracted from it), in order to obtain a precise productivity figure. Over the course of that hour, Wordfast suggested units from the TM in just two instances, as I had just started this new project with no TM available. And in that hour, I translated 1074 English words (1256 Spanish).

As usual, when I reach the end of this 9000-word file, I will spell-check and read the whole thing through, globally eliminate spurious spaces, check the format, etc., so the time I spend doing that will detract from the high productivity I obtained so far. Nevertheless, my point is that a CAT tool can in certain cases be used to the translator's advantage.


Daniel,

You are seeing productivity results with MT similar to those I reported in 2005 for MT used without any dictionary customization.
http://www.geocities.com/mtpostediting/ (The article in section 2: What is Post-editing?)

If you go a step further, analysing the content up front and feeding term translations (and variants of those terms) into the MT system, speed improves even further.
I have provided the explanation, productivity statistics from real projects (and even a full set of examples from one project) on the same website:
section 2
Improved Translation Quality with Machine Translation Dictionary Building (2006)
&
Case Study: Implementing MT for the Translation of Pre-sales Marketing and Post-sales Software Deployment Documentation (2004)

(for the 2nd one, I had already provided a link to the PDF file in another forum post here at http://www.proz.com/post/211575#211575)
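Feeding customer terminology into an MT system's own dictionaries, as described here, is not possible with every engine. With a black-box engine that offers no user dictionary, one commonly used workaround (a different technique from dictionary building, swapped in here for illustration) is to mask glossary terms with placeholders before translation and substitute the approved target terms afterwards. This Python sketch assumes the engine passes tokens such as `__TERM0__` through unchanged, which real engines do not always do; `mt` is a hypothetical callable standing in for the engine.

```python
import re
from typing import Callable, Dict


def translate_with_glossary(text: str, glossary: Dict[str, str],
                            mt: Callable[[str], str]) -> str:
    """Mask glossary terms, machine-translate, then restore the approved targets."""
    restore: Dict[str, str] = {}
    # Longest terms first, so "credit default swap" is protected before "credit".
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        token = f"__TERM{i}__"
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        text, hits = pattern.subn(token, text)
        if hits:
            restore[token] = glossary[term]
    translated = mt(text)
    for token, target in restore.items():
        translated = translated.replace(token, target)
    return translated


def identity_mt(text: str) -> str:
    """Stand-in engine that returns its input untouched."""
    return text


glossary = {
    "credit default swap": "permuta de incumplimiento crediticio",
    "credit": "crédito",
}
print(translate_with_glossary("Buy a credit default swap.", glossary, identity_mt))
```

Longest-first masking matters for multiword terms: without it, the shorter entry would break up the longer one before it could be protected.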

Jeff


[Edited at 2009-02-08 21:36 GMT]


 
Jeff Allen  Identity Verified
France
Local time: 00:04
Multiple languages
more details on Google's statistical MT approach Feb 8, 2009

Charlie Bavington wrote:

I would guess (and it is a guess) that they simply haven't quite collected enough parallel texts yet. My understanding is that any kind of analysis of grammar has been abandoned, given the exponential rise in processing power, in favour of simply collecting enough translations/parallel texts to either find the exact phrase translated before or something similar enough to be worth a punt.


Charlie,
I have provided a more descriptive explanation of the statistical MT approach (now used by Google) a couple of months ago at:
statistical MT approach + TMs
http://www.proz.com/forum/translator_resources/100328-machine_translation:_your_experience_with_the_various_mt_programmes_state_of_play-page2.html#998639

and

the mixing of TM and MT technologies
http://www.proz.com/forum/translator_resources/100328-machine_translation:_your_experience_with_the_various_mt_programmes_state_of_play-page3.html#1000970

The grammatical analysis is still the key to the MT rule-based systems which I have explained in many other posts here at ProZ. It is with those systems that I have achieved the productivity results with MT indicated in many posts.

Once statistical systems are able to take in the terminology and grammatical customizations that can be made, this type of linguistic override could have a significant impact on their translation output.

I know that Asia Online has been working on improving its statistical system along these lines.

Jeff


 
Miguel Llorens  Identity Verified
Local time: 00:04
English to Spanish
In memoriam
Google Translate: a threat or an opportunity for translators? Feb 11, 2009

Hasn't anyone ever thought that better machine translation would actually create more work? Theoretically, it would drive down per-word rates, but it could also increase productivity and boost overall volumes. My intuition is that 100% publishable MT is unlikely (to cite a simple argument, the avalanche of neologisms brought forth by the credit crunch: “subprime,” “credit default swaps,” “TARP,” etc.). Thus, a human touch will still be necessary. But cheaper translation will probably lead companies, governments and just plain people to translate much more material than they do at present. The end consumer will do some of the work by him or herself, just as some executives now do their own word processing. But some work will still require language specialists. The problem is static thinking. One shouldn’t think of the volume of translation as a fixed pie, but as one that is liable to (potentially) explosive growth given better tools. Look at Daniel’s example above.

Of course, one can’t be naively optimistic. There is a possibility that the rise in productivity and quality will drive wages down to lower levels (or even subsistence levels). That is difficult to predict. But there is also the possibility that wages will be driven higher. My point is that it is impossible to predict the future or to discern how better technology will transform the world. The only certainty is that the profession will be radically different ten years from now. Therefore, the anxiety over being replaced by technology is really quite misplaced. Embrace the change and try to profit from it instead of wasting time worrying about machines drinking your milk shake.


 
Jeff Allen  Identity Verified
France
Local time: 00:04
Multiple languages
MT for Turkish / English Feb 13, 2009

Alistair Gainey wrote:

It’s not a threat –yet. But we shouldn’t forget how quickly things change on the Internet. Look at how it has affected the entertainment industry. In my language combination (Russian-English), GT already produces texts that, in some cases, are almost as good as, if not better than, ones produced by a non-native speaker of English....



Tim Drayton wrote:
The example you quote is impressive. I will follow developments in my own main language pair with interest.


Tim, your language pair is Turkish to English per your profile. MT development is not equivalent for each language. Russian with English is one of the better developed language pairs, for a number of reasons. French / Spanish with English also have a long history (Systran has been doing French / English for over 3 decades).

Turkish / English is still considered a new language pair for the commercial MT developers, and it takes a significant amount of upfront development (e.g., financial, human and time resources) to get a new language pair up and running, at least for rule-based MT systems. The statistical MT systems are a little different because it is possible to use existing aligned translated material as training data, but again this depends on the availability of such translated content.
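To see why statistical systems depend on the availability of aligned translations, here is a toy Python illustration of the simplest kind of statistic such training can start from: how often a source word and a target word occur in the same aligned sentence pair, scored with the Dice coefficient. This is a classic word-association measure, not a description of Google's or any vendor's actual training pipeline, and the three-pair corpus is invented.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def dice_scores(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Dice coefficient 2*c(s,t) / (c(s) + c(t)) over aligned sentence pairs,
    where c counts the sentence pairs in which a word (or word pair) occurs."""
    src_count: Counter = Counter()
    tgt_count: Counter = Counter()
    pair_count: Counter = Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.lower().split()), set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        pair_count.update(product(s_words, t_words))
    return {(s, t): 2 * c / (src_count[s] + tgt_count[t])
            for (s, t), c in pair_count.items()}


corpus = [  # invented English-Spanish sentence pairs
    ("the lead", "el plomo"),
    ("the child", "el niño"),
    ("lead poisoning", "envenenamiento por plomo"),
]
scores = dice_scores(corpus)
print(scores[("lead", "plomo")])  # 1.0: always seen together
print(scores[("lead", "el")])     # 0.5: co-occur only by accident
```

With only the first sentence pair, "lead" would score equally with "el" and "plomo"; it is the additional aligned text that separates them, which is why thin parallel data holds back newer language pairs.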
I know of a few requests for commercial MT systems for Turkish, but having a request and committing the funding necessary to make it happen are two different things. As for the European Union, it is slow on the MT side because the system it uses is the old 1976 EC-Systran system, which has been customized over a long time, but Systran is no longer the company developing it for the EC. There are MT development initiatives funded through the EC's technology framework programs, but funding for language technology has been limited for the past several years. It is, however, coming around again: a workshop on translation technologies for EC funding was held a couple of weeks ago. We will see where that goes.

Language technologies take a significant amount of development work, so it is not surprising that Translation Memory tools have taken a couple of decades to mature into productivity drivers for translators. TM is easier in a sense, in that it is based on character and string pattern matching (example-based memories with some statistics), whereas nearly all commercial MT systems today are rule-based, so they require a lot of upfront development of grammatical rules and dictionaries to get a prototype system up and running.
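The point that TM rests on character and string pattern matching can be made concrete with a fuzzy-match percentage. The Python sketch below scores two segments with character-level Levenshtein distance; commercial TM tools use their own, generally undisclosed, scoring formulas, so treat the percentages as illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_match_percent(new: str, stored: str) -> int:
    """Similarity as a percentage of the longer segment's length."""
    longest = max(len(new), len(stored)) or 1
    return round(100 * (1 - levenshtein(new, stored) / longest))


print(fuzzy_match_percent("Lead can damage vision.", "Lead can damage hearing."))
```

Nothing here knows anything about grammar, which is why the same code works for any language pair; an MT engine has no such shortcut.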

Commercial MT systems are driven by customer sales, so they will invest in language pairs where there is a well-defined market need with enough corporate and mass-market home and business users who will buy the software. In such contexts, it is easier to obtain the customer terminology and data in order to better customize and train the system.

Research-based systems (Google, universities) are the ones that cover other languages, but they are limited by the type of content, and by the access to content, available to train them on.

John Hutchins' Compendium of Translation Software is "the" reference on all MT systems and language directions that are available. It is updated regularly (at least once a year).

Jeff

[Edited at 2009-02-13 23:04 GMT]


 
Jeff Allen  Identity Verified
France
Local time: 00:04
Multiple languages
MT could bring more work Feb 13, 2009

Miguel Llorens wrote:

Hasn't anyone ever thought that better machine translation would actually create more work?


Hi Miguel,

Thanks for your post.

I have been giving examples for a long time of how MT has been implemented successfully and has increased translation needs, volumes and work. For some MT post-editing participants, it has also shifted the rate structure from per-word to per-hour.

A couple of other posts in the forums related to this:

http://www.proz.com/forum/business_issues/57640-do_you_see_systran_as_an_aid_or_as_competition-page3.html#439868

http://www.proz.com/forum/translator_resources/100328-machine_translation:_your_experience_with_the_various_mt_programmes_state_of_play-page2.html#988987

Jeff


 
Anmol
Local time: 04:34
Your recommendation Feb 14, 2009

Given that technology changes so fast, what package would you recommend, Jeff? Is Systran still the most reliable, especially since it is a paid product? I understand the EU does not use Systran anymore, however.

 
Jeff Allen  Identity Verified
France
Local time: 00:04
Multiple languages
European Commission and MT Feb 14, 2009

Anil Gidwani wrote:

I understand the EU does not use Systran anymore, however.


Meetings in November 2008 and January 2009 indicate that the European Commission translation service has been successfully using MT. This refers to the EC-Systran system, which is, of course, just one part of the entire translation workflow that the EC translation service has put in place over the years.

I provided info on this in 2 other forum posts:

The MT'd EU texts
http://www.proz.com/forum/business_issues/66636-job_offer_involving_the_proof_reading_of_machine_translation-page2.html#517440

links to European Commission Translation workflow/process article/presentations
http://www.proz.com/forum/cat_tools_technical_help/23397-european_union_translation_software.html#187941



Also, the EC funding agencies (IST and HLT programs) plan to invest more in further MT development.

For both points above, see:

ftp://ftp.cordis.europa.eu/pub/fp7/ict/docs/language-technologies/events-20081126-cencioni-rossi_en.pdf

ftp://ftp.cordis.europa.eu/pub/fp7/ict/docs/language-technologies/ict_psp_wp_and_call_3.ppt

ftp://ftp.cordis.europa.eu/pub/fp7/ict/docs/language-technologies/uszkoreit_en.pdf


It is also possible to find the language service providers that supply outsourced MT post-editing to the European Commission translation service, such as:

http://www.capstan.be/index.php?option=com_content&task=view&id=12&Itemid=19


Jeff



[Edited at 2009-02-14 15:37 GMT]

[Edited at 2009-02-14 15:44 GMT]

[Edited at 2009-02-14 16:26 GMT]


 
Jeff Allen  Identity Verified
France
Local time: 00:04
Multiple languages
which MT software product Feb 14, 2009

Anil Gidwani wrote:

Given that technology changes so fast, what package would you recommend, Jeff? Is Systran still the most reliable, especially since it is a paid product?


Anil,

The answer to this is not black and white.

If you are used to evaluating Translation Memory software, bear in mind that these tools are essentially multilingual search-and-replace pattern-matching tools, and they usually cover many languages (they are sometimes described as language-independent, which is more or less true). The choice of one TM tool over another usually depends on language (group) typology issues such as displaying single- vs double-byte characters, Cyrillic fonts, and handling right-to-left languages. All TM tools cover the majority of general features for common language issues, but sometimes only the more mature TM tools cover these typology features. Beyond these points, if those features are equal, your choice will depend more on file format support, user-friendliness and ergonomics, learning curve, availability of good training and tutorial materials, supported operating systems, etc. In general, though, if your language pairs do not raise these typology issues, you can use any of the TM tools.
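The typology issues mentioned here (single- vs double-byte characters, Cyrillic, right-to-left scripts) show up directly at the character level. This short Python sketch uses only the standard library to print how many bytes a character takes in UTF-8 and its Unicode bidirectional class ("L" is left-to-right; "R" and "AL" are right-to-left). The sample characters are arbitrary, and the legacy Japanese encoding Shift_JIS is included to show what "double-byte" originally referred to.

```python
import unicodedata

samples = {
    "Latin": "a",
    "Cyrillic": "Ж",
    "Hebrew": "ש",
    "Arabic": "ع",
    "Japanese kanji": "中",
}


def is_rtl(text: str) -> bool:
    """True if any character has a right-to-left bidirectional class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


for script, ch in samples.items():
    utf8_len = len(ch.encode("utf-8"))
    bidi = unicodedata.bidirectional(ch)
    print(f"{script:15} U+{ord(ch):04X} utf-8 bytes={utf8_len} bidi={bidi} rtl={is_rtl(ch)}")

# In the legacy Shift_JIS encoding a kanji takes two bytes ("double-byte").
print(len("中".encode("shift_jis")))
```

A TM tool that assumes one byte per character or left-to-right display breaks on exactly the rows where these values differ from the Latin line.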

However, when you look at any MT product, there are 2 different approaches to consider.

1) Does the MT vendor offer your language direction (or both directions of the language pair)?
This seems like a fairly straightforward criterion to a translator, but what you need to consider is how much research and development time was put into creating the engine and dictionary for your language pair. If the language direction has been on the market with that tool for 10 years, it is more likely to be mature than if it was only released within the past year or two.

Also note that some MT vendors offer a small set of languages (2-10), others offer 10-40 or so, and still others offer hundreds.
In general, those who specialize in a smaller set of language directions will have done more focused work on those languages and aim at higher quality than those who have 200+ language directions.

This is the type of information which affects the raw output translation quality of the system.

2) What features exist in the system?

If you have 2 or 3 systems which provide fairly equal raw translation output (push button, click and translate level quality), then the choice of one system over another will depend more on other features (basic versus advanced dictionary building, ease of dictionary entry, level of ability to customize dictionary entries, postediting features, word movement features, ability to code expressions, ability to code semantic notions to terms, etc).

And when you look at the range of MT software products on the market today, they really have a wide range of features.

As a professional translator, you will want to limit yourself to MT vendors whose products are sold as professional or expert-level tools, since many MT products are aimed at basic home users. Such tools will not interest you.

So when you look at MT tools, you want to look at the types of features that are available, and if those features can help you customize your translations or not, and to which extent.

Not all MT vendors offer all language directions, so that will help limit your choice.

In some cases, it can come down to a choice between a tool whose professional/expert version has more advanced features for customizing the translation, even though its raw output quality may be lower than another vendor's, and a tool with slightly better raw output but weaker postediting features.

In general, I have seen that not every MT tool has all language directions, and not all MT tools have the best features.

MT tools are much more language dependent than TM tools, and this affects the choice of the software tool.

But, it is very important not to judge only on the click-and-translate raw output. Dictionary building features and grammatical manual override features are often more useful.


For English/French and French/English I primarily use PROMT although I have SYSTRAN and other MT tools.

But PROMT is not available for all language pairs (Asian languages, for example), so I use SYSTRAN for those, but there my usage is more for gisting content than for producing professional translations for customers. It is a different type of use.


Jeff


 
Mark Daniels  Identity Verified
Local time: 00:04
Serbian to English
Google Translate ANSWERING KudoZ questions! Feb 16, 2009

Somebody quipped that GT might be able to ask questions on KudoZ. Well, what do you think to this?

http://www.proz.com/kudoz/serbo_croat_to_english/education_pedagogy/3076832-ratna_hirurgija.html

Have a close look at the references used by this "winning" responder! I was more than a little shocked...

I do not really feel threatened by GT outdoing me as a translator, even though I have been surprised at how good a job it sometimes does with Serbian - English (not vice-versa though, that sucks right now). What I worry about more is people, especially clients, not understanding the distinction and me having to spend even more time explaining why you can't give your precious translation to just anyone, never mind a computer tool, not when it really matters. It's a full-time marketing job as it is.

I would also echo what somebody else said about proofreading suspiciously machine-like pre-translated texts. Frankly I avoid proofing other people's translations at the best of times. GT texts generally need a bare minimum of 40% correction to get them up to scratch (i.e. more than just comprehensible), and I would certainly not take on a job that required that much work. They can try it if they want, but I won't be biting!
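A figure like "40% correction" can be measured rather than estimated by comparing raw MT output with its post-edited version word by word. The Python sketch below approximates a word-level edit rate (in the spirit of metrics such as TER, but without TER's block shifts) using `difflib`'s alignment. The example pair reuses a sentence from the raw Google Spanish quoted earlier in this thread; the corrected version is supplied for illustration.

```python
from difflib import SequenceMatcher


def word_edit_rate(raw_mt: str, post_edited: str) -> float:
    """Approximate share of words changed during post-editing, relative to the
    length of the post-edited text. difflib's alignment is close to, but not
    guaranteed to be, a minimum edit distance."""
    raw, final = raw_mt.split(), post_edited.split()
    edits = 0
    for op, i1, i2, j1, j2 in SequenceMatcher(None, raw, final).get_opcodes():
        if op == "replace":
            edits += max(i2 - i1, j2 - j1)
        elif op == "delete":
            edits += i2 - i1
        elif op == "insert":
            edits += j2 - j1
    return edits / max(len(final), 1)


raw = "Algunos de los daños causados por el plomo es irreversible."
fixed = "Algunos de los daños causados por el plomo son irreversibles."
print(f"{word_edit_rate(raw, fixed):.0%}")  # 20%: 2 of 10 words changed
```

Averaged over a whole job, a number like this gives a client-facing basis for declining (or pricing) heavy post-editing work.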


 
Tim Drayton  Identity Verified
Cyprus
Local time: 01:04
Turkish to English
TOPIC STARTER
Turkish and English Feb 16, 2009

Jeff Allen wrote:

Tim, your language pair is Turkish to English per your profile. MT development is not equivalent for each language. Russian with English is one of the better developed language pairs, for a number of reasons. French / Spanish with English also have a long history (Systran has been doing French / English for over 3 decades).

Turkish / English is still considered a new language pair for the commercial MT developers, and it takes a significant amount of upfront development (e.g., financial, human and time resources) to get a new language pair up and running, at least for rule-based MT systems. The statistical MT systems are a little different because it is possible to use existing aligned translated material as training data, but again this depends on the availability of such translated content.
I know of a few requests for commercial MT systems for Turkish, but having a request and committing the funding necessary to make it happen are two different things. As for the European Union, it is slow on the MT side because the system it uses is the old 1976 EC-Systran system, which has been customized over a long time, but Systran is no longer the company developing it for the EC. There are MT development initiatives funded through the EC's technology framework programs, but funding for language technology has been limited for the past several years. It is, however, coming around again: a workshop on translation technologies for EC funding was held a couple of weeks ago. We will see where that goes.

Language technologies take a significant amount of development work, so it is not surprising that Translation Memory tools have taken a couple of decades to mature into productivity drivers for translators. TM is easier in a sense, in that it is based on character and string pattern matching (example-based memories with some statistics), whereas nearly all commercial MT systems today are rule-based, so they require a lot of upfront development of grammatical rules and dictionaries to get a prototype system up and running.

Commercial MT systems are driven by customer sales, so they will invest in language pairs where there is a well-defined market need with enough corporate and mass-market home and business users who will buy the software. In such contexts, it is easier to obtain the customer terminology and data in order to better customize and train the system.

Research based systems (Google, universities) are those that focus on other languages, yet they are limited by type of content and access to content to train the systems on.

John Hutchins' Compendium of Translation Software is "the" reference on all MT systems and language directions that are available. It is updated regularly (at least once a year).

Jeff

[Edited at 2009-02-13 23:04 GMT]

Jeff, I think this is part of the story. I have a hypothesis, which I articulated at the beginning of this thread, that purely statistical machine translation will be far more successful in languages with similar syntactic structures, such that words tend to bunch together in similar groups in both the source and target languages and any statistical matches will be highly significant. I will watch developments with interest, but I remain sceptical as to whether statistical machine translation will ever be able to provide satisfactory results between languages with very different structures such as English and Turkish. A detailed investigation of this topic goes well beyond the scope of a thread like this.
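This hypothesis concerns how far word order diverges between two languages. One simple way to quantify that for a single sentence pair is to count crossing links in a word alignment: a monotone pair scores 0, and every reordering adds crossings. The Python sketch below uses a hand-made alignment for English "Ali read the book" and Turkish "Ali kitabı okudu"; it illustrates the idea only and is not a test of the hypothesis.

```python
from itertools import combinations
from typing import List, Tuple


def crossing_links(links: List[Tuple[int, int]]) -> int:
    """Count pairs of (source_index, target_index) links that cross each other."""
    return sum(1 for (s1, t1), (s2, t2) in combinations(sorted(links), 2)
               if s1 < s2 and t1 > t2)


# English "Ali read the book" -> Turkish "Ali kitabı okudu" (SVO vs SOV).
# Links: Ali-Ali, read-okudu, the-kitabı, book-kitabı (hand-made alignment).
en_tr = [(0, 0), (1, 2), (2, 1), (3, 1)]
# A word-for-word monotone pair such as "the child" -> "el niño".
en_es = [(0, 0), (1, 1)]

print(crossing_links(en_tr))  # 2
print(crossing_links(en_es))  # 0
```

Counted over a large aligned corpus, such a figure would give one rough, comparable measure of structural distance between language pairs.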


 