Systran vs Reverso vs Asia Online - feedback needed
Thread poster: kenhoo
kenhoo
France
Local time: 06:17
May 18, 2010

Hello,

I'm looking for feedback or links on these 3 solutions so that I can compare them and make a decision.

I'm looking for an automatic translation engine for my IT company so that colleagues can get one-click translation of their documents (doc, ppt, pdf, etc.) from English to French and vice versa.

I know it is not as good as human translation; that is not the point here. We will still use human translators (me and an external company) when needed.

Thanks.


 

RichardDeegan
Local time: 00:17
Spanish to English
On Systran May 18, 2010

I've been using Systran (4.0) for quite some time within a Win98 environment on a PII (850). I find it especially helpful with Excel and PowerPoint files. However, there are occasional format problems (numbering changed to bullets, font changed, etc.) that one must be mindful of. Also, large reports in MS Word with a large number of tables may result in a program freeze-up, leading to eventual line-by-line use. Some of these problems may be due to the museum piece I use it with.
Systran also allows building dictionaries, but the number of entries is limited to 2,000 per dictionary, and only 5 dictionaries can be used for a particular file, at least in the version I have.
All things considered, it paid for itself and that particular PC back in 2002.
No point in talking about my other machines and programs.


 
kenhoo
France
Local time: 06:17
TOPIC STARTER
Thanks for sharing May 18, 2010

Thanks Richard for this feedback!

Anyone else?


 

Cedomir Pusica  Identity Verified
Serbia
Local time: 06:17
Member (2009)
English to Serbian
+ ...
Both May 18, 2010

I can recommend both these programs. They give quite acceptable translations, provided that the original text is not too complicated. Systran also has the option of choosing the "registry", say whether the text you wish to translate is legal, finance, IT...

 

Jeff Allen  Identity Verified
France
Local time: 06:17
Multiple languages
info comparing Systran, Reverso and other MT products May 18, 2010

Here is a very comprehensive overview with links to all the main software review info on these tools.

1) For the professional translator audience, the versions of Systran and Reverso which would interest you are their desktop versions (not the corporate, enterprise level solutions). Both of these are customizable Rule-Based MT tools.

2) Reverso made a significant change in the underlying technology (shifting from PROMT to LEC) between Reverso Pro v5 and then followed by the release of Reverso Translator 10. This is explained in the help thread about Reverso tips which can be accessed via:
http://jeffallen.chez.aliceadsl.fr/MT-tips.html

This is also explained in a post in the Reverso_users Yahoo group

3) All known software reviews of both SYSTRAN and Reverso (I have written several of these in-depth reviews as an expert on both sets of tools) can be found at:

Lang Tech Evaluation website v29:
https://www.box.net/shared/jui3vs50cn


4) AsiaOnline is primarily a customizable Statistical Machine Translation (SMT) solution for corporate level enterprise buyers, which has recently offered a special module for allowing freelancers to use the tool and participate in user feedback improvement loops. This is a very recent offer, and I am not aware of any independent reviews of it.

5) Both SYSTRAN (v4 onward, currently at v6 for desktop) and the PROMT-based Reverso (Reverso v4, v5, and also PROMT v6, v7, v8) can deal with specialized domains. PROMT calls these topic templates.

6) For PROMT-based products, there is also a detailed 2-page explanation of the differences between PROMT-based Reverso Pro/Expert v5 and PROMT v6 in the PROMT MT thread (accessible via the MT tips page above) and also in a post in the PROMT_users Yahoo group.

7) @Richard:
The SYSTRAN dictionary maximum entry count has changed since desktop v4. You probably had v4 Premium, which was superseded by v5 Professional Premium and now v6 Professional translator. The maximum number of entries per dictionary is provided in the product comparison matrix on the Systran website.

Jeff


[Edited at 2010-05-18 15:55 GMT]


 
kenhoo
France
Local time: 06:17
TOPIC STARTER
yes, we need a corporate solution May 19, 2010

because we want/need ALL our colleagues to have access to the translation solution for quick understanding of documents (40% of the employees don't speak French, and some of the French ones are not fluent enough in English to fully understand all the nuances/terms of all the texts).

So buying a desktop license for each person would be too expensive and inefficient, as we're continually hiring new people in our Parisian offices. Well... that was my first thought when I saw their prices.

And I am the only one who will, possibly, use CAT tools to help with my translation tasks, so this is not at all the main criterion. We're focusing on a solution that gives quality translation from MT, and when the translation result has to be sent outside, it will go through post-editing by professional translators.

Thanks Jeff, for all this info but unfortunately all the links in your links are dead...

So, do you also know those corporate solutions?

And what is your personal workflow? Do you run MT and then post-edit it (with or without CAT tools)?


 

Jeff Allen  Identity Verified
France
Local time: 06:17
Multiple languages
there are various workflows for MT May 19, 2010

Yes, I do know the enterprise/corporate level solutions too. See my Proz and LinkedIn profiles.
Each vendor has a different range of intranet and server-based systems. They are not always the same.

I just did a spot check of the embedded links in the links I sent you, and most do work for SYSTRAN, Reverso and PROMT. It is true I haven't rechecked all embedded links in all the files, because keeping up with everyone else's changed and broken links is an endless task.

As for MT postediting, see my MT postediting site:
MTPostediting website v27
https://www.box.net/shared/s1xhg3eioy

Also my Controlled Languages website v6:
https://www.box.net/shared/qug5r1m9tp


There is no single one-size-fits-all processing and postediting cycle. I've implemented many different types of workflows from small desktop solutions to several of the most complex corporate set-ups put into place over the past 15 years.

Much depends on which specific system type you choose, depending on your requirements, and then implementing it based on your needs and resources. You need the solution that best matches your needs, and/or that will allow you to adjust your own needs with the least amount of disruption to existing processes and expectations.
Each system has its pros and cons depending on the analysis of your full set of needs, setup, configuration, etc.

My statement is that all solution set-ups always require postediting and/or pre-processing. Nothing is a simple push-button solution.
The more you do to improve quality at upstream stages, the less you need to do downstream as a post-processing task.

You are based in France. Where? Maybe we can talk/meet?

Jeff



 
kenhoo
France
Local time: 06:17
TOPIC STARTER
No problem Jeff May 19, 2010

Jeff Allen wrote:

I just did sporadic checking of the embedded links in the links I sent to you and most do work for SYSTRAN, Reverso and PROMT. it is true I haven't rechecked all embedded links in all the files because that is an endless task to cover everyone else's changed and broken links.


No problem, I know these issues... It is just that all the links I clicked gave me a 404 :/


 

renatob
Local time: 00:17
English
We use Asia Online May 21, 2010

Asia Online is a customizable hybrid (rules + SMT) engine.

We have been using it in Milengo since 2009 and currently have several projects under way. Each engine is customized for a specific domain and this helps to focus on quality. The customization process gives us as an LSP the control we need to deliver high quality translations for our projects.

Unlike Systran, there is no limit on dictionary size or data size. In tests we have seen and in our own evaluations, Asia Online delivers quality that is focused on the customer's specific needs and thus requires less human editing to reach release quality. Asia Online is more suited to specific domains than to a general one-size-fits-all engine, which might be your case here.

One key difference with Asia Online is the feedback loop: as feedback is given, the quality continues to improve. Over time, as the quality improves, less editing is required, leading to greater productivity. Asia Online also allows control of style and grammar, as well as vocabulary choice, which none of the rule-based solutions such as Systran do. This means we can stylize to our clients' needs, which again reduces the amount of editing required.

In our experience, with the engines that we have trained with Asia Online, we can provide very good quality gisting and from there to publishing is just a post-editing exercise. We are training more staff in MT post editing.

I recommend that you take a look at it.

Renato Beninatto
CEO and Chief Instigator
Milengo Ltd.


 

Jeff Allen  Identity Verified
France
Local time: 06:17
Multiple languages
rule-based systems can be tuned to grammar, style, immediate feedback, and iterative PE May 21, 2010

renatob wrote:
Unlike Systran, there is no limit on dictionary size or data size. In tests we have seen and our own evaluations, Asia Online delivers quality that is focused on the customers specific needs and thus requires less human editing to make it release quality. Asia Online is more suited to specific domains than to a general one-size-fits-all engine. Which might be your case here.


Renato,

Customized domains are also very possible in Systran, PROMT, and Reverso v4/v5. I have worked with customers on SYSTRAN products who are heavy-terminology creators with hundreds of thousands of terms. It is possible.
I have never run into terminology limits, but then I take an approach of terminology optimization rather than pumping up a system with unnecessary entries.

renatob wrote:
Once key difference with Asia Online is the feedback loop, where as feedback is given, the quality continues to improve. Over time as the quality improves, less editing is required, leading to greater productivity.


renatob wrote:
Asia Online also allows control of style and grammar, as well as vocabulary choice, which none of the rules based solutions such as Systran do. This means we can stylize to our clients needs which again reduces the amount of editing required.


It is very possible to control style and grammar in rule-based systems. I do this all the time and have created very customized projects in very short periods of time (e.g., 1 day for a client with specific style needs) with such tools. Some of the rule-based systems even have built-in features to handle these needs. As most people don't take training on such systems, they never learn to master the tools in this way. I can usually modify the style and grammar as desired.

The feedback loop in Systran, PROMT and Reverso, and many other rule-based systems is faster than for SMT-based systems. It is possible to have "immediate" improvement in the system (within 30 seconds) rather than waiting to retrain the system, as is the case for SMT systems in general.

renatob wrote:
In our experience, with the engines that we have trained with Asia Online, we can provide very good quality gisting and from there to publishing is just a post-editing exercise. We are training more staff in MT post editing.


I have been doing the same with several rule-based MT systems, with iterative improvement phases, for many years. The timeframes for each phase of each project are indicated in my case studies available online.

Jeff


 

Dion Wiggins
Local time: 12:17
English to Thai
Response to Jeff on Asia Online comments May 22, 2010

Jeff Allen wrote:

Customized domains are also very possible in Systran, PROMT, and Reverso v4/v5. I have worked with customers on SYSTRAN products who are heavy-terminology creators with hundreds of thousands of terms. It is possible.
I have never run into terminology limits, but then I take an approach of terminology optimization rather than pumping up a system with unnecessary entries.


Jeff, yes they are possible, but not to the same degree. For example, we recently added 400,000 technical terms, all unique and unambiguous, to one of our custom translation engines in the patent domain, which resulted in terminology accuracy for patents in the high 90s. This is not a case where terminology optimization/normalization applies; each term is unique. This simply is not possible in Systran, which was also evaluated by the same customer prior to awarding the contract. With respect to limits, please see Systran's own documentation on their website, which explicitly states their size limits for dictionaries and other customization. The level of customization offered by Asia Online is currently unique. For example, consider the following 2 domains in the same language pair, ES->EN (most of the training data is also the same):

Spanish Original Before Translation:
Se necesitó una gran maniobra política muy prudente a fin de facilitar una cita de los dos enemigos históricos.

Business News Domain After Translation:
Significant amounts of cautious political maneuvering were required in order to facilitate a rendezvous between the two bitter historical opponents.

Children’s Books Domain After Translation:
A lot of care was taken to not upset others when organizing the meeting between the two long time enemies.

The goal with Asia Online is not to deliver a good translation; it's to deliver a translation that requires the least amount of human edits in order to publish. If the client were The Economist, they would not be happy with the Children's Books domain; they would want the Business News domain. But Asia Online can control the style even further and offer different business news styles: if the client were The Economist, Forbes or the New York Times, each would have a different style specific to its style guidelines. The effort put into customization up front reduces the effort of human post-editing later.

Jeff Allen wrote:
It is very possible to control style and grammar in rule-based systems. I do this all the time and have created very customized projects in very short periods of time (ie, 1 day for a client with specific style needs) with such tools. Some of the rule-based systems even have built in features to handle these needs. As most people don't take training on such systems, they never learn to master the tools in this way. I can usually modify the style and grammar as desired.


We have done comparisons with LSPs who have been building Systran customizations for multiple years, are highly skilled and have full training - some even train others. Even if we choose an easy language pair such as EN-ES, which Systran is strong in, our first engines out of the gate, before improvement processes begin, are of vastly higher quality and much closer to the output style the client requires in order to publish. We take 3 weeks' turnaround to deliver better quality than 2+ years of Systran customization by an LSP, some of whom are major Systran advocates with extensive training.

Asia Online offers free measurement tools that you can use to compare quality, using both human measurement techniques such as SAE J2450 or the LISA QA Model, and automated techniques such as BLEU, F-Measure and TER. We encourage all our customers to measure, and they often compare engine output from their customized Systran systems, Google, Asia Online, etc. You would be surprised how often Google beats even a custom rules system that has had several years' work by professionals who know how to customize.
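
For readers unfamiliar with the automated measures named above, here is a minimal Python sketch of two of the underlying ideas. It is not Asia Online's tooling and not a standard implementation: `unigram_f_measure` and `word_edit_rate` are hypothetical names chosen for illustration. The first is an F-measure over single-word overlap; the second is a word-level edit distance divided by reference length, which is only the core of TER (real TER also allows block shifts). BLEU, which scores n-gram precision with a brevity penalty, is not shown.

```python
from collections import Counter


def unigram_f_measure(hypothesis: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall (illustrative only)."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0
    # Clipped overlap: each reference word can be matched at most once.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance / reference length.

    A simplification of TER: insertions, deletions and substitutions
    are counted, but block shifts are NOT modelled.
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must not be empty")
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a missing reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    mt_output = "the two old enemies met after careful planning"
    post_edited = "the two long time enemies met after careful planning"
    print(unigram_f_measure(mt_output, post_edited))
    print(word_edit_rate(mt_output, post_edited))
```

Scoring raw MT output against its post-edited version, as in the usage lines, gives a rough proxy for how much editing was needed; a lower edit rate means fewer word-level changes.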

Jeff Allen wrote:
The feedback loop in Systran, PROMT and Reverso, and many other rule-based systems is faster than for SMT-based systems. It is possible to have "immediate" improvement in the system (within 30 seconds) rather than waiting to retrain the system, as is the case for SMT systems in general.


You are correct, Asia Online does not do feedback improvements in 30 seconds. We take a few hours and batch improvements together. Asia Online does not require a full retraining, we do incremental training for improvements. But our improvements are based on actual editing feedback (i.e. the edits made by translators/post editors), not dictionary work. This means you don't need linguists, you need translators/post editors.

Asia Online has an approval process, which maps to the LSP's approval process, so that only edits approved by the proofreader or editor are submitted. The 30 seconds you refer to does not take into account the work that the linguist had to do to determine the error and come up with the dictionary terms. It also does not take into account the time spent on training that you mention that there is a lack of. With Asia Online, the improvements are part of the standard translation process: training is not required in order to get improvements, which come directly from the translators'/post-editors' edits to the machine translation output. Asia Online's feedback loop is designed to map directly to the same feedback loop, processes and workflow that LSPs use for human translation.

Jeff, Asia Online is a hybrid rules and SMT system and uses both extensively - many modern MT systems are. It is important to distinguish between vanilla SMT and a more evolved platform. Asia Online has offered you the opportunity to try our tools multiple times, but you have yet to take us up on the offer, and thus have never tried our technology. SMT from even a couple of years ago is not the same as modern SMT today. Many vendors have taken it well beyond what you are frequently describing. Again, I invite you to pilot our systems, update your knowledge and see for yourself.

Regards

Dion Wiggins
CEO
Asia Online


[Edited at 2010-05-22 02:23 GMT]

[Edited at 2010-05-22 02:31 GMT]

[Edited at 2010-05-22 02:32 GMT]


 

Kirti Vashee  Identity Verified
United States
Local time: 21:17
Run Time Glossary on Asia Online May 25, 2010



Jeff Allen wrote:
The feedback loop in Systran, PROMT and Reverso, and many other rule-based systems is faster than for SMT-based systems. It is possible to have "immediate" improvement in the system (within 30 seconds) rather than waiting to retrain the system, as is the case for SMT systems in general.



There is also a concept of a run time glossary at Asia Online that can give you an immediate impact on the MT output and technically if you wanted to just show how a particular phrase would be translated a certain desired and specific way, you could do this in 30 secs. But is this really the point?

However, to develop robust and continuously improving MT engines this term over-ride strategy is one with very short term rewards. It is usually much more useful to do this more carefully so the system continues to evolve at a more fundamental level. This is more akin to developing a good database architecture so that it works well with common queries and requests on an ongoing basis.

Many of your assertions suggest that you do not have very much experience with a current SMT platform. I think that you would find that the "benefit to user corrective action ratios" are much higher with a good SMT platform like Asia Online. Also, we have interacted with some of the most experienced and MT savvy companies in the world, and many of them have gone through a moderately successful RbMT phase - and most see the quality and robustness of SMT as a clear step forward.

Google and Microsoft also used heavily customized Systran engines but opted for the SMT approach, and you can see today that they both improve at a pace that was never possible with their dictionary-focused approach on Systran. Even IBM now seems to have admitted that its 35 years of RbMT work is quite some distance behind its RTTS SMT capabilities.

I think RBMT is a more comfortable approach (paradigm) for many language industry professionals, but as SMT tools become more linguistically oriented, I think we will see more acceptance by LSPs as they begin to understand that there is a clear and important linguistic feedback role with hybrid SMT platforms like Asia Online. For example, Milengo is rapidly developing expertise on this platform and, at the end of developing an engine, will have a much more strategic asset and capability than a single project-oriented tool. I have noticed of late that RbMT seems to have a strong focus on projects while SMT tends to be more like setting up a production line that can be continually refined and improves with each cycle of use.



[Edited at 2010-05-26 17:37 GMT]

 

Jeff Allen  Identity Verified
France
Local time: 06:17
Multiple languages
Jeff replies to Asia Online about MT Jun 25, 2010

Dion and Kirti,

Let's put this thread back into the context of the original post.

* A request was made about several different specific MT software/systems

I only commented on what 2 of the 3 specifically mentioned systems actually do offer. Nowhere in my reply above did I say that general SMT systems or even customizable SMT systems (like Asia Online) are bad, inappropriate or useless. And in fact, the only mention of Asia Online in one of my posts was a generic statement:

Jeff Allen wrote:
AsiaOnline is primarily a customizable Statistical Machine Translation (SMT) solution for corporate level enterprise buyers, which has recently offered a special module for allowing freelancers to use the tool and participate in user feedback improvement loops. This is a very recent offer, and I am not aware of any independent reviews of it.


I don't think that this single short statement (which is based on everything that you both have posted on various discussion forums) compared with in-depth facts about other systems that I have explained in this thread, warrants the title of Dion's post which clearly implies that I had specifically discussed Asia Online in detail in my post. I did not do that at all.

* know the audience:

How does your level of participation in ProZ line up with familiarity with this specific audience, the exact specific needs of certain members, their workflow, their configurations?

And your profiles:

Dion, your profile says you have been registered on ProZ since Dec 2006, profile content is completely empty, no information on number of months or years of experience in translation.
A quick search (on both your alias and your ID) on the number of your posts on ProZ bring up the single post above for 3 1/2 years of time. And 0 questions asked or answered on the site.

Kirti, your profile at least contains information: registered on ProZ since July 2009 (so just a year now) with a total of 17 posts. And 0 questions asked or answered on the site.

So, a combined total of under 20 posts in a combined total of 4.5 years of membership.

Yet, I have been an active member of this specific site since Oct 2004 (5.5+ years) with 600+ postings of which nearly 350 appear in searches simply on the keywords "machine translation tm cat". These posts provide tangible explanations and replies about translation technologies in general, as well as about specific systems. To my knowledge, every question directed at me has been answered. I have participated in nearly every thread that has appeared in this MT forum.
I claim to know the audience in the threads, because I have done their jobs, have attended ProZ pow-wows, have met ProZ site users in person with whom there have been previous discussion forum exchanges, and also know many regularly posting ProZ members by name, their specific language pairs, their fields of expertise, and in some cases their specific operating system and configuration needs.

And I am a "verified site user" via official ProZ identity check by a person with experience working in the MT industry.

* my statements are based on certified, recognized expert level (not just "so-called trained"), real practical, first-hand experience on using, testing, evaluating, deploying over 30+ released-to-customer, major and minor versions of several product brands that were formally released (+ many build level versions). For several of these products, I hold official expert certification with documented proof that I have mastered a set of specific criteria necessary for customer facing training and deployment of a system to users.

* My statements about these systems are also based on a set of proven, published software reviews which were conducted "independently" of any affiliation with the marketing and sales efforts of those vendors, and based on benchmarked datasets which are used to compare apples with apples. Other MT users have used my methods and have made recommendation statements on my LinkedIn profile about the efficiency and productivity that was attained in their daily professional translation work.

This is a bit different from your stance of being a product/service vendor with implicit vested interest in your offer, which of course means it is biased. Any of the members of this forum would say the same of representatives coming from even the TM / CAT solution providers.

* Renato made an independent recommendation of your toolset. This is good, because it did respond to the original question of the thread. His statements said that SYSTRAN and other RBMT systems cannot do X or Y, and I simply pointed out the inaccuracies of these statements by providing exact info on functionality and features that actually do exist in "several different" RBMT systems which can allow users to achieve the expected results.

But I ask again as I did above in this thread, can you point to independently conducted software reviews of Asia Online which correspond specifically to freelance professional translator needs with regard to project sizes and expectations?

The types and sizes of projects corresponding to the present audience are clearly indicated in the following link at a LinkedIn thread in which you have participated, where I specifically cited a series of polls here in ProZ which reflect the needs of this audience with a significant amount of participation per poll.
https://www.box.net/shared/ags9cp49o1

Now to a few specific points in your own statements which are less than accurate:

- terminology optimization/normalization

dionwiggins wrote:
terminology accuracy for patents in the high 90's. This is not a case where terminology optimization/normalization applies, each term is unique.


MT optimization is not harmonization at all. These are two completely different methods which can be applied independently or can be combined. These are explained in published online case studies and in posts. If anyone decided that they needed to create 400,000 unique terminology entries in a high-end RBMT system, I would seriously question the level of knowledge transfer training that was intended to prepare the participants for the deployment, and I would request an audit and evaluation of the methods used. It is exactly because of this that I put optimization techniques into place, in order to reduce the risk of over-engineering an RBMT dictionary. This came from having worked in a very highly specialized terminology context, for which figures have been published about the terminology work of that project.
I have also worked on very big projects with massive terminology needs and have helped put into place terminology harmonization and standardization processes for some cases, and in other cases used other techniques to achieve similar results without requiring the standardization upfront or midstream.
But you are conflating two completely different methods (optimization and harmonization) which can be performed separately.
Do you or anyone at Asia Online have experience on these specific techniques? I might be wrong, but it seems that you are claiming that none of this is possible with RBMT systems without having ever tried to do it with such systems. Optimization and standardization with RBMT systems (like Systran, PROMT, Reverso) has been done before both separately and combined, and has been done successfully. This might not have been achieved by the customers or LSPs with whom you have been talking. But it does not mean that this has never been done. So, maybe you need to talk some with those who have done it successfully.

- SYSTRAN terminology size limits

dionwiggins wrote:
please see Systran's own documentation on their website, which explicitly states their size limits for dictionaries and other customization.


I know that information very well because I participated in the writing of that source content for v6, and I translated all of the marketing brochures, boxes and website content for v6 using MT, MT dictionaries, and MT postediting with excellent productivity times and linguistic quality results.

- customization

dionwiggins wrote:
... dictionaries and other customization. The level of customization offered by Asia Online is currently unique.


You are again using the same term to mean different things. Customization with RBMT tools is a very different type of method than what is done within customizable SMT engines. They are conducted in different ways, with different techniques, to achieve different types of intermediate deliverables, before these are applied to different types of processing stages. Even the SYSTRAN hybrid approach, explained in a demo and Q&A session on 3 Dec 2009, uses a modified customization technique with regard to their RBMT-only approach for v6 and lower.

I have even developed techniques to convert RBMT type terminology customization entries to correspond to current SMT customization models.

- LSP(s) with Systran customization experience

dionwiggins wrote:
We have done comparisons with LSPs who have been building Systran customizations for multiple years, are highly skilled and have full training - some even train others.


You mention that there are LSPs who have used SYSTRAN, but I don't see any indication that you have done it yourself. So, how could you know whether this is possible or not, if at best your knowledge of the SYSTRAN system toolset is only 2nd hand (based on your statement here that LSPs had done it) and likely even 3rd hand, because many LSPs use independent freelancers who are not full-time internal employees?
How do you really know that they had reliable training and verified checks on the skills that are found in those who do achieve successful results?

- terminology timeframes

dionwiggins wrote:
The 30 seconds you refer to does not take into account the work that the linguist had to do to determine the error and come up with the dictionary terms. It also does not take into account the time spent on training that you mention that there is a lack of.


kirti vashee wrote:
There is also a concept of a run time glossary at Asia Online that can give you an immediate impact on the MT output and technically if you wanted to just show how a particular phrase would be translated a certain desired and specific way, you could do this in 30 secs. But is this really the point?
However, to develop robust and continuously improving MT engines this term over-ride strategy is one with very short term rewards. It is usually much more useful to do this more carefully so the system continues to evolve at a more fundamental level. This is more akin to developing a good database architecture so that it works well with common queries and requests on an ongoing basis.



Well, it really doesn't take that much extra time at all, and all of my examples of dictionary creation work are "documented" in published case studies, and I give all the time factors, "including the identification and research phases". So it might do you some good to go read them.
And all the 30-second add-ins actually do matter, and are quite pertinent. This is not just simply a small-scale, short-term, over-ride strategy, but also part of an overall, scalable strategy. If you haven't done it, then it may only superficially seem like an over-ride. If done well, it is an approach that brings the least amount of disruption to existing translation cycle processes in place, and which can also be implemented in a phased approach with SMT systems of whichever flavor.


- Project vs production line:

kirti vashee wrote:
RbMT seems to have a strong focus on projects while SMT tends to be more like setting up a production line that can be continually refined and improves with each cycle of use


There does not need to be such a distinction. All of my papers have shown how the tasks are part of a full production process and how projects (small, big, single, multiple, recurrent, etc) fit into that.
However, one clear need expressed by users in this overall forum is how "translation tools" in general fulfill the needs of the professional translators who are working on projects. This is all over the TM/CAT tools forum areas.


* time to do the dictionary work

Well, here are the exact statistics in terms of numbers and time, which come from my published case studies:

JeffAllen-Is-MT-dict-worth-effort-AutomatedTranslation-LinkedIn-Oct-Dec2009.doc
https://www.box.net/shared/1ut5q888oe

and I provided a clear written disclaimer as follows about how it was done :

"The manual approach to corpus analysis was intentional."

&

"The upfront text analysis and dictionary development can be conducted in a short period of time, even conducted manually without the help of (semi-)automated terminology extraction tools."

This full end-to-end process of the dictionary customization work was contained in these timeframes.

This was done to clearly show that any human translator can do this (without any computational skills) by just using MS Word, MS Excel and the MT tool. And it is very possible to significantly automate the tasks of terminology identification, selection and batch processing, which I have done on other projects and am planning for other new projects of 1+ Million words.
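
As a rough illustration of what (semi-)automated terminology identification could look like, here is a minimal Python sketch. It is not the process used in the case studies: the function name `candidate_terms`, the stopword list, and the frequency threshold are all assumptions made up for this example. It simply counts repeated word sequences in a source text so that a translator can review the most frequent candidates before deciding which ones deserve an MT dictionary entry.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = frozenset({"the", "a", "an", "of", "to", "and", "in", "is", "for"})

def candidate_terms(text, max_len=3, min_count=2, stopwords=STOPWORDS):
    """Return (term, count) pairs for word sequences of 1..max_len words
    that occur at least min_count times and do not start or end with a stopword."""
    # Lowercase the text and keep alphanumeric words, allowing internal hyphens.
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip sequences such as "the print" or "server to".
            if gram[0] in stopwords or gram[-1] in stopwords:
                continue
            counts[" ".join(gram)] += 1
    # Most frequent candidates first, filtered by the minimum count.
    return [(term, c) for term, c in counts.most_common() if c >= min_count]
```

Calling `candidate_terms(source_text)` on the text of a document returns a ranked list that could be pasted into a spreadsheet for review. The selection of terms and the target-language equivalents remain a human decision.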

* training

Well, the amount of training time is quite negligible (a few hours for overview, good getting started skills and then advanced skills) and is transferable knowledge from one RBMT system to another, and also to KBMT and EBMT systems, and even to current SMT customization systems when using special repurposing techniques.

Last month I trained someone in 1 hour on the overview with some basic techniques. I have previously trained a number of people on basic and advanced modules in less than 3 hours, including review and feedback comments on their dictionary entry work. See my
Again, those who have been trained in these techniques have openly stated how the rapid knowledge transfer and mentoring helped them achieve excellent productivity, and sometimes they even publish examples of their results.
My LinkedIn profile contains recommendations from a range of freelance and in-house translators, localization production engineers, translation account managers. The proof is in the pudding. It's that simple.

- objective of translation

dionwiggins wrote:
The goal with Asia Online is not to deliver a good translation, it's to deliver a translation that requires the least amount of human edits in order to publish.


Well, I'm not really sure how such a statement corresponds to how the present audience perceives translation.

My goal is to show translators and bilingual speakers of all different levels of experience how they can achieve different levels of translation deliverables of different usable levels of quality (draft, 1st level edit, TEP quality), with the least amount of overall work across the entire lifecycle of the project in terms of research, analysis, thinking, clicks, keystrokes, to achieve the highest level of volume + efficiency + quality which correspond to the requirements of the end customer, and irrespective of how big the project is (1000 words or 2 Million words).
And every entry is evaluated (which takes very little time) with regard to its value in improving the translation that results the next time one pushes a button called "translate", which can even be 5 minutes later.

- what do you mean by "linguist"?

dionwiggins wrote:
This means you don't need linguists, you need translators/post editors.


Your use of the term linguist is not clear here. The audience on this list consists of professional translators who actually do refer to themselves as linguists. Just do a search in the ProZ discussion forums on the word "linguist".

If you mean that one needs in-depth theoretical knowledge of linguistics with regard to specific theories of syntax and semantics, this is not a prerequisite at all for doing anything I am talking about concerning working with any released version of a commercial RBMT system. I can teach my dad (a medical technologist) and my sister (a sociologist) these principles and they can also immediately apply them.

If you mean that one needs to be a "computational linguist", then again this is not a prerequisite. All a user needs is a basic text editor, the RBMT tool, and it helps to have a tabular spreadsheet editor like Excel (or a clone application, as mentioned in one of my case studies on dictionary building). It is possible to add the skills of computationally focused people (linguists or not) to help (semi-)automate specific processes in order to improve the workflow, but this is not required. So, the methods I have developed are flexible, and do not require any such skills if the user or user group does not want to hire people with such highly specialized skills, but they can if they want to.

- mentioning RBMT system does not mean anti-SMT

It's not because I make a comment in a post about what an RBMT system "can" do that this should be interpreted that I am making a statement "against" SMT systems in general or against any specific SMT system. I simply answered a question and clarified some misunderstandings about what RBMT systems can really do. This is not from just having read some marketing brochures or having attended a demo session at a conference. It is from having personally used the tools, from having deployed them in many different contexts, from having trained others on a wide variety of system types, brands and versions on a regular basis for over 15 years.

Regular posters in these ProZ threads can testify that I have described and spoken in favor of SMT systems in discussion forums here on this site for nearly a decade.

Please recall that I had stated this same fact to you already in:

page 1-2 of:
JeffAllen-SMT-not-always-appropriate-for-all-projects-AutomatedTranslation-LinkedIn-15April2010.doc
https://www.box.net/shared/v801hv74sh

I was talking about MT on professional translator lists before either of you probably knew what SMT was, and maybe even MT in general. But not much stated here in the way of "thanks for having spoken in favor of SMT systems for so many years on this list, in order to pave the path for us to describe our customized SMT technology".

Again, nowhere have I said that one type of system or approach is better than another. I look at the need of the person who is asking the question, their context, their existing processes, the different types of tools and processes which can quickly adapt to existing internally used tools and processes (thus being as minimally disruptive as possible), etc.


- testing the Asia Online solution

dionwiggins wrote:
Asia Online has offered you the opportunity to try our tools multiple times, but you have yet to take us up on the offer, and thus have never tried our technology. SMT from even a couple of years ago is not the same as modern SMT today. Many vendors have taken it well beyond what you are frequently describing. Again, I invite you to pilot our systems, update your knowledge and see for yourself.


Well,

1) you might recall that I did in fact already reply to you on the same statement you made elsewhere on this topic.

same link as above, but pages 2-3:
JeffAllen-SMT-not-always-appropriate-for-all-projects-AutomatedTranslation-LinkedIn-15April2010.doc

2) As an experienced and professional software and system tester (and having managed teams of professional testers in several software companies), I don't conduct testing of translation software/systems in an ad-hoc way. All of my software reviews over the past decade have appeared in peer-reviewed magazines. As these software reviews are intended to appear in such reputable publications (eg, for TM tools, for Localization QA software, for Authoring tools, and for several other MT software tools), I always arrange the testing through a contact at the magazine, and set up the conditions of the review cycle, the tool access, the duration of testing and review write-up, the bug reporting and enhancement request process, the support SLAs for my requests, and many other items, via the magazine. My reviews are benchmarked on a set of baseline content and an item checklist, so each one takes an average of 20-40 hours of my time to perform.

I have already considered the idea of testing Asia Online, yet I provided reasons for not doing it yet in my reply cited above.

So, I encourage you to get informed by talking to magazines and journals about their software review process and/or with the other translation tool vendors (who regularly have their tools evaluated in this way) about their experience and what is necessary to put into place to have their tools reviewed by such in-depth professional testers.

Since there are several other translation-related product reviews on the planning board for my time (2 were requested before my Haiti Disaster Relief projects started, and 2 since), they all get priority for any free time that I may eventually be able to dedicate to such evaluations.

Jeff



[Edited at 2010-06-25 22:21 GMT]

