Several Questions for MT users
Thread poster: NR_Stedman
NR_Stedman  Identity Verified
France
Local time: 09:58
French to English
Nov 11, 2010

I have been battling with Systran Premium for many years. It can increase my output by up to 20 or 30 % but certainly nothing like the 90 % claimed by someone here for Prompt. Could someone else confirm this vast superiority of Prompt??

I find that despite using several giant customised dictionaries I am not getting an overall better translation than Google translate. The mistakes are different, with better technical compliance with Systran but worse overall translation quality. Obviously despite GT being incorporated in Trados studio 2009, I am not using it for confidentiality reasons. Could someone tell me if the corrected target segments are uploaded back into Google translate from Trados studio for future use?? They may not be doing this at the moment but will this happen in the future?

At the moment it seems that you can't use either Prompt or Systran with Trados studio 2009. This doesn't really matter as studio 2009 is such a bug-ridden piece of software anyway!

Concerning Systran: making customised dictionaries is EXTREMELY time consuming and not always effective. Systran sometimes overrides your personal choices: for instance it often insists on translating the French word "dose" as "amount" in English or "tissu" as "fabric" despite correct entries in my customised dictionaries. It does not translate words before a comma or semi-colon correctly. The Systran dictionaries often give hopeless translations of very simple words, obliging you to enter thousands of extra words in your customised dictionaries. Google is far better at distinguishing between nouns and verbs, putting adjectives before nouns and translating compound nouns.

To get the edge back from Google it is essential for Systran and Prompt to devise a system for rapidly entering a word and all its variants into the customized dictionary with accurate grammatical coding. The context feature could be vastly improved to allow you to enter lots of contextual info.

The best strategy is to enter words from the document into the dictionaries as you translate rather than, as someone suggested in this forum, enter them "en masse" at a later date. This is because documents always have their own very specific vocabulary and lots of the words you enter will basically only be useful for the current document.
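
To illustrate the "enter as you translate" approach, here is a minimal Python sketch that lists the words recurring in the current document which are not yet in a custom dictionary, so they can be entered while the job is in progress. It is only an assumption about how one might organise this: the function name `missing_terms`, the `min_count` threshold and the plain word list standing in for a dictionary are all hypothetical, and nothing here reads or writes an actual Systran or PROMT dictionary.

```python
import re
from collections import Counter


def missing_terms(document_text, known_terms, min_count=2):
    """Return (word, count) pairs for recurring words not yet in the dictionary.

    known_terms is a plain list of words already entered (a hypothetical
    stand-in for a real custom dictionary export).
    """
    # Letters only (Unicode-aware), allowing internal hyphens.
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", document_text.lower())
    counts = Counter(words)
    known = {term.lower() for term in known_terms}
    return [
        (word, n)
        for word, n in counts.most_common()
        if n >= min_count and word not in known
    ]


if __name__ == "__main__":
    doc = "La dose de tissu est faible. La dose finale dépend du tissu."
    for word, n in missing_terms(doc, known_terms=["la", "de"]):
        print(word, n)
```

A list like this only tells you which words to look at; the grammatical coding of each entry still has to be done by hand in the MT tool.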

Thanks for any feedback


Claudio Porcellana  Identity Verified
Italy
confidentiality reasons? Nov 11, 2010

and why?

if you use GT with SDL or Wordfast, for example, you send ONLY the source to Google, exactly as you do every day when googling for data, and if a Google search is not forbidden by your NDA, there is no reason for GT use to be forbidden

if you use the Translator's Toolkit, you can:
choose to use the Google general memory, so you may be feeding it with your translation
OR
choose to use your own memories, which you can decide NOT to SHARE via the relevant option, so you are NOT feeding the Google SMT with your translation AT ALL

this is what I understood, and tested, reading rules and using both tools

as for the test: I translated an original sentence in the Translator's Toolkit and checked every now and again whether the translation appeared as a 100% match

BTW, if you don't encrypt your emails, don't expect that nobody will read them ...
;-D

thanks anyway for sharing your Systran experience, as I'm thinking of buying some commercial MT, but I'm still quite doubtful


I think that Systran is a rule-based MT, while GT is a statistical one and so takes advantage of zillions of sentences: this can explain the differences in performance and usability

Claudio

[Edited at 2010-11-11 18:13 GMT]



Jeff Allen  Identity Verified
France
Local time: 09:58
Multiple languages
yes, confidentiality still applies even for source Nov 12, 2010

Claudio Porcellana wrote:

confidentiality reasons?

and why?

if you use GT with SDL or Wordfast, for example, you send ONLY the source to Google, exactly as you do every day when googling for data, and if a Google search is not forbidden by your NDA, there is no reason for GT use to be forbidden

if you use the Translator's Toolkit, you can:
choose to use the Google general memory, so you may be feeding it with your translation
OR
choose to use your own memories, which you can decide NOT to SHARE via the relevant option, so you are NOT feeding the Google SMT with your translation AT ALL



Claudio, for the past 15 years, whenever I use any online search or translation facility, I always mask/hide any parts (names, product names, etc) of the source text submission that could be memorized and reappear. There are ways to do this quickly and efficiently.

The confidentiality clauses of the NDA do cover Google searching too because using any search engine which retains the query is in fact diffusing the data. It's how you do it which maintains confidentiality or not.

If you submit source content to GT (via any means, including with plug-ins within other tools) for it to translate for you, and they state that anything you submit is retained in their archives, then you are in fact releasing the data.
Are you releasing parallel source and target finished translations? No

Are you releasing the source content which is the basis of the confidentiality to start with: Yes

I have heard second-hand stories of companies (via in-house or external translators) which used Google to translate entire manuals for brand new products, and then, when independent Google searches were done, content about confidential new features of the product line turned up in the search results. I'm checking the validity and source of that information, which would be a very important case in point on this topic.

I simply use a variety of locally installed desktop MT applications, and only use GT or other online for languages which are not represented by the ones I've got on hand (and I've got world pack versions). And when I do use GT or other tools, I'm very careful about obfuscating the information especially with regard to people names, product names, company names and other trademark and copyrighted related info.
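
As a rough illustration of the masking idea described above, here is a minimal Python sketch that swaps sensitive names for neutral placeholders before the text goes to any online tool, and swaps them back in the returned translation. It is only a sketch under assumptions: the function names `mask_terms` and `unmask_terms`, the `XX0XX` placeholder format and the sample product and company names are all hypothetical, and an MT engine may still alter or drop a placeholder, so the output needs checking.

```python
import re


def mask_terms(text, sensitive_terms):
    """Replace each sensitive term with a placeholder; return text and mapping."""
    mapping = {}
    masked = text
    # Longest terms first, so a long product name is masked before any
    # shorter term contained in it.
    ordered = sorted(sensitive_terms, key=len, reverse=True)
    for i, term in enumerate(ordered):
        placeholder = f"XX{i}XX"
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        if pattern.search(masked):
            masked = pattern.sub(placeholder, masked)
            mapping[placeholder] = term
    return masked, mapping


def unmask_terms(text, mapping):
    """Put the original terms back in place of their placeholders."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


if __name__ == "__main__":
    source = "Le Turbo 3000 de Acme Corp sera lancé en mars."
    masked, mapping = mask_terms(source, ["Acme Corp", "Turbo 3000"])
    print(masked)                         # what would be submitted online
    print(unmask_terms(masked, mapping))  # restored after translation
```

The mapping stays on the local machine; only the masked text is submitted.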

Jeff



Jeff Allen  Identity Verified
France
Local time: 09:58
Multiple languages
comments on SYSTRAN and PROMT Nov 12, 2010

NR_Stedman wrote:

I have been battling with Systran Premium for many years. It can increase my output by up to 20 or 30 % but certainly nothing like the 90 % claimed by someone here for Prompt. Could someone else confirm this vast superiority of Prompt??


PROMT (aka ProMT) (Prompt is actually a different thing, being the MT postediting method used by the company CLS)

I've claimed up to 250-300% productivity gains in some projects and have made many stats available in the published project case studies. Lorena Guerra, in her MA thesis and related articles, indicated something like 200%. Her thesis and papers are online.


NR_Stedman wrote:

I find that despite using several giant customised dictionaries I am not getting an overall better translation than Google translate. The mistakes are different, with better technical compliance with Systran but worse overall translation quality. Obviously despite GT being incorporated in Trados studio 2009, I am not using it for confidentiality reasons. Could someone tell me if the corrected target segments are uploaded back into Google translate from Trados studio for future use?? They may not be doing this at the moment but will this happen in the future?


It's all in the methodology used to make the dictionary entries. I have different methodologies for working on dictionaries for rule-based MT software vs for statistical based MT. They are both easy to learn and keep separate when working with one or another.

NR_Stedman wrote:

At the moment it seems that you can't use either Prompt or Systran with Trados studio 2009. This doesn't really matter as studio 2009 is such a bug-ridden piece of software anyway!


NR_Stedman wrote:

Concerning Systran: making customised dictionaries is EXTREMELY time consuming and not always effective. Systran sometimes overrides your personal choices: for instance it often insists on translating the French word "dose" as "amount" in English or "tissu" as fabric despite correct entries in my customised dictionaries.


The PROMT-based software (v3-v5 of Reverso) and PROMT itself, from v3 up to the current v9, have always had the ability to create translation "alternatives" or "variants". Starting with version 4 there was the ability to show and hide the variants. My review of v5 of Reverso stated that the show/hide feature had disappeared and that nothing was documented about how to use it. Later, after the review, the director of Reverso showed me that it did in fact still exist but was enabled elsewhere. However, I pointed out to him that if a feature is neither documented nor shown to users, the documentation bug leads them to believe there is a software regression. What matters is not the intention but whether the information is communicated so that users can master the new implementation. I believe they fixed the documentation fairly quickly.

NR_Stedman wrote:
It does not translate words before a comma or semi-colon correctly.


There are ways to deal with the punctuation translation in Systran, but it does require a lot of twiddling around and creating special rules in the custom dictionary for it to work well.

NR_Stedman wrote:
The Systran dictionaries often give hopeless translations of very simple words, obliging you to enter thousands of extra words in your customised dictionaries. Google is far better at distinguishing between nouns and verbs, putting adjectives before nouns and translating compound nouns.


Again, it's the methodology. I deal a lot with compound nouns and do everything possible to avoid entering the entire compound as a full entry. My 2006 article on dictionary building explains this.
Some upcoming webinars on ProZ will also cover this.

As for the use of SYSTRAN and PROMT, in my software reviews of SYSTRAN, Reverso and PROMT (available at: http://www.allenkeys2languages.org/language-technology-evaluation/), one of the key features that I have tested from version to version is whether custom dictionary entries override the built-in standard dictionary and appear in the translated text. I wrote about this in my SYSTRAN v4 review and mentioned in a thread on Xing/OpenBC in 2006 or 2007 that the issue I had reported to SYSTRAN development for v4 had in fact not been fixed in v5.



NR_Stedman wrote:

To get the edge back from Google it is essential for Systran and Prompt to devise a system for rapidly entering a word and all its variants into the customized dictionary with accurate grammatical coding. The context feature could be vastly improved to allow you to enter lots of contextual info.

The best strategy is to enter words from the document into the dictionaries as you translate rather than, as someone suggested in this forum, enter them "en masse" at a later date. This is because documents always have their own very specific vocabulary and lots of the words you enter will basically only be useful for the current document.



I've posted info about a special free tool for Systran dictionary uploading here:

T-Manager Terminology tool for importing glossaries into Systran MT
http://tech.groups.yahoo.com/group/SYSTRAN_users/message/153

This can be used for simple single term imports as well as batch imports.

Hope that helps.

Jeff


[Edited at 2010-11-12 08:06 GMT]


NR_Stedman  Identity Verified
France
Local time: 09:58
French to English
TOPIC STARTER
How sure are you about this Jeff? Nov 12, 2010

Jeff Allen wrote:
Are you releasing parallel source and target finished translations? No

I was quite surprised to find Google Translate so closely integrated in SDL Trados 2009 despite these confidentiality issues. The GT translation is automatically loaded into the target segment. How sure are you that your corrected target segment is not sent back when you move on to the next segment?

SDL seems to have deliberately prioritized GT over locally installed MT applications. Why??



Jeff Allen  Identity Verified
France
Local time: 09:58
Multiple languages
It all depends on the MT + TM set-up Nov 12, 2010

Jeff Allen wrote:
Are you releasing parallel source and target finished translations? No


NR_Stedman wrote:
I was quite surprised to find Google Translate so closely integrated in SDL Trados 2009 despite these confidentiality issues. The GT translation is automatically loaded into the target segment. How sure are you that your corrected target segment is not sent back when you move on to the next segment?

SDL seems to have deliberately prioritized GT over locally installed MT applications. Why??


It all depends on the specific TM + MT combination that is being used. In my ProZ webinars on MT, I give a list of many examples of TM and MT combinations and the ins and outs of each type, especially in the case of adapters/plug-ins. And there are even more that I hear of each week.
It all comes down to what each tool provider says it does with content that is submitted online, directly or indirectly.

I've seen a clear statement in the Wordfast forum on how they do it. In the case of adapters, this always requires 2 locally installed software programs (unless they start allowing web interface versions to do the same) to talk together to transfer the content for processing.

In all of these cases, the APIs for GT are very easy to obtain and are documented for use. APIs for the commercial MT vendor companies are not give-aways: they are usually licensed as custom adapters, are part of SDKs (software development kits), and/or are part of enterprise-level MT solutions which require linking the MT systems with other 3rd party applications such as Translation Memory (TM) tools, Globalization/Translation Management Systems (GMS/TMS), terminology systems, source content authoring systems, etc. So it costs money (in one way or another) to get the commercial system APIs.

With SDL's recent acquisition of Language Weaver (a commercial statistical MT system), it is interesting to note that all the information on their previous MT system (Transparent) is very hard to find or maybe no longer existent. There is no longer any information anywhere about the desktop version (Autotrans) of that Transparent MT system.
But it is understandable (I didn't say reasonable) that SDL would not focus on creating and offering adapters to other 3rd party commercial MT systems, because they have had their own MT system (Transparent) as part of their solution portfolio since 2001. It was integrated into their workflow at different times and has been part of their KbT (Knowledge Based Translation) offer to customers since 2004.

Their recent investment of acquiring Language Weaver might be changing that focus.

Jeff


NR_Stedman  Identity Verified
France
Local time: 09:58
French to English
TOPIC STARTER
Thanks Jeff for your exhaustive answers Nov 12, 2010

And I will look into the Systran dictionary building tool.

Claudio Porcellana  Identity Verified
Italy
Several Questions for MT users Nov 12, 2010

thanks for the explanations Jeff

I fed some original sentences into the Google Translator's Toolkit, choosing the Google general memory, but I can't find these translations so far
this is why I still have some doubt that translations are really acquired by them

furthermore, I have signed very few NDAs and none of them named Google (or similar search tools) as forbidden, but I'll take your advice into account anyway


so, can I ask you what is the best MT for Wordfast 2.4.1 Pro?

by "best" I mean: easy to use, easy to feed,
and something I can buy without bleeding myself dry


I tried the Wordfast manual but it's very poor about this topic
(and other topics too)

Claudio


Claudio Porcellana  Identity Verified
Italy
Several Questions for MT users Nov 12, 2010

BTW Jeff

so even Google Desktop and similar tools can cause a breach of confidentiality, according to your reasoning?

Claudio



Jeff Allen  Identity Verified
France
Local time: 09:58
Multiple languages
I don't use Google desktop Nov 12, 2010

Claudio Porcellana wrote:
so even Google Desktop and similar tools can cause a breach of confidentiality, according to your reasoning?


Yes, and that is one of the reasons why I don't use Google Desktop and the like, but only tools that are locally installed and run locally.

Jeff



Jeff Allen  Identity Verified
France
Local time: 09:58
Multiple languages
links to posts about MT and Wordfast Nov 12, 2010

Claudio Porcellana wrote:

so, can I ask you what is the best MT for Wordfast 2.4.1 Pro?

by "best" I mean: easy to use, easy to feed,
and something I can buy without bleeding myself dry

I tried the Wordfast manual but it's very poor about this topic
(and other topics too)


Claudio

The following link explains a bit of the situation for WF Pro
http://www.proz.com/post/1121165#1121165


also see the following links for the integration of MT software and systems with WF Classic v5
http://www.proz.com/post/1122473#1122473
http://www.proz.com/post/1228178#1228178
http://www.proz.com/post/1370894#1370894

Jeff


Claudio Porcellana  Identity Verified
Italy
Several Questions for MT users Nov 12, 2010

thanks Jeff

but these links are old and a lot of water has flowed under the bridge since last year ...


on the other hand, the last link
http://www.proz.com/forum/wordfast_support/162649-how_can_wordfast_be_linked_to_most_machine_translation_packages.html#1370894

leaves many doubts open

anyway, I'll ask Milan, whom I met a short time ago

Claudio



IT Pros Subs
Italy
Local time: 09:58
Member (2005)
English to Italian
+ ...
MT and confidentiality Aug 26, 2011

Hello Jeff, I've read a message you posted here last year. I am aware of the confidentiality issues with MT engines like Google. This is why I have never used it and have disabled all MT options in my CAT tools. I was wondering: you mention Systran as a good alternative. Let me know if I understand correctly: does the engine in this case run only on the local computer without sending data to the Web? Do the manufacturers put this in writing somewhere in the manuals or related documents?

Thank you very much in advance for your help

Monica


Jeff Allen  Identity Verified
France
Local time: 09:58
Multiple languages
Systran MT offers/configs Aug 28, 2011

Monica Paolillo wrote:
I am aware of the confidentiality issues with MT engines like Google. This is why I have never used it and have disabled all MT options in my CAT tools. I was wondering: you mention Systran as a good alternative. Let me know if I understand correctly: does the engine in this case run only on the local computer without sending data to the Web? Do the manufacturers put this in writing somewhere in the manuals or related documents?


Hi Monica,

I'll try to find some time later today to briefly describe the different Systran product offers/configurations here in this thread and indicate how it answers your question about content confidentiality.
Same would be true for other MT vendors. I'll give examples.

Jeff

