Online MT Tools and confidentiality
Thread poster: Anil Gidwani

Anil Gidwani  Identity Verified
India
Local time: 03:14
German to English
+ ...
Apr 23, 2013

As a computer engineer, I'm convinced MT has a role to play in translation. I'm equally convinced that machines can never replace human translators, since translation involves language, easily one of mankind's most complex cognitive skills.

I've been trying out some MT tools over the past couple of years. One of the leading tools on the market, which comes with a trial version, turned out not to meet my expectations just yet. The biggest plus was that it could be installed and run locally off my laptop.

I've tried Google Translate on-line, being careful to translate only sentences or sentence fragments at a time, with all confidential information replaced by fabricated place-holders such as "ABC". I'm quite impressed with Google Translate, and it definitely speeds up the translation process. However, they don't have a downloadable version, and it's clear why they don't, since systems based on the statistical approach feed off a growing on-line corpus by design. Neither does Systran, another leading tool.

Using on-line software such as Google Translate for MT becomes inefficient, since you have to work one chunk at a time (be it a line or a paragraph); otherwise you risk compromising the confidentiality of the text.

Has anyone found an approach using on-line MT software which is efficient and does not lead to a breach of confidentiality? Or a good off-line MT tool? Paying for the software would not be an issue.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 23:44
Member (2006)
English to Afrikaans
+ ...
Use GTT instead of GT Apr 23, 2013

Anil Gidwani wrote:
Has anyone found an approach using on-line MT software which is efficient and does not lead to a breach of confidentiality?


Well, since you mention that you anonymise text before you paste it into Google Translate, I would recommend that you try Google Translator Toolkit. It allows you to upload whole files instead of just a few lines, and if you believe Google, then your content will not be shared with anyone.


 

Rolf Keller
Germany
Local time: 23:44
English to German
Google ... Apr 23, 2013

Samuel Murray wrote:

if you believe Google then your content will not be shared with anyone.


https://developers.google.com/terms/?hl=en-EN

Search for "Submission of content".


 

Joakim Braun  Identity Verified
Sweden
Local time: 23:44
German to Swedish
+ ...
But Apr 23, 2013

Confidentiality is not achieved by removing company names. Given a long enough text an attentive reader who knows the field may well figure out who it's about.

 

Samuel Murray  Identity Verified
Netherlands
Local time: 23:44
Member (2006)
English to Afrikaans
+ ...
@Rolf Apr 24, 2013

Rolf Keller wrote:
Samuel Murray wrote:
if you believe Google [when using Google Translator Toolkit], then your content will not be shared with anyone.

https://developers.google.com/terms/?hl=en-EN
Search for "Submission of content".


Yes, but the Google Translator Toolkit is not an API, and that link of yours relates to APIs only.


 

Anil Gidwani  Identity Verified
India
Local time: 03:14
German to English
+ ...
TOPIC STARTER
Which is why feeding an entire text is questionable Apr 25, 2013

Joakim Braun wrote:

Confidentiality is not achieved by removing company names. Given a long enough text an attentive reader who knows the field may well figure out who it's about.


I agree. It's not so easy to anonymize anyway; certain texts are replete with acronyms, names of departments, names of products, etc.

A selective line-based usage of an online engine is clearly the safest approach at this time, unless an off-line version is available, which is doubtful.

Does Google Translate offer an off-line version? Are there any off-line products that are good enough to be considered commercially usable? I used a trial version of PromT a year or so ago, and decided it didn't meet my expectations at the time. Systran does not have a trial version. Does anyone have an opinion of Systran?


 

Rolf Keller
Germany
Local time: 23:44
English to German
API or no API Apr 25, 2013

Samuel Murray wrote:

Yes, but the Google Translator Toolkit is not an API, and that link of yours relates to APIs only.


Ok, but the toolkit includes an API:
https://developers.google.com/translator-toolkit/

BTW, I assume that the web editor of the toolkit uses that API. Unfortunately the terms and conditions for this editor seem to be a secret, at least for non-registered users like me :-(


 

Samuel Murray  Identity Verified
Netherlands
Local time: 23:44
Member (2006)
English to Afrikaans
+ ...
How about anonymising *and* randomising? Apr 25, 2013

Anil Gidwani wrote:
It's not so easy to anonymize anyway, certain texts are replete with acronyms, names of departments, names of products etc.
...
A selective line-based usage of an online engine is clearly the safest approach at this time.


Well, a malicious MT provider would still be able to recreate much of the text by simply keeping track of who submits what. I can see two ways of overcoming that: (a) the simplest but least effective way is to randomise the sentences that you submit to the MT server; (b) if you can write a program that many people use, then such a program can send each segment to the MT server via a random other user's connection, so that the MT service can't create a list of segments that belong to a user.
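
To illustrate option (a), here is a minimal Python sketch. It assumes the text has already been split into segments and that whatever MT service is used returns translations in the same order they were submitted; the function names are made up for this example. It shuffles the segments before submission and remembers the permutation so that the translations can be put back in the original order afterwards.

```python
import random


def shuffle_segments(segments, seed=None):
    """Shuffle segments; return them with the permutation used.

    order[i] is the original index of the i-th shuffled segment.
    """
    rng = random.Random(seed)
    order = list(range(len(segments)))
    rng.shuffle(order)
    return [segments[i] for i in order], order


def restore_order(translated, order):
    """Put translations of shuffled segments back in source order."""
    restored = [None] * len(order)
    for position, original_index in enumerate(order):
        restored[original_index] = translated[position]
    return restored
```

Shuffling only hides the order of the sentences, not their content, so it is best seen as a supplement to anonymising rather than a replacement for it.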

You can also obfuscate segments by mixing them with segments from other sources (e.g. if you have a program that can add random sentences from the internet that use wording similar to that of your source text).
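
A rough Python sketch of this kind of obfuscation follows. It assumes you already have a list of decoy sentences from somewhere (finding decoys with similar wording is the hard part and is not shown), and the function names are hypothetical. The real segments are mixed with the decoys in random order, and a bookkeeping list records which positions are real so that the decoy translations can be thrown away afterwards.

```python
import random


def mix_with_decoys(segments, decoys, seed=None):
    """Mix real segments with decoys in random order.

    Returns (batch, origin): origin[i] is the original index of
    batch[i] if it is a real segment, or None if it is a decoy.
    """
    rng = random.Random(seed)
    tagged = [(text, index) for index, text in enumerate(segments)]
    tagged += [(text, None) for text in decoys]
    rng.shuffle(tagged)
    batch = [text for text, _ in tagged]
    origin = [index for _, index in tagged]
    return batch, origin


def recover(translated_batch, origin):
    """Drop decoy translations and restore the source order."""
    real_count = sum(1 for index in origin if index is not None)
    result = [None] * real_count
    for text, index in zip(translated_batch, origin):
        if index is not None:
            result[index] = text
    return result
```

The more decoys per real segment, the harder the reconstruction, but the MT volume (and any per-character cost) grows with it.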

My opinion is that absolute confidentiality can't be maintained and that the translator should take reasonable steps to ensure confidentiality. Anonymising the text is one such method. Randomising the segments that are submitted is another such method. And so is obfuscation.

For my language combination, anonymising would be easy, because my source language is English, and English uses capital letters mostly only for things that would normally need to be removed during an anonymisation process. So I can simply replace all words with capital initials with placeholders. You have a bit of a problem with German, which capitalises every noun...
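
As a sketch of what that could look like for English source text, the Python below replaces every word starting with a capital letter with a numbered placeholder and keeps a mapping so the originals can be restored in the translation. The placeholder format (`XX1XX`) and the function names are assumptions made for this example; it also masks ordinary sentence-initial words, and it relies on the MT engine passing the placeholders through unchanged, which is not guaranteed.

```python
import re

# A word that starts with an ASCII capital letter.
CAPITALISED = re.compile(r"\b[A-Z][A-Za-z0-9-]*")
PLACEHOLDER = re.compile(r"XX\d+XX")


def anonymise(text):
    """Replace capitalised words with placeholders.

    Returns (masked_text, mapping) where mapping maps each
    placeholder back to the word it replaced.
    """
    mapping = {}
    seen = {}

    def substitute(match):
        word = match.group(0)
        if word not in seen:
            placeholder = f"XX{len(seen) + 1}XX"
            seen[word] = placeholder
            mapping[placeholder] = word
        return seen[word]

    return CAPITALISED.sub(substitute, text), mapping


def restore(text, mapping):
    """Put the original words back in place of the placeholders."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

For example, `anonymise("Acme signed with Globex in Berlin.")` gives `"XX1XX signed with XX2XX in XX3XX."` plus the mapping needed by `restore`.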


 

