Online MT Tools and confidentiality
Thread poster: Anil Gidwani

Anil Gidwani  Identity Verified
India
Local time: 09:37
German to English
+ ...
Apr 23, 2013

As a computer engineer, I'm convinced MT has a role to play in translation. I'm equally convinced that machines can never replace human translators, since translation involves language, easily one of mankind's most complex cognitive skills.

I've been trying out some MT tools over the past couple of years. One of the leading tools on the market, which comes with a trial version, turned out not to meet my expectations just yet. The biggest plus was that it could be installed and run locally off my laptop.

I've tried Google Translate on-line, being careful to translate only sentences or sentence fragments at a time, with all confidential information replaced by fabricated placeholders such as "ABC". I'm quite impressed with Google Translate, and it definitely speeds up the translation process. However, neither Google nor Systran, another leading tool, offers a downloadable version. In Google's case it's clear why, since systems based on the statistical approach feed off a growing on-line corpus by design.

Using on-line software such as Google Translate for MT becomes inefficient, since you have to work one chunk at a time (be it a line or a paragraph); otherwise you risk compromising the confidentiality of the text.
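
A minimal Python sketch of the placeholder routine described above, under some loud assumptions: the list of confidential terms is maintained by hand, the `XQ0X`-style placeholder format is made up for illustration, and `translate` is a hypothetical stand-in for whatever MT call is used (no real MT service API is shown). It also assumes the MT engine passes the placeholders through unchanged, which is not guaranteed.

```python
from typing import Callable, Dict, List, Tuple


def mask_terms(text: str, confidential: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each confidential term with a neutral placeholder such as XQ0X."""
    mapping: Dict[str, str] = {}  # placeholder -> original term
    # Longest terms first, so "Acme GmbH" is masked before the shorter "Acme".
    for i, term in enumerate(sorted(confidential, key=len, reverse=True)):
        placeholder = f"XQ{i}X"  # hypothetical format, chosen to be unlikely in real text
        if term in text:
            text = text.replace(term, placeholder)
            mapping[placeholder] = term
    return text, mapping


def unmask_terms(text: str, mapping: Dict[str, str]) -> str:
    """Put the original terms back after translation."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


def translate_chunk(chunk: str, confidential: List[str],
                    translate: Callable[[str], str]) -> str:
    """Mask one chunk, send only the masked text to the MT call, then restore the terms."""
    masked, mapping = mask_terms(chunk, confidential)
    return unmask_terms(translate(masked), mapping)
```

This only automates the substitution step. It does nothing about a text being recognisable from its content alone.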

Has anyone found an approach using on-line MT software which is efficient and does not lead to a breach of confidentiality? Or a good off-line MT tool? Paying for the software would not be an issue.



Samuel Murray  Identity Verified
Netherlands
Local time: 05:07
Member (2006)
English to Afrikaans
+ ...
Use GTT instead of GT Apr 23, 2013

Anil Gidwani wrote:
Has anyone found an approach using on-line MT software which is efficient and does not lead to a breach of confidentiality?


Well, since you mention that you anonymise text before you paste it into Google Translate, I would recommend that you try Google Translator Toolkit. It allows you to upload whole files instead of just a few lines, and if you believe Google, then your content will not be shared with anyone.



Rolf Keller
Germany
Local time: 05:07
English to German
Google ... Apr 23, 2013

Samuel Murray wrote:

if you believe Google then your content will not be shared with anyone.


https://developers.google.com/terms/?hl=en-EN

Search for "Submission of content".


Joakim Braun  Identity Verified
Sweden
Local time: 05:07
German to Swedish
+ ...
But Apr 23, 2013

Confidentiality is not achieved by removing company names. Given a long enough text, an attentive reader who knows the field may well figure out who it's about.


Samuel Murray  Identity Verified
Netherlands
Local time: 05:07
Member (2006)
English to Afrikaans
+ ...
@Rolf Apr 24, 2013

Rolf Keller wrote:
Samuel Murray wrote:
if you believe Google [when using Google Translator Toolkit], then your content will not be shared with anyone.

https://developers.google.com/terms/?hl=en-EN
Search for "Submission of content".


Yes, but the Google Translator Toolkit is not an API, and that link of yours relates to APIs only.



Anil Gidwani  Identity Verified
India
Local time: 09:37
German to English
+ ...
TOPIC STARTER
Which is why feeding an entire text is questionable Apr 25, 2013

Joakim Braun wrote:

Confidentiality is not achieved by removing company names. Given a long enough text, an attentive reader who knows the field may well figure out who it's about.


I agree. It's not so easy to anonymize anyway: certain texts are replete with acronyms, names of departments, names of products, etc.

A selective line-based usage of an online engine is clearly the safest approach at this time. Unless an off-line version is available, which is doubtful.

Does Google Translate offer an off-line version? Are there any off-line products that are good enough to be considered commercially usable? I used a trial version of PromT a year or so ago, and decided it didn't meet my expectations at the time. Systran does not have a trial version. Does anyone have an opinion of Systran?



Rolf Keller
Germany
Local time: 05:07
English to German
API or no API Apr 25, 2013

Samuel Murray wrote:

Yes, but the Google Translator Toolkit is not an API, and that link of yours relates to APIs only.


Ok, but the toolkit includes an API:
https://developers.google.com/translator-toolkit/

BTW, I assume that the web editor of the toolkit uses that API. Unfortunately, the terms and conditions for this editor seem to be a secret, at least for non-registered users like me.



Samuel Murray  Identity Verified
Netherlands
Local time: 05:07
Member (2006)
English to Afrikaans
+ ...
How about anonymising *and* randomising? Apr 25, 2013

Anil Gidwani wrote:
It's not so easy to anonymize anyway: certain texts are replete with acronyms, names of departments, names of products, etc.
...
A selective line-based usage of an online engine is clearly the safest approach at this time.


Well, a malicious MT provider would still be able to recreate much of the text by simply keeping track of who submits what. I can see two ways of overcoming that: (a) the simplest but least effective way is to randomise the sentences that you submit to the MT server; (b) if you can write a program that many people use, then such a program can send each segment to the MT server via a random other user's connection, so that the MT service can't create a list of segments that belong to a user.
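
Option (a) could be sketched roughly as follows in Python. The `translate` argument is a hypothetical stand-in for the actual MT call, and the function name is made up for illustration. Option (b) needs a network of cooperating users and is not attempted here.

```python
import random
from typing import Callable, List, Optional


def translate_shuffled(segments: List[str],
                       translate: Callable[[str], str],
                       rng: Optional[random.Random] = None) -> List[str]:
    """Submit segments in random order, but return translations in document order."""
    rng = rng or random.Random()
    order = list(range(len(segments)))
    rng.shuffle(order)                      # the order the MT server will see
    results = [""] * len(segments)
    for idx in order:
        results[idx] = translate(segments[idx])  # store at the original position
    return results
```

The server still receives every segment from the same user, so this only hides the sequence, not the content.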

You can also obfuscate segments by mixing them with segments from other sources (e.g. if you have a program that can add random sentences from the internet that use wording similar to that of your source text).
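
A rough Python sketch of the mixing idea, assuming the decoy sentences have already been collected somewhere (finding decoys with similar wording is the hard part and is not shown). `translate` is again a hypothetical stand-in for the MT call.

```python
import random
from typing import Callable, List, Optional, Tuple


def translate_with_decoys(segments: List[str],
                          decoys: List[str],
                          translate: Callable[[str], str],
                          rng: Optional[random.Random] = None) -> List[str]:
    """Send real segments mixed with decoys; keep only the real translations."""
    rng = rng or random.Random()
    # Tag each real segment with its position; decoys are tagged with None.
    batch: List[Tuple[Optional[int], str]] = [(i, s) for i, s in enumerate(segments)]
    batch += [(None, d) for d in decoys]
    rng.shuffle(batch)
    results = [""] * len(segments)
    for position, text in batch:
        translated = translate(text)        # decoys are sent too
        if position is not None:
            results[position] = translated  # decoy output is simply discarded
    return results
```

The price is extra traffic: every decoy is a segment translated for nothing.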

My opinion is that absolute confidentiality can't be maintained and that the translator should take reasonable steps to ensure confidentiality. Anonymising the text is one such method. Randomising the segments that are submitted is another such method. And so is obfuscation.

For my language combination, anonymising would be easy, because my source language is English, and English uses capital letters mostly only for things that would normally need to be removed during an anonymisation process. So I can simply replace all words that have capital initials with placeholders. You have a bit of a problem with German, since it capitalises every noun...
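
For English source text, the capital-initial replacement might look like this in Python. It is a deliberately crude sketch: the `XQ0X` placeholder format is invented for illustration, and the pattern also catches sentence-initial words and the pronoun "I", which is harmless for confidentiality but costs the MT engine some context.

```python
import re
from typing import Dict, Tuple

# Any word starting with an ASCII capital letter (English source text assumed).
CAPITALISED = re.compile(r"\b[A-Z]\w*")
PLACEHOLDER = re.compile(r"XQ\d+X")


def mask_capitalised(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace every capitalised word with a placeholder; same word, same placeholder."""
    seen: Dict[str, str] = {}     # original word -> placeholder
    mapping: Dict[str, str] = {}  # placeholder -> original word

    def repl(match: "re.Match[str]") -> str:
        word = match.group(0)
        if word not in seen:
            placeholder = f"XQ{len(seen)}X"
            seen[word] = placeholder
            mapping[placeholder] = word
        return seen[word]

    return CAPITALISED.sub(repl, text), mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Restore the original words; unknown placeholders are left untouched."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Run on German, the same pattern would blank out every noun, which is exactly the problem mentioned above.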

