Pages in topic:   [1 2] >
Data sharing with Google Translator Toolkit
Thread poster: Marinus Vesseur

Marinus Vesseur  Identity Verified
Canada
Local time: 11:26
English to Dutch
+ ...
Jan 9, 2010

As you might know, Google has put its Translator Toolkit online. It is a simplified CAT tool, combined with the optional use of Google Translate.

I don't want to start another thread on the advantages and disadvantages of Google Translate and other MT (machine translation) tools here. Rather, I'd like to know how much of the data I'd be sharing if I used the Toolkit in conjunction with an uploaded TM, or if I use a TM I made in the Toolkit.

There is an option to switch off sharing your TM here: http://translate.google.com/toolkit/tmupload?hl=en which seems very advisable for more than one reason.

TMsharing.jpg

BUT..

..look at the Terms of Use here: http://translate.google.com/toolkit/TOS.html?hl=en

Use of your Content

By submitting your content through the Service, you grant Google the permission to use your content permanently to promote, improve or offer the Services. When, in the course of using your content to promote, improve or offer the Services, Google displays the content to an end user, it will do so only according to the sharing rules below, and only on a translation unit basis. ...

Translation Memories

(1) If you mark your translation memory as "not shared with everyone", each translation unit in the translation memory will be viewable only by you and users with whom you explicitly share the translation memory or documents that use the translation memory.

(2) If you mark your translation memory as "shared with everyone", each translation unit in the translation memory will be viewable by other end users.

(3) Note that regardless of whether or not a translation memory is "shared with everyone" or "not shared with everyone", when you create a translation memory entry, your content is subject to the “Use of your Content” section above.


As you can see, there is a 'loop' in the Terms. First it says "according to the sharing rules below" and later, in point 3, "your content is subject to the “Use of your Content” section above", which to me seems like Google DOES use your uploaded TM data at will.

Did I misunderstand? What are the Terms of Use actually saying here? I wrote Google about this and will post the reply when it comes in.

- Rien


 

Neil Coffey  Identity Verified
United Kingdom
Local time: 19:26
French to English
+ ...
My interpretation... Jan 9, 2010

The use to "improve their services" essentially means Google can look at the data you put in to do things like say "ah, in 90% of cases where our translation system has translated X as Y, users have corrected it (to Z)-- we'd better try and improve how our system translates X", or "90% of sentences put through the system are between X and Y words long, we'd better optimise our system for that length", or "80% of sentences have feature X, let's optimise our system for feature X", or... well, really lots of things like that. They're going to be looking at the data that people throw at it and optimise the system to cope with that kind of data.

How they use your data to "offer" services is a bit more vague, but for example, I guess it could include saying "our system has translated over X squillion sentences so far", or "only X% of our translated sentences needed human correction". If you've opted out of making your data public, then in principle, they should only disclose your data insofar as it is part of this kind of aggregate statement/statistic.

The means by which they do all of this is that there are a bunch of Google employees working on the translation tools project who have access to the database, and who can access any of your data at any time they choose.

So you need to think about this the same way as you think about other pieces of infrastructure managed by people in a privileged position. You've probably decided that your need to have a bank account outweighs the risk of a bank employee stealing your money; you've probably decided that your need to buy a book off Amazon outweighs the risk of an Amazon employee selling your credit card details and then your credit card company deciding not to compensate you for the consequences; you've probably decided that your need for healthcare outweighs the risk of clinic staff or a hospital IT worker making malicious use of your medical record etc etc etc. How trustworthy do you feel Google employees are vs the benefit that you'll get from using Google's translation tools...?


 

Lesley Clarke  Identity Verified
Mexico
Local time: 13:26
Spanish to English
However... Jan 9, 2010

This is not just about your own personal privacy and security: our TMs contain our clients' confidential information. Is it really a risk worth taking?

And call me a luddite, but do you really need to contribute towards a system that will probably eventually put us out of work?


 

FarkasAndras
Local time: 20:26
English to Hungarian
+ ...
Feeding MT Jan 9, 2010

I'm pretty sure that the only reason the translator's toolkit was developed in the first place is to feed new translations into Google Translate. IIRC Google said so on more than one occasion.
I'd imagine they won't let you use the service without having your translations end up in their database. What would be in it for them?


 

Neil Coffey  Identity Verified
United Kingdom
Local time: 19:26
French to English
+ ...
Not entirely clear Jan 9, 2010

FarkasAndras wrote:
I'm pretty sure that the only reason the translator's toolkit was developed in the first place is to feed new translations into Google Translate. IIRC Google said so on more than one occasion.


I've actually heard Google people express the opposite view-- that the raison d'être of Google Translate is more to allow systems such as the translator's toolkit.

Though surely Google *will* be looking at what people throw at it and improving their system accordingly. All (user-conscious) software designers do this to some extent or other.


I'd imagine they won't let you use the service without having your translations end up in their database. What would be in it for them?


Remember Google don't necessarily consider every individual project in terms of whether it will make a profit by itself. It can be more about what a product does for the Google brand as a whole, and whether as a knock-on effect it will improve their image and profitability in the long run.

Also, Google's whole business model has essentially been to devise a system that generates a lot of traffic and then consider the best way to make a profit from that system at a later stage (and so far, it's proven to be quite a good methodology for them...).

I'm not sure that individual translations, or even moderate volumes of user-generated translations, are necessarily that valuable to Google. They're probably more interested in acquiring well-defined bilingual corpora in large volumes, and have the cash to pay for them as necessary.


 

FarkasAndras
Local time: 20:26
English to Hungarian
+ ...
What? Jan 9, 2010

Neil Coffey wrote:

FarkasAndras wrote:
I'm pretty sure that the only reason the translator's toolkit was developed in the first place is to feed new translations into Google Translate. IIRC Google said so on more than one occasion.


I've actually heard Google people express the opposite view-- that the raison d'être of Google Translate is more to allow systems such as the translator's toolkit.



Where did you get that from? I have a lot of trouble taking that seriously.

Yes, Google likes to innovate and generally start interesting projects without a strict business plan, but I can assure you Google Translate was not set up to attract people to the Translator Toolkit. For starters, what percentage of Google Translate requests is made through the translator toolkit? 1%? 5%? Not more for sure. And it's completely backwards thinking anyway.

The terms quoted by the OP are unequivocal: they reserve the right to use all translations to improve the Services, i.e. Google Translate, and I'm sure they do. I mean, Google's original announcement of the service says "Best of all, our automatic translation system learns from [...] corrections, creating a virtuous cycle that can help translate content into 47 languages".

I have no clue what you mean by Google paying for corpora... There just isn't anything significant out there for them to buy, not on the scale they need. They started out with UN texts in the 6 official languages and then went on to mine the web for multilingual content. I'm guessing they have also integrated the EU's multilingual content by now.
Compared to the (tens of) millions of TUs they have from these sources available for the cost of processing, the couple of hundred thousand TUs they could buy (from whom?) are a drop in the bucket. Probably not worth bothering with. Making the translator toolkit is a one-time investment that could bring in a massive amount of content down the line - not just translations made through the system but also uploaded TMs. And of course it's an interesting project in its own right as well.

[Edited at 2010-01-09 22:13 GMT]


 

Marinus Vesseur  Identity Verified
Canada
Local time: 11:26
English to Dutch
+ ...
TOPIC STARTER
Two things Jan 10, 2010

First, I wouldn't want to use it in my language combinations German-Dutch and vice versa, since the results Google Translate produces in those combinations are still pretty bad and I'd like to keep it that way. If using it adds to the success of the system, that would be like digging my own grave as a translator. In other words: forget the Toolkit if using it means sharing it with Goorgol.

The Goorgol system appears to center around English as the pivotal language, so anything outside that reach would have to run through an Abalese-English, English-Bebalese routine, which could not possibly deliver very useful results, and that is fine with me.

Secondly, I wonder if the system can be corrupted. If you messed up a TM and uploaded it, would the Goorgol eat it? I think I'll try that.


 

FarkasAndras
Local time: 20:26
English to Hungarian
+ ...
Well... Jan 10, 2010

Marinus Vesseur wrote:

First, I wouldn't want to use it in my language combinations German-Dutch and vice versa, since the results Google Translate produces in those combinations are still pretty bad and I'd like to keep it that way. If using it adds to the success of the system, that would be like digging my own grave as a translator. In other words: forget the Toolkit if using it means sharing it with Goorgol.

The Goorgol system appears to center around English as the pivotal language, so anything outside that reach would have to run through an Abalese-English, English-Bebalese routine, which could not possibly deliver very useful results, and that is fine with me.

Secondly, I wonder if the system can be corrupted. If you messed up a TM and uploaded it, would the Goorgol eat it? I think I'll try that.


Let's not get carried away. Your contributions are not what will make or break Google Translate...
Interesting proposition about German-Dutch; I'm pretty sure they do all combinations via English so I'm not sure if they have any use for data that has no English in it. I'm sure they could use it for something eventually, but I think they won't directly upload it into any of the main databases, as I believe they only have English-Anything databases.

As to trying to mess up a service that provides a crucial free resource for hundreds of thousands of people around the world, that has to be the most malicious and childish idea I have ever come across on ProZ. No comment.
As to the particulars, Google crunches data for a living and has some of the world's smartest people on its payroll... I think the odds of you being able to fool them to any significant extent are slim to none.

[Edited at 2010-01-10 08:17 GMT]


 

Adam Łobatiuk  Identity Verified
Poland
Local time: 20:26
Member (2009)
English to Polish
+ ...
Do you really have to contribute? Jan 10, 2010

I've only tried the service once and I might not remember it correctly, but I think you can have your document translated by Google without contributing your own work. The only concern there would be the confidentiality of the document content, but otherwise, submitting your document for translation doesn't seem to improve the Google service in any way. You can submit your TM and glossary, but I don't think you have to.

 

Vito Smolej
Germany
Local time: 20:26
Member (2004)
English to Slovenian
+ ...
...and don't forget all the books ... Jan 10, 2010

FarkasAndras wrote:
...They started out with UN texts in the 6 official languages and then went on to mine the web for multilingual content. I'm guessing they have also integrated the EU's multilingual content by now.
...

original texts and their translations, scanned so far ...


 

FarkasAndras
Local time: 20:26
English to Hungarian
+ ...
Possibly Jan 10, 2010

Vito Smolej wrote:

FarkasAndras wrote:
...They started out with UN texts in the 6 official languages and then went on to mine the web for multilingual content. I'm guessing they have also integrated the EU's multilingual content by now.
...

original texts and their translations, scanned so far ...

It has occurred to me before that they must have done that, but I never saw it confirmed anywhere.
As we all know, they are on pretty shaky legal ground in that project, so they might have opted to keep the books out of Google Translate.
BTW I don't know how intensively they have been scanning non-English books, especially books they already have in English. I'm not sure they do that at all.


 

DZiW
Ukraine
English to Russian
+ ...
IMO Jan 10, 2010

Translate faster with Google's online tools:
* Correct automatic translations in an easy-to-use editor.
* Search past translations to find words for new translations.
* Publish translations to Wikipedia™ or Knol.
* Collaborate with other translators.
* Use advanced tools like translation memories and multilingual glossaries.
Isn't it about any modern CAT? IMO if that's all then I wonder what makes the process go any faster? And how about (more) reliable?

So we've got:
* on-line pseudo-CAT equivalent or more specifically - a CAT repository;
* on-line access ONLY: it's not for off-line processing;
* Google wants qualified translators to do the job well for *free*;
* MT is wrong, especially after semi-qualified editing; no QA;
* all Google-translated data is in the public domain;
. . .
Possibly it's just a good marketing idea or move, but we shall see it in some five years.
Anyway, I'm not ready to make the job dependent on some service or policies.

Cheers)


 

Laurent KRAULAND (X)  Identity Verified
France
Local time: 20:26
French to German
+ ...
The "big" idea behind that... Jan 10, 2010

DZiW wrote:

Anyway, I'm not ready to make the job dependent on some service or policies.

Cheers)

Good that you mentioned that point. Methinks that the "big" idea behind that (GTT and other online services) is SaaS - Software as a Service, with users paying fees, tolls and so on to be able to access data. I am not great at catastrophe scenarios, but cannot imagine the benefit of only having an empty "client" workstation in the office and of being nearly forced to rely on cloud applications in order to work. This would be a nightmare AFAIAC.

[Edited at 2010-01-10 17:40 GMT]


 

FarkasAndras
Local time: 20:26
English to Hungarian
+ ...
Not for pros Jan 10, 2010

Google knows as well as we do that professional translators use offline CATs and have performance, reliability and confidentiality expectations an online service can't always meet.
I don't think the system was primarily designed for full-time professional translators. I mean, the usage scenario they bring up in their announcement is "an Arabic-speaking reader wants to translate a Wikipedia™ article into Arabic".
I see it more as a crowdsourcing tool, useful for translating Wikipedia (which is really neatly integrated) and such like.


 

Marinus Vesseur  Identity Verified
Canada
Local time: 11:26
English to Dutch
+ ...
TOPIC STARTER
Terms of Use of Google Translator Toolkit Jan 13, 2010

I guess y'all are right about the Toolkit not being a professional tool. That probably also applies to Google Translate, specifically the "Suggest a better translation" feature. Hands off!

I haven't found the time to test whether nonsense can be fed into the system, but I'm still very curious.

As to the Terms of Use of the Google Translator Toolkit and their meaning, here is the reply by their support:

When you create or upload a translation memory in Translator Toolkit, the following should apply:

1. Regardless of whether your translation memory is marked, "Shared with everyone" or "Not shared with everyone," we will use your translation memory segments to train Google's machine translation system, which is used in Google Translator Toolkit as well as other products like Google Translate.

2. If you mark your translation memory as "not shared with everyone", each translation unit in the translation memory will be viewable only by you and users with whom you explicitly share the translation memory. In addition, if you explicitly share a document that uses that translation memory with another user, each translation unit in the translation memory will be viewable by that user when translating the shared document.

3. If you mark your translation memory as "shared with everyone", each translation unit in the translation memory will be viewable by other end users.


 