Google misusing uploaded translation samples?
Thread poster: Woodstock

Woodstock  Identity Verified
Germany
Local time: 21:58
German to English
+ ...
Dec 14, 2009

I just realized that Google scanned one of my uploaded translation samples WORD FOR WORD. This is how Google works, granted, but the following gave me a slight shock when I saw it:
From my Visitors record:

Original IP Address Came From Keywords Viewed Number 217.6.182.116
www.google.es

"Scannen, ist, das, Abtasten, der, Schriftvorlage, durch, den, Scanner., Dabei, wird, im, Computer, ein, digitales, Bild, der, Vorlage, erzeugt., Dieses, Bild, ist, eine, Matrix, von, schwarzen,, weißen, bzw., grauen, Punkten., Im

What I couldn't tell is if they also scanned the English version of this, which would be tantamount to theft, in my opinion. Has anyone else noticed this? I had never seen it before, or just never caught it, as I hadn't been using ProZ very much for a while and have only just started logging in more often lately.

Frankly, I'm appalled at this, and feel that it is a breach of my privacy and my work. This made me think of a recent post about the Google translating feature, and I wonder if they are now stealing other people's translation work for their own use.

It would be interesting to hear about other people's experiences / opinions on this subject. Maybe I'm incensed for no reason, but it would be nice if ProZ had some kind of safeguard in place to protect our privacy, possibly as an option. On the other hand, I realize we need internet exposure to find new clients. Where and how can a line be drawn? Or do we just have to accept the fact that the trade-off for more and more free and available information is that we lose our privacy completely and our lives become an open book for anyone and everyone anywhere in the world?

Is this the right forum? I had no idea how to categorize this. Please move it if necessary. Thanks.



Tom in London
United Kingdom
Local time: 20:58
Member (2008)
Italian to English
Hmmm Dec 14, 2009

Woodstock wrote:

I just realized that Google scanned one of my uploaded translation samples


Do you mean your uploaded translation samples from your Proz.com page?

If this can be proved, I think it would be a gross case of intellectual property theft.

Now you've got me seriously concerned.....I would not take kindly to anyone stealing my translations from this site and feeding my vocabulary and syntax into Google Translate. In fact I would be extremely ****** off.

In the first instance I would request that Proz.com's lawyers ask Google for a written undertaking that they are not doing this. If Google is not doing it, they should have no problem providing such an undertaking.



[Edited at 2009-12-14 16:33 GMT]



David Wright  Identity Verified
Austria
Local time: 21:58
German to English
+ ...
the net is public space Dec 14, 2009

I suspect that the answer is that anything that is placed on the net is public unless it is protected by access restrictions. Since the KudoZ entries are not protected (as far as I can see anyone can use the glossaries, or at least see the answers to questions), anything you submit is indeed public. Perhaps we ought to be more aware of this and consider very carefully what we actually place on the net. As to whether they scanned your translation - probably, but not for the purposes of their translation software (I would think) but merely for the search engine. You can check by entering a sequence from your translation and seeing if it is there.


Tim Drayton  Identity Verified
Cyprus
Local time: 22:58
Turkish to English
+ ...
Google Translate visits my site Dec 14, 2009

I notice that Google Translate has been visiting my website for quite some time. For example, listed under 'referrers' in the usage data for my site so far this month are one referral each from:

http://translate.google.com.tr/translate
http://translate.google.com.nl/translate_p
http://translate.google.com.ru/translate_p

I have no idea what this is about.



Tom in London
United Kingdom
Local time: 20:58
Member (2008)
Italian to English
not good Dec 14, 2009

David Wright wrote:

I suspect that the answer is that anything that is placed on the net is public unless it is protected by access restrictions. Since the KudoZ entries are not protected (as far as I can see anyone can use the glossaries, or at least see the answers to questions), anything you submit is indeed public. Perhaps we ought to be more aware of this and consider very carefully what we actually place on the net. As to whether they scanned your translation - probably, but not for the purposes of their translation software (I would think) but merely for the search engine. You can check by entering a sequence from your translation and seeing if it is there.


It's important that prospective clients are able to see examples of our work, so for the moment I won't be removing mine from my page. But if Google are doing this systematically, and intentionally scanning all our thousands of sample translations, then that is INFORMATION THEFT.

Proz could at least put a warning to that effect in the section where the sample translations are. Or perhaps we should start posting deliberately bad translations.

[Edited at 2009-12-14 16:50 GMT]



Stanislaw Czech, MCIL  Identity Verified
United Kingdom
Local time: 20:58
Member (2006)
English to Polish
+ ...
misusing? Dec 14, 2009

They scan these pages completely automatically and that's exactly what we expect them to do.

In fact I would be rather grateful as it increases visibility of your profile in Google.

I don't think that the sample you gave suggests in any way that they are using these data to feed their translation engine.

BR,
S



Tom in London
United Kingdom
Local time: 20:58
Member (2008)
Italian to English
Huh? Dec 14, 2009

Stanislaw Czech wrote:

They scan these pages completely automatically and that's exactly what we expect them to do.


I don't think that the sample you gave suggests in any way that they are using these data to feed their translation engine.


Surely those two statements are contradictory?

There can only be one reason why Google Translate targets and scans the world's most important language translation site.



Alex Lago  Identity Verified
Spain
Local time: 21:58
Member (2009)
English to Spanish
+ ...
I really don't see the problem Dec 14, 2009

I also have some translations on my profile, but from the moment I posted them there I was aware that they were available to anyone, not just Google, that has internet access. Because of this the translations I have posted are few (they are simply there to give clients an idea of my work), they are not confidential client information (they are from a public source) and they are something I am comfortable with other people having.

I mean, let's face it: what is to stop anyone from accessing your translations and creating TMs from them by aligning them?

Anything you want to keep private keep off the net, which is why I don't use Google Docs or Calendar or any other tools like that.



Hynek Palatin  Identity Verified
Czech Republic
Local time: 21:58
English to Czech
+ ...
Google Dec 14, 2009

Original IP Address Came From Keywords Viewed Number 217.6.182.116
www.google.es

"Scannen, ist, das, Abtasten, der, Schriftvorlage, durch, den, Scanner., Dabei, wird, im, Computer, ein, digitales, Bild, der, Vorlage, erzeugt., Dieses, Bild, ist, eine, Matrix, von, schwarzen,, weißen, bzw., grauen, Punkten., Im


Hi Ellinor,

This is what happened:

Google crawled and remembered your public ProZ profile including the translation samples. Somebody searched for the above phrase, found your profile and visited it. Google passed the search phrase to ProZ and you can now see it on your Visitors tab. (That's actually very useful, because you know the keyphrases that return your ProZ profile in Google.) And that's all.
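To make the mechanism concrete, here is a minimal Python sketch of how a site could pull the search phrase out of the referrer URL a visitor arrives with. It assumes the search engine carries the phrase in a `q` query parameter; the function name and the example referrer URL are made up for illustration, and this is not a description of how ProZ actually builds the Visitors tab.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def search_phrase_from_referrer(referrer: str) -> Optional[str]:
    """Return the search phrase carried in a referrer URL, or None if absent."""
    query = urlparse(referrer).query
    # Assumption: the search engine puts the phrase in the "q" parameter.
    values = parse_qs(query).get("q")
    return values[0] if values else None


# Hypothetical referrer, shortened for the example.
example = "http://www.google.es/search?q=Scannen+ist+das+Abtasten+der+Schriftvorlage"
print(search_phrase_from_referrer(example))
```

A referrer without a `q` parameter (a direct link from another page, say) simply yields `None`, which is why only some visits show keywords.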

What I couldn't tell is if they also scanned the English version of this, which would be tantamount to theft, in my opinion.


Google crawled and remembered (cached) the English version too, of course. Here is the proof: http://bit.ly/6ZPeHV

Google does not know (yet) that this is the source and the translation. Try putting the German phrase into Google Translate. This is the result: http://bit.ly/5GEhAZ As you can see, the English translation is different from yours. But anybody can "contribute a better translation" and paste your version, or maybe Google will get intelligent enough to match your source and target. I don't know what the legal implications are, but once you publish a sample translation, it becomes public.

Hynek



Woodstock  Identity Verified
Germany
Local time: 21:58
German to English
+ ...
TOPIC STARTER
Translation samples uploaded to ProZ Dec 14, 2009

Tom in London wrote:

Do you mean your uploaded translation samples from your Proz.com page?...



[Edited at 2009-12-14 16:33 GMT]



Yes, I meant the ones on ProZ. Sorry if I didn't make that clear. There is no confidential information in them, I made sure of that when I posted them. But still.... it makes me uncomfortable. Sure, you expect potential clients to look at them, but having them accessible in a somewhat "safe" environment like ProZ is different than having them on Google for the whole world to see and Google potentially using my work for their own profit.



Hynek Palatin  Identity Verified
Czech Republic
Local time: 21:58
English to Czech
+ ...
How to block Google Dec 14, 2009

Sure, you expect potential clients to look at them, but having them accessible in a somewhat "safe" environment like ProZ is different than having them on Google for the whole world to see and Google potentially using my work for their own profit.


If you don't want Google to store a snapshot of your profile, go to the Settings tab in your profile, click Search engine settings, and select No index under Profile indexing.
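If you want to check that a page really asks search engines not to index it, one common mechanism is a `robots` meta tag containing `noindex`. Whether the ProZ setting is implemented this way is an assumption on my part; the class and function names below are made up for the sketch, which only inspects HTML you pass in.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def is_noindex(html: str) -> bool:
    """True if the page carries a robots meta tag with the noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
open_page = "<html><head><title>Profile</title></head></html>"
print(is_noindex(blocked), is_noindex(open_page))
```

Note that a tag like this is only a request: well-behaved crawlers honour it, but nothing forces a robot to.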



Woodstock  Identity Verified
Germany
Local time: 21:58
German to English
+ ...
TOPIC STARTER
Google process Dec 14, 2009

Hi, Hynek,


Hynek Palatin wrote:

...
Google does not know (yet) that this is the source and the translation. Try putting the German phrase into Google Translate. This is the result: http://bit.ly/5GEhAZ As you can see, the English translation is different from yours. But anybody can "contribute a better translation" and paste your version, or maybe Google will get intelligent enough to match your source and target. I don't know what the legal implications are, but once you publish a sample translation, it becomes public.

Hynek


Thank you for the very clear explanation. I'm not sure it makes me feel any better, though. Obviously, I'm not going to sue Google because it's a fabulous tool, and I use it all the time for so many things. It just made me stop and think about how exposed we all are, and how little privacy we actually do have now in the Information Age.

My hope is that this topic helps to raise awareness on how vulnerable we are, and as a mild reminder to the people who use this site to exercise caution in regard to what they put anywhere on the web, be it on ProZ or other websites.



Woodstock  Identity Verified
Germany
Local time: 21:58
German to English
+ ...
TOPIC STARTER
Blocking Google Dec 14, 2009

Thank you again, Hynek. I wasn't aware of that feature, and maybe a lot of other people who use this site aren't, either, so I appreciate your mentioning it.


Woodstock


FarkasAndras
Local time: 21:58
English to Hungarian
+ ...
Net is public Dec 14, 2009

Hynek Palatin wrote:


Google crawled and remembered (cached) the English version too, of course. Here is the proof: http://bit.ly/6ZPeHV

Google does not know (yet) that this is the source and the translation. Try putting the German phrase into Google Translate. This is the result: http://bit.ly/5GEhAZ As you can see, the English translation is different from yours. But anybody can "contribute a better translation" and paste your version, or maybe Google will get intelligent enough to match your source and target. I don't know what the legal implications are, but once you publish a sample translation, it becomes public.

Hynek


I'm sure they have a system for matching up texts with their translations. Obviously, documents/pages of similar length with a URL that only differs in a few letters (EN/DE/IT/FR etc) are prime suspects. I can't be bothered to check the URLs of the samples in this case, but they could well have ended up in google translate's bowels. The fact that they are not being used now doesn't mean anything. They could be pending processing and insertion in the database or the segments may have been misaligned.
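Just to illustrate the URL idea, here is a rough Python sketch that groups URLs which become identical once a language code is blanked out. It is purely my own guess at the kind of heuristic involved, not anything Google has documented; the list of language codes and the example URLs are invented, and it ignores the document-length check entirely.

```python
import re
from collections import defaultdict

# Assumption: a tiny, hand-picked set of language codes for the example.
LANG_CODES = ["de", "en", "fr", "it"]
# Match a code only when it is not part of a longer word (e.g. not the "de" in "index").
LANG_PATTERN = re.compile(
    r"(?<![A-Za-z])(" + "|".join(LANG_CODES) + r")(?![A-Za-z])", re.IGNORECASE
)


def language_neutral_key(url: str) -> str:
    """Replace every standalone language code in the URL with a placeholder."""
    return LANG_PATTERN.sub("{lang}", url)


def candidate_pairs(urls):
    """Group URLs that differ only in their language code."""
    groups = defaultdict(list)
    for url in urls:
        key = language_neutral_key(url)
        if key != url:  # skip URLs that contain no language code at all
            groups[key].append(url)
    return [sorted(group) for group in groups.values() if len(group) > 1]


urls = [
    "http://www.example.com/samples/de/scanner.html",
    "http://www.example.com/samples/en/scanner.html",
    "http://www.example.com/about.html",
]
print(candidate_pairs(urls))
```

With these three URLs the two scanner pages come out as one candidate pair and the about page is ignored. A real system would still have to align the text segment by segment, which is where misalignment can creep in.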

Anyone who thinks that text posted on a public website won't be read by robots and put in various databases is naive and uninformed... and anyone who thinks google is not crawling the web and collecting multilingual material to feed into the google translate database is also naive.
I don't see the problem... if you don't want your material to be harvested by others, don't put it on a public website. Once it's online, it's fair game.
Of course proz could use its robots.txt to keep (well-behaved) robots out of certain sections of the site... but I'm guessing the samples are in the profile and most translators want their profile to be indexed as this brings in google visitors and thus business. As Hynek pointed out, proz allows you to keep the robots away from your own profile if you want to.
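For anyone who hasn't seen one, a robots.txt rule is just a couple of lines of text, and Python's standard `urllib.robotparser` can show how a well-behaved robot would read it. The rule and the paths below are hypothetical, not ProZ's real robots.txt, and the helper name is made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all robots out of one section of a site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /profile-samples/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(agent: str, url: str) -> bool:
    """Ask whether the given robot may fetch the URL under the rules above."""
    return parser.can_fetch(agent, url)


print(can_crawl("Googlebot", "http://www.example.com/profile-samples/123"))  # False
print(can_crawl("Googlebot", "http://www.example.com/profile/123"))          # True
```

The "well-behaved" caveat matters: the file is advisory, so it keeps out Googlebot and similar crawlers but does nothing against a scraper that chooses to ignore it.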


Honestly, I love google for not being too bogged down with iffy IP issues like this. They set out to "organize the world's information" and they are doing a pretty stellar job of that, and provide a fantastic service to individuals and humanity as a whole... They index first and ask questions later... Just think about google books, for example. If you don't want them to organize your information, you'll have to take positive steps and tell them not to.



Giuseppina Gatta, MA (Hons)
Member (2005)
English to Italian
+ ...
It doesn't make much sense Dec 14, 2009

If you decided to publish stuff on your profile, and you know that your profile is public, as it should be, if you are working as a professional and are not on Proz for fun, it doesn't make much sense to not want to have your stuff appearing on Google.

When I published my samples, I made sure to choose ones that had already been published somewhere else, so I really don't care that they are online. Actually, I hope to be as visible as possible on search engines; this is also how my prospective clients may find (and have found) me.

