What are the benefits and limitations of Google hits as a reference?
Thread poster: Kim Metzger

Kim Metzger  Identity Verified
German to English
Apr 6, 2004


Hello all,
I'm working on a short paper on the subject of providing references for KudoZ answers. I intend to submit it to the ProZ staff for possible inclusion in the ProZ "How to" section.
I think proposals for most pro level KudoZ questions should be supported with explanations and reliable references that help the asker determine whether he or she can rely on the answer. Yes, I know there are also many instances where references aren't really called for. If someone needs a creative translation of a marketing slogan, say. But I'm talking about terms that absolutely have to be translated precisely.
As I see it, Google hits, i.e. the number of times a particular word or phrase occurs in documents the Google search engine has found, can be a valuable tool for a translator and even for justifying a particular translation proposal – but only if the results are understood and properly interpreted. When someone proposes a translation of a term and simply says there are 500,000 Google hits for this term, the only thing that tells me is that yes, the word does in fact exist in the target language, but it doesn't tell me whether the term is therefore the proper translation of the source term in the context provided.

Would you care to contribute your ideas on the benefits and limitations of "Google hits"?

[Edited at 2004-04-07 01:25]

[Edited at 2004-04-07 01:35]

Member (2002)
French to English

Google hits Apr 6, 2004

...without context are of little value.

Yes, they prove the expression exists, but the context fills in the nuance, so that it is in fact used properly in the target document.

Adding terms to the search request helps with this problem. What ways are there of weeding out the irrelevant citations?

Parrot  Identity Verified
Member (2002)
Spanish to English
Lexical frequency of word collocation Apr 6, 2004

This is important in the source language in the sense that it gives an indication as to how far the "question side" is asking for a real term or a coinage of the moment. Reading through the examples gives a further idea (do they all apply to a single concept, or are multiple concepts admissible).

In the target language, every entry has to be read to see if it is indeed valid for the context. Otherwise, it's going to be a case of Heisenberg's axiom on the wave-particle theories: if you ask matter a wave question, it will give you a wave answer and if you ask it a particle question, it's capable of giving you a particle answer as well.

The most reliable Google searches in this business are still the ones using the word "glossary"....

[Edited at 2004-04-06 20:03]

RobinB  Identity Verified
German to English
Google may be an indication, but it's not authoritative Apr 6, 2004


With all due respect: You'll find over 3,650,000 Google hits for the word "accomodation", and 194,000 for "arguement" - words that do *not* "exist in the target language".

Google *may* be an indication of a term's existence, but it's not authoritative. You yourself bring up the issue of whether the term referred to is relevant.

Personally, I'd prefer respondents to give their own substantiated argument for their choice of answer, but I know that this isn't always possible due to time constraints and other reasons. But what I really, really hate is answers that cite a generally available dictionary (print or online)!

To sum up: Google may be a help, but it's not a substitute for intelligent terminology work.

Robert INGLEDEW  Identity Verified
English to Spanish

"Washatería" (Texan Spanglish) has more than 1500 hits in Google:

Web Results 1 - 10 of about 1,620 for Washateria. (0.59 seconds)

Mr. Smarty Pants Laundromat Facts
Laundromat Facts, The word "washateria," primarily used in the South,
came into the English language around 1937. Bendix Appliances ... - 35k - Cached - Similar pages

Regards from Argentina. Don't worry. Here we also have our own slang, that many Spanish speaking people would never understand...

Robert Ingledew

[Edited at 2004-04-06 20:26]

Pablo Grosschmid  Identity Verified
English to Spanish
depends on language pair Apr 6, 2004

Google is a great help for those who translate into English.

However, most other languages have been invaded by anglicisms (most of them unnecessary) and "false friends". So if your target language is not English, you have a doubt about a term and you check its frequency of usage in the target language in Google, zillions of Google hits are a strong indication that you have to look for something different. In this way, Google is a great help!

BTW, you can always find a lot of hits for any wrong interpretation in the target language.

Uwe Kirmse  Identity Verified
Polish to German
I disagree. Apr 6, 2004

Google is the greatest authority for translators.

Try to translate the word "translation" into German. Check "Übersetzung" - there are 2.780.000 Google hits. And then check "a", and you'll get 3.480.000.000 hits, and only from German sites still 11.000.000.
"Übersetzung" vs. "a" - the winner is "a"!!!

Without the Google hits I would have translated "translation" as "Übersetzung", now I know better.

The only problem is that some of us really do use such "references".

[Edited at 2004-04-06 21:08]

Ulrike Lieder  Identity Verified
English to German
IMHO, substantiation of an answer is a must Apr 6, 2004

RobinB wrote:

Personally, I'd prefer respondents to give their own substantiated argument for their choice of answer, but I know that this isn't always possible due to time constraints and other reasons. But what I really, really hate is answers that cite a generally available dictionary (print or online)!

I would certainly agree with Robin's first point that answers should be substantiated. And, if there are pertinent Google hits, so much the better. In that case, the number of Google hits would certainly be an indication as to the usage of the target term (widespread or very specific, as the case may be). As Parrot noted, the number of hits alone is not a measure of the accuracy of an answer; the hits must be relevant to the question and answer.

As to Robin's second point, answers that cite a readily available dictionary, be it print or online, I can't say that I hate those answers. (In fact, Robin, why do you "really, really hate" those answers?) I do wonder, though, why the asker hasn't taken the trouble to do the research him/herself - I personally consider KudoZ a "last resort," something I turn to after I've exhausted all my resources, online and in print. What I dislike (or, to use Robin's words, "really, really hate" ) is answers that do not cite any references, but are sent off with some flip remark that doesn't help me at all in determining if the answer is indeed applicable to my context. Personally, I'd rather have an answer from a dictionary that provides me with a basis on which to do further research than one that gives me a term, no references, but a flip remark. That, in my book, is not a helpful answer.

Thierry thierry_lafaye
English to French
I have seen things in the past you wouldn't imagine :) Apr 6, 2004

Hello Kim and all. Thank you very much for doing this, as I believe it will help our system work even better for us, and sorry for the very lengthy posting. Here is my humble experience with Google hits. Well, it could be Google or any other search engine; I guess the result would be about the same.

I have seen, while trying to help through the KudoZ system, that some people falsely believe they are just plain right because they found a hit. Whether it is just one or zillions doesn’t matter to me if you don’t actually go to the resulting page(s) to check in detail that it matches the context you are looking into, that the document deals seriously with the matter, etc. For example, they translate something literally to propose a translation, which is of course a good thing, but they make the mistake of believing that, because they found the target proposal (without the corresponding document in the source language), they are simply right, and they cite the Google hits as some sort of scale of how sure we can be. I am of course not blaming them, but I just hope that this mistake does not cost them business.

My humble opinion on this precise matter of Google hits is that the count simply does not give much of an idea of how reliable a translation is. I personally check for existing bilingual documents and whether the term makes sense to me in both source and target languages, making sure that it also matches the context that I am looking into. If there are none, then you need to understand fairly well what the source document is dealing with in order to search for specialized glossaries or websites that will further explain the components, subjects, areas, etc. to make sure you are led to the right term. Through some very lengthy hours of such research, I have found many times that a single hit can be just as good as a thousand. But I know, this is not efficient work…

So to conclude, I believe the Google hit count alone, as a way of giving you an idea of how relevant a translation is, is just as good as counting the number of E's you have in a source term or target translation, for example: we are simply not comparing apples with apples. It would be slightly more relevant, though, to give the number of checked contexts in the documents returned by Google that match what you think you are looking for… but then who can check potentially thousands of pages just to find that maybe not even 1% match what we need, even in the first 10 result pages? I believe that providing 2 or more relevant documents helps me build my confidence level. Not something we can really measure, I reckon, can we? So Google hits alone? It does not really look serious to me. Google checked documents? Relevant enough, but not really efficient and potentially time-wasting. Sorry I cannot give you a straight answer, as you were maybe looking for, but I don’t think we can give one this easily on this particular matter. Still, I hope it helps a bit.


Klaus Herrmann  Identity Verified
Member (2002)
English to German
Identical definitions of a term in two languages Apr 6, 2004

That's what a perfect Google reference means to me. IMHO it does not matter whether or not these definitions come from a "real" dictionary or from two similar sites, say web sites dedicated to the color of exhaust pipe deposits. If both of them describe a dark black tailpipe as an indication of a rich fuel mixture, I'll consider this evidence that "rich fuel mixture" is the appropriate translation of "fettes Gemisch".

xxxLia Fail  Identity Verified
Spanish to English
Great idea Apr 6, 2004

Hi Kim

I believe you are referring to just Google and hits, is that it? Or about 'creative' searches using Google, in a general sense?

I think the two important uses of Google are not necessarily hits for individual words - potentially dangerous - but context, and hits for collocations and phrases. And obviously, reputable organisations are preferred, and non-native sites have to be excluded.

Some ideas:

1. ST context to supplement the ST context that you have (or haven't), seeing a word/collocation in similar and other contexts also contributes to building meaning

2. ST context checks to check whether an expression can be considered a candidate term, e.g. the same expression is repeated a no. of times in the ST, so you check to see frequency in Google to know whether you should be looking for a term in TT.

3. the 'define:' function (works only in EN, as far as I know), to clarify understanding

4. ST or TT word/expression in inverted commas + 'glossary' or + 'English' etc. to see if bilingual glossary entries or scientific article abstracts (often non-native written, but usually OK for technical language) are thrown up. And I'm always on the lookout for parallel texts (and DE technical ones are often multilingual and pretty convincing)

5. As per Klaus above, another approach - although lengthy - is looking for similar definitions. Problem is in EN an expression such as "a/an XXXX is a" is usually fruitful (or the Google define), but other languages are not so obliging with similar defining expressions. A similar approach is comparing web images in both ST and TT to see if they coincide.

6. Knowing more or less the word, but not quite sure of which form is most typical, i.e. noun, verb, adjective: Google is good for checking that too.

7. Testing which turns of phrase, in a general sense, are most used. Or which preposition is typically used in a collocation.

8. A lot depends - not just on language combinations - but also on subject field. For example, the more traditional subjects tend to have far less info online (or just in PDF), e.g. mining. Google isn't very helpful for law, either. Certain areas of science (med, biomed) have tons of stuff, and, a recent one for me, there's loads of nautical stuff in a variety of languages.

9. To check glossary entries (there are so many glossaries and their reliability is often doubtful) and dictionary entries, and with context.

Finally, a zero hit is really reassuring, confirming that the word is misspelled in the ST, that you're barking up the wrong tree in the TT, etc. Means - in the first case - that you are justified in offloading that particular problem onto the client/demonstrate your initiative in figuring out their error:-)
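
The search patterns in items 3 and 4 above (an exact phrase in inverted commas plus 'glossary' or a language name, and the 'define:' function) can be sketched as small query builders. This is only an illustration: the helper names are made up for this example, and the search URL format is an assumption that may not match what the search engine accepts today.

```python
from urllib.parse import quote_plus


def exact_phrase(term: str) -> str:
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return '"' + term.replace('"', "") + '"'


def glossary_query(term: str, extra: str = "glossary") -> str:
    """Exact phrase plus a helper word such as 'glossary' or 'English'."""
    return f"{exact_phrase(term)} {extra}"


def define_query(term: str) -> str:
    """Query using the 'define:' function mentioned in item 3."""
    return f"define:{term}"


def search_url(query: str) -> str:
    """Build a search URL (assumed format) for a query string."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    queries = [
        glossary_query("fettes Gemisch"),             # hunt for glossary entries
        glossary_query("fettes Gemisch", "English"),  # hunt for bilingual pages
        define_query("rich mixture"),                 # check understanding of the TT term
    ]
    for q in queries:
        print(q, "->", search_url(q))
```

The queries only narrow down what to read; every page they turn up still has to be checked against the context.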

[Edited at 2004-04-07 00:00]

[Edited at 2004-04-07 00:01]

[Edited at 2004-04-07 00:06]

PAS  Identity Verified
English to Polish
Fuzzy google search Apr 7, 2004

Some of this has probably been said before, but I will be brief:

I don't even look at the absolute number of hits (unless I come up with less than, say, 50 - then I start wondering).

I pay EXTREME attention to the type/ location of the site I find.
Who cares if I find a word on a blog? I don't trust it. If it's an abstract of a doctoral thesis - it's probably OK.

It is not quite true that non-native sites are to be excluded. On some, target language translations are quite good. After all, one of us ProZians could have done the translation! It also requires fuzzy judgement to determine if a translation on a site is dependable.

Just as an aside - what is the source and target language of official European Union sites?

Pawel Skalinski

Mats Wiman  Identity Verified
Member (2000)
German to Swedish

Dictionaries are a blessing to the translator Apr 7, 2004

RobinB said:

But what I really, really hate is answers that cite a generally available dictionary (print or online)!
What better source offering TRANSLATIONS can you find than dictionaries?

As to "generally available dictionary" the following must be said:

1. "Generally available" does not mean that they must be available to the translator/asker.

2. There are a number of reasons why you do not have a dictionary:
a) You cannot afford it.
b) You can but the usage percentage is too low.
c) You do not know of it. I e.g. found most of my sources in the KudoZ arena.
d) You do not have it with you (especially in the summer)
e) It might not be included in my "generally available dictionary" but maybe in yours.

3. A dictionary might be a slower route to an answer for a rushed translator but a golden source when offered by a colleague.

As to Google, it can be of great help confirming that a word/term/expression exists but not so often what it means.


[Edited at 2004-04-07 10:08]

Christian Flury
Latin to German
In defense of Googling Apr 7, 2004


To be frank, I feel somehow confused by some of the remarks made about Googling in general so far: ANY translation resource (dictionaries, glossaries, parallel texts, etc.) is useless as soon as you blindly rely on it and unless it is used by somebody possessing the necessary skills. I guess that is one of the main reasons why you need to have an appropriate background as to education and experience in order to work as a professional translator.

I personally consider the "Google frequency and collocation analysis" a great tool. Here are some examples:

1. Context: You think you found the term you were looking for, but you are not sure whether it is used within a specific field or context: Thanks to search machines like Google, you can look for sites containing both your term and the name of that specific field. This is particularly useful when you have several synonyms and you do not know which of them might fit best into a given context.

2. Comparing different collocations: Of course, you will find examples of the most unusual and incorrect collocations on the Internet. I therefore would not consider "3000 Google hits" a good reference in such cases - in contrast to "3000 Google hits for possibility A versus 1000 Google hits for possibility B versus 100 Google hits for possibility C", which is indeed a very useful indication.

3. Comparing ST and TT frequency: Getting a million hits for the ST and 100 hits for the TT is a strong hint you might be wrong, and most KudoZers are quite aware of that fact. What they are less aware of is that getting a million hits for the TT whereas the ST occurs only in a couple of highly specialized texts is also a clear hint one might probably be wrong rather than a reference.

4. Overall term frequency: I do not agree with those who previously stated that, let's say, 1000 Google hits did not prove anything as nobody is able to check if all these 1000 sites are in fact serious and reliable. Let us assume that you are looking for a financial term and you get 1000 Google results for your favored solution. If 19 out of the first 20 hits concern websites of well-known banks, there are very good chances that a high percentage of the other 980 pages are reliable, too. (By the way, what about including some probability figures in your analysis?)

5. Regionalisms: As you can see from my profile, I live in Austria and my parents are Swiss. Sometimes I am not sure whether a German term that comes to my mind may be typically or even exclusively Swiss or Austrian. Normally, it is sufficient to check the first twenty or thirty Google hits and if most urls end with .ch or .at, I know that the term or collocation I was about to use is a helvetism or an austriacism and that I should avoid it when addressing a larger German-speaking target group.
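
Points 4 and 5 can be given rough figures. The sketch below is an illustration under stated assumptions, not a method taken from the thread: it uses a Wilson score interval to show what "19 reliable out of the first 20 hits" might imply for the remaining pages, and it counts top-level domains in a list of result URLs. The function names and the example URLs are made up. The interval also assumes the checked hits are a random sample, which the top-ranked results are not, so the real uncertainty is larger than the figures suggest.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple
from urllib.parse import urlparse


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def tld_shares(urls: Iterable[str]) -> Dict[str, float]:
    """Share of each top-level domain (last host label) among the given URLs."""
    tlds = Counter()
    for url in urls:
        host = urlparse(url).hostname or ""
        tlds[host.rsplit(".", 1)[-1]] += 1
    total = sum(tlds.values())
    return {tld: count / total for tld, count in tlds.items()}


if __name__ == "__main__":
    # Point 4: 19 of the first 20 checked hits came from reliable sites.
    low, high = wilson_interval(19, 20)
    print(f"plausible share of reliable pages: {low:.0%} to {high:.0%}")

    # Point 5: hypothetical result URLs, typed in by hand after a search.
    results = [
        "https://www.example.ch/a",
        "https://example.at/b",
        "https://example.ch/c",
        "https://example.de/d",
    ]
    print(tld_shares(results))
```

With 19 out of 20 the interval runs from roughly 76% to 99%, which is wide enough to show how little a sample of 20 pins down.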

Generally speaking, I would say that Googling cannot replace any skills or resources a translator should have, but it is a great additional tool, especially in cases when one hesitates between several possible solutions. It can also be a very good additional reference. Furthermore, it is very useful for solving questions like for instance whether you say "ask-bid spread" or "bid-ask spread".
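
The comparisons in points 2 and 3 come down to simple arithmetic on hit counts noted by hand. A minimal sketch, assuming the counts have already been read off the results page; the function names are invented for this illustration, and the figures are the ones quoted in the list above:

```python
import math
from typing import Dict


def relative_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of the combined hit count that each competing variant receives."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("all counts are zero, nothing to compare")
    return {variant: n / total for variant, n in counts.items()}


def magnitude_gap(st_hits: int, tt_hits: int) -> float:
    """Orders of magnitude between source-term and target-term hit counts."""
    if st_hits <= 0 or tt_hits <= 0:
        raise ValueError("hit counts must be positive")
    return abs(math.log10(st_hits / tt_hits))


if __name__ == "__main__":
    # Point 2: three competing collocations.
    hits = {"possibility A": 3000, "possibility B": 1000, "possibility C": 100}
    for variant, share in relative_shares(hits).items():
        print(f"{variant}: {share:.1%}")

    # Point 3: a million hits for the ST against 100 for the TT.
    print("gap in orders of magnitude:", magnitude_gap(1_000_000, 100))
```

A large gap in either direction is a hint to look again, not a verdict; the pages behind the counts still have to be read.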

Please pardon my poor English and have a nice evening.


[Edited at 2004-04-07 16:35]

[Edited at 2004-04-07 16:36]

RobinB  Identity Verified
German to English
Dictionaries are a mixed blessing at best Apr 7, 2004


You say that you can find no better source offering translations than dictionaries - a highly sweeping generalisation, I fear.

Maybe the situation is different for Swedish, but for German/English at least, there is no more than a handful of dictionaries (in any subject area) that can be classed as in any way reliable. There are very many mediocre dictionaries, and as many again that are simply bad.

That this isn't just the case for German is confirmed by the following extract from the programme for the SFT Université d'été in July this year (arrived this morning):

"Pour faire passer avec force et précision le message de son client, le traducteur doit être en mesure d'en comprendre non seulement le contenu technique, mais également le contexte global et les besoins du public visé.
Dans le secteur de la finance, il doit savoir s'adapter à l'évolution rapide des marchés tout en respectant les exigences de style et les délais.
Chaque jour, nous constatons qu’ouvrir un dictionnaire, si spécialisé soit-il, ne suffit pas pour comprendre et rendre le sens d’un texte."

I assume, perhaps too generously, that if somebody is asking a question in KudoZ they have already exhausted all readily available alternatives, including general language dictionaries (which *every* translator surely has) and well-known online dictionaries. I'm quite happy to accept that the asker doesn't have access to good specialist dictionaries, partly because there are so few of them. So I don't then expect respondents to quote any old junk just pulled off the web.

Google may be useful indeed, but only as a secondary resource. The most important resource, surely, is what you as a translator have developed in the form of terminology, bitexts, etc. Google may be useful for confirming that a term is used in a particular way in the TL, but no more than that.

