How many English words do you know?
Thread poster: RominaZ

RominaZ  Identity Verified
Argentina
Member (2006)
English to Spanish
+ ...
Jul 27, 2011

This thread is part of the Translator playground: a place for translators to have fun, to network, to learn, and to hone their translation or linguistic skills. See the announcement here.

Need a quick break from work? In this forum translators and language professionals can share quotes about translation, tongue twisters and word plays, translation challenges, etc.

All are welcome to participate and to add new items to this and the other areas of the Translator playground; have fun with it! If you need help or would like to propose an addition to the Translator playground, contact site staff through the online support system.


A short post in The Economist reviews a tool that can estimate the number of English words one knows.

testyourvocab.com is a serious research project which will, in five minutes, let you estimate your own vocabulary size. Better still, you'll be contributing to the research. The test is here; the blog about it, here. Go, test, read, enjoy—and remember that bragging in the comments is a bit naff...


Willing to give it a try?

testyourvocab.com.



Dave Bindon  Identity Verified
Greece
Local time: 21:17
Member (2010)
Greek to English
37,200 Jul 27, 2011

Is that all? I probably know a few more that the dictionary daren't print!

Krzysztof Kajetanowicz  Identity Verified
Poland
Local time: 20:17
English to Polish
+ ...
17600 Jul 27, 2011

@Dave: some words that would be separate entries in a dictionary are counted as one word here.


Dave Bindon  Identity Verified
Greece
Local time: 21:17
Member (2010)
Greek to English
I think... Jul 27, 2011

Krzysztof Kajetanowicz wrote:

@Dave: some words that would be separate entries in a dictionary are counted as one word here.


I think this is supposed to be a simple estimate of how many words you know in at least one sense of the word without the complication of estimating how many meanings you know for each word.

A bit of Googling suggests that either my score is very high, or the website has vastly overestimated my vocabulary.

As a professional translator into English, I'd expect (or at least hope) that my vocabulary would be above average. However, I think my score is artificially high because some of the test words happen to be regular jargon for me, in fields which interest me or fields in which I work a lot.


Congratulations on your own score. If Mr Google is correct, then your vocabulary is as good as that of most university-educated native speakers of English.



Riccardo Schiaffino  Identity Verified
United States
Local time: 12:17
Member (2003)
English to Italian
+ ...
41,200 Jul 27, 2011

Missed just a dozen words in the final page.


Ambrose Li  Identity Verified
Canada
Local time: 14:17
Chinese to English
+ ...
The research survey Jul 27, 2011

seems very poorly designed. It is asking lots of questions that are completely irrelevant for immigrants that will eventually mislead them.


Dave Bindon  Identity Verified
Greece
Local time: 21:17
Member (2010)
Greek to English
I'm interested Jul 27, 2011

Ambrose Li wrote:

seems very poorly designed. It is asking lots of questions that are completely irrelevant for immigrants that will eventually mislead them.


Please explain what you think is irrelevant or misleading. I can only see this from the perspective of a single-language "native speaker" of English but I know (from your comments on another site!) that you are "bilingual".

[I've used inverted commas for "native speaker" and "bilingual" because I know that the terms are open to interpretation and that students of Applied Linguistics go off on long rants whenever they read the terms! Chill, guys, they're just words!]



Werner Maurer  Identity Verified
Canada
Local time: 11:17
Spanish to English
+ ...
33,700 Jul 27, 2011

Thought it'd be more, heheh. OTOH, Shakespeare is said to have around 35,000 (in his writings, that is), so I guess I'm in good company. And the King James Bible only has around 11,000, including numbers, names and place names.

And congrats to the high scorers!



Arianne Farah  Identity Verified
Canada
Local time: 14:17
Member (2008)
English to French
32,600 Jul 27, 2011

Not bad for a B language. I am native in English, though; my education just happened to be franco-centric. Also, it didn't hurt that a lot of rarer English words are borrowed from French (maladroit, embonpoint, portmanteau, etc.).

As others said, it's probably meant to be a measure of general vocabulary, since the English language technically encompasses an almost infinite quantity of words (think of technical terms & names of molecules).



Neil Coffey  Identity Verified
United Kingdom
Local time: 19:17
French to English
+ ...
Method only really suitable for getting averages across population? Jul 27, 2011

You should always bear in mind that these figures are on the "very" side of approximate. I'm not sure it's even clear that they're accurate to 2 significant figures (which seems to be the accuracy they're being quoted to). They also have the problem that it's not very clear what's actually being measured, either in terms of the precise definition of "word", or in terms of whether the measure of somebody "knowing" a rare word is really a measure of them "knowing" word formation rules, words from other languages accidentally used in English, or indeed simply a measure of how closely the biases in a person's reading interests match the biases of the sample corpora used to construct the test.

The *averages* shown by this study so far seem very much in line with other informal estimates that have been made from processes such as going through a sample of pages from a dictionary, marking the words you know, then scaling up. What isn't clear (because they haven't given the data yet) is whether the variation shown by this study matches that from other, more informal techniques. (As an example, Crystal reports carrying out this kind of study on 3 speakers, a secretary, a well-read businesswoman and a university lecturer, and getting results of 31.5K, 63K and 56.25K words respectively, with a 25% increase in these figures when you take into account their reported "passive" vocabulary.)
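
For anyone curious, the dictionary-page method mentioned above boils down to a couple of lines of arithmetic. This is only a rough sketch in Python: the function name, the page tallies and the 1,500-page dictionary are all made up for illustration, and it is not how testyourvocab.com itself computes its estimate.

```python
def estimate_vocabulary(known_per_page, total_pages):
    """Scale the mean number of known words per sampled page up to the whole dictionary."""
    if not known_per_page:
        raise ValueError("need at least one sampled page")
    mean_known = sum(known_per_page) / len(known_per_page)
    return round(mean_known * total_pages)


# Hypothetical tallies: words marked as known on 5 sampled pages
# of a (hypothetical) 1,500-page dictionary.
sample = [22, 18, 25, 20, 15]
print(estimate_vocabulary(sample, 1500))  # prints 30000
```

With so few pages sampled, a different handful of pages could easily give a noticeably different total, which is part of why such figures should be read as rough.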

Note also that when you get on to extremely rare words, you come up against a problem of what it means to actually "know the meaning" of that word. I said that I didn't "know" the words "tricorn" or "uxoricide" because I couldn't say that I've actually ever seen them used in a piece of English text; but I know damn well what I'd *expect* them to mean if ever I saw them used. But then... would another speaker who didn't have the cross-language knowledge to anticipate their meaning but who had seen, say, one instance of these words, then form the same perception of what those words "meant" as (a) me, or (b) another person who had also seen one instance? How many instances of these rare words would it take for speakers to have very similar perceptions of the overall "meaning" of these words? And how much does that differ from speaker to speaker? Then, in that situation, at what point does the distinction between "passive" and "active" vocabulary really make sense at this end of the "vocabulary frequency scale"?

Anyway, coming back to the figures from this test: to really make much more sense of individual figures, we'd need more information on how much variation there is in a given participant's test scores when a different sample of words is used for the evaluation. A limitation of the test at present (or rather, a methodological choice which affects how the results can be used) is that the same list of words is used for every participant. So whilst the mean score across thousands of participants may tell you something, the error for an individual participant could be quite large (and is unknown from the information presented). So, e.g., Bob gets 29,000 on the test and Jim gets 41,000, yet the "real" vocab of both speakers could be about 35,000 if the margin of error of the test is greater than +/-6,000.
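
To give a feel for how much a single score could wobble, here is a small simulation sketch in Python. Everything in it is an assumption for illustration: a speaker who knows exactly 35,000 words out of a 70,000-word list, tests of 40 words drawn uniformly at random, and a simple "fraction known times list size" estimate. The real test picks its words by frequency and may well behave differently.

```python
import random


def simulate_scores(true_vocab=35_000, lexicon_size=70_000,
                    words_per_test=40, trials=200, seed=0):
    """Estimate the same speaker's vocabulary from many different random word samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    scores = []
    for _ in range(trials):
        # Draw a fresh test: word IDs below true_vocab count as "known".
        sample = rng.sample(range(lexicon_size), words_per_test)
        known = sum(1 for word_id in sample if word_id < true_vocab)
        scores.append(known / words_per_test * lexicon_size)
    return scores


scores = simulate_scores()
print("lowest:", min(scores))
print("mean:", sum(scores) / len(scores))
print("highest:", max(scores))
```

Under these assumptions the mean over many tests lands near the true figure, while individual tests scatter by several thousand words either side, which is the Bob-and-Jim situation.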

[Edited at 2011-07-27 23:09 GMT]



Neil Coffey  Identity Verified
United Kingdom
Local time: 19:17
French to English
+ ...
Shakespeare: unfair test Jul 27, 2011

Werner Maurer wrote:
Thought it'd be more, heheh. OTOH, Shakespeare is said to have around 35,000 (in his writings, that is), so I guess I'm in good company. And the King James Bible only has around 11,000, including numbers, names and place names.


The 30K-ish figure for Shakespeare's vocabulary comes if you count as separate "words" inflected forms of the same base form, e.g. counting "go", "goes", "going" etc. as different words.

In the current test, these inflected forms would actually be counted as instances of their base word. (Or put another way, when counting words as they are in this test, Shakespeare used a vocabulary of around 20,000 words.)
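
A toy illustration in Python of how the two ways of counting diverge. The little form-to-base mapping and the sample sentence are invented for the example; a real count over a corpus would need a proper lemmatizer rather than a hand-made table.

```python
# Hypothetical hand-made mapping from different forms to their base word.
BASE_FORM = {"goes": "go", "going": "go", "went": "go", "gone": "go", "kings": "king"}


def count_vocabulary(tokens):
    """Return (number of distinct word forms, number of distinct base words)."""
    forms = {token.lower() for token in tokens}
    bases = {BASE_FORM.get(form, form) for form in forms}
    return len(forms), len(bases)


tokens = "The king goes and the kings go going gone".split()
print(count_vocabulary(tokens))  # prints (8, 4)
```

Same text, but the count halves once "go", "goes", "going" and "gone" are treated as one word.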

One thing to note: the entire corpus of Shakespeare's writing from which the above figures are derived comes to only around 880,000 words. A full-time professional translator could readily output this many words in a couple of years.

[Edited at 2011-07-27 23:11 GMT]



Antonio Fajardo  Identity Verified
Spain
Local time: 20:17
Member (2011)
English to Spanish
+ ...
passive VS active vocabulary Jul 27, 2011

Being able to understand 35,000 words (passive vocabulary) is not the same as being able to use them (active vocabulary)!

Maybe Shakespeare is further than we thought?



Neil Coffey  Identity Verified
United Kingdom
Local time: 19:17
French to English
+ ...
Update Jul 28, 2011

The authors have actually posted a bit of information on the points I mention above:

http://testyourvocab.com/details.php

On statistical grounds, they claim the margin of error is around +/- 10%. As far as I can understand, this margin of error would hold only *IF*, for a speaker who knows, say, 15,000 words, those 15,000 words are precisely the 15,000 most frequent words in their corpus. It's hard to assess to what extent this is actually the case.



Michael Grant
Japan
Local time: 03:17
Japanese to English
Just a few words can make a big difference... Jul 28, 2011

Also, I noted that if I did the test twice and checked just two more boxes the second time, the estimated word count jumped by 4000+ words!!!

Obviously this estimation must be taken with a grain of salt...

MGrant



Stanislaw Czech, MCIL  Identity Verified
United Kingdom
Local time: 19:17
Member (2006)
English to Polish
+ ...
rather disappointing 18,500 Jul 28, 2011

I wonder what the result would be if legal terminology were included.

[Edited at 2011-07-28 10:38 GMT]

