Bigger, better Google Ngrams: brace yourself for the power of grammar

Source: The Atlantic
Story flagged by: RominaZ

Back in December 2010, Google unveiled an online tool for analyzing the history of language and culture as reflected in the gargantuan corpus of historical texts that have been scanned and digitized as part of the Google Books project. They called the interface the Ngram Viewer, and it was launched in conjunction with a blockbuster paper in the journal Science that baptized this Big Data approach to historical analysis with the label “culturomics.”

The appeal of the Ngram Viewer was immediately obvious to scholars in the digital humanities, linguistics, and lexicography, but it wasn’t just specialists who got pleasure out of generating graphs showing how key words and phrases have waxed and waned over the past few centuries. Here at The Atlantic, Alexis Madrigal collected a raft of great examples submitted by readers, some of whom pitted “vampire” against “zombie,” “liberty” against “freedom,” and “apocalypse” against “utopia.” A Tumblr feed brought together dozens more telling graphs. If nothing else, playing with Ngrams became a time suck of epic proportions.

As of today, the Ngram Viewer just got a whole lot better. For starters, the text corpus, already mind-bogglingly big, has become much bigger: The new edition extracts data from more than eight million out of the 20 million books that Google has scanned. That represents about six percent of all books ever published, according to Google’s estimate. The English portion alone contains about half a trillion words, and seven other languages are represented: Spanish, French, German, Russian, Italian, Chinese, and Hebrew.

The Google team, led by engineering manager Jon Orwant, has also fixed a great deal of the faulty metadata that marred the original release. For instance, searching for modern-day brand names (like Microsoft or, well, Google) previously revealed weird, spurious bumps of usage around the turn of the 20th century, but those bumps have now been smoothed over thanks to more reliable dating of books.

See: The Atlantic

Comments about this article


Dr Sarai Pahla, MBChB
Germany
Member (2012)
Japanese to English
Addictive Oct 25, 2012

Thanks for sharing this. I am completely addicted to searching for phrases now, and I try any combination I can think of. Some trends are hard to analyse, some are incredibly interesting, but some are just downright offensive :)

 
Gennady Lapardin
Russian Federation
Italian to Russian
Russian included! Oct 27, 2012

For example, here is a comparison of the usage of some Russian verbs over time (1800-2008): http://tinyurl.com/92v4q23
Very interesting!
Thank you


 
