A generation ago, students would say they “graduated from college,” but now they “graduate college.” These tiny fluctuations in the way we use language are ubiquitous because “children don’t learn the language their parents actually speak,” according to David Smith, an assistant professor in the College of Computer and Information Science.
The discrepancies don’t significantly impede our ability to understand our children and grandchildren, he said, “but accumulation of small changes over long periods of time is enough to make our English sound a lot different from Shakespeare, Chaucer, or Beowulf.”
Backed by a Google Faculty Research Award, Smith is currently studying how languages have changed over the last several hundred years. But he’s doing it in a way only recently made possible through technological developments in the digital humanities and natural language processing. In the last few decades, libraries have been working to digitize literature. Now that millions of books are available as searchable files, researchers are able to ask questions that couldn’t be asked before.
Smith and his team will use corpora like the Penn Treebank, which includes the syntactic analyses of 30,000 sentences from The Wall Street Journal, to build statistical models that automatically detect the syntax of a sentence in a digitized book.
See: Northeastern