“It began with a curiosity about why the ten most common verbs in the English language are irregular, even though the vast majority of verbs are regular. Their discovery, arrived at through data-mining several centuries’ worth of texts, amounts to a sort of linguistic natural selection: the more frequently an irregular verb is used, the less likely it is to be regularized over time. It was the Ngram Viewer, and access to Google’s vast library of digitized books, that enabled this discovery.”
Mark O’Connell reads “Uncharted: Big Data as a Lens on Human Culture,” a new book by the scientists Erez Aiden and Jean-Baptiste Michel, founders of the field they call “culturomics”: http://nyr.kr/OBr9bg (via newyorker)
As striking as these infographics are in their encapsulations of historical truths, they don’t typically tell us anything that we didn’t already know. And this is true of the book as a whole. The data on censorship, for instance, is embedded deep in a luxuriance of padding. We get stuff about how Helen Keller was “a hero to millions, a symbol of the triumph of the human spirit over adversity” and how “Marcel Proust became famous for writing good books,” which is one of those facts so incontrovertibly true that stating it sounds a mysteriously false note. And a data-mining examination of the history of fame, whereby we learn that Adolf Hitler is the most famous person born in the past two centuries (i.e., mentioned in the most books), leads to the insight that “darkness, too, lurks among the n-grams, and no secret darker than this: Nothing creates fame more efficiently than acts of extreme evil. We live in a world in which the surest route to fame is killing people, and we owe it to one another to think about what that means.” After a while, you begin to suspect that this sort of wan reflection might be compensating for the fact that the data itself reveal little that is new.
But don’t the emperor’s clothes look lovely?