Experiments in Computational Criticism #2: Victorian Visualizations and the Dangers of Google N-Gram Viewer

**This post was originally written on December 2, 2016.**

This is a quick lesson on the importance of choosing one's search terms carefully when using Google Ngram Viewer as a tool.

Initially, one might have a hypothesis: the idea of science being “fun” (in the English-speaking world) emerges in the twentieth century.  A quick search of the phrase “science is fun” in the Ngram Viewer would seem to support this claim[i]:

[Figure: science-is-fun.png, the Ngram Viewer chart for “science is fun”]

The phrase appears to become prominent only after 1940.  But looking at Google Ngram Viewer's (ridiculously confusing and UNLABELED) axis, we learn that, at its peak, the trigram “science is fun” accounts for only 0.000000180% of all the trigrams in the Google Books corpus.  This seems like a very small amount of data.  But we can check it with some quick searches through databases such as Google Books and HathiTrust.  The examples of the phrase “science is fun” that I could find prior to 1900 were generally false matches: for instance, the phrase “science is fundamental.”  So one might be willing to make a weak claim that the phrase does become popular only in the twentieth century.
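To get a feel for how small that percentage is, it can help to restate it as a count per billion trigrams. The snippet below is only a sketch of the arithmetic; the function name is my own and is not part of any Ngram Viewer tooling.

```python
def percent_to_per_billion(percent: float) -> float:
    """Convert a percentage (as shown on the Ngram Viewer axis) to occurrences per billion n-grams."""
    fraction = percent / 100          # a percentage is a fraction multiplied by 100
    return fraction * 1_000_000_000   # scale the fraction up to one billion n-grams


peak = percent_to_per_billion(0.000000180)
print(f"about {peak:.1f} occurrences per billion trigrams")
```

In other words, at its peak the phrase shows up roughly twice in every billion trigrams, which is why a handful of false matches can noticeably distort the curve.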

But wait.  Is it possible that the concept of science being “fun” existed prior to the twentieth century, but was expressed in different rhetoric?  Of course it is.  For instance, if we search for “fun,” we find that the term “fun” seems to have been less popular in the nineteenth century (this is very much in line with the work of historians of play, who note that industrialization and new concepts of work and leisure time were accompanied by an increase in the number of recreational activities and the rise of the cult of sports)[ii]:

[Figure: fun.png, the Ngram Viewer chart for “fun”]

This is where we also need to keep in mind the problems with Optical Character Recognition (OCR) software: if one goes into the Google Books corpus, one will find that many of the results that were used as data for the apparent popularity of “fun” from 1800 to 1820 were actually references to “sun.”  The OCR has simply mistaken the “long s,” “ſ,” for an “f.”  So perhaps a different term, for instance “pleasure” or “enjoyment,” might be more appropriate.  Similarly, the use of “science” as the name of a field is relatively new as well; previously “natural philosophy” was a more common label.
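One rough way to anticipate this kind of false match is to ask what a search term would look like if some of its “f”s were really misread long s's. The sketch below is my own illustration, not anything the Ngram Viewer provides. It assumes the common printing convention that the long s was not used at the end of a word, so a final “f” is left alone.

```python
from itertools import product


def long_s_candidates(token: str) -> set[str]:
    """Return the words an OCR'd token might really be if some 'f's were misread long s's."""
    # Assumption: the long s did not appear word-finally, so skip the last character.
    positions = [i for i, ch in enumerate(token[:-1]) if ch == "f"]
    candidates = set()
    # Try every combination of keeping 'f' or restoring 's' at each candidate position.
    for choice in product("fs", repeat=len(positions)):
        chars = list(token)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        candidates.add("".join(chars))
    candidates.discard(token)  # the unchanged token is not an alternative reading
    return candidates


print(long_s_candidates("fun"))
```

If any of the candidates is a common word, as “sun” is for “fun,” early results for that term deserve a manual check in the scanned pages before being trusted.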

To keep an eye on these issues, it is often helpful to construct a chart like the one below, which allows the researcher to more carefully investigate each of the terms of interest.  Note that I have used the “+” to combine all the various cases of the terms into one line.
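Typing out every capitalization of every term by hand is tedious, so the “+” queries can be assembled programmatically. The following is a minimal sketch: the helper names are hypothetical, and the URL parameters (`content`, `year_start`, `year_end`, `corpus`, `smoothing`) are simply copied from the links in the notes at the end of this post rather than taken from any official documentation.

```python
from urllib.parse import urlencode


def case_variants(phrase: str) -> list[str]:
    """Return lower, sentence, title, and upper case forms of a phrase, without duplicates."""
    variants = [phrase.lower(), phrase.capitalize(), phrase.title(), phrase.upper()]
    return list(dict.fromkeys(variants))  # dict keys preserve order and drop repeats


def ngram_url(phrase: str, year_start: int = 1800, year_end: int = 2000) -> str:
    """Build an Ngram Viewer link that sums all case variants of a phrase with '+'."""
    params = {
        "content": " + ".join(case_variants(phrase)),
        "year_start": year_start,
        "year_end": year_end,
        "corpus": 15,     # value copied from the links in the notes
        "smoothing": 3,   # value copied from the links in the notes
    }
    return "https://books.google.com/ngrams/graph?" + urlencode(params)


print(ngram_url("fun"))
```

Note that `str.title()` capitalizes every word, so for a multi-word phrase the variants will not exactly match a hand-built list such as “Science is Fun”; any variant that matters should be added explicitly.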

Note that my search terms are not absolutely symmetrical.  “Science is pleasurable,” for instance, yielded no Ngram results, nor did “fun of natural philosophy.”  So I changed my search terms slightly where applicable.  “Enjoyment(s) of Science” turns out to be too sporadic to draw any conclusions.  But the fact that “Pleasure(s) of Science” declines in the second half of the nineteenth century, while “Science is Fun” increases in the twentieth century, should be a reason to doubt our initial hypothesis.  The concept of science being “fun” almost certainly is not a twentieth-century concept, although the change in rhetoric deserves more attention, given the different connotations of “fun” and “pleasure.”  More interesting, however, is that the pervasive ABSENCE of a connection between “Natural Philosophy” and “fun” or “pleasure” suggests that PERHAPS the shift away from the term “natural philosophy” towards the more frequent use of the term “science” is also related to the changing relationship between science and feelings of pleasure.

So while it is hard to draw strong conclusions from the Google Ngram Viewer, given the limitations of its OCR, the difficulty of compiling and searching for all relevant terms, and the fact that its archive is restricted to what is available in Google Books, it can be a useful tool for testing initial hypotheses and for developing ideas for further research.


[i] https://books.google.com/ngrams/graph?content=science+is+fun&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cscience%20is%20fun%3B%2Cc0

[ii] https://books.google.com/ngrams/graph?content=fun&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cfun%3B%2Cc0

[iii] https://books.google.com/ngrams/graph?content=science%2BScience%2BSCIENCE&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28science%20%2B%20Science%20%2B%20SCIENCE%29%3B%2Cc0

[iv] https://books.google.com/ngrams/graph?content=NATURAL+PHILOSOPHY%2Bnatural+philosophy%2BNatural+philosophy+%2B+Natural+Philosophy&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28NATURAL%20PHILOSOPHY%20%2B%20natural%20philosophy%20%2B%20Natural%20philosophy%20%2B%20Natural%20Philosophy%29%3B%2Cc0

[v] https://books.google.com/ngrams/graph?content=pleasure%2BPleasure%2BPLEASURE&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28pleasure%20%2B%20Pleasure%20%2B%20PLEASURE%29%3B%2Cc0

[vi] https://books.google.com/ngrams/graph?content=pleasure+of+science+%2B+Pleasure+of+science+%2B+Pleasure+of+Science+%2B+PLEASURE+OF+SCIENCE+%2B+pleasures+of+science+%2B+Pleasures+of+science+%2B+Pleasures+of+Science+%2B+PLEASURES+OF+SCIENCE&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28pleasure%20of%20science%20%2B%20Pleasure%20of%20science%20%2B%20Pleasure%20of%20Science%20%2B%20PLEASURE%20OF%20SCIENCE%20%2B%20pleasures%20of%20science%20%2B%20Pleasures%20of%20science%20%2B%20Pleasures%20of%20Science%20%2B%20PLEASURES%20OF%20SCIENCE%29%3B%2Cc0

[vii] https://books.google.com/ngrams/graph?content=fun%2BFun%2BFUN&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28fun%20%2B%20Fun%20%2B%20FUN%29%3B%2Cc0

[viii] https://books.google.com/ngrams/graph?content=science+is+fun+%2B+Science+is+fun+%2B+Science+is+Fun+%2B+SCIENCE+IS+FUN&case_insensitive=on&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28science%20is%20fun%20%2B%20Science%20is%20fun%20%2B%20Science%20is%20Fun%20%2B%20SCIENCE%20IS%20FUN%29%3B%2Cc0

[ix] https://books.google.com/ngrams/graph?content=Enjoyment%2Benjoyment%2BENJOYMENT&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28Enjoyment%20%2B%20enjoyment%20%2B%20ENJOYMENT%29%3B%2Cc0

[x] https://books.google.com/ngrams/graph?content=(enjoyment+of+science)+%2B+(Enjoyment+of+science)+%2B+(Enjoyment+of+Science)+%2B+(ENJOYMENT+OF+SCIENCE)+%2B+enjoyments+of+science+%2B+Enjoyments+of+science+%2B+Enjoyments+of+Science+%2B+ENJOYMENTS+OF+SCIENCE&case_insensitive=on&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28%28enjoyment%20of%20science%29%20%2B%20%28Enjoyment%20of%20science%29%20%2B%20%28Enjoyment%20of%20Science%29%20%2B%20%28ENJOYMENT%20OF%20SCIENCE%29%20%2B%20enjoyments%20of%20science%20%2B%20Enjoyments%20of%20science%20%2B%20Enjoyments%20of%20Science%20%2B%20ENJOYMENTS%20OF%20SCIENCE%29%3B%2Cc0