Experiments in Computational Criticism #6: "'Golf', 'Tennis', 'Shuttlecock', and 'Football' in Victorian Scientific Periodicals"

In a previous experiment, I attempted to see whether within the discourse of Victorian science, “cricket” was associated with nation. I could not find a way to do this successfully. In this experiment, I attempted to see if other sports were associated with nation/empire in *Nature*, *Notices of the Proceedings at the Meetings of the Members of the Royal Institution*, *Philosophical Magazine*, *Proceedings at the Royal Society of Edinburgh*, *Proceedings at the Royal Society of London*, and the *Reports of the BAAS*. Again, at the end of this experiment, I did not feel I had strong enough results to argue in the affirmative. But it is still interesting for demonstrating how I approach these experiments.

Methodology: Recreational Reckoning

Experimental Question

I complete my distant readings of texts using packages others have developed in R. R can be a powerful tool for better understanding texts. It isn't always necessary to have a fully testable hypothesis in mind; visualizing texts can be a powerful tool for discovery, especially when you are willing to have fun exploring the many ways in which you can customize your analysis. On the other hand, because the data can be easily manipulated, one can easily fall into the trap of thinking one observes a feature in the text and then manipulating the text to draw out that feature. Fishing for information that supports a theory one already holds is a real problem in the field labelled by scholars such as those in the Stanford Literary Lab as “computational criticism.”

There are several principles that can be used to approach objective experimentation in automated text analysis, as discussed in Justin Grimmer and Brandon M. Stewart's “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts” (Political Analysis, 2013). Unlike the social sciences, however, the humanities more generally proceed not through testable and reproducible experiments, but through the development of ideas. Recreational computational criticism–what I call 'Recreational Reckoning'–therefore asks only that you choose one question that your analysis will answer. Questions such as: “Does Dickens's Bleak House include more masculine or feminine pronouns?”; “What topics are central to the Sherlock Holmes canon?”; “Do novel titles become longer or shorter over the course of the nineteenth century?” New features may become observable while pursuing this analysis. And it is up to the critic to theorize about what this newly visualized feature means. For this project, my question was whether I would find references to Britain or the British Empire closely associated with references to golf, tennis, shuttlecock, and football.

Why R?

R isn't the only tool one can use for visualizing texts. However, I have found that R computational methods shine when you have texts that are either too long to read quickly, or too many texts to read quickly. They are also useful when you have a specific methodology in mind or prioritize customizability in the data mining or the visualization. For quick visualizations of things like word clouds, Voyant (https://voyant-tools.org) is probably a better choice.

Downloading R

The first step in using this methodology is obviously to download R. This can be done here (https://www.r-project.org). Users should also download RStudio, an environment which will make running the code easier. (If you are reading this in R/RStudio, then congratulations on already having started!)

Setting Directory

The first step in analyzing your data is choosing a workspace. I recommend creating a new folder for each project. This folder will be your working directory. The working directory in R is generally set via the “setwd()” command. However, here, we're going to be working within R Markdown Files (.Rmd). R Markdowns rely on a package called knitr, which generally requires the R Markdown to be stored in the location of your working directory. So I would recommend creating a new folder, and then downloading these R Markdown Files to the folder where you want to work. For example, you might create a folder called “data” on your computer desktop, in which case your working directory would be something like “C:/Users/Nick/Desktop/data”. You can check that your working directory is indeed in the right place by using the “getwd()” function below.

getwd()
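
If you need to set the working directory manually (for instance, when running these scripts outside of an R Markdown file), you can pass the folder path to the “setwd()” function. A minimal example, using the hypothetical path from above:

setwd("C:/Users/Nick/Desktop/data") #substitute the path to your own project folder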

Downloading Packages

The next step is to load in the packages that will be required. My methodology makes use of several packages, depending on what is required for the task. Rather than loading the libraries for each script, I generally find it more useful to install and initialize all the packages I will be using at once.

Packages are initially installed with the “install.packages()” function. HOWEVER, THIS STEP ONLY HAS TO BE COMPLETED ONCE.

“ggmap” is a package for visualizing location data.

“ggplot2” is a package for data visualizations. More information can be found here (https://cran.r-project.org/web/packages/ggplot2/index.html).

“pdftools” is a package for reading pdfs. In the past, you had to download a separate pdf reader, and it was a real pain. You, reader, are living in a golden age. Information on the package can be found here (https://cran.r-project.org/web/packages/pdftools/pdftools.pdf).

“plotly” is a package for creating interactive plots.

“quanteda” is a package by Ken Benoit for the quantitative analysis of texts. More information can be found here (https://cran.r-project.org/web/packages/quanteda/quanteda.pdf). quanteda has a great vignette to help you get started, and there are also exercises available on the package website (https://quanteda.io).

“readr” is a package for reading in certain types of data. More information can be found here (https://cran.r-project.org/web/packages/readr/readr.pdf).

“SnowballC” is a package for stemming words (lemmatizing words, or basically cutting the ends off words as a way of lowering the dimensions of the data. For instance, “working”, “worked”, and “works” all become “work”).
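
“stm” is a package for estimating structural topic models. It is installed and loaded below alongside the others, though it goes unused in this particular experiment.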

“tm” is a simple package for text mining. An introduction to the package can be found here (https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf).

“tokenizers” is a package which turns a text into a character vector. An introduction to the package can be found here (https://cran.r-project.org/web/packages/tokenizers/vignettes/introduction-to-tokenizers.html).

install.packages("ggmap")
install.packages("ggplot2")
install.packages("pdftools")
install.packages("plotly")
install.packages("quanteda")
install.packages("readr")
install.packages("SnowballC")
install.packages("stm")
install.packages("tm")
install.packages("tokenizers")

Loading Libraries

The next step is to load the libraries for these packages into your environment, which is accomplished with the “library()” function.

library(ggmap)
library(ggplot2)
library(quanteda)
library(pdftools)
library(plotly)
library(readr)
library(SnowballC)
library(stm)
library(tm)
library(tokenizers)

A Note About Citation

Most of these software packages are written by academics. Reliable and easy-to-use software is difficult to make. If you use these packages in your published work, please cite them. In R you can even see how the authors would like to be cited (and get a BibTeX entry).

citation("ggplot2")
citation("quanteda")
citation("pdftools")
citation("plotly")
citation("readr")
citation("SnowballC")
citation("stm")
citation("tm")
citation("tokenizers")

Uploading Data and Setting Variables

I had already acquired .txt volumes of these texts, so I simply needed to upload the data. There are also various parameters that I might find useful later that need to be defined. The basic methodology is that I am going to construct a script that will go through each word in the .txt files and try to match it against a list of search terms. I chose to look for references to golf, tennis, football, and shuttlecock. However, it is often helpful to make sure you know the words which occur around the referenced term, to provide context. The “conlength” variables provide three different sizes of “windows” for this purpose. For instance, “ProfSportsshortconlength2” is set to three, meaning the final dataset will have a column showing the three words to either side of the matched term.

    templocation <- paste0(getwd(),"/Documents") #root folder containing the corpus subfolders
    ProfSportslocations2 <- c(paste0(templocation,"/Nature/Volumes"),paste0(templocation,"/Philosophical-Magazine/Volumes"),paste0(templocation,"/Reports-of-the-BAAS/Reports"),paste0(templocation,"/Royal-Institution/Proceedings"),paste0(templocation,"/Royal-Society-of-Edinburgh/Proceedings"), paste0(templocation,"/RSL/Proceedings")) #folders holding the .txt volumes for each periodical
    ProfSportsIndex2 <- c("Nature","Philosophical-Magazine","BAAS","Royal-Institution","RSE","RSL") #short labels for each corpus, stored in the final data frame
    ProfSportslongconlength2 <- 250 #window of 250 words to either side of a match
    ProfSportsshortconlength2 <- 3 #window of 3 words to either side of a match
    ProfSportsPOSconlength2 <- 10 #window of 10 words to either side of a match
    ProfSportssearchedtermlist2 <- c("golf","tennis","football","shuttlecock") #terms to search for
    ProfSportsoutputlocation2 <- paste0(getwd(),"/WordFlagDataFrames") #folder where the results will be saved
    ProfSportsWordFlagdfPath2 <- paste0(ProfSportsoutputlocation2,"/","ProfSportsWordFlagdf2.txt") #file path for the saved data frame

To create the data frame compiling every reference to a term, run the following script. Be aware that this takes quite a while. So if you already have a dataset that you just need to upload, see below instead.

if(!file.exists(ProfSportsoutputlocation2)) {dir.create(ProfSportsoutputlocation2)} #creates the output folder for the results if it does not already exist
ProfSportsstemsearchedtermlist2 <- unique(wordStem(ProfSportssearchedtermlist2)) #stems the list of terms you want to search for.
ProfSportsWordFlagmat2 <- matrix(,ncol=13,nrow=1) #empty placeholder matrix; one row per match will be bound on below.
for (g in 1:length(ProfSportslocations2)) {
      tempdocloc <- ProfSportslocations2[g]
      files <- list.files(path = tempdocloc, pattern = "txt", full.names = TRUE) #creates vector of txt file names.

      for (i in 1:length(files)) {
        fileName <- read_file(files[i])
        Encoding(fileName) <- "UTF-8"  #since the tokenize_words function requires things to be encoded in UTF-8, need to remove some invalid characters.
        fileName <- iconv(fileName, "UTF-8", "UTF-8",sub='')
        ltoken <- tokenize_words(fileName, lowercase = TRUE, stopwords = NULL, simplify = FALSE)
        ltoken <- unlist(ltoken)
        stemltoken <- wordStem(ltoken) #this uses the Snowball library to lemmatize the entire text.
        textID <- i
        for (p in 1:length(ProfSportsstemsearchedtermlist2)) {
          ProfSportsstemsearchedterm2 <- ProfSportsstemsearchedtermlist2[p]
          for (j in 1:length(stemltoken)) {
              if (ProfSportsstemsearchedterm2 == stemltoken[j]) {
                if (j <= ProfSportslongconlength2) {longtempvec <- ltoken[(1:(j+ProfSportslongconlength2))]}
                if (j > ProfSportslongconlength2) {longtempvec <- ltoken[(j-ProfSportslongconlength2):(j+ProfSportslongconlength2)]}
                if (j <= ProfSportsshortconlength2) {shorttempvec <- ltoken[(1:(j+ProfSportsshortconlength2))]}
                if (j > ProfSportsshortconlength2) {shorttempvec <- ltoken[(j-ProfSportsshortconlength2):(j+ProfSportsshortconlength2)]}
                if (j <= ProfSportsPOSconlength2) {POStempvec <- ltoken[(1:(j+ProfSportsPOSconlength2))]}
                if (j > ProfSportsPOSconlength2) {POStempvec <- ltoken[(j-ProfSportsPOSconlength2):(j+ProfSportsPOSconlength2)]}
                TempTextName <- gsub(paste0(ProfSportslocations2[g],"/"),"",files[i]) #This grabs just the end of the file path.
                TempTextName <- gsub(".txt","",TempTextName) #This removes the .txt from the end of the name.
                temprow <- matrix(,ncol=13,nrow=1)
                colnames(temprow) <- c("Text", "Text_ID", "ProfSportsstemsearchedterm2","Lemma","Lemma_Perc","KWIC","Total_Lemma","Date","Category","Short_KWIC","POS_KWIC","Current_Date","Corpus")
                temprow[1,1] <- TempTextName
                temprow[1,2] <- textID
                temprow[1,3] <- ProfSportsstemsearchedterm2
                temprow[1,4] <- j
                temprow[1,5] <- (j/length(stemltoken))*100
                temprow[1,6] <- as.character(paste(longtempvec,sep= " ",collapse=" "))
                temprow[1,7] <- length(stemltoken)
                temprow[1,8] <- strsplit(TempTextName,"_")[[1]][1]
                temprow[1,10] <- as.character(paste(shorttempvec,sep= " ",collapse=" "))
                temprow[1,11] <- as.character(paste(POStempvec,sep= " ",collapse=" "))
                temprow[1,12] <- format(Sys.time(), "%Y-%m-%d")
                temprow[1,13] <- ProfSportsIndex2[g]
                ProfSportsWordFlagmat2 <- rbind(ProfSportsWordFlagmat2,temprow)
              }
          }
        }
        print(paste0(i," out of ",length(files)," in corpus ",g," out of ",length(ProfSportslocations2))) #lets the user watch progress as the code runs for long searches
      }
}
      ProfSportsWordFlagmat2 <- ProfSportsWordFlagmat2[-1,] #removes the empty placeholder row
      ProfSportsWordFlagdf2 <- as.data.frame(ProfSportsWordFlagmat2)
      write.table(ProfSportsWordFlagdf2, ProfSportsWordFlagdfPath2)
ProfSportsWordFlagdf2

If you have a previously constructed dataset, you can obviously upload it using a script like this.

ProfSportsWordFlagdf2 <- read.table(ProfSportsWordFlagdfPath2)

Results

This time I got 341 flags, a much more substantial number. Here is a random sample of what that data looks like.

ProfSportsWordFlagdf2[sample(1:nrow(ProfSportsWordFlagdf2),5),c("Text","ProfSportsstemsearchedterm2","Short_KWIC")]
##                                Text ProfSportsstemsearchedterm2
## 1   187105-187111_Nature_Vol.04_v00                        golf
## 56  188611-188704_Nature_Vol.35_v00                       tenni
## 65  188705-188710_Nature_Vol.36_v00                        golf
## 36  188311-188404_Nature_Vol.29_v00                        golf
## 173 189505-189510_Nature_Vol.52_v00                        golf
##                                    Short_KWIC
## 1             the europe i golfe de naples it
## 56      the exception of tennis has little of
## 65         decl on meridian golf in theory as
## 36  is linenalgen des golfes von neapel table
## 173   in general and golf links in particular

With the data in hand, I could now ask some questions about the corpus.

Question 1: Do references to these sports in Victorian Professional Science Publications increase over the course of the century?

This question may be important in our interpretation of the data. It is always interesting to consider the historical arc of references to specific key words.

Script

We can visualize the years each of these references occurred in using the following script. Note that in order to find matches, the search terms were all transformed into a “stemmed” version of the word. “Football” became “footbal,” for instance, so that when searching the text it could flag both references to “football” (stemmed as “footbal”) and “footballs” (also stemmed as “footbal”).
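
As a quick aside, you can see this behavior by calling SnowballC's wordStem() function directly; the stems for “tennis” and “golf” below match the stemmed terms visible in the sample output above. (The plotting script itself follows.)

library(SnowballC)
wordStem(c("football", "footballs", "tennis", "golf")) #returns "footbal" "footbal" "tenni" "golf"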

library(ggplot2)
# Visualizing ProfSportsFreqdf BY DATE
      p <- ggplot(ProfSportsWordFlagdf2, aes(x = as.numeric(substr(Date,1,4))))
      pg <- geom_histogram(binwidth=5)
      pl <- p + pg + scale_x_continuous(limits = c(1800, 1900))  + labs(x = "Date", y = "Frequency / Decade", title = "Appearances of Sports within Victorian Professional Science Periodicals")+facet_wrap(~ProfSportsstemsearchedterm2)
      (pl)
[Figure: histogram of sport references by date, 1800-1900, faceted by search term]

Results

Almost all sports are mentioned more frequently at the end of the century. However, as this is just a simple measure of how many references were found per decade, this may just be reflective of the increase in the number of periodicals later in the century.

Question 2: What terms are most frequently associated with each of these sports?

Again, our central question was whether there are certain terms related to nationality or empire that are closely associated with these sports.

Script

I tested this using the script below, looking only at the top 20 correlated words for each sport. Note that the correlation is computed only within the Key Words in Context windows, so all terms shown had at least some correlation with the sport in question.

library(tm)
library(tokenizers)
library(SnowballC)
  CorrelationMin <- 0.1 #minimum correlation threshold passed to findAssocs below
  ProfSportsstemsearchedtermlist2 <- unique(wordStem(ProfSportssearchedtermlist2)) #stems the list of terms you want to search for.
  datacorpus <- Corpus(VectorSource(ProfSportsWordFlagdf2$KWIC), readerControl=list(reader = readPlain)) #treats each long KWIC window as its own document
  data.tdm <- TermDocumentMatrix(datacorpus,control = list(removePunctuation = TRUE, stopwords = FALSE, tolower = TRUE, stemming = TRUE, 
                                                           removeNumbers = TRUE, bounds = list(global= c(1,Inf))))
  for (q in 1:length(ProfSportsstemsearchedtermlist2)) {
    print(q) #prints loop progress
    assocdata <- findAssocs(data.tdm, ProfSportsstemsearchedtermlist2[q], CorrelationMin) #finds terms correlated with the search term across the KWIC documents
    tempdata <- as.data.frame(unlist(assocdata[[1]][1:20])) #keeps only the top 20 associated words
    keyword <- ProfSportsstemsearchedtermlist2[q]
    AssociatedWord <- rownames(tempdata)
    correlation <- tempdata[[1]]
    tempdata2 <- data.frame(correlation,keyword,AssociatedWord)
      p <- ggplot(tempdata2, aes(x = AssociatedWord, y=as.numeric(correlation)))
      pg <- geom_bar(stat="identity")
      pl <- p + pg + labs(x = "Associated Words", y = "Correlation", title = paste0("Words Associated with '",ProfSportsstemsearchedtermlist2[q], "' in Victorian Professional Science Periodicals"))+coord_flip()
      print(pl)
  }

Results

The results here did not end up being particularly elucidating about the relationship between these sports and the nation / Empire. Some correlations are what one might expect, such as the connection between “courtyard” and “tennis,” or “football” and “school.” Others are things I wouldn't have expected, but can rationalize, such as the connection between “tennis” and “gamble.” Others may be the result of errors introduced in the Optical Character Recognition process.

None of these results showed a correlation between these sports and the nation in the corpus. However, the association of “footbal” with “molecul” did strike me as being interesting. This led me to a productive investigation into the history of comparisons between molecular coarseness and sports balls. This is, I believe, the power of recreational reckoning as a methodology. It often illuminates lines of research one may not have otherwise seen.

Experiments in Computational Criticism #5: "Cricket bats" and "cricket balls" in Victorian Scientific Periodicals

The following (originally completed in May 2018) was part of a project to investigate the context for the sport of “cricket” for the Victorians. I was particularly interested to see whether cricket's associations with nationality and empire would be visible using distant readings of *Nature*, *Notices of the Proceedings at the Meetings of the Members of the Royal Institution*, *Philosophical Magazine*, *Proceedings at the Royal Society of Edinburgh*, *Proceedings at the Royal Society of London*, the *Reports of the BAAS*. I post this to demonstrate my typical workflow in conducting these experiments, even when they yield less than impressive results.

There were some challenges in this project at the outset. As I quickly discovered, you cannot simply search for “cricket,” as this creates too many false data points in a scientific data set. That is why my search focuses on the sport's equipment (e.g. “cricket bat”, “cricket ball”). Similarly, I found that my initial plan to add the sport of horse racing as a topic of interest was also too difficult, as “racing” led to too many false matches with discussion of “race” as a grouping of human beings.

Methodology: Recreational Reckoning

Experimental Question

I complete my distant readings of texts using packages others have developed in R. R can be a powerful tool for better understanding texts. It isn't always necessary to have a fully testable hypothesis in mind; visualizing texts can be a powerful tool for discovery, especially when you are willing to have fun exploring the many ways in which you can customize your analysis. On the other hand, because the data can be easily manipulated, one can easily fall into the trap of thinking one observes a feature in the text and then manipulating the text to draw out that feature. Fishing for information that supports a theory one already holds is a real problem in the field labelled by scholars such as those in the Stanford Literary Lab as “computational criticism.”

There are several principles that can be used to approach objective experimentation in automated text analysis, as discussed in Justin Grimmer and Brandon M. Stewart's “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts” (Political Analysis, 2013). Unlike the social sciences, however, the humanities more generally proceed not through testable and reproducible experiments, but through the development of ideas. Recreational computational criticism–what I call 'Recreational Reckoning'–therefore asks only that you choose one question that your analysis will answer. Questions such as: “Does Dickens's Bleak House include more masculine or feminine pronouns?”; “What topics are central to the Sherlock Holmes canon?”; “Do novel titles become longer or shorter over the course of the nineteenth century?” New features may become observable while pursuing this analysis. And it is up to the critic to theorize about what this newly visualized feature means. For this project, my question was whether I would find references to Britain or the British Empire closely associated with references to cricket.

Why R?

R isn't the only tool one can use for visualizing texts. However, I have found that R computational methods shine when you have texts that are either too long to read quickly, or too many texts to read quickly. They are also useful when you have a specific methodology in mind or prioritize customizability in the data mining or the visualization. For quick visualizations of things like word clouds, Voyant (https://voyant-tools.org) is probably a better choice.

Downloading R

The first step in using this methodology is obviously to download R. This can be done here (https://www.r-project.org). Users should also download RStudio, an environment which will make running the code easier. (If you are reading this in R/RStudio, then congratulations on already having started!)

Setting Directory

The first step in analyzing your data is choosing a workspace. I recommend creating a new folder for each project. This folder will be your working directory. The working directory in R is generally set via the “setwd()” command. However, here, we're going to be working within R Markdown Files (.Rmd). R Markdowns rely on a package called knitr, which generally requires the R Markdown to be stored in the location of your working directory. So I would recommend creating a new folder, and then downloading these R Markdown Files to the folder where you want to work. For example, you might create a folder called “data” on your computer desktop, in which case your working directory would be something like “C:/Users/Nick/Desktop/data”. You can check that your working directory is indeed in the right place by using the “getwd()” function below.

getwd()

Downloading Packages

The next step is to load in the packages that will be required. My methodology makes use of several packages, depending on what is required for the task. Rather than loading the libraries for each script, I generally find it more useful to install and initialize all the packages I will be using at once, even if I won’t be using all of these packages for a particular experiment.

Packages are initially installed with the “install.packages()” function. HOWEVER, THIS STEP ONLY HAS TO BE COMPLETED ONCE.

“ggmap” is a package for visualizing location data.

“ggplot2” is a package for data visualizations. More information can be found here (https://cran.r-project.org/web/packages/ggplot2/index.html).

“pdftools” is a package for reading pdfs. In the past, you had to download a separate pdf reader, and it was a real pain. You, reader, are living in a golden age. Information on the package can be found here (https://cran.r-project.org/web/packages/pdftools/pdftools.pdf).

“plotly” is a package for creating interactive plots.

“quanteda” is a package by Ken Benoit for the quantitative analysis of texts. More information can be found here (https://cran.r-project.org/web/packages/quanteda/quanteda.pdf). quanteda has a great vignette to help you get started, and there are also exercises available on the package website (https://quanteda.io).

“readr” is a package for reading in certain types of data. More information can be found here (https://cran.r-project.org/web/packages/readr/readr.pdf).

“SnowballC” is a package for stemming words (lemmatizing words, or basically cutting the ends off words as a way of lowering the dimensions of the data. For instance, “working”, “worked”, and “works” all become “work”).
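
“stm” is a package for estimating structural topic models. It is installed and loaded below alongside the others, though it goes unused in this particular experiment.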

“tm” is a simple package for text mining. An introduction to the package can be found here (https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf).

“tokenizers” is a package which turns a text into a character vector. An introduction to the package can be found here (https://cran.r-project.org/web/packages/tokenizers/vignettes/introduction-to-tokenizers.html).

install.packages("ggmap")
install.packages("ggplot2")
install.packages("pdftools")
install.packages("plotly")
install.packages("quanteda")
install.packages("readr")
install.packages("SnowballC")
install.packages("stm")
install.packages("tm")
install.packages("tokenizers")

Loading Libraries

The next step is to load the libraries for these packages into your environment, which is accomplished with the “library()” function.

library(ggmap)
library(ggplot2)
library(quanteda)
library(pdftools)
library(plotly)
library(readr)
library(SnowballC)
library(stm)
library(tm)
library(tokenizers)

A Note About Citation

Most of these software packages are written by academics. Reliable and easy-to-use software is difficult to make. If you use these packages in your published work, please cite them. In R you can even see how the authors would like to be cited (and get a BibTeX entry).

citation("ggplot2")
citation("quanteda")
citation("pdftools")
citation("plotly")
citation("readr")
citation("SnowballC")
citation("stm")
citation("tm")
citation("tokenizers")

Uploading Data and Setting Variables

I had already acquired .txt volumes of these texts, so I simply needed to upload the data. There are also various parameters that I might find useful later that need to be defined. The basic methodology is that I am going to construct a script that will go through each word in the .txt files and try to match it against a list of search terms. I chose to look for references to cricket bats and balls, golf balls and clubs, and tennis balls and rackets. However, it is often helpful to make sure you know the words which occur around the referenced term, to provide context. The “conlength” variables provide three different sizes of “windows” for this purpose. For instance, “ProfSportsshortconlength” is set to three, meaning the final dataset will have a column showing the three words to either side of the matched term.

    templocation <- paste0(getwd(),"/Documents") #root folder containing the corpus subfolders
    ProfSportslocations <- c(paste0(templocation,"/Nature/Volumes"),paste0(templocation,"/Philosophical-Magazine/Volumes"),paste0(templocation,"/Reports-of-the-BAAS/Reports"),paste0(templocation,"/Royal-Institution/Proceedings"),paste0(templocation,"/Royal-Society-of-Edinburgh/Proceedings"), paste0(templocation,"/RSL/Proceedings")) #folders holding the .txt volumes for each periodical
    ProfSportsIndex <- c("Nature","Philosophical-Magazine","BAAS","Royal-Institution","RSE","RSL") #short labels for each corpus, stored in the final data frame
    ProfSportslongconlength <- 250 #window of 250 words to either side of a match
    ProfSportsshortconlength <- 3 #window of 3 words to either side of a match
    ProfSportsPOSconlength <- 10 #window of 10 words to either side of a match
    ProfSportssearchedtermlist <- c("cricket ball","cricket bat", "golf ball","golf club", "tennis ball","tennis racket") #two-word phrases to search for
    ProfSportsoutputlocation <- paste0(getwd(),"/WordFlagDataFrames") #folder where the results will be saved
    ProfSportsWordFlagdfPath <- paste0(ProfSportsoutputlocation,"/","ProfSportsWordFlagdf.txt") #file path for the saved data frame

To create the data frame compiling every reference to a term, run the following script. Be aware that this takes quite a while. So if you already have a data set that you just need to upload, see below instead.

Running the Script, or Uploading Previous Data

if(!file.exists(ProfSportsoutputlocation)) {dir.create(ProfSportsoutputlocation)} #creates the output folder for the results if it does not already exist
ProfSportsstemsearchedtermlist <- unique(wordStem(ProfSportssearchedtermlist)) #stems the list of terms you want to search for.
ProfSportsWordFlagmat <- matrix(,ncol=13,nrow=1) #empty placeholder matrix; one row per match will be bound on below.
for (g in 1:length(ProfSportslocations)) {
      tempdocloc <- ProfSportslocations[g]
      files <- list.files(path = tempdocloc, pattern = "txt", full.names = TRUE) #creates vector of txt file names.

      for (i in 1:length(files)) {
        fileName <- read_file(files[i])
        Encoding(fileName) <- "UTF-8"  #since the tokenize_words function requires things to be encoded in UTF-8, need to remove some invalid characters.
        fileName <- iconv(fileName, "UTF-8", "UTF-8",sub='')
        ltoken <- tokenize_words(fileName, lowercase = TRUE, stopwords = NULL, simplify = FALSE)
        ltoken <- unlist(ltoken)
        stemltoken <- wordStem(ltoken) #this uses the Snowball library to lemmatize the entire text.
        textID <- i
        for (p in 1:length(ProfSportsstemsearchedtermlist)) {
          ProfSportsstemsearchedterm <- ProfSportsstemsearchedtermlist[p]
          for (j in 1:length(stemltoken)) {
              if (ProfSportsstemsearchedterm == paste0(stemltoken[j]," ",stemltoken[j+1])) { #matches a two-word phrase by pairing each stemmed token with the one that follows it
                if (j <= ProfSportslongconlength) {longtempvec <- ltoken[(1:(j+ProfSportslongconlength))]}
                if (j > ProfSportslongconlength) {longtempvec <- ltoken[(j-ProfSportslongconlength):(j+ProfSportslongconlength)]}
                if (j <= ProfSportsshortconlength) {shorttempvec <- ltoken[(1:(j+ProfSportsshortconlength))]}
                if (j > ProfSportsshortconlength) {shorttempvec <- ltoken[(j-ProfSportsshortconlength):(j+ProfSportsshortconlength)]}
                if (j <= ProfSportsPOSconlength) {POStempvec <- ltoken[(1:(j+ProfSportsPOSconlength))]}
                if (j > ProfSportsPOSconlength) {POStempvec <- ltoken[(j-ProfSportsPOSconlength):(j+ProfSportsPOSconlength)]}
                TempTextName <- gsub(paste0(ProfSportslocations[g],"/"),"",files[i]) #This grabs just the end of the file path.
                TempTextName <- gsub(".txt","",TempTextName) #This removes the .txt from the end of the name.
                temprow <- matrix(,ncol=13,nrow=1)
                colnames(temprow) <- c("Text", "Text_ID", "ProfSportsstemsearchedterm","Lemma","Lemma_Perc","KWIC","Total_Lemma","Date","Category","Short_KWIC","POS_KWIC","Current_Date","Corpus")
                temprow[1,1] <- TempTextName
                temprow[1,2] <- textID
                temprow[1,3] <- ProfSportsstemsearchedterm
                temprow[1,4] <- j
                temprow[1,5] <- (j/length(stemltoken))*100
                temprow[1,6] <- as.character(paste(longtempvec,sep= " ",collapse=" "))
                temprow[1,7] <- length(stemltoken)
                temprow[1,8] <- strsplit(TempTextName,"_")[[1]][1]
                temprow[1,10] <- as.character(paste(shorttempvec,sep= " ",collapse=" "))
                temprow[1,11] <- as.character(paste(POStempvec,sep= " ",collapse=" "))
                temprow[1,12] <- format(Sys.time(), "%Y-%m-%d")
                temprow[1,13] <- ProfSportsIndex[g]
                ProfSportsWordFlagmat <- rbind(ProfSportsWordFlagmat,temprow)
              }
          }
        }
        print(paste0(i," out of ",length(files)," in corpus ",g," out of ",length(ProfSportslocations))) #lets the user watch progress as the code runs for long searches
      }
}
      ProfSportsWordFlagmat <- ProfSportsWordFlagmat[-1,] #removes the empty placeholder row
      ProfSportsWordFlagdf <- as.data.frame(ProfSportsWordFlagmat)
      write.table(ProfSportsWordFlagdf, ProfSportsWordFlagdfPath)
ProfSportsWordFlagdf

If you have a previously constructed dataset, you can obviously upload it using a script like this.

ProfSportsWordFlagdf <- read.table(ProfSportsWordFlagdfPath)

Results

An abbreviated version of the results looks like this:

ProfSportsWordFlagdf[,c("Text","ProfSportsstemsearchedterm","Short_KWIC")]
##                                                                       Text
## 1                                          187311-187404_Nature_Vol.09_v00
## 2                                          188305-188310_Nature_Vol.28_v00
## 3                                          188605-188610_Nature_Vol.34_v00
## 4                                          189211-189304_Nature_Vol.47_v00
## 5                                          189511-189604_Nature_Vol.53_v00
## 6                    185507-185512_Philosophical-Magazine_Ser.4_Vol.10_v00
## 7                    185507-185512_Philosophical-Magazine_Ser.4_Vol.10_v00
## 8                    189201-189206_Philosophical-Magazine_Ser.5_Vol.33_v00
## 9   189911-190107_Proceedings-of-the-Royal-Society-of-Edinburgh_Vol.23_v00
## 10 18540223-18551220_Proceedings-of-the-Royal-Society-of-London_Vol.07_v00
## 11 18540223-18551220_Proceedings-of-the-Royal-Society-of-London_Vol.07_v00
## 12 18991130-19000614_Proceedings-of-the-Royal-Society-of-London_Vol.66_v00
##    ProfSportsstemsearchedterm
## 1                 cricket bat
## 2                   golf club
## 3                   golf club
## 4                   golf club
## 5                 cricket bat
## 6                 cricket bat
## 7                 cricket bat
## 8                 cricket bat
## 9                   golf club
## 10                cricket bat
## 11                cricket bat
## 12                  golf club
##                                        Short_KWIC
## 1        were a molecular cricket bat and suppose
## 2    their foothill club golf club gymnastic club
## 3  when working at golf club felixstowe september
## 4                  room in the golf club house at
## 5                to place a cricket bat in stones
## 6         intending to make cricket bats out each
## 7               the pods his cricket bats but not
## 8        were a molecular cricket bat and suppose
## 9        resembles a miniature golf club the head
## 10          intending to make cricket bats out of
## 11              the pods his cricket bats but not
## 12         the mid surrey golf club arrange ments

The end result was somewhat disappointing. There are only twelve references to these phrases in all of the professional science corpus I've assembled. I determined this to be too little data to make any meaningful conclusions. But that's how things often turn out in Recreational Reckoning experiments.

Experiments in Computational Criticism #4: Visualizing VLC Keywords

Here are some data visualizations I’ve wanted to make since @VLCjournal released its issue on “Keywords.”

Editors Danny Hack and Rachel Ablow emphasized that these keywords were not meant to capture the precise contours of the field. Nonetheless, I wondered what the field would look like if one did visualize the articles in this issue.

I also wondered in what sense these terms were “keywords.” Were they “keywords” in the sense that they were explanatory words of particular significance, or key in the sense that they indicated or represented the content of a larger text or set of texts: in this case, critical work on Victorian literature and culture (www.oed.com/view/Entry/312961)?

It struck me that a good measure of this issue might be the co-occurrence of these terms throughout the journal articles (using Ken Benoit’s quanteda package (https://quanteda.io/)). If one suggests that “empire” is key to contemporary work on Victorian literature and culture, then one would expect to find that articles on another keyword, such as “science,” might also mention “empire.” Visualizing a network of these co-occurrences would depict in some fashion the contours of the field as viewed through the Keywords issue, and allow one to measure, in some form, whether certain words are currently more “key” than others.

If one uses the quanteda package (https://quanteda.io/) to look at the data knowing what the listed keywords are, the contours of the field might look like this. In the network below I have, for clarity, only plotted the keywords which co-occur more than 10 times. In other words, each edge signals that the two vertices it connects appear together within more than 10 of the entries in the Keywords issue. As we can see, there is a somewhat dense network on the right of keywords which did frequently co-occur, such as “literature,” “work,” “reading,” and “politics.” On the left, however, we have those terms which may be key, but which seem to be less central to the field, given that they are referenced less frequently in other articles: terms like “Anthropocene” and “child.” Much of this seems to align with what I intuit about the shape of critical work on Victorian literature and culture. “Form,” “reading,” “literature,” and “politics” certainly seem to be central concerns.

[Figure: co-occurrence network of the issue's listed keywords]
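
The code behind these networks is not reproduced here, but a minimal sketch of the general approach with quanteda might look like the following, assuming the issue's articles have already been read into a character vector called vlctexts and the published keyword list into a vector called vlckeywords (both hypothetical names). Note that in current versions of quanteda the network plot lives in the companion package quanteda.textplots.

library(quanteda)
library(quanteda.textplots)
vlctoks <- tokens(corpus(vlctexts), remove_punct = TRUE) #one tokens object per article
vlctoks <- tokens_tolower(vlctoks)
vlctoks <- tokens_keep(vlctoks, pattern = vlckeywords) #keeps only the issue's listed keywords
vlcfcm <- fcm(vlctoks, context = "document", count = "boolean") #counts, for each pair, the number of entries in which both keywords appear
textplot_network(vlcfcm, min_freq = 11) #plots only pairs co-occurring in more than 10 entries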


However, this is only what the network looks like if we only look at the words authors said were key. If we use quanteda to determine the top features which co-occur across all the articles (editing the data somewhat to remove common terms: pronouns, articles, etc.), the result looks like the network below. I have, for clarity, only plotted the top features which co-occur more than 55 times. As we can see, this is a much more conventional view of work on Victorian literature and culture, centered on words like “Victorian,” “Nineteenth-Century,” “history,” “novel,” etc. Very few of the Keywords provided by the journal appear, even though each of the articles is centered on these keywords.


The data I have visualized here is not meant to be a definitive analysis of the Keywords issue or of modern scholarship on Victorian literature and culture. I have done relatively little work cleaning these text files, for instance, meaning that there certainly could be textual features biasing these visual representations. Nonetheless, there are two lessons I think we can draw from these visualizations.

1. The editors are right to insist that the keywords they have provided are not meant to provide an image of the contours of the field. Indeed, we see from a comparison of the two images that even the contours of this Keywords issue can take different forms when approached in different ways.

2. The differences between these visualizations may be a sign that the keywords are functioning exactly as intended, providing a key to work in the field that allows us to view the field in new ways. We all know that academic work on Victorian literature and culture prioritizes literature and history. The importance of work on less obvious issues such as the Victorian child or on ecology can easily be overshadowed. The two figures above demonstrate that we still need scholars (such as the contributors to the “Keywords” issue) to reflect on the topics they find especially generative, in order to draw attention to new aspects of the field which might later become central.

Experiments in Computational Criticism #3: Charles Dickens and International Cat Day 2018

In honor of International Cat Day 2018 and Charles Dickens’s cat “Bob,” I spent some time today looking into references to cats in Charles Dickens’s novels. Despite Dickens’s presence on today’s definitive Mental Floss list of “11 Writers Who Really Loved Cats” (Sean Hutchinson, August 8, 2018, http://mentalfloss.com/article/49302/11-writers-who-really-loved-cats), a quick search of Dickens’s novels suggests that in his writing, Dickens was much more interested in dogs than in cats (for more on the making of these visualizations, see https://github.com/AnoffCobblah/DickensCats.)
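
The scripts behind these figures are in the GitHub repository linked above; as a rough sketch of the kind of counting involved (not the repository's actual code), assuming a novel has been saved as a plain-text file with the hypothetical name "OliverTwist.txt":

library(readr)
library(tokenizers)
library(SnowballC)
noveltext <- read_file("OliverTwist.txt") #hypothetical file name; substitute your own copy of the novel
ltoken <- unlist(tokenize_words(noveltext, lowercase = TRUE)) #splits the novel into lowercased word tokens
stemltoken <- wordStem(ltoken) #stems each token so that singular and plural forms match
sum(stemltoken == "cat") #counts occurrences of "cat" and "cats"
sum(stemltoken == "dog") #counts occurrences of "dog" and "dogs"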

[Figure: frequency of references to cats and dogs in each of Dickens's novels]

This is particularly true of Oliver Twist, in which references to dogs outnumber references to cats eighty-seven to two. References to cats come closest to the number of references to dogs in Edwin Drood, although the number of references is so paltry that this hardly seems impressive. Therefore, if there is a Dickens novel for cat lovers, data visualization would suggest that readers can do no better than Bleak House, in which cats are referenced almost as frequently as dogs and are referenced enough to suggest their importance to the story.

[Figure: difference between the number of dog and cat references in each of Dickens's novels]

This was also reflected in a word cloud I generated of words which frequently appeared next to references to cats and dogs in Dickens’s novelistic corpus. When generating this word cloud, Guppy, Tulkinghorn, and Jarndyce typically appeared on the “Cat-References” side, while Oliver [Twist] and Sikes frequently appeared on the “Dog-References” side.

[Figure: comparison word cloud of words appearing near cat and dog references in Dickens's novels]

However, one would be hard pressed to say that Lady Jane of Bleak House, apparently the most visible cat in Dickens’s novels, serves as a celebration of cats. As Natalie McKnight has pointed out, Lady Jane, like other cat-like characters in Dickens’s writing, is somewhat sinister (Natalie McKnight, “Dickens and Darwin: A Rhetoric of Pets,” The Dickensian 102.469 (Summer 2006): 131-141). Krook implies that Lady Jane is perfectly capable of tearing a person to shreds (Charles Dickens, Bleak House, Penguin, 2003, p. 70). And as Robert E. Lougy points out, Lady Jane’s name is also a slang reference to female genitals (Robert E. Lougy, “Filth, Liminality, and Abjection in Charles Dickens’s Bleak House,” ELH 69.2 (Summer 2002): 473-500).

“A large grey cat leaped from some neighbouring shelf.” Frontispiece to the first volume of Dickens's Bleak House, in the Sheldon & Co. (New York) Household Edition (1861-71). Scanned image and text by Philip V. Allingham from his own collection. Available from Victorian Web: http://www.victorianweb.org/art/illustration/gilbert/15.html.

Perhaps in his personal life Dickens was fonder of cats than his novels might outwardly suggest. In My Father, as I Recall Him (https://www.gutenberg.org/files/27234/27234-h/27234-h.htm), Mary “Mamie” Dickens recalls her father’s fondness for their kittens, especially the deaf kitten whose paw would later be preserved as the taxidermized handle of a letter opener, engraved “C.D. In Memory of Bob 1862” (Alexis Coe, “How Charles Dickens Kept a Beloved Cat Alive,” Slate 18 Dec. 2012. http://www.slate.com/blogs/the_vault/2012/12/18/charles_dickens_cat_the_author_kept_the_pet_alive_through_taxidermy.htm).

However, despite Lady Jane’s violent nature, there is also, just possibly, something almost nice about her relationship with Krook. Krook bought Lady Jane to skin her, but apparently became fond of her (a fact which seems to surprise even him) (Dickens, Bleak House, p. 70). While both are grotesque, the relationship between Krook and Lady Jane is arguably the thing which most humanizes Krook. That Krook hid his dead lodger’s papers behind Lady Jane’s bed suggests practicality, but also a form of trust (Dickens, Bleak House, p. 824). Lady Jane even allows Krook to linger in the story after he dies, since, as Mr. Guppy notes, it “almost looks as if she was Krook” (Dickens, Bleak House, p. 635). It is hard to imagine loving a cat like Lady Jane. But I would argue that the fact that Krook, in his own way, seems to have done so can be read as evidence that there is apparently a cat for everybody, even people like Krook. So on International Cat Day, we should all pull out our copies of Bleak House and spend some time with these two best friends.

Furniss, Harry. “Mr. Krook and His Cat.” 1910. Dickens's Bleak House, Vol. 11 of Charles Dickens Library Edition, for Chapter 5, "A Morning Adventure," facing p. 64. Scanned image and text by Philip V. Allingham. Available from Victorian Web: http://www.victorianweb.org/art/illustration/furniss/170.html

Science Is Lit: A Statement of Intent

 In titling this blog "Science Is Lit," I hoped to foreground a statement of intent, riding the coattails of the 2010s slang term "lit," meaning "fun," "cool," or "exciting": a term which officially became uncool roughly two weeks after I had the idea for this title, when Donald Trump Jr. publicly used it to describe the retirement of Justice Anthony Kennedy. This is a blog dedicated to discussing science--particularly Victorian science--as something interesting and exciting.

At the same time, the title was a provocation, meant to summon the specter of straw-man postmodernism, in which everything, including science, is a text to be read or misread. One of the things I explore in this blog is the many overlaps between what C. P. Snow famously identified as the "Two Cultures." As my fellow scholars in the fields of "Literature and Science" and "Science, Technology, and Society" are well aware, the thing we call science has long been either vexingly or felicitously textual, depending on whom you ask. When I say "textual" here, I mean it in its narrowest sense: in the lab notebook, in scientific correspondence, in the scientific journal and book, science exists on the page or the electronic simulacra of the page. One can, I suppose, imagine a version of science which might be otherwise. Scientists could travel from lab to lab, convincing their fellows through oratory and in-person confirmation of experimental results. They could share video footage of their experiments. But since Boyle's invention of "virtual witnessing," science has depended on "literary technology" (Steven Shapin and Simon Schaffer, Leviathan and the Air-Pump, Princeton UP, 1985). As Bruno Latour famously describes, from the outside scientists in the lab are "a strange tribe who spend the greatest part of their day coding, marking, altering, correcting, reading, and writing" (Bruno Latour and Steve Woolgar, Laboratory Life, Princeton UP, p. 49).

This blog does not seek to treat science as if it were literature, but it does seek to draw attention to the many ways in which science is literary, by investigating how the concept of science, as well as scientific theories and scientific identities, are stabilized through discussions of science and scientific ideas in various texts. "Science is lit" is not meant to be a sweeping, ahistorical claim, or a manifesto, or a call to action. It is, for me, a somewhat banal statement of fact, enjoyable in large part because the obviousness of the claim pairs well with the precipitate triteness of "lit." It's a title designed to prevent readers from taking these entries too seriously. This is because, in my opinion, recognizing the "lit-ness" of science and having fun should be two activities which go hand in hand. So enjoy.

Experiments in Computational Criticism #2: Victorian Visualizations and the Dangers of Google N-Gram Viewer

**This post was originally written on December 2, 2016.**

This is a quick lesson on the importance of being careful with the terms one uses when working with Google Ngram Viewer as a research tool.

Initially, one might have a hypothesis: the idea of science being “fun” (in the English-speaking world) emerges in the twentieth century. A quick search of the phrase “science is fun” in the Ngram Viewer would seem to support this claim[ii]:

[Figure: Ngram Viewer results for “science is fun,” 1800-2000]

The phrase appears to become prominent only after 1940. But looking at Google Ngram Viewer's (ridiculously confusing and unlabeled) axis, we learn that at most, the trigram “science is fun” only occurs in 0.000000180% of all the trigrams in the Google Books corpus. This seems like a very low amount of data. But we can verify it with some quick searches through databases such as Google Books and HathiTrust. Examples that I could find prior to 1900 of the phrase “science is fun” were generally miscategorizations: for instance, the phrase “science is fundamental.” So one might be willing to make a weak claim that the phrase does become popular only in the twentieth century.

But wait. Is it possible that the concept of science being “fun” existed prior to the twentieth century, but instead used different rhetoric? Of course it is. For instance, if we search for “fun,” we find that the term seems to have been less popular in the nineteenth century[i] (this is very much in line with the work of historians of play, who note that industrialization and new concepts of work and leisure time were accompanied by an increase in the number of recreational activities and the rise of the cult of sports):

[Figure: Ngram Viewer results for “fun,” 1800-2000]

This is where we also need to keep in mind the problems with Optical Character Recognition (OCR) software: if you go into the Google Books corpus, you will find that many of the results which were used as data for the apparent popularity of “fun” from 1800-1820 were actually references to “sun.” The OCR has simply confused “f” for the “long s,” “ſ.” So a different term, for instance “pleasure” or “enjoyment,” might be more appropriate. Similarly, the concept of “science” as a field is relatively new as well; previously “natural philosophy” was a more common label.

To keep an eye on these issues, it is often helpful to construct a chart like the one below, which allows the researcher to more carefully investigate each of the terms of interest. Note that I have used the “+” operator to combine the various capitalizations of each term into one line.
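
For example, the case-combined query behind the “pleasure” line (note [v] below) is “pleasure+Pleasure+PLEASURE”, which sums the counts of all three capitalizations into a single trend line.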

Note that my search terms are not absolutely symmetrical. “Science is pleasurable,” for instance, yielded no Ngram results, nor did “fun of natural philosophy.” So I changed my search terms slightly where applicable. “Enjoyment(s) of Science” turns out to be too sporadic to draw any conclusions. But the fact that “Pleasure(s) of Science” declines in the second half of the nineteenth century, while “Science is Fun” increases in the twentieth century, should be a reason to doubt our initial hypothesis. The concept of science being “fun” almost certainly is not a twentieth-century invention, although the change in rhetoric deserves more attention, given the different connotations of “fun” and “pleasure.” More interesting, however, is the pervasive absence of a connection between “Natural Philosophy” and “fun” or “pleasure,” which suggests that perhaps the shift away from the term “natural philosophy” towards the more frequent use of the term “science” is also related to the changing relationship between science and feelings of pleasure.

So while it is hard to draw strong conclusions from the Google Ngram Viewer, given the limitations of its OCR, the difficulty of compiling and searching for all relevant terms, and the fact that its archive is restricted to what is available in Google Books, it can be a useful tool for testing initial hypotheses and for developing ideas for further research.


[i] https://books.google.com/ngrams/graph?content=fun&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cfun%3B%2Cc0

[ii] https://books.google.com/ngrams/graph?content=science+is+fun&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cscience%20is%20fun%3B%2Cc0

[iii] https://books.google.com/ngrams/graph?content=science%2BScience%2BSCIENCE&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28science%20%2B%20Science%20%2B%20SCIENCE%29%3B%2Cc0

[iv] https://books.google.com/ngrams/graph?content=NATURAL+PHILOSOPHY%2Bnatural+philosophy%2BNatural+philosophy+%2B+Natural+Philosophy&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28NATURAL%20PHILOSOPHY%20%2B%20natural%20philosophy%20%2B%20Natural%20philosophy%20%2B%20Natural%20Philosophy%29%3B%2Cc0

[v] https://books.google.com/ngrams/graph?content=pleasure%2BPleasure%2BPLEASURE&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28pleasure%20%2B%20Pleasure%20%2B%20PLEASURE%29%3B%2Cc0

[vi] https://books.google.com/ngrams/graph?content=pleasure+of+science+%2B+Pleasure+of+science+%2B+Pleasure+of+Science+%2B+PLEASURE+OF+SCIENCE+%2B+pleasures+of+science+%2B+Pleasures+of+science+%2B+Pleasures+of+Science+%2B+PLEASURES+OF+SCIENCE&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28pleasure%20of%20science%20%2B%20Pleasure%20of%20science%20%2B%20Pleasure%20of%20Science%20%2B%20PLEASURE%20OF%20SCIENCE%20%2B%20pleasures%20of%20science%20%2B%20Pleasures%20of%20science%20%2B%20Pleasures%20of%20Science%20%2B%20PLEASURES%20OF%20SCIENCE%29%3B%2Cc0

[vii] https://books.google.com/ngrams/graph?content=fun%2BFun%2BFUN&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28fun%20%2B%20Fun%20%2B%20FUN%29%3B%2Cc0

[viii] https://books.google.com/ngrams/graph?content=science+is+fun+%2B+Science+is+fun+%2B+Science+is+Fun+%2B+SCIENCE+IS+FUN&case_insensitive=on&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28science%20is%20fun%20%2B%20Science%20is%20fun%20%2B%20Science%20is%20Fun%20%2B%20SCIENCE%20IS%20FUN%29%3B%2Cc0

[ix] https://books.google.com/ngrams/graph?content=Enjoyment%2Benjoyment%2BENJOYMENT&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28Enjoyment%20%2B%20enjoyment%20%2B%20ENJOYMENT%29%3B%2Cc0

[x] https://books.google.com/ngrams/graph?content=(enjoyment+of+science)+%2B+(Enjoyment+of+science)+%2B+(Enjoyment+of+Science)+%2B+(ENJOYMENT+OF+SCIENCE)+%2B+enjoyments+of+science+%2B+Enjoyments+of+science+%2B+Enjoyments+of+Science+%2B+ENJOYMENTS+OF+SCIENCE&case_insensitive=on&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2C%28%28enjoyment%20of%20science%29%20%2B%20%28Enjoyment%20of%20science%29%20%2B%20%28Enjoyment%20of%20Science%29%20%2B%20%28ENJOYMENT%20OF%20SCIENCE%29%20%2B%20enjoyments%20of%20science%20%2B%20Enjoyments%20of%20science%20%2B%20Enjoyments%20of%20Science%20%2B%20ENJOYMENTS%20OF%20SCIENCE%29%3B%2Cc0

Experiments in Computational Criticism #1: The Rise of Darwin in Victorian Science

**This post was originally written in February of 2017**

I spent last night constructing a series of data visualizations which demonstrate why Darwin is probably the most famous Victorian scientist today. Typically, Darwin’s Victorian fame is evidenced by his existence as a cultural phenomenon (for instance, satires of his theory in Punch and other magazines), and by the praises other Victorian scientists laid at his feet. I constructed a series of data visualizations to point out that Darwin’s importance is also demonstrated by the fact that as the nineteenth century progressed, scientists referred to Darwin more and more frequently in publications intended for other scientists, even after Darwin’s death.

I took as my corpus the annual reports of the meetings of the British Association for the Advancement of Science (BAAS) between 1834 and 1900. My visualization illustrates the normalized frequency of references to scientists’ last names across this corpus: for instance, a value of 0.015 means that of all the lemmas published in the Report of the […] Meeting of the British Association for the Advancement of Science that year, 0.015% were that last name. Most scientists, like Huxley, Lyell, and Tyndall, had moments of popularity, and then were referenced less as the century progressed. Lyell and Tyndall were mentioned most in the 1860s, and Huxley in the 1870s. But interest in Darwin trended upward right up until the end of the century.
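
To make the normalization concrete: a minimal sketch, assuming ltoken is the vector of lowercased word tokens for one year's Report (as in the flagging scripts in the posts above):

(sum(ltoken == "darwin") / length(ltoken)) * 100 #percentage of all tokens in that Report which are "darwin"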

“Darwin” vs “Lyell”

“Darwin” vs “Tyndall”

“Darwin” vs “Huxley”

However, the real test of Darwin’s fame must be to compare him to that other paragon of British science, Isaac Newton.  Despite the fact that Darwin has the advantage of novelty, Newton is still mentioned more frequently in the publications of the BAAS.  However, the two show very similar upward trends over the course of the century.

“Darwin” vs “Newton”

My Victorian comparisons to Darwin–Tyndall, Lyell, and Huxley–all worked in similar fields (to the extent that those fields existed as distinct entities in the nineteenth century).  Is there a “hard science” equivalent to Darwin’s centrality to British science?  James Clerk Maxwell is a good option.  As the century drew to a close, Maxwell too saw a posthumous fame within the reports of the BAAS, likely due to late-nineteenth and early-twentieth century reconfigurations of Maxwell’s work on electricity and magnetism.

“Darwin” vs “Maxwell”

That Darwin was referenced more frequently by scientists as the century progressed comes as no surprise. As I noted, historians of science have long been making analogous claims with other forms of evidence. However, this method of looking at the publications of the BAAS for clues about which figures were most important to Victorian science can also result in some surprises. For instance, given the debate surrounding Tyndall’s materialist arguments when he was president of the BAAS, one might expect that Tyndall was a central figure in the 1860s and 1870s. However, searching this corpus suggests that Huxley was actually far more visible in the British Association during those years.

“Huxley” vs “Tyndall”

The Era of Progress, the Death of Literature

**This post was originally written July 16, 2018.**

Today I want to talk briefly about this cartoon: “The Era of Progress in Children’s Literature” (Puck, Volume 21, 1887) by Frederick Burr Opper.

[Figure: Frederick Burr Opper, “The Era of Progress in Children’s Literature”]

In the preface to his popular science text Madam How and Lady Why (Bell and Daldy, 1870), novelist Charles Kingsley (1819-1875) presents the blending of amusement and instruction as a defining feature of nineteenth-century children’s literature: “When I was your age, there were no such children’s books as there are now. […] you have your choice of books without number, clear, amusing, and pretty, as well as really instructive […]” (vii). But not everyone saw the children’s literature market in this way. Some, such as Samuel Smiles, derided the “sort of mania for ‘making things pleasant’ on the road to knowledge” (Self-Help, Ward, Lock, & Co., [1859], 302) as a strategy which would weaken the student. Others saw science and the fancifulness of other children’s literature as such intrinsically opposed concepts that fostering scientific learning through children’s literature could only be accomplished through the diminishment of fairy tales and fantasy stories. As one learns in Dickens’s Hard Times (1854), in a world of “Fact, fact, fact,” “you are never to fancy.”

At first, this image seems to be squarely in the latter camp. The characters of fictional children’s stories lament that their future has been replaced by the scientific texts on the floor. Because they are given human form and the only piece of dialogue, our sympathy ostensibly lies with the fictional characters. Little Willie, Johnny, and Tommy, the supposed main characters of some of the science books, are not given form to defend themselves. It seems wrong that the boy is already learning how to be a stock broker.

But what I find interesting about this cartoon is that since the audience is clearly expected to be familiar with these staples of nineteenth-century children’s literature, there is some dramatic irony in the characters coming together. A weeping Red Riding Hood has her arm around the wolf that eats her grandmother. We are asked to feel sympathy for Struwwelpeter, whose entire character is that his hygiene repulses people, and Bluebeard, a multiple murderer. It’s hard to feel sorry for these particular personifications of the stories losing their “future.”

The scientific texts are also an odd assortment. Many are the sorts of titles which were given to many nineteenth-century children’s books: “Science for Little Readers,” “The Boy Inventor,” “The Boy Astronomers,” and “Youthful Geologists.” But other titles eschew verisimilitude in favor of parody. “Logarithms for Little Ones” is humorous because it suggests an attempt to teach a level of mathematics that would have been out of reach of even nineteenth-century children. “How Johnny Bought a Farm for $4.50” suggests an absurd level of precociousness for a child. And “How Little Willie Discovered Perpetual Motion” suggests that the line between the fantastical characters on the left and the books on the right is not as well defined as the composition of the cartoon suggests. One must imagine that “Tommy’s Adventures in Search of the North Pole” are probably just as adventure-filled as Robinson Crusoe’s.

Rather than criticizing the new emphasis on science in children’s literature, the cartoon ultimately reads to me as a critique of the critique: the “Old Favorites” are those who see science and traditional children’s literature as being opposed and believe the more fantastical stories are being replaced. In reality, children’s books of the type suggested by “Tommy’s Adventures in Search of the North Pole”–such as Andre’s “Cruise of the Walnut Shell” (1881)–were often inspired by the fairy tales, fantasies, and adventure stories which came before them. As always, the division between the scientific and the more traditionally literary is false.