POST: The Data Sitters Club #19: Shelley and the Bad Corpus

DSC #19 of The Data Sitters Club, a project that applies “digital humanities computational text analysis tools and methods” to a popular book series from the 1990s, looks at the corpus of works that make up the collection. The author of this chapter, “Shelley and the Bad Corpus,” Quinn Dombrowski (Stanford University), worked with Prof. Shelley Staples (University of Arizona), a corpus linguist, to look more closely at what constitutes a complete corpus. For this project, the items in the corpus were easy to identify, because it consists of a complete series with a finite number of installments, but that isn’t always the case. The author also looks to pizza to “illustrate the consequences of corpus choice and set up a discussion about the claims we can make.”

dh+lib Review

This post was produced through a cooperation between Hillary Richardson and Linsey Ford (Editors for the week), and Claudia Berger, Nickoal Eichmann-Kalwara, Pamella Lach, John Russell, and Rachel Starry (dh+lib Review Editors).