Thomas Padilla (Michigan State University) has written a post addressing a central concern in text analysis projects: how do you get the data you want, and how do you make it usable?
… more experienced Digital Humanists often have programmatic means of getting data and transforming it in such a way that it suits their needs. These means are not inaccessible to beginners but the path from DH interest to DH exploration is sometimes better wended via a route that poses the least resistance. In what follows, I will describe a method that kludges together a couple of different easy to use tools to download web pages en masse, remove markup, and convert them to .txt.
Padilla’s post goes on to provide a step-by-step tutorial, using UC Davis’ British Women Romantic Poets, 1789-1832 project.
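For readers curious about the "programmatic means" the excerpt mentions, the same three steps (download pages, remove markup, save as .txt) might be sketched in Python roughly as follows. This is not the method in Padilla's tutorial, which relies on easy-to-use tools rather than code; it is a minimal illustration using only the standard library. The names `TextExtractor`, `strip_markup`, and `save_pages_as_text`, the output file naming, and the assumption that pages are UTF-8 encoded are all hypothetical choices made for this example.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_markup(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes with spaces, then collapse runs of whitespace.
    # Note: this also puts a space wherever an inline tag split a word.
    return " ".join(" ".join(parser.parts).split())


def save_pages_as_text(urls, out_dir):
    """Download each URL and write its text to page_0001.txt, page_0002.txt, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls, start=1):
        with urlopen(url) as response:
            # Assumption: pages are UTF-8; undecodable bytes are replaced.
            html = response.read().decode("utf-8", errors="replace")
        (out / f"page_{i:04d}.txt").write_text(strip_markup(html), encoding="utf-8")
```

Calling `save_pages_as_text` with a list of page addresses and a folder name would write one plain-text file per page. A real project would likely also want polite delays between requests and a check of the site's terms of use.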
This post was produced through a cooperation between Leigh Bonds, Rebel Cummings-Sauls, Nickoal Eichmann, Leah Henrickson, Jasmine Jones, Elizabeth Lorang, Anna Richards, and Allison Ringness (Editors-at-large for the week), Caitlin Christian-Lamb (Editor for the week), Sarah Potvin and Zach Coble (Site Editors), Caro Pinto, Roxanne Shirazi, and Patrick Williams (dh+lib Review Editors).