In advance of Tuesday’s State of the Union address, Benjamin Schmidt (Northeastern University) and Mitch Fraas (University of Pennsylvania) created a series of interactive graphics for The Atlantic that allow readers to explore the State of the Union addresses of every U.S. president: The Language of the State of the Union and Mapping the State of the Union.
Schmidt has also written a blog post pointing to an additional tool that lets readers “compare and contrast language spoken by Presidents in the State of the Union” side by side. He explains why it could be an exciting example of online text analysis, one that shifts the focus away from topic modeling and toward leveraging the rich metadata that libraries (and others) already have:
For the State of the Union, there are all sorts of useful comparisons to make: president vs. president, republican vs. Democrat, lame duck vs recently elected, opposition congress vs. friendly crowd… And for every other corpus, there are just as many. We currently treat these kinds of analytics as things that should be run client side, requiring individuals to obtain digital texts (frequently impossible) and install and run some tools for corpus comparison (a high barrier to entry.) But libraries and other content holders can–and I would argue, should–support these things as a form of exploration out of the box.
Just as libraries already provide search within and across their collections, Schmidt envisions “real-time, fully customizable in-browser comparison across any facets of a corpus as a service libraries and other content providers can easily offer on medium-sized (c. 20,000 documents) corpora.”