Ryan Cordell (Northeastern University) has shared the text of his talk at MLA 2016, “‘Q i-jtb the Raven’: Taking Dirty OCR Seriously,” in which he considers where the digital products of mass-digitization efforts fit within the practice of bibliography and scholarly editing. Cordell posits that digitized texts should themselves be considered new editions:
“Just as cheap, pirated, and errorful American editions of nineteenth-century British novels now teach scholars much about the economics, print technology, and literary culture in that period, dirty OCR illuminates the priorities, infrastructure, and economics of the academy in the late 20th and early 21st centuries.”
“We might think of OCR as a species of compositor: prone to transcription errors, certainly, but nonetheless resetting the type of its proof texts into .txt or .xml files rather than printer’s frames.”
Cordell goes on to examine these ideas in the context of his work with digitized newspapers from the Library of Congress’s Chronicling America project.