POST: Making Scanned Content Accessible Using Full-text Search and OCR

Chris Adams (Library of Congress) has written a guest post for The Signal detailing how the library community can affordably meet the challenge of creating metadata for “our terabytes of carefully produced and diligently preserved TIFF files” to promote discovery and engagement.

In “Making Scanned Content Accessible Using Full-text Search and OCR,” Adams documents how to get “from scan to search” in four steps. Adams also suggests possible directions for future work, including “a simple web application which would display images with the corresponding OCR with full version control, allowing the review and correction process to be a generic workflow step for many different projects.”

dh+lib review

This post was produced through a collaboration between Laura Braunstein, Joe Grobelny, Nabil Kashyap, Paula S. Kiser, Jan Lampaert, Jennifer Millen, Kristen Totleben and Roberto Vargas (Editors-at-large for the week), Caro Pinto (Editor for the week), Sarah Potvin (Site Editor), and Zach Coble and Roxanne Shirazi (dh+lib Review Editors).