Thomas Padilla (University of California, Santa Barbara) and James Baker (University of Sussex) recently launched the sourcecaster, a tool that “helps you use the command line to work through common challenges that come up when working with digital primary sources.”
Padilla has written a brief post to explain the project, which is modeled on ffmprovisr, a community repository of explained FFmpeg commands for converting and manipulating multimedia files. The sourcecaster’s commands fall into the following categories:
- casting – changing one type of data to another type (e.g. PDF to TXT for text analysis purposes)
- wrangling – manipulating and navigating data (e.g. removing punctuation, normalizing case)
- getting – grabbing data from various locations (e.g. web scraping all relevant images from portions of a website)
- managing – editing and managing your work with data (e.g. saving command line history)
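The following minimal sketch shows what a wrangling recipe might look like. It uses only the standard `tr` utility to strip punctuation and normalize case. It is not taken from the sourcecaster itself; the function name `normalize_text` and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper (not a sourcecaster command): reads text on stdin,
# deletes every punctuation character, then lowercases what remains.
normalize_text() {
  tr -d '[:punct:]' | tr '[:upper:]' '[:lower:]'
}

# Usage, with placeholder file names:
# normalize_text < input.txt > output.txt
```

Chaining two `tr` calls keeps each step readable and easy to swap out. For example, you could drop the second call if case should be preserved.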
The project is intended to be a community resource, and contributors are invited to share solutions on its GitHub page.
dh+lib Review
This post was produced through a cooperation between Shaherzad Ahmadi, Lady Jane Acquah, Leigh Bonds, Taylor Davis-Van Atta, Rose Fortier, Cody Hennesy, Jason T. Mickel, Chelcie Rowell, Joshua Sadvari, and Erin White (Editors-at-large for the week), Roxanne Shirazi (Editor for the week), Sarah Potvin (Site Editor), and Caitlin Christian-Lamb, Caro Pinto and Patrick Williams (dh+lib Review Editors).