Thomas Padilla (University of California, Santa Barbara) and James Baker (University of Sussex) recently launched the sourcecaster, a tool that “helps you use the command line to work through common challenges that come up when working with digital primary sources.”
Padilla has written a brief post explaining the project, which is based on ffmprovisr, a community-maintained collection of commands for FFmpeg, the command line tool for converting and processing multimedia files. The sourcecaster’s commands fall into the following categories:
- casting – changing one type of data to another type (e.g. PDF to TXT for text analysis purposes)
- wrangling – manipulating and navigating data (e.g. remove punctuation, normalize case)
- getting – grabbing data from various locations (e.g. webscraping all relevant images from portions of a website)
- managing – editing and managing your work with data (e.g. save command line history)
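To give a sense of what a "wrangling" step involves, here is a minimal Python sketch of the two examples named above, removing punctuation and normalizing case. It is only an illustration and not one of the sourcecaster's own commands, which are written for the command line; the function name `wrangle` and the sample sentence are hypothetical, and the sketch assumes that only ASCII punctuation needs to be removed.

```python
import string


def wrangle(text: str) -> str:
    """Lowercase the text and delete ASCII punctuation (hypothetical helper)."""
    # Translation table that maps every ASCII punctuation character to "nothing".
    # Assumption: curly quotes and other non-ASCII punctuation are left untouched.
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table)


if __name__ == "__main__":
    print(wrangle("Hello, World! It's 1850."))  # prints: hello world its 1850
```

Text normalized this way is easier to count, compare, and feed into text analysis tools, which is the kind of preparation the wrangling category is meant to cover.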
The project is intended to be a community resource, and contributors are invited to share solutions on its GitHub page.