RESOURCE: Tabulizer, an R package for working with Tabula

The rOpenSci project has released tabulizer, an R package that provides bindings to the Tabula Java library.

Tabula is a tool for extracting data from PDF tables:

If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there’s no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface.

rOpenSci develops R packages “that provide programmatic access to a variety of scientific data, full-text of journal articles, and repositories.”

dh+lib Review

This post was produced through a cooperation between Gayle Fischer, Stephen Lingrell, Anna Newman, Kelley Rowan, Chelcie Rowell, and Ashley Zengerski (Editors-at-large for the week), Roxanne Shirazi (Editor for the week), Sarah Potvin (Site Editor), and Caitlin Christian-Lamb, Caro Pinto, and Patrick Williams (dh+lib Review Editors).