POST: Refining the Problem — More work with NYPL’s open data, Part Two

In Part Two of his experiment to build an index of items from the New York Public Library’s What’s on the menu? data set, Trevor Muñoz describes his work with the data and some of the lessons he learned. Muñoz used OpenRefine and, finding the NYPL data set too large for the tool to handle easily, describes some of his workarounds. Muñoz concludes:

The larger question is whether there is still a plausible vision for how a data curator could add value to this data set. The need to script around limitations of a tool increases the cost of normalizing the NYPL data. At the same time, the ability to see the clusters of similar values that Refine produces increases my confidence that the potential gain in data quality could be very substantial in going from the raw crowdsourced data to an authoritative index.
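
To illustrate the kind of clustering Muñoz mentions, here is a minimal Python sketch of key-collision clustering on a "fingerprint" key, one of the methods Refine offers. It approximates the idea and is not Refine's own code (for instance, it skips Refine's normalization of accented characters), and the dish names are hypothetical examples, not values taken from the NYPL data.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a rough fingerprint key: trim, lowercase, strip ASCII
    punctuation, then sort and de-duplicate the whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only groups of 2 or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical crowdsourced dish names (assumed for illustration only).
dishes = ["Potatoes, mashed", "Mashed Potatoes", "mashed potatoes ", "Consomme"]
clusters = cluster(dishes)
```

Each cluster is a set of variant spellings that a curator could review and merge into one authoritative index entry.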

RESOURCE: NYPL Releases API

The New York Public Library was busy last week. In addition to announcing support for the DPLA, NYPL also released its Digital Collections API (Application Programming Interface), which allows users to submit large (and small) queries against the metadata for NYPL’s online collections. The API exposes the metadata, distributed under a CC0 license, for over 1 million objects and returns data in either XML or JSON format.
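
As a rough sketch of what a query against such an API might look like, the Python below builds (but does not send) a search request. The base URL, the `q` parameter, the `.json`/`.xml` suffix, and the token-based `Authorization` header are assumptions not stated in this post; check NYPL's API documentation before relying on them. `build_search_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; not confirmed by the announcement summarized above.
BASE_URL = "http://api.repo.nypl.org/api/v1/items/search"


def build_search_request(query: str, token: str, fmt: str = "json") -> Request:
    """Build a search request asking for JSON or XML output.

    The authentication scheme shown here is an assumption.
    """
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    url = f"{BASE_URL}.{fmt}?{urlencode({'q': query})}"
    headers = {"Authorization": f'Token token="{token}"'}
    return Request(url, headers=headers)


# Placeholder token; a real one would have to be obtained from NYPL.
req = build_search_request("oyster stew", token="YOUR_TOKEN")
```

Passing `req` to `urllib.request.urlopen` would send the query; that step is left out so the sketch runs without network access or credentials.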

The API was created by NYPL’s Information Technology Group to help support the string of innovative projects coming from NYPL Labs, such as the Menus Project and Direct Me NYC 1940. On releasing the API, David Riordan, product manager for NYPL Labs, explains, “As a public library, we felt a responsibility to make this same data available to the public.”