POST: Refining the Problem — More work with NYPL’s open data, Part Two

In Part Two of his experiment in creating an index of items from the New York Public Library’s “What’s on the Menu?” data set, Trevor Muñoz discusses his work with the data and some of the lessons he learned. Muñoz used the OpenRefine tool but found the NYPL data set too large to work with easily, so he also describes some of his workarounds. Muñoz concludes,

The larger question is whether there is still a plausible vision for how a data curator could add value to this data set. The need to script around limitations of a tool increases the cost of normalizing the NYPL data. At the same time, the ability to see the clusters of similar values that Refine produces increases my confidence that the potential gain in data quality could be very substantial in going from the raw crowdsourced data to an authoritative index.
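For readers curious about what “clusters of similar values” means in practice, here is a minimal Python sketch of key-collision clustering modeled on the “fingerprint” method described in OpenRefine’s documentation (trim, lowercase, strip punctuation, fold accents to ASCII, sort unique tokens). It is an approximation written for illustration, not OpenRefine’s own code and not Muñoz’s scripts, and the dish names are invented.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate OpenRefine's "fingerprint" key for a string."""
    s = value.strip().lower()
    # Remove punctuation and control characters, keeping word characters and spaces.
    s = re.sub(r"[^\w\s]", "", s)
    # Fold accented Latin characters to ASCII, e.g. "crème" -> "creme".
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Split on whitespace, de-duplicate and sort the tokens, then rejoin them.
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group values by fingerprint; keep only keys shared by 2+ distinct spellings."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


# Hypothetical dish names, invented for illustration.
dishes = [
    "Crème Brûlée",
    "creme brulee",
    "Brulee, Creme",
    "Consommé Royale",
    "Consomme, royale",
    "Roast Beef",
]

clusters = cluster(dishes)
for key, variants in clusters.items():
    print(key, "->", variants)
```

Because every value is reduced to a key in a single pass, this kind of clustering scales roughly linearly with the number of values; a curator still has to review each proposed cluster before merging.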

POST: What IS on the Menu? More Work with NYPL’s Open Data, Part One

Part of making the argument for open collections data is showing what can be done with it. Trevor Muñoz’s recent blog post, in which he plays with the NYPL’s open data from the “What’s on the Menu?” project, explains how he uses the collection data as a testbed for data curation work. As Muñoz states:

I’m particularly interested right now in work that data curators can do to build secondary and tertiary resources—reference materials, if you will—around data. I mean particularly reference materials that draw on the skills of people with training in library and information science, things like indexes. These types of organized systems of description can be one way to provide additional value over full text search (which, for many kinds of data sets, e.g., a table of numerical readings, is not particularly effective anyway).

After evaluating the data release against Tim Berners-Lee’s 5-star Linked Open Data scale, Muñoz begins creating a useful index to the names of the dishes represented in the collection, introducing linked data concepts and showcasing the work (and potential work) of data curators along the way.
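To make the idea of an index concrete, the sketch below builds a tiny back-of-the-book style index: each heading is a dish name and its locators are the menus on which it appears. It assumes the data can be reduced to (dish name, menu ID) pairs; the rows and IDs are made up, and the case-folding normalization is far cruder than the clustering Muñoz performs in Refine.

```python
from collections import Counter, defaultdict


def normalize(name: str) -> str:
    # Deliberately simple normalization: case-fold and collapse whitespace.
    return " ".join(name.casefold().split())


def build_index(rows):
    """Map a preferred heading for each dish to the sorted IDs of menus listing it."""
    spellings = defaultdict(Counter)  # normalized key -> counts of tidied spellings
    locators = defaultdict(set)       # normalized key -> menu IDs
    for dish, menu_id in rows:
        key = normalize(dish)
        spellings[key][" ".join(dish.split())] += 1
        locators[key].add(menu_id)
    index = {}
    for key in sorted(spellings):
        # Use the most frequent spelling as the heading (ties go to the first seen).
        heading = spellings[key].most_common(1)[0][0]
        index[heading] = sorted(locators[key])
    return index


# Hypothetical (dish name, menu ID) pairs; names and IDs are invented.
rows = [
    ("Consomme Royale", 12463),
    ("consomme royale", 12470),
    ("Consomme  Royale", 12501),
    ("Roast Beef", 12463),
    ("Roast beef", 12501),
    ("Consomme Royale", 12501),
]

index = build_index(rows)
for heading, menu_ids in index.items():
    print(f"{heading}: {', '.join(map(str, menu_ids))}")
```

Choosing the most frequent spelling as the heading is only one possible policy; an authoritative index would more likely rely on a curator’s choice of preferred form.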


CFParticipation: Help With Technology Challenges at NYPL

The New York Public Library is reimagining the “public” in its name and is calling on you to “help build new tools and services that have the potential to impact libraries everywhere”. The contest is open to all, so assemble your team of librarians and hackers, and create a project in one of the following areas:

Building apps with historical data: Create innovative educational apps with one-of-a-kind materials and datasets digitized from NYPL collections

  • Build a historical ‘check-in’ app using old New York City atlases and other scanned materials

Hardware hacking: Build smarter tools for on-site use

  • Engineer an outdoor 24-hour book drop that checks library books on the way in, and keeps everything else out
  • Design lightweight scanning stations to digitize our legacy card catalogs

Data crunching/Machine learning: Process and analyze library data in creative ways

  • Consolidate records in our online catalog that describe versions of the same book (e.g. editions, formats)
  • Develop new methods to extract structured data from old card catalogs

Data visualization: Use library data to uncover new insights and tell new stories

  • Visualize the flow of physical books and ebooks as they’re borrowed and read across the city
  • Map the flow of letters written by the Founding Fathers, drawing on archival collections
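As a rough illustration of the catalog-consolidation challenge, the sketch below groups records that share a crude “work key” (author surname plus title, minus subtitle and leading article). This is a simple heuristic swapped in for illustration, not NYPL’s method and much weaker than full FRBR-style work clustering; the `Record` fields and sample records are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical, simplified catalog record.
    record_id: str
    author: str
    title: str
    fmt: str  # e.g. "print", "ebook"


LEADING_ARTICLES = ("the ", "a ", "an ")


def work_key(rec: Record) -> tuple[str, str]:
    """Crude match key: author surname plus title without subtitle or leading article."""
    surname = rec.author.split(",")[0].strip().lower()
    title = rec.title.split(":")[0].lower()
    title = re.sub(r"[^\w\s]", "", title).strip()
    for article in LEADING_ARTICLES:
        if title.startswith(article):
            title = title[len(article):]
            break
    return surname, " ".join(title.split())


def group_versions(records: list[Record]) -> dict[tuple[str, str], list[str]]:
    """Collect record IDs that share the same work key."""
    groups = defaultdict(list)
    for rec in records:
        groups[work_key(rec)].append(rec.record_id)
    return dict(groups)


records = [
    Record("b1", "Wharton, Edith", "The Age of Innocence", "print"),
    Record("b2", "Wharton, Edith", "The age of innocence : a novel", "print"),
    Record("b3", "Wharton, Edith, 1862-1937", "Age of Innocence", "ebook"),
    Record("b4", "Wharton, Edith", "The House of Mirth", "audiobook"),
]

groups = group_versions(records)
for key, ids in groups.items():
    print(key, ids)
```

A key like this catches easy cases (articles, subtitles, format differences) but will both miss translations and wrongly merge distinct works with the same surname and title, which is where machine learning approaches could help.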

RESOURCE: NYPL Releases API

The New York Public Library was busy last week. In addition to announcing support for the Digital Public Library of America (DPLA), NYPL released its Digital Collections API (Application Programming Interface), which lets users run large (and small) queries against the metadata for NYPL’s online collections. The API exposes the metadata, released under a CC0 license, for more than 1 million objects and returns results in either XML or JSON format.
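For anyone wanting to try the API, here is a minimal Python sketch of building a metadata search request. The base URL, the `items/search` path, the `q` parameter, and the token-style `Authorization` header are assumptions based on our reading of NYPL’s documentation and should be checked against the current docs; the token is a placeholder, and the sketch assumes a JSON response rather than XML.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumptions to verify against NYPL's current API documentation:
# the base URL, the items/search path, the "q" parameter, and the header format.
API_BASE = "https://api.repo.nypl.org/api/v1"
API_TOKEN = "YOUR_TOKEN_HERE"  # placeholder; use the token NYPL issues you


def build_search_request(query: str) -> Request:
    """Build (but do not send) a metadata search request."""
    params = urlencode({"q": query})
    url = f"{API_BASE}/items/search?{params}"
    headers = {"Authorization": f'Token token="{API_TOKEN}"'}
    return Request(url, headers=headers)


def search(query: str) -> dict:
    """Send the request and parse a JSON response (requires network access and a real token)."""
    with urlopen(build_search_request(query)) as response:
        return json.load(response)


req = build_search_request("menus 1900")
print(req.full_url)
```

Separating request construction from sending makes it easy to inspect the URL and headers before spending API calls; `search()` is only defined here, not called.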

The API was created by NYPL’s Information Technology Group to help support the string of innovative projects coming from NYPL Labs, such as the Menus Project and Direct Me NYC 1940. On releasing the API, David Riordan, product manager for NYPL Labs, explains, “As a public library, we felt a responsibility to make this same data available to the public.”