RECOMMENDED: Datasheets for Digital Cultural Heritage Data

A recently published paper in the Journal of Open Humanities Data, “Datasheets for Digital Cultural Heritage Data,” explores the complexities of datasets created from digital cultural heritage (DCH) collections, with the aim of providing recommended standards for documenting these datasets. The authors’ interest in better describing such datasets relates primarily to the Collections as Data movement in GLAM (galleries, libraries, archives, and museums) institutions, where machine-learning algorithms are often applied to large datasets derived from cultural heritage collections.

Authors Henk Alkemade, Steven Claeyssens, Giovanni Colavizza, Nuno Freire, Jörg Lehmann, Clemens Neudecker, Giulia Osti, and Daniel van Strien summarize this goal and argue for the creation of structured datasheets as a potential solution to the observed data documentation problem:

This paper elaborates on the use of datasheets, as introduced by (Gebru et al., 2021) to the ML [machine-learning] community for the first time in 2018, for creating and disseminating documentation about DCH materials shared as “collections as data.” … Dataset documentation can take on a myriad of shapes and forms, ranging from highly structured data, for both humans and machines to read (for example, metadata description in the Data Catalog Vocabulary (DCAT)), over semi-structured datasheets, organised around a standard list of questions, to unstructured, primarily narrative data papers. … Datasheets, however, bring a structured approach to the description of datasets, which provide guidance to the data publisher in describing the datasets according to the information needs of data re-users, and they offer the advantage of allowing information to be collected in both a structured manner, whenever possible, and in a narrative form, whenever necessary. Considering the particularly diverse nature of DCH collections, that combination is invaluable.

The paper addresses the specific characteristics of digital cultural heritage data that need to be considered when utilizing them for “collections as data” projects and provides a Template Datasheet for Digital Cultural Heritage Datasets (doi.org/10.5281/zenodo.8375033).

dh+lib Review

This post was produced through a cooperation between Jennifer Matthews, Kayla Abner, Rebekah Walker, Ruth Carpenter, Arianne Hartsell-Gundy, Elizabeth Parke, Divya Mathur, Kristin Van Diest, Emily Cukier, Leigh Bonds, Melissa Runnels, Johannes Sibeko, and Amy Gay (Editors-at-large for the week), Nickoal Eichmann-Kalwara and Rachel Starry (Editors for the week), Claudia Berger, Linsey Ford, Pamela Lach, Hillary Richardson, and John Russell (dh+lib Review Editors).