When Metadata Becomes Outreach: Indexing, Describing, and Encoding For DH

How can metadata become the most cutting-edge type of library outreach? In this article, we explore how engagement in collaborative library-based digital humanities (DH) projects is proving just that at the University of Alabama. In traditional scholarship, researchers encounter, rely on, and benefit from the work of metadata librarians every day as they access catalogs and online databases, yet metadata librarianship is rarely a directly public-facing role. DH can challenge this status, bringing metadata librarianship to the foreground, because high-quality, customized metadata schemas form the core of all large-scale initiatives such as database creation, network analysis, and Text Encoding Initiative (TEI) projects. Metadata has become one of the essential research tools for faculty members working on digital humanities projects, and at the University’s Alabama Digital Humanities Center (ADHC) we are meeting that need and leveraging it as a new means of outreach to our academic community.

This article presents three different types of DH projects that our metadata librarians are collaborating on at the ADHC, in tandem with our digital scholarship librarian, IT specialist, and faculty members from fifteen different departments across campus. Some projects live in the traditional metadata provinces of indexing and cataloging: digitizing and making searchable the manuscript marginalia of John Stuart Mill, or indexing fabric swatches in a magazine. Other projects require tailored metadata schemas to encode a digital edition using TEI, an approach with scholar-librarian partnerships at its core. These projects reside in different departments across campus, but they share a need for metadata innovation. By virtue of its distinctive research needs, DH is transforming metadata into a new avenue for library outreach.

The Challenge

DH projects require high-quality metadata in order to thrive, and the bigger the project, the more important that metadata becomes to make data discoverable, navigable, and open to computational analysis. The functions of all metadata are to allow our users to identify and discover resources through records acting as surrogates of resources, and to discover similarities, distinctions, and other nuances within single texts or across a corpus. High-quality metadata brings standardization to the project by recording elements’ definitions, obligations, repeatability, rules for hierarchical structure, and attributes. Input guidelines and the use of controlled vocabularies bring a consistency that promotes findability for researchers and users alike.

Metadata is the heartbeat making DH projects usable, robust, preservable, sustainable, and scalable. It scaffolds project data in a way that enables the project to expand to meet future research and technology needs. However, metadata is not something in which the majority of faculty members in the humanities are trained or experienced. Enter metadata librarians, empowered by their skills and professional experiences to take leadership roles in bringing innovation to this part of DH projects. More than that, by raising awareness of the functions of metadata and introducing campus communities to its methods, metadata librarians can actively forge new connections with scholars working on DH projects, presenting a new kind of public outreach. The need is vital, and metadata librarians are uniquely well-positioned to meet it.

Background: Library Outreach For DH

Svensson (2012) argues for libraries’ roles in facilitating collaborative approaches to scholarship within the traditionally solo pursuit of the humanities. Similarly, Harkema and Nelson (2013) and Hoeve (2015) offer case studies of librarian-scholar collaborations in DH projects for open access publishing initiatives and an introductory DH course respectively. Yet even in articles acknowledging the potential for libraries to partner with faculty for digital scholarship, metadata expertise is seldom touched upon. Even within libraries-DH discussions, metadata and the role of the metadata librarian remain backstage, even though high-quality metadata is a vital ingredient for successful large-scale DH projects.

There are some exceptions: Padilla (2016) has argued for new approaches in metadata to render humanities data open to computational analysis, whilst much earlier Llona (2007) made the meta-argument that digital projects themselves need improved metadata in order to be discoverable and preservable. In an article explicitly advocating that metadata librarians and knowledge about metadata need to emerge from the backroom, McFall (2015) gives an overview of the skills that metadata librarians already possess that can be applied to digital humanities projects; these include experience creating customized schemas and controlled vocabularies, a knowledge of different kinds of metadata standards such as the Dublin Core Metadata Initiative and the Metadata Object Description Schema (MODS), and best practices for digital preservation.

However, this article aims to go one step beyond to argue that the paramount importance of high-quality metadata for DH projects means that metadata librarians have a unique opportunity for a new kind of outreach to faculty and students. It is time to leave the backroom and to partner with faculty and students on the frontiers of DH research, introducing them to metadata best practices and innovations, and sharing with them the creativity required to produce flexible, sustainable, and robust data for their projects.

Metadata Outreach at the Alabama Digital Humanities Center (ADHC)

The Alabama Digital Humanities Center is part of the University Libraries, located in the Amelia Gayle Gorgas Library at the heart of the University of Alabama campus. It is funded primarily by the University Libraries, though there is a close relationship maintained with the College of Arts and Sciences, which currently funds 40% of the Center’s digital scholarship librarian position. The ADHC’s mission is to support faculty and graduate students in their digital research and teaching projects.

The ADHC’s position within the University Libraries is important in allowing faculty and students from all over campus to call on the library’s resources to pursue their digital projects. Through a combination of outreach in specific projects, brown bag events, and workshops, the campus is becoming increasingly aware of the kind of expertise needed for DH projects, and that the library is a hub for just that. In this way, engagement in DH is opening up new relationships and partnerships for the University Libraries across campus and with other institutions in joint projects.

For each ADHC project, digital scholarship librarian Emma Annette Wilson assembles an appropriate team, typically consisting of the faculty member or student, the ADHC’s Information Technology specialist, and herself for smaller projects. A metadata librarian will join a team for larger, more complex projects; whilst initially these were all research-based initiatives, in the last year, we have expanded to include metadata expertise in hybrid graduate-level research and teaching components.

At the ADHC, we are fortunate in being able to call on a percentage of two metadata librarians’ time to work on large-scale DH projects. Innovation and creativity are required in these initiatives, as the materials described are frequently highly idiosyncratic, and the purposes for which they are being documented introduce complications and challenges best met through rigorous metadata setup. Fortunately, the metadata librarians are well-versed in Extensible Markup Language (XML) through their daily work with MODS, and have a wide range of experience, from applying vocabularies used for archival collections to transforming data to share with others.

Consultation is our modus operandi: from initial meetings through to the launch and onward reiterations of a DH project, librarians give metadata guidance by drawing on established schemas and related standards, and when necessary combining these to create custom schemas for highly specialized projects. It is important to note that metadata librarians don’t create the metadata itself for these projects. To do so would be prohibitively time-consuming and would drastically limit the number of projects feasible at any one time. Rather, through a collaborative, consultative approach, we teach faculty and students what makes good metadata and how to create it.

Indexing and Cataloging For DH

A number of the ADHC’s projects involve indexing and cataloging materials in order to make them discoverable. For example, American Fashion and Fabrics is a trade publication for the clothing industry that was published under a variety of titles from the 1940s until the 1980s. Various partial indexes of its issues and articles exist, but when Professor Amanda Thompson from the Department of Clothing, Textiles, and Interior Design approached the ADHC about this project, she did so because much of the most valuable material within the publications is not documented or searchable. Contents not indexed include advertisements indicating trends in fabric design and purchasing; trade reports; fabric swatches retaining their original colors thanks to living in library stacks; and some editorials, articles, poems, and advertorials. Knowing that a digital index would enhance the research and teaching use of this publication, Thompson approached metadata librarian Mary Alexander directly about the problem, and from there the project transitioned into a DH initiative at the ADHC.

Alexander, with the assistance of metadata librarian Vanessa Unkeless-Perez, has worked at length with sample issues of American Fashion and Fabrics to create a custom schema capable of documenting this wide variety of materials. Significant challenges have arisen in terms of describing the fabric swatches, a centerpiece of the publication and its importance in the field. For instance, how much detail is required to make these usable and findable? Can we, and should we, describe the type of weave used in a fabric? Its texture? Pattern? Weight? Color? In a project of this scale, multiple different people will need to create the metadata, and the more complex the information documented, the more specialized their knowledge needs to be, and the more prescriptive the input guidelines.

One of the most challenging aspects of this project involved the very human problem of perception. Everyone perceives colors slightly differently, and people may also describe patterns in fabric slightly differently. A controlled vocabulary for all of these different elements of the fabric swatches would be enormous to the point of unwieldiness, not only in terms of finding the apposite terms but also in the time required to describe every single swatch. Our current approach to this problem, still under discussion, is two-pronged. First, the controlled vocabulary for colors will be restricted to primary and secondary hues, thus limiting perception differences by limiting the number of color terms available. The second prong will further develop critical thinking skills of upper-class students majoring in textiles, our proposed data-input workforce, to analyse characteristics of swatches and to apply appropriate terms from controlled vocabularies for weaves, patterns, and other areas, or to pause and seek more information when the degree of certainty for applying a term is below 100%. Thompson is responsible for answering student questions and proposing new terms to be added to the controlled vocabulary lists, so the lists can continue to expand.
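The two-pronged approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term lists, the field names, and the `describe_swatch` function are hypothetical and are not the project's actual vocabularies or schema.

```python
# Hypothetical controlled vocabularies; the project's real lists are not
# given in this article.
PRIMARY_HUES = {"red", "yellow", "blue"}
SECONDARY_HUES = {"orange", "green", "purple"}
COLOR_TERMS = PRIMARY_HUES | SECONDARY_HUES
WEAVE_TERMS = {"plain", "twill", "satin"}  # illustrative only


def describe_swatch(color, weave, certainty):
    """Return a swatch record, or raise if a term is outside the vocabulary.

    certainty is the describer's confidence from 0.0 to 1.0; anything
    below 1.0 marks the record for follow-up rather than guessing.
    """
    color = color.strip().lower()
    weave = weave.strip().lower()
    if color not in COLOR_TERMS:
        raise ValueError(f"color term not in controlled vocabulary: {color!r}")
    if weave not in WEAVE_TERMS:
        raise ValueError(f"weave term not in controlled vocabulary: {weave!r}")
    return {"color": color, "weave": weave, "needs_review": certainty < 1.0}
```

Rejecting out-of-vocabulary terms outright, rather than silently accepting them, mirrors the idea that new terms should be proposed and added to the lists deliberately.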

Research Applications: Digitizing The Marginalia Of John Stuart Mill

Somerville College, Oxford, is home to the personal library of philosopher and political theorist John Stuart Mill, which Mill inscribed liberally with manuscript commentary. In Summer 2015, Professor Albert Pionke of the University of Alabama set up a partnership between the ADHC and Somerville to digitize and make searchable all of Mill’s marginal annotations. The project team at the ADHC includes Metadata Librarian Mary Alexander, Digital Scholarship Librarian Emma Annette Wilson, and IT Specialist Tyler Grace (see our Project Progress Blog for more information).

How can you make both verbal and non-verbal handwritten marginalia searchable in an online database? This is the challenge of the Mill Marginalia Online project. Traditional searching purposefully eliminates non-verbal inclusions such as punctuation, but frequently Mill’s annotations consist of only that: an exclamation point, question mark, or underlining of a word. Alexander worked closely with Pionke to fit these non-verbal marks in addition to verbal commentary to elements from several metadata standards. The creative element of metadata librarianship came to the fore to capture Mill’s exclamation marks, crossings-out, exasperated strike-through corrections, inter-linear edits, and pithy judgements from the margins of his library to be shared with modern readers. Without a robust and tailored metadata schema, none of this information would be discoverable. The resultant metadata schema is one that could be adapted for use by any DH project documenting handwritten or printed marginalia, an innovation that is rooted in metadata in combination with humanities scholarship.
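To make the search problem concrete, here is a minimal Python sketch of a tokenizer that keeps punctuation marks as searchable terms instead of discarding them. The labels in `MARK_LABELS` and the `index_terms` function are assumptions made for illustration; they are not the project's vocabulary or its database code, and marks such as underlining, which have no character of their own, would need to be recorded in a separate field.

```python
import re

# Hypothetical labels for non-verbal marks; not the project's actual vocabulary.
MARK_LABELS = {"!": "exclamation_mark", "?": "question_mark"}


def index_terms(annotation):
    """Split an annotation into searchable terms, keeping punctuation marks.

    A conventional tokenizer would keep only the words; here each mark
    listed in MARK_LABELS is also emitted as a named term.
    """
    terms = []
    for token in re.findall(r"\w+|[!?]", annotation):
        terms.append(MARK_LABELS.get(token, token.lower()))
    return terms
```

With this approach an annotation consisting solely of "!" still yields one indexable term, so it can be found by a search for exclamation marks.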

To ensure that the schema is appropriately represented within the project’s NoSQL database structure, extensive collaboration has taken place between metadata and IT. In this project, Pionke devised a master spreadsheet to capture information about the published text and the marginalia. Another spreadsheet captured the full bibliographic information of the published text at the title level, including related sources for comparison such as translations, and digitizations of full text from identical printed editions, since the project’s sources have never been digitized in full. Spreadsheet headers are in the form of instructions that facilitate the training of students in data input. All headers were mapped to elements comprising a unique metadata schema with elements resembling parts from familiar standards (MODS and TEI), whilst custom elements allow values to describe the location of marginalia at the page level, and specific in-page level.
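The mapping from instruction-style spreadsheet headers to schema elements might look something like the following Python sketch. The header wording, the element names, and the `rows_to_records` function are all hypothetical, since the project's real headers and elements are not listed here; the sketch also assumes every row has a value for every column.

```python
import csv
import io

# Hypothetical header-to-element mapping; the real headers and element
# names used by the project are not listed in this article.
HEADER_TO_ELEMENT = {
    "Enter the title of the published book": "title",
    "Enter the page number where the mark appears": "page",
    "Transcribe the marginal note or name the mark": "marginalia",
}


def rows_to_records(csv_text):
    """Convert spreadsheet rows into records keyed by schema element."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    unknown = [h for h in headers if h not in HEADER_TO_ELEMENT]
    if unknown:
        raise KeyError(f"unmapped spreadsheet headers: {unknown}")
    return [
        # Empty cells become empty strings; values are trimmed of whitespace.
        {HEADER_TO_ELEMENT[h]: (row[h] or "").strip() for h in headers}
        for row in reader
    ]
```

Keeping the mapping in one table means the student-facing instructions can be reworded without touching the schema, as long as the table is updated to match.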

From Research To Teaching

In 2010, Professor Connie Janiga-Perkins of Modern Languages and Classics travelled to Bogota to retrieve digital images of a late-seventeenth/early-eighteenth-century manuscript spiritual autobiography of nun Jeronima Nava y Saavedra. Working with the ADHC and relying upon metadata librarian support, Janiga-Perkins learned how to encode the manuscript for a digital edition, beginning with core elements found in TEI Lite. This enabled Janiga-Perkins to identify four different authors within the text (the nun and her three confessors), as well as to document waves of revision. As other text features were identified, more TEI elements were easily added to the schema using the TEI community’s tool for generating custom schemas, Roma.

By Fall 2015, Janiga-Perkins had obtained another autobiographical manuscript, this time of Madre Maria de San Jose. She wanted her graduate students to transcribe and encode the manuscript as part of their 500-level class, so Alexander and Wilson partnered with her as co-teachers of TEI. For the last four weeks of the class, Alexander taught students how to create and follow a metadata schema to encode a section of manuscript that is otherwise unavailable to the general public. Earlier class sessions taught by Janiga-Perkins were spent preparing the transcription using paleographic techniques. Wilson and Janiga-Perkins worked alongside Alexander to address questions from a digital and Modern Languages perspective to ensure the digital edition taking shape captured the manuscript accurately and appropriately for researchers. Students transcribed twenty pages (or one section) of the manuscript before pasting that text into a TEI Lite template using oXygen, an XML editor. Students learned the syntax of XML, the underlying markup language of TEI, first by coding line breaks in their transcriptions. They went on to document features of the text including strikethroughs and an unusual image of a cross written by Maria de San Jose at the top of every page. Due to the class’s decision to link to an image of the cross, the schema was expanded beyond TEI Lite, giving the students the opportunity to learn about Roma. Through mediated discussion, students played an instrumental role in defining what the final TEI schema would document.
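For readers unfamiliar with TEI, the first two encoding steps described above (line breaks, then strikethroughs) can be sketched in Python with the standard library's XML tools. The TEI elements `lb` and `del` are real; everything else here is an assumption made for illustration, including the `encode_lines` function, its input format, and the use of `rend="strikethrough"`. The class itself worked directly in oXygen, and a real TEI document would also declare the TEI namespace, which this sketch omits.

```python
import xml.etree.ElementTree as ET


def encode_lines(lines):
    """Build a TEI <p> in which each transcribed line begins with <lb/>.

    Each line is a list of (text, struck) pairs; struck text is wrapped
    in <del rend="strikethrough">. The input format is an assumption
    made for this sketch.
    """
    p = ET.Element("p")
    for line in lines:
        last = ET.SubElement(p, "lb")  # line-beginning milestone
        for text, struck in line:
            if struck:
                last = ET.SubElement(p, "del", {"rend": "strikethrough"})
                last.text = text
            else:
                # Plain text trails the most recent child element.
                last.tail = (last.tail or "") + text
    return ET.tostring(p, encoding="unicode")
```

Calling `encode_lines` with a list of lines returns a single XML string that could be pasted into the body of a TEI template.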

During the class sessions, students experienced first-hand some of the questions and difficulties that had arisen for Janiga-Perkins and Alexander in encoding the first manuscript. How can you describe non-verbal manuscript marks accurately? How do you distinguish between different pen weights and brushstrokes – and does it matter if you do? If you mark up interlinear annotations, to what extent can these be displayed in a web setting? Some of these questions remain undecided and will be debated afresh by the incoming graduate class in Fall 2016, and that very process of indecision or indeterminacy has added a valuable component to the learning experience, as DH projects typically morph as both technology and researchers’ experience working with digital techniques evolve. One interesting outcome of even unresolved discussions was that the class formed a strong bond as they identified the intellectual questions at stake in their project. The class came to be about the research process as well as accumulating TEI-specific knowledge.

This project represents an outgrowth from research into a combined research and teaching initiative, meaning that Alexander is able to conduct outreach not only by consulting with a faculty member, but also by actively leading multiple class sessions teaching students about sophisticated metadata techniques outside the remit of Library and Information Studies.

Conclusion: Takeaways From Metadata Outreach

Metadata outreach is enabling the ADHC to form new cross-campus connections with faculty and students alike in a way that meets a very specific new research need generated by and generative of digital scholarship in the humanities. By engaging with faculty and students first-hand through project meetings, training sessions, and co-teaching graduate classes in different fields, metadata librarians are able to disseminate knowledge about metadata in both specific and conceptual ways. DH projects tend to be fluid in light of the swift technological developments taking place in the field, so an important part of this kind of outreach is introducing faculty and students to ways of approaching metadata questions, in addition to working with one specific schema. Faculty and student takeaways include not only an enhanced understanding of and experience working with metadata, but also greater knowledge about the fundamental approaches to structuring data in DH projects, which can empower them to engage in further such projects in the future. In short, DH projects depend upon good metadata, and, in turn, metadata librarians have a great opportunity to make a significant impact by collaborating with faculty and students and teaching them how to conceptualize and create good metadata to generate current and future projects.

As part of a collaborative effort, it is important to recognize that the research goal is the ultimate objective and the selected metadata schema must be a good fit. Each team member’s expertise must be understood and valued, and project planning should provide learning opportunities for all involved. The metadata librarians have learned new schemas and vocabularies that may be drawn upon by their institution in the future.

References

Harkema, Craig, and Nelson, Brent. “Scholar-Librarian Collaboration in the Publication of Scholarly Materials”. Collaborative Librarianship 5, no. 3 (2013): 197-207.

Hoeve, Casey. “Digital Humanities and Librarians: A Team-Based Approach to Learning”. In Supporting Digital Humanities For Knowledge Acquisition in Modern Libraries, edited by Kathleen L. Sacco, Scott S. Richmond, Sara M. Parme, and Kerrie Fergen Wilkes, 107-131. Hershey, PA: IGI Global, 2015.

Llona, Eileen. “The Librarian’s Role in Promoting Digital Scholarship: Development and Metadata Issues”. Slavic and East European Information Resources 8, no. 2 (2007): 151-163.

McFall, Lisa M. “Beyond the Back Room: The Role of Metadata and Catalog Librarians in Digital Humanities”. In Supporting Digital Humanities For Knowledge Acquisition in Modern Libraries, edited by Kathleen L. Sacco, Scott S. Richmond, Sara M. Parme, and Kerrie Fergen Wilkes, 21-43. Hershey, PA: IGI Global, 2015.

Padilla, Thomas. “Humanities Data in the Library: Integrity, Form, Access”. D-Lib Magazine 22, no. 3 (2016). DOI: 10.1045/march2016-padilla.

Svensson, P. “Envisioning the Digital Humanities”. Digital Humanities Quarterly 6, no. 1 (2012). Retrieved from http://digitalhumanities.org/dhq/vol/6/1/000112/000112.html

Author: Emma Annette Wilson

Dr. Emma Annette Wilson is Digital Scholarship Librarian/Assistant Professor of English at the University of Alabama where she manages over 80 Digital Humanities projects at the Alabama Digital Humanities Center. She is founder of the annual Digital Humanities conference Digitorium (https://apps.lib.ua.edu/blogs/digitorium/).