digital archives

Bias, Perception, and Archival Praxis

Elvia Arroyo-Ramirez is Processing Archivist for Latin American Collections at Princeton University Library. Elvia holds an MLIS with a concentration in Archives, Preservation, and Records Management from the University of Pittsburgh. She has presented widely on digital archives and diversity and is co-author with Rose L. Chou, Jenna Friedman, Simone Fujita, and Cynthia Mari Orozco of the forthcoming article ‘The Reach of a Long Arm Stapler: Calling in Microaggressions in the LIS Field through Zine Making’ (Library Trends, Spring 2018).

Thomas: In Invisible Defaults and Perceived Limitations: Processing the Juan Gelman Files you describe how technologies used to work with digital collections can channel bias – bias that is not just a historical legacy but very much a product of the here and now. Before we discuss this piece in detail I’m curious to hear more about what experiences shaped how you see your work in archives? Perhaps what led you to the archival profession?

Elvia: My interest in archives evolved from my studies in art history as an undergraduate at UCLA. I took a class on Dada and was fascinated by the Dadaists’ tendency to collect and piece together meaning from disposed and/or re-purposed materials. Marcel Duchamp’s The Bride Stripped Bare by Her Bachelors, Even (The Large Glass) and Kurt Schwitters’ Merzbau inspired a deep pathos that eventually became the catalyst to move to a career in archives.

*Merz Picture 32 A. The Cherry Picture (Merzbild 32 A. Das Kirschbild), Kurt Schwitters*
*1921*

To provide a little more context on the catalyst—between 1923 and 1936, Schwitters collected and progressively pieced together his colossal Merzbau with objects gifted or left behind by friends and family such as souvenirs, letters, clippings, and articles of clothing (some stolen by Schwitters). Everything that mattered to Schwitters became part of the bau. It was ultimately destroyed by an Allied air raid during World War II. Schwitters’ loss struck a chord with me. His unconventional way of record keeping and memory construction made me curious about archival collections and the process of maintaining and making them available for access.

[pullquote]Who gets to be remembered and historicized by way of record creation?[/pullquote]Archival work requires an ethics of care for the deeply personal and the deeply political. My former boss at the Center for the Study of Political Graphics often said that all art is political. The same can be said about archives and archival work. Record creation, keeping, obstruction, or misrepresentation are all acts of identity and power. Who gets to be remembered and historicized by way of record creation? Who is forgotten or purposefully silenced in history by way of omission or destruction of records? How are records themselves (official records created for governmental purposes in particular) used to communicate misguided notions of holistic representation, truthfulness, neutrality, and objectivity? These are all questions that initially drew me to and continue to keep me in the profession.

Thomas: I’ve noticed that power and representation or lack thereof are taking a more prominent place in Digital Humanities and digital library conferences. I gather that this focus in archival work isn’t necessarily sparked by a transition to digital environments – rather that it predates that transition and maybe even runs alongside it. Do you think there is a reciprocal value to be gained from working across physical and digital legacies? What sorts of critical questions are raised when working with either? How are these questions different or similar depending on the medium and the technology?

Elvia: Issues of representation and power are fundamentally rooted in archival work and there is rich critical scholarship that discusses these issues in the context of pre-digital archives. Sam Winn’s piece The Hubris of Neutrality in Archives does an excellent job acknowledging some of the recent critical work in the archival profession that addresses issues of representation, and gives a nod to Howard Zinn’s seminal address to the profession at the 1970 Society of American Archivists meeting. Scholars like Verne Harris, Cheryl Beredo, Randall Jimerson, and Michelle Caswell discuss issues of power, representation, and accountability by challenging the existing canon of archival neutrality and objectivity; speaking on colonialism, apartheid, and transitional democracies and their relationships to record keeping; and connecting these challenges to current archival practices. These scholars have built critical foundations for emerging scholarship that speaks to these same issues in the digital realm.

There is definite value to be gained from working across physical and digital legacies. The work helps us recognize our shortcomings. Jarrett Drake has pointed out that the archival profession’s canonical principle of provenance is grounded in a 19th century colonialist and imperialist era wherein legal property and ownership of records was limited to western white men. Historically provenance has more or less worked well for archivists tasked with keeping a history of ownership. In digital mediums and environments things are a bit different. What does the provenance of a collaboratively created or anonymously created Google Doc look like? In digital environments provenance is becoming increasingly difficult to pin down. I believe this will force the profession to re-evaluate how archivists should account for ownership, authenticity, and custody.

*3.5″ floppy disks from the Juan Gelman Papers, Department of Rare Books and Special Collections, Princeton University Library, Elvia Arroyo-Ramirez.*

[pullquote]Appraisal for digital collections is, I believe, slowly being shouldered by the processing archivist…[/pullquote] Of course privacy and volume are issues present in analog collections but they are further problematized when we consider the digital deluge and the responsibility of determining permanent historical value. In analog archival collections donors and creators can physically comb through and filter materials they do not want to deposit in an archival repository due to the presence of sensitive or personal information. Acquisition of entire hard drives makes appraisal for donors a lot more difficult and places the responsibility of protecting sensitive or personal data on archivists, who, on the whole, are not paid nearly enough, are not equipped with the necessary tools and infrastructure, and do not have enough hours in the day to devote the labor necessary to peruse every file. Appraisal for digital collections is, I believe, slowly being shouldered by the processing archivist without donors, curators, or administrators understanding the amount of time it takes to do the work. Raising questions about how best to address privacy issues, and about what to keep and what not to keep, when we speak with our donors at the point of acquisition is something archivists will have to continue to advocate for.

Thomas: At the end of your previous response you allude to what might be called “the weight of inheritance” – what is passed to us and the wherewithal we gather to deal with it. I sense a similar tension at work in Invisible Defaults and Perceived Limitations: Processing the Juan Gelman Files. In that piece you describe how tools you inherit as an archivist carry a set of assumptions that bias processing and representation of digital collections. Are there particular strategies for recognizing these biases and dealing with them? Particular readings or frameworks that guide you in the engagement?

Elvia: I recommend taking a deep dive in social justice and decolonizing technology readings (a trove of which are located here and here).

For me, it has become important to recognize that the tools archivists and other information managers are using (and developing) are part of a larger system that is complicit in propelling and replicating a hegemonic Global North. While technologies are marketed as decentralized, democratic products unbound by location (geographic, cultural), they are largely being developed by a relatively small minority of the world’s population who has the majority control to assert autonomous power. Understanding this, we begin to ask how this frame of thinking impacts an archivist’s responsibility to collections on the margins of, or far from, the Global North.

I want to emphasize that the heart of what I was writing about in my experience processing the Gelman materials has more to do with recognizing our own biases and perceptions as practitioners learning to be technologists than with the current tools we have at our disposal.[pullquote]I also think about the weight of our ancestral and cultural inheritances and how we reckon (or not) with these as practitioners, users, and creators of digital collections. [/pullquote]You mentioned “the weight of inheritance” in the first part of your question, and besides having to reckon with the tools we use and their probable limitations, I also think about two other types of inheritances. I think about the technical language the digital curation community has inherited or adopted as its own and how potentially ill-fitting it can be when applied to cultural heritage collections. I also think about the weight of our ancestral and cultural inheritances and how we reckon (or not) with these as practitioners, users, and creators of digital collections. Tapping into my own cultural inheritances as a bilingual practicing archivist of Mexican ancestral roots living in the U.S., I understood that removing diacritic characters from accented words would not only change the meaning of filenames; it would be an act of cultural erasure. We need more use cases like Gelman’s in order to critically reflect on our current practices to make them better.
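To make the point about diacritics concrete, here is a minimal Python sketch. It is an illustration only, not the workflow used on the Gelman files: the filename and the two helper functions are hypothetical. It contrasts a lossy "cleaning" step that strips accents with Unicode NFC normalization, which keeps them in one consistent form.

```python
import unicodedata


def strip_diacritics(name: str) -> str:
    """Lossy 'cleaning': decompose characters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preserve_diacritics(name: str) -> str:
    """Non-lossy alternative: normalize to NFC so accents survive intact."""
    return unicodedata.normalize("NFC", name)


# Hypothetical filename, invented for illustration.
original = "carta_a_papá.txt"

print(strip_diacritics(original))     # "papá" (dad) becomes "papa" (potato)
print(preserve_diacritics(original))  # the accent is kept
```

The stripped name still looks like a valid filename, which is why the loss is easy to miss: nothing fails, but the word now means something else.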

Thomas: In the digital humanities, researchers and practitioners (myself included) often dig into the language that is used to describe data and how one works with data. Verbs like cleaning (see Katie Rawson and Trevor Muñoz’ piece Against Cleaning) are problematized. The word data itself is questioned extensively – some even go the route of suggesting alternative nouns (see Johanna Drucker’s argument for capta). Some question a terrestrial bias at work in our understanding of data (see Melody Jue’s Wild Blue Media: Thinking through Seawater). An increasing number of scholars explore the genealogy of the word data (see Lisa Gitelman’s Raw Data is an Oxymoron). In your work with the Gelman files I was intrigued to see your focus on words like “clean”, “compromise”, and “illegal”. I’m wondering if you might comment on possible alternatives in this space? Maybe models of collaboration and community that could lead to something that better approximates the diversity of a range of lived experience?

Elvia: I find the use of the term “illegal” irresponsible when it is applied outside the confines of the law. Contextualizing the term in our current sociopolitical moment and its application (among others) in the form of a noun to describe migrants not authorized to stay in their country of residence makes for a potentially dangerous association with the dehumanization of migrants. We (digital humanists/archivists) are in the business of preserving and making accessible collections that include a diversity of cultures, identities, and perspectives. Surely we can find more accurate descriptors to communicate what checks out or does not check out in the language we use to describe our practices.

*Elvia Arroyo-Ramirez, Invisible Defaults and Perceived Limitations: Processing the Juan Gelman Files*

[pullquote]… we should keep in mind that wholesale adoption of technological language that has been developed for and by other (dis)similar fields is potentially incongruous to our needs.[/pullquote]Katie Rawson and Trevor Muñoz are onto something when they point to the example of “data cleaning” and how this term is used as an opaque shorthand for a number of diverse actions and steps that are taken to render data usable. This work illustrates the point that emerging areas of work in this space have not fully developed the pointed language needed to communicate our processes and roadblocks. As we move forward we should keep in mind that wholesale adoption of technological language that has been developed for and by other (dis)similar fields is potentially incongruous to our needs. Even in my use of “our” (digital humanists/archivists) there are varying use/need cases.

I believe having conversations across similar fields with a diversity of practitioners is key to understanding how our practices and end goals are alike and dissimilar. Part of the issue is that we are so busy trying to figure out how to reach end goals that we are not quite familiar with the practices each of us employs en route. The proposal of the Collections as Data framework is certainly an opportunity to bring together varied practitioners and users of data to conceptualize or begin reimagining a shared terminology that is mapped to our respective practices and responsibilities.

The records continuum model may add to the collections as data conversation. The model was originally conceptualized to reflect the overlapping responsibilities of records managers and archivists but I think it could potentially be expanded for those working on preserving and researching archival data. For instance, my goal as an archivist is to make little to no changes to the structure and content in a collection while normalizing accessible content to make it as platform and system agnostic as possible. When I intervene (duplicate or irrecoverable files, etc.), I must document and justify why I had to. These decisions should be made transparent to our users. The goals of digital humanists are a lot more diverse (i.e. potentially a lot more “data cleaning”), but their ability to access the content they work on is potentially dependent on my labor to preserve and provide access to it.
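As one possible illustration of documenting an intervention, the sketch below flags duplicate files by checksum and writes down what was done and why. The interview does not name a tool or a log format; the file paths, contents, and record fields here are all assumptions made for the example.

```python
import hashlib
import json
from collections import defaultdict
from typing import Dict, List


def find_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file paths whose contents share a SHA-256 checksum."""
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


def log_interventions(groups: List[List[str]]) -> List[dict]:
    """Record which copy was kept, which were set aside, and the reason."""
    return [
        {
            "action": "deduplicate",
            "kept": paths[0],
            "set_aside": paths[1:],
            "reason": "identical SHA-256 checksum",
        }
        for paths in groups
    ]


# Hypothetical accession: paths and contents are invented for illustration.
files = {
    "poemas/borrador.txt": b"primer borrador",
    "copia/borrador.txt": b"primer borrador",
    "cartas/carta.txt": b"querido amigo",
}

log = log_interventions(find_duplicates(files))
print(json.dumps(log, indent=2, ensure_ascii=False))
```

Keeping the log as plain structured data means it can travel with the collection and be shown to users, which is the transparency the paragraph above asks for.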

While archivists and digital humanists might have different goals, we share similar processes and terminology. I think the records continuum model can reveal how much of our current practices we share, or potentially want to share. I would love to organize a think out loud meeting (a future Collections as Data meeting?) with data curators, archivists, digital humanists, systems administrators and developers, and whomever else is heavily thinking about this. We might create a shared lexicon that better describes our shared needs and practices.

Thomas: Lastly, whose work would you like people to know more about?

Elvia: Tara Robertson’s presentation, Not All Information Wants to be Free, taught me that the library profession’s blanket tendency to digitize pre-Internet print resources can be harmful especially if it clashes with the original consent of participants involved. In the case Tara highlights, materials from an underground print publication that was produced for a very specific target audience were digitized and made accessible to a general audience without taking care to reach out to individual participants to get their renewed consent. The act of digitizing for access, in this case, was an act of “outing” for some participants who relied on the relative obscurity print provides. Everyone should take pause and read it.

Angela Galvan’s Architecture of Authority helps explore the differing and often conflicting core values libraries and vendors have and how these relationships affect the ways we provide access to our resources. The piece also complicates how we see our relationships to our users. My co-presenter, Giordana Mecagni, gave an excellent talk, The Colonizing Gaze – Digitized Collections, Radical Communities and Paywalls, on this subject at this year’s Society of American Archivists annual conference. Designer Jen Wang’s Now you see it: Helvetica, Modernism, and the Status Quo of Design speaks on the history of design and its perpetuation of whiteness as aesthetic neutrality. Todd Honma’s work on teaching community archives and zines can serve as lessons for librarians, archivists, and other information professionals on how to use zines, an originally analog medium, to better engage with broader communities. I’ve gathered much inspiration, perspective, and validation from these readings. I am also excited to hear more from students and new professionals like Itza Carbajal, Chido Muchemwa, Nikki Koehlert, Aliza Elkin, and Crystal Paull, all of whom I recently had the pleasure of meeting.

This work is licensed under a Creative Commons Attribution 4.0 International License.

POST: What’s a Nice English Professor Like You Doing in a Place Like This: An Interview With Matthew Kirschenbaum

Trevor Owens has posted a terrific interview with Matt Kirschenbaum (Associate Professor in the Department of English at the University of Maryland and Associate Director of the Maryland Institute for Technology in the Humanities). In it, they discuss his involvement in the digital archives and digital forensics communities, the hurdles that born digital materials create, BitCurator, and places like MITH “as inhabiting a kind of ‘third space’ between manuscript repositories processing born-digital collections on the one hand, and computer history museums on the other.”

Regarding what practices to adopt for working with born digital materials in the long-term, Kirschenbaum notes that in some cases the problem is not primarily technical:

[T]he increasing tendency towards preemptive data encryption—practices which will surely become even more commonplace in the wake of recent revelations—threatens to make archival preservation of personal digital content all but unthinkable for entities who lack the resources of the militarized surveillance state. I know of very little that archivists can do in either of these instances other than to educate and advocate (and agitate). They are societal issues and will be addressed through collective action, not technical innovation.

RESOURCE: Crowdsourcing + Machine Learning: Nicholas Woodward at TCDL

Nicholas Woodward, Software Developer at the University of Texas Libraries, shares the text of the talk he gave at the Texas Conference on Digital Libraries. Woodward describes his novel approach for transcribing the Digital Archive of the Guatemalan National Police Historical Archive, a collection of over 12 million pages:

My approach looks to break up documents into individual words with the idea that though no two documents are exactly alike they are likely to contain similar words. And across an entire corpus, particularly very large ones such as AHPN, words are likely to appear many times. Consequently, if users transcribe the words of one document, then I can use image matching algorithms to find other images of the same words and apply the crowdsourced transcription to the new images.
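The idea in the quotation can be sketched in a few lines of Python. This is a toy stand-in, not Woodward's implementation: word images are reduced to tiny made-up bitmaps, and a simple pixel-difference count (Hamming distance) takes the place of a real image matching algorithm. The function names, the distance threshold, and the sample words are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# A "word image" here is a tiny binary bitmap flattened to a tuple of 0/1 pixels.
WordImage = Tuple[int, ...]


def hamming(a: WordImage, b: WordImage) -> int:
    """Count differing pixels between two equally sized bitmaps."""
    return sum(x != y for x, y in zip(a, b))


def propagate(
    labeled: Dict[WordImage, str],
    unlabeled: List[WordImage],
    max_distance: int = 1,
) -> List[Optional[str]]:
    """Give each unlabeled image the transcription of its closest
    crowdsourced match, or None when nothing is close enough."""
    results: List[Optional[str]] = []
    for image in unlabeled:
        best_text, best_dist = None, max_distance + 1
        for known, text in labeled.items():
            if len(known) != len(image):
                continue  # only compare bitmaps of the same size
            dist = hamming(known, image)
            if dist < best_dist:
                best_text, best_dist = text, dist
        results.append(best_text)
    return results


# Two words transcribed by volunteers (made-up bitmaps and words).
labeled = {
    (1, 1, 0, 0, 1, 0, 1, 1): "policía",
    (0, 0, 1, 1, 0, 1, 0, 0): "nacional",
}
# A noisy copy of the first word, an exact copy of the second, an unknown word.
unlabeled = [
    (1, 1, 0, 0, 1, 0, 1, 0),
    (0, 0, 1, 1, 0, 1, 0, 0),
    (1, 0, 1, 0, 1, 0, 1, 0),
]

print(propagate(labeled, unlabeled))  # ['policía', 'nacional', None]
```

Returning None for images with no close match keeps unmatched words visibly untranscribed, so they can go back to volunteers rather than receive a wrong label.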

POST: Electronic Literature as Cultural Heritage (Confessions of an Incunk)

Matt Kirschenbaum (Maryland Institute for Technology in the Humanities) shares the text from his talk at the Library of Congress’s Electronic Literature Showcase. In the talk, Kirschenbaum self-identifies as an Incunk, or “one who has assumed archival and curatorial stewardship over… electronic literature collections.” He discusses the issues at stake when “electronic literature passes from outsider practice to cultural heritage as sanctioned by its passage from private hands to an increasing number of major collecting institutions,” where the processing of digital materials both raises important theoretical questions and constitutes “what is increasingly normalized professional practice.”

CFP: Composing In/With/Through Archives: An Open-Access, Born Digital Edited Collection

The Cultural Heritage Informatics Initiative at Michigan State University invites essays (8000 words) and case studies (3000 words) for a digital, OA edition that will examine, among other topics:

How are we theorizing digital archives?
How are we drawing from the work of digital archivists as we build our own archives and conduct digital archival research?
How do digital archives mediate how we write?
How do we differentiate between digital archives/repositories/libraries? Why are these distinctions important?

Abstracts due by April 30.

Reflections on THATCamp MLA 2013

The Digital Media Commons at Northeastern University Libraries.
Photo by Tom Urell, © Northeastern University Libraries. Reproduced with permission.

In this post, Amanda Rust (English + Theatre Librarian at Northeastern University Libraries) shares her notes and reflections from THATCamp MLA, and offers advice for those considering THATCamp attendance.

THATCamp MLA, held in Boston on January 2, 2013, just before the annual MLA Convention, had a rich selection of session proposals (the final schedule is here). While I’ll report more deeply on two sessions below, I’d encourage you to see the complete session notes and Twitter stream for more. My notes are (obviously) shaped around personal interests, so I can’t suggest them as a complete recap of any session, but rather as an introduction to the kinds of conversations you might encounter at a THATCamp.

For those unfamiliar with the THATCamp model: THATCamp is a digital humanities “unconference,” as well as a great time. I’ve heard THATCamps described as “the best part of a conference,” the excellent conversations you have with people interested in the same subjects you are. THATCamp discussions are often unstructured and wide-ranging, and I’ve found my attendance most productive when I bring my own set of questions I want to think about during the day, and view it as an opportunity for interdisciplinary discussion rather than a single, in-depth exploration of one particular subject.

Morning Session: Aesthetics and DH

The first session I attended was Aesthetics and DH. I’m interested in interactive design (particularly around interfaces for serendipity and uncertainty in the research process), and ways that a design approach offers a chance to consider formal/aesthetic elements in research and DH projects. Which is a long way of saying: I was totally looking forward to this session. The session organizer, Amanda French, has good notes here, to which I can add some additional personal reflections.

[pullquote] One of the exciting aspects of DH, for me, is that it brings the art and design disciplines more into discussion with the humanities.[/pullquote]Session attendees were a typically THATCamp interdisciplinary mix: literary, media, and game scholars, librarians from several universities around Boston, education technologists, humanities grad students, and other campus staff interested in DH. Simply during introductions we ranged over poetry and the material, translating into digital; aesthetic analogs to neurological mechanisms; aesthetics of scholarly production; multimodal production; and 10 PRINT and cultural and/or metaphorical aspects of translation (e.g., code as a language in need of translation).

The session facilitator opened with an intriguing question: What do new media folks wish old media folks understood? The broadest and simplest answer seemed to be: understand that it exists as a discipline, with its own history and robust vocabulary and approaches. This theme of interdisciplinarity comes up quite a bit in DH – when working with other disciplines and professions, how do you collaborate in ways that respect the expertise and history of those other areas? There were a few good suggestions for further reading on new media aesthetics (see French’s notes).

The discussion moved into critical code studies, which again seemed to touch on issues of translation and design:

Platform port as translation, generation of remarkably different aesthetic objects (e.g., differences in aesthetic experiences with games translated from the NES to PC emulators to iPad touch versions).
Understanding code as designed, as a series of choices resulting in constraints. (See A Tower of Languages / Paul Swartz)
Being sure not to fetishize code as the objective “real real”: programmers make subjective choices, as we all do.

Control was another big theme: as an artist, how can you / can you at all control the aesthetic experience of your viewer? Possibly, in the 19th century, we could assume that a painting hung in a museum with a certain environment, lighting, quietude, but what does an artist assume now?

Many current digital artists have no expectation of continuity, and incorporate lack of control into the experience, embracing ephemerality. There is an illusion of control with physical media and some technology, e.g. we once used Flash to control screen presentation, but Flash is now out of date and hard to play.
The ability to interact with aesthetics is affective: create your own blog template, feel more connected and in control.
Translations can be violent. Apps provide a new arena of control – rather than translation, more of a palimpsest, piling on code without changing the underlying?

As a GLAM professional, I was also fascinated by a mini-discussion of the Smithsonian’s The Art of Video Games:

Exhibit focus on nostalgia vs. art. How is the exhibit different if approached from historical vs. aesthetic organizing principles?
Disembodied game consoles can be a problem — no chance to consider design process and code, not enough aesthetic experience through play.
If including the aesthetic experience and context, how much do GLAMs need to select, preserve, and curate? (See my comments below on feeling simultaneously excited and exhausted.)

Interactivity was another sub-theme: do new digital aesthetics lead us to a baseline expectation of interactivity?

As always, don’t want to draw too sharp a distinction with the “old”: readers were interactive just perhaps more slowly. See Pamela/Shamela as fan fiction before the era of broadcast.
Linked data leads to new possibilities for mashup, see Small Demons. Mashups can be both invitation and form of pedagogy.

We wrapped up the discussion with consideration of market forces on design/aesthetic choices:

Market forces – cheap manufacture – were impulse behind now famous Leaves of Grass first edition design.
Incidental aesthetics: we associate emotions with designs made for market reasons (e.g., recall with fondness our intimacy with cheap paperbacks).
Through monetization of ebooks, remove aesthetics from print books to create electronic versions as quickly and cheaply as possible.

I’m often struck by how much work on aesthetics and form requires the creation of cultural and material data that just doesn’t exist yet. One discipline mentioned in the session introductions, fashion history, could be helped by a database of hemlines, textiles used, color, etc. for clothing over the centuries. How will we create this? As a librarian, I am simultaneously excited and exhausted just thinking about it.

I enjoyed this session as a good start to the day, spurring thoughts about some of the more theoretical aspects of DH, as well as how different disciplines approach the same theoretical problems. One of the exciting aspects of DH, for me, is that it brings the art and design disciplines more into discussion with the humanities. While the analysis of art and design has often been a part of the humanities – art and design held apart as objects to be considered – the actual processes and methods of art and design can serve as working models to be emulated rather than simply analyzed. User-centered design, for example, is a deeply humanistic approach to the world.

Afternoon Session: Teaching Digital Archives

The afternoon session I attended, Teaching Digital Archives, was an excellent and practical discussion on working with primary sources. Our facilitator, Paul Jaussen, started with some general food for thought: in literature, how do you use the digital archive not just in one’s own work but as a teaching tool?

Process of creating literary history encourages close reading as well as historicization – how to make those two aspects coherent in one class? The literary text and contextual primary sources make each other more valuable as they are connected.
Specific example: students annotated historical maps in David Rumsey collection to give historical context to early American republic. 120 annotations by the end of class, resulting in a visualized transatlantic.

We discussed what features a digital archive needs in order to facilitate teaching and research. (For librarians, these sessions provide quick and easy user research.) To start, platforms need easy annotation tools, portable annotations, levels of openness and privacy, and easy ways to publish.

We shared some specific project examples like Neatline and Omeka (Omeka bonus: in the background libraries can easily archive what’s produced) and Historypin’s integration with mobile technology. Increasing connections between scholars and archivists was a theme; one model is outreach via making faculty members curators of archival collections, thereby encouraging them to build classes around those collections.

Digital archives also carry questions of access. With all the great online archives focused around single authors, using manuscripts and primary sources should be part of teaching canonical figures. However, if archival work becomes a new standard in education, unless archival resources are open access, it will only further increase the digital divide: some schools will be able to afford the necessary archives, some won’t.

Individual teachers also have to choose between outputs: should students create an online exhibit or write a long-form paper?

Rather than an either/or, address both: making a good web argument is an additional skill, not a replacement for the long-form paper. Grade analytic and synthetic skills in the online medium, too.
Can a department design curriculum to embrace online arguments in one class and the five-thousand-word essay in another?
Anthologize can help switch between formats.
Archive Fever can serve as an inspirational intro for undergrads.

We ended with a brief discussion of online tools for browsing, to enhance the serendipity of the research process; see Harvard Shelf Life, with book size based on circulation. (My own later research reminded me of CommonsExplorer, additional work done with browsing large archival sets.)

Again, these are just a few notes from two sessions, to give a sense of both the theoretical and practical conversations you can have at a THATCamp. THATCamp offers the chance to see how your research project might be viewed by an instructional designer, an historian, an IT professional. You may be just starting out and in a learning mode, or you may be deeply involved in a particular project and want detailed feedback — I’ve found that there’s room for both, but you should be ready to introduce yourself and drive some conversations. THATCamp challenges attendees to be active in their participation, and learn the art of asking good questions and sharing good ideas. If this sounds appealing, THATCamp ACRL happens soon!