Situated Interpretation, Capacious Computation, Empowered Discovery

Tanya Clement is Assistant Professor in the School of Information at the University of Texas at Austin. Her research centers on scholarly information infrastructure as it impacts academic research, research libraries, and the creation of research tools and resources in the digital humanities. She has published in a wide range of publications including Cultural Analytics, Digital Humanities Quarterly, and Information & Culture. Her research on the development of computational methods to enhance description, analysis, and access to audio collections has been supported by the National Endowment for the Humanities and the Institute of Museum and Library Services. 

Thomas: You’ve spent a great deal of time researching and developing infrastructure to support computational analysis of recorded sound. A bit later I’ll ask more about that, but I’m curious where your interest in infrastructure, the affordances of digital vs. analog media, and the possibilities for humanistic inquiry latent in various computational approaches began. Was it a product of your graduate training, or some other combination of experiences?

Tanya: The most time I spent on the family’s Apple IIe when I was little was playing a bowling game. I did play one text-based game, but it wasn’t one of the ones you hear most DH scholars talk about, not Oregon Trail or Colossal Cave Adventure. In the game I played, the title of which I cannot recall, I died from botulism after opening a can of food. I should have known – the can was dented.

cans by lwr

Most likely my original interest in infrastructure came from math. My older brother, who went on to become a kind of math whiz, somehow figured out early on that math was a creative endeavor. You could do it all kinds of ways, even if the teachers only showed you one way in school. So, when I didn’t understand things at school (which was often), I would go home and figure it out for myself – pencils and erasers and books spread out all over the glass kitchen table. I approached a math problem as a constructed thing. I learned that I just had to figure out the best ways to build the math in order to use it towards a solution.

[pullquote]Literature was (and remains) a miraculous thing to me, an incredible thing that humans build.[/pullquote] Literature was (and remains) a miraculous thing to me, an incredible thing that humans build. Many in DH, tinkerers all, talk about a desire to know how things are built. The same was true for me in math and in fiction. When I did my MFA in Fiction at UVa (1998-2000), my primary question was: how did that author work that language to that effect? How does she build a person, a family, a community, a society, or a universe out of words on the page? While I was working on my MFA, I had a GAship in the Institute for Advanced Technology in the Humanities at UVa, which was being run at the time by John Unsworth, and in the eText Center, which was headed by David Seaman. I worked at both for a semester, and these jobs led me to a job as a Project Designer at Apex CoVantage. At the time, Apex was contracted by ProQuest and Chadwyck-Healey to digitize their microfilm collections, one of which was Early English Books Online (EEBO). The history of EEBO’s digitization is described elsewhere (see History of Early English Books Online and Transcribed by hand, owned by libraries, made for everyone: EEBO-TCP in 2012 [PDF]), alongside the collection’s oddities, so it is not shocking to point out that digitizing EEBO was not an exact science and that the collection remains riddled today with inexactitudes beyond anyone’s control. This job brought me close to the complexities behind building important digitized heritage collections, however, and that remains a central interest for me.

Whitney Trettien, “thumbprint of scanner visible”, STC / 887:16 EEBO

I could see that our cultural heritage and the infrastructures, both social and technical, that sustain it, preserve it, and make it accessible to us were constructed things – and, like all human-made things, constructed more or less well. In my graduate degree at the University of Maryland and in working on DH projects at the Maryland Institute for Technology in the Humanities (MITH), I learned that we have some agency and that telling the stories of a person, family, community, or society well depends on our enacting that agency in the humanities. Already attuned to the inexactitudes of representing the complexities of human culture, humanists are best situated to get our books and pencils out, spread them on the glass kitchen table, and work towards the best ways to build and sustain our cultural heritage in the digital age.

Thomas: I really appreciate the perspective of enacting agency through the Humanities. Over the past few years you’ve exercised that agency, in part, to develop computational use of audio collections in the Humanities. What are the primary opportunities and challenges in this space as they pertain to infrastructure?

Tanya: Libraries and archives hold large collections of audiovisual (AV) recordings from a diverse range of cultures and contexts that are of interest to multiple communities and disciplines. Historians, linguists, literary scholars, and biologists use AV recordings to document, preserve, and study dying languages, performances and storytelling practices, oral histories, and animal behaviors. Yet libraries and archives lack tools to make AV collections discoverable, especially collections of noisy recordings – recordings created in the forest (or other “crowded” ecological landscapes), on the street, with a live audience, in multiple languages, or in languages for which there are no existing dictionaries. These “messy” spoken word and non-verbal recordings lie beyond the reach of emerging automatic transcription software and, as a result, remain hidden from Web searches that rely on metadata and indexing for discoverability.

“Fiddling Bill Hensley and his rival for Old Time Fiddlers’ Championship (1938-1950), Asa Helton, both seated and talking, holding their fiddles, at the Mountain Music Festival, Asheville, North Carolina”, Library of Congress

Further, these large AV collections are not well represented in our National Digital Platform. The relative paucity of AV collections in the Europeana Collections, the Digital Public Library of America (DPLA), the HathiTrust Digital Library (HTDL), and the HathiTrust Research Center (HTRC), for instance, is a testament to the difficulties that the Galleries, Libraries, Archives, and Museums (GLAM) community faces in creating access to its AV collections. Europeana comprises 59% images and 38% text objects, but only 1% sound objects and less than 1% video objects. DPLA reports that at the end of 2014 it comprised 51% text and 48% images, with only 0.11% sound objects and 0.27% video objects. At this time, HTDL and HTRC do not have any AV materials.

The reasons behind this lack of resources range from copyright and sensitivity concerns to the current absence of efficient technological strategies for making digital real-time media accessible to researchers. CLIR and the LoC have called for “new technologies for audio capture and automatic metadata extraction” (Smith et al., 2004 [PDF]), with a “focus on developing, testing, and enhancing science-based approaches to all areas that affect audio preservation” (Nelson-Strauss, B., Gevinson, A., and Brylawski, S. 2012, 15 [PDF]). Unfortunately, beyond simple annotation and visualization tools or expensive proprietary software, open access software for accessing and analyzing audio using “science-based approaches” has not been widely used. When it is used with some success, it is typically on well-produced performances recorded in studios, not, for example, on oral histories made ad hoc on the street.

[pullquote]Can we make data about sound collections verbose enough to enable an understanding of a collection even if and when that collection is out of hearing reach because of copyright or privacy restrictions?[/pullquote] We need to do a lot of work to prepare infrastructures that can better facilitate access to audio, especially at the level of usability, efficacy, and sustainability. For instance, we don’t yet know what kinds of interfaces facilitate the broad use of large-scale “noisy” AV analyses across a diverse range of disciplines and communities. Sound analysis is pretty technical. How do we learn to engage with its complexities in approachable ways? Further, how much storage and processing power do users need to conduct local and large-scale AV analyses? Finally, what are the local and global sustainability issues? What metadata standards (descriptive or technical) map to these kinds of approaches? Can we make data about sound collections verbose enough to enable an understanding of a collection even if and when that collection is out of hearing reach because of copyright or privacy restrictions?

Thomas: Now seems like a good time to discuss your audio collection infrastructure development. If you were to focus on a couple of examples of how this work specifically supports access and analysis what would they be?

Tanya: Specifically, we’ve been working on a tool called ARLO, originally developed by David Tcheng, who used to be a researcher at the Illinois Informatics Institute at the University of Illinois at Urbana-Champaign, and Tony Borries, a consultant who lives in Austin, Texas. They created ARLO to help an ornithologist, David Enstrom (also from UIUC), use machine learning to find specific bird calls in the hundreds of hours of recordings he had collected. Recording equipment has become much more powerful and cheaper over the last decade, and scholars and enthusiasts from all kinds of disciplines have more recordings than there are human hours to analyze them all. Our hope is to develop ARLO so that people have a means of creating data about their audio, so that they can analyze that data, share it, or otherwise create information about their collections that can make them discoverable.

Tanya Clement, “A visualization of a song in ARLO”, from Machinic Ballads: Alan Lomax’s Global Jukebox and the Categorization of Sound Culture

We are still very much in the research and development phase, but we have worked on a few of our own projects and helped some other groups in our attempt to learn more about what scholars and others might want in such a tool. For example, in a piece I wrote last year I discuss a project we undertook to analyze the Alan Lomax recordings in the John A. Lomax Collection of the UT Folklore Center Archives at the Dolph Briscoe Center for American History at the University of Texas at Austin. We used machine learning to find instrumental, sung, and spoken sections in the collection. Using that data, we visualized these patterns across the recordings. It was really the first time such a “view” of the collection was afforded, and it sparked a discussion about how the larger folklore collection at UT reflects the changing practices of ethnography and field research in folklore studies, from the decades represented by the Lomax collection to more recent ones. With the help of Hannah Alpert-Abrams, PhD candidate in Comparative Literature at UT, we used ARLO in a graduate course taught at LLILAS Benson Latin American Studies and Collections by Dr. Virginia Burnett called History of Modern America through Digital Archives. Students identified sounds of interest in the Radio Venceremos collection of digital audio recordings of guerrilla radio from the civil war in El Salvador. Among the sounds students used ARLO to find were bird calls, gunfire, specific word sequences, and music.

In another project, School of Information PhD candidate Steve McLaughlin and I used ARLO to analyze patterns of applause across 2,000 of PennSound’s readings. We discovered different patterns of applause in the context of different poetry reading communities. These results are more provocative than prescriptive, but our hope was to show that these kinds of analyses were not only possible but productive. We are still working through how to approach challenges in this work that come up in the form of usability (What kinds of interfaces and workflows are most useful to the community?), efficacy (For what kinds of research, pedagogical, and practical goals could ARLO be most useful?), and scalability (How do we make such a tool accessible to as many people as possible?).
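The workflow described here – marking examples of a sound such as applause or singing, then training a model to label the rest of a collection – can be sketched with simple spectral features and an off-the-shelf classifier. The features and classifier below are illustrative assumptions, not ARLO’s actual pipeline; synthetic noise-like windows stand in for applause and tonal windows for music:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(window):
    """Two crude descriptors: zero-crossing rate (high for
    noise-like sound such as applause) and spectral flatness
    (high when energy is spread evenly across frequencies)."""
    zcr = np.mean(np.abs(np.diff(np.sign(window)))) / 2
    mag = np.abs(np.fft.rfft(window)) + 1e-9
    flatness = np.exp(np.mean(np.log(mag))) / np.mean(mag)
    return np.array([zcr, flatness])

rng = np.random.default_rng(1)
sr = 16000
t = np.linspace(0, 0.25, sr // 4, endpoint=False)

def tonal_window():   # stand-in for sung/instrumental audio
    return np.sin(2 * np.pi * rng.uniform(200, 800) * t)

def noisy_window():   # stand-in for applause-like audio
    return rng.normal(0, 1, len(t))

# "Mark" 40 examples of each class, then train a classifier
X = np.array([features(tonal_window()) for _ in range(40)] +
             [features(noisy_window()) for _ in range(40)])
y = np.array([0] * 40 + [1] * 40)       # 0 = tonal, 1 = applause-like
clf = LogisticRegression().fit(X, y)

# Label unseen windows, as one would across a whole collection
pred = clf.predict([features(noisy_window()), features(tonal_window())])
```

Real collections would need richer features and hand-marked training snippets, but the mark-train-predict loop is the same shape.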

Thomas: In An Information Science Question in DH Feminism, you argue for a number of ways that feminist inquiry can help us better understand epistemologies that shape digital humanities and information science infrastructure development. How has this perspective concretely shaped your own work and thinking in this space?

Tanya: What has shaped my work is very much in line with this piece. Many people in STS (Science and Technology Studies) and information studies have written about the extent to which information work and information infrastructures are invisible work (Layers of Silence, Arenas of Voice: The Ecology of Visible and Invisible Work). Feminist inquiry has always been about making the invisible aspects of society more apparent, but it is also about how you take stock of those perspectives in your articulation of research. Everyone’s perspective is shaped by gender (or, really, the construct of gender), but it is also influenced by other aspects of your situated perspective in the world, including your nationality, your ability status, your day-to-day living as a parent, a child, a sibling, a spouse, a friend, or any other aspect of your personhood that shapes the way you address and understand the world.

[pullquote]I’ve tried to advocate for developing tools, infrastructures, and protocols that invite others to address research questions according to their own needs.[/pullquote] The concrete ways (the particular or specific ways) that my own situated look at the world has shaped my own work is perhaps less interesting than the ways I’ve tried to advocate for developing tools, infrastructures, and protocols that invite others to address research questions according to their own needs. One aspect of ARLO that continues to intrigue me is the possibility of searching sound with sound. You choose a sound that interests you, you mark it, and you ask the machine to find more of those sounds. Now, what I like about this scenario is that a linguist might mark a sound because it includes a diphthong; someone else might mark the same sound because of the tone; a third person might be interested in the fact that this same snippet is spoken by an older man, a younger woman, or a child.

That our understanding of sound is based on a situated interpretation seems readily apparent, especially compared to search scenarios in which words seem to pass as tokens that once represented complex ideas. You can mark gunshots or laughter or code-switching moments, when a person intermittently uses one language to express something that a society’s dominant language (let’s say English) can’t quite express. The general point, or hope, is that the process of choosing a sound for searching can be inviting in ways that are different from the process of choosing a single search term. In comparison to search terms taken out of context, sound snippets remain more complex even with the absent presence of the missing context. It’s as if sounds have more dimensions, even when they are clipped from a longer recording. I like working with sound for these reasons.
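The search-sound-with-sound scenario can be illustrated as query-by-example matching: slide the marked snippet’s spectrogram across a recording and rank windows by similarity. This is a minimal sketch under assumed choices (log spectrograms, cosine similarity), not ARLO’s implementation:

```python
import numpy as np
from scipy import signal

def log_spectrogram(audio, sr=16000):
    """Log-magnitude spectrogram, rows = time frames."""
    _, _, spec = signal.spectrogram(audio, fs=sr, nperseg=256, noverlap=128)
    return np.log1p(spec).T

def find_similar(query, recording, sr=16000, top_k=3):
    """Rank windows of `recording` by cosine similarity to the
    marked `query` snippet (a crude query-by-example search)."""
    qs = log_spectrogram(query, sr)
    n = qs.shape[0]                       # query length in frames
    q = qs.ravel()
    q = q / (np.linalg.norm(q) + 1e-9)
    rec = log_spectrogram(recording, sr)
    scores = []
    for start in range(rec.shape[0] - n + 1):
        w = rec[start:start + n].ravel()
        w = w / (np.linalg.norm(w) + 1e-9)
        scores.append((float(q @ w), start))
    return sorted(scores, reverse=True)[:top_k]

# Toy example: a 440 Hz "marked sound" hidden in two seconds of noise
sr = 16000
t = np.linspace(0, 0.5, sr // 2, endpoint=False)
query = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
recording = rng.normal(0, 0.1, 2 * sr)
recording[sr:sr + len(query)] += query    # tone starts at 1.0 s
matches = find_similar(query, recording, sr)
```

Whatever the listener cared about when marking the snippet – a diphthong, a tone, a speaker – rides along in the spectrogram, which is part of why this kind of search stays open to multiple interpretations.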

Thomas: Whose data praxis would you like to learn more about?

Tanya: There is quite a bit of interesting data work going on in digital humanities. I appreciate Lauren Klein’s attempt to unravel different histories of data visualization, which helps us better understand where we are by looking at where we’ve come from, as well as Christine Borgman’s Big Data, Little Data, No Data: Scholarship in the Networked World, which exposes the daily practices of scholars who work with data and how those practices influence interpretation.

I have been lucky to participate in the inaugural issue of the Journal of Cultural Analytics, which is an attempt to provide a platform for showcasing how researchers in the humanities can use data to study literature, culture, media, and history. I also contributed to Digital Pedagogy in the Humanities: Concepts, Models, and Experiments, in which Daniel Carter and I write about ten pedagogical assignments that seek to teach students about the situated elements of data in terms of its collection and use. With each of these examples, I am drawn to work that invites us to critique or understand data as a deeply political phenomenon.

Museum as Play: Iteration, Interactivity, and the Human Experience

Sebastian Chan is Chief Experience Officer at the Australian Centre for the Moving Image. Previously he held positions as Director of Digital & Emerging Media at the Cooper Hewitt Smithsonian Design Museum and Head of Digital, Social and Emerging Technologies at the Powerhouse Museum. His work spans consideration of digital and physical spaces and has been recognized by organizations including but not limited to Fast Company, Core77, American Alliance of Museums, and Museums and the Web.

Thomas: Your positions at the Australian Centre for the Moving Image, the Cooper Hewitt Smithsonian Design Museum, and the Powerhouse Museum all in some way focus on the digital aspect of the museum experience. Looking across your career, what combination of experiences and dispositions led you to these types of roles and the responsibilities they come with?

Seb: To be perfectly honest, it’s been a journey of good fortune and having great managers and mentors.

I ended up in the cultural heritage world largely because I had had enough of writing a PhD on the geographies of music subcultures and was working in IT as an escape route. That led to a systems administration role at the Powerhouse Museum, whose previous Y2K project manager had unexpectedly departed in mid-1999. The year 2000 was also the year of the Sydney Olympics, and the Powerhouse had a huge “Treasures of Ancient Olympia” exhibition planned. Tim Hart and Sarah Kenderdine were implementing an immersive 3D reconstruction of Olympia both online and in the exhibition (tiny remnants available), and one day Tim popped down to the IT department; he knew that I had some understanding of 3D graphics acceleration and gaming hardware from my time as a videogame reviewer, so I got drafted into the project to do specialist technical support on it. After that I was more heavily involved in web projects, and in 2003 I separated from IT and started an independent web unit which reported to Associate Director Kevin Sumption. This autonomy from both IT and Marketing, and strong alignment with curatorial, meant that we were able to do some interesting projects that otherwise wouldn’t have happened, such as a series of “games” around exhibition content and themes: the design process, the mathematics of gambling, and environmental impact calculators. Two really important “failed” projects were Soundbyte, a music education resource and rudimentary social network connected to the museum’s digital music and media labs (now called Thinkspace), and Behind The Scenes, a back-of-house virtual tour and basic collections highlights experience.

Soundbyte ended up winning some awards – but it failed because we really underestimated the social part of it from both a community management perspective and in terms of technical architecture. However, much of what we learned during that process helped us with the Powerhouse’s later push into social media and associated open content initiatives. Behind The Scenes was a faster failure – the Flash interface looked fantastic but its architecture was very problematic. However, in building Behind The Scenes we made a series of rudimentary connectors that opened up programmatic access to the collection management system – and these small bits of code ended up forming the basis of what would become the Electronic Swatchbook project and later the first version of the online collection database, OPAC2.0.

Electronic Swatchbook
Electronic Swatchbook

OPAC2.0 was the start of a new wave of work at Powerhouse. The teams I managed grew significantly and from 2006 onwards there was a lot of activity around getting the collection out to the world – first via the database, then data releases, an API, and through various social platforms. The Powerhouse also launched Design Hub (later dHub) as a portal around design content and collections, and a new children’s site that distributed CC-licensed craft activities and games for under 8s and their parents. In 2008 this led to being commissioned to create cross-government experimental projects – a baby names voyager, an experimental semantic web collections portal, and a multi-agency events calendar and app for parents.

This was on top of all the exhibition projects and other things that the teams did. But at the end of the day it was the collection – its diversity and scale – that lay at the heart of most of this work. We pretty quickly realized that the value of a museum’s collection lay in the public’s ability to interact and engage with it and so there was a lot of rapid experimentation around new interfaces and new platforms through which to provide access to the collection.

freshandnew.org

We even collaborated with artist Craig Walsh on what was meant to be a virtual monster inside a box that devoured collections – the web interface was the “feeding tube”, so to speak, drawing edible collections from visitors’ choices and uploads.

A huge amount of work was done – we made many things that didn’t work out as planned – and I worked with and managed a very talented pool of individuals, all of whom have gone on to bigger and better things all around the globe. As you move up the org chart you inevitably become further and further removed from production and certainly from writing code – you’re more in the role of a conductor than soloist.

Eventually my interest started to wane – and a set of coincidences meant that in mid-2011, after a visit to New York, I found myself in a very late-night Skype chat-turned-interview with then-Director Bill Moggridge and his Director of Marketing, Jen Northrop, at the Cooper Hewitt, who were looking for a Director of Digital & Emerging Media. They knew of my work from the web and through some research collaborations a few years earlier, when I had taught a workshop at the Cooper Hewitt on social media and collections. Of course, I also knew Bill’s work and career from following IDEO, and the museum itself from its education programs and quirky exhibitions – and my wife had expressed a strong interest in living in New York – so it sounded like an interesting and unique challenge. Bill obviously understood the challenge of moving a family internationally and went out of his way to make it work within the Smithsonian structure – in fact, his negotiations with Washington took so long that by the time he emailed back with an offer, I thought they had probably hired someone else!

Cooper Hewitt was a fascinating experience – especially coming into an organization that had a strong desire, but few muscles, to bring to life a very different vision of the institution. When I left Powerhouse I became acutely aware of two things: one, that really significant change is easiest when you can stop everything else and close your main galleries, and two, that Australian institutions are much more inherently visitor-focused (and have been for a long time) than their North American counterparts.

Bill had a very generous way of working, and he wanted to make the most of the multi-million dollar renovation that Cooper Hewitt had just begun. There’s a great interview with him in Fast Company, published a few weeks before I landed, in which his discontent with the building, its architecture, and the ‘traditional visitor profile’ is obvious. It is also obvious that he treated the idea of the museum itself as a very malleable construct – and in those early months we got some major structural changes through that might have been more difficult in other circumstances. Three months into the job, the collection metadata had been released under a CC0 license – a first for the Smithsonian – and by mid-year I’d been able to grow my team, bringing Micah Walter on as a proper staff member and hiring Aaron Cope, who had been thinking about his next steps after working at Stamen. The AV duo from Education were also added to my group, and Katie Shelly slowly transformed from video producer into a hybrid videographer and UX advocate. In the sprint to the museum opening we also added Sam Brenner, a super talented developer fresh from NYU’s ITP.

Cooper Hewitt Collection Data

We had also begun the concept stage of the new museum with Diller Scofidio + Renfro, and my team was working a lot with Local Projects, who had been hired as media designers. Then suddenly Bill took ill, and several weeks later he passed away from brain cancer.

Everyone was in shock.

Most of the work after that was driven by a sense of trying to bring the vision of a more porous, more generous, and more diverse and playful museum to reality. Most people know the story after that – many things got made, all of which are documented over at Cooper Hewitt Labs – and my team got to do some amazing work with lots of collaborators inside and outside the museum.

ACMI is a different beast altogether. It is really interesting to me because it involves taking a museum that is already very successful – 1.25m visitors each year – and working with a dynamic executive team to create a more experimental and fluid institution, which almost certainly necessitates breaking a few of the very things that have led to its current success. I’m also perversely excited by the sheer challenge of working with contemporary media and copyright – this is a museum that deals with cinema, TV, video games, and contemporary media art, so there’s very little that is simple in terms of IP. Similarly, the first question everyone asks me is “why would I go to a museum about things that I can watch on Netflix or play on my PlayStation, Xbox, or through Steam?” I think that needs a series of razor-sharp responses – some of which will be visibly articulated in new ways in the coming months.

So that’s a potted history of how I’ve ended up where I am now.

The missing piece I haven’t mentioned is my other life in music as a DJ and event and festival producer. It is that other life that really underpins my constant focus on, and interest in, improving access to, and the human experience of, both museums and their contents. I started out in public radio during my final year of high school, and for over 20 years I’ve been part of a DJ/live/FX duo that was all about introducing dance floors to new music, as well as creating physical events and environments in which people open up to new sonic and sensory experiences. Perhaps subconsciously I’ve treated museum collections like obscure records and sample sources – and the purpose of my work over the last 15 years or so has been liberating those collections and making them not just accessible, but enticing and useful to the public.

Thomas: What work inspires you right now?

Seb: Right now I’m interested in the work of Anab Jain and her practice Superflux, Ingrid Burrington and her work on the infrastructure of the internet, Amy Rose and May Abdalla and their immersive documentary work as Anagram, Jason Scott’s continuing amazing work on video game archiving and preservation, those working at the intersection of cultural orgs/exhibits/digital like Tellart, as well as all the usual museum/cultural sector suspects who I’m sure everyone is already following and reading about.

Jason Scott, JSMESS

Thomas: You mentioned above that your museum collections work might be subconsciously influenced by your passion for music and fostering community around it. I really like that! If you were to distill some core lessons on building digital collections and establishing community in a digital environment around them, what would they be? Is there any particular project you have in mind that illustrates these lessons?

Seb: The core lessons are best told as a recounting of how my teams learned those lessons.

At the Powerhouse the OPAC2.0 project opened up a huge number of vectors into the collection – and we learned a huge amount about what did and didn’t work through that process. The Powerhouse was one of the first museums to release its collection (as a raw data file), closely followed by a public-facing API, yet these were much more helpful internally than externally, in that they allowed us to work with and see the shape of the collection more easily. My team at Cooper Hewitt did the same – data release followed by the API – but at Cooper Hewitt, Aaron Cope spent a lot of time making the web interface itself a lot more linguistically inviting, which made all the difference.

Let me explain that a bit better.

When we were designing and building the Powerhouse online collection we were coming from a very low base. None of the collection was online, and we had just seen the failure of our Behind The Scenes (BTS) project. BTS had presented some top-level collection highlights but had been built in Flash on top of ColdFusion (anyone remember ColdFusion?!), and alongside that we had three old specialist collections in their own little portals – the Sydney Olympic Games collection and two photographic collections, the Tyrrell archive and the Hedda Morrison archive. All of these specialist collections presumed an interest in and knowledge of the collections’ contents and as a result weren’t particularly browsable. Through the Electronic Swatchbook project, though, we had designed and built an interface that was entirely based on browsing and the interrelationship between objects using tagging, because the individual swatches weren’t catalogued (or able to be). And we’d also seen a lot of traffic and downloading of the swatches – which taught us the value of browsable interfaces and open access/public domain releases.

So Giv Parvaneh and I started thinking about how we might apply a swatchbook-like approach to the whole collection. After all, in porting the BTS project from Coldfusion to PHP we had built a very rudimentary library for extracting object metadata from the enterprise collection management system. After OPAC2.0 went live in late 2006 we experienced extremely rapid traffic growth, almost entirely driven by the collection – and with that came other challenges. We were completely unprepared – and understaffed – for success. Imagine you were running a museum and suddenly, and consistently, double the number of visitors started arriving at your museum’s door each day asking new types of questions. You’d hire some more front-of-house staff, get curators and subject matter experts out on the floor, and deal with the increase – but online, when this happens, it remains business as usual.

Following OPAC2.0 came the work with Flickr and the Commons on Flickr. I spoke at WebDirections in 2007, and George Oates was on the lineup too. While she was in Sydney she came and visited Powerhouse and told me that she was about to go live with an exciting collaboration with the Library of Congress. We agreed to keep in touch and try to get Powerhouse’s historical photo archive online if LoC was open to expanding the project. LoC went live in January 2008, and Powerhouse then became the first museum in the Commons on Flickr in April 2008. Flickr created a huge audience for the historical photographs, and during 2008/9 we did a lot of experiments in integrating the user-generated metadata from Flickr (tags) and comments into both the museum’s workflows and OPAC2.0. Paula Bray, who headed up Image Services, did a lot of work with Flickr, hosting events and even publishing a book of user-generated comments – a kind of user-generated catalogue.

from “Then and now: stories from the Commons”

The socialising of the collection on Flickr taught us a lot: there were bigger, more general audiences out there, but the route to those audiences was often controlled by third parties who might have different business agendas. After George left Flickr, the Commons was effectively put on hold by Yahoo for several years, and Flickr's user base changed as other products and services appeared on the market. The Powerhouse's collections are still there – they weren't removed, as Brooklyn Museum's were – but the Flickr experience also demonstrated the importance of continually supporting and feeding the community. It wasn't as though engagement could be outsourced.

There’s good documentation of this period on Fresh & New as well as in these Museums and the Web papers by various team members:

Tagging and Searching—Serendipity and Museum Collection Databases – which covers the OPAC2.0 project and very early results on usage and tagging behaviors. These changed quite substantially in the following years and so the early honeymoon period didn’t end up being representative of the longer term. Changes in the way that Google operated meant that a lot of the early SEO gains were diminished from 2010 onwards.

Uniting the Shanty Towns—Data Combining across Multiple Institutions – which covers the early work building About NSW with Dan McKinley and Greg Turner (who later founded the Interaction Consortium), and Renae Mason (now at Museum of City of New York).

Flickr Commons: Open Licensing and the Future for Collections – my former colleague Paula Bray writing on the experience with Flickr.

Reprogramming The Museum – former colleague Luke Dearnley writing on the museum’s data release and API as well as architectural decisions in that period.

Skip forward a couple of years to 2012 and Cooper Hewitt.

After the CC-Zero release of the Cooper Hewitt collection metadata in early 2012, we hired Aaron Cope onto the Digital & Emerging Media team. He wanted to be “head of internet typing,” but we finally went with “head of engineering.” He built the alpha version of the Cooper Hewitt collection site in his first three months, and it became a fertile proving ground for much of what would go into the making of the new exhibition experiences and The Pen.

The API at the center of the museum

Aaron, Micah Walter, and I did a lot of work to ensure that the collection would be at the heart of the new museum, and Aaron's experience in building usable and successful APIs was key to putting Cooper Hewitt on the map. When I arrived at Cooper Hewitt at the end of 2011, there were only 10,000 objects online, in the vanilla web interface from the collection management system vendor, and the museum was completely unknown in the digital humanities world. By the end of 2012, almost all of the collection was online, and the team picked up awards from AAM and Museums and the Web for the alpha version.

The value of rapid, publicly visible work was key to the Cooper Hewitt’s success here.

Thomas: Excellent lessons for anyone thinking about APIs and collections, interface development, platform utilization, and community engagement. In closing, whose data praxis would you like to learn more about?

Seb: Tim Sherratt, Mia Ridge, Mitchell Whitelaw, Geoff Hinchcliffe, Elisa Lee – all Australians doing fascinating work in the digital humanities and its intersection with interaction design.

RESOURCE: “Out of Cite, Out of Mind” Report on Data Citation

Out of Cite, Out of Mind,” a new report looking at issues surrounding data citation has been released by the US Committee on Data for Science and Technology (CODATA) and the Board on Research Data and Information (BRDI).

The report discusses the current state of data citation policies and practices and their supporting infrastructure, a set of guiding principles for implementing data citation, challenges to implementing good data citation practices, and open research questions.

RESOURCE: Keeping Up With… Big Data

The latest issue of the Association of College and Research Libraries’ (ACRL) Keeping Up With… publication is devoted to big data. Written by Mark Bieraugel (Business Librarian at California Polytechnic State University), it covers the nuts and bolts of the topic and offers a bibliography with sections such as “Big Data and the Academy,” “Privacy and Criticism,” “Tutorials,” and “Sandboxes.”

Bieraugel advises humanities and social science librarians to recognize that “big data is becoming more commonplace in their disciplines as well, and is no longer restricted to corpus linguistics.” He goes on to advocate for the role of libraries in data curation: “Librarians also need to embrace a role in making big datasets more useful, visible and accessible by creating taxonomies, designing metadata schemes, and systematizing retrieval methods.”