TOOL: Fulcrum

The University of Michigan Press/Michigan Publishing and the University of Michigan Library IT announced the beta launch of a new digital tool, Fulcrum. The platform uses the Hydra/Fedora framework and “helps publishers present the full richness of their authors’ research outputs in a durable, discoverable, and flexible form.” During this beta phase, the platform will feature its first project, The Director’s Prism: E.T.A. Hoffmann and the Russian Theatrical Avant-Garde, forthcoming from Northwestern University Press.

Fulcrum is focused on presenting digital source and supplemental materials that cannot be adequately represented in print. It offers readers a richer experience and deeper understanding of the material and enables authors to make better, more multi-faceted arguments. The platform readily supports multimedia content, including playback for audio and video files and pan-and-zoom capability for high-resolution images. All content is discoverable and preserved via durable URLs, and structured metadata and faceted search results allow for further exploration of the materials.

RESOURCE: Tabulizer, an R package for working with Tabula

The rOpenSci project has released tabulizer, an R package that provides bindings to the Tabula Java library.

Tabula is a tool for extracting data from PDF tables:

If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there’s no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface.

rOpenSci develops R packages “that provide programmatic access to a variety of scientific data, full-text of journal articles, and repositories.”
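
For readers who want to try it, here is a minimal sketch of the workflow, assuming a working Java runtime, an installed copy of the package, and a placeholder file name (“report.pdf”); extract_tables() is the package’s core function, and each detected table comes back as an R object that can be written out as CSV:

    # tabulizer requires a Java runtime; see rOpenSci's installation notes
    library(tabulizer)

    # Extract every table Tabula detects in the PDF ("report.pdf" is a placeholder)
    tables <- extract_tables("report.pdf")

    # Each element is a character matrix; write the first table out as CSV
    write.csv(tables[[1]], "table1.csv", row.names = FALSE)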

RESOURCE: Online Tool Aims to Help Researchers Sift Through 15 Centuries of Data

North Carolina State University has announced a new tool, the Big Data Infrastructure Visualization Application, or BigDIVA. Created by digital humanities scholars from NC State and Texas A&M University, BigDIVA will allow users to search thousands of scholarly articles and archival items spanning from 450 A.D. to the 20th century using a visual interface.

One of the project creators, Tim Stinson (NC State), says, “Our goal in developing BigDIVA was to create a tool to help us explore our cultural heritage and facilitate scholarship in fields ranging from literature and religion to art and world history…And we think we delivered.” Stinson goes on to say that the current plan is to “market BigDIVA as a subscription-based service to libraries and the higher education community.”

TOOL: Text to Image Linking Tool (TILT)

The British Library Digital Scholarship blog posted an update on the Text to Image Linking Tool (TILT), which was one of the winners of the British Library Labs Competition 2014.

TILT tackles the challenge of making manuscripts machine readable by “link[ing] the transcription to the page-image at the word-level.” Here’s how the tool works:

As the user moves the mouse over, or taps on, a word in the image or in the transcription the corresponding word can be highlighted in the other half of the display, even when the word is split over a line. And if needed the transcription can be scrolled up or down so that it automatically aligns with the word on the page. And now the ‘excise’ drops back to a low level.

TILT’s project leaders plan to have the tool in “demonstrable and usable form” by October 2014.

 

TaDiRAH: Building Capacity for Integrated Access

In this post, Quinn Dombrowski (UC Berkeley) and Jody Perkins (Miami University in Ohio) introduce the digital humanities taxonomy project known as TaDiRAH, reviewing the motivating factors behind its inception and outlining future goals of the project. Both are members of the TaDiRAH Coordinating Committee.

TaDiRAH, the Taxonomy of Digital Research Activities in the Humanities, is the result of a year-long project undertaken by the DiRT (Digital Research Tools) Directory and DARIAH-DE (Digital Research Infrastructure for the Arts and Humanities) to develop a shared taxonomy that can be used to organize the content of sites as diverse as the DARIAH Zotero bibliography ‘Doing Digital Humanities’, the DiRT directory, and the DHCommons project directory.

Motivations

TaDiRAH was developed in part as a response to the evolving needs of the DiRT directory, a longstanding, well-regarded source of information about available tools that support scholarship in the humanities. From its inception, DiRT has sought to engage a broad audience of tool users by limiting the use of jargon, and categorizing tools by the task(s) they perform, rather than using a more abstract taxonomy. A wiki format was originally chosen to ensure a low barrier to entry, providing a great deal of flexibility and allowing the site to develop quickly without a specific source of funding.

As the number of resources grew, the wiki platform became unwieldy. Consequently, DiRT was completely rebuilt in 2011 using Drupal, an open-source content management system that provided more structure and gave each tool a unique “profile” page. The platform supports browsing, sorting, and searching the entire directory across a variety of facets, including tool category, cost, license, and developer. As of May 2014, the DiRT directory contains approximately 800 tool listings and receives roughly 3,000 unique visitors and 16,000-20,000 pageviews per month. It has received funding from the Mellon Foundation for a new phase of technical development that includes APIs to enable data exchange with DHCommons and Commons In A Box, a new feature for submitting tool reviews, and “recipes” that document how different tools can be combined to address research questions.

[pullquote]This project represents one of many data streams moving toward a networked integration of related hubs in the DH resource ecosystem.[/pullquote]

Early in 2013, members of the DiRT Steering Committee/Curatorial Board began looking at options for improving the site, including an examination of how contributors were using the existing taxonomy. Following an analysis of the existing categories and free-form tags, we began a series of discussions with the DARIAH-DE team that created the Zotero bibliography (Christof Schöch, Matt Munson, Luise Borek), who had already begun work on a taxonomy of digital humanities activities. Recognizing our common goal, we formed a transatlantic collaboration around the task of developing a shared taxonomy. Based in Europe, DARIAH aims to enhance and support digitally enabled research and teaching across the humanities and the arts. The DARIAH infrastructure will be a connected network of people, information, tools, and methodologies for investigating, exploring, and supporting work across the broad spectrum of the digital humanities. DARIAH-DE represents the German contribution to DARIAH.

How does it work?

Although the motivating factors behind the development of TaDiRAH are pragmatic, TaDiRAH and its antecedents are not without more theoretical and scholarly influences, including the concept of “scholarly primitives”[1. Unsworth, John. 2000. “Scholarly Primitives: What Methods Do Humanities Researchers Have in Common, and How Might Our Tools Reflect This?” London: King’s College London], DARIAH research into modeling the research process [2. See, for example: Benardou, Agiatis, Panos Constantopoulos, Costis Dallas, and Dimitris Gavrilis. “Understanding the Information Requirements of Arts and Humanities Scholarship.” International Journal of Digital Curation 5, no. 1 (June 22, 2010): 18–33. doi:10.2218/ijdc.v5i1.141; Ruth Reiche, Rainer Becker, Michael Bender, Matthew Munson, Stefan Schmunk, Christof Schöch: “Verfahren der Digital Humanities in den Geistes- und Kulturwissenschaften” DARIAH-DE Working Papers Nr. 4. Göttingen: DARIAH-DE, 2014. http://webdoc.sub.gwdg.de/pub/mon/dariah-de/dwp-2014-4.pdf], and research on digital scholarly methods in the humanities.[3. See Borgman, Christine. Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge: MIT Press, 2010; Gasteiner, Martin, and Peter Haber, eds. 2010. Digitale Arbeitstechniken für die Geistes- und Kulturwissenschaften. Vienna: UTB; and Siemens, Ray, John Unsworth, Susan Schreibman, eds. 2004. A Companion to Digital Humanities. Oxford: Blackwell] Unsworth’s “scholarly primitives” were developed with an eye towards practical applications: the “primitives” were functions of scholarship that could be embodied in tools, which could then be combined to achieve “higher order functions” (similar to DiRT’s “recipes”). Later work on articulating and organizing stages and aspects of research activity provides a more process-oriented approach to understanding scholarship. Both ways of breaking scholarship down into its constituent parts can, when their terms are used to categorize tools, help a user understand how and when a given tool might apply to their research and what other tools might complement it.

[pullquote]Two rounds of detailed, thoughtful feedback from the digital humanities community played a significant role in shaping the taxonomy.[/pullquote]

The taxonomy does not aim to be comprehensive, focusing instead on a subset of relatively broad categories that are widely used and generally understandable. It is expected to be most useful to projects seeking to collect, organize and provide access to information on digital humanities tools, methods, projects, or readings.

The current version of the taxonomy is based upon three primary sources:

  1. the arts-humanities.net taxonomy of DH projects, tools, centers, and other resources, especially as it has been expanded by digital.humanities@oxford in the UK and DRAPIer in Ireland;
  2. the categories and tags originally used by DiRT; and
  3. the DARIAH ‘Doing Digital Humanities’ Zotero bibliography of literature on all facets of DH.

These resources were studied and distilled into their essential parts, producing a simplified taxonomy of two levels: eight top-level goals, broadly based on the steps of the scholarly research process, and, beneath each goal, a number of general methods that scholars typically use to achieve it. Guided by the principle of separating research activities from research objects, and by the experience of managing earlier taxonomies, we created two additional open-ended lists for techniques and digital humanities research objects. Terms from either or both of these lists can be combined with any goal and/or method to further describe the activity. Two rounds of detailed, thoughtful feedback from the digital humanities community played a significant role in shaping the taxonomy, particularly the choice to treat techniques as a separate list rather than forcing them awkwardly into a third level of the main taxonomy.
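
To make the structure concrete, here is a minimal sketch of how a single directory entry might be annotated: one top-level goal, one method beneath it, and optional terms from the open lists of techniques and objects. The tool name is hypothetical, and the specific terms are illustrative examples rather than an authoritative reading of the taxonomy:

    # A hypothetical DiRT-style listing tagged with TaDiRAH terms (values are illustrative)
    tool_entry <- list(
      name      = "ExampleTopicTool",   # hypothetical tool name
      goal      = "Analysis",           # one of the eight top-level goals
      method    = "Content Analysis",   # a method nested under that goal
      technique = "Topic Modeling",     # drawn from the open list of techniques
      object    = "Text"                # drawn from the open list of research objects
    )
    str(tool_entry)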

Acknowledging the impossibility of creating categories that would always be mutually exclusive, we aimed to create groupings distinct enough from one another to produce a level of consistency in application that would support interoperability and enhance discovery. We separated compound categories used by DARIAH (e.g. dissemination and storage), collapsed many of DiRT’s more granular categories (image editing and textual editing became editing plus an object), and added categories from both that were not easily mapped in either direction (e.g. designing and organizing). Decisions about what would be considered a “method” and what would be treated as a “technique” were sometimes contentious; if more than one activity could be used to achieve the same ends, those activities were usually classed as techniques. Having open lists of techniques and objects will make it easier for TaDiRAH to keep up with a fast-changing field, as we anticipate those lists evolving far more quickly than the goals or methods.

This project represents one of many data streams moving toward a networked integration of related hubs in the DH resource ecosystem. It will help to address the de-contextualization that is an unavoidable consequence of the move away from comprehensive sites that are difficult to sustain. TaDiRAH allows topically-restricted sites like DiRT (tools) and DHCommons (projects and collaborators) to focus on curating one particular kind of content, while still providing a way to identify and connect related information.

Future Steps

This summer, DiRT will undertake a comprehensive review of each tool entry. Terms from the TaDiRAH taxonomy will be added as part of this process. DHCommons staff will, similarly, add TaDiRAH terms to project profiles based on existing free-form metadata. Information from DiRT and DHCommons will be exposed using RDF, making this content available as linked open data, as well as through the APIs that are currently under development as part of the Mellon-funded integration initiative.
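
As a rough sketch of what that linked-data exposure could look like, the snippet below writes a single Turtle triple connecting a tool listing to a TaDiRAH term; the URIs are placeholders and the choice of dcterms:subject as the linking predicate is an assumption, not the projects’ published data model:

    # Write one illustrative triple to a Turtle file; all URIs are placeholders
    triple <- c(
      "@prefix dcterms: <http://purl.org/dc/terms/> .",
      "",
      "<http://example.org/dirt/tool/example-tool>",
      "    dcterms:subject <http://example.org/tadirah/analysis> ."
    )
    writeLines(triple, "example-tool.ttl")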

Applying TaDiRAH to actual directories will provide an opportunity to assess how well it can accommodate real-world data. We anticipate revising TaDiRAH periodically in response to issues that arise during this process, as well as to feedback from those who have used it in other ways (e.g. Micah Vandegrift, Scholarly Communications Librarian at Florida State University, has pointed to TaDiRAH as a resource for introducing digital humanities to undergraduates by using it as a guide to the roles within digital humanities projects).

DARIAH-EU has also committed to using this taxonomy as a basis for its development of a more complex ontology of digital scholarly methods, and we are engaged in ongoing dialog with other ontology initiatives, including NeDiMAH’s work on scholarly methods. NeDiMAH (Network for Digital Methods in the Arts and Humanities), funded by the European Science Foundation (ESF), is a network of scholars involved in various aspects of the digital humanities across Europe, including understanding and classifying digital research practices. Our goal is to share at least high-level categories with NeDiMAH’s ontology, so that objects (projects, tools, articles, etc.) classified using our taxonomy can be automatically “mapped” to some level of the NeDiMAH ontology, and vice versa.

TaDiRAH (which is pronounced “ta-DEE-rah”, and is almost an anagram of “DARIAH” and “DiRT”) lives on GitHub at http://github.com/dhtaxonomy/TaDiRAH. We encourage readers to use TaDiRAH and submit feedback via the issue tracker on GitHub. We currently have only a human-readable version available, but we’ll be publishing machine-readable versions (linked data, and a Drupal taxonomy feature module to make it easier for others to implement TaDiRAH on Drupal-based sites) in the near future.

This work is licensed under a Creative Commons Attribution 4.0 International License.

 

[wp_biographia user="quinnd"]

[wp_biographia user="perkintj"]

 

TOOL: etcML: text classification tool

Geoffrey Rockwell has written a post introducing etcML (Easy Text Classification Machine Learning), a new, freely available text analysis tool developed at Stanford. The tool’s “primary mode of analysis is ‘classification,’ which you can think of as automatic categorization”:

The tool allows you to pass a text (or a Twitter hashtag) to an existing classifier like the Twitter Sentiment classifier. It then gives you an interactive graph like the one above (which shows tweets about #INKEWhistler14 over time). You can upload your own datasets to analyze and also create your own classifiers. The system saves classifiers for others to try.

RESOURCE: NCSU Harvests Instagram Photos of Their Library with Open Source Tool

North Carolina State University released a new, open-source tool that harvests Instagram images that use the hashtag #HuntLibrary. Initially created to collect images for a Library Journal supplement about Library by Design, the project evolved into a larger community-building effort that resulted in more than 2,500 images being added to the collection.

“NCSU Libraries has now made the code for My #HuntLibrary project freely available on GitHub in the lentil Rails Engine framework. It enables the harvesting of image files and metadata from Instagram, and allows an administrator to moderate submissions and add items to a collection. Lentil also includes a tool that makes it easy to submit agreements to contributors when seeking permission to reuse their photos for additional promotional or research purposes, according to the announcement. Developers can use the code to customize and deploy lentil-based applications on any Ruby on Rails-capable server.”

RESOURCE: Textal Text Analysis App

The University College London (UCL) Centre for Digital Humanities, in collaboration with the UCL Centre for Advanced Spatial Analysis, has released Textal. Textal is a free iOS app “that allows users to analyze documents, web pages and tweet streams, exploring the relationships between words in the text via an intuitive word cloud interface. The app generates visualizations and statistics that can be shared without effort, which makes it a fun and useful tool for both research and play, bridging the gap between text analysis and mobile computing. We also see it as a public engagement activity for Digital Humanities.”

RESOURCE: Catalog Search Plugin for Omeka

Lincoln Mullen, a PhD candidate at Brandeis University, has created the Catalog Search plugin for Omeka, which builds upon Omeka’s Library of Congress Subject Heading plugin to search ArchiveGrid, the DPLA, Google Books, Google Scholar, HathiTrust, JSTOR, the Library of Congress, and WorldCat by subject heading. See it in action at the American Converts Database (example) and see the code on GitHub.

POST: The Limitations of GitHub for Writers

On ProfHacker, Konrad Lawson reports on the limitations of GitHub for writers, the last in a series of posts introducing and reviewing GitHub (with a posting on alternatives to GitHub in the works). He writes:

GitHub, in its current form, can serve the needs of writers and scholars, just as it currently serves programmers, and more recently, groups adding laws and government regulations as repositories on the site. For many reasons, however, both GitHub, and the broader approach to collaboration that it has promoted in the world of coding, is not ideal for writers and there are good reasons to support the development of alternative services more suited to our academic or other writing needs.