Data Literacy as Digital Humanities Literacy: Exploration of Threshold Concepts

Those of us who are both librarians and digital humanities instructors must either create new frameworks for teaching and learning or map existing ones to library instruction. “Digital humanities literacy” is a combination of many literacy areas. Still, the prevalence of data in both our daily lives and in digital humanities places data literacy in a position of importance. Below, I propose threshold concepts for data literacy and illustrate how those concepts can provide a lens through which we can explore a piece of digital humanities scholarship. Defining threshold concepts will be useful both for communication among digital humanists and for our daily pedagogical work.

To illustrate the application of these threshold concepts, I explore the data visualization project Torn Apart/Separados through a data literacy lens. The project team has clearly documented and shared its data collection processes and acknowledged the data’s limitations. Through these means, the team invites a critical assessment of the data and its use, an overarching principle of data literacy (Calzada Prado & Marzal, 2013).

To discuss the utility of threshold concepts in data literacy, we must start from a shared understanding of them. Threshold concepts are ideas previously unknown to the learner that, once learned and understood, permanently and profoundly change the learner’s perception of a topic (Meyer, Land & Baillie, 2010). A person who understands a threshold concept can ask more precise questions and exhibits behaviors that demonstrate a deep level of understanding.

Data literacy, broadly, is the ability to critically create, manipulate, manage, analyze, understand, and communicate data (Koltay, 2014; Calzada Prado & Marzal, 2013; Fontichiaro & Oehrli, 2016). Drawing on work in data literacy and digital humanities, as well as topics I’ve found pertinent in my teaching, I propose and discuss below preliminary threshold concepts for data literacy in digital humanities. I’ve used these as a basis for defining learning outcomes and assessing students’ learning around working with humanities data. The three concepts are: data is both machine- and human-created; data is one part of the story; and data is part of the scholarly conversation.

Data is both machine- and human-created

Data is not inherently objective, even when created through a seemingly objective process.

Key questions for exploration

  • Who or what created (or guided the creation of) this data?
  • How was the data created?
  • What human or machine-based biases are present?

Information literacy sessions typically cover familiar forms of knowledge dissemination, like books and presentations, which are created directly by humans. Data, on the other hand, can be created directly by humans or merely influenced by them. Increasingly, computational systems do not need direct input to create data (think of the timestamp on a saved file), and they can generate new information based on previous input (think of machine learning). Even without direct input, human choices are built into the process: a programmer decides that when one thing happens, another must be true, and a behavioral researcher decides that one variable is worth collecting while another is not. Data is always affected by human intervention, even if a human did not directly create it. As such, data is not inherently more objective than a journal article or conference talk, though students often tend to see it that way (Fontichiaro & Oehrli, 2016). However much we try to remove bias from our computing processes, decisions made during creation influence the final dataset because “people choose what to count or measure” (Schield, 2004, p. 7).

Data in Torn Apart/Separados Volume 1 is gathered from multiple governmental sources, including US Immigration and Customs Enforcement (ICE), and restructured to fit the needs of the project. One variable that ICE collects on its detainees is whether they are “criminal.” An early learner of data literacy might accept that information as fact, without asking how the value of “criminal” is determined. “Criminal” is a loaded term whose meaning can vary with cultural, societal, and even demographic factors (do we consider a child who steals candy a “criminal”?). Which crimes earn a “criminal” signifier? A data-literate person will ask how the value is determined, who assigns it, and what societal factors might influence it. These are research questions in their own right, and the answers are not always readily (if at all) available. However, when learners understand data creation as a process shaped by human decisions, they think critically about that process and how it affects any data they examine. These effects can be mitigated by supplementing a data-based result with other pieces of information.
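To make “people choose what to count” concrete, the short Python sketch below applies two different labeling rules to the same handful of records. The records, field names, offense categories, and both rules are hypothetical, invented purely for illustration; they do not describe how ICE actually assigns the “criminal” value. The sketch only shows that a human-chosen rule, not the underlying records, determines how many people end up labeled “criminal.”

```python
from dataclasses import dataclass

# Hypothetical records, invented for illustration (not ICE data).
@dataclass
class Record:
    person_id: int
    convictions: list[str]  # hypothetical offense categories on record
    pending_charges: int    # hypothetical: charges filed but not adjudicated

RECORDS = [
    Record(1, [], 0),
    Record(2, ["traffic"], 0),
    Record(3, [], 2),
    Record(4, ["assault"], 0),
    Record(5, ["petty_theft"], 1),
]

# Hypothetical rule A: any conviction OR any pending charge counts as "criminal".
def criminal_broad(r: Record) -> bool:
    return bool(r.convictions) or r.pending_charges > 0

# Hypothetical rule B: only convictions outside a chosen set of minor offenses count.
MINOR_OFFENSES = {"traffic", "petty_theft"}

def criminal_narrow(r: Record) -> bool:
    return any(c not in MINOR_OFFENSES for c in r.convictions)

def count_labeled(records, rule) -> int:
    """Count how many records a given labeling rule marks as 'criminal'."""
    return sum(1 for r in records if rule(r))

if __name__ == "__main__":
    print("Broad rule:", count_labeled(RECORDS, criminal_broad))
    print("Narrow rule:", count_labeled(RECORDS, criminal_narrow))
```

Run as written, it prints 4 for the broad rule and 1 for the narrow rule: identical records, very different “data.” Whether a child’s candy theft lands in the minor set is exactly the kind of decision a data-literate reader learns to ask about.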

Data is one part of the story

Data is one piece of a larger narrative and cannot give a complete picture on its own.

Key questions for exploration

  • What does the data seem to tell us?
  • What other information do we need to complete the picture?

Data serves as a “snapshot” representation of a particular reality. Because data is often assumed to be inherently objective, we tend to believe data-driven conclusions more readily than other forms of knowledge (Fontichiaro & Oehrli, 2016). A data-literate person understands the relationship between data and the reality it represents and can identify gaps or discrepancies between the two. Once those gaps are identified, they can reasonably suggest other pieces of information to complete the narrative or verify claims derived from a dataset. The ability to act when data is absent, and to understand the limitations of data-driven analysis, will continue to be highly desirable skills for all workers, not just those in DH (Davies, Fidler & Gorbis, 2020).

The base data visualization in Torn Apart/Separados Volume 1 shows where ICE-affiliated detention centers are located and how widespread they are, which facilities are in use and which are not, their average daily populations, and other characteristics. None of this factual information explains how these centers are able to operate, a question the project team set out to answer. Volume 2 analyzes the flow of money between ICE, government officials, and the contractors who complete work for them. This additional information brings the story closer to a complete picture of the current state of immigration policy and enforcement in the US. A data-literate person can use the information in Volumes 1 and 2 together to look for correlations between operating costs and facility use, an analysis that would be impossible with Volume 1 alone. All of this information, from the data to the interpretations presented, is an integral part of the research process.
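As a rough illustration of combining the two volumes, here is a minimal Python sketch that assumes each volume has been reduced to a simple table keyed by a shared facility identifier. The identifiers, field meanings, and all numbers are hypothetical placeholders, not the project’s actual schema or figures; a real analysis would first have to reconcile how each source names facilities. The sketch joins on facility ID, reports facilities that appear in only one source, and computes a Pearson correlation on the matched pairs.

```python
import math

# Hypothetical Volume 1-style extract: facility ID -> average daily population.
facility_use = {
    "F001": 120.0,
    "F002": 45.0,
    "F003": 300.0,
    "F004": 10.0,   # hypothetical: no matching cost record
}

# Hypothetical Volume 2-style extract: facility ID -> total contract value (USD).
facility_costs = {
    "F001": 2_000_000.0,
    "F002": 900_000.0,
    "F003": 5_500_000.0,
    "F005": 750_000.0,  # hypothetical: no matching use record
}

def join_on_facility(use, costs):
    """Inner join on facility ID; also list IDs missing from either side."""
    shared = sorted(use.keys() & costs.keys())
    only_use = sorted(use.keys() - costs.keys())
    only_costs = sorted(costs.keys() - use.keys())
    pairs = [(use[f], costs[f]) for f in shared]
    return pairs, only_use, only_costs

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

if __name__ == "__main__":
    pairs, only_use, only_costs = join_on_facility(facility_use, facility_costs)
    use_vals, cost_vals = zip(*pairs)
    print("Matched facilities:", len(pairs))
    print("No cost data for:", only_use)
    print("No use data for:", only_costs)
    print("Correlation (use vs. cost): %.2f" % pearson(use_vals, cost_vals))
```

The join itself is the easy part. The lists of unmatched facilities are where data-literate questions begin (why does a facility appear in one source but not the other?), and even a strong correlation between cost and use says nothing on its own about why the two move together.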

Data is part of the scholarly conversation

Data is an artifact of the iterative process of research.

Key questions for exploration

  • What kind of data is being shared?
  • What steps are being taken to ensure the data is findable, reusable, and preserved?

In digital humanities projects, data can appear as both the subject and the product of research. Christof Schöch’s distinction between “big” and “smart” data in the humanities is useful here: big data is massive but relatively unstructured, while smart data is smaller and more structured (Schöch, 2013). A new, curated dataset that a DH researcher assembles from multiple information sources is typically “smart” data. Like a published article, the dataset itself is an integral part of the research process. Findable, reusable, and well-preserved data supports reproducibility and enables others to ask similar questions. A data-literate person understands the flow of data from original source, to researcher, to final presented dataset, and knows how to find information on how the data was modified and recreated at each stage.

Torn Apart/Separados Volume 1 displays structured data gathered from other sources on a map for exploration and interpretation (another example of “smart” data). The original data comes from various organizations, and the cleaned, reformatted versions are preserved in the project’s GitHub repository. Project documentation explains the information in the dataset and its original sources. In this case, data is both the subject and the product of research. A data-literate researcher reusing or examining the project’s data can easily find documentation on how the data was modified throughout the project and thus gain a critical understanding of the process of data creation, transformation, and presentation.

Next Steps

Understanding how data is created, data’s capacity to be a form of knowledge, and data’s role in scholarship are crucial for DH practitioners. These concepts are meant to serve as preliminary benchmarks for gauging learners’ critical understanding of data creation, use, analysis, and sharing. As a community, we could work toward an “interconnected core conceptual framework” for data literacy that complements the ACRL Framework for Information Literacy for Higher Education (American Library Association, 2015).

References

American Library Association. (2015). Framework for Information Literacy for Higher Education. http://www.ala.org/acrl/standards/ilframework

Calzada Prado, J., & Marzal, M. A. (2013). “Incorporating Data Literacy into Information Literacy Programs: Core Competencies and Contents.” Libri, 63(2), 123-134.

Davies, A., Fidler, D., & Gorbis, M. (2020). Future Work Skills. http://www.iftf.org/uploads/media/SR-1382A_UPRI_future_work_skills_sm.pdf

Fontichiaro, K., & Oehrli, J. A. (2016). “Why Data Literacy Matters.” Knowledge Quest, 44(5), 22-27.

Koltay, T. (2014). “Data literacy: In search of a name and identity.” Journal of Documentation, 71(2), 401-415. https://doi.org/10.1108/JD-02-2014-0026

Meyer, J., Land, R., & Baillie, C. (Eds.). (2010). Editors’ Preface. In Threshold Concepts and Transformational Learning (pp. ix). Sense Publishers.

Schield, M. (2004). “Information Literacy, Statistical Literacy, and Data Literacy.” IASSIST Quarterly, Summer/Fall 2004. https://iassistdata.org/iq/issue/28/2

Schöch, C. (2013). “Big? Smart? Clean? Messy? Data in the Humanities.” Journal of Digital Humanities, 2(3). http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/

Torn Apart / Separados. (2018). http://xpmethod.plaintext.in/torn-apart/

About the Author

Kayla Abner is a Digital Scholarship Librarian at the University of Delaware Library, Museums & Press. She is enthusiastic about teaching as a means to empower scholars to create meaningful digital projects. Kayla’s favorite aspect of digital scholarship is its potential to cross disciplinary and scholarly boundaries and reach new audiences. Her interests include digital humanities, data curation, visualization, and digital creativity.