A post on the University of Nevada, Las Vegas Libraries’ blog announced the release of a collection of more than 14 million tweets relating to the mass shooting that occurred at the 2017 Route 91 music festival. The post details the work of Thomas Padilla, Miranda Barrie, and other UNLV Libraries staff, and describes some of the complexity of collecting, understanding, and making accessible a dataset of this nature and scale.
Use of the collection is technically challenging and a number of issues call for investigation that would benefit from the attention of researchers. On the technical side, all 14,108,104 tweets are stored in a single JSON file. For those unfamiliar with the JSON format, command line based programs, and scripting languages like Python there will be a learning curve for asking questions of the collection. On issues that call for further investigation, we have observed what appears to be collection infiltration by bots and other bad actors spreading fake news. Issues of this kind merit the specialized attention of researchers.
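To give a sense of what that learning curve involves, here is a minimal Python sketch of one way to ask a simple question of a very large tweet file without loading it all into memory. It rests on assumptions the post does not confirm: that the file holds one JSON object per line (a common layout for tweet collections, but not stated for this one), and that each tweet carries a top-level `lang` field. The function name `count_by_field` is hypothetical, and the three sample records are invented stand-ins, not data from the collection.

```python
import io
import json
from collections import Counter


def count_by_field(lines, field="lang"):
    """Tally tweets by a top-level field, reading one JSON object per line.

    Assumes line-delimited JSON; a file that is a single large JSON array
    would need a different (streaming) parser.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparseable>"] += 1
            continue
        if not isinstance(tweet, dict):
            counts["<unparseable>"] += 1
            continue
        counts[tweet.get(field, "<missing>")] += 1
    return counts


# Invented sample records standing in for the real file.
sample = io.StringIO(
    '{"id_str": "1", "lang": "en"}\n'
    '{"id_str": "2", "lang": "es"}\n'
    "\n"
    '{"id_str": "3", "lang": "en"}\n'
)
counts = count_by_field(sample)
print(counts.most_common())
```

Because the function only iterates over lines, the same call should work on an open file handle, for example `count_by_field(open("tweets.jsonl", encoding="utf-8"))`, where `tweets.jsonl` is a placeholder name rather than the collection's actual filename.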
In a series of posts to follow, Miranda and Thomas will share stories, resources, and more for librarians, researchers, and members of the community. While the full dataset is only directly available for download at UNLV, public tweet identifiers and more information about the data can be found in the Collection Guide in the UNLV Libraries Digital Collections.
dh+lib readers will be interested in the issues and sensitivities surrounding the practices of collecting, preserving, and providing access to such a dataset and should stay tuned for future posts in the series.