HathiTrust has announced the extension of its non-consumptive research tools to copyrighted materials, expanding researchers' ability to explore and data-mine the complete 16.7-million-item HathiTrust corpus. From the announcement:
This work has been several years in the making. A primary goal of HathiTrust is to enable the widest possible lawful research and educational uses of the HathiTrust collection. In recent years, US courts have recognized the solid legal basis for non-consumptive research on copyrighted materials. In 2016, HathiTrust established a working group to develop the Non-Consumptive Use Research Policy to ensure the responsible research use of copyrighted items.
The policy is now enacted in an updated release of HTRC Analytics, which allows researchers to conduct computational text analysis on copyrighted items as permitted under US copyright law.
All users are now able to apply the HTRC Algorithms, Extracted Features Dataset, and HathiTrust+Bookworm tools to all copyrighted works, as well as those in the public domain. Member institutions can additionally make use of HTRC Data Capsule tools for the entire corpus.