In a new post on its Digital Scholarship blog, the British Library has announced a collaborative transcription project that will help “to create a freely available ground truth dataset for anyone wishing to advance the state-of-the-art in optical character recognition (OCR) technology for handwriting.”
This project is a proof of concept exploring whether the creation of such a dataset can be done collaboratively at scale, using the collective expertise of volunteers around the world. At the heart of this approach is the Library’s enduring commitment to creating new and interesting ways to connect diverse communities of interest and expertise, be it scholars, the general public, computer scientists, students, and curators, around our collections. For this we are utilising a free and open-source platform, From the Page, which allows anyone with an interest in historical Arabic manuscripts to experience them up close, many for the first time, to discuss, learn and share expertise in their transcription.
Funding to develop the open-source platform (which supports right-to-left transcription) was provided by the Library’s Digital Scholarship Department.