As part of the genetics books project, we are tackling copyright clearance and due diligence head-on. Up to 90% of this collection is, or is likely to be, in copyright, so developing a copyright clearance strategy was one of our earliest considerations. This became an opportunity to test-run the EC-funded ARROW system on a large scale. ARROW provides a workflow for libraries and other content repositories to determine whether books are in commerce and in copyright, and whether their copyright holders can be identified and traced. The system has undergone small-scale tests throughout Europe, including in the UK (using collections and metadata from the British Library), but a realistic large-scale project was needed to determine whether ARROW is feasible at scale.
The Wellcome Library's genetics books project provided this opportunity, and the challenge was taken up jointly by the Authors' Licensing and Collecting Society (ALCS) and the Publishers Licensing Society (PLS), as announced previously on our Library Blog. The results from ARROW, combined with the responses from contacted rights holders, will determine whether the Wellcome Library publishes a work online.
The collection of roughly 1,700 potentially in-copyright books is not enormous, but it is diverse, and it has already thrown up some interesting wrinkles in the copyright clearance workflow.
For example, under the AACR2 cataloguing standard used for these books, no more than three authors are recorded in a metadata record (followed by "et al."). Works with more than three authors, and collected works such as conference proceedings, therefore had to be consulted manually to identify all the named contributors. This raised the number of known contributors to nearly 7,000, an average of about four per book.
Embedded below is a presentation I gave at the London Book Fair earlier this week, which provides an overview of the process, and preliminary statistics from the first 500 books to complete the ARROW workflow.