Monday, April 2, 2012

Learning lessons on the Genetics Books digitisation project

A key component of the theme of our digitisation pilot programme - "Foundations of Modern Genetics" - is a set of printed textbooks and secondary sources published between 1850 and 1990 that shed light on the development of genetic and genomic research. The total collection identified is around 2,000 books. The goal is to digitise these texts in full, and make them freely available online via the Wellcome Digital Library (we are of course dealing with copyright clearance).

Digitising books often looks and sounds straightforward. It is not always so, of course, but the book scanners now on the market do make it quick. There are standard ways of scanning a book: you put it on a cradle and either turn the pages by hand or use a "robotic" contraption that turns them automatically. You can capture the pages with scanning technology or with one-shot dSLR cameras, and hold them down with panes of glass or with small grips on the outer margins. The choice depends on the physical nature of the books and how quickly you want to digitise. Even when outsourcing, it is useful to understand how book scanning really works. Our Genetics Books digitisation project - a pilot project - is giving us this opportunity.

We commissioned local digitisation company Bespoke Archive Digitisation to carry out the digitisation work for this pilot project. As the digitisation is carried out on site, we have been involved to some extent in all aspects of the digitisation, including the setup and use of new types of equipment, the QA process involved in book digitisation, and the workflow of image conversion and delivery. As we have never carried out high-throughput book digitisation at the Wellcome Library before, this has been a huge learning curve for us, allowing us to gain knowledge that will come in very useful in the future with new (and hopefully larger) projects.

Bespoke Archive Digitisation uses a robotic book scanner and a manual book scanning unit (for books that are not robust enough for the robotic scanner, are outsized, etc.). Both of these "scanners" use Canon 5D Mark II cameras, two per unit, to capture each page of an opening simultaneously. The robotic book scanner is the latest version from Kirtas, the Kabis III. Richard Keenan, owner of Bespoke Archive Digitisation, explains: "This unit has a number of time-saving features such as 'fluffers', a 'snubber', and a self-adjusting book cradle which moves to keep the book at the correct angle to be photographed. This is accomplished through various sensors and lasers, which monitor the book throughout imaging to keep it in the correct position, but must also be monitored by the operator."

A key lesson, according to Richard, is that "although all robotic book scanners include a published throughput (2,890 pages per hour for this particular unit), it is important to understand that the published throughputs do NOT mean that you can do 2,890 pages per hour, hour after hour without stopping. Each book must be set up on the cradle, the cameras may need some adjustment/focusing, and page turning does require manual intervention, every time, to ensure the pages are flat, and to prevent page curvature and glare (especially on sealed paper).

"Also, it is very important to remember that this is just the image capture stage. The pages then have to be batch processed, edited and rigorously quality assessed, which can take the same time as imaging, or more. Depending on the book's structure - page thickness, binding type, size of the book, etc. - you will find that speeds vary considerably. A realistic estimate of throughput over a significant period of time is approximately 1,000 pages per hour, but this can be much lower with some books.

"Although these figures differ by a large margin from those published, the Kabis III from Kirtas is still probably the fastest way to digitise books, and the important thing is that the quality of output produced is excellent if operated correctly. The on-board editing software 'Book Scan Editor' is very handy, offering the usual cropping, image adjustment and sharpening options, but also deskewing, XML conversion and even OCR. I would say that another thing to bear in mind here is that there is a large learning curve with this technology, so for anyone thinking of using one - particularly those who have no experience with robotic book scanners - plan plenty of time in the project for training and testing periods."
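
To get a feel for what the gap between published and realistic throughput means for planning, here is a rough back-of-the-envelope sketch in Python. The two throughput figures and the collection size come from this post. The average of 300 pages per book, and the idea that post-capture processing takes exactly as long as capture, are purely illustrative assumptions and not measurements from the project.

```python
# Figures quoted in this post
PUBLISHED_PAGES_PER_HOUR = 2890   # manufacturer's published throughput
REALISTIC_PAGES_PER_HOUR = 1000   # sustained estimate quoted by Richard
BOOKS_IN_COLLECTION = 2000        # approximate size of the collection

# Assumption for illustration only: not a figure from the project
ASSUMED_PAGES_PER_BOOK = 300


def capture_hours(total_pages, pages_per_hour):
    """Hours of image capture needed at a given sustained throughput."""
    return total_pages / pages_per_hour


def project_hours(total_pages, pages_per_hour, processing_ratio=1.0):
    """Capture time plus processing/QA time.

    processing_ratio is the processing time expressed as a multiple of the
    capture time. 1.0 means "the same time as imaging"; the post says it
    can be more, so treat the result as a lower bound.
    """
    capture = capture_hours(total_pages, pages_per_hour)
    return capture * (1 + processing_ratio)


total_pages = BOOKS_IN_COLLECTION * ASSUMED_PAGES_PER_BOOK

for label, rate in [("published", PUBLISHED_PAGES_PER_HOUR),
                    ("realistic", REALISTIC_PAGES_PER_HOUR)]:
    capture = capture_hours(total_pages, rate)
    total = project_hours(total_pages, rate)
    print(f"{label:>9}: {capture:7.1f} h capture, {total:7.1f} h with processing")
```

Whatever page count you plug in, the ratio between the two estimates stays the same: planning on the published figure would underestimate capture time by a factor of almost three, before any processing or quality assessment is counted.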
