Friday, October 29, 2010

100,000 image milestone, and how we did it

We reached a significant milestone this week: 100,000 images of the Crick papers have now been photographed. The Crick collection, described in a previous post, accounts for around 285,000 images, over half of the total to be digitised as part of the Archives digitisation project. The photography is being carried out over a period of 20 months by Laurie Auchterlonie and Tom Cox in the Imaging department.

Laurie and Tom use Canon Mark II digital SLR cameras mounted to columns attached to two copy stands. These allow the cameras to be automatically raised and lowered according to the size of the items being digitised. This is a fairly typical imaging set-up for this type of material. However, with a project this large, it is important to keep the time taken to photograph, edit and manage the images to a minimum. In spring 2010, the photographers spent time developing a workflow that would allow them to digitise the highest number of items possible, whilst not compromising on quality or care in handling. In fact, the time-saving measures turned out to produce higher-quality images and to minimise the amount of handling required.

For example, "live view" screens allow the photographers to easily see and adjust the alignment of each item on the copy stand, and the degree to which it fits the frame of view. This saves time as the photographers do not have to look through the viewfinder (difficult when the camera is 6 or more feet above the ground), or take multiple shots to get it right. Post-processing work has also been almost completely eliminated, as has the need to reshoot items at a later date.

Purchasing higher columns limited the number of times lenses had to be changed. Larger items require the cameras to be raised quite high, and if the column isn't high enough, the photographer has to change to a different, shorter or wide-angle lens. The flexibility built into the workflow by these measures is highly advantageous when dealing with heterogeneous materials such as personal archives.

Another aspect of the workflow that had to be specifically tailored to archival collections was the storage and foldering of images so that they could easily be found and identified. The foldering system follows the existing archive catalogue hierarchies, so a user can pinpoint the exact file or item needed, whether to create copies for users or to carry out QA against the original items (a sample of images is checked against the originals by Julia Nurse, who prepares the items before photography, to ensure that filenaming is accurate and that items aren't being missed). Eventually, these folders will be rendered obsolete, as we implement a digital asset management system that will restructure the archive storage of our images on ingest. But it is important not to underestimate the need to access images during the pre-ingest process of digitisation and QA. And if you do not have a digital asset management system, it is even more important that ease of access is factored in from an early stage.
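As a rough illustration of foldering that mirrors a catalogue hierarchy, the sketch below turns an archive reference such as PP/CRI/E/1/13/10 into a nested folder path and an image filename. The root folder name, the filename pattern and the function names are all assumptions made for this example; they are not a description of the system actually used in the Imaging Studio.

```python
from pathlib import PurePosixPath


def _levels(reference: str) -> list[str]:
    """Split a catalogue reference such as 'PP/CRI/E/1/13/10' into its levels."""
    levels = [part.strip() for part in reference.split("/") if part.strip()]
    if not levels:
        raise ValueError("empty catalogue reference")
    return levels


def folder_for_reference(reference: str, root: str = "images") -> PurePosixPath:
    """Build a folder path with one directory per catalogue level.

    'root' is a hypothetical top-level folder name.
    """
    return PurePosixPath(root, *_levels(reference))


def image_filename(reference: str, sequence: int, extension: str = "tif") -> str:
    """Build a filename that carries the full reference plus a page sequence.

    The underscore-joined pattern and the 'tif' extension are illustrative choices.
    """
    return f"{'_'.join(_levels(reference))}_{sequence:04d}.{extension}"


print(folder_for_reference("PP/CRI/E/1/13/10"))
print(image_filename("PP/CRI/E/1/13/10", 3))
```

Because each folder level corresponds to a catalogue level, anyone who knows an item's reference can navigate straight to its images without a separate index, which is the property that matters before a digital asset management system is in place.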

A bit of preparation and testing makes a huge difference when setting up a new workflow. Even a minute saved per item means a large overall time saving when spread over hundreds of thousands of images.
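To put a number on that, here is a small back-of-the-envelope calculation. It assumes, purely for illustration, that one image corresponds to one item and that the saving applies evenly to all of the roughly 285,000 Crick images; the function name is made up for this sketch.

```python
def hours_saved(images: int, seconds_saved_per_image: float) -> float:
    """Total time saved, in hours, for a given saving per image."""
    return images * seconds_saved_per_image / 3600


# Around 285,000 images in the Crick collection (figure from this post),
# with a hypothetical saving of one minute per image.
print(hours_saved(285_000, 60))  # 4750.0
```

Even a saving of ten seconds per image would come to several hundred hours on the same assumptions.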

Wednesday, October 13, 2010

Papers, papers and yet more papers … preparing the Crick Archive for digitisation

When first faced with preparing around 300 boxes of Francis Crick’s personal papers for digitisation, I have to confess my heart sank. This was a far cry from my last, very visual digitisation project of 3000 AIDS posters, and I was daunted not only by the very different content of this collection but also by the sheer size of it – an estimated half a million items this time. How wrong I was. I feel privileged to have been given the opportunity to delve into one of the most incredible minds of our lifetime.

Although we tend to associate him only with his (and Jim Watson’s) discovery of the double-helix structure of DNA – and there is plenty of fascinating correspondence within the archive related to this - it is his research on the mind and consciousness in his later years that is truly ‘astonishing’, as he would put it. Through his endless correspondence with both fellow scientists and the general public, we get a real sense of his probing analysis of what makes our brains tick.

It is very easy to get side-tracked by such a collection, but I have to remember that my main task is to ensure the papers are in a suitable condition for the photographers to shoot. A daily scour of the collection is required to remove existing staples and flatten pages, and occasionally a conservator is needed. For example, Crick’s tracing sketches and calculations of Collagen Long Spacings, heavily folded since 1955, required specialist equipment to flatten out.

Once a particular batch has been checked, data spreadsheets are produced for the photographers so that they know what to expect in each box. This data includes an estimate of the percentage of text suitable for OCR (Optical Character Recognition), a record of the current location of the batch, and notes for the archivists’ attention. While doing this I cannot help but siphon off particularly interesting information, which has been, and will continue to be, used for publicity about the project – see the recent BBC Audio Slideshow.

Further blogs providing updates on the digitisation project will follow in due course.

Top image: Crick's sketch of genetic code, 1965 (PP/CRI/E/1/13/10)

Bottom image: Francis Crick lecturing at Cambridge University (PP/CRI/A/1/2/9)

Friday, October 1, 2010

Preparing and planning a large archives digitisation project

Archives digitisation is currently underway in our Imaging Studio, with two full-time members and two part-time members of Library staff dedicated to preparing and digitising the items. We will talk more specifically about the work being carried out on these materials on this blog in the near future, but first we present an introduction to the setup and planning of the project.

Once the theme was chosen (Modern Genetics and its Foundations), and relevant collections identified (see previous blog post), we realised that we had quite a large job on our hands. The scope of the project was bigger than anything we had done before: 620 boxes of material, containing around 800 pages each, add up to around half a million pages to be digitised.
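The estimate is simple multiplication, sketched below with the figures from this post. The monthly throughput is only an illustrative upper-bound scenario: it assumes imaging is spread evenly over a hypothetical 24-month window, whereas in practice photography starts some months after preparation. The function names are made up for the example.

```python
BOXES = 620          # number of boxes (figure from this post)
PAGES_PER_BOX = 800  # approximate pages per box (figure from this post)


def estimated_pages(boxes: int = BOXES, pages_per_box: int = PAGES_PER_BOX) -> int:
    """Rough total page count for the project."""
    return boxes * pages_per_box


def pages_per_month(total_pages: int, imaging_months: int) -> float:
    """Average monthly throughput needed for a given imaging window."""
    return total_pages / imaging_months


total = estimated_pages()
print(total)  # 496000, i.e. around half a million

# Hypothetical scenario: imaging spread evenly over 24 months.
print(round(pages_per_month(total, 24)))
```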

Based on a series of tests, we estimated the project would take 2 years to complete – starting with the preparation of the material in advance, with photography coming into play a few months later. Two full-time staff would be focused on imaging the material, with two part-time members of staff preparing, tracking and assessing the items.

There is a range of logistical issues to bear in mind when planning and starting up a project of this nature. The boxes are stored in the basement stores, and had to be retrieved for a period of some months while the material was being worked on. We divided the collections into batches of a size that could be imaged in a period of 4-6 weeks, and we retrieve and return each batch as a unit, tracking all movements on a spreadsheet. The tracking spreadsheet also records, among other things, the location of each box in the batch, notes from the preparer for the archivists, photographers and/or conservation staff, and the percentage of items in each box that can be OCR’d.
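For readers setting up something similar, the kind of information held in such a tracking spreadsheet can be sketched as a simple record per box. The column names, batch and box labels, and locations below are invented for illustration and do not reproduce our actual spreadsheet.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class BoxRecord:
    """One row of a hypothetical tracking spreadsheet."""
    batch: str
    box: str
    location: str            # e.g. "store" or "studio" (made-up values)
    ocr_percent: int         # estimated share of items that can be OCR'd
    notes_archivists: str = ""
    notes_photographers: str = ""
    notes_conservation: str = ""


def to_csv(records: list[BoxRecord]) -> str:
    """Serialise the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(BoxRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


def boxes_at(records: list[BoxRecord], location: str) -> list[str]:
    """List the boxes currently recorded at a given location."""
    return [r.box for r in records if r.location == location]


# Invented sample data for the sketch.
records = [
    BoxRecord("Batch 1", "Box 1", "studio", 80, notes_conservation="folded tracings"),
    BoxRecord("Batch 1", "Box 2", "studio", 60),
    BoxRecord("Batch 2", "Box 3", "store", 90),
]

print(boxes_at(records, "studio"))
print(to_csv(records))
```

Keeping one row per box, with the batch as a column, makes it easy to answer both "where is this box?" and "which boxes move together?" from the same sheet.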

We put a notice on our website of the entire schedule of archives to be digitised, so readers could see at a glance what would be unavailable and when. The catalogue records are also amended to show where material cannot be reserved. Each time a batch is retrieved, checked out, checked in, or dates altered, this has an impact on the website and two different cataloguing systems (the Archives and Manuscripts Catalogue, and the Library Catalogue), so communication with the departments responsible for retrieval and metadata was key.

The preparation staff were trained in advance by the conservation team so they could carry out basic stabilisation and first-aid work on the materials if required for digitisation. The photographers ran multiple tests on different equipment and with different cameras to ensure that the workflow was efficient, that it was appropriate to the formats and anticipated end use of the material, and that proper QA could be accommodated. Preparation and imaging take place in the Imaging Studio, so that all staff are in close proximity and able to communicate easily with each other. The Imaging Studio was refitted with desks, shelving and equipment to make sure all the boxes in process at any one time could be accommodated. A further planning issue was determining how to assess and record different levels of sensitivity of information contained in the archives. We are currently developing a policy for access to archives that takes account of online display, and this has informed the workflow for assessment.

This project required liaison between several different departments and stakeholders in the Library in order to set up a suitable workflow. In future, we hope that workflow issues will be streamlined further by procuring a Workflow Tracking System that will serve to centralise tracking and monitoring of all digitisation projects. We anticipate that this pilot project will enable us in future to plan effectively for much larger digitisation projects as we work towards the digitisation of all suitable material held in the Wellcome Library.