Friday, March 23, 2012

Three Geneticists from the University of Glasgow

The Wellcome Digital Library isn’t just about collections held in the Wellcome Library. We are working with a number of other organisations that hold material on the history of modern genetics. One of the contributing partners, the University of Glasgow Archives Service, have just started to digitise the archives of three men who worked at the University’s Department of Genetics - Guido Pontecorvo (1907-1999), James (Jim) Harrison Renwick (1926-1994) and Malcolm Ferguson-Smith (1931 – ).

You can see photos of their brand new digitising suite here.

Monday, March 12, 2012

Wellcome Digital Library update

The Wellcome Digital Library pilot has been underway for 18 months with 6 months to go before we launch the new Library website. This will provide access to a wide range of digital content related to the Foundations of Genetics theme. All of the work done so far has been behind the scenes: digitising content, procuring and developing our digital library systems, and designing a new website. We are looking forward to displaying the product of all this work to the public - but we're not quite there yet!

So where are we now and what will we be doing in 2012? Here is a snapshot of progress so far. Further details on some of these projects can be found on this blog, and we will continue to explain our activities in more detail in future posts.

  • Archives: With our in-house team of two photographers, we have digitised around 380,000 pages from the collections of Crick, Mourant, Medawar, Sanger, Wyatt, Grueneberg, and the Blood Group Unit. We have just started the Eugenics Society collection, which will carry on throughout the spring and summer.
  • Genetics Books: This project has just begun, with up to 2,000 books to be digitised this spring by an external supplier, Bespoke Archive Digitisation, working on-site. 
  • MOH reports: A successful JISC funding bid meant we could add the Greater London Medical Officer of Health reports to the pilot project. Conservation is underway, and digitisation will begin in a few months' time.
  • ProQuest: We have partnered with ProQuest to digitise our pre-1700 printed books for Early European Books online, with over 1,000 books now digitised and around 13,000 to go. Those with subscriptions and anyone in the UK can view our first 400 books on the EEB website with more to come shortly (search for "Wellcome").
  • External content: We have had the first delivery from one of our external partners, Cold Spring Harbor Laboratory, including correspondence from the James Watson archive. This adds around 50,000 images to our digital archive collections, with more to come throughout 2012 and early 2013 from all partners.
  • Copyright and sensitivity: Hand in hand with digitisation, we are assessing our content for sensitivity and copyright issues where necessary. Sensitive items (containing certain types of private information as defined by the Data Protection Act) are identified and flagged as unsuitable for online dissemination. Copyright clearance of in-copyright works is underway with the help of the Authors' Licensing and Collecting Society, and the Publishers' Licensing Society.

Systems development
  • Digital Asset Management & Storage: Safety Deposit Box 4.1, our digital asset management system, was extended to provide extra functionality for large sets of digital assets in 2011. This system is now in production. Our storage system, Pillar, now includes a Write Once Read Many (WORM) backup drive to ensure that our files are secure in the long term.
  • Workflow system: We procured Goobi (Intranda Version) with bespoke modifications in 2011 to act as a workflow system, enabling us to track project progress, and to automate a number of activities (including ingest of content into Safety Deposit Box). This has recently been put into use in production, particularly for the Genetics books digitisation project. Soon we will be using Goobi for all digitisation projects, and to ingest our backlog of images.
  • JPEG 2000: We now archive all our images in the JPEG 2000 (Part 1) format, and have an automated batching process set up with LuraWave. Soon, we will be implementing JPEG 2000 validation as part of this process to ensure all JPEG 2000s meet the correct standards before ingest.
  • Digital delivery: A new digital delivery system is currently under development that will interoperate with Safety Deposit Box and our new website content management system, Alterian CM7. We have commissioned CM7 developers Digirati to carry out this development, which will be completed at the end of the summer. So far they have produced a proof of concept system that demonstrates an end-to-end sequence from retrieval of images from Safety Deposit Box using METS files created by Goobi, to displaying images online. They are adapting Seadragon, the MS viewer used by several other digital libraries, to meet our specific needs and design criteria.
  • Search and discovery: We are also making changes to our single search system, Encore. This work is looking at providing better representation of archival metadata in Encore, and also options for incorporating a full-text index. The purpose is to provide access to all Library content - the catalogues as well as the digitised materials - via a single interface.
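
As an illustration of the kind of pre-ingest check mentioned under JPEG 2000 above, here is a minimal sketch in Python. It is not the Library's validation process: it only tests whether a file begins with the 12-byte signature box that opens every JPEG 2000 Part 1 (.jp2) file. The function names are hypothetical, and a real validator would inspect much more (codestream parameters, colour space, compression settings and so on).

```python
# The 12-byte signature box that opens every JPEG 2000 Part 1 (.jp2) file.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")


def looks_like_jp2(data: bytes) -> bool:
    """Return True if the data begins with the JP2 signature box.

    Hypothetical helper: a first-pass sanity check, not full validation.
    """
    return data[:12] == JP2_SIGNATURE


def file_looks_like_jp2(path: str) -> bool:
    """Read only the first 12 bytes of a file and test the signature."""
    with open(path, "rb") as f:
        return looks_like_jp2(f.read(12))
```

A check like this could sit in front of an ingest step so that files that are obviously not JP2 (for example, a baseline JPEG saved with the wrong extension) are rejected early, before a fuller validation tool is run.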

New website and user experience
  • User experience-led design: Last year the Library brought on board external suppliers Clearleft - user experience and web design experts - to help redesign the information architecture and visual appearance of the new website. New designs are already visible on the internal web development environment, so further user testing of a real website can soon be done.
  • Transferring content: The Library has carried out a full content audit of the current website, and prioritised content to carry across to the new site. The current site contains over 2,000 pages; this will be considerably reduced. The content carried across to the new site will be thoroughly edited to ensure it is up-to-date and consistent with the new site "style".
  • Creating new content: New content will also be created once the content management system is in action, with a focus on the Foundations of Modern Genetics. This is a major part of the Library's aim to provide interpretative content to both researchers and the "curious public".

Monday, March 5, 2012

Filling the MOH Gaps

The Wellcome Library has a great collection of Medical Officer of Health (MOH) Reports. These reports are stuffed full of grim and useful information from the 19th and 20th centuries, such as statistics on infant mortality. JISC is, very wisely, funding the digitisation of the London reports. There are some gaps in the Wellcome Library’s collection so, in order to make a really useful digital resource, we have been working out what is missing. This has not been straightforward.

First we needed to check what we held and then we needed to make sure that our gaps really were gaps. We didn’t want to waste time looking for reports that were never created. The very first MOH report in Britain was produced in Liverpool in 1847. The first London report was produced in the following year but the early reports do not cover the whole of London. The Public Health Act of 1848 permitted local authorities to employ MOHs but, since it was not obligatory, only a minority did. The Metropolis Management Act of 1855 required MOHs to be appointed in central London but the big change came with the 1875 Public Health Act. From then until 1972 the production of MOH reports was pretty solid.

Another challenge was getting to grips with the boundary changes. Over the years the administrative boundaries of London have altered several times. The current 32 London Borough boundaries date from 1965 when Greater London was established. Before that there were 28 metropolitan boroughs plus various boroughs and urban and rural district councils in what is now outer London. Before 1899 much of what we now think of as London was part of Kent, Middlesex, Essex or Surrey. The City of London has long gone its own distinctive way and the tangle of parish boundaries there is particularly confusing. Old maps and a book on administrative units by Frederick A. Youngs helped us to make sense of all these changes.

We have decided to start from the centre and try to create as complete a record as possible for the 12 inner London Boroughs. We’ve got to the stage where we have a list of reports that we want to find. The next step is to track them down in other collections and ask if we can get them digitised.

Watson and Crick Letters

I've just had the privilege of reading a fantastic series of letters written in 1954 by James Watson and Francis Crick, a year after the two men published their seminal article on the structure of DNA. In the letters they exchange ideas and their excitement shines through. They write about all sorts of things: the importance of building space-filling three-dimensional models, confusion over how thymine fits into the helical structure, and what the researchers at KCL are up to. In March 1954 Watson also expresses his frustration with the research process: “The whole thing is puzzling and paradoxical (for could DNA be wrong) and is slowly driving me to despair and to loath nucleic acids.” (PP/CRI/D/2/45)

I got to read them because last month the first batch of digitised material arrived from Cold Spring Harbor Laboratory in New York, one of the five external organisations contributing digitised material to the WDL pilot project. The James Watson archive is held at Cold Spring Harbor and contains the letters written to him by Francis Crick. The letters Watson wrote to Crick are held by the Wellcome Library.

Later this year, when the WDL is launched, Watson and Crick’s correspondence will be digitally united. Lots of people will be able to read these letters (and lots of other stuff) online while the originals stay safely tucked away in their archival homes. I am excited about that!