Wednesday, September 21, 2011

Preserving our digital assets #2 - WORM storage

This post follows on from my earlier description of our DAM system (Safety Deposit Box, or SDB), which manages the long-term preservation of, and access to, our digital assets. Here we turn to the back-end of the back-end: storage.

A number of requirements must be met to store digital assets safely. The storage solution must be:
  • Secure (behind a firewall)
  • Robust (able to manage points of failure in the disks)
  • Replicable (multiple copies on multiple sites)
  • Scalable (able to handle tens of millions of files)
  • Quick to access (the archived files are also used for delivery) 
After considering different systems and suppliers – including robotic tape back-up – we settled on a solution that gives us the confidence we need for long-term preservation, and fits well with the Trust’s existing storage infrastructure. Our existing storage suppliers, Pillar Data Systems, have extended the existing RAID5 enterprise storage system used for all the Wellcome Trust’s business needs by incorporating a “Write Once, Read Many” (WORM) back-up storage server for use by the Wellcome Digital Library. Associated management software copies files from the main storage server to the WORM, and monitors the main server for file errors that can be “healed” using the WORM copy.

To explain this in a bit more detail, related to our primary requirements:

Security means that only authorised users or systems are able to access the files, and that unauthorised deletions or changes are guarded against. Locking master files behind a firewall is the main form of defence against unauthorised external access (for example hackers, or the ability to download files by "guessing" the network path and filenames). However, we still faced the prospect of file deletion, changes or corruption due to system failures, and accidental or malicious actions by otherwise authorised users. To guard against this, the Trust IT Department recommended permanent, on-line WORM storage as our back-up solution. Files stored on the WORM can be accessed, but they cannot be overwritten or deleted. This means that we have a permanent back-up of every digital file that cannot be tampered with.
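The "write once, read many" rule described above can be illustrated with a minimal in-memory sketch. This is not how the Trust's WORM server works internally (there the rule is enforced by the storage system itself). The class name WriteOnceStore and its methods are hypothetical, chosen only to show the behaviour: a file can be written once and read freely, but any later overwrite or delete is refused.

```python
class WriteOnceStore:
    """Toy model of WORM semantics: write once, read many, never change."""

    def __init__(self):
        self._files = {}  # file name -> immutable bytes

    def write(self, name, data):
        # A name may only ever be written once; a second write is refused.
        if name in self._files:
            raise PermissionError(f"{name!r} already exists and cannot be overwritten")
        self._files[name] = bytes(data)

    def read(self, name):
        # Reads are always allowed and never alter the stored copy.
        return self._files[name]

    def delete(self, name):
        # Deletion is never allowed, even for otherwise authorised users.
        raise PermissionError(f"{name!r} cannot be deleted from write-once storage")
```

In this sketch an accidental or malicious second write fails in the same way as a delete, so the first stored copy is the only one that can ever be read back.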

Robustness is closely tied to the WORM system, although the RAID5 storage we normally use is already highly robust. RAID5 spreads each file's data, together with parity information, across several disks, so that a single point of failure generally does not damage the entire file and its contents can be reconstituted, while short-term back-ups allow complete recovery of damaged or lost files. The WORM system adds a further element of confidence. Once a new file is stored on the main servers, it is copied to the WORM drive. The software managing this process checks the files on the main servers periodically, and if it finds a mismatch with the WORM drive (a lost or corrupted file on the main server), it pulls a copy of the WORM'ed file to the main server, thus "self-healing" the damage.
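The periodic check-and-heal cycle can be sketched roughly as follows. This is an illustration only, not the supplier's management software: it assumes both stores are visible as ordinary directory trees, and that a mismatch is detected by comparing SHA-256 checksums. The function names sha256_of and heal_from_worm are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def heal_from_worm(main_root: Path, worm_root: Path) -> List[str]:
    """Restore main-server files that are missing or differ from the WORM copy.

    Returns the relative paths of the files that were healed.
    """
    healed = []
    for worm_file in sorted(worm_root.rglob("*")):
        if not worm_file.is_file():
            continue
        relative = worm_file.relative_to(worm_root)
        main_file = main_root / relative
        # A lost file, or one whose checksum no longer matches, counts as damaged.
        if not main_file.is_file() or sha256_of(main_file) != sha256_of(worm_file):
            main_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(worm_file, main_file)  # pull the WORM copy back
            healed.append(relative.as_posix())
    return healed
```

The WORM copy is only ever read here, never written, which fits the premise that it is the trusted reference. A real system would more likely compare against checksums recorded at ingest than re-read every WORM file on each pass.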

Replicability means that there is more than one copy of each file. One lives on the main servers at the Wellcome Trust offices, and the other is stored outside of London. If either server is damaged in a serious accident, the other remains to keep the content safe.

Scalability is important, as we are creating millions of images during the pilot project, and will create up to 30 million images over the longer term as we digitise the Wellcome Library. All the systems that are in place must be able to increase capacity - both in terms of hardware and processing software. The system we selected is scalable – we simply need to add storage “bricks” and “racks” (additional hardware), and processing units to manage that additional hardware, as our data store grows over time.

Speed of access is also key. To keep our storage footprint as small as possible, we are using one file as both the master archive file (preserved in the long term, and backed up to the WORM) and as the dissemination file. Our DAM, SDB, will make copies of the archived files available to the front-end delivery system as the user requests images via the Wellcome Digital Library (or via back-end administrative systems), and this must be handled as quickly as possible – something RAID5 is particularly well suited for.