KBDK Use Case
Introduction
This section presents the preservation architecture of the KBDK Hydra project.
KBDK Hydra architecture
The architecture consists of three platforms:
- A Hydra dissemination platform called Bifrost.
- A Hydra management platform called Valhal.
- A preservation platform called Yggdrasil.
Data and metadata are ingested into Valhal; after processing, they are distributed to Yggdrasil and Bifrost.
Not all data goes to both platforms: some data must not be published (e.g. material under embargo) and is therefore not sent to Bifrost for dissemination, while other data is not intended for preservation and is therefore not sent to Yggdrasil.
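The routing rule above can be sketched in Java as follows. This is a minimal illustration, not Valhal's code (Valhal is written in Ruby); it assumes each ingested object carries two boolean flags, `embargoed` and `preserve`. These flags and the names `IngestedObject`, `Target` and `IngestRouter` are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical target platforms for an ingested object.
enum Target { BIFROST, YGGDRASIL }

// Hypothetical ingested object; both flags are illustrative assumptions.
record IngestedObject(String id, boolean embargoed, boolean preserve) { }

final class IngestRouter {
    private IngestRouter() { }

    // Returns the platforms that should receive the object after processing in Valhal.
    static Set<Target> route(IngestedObject obj) {
        Set<Target> targets = EnumSet.noneOf(Target.class);
        if (!obj.embargoed()) {
            // Embargoed data must not be published, so it is never sent to Bifrost.
            targets.add(Target.BIFROST);
        }
        if (obj.preserve()) {
            // Only data intended for preservation is sent to Yggdrasil.
            targets.add(Target.YGGDRASIL);
        }
        return targets;
    }
}
```

With these rules, an embargoed object marked for preservation is routed to Yggdrasil only, while a publishable object that is not intended for preservation goes to Bifrost only.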
The preservation platform targets long-term preservation of data and metadata (more than 50 years), so it is responsible for active bit preservation as well as functional/logical preservation.
KBDK Digital Preservation architecture
It consists of:
- A preservation platform (Yggdrasil)
- A management platform (Valhal)
- KB-Bitrepository
Valhal contains two preservation-oriented services (the green boxes in the figure above): a simple file identification service and a characterization service. The characterization service uses hydra-file_characterization, the Hydra gem that wraps FITS.
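A simple file identification service can be as small as a lookup of well-known leading byte signatures (magic numbers). The sketch below is an assumption about what such a service might do, not Valhal's implementation; the class name `SimpleIdentifier` and the chosen subset of formats are hypothetical. Characterization through FITS goes much further, combining several identification and extraction tools.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class SimpleIdentifier {
    // Leading byte signatures for a few well-known formats (illustrative subset).
    private static final List<Map.Entry<byte[], String>> SIGNATURES = List.of(
            Map.entry(new byte[] {'%', 'P', 'D', 'F', '-'}, "application/pdf"),
            Map.entry(new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'}, "image/png"),
            Map.entry(new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}, "image/jpeg"),
            Map.entry(new byte[] {'P', 'K', 0x03, 0x04}, "application/zip"));

    private SimpleIdentifier() { }

    // Returns the MIME type whose signature matches the start of the content,
    // or application/octet-stream when nothing matches.
    static String identify(byte[] content) {
        for (Map.Entry<byte[], String> entry : SIGNATURES) {
            byte[] signature = entry.getKey();
            if (content.length >= signature.length
                    && Arrays.equals(content, 0, signature.length, signature, 0, signature.length)) {
                return entry.getValue();
            }
        }
        return "application/octet-stream";
    }
}
```

Signature matching alone cannot tell apart formats that share a container (for example, many office formats are ZIP files), which is one reason a separate characterization step is useful.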
The Valhal platform also holds the user interface for digital preservation operations. From this interface, users can start the preservation workflow and set the level and confidentiality of long-term bit preservation.
Yggdrasil is a service-oriented platform that does not persist data or metadata; instead, it processes the data and metadata it receives from Valhal and from the Bitrepository.
When Yggdrasil receives a preservation request (along with data and metadata) from Valhal, it structures the metadata in METS files and packs the data and the METS files into a WARC file. It then sends the WARC file to the KB-Bitrepository. Every state of the process is reported back to Valhal while processing is under way.
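The packing step can be sketched as writing WARC `resource` records that follow the WARC/1.0 record layout: a version line, header fields, a blank line, the content block and two closing CRLFs. This is a hedged illustration, not Yggdrasil's code. It assumes the METS document has already been generated, and the `urn:x-kbdk:` target URIs and the class name `WarcPacker` are hypothetical. Validation and the upload to the KB-Bitrepository are omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.UUID;

final class WarcPacker {
    private static final String CRLF = "\r\n";

    private WarcPacker() { }

    // Writes one WARC/1.0 "resource" record: version line, header fields,
    // a blank line, the content block, and two CRLFs closing the record.
    static void writeResource(OutputStream out, String targetUri, String contentType,
                              byte[] block, Instant date) throws IOException {
        String header = "WARC/1.0" + CRLF
                + "WARC-Type: resource" + CRLF
                + "WARC-Record-ID: <urn:uuid:" + UUID.randomUUID() + ">" + CRLF
                + "WARC-Date: " + date.truncatedTo(ChronoUnit.SECONDS) + CRLF
                + "WARC-Target-URI: " + targetUri + CRLF
                + "Content-Type: " + contentType + CRLF
                + "Content-Length: " + block.length + CRLF
                + CRLF;
        out.write(header.getBytes(StandardCharsets.UTF_8));
        out.write(block);
        out.write((CRLF + CRLF).getBytes(StandardCharsets.UTF_8));
    }

    // Packs a METS document and one data file into a single in-memory WARC file.
    // The "urn:x-kbdk:" target URIs are hypothetical placeholders.
    static byte[] pack(String objectId, byte[] mets, byte[] data, String dataType) {
        ByteArrayOutputStream warc = new ByteArrayOutputStream();
        Instant now = Instant.now();
        try {
            writeResource(warc, "urn:x-kbdk:" + objectId + ":mets", "text/xml", mets, now);
            writeResource(warc, "urn:x-kbdk:" + objectId + ":data", dataType, data, now);
        } catch (IOException e) {
            // ByteArrayOutputStream does not fail, but OutputStream declares IOException.
            throw new UncheckedIOException(e);
        }
        return warc.toByteArray();
    }
}
```

A production packer would typically also write a `warcinfo` record and may gzip each record; both are left out here to keep the record structure visible.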
When Yggdrasil receives a restore request, it runs the reverse of this process, restoring both data and metadata into Valhal. An import request from Valhal to Yggdrasil, by contrast, restores only the data, not the metadata.
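The difference between the request types comes down to what ends up back in Valhal. The sketch below encodes that as a simple lookup; the names `RequestKind`, `Content` and `RequestSemantics` are hypothetical and do not come from Valhal or Yggdrasil.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical request kinds that Valhal sends to Yggdrasil.
enum RequestKind { PRESERVE, RESTORE, IMPORT }

// The kinds of content Yggdrasil can hand back to Valhal.
enum Content { DATA, METADATA }

final class RequestSemantics {
    private RequestSemantics() { }

    // What ends up back in Valhal once Yggdrasil has processed a request.
    static Set<Content> returnedToValhal(RequestKind kind) {
        return switch (kind) {
            // Preservation sends content to the Bitrepository; only states come back.
            case PRESERVE -> EnumSet.noneOf(Content.class);
            // Restore reverses preservation: both data and metadata are restored.
            case RESTORE -> EnumSet.of(Content.DATA, Content.METADATA);
            // Import restores the data only, not the metadata.
            case IMPORT -> EnumSet.of(Content.DATA);
        };
    }
}
```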
Of the services shown as green boxes in the figure, the following are implemented at the time of writing: identification, characterization, METS structuring, WARC packing, validation (METS and WARC) and Bitrepository integration.
Metadata and state messages are exchanged between Yggdrasil and Valhal through a RabbitMQ message broker, while data is exchanged through Valhal's REST interface.
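A state message could be as small as an object identifier, a state and a timestamp. The sketch below serializes such a message as JSON using only the standard library; the state names, field names and JSON layout are assumptions, not the actual message format used between Yggdrasil and Valhal. Publishing the body to the broker (for instance with `Channel.basicPublish` from the RabbitMQ Java client) is left out so that the example runs without external dependencies.

```java
import java.time.Instant;

// Hypothetical preservation states reported from Yggdrasil to Valhal.
enum PreservationState { RECEIVED, METS_CREATED, WARC_PACKED, SENT_TO_BITREPOSITORY, FAILED }

// A hypothetical state message; field names are illustrative only.
record StateMessage(String objectId, PreservationState state, Instant timestamp) {

    // Serializes the message as a small JSON object to use as the message body.
    String toJson() {
        return "{\"id\":\"" + escape(objectId) + "\",\"state\":\"" + state.name()
                + "\",\"date\":\"" + timestamp + "\"}";
    }

    // Escapes quotes, backslashes and control characters, as JSON strings require.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }
}
```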
The Valhal platform is implemented in Ruby and the Yggdrasil platform is implemented in Java.
KB-BitRepository
The BitRepository provides an infrastructure for distributed bit preservation.
It consists of pillars that store data, clients that deliver and retrieve data, services that verify data integrity, and a message bus protocol for communication.
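Integrity verification can be sketched as comparing the checksum each pillar reports for a file against the checksum recorded when the file was delivered. This is a simplified illustration, not the BitRepository protocol: the use of SHA-256, the class name `IntegrityCheck` and the map of pillar reports are assumptions.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class IntegrityCheck {
    private IntegrityCheck() { }

    // Hex-encoded SHA-256 digest of a file's content.
    static String sha256(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Returns the pillars whose reported checksum differs from the expected one.
    static Set<String> mismatchingPillars(String expected, Map<String, String> reportedByPillar) {
        Set<String> mismatching = new TreeSet<>();
        reportedByPillar.forEach((pillar, checksum) -> {
            if (!expected.equalsIgnoreCase(checksum)) {
                mismatching.add(pillar);
            }
        });
        return mismatching;
    }
}
```

In a distributed setup, a pillar flagged this way could then be repaired from one of the pillars whose copy still matches.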
When composing the pillars of a distributed digital preservation setup, the following should be considered:
- Different hardware – so that a defect in one hardware model does not jeopardize all copies. This also involves using different storage media (e.g. hard disk, tape, optical disc).
- Different software platform – lowers the risk of data loss due to software errors and security issues.
- Different physical location – so that the data survives a disaster striking one location, e.g. an earthquake, flood, terrorist attack or war.
- Different organizations – so that no single group of technicians has access to, and is able to delete data from, all pillars.
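The four considerations above can be turned into a simple check of a proposed pillar composition. The sketch flags every dimension on which all pillars share the same value, meaning a single hardware defect, software error, local disaster or organization could affect every copy. The `Pillar` record, its fields, the class name `PillarDiversity` and the example values are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical description of one pillar along the four dimensions above.
record Pillar(String id, String hardware, String software,
              String location, String organization) { }

final class PillarDiversity {
    private PillarDiversity() { }

    // Returns the dimensions on which every pillar has the same value,
    // i.e. where one failure could affect all copies at once.
    // A set with fewer than two pillars has no diversity on any dimension.
    static List<String> uniformDimensions(List<Pillar> pillars) {
        Map<String, Function<Pillar, String>> dimensions = new LinkedHashMap<>();
        dimensions.put("hardware", Pillar::hardware);
        dimensions.put("software", Pillar::software);
        dimensions.put("location", Pillar::location);
        dimensions.put("organization", Pillar::organization);

        List<String> uniform = new ArrayList<>();
        dimensions.forEach((name, value) -> {
            if (pillars.stream().map(value).distinct().count() < 2) {
                uniform.add(name);
            }
        });
        return uniform;
    }
}
```

For two pillars on different hardware at different sites, but running the same software and operated by the same organization, the check returns `[software, organization]`.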