Hydra and digital preservation

Breakout session, Wednesday, Sept 23, 2015

Chris Awre, University of Hull, Moderator

(with thanks to Alicia Morris and Barbara Mathe for note-taking)

Sharing Experiences of using Hydra to integrate with a preservation environment

Trinity College Dublin (Digital Repository of Ireland)

  • Has developed a preservation policy document outlining infrastructure and organization issues impacting preservation management. 
  • They are not relying on Fedora for preservation
  • Using the Moab Gem for versioning (created at Stanford as a collection of methods for digital object versioning.  See Code4Lib article for more information, and note below)
  • Will do something with Premis.

Emory reported they are also using Premis.  But it looks as if Premis metadata is not included in the PCDM (see below). 

Stanford - They knew they wanted to process digital object version content and metadata and developed the Moab Gem.  It’s not part of Hydra in itself, but can be used within a Hydra environment. 

WGBH is working with Indiana University on HydraDam2.  They are keeping an eye on PCDM and how it fits with HydraDam2.  Things are moving quickly, and this is impacting how HydraDam2 is developed. They are using Cakebake to manage storage: an asynchronous storage connector within Fedora 4.  Ingest is a separate process.  It is NOT Hydra-based EXCEPT there are more ways for Fedora to act along these lines…e.g. predicates in RDF in Fedora.  Federated files. …Fedora will go across the system and put the objects in Fedora stored in hierarchical file system not in the RDF system in Hydra, node based in F4 compared to RDF.

At Indiana preservation happens within Fedora.  It’s not a Hydra based approach.  However, it would be to Indiana’s advantage to be able to interact with Fedora via Hydra for this purpose. 

At Hull there was an initial suggestion that a Hydra gem be created to handle digital preservation.  The experience has been that there may be external applications available to handle that functionality, and there is no wish to duplicate effort.  This informed their decision to explore Archivematica and how it can be integrated with repository workflows.

At the Royal Library in Denmark, they are integrating all preservation planning and management into Hydra and providing a curator application interface? They are seeking to be Premis compliant.

Penn State has been building up their ArchiveSphere Hydra head, based on ScholarSphere, which includes a number of digital preservation functions (see http://stewardship.psu.edu/2013/07/08/introducing-archivesphere/).

UCSD has made use of Chronopolis as a back-end shared preservation store (see https://libraries.ucsd.edu/chronopolis/).


Issues arising in discussion

It was mentioned that versioning could be an issue within the Hydra in a Box project.

Has PCDM been part of the preservation discussion?  We don’t think so.  Provenance hasn’t been modelled in the PCDM model – it’s been a set of properties added on.  Is this an issue to be addressed by the Metadata Interest Group?

Could provenance be a datastream to be added to the model? Should we identify critical provenance information (events) to be captured, such as: 

  • Has the object been versioned?
  • Has the metadata changed?

Capturing events such as these can act as triggers to re-export to a preservation archive.  Knowing whether an object has been versioned in the IR avoids reimporting the entire IR into the preservation archive.
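
As a rough illustration of the trigger idea above, here is a minimal Python sketch.  It is not Hydra, Fedora or Premis code: the `Event` record, the event names `versioned` and `metadata_changed`, and the `objects_to_reexport` function are all hypothetical, chosen only to mirror the two questions listed above.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical event types that should trigger a re-export (assumption,
# based on the two example questions in the notes).
TRIGGER_EVENTS = {"versioned", "metadata_changed"}


@dataclass(frozen=True)
class Event:
    """A hypothetical provenance event recorded against one object."""
    object_id: str
    event_type: str
    timestamp: datetime


def objects_to_reexport(events, last_export):
    """Return sorted ids of objects with a trigger event after last_export."""
    return sorted({
        e.object_id
        for e in events
        if e.event_type in TRIGGER_EVENTS and e.timestamp > last_export
    })
```

Only the objects returned would be sent to the preservation archive again, rather than the whole IR.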

There is a process in Fedora that will store Premis events for server side events. The events are recorded against the Fedora user that executed them, which may not be enough for preservation management requirements.

In thinking about digital preservation, to what extent should we be thinking about it at the Hydra level instead of the Fedora level?

  • At Trinity College Dublin, they didn’t choose Fedora because the underlying structure of Fedora was not well understood.
  • Many preservation librarians have decided to integrate their preservation management activities as part of the Hydra tools. Perhaps it’s easier to integrate external tools with Hydra?
  • To the extent that we need interfaces for curators to be using, Hydra is a better solution.
  • Question: what are the interfaces and what are they doing?  When a curator needs an interface to make a decision or respond to an event – Hydra is appropriate. (e.g.: fixity checking)
  • Penn State (ScholarSphere) has discussed some of these same issues, but hasn’t come up with solutions yet.

For large files, some systems have their own file-level fixity checksum capabilities.
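
To make the large-file point concrete, a minimal Python sketch of file-level fixity checking follows.  It uses only the standard `hashlib` module and reads the file in chunks so that a large file is never loaded into memory at once.  The function names `file_checksum` and `fixity_ok`, the choice of SHA-256 and the 1 MiB chunk size are assumptions for illustration, not something any of the systems discussed is known to use.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a hex digest of the file at path, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, expected_hex, algorithm="sha256"):
    """Return True if the file's current digest matches the stored one."""
    return file_checksum(path, algorithm) == expected_hex.lower()
```

The stored digest would be generated at ingest and compared on a schedule; where that comparison runs (Hydra, Fedora or the storage layer) is the open question in the notes.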

Good to have Fedora to provide Hydra with the interface…

Question: Is anyone planning to use consortial preservation services (DPN, APTrust, HathiTrust)?  Some are looking into it. APTrust is using Fedora for the metadata, but the content is stored elsewhere.

Need to avoid duplicating what’s out there already and work out what’s better re: sustainability, etc.

Explore Archivematica to see how that tool can be exploited. Hydra as a box in a bigger DP diagram. Focus on local mgmt. and specific needs and allow Archivematica to do its thing.  Systems that help with DP, not a DP system that exists in its own right. 

The discussion highlighted that there are two obvious patterns in the Hydra Community: those that want to utilize Hydra to handle preservation needs, and others that are using Hydra as part of a larger preservation management strategy.  The question is, how could this group satisfy the needs/desires of each group?

Islandora is excited about doing checksums and fixity.

Digital preservation is still an emerging field.  Given that, we have a community of practice.  That has as much value as the interfaces we are trying to provide.

There are a variety of ways to accomplish digital preservation management and it would be good to share the ways we are doing this and the types of services currently in use.  That may help tease out how and to what extent Hydra can be utilized to provide preservation management services.

Topics that the interest group should undertake

  • Talk about Premis and the events we would want to see defined in Hydra
  • Its connection with PCDM, to make sure that PCDM can take into account digital preservation needs
  • Finding the common actions and assessing at what layers those actions could be run, e.g., checksum generation and checking. That would help decide where in the service those actions should rest.
  • For those that are considering Hydra it would be helpful to be able to articulate how Hydra can help with satisfying digital preservation needs
  • Performing a gap analysis and creating a list of functional requirements for preservation needs and perhaps identifying costs.  The LIFE cost model could be used for this.
  • Share screen shots of systems in development