Preservation WG Notes, Hydra Connect 2, 2014

Digital Preservation Working Group (Dampeer)

Hydra Connect 2, October 3, 2014
Case Western Reserve University, Cleveland, OH

Facilitator: Chris Hedegaard
Note Taker: Russell Schelby

Digital Preservation Activities in a Hydra Environment

Business case for print-ready PDF
  • two copies of each book published in Denmark
  • trying to eliminate the paper-scanning step and go straight to the print-ready PDF
  • the Hydra system needs to ensure that files remain intact and openable in the future
  • current research suggests that only a few files pass validation (using open source tools)
Characterization/Validation Tools
  • many incoming documents do not adhere to standards
  • using the Hydra characterization tool via the British Library's wrapper, FLINT
  • it is often unclear what the actual problems are, which is itself a problem
  • JHOVE frequently throws a "valid tree" error that should be a warning
  • would like to include FLINT in the Hydra characterization gem
  • the Royal Library is a partner in the Open Planets Foundation
  • right now it is a LEP gem; would like some partners for development
  • characterization is a preservation tool, but it is also useful for confirming digitization output
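To illustrate the gap between "openable" and "valid", here is a minimal sketch of a cheap structural pre-check that separates hard errors from warnings, the distinction the group wanted from JHOVE's output. It only looks for a `%PDF-` header and a final `%%EOF` marker; it is not FLINT, JHOVE, or the Hydra gem's API, and `quick_pdf_check` and `CheckResult` are hypothetical names. Real validation still needs a full PDF parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """Findings split by severity, so minor issues don't fail a file outright."""
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def quick_pdf_check(data: bytes) -> CheckResult:
    """Cheap structural pre-check; NOT a validator (no xref/object parsing)."""
    result = CheckResult()
    if not data.startswith(b"%PDF-"):
        result.errors.append("missing %PDF- header at offset 0")
    eof = data.rfind(b"%%EOF")
    if eof == -1:
        # No end-of-file marker usually means a truncated transfer.
        result.errors.append("no %%EOF marker; file may be truncated")
    elif data[eof + len(b"%%EOF"):].strip():
        # Junk after the last marker is common and most readers tolerate it,
        # so it is recorded as a warning rather than an error.
        result.warnings.append("trailing bytes after final %%EOF")
    return result
```

Keeping severities separate is one way to avoid the "hundreds of errors" problem raised later: a pipeline can reject on `errors` and merely log `warnings`.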

Responses:

  • In the Archival Working Group, the Archivematica representative brought up a plan for a Preservation Policy Repository.
  • FITS is not updated frequently enough; it bundles lots of tools
  • Open Planets is working toward updating the tools
  • Open Planets is not just European
  • Yale would like to be able to select which FITS tools are run against files
  • the FLINT wrapper comes in different versions: command-line and GUI
  • PDF preservation: does it make sense to lower the level of the PDF, say to PostScript?
  • PDF/A would be another way of restricting the format, but it still allows a great deal of latitude
  • Does PDF/A-3 open the door for
  • print-ready PDFs will be very self-contained, but we won’t have much control over incoming PDFs; the plan is to transform them into PDF/A and keep both copies
  • emulation focus: keep the original copy and try to recreate the best environment for viewing it
Other Use Cases
  • video storage: moving files is so cumbersome; checks are needed for long-term storage
  • has the RDF group been tackling PREMIS? Not discussed in these meetings, just vocabularies
  • PROV vs. PREMIS? PROV is the W3C standard for provenance; it is action-based and possibly a superset of PREMIS
  • PREMIS will offer an environment, much more expressive than a single field
  • video checks right now: just an MD5 comparison across several digitization efforts, with frequent discrepancies
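The video check described above (one MD5 per copy, compared across digitization efforts) could look roughly like this minimal sketch. The function names and copy labels are hypothetical; the files are hashed in chunks because video masters are often too large to read into memory at once.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest, read in chunks so large video files aren't loaded whole."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(copies: dict[str, Path]) -> dict[str, list[str]]:
    """Group copy labels by digest; more than one group signals a discrepancy."""
    groups: dict[str, list[str]] = defaultdict(list)
    for label, path in sorted(copies.items()):
        groups[md5_of(path)].append(label)
    return dict(groups)
```

A mismatch only says that the copies differ, not which one is right; arbitrating needs a third copy or a checksum recorded at ingest.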

Notes

  • are you scheduling fixity checks? It is possible; MoMA has a command-line fixity-checking script that runs against Archivematica
  • the Royal Library runs periodic checks of its data pillars, comparing checksums from the current run against the previous run (bit-level preservation)
  • file format migrations? Yes
  • experience with Plato? It has evolved since we used it
  • Plato is a preservation planning tool: a checklist/support process with links to, and decision trees for, tools and formats
  • workflow: use Plato, register with the Preservation Task Registry, and Hydra runs these tasks
  • the Open Planets project was focused on migration and tools; the SCAPE project was needed to help it scale; there weren’t many use cases for migration
  • should characterization happen inside Hydra or outside it? Should all technical metadata be kept together, e.g. for a reformatting 20 years on that fixes an old reformat?
  • how is technical metadata stored? METS? FITS? Would there be a discovery system in archives for searching on technical metadata? Penn State may have put FITS output into a metadata stream; the code may since have been removed
  • what content standards should there be for technical metadata? Different formats will have different requirements
  • a really bad PDF could spit out hundreds of errors, which wouldn’t be useful
  • does the Royal Library track archiving through Hydra? No, but that would be a goal
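The periodic bit-level check described for the Royal Library (compare this run's checksums against the last run's) can be sketched as a manifest diff. This is an illustration under assumptions, not the Royal Library's or MoMA's code: the JSON manifest file, SHA-256 (rather than MD5), and all function names are hypothetical choices, and scheduling (e.g. from cron) is left outside the script.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def load_manifest(path: Path) -> dict[str, str]:
    """Previous run's manifest; empty on the very first run."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_manifest(manifest: dict[str, str], path: Path) -> None:
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def diff_manifests(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare this run against the last one."""
    return {
        "changed": sorted(k for k in previous.keys() & current.keys() if previous[k] != current[k]),
        "missing": sorted(previous.keys() - current.keys()),
        "added": sorted(current.keys() - previous.keys()),
    }
```

In such a setup, "changed" and "missing" entries would warrant an alert, while "added" files are usually just new ingests.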

Next Steps?

Should this group update the Hydra Characterization Gem?
  • what is it? a wrapper for characterization tools
  • has a generic wrapper for your own tools
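A generic wrapper with pluggable tools, which would also cover Yale's wish to choose which tools run, might be structured like this minimal sketch. It is not the hydra-file_characterization gem's interface (that gem is Ruby); `register`, `characterize`, and the three toy tools are hypothetical stand-ins for wrappers around FITS, FLINT, ExifTool and similar.

```python
import hashlib
from typing import Callable, Dict, List, Optional

# A "tool" takes file bytes and returns flat technical metadata.
Tool = Callable[[bytes], Dict[str, str]]
REGISTRY: Dict[str, Tool] = {}


def register(name: str) -> Callable[[Tool], Tool]:
    """Decorator that plugs a locally written tool into the registry."""
    def decorator(fn: Tool) -> Tool:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("size")
def size_tool(data: bytes) -> Dict[str, str]:
    return {"size_bytes": str(len(data))}


@register("md5")
def md5_tool(data: bytes) -> Dict[str, str]:
    return {"md5": hashlib.md5(data).hexdigest()}


@register("pdf_header")
def pdf_header_tool(data: bytes) -> Dict[str, str]:
    return {"looks_like_pdf": "true" if data.startswith(b"%PDF-") else "false"}


def characterize(data: bytes, selected: Optional[List[str]] = None) -> Dict[str, Dict[str, str]]:
    """Run only the selected tools (all registered tools if None)."""
    names = list(REGISTRY) if selected is None else selected
    unknown = [n for n in names if n not in REGISTRY]
    if unknown:
        raise KeyError(f"unknown tools: {unknown}")
    return {name: REGISTRY[name](data) for name in names}
```

For example, `characterize(data, ["size"])` runs a single tool. Keying each result by tool name keeps the provenance of every value, which matters when tools disagree.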
Reconcile characterization and validation gems?
  • Penn State is running FITS via ArchiveSphere; how to make this generic?
  • e.g. Exif, MediaInfo
  • Avalon is looking at changing their system; perhaps they could change over as well
  • FIDO rather than FITS for PRONOM IDs; FITS uses brute force, i.e. runs all the tests
  • institutions could share tool sets
  • how much of the metadata is stored in the repository vs. the archive?
  • some institutions would like to be able to store information from archives in the repository
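The FIDO-vs-FITS point is partly about control flow: stop at the first signature that matches, or evaluate everything. This toy sketch contrasts the two using a handful of well-known leading-byte signatures. It is only an illustration: FIDO itself matches PRONOM signatures and reports PRONOM IDs, FITS runs several whole tools and reconciles their answers, and the MIME types here are stand-ins for real PRONOM IDs.

```python
from typing import List, Optional, Tuple

# Leading-byte signatures; stand-ins for PRONOM signatures, which are far richer.
SIGNATURES: List[Tuple[str, bytes]] = [
    ("application/pdf", b"%PDF-"),
    ("image/png", b"\x89PNG\r\n\x1a\n"),
    ("image/jpeg", b"\xff\xd8\xff"),
    ("image/tiff", b"II*\x00"),  # little-endian TIFF
    ("image/tiff", b"MM\x00*"),  # big-endian TIFF
]


def identify_first_match(header: bytes) -> Tuple[Optional[str], int]:
    """Stop at the first matching signature; also report how many checks ran."""
    checks = 0
    for fmt, magic in SIGNATURES:
        checks += 1
        if header.startswith(magic):
            return fmt, checks
    return None, checks


def identify_brute_force(header: bytes) -> Tuple[List[str], int]:
    """Evaluate every signature regardless of earlier hits, then collect matches."""
    matches = [fmt for fmt, magic in SIGNATURES if header.startswith(magic)]
    return matches, len(SIGNATURES)
```

With five signatures the cost difference is trivial; with many wrapped tools each parsing the whole file, it is what makes brute force expensive. The trade-off is that brute force yields several opinions that can be cross-checked.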
Duplication/fixity across copies
  • informed by best practices, make sure all the copies are intact and consistent
  • this gets into preservation policies; i.e. different levels of data require different levels of care
  • there is still room for developing these stories & policies
  • different sets of pillars based on analog availability, access rights, etc.
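One way to make "different levels of data require different levels of care" checkable is to encode each level as a small policy and audit objects against it. This is a hypothetical sketch: the level names, copy counts, and check intervals are invented for illustration and are not agreed group policy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PreservationPolicy:
    name: str
    min_copies: int            # how many independent pillars must hold the file
    fixity_interval_days: int  # maximum age of the last fixity check


# Hypothetical levels: content with a surviving analog original can be
# re-digitized, so it might justify fewer copies than born-digital content.
POLICIES: Dict[str, PreservationPolicy] = {
    "born-digital": PreservationPolicy("born-digital", min_copies=3, fixity_interval_days=30),
    "digitized-analog-kept": PreservationPolicy("digitized-analog-kept", min_copies=2, fixity_interval_days=90),
}


def audit(policy: PreservationPolicy, pillar_digests: Dict[str, str], days_since_check: int) -> List[str]:
    """Return policy violations for one object; an empty list means compliant."""
    problems = []
    if len(pillar_digests) < policy.min_copies:
        problems.append(f"only {len(pillar_digests)} of {policy.min_copies} required copies")
    if len(set(pillar_digests.values())) > 1:
        problems.append("copies disagree on checksum")
    if days_since_check > policy.fixity_interval_days:
        problems.append("fixity check overdue")
    return problems
```

Access rights, the other factor mentioned, could be added as a further field (for example, which pillars are permitted to hold restricted material).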
Interest Group vs. Working Group
  • an Interest Group is more informal
  • a Working Group has more deliverables and formal agreements
  • looking at the user stories in the wiki would be a good way to move forward
  • looks like we should change to an Interest Group and schedule monthly meetings
  • the hydra-community list might be the best route
  • Adjust as necessary