Preservation WG Notes, Hydra Connect 2, 2014
Digital Preservation Working Group (Dampeer)
Hydra Connect 2, October 3, 2014
Case Western Reserve University, Cleveland, OH
Facilitator: Chris Hedegaard
Note Taker: Russell Schelby
Digital Preservation Activities in a Hydra Environment
Business case for print-ready PDF
- two copies of each book published in Denmark
- Trying to eliminate paper-scanning steps and go straight to print-ready PDF
- Hydra system needs to ensure that files remain intact and openable in the future
- current research suggests that only a few files pass validation (based on open source tools)
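A minimal sketch of a first-pass "is this file plausibly an intact PDF" check, assuming only the well-known PDF conventions of a `%PDF-` header near the start and a `%%EOF` marker near the end. The function name and the 1024-byte windows are illustrative choices, not part of any Hydra gem, and this is not a substitute for a real validator such as JHOVE.

```python
def looks_like_pdf(data, window=1024):
    """Cheap structural check on raw bytes: header near the start, EOF marker near the end.

    Assumption: `window` bytes at each end is enough to find the markers.
    Passing this check does not mean the file is valid or renderable.
    """
    has_header = b"%PDF-" in data[:window]
    has_trailer = b"%%EOF" in data[-window:]
    return has_header and has_trailer


intact = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n%%EOF\n"
truncated = b"%PDF-1.4\n1 0 obj\n<< >>\n"

print(looks_like_pdf(intact))
print(looks_like_pdf(truncated))
```

A check like this only catches gross damage such as truncated uploads. Files that pass would still need full characterization and validation.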
Characterization/Validation Tools
- lots of documents coming in do not adhere to standards
- using hydra characterization tool, via British Library wrapper: FLINT
- not sure what the problems are, which is a problem
- running JHOVE frequently throws a "valid tree" error that should only be a warning
- would like to include FLINT in the Hydra characterization gem
- Royal Library is a partner in the OpenPlanets Foundation
- right now it is a LEP gem, would like to have some partners for development
- characterization is a preservation tool, but is also useful for confirming digitization
Responses:
- In the Archival Working Group, the Archivematica representative brought up a plan for a Preservation Policy Repository.
- FITS doesn’t get updated frequently enough; it wraps lots of tools
- OpenPlanets is working towards updating the tools
- OpenPlanets is not just European
- Yale would like to be able to select which FITS tools are being run against files
- FLINT wrapper has different versions: command line, GUI version
- PDF preservation: does it make sense to lower the level of the PDF, say to PostScript?
- PDF/A would be another method for restricting, but there is still a great deal of latitude
- Does PDF/A-3 open the door for
- Print-ready PDF will be very self-contained; won’t have much control over incoming PDF; will transform into PDF/A and keep both copies.
- Emulation focus - keep original copy, try and recreate the best environment for viewing
Other Use Cases
- Video storage: moving the files is so cumbersome that checks are needed for long-term storage
- has the RDF group been tackling PREMIS? Not heard in these meetings, just vocabularies
- PROV vs. PREMIS? PROV is a W3C standard for provenance, action based, possibly a superset of PREMIS
- PREMIS will offer an environment, much more expressive than a field
- video checks right now are just an MD5 check across several digitization efforts; discrepancies are frequent
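The MD5 comparison across copies mentioned above could be sketched roughly as follows. This is an illustration only: the function names are hypothetical, and it uses Python's standard `hashlib`, reading in chunks so large video files are not loaded into memory at once.

```python
import hashlib
from collections import defaultdict


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_copies_by_md5(paths):
    """Map each digest to the copies that share it.

    More than one key in the result means the copies disagree.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of(path)].append(str(path))
    return dict(groups)
```

Calling `group_copies_by_md5` on the paths of all copies of one video and checking whether the result has more than one key flags a discrepancy. It does not say which copy is the good one.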
Notes
- are you scheduling fixity checks? It is possible; MoMA has a command line fixity checking script that runs against Archivematica
- Royal Library runs periodic checks of its data pillars, checking the current run's checksums against the previous run's (bit-level preservation)
- File Format migrations? yes
- Experience with Plato? It has evolved since it was last used
- Plato is a preservation planning tool: a checklist and decision-support process, with links to and decision trees for tools and formats
- Workflow: use Plato, register with Preservation Task Registry, Hydra runs these tasks
- OpenPlanets project was focused on migration and tools; SCAPE project was needed to help scale; didn’t have many use cases for migration
- Should characterization happen in Hydra or outside? Should all technical metadata be kept together, e.g. for a reformatting 20 years on to fix an old reformat?
- How is technical metadata stored? METS? FITS? Would there be a discovery system in archives for searching on technical metadata? Penn State may have put FITS data into a metadata stream. Code may have been revoked
- What content standards should there be for technical metadata? Different formats will have different requirements
- a really bad pdf could spit out hundreds of errors, wouldn’t be useful
- does Royal Library track archival through Hydra? no, but that would be a goal
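The idea of checking one fixity run against the previous one can be sketched as a comparison of two checksum manifests. This assumes each run produces a simple `{path: checksum}` mapping; the function name and sample data are hypothetical, and it is not a description of the Royal Library's actual implementation.

```python
def compare_fixity_runs(previous, current):
    """Compare two {path: checksum} manifests from consecutive fixity runs."""
    return {
        # present in both runs, but the checksum no longer matches
        "changed": sorted(p for p in previous if p in current and previous[p] != current[p]),
        # present last time, absent now
        "missing": sorted(p for p in previous if p not in current),
        # not seen in the previous run
        "new": sorted(p for p in current if p not in previous),
    }


previous_run = {"a.pdf": "111", "b.mov": "222", "c.tif": "333"}
current_run = {"a.pdf": "111", "b.mov": "999", "d.wav": "444"}

report = compare_fixity_runs(previous_run, current_run)
print(report)
# {'changed': ['b.mov'], 'missing': ['c.tif'], 'new': ['d.wav']}
```

Anything under "changed" or "missing" would need follow-up, for example repairing from another copy. How that is decided is a policy question the sketch does not address.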
Next Steps?
Should this group update the Hydra Characterization Gem?
- what is it? a wrapper for characterization tools
- has a generic wrapper for your own tools
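To make the "generic wrapper" idea concrete, here is a rough sketch of a registry that runs a selectable subset of tools against a file, which also touches on the earlier request to choose which tools are run. Everything here is hypothetical: `CharacterizationRunner` and the two stand-in tools are invented for illustration and do not reflect the actual API of the Hydra characterization gem, FITS, or FLINT.

```python
import os


class CharacterizationRunner:
    """Hypothetical registry that wraps characterization tools behind one call."""

    def __init__(self):
        self._tools = {}

    def register(self, name, func):
        """Register a callable that takes a file path and returns a dict of findings."""
        self._tools[name] = func

    def run(self, path, only=None):
        """Run the selected tools (all by default), collecting output or error per tool."""
        names = list(self._tools) if only is None else list(only)
        results = {}
        for name in names:
            if name not in self._tools:
                results[name] = {"error": "tool not registered"}
                continue
            try:
                results[name] = {"output": self._tools[name](path)}
            except Exception as exc:  # one failing tool should not stop the others
                results[name] = {"error": str(exc)}
        return results


# Stand-in "tools"; a real setup would wrap external programs instead.
def file_size_tool(path):
    return {"bytes": os.path.getsize(path)}


def extension_tool(path):
    return {"extension": os.path.splitext(path)[1].lower()}
```

Registering both tools and calling `run(path, only=["extension"])` would run just the one tool. Keeping errors per tool means a single noisy validator does not hide the results of the others.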
Reconcile characterization and validation gems?
- Penn State is running FITS via ArchiveSphere, how to genericize
- e.g. Exif, MediaInfo
- Avalon looking at changing their system, perhaps they could change
- FIDO rather than FITS for PRONOM IDs; FITS uses brute force, i.e. runs all the tests
- institutions could share tool sets
- how much of the metadata is stored in the repository vs. archive
- Some institutions would like to have the ability to store information from archives in repository
Duplication/fixity across copies
- informed by best practices, make sure all the copies are intact and agree
- this gets into preservation policies; i.e. different levels of data require different levels of care
- there is still room for developing these stories & policies
- different sets of pillars based on analog availability, access rights, etc.
Interest Group vs. Working Group
- Interest Group is more informal
- Working Group has more deliverables, formal agreements
- looking at User Stories in wiki would be a good way to look at moving forward
- Looks like we should change to an Interest Group, schedule monthly meetings
- hydra-community might be best route
- Adjust as necessary