Preservation WG Notes, Hydra Connect 2, 2014
Digital Preservation Working Group (Dampeer)
Hydra Connect 2, October 3, 2014
Case Western Reserve University, Cleveland, OH
Facilitator: Chris Hedegaard
Note Taker: Russell Schelby
Digital Preservation Activities in a Hydra Environment
Business case for PDF ready print
- two copies of each book published in Denmark
- Trying to eliminate paper-scanning steps, go right to print-ready pdf
- Hydra system needs to ensure that files remain intact and openable in the future
- current research suggests that only a few files are passing validation (based on open source tools)
Characterization/Validation Tools
- lots of documents coming in do not adhere to standards
- using hydra characterization tool, via British Library wrapper: FLINT
- not sure what the problems are, which is a problem
- running JHOVE; it frequently throws a valid tree error, which should be a warning
- would like to include FLINT in the Hydra characterization gem
- Royal Library is a partner in the Open Planets Foundation
- right now it is a LEP gem, would like to have some partners for development
- characterization is a preservation tool, but is also used for digitization confirmation
Responses:
- In the Archival Working Group, the Archivematica representative brought up a plan for a Preservation Policy Repository.
- FITS doesn’t get updated frequently enough, lots of tools
- Open Planets is working towards updating the tools
- Open Planets is not just European
- Yale would like to be able to select which FITS tools are being run against files
- FLINT wrapper has different versions: command line, GUI version
- PDF preservation: does it make sense to lower the level of the PDF, say to PostScript?
- PDF/A would be another method for restricting, but there is still a great deal of latitude
- Does PDF/A-3 open the door for
- Print-ready PDF will be very self-contained, but there won’t be much control over incoming PDFs; will transform into PDF/A and keep both copies.
- Emulation focus - keep original copy, try and recreate the best environment for viewing
Other Use Cases
- Video storage: moving the files is so cumbersome that checks are needed for long-term storage
- Has the RDF work been tackling PREMIS? Not heard in these meetings, just vocabularies
- PROV vs PREMIS? PROV is the W3C standard for provenance, action based, possibly a superset of PREMIS
- PREMIS will offer an environment, much more expressive than a field
- Video checks right now: just an MD5 check against several digitization efforts. Discrepancies are frequent
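The MD5 comparison across copies mentioned above could be sketched roughly as follows. This is only an illustration, not a script any institution described in the session; the function names `md5_of` and `find_discrepancies` are hypothetical. Files are read in chunks because video files are large.

```python
import hashlib
from collections import defaultdict


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_discrepancies(copies):
    """Group copies of one object by digest (hypothetical helper).

    `copies` is an iterable of paths that are all supposed to hold the same
    content. Returns a dict mapping digest -> list of paths; more than one
    key means the copies disagree.
    """
    by_digest = defaultdict(list)
    for path in copies:
        by_digest[md5_of(path)].append(str(path))
    return dict(by_digest)
```

If `find_discrepancies` returns a single digest, all copies match; otherwise the grouping shows which copies agree with each other, which helps decide which one to investigate.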
Notes
- are you scheduling fixity checks? It is possible; MoMA has a command-line fixity checking script that runs against Archivematica
- Royal Library has periodic checks of data pillars: checks current runs against the last version’s runs and compares checksums (bit-level preservation)
- File Format migrations? yes
- Experience with Plato? It has evolved since it was last used;
- Plato is a preservation planning tool, checklist tool/support process, links to/decision trees for tools/formats
- Workflow: use Plato, register with Preservation Task Registry, Hydra runs these tasks
- Open Planets project was focused on migration and tools; SCAPE project was needed to help scale; didn’t have many use cases for migration
- Should characterization happen in Hydra or outside? Should all technical metadata be kept together, e.g. for a 20-year reformatting to fix an old reformat?
- How is technical metadata stored? METS? FITS? Would there be a discovery system in archives for searching on technical metadata? Penn State may have put FITS data into a metadata stream. Code may have been revoked
- What content standards should there be for technical metadata? Different formats will have different requirements
- a really bad pdf could spit out hundreds of errors, wouldn’t be useful
- does Royal Library track archival through Hydra? no, but that would be a goal
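The periodic check described in these notes, where a current run is compared against the previous run, can be illustrated with a small sketch. It assumes each run produces a manifest mapping a file path to its checksum; that manifest shape and the name `compare_runs` are assumptions for illustration, not the Royal Library's actual implementation.

```python
def compare_runs(previous, current):
    """Compare two checksum manifests (dicts of path -> checksum).

    Returns the paths whose checksum changed, the paths that disappeared,
    and the paths that are new in the current run.
    """
    changed = sorted(
        path for path in previous
        if path in current and previous[path] != current[path]
    )
    missing = sorted(path for path in previous if path not in current)
    added = sorted(path for path in current if path not in previous)
    return {"changed": changed, "missing": missing, "added": added}


previous_run = {"a.tif": "111", "b.tif": "222", "c.tif": "333"}
current_run = {"a.tif": "111", "b.tif": "999", "d.tif": "444"}
report = compare_runs(previous_run, current_run)
# report == {"changed": ["b.tif"], "missing": ["c.tif"], "added": ["d.tif"]}
```

Reporting only the differences keeps the output short, which also speaks to the concern above about tools that produce hundreds of messages nobody reads.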
Next Steps?
Should this group update the Hydra Characterization Gem?
- what is it? a wrapper for characterization tools
- has a generic wrapper for your own tools
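To make the idea of a generic wrapper concrete, here is a minimal sketch of a registry that lets an institution plug in its own tools and choose which ones run against a file (the kind of selection Yale asked for with FITS). It is written in Python for illustration and is not the API of the Hydra characterization gem; `ToolRegistry`, `command_runner` and their parameters are hypothetical names.

```python
import subprocess


class ToolRegistry:
    """Maps tool names to callables that take a file path and return text output."""

    def __init__(self):
        self._tools = {}

    def register(self, name, runner):
        """Add a tool under `name`; `runner` is any callable taking a path."""
        self._tools[name] = runner

    def characterize(self, path, tool_names=None):
        """Run the selected tools (all registered tools by default) against `path`."""
        names = list(self._tools) if tool_names is None else list(tool_names)
        unknown = [name for name in names if name not in self._tools]
        if unknown:
            raise KeyError(f"unregistered tools: {unknown}")
        return {name: self._tools[name](path) for name in names}


def command_runner(argv_prefix):
    """Build a runner that appends the file path to a command line and captures stdout."""
    def run(path):
        completed = subprocess.run(
            [*argv_prefix, str(path)], capture_output=True, text=True, check=True
        )
        return completed.stdout
    return run
```

A local command-line tool would be registered with `registry.register("some-tool", command_runner([...]))`, where the actual command line depends on the local installation, and `registry.characterize(path, ["some-tool"])` would then run only that tool.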
Reconcile characterization and validation gems?
- Penn State is running FITS via ArchiveSphere; how to genericize?
- e.g. Exif, MediaInfo
- Avalon looking at changing their system, perhaps they could change
- FIDO rather than FITS for PRONOM IDs; FITS uses brute force, i.e. runs all the tests
- institutions could share tool sets
- how much of the metadata is stored in the repository vs. archive
- Some institutions would like to have the ability to store information from archives in repository
Duplication/fixity across copies
- informed by best practices, make sure all the copies are cool
- this gets into preservation policies; i.e. different levels of data require different levels of care
- there is still room for developing these stories & policies
- different sets of pillars based on analog availability, access rights, etc.
Interest Group vs. Working Group
- Interest Group is more informal
- Working Group has more deliverables, formal agreements
- looking at User Stories in wiki would be a good way to look at moving forward
- Looks like we should change to an Interest Group, schedule monthly meetings
- hydra-community might be the best route
- Adjust as necessary