Samvera Community Wiki
Preservation WG Notes, Hydra Connect 2, 2014
Digital Preservation Working Group (Dampeer)
Hydra Connect 2, October 3, 2014
Case Western Reserve University, Cleveland, OH
Facilitator: Chris Hedegaard
Note Taker: Russell Schelby
Digital Preservation Activities in a Hydra Environment
Business case for print-ready PDF
Two copies of each book published in Denmark
Trying to eliminate the paper-scanning step and go directly to print-ready PDF
The Hydra system needs to ensure that files remain intact and openable in the future
Current research suggests that only a few files pass validation (based on open-source tools)
Characterization/Validation Tools
Many incoming documents do not adhere to standards
Using a Hydra characterization tool via the British Library wrapper, FLINT
We are not sure what the problems are, which is itself a problem
Running JHOVE frequently throws a "valid tree" error that should be a warning
would like to include FLINT in the Hydra characterization gem
The Royal Library is a partner in the Open Planets Foundation
Right now it is a LEP gem; would like to have some partners for development
Characterization is a preservation tool, but it also serves to confirm digitization
Responses:
In the Archival Working Group, the Archivematica representative brought up a plan for a Preservation Policy Repository.
FITS doesn’t get updated frequently enough, and it wraps lots of tools
OpenPlanets is working towards updating the tools
OpenPlanets is not just European
Yale would like to be able to select which FITS tools are being run against files
The FLINT wrapper comes in different versions: command line and GUI
PDF preservation: does it make sense to lower the level of the PDF, say to PostScript?
PDF/A would be another method of restricting, but there is still a great deal of lateral
Does PDF/A-3 open the door for
Print-ready PDF will be very self-contained, but we won’t have much control over incoming PDFs; we will transform them into PDF/A and keep both copies.
Emulation focus: keep the original copy and try to recreate the best environment for viewing
Other Use Cases
Video storage: moving files is so cumbersome; checks are needed for long-term storage
Has the RDF group been tackling PREMIS? Not heard in these meetings, just vocabularies
PROV vs. PREMIS? PROV is the W3C standard for provenance, action-based, possibly a superset of PREMIS
PREMIS will offer an environment, much more expressive than a field
Video checks right now are just an MD5 check across several digitization efforts; discrepancies are frequent
Notes
Are you scheduling fixity checks? It is possible; MoMA has a command-line fixity-checking script that runs against Archivematica
The Royal Library runs periodic checks of its data pillars, comparing each run’s checksums against the previous run’s (bit-level preservation)
File format migrations? Yes
Experience with Plato? It has evolved since we used it
Plato is a preservation planning tool: a checklist tool and support process, with links to decision trees for tools and formats
Workflow: use Plato, register with the Preservation Task Registry, and Hydra runs these tasks
The OpenPlanets project was focused on migration and tools; the SCAPE project was needed to help scale; didn’t have many use cases for migration
Should characterization happen in Hydra or outside it? Should all technical metadata be kept together, e.g. for a reformatting 20 years later that fixes an old reformat?
How is technical metadata stored? METS? FITS? Would there be a discovery system in archives for searching on technical metadata? Penn State may have put FITS data into a metadata stream; that code may since have been removed
What content standards should there be for technical metadata? Different formats will have different requirements
A really bad PDF could spit out hundreds of errors, which wouldn’t be useful
Does the Royal Library track archiving through Hydra? No, but that would be a goal
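The bit-level check described above (re-computing checksums on each scheduled run and comparing them against the previous run's) can be sketched in a few lines of Python. This is a hypothetical illustration, not the Royal Library's or MoMA's actual script: it assumes the files sit under one local directory and that each run's manifest is saved for the next run. The function names are made up for this example, and MD5 is the default only because it is the checksum mentioned in these notes.

```python
import hashlib
from pathlib import Path


def checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large video files need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, algorithm: str = "md5") -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum for this run."""
    return {
        p.relative_to(root).as_posix(): checksum(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_runs(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare this run's manifest against the last run's (hypothetical report format)."""
    return {
        # Same path, different checksum: the bits have changed since the last run.
        "changed": sorted(k for k in previous.keys() & current.keys() if previous[k] != current[k]),
        # Present last run, gone now.
        "missing": sorted(previous.keys() - current.keys()),
        # Not seen before; should get a baseline checksum.
        "new": sorted(current.keys() - previous.keys()),
    }
```

Saving each manifest (for example with `json.dump`) between scheduled runs supplies the "last run" to compare against; any path reported as `changed` or `missing` would then need repair from another copy.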
Next Steps?
Should this group update the Hydra Characterization Gem?
What is it? A wrapper for characterization tools
It has a generic wrapper for your own tools
Reconcile the characterization and validation gems?
Penn State is running FITS via ArchiveSphere; how to genericize?
e.g. Exif, MediaInfo
Avalon is looking at changing their system, so perhaps they could change as well
FIDO rather than FITS for PRONOM IDs; FITS uses brute force, i.e. it runs all the tests
Institutions could share tool sets
How much of the metadata is stored in the repository vs. the archive?
Some institutions would like to have the ability to store information from archives in repository
Duplication/fixity across copies
Informed by best practices, make sure all the copies are intact
This gets into preservation policies, i.e. different levels of data require different levels of care
There is still room for developing these stories and policies
Different sets of pillars based on analog availability, access rights, etc.
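Checking duplicate copies against each other, as raised under duplication/fixity and in the video MD5 checks, can be sketched by hashing every copy of a file and flagging the ones that disagree with the majority. This is a minimal illustration under stated assumptions: each copy is a locally readable file (for example, one per storage pillar), the function names are hypothetical, and a simple majority vote is only one possible policy; as noted above, different levels of data may call for different rules.

```python
import hashlib
from collections import Counter
from pathlib import Path
from typing import Optional


def md5_of(path: Path) -> str:
    """MD5 of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def audit_copies(copies: list[Path]) -> tuple[Optional[str], list[Path]]:
    """Return the majority checksum and the copies that disagree with it.

    Expects at least one copy. If no checksum is held by a strict majority,
    return (None, copies) so a person can decide which copy is authoritative.
    """
    digests = {p: md5_of(p) for p in copies}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        return None, list(digests)
    return winner, [p for p, d in digests.items() if d != winner]
```

With three pillars, one corrupted copy is outvoted and reported; with only two disagreeing copies there is no majority, so both are returned for manual review rather than guessing.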
Interest Group vs. Working Group
An Interest Group is more informal
A Working Group has more deliverables and formal agreements
Looking at the user stories in the wiki would be a good way to move forward
Looks like we should change to an Interest Group and schedule monthly meetings
hydra-community might be the best route
Adjust as necessary