Technical Metadata Call 2015-04-15
Time: 11am PDT / 2pm EDT
Call-In Info: Google Hangout: https://plus.google.com/hangouts/_/event/cpt6do5qc6l6mt4g853rqolk5tk
Moderator: Aaron Coburn (Amherst College)
Notetaker: Justin Simpson
Attendees:
- Former user (Deleted) (Amherst College)
- Nick Ruest (York University)
- Sharon Farnel (University of Alberta)
- Juliet Hardesty (Artefactual Systems)
Agenda:
- Subgroup review
- introductions
- meeting schedule
- Review subgroup goals
- Technical metadata profile (and how it fits into PCDM)
- Examples of metadata properties for various user stories (image, manuscript, audio, video, dataset, etc)
- Categories of tools (see the list here: Technical Metadata Working Group)
- Mapping of tool output to RDF
- Decide on Next Steps
Notes:
- a. Introductions were completed.
- b. Schedule - we agreed to hold a regular weekly call on Wednesdays at 2pm eastern for the rest of April, and to keep the option of extra calls open on an ad hoc basis. The motivation is to have something to provide to the people working on the Sufia code sprint in May. We plan to work by email and in Github pull requests/issues between calls.
- We agreed on a set of initial goals:
- a) Define a set of core rdf predicates that apply to all types of files, and express this as an rdfs/owl schema. Start with this baseline and convert it to rdfs.
- b) Come up with a suggested mapping between FITS xml and these core predicates (we presume this is what the Sufia group will want).
- c) Define a set of rdf predicates for each class of File Formats, and add these sets to the schema one at a time.
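To illustrate the kind of mapping between FITS xml and core predicates described in the goals above, here is a minimal Python sketch. It is not the group's mapping: the `http://example.org/techmd#` namespace, the predicate names (`formatName`, `mimeType`, `fileSize`, `md5Checksum`), and the trimmed sample document are all assumptions made for illustration. Real FITS output contains many more elements.

```python
import xml.etree.ElementTree as ET

# Namespace used by FITS output documents.
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hypothetical namespace and predicate names, invented for this sketch.
TECHMD = "http://example.org/techmd#"

# (predicate local name, path to the FITS element, attribute or None for text)
CORE_MAPPING = [
    ("formatName", "fits:identification/fits:identity", "format"),
    ("mimeType", "fits:identification/fits:identity", "mimetype"),
    ("fileSize", "fits:fileinfo/fits:size", None),
    ("md5Checksum", "fits:fileinfo/fits:md5checksum", None),
]

# Trimmed, made-up sample in the shape of FITS output.
SAMPLE_FITS = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="Portable Document Format" mimetype="application/pdf"/>
  </identification>
  <fileinfo>
    <size>20480</size>
    <md5checksum>d41d8cd98f00b204e9800998ecf8427e</md5checksum>
  </fileinfo>
</fits>
"""


def fits_to_triples(fits_xml, file_uri):
    """Return (subject, predicate, literal) tuples for the mapped values."""
    root = ET.fromstring(fits_xml)
    triples = []
    for local_name, path, attribute in CORE_MAPPING:
        element = root.find(path, FITS_NS)
        if element is None:
            continue  # the value is simply absent; nothing is asserted
        if attribute:
            value = element.get(attribute)
        else:
            value = (element.text or "").strip()
        if value:
            triples.append((file_uri, TECHMD + local_name, value))
    return triples


def to_ntriples(triples):
    """Serialize the tuples as N-Triples lines with plain string literals."""
    lines = []
    for subject, predicate, literal in triples:
        escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(fits_to_triples(SAMPLE_FITS, "http://example.org/files/1")))
```

Keeping the mapping in a table (`CORE_MAPPING`) rather than in code means that sets of predicates for other classes of file formats could be added one at a time, as goal c) suggests.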
We discussed the File Format Types idea from Ben Armintor, which Ben and Aaron have been working on this week. We agreed that goal a) above should define the predicates for the Document class, and that goal c) can define predicates for one or more of the other classes listed there, such as Audio or Video.
We also discussed the list of tools here. We agreed to focus on defining predicates, rather than recommending specific tools. We will do a basic mapping from FITS to these predicates, but FITS is not necessarily the tool of choice for all classes of file formats. Later, we might look at some of these other tools and work on mappings between them and the properties defined by this group. We discussed the importance of being able to reproduce the characterization results, which implies the ability to record which tool was used to generate a given set of predicates, which version of that tool, and so on. Building on the Archivematica FPR, in rdf, was touched on as a possible way to do that (https://github.com/jhsimpson/fpr-rdf is an attempt to define a schema for describing preservation tools and the commands used to run them). We agreed that this comes later, in any case.
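As a rough illustration of recording which tool and version produced each value, the Python sketch below reads the `toolname` and `toolversion` attributes that FITS attaches to its output elements and keeps them next to each value. The `Assertion` class, the function names, and the tool names and versions in the sample are invented for this example. It does not use the fpr-rdf schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass

FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Made-up sample: tool names and versions are placeholders, not real output.
SAMPLE_FITS = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <fileinfo>
    <size toolname="ExampleToolA" toolversion="1.0">20480</size>
    <md5checksum toolname="ExampleToolB" toolversion="0.2">d41d8cd98f00b204e9800998ecf8427e</md5checksum>
  </fileinfo>
</fits>
"""


@dataclass(frozen=True)
class Assertion:
    """One characterization value plus the tool that produced it."""
    property_name: str
    value: str
    tool_name: str
    tool_version: str


def assertions_with_provenance(fits_xml):
    """Collect each fileinfo value together with its tool name and version."""
    root = ET.fromstring(fits_xml)
    fileinfo = root.find("fits:fileinfo", FITS_NS)
    if fileinfo is None:
        return []
    results = []
    for element in fileinfo:
        local_name = element.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        results.append(
            Assertion(
                property_name=local_name,
                value=(element.text or "").strip(),
                tool_name=element.get("toolname", "unknown"),
                tool_version=element.get("toolversion", "unknown"),
            )
        )
    return results


def group_by_tool(assertions):
    """Map (tool name, tool version) to the property names that tool produced."""
    grouped = defaultdict(list)
    for item in assertions:
        grouped[(item.tool_name, item.tool_version)].append(item.property_name)
    return dict(grouped)


if __name__ == "__main__":
    for tool, properties in group_by_tool(assertions_with_provenance(SAMPLE_FITS)).items():
        print(tool, properties)
```

Grouping by tool and version gives one record per tool run, which is roughly the unit that would need to be described to reproduce a characterization result.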
We also discussed marking some of the predicates as required vs recommended. For example, some institutions might want to be able to assert PREMIS compliance; how should they do that? We agreed that rdf is not like xml: you can't validate against a schema, and in rdf everything is optional. However, we could say that for each event in the list of PREMIS event types (checksum generation, characterization, etc) there is a predicate that we suggest you use to store the outcome of that event.
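One way to picture the "suggested predicate per event type" idea is a simple lookup table with a report of which suggestions have not been followed for a given file. The Python sketch below is only an illustration: the event names are the two examples from the notes, and the namespace and predicate names are invented. Since everything in rdf is optional, it reports gaps rather than rejecting anything.

```python
# Hypothetical namespace and predicate names, invented for this sketch.
TECHMD = "http://example.org/techmd#"

# Event type -> predicate suggested for storing the outcome of that event.
SUGGESTED_PREDICATES = {
    "checksum generation": TECHMD + "md5Checksum",
    "characterization": TECHMD + "formatName",
}


def unrecorded_events(triples, file_uri):
    """Return the event types whose suggested predicate is absent for file_uri.

    This is a report, not validation: a missing predicate is not an error.
    """
    present = {predicate for subject, predicate, _ in triples if subject == file_uri}
    return sorted(
        event
        for event, predicate in SUGGESTED_PREDICATES.items()
        if predicate not in present
    )


FILE_URI = "http://example.org/files/1"

# Only the checksum outcome has been recorded for this made-up file.
sample_triples = [
    (FILE_URI, TECHMD + "md5Checksum", "d41d8cd98f00b204e9800998ecf8427e"),
]

if __name__ == "__main__":
    print(unrecorded_events(sample_triples, FILE_URI))
```

An institution that wants to make a compliance statement could run a report like this over its own data and decide for itself which gaps matter.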
- Next Steps - Nick created a github repo that we will use to house sample FITS output for various classes of file formats, and to work on the rdfs that defines the predicates for each class. That repo is here. We talked about how to move code into the actual PCDM organization when we are ready. Aaron mentioned that there is a Fedora tech call tomorrow with an agenda item about deciding who the committers to the duraspace/pcdm github repo are, so this issue will likely be resolved before our meeting next week. In the meantime, we will work in Nick's personal github repo and later either transfer it to pcdm or open PRs against pcdm.
Action Items
Nick to create a github repo (done).
Justin to type up meeting minutes (done).
Justin to email the hydra lists soliciting feedback/help on working in Nick's github repo.
Next call: 2015-04-22