FileSets Working Group

Scope & Objectives  

This working group aims to extend PCDM to incorporate FileSet as a defined resource class for use in CurationConcerns, as part of the Hydra stack. This work will expand how FileSets can be used [1] so that the following use cases are satisfied:

  • AIC and Hydra-in-a-Box - A FileSet is formally defined in PCDM as an aggregation of Files derived from the same source, containing different formats or subsets of the same content (original, master, intermediate, access, thumbnail, etc.)

  • AIC and Hydra-in-a-Box - A FileSet becomes the designated place to represent digital content and can have descriptive metadata about that content directly attached to it; technical metadata are related to the individual Files

  • HydraDAM2 (WGBH and IU) - Technical metadata, as an explicit XML file, can be stored in the same FileSet as the File from which the technical metadata was produced

  • AIC and Hydra-in-a-Box - pcdmuse terms are used to designate the role of each File in a FileSet
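
The use cases above can be read as a small data model. The following is a minimal Python sketch of that reading, not the CurationConcerns implementation: the class names File and FileSet, their fields, and the helper files_with_use are hypothetical, and the namespace and the three pcdmuse terms shown are assumed to match the ontology under review [2].

```python
from dataclasses import dataclass, field

# Assumed namespace and terms of the pcdmuse ontology [2]; verify before relying on them.
PCDM_USE = "http://pcdm.org/use#"
ORIGINAL_FILE = PCDM_USE + "OriginalFile"
SERVICE_FILE = PCDM_USE + "ServiceFile"
THUMBNAIL_IMAGE = PCDM_USE + "ThumbnailImage"


@dataclass
class File:
    """One bitstream; technical metadata stays with the individual File."""
    name: str
    use: str  # pcdmuse term designating this File's role in the FileSet
    technical_metadata: dict = field(default_factory=dict)


@dataclass
class FileSet:
    """Files derived from the same source; descriptive metadata is attached here."""
    descriptive_metadata: dict = field(default_factory=dict)
    files: list = field(default_factory=list)

    def files_with_use(self, use):
        """Return the Files that play the given pcdmuse role."""
        return [f for f in self.files if f.use == use]


# Hypothetical example: three renditions of the same scanned source.
file_set = FileSet(
    descriptive_metadata={"title": "Example scan"},
    files=[
        File("scan.tif", ORIGINAL_FILE, {"mime_type": "image/tiff"}),
        File("scan.jpg", SERVICE_FILE, {"mime_type": "image/jpeg"}),
        File("scan_thumb.jpg", THUMBNAIL_IMAGE, {"mime_type": "image/jpeg"}),
    ],
)
```

In this sketch, file_set.files_with_use(SERVICE_FILE) selects the access copy by its role rather than by file name, which is the point of designating roles with pcdmuse terms.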

Deliverables & Timeframe

The initial goals of the working group are:

  • Review current pcdmuse ontology [2] and propose new classes if necessary

  • Add a CRUD workflow in CurationConcerns to create, edit, and delete FileSets as independent resources:

    • Enter and modify metadata for the FileSet
    • Add and remove Files within a FileSet
    • Change pcdmuse terms for each File in a FileSet

  • Create documentation with guidelines for repository developers and managers on the different intended uses of FileSets and Objects

The updates to PCDM and the code changes will be available for testing by March 1, 2017 [tentative date].
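
The operations named in the CRUD deliverable can be illustrated with a small in-memory sketch. It is written in Python for brevity and is only an assumption about the shape of the workflow: FileSetStore and all of its method names are hypothetical and do not correspond to any existing CurationConcerns API.

```python
import uuid


class FileSetStore:
    """Hypothetical in-memory stand-in for a repository of FileSets."""

    def __init__(self):
        self._file_sets = {}

    def create(self, metadata):
        """Create a FileSet as an independent resource and return its id."""
        file_set_id = str(uuid.uuid4())
        self._file_sets[file_set_id] = {"metadata": dict(metadata), "files": {}}
        return file_set_id

    def read(self, file_set_id):
        return self._file_sets[file_set_id]

    def update_metadata(self, file_set_id, changes):
        """Enter or modify metadata for the FileSet."""
        self._file_sets[file_set_id]["metadata"].update(changes)

    def add_file(self, file_set_id, name, use):
        """Add a File, recording the pcdmuse term that designates its role."""
        self._file_sets[file_set_id]["files"][name] = use

    def remove_file(self, file_set_id, name):
        del self._file_sets[file_set_id]["files"][name]

    def change_use(self, file_set_id, name, use):
        """Change the pcdmuse term of a File already in the FileSet."""
        files = self._file_sets[file_set_id]["files"]
        if name not in files:
            raise KeyError(name)
        files[name] = use

    def delete(self, file_set_id):
        del self._file_sets[file_set_id]


store = FileSetStore()
fs_id = store.create({"title": "Example scan"})
store.add_file(fs_id, "scan.tif", "pcdmuse:OriginalFile")
store.add_file(fs_id, "scan.jpg", "pcdmuse:IntermediateFile")
store.change_use(fs_id, "scan.jpg", "pcdmuse:ServiceFile")
store.update_metadata(fs_id, {"creator": "Unknown"})
store.remove_file(fs_id, "scan.tif")
```

Keeping the role as a property of each File's membership in the FileSet lets the role change without touching the File itself.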

Meeting Times & Communication Channels

Frequency: every 2 weeks

Connection info: Hangouts: https://hangouts.google.com/hangouts/_/artic.edu/pcdm-filesets

Communication will take place on the PCDM and Hydra-Tech mailing lists.

Note that, following current best practices within Hydra, Working and Interest Groups should use an existing channel unless and until it becomes clear that a dedicated channel is needed. This section should specify which existing channel(s) will be used: e.g., hydra-tech, hydra-partners, hydra-community@googlegroups.com. When using a shared channel, individual working groups should start the subject line with their name in square brackets, such as [archives] for the Archives Working Group. If and when a dedicated channel is needed, the new channel should be well publicized and open to any interested subscribers/participants in the community.

Members

Note that Working Groups must have participants from three different Partners, and that all members of a WG must be licensed Hydra contributors, with the appropriate CLAs in place. 

References

GitHub repo: https://github.com/projecthydra-labs/hydra_file_sets_wg

 

[1] An extensive discussion on the topic can be found at https://github.com/duraspace/pcdm/issues/59

[2] http://pcdm.org/2015/05/12/use