2016-10-20—PCDM FileSets Meeting

Date and Time

October 20, 2016, 2pm EDT

Connection Information

Google Hangouts: https://hangouts.google.com/hangouts/_/artic.edu/pcdm-filesets?authuser=0

Attendees

Regrets:

Agenda

  1. Post-Hydra Connect thoughts
    1. Comments on documents produced so far: https://drive.google.com/open?id=0ByRadxtBjDyjbTdGcGhadFJEclk
    2. Any missing use cases?
    3. Unresolved use cases (in particular Andrew Myers' case of extracted metadata files) 
  2. FileSets Manifesto: https://docs.google.com/document/d/1ioBiNqe_bXBm0BPLBnEsbETCZciKAqNHqW253Eud2VI/edit?usp=sharing
    1. Review for publishing to the wider Hydra and PCDM community
  3. Review main implementation tasks and challenges
    1. Gems and projects affected
    2. Content migration—tooling to be provided

Minutes

Indexing will require thought and experimentation. We have content files and metadata files, and can already loop through them to index them.

There was general agreement on the Hydra Re-imagined One diagram, but code changes are needed to make the files discoverable. This can be done now, but it still has to be implemented.

Andrew -- we've done that. Our strategy so far has been to pull things out of the metadata and put them into the Fileset, which means adding the Fileset model to the Solr query that returns records. Now we're talking about adding the PCDM file to the list of items that get returned and leaving the Fileset out.

The models included in the query would be whatever your work is, plus the PCDM file; the indexer gets them as individual files.

Fileset would add confusion for WGBH -- they don't have a use case for it.

Can configuration allow for AIC's use cases? We need the Fileset to carry metadata. The answer is yes: it is possible, but not simple. It would be nice to make that configurable with JSON files; see the plugin working group discussions.

Having all the machinery required to ingest XML and content, so that we could go back and build up more technical metadata, would be great, but WGBH doesn't need a UI to handle this, and doesn't need to put metadata on the Fileset. Esme -- it should be pretty easy to override the partial for the metadata editing view; this is an implementation-specific use case.

AIC needs an upload form to add a file to an existing Fileset. A field where you could pick a file and say "this file describes that other file" would also be good. Having files that are different derivatives of the same source doesn't mean we need files describing other files. We also have heavy metadata on the Fileset, such as a taxonomy about departments, just like what is on Works in Curation Concerns. We'd need all of that transferred to the Fileset. We'd want the form to upload a file and use the file use attribute to say what kind of file it is.

So we need to build something as flexible as possible, so that form partials can be overridden to handle these implementation differences.

Adam raised a concern about a limitation in Curation Concerns that doesn't allow Filesets to have arbitrary relationships with PCDM objects (0..n). You may want a Fileset that's part of several different PCDM objects -- the problem was that PCDM, as used in Curation Concerns, doesn't allow an arbitrary number of relationships.

In Stefano's proposal, he drew a specific diagram with multiple relationships between Filesets, so PCDM objects can have many-to-many relationships with Filesets. In Sufia we don't have the ability for a Fileset to be a member of different PCDM objects. The relationship between a Fileset and its files is direct containment, but between a Fileset and other objects it is indirect. That seems like it should be fine, but we don't know what the repercussions would be. The Hybox recommendation is that a loan document would be an object, not a file. So if you have a page and want to use it in some other collection, you need a place to hold descriptive metadata and something that can be linked to from outside that context.
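As a rough illustration of the many-to-many idea discussed above, here is a minimal plain-Ruby sketch. The classes `SketchObject` and `SketchFileSet` are hypothetical stand-ins invented for this example; they are not Hydra, Sufia, or Curation Concerns classes and say nothing about how those gems store relationships.

```ruby
# Hypothetical stand-in for a PCDM object that aggregates file sets.
class SketchObject
  attr_reader :id, :members

  def initialize(id)
    @id = id
    @members = []
  end

  # Record the relationship on both sides so it can be navigated
  # from the object or from the file set.
  def add_member(file_set)
    return if members.include?(file_set)

    members << file_set
    file_set.member_of << self
  end
end

# Hypothetical stand-in for a Fileset that may belong to 0..n objects.
class SketchFileSet
  attr_reader :id, :member_of

  def initialize(id)
    @id = id
    @member_of = []
  end
end

page    = SketchFileSet.new("page-1")
book    = SketchObject.new("book")
exhibit = SketchObject.new("exhibit")

# The same file set is reused by two different objects.
book.add_member(page)
exhibit.add_member(page)

puts page.member_of.map(&:id).inspect
```

The point of the sketch is only that the file set keeps a list of parents rather than a single parent, which is the capability the notes say is missing in Sufia.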

So we have arbitrary relationships to do that now, but we want to differentiate between real-world object metadata and Fileset metadata. Esme doesn't see PCDM as having anything to do with real-world objects; PCDM doesn't have to refer to a real-world object.

Princeton's apps have no real-world use cases. WGBH has both kinds of object but wants them to be treated the same. Andrew doesn't think there's a need for a hierarchical relationship between real-world and digital objects.

Stefano asks if we want to leave Filesets as carriers of metadata and digital content (whether digitized or born digital), so that our relationships are just object-to-object relationships.

Adam -- This model is similar to Lerna and Islandora, where there's a document that governs how the objects are to be treated, which implies an admin role, and admin policies and sets. In archival practice, when you receive a gift, you get an object that has the metadata about how to govern the gift, and that also has a Fileset representing the loan agreement. So it's not impossible to do this, but it seems odd.

The has_documentation could be a very similar use case.

A loan request Fileset object doesn't carry as much weight as a donor agreement. There would be one for the loan request proper and one for the loan request document. The loan request Fileset has lots of metadata -- is this what policy objects in Lerna are designed to do? They are like what Stefano is talking about, but they make no assumption that there are any machine-readable, actionable versions of that document. The APO (admin policy object) might be a separate object, though there might be links between the loan agreement, the object and the APO.

Could anybody draw this? Yes, Esme will try.

So what do we want set as next steps?

More use cases.

Andrew is wondering if we are jumping the gun by going into PCDM, and whether much of what we want could be done in Hydra Works. We need something that distinguishes between digital content and everything else. Whether this belongs in PCDM or not is a fair question, though PCDM seems the logical place for it to Stefano. Andrew says PCDM is very basic, a simple diagram by design. At the Hydra Works level we do have a Fileset, but it lacks the technical metadata and descriptive metadata.

What are the real-world use cases, leaving PCDM out of the equation? It seems we want PCDM to have the opinion that it represents digital content, and Fileset that it represents real-world objects.

Should a PCDM Work have the opinion that it's a real-world object, or should it be agnostic?

If we model real-world objects in a different way, we don't need to do it at the PCDM level. The idea is to keep PCDM as agnostic as possible. This is something that can be done as an application-level interpretation, which is totally fine, but it's not what you'd expect someone else to do. Maybe we also want to rethink the Fileset name, because it might not be accurate for what we want to express.

So the action items could be: more real-world object modeling, followed by discussion of what should go into PCDM.

Mark -- we have real-world objects in Hybox, but their relation to PCDM is not explicit there.

*Footnote from Andrew, for general clarification:

The idea is that, when Fedora objects get indexed into Solr, there is a property called “hasModel”, which stores the name of the Ruby class that represents the Fedora object. By default, the models that are considered during a Solr query are just “Collection” and any registered Work type, that is, any work type that you may have created using the generator `rails g curation_concerns:work MySpecialWork`.

But if you want files to be discoverable, you have to add the Ruby class that represents a file to the Solr query. Currently, we’re doing that by having Solr also look for `hasModel: "FileSet"`. But given some of the proposed changes, we may also need to tell Solr to consider the Ruby class that represents a pcdm:File (although I don’t know what that is offhand).
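To make the footnote concrete, here is a minimal plain-Ruby sketch of how a model filter for a Solr query could be assembled. The helper `model_filter` is hypothetical and is not part of Curation Concerns; the field name `hasModel` is taken from the footnote as written, and the actual Solr field name in a given application may differ.

```ruby
# Hypothetical helper: builds a Solr filter clause that matches any of
# the given model names. The field name is an assumption taken from the
# footnote, not a confirmed Solr field.
def model_filter(model_names, field: "hasModel")
  model_names.map { |name| %(#{field}:"#{name}") }.join(" OR ")
end

# Models considered by default: Collection plus registered work types.
default_models = ["Collection", "MySpecialWork"]

# Adding "FileSet" is what makes files discoverable in this sketch.
discoverable_models = default_models + ["FileSet"]

puts model_filter(discoverable_models)
```

If the class representing a pcdm:File also needs to be searched, its name would be appended to the list in the same way; that class name is left out here because the footnote does not give it.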