2017-02-09—FileSets WG Meeting

Date and Time

February 9, 2017, 2pm EST

Connection Information

Google Hangouts:  https://hangouts.google.com/hangouts/_/artic.edu/pcdm-filesets

Moderator: scossu

Notetaker: Jennifer Lindner

Attendees

Agenda

  1. Notetaker?
  2. Review last week's action items (see below)
  3. Comment on Drew's validation code
  4. Comment on sample configuration
  5. Next steps

Minutes

https://gist.github.com/afred/14b0ffda000cb43064dfe3a849c52513
https://gist.github.com/scossu/b512326a2e121527ac5ff6a4f04fcd10

Here's an example of a file that QA could use to validate against. It states whether to allow an empty value for pcdmuse, and then lists the types, each of which has a label, a uri, whether it is required, whether multiple files are allowed (e.g. for printset), etc.

For uri, you'd possibly do a reverse-lookup; in Fedora you'd need to use an RDF URI object. Thinking we'll just allow one type per file; allowing more gets too complicated.
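As a rough illustration of the configuration file and the reverse lookup described above, here is a minimal sketch. The key names (allow_empty, types, label, uri, required, multiple), the type keys, and the example.org URI are assumptions made for illustration; they are not taken from the linked gists. The YAML is embedded in a Ruby string so the sketch runs with the standard library alone.

```ruby
require "yaml"

# Hypothetical uses.yml content. Key names are illustrative only.
USES_YML = <<~YAML
  allow_empty: false
  types:
    original:
      label: Original file
      uri: "http://pcdm.org/use#OriginalFile"
      required: true
      multiple: false
    thumbnail:
      label: Thumbnail
      uri: "http://pcdm.org/use#ThumbnailImage"
      required: false
      multiple: false
    print:
      label: Print set page
      uri: "http://example.org/use#PrintSetPage" # made-up local term
      required: false
      multiple: true
YAML

CONFIG = YAML.safe_load(USES_YML)

# Reverse lookup: from a use URI back to the configured type key.
# Returns nil when the URI is not configured.
def type_for_uri(config, uri)
  match = config.fetch("types").find { |_key, type| type["uri"] == uri }
  match && match.first
end
```

With this shape, type_for_uri(CONFIG, "http://pcdm.org/use#ThumbnailImage") gives back the "thumbnail" key, and an unconfigured URI gives nil.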

There should probably be some validation of the configuration itself (your uses.yml shouldn't override things that are determined by PCDM).

A big question: the validator is part of Hydra Works but would rely on QA, which is in Sufia. Do we push QA further down the stack into Hydra Works, or does the validator need to be more generic? The latter would require configuration on the part of the user. The validator is an interface that you need to implement, modeled after existing validators: it gets called and run on things that have an RDF type, whose value comes from a controlled vocabulary that comes from PCDM. Where that gets plugged in is an open question.
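One way to picture the more generic option is a validator that is simply handed the list of allowed use URIs, so it does not care whether they came from QA, a yml file, or anywhere else. The sketch below is plain Ruby and is not Hydra Works or ActiveModel code; FileStub, PcdmUseValidator, and the error messages are hypothetical names invented for illustration. It also assumes the "one type per file" rule discussed above.

```ruby
require "set"

# Hypothetical stand-in for a file in a FileSet: a name and its use URIs.
FileStub = Struct.new(:name, :uses)

# Plain-Ruby sketch of the validator interface. The caller supplies the
# allowed use URIs, so the term source stays pluggable.
class PcdmUseValidator
  def initialize(allowed_uris, allow_empty: false)
    @allowed = Set.new(allowed_uris)
    @allow_empty = allow_empty
  end

  # Returns an array of error messages; an empty array means valid.
  def validate(file)
    errors = []
    if file.uses.empty? && !@allow_empty
      errors << "#{file.name}: a use type is required"
    end
    if file.uses.size > 1
      errors << "#{file.name}: only one use type is allowed"
    end
    file.uses.each do |uri|
      next if @allowed.include?(uri)
      errors << "#{file.name}: #{uri} is not a configured use type"
    end
    errors
  end
end
```

In a real gem the same checks would presumably sit behind whatever validator interface the surrounding layer already uses; where that hook lives is exactly the open question noted above.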

Next steps: find a place to put it in Hydra Works. Find where other validators are getting called in Hydra Works and add whatever is needed to include this validator.

In discussing the right place for the validator: it needs to know that it's been given a fileset, and what it can do with filesets. Since it needs to know that anyway, this is a logical place for it. The question remains: if you have a validator high in the stack, then you need to tell lower-level things down in the stack which validator to use.

Let's ask the community. If no one is upset about using QA, we could make a working PR and see what people think. It just needs some unit tests and some work to get to that point.

What can Stefano (or someone else) do to prepare the terrain for development? For example, the flow chart. Yet as developers we do work in chunks at each layer of the stack, at each gem level. Andrew suggests a Hydra Works validator PR by the next meeting. Would you be able to set a type on a file that's still in the current work structure?

Regarding this validator: Hydra Works is probably already using original file as pcdmuse. We want to first make that a choice, and then plug in our PR that lets you choose other files for pcdmuse and run tests on that.

For the tests and the PR we wouldn't want an external dependency, so we'd use QA with a yml file that would become a fixture. QA goes out to the web and grabs controlled vocabularies, so the question is: does it only get the PCDM vocabulary? The point of QA is to never need to maintain a list of vocabularies. Our design question is whether to maintain the vocabulary list locally. Do you decide that pcdmuse terms are always fine for your application? Do we want automated vocabularies from the web, or do we want to maintain our own lists? QA can be configured to use local or external terms, isn't that right? It is. People who are implementing QA with pcdmuse as one of their controlled vocabularies will assume it's coming from the web. So QA is flexible enough for us to use; the open question is whether to have pcdmuse as the default. If so, that implies a dependency on an external service from the web, which is not necessarily easy or desirable for each institution (like AIC).


With this uses.yml file, we're conflating two concerns: one is providing terms and the other is validating them. We could have one file for the list of terms and another that provides validation: QA would look at the file and provide terms, and the pcdmuse validator would validate them. But having one list does keep it easy to configure.

We'd need to maintain our own validation rules, so we'd need to maintain a local list of rules to do that.
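To make the two-concern split concrete, the sketch below keeps the term list (what a term provider such as QA would supply) apart from the locally maintained validation rules, which reference terms by URI. All data, key names, and helper names here are hypothetical; the term hashes are not meant to reproduce QA's actual response format.

```ruby
# Terms: what a term provider (local or remote) would hand back.
# Hypothetical data for illustration.
TERMS = [
  { "id" => "http://pcdm.org/use#OriginalFile", "label" => "Original file" },
  { "id" => "http://pcdm.org/use#ThumbnailImage", "label" => "Thumbnail image" }
].freeze

# Rules: maintained locally and keyed by term URI. Key names are illustrative.
RULES = {
  "http://pcdm.org/use#OriginalFile"  => { required: true,  multiple: false },
  "http://pcdm.org/use#ExtractedText" => { required: false, multiple: false }
}.freeze

# Rules that point at a URI the term list does not provide.
def orphaned_rules(terms, rules)
  known = terms.map { |term| term["id"] }
  rules.keys - known
end

# Use types that the rules mark as required.
def required_uris(rules)
  rules.select { |_uri, rule| rule[:required] }.keys
end
```

The cost of the split shows up in orphaned_rules: once terms and rules live in separate places, something has to check that they still agree, which is part of the argument for keeping a single list.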

Takeaways and main points left to discuss

  • Questioning Authority and use types
    • We can use QA to automatically pull PCDM use terms from the web
    • Alternatively QA supports hardcoded URIs that are maintained manually
    • Pros of using QA:
      • No need to maintain the use type list internally
      • If terms change, these are automatically updated in the application
      • Flexible enough to support one-off terms and non-dereferenceable URIs not part of a shared vocabulary
    • Cons: 
      • Extra coding required
      • Probably not worth the effort since the list of use types is not huge (fewer than 10 terms in most cases? pcdmuse currently has 7 terms)
      • We still have to maintain a config for validation and other things (e.g. human readable label) in which each term is referenced
      • One may want to use a subset of a vocabulary
      • Some scenarios may need extra setup (e.g. an app running in an internal network with no access to the Web needs a firewall hole to retrieve terms)
    • (note: this is from Stefano's biased POV, so edit accordingly) 
  • More developers needed (all on Drew's shoulders right now)—plan broader engagement
    • A session at LDCX would be great; several in this group are going

Previous Action Items

  • Drew - demonstrate progress on metadata-as-files branch of Hyrax
  • Stefano - demonstrate using QA to provide options for pcdm:use

Action Items

  • Jennifer Lindner: contact Andrew Myers next week about some coding on the validator.
  • scossu: draft a generic wireframe for the UX, showing what the view, create and edit pages should look like and provide.
  • Andrew Myers (question): isolate belongs_to behavior and apply it to AF File, rather than AF Base. (Scalpel please.)
  • Andrew Myers, scossu: propose this as a good area for dev time at the upcoming LDCX conference.