Hydra Tech Call 2016-07-06

Time: 9:00am PDT / Noon EDT

Call-In Info: 1-641-715-3660, access code 651025

Moderator: Steven Ng

Notetaker: Jennifer Lindner


Attendees:

  • Peter Binkley - U of Alberta
  • Trey Pendragon - Princeton
  • Mike Giarlo - Stanford
  • Tom Johnson - DPLA
  • Jennifer Lindner - Northwestern
  • Stephen Ng - Temple University
  • Esme Cowles - Princeton
  • Adam Wead - Penn State
  • Corey Harper - NYU
  • Anna Headley - Chemical Heritage Foundation
  • Ben Armintor - Columbia
  • Carolyn Cole - Penn State

Agenda:

  1. Roll call by timezone (moderator)
  2. Call for additional agenda items (moderator)
  3. ActiveTriples 11(?)/RDF 2.0 & Field Ordering (Tom Johnson)
    1. Any upgrade to RDF.rb 2.0 will fail to preserve order of values in an RDF property
      1. See the current proposed upgrade: https://github.com/projecthydra/active_fedora/pull/1104
      2. We could reinstate the old Ruby-Hash/Array based RDF::Repository with an order preservation guarantee... but:
    2. Current assumptions of order preservation are fragile in a number of places: 
      1. The Fedora API/implementation;
      2. RDF parsers and serializers (both Java and Ruby);
      3. The Ruby in-memory datastore (this is the one that's biting us now);
      4. Any external RDF transformations.
    3. Discussion/What is required to make this transition? Discussion below:

ActiveTriples 11, ActiveFedora 11: Trey has been catching ActiveFedora back up to ActiveTriples, and a side effect has been a move to RDF IO. In RDF.rb generally (not as part of the API), the status quo has been that multiple triples put in in a given order come out in the same order. For example, with multiple authors you get them back in the same order every time you retrieve them, and people rely on and expect that order. In RDF.rb 2.0 this order is no longer guaranteed; in fact, it is very likely to be reversed on retrieval.
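Why the old behavior was never really guaranteed: an RDF graph is a set of triples, so two graphs holding the same statements are the same graph no matter what order the statements were asserted in. A minimal plain-Ruby sketch of that point, using a hypothetical `Triple` struct and example IRIs rather than RDF.rb's own classes:

```ruby
require 'set'

# Hypothetical, minimal triple type for illustration only (not RDF::Statement).
Triple = Struct.new(:subject, :predicate, :object)

work    = 'http://example.org/works/1'       # example IRI, assumed
creator = 'http://purl.org/dc/terms/creator'

# The same two statements, asserted in opposite orders.
graph_a = Set[Triple.new(work, creator, 'Smith'), Triple.new(work, creator, 'Jones')]
graph_b = Set[Triple.new(work, creator, 'Jones'), Triple.new(work, creator, 'Smith')]

# Compared as sets of triples, the graphs are equal: insertion order carries no
# meaning, so whatever order a store hands values back in is an implementation detail.
puts graph_a == graph_b
```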

Basically, things had been really fragile all along and we're only now uncovering it; the parsers and the Fedora implementation are places we'll see this break. It's important, and good, that we're seeing it, but we have to figure out how to make it work and how to communicate it.

Is there a way in ActiveTriples to enforce this?

-- from Trey: yes, two ways, but neither of them is [great]. The assumption people have is less that values come out in a specific order and more that the properties come out in a consistent way, ordered the same way each time they're accessed.

RDF List -- use this if you really need order.

It's not that we need order; it's that we need to preserve the order in which values were put in. Titles are one example: you can have a Roman and an Arabic title, which means you can't rely on sorting, so which one do you display first? A lot of our data is in text rather than in sub-nodes, and if one title is in Arabic and one is in English, we're relying on the fact that the English one comes first.

Sorting is the first answer: if we sort values when we display items on a page, every load of the page will show the same thing, ordered alphabetically.

For literals, Ruby defines a total ordering for us, but I don't know that we have a sensible sort for ActiveFedora triple-based objects. That doesn't really matter, because for display we're dealing with Solrized documents, so just serialized terms. There's ordering in PCDM, but that's a complex solution.
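A minimal sketch of the short-term fix discussed here (imposing an order at display time), assuming the show page receives each property's values as an array of strings, for example from a Solr document. `display_values` is a hypothetical helper, not part of Curation Concerns:

```ruby
# Hypothetical helper: impose a stable alphabetical order on property values at
# display time instead of trusting the order the store returned them in.
def display_values(values)
  # Case-insensitive key first, raw string as a tie-breaker, so the output is
  # identical no matter how the input happened to be ordered.
  Array(values).map(&:to_s).sort_by { |v| [v.downcase, v] }
end

first_load  = ['the second title', 'A first title']
second_load = ['A first title', 'the second title']

p display_values(first_load)                                # ["A first title", "the second title"]
p display_values(first_load) == display_values(second_load) # true
```

Sorting makes every page load consistent, but it cannot recover the order a cataloguer entered, which is the concern raised about multi-script titles.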

If people want to preserve the order of a property's values, the way to do that is with an RDF List. The problem is that we use Fedora, so we don't get RDF Lists; we'd have to impose something of our own. RDF Lists in Fedora would mean blank nodes, and the default handling of those is not what we want, so we'd have to customize that.
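An RDF List (an RDF collection) is a chain of nodes: each node points to one value via `rdf:first` and to the next node via `rdf:rest`, and the chain ends at `rdf:nil`. A minimal plain-Ruby sketch of encoding and decoding such a chain, assuming triples are simple `[subject, predicate, object]` arrays and blank nodes are labelled `_:b0`, `_:b1`, and so on; `encode_list` and `decode_list` are hypothetical helpers, not ActiveTriples or RDF.rb API:

```ruby
RDF_NS    = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
RDF_FIRST = "#{RDF_NS}first"
RDF_REST  = "#{RDF_NS}rest"
RDF_NIL   = "#{RDF_NS}nil"

# Hypothetical encoder: turn an ordered array into rdf:first/rdf:rest triples.
# Returns [head_node, triples]; an empty list is just rdf:nil.
def encode_list(values)
  return [RDF_NIL, []] if values.empty?

  nodes = values.each_index.map { |i| "_:b#{i}" } # one blank node per value
  triples = values.each_with_index.flat_map do |value, i|
    rest = nodes[i + 1] || RDF_NIL                # last node points at rdf:nil
    [[nodes[i], RDF_FIRST, value], [nodes[i], RDF_REST, rest]]
  end
  [nodes.first, triples]
end

# Hypothetical decoder: walk the chain from the head node. Order comes from the
# rdf:rest links, so it survives any reordering of the triples themselves.
def decode_list(head, triples)
  index = triples.group_by { |s, p, _o| [s, p] }
  values = []
  node = head
  until node == RDF_NIL
    values << index.fetch([node, RDF_FIRST]).first[2]
    node = index.fetch([node, RDF_REST]).first[2]
  end
  values
end

head, triples = encode_list(['Smith', 'Jones', 'Lee'])
p decode_list(head, triples.shuffle) # ["Smith", "Jones", "Lee"]
```

The cost is an extra node and two triples per value, and those intermediate nodes are the blank nodes that would need special handling in Fedora.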

In ActiveTriples, list support is problematic, and there are tests showing lists not working.

Is there a better way than RDF List? You have to do an ordered list one way or another. The problem is that Fedora doesn't do lists: you'd need to make lists first-class objects in Fedora rather than blank nodes, and then serialize them. I think it should be raised, but it's not in scope for this issue unless we decide it needs to be.

A PR is in. If it goes through as things stand, then editing a Curation Concerns object and saving it again will make your show page look different, so we'd need to impose order on the CC show page. We should figure out ordering down the line, and agree on imposing order on the show page as our short-term solution. Curation Concerns would need a major release for this change.

What's the best way to find out if we're all okay with this? A lazy-consensus email to all? Trey is happy to draft an email asking everyone whether we agree to the CC show page changes and a major release.

But the CC code depends on an ActiveTriples branch that needs to be merged, so the changes needed to address this won't all happen this week, but they can be done.

Sufia 7.0.0 release: release before this or after? As 7.0.0 is out, Mike Giarlo is in favor of before, but we could also stop pointing Sufia at the latest dependencies. There are options.

  4. Moderator and notetaker for next call (moderator)

Next call: July 13th (moderator)

  1. Moderator: cam156
  2. Notetaker: Benjamin Armintor