Reversing Collection Membership

Background

In Hydra::PCDM, Hydra::Works, and CurationConcerns, Collections link to their member Objects using pcdm:hasMember. In Fedora 4, this is implemented as an IndirectContainer that holds a proxy for each member Object. When the RDF for the Collection is retrieved, Fedora follows those proxies to the member Objects and uses them to generate the membership triples (<col1> pcdm:hasMember <obj1>).
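The proxy-following step can be modeled in a few lines. The sketch below is a simplified, in-memory illustration, not Fedora's implementation. It assumes each proxy points at its member through ore:proxyFor (the inserted content relation), and it uses short hypothetical ids such as "col1" and "col1/members/p1" in place of real repository URIs. The point it shows is that generating the membership triples requires one lookup per proxy.

```python
from dataclasses import dataclass, field

# Predicate URIs for PCDM and ORE.
PCDM_HAS_MEMBER = "http://pcdm.org/models#hasMember"
ORE_PROXY_FOR = "http://www.openarchives.org/ore/terms/proxyFor"


@dataclass
class IndirectContainer:
    """Simplified model of an LDP IndirectContainer's membership settings."""
    membership_resource: str        # the Collection that receives generated triples
    has_member_relation: str        # predicate used in the generated triples
    inserted_content_relation: str  # predicate on each proxy that points at the member
    proxies: list = field(default_factory=list)  # ids of the proxy resources


def membership_triples(container, repository):
    """Generate <collection> hasMember <object> triples by dereferencing every proxy.

    `repository` maps a resource id to its list of (predicate, object) pairs.
    There is one lookup per proxy, so the cost grows linearly with the
    number of members.
    """
    triples = []
    for proxy_id in container.proxies:
        for predicate, obj in repository[proxy_id]:
            if predicate == container.inserted_content_relation:
                triples.append((container.membership_resource,
                                container.has_member_relation, obj))
    return triples


# Hypothetical resources: col1's members container holds two proxies.
repository = {
    "col1/members/p1": [(ORE_PROXY_FOR, "obj1")],
    "col1/members/p2": [(ORE_PROXY_FOR, "obj2")],
}
members = IndirectContainer("col1", PCDM_HAS_MEMBER, ORE_PROXY_FOR,
                            ["col1/members/p1", "col1/members/p2"])
print(membership_triples(members, repository))
```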

Processing an individual proxy takes only a few milliseconds, but Collections can have thousands or tens of thousands of members, so the total adds up. Retrieving a Collection with 10,000 members can take 30 seconds or more.
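As a rough back-of-the-envelope check, take 3 ms as an assumed per-proxy cost (the text above says only "a few milliseconds") and N = 10,000 members:

$$
T_{\text{hasMember}} \approx N \cdot t_{\text{proxy}} = 10{,}000 \times 3\ \text{ms} = 30{,}000\ \text{ms} = 30\ \text{s}
$$

The retrieval time is linear in N, which is why the largest Collections are affected most.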

See Real World Performance for more information on the underlying issue and the discussion of it in the Fedora context. There is also ongoing performance work in Fedora that may make listing members faster. There may also be other ways to work around the performance issues, such as background indexing.

Using pcdm:memberOf

PCDM provides reciprocal predicates for linking from an Object to the Collections it is a member of (<obj1> pcdm:memberOf <col1>). Reversing Collection membership so that it links from Objects to Collections avoids having a large number of proxies to follow, since an Object is typically a member of only a small number of Collections. When memberOf membership was used for the same 10,000 members discussed above, the Collection and each individual Object could all be retrieved in less than a second.
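The reversed model can be illustrated with another simplified, in-memory sketch. It does not use any Hydra or Fedora API, and its data and function names are hypothetical. Each Object carries its own pcdm:memberOf pairs, so reading an Object's Collections involves no proxies. The reciprocal hasMember view can still be derived, but only by passing over every Object.

```python
from collections import defaultdict

PCDM_MEMBER_OF = "http://pcdm.org/models#memberOf"

# Hypothetical data: each Object stores its own (predicate, object) pairs,
# so Collection membership lives on the member, not on the Collection.
objects = {
    "obj1": [(PCDM_MEMBER_OF, "col1")],
    "obj2": [(PCDM_MEMBER_OF, "col1"), (PCDM_MEMBER_OF, "col2")],
    "obj3": [(PCDM_MEMBER_OF, "col2")],
}


def collections_of(object_id, objects):
    """Read an Object's Collections directly from its own triples (no proxies)."""
    return [o for p, o in objects[object_id] if p == PCDM_MEMBER_OF]


def members_by_collection(objects):
    """Invert memberOf into the hasMember view: collection id -> sorted member ids.

    This requires a pass over every Object, which is why the member list is
    normally kept in an index rather than computed on each request.
    """
    index = defaultdict(list)
    for object_id, pairs in objects.items():
        for predicate, collection_id in pairs:
            if predicate == PCDM_MEMBER_OF:
                index[collection_id].append(object_id)
    return {col: sorted(ids) for col, ids in index.items()}


print(collections_of("obj2", objects))
print(members_by_collection(objects))
```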

One downside of using pcdm:memberOf is that it is harder to get a list of all the Objects in a Collection. This is typically handled by indexing every Object in Solr, including its Collection membership, which provides a fast way to list a Collection's members. If the Objects aren't indexed in Solr, or the index has become inconsistent, the Collection can be retrieved from Fedora using the InboundReferences Prefer header. This is slow, but it would typically only be needed to rebuild the Collection index.
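The two retrieval paths can be sketched as HTTP requests built with the Python standard library. Several details are assumptions. The Solr core URL and the field name `member_of_collection_ids_ssim` are hypothetical placeholders for whatever your indexer actually writes. The Prefer value uses the `InboundReferences` include URI documented for Fedora 4, so check it against your Fedora version. The requests are only constructed here, and `urllib.request.urlopen` would be needed to send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR_SELECT = "http://localhost:8983/solr/hydra/select"  # hypothetical Solr core URL
MEMBER_OF_FIELD = "member_of_collection_ids_ssim"        # hypothetical Solr field name
FEDORA_INBOUND = "http://fedora.info/definitions/v4/repository#InboundReferences"


def solr_members_request(collection_id, rows=100, start=0):
    """Fast path: page through a Collection's members using the Solr index."""
    params = {
        "q": "*:*",
        "fq": f'{MEMBER_OF_FIELD}:"{collection_id}"',  # filter on indexed membership
        "fl": "id",
        "rows": rows,
        "start": start,
        "wt": "json",
    }
    return Request(f"{SOLR_SELECT}?{urlencode(params)}")


def fedora_inbound_request(collection_uri):
    """Slow path: ask Fedora to include triples from resources that point at this Collection."""
    prefer = f'return=representation; include="{FEDORA_INBOUND}"'
    return Request(collection_uri, headers={"Prefer": prefer, "Accept": "text/turtle"})


# Usage (hypothetical URIs); nothing is sent over the network here.
print(solr_members_request("col1").full_url)
print(fedora_inbound_request("http://localhost:8080/rest/col1").get_header("Prefer"))
```

When rebuilding the index, the memberOf triples in the Fedora response would be read back and re-indexed on each member Object.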

Implementation

A proof-of-concept implementation demonstrating the use of pcdm:memberOf to link from Objects to Collections is available:

This code is available as a starting point for adoption in your applications.

Outstanding Questions

  • Does this replace hasMember Collection membership, or should we have both kinds of Collection membership?
    • For example, there may be scenarios where one Object is a member of a large number of Collections (such as a popular item that appears in many user Collections).
    • Ordered collections might need to use hasMember in order to have a place to encode the order.
  • If we keep both kinds of Collection membership, should the two kinds of members be grouped together, or kept separate?
  • If separate, how do we clearly label the different types of Collection and member lists?
    • HasMemberCollection vs. MemberOfCollection?
    • UserCollection vs. DisplayCollection?
    • Something else?
  • If Fedora performance improves, do we keep using hasMember Collections? Or are there other reasons to have memberOf Collections?