Further thinking on Collections, Sets & Lists

Discussing the strong basis in User Collections, Admin Sets, Display Sets, with proposed revisions.

Stanford would use the term "User List" instead of "Collection". This communicates the context of its generation, its primary audience and the requirement for ordering. I apply that terminology here.

The differences between the three models are:

  • User List: optional, ordered, user generated, recursively composed, rights independent.
  • Admin: unordered set, mandatory: each work belongs to exactly one Admin Set. Affects rights and description.
  • Display: basically the same as User List, plus some hierarchical facet magic.

In terms of data modeling, the most important takeaway is this: only the Admin Set affects the work data itself. It imposes a requirement on the Work model and provides default rights and descriptive metadata. Critically, at least one Admin Set must exist before adding any work! The Admin Set resembles our existing management practice.
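To make the Admin Set requirement concrete, here is a minimal sketch in Python. It is an illustration only: `AdminSet`, `Repository`, `add_work` and the field names are hypothetical, not part of any existing model or API. It shows the two effects described above: a work cannot be added without an existing Admin Set, and the Admin Set supplies default rights and descriptive metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AdminSet:
    """Hypothetical Admin Set carrying default rights and descriptive metadata."""
    id: str
    default_rights: str
    default_metadata: Dict[str, str] = field(default_factory=dict)


class Repository:
    """Toy in-memory store, assumed for illustration only."""

    def __init__(self) -> None:
        self.admin_sets: Dict[str, AdminSet] = {}
        self.works: Dict[str, dict] = {}

    def add_admin_set(self, admin_set: AdminSet) -> None:
        self.admin_sets[admin_set.id] = admin_set

    def add_work(self, work_id: str, admin_set_id: str,
                 rights: Optional[str] = None,
                 metadata: Optional[Dict[str, str]] = None) -> dict:
        # Mandatory single membership: the Admin Set must already exist.
        if admin_set_id not in self.admin_sets:
            raise ValueError(f"unknown Admin Set: {admin_set_id!r}")
        admin_set = self.admin_sets[admin_set_id]
        work = {
            "id": work_id,
            "admin_set_id": admin_set_id,  # exactly one Admin Set per work
            # Work-level values override the Admin Set defaults.
            "rights": rights if rights is not None else admin_set.default_rights,
            "metadata": {**admin_set.default_metadata, **(metadata or {})},
        }
        self.works[work_id] = work
        return work
```

Note that User Lists do not appear anywhere in this sketch, which is the point: they never touch the work record.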

I would direct attention to two points where the model introduces problematic complexity:

  1. User Lists exposing recursion to end users, and
  2. the Display concept of hierarchy.

I will attempt to dissuade you from each.

Reasons not to allow User Lists that reference other User Lists:

  • Logic: once two sets include each other they are mathematically the same set. That is to say, by referencing any other user's List, you are giving them the ability to change what is in your List, published under your account.
  • Computation: how many queries and subqueries does it take to enumerate the IDs of works in a given User List? Unknown! Is the result cacheable? Never! Recall that we (quite sensibly) do not want User Lists to affect the index or the repository data, which blocks several strategies for mitigating the cost of recursive dereferencing.
  • User may wish to include the works of another's List, but in a different order. Therefore they must be dealing with their own ordered list rather than the sublist.
  • User may wish to include the works of another's List, but with exclusions. A recursive model would be responsible for representing an additional "exclusion" model, and solving the problem of recursing exclusions.
  • User may wish to include the works of another's List, but with exclusions that don't exist yet. I.e., I want to include another's existing list without exposing my list to updates in the sublist.
  • User may include the works of another's List, prepare a class presentation based on them, and subsequently find all such works removed!
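The following sketch contrasts the two behaviors. It assumes a hypothetical representation (entries as `("work", id)` or `("list", id)` tuples, held in plain dictionaries); none of these names come from an existing implementation. The reference model needs a cycle guard just to terminate, and one user's edit silently changes another user's list. The copy model has neither problem.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical entry shape: ("work", work_id) or ("list", list_id).
Entry = Tuple[str, str]


def resolve_recursive(store: Dict[str, List[Entry]], list_id: str,
                      seen: Optional[Set[str]] = None) -> List[str]:
    """Enumerate work IDs by following list references, guarding against cycles."""
    seen = set() if seen is None else seen
    if list_id in seen:  # already visited in this traversal: stop here
        return []
    seen.add(list_id)
    ids: List[str] = []
    for kind, ref in store[list_id]:
        if kind == "work":
            ids.append(ref)
        else:
            ids.extend(resolve_recursive(store, ref, seen))
    return ids


def copy_works(flat: Dict[str, List[str]], source: str, target: str) -> None:
    """Append the source list's current work IDs to the target, skipping duplicates."""
    for work_id in flat[source]:
        if work_id not in flat[target]:
            flat[target].append(work_id)


# Reference model: two lists that include each other.
refs = {
    "alice": [("work", "w1"), ("list", "bob")],
    "bob": [("work", "w2"), ("list", "alice")],
}
before = resolve_recursive(refs, "alice")  # ["w1", "w2"]
refs["bob"].remove(("work", "w2"))         # bob edits only his own list
after = resolve_recursive(refs, "alice")   # ["w1"]: alice's list changed too

# Copy model: flat IDs only.
flat = {"alice": ["w1"], "bob": ["w2"]}
copy_works(flat, "bob", "alice")
flat["bob"].clear()                        # bob's later edits do not propagate
```

After the last line, `flat["alice"]` still holds both works, and enumerating it takes no queries beyond reading the list itself.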

In general, the effects of recursive references will be bewildering and unanticipated to end users. A way to get effectively the same recursive composition, but with an intuitive, user-mediated flow, would be to notify users of additions to sets they are interested in. This is no more than an RSS transform of a query, as implemented in most discovery systems already. Example notification: "2 new works added to @atz's list 'Research'" with a View button and an "Add to List" button. In such a case, each User List is just flat IDs, with no computational risk or unexpected behavior.
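A rough sketch of that notification flow, under the assumption that the system remembers which work IDs a follower has already seen for a given list. The function names and the "last seen" bookkeeping are hypothetical; the message format follows the example above.

```python
from typing import Iterable, List


def new_additions(current_ids: Iterable[str], last_seen_ids: Iterable[str]) -> List[str]:
    """Return IDs present in the followed list now but not at the last check, in list order."""
    seen = set(last_seen_ids)
    return [work_id for work_id in current_ids if work_id not in seen]


def notification(owner: str, list_name: str, additions: List[str]) -> str:
    """Format the message shown to the follower."""
    count = len(additions)
    noun = "work" if count == 1 else "works"
    return f"{count} new {noun} added to @{owner}'s list '{list_name}'"


additions = new_additions(["w1", "w2", "w3", "w4"], ["w1", "w2"])
message = notification("atz", "Research", additions)
# message == "2 new works added to @atz's list 'Research'"
```

"Add to List" then appends the chosen IDs to the follower's own flat list, so the follower decides what arrives and when.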

Notably, because of single membership and the fact that you can discern the Admin Set from the work data itself, a User List that references Admin Sets involves no recursion.
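To illustrate why, here is a small sketch assuming each work record carries its Admin Set ID (the `works` mapping and the entry tuples are hypothetical). Expanding an Admin Set reference is a single filter over work data, and since Admin Sets contain only works, the expansion can never lead back to a list. Because Admin Sets are unordered, the sketch sorts their members by ID purely to get a stable result; that ordering is my assumption.

```python
from typing import Dict, List, Tuple


def resolve(entries: List[Tuple[str, str]], works: Dict[str, str]) -> List[str]:
    """Expand a User List whose entries are ("work", id) or ("admin_set", id).

    `works` maps work_id -> admin_set_id, i.e. membership read from the work itself.
    """
    ids: List[str] = []
    for kind, ref in entries:
        if kind == "work":
            ids.append(ref)
        else:
            # One pass over work data; no nested lists to follow.
            ids.extend(sorted(w for w, admin_set in works.items() if admin_set == ref))
    return ids


works = {"w1": "theses", "w2": "theses", "w3": "maps"}
user_list = [("work", "w3"), ("admin_set", "theses")]
resolved = resolve(user_list, works)  # ["w3", "w1", "w2"]
```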

I will not spend long on Display Lists (not Sets, because ordered), only because I think of them as a special class of User List and I think we can safely declare that a 12-level faceted hierarchy is out of scope. The example Biology Dept. > Master's Theses > Year problem is, in my view, unrelated to special set membership, and the discussion is probably misapplied to the modeling of sets. Are you saying that nowhere in metadata can you identify that a work was a thesis, its departmental affiliation or its datePublished? Ideally, those are 3 independent facets built on existing data. Certainly, in no case should datePublished be specific to any kind of work!
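As a sketch of what "3 independent facets" means in practice, assume each work already carries the three values as plain metadata. The field names `workType` and `department` are hypothetical; `datePublished` is the one named above. The Biology Dept. > Master's Theses > Year view then falls out of filtering on existing data, with no hierarchy stored anywhere.

```python
from collections import Counter
from typing import Dict, List


def facet_counts(works: List[dict], fields: List[str]) -> Dict[str, Counter]:
    """Count values per facet field, each computed independently of the others."""
    return {f: Counter(w[f] for w in works if f in w) for f in fields}


def filter_works(works: List[dict], **selected: str) -> List[dict]:
    """Keep works matching every selected facet value, in any combination."""
    return [w for w in works if all(w.get(k) == v for k, v in selected.items())]


works = [
    {"id": "w1", "workType": "thesis", "department": "Biology", "datePublished": "2014"},
    {"id": "w2", "workType": "thesis", "department": "History", "datePublished": "2014"},
    {"id": "w3", "workType": "article", "department": "Biology", "datePublished": "2015"},
]
counts = facet_counts(works, ["workType", "department", "datePublished"])
biology_theses_2014 = filter_works(
    works, department="Biology", workType="thesis", datePublished="2014"
)
```

The same three facets also answer questions the fixed hierarchy cannot, such as all 2014 works regardless of department.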

Recommendations:

  • Support two main kinds of groups: Admin Sets and User Lists.
  • Lists should copy Lists, not reference Lists.
  • Determine if User Lists referencing Admin Sets is really desirable and necessary for this phase.