Further thinking on Collections, Sets & Lists
Discussing the strong basis in User Collections, Admin Sets, Display Sets, with proposed revisions.
Stanford would use the term "User List" instead of "Collection". This communicates the context of its generation, its primary audience and the requirement for ordering. I apply that terminology here.
The differences between the three models are:
- User List: optional, ordered, user generated, recursively composed, rights independent.
- Admin: unordered set, mandatory: each work belongs to exactly one Admin Set. Affects rights and description.
- Display: basically the same as User List, plus some hierarchical facet magic.
In terms of data modeling, the most important takeaway is that only the Admin Set affects the work data itself. It imposes a requirement on the Work model and provides default rights and descriptive metadata. Critically, at least one Admin Set must exist before any work can be added! The Admin Set resembles our existing management practice.
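As a minimal sketch of that constraint (not the actual Work model): the class names, the `default_rights` and `default_metadata` fields, and the rule that a work's own metadata overrides the defaults are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AdminSet:
    """Hypothetical Admin Set carrying defaults for the works it manages."""
    id: str
    default_rights: str
    default_metadata: dict = field(default_factory=dict)


@dataclass
class Work:
    """Hypothetical Work: the admin_set field is required, everything else is optional."""
    id: str
    admin_set: AdminSet
    rights: Optional[str] = None
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A work cannot exist without exactly one Admin Set.
        if not isinstance(self.admin_set, AdminSet):
            raise ValueError("a work must belong to exactly one Admin Set")
        # Defaults flow from the Admin Set; values set on the work win (assumed rule).
        if self.rights is None:
            self.rights = self.admin_set.default_rights
        self.metadata = {**self.admin_set.default_metadata, **self.metadata}


theses = AdminSet("theses", "open", {"publisher": "Example University"})
work = Work("w1", theses, metadata={"title": "A Thesis"})
```

Nothing comparable is needed for User Lists: they point at work IDs from the outside and leave the work record untouched.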
I would direct attention to two points where the model introduces problematic complexity:
- User Lists exposing recursion to end users, and
- the Display concept of hierarchy.
I will attempt to dissuade you from each.
Reasons not to allow User Lists that reference other User Lists:
- Logic: once two sets include each other they are mathematically the same set. That is to say, by referencing any other user's List, you are giving them the ability to change what is in your List, published under your account.
- Computation: how many queries and subqueries does it take to enumerate the IDs of works in a given User List? Unknown! Is the result cacheable? Never! Recall that we (quite sensibly) do not want User Lists to affect the index or the repository data, which blocks several strategies for mitigating the cost of recursive dereferencing.
- User may wish to include the works of another's List, but in a different order. Therefore they must be dealing with their own ordered list rather than the sublist.
- User may wish to include the works of another's List, but with exclusions. A recursive model would be responsible for representing an additional "exclusion" model, and for solving the problem of recursing exclusions.
- User may wish to include the works of another's List, but with exclusions that don't exist yet, i.e., I want to include another's existing list without exposing my list to updates in the sublist.
- User may include the works of another's List, prepare a class presentation based on them, and subsequently find all such works removed!
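The logic and computation points above can be made concrete with a small sketch. It assumes a hypothetical in-memory `store` mapping each list ID to ordered members, where a member is either `("work", id)` or `("list", id)`. In a real system each list lookup would be at least one query, which is why the function also counts lookups.

```python
from typing import Dict, List, Set, Tuple

Member = Tuple[str, str]  # ("work", work_id) or ("list", list_id); hypothetical encoding


def resolve(list_id: str, store: Dict[str, List[Member]]) -> Tuple[List[str], int]:
    """Return (ordered unique work IDs, number of list lookups performed)."""
    works: List[str] = []
    seen_works: Set[str] = set()
    visited: Set[str] = set()
    lookups = 0

    def walk(current: str) -> None:
        nonlocal lookups
        if current in visited:  # cycle guard: without it, mutual references never terminate
            return
        visited.add(current)
        lookups += 1  # stands in for one query against the list store
        for kind, ident in store.get(current, []):
            if kind == "list":
                walk(ident)
            elif ident not in seen_works:
                seen_works.add(ident)
                works.append(ident)

    walk(list_id)
    return works, lookups


store = {
    "alice": [("work", "w1"), ("list", "bob")],
    "bob": [("work", "w2"), ("list", "alice")],  # the two lists reference each other
}
```

With this data both lists resolve to the same two works, only in a different order. If bob later removes `w2` from his list, `resolve("alice", store)` shrinks as well, although alice changed nothing. The number of lookups depends on how deep the references go at the moment of the call, so a cached result is stale as soon as any list in the chain changes.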
In general, the effects of recursive references will be bewildering and unanticipated for end users. A way to get effectively the same composition, but with an intuitive user-mediated flow, is to notify users of additions to sets they have an interest in. This is no more than an RSS transform of a query, as implemented in most discovery systems already. Example notification: "2 new works added to @atz's list 'Research'" with a View button and an "Add to List" button. In such a case, each User List is just a flat list of IDs, with no computational risk or unexpected behavior.
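A rough sketch of the notification idea, assuming each follower stores the IDs they last saw for a followed list. The function names and the stored "last seen" snapshot are hypothetical; a real system would more likely diff on a timestamped query.

```python
from typing import Iterable, List


def new_works(current_ids: Iterable[str], last_seen_ids: Iterable[str]) -> List[str]:
    """IDs present in the followed list now but absent when the follower last looked."""
    seen = set(last_seen_ids)
    return [work_id for work_id in current_ids if work_id not in seen]


def notification(owner: str, list_name: str, added: List[str]) -> str:
    """Format the message shown next to the View and "Add to List" buttons."""
    noun = "work" if len(added) == 1 else "works"
    return f"{len(added)} new {noun} added to @{owner}'s list '{list_name}'"


added = new_works(["w1", "w2", "w3", "w4"], ["w1", "w2"])
message = notification("atz", "Research", added)
```

Both lists stay flat: the comparison is a single pass over IDs, and nothing enters the follower's list until they choose "Add to List".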
Notably, because each work has exactly one Admin Set and that Admin Set can be discerned from the work data itself, a User List that references Admin Sets involves no recursion.
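To illustrate why this case is bounded, here is a sketch in which a User List entry is either `("work", id)` or `("admin_set", id)`, and `work_admin_set` stands in for the Admin Set field stored on each work. Both encodings are assumptions. Because an Admin Set is unordered, the sketch sorts its members by ID purely to get a stable result.

```python
from typing import Dict, List, Tuple


def expand(entries: List[Tuple[str, str]], work_admin_set: Dict[str, str]) -> List[str]:
    """Expand a User List into ordered unique work IDs with at most one filter per entry."""
    result: List[str] = []
    seen = set()
    for kind, ident in entries:
        if kind == "work":
            candidates = [ident]
        else:
            # One flat filter on the work's own admin_set value; Admin Sets hold no lists.
            candidates = sorted(w for w, a in work_admin_set.items() if a == ident)
        for work_id in candidates:
            if work_id not in seen:
                seen.add(work_id)
                result.append(work_id)
    return result


work_admin_set = {"w1": "theses", "w2": "maps", "w3": "theses"}
expanded = expand([("work", "w2"), ("admin_set", "theses")], work_admin_set)
```

The expansion is exactly one level deep, so the cost is known in advance: one filter per Admin Set entry.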
I will not spend long on Display Lists (not Sets, because they are ordered), only because I think of them as a special class of User List and I think we can safely declare that a 12-level faceted hierarchy is out of scope. The example Biology Dept. > Master's Theses > Year problem is, in my view, unrelated to special set membership, and the discussion is probably misapplied to the modeling of sets. Are you saying that nowhere in the metadata can you identify that a work was a thesis, its departmental affiliation or its datePublished? Ideally, those are 3 independent facets built on existing data. Certainly, in no case should datePublished be specific to any kind of work!
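A small sketch of what "3 independent facets" means in practice. The field names `type` and `department` are hypothetical stand-ins for whatever the existing metadata calls them; `datePublished` is the name used above. A real deployment would compute these counts in the index rather than in application code.

```python
from collections import Counter
from typing import Dict, List


def facet_counts(works: List[dict], fields: List[str]) -> Dict[str, Counter]:
    """Count values per field; each facet is computed independently of the others."""
    return {f: Counter(w[f] for w in works if f in w) for f in fields}


def drill_down(works: List[dict], **filters: str) -> List[dict]:
    """Apply any combination of facet values, in any order."""
    return [w for w in works if all(w.get(k) == v for k, v in filters.items())]


works = [
    {"id": "w1", "type": "Thesis", "department": "Biology", "datePublished": "2014"},
    {"id": "w2", "type": "Thesis", "department": "History", "datePublished": "2014"},
    {"id": "w3", "type": "Article", "department": "Biology", "datePublished": "2015"},
]
counts = facet_counts(works, ["type", "department", "datePublished"])
biology_theses_2014 = drill_down(works, department="Biology", type="Thesis", datePublished="2014")
```

The Biology Dept. > Master's Theses > Year path is then just one ordering of three filters, and no set membership is involved.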
Recommendations:
- Support two main kinds of groups: Admin Sets and User Lists.
- Lists should copy Lists, not reference Lists.
- Determine if User Lists referencing Admin Sets is really desirable and necessary for this phase.
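The copy recommendation can be sketched as follows, assuming a User List is stored as a plain ordered list of work IDs. The `copy_into` helper and its `exclude` parameter are hypothetical.

```python
from typing import Iterable, List


def copy_into(target: List[str], source: List[str], exclude: Iterable[str] = ()) -> List[str]:
    """Return a new list: target followed by a snapshot of source, minus exclusions and duplicates."""
    skip = set(exclude) | set(target)
    result = list(target)
    for work_id in source:
        if work_id not in skip:
            skip.add(work_id)
            result.append(work_id)
    return result


theirs = ["w1", "w2", "w3"]
mine = copy_into(["w9"], theirs, exclude=["w2"])
mine.reverse()   # reordering is purely local to the copy
theirs.clear()   # the source owner empties their list; the copy is unaffected
```

Because the copy holds only IDs, exclusions and reordering need no extra model, and later edits to the source list never reach it.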