Hydra::Works Shared Modeling

Note: This work led to the implementation of Hydra Works, Hydra PCDM, and eventually the Hyrax solution bundle. For the finished data model, see Hydra Works Data Model.

Hydra::Works is a flexible, extensible domain model intended to underlie a wide array of Hydra repository applications and so facilitate collaboration across projects. It emerged from a session at Hydra Connect 2 about the community's various solutions for institutional repository (IR)-like applications, such as Sufia, Worthwhile, Curate, and Hydrus, as well as some non-IR-like apps such as Avalon. That session revealed broad interest in an IR-like solution bundle that combines the best features of these applications (see notes from that session).

The first step towards building this solution bundle was to design a domain model that works for use cases across these distinct approaches. A follow-up discussion during the October 8th Hydra Tech call led to the creation of a GitHub repository, which holds most of the discussion so far (deliverables from GitHub will ultimately live on this wiki). We are gathering use cases for the first phase of development until October 22nd; please see here for sample use cases. If that timeline doesn't suit your needs, there will be opportunities to get involved later: like most of our work, Hydra::Works will continue to be iterated upon and improved over time.

After we've reviewed the initial use cases, development can begin on an alpha version of Hydra::Works. The actual merger of Sufia and Worthwhile can take place between Q1 and Q3 of 2015, the timeline that most attendees of the Sufia Futures discussion at Hydra Connect felt best able to work within.

The process for the alpha version of Hydra::Works will be to build a base model prototype covering two base use cases:

  1. User contributed document.
  2. Book with ordered pages.

See gist: https://gist.github.com/jeremyf/4ac46163af876673e5a7

https://github.com/projecthydra-labs/hydra-works/tree/hydra-works-interface

An application profile will also be created as a human-readable schema to serve as a data dictionary.

See gist:


See here for the work in progress: Hydra::Works

Diagram

Preliminary Definitions

GenericWork

A GenericWork is an intellectual entity, sometimes called a "work", "digital object", etc., with the following attributes:

  • Identifiable - has a unique identifier
  • Collectible - can be added to a GenericCollection
    • Are Collectibles sortable within a Collection?
      • Depends on the Collection implementation – some Collections are unordered sets, but others are ordered lists.
  • Describable - can have descriptive metadata
  • Composable - can be composed within other GenericWorks
  • Orderable - the Works and/or Files within the Work can be given an order
  • AttachmentReady - can have GenericFiles attached
  • Accessible - can have access control metadata assigned to enable/restrict access
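
To make the attribute list concrete, here is a minimal Python sketch of a GenericWork. It is not the Hydra::Works API (the real gem is written in Ruby); the class layout, the UUID identifiers, the `read_groups` access-control field, and the `add_member`/`attach`/`can_read` method names are all hypothetical. Composable and Orderable are modeled together by keeping member Works in a list whose order is the Work's order; Collectible is left to the collection side.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


def new_id() -> str:
    """Identifiable: mint a unique identifier (UUIDs are an assumption here)."""
    return str(uuid.uuid4())


@dataclass(eq=False)
class GenericFile:
    content: bytes                                    # the attached bitstream
    id: str = field(default_factory=new_id)


@dataclass(eq=False)
class GenericWork:
    title: str                                        # Describable (one field, for brevity)
    id: str = field(default_factory=new_id)           # Identifiable
    members: list[GenericWork] = field(default_factory=list)  # Composable; list order = Orderable
    files: list[GenericFile] = field(default_factory=list)    # AttachmentReady
    read_groups: set[str] = field(default_factory=set)        # Accessible (hypothetical ACL shape)

    def add_member(self, work: GenericWork) -> None:
        """Compose another Work within this one, appending it to the current order."""
        if work is self:  # only the direct self-reference is checked in this sketch
            raise ValueError("a work cannot be composed within itself")
        self.members.append(work)

    def attach(self, file: GenericFile) -> None:
        self.files.append(file)

    def can_read(self, group: str) -> bool:
        return group in self.read_groups


# Usage: the "book with ordered pages" base use case.
book = GenericWork(title="Book")
for n in range(1, 4):
    page = GenericWork(title=f"Page {n}")
    page.attach(GenericFile(content=b"page image bytes"))
    book.add_member(page)
book.read_groups.add("public")
print([p.title for p in book.members])  # ['Page 1', 'Page 2', 'Page 3']
```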

GenericFile

A GenericFile is a sequence of binary data, with the following attributes:

  • Identifiable - has a unique identifier
  • Accessible - can have access control metadata assigned to enable/restrict access
  • Attachable - must be associated with something AttachmentReady
  • Payloadable - can hold a sequence of binary data or "bitstream"
  • Characterizable - can have technical metadata describing the bitstream, digitization process, etc.
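
A matching Python sketch for GenericFile, equally hypothetical: Attachable is enforced by making the parent a required constructor argument, and Characterizable is reduced to a size and a SHA-256 checksum. Real technical metadata (format identification, digitization details) would come from a characterization tool; the `technical` field names and the `characterize` method are assumptions.

```python
from __future__ import annotations

import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Protocol


class AttachmentReady(Protocol):
    """Structural type for anything a file may be attached to (a Work or Collection)."""
    id: str


@dataclass(eq=False)
class GenericFile:
    parent: AttachmentReady                                     # Attachable: required, no default
    content: bytes                                              # Payloadable: the bitstream
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # Identifiable
    read_groups: set[str] = field(default_factory=set)          # Accessible (hypothetical)
    technical: dict[str, object] = field(default_factory=dict)  # Characterizable

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so check the parent explicitly.
        if self.parent is None:
            raise ValueError("a GenericFile must be attached to something AttachmentReady")

    def characterize(self) -> dict[str, object]:
        """Derive basic technical metadata from the bitstream."""
        self.technical["size_bytes"] = len(self.content)
        self.technical["sha256"] = hashlib.sha256(self.content).hexdigest()
        return self.technical


# Usage with a throwaway parent type.
@dataclass
class Work:
    id: str = "work-1"


f = GenericFile(parent=Work(), content=b"hello")
print(f.characterize()["size_bytes"])  # 5
```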

GenericCollection

A GenericCollection is a group of GenericWork records, with the following attributes:

  • Identifiable - has a unique identifier
  • Collectible - can be added to a GenericCollection
  • Orderable - the Works, Collections and/or Files within the Collection can be given an order
  • Describable - can have descriptive metadata
  • AttachmentReady - can have GenericFiles attached
  • Accessible - can have access control metadata assigned to enable/restrict access
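
The GenericCollection attributes can be sketched the same way. In this hypothetical Python version a collection's members may be Works, Files, or other Collections (Collectible), members keep their order and can be moved (Orderable), and files such as a thumbnail are attached separately. It only models the ordered-list flavor; the collection types referenced below follow different rules for duplicates and ordering. The `add`/`move` names are assumptions.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Union


def new_id() -> str:
    return str(uuid.uuid4())


@dataclass(eq=False)
class GenericWork:
    title: str
    id: str = field(default_factory=new_id)


@dataclass(eq=False)
class GenericFile:
    name: str
    id: str = field(default_factory=new_id)


@dataclass(eq=False)
class GenericCollection:
    title: str                                              # Describable
    id: str = field(default_factory=new_id)                 # Identifiable
    members: list[Union[GenericWork, GenericCollection, GenericFile]] = field(default_factory=list)
    files: list[GenericFile] = field(default_factory=list)  # AttachmentReady, e.g. a thumbnail

    def add(self, item: Union[GenericWork, GenericCollection, GenericFile]) -> None:
        """Append a Work, File, or another Collection (Collectible) to the end of the order."""
        if item is self:
            raise ValueError("a collection cannot contain itself")
        self.members.append(item)

    def move(self, item: Union[GenericWork, GenericCollection, GenericFile], new_index: int) -> None:
        """Orderable: move an existing member to a new position (identity-based lookup)."""
        index = next((i for i, m in enumerate(self.members) if m is item), None)
        if index is None:
            raise ValueError("item is not a member of this collection")
        self.members.insert(new_index, self.members.pop(index))


# Usage: a collection holding a Work and a sub-collection, then reordered.
top = GenericCollection(title="Departmental Papers")
sub = GenericCollection(title="2014")
paper = GenericWork(title="Annual report")
top.add(paper)
top.add(sub)
top.move(sub, 0)
print([m.title for m in top.members])  # ['2014', 'Annual report']
```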

See also User Collections, Admin Sets, Display Sets, and Further thinking on Collections, Sets & Lists for more on the different types of collections, with differing rules for duplicate entries, ordering, etc.

Use Case Summary

A review of the use cases submitted as issues or pull requests to the Hydra::Works GitHub repository found that virtually all of them were compatible with the consensus model and fell into a few basic scenarios:

  1. Works with differing levels of component structure

    1. Works without components

      1. Structure

        1. Work (with optional file(s))

      2. Examples

        1. #14 User-contributed work

    2. Works with a single level of components

      1. Structure

        1. Work (with optional file(s))

          1. Component (with optional file(s))

      2. Examples

        1. #8 CD with tracks

        2. Newspaper with articles

        3. #9, #10, #22 Book with pages

        4. #23 Photograph with front/back

        5. Poster or postcard with front/back

    3. Works with a multi-level hierarchy of components

      1. Structure

        1. Work (with optional file(s))

          1. Component (with optional file(s))

            1. Component (with optional file(s))

              1. etc.

      2. Examples

        1. #11 Research dataset

        2. #23c Book set or multi-part manuscript

        3. #23d Photo album

        4. #29 Event

  2. Collections of Works with differing notions of collection order and membership

    1. Unordered sets

      1. Structure: Works are unordered, and may only appear once

      2. Examples: #17 Playlist, bookmark list

    2. Ordered lists

      1. Structure: Works are ordered, and may appear more than once

      2. Examples: #18, #26 User collections

    3. Admin sets

      1. A variation of the unordered set in which each Work must belong to exactly one admin set.
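
The two families of scenarios above can be sketched in a few lines of Python, as an illustration only. The three component structures differ just in how deep the Work/Component tree goes, so a hypothetical `component_depth` function classifies them (0, 1, or 2 and more levels). The unordered-set and ordered-list collections differ in whether order is kept and whether a Work may appear twice. The class names are assumptions; the admin-set rule is a membership constraint and is left to the constraints that follow.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass(eq=False)  # identity-based equality and hashing, so Works can go in a set
class Work:
    title: str
    components: list[Work] = field(default_factory=list)


def component_depth(work: Work) -> int:
    """0 = no components, 1 = a single level, 2 or more = a multi-level hierarchy."""
    if not work.components:
        return 0
    return 1 + max(component_depth(c) for c in work.components)


class UnorderedSet:
    """Works are unordered and may appear only once (e.g., #17 playlist)."""

    def __init__(self) -> None:
        self._works: set[Work] = set()

    def add(self, work: Work) -> None:
        self._works.add(work)  # adding a Work that is already present is a no-op

    def __contains__(self, work: Work) -> bool:
        return work in self._works

    def __len__(self) -> int:
        return len(self._works)


class OrderedList:
    """Works are ordered and may appear more than once (e.g., #18, #26 user collections)."""

    def __init__(self) -> None:
        self._works: list[Work] = []

    def add(self, work: Work) -> None:
        self._works.append(work)

    def __iter__(self) -> Iterator[Work]:
        return iter(self._works)

    def __len__(self) -> int:
        return len(self._works)


# Usage
book = Work("Book", [Work("Page 1"), Work("Page 2")])
book_set = Work("Book set", [book])
print(component_depth(Work("User-contributed work")),
      component_depth(book), component_depth(book_set))  # 0 1 2

favorites, reading_list = UnorderedSet(), OrderedList()
for w in (book, book_set, book):
    favorites.add(w)
    reading_list.add(w)
print(len(favorites), len(reading_list))  # 2 3
```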

In addition to these patterns, there are also a few variations that apply across the different structures:

  1. Constraints

    1. Number of children may be constrained (e.g., a Work representing a postcard, coin, etc. may allow at most two Components).

    2. Works may be required to belong to exactly one collection of a certain kind (i.e., an Admin set).

  2. Ordering

    1. Contained items (Components within a Work/Component, Files within a Work/Component) may have a single order denoted by a sort property (e.g., mods:partNumber).

    2. Any item that contains or links to other items (Collections linking to Works, Works/Components containing Components/Files) may have any number of additional ordering schemes.
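
The constraint and ordering variations can be checked mechanically. The Python sketch below is hypothetical throughout: `max_components` caps the number of children (e.g., 2 for a postcard), `validate` enforces exactly one admin set, `default_order` sorts by an integer `part_number` standing in for a sort property such as mods:partNumber, and `orderings` holds any number of additional named ordering schemes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Component:
    label: str
    part_number: int  # plays the role of a sort property such as mods:partNumber


@dataclass(eq=False)
class Work:
    title: str
    max_components: Optional[int] = None  # e.g., 2 for a postcard or coin; None = unlimited
    components: list[Component] = field(default_factory=list)
    admin_sets: list[str] = field(default_factory=list)
    orderings: dict[str, list[Component]] = field(default_factory=dict)

    def add_component(self, component: Component) -> None:
        """Constraint 1: reject components beyond the allowed maximum."""
        if self.max_components is not None and len(self.components) >= self.max_components:
            raise ValueError(f"{self.title!r} allows at most {self.max_components} components")
        self.components.append(component)

    def validate(self) -> None:
        """Constraint 2: the Work must belong to exactly one admin set."""
        if len(self.admin_sets) != 1:
            raise ValueError(f"{self.title!r} must belong to exactly one admin set")

    def default_order(self) -> list[Component]:
        """Ordering 1: the single order given by the sort property."""
        return sorted(self.components, key=lambda c: c.part_number)

    def add_ordering(self, name: str, order: list[Component]) -> None:
        """Ordering 2: register an additional, named ordering over contained items."""
        contained = {id(c) for c in self.components}
        if any(id(c) not in contained for c in order):
            raise ValueError("an ordering may only reference contained components")
        self.orderings[name] = list(order)


# Usage: a postcard allowing at most two components, added out of order.
postcard = Work("Postcard", max_components=2)
back, front = Component("back", 2), Component("front", 1)
postcard.add_component(back)
postcard.add_component(front)
print([c.label for c in postcard.default_order()])  # ['front', 'back']
postcard.admin_sets.append("Special Collections")
postcard.validate()  # passes: exactly one admin set
postcard.add_ordering("verso-first", [back, front])
```

Keeping the sort-property order computed rather than stored means it cannot drift from the property values; the named orderings are stored because they carry information the properties do not.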

Points for Further Discussion

  1. Files may be references to a portion of a File attached to the parent Work/Component (e.g., a track linking to a section of the parent CD’s audio file).
    1. This relationship differs from the usual Work/Component relationship to a File, since the referencing item points to the File rather than containing it.
    2. How is the portion specified, and how does the specification differ by kind of content: a byte range, a time range for audio/video, a bounding box for images?
    3. UNRESOLVED
  2. Files may be references to external content (e.g., external datastreams, streaming server).
    1. Does this kind of file behave in the same way as an internal file? For example, if the external reference is managed by the repository and presents the same API, does the model even need to account for the difference?
    2. If the behavior is different, do we need to have a new kind of entity for this scenario?
    3. UNRESOLVED
  3. Collections may have links to Files (e.g., a thumbnail).
    1. Is this relationship the same as a Work/Component containing Files?
    2. Or is this a different kind of relationship more like a property of the Collection?
    3. RESOLVED: A collection can contain files.
  4. Works may contain other works (e.g., certain kinds of events or exhibitions), though these are probably linked and defined in relations rather than the Work containing other Works.
    1. Is this relationship similar to Work containing Components?
    2. Or is this a different kind of relationship, more like a RELS-EXT inter-object relationship?
    3. RESOLVED: A work will have membership in other works.
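
Point 1 is still unresolved. Purely to illustrate the question in 1.2, the hypothetical Python sketch below models a portion reference as a parent file identifier plus one of the three selector kinds that question mentions (byte range, time range, bounding box). Following 1.1, the reference points at the parent's File instead of containing it. The class names, the half-open byte range, and the validation rules are assumptions, not a proposed answer.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ByteRange:
    start: int  # inclusive
    end: int    # exclusive

    def extract(self, data: bytes) -> bytes:
        """Return the referenced bytes from the parent file's bitstream."""
        if not 0 <= self.start <= self.end <= len(data):
            raise ValueError("byte range falls outside the parent file")
        return data[self.start:self.end]


@dataclass(frozen=True)
class TimeRange:
    start_seconds: float
    end_seconds: float


@dataclass(frozen=True)
class BoundingBox:
    x: int
    y: int
    width: int
    height: int


Selector = Union[ByteRange, TimeRange, BoundingBox]


@dataclass(frozen=True)
class FilePortion:
    """Points at part of a File owned by the parent Work/Component; does not contain it."""
    parent_file_id: str
    selector: Selector

    def __post_init__(self) -> None:
        s = self.selector
        if isinstance(s, TimeRange) and not 0 <= s.start_seconds < s.end_seconds:
            raise ValueError("time range must satisfy 0 <= start < end")
        if isinstance(s, BoundingBox) and (s.width <= 0 or s.height <= 0):
            raise ValueError("bounding box must have a positive size")


# Usage: a CD track pointing at a section of the parent CD's audio file.
track = FilePortion(parent_file_id="cd-audio", selector=TimeRange(0.0, 215.5))
header = ByteRange(0, 4).extract(b"RIFF....WAVE")
print(header)  # b'RIFF'
```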