
Executive Summary

The Data Mapper Working Group has formed to address a number of difficulties arising from one specific decision: to build a library in the ActiveRecord pattern and place it firmly at the center of the Samvera architecture. This is not to discredit Fedora, Solr, ActiveFedora, or any of the work that has brought the community together. However, this coupled design locks in semantic and technical choices that have proven costly to maintain and difficult to train on, and that limit options for storage and metadata implementation and experimentation.

As preliminary material, the working group has identified specific challenges and an experimental approach to break this coupling to ActiveFedora. An early proof of concept led by Princeton University is the basis for this experimentation. The working group has also defined specific goals and deliverables as follows.
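The decoupling under experiment follows the data mapper pattern: resources are plain value objects with no persistence logic, and a separate persister (the "mapper") handles saving and loading. A minimal sketch, with hypothetical class names — the proof-of-concept's actual interfaces may differ:

```ruby
# The resource knows nothing about storage; it is a plain value object.
Book = Struct.new(:id, :title, keyword_init: true)

# An in-memory persister. A Fedora- or Postgres-backed persister could
# implement this same small interface, letting backends be swapped
# without touching the resource classes.
class MemoryPersister
  def initialize
    @store = {}
  end

  # Assigns an id on first save and writes the resource to the store.
  def save(resource)
    resource.id ||= @store.size + 1
    @store[resource.id] = resource
    resource
  end

  def find(id)
    @store.fetch(id)
  end
end

persister = MemoryPersister.new
book = persister.save(Book.new(title: "Moby-Dick"))
persister.find(book.id).title # => "Moby-Dick"
```

Because the persister's interface is deliberately small, replicating it for a secondary backend is far cheaper than replicating the full ActiveRecord interface.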

Challenges with the Samvera Architecture

  1. On-ramp time for new developers is high, because doing any work requires understanding the entire tightly coupled stack.

  2. Maintaining parity with the ActiveRecord interface for ActiveFedora has proven difficult, and the problem is compounded when trying to experiment with alternative backends.

    1. The interface is too large to replicate for a secondary backend.

    2. ActiveFedora is unable to support transactionality, which ActiveRecord developers would expect to have and are surprised to find missing.

  3. Some institutions feel that Fedora isn’t an appropriate backend for their public-facing repository, but still want to be a part of the Hydra community.

    1. There are different requirements for each institution regarding preservation, external support availability, and redundancy for their repository contents.

    2. Real Use Cases:

      1. Princeton: Managing resources as RDF has proven time-consuming and inflexible. We would like to store abstract resources, but can’t right now.

      2. Avalon: Performance issues in the current stack have made large-scale delivery to students and the wider university community problematic.

  4. Some institutions want to store their files in different places, independent of where they choose to store metadata, and can’t.

    1. Real Use Cases:

      1. Princeton: We’ve talked about storing some original files in cloud storage, but haven’t because of the difficulty.

  5. Large experiments with the Hydra architecture, such as adding persistence and preservation layers that benefit from larger communities, have proven too difficult to implement in the current stack.

  6. Community movement toward Amazon has led to interest in using persistence layers, such as Amazon RDS, to reduce hosting costs.
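Transactionality (challenge 2.2 above) means that a group of writes either all succeed or all roll back, leaving no partial state. A minimal sketch of how a persister outside ActiveFedora could provide this guarantee, using a hypothetical in-memory store that restores a snapshot on failure:

```ruby
Record = Struct.new(:id, :title, keyword_init: true)

# A hypothetical persister offering the all-or-nothing behavior
# ActiveRecord developers expect from `transaction` blocks.
class TransactionalPersister
  def initialize
    @store = {}
  end

  def save(resource)
    resource.id ||= @store.size + 1
    @store[resource.id] = resource.dup
    resource
  end

  def count
    @store.size
  end

  # Snapshot the store, yield, and restore the snapshot on any error,
  # so a failed block leaves no partial writes behind.
  def transaction
    snapshot = @store.dup
    yield
  rescue => e
    @store = snapshot
    raise e
  end
end

persister = TransactionalPersister.new
begin
  persister.transaction do
    persister.save(Record.new(title: "kept only if the block succeeds"))
    raise "boom"
  end
rescue RuntimeError
end
persister.count # => 0 (the failed transaction left no partial writes)
```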

Goals of Data Mapper Working Group

  1. Determine both feasibility and cost to community regarding the ability to support multiple persistence layers, addressing questions such as:

    1. Will the data mapper pattern provide a simpler set of code to maintain, and if not, why not?

    2. Is the support of multiple persistence backends feasible and can they meet community standards for preservation?

    3. Does the Hydra community benefit from supporting these persistence layers, either by providing additional tools to developers or by reducing the time spent maintaining core tooling?

    4. Can we support multiple binary storage endpoints for a given metadata storage system?

  2. If feasible, determine a path to bring the DMWG’s pattern into the mainstream Hydra community, potentially through inclusion in Hyrax or another common tool.

  3. If feasible, begin to deprecate ActiveFedora and free up resources from its maintenance.
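Goal 1.4 asks whether multiple binary storage endpoints can sit behind one metadata store. A minimal sketch of the separation, with hypothetical adapter names: the metadata resource records only a file identifier, and any adapter exposing the same upload/read interface (local disk here; S3 or Fedora in principle) can hold the bytes, independent of where metadata lives.

```ruby
require "tmpdir"

# Metadata resource: stores an opaque identifier, not the file itself.
FileSet = Struct.new(:id, :file_identifier, keyword_init: true)

# Writes binary content to local disk. An S3-backed adapter would
# expose the same upload/read interface with its own identifier scheme.
class DiskStorageAdapter
  def initialize(root)
    @root = root
  end

  # Stores the bytes and returns an identifier the metadata layer keeps.
  def upload(content, id:)
    path = File.join(@root, id)
    File.write(path, content)
    "disk://#{path}"
  end

  def read(identifier)
    File.read(identifier.sub("disk://", ""))
  end
end

storage = DiskStorageAdapter.new(Dir.mktmpdir)
file_set = FileSet.new(id: 1)
file_set.file_identifier = storage.upload("original bytes", id: "file-1")
storage.read(file_set.file_identifier) # => "original bytes"
```

Because the metadata layer holds only the identifier, an institution could keep descriptive metadata in one system while routing original files to cloud storage, addressing challenges 4 and 6 above.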

Deliverables

  1. Generate an MVP repository which can persist to two or more backends.

    1. Agree upon a representative set of features necessary for confidence that this strategy will work in the core of Hydra.

    2. Participate in scheduled week-long development sprints.

    3. Recognize and document differences between persistence backends and the impact of use cases on our requirements for each.

  2. On Success

    1. Generate a document of recommendations on how to implement the code in the core of the Hydra stack.

  3. On Failure

    1. Generate documentation on what failed, why, and potential next steps.

Timeline

Work for this group will be done in 1-week sprints, completed by Hydra Connect 2017 (November 6, 2017).
