Hyrax/Fedora 6 technical discussion
Attendees
@Arran Griffith
@Dan Field
@Nick Steinwachs
@Heather Greer Klein
@Daniel Pierce
@Randall Floyd
Agenda/Notes
Is there still a noticeable difference in performance between Fedora 6 and Postgres? What are the problems?
People want Fedora 6 for the preservation features, but they are hesitant about the perceived performance issues.
All the storage options have something in common: they all use the LDP (Linked Data Platform, a W3C specification that also defines a vocabulary) and RDF (Resource Description Framework, the data model for linked data) gems to speak Fedora's language. This is how Valkyrie works. Postgres does not have that translation layer in the middle, which makes it always faster. Valkyrie to Postgres is just a database.
Not clear how else to talk to the Fedora API without this. Even with the Fedora 6 API, it inherently has to have that middle process.
Daniel helped identify, from an outlier in performance testing, that you can cache and reduce the number of times you reinterpret, which increases speed.
The translation layer cannot be eliminated, only optimized.
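The caching idea above can be illustrated with a minimal sketch. It is not Valkyrie or Fedora code: `parse_triples` is a hypothetical stand-in for the LDP/RDF translation step, and the cache is keyed on the raw response body purely for illustration (a real adapter would more likely key on something like a resource identifier plus a version or ETag).

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def parse_triples(raw: str) -> tuple:
    """Hypothetical stand-in for the RDF-to-resource translation step.

    Splits N-Triples-like lines into (subject, predicate, object) tuples.
    The result is cached, so an identical body is only interpreted once.
    """
    triples = []
    for line in raw.strip().splitlines():
        # Drop the trailing " ." terminator, then split into three parts.
        subject, predicate, obj = line.rstrip(" .").split(" ", 2)
        triples.append((subject, predicate, obj))
    return tuple(triples)


# Illustrative body only; the URIs are made up for this example.
body = '<http://example.org/work/1> <http://purl.org/dc/terms/title> "A title" .'

first = parse_triples(body)   # interpreted
second = parse_triples(body)  # served from the cache
```

After the two calls, `parse_triples.cache_info()` reports one miss and one hit: the translation still happens, but only once per distinct body.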
Putting metadata and storage into Fedora is now a choice. Can put one or both into Fedora 6. Can optimize them differently.
If a user puts storage and metadata into Fedora with Hyrax/Hyku, is that replacing Postgres? Is it going directly to Fedora? Yes, into RDF predicates. Hyrax still needs a database, but the metadata is not going in there. Solr holds a reflection of the metadata if it has been indexed. If the indexing is done well and correctly, it could be walked back out.
For performance reasons, you’re hopefully always getting data out of Solr, but that is a derivative of Fedora, which is your source of truth.
Performance for ingestion - how much better is it for Fedora 6? Can we find a way to get from Fedora 4 to 6 quickly? Are there best practices? Is there a way to get Postgres-level ingestion performance?
Another barrier is historical: the notion that Fedora was so bad and unworkable that we need to just go to a database. People found that usually it was the application stack doing repetitive things, and Fedora was not solely to blame for that performance. There is an assumption that Postgres = fast.
People want to persist data in OCFL. We want to give them what they want, with no performance losses, or only marginal ones, in exchange for the data persistence.
It will never be 1 to 1 with Postgres, but there is something you get in exchange.
Postgres in Valkyrie gives a JSON ‘blob’, and the Valkyrie resource objects are included. The membership part in particular causes slowdown in Fedora 6. Have discussed changing this to also be a JSON ‘blob’.
Have seen that ancestor/descendants are unimplemented in the Fedora Simple Search API, and this could be beneficial here given the way Hyrax makes use of things and where it slows down. It would be a quick way to get members and their order without having to walk anything to get it. https://wiki.lyrasis.org/display/FEDORA6x/Simple+Search
There is a difference between membership and desired order. Forcing order into linked data is another dimension to store and process. It is all happening on the application side. The PCDM layer is painful and expensive in Fedora 6 when keeping things displaying in order.
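The cost difference between walking for order and reading it from a single ‘blob’ can be sketched as follows. This is an illustration under assumptions, not Hyrax or Fedora code: the linked list of proxy records, the `member_ids` key, and all identifiers are hypothetical, and each dictionary lookup stands in for one round trip to the repository.

```python
import json

# Hypothetical proxy records forming a linked list that encodes order.
# Each lookup in this dict stands in for one request to the repository.
proxies = {
    "p1": {"member": "page-1", "next": "p2"},
    "p2": {"member": "page-2", "next": "p3"},
    "p3": {"member": "page-3", "next": None},
}


def ordered_members_by_walking(first_proxy_id, store):
    """Follow 'next' pointers one hop at a time.

    Returns (members in order, number of lookups performed).
    """
    members, lookups = [], 0
    current = first_proxy_id
    while current is not None:
        node = store[current]
        lookups += 1
        members.append(node["member"])
        current = node["next"]
    return members, lookups


def ordered_members_from_blob(blob):
    """Read membership and order from a single JSON document (one lookup)."""
    return json.loads(blob)["member_ids"], 1


# The same membership and order stored as one JSON document.
blob = json.dumps({"id": "work-1", "member_ids": ["page-1", "page-2", "page-3"]})

walked, walk_lookups = ordered_members_by_walking("p1", proxies)
read, blob_lookups = ordered_members_from_blob(blob)
```

Both approaches return the same ordered list, but the walk needs one lookup per member while the blob needs one lookup in total, which is the kind of gap that grows with the size of a work.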
For trying to solve performance: how expensive is it to have both? A whole persistence layer with a 1 to 1 copy synced in Postgres. That idea is something Valkyrie is built to do. Could do storage in both. Could be a background job. Another idea is to use Postgres on the metadata side and Fedora on the storage side. Do they need the RDF stuff?
Confirming that the intention of OCFL is to keep metadata and storage together so they can be recreated together. Some have the original in another place (tape) but derivatives in Fedora.
Have the application stack run as fast as Postgres, but reflect that in a true preservation process in Fedora 6. That has been discussed many times as the ideal.
Would want confirmation that things are successfully preserved, but this would be a useful workflow.
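The workflow discussed here (write to the fast store first, copy to the preservation store in the background, then confirm) could look roughly like the sketch below. Everything in it is an assumption for illustration: the two dictionaries stand in for a database-backed store and a Fedora/OCFL-backed store, the queue stands in for a background job system, and the function names are hypothetical rather than Valkyrie or Hyrax APIs.

```python
import hashlib
import queue

fast_store = {}          # stands in for the fast, database-backed store
preservation_store = {}  # stands in for the preservation-backed store
pending = queue.Queue()  # stands in for a background job queue


def digest(data: bytes) -> str:
    """SHA-256 hex digest used to compare the two copies."""
    return hashlib.sha256(data).hexdigest()


def save(resource_id: str, data: bytes) -> None:
    """Write to the fast store right away and queue the preservation copy."""
    fast_store[resource_id] = data
    pending.put(resource_id)


def run_preservation_jobs() -> dict:
    """Copy queued resources and report whether each copy was confirmed."""
    confirmations = {}
    while not pending.empty():
        resource_id = pending.get()
        data = fast_store[resource_id]
        preservation_store[resource_id] = data
        # Confirm by comparing digests of what was sent and what is stored.
        stored = preservation_store[resource_id]
        confirmations[resource_id] = digest(stored) == digest(data)
    return confirmations


save("work-1", b"title: A title")
report = run_preservation_jobs()
```

With in-memory dictionaries the confirmation always succeeds; against a real preservation store, the comparison would presumably use a digest reported by that store, and a failed comparison would be the signal to retry or alert.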