2016-05-27 Derby, LD4L & Hydra
Time: 9:00am PDT / Noon EDT
WebEx Info: Join WebEx meeting - Meeting # 649 923 648, Meeting password: HTig16-5-27 (Hotel-Tango-igloo-golf-one-six-dash-five-dash-two-seven; I'm not sure you need the password.)
Audio Connection: Computer, or 1-855-244-8681 Call-in toll-free number (US/Canada), or 1-650-479-3207 Call-in toll number (US/Canada)
Moderator: E. Lynette Rayle (Cornell)
Notetaker: Sheila Rabun
Attendees: Lynette Rayle, tamsin woo, Sheila Rabun, Corey Harper, James Griffin
Agenda:
- Next Call
- date/time: 2016-06-24
- Moderator:
- Notetaker:
- Call for additional agenda items
- Neo4J - POSTPONED
- Introduction to Derby - Tom Johnson
- Derby has facetious tagline (Derby is mostly a logo)
- Short blog post that Tom wrote when we first put Derby up
- Fedora API discussions – Hydra dev congress at UCSD
- Tom was in the back of the room hacking on quick implementation of Fedora
- Tooling for LDP already existed
- Process for implementation that runs pure ruby was quick, mostly putting a logo on it
- Derby's purpose = provide a testing ground for the Fedora API spec, and a drop-in replacement for the Fedora reference implementation when testing client software. It quickly became a goal in San Diego to be able to do this with Active Fedora and have a Ruby implementation, so the test suite didn't need to spin up an actual Fedora instance
- Fedora API discussion will influence this
- Different from the reference implementation in how it handles certain aspects of LDP: Derby runs directly over a quad store backend, so you can configure some sort of RDF repo behind it. Derby runs in memory out of the box (fine for testing, but there is a need for a high-quality, production-ready RDF LDP implementation in general)
- Plus side is that some of the restrictions that fedora places on kinds of triples that can go into LDP RDF sources are not there in Derby – you can put any LDP you want into Derby – simplifies some of the questions regarding resources in triple store vs elsewhere
- You can use the triple store alongside non-LDP RDF, and can also put arbitrary RDF resources in
- Use case: the test suite for the Ruby LDP client now runs directly over Derby; it used to be missing some important tests. By the end of the UCSD meeting, they had the client library running directly against an LDP implementation in the test suite itself (mock data store). Anticipate being able to do this for Active Fedora too, but there is some uncertainty about which parts that Active Fedora relies on will be part of the spec. When you run the test suite against Derby instead of Fedora, you get a better sense of what Fedora does that Active Fedora depends on that is not base LDP. Derby is just LDP. It's possible that this will mature into something more sophisticated. If this looks like a good decision outside of the test suite, what do we need to do to get it to a good point?
- Derby – github code: https://github.com/fcrepo4-labs/derby - mounts RDF LDP https://github.com/ruby-rdf/rdf-ldp
- RDF LDP library is a suite of middleware for Rack? Easy to put up a server; ships with Lamprey (dummy server). Not a lot of difference between Lamprey and Derby, other than the logo. Middleware punts all HTTP interactions to Rack. You could put this behind any Rails server
- Derby tracks API discussions: as the Fedora spec gets built out, Derby would be a testing ground to see if it's implementable without Fedora backend assumptions. What is client software actually depending on?
- No active triples dependency. All active triples in hydra lives in client side of LDP stuff
- Potentially long term replacement for various jetty services that exist in test suite. Potentially useful to active fedora as testing ground.
- As base LDP implementation, they were able to identify “bugs” or assumptions in active fedora that aren’t in LDP and aren’t necessary. Patched to make active fedora more general.
- Trying to figure out which things are fedora specific and which are more general LDP.
Here's the info on the Fedora API Spec Tom's mentioning: https://wiki.duraspace.org/display/FEDORAAPI/Fedora+Specification
- LD4L & Hydra - Lynette Rayle
- LD4L (linked data for libraries) – grant for past 2 years (LD4L 2014) Cornell
- Also LD4P (linked data for production) – practical systems up and running, production of metadata
- LD4L labs – research oriented, new tool development
- Work with Hydra community via grant
- Built-in feature checklist chart
- Typical feature chart when choosing between different app levels (PCDM, HydraWorks, Curation Concerns, Sufia)
- Lynette Rayle please add link to slides
- First grant period -
- ontology gems – ORE ontology (aggregations) being used by AF (Active Fedora) in the new PCDM Hydra Works implementation to organize resources where a collection has a bunch of other collections or works inside it (aggregating process)
- open annotation – annotation ontology (freeform comment, tags, semantic tags)
- comment can be "myannotation.comment" and then add text
- FOAF – interaction with ontologies using object implementation that AF does
- Second grant period:
- Original gems in AT, no ability to add gems?
- Extend existing ontology gems to work in Hydra framework
- Extend FOAF, open annotation, metadata ontologies (BibFrame, MODS, etc.)
- For example instead of defining ontologies yourself, plug in gems to do it for you…
- What would it take to have Hydra stack over a triple store?
- Implementation that would allow hydra users to plug in something to write straight to triplestore
- Goal = put RDF metadata in triple store and binary blobs still in fedora
- Vitro as metadata extension?
- Limitations to what triples you can put in fedora
- Base properties + additional statements about resource, or put resource in object position (moving that functionality out into Vitro) – right now any object has to be represented in Fedora?
- Allows working with same metadata without having to enter it multiple times
- Questioning Authority gem extension
- Convert string to URI for storage, and converting back to string for form editing
- Using LDF for caching – external authorities aren’t always reliable – local caching would be nice
- Entity finding tools for disambiguation during data entry
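The string/URI conversion and local caching ideas above could look roughly like the following. This is only a sketch under stated assumptions, not how the Questioning Authority gem works: the `CachedAuthority` class, the `fake_fetch` function, and the example.org URI are all made up for illustration, and Python is used purely to keep the example self-contained.

```python
from typing import Callable, Dict, Optional


class CachedAuthority:
    """Hypothetical label <-> URI mapper with a local cache."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch  # stands in for a call to an external authority
        self._uri_by_label: Dict[str, str] = {}
        self._label_by_uri: Dict[str, str] = {}

    def to_uri(self, label: str) -> Optional[str]:
        """Return the URI to store for a label typed into a form."""
        if label not in self._uri_by_label:
            uri = self._fetch(label)
            if uri is None:
                return None  # unknown term: nothing is cached
            self._uri_by_label[label] = uri
            self._label_by_uri[uri] = label
        return self._uri_by_label[label]

    def to_label(self, uri: str) -> Optional[str]:
        """Return the cached label for a stored URI, for form editing."""
        return self._label_by_uri.get(uri)


# Made-up authority data, plus a list recording every "remote" call.
FAKE_AUTHORITY = {"Linked data": "http://example.org/authority/linked-data"}
remote_calls = []


def fake_fetch(label: str) -> Optional[str]:
    remote_calls.append(label)
    return FAKE_AUTHORITY.get(label)


authority = CachedAuthority(fake_fetch)
stored = authority.to_uri("Linked data")  # first lookup hits the "remote"
authority.to_uri("Linked data")           # second lookup is served locally
shown = authority.to_label(stored)        # URI turned back into a string
```

With a cache like this, an unreliable external authority is only contacted the first time a term is used.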
- At developers meeting in May, Lynette did informal poll:
- Working with LD authorities is top winner
- Hydra stack over triple store is second priority
- Linked Data Authorities – extending the questioning authority gem
- Work through fall 2016, re-evaluate after, maybe looking at triple store piece after
- Punted hydra stack over triple store
- Proof of concept:
- Active Fedora uses Active Triples to store graph inside Fedora object
- Since AT object is there already, just write what’s there out to a triple store
- AF currently uses AT 0.7; AT 0.8 introduced major modifications for persistence strategies, which do not exist in 0.7
- AT did not allow for changing repo that things are being written to
- Lynette modified persistence of records – created blazegraph repo and tried to write out to it – it worked! Also could read it back in!
- Structural properties needed to persist in the triple store are already in place
- Full implementation – more looking is needed at where AF is actually writing out LDP calls – instead of writing out to Fedora, write to the triple store
- Would like to see AF move to AT 0.8 to make it easier
- Had to save to Fedora before writing out to AT resource – needed to save first to keep from missing a bunch of info
- If we can get AF on a recent version of AT, we could use the transactions interface in Ruby RDF 2.0 to buffer changes and commit them all at the same time; a save in AF could become a commit to the AT database (could work with Blazegraph, but the gem would need some work)
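The buffer-then-commit idea in the last point can be illustrated with a minimal sketch. It is not the Ruby RDF transactions interface; the `Repository` and `Transaction` classes and the example.org triples are hypothetical, and Python is used only to keep the example self-contained.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


class Repository:
    """Hypothetical in-memory stand-in for a triple store."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()


class Transaction:
    """Buffers inserts and deletes, then applies them in one step."""

    def __init__(self, repository: Repository) -> None:
        self._repository = repository
        self._inserts: Set[Triple] = set()
        self._deletes: Set[Triple] = set()

    def insert(self, triple: Triple) -> None:
        self._deletes.discard(triple)
        self._inserts.add(triple)

    def delete(self, triple: Triple) -> None:
        self._inserts.discard(triple)
        self._deletes.add(triple)

    def commit(self) -> None:
        # Nothing touches the repository until this point.
        current = self._repository.triples
        self._repository.triples = (current - self._deletes) | self._inserts
        self._inserts.clear()
        self._deletes.clear()


old = ("http://example.org/work/1", "http://purl.org/dc/terms/title", "Draft")
new = ("http://example.org/work/1", "http://purl.org/dc/terms/title", "Final")

repo = Repository()
repo.triples.add(old)

tx = Transaction(repo)
tx.delete(old)
tx.insert(new)
before_commit = set(repo.triples)  # still only the old triple
tx.commit()
after_commit = set(repo.triples)   # old triple replaced by the new one
```

In this picture, a save in AF would correspond to the single `commit` call at the end.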
- Proof of concept:
- Use cases – coming from broader community
- Desire to have hydra stack work over triplestore
- Desire to have triples in a triplestore that could be searchable as an endpoint
- Need ability to search across triples (not easily done in Fedora)
- Corey – currently using fedora as backend for metadata creation, not preservation. Normalize metadata and be able to export. Interested in blazegraph as better backend…
- Other thoughts?
- Moving AF to be over AT 0.8 for persistence strategies (would work similarly, moving things into a triplestore instead of Fedora)
- In ideal case – active fedora persistence would be thin layer of persistence strategy to provide abstraction, then possible to use non-fedora specific LDP persistence strategy - more flexibility in plugging in different targets for data
- More re-writing of AF to accomplish this – flipping relationship – AF stores AT resources → AF would be more dependent on AT to do the persistence
- This would take a lot of work
- Requires re-structuring of AF itself
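The "thin layer of persistence strategy" idea discussed above might be pictured as follows. This is a sketch of the general strategy pattern only, not the Active Triples or Active Fedora design; every class name here (`PersistenceStrategy`, `TripleStoreStrategy`, `LdpLikeStrategy`, `Resource`) is hypothetical, and Python is used only to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class PersistenceStrategy(ABC):
    """The thin abstraction: where a resource's triples end up."""

    @abstractmethod
    def persist(self, subject: str, triples: Set[Triple]) -> None: ...

    @abstractmethod
    def load(self, subject: str) -> Set[Triple]: ...


class TripleStoreStrategy(PersistenceStrategy):
    """Keeps all resources in one shared pool of triples."""

    def __init__(self) -> None:
        self.store: Set[Triple] = set()

    def persist(self, subject: str, triples: Set[Triple]) -> None:
        others = {t for t in self.store if t[0] != subject}
        self.store = others | set(triples)

    def load(self, subject: str) -> Set[Triple]:
        return {t for t in self.store if t[0] == subject}


class LdpLikeStrategy(PersistenceStrategy):
    """Keeps one document per resource, keyed by its URI."""

    def __init__(self) -> None:
        self.documents: Dict[str, Set[Triple]] = {}

    def persist(self, subject: str, triples: Set[Triple]) -> None:
        self.documents[subject] = set(triples)

    def load(self, subject: str) -> Set[Triple]:
        return set(self.documents.get(subject, set()))


class Resource:
    """Resource code is identical whichever strategy is plugged in."""

    def __init__(self, subject: str, strategy: PersistenceStrategy) -> None:
        self.subject = subject
        self.strategy = strategy
        self.triples: Set[Triple] = set()

    def add(self, predicate: str, obj: str) -> None:
        self.triples.add((self.subject, predicate, obj))

    def save(self) -> None:
        self.strategy.persist(self.subject, self.triples)

    def reload(self) -> None:
        self.triples = self.strategy.load(self.subject)


def round_trip(strategy: PersistenceStrategy) -> Set[Triple]:
    """Save a resource, then read it back through a fresh object."""
    work = Resource("http://example.org/work/1", strategy)
    work.add("http://purl.org/dc/terms/title", "Example")
    work.save()
    copy = Resource("http://example.org/work/1", strategy)
    copy.reload()
    return copy.triples
```

Calling `round_trip` with either strategy returns the same triples, which is the flexibility in plugging in different targets that the discussion was after.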
- Approaches to putting Hydra over a triplestore - discussion in remaining time
- What Next?
- Ideas for next time =
- blazegraph gem, works but needs work
- would be nice to have set of choices around these gems
- deeper dives into some of the working parts (LD4L labs stuff?) using blazegraph gem to handle persistence
- Lynette has documentation for persistence with AT (doesn’t really go into Blazegraph) – knowledge of how AT works in order to do things
- Fedora is not in the picture, writing straight to Blazegraph