RDF Design

RDF in Hydra Design

RDF Summit meeting in October to discuss the RDF roadmap for Hydra (see Karen’s lightning talk and the notes from the Hydra RDF Summit on the DuraSpace Wiki)

RDF Working Group
Mike Giarlo
Jon Stroop
Adam Wead
Esme Cowles
Anjanette Young
Anusha Ranganathan
Alicia Morris
James Van Mil
Karen Estlund
Tom Johnson (chair)
Corey Harper
Matt Critchlow
Christian Ertmann-Christiansen
Ad Hoc - Bess Sadler (workshops/training)

RDF and ActiveFedora
   •   Redoing parts of the RDF branch
   •   Tests need to pass now
   •   Work is on the RDF-noreply fork, https://github.com/no-reply/active_fedora

Issues/Patterns
   •   The triple in the datastream gets persisted in Fedora, but anything about the linked data (e.g., a term’s label or other upstream data about it) is pushed off to a triplestore. By default, nodes more than one level deep persist to the parent datastream. Oregon is using Redis for the triplestore right now.
   •   UCSD puts those in the repository rather than in a separate triplestore, and UCSD has all datastreams in a triplestore, so they get recursive retrieval of the object and all the other triples it is linked to. Might Fedora 4 converge on working this way?
   •   For Curate, we need a way to store a link and look it up somehow
   ⁃   Graph nodes as metadata objects in Fedora
   ⁃   Instead of putting “Mike Giarlo” in as the name, look up whether a Mike Giarlo person exists. If one exists, put in its URI. If not, mint a new URI and create a new Mike Giarlo person.
   •   Another way is to denormalize for Solr
   ⁃   Would be done as a background job
   ⁃   Sufia has an agnostic queuing implementation (Order Up gem)
   •   Oregon fetches data
   ⁃   A Ruby object has some URI as its RDF subject and a fetch method that retrieves it with the LD gem and extracts whatever RDF it can
   ⁃   Punted on the ActiveFedora side: how/when it gets called
   ⁃   Want a community pattern and defined maintenance needs for it (how often to check LCSH?). Would it be vocabulary-specific?
   ⁃   Other pattern: if there is a single source document, we have a pattern to pull the whole source document, throw it into the triplestore, and build triple objects from it
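The find-or-create idea for Curate and the denormalize-for-Solr idea above can be sketched roughly as follows. This is a minimal Python illustration (Hydra itself is Ruby). It assumes an in-memory store and naive, case-insensitive name matching. `PersonStore`, `find_or_create`, `denormalize_for_solr`, the `creator`/`creator_label` field names, and the `http://example.org/people/` namespace are all hypothetical; none of them are Curate, Sufia, or ActiveFedora APIs.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical namespace for locally minted person URIs (an assumption).
BASE_URI = "http://example.org/people/"


@dataclass
class PersonStore:
    """In-memory stand-in for wherever person nodes live (Fedora, a triplestore, ...)."""
    by_name: dict = field(default_factory=dict)  # normalized name -> URI
    labels: dict = field(default_factory=dict)   # URI -> display name

    def find_or_create(self, name: str) -> str:
        """Return the URI of an existing person with this name, or mint and store a new one."""
        key = name.strip().casefold()  # naive normalization; real matching is much harder
        uri = self.by_name.get(key)
        if uri is None:
            uri = BASE_URI + uuid.uuid4().hex  # mint a new, unique URI
            self.by_name[key] = uri
            self.labels[uri] = name.strip()
        return uri


def denormalize_for_solr(doc: dict, store: PersonStore, fields=("creator",)) -> dict:
    """Return a copy of doc with a *_label list added next to each URI-valued field."""
    out = dict(doc)
    for f in fields:
        uris = doc.get(f, [])
        # Fall back to the URI itself when no label is known.
        out[f + "_label"] = [store.labels.get(u, u) for u in uris]
    return out
```

In practice the denormalization step would run as a background job, not inline. Matching on name alone is also too weak for real data, because two different people can share a name. That is part of why a community pattern is needed here.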

Low hanging fruit?
   •   Questioning Authority is the main way to pull from large vocabs
   •   Oregon controlled vocabulary code: put it in ActiveFedora or in a gem? (It has heavy dependencies.)
   ⁃   Assumes the source document has some sort of dump and generates a list of terms
   ⁃   Assumes a search endpoint
   •   Would need to gemify the controlled vocabulary code or pull it into ActiveFedora before there is a quick way to get people into RDF
   •   The core RDF QA class is kludgy
   •   Should small vocabs just be indexed in Solr? Bigger vocabs require a solution where no open-source one exists already
   •   No SPARQL-based typeahead works well for anything over 10,000 terms
   ⁃   Index the whole thing in Solr
   ⁃   It doesn’t make sense for everyone to do this; maybe run a QA endpoint service as a community?
   •   Trever from the last code4lib may have a relevant codebase
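The “dump, then list of terms, then typeahead” idea can be sketched in Python. The sketch uses a sorted in-memory list with binary search (`bisect`) as a stand-in for a Solr index. That is a swapped-in technique and does not reflect how Questioning Authority or the Oregon code actually works. It assumes the vocabulary dump is N-Triples with `skos:prefLabel` literals, and it handles only simple lines. `terms_from_dump` and `PrefixIndex` are hypothetical names.

```python
import bisect
import re

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Matches simple N-Triples lines: <subject> <predicate> "literal"(@lang)? .
# Deliberately minimal: no datatypes, no blank nodes, escapes are left as-is.
TRIPLE_RE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"(?:@[\w-]+)?\s*\.\s*$')


def terms_from_dump(lines):
    """Build (label, uri) pairs from the skos:prefLabel triples in a vocabulary dump."""
    terms = []
    for line in lines:
        m = TRIPLE_RE.match(line.strip())
        if m and m.group(2) == SKOS_PREF_LABEL:
            terms.append((m.group(3), m.group(1)))
    return terms


class PrefixIndex:
    """Sorted, case-insensitive prefix index (a stand-in for a Solr-backed typeahead)."""

    def __init__(self, terms):
        # Sort by case-folded label so prefix matches sit next to each other.
        self._entries = sorted((label.casefold(), label, uri) for label, uri in terms)
        self._keys = [e[0] for e in self._entries]

    def suggest(self, prefix, limit=10):
        """Return up to `limit` (label, uri) pairs whose label starts with `prefix`."""
        p = prefix.casefold()
        i = bisect.bisect_left(self._keys, p)  # first key >= prefix
        out = []
        while i < len(self._keys) and self._keys[i].startswith(p) and len(out) < limit:
            _, label, uri = self._entries[i]
            out.append((label, uri))
            i += 1
        return out
```

Each lookup is a binary search plus a short scan, so it stays fast well past 10,000 terms. The trade-off is that the whole vocabulary must be loaded and re-indexed whenever it changes, which is the same maintenance question raised above for Solr.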

How can vocab providers make vocabs more useful?
   •   Provide REST API
   •   Solrize vocab
   •   Publish as SKOS
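At its simplest, “publish as SKOS” might look like the sketch below, which serializes one concept as N-Triples using the standard SKOS and RDF namespace URIs. The function names, parameters, and `example.org` URIs are hypothetical. A real vocabulary provider would more likely use an RDF library and also offer Turtle or RDF/XML.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(s):
    """Escape backslashes, quotes, and line breaks for an N-Triples string literal."""
    return (s.replace("\\", "\\\\").replace('"', '\\"')
             .replace("\n", "\\n").replace("\r", "\\r"))


def concept_to_ntriples(uri, pref_label, lang="en", alt_labels=(), broader=()):
    """Return N-Triples lines describing one skos:Concept."""
    lines = [f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> ."]
    lines.append(f'<{uri}> <{SKOS}prefLabel> "{escape_literal(pref_label)}"@{lang} .')
    for alt in alt_labels:
        lines.append(f'<{uri}> <{SKOS}altLabel> "{escape_literal(alt)}"@{lang} .')
    for b in broader:
        # skos:broader links this concept to a more general one.
        lines.append(f"<{uri}> <{SKOS}broader> <{b}> .")
    return lines
```

Publishing stable URIs with `skos:prefLabel`, `skos:altLabel`, and `skos:broader` is what lets consumers build term lists and typeahead indexes without scraping.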

RDF and LD aren’t the same
   •   The Royal Library of Denmark has gotten its data online
   •   Looking at whether to give URIs to things, and if so, which things?

Scope of RDF Working Group
   •   Recommend good patterns for modeling
   ⁃   with a roadmap from easy to complex (e.g., PROV-O, http://www.w3.org/TR/prov-o/) and a core set of classes
   •   Conceptual design separate from any specific schema
   ⁃   See UCSD data model on github, https://github.com/ucsdlib/dams/tree/master/ontology
   ⁃   UCSD data dictionary, http://htmlpreview.github.io/?https://github.com/ucsdlib/dams/master/ontology/docs/data-dictionary.html
   ⁃   The RDF Working Group should help revise it into more of a PROV-O-style layered model!
   •   LD and/or RDF?
   •   One of our roles: help institutions figure out how to model this stuff
   ⁃   E.g., how to work with legacy data
   ⁃   Recommended migration paths for things like MODS
   •   Document stuff!
   •   Simple tools needed
   ⁃   Authority services
   ⁃   Model forms and relationships with external classes are causing issues for UCSD
   ⁃   Will some of the ActiveFedora changes that Tom J has been working on help?
   ⁃   Will Questioning Authority help?
   ⁃   Oregon Digital tools for RDF data stream and ActiveFedora rewrite will be ready in about a week or so, https://gist.github.com/no-reply/7803282
   ⁃   Testing of list issues and nested attributes is in progress
   ⁃   Will not be a problem for those who have implemented the RDF datastream
   ⁃   Need a recipe/tutorial to incorporate into the RDF tutorial
   ⁃   Needs more documentation; it would be good to get a new person testing it to help with documentation

Justin C explains part of the problem
   ⁃   How do you get a form to handle complex classes and nested attributes?
   ⁃   Like how to create a form for MODS?
   •   Will there be application-specific forms, though? Data and forms are separate.
   •   Adding/editing things causes issues
   ⁃   Add to bug tracker!

Way Forward
   •   At LibDevConX (now at code4lib), run through the RDF tutorial as a session and also fix the documentation, https://github.com/projecthydra/active_fedora/wiki/Tame-your-RDF-Metadata-with-ActiveFedora

HydraCamp / Dive into Hydra
   •   OM or RDF?
   •   Don’t use OM for new data, persist as RDF
   •   OM useful for reading old metadata
   •   Using RDF as the default would be an easy, nice way to get people started with simple cases, then move on to more complex work in either OM or RDF
   •   Don’t need OM unless designing a terminology
   ⁃   If you design your own, you deserve the pain

What about the lingering XML?
   •   XML output from FITS and other technical metadata
   •   (Other fun vocab Media Ontology from W3C, http://www.w3.org/TR/mediaont-10/)

What is the community recommendation for Hydra?
   •   Is it true that Sufia/Curate (Hydramata) is the way forward?
   •   This is not clear in the Hydra documentation
   •   Or a hybrid with generic gems?

Working Group
   •   Conversations on hydra-tech until they become obnoxious
   •   Tom Johnson as chair

Action Items
   •   Create a tiered ontology with documentation
   •   Fix the ActiveFedora tutorial, field-test it at code4lib, then fix the documentation
   •   List out tools and publish the list
   •   Tom Johnson to get people together for the next conversation in Feb. or start email conversations
   ⁃   LibDevConX agenda for the group TBD at the Feb. call