2016-07-29 BlazeGraph gem, ActiveTriples Parenting Strategy, QA LOD
Time: 9:00am PDT / Noon EDT
WebEx Info: Join WebEx meeting - Meeting # 642 228 154, Meeting password: HTig0729 (Hotel-Tango-igloo-golf-zero-seven-two-nine; I'm not sure you need the password.)
Audio Connection:Ā Computer, or 1-855-244-8681 Call-in toll-free number (US/Canada), or 1-650-479-3207 Call-in toll number (US/Canada)
Moderator: E. Lynette Rayle (Cornell)
Notetaker: Corey Harper (NYU)
Attendees: Lynette Rayle, Corey Harper, Anna Headley, Tamsin Woo
Agenda:
- Next Call
- date/time: 2016-08-XX
- Moderator:
- Notetaker:
- Call for additional agenda items
- Deeper dive into BlazeGraph gem (Tom Johnson) (https://github.com/ruby-rdf/rdf-blazegraph)
Ruby RDF -- there is an ongoing gap: a high-quality RDF repository backend that connects to a remote, persistent repository is hard to find
rdf-blazegraph tries to fill that gap: https://github.com/ruby-rdf/rdf-blazegraph
It has had limited success
RDF::Repository relies on Ruby's standard Enumerable interface
That is hard to implement against a remote store: Enumerable runs over _every_ statement, potentially millions of them
Doing it over a remote repository requires streaming statements:
https://github.com/ruby-rdf/rdf-blazegraph/blob/develop/lib/rdf/blazegraph/repository.rb#L27
"each" in this example doesn't scale.
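To illustrate why `each` is the sticking point, here is a minimal plain-Ruby sketch with no RDF gems involved. `PagedStore` and its fetch block are hypothetical stand-ins, not rdf-blazegraph's code: the block plays the part of a remote request that returns one page of statements, and `each` streams page by page instead of loading everything. Because the class includes `Enumerable`, asking for the first few statements only triggers the requests needed to produce them; a full scan still touches every statement, which is the scaling problem noted above.

```ruby
# Hypothetical sketch, not rdf-blazegraph's implementation.
# Statements are plain [subject, predicate, object] arrays.
class PagedStore
  include Enumerable

  attr_reader :requests

  # The block stands in for a remote call: it receives (offset, limit)
  # and returns that page of statements, or an empty array when done.
  def initialize(page_size: 2, &fetch_page)
    @page_size = page_size
    @fetch_page = fetch_page
    @requests = 0
  end

  # Stream statements a page at a time instead of loading the whole store.
  def each
    return enum_for(:each) unless block_given?

    offset = 0
    loop do
      @requests += 1
      page = @fetch_page.call(offset, @page_size)
      break if page.empty?

      page.each { |statement| yield statement }
      offset += page.size
    end
    self
  end
end

# Pretend remote data; slicing an Array simulates paged responses.
remote = (1..5).map { |i| ["ex:s#{i}", "ex:p", i] }
store = PagedStore.new(page_size: 2) { |offset, limit| remote[offset, limit] || [] }

first_two = store.first(2) # Enumerable#first stops iterating early
p first_two
p store.requests
```

The page size and the offset/limit style of paging are assumptions chosen for the illustration; a real backend might page differently.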
No way to talk about blank nodes efficiently
We have this problem in _every_ triplestore we've tried
If you write bnodes and read them back, the scope is different, so the IDs don't carry over from write to read
You end up with new bnodes every time
Therefore, you can't edit blank nodes.
This is a Blazegraph JIRA issue: https://jira.blazegraph.com/browse/BLZG-1434
Active conversation there -- even in the last 24 hours
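A toy model of the blank node problem, assuming (as described above) that the store relabels blank nodes per request. `RelabelingStore` is hypothetical and far simpler than any real triplestore: labels stay consistent within one read, change between reads, and a label taken from a read can't be used to delete or edit the stored triple.

```ruby
# Hypothetical illustration, not Blazegraph's actual behaviour: blank node
# labels are scoped to a single request, so each read hands back fresh labels.
class RelabelingStore
  def initialize
    @triples = []
    @counter = 0
  end

  # Blank node labels in a write are replaced with store-internal ones.
  def insert(triples)
    @triples.concat(relabel(triples, "internal"))
  end

  # Every read is a new scope, so blank nodes get new labels each time.
  def read
    relabel(@triples, "r")
  end

  # Deleting needs the exact stored triple; returns how many were removed.
  def delete(triple)
    before = @triples.size
    @triples.delete(triple)
    before - @triples.size
  end

  private

  # Consistent relabeling within one call, fresh labels across calls.
  def relabel(triples, prefix)
    labels = Hash.new { |h, old| h[old] = "_:#{prefix}#{@counter += 1}" }
    triples.map { |t| t.map { |term| term.start_with?("_:") ? labels[term] : term } }
  end
end

store = RelabelingStore.new
store.insert([["ex:book", "ex:author", "_:b0"], ["_:b0", "ex:name", "Ann"]])

first_read = store.read
second_read = store.read
p first_read.first
p second_read.first              # same triple, different bnode label
p store.delete(first_read.first) # 0: the label we were handed matches nothing
```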
RDF repositories try to get around this with SPARQL, but it doesn't scale well
LDP refers to "pathological graphs", where you can't refer to bnodes unambiguously
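The SPARQL workaround amounts to identifying a blank node by the triples around it rather than by its label. A plain-Ruby sketch of that idea (no SPARQL engine; `candidate_bnodes` is a hypothetical helper) also shows where it breaks: when two blank nodes have identical descriptions, as in the "pathological graphs" mentioned above, no pattern can single out just one of them.

```ruby
# Hypothetical helper: pick out blank nodes by their surrounding triples,
# roughly what a SPARQL pattern such as
#   ?book ex:author ?b . ?b ex:name "Ann" .
# does. Triples are plain [subject, predicate, object] string arrays.
def candidate_bnodes(triples, subject, predicate, description)
  triples
    .select { |subj, pred, obj| subj == subject && pred == predicate && obj.start_with?("_:") }
    .map { |_subj, _pred, obj| obj }
    .uniq
    .select { |bnode| description.all? { |dp, dv| triples.include?([bnode, dp, dv]) } }
end

graph = [
  ["ex:book", "ex:author", "_:a"], ["_:a", "ex:name", "Ann"],
  ["ex:book", "ex:author", "_:b"], ["_:b", "ex:name", "Bo"]
]
p candidate_bnodes(graph, "ex:book", "ex:author", { "ex:name" => "Ann" }) # ["_:a"]

# Two authors with identical descriptions: the pattern can no longer
# distinguish them.
ambiguous = [
  ["ex:book", "ex:author", "_:a"], ["_:a", "ex:name", "Ann"],
  ["ex:book", "ex:author", "_:b"], ["_:b", "ex:name", "Ann"]
]
p candidate_bnodes(ambiguous, "ex:book", "ex:author", { "ex:name" => "Ann" }) # ["_:a", "_:b"]
```

Each lookup here scans the whole graph, which hints at why matching by description gets expensive as graphs grow.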
The Repository class and the REST client are the two ways to work with this gem
The Repository class implements the standard RDF.rb repository interface
It is a drop-in replacement for the in-memory repository, but with performance caveats
Q: Tested with rdf 2.0?
A: There is a WIP branch that is up to date and maintained against rdf 2.0
- Understanding ActiveTriples Parenting Strategy (Lynette Rayle) (https://gist.github.com/elrayle/11898117572445a15c4a)
Repository Strategy -- things just go in the repo as usual
Parenting strategy -- "This thing has this as a parent."
Can be nested; it keeps going up the chain until it reaches a top-level resource that has the repository strategy
When you save a resource that sits under a particular parent, it saves everything in its whole parent chain
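A plain-Ruby sketch of the chain walk described above. `Resource`, `lineage`, and `persist!` are hypothetical names, not ActiveTriples' classes; the model follows the behaviour as described here (before PR224), where saving any resource writes every statement on its parent chain into the top-level resource's repository.

```ruby
# Hypothetical model, not ActiveTriples' API. A resource with no parent
# uses the repository strategy; one with a parent uses the parent strategy.
Resource = Struct.new(:name, :parent, :statements) do
  def strategy
    parent.nil? ? :repository : :parent
  end

  # This resource plus every ancestor, ending at the top-level resource
  # (the first one with the repository strategy).
  def lineage
    node = self
    nodes = [node]
    nodes << (node = node.parent) while node.strategy == :parent
    nodes
  end

  # Simplified pre-224 behaviour: saving writes every statement on the
  # whole chain into the repository; returns the top-level resource's name.
  def persist!(repository)
    lineage.each do |resource|
      resource.statements.each { |st| repository << st unless repository.include?(st) }
    end
    lineage.last.name
  end
end

top = Resource.new("pp", nil, [["pp", "ex:title", "Top"]])
child = Resource.new("cp", top, [["pp", "ex:child", "cp"], ["cp", "ex:title", "Child"]])
grandchild = Resource.new("gp", child, [["cp", "ex:child", "gp"], ["gp", "ex:title", "Grandchild"]])

repository = []
p grandchild.lineage.map(&:name)   # ["gp", "cp", "pp"]
p grandchild.persist!(repository)  # "pp"
p repository.size                  # 5
```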
Talking through the examples at: https://gist.github.com/elrayle/11898117572445a15c4a#examples
DummyResource is topmost, and has a child and a grandchild
The first example sets each resource to the repository strategy.
* Manually set each child on its parent
* What you get before and after persisting on various objects
* Question for Tom: why does cr:type show up in pr's triples? Is it because it's cached at the Ruby object level?
* When "resuming" (reading back into new Ruby objects)
* And when destroying. Note that destroying a child doesn't remove triples that refer to it from parent
* This is "by design", but merits further discussion.
Now we do the same thing with the parent strategy
* We pass the parent resource when creating objects
* Now we have methods that can trace through the various ancestors
* Still setting child properties
* Now dumping parent gives every statement on the ancestor chain
* Some questions about navigating up and down the hierarchy.
* Tom: This is going to change with PR224
* With the changes in 224, the parent strategy's approach changes:
* Now: each resource has its own graph that it tries to persist up the chain
* Post 224: Transaction strategy.
* Persist on the grandkid: post 224, the transaction buffered in gp executes into cp
* So post 224, if you add a statement to gp, it's only in gp
* Persist gp, and it pushes the statement into cp, but no further.
* So basically the parent strategy example in the gist is inaccurate post 224.
* This needs to be reworked into actual documentation.
* Note that when resuming in the parent strategy, if you resume pp, the only triples there are the triples for pp.
* If you resume cp, you only get the triples you expect for cp, and its strategy changes from parent to repository.
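The post-224 transaction behaviour summarized above, as a plain-Ruby sketch. `BufferedResource` and its methods are hypothetical, not ActiveTriples' API, and the model ignores resuming and deletes; it only shows that persisting pushes statements exactly one level up the chain.

```ruby
# Hypothetical model of the post-PR224 behaviour described in these notes;
# names are illustrative, not ActiveTriples' API.
class BufferedResource
  attr_reader :name, :parent, :graph

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @graph = [] # statements held by this resource
  end

  def <<(statement)
    @graph << statement unless @graph.include?(statement)
    self
  end

  # Persisting pushes this resource's statements one level up: into the
  # parent's graph, or into the repository for a top-level resource.
  # Nothing travels further until the parent is persisted too.
  def persist!(repository)
    target = parent ? parent.graph : repository
    @graph.each { |st| target << st unless target.include?(st) }
    self
  end
end

top = BufferedResource.new("pp")
child = BufferedResource.new("cp", top)
grandchild = BufferedResource.new("gp", child)
repository = []

grandchild << ["gp", "ex:title", "Grandchild"]        # only in gp so far
grandchild.persist!(repository)                       # now in cp, not pp
p [child.graph.size, top.graph.size, repository.size] # [1, 0, 0]
child.persist!(repository)
top.persist!(repository)
p repository                                          # [["gp", "ex:title", "Grandchild"]]
```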
- POSTPONED to August -- Potential Changes to Questioning Authority for configuring linked data authorities (Lynette Rayle) (ld4l-labs/questioning_authority - linked_data branch) (Linked Data Sources)
- POSTPONED to August -- triplestore_adapter gem (Josh Gum)