2016-07-29 BlazeGraph gem, ActiveTriples Parenting Strategy, QA LOD

Time: 9:00am PDT / Noon EDT

WebEx Info: Join WebEx meeting - Meeting # 642 228 154, Meeting password: HTig0729  (Hotel-Tango-igloo-golf-zero-seven-two-nine  I'm not sure you need the password.)

Audio Connection:  Computer, or 1-855-244-8681 Call-in toll-free number (US/Canada), or 1-650-479-3207 Call-in toll number (US/Canada)

Moderator: E. Lynette Rayle (Cornell)

Notetaker:  Corey Harper (NYU)

Attendees:  Lynette Rayle, Corey Harper, Anna Headley, tamsin woo


Agenda:

  1. Next Call
    1. date/time: 2016-08-XX
    2. Moderator: 
    3. Notetaker: 
  2. Call for additional agenda items
  3. Deeper dive into BlazeGraph gem (Tom Johnson) (https://github.com/ruby-rdf/rdf-blazegraph)
    Ruby RDF -- Ongoing gap: a high-quality RDF repository backend that connects to a remote, persistent repo is hard to find

    1. RDF Blazegraph tries to fix that: https://github.com/ruby-rdf/rdf-blazegraph

    2. Limited success

    3. RDF Repository relies on the standard Ruby Enumerable interface

      1. Hard to implement -- Enumerable runs over _every_ statement -- maybe millions

      2. Doing it over a remote repo requires streaming statements:

      3. https://github.com/ruby-rdf/rdf-blazegraph/blob/develop/lib/rdf/blazegraph/repository.rb#L27

      4. "each" in this example doesn't scale. 
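
A minimal sketch of why an Enumerable-based "each" is expensive over a remote repo. This is plain Ruby, not the rdf-blazegraph code: FakeRemoteStore, StreamingRepository, fetch_page and remote_count are hypothetical names, and the store is an in-memory array standing in for a remote endpoint.

```ruby
# Hypothetical stand-in for a remote triplestore; not part of rdf-blazegraph.
class FakeRemoteStore
  attr_reader :requests

  def initialize(statements, page_size: 2)
    @statements = statements
    @page_size = page_size
    @requests = 0
  end

  # Returns one page of statements, as a remote endpoint might.
  def fetch_page(offset)
    @requests += 1
    @statements[offset, @page_size] || []
  end

  # A server-side count answers in a single round trip.
  def remote_count
    @requests += 1
    @statements.size
  end
end

# Hypothetical repository that gets all Enumerable methods from #each.
class StreamingRepository
  include Enumerable

  def initialize(store)
    @store = store
  end

  # Enumerable only needs #each, but #each must yield every statement,
  # so it has to page through the whole remote store.
  def each
    return enum_for(:each) unless block_given?

    offset = 0
    loop do
      page = @store.fetch_page(offset)
      break if page.empty?

      page.each { |statement| yield statement }
      offset += page.size
    end
  end
end

store = FakeRemoteStore.new([
  %w[s1 p o1], %w[s1 p o2], %w[s2 p o3], %w[s2 p o4], %w[s3 p o5]
])
repo = StreamingRepository.new(store)

enumerated_count = repo.count      # walks every page via #each
requests_for_each = store.requests # requests spent on that walk
server_count = store.remote_count  # same answer from one request
```

The number of requests made through #each grows with the number of statements, while a query pushed to the server stays at one round trip. That is the scaling gap the notes describe.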

    4. No way to talk about blank nodes efficiently

      1. We have this problem in _every_ triplestore we've tried

      2. If you write bnodes and read them back, the scope is different, so the ids don't carry over from write to read

      3. You end up with new bnodes every time

      4. Therefore, you can't edit blank nodes.

      5. This is a Blazegraph JIRA issue: https://jira.blazegraph.com/browse/BLZG-1434

      6. Active convo -- even in last 24 hours

      7. RDF Repos try to get around this with SPARQL, but it doesn't scale well

      8. LDP refers to "pathological graphs", where you can't refer to bnodes unambiguously
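
A toy illustration of the bnode scoping problem, assuming a store that relabels blank nodes on every read. ToyStore and its methods are hypothetical and do not correspond to any real triplestore API; statements are plain arrays of strings.

```ruby
# Hypothetical toy store. It mimics one behaviour from the notes:
# blank node labels are only meaningful within a single request.
class ToyStore
  def initialize
    @statements = []
    @reads = 0
  end

  def insert(statements)
    @statements.concat(statements)
  end

  # Every read hands back fresh labels for blank nodes (terms starting "_:").
  # Labels are consistent within one read, but differ between reads.
  def read
    @reads += 1
    mapping = Hash.new { |h, label| h[label] = "_:read#{@reads}_#{h.size}" }
    @statements.map do |statement|
      statement.map { |term| term.start_with?('_:') ? mapping[term] : term }
    end
  end

  # Deletes by exact match and returns how many statements were removed.
  def delete(statement)
    before = @statements.size
    @statements.delete(statement)
    before - @statements.size
  end
end

store = ToyStore.new
store.insert([
  ['<http://example.org/book>', '<http://example.org/author>', '_:b0'],
  ['_:b0', '<http://example.org/name>', '"Ada"']
])

first_read = store.read
second_read = store.read
first_label = first_read[0][2]
second_label = second_read[0][2]

# Trying to edit using a statement exactly as it was read back:
deleted = store.delete(first_read[1])
```

Because the label seen by the client never matches what the store holds, the delete removes nothing, and a follow-up insert would create a new bnode rather than update the old one.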

    5. Repository class & rest client are the two ways to work with this

      1. Repository interface is the standard repository interface.

      2. Drop-in replacement for the in-memory repository, but with performance caveats

    6. Q: Tested with rdf2.0?
      A: WIP branch that is up-to-date and maintained with rdf2.0

  4. Understanding ActiveTriples Parenting Strategy (Lynette Rayle) (https://gist.github.com/elrayle/11898117572445a15c4a)
    1. Repository Strategy -- things just go in the repo as usual

    2. Parenting strategy -- "This thing has this as a parent."

      1. Can be nested, and it will keep going up the chain until it gets to a top-level resource that has the repository strategy

      2. When you save a thing that goes up to a particular parent, it saves everything in its whole parent chain...

    3. Talking through the examples at: https://gist.github.com/elrayle/11898117572445a15c4a#examples

      1. DummyResource is topmost, and has a child and a grandchild

      2. First example sets each to have repo strategy.
        * Manually set each child on its parent
        * What you get before and after persisting on various objects
        * Question for Tom: why do you get cr:type on pr's triples? Is it because it's cached at the Ruby object level?
        * When "resuming" (reading back into new ruby objects)
        * And when destroying. Note that destroying a child doesn't remove triples that refer to it from parent
        * This is "by design", but merits further discussion

      3. Now we do the same thing with parent strategy
        * We pass the parent resource when creating objects
        * Now we have methods that can trace through the various ancestors
        * Still setting child properties
        * Now dumping parent gives every statement on the ancestor chain
        * Some questions about going up and down the hierarchy.
        * Tom: This is going to change with PR224
        * In 224, the parent strategy object approach changes:
        * Now: it has its own graph that it tries to persist up
        * Post 224: Transaction strategy.
        * Persist on grandkid: post 224, a transaction buffered in gp will execute into cp
        * So post 224, add a statement to gp, and it's only in gp
        * Persist gp, and it pushes it into cp, but no further.
        * So basically the parent strategy example in the gist is inaccurate post 224.
        * This needs to be reworked into actual documentation.
        * Note that when resuming in parent strategy, if you resume pp, the only triples there are the triples for pp.
        * Resume cp, and you only get the triples you expect for cp. And its strategy changes from parent to repo.
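
A rough toy model of the post-224 behaviour as described on the call: statements stay local until persisted, and persisting pushes them one level up, no further. ToyResource is hypothetical and is not the ActiveTriples API; the actual PR may behave differently in detail.

```ruby
# Hypothetical toy model; none of this is ActiveTriples API.
class ToyResource
  attr_reader :name, :parent, :statements

  # A resource with a parent uses the "parent strategy"; the topmost
  # resource is given a repository (here just an array) instead.
  def initialize(name, parent: nil, repository: nil)
    @name = name
    @parent = parent
    @repository = repository
    @statements = []
  end

  def add(statement)
    @statements << statement
  end

  # Walks the ancestor chain up to the topmost resource.
  def ancestors
    parent ? [parent] + parent.ancestors : []
  end

  # Pushes local statements exactly one level up: into the parent when
  # there is one, otherwise into the repository.
  def persist!
    target = parent ? parent.statements : @repository
    @statements.each { |s| target << s unless target.include?(s) }
  end
end

repository = []
pp_resource = ToyResource.new('pp', repository: repository)
cp_resource = ToyResource.new('cp', parent: pp_resource)
gp_resource = ToyResource.new('gp', parent: cp_resource)

statement = %w[gp title Grandchild]
gp_resource.add(statement)
in_cp_before = cp_resource.statements.include?(statement)

gp_resource.persist!
in_cp_after = cp_resource.statements.include?(statement)
in_pp_after = pp_resource.statements.include?(statement)
ancestor_names = gp_resource.ancestors.map(&:name)
```

In this model the statement reaches cp only after gp is persisted, and it never reaches pp or the repository unless cp and then pp are persisted in turn.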

  5. POSTPONED to August – Potential Changes to Questioning Authority for configuring linked data authorities (Lynette Rayle) (ld4l-labs/questioning_authority - linked_data branch) (Linked Data Sources)
  6. POSTPONED to August – triplestore_adapter gem (Josh Gum)