2016-01-15 Initial Meeting

Time: 9:00am PST / Noon EST

WebEx Info: Join WebEx meeting - Meeting # 640 236 331, Meeting password: htIG0114 (hotel-tango-Igloo-Golf-zero-one-one-four; I'm not sure you need the password.)
Audio Connection: Computer, or 1-855-244-8681 Call-in toll-free number (US/Canada), or 1-650-479-3207 Call-in toll number (US/Canada)

Moderator: E. Lynette Rayle (Cornell)

Notetaker:  cmharlow

Attendees:

(Please update/correct as needed)

Agenda:

  1. Next Call
    1. date/time: Fri, Feb 26 noon EST/9am PST
    2. Moderator: 
    3. Notetaker: 
  2. Call for additional agenda items
  3. Welcome & Introductions - What are your interests and hopes for the group?
  4. Mission and Goal
  5. Review Challenges for Linked Data
    1. Add to Challenges List
    2. First and Second choice for topic from Challenges List (or new topic)

Meeting Notes:

Helpful Links: 

Notes based off Agenda:

  1. Next Call - discussed at end of meeting/notes.
  2. Call for additional agenda items
    1. Skipped; integrated into the group goals discussion below (topic 4).
  3. Welcome & Introductions, including answers to "What are your interests and hopes for the group?"
    1. Lynette Rayle, Cornell: involved with the LD4L project as well as Sufia and Hydra work at Cornell.
      1. Interested in what we can do to link Hydra to/with other applications, especially work done directly in triple stores outside of Hydra.
      2. How can Hydra applications leverage external triple stores?
    2. Stefano Cossu, Art Institute of Chicago.
      1. Mainly listening; interested to see the direction that emerges from this discussion.
      2. Has been experimenting a lot with triple stores, Hydra, and Fedora; recently trying to get Solr to index triple stores directly.
    3. Aaron Coburn, Amherst College.
      1. Now in the process of migrating from Fedora 3 to Fedora 4. Part of this work is having/maintaining an external triple store as a side car index. Curious to see the direction Hydra goes in supporting this.
    4. Nabeela Jaffer, University of Michigan. 
      1. New to Hydra at her university, and is working on a research data repository. 
      2. Wants to see how a triple store would work with Fedora and how it could be used with research data.
    5. Trey Pendragon, Princeton. 
      1. Developer for ActiveFedora and Hydra stack in general for the Princeton repository. 
      2. Some experience, in a past job, working with triple stores in tandem with the Hydra stack.
      3. Interested in seeing this group's goals, idealistic and otherwise.
    6. Tom Johnson, DPLA. 
      1. DPLA uses some Hydra tooling with a triple store already, so is interested in keeping that work up to date with Hydra community. 
      2. Another interest: he is one of the maintainers of ActiveTriples and other projects within this group's scope, including some RDF LDP libraries and some Blazegraph repository interfaces in Ruby; recently working on RDF.rb 2.0.
    7. Mark Matienzo, DPLA. 
      1. Same interests as Tom’s. 
      2. Also has, in the back of his mind, some possible implementation concerns related to the Hydra in a Box project.
      3. Using ActiveTriples and LDP to manage metadata at DPLA.
    8. Hector Correa, Penn State. 
      1. Curious to see what triple stores offer and what others are doing.
      2. Mostly an observer at this point; curious to see what functions triple stores can offer Hydra, and how to use them.
    9. Adam Wead, Penn State: 
      1. Wants to learn more about triple stores and see how they could improve ActiveFedora.
    10. Jean Colt, Cornell: 
      1. Managing a Hydra instance being moved from Fedora 3 to Fedora 4.
      2. Wants to see where the rest of the community is on this work.
    11. Matt Critchlow, UCSD:
      1. Has managed a variety of triple store implementations over the years.
      2. Current repository is Jena-based.
      3. In the middle of a migration to Fedora 4, so can envision having a side car triple store as well.
      4. Local concern: requests from metadata specialists to run queries or identify updates usually go to developers like himself, but triple stores could in future empower metadataists to do this work directly.
    12. Anna Headley, American Chemical Society:
      1. Interested in authority control and how that relates to linked data and triple stores, especially within an effort to keep metadata consistent. 
      2. Brief mention of pursuing a grant with the Applied Linked Data group that may be of interest to this group in the near future; more details as they become available.
    13. Justin Coyne, DCE:
      1. Curious to see what everyone is interested in doing with this.
    14. Sheila Rabun, Univ of Oregon:
      1. Also curious to see what this group does.
    15. Christina Harlow, Cornell University:
      1. Interested in how metadata workflows might change for metadataists with implementation of these triple stores.
      2. Also interested in the Authority question.
  4. Mission and Goal
    1. Aim: get an idea of what kinds of conversations were happening at HydraConnect around this topic, and choose the focus and overall goals for this group.
    2. In introductions, interest was expressed in a number of areas:
      1. general exploration
      2. learn more about what’s going on
      3. explore where they can go with Fedora 4 in tandem with triple store. 
      4. Also addressing challenges there.
    3. Mention of the wiki page for institutions that are already using triple stores: https://wiki.duraspace.org/display/hydra/Resources+about+Working+with+Triple+Stores
      1. Please add your institution to that page if you are already using triple stores, as the group would like to have this as a resource for others.
  5. Challenges for linked data list - review, update, prioritize the challenges given on this wiki page: https://wiki.duraspace.org/display/hydra/Challenges+for+Linked+Data
    1. Existing Challenges on that wiki page:
      1. One thing mentioned at HydraConnect was reconciliation services. 
        1. Going from things to things. 
        2. Mentions of entity resolution (string to things) and lexicalization (things to strings) also
      2. Caching of external fragments
        1. When you go to get the triples behind a URI, the URI may not resolve because the server is down,
        2. or the response may be too slow to be usable.
        3. So there is a need for a way to cache external triples locally for use, as well as a way to update the local store as the external triples change.
      3. External entity as subject in Fedora issue: 
        1. Can’t currently have a URI that is external to Fedora as a subject in Fedora.
        2. This creates limits: URIs must be pulled into Fedora in order to make some statements about them.
      4. Issues with controlled vocabulary management
      5. Use of SEO (search engine optimization)
        1. Hector Correa: goal of linked data is to have data exposed to search engines:
          1. Better SEO would help search engines pick up library data and make those connections externally with our local data.
        2. tamsin woo: there is probably a broad topic here about linked data publishing:
          1. A lot of the RDF data in Fedora stores does not follow linked data (LD) publishing best practices. Possibly a topic for this group.
      6. SPARQL queries with Fedora
        1. Fedora doesn’t allow you to do SPARQL queries at this time.
      7. Hydra stack over a triplestore?
        1. Can have metadata in a triplestore, but still need an object store too.
      8. Interoperability
        1. No further comments from group.
      9. LDPath
        1. This is like XPath, but for RDF.
        2. Can be used to convert RDF to other formats, with use cases similar to XSLT for XML.
      10. Other?
        1. (led to questions/comments about the group's conversation) Question from scossu: Are we looking at Hydra as a tool for managing linked data or for publishing linked data? For example, do we want to publish information outside of the Hydra domain, and is that in scope for this group?
          1. tamsin woo: Yes, that is exactly the sort of use case he is thinking of. 
          2. scossu: So the question was about this conversation: whether we are including those types of use cases in the current discussion.
    2. Now, voting on these topics. Which to address first? Where is there the most interest?
      1. Hector Correa: curious to know more about triple stores, what people's experiences have been working with them, and what functionality they offer beyond Fedora.
      2. scossu: Another question for this group: what is the role of Solr in this context? Will Solr be working alongside a triplestore? On top of it?
      3. Trey Pendragon: curious what the group's goal is: to solve LD problems in general (like reconciliation services), or to answer what we need to do to use triple stores in the Hydra stack (as a replacement for Fedora or with Fedora)?
        1. Lynette Rayle: looking at how triple stores would be used in a Hydra stack.
        2. matienzo: notes that a lot of the challenges seem to overlap with Samvera Applied Linked Data Interest Group. Teasing that overlap out might help give clarity of scope for this group.
        3. Lynette Rayle: Yes, we can establish that clarity in this meeting. As a group, we need to define scope and goals. This is perhaps a better starting place for our conversation.
      4. Preliminary focus of the group: Discussing triple stores in general versus triple store use in or with a Hydra stack?
        1. Trey Pendragon: the Samvera Applied Linked Data Interest Group is involved in a caching use case solution, but they are at capacity with this work. So some of the other general linked data things on the challenges list wouldn’t come up for a while.
          1. Strategy now is an LDF (Linked Data Fragments) server for getting cached triples.
          2. Work that hasn’t started yet: side car indexer for atomic updates of linked data and Solr.
          3. So if you have URIs as values in Solr, you want to use information cached in the triple store to make things like labels searchable instead of URIs.
        2. justin: there are two distinctions in this group about what we would do with a triple store. All of the caching work is about input into a repository/Fedora. If you talk to folks on the Technical Working Group Charter (check link is to correct group), they’re talking about exporting triples from Fedora to a triple store. A lot of the work so far has been on the input side, with the exception of Lynette and Aaron.
          1. tamsin woo: do we know the use cases for this? Using SPARQL for analytical reports, for example?
          2. scossu: interested in enriching existing information with semantic data, and making that available in Hydra.
          3. Former user (Deleted)
            1. He has a couple of use cases - the inferencing one is big. Triple stores can inference, Fedora doesn't. 
            2. Another use case is the analytical piece: want to be able to run ad hoc queries across the dataset in a way that SPARQL could handle well, but that Solr or Fedora couldn’t.
            3. Also, a question of validation: using a triple store to check the validity of an object.
              1. matienzo: on the validation use case - that's active work being done by the (Fedora) API Extensions interest group.
              2. scossu?: Part of that work may overlap with API Extensions interest group, so this group should make sure to not duplicate effort.
              3. Former user (Deleted): background on the API Extension work:
                1. Fedora can do some things well, some things not at all. 
                2. This group is working with the idea of a repository where you want to expose certain services at certain endpoints.
                3. This could lead to work on anything from validation to transformation to running analytics, etc. This is the group's scope, broadly.
              4. tamsin woo: if anybody is unaware, there is currently a W3C working group making some progress on RDF shapes: https://www.w3.org/2014/data-shapes/wiki/Main_Page
            4. scossu: Solr and SPARQL endpoint should be two distinct ways to retrieve information. 
        3. Lynette Rayle: Back to the original question - should the group focus first on triple stores in general or triple stores with Hydra?
          1. tamsin woo: two big areas that he’d like to see this group active in:
            1. Triple stores in general:
              1. Building community knowledge around which triple stores have which performance properties for these various use cases
              2. Getting Ruby level support for some of them
                1. Ruby RDF core team has something working at a beta level, but nothing production-ready.
              3. Working on triple store RDF repository interfaces, for most of the triple stores reviewed
              4. There are no real active development resources, so the aim is to get a sense here of what folks can use and who might maintain them.
            2. Gap analysis and development work of plugging all of that stuff above back into the Hydra stack
          2. From the Webex chat window:
            1. Hector Correa: general triple stores focus first, then Hydra focus next
            2. Adam Wead: general focus on triple stores first too, but frame within the context of Hydra applications with specific use cases
            3. general agreement
      5. Selecting a specific topic to discuss next time:
        1. Initially, not asking for development commitment yet, but first how to approach this work. 
        2. Meeting once a month to do this? (general agreement with once a month)
        3. Is there a pressing starting point?
          1. Anna Headley: an overview of different triple stores and the tooling around them would be helpful and a good way to frame ongoing work.
          2. Lynette Rayle: that would also be a good starting point for leveling folks' understanding of what kind of code is already available for working with triple stores.
            1. Can make this the topic for next time.
            2. Is anyone in the group willing to look at specific triple stores, or to share knowledge they already have about specific triple stores and working with them from Ruby?
          3. Blazegraph
            1. Hector Correa: interested in Blazegraph. Matthew Critchlow is interested in helping with this.
            2. tamsin woo: wrote a Blazegraph gem (verify link is to correct gem --Christina).
              1. Be aware that it is performant, except if you use blank nodes.
              2. Trey Pendragon: it might not respond the way you expect to quads either.
            3. Lynette Rayle: folks should channel Blazegraph information to Hector Correa, and he can summarize it for next time.
          4. scossu: Jena/Fuseki should be considered as well:
            1. He (Stefano) will do a summary for Jena/Fuseki.
            2. Former user (Deleted) is also willing to work on reviewing Jena/Fuseki, especially with some larger-scale data.
            3. Also looking at the state of Ruby code to work with Fuseki.
            4. Jena/Fuseki "give it a spin" repo: https://github.com/ucsdlib/sparql-workshop-vagrant
          5. tamsin woo: will give information about Marmotta and the state of the Marmotta RDF gem he pulled together.
          6. Virtuoso is another option:
            1. would also need the current status of the gem for working with Virtuoso.
            2. Nobody volunteered.
            3. Will review later; there is enough already for the next meeting.
          7. Former user (Deleted): Rya (Apache project) is another option.
            1. It is in the Apache Incubator now. Runs on top of Accumulo and Hadoop.
            2. Critchlow, Matthew: has an Accumulo instance at his place of work, would be interested in testing.
            3. https://wiki.apache.org/incubator/RyaProposal
        4. Folks looking at these different triple stores are asked to also include setup information, or links to existing setup information. A set of walkthroughs or deep dives would be helpful too.
          1. Adam Wead is happy to be a guinea pig for walkthroughs or testing that folks have.
          2. Four triple stores is a good starting point; after the first round of reports, the group can choose one or two to work on.
  6. Decisions:
    1. Focus on both general triple store issues as well as looking at how we can pull things we learn into the Hydra stack
    2. For next time, will look at four different triple stores, giving an introduction to each and some information on how to get them up and running.
  7. Meeting time: Does Friday at 9:00am PST / Noon EST, repeated every 4 weeks, work?
    1. Hector will be out of town, but says to keep the time as is; he will work around it.
    2. Next meeting: 5 weeks from now, same day/time (Friday, 9:00am PST / Noon EST). Watch for WebEx (call-in) information in the Hydra Tech Google Group.
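The caching challenge from the Challenges for Linked Data review (an external URI may not resolve because its server is down, or may respond too slowly) can be illustrated with a small sketch. This is a hypothetical Python example and does not come from Hydra, ActiveTriples, or any LDF server. `ExternalTripleCache`, the injected `fetch` function, and the `ttl_seconds` freshness window are all assumptions. The cache serves triples while they are fresh and tries to refresh them once they go stale. If the remote fetch fails, it falls back to the stale copy.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# A triple is modeled as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]


class ExternalTripleCache:
    """Hypothetical local cache for triples fetched from external URIs."""

    def __init__(
        self,
        fetch: Callable[[str], List[Triple]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # assumed fetcher: URI -> list of triples
        self._ttl = ttl_seconds  # how long a cached copy counts as fresh
        self._clock = clock      # injectable clock, handy for testing
        self._store: Dict[str, Tuple[float, List[Triple]]] = {}

    def get(self, uri: str) -> Optional[List[Triple]]:
        """Return triples for `uri`, refreshing stale entries when possible."""
        now = self._clock()
        cached = self._store.get(uri)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: no remote call
        try:
            triples = self._fetch(uri)  # missing or stale: try the remote source
        except OSError:
            # Server down or too slow (urllib errors and timeouts are OSError
            # subclasses): serve the stale copy if there is one.
            return cached[1] if cached is not None else None
        self._store[uri] = (now, triples)
        return triples
```

After a failed refresh, the entry keeps its old timestamp, so the next read retries the remote server. A real deployment would more likely keep the cache in a triple store rather than a dict, and might refresh entries in the background instead of on read.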
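LDPath came up in the challenges review as XPath for RDF. Its core idea is following a chain of predicates from a starting resource, and that idea fits in a few lines of Python. This is a hypothetical illustration only. `follow_path` is not part of LDPath or any LDPath implementation, and it ignores LDPath's actual syntax, filters, and transformations. Triples are modeled as plain (subject, predicate, object) string tuples.

```python
from typing import Iterable, List, Tuple

# A triple is modeled as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]


def follow_path(triples: Iterable[Triple], start: str, path: List[str]) -> List[str]:
    """Follow `path` (a list of predicate URIs) from `start`, step by step.

    Each step replaces the current nodes with the objects reached via the
    next predicate; first-seen order is kept and duplicates are dropped.
    """
    graph = list(triples)  # allow any iterable to be scanned more than once
    frontier: List[str] = [start]
    for predicate in path:
        reached = [o for node in frontier for (s, p, o) in graph
                   if s == node and p == predicate]
        frontier = list(dict.fromkeys(reached))  # de-duplicate, keep order
    return frontier
```

For example, following a creator predicate and then `rdfs:label` from a work returns the creators' labels. That kind of flattening is the XSLT-like conversion use case mentioned for LDPath.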
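The side car indexer use case from the discussion keeps URIs as Solr values and uses labels cached in the triple store to make those values searchable. A minimal sketch of that idea follows, under two assumptions: labels are stored as `rdfs:label` triples, and a hypothetical `<field>_label` Solr field exists for each URI-valued field. The `{"set": ...}` modifier is Solr's atomic-update syntax. Nothing here actually sends data to Solr, and the function names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

# A triple is modeled as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_label_index(cached: Iterable[Triple]) -> Dict[str, str]:
    """Map each subject URI to the first rdfs:label found in the cached triples."""
    labels: Dict[str, str] = {}
    for s, p, o in cached:
        if p == RDFS_LABEL:
            labels.setdefault(s, o)  # keep the first label seen for each URI
    return labels


def label_atomic_update(doc_id: str, field: str, uris: List[str],
                        labels: Dict[str, str]) -> Dict[str, object]:
    """Build a Solr atomic-update payload setting a hypothetical `<field>_label`.

    URIs without a cached label are skipped. The URI-valued field itself is
    not included, so an atomic update leaves it unchanged.
    """
    resolved = [labels[u] for u in uris if u in labels]
    return {"id": doc_id, field + "_label": {"set": resolved}}
```

A list of such payloads could be posted as JSON to a Solr update handler. Solr atomic updates generally require the document's other fields to be stored (or have docValues) so they can be preserved.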