Applied Linked Data Call 2015-10-15

Attendees:


Linked Data Fragments Update:

  • No progress on ActiveTriples, so integration with it is still on hold.
  • Steven will have work towards a working "Repository" interface (from rdf.rb), as an alternative to Marmotta, ready by a Friday standup.
  • Corey has documentation on his TODO list and will work on that as time allows.

Side-car indexer discussion:

  • Atomic Updates:
    • When stored fields were enabled for one institution, some of their OCR full text was 700 MB.
      • If the full text is stored in Solr, queries return all 700 MB of it, and Solr has no way to exclude a specific field from the response.
      • Listing only the fields one wants is possible, but it makes the code more complicated to write.
    • Possible solution: Request the full document from Solr, append the update, then resubmit.
      • This can't work because the data isn't stored in Solr to begin with. It is also roughly what an atomic update already is: Solr performs these steps internally when all fields are stored.
      • Even if the full text can be stored elsewhere, we don't want to push 700 MB over HTTP back to Solr.
    • Another possible solution: Turn on field highlighting for the full text. The response may then include only the matching part rather than the full field.
    • Another possible solution: Solr child objects?
      • May not work, since one can't really query for the main object that way?
    • Another option: Can wildcards be given to the field list selector to handle this issue?
    • Another possibility: Elasticsearch instead of Solr?
  • Reason for Atomic Updates: we don't want to have to query Fedora (or any persistence layer).
    • One possibility for reducing the number of persistence layer calls is to perform the enrichments in the Fedora save method.
      • But that slows down the save method.
      • And the entire document still needs to be updated when an external source changes.
  • Some preliminary work towards a side-car indexer was done as part of Trey's Hydra Connect 2015 talk.
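
For reference, the atomic update and field-limiting options discussed above can be sketched roughly as follows. This is a minimal illustration, not code from the call: it only builds the request payloads and sends nothing over the network. The field names (creator_label_ssim, ocr_text_tsim), the document id, and the helper function names are hypothetical. The "set" modifier, glob patterns in "fl", and the "hl" parameters are standard Solr features, but whether they avoid returning the large OCR field in practice would need testing against a real index.

```python
import json
from urllib.parse import urlencode


def atomic_update_doc(doc_id, changes):
    """Build one Solr atomic-update document.

    Each changed field is wrapped in a {"set": value} modifier, so Solr
    replaces only that field and rebuilds the rest from its stored fields.
    """
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return doc


def select_params(query, fields, highlight_field=None):
    """Build query parameters that limit what a search response returns.

    `fields` may contain glob patterns such as "title_*". If
    `highlight_field` is given, highlighting is requested so that only
    matching snippets of that field come back.
    """
    params = {"q": query, "fl": ",".join(fields), "wt": "json"}
    if highlight_field:
        params.update({
            "hl": "true",
            "hl.fl": highlight_field,
            "hl.snippets": 3,
            "hl.fragsize": 200,
        })
    return params


# Hypothetical enrichment: update one label field without resending the OCR text.
update_body = json.dumps(
    [atomic_update_doc("obj-1", {"creator_label_ssim": ["Example Person"]})]
)

# Hypothetical search: return id and title fields only, plus OCR snippets.
params = select_params("ocr_text_tsim:whale", ["id", "title_*"], "ocr_text_tsim")
query_string = urlencode(params)

print(update_body)
print(query_string)
```

The update body would be POSTed to the core's /update handler and the query string appended to /select. Note that atomic updates still depend on the other fields being stored, which is the constraint raised in the discussion.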

 


Linked Data Fragments Standup:

  • Will be held on Friday, October 23rd at 11:00 AM PST / 2:00 PM EST on the same Google Hangouts link.

Next Official Meeting:

  • Next meeting will be October 29th at 9:00 AM PST / Noon EST.