Linked Data Fragments Call with Ruben Verborgh 2015-04-14

Meeting on 2015-04-14

Present: Steven Anderson, Corey Harper, Tom Johnson, Trey Terrell, Ruben Verborgh

Corey introduced the topic: caching issues for remote resources. How should this be done?

Trey: Caching isn't hard because triples change a lot; it's hard because servers are often down.

Tom: DCMI Types -- currently hitting a cache that sits in Ruby, but for bigger data sets this will be a challenge
  • DPLA interested in a reconciliation endpoint, but also for front-end use like Trey's use case
  • LD Fragments servers for others to use as a potential DPLA goal

Trey: Thinks he understands the goal of TPF & LDF, but wants to hear it. What is the problem set?

Ruben:
  • Availability on the Web. 
  • Data Dump & do stuff locally -- or --
  • query live.
  • Many data sets aren't queryable.
  • Those that are suffer from downtime
  • LDF is a conceptual framework to say "This API offers that kind of fragments"
  • Data dumps have one fragment: the entire dataset
  • SPARQL endpoints have many highly specific fragments, many of which are expensive to compute
  • Can we find different types of fragments that divide the workload differently? (Triple Pattern Fragments are an example.)
  • Moves some intelligence and business logic to client side.
  • Clients solve complex queries by splitting them into smaller queries the server can handle, depending on its interface.
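
A minimal sketch of that client-side splitting, in Python. Everything here is hypothetical illustration, not the TPF client discussed on the call: the toy `DATA` triples, the `ex:` identifiers, and the `fetch_fragment` and `solve` names are made up, and the "server" is an in-memory list rather than an HTTP endpoint. The server side only answers single triple patterns; the client answers a two-pattern query by requesting one pattern at a time and filling in bindings.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]

# Toy data standing in for a remote dataset (made-up example triples).
DATA: List[Triple] = [
    ("ex:ruben", "ex:livesIn", "ex:ghent"),
    ("ex:ghent", "ex:locatedIn", "ex:belgium"),
    ("ex:tom", "ex:livesIn", "ex:portland"),
    ("ex:portland", "ex:locatedIn", "ex:usa"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fetch_fragment(s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]:
    """Server side: answer one triple pattern; None acts as a wildcard."""
    return [
        t for t in DATA
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def solve(patterns: List[Pattern], binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Client side: evaluate the patterns one by one, carrying bindings forward."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    # Substitute values that earlier patterns already bound.
    bound = tuple(binding.get(t, t) if is_var(t) else t for t in first)
    request = [None if is_var(t) else t for t in bound]
    for triple in fetch_fragment(*request):
        new_binding = dict(binding)
        consistent = True
        for term, value in zip(bound, triple):
            if is_var(term):
                if new_binding.get(term, value) != value:
                    consistent = False
                    break
                new_binding[term] = value
        if consistent:
            yield from solve(rest, new_binding)


# "Who lives in a city located in Belgium?" as two triple patterns.
query = [
    ("?person", "ex:livesIn", "?city"),
    ("?city", "ex:locatedIn", "ex:belgium"),
]
results = list(solve(query))
```

The join happens in `solve`, on the client; the server only ever sees simple pattern lookups.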
     
Q from Trey: Is the high-availability theory that the server does less work, so it is easier to keep up?
 
Ruben: That's part of it. Most APIs on the Web have far less expensive requests than SPARQL endpoints.
  • 1st, the TPF API is low-cost for the server.
  • 2nd, Web is optimized for caching.
  • overlapping questions can reuse same fragments and be more cacheable
  • Publication (http://linkeddatafragments.org/publications/iswc2014.pdf) includes evidence, from availability data & per-request cost data, that this is cheaper
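
A rough illustration of the caching point, in Python. This is a sketch under stated assumptions: `functools.lru_cache` stands in for an HTTP cache, the toy triples and the `origin_fetch`, `cached_fetch`, and `run_two_queries` names are hypothetical, and no real server is involved. Two different queries that share one triple pattern cause only one origin request for the shared fragment.

```python
from functools import lru_cache
from typing import Optional, Tuple

Triple = Tuple[str, str, str]

# Toy data standing in for the origin server's dataset (made-up triples).
DATA: Tuple[Triple, ...] = (
    ("ex:ghent", "ex:locatedIn", "ex:belgium"),
    ("ex:ruben", "ex:livesIn", "ex:ghent"),
    ("ex:ruben", "ex:worksIn", "ex:ghent"),
)

server_hits = 0  # number of requests that reach the origin


def origin_fetch(s: Optional[str], p: Optional[str], o: Optional[str]) -> Tuple[Triple, ...]:
    """Origin server: answer one triple pattern; None acts as a wildcard."""
    global server_hits
    server_hits += 1
    return tuple(
        t for t in DATA
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


@lru_cache(maxsize=128)
def cached_fetch(s: Optional[str], p: Optional[str], o: Optional[str]) -> Tuple[Triple, ...]:
    """Cache in front of the origin; a repeated pattern never reaches it."""
    return origin_fetch(s, p, o)


def run_two_queries() -> Tuple[int, int]:
    """Run two overlapping queries; return (requests made, origin hits)."""
    global server_hits
    server_hits = 0
    cached_fetch.cache_clear()
    query_a = [(None, "ex:locatedIn", "ex:belgium"), (None, "ex:livesIn", "ex:ghent")]
    query_b = [(None, "ex:locatedIn", "ex:belgium"), (None, "ex:worksIn", "ex:ghent")]
    requests = 0
    for pattern in query_a + query_b:
        cached_fetch(*pattern)
        requests += 1
    return requests, server_hits
```

Here four fragment requests reach the origin only three times, because both queries ask for the same `ex:locatedIn` fragment. A full SPARQL query string would rarely repeat exactly, so it would rarely be served from cache.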
     
Pushing this to the client side -- Ability to combine from multiple data streams
  • Trey's Primary use case is "I have stuff, I need labels"
  • Can have an interface that says: ask me for a subject, I'll always give you the label
  • Layers of abstraction
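
One way to picture such a label interface layered on a triple-pattern lookup, as a Python sketch. The `triple_pattern` and `label_for` names and the in-memory `TRIPLES` list are hypothetical, and falling back to the subject URI when no label is found is an assumption made here so that a label is "always" returned; the notes do not say how a real interface would handle that case.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Toy data standing in for a remote vocabulary such as DCMI Types.
TRIPLES: List[Triple] = [
    ("http://purl.org/dc/dcmitype/Image", RDFS_LABEL, "Image"),
    ("http://purl.org/dc/dcmitype/Text", RDFS_LABEL, "Text"),
]


def triple_pattern(s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]:
    """Lower layer: generic triple-pattern lookup; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def label_for(
    subject: str,
    fetch: Callable[[Optional[str], Optional[str], Optional[str]], List[Triple]] = triple_pattern,
) -> str:
    """Upper layer: given a subject, always return something usable as a label."""
    matches = fetch(subject, RDFS_LABEL, None)
    # Assumption: fall back to the URI itself when no label is known.
    return matches[0][2] if matches else subject
```

The front end only calls `label_for`; how the label is found (a cache, a fragments server, a local store) stays hidden behind the `fetch` argument.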
     
Reconciliation: 
  • Now experimenting with full text search
  • Example: http://data-test.linkeddatafragments.org/dbpedia2014-es?subject=&predicate=&object=*belgium*
  • Corey: Question about ranking, probabilistic matching.
  • Ruben: These examples have some ranking, since the results come from Elasticsearch
  • Could have interfaces that explicitly support scoring
  • Corey: Even support "just give me your top match" interfaces
  • Some LD Frags might take responsibility for the ranking
  • this is powerful, since we don't trust LC server ranking
  • could support different ranking methods
  • Reliably combine different data sources 
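
For reference, the example request above can be assembled from its `subject`, `predicate`, and `object` parameters. The Python sketch below only builds the URL and sends nothing; `fragment_url` is a hypothetical helper name, and the `*belgium*` wildcard is copied from the example as-is, with no assumption that other fragments servers accept it.

```python
from urllib.parse import urlencode

# Base URL copied from the example in the notes.
BASE = "http://data-test.linkeddatafragments.org/dbpedia2014-es"


def fragment_url(base: str, subject: str = "", predicate: str = "", obj: str = "") -> str:
    """Build a triple-pattern request URL; empty strings leave a position unbound."""
    params = {"subject": subject, "predicate": predicate, "object": obj}
    # Keep "*" unescaped so the result matches the form used in the notes.
    return base + "?" + urlencode(params, safe="*")


url = fragment_url(BASE, obj="*belgium*")
```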
     
Reusability:
  • Support for multiple interfaces, which are composed of interface features
  • Allows us to keep a lot of this functionality out of Hydra, have a nice clear interface, separation of concerns goodness
  • Figure out which interfaces are useful to whom
  • This allows for reusable interfaces
     
Next steps:
  • Hydra folks should think about a Ruby implementation.
  • Spec exists.
  • Implementations in JavaScript, Java, Perl
  • Tom's interested in setting up a geospatial frags server & integrating with Two Fishes
  • Reverse geocoding