Linked Data Fragments Call with Ruben Verborgh 2015-04-14

Meeting on 2015-04-14



Present: Steven Anderson, Corey Harper, Tom Johnson, Trey Terrell, Ruben Verborgh



Corey introduced the topic: caching issues for remote resources. How do we do this?



Trey: Caching isn't hard because triples change a lot; it's hard because servers are often down.



Tom: DCMI Types -- hitting a cache that sits in Ruby works, but for bigger data sets and larger sizes this will be a challenge

  • DPLA is interested in a reconciliation endpoint, but also in front-end uses like Trey's use case

  • Hosting LD Fragments servers for others to use is a potential DPLA goal


Trey: Thinks he understands the goal of TPF & LDF, but wants to hear it directly. What is the problem set?



Ruben:

  • Availability on the Web. 

  • Two options today: get a data dump & do stuff locally, or query live.

  • Many data sets aren't queryable.

  • Those that are suffer from downtime

  • LDF is a conceptual framework to say "This API offers that kind of fragments"

  • Data dumps have one fragment: the entire dataset

  • SPARQL endpoints have many highly specific fragments, many of which are expensive to compute

  • Can we find different types of fragments that divide the workload differently? (Triple Pattern Fragments are an example.)

  • Moves some intelligence and business logic to client side.

  • Clients solve complex queries by splitting them into smaller queries the server can handle, depending on its interface.
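
A minimal sketch of what a single triple-pattern request could look like from Ruby. The endpoint URL and the fetch_fragment helper are hypothetical; a real TPF client discovers the query form and its parameter names from the server's hypermedia controls rather than hard-coding subject/predicate/object as done here.

```ruby
require 'net/http'
require 'uri'

# Hypothetical fragments endpoint; a real client reads the query form
# (and its parameter names) from the server's hypermedia controls.
ENDPOINT = URI('http://example.org/fragments/dataset')

# Ask the server for one triple pattern; empty values act as wildcards.
def fetch_fragment(subject: '', predicate: '', object: '')
  uri = ENDPOINT.dup
  uri.query = URI.encode_www_form(subject: subject, predicate: predicate, object: object)
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'text/turtle'
  Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
end

# A client answers a bigger query by combining such single-pattern requests,
# e.g. all subjects of one type first, then a per-subject label lookup.
puts fetch_fragment(predicate: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type').body
```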
     

Q from Trey: The high-availability theory is that the server does less work, so it's easier to keep up?
 

Ruben: That's part of it. Most APIs on the Web have far less expensive requests than SPARQL endpoints.

  • 1st, the TPF API is low-cost for the server.

  • 2nd, Web is optimized for caching.

  • Overlapping queries can reuse the same fragments and are more cacheable (see the cache sketch after this list)

  • The publication (http://linkeddatafragments.org/publications/iswc2014.pdf) includes availability and per-request cost data showing this approach is cheaper
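
To illustrate the caching point above: fragment requests are plain HTTP GETs, so ordinary web caches (browser, proxy, CDN) can serve repeats; the same idea in miniature on the client side is a memo keyed by the fragment URL. A rough sketch, with hypothetical names:

```ruby
require 'net/http'
require 'uri'

# Naive in-memory cache keyed by fragment URL: when two queries decompose
# into overlapping triple patterns, the repeated fragment is served without
# a second request. Standard HTTP caches can do the same, since these are
# plain, cacheable GETs.
FRAGMENT_CACHE = {}

def cached_fragment(url)
  FRAGMENT_CACHE[url] ||= Net::HTTP.get(URI(url))
end
```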
     

Pushing this to the client side -- the ability to combine data from multiple data streams

  • Trey's Primary use case is "I have stuff, I need labels"

  • Can have an interface that says: ask me for a subject and I'll always give you the label (see the label-lookup sketch after this list)

  • Layers of abstraction
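
A sketch of the "I have stuff, I need labels" case, assuming a fragments endpoint that answers the single pattern <subject> rdfs:label ?label. The endpoint URL and the labels_for helper are made up, and RDF.rb (the rdf gem plus a reader such as rdf-turtle) is assumed for parsing:

```ruby
require 'rdf'   # RDF.rb; a reader gem such as rdf-turtle is also needed
require 'cgi'

# Hypothetical fragments endpoint that serves label triples.
LABEL_ENDPOINT = 'http://example.org/fragments/labels'
RDFS_LABEL     = 'http://www.w3.org/2000/01/rdf-schema#label'

# Fetch the fragment for <subject> rdfs:label ?label and return the labels
# found on its first page.
def labels_for(subject_uri)
  url = "#{LABEL_ENDPOINT}?subject=#{CGI.escape(subject_uri)}" \
        "&predicate=#{CGI.escape(RDFS_LABEL)}"
  graph = RDF::Graph.load(url)   # picks a reader via content negotiation
  graph.query([RDF::URI.new(subject_uri), RDF::RDFS.label, nil]).map(&:object)
end

puts labels_for('http://dbpedia.org/resource/Belgium')
```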
     

Reconciliation: 

  • Now experimenting with full-text search (see the sketch after this list)

  • Example: http://data-test.linkeddatafragments.org/dbpedia2014-es?subject=&predicate=&object=*belgium*

  • Corey: Question about ranking, probabilistic matching.

  • Ruben: These examples have some ranking, since they come from Elasticsearch

  • Could have interfaces that support explicit scoring

  • Corey: Even support "just give me your top match" interfaces

  • Some LD Frags might take responsibility for the ranking

  • This is powerful, since we don't trust the LC server's ranking

  • could support different ranking methods

  • Reliably combine different data sources 
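
A rough sketch of hitting the experimental full-text interface from the example URL above; the *term* wildcard syntax is specific to that test server and may change:

```ruby
require 'net/http'
require 'uri'

# Experimental full-text fragment interface from the example above;
# the *term* wildcard syntax belongs to that test server.
uri = URI('http://data-test.linkeddatafragments.org/dbpedia2014-es')
uri.query = URI.encode_www_form(subject: '', predicate: '', object: '*belgium*')

# Candidate triples; per the notes, ranking comes from the Elasticsearch index.
puts Net::HTTP.get_response(uri).body
```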
     

Reusability:

  • Support for multiple interfaces, which are composed of interface features

  • Allows us to keep a lot of this functionality out of Hydra and have a nice, clear interface: separation-of-concerns goodness

  • Figure out which interfaces are useful to whom

  • This allows for reusable interfaces
     

Next steps:

  • Hydra folks should think about a Ruby implementation.

  • Spec exists.

  • Implementations in JavaScript, Java, Perl

  • Tom's interested in setting up a geospatial fragments server & integrating with Two Fishes

  • Reverse geocoding