Linked Data Fragments Call with Ruben Verborgh 2015-04-14
Owned by Bill Branan
Jun 16, 2017
Meeting on 2015-04-14
Present: Steven Anderson, Corey Harper, Tom Johnson, Trey Terrell, Ruben Verborgh
Corey introduced the topic: caching issues for remote resources. How to do this?
Trey: Caching isn't hard because triples change a lot; it's hard because servers are often down.
Tom: DCMI Types -- Hitting a cache that sits in Ruby, but for bigger data sets this will be a challenge
- DPLA interested in a reconciliation endpoint, but also for front-end use like Trey's use case
- LD Fragments servers for others to use as a potential DPLA goal
Trey: Thinks he understands the goal of TPF & LDF, but wants to hear it. What is the problem set?
Ruben:
- Availability on the Web.
- Data Dump & do stuff locally -- or --
- query live.
- Many data sets aren't queryable.
- Those that are suffer from downtime.
- LDF is a conceptual framework to say "This API offers that kind of fragments"
- Data dumps have one fragment: the entire dataset
- SPARQL endpoints have many highly specific fragments, many of which are expensive to compute
- Can we find different types of fragments that divide the workload differently? (Triple Pattern Fragments are an example.)
- Moves some intelligence and business logic to client side.
- Clients solve complex queries by splitting them into smaller queries the server can handle, depending on its interface.
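The client-side splitting described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the algorithm of any real TPF client: the in-memory `TRIPLES` list stands in for a remote server, `match_pattern` plays the role of a single triple pattern request, and every name and data value here is hypothetical. Real clients reportedly also use the count estimates carried by fragments to decide which pattern to ask first; that step is left out.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]

# Hypothetical toy dataset standing in for a remote server.
TRIPLES: List[Triple] = [
    ("ex:ruben", "ex:livesIn", "ex:belgium"),
    ("ex:tom", "ex:livesIn", "ex:usa"),
    ("ex:belgium", "rdfs:label", "Belgium"),
    ("ex:usa", "rdfs:label", "United States"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Pattern) -> Iterator[Triple]:
    """The only operation the 'server' offers: match one triple pattern."""
    for triple in TRIPLES:
        if all(is_var(p) or p == t for p, t in zip(pattern, triple)):
            yield triple


def substitute(pattern: Pattern, binding: Binding) -> Pattern:
    """Replace already-bound variables with their values."""
    s, p, o = (binding.get(term, term) for term in pattern)
    return (s, p, o)


def solve(patterns: List[Pattern],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Client-side join: ask for the first pattern, bind, recurse on the rest."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first = substitute(patterns[0], binding)
    for triple in match_pattern(first):
        new_binding = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            # A variable repeated within one pattern must bind to one value.
            if is_var(term) and new_binding.setdefault(term, value) != value:
                consistent = False
                break
        if consistent:
            yield from solve(patterns[1:], new_binding)


# "Where does each person live, and what is that place called?"
query = [
    ("?person", "ex:livesIn", "?place"),
    ("?place", "rdfs:label", "?label"),
]
results = list(solve(query))
```

The server only ever sees single-pattern requests; the join across the two patterns happens entirely in `solve` on the client.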
Q from Trey: Is the high-availability theory that the server does less work, so it is easier to keep up?
Ruben: That's part of it. Most APIs on the Web have far less expensive requests than SPARQL endpoints.
- 1st, the TPF API is low-cost for the server.
- 2nd, Web is optimized for caching.
- overlapping questions can reuse same fragments and be more cacheable
- Publication (http://linkeddatafragments.org/publications/iswc2014.pdf) includes availability data and per-request cost data as evidence that this is cheaper
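The point about overlapping questions reusing fragments can be illustrated with a small sketch. It assumes an in-process cache (`functools.lru_cache`) as a stand-in for the HTTP caches that would do this job on the Web; `fetch_fragment`, the counter, and the patterns are hypothetical and no network request is made.

```python
from functools import lru_cache

REQUESTS_SENT = 0  # counts simulated round trips to the origin server


@lru_cache(maxsize=1024)
def fetch_fragment(subject: str, predicate: str, obj: str) -> tuple:
    """Pretend to fetch one fragment; only cache misses reach the 'server'."""
    global REQUESTS_SENT
    REQUESTS_SENT += 1
    return (subject, predicate, obj)  # placeholder payload


# Two different client queries that share one sub-question (all labels).
# An empty string means "any value" for that position.
query_a = [("", "rdf:type", "ex:City"), ("", "rdfs:label", "")]
query_b = [("", "rdf:type", "ex:Country"), ("", "rdfs:label", "")]

for pattern in query_a + query_b:
    fetch_fragment(*pattern)

cache_hits = fetch_fragment.cache_info().hits
```

Four fragment lookups are issued, but the shared label pattern is served from the cache the second time, so only three reach the simulated server.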
Pushing this to the client side -- Ability to combine from multiple data streams
- Trey's Primary use case is "I have stuff, I need labels"
- Can have an interface that says: ask me for a subject, I'll always give you the label
- Layers of abstraction
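One way to picture the "ask me for a subject, I'll give you the label" layer is as a thin wrapper over a triple pattern lookup. The sketch below is an assumption about how such layering could look, not an existing interface: `make_label_lookup`, `toy_lookup`, the choice of label predicates, and the sample data are all hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
# A triple pattern lookup; None means "any value" for that position.
PatternLookup = Callable[
    [Optional[str], Optional[str], Optional[str]], Iterable[Triple]
]

# Assumed label predicates, tried in this order.
LABEL_PREDICATES = ("skos:prefLabel", "rdfs:label")


def make_label_lookup(lookup: PatternLookup) -> Callable[[str], Optional[str]]:
    """Build a 'subject in, label out' interface on top of a pattern lookup."""
    def label_for(subject: str) -> Optional[str]:
        for predicate in LABEL_PREDICATES:
            for _s, _p, value in lookup(subject, predicate, None):
                return value  # first label found wins
        return None
    return label_for


# Toy backend standing in for a remote fragments server.
DATA: List[Triple] = [
    ("dcmitype:Image", "rdfs:label", "Image"),
    ("dcmitype:Text", "rdfs:label", "Text"),
]


def toy_lookup(s: Optional[str], p: Optional[str],
               o: Optional[str]) -> List[Triple]:
    return [
        t for t in DATA
        if all(q is None or q == v for q, v in zip((s, p, o), t))
    ]


label_for = make_label_lookup(toy_lookup)
```

Callers that only need labels use `label_for("dcmitype:Image")` and never see triple patterns; the backend behind `lookup` can be swapped without touching them.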
Reconciliation:
- Now experimenting with full text search
- Example: http://data-test.linkeddatafragments.org/dbpedia2014-es?subject=&predicate=&object=*belgium*
- Corey: Question about ranking, probabilistic matching.
- Ruben: These examples have some ranking, since they come from Elasticsearch
- Could have interfaces that explicitly support scored results
- Corey: Even support "just give me your top match" interfaces
- Some LD Frags might take responsibility for the ranking
- this is powerful, since we don't trust LC server ranking
- could support different ranking methods
- Reliably combine different data sources
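As a rough sketch of the reconciliation ideas above, the code below builds a search URL in the same shape as the example in these notes and then re-ranks candidate labels on the client. It rests on several assumptions: the `subject`/`predicate`/`object` parameters and the `*text*` wildcard are copied from the example URL only, the scoring uses Python's `difflib` as an arbitrary stand-in for whatever ranking method a client might prefer, and `search_url`, `rank`, and `top_match` are hypothetical names. No request is sent.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def search_url(base: str, text: str) -> str:
    """Build a wildcard object search URL shaped like the example above."""
    query = {"subject": "", "predicate": "", "object": f"*{text}*"}
    # safe="*" keeps the wildcard characters unescaped, as in the example.
    return f"{base}?{urlencode(query, safe='*')}"


def rank(query: str, labels: List[str]) -> List[Tuple[float, str]]:
    """Score each candidate label against the query, best first."""
    scored = [
        (SequenceMatcher(None, query.lower(), label.lower()).ratio(), label)
        for label in labels
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def top_match(query: str, labels: List[str]) -> Optional[str]:
    """The 'just give me your top match' variant."""
    ranked = rank(query, labels)
    return ranked[0][1] if ranked else None


url = search_url(
    "http://data-test.linkeddatafragments.org/dbpedia2014-es", "belgium"
)
# Made-up candidate labels, as if returned by one or more servers.
candidates = ["Kingdom of Belgium", "Belgian Congo", "Belgium"]
best = top_match("belgium", candidates)
```

Because scoring happens in the client, candidates from several sources can be pooled and ranked with one method, independent of how each server orders its own results.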
Reusability:
- Support for multiple interfaces, which are composed of interface features
- Allows us to keep a lot of this functionality out of Hydra, have a nice clear interface, separation of concerns goodness
- Figure out which interfaces are useful to whom
- This allows for reusable interfaces
Next steps:
- Hydra folks should think about a Ruby implementation.
- Spec exists.
- Implementations in JavaScript, Java, Perl
- Tom's interested in setting up a geospatial frags server & integrating with Two Fishes
- Reverse geocoding