Practical Cross-Dataset Queries on the Web of Data

Tuesday, April 17, 2012 - 09:30

The web is increasingly developing into a platform for data exchange, as shown by the rise of web APIs, HTML's Microdata (backed by Google, Microsoft and Yahoo!), Facebook's Open Graph Protocol, and the Linked Open Data Cloud. All these sources of web data have one thing in common: they either already use RDF or can be converted to the RDF data model with off-the-shelf tools.

The SPARQL query language is W3C's recommended standard for querying RDF. Recently updated to a much-expanded version 1.1, SPARQL provides many features that are geared towards queries across several sources of data. This makes RDF+SPARQL a powerful “lingua franca” for web data, allowing data mashups and ad-hoc queries across multiple heterogeneous data sources at a higher level of abstraction than the ubiquitous JavaScript mashups. The topic of the tutorial is how to put this into practice.

The tutorial will start with the basics of RDF and SPARQL, and provide recipes for accessing data sources such as JSON, XML, CSV and databases as RDF. It will cover various aspects of practical cross-dataset SPARQL, including: assembling ad-hoc RDF datasets; Basic Federated Query; using CONSTRUCT and UPDATE for vocabulary mappings; using owl:sameAs links to map between identifiers in different datasets (and how to generate these links with linking tools). Also covered will be recipes for visualizing SPARQL query results with JavaScript.
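
To make the owl:sameAs idea concrete, here is a minimal sketch in plain Python rather than in any of the tutorial's tools. It assumes triples are represented as (subject, predicate, object) tuples; the `a.example` and `b.example` URIs and the `canonicalize` helper are invented for illustration. Identifiers connected by owl:sameAs are merged with a small union-find, and all other triples are rewritten to use one representative URI per entity.

```python
# Triples are (subject, predicate, object) tuples. All example URIs are made up.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def canonicalize(triples):
    """Merge identifiers linked by owl:sameAs and rewrite the remaining triples."""
    parent = {}  # union-find forest over identifiers that occur in sameAs links

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Pass 1: union the two sides of every owl:sameAs link.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # The lexicographically smaller URI becomes the representative.
                keep, drop = sorted((root_s, root_o))
                parent[drop] = keep

    def resolve(term):
        # Terms that never occur in a sameAs link (e.g. literals) stay unchanged.
        return find(term) if term in parent else term

    # Pass 2: rewrite all non-sameAs triples to use the representatives.
    return {(resolve(s), p, resolve(o)) for s, p, o in triples if p != OWL_SAME_AS}


dataset_a = [("http://a.example/berlin", "http://a.example/population", "3500000")]
dataset_b = [("http://b.example/Berlin", "http://b.example/country", "http://b.example/Germany")]
links = [("http://a.example/berlin", OWL_SAME_AS, "http://b.example/Berlin")]

merged = canonicalize(dataset_a + dataset_b + links)
for triple in sorted(merged):
    print(triple)
```

After merging, both facts about Berlin hang off a single subject, so one query pattern can reach data that originated in two datasets. A triple store would typically handle this through reasoning or query rewriting; the sketch only shows the underlying idea.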

A third of the tutorial's time will be used for hands-on sessions where participants work through exercises on their own laptops, using online tools and services. Participants may instead use locally installed versions of these tools, but this is optional.


Session A – Basics (2h presentation + 1h hands-on)

  1. Motivation – We explain why cross-dataset querying is important and introduce scenarios that will be used throughout the tutorial. Richard Cyganiak (Slides)
  2. Linked Data basics – We introduce the RDF data model and the Linked Data principles (incl. the notions of dereferenceability and interlinking) and show how to access non-RDF data. Anja Jentzsch (Slides)
  3. Query basics – We introduce the SPARQL query language, the basics of triple store setup, as well as handling of SPARQL endpoints. Knud Hinnerk Möller (Slides)
  4. Federated queries with SPARQL – We show how to do basic SPARQL federated query across datasets. Knud Hinnerk Möller (Slides)
  5. Hands-on session I – Writing SPARQL queries; using ad-hoc datasets, named graphs, and Basic Federated Query. (Queries and exercises)
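
As a rough illustration of what a Basic Federated Query looks like, the following Python sketch builds a SPARQL 1.1 query with a SERVICE clause and encodes it as a SPARQL Protocol GET request. Both endpoint URLs and the helper names `federated_query` and `request_url` are hypothetical, and the FOAF pattern is only a placeholder. Nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoints: replace with real SPARQL endpoints before use.
LOCAL_ENDPOINT = "http://localhost:3030/ds/sparql"
REMOTE_ENDPOINT = "http://remote.example.org/sparql"


def federated_query(remote_endpoint):
    """Build a query whose inner pattern is evaluated at a remote endpoint."""
    return f"""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name WHERE {{
  ?person a foaf:Person .
  SERVICE <{remote_endpoint}> {{
    ?person foaf:name ?name .
  }}
}}
""".strip()


def request_url(endpoint, query):
    """Encode the query as a SPARQL Protocol GET request (?query=...)."""
    return endpoint + "?" + urlencode({"query": query})


query = federated_query(REMOTE_ENDPOINT)
url = request_url(LOCAL_ENDPOINT, query)

# Round-trip check: decoding the URL yields the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(query)
print(url)
```

The outer pattern is matched against the endpoint that receives the request, while the SERVICE block is delegated to the remote endpoint and joined on `?person`.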

Session B – Mastering the real world (2h presentation + 1h hands-on)

  1. Schema mapping – Using SPARQL’s CONSTRUCT features, rules, and vocabulary mapping frameworks, we show how to map classes and properties across datasets. Andreas Schultz (Slides)
  2. Instance matching – How to connect the same real-world entities in different datasets with Silk link specifications. Robert Isele (Slides)
  3. Finding datasets – Exploiting search engines and metadata directories for RDF, such as the LATC Dataset Inventory and Sindice, to find relevant datasets. Anja Jentzsch (Slides)
  4. Displaying query results – Using one of the scenarios introduced in the first session, we show how to use query results in a simple application that visualises data from different datasets. Pablo Mendes (Slides)
  5. Hands-on session II – Participants apply instance-level and schema-level integration and produce a mini-app with JavaScript that visualises cross-dataset results. (Queries and exercises)

Course Material

Download (zip)