Data Extraction & Exploration with SPARQL & the Talis Platform
shared innovation
Agenda
Tutorial Schema
Graph Patterns
Simple SELECT queries
OPTIONAL patterns
UNION queries
Sorting & Limiting
Filtering & Restrictions
DISTINCT
SPARQL Query Forms
Useful Links
Tutorial Schema
Based on NASA spaceflight data. Available in: http://api.talis.com/stores/space
Triple and Graph Patterns
How do we describe the structure of the RDF graph which we're interested in?
#An RDF triple in Turtle syntax
<http://purl.org/net/schemas/space/spacecraft/1957-001B> foaf:name "Sputnik 1".
#A SPARQL triple pattern, with a single variable
<http://purl.org/net/schemas/space/spacecraft/1957-001B> foaf:name ?name.
#All parts of a triple pattern can be variables
?spacecraft foaf:name ?name.
#Matching labels of resources
?subject rdfs:label ?label.
#Combine triple patterns to create a graph pattern
?subject rdfs:label ?label. ?subject rdf:type space:Discipline.
#SPARQL syntax is based on Turtle, which allows abbreviations,
#e.g. predicate-object lists:
?subject rdfs:label ?label; rdf:type space:Discipline.
#Graph patterns allow us to traverse a graph
?spacecraft foaf:name "Sputnik 1".
?launch space:spacecraft ?spacecraft.
?launch space:launched ?launchdate.
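To make the traversal concrete, here is a toy Python model of how a graph pattern is matched. It is an illustration only, not how a SPARQL engine is built. Each triple pattern is matched against every triple, and the resulting variable bindings are joined so that a shared variable such as ?launch must take the same value everywhere. URIs are shortened to relative strings and prefixed names are kept as plain strings; the helper names `match_triple` and `match_bgp` are hypothetical.

```python
from typing import Optional

# Toy model of basic graph pattern matching; NOT a SPARQL engine.
# Terms are plain strings; a term starting with "?" is a variable.
Triple = tuple[str, str, str]

# The Sputnik triples used in this tutorial (URIs shortened for readability).
GRAPH: list[Triple] = [
    ("spacecraft/1957-001B", "foaf:name", "Sputnik 1"),
    ("launch/1957-001", "space:spacecraft", "spacecraft/1957-001B"),
    ("launch/1957-001", "space:launched", "1957-10-04"),
]


def match_triple(pattern: Triple, triple: Triple, binding: dict) -> Optional[dict]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in result and result[term] != value:
                return None  # variable already bound to a different value
            result[term] = value
        elif term != value:
            return None  # constant term does not match
    return result


def match_bgp(patterns: list[Triple], graph: list[Triple]) -> list[dict]:
    """Join the triple patterns: each one filters and extends the bindings so far."""
    solutions: list[dict] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return solutions


# Traverse: name -> spacecraft -> launch -> launch date.
query: list[Triple] = [
    ("?spacecraft", "foaf:name", "Sputnik 1"),
    ("?launch", "space:spacecraft", "?spacecraft"),
    ("?launch", "space:launched", "?launchdate"),
]
print(match_bgp(query, GRAPH))
```

The second pattern only matches because ?spacecraft is already bound by the first one. This shared variable is what lets a graph pattern walk from a name to a launch date.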
Structure of a Query
What does a basic SPARQL query look like?
#Ex. 1
#Associate URIs with prefixes
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
#Example of a SELECT query, retrieving 2 variables
#Selected variables should appear in the graph pattern
SELECT ?subject ?label
WHERE {
  #This is our graph pattern
  ?subject rdfs:label ?label;
           rdf:type space:Discipline.
}
#Ex. 2
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
#Example of a SELECT query, retrieving all variables
SELECT *
WHERE {
  ?subject rdfs:label ?label;
           rdf:type space:Discipline.
}
OPTIONAL bindings
How do we allow for missing or unknown information?
#Ex. 3
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?image
WHERE {
  #This pattern must match
  ?spacecraft foaf:name ?name.
  #Anything in this block doesn't have to be bound
  OPTIONAL {
    ?spacecraft foaf:depiction ?image.
  }
}
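OPTIONAL behaves like a left outer join. Every solution of the required pattern is kept, and it is extended with ?image only where a compatible optional match exists. The toy Python model below shows this with hypothetical data (`ex:craftA` has a depiction, `ex:craftB` does not); the function names are illustrative only.

```python
# Toy model of OPTIONAL as a left outer join over solution mappings.
# Hypothetical data: ex:craftA has a depiction, ex:craftB does not.

# Solutions for the required pattern: ?spacecraft foaf:name ?name
names = [
    {"?spacecraft": "ex:craftA", "?name": "Craft A"},
    {"?spacecraft": "ex:craftB", "?name": "Craft B"},
]
# Solutions for the optional pattern: ?spacecraft foaf:depiction ?image
depictions = [
    {"?spacecraft": "ex:craftA", "?image": "ex:craftA.jpg"},
]


def compatible(a: dict, b: dict) -> bool:
    """Mappings are compatible when every shared variable has the same value."""
    return all(a[var] == b[var] for var in a.keys() & b.keys())


def left_join(required: list[dict], optional: list[dict]) -> list[dict]:
    """Keep every required solution, extended wherever the optional part matches."""
    result = []
    for r in required:
        extended = [{**r, **o} for o in optional if compatible(r, o)]
        # No match: keep the required solution, leaving ?image unbound.
        result.extend(extended or [r])
    return result


for row in left_join(names, depictions):
    # .get() returns None for an unbound variable (an empty cell in the results).
    print(row["?name"], row.get("?image"))
```

Without OPTIONAL, the same two patterns would be joined normally and `Craft B` would disappear from the results.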
UNION queries
How do we allow for alternatives or variations in the graph?
#Ex. 4
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subject ?displayLabel
WHERE {
  { ?subject foaf:name ?displayLabel. }
  UNION
  { ?subject rdfs:label ?displayLabel. }
}
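A UNION evaluates each alternative on its own and concatenates the solutions. Because both branches bind the same variable, ?displayLabel, the results can be read uniformly. The toy Python model below uses hypothetical triples (`ex:craftA`, `ex:physics`) and a hypothetical helper, `pattern_solutions`.

```python
# Toy model of UNION: evaluate each alternative separately, then concatenate.
# Hypothetical triples; each resource uses only one labelling property.
GRAPH = [
    ("ex:craftA", "foaf:name", "Craft A"),
    ("ex:physics", "rdfs:label", "Physics"),
    ("ex:craftA", "rdf:type", "space:Spacecraft"),
]


def pattern_solutions(predicate: str) -> list[dict]:
    """Solutions for the pattern: ?subject <predicate> ?displayLabel."""
    return [
        {"?subject": s, "?displayLabel": o}
        for s, p, o in GRAPH
        if p == predicate
    ]


# { ?subject foaf:name ?displayLabel } UNION { ?subject rdfs:label ?displayLabel }
union = pattern_solutions("foaf:name") + pattern_solutions("rdfs:label")
for row in union:
    print(row["?subject"], row["?displayLabel"])
```

UNION does not merge the branches. A resource that has both a foaf:name and an rdfs:label therefore yields one row per branch.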
Sorting & Restrictions
How do we apply a sort order to the results? How can we restrict the number of results returned?
#Ex. 5
#Select the URI and the mass of all the spacecraft
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?spacecraft ?mass
WHERE {
  ?spacecraft space:mass ?mass.
}
#Ex. 6
#Select the URI and the mass of all the spacecraft,
#heaviest first
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?spacecraft ?mass
WHERE {
  ?spacecraft space:mass ?mass.
}
#Use an ORDER BY clause to apply a sort. Can be ASC or DESC
#Cast to xsd:double so the sort is numeric: mass is not typed in the data
ORDER BY DESC(xsd:double(?mass))
#Ex. 7
#Select the URI and the mass of the 10 heaviest spacecraft
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?spacecraft ?mass
WHERE {
  ?spacecraft space:mass ?mass.
}
#Order by mass, descending (cast for a numeric sort)
ORDER BY DESC(xsd:double(?mass))
#Limit to first ten results
LIMIT 10
#Ex. 8
#Select the URI and the mass of the 11th to 20th
#heaviest spacecraft
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?spacecraft ?mass
WHERE {
  ?spacecraft space:mass ?mass.
}
ORDER BY DESC(xsd:double(?mass))
#Limit to ten results
LIMIT 10
#Apply an offset to get the next page
OFFSET 10
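The solution modifiers are applied in a fixed order: sort first, then skip OFFSET rows, then keep at most LIMIT rows. The toy Python model below applies them to hypothetical masses of 10, 20, ..., 250, stored as strings in the way untyped literals are. It also compares a numeric sort key, which plays the role of a cast such as xsd:double(?mass), with a plain string key. The helper names are illustrative.

```python
# Toy model of ORDER BY DESC(...) with LIMIT and OFFSET.
# Hypothetical masses 10, 20, ..., 250, held as strings like untyped literals.
solutions = [
    {"?spacecraft": f"ex:craft{n}", "?mass": str(10 * n)} for n in range(1, 26)
]


def numeric_mass(row: dict) -> float:
    """Sort key after a cast, in the spirit of xsd:double(?mass)."""
    return float(row["?mass"])


def lexical_mass(row: dict) -> str:
    """Sort key for an uncast plain literal: compared as a string."""
    return row["?mass"]


def page(rows: list[dict], key, limit: int, offset: int = 0) -> list[dict]:
    """ORDER BY DESC(key), then skip `offset` rows and keep at most `limit`."""
    ordered = sorted(rows, key=key, reverse=True)
    return ordered[offset:offset + limit]


def masses(rows: list[dict]) -> list[str]:
    return [row["?mass"] for row in rows]


print(masses(page(solutions, numeric_mass, limit=10)))             # LIMIT 10
print(masses(page(solutions, numeric_mass, limit=10, offset=10)))  # ... OFFSET 10
print(masses(page(solutions, lexical_mass, limit=3)))              # string sort
```

The last line shows why casting matters on a store that orders untyped literals as strings: compared as text, "90" sorts above "250".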
Filtering
How do we restrict results based on aspects of the data rather than the graph, e.g. string matching?
#Sample data for Sputnik launch
<http://purl.org/net/schemas/space/launch/1957-001>
  rdf:type space:Launch;
  #Assign a datatype to the literal, to indicate it is
  #a date
  space:launched "1957-10-04"^^xsd:date;
  space:spacecraft <http://purl.org/net/schemas/space/spacecraft/1957-001B> .
#Ex. 9
#Select the names of spacecraft launched between
#1st Jan 1969 and 1st Jan 1970
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name
WHERE {
  ?launch space:launched ?date;
          space:spacecraft ?spacecraft.
  ?spacecraft foaf:name ?name.
  FILTER (?date > "1969-01-01"^^xsd:date && ?date < "1970-01-01"^^xsd:date)
}
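A FILTER is a test applied to each solution after the graph pattern has matched. Because the literals are typed as xsd:date, the comparison is made on dates rather than on text. The toy Python analogue below uses `datetime.date` in place of xsd:date. Only the Sputnik row comes from the sample data above; "Craft X" and "Craft Y" are hypothetical.

```python
from datetime import date

# Toy model of the date FILTER in Ex. 9. Only the Sputnik row comes from the
# tutorial's sample data; "Craft X" and "Craft Y" are hypothetical.
solutions = [
    {"?name": "Sputnik 1", "?date": "1957-10-04"},
    {"?name": "Craft X", "?date": "1969-07-16"},
    {"?name": "Craft Y", "?date": "1970-01-01"},
]

START, END = date(1969, 1, 1), date(1970, 1, 1)


def in_range(row: dict) -> bool:
    """FILTER (?date > START && ?date < END), comparing dates rather than text."""
    launched = date.fromisoformat(row["?date"])
    return START < launched < END


selected = [row["?name"] for row in solutions if in_range(row)]
print(selected)  # Craft Y is excluded: "<" is strict
```

Both comparisons are strict, so launches dated exactly 1st Jan 1969 or 1st Jan 1970 fall outside the range.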
#Ex. 10
#Select spacecraft with a mass of less than 90kg
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?spacecraft ?name
WHERE {
  ?spacecraft foaf:name ?name;
              space:mass ?mass.
  #Note that we have to cast the data to the right type,
  #as it is not declared in the data
  FILTER( xsd:double(?mass) < 90.0 )
}
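Why is the cast needed? Under the SPARQL specification's operator rules, "<" has no meaning for a plain string literal compared with a number. That raises a type error, and a FILTER that errors rejects the solution, so without the cast nothing would match. Some stores may be more lenient. The toy Python model below mimics this rule with hypothetical masses stored as strings; `sparql_less_than` is an illustrative simplification, not a full implementation of SPARQL comparison.

```python
# Toy model of why Ex. 10 casts ?mass before comparing.
# Hypothetical masses, stored as strings like untyped literals.
masses = {"ex:craftA": "83.6", "ex:craftB": "1000", "ex:craftC": "120"}


def sparql_less_than(a, b) -> bool:
    """Simplified '<' inside a FILTER: comparing a string with a number has no
    defined meaning, which is an error, and an error rejects the solution."""
    if isinstance(a, str) != isinstance(b, str):
        return False  # type error => FILTER rejects the row
    return a < b


uncast = [craft for craft, m in masses.items() if sparql_less_than(m, 90.0)]
cast = [craft for craft, m in masses.items() if sparql_less_than(float(m), 90.0)]
print(uncast, cast)
```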
#Ex. 11
#Select spacecraft whose name contains "ollo", ignoring case
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name
WHERE {
  ?spacecraft foaf:name ?name.
  #The "i" flag makes the match case-insensitive
  FILTER( regex(?name, "ollo", "i") )
}
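regex() is unanchored, so it succeeds if the pattern occurs anywhere in the string, and the "i" flag ignores case. Python's `re.search` with `re.IGNORECASE` behaves the same way for this simple pattern. The two regex dialects do differ in some details, so treat the following as an analogy. The names are hypothetical stand-ins for ?name bindings.

```python
import re

# Hypothetical names standing in for ?name bindings.
names = ["Sputnik 1", "APOLLO 11", "Gemini 7"]

# regex(?name, "ollo", "i"): an unanchored search; the "i" flag ignores case.
pattern = re.compile("ollo", re.IGNORECASE)
matches = [name for name in names if pattern.search(name)]
print(matches)
```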
Built-In Filters
Logical: !, &&, ||
Math: +, -, *, /
Comparison: =, !=, >, <, ...
SPARQL tests: isURI, isBlank, isLiteral, bound
SPARQL accessors: str, lang, datatype
Other: sameTerm, langMatches, regex
DISTINCT
How do we remove duplicate results?
#Ex. 12
#Select list of agencies associated with spacecraft
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?agency
WHERE {
  ?spacecraft space:agency ?agency.
}
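The duplicates come from projection. Many spacecraft share one agency, so once the query keeps only ?agency the rows repeat, and DISTINCT then drops the repeats. The toy Python model below uses hypothetical solutions and an illustrative `project` helper.

```python
# Toy model of SELECT vs SELECT DISTINCT over projected rows.
# Hypothetical solutions for: ?spacecraft space:agency ?agency
rows = [
    {"?spacecraft": "ex:craftA", "?agency": "ex:agency1"},
    {"?spacecraft": "ex:craftB", "?agency": "ex:agency1"},
    {"?spacecraft": "ex:craftC", "?agency": "ex:agency2"},
]


def project(solutions: list[dict], variables: list[str], distinct: bool = False) -> list[tuple]:
    """SELECT [DISTINCT] vars: project each solution, optionally dropping repeats."""
    result, seen = [], set()
    for s in solutions:
        row = tuple(s.get(v) for v in variables)
        if distinct and row in seen:
            continue
        seen.add(row)
        result.append(row)
    return result


print(project(rows, ["?agency"]))                 # duplicates after projection
print(project(rows, ["?agency"], distinct=True))  # one row per agency
```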
SPARQL Query Forms
Does SPARQL do more than just SELECT data?
ASK
Test whether the graph contains some data of interest
#Ex. 13
#Was there a launch on 16th July 1969?
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
ASK WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date.
}
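ASK returns only true or false: true if the pattern has at least one solution. The toy Python model below runs the same kind of check against a two-triple graph built from the Sputnik sample data. Its answers describe only this tiny graph, not the real store. `ask_launch_on` is a hypothetical helper.

```python
# Toy model of ASK: true if the pattern has at least one solution.
# Two triples from the tutorial's Sputnik sample data (URIs shortened).
GRAPH = [
    ("launch/1957-001", "space:launched", "1957-10-04"),
    ("launch/1957-001", "space:spacecraft", "spacecraft/1957-001B"),
]


def ask_launch_on(day: str) -> bool:
    """ASK WHERE { ?launch space:launched <day> }; any() stops at the first match."""
    return any(p == "space:launched" and o == day for _, p, o in GRAPH)


print(ask_launch_on("1957-10-04"), ask_launch_on("1969-07-16"))
```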
DESCRIBE
Generate an RDF description of one or more resources
#Ex. 14
#Describe launch(es) that occurred on 16th July 1969
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DESCRIBE ?launch
WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date.
}
#Ex. 15
#Describe spacecraft launched on 16th July 1969
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DESCRIBE ?spacecraft
WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date.
  ?spacecraft space:launch ?launch.
}
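The SPARQL specification leaves the content of a DESCRIBE result up to the query service. The toy Python model below assumes the simplest policy, returning every triple whose subject is the described resource. A real service may include more, for example related blank nodes. The data is the Sputnik sample data with shortened URIs, and `describe` is a hypothetical helper.

```python
# Toy model of DESCRIBE under an ASSUMED policy: return every triple whose
# subject is the described resource. Real services may return more.
GRAPH = [
    ("launch/1957-001", "rdf:type", "space:Launch"),
    ("launch/1957-001", "space:launched", "1957-10-04"),
    ("launch/1957-001", "space:spacecraft", "spacecraft/1957-001B"),
    ("spacecraft/1957-001B", "foaf:name", "Sputnik 1"),
]


def describe(resource: str) -> list[tuple[str, str, str]]:
    """Return the triples that have `resource` as their subject (assumed policy)."""
    return [t for t in GRAPH if t[0] == resource]


# DESCRIBE ?launch WHERE { ?launch space:launched "1957-10-04" }
launches = {s for s, p, o in GRAPH if p == "space:launched" and o == "1957-10-04"}
for launch in sorted(launches):
    for triple in describe(launch):
        print(triple)
```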
CONSTRUCT
Create a custom RDF graph based on query criteria
Can be used to transform RDF data
#Ex. 16
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
  ?spacecraft foaf:name ?name;
              space:agency ?agency;
              space:mass ?mass.
}
WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date.
  ?spacecraft space:launch ?launch;
              foaf:name ?name;
              space:agency ?agency;
              space:mass ?mass.
}
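CONSTRUCT fills in its template once for each solution of the WHERE clause and merges the results into a single graph, so identical triples collapse into one. The toy Python model below uses hypothetical solutions and an illustrative `construct` function. It also follows the rule that a template triple with an unbound variable is not produced, which matters when the WHERE clause uses OPTIONAL.

```python
# Toy model of CONSTRUCT: instantiate the template once per solution and
# collect the results in a set, since an RDF graph holds no duplicate triples.
# Hypothetical solutions for the WHERE clause of Ex. 16.
solutions = [
    {"?spacecraft": "ex:craftA", "?name": "Craft A",
     "?agency": "ex:agency1", "?mass": "100"},
    {"?spacecraft": "ex:craftB", "?name": "Craft B",
     "?agency": "ex:agency1", "?mass": "200"},
]

TEMPLATE = [
    ("?spacecraft", "foaf:name", "?name"),
    ("?spacecraft", "space:agency", "?agency"),
    ("?spacecraft", "space:mass", "?mass"),
]


def construct(template: list[tuple], solutions: list[dict]) -> set:
    graph = set()
    for solution in solutions:
        for pattern in template:
            triple = tuple(solution.get(t) if t.startswith("?") else t for t in pattern)
            # A template triple with an unbound variable is not produced.
            if None not in triple:
                graph.add(triple)
    return graph


for triple in sorted(construct(TEMPLATE, solutions)):
    print(triple)
```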
SELECT
SQL-style result set retrieval
#Ex. 17
PREFIX space: <http://purl.org/net/schemas/space/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?agency ?mass
WHERE {
  ?launch space:launched "1969-07-16"^^xsd:date.
  ?spacecraft space:launch ?launch;
              foaf:name ?name;
              space:agency ?agency;
              space:mass ?mass.
}
Useful Links
SPARQL FAQ
http://www.thefigtrees.net/lee/sw/sparql-faq
SPARQL Recipes
http://n2.talis.com/wiki/SPARQL_Recipes
SPARQL By Example Tutorial
http://www.cambridgesemantics.com/2008/09/sparql-by-example
Twinkle, GUI SPARQL editor
http://www.ldodds.com/projects/twinkle
http://code.google.com/p/twinkle-sparql-tools