Wikidata:SPARQL query service/Federated queries

Introduction

The vision of Linked Open Data is to overcome disconnected data silos. Federated queries, that is, queries that address several SPARQL endpoints at once, play a key role in realizing this potential. Ideally, datasets that were created in very different contexts can refer to each other, so that federation brings together knowledge stored in different locations in the global Linked Open Data Cloud. To actually connect several knowledge graphs via federated queries, a number of requirements must be met. This page focuses on federation that uses Wikidata as the starting point and addresses other SPARQL endpoints from there. (To learn more about what steps are necessary when hosting a Wikibase instance yourself, see: #)

Connected items

To write federated queries with Wikidata as the starting point, you need to know the Wikidata data model [1] and the data model(s) of the SPARQL endpoint(s) you want to query. To enable federation, items in the different knowledge graphs must refer to each other. Standardized properties such as owl:sameAs and exact match (wdt:P2888) exist for this purpose. In addition, matching in Wikidata often works via “external identifiers”. Precise knowledge of how the respective property is modeled is therefore important. Note that triples in non-Wikibase databases may use a different vocabulary of subjects, predicates and objects.
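
To find out which external identifiers could serve as connection points for a given item, a query along the following lines may help. It is only a sketch: it assumes it is run on the Wikidata Query Service (where wd:, wikibase: and bd: are predefined) and uses the item wd:Q215894 purely as an example.

```sparql
# List all external-identifier statements of one example item
SELECT ?property ?propertyLabel ?value WHERE {
  # every direct ("truthy") statement of the example item
  wd:Q215894 ?directClaim ?value .
  # keep only statements whose property has the datatype "external identifier"
  ?property wikibase:directClaim ?directClaim ;
            wikibase:propertyType wikibase:ExternalId .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". }
}
```

Each row names a property whose value is an identifier in another resource; whether that resource also offers a SPARQL endpoint has to be checked separately.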

Allowlist

Each SPARQL endpoint involved must also permit federation to the other endpoint. The current Wikidata allowlist provides an overview of all SPARQL endpoints that you can query via Wikidata.

Calling an external endpoint

The basic mechanism of federation is the SERVICE keyword, which calls another endpoint. As part of the default prefixes, wdqs already defines the Wikidata endpoint <https://query.wikidata.org/sparql>, so a simple federated call to Wikidata would look like this:

SERVICE wdqs { ?a ?b ?c }

or, when not using prefixes, the call would look like this:

SERVICE <https://query.wikidata.org/sparql> { ?a ?b ?c }
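
Embedded in a complete query, the SERVICE call could look like the sketch below. It assumes the query is sent from some other SPARQL 1.1 endpoint that permits federation to Wikidata; the prefixes are declared explicitly because they may not be predefined there, and the item and property are just examples.

```sparql
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?author WHERE {
  # everything inside this block is evaluated by the Wikidata endpoint
  SERVICE <https://query.wikidata.org/sparql> {
    wd:Q215894 wdt:P50 ?author .
  }
}
```

In practice the pattern inside SERVICE should be as selective as possible, since an unrestricted pattern such as ?a ?b ?c asks the remote endpoint for every triple it holds.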

Using external identifiers, IRIs, and the BIND operator

When federating, you typically want to get more information about items that exist, or are referenced, in more than one LOD resource. This requires a connecting mechanism between the two resources. One common way of doing that is to use External ID properties in Wikidata. However, the value of an External ID property in Wikidata is by default a literal and not an IRI that can be used in federated queries. This means we only get an identifier such as a Q-ID, rather than a full IRI. To re-use that Q-ID as a starting point in the federated part of the query, we need to turn the literal into an IRI by prepending the entity namespace (written either as a prefix or as a full IRI) and BIND the result to a new variable. To do this, we use the following syntax:

BIND(IRI(CONCAT(STR(wd:), ?Wikidata_id)) AS ?Wikidata_item)
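
As a minimal sketch of this conversion, the following query reads one external identifier from Wikidata and builds an IRI from it. It assumes the Wikidata Query Service with its predefined prefixes, and it takes the item wd:Q215894, the property wdt:P12047 and the MiMoText entity namespace from the examples further down this page.

```sparql
SELECT ?MiMoTextID ?idIsLiteral ?bookOnMiMoText ?resultIsIRI WHERE {
  # the external ID comes back as a plain literal
  wd:Q215894 wdt:P12047 ?MiMoTextID .
  BIND(isLiteral(?MiMoTextID) AS ?idIsLiteral)

  # prepend the entity namespace of the target Wikibase and cast to an IRI
  BIND(IRI(CONCAT("http://data.mimotext.uni-trier.de/entity/", ?MiMoTextID)) AS ?bookOnMiMoText)
  BIND(isIRI(?bookOnMiMoText) AS ?resultIsIRI)
}
```

The two boolean columns are only there to make the difference between the literal and the constructed IRI visible; they can be dropped in a real query.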

There is a workaround that avoids BIND, but it comes with some caveats. Instead of the usual path for requesting the value of a property (wdt:, or <http://www.wikidata.org/prop/direct/>), you can use wdtn: (or <http://www.wikidata.org/prop/direct-normalized/>). This returns the value generated via the formatter URI for RDF resource (P1921, https://www.wikidata.org/wiki/Property:P1921), if one has been defined for the respective External ID property. The result is the correct URI for the entity, so that BIND(IRI(CONCAT("somePrefix", ?id)) AS ?x) is no longer needed in your query.
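
The following sketch shows how one might check whether this workaround is available for a given property. It assumes the Wikidata Query Service with its predefined prefixes and uses wd:Q215894 and P12047 only as examples; whether either OPTIONAL part returns a value depends on whether a formatter URI for RDF resource has actually been defined for that property.

```sparql
SELECT ?rdfFormatter ?plainValue ?normalizedValue WHERE {
  # the literal value, as returned by the usual wdt: path
  wd:Q215894 wdt:P12047 ?plainValue .

  # formatter URI for RDF resource (P1921) of the property, if any
  OPTIONAL { wd:P12047 wdt:P1921 ?rdfFormatter . }

  # the same statement via the direct-normalized path; only bound
  # when a normalized URI has been generated for the statement
  OPTIONAL { wd:Q215894 wdtn:P12047 ?normalizedValue . }
}
```

If ?normalizedValue stays empty, the BIND(IRI(CONCAT(...))) approach is still required for that property.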

But there are some potential issues to keep in mind:

- Once the formatter URI for RDF resource has been added, you have to wait 24 hours for the property info cache to expire.

- Only then start adding actual statements for the property. Statements added earlier will not be formatted properly, so the BIND operator is still needed for those.

- Note the difference between http and https. The source of truth is always the Concept URI (available in the sidebar on most Wikibase installations, including Wikidata); the correct scheme (http or https) has to be used consistently when defining prefixes, in BIND expressions, and when adding values for the formatter URI for RDF resource property.
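
The last point can be illustrated with a small query that needs no data at all: two IRIs that differ only in their scheme are different RDF terms, so a join across endpoints silently returns nothing when the schemes are mixed up. The IRIs below are examples; only the http form is the Concept URI used by Wikidata.

```sparql
SELECT ?sameTerm WHERE {
  # compares the http and the https spelling of the same entity path;
  # the two IRIs are distinct terms, so this is expected to be false
  BIND(sameTerm(<http://www.wikidata.org/entity/Q215894>,
                <https://www.wikidata.org/entity/Q215894>) AS ?sameTerm)
}
```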

Simple example queries

Here are some examples of federated queries.

Getting basic information for one item

This query gets the narrative form of the book "Candide" from MiMoText.

# declare MiMoText prefixes
PREFIX mmd: <http://data.mimotext.uni-trier.de/entity/>
PREFIX mmdt: <http://data.mimotext.uni-trier.de/prop/direct/>

SELECT * WHERE {
  # start with this book as an example
  BIND(wd:Q215894 AS ?bookOnWikidata)
  
  # select additional information on Wikidata
  ?bookOnWikidata wdt:P50 ?authorOnWikidata.
  
  # add labels from Wikidata
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en".
    ?bookOnWikidata rdfs:label ?bookOnWikidataLabel.
    ?authorOnWikidata rdfs:label ?authorOnWikidataLabel.
  }
  
  # connect to MiMoText
  ?bookOnWikidata wdt:P12047 ?MiMoTextID.
  BIND(IRI(CONCAT(STR(mmd:), ?MiMoTextID)) AS ?bookOnMiMoText)
  
  # federated subquery to MiMoText
  SERVICE <https://query.mimotext.uni-trier.de/proxy/wdqs/bigdata/namespace/wdq/sparql> {
    # select additional information on MiMoText
    ?bookOnMiMoText mmdt:P5 ?authorOnMiMoText;
                    mmdt:P33 ?narrativeFormOnMiMoText.
    
    # add labels from MiMoText (this only works because MiMoText is also a Wikibase)
    SERVICE wikibase:label {
      bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en".
      ?bookOnMiMoText rdfs:label ?bookOnMiMoTextLabel.
      ?authorOnMiMoText rdfs:label ?authorOnMiMoTextLabel.
      ?narrativeFormOnMiMoText rdfs:label ?narrativeFormOnMiMoTextLabel.
    }
  }
}

Works in Japan Search by British sculptors

PREFIX jps: <https://jpsearch.go.jp/term/property#>
SELECT DISTINCT ?name ?work_link ?work ?creatorLabel ?date
WITH {
  SELECT ?creator WHERE {
    VALUES ?citizenship { wd:Q145 wd:Q174193 }  # UK, Britain & Ireland
    ?creator wdt:P6698 [] ;          # only get things with a Japan Search ID
             wdt:P27 ?citizenship ;
             wdt:P106 wd:Q1281618 .  # sculptors
  }
} AS %creators
WHERE {
  INCLUDE %creators
  SERVICE <https://jpsearch.go.jp/rdf/sparql/> {
    ?jps_creator owl:sameAs ?creator .    # convert Wikidata ID to Japan Search ID
    ?work schema:creator ?jps_creator .   # works by this artist
    OPTIONAL { ?work schema:name ?name }  # returns separate names in English and Japanese
    OPTIONAL { ?work schema:dateCreated ?date }
  }
  FILTER (lang(?name) = "en")  # show only the English name
  BIND(URI(REPLACE(STR(?work), "/data/", "/item/")) AS ?work_link)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?date
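
The next query lists works by a single creator (wd:Q200798), combining results from Japan Search with works recorded on Wikidata.
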
PREFIX jps: <https://jpsearch.go.jp/term/property#>
SELECT ?name ?work ?date ?url ?image WHERE {
  {
    SERVICE <https://jpsearch.go.jp/rdf/sparql/> {
      ?jps_creator owl:sameAs wd:Q200798 .
      ?work schema:creator ?jps_creator .
      OPTIONAL { ?work schema:dateCreated ?date }
      OPTIONAL { ?work schema:image ?image }
      OPTIONAL { ?work schema:name ?name }
      FILTER (lang(?name) = "ja")  # show only the Japanese name
    }
    BIND(URI(REPLACE(STR(?work), "/data/", "/item/")) AS ?url)
  }
  UNION {
    ?work wdt:P170 wd:Q200798 .
    OPTIONAL { ?work wdt:P973 ?url }
    OPTIONAL { ?work wdt:P18 ?image }
    OPTIONAL { ?work wdt:P571 ?date }
    ?work rdfs:label ?name FILTER (lang(?name) = "ja")
  }
}

UK Parliament constituencies whose official point location is more than 10 km from the location in Wikidata

# compare lat/long of Parliament and Wikidata constituency records

#defaultView:Map{"hide":["?line"]}
PREFIX parliament: <https://id.parliament.uk/schema/>

SELECT DISTINCT ?constituency ?parlcoord ?item ?itemLabel ?wdcoord ?dist ?line WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

  # get constituencies from Parliament with coordinates
  SERVICE <https://api.parliament.uk/sparql> {
    ?constituency parliament:constituencyGroupHasConstituencyArea ?area .
    ?area parliament:latitude ?lat ;
          parliament:longitude ?long .
    BIND(SUBSTR(STR(?constituency), 26) AS ?parlid)
  }
  BIND(CONCAT("Point(", STR(?long), " ", STR(?lat), ")") AS ?parlcoord)

  # now get them from Wikidata with coordinates
  ?item wdt:P6213 ?parlid ;
        wdt:P31 wd:Q27971968 ;
        wdt:P625 ?wdcoord .

  # now find out the distance (in km) and keep only distances of at least 10 km
  BIND(geof:distance(?parlcoord, ?wdcoord) AS ?dist)
  FILTER (?dist >= 10)

  # build a line between the two points for the map view
  ?item p:P625 ?statementnode .
  ?statementnode psv:P625 ?valuenode .
  ?valuenode wikibase:geoLatitude ?wikilat ;
             wikibase:geoLongitude ?wikilon .
  BIND(CONCAT('LINESTRING (', STR(?wikilon), ' ', STR(?wikilat), ',', STR(?long), ' ', STR(?lat), ')') AS ?str)
  BIND(STRDT(?str, geo:wktLiteral) AS ?line)
}

Querying multiple SPARQL endpoints: FactGrid, Wikidata, DBpedia

This query combines three sources: Wikidata, FactGrid and DBpedia. Starting from one FactGrid item, it looks for persons and organizations that have any relation to that starting item.

In the example, the starting point is Magnus Hirschfeld, defined with BIND(fg:Q225307 as ?fg_item). The query then searches for the corresponding Wikidata item and uses it to look up Hirschfeld on DBpedia as well.
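
The step from the FactGrid item to the Wikidata item can be tried in isolation. The sketch below repeats the sitelink pattern used in the full query further down; it assumes it is run on the FactGrid query service and that the item carries a sitelink to Wikidata whose name is the Q-ID.

```sparql
PREFIX fg: <https://database.factgrid.de/entity/>
PREFIX schema: <http://schema.org/>

SELECT ?qid ?wd_item WHERE {
  # the sitelink node that points from the FactGrid item to Wikidata
  ?link schema:about fg:Q225307 ;
        schema:isPartOf <https://www.wikidata.org/> ;
        schema:name ?qid .
  # turn the Q-ID string into a Wikidata entity IRI
  BIND(IRI(CONCAT("http://www.wikidata.org/entity/", ?qid)) AS ?wd_item)
}
```

The resulting ?wd_item can then be used inside a SERVICE block that addresses Wikidata or DBpedia.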

Get all persons and organisations that have a connection to Magnus Hirschfeld

# Get all persons and organisations that have some kind of connection to a FactGrid item. The example is Magnus Hirschfeld.

PREFIX fg: <https://database.factgrid.de/entity/>
PREFIX fgt: <https://database.factgrid.de/prop/direct/>
# DBpedia
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
# Wikidata
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdp: <http://www.wikidata.org/prop/>
PREFIX wps: <http://www.wikidata.org/prop/statement/>
PREFIX wdpsv: <http://www.wikidata.org/prop/statement/value/>
# misc
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT 
?fg_item ?fg_itemLabel ?wd_item ?db_item ?value ?valueLabel ?relation ?relation_stringLabel ?image ?source

WHERE {

  # labels from Factgrid
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }

  # starting point is a factgrid item
  BIND(fg:Q225307 as ?fg_item)
  # transform wikidata qid in factgrid to wikidata entity iri
  ?link schema:about ?fg_item .
  ?link schema:isPartOf <https://www.wikidata.org/> . 
  ?link schema:name ?qid.
  BIND(IRI(CONCAT(STR(wd:), ?qid)) AS ?wd_item)

  # FACTGRID: Get relations with all humans and organisations
  {
    # get all statements to a PERSON
    {
      ?fg_item ?relation ?value .
      ?value fgt:P2 fg:Q7 .
      ?relation_string wikibase:directClaim ?relation.
      BIND ("factgrid" AS ?source)
      OPTIONAL { ?value fgt:P189 ?image }
    }

  }

  # WIKIDATA: Get relations with all humans and organisations
  UNION {
    # Wikidata: get all statements to a PERSON
    OPTIONAL {
      SERVICE <https://query.wikidata.org/sparql> {
        ?wd_item ?relation ?value .
        ?value wdt:P31 wd:Q5 .
        BIND ("wikidata" AS ?source)
        OPTIONAL { ?value wdt:P18 ?image }
        SERVICE wikibase:label {
          bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en".
          ?wd_item rdfs:label ?wd_itemLabel.
          ?value rdfs:label ?valueLabel.
        }
      }

    }
  }

  # WIKIPEDIA: Persons mentioned in Wikipedia article 
  UNION {
    # fetch data from DBpedia
    SERVICE <https://dbpedia.org/sparql> {
      # get Wikidata QID from DBpedia resource
      ?db_item owl:sameAs ?wd_item .

      OPTIONAL {
        # get all wikipedia links
        ?db_item dbo:wikiPageWikiLink ?value .
        # just those that are persons
        ?value a dbo:Person .
        ?value rdfs:label ?valueLabel .
        FILTER(LANG(?valueLabel) = "[AUTO_LANGUAGE]") .
        # exclude resources with known data problems (DBpedia has been informed about Stefan Zweig)
        FILTER(!REGEX(STR(?value), "Stefan_Zweig|LGBT_rights_by_country_or_territory|Barbara_Hammer|Deutschland|Coming_out"))
        BIND ("dbpedia" AS ?source)
        OPTIONAL { ?value dbo:thumbnail ?image }
      }
    }
  }
}

Mining and Modeling Text (MiMoText), a Wikibase on French Enlightenment Novels

The following two queries combine two sources: Wikidata and MiMoText, a knowledge graph on French Enlightenment novels built at Trier University. The federation combines information on gender retrieved from Wikidata with information on thematic concepts per novel retrieved from MiMoTextBase.
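
Both queries build the MiMoText IRI from the formatter URL of the identifier property instead of a hard-coded prefix. The mechanism can be sketched on its own as follows; the identifier "Q1053" and the formatter value are assumptions made up for illustration, not values read from Wikidata.

```sparql
SELECT ?mimotextitem WHERE {
  # hypothetical identifier and formatter URL, for illustration only
  BIND("Q1053" AS ?mimotextid)
  BIND("http://data.mimotext.uni-trier.de/entity/$1" AS ?formatterURL)

  # "^(.+)$" captures the whole identifier as group 1; the "$1" in the
  # formatter URL is then replaced by that group
  BIND(IRI(REPLACE(?mimotextid, "^(.+)$", ?formatterURL)) AS ?mimotextitem)
}
```

In the queries below, ?formatterURL is read from the property itself via wdt:P1630, so the query keeps working if the formatter URL is changed on Wikidata.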

Most common themes on MiMoText by male authors

# title:most common themes on MiMoText by male authors
#defaultView:BubbleChart
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX mmd: <http://data.mimotext.uni-trier.de/entity/>
PREFIX mmdt: <http://data.mimotext.uni-trier.de/prop/direct/>

SELECT ?themeLabel (COUNT(*) AS ?count) WHERE {
  wd:P12047 wdt:P1630 ?formatterURL.
  ?wikidataitem wdt:P12047 ?mimotextid;  # select all items with MiMoText ID
                wdt:P50 ?author.
  ?author wdt:P21 wd:Q6581097.  # select all authors with gender = male
  BIND(IRI(REPLACE(?mimotextid, "^(.+)$", ?formatterURL)) AS ?mimotextitem)
  SERVICE <https://query.mimotext.uni-trier.de/proxy/wdqs/bigdata/namespace/wdq/sparql> {
    ?mimotextitem mmdt:P36 ?theme.  # literary themes
    ?theme rdfs:label ?themeLabel.
    FILTER(LANG(?themeLabel) = "en")
  }
}
GROUP BY ?themeLabel
ORDER BY DESC(?count)

Most common themes on MiMoText by female authors

# title:most common themes on MiMoText by female authors
#defaultView:BubbleChart
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX mmd: <http://data.mimotext.uni-trier.de/entity/>
PREFIX mmdt: <http://data.mimotext.uni-trier.de/prop/direct/>

SELECT ?themeLabel (COUNT(*) AS ?count) WHERE {
  wd:P12047 wdt:P1630 ?formatterURL.
  ?wikidataitem wdt:P12047 ?mimotextid;  # select all items with MiMoText ID
                wdt:P50 ?author.
  ?author wdt:P21 wd:Q6581072.  # select all authors with gender = female
  BIND(IRI(REPLACE(?mimotextid, "^(.+)$", ?formatterURL)) AS ?mimotextitem)
  SERVICE <https://query.mimotext.uni-trier.de/proxy/wdqs/bigdata/namespace/wdq/sparql> {
    ?mimotextitem mmdt:P36 ?theme.  # literary themes
    ?theme rdfs:label ?themeLabel.
    FILTER(LANG(?themeLabel) = "en")
  }
}
GROUP BY ?themeLabel
ORDER BY DESC(?count)
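
Since the two queries differ only in the gender item, they could also be merged into one. The following is a sketch under the same assumptions as the queries above (same property IDs and the same MiMoText endpoint); the hand-written labels "male" and "female" in the VALUES clause are added here for readability and are not taken from Wikidata.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX mmdt: <http://data.mimotext.uni-trier.de/prop/direct/>

SELECT ?genderLabel ?themeLabel (COUNT(*) AS ?count) WHERE {
  # both gender items from the two queries above, with hand-written labels
  VALUES (?gender ?genderLabel) {
    (wd:Q6581097 "male")
    (wd:Q6581072 "female")
  }
  wd:P12047 wdt:P1630 ?formatterURL.
  ?wikidataitem wdt:P12047 ?mimotextid;
                wdt:P50 ?author.
  ?author wdt:P21 ?gender.
  BIND(IRI(REPLACE(?mimotextid, "^(.+)$", ?formatterURL)) AS ?mimotextitem)
  SERVICE <https://query.mimotext.uni-trier.de/proxy/wdqs/bigdata/namespace/wdq/sparql> {
    ?mimotextitem mmdt:P36 ?theme.  # literary themes
    ?theme rdfs:label ?themeLabel.
    FILTER(LANG(?themeLabel) = "en")
  }
}
GROUP BY ?genderLabel ?themeLabel
ORDER BY ?genderLabel DESC(?count)
```

The result has one row per combination of gender and theme, which suits a table view better than the bubble chart used for the single-gender queries.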