User:PAC2/Documented queries
Documented queries is an initiative to support the development of documented SPARQL queries in Wikidata.
SPARQL queries are essential to explore the Wikidata graph. Right now, there are many example queries (Wikidata:SPARQL query service/queries/examples). Each WikiProject has some specific queries in its project pages (see Wikidata:WikiProject France/Queries for example). Users share their queries in the weekly newsletter (see Wikidata:SPARQL query service/qotw), and many users store queries in their own user pages (see User:PAC2/SPARQL queries for example). That's great, but those queries are often poorly documented: they have a title but few explanations and little context, and they rarely come with insights, checks or tests.
The documented queries initiative provides a template for documenting a query: a set of sections which, once filled in, give the query useful documentation.
The goal of this approach is to have reliable queries. With context, explanations, checks and tests for data quality, anyone can reuse a query to get insights.
Template
Title
A query should have a title which describes the query in natural language.
Scope
The scope section provides a detailed description of the scope of the query.
Context
The context section explains why the query was created (for example, is it related to a WikiProject or to a request for comments?).
Query
The query section provides the query in a standardized fashion (e.g. using the {{SPARQL}} template) and a precise description in one or more natural languages (preferably including English).
If relevant, we can have a main query and some complementary queries.
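As an illustration of the standardized presentation, here is a minimal Python sketch that wraps a query in {{SPARQL}} wikitext. It assumes that the template takes the query through a `query` parameter and that pipe characters must be written as `{{!}}` inside a template argument; check the template documentation before relying on either point. The function name `to_sparql_template` and the sample query (instances of house cat, Q146) are only illustrative.

```python
def to_sparql_template(query: str) -> str:
    """Wrap a SPARQL query in {{SPARQL}} wikitext.

    Assumption: the template reads the query from a `query` parameter.
    """
    # A bare "|" would be read as a template parameter separator,
    # so it is replaced by the {{!}} magic word.
    escaped = query.strip().replace("|", "{{!}}")
    return "{{SPARQL|query=" + escaped + "\n}}"


# Illustrative main query: items that are instances of house cat (Q146).
CATS_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

if __name__ == "__main__":
    print(to_sparql_template(CATS_QUERY))
```

The printed wikitext can be pasted into the query section, followed by the natural-language description.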
Checks and tests
This section contains a set of checks and tests on the results of the query. For instance: check the number of results, check that a given item is in the results, check that the gender distribution is balanced, etc. Each check or test can be accompanied by a SPARQL query, if relevant, using {{SPARQL Inline}}.
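Checks of this kind can also be automated once the results are downloaded. The sketch below is one possible approach, not part of the initiative itself: it assumes the results are in the standard SPARQL JSON results format (a `results.bindings` list), that the item variable is named `item`, and it uses hand-written placeholder bindings rather than real query output. The helper names `item_ids` and `run_checks` are hypothetical.

```python
from typing import Any

ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def item_ids(results: dict[str, Any], var: str = "item") -> list[str]:
    """Extract the Q-ids bound to `var` from SPARQL JSON results."""
    ids = []
    for binding in results["results"]["bindings"]:
        if var in binding:
            ids.append(binding[var]["value"].removeprefix(ENTITY_PREFIX))
    return ids


def run_checks(
    results: dict[str, Any],
    expected_count: int,
    must_include: list[str],
    var: str = "item",
) -> dict[str, bool]:
    """Run three simple checks and report which ones pass."""
    ids = item_ids(results, var)
    return {
        "count_matches": len(ids) == expected_count,
        "no_duplicates": len(ids) == len(set(ids)),
        "required_items_present": set(must_include) <= set(ids),
    }


# Placeholder results (not real query output), in SPARQL JSON results format.
SAMPLE_RESULTS = {
    "head": {"vars": ["item"]},
    "results": {
        "bindings": [
            {"item": {"type": "uri", "value": ENTITY_PREFIX + "Q1"}},
            {"item": {"type": "uri", "value": ENTITY_PREFIX + "Q2"}},
            {"item": {"type": "uri", "value": ENTITY_PREFIX + "Q3"}},
        ]
    },
}

if __name__ == "__main__":
    for name, passed in run_checks(SAMPLE_RESULTS, 3, ["Q2"]).items():
        print(f"{name}: {'ok' if passed else 'FAILED'}")
```

A distribution check (for example on gender) would follow the same pattern: count the values of one variable and compare the shares with an expected range.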
Insights
If relevant, add some insights from the query.
See also
In this section, we can add links to related documented queries and related WikiProjects.
Category
The page should be added to the Category:Documented query category.
Tools
{{Documented queries skeleton}} provides a basic template for a documented query page. A new page can be created from it with the following syntax:
{{subst:Documented queries skeleton}}
{{Querydoc}} can be inserted at the top of a documented query page. It provides categorization and a link to this page. It is already included in the skeleton.
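To show how the sections of the template fit together on a page, here is a rough Python sketch that assembles a page outline. It is not the content of {{Documented queries skeleton}}, which is not reproduced on this page: the heading level, the bold title line and the function name `page_skeleton` are assumptions of the sketch.

```python
# Section names taken from the template described on this page.
SECTIONS = [
    "Scope",
    "Context",
    "Query",
    "Checks and tests",
    "Insights",
    "See also",
]


def page_skeleton(title: str) -> str:
    """Build an empty documented query page as wikitext."""
    # {{Querydoc}} goes at the top; it provides the categorization.
    lines = ["{{Querydoc}}", f"'''{title}'''", ""]
    for section in SECTIONS:
        # Level-2 wikitext heading followed by an empty line to fill in.
        lines += [f"== {section} ==", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(page_skeleton("List of current French departments"))
```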
Examples
- User:PAC2/Query/List of current French departments
- User:PAC2/Query/Gender and labels for properties whose values are instances or subclasses of human in French
Comparison with existing approaches
{{Query page}} provides a useful template for writing queries, but it doesn't document the query with context, checks and tests.
Wikidata:Showcase Queries proposes criteria for showcase queries. This is useful and complementary to the documented query approach: well-documented queries could eventually become showcase queries.
The documented query approach is inspired by the "Datasheets for Datasets" paper. This paper argues that datasets in the field of artificial intelligence lack documentation, and it provides a documentation framework built on a set of questions about the context and the features of the dataset[1].
Open questions
Where should documented queries be stored? There are at least two options. The first is to create all documented queries in a specific subspace of the Wikidata namespace: Wikidata:Query:. The other is to create documented query pages as user pages. In either case, Category:Documented query would group all documented query pages. Alternatively, queries could be documented in a dedicated Wikibase instance.
References
1. Timnit Gebru; Jamie Morgenstern; Briana Vecchione; Jennifer Wortman Vaughan; Hanna Wallach; Hal Daumé III; Kate Crawford (23 March 2018), Datasheets for Datasets, arXiv:1803.09010, doi:10.48550/ARXIV.1803.09010, Wikidata Q60487752