Accessing Data Through The CKAN API
The Humanitarian Data Exchange (HDX) runs on a platform called CKAN. You can use
CKAN’s API to discover, update, and download data from HDX without human
intervention.
This cookbook focuses on data access rather than data updating (if you’re interested in
automated data updating, we have a Python library hdx-python-api that will make the
process much simpler).
We’ll start with some basic ingredients, then move on to some full recipes for data
access. In each case, we’ll show both the direct RESTful API URL and the Python code
that you can use via the official ckanapi library (which also includes command-line
utilities for use in shell scripts).
For a simpler method to get automatic notifications about new or modified datasets
matching any search criteria (e.g. for a specific organization, country, tag, search string,
or combination of those), see Appendix A. Syndication feeds for notifications.
1. Basics
This section introduces the fundamental information you’ll need to know to connect to
the API. You may choose to skip straight to 3. Simple search examples if you prefer to
see some examples first.
https://data.humdata.org/api/3/
(Note that in Python, you supply just the CKAN domain name, not the full API path.) For
all the Python examples that follow, we’ll assume that you’ve created this ckan object
already.
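The statement that creates the ckan object is not shown above. The following is a minimal sketch, assuming the ckanapi package is installed and using its RemoteCKAN class; the HDX_SITE constant and the connect() helper are names chosen here for illustration only.

```python
# Base address of the HDX CKAN site: the domain only, not the /api/3/ path.
HDX_SITE = "https://data.humdata.org"


def connect(site=HDX_SITE):
    """Return a ckanapi client for the given CKAN site (illustrative helper)."""
    # Imported lazily so this module still loads where ckanapi is not installed.
    from ckanapi import RemoteCKAN

    return RemoteCKAN(site)


# Typical use (needs network access, so it is left commented out here):
# ckan = connect()
# package = ckan.action.package_show(id="hrp-projects-nga")
```

Keeping the site address in one constant makes it easy to point the same code at a different CKAN instance for testing.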
https://data.humdata.org/api/3/action/package_show?id=hrp-projects-nga
or in Python:
package = ckan.action.package_show(id="hrp-projects-nga")
(In Python, you’ll get the JSON result directly; with the URL, you’ll get the metadata
response under a key called "result".) The string "hrp-projects-nga" is the same stub
that appears at the end of the dataset URL on HDX,
https://data.humdata.org/dataset/hrp-projects-nga.
Exercise: try using the patterns above to read the metadata for other datasets on HDX.
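If you call the URL directly rather than going through ckanapi, your code has to unwrap the "result" key itself. The sketch below uses only the Python standard library; the function names are illustrative, and it assumes that a failed call reports "success": false together with an "error" key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://data.humdata.org/api/3/action/"


def unwrap(response):
    """Return the "result" part of a decoded CKAN API response."""
    if not response.get("success"):
        raise RuntimeError("CKAN API call failed: %r" % response.get("error"))
    return response["result"]


def package_show_url(dataset_id):
    """Build the package_show URL for a dataset stub."""
    return API_BASE + "package_show?" + urlencode({"id": dataset_id})


def fetch_package(dataset_id):
    """Download and unwrap the metadata for one dataset (needs network access)."""
    with urlopen(package_show_url(dataset_id)) as stream:
        return unwrap(json.load(stream))
```

Calling fetch_package("hrp-projects-nga") should then give the same dictionary that ckan.action.package_show returns.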
{
  "name": "...",
  "title": "...",
  "description": "...",
  "created": "...",
  "last_modified": "...",
  "dataset_date": "...",
  "dataseries_name": "...",
  "groups": [
    { … },
    { … }
  ],
  "organization": { … },
  "resources": [
    { … },
    { … }
  ],
  "tags": [
    { … },
    { … }
  ]
}
title: the full human-readable dataset title, like "Humanitarian Response Plan projects
for Nigeria"
created: the date and time when the dataset was first created on HDX.
last_modified: the date and time when the dataset metadata on HDX was last changed
(note that the data itself can change independently, especially if it’s hosted off HDX).
dataset_date: the date range when the data is applicable (could be in the past or future
relative to the creation and last-modified dates).
groups: a list of data structures describing countries or country-like entities associated
with the dataset (in this case, just Nigeria).
organization: a data structure describing the data provider (e.g. OCHA Financial Tracking
Service).
resources: a list of data structures describing the resources (files) inside the dataset.
tags: a list of data structures describing the semantic tags associated with the dataset
(like "who is doing what and where-3w-4w-5w").
(You can learn more about the different data structures and properties at
https://docs.ckan.org/en/api)
We’ll talk more about using the different parts of the dataset metadata later. The main
focus of this cookbook will be how to locate datasets on HDX so that your code can
download and process them with minimal human intervention.
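Since the goal is to download data, the resources list is usually the part of the metadata your code reads first. The following sketch assumes that each resource dictionary carries the standard CKAN "name", "format", and "url" keys (they are not spelled out above); the helper name is illustrative.

```python
def resource_urls(package, file_format=None):
    """Return (name, url) pairs for the resources in a dataset's metadata.

    If file_format is given (for example "CSV"), keep only resources whose
    "format" value matches it, ignoring case.
    """
    pairs = []
    for resource in package.get("resources", []):
        fmt = (resource.get("format") or "").lower()
        if file_format is None or fmt == file_format.lower():
            pairs.append((resource.get("name"), resource.get("url")))
    return pairs
```

For example, resource_urls(package, "CSV") on the result of a package_show call would list only the CSV files in that dataset.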
https://data.humdata.org/api/3/action/package_search
or, in Python
packages = ckan.action.package_search()
https://data.humdata.org/api/3/action/package_search?start=200&rows=100
or, in Python
packages = ckan.action.package_search(start=200, rows=100)
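Paging by hand with start and rows can be wrapped in a small generator. This is a sketch rather than a library API: iter_packages is a hypothetical helper, and it relies on the "count" and "results" keys of the search result.

```python
def iter_packages(search, page_size=100, **params):
    """Yield every dataset matching a search, fetching one page at a time.

    search is any callable with the package_search signature, for example
    ckan.action.package_search.
    """
    start = 0
    while True:
        page = search(start=start, rows=page_size, **params)
        results = page["results"]
        if not results:
            break
        yield from results
        start += len(results)
        # Stop once every available result has been yielded.
        if start >= page["count"]:
            break
```

With a ckan object in hand, a loop such as "for package in iter_packages(ckan.action.package_search, q="food"):" would walk through all matches without holding every page in memory at once.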
Tip: instead of doing your own paging, you can use the ckancrawler Python package,
which feeds search results smoothly into a single iterator like this, doing all the paging
behind the scenes:
(There is an alternative parameter named fq that works the same way but doesn't affect
search-result weighting for relevancy. You can ignore it for the sake of the examples in
this cookbook, but you might find it useful if you decide to try more-complex free-text
searches in the future.)
If you’re constructing a URL query directly, then you will have to URL-encode your search
string, so that "displaced people" (for example) becomes "displaced%20people" or
"displaced+people":
https://data.humdata.org/api/3/action/package_search?q=displaced%20people
packages = ckan.action.package_search(q="displaced people")
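If you build URLs in code, the standard library can do the encoding for you. This sketch uses urllib.parse.urlencode; search_url is an illustrative helper name. By default urlencode writes spaces as "+", and passing quote_via=quote produces "%20" instead.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "https://data.humdata.org/api/3/action/package_search"


def search_url(**params):
    """Build a package_search URL, encoding spaces as %20."""
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)


print(search_url(q="displaced people"))
# https://data.humdata.org/api/3/action/package_search?q=displaced%20people
```

Note that this also encodes characters such as ":" (as %3A), which is equivalent to the unencoded form when the URL is decoded.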
groups: datasets related to this country or country-like entity, using the HDX country
stub, which is usually the ISO3 code in lower case (more information).
organization: datasets from this provider, using the HDX organization stub (more
information).
For example, this URL retrieves datasets provided by FAO (440 of them as of November
2024), using the HDX organization stub "fao":
https://data.humdata.org/api/3/action/package_search?q=organization:fao
Python code:
packages = ckan.action.package_search(q="organization:fao")
For a complete list of query filters available for HDX, see B.1. Complete list of HDX CKAN
search fields.
2.2.2. Advanced boolean logic
By default, repeated filters imply boolean AND: if you have a query "groups:afg
groups:pak" it will include only datasets that apply to both Afghanistan and Pakistan. For
advanced use (beyond what’s needed for the examples in this cookbook), you can use
special Solr filter syntax. For example, the query "groups:afg OR groups:pak" will
return datasets associated with either or both countries:
https://data.humdata.org/api/3/action/package_search?q=groups:afg%20OR%20groups:pak
Python code:
packages = ckan.action.package_search(
    q="groups:afg OR groups:pak"
)
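When filters are assembled in code, two small helpers keep the AND/OR logic readable. This is a sketch only: all_of and any_of are hypothetical names, and wrapping the OR group in parentheses is standard Solr grouping syntax that this cookbook does not otherwise use, so verify the results before relying on it.

```python
def all_of(*filters):
    """Join filters so that every one must match (the default AND behaviour)."""
    return " ".join(filters)


def any_of(*filters):
    """Join filters so that at least one must match, grouped in parentheses."""
    return "(" + " OR ".join(filters) + ")"


query = all_of(any_of("groups:afg", "groups:pak"), "organization:fao")
print(query)
# (groups:afg OR groups:pak) organization:fao
```

The resulting string can be passed as the q parameter of package_search.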
last_modified: the last time the dataset was changed on HDX (will not detect changes to
remote resources, like APIs).
score: relevance to your search query (add desc to get the most-relevant datasets first).
The following URL returns datasets sorted by total downloads in descending order (so
that the most-popular ones appear first); note that the whitespace needs to be
URL-encoded:
https://data.humdata.org/api/3/action/package_search?sort=total_res_downloads%20desc
Python code:
packages = ckan.action.package_search(
    sort="total_res_downloads desc"
)
{
  "help": "…",
  "success": true,
  "result": {
    "count": …,
    "facets": { … },
    "expanded": { … },
    "results": [
      { … },
      { … }
    ]
  }
}
The Python API strips off the top layer, so you see just the following:
{
  "count": …,
  "facets": { … },
  "expanded": { … },
  "results": [
    { … },
    { … }
  ]
}
Only two of these fields are essential:
count: the total results available (not just the ones returned from this paged query)
https://data.humdata.org/api/3/action/group_list?all_fields=true
or in Python,
countries = ckan.action.group_list(all_fields=True)
Use the name field in your query (that contains the code).
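Looking up a stub by its human-readable label can be sketched as follows. The helper name is hypothetical, and the code assumes that each entry returned with all_fields=True carries a "title" (or "display_name") key alongside "name".

```python
def find_stub(entries, title):
    """Return the "name" stub of the entry whose title matches, or None."""
    wanted = title.strip().lower()
    for entry in entries:
        label = entry.get("title") or entry.get("display_name") or ""
        if label.strip().lower() == wanted:
            return entry["name"]
    return None
```

For example, find_stub(countries, "Ukraine") would be expected to give "ukr", which can then be used in a groups: filter.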
https://data.humdata.org/api/3/action/package_search?q=groups:ukr
or in Python,
packages = ckan.action.package_search(q="groups:ukr")
As described in 2.3. Sorting results, if you want the most-recent datasets first, you can
add a sort parameter:
https://data.humdata.org/api/3/action/package_search?q=groups:ukr&sort=metadata_created%20desc
or in Python,
packages = ckan.action.package_search(
    q="groups:ukr",
    sort="metadata_created desc"
)
https://data.humdata.org/api/3/action/organization_list?all_fields=true
or in Python,
orgs = ckan.action.organization_list(all_fields=True)
As with countries, use the name field from these results in your queries.
The following query will return a list of IPC’s datasets (57 of them as of November 2024):
https://data.humdata.org/api/3/action/package_search?q=organization:ipc
or in Python,
packages = ckan.action.package_search(q="organization:ipc")
https://data.humdata.org/api/3/action/tag_list?vocabulary_id=Topics&all_fields=true
or in Python,
topics = ckan.action.tag_list(
    vocabulary_id="Topics",
    all_fields=True
)
As with countries and organizations, use the name field from these results in your
queries.
https://data.humdata.org/api/3/action/package_search?q=vocab_Topics:%22gender-based%20violence-gbv%22
WARNING: If the topic tag contains whitespace (as many in HDX unfortunately do), you
will have to both URL-encode the whitespace and surround the tag name in quotation
marks when constructing the URL. This is an easy trap to fall into when working with the
HDX CKAN API.
In Python, the quotation marks are also required (but not, obviously, the URL encoding):
packages = ckan.action.package_search(
    q="vocab_Topics:\"gender-based violence-gbv\""
)
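Because the quoting rule is easy to forget, it can help to build filters through one small function. This is a sketch with a hypothetical helper name; it adds quotation marks only when the value contains whitespace, and escapes any embedded quotation marks or backslashes with a backslash.

```python
def field_filter(field, value):
    """Return a field:value filter, quoting the value if it contains whitespace."""
    if any(ch.isspace() for ch in value):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '%s:"%s"' % (field, escaped)
    return "%s:%s" % (field, value)


print(field_filter("vocab_Topics", "gender-based violence-gbv"))
# vocab_Topics:"gender-based violence-gbv"
```

The same function works for any other field whose values may contain spaces.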
https://data.humdata.org/api/3/action/package_search?rows=0&fq=dataset_type:dataset&facet.field=[%22dataseries_name%22]&facet.limit=1000
https://data.humdata.org/api/3/action/package_search?q=dataseries_name:%22IOM%20-%20DTM%20Baseline%20Assessment%22
WARNING: If the data-series name contains whitespace (as many in HDX unfortunately
do), you will have to both URL-encode the whitespace and surround the data-series name
in quotation marks when constructing the URL. This is an easy trap to fall into when
working with the HDX CKAN API.
In Python, you don’t need to URL-encode the whitespace, but you still need to quote it:
packages = ckan.action.package_search(
    q="dataseries_name:\"IOM - DTM Baseline Assessment\""
)
https://data.humdata.org/api/3/action/package_search?q=volunteers
packages = ckan.action.package_search(q="volunteers")
For information on using wildcards in text searches, see B.3. Querying with wildcards
and ranges.
4. Complex queries
The API calls in 3. Simple search examples won’t always be enough. To locate a specific
dataset with confidence, you will need to combine multiple filters, sorting specifications,
and (in some cases) free-text search. The following examples show how your code can
use these together to find a relevant dataset automatically.
As of December 2024, there is no data series for all OCHA 3Ws, but there is a general
CKAN topic tag "who is doing what and where-3w-4w-5w" (see 3.3. Finding
datasets by topic tag), so this makes a good starting point:
vocab_Topics:"who is doing what and where-3w-4w-5w"
Next, we want to narrow the results down to Lebanon (see 3.1. Finding datasets by
country/group), so we add the filter
groups:lbn
And we want only 3Ws from OCHA, not from other organisations (see 3.2. Finding
datasets by provider). That’s trickier, because each OCHA field office is a separate
organisation on HDX. For OCHA Lebanon, the HDX organisation is "ocha-lebanon".
So we also add
organization:ocha-lebanon
(In this case, we could have left out the country filter, but in others, one OCHA field
office might produce 3Ws for multiple countries, so it’s usually best to leave it in.)
And finally, we’ll want to sort the results so that the latest 3W appears first (see 2.3.
Sorting results), so we will add the sort parameter
last_modified desc
https://data.humdata.org/api/3/action/package_search?q=vocab_Topics%3A%22who%20is%20doing%20what%20and%20where-3w-4w-5w%22%20groups:lbn%20organization:ocha-lebanon&sort=last_modified%20desc
or in Python,
query_parts = (
    "vocab_Topics:\"who is doing what and where-3w-4w-5w\"",
    "groups:lbn",
    "organization:ocha-lebanon",
)
packages = ckan.action.package_search(
    q=" ".join(query_parts),
    sort="last_modified desc"
)
The first result should reliably be the latest available OCHA 3W for Lebanon. This will
work even if the dataset name and URL change on HDX.
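Picking the first result safely can be sketched like this. latest_package is a hypothetical helper: it asks for a single row and returns None when nothing matches, so that the calling code can tell an empty search apart from a real dataset.

```python
def latest_package(search, query_parts, sort="last_modified desc"):
    """Return the first dataset for the combined filters, or None if none match.

    search is any callable with the package_search signature, for example
    ckan.action.package_search.
    """
    page = search(q=" ".join(query_parts), sort=sort, rows=1)
    results = page["results"]
    return results[0] if results else None
```

With the query_parts tuple shown earlier, latest_package(ckan.action.package_search, query_parts) would be expected to return the metadata for the most recently modified matching dataset.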
We also need to set the group to "ven" for Venezuela (see 3.1. Finding datasets by
country/group):
groups:ven
last_modified desc
so that the first result will be the most-recent one in case there is more than one result
(see 2.3. Sorting results).
https://data.humdata.org/api/3/action/package_search?q=dataseries_name:%22WFP%20-%20Food%20Prices%22%20groups:ven&sort=last_modified%20desc
query_parts = (
    "dataseries_name:\"WFP - Food Prices\"",
    "groups:ven",
)
packages = ckan.action.package_search(
    q=" ".join(query_parts),
    sort="last_modified desc"
)
4.4. Sex- and age-disaggregated datasets related to refugees
Now, let’s try a more-thematic approach. We want to find all datasets that contain sex-
and age-disaggregated data about refugees. In this case, we want to combine two topic
tags, as introduced in 3.3. Finding datasets by topic tag (note that we have to quote the
first one because of the internal spaces):
https://data.humdata.org/api/3/action/package_search?q=vocab_Topics:%22sex%20and%20age%20disaggregated%20data-sadd%22%20vocab_Topics:refugees
query_parts = (
    "vocab_Topics:\"sex and age disaggregated data-sadd\"",
    "vocab_Topics:refugees",
)
packages = ckan.action.package_search(
    q=" ".join(query_parts)
)
Appendix A. Syndication feeds for notifications
As an alternative to using the CKAN API to find data on HDX, you can use Atom (similar
to RSS) syndication feeds to receive notifications of any new or modified datasets
matching your search criteria. You construct searches the same way as for the CKAN API,
but the URL pattern looks like this (using a simple text search for "food"):
https://data.humdata.org/feeds/dataset.atom?q=food
The result is a list of entries like this (in XML), though libraries for most major
programming languages mean that you rarely have to deal with the XML markup directly:
<entry>
  <id>https://data.humdata.org/dataset/efad2587-3c06-4530-ba12-1c6e8ae393db</id>
  <title>Guinea - HungerMap data</title>
  <updated>2024-12-10T14:13:31.694584+00:00</updated>
  <content>HungerMapLIVE is ...</content>
  <link href="https://data.humdata.org/api/3/action/package_show?id=efad2587-3c06-4530-ba12-1c6e8ae393db" rel="alternate"/>
  <link href="https://data.humdata.org/api/3/action/package_show?id=wfp-hungermap-data-for-gin" rel="enclosure"/>
  <category term="food security"/>
  <category term="hxl"/>
  <category term="indicators"/>
  <published>2024-11-25T21:44:13.878433+00:00</published>
</entry>
Note that the entry contains a direct link into the HDX CKAN API to download the
package metadata (see 1.3. Anatomy of a dataset). The first entry will be the
most-recently-modified result, and so on, so you can use this to get automated update
notifications for any of the searches described in the cookbook.
There are also simpler, dedicated URLs to get updates for a country or organization
without using the fielded search syntax. For example, this feed will always return the
most-recently-updated public datasets related to Afghanistan:
https://data.humdata.org/feeds/group/afg.atom
And this feed will always return the latest public datasets provided by the World Food
Programme:
https://data.humdata.org/feeds/organization/wfp.atom
You can consume these feeds programmatically using a library like atoma in Python, or
you can load them into a feed reader like Feedly or NetNewsWire for human
consumption (alongside blogs, news articles, and other syndicated information).
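As a dependency-free alternative to a dedicated feed library, the entries can also be read with Python's built-in xml.etree.ElementTree. This sketch swaps that module in for atoma; the function name is illustrative, and it assumes the feed uses the standard Atom namespace (http://www.w3.org/2005/Atom).

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, in the {uri}tag form that ElementTree expects.
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text):
    """Return a list of dictionaries, one per <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        # Map each link's rel value ("alternate", "enclosure") to its href.
        links = {
            link.get("rel"): link.get("href")
            for link in entry.findall(ATOM + "link")
        }
        entries.append({
            "title": entry.findtext(ATOM + "title"),
            "updated": entry.findtext(ATOM + "updated"),
            "links": links,
            "categories": [
                category.get("term")
                for category in entry.findall(ATOM + "category")
            ],
        })
    return entries
```

Each dictionary's "links" value then holds the package_show URLs from the entry, ready to be fetched for the full metadata.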
Appendix B. Advanced CKAN search features
(Contributed by Ian Hopkinson)
CKAN and HDX provide more-advanced search features that aren’t covered in this
cookbook, but might be useful for specific needs.
B.2. Querying date fields
CKAN’s date search facilities are powerful but not always obvious. You can do an exact
search for a datetime with a query like
metadata_created:"2019-12-04T10:23:27.806321Z". Note that if the trailing Z is omitted,
the search fails with an Invalid Date String error, even though package_search returns
dates without a trailing Z. Date fields can also be queried with a range expression,
which allows the special values NOW, DAY, MONTH, YEAR, HOUR, and MINUTE. These can be
combined with "normal" dates using the +, -, and / operators (/ rounds down to the start
of the given unit):
Example:
https://data.humdata.org/api/action/package_search?q=last_modified:[NOW-1DAY%20TO%20NOW]
or in Python,
packages = ckan.action.package_search(
    q="last_modified:[NOW-1DAY TO NOW]"
)
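The trailing-Z pitfall and the range syntax can both be hidden behind small helpers. This is a sketch with hypothetical function names; the range helper follows the NOW-1DAY pattern shown above, on the assumption that any whole number of days is accepted in place of 1.

```python
def exact_date_filter(field, timestamp):
    """Build an exact-match date filter, appending the trailing Z if missing."""
    if not timestamp.endswith("Z"):
        timestamp += "Z"
    return '%s:"%s"' % (field, timestamp)


def modified_since(days, field="last_modified"):
    """Build a range filter covering the last `days` days up to now."""
    return "%s:[NOW-%dDAY TO NOW]" % (field, days)


print(exact_date_filter("metadata_created", "2019-12-04T10:23:27.806321"))
# metadata_created:"2019-12-04T10:23:27.806321Z"
print(modified_since(7))
# last_modified:[NOW-7DAY TO NOW]
```

exact_date_filter is meant for timestamps copied straight from package_search results, which arrive without the Z.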
B.3. Querying with wildcards and ranges
The multicharacter (*) and single-character (?) wildcard operators are supported but
cannot be used in quoted search terms (use a backslash instead to escape whitespace).
Example:
Find all datasets with the source "ETH Zurich Cli?ada" where "?" represents any letter:
https://data.humdata.org/api/action/package_search?q=dataset_source:ETH\%20Zurich\%20Cli?ada
or in Python,
packages = ckan.action.package_search(
    q="dataset_source:ETH\\ Zurich\\ Cli?ada"
)
Example:
Find all datasets with the source "ETH Zurich Cli*" where "*" represents 0 or more
letters:
https://data.humdata.org/api/action/package_search?q=dataset_source:ETH\%20Zurich\%20Cli*
or in Python,
packages = ckan.action.package_search(
    q="dataset_source:ETH\\ Zurich\\ Cli*"
)
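Writing the doubled backslashes by hand is error-prone, so the escaping can be done in code instead. This is a small sketch with a hypothetical helper name; it escapes spaces only and leaves the wildcard characters untouched.

```python
def escape_whitespace(term):
    """Backslash-escape spaces so wildcards still work in an unquoted term."""
    return term.replace(" ", "\\ ")


print("dataset_source:" + escape_whitespace("ETH Zurich Cli*"))
# dataset_source:ETH\ Zurich\ Cli*
```

The printed string is what the search engine receives; the doubled backslashes in the examples above are only Python's way of writing a single backslash inside a string literal.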
As well as these simple wildcard usages, the search API also supports range queries,
which can include wildcards.
Example:
https://data.humdata.org/api/action/package_search?q=num_resources:[2%20TO%205]
Or in Python,
packages = ckan.action.package_search(
    q="num_resources:[2 TO 5]"
)
Range queries can be used to select values which are not null with the query
fieldname:[* TO *]
whereas the query
fieldname:*
includes datasets that have null values as well as all other values.