Downloading species occurrence data using the GBIF web-service API
Presentation · September 2016
DOI: 10.13140/RG.2.2.19572.35202
Dag Terje Filip Endresen
University of Oslo
All content following this page was uploaded by Dag Terje Filip Endresen on 04 October 2016.
NINA, Trondheim, 15th September 2016
GBIF data use
Dag Endresen
GBIF Norway
UiO Natural History Museum in Oslo
University of Oslo
Thursday, September 15th, 2016
Slides: CC-BY-4.0, GBIF.no
Status
14th September 2016
GBIF enables free and open access to
biodiversity data online.
We are an international government-initiated and -funded initiative focused on
making biodiversity data available to all and anyone, for scientific research,
conservation and sustainable development.
GBIF provides a data discovery system
that is dependent on resolvable stable identifiers for efficient
functionality
global registry data portal
GBIF and GEO
Intergovernmental group on earth observations
GEO BON
Biodiversity observation network
Data Integration & Interoperability
GBIF provides the infrastructure delivering species occurrence
data in GEO.
GBIF BY THE NUMBERS
649,054,525
species occurrence records
32,440
datasets
813
data-publishing institutions
http://www.gbif.org | 06 JUN 2016
GBIF BY THE NUMBERS: MAY 2016
+3,818,408
species occurrence records
+4,267
datasets
+4
data-publishing institutions
http://www.gbif.org | 6 JUN 2016
data mobilization
DATA PUBLISHED THROUGH GBIF.ORG
[Figure: occurrence records published through GBIF.org, in millions; axis from 100 to 700]
http://www.gbif.org | 6 JUN 2016
Asia (lack of data)
Africa (lack of data)
participation
MAP OF GBIF COUNTRY PARTICIPANTS
August 2016
data publishing
DATA—BY GBIF PARTICIPANT Status May 2016
[Pie charts: records by participant country; labels include Other, United States, South Africa, Costa Rica, Australia, Denmark, Norway, Netherlands, Belgium, Germany and Spain]
Number of new records published, top 10 participant countries (1 to 31 May 2016)
 1. United States    3,348,499
 2. Denmark          2,972,094
 3. Germany          2,868,240
 4. Norway           2,322,797
 5. Spain            2,238,363
 6. Belgium          1,620,423
 7. Netherlands      1,094,804
 8. Australia          859,896
 9. Costa Rica         810,035
10. South Africa       436,236
Total number of records published, top 10 participant countries (as of 31 May 2016)
 1. United States  271,901,500
 2. Sweden          53,776,182
 3. United Kingdom  49,786,646
 4. France          39,896,982
 5. Australia       37,489,401
 6. Netherlands     24,241,092
 7. Norway          23,811,863
 8. Germany         22,151,479
 9. Finland         16,612,735
10. Spain           13,630,866
NOTE: Datasets are assigned to countries according to the location of the publishing institution,
including aggregated datasets with contributors from many other countries.
http://www.gbif.org | 09 JUN 2016
use of gbif.org
DATA DOWNLOAD REQUESTS, BY COUNTRY
1 January – 31 May 2016
Total of 37,552 requests from 5,131 users in 127 countries, islands and territories
 1. United States   7128
 2. Mexico          5526
 3. Brazil          3079
 4. United Kingdom  2670
 5. Spain           2478
 6. Colombia        2235
 7. Italy           1319
 8. China           1263
 9. France           949
10. Australia        858
Norwegian scientists generally use Artskart…
Requests for download do not necessarily result in data actually being downloaded. Based on country indicated by user login.
06 JUN 2016
data use
CITATIONS IN PEER-REVIEWED RESEARCH
Annual number of peer-reviewed publications using GBIF-mediated data
9 JUN 2016
research use
USE CITATIONS, BY COUNTRY OF AUTHORS
May 2016
 1. United States   15
 2. Germany          9
 3. China            5
 3. France           5
 3. Spain            5
 3. United Kingdom   5
 7. Australia        4
 7. Brazil           4
 9. Canada           3
 9. Netherlands      3
 9. South Africa     3
Number of research publications in May 2016 citing use of
GBIF-mediated data, ranked by country according to affiliation of author.
Top 11 countries shown.
Total 2016
 1. United States   49
 2. Germany         22
 3. France          18
 4. China           17
 5. Mexico          14
 5. Brazil          14
 5. United Kingdom  14
 8. Australia       11
 8. Spain           11
10. Canada          10
Number of research publications in 2016 citing use of GBIF-mediated data,
ranked by country according to affiliation of author. Top 10 countries shown.
10 JUN 2016
research use
RESEARCH EXAMPLES (FOR NORWAY)
• Araújo R, Assis J, Aguillar R, Airoldi L, Bárbara I, Bartsch I, Bekkby T et al. (2016)
Status, trends and drivers of kelp forests in Europe: an expert assessment.
Biodiversity and Conservation 25(7) 1319-1348.
• Jb N (2016)
Some interesting lichenized fungi from old Fraxinus excelsior and Ulmus glabra in
Norway, including four new country records. Graphis Scripta 28(1-2) 17-21.
• Newbold T, Hudson L, Hill S, Contu S, Gray C, Scharlemann J, Sheil D et al. (2016)
Global patterns of terrestrial assemblage turnover within and among land uses.
Ecography.
http://www.gbif.org/country/NO/publications
A complete archive of research citing use of GBIF can be accessed at http://www.mendeley.com/groups/1068301/gbif-public-library
10 JUN 2016
GBIF portal:
22.0 million occurrences with locations in Norway.
Published from 31 countries worldwide.
Updated 5 September 2016
GBIF portal:
21.5 million occurrences from Norwegian institutions.
Coverage 219 countries worldwide.
Updated 5 September 2016
STATUS FOR NORDIC GBIF NODES (DATA HOSTED BY…)
http://www.gbif.org/country/NO
Sept 2016   Datasets   Occurrences
Denmark     66 + 1     10 905 213
Finland     54          3 611 729
Iceland     4             458 705
Norway      112 + 2    21 684 727
Sweden      42         53 787 704
Download data
GBIF DATA PORTAL
SPECIES SEARCH
Portal API
webservices
GBIF DATA PORTAL API
An interface to access
data published through
the GBIF network using
web services.
PORTAL API
GBIF Data Portal API:
http://api.gbif.org/v1/ (+parameters)
Summary and information:
http://www.gbif.org/developer/summary
The RESTful API takes search parameters as
key=value pairs and responds with JSON
content.
RESTful query format
JSON response type
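As a minimal sketch of the query format described above, the following Python snippet assembles a request URL from key=value pairs and decodes a JSON body. The helper names build_url and parse_response are illustrative only and are not part of any GBIF client; no network request is made here.

```python
import json
from urllib.parse import urlencode

API_ROOT = "http://api.gbif.org/v1/"  # base URL as given on the slide


def build_url(path, **params):
    """Join the API root, a resource path and key=value query parameters."""
    url = API_ROOT + path.lstrip("/")
    query = urlencode(params)
    return url + "?" + query if query else url


def parse_response(body):
    """Decode a JSON response body (text or bytes) into Python objects."""
    return json.loads(body)


# Example: the dataset search by publishing country used later in the deck.
print(build_url("dataset/search", publishingCountry="NO"))
```

The resulting URL can be pasted into a browser or handed to any HTTP client; the body that comes back is what parse_response would decode.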
GBIF API SECTIONS
• Registry
information about the datasets, organizations (e.g. data
publishers), networks and the means to access them (technical
endpoints)
• Species
information about species and higher taxa, and utility services for
interpreting names and looking up the identifiers (access to all
published checklists in the GBIF checklist bank)
• Occurrence
occurrence information crawled and indexed by GBIF and search
services to do real time paged search and asynchronous
download services to do large batch downloads
• Maps
simple services to show the maps of GBIF mobilized content
API EXAMPLE : DATASET
Search for datasets by publishing country:
http://api.gbif.org/v1/dataset/search?publishingCountry=NO
Dataset information (UiO NHM Lichens):
http://api.gbif.org/v1/dataset/7948250c-6958-4a29-a670-ed1015b26252
Contacts for a dataset:
http://api.gbif.org/v1/dataset/7948250c-6958-4a29-a670-ed1015b26252/contact
Dataset endpoint (get the download URL):
http://api.gbif.org/v1/dataset/7948250c-6958-4a29-a670-ed1015b26252/endpoint
http://www.gbif.org/developer/registry
API EXAMPLE : SPECIES
List all name usages (across all checklists):
http://api.gbif.org/v1/species?name=Beta%20vulgaris
Name usage across checklists (Beta vulgaris, 5383920):
http://api.gbif.org/v1/species/5383920/related
Name parsed into epithets and author etc.:
http://api.gbif.org/v1/parser/name?name=Abies%20alba%20Mill.%20sec.%20Markus%20D.
{"scientificName": "Abies alba Mill. sec. Markus D.",
"type": "SCINAME",
"genusOrAbove": "Abies",
"specificEpithet": "alba",
"authorsParsed": true,
"authorship": "Mill.",
"sensu": "sec. Markus D.",
"canonicalName": "Abies alba",
"canonicalNameWithMarker": "Abies alba",
"canonicalNameComplete": "Abies alba Mill."
}
http://www.gbif.org/developer/species
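To illustrate how the parsed-name response might be consumed, here is a small Python sketch that reads the JSON shown on the slide and pulls out the name parts. The function name summarize_parsed_name is hypothetical, and accepting either a single object or a list of objects is an assumption made for robustness, not something the slide states.

```python
import json

# Parser response copied from the slide above.
SLIDE_RESPONSE = """
{"scientificName": "Abies alba Mill. sec. Markus D.",
 "type": "SCINAME",
 "genusOrAbove": "Abies",
 "specificEpithet": "alba",
 "authorsParsed": true,
 "authorship": "Mill.",
 "sensu": "sec. Markus D.",
 "canonicalName": "Abies alba",
 "canonicalNameWithMarker": "Abies alba",
 "canonicalNameComplete": "Abies alba Mill."}
"""


def summarize_parsed_name(body):
    """Return genus, epithet, authorship and canonical name from a parser response."""
    data = json.loads(body)
    # Assumption: the service may wrap the record in a list; take the first item.
    record = data[0] if isinstance(data, list) else data
    return {
        "genus": record.get("genusOrAbove"),
        "epithet": record.get("specificEpithet"),
        "authorship": record.get("authorship"),
        "canonical": record.get("canonicalName"),
    }


print(summarize_parsed_name(SLIDE_RESPONSE))
```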
API EXAMPLE : OCCURRENCE
List occurrences of Beta vulgaris:
http://api.gbif.org/v1/species/match?name=Beta+vulgaris => taxonKey
http://api.gbif.org/v1/occurrence/search?taxonKey=5383920
List occurrences from Norway (of Beta vulgaris):
http://api.gbif.org/v1/occurrence/search?publishingCountry=NO
http://api.gbif.org/v1/occurrence/search?publishingCountry=NO&taxonKey=5383920
Information about a single occurrence record:
http://api.gbif.org/v1/occurrence/1040970640
http://api.gbif.org/v1/occurrence/1040970640/fragment
http://api.gbif.org/v1/occurrence/1040970640/verbatim
List occurrence counts for datasets of country (or taxon):
http://api.gbif.org/v1/occurrence/counts/datasets?country=NO
http://www.gbif.org/developer/occurrence
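The two-step lookup above (match a name to obtain a key, then search occurrences with that key) could be chained roughly as follows. This is a sketch that only builds the URLs; the function names are illustrative, and reading the key from a "usageKey" field follows the "usageKey/speciesKey" note elsewhere in this deck.

```python
from urllib.parse import urlencode

API = "http://api.gbif.org/v1"


def match_url(name):
    """URL for step 1: match a scientific name against the GBIF backbone."""
    return f"{API}/species/match?{urlencode({'name': name})}"


def search_url(match_response, **filters):
    """URL for step 2: occurrence search using the key from a decoded match response."""
    params = {"taxonKey": match_response["usageKey"], **filters}
    return f"{API}/occurrence/search?{urlencode(params)}"


# A decoded match response reduced to the one field used here.
beta_vulgaris = {"usageKey": 5383920}
print(match_url("Beta vulgaris"))
print(search_url(beta_vulgaris, publishingCountry="NO"))
```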
API EXAMPLE : DOWNLOAD DATA
Lookup speciesKey (1) and download occurrences (2):
http://api.gbif.org/v1/species/match?verbose=false&kingdom=Plantae&name=Beta+vulgaris
=> usageKey/speciesKey = 5383920
http://api.gbif.org/v1/occurrence/search?taxonKey=5383920 [&limit=1000&offset=0]
=> notice: count = 25 513
=> then: page through results…
(using offset & limit)
http://api.gbif.org/v1/occurrence/download/request
[POST] => downloadKey (see next slide)
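Paging with offset and limit can be sketched as a generator that keeps requesting pages until the reported count is reached. The names below are illustrative; the "results" field, the default page size of 300, and the possibility that the server caps the page size are assumptions not stated on the slide. The page fetcher is passed in as a function so the loop can be tried without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(taxon_key, limit, offset):
    """Fetch one page of occurrence search results (performs a network request)."""
    query = urlencode({"taxonKey": taxon_key, "limit": limit, "offset": offset})
    with urlopen(f"http://api.gbif.org/v1/occurrence/search?{query}") as resp:
        return json.load(resp)


def iter_occurrences(fetch, taxon_key, limit=300):
    """Yield every record by advancing the offset until count is reached."""
    offset = 0
    while True:
        page = fetch(taxon_key, limit, offset)
        results = page.get("results", [])  # assumed field holding the records
        yield from results
        offset += len(results)
        # Stop on an empty page or once all counted records have been seen.
        if not results or offset >= page.get("count", 0):
            break


# Usage (network): for record in iter_occurrences(fetch_page, 5383920): ...
```

For result sets in the tens of thousands, such as the count noted above, the asynchronous download request is likely the better route than paging.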
API EXAMPLE : ASYNCHRONOUS (1)
Request asynchronous download:
$ curl -i --user yourGbifUserName:yourGbifPassword \
  -H "Content-Type: application/json" -H "Accept: application/json" \
  -X POST -d @filter.json \
  http://api.gbif.org/v1/occurrence/download/request >> log.txt
Search parameters in a JSON text file, filter.json (in the current
directory, or give curl the path to the file):
{
"creator":"yourGbifUserName",
"notification_address": ["[email protected]"],
"predicate":
{
"type":"and",
"predicates":
[{"type":"equals","key":"HAS_COORDINATE","value":"false"},
{"type":"equals","key":"TAXON_KEY","value":"5383920"}]
}
}
DOWNLOADS ARE AVAILABLE IN THE
PORTAL (FROM YOUR USER PROFILE)
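Instead of editing filter.json by hand, the request body could be generated. The sketch below mirrors the keys shown on the slide (creator, notification_address, predicate); the function names and the example address are placeholders, and the output file name matches the one passed to curl above.

```python
import json


def build_download_request(creator, email, taxon_key, has_coordinate):
    """Build the download request body with an 'and' predicate, as on the slide."""
    return {
        "creator": creator,
        "notification_address": [email],
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "HAS_COORDINATE",
                 "value": "true" if has_coordinate else "false"},
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
            ],
        },
    }


def write_filter(path, request):
    """Write the request as JSON, ready for: curl ... -d @filter.json"""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(request, handle, indent=2)


# Placeholder credentials; replace with your own GBIF user name and address.
request = build_download_request("yourGbifUserName", "user@example.org", 5383920, True)
print(json.dumps(request, indent=2))
```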
API EXAMPLE : ASYNCHRONOUS (2A)
Request asynchronous download:
function gbifapi {
  # $1 = taxonKey, $2 = scientific name (only written to the log)
  curl -i --user yourGbifUserName:yourGbifPassword \
    -H "Content-Type: application/json" -H "Accept: application/json" \
    -X POST -d "{\"creator\": \"yourGbifUserName\", \"notification_address\": [\"[email protected]\"], \"predicate\": {\"type\":\"and\", \"predicates\": [{\"type\":\"equals\",\"key\":\"HAS_COORDINATE\",\"value\":\"true\"}, {\"type\":\"equals\", \"key\":\"TAXON_KEY\", \"value\":\"$1\"}] }}" \
    http://api.gbif.org/v1/occurrence/download/request >> log.txt
  echo -e "\r\n$1 $2\r\n\r\n----------------\r\n\r\n" >> log.txt
}
$ gbifapi 4140730 "Aciachne acicularis"
$ gbifapi 4140704 "Aciachne flagellifera"
$ gbifapi 5289784 "Aegilops comosa"
…
API EXAMPLE : ASYNCHRONOUS (2B)
(…clean log.txt with the downloadKeys using regular
expressions…)
function gbifwget {
echo -e "\n\n----------------\n$1 $2 $3\n" >> log_wget.txt
wget http://api.gbif.org/v1/occurrence/download/request/$1.zip 2>&1 | tee /dev/tty >> log_wget.txt
mv $1.zip ./dwca/$2.zip 2>&1 | tee /dev/tty >> log_wget.txt
}
$ gbifwget 0006050-141024112412452 4140730 "Aciachne acicularis"
$ gbifwget 0006053-141024112412452 4140704 "Aciachne flagellifera"
$ gbifwget 0006056-141024112412452 5289784 "Aegilops comosa"
…
(work in progress…)
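The log-cleaning step mentioned above (pulling the downloadKeys out of log.txt with regular expressions) might look like the following sketch. The pattern assumes keys always have the shape seen in the examples on this slide, seven digits, a hyphen, then fifteen digits; that shape is an inference from those examples rather than a documented format, and the function names are illustrative.

```python
import re

# Assumed key shape, inferred from examples such as 0006050-141024112412452.
DOWNLOAD_KEY = re.compile(r"\b\d{7}-\d{15}\b")


def extract_download_keys(log_text):
    """Return the distinct download keys found in the log, in order of appearance."""
    keys = []
    for key in DOWNLOAD_KEY.findall(log_text):
        if key not in keys:
            keys.append(key)
    return keys


def download_url(key):
    """URL of the zipped result for a download key, as used by gbifwget above."""
    return f"http://api.gbif.org/v1/occurrence/download/request/{key}.zip"


sample_log = (
    "HTTP/1.1 201 Created\r\n\r\n0006050-141024112412452\r\n"
    "4140730 Aciachne acicularis\r\n\r\n----------------\r\n"
)
for key in extract_download_keys(sample_log):
    print(key, download_url(key))
```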
Slide by Daniel Amariles, 2013
MAPPING API V1.0
You can easily overlay GBIF content on
your own maps.
http://www.gbif.org/developer/maps
Slide by Daniel Amariles, 2013
MAPPING API V1.0
This service is intended for use with commonly used map clients such
as the Google Maps API, Leaflet JS library or Modest Maps JS library.
http://leafletjs.com/
http://modestmaps.com/
These libraries allow the GBIF layers to be visualized with other
content, such as those coming from Web Map Service (WMS)
providers. It should be noted that the mapping API is not a WMS
service, nor does it support WFS capabilities.
USEFUL TOOLS (JSON & REST)
REST client …
JSON client/parser …
JSONView (Firefox, Chrome, …)
http://jsonview.com/
Display formatted JSON in browser
R CRAN : jsonlite
http://cran.r-project.org/web/packages/jsonlite/
E.g. read json into a dataframe [link]
OpenRefine
http://openrefine.org/
ROPENSCI : RGBIF
library(rgbif)
key <- name_backbone(name='Hepatica nobilis', kingdom='Plantae')$speciesKey
sp <- occ_search(taxonKey=key, return='data', hasCoordinate=TRUE, limit=1000)
gbifmap(sp)
R CRAN
rOpenSci provides programmatic access to scientific data
with R (rgbif, taxize, EML, geonames, …).
https://github.com/ropensci
http://ropensci.org/packages/
http://ropensci.org/tutorials/rgbif_tutorial.html
http://ropensci.org/tutorials/taxize_tutorial.html
RASTER : WORLDCLIM, BIOCLIM LAYERS
# using GBIF data (bv) from the previous slide…
library(raster)
xy <- cbind('lon'=bv$decimalLongitude, 'lat'=bv$decimalLatitude);
env <- getData('worldclim', var='bio', res=10) # bioclim (pkg raster)
plot(env, 1) # plot the first bioclim layer
points(xy[,'lon'], xy[,'lat'], col='red') # plot points
bio <- extract(env, xy); # extract environment to points (pkg raster)
bv_bio <- cbind(bv, bio); # column-bind GBIF-data and bioclim
ROPENSCI : RWBCLIMATE
library(rWBclimate) # library() loads one package per call
library(ggplot2)
country_dat <- get_historical_temp(c("NOR", "SWE", "DNK", "FIN"), "year")
ggplot(country_dat, aes(x = year, y = data, group = locator)) +
theme_bw(base_size=18) + geom_point() + geom_path() +
labs(y="Average annual temperature of Nordic countries", x="Year") +
stat_smooth(se = FALSE, colour = "black") +
facet_wrap(~locator, scales = "free")
RESOLVE TAXONOMIC NAMES
library(taxize) # rOpenSci Taxize
gnr <- gnr_resolve(names = "Beta vuulgariss") # Misspelled name
gnr$results # display suggested names
submitted_name matched_name data_source_title score
1 Beta vuulgariss Beta vulgaris L. Catalogue of Life 0.75
2 Beta vuulgariss Beta vulgaris L. ITIS 0.75
3 Beta vuulgariss Beta vulgaris NCBI 0.75
4 Beta vuulgariss Beta vulgaris var.-gr. crassa Alef. GRIN Taxonomy for Plants 0.75
specieslist <- c("Beta vulgaris", "Phleum pratensis", "Nicotiana glauca")
classification(specieslist, db = 'itis') # lookup higher taxonomy
db = 'itis'
db = 'col'
Global Names Resolver: http://resolver.globalnames.org/
rOpenSci Taxize: http://ropensci.org/tutorials/taxize_tutorial.html
ROPENSCI : EML
library(EML) # library() loads one package per call
library(rfigshare)
description <- "My dataset published in GBIF"
eml_write(dat = dat, meta, title = "My Dataset",
description = description,
creator = "Your Name <[email protected]>", file = "dataset.xml")
eml_publish("dataset.xml", description = description,
categories = "Ecology", tags = "biodiversity", destination
= "figshare", visibility = "public")
meta <- eml_read("eml_example.xml")
GBIF API SUPPORT
Subscribe to the mailing-list for help and
information messages:
[email protected]
Node team at NHM, University of Oslo
Dag Endresen, Node manager
Christian Svindseth, Database manager
Fridtjof Mehlum, Research director
Einar Timdal, Associate professor
Geir Søli, Associate professor
Vidar Bakken, Consultant
Artsdatabanken, Trondheim
Wouter Koch
Nils Valland
NTNU University Museum
Anders Finstad, GBIF Science committee
Research Council of Norway
Per Backe-Hansen, Head of delegation