Eurosat Paper


EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land
Cover Classification

Article  in  IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing · August 2017
DOI: 10.1109/JSTARS.2019.2918242


All content following this page was uploaded by Patrick Helber on 18 January 2019.



JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1

EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
Patrick Helber (1,2), Benjamin Bischke (1,2), Andreas Dengel (1,2), Damian Borth (2)
(1) TU Kaiserslautern, Germany
(2) German Research Center for Artificial Intelligence (DFKI), Germany
{Patrick.Helber, Benjamin.Bischke, Andreas.Dengel, Damian.Borth}@dfki.de

Abstract: In this paper, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible and are provided by the Earth observation program Copernicus. We present a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with a total of 27,000 labeled and geo-referenced images. We provide benchmarks for this novel dataset with its spectral bands using state-of-the-art deep Convolutional Neural Networks (CNNs). With the proposed novel dataset, we achieved an overall classification accuracy of 98.57%. The resulting classification system opens a gate towards a number of Earth observation applications. We demonstrate how this classification system can be used for detecting land use and land cover changes and how it can assist in improving geographical maps. The geo-referenced dataset EuroSAT is made publicly available at https://github.com/phelber/eurosat.

Index Terms: Remote Sensing, Earth Observation, Satellite Images, Satellite Image Classification, Land Use Classification, Land Cover Classification, Dataset, Machine Learning, Deep Learning, Deep Convolutional Neural Network

Fig. 1: Land use and land cover classification based on Sentinel-2 satellite images. Patches are extracted with the purpose to identify the shown class. This visualization highlights the classes annual crop, river, highway, industrial buildings and residential buildings.

I. INTRODUCTION

We are currently on the verge of having public and continuous access to satellite image data for Earth observation. Governmental programs such as ESA's Copernicus and NASA's Landsat are making significant efforts to make such data freely available for commercial and non-commercial purposes with the intention to fuel innovation and entrepreneurship. With access to such data, applications in the domains of agriculture, disaster recovery, climate change, urban development, or environmental monitoring can be realized [37], [2], [3], [5]. However, to fully utilize the data for the previously mentioned domains, satellite images must first be processed and transformed into structured semantics [35]. One type of such fundamental semantics is Land Use and Land Cover Classification [1], [29]. The aim of land use and land cover classification is to automatically provide labels describing the represented physical land type or how a land area is used (e.g., residential, industrial).
As often in supervised machine learning, the performance of classification systems strongly depends on the availability of high-quality datasets with a suitable set of classes [21]. In particular when considering the recent success of deep Convolutional Neural Networks (CNN) [12], it is crucial to have large quantities of training data available to train such a network. Unfortunately, current land use and land cover datasets are small-scale or rely on data sources which do not allow the mentioned domain applications.
In this paper, we propose a novel satellite image dataset for the task of land use and land cover classification. The proposed EuroSAT dataset consists of 27,000 labeled images with 10 different land use and land cover classes. A significant difference from previous datasets is that the presented satellite image dataset is multi-spectral, covering 13 spectral bands in the visible, near infrared and short wave infrared part of the spectrum. In addition, the proposed dataset is georeferenced and based on openly and freely accessible Earth observation data allowing a unique range of applications. The labeled dataset EuroSAT is made publicly available at https://github.com/phelber/eurosat. Further, we provide a full benchmark demonstrating a robust classification performance which is the basis for developing applications for the previously mentioned domains. We outline how the classification model can be used for detecting land use or land cover changes and how it can assist in improving geographical maps.
We provide this work in the context of the recently published EuroSAT dataset, which can be used similar to [18] as a basis for a large-scale training of deep neural networks for
the task of satellite image classification.
In this paper, we make the following contributions:
• We introduce the first large-scale patch-based land use and land cover classification dataset based on Sentinel-2 satellite images. Every image in the dataset is labeled and geo-referenced. We release the RGB and the multi-spectral (MS) version of the dataset.
• We provide benchmarks for the proposed EuroSAT dataset using Convolutional Neural Networks (CNNs).
• We evaluate the performance of each spectral band of the Sentinel-2 satellite for the task of patch-based land use and land cover classification.

II. RELATED WORK

In this section, we review previous studies in land use and land cover classification. In this context, we present remotely sensed aerial and satellite image datasets. Furthermore, we present state-of-the-art image classification methods for land use and land cover classification.

A. Classification Datasets

The classification of remotely sensed images is a challenging task. The progress of classification in the remote sensing area has particularly been inhibited due to the lack of reliably labeled ground truth datasets. A popular and intensively studied [6], [19], [20], [27], [29] remotely sensed image classification dataset known as the UC Merced (UCM) land use dataset was introduced by Yang et al. [29]. The dataset consists of 21 land use and land cover classes. Each class has 100 images and the contained images measure 256x256 pixels with a spatial resolution of about 30 cm per pixel. All images are in the RGB color space and were extracted from the USGS National Map Urban Area Imagery collection, i.e. the underlying images were acquired from an aircraft. Unfortunately, a dataset with 100 images per class is small-scale. Trying to enhance the dataset situation, various works used commercial Google Earth images to manually create novel datasets [22], [27], [28], [30] such as the two benchmark datasets PatternNet [39] and NWPU-RESISC45 [36]. The datasets are based on very-high-resolution images with a spatial resolution of up to 30 cm per pixel. Since the creation of a labeled dataset is extremely time-consuming, these datasets likewise consist of only a few hundred images per class. One of the largest datasets is the Aerial Image Dataset (AID). AID consists of 30 classes with 200 to 400 images per class. The 600x600 high-resolution images were also extracted from Google Earth imagery.
Compared to the EuroSAT dataset presented in this work, the previously listed datasets rely on commercial very-high-resolution and preprocessed images. The use of commercial and preprocessed very-high-resolution image data makes these datasets unsatisfying for real-world Sentinel-2 Earth observation applications as proposed in this work. Furthermore, while these datasets put a strong focus on increasing the number of covered classes, the datasets suffer from a low number of images per class. The spatial resolution of up to 30 cm per pixel, with the possibility to identify and distinguish classes like churches, schools etc., makes the presented datasets difficult to compare with the dataset proposed in this work.
A study closer to our work, provided by Penatti et al. [20], analyzed remotely sensed satellite images with a spatial resolution of 10 meters per pixel to classify coffee crops. Based on these images, Penatti et al. [20] introduced the Brazilian Coffee Scene (BCS) dataset. The dataset covers the two classes coffee crop and non-coffee crop. Each class consists of 1,423 images. The images consist of a red, green and near-infrared band.
Similar to the proposed EuroSAT dataset, Basu et al. [1] introduced the SAT-6 dataset relying on aerial images. This dataset has been extracted from images with a spatial resolution of 1 meter per pixel. The image patches are created using images from the National Agriculture Imagery Program (NAIP). SAT-6 covers the 6 different classes: barren land, trees, grassland, roads, buildings and water bodies. The proposed patches have a size of 28x28 pixels per image and consist of a red, green, blue and a near-infrared band.

B. Land Use and Land Cover Classification

Convolutional Neural Networks (CNNs) are a type of Neural Network [13] which, with the impressive results on image classification challenges [12], [21], [23], became the state-of-the-art image classification method in computer vision and machine learning.

Fig. 2: This illustration shows an overview of the patch-based land use and land cover classification process using satellite images. A satellite scans the Earth to acquire images of it. Patches extracted out of these images are used for classification. The aim is to automatically provide labels describing the represented physical land type or how the land is used. For this purpose, an image patch is fed into a classifier, in this illustration a neural network, and the classifier outputs the class shown on the image patch.
To classify remotely sensed images, various different feature extraction and classification methods (e.g., Random Forests) were evaluated on the introduced datasets. Yang et al. evaluated Bag-of-Visual-Words (BoVW) and spatial extension approaches on the UCM dataset [29]. Basu et al. analyzed deep belief networks, basic CNNs and stacked denoising autoencoders on the SAT-6 dataset [1]. Basu et al. also presented their own framework for the land cover classes introduced in the SAT-6 dataset. The framework extracts features from the input images, normalizes the extracted features and uses the normalized features as input to a deep belief network. Besides low-level color descriptors, Penatti et al. also evaluated deep CNNs on the UCM and BCS dataset [20]. In addition to deep CNNs, Castelluccio et al. intensively evaluated various machine learning methods (e.g., Bag-of-Visual-Words, spatial pyramid match kernels) for the classification of the UCM and BCS dataset.
In the context of deep learning, the used deep CNNs have been trained from scratch or fine-tuned by using a pretrained network [6], [19], [31], [36], [16]. The networks were mainly pretrained on the ILSVRC-2012 image classification challenge [21] dataset. Even though these pretrained networks were trained on images from a totally different domain, the features generalized well. Therefore, the pretrained networks proved to be suitable for the classification of remotely sensed images [17]. The presented works extensively evaluated all proposed machine learning methods and concluded that deep CNNs outperform non-deep learning approaches on the considered datasets [6], [17], [15], [27].

Fig. 3: The diagram illustrates the EuroSAT dataset creation process.

III. DATASET ACQUISITION

Besides NASA with its Landsat Mission, the European Space Agency (ESA) steps up efforts to improve Earth observation within its Copernicus program. Under this program, ESA operates a series of satellites known as Sentinels.
In this paper, we use multi-spectral image data provided by the Sentinel-2A satellite in order to address the challenge of land use and land cover classification. Sentinel-2A is one satellite in the two-satellite constellation of the identical land monitoring satellites Sentinel-2A and Sentinel-2B. The satellites were successfully launched in June 2015 (Sentinel-2A) and March 2017 (Sentinel-2B). Both sun-synchronous satellites capture the global Earth's land surface with a Multispectral Imager (MSI) covering the 13 different spectral bands listed in Table I. The three bands B01, B09 and B10 are intended to be used for the correction of atmospheric effects (e.g., aerosols, cirrus or water vapor). The remaining bands are primarily intended to identify and monitor land use and land cover classes. In addition to mainland, large islands as well as inland and coastal waters are covered by these two satellites. Each satellite will deliver imagery for at least 7 years with a spatial resolution of up to 10 meters per pixel. Both satellites carry fuel for up to 12 years of operation which allows for an extension of the operation. The two-satellite constellation generates a coverage of almost the entire Earth's land surface about every five days, i.e. the satellites capture each point in the covered area about every five days. This short repeat cycle as well as the future availability of the Sentinel satellites allows a continuous monitoring of the Earth's land surface for about the next 20 - 30 years. Most importantly, the data is openly and freely accessible and can be used for any application (commercial or non-commercial use).
We are convinced that the large volume of satellite data in combination with powerful machine learning methods will influence future research. Therefore, one of our key research aims is to make this large amount of data accessible for machine learning based applications. To construct an image classification dataset, we performed the following two steps:
1) Satellite Image Acquisition: We gathered satellite images of European cities distributed over 34 countries as shown in Fig. 5.
2) Dataset Creation: Based on the obtained satellite images, we created a dataset of 27,000 georeferenced and labeled image patches. The image patches measure 64x64 pixels and have been manually checked.

A. Satellite Image Acquisition

We have downloaded satellite images taken by the satellite Sentinel-2A via Amazon S3. We chose satellite images associated with the cities covered in the European Urban Atlas. The covered cities are distributed over the 34 European countries: Austria, Belarus, Belgium, Bulgaria, Cyprus, Czech Republic (Czechia), Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy / Holy See, Latvia, Lithuania, Luxembourg, Macedonia, Malta, Republic of Moldova, Netherlands, Norway, Poland, Portugal, Romania,
Slovakia, Slovenia, Spain, Sweden, Switzerland, Ukraine and United Kingdom.
In order to improve the chance of getting valuable image patches, we selected satellite images with a low cloud level. Besides the possibility to generate a cloud mask, ESA provides a cloud level value for each satellite image, which allows quickly selecting images with a low percentage of clouds covering the land scene.
We aimed to cover as many countries as possible in the EuroSAT dataset in order to cover the high intra-class variance inherent to remotely sensed images. Furthermore, we have extracted images recorded all over the year to get a variance as high as possible inherent in the covered land use and land cover classes. Within one class of the EuroSAT dataset, different land types of this class are represented such as different types of forests in the forest class or different types of industrial buildings in the industrial building class. Between the classes, there is a low positive correlation. The classes most common to each other are the two presented agricultural classes and the two classes representing residential and industrial buildings. The composition of the individual classes and their relationships are specified in the mapping guide of the European Urban Atlas [40]. An overview diagram of the dataset creation process is shown in Fig. 3.

Fig. 4: This overview shows sample image patches of all 10 classes covered in the proposed EuroSAT dataset: (a) Industrial Buildings, (b) Residential Buildings, (c) Annual Crop, (d) Permanent Crop, (e) River, (f) Sea & Lake, (g) Herbaceous Vegetation, (h) Highway, (i) Pasture, (j) Forest. The images measure 64x64 pixels. Each class contains 2,000 to 3,000 images. In total, the dataset has 27,000 geo-referenced images.

TABLE I: All 13 bands covered by Sentinel-2's Multispectral Imager (MSI). The identification, the spatial resolution and the central wavelength are listed for each spectral band.

Band                Spatial Resolution (m)   Central Wavelength (nm)
B01 - Aerosols      60                       443
B02 - Blue          10                       490
B03 - Green         10                       560
B04 - Red           10                       665
B05 - Red edge 1    20                       705
B06 - Red edge 2    20                       740
B07 - Red edge 3    20                       783
B08 - NIR           10                       842
B08A - Red edge 4   20                       865
B09 - Water vapor   60                       945
B10 - Cirrus        60                       1375
B11 - SWIR 1        20                       1610
B12 - SWIR 2        20                       2190

B. Dataset Creation

The Sentinel-2 satellite constellation provides about 1.6 TB of compressed images per day. Unfortunately, supervised machine learning is restricted even with this amount of data by the lack of labeled ground truth data. The generation of the benchmarking EuroSAT dataset was motivated by the objective of making this open and free satellite data accessible to various Earth observation applications and the observation that existing benchmark datasets are not suitable for the intended applications with Sentinel-2 satellite images. The dataset consists of 10 different classes with 2,000 to 3,000 images per class. In total, the dataset has 27,000 images. The patches measure 64x64 pixels. We have chosen 10 different land use and land cover classes based on the principle that they proved to be visible at the resolution of 10 meters per pixel and are covered frequently enough by the European Urban Atlas to generate thousands of image patches. To differentiate between different agricultural land uses, the proposed dataset covers the classes annual crop, permanent crop (e.g., fruit orchards, vineyards or olive groves) and pastures. The dataset also discriminates built-up areas. It therefore covers the classes highway, residential buildings and industrial buildings. The residential class is created using the urban fabric classes described in the European Urban Atlas. Different water bodies appear in the classes river and sea & lake. Furthermore, undeveloped environments such as forest and herbaceous vegetation are included. An overview of the covered classes with four samples per class is shown in Fig. 4.
We manually checked all 27,000 images multiple times and corrected the ground truth by sorting out mislabeled images as well as images full of snow or ice. Example images, which have been discarded, are shown in Fig. 6. The samples are intended to show industrial buildings. Clearly, no industrial
building is visible. Please note, the proposed dataset has not received atmospheric correction. This can result in images with a color cast. Extreme cases are visualized in Fig. 7. With the intention to encourage the classifier to also learn these cases, we did not filter the respective samples and let them flow into the dataset.
Besides the 13 covered spectral bands, the new dataset has three further central innovations. Firstly, the dataset is not based on non-free satellite images like Google Earth imagery, nor does it rely on data sources which are not updated on a high-frequency basis (e.g., NAIP used in [1]). Instead, an open and free Earth observation program whose satellites deliver images for the next 20 - 30 years is used, allowing real-world Earth observation applications. Secondly, the dataset uses a 10 times lower spatial resolution than the benchmark dataset closest to our research but at the same time distinguishes 10 classes instead of 6. For instance, we split up the built-up class into a residential and an industrial class or distinguish between different agricultural land uses. Thirdly, we release the EuroSAT dataset in a georeferenced version.
With the release of the geo-referenced EuroSAT we aim to make the large amount of Sentinel-2 satellite imagery accessible for machine learning approaches. Their effectiveness was successfully demonstrated in [32], [33], [34].

Fig. 5: EuroSAT dataset distribution. The georeferenced images are distributed all over Europe. The distribution is influenced by the number of represented cities per country in the European Urban Atlas.

IV. DATASET BENCHMARKING

As shown in previous work [6], [15], [17], [19], deep CNNs have been shown to outperform non-deep learning approaches in land use and land cover image classification. Accordingly, we use the state-of-the-art deep CNN models GoogLeNet [25] and ResNet-50 [9], [10] for the classification of the introduced land use and land cover classes. The networks make use of the inception module [25], [26], [24], [14] and the residual unit [9], [10]. For the proposed EuroSAT dataset, we also evaluated the performance of the 13 spectral bands with respect to the classification task. In this context, we evaluate the classification performance using single-band and band combination images.

A. Comparative Evaluation

We respectively split each dataset into a training and a test set (80/20 ratio). We ensured that the split is applied class-wise. While the red, green and blue bands are covered by almost all aerial and satellite image datasets, the proposed EuroSAT dataset consists of 13 spectral bands. For the comparative evaluation, we computed images in the RGB color space combining the bands red (B04), green (B03) and blue (B02). For benchmarking, we evaluated the performance of the Bag-of-Visual-Words (BoVW) approach using SIFT features and a trained SVM. In addition, we trained a shallow Convolutional Neural Network (CNN), a ResNet-50 and a GoogLeNet model on the training set. We calculated the overall classification accuracy to evaluate the performance of the different models on the considered datasets. In Table II we show how the approaches perform in case of different training-test splits for the EuroSAT RGB dataset.

TABLE II: Classification accuracy (%) of different training-test splits on the EuroSAT dataset.

Method                       10/90  20/80  30/70  40/60  50/50  60/40  70/30  80/20  90/10
BoVW (SVM, SIFT, k = 10)     54.54  56.13  56.77  57.06  57.22  57.47  57.71  58.55  58.44
BoVW (SVM, SIFT, k = 100)    63.07  64.80  65.50  66.16  66.25  66.34  66.50  67.22  66.18
BoVW (SVM, SIFT, k = 500)    65.62  67.26  68.01  68.52  68.61  68.74  69.07  70.05  69.54
CNN (two layers)             75.88  79.84  81.29  83.04  84.48  85.77  87.24  87.96  88.66
ResNet-50                    75.06  88.53  93.75  94.01  94.45  95.26  95.32  96.43  96.37
GoogLeNet                    77.37  90.97  90.57  91.62  94.96  95.54  95.70  96.02  96.17

It can be seen that all CNN approaches outperform the BoVW method and, overall, deep CNNs perform better than shallow CNNs. Nevertheless, the shallow CNN classifies the EuroSAT classes with a classification accuracy of up to 89.03%. Please see [6], [19], [22] for the benchmarking performance of the other datasets on different training-test splits.

TABLE III: Benchmarked classification accuracy (%) of the two best performing classifiers GoogLeNet and ResNet-50 with an 80/20 training-test split. Both CNNs have been pretrained on the image classification dataset ILSVRC-2012 [21].

Method      UCM    AID    SAT-6  BCS    EuroSAT
ResNet-50   96.42  94.38  99.56  93.57  98.57
GoogLeNet   97.32  93.99  98.29  92.70  98.18

Table III lists the achieved classification results for the two best performing CNN models GoogLeNet and ResNet-50. In
this experiment, the GoogLeNet and ResNet-50 CNN models were pretrained on the ILSVRC-2012 image classification dataset [21]. In all fine-tuning experiments, we first trained the last layer with a learning rate of 0.01. Afterwards, we fine-tuned through the entire network with a low learning rate between 0.001 and 0.0001. With a fine-tuned network we achieve a classification accuracy about 2% higher compared to randomly initialized versions of the networks which have been trained on the EuroSAT dataset with the same training-test split (see Table II).
The deep CNNs achieve state-of-the-art results on the UCM dataset and outperform previous results on the other three presented datasets by about 2-4% (AID, SAT-6, BCS) [6], [19], [22]. Table III shows that the ResNet-50 architecture performs best on the introduced EuroSAT land use and land cover classes. In order to allow an evaluation on the class level, Fig. 8 shows the confusion matrix of this best performing network. It is shown that the classifier sometimes confuses the agricultural land classes as well as the classes highway and river.

Fig. 6: Four examples of bad image patches, which are intended to show industrial buildings. Clearly, no industrial building is shown due to clouds, mislabeling, dead pixels or ice/snow.

Fig. 7: Color cast due to atmospheric effects.

TABLE IV: Classification accuracy (%) of a fine-tuned ResNet-50 CNN on the proposed EuroSAT dataset with the three different band combinations color-infrared (CI), shortwave-infrared (SWIR) and RGB as input.

Band Combination   Accuracy (ResNet-50)
CI                 98.30
RGB                98.57
SWIR               97.05

Fig. 8: Confusion matrix of a fine-tuned ResNet-50 CNN on the proposed EuroSAT dataset using satellite images in the RGB color space. Rows are true labels; columns are predicted labels in the same class order as the rows.

True label    An.Crop  Forest  Herb.  Highway  Indust.  Pasture  Per.Crop  Resid.  River  Sea&Lake
An. Crop      0.98     0.0     0.0    0.0      0.0      0.0      0.02      0.0     0.0    0.0
Forest        0.0      1.0     0.0    0.0      0.0      0.0      0.0       0.0     0.0    0.0
Herbaceous    0.0      0.0     0.99   0.0      0.0      0.01     0.0       0.0     0.0    0.0
Highway       0.0      0.0     0.0    0.98     0.0      0.0      0.0       0.0     0.02   0.0
Industrial    0.0      0.0     0.0    0.0      0.99     0.0      0.0       0.01    0.0    0.0
Pasture       0.0      0.0     0.02   0.0      0.0      0.98     0.0       0.0     0.0    0.0
Per. Crop     0.0      0.0     0.02   0.0      0.0      0.0      0.98      0.0     0.0    0.0
Residential   0.0      0.0     0.0    0.0      0.0      0.0      0.0       1.0     0.0    0.0
River         0.0      0.0     0.0    0.01     0.0      0.0      0.0       0.0     0.98   0.01
Sea & Lake    0.0      0.0     0.0    0.0      0.0      0.0      0.0       0.0     0.0    1.0
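The relation between a confusion matrix such as the one in Fig. 8 and the overall classification accuracy reported in the tables can be illustrated with a small sketch. The counts below are invented for illustration and are not EuroSAT results; the function names are hypothetical. The sketch assumes a matrix of raw counts with true labels in rows and every class present at least once.

```python
from typing import List


def overall_accuracy(counts: List[List[int]]) -> float:
    """Fraction of all samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return correct / total


def row_normalize(counts: List[List[int]]) -> List[List[float]]:
    """Divide each row by its sum, giving per-class rates as in Fig. 8.

    Assumes every true class has at least one sample (non-zero row sum).
    """
    return [[c / sum(row) for c in row] for row in counts]


# Invented counts for three classes of unequal size (rows = true labels).
counts = [[98, 2, 0],
          [1, 49, 0],
          [0, 5, 45]]

acc = overall_accuracy(counts)   # 192 correct out of 200 samples
norm = row_normalize(counts)     # diagonal holds the per-class rates
mean_diag = sum(norm[i][i] for i in range(3)) / 3
print(round(acc, 4), round(mean_diag, 4))  # 0.96 0.9533
```

Because the classes differ in size, the mean of the row-normalized diagonal need not equal the overall accuracy, which is why both views are useful when reading a per-class matrix next to a single accuracy figure.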

B. Band Evaluation

In order to evaluate the performance of deep CNNs using single-band images as well as shortwave-infrared and color-infrared band combinations, we used the pretrained ResNet-50 with a fixed training-test split to compare the performance of the different spectral bands. For the single-band image evaluation, we used images as input consisting of the information gathered from a single spectral band on all three input channels. We analyzed all spectral bands, even the bands not intended for land monitoring. Bands with a lower spatial resolution have been upsampled to 10 meters per pixel using cubic-spline interpolation [8]. Fig. 9 shows a comparison of the spectral bands' performance. It is shown that the red, green and blue bands outperform all other bands. Interestingly, the bands red edge 1 (B05) and shortwave-infrared 2 (B12) with an original spatial resolution of merely 20 meters per pixel showed an impressive performance. The two bands even outperform the near-infrared band (B08) which has a spatial resolution of 10 meters per pixel.

Fig. 9: Overall classification accuracy (%) of a fine-tuned ResNet-50 CNN on the given EuroSAT dataset using single-band images. (Bar chart; x-axis: band, B01 to B12 including B8A; y-axis: accuracy, 0 to 100.)

In addition to the RGB band combination, we also analyzed the performance of the shortwave-infrared and color-infrared band combination. Table IV shows a comparison of the performance of these combinations. As shown, band combination images outperform single-band images. Furthermore, images in the RGB color space performed best on the introduced land use and land cover classes. Please note, networks pretrained on the ILSVRC-2012 image classification dataset have initially not been trained on images other than RGB images.

V. APPLICATIONS

The openly and freely accessible satellite images allow a broad range of possible applications. In this section, we demonstrate that the novel dataset published with this paper allows real-world applications. The classification result with
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 7

an overall accuracy of 98.57% paves the way for these appli-


cations. We show land use and land cover change detection
applications as well as how the the trained classifier can assist
in keeping geographical maps up-to-date.

A. Land Use and Land Cover Change Detection

Since the Sentinel-2 satellite constellation will scan the Earth’s land surface for about the next 20 to 30 years on a repeat cycle of about five days, a trained classifier can be used for monitoring land surfaces and detecting changes in land use or land cover. To demonstrate land use and land cover change detection, we selected images from the same spatial region but from different points in time. Using the trained classifier, we analyzed 64x64 image regions. A change has taken place if the classifier delivers different classification results for patches taken from the same spatial 64x64 region. In the following, we show three examples of spotted changes. In the first example shown in Fig. 10, the classification system recognized that the land cover has changed in the highlighted area. The left image was acquired in the surroundings of Shanghai, China in December 2015 showing an area classified as industrial. The right image shows the same area in December 2016 revealing that the industrial buildings have been demolished. The second example is illustrated in Fig. 11. The left image was acquired in the surroundings of Dallas, USA in August 2015 showing no dominant residential buildings in the highlighted area. The right image shows the same area in March 2017. The system has identified a change in the highlighted area revealing that residential buildings have been constructed. The third example presented in Fig. 12 shows that the system detected deforestation near Villamontes, Bolivia. The left image was acquired in October 2015. The right image shows the same region in September 2016 revealing that a large area has been deforested. The presented examples are of particular interest for urban area development, nature protection and sustainable development. For instance, deforestation is a main contributor to climate change; therefore, the detection of deforested land is of particular interest (e.g., to notice illegal clearing of forests).

Fig. 10: The left image was acquired in the surroundings of Shanghai in December 2015 showing an area classified as industrial. The right image shows the same region in December 2016 revealing that the industrial buildings have been demolished.

Fig. 11: The left image was acquired in the surroundings of Dallas, USA in August 2015 showing no dominant residential buildings in the highlighted area. The right image shows the same area in March 2017 showing that residential buildings have been built up.

Fig. 12: The left image was acquired near Villamontes, Bolivia in October 2015. The right image shows the same area in September 2016 revealing that a large land area has been deforested.

B. Assistance in Mapping

While a classification system trained with 64x64 image patches does not allow a finely graduated per-pixel segmentation, it can not only detect changes, as shown in the previous examples, but also help keep maps up-to-date. This is particularly helpful for maps created in a crowdsourced manner such as OpenStreetMap (OSM). A possible system can verify already tagged areas, identify mistagged areas or enable large-area tagging. The proposed system is based on the trained CNN classifier, which provides a classification result for each image patch created in a sliding-window manner.

As shown in Fig. 13, the industrial buildings seen in the left up-to-date satellite image are almost completely covered in the corresponding OSM mapping. The right up-to-date satellite image also shows industrial buildings. However, a major part of the industrial buildings is not covered in the corresponding map. Due to the high temporal availability of Sentinel-2 satellite images in the future, this work together with the published dataset can be used to build systems which assist in keeping maps up-to-date. A detailed analysis of the respective land area can then be provided using high-resolution satellite images and an advanced segmentation approach [4], [11].
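The patch-wise procedure used for change detection and for the sliding-window classification can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the function names, the non-overlapping stride and the brightness-threshold `toy_classifier` are hypothetical stand-ins, and in practice the trained CNN would take the place of the toy classifier.

```python
import numpy as np

PATCH = 64  # patch edge length in pixels, as used in the text


def classify_patches(image, classifier, stride=PATCH):
    """Slide a PATCH x PATCH window over `image` (H x W x bands) and
    return a 2-D grid with one integer class label per window position."""
    height, width = image.shape[:2]
    if height < PATCH or width < PATCH:
        raise ValueError("image is smaller than one patch")
    rows = (height - PATCH) // stride + 1
    cols = (width - PATCH) // stride + 1
    labels = np.empty((rows, cols), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            top, left = r * stride, c * stride
            labels[r, c] = classifier(image[top:top + PATCH, left:left + PATCH])
    return labels


def changed_patches(labels_before, labels_after):
    """Boolean grid that is True where the two acquisitions of the same
    spatial region received different class labels."""
    return labels_before != labels_after


def toy_classifier(patch):
    """Hypothetical placeholder for the trained CNN: class 1 for bright
    patches, class 0 otherwise."""
    return 1 if patch.mean() > 0.5 else 0


# Two synthetic acquisitions of the same 128 x 128 region.
before = np.zeros((128, 128, 3))
after = before.copy()
after[64:128, 0:64] = 1.0  # simulate new construction in one patch

changed = changed_patches(
    classify_patches(before, toy_classifier),
    classify_patches(after, toy_classifier),
)
print(changed)
```

The sketch assumes that both acquisitions are co-registered and cover exactly the same pixels, so that equal grid positions refer to the same spatial 64x64 region. A smaller `stride` yields overlapping windows and a denser label grid.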

Fig. 13: A patch-based classification system can verify already tagged areas, identify mistagged areas or enable large-area
tagging, as shown in the above images and maps. The left Sentinel-2 satellite image was acquired in Australia in March 2017.
The right satellite image was acquired in the surroundings of Shanghai, China in March 2017. The corresponding up-to-date
OpenStreetMap (OSM) mapping images show that the industrial areas in the left satellite image are almost completely covered
(colored gray). However, the industrial areas in the right satellite image are not properly covered.
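The comparison between patch-level predictions and existing map tags can be illustrated with a short sketch. It assumes that the map tags have already been rasterized onto the same patch grid as the predictions and mapped to the same integer class indices; that step, the `UNTAGGED` marker and the function name `review_tags` are hypothetical and not part of this work.

```python
import numpy as np

UNTAGGED = -1  # hypothetical marker for patches without any map tag


def review_tags(predicted, tagged):
    """Compare predicted patch labels with map tags on the same grid.

    Returns three boolean grids:
      verified  - a tag exists and agrees with the prediction
      mistagged - a tag exists but disagrees with the prediction
      suggested - no tag exists, so the prediction is a tagging candidate
    """
    predicted = np.asarray(predicted)
    tagged = np.asarray(tagged)
    if predicted.shape != tagged.shape:
        raise ValueError("prediction and tag grids must have the same shape")
    has_tag = tagged != UNTAGGED
    return {
        "verified": has_tag & (predicted == tagged),
        "mistagged": has_tag & (predicted != tagged),
        "suggested": ~has_tag,
    }


# Arbitrary example class indices on a 2 x 2 patch grid.
predicted = [[4, 4], [4, 1]]
tagged = [[4, UNTAGGED], [2, 1]]
report = review_tags(predicted, tagged)
for name, grid in report.items():
    print(name, grid.tolist())
```

Patches flagged as mistagged or suggested would be candidates for review by a mapper; the sketch does not decide whether the map or the classifier is correct.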

VI. CONCLUSION

In this paper, we have addressed the challenge of land use and land cover classification. For this task, we presented a novel dataset based on remotely sensed satellite images. To obtain this dataset, we have used the openly and freely accessible Sentinel-2 satellite images provided in the Earth observation program Copernicus. The proposed dataset consists of 10 classes covering 13 different spectral bands with a total of 27,000 labeled and geo-referenced images. We provided benchmarks for this dataset with its spectral bands using state-of-the-art deep Convolutional Neural Networks (CNNs). For this novel dataset, we analyzed the performance of the 13 different spectral bands. As a result of this evaluation, the RGB band combination with an overall classification accuracy of 98.57% outperformed the shortwave-infrared and the color-infrared band combinations and led to a better classification accuracy than all single-band evaluations. Overall, the freely available Sentinel-2 satellite images offer a broad range of possible applications. This work is a first important step toward making use of the large amount of available satellite data in machine learning, allowing the monitoring of Earth’s land surfaces on a large scale. The proposed dataset can be leveraged for multiple real-world Earth observation applications. Possible applications are land use and land cover change detection or the improvement of geographical maps.

ACKNOWLEDGMENT

This work was partially funded by the BMBF project DeFuseNN (01IW17002). The authors thank NVIDIA for the support within the NVIDIA AI Lab program.

REFERENCES

[1] S. Basu, S. Ganguly, S. Mukhopadhyay, R. DiBiano, M. Karki, and R. Nemani. Deepsat: a learning framework for satellite imagery. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, page 37. ACM, 2015.
[2] B. Bischke, P. Bhardwaj, A. Gautam, P. Helber, D. Borth, and A. Dengel. Detection of Flooding Events in Social Multimedia and Satellite Imagery using Deep Neural Networks. In MediaEval, 2017.
[3] B. Bischke, D. Borth, C. Schulze, and A. Dengel. Contextual Enrichment of Remote-Sensed Events with Social Media Streams. In Proceedings of the 2016 ACM on Multimedia Conference, pages 1077–1081. ACM, 2016.
[4] B. Bischke, P. Helber, J. Folz, D. Borth, and A. Dengel. Multi-Task Learning for Segmentation of Buildings Footprints with Deep Neural Networks. In arXiv preprint arXiv:1709.05932, 2017.
[5] B. Bischke, P. Helber, C. Schulze, V. Srinivasan, and D. Borth. The Multimedia Satellite Task: Emergency Response for Flooding Events. In MediaEval, 2017.
[6] M. Castelluccio, G. Poggi, C. Sansone, and L. Verdoliva. Land use classification in remote sensing images by convolutional neural networks. arXiv preprint arXiv:1508.00092, 2015.
[7] G. Cheng, J. Han, and X. Lu. Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE, 2017.
[8] C. De Boor. A practical guide to splines, volume 27. Springer-Verlag New York, 1978.
[9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[10] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.
[11] M. Kampffmeyer, A.-B. Salberg, and R. Jenssen. Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2016.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
[13] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551, 1989.
[14] M. Lin, Q. Chen, and S. Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.
[15] F. P. Luus, B. P. Salmon, F. van den Bergh, and B. Maharaj. Multiview deep learning for land-use classification. IEEE Geoscience and Remote Sensing Letters, 12(12):2448–2452, 2015.
[16] Z. Ma, Z. Wang, C. Liu, and X. Liu. Satellite imagery classification based on deep convolution network. World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, 10(6):1113–1117, 2016.
[17] D. Marmanis, M. Datcu, T. Esch, and U. Stilla. Deep learning earth observation classification using imagenet pretrained networks. IEEE Geoscience and Remote Sensing Letters, 13(1):105–109, 2016.
[18] K. Ni, R. Pearce, K. Boakye, B. Van Essen, D. Borth, B. Chen, and E. Wang. Large-scale deep learning on the yfcc100m dataset. arXiv preprint arXiv:1502.03409, 2015.

[19] K. Nogueira, O. A. Penatti, and J. A. dos Santos. Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognition, 61:539–556, 2017.
[20] O. A. Penatti, K. Nogueira, and J. A. dos Santos. Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 44–51, 2015.
[21] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
[22] G. Sheng, W. Yang, T. Xu, and H. Sun. High-resolution satellite scene classification using a sparse coding based multiple feature combination. International journal of remote sensing, 33(8):2395–2412, 2012.
[23] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[24] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261, 2016.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
[26] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016.
[27] G.-S. Xia, J. Hu, F. Hu, B. Shi, X. Bai, Y. Zhong, and L. Zhang. Aid: A benchmark dataset for performance evaluation of aerial scene classification. arXiv preprint arXiv:1608.05167, 2016.
[28] G.-S. Xia, W. Yang, J. Delon, Y. Gousseau, H. Sun, and H. Maître. Structural high-resolution satellite image indexing. In ISPRS TC VII Symposium-100 Years ISPRS, volume 38, pages 298–303, 2010.
[29] Y. Yang and S. Newsam. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems, pages 270–279. ACM, 2010.
[30] L. Zhao, P. Tang, and L. Huo. Feature significance-based multibag-of-visual-words model for remote sensing image scene classification. Journal of Applied Remote Sensing, 10(3):035004–035004, 2016.
[31] Kashif Ahmad, Konstantin Pogorelov, Michael Riegler, Nicola Conci, and H Pal. Cnn and gan based satellite and social media data fusion for disaster detection. In Proc. of the MediaEval 2017 Workshop, Dublin, Ireland, 2017.
[32] Guanzhou Chen, Xiaodong Zhang, Xiaoliang Tan, Yufeng Cheng, Fan Dai, Kun Zhu, Yuanfu Gong, and Qing Wang. Training small networks for scene classification of remote sensing images via knowledge distillation. Remote Sensing, 10(5):719, 2018.
[33] Subhankar Roy, Enver Sangineto, Nicu Sebe, and Begüm Demir. Semantic-fusion gans for semi-supervised satellite image classification. In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 684–688. IEEE, 2018.
[34] Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. Introducing EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. In Geoscience and Remote Sensing Symposium (IGARSS), 2018 IEEE International. IEEE, 2018.
[35] Lanqing Huang, Bin Liu, Boying Li, Weiwei Guo, Wenhao Yu, Zenghui Zhang, and Wenxian Yu. Opensarship: A dataset dedicated to sentinel-1 ship interpretation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(1):195–208, 2018.
[36] Gong Cheng, Junwei Han, and Xiaoqiang Lu. Remote sensing image scene classification: benchmark and state of the art. Proceedings of the IEEE, 105(10):1865–1883, 2017.
[37] Moacir Ponti, Arthur A Chaves, Fábio R Jorge, Gabriel BP Costa, Adimara Colturato, and Kalinka RLJC Branco. Precision agriculture: Using low-cost systems to acquire low-altitude images. IEEE computer graphics and applications, 36(4):14–20, 2016.
[38] Weixun Zhou, Shawn Newsam, Congmin Li, and Zhenfeng Shao. Patternnet: a benchmark dataset for performance evaluation of remote sensing image retrieval. ISPRS Journal of Photogrammetry and Remote Sensing, 2018.
[39] Weixun Zhou, Shawn Newsam, Congmin Li, and Zhenfeng Shao. Patternnet: a benchmark dataset for performance evaluation of remote sensing image retrieval. ISPRS Journal of Photogrammetry and Remote Sensing, 2018.
[40] European Commission. Mapping guide for a European urban atlas. https://ec.europa.eu/regional_policy/sources/tender/pdf/2012066/annexe2.pdf, 2012.
