
Deep Learning for Classification Tasks on Geospatial Vector Polygons


arXiv:1806.03857v2 [stat.ML] 11 Jun 2019

R.H. van ’t Veer∗ P. Bloem† E.J.A. Folmer‡


June 12, 2019

Abstract
In this paper, we evaluate the accuracy of deep learning approaches on geospatial vector geometry classification tasks. The purpose of this evaluation is to investigate the ability of deep learning models to learn from geometry coordinates directly. Previous machine learning research applied to geospatial polygon data did not use geometries directly, but derived properties thereof, produced by extracting geometry properties such as Fourier descriptors. Instead, our introduced deep neural net architectures are able to learn on sequences of coordinates mapped directly from polygons. In three classification tasks we show that the deep learning architectures are competitive with common learning algorithms that require extracted features.

Acknowledgements
This work was supported by the Dutch National Cadastre (Kadaster) and the
Amsterdam Academic Alliance Data Science (AAA-DS) Program Award to the
UvA and VU Universities. We would also like to thank the following organisa-
tions. The source data for the neighbourhoods task is published by Statistics
Netherlands (CBS) and distributed by the Publieke Dienstverlening op de Kaart
organization (PDOK) under a Creative Commons (CC) Attribution license. The
data for the buildings task was published by the Dutch National Cadastre un-
der a CC Zero license. The archaeological data in raw form is hosted by Data
Archiving and Networked Services, and re-licensed by kind permission of copy-
right holder ADC ArcheoProjecten under CC-BY-4.0. We thank Henk Scholten,
Frank van Harmelen, Xander Wilcke, Maurice de Kleijn, Jaap Boter, Chris Lu-
cas, Eduardo Dias, Brian de Vogel and anonymous reviewers for their helpful
comments.
∗ Vrije Universiteit Amsterdam, Kadaster, Geodan: [email protected], orcid: https://orcid.org/0000-0003-0520-6684
† Vrije Universiteit Amsterdam, orcid: https://orcid.org/0000-0002-0189-5817
‡ University of Twente, Kadaster, orcid: https://orcid.org/0000-0002-7845-1763
1 Introduction
For many tasks, it is useful to analyse the geometric shapes of geospatial objects, for example in quality assessment or enrichment of map data (Fan et al, 2014) or in the classification of topographical objects (Keyes and Winstanley, 1999).
Machine learning is increasingly used in geospatial analysis tasks. Machine
learning can learn from data by extracting patterns (Goodfellow et al, 2016,
2). For example, machine learning can be applied to classify building types (Xu et al, 2017), analyse wildfires (Araya et al, 2016) and traffic safety (Effati et al, 2015), cluster spatial objects (Hagenauer, 2016), detect aircraft shapes (Wu et al, 2016) or classify road sections (Andrášik and Bíl, 2016): tasks that
extend beyond standard GIS processing operations. The prediction of house
prices (Montero et al, 2018) and the estimation of pedestrian side walk widths
(Brezina et al, 2017) are tasks that could also benefit from the application of
machine learning analysis on geometric shapes.
Deep learning is a relatively new addition to the collection of machine learn-
ing methods. Deep learning allows stacking multiple learning layers to form
a model that is able to train latent representations at varying levels of data
abstraction (LeCun et al, 2015). In this paper, we will use the term shallow
machine learning (Ball et al, 2017, 2-3) to refer to methods that are not based on deep learning. A distinguishing property of deep versus shallow
learning methods is that shallow learning requires a preprocessing step known
in machine learning as feature extraction (LeCun et al, 2015, 438), a lossy data
transformation process. Shallow models require feature vectors as input data,
so when using data of variable length such as geometries, shallow learning al-
gorithms depend on feature extraction. One advantage of deep models over
shallow models is that these feature extraction methods are not required for
deep learning, which is why we want to explore the abilities of deep learning
to operate on all available geometry data rather than on an extracted set of
features.
The purpose of this article is to assess the accuracy of working with vec-
tor geometries in deep neural nets, by comparing them with existing shallow
machine learning methods in an experiment with three classification tasks on
vector polygons. Our main objective is to train deep learning models on all
available data. From this objective we do not require our deep learning models
to exceed shallow model accuracy, but we do require the deep models to at least
match shallow model accuracy. Thus, the main question we want to answer is:
Can deep learning models achieve accuracies comparable with
shallow learning models in analysing geospatial vector shapes?
These are the contributions made in this paper:
1. We compare the performance of shallow and deep learning methods on
geospatial vector data, as detailed in Section 3. We show that the deep
learning models introduced here match shallow models in accuracy at
classification tasks on real-world geospatial polygon data.
2. We introduce three classification tasks restricted to geospatial vector polygons that serve as a novel and open access benchmark on geospatial vector shape recognition, detailed in Section 4. The benchmark data files are available as open data.1

Since the domains of geospatial information systems (GIS) and machine learning (ML) have partially overlapping vocabularies, we provide Table 1 of homonyms and their use in the two fields of GIS and ML. Where used in this article, the terms are clarified by their field or, where possible, avoided.

Term                  GIS                                        ML
Geometry              A spatial representation of an object
                      encoded as one or more points that may
                      be interconnected
Vector                A geometry defined by vertices and edges   A one-dimensional array
Vectorization         Conversion of raster or analog data into   Conversion of data into a tensor
                      geospatial vector geometries               interpretable by a machine learning algorithm
Feature               A geospatial object                        A data property
Shape                 A geospatial object geometry               A tensor size along its dimensions
K-nearest neighbours  The k spatially closest objects            A learning algorithm based on closest resemblance

Table 1: Terms in the fields of GIS and ML

Vector geometries as raster data   Our research focuses on machine learning on geospatial vector data without rasterization of the geometry data. How-
ever, as any geometry can be expressed as raster data, the question must be
addressed why one should not simply convert vector data to raster and use ma-
chine learning algorithms that are commonly used on raster data. The answer
to this is that geospatial vector data is often better suited for representation
of discrete geospatial objects, because rasterization leads to loss of information
almost everywhere:
1. Geospatial vector data is highly versatile in representing geometries at
varying spatial levels of detail, whereas raster data has a resolution of
fixed and uniform size. Geospatial vector data can leverage these different
levels to represent, for example, the rough shape of a country and the
detailed shape of a microbe, at opposite sides of the globe within a single
multi-part geometry.
1 Data available at http://hdl.handle.net/10411/GYPPBR
2. Vector data is almost always more compact in comparison with raster
data. Depending on the accuracy of the source data, materialisation of
vector data into raster data often requires expansion to transform the
vector data into a uniform rasterized sampling of a continuous field.
3. Geospatial vector data can be reasoned over by any Geospatial Information
System (GIS) in terms of topology: properties of geospatial objects with
respect to other geospatial objects in the same set that are invariant under
linear transformations, such as object intersection or spatial adjacency
(Huisman and De By, 2009, 102). With rasterization, this information
may be partially or completely lost: a small gap between two disjoint geometries, for example, may be lost if the pixel size is larger than the gap.

Thus, the rasterization process is trivial but lossy, whereas the inverse process of geospatial vectorization is non-trivial and requires human or algorithmic in-
terpretation (Huisman and De By, 2009, 309). For these reasons, it is important
to explore the capabilities of shape analysis by machine learning models without
resorting to rasterization.
The remainder of this article is structured as follows: we position our work within re-
lated research in Section 2, we explain the methods of our research in Section 3,
we discuss the classification tasks in Section 4, and the model performance re-
sults on these tasks in Section 5.

2 Related work
The vast majority of machine learning research in the geospatial domain is
focused on analysis of remote sensing data, as shown by overview works from,
for example, Zhu et al (2017) and Ball et al (2017); and by challenges such as the
CrowdAI mapping challenge2 and the DeepGlobe Machine Vision Challenge3
(Demir et al, 2018).
Compared to remote sensing raster data, far fewer publications go into the
matter of analysing geospatial vector shape data through machine learning
strategies. The most common method is to rasterize the vector shapes first.
Xu et al (2017) have published a deep learning strategy for comparing building
footprints. However, the approach by Xu et al. requires preprocessing that ras-
terizes aggregated data and does not classify individual geometries. Similarly,
the shapes in the deep learning image retrieval task through sketches described
by Jiang et al (2017) are raster-based rather than vector-based abstractions, as
are the aircraft shapes extracted from remote sensing in the work by Wu et al
(2016). The work on 3D model retrieval by Wang et al (2017) uses a different
rasterization strategy: 2D-projected images are generated from 3D models to
create an image search database. Cheng and Han (2016, 2-9) survey a number of
2 https://www.crowdai.org/challenges/mapping-challenge
3 http://www.grss-ieee.org/news/the-deepglobe-machine-vision-challenge/
works involving geometric shape data, but aimed at classical (i.e. non-machine
learning) remote sensing object detection strategies, rather than on machine
learning analysis of the geometric shapes themselves. In contrast to the raster-
based strategies from these works, we aim to research the possibility of avoiding
the rasterization process and operate on geometries directly, as will be explained
in Section 3.2.1.
Research on machine learning analysis of non-rasterized vector shapes is
scarce. The algorithms used by Andrášik and Bı́l (2016) are trained directly
on geometry properties based on angles and radii of vertices in road sections,
extracted from simplified road geometries. Their method, however, is optimized
to the specific task of classifying short road sections from short polylines. Effati
et al (2015, 120-121) adopt a similar strategy for the road properties for the
purposes of traffic safety analysis. We aim to explore more generic shape anal-
ysis methods through machine learning, rather than task-specific ones. A deep
learning model operating on vector geometries was developed by Ha and Eck
(2018), using a model they named sketch-rnn. Sketch-rnn shows how a deep
learning architecture can be used work with vector geometries directly. The
data collected for sketch-rnn used a web-based crowd-sourcing tool, inviting
users to draw simple vector drawings of cats, t-shirts and a host of other object
categories. Given an object category, the generative sketch-rnn model is able
to analyse partial shapes drawn by the user and extrapolate these to complete
sketches.4

3 Methods
The classification tasks in this paper operate on real-world polygon data. To
be precise, we use the term polygon to mean a single connected sequence (i.e.
without polygon holes) of three or more coplanar lines. Every line in a polygon
is defined by two points in R2 , where each point is shared by exactly two lines to
form a closed loop. We impose no validity constraint on polygons, i.e. polygons
may be self-intersecting.

3.1 Shallow models


3.1.1 Preprocessing
Shallow machine learning methods operate on feature vectors of fixed length,
so more complex input data such as geometries need to be transformed. This
transformation step is known in machine learning as feature extraction (LeCun
et al, 2015, 438) or feature engineering (Domingos, 2012, 84). So, contrary to
deep learning models discussed in Section 3.3, shallow methods do not oper-
ate on geometry coordinates directly, since vector geometries are sequences of
vertices that are both of higher rank and of variable length, as we will explain
4 https://magenta.tensorflow.org/assets/sketch_rnn_demo/index.html
Figure 1: Order 1, 2, 3 and 4 elliptic Fourier reconstruction approximations
(red) of a polygon (blue). Each order level adds to the approximation. Adapted
from Kuhl and Giardina (1982, 237)

in Section 3.2.1. Applied to geospatial vector data, we rely on extracting information from a geometry that characterizes its shape in a lower-dimensional, fixed-length representation.
Fourier descriptors are a common choice as a feature engineering method
for extracting properties from geometries (Zhang et al, 2002; Keyes and Win-
stanley, 1999; Zahn and Roskies, 1972; Loncaric, 1998). For our preprocessing,
we used the Elliptic Fourier Descriptor (EFD) method by Kuhl and Giardina
(1982). Elliptic Fourier descriptors are created by iterating over the coordinates
of the vertices in a geometry, transforming any number of coordinates of the
geometry into a vector representing the geometry in an elliptic approximation.
This transformation can be reversed, producing an approximation of the original geometry, with a reconstructive accuracy that depends on the order, i.e. the number of harmonics (Kuhl and Giardina, 1982, 239), a positive integer. The higher the order, the better the approximation gets, as shown in Figure 1.
The Fourier descriptors were constructed using the pyefd package,5 which
implements the algorithms by Kuhl and Giardina (1982) made specifically for
creating descriptors for vector geometries. The pyefd package produces nor-
malized and non-normalized descriptors; the normalized descriptors are start
position, scale, rotation and translation invariant (Kuhl and Giardina, 1982,
236). For the data used in training the shallow models, both normalized and
non-normalized Fourier descriptors were included. Added to the descriptors
are three other easily obtained geometry properties: the polygon surface area,
number of vertices and geometry boundary length.
5 https://pypi.python.org/pypi/pyefd
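As an illustration of this preprocessing step, the sketch below builds such a fixed-length feature vector for a single polygon with the pyefd package. It is a reconstruction, not the released preprocessing code; the descriptor order and the concatenation layout are illustrative assumptions.

```python
# Sketch of the shallow-model feature extraction: elliptic Fourier descriptors
# (non-normalized and normalized) plus surface area, vertex count and boundary
# length. Order value and column layout are illustrative assumptions.
import numpy as np
from pyefd import elliptic_fourier_descriptors

def polygon_features(coords, order=4):
    """coords: (m, 2) array with the polygon's exterior ring coordinates."""
    coeffs = elliptic_fourier_descriptors(coords, order=order, normalize=False)
    coeffs_norm = elliptic_fourier_descriptors(coords, order=order, normalize=True)

    # Shoelace formula for the surface area; Euclidean length of the boundary.
    x, y = coords[:, 0], coords[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    boundary = np.sum(np.linalg.norm(np.diff(coords, axis=0), axis=1))

    return np.concatenate([
        coeffs.ravel(), coeffs_norm.ravel(),
        [area, len(coords), boundary],
    ])
```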

3.1.2 Shallow model selection
As explained in Section 1 we distinguish between two families of machine learn-
ing methods. From the shallow model family, we selected four standard algo-
rithms:

• K-nearest neighbour classifier;


• Logistic regression;
• Support Vector Machine (SVM) with Radial Basis Function (RBF) kernel.
Other kernels (linear, polynomial) were tested but RBF always produced
better results;
• Decision tree classifier.
These model types are so well-established that we will not elaborate upon
these here. Logistic regression goes back to Cox (1958), decision trees to
Breiman et al (1984), k-nearest neighbour to at least Cover and Hart (1967).
With over twenty-five years of history, the SVM by Boser et al (1992) is the
relatively new algorithm in the shallow model family.
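For concreteness, the four baselines map onto scikit-learn estimators (the library used for the shallow models, see Section 3.4) as in the sketch below; default settings are shown here, the per-task tuned hyperparameters are listed in Appendix A.

```python
# The four shallow baselines as scikit-learn estimators; defaults shown,
# the tuned values per task are listed in Table 4 of Appendix A.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

shallow_models = {
    "k-NN": KNeighborsClassifier(),
    "Logistic regression": LogisticRegression(),
    "SVM RBF": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(),
}
```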

3.2 Deep models


3.2.1 Preprocessing
The problem with shallow model feature extraction of geospatial vector data is
that it results in information loss: the original shapes can only be approximated,
but not fully reconstructed by reversing the feature extraction process. Ideally,
it would not be necessary to extract an information subset of geospatial data in
advance to obtain good predictions. Deep learning allows us to train models on
geospatial vector data by directly feeding the geometry coordinates to a deep
learning model, without extracting intermediate features. To explain how deep
learning is able to learn from polygon data directly, we need to discuss our
machine learning vectorization method.

3.2.2 Geometries as machine learning vector sequences


Our geometry tensor encoding was derived from the work by Ha and Eck (2018), where each geometry sample $G_i$ in a data set of size $n$ is encoded as a sequence of geometry vertex vectors $\langle g^i_1, g^i_2, g^i_3, \ldots, g^i_m \rangle$, where $m$ is the number of vertices in the geometry. Each vector $g^i_j$ is a concatenation of:

• a coordinate point vector $p^i_j$ in $\mathbb{R}^2$. In fact any coordinate system in $\mathbb{R}^2$ is supported in this vector representation;

• a one-hot vector $r^i_j$ in $\mathbb{R}^3$ to mark the end of either the point, a sub-geometry or a final stop for the vertices in $G_i$. For each $g^i_j$ in a polygon geometry, $r^i_j = [1\ 0\ 0]$ except for the last vertex, where $r^i_m$ marks the end of the polygon as the final stop $[0\ 0\ 1]$. In case of a multipolygon, each sub-polygon is terminated by a sub-geometry stop $[0\ 1\ 0]$ except for the last, which is marked as a final stop.

Combined, $g^i_j$ is a vector of length 5, as shown in Figure 2.


As we explained in Section 1, the rasterization process is lossy. As such, it
can be considered a feature extraction method by which we lose information
that may be of use to the machine learning model. So, for our deep neural
net architectures, geometries are expressed as normalized vector sequences of
geometry vertices. This process is fully reversible in reconstructing the geometry
shape and orientation, but it does require centering and scaling the data. For
deep learning models to perform, the data needs to be normalized to a mean of
zero and scaled to a variance of about one.
Geospatial coordinates are often expressed in degrees of longitude and latitude, where one degree of latitude equals roughly 111 kilometres. However, vector geometries usually operate on the level of meters or even centimetres. To counter this normalization imbalance, every point vector $p^i_j$ in every geometry $G_i$ is normalized to
$$ p'^i_j = \frac{p^i_j - \bar{p}^i}{s}, \qquad (1) $$
where $\bar{p}^i$ is the centroid of geometry $G_i$, computed as the mean of all points in that single geometry. Scale factor $s$ is the standard deviation over the bounding values $b^i_{min}$ and $b^i_{max}$ of all geometries. $b^i_{min}$ and $b^i_{max}$ for a geometry $G_i$ are defined as
$$ b^i_{min} = \min\left(p^i - \bar{p}^i\right), \qquad (2) $$
and
$$ b^i_{max} = \max\left(p^i - \bar{p}^i\right). \qquad (3) $$
This is a simpler two-value version of the standard bounding box that would normally list the minimum and maximum values for a geometry in two dimensions. Scale factor $s$ is then computed as the scalar standard deviation over all bounding values $B$:
$$ B = \langle b^1_{min}, b^1_{max}, b^2_{min}, b^2_{max}, \ldots, b^n_{min}, b^n_{max} \rangle \qquad (4) $$
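A minimal sketch of this encoding and normalization (Equations 1-4) is given below. It is a reconstruction, not the released preprocessing code; the stand-in coordinates are taken from Figure 2.

```python
# Sketch of the vertex-sequence encoding and the normalization of Eqs. (1)-(4);
# `polygons` is assumed to be a list of (m, 2) numpy arrays of exterior rings.
import numpy as np

polygons = [np.array([[4.86447, 52.33384],       # stand-in data from Figure 2
                      [4.86456, 52.33386],
                      [4.86423, 52.33405]])]

# Global scale factor s: std over all per-geometry bounding values (Eqs. 2-4).
bounding_values = []
for coords in polygons:
    centred = coords - coords.mean(axis=0)       # subtract geometry centroid (Eq. 1)
    bounding_values.append(centred.min())        # b_min of this geometry (Eq. 2)
    bounding_values.append(centred.max())        # b_max of this geometry (Eq. 3)
s = np.std(bounding_values)                      # scalar std over B (Eq. 4)

def encode(coords, s):
    """Return an (m, 5) array: normalized x, y plus the one-hot render type."""
    centred = (coords - coords.mean(axis=0)) / s
    one_hot = np.tile([1.0, 0.0, 0.0], (len(coords), 1))   # regular vertices
    one_hot[-1] = [0.0, 0.0, 1.0]                           # final stop on last vertex
    return np.hstack([centred, one_hot])

sequences = [encode(coords, s) for coords in polygons]
```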

Geometries often vary in the number of vertices required to approximate the shape of a real-world object. As a consequence, the geometry vector sequences
vary in length. Deep learning models have the benefit of being able to train and
predict on variable length sequences (Bahdanau et al, 2014). However, within
one batch the sequences need to be of the same length in order to uniformly
apply the model weights and biases on the entire batch as a single tensor. To
achieve this fixed sequence size within a batch, the geometry vectors are first
sorted in reverse order, with the largest geometry first and the smallest last.
This sorted set of geometries is subdivided into bins of size $n_{bin}$, where $n_{bin}$ is at least the training batch size. This is to increase computational efficiency and

(a) [map of the building polygon; coordinates in CRS84 degrees]
Polygon coordinates Center: remove mean of Scale: divide by scale
[4.8644271, 52.3339057] factor of 2.64501e-4
4.86447, 52.33384 4.2857e-5, -6.5714e-5 0.16198, -0.24845
4.86447, 52.33386 4.2857e-5, -4.5714e-5 0.16198, -0.17283
4.86456, 52.33386 1.32857e-4, -4.5714e-5 0.50229, -0.17283
4.86456, 52.33386 1.32857e-4, 1.44286e-4 0.50229, 0.54550
4.86423, 52.33405 -1.97143e-4, 1.44286e-4 -0.74534, 0.54550
4.86423, 52.33405 -1.97143e-4, -6.5714e-5 -0.74534, -0.24845
4.86447, 52.33384 4.2857e-5, -6.5714e-5 0.16959, -0.24845
(b)
Tensor representation
[0.16198, -0.24845, 1, 0, 0],
[0.16198, -0.17283, 1, 0, 0],
[0.50229, -0.17283, 1, 0, 0],
[0.50229, 0.54550, 1, 0, 0],
[-0.74534, 0.54550, 1, 0, 0],
[-0.74534, -0.24845, 1, 0, 0],
[0.16959, -0.24845, 0, 0, 1]
(c)

Figure 2: A building polygon (a) with its coordinate normalization by local mean
subtraction and global scaling (b) and the vector representation (c). Coordinates
(in CRS84 projection) and standard deviation have been truncated to five digit
precision for the sake of brevity. In the final tensor representation (c) the render
type is added.

reduce training time on what otherwise would be a large array of very small
batches. If there are insufficient geometries of sequence length $m_{bin}$ to create a set of samples of batch size, smaller geometries are added and padded to sequence length $m_{bin}$. Thus, a geometry with a sequence length $m$ of 144 points is zero-padded to a size $m_{bin}$ of 148 if the largest sequence length in the batch
is 148. This preprocessing of binning and limited padding reduced the training
time to one quarter of the time needed for training on fixed size sequences.
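The binning and padding step can be sketched as follows; the bin size shown is illustrative and not the value used in the experiments.

```python
# Sketch of length-sorted binning with per-bin zero-padding; bin_size is illustrative.
import numpy as np

def bin_and_pad(sequences, bin_size=512):
    # Largest geometry first, smallest last.
    ordered = sorted(sequences, key=len, reverse=True)
    batches = []
    for start in range(0, len(ordered), bin_size):
        geoms = ordered[start:start + bin_size]
        max_len = len(geoms[0])                      # longest sequence in this bin
        batch = np.zeros((len(geoms), max_len, 5))   # zero-padding for shorter ones
        for row, geom in enumerate(geoms):
            batch[row, :len(geom), :] = geom
        batches.append(batch)
    return batches
```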
Although there is no theoretical upper bound to the sequence length, there
is a practical one imposed by the amount of memory on commodity hardware. The data sets contain a small number of very large geometries. To improve computational
efficiency and prevent memory errors, these rare cases are simplified using the
Douglas-Peucker algorithm (Douglas and Peucker, 1973). In this way, only 0.17
percent of the geometries in our experiments needed to be simplified.
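One way to apply such a simplification, assuming Shapely is available, is sketched below; the vertex cut-off and tolerance are hypothetical values, not the settings used in the experiments.

```python
# Sketch of Douglas-Peucker simplification for rare, very large polygons using
# Shapely; MAX_POINTS and tolerance are hypothetical values.
from shapely.geometry import Polygon

MAX_POINTS = 1024

def maybe_simplify(coords, tolerance=0.5):
    if len(coords) <= MAX_POINTS:
        return coords
    # preserve_topology=False selects plain Douglas-Peucker simplification.
    simplified = Polygon(coords).simplify(tolerance, preserve_topology=False)
    return list(simplified.exterior.coords)
```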

3.3 Deep model selection


The motivation for using deep learning on geospatial vector data goes beyond
matching or improving existing methods. Deep learning allows us to explore new
methods for working with geospatial data, in complex pipelines involving com-
binations of raster, numerical and textual data (Ngiam et al, 2011), including
geospatial vector data. Deep learning can be used for classification or regression
tasks, but also for training generative models, producing new text (Sutskever
et al, 2014), image (Goodfellow et al, 2014) and even vector shape (Ha and Eck,
2018) outputs. Knowing how well deep learning models can learn directly from
geometries is a first step in building more complex generative pipelines with
confidence that the model is able to correctly interpret the data.
We introduce two deep learning, end-to-end trained models where vector-
serialized geometries are given as input data. The deep learning model figures
out the relevant data properties for itself without the need for the feature ex-
traction required for the shallow models. In the next subsections we describe
two relatively simple deep learning architectures that we evaluate on our tasks:
a convolutional model and a recurrent model. We explain these deep learning
models in some detail.

3.3.1 Convolutional neural net


The first introduced deep learning model uses a 1D convolutional neural net
(CNN) layout, shown in Figure 3. For an introduction to the workings of the
CNN, we refer the reader to Olah (2014). As a first layer, our model uses a ReLU-activated convolution layer with 32 filters, a kernel size of five and a stride of one. With this configuration, the CNN starts a sliding window across the first five geometry vectors, i.e. $g^i_1$ through $g^i_5$, producing a vector of size 32 as specified by the filter hyperparameter6. This window of size five is slid along the vectors of the geometry, until the end of the geometry including
6 Hyperparameters are the configuration settings of the machine learning model that are not optimized by the algorithm itself, such as the batch size.

[Figure 3 diagram. Convolutional model, left: Input layer, shape (m, 5) -> Conv1D (filters: 32, kernel: 5, ReLU) -> MaxPooling1D (pool size: 3, stride: 3) -> Conv1D (filters: 64, kernel: 5, ReLU) -> GlobalAveragePooling1D -> Fully connected (32, ReLU) -> Fully connected (#classes, SoftMax). Recurrent model, right: Input layer, shape (m, 5) -> bi-directional LSTM (32 forwards, 32 backwards) -> Fully connected (#classes, SoftMax).]
Figure 3: Convolutional (left) and recurrent (right) model layouts.

Figure 4: The first two layers of the CNN model. With a kernel size of five, the
CNN inspects a sliding window over the first five geometry vectors in geometry
G1 , producing the green element in the CNN output vector. The CNN then
moves to the five elements to the right, and produces the red vector element,
next the orange vector, repeated until the end of the geometry (the next three
windows in grey). This process is repeated for each filter and then moves to the
next geometry, in the direction of the black arrow. The max pooling operation
combines the maximum output element values of the CNN, shown in purple for
geometry 1.

Figure 5: Forward-facing LSTM, as part of the first layer of the LSTM model.
Unlike the CNN architecture, complete vectors are fed one by one to the same
LSTM cell. The green boxes therefore represent the same cell, with only its
state updated: along with each next geometry vector, the output and previous
state of the LSTM cell are passed along from one vector to the next. For the
purposes of classification as in this article, only the last LSTM output is returned
(in orange), the intermediate outputs (in grey) are discarded.

padding. Padding ensures outputs by the CNN of the same sequence length
as the input, to prevent size errors on small geometries where the tensor size
becomes too small to pass through the specified network layers. After $g^i_1$ through $g^i_5$, the CNN continues at the second set of geometry entries, $g^i_2$ through $g^i_6$. After inspecting all values of all the vectors in the first geometry, the CNN continues at the next geometry (see Figure 4).
The first CNN layer is followed by a max pooling layer with a pooling size
of three and a stride of three. The max pooling operation with a pool size of
three combines the maximum values of three CNN output vectors into a single
sequence vector of the same length. The reduction of the CNN output to one-
third is specified by the max pooling stride hyperparameter: after combining CNN output vectors $c^i_1$, $c^i_2$ and $c^i_3$, the max pooling operation skips forward to combine outputs $c^i_4$, $c^i_5$ and $c^i_6$, and so on. After the max pooling layer, a second convolution layer (not shown in Figure 4) interprets the output of the

max pooling layer, with hyperparameters identical to the first but with 64 filters
instead of 32. This CNN layer is followed by a global average pooling layer that
reduces the tensor rank to two by computing the average over the third tensor
axis. The output is subsequently fed to a ReLU activated fully connected layer.
The last layer is a softmax-activated fully connected layer to produce probability
outputs that sum to one.
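The convolutional layout described above can be reproduced in a few lines of Keras. The sketch below follows Figure 3 in a modern tf.keras form rather than the released Keras 2 code; the optimizer and loss are assumptions, not taken from the paper.

```python
# Keras sketch of the convolutional model in Figure 3; optimizer and loss are
# assumptions, the layer stack follows the description in this section.
from tensorflow.keras import layers, models

def build_cnn(num_classes):
    return models.Sequential([
        layers.Input(shape=(None, 5)),   # variable-length vertex sequences of width 5
        layers.Conv1D(32, kernel_size=5, strides=1, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=3, strides=3),
        layers.Conv1D(64, kernel_size=5, strides=1, padding="same", activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(32, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_cnn(num_classes=9)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```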

3.3.2 Recurrent neural net


The second deep learning model uses a recurrent neural net (RNN) layout as
shown in Figure 3. An RNN processes input data sequentially, re-using the
outputs of the network in each slice of the input sequence as additional input
for the next slice (Goodfellow et al, 2016, 364) by which the network can carry
over information between slices and capture long-term dependencies in the in-
put sequence. While there are several RNN architectures, we opted to evaluate a Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) architecture, as this is a popular architecture, shown by Ha and Eck (2018) to be effective for vector geometry analysis.
The core of our model is a single bi-directional LSTM layer. The LSTM
architecture is a particular type of RNN, designed to process sequences of data
with a trainable forget gate. This forget gate regulates the information retained
from one geometry vertex to the next. During training, the LSTM learns which
information in the sequence is of interest to retain, by passing the vector in the sequence, the output and the cell state from one input vector to the next
(see Figure 5). For a detailed discussion of recurrent neural nets and LSTMs in
the geospatial domain, we refer the reader to Mou et al (2017). An introduction
to LSTMs is given by Olah (2015). LSTMs have been shown to be effective on
sequences such as words in a sentence (Sutskever et al, 2014), but also sequen-
tial geometry-like data such as handwriting recognition and synthesis (Graves,
2013). The ability of the LSTM to learn long-term dependencies (Goodfellow
et al, 2016, 400) in sequences of input data renders it a suitable architecture to
test its abilities on geospatial vector geometries.
The bi-directional architecture feeds the sequence of geometry vertices for-
wards as well as backwards through LSTM cells (Schuster and Paliwal, 1997),
where the resulting output from these cells is concatenated. This allows the
network to learn from the preceding vertices in the geometry as well as the ones
that are ahead. In our model, we configured both the forwards and backwards
LSTM to produce an output of size 32, combined to 64. As with the CNN
model, the last layer is a softmax-activated fully connected layer to produce
probability outputs that sum to one.
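The corresponding recurrent model is equally compact; as with the convolutional sketch, the layer stack follows Figure 3 while the optimizer and loss are assumptions.

```python
# Keras sketch of the recurrent model in Figure 3: one bi-directional LSTM
# (32 units each way, concatenated to 64) followed by a softmax classifier.
from tensorflow.keras import layers, models

def build_rnn(num_classes):
    return models.Sequential([
        layers.Input(shape=(None, 5)),
        layers.Bidirectional(layers.LSTM(32)),   # only the last output is returned
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_rnn(num_classes=10)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```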

3.4 Evaluation methodology


To compare our selected shallow and deep learning models, we designed a set
of experiments applicable to both shallow and deep models, informative to our
research question and straightforward to interpret. We focus on a set of clas-

sification tasks as explained in Section 4, for which our models are required to
correctly assign individual polygons to a certain type, based on only the polygon
shape.
For the classification tasks, we select data sets on the following requirements:
• Each task contributes to evaluating the accuracy of deep learning models versus shallow models. The data sets for the tasks contain
real-world polygon data from different domains, with different use cases
and on different spatial scales;
• Each data set contains enough data to draw conclusions on model general-
ization; we set a requirement for data sets of at least 12,000 geometries in
order to provide a training set of at least 10,000 geometries, a validation
set of at least 1,000 geometries and a test set of at least 1,000 geometries;
• Each task requires the models to infer information from the polygon shape,
and from the polygon shape alone. To this end, data is selected to be likely
to contain shape information relevant to the classification task but not a
trivial solution;
• We require our data to be available under an open license, to be accessible
for future research.
Through the use of classification tasks, both the shallow and deep model perfor-
mance can be directly compared. Classification tasks can be expressed through
the simple metric of accuracy score: the ratio of correctly assigned test samples
over the total set of test samples. We also add a baseline majority class accuracy score: the fraction of the most prevalent class, included as the simplest baseline to exceed. The models are trained on a training data set, iteratively tuned to perform best on an evaluation set, and finally tested once on a test set that was unseen during any of the tuning runs. The resulting test set accuracy scores are listed in Table 3. We include
a discussion of the misclassification behaviour of the models by analysing the
confusion matrices (Stehman, 1997).
For optimal reproducibility, we use only open data7 and open source meth-
ods to answer our research question. We release all preprocessing code, deep
learning models and shallow models as open source software.8 Scikit-learn (Pe-
dregosa et al, 2011) provides the shallow learning algorithms. To evaluate shal-
low model accuracy on each task, we use a brute force grid search with 5-fold
cross validation to find the best applicable hyperparameters for k (k-nearest
neighbours), maximum depth d (decision tree), C (SVM, logistic regression) and gamma
(SVM). SVM grid searches are restricted to a maximum number of 10M itera-
tions to allow the grid search operation to complete within a day. Grid searches
on SVM models and k-nearest neighbours are restricted to a subset of the train-
ing data to allow the grid search to finish within a day on commodity hardware.
7 Data available at http://hdl.handle.net/10411/GYPPBR
8 Code available at https://github.com/SPINlab/geometry-learning
All shallow models are trained, however, using the full training set with the best hyperparameters obtained from the grid search. The deep learning models
are implemented using Keras (Chollet et al, 2015) version 2 with a TensorFlow
(Abadi et al, 2016) version 1.7 backend. All deep model hyperparameters are
tuned on the full training and validation set, using validation data split ran-
domly from the training data.
For the grid searches, we do not assume that including an arbitrarily high
number of descriptors produces the best accuracy score. Instead, the number of
extracted Fourier descriptors used during training is included as a hyperparam-
eter in the grid search for each shallow model, to produce the descriptor order
at which the grid search obtains the best results. The best parameters found in
the grid searches are listed in Table 4 of Appendix A.
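One way to include the descriptor order in a scikit-learn grid search is to precompute descriptors at the maximum order and truncate them inside a pipeline, as in the hedged sketch below. The column layout of X and the parameter ranges are illustrative assumptions; the actually searched ranges are listed in Table 4.

```python
# Sketch of a grid search that treats the EFD order as a hyperparameter.
# Assumption: X holds [area, vertex count, boundary length] in the first three
# columns, followed by descriptor coefficients precomputed at the maximum order
# (here assumed to be 4 coefficients per harmonic).
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

class EFDOrderSelector(BaseEstimator, TransformerMixin):
    """Keep the base features plus the first `order` harmonics."""
    def __init__(self, order=4):
        self.order = order
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[:, :3 + 4 * self.order]

pipeline = Pipeline([("efd", EFDOrderSelector()), ("clf", SVC(kernel="rbf"))])
param_grid = {
    "efd__order": [0, 1, 2, 3, 4, 6, 8, 12, 16, 20, 24],
    "clf__C": 10.0 ** np.arange(-2, 4),       # illustrative log-scale range
    "clf__gamma": 10.0 ** np.arange(-3, 4),
}
search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# search.fit(X, y); search.best_params_
```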

4 Tasks
We created a set of three classification tasks to evaluate the performance of
several machine learning algorithms in shape recognition on real-world poly-
gon data. From the requirements listed in Section 3.4 the following tasks and
accompanying data sets were selected:
1. Predicting whether the number of inhabitants in a neighbourhood is above
or below the national median, based on the neighbourhood geometry;
2. Predicting a building class from the building contour polygon. The avail-
able classes are buildings for the purpose of gathering; industrial activity;
lodging; habitation; shopping; office buildings; health care; education; and
sports.
3. Predicting an archaeological feature type from its geometry. Features
are available as an instance of either a layer; wall; ditch; pit; natural
phenomenon; post hole; well; post hole with visible post; wooden object;
or recent disturbance.
The classes and their frequencies are displayed in Table 2.

4.1 Neighbourhood inhabitants


The first task is to predict the number of inhabitants for a certain neighbour-
hood to be above or below the Dutch national median of 735 inhabitants. A
neighbourhood is a geographical region of varying size and shape, as defined
by Statistics Netherlands.9 The data was harvested from the 2017 districts
and neighbourhoods Web Feature Service.10 For the sake of evaluation sim-
plicity, the task has been shaped into a binary class prediction: to predict a
neighbourhood for having equal or more (6,610 neighbourhoods) or less (6,598
9 https://www.cbs.nl/-/media/_pdf/2017/36/2017ep37%20toelichting%20wijk%20en%20buurtkaart%202017.pdf (only available in Dutch).
10 https://geodata.nationaalgeoregister.nl/wijkenbuurten2017/wfs

[Figure 6 map; legend: neighbourhood inhabitants below median / equal or above median]

Figure 6: Map visualisation of neighbourhoods at scale 1:25,000 in Amsterdam near the VU University complex. In bold the absolute number of inhabitants per neighbourhood. Zero-count neighbourhoods are areas such as parks or office zones. Map coordinates in WGS84 degrees.

[Figure 7 map; legend: Gathering, Health care, Industry, Office, Lodging, Education, Sports, Shopping, Habitation, Other]

Figure 7: Map visualisation of buildings at scale 1:10,000 near the VU University complex. Map coordinates in WGS84 degrees.

[Figure 8 map; legend: Ditch, Wooden object, Pit, Layer, Wall, Natural, Post hole with post, Post hole, Recent, Well]

Figure 8: Map visualisation of archaeological ground features at scale 1:500, from (Roessingh and Lohof, 2010). Map coordinates in WGS84 degrees.

Neighbourhood inhabitants:
Class                          Frequency
≥ median                       6,610
< median                       6,598
Total                          13,208

Building types:
Function                       Frequency
Habitation                     23,000
Industrial                     23,000
Lodging                        23,000
Shopping                       23,000
Gatherings                     22,007
Office                         21,014
Education                      10,717
Healthcare                     7,832
Sports                         6,916
Total                          160,486

Archaeological features:
Class                          Frequency
Post hole                      24,991
Pit                            7,713
Natural phenomenon             6,136
Recent disturbance             5,625
Ditch                          4,926
Wooden object                  2,499
Layer                          1,321
Wall                           1,387
Post hole with visible post    1,005
Water well                     980
Total                          56,583

Table 2: Class frequency for the three tasks of neighbourhood inhabitants, building types and archaeological feature types.

neighbourhoods) than the median of the absolute number of inhabitants of the entire set of neighbourhoods for the Netherlands for the year 2017. The me-
dian was chosen to create an even split11 of the two classes. For purposes of
evaluation simplicity it is appropriate to estimate the median on the absolute
number of inhabitants rather than population density. Calculating population
density would add complexity to the task by requiring the models to divide the
estimated number of inhabitants over an estimate of the neighbourhood sur-
face area. Designed as a binary classification task, its purpose is primarily to
evaluate model performance. However, the task could be re-designed to solve
real-world problems: for example in estimating population sizes or densities for
disaster areas based on topographical region shapes (Brown et al, 2001).

4.2 Building types


The second task is to classify a building from the building footprint geometry,
harvested from the buildings and addresses base registration (BAG) Web Fea-
ture Service.12 The data set consists of buildings in nine functional classes, for
a total of 160,486 buildings in the combined training and test dataset. Since the
complete source data set comprises over five million buildings, each class was
11 The slight difference between the class frequencies is explained by a higher occurrence of neighbourhoods with a number of inhabitants equal to the median number.
12 https://geodata.nationaalgeoregister.nl/bag/wfs

trimmed to a maximum of the first 23,000 instances to prevent creating
a data set too large to experiment on. With this selection, the buildings set still
has the most data of the three tasks. The task of classifying a building based on
its shape can have clear quality control benefits: one could assess the likelihood
of a specific building to belong to a specific type based on shape characteris-
tics. Subsequently, a system could flag buildings as suspect that fall below a
likelihood threshold for their current building class and suggest a replacement
building class based on a higher likelihood.
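Such a quality-control step could be sketched as follows; the probability vector stands in for a classifier's softmax output for one building, and the threshold is a hypothetical value.

```python
# Hypothetical post-processing for the quality-control idea above: flag a building
# whose predicted probability for its registered class is below a threshold and
# suggest the most likely alternative class.
import numpy as np

def flag_suspect(probabilities, registered_class, threshold=0.1):
    if probabilities[registered_class] < threshold:
        return True, int(np.argmax(probabilities))   # suspect, with suggested class
    return False, registered_class
```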

4.3 Archaeological features


The third task is to classify an archaeological feature from its observed geome-
try. Archaeological features are field observations of disturbances in the subsoil
as a result of human activities in the past. Archaeological institutions in the
Netherlands store the results of archaeological field research in a digital repos-
itory (Gilissen, 2017; Hollander, 2014). From the DANS EASY repository,13 a total of 56,583 geometries in ten classes was collected from ten archaeological projects (Roessingh and Lohof, 2010; Gerrets and Jacobs, 2011; Van der Veken and Prangsma, 2011; Dijkstra and Zuidhoff, 2011; Dijkstra et al, 2010; van de Velde et al, 2002; van der Velde, 2011; Dijkstra, 2012; Roessingh and Blom, 2012; Van der Veken and Blom, 2012). Of the three tasks, the class distribution for this dataset is the most unbalanced: the data shows a clear over-representation of post holes (44.2%). Although the archaeological data
task was designed for our experimental model evaluation purposes primarily,
the ability to infer feature types from archaeological feature shapes is useful
in field work, where a good performing model can assist in a documentation
system, suggesting a likely feature class based on shape information.

5 Evaluation
The accuracy scores of the model performance on each of the three tasks allow
us to compare the performance of the deep learning models against the shallow
learning models. Table 3 shows the results for each of the three benchmark
tasks. The accuracy scores were produced from model predictions on the test
set, consisting of geometries in the data set that were unseen by the models
during training. The deep learning model experiments were repeated ten-fold:
randomized network initialisation and batch sampling produce slight variations
in accuracy scores between training sessions. The accuracy figures for the deep
neural models therefore represent mean and standard deviation from the test
predictions on the independently repeated training sessions.
We note the following conclusions from these accuracy scores:
1. The deep neural nets are at least competitive with the best shallow models,
for each of the three tasks. In five out of the six deep learning experiments,
13 https://easy.dans.knaw.nl
Method               Neighbourhood        Building types     Archaeological
                     inhabitants (2)      (9)                feature types (10)
Majority class       0.514                0.142              0.444
k-NN                 0.671                0.377              0.596
Logistic regression  0.659                0.328              0.555
SVM RBF              0.683                0.365              0.601
Decision tree        0.682                0.389              0.615
CNN                  0.664 ± 0.005        0.408 ± 0.003      0.624 ± 0.002
RNN                  0.608 ± 0.016        0.389 ± 0.008      0.614 ± 0.004

Table 3: Results with accuracy scores for the majority class (top row), the shallow models (middle four rows) and the deep learning models (bottom two rows), with the best scores per task in bold. The number of classes per task is listed between brackets in the column headers. The standard deviations on the deep learning models were obtained from test set predictions on ten-fold repeated, independent training sessions.

the deep models perform on par with or slightly better than the best
shallow models, but in the broad sense they do not outperform the shallow
models by a wide margin.
2. On two of the three classification tasks, the CNN architecture is able
to outperform the shallow models by a few percentage points. If top
performance in a certain geometry classification task is required, the CNN
is likely to be a good choice.
To gain further insight into model performance, we include an analysis of the confusion matrices of the test runs. As there are 18 confusion matrices in total, these are not included in the article.14 In general the misclassification is reflected in similar patterns across all models, with higher errors for models that under-perform on a certain task. A few task-specific details are discussed in Appendix B, but in
general the misclassification behaviour of the deep models does not differ from
the shallow models.
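Confusion matrices of this kind can be computed with scikit-learn, for example as in this small sketch; the label lists below are stand-ins for the real test labels and a model's predictions.

```python
# Sketch of producing a confusion matrix and per-class recall with scikit-learn.
from sklearn.metrics import confusion_matrix

y_test = [0, 1, 1, 2]   # stand-ins for the real test labels and predictions
y_pred = [0, 1, 2, 2]
cm = confusion_matrix(y_test, y_pred)
per_class_recall = cm.diagonal() / cm.sum(axis=1)  # fraction of each true class recovered
```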
As mentioned in Section 3.4, the number of elliptic Fourier descriptors
used for training the shallow models was included as a hyperparameter in the
grid search. A closer inspection of these hyperparameters in Appendix A is of
interest:
1. Nearly all shallow models benefit from adding Fourier descriptors. A no-
table exception is the k-nearest neighbours algorithm, which scores the
14 The confusion matrices are available for download from https://dataverse.nl/api/access/datafile/13051

highest accuracy in two of the three tasks only when no Fourier descrip-
tors are added to the training data. As it appears from the three tasks,
the k-NN algorithm is less able to extract meaningful information from
the Fourier descriptors.
2. The shallow models have a clear preference for lower orders of Fourier de-
scriptors. Even though many higher orders (up to order 24) were tested,
no shallow model was able to perform better on descriptor orders higher
than four. Order four descriptors only provide a very rough approxima-
tion of the original geometry, as is well visualised in Kuhl and Giardina (1982, 243) and in Figure 1. Still, the descriptors
evidently contain enough important shape information for most shallow
models to improve the accuracy score.
3. Support vector machines come with a misclassification tolerance hyperpa-
rameter C. In situations where SVMs with high C settings (low tolerance)
lead to higher performance, such as the ones for the archaeology classifica-
tion task, the models were exceedingly time-consuming to train on our data. Where low C-values tended to converge in seconds, high values could take days or even weeks to converge. To prevent having to wait for extended periods of time (there is no indication in what time frame a training session on a set of hyperparameters will converge), we needed to constrain the amount of training data and the maximum number of it-
erations, especially on hyperparameter grid searches. It is quite possible
that as a consequence of these constraints, the grid search fails to produce
the optimal hyperparameter settings, but this is an unfortunate side effect
of using SVMs on Fourier descriptors of geometries.

6 Conclusion and future research


In this article, we compared the accuracy of deep learning models against base-
lines of shallow learning methods on geospatial vector data classification tasks.
We evaluated two deep learning models and four shallow learning models on
three new classification tasks involving only geometries. To answer the ques-
tion whether deep learning performs better than shallow learning, we directly
compared the accuracy scores of these models on these three tasks. From our
experiments we showed that our deep learning models are competitive with
established shallow machine learning models, in two of the three tasks outper-
forming them by a small margin.
None of the chosen recognition tasks appear to be trivial. Classifying objects
from geometries alone is a tough assignment for any algorithm. The advantage
of having a set of tough tasks is that these can serve as benchmarks: in future experiments with different learning algorithms or model layouts, it may be possible to obtain higher accuracy scores. However, there is a possibility that the accuracy
figures presented in the evaluation represent the maximum that can be learned
from geometries alone. From these experiments alone it cannot be deduced

whether these figures can actually be improved on. If there is a hard ceiling
at the best performing models, perhaps the benchmarks can be improved by
including more data than just the geometries alone, for example information
gathered from the direct spatial surroundings or other properties of the spatial
objects. The benchmark presented here can be considered a first attempt.
An area that might see improvement is the performance of LSTMs. In an
earlier development stage, the LSTMs were trained on fixed length rather than
on the variable length sequences. During this stage, the LSTMs performed sig-
nificantly better (on validation data, no final tests were performed) on fixed
length sequences, outperforming the CNNs. However, training on fixed length
sequences was abandoned because it requires simplifying geometries to a fixed
maximum of points per geometry. This was not consistent with our aim of
training on all available information. Creating fixed length sequences also re-
quired adding a large amount of zero-padding to increase sequence length on
all geometries shorter than the fixed size. After switching to variable length
sequences, the performance of the CNN models increased and the LSTM per-
formance dropped considerably. We hypothesize that there is room to improve the LSTM model configuration to match CNN model performance or perhaps even exceed it. To test this in future research, the fixed length sequences were included
in the benchmark data.
There are several open questions to further explore the use of deep learning
models for geometries. It would be helpful to verify the accuracy on other
types of geometries, such as multi-lines, multi-points or even heterogeneous
geometry collections. Also, enabling the deep neural nets to comprehend holes in polygons could be beneficial. Another interesting road to explore is to combine
geometries with other information sources as input data, such as remote sensing
data or textual descriptions. Deep learning poses a viable route to explore more
complex pipelines. Such pipelines could include geometries and other modalities
as inputs, to produce multi-modal combinations of sequences (Sutskever et al,
2014), images (He et al, 2017) or new geometries (Ha and Eck, 2018) as output.
This paper is a step in that direction, by showing that it is possible to have deep neural nets learn from geometries directly.

Appendices
A Hyperparameter grid search results for shallow models
The grid searches discussed in Section 3 resulted in a set of best hyperparameter
settings for the shallow models. These best settings are listed in Table 4 and
include the ranges that were searched. The range for the elliptic Fourier descriptor order $o$ is always the same: each grid search was executed on the orders $\langle 0, 1, 2, 3, 4, 6, 8, 12, 16, 20, 24 \rangle$. The search intervals for the other hyperparam-

Method               Neighbourhood inhabitants (2)   Building types (9)           Archaeological feature types (10)
Decision tree        o=2; d=6 in [4, 9]              o=3; d=10 in [6, 12]         o=3; d=9 in [5, 10]
k-NN                 o=1; k=26 in [21, 30]           o=0; k=29 in [21, 30]        o=0; k=29 in [21, 30]
SVM RBF              o=1; C=1 in 1e[-2, 3];          o=0; C=1000 in 1e[-2, 3];    o=2; C=100 in 1e[-1, 3];
                     γ=1 in 1e[-3, 3]                γ=10 in 1e[-2, 3]            γ=0.01 in 1e[-4, 4]
Logistic regression  o=1; C=0.01 in 1e[-3, 1]        o=4; C=1 in 1e[-2, 3]        o=8; C=1000 in 1e[-2, 3]

Table 4: Hyperparameters for the shallow models. Interval values for decision tree and k-nearest neighbours ∈ N; for the SVM, in log scale, with the exponent interval ∈ N.

eters are listed in Table 4. The k-hyperparameter of the k-nearest neighbour models and the maximum depth d hyperparameter for the decision tree have
intervals with values ∈ N. For the other hyperparameters, the listed interval
values are powers of ten, as indicated by the scientific notation.

B Confusion matrix discussion


There are a few task-specific details to the confusion matrices mentioned in
Section 5. Interesting to note is that all models over-estimate the number of inhabitants on the test set. Due to random selection, there is a slight over-representation of neighbourhoods below the median in the test set (51.4%), but this does not account for the large over-estimation of the models. Over-estimations are about twice (1.92 times) as frequent as under-estimations. We can only speculate on why this is; it may be due to some shape imbalance in the random test set selection. All models struggle to distinguish
between buildings with lodging function and habitation, which is understandable
since any house can be made suitable for temporary lodging. Interestingly, the
RNN model appears less prone to identifying lodging as habitation, but only
to lose this advantage to a higher misclassification of habitation as lodging.
Similarly, on the archaeology task all models have particular trouble separating
natural phenomena from post holes, which can be attributed to being of similar
size and shape. Many archaeological features are preliminarily marked as post holes and only later, after cross-sections are made, identified as natural phenomena.

References
Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis
A, Dean J, Devin M, Ghemawat S, Goodfellow IJ, Harp A, Irving G, Isard
M, Jia Y, Józefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga
R, Moore S, Murray DG, Olah C, Schuster M, Shlens J, Steiner B, Sutskever
I, Talwar K, Tucker PA, Vanhoucke V, Vasudevan V, Viégas FB, Vinyals
O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2016) Tensorflow:
Large-scale machine learning on heterogeneous distributed systems. CoRR
abs/1603.04467, URL http://arxiv.org/abs/1603.04467
Andrášik R, Bíl M (2016) Efficient road geometry identification from digital vector data. Journal of Geographical Systems 18(3):249–264, DOI 10.1007/s10109-016-0230-1, URL https://doi.org/10.1007/s10109-016-0230-1

Araya YH, Remmel TK, Perera AH (2016) What governs the presence of
residual vegetation in boreal wildfires? Journal of Geographical Systems
18(2):159–181, DOI 10.1007/s10109-016-0227-9, URL https://doi.org/10.1007/s10109-016-0227-9
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly
learning to align and translate. CoRR abs/1409.0473, URL http://arxiv.org/abs/1409.0473
Ball JE, Anderson DT, Chan CS (2017) Comprehensive survey
of deep learning in remote sensing: theories, tools, and chal-
lenges for the community. Journal of Applied Remote Sensing
11(4):042609, URL https://www.spiedigitallibrary.org/journals/Journal-of-Applied-Remote-Sensing/volume-11/issue-4/042609/Comprehensive-survey-of-deep-learning-in-remote-sensing--theories/10.1117/1.JRS.11.042609.full
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal mar-
gin classifiers. In: Proceedings of the fifth annual workshop on Computational
learning theory, ACM, pp 144–152, URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.3818&rep=rep1&type=pdf
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and re-
gression trees. Wadsworth & Brooks/Cole Advanced Books & Software

Brezina T, Graser A, Leth U (2017) Geometric methods for estimating representative sidewalk widths applied to Vienna's streetscape surfaces database. Journal of Geographical Systems 19(2):157–174, DOI 10.1007/s10109-017-0245-2, URL https://doi.org/10.1007/s10109-017-0245-2

Brown V, Jacquier G, Coulombier D, Balandine S, Belanger F, Legros D (2001)
Rapid assessment of population size by area sampling in disaster situations.
Disasters 25(2):164–171, URL http://www.parkdatabase.org/files/
documents/2001_brown_et_al_rapid_assessment_of_population_size_
by_area_sampling_in_disaster_situations.pdf
Cheng G, Han J (2016) A survey on object detection in optical remote sensing
images. ISPRS Journal of Photogrammetry and Remote Sensing 117:11–28,
URL https://arxiv.org/pdf/1603.06201
Chollet F, et al (2015) Keras. https://github.com/fchollet/keras
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE transac-
tions on information theory 13(1):21–27, URL https://www.cs.bgu.ac.il/
~adsmb182/wiki.files/borak-lecture%20notes.pdf
Cox DR (1958) The regression analysis of binary sequences. Journal of the Royal
Statistical Society Series B (Methodological) 20(2):215–242, URL http://
www.jstor.org/stable/2983890
Demir I, Koperski K, Lindenbaum D, Pang G, Huang J, Basu S, Hughes F, Tuia
D, Raskar R (2018) Deepglobe 2018: A challenge to parse the earth through
satellite images. ArXiv e-prints URL https://arxiv.org/abs/1805.06561
Dijkstra J (2012) Wijk bij Duurstede veilingterrein do opgraving. https://doi.
org/10.17026/dans-x8d-qmae, DOI 10.17026/dans-x8d-qmae
Dijkstra J, Zuidhoff F (2011) Veere rijksweg N57 proefsleuven begeleid-
ing opgraving. https://doi.org/10.17026/dans-xyc-re2w, DOI 10.17026/
dans-xyc-re2w
Dijkstra J, Houkes M, Ostkamp S (2010) Gouda Bolwerk opgraving en
begeleiding. https://doi.org/10.17026/dans-xzm-x29h, DOI 10.17026/
dans-xzm-x29h
Domingos P (2012) A few useful things to know about machine learning.
Communications of the ACM 55(10):78–87, URL https://dl.acm.org/
citation.cfm?id=2347755
Douglas DH, Peucker TK (1973) Algorithms for the reduction of the number
of points required to represent a digitized line or its caricature. Cartograph-
ica: The International Journal for Geographic Information and Geovisual-
ization 10(2):112–122, DOI 10.3138/FM57-6770-U75U-7727, URL https:
//doi.org/10.3138/FM57-6770-U75U-7727
Effati M, Thill JC, Shabani S (2015) Geospatial and machine learning tech-
niques for wicked social science problems: analysis of crash severity on
a regional highway corridor. Journal of Geographical Systems 17(2):107–
135, DOI 10.1007/s10109-015-0210-x, URL https://doi.org/10.1007/
s10109-015-0210-x

26
Fan H, Zipf A, Fu Q, Neis P (2014) Quality assessment for building footprints
data on openstreetmap. International Journal of Geographical Information
Science 28(4):700–719
Gerrets D, Jacobs E (2011) Venlo TPN deelgebied 1 en 2 opgraving. https:
//doi.org/10.17026/dans-26f-55zu, DOI 10.17026/dans-26f-55zu

Gilissen V (2017) Archiving the past while keeping up with the times. Studies
in Digital Heritage 1(2):194–205, DOI 10.14434/sdh.v1i2.23238, URL https:
//doi.org/10.14434/sdh.v1i2.23238
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair
S, Courville A, Bengio Y (2014) Generative adversarial nets. In:
Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ
(eds) Advances in Neural Information Processing Systems 27, Curran
Associates, Inc., pp 2672–2680, URL http://papers.nips.cc/paper/
5423-generative-adversarial-nets.pdf

Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. MIT Press, http:


//www.deeplearningbook.org
Graves A (2013) Generating sequences with recurrent neural networks. CoRR
abs/1308.0850, URL http://arxiv.org/abs/1308.0850, 1308.0850
Ha D, Eck D (2018) A neural representation of sketch drawings. In: International
Conference on Learning Representations, URL https://openreview.net/
forum?id=Hy6GHpkCW
Hagenauer J (2016) Weighted merge context for clustering and quantizing spa-
tial data with self-organizing neural networks. Journal of Geographical Sys-
tems 18(1):1–15, DOI 10.1007/s10109-015-0220-8, URL https://doi.org/
10.1007/s10109-015-0220-8
He K, Gkioxari G, Dollár P, Girshick RB (2017) Mask R-CNN. CoRR
abs/1703.06870, URL http://arxiv.org/abs/1703.06870, 1703.06870
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Compu-
tation 9(8):1735–1780, DOI 10.1162/neco.1997.9.8.1735, URL https://doi.
org/10.1162/neco.1997.9.8.1735
Hollander H (2014) The e-depot for dutch archaeology. archiving and publica-
tion of archaeological data. In: Conference on Cultural Heritage and New
Technologies (CHNT), Vienna, Stadt Archäologie Wien

Huisman O, De By R (2009) Principles of geographic information systems.


ITC Educational Textbook Series 1, URL https://kartoweb.itc.nl/
geometrics/Publications/PoGIS2009%20Chapter%204%20selection.pdf

27
Jiang T, Xia G, Lu Q (2017) Sketch-based aerial image retrieval. In: 2017 IEEE
International Conference on Image Processing (ICIP), pp 3690–3694, DOI 10.
1109/ICIP.2017.8296971, URL http://captain.whu.edu.cn/papers/ICIP_
jiang.pdf
Keyes L, Winstanley AC (1999) Fourier descriptors as a general classification
tool for topographic shapes. IPRCS, pp 193–203, URL http://eprints.
maynoothuniversity.ie/66/
Kuhl FP, Giardina CR (1982) Elliptic fourier features of a closed contour.
Computer graphics and image processing 18(3):236–258, DOI 10.1016/
0146-664X(82)90034-X, URL http://dx.doi.org/10.1016/0146-664X(82)
90034-X
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–
444, DOI doi:10.1038/nature14539, URL http://dx.doi.org/10.1038/
nature14539
Loncaric S (1998) A survey of shape analysis techniques. Pattern Recog-
nition 31(8):983 – 1001, DOI https://doi.org/10.1016/S0031-2023(97)
00122-2, URL http://www.sciencedirect.com/science/article/pii/
S0031202397001222
Montero JM, Mı́nguez R, Fernández-Avilés G (2018) Housing price predic-
tion: parametric versus semi-parametric spatial hedonic models. Journal of
Geographical Systems 20(1):27–55, DOI 10.1007/s10109-017-0257-y, URL
https://doi.org/10.1007/s10109-017-0257-y
Mou L, Ghamisi P, Zhu XX (2017) Deep recurrent neural networks for hyper-
spectral image classification. IEEE Transactions on Geoscience and Remote
Sensing 55(7):3639–3655, DOI 10.1109/TGRS.2016.2636241, URL http://
dx.doi.org/10.1109/TGRS.2016.2636241
Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY (2011) Multimodal deep
learning. In: Proceedings of the 28th international conference on machine
learning (ICML-11), pp 689–696, URL http://ai.stanford.edu/~ang/
papers/icml11-MultimodalDeepLearning.pdf

Olah C (2014) Conv nets: A modular perspective. URL https://colah.


github.io/posts/2014-07-Conv-Nets-Modular/
Olah C (2015) Understanding lstm networks. URL https://colah.github.io/
posts/2015-08-Understanding-LSTMs/

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel


M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau
D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning
in Python. Journal of Machine Learning Research 12:2825–2830, URL http:
//www.jmlr.org/papers/v12/pedregosa11a.html

28
Roessingh W, Blom E (2012) Oosterhout Vrachelen de Contreie Vrachelen
4 opgraving. https://doi.org/10.17026/dans-25d-fpe5, DOI 10.17026/
dans-25d-fpe5
Roessingh W, Lohof E (2010) Enkhuizen Kadijken 5a en 5b opgraving. https:
//doi.org/10.17026/dans-27r-e5f8, DOI 10.17026/dans-27r-e5f8
Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks.
IEEE Transactions on Signal Processing 45(11):2673–2681, DOI
10.1109/78.650093, URL https://www.researchgate.net/profile/Mike_
Schuster/publication/3316656_Bidirectional_recurrent_neural_
networks/links/56861d4008ae19758395f85c.pdf
Stehman SV (1997) Selecting and interpreting measures of thematic classifi-
cation accuracy. Remote sensing of Environment 62(1):77–89, URL https:
//www.researchgate.net/profile/Stephen_Stehman/publication/
222169047_Selecting_and_interpreting_measures_of_thematic_
classification_accuracy/links/5b5a0fe5a6fdccf0b2f8fe87/
Selecting-and-interpreting-measures-of-thematic-classification-accuracy.
pdf
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learn-
ing with neural networks. In: Advances in neural information pro-
cessing systems, pp 3104–3112, URL http://papers.nips.cc/paper/
5346-sequence-to-sequence-learning-with-neural-networks.pdf
Van der Veken B, Blom E (2012) Veghel Scheiffelaar ii opgraving. https://
doi.org/10.17026/dans-z93-7zbe, DOI 10.17026/dans-z93-7zbe
Van der Veken B, Prangsma N (2011) Montferland Didam westelijke randweg
Kerkwijk opgraving. https://doi.org/10.17026/dans-zmk-35vy, DOI 10.
17026/dans-zmk-35vy
van der Velde H (2011) Katwijk Zanderij Westerbaan opgraving. https://doi.
org/10.17026/dans-znz-r2ba, DOI 10.17026/dans-znz-r2ba
van de Velde H, Ostkamp S, Veldman H, Wyns S (2002) Venlo Maasboulevard.
https://doi.org/10.17026/dans-x84-msac, DOI 10.17026/dans-x84-msac
Wang Y, Zhang L, Tong X, Liu S, Fang T (2017) A feature extraction and
similarity metric-learning framework for urban model retrieval. International
Journal of Geographical Information Science 31(9):1749–1769, DOI 10.1080/
13658816.2017.1334888, URL https://doi.org/10.1080/13658816.2017.
1334888, https://doi.org/10.1080/13658816.2017.1334888
Wu Q, Diao W, Dou F, Sun X, Zheng X, Fu K, Zhao F (2016) Shape-based ob-
ject extraction in high-resolution remote-sensing images using deep boltzmann
machine. International Journal of Remote Sensing 37(24):6012–6022, DOI 10.
1080/01431161.2016.1253897, URL https://doi.org/10.1080/01431161.
2016.1253897, https://doi.org/10.1080/01431161.2016.1253897

29
Xu Y, Chen Z, Xie Z, Wu L (2017) Quality assessment of building footprint
data using a deep autoencoder network. International Journal of Geographical
Information Science 31(10):1929–1951, URL http://www.tandfonline.com/
doi/abs/10.1080/13658816.2017.1341632
Zahn CT, Roskies RZ (1972) Fourier descriptors for plane closed curves. IEEE
Transactions on computers C-21(3):269–281, DOI 10.1109/TC.1972.5008949,
URL http://dx.doi.org/10.1109/TC.1972.5008949
Zhang D, Lu G, et al (2002) A comparative study of fourier descriptors for shape
representation and retrieval. In: Proc. of 5th Asian Conference on Computer
Vision (ACCV), Citeseer, pp 646–651, URL http://citeseerx.ist.psu.
edu/viewdoc/summary?doi=10.1.1.73.5993
Zhu XX, Tuia D, Mou L, Xia G, Zhang L, Xu F, Fraundorfer F (2017)
Deep learning in remote sensing: A comprehensive review and list of re-
sources. IEEE Geoscience and Remote Sensing Magazine 5(4):8–36, DOI
10.1109/MGRS.2017.2762307, URL http://arxiv.org/abs/1710.03959,
1710.03959

30
