Deep Learning For Classification Tasks On Geospatial Vector Polygons
Abstract
In this paper, we evaluate the accuracy of deep learning approaches
on geospatial vector geometry classification tasks. The purpose of this
evaluation is to investigate the ability of deep learning models to learn
from geometry coordinates directly. Previous machine learning research
applied to geospatial polygon data did not use geometries directly, but
relied on derived properties, obtained by extracting geometry features
such as Fourier descriptors. In contrast, the deep neural net architectures
introduced here learn on sequences of coordinates mapped directly from
polygons. In three classification tasks we show that the deep learning
architectures are competitive with common learning algorithms that require
extracted features.
Acknowledgements
This work was supported by the Dutch National Cadastre (Kadaster) and the
Amsterdam Academic Alliance Data Science (AAA-DS) Program Award to the
UvA and VU Universities. We would also like to thank the following organisa-
tions. The source data for the neighbourhoods task is published by Statistics
Netherlands (CBS) and distributed by the Publieke Dienstverlening op de Kaart
organization (PDOK) under a Creative Commons (CC) Attribution license. The
data for the buildings task was published by the Dutch National Cadastre un-
der a CC Zero license. The archaeological data in raw form is hosted by Data
Archiving and Networked Services, and re-licensed by kind permission of copy-
right holder ADC ArcheoProjecten under CC-BY-4.0. We thank Henk Scholten,
Frank van Harmelen, Xander Wilcke, Maurice de Kleijn, Jaap Boter, Chris Lu-
cas, Eduardo Dias, Brian de Vogel and anonymous reviewers for their helpful
comments.
∗ Vrije Universiteit Amsterdam, Kadaster, Geodan: [email protected], orcid: https://orcid.org/0000-0003-0520-6684
† Vrije Universiteit Amsterdam, orcid: https://orcid.org/0000-0002-0189-5817
‡ University of Twente, Kadaster, orcid: https://orcid.org/0000-0002-7845-1763
1 Introduction
For many tasks, it is useful to analyse the geometric shapes of geospatial objects,
such as in the quality assessment or enrichment of map data (Fan et al, 2014) or
in the classification of topographical objects (Keyes and Winstanley, 1999).
Machine learning, which learns from data by extracting patterns (Goodfellow et al,
2016, 2), is increasingly used in geospatial analysis tasks. For example, machine
learning can be applied to classify building types (Xu et al, 2017), analyse
wildfires (Araya et al, 2016) and traffic safety (Effati et al, 2015), cluster
spatial objects (Hagenauer, 2016), detect aircraft shapes (Wu et al, 2016) or
classify road sections (Andrášik and Bíl, 2016): tasks that extend beyond standard
GIS processing operations. The prediction of house prices (Montero et al, 2018)
and the estimation of pedestrian sidewalk widths (Brezina et al, 2017) are tasks
that could also benefit from machine learning analysis of geometric shapes.
Deep learning is a relatively new addition to the collection of machine learn-
ing methods. Deep learning allows stacking multiple learning layers to form
a model that is able to train latent representations at varying levels of data
abstraction (LeCun et al, 2015). In this paper, we will use the term shallow
machine learning (Ball et al, 2017, 2-3) to refer to methods that are not based
on deep learning methods. A distinguishing property of deep versus shallow
learning methods is that shallow learning requires a preprocessing step known
in machine learning as feature extraction (LeCun et al, 2015, 438), a lossy data
transformation process. Shallow models require feature vectors as input data,
so when using data of variable length such as geometries, shallow learning al-
gorithms depend on feature extraction. One advantage of deep models over
shallow models is that these feature extraction methods are not required for
deep learning, which is why we want to explore the abilities of deep learning
to operate on all available geometry data rather than on an extracted set of
features.
The purpose of this article is to assess the accuracy of working with vec-
tor geometries in deep neural nets, by comparing them with existing shallow
machine learning methods in an experiment with three classification tasks on
vector polygons. Our main objective is to train deep learning models on all
available data. From this objective we do not require our deep learning models
to exceed shallow model accuracy, but we do require the deep models to at least
match shallow model accuracy. Thus, the main question we want to answer is:
Can deep learning models achieve accuracies comparable with
shallow learning models in analysing geospatial vector shapes?
These are the contributions made in this paper:
1. We compare the performance of shallow and deep learning methods on
geospatial vector data, as detailed in Section 3. We show that the deep
learning models introduced here match shallow models in accuracy at
classification tasks on real-world geospatial polygon data.
2. We introduce three classification tasks restricted to geospatial vector polygons
that serve as a novel and open access benchmark on geospatial vector shape
recognition, detailed in Section 4. The benchmark data files are available as
open data.1

Since the domains of geospatial information systems (GIS) and machine
learning (ML) have partially overlapping vocabularies, we provide Table 1 of
homonyms and their use in the two fields of GIS and ML. Where used in this
article, the terms are clarified by their field or, where possible, avoided.

Term                 | GIS                                                                   | ML
Geometry             | A spatial representation of an object encoded as one or more points that may be interconnected |
Vector               | A geometry defined by vertices and edges                              | A one-dimensional array
Vectorization        | Conversion of raster or analog data into geospatial vector geometries | Conversion of data into a tensor interpretable by a machine learning algorithm
Feature              | A geospatial object                                                   | A data property
Shape                | A geospatial object geometry                                          | A tensor size along its dimensions
K-nearest neighbours | The k spatially closest objects                                       | A learning algorithm based on closest resemblance

Table 1: Homonyms and their use in the two fields of GIS and ML.
2. Vector data is almost always more compact in comparison with raster
data. Depending on the accuracy of the source data, materialisation of
vector data into raster data often requires expansion to transform the
vector data into a uniform rasterized sampling of a continuous field.
3. Geospatial vector data can be reasoned over by any Geospatial Information
System (GIS) in terms of topology: properties of geospatial objects with
respect to other geospatial objects in the same set that are invariant under
linear transformations, such as object intersection or spatial adjacency
(Huisman and De By, 2009, 102). With rasterization, this information
may be partially or completely lost: a small gap between two disjoint
geometries for example may be lost if the pixel resolution is lower than
the gap size.
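As a concrete illustration of point 3, the sketch below (assuming the shapely package, which is not part of the paper's pipeline) shows how a small gap between two polygons remains an explicit, queryable topological relation in vector form, whereas a raster at a coarser pixel size can no longer represent it.

```python
# Illustrative sketch (assuming the shapely package): a 0.1-unit gap between
# two polygons is an explicit topological fact in vector form.
from shapely.geometry import Polygon

a = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
b = Polygon([(10.1, 0), (20, 0), (20, 10), (10.1, 10)])   # 0.1-unit gap to a

print(a.disjoint(b))   # True: the polygons neither touch nor overlap
print(a.distance(b))   # ~0.1: the gap width is recoverable from the vector data

# Rasterized with a pixel size larger than 0.1, the gap falls inside a single
# pixel and the disjointness of the two objects can no longer be recovered.
```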
Thus, the rasterization process is trivial but lossy, where the inverse process
of geospatial vectorization is non-trivial and requires human or algorithmic in-
terpretation (Huisman and De By, 2009, 309). For these reasons, it is important
to explore the capabilities of shape analysis by machine learning models without
resorting to rasterization.
The remainder of this article is structured as follows: we position our work within
related research in Section 2, explain the methods of our research in Section 3,
discuss the classification tasks in Section 4, and present the model performance
results on these tasks in Section 5.
2 Related work
The vast majority of machine learning research in the geospatial domain is
focused on analysis of remote sensing data, as shown by overview works from,
for example, Zhu et al (2017) and Ball et al (2017); and by challenges such as the
CrowdAI mapping challenge2 and the DeepGlobe Machine Vision Challenge3
(Demir et al, 2018).
Compared to remote sensing raster data, far fewer publications go into the
matter of analysing geospatial vector shape data through machine learning
strategies. The most common method is to rasterize the vector shapes first.
Xu et al (2017) have published a deep learning strategy for comparing building
footprints. However, the approach by Xu et al. requires preprocessing that ras-
terizes aggregated data and does not classify individual geometries. Similarly,
the shapes in the deep learning image retrieval task through sketches described
by Jiang et al (2017) are raster-based rather than vector-based abstractions, as
are the aircraft shapes extracted from remote sensing in the work by Wu et al
(2016). The work on 3D model retrieval by Wang et al (2017) uses a different
rasterization strategy: 2D-projected images are generated from 3D models to
create an image search database. Cheng and Han (2016, 2-9) survey a number of
2 https://www.crowdai.org/challenges/mapping-challenge
3 http://www.grss-ieee.org/news/the-deepglobe-machine-vision-challenge/
works involving geometric shape data, but aimed at classical (i.e. non-machine
learning) remote sensing object detection strategies rather than at machine
learning analysis of the geometric shapes themselves. In contrast to the
raster-based strategies of these works, we aim to investigate the possibility of
avoiding the rasterization process and operating on geometries directly, as will
be explained in Section 3.2.1.
Research on machine learning analysis of non-rasterized vector shapes is
scarce. The algorithms used by Andrášik and Bı́l (2016) are trained directly
on geometry properties based on angles and radii of vertices in road sections,
extracted from simplified road geometries. Their method, however, is optimized
to the specific task of classifying short road sections from short polylines. Effati
et al (2015, 120-121) adopt a similar strategy for the road properties for the
purposes of traffic safety analysis. We aim to explore more generic shape anal-
ysis methods through machine learning, rather than task-specific ones. A deep
learning model operating on vector geometries was developed by Ha and Eck
(2018), using a model they named sketch-rnn. Sketch-rnn shows how a deep
learning architecture can be used to work with vector geometries directly. The
data for sketch-rnn was collected with a web-based crowd-sourcing tool, inviting
users to draw simple vector drawings of cats, t-shirts and a host of other object
categories. Given an object category, the generative sketch-rnn model is able
to analyse partial shapes drawn by the user and extrapolate these to complete
sketches.4
3 Methods
The classification tasks in this paper operate on real-world polygon data. To
be precise, we use the term polygon to mean a single connected sequence (i.e.
without polygon holes) of three or more coplanar lines. Every line in a polygon
is defined by two points in ℝ², where each point is shared by exactly two lines to
form a closed loop. We impose no validity constraint on polygons, i.e. polygons
may be self-intersecting.
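For illustration, the snippet below (a sketch using shapely, which the definition above does not depend on) builds a self-intersecting "bow-tie" ring; under the definition used here such polygons are kept, since no validity constraint is imposed.

```python
# Sketch of the polygon notion used here: a closed loop of 2-D points with no
# validity constraint, so self-intersecting rings are accepted as-is.
from shapely.geometry import Polygon

ring = [(0, 0), (2, 2), (2, 0), (0, 2)]   # a self-intersecting "bow-tie"
poly = Polygon(ring)                      # the loop is closed implicitly

print(list(poly.exterior.coords))         # 5 points: first point repeated at the end
print(poly.is_valid)                      # False, but the polygon is not discarded
```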
Figure 1: Order 1, 2, 3 and 4 elliptic Fourier reconstruction approximations
(red) of a polygon (blue). Each order level adds to the approximation. Adapted
from Kuhl and Giardina (1982, 237)
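The sketch below illustrates this kind of feature extraction, assuming the pyefd package (an implementation of Kuhl and Giardina's elliptic Fourier descriptors); the exact extraction code used for the experiments is part of the released repository, so treat this only as an outline.

```python
# Sketch of elliptic Fourier feature extraction (assuming the pyefd package,
# which implements Kuhl and Giardina's 1982 descriptors). Low orders give the
# coarse outline approximations of Figure 1; each order adds four coefficients,
# so order o yields a fixed-length feature vector of 4*o values.
import numpy as np
from pyefd import elliptic_fourier_descriptors

# Closed outline of a simple L-shaped polygon (last point repeats the first).
outline = np.array([(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3), (0, 0)],
                   dtype=float)

for order in (1, 2, 3, 4):
    coeffs = elliptic_fourier_descriptors(outline, order=order, normalize=True)
    print(order, coeffs.shape, coeffs.flatten().shape)   # (o, 4) -> 4*o features
```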
3.1.2 Shallow model selection
As explained in Section 1 we distinguish between two families of machine learning
methods. From the shallow model family, we selected four standard algorithms:
k-nearest neighbours (Cover and Hart, 1967), logistic regression (Cox, 1958),
support vector machines with a radial basis function (RBF) kernel (Boser et al,
1992) and decision trees (Breiman et al, 1984).
the polygon as the final stop [0, 0, 1]. In case of a multipolygon, each
sub-polygon is terminated by a sub-geometry stop [0, 1, 0] except for the last,
which is marked as a final stop.

and

    b_i^max = max(p_i) − p̄_i .                                   (3)

This is a simpler two-value version of the standard bounding box that would
normally list the minimum and maximum values for a geometry in two dimensions.
Scale factor s is then computed as the scalar standard deviation over all
bounding values B:
(a) [Map of the building polygon; axes in degrees longitude/latitude.]

(b) Coordinate normalization:

Polygon coordinates | Center: remove mean of [4.8644271, 52.3339057] | Scale: divide by scale factor of 2.64501e-4
4.86447, 52.33384   | 4.2857e-5, -6.5714e-5    | 0.16198, -0.24845
4.86447, 52.33386   | 4.2857e-5, -4.5714e-5    | 0.16198, -0.17283
4.86456, 52.33386   | 1.32857e-4, -4.5714e-5   | 0.50229, -0.17283
4.86456, 52.33386   | 1.32857e-4, 1.44286e-4   | 0.50229, 0.54550
4.86423, 52.33405   | -1.97143e-4, 1.44286e-4  | -0.74534, 0.54550
4.86423, 52.33405   | -1.97143e-4, -6.5714e-5  | -0.74534, -0.24845
4.86447, 52.33384   | 4.2857e-5, -6.5714e-5    | 0.16959, -0.24845

(c) Tensor representation:

[0.16198, -0.24845, 1, 0, 0],
[0.16198, -0.17283, 1, 0, 0],
[0.50229, -0.17283, 1, 0, 0],
[0.50229, 0.54550, 1, 0, 0],
[-0.74534, 0.54550, 1, 0, 0],
[-0.74534, -0.24845, 1, 0, 0],
[0.16959, -0.24845, 0, 0, 1]

Figure 2: A building polygon (a) with its coordinate normalization by local mean
subtraction and global scaling (b) and the vector representation (c). Coordinates
(in CRS84 projection) and standard deviation have been truncated to five-digit
precision for the sake of brevity. In the final tensor representation (c) the render
type is added.
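A minimal numpy sketch of this preprocessing step is given below; it mirrors Figure 2 (local mean subtraction, division by a data-set-wide scale factor, and a one-hot render/stop flag per point), but with the truncated coordinates it will not reproduce the figure's numbers to the last digit. The released preprocessing code is the authoritative version.

```python
# Sketch of the Figure 2 preprocessing: centre each polygon on its own mean,
# divide by a global scale factor, and append a one-hot flag per point
# ([1, 0, 0] = regular point, [0, 0, 1] = final stop), giving an (m, 5) tensor.
import numpy as np

coords = np.array([
    [4.86447, 52.33384], [4.86447, 52.33386], [4.86456, 52.33386],
    [4.86456, 52.33386], [4.86423, 52.33405], [4.86423, 52.33405],
    [4.86447, 52.33384],
])
scale = 2.64501e-4                             # global scale factor from Figure 2

centered = coords - coords.mean(axis=0)        # local mean subtraction
scaled = centered / scale                      # global scaling

flags = np.tile([1.0, 0.0, 0.0], (len(coords), 1))   # "render" flag for every point
flags[-1] = [0.0, 0.0, 1.0]                           # final stop on the last point

tensor = np.hstack([scaled, flags])            # shape (7, 5), cf. Figure 2(c)
print(tensor.shape)
```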
reduce training time on what otherwise would be a large array of very small
batches. If there are insufficient geometries of sequence length mbin to create
a set of samples of batch size, smaller geometries are added and padded to
sequence length mbin . Thus, a geometry with a sequence length m of 144 points
is zero-padded to a size mbin of 148 if the largest sequence length in the batch
is 148. This preprocessing of binning and limited padding reduced the training
time to one quarter of the time needed for training on fixed size sequences.
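The bucketing and padding described above can be sketched as follows (a simplified illustration; the batching code actually used is part of the released preprocessing):

```python
# Simplified sketch of length-based bucketing: sort geometries by sequence
# length, cut them into batches, and zero-pad each batch only to the longest
# sequence in that batch (mbin) instead of a global maximum.
import numpy as np

def batches_by_length(geometries, batch_size):
    """geometries: list of (m_i, 5) arrays; yields (batch, mbin, 5) padded arrays."""
    ordered = sorted(geometries, key=len)
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        mbin = max(len(g) for g in batch)        # e.g. 148 for a 144-point geometry
        padded = np.zeros((len(batch), mbin, 5))
        for i, g in enumerate(batch):
            padded[i, :len(g)] = g               # zero-padding at the tail
        yield padded

toy = [np.ones((m, 5)) for m in (4, 6, 9)]       # three toy geometries
for batch in batches_by_length(toy, batch_size=2):
    print(batch.shape)                           # (2, 6, 5) then (1, 9, 5)
```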
Although there is no theoretical upper bound to the sequence length, there is a
practical one imposed by the amount of memory on commodity hardware. The data
sets contain a small number of very large geometries. To improve computational
efficiency and prevent memory errors, these rare cases are simplified using the
Douglas-Peucker algorithm (Douglas and Peucker, 1973). In this way, only 0.17
percent of the geometries in our experiments needed to be simplified.
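A hedged sketch of this simplification step, using shapely's Douglas-Peucker implementation (the point threshold and tolerance below are illustrative, not the values used in the experiments):

```python
# Simplify only exceptionally long geometries with Douglas-Peucker to cap
# memory use; threshold and tolerance are illustrative placeholders.
from shapely.geometry import Polygon

MAX_POINTS = 1024                   # illustrative cap on sequence length

def maybe_simplify(polygon, tolerance=0.5):
    if len(polygon.exterior.coords) > MAX_POINTS:
        # preserve_topology=False selects plain Douglas-Peucker simplification
        return polygon.simplify(tolerance, preserve_topology=False)
    return polygon
```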
[Figure 3: schematic of the model architectures, from an input layer of shape (m, 5) to output layers of shape (#classes).]

[Figure 4 diagram: geometry vectors of size 5 (sequence length 9 plus padding) pass through a 1D CNN layer with 32 filters and kernel size 5, producing vectors of size 32, followed by max pooling with pool size 3 and stride 3.]
Figure 4: The first two layers of the CNN model. With a kernel size of five, the
CNN inspects a sliding window over the first five geometry vectors in geometry
G1 , producing the green element in the CNN output vector. The CNN then
moves to the five elements to the right, and produces the red vector element,
next the orange vector, repeated until the end of the geometry (the next three
windows in grey). This process is repeated for each filter and then moves to the
next geometry, in the direction of the black arrow. The max pooling operation
combines the maximum output element values of the CNN, shown in purple for
geometry 1.
[Figure 5 diagram: a single forward LSTM cell unrolled over a geometry of sequence length 6; input vectors g1 ... g6 of size 5 are fed to the cell one by one, and its output (size 32) and state are passed along to the next step.]
Figure 5: Forward-facing LSTM, as part of the first layer of the LSTM model.
Unlike the CNN architecture, complete vectors are fed one by one to the same
LSTM cell. The green boxes therefore represent the same cell, with only its
state updated: along with each next geometry vector, the output and previous
state of the LSTM cell are passed along from one vector to the next. For the
purposes of classification as in this article, only the last LSTM output is returned
(in orange); the intermediate outputs (in grey) are discarded.
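A minimal Keras sketch of the recurrence in Figure 5 is shown below: a single forward LSTM over the (m, 5) geometry vectors, returning only its last output for classification. It is an illustration rather than the paper's exact RNN configuration (which is defined in the released code); the layer width of 32 and the optimizer are assumptions.

```python
# Minimal sketch of a forward LSTM classifier in the spirit of Figure 5.
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

num_classes = 9                                      # e.g. the building-type task

rnn = Sequential([
    Masking(mask_value=0.0, input_shape=(None, 5)),  # skip zero-padded time steps
    LSTM(32),                                        # last output only (return_sequences=False)
    Dense(num_classes, activation='softmax'),        # class probabilities
])
rnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```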
padding. Padding ensures that the CNN output has the same sequence length as the
input, to prevent size errors on small geometries where the tensor size becomes
too small to pass through the specified network layers. After g_1^i through g_5^i,
the CNN continues at the second set of geometry entries, g_2^i through g_6^i.
After inspecting all values of all the vectors in the first geometry, the CNN
continues at the next geometry (see Figure 4).

The first CNN layer is followed by a max pooling layer with a pooling size
of three and a stride of three. The max pooling operation with a pool size of
three combines the maximum values of three CNN output vectors into a single
sequence vector of the same length. The reduction of the CNN output to one
third is specified by the max pooling stride hyperparameter: after combining
CNN output vectors c_1^i, c_2^i and c_3^i, the max pooling operation skips
forward to combine outputs c_4^i, c_5^i and c_6^i, and so on. After the max
pooling layer, a second convolution layer (not shown in Figure 4) interprets the
output of the max pooling layer, with hyperparameters identical to the first but
with 64 filters instead of 32. This CNN layer is followed by a global average
pooling layer that reduces the tensor rank to two by computing the average over
the third tensor axis. The output is subsequently fed to a ReLU-activated fully
connected layer. The last layer is a softmax-activated fully connected layer that
produces probability outputs summing to one.
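The CNN architecture described above translates into a short Keras definition (the paper's deep models are implemented in Keras 2 on TensorFlow; see the released code for the exact training configuration). The sketch below follows the layer sequence just described; the convolution activations, the width of the dense layer and the optimizer are assumptions not stated in the text.

```python
# Keras sketch of the CNN classifier described above.
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, GlobalAveragePooling1D, Dense

num_classes = 10                  # e.g. the archaeological feature task

cnn = Sequential([
    Conv1D(32, kernel_size=5, padding='same', activation='relu',
           input_shape=(None, 5)),             # first 1-D convolution, 32 filters
    MaxPooling1D(pool_size=3, strides=3),      # combine 3 outputs, reduce to 1/3
    Conv1D(64, kernel_size=5, padding='same', activation='relu'),  # second conv, 64 filters
    GlobalAveragePooling1D(),                  # average over the sequence axis
    Dense(32, activation='relu'),              # fully connected, ReLU
    Dense(num_classes, activation='softmax'),  # class probabilities summing to one
])
cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
cnn.summary()
```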
sification tasks as explained in Section 4, for which our models are required to
correctly assign individual polygons to a certain type, based on only the polygon
shape.
For the classification tasks, we select data sets on the following requirements:
• Each task contributes to evaluate the accuracy performance of deep learn-
ing models versus shallow models. The data sets for the tasks contain
real-world polygon data from different domains, with different use cases
and on different spatial scales;
• Each data set contains enough data to draw conclusions on model general-
ization; we set a requirement for data sets of at least 12,000 geometries in
order to provide a training set of at least 10,000 geometries, a validation
set of at least 1,000 geometries and a test set of at least 1,000 geometries;
• Each task requires the models to infer information from the polygon shape,
and from the polygon shape alone. To this end, data is selected to be likely
to contain shape information relevant to the classification task but not a
trivial solution;
• We require our data to be available under an open license, to be accessible
for future research.
Through the use of classification tasks, the shallow and deep model performance
can be directly compared. Classification tasks can be expressed through the
simple metric of accuracy: the ratio of correctly assigned test samples over the
total set of test samples. We also add a majority class accuracy score: the
fraction of the most prevalent class, included as the simplest baseline to exceed.
The models are trained on a training set, the model performance is iteratively
tuned to perform best on a validation set, and the models are finally tested once
on a test set that remained unseen during any of the tuning runs. The resulting
test set accuracy scores are listed in Table 3. We include a discussion of the
misclassification behaviour of the models by analysing the confusion matrices
(Stehman, 1997).
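For reference, the two scores are computed as in the following sketch (with made-up labels), using scikit-learn's accuracy function and the class frequencies for the majority baseline:

```python
# Sketch of the two metrics used above, on illustrative labels.
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2])

accuracy = accuracy_score(y_true, y_pred)                     # correct / total = 4/6
majority_baseline = np.bincount(y_true).max() / len(y_true)   # prevalent class = 3/6

print(accuracy, majority_baseline)
```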
For optimal reproducibility, we use only open data7 and open source meth-
ods to answer our research question. We release all preprocessing code, deep
learning models and shallow models as open source software.8 Scikit-learn (Pe-
dregosa et al, 2011) provides the shallow learning algorithms. To evaluate shal-
low model accuracy on each task, we use a brute force grid search with 5-fold
cross validation to find the best applicable hyperparameters for k (k-nearest
neighbours), degree (decision tree), C (SVM, logistic regression) and gamma
(SVM). SVM grid searches are restricted to a maximum number of 10M itera-
tions to allow the grid search operation to complete within a day. Grid searches
on SVM models and k-nearest neighbours are restricted to a subset of the train-
ing data to allow the grid search to finish within a day on commodity hardware.
7 Data available at http://hdl.handle.net/10411/GYPPBR
8 Code available at https://github.com/SPINlab/geometry-learning
All shallow models are, however, trained using the full training set with the
best hyperparameters obtained from the grid search. The deep learning models
are implemented using Keras (Chollet et al, 2015) version 2 with a TensorFlow
(Abadi et al, 2016) version 1.7 backend. All deep model hyperparameters are
tuned on the full training and validation data, with the validation data split
randomly from the training data.
For the grid searches, we do not assume that including an arbitrarily high
number of descriptors produces the best accuracy score. Instead, the number of
extracted Fourier descriptors used during training is included as a hyperparam-
eter in the grid search for each shallow model, to produce the descriptor order
at which the grid search obtains the best results. The best parameters found in
the grid searches are listed in Table 4 of Appendix A.
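A sketch of such a grid search for the RBF SVM is given below, using scikit-learn's GridSearchCV with 5-fold cross validation. The feature matrix, labels and grid values are placeholders; in the actual experiments the Fourier descriptor order is an additional grid dimension handled in the released code.

```python
# Brute-force grid search with 5-fold cross validation over C and gamma for an
# RBF SVM, capped at 10M iterations as described in the text. X and y are
# placeholder Fourier-descriptor features and labels.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(200, 16)                    # placeholder feature matrix
y = rng.randint(0, 2, size=200)          # placeholder labels

param_grid = {
    "C": [0.01, 0.1, 1, 10, 100, 1000],
    "gamma": [1e-3, 1e-2, 1e-1, 1, 10],
}
search = GridSearchCV(SVC(kernel="rbf", max_iter=10_000_000), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```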
4 Tasks
We created a set of three classification tasks to evaluate the performance of
several machine learning algorithms in shape recognition on real-world poly-
gon data. From the requirements listed in Section 3.4 the following tasks and
accompanying data sets were selected:
1. Predicting whether the number of inhabitants in a neighbourhood is above
or below the national median, based on the neighbourhood geometry;
2. Predicting a building class from the building contour polygon. The avail-
able classes are buildings for the purpose of gathering; industrial activity;
lodging; habitation; shopping; office buildings; health care; education; and
sports.
3. Predicting an archaeological feature type from its geometry. Features
are available as an instance of either a layer; wall; ditch; pit; natural
phenomenon; post hole; well; post hole with visible post; wooden object;
or recent disturbance.
The classes and their frequencies are displayed in Table 2.
[Figure: map of building polygons coloured by function: health care, industry, office, lodging, education, shopping, habitation, other.]
[Figure: map of archaeological feature polygons coloured by type: ditch, wooden object, pit, layer, wall, natural, post hole, recent, well.]
Neighbourhood inhabitants     Buildings                    Archaeological features
Class      Frequency          Function     Frequency       Class                        Frequency
≥ median       6,610          Habitation      23,000       Post hole                       24,991
< median       6,598          Industrial      23,000       Pit                              7,713
Total         13,208          Lodging         23,000       Natural phenomenon               6,136
                              Shopping        23,000       Recent disturbance               5,625
                              Gatherings      22,007       Ditch                            4,926
                              Office          21,014       Wooden object                    2,499
                              Education       10,717       Layer                            1,321
                              Healthcare       7,832       Wall                             1,387
                              Sports           6,916       Post hole with visible post      1,005
                              Total          160,486       Water well                         980
                                                           Total                           56,583
Table 2: Class frequency for the three tasks of neighbourhood inhabitants (left),
building types (middle) and archaeological feature types (right).
trimmed to a maximum of the first 23,000 instances per class to prevent creating
a data set too large to experiment on. With this selection, the buildings set still
has the most data of the three tasks. The task of classifying a building based on
its shape can have clear quality control benefits: one could assess the likelihood
of a specific building to belong to a specific type based on shape characteris-
tics. Subsequently, a system could flag buildings as suspect that fall below a
likelihood threshold for their current building class and suggest a replacement
building class based on a higher likelihood.
5 Evaluation
The accuracy scores of the model performance on each of the three tasks allow
us to compare the performance of the deep learning models against the shallow
learning models. Table 3 shows the results for each of the three benchmark
tasks. The accuracy scores were produced from model predictions on the test
set, consisting of geometries in the data set that were unseen by the models
during training. The deep learning model experiments were repeated ten-fold:
randomized network initialisation and batch sampling produce slight variations
in accuracy scores between training sessions. The accuracy figures for the deep
neural models therefore represent mean and standard deviation from the test
predictions on the independently repeated training sessions.
We note the following conclusions from these accuracy scores:
1. The deep neural nets are at least competitive with the best shallow models,
for each of the three tasks. In five out of the six deep learning experiments,
13 https://easy.dans.knaw.nl
Method                 Neighbourhood        Building types     Archaeological
                       inhabitants (2)      (9)                feature types (10)
Majority class         0.514                0.142              0.444
k-NN                   0.671                0.377              0.596
Logistic regression    0.659                0.328              0.555
SVM RBF                0.683                0.365              0.601
Decision tree          0.682                0.389              0.615
CNN                    0.664 ± 0.005        0.408 ± 0.003      0.624 ± 0.002
RNN                    0.608 ± 0.016        0.389 ± 0.008      0.614 ± 0.004

Table 3: Table of results with accuracy scores for the majority class baseline (top
row), the shallow models (middle four rows) and the deep learning models (bottom
two rows), with the best scores per task in bold. The number of classes per task
is listed between brackets in the column headers. The standard deviations for the
deep learning models on the bottom two rows were obtained from test set
predictions on ten-fold repeated, independent training sessions.
the deep models perform on par with or slightly better than the best shallow
models, although they do not outperform the shallow models by a wide margin.
2. On two of the three classification tasks, the CNN architecture is able
to outperform the shallow models by a few percentage points. If top
performance in a certain geometry classification task is required, the CNN
is likely to be a good choice.
To gain further insight into model performance, we include an analysis of the
confusion matrices of the test runs. As there are 18 confusion matrices in total,
these are not included in the article.14 In general, the misclassifications are
reflected in similar patterns across all models, with higher errors for models
that underperform on a certain task. A few task-specific details are discussed in
Appendix B, but in general the misclassification behaviour of the deep models
does not differ from that of the shallow models.

14 The confusion matrices are available for download from https://dataverse.nl/api/access/datafile/13051
As mentioned in Section 3.2.1, the number of elliptic Fourier descriptors
used for training the shallow models was included as a hyperparameter in the
grid search. A closer inspection of these hyperparameters in Appendix A is of
interest:
1. Nearly all shallow models benefit from adding Fourier descriptors. A notable
exception is the k-nearest neighbours algorithm, which scores the highest
accuracy in two of the three tasks only when no Fourier descriptors are added
to the training data. Judging from these three tasks, the k-NN algorithm
appears less able to extract meaningful information from the Fourier
descriptors.
2. The shallow models have a clear preference for lower orders of Fourier
descriptors. Even though many higher orders (up to order 24) were tested, no
shallow model was able to perform better on descriptor orders higher than
four. Order four descriptors only provide a very rough approximation of the
original geometry, as is well visualised by Kuhl and Giardina (1982, 243) and
in Figure 1. Still, the descriptors evidently contain enough important shape
information for most shallow models to improve the accuracy score.
3. Support vector machines come with a misclassification tolerance
hyperparameter C. In situations where SVMs with high C settings (low
tolerance) lead to higher performance, such as in the archaeology
classification task, the training sessions on our data were exceedingly
time-consuming. Where low C values tended to converge in seconds, high values
could take days or even weeks to converge. To prevent having to wait for
extended periods of time (there is no indication of the time frame in which a
training session on a given set of hyperparameters will converge), we needed
to constrain the amount of training data and the maximum number of
iterations, especially for the hyperparameter grid searches. It is quite
possible that, as a consequence of these constraints, the grid search fails
to produce the optimal hyperparameter settings, but this is an unfortunate
side effect of using SVMs on Fourier descriptors of geometries.
whether these figures can actually be improved on. If there is a hard ceiling
at the best performing models, perhaps the benchmarks can be improved by
including more data than just the geometries alone, for example information
gathered from the direct spatial surroundings or other properties of the spatial
objects. The benchmark presented here can be considered a first attempt.
An area that might see improvement is the performance of LSTMs. In an
earlier development stage, the LSTMs were trained on fixed length rather than
on the variable length sequences. During this stage, the LSTMs performed sig-
nificantly better (on validation data, no final tests were performed) on fixed
length sequences, outperforming the CNNs. However, training on fixed length
sequences was abandoned because it requires simplifying geometries to a fixed
maximum of points per geometry. This was not consistent with our aim of
training on all available information. Creating fixed length sequences also re-
quired adding a large amount of zero-padding to increase sequence length on
all geometries shorter than the fixed size. After switching to variable length
sequences, the performance of the CNN models increased and the LSTM per-
formance dropped considerably. We hypothesize that there is room to improve
the LSTM model configuration to CNN model performance or perhaps even
better. To test this in future research, the fixed length sequences were included
in the benchmark data.
There are several open questions in further exploring the use of deep learning
models for geometries. It would be helpful to verify the accuracy on other
types of geometries, such as multi-lines, multi-points or even heterogeneous
geometry collections. Extending the deep neural nets to handle holes in
polygons could also be beneficial. Another interesting avenue is to combine
geometries with other information sources as input data, such as remote sensing
data or textual descriptions. Deep learning offers a viable route to explore more
complex pipelines. Such pipelines could take geometries and other modalities
as inputs, to produce multi-modal combinations of sequences (Sutskever et al,
2014), images (He et al, 2017) or new geometries (Ha and Eck, 2018) as output.
This paper is a step in that direction, showing that it is possible to have deep
neural nets learn from geometries directly.
Appendices
A Hyperparameter grid search results for shallow models
The grid searches discussed in Section 3 resulted in a set of best hyperparameter
settings for the shallow models. These best settings are listed in Table 4,
together with the ranges that were searched. The range for the elliptic Fourier
descriptor order o is always the same: each grid search was executed on the orders
⟨0, 1, 2, 3, 4, 6, 8, 12, 16, 20, 24⟩. The search intervals for the other
hyperparameters are listed per model and task in Table 4.
Method                Neighbourhood inhabitants (2)    Building types (9)              Archaeological feature types (10)
Decision tree         o=2; d=6 in [4, 9]               o=3; d=10 in [6, 12]            o=3; d=9 in [5, 10]
k-NN                  o=1; k=26 in [21, 30]            o=0; k=29 in [21, 30]           o=0; k=29 in [21, 30]
SVM RBF               o=1; C=1 in 1e[-2, 3];           o=0; C=1000 in 1e[-2, 3];       o=2; C=100 in 1e[-1, 3];
                      γ=1 in 1e[-3, 3]                 γ=10 in 1e[-2, 3]               γ=0.01 in 1e[-4, 4]
Logistic regression   o=1; C=0.01 in 1e[-3, 1]         o=4; C=1 in 1e[-2, 3]           o=8; C=1000 in 1e[-2, 3]
Table 4: Hyperparameters for the shallow models. Interval values for decision
tree and k-nearest neighbours ∈ N, for the SVM in log scale, with the exponent
interval ∈ N.
References
Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis
A, Dean J, Devin M, Ghemawat S, Goodfellow IJ, Harp A, Irving G, Isard
M, Jia Y, Józefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga
R, Moore S, Murray DG, Olah C, Schuster M, Shlens J, Steiner B, Sutskever
I, Talwar K, Tucker PA, Vanhoucke V, Vasudevan V, Viégas FB, Vinyals
O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2016) Tensorflow:
Large-scale machine learning on heterogeneous distributed systems. CoRR
abs/1603.04467, URL http://arxiv.org/abs/1603.04467
Andrášik R, Bı́l M (2016) Efficient road geometry identification from digital
vector data. Journal of Geographical Systems 18(3):249–264, DOI 10.1007/
s10109-016-0230-1, URL https://doi.org/10.1007/s10109-016-0230-1
Araya YH, Remmel TK, Perera AH (2016) What governs the presence of
residual vegetation in boreal wildfires? Journal of Geographical Systems
18(2):159–181, DOI 10.1007/s10109-016-0227-9, URL https://doi.org/10.
1007/s10109-016-0227-9
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly
learning to align and translate. CoRR abs/1409.0473, URL http://arxiv.
org/abs/1409.0473, 1409.0473
Ball JE, Anderson DT, Chan CS (2017) Comprehensive survey
of deep learning in remote sensing: theories, tools, and chal-
lenges for the community. Journal of Applied Remote Sensing
11(4):042609, URL https://www.spiedigitallibrary.org/journals/
Journal-of-Applied-Remote-Sensing/volume-11/issue-4/042609/
Comprehensive-survey-of-deep-learning-in-remote-sensing--theories/
10.1117/1.JRS.11.042609.full
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal mar-
gin classifiers. In: Proceedings of the fifth annual workshop on Computational
learning theory, ACM, pp 144–152, URL http://citeseerx.ist.psu.edu/
viewdoc/download?doi=10.1.1.21.3818&rep=rep1&type=pdf
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and re-
gression trees. Wadsworth & Brooks/Cole Advanced Books & Software
Brown V, Jacquier G, Coulombier D, Balandine S, Belanger F, Legros D (2001)
Rapid assessment of population size by area sampling in disaster situations.
Disasters 25(2):164–171, URL http://www.parkdatabase.org/files/
documents/2001_brown_et_al_rapid_assessment_of_population_size_
by_area_sampling_in_disaster_situations.pdf
Cheng G, Han J (2016) A survey on object detection in optical remote sensing
images. ISPRS Journal of Photogrammetry and Remote Sensing 117:11–28,
URL https://arxiv.org/pdf/1603.06201
Chollet F, et al (2015) Keras. https://github.com/fchollet/keras
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE transac-
tions on information theory 13(1):21–27, URL https://www.cs.bgu.ac.il/
~adsmb182/wiki.files/borak-lecture%20notes.pdf
Cox DR (1958) The regression analysis of binary sequences. Journal of the Royal
Statistical Society Series B (Methodological) 20(2):215–242, URL http://
www.jstor.org/stable/2983890
Demir I, Koperski K, Lindenbaum D, Pang G, Huang J, Basu S, Hughes F, Tuia
D, Raskar R (2018) Deepglobe 2018: A challenge to parse the earth through
satellite images. ArXiv e-prints URL https://arxiv.org/abs/1805.06561
Dijkstra J (2012) Wijk bij Duurstede veilingterrein do opgraving. https://doi.
org/10.17026/dans-x8d-qmae, DOI 10.17026/dans-x8d-qmae
Dijkstra J, Zuidhoff F (2011) Veere rijksweg N57 proefsleuven begeleid-
ing opgraving. https://doi.org/10.17026/dans-xyc-re2w, DOI 10.17026/
dans-xyc-re2w
Dijkstra J, Houkes M, Ostkamp S (2010) Gouda Bolwerk opgraving en
begeleiding. https://doi.org/10.17026/dans-xzm-x29h, DOI 10.17026/
dans-xzm-x29h
Domingos P (2012) A few useful things to know about machine learning.
Communications of the ACM 55(10):78–87, URL https://dl.acm.org/
citation.cfm?id=2347755
Douglas DH, Peucker TK (1973) Algorithms for the reduction of the number
of points required to represent a digitized line or its caricature. Cartograph-
ica: The International Journal for Geographic Information and Geovisual-
ization 10(2):112–122, DOI 10.3138/FM57-6770-U75U-7727, URL https:
//doi.org/10.3138/FM57-6770-U75U-7727
Effati M, Thill JC, Shabani S (2015) Geospatial and machine learning tech-
niques for wicked social science problems: analysis of crash severity on
a regional highway corridor. Journal of Geographical Systems 17(2):107–
135, DOI 10.1007/s10109-015-0210-x, URL https://doi.org/10.1007/
s10109-015-0210-x
Fan H, Zipf A, Fu Q, Neis P (2014) Quality assessment for building footprints
data on openstreetmap. International Journal of Geographical Information
Science 28(4):700–719
Gerrets D, Jacobs E (2011) Venlo TPN deelgebied 1 en 2 opgraving. https:
//doi.org/10.17026/dans-26f-55zu, DOI 10.17026/dans-26f-55zu
Gilissen V (2017) Archiving the past while keeping up with the times. Studies
in Digital Heritage 1(2):194–205, DOI 10.14434/sdh.v1i2.23238, URL https:
//doi.org/10.14434/sdh.v1i2.23238
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair
S, Courville A, Bengio Y (2014) Generative adversarial nets. In:
Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ
(eds) Advances in Neural Information Processing Systems 27, Curran
Associates, Inc., pp 2672–2680, URL http://papers.nips.cc/paper/
5423-generative-adversarial-nets.pdf
Jiang T, Xia G, Lu Q (2017) Sketch-based aerial image retrieval. In: 2017 IEEE
International Conference on Image Processing (ICIP), pp 3690–3694, DOI 10.
1109/ICIP.2017.8296971, URL http://captain.whu.edu.cn/papers/ICIP_
jiang.pdf
Keyes L, Winstanley AC (1999) Fourier descriptors as a general classification
tool for topographic shapes. IPRCS, pp 193–203, URL http://eprints.
maynoothuniversity.ie/66/
Kuhl FP, Giardina CR (1982) Elliptic fourier features of a closed contour.
Computer graphics and image processing 18(3):236–258, DOI 10.1016/
0146-664X(82)90034-X, URL http://dx.doi.org/10.1016/0146-664X(82)
90034-X
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–
444, DOI doi:10.1038/nature14539, URL http://dx.doi.org/10.1038/
nature14539
Loncaric S (1998) A survey of shape analysis techniques. Pattern Recog-
nition 31(8):983 – 1001, DOI https://doi.org/10.1016/S0031-2023(97)
00122-2, URL http://www.sciencedirect.com/science/article/pii/
S0031202397001222
Montero JM, Mı́nguez R, Fernández-Avilés G (2018) Housing price predic-
tion: parametric versus semi-parametric spatial hedonic models. Journal of
Geographical Systems 20(1):27–55, DOI 10.1007/s10109-017-0257-y, URL
https://doi.org/10.1007/s10109-017-0257-y
Mou L, Ghamisi P, Zhu XX (2017) Deep recurrent neural networks for hyper-
spectral image classification. IEEE Transactions on Geoscience and Remote
Sensing 55(7):3639–3655, DOI 10.1109/TGRS.2016.2636241, URL http://
dx.doi.org/10.1109/TGRS.2016.2636241
Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY (2011) Multimodal deep
learning. In: Proceedings of the 28th international conference on machine
learning (ICML-11), pp 689–696, URL http://ai.stanford.edu/~ang/
papers/icml11-MultimodalDeepLearning.pdf
Roessingh W, Blom E (2012) Oosterhout Vrachelen de Contreie Vrachelen
4 opgraving. https://doi.org/10.17026/dans-25d-fpe5, DOI 10.17026/
dans-25d-fpe5
Roessingh W, Lohof E (2010) Enkhuizen Kadijken 5a en 5b opgraving. https:
//doi.org/10.17026/dans-27r-e5f8, DOI 10.17026/dans-27r-e5f8
Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks.
IEEE Transactions on Signal Processing 45(11):2673–2681, DOI
10.1109/78.650093, URL https://www.researchgate.net/profile/Mike_
Schuster/publication/3316656_Bidirectional_recurrent_neural_
networks/links/56861d4008ae19758395f85c.pdf
Stehman SV (1997) Selecting and interpreting measures of thematic classifi-
cation accuracy. Remote sensing of Environment 62(1):77–89, URL https:
//www.researchgate.net/profile/Stephen_Stehman/publication/
222169047_Selecting_and_interpreting_measures_of_thematic_
classification_accuracy/links/5b5a0fe5a6fdccf0b2f8fe87/
Selecting-and-interpreting-measures-of-thematic-classification-accuracy.
pdf
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learn-
ing with neural networks. In: Advances in neural information pro-
cessing systems, pp 3104–3112, URL http://papers.nips.cc/paper/
5346-sequence-to-sequence-learning-with-neural-networks.pdf
Van der Veken B, Blom E (2012) Veghel Scheiffelaar ii opgraving. https://
doi.org/10.17026/dans-z93-7zbe, DOI 10.17026/dans-z93-7zbe
Van der Veken B, Prangsma N (2011) Montferland Didam westelijke randweg
Kerkwijk opgraving. https://doi.org/10.17026/dans-zmk-35vy, DOI 10.
17026/dans-zmk-35vy
van der Velde H (2011) Katwijk Zanderij Westerbaan opgraving. https://doi.
org/10.17026/dans-znz-r2ba, DOI 10.17026/dans-znz-r2ba
van de Velde H, Ostkamp S, Veldman H, Wyns S (2002) Venlo Maasboulevard.
https://doi.org/10.17026/dans-x84-msac, DOI 10.17026/dans-x84-msac
Wang Y, Zhang L, Tong X, Liu S, Fang T (2017) A feature extraction and
similarity metric-learning framework for urban model retrieval. International
Journal of Geographical Information Science 31(9):1749–1769, DOI 10.1080/
13658816.2017.1334888, URL https://doi.org/10.1080/13658816.2017.
1334888, https://doi.org/10.1080/13658816.2017.1334888
Wu Q, Diao W, Dou F, Sun X, Zheng X, Fu K, Zhao F (2016) Shape-based ob-
ject extraction in high-resolution remote-sensing images using deep boltzmann
machine. International Journal of Remote Sensing 37(24):6012–6022, DOI 10.
1080/01431161.2016.1253897, URL https://doi.org/10.1080/01431161.
2016.1253897, https://doi.org/10.1080/01431161.2016.1253897
Xu Y, Chen Z, Xie Z, Wu L (2017) Quality assessment of building footprint
data using a deep autoencoder network. International Journal of Geographical
Information Science 31(10):1929–1951, URL http://www.tandfonline.com/
doi/abs/10.1080/13658816.2017.1341632
Zahn CT, Roskies RZ (1972) Fourier descriptors for plane closed curves. IEEE
Transactions on computers C-21(3):269–281, DOI 10.1109/TC.1972.5008949,
URL http://dx.doi.org/10.1109/TC.1972.5008949
Zhang D, Lu G, et al (2002) A comparative study of fourier descriptors for shape
representation and retrieval. In: Proc. of 5th Asian Conference on Computer
Vision (ACCV), Citeseer, pp 646–651, URL http://citeseerx.ist.psu.
edu/viewdoc/summary?doi=10.1.1.73.5993
Zhu XX, Tuia D, Mou L, Xia G, Zhang L, Xu F, Fraundorfer F (2017)
Deep learning in remote sensing: A comprehensive review and list of re-
sources. IEEE Geoscience and Remote Sensing Magazine 5(4):8–36, DOI
10.1109/MGRS.2017.2762307, URL http://arxiv.org/abs/1710.03959,
1710.03959