Satellite Image Segmentation With Convolutional Neural Networks (CNN)
Road Segmentation
Dario Pavllo, Mattia Martinelli and Chanhee Hwang
École polytechnique fédérale de Lausanne
Switzerland
[email protected], [email protected], [email protected]
I. INTRODUCTION

Image segmentation is a technique that is becoming increasingly popular for various tasks in computer vision. Generally speaking, this process consists in labelling every part of an image according to certain criteria. For instance, such a technique could be used for face detection in photos, or for road detection in autonomous vehicles.
Recently, the increase in computing performance, as well as the ability to exploit massively parallel computation with GPUs, has led to the development of new machine learning techniques that are able to process images in reasonable time. However, image processing has always been a challenging task, as the information is organized in a definite geometrical structure, and algorithms should therefore take this morphology into account. In addition, the computational cost of processing an image does not scale linearly with its size, and this has led to the development of new techniques such as convolutional neural networks, which foster sparse connections and weight sharing in order to reduce the complexity of the problem.

The aim of this project is to build a model that is able to perform the segmentation of satellite images. Specifically, the segmentation consists in detecting which parts of the images are roads, and which parts are background (e.g. buildings, fields, water).

This report provides a brief overview of different methods that can be used to solve this problem, and particularly it addresses convolutional neural networks, which represent the state-of-the-art technique for image classification. The rest of the report is organized as follows: Section III proposes several approaches and methods that can be used to perform this task; Section IV provides some implementation details that have proved useful to improve the performance of our model; Section V explains the methodology of the accuracy validation of the proposed techniques; and Section VI gives a comparison of the obtained results.

Fig. 1: Some areas from the training set and their respective ground truth masks (in red).

II. EXPLORATORY DATA ANALYSIS

The dataset consists of 100 satellite images of urban areas and their respective ground truth masks, where white pixels represent roads (foreground) and black pixels represent the rest (background).

The task is to classify blocks of 16×16 pixels: the label associated with each block is 1 if the average value of the ground truth pixels in that block is greater than a threshold (0.25), and 0 otherwise.

By looking at the training set, it can be observed that the classification task is not trivial, as some roads are covered by trees. Furthermore, some asphalt areas are not labelled as road (e.g. parking lots and walkways), and this could potentially confuse the training model. Figure 1 shows these complications.

For these reasons, it is reasonable to think that the classifier should take some context into consideration, i.e. it should look at nearby pixels in order to infer information about the block that is being classified.
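To make the labelling rule concrete, a minimal sketch is shown below; the function names and the use of NumPy are our own illustration, not details taken from the original implementation.

import numpy as np

# Illustrative sketch of the block-labelling rule described above
# (helper names are hypothetical, not from the original code).
PATCH_SIZE = 16
FOREGROUND_THRESHOLD = 0.25

def patch_to_label(gt_patch):
    """Label a 16x16 ground truth patch: 1 (road) if the mean pixel
    value exceeds the threshold, 0 (background) otherwise."""
    return 1 if gt_patch.mean() > FOREGROUND_THRESHOLD else 0

def image_to_labels(gt_image):
    """Split a ground truth mask (values in [0, 1]) into 16x16 blocks
    and return the grid of binary block labels."""
    h, w = gt_image.shape
    labels = np.zeros((h // PATCH_SIZE, w // PATCH_SIZE), dtype=int)
    for i in range(0, h - h % PATCH_SIZE, PATCH_SIZE):
        for j in range(0, w - w % PATCH_SIZE, PATCH_SIZE):
            patch = gt_image[i:i + PATCH_SIZE, j:j + PATCH_SIZE]
            labels[i // PATCH_SIZE, j // PATCH_SIZE] = patch_to_label(patch)
    return labels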
III. MODELS AND METHODS

In order to evaluate the quality of the proposed model, it is important to have a baseline model that can be used as a comparison for the classification accuracy. Based on the observation that there are fewer foreground areas than background areas, the first baseline model that has been used classifies all blocks as background (i.e. 0).
It is possible to improve on this initial result by using a linear classification model, such as logistic regression. However, for the reasons mentioned in the previous section, such a model would not be able to correctly exploit the context of the images, since it is not capable of detecting their morphological structure (unless complex feature extraction techniques are used).

In order to confirm this claim, a logistic regression model has been implemented for comparison purposes. The input features correspond to the mean (over the entire 16×16 patch) and the standard deviation of each RGB channel (for a total of 6 features), and they are transformed according to a polynomial basis of degree 4 (including interactions as well).
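As an illustration, such a baseline could be assembled with scikit-learn as sketched below; the helper name is ours, and the original implementation may differ in its details.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def patch_features(patch):
    """6 features per 16x16 RGB patch: per-channel mean and std."""
    return np.concatenate([patch.mean(axis=(0, 1)), patch.std(axis=(0, 1))])

# Degree-4 polynomial expansion (interaction terms included by default)
# followed by logistic regression, mirroring the baseline described above.
model = make_pipeline(PolynomialFeatures(degree=4), LogisticRegression())
# X = np.array([patch_features(p) for p in patches]); model.fit(X, y)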
However, the most reasonable choice would be to use a model based on convolutional neural networks (CNNs), since they are well suited to images. Indeed, this model has provided excellent results on our dataset and has been adopted as the final solution.

According to our research, several methods have been proposed to solve the task of per-pixel classification; the one adopted in this project consists of a sliding window approach [1]: the objective is to classify the block at the centre of an image according to a certain context, which in this case corresponds to a square window of size window_size × window_size (a hyperparameter). Figure 2 shows this technique more clearly.

Fig. 2: Sliding window approach. The small square at the centre is the patch (of size 16×16) that is being classified, whereas the big square represents the current context (i.e. window, of size 72×72). In this figure, subsequent windows are spaced apart by 16 pixels (stride).
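A minimal sketch of how such windows could be extracted is given below; the mirror padding at the image border and the helper name are our assumptions rather than details stated in this section.

import numpy as np

PATCH_SIZE = 16    # block being classified
WINDOW_SIZE = 72   # context window (hyperparameter)

def extract_windows(image):
    """Yield one 72x72 context window per 16x16 patch, sliding with a
    stride of 16 pixels. The border is mirror-padded so that patches
    near the edges still receive a full-sized window (a common choice;
    the original padding strategy is not specified here)."""
    pad = (WINDOW_SIZE - PATCH_SIZE) // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode='reflect')
    h, w = image.shape[:2]
    for i in range(0, h, PATCH_SIZE):
        for j in range(0, w, PATCH_SIZE):
            yield padded[i:i + WINDOW_SIZE, j:j + WINDOW_SIZE]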
As far as the neural network structure is concerned, the number of layers and filters has been optimized to perform well on this dataset. Furthermore, the following features have been explored:

A. Activation functions

ReLUs are the standard choice for deep neural networks; however, when a high learning rate is used, some units can get stuck and cause so-called dead filters. This problem can be mitigated by using a lower learning rate, at the cost of a longer training time (which can already be excessive). For this reason, a variant known as Leaky ReLU has been used as the activation function for all intermediate layers, with good results. It is defined as f(x) = max(x, αx), with α < 1, and in our case α has been chosen to be equal to 0.1. Although this might seem a high value, some studies have shown that higher values perform better than lower ones [2], and with this dataset α = 0.1 has proved effective at preventing dead filters, as shown in Figure 3.

Fig. 3: Visualization of the filters in the first layer, with the same model and the same training set, but different activation functions: (a) dead filters (ReLU); (b) good filters (Leaky ReLU).
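The definition above is straightforward to state in code; the snippet below is a plain NumPy illustration, not code from the project.

import numpy as np

def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: f(x) = max(x, alpha * x), here with alpha = 0.1."""
    return np.maximum(x, alpha * x)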
B. Image augmentation

Since the dataset is very small (100 images), an image augmentation strategy has been adopted to virtually increase its size. Specifically, before being supplied to the neural network, each training sample (i.e. window) is randomly rotated in steps of 90 degrees, and it is also randomly flipped horizontally/vertically. This effectively increases the dataset size by a multiplicative factor of 8, and has been shown to greatly improve the accuracy of the model. The implementation details of this technique are given in Section IV.
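A minimal sketch of such an on-the-fly augmentation step, assuming NumPy image arrays (the function name is hypothetical):

import numpy as np

def augment(window, rng=None):
    """Randomly rotate a training window by a multiple of 90 degrees and
    randomly flip it; the 8 distinct outcomes form the symmetries of the
    square, matching the x8 factor mentioned above."""
    if rng is None:
        rng = np.random.default_rng()
    window = np.rot90(window, k=int(rng.integers(4)))  # 0/90/180/270 degrees
    if rng.integers(2):
        window = np.flipud(window)                     # vertical flip
    if rng.integers(2):
        window = np.fliplr(window)                     # horizontal flip
    return window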
C. Regularization

Although the dataset augmentation helps to reduce overfitting, the use of Dropout layers has been very effective in our model. They have been added after each max-pooling layer (with p = 0.25), and also after the fully connected layer (with p = 0.5). Furthermore, L2 regularization has been used for the weights (but not the biases) of the fully connected and output layers, with λ = 10⁻⁶.

The window size has been empirically chosen so as to take into account a context that is large enough, considering that large windows are computationally expensive. A size of 72×72 has proved to be a good compromise. Table I shows the complete structure of the proposed neural network, which is the result of various experiments.
Type                            Notes
Input                           72×72×3
Convolution + Leaky ReLU        64 5×5 filters
Max Pooling                     2×2
Dropout                         p = 0.25
Convolution + Leaky ReLU        128 3×3 filters
Max Pooling                     2×2
Dropout                         p = 0.25
Convolution + Leaky ReLU        256 3×3 filters
Max Pooling                     2×2
Dropout                         p = 0.25
Convolution + Leaky ReLU        256 3×3 filters
Max Pooling                     2×2
Dropout                         p = 0.25
Fully connected + Leaky ReLU    128 neurons
Dropout                         p = 0.5
Output + Softmax                2 neurons

TABLE I: Full list of layers in the neural network.
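Table I maps directly onto a stack of standard layers. The sketch below uses Keras purely as an illustration; the original framework, padding mode, and training settings are not specified in this section, so details such as the unpadded ("valid") convolutions are our assumptions.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

l2 = regularizers.l2(1e-6)  # L2 penalty on weights only, not biases

def conv_block(filters, kernel_size, **kwargs):
    """Convolution + Leaky ReLU + 2x2 max pooling + dropout, as in Table I.
    Note: newer Keras versions call the slope argument negative_slope."""
    return [
        layers.Conv2D(filters, kernel_size, **kwargs),
        layers.LeakyReLU(alpha=0.1),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
    ]

model = keras.Sequential(
    conv_block(64, (5, 5), input_shape=(72, 72, 3))
    + conv_block(128, (3, 3))
    + conv_block(256, (3, 3))
    + conv_block(256, (3, 3))
    + [
        layers.Flatten(),
        layers.Dense(128, kernel_regularizer=l2),
        layers.LeakyReLU(alpha=0.1),
        layers.Dropout(0.5),
        layers.Dense(2, activation='softmax', kernel_regularizer=l2),
    ]
)

With unpadded convolutions, the 72×72 input shrinks to a 2×2×256 volume before the fully connected layer.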
#   Model                  Accuracy
A   All background         74.09% ± 1.2%
B   Logistic regression    78.53% ± 0.2%
C   CNN                    92.05% ± 0.9%
D   CNN + LR               92.22% ± 0.9%
E   CNN + LR + D           92.57% ± 1.0%
F   CNN + LR + D + IA      92.89% ± 0.7%

TABLE II: Tested models along with their respective cross-validation results.
Legend: CNN: Convolutional Neural Network; LR: Leaky ReLU; D: Dropout; IA: Image Augmentation.