Detection of Sugarcane Crop Lines

Renato Rodrigues da Silva

Uberlândia
2020
I dedicate this work to all type 1 diabetics who, like tightrope walkers, must keep their
balance, managing their lives while handling all the other responsibilities this world
demands.
A special feeling of gratitude to my love Pâmela, whose words of encouragement and
push for tenacity were crucial, and to my sister Susana, who has always believed in me.
This work is for you guys, too.
Acknowledgements
First of all, I thank God, who has lit my way during this path. I thank all the
professors of the course, who were very important in my academic life, especially those
who have been true mentors to me: Mauricio Cunha Escarpinati, my co-advisor, for his
companionship, friendship and all the advice given along these years; and André Ricardo
Backes, who, in addition to being my advisor, is a great friend and has always been a key
player in this journey. I also thank my friends and all the people who are important to me
and who have always been by my side, especially my parents and wife. Thanks, Pâmela
for the patience and for giving me strength at all times. Finally, I thank all those who,
in some way, were and are close to me, making this life more and more worthwhile.
“Life is like riding a bicycle. To keep your balance, you must keep moving.”
(Albert Einstein)
Resumo
In recent years, UAVs (Unmanned Aerial Vehicles) have become increasingly popular in
the agricultural sector, promoting and enabling aerial image monitoring in both scientific
and business contexts. Images captured by UAVs are fundamental for precision farming
practices, as they allow activities that deal with low- and medium-altitude images. The
scenario of the planted area may change drastically over time due to the appearance of
erosion, planting gaps, death and drying of part of the crop, animal interventions, etc.
Thus, the process of detecting the crop lines is of great importance for harvest planning,
production cost control, plant counting, correction of sowing failures, efficient irrigation,
among others. In addition, the geolocation information of the detected lines allows the
use of autonomous machinery and better planning of the application of inputs, reducing
costs and the aggression to the environment. In this work, we address the problem of
segmentation and detection of sugarcane crop lines in UAV images. First, we experimented
with an approach based on a Genetic Algorithm (GA) and Otsu's method to produce
binarized images. Later, for several reasons, including the recent relevance of Semantic
Segmentation, its levels of abstraction, and the unfeasible results obtained with the GA,
we studied and proposed a new two-step approach based on a Semantic Segmentation
Network (SSN). First, we use an SSN to segment the images, classifying their regions as
crop lines or as non-planted soil. Then, we use the Radon transform to reconstruct and
improve the already segmented lines, making them more uniform or grouping line
fragments and loose plants. We compare our results with segmentations performed
manually by experts, and the results demonstrate the efficiency and feasibility of our
approach for the proposed task.
Abstract
In recent years, UAVs (Unmanned Aerial Vehicles) have become increasingly popular in
the agricultural sector, promoting and enabling the application of aerial image monitoring
in both scientific and business contexts. Images captured by UAVs are fundamental
for precision farming practices, as they allow activities that deal with low- and medium-
altitude images. After the effective sowing, the scenario of the planted area may change
drastically over time due to the appearance of erosion, gaps, death and drying of part
of the crop, animal interventions, etc. Thus, the process of detecting the crop rows
is highly important for planning the harvest, estimating the use of inputs, controlling
production costs, plant stand counts, early correction of sowing failures, more efficient
watering, etc. In addition, the geolocation information of the detected lines allows the
use of autonomous machinery and a better application of inputs, reducing financial costs
and the aggression to the environment. In this work we address the problem of detection
and segmentation of sugarcane crop lines using UAV imagery. First, we experimented
with an approach based on a Genetic Algorithm (GA) associated with the Otsu method
to produce binarized images. Then, for several reasons, including the recent relevance of
Semantic Segmentation in the literature, its levels of abstraction, and the unfeasible results
of Otsu associated with the GA, we proposed a new approach based on a Semantic
Segmentation Network (SSN), divided into two steps. First, we use a Convolutional
Neural Network (CNN) to automatically segment the images, classifying their regions as
crop lines or as non-planted soil. Then, we use the Radon transform to reconstruct and
improve the already segmented lines, making them more uniform or grouping fragments
of lines and loose plants belonging to the same planting line. We compare our results
with segmentation performed manually by experts, and the results demonstrate the
efficiency and feasibility of our approach to the proposed task.
GA Genetic Algorithm
HT Hough Transform
PA Precision Agriculture
VI Vegetation Index
Contents
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3 Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 FUNDAMENTALS . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1 Precision Agriculture . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Unmanned Aerial Vehicles . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Convolutional Neural Network . . . . . . . . . . . . . . . . . . . 28
2.4 Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4.1 Binarization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4.2 Semantic Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.5 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.6 Hough Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 Radon Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3 RELATED WORK . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.1 Hough Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Otsu Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . 48
3.4 Other Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.5 Final Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4 METHODOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.1.1 Evaluation metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Segmentation using Genetic Algorithm . . . . . . . . . . . . . . . 55
4.3 Semantic Segmentation Networks . . . . . . . . . . . . . . . . . . . 56
4.3.1 Proposed Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.4 Line Reconstruction and Refinement . . . . . . . . . . . . . . . . . 59
5 EXPERIMENTAL RESULTS . . . . . . . . . . . . . . . . . . . 61
5.1 Segmentation using Genetic Algorithm . . . . . . . . . . . . . . 61
5.2 Semantic Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.3 Comparison of approaches . . . . . . . . . . . . . . . . . . . . . . . 70
6 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.1 Main Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2 Contributions in Bibliographic Production . . . . . . . . . . . . . 74
6.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
I hereby certify that I have obtained all legal permissions from the owner(s) of each
third-party copyrighted matter included in my dissertation, and that their permissions
allow it to be made available, including deposit in public digital libraries.
Chapter 1
Introduction
Sugarcane is one of the most planted crops on the planet. Its planting practice and
mechanization are the development trend of the modern agro-industry (UCHIMIYA;
SPAUNHORST, 2020). Brazil is the largest producer of sugarcane in the world. The
country registered an area of around 10.1 million hectares (Mha) of land planted with
the crop in the 2018/2019 harvest. This area includes fields meant for both sugar and
ethanol production (LIMA et al., 2020).
The main destination for ethanol is the biofuel industry, supplying the Brazilian vehicle
fleet with anhydrous ethanol mixed into gasoline and with fuel for flex-fuel engines, which
make up an increasingly emerging market in Brazil and worldwide (LIMA et al., 2020).
As expected, Brazil is also the largest producer of sugarcane ethanol worldwide, and this
production is expected to increase substantially in the coming years, as the sugarcane
ethanol sector contributes significantly to the national economy (BRINKMAN et al.,
2018).
Despite all the economic benefits that sugarcane and other crops bring, the massive
expansion of agriculture also leads to some social and ecological issues, some of them due
to drastic deforestation, social conflicts, land disputes, and the degradation of the
environment caused by the spread of pesticides and agricultural inputs. A great ally in
helping to solve some of these problems is Precision Agriculture (PA) (MCBRATNEY
et al., 2005; MILELLA; REINA; NIELSEN, 2018).
PA is a modern concept for managing agricultural activities and it has been increas-
ingly adopted by producers in several countries. This concept is associated with research,
information gathering, and the use of various technologies to analyze and monitor the
conditions of planted areas in a more precise and efficient way. It is based on observation,
measurement, monitoring, and rapid decision making in the face of the variability that
planting crops can present (LINDBLOM et al., 2017).
The main objective of the research in the field of PA is to define a support system
for the necessary decisions, aiming at a better management of the crop to optimize the
use of resources and inputs, while increasing the financial return (MCBRATNEY et al.,
2005; MILELLA; REINA; NIELSEN, 2018). More recently, research in this area has
had a major positive impact on the growth of agricultural production (REN et al., 2020).
Methodologies for improving seed quality, more efficient irrigation systems and soil quality
control are just a few examples of the techniques that benefit from this research (JR;
DAUGHTRY, 2018).
Technological advances in the use of unmanned aerial vehicles (UAVs) have also opened
up new opportunities in the PA sector. This type of equipment allows for more effective
monitoring and greater agility in cultivation. Sensors coupled to a UAV are able to collect
large amounts of information about the plantation. In addition, they enable more frequent
data collections and less cloud interference due to their lower flight altitude (SILVA et al.,
2017; SOARES; ABDALA; ESCARPINATI, 2018; SOUZA; ESCARPINATI; ABDALA,
2018; FUENTES-PEÑAILILLO et al., 2018). The use of UAVs has also fostered the
development of new and more efficient digital image processing techniques to analyze im-
ages acquired by their sensors (SILVA et al., 2017; SOARES; ABDALA; ESCARPINATI,
2018; SOUZA; ESCARPINATI; ABDALA, 2018). Most of these techniques aim to es-
timate the growth of the crop or to identify other important agronomic characteristics,
such as nitrogen stress, water stress, new diseases, known pests and vegetation indexes.
After the initial planning and effective sowing, the scenario of the planted area may
change over time due to the appearance of failures, erosion, death, and drying of part
of the plantation, tipping of plants, animal interventions, among others. This makes
the identification of crop lines, and how they are arranged in a region, an important
task within PA. With this information, it is possible, for example, to better plan the
application of inputs, thus reducing financial costs and the aggression to the environment.
Recently, Convolutional Neural Networks (CNNs) have emerged as a powerful approach
to computer vision tasks. Their use has become widespread in the most diverse areas
of research, and they have presented relevant results in applications of classification,
object detection, and facial recognition (LIU et al., 2020; SIMONYAN; ZISSERMAN, 2014;
KANG et al., 2014). They have, for example, been used with great success in identifying
pests in agricultural environments with complex soil structures (CHENG et al., 2017), in the
detection of weeds (FERREIRA et al., 2017), the detection of plant diseases (FERENTINOS,
2018), and even the detection of flowers (DIAS; TABB; MEDEIROS, 2018).
In this work we address the problem of crop line detection and segmentation in aerial
images of sugarcane plantations obtained by UAVs. First, we experimented with an approach
based on a Genetic Algorithm associated with the Otsu method to produce binarized images
that were then reconstructed using the Radon transform. Then, for several reasons, including
the recent relevance of Semantic Segmentation in the literature, its levels of abstraction,
and the unfeasible results of Otsu associated with the GA, we studied and proposed
a new automatic segmentation approach based on an SSN, consisting of two steps. First, we
use a Convolutional Neural Network (CNN) to segment the planted area into regions of
crop lines (region of interest) and unplanted soil (background). Then, we use a refinement
process which aims to reconstruct and improve the previously detected lines. This
is performed in order to make the detected crop lines more uniform and to connect line
fragments and isolated plants that originally belonged to the same crop line.
The remainder of this dissertation is organized as follows: in Chapter 2 we describe
the fundamental concepts relevant to the understanding of this work. In Chapter
3 we present a review of related work and the state of the art in detecting
crop lines. In Chapter 4 we describe our methodology, presenting the datasets as well as the
techniques used in the proposed approach. In Chapter 5 we show our experiments and
the results obtained, and finally, Chapter 6 concludes this dissertation.
1.1 Motivation
The identification of crop lines, and how they are arranged in the planted area, in
low- and medium-altitude images obtained by Unmanned Aerial Vehicles (UAVs) is an
important problem within PA. The lower cost of obtaining images by UAV also allows
farmers to monitor them more frequently. This is important because after the initial
planning and effective sowing, the scenario of the planted area may change over time,
such as the appearance of failures, erosion, death and drying of part of the plantation,
tipping of plants, animal interventions, among others. Thus, this process of detecting
the lines is important for planning the harvest, estimating the use of inputs, controlling
costs, estimating production, counting plants and early correction of sowing failures. In
addition, the geolocation information of the crop allows a better planning of application of
inputs, thus reducing Ąnancial costs and less aggression to the environment as sugarcane
represents a great percentage of all plantation crop worldwide.
Another important point is the fact that harvesting may be performed by autonomous
vehicles and machinery. The geolocation of the crop rows is crucial for these machines
to drive themselves through the field. In addition, information such as which parts of
a crop row have gaps helps the machinery to know which parts of the row do not need
to receive inputs, suppressing their spread and thus saving money and, more importantly,
lessening the degradation of the environment, as some of these substances can be harmful.
Knowing the exact location of the lines can also minimize stump trampling and soil
compaction in the seedling zone caused by the machinery itself. Figure 1 shows an
example of a sugarcane crop where the crop lines were detected by an expert. In the
bottom layer we can see the image captured by the UAV. In the top layer we can see the
segments of the crop lines. The green segments represent the parts of the lines where
there are plants, and the gaps in the lines are shown in different colors depending on
their extension: red segments represent small gaps, orange medium gaps, and yellow
large gaps.
1.3 Hypothesis
This research explores two main hypotheses:
2. Using the binarized output images from the CNN, it is possible to apply the Radon
transform to achieve a refinement process that targets the reconstruction (gap filling)
and enhancement of the previously detected lines. This is performed in order to make
the detected crop lines more uniform and to link row fragments and isolated plant
areas that originally belonged together.
1.4 Contributions
The main contributions of this work are:
❏ Chapter 4 states the methodology of this project, including the image acquisition
process and the datasets used, a description of our first approach based on a genetic
algorithm, and finally our main approach based on semantic segmentation to binarize
images, as well as the post-processing step to reconstruct lines and the metrics
used for evaluation;
❏ Chapter 5 describes the experimental results obtained for each dataset, with
discussions and comparisons between the results obtained by the GA and the CNN,
as well as the results obtained with the Radon transform;
❏ Finally, in Chapter 6 we present the conclusions of this research work, its results,
contributions and future work.
Chapter 2
Fundamentals
their lower flight altitude (SILVA et al., 2017; SOARES; ABDALA; ESCARPINATI,
2018; SOUZA; ESCARPINATI; ABDALA, 2018; FUENTES-PEÑAILILLO et al., 2018).
The use of UAVs has also fostered the development of new and more efficient digital
image processing techniques to analyze images acquired by their sensors (SILVA et al.,
2017; SOARES; ABDALA; ESCARPINATI, 2018; SOUZA; ESCARPINATI; ABDALA,
2018). In addition, most of these techniques aim to estimate the growth of the crop
or to identify other important agronomic characteristics, such as nitrogen stress, water
stress, new diseases, known pests and VIs. In the next section, UAV technology is
explored and explained in more detail.
Figure 7 – Example of a sugarcane crop seen from above. This image was taken by the
unmanned aerial vehicle (UAV) shown in Figure 6.
that are normally black and white (VERMA; PARIHAR, 2017). These two values label
the regions of the image as regions of interest and background (SEZGIN; SANKUR, 2004).
There are several ways to perform binarization, but the simplest technique is the
use of a threshold value to classify the pixels of the source image based on that value.
All pixels with values greater than this threshold are defined as white, or intensity 255,
and all other pixels receive an intensity value of 0, or black. The threshold value in a
binarization can be set based on a region of the image or globally (VERMA; PARIHAR,
2017). One of the most common approaches to calculate a threshold is through the use of
the Otsu method. This algorithm assumes that the image has two classes of pixels
following a bi-modal histogram (background and foreground). Then, the ideal threshold
is reached by separating the two classes so that the combined intra-class variance is
minimized (OTSU, 1979). Usually, finding a single correct threshold that satisfactorily
represents the entire image can be quite difficult or even impossible. Thus, applying local
thresholds to smaller regions of the image may be the best option (GONZALEZ; WOODS,
2000).
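For illustration, the Otsu method described above can be sketched as follows (a minimal NumPy version written for this text, not the implementation used in this work):

```python
import numpy as np

def otsu_threshold(image):
    """Return the Otsu threshold of a grayscale uint8 image.

    Evaluates every candidate threshold and keeps the one that maximizes
    the between-class variance, which is equivalent to minimizing the
    combined intra-class variance."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_between = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # class 0 mean
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1   # class 1 mean
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t

# Binarize a synthetic bimodal image: pixels at or above the threshold
# become white (255), the rest black (0).
np.random.seed(0)
img = np.concatenate([np.random.randint(0, 60, 500),
                      np.random.randint(180, 255, 500)]).astype(np.uint8)
t = otsu_threshold(img)
binary = np.where(img >= t, 255, 0).astype(np.uint8)
```

On a clearly bimodal histogram like this one, the chosen threshold falls between the two modes, separating foreground from background.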
Despite the impressive advances these networks have made, there are still two main
challenges associated with this task. The first one is how to correctly capture rich
contextual information and features to discriminate confusing classes. The second one
is how to accurately recover feature map resolution to improve spatial performance
(SANG; ZHOU; ZHAO, 2020). Figure 11 shows an example of a semantic segmentation
process performed on some images, its results, as well as the classifications and respective
percentage score per segment/label. Next, we explain three of the most used semantic
segmentation networks (U-net, PSPNet and LinkNet). We chose these networks due to
the great results they have been showing in the literature. Further explanation regarding
CNNs is provided in Section 2.3.
2.4.2.1 U-net
U-net was initially proposed in (RONNEBERGER; FISCHER; BROX, 2015) for tasks
that needed precise segmentation but had few examples available for training, such as
medical images. It consists of two main segments, the contracting and expanding paths,
giving the network the shape of a “U”, which justifies its name.
The contracting path consists of the repeated application of blocks of two 3 × 3
convolution layers (each followed by a ReLU unit) and a 2 × 2 max-pooling layer. After
each block, the number of filters doubles so that the network can learn more complex
structures.
In the expanding path, each block consists of two 3 × 3 convolution layers (each
followed by a ReLU unit) and a 2 × 2 up-convolution layer. It also concatenates the
high-resolution feature maps from the respective step of the contracting path in order
to ensure the proper reconstruction of the image. After each block, the number of filters
halves. This is necessary due to the loss of the border region in each convolution.
Both contracting and expanding paths have the same number of blocks. After the
expanding path, the resulting feature map passes through a 1 × 1 convolution layer where
the number of feature maps is equal to the number of classes in the segmented image.
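The contracting/expanding structure described above can be sketched as a minimal depth-1 U-net in PyTorch. This is an illustrative toy (`TinyUNet` is a hypothetical name, not the configuration used in this work), and padded convolutions are used for simplicity, whereas the original network uses unpadded ones and crops the skip connections:

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Block of two 3x3 convolutions, each followed by a ReLU unit.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Depth-1 U-net: one contracting block and one expanding block."""
    def __init__(self, in_ch=3, n_classes=2, base=16):
        super().__init__()
        self.down = double_conv(in_ch, base)
        self.pool = nn.MaxPool2d(2)                    # 2x2 max-pooling
        self.bottom = double_conv(base, base * 2)      # number of filters doubles
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)  # 2x2 up-conv
        self.decode = double_conv(base * 2, base)      # filters halve again
        self.head = nn.Conv2d(base, n_classes, 1)      # 1x1 conv to class maps

    def forward(self, x):
        skip = self.down(x)
        x = self.bottom(self.pool(skip))
        x = self.up(x)
        x = torch.cat([skip, x], dim=1)  # concatenate contracting-path features
        return self.head(self.decode(x))

out = TinyUNet()(torch.zeros(1, 3, 64, 64))  # logits: one channel per class
```

With padded convolutions, the output spatial size matches the input, and the channel count equals the number of classes.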
2.4.2.2 PSPNet
The Pyramid Scene Parsing Network (PSPNet) (ZHAO et al., 2017) has as its fundamental
principle the use of global information from the image, extracting context
information from each scene. Its architecture consists of a fully convolutional network,
with ResNet (HE et al., 2016) being the one used in (ZHAO et al., 2017). From the feature
map, four pooling layers of different sizes are applied, thus generating four feature sub-maps.
Subsequently, the network applies a 1 × 1 convolution to reduce the maps' dimensionality,
and the sub-maps are enlarged through bilinear interpolation to return to the size of the
original feature map. Finally, the feature maps are concatenated and a convolution is
applied to obtain the prediction map.
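The pyramid pooling step described above can be sketched as follows (an illustrative PyTorch module written for this text, not the exact PSPNet implementation; the pooling sizes 1, 2, 3 and 6 follow the original paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling: pool the feature map at several scales,
    reduce each pooled map with a 1x1 convolution, upsample back to the input
    size with bilinear interpolation, and concatenate with the input."""
    def __init__(self, in_ch, sizes=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(s),          # pool to s x s
                          nn.Conv2d(in_ch, in_ch // len(sizes), 1))
            for s in sizes)

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [x] + [F.interpolate(stage(x), size=(h, w), mode="bilinear",
                                    align_corners=False)
                      for stage in self.stages]
        return torch.cat(outs, dim=1)  # input + four context sub-maps

y = PyramidPooling(8)(torch.zeros(1, 8, 32, 32))
```

The concatenated output keeps the input spatial size while adding one reduced-channel context map per pyramid level.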
In certain individuals of the offspring, some genes can be subjected to a mutation with
a low random probability r. This implies that some of the genes g_j^(i) in the chromosome
that represents the individual can have their values modified. Mutation occurs to maintain
diversity within the population and to prevent its premature convergence.
The mutation depends on how the gene is used to represent the data. For binary genes,
selected genes may have their bits flipped. For non-binary genes, for example, a unit
Gaussian distributed random value can be added to the gene.
Finally, the algorithm terminates if the population converges to a solution, i.e., the
population does not produce offspring significantly different from the previous generation.
When this occurs, it can be said that the genetic algorithm has provided a set of solutions
to the problem (EIBEN; SMITH, 2003).
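The two mutation schemes mentioned above (bit flipping for binary genes, Gaussian perturbation for non-binary genes) can be sketched as follows; this is an illustrative example, where the list-of-genes representation and the default mutation rate are assumptions:

```python
import random

def mutate_binary(chromosome, rate=0.01):
    """Bit-flip mutation: each binary gene flips with a small probability `rate`."""
    return [1 - g if random.random() < rate else g for g in chromosome]

def mutate_real(chromosome, rate=0.01, sigma=1.0):
    """Gaussian mutation: with probability `rate`, a zero-mean normally
    distributed value (standard deviation `sigma`) is added to the gene."""
    return [g + random.gauss(0.0, sigma) if random.random() < rate else g
            for g in chromosome]
```

With `rate=0.0` a chromosome passes through unchanged; with `rate=1.0` every binary gene is flipped.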
In our work, we propose to use a genetic algorithm (GA) to estimate the best
parameters of a kernel mask used to segment crop lines. We opted for this algorithm as
it presents a wide range of applications in several areas. For example, GA is used to
select optimal parameter values for image defogging algorithms, a technique commonly
used to correct image degradation produced by many outdoor working systems (GUO;
PENG; TANG, 2016). A modified genetic algorithm (HEMANTH; ANITHA, 2019) was
proposed to minimize the random nature of conventional GA. The authors proposed this
modification aiming to improve medical image classification, more specifically, the
classification of abnormal brain images from four different classes. This modification presented
promising results and achieved 98% accuracy in the given problem. In (GHOSH et al.,
2016), GA is used for automatic prostate segmentation on pelvic images. The authors
propose a framework where GA evaluates candidate contours by combining representations
of learned information (e.g., known shapes and local properties). Visual analysis of
the three-dimensional segmentation indicates that GA is a feasible approach for pelvic
CT and MRI image segmentation.
Since the one-dimensional projection of the function f(x, y) at the angle φ is defined
as p_φ(x′), the Radon transform calculates the integral of the two-dimensional image
along the y′ axis.
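For illustration, a toy discrete approximation of a single Radon projection can be written by rotating the image and integrating along one axis (a sketch under those assumptions, not the implementation used in this work; `radon_projection` is a hypothetical helper):

```python
import numpy as np
from scipy.ndimage import rotate

def radon_projection(image, angle_deg):
    """One Radon projection p_phi(x'): rotate the image so the projection
    direction aligns with the image rows, then integrate along the y' axis.
    The sign convention of the rotation is a simplification for this sketch."""
    rotated = rotate(image, angle_deg, reshape=False, order=1)
    return rotated.sum(axis=0)

# A single vertical line of "plants" projects to a sharp peak at angle 0.
img = np.zeros((11, 11))
img[:, 5] = 1.0
p = radon_projection(img, 0.0)
```

This peak-detection behavior is what makes the transform useful for locating line-shaped structures such as crop rows.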
Chapter 3
Related Work
As already mentioned in this dissertation, the popularization of UAVs has enabled
authors to address many existing problems in agriculture. One of these problems
is to accurately identify the crop lines existing in a region and, consequently, their
geolocation and arrangement, as well as how failures and gaps are distributed in the field.
The literature presents some existing approaches to address this problem, many of them
based on the Hough transform (HT) (HOUGH, 1962), which is widely used in problems
involving the detection of known objects, such as straight lines and circles (ILLINGWORTH;
KITTLER, 1988; HASSANEIN et al., 2015). Nonetheless, other state-of-the-art techniques
have also been used to study problems similar to the one exposed
in this work. Hence, in the remainder of this chapter, we will discuss some of the classical
approaches, mostly based on the HT, as well as some new and
very interesting approaches applied to similar problems that, like our work, also use
CNNs.
root rot which, despite being related to another crop, gives us a great overview
of the state of the art in terms of modern technology and techniques applied in the field,
as well as alternative approaches.
The work in (VIDOVIC; CUPEC; HOCENSKI, 2016) also uses the ExG index to segment
images under perspective projection. It combines image evidence and prior knowledge
of the geometric structure of the crop using a dynamic programming technique. This is
performed to detect regular patterns related to the appearance of crop rows, both straight
and curved. This method is used as the base algorithm in (BASSO; FREITAS, 2020), where
an entire guiding system for spraying UAVs is proposed. The idea is to identify the crop
rows during the UAV flight and to use this information to generate the driving parameters
sent to the flight controller. The authors claim that their approach is able to deal with
curved crop rows by dividing the curves into straight-line segments.
In (MONTALVO et al., 2012) yet another method of crop row identification is presented.
It was devised to work on crops with a high incidence of weeds and with the camera
mounted on ground vehicles. According to the authors, the image processing in this work
consists of three main stages: image segmentation, double thresholding based on
Otsu's method, and crop row detection. The image segmentation is based on the application
of a VI, the double thresholding handles the separation between weeds and crops, and
the crop row detection applies least-squares linear regression for line adjustment. Also,
in (MONTALVO et al., 2013), the authors explore another technique based on image
segmentation procedures that works independently of the loss of greenness. First, they
perform a combination of vegetation indices and apply a first Otsu thresholding. Then,
they select black pixels and apply a second Otsu thresholding. Lastly, the histogram
obtained from pixels belonging to the background and masked plants is thresholded by
applying a last stage of Otsu. However, the main focus of these works falls on the image
processing task, since the crop row identification becomes trivial due to the fact that only
straight lines are expected.
A different approach (SOUZA; ESCARPINATI; ABDALA, 2018) was used for images
of sugarcane and coffee taken at low altitude (≈100 m). The authors assumed the images
were well segmented and, from a cloud of points representing the plants in a field, they
subdivided such points into lines representing the true plantation rows. The proposed
algorithm is a two-fold process. First, the points are subdivided into preliminary lines
by a procedure inspired by the formulation of hierarchical clustering. Afterward, the lines
are pruned to correct for imprecisions introduced by field specificities and the image
pre-processing. They achieved good results, but the total dependence on the segmentation
process represents a real problem for the algorithm.
Chapter 4
Methodology
4.1 Datasets
To evaluate the proposed methodology we used four test images of different sizes.
These images are mosaics of aerial images that represent areas of sugarcane cultivation
and contain planting lines of different ages and widths. The images were acquired
using an eBee SenseFly mapping drone with a senseFly S.O.D.A. camera (1-inch
sensor, 5472 × 3648 pixel resolution, RGB lens F/2.8-11, 10.6 mm). Each pixel in the
image represents approximately 5 cm of ground (ground sample distance, GSD, of 0.053 meters).
Figure 24 shows a preview of the four test images used in the experiments (named
Datasets A, B, C, and D, respectively). It is important to note that each image has the
crop lines of its entire region segmented by an expert, as illustrated in Figure 25. Note
that the rows can vary in width depending on the age of the crop and the level of success
of the planting process.
Figure 24 – Test images used to evaluate our approach and their respective sizes: (a)
11180 × 8449; (b) 19833 × 30255; (c) 17497 × 10771; (d) 16677 × 24181.
D = 2 |A ∩ B| / (|A| + |B|)    (13)
Also, for the SSN approach, a loss based on the Jaccard Similarity Coefficient (JSC,
also known as Intersection over Union) was used in the training process. This measure
is highly recommended for segmentation problems with unbalanced classes.
The Jaccard loss between two images A and B is defined as:
J(A, B) = 1 − |A ∩ B| / |A ∪ B| = 1 − |A ∩ B| / (|A| + |B| − |A ∩ B|)    (14)
In addition, DSC is quite similar to JSC. In fact, it is possible to convert from DSC
to JSC as described in Equation 15, and vice versa (Equation 16).
J = D / (2 − D)    (15)

D = 2J / (J + 1)    (16)
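For illustration, the two coefficients and the conversions in Equations 15 and 16 can be checked numerically on a pair of small binary masks (a sketch under the definitions above, not the evaluation code used in this work):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks (Eq. 13)."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def jaccard(a, b):
    """Jaccard similarity coefficient (IoU) between two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union

a = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
b = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
d, j = dice(a, b), jaccard(a, b)

# Eqs. 15 and 16: the two coefficients are interconvertible.
assert abs(j - d / (2 - d)) < 1e-12
assert abs(d - 2 * j / (j + 1)) < 1e-12
```

Here the intersection holds 2 pixels and each mask holds 3, so D = 4/6 ≈ 0.667 and J = 2/4 = 0.5, consistent with the conversion formulas.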
Semantic segmentation is an important topic in machine learning and computer vision,
and it has been used successfully in many applications, including autonomous driving.
Its relevance has grown in recent years due to the resurgence of convolutional neural
networks and their fast development, producing the so-called Semantic Segmentation
Networks (SSN). As stated in Section 2.4.2, these networks aim to assign semantic labels
accurately to each pixel of an image. Thus, in the proposed approach, the binarization is
performed by a network trained on a dataset labeled with two classes (crop rows and
background). An unclassified image is fed into the CNN to generate a binarized output
image, where each of the two color values represents the pixel classification into one of
these two classes.
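As an illustration of this last step, the per-pixel decision reduces to an argmax over the two class scores. The snippet below is a framework-agnostic sketch assuming the network outputs a score map of shape (H, W, 2); the function name and toy values are hypothetical:

```python
import numpy as np

def binarize_output(scores):
    """Turn a (H, W, 2) per-pixel score map into a binary image:
    class 1 (crop row) becomes white (255), class 0 (background) black (0)."""
    labels = scores.argmax(axis=-1)          # winning class per pixel
    return (labels * 255).astype(np.uint8)

# Toy 2x2 score map: only the top-left pixel favors the crop-row class
scores = np.zeros((2, 2, 2))
scores[0, 0, 1] = 1.0
mask = binarize_output(scores)
```
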
In addition, it is important to state that while the Genetic Algorithm technique was
able to work with only one type of feature (reddish and greenish color tones, used to
produce a kernel filter), Semantic Segmentation manages to extract several different
levels of abstraction, each focusing on a different type of feature, such as borders,
texture, etc. This is a very important aspect because, depending on the stage after the
cut, dry leaves and ratoon may be present in the soil between the crop rows, confounding
the contrast between plants and soil and interfering in the computational analysis
process. Thus, as this color contrast may be compromised, a GA-based method is at a
disadvantage in this case, and that is one of the reasons why we decided to go for the
SSN approach, even though both methods were trained with cane plant and cane ratoon
datasets.
Yet, as previously stated, the Otsu global binarization method does not perform well
when segmenting crop lines, as they present different local features due to the age and
width of the crop line; applying Otsu with local thresholds is not a feasible option
either. As the CNN approach does not depend on Otsu as the GA-based approach did,
this is another reason why we chose to follow the Semantic Segmentation path in our
research.
For this approach, we used the same datasets shown in Figure 24. For each image,
we cropped the mosaic into pieces of 256 × 256 pixels, with a stride of 256 pixels.
If a cropped area did not contain at least 80% of useful information (i.e., pixels with
values other than zero), the sample was discarded. After cropping, datasets A, B, C, and
D contained a total of 678, 3291, 1550 and 2162 images, respectively.
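The cropping procedure can be sketched as follows; `tile_mosaic` and its parameters are illustrative names, not the actual implementation used in this work:

```python
import numpy as np

def tile_mosaic(img, tile=256, stride=256, min_useful=0.80):
    """Crop a mosaic into tile x tile patches, discarding any patch whose
    fraction of useful (non-zero) pixels falls below min_useful."""
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            patch = img[y:y + tile, x:x + tile]
            # For RGB mosaics a pixel is "useful" if any channel is non-zero
            useful = np.count_nonzero(
                patch.any(axis=-1) if patch.ndim == 3 else patch)
            if useful / (tile * tile) >= min_useful:
                patches.append(patch)
    return patches
```

With stride equal to the tile size the patches do not overlap, matching the 256-pixel stride described above.
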
In this work, we evaluated the U-net, LinkNet, and PSPNet semantic neural networks
for segmentation in aerial crop images. We replaced the encoder of each network
with VGG16 pre-trained with ImageNet weights (SIMONYAN; ZISSERMAN, 2014)
due to its high performance in feature extraction in precision agriculture applications
(FAWAKHERJI et al., 2019). The basics of these networks were explored in Section 2.4.2
and their use in our new approach is described in the next section (4.3.1).
In an ideal crop, the segmentation step alone should be sufficient to obtain the crop
lines accurately. However, in the vast majority of cases, there are external factors that
may affect the results. Among these factors are sowing failures (i.e., absence of plants
in a section of the line), weeds (which the segmentation process treats as plants) and
plants that lie outside the crop row. Examples of these problems are shown in Figure 29.
Thus, to improve the segmentation obtained, we used a refinement step, which aims to
reconstruct and improve the previously segmented lines by making them more uniform
and by linking line fragments and loose plants that belong to the same crop line. For this
new approach, we also used the Radon transform (Section 4.4) to reconstruct the lines
obtained during the segmentation step so that they coincide with the ones marked by the
expert, as shown in Figure 30.
Figure 29 – Example of problems encountered after the segmentation step: (a) Original
image; (b) Planting lines provided by an expert; (c) Image after segmentation.
Figure 30 – Proposed scheme for crop line reconstruction using the Radon transform: (a)
Input image; (b) Matrix obtained with the Radon transform. The red dot
represents the location of the maximum point and the orientation angle of
the input image; (c) Radon transform obtained for the image orientation
angle (red line in (b)). Each peak of the curve corresponds to the center of a
line in the input image; (d) Reconstruction of the lines using the orientation
angle and the peaks of the Radon transform for that angle.
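The scheme in Figure 30 can be approximated with a rotate-and-project loop, since each rotation followed by a column sum is one projection of the Radon transform: the angle with the strongest peak aligns the crop rows with the image columns, and the peaks of that projection mark the line centers. The sketch below, using NumPy and SciPy, is a simplified illustration of the idea, not the exact implementation used in this work:

```python
import numpy as np
from scipy import ndimage

def crop_line_centers(binary, angles=np.arange(0.0, 180.0, 1.0)):
    """Estimate the crop-row orientation and line centers of a binary mask.

    Each rotation + column sum below is one projection of the Radon
    transform; the angle with the highest peak is the row orientation."""
    best_angle, best_proj = None, None
    for a in angles:
        rotated = ndimage.rotate(binary.astype(float), a,
                                 reshape=False, order=1)
        proj = rotated.sum(axis=0)      # one Radon projection
        if best_proj is None or proj.max() > best_proj.max():
            best_angle, best_proj = a, proj
    # Local maxima above half the global peak mark the crop-line centers
    centers = [i for i in range(1, len(best_proj) - 1)
               if best_proj[i] > best_proj[i - 1]
               and best_proj[i] >= best_proj[i + 1]
               and best_proj[i] > 0.5 * best_proj.max()]
    return best_angle, centers
```

On a toy mask with two vertical lines, the loop recovers orientation 0 and the two column positions; real mosaics would additionally need the per-window orientation handling described in the text.
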
Chapter 5
Experimental Results
Figure 31 – Average Dice coefficient and standard deviation for different images for 5
different GA kernel masks.
The expert also marked regions where the crop line should exist, even though no plants
are present there. Since these markings do not follow the natural width of the crop lines,
an error is expected when comparing the segmentation provided by the expert with the
results obtained by our approach.
We noticed that the application of the convolution kernel resulted in an image that
is mostly black and white. Although other gray levels are present in the image, their
frequency is not significant, which could compromise the use of a global Otsu threshold.
This explains the poor segmentation obtained in Figure 32d. The same is true when using
a manually defined threshold. Although faster than the Otsu method, no user-defined
global threshold achieves good results for all evaluated images, as shown in Figure 33.
As a result, depending on the threshold value used, some crop lines may be missed while
the detection of others improves (Figure 32c).
One must consider that global binarization may not be the best approach to segment
crop lines, as they present different local features due to the age and width of the crop
line. In order to investigate this matter, we applied the Otsu algorithm locally on the
images. To accomplish that, we used a square window of W × W pixels and moved this
window along the image using different values of stride S. We used the OR operator to
combine all local binarizations into a single binary image. Figure 34 shows that local
analysis improves the Dice coefficient obtained by Otsu binarization in all configurations
evaluated. In general, small windows present better results, as they enable the Otsu
method to capture the local features with more precision, thus providing a better
binarization and a higher Dice coefficient, as shown in Figure 32e.
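A minimal sketch of this local analysis is shown below: an Otsu threshold is recomputed inside every W × W window, each window is binarized, and the window results are merged with a logical OR. Function names and defaults here are illustrative, not the actual implementation:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold for an 8-bit grayscale array (between-class
    variance maximization over the 256-bin histogram)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    total_mass = (hist * np.arange(256)).sum()
    best_t, best_var = 0, -1.0
    w0 = cum0 = 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        cum0 += t * hist[t]
        m0 = cum0 / w0                          # mean of class 0
        m1 = (total_mass - cum0) / (total - w0)  # mean of class 1
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def local_otsu(gray, W=64, S=32):
    """Binarize with Otsu applied in W x W windows moved with stride S;
    the local binarizations are merged with a logical OR."""
    out = np.zeros_like(gray, dtype=bool)
    h, w = gray.shape
    for y in range(0, max(h - W, 0) + 1, S):
        for x in range(0, max(w - W, 0) + 1, S):
            win = gray[y:y + W, x:x + W]
            out[y:y + W, x:x + W] |= win > otsu_threshold(win)
    return out
```

When S is smaller than W the windows overlap, and the OR merge keeps any pixel that was classified as foreground in at least one window.
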
Although the use of local Otsu improves the detection of crop lines (Figure 32e),
we noticed that crop lines which are parallel to each other are sometimes connected by
regions incorrectly detected as crop lines. Moreover, the detected lines present an
irregular width, which compromises the comparison with the expert's segmentation.
Since the orientation angle of the image is a local feature, we evaluated this approach
Figure 34 – Dice coefficient obtained using Global Otsu and Local Otsu for different
combinations of Window W and Stride S.
Figure 35 – Dice coefficient obtained for the line reconstruction for different combinations
of Window W and Stride S.
Results show that LinkNet obtained the best Dice coefficient when segmenting crop
lines in the images. This network was also the one that showed the least variation among
test folds. Due to its consistent results in the plant detection process, we chose LinkNet
as the standard network for the segmentation stage. We restored the weights that
generated the best result during the training of LinkNet and used them to evaluate the
other datasets (B, C, and D). Table 2 shows the results obtained for datasets B, C, and D.
We noticed a slight worsening of the LinkNet results when applied to the other datasets.
The decrease in the Dice coefficient depends on the dataset evaluated. However, the
average Dice coefficient remains above 0.80 in all datasets. A possible explanation for
this behavior lies in the fact that crop lines can vary in width depending on the age of
the crop and on how successful the planting process was, as illustrated in Figure 25.
The SSN-based approach presented a higher and more constant Dice coefficient when
compared to the Genetic Algorithm results. Certainly, this is due to the fact that, in our
approach, GA was able to work with only one type of feature (reddish and greenish color
tones, used to produce a kernel filter). Depending on the crop, dry leaves, weeds and
ratoon may be present between the crop rows, compromising the reddish contrast
between plants and the background. Thus, as the Semantic Segmentation based
technique manages to extract several different levels of abstraction, it tends to be more
capable of operating well in different stages of the crop, regardless of color contrast.
Moreover, the experiments with the Genetic Algorithm method did not show good
segmentation results according to the Dice metric when associated with the global Otsu
technique. A local Otsu analysis undeniably improves the Dice coefficient obtained:
small windows present better results, as they enable the Otsu method to capture local
features with more precision. However, the general results did not even reach 0.78, while
LinkNet shows average Dice results from 0.80 to 0.86 on tested datasets that were not
part of the training, reaching 0.90 when tested on the training dataset.
Chapter 6
Conclusion
In this work, we presented a methodology to segment crop lines from UAV images.
First, we experimented with an approach based on a Genetic Algorithm associated with
the Otsu method to produce binarized images that were then reconstructed using the
Radon transform. Then, for several reasons, including the recent relevance of Semantic
Segmentation in the literature, its multiple levels of abstraction, and the unsatisfactory
results of Otsu associated with GA, we studied and proposed a new approach based on
SSN. This new approach uses a Convolutional Neural Network to perform the
segmentation step. Among the networks evaluated, the one that stands out is LinkNet,
which presented the best results for segmenting crop lines, obtaining a higher and much
more consistent Dice coefficient on the datasets evaluated. This is extremely positive,
despite the fact that this method requires a larger number of images for the training
process. We also proposed a line reconstruction approach based on the Radon transform
for this technique, testing some filter variations. Although the crop row reconstruction
sometimes produces a slight decrease in the Dice coefficient, it enables us to improve the
segmentation results by connecting fragments of crop lines and by filling segmentation
errors caused by missing plants, thus indicating that our approach is a feasible solution
to segment crop lines in images.
Accurate detection of crop lines is strongly important for PA, especially for planning
harvests, managing input usage, estimating production, counting plants, and correcting
sowing failures early. In addition, the geolocation information of the detected crop rows
lessens the waste of inputs, the harm to the environment and the financial costs. It also
allows autonomous machinery to be guided through the crop.
Finally, sugarcane represents a great percentage of all crops worldwide, and it is a
semi-perennial crop, which means that it can be harvested annually for several years
without replanting. Thus, its correct and effective detection and frequent maintenance
over the years can bring a huge economic impact for producers and, consequently, their
countries.
❏ The study of CNNs carried out for this research also allowed us to work on other
problems related to computer vision. Hence, our following paper was recently published:
❏ Furthermore, it is worth mentioning that this work is currently running for the
Prêmio Mercosul de Ciência e Tecnologia (Mercosur Science and Technology Award):
Bibliography
BAH, M. D.; HAFIANE, A.; CANALS, R. Crownet: Deep network for crop row
detection in uav images. IEEE Access, v. 8, p. 5189–5200, 2020. Available at:
<https://doi.org/10.1109/ACCESS.2019.2960873>.
BASSO, M.; FREITAS, E. P. de. A uav guidance system using crop row detection
and line follower algorithms. J. Intell. Robotic Syst, v. 97, n. 3, p. 605–621, 2020.
Available at: <https://doi.org/10.1007/s10846-019-01006-0>.
BEERS, F. van et al. Deep neural networks with intersection over union loss for binary
image segmentation. In: ICPRAM. [S.l.: s.n.], 2019. p. 438–445.
BRAS, G. et al. Transfer learning method evaluation for automatic pediatric chest
x-ray image segmentation. In: 2020 International Conference on Systems,
Signals and Image Processing (IWSSIP). [s.n.], 2020. p. 128–133. Available at:
<https://doi.org/10.1109/IWSSIP48289.2020.9145401>.
CHAMOLA, V. et al. A comprehensive review of the covid-19 pandemic and the role
of iot, drones, ai, blockchain, and 5g in managing its impact. IEEE Access, v. 8, p.
90225–90265, 2020. Available at: <https://doi.org/10.1109/ACCESS.2020.2992341>.
DEWA, C. K.; AFIAHAYATI. Suitable cnn weight initialization and activation function
for javanese vowels classification. Procedia Computer Science, v. 144, p. 124–132,
2018. ISSN 1877-0509. INNS Conference on Big Data and Deep Learning. Available
at: <http://www.sciencedirect.com/science/article/pii/S187705091832221X>.
DIAS, P. A.; TABB, A.; MEDEIROS, H. Apple flower detection using deep
convolutional networks. Comput. Ind, v. 99, p. 17–28, 2018. Available at:
<https://doi.org/10.1016/j.compind.2018.03.010>.
DOW, J. M.; NEILAN, R. E.; RIZOS, C. The international gnss service in a changing
landscape of global navigation satellite systems. Journal of Geodesy, v. 83, n. 3, p.
191–198, 2009. Available at: <https://doi.org/10.1007/s00190-008-0300-3>.
DUDA, R. O.; HART, P. E. Use of the hough transformation to detect lines and
curves in pictures. Commun. ACM, v. 15, n. 1, p. 11–15, 1972. Available at:
<https://doi.org/10.1145/361237.361242>.
ELFWING, S.; UCHIBE, E.; DOYA, K. Sigmoid-weighted linear units for neural
network function approximation in reinforcement learning. Neural Networks, v. 107,
p. 3–11, 2018. ISSN 0893-6080. Special issue on deep reinforcement learning. Available
at: <http://www.sciencedirect.com/science/article/pii/S0893608017302976>.
GHOSH, P. et al. Incorporating priors for medical image segmentation using a genetic
algorithm. Neurocomputing, v. 195, p. 181–194, 2016. ISSN 0925-2312. Learning for
Medical Imaging. Available at: <https://doi.org/10.1016/j.neucom.2015.09.123>.
JR, E. R. H.; DAUGHTRY, C. S. What good are unmanned aircraft systems for
agricultural remote sensing and precision agriculture? International Journal of
Remote Sensing, Taylor & Francis, v. 39, n. 15-16, p. 5345–5376, 2018. Available at:
<https://doi.org/10.1080/01431161.2017.1410300>.
KANG, L. et al. Convolutional neural networks for no-reference image quality assessment.
In: CVPR. IEEE Computer Society, 2014. p. 1733–1740. ISBN 978-1-4799-5118-5.
Available at: <http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=
6909096>.
KHADANGA, G.; JAIN, K. Tree census using circular hough transform and grvi.
Procedia Computer Science, v. 171, p. 389–394, 2020. ISSN 1877-0509.
Third International Conference on Computing and Network Communications
(CoCoNet'19). Available at: <http://www.sciencedirect.com/science/article/pii/
S1877050920310061>.
LEEMANS, V.; DESTAIN, M.-F. Line cluster detection using a variant of the hough
transform for culture row localisation. Image and Vision Computing, Elsevier, v. 24,
n. 5, p. 541–550, 2006. Available at: <https://doi.org/10.1016/j.imavis.2006.02.004>.
LIMA, M. et al. Sugarcane: Brazilian public policies threaten the amazon and
pantanal biomes. Perspectives in Ecology and Conservation, 2020. ISSN
2530-0644. Available at: <http://www.sciencedirect.com/science/article/pii/
S2530064420300262>.
MILELLA, A.; REINA, G.; NIELSEN, M. A multi-sensor robotic platform for ground
mapping and estimation beyond the visible spectrum. Precision Agriculture, Springer,
p. 1–22, 2018. Available at: <https://doi.org/10.1007/s11119-018-9605-2>.
MONTALVO, M. et al. Automatic detection of crop rows in maize fields with high weeds
pressure. Expert Systems with Applications, Elsevier, v. 39, n. 15, p. 11889–11897,
2012. Available at: <https://doi.org/10.1016/j.eswa.2012.02.117>.
view images. Health & Place, v. 66, p. 102428, 2020. ISSN 1353-8292. Available at:
<http://www.sciencedirect.com/science/article/pii/S1353829220302720>.
NAIR, V.; HINTON, G. E. Rectified linear units improve restricted boltzmann machines.
In: FÜRNKRANZ, J.; JOACHIMS, T. (Ed.). Proceedings of the 27th International
Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel.
[S.l.]: Omnipress, 2010. p. 807–814.
PANG, Y. et al. Improved crop row detection with deep neural network for
early-season maize stand count in uav imagery. Computers and Electronics
in Agriculture, v. 178, p. 105766, 2020. ISSN 0168-1699. Available at:
<http://www.sciencedirect.com/science/article/pii/S0168169920311376>.
SANG, H.; ZHOU, Q.; ZHAO, Y. Pcanet: Pyramid convolutional attention network
for semantic segmentation. Image and Vision Computing, v. 103, p. 103997, 2020.
ISSN 0262-8856. Available at: <http://www.sciencedirect.com/science/article/pii/
S0262885620301293>.
SENIOR, A. W.; LEI, X. Fine context, low-rank, softplus deep neural networks for
mobile speech recognition. In: ICASSP. IEEE, 2014. p. 7644–7648. Available at:
<https://doi.org/10.1109/ICASSP.2014.6855087>.
SEZGIN, M.; SANKUR, B. Survey over image thresholding techniques and quantitative
performance evaluation. Journal of Electronic Imaging, v. 13, p. 13 – 13 – 20, 2004.
Available at: <https://doi.org/10.1117/1.1631315>.
VARUN, R. et al. Face recognition using hough transform based feature extraction.
Procedia Computer Science, v. 46, p. 1491–1500, 2015. ISSN 1877-0509. Proceedings
of the International Conference on Information and Communication Technologies, ICICT
2014, 3-5 December 2014 at Bolgatty Palace & Island Resort, Kochi, India. Available
at: <http://www.sciencedirect.com/science/article/pii/S1877050915001337>.
VERMA, O. P.; PARIHAR, A. S. An optimal fuzzy system for edge detection in color
images using bacterial foraging algorithm. IEEE Trans. Fuzzy Syst, v. 25, n. 1, p.
114–127, 2017. Available at: <https://doi.org/10.1109/TFUZZ.2016.2551289>.
ZADA, B.; ULLAH, R. Pashto isolated digits recognition using deep convolutional
neural network. Heliyon, v. 6, n. 2, p. e03372, 2020. ISSN 2405-8440. Available at:
<http://www.sciencedirect.com/science/article/pii/S2405844020302176>.
ZHAO, H. et al. Pyramid scene parsing network. In: Proceedings of the IEEE
conference on computer vision and pattern recognition. [s.n.], 2017. p. 2881–2890.
Available at: <https://doi.org/10.1109/CVPR.2017.660>.