Hasan 2024
Submitted By:
A S M Mahmudul Hasan
Information Technology, Murdoch University, Australia.
Principal Supervisor:
Professor Ferdous Sohel
Co-supervisors:
Professor Michael Jones
Professor Dean Diepeveen
Professor Hamid Laga
January, 2024
Thesis Declaration
I, A S M Mahmudul Hasan, verify that in submitting this thesis: the thesis is my own
account of the research conducted by me, except where other sources are fully acknowledged
in the appropriate format; the extent to which the work of others has been used
is documented by a percent allocation of work and signed by myself and my Principal
Supervisor; the thesis contains as its main content work which has not been previously
submitted for a degree at any university; the University-supplied plagiarism software has
been used to ensure the work is of the appropriate standard to send for examination; any
editing and proof-reading by professional editors comply with the standards set out on
the Graduate Research School website; and all necessary ethics and safety approvals
were obtained, including their relevant approval or permit numbers, as appropriate.
Acknowledgements
I am profoundly grateful to all those whose support, guidance, and encouragement
have contributed to completing this thesis. First and foremost, I extend my deepest
gratitude to my principal supervisor, Professor Ferdous Sohel, whose unwavering support,
expertise, and mentorship were invaluable throughout this journey. I am indebted to my
co-supervisors, Professor Dean Diepeveen, Professor Hamid Laga and Professor Michael
Jones, for their insightful feedback and constructive criticism.
My heartfelt appreciation goes to my wife, Saria, whose unwavering love, encourage-
ment, and sacrifices made this achievement possible. To my children, Arisha and Ahyan,
your boundless patience, understanding, and the joy you bring into my life sustained me
through this challenging endeavour. I owe immeasurable gratitude to my parents and
sister, whose steadfast belief in me, endless encouragement, and sacrifices paved the
way for my educational pursuits.
This research was supported by the Murdoch International Postgraduate Scholarship and
the Murdoch Strategic Scholarship. I am thankful to Murdoch University, Australia.
Lastly, I express my gratitude to all those whose names might not be mentioned but
whose contributions, in various forms, have been instrumental in shaping this thesis.
Abstract
Weeds are a major problem for the agriculture and farming sector. Advanced
imaging and deep learning (DL) techniques have the potential to automate various tasks
involved in weed management. However, automatic weed detection in crops from imagery
is challenging: weeds and crops are similar in colour ('green on green'), their growth
habits and texture are often alike, and weed species vary with crop, season and weather.
Moreover, recognising weed species is crucial for applying targeted control
mechanisms. This thesis focuses on improving the accuracy and throughput of
deep learning models for weed species recognition. This thesis has the following contri-
butions: First, we present a comprehensive literature review highlighting the challenges
in developing an automatic weed species recognition technique.
Second, we evaluate several neural networks for weed recognition in various exper-
imental settings and dataset combinations. Moreover, we investigate transfer-learning
techniques that preserve pre-trained weights for extracting features from crop and
weed datasets.
Third, we repurpose a public dataset and construct an instance-level weed dataset.
We annotate the dataset using a bounding box around each instance and label them with
the appropriate species of the crop or weed. To establish a benchmark, we evaluate the
dataset using several models to locate and classify weeds in crops.
Fourth, we propose a weed classification pipeline where only the discriminative image
patches are used to improve the performance. We enhance the images using generative
adversarial networks. The enhanced images are divided into patches, and a selected
subset of these are used for training the DL models.
Finally, we investigate an approach to classify weeds into three categories based on
morphology: grass, sedge and broadleaf. We train an object detection model to detect
plants from images. A Siamese network, leveraging state-of-the-art deep learning models
as its backbone, is used for weed classification.
Our experiments demonstrate that the proposed DL techniques can detect and classify
weeds at the species level and thereby support weed mitigation.
Attribution Statement
In accordance with the Murdoch University Graduate Degrees Regulations, it is ac-
knowledged that this thesis represents the work of the Candidate with contributions from
their supervisors and, where indicated, collaborators. The Candidate is the majority con-
tributor to this thesis with no less than 75% of the total work attributed to their efforts.
Authorship Declaration: Co-Authored Publications
This thesis contains works that have been published or prepared for publication.
Publication 1:
Title A survey of deep learning techniques for weed detection from images.
Authors A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga
and Michael G K Jones.
Journal Computers and Electronics in Agriculture.
Publisher Elsevier
Publication Date May 2021
DOI https://doi.org/10.1016/j.compag.2021.106067
Location in thesis Chapter 2
A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga, Michael G K Jones
Publication 2:
Title Weed recognition using deep learning techniques on class-imbalanced
imagery.
Authors A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga
and Michael G K Jones.
Journal Crop and Pasture Science.
Publisher CSIRO PUBLISHING
Publication Date April 2022
DOI https://doi.org/10.1071/CP21626
Location in thesis Chapter 3
A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga, Michael G K Jones
Publication 3:
Title Object-level benchmark for deep learning-based detection and classifi-
cation of weed species.
Authors A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael G K Jones, Ferdous Sohel.
Journal Crop Protection.
Publisher Elsevier
Publication Date March 2024
DOI https://doi.org/10.1016/j.cropro.2023.106561
Location in thesis Chapter 4
A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael G K Jones, Ferdous Sohel
Publication 4:
Title Image patch-based deep learning approach for crop and weed recogni-
tion.
Authors A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael G K Jones, Ferdous Sohel.
Journal Ecological Informatics.
Publisher Elsevier
Publication Date December 2023
DOI https://doi.org/10.1016/j.ecoinf.2023.102361
Location in thesis Chapter 5
A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael G K Jones, Ferdous Sohel
Publication 5:
Title Morphology-based weed type recognition using Siamese network.
Authors A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael G K Jones, Ferdous Sohel.
Journal XXXXX.
Publisher XXXXX
Publication Date Under review
DOI Under review
Location in thesis Chapter 6
A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael G K Jones, Ferdous Sohel
I, Professor Ferdous Sohel, certify that the student statements regarding their contribu-
tion to each of the works listed above are correct.
Contents
Thesis Declaration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Attribution Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Authorship Declaration: Co-Authored Publications . . . . . . . . . . . . v
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1 Introduction 1
1.1 Weed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Manual approach to control weeds . . . . . . . . . . . . . . . . . . 2
1.1.2 Automation in weed control systems . . . . . . . . . . . . . . . . 2
1.2 Automation and industry needs . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Weed recognition using artificial intelligence deep learning . . . . . . . . 4
1.4 Challenges in developing deep learning based weed management system . 5
1.5 Aims and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 Thesis contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Literature Review 10
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Related Surveys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Traditional ML- vs DL-based Weed Detection Methods . . . . . . . . . . 15
2.4 Paper Selection Criteria in this Survey . . . . . . . . . . . . . . . . . . . 18
3 Weed classification 59
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2.1.1 DeepWeeds dataset . . . . . . . . . . . . . . . . . . . . . 64
3.2.1.2 Soybean Weed Dataset . . . . . . . . . . . . . . . . . . . 64
3.2.1.3 Cotton Tomato Weed Dataset . . . . . . . . . . . . . . . 65
3.2.1.4 Corn Weed Dataset . . . . . . . . . . . . . . . . . . . . . 65
3.2.1.5 Our Combined Dataset . . . . . . . . . . . . . . . . . . . 65
3.2.1.6 Unseen Test Dataset . . . . . . . . . . . . . . . . . . . . 66
3.2.2 Image Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.2.3 Data Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.2.4 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.2.5 Transfer Learning and Fine-Tuning . . . . . . . . . . . . . . . . . 71
3.2.6 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.3 Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3.1 Experiment 1: Comparing the performance of DL models for clas-
sifying images in each of the datasets . . . . . . . . . . . . . . . . 74
3.3.2 Experiment 2: Combining two datasets . . . . . . . . . . . . . . . 76
3.3.3 Experiment 3: Training the model with all four datasets together 78
3.3.4 Experiment 4: Training the models using both real and augmented
images of the four datasets . . . . . . . . . . . . . . . . . . . . . . 80
3.3.5 Experiment 5: Comparing the performance of two ResNet-50 mod-
els individually trained on ImageNet dataset, and the combined
dataset, and testing on the Unseen Test dataset . . . . . . . . . . 83
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7 Conclusion 171
7.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
7.1.1 Comprehensive literature review . . . . . . . . . . . . . . . . . . . 171
7.1.2 Weed classification pipeline and evaluation of deep learning . . . . 172
7.1.3 Weed detection and classification . . . . . . . . . . . . . . . . . . 172
7.1.4 Enhancing classification accuracy . . . . . . . . . . . . . . . . . . 173
7.1.5 Generalised weed recognition technique . . . . . . . . . . . . . . . 173
7.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7.2.1 Benchmark dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7.2.2 Deep learning in weed detection and classification . . . . . . . . . 175
7.2.3 Field trial of the proposed models . . . . . . . . . . . . . . . . . . 176
Bibliography 178
List of Figures
3.1 Sample crop and weed images of each class from the datasets. . . . . . . 67
3.2 The basic block diagram of DL models used for the experiments. . . . . . 73
3.3 Confusion matrix of “DeepWeeds” combined with the other three datasets. . . 79
3.4 Example of incorrectly classified images. . . . . . . . . . . . . . . . . . . 80
3.5 Confusion matrix after combining the four datasets using the ResNet-50 model . . 81
3.6 Confusion matrix for ResNet-50 model with augmentation . . . . . . . . 82
3.7 Confusion matrix for CW ResNet-50 and SOTA ResNet-50 model. . . . . 83
5.4 Deep learning models’ accuracy with respect to image size . . . . . . . . 135
5.5 Confusion matrix for DeepWeeds dataset using DenseNet201 model . . . 138
5.6 Confusion matrix for DenseNet201 model on DeepWeeds dataset. . . . . 142
5.7 Relationship between the amount of data and classification accuracy . . . 143
5.8 Image of Chenopodium album weed classified as Bluegrass. . . . . . . . . 144
5.9 Example of a chinee apple weed classified as snake weed . . . . . . . . . . 146
5.10 Grad-CAM of the extracted patches from the image . . . . . . . . . . . . 147
List of Tables
Chapter 1
Introduction
The increase in global population has significant implications for agriculture and food
production. As the world’s population continues to grow, the demand for food rises,
putting pressure on agricultural systems to produce more efficiently and sustainably
(Schneider et al., 2011). Increasing agricultural production to meet the growing global
demand for food faces several challenges, including limited arable land, water
scarcity, climate change, soil degradation and environmental sustainability (Canavari et
al., 2010; J.-W. Han et al., 2021; Nosov et al., 2020).
1.1 Weed
The approaches to manage and control the impact of weeds depend on many factors.
However, the methods can be categorised into five main types: preventative (preventing
weeds from becoming established), cultural (maintaining field conditions with a low weed
seed bank), mechanical (e.g., mowing, mulching and tilling), biological (using a weed's
natural enemies, such as insects, grazing animals or disease), and chemical (using
herbicides) (R. E. Stewart, 2018).
Automated weed control systems can enhance operational efficiency by working con-
tinuously and at a consistent pace. Precision in weed control is significantly improved,
reducing the use of herbicides and minimising the impact on non-target crops (Korres
et al., 2019). Automation reduces reliance on manual labour and potentially lowers oper-
ational costs (Shaner & Beckie, 2014). Although many obstacles are yet to be addressed,
researchers are improving automated and sustainable weed management systems to help
overcome the agricultural production challenge of 2050 (Westwood et al., 2018).
1.2 Automation and industry needs

Automation in weed detection could be an essential tool for farmers, addressing crit-
ical challenges and meeting their evolving needs. The efficiency it can bring to weed
management is particularly noteworthy, allowing farmers to swiftly identify and address
weed issues before they compromise crop health.
Labour savings can be a significant benefit. Traditional weed control methods often
rely on manual labour, which is time-consuming and costly. Automation minimises the
need for human intervention, addressing labour shortages and allowing farmers to allocate
their workforce more strategically. Moreover, providing accurate data on weed distribu-
tion within fields enables farmers to make informed decisions about resource allocation,
contributing to the overall optimisation of farming practices.
Automated weed detection may allow more efficient application of treatments, reduc-
ing the overall use of herbicides and leading to potential cost savings for farmers. Besides,
automation may help farmers intervene promptly, ensuring higher yields and maintaining
crop quality.
Automation may provide farmers with real-time data on weed distribution and sever-
ity, empowering them to implement timely and effective weed control strategies. More-
over, Integrated Weed Management (IWM) strategies may benefit from automated weed
detection. Farmers can develop comprehensive and sustainable approaches to managing
weed populations on their farms by combining various weed control methods.
1.3 Weed recognition using artificial intelligence deep learning

Deep learning is a subset of machine learning that uses artificial neural networks to
model and solve complex problems. It is inspired by the structure and function of the
human brain, particularly the way neurons are interconnected to process information (Ag-
garwal et al., 2018; Choi et al., 2020; Nielsen, 2015). Weed recognition using deep learning
involves the application of neural network architectures, specifically deep convolutional
neural networks (CNNs), to identify and classify weeds in images or videos (Hasan et al.,
2021; Rakhmatulin et al., 2021). Deep learning has shown remarkable success in image
recognition tasks, making it well-suited for automated weed detection in agricultural set-
tings (Rakhmatulin et al., 2021; A. Wang et al., 2019). In the context of weed detection,
deep learning offers several advantages:
• Deep learning models excel in image recognition tasks, achieving high levels of
accuracy and precision (Chartrand et al., 2017). Their ability to discern nuanced
differences in visual features makes them well-suited for distinguishing between
crops and weed species.
• Pre-trained deep learning models can be adapted to weed recognition through transfer
learning. This approach is particularly beneficial when dealing with limited annotated
data, as it allows the model to benefit from knowledge gained on broader image
recognition tasks.
• Deep learning models can adapt to diverse environmental conditions, lighting vari-
ations, and changes in crop and weed appearance (A. Wang et al., 2019). This
adaptability is crucial in agricultural settings where conditions can be dynamic and
challenging.
• Deep learning models can handle large-scale datasets and complex visual informa-
tion (Najafabadi et al., 2015). This capability is essential for capturing the diversity
of weed species, growth stages, and background variations commonly encountered
in agricultural imagery.
• Deep learning models can be optimised for real-time inference. This is particularly
relevant for applications like drone-based or tractor-mounted systems, where timely
decisions are critical for effective weed management (S. R. Saleem et al., 2023).
1.4 Challenges in developing deep learning based weed management system

Weed species vary depending on the geographical area, the variety of crops, and weather
and soil conditions. Therefore, most automatic weed management systems are
site-specific (López-Granados, 2011).
Figure 1.1: Examples of crop and weed similarities (green boxes indicate crops and red boxes indicate weeds): (a) weeds and crops with similar colour (Haug & Ostermann, 2014); (b) texture similarities between crop and weed plants (Bakhshipour & Jafari, 2018); (c) weeds and crops share similar shape (PyTorch, 2020). Weeds and crops can have very similar colour, texture and shape.
Moreover, deep learning models heavily depend on large and diverse datasets for
training (Shrestha & Mahmood, 2019). However, obtaining labelled datasets with a wide
range of weed species, growth stages, and environmental conditions can be challenging
(Hasan et al., 2021; Teimouri et al., 2018). Variability in data introduces difficulties in
creating robust models that generalise well to different scenarios. Furthermore, creating
labelled datasets for training deep learning models requires significant effort. Annotating
diverse and large-scale datasets that encompass various weed types, growth stages, and
environmental conditions is time-consuming and resource-intensive.
In addition, accurate identification of weeds is crucial for effective control. Deep learn-
ing models must be trained to distinguish between various weed species and differentiate
them from crops accurately (Osorio et al., 2020). Achieving high levels of precision and
recall in weed identification remains a persistent challenge. Besides, agricultural envi-
ronments are dynamic, with variations in lighting, weather, and soil conditions. Developing
deep learning models that adapt to these changing conditions and maintain accuracy
under diverse scenarios is a significant challenge (A. Sharma et al., 2020).
1.5 Aims and Objectives

Given the use of deep learning in weed detection and its associated challenges, mentioned
in Sections 1.3 and 1.4, we have the following aims and objectives in this thesis:
• Evaluate the performance of existing state-of-the-art deep learning models and in-
vestigate transfer-learning techniques by preserving the pre-trained weights for ex-
tracting the features of crop and weed datasets.
• Construct instance-level weed datasets and evaluate the datasets using several mod-
els to locate and classify weeds in crops.
• Recognising weeds from images is challenging due to the visual similarities between
weeds and crop plants, exacerbated by varying imaging conditions and environmen-
tal factors. We investigated advanced machine learning techniques, specifically five
state-of-the-art deep neural networks, and evaluated them across multiple exper-
imental settings and dataset combinations. Transfer learning methods were also
explored, leveraging pre-trained weights to extract features and fine-tuning the
models with images from crop and weed datasets. The objective is to enhance the
models’ performance in accurately identifying and distinguishing between crop and
weed species in diverse agricultural scenarios.
• Most existing weed datasets lack the instance-level annotations needed for robust
deep learning-based object detection. We constructed a new dataset
with instance-level labelling, annotating bounding boxes around each weed or crop
instance. Using this dataset, we evaluated several deep learning models for crop and
weed detection, comparing their performance in inference time and detection ac-
curacy. Introducing data augmentation techniques improved results by addressing
class imbalance. The findings suggest that these deep learning techniques have the
potential to be applied in developing an automatic field-level weed detection system
(an illustrative detection sketch is given after this list).
• Accurate classification of weed species within crop plants is vital for targeted treat-
ment. While recent studies demonstrate the potential of artificial intelligence,
particularly deep learning (DL) models, several challenges, including insufficient
training data and complexities like inter-class similarity and intra-class dissimilar-
ity, hinder their effectiveness. To address these challenges, we propose
an image-based weed classification pipeline. The pipeline involves enhancing im-
ages using generative adversarial networks, dividing them into overlapping patches,
and selecting informative patches for training deep learning models. Evaluation
of the proposed pipeline on four publicly available crop weed datasets with ten
state-of-the-art models demonstrates significant performance improvements. The
pipeline effectively handles intra-class and inter-class similarity challenges, show-
casing its potential for enhancing weed species classification in precision farming
applications (see the patch-extraction sketch following this list).
Moving to Chapter 3, we assess various neural networks for weed recognition across
diverse experimental setups and dataset combinations. We also explore transfer-learning
techniques, preserving pre-trained weights to extract features from crop and weed datasets.
Finally, Chapter 7 provides concluding remarks on the thesis and explores potential
avenues for future research on recognising weeds in crops using deep learning
and applying them in actual field settings. Emphasis is placed on envisioning how these
advancements may lead to substantial commercial adoption of automatic weed control
technology in the foreseeable future.
Chapter 2
Literature Review
The rapid advances in Deep Learning (DL) techniques have enabled rapid detection,
localisation, and recognition of objects from images or videos. DL techniques are now
being used in many applications related to agriculture and farming. Automatic detec-
tion and classification of weeds can play an important role in weed management and so
contribute to higher yields. Weed detection in crops from imagery is inherently a chal-
lenging problem because both weeds and crops have similar colours (‘green-on-green’),
and their shapes and texture can be very similar at the growth phase. Also, a crop in
one setting can be considered a weed in another. In addition to their detection, the
recognition of specific weed species is essential so that targeted controlling mechanisms
(e.g. appropriate herbicides and correct doses) can be applied. In this paper, we review
existing deep learning-based weed detection and classification techniques. We cover the
detailed literature on four main procedures, i.e., data acquisition, dataset preparation,
DL techniques employed for detection, localisation and classification of weeds in crops, and
evaluation metrics. We found that most studies applied supervised learning
techniques and achieved high classification accuracy by fine-tuning pre-trained models
on plant datasets, and that high accuracy has already been reported when a
large amount of labelled data is available.
This chapter has been published: Hasan, A. M., Sohel, F., Diepeveen, D., Laga, H., & Jones, M. G.
(2021). A survey of deep learning techniques for weed detection from images. Computers and Electronics
in Agriculture, 184, 106067.
2.1 Introduction
The world population has been increasing rapidly, and it is expected to reach nine
billion by 2050. Agricultural production needs to increase by about 70% to meet the
anticipated demands (Radoglou-Grammatikis et al., 2020). However, the agricultural
sector will face many challenges during this time, including a reduction of cultivatable
land and the need for more intensive production. Other issues, such as climate change and
water scarcity, will also affect productivity. Precision agriculture or digital agriculture
can provide strategies to mitigate these issues (Lal, 1991; Radoglou-Grammatikis et al.,
2020; Seelan et al., 2003).
Weeds are plants that can spread quickly and undesirably, and can impact on crop
yields and quality (Patel & Kumbhar, 2016). Weeds compete with crops for nutrition,
water, sunlight, and growing space (Iqbal et al., 2019). Therefore, farmers have to deploy
resources to reduce weeds. The management strategies used to reduce the impact of
weeds depend on many factors. These strategies can be categorised into five main types
(Sakyi, 2019): ‘preventative’ (prevent weeds from becoming established), ‘cultural’ (by
maintaining field hygiene – low weed seed bank), ‘mechanical’ (e.g., mowing, mulching
and tilling), ‘biological’ (using natural enemies of weeds such as insects, grazing animals
or disease), and ‘chemical’ (application of herbicides). These approaches all have draw-
backs. In general, there is a financial burden and they require time and extra work. In
addition, control treatments may impact the health of people, plants, soil, animals, or
the environment (Holt, 2004; Okese et al., 2020; Sakyi, 2019).
As the cost of labour has increased, and people have become more concerned about
health and environmental issues, automation of weed control has become desirable (B. Liu
& Bruch, 2020). Automated weed control systems can be beneficial both economically
and environmentally. Such systems can reduce labour costs by using a machine to remove
weeds, and selective spraying techniques can minimise the use of herbicides (Lameski
et al., 2018).
Figure 2.1: Weeds in different crops (green boxes indicate crops and red boxes indicate weeds). (a) Occlusion of crop and weed (Haug & Ostermann, 2014); (b) colour and texture similarities between crop and weed plants (Bakhshipour & Jafari, 2018); (c) shadow effects in a natural weed image (PyTorch, 2020); (d) effects of illumination conditions (Di Cicco et al., 2017); (e) four different species of weeds that share similarities (inter-class similarity) (Olsen et al., 2019); (f) sugar beet crop at different growth stages (intra-class variations) (Giselsson et al., 2017); (g) effects of motion blur and noise (J. Ahmad et al., 2018; Giselsson et al., 2017); (h) weeds can vary at different geographic/weather locations: weed in carrot crop collected from Germany (left) (Haug & Ostermann, 2014) and Macedonia (right) (Lameski et al., 2017).
Weeds and crop plants often have similar colours, textures and shapes. Figure 2.1 shows crop plants with weeds growing amongst them. Common
challenges in detection and classification of crops and weeds are occlusion (Figure 2.1a),
similarity in colour and texture (Figure 2.1b), plants shadowed in natural light (Figure
2.1c), colour and texture variations due to lighting conditions and illumination (Figure
2.1d) and different species of weeds which appear similar (Figure 2.1e). Same crop plants
or weeds may show dissimilarities during growth phases (Figure 2.1f). Motion blur and
noise in the image also increase the difficulty in classifying plants (Figure 2.1g). In
addition, depending on the geographical location (Figure 2.1h) and the variety of the
crop, weather and soil conditions, the species of weeds can vary (Jensen et al., 2020a).
A typical weed detection system follows four key steps: image acquisition, pre-processing
of images, extraction of features and detection and classification of weeds (Shanmugam
et al., 2020). Different emerging technologies have been used to accomplish these steps.
The most crucial part of these steps is weed detection and classification. In recent years,
with advances in computer technologies, particularly graphical processing units (GPUs) and
embedded processors, Machine Learning (ML) techniques have
become more widely used for automatic detection of weed species (Gu et al., 2018; LeCun
et al., 2015; Yu et al., 2019b).
Deep learning (DL) is an important branch of ML. For image classification, object
detection, and recognition, DL algorithms have many advantages over traditional ML ap-
proaches (in this paper, by the term machine learning we mean traditional machine learning
approaches). Extracting and selecting discriminating features with ML methods is diffi-
cult because crops and weeds can be similar. This problem can be addressed efficiently by
using DL approaches based on their strong feature learning capabilities. Recently, many
research articles have been published on DL-based weed recognition, yet few review ar-
ticles have been published on this topic. Su (2020) recently published a review paper
in which the main focus was on the use of point spectroscopy, RGB, and hyperspectral
imaging to classify weeds in crops automatically. However, most of the articles covered
in this review have applied traditional machine learning approaches, with few citations
of recent papers. B. Liu and Bruch (2020) analysed a number of publications on weed
detection, but from the perspective of selective spraying.
We provide this comprehensive literature survey to highlight the great potential now
presented by different DL techniques for detecting, localising, and classifying weeds in
crops. We present a taxonomy of the DL techniques for weed detection and recognition,
and classify major publications based on that taxonomy. We also cover data collection,
data preparation, and data representation approaches. We provide an overview of differ-
ent evaluation metrics used to benchmark the performance of the techniques surveyed in
this article.
The rest of the paper is organised as follows. Existing review papers in this area
are discussed briefly in Section 2.2. Advantages of DL-based weed detection approaches
over traditional ML methods are discussed in Section 2.3. In Section 2.4, we describe
how the papers for review were selected. A taxonomy and an overview of DL-based
weed detection techniques are provided in Section 2.5. We describe four major steps of
DL-based approaches, i.e. data acquisition (Section 2.6), dataset preparation (Section
2.7), detection and classification methods (Section 2.10) and evaluation metrics (Section
2.11). In Section 2.8 we highlight the approaches to detection of weeds in crop
plants adopted in the related work. The learning methods applied in the relevant studies
are explained in Section 2.9. We summarise the current state of the field and provide
future directions in Section 2.12, with conclusions in Section 2.13.
2.2 Related Surveys

ML and DL techniques have been used for weed detection, recognition and thus for
weed management. In 2018, Kamilaris and Prenafeta-Boldú (2018) published a survey of
40 research papers that applied DL-techniques to address various agricultural problems,
including weed detection. The study reported that DL-techniques outperformed
traditional image processing methods.
In 2016, Merfield (2016) discussed ten components that are essential for, and possible
obstructions to, developing a fully autonomous mechanical weed management system. With
the advance in DL, it seems that the problems raised can now be addressed. Amend
et al. (2019) articulated that DL-based plant classification modules can be deployed not
only in weed management systems but also for fertilisation, irrigation, and phenotyping.
Their study explained how “Deepfield Robotics” systems could reduce labour required for
weed control in agriculture and horticulture.
A. Wang et al. (2019) highlighted that the most challenging part of a weed detection
technique is to distinguish between weed and crop species. They focused on different
machine vision and image processing techniques used for ground-based weed detection.
Brown and Noble (2005) made a similar observation. They reviewed remote sensing for
weed mapping and ground-based detection techniques. They also reported the limitations
of using either spectral or spatial features to identify weeds in crops. According to their
study, it is preferable to use both features.
In another review, the authors explored different remotely sensed and ground-based weed moni-
toring systems in agricultural fields. They reported that weed monitoring is essential for
weed management. They foresaw that the data collected using different sensors could
be stored in cloud systems for timely use in relevant contexts. In another study, Moaz-
zam et al. (2019) evaluated a small number of DL approaches used for detecting weeds in
crops. They identified research gaps, e.g., the lack of large crop-weed datasets, acceptable
classification accuracy and lack of generalised models for detecting different crop plants
and weed species. However, the article only covered a handful of publications and as such
the paper was not thorough and did not adequately cover the breadth and depth of the
literature.
2.3 Traditional ML- vs DL-based Weed Detection Methods

A typical ML-based weed classification technique follows five key steps: image acquisition,
pre-processing such as image enhancement, feature extraction (optionally with feature
selection), applying an ML-based classifier and evaluation of the performance (Bini et al.,
2020; César Pereira Júnior et al., 2020; Liakos et al., 2018; B. Liu & Bruch, 2020).
Different image processing methods have been applied for crop and weed classification
(Hemming & Rath, 2002; L. Tian et al., 2000; Woebbecke et al., 1995). By extracting
shape features, many researchers identified weeds and crops using discriminant analysis
(Chaisattapagon, 1995; G. Meyer et al., 1998). In some other research, different colour
(Hamuda et al., 2017; Jafari et al., 2006; Kazmi et al., 2015b; Zheng et al., 2017) and
texture (Bakhshipour et al., 2017) features were used.
The main challenge in weed detection and classification is that both weeds and crops
can have very similar colours or textures. Machine learning approaches learn the features
from the training data that are available (Bakhshipour & Jafari, 2018). Understandably,
for traditional ML approaches, combining multiple modalities of data (e.g. shape,
texture and colour) or multiple sensor data is expected to generate
superior results to a single modality of data. Kodagoda et al. (2008) argued that colour
or texture features of an image alone are not adequate to classify wheat from weed
species Bidens pilosa. They used Near-Infrared (NIR) image cues with those features.
Sabzi et al. (2020) extracted eight texture features based on the grey level co-occurrence
matrix (GLCM), two spectral descriptors of texture, thirteen different colour features, five
moment-invariant features, and eight shape features. They compared the performance
of several algorithms, such as the ant colony algorithm, simulated annealing method,
and genetic algorithm for selecting more discriminative features. The performance of
the Cultural Algorithm, Linear Discriminant Analysis (LDA), Support Vector Machine
(SVM), and Random Forest classifiers were also evaluated to distinguish between crops
and weeds.
Karimi et al. (2006) applied SVM for detecting weeds in corn from hyperspectral im-
ages. In other research, Wendel and Underwood (2016) used SVM and LDA for classifying
plants. They proposed a self-supervised approach for discrimination. Before training the
models, they applied vegetation separation techniques to remove background and dif-
ferent spectral pre-processing to extract features using Principal Component Analysis
(PCA). Ishak et al. (2007) extracted different shape features and the feature vectors were
evaluated using a single-layer perceptron classifier to distinguish narrow and broad-leafed
weeds.
Several popular and high performing network architectures are available in deep learn-
ing. Two of the frequently used architectures are Convolutional Neural Networks (CNNs)
and Recurrent Neural Networks (RNNs) (Hosseini et al., 2020; LeCun et al., 2015). Al-
though CNNs are used for other types of data, the most widespread use of CNNs is
to analyse and classify images. The word convolution refers to the filtering process.

Figure 2.2: Overview of the DL-based weed detection and recognition pipeline, covering image pre-processing (image enhancement, removing motion blur, colour conversion, image resizing), data labelling, image augmentation, synthetic data generation, model training and evaluation of the model.

A stack of convolutional layers is the basis of a CNN. Each layer receives the input data,
transforms or convolves it, and outputs the result to the next layer. This convolutional operation
eventually simplifies the data so that it can be better processed and understood. RNNs
have a built-in feedback loop, which allows them to act as a forecasting engine. Feed-
forward networks and CNNs take a fixed-size input and produce a fixed-size output. The signal flow
of a feed-forward network is unidirectional, i.e., from input to output, so it cannot
capture sequence or time-series information. RNNs overcome this limitation.
In RNN, the current inputs and outputs of the network are influenced by prior input.
Long Short-Term Memory (LSTM) is a type of RNN (LeCun et al., 2015), which has
a memory cell to remember important prior information, which helps to improve
performance. Depending on the network architecture, DL has several components like
convolutional layers, pooling layers, activation functions, dense/fully connected layers,
encoder/decoder schemes, memory cells, gates etc. (Patterson & Gibson, 2017).
For image classification, object detection, and localisation, DL algorithms have many
advantages over traditional ML approaches. Because of the strong feature learning capa-
bilities, DL methods can effectively extract discriminative features of crops and weeds.
Also, with increasing data, the performance of traditional ML approaches has become
saturated. Using large datasets, DL techniques show superior performance compared to
traditional ML techniques (Alom et al., 2019). This characteristic is leading to the in-
creasing application of DL approaches. Many of the research reports in Section 2.10 show
comparisons between DL and other ML approaches to detect weeds in crops. Figure 2.2
gives an overview of DL-based weed detection and recognition techniques.
Not all the steps outlined in Figure 2.2 need to be present in every method. Four major
steps are followed in this process. They are Data Acquisition, Dataset Preparation/Image
Pre-processing, Classification and Evaluation. In this paper, we describe the steps used
in different research work to discriminate between weeds and crops using DL techniques.
After searching the above databases, duplicated documents were removed: that pro-
vided 988 documents. We further identified and counted those using DL-based method-
ology. In Figure 2.3, we show the total number of papers which used DL between 2010
to 30 August 2020. This shows that before 2016, the number of publications in this area
was very small, but that there is an upward trend in the number of papers from 2016.
For this reason, articles published from 2016 and onward were used in this survey.
Figure 2.3: The number of selected publications on DL-based weed detection approaches from 2010 to 30 August 2020 (documents returned by the search vs DL-based articles, per year).
2.5 An Overview and Taxonomy of Deep Learning-based Weed Detection Approaches
The related publications have been analysed based on the taxonomy in Figure 2.4.
Here, the data acquisition process, sensors and mounting vehicles are highlighted. More-
over, an overview of the dataset preparation approaches, i.e., image pre-processing, data
generation and annotation are also given. While analysing these publications, it has been
found that the related works either generate a weed map for the target site or a classifi-
cation for each of the plants (crops/weeds). For developing the classifiers, the researchers
applied supervised, unsupervised or semi-supervised learning approaches. Depending on
the learning approaches and the research goal, different DL architectures were used. An
overview of the related research is provided in Table 2.2. It shows the crop and weed
species selected for experimental work, the steps taken to collect and prepare the datasets,
and the DL methods applied in the research.
Espejo-Garcia Tomato, Cot- Black nightshade, velvetleaf Modified Xception, DC; (IP, IA,
et al. (2020) ton Inception-ResNet, ILA); PBC
VGGNet, MobileNet,
DenseNet
A. Wang et al. Sugar beet, Not specified FCN (DC, FR); (IP,
(2020) Oilseed IA, ILA); PBC
Le et al. (2020a) Canola, corn, Not specified Filtered Local Binary (ATV, MC);
radish Pattern with Contour (IP, IA, ILA);
Mask and Coefficient PBC
k (k-FLBPCM),
VGG-16, VGG-19,
ResNet-50, Inception-
v3
Hu et al. (2020) Not specified Chinee apple, Lantana, Parkinsonia, Inception-v3, ResNet- PD; IP; PBC
Parthenium, Prickly acacia, Rubber 50, DenseNet-202,
vine, Siam weed, Snake weed Inception-ResNet-v2,
GCN
H. Huang et al. Rice Leptochloa chinensis, Cyperus iria, FCN (DC, UAV);
(2020) Digitaria sanguinalis (L). Scop, (IP, PLA); WM
Barnyard Grass
Gao et al. Sugar beet Convolvulus sepium (hedge YOLO-v3, tiny DC; (IA, BBA);
(2020) bindweed) YOLO-v3 PBC
H. Jiang et al. Corn, lettuce, Cirsium setosum, Chenopodium al- GCN PD; (IP, ILA);
(2020) radish bum, bluegrass, sedge, other unspec- PBC
ified weed
Bosilj et al. Sugar Beets, Not specified SegNet PD; PLA; PBC
(2020) Carrots,
Onions
Yan et al. Paddy Alternanthera philoxeroides, Eclipta AlexNet DC; ILA; PBC
(2020) prostrata, Ludwigia adscendens,
Sagittaria trifolia, Echinochloa crus-
galli, Leptochloa chinensis
R. Zhang et al. Wheat Cirsium Setosum, Descurainia YOLO-v3, Tiny (DC, UAV);
(2020) Sophia, Euphorbia Helioscopia, YOLO-v3 (IP, PLA);
Veronica Didyma, Avena Fatu PBC
Lottes et al. Sugar beet Dicot weeds, grass weeds FCN MC; (IP, PLA);
(2020) PBC
Trong et al. Not specifies 12 species of “Plant Seedlings NASNet, ResNet, In- DC; ILA, PD
(2020) dataset”, 21 species of “CNU weeds ception–ResNet, Mo-
dataset” bileNet, VGGNet
Patidar et al. Not specified Scentless Mayweed, Chickweed, Mask R-CNN PD; PLA; PBC
(2020) Cranesbill, Shepherd’s Purse,
Cleavers, Charlock, Fat Hen, Maise,
Sugar beet, Common wheat, Black-
grass, Loose Silky-bent
Ramirez et al. Sugar beet Not specified DeepLab-v3, SegNet, (MC, UAV);
(2020) U-Net (IP, PLA)
Czymmek et al. Carrot Not specified Faster YOLO-v3, tiny (DC, FR); ILA;
(2019) YOLO-v3 PBC
Olsen et al. Not specified Chinee apple, Lantana, Parkinsonia, Inception-v3, ResNet- (DC, FR); (IP,
(2019) Parthenium, Prickly acacia, Rubber 50 ILA); PBC
vine, Siam weed, Snake weed
Rasti et al. Mache salad Not specified Scatter Transform, (DC, FR); (IP,
(2019) Local Binary Pattern SDG, BBA);
(LBP), GLCM, Ga- PBC
bor filter, CNN
Sarvini et al. Chrysanthemum Para grass, Nutsedge SVM, Artificial Neu- DC; (IP, IA,
(2019) ral Network (ANN), ILA); PBC
CNN
Ma et al. (2019) Rice Sagittaria trifolia SegNet, FCN, U-Net DC; (IP, BBA);
PBC
Asad and Bais Canola Not specified U-Net, SegNet (DC, ATV);
(2019) (IP, IA, PLA);
PBC
Yu et al. Bermudagrass Hydrocotyle spp., Hedyotis cormy- VGGNet, DC; (IP, ILA);
(2019b) bosa, Richardia scabra GoogLeNet, De- PBC
tectNet
Yu et al. Perennial rye- dandelion, ground ivy, spotted AlexNet, VGGNet, DC; (IP, ILA);
(2019a) grass spurge GoogLeNet, Detect- PBC
Net
Liang et al. Not specified Not specified CNN, Histogram of (DC, UAV);
(2019) oriented Gradients (IP, ILA); PBC
(HoG), LBP
Fawakherji Sunflower, Not specified SegNet, U-Net, Bon- (DC, FR, PD);
et al. (2019) carrots, sugar Net, FCN8 PLA; PBC
beets
Binguitcha- Maise, com- Scentless Mayweed, common chick- ResNet-101 PD, (IP, IA,
Fare and mon wheat, weed, shepherd’s purse, cleavers, BBA); PBC
Sharma (2019) sugar beet Redshank, charlock, fat hen, small-
flowered Cranesbill, field pansy,
black-grass, loose silky-bent
Y. Jiang et al. Cotton Not specified Faster R-CNN DC, (IP, BBA);
(2019) PBC
Adhikari et al. Paddy Wild millet ESNet, U-Net, FCN- DC; (IP, IA,
(2019) 8s, and DeepLab-v3, PLA); PBC
Faster R-CNN, ED-
Net
Farooq et al. Sugar beet Alli, hyme, hyac, azol, other unspec- CNN, FCN, LBP, su- HC; (IP, PLA);
(2019) ified weeds perpixel based LBP, PBC
FCN-SPLBP
dos Santos Fer- Soybean grass, broadleaf weeds, Chinee ap- Joint Unsupervised PD; PBC
reira et al. ple, Lantana, Parkinsonia, Parthe- LEarning (JULE),
(2019) nium, Prickly acacia, Rubber vine, DeepCluster
Siam weed, Snake weed
Rist et al. Not specified Gamba grass U-Net SI; (IP, PLA)
(2019)
Teimouri et al. Not specified Common field speedwell, field pansy, Inception-v3 DC; (IP, ILA);
(2018) common chickweed, fat-hen, fine PBC
grasses (annual meadow-grass, loose
silky-bent), blackgrass, hemp-nettle,
shepherd’s purse, common fumi-
tory, scentless mayweed, cereal, bras-
sicaceae, maise, polygonum, oat
(volunteers), cranesbill, dead-nettle,
common poppy
Suh et al. Sugar beets Volunteer potato AlexNet, VGG-19, (DC, ATV);
(2018) GoogLeNet, ResNet- (IP, IA, ILA);
50, ResNet-101, PBC
Inception-v3
Farooq et al. Not specified Hyme, Alli, Azol, Hyac CNN HC, (IP, IA,
(2018a) BBA); PBC
Farooq et al. Not specified Hyme, Alli, Azol, Hyac CNN, HoG HC, (IP, ILA);
(2018b) PBC
Lottes et al. Sugar beet Not specified FCN (MC, FR); (IP,
(2018b) PLA); PBC
Sa et al. (2018) Sugar beet Galinsoga spec., Amaranthus SegNet (MC, UAV);
retroflexus, Atriplex spec., PLA; WM
Polygonum spec., Gramineae
(Echinochloa crus-galli, agropyron,
others.), Convolvulus arvensis, Stel-
laria media, Taraxacum spec.
H. Huang et al. Rice Not specified FCN-8s, FCN-4s, (DC, UAV) (IP,
(2018a) DeepLab PLA); WM
Chavan and Maise, com- Scentless Mayweed, common chick- AlexNet, VGGNet, PD; PBC
Nandedkar mon wheat, weed, shepherd’s purse, cleavers, Hybrid Network
(2018) sugar beet Redshank, charlock, fat hen, small-
flowered Cranesbill, field pansy,
black-grass, loose silky-bent
Nkemelu et al. Maise, com- Scentless Mayweed, common chick- KNN, SVM, CNN PD; (IP, BBA);
(2018) mon wheat, weed, shepherd’s purse, cleavers, PBC
sugar beet Redshank, charlock, fat hen, small-
flowered Cranesbill, field pansy,
black-grass, loose silky-bent
Andrea et al. Maise Not specified LeNET, AlexNet, DC; (IP, IA,
(2017) cNET, sNET PLA); PBC
dos Santos Fer- Soybean Grass, broadleaf weeds AlexNet, SVM, Ad- (DC, UAV);
reira et al. aboost – C4.5, Ran- (IP, ILA); PBC
(2017) dom Forest
Tang et al. Soybean Cephalanoplos, digitaria, bindweed Back propagation DC; (IP, ILA);
(2017) neural network, SVM, PBC
CNN
Pearlstein et al. Lawn grass Not specified CNN (DC, FR); (IP,
(2016) SDG, BBA);
PBC
Di Cicco et al. Sugar beet Capsella bursa-pastoris, galium SegNet SDG, PBC
(2017) aparine
Dyrmann et al. Tobacco, thale Sherpherd’s-Purse , chamomile, CNN PD; (IP, IA);
(2016) cress, cleavers, knotweed family, cranesbill, chick- PBC
common weed, veronica, fat-hen, narrow-
Poppy, corn- leaved grasses, field pancy, broad-
flower, wheat, leaved grasses, annual nettle, black
maise, sugar nightshade
beet, cabbage,
barley
2.6 Data Acquisition

Unmanned Aerial Vehicles (UAVs) are often used for data acquisition in agricultural research.
Generally, UAVs are used for mapping weed density across a field by collecting RGB
images (H. Huang et al., 2018b, 2018c, 2020; Petrich et al., 2019) or multispectral images
(Osorio et al., 2020; Patidar et al., 2020; Ramirez et al., 2020; Sa et al., 2017, 2018).
In addition, UAVs can be used to identify crop rows and map weeds within crop rows
by collecting RGB (red, green and blue) images (Bah et al., 2018). Valente et
al. (2019) used a small quad-rotor UAV for recording images from grassland to detect
broad-leaved dock (Rumex obtusifolius). As UAVs fly over the field at a certain height,
the images captured by them cover a large area. Some of the studies split the images
into smaller patches and use the patches to distinguish between weeds and crop plants
(dos Santos Ferreira et al., 2017; Milioto et al., 2017; Sivakumar et al., 2020). However,
the flight altitude can be maintained at a low height, e.g. 2 meters, so that each plant
can be labelled as either a weed or crop (Osorio et al., 2020; R. Zhang et al., 2020). Liang
et al. (2019) collected image data using a drone by maintaining an altitude of 2.5 meters.
H. Huang et al. (2018a) collected images with a resolution of 3000×4000 pixels using a
sequence of forward-overlaps and side-overlaps to cover the entire field. Lam et al. (2020)
flew DJI Phantom 3 and 4 Pro drones with an RGB camera at three different heights (10,
15 and 20 m) to determine the optimal height for weed detection.
Various types of field robot can also be used to collect images. A robotic vehicle can
carry one or more cameras. As previously discussed, robotic vehicles are used to collect
RGB images by mounted digital cameras (Czymmek et al., 2019; Fawakherji et al., 2019;
Kounalakis et al., 2019; Olsen et al., 2019; Rasti et al., 2019). Mobile phone in-built
cameras have also been used for such data collection. For example, an iPhone 6 was
used to collect video data by mounting it on a Robotic Rover (Pearlstein et al., 2016).
A robotic platform called “BoniRob” has been used to collect multi-spectral images from
the field (Lottes et al., 2018b, 2020). Kounalakis et al. (2018) used three monochrome
cameras mounted on a robot to take images. They argued that, in most cases, weeds are
green, and so are the crops, so there is no need to use colour features to distinguish them.
To collect images from the field, all-terrain vehicles (ATVs) have also been used. They can
be mounted with different types of camera (Asad & Bais, 2019; Chechlinski et al., 2019;
Dyrmann et al., 2017; Partel et al., 2019b; W. Zhang et al., 2018). Le et al. (2020a)
used a combination of multi-spectral and spatial sensors to capture data. Even multiple
low-resolution webcams have been used on an ATV (Partel et al., 2019a). To maintain a
specific height under controlled external lighting and illumination conditions, custom-made mobile
platforms have been used to carry the cameras for capturing RGB images (Skovsen et al.,
2019; Suh et al., 2018). When it is not possible to use any vehicle to collect images at a
certain height, tripods can be used as an alternative (Abdalla et al., 2019).
On a few occasions, weed data have been collected by cameras without being mounted
on a vehicle. As such, video data are collected using handheld cameras (Adhikari et al.,
2019; Espejo-Garcia et al., 2020; Gao et al., 2020; Y. Jiang et al., 2019; Knoll et al.,
2019; Ma et al., 2019; Sarvini et al., 2019; Sharpe et al., 2020; Tang et al., 2017; Teimouri
et al., 2018; Yan et al., 2020; Yu et al., 2019b, 2019a). Sharpe et al. (2019) collected their
data by maintaining a certain height (130 cm) from the soil surface. Brimrose VA210
filter and JAI BM-141 cameras have been used to collect hyperspectral images of weeds
and crops without using any platform (Farooq et al., 2018a, 2018b, 2019). Andrea et al.
(2017) manually focused a camera on the target plants in such a way that it could capture
images, including all the features of these plants. Trong et al. (2020) focused the
camera on many parts of the weeds, such as flowers, leaves, fruits, or the full weed structure.
Rist et al. (2019) used the Pleiades-HR 1A satellite to collect high-resolution 4-band (RGB+NIR)
imagery over the area of interest. They made use of high-resolution satellite images and
applied masking to indicate the presence of weeds.
There are several publicly available crop and weed datasets that can be used to train
the DL models. Chebrolu et al. (2017) developed a dataset containing weeds in sugar beet
crops. Another annotated dataset containing images of crops and weeds collected from
fields has been made available by Haug and Ostermann (2014). A dataset of annotated
(7853 annotations) crops and weed images was developed by Sudars et al. (2020), which
comprises 1118 images of six food crops and eight weed species. Leminen Madsen et al.
(2020) developed a dataset containing 7,590 RGB images with 315,038 plant objects,
representing 64,292 individual plants from 47 different species. These data were collected
in Denmark and made available for further use. A summary of the publicly available
datasets related to weed detection and plant classification is listed in Table 2.3.
We have listed nineteen datasets in Table 2.3 which are available in this area, and can
be used by researchers. Amongst these datasets, researchers will need to send a request
to the owner of “Perennial ryegrass and weed”, “CNU Weed Dataset” and “Sugar beet
and hedge bindweed” dataset to obtain the data. Other datasets can be downloaded
directly on-line. Most of the datasets contain RGB images of food crops and weeds from
different parts of the world. The RGB data have generally been collected using high-
resolution digital cameras. However, Teimouri et al. (2018) used a point grey industrial
camera. While acquiring data for the “DeepWeeds” dataset, the researchers added a
“Fujinon CF25HA-1” lens with their “FLIR Blackfly 23S6C” camera and mounted the
camera on a weed control robot (“AutoWeed”). Chebrolu et al. (2017) and Haug and
Ostermann (2015) employed “Bonirob” (an autonomous field robot) to mount the multi-
spectral cameras. “Carrots 2017” and “Onions 2017” datasets were also acquired using
a multi-spectral camera, namely the “Teledyne DALSA Genie Nano”. These researchers
used a manually pulled cart to carry the camera. The “CNU Weed Dataset” has 208,477
images of weeds collected from farms and fields in the Republic of Korea, which is the highest
number among the datasets. Though this dataset exhibits a class imbalance, it contains
twenty-one species of weeds from five families. Skovsen et al. (2019) developed a dataset
of red clover, white clover and other associated weeds. The dataset contains 31,600
unlabelled data together with 8000 synthetic data. Their goal was to generate labels for
the data using unsupervised or self-supervised approaches. All the other datasets were
manually labelled using image level, pixel-wise or bounding box annotation techniques.
Dyrmann et al. (2016) used six publicly available datasets containing 22 different plant
species to classify using deep learning methods. Several studies proposed an encoder-
decoder architecture to distinguish crops and weeds using the Crop Weed Field Image
Dataset (Brilhador et al., 2019; Umamaheswari & Jain, 2020; Umamaheswari et al.,
2018). The DeepWeeds dataset (Olsen et al., 2019) was used by Hu et al. (2020) to
evaluate their proposed method. In the study of H. Jiang et al. (2020), the “Carrot-Weed
dataset” (Lameski et al., 2017) was used with their own dataset the “Corn, lettuce and
weed dataset”. Fawakherji et al. (2019) collected data from a sunflower farm in Italy.
To demonstrate the proposed method’s generalising ability, they also used two publicly
available datasets containing images of carrots, sugar beets and associated weeds. Bosilj
et al. (2020) also used those datasets along with the Carrot 2017 and Onion 2017 datasets.
The “Plant Seedlings” dataset is a publicly available dataset containing 12 different plant
species. Several studies used this dataset to develop a crop-weed classification model
(Binguitcha-Fare & Sharma, 2019; Chavan & Nandedkar, 2018; Nkemelu et al., 2018;
Patidar et al., 2020). dos Santos Ferreira et al. (2019) used DeepWeeds (Olsen et al.,
2019) and “Soybean and weed” datasets, which are publicly available.
While several datasets are publicly available, they are somewhat site- or crop-specific. As
such, this research field has no widely used benchmark weed dataset comparable to ImageNet (Deng et al., 2009) or
MS COCO (Lin et al., 2014).
2.7 Dataset Preparation

After acquiring data from different sources, it is necessary to prepare the data for training,
validating and testing the models, as raw data are not always suitable for a DL model.
Dataset preparation typically involves applying image processing techniques, labelling the data,
using image augmentation to increase the number of input images or to impose variations in
the data, and generating synthetic data for training. Commonly
used image processing techniques are background removal, resizing the collected images,
green component segmentation, removal of motion blur, de-noising, image enhancement,
extraction of colour vegetation indices, and changing the colour model. Pearlstein et
al. (2016) decoded video into a sequence of RGB images and then converted them into
grayscale images. In another study, the camera was set to auto-capture mode to collect
images in the TIFF format, which were then converted into the RGB colour model
(Suh et al., 2018). Using three webcams mounted on an ATV, Partel et al. (2019a) recorded videos
and then converted them into individual image frames. On some occasions, it was
necessary to change the image format to train the model properly, especially when using
public datasets. For instance, Binguitcha-Fare and Sharma (2019) converted the “Plant
Seedlings Dataset” (Giselsson et al., 2017) from PNG to JPEG format, as studies
have shown that the JPEG format is better suited for training Residual Network
architectures (Ehrlich & Davis, 2019).
The majority of relevant studies undertook some level of image processing before
providing the data as an input to the DL model. It helps the DL architecture to extract
features more accurately. Here we discuss image pre-processing operations used in the
related studies.
Image Resizing Farooq et al. (2018a) investigated the performance of deep Convolu-
tional Neural Networks as a function of spatial resolution. They used three different spatial
resolutions: 30×30, 45×45, and 60×60 pixels. The smaller patch size achieved good accu-
racy and required less time to train the model. To make the processing faster and reduce
the computational complexity, most of the studies performed image resizing operations
on the dataset before inputting into the DL model. After collecting images from the
field, the resolution of the images is reduced based on the DL network requirement. Yu
et al. (2019b) used 1280×720 pixel-sized images to train DetectNet (Tao et al., 2016)
architecture and 640×360 pixels for GoogLeNet (Szegedy et al., 2015) and VGGNet (Si-
monyan & Zisserman, 2014) neural networks. The commonly used image sizes (in pixels)
are 64×64 (Andrea et al., 2017; Bah et al., 2018; Milioto et al., 2017; W. Zhang et al.,
2018), 128×128 (Binguitcha-Fare & Sharma, 2019; Dyrmann et al., 2016; Espejo-Garcia
et al., 2020), 224×224 (Binguitcha-Fare & Sharma, 2019; H. Jiang et al., 2020; Olsen
et al., 2019), 227×227 (Suh et al., 2018; Valente et al., 2019), 228×228 (Le et al., 2020a),
256×256 (dos Santos Ferreira et al., 2017; Hu et al., 2020; Pearlstein et al., 2016; Petrich
et al., 2019; Tang et al., 2017), 320×240 (Chechlinski et al., 2019), 288×288 (Adhikari
et al., 2019), 360×360 (Binguitcha-Fare & Sharma, 2019).
Images with high resolution are sometimes split into a number of patches to reduce the
computational complexity. For instance, in the work of Rasti et al. (2019), images with a
resolution of 5120×3840 pixels were split into 56 patches. Similar operations were performed
by Asad and Bais (2019), H. Huang et al. (2018b), and Ma et al. (2019), who
divided the original images into tiles of size 912×1024, 1440×960 and 1000×1000 pixels, respectively.
Ramirez et al. (2020) captured only five high-resolution images using a drone, which
were then split into small patches of size 480×360 without overlap and 512×512 with
30% overlap. Partel et al. (2019b) collected images using three cameras simultaneously
of resolution 640×480 pixels. They then merged those into a single image of 1920×480
pixels which was resized to 1024×256 pixels. Yu et al. (2019a) scaled down the images of
their dataset to 1224×1024 pixels, so that the training did not run low on memory. H.
Huang et al. (2018a) used orthomosaic imagery, which is usually quite large. They split
the images into small patches of 1000×1000 pixels. In the study of Sharpe et al. (2019),
the images were resized to 1280×720 pixels and then cropped into four sub-images. Osorio
et al. (2020) used 1280×960 pixel images with four spectral bands. By applying a union
operation on the red, green, and near-infrared bands, they generated a false-green image
to highlight the vegetation. Sharpe et al. (2020) resized the collected images to
1280×853 pixels and then cropped them to 1280×720 pixels.
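The patch-splitting step described above can be illustrated with a short Python sketch; the tile size, the 30% overlap and the input file name below are illustrative assumptions rather than settings taken from any of the cited studies.

import cv2

def split_into_patches(image, patch_h, patch_w, overlap=0.0):
    # Divide an H x W x C image into patch_h x patch_w tiles; tiles that do not
    # fit completely at the right/bottom border are discarded.
    step_h = max(1, int(patch_h * (1 - overlap)))
    step_w = max(1, int(patch_w * (1 - overlap)))
    patches = []
    for y in range(0, image.shape[0] - patch_h + 1, step_h):
        for x in range(0, image.shape[1] - patch_w + 1, step_w):
            patches.append(image[y:y + patch_h, x:x + patch_w])
    return patches

image = cv2.imread("field_image.png")                 # hypothetical high-resolution field image
tiles = split_into_patches(image, 512, 512, overlap=0.3)

Each tile can then be resized to the input resolution expected by the chosen network.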
Background Removal H. Huang et al. (2020) collected images using a UAV and
applied image mosaicing to generate an orthophoto. Bah et al. (2018) applied the Hough
transform to highlight the aligned pixels and used Otsu's adaptive thresholding method to
differentiate the background from green crops or weeds. On the other hand, for remov-
ing the background soil image, Milioto et al. (2017) applied the Normalised Difference
Vegetation Index (NDVI). They also used morphological opening and closing operations
to remove the noise and fill tiny gaps among vegetation pixels. To annotate the images
manually into respective classes, dos Santos Ferreira et al. (2017) applied the Simple Lin-
ear Iterative Clustering algorithm. This algorithm helps to segment weeds, crops, and
background from images. Sa et al. (2017) also used image pre-processing to obtain bounding
boxes around crop plants or weeds and to remove the back-
ground. They first used image correlation and cropping for alignment and then applied
Gaussian blur, followed by a sharpening operation to remove shadows, small debris, etc.
Finally, for executing the blob detection process on connected pixels, Otsu’s method was
employed. Lottes et al. (2020) applied the pre-processing operation on red, green, blue,
and NIR channels separately. They also performed the Gaussian blur operation to remove
noise using a 5×5 kernel. To standardise the channels, the mean of each channel was
subtracted from its values and the result was divided by the standard deviation, yielding
normalised, zero-centred channel values. Y. Jiang et al. (2019) applied a Contrast
Limited Adaptive Histogram Equalisation algorithm to enhance the image contrast and
reduce the image variation due to ambient illumination changes.
In the work of Le et al. (2020a) and Bakhshipour et al. (2017), all images were seg-
mented using the Excess Green minus Excess Red Indices (ExG-ExR) method, which
effectively removed the background. They also applied opening and closing morphologi-
cal operations of images and generated contour masks to extract features. On the other
hand, Asad and Bais (2019) argued that the Maximum Likelihood Classification tech-
nique performed better than thresholding techniques for segmenting the background soil
and green plants. According to Alam et al. (2020), images captured from the field had
many problems (e.g. lack of brightness). It was necessary to apply image pre-processing
operations to prepare the data for training. They performed several morphological oper-
ations to remove motion blur and correct uneven illumination. They also removed noisy regions
before applying segmentation operations to separate the background. Threshold-based
segmentation techniques were used to separate the soil and green plants in an im-
age. In the reports of Espejo-Garcia et al. (2020) and Andrea et al. (2017), the RGB
channels of the images were normalised to avoid differences in lighting conditions before
removing the background. For vegetation segmentation, Otsu’s thresholding was applied,
followed by the ExG (Excess Green) vegetation indexing operation. However, Dyrmann
et al. (2016) used a simple excess green segmentation technique for removing the back-
ground and detecting the green pixels. Knoll et al. (2019) converted the RGB image to
HSV colour space, applied thresholding method and band-pass filtering, and then used
binary masking to extract the image’s green component.
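As a concrete illustration of the vegetation-index and thresholding operations mentioned in this subsection (an excess-green index, Otsu's threshold and a morphological clean-up), the following sketch uses OpenCV and NumPy. It is a generic reconstruction of this common pipeline under assumed parameter values, not the exact code of any cited study.

import cv2
import numpy as np

def vegetation_mask(bgr):
    # Segment green vegetation from soil using the Excess Green (ExG) index and Otsu's threshold.
    img = bgr.astype(np.float32) / 255.0
    b, g, r = cv2.split(img)
    exg = 2 * g - r - b                                              # Excess Green index
    exg8 = cv2.normalize(exg, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(exg8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)            # remove small noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)           # fill tiny gaps in vegetation
    return mask

bgr = cv2.imread("plot.jpg")                                         # hypothetical input image
foreground = cv2.bitwise_and(bgr, bgr, mask=vegetation_mask(bgr))    # background (soil) removed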
Image Enhancement and Denoising Nkemelu et al. (2018) investigated the impor-
tance of image pre-processing by training the CNN model with both raw and processed
data, and found that the model performance decreased without pre-processing. They
used Gaussian blur to smooth the images and remove high-frequency content, and
then converted the images to the HSV colour space. Using a
morphological erosion with an 11×11 structuring kernel, they subtracted the background
soil and produced foreground seedling images. Lottes et al. (2018b) reported that image
pre-processing improved the generalisation capabilities of a classification system. They
applied a 5×5 Gaussian kernel to remove noise and normalised the data. They also zero-
centred the pixel values of the image. The study of Sarvini et al. (2019) used Gaussian
and median filters to remove Gaussian noise and salt-and-pepper noise, respectively. Tang
et al. (2017) also normalised the data to zero mean and unit variance. In addition, they
applied Principal Component Analysis (PCA) and Zero-phase Component Analysis (ZCA)
whitening to eliminate correlation among the data.
A. Wang et al. (2020) evaluated the performance of the DL model based on the input
representation of images. They applied many image pre-processing operations, such as
histogram equalisation, automatic adjustment of the contrast of images and deep photo
enhancement. They also used several vegetation indices including ExG, Excess Red,
ExG-ExR, NDVI, Normalised Difference Index, Colour Index of Vegetation, Vegetative
Index, and Modified Excess Green Index and Combined Indices. Liang et al. (2019)
split the collected data into blocks which contained multiple plants. The blocks were
then divided into sub-images with a single plant in them. After that, the histogram
equalisation operation was performed to enhance the contrast of the sub-images.
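The enhancement, denoising and channel-standardisation operations described above can be combined in a few lines; the OpenCV/NumPy sketch below uses illustrative parameter values (CLAHE clip limit, kernel sizes) and is indicative only.

import cv2
import numpy as np

def enhance_and_standardise(bgr):
    # Contrast enhancement with CLAHE on the luminance channel, followed by
    # Gaussian and median filtering and per-channel standardisation.
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

    enhanced = cv2.GaussianBlur(enhanced, (5, 5), 0)   # suppress Gaussian noise
    enhanced = cv2.medianBlur(enhanced, 5)             # suppress salt-and-pepper noise

    x = enhanced.astype(np.float32)
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True) + 1e-8
    return (x - mean) / std                            # zero-centred, unit-variance channels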
Data augmentation was applied in several related studies to enlarge the training data. It
is a very useful technique when the dataset is not large enough (Umamaheswari & Jain, 2020),
and it also helps when there is little variation (Sarvini et al., 2019) or class imbalance
(Bah et al., 2018) among the images of the dataset. A. Wang et al. (2020) applied augmentation
to the dataset to determine the generalisation capability of their proposed approach. Table 2.4
shows the different types of data augmentation used in the relevant studies.
As Table 2.4 shows, most of the studies applied different geometric transformation
operations to the data. Use of colour augmentation can be
helpful to train a model for developing a real-time classification system. This is because
the colour of the object varies depending on the lighting condition and motion of the
sensors.
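A typical on-the-fly augmentation pipeline covering the geometric and colour operations summarised in Table 2.4 can be written, for example, with torchvision; the specific transforms and parameter ranges below are illustrative choices rather than those of any particular study.

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomRotation(degrees=180),                  # rotation
    transforms.RandomHorizontalFlip(p=0.5),                  # flipping
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),     # scaling and random cropping
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.05),        # colour-space changes
    transforms.GaussianBlur(kernel_size=5),                  # kernel filtering
    transforms.ToTensor(),
])

Such a pipeline is usually applied on the fly during training, so each epoch sees a slightly different version of every image.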
Table 2.4: Different types of data augmentation techniques used in the relevant studies

Rotation: Rotate the image to the right or left on an axis between 1° and 359° (Shorten & Khoshgoftaar, 2019). Used in: Adhikari et al., 2019; Andrea et al., 2017; Bah et al., 2018; Binguitcha-Fare & Sharma, 2019; Brilhador et al., 2019; Dyrmann et al., 2016; Espejo-Garcia et al., 2020; Farooq et al., 2018a; Gao et al., 2020; Le et al., 2020a; Sarvini et al., 2019; W. Zhang et al., 2018.

Scaling: Use zooming in/out to resize the image (H. Kumar, 2019). Used in: Adhikari et al., 2019; Asad & Bais, 2019; Binguitcha-Fare & Sharma, 2019; Brilhador et al., 2019; Gao et al., 2020.

Shearing: Shift one part of the image in one direction and the other part in the opposite direction (Shorten & Khoshgoftaar, 2019). Used in: Asad & Bais, 2019; Brilhador et al., 2019; Gao et al., 2020; Le et al., 2020a; W. Zhang et al., 2018.

Flipping: Flip the image horizontally or vertically (H. Kumar, 2019). Used in: Abdalla et al., 2019; Adhikari et al., 2019; Asad & Bais, 2019; Binguitcha-Fare & Sharma, 2019; Brilhador et al., 2019; Chechlinski et al., 2019; Dyrmann et al., 2016; Gao et al., 2020; Petrich et al., 2019; Sarvini et al., 2019; W. Zhang et al., 2018.

Gamma correction: Encode and decode the luminance values of an image (Brasseur, n.d.). Used in: A. Wang et al., 2020.

Colour space: Isolate a single colour channel, increase or decrease the brightness of the image, or change the intensity values in the histograms (Shorten & Khoshgoftaar, 2019). Used in: Adhikari et al., 2019; Asad & Bais, 2019; Bah et al., 2018; Chechlinski et al., 2019; Espejo-Garcia et al., 2020; Petrich et al., 2019; A. Wang et al., 2020.

Colour space transformations: Increase or decrease the pixel values by a constant value and restrict pixel values to a certain minimum or maximum value (Shorten & Khoshgoftaar, 2019). Used in: Binguitcha-Fare & Sharma, 2019; Le et al., 2020a; Petrich et al., 2019; Sarvini et al., 2019.

Noise injection: Inject a matrix of random values into the image matrix, for example salt-and-pepper noise or Gaussian noise (Shorten & Khoshgoftaar, 2019). Used in: Espejo-Garcia et al., 2020; Petrich et al., 2019; Sarvini et al., 2019.

Kernel filtering: Sharpen or blur the image (Shorten & Khoshgoftaar, 2019). Used in: Asad & Bais, 2019; Bah et al., 2018; Espejo-Garcia et al., 2020; Petrich et al., 2019.

Cropping: Remove a certain portion of an image (Takahashi et al., 2018), usually at random in the case of data augmentation (Shorten & Khoshgoftaar, 2019). Used in: Adhikari et al., 2019; Asad & Bais, 2019; Farooq et al., 2018a; Petrich et al., 2019.

Translation: Shift the position of all the image pixels (S.-W. Huang et al., 2018). Used in: Abdalla et al., 2019; Asad & Bais, 2019; Brilhador et al., 2019.
Image data that are not collected from real environments but are created artificially
or programmatically are known as synthetic data or images (Viraf, 2020). It is not always
possible to obtain a large amount of labelled data to train a model. In these cases,
synthetic data are an excellent complement to real data. Several research studies show
that artificial data can make a significant difference in classifying images (Andreini et al.,
2020). In weed detection using DL approaches, synthetic data generation techniques are
not applied very often. Rasti et al. (2019) used synthetically generated images to train
the model and achieved a good classification accuracy while testing on real field images.
On the other hand, Pearlstein et al. (2016) produced synthetic data with complex occlusions
of crops and weeds and with variations in leaf size, colour, and orientation.
To minimise the human effort of annotating data, Di Cicco et al. (2017) generated
synthetic data to train the model. For that purpose, they used a generic kinematic model
of a leaf prototype to generate a single leaf of different plant species and then meshed
that leaf to the artificial plant. Finally, they placed the plant in a virtual crop field for
collecting the data without any extra effort for annotation.
Skovsen et al. (2019) generated 8,000 synthetic images to support labelling a real dataset.
To create artificial data, they cropped out different parts of the plant, randomly selected
any background from the real data, applied image processing (e.g. rotation, scaling, etc.),
and added an artificial shadow using a Gaussian filter.
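The cut-and-paste style of synthetic image generation described above can be sketched as follows. The helper assumes a plant cut-out stored with an alpha channel and a larger real background photograph (both hypothetical inputs), and the rotation, scaling and shadow parameters are illustrative.

import random
import cv2
import numpy as np

def paste_plant(background, plant_rgba):
    # Composite a plant cut-out (BGRA) onto a background at a random position,
    # with a random rotation/scale and a soft, Gaussian-blurred artificial shadow.
    h, w = plant_rgba.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(0, 360), random.uniform(0.7, 1.3))
    plant = cv2.warpAffine(plant_rgba, M, (w, h))

    alpha = plant[:, :, 3].astype(np.float32) / 255.0        # 0..1 opacity mask
    shadow = cv2.GaussianBlur(alpha, (31, 31), 0) * 0.5      # soft shadow from the blurred mask

    out = background.astype(np.float32).copy()
    y = random.randint(0, background.shape[0] - h)           # assumes background larger than cut-out
    x = random.randint(0, background.shape[1] - w)
    roi = out[y:y + h, x:x + w]
    roi = roi * (1.0 - shadow[..., None])                    # darken the shadowed area
    roi = roi * (1.0 - alpha[..., None]) + plant[:, :, :3] * alpha[..., None]
    out[y:y + h, x:x + w] = roi
    return out.astype(np.uint8)

Because the paste position and alpha mask are known, a pixel-level label comes for free, which is what makes this approach attractive for reducing annotation effort.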
The majority of the reviewed publications used manually annotated data labelled by
experts for training the deep learning model in a supervised manner. The researchers
applied different annotations, such as bounding boxes annotation, pixel-level annotation,
image-level annotation, polygon annotation, and synthetic labelling, based on the research
need. Table 2.5 shows different image annotation approaches used for weed detection.
However, H. Jiang et al. (2020) applied a semi-supervised method to label the images;
they used a few labelled images to annotate the unlabelled data. On the other hand, dos
Santos Ferreira et al. (2019) proposed a semi-automatic labelling approach. Unlike semi-
supervised data annotation, they did not use any manually labelled data, but applied the
clustering method to label the data. First, they divided the data into different clusters
according to their features and then labelled the clusters. Similar techniques were used
by Hall et al. (2018). Yu et al. (2019a) separated the collected images into two parts:
positive images that contained weeds, and negative images without
weeds. Lam et al. (2020) proposed an object-based approach to generate labelled data.
Table 2.5: Different image annotation techniques used for weed detection using deep learning

Pixel-level annotation: Label each pixel of the image according to whether it belongs to a crop or a weed. Used by: Abdalla et al. (2019), Adhikari et al. (2019), Andrea et al. (2017), Asad and Bais (2019), Bini et al. (2020), Bosilj et al. (2020), Brilhador et al. (2019), Chechlinski et al. (2019), Farooq et al. (2019), Fawakherji et al. (2019), Hall et al. (2018), H. Huang et al. (2018a, 2018b, 2020), Ishak et al. (2007), Knoll et al. (2019), Kounalakis et al. (2019), Lam et al. (2020), Liakos et al. (2018), Lottes et al. (2018b, 2020), Milioto et al. (2017), Osorio et al. (2020), Patidar et al. (2020), Ramirez et al. (2020), Rist et al. (2019), Sa et al. (2018), Skovsen et al. (2019), Umamaheswari and Jain (2020), and R. Zhang et al. (2020).

Region-level annotation, bounding boxes: A single image may contain a mixture of weeds and crops; bounding boxes are used to label the crops and weeds in the image. Used by: Bah et al. (2018), Binguitcha-Fare and Sharma (2019), Dyrmann et al. (2017), Farooq et al. (2018a), Gao et al. (2020), H. Huang et al. (2018c), Y. Jiang et al. (2019), Kounalakis et al. (2018), Ma et al. (2019), Nkemelu et al. (2018), Partel et al. (2019a), Petrich et al. (2019), Rasti et al. (2019), Sa et al. (2017), Sharpe et al. (2019, 2020), Sivakumar et al. (2020), and Valente et al. (2019).

Region-level annotation, polygons: Used for semantic segmentation to detect irregularly shaped objects; the region of interest is outlined with an arbitrary number of sides. Used by: Patidar et al. (2020).

Image-level annotation: Separate images of weeds and of crops are used to train the model. Used by: Alam et al. (2020), Czymmek et al. (2019), dos Santos Ferreira et al. (2017), Espejo-Garcia et al. (2020), Farooq et al. (2018b), H. Jiang et al. (2020), Le et al. (2020a), Liang et al. (2019), Olsen et al. (2019), Partel et al. (2019b), Sarvini et al. (2019), Suh et al. (2018), Tang et al. (2017), Teimouri et al. (2018), Trong et al. (2020), A. Wang et al. (2020), Yan et al. (2020), Yu et al. (2019b, 2019a), and W. Zhang et al. (2018).

Synthetic labelling: The model is trained with generated, already-labelled data. Used by: Di Cicco et al. (2017) and Pearlstein et al. (2016).
Polygon annotation suits objects of irregular shape: by using polygon annotation, the images of crops and weeds can be separated ac-
curately. Synthetic labelling approaches can minimise labelling costs and help to generate
large annotated datasets.
2.8 Detection Approaches

Studies in this area apply two broad approaches for detecting, localising, and classi-
fying weeds in crops: i) localise every plant in an image and classify each plant as either
a crop or a weed; and ii) map the density of weeds in the field. To detect weeds in crops,
the concept of “row planting” has been used. Some of these studies include a further
classification step to identify the weed species.
Mapping weed density can also be helpful for site-specific weed management and can
lead to a reduction in the use of herbicides. H. Huang et al. (2020) used the DL technique
to map the density of weeds in a rice field. An appropriate amount of herbicides can be
applied to a specific site based on the density map. Abdalla et al. (2019)
segmented the images and detected the presence of weeds in regions of the image. Using
a deep learning approach, H. Huang et al. (2018c) generated a weed distribution map of
the field. In addition, some researchers argued that weed mapping helps to monitor the
conditions of the field automatically (Sa et al., 2017, 2018). Farmers can monitor the
distribution and spread of weeds, and can take action accordingly.
2.9 Learning Methods

Supervised learning occurs when the datasets for training and validation are labelled.
The dataset passed to the DL model as input contains the images along with the corresponding
labels. That is, in supervised training, the model learns to map a given input to a particular
output based on the labelled dataset. Supervised learning is popular for solving classification
and regression problems (Caruana & Niculescu-Mizil, 2006). In most of the related research,
the supervised learning approach was used to train the DL models. Section 2.10 presents a
detailed description of those DL architectures.
Unsupervised learning occurs when the training set is not labelled. The dataset
passed as input to an unsupervised model has no corresponding annotations. The model
attempts to learn the structure of the data and extract distinguishing information or
features from it. Through this process, the objects in the dataset are divided into
separate groups or clusters, such that the features of objects within a cluster are similar
to each other and differ from those of other clusters. In this way, unsupervised learning
can classify the objects of a dataset into separate categories. Clustering is one of the
applications of unsupervised learning (Barlow, 1989).
Most of the relevant studies used a supervised learning approach to detect and classify
weeds in crops automatically. However, dos Santos Ferreira et al. (2019) proposed un-
supervised clustering algorithms with a semi-automatic data labelling approach in their
research. They applied two clustering methods: Joint Unsupervised Learning (JULE) and
Deep Clustering for Unsupervised Learning of Visual Features (DeepCluster).
They developed the models using the AlexNet (Krizhevsky et al., 2012) and VGG-16 (Si-
monyan & Zisserman, 2014) architectures initialised with pre-trained weights. They
achieved a promising result (97% accuracy) in classifying weeds in crops and reduced the
cost of manual data labelling.
Semi-supervised learning takes the middle ground between supervised and unsuper-
vised learning (Lee, 2013). A few researchers used Graph Convolutional Network (GCN)
(Kipf & Welling, 2016) in their research, which is a semi-supervised model. The major
difference between CNN and GCN is the structure of the input data: CNN operates on regularly
structured data, whereas GCN operates on a graph data structure (Mayachita, 2020). We discuss
the use of GCN in the related work in Section 2.10.4.
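To make the contrast with ordinary convolutions more concrete, the propagation rule of a single graph-convolution layer in the formulation of Kipf and Welling (2016) can be written in a few lines of NumPy; the random adjacency matrix and feature sizes below are purely illustrative and do not correspond to any cited weed-recognition model.

import numpy as np

def gcn_layer(A, H, W):
    # One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the
    # adjacency matrix, H the vertex features and W the learnable weight matrix.
    A_hat = A + np.eye(A.shape[0])                     # add self-connections
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt           # symmetric normalisation
    return np.maximum(A_norm @ H @ W, 0.0)             # ReLU activation

A = (np.random.rand(5, 5) > 0.5).astype(float)         # toy graph with 5 vertices
A = np.maximum(A, A.T)                                 # make it undirected
H = np.random.rand(5, 8)                               # 8-dimensional vertex features
W = np.random.rand(8, 4)
print(gcn_layer(A, H, W).shape)                        # (5, 4)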
2.10 Deep Learning Architecture

Our analysis shows that the related studies apply different DL architectures to classify
weeds in crop plants, depending on the dataset and research goal. Most researchers
compared their proposed models either with other DL architectures or with traditional
machine learning approaches. Table 2.2 shows an overview of the different DL approaches used
in weed detection. A CNN model generally consists of two basic parts: feature extrac-
tion and classification (Khoshdeli et al., 2017). In related research, some researchers
applied CNN models using various permutations of feature extraction and classification
layers. However, in most cases, they preferred to use state-of-the-art CNN models like VG-
GNet (Simonyan & Zisserman, 2014), ResNet (deep Residual Network) (K. He et al.,
2016), AlexNet (Krizhevsky et al., 2012), InceptionNet (Szegedy et al., 2015), and many
more. Fully Convolutional Networks (FCNs) like SegNet (Badrinarayanan et al., 2017)
and U-Net (Ronneberger et al., 2015) were also used in several studies.
Suh et al. (2018) applied six well-known CNN models, namely AlexNet, VGG-19,
GoogLeNet, ResNet-50, ResNet-101 and Inception-v3. They evaluated the network per-
formance based on the transfer learning approach and found that pre-trained weights
had a significant influence on training the model. They obtained the highest classifica-
tion accuracy (98.7%) using the VGG-19 model, but it took the longest classification
time. Considering that, the AlexNet model worked best for detecting volunteer potato
plants in sugar beet according to their experimental setup. Even under varying light
conditions, the model could classify plants with an accuracy of about 97%. The study
of dos Santos Ferreira et al. (2017) reported similar findings. They compared the classifica-
tion accuracy of AlexNet with SVM, Adaboost – C4.5, and the Random Forest model.
The AlexNet architecture performed better than other models in discriminating soybean
crop, soil, grass, and broadleaf weeds. Similarly, Valente et al. (2019) reported that
the AlexNet model with pre-trained weights showed excellent performance for detecting
Rumex in grasslands. They also showed that increasing the heterogeneity of the input
images might improve the model accuracy (90%). However, Lam et al.
(2020) argued that the VGG-16 model performed well for detecting Rumex in grassland, with
an accuracy of 92.1%.
Teimouri et al. (2018) demonstrated that, although the ImageNet dataset does not contain
images of the relevant plant species, its pre-trained weights could still help
to reduce the number of training iterations. They fine-tuned the Inception-v3 architecture
for classifying eighteen weed species and determining growth stages based on the number
of leaves. The model achieved classification accuracies of 46% to 78% and showed an
average accuracy of 70% in counting the leaves. Olsen et al. (2019), in contrast,
developed a multi-class weed image dataset consisting of eight nationally
significant weed species. The dataset contains 17,509 annotated images collected from
different locations of northern Australia. They also applied the pre-trained Inception-
v3 model along with ResNet-50 to classify the weed species (source code is available
here: https://github.com/AlexOlsen/DeepWeeds). The average classification accuracy
of ResNet-50 (95.7%) was slightly higher than that of Inception-v3 (95.1%). Bah et al. (2018)
also used ResNet with pre-trained weights as they found it more useful to detect weeds.
of ≥0.99. On the other hand, Sharpe et al. (2019) evaluated the performance of VGGNet,
GoogLeNet, and DetectNet architectures using two variations of images (i.e., whole and
cropped images). They found that the DetectNet model could detect and classify
weeds in strawberry plants more accurately using cropped sub-images. They suggested
that the most visible and prevalent part of the plant should be annotated rather than
labelling the whole plant in the image.
Le et al. (2020a) proposed a model namely Filtered LBP (Local Binary Patterns) with
Contour Mask and coefficient k (k-FLBPCM). They compared the model with the VGG-16,
VGG-19, ResNet-50, and Inception-v3 architectures. The k-FLBPCM method effectively
classified barley, canola and wild radish with an accuracy of approximately 99%, which
was better than other CNN models (source code is available here: https://github.com/
vinguyenle/k-FLBPCM-method). The network was trained using pre-trained weights
from the ImageNet dataset.
Andrea et al. (2017) compared the performance of LeNET (LeCun et al., 1989),
AlexNet, cNET (Gabor et al., 1996), and sNET (Qin et al., 2019) in their research.
They found that cNET was better in classifying maize crop plants and their weeds. They
further compared the performance of the original cNET architecture with the reduced
number of filter layers (16 filter layers). The results showed that, with pre-processed
images, 16 filter layers were adequate to classify the crops and weeds. Moreover, this made
the model 2.5 times faster than the original architecture and helped to detect weeds in
real time.
Partel et al. (2019b) analysed the performance of Faster R-CNN (Ren et al., 2015),
YOLO-v3 (Redmon & Farhadi, 2018), ResNet-50, ResNet-101, and Darknet-53 (Redmon,
n.d.) models to develop a smart sprayer for controlling weeds in real time. Based on
precision and recall values, the ResNet-50 model performed better than the others. In contrast,
Binguitcha-Fare and Sharma (2019) applied the ResNet-101 model. They demonstrated
that the size of the input image could affect the performance of ResNet-101 architecture.
They used three different pixel sizes (i.e., 128 px, 224 px, and 360 px) in their experiments
and reported that model accuracy improved as the pixel size of the input
image increased.
weeds. In this approach, they trained five pre-trained DL models, including NASNet,
ResNet, Inception–ResNet, MobileNet, and VGGNet independently. The Bayesian con-
ditional probability-based technique and priority weight scoring method were used to
calculate the score vector of the models. A model with a better score has a higher prior-
ity in determining the classes of species. To classify weed species, they summed up the
probability vectors generated by the softmax layer of each model and the species with the
highest probability value was determined. According to the experimental results, they
argued that the performance of this approach was better than a single DL model.
Dyrmann et al. (2016) argued that a CNN model initialised with pre-trained weights
that had not been learned from any plant images would not work well. They therefore built a
new architecture using a combination of convolutional layers, batch normalisation, acti-
vation functions, max-pooling layers, fully connected layers, and residual layers according
to their needs. The model was used to classify twenty-two plant species, and they achieved
a classification accuracy ranging from 33% to 98%.
Milioto et al. (2017) built a CNN model for blob-wise discrimination of crops and
weeds. They used multi-spectral images to train the model. They investigated different
combinations of convolutional layers and fully connected layers to find an optimised,
lightweight model that is free from over-fitting. Finally, using three convolutional layers
and two fully connected layers, they obtained a better result. They stated that this
approach did not have any geometric priors like planting the crops in rows. Farooq et
al. (2018a) claimed that the classification accuracy of the CNN model
depended on the number of hyperspectral bands and the resolution of the image
patch. They also built a CNN model using a combination of convolutional, nonlinear
transformation, pooling and dropout layers. In further research, they showed that a CNN
model trained with a higher number of bands could classify images more accurately than
a HoG (Histogram of Oriented Gradients)-based method (Farooq et al., 2018b).
Nkemelu et al. (2018) compared the performance of CNN with the SVM (61.47%) and K-
Nearest Neighbour (KNN, 56.84%) algorithms and found that the CNN could distinguish
crop plants from weeds better. They used six convolutional layers and three fully con-
nected layers in the CNN architecture to achieve an accuracy of 92.6%. They also
evaluated the accuracy of CNN using the original images and the pre-processed images.
The experimental results suggested that classification accuracy improved by using pre-
processed images. Sarvini et al. (2019) agreed that CNN offers better accuracy than
SVM and ANN in detecting weeds in crop plants because of its deep learning ability.
Liang et al. (2019) employed a CNN architecture consisting of three convolutional layers,
three pooling layers, four dropout layers, and a fully connected layer to develop a low-
cost weed recognition system. Their experiment also showed that the performance of the
CNN model in classification was better than the HoG and LBP methods. W. Zhang et al.
(2018) also demonstrated that the CNN model was better than SVM for detecting broad-
leaf weeds in pastures. They used a CNN model with six convolutional layers and three
fully connected classification layers. The model could recognise weeds with an accuracy
of 96.88%, whereas SVM achieved a maximum accuracy of 89.4%.
Pearlstein et al. (2016) used synthetic data to train their CNN model and evaluated
it on real data. They built a CNN model with five convolutional layers and two fully
connected layers. The results showed that the CNN could classify crop plants and weeds very
well from natural images with multiple occlusions. Although Rasti et al. (2019) applied
the same architecture in their research, they argued that the Scatter Transform method
achieved better accuracy with a small dataset than the CNN architecture. They compared
several machine learning approaches, such as Scatter Transform, LBP, GLCM and Gabor
filters, with the CNN model. They also used synthetic data for training and evaluated the
models’ performance on real field images.
Based on the tiny YOLO-v3 (Yi et al., 2019) framework, Gao et al. (2020) proposed
a DL model which speeds up the inference time of classification (source code is avail-
able here: https://drive.google.com/file/d/1-E_b_5oqQgAK2IkzpTf6E3X1OPm0pjqy/
view?usp=sharing). They added two extra convolutional layers to the original model
for better feature fusion and also reduced the number of detection scales to two. They
trained the model with both synthetic and real data. Although YOLO-v3 achieved
better classification accuracy in the experiments, they recommended the tiny YOLO-v3
model for real-time application. Sharpe et al. (2020) also used the tiny YOLO-v3 model to
detect goosegrass in strawberry and tomato plants.
YOLO-v3 and tiny YOLO-v3 models were also employed in a study by Partel et
al. (2019a). The aim was to find a low-cost, smart weed management system. They
applied the models on two machines with different hardware configurations. Their paper
reported that YOLO-v3 showed good performance when run on a powerful and expensive
computer, but the processing speed decreased when it was executed on a lower-powered computer.
From their experiments, they concluded that the tiny YOLO-v3 model was better for saving
hardware costs. W. Zhang et al. (2018) also preferred tiny
YOLO-v3 over YOLO-v3, because it was a lightweight method and took less time
and resources to classify objects. In contrast, Czymmek et al. (2019) proposed to use
YOLO-v3 with a relatively larger input image size (832 × 832 pixels). They argued that
the model performed better in their research with a small dataset. They acknowledged that tiny
YOLO-v3 or Fast YOLO-v3 could improve the detection speed, but at the cost of some
model accuracy.
Sivakumar et al. (2020) trained and evaluated pre-trained Faster
R-CNN and SSD (Single Shot Detector) (W. Liu et al., 2016) object detection models to
detect late-season weeds in soybean fields. Moreover, they compared these object detection
models with a patch-based CNN model. The results showed that Faster R-CNN performed
better in terms of weed detection accuracy and inference speed. Y. Jiang et al. (2019)
proposed the Faster R-CNN model to detect the weeds and crop plants and to count the
number of seedlings from the video frames. They used Inception-ResNet-v2 architecture
as the feature extractor. On the other hand, by applying the Mask R-CNN model to the
“Plant Seedlings Dataset”, Patidar et al. (2020) achieved more than 98% classification
accuracy. They argued that Mask R-CNN detected plant species more accurately and with
less training time than FCN.
Osorio et al. (2020) compared two deep object detection models, YOLO-v3 and Mask R-CNN,
with SVM. The classification accuracy of these deep architectures was 94%, whereas SVM
achieved 88%. However, they reported that as SVM required less processing capacity, it
could be used for IoT-based solutions.
Unlike CNN, FCN replaces all the fully connected layers with convolutional layers
and uses a transposed convolution layer to reconstruct the image with the same size as
the input. It helps to predict the output by making a one-to-one correspondence with
the input image in the spatial dimension (H. Huang et al., 2020; Shelhamer et al., 2017).
H. Huang et al. (2018b) compared the performance of AlexNet, VGGNet, and GoogLeNet
as the base model for FCN architecture. VGGNet achieved the best accuracy among
those. They further compared the model with patch-based CNN and pixel-based CNN
architectures. The result showed that the VGG-16 based FCN model achieved the highest
classification accuracy. On the other hand, H. Huang et al. (2018c) applied ResNet-101
and VGG-16 as a baseline model of FCN for segmentation. They also compared the per-
formance of the FCN models with a pixel-based SVM model. In their case, ResNet-101
based FCN architecture performed better. Asad and Bais (2019) compared two FCN
architectures, SegNet and U-Net, for detecting weeds in canola fields. They used VGG-16
and ResNet-50 as encoder blocks in both models. SegNet with ResNet-50 as
the base model achieved the highest accuracy.
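The FCN idea introduced at the start of this discussion (fully connected layers replaced by convolutions, with a transposed convolution restoring the input resolution for per-pixel prediction) can be illustrated with a deliberately small PyTorch sketch. The layer sizes and the three output classes (e.g. crop, weed, background) are illustrative assumptions, not the architecture of any cited model.

import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    # Minimal fully convolutional network: every layer is convolutional, and a
    # transposed convolution upsamples the class scores back to the input size.
    def __init__(self, num_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 1/2 resolution
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 1/4 resolution
        )
        self.classifier = nn.Conv2d(64, num_classes, 1)        # 1x1 conv replaces the FC layers
        self.upsample = nn.ConvTranspose2d(num_classes, num_classes,
                                           kernel_size=4, stride=4)

    def forward(self, x):
        return self.upsample(self.classifier(self.encoder(x)))

x = torch.randn(1, 3, 256, 256)                                # dummy RGB patch
print(TinyFCN()(x).shape)                                      # torch.Size([1, 3, 256, 256])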
According to Ma et al. (2019), SegNet (accuracy 92.7%) architecture was better than
traditional FCN (accuracy 89.5%) and U-Net (accuracy 70.8%) for weed image segmen-
tation when classifying rice plants and weeds in the paddy field. The study of Abdalla
et al. (2019) reported that the accuracy of image segmentation depended on the size of
the dataset, which makes it difficult to train such a model from scratch. To address this
problem, they applied transfer learning and real-time data augmentation to train the
model. In their experiment, they used VGG-16 based SegNet architecture. They applied
three different transfer learning approaches for VGG-16. Moreover, the performance of
the model was compared with the VGG-19 based architecture. The VGG-16 based Seg-
Net achieved the highest accuracy of 96% when they used pre-trained weights only for
feature extraction and the shallow machine learning classifier (i.e., SVM) for segmenta-
tion. Sa et al. (2017) also applied SegNet with the pre-trained VGG-16 as the base model
(source code is available here: https://github.com/inkyusa/weedNet). They trained the
model by varying the number of channels in the input images. They then compared the
inference speed and accuracy of different arrangements by deploying the model on an
embedded GPU system, which was carried by a small micro aerial vehicle (MAV).
Umamaheswari and Jain (2020) compared the performance of SegNet-512 and SegNet-
256 encoder-decoder architectures for semantic segmentation of weeds in crop plants. The
experiment showed that SegNet-512 was better for classification. In the study of Di Cicco
et al. (2017), the SegNet model was trained using synthetic data, and the performance
was evaluated on a real crop and weed dataset.
Lottes et al. (2018b) also proposed an FCN architecture using DenseNet as the baseline
model. Their novel approach provided a pixel-wise semantic segmentation of crop plants
and weeds. The work of Lottes et al. (2020) proposed a task-specific decoder network.
As the plants were sown at a regular distance, they trained the model in such a way that
it could learn the spatial plant arrangement from the image sequence. They then
fused this sequential feature with the visual features to localise and classify weeds in crop
plants. Dyrmann et al. (2017) used FCN architecture not only for segmentation but also
for generating bounding boxes around the plants. They applied pre-trained GoogLeNet
architecture as the base model.
According to A. Wang et al. (2020), changes in the input representation could make a
difference in classification performance. They employed the encoder-decoder deep learn-
ing network for semantic segmentation of crop and weed plants by initialising the input
layers with pre-trained weights. They evaluated the model with different input representations,
including NIR information combined with colour-space transformations of the input,
which improved crop-weed segmentation and classification accuracy (96%). Sa et al.
(2018) also evaluated different input representations to train the network. They applied
VGG-16 based SegNet architecture for detecting background, crop plants and weeds.
The model was evaluated by varying the number of spectral bands and changing the
hyper-parameters. The experimental results showed that the model achieved far better
accuracy by using nine spectral channels of an image rather than the RGB image.
H. Huang et al. (2018a) stated that the original FCN-4s architecture was designed for
the PASCAL VOC 2011 dataset, which covers far more object classes than their task required;
their dataset had only three categories (i.e., rice, weeds, and others). As a result, they reduced the
feature maps of the intermediate layers to 2048. They then compared the accuracy
and efficiency of the model with original FCN-8s and DeepLab architecture and proved
that the modified FCN-4s model performed better. For the same reason, Bosilj et al.
(2020) simplified the original SegNet architecture, naming it SegNet-Basic, by decreasing
the number of convolutional layers from 13 to 4.
One of the problems with the basic FCN architecture is that spatial features cannot
be recovered properly, which can decrease the prediction accuracy. To address this problem,
H. Huang et al. (2020) improved the model by adding skip architecture (SA), fully connected
conditional random fields and partially connected conditional random fields. They fine-tuned
AlexNet-, VGGNet-, GoogLeNet-, and ResNet-based FCNs. They then compared the performance
of the different FCNs and the object-based image analysis (OBIA) method. Experimental
results showed that the VGGNet-based FCN with the proposed improvements achieved the
highest accuracy.
Brilhador et al. (2019) modified the original U-Net architecture for pixel-level classification
of crop plants and weeds. They added a convolutional layer with a kernel size of 1×1 and
adjusted the input size of the network accordingly. In addition, they replaced the ReLU
activation functions with the Exponential Linear Unit (ELU), used the Adadelta optimiser
instead of stochastic gradient descent, and included dropout layers between the convolutional
layers. Petrich et al. (2019) also modified the U-Net model to detect one species of weed
in grasslands.
Hu et al. (2020) proposed Graph Weeds Net (GWN), a graph-based deep learning
architecture for classifying weed species. They used ResNet-50 and DenseNet-202 models
to learn vertex features, together with the graph convolution layers, vertex-wise dense layers
and multi-level graph pooling mechanisms included in the GWN architecture. Here, an RGB
image was represented as a multi-scale graph. The graph-based model with the DenseNet-202
architecture achieved a classification accuracy of 98.1%.
Hybrid architectures are those where the researchers combine the characteristics of
two or more DL models. For instance, Chavan and Nandedkar (2018) proposed the
AgroAVNET model, which was a hybrid of AlexNet and VGGNet architecture. They
chose VGGNet for setting the depth of filters and used the normalisation concept of
AlexNet. They then compared the performance of the AgroAVNET network with the
original AlexNet and VGGNet and their different variants. All the parameters were
initialised using pre-trained weights, except for the third fully connected layer, which was
initialised randomly. The AgroAVNET model outperformed the others
with a classification accuracy of 98.21%. However, Farooq et al. (2019) adopted the
feature concatenation approach in their research. They combined a super pixel-based
LBP (SPLBP) method to extract local texture features, CNN for learning the spatial
features and SVM for classification. They compared their proposed FCN-SPLBP model
with CNN, LBP, FCN, and SPLBP architectures.
Espejo-Garcia et al. (2020) also replaced the CNN's default classifier with traditional ML
classifiers, including SVM, XGBoost, and Logistic Regression. They initialised the Xception,
Inception-ResNet, VGGNets, MobileNet, and DenseNet models with pre-trained weights.
The experimental results showed that the best-performing network was the DenseNet model
with the SVM classifier, with a micro F1 score of 99.29%. This
research also reported that, with a small dataset, network performance could be enhanced
using this approach.
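The pattern described in the last two paragraphs, keeping a pre-trained network as a frozen feature extractor and replacing its default softmax classifier with a conventional ML classifier, can be sketched as follows with Keras and scikit-learn. The choice of DenseNet121, the 224×224 input size, the SVM settings and the placeholder arrays are illustrative assumptions, not the exact configuration used by Espejo-Garcia et al. (2020).

import numpy as np
from tensorflow.keras.applications import DenseNet121
from sklearn.svm import SVC

backbone = DenseNet121(weights="imagenet", include_top=False, pooling="avg",
                       input_shape=(224, 224, 3))
backbone.trainable = False                                     # frozen feature extractor

def extract_features(images):
    # images: float array of shape (N, 224, 224, 3), already pre-processed
    return backbone.predict(images, verbose=0)

# Placeholder data standing in for prepared crop/weed images and labels.
X_train = np.random.rand(8, 224, 224, 3); y_train = np.random.randint(0, 2, 8)
X_val   = np.random.rand(4, 224, 224, 3); y_val   = np.random.randint(0, 2, 4)

clf = SVC(kernel="rbf", C=1.0)                                 # shallow classifier on deep features
clf.fit(extract_features(X_train), y_train)
print("validation accuracy:", clf.score(extract_features(X_val), y_val))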
2.11 Performance Evaluation Metrics
As Table 2.6 shows, it is not easy to compare the related works because different
evaluation metrics are employed depending on the DL model, the goal of classification,
the dataset and the detection approach. The most frequently used evaluation metrics
are classification accuracy (CA), F1 score and mean intersection over union (mIoU). When
classifying plant species, researchers prefer to evaluate the model using a confusion matrix.
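For reference, the three metrics just mentioned can be computed directly from predicted and ground-truth labels; the scikit-learn based sketch below, with placeholder labels, is a generic illustration rather than the evaluation code of any cited study.

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, jaccard_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])     # placeholder labels (e.g. 0=crop, 1=weed, 2=soil)
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1])

ca   = accuracy_score(y_true, y_pred)                     # classification accuracy (CA)
f1   = f1_score(y_true, y_pred, average="macro")          # class-averaged F1 score
miou = jaccard_score(y_true, y_pred, average="macro")     # mean intersection over union (mIoU)
cm   = confusion_matrix(y_true, y_pred)                   # per-class confusion matrix

print(f"CA={ca:.3f}  F1={f1:.3f}  mIoU={miou:.3f}")
print(cm)

For semantic segmentation the same calls apply to the flattened per-pixel labels.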
In addition to the evaluation metrics provided in Table 2.6, Milioto et al. (2017) jus-
tified their model based on run-time. This was because, to develop a real-time weeds and
crop plants classifier, it is important to identify the class of a plant as quickly as possible.
They showed how quickly their model could detect a plant in an image. Similarly, Suh
et al. (2018) calculated the classification accuracy of their model along with the time
required to train and identify classes of plants, as they intended to develop a real-time
classifier. Ma et al. (2019) also used run-time for justifying the model performance. They
found that, by increasing the patch size of the input images, it was possible to reduce
the time required to train the model. Another research method used inference time to
compare different DL architecture (H. Huang et al., 2018b). dos Santos Ferreira et al.
(2017) evaluated the CNN model not only based on time but also in terms of the memory
consumed by the model during training. They argued that though the CNN architecture
achieved higher accuracy than other machine learning model, it required more time and
memory to train the model. Andrea et al. (2017) showed that reducing the number of
layers of the DL model could make it faster in detecting and identifying the crop and
weed plants. They also used processing time as an evaluation criterion while choosing
the CNN architecture.

Table 2.6: The evaluation metrics applied by different researchers of the related works
2.12 Discussion
It is evident that DL models offer high performance in the area of weed detection
and classification in crops. In this paper, we have provided an overview of the current
status of automatic weed detection techniques. In most relevant studies, the preferred
method to acquire data was a digital camera mounted on a ground vehicle collecting RGB
images; a few studies collected multi-spectral or hyper-spectral data. To prepare the
datasets for training, different image processing techniques were used for resizing the
images, removing background and noise, and enhancing the images. The datasets were
generally annotated using bounding box, pixel-wise and image-level annotation approaches.
The models were mostly trained using supervised learning, with different DL techniques
employed to find a better weed detection model, and detection accuracy was treated as
the most important parameter for evaluating model performance.
Nevertheless, there is still room for improvements in this area. Use of emerging
technologies can help to improve the accuracy and speed of automatic weed detection
systems. As crop and weed plants have many similarities, the use of other spectral
indices can improve the performance.
However, there is a lack of large datasets for crops and weeds. It is necessary to
construct a large benchmark dataset by capturing a variety of crops/weeds from different
geographical locations, weather conditions and at various growth stages of crops and
weeds. At the same time, it will be expensive to annotate these large datasets. Semi-
supervised (Chapelle et al., 2009; X.-Y. Zhang et al., 2019) or weakly supervised (Durand
et al., 2017; Zhou, 2018) approaches could be employed to address this problem.
Moreover, Generative Adversarial Network (GAN) (Ledig et al., 2017) or other syn-
thetic data generation techniques can contribute to creating a large dataset. Random
point generation and polygon labelling can further improve the precision of automatic
weed detection systems. DL is evolving very fast, and new state-of-the-art techniques are
continually being proposed. In addition to developing new solutions, researchers can enhance and
apply those methods in the area of weed detection. They can also consider using weakly
supervised, self-supervised or unsupervised approaches such as multiple instance learning.
Furthermore, most datasets mentioned in this paper exhibit class imbalance, which
may create biases and lead to over-fitting of the model. Future research needs to ad-
dress the problem. This can be achieved via the use of appropriate data redistribution
approaches, cost-sensitive learning approaches (Khan et al., 2017), or class balancing
classifiers (Bi & Zhang, 2018; Taherkhani et al., 2020).
2.13 Conclusion
Other potential future research directions include the need for large generalised datasets,
tailored machine learning models in weed-crop settings, addressing the class imbalance
problems, identifying the growth stage of the weeds, as well as thorough field trials for
commercial deployments.
Chapter 3
Weed classification
Most weed species can adversely impact agricultural productivity by competing for
nutrients required by high-value crops. Manual weeding is not practical for large cropping
areas. Many studies have been undertaken to develop automatic weed management sys-
tems for agricultural crops. In this process, one of the major tasks is to recognise the weeds
from images. However, weed recognition is a challenging task. It is because weed and crop
plants can be similar in colour, texture and shape which can be exacerbated further by the
imaging conditions, geographic or weather conditions when the images are recorded. Ad-
vanced machine learning techniques can be used to recognise weeds from imagery. In this
paper, we have investigated five state-of-the-art deep neural networks, namely VGG16,
ResNet-50, Inception-V3, Inception-ResNet-v2 and MobileNetV2, and evaluated their
performance for weed recognition. We have used several experimental settings and mul-
tiple dataset combinations. In particular, we constructed a large weed-crop dataset by
combining several smaller datasets, mitigating class imbalance by data augmentation, and
using this dataset in benchmarking the deep neural networks. We investigated the use
of transfer learning techniques by preserving the pre-trained weights for extracting the
features and fine-tuning them using the images of crop and weed datasets. We found that
VGG16 performed better than others on small-scale datasets, while ResNet-50 performed
better than other deep networks on the large combined dataset.
This chapter has been published: Hasan, A. M., Sohel, F., Diepeveen, D., Laga, H., & Jones, M. G.
(2022). Weed recognition using deep learning techniques on class-imbalanced imagery. Crop and Pasture
Science.
3.1 Introduction
Weeds in crops compete for water, nutrients, space and light, and may decrease prod-
uct quality (Iqbal et al., 2019). Their control, using a range of herbicides, constitutes a
significant part of current agricultural practices. In Australia, weed control costs in grain
production are estimated at $4.8 billion per annum, including both control measures and
the cost of lost production (McLeod, 2018).
The most widely used methods for controlling weeds are chemical-based, where her-
bicides are applied at an early growth stage of the crop (Harker & O’Donovan, 2013;
López-Granados, 2011). Although the weeds spread in small patches in crops, herbicides
are usually applied uniformly throughout the agricultural field. While such an approach
works reasonably well against weeds, it also affects the crops. A report from the Eu-
ropean Food Safety Authority (EFSA) shows that most unprocessed agricultural
produce contains harmful substances originating from herbicides (Medina-Pastor & Tri-
acchini, 2020).
Recommended rates of herbicide application are expensive and may also be detrimen-
tal to the environment. Thus, new methods that can be used to identify weeds in crops,
and then selectively apply herbicides on the weeds, or other methods to control weeds,
will reduce production costs to the farmers and benefit the environment. Technologies
that enable the rapid discrimination of weeds in crops are now becoming available (H.
Tian et al., 2020).
Recent advances in Deep Learning (DL) have revolutionised the field of Machine
Learning (ML). DL has made a significant impact in the area of computer vision by
learning features and tasks directly from audio, images or text data without human
intervention or predefined rules (Dargan et al., 2019). For image classification, DL meth-
ods outperform humans and traditional ML methods in accuracy and speed (Steinberg,
2017). In addition, the availability of computers with powerful GPUs, coupled with the
availability of large amounts of labelled data, enable the efficient training of DL models.
As with other computer vision and image analysis problems, digital agriculture and
digital farming also benefit from the recent advances in deep learning. Deep learning
techniques have been applied for weed and crop management, weed detection, localisa-
60
Chapter 3. Weed classification
tion and classification, field conditions and livestock monitoring (Kamilaris & Prenafeta-
Boldú, 2018).
Further development of autonomous weed control systems can be beneficial both eco-
nomically and environmentally. Labour costs can be reduced by using a machine to
identify and remove weeds. Selective spraying can also minimise the amount of herbi-
cides applied (Lameski et al., 2018). The success of an autonomous weed control system
will depend on four core modules: (i) weed detection and recognition, (ii) mapping, (iii)
guidance and (iv) weed control (Olsen et al., 2019). This paper focuses on the first
module: weed detection and recognition, which is a challenging task (Slaughter et al.,
2008). This is because both weeds and crop plants often exhibit similar colours, textures
and shapes. Furthermore, the visual properties of both weeds and crop plants can vary
depending on the growth stage, lighting conditions, environments and geographical lo-
cations (Hasan et al., 2021; Jensen et al., 2020b). Also, weeds and crops exhibit high inter-class similarity as well as high intra-class dissimilarity. The lack of large-scale crop
weed datasets is a fundamental problem for DL-based solutions.
There are many approaches to recognise weed and crop classes from images (Wäld-
chen & Mäder, 2018). High accuracy can be obtained for weed classification using Deep Learning (DL) techniques (Kamilaris & Prenafeta-Boldú, 2018); for example, Chavan and Nandedkar (2018) used Convolutional Neural Network (CNN) models to classify weeds and crop plants. Teimouri et al. (2018) used DL for the classification of weed species and the estimation of growth stages, with an average classification accuracy of 70% for species and 78% for growth stage estimation.
As a general rule, the accuracy of the methods used for the classification of weed
species decreases in multi-class classification when the number of classes is large (Dyr-
mann et al., 2016; Peteinatos et al., 2020). Class-imbalanced datasets also reduce the
performance of DL-based classification techniques because of overfitting (Ali-Gombe &
Elyan, 2019). This problem can be addressed using data-level and algorithm-level meth-
ods. Data-level methods include oversampling or undersampling of the data. In contrast,
algorithm-level methods work by modifying the existing learning algorithms to concen-
trate less on the majority group and more on the minority classes. The cost-sensitive
learning approach is one such approach (Khan et al., 2017; Krawczyk, 2016).
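As an illustration of the algorithm-level (cost-sensitive) option mentioned above, a minimal Keras-style sketch could weight each class inversely to its frequency; the class counts and the `model.fit` call below are hypothetical placeholders, not the implementation used in this thesis.

```python
import numpy as np

# Hypothetical per-class image counts for an imbalanced dataset.
class_counts = np.array([7376, 1200, 201, 54], dtype=np.float64)

# Cost-sensitive learning: give each class a weight inversely proportional
# to its frequency, so mistakes on minority classes cost more in the loss.
class_weight = {
    i: class_counts.sum() / (len(class_counts) * count)
    for i, count in enumerate(class_counts)
}

# With Keras, the weights would be passed to training roughly like this
# (assuming `model`, `train_ds` and `val_ds` already exist):
# model.fit(train_ds, validation_data=val_ds, epochs=50,
#           class_weight=class_weight)
print(class_weight)
```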
DL techniques have been used extensively for weed recognition, for example Hasan et
al. (2021) have provided a comprehensive review of these techniques. dos Santos Ferreira
et al. (2017) compared the performance of CNN with Support Vector Machines (SVM),
Adaboost – C4.5, and Random Forest models for discriminating soybean plants, soil,
grass, and broadleaf weeds. This study shows that CNN can be used to classify images
more accurately than other machine learning approaches. Nkemelu et al. (2018) report
that CNN models perform better than SVM and K-Nearest Neighbour (KNN) algorithms.
Transfer learning is an approach that uses the learned features on one problem or
data domain for another related problem. Transfer learning mimics classification used
by humans, where a person can identify a new thing using previous experience. In deep
learning, pre-trained convolutional layers can be used as a feature extractor for a new
dataset (Shao et al., 2014). However, most of the well-known CNN models are trained
on the ImageNet dataset, which contains 1000 classes of objects. Therefore, depending on the number of classes in the target dataset, only the classification layer (fully connected layer) of the models needs to be retrained in the transfer learning approach. Suh et
al. (2018) applied six CNN models (AlexNet, VGG-19, GoogLeNet, ResNet-50, ResNet-
101 and Inception-v3) pre-trained on the ImageNet dataset to classify sugar beet and
volunteer potatoes. They reported that these models can achieve a classification accuracy
of about 95% without retraining the pre-trained weights of the convolutional layers. They
also observed that the models’ performance improved significantly by fine-tuning the pre-
trained weights. In the fine-tuning approach, the convolutional layers of the DL models
are initialised with the pre-trained weights, and subsequently during the training phase
of the model, those weights are retrained for the desired dataset. Instead of training a
model from scratch, initialising it with pre-trained weights and fine-tuning them helps the
model to achieve better classification accuracy for a new target dataset, and this also saves
training time (Gando et al., 2016; Girshick et al., 2014; Hentschel et al., 2016). Olsen
et al. (2019) fine-tuned the pre-trained ResNet-50 and Inception-V3 models to classify
nine weed species in their study and achieved an average accuracy of 95.7% and 95.1%
respectively. In another study, VGG16, ResNet-50 and Inception-V3 pre-trained models
were fine-tuned to classify the weed species found in the corn and soybean production
system (A. Ahmad et al., 2021). The VGG16 model achieved the highest classification
accuracy of 98.90% in their research.
The main contributions of this paper are:

• construction of a large dataset by combining four small-scale datasets with a variety of weeds and crops.
• addressing the class imbalance issue of the combined dataset using the data aug-
mentation technique.
• evaluating the efficiency of the pre-trained models on the combined dataset using
the transfer learning and fine-tuning approach.
This paper is organised as follows: Section 3.2 describes the materials and methods,
including datasets, pre-processing approaches of images, data augmentation techniques,
DL architectures and performance metrics. Section 3.3 covers the experimental results
and analysis, and Section 3.4 concludes the paper.
3.2 Materials and Methods

3.2.1 Dataset
In this work, four publicly available datasets were used: the “DeepWeeds dataset”(Olsen
et al., 2019), the “Soybean weed dataset” (dos Santos Ferreira et al., 2017), the “Cotton-
tomato and weed dataset” (Espejo-Garcia et al., 2020) and the “Corn weed dataset” (H.
Jiang et al., 2020).
The DeepWeeds dataset contains images of eight nationally significant species of weeds
collected from eight rangeland environments across northern Australia. It also includes
another class of images that contain non-weed plants. These are represented as a negative
class. In this research, the negative image class was not used as it does not have any
weed species. The images were collected using a FLIR Blackfly 23S6C high-resolution
(1920 × 1200 pixel) camera paired with the Fujinon CF25HA-1 machine vision lens
(Olsen et al., 2019). The dataset is publicly available through the GitHub repository:
https://github.com/AlexOlsen/DeepWeeds.
dos Santos Ferreira et al. (2017) acquired soybean, broadleaf, grass and soil images
from Campo Grande in Brazil. We did not use the images from the soil class as they
did not contain crop plants or weeds. dos Santos Ferreira et al. (2017) used a “Sony
EXMOR” RGB camera mounted on an Unmanned Aerial Vehicle (UAV; DJI Phantom
3 Professional). The flights were undertaken in the morning (8 to 10 am) from December
2015 to March 2016 with 400 images captured manually at an average height of four
meters above the ground. The images of size 4000 × 3000 were then segmented using
the Simple Linear Iterative Clustering (SLIC) superpixels algorithm (Achanta et al.,
2012) with manual annotation of the segments to their respective classes. The dataset
contained 15336 segments of four classes. This dataset is publicly available at the website:
https://data.mendeley.com/datasets/3fmjm7ncc6/2.
The Cotton-tomato and weed dataset was acquired from three different farms, covering the south-
central, central and northern areas of Greece. The images were captured in the morning
(8 to 10 am) from May 2019 to June 2019 to ensure similar light intensities. The images
of size 2272 × 1704 were taken manually from about one-meter height using a Nikon
D700 camera (Espejo-Garcia et al., 2020). The dataset is available through the GitHub
repository: https://github.com/AUAgroup/early-crop-weed.
The Corn weed dataset was collected from a corn field in China. A total of 6,000 images were
captured using a Canon PowerShot SX600 HS camera placed vertically above the crop.
To avoid the influence of illumination variations from different backgrounds, the images
were taken under various lighting conditions. The original images were large (3264 ×
2448), and these were subsequently resized to a resolution of 800 × 600 (H. Jiang et
al., 2020). The dataset is available at the Github: https://github.com/zhangchuanyin/
weed-datasets/tree/master/corn%20weed%20datasets.
In this paper, we combine all these datasets to create a single large dataset with weed
and crop images sourced from different weather and geographical zones. This has created
extra variability and complexity in the dataset with a large number of classes. This is
also an opportunity to test the DL models and show their efficacy in complex settings.
We used this combined dataset to train the classification models. Table 3.1 provides a
summary of the dataset used. The combined dataset contains four types of crop plants
and sixteen species of weeds. The combined dataset is highly class-imbalanced since 27%
of images are from the soybean crop, while only 0.2% of images are from the cotton crop
(Table 3.1).
Another set of data was collected from the Eden Library website (https://edenlibrary.
ai/) for this research. The website contains some plant datasets for different research
work that use artificial intelligence. The images were collected under field conditions.
We used images of five different crop plants from the website namely: Chinese cabbage
(142 images), grapevine (33 images), pepper (355 images), red cabbage (52 images) and
zucchini (100 images). In addition, we also included 500 images of lettuce plants (H.
Jiang et al., 2020) and 201 images of radish plants (Lameski et al., 2017) in the combined
dataset. This dataset was then used to evaluate the performance of the transfer learning
approach. This experiment checks the reusability of the DL models in the case of a new
dataset.
Table 3.1: Summary of crop and weed datasets used in this research

Dataset         Location    Crop/Weed   Species              Number of images   % of images in the combined dataset
DeepWeeds       Australia   Weed        Chinee apple         1126               4.17
DeepWeeds       Australia   Weed        Lantana              1063               3.94
DeepWeeds       Australia   Weed        Parkinsonia          1031               3.82
DeepWeeds       Australia   Weed        Parthenium           1022               3.78
DeepWeeds       Australia   Weed        Prickly acacia       1062               3.93
DeepWeeds       Australia   Weed        Rubber vine          1009               3.74
DeepWeeds       Australia   Weed        Siam weed            1074               3.98
DeepWeeds       Australia   Weed        Snakeweed            1016               3.76
Soybean Weed    Brazil      Crop        Soybean              7376               27.31
Soybean Weed    Brazil      Weed        Broadleaf            1191               4.41
Soybean Weed    Brazil      Weed        Grass                3526               13.06
Cotton Tomato   Greece      Crop        Cotton               54                 0.20
Cotton Tomato   Greece      Crop        Tomato               201                0.74
Cotton Tomato   Greece      Weed        Black nightshade     123                0.46
Cotton Tomato   Greece      Weed        Velvet leaf          130                0.48
Corn Weed       China       Crop        Corn                 1200               4.44
Corn Weed       China       Weed        Blue grass           1200               4.44
Corn Weed       China       Weed        Chenopodium album    1200               4.44
Corn Weed       China       Weed        Cirsium setosum      1200               4.44
Corn Weed       China       Weed        Sedge                1200               4.44
In the study, the images of each class were randomly assigned for training (60%),
validation (20%) and testing (20%). Each image was labelled with one image-level anno-
tation. This means that each image has only one label, i.e., the name of the weed or crop
class, e.g., Chinee apple or corn. Figure 3.1 provides sample images from the dataset.
Figure 3.1: Sample crop and weed images of each class from the datasets.
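A per-class 60/20/20 split such as the one described above could be implemented roughly as follows; the directory layout and file extension are assumptions made only for illustration.

```python
import random
from pathlib import Path

def split_class_images(class_dir, seed=42):
    """Randomly assign one class's images to train/validation/test (60/20/20)."""
    images = sorted(Path(class_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train = int(0.6 * len(images))
    n_val = int(0.2 * len(images))
    return {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }

# Example with a hypothetical path:
# splits = split_class_images("combined_dataset/chinee_apple")
```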
Some level of image pre-processing is needed before the data can be used as input
for training the DL model. This includes resizing the images, removing the background,
enhancing and denoising the images, colour transformation, morphological transformation
etc. In this study, the Keras pre-processing utilities (Chollet et al., 2015) were used to
prepare the data for training. These utilities apply a set of predefined operations to the data. One of these operations is to add a batch dimension to the input: DL models process images in batches, and an image has three dimensions (height, width and the number of channels), so the pre-processing function adds an extra dimension that carries the batch information. Pre-processing also involves normalising the data so that the pixel values range from 0 to 1. Each model has a specific pre-processing routine to transform a
standard image into an appropriate input. Research suggests that the classification model
performance is improved by increasing the input resolution of the images (Sabottke &
Spieler, 2020; Sahlsten et al., 2019). However, the model’s computational complexity also
increases with a higher resolution of the input image. The default input resolution for all
the models used in this research is 224 × 224.
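As a rough sketch of the pre-processing steps described above (resizing, adding a batch dimension and scaling pixel values), using the Keras image utilities; the file name below is a placeholder.

```python
import numpy as np
import tensorflow as tf

# Load one image and resize it to the network's input resolution.
img = tf.keras.preprocessing.image.load_img("weed_sample.jpg",
                                             target_size=(224, 224))
x = tf.keras.preprocessing.image.img_to_array(img)   # shape (224, 224, 3)

# Add the batch dimension expected by DL models.
x = np.expand_dims(x, axis=0)                         # shape (1, 224, 224, 3)

# Normalise pixel values to the range [0, 1]; in practice each Keras
# application also provides its own model-specific preprocess_input function.
x = x / 255.0
```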
The combined dataset is highly class-imbalanced. The minority classes are over-
sampled using image augmentation to balance the dataset. The augmented data is only
used to train the models. Image augmentation is done using the Python image processing
library Scikit-image (Van der Walt et al., 2014). After splitting the dataset into training,
validation and testing sets, the largest training class was soybean with 4,425 images. By applying augmentation approaches, we obtained 4,425 images for all other weed and crop classes; thus we ensured that all classes were balanced. A set of geometric and photometric operations was applied randomly to the data to generate the augmented images.
The models were then trained on both the actual and the augmented data without distinguishing between them.
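The exact set of augmentation operations is not reproduced here; purely as an illustration, a scikit-image based sketch of typical operations (rotation, flipping, noise, contrast change) applied at random could look like this.

```python
import random
from skimage import exposure, io, transform, util

def augment(image):
    """Apply one randomly chosen geometric/photometric operation
    (illustrative only; not the exact set used in this work)."""
    op = random.choice(["rotate", "hflip", "vflip", "noise", "gamma"])
    if op == "rotate":
        return transform.rotate(image, angle=random.uniform(-30, 30), mode="reflect")
    if op == "hflip":
        return image[:, ::-1]
    if op == "vflip":
        return image[::-1, :]
    if op == "noise":
        return util.random_noise(image, mode="gaussian")
    return exposure.adjust_gamma(image, gamma=random.uniform(0.7, 1.3))

# Hypothetical usage:
# img = util.img_as_float(io.imread("sample.jpg"))
# aug = augment(img)
```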
Five state-of-the-art deep learning models with pre-trained weights were used in this
research to classify images. These models were made available via the Keras Application
Programming Interface (API) (Chollet et al., 2015). TensorFlow (Abadi et al., 2016) was
used as a machine learning framework. The selected CNN architectures were:
• VGG16 (Simonyan & Zisserman, 2014) uses a stack of convolutional layers with a very small receptive field (3 × 3). It was the winner of the ImageNet Challenge 2014 in the localisation track. The architecture consists of a stack of 13 convolutional layers, followed by three fully connected layers. The convolutional stride and padding are fixed to 1 pixel. Spatial pooling is carried out by max-pooling layers; however, only five of the convolutional layers are followed by a max-pooling layer. The original VGG16 model has 138,357,544 trainable parameters, of which about 124 million are contained in the fully connected layers. Those layers were customised in this research.
• ResNet-50 (K. He et al., 2016) is deeper than VGG16 but has a lower compu-
tational complexity. Generally, with increasing depths of the network, the perfor-
mance becomes saturated or degraded. The model uses residual blocks to maintain
accuracy with the deeper network. The residual blocks also contain convolutional layers, like VGG16. The model uses batch normalisation after each convolutional
layer and before the activation layer. The model explicitly reformulates the layers
as residual functions with reference to the input layers and skip connections. Al-
though the model contains more layers than VGG16, it only has 25,636,712 trainable
parameters.
• Inception-V3 (Szegedy et al., 2016) uses a deeper network with fewer training
parameters (23,851,784). The model consists of symmetric and asymmetric building
blocks with convolutions, average pooling, max pooling, concatenations, dropouts, and fully
connected layers.
All the models were initialised with pre-trained weights trained on the ImageNet
dataset. As the models were trained to recognise 1000 different objects, the original
architecture was slightly modified to classify twenty crops and weed species. The last
fully-connected layer of the original model was replaced by a global average pooling
layer followed by two dense layers with 1024 neurons and “ReLU” activation function.
The output contained another dense layer where the number of neurons depended on
the number of classes. The softmax activation function was used in the output layer
since the models were multi-class classifiers. The size of the input was 256×256×3, and
the batch size was 64. The maximum number of epochs for training the models was
100. However, often the training was completed before reaching the maximum number.
The initial learning rate was set to 1 × 10⁻⁴ and was gradually reduced to 1 × 10⁻⁶ by monitoring the validation loss at every epoch. Table 3.2 shows the number of parameters
of each of the models used in this research without the output layer. It was found that
the Inception-ResNet-V2 model has the most parameters, and the MobileNetV2 model
has the least.
Table 3.2: Number of parameters of each model used in this research (without the output layer)

Model                  Number of parameters
VGG16                  16,289,600
ResNet-50              26,735,488
Inception-V3           24,950,560
Inception-ResNet-V2    56,960,224
MobileNetV2            4,585,216
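A minimal Keras sketch of the model customisation described above (pre-trained backbone, global average pooling, two dense layers with 1024 neurons and a softmax output) might look as follows; the optimiser choice and the exact callback settings are assumptions, since they are not stated in the text.

```python
import tensorflow as tf
from tensorflow.keras import callbacks, layers, models, optimizers

NUM_CLASSES = 20  # twenty crop and weed species in the combined dataset

# Pre-trained feature extractor without the original classification layers.
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(256, 256, 3))

# Replace the top: global average pooling + two dense layers (1024, ReLU)
# + a softmax output layer sized to the number of classes.
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(1024, activation="relu")(x)
x = layers.Dense(1024, activation="relu")(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = models.Model(base.input, outputs)

# Optimiser is an assumption; the initial learning rate follows the text.
model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Reduce the learning rate towards 1e-6 when the validation loss plateaus,
# and stop early once it no longer improves.
cbs = [callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, min_lr=1e-6),
       callbacks.EarlyStopping(monitor="val_loss", patience=10)]
# model.fit(train_ds, validation_data=val_ds, epochs=100, batch_size=64,
#           callbacks=cbs)
```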
In the transfer learning approach, only the classifier part of the model was trained, keeping fixed the pre-trained weights of the layers in the feature extractor. This process eliminates the need to train the complete network on a large number of labelled images.
However, in the fine-tuning approach (Figure 3.2c), the weights in the feature extractor
were initialised from the pre-trained model, but not fixed. During the training phase of the
model, the weights were retrained together with the classifier part. This process increased
the efficiency of the classifier because it was not necessary to train the whole model
from scratch. The model can extract discriminating features for the target dataset more
accurately. Our experiments used both approaches and evaluated their performance on
the crop and weed image dataset. Finally, we trained one state-of-the-art DL architecture
from scratch, using our combined dataset (Section 3.2.1.5) and used its feature extractor
to classify the images in an unseen test dataset (Section 3.2.1.6) using the transfer learning
approach. The performance of the pre-trained state-of-the-art model was then compared
with the model trained on the crop and weed dataset.
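The difference between the two training regimes described above comes down to whether the pre-trained feature extractor is frozen. A hedged Keras sketch (not the thesis code) of the switch is shown below.

```python
import tensorflow as tf

# Shared pre-trained feature extractor (ImageNet weights, no top layers).
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(256, 256, 3))
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(20, activation="softmax"),  # 20 crop/weed classes
])

# Transfer learning: the feature-extractor weights stay fixed and only the
# new classifier head is trained.
base.trainable = False

# Fine-tuning: uncomment to let the pre-trained weights be updated together
# with the classifier during training.
# base.trainable = True

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```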
The models were tested and thoroughly evaluated using several metrics: accuracy,
precision, recall, and F1 score metrics, which are defined as follows:
• Accuracy (Acc): It is the percentage of images whose classes are predicted cor-
rectly among all the test images. A higher value represents a better result.
• Precision (P): The fraction of correct predictions (True Positives) out of all positive predictions made by the model (sum of True Positives and False Positives).
• Recall (R): The fraction of True Positives out of the sum of True Positives and False Negatives (i.e., the samples of a class that the model missed).
• F1 Score (F1): The harmonic mean of precision and recall. This metric is useful
to measure the performance of a model on a class-imbalanced dataset.
• Confusion matrix: A table used to visualise how well the classification model is performing and what prediction errors it is making.

Figure 3.2: The basic block diagram of DL models used for the experiments.
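In symbols, the metrics defined above can be written as follows (a standard formulation, with TP, TN, FP and FN denoting true/false positives/negatives):

```latex
\[
\mathrm{Acc} = \frac{\text{correctly classified test images}}{\text{total test images}}, \qquad
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F1 = \frac{2\,P\,R}{P + R}
\]
```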
We conducted five sets of experiments on the data. Table 3.3 shows the number of
images used for training, validation and testing of the models. Augmentation was applied
to generate 4,425 images for each of the classes. However, only actual images were used to
validate and test the models. All the experiments were done on a desktop computer, with
an Intel(R) Core(TM) i9-9900X processor, 128 gigabyte of RAM and a NVIDIA GeForce
RTX 2080 Ti Graphics Processing Unit (GPU). We used the Professional Edition of the
Windows 10 operating system. The deep learning models were developed using Python
3.8 and Tensorflow 2.4 framework.
Table 3.3: The numbers of images used to train (after augmentation), validate and test
the models.
In this experiment, we trained the five models separately on each dataset using only
actual images (see Table 3.3). Both transfer learning (TL) and fine-tuning (FT) ap-
proaches were used to train the models. Table 3.4 shows the training, validation and
Table 3.4: Training, validation and testing accuracy for classifying crop and weed
species of all four datasets using different DL models.
On the DeepWeeds dataset, the VGG16 model achieved the highest training, valida-
tion and testing accuracy (98.43%, 83.84% and 84.05% respectively) using the transfer
learning approach. The training accuracy of the other four models was above 81%. How-
ever, the validation and testing accuracy for those models were less than 50%. This
suggests that the models were overfitting. After fine-tuning the models, the overfitting
problem was mitigated except for the MobileNetV2 architecture. Although four of the
models achieved 100% training accuracy after fine-tuning, the validation and testing
accuracy was between 86% and 94%. MobileNetV2 model still overfitted even after fine-
tuning, with about 32% validation and testing accuracy. Overall, the VGG16 model gave the best results for the DeepWeeds dataset, as it has the fewest convolutional layers, which is adequate for a small dataset. It should be noted that Olsen et al. (2019),
who initially worked on this dataset, achieved an average classification accuracy of 95.1%
and 95.7% using Inception-V3 and ResNet-50, respectively. However, they applied data
augmentation techniques to overcome the variable nature of the dataset.
On the Corn Weed and Cotton Tomato Weed datasets, the VGG16 and ResNet-50 models generally gave accurate results. However, the validation and testing accuracies were low for the DL models using the transfer learning approach on both datasets, and the classification performance of the models improved substantially after fine-tuning.
Among the five models, the retrained Inception-ResNet-V2 model gave the best results for
the Corn Weed dataset with training, validation and testing accuracy of 100%, 99.75%
and 99.33% respectively. The ResNet-50 model accurately classified the images of the
Cotton Tomato Weed dataset.
According to the results of this experiment, as shown in Table 3.4, it can be concluded that the transfer learning approach does not work well for classifying the images of crop and weed species datasets. Since the pre-trained models were trained on the “ImageNet”
dataset (Deng et al., 2009), which does not contain images of crop or weed species, the
models cannot accurately classify weed images.
The previous experiment showed that good classification results are unlikely when the pre-trained weights of the convolutional layers of the DL models are kept fixed.
The image classification accuracy improved by fine-tuning the weights of the models for
the crop and weed dataset. For that reason, in this experiment, all the models were ini-
tialised with pre-trained weights and then retrained for the dataset. In this experiment,
the datasets were paired up and used to generate six combinations to train the models.
The training, validation and testing accuracies are shown in Table 3.5. The six combinations were DW-CW, DW-CTW, DW-SW, CW-CTW, CW-SW and CTW-SW, where DW, CW, CTW and SW denote the DeepWeeds, Corn Weed, Cotton Tomato Weed and Soybean Weed datasets, respectively.
After fine-tuning the weights, all the DL models reached 100% training accuracy. The DL architectures also gave better validation and testing results when
trained with the CW-CTW, CW-SW and CTW-SW combined datasets. However, the models overfitted when the “DeepWeeds” dataset was combined with any of the other three datasets.
Table 3.5: Training, validation and testing accuracy of the DL models after training by
combining two of the datasets
The results of the confusion matrix are provided in Figure 3.3. We found that chinee
apple, lantana, prickly acacia and snakeweed had a high confusion rate. This result
agrees with that of Olsen et al. (2019). Visually, the images of these species are quite similar and so are difficult to distinguish, which is why the DL models also failed to separate them. Since the dataset was small and did not have enough variation among the images, the models were not able to distinguish among the classes. The datasets also lacked enough images taken under different lighting conditions, so the models were unable to predict the actual class of some images because of illumination effects.
For the DW-CW dataset, the VGG16 model was the most accurate. Even so, the model often confused chinee apple with snakeweed. As shown in the confusion matrix in Figure 3.3a, out of 224 test images of chinee apple, 16 were classified as snakeweed, and 23 of the 204 test images of snakeweed were identified as chinee apple. A
significant number of chinee apple and snakeweed images were not correctly predicted
by the VGG16 model (see Figure 3.3b). For the DW-SW dataset, the ResNet-50 model
achieved 100% training, 97.68% validation and 97.42% testing accuracy. The confusion
matrix is shown in Figure 3.3c. The ResNet-50 model identified 13 chinee apple images
as snakeweed, and the same number of snakeweed images were classified as chinee apple.
The model also identified 9 test images of snakeweed as lantana. Figure 3.4 shows some
sample images which the models classified incorrectly.
By applying data augmentation techniques, one can create more variations among the
classes which may also help the model to learn more discriminating features.
In this experiment, all the datasets were combined to train the deep learning models.
Classifying the images of the combined dataset is much more complex, as the data is
highly class-imbalanced. The models were initialised with pre-trained weights and then
fine-tuned. Table 3.6 shows the training, validation and testing accuracy and average
precision, recall, and F1 scores achieved by the models on the test data.
After training the models with the combined dataset, the ResNet-50 model performed best. Though all the models except VGG16 achieved 100% training accuracy, the validation (97.83%) and testing (98.06%) accuracies of the ResNet-50 architecture were the highest.
Figure 3.3: Confusion matrices of the “DeepWeeds” dataset combined with each of the other datasets: (a) DW-CW dataset (VGG16 model); (b) DW-CTW dataset (VGG16 model).
The average precision, recall and F1 score also verified these results. However, the mod-
els still did not correctly classify the chinee apple and snakeweed species mentioned in
the previous experiment (Section 3.3.2). A confusion matrix for predicting the classes of
images using ResNet-50 is shown in Figure 3.5. The confusion matrix of ResNet-50 is shown because this model achieved the highest accuracy in this experiment. Seventeen chinee apple images were classified as snakeweed, and fifteen snakeweed images were classified incorrectly as chinee apple.
Figure 3.4: Sample images that the models classified incorrectly: (a) chinee apple predicted as snakeweed; (b) snakeweed predicted as chinee apple; (c) lantana predicted as prickly acacia; (d) prickly acacia predicted as lantana.
Table 3.6: The performance of five deep learning models after training with the
combined dataset
In addition, the model also incorrectly classified
some lantana and prickly acacia weed images. To overcome this classification problem,
both actual and augmented data were used in the following experiment.
Augmented data were used together with the real data in the training phase to address
the misclassification problem observed in the previous experiment (Section 3.3.3). All the weed species and crop plant classes had the same number of training images for this experiment. The models
were initialised with pre-trained weights, and all the parameters were fine-tuned. Table
3.7 shows the result of this experiment.
From Table 3.7, we can see that the training accuracy for all the DL models was 100%, and the validation and testing accuracies were also reasonably high. In this experiment, the ResNet-50 model achieved the highest precision, recall and F1 score for the test
data. Figure 3.6 shows the confusion matrix for the ResNet-50 model.
Figure 3.5: Confusion matrix after combining the four datasets, using the ResNet-50 model.
Table 3.7: Performance of five deep learning models after training with the real and
augmented data
We compared the performance of the model with that of the previous experiment using the confusion matrices. The performance of the model improved when both actual and augmented data were used. The
classification accuracy for the chinee apple, lantana, prickly acacia and snakeweed species increased by 2%.
Figure 3.6: Confusion matrix for ResNet-50 model using combined dataset with
augmentation
In this research, the ResNet-50 model attained the highest accuracy using actual and
augmented images. The Inception-ResNet-V2 model gave similar results. The explana-
tion is that both of the models used residual layers. Residual connections help train a
deeper neural network with better performance and reduced computational complexity.
A deeper convolutional network works better when trained using a large dataset (Szegedy
et al., 2017). Since we used both the augmented and the actual images, the dataset size increased several-fold.
In this experiment, we used two ResNet-50 models. The first was trained on our com-
bined dataset with actual and augmented data (Sec. 3.2.1.5). Here, the top layers were
removed from the model and a global average pooling layer and three dense layers were
added as before. Other than the top layers, all the layers used pre-trained weights, which
were not fine-tuned. This model is termed the “CW ResNet-50”. The same arrangement was used for the pre-trained ResNet-50 model, which was instead trained on the ImageNet dataset; it is termed the “SOTA ResNet-50” model. We trained the top
layers of both models using the training split of the Unseen Test Dataset (3.2.1.6). Both
models were tested using the test split of the Unseen Test Dataset. The confusion matrix
for CW ResNet-50 and SOTA ResNet-50 model is shown in Figure 3.7.
Figure 3.7: Confusion matrices showing the classification accuracy of (a) the CW ResNet-50 model and (b) the SOTA ResNet-50 model.
We can see in Figure 3.7 that the performance of the two models is very similar. The
“SOTA ResNet-50” model detected all the classes of crops and weeds accurately, while the pre-trained “CW ResNet-50” model identified only two images incorrectly. As the
“SOTA ResNet-50” model was trained on a large dataset containing millions of images,
it detected the discriminating features more accurately. On the other hand, the “CW
ResNet-50” model was only trained on 88,500 images. If this model were trained with
more data, it is probable that it would be more accurate using the transfer learning
approach. This type of pre-trained model could be used for classifying the images of new
crop and weed datasets, which would eventually make the training process faster.
3.4 Conclusion
This study was undertaken on four image datasets of crop and weed species collected
from four different geographical locations. The datasets contained a total of 20 different
species of crops and weeds. We used five state-of-the-art CNN models, namely VGG16,
ResNet-50, Inception-V3, Inception-ResNet-V2 and MobileNetV2, to classify the images of
these crops and weeds.
Another finding was that, using the transfer learning method, in most cases the models did not achieve the desired accuracy. As ResNet-50 was the most accurate
system, we ran a test using this pre-trained model. The model was used to classify the
images of a new dataset using the transfer learning approach. Although the model was
not more accurate than the state-of-the-art pre-trained ResNet-50 model, it was very
close to that. We could expect a higher accuracy using the transfer learning approach if
the model can be trained using a large crop and weed dataset.
This research shows that the data augmentation technique can help address the class
imbalance problem and add more variations to the dataset. The variations in the images of
the training dataset improve the training accuracy of the deep learning models. Moreover,
the transfer learning approach can mitigate the requirement of large data sets to train the
deep learning models from scratch. The pre-trained models are trained on a large dataset
to capture the detailed generalised features from the imagery, e.g., ImageNet in our case.
However, because the ImageNet dataset is not categorically labelled for weeds or crops, fine-tuning the pre-trained weights with crop and weed datasets helps capture dataset- or task-specific features. Consequently, fine-tuning improves classification accuracy.
For training a deep learning model for classifying images, it is essential to have a large
dataset like ImageNet (Deng et al., 2009) and MS-COCO (Lin et al., 2014). Classification
of crop and weed species cannot be generalised unless a benchmark dataset is available.
Most studies in this area are site-specific. A large dataset is needed to generalise the
classification of crop and weed plants, and as an initial approach, large datasets can be
generated by combining multiple small datasets, as demonstrated here. In this work, the
images only had image-level labels. A benchmark dataset can be created by combining
many datasets annotated with a variety of image labelling techniques. Generative Ad-
versarial Networks (GANs) (Goodfellow et al., 2014) based image sample generation can
also be used to mitigate class-imbalance issues. Moreover, a crop and weed dataset annotated at the object level needs to be developed. For implementing a real-time selective
herbicide sprayer, the classification of weed species is not enough. It is also necessary to
locate the weeds in crops. Deep learning-based object detection models can be used for
detecting weeds.
Chapter 4

Real-time weed detection and classification
Weeds can decrease yields and the quality of crops. Detection, localisation, and classi-
fication of weeds in crops are crucial for developing efficient weed control and management
systems. Deep learning (DL) based object detection techniques have been applied in var-
ious applications. However, such techniques generally need appropriate datasets. Most
available weed datasets only offer image-level annotation, i.e., each image is labelled with
one weed species. However, in practice, one image can have multiple weed (and crop)
species and/or multiple instances of one species. Consequently, the lack of instance-level
annotations of the weed datasets puts a constraint on the applicability of powerful DL
techniques. In the current research, we construct an instance-level labelled weed dataset.
The images are sourced from a publicly available weed dataset, namely the Corn weed
dataset. It has 5,997 images of Corn plants and four types of weeds. We annotated the
dataset using a bounding box around each instance and labelled them with the appropri-
ate species of the crop or weed. Overall, the images contain about three bounding box
annotations on average, while some images have over fifty bounding boxes. To establish
the benchmark dataset, we evaluated the dataset using several DL models, including
YOLOv7, YOLOv8 and Faster-RCNN, to locate and classify weeds in crops.
This chapter has been published: Hasan, A. M., Diepeveen, D., Laga, H., Jones, M. G., & Sohel,
F. (2024). Object-level benchmark for deep learning-based detection and classification of weed species.
Crop Protection, 177, 106561.
The performance of the models was compared based on inference time and detection accuracy.
The YOLOv7 and YOLOv7-tiny models achieved the highest mean average precision (mAP) of 88.50% and 88.29%, and took 2.7 and 1.43 milliseconds, respectively, to classify crop and weed species in an image. YOLOv8m, a variant of YOLOv8, detected
the plants in 2.2 milliseconds with the mAP of 87.75%. Data augmentation to address
the class imbalance in the dataset improves the mAP results to 89.93% for YOLOv7 and
89.39% for YOLOv8. The detection accuracy and inference time performed by YOLOv7
and YOLOv8 models in this research indicate that these techniques can be used to develop
an automatic field-level weed detection system.
4.1 Introduction
Global agriculture today faces many challenges, such as a reduction in cultivable land,
lack of water, and abiotic and biotic issues such as frost, heat, pests, diseases and weeds
(Amrani et al., 2023b; Haque & Sohel, 2022; J. Liu et al., 2021; Raj et al., 2021; Shammi
et al., 2022). Weeds are one of the major constraints that can significantly reduce crop
performance by competing for resources such as water, sunlight, nutrition and growing
space (DOĞAN et al., 2004; Gao et al., 2018). Each year, farmers around the world
invest large amounts of time, money, and resources to prevent yield losses from weed
infestations. For instance, Australia alone spends about AUD 4.8 billion annually to
control weeds (Chauhan, 2020).
There are several control approaches, which include preventive, cultural, mechanical,
biological, and chemical methods. However, farmers rely mostly on chemical methods by
applying herbicides at the early growth stage of the crop (Harker & O’Donovan, 2013;
López-Granados, 2011). Oftentimes, weeds infest fields in small patches rather than the
whole field (Rew & Cousens, 2001), whereas herbicides are usually broadcast across
the entire field rather than only where the weeds are. This increases the costs and is
also potentially hazardous for humans and the environment. According to the European
Food Safety Authority (EFSA), most raw agricultural commodities, i.e., fruit, grain and livestock feed, contain herbicide residues, which may have long-term ramifications for
human health, soil health, and wildlife well-being (Medina-Pastor & Triacchini, 2020).
Artificial Intelligence (AI) techniques have been used in several commercial solutions
to manage and control weeds by minimising herbicide use. For example, the “Robocrop
Spot Sprayer” (“Robocrop Spot Sprayer: Weed Removal”, 2018) is a video analysis-based
autonomous selective spraying system that can selectively spray potatoes grown in car-
rots, parsnips, onions or leeks. The “WEED-IT” (“Precision Spraying - Weed Sprayer”,
n.d.) and the “WeedSeeker sprayer” (“WeedSeeker 2 Spot Spray System”, n.d.) sprayers
can target all living green materials on soil and apply herbicide on them. The prob-
lem with these systems is that they are not designed to detect and recognise individual
species of in-crop weeds (Hasan et al., 2023a, 2023b; Z. Wu et al., 2021), i.e., one image
can include multiple weeds of different species.
Several AI based solutions for selective herbicide spraying systems have been proposed
to reduce the use of chemicals in the field (Alam et al., 2020; N. Hussain et al., 2020; Raja
et al., 2020; Ruigrok et al., 2020). However, detecting and classifying weeds in crops using computer vision-based machine learning (ML) techniques is difficult. Traditional
ML approaches use predefined features such as plant patterns, colour, shape and texture
to distinguish crops from weeds (Bakhshipour et al., 2017; Hamuda et al., 2017; Jafari
et al., 2006; Kazmi et al., 2015b; P. Li et al., 2013; Zheng et al., 2017). However, Deep Learning (DL) techniques, a type of ML, can learn discriminating features automatically and are being used in many real-time object detection and classification problems. The adoption
of deep learning for weed detection is becoming more popular (D. Chen et al., 2022a;
Kamilaris & Prenafeta-Boldú, 2018; Reedha et al., 2021; Z. Wu et al., 2021); for instance
Hasan et al. (2021) reported that the use of DL methods for detection, localisation, and
classification of weeds in crops has significantly increased since 2016 due to their detection
accuracy and speed.
There have been a number of studies that have compared the performance of DL
techniques with the traditional ML approaches to classify images of crop and weed plants;
the results showed that DL methods outperformed traditional methods. Researchers have
proposed several DL methods for classifying images of crop plants and weed species (dos
Santos Ferreira et al., 2017; Farooq et al., 2018b; H. Huang et al., 2018c; Nkemelu et al.,
2018). Espejo-Garcia et al. (2020) compared the performance of several state-of-the-art
deep learning architectures such as Xception, Inception-ResNet, VGGNet, MobileNet
and DenseNet to classify weeds in cotton and tomato. Yu et al. (2019b), Yu et al.
(2019a) and Lammie et al. (2019) performed comparative studies of different DL models
on their respective datasets. Pre-trained models, such as Inception-v3 and ResNet50, were
applied by Olsen et al. (2019) to categorise the images of eight species of weed found in
Australian rangeland. A. Ahmad et al. (2021) evaluated the performance of VGG16,
ResNet50 and Inception-v3 architectures for classifying weeds in Corn (Zea mays) and
soybean production systems using Keras and PyTorch framework. H. Jiang et al. (2020)
used the Graph Convolutional Network (GCN) to classify the images in the Corn weed
dataset. They have introduced this dataset along with the lettuce weed dataset and tested
the model. The datasets contained image-level annotation, which means each image has
only one label. Hasan et al. (2023b) also used the Corn weed dataset for training and
classifying the images using several state-of-the-art deep learning models.
However, image classification approaches do not localise the weeds and crops in an
image, which is required to develop a real-time selective spraying system. Besides, if
an image contains multiple instances of weeds and crops, then the classification will not
be appropriate. Object detection techniques are required to overcome these limitations.
The aforementioned deep learning models are based on Convolutional Neural Networks
(CNN). A CNN is a deep learning model, which can automatically learn and extract
features from pixel data of images for classifying or recognising them.
Based on the number of learning steps, object detection techniques can broadly be
classified into two categories: single-stage and two-stage detection methods (J. Huang
et al., 2017). Typically, an object detection process consists of two tasks: identifying the
regions of interest (ROI) and then localising and classifying them. The two-stage detec-
tors divide the process into region proposal and classification stages. At first, the models
extract several ROIs called object proposals, and then classification and localisation are
performed only on these proposed regions. It is like first looking at the images, extracting
the interesting regions, and then analysing only the interesting regions. R-CNN (Girshick
et al., 2014), Fast R-CNN (Girshick, 2015), Faster R-CNN (Ren et al., 2015), Mask R-
CNN (K. He et al., 2017) and Cascade R-CNN (Cai & Vasconcelos, 2018) are examples
of widely used two-stage object detection models. A single-stage detector predicts boxes
and simultaneously classifies objects. Single Shot Detector (SSD) (W. Liu et al., 2016),
Detectron2 (Y. Wu et al., 2019), MMDetection (K. Chen et al., 2019) and You Only Look
Once (YOLO) (Redmon & Farhadi, 2017) are examples of commonly used single-stage
object detection models. Single-stage detectors are faster in inference and computation-
ally efficient compared to two-stage object detection techniques. However, single-stage
methods cannot achieve high accuracy for images with extreme foreground-background
imbalance (Carranza-García et al., 2021; J. Huang et al., 2017).
Sivakumar et al. (2020) compared the performance of Faster R-CNN, SSD and patch-
based Convolutional Neural Network (CNN) models for detecting weeds in soybean fields.
Although Faster R-CNN and SSD models showed similar performance based on the met-
rics and inference time, the optimal confidence threshold for the SSD model was lower
than Faster R-CNN. Faster R-CNN with Inception-ResNet-v2 as a feature extractor was
also proposed by Y. Jiang et al. (2019) for detecting weeds in crops. Patidar et al. (2020)
applied Mask R-CNN architecture and Fully Convolutional Network (FCN) on a public
dataset known as the “Plant Seedling Dataset” (Giselsson et al., 2017). M. H. Saleem
et al. (2022) used Faster-RCNN with ResNet-50 and ResNet-101 model to detect and
classify weeds. Le et al. (2021) and Quan et al. (2019) also proposed a Faster-RCNN
model for detecting weeds in crops.
Osorio et al. (2020) argued that the YOLOv3 model detected weeds in lettuce crops
more accurately than Mask R-CNN using multispectral images. Gao et al. (2020) trained
both YOLOv3 and tiny YOLOv3 (Redmon and Farhadi 2018) models to detect C. sepium
and sugar beet. Although the complete YOLOv3 architecture performed better than tiny YOLOv3, the tiny version required less inference time. Since tiny YOLOv3 has fewer convolutional layers, it uses fewer resources and thus reduces the inference
time. Sharpe et al. (2020) also proposed the tiny YOLOv3 model to localise and classify
goosegrass, strawberry and tomato plants. Espinoza et al. (2020) showed that YOLOv3
achieved higher detection accuracy in less inference time than Faster R-CNN and SSD
models. Y. Li et al. (2022) also obtained better detection accuracy using YOLOX (Ge
et al., 2021) model on a crop-weed dataset (Sudars et al., 2020) containing eight weed
species and six different food crops.
Partel et al. (2019a) reported that the performance of YOLOv3 and tiny YOLOv3
models depended on the computer’s hardware configuration. Their research aim was to
develop a cost-effective and smart weed management system. Although the YOLOv3
model achieved higher classification and localisation accuracy, they preferred to use tiny
YOLOv3 architecture to develop the autonomous herbicide sprayer. The tiny YOLOv3
model was compatible with less expensive hardware, performed better in real-time ap-
plications and had good accuracy. The reasons, as mentioned earlier, also inspired N.
Hussain et al. (2020) and W. Zhang et al. (2018) to use this model. In contrast, Czymmek
et al. (2019) argued that the full version of YOLOv3 performed better for their small dataset.
They doubled the default height and width of the input image (832 × 832 pixels) for train-
ing the model. Although tiny YOLOv3 worked faster, they did not want to compromise
the accuracy.
Dang et al. (2023) compared the performance of seven YOLO model versions. The
research showed the competence of YOLO models for real-time weed detection and clas-
sification tasks. The YOLOv4 model achieved the highest mAP of 95.22%. On the other
hand, Sportelli et al. (2023) reported that the performance of YOLOv7 and YOLOv8 were
very similar in detecting turfgrasses. Although YOLOv8 showed some improvement, the
difference was not significant. Abuhani et al. (2023) also agreed that the performance
of YOLOv7 and YOLOv8 was similar while detecting the weeds in sunflower and sugar
beet plants.
The literature clearly shows that Deep Convolutional Neural Network (DCNN) is
suitable for developing a real-time weed detection, localisation and classification system.
However, there is a lack of benchmark datasets of crops and weed species annotated at
the individual object level (P. Wang et al., 2022). In addition, a comparative study of ex-
isting object detection techniques can help develop a real-time weed management system.
Therefore, the objectives of this paper are (1) to relabel an existing Corn weed dataset
with object level annotations and repurpose it for localising and classifying different
species from imagery and (2) to evaluate the performance of single-stage and two-stage
object detection models for localising and classifying weeds in crop in real-time.
4.2 Materials and Methods

The pipeline for detecting and classifying weeds in crops is shown in Figure 4.1. We annotated the data, prepared it for training, and then trained and evaluated the models. The steps are described as follows.
The dataset used in this paper is made available publicly by H. Jiang et al. (2020). It
contains Corn (Zea mays) plants and four weed species images: Bluegrass (Poa praten-
sis), Goosefoot (Chenopodium album), Thistle (Cirsium setosum) and Sedge (Cyperus
compressus). The original dataset stored in the Github repository (https://github.com/
zhangchuanyin/weed-datasets) contains 5,997 images of five classes. Each class has 1,200
images except Sedge weeds, which has 1,197 images. H. Jiang et al. (2020) collected
the dataset from an actual Corn field. They used a Canon PowerShot SX600 HS
(https://www.canon.com.au/) camera to acquire images. The camera was placed ver-
tically towards the ground to reduce the influence of sunlight. As displayed in Figure 4.2,
the images have different soil backgrounds (e.g., moisture and wheat straw residue) and
light illumination. Changes in lighting conditions and backgrounds add complexity to
the dataset and affect the performance of deep-learning models (J. Liu & Wang, 2021).
All the images have a dimension of 800 × 600 pixels. Figure 4.2 shows example images of each class as annotated by H. Jiang et al. (2020).
Figure 4.1: The pipeline used in this study: data description and annotation, splitting the annotated dataset into train (80%), validation (10%) and test (10%) sets, model training and performance evaluation.
H. Jiang et al. (2020) labelled the dataset using image-level annotation techniques,
which means the entire image is identified using a single label. This approach is suitable
for identifying a single object in the image; everything else is considered background.
However, in this dataset, most images contain more than one plant, and some also have
multiple crop and weed species. For instance, in Figure 4.3, we have two images. Although
there were three Bluegrass, two Corn and one Goosefoot plant in the first image (Figure
4.3(a)), it was labelled as Bluegrass. Similarly, the second image (Figure 4.3(b)) was
Figure 4.2: Sample crop and weed images of each class from the dataset: (a) Corn at different growth stages; (b) Corn with other plants; (c) multiple Bluegrass plants; (d) several Goosefoot plants with other weed species; (e) Thistle with other plants, including some Unknown plants; (f) Sedge weed with other weed species and Corn plants.
annotated as Sedge. However, the image has other plants which are not similar to the
Sedge plant. Besides, there are three plants which have no similarity with the plant of
any of the five classes of this dataset. We have labelled them as “Unknown” plants.
H. Jiang et al. (2020) labelled the images like Figure 4.2a and Figure 4.2b as “Corn”.
However, Figure 4.2a has multiple instances of Corn plants at different growth stages and
Figure 4.2b contains Bluegrass and Corn plants. Although Figure 4.2c was annotated
as “Bluegrass”, the plants in it were not identified separately. Figure 4.2d also contains
multiple instances of “Goosefoot” and other weed species. There are several Unknown
plants, along with the five crop and weed species, which are not labelled in the dataset.
According to H. Jiang et al. (2020), Figure 4.2e is an image of “Thistle”, although it
contains other plants as well. Since Figure 4.2f is annotated as “Sedge”, one would expect
that it contains only that plant. However, it also has other weeds and crop plants. To
illustrate the data labelling complexity, we have provided few more example images in
Figure 4.4.
Figure 4.3: The image on the left is the original image with only one label, and the image on the right shows our annotation. Images (a) and (b) were originally labelled as Bluegrass and Sedge, respectively.
Object-level annotation requires drawing a bounding box around each object in the image. Since our goal is to locate and classify weed and crop plants for
developing selective sprayers, we have re-annotated the “Corn weed dataset” (H. Jiang
et al., 2020) again using bounding boxes; see Figure 4.4 for some annotation examples.
The bounding boxes cover the whole plant for all crop and weed species except Sedge.
The leaves of Sedge weeds are narrow and long, covering a wider area of an image and
are more likely to overlap with other plants. We labelled them by keeping the stem in the middle and covering as much area as possible without overlapping with other annotated plants in the image. Even so, some bounding boxes in the dataset overlap.
According to Yoo et al. (2015), occlusion of objects is likely to occur in situations where
weeds may grow over each other. The overlapping of bounding boxes may affect the
accuracy of the model. However, weed detection aims to apply treatments to control
weeds (e.g., herbicides). Although some bounding boxes overlap, we have annotated all
the available objects in the image to detect the weeds in the crop.
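The on-disk annotation format is not specified in the text; purely for illustration, one common convention for bounding-box labels is the YOLO text format (one label file per image, normalised coordinates), which could be read as sketched below. The file name and values are hypothetical.

```python
from pathlib import Path

def read_yolo_labels(label_path):
    """Parse a YOLO-format label file: one object per line,
    '<class_id> <x_center> <y_center> <width> <height>' with
    coordinates normalised to [0, 1]. Format assumed for illustration."""
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        cls, xc, yc, w, h = line.split()
        boxes.append((int(cls), float(xc), float(yc), float(w), float(h)))
    return boxes

# Hypothetical usage:
# boxes = read_yolo_labels("labels/corn_0001.txt")
```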
In this research, we used 80% of the data for training and validation and 20% for testing. Table 4.1 shows the number of instances in each class used for training and testing the models. We found 16,620 plant objects in the 5,997 images.
Figure 4.4: Bounding box annotation of crop and weed images. It shows that the
dataset images contain multiple crop or weed plants of more than one class.
From Table 4.1, each image contains around three labels on average. Some of the Corn images have more than
50 plants.
Table 4.1: Number of images and annotations for each class of crop and weed species.

Class      Number of images   Number of annotated objects
Unknown    0                  1121
From Table 4.1, it is clear that the dataset is imbalanced. Moreover, the objects
belonging to the “Unknown” class have differences in shape, colour and texture (intra-
class dissimilarity). This can affect the performance of deep learning models (Y. Li et
al., 2020; Lin et al., 2017). According to Qian et al. (2020) and Zoph et al. (2020),
data augmentation can improve the performance of object detection models with an
imbalanced dataset. This research has augmented the data to address the class imbalance
issue.
We have considered two scenarios of data augmentation in this study, i.e., augmenting
the entire training dataset four times (it will be termed as “All augmentation”) and aug-
menting the images of the training dataset containing any of the four classes of objects:
Goosefoot, Thistle, Sedge and Unknown (it will be termed as “Selective Augmentation”).
The Albumentation package (Buslaev et al., 2020) was used to perform image augmen-
tation. Ten well-known image geometric and photometric transformations, namely, ran-
dom rotation, horizontal flip, vertical flip, blur (Hendrycks & Dietterich, 2019), random
brightness and contrast, Gaussian noise, multiplicative noise, RGB shift, compression
and Fancy Principal Component Analysis (PCA) (Krizhevsky et al., 2017) were used
here. Four (randomly selected) of the ten transformation techniques were applied for
each training image to get the augmented images. The outcome of one image after using
the image augmentation approach is shown in Figure 4.5.
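A sketch of this augmentation setup using the Albumentations package is shown below; the parameter values and the bounding-box format are illustrative assumptions, not the exact settings used in this work.

```python
import albumentations as A

# Pool of geometric and photometric transforms similar to those listed above;
# SomeOf picks four of them at random for each image, and BboxParams makes
# sure the bounding boxes are transformed together with the image.
transform = A.Compose(
    [
        A.SomeOf(
            [
                A.Rotate(limit=45),
                A.HorizontalFlip(),
                A.VerticalFlip(),
                A.Blur(blur_limit=5),
                A.RandomBrightnessContrast(),
                A.GaussNoise(),
                A.MultiplicativeNoise(),
                A.RGBShift(),
                A.ImageCompression(),
                A.FancyPCA(),
            ],
            n=4,
        )
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
)

# Usage (image: H x W x 3 numpy array; bboxes: [x_min, y_min, x_max, y_max]):
# out = transform(image=image, bboxes=bboxes, class_labels=labels)
# aug_image, aug_boxes = out["image"], out["bboxes"]
```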
In this study, we used 80% of the data for training, 10% for validation and 10% for testing. Table 4.2 shows the number of objects used for training, validation and testing.
Figure 4.5: Illustration of the image augmentation techniques applied to one of the training images: (a) original image; (b) random rotation; (c) horizontal flip; (d) vertical flip; (e) blur; (f) brightness and contrast change; (g) Gaussian noise; (h) multiplicative noise.
Although the number of objects in the minority classes increased after augmenting the entire training set, the dataset remained imbalanced in the first scenario. In the second scenario, the dataset became quite balanced.
Table 4.2: The number of objects used to train (before and after augmentation),
validate and test the models.
In this research, we used two well-known object detection models: the You Only Look
Once (YOLO) model (Bochkovskiy et al., 2020) and Faster-RCNN (Ren et al., 2015).
We have chosen one single-stage and one two-stage object detector for this research. The
models were trained using the Corn and weed dataset to detect, localise and classify the
plants in an image.
The YOLOv7 model surpasses previous object detection techniques, including earlier versions of the YOLO algorithm, in terms of speed and accuracy (C.-Y. Wang et al., 2023a). We evaluated the performance of the YOLOv7 and YOLOv7-tiny models in this study. YOLOv7-tiny (6.2 million parameters) is a compressed version of the original YOLOv7 (36.9 million parameters). YOLOv7-tiny is suitable for real-time applications and can be deployed on devices with low computational power, whereas YOLOv7 offers higher accuracy. The models were pre-trained on MS COCO (Microsoft Common Objects in Context) (Lin et al., 2014), and the pre-trained weights are publicly available (https://www.kaggle.com/datasets/parapapapam/yolov7-weights). We trained the models (initialised with the pre-trained weights) on our dataset and compared the performance of YOLOv7 and YOLOv7-tiny for detecting, localising and classifying weeds in the Corn crop.
YOLOv8 is the latest iteration of the YOLO family and exhibits higher performance in terms of accuracy and speed. YOLOv8 introduced several improvements, such as mosaic augmentation, C3 convolutions and anchor-free detection, to improve accuracy and inference speed. In this study, we evaluated all five variants of the model: YOLOv8n (3.2 million parameters), YOLOv8s (11.2 million parameters), YOLOv8m (25.9 million parameters), YOLOv8l (43.7 million parameters) and YOLOv8x (68.2 million parameters). YOLOv8n is the smallest and fastest of the five. Although YOLOv8x provides the most accurate results, it takes more time to train and to detect objects in an image.
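As an illustration of how the YOLOv8 variants can be fine-tuned from COCO pre-trained weights, the sketch below uses the Ultralytics Python API; the dataset descriptor name (corn_weed.yaml), the test image name and the confidence/IoU thresholds are assumptions, while the image size, batch size and epoch count match the settings reported later in this chapter.

from ultralytics import YOLO

# Load COCO pre-trained weights for one of the five YOLOv8 variants
# (use yolov8s/m/l/x.pt for the larger models).
model = YOLO("yolov8n.pt")

# "corn_weed.yaml" is an assumed dataset descriptor listing the train/val/test
# image folders and the class names used in this chapter.
model.train(data="corn_weed.yaml", imgsz=416, batch=32, epochs=100)

# Run inference on a test image and keep detections above assumed thresholds.
results = model.predict("test_image.jpg", conf=0.25, iou=0.5)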
4.2.2.2 Faster-RCNN
The widely used evaluation metrics for object detection models are: Precision, Recall,
Intersection over Union, Average Precision and Mean Average Precision (Padilla et al.,
2020). These metrics were used in this paper to evaluate the performance of the models.
Before discussing these metrics, we first present some basic concepts:
• True Positive (TP): A correct detection of an object that matches the ground truth. A detection is a true positive if the confidence score and the IoU between the predicted bounding box and the ground truth are both higher than their thresholds, and the predicted class matches the ground-truth class.
• False Positive (FP): A prediction is a false positive if the predicted class does not match the ground-truth class, or if the IoU of the predicted bounding box with the ground truth is less than the predefined threshold.
• False Negative (FN): A detection is counted as a false negative if the model fails to detect a ground-truth bounding box.
• True Negative (TN): A true negative would correspond to the model correctly not detecting anything where no object exists. Since the model predicts many candidate bounding boxes with different confidence scores, true negatives are not applicable in the object detection context.
The evaluation metrics that are used in this paper are explained below:
• Precision: The ratio of correctly detected objects (TP) to the total number of objects predicted by the model (sum of TP and FP). Precision measures the proportion of correct positive predictions, i.e., how accurate the model is at predicting objects.
\[ P = \frac{TP}{TP + FP}. \]
• Recall: The ratio of the total number of correctly detected objects (TP) to all ground-truth objects (sum of TP and FN). Recall measures the ability of the model to detect the objects present in the ground-truth dataset, i.e., how good the model is at finding the correct objects.
\[ R = \frac{TP}{TP + FN}. \]
• Intersection over Union: This metric evaluates the model’s ability to localise an object and determines whether a detection is correct. In the object detection context, the IoU is the ratio of the overlapping area to the area of union between the ground-truth bounding box and the box predicted by the model. If the IoU of a bounding box is equal to or greater than a specified threshold value, the detection is considered correct (i.e., a true positive); otherwise, it is considered incorrect (i.e., a false positive). A small computational sketch is given after this list.
\[ IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}}. \]
• Mean Average Precision: The mean average precision (mAP) is the mean of the average precision (AP) values over all object categories. This metric is used to evaluate the accuracy of an object detection model across all classes in a dataset (W. Liu et al., 2016).
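To make these definitions concrete, the following minimal sketch computes the IoU of two axis-aligned boxes and precision/recall from TP/FP/FN counts; the corner-format box representation and the example values are illustrative assumptions.

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision and recall from the counts defined above."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A detection counts as a true positive when, e.g., iou(pred, gt) >= 0.5
# and the predicted class matches the ground-truth class.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # ~0.14, below a 0.5 threshold
print(precision_recall(tp=8, fp=2, fn=1))    # (0.8, 0.888...)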
All the experiments were performed on a desktop computer with an Intel(R) Core(TM) i9-9900X processor, 128 gigabytes of RAM and an NVIDIA GeForce RTX 2080 Ti Graphics Processing Unit (GPU). We used the Professional Edition of the Windows 10 operating system. To implement the weed detection and classification system, Python 3.8 (https://www.python.org/downloads/release/python-380/) with OpenCV 4.5 (https://opencv.org/opencv-4-5-0/) and the PyTorch library (version 1.10) (Paszke et al., 2019) were used. CUDA 11.1 and cuDNN 8.0.4 were installed on the system to fully utilise the GPU and accelerate model training.
All images were resized to a spatial resolution of 416 × 416 pixels for training and inference. We used a batch size of 32 for all the models, and the models were trained for 100 epochs (we allowed training for up to 300 epochs, but the results did not improve).
The training time and inference time (for weed detection and classification) depend on the available hardware and the model’s computational complexity. Table 4.3 shows the training and inference times for the models. For an automatic spray-rig that applies herbicide to a specific weed, the inference time plays a significant role in determining the vehicle speed. The vehicle speed also depends on other factors, such as the distance between the camera and the spraying nozzle, the height of the nozzle above the target weed, and the time required to calibrate and spray the herbicide on the weed (Alam et al., 2020; DPIRD, 2021). The following expression summarises this:
\[ \text{Vehicle speed} = \frac{\text{Distance between camera and spraying nozzle} + \text{Height of nozzle above target weed}}{\text{Inference time} + \text{Time required to spray the chemicals}} \]
The expression shows that by reducing the inference time, we can increase the vehicle
speed and allow more time for the system to apply herbicide on target weeds more
accurately. For example, let us consider the distance between the camera and the spraying
nozzle is 2 metres and the height of the nozzle from the ground is 1 metre (Alam et al.,
2020). Now, for a vehicle moving at a speed of 25 kilometres per hour (“Precision Spraying - Weed Sprayer”, n.d.), the system will have about 430 milliseconds to spray the chemicals on a target. A sprayer may take about 100 milliseconds to spray the herbicide (Alam et al., 2020). If the image capturing rate is 15 images per second, the system then has about 22 milliseconds to infer one image. A higher vehicle speed can be achieved by increasing the distance between the camera and the spraying nozzle and/or reducing the image rate per second.
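The timing budget in this example can be reproduced with a few lines of arithmetic; the values below simply restate the assumed distances, vehicle speed, spray delay and frame rate from the text.

# Timing budget for the assumed spray-rig configuration described above.
camera_to_nozzle_m = 2.0      # distance between camera and spraying nozzle
nozzle_height_m = 1.0         # height of the nozzle above the target
vehicle_speed_kmh = 25.0
spray_time_ms = 100.0         # time taken by the sprayer to release herbicide
frame_rate = 15               # images captured per second

speed_ms = vehicle_speed_kmh * 1000 / 3600                             # ~6.94 m/s
window_ms = (camera_to_nozzle_m + nozzle_height_m) / speed_ms * 1000   # ~432 ms
budget_per_image_ms = (window_ms - spray_time_ms) / frame_rate         # ~22 ms

print(round(window_ms), round(budget_per_image_ms))  # 432 22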
4.3 Results
The models were trained with a custom crop weed dataset as presented in Section
4.2.1. The results of the experiments are discussed here.
Table 4.3 shows that the YOLOv8x model has the highest number of parameters (68.2 million), which is much higher than Faster-RCNN. However, the Faster-RCNN model took longer than the others to train. Moreover, most of the YOLOv8 variants took less training and inference time than the original YOLOv7 model. Although the YOLOv7-tiny model is fast, YOLOv8n is the fastest of all.
The YOLOv7 model, with 36.9 million parameters, took 5.4 to 11.3 hours to train, and its inference time varied from 2.7 to 3.1 milliseconds. On the other hand, YOLOv8l and YOLOv8x contain more trainable parameters yet take less time to train and to detect weeds in an image. According to M. Hussain (2023), the YOLOv8 models reduced the use of the mosaic augmentation technique, which improved the training time. The study also reported that the use of C3 convolutions and an anchor-free detection approach helped to increase the inference speed.
Based on the inference times reported in our study, both the YOLOv7 and YOLOv8 models are suitable for real-time applications. Since Faster-RCNN is a two-stage object detection model, it takes around 95 milliseconds to detect objects in an image and therefore may not be suitable for real-time operation.
For the five-fold cross-validation experiments, eight models (YOLOv7, YOLOv7-tiny, YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, YOLOv8x and Faster-RCNN) were trained on the dataset for 100 epochs. We picked the best training curves for each model. The training curves for all the models in the three scenarios are shown in Figure 4.6.
In the first case, where the models were trained with the original data only, the performance of the YOLO models was very close: the mAP ranged from 87.1% to 89.31%. The mAP of YOLOv7 was the highest, and that of YOLOv8n was the lowest. Although the performance of the YOLOv8 models generally improved with more trainable parameters, the YOLOv8l model (88.24%) achieved a better result than YOLOv8x (87.77%). The mAP of the Faster-RCNN model was 78.82% when trained on the original dataset, which was much lower than that of the YOLO models.
The performance of the models improved when they were trained with augmented data. The training mAP increased by 0.3% to 1.83% when the images containing objects of the minority classes were augmented (Selective augmentation). The improvement was larger when the entire training dataset was augmented (All augmentation): the mAP increased by 0.4 to 2.5 percentage points with respect to training on the real images only. The YOLOv7 model achieved the highest mAP of 90.01% (Selective augmentation) and 90.25% (All augmentation). The YOLOv8l model (88.96%) performed best among the YOLOv8 models with selective augmentation, while the YOLOv8x model (89.43%) achieved the highest mAP when the entire training set was augmented. Although the mAP of the Faster-RCNN model was the lowest of all, its performance also improved when trained on the augmented data: it achieved 79.14% and 81.30% by augmenting the selected classes and the entire training set, respectively.
The evaluation results of the models on the test dataset are provided in Table 4.4 in terms of average precision per class and mean average precision. The values shown are the means over the five-fold cross-validation tests of the different models.
The YOLOv7 model trained on the original images only achieved the highest mAP of 88.50%, and the average mAP of the YOLOv7-tiny model (88.29%) was very close to that. The YOLOv8m model yielded the highest mAP (87.75%) among the YOLOv8 models, although no significant differences in detection accuracy were observed among the YOLOv8 variants. Moreover, the YOLOv8 models’ performance in detecting weeds of the Unknown class was much lower than that of the YOLOv7 model. The Faster-RCNN model achieved the lowest mAP of 78.71%.
The performance of the models was improved by using augmented data for training.
Figure 4.6: Training accuracy curves (mAP) for the models with and without data augmentation: (a) without augmenting the training data; (b) augmenting the minority classes of the training data.
The YOLOv7 model exhibited better detection accuracy than the others in both data augmentation scenarios. The highest mAP was 89.93%, achieved by augmenting the entire training set. The YOLOv8 and Faster-RCNN models also exhibited improved performance. Besides, a significant improvement was observed in detecting weeds of the Unknown class. The YOLOv8l model yielded the highest mAP (88.69%) among the YOLOv8 models using selective augmentation, and it also detected the Unknown class (73.04%) more accurately. However, the performance of the YOLOv8x model (89.39%) was better than YOLOv8l (88.54%) when the entire training set was augmented.
Table 4.4: Average precision by class and mean average precision at IoU of 50% on test
data.
Notably, the YOLOv7 and YOLOv8 models exhibited significantly higher precision scores than Faster-RCNN, which yielded the lowest precision scores of 79.31% and 81.29% with selective and all augmentation, respectively. Figure 4.7 shows some example images from the test dataset; it indicates that the YOLOv7 and YOLOv8 models perform better than Faster-RCNN. The Faster-RCNN model misclassified some objects and detected the same object twice with two different classes.
Figure 4.7: Example images from the test dataset showing the detection accuracy of the models.
4.4 Discussion
The YOLOv7 and YOLOv8 models took less time to train than the Faster-RCNN model. The main reason behind this difference is the use of the Extended Efficient Layer Aggregation Network (E-ELAN), which improves the feature-learning ability of the model and reduces the number of parameters and calculations. The YOLOv7 model also introduced the coarse-to-fine lead head label assigner, which improved the learning ability of the model without losing the required information. On the other hand, the YOLOv8 model is even faster than YOLOv7. It reduced the use of mosaic augmentation and replaced the traditional convolutional layers of earlier YOLO models with C3 convolutions. Moreover, YOLOv8 introduced anchor-free object detection, which removes the need for pre-defined anchor boxes: the bounding boxes are predicted directly, which reduces the inference time.
According to Zhao et al. (2019), training with more data can boost the detection and classification accuracy of a deeper model with more trainable parameters. Since augmenting the entire dataset gave the models more data, the performance of YOLOv7, several YOLOv8 variants and Faster-RCNN improved. However, the training accuracy of the YOLOv7-tiny model saturated in that case: it showed a similar result to training with the unaugmented dataset, since the training set remains imbalanced when the entire set is augmented. The YOLOv7-tiny model achieved its best accuracy when trained with the selectively augmented data.
Although the training mAP of Faster-RCNN was improved by augmenting the entire
dataset, YOLOv7 and YOLOv8 models were better. There are several reasons behind the
underperformance of the Faster-RCNN model. YOLO (YOLOv7 and YOLOv8) models
are data-efficient during training since they use the entire image for training and predic-
tion. Moreover, the use of mosaic and mixup augmentation techniques during training
also synthetically expands the dataset. These techniques help the model to overcome the
tendency to focus on detecting items towards the centre of the image. Faster-RCNN, on
the other hand, relies on region proposals, which can be computationally expensive and
require more labelled data (Kaya et al., 2023; D. Wu et al., 2022). Gallo et al. (2023)
and López-Correa et al. (2022) reported similar results while comparing the performance
of YOLOv7 and Faster-RCNN models in their study. Overall, the results in Figure 4.6
demonstrate that the efficacy of the YOLOv7 and YOLOv8 models is better than the
Faster-RCNN model.
The results in Table 4.4 show that the models detected Goosefoot and Thistle plants more accurately, since these plants are less occluded and relatively large. Although there were more objects belonging to the Bluegrass (21.27%) and Corn (43.29%) classes, the models could not achieve similar accuracy for them. The detection accuracy of the YOLOv7 model for Bluegrass and Corn was 87.54% and 86.06%, respectively, without augmenting the data; the performance of YOLOv7-tiny was better in that case. Although the YOLOv8n, YOLOv8s and YOLOv8l models had higher precision in detecting Bluegrass, the YOLOv7 models localised and classified the Corn plants more accurately. Similar outcomes were observed when training the models with augmented data. This is presumably because most of the Corn plants in the images are young seedlings and comparatively small. Y. Liu et al. (2021), Tong et al. (2020), and Wahyudi et al. (2022) agreed that detecting small objects is more challenging because they carry less feature information and have lower resolution. Moreover, the Bluegrass and Corn plants overlap with other objects. According to Brawn and Snowden (2000), overlapping objects make it difficult for object detection models to localise and classify objects. The models also failed to achieve a better result
Data augmentation had a positive impact on both the training and testing process.
The learning process during training was faster using augmented data (Figure 4.6). The
detection accuracy of the models also improved by training the model with augmented
data. The improvement was more noticeable while detecting the smaller plants (Corn)
and objects of the minority class (Unknown weed).
Figure 4.8: Illustration of the effect of data augmentation on detecting objects of the Unknown class using the YOLOv7 model.
On the other hand, all the models struggled to detect plants of the Unknown class
since there was more intra-class dissimilarity and fewer training samples. The effect of
data augmentation is more noticeable while detecting objects of Unknown class. For the
YOLOv7 model, the average precision for detecting Unknown class objects was 70.54%,
which was improved by more than 5% after training the model with augmented data.
The impact was similar for the YOLOv8 and Faster-RCNN models as well. However, the
YOLOv7-tiny model showed no noticeable improvement after training with more data.
The effect of data augmentation is shown in Figure 4.8 for detecting the Unknown class
objects using the YOLOv7 model.
Figure 4.8 shows that Image 1 has four ground-truth objects from the Bluegrass class and four from the Unknown class. The YOLOv7 model trained with the original dataset detected only five objects (two Bluegrass and three Unknown plants). All four plants from the Unknown class were detected after training with the original images plus the augmented images from the selected classes; however, the model failed to detect two of the Bluegrass plants. The model’s performance improved further when the original dataset was combined with augmentation of the entire training set: it detected all the objects in the image. Similarly, the model failed to detect all the objects in Images 2 and 3 when trained with the original data only.
4.5 Conclusion
On the other hand, the Faster-RCNN model takes more time (see Table 4.3) to detect objects in an image and has lower accuracy than the YOLOv7 and YOLOv8 models. From the results of this study, it can be said that YOLOv7 and YOLOv8 are more appropriate than Faster-RCNN for real-time weed detection. However, further studies are needed to optimise the inference time and improve the detection accuracy of the models.
The model’s performance can be improved by training with a large and balanced
dataset. In our research, we needed more data for training, and the dataset needed
to be more balanced. That is why the models did not accurately detect the plants of
“Unknown” class. To overcome that, we have used data augmentation techniques to
increase the number of training samples. This study observed that data augmentation
positively impacts the accuracy of detecting and classifying weeds in the crop. For smaller
datasets, this technique may improve the performance of the models.
The images of this dataset were collected under different lighting conditions, and the plants occlude each other, which had a major impact on classification accuracy. Subsequent studies are required to investigate how to overcome this limitation and improve model performance. In future, we will focus on improving the performance of the two-stage object detector in terms of accuracy and inference time.
The outcome of this research indicates that the YOLOv7 and YOLOv8 models perform well in detecting Corn and the weeds growing in it, and they can be used to develop a selective sprayer or other automatic weed control systems in future. An automated field robot or a precision sprayer could use this method for selective operations, spraying a specific herbicide in the required amount. The method allows weeds to be located and classified in an image or video frame in real time. On-field trials will be needed to test and validate the proposed techniques. Besides, using large datasets containing many image samples of various weeds and crops collected in field conditions can improve the performance of the deep learning techniques.
Chapter 5
Improving classification accuracy
Accurate classification of weed species among crop plants plays a crucial role in precision agriculture by enabling targeted treatment. Recent studies show that artificial intelligence, particularly deep learning (DL) models, offers promising solutions. However, several challenging issues, such as a lack of adequate training data, inter-class similarity between weed species and intra-class dissimilarity between images of the same weed species at different growth stages or for other reasons (e.g., variations in lighting conditions, image-capturing mechanisms and agricultural field environments), limit their performance. In this research, we propose an image-based weed classification pipeline in which one patch of the image is considered at a time to improve performance. We first enhance the images using generative adversarial networks. The enhanced images are divided into overlapping patches, a subset of which is used for training the DL models. To select the most informative patches, we use the variance of the Laplacian and the mean frequency of the Fast Fourier Transform. At test time, the model’s patch-level outputs are fused using a weighted majority voting technique to infer the class label of an image. The proposed pipeline was evaluated using 10 state-of-the-art DL models on four publicly available crop weed datasets: DeepWeeds, Cotton weed, Corn weed, and Cotton Tomato weed. Our pipeline achieved significant performance improvements on all four datasets. DenseNet201 achieved the top performance with F1 scores of 98.49%, 99.83% and 100% on the DeepWeeds, Corn weed and Cotton Tomato weed datasets, respectively. The highest F1 score on the Cotton weed dataset was 98.96%, obtained by InceptionResNetV2. Moreover, the proposed pipeline addressed the issues of intra-class dissimilarity and inter-class similarity in the DeepWeeds dataset and more accurately classified the minority weed classes in the Cotton weed dataset. This performance indicates that the proposed pipeline can be used in farming applications.

This chapter has been published: Hasan, A. M., Diepeveen, D., Laga, H., Jones, M. G., & Sohel, F. (2023). Image patch-based deep learning approach for crop and weed recognition. Ecological Informatics, 78, 102361.
5.1 Introduction
Artificial intelligence (AI) techniques have great potential to improve modern farming systems. AI can benefit farming in many ways, e.g., improving crop yields, reducing environmental impact and optimising resource allocation (Eli-Chukwu, 2019; Smith, 2018). Deep learning (DL), a branch of AI, has been investigated for many agricultural tasks, such as crop monitoring (Kussul et al., 2017), plant disease detection (Ferentinos, 2018; Haque & Sohel, 2022; Keceli et al., 2022; M. H. Saleem et al., 2019), yield prediction (Maimaitijiang et al., 2020), weed and pest recognition (Amrani et al., 2023a; Hasan et al., 2021; W. Li et al., 2021), plant phenotyping or growth monitoring (M. Wang et al., 2023), crop health monitoring (Devi et al., 2023) and plant frost and stress monitoring (Khotimah et al., 2023; Shammi et al., 2023; A. Singh et al., 2021).
Weeds are unwanted plants that grow among valuable crops. Because weeds compete with crops for resources such as space, water, nutrients and sunlight, poor weed management in agriculture can cause yield losses and degrade crop quality (Slaughter et al., 2008). The most widely used approach to weed control is the application of herbicide chemicals (Duke, 2015; Harker & O’Donovan, 2013; López-Granados, 2011). However, the broad use of herbicides, irrespective of weed species or the severity of infestation, can lead to high costs and environmental hazards. Therefore, it is important to deploy a targeted weed control approach, and AI and imaging-based techniques can help achieve this (Sa et al., 2017; Slaughter et al., 2008).
Various weed species may grow within a particular crop that may require different
control and management strategies (Heap, 2014). Classifying weed species in crops is a
crucial step for applying a site-specific weed management system. It is also important for
biodiversity conservation and ecological monitoring (Keceli et al., 2022). Targeted appli-
With the advancement of DL, many studies have proposed convolutional neural network (CNN) based techniques to classify weeds in crops (Hasan et al., 2021). CNN models can automatically learn hierarchical representations of features from images using multiple layers of interconnected artificial neurons (Shrestha & Mahmood, 2019). Subsequently, DL methods can adaptively learn the most discriminative features from images. Several studies compared the performance of DL models with traditional machine learning techniques and showed that the classification accuracy of state-of-the-art deep learning models is better (dos Santos Ferreira et al., 2017; Liang et al., 2019; Sarvini et al., 2019; Tang et al., 2017; W. Zhang et al., 2018). However, the performance of state-of-the-art DL models may vary depending on the datasets and the pre-processing techniques applied to the images (Hu et al., 2020; Kounalakis et al., 2018, 2019; Yu et al., 2019b). Although the pre-trained weights were not generated from any crop-weed dataset, studies suggest that the performance of a CNN model can be improved by fine-tuning pre-trained networks (Bah et al., 2018; Suh et al., 2018; Teimouri et al., 2018; Toğaçar, 2022; Valente et al., 2019; Yu et al., 2019b). Some other
Most crop-weed datasets have a small number of images for training, while DL models are data hungry. These datasets also generally suffer from the class imbalance problem, i.e., some species have many more images than others, which can affect the performance of the DL models (Attri et al., 2023; Kamilaris & Prenafeta-Boldú, 2018). Many studies use data augmentation techniques to increase the amount of training data to address this, and this approach generally improves classification accuracy (D. Chen et al., 2022b; Hasan et al., 2023b; Le et al., 2020a; Olsen et al., 2019; Sarvini et al., 2019). The application of several image pre-processing techniques, such as resizing (Chechlinski et al., 2019; Farooq et al., 2018a; Partel et al., 2020), background removal (Alam et al., 2020; Bah et al., 2018; Y. Jiang et al., 2019), image enhancement (Nkemelu et al., 2018; A. Wang et al., 2020) and denoising (Tang et al., 2017), can also improve the performance of the models.
According to Hou et al. (2016), training a CNN model with high-resolution images is
computationally expensive and time-consuming. They proposed a patch-based CNN to
classify cells from microscopic high-resolution whole slide images of tissues. A. Sharma
et al. (2017) used a patch-based CNN architecture to classify land cover images using
Landsat. The proposed architecture outperformed pixel-based approaches in overall clas-
sification accuracy. Patch-based image classification is a technique used to classify images
by analysing smaller patches or sub-regions within the image rather than considering the
entire image as a whole (Ullah et al., 2023).
• Instead of using all the patches of an image, a novel technique was developed to select the relatively important patches. By avoiding less important patches and learning only from the important ones, the performance of the DL models is improved. Technically, we use a combination of the Laplacian method and the Fast Fourier Transform to quantify the relative importance of the information in a patch and select a patch if it is important.
Deep learning based frameworks for classifying crop and weed species follow several broad steps: dataset acquisition, preparing the data for training, training the deep learning models, and evaluating the performance of the models. We have illustrated this in Figure 5.1. We evaluated the proposed method on four public datasets. Each dataset was divided into training, validation and test subsets. We applied several image preprocessing techniques, e.g., enhancing and resizing the images, dividing the images into patches (both overlapping and non-overlapping) and selecting the relatively more important patches, and used them to train the deep learning models. Ten major deep learning models were then trained and evaluated on the datasets. Finally, the models’ performance was evaluated using multiple benchmark evaluation metrics.
5.2.1 Datasets
Four publicly available crop weed datasets were selected to evaluate the performance
of our proposed approach. The datasets are DeepWeeds dataset (Olsen et al., 2019),
Cotton weed dataset (D. Chen et al., 2022b), Corn weed dataset (H. Jiang et al., 2020)
and Cotton Tomato weed dataset (Espejo-Garcia et al., 2020). A summary of the datasets
Table 5.1: A summary of the datasets used in this research. The number of images to
train, validate and evaluate the models are also shown.
Dataset (total images) — Crop/weed species: Train / Validation / Test / Total

DeepWeeds (17,509)
  Chinee apple (Ziziphus mauritiana): 675 / 225 / 226 / 1126
  Lantana (Lantana camara): 637 / 212 / 214 / 1063
  Parkinsonia (Parkinsonia aculeata): 618 / 206 / 207 / 1031
  Parthenium (Parthenium hysterophorus): 613 / 204 / 205 / 1022
  Prickly acacia (Vachellia nilotica): 637 / 212 / 213 / 1062
  Rubber vine (Cryptostegia grandiflora): 605 / 201 / 203 / 1009
  Siam weed (Eupatorium odoratum): 644 / 214 / 216 / 1074
  Snake weed (Stachytarpheta spp.): 609 / 203 / 204 / 1016
  Negative: 5463 / 1821 / 1822 / 9106

Cotton weed (5,187)
  Carpet weeds (Mollugo verticillata): 457 / 152 / 154 / 763
  Crabgrass (Digitaria sanguinalis): 66 / 22 / 23 / 111
  Eclipta (Eclipta prostrata): 152 / 50 / 52 / 254
  Goosegrass (Eleusine indica): 129 / 43 / 44 / 216
  Morningglory (Ipomoea purpurea): 669 / 223 / 223 / 1115
  Nutsedge (Cyperus rotundus): 163 / 54 / 56 / 273
  Palmer Amaranth (Amaranthus palmeri): 413 / 137 / 139 / 689
  Prickly Sida (Sida spinosa): 77 / 25 / 27 / 129
  Purslane (Portulaca oleracea): 270 / 90 / 90 / 450
  Ragweed (Ambrosia artemisiifolia): 77 / 25 / 27 / 129
  Sicklepod (Senna obtusifolia): 144 / 48 / 48 / 240
  Spotted Spurge (Euphorbia maculata): 140 / 46 / 48 / 234
  Spurred Anoda (Anoda cristata): 36 / 12 / 13 / 61
  Swinecress (Lepidium coronopus): 43 / 14 / 15 / 72
  Waterhemp (Amaranthus tuberculatus): 270 / 90 / 91 / 451

Corn weed (6,000)
  Bluegrass (Poa pratensis): 720 / 240 / 240 / 1200
  Chenopodium album: 720 / 240 / 240 / 1200
  Cirsium setosum: 720 / 240 / 240 / 1200
  Corn (Zea mays): 720 / 240 / 240 / 1200
  Sedge (Cyperus compressus): 720 / 239 / 241 / 1200

Cotton Tomato weed (508)
  Cotton (Gossypium herbaceum): 73 / 24 / 26 / 123
  Tomato (Solanum lycopersicum): 32 / 10 / 12 / 54
  Black nightshade (Solanum nigrum): 120 / 40 / 41 / 201
  Velvet leaf (Abutilon theophrasti): 78 / 26 / 26 / 130
These datasets pose several challenges for deep learning models. One of them is inter-class similarity and intra-class dissimilarity, which limits the performance of deep learning models (Cacheux et al., 2019). Besides, an image may contain other plants along with the soil background when a picture of a target plant is captured. In the image-label annotation approach, an image with multiple plants is labelled based on the target plant, and the rest are treated as background. The plants in the background are sometimes from the same class and sometimes from a different one. Since the background plants have similar morphology, they influence the class-label prediction of the deep learning model, which may lead to misclassification. Another challenging issue is class-imbalanced training data, which significantly affects the performance of deep learning models (Q. Dong et al., 2018). Moreover, deep learning models generally require a large volume of data to learn distinguishable features from images (Barbedo, 2018), so classifying images with a small training dataset is challenging.
The DeepWeeds dataset contains 17,509 images collected from eight locations in northern Australia. More than 8,000 of them are of eight nationally significant weed species; the rest are plants native to that region of Australia but not weeds, and those images are classified as negative in the dataset. The weed species are: chinee apple, lantana, parkinsonia, parthenium, prickly acacia, rubber vine, siam weed and snake weed. The DeepWeeds dataset has inter-class similarity and intra-class dissimilarity problems, making the weed species recognition task more challenging. The images were collected using a FLIR Blackfly 23S6C Gigabit Ethernet high-resolution camera. Olsen et al. (2019) intentionally captured the pictures from different heights, angles and locations and in several lighting conditions to introduce variability into the dataset. All the images were resized to 256 × 256 pixels. Olsen et al. (2019) also reported that, due to lighting conditions, 3.4% of the chinee apple images were classified as snake weed by their trained model, and 4.1% vice versa. The models in their experiments also misclassified parkinsonia and prickly acacia on several occasions because the two species look very similar. The dataset is available through the GitHub repository: https://github.com/AlexOlsen/DeepWeeds.
The Cotton weed dataset was acquired from the cotton belts of the United States of America, e.g., North Carolina and Mississippi. D. Chen et al. (2022b) captured the images of weeds at different growth stages and under various natural lighting conditions using digital cameras or smartphones. The data were collected in the growing seasons (June to August) of 2020 and 2021. The dataset has 5,187 images of weeds from fifteen different classes. D. Chen et al. (2022b) reported that the dataset is highly imbalanced and contains both high- and low-resolution images, which affects classification accuracy. Moreover, the images of the weeds were collected at different growth stages with variations in lighting conditions, plant background, leaf colour and structure. The dataset also has both inter-class similarity and intra-class dissimilarity. These conditions constrain the achievable classification accuracy, and we used the dataset to evaluate our proposed technique’s effectiveness on those issues. This dataset is available through the Kaggle repository: https://www.kaggle.com/yuzhenlu/cottonweedid15.
A Canon PowerShot SX600 HS camera was used to collect the images of the Corn weed dataset. H. Jiang et al. (2020) collected the data from an actual corn seedling field under natural lighting conditions at different growth stages of the plants. The dataset has 6,000 images of corn and four types of weed. The collected pictures were resized to a resolution of 800 × 600 pixels. Neither the soil background of the plants nor the lighting conditions are uniform, which makes it harder for deep learning models to achieve high accuracy. Although the Corn weed dataset has an equal number of images in each class, most images contain multiple plants, sometimes from the same class and sometimes from different ones. Images with multiple plants were labelled based on one of the plants, and the rest were considered background. Since the plants in the background have similar textures, colours and shapes, they influence the class-label prediction of the deep learning model. This dataset was chosen to verify whether our proposed approach can handle that issue. The dataset was shared by H. Jiang et al. (2020) through the GitHub repository: https://github.com/zhangchuanyin/weed-datasets.
The images of the Cotton Tomato weed dataset were collected from different regions of Greece. The dataset contains two types of crops and two types of weeds, and the pictures of the plants were captured at their early growth stages. Several photographers collected the images from different locations under various lighting conditions and soil backgrounds. The dataset has only 508 images from four classes of crops and weeds. The images were captured at a resolution of 2272 × 1704 pixels from one metre above the ground. The Cotton Tomato weed dataset has relatively few images and was chosen to evaluate the models’ performance on a small dataset. Espejo-Garcia et al. (2020) made the dataset available for further research through the GitHub repository: https://github.com/AUAgroup/early-crop-weed.
In this research, each dataset was randomly divided into three parts for training, validation and testing: 60% of the data was used to train the deep learning models, 20% was kept to validate them, and the remaining 20% was used to evaluate their performance. Table 5.1 shows the number of images in each dataset and how they were split for training, validation and testing.
The selection of the deep learning models for image classification depends on avail-
able computational resources and the trade-offs between the model complexity and per-
formance (Druzhkov & Kustikova, 2016; Y. Li et al., 2018). To test the performance
of our technique, we selected the following deep learning models: VGG16 (Simonyan &
Zisserman, 2014), VGG19 (Simonyan & Zisserman, 2014), ResNet-50 (K. He et al., 2016),
Inception-V3 (Szegedy et al., 2016), InceptionResNetV2 (Szegedy et al., 2017), Xception
(Chollet, 2017), DenseNet121 (G. Huang et al., 2017), DenseNet169 (G. Huang et al.,
2017), DenseNet201 (G. Huang et al., 2017) and MobileNetV2 (Sandler et al., 2018). For
the sake of brevity, we briefly summarise the main attributes of these techniques.
VGG16 and VGG19 are classical architectures that are well known for simplicity
and uniformity. These models are suitable for smaller datasets and can provide better
accuracy by fine-tuning the pre-trained network (Sukegawa et al., 2020). The models
have several drawbacks, such as vanishing gradient problems and loss of fine-grained
spatial information (M. Pan et al., 2020). On the other hand, ResNet-50 contains residual
connections, which can overcome the vanishing gradient problem and enable training very
deep networks. The model performs well on both large and small datasets (Al-Masni et
al., 2020).
The Inception-V3 model captures features at multiple scales using parallel convolution operations (C. Wang & Xiao, 2021). The Xception model is an extension of the Inception architecture that uses depth-wise separable convolutions to reduce computational complexity, balancing computational efficiency and performance (Chollet, 2017; Kassani et al., 2019).
Several studies have used these models on crop-weed datasets and achieved good performance. For instance, Olsen et al. (2019) used ResNet-50 and Inception-V3. Hasan et al. (2023b) compared the performance of VGG16, ResNet-50, Inception-V3, InceptionResNetV2 and MobileNetV2 on a combined dataset containing twenty classes of images. Another study evaluated thirty-five state-of-the-art deep learning models (including those mentioned above) on the Cotton weed dataset, where the aforementioned models obtained reasonably good results with low inference times (D. Chen et al., 2022b). Sharpe et al. (2019) applied VGGNet and DetectNet to classify weeds in strawberry plants and achieved good performance. Suh et al. (2018) also used VGG19, ResNet-50 and Inception-V3 to classify images of sugar beet and volunteer potato.
The performance of image classification models can be evaluated using several metrics
which can provide valuable insights about their effectiveness. The choice of metrics
depends on the specific requirements and characteristics of the task. In our study, we
have chosen the following commonly used metrics to evaluate the efficacy of the deep
learning models:
• Accuracy: The accuracy is the ratio of the number of correctly classified images to the total number of images. The metric provides a general overview of model performance.

In the following metrics, true positives are correct positive predictions, true negatives are correct negative predictions, false positives are incorrect positive predictions, and false negatives are incorrect negative predictions.
• Precision: Precision measures the proportion of correctly classified positive samples (true positive predictions) out of all samples predicted as positive (sum of true positives and false positives).
\[ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}. \]
• Recall: Recall measures the proportion of correctly classified positive samples (true
positive predictions) out of all actual positive samples (sum of true positives and
false negatives). It represents the ability of the model to detect positive samples.
\[ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}. \]
• F1-Score: The F1-Score is the harmonic mean of precision and recall. It represents
the balance between precision and recall, which can help measure the performance
of a model on a class-imbalanced dataset.
\[ F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \]
• Confusion Matrix: A confusion matrix tabulates the true positive, true negative, false positive and false negative counts, providing a detailed breakdown of model performance. It is helpful for visualising how well the deep learning model is performing and which prediction errors it is making (a short computational sketch follows this list).
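As a minimal illustration, these metrics can be computed with scikit-learn as sketched below; the label vectors are made-up examples, not results from this study.

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Illustrative ground-truth and predicted class labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)

print(accuracy, precision, recall, f1)   # overall accuracy and macro-averaged scores
print(confusion_matrix(y_true, y_pred))  # per-class breakdown of the predictions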
Figure 5.1: An illustration of the proposed workflow for classifying crop and weeds in images (data acquisition, image enhancement using a GAN, image resizing, and division into disjoint or overlapping patches).
In this approach, the images were resized to 256 × 256 and used to train the models.
For this study, we have selected ten deep learning models as discussed in Section 5.2.3.
The performance of the models was evaluated using several well-known metrics (see Sec-
tion 5.2.6.5) such as accuracy, precision, recall and F1 score. We also used a confusion
matrix to show how accurate the models were in classifying different classes of images in
the datasets (Hasan et al., 2021; Kamilaris & Prenafeta-Boldú, 2018).
We use three different image resolutions: 256 × 256, 512 × 512 and 1024 × 1024 pixels. The images in the four datasets vary in size: some are smaller than 256 × 256 and some are larger than 1024 × 1024 pixels. The purpose of the resize operation is to make the images uniform in resolution. If low-resolution images were divided into patches, the patches would be very small, and it might not be possible to extract distinguishable features from them, which could affect classification accuracy (Y. Liu et al., 2021). We chose these three resolutions to verify the impact of image size on the deep learning models’ performance. The OpenCV “resize” module was used to perform this task (OpenCV, 2019). To resize an image, this module uses one of four interpolation methods: nearest-neighbour, bilinear, bicubic or Lanczos interpolation. We tested all of them and found that the choice had no significant effect on the deep learning models, so nearest-neighbour interpolation was used in this research to resize the images.
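A minimal sketch of this resize operation is given below; the input file name is an illustrative assumption.

import cv2

# Resize an image to one of the three working resolutions using
# nearest-neighbour interpolation, as adopted in this study.
image = cv2.imread("weed_image.jpg")   # illustrative file name
resized = cv2.resize(image, (1024, 1024), interpolation=cv2.INTER_NEAREST)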
However, increasing the resolution of smaller images with a traditional image processing approach may reduce quality, and the resulting images may be blurry (Y. S. Chen et al., 2018). GAN-based image enhancement can increase an image’s resolution without degrading its quality. This research used the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) to enhance low-resolution images, which were then resized to a 1024 × 1024 pixel resolution using the OpenCV “resize” module. Figure 5.2 shows the operations performed here.
Figure 5.2: The workflow for enhancing and resizing the images.
After resizing the original or enhanced images, they were divided into patches. We generated both disjoint and overlapping patches; overlapping patches overlap by 50%. Each image was thus divided into 16 disjoint patches or 49 overlapping patches. For instance, a 256 × 256 pixel image is split into 16 disjoint or 49 overlapping patches of size 64 × 64 pixels.
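The patch generation described above can be sketched as follows, assuming square images and square patches; with a 256 × 256 image and 64 × 64 patches this yields 16 disjoint or 49 half-overlapping patches.

import numpy as np

def extract_patches(image, patch=64, overlap=False):
    """Split an image into disjoint patches, or patches overlapping by 50%."""
    stride = patch // 2 if overlap else patch
    h, w = image.shape[:2]
    return [image[r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, stride)
            for c in range(0, w - patch + 1, stride)]

img = np.zeros((256, 256, 3), dtype=np.uint8)      # placeholder 256 x 256 image
print(len(extract_patches(img)))                   # 16 disjoint patches
print(len(extract_patches(img, overlap=True)))     # 49 overlapping patches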
Several studies indicate that blurry images have a negative impact on the classification accuracy of deep learning models (Dodge & Karam, 2017; Q. Guo et al., 2020; Nimisha et al., 2017). There are many approaches to detect whether an image is blurry or sharp; two of them are based on the Fast Fourier Transform (FFT) (Pertuz et al., 2013) and the variance of the Laplacian (Bansal et al., 2016).
In an FFT-based approach, the frequency content of the image is calculated, and depending on the level of the frequencies the image is identified as blurry or sharp. In the Laplacian method, the variance of the Laplacian response over the pixel values is used instead. In both cases, a threshold value must be determined: for FFT it applies to the frequency measure, and for the Laplacian method to the variance. If the frequency (in FFT) or the variance (in the Laplacian method) is lower than the threshold, the image is identified as blurry.
We used this approach to detect whether a patch of an image contains plant parts and relatively more discriminating information. We found that, in most cases, patches with no plant parts have low frequency values or variance. This is because the camera focuses on the crop or weed when a plant image is captured, so the background soil is slightly blurred. Both the frequency (in FFT) and the variance (in the Laplacian method) are lower if a patch contains only soil background. This helps us decide which patches to select for training the models. We followed the steps below to select the patches:
Steps to select the patches used to train, validate and test the models
1: P is a patch, there are N patches in an image
2: for each P in N do
3: Calculate mean frequency P fi
4: Calculate variance of Laplacian method P vi
5: end for
6: Calculate average of the mean frequency AP f = (P f1 + P f2 + ... + P fn )/n
7: Calculate average of the variance AP v = (P v1 + P v2 + ... + P vn )/n
8: for each P f in (P f1 , P f2 , ..., P fn ) and P v in (P v1 , P v2 , ..., P vn ) do
9: if P f > AP f and P v > AP v then
10: Add the patch to selected set SP
11: end if
12: end for
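A minimal Python sketch of these steps is shown below, using OpenCV for the variance of the Laplacian and NumPy for the FFT; summarising the "mean frequency" as the mean log-magnitude of the centred spectrum is an assumption, since the exact formulation is not spelled out here.

import cv2
import numpy as np

def mean_frequency(gray):
    """Mean log-magnitude of the centred FFT spectrum (one way to express a 'mean frequency')."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    return float(np.mean(np.log1p(np.abs(spectrum))))

def laplacian_variance(gray):
    """Variance of the Laplacian response, a standard sharpness measure."""
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def select_patches(patches):
    """Keep patches whose FFT and Laplacian scores both exceed the per-image averages."""
    grays = [cv2.cvtColor(p, cv2.COLOR_BGR2GRAY) for p in patches]
    freqs = [mean_frequency(g) for g in grays]
    variances = [laplacian_variance(g) for g in grays]
    avg_f, avg_v = np.mean(freqs), np.mean(variances)
    return [p for p, f, v in zip(patches, freqs, variances) if f > avg_f and v > avg_v]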
Figure 5.3 illustrates the patch selection process with an example. An image was divided into 16 patches. The mean frequency (FFT) and the variance of the Laplacian of each patch were then calculated, and the averages of the mean frequencies and of the variances were computed separately. These average values were used as thresholds for that image: if the mean frequency of a patch is greater than or equal to the average, the patch is selected, and the same rule is applied using the average variance. For this image, ten and eight patches were selected by the Laplacian and FFT techniques, respectively. Finally, we kept only the patches selected by both methods; here, seven patches were selected for training or testing. It can be seen that the chosen patches contain at least some plant parts.
Figure 5.3: The patch selection process: the mean FFT frequency and the variance of the Laplacian are calculated for each patch, their per-image averages are used as thresholds, and only the patches chosen by both methods are selected.
We trained the same deep learning models as mentioned in Section 5.2.5. For training, we resized all the image patches to a resolution of 256 × 256 pixels: although dividing the images into patches produced patches of lower resolution (64 × 64 and 128 × 128 pixels), they were all converted to a uniform size.
To predict the class label of an image, the image was first divided into patches, and the important patches were selected using the approach described in Section 5.2.6.3. The model then predicted a class label for each patch, and a weighted majority voting technique was used to infer the class label of the image from the patch-level predictions, as sketched below.
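A minimal sketch of this fusion step is given below; weighting each patch vote by its softmax confidence is an assumption, as the exact weighting scheme is not detailed here.

import numpy as np

def fuse_patch_predictions(patch_probs):
    """Weighted majority vote over patch-level softmax outputs.

    patch_probs: array of shape (n_patches, n_classes). Each patch votes for its
    top class, weighted by its softmax confidence (an assumed weighting scheme).
    """
    patch_probs = np.asarray(patch_probs)
    votes = np.zeros(patch_probs.shape[1])
    for probs in patch_probs:
        predicted = int(np.argmax(probs))
        votes[predicted] += probs[predicted]   # confidence-weighted vote
    return int(np.argmax(votes))

# Example: three patches, three classes -> class 1 wins the weighted vote.
print(fuse_patch_predictions([[0.1, 0.8, 0.1], [0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]))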
The experiments were conducted using a desktop computer with an Intel Core i9-9900X processor, 128 GB of RAM and an NVIDIA GeForce RTX 2080 Ti graphics card. The deep learning models were developed using Python 3.8 and the TensorFlow 2.4 library. The models were initialised with weights pre-trained on the ImageNet dataset. Since these models were originally trained to classify 1,000 classes of images, the classification layer was replaced to fine-tune them on the crop-weed datasets. A global average pooling layer followed by two dense layers with 1,024 neurons and ReLU (Rectified Linear Unit) activation replaced the fully connected layers of the original models, and the final output layer was a dense layer with a softmax activation function whose number of neurons varied with the number of classes. Although the maximum number of epochs for training was set to 100, training finished earlier because of an early-stopping strategy based on the validation accuracy. The initial learning rate was set to 1 × 10−4 and was gradually decreased to 10−6 by monitoring the validation loss after every epoch. We used the “Adam” optimiser and the “Categorical Cross Entropy” loss for training all deep learning models. The input size for all the DL models was 256 × 256 × 3, and due to the capability of the computing device the batch size was set to 32.
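A minimal sketch of this fine-tuning setup in TensorFlow/Keras is given below, using DenseNet201 as an example backbone (any of the ten models could be substituted); the early-stopping patience and learning-rate reduction factor are assumptions, since only the initial and final learning rates are stated above.

import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, callbacks

def build_model(num_classes):
    """ImageNet pre-trained backbone with the replacement head described above."""
    base = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet", input_shape=(256, 256, 3))
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dense(1024, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = models.Model(base.input, outputs)
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Early stopping on validation accuracy and learning-rate reduction on validation
# loss, mirroring the strategy above (patience and factor are assumed values).
cbs = [callbacks.EarlyStopping(monitor="val_accuracy", patience=10,
                               restore_best_weights=True),
       callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, min_lr=1e-6)]

# model = build_model(num_classes=9)
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=cbs)
# (train_ds and val_ds are assumed to be datasets already batched at 32.)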
5.3 Results
In this section, we present the experimental results of our proposed approach and
compare them with the traditional approach and the results achieved by the previous
studies on all the datasets.
We first trained the models using the original images only, without any data aug-
mentation. Table 5.2 summarises the accuracy, precision, recall and F1 score of the ten
models. There is a significant variation between the models regarding the number of pa-
rameters and model depth. Overall, the DenseNet models achieved the best results on all
datasets. DenseNet169 and DenseNet201 attained the same accuracy of 95.01% for the
DeepWeeds dataset, but the precision, recall and F1 scores indicate that DenseNet169 is better than DenseNet201. The dataset has inter-class similarity and intra-class dissimilarity issues, which affected the classification accuracy. Olsen et al. (2019) also mentioned that images of chinee apple, snake weed, prickly acacia and parkinsonia were falsely classified.
On the other hand, for the Cotton weed dataset, DenseNet201 achieved the highest
accuracy of 96%, and the precision, recall and F1 scores were also higher than other
models. This dataset is highly class imbalanced. According to D. Chen et al. (2022b),
the minority weed classes showed relatively low classification accuracy. We observed the
same in our experiments.
The Corn weed dataset was balanced, and the models showed promising performance
without any data augmentation. DenseNet169 and MobileNetV2 achieved the highest
and lowest accuracy of 99.67% and 98%, respectively. However, the Cotton Tomato
weed dataset had a relatively small number of images for training, and the dataset was
imbalanced. Those issues affected the performance of some models, such as MobileNetV2,
VGG16, VGG19, Inception-V3 and Xception. Moreover, for Inception-V3, we observed
that the model achieved a high precision rate (79.72%) but a low recall rate (69.23%).
This indicates that the model made many false predictions. On the other hand, the
performance of MobileNetV2 was relatively poor.
In most cases, the models’ performance improved as the image size increased (Figure 5.4). R. Wu et al. (2015) noted that deep learning models can extract more distinguishable features from high-resolution images. Here, we report the results achieved using disjoint and overlapping patches. The results confirm that model performance can be improved by resizing the images to a relatively larger size in a patch-based approach. Besides, all ten deep learning models showed better results
Table 5.2: Performance of ten deep learning models based on accuracy, precision, recall and F1 score on four datasets using the traditional pipeline. All values are in percentage.

Model columns (left to right): MobileNetV2, VGG16, VGG19, ResNet-50, Inception-V3, InceptionResNetV2, Xception, DenseNet121, DenseNet169, DenseNet201.

DeepWeeds
  Accuracy:  84.99 90.00 88.01 91.00 88.01 90.00 90.00 92.99 95.01 95.01
  Precision: 84.50 90.23 88.35 91.29 88.29 90.25 89.46 93.40 94.94 94.67
  Recall:    84.72 90.31 88.40 91.37 88.40 90.37 89.57 93.39 94.93 94.62
  F1 score:  84.40 90.25 88.35 91.24 88.31 90.27 89.46 93.38 94.91 94.62

Cotton weed
  Accuracy:  88.00 80.00 78.00 89.05 89.05 91.05 92.00 95.05 95.05 96.00
  Precision: 87.49 80.27 77.87 89.80 88.95 91.65 91.77 95.32 95.59 95.88
  Recall:    87.71 80.19 78.38 89.42 88.86 91.43 91.71 95.23 95.43 95.81
  F1 score:  87.15 79.83 77.61 89.30 88.74 91.34 91.59 95.21 95.43 95.79

Corn weed
  Accuracy:  98.00 99.00 98.00 99.00 99.00 99.00 99.00 99.50 99.67 99.50
  Precision: 98.35 99.18 98.42 99.42 98.67 98.75 99.18 99.50 99.67 99.50
  Recall:    98.33 99.16 98.42 99.42 98.67 98.75 99.17 99.50 99.67 99.50
  F1 score:  98.34 99.17 98.42 99.42 98.67 98.75 99.17 99.50 99.67 99.50

Cotton Tomato weed
  Accuracy:  38.46 90.38 78.84 99.05 69.23 99.03 96.15 99.03 99.05 99.05
  Precision: 15.31 90.91 77.74 99.05 79.72 99.07 96.15 99.06 99.07 99.07
  Recall:    38.46 90.38 78.84 99.05 69.23 99.03 96.15 99.03 99.07 99.07
  F1 score:  21.90 90.46 77.69 99.05 67.32 99.03 96.15 99.03 99.07 99.07
in the patch-based system with overlapping patches than with the traditional approach for three of the datasets. The DenseNet201 model achieved 96.38%, 99.83% and 100% accuracy on the DeepWeeds, Corn weed and Cotton Tomato weed datasets, respectively. The performance improved because more training samples were available from the images (P. Wang et al., 2021). Notably, the classification accuracy of most deep learning models was below 70% on the Cotton weed dataset using low-resolution (256 × 256) images; the performance improved significantly as the image size was increased.
On the other hand, the optimal image size for better classification accuracy depends on
the dataset and the CNN architecture (Thambawita et al., 2021). MobileNetV2 obtained
a recognition accuracy of 42.31% using 256 × 256 pixels images on the Cotton Tomato
weed dataset. However, the accuracies were 96.15% and 94.23% using the image resolution
of 512 × 512 and 1024 × 1024 pixels, respectively. This indicates that the optimal image
Figure 5.4: Deep learning models’ accuracy with respect to image size using the patch-based approach, with and without overlapping patches, for the DeepWeeds, Cotton weed, Corn weed and Cotton Tomato weed datasets.
resolution for MobileNetV2 on this dataset is 512 × 512 pixels using the image resize
operation.
DenseNet201 achieved the highest precision, recall and F1 score of 98.49% when the images were enhanced using the GAN-based method. This was better than using the resized images, an improvement of 2.19%. The classification accuracies of the other deep learning models, except Inception-V3, also increased. Olsen et al. (2019) reported a highest precision of 95.7% on the DeepWeeds dataset, so our approach shows a significant improvement; the per-species improvements in classification accuracy are reported in Section 5.3.4. Classification accuracy improved by almost 6% for the InceptionResNetV2 model on the Cotton weed dataset: the model achieved the highest precision, recall and F1 score of 98.96%, 98.95% and 98.95%, respectively, and all other models showed improved results as well. The performance of the models in classifying the minority classes is demonstrated in Section 5.3.5.

The results on the Corn weed and Cotton Tomato weed datasets indicate that our proposed approach is effective for balanced and small datasets as well: most of the models achieved close to 100% classification accuracy.
The DeepWeeds dataset has high inter-class similarity and intra-class dissimilarity.
Olsen et al. (2019) reported that their model confused chinee apple images with snake
weed and vice versa. They also noted that the deep learning methods incorrectly classified
parkinsonia images as prickly acacia. Their results further indicated that many weed
images were classified as non-weed, and about 3% of native plants (the negative class)
were classified as various weed species.
Table 5.3: Performance comparison of the models between training with resized and enhanced images for all four datasets. Each cell shows Resize (%) / GAN (%) / Change (%).

| Model | Metric | DeepWeeds | Cotton weed | Corn weed | Cotton Tomato weed |
| MobileNetV2 | Precision | 93.84 / 95.29 / 1.55 | 90.77 / 96.44 / 6.24 | 95.85 / 99.58 / 3.90 | 99.08 / 99.08 / 0.00 |
| MobileNetV2 | Recall | 93.84 / 95.30 / 1.55 | 90.56 / 96.38 / 6.42 | 95.33 / 99.58 / 4.46 | 99.04 / 99.04 / 0.00 |
| MobileNetV2 | F1 score | 93.76 / 95.26 / 1.60 | 90.50 / 96.38 / 6.50 | 95.39 / 99.58 / 4.40 | 99.04 / 99.04 / 0.00 |
| VGG16 | Precision | 89.23 / 93.70 / 5.01 | 92.98 / 97.65 / 5.03 | 99.17 / 99.42 / 0.25 | 100.00 / 100.00 / 0.00 |
| VGG16 | Recall | 88.97 / 93.70 / 5.32 | 92.85 / 97.62 / 5.13 | 99.17 / 99.42 / 0.25 | 100.00 / 100.00 / 0.00 |
| VGG16 | F1 score | 88.72 / 93.63 / 5.53 | 92.86 / 97.61 / 5.12 | 99.17 / 99.42 / 0.25 | 100.00 / 100.00 / 0.00 |
| VGG19 | Precision | 87.88 / 93.02 / 5.85 | 91.97 / 96.75 / 5.19 | 99.50 / 99.50 / 0.00 | 99.07 / 100.00 / 0.93 |
| VGG19 | Recall | 87.92 / 92.96 / 5.74 | 91.71 / 96.66 / 5.41 | 99.50 / 99.50 / 0.00 | 99.04 / 100.00 / 0.97 |
| VGG19 | F1 score | 87.61 / 92.83 / 5.96 | 91.64 / 96.65 / 5.46 | 99.50 / 99.50 / 0.00 | 99.03 / 100.00 / 0.98 |
| ResNet-50 | Precision | 91.71 / 95.71 / 4.37 | 92.12 / 96.33 / 4.57 | 98.45 / 99.75 / 1.32 | 100.00 / 100.00 / 0.00 |
| ResNet-50 | Recall | 91.65 / 95.73 / 4.45 | 91.99 / 96.28 / 4.66 | 98.42 / 99.75 / 1.35 | 100.00 / 100.00 / 0.00 |
| ResNet-50 | F1 score | 91.47 / 95.70 / 4.62 | 91.93 / 96.27 / 4.72 | 98.42 / 99.75 / 1.35 | 100.00 / 100.00 / 0.00 |
| Inception-V3 | Precision | 93.37 / 93.09 / -0.30 | 92.06 / 97.27 / 5.66 | 98.85 / 99.50 / 0.66 | 99.07 / 99.08 / 0.00 |
| Inception-V3 | Recall | 93.39 / 93.07 / -0.34 | 91.90 / 97.24 / 5.81 | 98.83 / 99.50 / 0.67 | 99.04 / 99.04 / 0.00 |
| Inception-V3 | F1 score | 93.32 / 93.01 / -0.33 | 91.87 / 97.23 / 5.84 | 98.84 / 99.50 / 0.67 | 99.04 / 99.04 / 0.00 |
| InceptionResNetV2 | Precision | 92.51 / 96.24 / 4.03 | 93.36 / 98.96 / 5.99 | 98.93 / 99.58 / 0.66 | 100.00 / 100.00 / 0.00 |
| InceptionResNetV2 | Recall | 92.53 / 96.21 / 3.97 | 93.23 / 98.95 / 6.14 | 98.92 / 99.58 / 0.67 | 100.00 / 100.00 / 0.00 |
| InceptionResNetV2 | F1 score | 92.38 / 96.19 / 4.12 | 93.17 / 98.95 / 6.21 | 98.92 / 99.58 / 0.67 | 100.00 / 100.00 / 0.00 |
| Xception | Precision | 93.91 / 96.18 / 2.42 | 94.25 / 97.86 / 3.82 | 99.18 / 99.50 / 0.33 | 100.00 / 100.00 / 0.00 |
| Xception | Recall | 93.90 / 96.15 / 2.40 | 94.18 / 97.81 / 3.85 | 99.17 / 99.50 / 0.34 | 100.00 / 100.00 / 0.00 |
| Xception | F1 score | 93.85 / 96.14 / 2.44 | 94.12 / 97.80 / 3.91 | 99.17 / 99.50 / 0.34 | 100.00 / 100.00 / 0.00 |
| DenseNet121 | Precision | 94.63 / 96.69 / 2.18 | 93.63 / 98.30 / 4.98 | 99.50 / 99.83 / 0.33 | 100.00 / 100.00 / 0.00 |
| DenseNet121 | Recall | 94.61 / 96.69 / 2.20 | 93.52 / 98.28 / 5.10 | 99.50 / 99.83 / 0.34 | 100.00 / 100.00 / 0.00 |
| DenseNet121 | F1 score | 94.58 / 96.68 / 2.22 | 93.51 / 98.28 / 5.11 | 99.50 / 99.83 / 0.34 | 100.00 / 100.00 / 0.00 |
| DenseNet169 | Precision | 94.36 / 97.11 / 2.92 | 93.38 / 97.74 / 4.67 | 99.42 / 99.67 / 0.25 | 100.00 / 100.00 / 0.00 |
| DenseNet169 | Recall | 94.39 / 97.12 / 2.90 | 93.23 / 97.71 / 4.81 | 99.42 / 99.67 / 0.25 | 100.00 / 100.00 / 0.00 |
| DenseNet169 | F1 score | 94.35 / 97.11 / 2.92 | 93.19 / 97.71 / 4.85 | 99.42 / 99.67 / 0.25 | 100.00 / 100.00 / 0.00 |
| DenseNet201 | Precision | 96.38 / 98.49 / 2.19 | 94.12 / 97.54 / 3.64 | 99.83 / 99.83 / 0.00 | 99.08 / 100.00 / 0.93 |
| DenseNet201 | Recall | 96.38 / 98.49 / 2.19 | 93.99 / 97.52 / 3.75 | 99.83 / 99.83 / 0.00 | 99.04 / 100.00 / 0.97 |
| DenseNet201 | F1 score | 96.37 / 98.49 / 2.20 | 94.01 / 97.51 / 3.72 | 99.83 / 99.83 / 0.00 | 99.04 / 100.00 / 0.97 |
Hu et al. (2020) proposed a graph-based deep learning approach to address these issues
and achieved 98.1% overall accuracy. They used DenseNet202 as the backbone of their
Graph Convolutional Network (GCN). According to the results reported in their research,
the model obtained accuracies of 96.9%, 95.10% and 98.55% in classifying the chinee apple,
snake weed and parkinsonia species, which is better than the previous results. The model
also classified 98.35% of native plants correctly.
Our proposed technique (98.49%) outperformed the graph-based model (98.1%). The
DenseNet201 model achieved 97% and 99% accuracy for chinee apple and parkinsonia
weed, respectively. Although the performance did not improve for the snake weed species,
the model yielded 100% accuracy in classifying native plants. This helps ensure that
off-target plants are not killed, reducing herbicide waste and protecting the native
ecosystem (Olsen et al., 2019).

Figure 5.5: Confusion matrix for DeepWeeds dataset using DenseNet201 model.
D. Chen et al. (2022b) reported that the classification accuracy of the minority classes
was relatively low in the Cotton weed dataset. Weed species such as prickly sida, ragweed,
crabgrass, swinecress and spurred anoda had fewer samples. This class imbalance affected
the classification accuracy. They replaced the cross-entropy loss with a weighted
cross-entropy loss to improve the performance of the model.
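As an illustration of that idea, the minimal numpy sketch below shows how a weighted cross-entropy loss scales each sample's loss by the weight of its true class; the class counts and the inverse-frequency weighting are illustrative assumptions, not the exact values used by D. Chen et al. (2022b).

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Mean cross-entropy in which each sample is scaled by the weight of its
    true class, so mistakes on minority classes are penalised more heavily."""
    probs = np.clip(probs, 1e-12, 1.0)
    sample_weights = class_weights[labels]
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-sample_weights * np.log(true_class_probs)))

# Weights inversely proportional to (illustrative) class counts.
counts = np.array([1115, 693, 95, 61], dtype=float)
class_weights = counts.sum() / (len(counts) * counts)

probs = np.array([[0.70, 0.10, 0.10, 0.10],   # predicted class distributions
                  [0.10, 0.10, 0.20, 0.60]])
labels = np.array([0, 3])                     # true classes (3 = a minority class)
print(weighted_cross_entropy(probs, labels, class_weights))
```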
Our proposed technique also achieved better results in recognising the minority weed
species (Table 5.4). InceptionResNetV2, Xception, DenseNet169 and DenseNet201
classified the prickly sida weed species with 100% accuracy, whereas the highest F1 score
was 92% using the traditional approach. Although the spurred anoda weed species had
only 61 sample images, the classification accuracy of the proposed technique improved
significantly. The accuracy for the other minority classes, including ragweed, crabgrass
and swinecress, also improved.
Table 5.4: F1 score (%) of the deep learning models for each weed species from the Cotton weed dataset using the traditional and proposed techniques.

| Weed species | Approach | MobileNetV2 | VGG16 | VGG19 | ResNet-50 | Inception-V3 | InceptionResNetV2 | Xception | DenseNet121 | DenseNet169 | DenseNet201 |
| Carpetweeds | Traditional | 94.90 | 89.68 | 90.43 | 94.87 | 94.43 | 97.42 | 97.07 | 98.03 | 98.37 | 98.68 |
| Carpetweeds | Proposed | 98.39 | 98.05 | 97.40 | 97.73 | 98.38 | 99.67 | 98.39 | 99.35 | 98.70 | 98.69 |
| Crabgrass | Traditional | 88.89 | 76.19 | 63.41 | 88.37 | 86.36 | 87.80 | 88.89 | 91.30 | 93.02 | 90.91 |
| Crabgrass | Proposed | 97.78 | 97.87 | 100.00 | 100.00 | 100.00 | 100.00 | 97.78 | 100.00 | 100.00 | 100.00 |
| Eclipta | Traditional | 74.23 | 52.63 | 42.22 | 72.90 | 81.42 | 71.84 | 84.91 | 88.68 | 84.68 | 92.45 |
| Eclipta | Proposed | 92.45 | 98.08 | 96.15 | 92.45 | 97.14 | 98.08 | 98.08 | 96.15 | 98.11 | 99.05 |
| Goosegrass | Traditional | 82.50 | 64.00 | 62.65 | 78.95 | 80.49 | 87.50 | 87.18 | 85.37 | 87.06 | 89.16 |
| Goosegrass | Proposed | 96.63 | 96.47 | 91.36 | 97.73 | 97.67 | 97.73 | 94.25 | 100.00 | 97.67 | 96.63 |
| Morningglory | Traditional | 91.87 | 89.91 | 90.02 | 93.36 | 95.13 | 96.43 | 95.18 | 98.19 | 99.10 | 97.33 |
| Morningglory | Proposed | 96.40 | 98.00 | 96.07 | 96.16 | 97.30 | 99.33 | 98.87 | 98.88 | 97.29 | 97.30 |
| Nutsedge | Traditional | 92.44 | 89.26 | 87.18 | 88.19 | 94.12 | 94.83 | 94.83 | 94.12 | 96.55 | 94.92 |
| Nutsedge | Proposed | 98.18 | 99.10 | 96.43 | 93.46 | 97.25 | 99.10 | 95.33 | 99.10 | 97.25 | 92.45 |
| Palmer Amaranth | Traditional | 88.28 | 81.33 | 83.87 | 90.91 | 89.12 | 91.99 | 92.58 | 96.45 | 97.84 | 96.77 |
| Palmer Amaranth | Proposed | 94.62 | 96.40 | 96.03 | 96.45 | 95.71 | 98.21 | 97.86 | 97.86 | 96.77 | 97.14 |
| Prickly Sida | Traditional | 53.66 | 52.83 | 30.43 | 75.56 | 70.83 | 73.91 | 66.67 | 88.00 | 87.50 | 92.00 |
| Prickly Sida | Proposed | 96.15 | 96.15 | 96.30 | 96.30 | 96.30 | 100.00 | 100.00 | 98.11 | 100.00 | 100.00 |
| Purslane | Traditional | 86.86 | 78.82 | 76.36 | 90.61 | 84.39 | 93.26 | 89.14 | 94.51 | 94.44 | 96.67 |
| Purslane | Proposed | 93.99 | 98.32 | 97.18 | 94.51 | 96.70 | 99.45 | 96.13 | 97.21 | 97.24 | 97.24 |
| Ragweed | Traditional | 83.64 | 86.27 | 79.17 | 90.57 | 91.23 | 92.86 | 96.43 | 96.30 | 96.43 | 98.18 |
| Ragweed | Proposed | 100.00 | 98.18 | 100.00 | 98.18 | 98.18 | 96.30 | 96.43 | 98.18 | 96.43 | 98.18 |
| Sicklepod | Traditional | 90.11 | 71.29 | 71.43 | 87.64 | 84.00 | 91.84 | 89.58 | 94.85 | 96.91 | 93.88 |
| Sicklepod | Proposed | 97.92 | 96.84 | 95.92 | 96.91 | 96.84 | 97.92 | 96.97 | 95.74 | 95.83 | 95.92 |
| Spotted Spurge | Traditional | 90.32 | 76.40 | 65.31 | 88.89 | 86.96 | 91.11 | 88.17 | 96.84 | 92.47 | 95.83 |
| Spotted Spurge | Proposed | 96.84 | 96.84 | 98.97 | 95.74 | 96.84 | 98.97 | 97.92 | 98.97 | 100.00 | 97.92 |
| Spurred Anoda | Traditional | 23.53 | 57.14 | 23.53 | 72.00 | 78.26 | 80.00 | 72.73 | 84.62 | 92.31 | 88.00 |
| Spurred Anoda | Proposed | 90.91 | 96.00 | 90.91 | 90.91 | 95.65 | 95.65 | 100.00 | 100.00 | 100.00 | 100.00 |
| Swinecress | Traditional | 89.66 | 75.00 | 86.67 | 92.86 | 82.76 | 89.66 | 92.86 | 96.77 | 96.55 | 90.32 |
| Swinecress | Proposed | 100.00 | 96.77 | 93.33 | 96.77 | 96.77 | 100.00 | 100.00 | 96.77 | 96.77 | 96.77 |
| Waterhemp | Traditional | 83.42 | 71.87 | 71.92 | 88.17 | 83.15 | 84.97 | 90.43 | 94.57 | 91.89 | 95.03 |
| Waterhemp | Proposed | 96.70 | 97.83 | 98.89 | 97.80 | 97.83 | 99.45 | 98.34 | 97.27 | 97.80 | 98.36 |
We compared the classification accuracy of our proposed approach with the best results
of the traditional approach and the best results of previous studies on the respective
datasets (Table 5.5). Our proposed approach showed a significant improvement over the
traditional approach on both the DeepWeeds and Cotton weed datasets. For some weed
species, the classification accuracy improved by more than 10%. Although the gains were
smaller, better accuracy was also observed using the proposed technique on the other two
datasets. We also report the best results from previous studies; different studies achieved
their best results on different datasets. Our proposed approach (with DenseNet201)
outperformed all the prior techniques. Hu et al. (2020) achieved the highest accuracy
of 98.10% on the DeepWeeds dataset using a GCN, whereas the accuracy of our proposed
technique was 98.49%. Moreover, the DenseNet201 model classified most of the weed
species more accurately. Similar outcomes were observed for the other datasets using
the proposed approach. D. Chen et al. (2022b), H. Jiang et al. (2020) and Espejo-Garcia
et al. (2020) reported the highest accuracies of 98.40%, 97.80% and 99.29% on the Cotton
weed, Corn weed and Cotton Tomato weed datasets, respectively. Our patch-based
technique outperformed these previous approaches with respective average results of
98.46%, 99.83% and 100%.
5.4 Discussion

The primary objective of this research was to improve the classification accuracy of
crop and weed species using deep learning techniques. The results indicate that, using
the proposed patch-based approach, deep learning models can achieve better classification
accuracy despite challenges such as a limited number of images in the dataset, inter-class
similarity, intra-class dissimilarity or class imbalance.
Here, we discuss the performance of our proposed pipeline in comparison with the
traditional approach.
In the traditional approach, the results indicate that the deep models cannot handle
the issues inherent in the datasets. For the DeepWeeds dataset, the DenseNet169 and
DenseNet201 models achieved the highest accuracy, but the predictions were affected by
inter-class similarity and intra-class dissimilarity problems. Figure 5.6 shows a confusion
matrix of the DenseNet201 model on the DeepWeeds dataset. It can be seen that the
misclassification rates for chinee apple and snake weed are very high. Since the morphology
of these two weed species is very similar, it becomes challenging for the model to
distinguish them based on the features of only part of an image. Besides, many weed
images were classified as non-weed, and some native plants were recognised as prickly
acacia or snake weed.
On the other hand, in a patch-based approach, the model can compare several parts of
an image. The deep learning model may classify some of the patches incorrectly, but owing
to the weighted majority voting technique, in most cases it can still identify the correct
label for the image. Table 5.3 shows that the classification accuracy improved significantly
using our proposed patch-based approach.
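A minimal sketch of the patch-level aggregation idea is given below. It assumes each patch vote is weighted by the patch's softmax confidence, which is an illustrative choice rather than the exact weighting used in our pipeline.

```python
import numpy as np

def weighted_majority_vote(patch_probs):
    """Aggregate patch-level softmax outputs into one image-level label.

    patch_probs: array of shape (n_patches, n_classes) with each patch's class
    probabilities. Each patch votes for its top class, and the vote is weighted
    by the confidence of that prediction (an assumed weighting scheme)."""
    patch_probs = np.asarray(patch_probs)
    votes = np.zeros(patch_probs.shape[1])
    for probs in patch_probs:
        label = int(np.argmax(probs))
        votes[label] += probs[label]      # confidence-weighted vote
    return int(np.argmax(votes))

# Example: three patches, two classes; the confident class-1 patches win.
print(weighted_majority_vote([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]))
```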
Moreover, the models could not achieve the desired results when applied to the imbalanced
Cotton weed dataset. Figure 5.7 shows how the classification accuracy is affected by the
class imbalance issue. The spurred anoda weed had the lowest and the morningglory weed
the highest number of images in the dataset. The recognition accuracy for the classes
with fewer samples was relatively low.
Figure 5.7: Illustration of the relationship between the number of images in a class and
the classification accuracy of the model.
The performance of the models on the Corn weed dataset was better since it is a balanced
dataset with an adequate number of images in each class. However, there is still room
for improvement. On the other hand, the classification accuracy of some of the models
on the Cotton Tomato weed dataset was relatively low because the dataset does not have
enough images to train the models.
We have seen that the models' performance improved using the patch-based approach
and by resizing the images to a resolution of 1024 × 1024. Moreover, the whole image
does not necessarily contribute to predicting a class label. A deep learning model generally
focuses on certain regions of an image to make a prediction (Selvaraju et al., 2017).
In the traditional approach, a model predicts the class label based on a specific region;
if that region is not representative of the correct class, the prediction will be wrong.
In a patch-based approach, the model can look into several regions of the image, increasing
the probability of a correct prediction. It is also noticeable that the model can classify
images more accurately using overlapping patches.
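The sketch below illustrates how disjoint and overlapping patches can be cut from a resized image; the patch size and stride shown are illustrative values, not the exact settings of our experiments.

```python
import numpy as np

def extract_patches(image, patch_size=256, stride=128):
    """Cut an image of shape (H, W, C) into square patches.

    stride < patch_size gives overlapping patches; stride == patch_size gives
    disjoint ones. Sizes are illustrative."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches

image = np.zeros((1024, 1024, 3), dtype=np.uint8)
print(len(extract_patches(image)))              # overlapping patches: 49
print(len(extract_patches(image, stride=256)))  # disjoint patches: 16
```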
Although the performance of the models improved using the patch-based approach with
higher-resolution images for three datasets, this was not the case for the Cotton weed
dataset. The images in this dataset were of various resolutions and captured by different
camera types. The image quality was distorted by resizing a low-resolution image to
a higher resolution, which affected the performance of the models (Dodge & Karam, 2016).
In this research, we enhanced the images using a GAN-based approach to address that
issue and then resized them to the desired size to maintain image quality.
The Cotton Tomato weed dataset had relatively few images. Espejo-Garcia et al. (2020)
reported the highest classification accuracy of 99.29% using a DenseNet model with
a Support Vector Machine as the classifier. The proposed approach achieved 100%
recognition accuracy on that dataset using eight out of ten deep learning models. The
remaining two models, i.e., MobileNetV2 and VGG19, obtained more than 99% accuracy.
On the other hand, we tested our technique on the Corn weed dataset, which had a
sufficient and equal number of images for each class. Our approach successfully classified
all the images except one using DenseNet201 and DenseNet121. One of the Chenopodium
album weed images was recognised as bluegrass (Figure 5.8). Although the image is
labelled as Chenopodium album, it contains more bluegrass plants, so when the image
was divided into patches, more patches were predicted as bluegrass.
The results on the DeepWeeds dataset show that the proposed technique can handle
inter-class similarity and intra-class dissimilarity problems more efficiently. The
DenseNet201 model achieved higher accuracy in distinguishing between chinee apple and
snake weed. Only one image of parkinsonia was classified as prickly acacia, which is a
significant improvement. Besides, very few native plants were misclassified (Figure 5.5).
In this part, we compare the results of our proposed pipeline with those of the related
studies. We have taken an image of a chinee apple weed to explain how our proposed
technique achieved better accuracy than the traditional approach (Figure 5.9a). The
image was classified as snake weed by the DenseNet201 model in the traditional approach.
We obtained the output of the final convolutional layer (Figure 5.9b) by applying
Gradient-weighted Class Activation Mapping (Grad-CAM) (Selvaraju et al., 2017). The
colour scale used for Grad-CAM heatmaps typically ranges from cool to hot colours, where
cool colours indicate low importance and hot colours signify high importance. In this
case, red regions are highly important, blue areas are less important, and yellow parts
make a medium contribution to predicting the image's class. In Figure 5.9b, the red
regions were emphasised in predicting the class of the image, and the model found that
part to be similar to snake weed.
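For reference, a minimal Keras-style Grad-CAM sketch is shown below; the trained model, its preprocessing and the name of the last convolutional layer are assumptions that depend on the backbone used.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name):
    """Return a Grad-CAM heatmap (values in [0, 1]) for one preprocessed image
    of shape (H, W, 3); `last_conv_layer_name` is backbone-specific."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_index = int(tf.argmax(preds[0]))        # predicted class
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)       # d(score) / d(feature map)
    pooled = tf.reduce_mean(grads, axis=(0, 1, 2))     # importance per channel
    heatmap = tf.reduce_sum(conv_out[0] * pooled, axis=-1)
    heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)
    return heatmap.numpy()   # overlaid with a colour map: red = high, blue = low
```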
In our proposed approach, 22 overlapping patches were extracted from that image.
Figure 5.10 shows the Grad-CAM for all the patches. Now the model has more options
for deciding the class of the image. In this image, eleven patches were classified as chinee
apple and eight as snake weed. That indicates that a large part of the image was similar
to snake weed.
Figure 5.9: (a) A chinee apple image classified as snake weed using the traditional approach; (b) Gradient-weighted Class Activation Mapping (Grad-CAM) of the image.
However, the model successfully identified it as a chinee apple. This also explains how
our approach can achieve better accuracy in classifying datasets with inter-class similarity
or intra-class dissimilarity. Moreover, a cursory look may suggest that, in some cases, the
patches at the periphery exhibited greater significance than the centre patches. A closer
investigation reveals that the cause is rather the availability of visible features. As shown
in Figure 5.9a, the weed leaves are spread throughout the whole image, but there are
several dark parts where no leaves are visible. When the images were divided into patches
and the relatively more important patches were extracted, the classification model focused
on the visible parts of the patches. The weed leaves in the centre of some patches are
either unclear (not visible) or insignificant for determining the class label. As a result,
for those patches, the parts at the periphery exhibited greater significance.
Our proposed technique showed improved results in classifying the minority weed classes.
D. Chen et al. (2022b) reported that the highest accuracies for spurred anoda, swinecress,
crabgrass and prickly sida were 92%, 96.36%, 98.82% and 99%, respectively, obtained by
introducing a weighted cross-entropy loss.
Figure 5.10: Grad-CAM of the patches extracted from the image in Figure 5.9a. Patch-level predictions: (a) snake weed, (b) negative, (c) chinee apple, (d) chinee apple, (e) snake weed, (f) snake weed, (g) chinee apple, (h) negative, (i) chinee apple, (j) chinee apple, (k) chinee apple, (l) chinee apple, (m) chinee apple, (n) chinee apple, (o) snake weed, (p) negative, (q) snake weed, (r) chinee apple, (s) snake weed, (t) snake weed.
However, our models showed better performance in recognising those weed classes. Since
the InceptionResNetV2 model achieved the highest accuracy on the Cotton weed dataset,
we compared the results with that model. The model recognised all images of crabgrass,
prickly sida and swinecress weeds. Besides, 95.63% of the spurred anoda weed images
were identified correctly, which is a significant improvement. Since the model can be
trained on more data owing to the multiple patches extracted from each image, it can
learn to classify the images of the minority classes more accurately.
Besides, the model can predict the class label of an image using the weighted majority
voting technique.
Our method has some limitations. For example, the method was tested only on publicly
available datasets; the proposed pipeline has not been tested in field trials, which we
aim to do in a future growing season. Technically, the patch selection process takes
additional computational time. However, this is compensated by the fact that only the
selected patches (i.e., a subset of image patches) are used in training and testing. Even
so, our proposed pipeline achieved superior recognition accuracy. To mitigate the patch
selection time cost, in future work we will investigate integrated patch selection and
learning, e.g., using mutual information between patches, which will require fewer patches
for training and at the same time improve the model's learning. The proposed approach
can also be evaluated in other contexts beyond weed recognition, e.g., on large image
datasets with noise and perturbations.
5.5 Conclusion
In terms of usability, we would like to highlight that artificial intelligence and deep
learning algorithms have been widely investigated in weed recognition technologies. Our
proposed patch-based pipeline can be seamlessly integrated with existing algorithms for
higher accuracy. In terms of utility, it offers higher accuracy in weed detection,
localisation and recognition, which can be applied in developing targeted weed management
and control strategies that minimise costs and environmental impact and improve yields.
Chapter 6
Generalised approach for weed recognition
Automatic weed detection and classification can significantly reduce weed management
costs and improve crop yields and quality. Weed detection in crops from imagery is
inherently challenging because both weeds and crops are of similar colour (green on green)
and their growth and texture are somewhat similar; weeds also vary based on crops,
geographical locations, seasons and even weather patterns.
a novel approach utilising object detection and meta-learning techniques for generalised
weed detection, transcending the limitations of varying field contexts. Instead of classi-
fying weeds by species, we classified them based on their morphological families aligned
with farming practices. An object detector, e.g., a YOLO model is employed for plant
detection, while a Siamese network, leveraging state-of-the-art deep learning models as
its backbone, is used for weed classification. We repurposed and used three publicly
available datasets, namely Weed25, Cotton weed and Corn weed. Each dataset contained
multiple species of weeds, which we grouped into three classes based on weed morphology.
VGG16 model as the feature extractor for the Siamese network. Moreover, the models
were trained on one dataset (Weed25) and applied to other datasets (Cotton weed and
Corn weed) without further training. We also observed that the classification accuracy
of the Siamese network was improved by using the cosine similarity function for
calculating the contrastive loss. The YOLOv7 model obtained an mAP of 91.03% on the
Weed25 dataset, which was used for training the model. The mAPs for the unseen datasets
were 84.65% and 81.16%. As mentioned earlier, the classification accuracies with the
best combination were 97.59%, 93.67% and 93.35% for the Weed25, Cotton weed and
Corn weed datasets, respectively. We also compared the classification performance of
our proposed technique with the state-of-the-art Convolutional Neural Network models.
The proposed approach advances weed classification accuracy and presents a viable
solution for dataset-independent, i.e., site-independent, weed detection, fostering
sustainable agricultural practices.
6.1 Introduction
2023a; Haque & Sohel, 2022; Y. He et al., 2019b; Kuzuhara et al., 2020; W. Li et al.,
2021), plant frost and stress monitoring (Khotimah et al., 2023; Shammi et al., 2023;
A. Singh et al., 2021), automated harvesting and sorting (Altaheri et al., 2019; Haggag
et al., 2019; Nasiri et al., 2019), decision support systems for farmers (Kukar et al., 2019;
Zhai et al., 2020) and many more.
When applied to weed detection, this technology analyses images captured in agricul-
tural fields, differentiating between crops and unwanted plants with remarkable accuracy.
The implications of DL in weed detection extend far beyond mere identification. This
technology offers precise and targeted interventions. By pinpointing areas infested with
weeds, farmers can implement specific, localised treatments, optimising the use of her-
bicides, reducing chemical inputs, and minimising environmental impact (Hasan et al.,
2021; Jin et al., 2021; Rai et al., 2023; Razfar et al., 2022; Yu et al., 2019b).
Several studies proposed Convolutional Neural Network (CNN)-based approaches for
weed detection and classification in precision agriculture (Hasan et al., 2021, 2023a; Rai
et al., 2023; A. Sharma et al., 2020). Initial studies focused on applying DL models for
classifying weed and crop images (Asad & Bais, 2020; Bosilj et al., 2020; Partel et al.,
2020; Ramirez et al., 2020; W. Zhang et al., 2018). Classification of weed species can
be more advantageous when applying specific management strategies. Many researchers
proposed DL models to recognise weed species as well (Espejo-Garcia et al., 2020; Hasan
et al., 2023b; Hu et al., 2020; Olsen et al., 2019; Sunil et al., 2022; Trong et al., 2020).
On the other hand, weed classification approaches do not localise the instances of
weeds in the image, which is essential for a real-time selective spraying system. More-
over, the classification will be inappropriate if an image contains multiple weed instances.
Several studies applied DL-based object detection methods such as Region-based Con-
volutional Neural Networks (R-CNN), You only look once (YOLO), and Single Shot De-
tector (SSD) to address the issue (Czymmek et al., 2019; Dang et al., 2023; Espinoza
et al., 2020; Gao et al., 2020; Y. Jiang et al., 2019; Le et al., 2021; Osorio et al., 2020;
Partel et al., 2019a; Patidar et al., 2020; Quan et al., 2019; Sharpe et al., 2020; Sivakumar
et al., 2020; W. Zhang et al., 2018).
The problem with most weed detection and classification approaches is that they are
very much data dependent, i.e., site-specific and depend on the crop, geographic location
and weather. One weed detection setting may not apply to others, even if weeds grow
in the same crop. The DL models need to be retrained with a part of the dataset to
be analysed. In this study, we propose a novel approach that is not dependent on the
crop. This approach allows for dataset-independent weed classification, accommodating
variations in environmental conditions and diverse weed species since we classified them
based on their morphology.
Weeds can be classified as broadleaf weeds, grass and sedge according to their morpho-
logical characteristics (Monaco et al., 2002). Broadleaf weeds typically have wider leaves
with a net-like vein structure (Mithila et al., 2011). Grass weeds resemble grasses and
usually have long, narrow leaves with parallel veins (Moore & Nelson, 2017). Sedges have
triangular stems and grass-like leaves but differ from grasses by having solid, three-sided
stems (Shi et al., 2021). An effective weed management plan that targets the specific
weeds in the field can be developed by classifying them according to their morphology
(Scavo & Mauromicale, 2020; Westwood et al., 2018). No herbicide or management tech-
nique can effectively control all types of weeds (Chauhan, 2020; Scavo & Mauromicale,
2020). An efficient weed classification according to their morphology can ensure a specific
management approach irrespective of crop or geographic location.
Meta-learning in deep learning refers to the process where a model learns how to learn
(Huisman et al., 2021). Instead of just learning to perform a specific task, a meta-learning
model learns the learning process. This means it gains the ability to adapt quickly to
new tasks or domains with minimal training data by leveraging knowledge gained from
previous tasks (Finn et al., 2017). The approach can be applied to weed classification
based on morphology. It involves leveraging meta-learning techniques to categorise weeds
based on visual characteristics such as leaf shape, colour, size, and other morphological
features. The method can facilitate understanding semantic similarities among different
weeds based on their morphology. The model learns to recognise and group weeds with
similar visual characteristics together, even if they belong to different species.
Figure 6.1: Example weed images from the Weed25 dataset. Here, we show different
species of weed with similar morphology. Figures (a), (b), (c), (d) and (e) are broadleaf
weeds, which have different leaf shapes, colours and textures. Figures (f), (g) and (h) are
grasses, and Figure (i) is a sedge weed. Grass and sedge weeds have quite similar
structures.
in Figure 6.1a. Moreover, several weed species were grouped into one class according to
morphology. Figures 6.1b, 6.1c, 6.1d and 6.1e show example images from the Weed25
dataset where weeds from different species are considered broadleaf weeds in our study.
This imposes additional challenges for the DL classifier since weed species in the same
group have considerable dissimilarity in colour, texture and shape. Moreover, grass and
sedge weeds have quite similar morphology, making distinguishing them difficult
(Figures 6.1f, 6.1g, 6.1h and 6.1i). The main contributions of this study are (1) to propose a meta-learning
The proposed approach is divided into two stages. First, we use a deep learning model to
detect weed plants. We used an object detection technique with an annotated dataset for
plant detection, labelling all annotated plants in the dataset as “Plant” irrespective of
their species. After training, we used the trained model to detect plants in another,
unseen dataset. The second stage classifies the detected plants according to their
morphology. We trained a Siamese network using the same dataset used for plant
detection, with state-of-the-art deep learning models as feature extractors. The model was
trained to predict the similarity score between plants. The dataset used here contains
twenty-five species of weeds, and the model learns to predict the similarity score using
these 25 classes. Multiple weed species in the dataset belong to each morphological
category (broadleaf, grass and sedge). We used the trained model to find the similarity
scores for images of unseen datasets. The similarity score was calculated for the three
classes, and the images were classified accordingly.
Figure 6.2 shows the proposed pipeline for recognising weeds in crops. First, a YOLO
model was trained using a dataset, which we call “A” for clarity. Dataset “A” contains
several weed species. For training the YOLO model, we labelled all weed species in
dataset “A” as plants, since the goal was to detect plants only, irrespective of their
species. Then, we used pairwise similarity learning based on the Siamese network and
trained the model with dataset “A”. After that, we took an unseen dataset (named “B”)
and used the trained YOLO model to extract the plants from its images. A support set
was prepared by selecting ten images randomly from dataset “A”. The images in dataset
“B” were considered the query set. For every image in the query set, we extracted a
plant image using the YOLO model. Then, the trained Siamese network predicted the
plant image's similarity score with respect to the support set. Dataset “A” had
many classes, and we calculated the similarity score with every class in the dataset. We
grouped the classes of the support set into three types: broadleaf, grass and sedge. Each
group contained more than one class. Therefore, we calculated the average similarity
score of the classes in a group. The group with the highest similarity score was considered
the class of the query plant image.
Figure 6.2: The proposed pipeline for classifying weeds. Here, we have shown how the models were trained and how the trained models were used on unseen datasets. (Pipeline: train an object detection model to detect plants; prepare a dataset from the detected weed plants to train a Siamese network; group weed species according to weed morphology; for each image, get a similarity score with each group; classify the weed based on the similarity score.)
6.2.1 Dataset
In this research, we used three public datasets: Weed25 (P. Wang et al., 2022), Cotton
weed (Dang et al., 2023) and Corn weed (H. Jiang et al., 2020). The Weed25 dataset has
14,023 images in 25 categories of weeds, annotated with bounding boxes. In our
experiments, we used this dataset to train both the YOLO models and the Siamese
network. The Cotton weed and Corn weed datasets were used to evaluate the performance
of the model. The Cotton weed dataset has twelve weed species, whereas the Corn weed
dataset has four types of weeds. The bounding box annotations of the Cotton weed
dataset are available at https://zenodo.org/records/7535814. The Corn weed dataset was
annotated by Hasan et al. (2024) using
bounding boxes. We removed the images of corn plants from that dataset for our study
since those are crops. Moreover, all class labels in the datasets were relabelled as “Plant”
for the YOLO models. Table 6.1 provides an overview of the datasets.
In Table 6.1, we have grouped the weed classes into three categories since our objective
is to classify weeds into broadleaf, grass and sedge. It is worth mentioning that, according
to morphology, most weed species belong to the broadleaf category.
There are several object detection techniques available in computer vision, such as YOLO
(You Only Look Once), SSD (Single Shot MultiBox Detector) and R-CNN (Region-based
Convolutional Neural Networks) (A. Kumar et al., 2020). Among them, YOLO is known
for its speed and efficiency, making it suitable for real-time object detection applications
(Du, 2018). Moreover, it performs well across different types of objects and scenes,
making it versatile and suitable for various applications (Diwan et al., 2023). In this
study, we chose two of the latest iterations of the YOLO family, YOLOv7 (C.-Y. Wang et
al., 2023b) and YOLOv8 (Jocher et al., 2023a). The models were trained with the Weed25
dataset, with all classes labelled as “Plant”. After training, we used the models to obtain
the bounding box coordinates of the “Plant” objects.
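As a sketch of this step, the snippet below trains a single-class “Plant” detector with the Ultralytics YOLOv8 API (the YOLOv7 repository provides its own training scripts); the dataset YAML and image paths are assumptions for illustration.

```python
from ultralytics import YOLO

# Train a one-class "Plant" detector on Weed25, with every weed species
# relabelled as "Plant" in the dataset YAML (path and settings are assumed).
model = YOLO("yolov8x.pt")
model.train(data="weed25_plant.yaml", epochs=100, imgsz=640)

# Detect plants in an image from an unseen dataset and read the boxes.
results = model("cotton_weed_sample.jpg")
for box in results[0].boxes.xyxy:   # (x1, y1, x2, y2) for each detected plant
    print(box.tolist())
```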
We divided the images of the Weed25 dataset into 25 classes based on the object-level
annotations and cropped the images using the bounding box coordinates. We applied the
following steps to prepare the dataset for training the few-shot model:
• Resize images: Since the images in the dataset were of different sizes, we resized
them to a consistent size (224 × 224 pixels).
• Create subsets of the dataset: The Weed25 dataset contains 14,023 images with
one or more plants from 25 classes. Training the few-shot model with the entire
dataset requires substantial computational resources. Therefore, we prepared 15
subsets of the dataset for episodic training, each containing 50 images from every
class.
• Split the dataset: We divided each subset into training (80%), validation (10%)
and test (10%) sets.
Figure 6.3: The training process of the Siamese network using the Weed25 dataset. (Labelled Weed25 images are combined into image pairs; each image in a pair is passed through a shared feature extractor, the similarity score of the two feature vectors is calculated, and the loss is computed.)
• Create image pairs: Every image in the dataset is paired with every other image.
A pair is labelled positive (1) if both images are from the same class and negative (0)
otherwise. We created training, test and validation pairs (a minimal sketch of this
pairing step is given after this list).
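A minimal sketch of the pairing step follows; the file names and integer labels below are placeholders standing in for Weed25 images and species identifiers.

```python
import itertools
import random

def make_pairs(paths, labels, max_pairs=None):
    """Pair every image with every other image; label a pair 1 if the two
    images share a class and 0 otherwise. Optionally subsample, since the
    number of pairs grows quadratically with the dataset size."""
    pairs = [(p1, p2, 1 if l1 == l2 else 0)
             for (p1, l1), (p2, l2) in itertools.combinations(zip(paths, labels), 2)]
    if max_pairs is not None and len(pairs) > max_pairs:
        pairs = random.sample(pairs, max_pairs)
    return pairs

print(make_pairs(["a.jpg", "b.jpg", "c.jpg"], [0, 0, 2]))
# [('a.jpg', 'b.jpg', 1), ('a.jpg', 'c.jpg', 0), ('b.jpg', 'c.jpg', 0)]
```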
This study used a Siamese neural network containing two identical sub-networks that
share the same configuration, parameters and weights. Each sub-network takes an input
image and extracts features by passing it through several convolutional, pooling and fully
connected layers. We used three state-of-the-art deep learning models as feature
extractors, namely VGG16 (Simonyan & Zisserman, 2014), ResNet50 (K. He et al., 2016)
and InceptionV3 (Szegedy et al., 2016), and compared their performance. The outputs of
the feature extractors are feature vectors, which are then compared using a distance
metric. We evaluated the efficiency of two well-known metrics, namely negative Euclidean
distance (Melekhov et al., 2016) and cosine similarity (Chicco, 2021). The role of the
distance metric is to measure the similarity between two feature vectors, where a higher
value indicates similarity and a lower value indicates dissimilarity. We used the
contrastive loss function in our Siamese network to optimise the model's parameters. Here,
the role of the contrastive loss is to discriminate the features of the input images using
either the negative Euclidean distance or the cosine similarity function. Figure 6.3 shows
the training process of the Siamese network, which we trained on the Weed25 dataset.
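A minimal Keras sketch of this architecture is given below; the projection head, embedding size and margin are assumptions, since the exact hyper-parameters are not restated here.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def build_embedder(input_shape=(224, 224, 3), embed_dim=128):
    """Shared feature extractor: a VGG16 backbone followed by a small
    projection head (head and embedding size are illustrative)."""
    backbone = VGG16(include_top=False, weights="imagenet",
                     input_shape=input_shape, pooling="avg")
    x = layers.Dense(embed_dim)(backbone.output)
    return Model(backbone.input, x, name="embedder")

embedder = build_embedder()
img_a = layers.Input(shape=(224, 224, 3))
img_b = layers.Input(shape=(224, 224, 3))
feat_a, feat_b = embedder(img_a), embedder(img_b)

# Cosine similarity in [-1, 1]; higher means more similar.
similarity = layers.Dot(axes=1, normalize=True)([feat_a, feat_b])
siamese = Model([img_a, img_b], similarity)

def contrastive_loss(y_true, sim, margin=1.0):
    """Contrastive loss on a similarity score: pull matching pairs towards
    similarity 1 and push non-matching pairs beyond the margin."""
    y_true = tf.cast(y_true, sim.dtype)
    dist = 1.0 - sim                       # turn similarity into a distance
    return tf.reduce_mean(y_true * tf.square(dist) +
                          (1.0 - y_true) * tf.square(tf.maximum(margin - dist, 0.0)))

siamese.compile(optimizer="adam", loss=contrastive_loss)
```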
After training the Siamese network, we evaluated the model's performance using two
unseen datasets.
Figure 6.4: Classifying a query image from an unseen dataset. (The feature extractor embeds the support images from the Weed25 dataset and the query image; the similarity score of the query is calculated with all support images of the 25 classes, and the mean similarity score is computed for the broadleaf, grass and sedge groups.)
In the evaluation phase, the Weed25 dataset was used as the support set, and the unseen
datasets' images were used as query images. The model extracted the features of the
support and query images, compared the feature vectors and predicted the similarity
score. Since our goal was to classify the images into three categories, the mean similarity
score was calculated for each group of classes, and the model predicted the class label
based on the mean similarity score. The process is shown in Figure 6.4.
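The grouping-and-voting step can be sketched as follows; the class-to-group mapping and the scores shown are illustrative.

```python
import numpy as np

def classify_by_group(similarity_per_class, class_to_group):
    """similarity_per_class: mean similarity of the query plant to the support
    images of each Weed25 class. The query is assigned to the morphology group
    (broadleaf, grass or sedge) with the highest average score."""
    group_scores = {}
    for cls, score in similarity_per_class.items():
        group_scores.setdefault(class_to_group[cls], []).append(score)
    group_means = {g: float(np.mean(s)) for g, s in group_scores.items()}
    return max(group_means, key=group_means.get), group_means

# Illustrative mapping and scores for a handful of classes.
class_to_group = {"purslane": "broadleaf", "ragweed": "broadleaf",
                  "crabgrass": "grass", "goosegrass": "grass",
                  "nutsedge": "sedge"}
scores = {"purslane": 0.81, "ragweed": 0.77, "crabgrass": 0.55,
          "goosegrass": 0.58, "nutsedge": 0.42}
print(classify_by_group(scores, class_to_group))   # ('broadleaf', {...})
```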
6.3 Results and Discussion

Here, we trained the YOLOv7 and YOLOv8 models with the Weed25 dataset. We
used 80% of the data for training and the rest for testing the models. The models were
not trained on the Cotton weed and Corn weed datasets. The quantitative results are
presented in Table 6.2. We showed the models’ performance on the test data and the
entire Weed25 dataset. The performance of YOLO models for detecting bounding box
coordinates on “Cotton weed” and “Corn weed” datasets are also presented in Table 6.2.
The mean Average Precision (mAP) of the YOLOv7 and YOLOv8 models were 92.37%
and 91.19% on the test set of the Weed25 dataset. The YOLOv7 model achieved a higher
mAP of 91.03% than the YOLOv8 model, which was 89.43% when applied to the entire
Weed25 dataset. We then used the trained model to detect plant objects from the “Cotton
weed” and “Corn weed” datasets. The YOLOv7 model obtained the mAP of 84.65% and
Table 6.2: Performance of YOLO models in detecting plants. The models were trained on the Weed25 dataset only.

| Dataset | Ground truth plants | Detected plants (YOLOv7) | Detected plants (YOLOv8) | mAP % (YOLOv7) | mAP % (YOLOv8) |
| Weed25 (20% test data) | 8,723 | 7,951 | 7,793 | 92.37 | 91.19 |
| Weed25 | 43,527 | 42,435 | 41,689 | 91.03 | 89.43 |
| Cotton weed | 9,388 | 8,343 | 7,714 | 84.65 | 78.26 |
| Corn weed | 7,225 | 6,156 | 5,868 | 81.16 | 77.37 |
81.16%, respectively, on the Cotton weed and Corn weed datasets. The mAPs for YOLOv8
were 78.26% and 77.37% in those cases, which were lower than those of YOLOv7.
Another important observation is the number of objects detected by the models
(Table 6.2). There were 43,527 plant objects in the Weed25 dataset, of which 42,435 and
41,689 were detected by the YOLOv7 and YOLOv8 models, respectively. Similarly, in the
Cotton weed dataset, the YOLOv7 and YOLOv8 models identified 8,343 and 7,714 plants,
although the total number of plants was 9,388. Both models also detected fewer plants
than the ground truth in the Corn weed dataset. This happened because of occlusion and
the diversity of plant morphology (Figure 6.5). However, it will not affect the weed
management system much if the model detects the plants correctly, since the management
technique is applied to the whole region detected by the models.
We used two similarity functions to compute the Siamese network's similarity score. Our
study showed that the model achieved higher accuracy using cosine similarity than using
the negative Euclidean distance function (Table 6.3). The Euclidean distance function is
sensitive to variations in pixel intensities across images and captures differences in
magnitude, whereas the cosine similarity function focuses on the orientation of the feature
vectors rather than their magnitudes. According to Amer and Abdalla (2020) and Saha
et al. (2020), cosine similarity measures similarity irrespective of magnitude, making it
suitable for similarity-based categorisation. In this study, finding a similarity score based
on pixel intensities across images, as the Euclidean distance function does, is challenging;
the cosine similarity function performs better since it focuses more on the spatial patterns
and relationships in the images.
Figure 6.5: Ground truth images with annotations and the plants detected by the YOLOv7 model, illustrating why fewer plants are detected than annotated. (a) The ground truth annotation has five plants and the YOLOv7 model detected one (Weed25). (b) The ground truth annotation has five plants and the YOLOv7 model detected two (Cotton weed). (c) The ground truth annotation has five plants and the YOLOv7 model detected three (Corn weed).
The VGG16 model obtained accuracies of 96.35%, 89.07% and 90.97% on the Weed25,
Cotton weed and Corn weed datasets, respectively, using the negative Euclidean distance
as the similarity function. The accuracies improved by 1.24%, 4.60% and 2.38% on these
datasets using the cosine similarity function. ResNet50 and InceptionV3 showed similar
trends.
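The magnitude-invariance point made above can be seen in a small numeric example; the feature values below are arbitrary and used only for illustration.

```python
import numpy as np

def neg_euclidean(a, b):
    return -float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Scaling a feature vector changes the Euclidean score but not the cosine one,
# which depends only on the orientation of the two vectors.
a = np.array([1.0, 2.0, 3.0])
print(neg_euclidean(a, 2 * a), cosine_similarity(a, 2 * a))   # -3.742..., 1.0
```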
Moreover, using both similarity functions, the VGG16 model achieved the highest accuracy
among the DL models. ResNet50 obtained accuracies of 96.69%, 90.40% and 92.14% on the
Weed25, Cotton weed and Corn weed datasets using the cosine similarity function.
Although InceptionV3 also achieved improved results using the cosine similarity function,
its performance was lower than that of the other models.
All the models performed consistently better using the cosine similarity function, as
shown in Table 6.3. However, the models achieved higher accuracy on the Weed25 dataset
since they were trained on that dataset. Table 6.4 shows the performance of the models
in classifying the weed plants.
Table 6.3: Comparison between the two similarity functions. The results show the classification accuracy on the three datasets (Weed25, Cotton weed, Corn weed) for each similarity function and feature extractor. Here, the models were trained on the Weed25 dataset only.
The Siamese model obtained the highest precision of 99.52% when classifying broadleaf
weeds in the Weed25 dataset, using the VGG16 model as the feature extractor and the
cosine similarity function for calculating the contrastive loss. However, the performance
was not as high for recognising grass (precision of 88.29%) and sedge (precision of
85.24%). Moreover, the recall for sedge weeds was 90.52%, much higher than the precision,
indicating that a number of other weeds were incorrectly classified as sedge. Since most
plants in the training set were broadleaf weeds, the models could not always classify the
grass and sedge weeds correctly. Moreover, to train the model, we had to resize the plant
images to 224 × 224 pixels. Some of the grass and sedge plants were small, and when such
an image was upscaled, the plant morphology became similar to that of a broadleaf weed.
Besides, the grass and sedge weeds had quite similar shapes and sizes. Therefore, many
grass weeds were classified as sedge and vice versa.
In the Cotton weed dataset, the precision for classifying broadleaf weeds using the cosine
similarity function was 95.14%, 93.12% and 91.10% with VGG16, ResNet50 and
InceptionV3, respectively. In broadleaf weed classification, the precision values were
higher than the recall for all three feature extractors, indicating that the broadleaf
predictions were mostly correct but that the models also missed many broadleaf weeds.
When classifying grass weeds, the recall values were higher than the precision, indicating
that the models labelled most of the grass images as grass but that some of those labels
were incorrect; in this case, some of the sedge weeds were labelled as grass, while an
actual grass weed was rarely missed.
The performance of the models was better on the Corn weed dataset than on the Cotton
weed dataset since the images in that dataset had less diversity and fewer classes in the
original labels.
Table 6.4: The performance (%) of the models in recognising each weed class. Here, NED stands for Negative Euclidean Distance and CS for Cosine Similarity.
The average classification accuracies of the models were very close to each other. The F1
scores for recognising broadleaf weeds were 95.17%, 93.49% and 93.56% using VGG16,
ResNet50 and InceptionV3, respectively, with the cosine similarity function. The VGG16
model obtained F1 scores of 93.56% and 88.60% for grass and sedge weeds, respectively,
which were the highest. The other two models showed similar accuracy in classifying
them.
Figure 6.6: Examples of the detection and classification of weeds from images using the proposed approach. (a) Example images with ground truth annotations and the detected plants from the Weed25 dataset. (b) Example images with ground truth annotations and the detected plants from the Cotton weed dataset. (c) Example images with ground truth annotations and the detected plants from the Corn weed dataset.
We show some example images in Figure 6.6 to illustrate the performance of our proposed
approach. Since the Siamese network achieved the highest accuracy using the VGG16
model as the feature extractor with the cosine similarity function, we provide the results
for that configuration only. Figure 6.6a shows example images from the Weed25 dataset.
The first image is a purslane weed, as annotated in the ground truth, which is a broadleaf
weed. Our model detected two plants and classified them as broadleaf weeds. The second
and third images contain green foxtail and sedge weeds, which our technique detected and
classified correctly as grass and sedge. The images in Figure 6.6b are from the Cotton
weed dataset. In this case, all plants were recognised correctly. According to the ground
truth annotation, the plants were spurred anoda, goosegrass and nutsedge, which were
detected as broadleaf, grass and sedge, respectively. Figure 6.6c contains images from the
Corn weed dataset. Some of the plants were classified incorrectly here. For instance, the
first image has four bluegrass plants; the model detected five plants, two of which were
classified as broadleaf weeds and the rest as grass. The second image contains two
bluegrass plants and a goosefoot weed. Although they were classified correctly, the model
detected only two plants.
The proposed technique holds high potential for real-time integration into precision
agriculture, fostering sustainable and efficient farming practices across diverse agricul-
tural landscapes. Unlike traditional methods that rely heavily on specific environmental
conditions or training datasets from particular locations, this approach can generalise
well across various geographies. Moreover, this versatility makes it suitable for site-
independent weed detection. This technique can effectively leverage pre-trained models
and extract relevant features even from limited labelled data. Our proposed method can
contribute to developing automated weed detection and classification systems, leading to
increased efficiency and reduced labour costs in agriculture.
We trained state-of-the-art CNN models using the detected plants to assess the efficacy of
our proposed technique. We trained the VGG16, ResNet50 and InceptionV3 models with
the images. Table 6.5 compares the performance of the CNN models with our proposed
method on the datasets. Here, we trained the models with the Weed25 dataset and tested
the performance on the other two datasets (Cotton weed and Corn weed) and on the test
set of the Weed25 dataset. We took 80% of the data from the Weed25 dataset for training
and used the rest for testing. Since our proposed technique achieved the best performance
using the cosine similarity function for calculating the contrastive loss (Table 6.4), we
compared our best result with the CNN models' results in Table 6.5.
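A minimal sketch of this baseline is given below; the directory of cropped plant images, the backbone choice shown, the image size and the training settings are assumptions, and input preprocessing is omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

# Fine-tune an ImageNet-pretrained backbone to classify cropped plants into
# the three morphology groups (broadleaf, grass, sedge).
base = ResNet50(include_top=False, weights="imagenet",
                input_shape=(224, 224, 3), pooling="avg")
outputs = layers.Dense(3, activation="softmax")(base.output)
model = Model(base.input, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Directory of plant crops arranged in one sub-folder per group (assumed path).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "weed25_crops/train", image_size=(224, 224), batch_size=32)
model.fit(train_ds, epochs=20)
```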
The CNN models showed promising results on the Weed25 dataset. The ResNet50 model
achieved the highest accuracy of 95.76%, while the VGG16 and InceptionV3 models
obtained 95.73% and 94.26% accuracy, respectively. On the other hand, our proposed
approach with VGG16 as the feature extractor achieved 97.59% accuracy on the Weed25
dataset. It is important to note that the CNN models classified broadleaf weeds more
accurately than grass and sedge weeds. The ResNet50 model achieved the highest F1
score of 97.76% for broadleaf weeds, whereas its F1 scores were 88.25% and 82.31% for
grass and sedge weeds, respectively. Since the dataset has fewer samples from the grass
and sedge weed classes, the models were biased. In contrast, the classification accuracy
for grass and sedge weeds was better using our proposed technique with the Siamese
network.
The differences between the two approaches are more noticeable when the trained models
are applied to an unseen dataset without further training. The accuracy of the CNN
models was much lower for the unseen Cotton weed and Corn weed datasets since the
models were not trained on those datasets. The ResNet50 model obtained the highest
accuracies of 91.86% and 89.41% on the Cotton weed and Corn weed datasets,
respectively, whereas the proposed technique achieved accuracies of 93.67% and 93.35%
on those two datasets. However, the overall accuracy tells only part of the story. The
CNN models found it challenging to classify the grass and sedge weeds in those datasets.
The F1 scores for grass were 55.09%, 62.07% and 51.39% using the VGG16, ResNet50 and
InceptionV3 models, respectively, for the Cotton weed dataset, and the models obtained
F1 scores of 18.13%, 25.02% and 14.07%, respectively, for sedge weeds. In contrast, the F1
scores for grass and sedge weeds on the Cotton weed dataset were 77.93% and 86.14%,
respectively, using the VGG16 model as the feature extractor in our approach.
Table 6.5: Performance (%) comparison between the state-of-the-art CNN models and our proposed method. Here, the models were trained on the Weed25 dataset only, and their performance was evaluated on the other datasets without further training. The CNN columns represent the classification results using the CNN models (VGG16, ResNet50 and InceptionV3), and the columns for the proposed method represent the best results using our proposed approach.
The CNN models classified sedge weeds more accurately in the Corn weed dataset since
the sedge weeds in the Weed25 dataset were similar to those in the Corn weed dataset.
The F1 score for sedge weeds using the ResNet50 model was 91.59%, whereas our proposed
technique obtained 88.60%. However, the CNN models achieved very low accuracy when
classifying grasses in the dataset. The ResNet50 model achieved the highest F1 score of
77.01% for grass weeds, much lower than our proposed approach (93.56% using the VGG16
model). The performance of the CNN models suggests that they can accurately classify
weeds based on morphology only after further training.
This study takes a different direction in detecting and classifying weeds in the field. Here,
we have classified weeds based on their morphology, irrespective of their species and
without retraining the models on a new dataset. However, there are a few limitations to
this study. First, the dataset used to train the model had fewer samples for grass and
sedge weeds; the performance may improve if more training samples were available for
them. Secondly, we only evaluated three state-of-the-art models as feature extractors;
custom or other state-of-the-art models should be evaluated to find the most efficient
model for this purpose. Finally, we trained the model using the contrastive loss with the
negative Euclidean distance and cosine similarity functions; other loss functions available
for Siamese networks should also be explored.
In the future, we will add more data for grass and sedge weeds to train the model and
improve its efficiency. Moreover, we will explore the efficacy of other deep learning models
for extracting features from the images. Besides, the models' parameters will need to be
optimised using other loss functions, such as the triplet loss.
6.5 Conclusion
In this study, we have proposed a technique to detect and classify weeds from im-
ages based on their morphology. We have broadly categorised weeds into three classes:
broadleaf, grass and sedge. In our proposed approach, first, we trained the YOLOv7 and
YOLOv8 models to detect plants from an image. The YOLOv7 model detected plants
more efficiently. We then trained the Siamese network to predict the similarity score
for the detected plants. The models were trained using the Weed25 dataset only. We
then applied the trained models to two unseen datasets (Cotton weed and Corn weed
datasets) for plant detection and similarity-score prediction.
Our goal was to classify weeds according to their morphology and not based on their
species. Therefore, the weed species in the Weed25 dataset were grouped into three
classes, as mentioned earlier, and the Siamese network predicted the similarity scores for
the groups. The YOLOv7 model obtained mAPs of 91.03%, 84.65% and 81.16% for the
Weed25, Cotton weed and Corn
weed datasets, respectively. We used negative Euclidean distance and cosine similarity
function for training the Siamese network and observed that the models achieved better
accuracy using the cosine similarity function. Moreover, among the three deep learn-
ing models as feature extractors, the VGG16 model showed better performance. The
VGG16 model achieved the highest classification accuracy of 97.59%, 93.67% and 93.35%
on Weed25, Cotton weed and Corn weed datasets, respectively. Besides, the model clas-
sified broadleaf weed more accurately than grass and sedge since most weed species in
the Weed25 dataset were broadleaf weeds. In addition, we compared the performance
of the state-of-the-art CNN models with our proposed technique. The results showed
significant improvement in classification accuracy using the proposed Siamese network
based approach.
Chapter 7
Conclusion
In conclusion, this thesis has investigated deep learning-based weed detection to ad-
dress the challenges and has explored the potential of automating agricultural practices.
The journey through the chapters has provided valuable insights into various aspects of
weed detection, classification, and overall precision agriculture. Here, we summarise the
key findings and contributions of this research.
7.1 Contributions
This thesis has presented several innovative contributions that enhance the real-time
performance and accuracy of weed detection, species recognition and category recognition,
thereby contributing significantly to the effectiveness of automatic weed control systems.
The contributions of the thesis are summarised as follows.
domain, summarising various approaches to weed detection. The findings indicate a dom-
inant use of supervised learning techniques, particularly leveraging state-of-the-art deep
learning models. These studies demonstrate enhanced performance and classification
accuracy by fine-tuning pre-trained models on diverse plant datasets. While achieving
remarkable accuracy, it is observed that the experiments excel primarily under specific
conditions, such as on small datasets involving a limited number of crops and weed
species. The computational speed in the recognition process emerges as a limiting factor,
especially concerning real-time applications on fast-moving herbicide spraying vehicles.
In Chapter 3, the study utilised four crop weed datasets with 20 different crop and
weed species from various geographical locations. Five state-of-the-art CNN models were
employed for image classification, focusing on comparing transfer learning and fine-tuning
approaches. Fine-tuning proved more effective in achieving accurate image classification.
Combining datasets introduced complexity, leading to a performance decrease due to
specific weed species. Data augmentation addressed class imbalance issues and improved
model performance, particularly in distinguishing challenging weed species. ResNet-50
emerged as the most accurate model. The study emphasised the role of transfer learning
in mitigating the need for extensive datasets when training models from scratch, as pre-
trained models capture generalised features that can be fine-tuned for specific tasks,
enhancing classification accuracy. The research also highlighted the need for large-scale benchmark weed datasets.
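As a minimal sketch of the two strategies compared in Chapter 3, the snippet below loads an ImageNet pre-trained ResNet-50 from torchvision, replaces its classifier head for a hypothetical 20-class crop and weed problem, and toggles between feature extraction (frozen backbone) and full fine-tuning; the class count, learning rate and weight variant are illustrative assumptions, not the exact experimental settings of this thesis.

```python
import torch.nn as nn
from torch.optim import SGD
from torchvision import models

NUM_CLASSES = 20  # illustrative: 20 crop and weed species

# Load ImageNet pre-trained weights and replace the classification head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

FINE_TUNE = True  # False = transfer learning with a frozen backbone
if not FINE_TUNE:
    for name, param in model.named_parameters():
        if not name.startswith("fc."):
            param.requires_grad = False  # keep the pre-trained features fixed

# Only trainable parameters are passed to the optimiser.
optimizer = SGD([p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9)
```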
In Chapter 4, a publicly available dataset with corn and associated weeds was repur-
posed, with object-level labelling through bounding boxes for object detection. YOLOv7
exhibited the best mean average precision (mAP). YOLOv7, YOLOv7-tiny, and YOLOv8x
demonstrated promising accuracy and inference times for real-time weed detection, out-
performing the Faster-RCNN model. The study highlights the importance of optimis-
ing inference time and improving detection accuracy through further research. More-
over, the study emphasises the potential for enhancing model performance by training
with a large and balanced dataset. Data augmentation techniques addressed class im-
balances and improved weed detection accuracy. Overall, the outcomes suggest that
YOLOv7 and YOLOv8 models are effective in detecting corn and associated weeds, of-
fering prospects for developing selective sprayers or automatic weed control systems. The
proposed method allows real-time localisation and classification of weeds in images or
video frames.
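As an illustration of how such a detector could be applied to an image or video frame, the sketch below runs inference with the Ultralytics YOLOv8 API (Jocher et al., 2023a); the weight file, image path and confidence threshold are placeholders rather than the exact artefacts produced in this work.

```python
from ultralytics import YOLO

# Placeholder weights: a YOLOv8 model assumed to be fine-tuned on corn and weed images.
model = YOLO("yolov8x_corn_weed.pt")

# Run detection on a single frame; conf filters out low-confidence boxes.
results = model.predict("field_frame.jpg", conf=0.25)

for result in results:
    for box in result.boxes:
        cls_id = int(box.cls[0])               # predicted class index
        score = float(box.conf[0])             # detection confidence
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding box in pixel coordinates
        print(model.names[cls_id], score, (x1, y1, x2, y2))
```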
In Chapter 6, we proposed a Siamese network-based approach that uses state-of-the-art deep learning models as its backbone for weed classification. The study repurposed three publicly available datasets and grouped the weeds into three classes based on their morphology: broadleaf, grass, and sedge. The YOLOv7 model achieved the best result as a plant detector, and the VGG16 model served as the feature extractor for the Siamese network. The models were trained on one dataset and applied to the others without further training. The study also observed that the classification accuracy of the Siamese network improved when the cosine similarity function was used to compute the contrastive loss.
7.2 Future work
In weed recognition using deep learning, potential avenues for future work abound.
Future studies may focus on advancing models to recognise and classify multiple weed
species concurrently, contributing to a more comprehensive weed management system.
Emphasis should be placed on optimising models for real-time applications to ensure
prompt responses in agricultural settings. Additionally, efforts should be directed to-
wards enhancing model adaptability to varied environmental conditions, such as different
lighting and weather conditions.
Moreover, adopting transfer learning offers a viable solution to circumvent the need for large datasets when training deep learning models from scratch. Models pre-trained on large datasets such as ImageNet can capture detailed, generalised features from visual data.
However, as ImageNet lacks specific categorical labelling for weeds or crops, fine-tuning
the pre-trained weights using datasets specific to crops and weeds becomes crucial. This
fine-tuning process enables the model to capture dataset-specific or task-specific features,
thereby improving overall classification accuracy.
Furthermore, we understand that labelling images for training deep learning networks can be costly and time-consuming. To tackle this challenge and make weed detection models more scalable, weakly-supervised or self-supervised deep learning methods are worth investigating. Weakly-supervised learning uses less precise or partial annotations during training, allowing models to learn from a broader range of data with little manual labelling, while self-supervised learning allows models to learn useful representations directly from unlabelled data. Incorporating either approach into our method could substantially reduce the manual annotation effort and make the overall process more efficient.
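One simple direction along these lines is pseudo-labelling, sketched below under the assumption that a classifier (model) has already been trained on the labelled portion of the data and that unlabelled_loader yields batches of unlabelled image tensors; the confidence threshold is an illustrative choice, not a value validated in this thesis.

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(model, unlabelled_loader, threshold=0.9, device="cpu"):
    """Assign provisional labels to unlabelled weed images.

    Only predictions whose softmax confidence exceeds `threshold` are kept,
    so the model can later be retrained on them without manual annotation.
    """
    model.eval()
    kept_images, kept_labels = [], []
    for images in unlabelled_loader:                 # batches of image tensors
        images = images.to(device)
        probs = torch.softmax(model(images), dim=1)
        confidence, predicted = probs.max(dim=1)
        mask = confidence > threshold
        if mask.any():
            kept_images.append(images[mask].cpu())
            kept_labels.append(predicted[mask].cpu())
    if not kept_images:                              # nothing confident enough to keep
        return torch.empty(0), torch.empty(0, dtype=torch.long)
    return torch.cat(kept_images), torch.cat(kept_labels)
```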
In Chapter 4, we highlighted that the dataset images were acquired under diverse
lighting conditions, and the presence of occluded plants significantly affected classifica-
tion accuracy. Future research efforts should consider overcoming this limitation and
enhancing overall model performance. Moreover, our future work will specifically con-
centrate on elevating the performance of the two-stage object detector, aiming to improve
accuracy and reduce inference time. While the approach demonstrated in this chapter
enables real-time detection and classification of weeds in image or video frames, on-field
trials are essential to test and validate the proposed techniques thoroughly.
In Chapter 6, our study presented a unique approach to weed detection and classifica-
tion by morphologically categorising weeds without retraining models or considering weed
species. However, certain limitations should be addressed in future work. Firstly, the dataset used for model training had limited samples of grass and sedge weeds; a larger training set could therefore improve performance. Additionally, while a state-of-the-art model was employed as the feature extractor, custom or alternative models should be evaluated for efficiency. Lastly, our model training utilised contrastive loss with the negative Euclidean distance and cosine similarity functions for the Siamese network; exploring other loss functions, such as triplet loss, is an avenue for future investigation. To address these limitations, our future work will augment the dataset with more grass and sedge weed samples, evaluate alternative deep learning models for feature extraction, and optimise model parameters using different loss functions such as triplet loss.
Our future goal is to conduct field trials for the proposed weed detection models. The
aim is to validate their real-world applicability and effectiveness across diverse agricultural settings.
Our field trials will be conducted under varying environmental conditions, including
different soil types, lighting conditions, and weather patterns. This comprehensive ap-
proach will ensure that the proposed weed detection models are robust and adaptable to
the complexities of real-world agricultural operations.
Moreover, we will actively engage with farmers and agricultural professionals to gather
feedback on the usability and practicality of the models in field settings. This feedback
will be instrumental in refining the models and optimising their performance to meet the
specific needs and challenges faced by farmers.
To facilitate the development and testing of our prototype system, hardware con-
siderations will be integral to the process. The prototype system will be optimised for
deployment on various computing platforms, including resource-constrained devices com-
monly found in agricultural environments. This optimisation will enhance the accessibility and scalability of our solution, supporting its wider adoption and impact within the agricultural community.
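One practical route towards such deployment, sketched below with the Ultralytics export API, is to convert a trained detector to ONNX so that it can be run through lightweight runtimes on embedded hardware; the weight file name and input size are assumptions, not artefacts of this thesis.

```python
from ultralytics import YOLO

# Assumed weights from a weed-detection training run; the file name is a placeholder.
model = YOLO("yolov8n_weeds.pt")

# Export to ONNX so the detector can be deployed on resource-constrained devices
# (e.g. through ONNX Runtime or TensorRT); imgsz sets the expected input resolution.
model.export(format="onnx", imgsz=640)
```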
Bibliography
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S.,
Irving, G., Isard, M., et al. (2016). Tensorflow: A system for large-scale machine
learning. 12th {USENIX} Symposium on Operating Systems Design and Imple-
mentation ({OSDI} 16), 265–283.
Abdalla, A., Cen, H., Wan, L., Rashid, R., Weng, H., Zhou, W., & He, Y. (2019). Fine-
tuning convolutional neural network with transfer learning for semantic segmen-
tation of ground-level oilseed rape images in a field with high weed pressure.
Computers and Electronics in Agriculture, 167, 105091.
Abuhani, D. A., Hussain, M. H., Khan, J., ElMohandes, M., & Zualkernan, I. (2023). Crop
and weed detection in sunflower and sugarbeet fields using single shot detectors.
2023 IEEE International Conference on Omni-layer Intelligent Systems (COINS),
1–5. https://doi.org/https://doi.org/10.1109/COINS57856.2023.10189257
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). Slic
superpixels compared to state-of-the-art superpixel methods. IEEE transactions
on pattern analysis and machine intelligence, 34 (11), 2274–2282.
Adhikari, S. P., Yang, H., & Kim, H. (2019). Learning semantic graphics using convolu-
tional encoder-decoder network for autonomous weeding in paddy field. Frontiers
in plant science, 10, 1404.
Aggarwal, C. C., et al. (2018). Neural networks and deep learning. Springer, 10 (978), 3.
Ahmad, A., Saraswat, D., Aggarwal, V., Etienne, A., & Hancock, B. (2021). Perfor-
mance of deep learning models for classifying and detecting common weeds in
corn and soybean production systems. Computers and Electronics in Agriculture,
184, 106081.
Ahmad, J., Muhammad, K., Ahmad, I., Ahmad, W., Smith, M. L., Smith, L. N., Jain,
D. K., Wang, H., & Mehmood, I. (2018). Visual features based boosted classi-
fication of weeds for real-time selective herbicide sprayer systems. Computers in
Industry, 98, 23–33.
Alam, M., Alam, M. S., Roman, M., Tufail, M., Khan, M. U., & Khan, M. T. (2020). Real-
time machine-learning based crop/weed detection and classification for variable-
rate spraying in precision agriculture. 2020 7th International Conference on Elec-
trical and Electronics Engineering (ICEEE), 273–280.
Ali-Gombe, A., & Elyan, E. (2019). Mfc-gan: Class-imbalanced dataset classification using
multiple fake class generative adversarial network. Neurocomputing, 361, 212–221.
Al-Masni, M. A., Kim, D.-H., & Kim, T.-S. (2020). Multiple skin lesions diagnostics
via integrated deep convolutional networks for segmentation and classification.
Computer methods and programs in biomedicine, 190, 105351.
Alom, M. Z., Taha, T. M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M. S., Hasan,
M., Van Essen, B. C., Awwal, A. A., & Asari, V. K. (2019). A state-of-the-art
survey on deep learning theory and architectures. Electronics, 8 (3), 292.
Altaheri, H., Alsulaiman, M., & Muhammad, G. (2019). Date fruit classification for
robotic harvesting in a natural environment using deep learning. IEEE Access, 7,
117115–117133. https://doi.org/https://doi.org/10.1109/ACCESS.2019.2936536
Amend, S., Brandt, D., Di Marco, D., Dipper, T., Gässler, G., Höferlin, M., Gohlke, M.,
Kesenheimer, K., Lindner, P., Leidenfrost, R., et al. (2019). Weed management of
the future. KI-Künstliche Intelligenz, 33 (4), 411–415.
Amer, A. A., & Abdalla, H. I. (2020). A set theory based similarity measure for text
clustering and classification. Journal of Big Data, 7, 1–43. https://doi.org/https:
//doi.org/10.1186/s40537-020-00344-3
Amrani, A., Sohel, F., Diepeveen, D., Murray, D., & Jones, M. G. (2023a). Deep learning-
based detection of aphid colonies on plants from a reconstructed brassica image
dataset. Computers and Electronics in Agriculture, 205, 107587. https://doi.org/
10.1016/j.compag.2022.107587
Amrani, A., Sohel, F., Diepeveen, D., Murray, D., & Jones, M. G. (2023b). Insect detec-
tion from imagery using yolov3-based adaptive feature fusion convolution network.
Crop and Pasture Science. https://doi.org/10.1071/CP21710
Andrea, C.-C., Daniel, B. B. M., & Misael, J. B. J. (2017). Precise weed and maize clas-
sification through convolutional neuronal networks. 2017 IEEE Second Ecuador
Technical Chapters Meeting (ETCM), 1–6.
Andreini, P., Bonechi, S., Bianchini, M., Mecocci, A., & Scarselli, F. (2020). Image gen-
eration by gan and style transfer for agar plate image segmentation. Computer
Methods and Programs in Biomedicine, 184, 105268.
Asad, M. H., & Bais, A. (2019). Weed detection in canola fields using maximum likelihood
classification and deep convolutional neural network. Information Processing in
Agriculture. https://doi.org/10.1016/j.inpa.2019.12.002
Asad, M. H., & Bais, A. (2020). Weed detection in canola fields using maximum likelihood
classification and deep convolutional neural network. Information Processing in
Agriculture, 7 (4), 535–545. https://doi.org/https://doi.org/10.1016/j.inpa.2019.
12.002
Attri, I., Awasthi, L. K., Sharma, T. P., & Rathee, P. (2023). A review of deep learning
techniques used in agriculture. Ecological Informatics, 102217. https://doi.org/
10.1016/j.ecoinf.2023.102217
Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional
encoder-decoder architecture for image segmentation. IEEE transactions on pat-
tern analysis and machine intelligence, 39 (12), 2481–2495.
Bah, M. D., Hafiane, A., & Canals, R. (2018). Deep learning with unsupervised data
labeling for weed detection in line crops in uav images. Remote sensing, 10 (11),
1690.
Bakhshipour, A., & Jafari, A. (2018). Evaluation of support vector machine and artificial
neural networks in weed detection using shape features. Computers and Electronics
in Agriculture, 145, 153–160.
Bakhshipour, A., Jafari, A., Nassiri, S. M., & Zare, D. (2017). Weed segmentation using
texture features extracted from wavelet sub-images. Biosystems Engineering, 157,
1–12.
Banan, A., Nasiri, A., & Taheri-Garavand, A. (2020). Deep learning-based appearance
features extraction for automated carp species identification. Aquacultural Engi-
neering, 89, 102053.
Bansal, R., Raj, G., & Choudhury, T. (2016). Blur image detection using laplacian opera-
tor and open-cv. 2016 International Conference System Modeling & Advancement
in Research Trends (SMART), 63–67. https://doi.org/10.1109/SYSMART.2016.
7894491
Barbedo, J. G. A. (2018). Impact of dataset size and variety on the effectiveness of
deep learning and transfer learning for plant disease classification. Computers and
electronics in agriculture, 153, 46–53. https://doi.org/10.1016/j.compag.2018.08.
013
Barlow, H. B. (1989). Unsupervised learning. Neural computation, 1 (3), 295–311.
Barnes, E., Morgan, G., Hake, K., Devine, J., Kurtz, R., Ibendahl, G., Sharda, A., Rains,
G., Snider, J., Maja, J. M., et al. (2021). Opportunities for robotic systems and
automation in cotton production. AgriEngineering, 3 (2), 339–362. https://doi.
org/10.3390/agriengineering3020023
Bawden, O., Kulk, J., Russell, R., McCool, C., English, A., Dayoub, F., Lehnert, C., &
Perez, T. (2017). Robot for weed species plant-specific management. Journal of
Field Robotics, 34 (6), 1179–1199. https://doi.org/10.1002/rob.21727
Bi, J., & Zhang, C. (2018). An empirical comparison on state-of-the-art multi-class
imbalance learning algorithms and a new diversified ensemble learning scheme.
Knowledge-Based Systems, 158, 81–93.
Binguitcha-Fare, A.-A., & Sharma, P. (2019). Crops and weeds classification using convo-
lutional neural networks via optimization of transfer learning parameters. Inter-
national Journal of Engineering and Advanced Technology (IJEAT), 8 (5), 2284–
2294.
Bini, D., Pamela, D., & Prince, S. (2020). Machine vision and machine learning for intel-
ligent agrobots: A review. 2020 5th International Conference on Devices, Circuits
and Systems (ICDCS), 12–16.
Bochkovskiy, A., Wang, C.-Y., & Liao, H.-Y. M. (2020). Yolov4: Optimal speed and
accuracy of object detection. arXiv preprint arXiv:2004.10934. https://doi.org/
10.48550/arXiv.2004.10934
Bosilj, P., Aptoula, E., Duckett, T., & Cielniak, G. (2020). Transfer learning between crop
types for semantic segmentation of crops versus weeds in precision agriculture.
Journal of Field Robotics, 37 (1), 7–19.
for crop rows segmentation on aerial images. Applied Artificial Intelligence, 34 (4),
271–291.
Chaisattapagon, N. Z. C. (1995). Effective criteria for weed identification in wheat fields
using machine vision. Transactions of the ASAE, 38 (3), 965–974.
Chapelle, O., Scholkopf, B., & Zien, A. (2009). Semi-supervised learning (chapelle, o.
et al., eds.; 2006)[book reviews]. IEEE Transactions on Neural Networks, 20 (3),
542–542.
Chartrand, G., Cheng, P. M., Vorontsov, E., Drozdzal, M., Turcotte, S., Pal, C. J.,
Kadoury, S., & Tang, A. (2017). Deep learning: A primer for radiologists. Ra-
diographics, 37 (7), 2113–2131.
Chauhan, B. S. (2020). Grand challenges in weed management. https://doi.org/10.3389/
fagro.2019.00003
Chavan, T. R., & Nandedkar, A. V. (2018). Agroavnet for crops and weeds classification:
A step forward in automatic farming. Computers and Electronics in Agriculture,
154, 361–372.
Chebrolu, N., Läbe, T., & Stachniss, C. (2018). Robust long-term registration of uav
images of crop fields for precision agriculture. IEEE Robotics and Automation
Letters, 3 (4), 3097–3104.
Chebrolu, N., Lottes, P., Schaefer, A., Winterhalter, W., Burgard, W., & Stachniss, C.
(2017). Agricultural robot dataset for plant classification, localization and map-
ping on sugar beet fields. The International Journal of Robotics Research, 36 (10),
1045–1052.
Chechlinski, L., Siemikatkowska, B., & Majewski, M. (2019). A system for weeds and crops
identification—reaching over 10 fps on raspberry pi with the usage of mobilenets,
densenet and custom modifications. Sensors, 19 (17), 3787.
Chen, D., Lu, Y., Li, Z., & Young, S. (2022a). Performance evaluation of deep transfer
learning on multi-class identification of common weed species in cotton production
systems. Computers and Electronics in Agriculture, 198, 107091. https://doi.org/
10.1016/j.compag.2022.107091
Chen, D., Lu, Y., Li, Z., & Young, S. (2022b). Performance evaluation of deep transfer learning on multi-class identification of common weed species in cotton production systems. Computers and Electronics in Agriculture, 198, 107091. https://doi.org/10.1016/j.compag.2022.107091
Dodge, S., & Karam, L. (2016). Understanding how image quality affects deep neural
networks. 2016 eighth international conference on quality of multimedia experience
(QoMEX), 1–6. https://doi.org/10.1109/QoMEX.2016.7498955
Dodge, S., & Karam, L. (2017). A study and comparison of human and deep learning
recognition performance under visual distortions. 2017 26th international confer-
ence on computer communication and networks (ICCCN), 1–7. https://doi.org/
10.1109/ICCCN.2017.8038465
DOĞAN, M. N., Ünay, A., Boz, Ö., & Albay, F. (2004). Determination of optimum weed
control timing in maize (zea mays l.) Turkish Journal of Agriculture and Forestry,
28 (5), 349–354.
Dong, Q., Gong, S., & Zhu, X. (2018). Imbalanced deep learning by minority class incre-
mental rectification. IEEE transactions on pattern analysis and machine intelli-
gence, 41 (6), 1367–1381. https://doi.org/10.1109/TPAMI.2018.2832629
Dong, S., Wang, P., & Abbas, K. (2021). A survey on deep learning and its applications.
Computer Science Review, 40, 100379.
dos Santos Ferreira, A., Freitas, D. M., da Silva, G. G., Pistori, H., & Folhes, M. T. (2017).
Weed detection in soybean crops using convnets. Computers and Electronics in
Agriculture, 143, 314–324.
dos Santos Ferreira, A., Freitas, D. M., da Silva, G. G., Pistori, H., & Folhes, M. T.
(2019). Unsupervised deep learning and semi-automatic data labeling in weed
discrimination. Computers and Electronics in Agriculture, 165, 104963.
DPIRD. (2021, June). Herbicide application: Page 3 of 5. https://www.agric.wa.gov.au/
grains/herbicide-application?page=0%5C%2C2
Druzhkov, P. N., & Kustikova, V. D. (2016). A survey of deep learning methods and
software tools for image classification and object detection. Pattern Recognition
and Image Analysis, 26, 9–15.
Du, J. (2018). Understanding of object detection based on cnn family and yolo. Journal
of Physics: Conference Series, 1004, 012029. https://doi.org/https://doi.org/10.
1088/1742-6596/1004/1/012029
Duke, S. O. (2015). Perspectives on transgenic, herbicide-resistant crops in the united
states almost 20 years after introduction. Pest management science, 71 (5), 652–
657. https://doi.org/10.1002/ps.3863
Durand, T., Mordan, T., Thome, N., & Cord, M. (2017). Wildcat: Weakly supervised
learning of deep convnets for image classification, pointwise localization and seg-
mentation. Proceedings of the IEEE conference on computer vision and pattern
recognition, 642–651.
Dyrmann, M., Jørgensen, R. N., & Midtiby, H. S. (2017). Roboweedsupport-detection
of weed locations in leaf occluded cereal crops using a fully convolutional neural
network. Adv. Anim. Biosci, 8 (2), 842–847.
Dyrmann, M., Karstoft, H., & Midtiby, H. S. (2016). Plant species classification using
deep convolutional neural network. Biosystems Engineering, 151, 72–80.
Ehrlich, M., & Davis, L. S. (2019). Deep residual learning in the jpeg transform domain.
Proceedings of the IEEE International Conference on Computer Vision, 3484–
3493.
Eli-Chukwu, N. C. (2019). Applications of artificial intelligence in agriculture: A review.
Engineering, Technology & Applied Science Research, 9 (4).
Espejo-Garcia, B., Mylonas, N., Athanasakos, L., Fountas, S., & Vasilakoglou, I. (2020).
Towards weeds identification assistance through transfer learning. Computers and
Electronics in Agriculture, 171, 105306.
Espinoza, M. A. M., Le, C. Z., Raheja, A., & Bhandari, S. (2020). Weed identification
and removal using machine learning techniques and unmanned ground vehicles.
Autonomous Air and Ground Sensing Systems for Agricultural Optimization and
Phenotyping V, 11414, 114140J. https://doi.org/10.1117/12.2557625
Farooq, A., Hu, J., & Jia, X. (2018a). Analysis of spectral bands and spatial resolutions
for weed classification via deep convolutional neural network. IEEE Geoscience
and Remote Sensing Letters, 16 (2), 183–187.
Farooq, A., Hu, J., & Jia, X. (2018b). Weed classification in hyperspectral remote sensing
images via deep convolutional neural network. IGARSS 2018-2018 IEEE Interna-
tional Geoscience and Remote Sensing Symposium, 3816–3819.
Farooq, A., Jia, X., Hu, J., & Zhou, J. (2019). Multi-resolution weed classification via con-
volutional neural network and superpixel based local binary pattern using remote
sensing images. Remote Sensing, 11 (14), 1692.
Fawakherji, M., Youssef, A., Bloisi, D., Pretto, A., & Nardi, D. (2019). Crop and weeds
classification for precision agriculture using context-independent pixel-wise segmentation.
Gharde, Y., Singh, P., Dubey, R., & Gupta, P. (2018). Assessment of yield and economic
losses in agriculture due to weeds in india. Crop Protection, 107, 12–18. https:
//doi.org/https://doi.org/10.1016/j.cropro.2018.01.007
Gill, S. S., Xu, M., Ottaviani, C., Patros, P., Bahsoon, R., Shaghaghi, A., Golec, M.,
Stankovski, V., Wu, H., Abraham, A., et al. (2022). Ai for next generation com-
puting: Emerging trends and future directions. Internet of Things, 19, 100514.
https://doi.org/https://doi.org/10.1016/j.iot.2022.100514
Girshick, R. (2015). Fast r-cnn. Proceedings of the IEEE international conference on
computer vision, 1440–1448.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for
accurate object detection and semantic segmentation. Proceedings of the IEEE
conference on computer vision and pattern recognition, 580–587.
Giselsson, T. M., Jørgensen, R. N., Jensen, P. K., Dyrmann, M., & Midtiby, H. S. (2017).
A public image database for benchmark of plant seedling classification algorithms.
arXiv preprint arXiv:1711.05458.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S.,
Courville, A., & Bengio, Y. (2014). Generative adversarial networks. arXiv preprint
arXiv:1406.2661.
Grisso, R. D., Alley, M. M., Thomason, W. E., Holshouser, D. L., & Roberson, G. T.
(2011). Precision farming tools: Variable-rate application.
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G.,
Cai, J., et al. (2018). Recent advances in convolutional neural networks. Pattern
Recognition, 77, 354–377.
Guo, Q., Juefei-Xu, F., Xie, X., Ma, L., Wang, J., Yu, B., Feng, W., & Liu, Y. (2020).
Watch out! motion is blurring the vision of your deep neural networks. Advances
in Neural Information Processing Systems, 33, 975–985.
Guo, Y., Shi, H., Kumar, A., Grauman, K., Rosing, T., & Feris, R. (2019). Spottune:
Transfer learning through adaptive fine-tuning. Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 4805–4814.
Haggag, M., Abdelhay, S., Mecheter, A., Gowid, S., Musharavati, F., & Ghani, S. (2019).
An intelligent hybrid experimental-based deep learning algorithm for tomato-
Huang, H., Lan, Y., Deng, J., Yang, A., Deng, X., Zhang, L., & Wen, S. (2018c). A
semantic labeling approach for accurate weed mapping of high resolution uav
imagery. Sensors, 18 (7), 2113.
Huang, H., Lan, Y., Yang, A., Zhang, Y., Wen, S., & Deng, J. (2020). Deep learning versus
object-based image analysis (obia) in weed mapping of uav imagery. International
Journal of Remote Sensing, 41 (9), 3446–3479.
Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna,
Z., Song, Y., Guadarrama, S., et al. (2017). Speed/accuracy trade-offs for modern
convolutional object detectors. Proceedings of the IEEE conference on computer
vision and pattern recognition, 7310–7311.
Huang, S.-W., Lin, C.-T., Chen, S.-P., Wu, Y.-Y., Hsu, P.-H., & Lai, S.-H. (2018). Auggan:
Cross domain adaptation with gan-based data augmentation. Proceedings of the
European Conference on Computer Vision (ECCV), 718–731.
Huisman, M., Van Rijn, J. N., & Plaat, A. (2021). A survey of deep meta-learning.
Artificial Intelligence Review, 54 (6), 4483–4541. https://doi.org/https://doi.org/
10.1007/s10462-021-10004-4
Hussain, M. (2023). Yolo-v1 to yolo-v8, the rise of yolo and its complementary nature
toward digital manufacturing and industrial defect detection. Machines, 11 (7),
677. https://doi.org/10.3390/machines11070677
Hussain, N., Farooque, A. A., Schumann, A. W., McKenzie-Gopsill, A., Esau, T., Abbas,
F., Acharya, B., & Zaman, Q. (2020). Design and development of a smart variable
rate sprayer using deep learning. Remote Sensing, 12 (24), 4091. https://doi.org/
10.3390/rs12244091
Iqbal, N., Manalil, S., Chauhan, B. S., & Adkins, S. W. (2019). Investigation of alternate
herbicides for effective weed management in glyphosate-tolerant cotton. Archives
of Agronomy and Soil Science, 65 (13), 1885–1899.
Ishak, A. J., Mokri, S. S., Mustafa, M. M., & Hussain, A. (2007). Weed detection utilizing
quadratic polynomial and roi techniques. 2007 5th Student Conference on Research
and Development, 1–5.
Jafari, A., Mohtasebi, S. S., Jahromi, H. E., & Omid, M. (2006). Weed detection in sugar
beet fields using machine vision. Int. J. Agric. Biol, 8 (5), 602–605.
Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., & Bengio, Y. (2017). The one hundred
layers tiramisu: Fully convolutional densenets for semantic segmentation. Proceed-
ings of the IEEE conference on computer vision and pattern recognition workshops,
11–19.
Jensen, T. A., Smith, B., & Defeo, L. F. (2020a). An automated site-specific fallow weed
management system using unmanned aerial vehicles.
Jensen, T. A., Smith, B., & Defeo, L. F. (2020b). An automated site-specific fallow weed
management system using unmanned aerial vehicles.
Jiang, H., Zhang, C., Qiao, Y., Zhang, Z., Zhang, W., & Song, C. (2020). Cnn feature
based graph convolutional network for weed and crop recognition in smart farming.
Computers and Electronics in Agriculture, 174, 105450.
Jiang, Y., Li, C., Paterson, A. H., & Robertson, J. S. (2019). Deepseedling: Deep con-
volutional network and kalman filter for plant seedling detection and counting in
the field. Plant methods, 15 (1), 141.
Jin, X., Che, J., & Chen, Y. (2021). Weed identification using deep learning and image
processing in vegetable plantation. IEEE Access, 9, 10940–10950. https://doi.org/
https://doi.org/10.1109/ACCESS.2021.3050296
Jocher, G., Chaurasia, A., & Qiu, J. (2023a). Ultralytics yolov8. https://github.com/
ultralytics/ultralytics
Jocher, G., Chaurasia, A., & Qiu, J. (2023b, January). YOLO by Ultralytics (Version 8.0.0).
https://github.com/ultralytics/ultralytics
Kamilaris, A., & Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: A survey.
Computers and electronics in agriculture, 147, 70–90.
Karimi, Y., Prasher, S., Patel, R., & Kim, S. (2006). Application of support vector ma-
chine technology for weed and nitrogen stress detection in corn. Computers and
electronics in agriculture, 51 (1-2), 99–109.
Karunathilake, E., Le, A. T., Heo, S., Chung, Y. S., & Mansoor, S. (2023). The path to
smart farming: Innovations and opportunities in precision agriculture. Agriculture,
13 (8), 1593. https://doi.org/https://doi.org/10.3390/agriculture13081593
Kassani, S. H., Kassani, P. H., Khazaeinezhad, R., Wesolowski, M. J., Schneider, K. A.,
& Deters, R. (2019). Diabetic retinopathy classification using a modified xcep-
Kirk, K., Andersen, H. J., Thomsen, A. G., Jørgensen, J. R., & Jørgensen, R. N. (2009).
Estimation of leaf area index in cereal crops using red–green images. Biosystems
Engineering, 104 (3), 308–317. https://doi.org/10.1016/j.compag.2008.03.009
Knoll, F. J., Czymmek, V., Harders, L. O., & Hussmann, S. (2019). Real-time clas-
sification of weeds in organic carrot production using deep learning algorithms.
Computers and Electronics in Agriculture, 167, 105097.
Kodagoda, S., Zhang, Z., Ruiz, D., & Dissanayake, G. (2008). Weed detection and clas-
sification for autonomous farming. Intelligent Production Machines and Systems.
Kogan, M. (1998). Integrated pest management: Historical perspectives and contempo-
rary developments. Annual review of entomology, 43 (1), 243–270.
Korres, N. E., Burgos, N. R., Travlos, I., Vurro, M., Gitsopoulos, T. K., Varanasi, V. K.,
Duke, S. O., Kudsk, P., Brabham, C., Rouse, C. E., et al. (2019). New direc-
tions for integrated weed management: Modern technologies, tools and knowledge
discovery. Advances in Agronomy, 155, 243–319.
Kounalakis, T., Malinowski, M. J., Chelini, L., Triantafyllidis, G. A., & Nalpantidis,
L. (2018). A robotic system employing deep learning for visual recognition and
detection of weeds in grasslands. 2018 IEEE International Conference on Imaging
Systems and Techniques (IST), 1–6.
Kounalakis, T., Triantafyllidis, G. A., & Nalpantidis, L. (2019). Deep learning-based vi-
sual recognition of rumex for robotic precision farming. Computers and Electronics
in Agriculture, 165, 104973.
Krawczyk, B. (2016). Learning from imbalanced data: Open challenges and future direc-
tions. Progress in Artificial Intelligence, 5 (4), 221–232.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep
convolutional neural networks. Advances in neural information processing systems,
25, 1097–1105.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). Imagenet classification with deep
convolutional neural networks. Communications of the ACM, 60 (6), 84–90. https:
//doi.org/10.1145/3065386
Kukar, M., Vračar, P., Košir, D., Pevec, D., Bosnić, Z., et al. (2019). Agrodss: A decision
support system for agriculture and farming. Computers and Electronics in Agricul-
ture, 161, 260–271. https://doi.org/https://doi.org/10.1016/j.compag.2018.04.001
Kumar, A., Zhang, Z. J., & Lyu, H. (2020). Object detection in real time based on
improved single shot multi-box detector algorithm. EURASIP Journal on Wireless
Communications and Networking, 2020, 1–18. https://doi.org/https://doi.org/
10.1186/s13638-020-01826-x
Kumar, H. (2019, April). Data augmentation techniques. https://iq.opengenus.org/data-
augmentation/
Kussul, N., Lavreniuk, M., Skakun, S., & Shelestov, A. (2017). Deep learning classification
of land cover and crop types using remote sensing data. IEEE Geoscience and
Remote Sensing Letters, 14 (5), 778–782. https://doi.org/10.1109/LGRS.2017.
2681128
Kuzuhara, H., Takimoto, H., Sato, Y., & Kanagawa, A. (2020). Insect pest detection and
identification method based on deep learning for realizing a pest control system.
2020 59th Annual Conference of the Society of Instrument and Control Engineers
of Japan (SICE), 709–714. https://doi.org/https://doi.org/10.23919/SICE48898.
2020.9240458
Lal, R. (1991). Soil structure and sustainability. Journal of sustainable agriculture, 1 (4),
67–92.
Lam, O. H. Y., Dogotari, M., Prüm, M., Vithlani, H. N., Roers, C., Melville, B., Zimmer,
F., & Becker, R. (2020). An open source workflow for weed mapping in native
grassland using unmanned aerial vehicle: Using rumex obtusifolius as a case study.
European Journal of Remote Sensing, 1–18.
Lameski, P., Zdravevski, E., & Kulakov, A. (2018). Review of automated weed control
approaches: An environmental impact perspective. International Conference on
Telecommunications, 132–147.
Lameski, P., Zdravevski, E., Trajkovik, V., & Kulakov, A. (2017). Weed detection dataset
with rgb images taken under variable light conditions. International Conference
on ICT Innovations, 112–119.
Lammie, C., Olsen, A., Carrick, T., & Azghadi, M. R. (2019). Low-power and high-speed
deep fpga inference engines for weed classification at the edge. IEEE Access, 7,
51171–51184. https://doi.org/10.1109/ACCESS.2019
Le, V. N. T., Ahderom, S., & Alameh, K. (2020a). Performances of the lbp based algo-
rithm over cnn models for detecting crops and weeds with similar morphologies.
Sensors, 20 (8), 2193.
Le, V. N. T., Ahderom, S., Apopei, B., & Alameh, K. (2020b). A novel method for detect-
ing morphologically similar crops and weeds based on the combination of contour
masks and filtered local binary pattern operators. GigaScience, 9 (3), giaa017.
Le, V. N. T., Truong, G., & Alameh, K. (2021). Detecting weeds from crops under complex
field environments based on faster rcnn. 2020 IEEE Eighth International Confer-
ence on Communications and Electronics (ICCE), 350–355. https://doi.org/10.
1109/ICCE48956.2021.9352073
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. nature, 521 (7553), 436–444.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., &
Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition.
Neural computation, 1 (4), 541–551.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A.,
Tejani, A., Totz, J., Wang, Z., et al. (2017). Photo-realistic single image super-
resolution using a generative adversarial network. Proceedings of the IEEE con-
ference on computer vision and pattern recognition, 4681–4690.
Lee, D.-H. (2013). Pseudo-label: The simple and efficient semi-supervised learning method
for deep neural networks. Workshop on challenges in representation learning,
ICML, 3 (2).
Leminen Madsen, S., Mathiassen, S. K., Dyrmann, M., Laursen, M. S., Paz, L.-C., &
Jørgensen, R. N. (2020). Open plant phenotype database of common weeds in
denmark. Remote Sensing, 12 (8), 1246.
Li, P., He, D., Qiao, Y., & Yang, C. (2013). An application of soft sets in weed identifi-
cation. 2013 Kansas City, Missouri, July 21-July 24, 2013, 1.
Li, W., Zheng, T., Yang, Z., Li, M., Sun, C., & Yang, X. (2021). Classification and detec-
tion of insects from field images using deep learning for smart pest management:
A systematic review. Ecological Informatics, 66, 101460. https://doi.org/10.1016/
j.ecoinf.2021.101460
Li, Y., Zhang, H., Xue, X., Jiang, Y., & Shen, Q. (2018). Deep learning for remote sensing
image classification: A survey. Wiley Interdisciplinary Reviews: Data Mining and
Knowledge Discovery, 8 (6), e1264.
Li, Y., Guo, Z., Shuang, F., Zhang, M., & Li, X. (2022). Key technologies of machine
vision for weeding robots: A review and benchmark. Computers and Electronics
in Agriculture, 196, 106880. https://doi.org/10.1016/j.compag.2022.106880
Li, Y., Wang, T., Kang, B., Tang, S., Wang, C., Li, J., & Feng, J. (2020). Overcoming
classifier imbalance for long-tail object detection with balanced group softmax.
Proceedings of the IEEE/CVF conference on computer vision and pattern recogni-
tion, 10991–11000.
Liakos, K. G., Busato, P., Moshou, D., Pearson, S., & Bochtis, D. (2018). Machine learning
in agriculture: A review. Sensors, 18 (8), 2674.
Liang, W.-C., Yang, Y.-J., & Chao, C.-M. (2019). Low-cost weed identification system us-
ing drones. 2019 Seventh International Symposium on Computing and Networking
Workshops (CANDARW), 260–263.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object
detection. Proceedings of the IEEE international conference on computer vision,
2980–2988.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., &
Zitnick, C. L. (2014). Microsoft coco: Common objects in context. European con-
ference on computer vision, 740–755.
Lindblom, J., Lundström, C., Ljung, M., & Jonsson, A. (2017). Promoting sustainable
intensification in precision agriculture: Review of decision support systems devel-
opment and strategies. Precision agriculture, 18, 309–331.
Liu, B., & Bruch, R. (2020). Weed detection for selective spraying: A review. Current
Robotics Reports, 1 (1), 19–26.
Liu, J., Xiang, J., Jin, Y., Liu, R., Yan, J., & Wang, L. (2021). Boost precision agriculture
with unmanned aerial vehicle remote sensing and edge intelligence: A survey.
Remote Sensing, 13 (21), 4387. https://doi.org/10.3390/rs13214387
Liu, J., & Wang, X. (2021). Plant diseases and pests detection based on deep learning:
A review. Plant Methods, 17, 1–18. https://doi.org/10.1186/s13007-021-00722-9
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016).
Ssd: Single shot multibox detector. European conference on computer vision, 21–
37.
Liu, Y., Sun, P., Wergeles, N., & Shang, Y. (2021). A survey and performance evalu-
ation of deep learning methods for small object detection. Expert Systems with
Applications, 172, 114602. https://doi.org/10.1016/j.eswa.2021.114602
López-Correa, J. M., Moreno, H., Ribeiro, A., & Andújar, D. (2022). Intelligent weed
management based on object detection neural networks in tomato crops. Agron-
omy, 12 (12), 2953. https://doi.org/10.3390/agronomy12122953
López-Granados, F. (2011). Weed detection for site-specific weed management: Mapping
and real-time approaches. Weed Research, 51 (1), 1–11.
Lottes, P., Behley, J., Chebrolu, N., Milioto, A., & Stachniss, C. (2018a). Joint stem de-
tection and crop-weed classification for plant-specific treatment in precision farm-
ing. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), 8233–8238.
Lottes, P., Behley, J., Chebrolu, N., Milioto, A., & Stachniss, C. (2020). Robust joint stem
detection and crop-weed classification using image sequences for plant-specific
treatment in precision farming. Journal of Field Robotics, 37 (1), 20–34.
Lottes, P., Behley, J., Milioto, A., & Stachniss, C. (2018b). Fully convolutional networks
with sequential information for robust crop and weed detection in precision farm-
ing. IEEE Robotics and Automation Letters, 3 (4), 2870–2877.
Lou, H., Duan, X., Guo, J., Liu, H., Gu, J., Bi, L., & Chen, H. (2023). Dc-yolov8: Small-
size object detection algorithm based on camera sensor. Electronics, 12 (10), 2323.
https://doi.org/10.3390/electronics12102323
Lu, Y., Young, S., Wang, H., & Wijewardane, N. (2022). Robust plant segmentation of
color images based on image contrast optimization. Computers and Electronics in
Agriculture, 193, 106711. https://doi.org/10.1016/j.compag.2022.106711
Ma, X., Deng, X., Qi, L., Jiang, Y., Li, H., Wang, Y., & Xing, X. (2019). Fully convolu-
tional network for rice seedling and weed image segmentation at the seedling stage
in paddy fields. PloS one, 14 (4), e0215676.
Maimaitijiang, M., Sagan, V., Sidike, P., Hartling, S., Esposito, F., & Fritschi, F. B.
(2020). Soybean yield prediction from uav using multimodal data fusion and deep learning.
Patidar, S., Singh, U., Sharma, S. K., et al. (2020). Weed seedling detection using mask re-
gional convolutional neural network. 2020 International Conference on Electronics
and Sustainable Communication Systems (ICESC), 311–316.
Patterson, J., & Gibson, A. (2017). Deep learning: A practitioner’s approach. O’Reilly Media, Inc.
Pearlstein, L., Kim, M., & Seto, W. (2016). Convolutional neural network application to
plant detection, based on synthetic imagery. 2016 IEEE Applied Imagery Pattern
Recognition Workshop (AIPR), 1–4.
Pertuz, S., Puig, D., & Garcia, M. A. (2013). Analysis of focus measure operators for
shape-from-focus. Pattern Recognition, 46 (5), 1415–1432. https : / / doi . org / 10 .
1016/j.patcog.2012.11.011
Peteinatos, G., Reichel, P., Karouta, J., Andújar, D., & Gerhards, R. (2020). Weed iden-
tification in maize, sunflower, and potatoes with the aid of convolutional neural
networks. Remote Sensing, 12 (24), 4185.
Petrich, L., Lohrmann, G., Neumann, M., Martin, F., Frey, A., Stoll, A., & Schmidt,
V. (2019). Detection of colchicum autumnale in drone images, using a machine-
learning approach.
Precision spraying - weed sprayer. (n.d.). Retrieved January 25, 2021, from https://www.
weed-it.com/
PyTorch. (2020, August). Ai for ag: Production machine learning for agriculture. https:
//medium.com/pytorch/ai-for-ag-production-machine-learning-for-agriculture-
e8cfdb9849a1
Qian, Q., Chen, L., Li, H., & Jin, R. (2020). Dr loss: Improving object detection by
distributional ranking. Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, 12164–12172.
Qin, Z., Li, Z., Zhang, Z., Bao, Y., Yu, G., Peng, Y., & Sun, J. (2019). Thundernet:
Towards real-time generic object detection on mobile devices. Proceedings of the
IEEE/CVF International Conference on Computer Vision, 6718–6727.
Quan, L., Feng, H., Lv, Y., Wang, Q., Zhang, C., Liu, J., & Yuan, Z. (2019). Maize
seedling detection under different growth stages and complex field environments
based on an improved faster r–cnn. Biosystems Engineering, 184, 1–23. https :
//doi.org/10.1016/j.biosystemseng.2019.05.002
Radoglou-Grammatikis, P., Sarigiannidis, P., Lagkas, T., & Moscholios, I. (2020). A com-
pilation of uav applications for precision agriculture. Computer Networks, 172,
107148.
Rai, N., Zhang, Y., Ram, B. G., Schumacher, L., Yellavajjala, R. K., Bajwa, S., & Sun,
X. (2023). Applications of deep learning in precision weed management: A review.
Computers and Electronics in Agriculture, 206, 107698. https://doi.org/https:
//doi.org/10.1016/j.compag.2023.107698
Raj, E., Appadurai, M., & Athiappan, K. (2021). Precision farming in modern agricul-
ture. In Smart agriculture automation using advanced technologies (pp. 61–87).
Springer. https://doi.org/10.1007/978-981-16-6124-2_4
Raja, R., Nguyen, T. T., Slaughter, D. C., & Fennimore, S. A. (2020). Real-time robotic
weed knife control system for tomato and lettuce based on geometric appearance
of plant labels. Biosystems Engineering, 194, 152–164.
Rakhmatulin, I., Kamilaris, A., & Andreasen, C. (2021). Deep neural networks to detect
weeds from crops in agricultural environments in real-time: A review. Remote
Sensing, 13 (21), 4486.
Ramcharan, A., Baranowski, K., McCloskey, P., Ahmed, B., Legg, J., & Hughes, D. P.
(2017). Deep learning for image-based cassava disease detection. Frontiers in plant
science, 8, 1852. https://doi.org/https://doi.org/10.3389/fpls.2017.01852
Ramirez, W., Achanccaray, P., Mendoza, L., & Pacheco, M. (2020). Deep convolutional
neural networks for weed detection in agricultural crops using optical aerial im-
ages. 2020 IEEE Latin American GRSS & ISPRS Remote Sensing Conference
(LAGIRS), 133–137.
Raschka, S. (2018). Model evaluation, model selection, and algorithm selection in machine
learning. arXiv preprint arXiv:1811.12808. https://doi.org/10.48550/arXiv.1811.
12808
Rasmussen, J., Nørremark, M., & Bibby, B. M. (2007). Assessment of leaf cover and crop
soil cover in weed harrowing research using digital images. Weed Research, 47 (4),
299–310. https://doi.org/10.1111/j.1365-3180.2007.00565.x
Rasti, P., Ahmad, A., Samiei, S., Belin, E., & Rousseau, D. (2019). Supervised image
classification by scattering transform with application to weed detection in culture
crops of high density. Remote Sensing, 11 (3), 249. https : / / doi . org / 10 . 3390 /
rs11030249
Razfar, N., True, J., Bassiouny, R., Venkatesh, V., & Kashef, R. (2022). Weed detection
in soybean crops using custom lightweight deep learning models. Journal of Agri-
culture and Food Research, 8, 100308. https://doi.org/https://doi.org/10.1016/j.
jafr.2022.100308
Redmon, J. (n.d.). https://pjreddie.com/darknet/
Redmon, J., & Farhadi, A. (2017). Yolo9000: Better, faster, stronger. Proceedings of the
IEEE conference on computer vision and pattern recognition, 7263–7271.
Redmon, J., & Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv preprint
arXiv:1804.02767.
Reedha, R., Dericquebourg, E., Canals, R., & Hafiane, A. (2021). Vision transformers
for weeds and crops classification of high resolution uav images. arXiv preprint
arXiv:2109.02716. https://doi.org/10.3390/rs14030592
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object
detection with region proposal networks. arXiv preprint arXiv:1506.01497.
Rew, L., & Cousens, R. (2001). Spatial distribution of weeds in arable crops: Are current
sampling and analytical methods appropriate? Weed Research, 41 (1), 1–18. https:
//doi.org/10.1046/j.1365-3180.2001.00215.x
Rist, Y., Shendryk, I., Diakogiannis, F., & Levick, S. (2019). Weed mapping using very
high resolution satellite imagery and fully convolutional neural network. IGARSS
2019-2019 IEEE International Geoscience and Remote Sensing Symposium, 9784–
9787.
Robertson, M., Kirkegaard, J., Peake, A., Creelman, Z., Bell, L., Lilley, J., Midwood, J.,
Zhang, H., Kleven, S., Duff, C., et al. (2016). Trends in grain production and yield
gaps in the high-rainfall zone of southern australia. Crop and Pasture Science,
67 (9), 921–937.
Robocrop spot sprayer: Weed removal. (2018, July). Retrieved January 25, 2021, from
https://garford.com/products/robocrop-spot-sprayer/
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for
biomedical image segmentation. International Conference on Medical image com-
puting and computer-assisted intervention, 234–241.
Ruigrok, T., van Henten, E., Booij, J., van Boheemen, K., & Kootstra, G. (2020).
Application-specific evaluation of a weed-detection algorithm for plant-specific
spraying. Sensors, 20 (24), 7262. https://doi.org/10.3390/s20247262
Sa, I., Chen, Z., Popović, M., Khanna, R., Liebisch, F., Nieto, J., & Siegwart, R. (2017).
Weednet: Dense semantic weed classification using multispectral images and mav
for smart farming. IEEE Robotics and Automation Letters, 3 (1), 588–595.
Sa, I., Popović, M., Khanna, R., Chen, Z., Lottes, P., Liebisch, F., Nieto, J., Stachniss,
C., Walter, A., & Siegwart, R. (2018). Weedmap: A large-scale semantic weed
mapping framework using aerial multispectral imaging and deep neural network
for precision farming. Remote Sensing, 10 (9), 1423.
Sabottke, C. F., & Spieler, B. M. (2020). The effect of image resolution on deep learning
in radiography. Radiology: Artificial Intelligence, 2 (1), e190015.
Sabzi, S., Abbaspour-Gilandeh, Y., & Arribas, J. I. (2020). An automatic visible-range
video weed detection, segmentation and classification prototype in potato field.
Heliyon, 6 (5), e03685.
Saha, S., Ghosh, M., Ghosh, S., Sen, S., Singh, P. K., Geem, Z. W., & Sarkar, R. (2020).
Feature selection for facial emotion recognition using cosine similarity-based har-
mony search algorithm. Applied Sciences, 10 (8), 2816. https : / / doi . org / https :
//doi.org/10.3390/app10082816
Sahlsten, J., Jaskari, J., Kivinen, J., Turunen, L., Jaanio, E., Hietala, K., & Kaski, K.
(2019). Deep learning fundus image analysis for diabetic retinopathy and macular
edema grading. Scientific reports, 9 (1), 1–11.
Sakyi, L. (2019, February). Linda sakyi. https://greenrootltd.com/2019/02/19/five-
general-categories-of-weed-control-methods/
Saleem, M. H., Potgieter, J., & Arif, K. M. (2019). Plant disease detection and classifica-
tion by deep learning. Plants, 8 (11), 468. https://doi.org/10.3390/plants8110468
Saleem, M. H., Potgieter, J., & Arif, K. M. (2022). Weed detection by faster rcnn model:
An enhanced anchor box approach. Agronomy, 12 (7), 1580. https://doi.org/10.
3390/agronomy12071580
Saleem, S. R., Zaman, Q. U., Schumann, A. W., & Naqvi, S. M. Z. A. (2023). Variable
rate technologies: Development, adaptation, and opportunities in agriculture. In
Precision agriculture (pp. 103–122). Elsevier.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). Mobilenetv2:
Inverted residuals and linear bottlenecks. Proceedings of the IEEE conference on
computer vision and pattern recognition, 4510–4520.
Sarker, I. H. (2021). Deep learning: A comprehensive overview on techniques, taxonomy,
applications and research directions. SN Computer Science, 2 (6), 420. https://
doi.org/https://doi.org/10.1007/s42979-021-00815-1
Sarvini, T., Sneha, T., GS, S. G., Sushmitha, S., & Kumaraswamy, R. (2019). Perfor-
mance comparison of weed detection algorithms. 2019 International Conference
on Communication and Signal Processing (ICCSP), 0843–0847.
Scavo, A., & Mauromicale, G. (2020). Integrated weed management in herbaceous field
crops. Agronomy, 10 (4), 466.
Schneider, U. A., Havlík, P., Schmid, E., Valin, H., Mosnier, A., Obersteiner, M., Böttcher,
H., Skalskỳ, R., Balkovič, J., Sauer, T., et al. (2011). Impacts of population growth,
economic development, and technical change on global food production and con-
sumption. Agricultural Systems, 104 (2), 204–215.
Seelan, S. K., Laguette, S., Casady, G. M., & Seielstad, G. A. (2003). Remote sensing
applications for precision agriculture: A learning community approach. Remote
sensing of environment, 88 (1-2), 157–169.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017).
Grad-cam: Visual explanations from deep networks via gradient-based localization.
Proceedings of the IEEE international conference on computer vision, 618–626.
https://doi.org/10.1007/s11263-019-01228-7
Shaikh, T. A., Rasool, T., & Lone, F. R. (2022). Towards leveraging the role of machine
learning and artificial intelligence in precision agriculture and smart farming. Com-
puters and Electronics in Agriculture, 198, 107119.
Shammi, S., Sohel, F., Diepeveen, D., Zander, S., & Jones, M. G. (2023). Machine
learning-based detection of frost events in wheat plants from infrared thermog-
raphy. European Journal of Agronomy, 149, 126900. https://doi.org/10.1016/j.
eja.2023.126900
Shammi, S., Sohel, F., Diepeveen, D., Zander, S., Jones, M. G., Bekuma, A., & Biddulph,
B. (2022). Machine learning-based detection of freezing events using infrared ther-
Shrestha, A., & Mahmood, A. (2019). Review of deep learning algorithms and architec-
tures. IEEE access, 7, 53040–53065. https : / / doi . org / 10 . 1109 / ACCESS . 2019 .
2912200
Shukla, B. K., Maurya, N., & Sharma, M. (2023). Advancements in sensor-based tech-
nologies for precision agriculture: An exploration of interoperability, analytics and
deployment strategies. Engineering Proceedings, 58 (1), 22.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556.
Sinapan, I., Lin-Kwong-Chon, C., Damour, C., Kadjo, J.-J. A., & Benne, M. (2023).
Oxygen bubble dynamics in pem water electrolyzers with a deep-learning-based
approach. Hydrogen, 4 (3), 556–572. https://doi.org/10.3390/hydrogen4030036
Singh, A., Jones, S., Ganapathysubramanian, B., Sarkar, S., Mueller, D., Sandhu, K., &
Nagasubramanian, K. (2021). Challenges and opportunities in machine-augmented
plant stress phenotyping. Trends in Plant Science, 26 (1), 53–69. https://doi.org/
10.1016/j.tplants.2020.07.010
Singh, R. K., Berkvens, R., & Weyn, M. (2021). Agrifusion: An architecture for iot and
emerging technologies based on a precision agriculture survey. IEEE Access, 9,
136253–136283.
Sivakumar, A. N. V., Li, J., Scott, S., Psota, E., J Jhala, A., Luck, J. D., & Shi, Y.
(2020). Comparison of object detection and patch-based classification deep learn-
ing models on mid-to late-season weed detection in uav imagery. Remote Sensing,
12 (13), 2136.
Skovsen, S., Dyrmann, M., Mortensen, A. K., Laursen, M. S., Gislum, R., Eriksen, J.,
Farkhani, S., Karstoft, H., & Jorgensen, R. N. (2019). The grassclover image
dataset for semantic and hierarchical species understanding in agriculture. Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops.
Slaughter, D., Giles, D., & Downey, D. (2008). Autonomous robotic weed control systems:
A review. Computers and electronics in agriculture, 61 (1), 63–78.
Smith, M. J. (2018). Getting value from artificial intelligence in agriculture. Animal Pro-
duction Science, 60 (1), 46–54.
Sportelli, M., Apolo-Apolo, O. E., Fontanelli, M., Frasconi, C., Raffaelli, M., Peruzzi, A.,
& Perez-Ruiz, M. (2023). Evaluation of yolo object detectors for weed detection
in different turfgrass scenarios. Applied Sciences, 13 (14), 8502. https://doi.org/
10.3390/app13148502
Steinberg, R. (2017, December). 6 areas where artificial neural networks outperform hu-
mans. Retrieved December 25, 2020, from https://venturebeat.com/2017/12/08/
6-areas-where-artificial-neural-networks-outperform-humans/
Steup, R., Dombrowski, L., & Su, N. M. (2019). Feeding the world with data: Visions
of data-driven farming. Proceedings of the 2019 on Designing Interactive Systems
Conference, 1503–1515.
Stewart, R. E. (2018, May). Weed control. https://www.britannica.com/technology/
agricultural-technology/Weed-control
Stewart, R., Andriluka, M., & Ng, A. Y. (2016). End-to-end people detection in crowded
scenes. Proceedings of the IEEE conference on computer vision and pattern recog-
nition, 2325–2333.
Su, W.-H. (2020). Advanced machine learning in point spectroscopy, rgb- and hyperspectral-imaging for automatic discriminations of crops and weeds: A review. Smart Cities, 3 (3), 767–792.
Sudars, K., Jasko, J., Namatevs, I., Ozola, L., & Badaukis, N. (2020). Dataset of anno-
tated food crops and weed images for robotic computer vision control. Data in
Brief, 105833.
Suh, H. K., Ijsselmuiden, J., Hofstee, J. W., & van Henten, E. J. (2018). Transfer learning
for the classification of sugar beet and volunteer potato under field conditions.
Biosystems engineering, 174, 50–65.
Sukegawa, S., Yoshii, K., Hara, T., Yamashita, K., Nakano, K., Yamamoto, N., Nagat-
suka, H., & Furuki, Y. (2020). Deep neural networks for dental implant system
classification. Biomolecules, 10 (7), 984.
Sunil, G., Zhang, Y., Koparan, C., Ahmed, M. R., Howatt, K., & Sun, X. (2022). Weed and
crop species classification using computer vision and deep learning technologies
in greenhouse conditions. Journal of Agriculture and Food Research, 9, 100325.
https://doi.org/10.1016/j.jafr.2022.100325
Swain, K. C., Nørremark, M., Jørgensen, R. N., Midtiby, H. S., & Green, O. (2011). Weed
identification using an automated active shape matching (aasm) technique. Biosystems engineering, 110 (4), 450–457. https://doi.org/10.1016/j.biosystemseng.2011.09.01
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. (2017). Inception-v4, inception-resnet
and the impact of residual connections on learning. Proceedings of the AAAI Con-
ference on Artificial Intelligence, 31 (1).
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke,
V., & Rabinovich, A. (2015). Going deeper with convolutions. Proceedings of the
IEEE conference on computer vision and pattern recognition, 1–9.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the
inception architecture for computer vision. Proceedings of the IEEE conference on
computer vision and pattern recognition, 2818–2826.
Taherkhani, A., Cosma, G., & McGinnity, T. M. (2020). Adaboost-cnn: An adaptive
boosting algorithm for convolutional neural networks to classify multi-class imbal-
anced datasets using transfer learning. Neurocomputing, 404, 351–366.
Takahashi, R., Matsubara, T., & Uehara, K. (2018). Ricap: Random image cropping
and patching data augmentation for deep cnns. Asian Conference on Machine
Learning, 786–798.
Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., & Liu, C. (2018). A survey on deep
transfer learning. Artificial Neural Networks and Machine Learning–ICANN 2018:
27th International Conference on Artificial Neural Networks, Rhodes, Greece, Oc-
tober 4-7, 2018, Proceedings, Part III 27, 270–279.
Tang, J., Wang, D., Zhang, Z., He, L., Xin, J., & Xu, Y. (2017). Weed identification
based on k-means feature learning combined with convolutional neural network.
Computers and electronics in agriculture, 135, 63–70.
Tao, A., Barker, J., & Sarathy, S. (2016). Detectnet: Deep neural network for object
detection in digits. Parallel Forall, 4.
Teimouri, N., Dyrmann, M., Nielsen, P. R., Mathiassen, S. K., Somerville, G. J., &
Jørgensen, R. N. (2018). Weed growth stage estimator using deep convolutional
neural networks. Sensors, 18 (5), 1580.
Thambawita, V., Strümke, I., Hicks, S. A., Halvorsen, P., Parasa, S., & Riegler, M. A.
(2021). Impact of image resolution on deep learning performance in endoscopy
image classification: An experimental study using a large dataset of endoscopic
images. Diagnostics, 11 (12), 2183. https://doi.org/10.3390/diagnostics11122183
Tian, H., Wang, T., Liu, Y., Qiao, X., & Li, Y. (2020). Computer vision technology in
agricultural automation—a review. Information Processing in Agriculture, 7 (1),
1–19.
Tian, L., Slaughter, D., & Norris, R. (2000). Machine vision identification of tomato
seedlings for automated weed control. Transactions of ASAE, 40 (6), 1761–1768.
Toğaçar, M. (2022). Using darknet models and metaheuristic optimization methods to-
gether to detect weeds growing along with seedlings. Ecological Informatics, 68,
101519. https://doi.org/10.1016/j.ecoinf.2021.101519
Tong, K., Wu, Y., & Zhou, F. (2020). Recent advances in small object detection based on deep learning: A review. Image and Vision Computing, 97, 103910. https://doi.org/10.1016/j.imavis.2020.103910
Trong, V. H., Gwang-hyun, Y., Vu, D. T., & Jin-young, K. (2020). Late fusion of mul-
timodal deep neural networks for weeds classification. Computers and Electronics
in Agriculture, 175, 105506.
Ullah, F., Salam, A., Abrar, M., & Amin, F. (2023). Brain tumor segmentation using a
patch-based convolutional neural network: A big data analysis approach. Mathe-
matics, 11 (7), 1635. https://doi.org/10.3390/math11071635
Umamaheswari, S., Arjun, R., & Meganathan, D. (2018). Weed detection in farm crops
using parallel image processing. 2018 Conference on Information and Communi-
cation Technology (CICT), 1–4.
Umamaheswari, S., & Jain, A. V. (2020). Encoder–decoder architecture for crop-weed
classification using pixel-wise labelling. 2020 International Conference on Artificial
Intelligence and Signal Processing (AISP), 1–6.
Valente, J., Doldersum, M., Roers, C., & Kooistra, L. (2019). Detecting rumex obtusifolius
weed plants in grasslands from uav rgb imagery using deep learning. ISPRS Annals
of Photogrammetry, Remote Sensing & Spatial Information Sciences, 4.
Van der Walt, S., Schönberger, J. L., Nunez-Iglesias, J., Boulogne, F., Warner, J. D.,
Yager, N., Gouillart, E., & Yu, T. (2014). Scikit-image: Image processing in python.
PeerJ, 2, e453.
Van Klompenburg, T., Kassahun, A., & Catal, C. (2020). Crop yield prediction using
machine learning: A systematic literature review. Computers and Electronics in
Agriculture, 177, 105709. https://doi.org/10.1016/j.compag.2020.105709
Viraf. (2020, June). Create a synthetic image dataset - the "what", the "why" and the
"how". https://towardsdatascience.com/create- a- synthetic- image- dataset- the-
what-the-why-and-the-how-f820e6b6f718
Wahyudi, D., Soesanti, I., & Nugroho, H. A. (2022). Toward detection of small objects
using deep learning methods: A review. 2022 14th International Conference on
Information Technology and Electrical Engineering (ICITEE), 314–319. https://
doi.org/10.1109/ICITEE56407.2022.9954101
Wäldchen, J., & Mäder, P. (2018). Plant species identification using computer vision
techniques: A systematic literature review. Archives of Computational Methods in
Engineering, 25 (2), 507–543.
Wang, A., Xu, Y., Wei, X., & Cui, B. (2020). Semantic segmentation of crop and weed
using an encoder-decoder network and image enhancement method under uncon-
trolled outdoor illumination. IEEE Access, 8, 81724–81734.
Wang, A., Zhang, W., & Wei, X. (2019). A review on weed detection using ground-based
machine vision and image processing techniques. Computers and electronics in
agriculture, 158, 226–240.
Wang, C., & Xiao, Z. (2021). Lychee surface defect detection based on deep convolutional
neural networks with gan-based data augmentation. Agronomy, 11 (8), 1500.
Wang, C.-Y., Bochkovskiy, A., & Liao, H.-Y. M. (2023a). Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7464–7475.
Wang, C.-Y., Bochkovskiy, A., & Liao, H.-Y. M. (2023b). Yolov7: Trainable bag-of-
freebies sets new state-of-the-art for real-time object detectors. Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7464–7475.
Wang, M., Fu, B., Fan, J., Wang, Y., Zhang, L., & Xia, C. (2023). Sweet potato leaf
detection in a natural scene based on faster r-cnn with a visual attention mechanism.
Wu, Z., Chen, Y., Zhao, B., Kang, X., & Ding, Y. (2021). Review of weed detection
methods based on computer vision. Sensors, 21 (11), 3647. https://doi.org/10.
3390/s21113647
Yan, X., Deng, X., & Jin, J. (2020). Classification of weed species in the paddy field with
dcnn-learned features. 2020 IEEE 5th Information Technology and Mechatronics
Engineering Conference (ITOEC), 336–340.
Yi, Z., Yongliang, S., & Jun, Z. (2019). An improved tiny-yolov3 pedestrian detection
algorithm. Optik, 183, 17–23.
Yoo, D., Park, S., Lee, J.-Y., Paek, A. S., & So Kweon, I. (2015). Attentionnet: Ag-
gregating weak directions for accurate object detection. Proceedings of the IEEE
International Conference on Computer Vision, 2659–2667.
Yu, J., Schumann, A. W., Cao, Z., Sharpe, S. M., & Boyd, N. S. (2019a). Weed detection
in perennial ryegrass with deep learning convolutional neural network. Frontiers
in plant science, 10.
Yu, J., Sharpe, S. M., Schumann, A. W., & Boyd, N. S. (2019b). Deep learning for image-
based weed detection in turfgrass. European journal of agronomy, 104, 78–84.
Zhai, Z., Martínez, J. F., Beltran, V., & Martínez, N. L. (2020). Decision support systems
for agriculture 4.0: Survey and challenges. Computers and Electronics in Agricul-
ture, 170, 105256. https://doi.org/10.1016/j.compag.2020.105256
Zhang, R., Wang, C., Hu, X., Liu, Y., Chen, S., et al. (2020). Weed location and recogni-
tion based on uav imaging and deep learning. International Journal of Precision
Agricultural Aviation, 3 (1).
Zhang, W., Hansen, M. F., Volonakis, T. N., Smith, M., Smith, L., Wilson, J., Ralston, G.,
Broadbent, L., & Wright, G. (2018). Broad-leaf weed detection in pasture. 2018
IEEE 3rd International Conference on Image, Vision and Computing (ICIVC),
101–105.
Zhang, X.-Y., Shi, H., Zhu, X., & Li, P. (2019). Active semi-supervised learning based on
self-expressive correlation with generative adversarial networks. Neurocomputing,
345, 103–113.
Zhao, Z.-Q., Zheng, P., Xu, S.-t., & Wu, X. (2019). Object detection with deep learning:
A review. IEEE transactions on neural networks and learning systems, 30 (11),
3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865
Zheng, Y., Zhu, Q., Huang, M., Guo, Y., & Qin, J. (2017). Maize and weed classifica-
tion using color indices with support vector data description in outdoor fields.
Computers and Electronics in Agriculture, 141, 215–222.
Zhou, Z.-H. (2018). A brief introduction to weakly supervised learning. National Science
Review, 5 (1), 44–53.
Zoph, B., Cubuk, E. D., Ghiasi, G., Lin, T.-Y., Shlens, J., & Le, Q. V. (2020). Learning
data augmentation strategies for object detection. European conference on com-
puter vision, 566–583. https://doi.org/10.1007/978-3-030-58583-9_34