
This thesis is presented for the degree of Doctor of Philosophy of Murdoch University

Deep Learning Techniques for Green on Green Weed Detection from Imagery

Submitted By:

A S M Mahmudul Hasan
Information Technology, Murdoch University, Australia.

Principal Supervisor:
Professor Ferdous Sohel

Co-supervisors:
Professor Michael Jones
Professor Dean Diepeveen
Professor Hamid Laga

January, 2024
Thesis Declaration

I, A S M Mahmudul Hasan, verify that in submitting this thesis: the thesis is my own
account of the research conducted by me, except where other sources are fully acknowl-
edged in the appropriate format, the extent to which the work of others has been used
is documented by a percent allocation of work and signed by myself and my Principal
Supervisor, the thesis contains as its main content work which has not been previously
submitted for a degree at any university, the University supplied plagiarism software has
been used to ensure the work is of the appropriate standard to send for examination, any
editing and proof-reading by professional editors comply with the standards set out on
the Graduate Research School website, and that all necessary ethics and safety approvals
were obtained, including their relevant approval or permit numbers, as appropriate.

January 10, 2024

Acknowledgements
I am profoundly grateful to all those whose support, guidance, and encouragement
have contributed to completing this thesis. First and foremost, I extend my deepest
gratitude to my principal supervisor, Professor Ferdous Sohel, whose unwavering support,
expertise, and mentorship were invaluable throughout this journey. I am indebted to my
co-supervisors, Professor Dean Diepeveen, Professor Hamid Laga and Professor Michael
Jones, for their insightful feedback and constructive criticism.
My heartfelt appreciation goes to my wife, Saria, whose unwavering love, encourage-
ment, and sacrifices made this achievement possible. To my children, Arisha and Ahyan,
your boundless patience, understanding, and the joy you bring into my life sustained me
through this challenging endeavour. I owe an immeasurable gratitude to my parents and
sister, whose unwavering belief in me, endless encouragement, and sacrifices paved the
way for my educational pursuits.
This research was supported by the Murdoch International Postgraduate Scholarship and
the Murdoch Strategic Scholarship. I am thankful to Murdoch University, Australia.
Lastly, I express my gratitude to all those whose names might not be mentioned but
whose contributions, in various forms, have been instrumental in shaping this thesis.

Abstract
Weeds are a major problem faced by the agriculture and farming sector. Advanced
imaging and deep learning (DL) techniques have the potential to automate various tasks
involved in weed management. However, automatic weed detection in crops from im-
agery is challenging because both weeds and crops are of similar colour (green on green),
and their growth and texture are somewhat similar; weeds vary based on crop, season
and weather. Moreover, recognising weed species is crucial for applying targeted con-
trolling mechanisms. This thesis focuses on improving the accuracy and throughput of
deep learning models for weed species recognition. This thesis has the following contri-
butions: First, we present a comprehensive literature review highlighting the challenges
in developing an automatic weed species recognition technique.
Second, we evaluate several neural networks for weed recognition in various exper-
imental settings and dataset combinations. Moreover, we investigate transfer-learning
techniques by preserving the pre-trained weights for extracting the features of crop and
weed datasets.
Third, we repurpose a public dataset and construct an instance-level weed dataset.
We annotate the dataset using a bounding box around each instance and label them with
the appropriate species of the crop or weed. To establish a benchmark, we evaluate the
dataset using several models to locate and classify weeds in crops.
Fourth, we propose a weed classification pipeline where only the discriminative image
patches are used to improve the performance. We enhance the images using generative
adversarial networks. The enhanced images are divided into patches, and a selected
subset of these are used for training the DL models.
Finally, we investigate an approach to classify weeds into three categories based on
morphology: grass, sedge and broadleaf. We train an object detection model to detect
plants from images. A Siamese network, leveraging state-of-the-art deep learning models
as its backbone, is used for weed classification.
Our experiments demonstrate that the proposed DL techniques can be used to detect
and classify weeds at the species level and thereby help weed mitigation.

Attribution Statement
In accordance with the Murdoch University Graduate Degrees Regulations, it is ac-
knowledged that this thesis represents the work of the Candidate with contributions from
their supervisors and, where indicated, collaborators. The Candidate is the majority con-
tributor to this thesis with no less than 75% of the total work attributed to their efforts.

Candidate Coordinating supervisor

Authorship Declaration: Co-Authored Publications
This thesis contains works that have been published or prepared for publication.
Publication 1:
Title A survey of deep learning techniques for weed detection from images.
Authors A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga
and Michael G K Jones.
Journal Computers and Electronics in Agriculture.
Publisher Elsevier
Publication Date May 2021
DOI https://doi.org/10.1016/j.compag.2021.106067
Location in thesis Chapter 2

Signed: A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga, Michael G K Jones

Publication 2:
Title Weed recognition using deep learning techniques on class-imbalanced
imagery.
Authors A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga
and Michael G K Jones.
Journal Crop and Pasture Science.
Publisher CSIRO PUBLISHING
Publication Date April 2022
DOI https://doi.org/10.1071/CP21626
Location in thesis Chapter 3

Signed: A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga, Michael G K Jones

Publication 3:
Title Object-level benchmark for deep learning-based detection and classifi-
cation of weed species.
Authors A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael GK
Jones, Ferdous Sohel.
Journal Crop Protection.
Publisher Elsevier
Publication Date March 2024
DOI https://doi.org/10.1016/j.cropro.2023.106561
Location in thesis Chapter 4

Signed: A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael G K Jones, Ferdous Sohel

Publication 4
Title Image patch-based deep learning approach for crop and weed recogni-
tion.
Authors A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael GK
Jones, Ferdous Sohel.
Journal Ecological informatics.
Publisher Elsevier
Publication Date December 2023
DOI https://doi.org/10.1016/j.ecoinf.2023.102361
Location in thesis Chapter 5

Signed: A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael G K Jones, Ferdous Sohel

Publication 5
Title Morphology-based weed type recognition using Siamese network
Authors A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael GK
Jones, Ferdous Sohel.
Journal XXXXX.
Publisher XXXXX
Publication Date Under review
DOI Under review
Location in thesis Chapter 6

Signed: A S M Mahmudul Hasan, Dean Diepeveen, Hamid Laga, Michael G K Jones, Ferdous Sohel

I, Professor Ferdous Sohel, certify that the student statements regarding their contribu-
tion to each of the works listed above are correct.

Candidate Coordinating supervisor

Contents

Thesis Declaration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Attribution Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Authorship Declaration: Co-Authored Publications . . . . . . . . . . . . v
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

1 Introduction 1
1.1 Weed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Manual approach to control weeds . . . . . . . . . . . . . . . . . . 2
1.1.2 Automation in weed control systems . . . . . . . . . . . . . . . . 2
1.2 Automation and industry needs . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Weed recognition using artificial intelligence deep learning . . . . . . . . 4
1.4 Challenges in developing deep learning based weed management system . 5
1.5 Aims and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 Thesis contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Literature Review 10
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Related Surveys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Traditional ML- vs DL-based Weed Detection Methods . . . . . . . . . . 15
2.4 Paper Selection Criteria in this Survey . . . . . . . . . . . . . . . . . . . 18


2.5 An Overview and Taxonomy of Deep Learning-based Weed Detection Approaches . . . 19
2.6 Data Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6.1 Sensors and Camera Mounting Vehicle . . . . . . . . . . . . . . . 28
2.6.1.1 Unmanned Aerial Vehicles (UAVs) . . . . . . . . . . . . 28
2.6.1.2 Field Robots (FRs) . . . . . . . . . . . . . . . . . . . . . 29
2.6.1.3 All-Terrain Vehicles (ATVs) . . . . . . . . . . . . . . . . 29
2.6.1.4 Collect Data without Camera Mounting Devices . . . . . 30
2.6.2 Satellite Imagery . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6.3 Public Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7 Dataset Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.7.1 Image Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.7.2 Training Data Generation . . . . . . . . . . . . . . . . . . . . . . 37
2.7.3 Data Labelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.8 Detection Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.8.1 Plant-based Classification . . . . . . . . . . . . . . . . . . . . . . 41
2.8.2 Weed Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.9 Learning Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.9.1 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.9.2 Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . 42
2.9.3 Semi-supervised Learning . . . . . . . . . . . . . . . . . . . . . . 43
2.10 Deep Learning Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.10.1 Convolutional Neural Network (CNN) . . . . . . . . . . . . . . . 43
2.10.1.1 Pre-trained Network . . . . . . . . . . . . . . . . . . . . 43
2.10.1.2 Training from Scratch . . . . . . . . . . . . . . . . . . . 46
2.10.2 Region Proposal Networks (RPN) . . . . . . . . . . . . . . . . . . 47
2.10.3 Fully Convolutional Networks (FCN) . . . . . . . . . . . . . . . . 49
2.10.4 Graph Convolutional Network (GCN) . . . . . . . . . . . . . . . . 52
2.10.5 Hybrid Networks (HN) . . . . . . . . . . . . . . . . . . . . . . . . 52
2.11 Performance Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . 54
2.12 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.13 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57


3 Weed classification 59
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2.1.1 DeepWeeds dataset . . . . . . . . . . . . . . . . . . . . . 64
3.2.1.2 Soybean Weed Dataset . . . . . . . . . . . . . . . . . . . 64
3.2.1.3 Cotton Tomato Weed Dataset . . . . . . . . . . . . . . . 65
3.2.1.4 Corn Weed Dataset . . . . . . . . . . . . . . . . . . . . . 65
3.2.1.5 Our Combined Dataset . . . . . . . . . . . . . . . . . . . 65
3.2.1.6 Unseen Test Dataset . . . . . . . . . . . . . . . . . . . . 66
3.2.2 Image Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.2.3 Data Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.2.4 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.2.5 Transfer Learning and Fine-Tuning . . . . . . . . . . . . . . . . . 71
3.2.6 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.3 Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3.1 Experiment 1: Comparing the performance of DL models for clas-
sifying images in each of the datasets . . . . . . . . . . . . . . . . 74
3.3.2 Experiment 2: Combining two datasets . . . . . . . . . . . . . . . 76
3.3.3 Experiment 3: Training the model with all four datasets together 78
3.3.4 Experiment 4: Training the models using both real and augmented
images of the four datasets . . . . . . . . . . . . . . . . . . . . . . 80
3.3.5 Experiment 5: Comparing the performance of two ResNet-50 mod-
els individually trained on ImageNet dataset, and the combined
dataset, and testing on the Unseen Test dataset . . . . . . . . . . 83
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

4 Real-time weed detection and classification 86


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.1 Prepare Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.1.1 Dataset Description . . . . . . . . . . . . . . . . . . . . 92
4.2.1.2 Object level data annotation . . . . . . . . . . . . . . . 93


4.2.1.3 Data Augmentation . . . . . . . . . . . . . . . . . . . . 97


4.2.2 Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.2.2.1 YOLO model . . . . . . . . . . . . . . . . . . . . . . . . 99
4.2.2.2 Faster-RCNN . . . . . . . . . . . . . . . . . . . . . . . . 100
4.2.3 Evaluation metrics . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.2.4 Computing Environment . . . . . . . . . . . . . . . . . . . . . . . 103
4.2.4.1 Training and inference time . . . . . . . . . . . . . . . . 103
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.3.1 Comparison on training and inference time . . . . . . . . . . . . . 104
4.3.2 Performance of the models on training dataset . . . . . . . . . . . 105
4.3.3 Comparison of the models’ accuracy on the test data . . . . . . . 106
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.4.1 Comparison on training and inference time . . . . . . . . . . . . . 110
4.4.2 Performance of the models on training dataset . . . . . . . . . . . 110
4.4.3 Comparison of the models’ accuracy on the test data . . . . . . . 111
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

5 Improving classification accuracy 115


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.2 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.2.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.2.1.1 DeepWeeds dataset . . . . . . . . . . . . . . . . . . . . . 122
5.2.1.2 Cotton weed dataset . . . . . . . . . . . . . . . . . . . . 122
5.2.1.3 Corn weed dataset . . . . . . . . . . . . . . . . . . . . . 123
5.2.1.4 Cotton Tomato weed dataset . . . . . . . . . . . . . . . 123
5.2.2 Split the datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.2.3 Deep learning models . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.2.4 Performance metrics . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.2.5 Traditional approach . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.2.6 Proposed approach . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.2.6.1 Image enhancement and image resize . . . . . . . . . . . 128
5.2.6.2 Generate image patches . . . . . . . . . . . . . . . . . . 129
5.2.6.3 Selection of relatively important image patches . . . . . 129


5.2.6.4 Model training with selected patches . . . . . . . . . . . 131


5.2.6.5 Evaluation of the models . . . . . . . . . . . . . . . . . . 131
5.2.7 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.3.1 Traditional pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.3.2 Patch-based approach . . . . . . . . . . . . . . . . . . . . . . . . 133
5.3.3 Performance improvement with enhanced image . . . . . . . . . . 136
5.3.4 Performance evaluation on the dataset with inter-class similarity
and intra-class dissimilarity . . . . . . . . . . . . . . . . . . . . . 136
5.3.5 Performance improvement on class imbalanced dataset . . . . . . 138
5.3.6 Comparison with traditional approach and prior studies . . . . . . 140
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.4.1 Comparison to related work . . . . . . . . . . . . . . . . . . . . . 140
5.4.1.1 Traditional vs patch-based pipeline . . . . . . . . . . . . 142
5.4.1.2 Patch-based approach . . . . . . . . . . . . . . . . . . . 143
5.4.1.3 Performance improvement with enhanced image . . . . . 144
5.4.2 Benchmarking of the results . . . . . . . . . . . . . . . . . . . . . 145
5.4.2.1 Performance evaluation on the dataset with inter-class
similarity and intra-class dissimilarity . . . . . . . . . . 145
5.4.2.2 Performance improvement on class imbalanced dataset . 146
5.4.3 Limitations and future work . . . . . . . . . . . . . . . . . . . . . 148
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

6 Generalised approach for weed recognition 150


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
6.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.2.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.2.2 Plant detection model . . . . . . . . . . . . . . . . . . . . . . . . 158
6.2.3 Prepare dataset to train the Siamese network . . . . . . . . . . . 158
6.2.4 Siamese neural network architecture . . . . . . . . . . . . . . . . . 159
6.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
6.3.1 Plant detection from images . . . . . . . . . . . . . . . . . . . . . 160
6.3.2 Similarity functions and models’ performance . . . . . . . . . . . 161


6.3.3 Performance of the models on the datasets . . . . . . . . . . . . . 162


6.3.4 The proposed Siamese network vs state-of-the-art CNN models
based weed classification . . . . . . . . . . . . . . . . . . . . . . . 166
6.4 Limitation and future works . . . . . . . . . . . . . . . . . . . . . . . . . 169
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

7 Conclusion 171
7.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
7.1.1 Comprehensive literature review . . . . . . . . . . . . . . . . . . . 171
7.1.2 Weed classification pipeline and evaluation of deep learning . . . . 172
7.1.3 Weed detection and classification . . . . . . . . . . . . . . . . . . 172
7.1.4 Enhancing classification accuracy . . . . . . . . . . . . . . . . . . 173
7.1.5 Generalised weed recognition technique . . . . . . . . . . . . . . . 173
7.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7.2.1 Benchmark dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7.2.2 Deep learning in weed detection and classification . . . . . . . . . 175
7.2.3 Field trial of the proposed models . . . . . . . . . . . . . . . . . . 176

Bibliography 178

List of Figures

1.1 Example of crop and weed similarities. . . . . . . . . . . . . . . . . . . . 6

2.1 Weeds in different crops. . . . . . . . . . . . . . . . . . . . . . . . . . . . 12


2.2 A workflow of weed detection techniques using deep learning . . . . . . . 17
2.3 The number of selected publications from 2010 to 30 August 2020 . . . . 19
2.4 Taxonomy of deep learning-based weed detection techniques . . . . . . . 20

3.1 Sample crop and weed images of each class from the datasets. . . . . . . 67
3.2 The basic block diagram of DL models used for the experiments. . . . . . 73
3.3 Confusion matrix of “DeepWeeds” combined with the other three datasets. . . 79
3.4 Example of incorrectly classified images. . . . . . . . . . . . . . . . . . . 80
3.5 Confusion matrix after combining four datasets using ResNet-50 model . . 81
3.6 Confusion matrix for ResNet-50 model with augmentation . . . . . . . . 82
3.7 Confusion matrix for CW ResNet-50 and SOTA ResNet-50 model. . . . . 83

4.1 The proposed pipeline for detecting weeds in crops . . . . . . . . . . . . 93


4.2 Sample crop and weed images of each class from the dataset. . . . . . . . 94
4.3 Image level annotation vs instance level annotation . . . . . . . . . . . . 95
4.4 Bounding box annotation of crop and weed images. . . . . . . . . . . . . 96
4.5 Illustration of image augmentation . . . . . . . . . . . . . . . . . . . . . 98
4.6 Training accuracy curves (mAP) . . . . . . . . . . . . . . . . . . . . . . . 107
4.7 Example images from test dataset . . . . . . . . . . . . . . . . . . . . . . 109
4.8 Illustration of the effect of data augmentation . . . . . . . . . . . . . . . 112

5.1 An illustration of the proposed workflow . . . . . . . . . . . . . . . . . . 127


5.2 The workflow for enhancing and resizing the images. . . . . . . . . . . . 129
5.3 An illustration of the image patch selection process. . . . . . . . . . . . . 131


5.4 Deep learning models’ accuracy with respect to image size . . . . . . . . 135
5.5 Confusion matrix for DeepWeeds dataset using DenseNet201 model . . . 138
5.6 Confusion matrix for DenseNet201 model on DeepWeeds dataset. . . . . 142
5.7 Relationship between the number of data and classification accuracy . . . 143
5.8 Image of Chenopodium album weed classified as Bluegrass. . . . . . . . . 144
5.9 Example of a chinee apple weed classified as snake weed . . . . . . . . . . 146
5.10 Grad-CAM of the extracted patches from the image . . . . . . . . . . . . 147

6.1 Example of weed images from the Weed25 datasets. . . . . . . . . . . . . 154


6.2 The proposed pipeline for classifying weeds. . . . . . . . . . . . . . . . . 156
6.3 The training process of the Siamese network using Weed25 dataset. . . . 159
6.4 The evaluation process of our Siamese network. . . . . . . . . . . . . . . 160
6.5 The ground truth image with annotation and detected plants . . . . . . . 162
6.6 Example of the detection and classification of weeds . . . . . . . . . . . . 165

List of Tables

2.1 Number of documents resulted for the queries indicated . . . . . . . . . . 19


2.2 An overview of different DL approaches in weed detection . . . . . . . . . 21
2.3 List of publicly available datasets . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Data augmentation techniques used in the relevant studies . . . . . . . . 38
2.5 Image annotation techniques used for weed detection . . . . . . . . . . . 40
2.6 The evaluation metrics applied in the related works . . . . . . . . . . . . 55

3.1 Summary of crop and weed datasets . . . . . . . . . . . . . . . . . . . . . 66


3.2 Number of parameters in the DL models . . . . . . . . . . . . . . . . . . 71
3.3 The numbers of images for training, validation and testing. . . . . . . . . 74
3.4 Training, validation and testing accuracy. . . . . . . . . . . . . . . . . . . 75
3.5 Training, validation and testing accuracy by combining two of the datasets 77
3.6 The performance of five deep learning models . . . . . . . . . . . . . . . 80
3.7 Performance of five deep learning models for the real and augmented data 81

4.1 Number of images and annotations of crop and weed species. . . . . . . . 97


4.2 The number of objects to train, validate and test. . . . . . . . . . . . . . 98
4.3 Training and inference time for the models. . . . . . . . . . . . . . . . . . 105
4.4 Average precision by class and mean average precision. . . . . . . . . . . 108

5.1 A summary of the datasets used in this research . . . . . . . . . . . . . . 121


5.2 Performance of ten deep learning models . . . . . . . . . . . . . . . . . . 134
5.3 Performance comparison of the models (resized and enhanced images). . 137
5.4 F1 score of the deep learning models . . . . . . . . . . . . . . . . . . . . 139
5.5 Comparison among the classification accuracies . . . . . . . . . . . . . . 141

6.1 Overview of the datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . 157


6.2 Performance of YOLO models in detecting plants. . . . . . . . . . . . . . 161


6.3 Comparison between two similarity functions. . . . . . . . . . . . . . . . 163
6.4 The performance of models in recognising the weed class. . . . . . . . . . 164
6.5 Performance comparison between the SOTA CNN models and proposed
method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

Chapter 1

Introduction

The increase in global population has significant implications for agriculture and food
production. As the world’s population continues to grow, the demand for food rises,
putting pressure on agricultural systems to produce more efficiently and sustainably
(Schneider et al., 2011). Increasing agricultural production to meet the growing global
demand for food faces several challenges. These include limited arable land, water
scarcity, climate change, soil degradation and environmental sustainability (Canavari et
al., 2010; J.-W. Han et al., 2021; Nosov et al., 2020).

Precision agriculture emerges as a transformative solution to the challenges confronting
modern agriculture (Lindblom et al., 2017). Using advanced technologies and data-driven
methodologies, precision agriculture redefines the way crops are cultivated and livestock
are raised. Its impact is far-reaching, addressing critical issues that have long hampered
the efficiency and sustainability of global food production systems (Shukla et al., 2023;
Steup et al., 2019). One of the foremost challenges precision agriculture tackles is the
efficient use of resources. By precisely applying inputs such as water, fertilisers, herbicides
and pesticides, this approach minimises waste, optimises resource allocation, and reduces
the environmental footprint associated with conventional farming practices (Higgins et
al., 2017; R. K. Singh et al., 2021). As part of precision agriculture, many studies over the
past decade have sought feasible approaches to automatic weed management (A. Wang
et al., 2019).

1.1 Weed

Weed infestation in crops is a pervasive problem in agriculture and poses several
challenges for farmers. Weeds compete with crops for essential resources such as water,
nutrients, and sunlight (Iqbal et al., 2019). Their rapid growth can outpace that of the
cultivated plants, leading to reduced yields and compromised crop quality (Kogan, 1998).
They may overshadow crops, reducing the amount of sunlight reaching them, and their
extensive root systems can absorb nutrients that would otherwise be available to the crops.
Therefore, farmers have to spend considerable money, time, and effort to reduce weeds. It is
estimated that weeds cost Australia alone $4.8 billion in grain production, which includes
weed control costs and losses in the affected agricultural products (Robertson et al., 2016).
Weeds can be categorised as grass, sedge, and broadleaf weeds.

1.1.1 Manual approach to control weeds

The approaches to manage and control the impact of weeds depend on many factors.
However, the methods can be categorised into five main types: Preventative (prevent them
from being established), Cultural (by maintaining field conditions – low weed seed bank),
Mechanical (e.g., mowing, mulching and tilling), Biological (using weed’s native natural
enemy such as insects, grazing animals or disease), Chemical (using herbicide) (R. E.
Stewart, 2018).

Integrated Weed Management (IWM) seems to be agronomically, environmentally, and
economically acceptable. IWM is a combination of preventive practices with different
control methods (mechanical, physical, biological and chemical) (Scavo & Mauromicale,
2020). However, all of these approaches have many drawbacks. They demand considerable
money, time, and labour. In addition, they pose threats to the health of people, plants,
soil, animals, and the environment (Okese et al., 2020; Sakyi, 2019).

1.1.2 Automation in weed control systems

Automation in weed control systems represents a revolutionary approach to addressing
the challenges of managing weeds in agricultural fields (B. Liu & Bruch, 2020). It can be
beneficial both economically and environmentally. This technological advancement inte-
grates various automated solutions to enhance efficiency, reduce labour requirements, and
improve the precision of weed management. Robotic weeders, equipped with advanced
cameras and sensors, operate autonomously to navigate through fields. Using machine
learning algorithms, these robots can accurately distinguish between crops and weeds,
enabling precise and targeted weed removal. Machines equipped with herbicide spraying
systems can be used for the targeted application of herbicides over specific weed-infested
areas (Lameski et al., 2018).

Automated weed control systems can enhance operational efficiency by working con-
tinuously and at a consistent pace. Precision in weed control is significantly improved,
reducing the use of herbicides and minimising the impact on non-target crops (Korres
et al., 2019). Automation reduces reliance on manual labour and potentially lowers oper-
ational costs (Shaner & Beckie, 2014). Although many obstacles are yet to be addressed,
researchers are improving automated and sustainable weed management systems to help
overcome the agricultural production challenge of 2050 (Westwood et al., 2018).

1.2 Automation and industry needs

Automation in weed detection could be an essential tool for farmers, addressing crit-
ical challenges and meeting their evolving needs. The efficiency it can bring to weed
management is particularly noteworthy, allowing farmers to swiftly identify and address
weed issues before they compromise crop health.

Labour savings can be a significant benefit. Traditional weed control methods often
rely on manual labour, which is time-consuming and costly. Automation minimises the
need for human intervention, addressing labour shortages and allowing farmers to allocate
their workforce more strategically. Moreover, providing accurate data on weed distribu-
tion within fields enables farmers to make informed decisions about resource allocation,
contributing to the overall optimisation of farming practices.

Automated weed detection may allow more efficient application of treatments, reduc-
ing the overall use of herbicides and leading to potential cost savings for farmers. Besides,
automation may help farmers intervene promptly, ensuring higher yields and maintaining
crop quality.

Automation may provide farmers with real-time data on weed distribution and sever-
ity, empowering them to implement timely and effective weed control strategies. More-
over, Integrated Weed Management (IWM) strategies may benefit from automated weed
detection. Farmers can develop comprehensive and sustainable approaches to managing
weed populations on their farms by combining various weed control methods.

1.3 Weed recognition using artificial intelligence deep learning

Deep learning is a subset of machine learning that uses artificial neural networks to
model and solve complex problems. It is inspired by the structure and function of the
human brain, specifically the way neurons are interconnected to process information (Ag-
garwal et al., 2018; Choi et al., 2020; Nielsen, 2015). Weed recognition using deep learning
involves the application of neural network architectures, specifically deep convolutional
neural networks (CNNs), to identify and classify weeds in images or videos (Hasan et al.,
2021; Rakhmatulin et al., 2021). Deep learning has shown remarkable success in image
recognition tasks, making it well-suited for automated weed detection in agricultural set-
tings (Rakhmatulin et al., 2021; A. Wang et al., 2019). In the context of weed detection,
deep learning offers several advantages:

• Deep learning models, especially CNNs, automatically learn hierarchical features
from images, eliminating the need for manual feature engineering (Banan et al.,
2020). This capability allows the model to adapt and identify distinguishable pat-
terns associated with different weed species.

• Deep learning models excel in image recognition tasks, achieving high levels of
accuracy and precision (Chartrand et al., 2017). Their ability to discern nuanced
differences in visual features makes them well-suited for distinguishing between
crops and weed species.

• Transfer learning leverages models pre-trained on large image datasets, such as
ImageNet, and fine-tunes them for specific tasks like weed detection (Tan et al.,
2018). This approach is particularly beneficial when dealing with limited annotated
data, as it allows the model to benefit from knowledge gained on broader image
recognition tasks; a minimal fine-tuning sketch is given after this list.

• Deep learning models can adapt to diverse environmental conditions, lighting vari-
ations, and changes in crop and weed appearance (A. Wang et al., 2019). This
adaptability is crucial in agricultural settings where conditions can be dynamic and
challenging.

• Deep learning models can handle large-scale datasets and complex visual informa-
tion (Najafabadi et al., 2015). This capability is essential for capturing the diversity
of weed species, growth stages, and background variations commonly encountered
in agricultural imagery.

• Deep learning models can be optimised for real-time inference. This is particularly
relevant for applications like drone-based or tractor-mounted systems, where timely
decisions are critical for effective weed management (S. R. Saleem et al., 2023).

• Deep learning-based weed detection seamlessly integrates with precision agriculture
practices. The ability to precisely identify and locate weeds facilitates targeted
interventions, optimising the use of resources such as herbicides and minimising
environmental impact (Shaikh et al., 2022).
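
To make the transfer-learning advantage above concrete, the following is a minimal
PyTorch/torchvision sketch (assuming torchvision 0.13 or later): an ImageNet pre-trained
ResNet-50 is loaded, its convolutional backbone is frozen as a feature extractor, and only a
new classification head is trained on crop and weed images. The folder layout, class count
and hyperparameters are illustrative assumptions rather than the exact configuration used
in this thesis.

# Minimal transfer-learning sketch: reuse ImageNet pre-trained weights as a
# frozen feature extractor and train only a new classification head.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 9  # illustrative number of crop and weed classes

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet stats
])

# Hypothetical folder layout: weed_data/train/<class_name>/*.jpg
train_set = datasets.ImageFolder("weed_data/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for param in model.parameters():              # freeze the pre-trained backbone
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)   # new trainable head

criterion = nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

model.train()
for images, labels in train_loader:           # one epoch shown for brevity
    optimiser.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimiser.step()

Unfreezing some or all of the backbone layers with a smaller learning rate turns this
feature-extraction setup into full fine-tuning.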

1.4 Challenges in developing deep learning based weed management system

Developing an automated weed control system introduces several challenges. An essential
first and fundamental step in developing an automatic weed management system is
detecting and recognising weeds correctly (B. Liu & Bruch, 2020). Detecting weeds in
crops is challenging as both are plants with similar colour, texture, and shape. Figure
1.1 shows some example images where crop and weed plants share similar colour, texture
and shape. Researchers have attempted many approaches to identify weeds. However,
these studies usually focus on one crop in a specific geographic location. There is a lack
of studies focusing on a comprehensive weed management system for various crops. This
is because weed species vary depending on the geographical area, crop variety, and weather
and soil conditions. Therefore, most automatic weed management systems are site-specific
(López-Granados, 2011).

Figure 1.1: Example of crop and weed similarities: (a) weeds and crops with similar
colour (Haug & Ostermann, 2014); (b) texture similarities between crop and weed plants
(Bakhshipour & Jafari, 2018); (c) weeds and crops share similar shape (PyTorch, 2020).
Green boxes indicate crops and red boxes indicate weeds; weeds and crops have very
similar colour, texture and shape.

Moreover, deep learning models heavily depend on large and diverse datasets for
training (Shrestha & Mahmood, 2019). However, obtaining labelled datasets with a wide
range of weed species, growth stages, and environmental conditions can be challenging
(Hasan et al., 2021; Teimouri et al., 2018). Variability in data introduces difficulties in
creating robust models that generalise well to different scenarios. Furthermore, creating
labelled datasets for training deep learning models requires significant effort. Annotating
diverse and large-scale datasets that encompass various weed types, growth stages, and
environmental conditions is time-consuming and resource-intensive.

In addition, accurate identification of weeds is crucial for effective control. Deep learn-
ing models must be trained to distinguish between various weed species and differentiate
them from crops accurately (Osorio et al., 2020). Achieving high levels of precision and
recall in weed identification remains a persistent challenge. Besides, agricultural environ-
ments are dynamic, with variations in lighting, weather, and soil conditions. Developing
deep learning models that adapt to these changing conditions and maintain accuracy
under diverse scenarios is a significant challenge (A. Sharma et al., 2020).

1.5 Aims and Objectives

Given the use of deep learning in weed detection and its associated challenges mentioned
in Sections 1.3 and 1.4, we have the following aims and objectives in this thesis:


• Evaluate the performance of existing state-of-the-art deep learning models and in-
vestigate transfer-learning techniques by preserving the pre-trained weights for ex-
tracting the features of crop and weed datasets.

• Construct instance-level weed datasets and evaluate the datasets using several mod-
els to locate and classify weeds in crops.

• Investigate a weed classification pipeline for improving the classification accuracy
and address the issues which affect the performance of the models.

• Investigate a site-independent approach to recognise weed categories in crops.

1.6 Thesis contributions

Deep learning has demonstrated significant success in various domains, including
image and speech recognition, natural language processing, medical diagnosis, and au-
tonomous systems (S. Dong et al., 2021). Its ability to automatically learn hierarchical
representations from complex data makes it a powerful tool for solving intricate problems
in machine learning (Dargan et al., 2020). Deep learning methods offer several advan-
tages that significantly enhance the efficiency, accuracy, and overall effectiveness of weed
management in agriculture. The main contributions of this thesis are outlined as follows:

• Recognising weeds from images is challenging due to the visual similarities between
weeds and crop plants, exacerbated by varying imaging conditions and environmen-
tal factors. We investigated advanced machine learning techniques, specifically five
state-of-the-art deep neural networks, and evaluated them across multiple exper-
imental settings and dataset combinations. Transfer learning methods were also
explored, leveraging pre-trained weights to extract features and fine-tuning the
models with images from crop and weed datasets. The objective is to enhance the
models’ performance in accurately identifying and distinguishing between crop and
weed species in diverse agricultural scenarios.

• Most existing weed datasets lack the instance-level annotations needed for robust
deep learning-based object detection. We constructed a new dataset with instance-level
labelling, annotating bounding boxes around each weed or crop instance. Using this
dataset, we evaluated several deep learning models for crop and weed detection,
comparing their inference time and detection accuracy. Introducing data augmentation
techniques improved the results by addressing class imbalance. The findings suggest
that these deep learning techniques have the potential to be applied in developing an
automatic field-level weed detection system.

• Accurate classification of weed species within crop plants is vital for targeted treat-
ment. While recent studies demonstrate the potential of artificial intelligence,
particularly deep learning (DL) models, several challenges, including insufficient
training data and complexities like inter-class similarity and intra-class dissimilar-
ity, hinder their effectiveness. To address these challenges, we propose
an image-based weed classification pipeline. The pipeline involves enhancing im-
ages using generative adversarial networks, dividing them into overlapping patches,
and selecting informative patches for training deep learning models. Evaluation
of the proposed pipeline on four publicly available crop weed datasets with ten
state-of-the-art models demonstrates significant performance improvements. The
pipeline effectively handles intra-class and inter-class similarity challenges, show-
casing its potential for enhancing weed species classification in precision farming
applications.

• We introduced a novel approach for generalised weed detection, overcoming challenges
posed by diverse field contexts. Unlike traditional species-based classification,
the method categorises weeds based on morphological families aligned with farming
practices. It combines object detection for plant identification with a Siamese net-
work using advanced deep-learning models for weed classification. The approach is
validated using three publicly available datasets, where weed species are grouped
into three classes based on morphology. Comparative analysis with state-of-the-
art CNN models demonstrates significantly improved weed classification accuracy.
The proposed technique offers a practical solution for dataset-independent weed
detection, fostering sustainable agricultural practices.
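
As an illustration of the last contribution, the sketch below shows one common way to
realise such a Siamese classifier in PyTorch: two weight-sharing CNN branches map plant
images to embeddings, and the similarity between the embeddings indicates whether two
plants belong to the same morphological class (grass, sedge or broadleaf). The ResNet-18
backbone, cosine similarity and loss used here are illustrative assumptions; the backbones
and similarity functions actually investigated are described in Chapter 6.

# Illustrative Siamese-network sketch for morphology-based weed grouping.
import torch
from torch import nn
from torchvision import models

class SiameseNetwork(nn.Module):
    def __init__(self, embedding_dim=128):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        backbone.fc = nn.Linear(backbone.fc.in_features, embedding_dim)
        self.backbone = backbone            # shared weights for both branches

    def embed(self, x):
        return nn.functional.normalize(self.backbone(x), dim=1)

    def forward(self, x1, x2):
        # Cosine similarity in [-1, 1]; values near 1 mean "same morphology class".
        return nn.functional.cosine_similarity(self.embed(x1), self.embed(x2))

model = SiameseNetwork()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy training batch of image pairs: label 1 if both plants share a
# morphology class, otherwise 0.
img_a, img_b = torch.randn(4, 3, 224, 224), torch.randn(4, 3, 224, 224)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])

similarity = model(img_a, img_b)
loss = nn.functional.binary_cross_entropy((similarity + 1) / 2, labels)
loss.backward()
optimiser.step()

# At inference, a detected plant is assigned the morphology class whose
# reference images give the highest average similarity to the query.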


1.7 Thesis outline

The remainder of this thesis is structured as follows:

In Chapter 2, we extensively analyse current deep learning-based weed detection
and classification methods. The detailed examination covers four main procedures: data
acquisition, dataset preparation, the use of deep learning techniques for weed detection
and classification in crops, and the evaluation metrics employed.

Moving to Chapter 3, we assess various neural networks for weed recognition across
diverse experimental setups and dataset combinations. We also explore transfer-learning
techniques, preserving pre-trained weights to extract features from crop and weed datasets.

Chapter 4 involves repurposing a public dataset to create an instance-level weed
dataset. We annotate this dataset by drawing a bounding box around each instance and assigning
the appropriate species label to crops or weeds. To establish a benchmark, we evaluate
the dataset using multiple models to locate and classify weeds in crops.

Chapter 5 introduces a weed classification pipeline focusing on discriminative image
patches to enhance performance. We use generative adversarial networks to improve
images, dividing them into patches with a chosen subset for training deep learning models.

Chapter 6 presents a novel approach to generalised weed detection, addressing challenges
in diverse field contexts. Unlike traditional species-based classification, the method
categorises weeds according to morphological families aligned with farming practices. It
incorporates object detection for plant identification and a Siamese network with ad-
vanced deep-learning models for weed classification. Validation involves three publicly
available datasets, grouping weed species into three morphology-based classes. Com-
parative analysis with state-of-the-art CNN models reveals significantly enhanced weed
classification accuracy.

Finally, Chapter 7 provides concluding remarks on the thesis and explores potential
avenues for future research on recognising weeds in crops using deep learning and applying
these methods in actual field settings. Emphasis is placed on envisioning how these
advancements may lead to substantial commercial adoption of automatic weed control
technology in the foreseeable future.

Chapter 2

Literature Review

The rapid advances in Deep Learning (DL) techniques have enabled rapid detection,
localisation, and recognition of objects from images or videos. DL techniques are now
being used in many applications related to agriculture and farming. Automatic detec-
tion and classification of weeds can play an important role in weed management and so
contribute to higher yields. Weed detection in crops from imagery is inherently a chal-
lenging problem because both weeds and crops have similar colours (‘green-on-green’),
and their shapes and texture can be very similar at the growth phase. Also, a crop in
one setting can be considered a weed in another. In addition to their detection, the
recognition of specific weed species is essential so that targeted controlling mechanisms
(e.g. appropriate herbicides and correct doses) can be applied. In this paper, we review
existing deep learning-based weed detection and classification techniques. We cover the
detailed literature on four main procedures, i.e., data acquisition, dataset preparation,
DL techniques employed for detection, location and classification of weeds in crops, and
evaluation metrics. We found that most studies applied supervised learning techniques,
achieving high classification accuracy by fine-tuning pre-trained models on plant datasets,
and that high accuracy has already been achieved when a large amount of labelled data
is available.

This chapter has been published: Hasan, A. M., Sohel, F., Diepeveen, D., Laga, H., & Jones, M. G.
(2021). A survey of deep learning techniques for weed detection from images. Computers and Electronics
in Agriculture, 184, 106067.

2.1 Introduction

The world population has been increasing rapidly, and it is expected to reach nine
billion by 2050. Agricultural production needs to increase by about 70% to meet the
anticipated demands (Radoglou-Grammatikis et al., 2020). However, the agricultural
sector will face many challenges during this time, including a reduction of cultivatable
land and the need for more intensive production. Other issues, such as climate change and
water scarcity, will also affect productivity. Precision agriculture or digital agriculture
can provide strategies to mitigate these issues (Lal, 1991; Radoglou-Grammatikis et al.,
2020; Seelan et al., 2003).

Weeds are plants that can spread quickly and undesirably, and can impact on crop
yields and quality (Patel & Kumbhar, 2016). Weeds compete with crops for nutrition,
water, sunlight, and growing space (Iqbal et al., 2019). Therefore, farmers have to deploy
resources to reduce weeds. The management strategies used to reduce the impact of
weeds depend on many factors. These strategies can be categorised into five main types
(Sakyi, 2019): ‘preventative’ (prevent weeds from becoming established), ‘cultural’ (by
maintaining field hygiene – low weed seed bank), ‘mechanical’ (e.g., mowing, mulching
and tilling), ‘biological’ (using natural enemies of weeds such as insects, grazing animals
or disease), and ‘chemical’ (application of herbicides). These approaches all have draw-
backs. In general, there is a financial burden and they require time and extra work. In
addition, control treatments may impact the health of people, plants, soil, animals, or
the environment (Holt, 2004; Okese et al., 2020; Sakyi, 2019).

As the cost of labour has increased and people have become more concerned about
health and environmental issues, automation of weed control has become desirable (B. Liu
& Bruch, 2020). Automated weed control systems can be beneficial both economically
and environmentally. Such systems can reduce labour costs by using a machine to remove
weeds, and selective spraying techniques can minimise the use of herbicides (Lameski
et al., 2018).

To develop an automatic weed management system, an essential first step is to be
able to detect and recognise weeds correctly (B. Liu & Bruch, 2020). Detection of weeds
in crops is challenging as weeds and crop plants often have similar colours, textures,


Figure 2.1: Weeds in different crops (green boxes indicate crops and red boxes indicate
weeds): (a) occlusion of crop and weed (Haug & Ostermann, 2014); (b) colour and texture
similarities between crop and weed plants (Bakhshipour & Jafari, 2018); (c) shadow effects
in natural weed image (PyTorch, 2020); (d) effects of illumination conditions (Di Cicco
et al., 2017); (e) four different species of weeds that share similarities (inter-class similarity)
(Olsen et al., 2019); (f) sugar beet crop at different growth stages (intra-class variations)
(Giselsson et al., 2017); (g) effects of motion blur and noise (J. Ahmad et al., 2018;
Giselsson et al., 2017); (h) weeds can vary at different geographic/weather locations: weed
in carrot crop collected from Germany (left) (Haug & Ostermann, 2014) and Macedonia
(right) (Lameski et al., 2017).

and shapes. Figure 2.1 shows crop plants with weeds growing amongst them. Common
challenges in detection and classification of crops and weeds are occlusion (Figure 2.1a),
similarity in colour and texture (Figure 2.1b), plants shadowed in natural light (Figure
2.1c), colour and texture variations due to lighting conditions and illumination (Figure
2.1d) and different species of weeds which appear similar (Figure 2.1e). The same crop plants
or weeds may show dissimilarities during growth phases (Figure 2.1f). Motion blur and
noise in the image also increase the difficulty in classifying plants (Figure 2.1g). In
addition, depending on the geographical location (Figure 2.1h) and the variety of the
crop, weather and soil conditions, the species of weeds can vary (Jensen et al., 2020a).


A typical weed detection system follows four key steps: image acquisition, pre-processing
of images, extraction of features and detection and classification of weeds (Shanmugam
et al., 2020). Different emerging technologies have been used to accomplish these steps.
The most crucial of these steps is weed detection and classification. In recent years,
with advances in computer technologies, particularly graphical processing units (GPUs)
and embedded processors, Machine Learning (ML) techniques have become more widely
used for automatic detection of weed species (Gu et al., 2018; LeCun et al., 2015; Yu
et al., 2019b).

Deep learning (DL) is an important branch of ML. For image classification, object
detection, and recognition, DL algorithms have many advantages over traditional ML ap-
proaches (in this paper, by the term machine learning we mean traditional machine learning
approaches). Extracting and selecting discriminating features with ML methods is diffi-
cult because crops and weeds can be similar. This problem can be addressed efficiently by
using DL approaches based on their strong feature learning capabilities. Recently, many
research articles have been published on DL-based weed recognition, yet few review ar-
ticles have been published on this topic. Su (2020) recently published a review paper
in which the main focus was on the use of point spectroscopy, RGB, and hyperspectral
imaging to classify weeds in crops automatically. However, most of the articles covered
in this review have applied traditional machine learning approaches, with few citations
of recent papers. B. Liu and Bruch (2020) analysed a number of publications on weed
detection, but from the perspective of selective spraying.

We provide this comprehensive literature survey to highlight the great potential now
presented by different DL techniques for detecting, localising, and classifying weeds in
crops. We present a taxonomy of the DL techniques for weed detection and recognition,
and classify major publications based on that taxonomy. We also cover data collection,
data preparation, and data representation approaches. We provide an overview of differ-
ent evaluation metrics used to benchmark the performance of the techniques surveyed in
this article.

The rest of the paper is organised as follows. Existing review papers in this area
are discussed briefly in Section 2.2. Advantages of DL-based weed detection approaches
over traditional ML methods are discussed in Section 2.3. In Section 2.4, we describe
how the papers for review were selected. A taxonomy and an overview of DL-based
weed detection techniques are provided in Section 2.5. We describe four major steps of
DL-based approaches, i.e. data acquisition (Section 2.6), dataset preparation (Section
2.7), detection and classification methods (Section 2.10) and evaluation metrics (Section
2.11). In Section 2.8 we have highlighted the approaches to detection of weeds in crop
plants adopted in the related work. The learning methods applied in the relevant studies
are explained in Section 2.9. We summarise the current state of the field and provide
future directions in Section 2.12, and conclusions are provided in Section 2.13.

2.2 Related Surveys

ML and DL techniques have been used for weed detection, recognition and thus for
weed management. In 2018, Kamilaris and Prenafeta-Boldú (2018) published a survey of
40 research papers that applied DL-techniques to address various agricultural problems,
including weed detection. The study reported that DL-techniques outperformed
traditional image processing methods.

In 2016, Merfield (2016) discussed ten components that are essential, and possible
obstructions, to developing a fully autonomous mechanical weed management system. With
the advance in DL, it seems that the problems raised can now be addressed. Amend
et al. (2019) articulated that DL-based plant classification modules can be deployed not
only in weed management systems but also for fertilisation, irrigation, and phenotyping.
Their study explained how “Deepfield Robotics” systems could reduce labour required for
weed control in agriculture and horticulture.

A. Wang et al. (2019) highlighted that the most challenging part of a weed detection
technique is to distinguish between weed and crop species. They focused on different
machine vision and image processing techniques used for ground-based weed detection.
Brown and Noble (2005) made a similar observation. They reviewed remote sensing for
weed mapping and ground-based detection techniques. They also reported the limitations
of using either spectral or spatial features to identify weeds in crops. According to their
study, it is preferable to use both features.

Fernández-Quintanilla et al. (2018) reviewed technologies that can be used to monitor


weeds in crops. They explored different remotely sensed and ground-based weed moni-
toring systems in agricultural fields. They reported that weed monitoring is essential for
weed management. They foresaw that the data collected using different sensors could
be stored in cloud systems for timely use in relevant contexts. In another study, Moaz-
zam et al. (2019) evaluated a small number of DL approaches used for detecting weeds in
crops. They identified research gaps, e.g., the lack of large crop-weed datasets, of acceptable classification accuracy, and of generalised models for detecting different crop plants and weed species. However, the article covered only a handful of publications and, as such, did not adequately reflect the breadth and depth of the literature.

2.3 Traditional ML- vs DL-based Weed Detection Methods

A typical ML-based weed classification technique follows five key steps: image acquisition, pre-processing (such as image enhancement), feature extraction (often combined with feature selection), application of an ML-based classifier, and evaluation of the performance (Bini et al., 2020; César Pereira Júnior et al., 2020; Liakos et al., 2018; B. Liu & Bruch, 2020).

Different image processing methods have been applied for crop and weed classification
(Hemming & Rath, 2002; L. Tian et al., 2000; Woebbecke et al., 1995). By extracting
shape features, many researchers have identified weeds and crops using discriminant analysis
(Chaisattapagon, 1995; G. Meyer et al., 1998). In some other research, different colour
(Hamuda et al., 2017; Jafari et al., 2006; Kazmi et al., 2015b; Zheng et al., 2017) and
texture (Bakhshipour et al., 2017) features were used.

The main challenge in weed detection and classification is that both weeds and crops
can have very similar colours or textures. Machine learning approaches learn the features
from the training data that are available (Bakhshipour & Jafari, 2018). Understandably,
for traditional ML approaches, the combination of multiple modalities of data (e.g., shape, texture and colour) or of data from multiple sensors is expected to generate superior results to a single modality of data. Kodagoda et al. (2008) argued that colour
or texture features of an image alone are not adequate to classify wheat from weed


species Bidens pilosa. They used Near-Infrared (NIR) image cues with those features.
Sabzi et al. (2020) extracted eight texture features based on the grey level co-occurrence
matrix (GLCM), two spectral descriptors of texture, thirteen different colour features, five
moment-invariant features, and eight shape features. They compared the performance
of several algorithms, such as the ant colony algorithm, simulated annealing method,
and genetic algorithm for selecting more discriminative features. The performance of
the Cultural Algorithm, Linear Discriminant Analysis (LDA), Support Vector Machine
(SVM), and Random Forest classifiers was also evaluated to distinguish between crops
and weeds.

Karimi et al. (2006) applied SVM for detecting weeds in corn from hyperspectral im-
ages. In other research, Wendel and Underwood (2016) used SVM and LDA for classifying
plants. They proposed a self-supervised approach for discrimination. Before training the
models, they applied vegetation separation techniques to remove background and dif-
ferent spectral pre-processing to extract features using Principal Component Analysis
(PCA). Ishak et al. (2007) extracted different shape features and the feature vectors were
evaluated using a single-layer perceptron classifier to distinguish narrow and broad-leafed
weeds.

Conventional ML techniques require substantial domain expertise to construct a feature extractor from raw data. On the other hand, the DL approach uses a representation-
learning method where a machine can automatically discover the discriminative features
from raw data for classification or object detection problems (LeCun et al., 2015). A
machine can learn to classify directly from images, text and sounds (Patterson & Gibson,
2017). The ability to extract the features that best suit the task automatically is also
known as feature learning. As deep learning is a hierarchical architecture of learning, the
features of the higher levels of the hierarchy are composed of lower-level features (Hinton
et al., 2006; Najafabadi et al., 2015).

Several popular and high performing network architectures are available in deep learn-
ing. Two of the frequently used architectures are Convolutional Neural Networks (CNNs)
and Recurrent Neural Networks (RNNs) (Hosseini et al., 2020; LeCun et al., 2015). Al-
though CNNs are used for other types of data, the most widespread use of CNNs is
to analyse and classify images. The word convolution refers to the filtering process. A


[Figure 2.2 (workflow diagram): Data Acquisition (collect data, use public data) → Dataset Preparation (data labelling, image augmentation, synthetic data generation) → Image Pre-processing (background removal, denoising, removing motion blur, image enhancement, colour conversion, image resizing) → Apply Deep Learning-based Classifiers → Evaluation of the Model.]

Figure 2.2: A workflow of weed detection techniques using deep learning.

stack of convolutional layers is the basis of a CNN. Each layer receives the input data, transforms (convolves) them, and passes the output to the next layer. This convolutional operation progressively simplifies the data so that they can be better processed and understood. RNNs have a built-in feedback loop, which allows them to act as a forecasting engine. Feed-forward networks and CNNs take a fixed-size input and produce a fixed-size output. The signal flow of a feed-forward network is unidirectional, i.e., from input to output, so it cannot capture sequence or time-series information. RNNs overcome this limitation: the current inputs and outputs of the network are influenced by prior inputs. Long Short-Term Memory (LSTM) is a type of RNN (LeCun et al., 2015) that has a memory cell to retain important prior information, which can help improve performance. Depending on the network architecture, DL models comprise components such as convolutional layers, pooling layers, activation functions, dense/fully connected layers, encoder/decoder schemes, memory cells and gates (Patterson & Gibson, 2017).
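To make the convolutional architecture described above concrete, the following is a minimal sketch, written in PyTorch, of a small CNN that maps a fixed-size RGB image to crop/weed class scores. The layer sizes, the 224×224 input resolution and the two-class output are illustrative assumptions only and do not reproduce any specific network surveyed in this chapter.

```python
import torch
import torch.nn as nn

class SmallWeedCNN(nn.Module):
    """A minimal CNN: stacked convolution + pooling layers feeding a
    fully connected classifier. Layer sizes are illustrative only."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution = learned filtering
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling halves the spatial size
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, 64),                 # assumes 224x224 input images
            nn.ReLU(),
            nn.Linear(64, num_classes),                  # e.g. crop vs weed
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: a batch of four 224x224 RGB images -> class scores
model = SmallWeedCNN(num_classes=2)
scores = model(torch.randn(4, 3, 224, 224))
print(scores.shape)  # torch.Size([4, 2])
```

In a supervised setting, such a network would be trained with a standard cross-entropy loss on labelled crop/weed images.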

For image classification, object detection, and localisation, DL algorithms have many


advantages over traditional ML approaches. Because of their strong feature learning capabilities, DL methods can effectively extract discriminative features of crops and weeds. Moreover, as the amount of data grows, the performance of traditional ML approaches tends to saturate, whereas DL techniques show superior performance on large datasets (Alom et al., 2019). This characteristic is leading to the increasing application of DL approaches. Many of the research reports in Section 2.10 compare DL with other ML approaches for detecting weeds in crops. Figure 2.2 gives an overview of DL-based weed detection and recognition techniques.

Not all the steps outlined in Figure 2.2 need to be present in every method. Four major
steps are followed in this process. They are Data Acquisition, Dataset Preparation/Image
Pre-processing, Classification and Evaluation. In this paper, we describe the steps used
in different research work to discriminate between weeds and crops using DL techniques.

2.4 Paper Selection Criteria in this Survey

To provide an overview of the state of research, we have undertaken a comprehensive literature review. The process involved two major steps: (i) searching and selecting related studies
and (ii) detailed analysis of these studies. The main research question is: What is the
role of deep learning techniques for detecting, localising and classifying weeds in crops?
For collecting the related work based on this research question, we applied a keyword-
based search in Google Scholar, Web of Science, IEEE Xplore, Scopus, ScienceDirect,
Multidisciplinary Digital Publishing Institute (MDPI), Springer and Murdoch University
Library databases for journal articles and conference papers. The keyword search covered the period from 2010 to 30 August 2020. Table 2.1 shows the number of results returned for each search query.

After searching the above databases and removing duplicates, 988 documents remained. We further identified and counted those that used a DL-based methodology. Figure 2.3 shows the number of papers that used DL between 2010 and 30 August 2020. Before 2016, the number of publications in this area was very small, but there has been an upward trend since 2016. For this reason, articles published from 2016 onwards were used in this survey.


Table 2.1: Number of documents resulted for the queries indicated


No. | Academic Research Databases | Search Query | Number of Retrieved Documents
1. Google Scholar [“Weed Detection” OR “Weed management” OR “Weed Classification”] 998
AND [“Deep Learning” OR “Deep Machine Learning” OR “Deep Neu-
ral Network”]
2. Web of Science (Weed Detection OR Weed management OR Weed Classification) 124
AND (Deep Learning OR Deep Machine Learning OR Deep Neural
Network)
3. IEEE Xplore (((“All Metadata”:“Deep Learning”) OR “All Metadata”:“Deep Ma- 22
chine Learning”) OR “All Metadata”:“Deep Neural Network”) AND
Weed detection
4. ScienceDirect (“Weed Detection” OR “Weed management” OR “Weed Classifica- 87
tion”) AND (“Deep Learning” OR “Deep Machine Learning” OR “Deep
Neural Network”)
5. Scopus ((Weed AND detection) OR (Weed AND Management) OR (Weed 118
AND Classification)) AND ((Deep AND Learning) OR (Deep AND
Machine AND Learning) OR (Deep AND Neural AND Network))
6. MDPI (“Weed Detection” OR “Weed management” OR “Weed Classifica- 76
tion”) AND (“Deep Learning” OR “Deep Machine Learning” OR “Deep
Neural Network”)
7. SpringerLink (“Weed Detection” OR “Weed management” OR “Weed Classifica- 46
tion”) AND (“Deep Learning” OR “Deep Machine Learning” OR “Deep
Neural Network”)
8. Murdoch (“Weed Detection” OR “Weed management” OR “Weed Classifica- 179
University Library tion”) AND (“Deep Learning” OR “Deep Machine Learning” OR “Deep
Neural Network”)

[Figure 2.3 is a bar chart showing, for each year from 2010 to 30 August 2020, the number of documents retrieved by the searches alongside the number that used DL-based approaches.]

Figure 2.3: The number of selected publications on DL-based weed detection approaches from 2010 to 30 August 2020

2.5 An Overview and Taxonomy of Deep Learning-based Weed Detection Approaches

An overall taxonomy of DL-based weed detection techniques is shown in Figure 2.4.


The papers covered in this survey are categorised using this taxonomy and listed in Table
2.2.


Figure 2.4: An overall taxonomy of deep learning-based weed detection techniques

The related publications have been analysed based on the taxonomy in Figure 2.4.
Here, the data acquisition process, sensors and mounting vehicles are highlighted. More-
over, an overview of the dataset preparation approaches, i.e., image pre-processing, data generation and annotation, is also given. While analysing these publications, it was found that the related works either generate a weed map for the target site or produce a classification for each plant (crop/weed). For developing the classifiers, the researchers
applied supervised, unsupervised or semi-supervised learning approaches. Depending on
the learning approaches and the research goal, different DL architectures were used. An
overview of the related research is provided in Table 2.2. It shows the crop and weed
species selected for experimental work, the steps taken to collect and prepare the datasets,
and the DL methods applied in the research.


Table 2.2: An overview of different DL approaches used in weed detection

Reference | Crop | Weed Species | DL Architectures Applied | Operations Performed (based on Figure 2.4)

Espejo-Garcia Tomato, Cot- Black nightshade, velvetleaf Modified Xception, DC; (IP, IA,
et al. (2020) ton Inception-ResNet, ILA); PBC
VGGNet, MobileNet,
DenseNet

A. Wang et al. Sugar beet, Not specified FCN (DC, FR); (IP,
(2020) Oilseed IA, ILA); PBC

Le et al. (2020a) Canola, corn, Not specified Filtered Local Binary (ATV, MC);
radish Pattern with Contour (IP, IA, ILA);
Mask and Coefficient PBC
k (k-FLBPCM),
VGG-16, VGG-19,
ResNet-50, Inception-
v3

Hu et al. (2020) Not specified Chinee apple, Lantana, Parkinsonia, Inception-v3, ResNet- PD; IP; PBC
Parthenium, Prickly acacia, Rubber 50, DenseNet-202,
vine, Siam weed, Snake weed Inception-ResNet-v2,
GCN

Umamaheswari Carrot Not specified SegNet-512, SegNet- PD; IA; PBC


and Jain (2020) 256

H. Huang et al. Rice Leptochloa chinensis, Cyperus iria, FCN (DC, UAV);
(2020) Digitaria sanguinalis (L). Scop, (IP, PLA); WM
Barnyard Grass

Gao et al. Sugar beet Convolvulus sepium (hedge YOLO-v3, tiny DC; (IA, BBA);
(2020) bindweed) YOLO-v3 PBC

Sivakumar et Soybean Waterhemp, Palmer amaranthus, Single-Shot Detector (DC, UAV);


al. (2020) common lambsquarters, velvetleaf, (SSD), Faster R-CNN (IP, IA, BBA);
foxtail species WM

H. Jiang et al. Corn, lettuce, Cirsium setosum, Chenopodium al- GCN PD; (IP, ILA);
(2020) radish bum, bluegrass, sedge, other unspec- PBC
ified weed


Bosilj et al. Sugar Beets, Not specified SegNet PD; PLA; PBC
(2020) Carrots,
Onions

Yan et al. Paddy Alternanthera philoxeroides, Eclipta AlexNet DC; ILA; PBC
(2020) prostrata, Ludwigia adscendens,
Sagittaria trifolia, Echinochloa crus-
galli, Leptochloa chinensis

R. Zhang et al. Wheat Cirsium Setosum, Descurainia YOLO-v3, Tiny (DC, UAV);
(2020) Sophia, Euphorbia Helioscopia, YOLO-v3 (IP, PLA);
Veronica Didyma, Avena Fatu PBC

Lottes et al. Sugar beet Dicot weeds, grass weeds FCN MC; (IP, PLA);
(2020) PBC

Trong et al. Not specifies 12 species of “Plant Seedlings NASNet, ResNet, In- DC; ILA, PD
(2020) dataset”, 21 species of “CNU weeds ception–ResNet, Mo-
dataset” bileNet, VGGNet

Patidar et al. Not specified Scentless Mayweed, Chickweed, Mask R-CNN PD; PLA; PBC
(2020) Cranesbill, Shepherd’s Purse,
Cleavers, Charlock, Fat Hen, Maise,
Sugar beet, Common wheat, Black-
grass, Loose Silky-bent

Ramirez et al. Sugar beet Not specified DeepLab-v3, SegNet, (MC, UAV);
(2020) U-Net (IP, PLA)

Osorio et al. Lettuce Not specified YOLO-v3, Mask R- (MC, UAV);


(2020) CNN, SVM (IP, PLA)

Lam et al. Grasslands Rumex obtusifolius VGG-16 (DC, UAV);


(2020) (IP, PLA)

Sharpe et al. Strawberry, Goosegrass Tiny YOLO-v3 DC; (IP, BBA)


(2020) Tomato

Petrich et al. Not specified Colchicum autumnale U-Net (DC, UAV);


(2019) (IP, IA, BBA)


Czymmek et al. Carrot Not specified Faster YOLO-v3, tiny (DC, FR); ILA;
(2019) YOLO-v3 PBC

Partel et al. Blueberry Not specified Faster R-CNN, (DC, ATV);


(2019b) YOLO-v3, ResNet- (IP, ILA); PBC
50, ResNet-101,
Darknet-53

Partel et al. Pepper Portulaca weeds Tiny YOLO-v3, (DC, ATV);


(2019a) YOLO-v3 (IP, BBA);
PBC

Olsen et al. Not specified Chinee apple, Lantana, Parkinsonia, Inception-v3, ResNet- (DC, FR); (IP,
(2019) Parthenium, Prickly acacia, Rubber 50 ILA); PBC
vine, Siam weed, Snake weed

Kounalakis Clover, grass Broad-leaved dock AlexNet, VGG- (DC, FR);


et al. (2019) F, VGG-VD-16, PLA; PBC
Inception-v1, ResNet-
50, ResNet-101

Rasti et al. Mache salad Not specified Scatter Transform, (DC, FR); (IP,
(2019) Local Binary Pattern SDG, BBA);
(LBP), GLCM, Ga- PBC
bor filter, CNN

Sarvini et al. Chrysanthemum Para grass, Nutsedge SVM, Artificial Neu- DC; (IP, IA,
(2019) ral Network (ANN), ILA); PBC
CNN

Ma et al. (2019) Rice Sagittaria trifolia SegNet, FCN, U-Net DC; (IP, BBA);
PBC

Asad and Bais Canola Not specified U-Net, SegNet (DC, ATV);
(2019) (IP, IA, PLA);
PBC

Yu et al. Bermudagrass Hydrocotyle spp., Hedyotis cormy- VGGNet, DC; (IP, ILA);
(2019b) bosa, Richardia scabra GoogLeNet, De- PBC
tectNet


Abdalla et al. Oilseed Not specified FCN DC; (IA, PLA);


(2019) WM

Yu et al. Perennial rye- dandelion, ground ivy, spotted AlexNet, VGGNet, DC; (IP, ILA);
(2019a) grass spurge GoogLeNet, Detect- PBC
Net

Liang et al. Not specified Not specified CNN, Histogram of (DC, UAV);
(2019) oriented Gradients (IP, ILA); PBC
(HoG), LBP

Sharpe et al. Strawberry Carolina geranium VGGNet, DC; (IP, BBA);


(2019) GoogLeNet, De- PBC
tectNet

Fawakherji Sunflower, Not specified SegNet, U-Net, Bon- (DC, FR, PD);
et al. (2019) carrots, sugar Net, FCN8 PLA; PBC
beets

Valente et al. Grassland Rumex obtusifolius AlexNet (DC, UAV);


(2019) (IP, BBA);
PBC

Chechlinski Beet, Not specified Hybrid Network (DC, ATV);


et al. (2019) cauliflower, (IP, IA, PLA);
cabbage, PBC
strawberry

Brilhador et al. Carrot Not specified U-Net PD; (IA, PLA);


(2019) PBC

Binguitcha- Maise, com- Scentless Mayweed, common chick- ResNet-101 PD, (IP, IA,
Fare and mon wheat, weed, shepherd’s purse, cleavers, BBA); PBC
Sharma (2019) sugar beet Redshank, charlock, fat hen, small-
flowered Cranesbill, field pansy,
black-grass, loose silky-bent

Y. Jiang et al. Cotton Not specified Faster R-CNN DC, (IP, BBA);
(2019) PBC


Adhikari et al. Paddy Wild millet ESNet, U-Net, FCN- DC; (IP, IA,
(2019) 8s, and DeepLab-v3, PLA); PBC
Faster R-CNN, ED-
Net

Farooq et al. Sugar beet Alli, hyme, hyac, azol, other unspec- CNN, FCN, LBP, su- HC; (IP, PLA);
(2019) ified weeds perpixel based LBP, PBC
FCN-SPLBP

Knoll et al. Carrot Not specified CNN DC; (IP, PLA);


(2019) PBC

dos Santos Fer- Soybean grass, broadleaf weeds, Chinee ap- Joint Unsupervised PD; PBC
reira et al. ple, Lantana, Parkinsonia, Parthe- LEarning (JULE),
(2019) nium, Prickly acacia, Rubber vine, DeepCluster
Siam weed, Snake weed

Rist et al. Not specified Gamba grass U-Net SI; (IP, PLA)
(2019)

Skovsen et al. Clover Grass FCN-8s DC, (IP, PLA);


(2019) PBC

W. Zhang et al. Pasture Not specified CNN, SVM (DC, ATV);


(2018) (IP, IA, ILA);
PBC

Kounalakis Grasslands Broad-leaved dock AlexNet, VGG-F, (DC, FR);


et al. (2018) GoogLeNet BBA; PBC

H. Huang et al. Rice Not specified FCN, SVM (DC, UAV);


(2018c) BBA; WM


Teimouri et al. Not specified Common field speedwell, field pansy, Inception-v3 DC; (IP, ILA);
(2018) common chickweed, fat-hen, fine PBC
grasses (annual meadow-grass, loose
silky-bent), blackgrass, hemp-nettle,
shepherd’s purse, common fumi-
tory, scentless mayweed, cereal, bras-
sicaceae, maise, polygonum, oat
(volunteers), cranesbill, dead-nettle,
common poppy

Umamaheswari Carrot Not specified GoogleNet PD, (IA, BBA);


et al. (2018) PBC

Suh et al. Sugar beets Volunteer potato AlexNet, VGG-19, (DC, ATV);
(2018) GoogLeNet, ResNet- (IP, IA, ILA);
50, ResNet-101, PBC
Inception-v3

Farooq et al. Not specified Hyme, Alli, Azol, Hyac CNN HC, (IP, IA,
(2018a) BBA); PBC

Bah et al. Spinach, bean Not specified ResNet-18 (DC, UAV);


(2018) (IP, IA, BBA);
WM

Farooq et al. Not specified Hyme, Alli, Azol, Hyac CNN, HoG HC, (IP, ILA);
(2018b) PBC

Lottes et al. Sugar beet Not specified FCN (MC, FR); (IP,
(2018b) PLA); PBC

Sa et al. (2018) Sugar beet Galinsoga spec., Amaranthus SegNet (MC, UAV);
retroflexus, Atriplex spec., PLA; WM
Polygonum spec., Gramineae
(Echinochloa crus-galli, agropyron,
others.), Convolvulus arvensis, Stel-
laria media, Taraxacum spec.

H. Huang et al. Rice Not specified CNN, FCN (DC, UAV);


(2018b) (IP, PLA); WM


H. Huang et al. Rice Not specified FCN-8s, FCN-4s, (DC, UAV) (IP,
(2018a) DeepLab PLA); WM

Chavan and Maise, com- Scentless Mayweed, common chick- AlexNet, VGGNet, PD; PBC
Nandedkar mon wheat, weed, shepherd’s purse, cleavers, Hybrid Network
(2018) sugar beet Redshank, charlock, fat hen, small-
flowered Cranesbill, field pansy,
black-grass, loose silky-bent

Nkemelu et al. Maise, com- Scentless Mayweed, common chick- KNN, SVM, CNN PD; (IP, BBA);
(2018) mon wheat, weed, shepherd’s purse, cleavers, PBC
sugar beet Redshank, charlock, fat hen, small-
flowered Cranesbill, field pansy,
black-grass, loose silky-bent

Sa et al. (2017) Sugar beet Not specified SegNet (MC, UAV),


(IP, BBA), WM

Andrea et al. Maise Not specified LeNET, AlexNet, DC; (IP, IA,
(2017) cNET, sNET PLA); PBC

Dyrmann et al. Winter wheat Not specified FCN (DC, ATV);


(2017) (IP, BBA);
PBC

dos Santos Fer- Soybean Grass, broadleaf weeds AlexNet, SVM, Ad- (DC, UAV);
reira et al. aboost – C4.5, Ran- (IP, ILA); PBC
(2017) dom Forest

Tang et al. Soybean Cephalanoplos, digitaria, bindweed Back propagation DC; (IP, ILA);
(2017) neural network, SVM, PBC
CNN

Milioto et al. Sugar beet Not specified CNN (DC, UAV);


(2017) (IP, PLA);
PBC

Pearlstein et al. Lawn grass Not specified CNN (DC, FR); (IP,
(2016) SDG, BBA);
PBC


Di Cicco et al. Sugar beet Capsella bursa-pastoris, galium SegNet SDG, PBC
(2017) aparine

Dyrmann et al. Tobacco, thale Sherpherd’s-Purse , chamomile, CNN PD; (IP, IA);
(2016) cress, cleavers, knotweed family, cranesbill, chick- PBC
common weed, veronica, fat-hen, narrow-
Poppy, corn- leaved grasses, field pancy, broad-
flower, wheat, leaved grasses, annual nettle, black
maise, sugar nightshade
beet, cabbage,
barley

2.6 Data Acquisition

DL-based weed detection and classification techniques require an adequate amount of labelled data. Different modalities of data are collected using various types of sensors
that are mounted on a variety of platforms. Below we discuss the popular ways of weed
data collection.

2.6.1 Sensors and Camera Mounting Vehicle

2.6.1.1 Unmanned Aerial Vehicles (UAVs)

Unmanned Aerial Vehicles are often used for data acquisition in agricultural research.
Generally, UAVs are used for mapping weed density across a field by collecting RGB
images (H. Huang et al., 2018b, 2018c, 2020; Petrich et al., 2019) or multispectral images
(Osorio et al., 2020; Patidar et al., 2020; Ramirez et al., 2020; Sa et al., 2017, 2018).
In addition, UAVs can be used to identify crop rows and map weeds within crop rows
by collecting RGB (Red, Green and Blue colour) images (Bah et al., 2018). Valente et
al. (2019) used a small quad-rotor UAV for recording images from grassland to detect
broad-leaved dock (Rumex obtusifolius). As UAVs fly over the field at a certain height,


the images captured by them cover a large area. Some of the studies split the images
into smaller patches and use the patches to distinguish between weeds and crop plants
(dos Santos Ferreira et al., 2017; Milioto et al., 2017; Sivakumar et al., 2020). However,
the flight altitude can be kept low, e.g. 2 metres, so that each plant can be labelled as either a weed or a crop (Osorio et al., 2020; R. Zhang et al., 2020). Liang et al. (2019) collected image data using a drone flying at an altitude of 2.5 metres.
H. Huang et al. (2018a) collected images with a resolution of 3000×4000 pixels using a
sequence of forward-overlaps and side-overlaps to cover the entire field. Lam et al. (2020)
flew DJI Phantom 3 and 4 Pro drones with a RGB camera at three different heights (10,
15 and 20 m) to determine the optimal height for weed detection.

2.6.1.2 Field Robots (FRs)

Various types of field robot can also be used to collect images. A robotic vehicle can
carry one or more cameras. As previously discussed, robotic vehicles are used to collect
RGB images using mounted digital cameras (Czymmek et al., 2019; Fawakherji et al., 2019;
Kounalakis et al., 2019; Olsen et al., 2019; Rasti et al., 2019). Mobile phone in-built
cameras have also been used for such data collection. For example, an iPhone 6 was
used to collect video data by mounting it on a Robotic Rover (Pearlstein et al., 2016).
A robotic platform called “BoniRob” has been used to collect multi-spectral images from
the field (Lottes et al., 2018b, 2020). Kounalakis et al. (2018) used three monochrome
cameras mounted on a robot to take images. They argued that, since in most cases both weeds and crops are green, colour features are of little use for distinguishing them.

2.6.1.3 All-Terrain Vehicles (ATVs)

To collect images from the field, all-terrain vehicles have also been used. ATVs can
be mounted with different types of camera (Asad & Bais, 2019; Chechlinski et al., 2019;
Dyrmann et al., 2017; Partel et al., 2019b; W. Zhang et al., 2018). Le et al. (2020a)
used a combination of multi-spectral and spatial sensors to capture data. Even multiple
low-resolution webcams have been used on an ATV (Partel et al., 2019a). To maintain a specific camera height and to control lighting and illumination, custom-made mobile platforms have been used to carry the cameras for capturing RGB images (Skovsen et al.,


2019; Suh et al., 2018). When it is not possible to use any vehicle to collect images at a
certain height, tripods can be used as an alternative (Abdalla et al., 2019).

2.6.1.4 Data Collection without Camera Mounting Devices

On a few occasions, weed data have been collected with cameras that were not mounted on a vehicle. In such cases, video data were collected using handheld cameras (Adhikari et al.,
2019; Espejo-Garcia et al., 2020; Gao et al., 2020; Y. Jiang et al., 2019; Knoll et al.,
2019; Ma et al., 2019; Sarvini et al., 2019; Sharpe et al., 2020; Tang et al., 2017; Teimouri
et al., 2018; Yan et al., 2020; Yu et al., 2019b, 2019a). Sharpe et al. (2019) collected their
data by maintaining a certain height (130 cm) from the soil surface. Brimrose VA210
filter and JAI BM-141 cameras have been used to collect hyperspectral images of weeds
and crops without using any platform (Farooq et al., 2018a, 2018b, 2019). Andrea et al.
(2017) manually focused a camera on the target plants in such a way that it could capture
images, including all the features of these plants. Trong et al. (2020) focused the camera on different parts of the weeds, such as the flowers, leaves, fruits, or the whole plant structure.

2.6.2 Satellite Imagery

Rist et al. (2019) used the Pleiades-HR 1A satellite to collect high-resolution 4-band (RGB+NIR) imagery over the area of interest and applied masking to indicate the presence of weeds.

2.6.3 Public Datasets

There are several publicly available crop and weed datasets that can be used to train
the DL models. Chebrolu et al. (2017) developed a dataset containing weeds in sugar beet
crops. Another annotated dataset containing images of crops and weeds collected from
fields has been made available by Haug and Ostermann (2014). A dataset of annotated
(7853 annotations) crops and weed images was developed by Sudars et al. (2020), which
comprises 1118 images of six food crops and eight weed species. Leminen Madsen et al.
(2020) developed a dataset containing 7,590 RGB images with 315,038 plant objects,
representing 64,292 individual plants from 47 different species. These data were collected


in Denmark and made available for further use. A summary of the publicly available
datasets related to weed detection and plant classification is listed in Table 2.3.

We have listed nineteen publicly available datasets in Table 2.3 that can be used by researchers. Amongst these, researchers need to send a request to the owners of the “Perennial ryegrass and weed”, “CNU Weed Dataset” and “Sugar beet and hedge bindweed” datasets to obtain the data. The other datasets can be downloaded
directly on-line. Most of the datasets contain RGB images of food crops and weeds from
different parts of the world. The RGB data have generally been collected using high-
resolution digital cameras. However, Teimouri et al. (2018) used a Point Grey industrial camera. While acquiring data for the “DeepWeeds” dataset, the researchers fitted a “Fujinon CF25HA-1” lens to their “FLIR Blackfly 23S6C” camera and mounted the
camera on a weed control robot (“AutoWeed”). Chebrolu et al. (2017) and Haug and
Ostermann (2015) employed “Bonirob” (an autonomous field robot) to mount the multi-
spectral cameras. “Carrots 2017” and “Onions 2017” datasets were also acquired using
a multi-spectral camera, namely the “Teledyne DALSA Genie Nano”. These researchers
used a manually pulled cart to carry the camera. The “CNU Weed Dataset” has 208,477
images of weeds collected from farms and fields in the Republic of Korea, which is the highest number among the listed datasets. Though this dataset exhibits a class imbalance, it contains
twenty-one species of weeds from five families. Skovsen et al. (2019) developed a dataset
of red clover, white clover and other associated weeds. The dataset contains 31,600 unlabelled images together with 8,000 synthetic images. Their goal was to generate labels for
the data using unsupervised or self-supervised approaches. All the other datasets were
manually labelled using image level, pixel-wise or bounding box annotation techniques.

Dyrmann et al. (2016) used six publicly available datasets containing 22 different plant species for classification using deep learning methods. Several studies proposed an encoder-
decoder architecture to distinguish crops and weeds using the Crop Weed Field Image
Dataset (Brilhador et al., 2019; Umamaheswari & Jain, 2020; Umamaheswari et al.,
2018). The DeepWeeds dataset (Olsen et al., 2019) was used by Hu et al. (2020) to
evaluate their proposed method. In the study of H. Jiang et al. (2020), the “Carrot-Weed
dataset” (Lameski et al., 2017) was used together with their own “Corn, lettuce and weed” dataset. Fawakherji et al. (2019) collected data from a sunflower farm in Italy.


Table 2.3: List of publicly available crop and weed datasets


Dataset and Reference | Type/Number of Crop | Type/Number of Weed Species | Data Type | Sensor and Mounting Vehicle | Number of Images | Data Annotation | Data Location | Class Imbalance? | Source
Crop/Weed Field Image Carrot Not specified Multi- MC and FR 60 PLA Germany Yes https://github.com/cwfid/dataset
Dataset (Haug & Ostermann, spectral
2015) image
Dataset of food crops and weed Six crop Eight weed species RGB DC 1118 BBA Latvia Yes https://www.ncbi.nlm.nih.gov/
(Sudars et al., 2020) pmc/articles/PMC7305380/
DeepWeeds (Olsen et al., 2019) Not specified Eight weed species RGB DC and FR 17,509 ILA Australia No https://github.com/AlexOlsen/
DeepWeeds
Early crop weed dataset Tomato, cotton Black nightshad, vel- RGB 508 DC ILA Greece Yes https://github.com/AUAgroup/
(Espejo-Garcia et al., 2020) vetleaf early-crop-weed
Perennial ryegrass and weed Perennial rye- dandelion, ground RGB DC 33086 ILA USA No https://www.frontiersin.org/
(Yu et al., 2019a) grass ivy, and spotted articles/10.3389/fpls.2019.01422/
spurge full
Soybean and weed dataset (dos Soybean Grass and broadleaf RGB DC and UAV 400 ILA Brazil Yes https://www.kaggle.com/fpeccia/
Santos Ferreira et al., 2017) weeds weed-detection-in-soybean-crops
Open Plant Phenotype Not specified 46 most common RGB DC 7,590 BBA Denmark No https://gitlab.au.dk/
Database (Leminen Mad- monocotyledon AUENG-Vision/OPPD
sen et al., 2020) (grass) and dicotyle-
don (broadleaved)
weeds
Sugar beet and hedge Sugar beet Convolvulus sepium RGB DC 652 BBA Belgium Yes https://plantmethods.
bindweed dataset (Gao et (hedge bindweed) biomedcentral.com/articles/10.
al., 2020) 1186/s13007-020-00570-z
Sugar beet fields dataset (Che- Sugar beet Not specified Multi- MC and FR 12340 PLA Germany No https://www.ipb.uni-bonn.de/
brolu et al., 2017) spectral 2018/10/
image
UAV Sugarbeets 2015-16 Sugarbeets Not specified RGB DC and UAV 675 PLA Switzerland No https://www.ipb.uni-bonn.de/
Datasets (Chebrolu et al., data/uav-sugarbeets-2015-16/
2018)
Corn, lettuce and weed dataset Corn and lettuce Cirsium setosum, RGB DC 6800 ILA China No https://github.com/
(H. Jiang et al., 2020) Chenopodium album, zhangchuanyin/weed-datasets
bluegrass and sedge
Carrot-Weed dataset (Lameski Carrot Not specified RGB DC 39 PLA Republic Yes https://github.com/lameski/
et al., 2017) of Mace- rgbweeddetection
donia
Bccr-segset dataset (Le et al., Canola, corn, Not specified RGB DC 30,000 ILA Australia No https://academic.oup.com/
2020b) radish gigascience/article/9/3/giaa017/
5780256#200419497
Carrots 2017 dataset (Bosilj et Carrots Not specified Multi- MC and 20 PLA UK No https://lcas.lincoln.ac.uk/
al., 2020) spectral manually nextcloud/index.php/s/
image pulled cart RYni5xngnEZEFkR
Onions 2017 dataset (Bosilj et Onions Not specified Multi- MC and 20 PLA UK No https://lcas.lincoln.ac.uk/
al., 2020) spectral manually nextcloud/index.php/s/
image pulled cart e8uiyrogObAPtcN
GrassClover image dataset Red clover and Not specified RGB DC and 31,600 real and PLA Denmark Yes https://vision.eng.au.dk/
(Skovsen et al., 2019) white clover manually 8000 synthetic grass-clover-dataset/
operated images
platform
Leaf counting dataset Not specified Eighteen weed species RGB DC 9372 ILA Denmark Yes https://vision.eng.au.dk/
(Teimouri et al., 2018) leaf-counting-dataset/
CNU Weed Dataset (Trong et Not specified Twenty one species of RGB DC 208,477 ILA Republic Yes https://www.sciencedirect.
al., 2020) weed of Korea com/science/article/pii/
S0168169919319799#s0025
Plant Seedlings Dataset Three crop Nine weed species RGB DC 5539 ILA Denmark Yes https://www.kaggle.
(Giselsson et al., 2017) com/vbookshelf/
v2-plant-seedlings-dataset


To demonstrate the proposed method’s generalising ability, they also used two publicly available datasets containing images of carrots, sugar beets and associated weeds. Bosilj et al. (2020) also used those datasets along with the Carrots 2017 and Onions 2017 datasets.
The “Plant Seedlings” dataset is a publicly available dataset containing 12 different plant
species. Several studies used this dataset to develop a crop-weed classification model
(Binguitcha-Fare & Sharma, 2019; Chavan & Nandedkar, 2018; Nkemelu et al., 2018;
Patidar et al., 2020). dos Santos Ferreira et al. (2019) used DeepWeeds (Olsen et al.,
2019) and “Soybean and weed” datasets, which are publicly available.

While several datasets are publicly available, they are somewhat site- or crop-specific. As such, this research field has no widely used benchmark weed dataset comparable to ImageNet (Deng et al., 2009) or MS COCO (Lin et al., 2014).
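Most of the datasets in Table 2.3 are distributed as image files with image-level labels. The snippet below is a minimal sketch of how such a dataset could be loaded and split for training with torchvision, assuming the images have been reorganised into one sub-folder per class (datasets shipped with CSV label files would need a conversion step first); the folder path, input size and split ratio are placeholders.

```python
import torch
from torchvision import datasets, transforms

# Resize and convert images to tensors at a fixed network input size (224x224 here).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Assumes a layout like data/crop_weed/<class_name>/<image>.jpg (placeholder path).
dataset = datasets.ImageFolder("data/crop_weed", transform=preprocess)

# Hold out 20% of the images for validation (illustrative split).
n_val = int(0.2 * len(dataset))
train_set, val_set = torch.utils.data.random_split(
    dataset, [len(dataset) - n_val, n_val])

train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=32)
print(dataset.classes)  # class names inferred from the sub-folder names
```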

2.7 Dataset Preparation

After acquiring data from different sources, it is necessary to prepare the data for training, testing and validating models, because raw data are not always suitable for a DL model. Dataset preparation approaches include applying different image processing techniques, labelling the data, using image augmentation techniques to increase the number of input images or to impose variations in the data, and generating synthetic data for training. Commonly used image processing techniques are background removal, resizing the collected images, green component segmentation, removal of motion blur, de-noising, image enhancement, extraction of colour vegetation indices, and changing the colour model. Pearlstein et al. (2016) decoded video into a sequence of RGB images and then converted them into grayscale images. In another study, the camera was set to auto-capture mode to collect images in the TIFF format, which were then converted into the RGB colour model (Suh et al., 2018). Using three webcams on an ATV, Partel et al. (2019a) recorded videos and then extracted individual image frames from them. On some occasions, it was necessary to change the image format to train the model accurately, especially when using public datasets. For instance, Binguitcha-Fare and Sharma (2019) converted the “Plant Seedlings Dataset” (Giselsson et al., 2017) from PNG to JPEG format, as a number of studies have shown that the JPEG format is better for training Residual Network architectures (Ehrlich & Davis, 2019).


2.7.1 Image Pre-processing

The majority of relevant studies undertook some level of image processing before
providing the data as an input to the DL model. It helps the DL architecture to extract
features more accurately. Here we discuss image pre-processing operations used in the
related studies.

Image Resizing Farooq et al. (2018a) investigated the performance of deep convolutional neural networks as a function of spatial resolution, using three patch sizes: 30×30, 45×45, and 60×60 pixels. The smallest patch size achieved good accuracy and required less time to train the model. To make the processing faster and reduce
the computational complexity, most of the studies performed image resizing operations
on the dataset before inputting into the DL model. After collecting images from the
field, the resolution of the images is reduced based on the DL network requirement. Yu
et al. (2019b) used 1280×720 pixel-sized images to train DetectNet (Tao et al., 2016)
architecture and 640×360 pixels for GoogLeNet (Szegedy et al., 2015) and VGGNet (Si-
monyan & Zisserman, 2014) neural networks. The commonly used image sizes (in pixels) are 64×64 (Andrea et al., 2017; Bah et al., 2018; Milioto et al., 2017; W. Zhang et al.,
2018), 128×128 (Binguitcha-Fare & Sharma, 2019; Dyrmann et al., 2016; Espejo-Garcia
et al., 2020), 224×224 (Binguitcha-Fare & Sharma, 2019; H. Jiang et al., 2020; Olsen
et al., 2019), 227×227 (Suh et al., 2018; Valente et al., 2019), 228×228 (Le et al., 2020a),
256×256 (dos Santos Ferreira et al., 2017; Hu et al., 2020; Pearlstein et al., 2016; Petrich
et al., 2019; Tang et al., 2017), 320×240 (Chechlinski et al., 2019), 288×288 (Adhikari
et al., 2019), 360×360 (Binguitcha-Fare & Sharma, 2019).

Images with high resolution are sometimes split into a number of patches to reduce the
computational complexity. For instance, in the work of Rasti et al. (2019), images with a resolution of 5120×3840 pixels were split into 56 patches. Similar operations were performed
by Asad and Bais (2019), H. Huang et al. (2018b), and Ma et al. (2019) where they
divided the original images into tiles of size 912×1024, 1440×960 and 1000×1000 pixels.
Ramirez et al. (2020) captured only five high-resolution images using a drone, which were then split into small patches of size 480×360 without overlap and 512×512 with
30% overlap. Partel et al. (2019b) collected images using three cameras simultaneously


of resolution 640×480 pixels. They then merged those into a single image of 1920×480
pixels which was resized to 1024×256 pixels. Yu et al. (2019a) scaled down the images of
their dataset to 1224×1024 pixels, so that the training did not run low on memory. H.
Huang et al. (2018a) used orthomosaic imagery, which is usually quite large. They split
the images into small patches of 1000×1000 pixels. In the study of Sharpe et al. (2019),
the images were resized to 1280×720 pixels and then cropped into four sub-images. Osorio
et al. (2020) used 1280×960 pixel images with four spectral bands. By applying a union operation on the red, green and near-infrared bands, they generated a false-green image to highlight the vegetation. Sharpe et al. (2020) resized the collected image to
1280×853 pixels and then cropped it to 1280×720 pixels.
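The resizing and tiling operations described above can be reproduced with standard OpenCV calls. The sketch below is a generic illustration rather than any particular study's pipeline; the 1000×1000 tile size and 224×224 network input size are example values taken from those reported above, and the file name is a placeholder.

```python
import cv2
import numpy as np

def split_into_tiles(image: np.ndarray, tile: int = 1000):
    """Split a large field image into non-overlapping square tiles,
    discarding any incomplete border tiles."""
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(image[y:y + tile, x:x + tile])
    return tiles

# Example: tile a high-resolution field image, then resize each tile to the
# fixed input resolution expected by the network (e.g. 224x224).
image = cv2.imread("field_image.jpg")          # placeholder file name
tiles = split_into_tiles(image, tile=1000)
inputs = [cv2.resize(t, (224, 224)) for t in tiles]
```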

Background Removal H. Huang et al. (2020) collected images using a UAV and
applied image mosaicing to generate an orthophoto. Bah et al. (2018) applied Hough-
transform to highlight the aligned pixels and used Otsu-adaptive-thresholding method to
differentiate the background and green crops or weeds. On the other hand, for remov-
ing the background soil image, Milioto et al. (2017) applied the Normalised Difference
Vegetation Index (NDVI). They also used morphological opening and closing operations
to remove the noise and fill tiny gaps among vegetation pixels. To annotate the images
manually into respective classes, dos Santos Ferreira et al. (2017) applied the Simple Lin-
ear Iterative Clustering algorithm. This algorithm helps to segment weeds, crops, and
background from images. Image pre-processing was also used by Sa et al. (2017) to obtain a bounding box around crop plants or weeds and to remove the background. They first used image correlation and cropping for alignment and then applied
Gaussian blur, followed by a sharpening operation to remove shadows, small debris, etc.
Finally, for executing the blob detection process on connected pixels, Otsu’s method was
employed. Lottes et al. (2020) applied the pre-processing operation on red, green, blue,
and NIR channels separately. They also performed the Gaussian blur operation to remove
noise using a 5×5 kernel. To standardise the channels, the mean of all channel values was subtracted and the values were divided by their standard deviation. After that, they
normalised and zero-centred the channel values. Y. Jiang et al. (2019) applied a Contrast
Limited Adaptive Histogram Equalisation algorithm to enhance the image contrast and
reduce the image variation due to ambient illumination changes.


In the work of Le et al. (2020a) and Bakhshipour et al. (2017), all images were segmented using the Excess Green minus Excess Red index (ExG-ExR) method, which effectively removed the background. They also applied opening and closing morphological operations to the images and generated contour masks to extract features. On the other
hand, Asad and Bais (2019) argued that the Maximum Likelihood Classification tech-
nique performed better than thresholding techniques for segmenting the background soil
and green plants. According to Alam et al. (2020), images captured from the field had
many problems (e.g. lack of brightness). It was necessary to apply image pre-processing
operations to prepare the data for training. They performed several morphological oper-
ations to remove motion blur and light illumination. They also removed the noisy region
before applying segmentation operations for separating the background. Threshold-based
segmentation techniques had been used to separate the soil and green plants in an im-
age. In the reports of Espejo-Garcia et al. (2020) and Andrea et al. (2017), the RGB
channels of the images were normalised to avoid differences in lighting conditions before
removing the background. For vegetation segmentation, Otsu’s thresholding was applied,
followed by the ExG (Excess Green) vegetation indexing operation. However, Dyrmann
et al. (2016) used a simple excessive green segmentation technique for removing the back-
ground and detecting the green pixels. Knoll et al. (2019) converted the RGB image to
HSV colour space, applied a thresholding method and band-pass filtering, and then used binary masking to extract the image’s green component.
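A common form of the vegetation segmentation described above, combining an Excess Green index with Otsu's thresholding and morphological clean-up, can be sketched as follows. This is a generic implementation of those standard operations, not the exact code used in any of the cited studies; the file name is a placeholder.

```python
import cv2
import numpy as np

def segment_vegetation(bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask of green vegetation using the Excess Green
    index (ExG = 2g - r - b on chromatic coordinates) and Otsu's threshold."""
    img = bgr.astype(np.float32)
    b, g, r = cv2.split(img)
    total = b + g + r + 1e-6                      # avoid division by zero
    exg = 2 * (g / total) - (r / total) - (b / total)

    # Scale ExG to 8-bit so Otsu's automatic threshold can be applied.
    exg_8u = cv2.normalize(exg, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(exg_8u, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Morphological opening/closing to remove speckle noise and fill small gaps.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask

# Example: keep only vegetation pixels, zeroing out the soil background.
bgr = cv2.imread("plot_image.jpg")                # placeholder file name
mask = segment_vegetation(bgr)
vegetation_only = cv2.bitwise_and(bgr, bgr, mask=mask)
```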

Image Enhancement and Denoising Nkemelu et al. (2018) investigated the importance of image pre-processing by training the CNN model with both raw and processed data. They found that the model performance decreased without image pre-processing. They used Gaussian blur to smooth the images and remove high-frequency content. They then converted the images to HSV colour space. Using a
morphological erosion with an 11×11 structuring kernel, they subtracted the background
soil and produced foreground seedling images. Lottes et al. (2018b) reported that image
pre-processing improved the generalisation capabilities of a classification system. They
applied a 5×5 Gaussian kernel to remove noise and to normalise the data. They also zero-centred the pixel values of the image. The study of Sarvini et al. (2019) used Gaussian and median filters to remove Gaussian noise and salt-and-pepper noise, respectively. Tang


et al. (2017) also normalised the data to maintain zero mean and unit variance. In addition, they applied Principal Component Analysis and Zero-phase Component Analysis whitening to eliminate correlations among the data.

A. Wang et al. (2020) evaluated the performance of the DL model based on the input
representation of images. They applied many image pre-processing operations, such as
histogram equalisation, automatic adjustment of the contrast of images and deep photo
enhancement. They also used several vegetation indices including ExG, Excess Red,
ExG-ExR, NDVI, Normalised Difference Index, Colour Index of Vegetation, Vegetative
Index, Modified Excess Green Index, and Combined Indices. Liang et al. (2019)
split the collected data into blocks which contained multiple plants. The blocks were
then divided into sub-images with a single plant in them. After that, the histogram
equalisation operation was performed to enhance the contrast of the sub-images.
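The denoising and contrast-enhancement operations mentioned above (Gaussian and median filtering, histogram equalisation, channel standardisation) are standard image-processing steps. The sketch below is a generic example of such a pipeline; the filter and CLAHE parameters are illustrative and not those of any cited study.

```python
import cv2
import numpy as np

def enhance(bgr: np.ndarray) -> np.ndarray:
    """Denoise and enhance contrast before feeding an image to a model."""
    # Gaussian blur suppresses high-frequency noise; a median filter
    # removes salt-and-pepper noise.
    smoothed = cv2.GaussianBlur(bgr, (5, 5), 0)
    smoothed = cv2.medianBlur(smoothed, 5)

    # CLAHE (contrast-limited adaptive histogram equalisation) on the
    # lightness channel improves contrast under uneven illumination.
    lab = cv2.cvtColor(smoothed, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

def standardise(bgr: np.ndarray) -> np.ndarray:
    """Zero-centre each channel and scale to unit variance."""
    img = bgr.astype(np.float32)
    return (img - img.mean(axis=(0, 1))) / (img.std(axis=(0, 1)) + 1e-6)
```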

Rist et al. (2019) applied orthorectification and radiometric correction operations to process the satellite imagery. They then normalised the pixel values of each band. After that, the large satellite image was split into 2138 samples of pixel size 128×128.

2.7.2 Training Data Generation

To enlarge the training data, data augmentation was applied in several related studies. It is a very useful technique when the dataset is not large enough (Umamaheswari & Jain, 2020). Image augmentation techniques are also helpful when there is little variation (Sarvini et al., 2019) or class imbalance (Bah et al., 2018) among the images of the dataset. A. Wang et al. (2020) applied augmentation to the dataset to determine
the generalisation capability of their proposed approach. Table 2.4 shows different types
of data augmentation used in the relevant studies.

As shown in Table 2.4, different geometric transformation operations were applied to the data in most of the studies. Colour augmentation can be helpful when training a model for a real-time classification system, because the colour of an object varies with the lighting conditions and the motion of the sensors.



Table 2.4: Different types of data augmentation techniques used in the relevant studies

Image Augmentation Technique | Description | Reference
Rotation Rotate the image to the right or (Adhikari et al., 2019; Andrea et al., 2017; Bah et al.,
left on an axis between 1◦ and 359◦ 2018; Binguitcha-Fare & Sharma, 2019; Brilhador et al.,
(Shorten & Khoshgoftaar, 2019) 2019; Dyrmann et al., 2016; Espejo-Garcia et al., 2020;
Farooq et al., 2018a; Gao et al., 2020; Le et al., 2020a;
Sarvini et al., 2019; W. Zhang et al., 2018)
Scaling Use zooming in/out to resize the im- (Adhikari et al., 2019; Asad & Bais, 2019; Binguitcha-
age (H. Kumar, 2019). Fare & Sharma, 2019; Brilhador et al., 2019; Gao et al.,
2020)
Shearing Shift one part of the image to a direc- (Asad & Bais, 2019; Brilhador et al., 2019; Gao et al.,
tion and the other part to the oppo- 2020; Le et al., 2020a; W. Zhang et al., 2018)
site direction (Shorten & Khoshgof-
taar, 2019).
Flipping Flip the image horizontally or verti- (Abdalla et al., 2019; Adhikari et al., 2019; Asad &
cally (H. Kumar, 2019). Bais, 2019; Binguitcha-Fare & Sharma, 2019; Brilhador
et al., 2019; Chechlinski et al., 2019; Dyrmann et al.,
2016; Gao et al., 2020; Petrich et al., 2019; Sarvini et
al., 2019; W. Zhang et al., 2018)
Gamma Encode and decode the luminance (A. Wang et al., 2020)
Correction values of an image (Brasseur, n.d.).
Colour Isolating a single colour channel, in- (Adhikari et al., 2019; Asad & Bais, 2019; Bah et al.,
Space crease or decrease the brightness of 2018; Chechlinski et al., 2019; Espejo-Garcia et al.,
the image, changing the intensity val- 2020; Petrich et al., 2019; A. Wang et al., 2020)
ues in the histograms (Shorten &
Khoshgoftaar, 2019).
Colour Increase or decrease the pixel val- (Binguitcha-Fare & Sharma, 2019; Le et al., 2020a; Pet-
Space ues by a constant value and restrict- rich et al., 2019; Sarvini et al., 2019)
Transfor- ing pixel values to a certain min or
mations max value (Shorten & Khoshgoftaar,
2019).
Noise Injec- Injecting a matrix of random values (Espejo-Garcia et al., 2020; Petrich et al., 2019; Sarvini
tion to the image matrix. For example: et al., 2019)
Salt-Pepper noise, Gaussian noise etc
(Shorten & Khoshgoftaar, 2019).
Kernel fil- Sharpening or blurring the image (Asad & Bais, 2019; Bah et al., 2018; Espejo-Garcia et
tering (Shorten & Khoshgoftaar, 2019). al., 2020; Petrich et al., 2019)
Cropping Remove a certain portion of an im- (Adhikari et al., 2019; Asad & Bais, 2019; Farooq et al.,
age (Takahashi et al., 2018). Usually 2018a; Petrich et al., 2019)
this is done at random in case of data
augmentation (Shorten & Khoshgof-
taar, 2019).
Translation Shift the position of all the image (Abdalla et al., 2019; Asad & Bais, 2019; Brilhador et
pixels (S.-W. Huang et al., 2018). al., 2019)

Image data that are not collected from real environments but are created artificially or programmatically are known as synthetic data or images (Viraf, 2020). It is not always possible to obtain a large amount of labelled data to train a model. In such cases, synthetic data are an excellent alternative to use alongside real data. Several research studies show that artificial data can make a significant difference in classifying images (Andreini et al., 2020). In weed detection using DL approaches, synthetic data generation techniques are not applied very often. Rasti et al. (2019) used synthetically
generated images to train the model and achieved a good classification accuracy while


testing on a real dataset.

On the other hand, Pearlstein et al. (2016) created complex occlusion of crops and
weeds and generated variation in leaf size, colour, and orientation by producing synthetic
data. To minimise human effort for annotating data, Di Cicco et al. (2017) generated
synthetic data to train the model. For that purpose, they used a generic kinematic model
of a leaf prototype to generate a single leaf of different plant species and then meshed
that leaf to the artificial plant. Finally, they placed the plant in a virtual crop field for
collecting the data without any extra effort for annotation.

Skovsen et al. (2019) generated 8,000 synthetic images to help label a real dataset. To create the artificial data, they cropped out different parts of the plants, randomly selected
any background from the real data, applied image processing (e.g. rotation, scaling, etc.),
and added an artificial shadow using a Gaussian filter.
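The compositing strategy described above, cropping out plant parts and pasting them onto real backgrounds with random rotation and scaling, can be sketched as follows. This is an illustrative reconstruction under stated assumptions (an RGBA cutout with a transparent background, placeholder file names), not the authors' released code.

```python
import random
from PIL import Image

def composite(background: Image.Image, cutout: Image.Image) -> Image.Image:
    """Paste a plant cutout (RGBA, transparent background) onto a real
    background image at a random position, rotation and scale."""
    scale = random.uniform(0.5, 1.5)
    w, h = cutout.size
    plant = cutout.resize((int(w * scale), int(h * scale)))
    plant = plant.rotate(random.uniform(0, 360), expand=True)

    canvas = background.copy()
    x = random.randint(0, max(0, canvas.width - plant.width))
    y = random.randint(0, max(0, canvas.height - plant.height))
    canvas.paste(plant, (x, y), mask=plant)      # alpha channel used as paste mask
    return canvas

# Example (placeholder file names); the pasted location doubles as a label.
# synthetic = composite(Image.open("soil.jpg"), Image.open("clover_cutout.png"))
```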

2.7.3 Data Labelling

The majority of the reviewed publications used manually annotated data labelled by
experts for training the deep learning model in a supervised manner. The researchers
applied different annotation types, such as bounding box annotation, pixel-level annotation,
image-level annotation, polygon annotation, and synthetic labelling, based on the research
need. Table 2.5 shows different image annotation approaches used for weed detection.
However, H. Jiang et al. (2020) applied a semi-supervised method to label the images;
they used a few labelled images to annotate the unlabelled data. On the other hand, dos
Santos Ferreira et al. (2019) proposed a semi-automatic labelling approach. Unlike semi-
supervised data annotation, they did not use any manually labelled data, but applied the
clustering method to label the data. First, they divided the data into different clusters
according to their features and then labelled the clusters. Similar techniques were used
by Hall et al. (2018). Yu et al. (2019a) separated the collected images into two parts;
one with positive images that contained weeds, and the other with negative images without
weeds. Lam et al. (2020) proposed an object-based approach to generate labelled data.

As summarised in Table 2.5, commonly used annotation techniques are bounding boxes, pixel-wise labelling and image-level annotation.


Table 2.5: Different image annotation techniques used for weed detection using deep
learning
Type of Image Annotation | Description | Reference
Pixel Level Annotation Label each pixel whether Abdalla et al. (2019), Adhikari et al. (2019), Andrea et al.
it belongs to crop or (2017), Asad and Bais (2019), Bini et al. (2020), Bosilj et al.
weed in the image. (2020), Brilhador et al. (2019), Chechlinski et al. (2019), Fa-
rooq et al. (2019), Fawakherji et al. (2019), Hall et al. (2018),
H. Huang et al. (2018a, 2018b, 2020), Ishak et al. (2007), Knoll
et al. (2019), Kounalakis et al. (2019), Lam et al. (2020), Li-
akos et al. (2018), Lottes et al. (2018b, 2020), Milioto et al.
(2017), Osorio et al. (2020), Patidar et al. (2020), Ramirez et
al. (2020), Rist et al. (2019), Sa et al. (2018), Skovsen et al.
(2019), Umamaheswari and Jain (2020), and R. Zhang et al.
(2020)
Bounding There may be a mixture Bah et al. (2018), Binguitcha-Fare and Sharma (2019), Dyr-
Region Level Annotation

Boxes of weeds and crops in a mann et al. (2017), Farooq et al. (2018a), Gao et al. (2020),
Annotation single image. Using a H. Huang et al. (2018c), Y. Jiang et al. (2019), Kounalakis
bounding box the crops et al. (2018), Ma et al. (2019), Nkemelu et al. (2018), Partel
and weeds are labelled in et al. (2019a), Petrich et al. (2019), Rasti et al. (2019), Sa et
the image. al. (2017), Sharpe et al. (2019, 2020), Sivakumar et al. (2020),
and Valente et al. (2019)
Polygon This is used for seman- Patidar et al. (2020)
Annotation tic segmentation to de-
tect irregular shaped ob-
ject. It outlines the re-
gion of interest with ar-
bitrary number of sides.
Image Level Annotation Uses separate image for Alam et al. (2020), Czymmek et al. (2019), dos Santos Fer-
weeds and crops to train reira et al. (2017), Espejo-Garcia et al. (2020), Farooq et al.
the model. (2018b), H. Jiang et al. (2020), Le et al. (2020a), Liang et al.
(2019), Olsen et al. (2019), Partel et al. (2019b), Sarvini et al.
(2019), Suh et al. (2018), Tang et al. (2017), Teimouri et al.
(2018), Trong et al. (2020), A. Wang et al. (2020), Yan et al.
(2020), Yu et al. (2019b, 2019a), and W. Zhang et al. (2018)
Synthetic Labelling For training the model (Di Cicco et al., 2017; Pearlstein et al., 2016)
use generated and la-
belled data.

However, plants are irregular in shape: by using polygon annotation, the images of crops and weeds can be separated accurately. Synthetic labelling approaches can minimise labelling costs and help to generate large annotated datasets.

2.8 Detection Approaches

Studies in this area apply two broad approaches for detecting, localising, and classifying weeds in crops: i) localise every plant in an image and classify each plant either as a crop or as a weed; ii) map the density of weeds in the field. To detect weeds in crops, some studies also exploit the concept of “row planting”. In some of these studies, there are further classification steps to identify the weed species.


2.8.1 Plant-based Classification

To develop a weed management system, a major step is to classify every plant as a weed or a crop plant (Lottes et al., 2018a). The first task is to detect weeds, followed by localisation and, finally, classification. This approach is useful for real-time weed
management techniques. For instance, Raja et al. (2020) developed a real-time weeding
system where a robotic machine detected the weeds and used a knife to remove them. In
this case, it was necessary to label individual plants, whether as a weed or crop plant. In
traditional farming approaches, farmers usually apply a uniform amount of herbicide over
the whole crop in a field. A machine needs to identify individual crop plants and weeds
to apply automatic selective spraying techniques. In addition, identifying the weed species is also important for applying specific treatments (Lottes et al., 2020). We have found that
this approach has been used in most of the studies reported.

2.8.2 Weed Mapping

Mapping weed density can also be helpful for site-specific weed management and can
lead to a reduction in the use of herbicides. H. Huang et al. (2020) used the DL technique
to map the density of weeds in a rice field. An appropriate amount of herbicides can be
applied to a specific site based on the density map. The work of Abdalla et al. (2019) segmented the images and detected the presence of weeds in regions of the image. Using
a deep learning approach, H. Huang et al. (2018c) generated a weed distribution map of
the field. In addition, some researchers argued that weed mapping helps to monitor the
conditions of the field automatically (Sa et al., 2017, 2018). Farmers can monitor the
distribution and spread of weeds, and can take action accordingly.

2.9 Learning Methods

2.9.1 Supervised Learning

Supervised learning occurs when the datasets for training and validation are labelled. The dataset passed to the DL model as input contains the images along with the corresponding labels. That means, in supervised training, the model learns how to map a given input to a particular output based on the labelled dataset. Supervised learning is popular for solving classification and regression problems (Caruana & Niculescu-Mizil, 2006). In most of the related research, the supervised learning approach was used to train the DL models. Section 2.10 presents a detailed description of those DL architectures.

2.9.2 Unsupervised Learning

Unsupervised learning occurs when the training set is not labelled. The dataset passed as input to an unsupervised model has no corresponding annotations. The model attempts to learn the structure of the data and to extract distinguishing information or features from it. In this process, the objects in the dataset are divided into separate groups or clusters: the features of the objects within a cluster are similar to each other and differ from those of other clusters. In this way, unsupervised learning can group the objects of a dataset into separate categories. Clustering is one of the main applications of unsupervised learning (Barlow, 1989).
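As an illustration of how clustering can produce groups that a human then names, the following minimal Python sketch (not taken from any of the cited studies; the feature vectors, cluster count and class names are placeholders) clusters unlabelled image features with K-means and maps each cluster to a pseudo-label:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    features = rng.normal(size=(500, 128))   # stand-in for per-image feature vectors

    # Group the unlabelled images into clusters based on feature similarity.
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)
    cluster_ids = kmeans.labels_             # one cluster index per image

    # A person inspects a few images per cluster and assigns each cluster a name
    # (hypothetical class names for this example).
    cluster_to_class = {0: "crop", 1: "grass weed", 2: "broadleaf weed", 3: "soil"}
    pseudo_labels = [cluster_to_class[c] for c in cluster_ids]
    print(pseudo_labels[:10])

In practice, the feature vectors would come from a CNN or handcrafted descriptors rather than random numbers, and only the cluster naming step requires manual effort.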

Most of the relevant studies used a supervised learning approach to detect and classify weeds in crops automatically. However, dos Santos Ferreira et al. (2019) proposed unsupervised clustering algorithms with a semi-automatic data labelling approach in their research. They applied two clustering methods: Joint Unsupervised Learning (JULE) and Deep Clustering for Unsupervised Learning of Visual Features (DeepCluster). They developed the models using the AlexNet (Krizhevsky et al., 2012) and VGG-16 (Simonyan & Zisserman, 2014) architectures and initialised them with pre-trained weights. They achieved a promising result (accuracy of 97%) in classifying weeds in crops and reduced the cost of manual data labelling.

Tang et al. (2017) applied an unsupervised K-means clustering algorithm as a pre-training process to generate a feature dictionary. They then used those features to initialise the weights of the CNN model. They claimed that this can improve the generalisation ability of feature extraction and resolve the unstable identification problem. The proposed approach showed better accuracy than SVM, a Back Propagation neural network, and even a CNN with randomly initialised weights.


2.9.3 Semi-supervised Learning

Semi-supervised learning takes the middle ground between supervised and unsupervised learning (Lee, 2013). A few researchers used the Graph Convolutional Network (GCN) (Kipf & Welling, 2016), which is a semi-supervised model, in their research. The major difference between a CNN and a GCN is the structure of the input data: a CNN operates on regularly structured data, whereas a GCN uses a graph data structure (Mayachita, 2020). We discuss the use of GCN in the related work in Section 2.10.4.
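To make the difference concrete, the following NumPy sketch shows the single-layer propagation rule of a graph convolution from Kipf and Welling (2016), H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W); the graph, features and weights are toy values, not those of any cited study:

    import numpy as np

    A = np.array([[0., 1., 0.],   # adjacency matrix of a tiny 3-node graph
                  [1., 0., 1.],
                  [0., 1., 0.]])
    H = np.random.rand(3, 8)      # node features (e.g. one vertex per image region)
    W = np.random.rand(8, 4)      # learnable weight matrix of this layer

    A_hat = A + np.eye(3)                                    # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))   # normalisation D^-1/2
    H_next = np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)  # ReLU
    print(H_next.shape)           # (3, 4): updated features for the next layer

Each vertex thus updates its features by aggregating those of its neighbours, which is what allows label information to propagate from the few labelled vertices to the unlabelled ones.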

2.10 Deep Learning Architecture

Our analysis shows that the related studies apply different DL architectures to classify the weeds in crop plants based on the dataset and research goal. Most researchers compared their proposed models either with other DL architectures or with traditional machine learning approaches. Table 2.2 shows an overview of the different DL approaches used in weed detection. A CNN model generally consists of two basic parts: feature extraction and classification (Khoshdeli et al., 2017). In related research, some researchers applied CNN models using various permutations of feature extraction and classification layers. However, in most cases, they preferred to use state-of-the-art CNN models such as VGGNet (Simonyan & Zisserman, 2014), ResNet (deep Residual Network) (K. He et al., 2016), AlexNet (Krizhevsky et al., 2012), InceptionNet (Szegedy et al., 2015), and many more. Fully Convolutional Networks (FCNs) like SegNet (Badrinarayanan et al., 2017) and U-Net (Ronneberger et al., 2015) were also used in several studies.

2.10.1 Convolutional Neural Network (CNN)

2.10.1.1 Pre-trained Network

Suh et al. (2018) applied six well-known CNN models, namely AlexNet, VGG-19, GoogLeNet, ResNet-50, ResNet-101 and Inception-v3. They evaluated the network performance based on the transfer learning approach and found that pre-trained weights had a significant influence on training the model. They obtained the highest classification accuracy (98.7%) using the VGG-19 model, but it took the longest classification time. Considering that, the AlexNet model worked best for detecting volunteer potato plants in sugar beet according to their experimental setup. Even under varying light conditions, the model could classify plants with an accuracy of about 97%. The study of dos Santos Ferreira et al. (2017) also supported that. They compared the classification accuracy of AlexNet with SVM, Adaboost – C4.5, and the Random Forest model. The AlexNet architecture performed better than the other models in discriminating soybean crop, soil, grass, and broadleaf weeds. Similarly, Valente et al. (2019) reported that the AlexNet model with pre-trained weights showed excellent performance for detecting Rumex in grasslands. They also showed that increasing the heterogeneous characteristics of the input images might improve the model accuracy (90%). However, Lam et al. (2020) argued that, to detect Rumex in grassland, the VGG-16 model performed well with an accuracy of 92.1%.

Teimouri et al. (2018) demonstrated that, although the ImageNet dataset does not contain images of the relevant plant species, pre-trained weights from that dataset could still help to reduce the number of training iterations. They fine-tuned the Inception-v3 architecture for classifying eighteen weed species and determining growth stages based on the number of leaves. The model achieved classification accuracies of 46% to 78% and showed an average accuracy of 70% when counting the leaves. However, Olsen et al. (2019) differed from them. They developed a multi-class weed image dataset consisting of eight nationally significant weed species. The dataset contains 17,509 annotated images collected from different locations in northern Australia. They also applied the pre-trained Inception-v3 model along with ResNet-50 to classify the weed species (source code is available here: https://github.com/AlexOlsen/DeepWeeds). The average classification accuracy of ResNet-50 (95.7%) was slightly higher than that of Inception-v3 (95.1%). Bah et al. (2018) also used ResNet with pre-trained weights as they found it more useful for detecting weeds.

According to Yu et al. (2019b), Deep Convolutional Neural Networks (DCNNs) can perform well in detecting different species of weeds in bermudagrass. They used three models pre-trained on the ImageNet dataset and the KITTI dataset (Geiger et al., 2013), namely VGGNet, GoogLeNet and DetectNet. In another study, they added the AlexNet architecture to the previous models for detecting weeds in perennial ryegrass (Yu et al., 2019a). Though all the models performed well, DetectNet exhibited a slightly higher F1 score of ≥0.99. On the other hand, Sharpe et al. (2019) evaluated the performance of the VGGNet, GoogLeNet, and DetectNet architectures using two variations of images (i.e., whole and cropped images). They also agreed that the DetectNet model could detect and classify weeds in strawberry plants more accurately using cropped sub-images. They suggested that the most visible and prevalent part of the plant should be annotated rather than labelling the whole plant in the image.

Le et al. (2020a) proposed a model, namely Filtered LBP (Local Binary Patterns) with Contour Mask and coefficient k (k-FLBPCM). They compared the model with the VGG-16, VGG-19, ResNet-50, and Inception-v3 architectures. The k-FLBPCM method effectively classified barley, canola and wild radish with an accuracy of approximately 99%, which was better than the other CNN models (source code is available here: https://github.com/vinguyenle/k-FLBPCM-method). The network was trained using pre-trained weights from the ImageNet dataset.

Andrea et al. (2017) compared the performance of LeNET (LeCun et al., 1989), AlexNet, cNET (Gabor et al., 1996), and sNET (Qin et al., 2019) in their research. They found that cNET was better at classifying maize crop plants and their weeds. They further compared the performance of the original cNET architecture with a reduced number of filter layers (16 filter layers). The results showed that, with pre-processed images, 16 filter layers were adequate to classify the crops and weeds. Moreover, this made the model 2.5 times faster than the original architecture and helped to detect weeds in real-time.

Partel et al. (2019b) analysed the performance of Faster R-CNN (Ren et al., 2015), YOLO-v3 (Redmon & Farhadi, 2018), ResNet-50, ResNet-101, and Darknet-53 (Redmon, n.d.) models to develop a smart sprayer for controlling weeds in real-time. Based on precision and recall values, the ResNet-50 model performed better than the others. In contrast, Binguitcha-Fare and Sharma (2019) applied the ResNet-101 model. They demonstrated that the size of the input image could affect the performance of the ResNet-101 architecture. They used three different pixel sizes (i.e., 128px, 224px, and 360px) in their experiment and reported that model accuracy improved as the pixel size of the input image increased.

Trong et al. (2020) proposed a multi-modal DL approach for classifying species of weeds. In this approach, they trained five pre-trained DL models, namely NASNet, ResNet, Inception–ResNet, MobileNet, and VGGNet, independently. A Bayesian conditional probability-based technique and a priority weight scoring method were used to calculate the score vector of the models. A model with a better score had a higher priority in determining the class of a species. To classify weed species, they summed the probability vectors generated by the softmax layer of each model, and the species with the highest probability value was selected. Based on the experimental results, they argued that the performance of this approach was better than that of a single DL model.

2.10.1.2 Training from Scratch

Dyrmann et al. (2016) argued that a CNN model initialised with pre-trained weights that had not been trained on any plant images would not work well. They therefore built a
new architecture using a combination of convolutional layers, batch normalisation, acti-
vation functions, max-pooling layers, fully connected layers, and residual layers according
to their need. The model was used to classify twenty-two plant species, and they achieved
a classification accuracy ranging from 33% to 98%.

Milioto et al. (2017) built a CNN model for blob-wise discrimination of crops and weeds. They used multi-spectral images to train the model. They investigated different combinations of convolutional layers and fully connected layers to find an optimised, lightweight model that was free of over-fitting problems. Finally, using three convolutional layers and two fully connected layers, they obtained a better result. They stated that this approach did not rely on any geometric priors such as the crops being planted in rows. Farooq et al. (2018a) claimed in their research that the classification accuracy of a CNN model depended on the number of hyperspectral bands and the resolution of the image patches. They also built a CNN model using a combination of convolutional, nonlinear transformation, pooling and dropout layers. In further research, they showed that a CNN model trained with a higher number of bands could classify images more accurately than a HoG (Histogram of Oriented Gradients) based method (Farooq et al., 2018b).
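For illustration, a minimal Keras sketch of a small crop–weed classifier built from scratch with three convolutional and two fully connected layers is given below; the filter counts, input size and class count are assumptions made for this example rather than the settings used in the cited studies:

    from tensorflow.keras import layers, models

    # Three convolutional blocks for feature extraction, two dense layers for
    # classification; all sizes are placeholders for this illustration.
    model = models.Sequential([
        layers.Input(shape=(128, 128, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),                    # helps against over-fitting
        layers.Dense(2, activation="softmax"),  # e.g. crop vs weed
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()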

Nkemelu et al. (2018) compared the CNN’s performance with an SVM (61.47%) and the K-Nearest Neighbour (KNN) algorithm (56.84%) and found that the CNN could distinguish crop plants from weeds better. They used six convolutional layers and three fully connected layers in the CNN architecture to achieve an accuracy of 92.6%. They also
evaluated the accuracy of CNN using the original images and the pre-processed images.
The experimental results suggested that classification accuracy improved by using pre-
processed images. Sarvini et al. (2019) agreed that CNN offers better accuracy than
SVM and ANN in detecting weeds in crop plants because of its deep learning ability.
Liang et al. (2019) employed a CNN architecture consisting of three convolutional layers, three pooling layers, four dropout layers, and a fully connected layer to develop a low-cost weed recognition system. Their experiment also showed that the classification performance of the CNN model was better than that of the HoG and LBP methods. W. Zhang et al. (2018) also demonstrated that a CNN model was better than SVM for detecting broadleaf weeds in pastures. They used a CNN model with six convolutional layers and three fully connected classification layers. The model could recognise weeds with an accuracy of 96.88%, whereas SVM achieved a maximum accuracy of 89.4%.

Pearlstein et al. (2016) used synthetic data to train their CNN model and evaluated
it on real data. They built a CNN model with five convolutional layers and two fully
connected layers. The results showed that the CNN could classify crop plants and weeds very well from natural images, even with multiple occlusions. Although Rasti et al. (2019) applied the same architecture in their research, they argued that the Scatter Transform method achieved better accuracy than the CNN architecture on a small dataset. They compared several machine learning approaches, such as the Scatter Transform, LBP, GLCM and Gabor filters, with the CNN model. They also used synthetic data for training and evaluated the
models’ performance on real field images.

2.10.2 Region Proposal Networks (RPN)

Based on the tiny YOLO-v3 (Yi et al., 2019) framework, Gao et al. (2020) proposed
a DL model which speeds up the inference time of classification (source code is avail-
able here: https://drive.google.com/file/d/1-E_b_5oqQgAK2IkzpTf6E3X1OPm0pjqy/
view?usp=sharing). They added two extra convolutional layers to the original model
for better feature fusion and also reduced the number of detection scales to two. They
trained the model with both synthetic data and real data. Although YOLO-v3 achieved better classification accuracy in the experiments, they recommended the tiny YOLO-v3 model for real-time applications. Sharpe et al. (2020) also used the tiny YOLO-v3 model to
detect goosegrass in strawberry and tomato plants.

YOLO-v3 and tiny YOLO-v3 models were also employed in research by Partel et al. (2019a). The aim was to find a low-cost, smart weed management system. They applied the models on two machines with different hardware configurations. Their paper reported that YOLO-v3 showed good performance when tested on powerful and expensive computers, but the processing speed decreased if executed on a lower-powered computer. From their experiments, they concluded that the tiny YOLO-v3 model was preferable for saving hardware costs. W. Zhang et al. (2018) also preferred to use tiny YOLO-v3 instead of YOLO-v3, because it was a lightweight method and took less time and fewer resources to classify objects. In contrast, Czymmek et al. (2019) proposed using YOLO-v3 with a relatively large input image size (832 × 832 pixels). They argued that the model performed better in their research with a small dataset. They agreed that tiny YOLO-v3 or Fast YOLO-v3 could improve the detection speed, but at the cost of some model accuracy.

Sivakumar et al. (2020) trained and evaluated pre-trained Faster R-CNN and SSD (Single Shot Detector) (W. Liu et al., 2016) object detection models to detect late-season weeds in soybean fields. Moreover, they compared these object detection models with a patch-based CNN model. The results showed that Faster R-CNN performed better in terms of weed detection accuracy and inference speed. Y. Jiang et al. (2019) proposed the Faster R-CNN model to detect the weeds and crop plants and to count the number of seedlings from video frames. They used the Inception-ResNet-v2 architecture as the feature extractor. On the other hand, by applying the Mask R-CNN model on the “Plant Seedlings Dataset”, Patidar et al. (2020) achieved more than 98% classification accuracy. They argued that Mask R-CNN detected plant species more accurately and with less training time than FCN.

Osorio et al. (2020) compared two RPN models, namely YOLO-v3 and Mask R-CNN, with SVM. The classification accuracy of the RPN architectures was 94%, whereas SVM achieved 88%. However, they reported that, as SVM required less processing capacity, it could be used for IoT-based solutions.


2.10.3 Fully Convolutional Networks (FCN)

Unlike CNN, FCN replaces all the fully connected layers with convolutional layers
and uses a transposed convolution layer to reconstruct the image with the same size as
the input. It helps to predict the output by making a one-to-one correspondence with
the input image in the spatial dimension (H. Huang et al., 2020; Shelhamer et al., 2017).
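A minimal Keras sketch of this idea is shown below: the network contains only convolutional layers, a 1×1 convolution takes the place of the fully connected classifier, and a transposed convolution restores the input resolution so that every pixel receives a class score. The layer sizes and class count are illustrative assumptions, not the configuration of any cited FCN:

    from tensorflow.keras import layers, models

    num_classes = 3                                    # e.g. background, crop, weed
    inputs = layers.Input(shape=(256, 256, 3))
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)                       # 256 -> 128
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)                       # 128 -> 64
    x = layers.Conv2D(num_classes, 1)(x)               # 1x1 conv replaces dense layers
    x = layers.Conv2DTranspose(num_classes, kernel_size=8, strides=4,
                               padding="same")(x)      # upsample 64 -> 256
    outputs = layers.Softmax(axis=-1)(x)               # per-pixel class probabilities
    fcn = models.Model(inputs, outputs)
    fcn.summary()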

H. Huang et al. (2018b) compared the performance of AlexNet, VGGNet, and GoogLeNet
as the base model for FCN architecture. VGGNet achieved the best accuracy among
those. They further compared the model with patch-based CNN and pixel-based CNN
architectures. The result showed that the VGG-16 based FCN model achieved the highest
classification accuracy. On the other hand, H. Huang et al. (2018c) applied ResNet-101
and VGG-16 as a baseline model of FCN for segmentation. They also compared the per-
formance of the FCN models with a pixel-based SVM model. In their case, ResNet-101
based FCN architecture performed better. Asad and Bais (2019) compared two FCN ar-
chitecture for detecting weeds in canola fields, i.e., SegNet and U-Net. They used VGG-16
and ResNet-50 as the encoder block in both the models. The SegNet with ResNet-50 as
the base model achieved the highest accuracy.

According to Ma et al. (2019), SegNet (accuracy 92.7%) architecture was better than
traditional FCN (accuracy 89.5%) and U-Net (accuracy 70.8%) for weed image segmen-
tation when classifying rice plants and weeds in the paddy field. The study of Abdalla
et al. (2019) reported that the accuracy of image segmentation depended on the size of
the dataset. That is why it is difficult to train a model from scratch. To address this
problem, they applied transfer learning and real-time data augmentation to train the
model. In their experiment, they used VGG-16 based SegNet architecture. They applied
three different transfer learning approaches for VGG-16. Moreover, the performance of
the model was compared with the VGG-19 based architecture. The VGG-16 based Seg-
Net achieved the highest accuracy of 96% when they used pre-trained weights only for
feature extraction and the shallow machine learning classifier (i.e., SVM) for segmenta-
tion. Sa et al. (2017) also applied SegNet with the pre-trained VGG-16 as the base model
(source code is available here: https://github.com/inkyusa/weedNet). They trained the
model by varying the number of channels in the input images. They then compared the inference speed and accuracy of different arrangements by deploying the model on an embedded GPU system, which was carried by a small micro aerial vehicle (MAV).
Umamaheswari and Jain (2020) compared the performance of SegNet-512 and SegNet-
256 encoder-decoder architectures for semantic segmentation of weeds in crop plants. The
experiment proved that SegNet-512 was better for classification. In the study of Di Cicco
et al. (2017), the SegNet model was trained using synthetic data, and the performance
was evaluated on a real crop and weed dataset.

Fawakherji et al. (2019) proposed a U-Net architecture using VGG-16 as the encoder for semantic segmentation. They also applied a VGG-16 model for classifying the crop plants and weeds. They trained the model with one dataset containing a sunflower crop and evaluated it with two different datasets containing carrot and sugar beet crops. In the work of Rist et al. (2019), a ResNet-based U-Net model was employed to map the presence of gamba grass in satellite imagery. However, Ramirez et al. (2020) compared the performance of DeepLab-v3 (L.-C. Chen et al., 2017) with the SegNet and U-Net models in their research. The results demonstrated that the DeepLab-v3 architecture achieved better classification accuracy using class-balanced data with greater spatial context.

Lottes et al. (2018b) also proposed FCN architecture using DenseNet as a baseline
model. Their novel approach provided a pixel-wise semantic segmentation of crop plants
and weeds. The work of Lottes et al. (2020) proposed a task-specific decoder network.
As the plants were sown at a regular distance, they trained the model in such a way that it could learn the spatial plant arrangement from the image sequence. They then fused this sequential feature with the visual features to localise and classify weeds in crop
plants. Dyrmann et al. (2017) used FCN architecture not only for segmentation but also
for generating bounding boxes around the plants. They applied pre-trained GoogLeNet
architecture as the base model.

According to A. Wang et al. (2020), changes in the input representation can make a difference to classification performance. They employed an encoder-decoder deep learning network for semantic segmentation of crop and weed plants, initialising the input layers with pre-trained weights. They evaluated the model with different input representations, including NIR information and colour space transformations of the input, which improved crop-weed segmentation and classification accuracy (96%). Sa et al. (2018) also evaluated different input representations to train the network. They applied a VGG-16 based SegNet architecture for detecting background, crop plants and weeds.
The model was evaluated by varying the number of spectral bands and changing the
hyper-parameters. The experimental results showed that the model achieved far better
accuracy by using nine spectral channels of an image rather than the RGB image.

H. Huang et al. (2018a) noted that the original FCN-4s architecture was designed for the PASCAL VOC 2011 dataset, which covers a much larger set of object classes. However, their dataset had only three categories (i.e., rice, weeds, and others). As a result, they reduced the feature maps of the intermediate layers to 2048. They then compared the accuracy and efficiency of the model with the original FCN-8s and DeepLab architectures and showed that the modified FCN-4s model performed better. For the same reason, Bosilj et al. (2020) simplified the original architecture of SegNet and named it SegNet-Basic. They decreased the number of convolutional layers from 13 to 4.

One of the problems with the basic FCN architecture is that the spatial features cannot be recovered properly, which can decrease prediction accuracy. To address this problem, H. Huang et al. (2020) improved the model by adding
skip architecture (SA), fully connected conditional random fields and partially connected
conditional random fields. They fine-tuned AlexNet, VGGNet, GoogLeNet, and ResNet
based FCN. They then compared the performance of different FCNs and Object-based
image analysis (OBIA) method. Experimental results reported that the VGGNet-based
FCN with proposed improvements achieved the highest accuracy.

Brilhador et al. (2019) modified the original U-Net architecture for pixel-level classi-
fication of crop plants and weeds. They added a convolutional layer with a kernel size of
1×1. For that change, they adjusted the input size of the network. In addition, they replaced the ReLU activation functions with the Exponential Linear Unit (ELU), used the Adadelta optimiser instead of stochastic gradient descent, and included dropout layers between convolutional layers. Petrich et al. (2019) also modified the U-Net model
to detect one species of weed in grasslands.


2.10.4 Graph Convolutional Network (GCN)

Hu et al. (2020) proposed Graph Weeds Net (GWN), a graph-based deep learning architecture for classifying weed species. They used ResNet-50 and DenseNet-202 models to learn vertex features, with graph convolution layers, vertex-wise dense layers, and multi-level graph pooling mechanisms included in the GWN architecture. Here, an RGB image was represented as a multi-scale graph. The graph-based model with the DenseNet-202 architecture achieved a classification accuracy of 98.1%.

H. Jiang et al. (2020) proposed a ResNet-101 based graph convolutional network in their research. They chose GCN because it is a semi-supervised learning approach. Moreover, the feature relationships were captured using a graph structure. In this model, label information is shared by neighbouring vertices of the graph, which makes learning more accurate with limited annotated data. They compared the proposed model with the AlexNet, VGG-16, and ResNet-101 architectures on four different datasets. The GCN approach achieved 97.80%, 99.37%, 98.93% and 96.51% classification accuracy on the respective datasets.

2.10.5 Hybrid Networks (HN)

Hybrid architectures are those where the researchers combine the characteristics of
two or more DL models. For instance, Chavan and Nandedkar (2018) proposed the
AgroAVNET model, which was a hybrid of AlexNet and VGGNet architecture. They
chose VGGNet for setting the depth of filters and used the normalisation concept of
AlexNet. They then compared the performance of the AgroAVNET network with the
original AlexNet and VGGNet and their different variants. All the parameters were initialised using pre-trained weights except for the third fully connected layer, which was initialised randomly. The AgroAVNET model outperformed the others
with a classification accuracy of 98.21%. However, Farooq et al. (2019) adopted the
feature concatenation approach in their research. They combined a super pixel-based
LBP (SPLBP) method to extract local texture features, CNN for learning the spatial
features and SVM for classification. They compared their proposed FCN-SPLBP model
with CNN, LBP, FCN, and SPLBP architectures.


R. Stewart et al. (2016) proposed the OverFeat-GoogLeNet architecture by combining features from the LSTM and GoogLeNet models. This model was used to develop a “Parallelised Weed Detection System” by Umamaheswari et al. (2018). They claimed that this system was robust, scalable and could be applied for real-time weed detection. The classification accuracy of the system was 91.1%.

Kounalakis et al. (2019) fine-tuned the AlexNet, VGG-F, VGG-16, Inception-v1, ResNet-50, and ResNet-101 models to extract features from the images. They replaced the CNNs’ default classifiers with linear classifiers, i.e., SVM and logistic regression. They compared the performance of various SVM and logistic regression classifiers by combining them with CNN models for detecting weeds. They achieved the most balanced result in terms of accuracy and false positive rate by using the “L2-regularised with L2-loss logistic regression model using primal computation” classifier. This classifier performed better when used with the GoogLeNet architecture for detecting weeds in grasslands (Kounalakis et al., 2018).

Espejo-Garcia et al. (2020) also replaced CNN’s default classifier with traditional ML
classifiers including SVM, XGBoost, and Logistic Regression. They initialised Xception,
Inception-ResNet, VGGNets, MobileNet, and DenseNet model with pre-trained weights.
The experimental results showed that the best-performing network was the DenseNet model with the SVM classifier. The micro F1 score for this architecture was 99.29%. This research also reported that, with a small dataset, network performance could be enhanced using this approach.
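The following hedged Python sketch illustrates this hybrid pattern: a CNN pre-trained on ImageNet (DenseNet121 is used here purely as a stand-in for the backbones named above) acts as a frozen feature extractor, and a linear SVM replaces the network's default softmax classifier. The random images and labels are placeholders for a real crop–weed dataset:

    import numpy as np
    import tensorflow as tf
    from sklearn.svm import LinearSVC

    # Frozen ImageNet-pre-trained backbone used only to produce feature vectors.
    backbone = tf.keras.applications.DenseNet121(
        weights="imagenet", include_top=False, pooling="avg",
        input_shape=(224, 224, 3))
    backbone.trainable = False

    # Placeholder data standing in for a crop-weed image set (values in [0, 255]).
    images = np.random.randint(0, 256, size=(16, 224, 224, 3)).astype("float32")
    labels = np.array([0] * 8 + [1] * 8)

    features = backbone.predict(
        tf.keras.applications.densenet.preprocess_input(images))
    svm = LinearSVC(C=1.0).fit(features, labels)   # SVM replaces the softmax head
    print("training accuracy:", svm.score(features, labels))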

Adhikari et al. (2019) proposed a fully convolutional encoder-decoder network named Enhanced Skip Network. The model had multiple VGGNet-like blocks in the encoder and decoder. However, the decoder part had fewer feature maps to reduce the computational complexity and memory requirement. In addition, skip layers, larger convolutional kernels and a multi-scale filter bank were incorporated into the proposed model. The weights were initialised using the transfer learning method. The model performed better than U-Net, FCN8, DeepLab-v3, Faster R-CNN, and EDNet in identifying weeds in the paddy field. Chechlinski et al. (2019) combined the U-Net, MobileNet-v2 and DenseNet architectures and replaced transposed convolution layers with activation map scaling.


2.11 Performance Evaluation Metrics

In general, an evaluation metric is a measurement tool that quantifies the performance of a classifier. Different metrics are used to evaluate different characteristics of a classifier (Hand, 2009). Evaluation metrics can be used either to measure the quality of a classification model (Hand, 2009) or to compare the performance of different trained models in order to select the best one (Ozcift & Gulten, 2011). Various metrics were used in the related studies based on the research need. The most commonly used metric for evaluating DL models is classification accuracy (CA). Many of the authors used multiple metrics to assess the model before drawing any conclusions. Table 2.6 lists the evaluation metrics applied in the relevant studies.

As Table 2.6 shows, it is not easy to compare the related works, as different types of evaluation metrics are employed depending on the DL model, the goal of classification, the dataset and the detection approach. However, the most frequently used evaluation metrics are CA, F1 score and mIoU. In the case of classifying plant species, researchers prefer to use confusion matrices to evaluate the model.
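For concreteness, the short Python sketch below computes several of the metrics listed in Table 2.6 (CA, precision, recall, F1, the confusion matrix and IoU) on a toy set of predictions; the numbers are illustrative only:

    import numpy as np
    from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                                 precision_score, recall_score)

    y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])   # e.g. 0=crop, 1=grass, 2=broadleaf
    y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 0])

    print("CA :", accuracy_score(y_true, y_pred))
    print("P  :", precision_score(y_true, y_pred, average="macro"))
    print("R  :", recall_score(y_true, y_pred, average="macro"))
    print("F1 :", f1_score(y_true, y_pred, average="macro"))
    print("CM :\n", confusion_matrix(y_true, y_pred))

    def iou(box_a, box_b):
        """Intersection over Union of two (x1, y1, x2, y2) bounding boxes."""
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / float(area_a + area_b - inter)

    print("IoU:", iou((10, 10, 60, 60), (30, 30, 80, 80)))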

In addition to the evaluation metrics provided in Table 2.6, Milioto et al. (2017) jus-
tified their model based on run-time. This was because, to develop a real-time weeds and
crop plants classifier, it is important to identify the class of a plant as quickly as possible.
They showed how quickly their model could detect a plant in an image. Similarly, Suh
et al. (2018) calculated the classification accuracy of their model along with the time
required to train and identify classes of plants, as they intended to develop a real-time
classifier. Ma et al. (2019) also used run-time for justifying the model performance. They
found that, by increasing the patch size of the input images, it was possible to reduce
the time required to train the model. Another research method used inference time to
compare different DL architecture (H. Huang et al., 2018b). dos Santos Ferreira et al.
(2017) evaluated the CNN model not only based on time but also in terms of the memory
consumed by the model during training. They argued that though the CNN architecture
achieved higher accuracy than other machine learning model, it required more time and
memory to train the model. Andrea et al. (2017) showed that reducing the number of
layers of the DL model could make it faster in detecting and identifying the crop and


Table 2.6: The evaluation metrics applied by different researchers of the related works

1. Classification Accuracy (CA): The percentage of correct predictions among the inputs. A model is judged based on how high this value is.
2. True Positive (TP): The number of times the model correctly predicts the actual class of the object.
3. False Positive Rate (FPR): The proportion of negative cases incorrectly identified as positive cases in the data.
4. False Negative Rate (FNR): The ratio of positive samples that were incorrectly classified.
5. Specificity (S): The fraction of True Negatives out of the sum of False Positives and True Negatives.
6. Mean Pixel Accuracy (MPA): The average ratio of correctly classified pixels among all pixels of the images in the dataset. It is used to evaluate models for semantic segmentation.
7. Precision (P): The fraction of correct predictions (True Positives) out of the total number of relevant results (the sum of True Positives and False Positives). It is informative when the number of False Positives is high.
8. Mean Average Precision (mAP): The mean of the average precision over all object classes in the data.
9. Recall: The fraction of True Positives out of the sum of True Positives and False Negatives. It is informative when the number of False Negatives is high.
10. F1 Score (F1): The harmonic mean of precision and recall.
11. Confusion Matrix (CM): A summary of the number of correct and incorrect predictions made by a model. It helps to visualise not only the errors made by the model but also the types of error made in predicting the class of an object.
12. Intersection over Union (IoU): The ratio of the overlapping area between the ground truth (the hand-labelled bounding boxes from the testing dataset) and the predicted area (the bounding boxes predicted by the model) to the total area.
13. Mean Intersection over Union (mIoU): The average IoU over all object classes in the dataset.
14. Frequency Weighted Intersection over Union (FWIoU): The weighted average of IoUs based on pixel classes.
15. Mean Square Error (MSE): The mean of all squared errors between the predicted and actual target classes.
16. Root Mean Square Error (RMSE): The standard deviation of the differences between the predicted and observed values.
17. Mean Absolute Error (MAE): The mean of the absolute values of the prediction errors over all instances of the test dataset.
18. R2: The squared correlation between the observed and the predicted outcomes of the model.
19. K-fold Cross Validation: The dataset is divided into K parts and each part is used in turn as the testing dataset.
20. Receiver Operating Characteristic (ROC) curve: The true positive rate plotted as a function of the false positive rate for different cut-off points of a parameter.
21. Kappa Coefficient: Measures the degree of agreement between the true values and the predicted values.
22. Matthews Correlation Coefficient (MCC): A correlation coefficient between the observed and predicted binary classifications.
23. Dice Similarity Coefficient (DSC): A measure of spatial overlap between two sets of pixels.



2.12 Discussion

It is evident that DL models offer high performance in the area of weed detection and classification in crops. In this paper, we have provided an overview of the current status of automatic weed detection techniques. In most relevant studies, the preferred method of acquiring data was to use a digital camera mounted on a ground vehicle to collect RGB images. A few research studies collected multi-spectral or hyper-spectral data. To prepare the datasets for training, different image processing techniques were used to resize the images, remove background and noise, and enhance the images. The datasets were generally annotated using bounding box, pixel-wise and image-level annotation approaches. For training the models, researchers generally applied supervised learning approaches, employing different DL techniques to find a better weed detection model. Detection accuracy was the most important parameter used to evaluate the performance of the models.

Nevertheless, there is still room for improvement in this area. The use of emerging technologies can help to improve the accuracy and speed of automatic weed detection systems. As crop and weed plants have many similarities, the use of additional spectral indices can improve performance.

However, there is a lack of large datasets for crops and weeds. It is necessary to
construct a large benchmark dataset by capturing a variety of crops/weeds from different
geographical locations, weather conditions and at various growth stages of crops and
weeds. At the same time, it will be expensive to annotate these large datasets. Semi-
supervised (Chapelle et al., 2009; X.-Y. Zhang et al., 2019) or weakly supervised (Durand
et al., 2017; Zhou, 2018) approaches could be employed to address this problem.

Moreover, Generative Adversarial Networks (GANs) (Ledig et al., 2017) or other synthetic data generation techniques can contribute to creating large datasets. Random point generation and polygon labelling can further improve the precision of automatic weed detection systems. DL is evolving very fast, and new state-of-the-art techniques are being proposed. In addition to developing new solutions, researchers can enhance and apply those methods in the area of weed detection. They can also consider using weakly supervised, self-supervised or unsupervised approaches like multiple instance learning and few-shot or zero-shot learning as a means for synthetic data generation.

Furthermore, most datasets mentioned in this paper exhibit class imbalance, which
may create biases and lead to over-fitting of the model. Future research needs to ad-
dress the problem. This can be achieved via the use of appropriate data redistribution
approaches, cost-sensitive learning approaches (Khan et al., 2017), or class balancing
classifiers (Bi & Zhang, 2018; Taherkhani et al., 2020).

To summarise, the primary objective of developing an automatic weed detection system is to provide a weed management technique that will minimise cost and maximise crop yields. To do so, researchers need to come up with a system that can be deployed on devices with lower computational requirements and can detect weeds accurately in real-time.

2.13 Conclusion

This study provides a comprehensive survey of deep learning-based research on detecting and classifying weed species in value crops. A total of 70 relevant papers have been examined based on data acquisition, dataset preparation, detection and classification methods and the model evaluation process. Publicly available datasets in the related field are also highlighted for prospective researchers. In this article, we provide a taxonomy of the research studies in this area and summarise the approaches to detecting weeds (Table 2.2). It was found that most of the studies applied supervised learning techniques using state-of-the-art deep learning models, and that they can achieve better performance and classification accuracy by fine-tuning pre-trained models on any plant dataset. The results also show that the experiments have already achieved very high accuracy when a sufficient amount of labelled data for each class is available for training the models. However, the existing research only achieved high accuracy in limited experimental setups, e.g., on small datasets of a select number of crop and weed species. Computational speed in the recognition process is another limiting factor for deployment on real-time, fast-moving herbicide spraying vehicles. An important future direction would be to investigate highly efficient detection techniques using very large datasets with a variety of crop and weed species, so that one single model can be used across any weed-crop setting as needed.


Other potential future research directions include the development of large generalised datasets, machine learning models tailored to weed-crop settings, addressing the class imbalance problem, identifying the growth stage of weeds, as well as thorough field trials for commercial deployment.

Chapter 3

Weed classification

Most weed species can adversely impact agricultural productivity by competing for
nutrients required by high-value crops. Manual weeding is not practical for large cropping
areas. Many studies have been undertaken to develop automatic weed management sys-
tems for agricultural crops. In this process, one of the major tasks is to recognise the weeds
from images. However, weed recognition is a challenging task because weed and crop plants can be similar in colour, texture and shape, and these similarities can be exacerbated further by the imaging conditions and the geographic or weather conditions under which the images are recorded. Ad-
vanced machine learning techniques can be used to recognise weeds from imagery. In this
paper, we have investigated five state-of-the-art deep neural networks, namely VGG16,
ResNet-50, Inception-V3, Inception-ResNet-v2 and MobileNetV2, and evaluated their
performance for weed recognition. We have used several experimental settings and mul-
tiple dataset combinations. In particular, we constructed a large weed-crop dataset by
combining several smaller datasets, mitigating class imbalance by data augmentation, and
using this dataset in benchmarking the deep neural networks. We investigated the use
of transfer learning techniques by preserving the pre-trained weights for extracting the
features and fine-tuning them using the images of crop and weed datasets. We found that
VGG16 performed better than others on small-scale datasets, while ResNet-50 performed
better than other deep networks on the large combined dataset.

This chapter has been published: Hasan, A. M., Sohel, F., Diepeveen, D., Laga, H., & Jones, M. G.
(2022). Weed recognition using deep learning techniques on class-imbalanced imagery. Crop and Pasture
Science.

3.1 Introduction

Weeds in crops compete for water, nutrients, space and light, and may decrease prod-
uct quality (Iqbal et al., 2019). Their control, using a range of herbicides, constitutes a
significant part of current agricultural practices. In Australia, weed control costs in grain production are estimated at $4.8 billion per annum; these costs include the cost of control measures and of lost production (McLeod, 2018).

The most widely used methods for controlling weeds are chemical-based, where her-
bicides are applied at an early growth stage of the crop (Harker & O’Donovan, 2013;
López-Granados, 2011). Although the weeds spread in small patches in crops, herbicides
are usually applied uniformly throughout the agricultural field. While such an approach
works reasonably well against weeds, it also affects the crops. A report from the European Food Safety Authority (EFSA) shows that most unprocessed agricultural produce contains harmful substances originating from herbicides (Medina-Pastor & Triacchini, 2020).

Recommended rates of herbicide application are expensive and may also be detrimen-
tal to the environment. Thus, new methods that can be used to identify weeds in crops,
and then selectively apply herbicides on the weeds, or other methods to control weeds,
will reduce production costs to the farmers and benefit the environment. Technologies
that enable the rapid discrimination of weeds in crops are now becoming available (H.
Tian et al., 2020).

Recent advances in Deep Learning (DL) have revolutionised the field of Machine
Learning (ML). DL has made a significant impact in the area of computer vision by
learning features and tasks directly from audio, images or text data without human
intervention or predefined rules (Dargan et al., 2019). For image classification, DL meth-
ods outperform humans and traditional ML methods in accuracy and speed (Steinberg,
2017). In addition, the availability of computers with powerful GPUs, coupled with the
availability of large amounts of labelled data, enable the efficient training of DL models.

As for other computer vision and image analysis problems, digital agriculture and digital farming also benefit from the recent advances in deep learning. Deep learning techniques have been applied for weed and crop management, weed detection, localisation and classification, and the monitoring of field conditions and livestock (Kamilaris & Prenafeta-Boldú, 2018).

ML techniques have been used in commercial solutions to combat weeds. “Robocrop Spot Sprayer” (“Robocrop Spot Sprayer: Weed Removal”, 2018) is a video analysis-based autonomous selective spraying system that can identify potatoes grown in carrots, parsnips, onions or leeks. The “WeedSeeker sprayer” (“WeedSeeker 2 Spot Spray System”, n.d.) is a near-infrared reflectance sensor-based system that detects the green components in the field; the machine sprays herbicides only on the plants, thereby reducing the amount of herbicide used. Similar technology is offered by a herbicide spraying system known as “WEED-IT” (“Precision Spraying - Weed Sprayer”, n.d.), which can target all green plants on the soil. A fundamental problem with these systems is that they cannot distinguish between crops and weeds. Therefore, the ability to discriminate between crops and weeds is important.

Further development of autonomous weed control systems can be beneficial both eco-
nomically and environmentally. Labour costs can be reduced by using a machine to
identify and remove weeds. Selective spraying can also minimise the amount of herbi-
cides applied (Lameski et al., 2018). The success of an autonomous weed control system
will depend on four core modules: (i) weed detection and recognition, (ii) mapping, (iii)
guidance and (iv) weed control (Olsen et al., 2019). This paper focuses on the first
module: weed detection and recognition, which is a challenging task (Slaughter et al.,
2008). This is because both weeds and crop plants often exhibit similar colours, textures
and shapes. Furthermore, the visual properties of both weeds and crop plants can vary
depending on the growth stage, lighting conditions, environments and geographical lo-
cations (Hasan et al., 2021; Jensen et al., 2020b). Also, weeds and crops exhibit high inter-class similarity as well as high intra-class dissimilarity. The lack of large-scale crop
weed datasets is a fundamental problem for DL-based solutions.

There are many approaches to recognising weed and crop classes from images (Wäldchen & Mäder, 2018). High accuracy can be obtained for weed classification using Deep Learning (DL) techniques (Kamilaris & Prenafeta-Boldú, 2018); for example, Chavan and Nandedkar (2018) used Convolutional Neural Network (CNN) models to classify weeds and crop plants. Teimouri et al. (2018) used DL for the classification of weed species and the estimation of growth stages, with an average classification accuracy of 70% and 78% for growth stage estimation.

As a general rule, the accuracy of the methods used for the classification of weed
species decreases in multi-class classification when the number of classes is large (Dyr-
mann et al., 2016; Peteinatos et al., 2020). Class-imbalanced datasets also reduce the
performance of DL-based classification techniques because of overfitting (Ali-Gombe &
Elyan, 2019). This problem can be addressed using data-level and algorithm-level meth-
ods. Data-level methods include oversampling or undersampling of the data. In contrast,
algorithm-level methods work by modifying the existing learning algorithms to concen-
trate less on the majority group and more on the minority classes. The cost-sensitive
learning approach is one such approach (Khan et al., 2017; Krawczyk, 2016).
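As a generic illustration of the data-level route, the following Python sketch oversamples minority classes with simple label-preserving augmentations until every class matches the size of the largest class; it is a sketch of the idea rather than the exact augmentation pipeline used in this thesis:

    import random
    from PIL import Image, ImageEnhance, ImageOps

    def augment(img):
        """Apply one random label-preserving transformation to a PIL image."""
        choice = random.choice(["mirror", "rotate", "brightness"])
        if choice == "mirror":
            return ImageOps.mirror(img)
        if choice == "rotate":
            return img.rotate(random.uniform(-25, 25))
        return ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))

    def oversample(images_by_class):
        """Grow every class to the size of the largest class with augmented copies."""
        target = max(len(imgs) for imgs in images_by_class.values())
        balanced = {}
        for cls, imgs in images_by_class.items():
            extra = [augment(random.choice(imgs)) for _ in range(target - len(imgs))]
            balanced[cls] = list(imgs) + extra
        return balanced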

DL techniques have been used extensively for weed recognition, for example Hasan et
al. (2021) have provided a comprehensive review of these techniques. dos Santos Ferreira
et al. (2017) compared the performance of CNN with Support Vector Machines (SVM),
Adaboost – C4.5, and Random Forest models for discriminating soybean plants, soil,
grass, and broadleaf weeds. This study shows that CNN can be used to classify images
more accurately than other machine learning approaches. Nkemelu et al. (2018) report
that CNN models perform better than SVM and K-Nearest Neighbour (KNN) algorithms.

Transfer learning is an approach that uses the features learned on one problem or data domain for another related problem. Transfer learning mimics classification by humans, where a person can identify a new thing using previous experience. In deep learning, pre-trained convolutional layers can be used as a feature extractor for a new dataset (Shao et al., 2014). However, most of the well-known CNN models are trained on the ImageNet dataset, which contains 1000 classes of objects. Therefore, depending on the number of classes in the desired dataset, only the classification layer (fully connected layer) of the model needs to be trained again in the transfer learning approach. Suh et al. (2018) applied six CNN models (AlexNet, VGG-19, GoogLeNet, ResNet-50, ResNet-101 and Inception-v3) pre-trained on the ImageNet dataset to classify sugar beet and volunteer potatoes. They reported that these models can achieve a classification accuracy of about 95% without retraining the pre-trained weights of the convolutional layers. They also observed that the models’ performance improved significantly by fine-tuning the pre-trained weights. In the fine-tuning approach, the convolutional layers of the DL models are initialised with the pre-trained weights, and subsequently, during the training phase of the model, those weights are retrained for the desired dataset. Instead of training a model from scratch, initialising it with pre-trained weights and fine-tuning them helps the model to achieve better classification accuracy on a new target dataset, and this also saves training time (Gando et al., 2016; Girshick et al., 2014; Hentschel et al., 2016). Olsen et al. (2019) fine-tuned the pre-trained ResNet-50 and Inception-V3 models to classify nine weed species in their study and achieved average accuracies of 95.7% and 95.1%, respectively. In another study, VGG16, ResNet-50 and Inception-V3 pre-trained models were fine-tuned to classify the weed species found in corn and soybean production systems (A. Ahmad et al., 2021). The VGG16 model achieved the highest classification accuracy of 98.90% in their research.
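A hedged Keras sketch of the two strategies is given below: in the transfer learning setting, the pre-trained backbone is frozen and only a new classification head is trained, whereas in the fine-tuning setting all weights are retrained, typically with a smaller learning rate. The class count, input size and hyper-parameters are placeholders rather than the settings used in the experiments reported later:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    num_classes = 20                                   # placeholder class count
    base = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, pooling="avg",
        input_shape=(224, 224, 3))

    # (i) Transfer learning: freeze the backbone, train only the new head.
    base.trainable = False
    model = models.Sequential([base,
                               layers.Dense(num_classes, activation="softmax")])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])

    # (ii) Fine-tuning: unfreeze the backbone and retrain all weights with a
    #      smaller learning rate so the pre-trained features are adjusted gently.
    base.trainable = True
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=...)   # datasets assumed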

In this paper, we have performed several experiments: i) First, we evaluated the performance of DL models under the same experimental conditions using small-scale public datasets. ii) We then constructed a large dataset by combining a few small-scale datasets with a variety of weeds in crops. In the dataset construction process, we mitigated the class imbalance problem. In a class-imbalanced dataset, certain classes have a much higher or lower representation than others. iii) We then investigated the performance of DL models following several pipelines, e.g., transfer learning and fine-tuning. Finally, we provide a thorough analysis and offer future perspectives (Section 3.3).

The main contributions of this research are:

• construction of a large dataset by combining four small-scale datasets with a variety of weeds and crops.

• addressing the class imbalance issue of the combined dataset using the data augmentation technique.

• comparing the performance of five well-known DL methods using the combined dataset.

• evaluating the efficiency of the pre-trained models on the combined dataset using the transfer learning and fine-tuning approach.


This paper is organised as follows: Section 3.2 describes the materials and methods,
including datasets, pre-processing approaches of images, data augmentation techniques,
DL architectures and performance metrics. Section 3.3 covers the experimental results
and analysis, and Section 3.4 concludes the paper.

3.2 Materials and Methods

3.2.1 Dataset

In this work, four publicly available datasets were used: the “DeepWeeds dataset” (Olsen et al., 2019), the “Soybean weed dataset” (dos Santos Ferreira et al., 2017), the “Cotton-tomato and weed dataset” (Espejo-Garcia et al., 2020) and the “Corn weed dataset” (H. Jiang et al., 2020).

3.2.1.1 DeepWeeds dataset

The DeepWeeds dataset contains images of eight nationally significant species of weeds
collected from eight rangeland environments across northern Australia. It also includes
another class of images that contain non-weed plants. These are represented as a negative
class. In this research, the negative image class was not used as it does not have any
weed species. The images were collected using a FLIR Blackfly 23S6C high-resolution
(1920 × 1200 pixel) camera paired with the Fujinon CF25HA-1 machine vision lens
(Olsen et al., 2019). The dataset is publicly available through the GitHub repository:
https://github.com/AlexOlsen/DeepWeeds.

3.2.1.2 Soybean Weed Dataset

dos Santos Ferreira et al. (2017) acquired soybean, broadleaf, grass and soil images from Campo Grande in Brazil. We did not use the images from the soil class as they did not contain crop plants or weeds. dos Santos Ferreira et al. (2017) used a “Sony EXMOR” RGB camera mounted on an Unmanned Aerial Vehicle (UAV, a DJI Phantom 3 Professional). The flights were undertaken in the morning (8 to 10 am) from December 2015 to March 2016, with 400 images captured manually at an average height of four meters above the ground. The images, of size 4000 × 3000, were then segmented using the Simple Linear Iterative Clustering (SLIC) superpixels algorithm (Achanta et al., 2012), with manual annotation of the segments to their respective classes. The dataset contained 15336 segments of four classes. This dataset is publicly available at the website: https://data.mendeley.com/datasets/3fmjm7ncc6/2.
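For readers unfamiliar with the segmentation step, the following scikit-image sketch shows SLIC superpixel extraction in the same spirit (the sample image and parameter values are assumptions, not those of dos Santos Ferreira et al.):

    from skimage import data
    from skimage.measure import regionprops
    from skimage.segmentation import slic

    image = data.astronaut()                  # stand-in for a UAV field image
    segments = slic(image, n_segments=300, compactness=10, start_label=1)

    # Each superpixel can then be cropped out and annotated with its class.
    patches = []
    for region in regionprops(segments):
        min_r, min_c, max_r, max_c = region.bbox
        patches.append(image[min_r:max_r, min_c:max_c])
    print("number of superpixels:", segments.max())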

3.2.1.3 Cotton Tomato Weed Dataset

This dataset was acquired from three different farms in Greece, covering the south-
central, central and northern areas of Greece. The images were captured in the morning
(8 to 10 am) from May 2019 to June 2019 to ensure similar light intensities. The images
of size 2272 × 1704 were taken manually from about one-meter height using a Nikon
D700 camera (Espejo-Garcia et al., 2020). The dataset is available through the GitHub
repository: https://github.com/AUAgroup/early-crop-weed.

3.2.1.4 Corn Weed Dataset

This dataset was taken from a corn field in China. A total of 6000 images were
captured using a Canon PowerShot SX600 HS camera placed vertically above the crop.
To account for the influence of different backgrounds and illumination, the images
were taken under various lighting conditions. The original images were large (3264 ×
2448), and these were subsequently resized to a resolution of 800 × 600 (H. Jiang et
al., 2020). The dataset is available at GitHub:
https://github.com/zhangchuanyin/weed-datasets/tree/master/corn%20weed%20datasets.

3.2.1.5 Our Combined Dataset

In this paper, we combine all these datasets to create a single large dataset with weed
and crop images sourced from different weather and geographical zones. This has created
extra variability and complexity in the dataset with a large number of classes. This is
also an opportunity to test the DL models and show their efficacy in complex settings.
We used this combined dataset to train the classification models. Table 3.1 provides a
summary of the dataset used. The combined dataset contains four types of crop plants


and sixteen species of weeds. The combined dataset is highly class-imbalanced since 27%
of images are from the soybean crop, while only 0.2% of images are from the cotton crop
(Table 3.1).

3.2.1.6 Unseen Test Dataset

Another set of data was collected from the Eden Library website (https://edenlibrary.ai/) for this research. The website hosts plant datasets for artificial intelligence research, and the images were collected under field conditions. We used images of five different crop plants from the website, namely: Chinese cabbage (142 images), grapevine (33 images), pepper (355 images), red cabbage (52 images) and zucchini (100 images). In addition, we included 500 images of lettuce plants (H. Jiang et al., 2020) and 201 images of radish plants (Lameski et al., 2017) in this unseen test dataset. This dataset was then used to evaluate the performance of the transfer learning approach. This experiment checks the reusability of the DL models in the case of a new dataset.

Table 3.1: Summary of crop and weed datasets used in this research

Dataset              Location    Type   Crop/weed species    Number of images   % of images in the combined dataset
DeepWeeds            Australia   Weed   Chinee apple         1126               4.17
DeepWeeds            Australia   Weed   Lantana              1063               3.94
DeepWeeds            Australia   Weed   Parkinsonia          1031               3.82
DeepWeeds            Australia   Weed   Parthenium           1022               3.78
DeepWeeds            Australia   Weed   Prickly acacia       1062               3.93
DeepWeeds            Australia   Weed   Rubber vine          1009               3.74
DeepWeeds            Australia   Weed   Siam weed            1074               3.98
DeepWeeds            Australia   Weed   Snakeweed            1016               3.76
Soybean Weed         Brazil      Crop   Soybean              7376               27.31
Soybean Weed         Brazil      Weed   Broadleaf            1191               4.41
Soybean Weed         Brazil      Weed   Grass                3526               13.06
Cotton Tomato Weed   Greece      Crop   Cotton               54                 0.20
Cotton Tomato Weed   Greece      Crop   Tomato               201                0.74
Cotton Tomato Weed   Greece      Weed   Black nightshade     123                0.46
Cotton Tomato Weed   Greece      Weed   Velvet leaf          130                0.48
Corn Weed            China       Crop   Corn                 1200               4.44
Corn Weed            China       Weed   Blue grass           1200               4.44
Corn Weed            China       Weed   Chenopodium album    1200               4.44
Corn Weed            China       Weed   Cirsium setosum      1200               4.44
Corn Weed            China       Weed   Sedge                1200               4.44


In the study, the images of each class were randomly assigned for training (60%),
validation (20%) and testing (20%). Each image was labelled with one image-level anno-
tation. This means that each image has only one label, i.e., the name of the weed or crop
classes, e.g., Chinee apple or corn. Figure 3.1 provides sample images in the dataset.
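As an illustration only, a random per-class split of this kind could be produced along the following lines; this is a minimal sketch, and the directory layout, file extension and helper name are assumptions rather than the exact script used in this work.

```python
import random
from pathlib import Path

def split_class_images(class_dir, train_frac=0.6, val_frac=0.2, seed=42):
    """Randomly split the images of one crop/weed class into train/val/test lists."""
    images = sorted(Path(class_dir).glob("*.jpg"))  # assumed file extension
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * train_frac)
    n_val = int(len(images) * val_frac)
    return (images[:n_train],                  # 60% for training
            images[n_train:n_train + n_val],   # 20% for validation
            images[n_train + n_val:])          # 20% for testing

# Apply the split to every class folder of the combined dataset (assumed layout)
splits = {d.name: split_class_images(d)
          for d in Path("combined_dataset").iterdir() if d.is_dir()}
```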

Figure 3.1: Sample crop and weed images of each class from the datasets: (a) Chinee apple, (b) Lantana, (c) Parkinsonia, (d) Parthenium, (e) Prickly acacia, (f) Rubber vine, (g) Siam weed, (h) Snakeweed, (i) Soybean, (j) Broadleaf, (k) Grass, (l) Cotton, (m) Tomato, (n) Black nightshade, (o) Velvet leaf, (p) Corn, (q) Blue grass, (r) Chenopodium album, (s) Cirsium setosum and (t) Sedge.


3.2.2 Image Pre-processing

Some level of image pre-processing is needed before the data can be used as input for training a DL model. This may include resizing the images, removing the background, enhancing and denoising the images, colour transformation, morphological transformation, etc. In this study, the Keras pre-processing utilities (Chollet et al., 2015) were used to prepare the data for training. These utilities apply a set of predefined operations to the data. One of the operations expands the dimensions of the input: DL models process images in batches, and an image has three dimensions (height, width and the number of channels), so the pre-processing step adds a batch dimension to each image. Pre-processing also involves normalising the data so that the pixel values range from 0 to 1. In addition, each model has a specific pre-processing function that transforms a standard image into an appropriate input. Research suggests that classification performance improves when the input resolution of the images is increased (Sabottke & Spieler, 2020; Sahlsten et al., 2019). However, the computational complexity of a model also increases with a higher input resolution. The default input resolution for all the models used in this research is 224 × 224 pixels.
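The Keras pre-processing steps described above can be sketched as follows. This is only an illustration, assuming ResNet-50 as the target architecture; each model in the Keras Applications API ships its own preprocess_input function.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import preprocess_input  # model-specific scaling

def load_for_model(image_path, target_size=(224, 224)):
    """Load one image and prepare it as a single-image batch for a Keras model."""
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=target_size)  # resize
    x = tf.keras.preprocessing.image.img_to_array(img)  # (height, width, channels)
    x = np.expand_dims(x, axis=0)                       # add the batch dimension -> (1, h, w, c)
    return preprocess_input(x)                          # normalise as the model expects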

3.2.3 Data Augmentation

The combined dataset is highly class-imbalanced. The minority classes are over-
sampled using image augmentation to balance the dataset. The augmented data is only
used to train the models. Image augmentation is done using the Python image processing
library Scikit-image (Van der Walt et al., 2014). After splitting the dataset into training,
validation and testing sets, the largest training class was soybean, with 4,425 images.
By applying augmentation, we also brought every other weed and crop class up to 4,425
training images, thus ensuring that all classes were balanced. The following operations
were applied randomly to the data to generate the augmented images:

• Random rotation in the range of [-25, +25] degrees,

• Horizontal and vertical scaling in the range of 0.5 and 1,


• Horizontal and vertical flip,

• Added random noise (Gaussian noise),

• Blurring the images,

• Applied gamma, sigmoid and logarithmic correction operation, and

• Stretched or shrunk the intensity levels of images.

The models were then trained on both the actual and the augmented data without distinguishing between them; a sketch of this augmentation step is given below.
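The following sketch illustrates how the listed operations could be applied with Scikit-image. The sampling ranges and the number of operations combined per image are assumptions, and it expects scikit-image ≥ 0.19 (for the channel_axis argument) and images scaled to the [0, 1] range.

```python
import random
import numpy as np
from skimage import exposure, filters, transform, util

def augment(image):
    """Generate one augmented image by applying a few randomly chosen operations."""
    ops = [
        lambda im: transform.rotate(im, random.uniform(-25, 25), mode="edge"),   # random rotation
        lambda im: transform.rescale(im, random.uniform(0.5, 1.0),
                                     channel_axis=-1, anti_aliasing=True),       # scaling
        np.fliplr,                                                               # horizontal flip
        np.flipud,                                                               # vertical flip
        lambda im: util.random_noise(im, mode="gaussian"),                       # Gaussian noise
        lambda im: filters.gaussian(im, sigma=1, channel_axis=-1),               # blurring
        lambda im: exposure.adjust_gamma(im, gamma=random.uniform(0.5, 1.5)),    # gamma correction
        exposure.adjust_sigmoid,                                                 # sigmoid correction
        exposure.adjust_log,                                                     # logarithmic correction
        exposure.rescale_intensity,                                              # stretch/shrink intensities
    ]
    for op in random.sample(ops, k=3):  # number of operations per image is an assumption
        image = op(image)
    return np.clip(image, 0.0, 1.0)
```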

3.2.4 Deep Learning

Five state-of-the-art deep learning models with pre-trained weights were used in this
research to classify images. These models were made available via the Keras Application
Programming Interface (API) (Chollet et al., 2015). TensorFlow (Abadi et al., 2016) was
used as a machine learning framework. The selected CNN architectures were:

• VGG16 (Simonyan & Zisserman, 2014) uses a stack of convolutional layers with a
very small receptive field (3 × 3). It was the winner of the ImageNet Challenge 2014
in the localisation track. The architecture consists of a stack of 13 convolutional
layers followed by three fully connected layers. The network fixes the convolutional
stride and padding to 1 pixel. Spatial pooling is carried out by max-pooling layers;
however, only five of the convolutional layers are followed by a max-pooling layer.
The original VGG16 model has 138,357,544 trainable parameters, of which about
124 million are contained in the fully connected layers. Those layers were customised
in this research.

• ResNet-50 (K. He et al., 2016) is deeper than VGG16 but has a lower computational
complexity. Generally, with increasing depth of a network, the performance becomes
saturated or degrades. The model uses residual blocks to maintain accuracy with
the deeper network. The residual blocks also contain convolutional layers, like
VGG16. The model uses batch normalisation after each convolutional layer and
before the activation layer. The model explicitly reformulates the layers as residual
functions with reference to the input layers and skip connections. Although the
model contains more layers than VGG16, it only has 25,636,712 trainable parameters.

• Inception-V3 (Szegedy et al., 2016) uses a deeper network with fewer training
parameters (23,851,784). The model consists of symmetric and asymmetric building
blocks with convolutions, average pooling, max pooling, concats, dropouts, and fully
connected layers.

• Inception-ResNet-V2 (Szegedy et al., 2017) combines the concept of skip connections
from ResNet with Inception modules. Each Inception block is followed by a filter
expansion layer (1 × 1 convolution without activation), which scales up the
dimensionality of the filter bank to match the depth of the input before the residual
addition. The model uses batch normalisation only on the traditional layers, but
not on the summation layers. The network is 164 layers deep and has 55,873,736
trainable parameters.

• MobileNetV2 (Sandler et al., 2018) allows memory-efficient inference with a reduced
number of parameters. It contains 3,538,984 trainable parameters. The basic building
block of the model is a bottleneck depthwise-separable convolution with residuals.
The model has an initial fully convolutional layer with 32 filters, followed by 19
residual bottleneck layers. It always uses 3 × 3 kernels and utilises dropout and
batch normalisation during training. Instead of ReLU (Rectified Linear Unit), this
model uses ReLU6 as the activation function. ReLU6 is a variant of ReLU that clips
activations at an upper bound of 6, an empirical choice that works well and helps
the model learn sparse features.

All the models were initialised with weights pre-trained on the ImageNet dataset. As the models were originally trained to recognise 1000 different objects, the architecture was slightly modified to classify the twenty crop and weed species. The last fully connected layers of the original models were replaced by a global average pooling layer followed by two dense layers with 1024 neurons each and the ReLU activation function. The output was another dense layer in which the number of neurons equalled the number of classes. The softmax activation function was used in the output layer since the models were multi-class classifiers. The size of the input was 256 × 256 × 3, and the batch size was 64. The maximum number of epochs for training the models was 100; however, training often finished before reaching that maximum. The initial learning rate was set to 1 × 10⁻⁴ and was progressively reduced to 1 × 10⁻⁶ by monitoring the validation loss at every epoch. Table 3.2 shows the number of parameters of each model used in this research without the output layer. The Inception-ResNet-V2 model has the most parameters and the MobileNetV2 model the fewest.

Table 3.2: Number of parameters used in the deep learning models

Deep Learning Model Number of parameters

VGG16 16,289,600
ResNet-50 26,735,488
Inception-V3 24,950,560
Inception-ResNet-V2 56,960,224
MobileNetV2 4,585,216
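As an illustration of the customisation described above, the following sketch builds one of the modified networks with ResNet-50 as the example backbone. The optimiser choice and the early-stopping settings are assumptions; the text only specifies the learning-rate range, batch size and epoch limit.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 20  # four crops and sixteen weed species

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(256, 256, 3))

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1024, activation="relu"),
    layers.Dense(1024, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),   # output layer sized by the number of classes
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),   # optimiser is an assumption
              loss="categorical_crossentropy", metrics=["accuracy"])

callbacks = [
    # Reduce the learning rate towards 1e-6 whenever the validation loss stops improving
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, min_lr=1e-6),
    # Training often finished before 100 epochs; early stopping is one way to achieve that
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
]
# model.fit(train_ds, validation_data=val_ds, epochs=100, batch_size=64, callbacks=callbacks)
```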

3.2.5 Transfer Learning and Fine-Tuning

A conventional DL model contains two basic components: a feature extractor and a classifier. The layers that make up the feature extractor and the classifier vary from one DL model to another. However, all the DL architectures used in this research contain a series of trainable filters, whose weights are adjusted or trained for classifying images of a target dataset. Figure 3.2a shows the basic structure of a pre-trained DL model. A pre-trained DL model means that the weights of the filters in the feature extractor and classifier have been trained to classify the 1000 different classes of images contained in the ImageNet dataset. The concept of transfer learning is to use those pre-trained weights to classify the images of a new, unseen dataset (Y. Guo et al., 2019; S. J. Pan & Yang, 2009). We used
this approach in two different ways. The approaches were categorised as transfer learning
and fine-tuning. To train the model using our dataset of crop and weed images, we
took the feature extractor from the pre-trained DL model and removed its classifier part
since it was designed for a specific classification task. In the transfer learning approach
(Figure 3.2b), we only trained the weights of the filters in the classifier part and kept


the pre-trained weights of the layer in the feature extractor. This process eliminates the
potential issue of training the complete network on a large number of labelled images.
However, in the fine-tuning approach (Figure 3.2c), the weights in the feature extractor
were initialised from the pre-trained model, but not fixed. During the training phase of the
model, the weights were retrained together with the classifier part. This process increased
the efficiency of the classifier because it was not necessary to train the whole model
from scratch. The model can extract discriminating features for the target dataset more
accurately. Our experiments used both approaches and evaluated their performance on
the crop and weed image dataset. Finally, we trained one state-of-the-art DL architecture
from scratch, using our combined dataset (Section 3.2.1.5) and used its feature extractor
to classify the images in an unseen test dataset (Section 3.2.1.6) using the transfer learning
approach. The performance of the pre-trained state-of-the-art model was then compared
with the model trained on the crop and weed dataset.
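In code, the only difference between the two approaches is whether the feature extractor's weights are trainable. The following is a minimal sketch using ResNet-50 as the backbone; the same pattern applies to the other four architectures.

```python
import tensorflow as tf

def build_classifier(num_classes, fine_tune, input_shape=(256, 256, 3)):
    """Build a classifier with a pre-trained feature extractor and a new classifier head."""
    base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                          input_shape=input_shape)
    base.trainable = fine_tune   # False -> transfer learning, True -> fine-tuning
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(base.input, out)

tl_model = build_classifier(20, fine_tune=False)  # only the new classifier head is trained
ft_model = build_classifier(20, fine_tune=True)   # pre-trained weights are also updated
```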

3.2.6 Performance Metrics

The models were tested and thoroughly evaluated using several metrics: accuracy,
precision, recall, and F1 score metrics, which are defined as follows:

• Accuracy (Acc): The percentage of images whose classes are predicted correctly among all the test images.

• Precision (P): The fraction of correct predictions (True Positives, TP) among all positive predictions, i.e., P = TP / (TP + FP), where FP denotes False Positives.

• Recall (R): The fraction of True Positives among all actual instances of a class, i.e., R = TP / (TP + FN), where FN (False Negatives) are the instances of the class that the model missed.

• F1 Score (F1): The harmonic mean of precision and recall, F1 = 2PR / (P + R). This metric is useful for measuring the performance of a model on a class-imbalanced dataset.

• Confusion Matrix: It is used to measure the performance of machine learning models for classification problems. The confusion matrix tabulates the comparison of the actual target values with the values predicted by the trained model. It helps to visualise how well the classification model is performing and what prediction errors it is making.

Figure 3.2: The basic block diagram of the DL models used for the experiments: (a) pre-trained DL model, (b) transfer learning approach and (c) fine-tuning approach.

In all these metrics, a higher value represents better performance.
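These metrics can be computed directly from the predicted and true class labels, for example with scikit-learn as sketched below. Whether the averages reported later are macro or weighted averages is not stated in the text, so macro averaging is an assumption here.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

def evaluate(y_true, y_pred, class_names):
    """Return accuracy, average precision/recall/F1 and the confusion matrix."""
    acc = accuracy_score(y_true, y_pred)
    # Macro averaging weights every class equally, which suits a class-imbalanced test set
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    cm = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))
    return acc, precision, recall, f1, cm
```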

3.3 Results and Discussions

We conducted five sets of experiments on the data. Table 3.3 shows the number of
images used for training, validation and testing of the models. Augmentation was applied
to generate 4,425 images for each of the classes. However, only actual images were used to
validate and test the models. All the experiments were done on a desktop computer with an Intel(R) Core(TM) i9-9900X processor, 128 gigabytes of RAM and an NVIDIA GeForce RTX 2080 Ti Graphics Processing Unit (GPU). We used the Professional Edition of the Windows 10 operating system. The deep learning models were developed using Python 3.8 and the TensorFlow 2.4 framework.

Table 3.3: The number of images used to train (after augmentation), validate and test the models.

Dataset         Crop and weed species   Training (real)   Training (real + augmented)   Validation   Test
DeepWeeds       Chinee apple            675               4,425                         225          226
DeepWeeds       Lantana                 637               4,425                         212          214
DeepWeeds       Parkinsonia             618               4,425                         206          207
DeepWeeds       Parthenium              613               4,425                         204          205
DeepWeeds       Prickly acacia          637               4,425                         212          213
DeepWeeds       Rubber vine             605               4,425                         201          203
DeepWeeds       Siam weed               644               4,425                         214          216
DeepWeeds       Snakeweed               609               4,425                         203          204
Soybean Weed    Soybean                 4,425             4,425                         1,475        1,476
Soybean Weed    Broadleaf               714               4,425                         238          239
Soybean Weed    Grass                   2,112             4,425                         704          704
Cotton Tomato   Cotton                  32                4,425                         10           12
Cotton Tomato   Tomato                  120               4,425                         40           41
Cotton Tomato   Black nightshade        73                4,425                         24           26
Cotton Tomato   Velvet leaf             78                4,425                         26           26
Corn Weed       Corn                    720               4,425                         240          240
Corn Weed       Bluegrass               720               4,425                         240          240
Corn Weed       Chenopodium album       720               4,425                         240          240
Corn Weed       Cirsium setosum         720               4,425                         240          240
Corn Weed       Sedge                   218               4,425                         239          241

3.3.1 Experiment 1: Comparing the performance of DL models for classifying images in each of the datasets

In this experiment, we trained the five models separately on each dataset using only
actual images (see Table 3.3). Both transfer learning (TL) and fine-tuning (FT) ap-
proaches were used to train the models. Table 3.4 shows the training, validation and


testing accuracy for the five models.

Table 3.4: Training, validation and testing accuracy (%) for classifying crop and weed species of all four datasets using different DL models, with the transfer learning (TL) and fine-tuning (FT) approaches.

Dataset              DL model              Training TL   Training FT   Validation TL   Validation FT   Testing TL   Testing FT
DeepWeeds            VGG16                 98.43         99.46         83.84           93.44           84.05        93.36
DeepWeeds            ResNet-50             97.56         100.00        46.51           92.96           44.31        93.78
DeepWeeds            Inception-V3          81.20         100.00        34.28           86.17           34.77        86.08
DeepWeeds            Inception-ResNet-V2   81.02         100.00        35.84           89.09           36.55        89.39
DeepWeeds            MobileNetV2           96.47         100.00        35.01           33.09           32.23        31.87
Corn Weed            VGG16                 100.00        99.97         96.83           99.33           96.92        99.67
Corn Weed            ResNet-50             100.00        100.00        71.72           99.50           63.11        99.50
Corn Weed            Inception-V3          98.92         100.00        68.39           98.41           59.28        98.42
Corn Weed            Inception-ResNet-V2   97.55         100.00        47.21           99.75           44.96        99.33
Corn Weed            MobileNetV2           99.03         100.00        70.89           89.91           69.03        87.51
Cotton Tomato Weed   VGG16                 100.00        96.04         94.00           92.00           99.05        88.57
Cotton Tomato Weed   ResNet-50             100.00        100.00        54.00           99.00           55.24        99.05
Cotton Tomato Weed   Inception-V3          100.00        100.00        53.00           96.00           59.05        98.10
Cotton Tomato Weed   Inception-ResNet-V2   95.71         100.00        64.00           77.00           57.33        77.14
Cotton Tomato Weed   MobileNetV2           100.00        100.00        64.00           72.00           60.00        78.10
Soybean Weed         VGG16                 100.00        99.96         98.97           99.79           98.76        99.88
Soybean Weed         ResNet-50             99.98         100.00        82.58           99.91           83.16        99.83
Soybean Weed         Inception-V3          99.49         100.00        88.25           99.67           86.77        99.71
Soybean Weed         Inception-ResNet-V2   98.80         100.00        90.36           99.79           89.78        99.59
Soybean Weed         MobileNetV2           100.00        100.00        94.54           99.54           94.75        99.67

On the DeepWeeds dataset, the VGG16 model achieved the highest training, valida-
tion and testing accuracy (98.43%, 83.84% and 84.05% respectively) using the transfer
learning approach. The training accuracy of the other four models was above 81%. How-
ever, the validation and testing accuracy for those models were less than 50%. This
suggests that the models are overfitting. After fine-tuning the models, the overfitting
problem was mitigated except for the MobileNetV2 architecture. Although four of the
models achieved 100% training accuracy after fine-tuning, the validation and testing
accuracy was between 86% and 94%. MobileNetV2 model still overfitted even after fine-
tuning, with about 32% validation and testing accuracy. Overall, the VGG16 model gave the best results for the DeepWeeds dataset, as it has the fewest convolutional layers, which is adequate for a small dataset. It should be noted that Olsen et al. (2019),
who initially worked on this dataset, achieved an average classification accuracy of 95.1%


and 95.7% using Inception-V3 and ResNet-50, respectively. However, they applied data
augmentation techniques to overcome the variable nature of the dataset.

On the Corn Weed and Cotton Tomato Weed datasets, the VGG16 and ResNet-50 models generally gave accurate results. However, the validation and testing accuracies of the DL models were low using the transfer learning approach for both datasets, and the classification performance of the models improved substantially after fine-tuning.
Among the five models, the retrained Inception-ResNet-V2 model gave better results for
the Corn Weed dataset with training, validation and testing accuracy of 100%, 99.75%
and 99.33% respectively. The ResNet-50 model accurately classified the images of the
Cotton Tomato Weed dataset.

The VGG16 architecture reached about 99% classification accuracy on both the validation and testing data of the “Soybean Weed” dataset using the transfer learning approach. The performance of the other four models was also better for this dataset using pre-trained weights. Compared to the other datasets, the “Soybean Weed” dataset had more training samples, which helped to improve its classification performance. After fine-tuning the models on this dataset, all five deep learning architectures achieved more than 99% classification accuracy on the validation and testing data.

According to the results of this experiment, as shown in Table 3.4, it can be concluded
that, for classifying the images of crop and weed species datasets, the transfer learning
approach does not work well. Since the pre-trained models were trained on the “ImageNet”
dataset (Deng et al., 2009), which does not contain images of crop or weed species, the
models cannot accurately classify weed images.

3.3.2 Experiment 2: Combining two datasets

In the previous experiment, we showed that it was unlikely to achieve better classi-
fication results using pre-trained weights for the convolutional layers of the DL models.
The image classification accuracy improved by fine-tuning the weights of the models for
the crop and weed dataset. For that reason, in this experiment, all the models were ini-
tialised with pre-trained weights and then retrained for the dataset. In this experiment,
the datasets were paired up and used to generate six combinations to train the models.


The training, validation and testing accuracies are shown in Table 3.5. The combinations
were:

• “DeepWeeds” with “Corn Weed” dataset (DW-CW),

• “DeepWeeds” with “Cotton Tomato Weed” dataset (DW-CTW),

• “DeepWeeds” with “Soybean Weed” dataset (DW-SW),

• “Corn Weed” with “Cotton Tomato Weed” dataset (CW-CTW),

• “Corn Weed” with “Soybean Weed” dataset (CW-SW) and

• “Cotton Tomato Weed” with “Soybean Weed” dataset (CTW-SW).

After fine-tuning the weights, all the DL models reached 100% training accuracy. The DL architectures also gave better validation and testing results when trained on the CW-CTW, CW-SW and CTW-SW combined datasets. However, the models overfitted when the “DeepWeeds” dataset was combined with any of the other three datasets.
Table 3.5: Training, validation and testing accuracy (%) of the DL models after training on each pairwise combination of the datasets.

DL model              Accuracy     DW-CW    DW-CTW   DW-SW    CW-CTW   CW-SW    CTW-SW
VGG16                 Training     100.00   99.63    99.95    99.97    100.00   100.00
VGG16                 Validation   96.21    93.64    97.31    98.99    99.67    99.76
VGG16                 Testing      96.22    94.37    97.25    99.61    99.75    99.76
ResNet-50             Training     100.00   100.00   100.00   100.00   100.00   100.00
ResNet-50             Validation   96.10    93.58    97.68    99.53    99.64    99.72
ResNet-50             Testing      95.67    93.25    97.42    99.31    99.61    99.80
Inception-V3          Training     100.00   100.00   100.00   100.00   100.00   100.00
Inception-V3          Validation   92.45    87.06    96.07    98.15    99.59    99.44
Inception-V3          Testing      92.06    87.45    96.23    99.16    99.67    99.88
Inception-ResNet-V2   Training     100.00   100.00   100.00   100.00   100.00   100.00
Inception-ResNet-V2   Validation   94.26    89.70    96.43    98.76    99.64    99.56
Inception-ResNet-V2   Testing      94.25    90.35    96.93    99.46    99.67    99.60
MobileNetV2           Training     100.00   100.00   100.00   100.00   100.00   100.00
MobileNetV2           Validation   93.01    43.16    96.02    98.31    99.42    99.52
MobileNetV2           Testing      92.94    42.49    95.98    98.55    99.61    99.68

The confusion matrices are provided in Figure 3.3. We found that chinee apple, lantana, prickly acacia and snakeweed had a high confusion rate. This result agrees with that of Olsen et al. (2019). Visually, the images of these species are quite similar and so were difficult to distinguish; that is why the DL models also failed to separate them. Since the dataset was small and did not have enough variation among the images, the models were not able to distinguish among the classes. The datasets also lacked enough images taken under different lighting conditions, so the models were unable to detect the actual class of some images because of illumination effects.

For the DW-CW dataset, the VGG16 model was the most accurate. Even so, the model often failed to distinguish between chinee apple and snakeweed. As shown in the confusion matrix in Figure 3.3a, out of 224 test images of chinee apple, 16 were classified as snakeweed, and 23 of the 204 test images of snakeweed were identified as chinee apple. A significant number of chinee apple and snakeweed images were also incorrectly predicted by the VGG16 model on the DW-CTW dataset (see Figure 3.3b). For the DW-SW dataset, the ResNet-50 model achieved 100% training, 97.68% validation and 97.42% testing accuracy. The confusion matrix is shown in Figure 3.3c. The ResNet-50 model identified 13 chinee apple images as snakeweed, and the same number of snakeweed images were classified as chinee apple. The model also identified 9 test images of snakeweed as lantana. Figure 3.4 shows some sample images which the models classified incorrectly.

By applying data augmentation techniques, one can create more variations among the
classes which may also help the model to learn more discriminating features.

3.3.3 Experiment 3: Training the model with all four datasets together

In this experiment, all the datasets were combined to train the deep learning models.
Classifying the images of the combined dataset is much more complex, as the data is
highly class-imbalanced. The models were initialised with pre-trained weights and then
fine-tuned. Table 3.6 shows the training, validation and testing accuracy and average
precision, recall, and F1 scores achieved by the models on the test data.

After training the models with the combined dataset, the ResNet-50 model performed
better. Though all the models except VGG16 achieved 100% training accuracy, the vali-
dation (97.83%) and testing (98.06%) accuracies of ResNet-50 architecture were higher.


Figure 3.3: Confusion matrices for the “DeepWeeds” dataset combined with each of the other three datasets: (a) DW-CW (VGG16 model), (b) DW-CTW (VGG16 model) and (c) DW-SW (ResNet-50 model).

The average precision, recall and F1 score also verified these results. However, the models still did not correctly classify the chinee apple and snakeweed species mentioned in the previous experiment (Section 3.3.2). The confusion matrix for predicting the classes of images using ResNet-50 is shown in Figure 3.5; the confusion matrix of ResNet-50 is chosen since this model achieved the highest accuracy in this experiment. Seventeen chinee apple images were classified as snakeweed, and fifteen snakeweed images were


Figure 3.4: Examples of incorrectly classified images: (a) chinee apple predicted as snakeweed, (b) snakeweed predicted as chinee apple, (c) lantana predicted as prickly acacia and (d) prickly acacia predicted as lantana.

Table 3.6: The performance of five deep learning models after training with the combined dataset.

DL model              Training accuracy   Validation accuracy   Testing accuracy   Precision (Avg)   Recall (Avg)   F1 score (Avg)
VGG16                 99.96               97.53                 97.76              96.89             96.83          96.84
ResNet-50             100.00              97.83                 98.06              98.06             98.06          98.05
Inception-V3          100.00              96.66                 96.09              96.11             97.09          97.09
Inception-ResNet-V2   100.00              96.88                 97.17              97.17             97.17          97.16
MobileNetV2           100.00              96.94                 97.17              97.18             97.17          97.17

classified incorrectly as chinee apple. In addition, the model also incorrectly classified
some lantana and prickly acacia weed images. To overcome this classification problem,
both actual and augmented data were used in the following experiment.

3.3.4 Experiment 4: Training the models using both real and augmented images of the four datasets

Augmented data were used together with the real data in the training phase to address
the misclassification problem in the previous experiment (Section 3.3.3). All the weed
species and crop plant classes had the same number of training images for this experiment. The models
were initialised with pre-trained weights, and all the parameters were fine-tuned. Table
3.7 shows the result of this experiment.

From Table 3.7, we can see that the training accuracy of all the DL models is 100%, and the validation and testing accuracies are also reasonably high. In this experiment, the ResNet-50 model achieved the highest precision, recall and F1 score on the test data. Figure 3.6 shows the confusion matrix for the ResNet-50 model. We compared the


Figure 3.5: Confusion matrix after combining the four datasets, using the ResNet-50 model.

Table 3.7: Performance of five deep learning models after training with the real and augmented data.

DL model              Training accuracy   Validation accuracy   Testing accuracy   Precision (Avg)   Recall (Avg)   F1 score (Avg)
VGG16                 100.00              97.96                 97.83              97.83             97.84          97.83
ResNet-50             100.00              98.31                 98.30              98.29             98.30          98.30
Inception-V3          100.00              97.31                 98.02              98.02             98.02          98.01
Inception-ResNet-V2   100.00              97.85                 97.76              97.76             97.76          97.76
MobileNetV2           100.00              97.68                 98.02              98.02             98.02          98.02

performance of the model with that of the previous experiment using the confusion matrices. The
performance of the model was improved using both actual and augmented data. The


classification accuracy increased for chinee apple, lantana, prickly acacia and snakeweed
species by 2%.

Figure 3.6: Confusion matrix for ResNet-50 model using combined dataset with
augmentation

In this research, the ResNet-50 model attained the highest accuracy using actual and
augmented images. The Inception-ResNet-V2 model gave similar results. The explana-
tion is that both of the models used residual layers. Residual connections help train a
deeper neural network with better performance and reduced computational complexity.
A deeper convolutional network works better when trained using a large dataset (Szegedy
et al., 2017). Since we used both the augmented data and the actual images, the dataset size increased several times over.


3.3.5 Experiment 5: Comparing the performance of two ResNet-50 models individually trained on the ImageNet dataset and the combined dataset, and testing on the Unseen Test dataset

In this experiment, we used two ResNet-50 models. The first was trained on our com-
bined dataset with actual and augmented data (Sec. 3.2.1.5). Here, the top layers were
removed from the model and a global average pooling layer and three dense layers were
added as before. Other than the top layers, all the layers used pre-trained weights, which
were not fine-tuned. This model is termed “CW ResNet-50”. The same arrangement was used for the pre-trained ResNet-50 model, which was instead trained on the ImageNet dataset; it is termed the “SOTA ResNet-50” model. We trained the top layers of both models using the training split of the Unseen Test Dataset (Section 3.2.1.6). Both models were tested using the test split of the Unseen Test Dataset. The confusion matrices for the CW ResNet-50 and SOTA ResNet-50 models are shown in Figure 3.7.

Figure 3.7: Confusion matrices showing the classification accuracy of (a) the CW ResNet-50 model and (b) the SOTA ResNet-50 model.

We can see in Figure 3.7 that the performance of the two models is very similar. The
“SOTA ResNet-50” model detected all the classes of crops and weeds accurately, while
the pre-trained “CW ResNet-50” model misidentified only two images. As the
“SOTA ResNet-50” model was trained on a large dataset containing millions of images,
it detected the discriminating features more accurately. On the other hand, the “CW


ResNet-50” model was only trained on 88,500 images. If this model were trained with
more data, it is probable that it would be more accurate using the transfer learning
approach. This type of pre-trained model could be used for classifying the images of new
crop and weed datasets, which would eventually make the training process faster.

3.4 Conclusion

This study was undertaken on four image datasets of crop and weed species collected
from four different geographical locations. The datasets contained a total of 20 different
species of crops and weeds. We used five state-of-the-art CNN models, namely VGG16,
ResNet-50, Inception-V3, Inception- ResNet-V2, MobileNetV2, to classify the images of
these crops and weeds.

First, we evaluated the performance of the transfer learning and fine-tuning approaches by training the models on each dataset. The results showed that fine-tuning the models produced more accurate classification than the transfer learning approach.

To add more complexity to the classification problem, we combined the datasets. When two datasets were combined, the performance decreased because of some of the weed species in the “DeepWeeds” dataset; the species that were confused were chinee apple, snakeweed, lantana and prickly acacia. We then combined all four datasets to train the models. Since the combined dataset was class-imbalanced, it was difficult to achieve high classification accuracy by training the models with actual images only. Consequently, we used augmentation to balance the classes of the dataset. Even so, it was evident that the models had problems distinguishing between chinee apple and snakeweed. The performance of the models improved when both actual and augmented data were used, and the models could then distinguish chinee apple and snakeweed more accurately. The results showed that ResNet-50 was the most accurate.

Another finding was that, in most cases, the models did not achieve the desired accuracy using the transfer learning method. As ResNet-50 was the most accurate model, we ran a further test using it pre-trained on our combined dataset: the model was used to classify the images of a new dataset using the transfer learning approach. Although the model was


not more accurate than the state-of-the-art pre-trained ResNet-50 model, it came very close. Higher accuracy could be expected from the transfer learning approach if the model were trained on a larger crop and weed dataset.

This research shows that the data augmentation technique can help address the class imbalance problem and add more variation to the dataset. The variation in the training images improves the accuracy of the deep learning models. Moreover, the transfer learning approach can mitigate the requirement for large datasets to train deep learning models from scratch. The pre-trained models are trained on a large dataset, ImageNet in our case, to capture detailed, generalised features from the imagery. However, because the ImageNet dataset was not categorically labelled for weeds or crops, fine-tuning the pre-trained weights with crop and weed datasets helps capture dataset- or task-specific features. Consequently, fine-tuning improves classification accuracy.

For training a deep learning model for classifying images, it is essential to have a large
dataset like ImageNet (Deng et al., 2009) and MS-COCO (Lin et al., 2014). Classification
of crop and weed species cannot be generalised unless a benchmark dataset is available.
Most studies in this area are site-specific. A large dataset is needed to generalise the
classification of crop and weed plants, and as an initial approach, large datasets can be
generated by combining multiple small datasets, as demonstrated here. In this work, the
images only had image-level labels. A benchmark dataset can be created by combining
many datasets annotated with a variety of image labelling techniques. Generative Ad-
versarial Networks (GANs) (Goodfellow et al., 2014) based image sample generation can
also be used to mitigate class-imbalance issues. Moreover, a crop and weed dataset annotated at the object level needs to be developed. For implementing a real-time selective
herbicide sprayer, the classification of weed species is not enough. It is also necessary to
locate the weeds in crops. Deep learning-based object detection models can be used for
detecting weeds.

Chapter 4

Real-time weed detection and classification

This chapter has been published: Hasan, A. M., Diepeveen, D., Laga, H., Jones, M. G., & Sohel, F. (2024). Object-level benchmark for deep learning-based detection and classification of weed species. Crop Protection, 177, 106561.

Weeds can decrease yields and the quality of crops. Detection, localisation, and classi-
fication of weeds in crops are crucial for developing efficient weed control and management
systems. Deep learning (DL) based object detection techniques have been applied in var-
ious applications. However, such techniques generally need appropriate datasets. Most
available weed datasets only offer image-level annotation, i.e., each image is labelled with
one weed species. However, in practice, one image can have multiple weed (and crop)
species and/or multiple instances of one species. Consequently, the lack of instance-level
annotations of the weed datasets puts a constraint on the applicability of powerful DL
techniques. In the current research, we construct an instance-level labelled weed dataset.
The images are sourced from a publicly available weed dataset, namely the Corn weed
dataset. It has 5,997 images of Corn plants and four types of weeds. We annotated the
dataset using a bounding box around each instance and labelled them with the appropri-
ate species of the crop or weed. Overall, the images contain about three bounding box
annotations on average, while some images have over fifty bounding boxes. To establish
the benchmark dataset, we evaluated the dataset using several DL models, including
YOLOv7, YOLOv8 and Faster-RCNN, to locate and classify weeds in crops. The performance of the models was compared based on inference time and detection accuracy.
YOLOv7 and its variant YOLOv7-tiny models both achieved the highest mean average
precision (mAP) of 88.50% and 88.29% and took 2.7 and 1.43 milliseconds, respectively,
to classify crop and weed species in an image. YOLOv8m, a variant of YOLOv8, detected
the plants in 2.2 milliseconds with the mAP of 87.75%. Data augmentation to address
the class imbalance in the dataset improves the mAP results to 89.93% for YOLOv7 and
89.39% for YOLOv8. The detection accuracy and inference time performed by YOLOv7
and YOLOv8 models in this research indicate that these techniques can be used to develop
an automatic field-level weed detection system.

4.1 Introduction

Global agriculture today faces many challenges, such as a reduction in cultivable land,
lack of water, and abiotic and biotic issues such as frost, heat, pests, diseases and weeds
(Amrani et al., 2023b; Haque & Sohel, 2022; J. Liu et al., 2021; Raj et al., 2021; Shammi
et al., 2022). Weeds are one of the major constraints that can significantly reduce crop
performance by competing for resources such as water, sunlight, nutrition and growing
space (DOĞAN et al., 2004; Gao et al., 2018). Each year, farmers around the world
invest large amounts of time, money, and resources to prevent yield losses from weed
infestations. For instance, Australia alone spends about AUD 4.8 billion annually to
control weeds (Chauhan, 2020).

There are several control approaches, which include preventive, cultural, mechanical,
biological, and chemical methods. However, farmers rely mostly on chemical methods by
applying herbicides at the early growth stage of the crop (Harker & O’Donovan, 2013;
López-Granados, 2011). Oftentimes, weeds infest fields in small patches rather than the
whole field (Rew & Cousens, 2001), whereas herbicides are usually broadcast across
the entire field rather than only where the weeds are. This increases the costs and is
also potentially hazardous for humans and the environment. According to the European
Food Safety Authority (EFSA), most raw agricultural commodities i.e., fruit, grain and
livestock feed contains herbicide residuals, which may have long-term ramifications for
human health, soil health, and wildlife well-being (Medina-Pastor & Triacchini, 2020).


Despite these disadvantages, herbicides remain an important tool for preventing large yield losses from weed infestations (Deutsch et al., 2018; Partel et al., 2019a). To apply appropriate weed-management strategies, detecting weeds and recognising their species are important (López-Granados, 2011). There are several challenges to developing an automatic selective herbicide spraying system. The first is automatically detecting weeds in crops, since most weed species share similar colour, texture and shape with crops. Classification of weed species is also required to apply the appropriate herbicide
with variable doses (Raja et al., 2020). Also, variable rate application of agrochemicals
reduces economic losses and prevents environmental contamination (da Costa Lima &
Mendes, 2020; Grisso et al., 2011; Raja et al., 2020).

Artificial Intelligence (AI) techniques have been used in several commercial solutions
to manage and control weeds by minimising herbicide use. For example, the “Robocrop
Spot Sprayer” (“Robocrop Spot Sprayer: Weed Removal”, 2018) is a video analysis-based
autonomous selective spraying system that can selectively spray potatoes grown in car-
rots, parsnips, onions or leeks. The “WEED-IT” (“Precision Spraying - Weed Sprayer”,
n.d.) and the “WeedSeeker sprayer” (“WeedSeeker 2 Spot Spray System”, n.d.) sprayers
can target all living green materials on soil and apply herbicide on them. The prob-
lem with these systems is that they are not designed to detect and recognise individual
species of in-crop weeds (Hasan et al., 2023a, 2023b; Z. Wu et al., 2021), i.e., one image
can include multiple weeds of different species.

Several AI based solutions for selective herbicide spraying systems have been proposed
to reduce the use of chemicals in the field (Alam et al., 2020; N. Hussain et al., 2020; Raja
et al., 2020; Ruigrok et al., 2020). Detecting and classifying weeds in crops using computer vision-based machine learning (ML) techniques is difficult to implement. Traditional ML approaches use predefined features such as plant patterns, colour, shape and texture to distinguish crops from weeds (Bakhshipour et al., 2017; Hamuda et al., 2017; Jafari et al., 2006; Kazmi et al., 2015b; P. Li et al., 2013; Zheng et al., 2017). However, Deep Learning (DL) techniques, a type of ML, can learn discriminating features and are being used in many real-time object detection and classification problems. The adoption
of deep learning for weed detection is becoming more popular (D. Chen et al., 2022a;
Kamilaris & Prenafeta-Boldú, 2018; Reedha et al., 2021; Z. Wu et al., 2021); for instance
Hasan et al. (2021) reported that the use of DL methods for detection, localisation, and


classification of weeds in crops has significantly increased since 2016 due to their detection
accuracy and speed.

There have been a number of studies that have compared the performance of DL
techniques with the traditional ML approaches to classify images of crop and weed plants;
the results showed that DL methods outperformed traditional methods. Researchers have
proposed several DL methods for classifying images of crop plants and weed species (dos
Santos Ferreira et al., 2017; Farooq et al., 2018b; H. Huang et al., 2018c; Nkemelu et al.,
2018). Espejo-Garcia et al. (2020) compared the performance of several state-of-the-art
deep learning architectures such as Xception, Inception-ResNet, VGGNet, MobileNet
and DenseNet to classify weeds in cotton and tomato. Yu et al. (2019b), Yu et al.
(2019a) and Lammie et al. (2019) performed comparative studies of different DL models
on their respective datasets. Pre-trained models, such as Inception-v3 and ResNet50, were
applied by Olsen et al. (2019) to categorise the images of eight species of weed found in
Australian rangeland. A. Ahmad et al. (2021) evaluated the performance of VGG16,
ResNet50 and Inception-v3 architectures for classifying weeds in Corn (Zea mays) and
soybean production systems using Keras and PyTorch framework. H. Jiang et al. (2020)
used the Graph Convolutional Network (GCN) to classify the images in the Corn weed
dataset. They have introduced this dataset along with the lettuce weed dataset and tested
the model. The datasets contained image-level annotation, which means each image has
only one label. Hasan et al. (2023b) also used the Corn weed dataset for training and
classifying the images using several state-of-the-art deep learning models.

However, image classification approaches do not localise the weeds and crops in an
image, which is required to develop a real-time selective spraying system. Besides, if
an image contains multiple instances of weeds and crops, then the classification will not
be appropriate. Object detection techniques are required to overcome these limitations.
The aforementioned deep learning models are based on Convolutional Neural Networks
(CNN). A CNN is a deep learning model, which can automatically learn and extract
features from pixel data of images for classifying or recognising them.

Based on the number of learning steps, object detection techniques can broadly be
classified into two categories: single-stage and two-stage detection methods (J. Huang
et al., 2017). Typically, an object detection process consists of two tasks: identifying the


regions of interest (ROI) and then localising and classifying them. The two-stage detec-
tors divide the process into region proposal and classification stages. At first, the models
extract several ROIs called object proposals, and then classification and localisation are
performed only on these proposed regions. It is like first looking at the images, extracting
the interesting regions, and then analysing only the interesting regions. R-CNN (Girshick
et al., 2014), Fast R-CNN (Girshick, 2015), Faster R-CNN (Ren et al., 2015), Mask R-
CNN (K. He et al., 2017) and Cascade R-CNN (Cai & Vasconcelos, 2018) are examples
of widely used two-stage object detection models. A single-stage detector predicts boxes
and simultaneously classifies objects. Single Shot Detector (SSD) (W. Liu et al., 2016) and You Only Look Once (YOLO) (Redmon & Farhadi, 2017) are examples of commonly used single-stage object detection models, while frameworks such as Detectron2 (Y. Wu et al., 2019) and MMDetection (K. Chen et al., 2019) provide implementations of such detectors. Single-stage detectors are faster in inference and computationally efficient compared to two-stage object detection techniques. However, single-stage
methods cannot achieve high accuracy for images with extreme foreground-background
imbalance (Carranza-García et al., 2021; J. Huang et al., 2017).

Sivakumar et al. (2020) compared the performance of Faster R-CNN, SSD and patch-
based Convolutional Neural Network (CNN) models for detecting weeds in soybean fields.
Although Faster R-CNN and SSD models showed similar performance based on the met-
rics and inference time, the optimal confidence threshold for the SSD model was lower
than Faster R-CNN. Faster R-CNN with Inception-ResNet-v2 as a feature extractor was
also proposed by Y. Jiang et al. (2019) for detecting weeds in crops. Patidar et al. (2020)
applied Mask R-CNN architecture and Fully Convolutional Network (FCN) on a public
dataset known as the “Plant Seedling Dataset” (Giselsson et al., 2017). M. H. Saleem
et al. (2022) used Faster-RCNN with ResNet-50 and ResNet-101 model to detect and
classify weeds. Le et al. (2021) and Quan et al. (2019) also proposed a Faster-RCNN
model for detecting weeds in crops.

Osorio et al. (2020) argued that the YOLOv3 model detected weeds in lettuce crops
more accurately than Mask R-CNN using multispectral images. Gao et al. (2020) trained
both YOLOv3 and tiny YOLOv3 (Redmon and Farhadi 2018) models to detect C. sepium
and sugar beet. Although the complete YOLOv3 architecture performed better than tiny YOLOv3, the latter required less inference time. Since tiny YOLOv3 has fewer convolutional layers, it occupies fewer resources and thus reduces the inference time. Sharpe et al. (2020) also proposed the tiny YOLOv3 model to localise and classify

time. Sharpe et al. (2020) also proposed the tiny YOLOv3 model to localise and classify
goosegrass, strawberry and tomato plants. Espinoza et al. (2020) showed that YOLOv3
achieved higher detection accuracy in less inference time than Faster R-CNN and SSD
models. Y. Li et al. (2022) also obtained better detection accuracy using YOLOX (Ge
et al., 2021) model on a crop-weed dataset (Sudars et al., 2020) containing eight weed
species and six different food crops.

Partel et al. (2019a) reported that the performance of the YOLOv3 and tiny YOLOv3 models depended on the computer’s hardware configuration. Their research aim was to develop a cost-effective and smart weed management system. Although the YOLOv3 model achieved higher classification and localisation accuracy, they preferred the tiny YOLOv3 architecture to develop the autonomous herbicide sprayer. The tiny YOLOv3 model was compatible with less expensive hardware, performed better in real-time applications and had good accuracy. The same reasons also inspired N. Hussain et al. (2020) and W. Zhang et al. (2018) to use this model. In contrast, Czymmek et al. (2019) argued that the full version of YOLOv3 performed better for their small dataset.
They doubled the default height and width of the input image (832 × 832 pixels) for train-
ing the model. Although tiny YOLOv3 worked faster, they did not want to compromise
the accuracy.

Dang et al. (2023) compared the performance of seven YOLO model versions. The
research showed the competence of YOLO models for real-time weed detection and clas-
sification tasks. The YOLOv4 model achieved the highest mAP of 95.22%. On the other
hand, Sportelli et al. (2023) reported that the performance of YOLOv7 and YOLOv8 were
very similar in detecting turfgrasses. Although YOLOv8 showed some improvement, the
difference was not significant. Abuhani et al. (2023) also agreed that the performance
of YOLOv7 and YOLOv8 was similar while detecting the weeds in sunflower and sugar
beet plants.

The literature clearly shows that Deep Convolutional Neural Networks (DCNNs) are suitable for developing a real-time weed detection, localisation and classification system. However, there is a lack of benchmark datasets of crops and weed species annotated at the individual object level (P. Wang et al., 2022). In addition, a comparative study of ex-
isting object detection techniques can help develop a real-time weed management system.


Therefore, the objectives of this paper are (1) to relabel an existing Corn weed dataset
with object level annotations and repurpose it for localising and classifying different
species from imagery and (2) to evaluate the performance of single-stage and two-stage
object detection models for localising and classifying weeds in crops in real time.

4.2 Materials and Methods

The pipeline for detecting and classifying weeds in crops is shown in Figure 4.1. We annotated the data, prepared the data for training, and then trained and evaluated the models. The steps are described as follows.

4.2.1 Prepare Dataset

4.2.1.1 Dataset Description

The dataset used in this paper is made available publicly by H. Jiang et al. (2020). It
contains Corn (Zea mays) plants and four weed species images: Bluegrass (Poa praten-
sis), Goosefoot (Chenopodium album), Thistle (Cirsium setosum) and Sedge (Cyperus
compressus). The original dataset, stored in the GitHub repository (https://github.com/zhangchuanyin/weed-datasets), contains 5,997 images of five classes. Each class has 1,200 images except Sedge, which has 1,197 images. H. Jiang et al. (2020) collected the dataset from an actual Corn field. They used a Canon PowerShot SX600 HS (https://www.canon.com.au/) camera to acquire images. The camera was placed vertically towards the ground to reduce the influence of sunlight. As displayed in Figure 4.2,
the images have different soil backgrounds (e.g., moisture and wheat straw residue) and
light illumination. Changes in lighting conditions and backgrounds add complexity to
the dataset and affect the performance of deep-learning models (J. Liu & Wang, 2021).
All the images have a dimension of 800 × 600 pixels. Figure 4.2 shows example images of each class as annotated by H. Jiang et al. (2020).


Figure 4.1: The proposed pipeline for detecting weeds in crops: data annotation (weeds and crops annotated using LabelImg, with the XML labels converted into TXT format for YOLO), data description (the annotated dataset split into training (80%), validation (10%) and test (10%) sets, with either all training images augmented (imbalanced dataset) or selected training images augmented (balanced dataset)), training and validation of YOLOv7, YOLOv8 and Faster-RCNN, and performance evaluation.

4.2.1.2 Object level data annotation

H. Jiang et al. (2020) labelled the dataset using image-level annotation techniques,
which means the entire image is identified using a single label. This approach is suitable
for identifying a single object in the image; everything else is considered background.
However, in this dataset, most images contain more than one plant, and some also have
multiple crop and weed species. For instance, in Figure 4.3, we have two images. Although
there were three Bluegrass, two Corn and one Goosefoot plant in the first image (Figure
4.3(a)), it was labelled as Bluegrass. Similarly, the second image (Figure 4.3(b)) was


Figure 4.2: Sample crop and weed images of each class from the dataset: (a) Corn at different growth stages, (b) Corn with other plants, (c) multiple Bluegrass plants, (d) several Goosefoot plants with other weed species, (e) a Thistle image containing other plants, including some Unknown weeds, and (f) a Sedge image that also contains other weed species and Corn plants.

annotated as Sedge. However, the image has other plants which are not similar to the
Sedge plant. Besides, there are three plants which have no similarity with any of the five classes of this dataset. We have labelled them as “Unknown” plants.

H. Jiang et al. (2020) labelled the images like Figure 4.2a and Figure 4.2b as “Corn”.
However, Figure 4.2a has multiple instances of Corn plants at different growth stages and
Figure 4.2b contains Bluegrass and Corn plants. Although Figure 4.2c was annotated
as “Bluegrass”, the plants in it were not identified separately. Figure 4.2d also contains
multiple instances of “Goosefoot” and other weed species. There are several Unknown
plants, along with the five crop and weed species, which are not labelled in the dataset.
According to H. Jiang et al. (2020), Figure 4.2e is an image of “Thistle”, although it
contains other plants as well. Since Figure 4.2f is annotated as “Sedge”, one would expect
that it contains only that plant. However, it also has other weeds and crop plants. To illustrate the data labelling complexity, we have provided a few more example images in Figure 4.4.

Developing an efficient object detection and classification system requires annotating


Figure 4.3: In each pair, the image on the left is the original image with only one label and the image on the right shows our object-level labels. Images (a) and (b) were originally labelled as Bluegrass and Sedge, respectively.

each object in the image. Since our goal is to locate and classify weed and crop plants for developing selective sprayers, we re-annotated the “Corn weed dataset” (H. Jiang et al., 2020) using bounding boxes; see Figure 4.4 for some annotation examples. The bounding boxes cover the whole plant for all crop and weed species except Sedge. The leaves of Sedge weeds are narrow and long, covering a wider area of an image, and are more likely to overlap with other plants. We labelled them by keeping the stem in the middle and covering as much area as possible without overlapping with other annotated plants in the image. As a result, the dataset contains bounding boxes that may overlap. According to Yoo et al. (2015), occlusion of objects is likely to occur in situations where weeds grow over each other. Overlapping bounding boxes may affect the accuracy of the model. However, since weed detection aims to apply treatments such as herbicides to control weeds, we annotated all the available objects in each image, even where some bounding boxes overlap.

In this research, we used 80% of the data for training and validation and 20% for
testing. Table 4.1 shows the number of images and annotated objects in each class. We
found 16,619 plant objects in 5,997 images. From the table, each


Figure 4.4: Bounding box annotation of crop and weed images. It shows that the
dataset images contain multiple crop or weed plants of more than one class.

image contains around three labels on average. Some of the Corn images have more than
50 plants.


Table 4.1: Number of images and annotations for each class of crop and weed species.

Crop and weed species             Number of images    Number of objects
Corn (Zea mays)                   1200                7194
Bluegrass (Poa pratensis)         1200                3486
Goosefoot (Chenopodium album)     1200                1568
Thistle (Cirsium setosum)         1200                1569
Sedge (Cyperus compressus)        1197                1681
Unknown                           0                   1121
Total                             5997                16619

4.2.1.3 Data Augmentation

From Table 4.1, it is clear that the dataset is imbalanced. Moreover, the objects
belonging to the “Unknown” class have differences in shape, colour and texture (intra-
class dissimilarity). This can affect the performance of deep learning models (Y. Li et
al., 2020; Lin et al., 2017). According to Qian et al. (2020) and Zoph et al. (2020),
data augmentation can improve the performance of object detection models with an
imbalanced dataset. This research has augmented the data to address the class imbalance
issue.

We have considered two scenarios of data augmentation in this study, i.e., augmenting
the entire training dataset four times (it will be termed as “All augmentation”) and aug-
menting the images of the training dataset containing any of the four classes of objects:
Goosefoot, Thistle, Sedge and Unknown (it will be termed as “Selective Augmentation”).
The Albumentations package (Buslaev et al., 2020) was used to perform the image augmentation.
Ten well-known geometric and photometric image transformations were used, namely random
rotation, horizontal flip, vertical flip, blur (Hendrycks & Dietterich, 2019), random
brightness and contrast, Gaussian noise, multiplicative noise, RGB shift, compression
and Fancy Principal Component Analysis (PCA) (Krizhevsky et al., 2017). Four of the
ten transformations, selected at random, were applied to each training image to generate
the augmented images. The effect of each transformation on one example image is shown
in Figure 4.5.
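
To make this step concrete, the following minimal sketch shows how such an augmentation
pipeline could be built with the Albumentations package, assuming YOLO-format bounding-box
labels; the transformation parameters and the file name are illustrative assumptions rather
than the exact settings used in this study.

import random
import albumentations as A
import cv2

# The ten candidate transformations; parameter values are illustrative only.
CANDIDATE_TRANSFORMS = [
    A.Rotate(limit=45, p=1.0),
    A.HorizontalFlip(p=1.0),
    A.VerticalFlip(p=1.0),
    A.Blur(blur_limit=5, p=1.0),
    A.RandomBrightnessContrast(p=1.0),
    A.GaussNoise(p=1.0),
    A.MultiplicativeNoise(p=1.0),
    A.RGBShift(p=1.0),
    A.ImageCompression(p=1.0),
    A.FancyPCA(p=1.0),
]

def augment_once(image, bboxes, class_labels):
    # Apply four randomly selected transformations to one image and its YOLO-format boxes.
    pipeline = A.Compose(
        random.sample(CANDIDATE_TRANSFORMS, 4),
        bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
    )
    return pipeline(image=image, bboxes=bboxes, class_labels=class_labels)

image = cv2.cvtColor(cv2.imread("corn_weed_0001.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical file name
boxes = [[0.52, 0.48, 0.20, 0.25]]   # normalised [x_centre, y_centre, width, height]
labels = ["Corn"]
augmented = augment_once(image, boxes, labels)
aug_image, aug_boxes = augmented["image"], augmented["bboxes"]

Calling augment_once four times per training image would yield the four-fold expansion
used in the “All augmentation” scenario.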

In this study, we used 80% of the data for training, 10% for validation and 10% for
testing. Table 4.2 shows the number of objects used for training, validation and testing.


(a) Original image; (b) Random rotation; (c) Horizontal flipping; (d) Vertical flipping;
(e) Blurring; (f) Brightness and contrast change; (g) Adding Gaussian noise; (h) Adding
multiplicative noise; (i) RGB shifting; (j) Compression; (k) Applying fancy PCA.

Figure 4.5: Illustration of the ten image augmentation techniques applied to one of the
training images.

Although the number of objects in the minority classes increases after augmenting the
entire training set, the dataset remains imbalanced in the first scenario. In the second
scenario, the dataset becomes reasonably balanced.

Table 4.2: The number of objects used to train (before and after augmentation),
validate and test the models.

Classes      Training (original)   Training (all augmentation)   Training (selective augmentation)   Validation   Testing
Bluegrass    2828 (21.27%)         11312 (21.27%)                6425 (19.51%)                       372          286
Goosefoot    1250 (9.40%)          5000 (9.40%)                  5000 (15.18%)                       153          165
Thistle      1227 (9.23%)          4908 (9.23%)                  4908 (14.91%)                       172          170
Corn         5756 (43.29%)         23024 (43.29%)                7664 (23.26%)                       694          744
Sedge        1336 (10.05%)         5344 (10.05%)                 5344 (16.22%)                       183          162
Unknown      899 (6.76%)           3596 (6.76%)                  3596 (10.91%)                       108          114
Total        13296                 53184                         32937                               1682         1641


4.2.2 Object Detection

In this research, we used two well-known object detection models: the You Only Look
Once (YOLO) model (Bochkovskiy et al., 2020) and Faster-RCNN (Ren et al., 2015).
We have chosen one single-stage and one two-stage object detector for this research. The
models were trained using the Corn and weed dataset to detect, localise and classify the
plants in an image.

4.2.2.1 YOLO model

YOLO is a state-of-the-art single-stage object detection model that treats the detection
task as a regression problem. YOLO can detect multiple objects in a single image.
For object detection, the technique divides the image into a grid. Each grid cell first
predicts class probabilities; the model then finds the coordinates of the bounding boxes
and calculates a confidence score for each box. Being a single-stage detector, YOLO
models achieve high computational speed (W. Liu et al., 2016). In this research, we
have used two of the latest variants of the YOLO model: YOLOv7 and YOLOv8.
YOLOv7 is considered the official YOLO model by the computer vision and machine
learning communities since it was developed by the same group of researchers who
proposed the original idea (Sinapan et al., 2023). On the other hand, YOLOv8 was
developed by Jocher et al. (2023b) and combines the best characteristics of several
real-time object detection approaches (Lou et al., 2023).

The YOLOv7 model outperforms previous object detection techniques, including the
earlier versions of the YOLO algorithms, in terms of speed and accuracy (C.-Y. Wang
et al., 2023a). We evaluated the performance of the YOLOv7 and YOLOv7-tiny models
in this study. YOLOv7-tiny (6.2 million parameters) is a compressed version of the
original YOLOv7 (36.9 million parameters). YOLOv7-tiny is suitable for real-time
applications and can be deployed on devices with low computational power, whereas
YOLOv7 offers higher accuracy. The models were pre-trained on MS COCO (Microsoft
Common Objects in Context) (Lin et al., 2014), and the pre-trained weights are publicly
available (https://www.kaggle.com/datasets/parapapapam/yolov7-weights). We trained
the models (starting from the pre-trained weights) on our dataset and compared the
performance of


YOLOv7 and YOLOv7-tiny for detecting, localising and classifying weeds in the Corn
crop.

YOLOv8 is the latest iteration of the YOLO family and exhibits higher performance
in terms of accuracy and speed. YOLOv8 introduced several improvements, such as
mosaic augmentation, C3 convolutions and anchor-free detection, to improve performance
and inference speed. In this study, we evaluated all five variants of the model:
YOLOv8n (3.2 million parameters), YOLOv8s (11.2 million parameters), YOLOv8m
(25.9 million parameters), YOLOv8l (43.7 million parameters) and YOLOv8x (68.2
million parameters). YOLOv8n is the smallest and fastest of them. Although YOLOv8x
provides the most accurate results, it takes more time to train and to detect objects in
an image.
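
As an illustration only, the sketch below shows how one of these variants could be
fine-tuned and applied with the Ultralytics Python package, using the image size, batch
size and epoch count reported in Section 4.2.4; the dataset configuration file
(corn_weed.yaml) and the test image name are hypothetical.

from ultralytics import YOLO

# Start from COCO pre-trained weights and fine-tune on the re-annotated Corn weed data.
model = YOLO("yolov8n.pt")            # any of the five variants can be substituted here

model.train(
    data="corn_weed.yaml",            # hypothetical config listing the six classes and image paths
    imgsz=416,                        # images resized to 416 x 416 (Section 4.2.4)
    epochs=100,
    batch=32,
)

metrics = model.val()                 # precision, recall and mAP@0.5 on the validation split
results = model("field_image.jpg", conf=0.25)   # hypothetical field image
for box in results[0].boxes:          # class index, confidence and box coordinates
    print(box.cls, box.conf, box.xyxy)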

4.2.2.2 Faster-RCNN

Faster-RCNN (Ren et al., 2015) is a two-stage region proposal-based object detection
model. Its performance was improved over its predecessors (R-CNN and Fast R-CNN)
by introducing a fully convolutional Region Proposal Network (RPN). In this study, we
used the ResNet-50-FPN model as the backbone. The advantages of adopting pre-trained
models are the opportunity to use a state-of-the-art architecture and to leverage transfer
learning. Although these models are trained to detect generic objects, they can be
fine-tuned to detect custom objects. In this study, the model was retrained to localise,
recognise and classify crop and weed plants. The model has around 44 million parameters,
which consume a large amount of memory; however, it provides better detection quality
and higher accuracy (Y. He et al., 2019a).
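
A minimal sketch of this fine-tuning step, assuming the standard torchvision recipe, is
given below; it replaces the pre-trained box predictor with one sized for the six crop and
weed classes (plus background) and is intended as an illustration rather than the exact
training code used in this study.

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 1 + 6   # background + {Corn, Bluegrass, Goosefoot, Thistle, Sedge, Unknown}

# Faster-RCNN with a ResNet-50-FPN backbone pre-trained on MS COCO.
# On newer torchvision releases, weights="DEFAULT" replaces pretrained=True.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Replace the box predictor so the detection head outputs scores for our classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Training then follows the usual torchvision detection loop: the model takes a list of
# image tensors and a list of target dicts with "boxes" and "labels", and returns a dict
# of losses that is backpropagated with an optimiser such as SGD.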

4.2.3 Evaluation metrics

The widely used evaluation metrics for object detection models are: Precision, Recall,
Intersection over Union, Average Precision and Mean Average Precision (Padilla et al.,
2020). These metrics were used in this paper to evaluate the performance of the models.
Before discussing these metrics, we first present some basic concepts:


• True Positive (TP): A correct detection of an object that matches the ground truth.
A detection is a true positive if its confidence score is above the threshold, the IoU
between the predicted bounding box and the ground-truth box is above the IoU
threshold, and the predicted class matches the ground-truth class.

• False Positive (FP): A prediction will be a false positive if the predicted class does
not match with the ground truth class or the IoU value of the predicted bounding
box with the ground truth is less than a predefined threshold.

• False Negative (FN): A detection is counted as a false negative if the model fails to
detect a ground truth bounding box.

• True Negative (TN): A true negative would correspond to correctly not detecting
anything in a region where no ground-truth object exists. Since a model could
propose an essentially unlimited number of bounding boxes that should not be
detected, true negatives are not applicable in the object detection context.

The evaluation metrics that are used in this paper are explained below:

• Precision: The ratio of the correctly detected objects (TP) to the total number of
objects predicted by the model (sum of TP and FP). Precision measures the
proportion of positive predictions that are correct, i.e., how accurate the model's
predictions are.

P = TP / (TP + FP).

• Recall: The ratio of the total number of correctly detected objects (TP) to all
ground-truth objects (sum of TP and FN) is called recall. Recall measures the
ability of the model to detect the objects identified in the ground truth dataset.
It represents how good the model is at finding the correct objects.

R = TP / (TP + FN).

• Intersection over Union: This metric evaluates the model’s ability to localise an
object and also determines whether the detection is correct or not. In the object
detection context, the IoU is the ratio of the overlapping area and the area of union


between the ground truth bounding box and the box predicted by the model. If the
IoU of a bounding box is equal to or greater than a specified threshold, the detection
is considered correct (i.e., a true positive); otherwise it is incorrect (i.e., a false
positive).

IoU = Area of intersection / Area of union.

• Average Precision: Average precision (AP) is derived from precision and recall and
is calculated separately for each class. AP averages the precision over the recall
values obtained at various confidence thresholds. There are two approaches to
computing it: 11-point interpolation and all-point interpolation. In 11-point
interpolation, AP is the average of the maximum precision values at a set of 11
equally spaced recall levels [0.0, 0.1, 0.2, 0.3, ..., 1.0]. Alternatively, in the all-point
interpolation method, AP is obtained by interpolating the precision at every recall
level.

• Mean Average Precision: The mean average precision is the mean of the APs over
all object categories. This metric is used to evaluate the accuracy of the object
detection model over all classes in a dataset (W. Liu et al., 2016). A minimal
sketch of how these metrics can be computed is given after this list.
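
The following minimal sketch illustrates how the IoU, precision and recall defined above
can be computed for axis-aligned boxes given as (x1, y1, x2, y2); it is an illustration of
the definitions, not a replacement for a full evaluation toolkit.

def iou(box_a, box_b):
    # Intersection over Union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def precision_recall(tp, fp, fn):
    # Precision and recall from true positive, false positive and false negative counts.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

print(iou((10, 10, 60, 60), (30, 30, 80, 80)))   # ~0.22, below a 0.5 IoU threshold
print(precision_recall(tp=80, fp=20, fn=10))     # (0.8, ~0.889)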

K-fold cross-validation is a common technique used to evaluate the performance of
deep learning models. Cross-validation helps assess a model's performance in different
scenarios by dividing the dataset into multiple subsets and training and testing the model
multiple times. It evaluates the model's robustness and generalisation across different
data distributions. Moreover, averaging the evaluation metrics across the folds gives a
more reliable estimate of the model's performance (Raschka, 2018). In this study, we
applied five-fold cross-validation, where the dataset was split into five subsets using
different random seeds. The models were trained and tested five times, each time using
a different subset as the validation set. This provided a good balance between
computational cost and reliable performance estimation.
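
A minimal sketch of this fold generation with scikit-learn is shown below, assuming the
annotated image paths have been collected into a list; the folder name is hypothetical,
and the actual splitting code used in this study may differ in detail.

from glob import glob
from sklearn.model_selection import KFold

image_paths = sorted(glob("corn_weed_dataset/images/*.jpg"))   # hypothetical folder layout

# Five folds; a different random seed can be used for each cross-validation run.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(kfold.split(image_paths)):
    train_images = [image_paths[i] for i in train_idx]
    val_images = [image_paths[i] for i in val_idx]
    # Train on train_images, evaluate on val_images, and average the metrics
    # (e.g., mAP at IoU 0.5) over the five folds.
    print(f"Fold {fold}: {len(train_images)} training / {len(val_images)} validation images")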


4.2.4 Computing Environment

All the experiments were performed on a desktop computer with an Intel(R) Core(TM)
i9-9900X processor, 128 gigabytes of RAM and an NVIDIA GeForce RTX 2080 Ti Graph-
ics Processing Unit (GPU). We used the Professional Edition of the Windows 10 operating
system. To implement the weed detection and classification system, Python 3.8
(https://www.python.org/downloads/release/python-380/) with the OpenCV 4.5
(https://opencv.org/opencv-4-5-0/) and PyTorch 1.10 (Paszke et al., 2019) libraries was
used. CUDA 11.1 and CUDNN 8.0.4 were installed in the system to make full use of the
GPU and accelerate the training of the models.

All images were resized to a spatial resolution of 416 × 416 pixels for training and
inference. We used a batch size of 32 for all the models, and the models were trained
for 100 epochs (we let the models train for up to 300 epochs, but the results did not
improve).

4.2.4.1 Training and inference time

The training time and inference time (for weed detection and classification) depend on
the hardware’s availability and the model’s computational complexity. Table 4.3 shows
the training and inference times for the models. For developing an automatic spray-rig to
apply herbicide on a specific weed, the inference time has a significant role in the vehicle
speed. The vehicle speed also depends on other factors, such as the distance between
the camera and the spraying nozzle, the height of the nozzle from the target weed and
the time required to calibrate and spray the herbicides on the weed (Alam et al., 2020;
DPIRD, 2021). The following expression summarises that.

Vehicle speed = (Distance between camera and spraying nozzle + Height of the nozzle from target weed) / (Inference time + Time required to spray the chemicals).

The expression shows that by reducing the inference time, we can increase the vehicle
speed and allow more time for the system to apply herbicide on target weeds more
accurately. For example, let us consider the distance between the camera and the spraying
nozzle is 2 metres and the height of the nozzle from the ground is 1 metre (Alam et al.,

2020). For a vehicle moving at a speed of 25 kilometres per hour (“Precision Spraying
- Weed Sprayer”, n.d.), the system will then have about 430 milliseconds between
capturing an image and applying the chemicals to a target. A sprayer may take about
100 milliseconds to spray the herbicide (Alam et al., 2020). If the image capturing rate
is 15 images per second, the remaining 330 milliseconds leave about 22 milliseconds to
infer each image. A higher vehicle speed can be achieved by increasing the distance
between the camera and the spraying nozzle and/or reducing the image rate per second.
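
The arithmetic above can be made explicit with a short script; the distances, vehicle
speed, spray delay and frame rate are the illustrative values quoted in the text rather
than measured system parameters.

# Worked example of the timing budget described above (illustrative values only).
camera_to_nozzle_m = 2.0       # distance between the camera and the spraying nozzle
nozzle_height_m = 1.0          # height of the nozzle above the target
vehicle_speed_kmh = 25.0
spray_time_ms = 100.0          # time the sprayer needs to actuate
frame_rate = 15                # images captured per second

vehicle_speed_ms = vehicle_speed_kmh * 1000.0 / 3600.0                        # ~6.94 m/s
travel_time_ms = (camera_to_nozzle_m + nozzle_height_m) / vehicle_speed_ms * 1000.0
print(round(travel_time_ms))            # ~432 ms between image capture and spraying

inference_budget_ms = (travel_time_ms - spray_time_ms) / frame_rate
print(round(inference_budget_ms))       # ~22 ms available to infer each image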

4.3 Results

The models were trained with a custom crop weed dataset as presented in Section
4.2.1. The results of the experiments are discussed here.

4.3.1 Comparison on training and inference time

Table 4.3 shows that the YOLOv8x model has the highest number of parameters
(68.2 million), much higher than Faster-RCNN. However, the Faster-RCNN model took
the longest to train. Moreover, most of the YOLOv8 variants took less training and
inference time than the original YOLOv7 model. Although the YOLOv7-tiny model is
fast, YOLOv8n is the fastest of all.

The YOLOv7 model took 5.4 to 11.3 hours to train with 36 million parameters, and its
inference time varies from 2.7 to 3.1 milliseconds. In contrast, YOLOv8l and YOLOv8x
contain more trainable parameters yet take less time to train and to detect weeds in an
image. According to M. Hussain (2023), the YOLOv8 models reduced the use of the
mosaic augmentation technique, which improved the training time. The same study
reported that the use of C3 convolutions and an anchor-free detection approach helped
to increase the inference speed.

Based on the inference time reported in our study, both YOLOv7 and YOLOv8 models
are suitable for real-time applications. Since Faster-RCNN is a two-stage object detection
model, it takes around 95 milliseconds to detect objects in an image. That is why this
model may not be suitable for real-time operation.


Table 4.3: Training and inference time for the models.

Data augmentation   Model         Parameters (millions)   Training time (hours)   Inference time per image (ms)
No                  YOLOv7        36                      5.4                     2.7
No                  YOLOv7-tiny   6                       2.6                     1.43
No                  YOLOv8n       3.2                     1.28                    1.39
No                  YOLOv8s       11.2                    1.89                    1.7
No                  YOLOv8m       25.9                    2.38                    2.2
No                  YOLOv8l       43.7                    2.9                     2.9
No                  YOLOv8x       68.2                    2.6                     3.8
No                  Faster-RCNN   43                      18.3                    94.6
Selective           YOLOv7        36                      6.7                     2.9
Selective           YOLOv7-tiny   6                       3.7                     1.5
Selective           YOLOv8n       3.2                     1.89                    1.41
Selective           YOLOv8s       11.2                    2.72                    1.73
Selective           YOLOv8m       25.9                    3.5                     2.21
Selective           YOLOv8l       43.7                    4.27                    2.95
Selective           YOLOv8x       68.2                    5.05                    3.79
Selective           Faster-RCNN   43                      23.4                    96.07
All                 YOLOv7        36                      11.3                    3.1
All                 YOLOv7-tiny   6                       6.25                    1.6
All                 YOLOv8n       3.2                     3.11                    1.43
All                 YOLOv8s       11.2                    4.12                    1.77
All                 YOLOv8m       25.9                    5.17                    2.25
All                 YOLOv8l       43.7                    6.22                    2.99
All                 YOLOv8x       68.2                    7.28                    3.88
All                 Faster-RCNN   43                      25.7                    95.02

4.3.2 Performance of the models on training dataset

For the five-fold cross-validation experiments, the eight models (YOLOv7, YOLOv7-tiny,
YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, YOLOv8x and Faster-RCNN) were trained
on the dataset for 100 epochs. We picked the best training curve for each model. The
training curves for all the models in the three scenarios are shown in Figure 4.6.

In the first case, where the models were trained with the original data only, the
performance of the YOLO models was very close; their mAPs ranged from 87.1% to
89.31%. The mAP of YOLOv7 was the highest and that of YOLOv8n the lowest.
Although the performance of the YOLOv8 models generally improved with more
trainable parameters, the YOLOv8l model (88.24%) achieved a better result than
YOLOv8x (87.77%). The mAP of the Faster-RCNN model trained on the original
dataset was 78.82%, which is much lower than that of the YOLO models.

The performance of the models improved when trained with augmented data. The
training mAP increased by 0.3% to 1.83% when augmenting the images containing
objects of the minority classes (selective augmentation). The models improved further
when the entire training dataset was augmented (all augmentation); the improvement in
mAP ranged from 0.4 to 2.5 percentage points compared with training on the original
images only. The YOLOv7 model achieved the highest mAPs of 90.01% (selective
augmentation) and 90.25% (all augmentation). The YOLOv8l model (88.96%) performed
best among the YOLOv8 models with selective augmentation, while the YOLOv8x model
(89.43%) achieved the highest mAP when the entire training set was augmented.
Although the mAP of the Faster-RCNN model was the lowest of all, its performance
also improved when trained on the augmented data: it achieved mAPs of 79.14% and
81.30% with selective augmentation and with augmentation of the entire training set,
respectively.

4.3.3 Comparison of the models’ accuracy on the test data

The evaluation results of the models on the test dataset are provided in Table 4.4 in
terms of average precision per class and mean average precision. The values shown are
the means over the five-fold cross-validation tests of the different models.

The YOLOv7 model trained on the original images only achieved the highest mAP
of 88.50%, and the average mAP of the YOLOv7-tiny model (88.29%) is very close to
that. The YOLOv8m model yielded the highest mAP (87.75%) among the YOLOv8
models, although no significant differences in detection accuracy were observed among
the YOLOv8 variants. Moreover, the YOLOv8 models' performance in detecting weeds
of the Unknown class is much lower than that of the YOLOv7 model. The Faster-RCNN
model achieved the lowest mAP of 78.71%.

The performance of the models was improved by using augmented data for training.


(a) Without augmenting the training data; (b) Augmenting the minority classes of the
training data; (c) Augmenting all training data.

Figure 4.6: Training accuracy curves (mAP) for the models with and without data
augmentation.

The YOLOv7 model exhibited better detection accuracy than the others in both data
augmentation scenarios. The highest mAP was 89.93%, achieved by augmenting the
entire training set. The YOLOv8 and Faster-RCNN models also exhibited improved
performance, and a significant improvement was observed in detecting weeds of the
Unknown class. The YOLOv8l model yielded the highest mAP of 88.69% among the
YOLOv8 models using selective augmentation, and it also detected the Unknown class
(73.04%) more accurately. However, the performance of the YOLOv8x model (89.39%)
was better than that of YOLOv8l (88.54%) when the entire training set was augmented.
Notably,


Table 4.4: Average precision by class and mean average precision at IoU of 50% on test
data.

Augmentation   Model         Corn (crop)   Bluegrass   Goosefoot   Thistle   Sedge   Unknown   mAP
No             YOLOv7        86.06         87.54       97.14       98.76     90.98   70.54     88.50
No             YOLOv7-tiny   86.86         88.64       97.24       99.06     88.68   69.24     88.29
No             YOLOv8n       84.66         88.14       96.64       97.66     88.68   68.44     87.37
No             YOLOv8s       84.96         88.84       97.04       97.26     89.98   61.94     86.67
No             YOLOv8m       84.86         86.44       97.44       97.36     91.58   68.84     87.75
No             YOLOv8l       84.56         88.94       96.84       98.06     92.28   65.04     87.62
No             YOLOv8x       84.46         85.54       97.14       97.46     91.48   68.14     87.37
No             Faster-RCNN   77.60         79.02       86.59       88.28     79.28   61.47     78.71
Selective      YOLOv7        86.96         88.04       97.54       99.06     91.48   75.64     89.79
Selective      YOLOv7-tiny   85.26         88.94       97.64       99.06     91.18   69.04     88.52
Selective      YOLOv8n       83.56         87.84       97.34       97.46     92.38   61.54     86.69
Selective      YOLOv8s       83.56         89.14       96.84       96.56     89.98   67.44     87.25
Selective      YOLOv8m       85.06         89.64       96.84       97.36     92.88   67.64     88.24
Selective      YOLOv8l       84.56         88.34       96.74       97.16     92.28   73.04     88.69
Selective      YOLOv8x       84.96         87.14       97.14       98.26     92.68   71.84     88.67
Selective      Faster-RCNN   76.65         77.42       85.66       87.07     80.69   68.36     79.31
All            YOLOv7        88.06         88.04       97.22       99.16     90.98   76.14     89.93
All            YOLOv7-tiny   87.46         90.14       97.54       99.16     91.18   69.34     89.14
All            YOLOv8n       86.06         88.84       97.34       96.96     94.38   67.84     88.57
All            YOLOv8s       85.66         88.64       97.54       97.56     92.98   66.24     88.10
All            YOLOv8m       85.76         89.44       97.34       98.16     93.08   70.24     89.00
All            YOLOv8l       86.56         88.94       97.34       97.46     93.48   67.44     88.54
All            YOLOv8x       87.16         88.64       97.24       98.06     92.98   72.24     89.39
All            Faster-RCNN   79.48         79.32       88.14       89.30     82.17   69.38     81.29

the YOLOv7 and YOLOv8 models exhibited significantly higher mAPs than Faster-RCNN,
which yielded the lowest mAPs of 79.31% and 81.29% with selective and all augmentation,
respectively. Figure 4.7 shows some example images from the test dataset; it indicates
that the YOLOv7 and YOLOv8 models perform better than Faster-RCNN. The
Faster-RCNN model misclassified some objects and detected the same object twice with
two different classes.


Figure 4.7: Example images from the test dataset showing the detection accuracy of the
models. The columns show four example images; the rows show the ground truth
followed by the detections of YOLOv7, YOLOv7-tiny, YOLOv8n, YOLOv8s, YOLOv8m,
YOLOv8l, YOLOv8x and Faster-RCNN.


4.4 Discussion

4.4.1 Comparison on training and inference time

The YOLOv7 and YOLOv8 models took less time to train than the Faster-RCNN
model. The main reason behind this difference is the use of the Extended Efficient Layer
Aggregation Network (E-ELAN), which improves the feature learning ability of the
model and reduces the number of parameters and calculations. The YOLOv7 model
also introduced the coarse-to-fine lead head label assigner, which improved the learning
ability of the model without losing the required information. The YOLOv8 models are
even faster than YOLOv7: they reduced the use of mosaic augmentation and replaced
the traditional convolutional layers of YOLO models with C3 convolutions. Moreover,
YOLOv8 introduced anchor-free object detection, which means there is no need to use
pre-defined anchor boxes; the bounding boxes are predicted directly, which reduces the
inference time.

Inference time is a significant consideration in developing a real-time herbicide sprayer
or weed control system. If a model can detect, localise and classify weeds quickly, the
system can act sooner and the sprayer can move faster. Inference time depends on the
input image resolution and on the depth and efficiency of the neural network.
High-resolution inputs and deep networks generally provide better accuracy but much
lower detection and classification speeds (Zhao et al., 2019). The choice of network for
a real-time weed detection system should therefore balance speed and accuracy.

4.4.2 Performance of the models on training dataset

According to Zhao et al. (2019), training with more data can boost the detection and
classification accuracy of deeper models with more trainable parameters. Since
augmenting the entire dataset provided the models with more data, the performance of
YOLOv7, several YOLOv8 variants and Faster-RCNN improved. However, the training
accuracy of the YOLOv7-tiny model saturated in that case: it showed a similar result
to training with the unaugmented dataset, because the dataset remains imbalanced when
the entire training set is augmented. The YOLOv7-tiny model achieved its best accuracy when trained


with a balanced dataset (selective augmentation).

Although the training mAP of Faster-RCNN was improved by augmenting the entire
dataset, YOLOv7 and YOLOv8 models were better. There are several reasons behind the
underperformance of the Faster-RCNN model. YOLO (YOLOv7 and YOLOv8) models
are data-efficient during training since they use the entire image for training and predic-
tion. Moreover, the use of mosaic and mixup augmentation techniques during training
also synthetically expands the dataset. These techniques help the model to overcome the
tendency to focus on detecting items towards the centre of the image. Faster-RCNN, on
the other hand, relies on region proposals, which can be computationally expensive and
require more labelled data (Kaya et al., 2023; D. Wu et al., 2022). Gallo et al. (2023)
and López-Correa et al. (2022) reported similar results while comparing the performance
of YOLOv7 and Faster-RCNN models in their study. Overall, the results in Figure 4.6
demonstrate that the efficacy of the YOLOv7 and YOLOv8 models is better than the
Faster-RCNN model.

4.4.3 Comparison of the models’ accuracy on the test data

The results in Table 4.4 show that the models could detect Goosefoot and Thistle
plants more accurately since these plants are less occluded and relatively large. Although
there were more objects belonging to the Bluegrass (21.27%) and Corn (43.29%) classes,
the models could not achieve similar accuracy for them. Without data augmentation,
the detection accuracy of the YOLOv7 model for Bluegrass and Corn was 87.54% and
86.06%, respectively, and the performance of YOLOv7-tiny was better in that case.
Although the YOLOv8n, YOLOv8s and YOLOv8l models had higher precision in
detecting Bluegrass, the YOLOv7 models localised and classified the Corn plants more
accurately. Similar outcomes were observed when training the models with augmented
data. This is presumably because most of the Corn plants in the images are comparatively
young seedlings and smaller in size. Y. Liu et al. (2021), Tong et al. (2020), and Wahyudi
et al. (2022) agreed that detecting small objects is more challenging because they carry
less feature information and have lower resolution. Moreover, the Bluegrass and Corn
plants overlap with other objects. According to Brawn and Snowden (2000), overlapping
objects make it difficult for object detection models to localise and classify objects. The
models also failed to achieve a better result


in detecting Sedge plants due to occlusion.

Data augmentation had a positive impact on both training and testing. The learning
during training was faster with augmented data (Figure 4.6), and the detection accuracy
of the models also improved. The improvement was most noticeable when detecting the
smaller plants (Corn) and objects of the minority class (Unknown weeds).

Figure 4.8: Illustration of the effect of data augmentation on detecting objects of the
Unknown class using the YOLOv7 model. The columns show three example images; the
rows show the ground truth and the detections after training without augmentation,
with selective augmentation and with all augmentation.

On the other hand, all the models struggled to detect plants of the Unknown class,
since that class has higher intra-class dissimilarity and fewer training samples. The effect
of data augmentation is most noticeable when detecting objects of the Unknown class. For the


YOLOv7 model, the average precision for detecting Unknown class objects was 70.54%,
which was improved by more than 5% after training the model with augmented data.
The impact was similar for the YOLOv8 and Faster-RCNN models as well. However, the
YOLOv7-tiny model showed no noticeable improvement after training with more data.
The effect of data augmentation is shown in Figure 4.8 for detecting the Unknown class
objects using the YOLOv7 model.

Figure 4.8 shows that, in the ground truth, Image 1 contains four Bluegrass objects and
four objects of the Unknown class. The YOLOv7 model trained with the original dataset
detects only five objects (two Bluegrass and three Unknown plants). All four Unknown
plants were detected after training with the original images plus the augmented images
of the selected classes; however, the model then failed to detect two of the Bluegrass
plants. The model's performance improved further when trained on the original dataset
plus the augmented data of the entire training set: it detected all the objects in the
image. Similarly, the model failed to detect all the objects in Images 2 and 3 when
trained with the original data only.

4.5 Conclusion

In this work, we have repurposed a publicly available dataset of Corn and associated
weeds by converting its image-level labels into object-level annotations. The dataset was
annotated using bounding boxes and prepared for applying object detection methods.
The YOLOv7 model achieved the best mean average precision of 90.01% and 89.93% on
the training and the test data, respectively. The similarity between the mAP on training
and test data indicates the model was well trained and neither overfit nor underfit. The
object detection accuracy and the required inference time show that YOLOv7 can be
used to localise and classify weeds in Corn crops. The YOLOv7-tiny (89.14%) and
YOLOv8x (89.39%) models also achieved high accuracy, with mAPs very close to that
of the YOLOv7 model. Although these models could not detect the plants of the
Unknown class as well as YOLOv7, their performance was promising and suitable for
real-time weed detection. It was also observed that the variants of the YOLOv8 models
had lower inference times than the YOLOv7 model.


On the other hand, the Faster-RCNN model takes more time (see Table 4.3) to detect
objects in an image and does so with lower accuracy than the YOLOv7 and YOLOv8
models. From the results of this study, YOLOv7 and YOLOv8 are therefore more
appropriate models for real-time weed detection than the Faster-RCNN model. However,
further studies are needed to optimise the inference time and improve the detection
accuracy of the models.

A model's performance can be improved by training with a large and balanced
dataset. In our research, the training data were limited and imbalanced, which is why
the models did not accurately detect the plants of the “Unknown” class. To overcome
this, we used data augmentation techniques to increase the number of training samples.
This study observed that data augmentation positively impacts the accuracy of detecting
and classifying weeds in the crop. For smaller datasets, this technique may improve the
performance of the models.

The images of this dataset were collected under different lighting conditions, and the
plants were occluded with each other. This had a major impact on classification accu-
racy. Subsequent studies are required to investigate how to overcome this limitation and
improve the model performance. In future, we will focus on improving the performance
of the two-stage object detector in terms of accuracy and inference time.

The outcome of this research indicates that the YOLOv7 and YOLOv8 models perform
well in detecting Corn and the associated weeds and can be used to develop a selective
sprayer or other automatic weed control systems in future. An automated field robot or
a precision sprayer can be controlled for selective operations and to spray a specific
herbicide in the required amount using this method. The method allows weeds in an
image or video frame to be located and classified in real time. On-field trials will be
needed to test and validate the proposed techniques. Besides, using large datasets
containing many image samples of various weeds and crops collected in field conditions
can improve the performance of the deep learning techniques.

Chapter 5

Improving classification accuracy

Accurate classification of weed species in crop plants plays a crucial role in precision
agriculture by enabling targeted treatment. Recent studies show that artificial intelligence,
particularly deep learning (DL) models, can provide promising solutions. However,
several challenging issues limit their performance, such as the lack of adequate training
data, inter-class similarity between weed species and intra-class dissimilarity between
images of the same weed species at different growth stages or for other reasons (e.g.,
variations in lighting conditions, image capturing mechanisms and agricultural field
environments). In this research,
we propose an image based weed classification pipeline where a patch of the image is
considered at a time to improve the performance. We first enhance the images using gen-
erative adversarial networks. The enhanced images are divided into overlapping patches,
a subset of which are used for training the DL models. For selecting the most informa-
tive patches, we use the variance of Laplacian and the mean frequency of Fast Fourier
Transforms. At test time, the model’s outputs are fused using a weighted majority voting
technique to infer the class label of an image. The proposed pipeline was evaluated using
10 state-of-the-art DL models on four publicly available crop weed datasets: DeepWeeds,
Cotton weed, Corn weed, and Cotton Tomato weed. Our pipeline achieved significant
performance improvements on all four datasets. DenseNet201 achieved the top perfor-
mance with F1 scores of 98.49%, 99.83% and 100% on Deepweeds, Corn weed and Cotton

This chapter has been published: Hasan, A. M., Diepeveen, D., Laga, H., Jones, M. G., & Sohel, F.
(2023). Image patch-based deep learning approach for crop and weed recognition. Ecological informatics,
78, 102361.


Tomato weed datasets, respectively. The highest F1 score on the Cotton weed dataset
was 98.96%, obtained by InceptionResNetV2. Moreover, the proposed pipeline addressed
the issues of intra-class dissimilarity and inter-class similarity in the DeepWeeds dataset
and more accurately classified the minority weed classes in the Cotton weed dataset. This
performance indicates that the proposed pipeline can be used in farming applications.

5.1 Introduction

Artificial intelligence (AI) techniques have great potential to improve modern farming
systems. AI can benefit farming in many ways, e.g., improving crop yields, reducing
the environmental impact and optimising resource allocation (Eli-Chukwu, 2019; Smith,
2018). Deep learning (DL), a branch of AI, has been investigated widely in agriculture,
for example in crop monitoring (Kussul et al., 2017), plant disease detection (Ferentinos,
2018; Haque & Sohel, 2022; Keceli et al., 2022; M. H. Saleem et al., 2019), yield
prediction (Maimaitijiang et al., 2020), weed and pest recognition (Amrani et al., 2023a;
Hasan et al., 2021; W. Li et al., 2021), plant phenotyping or growth monitoring (M.
Wang et al., 2023), crop health monitoring (Devi et al., 2023) and plant frost and stress
monitoring (Khotimah et al., 2023; Shammi et al., 2023; A. Singh et al., 2021).

Weeds are unwanted plants that grow among valuable crops. Because weeds compete with
crops for resources such as space, water, nutrients and sunlight, poor weed management
in agriculture can cause yield losses and degrade the quality of crops (Slaughter et al.,
2008). The most widely used approach in weed control is the application of herbicide
chemicals (Duke, 2015; Harker & O’Donovan, 2013; López-Granados, 2011). However,
the broad use of herbicides, irrespective of weed species or the severity of infestation,
can lead to high costs and environmental hazards. Therefore, it is important to deploy a
targeted weed control approach. AI and imaging-based techniques can help achieve this
(Sa et al., 2017; Slaughter et al., 2008).

Various weed species may grow within a particular crop that may require different
control and management strategies (Heap, 2014). Classifying weed species in crops is a
crucial step for applying a site-specific weed management system. It is also important for
biodiversity conservation and ecological monitoring (Keceli et al., 2022). Targeted appli-


cation of weed management techniques (e.g., application of chemicals) can significantly


reduce the production costs and environmental hazards (Barnes et al., 2021). Identifica-
tion of weed species is essential for that purpose. However, it is challenging to classify
the weed species from crops since they all share somewhat similar colour, texture, shape
and background (A. Wang et al., 2019).

Several studies proposed computer vision-based approaches to classify weed species


and crops. Traditional image analysis-based methods are dependent on extraction of
predefined colour and texture features (Hamuda et al., 2016). Several researchers used a
variety of colour indices to distinguish weeds from soil background using colour or spectral
images (Kirk et al., 2009; Lu et al., 2022; G. E. Meyer & Neto, 2008; Rasmussen et al.,
2007). However, this approach only works under a specific lighting condition (Hamuda
et al., 2016) and at the early growth stage of crop and weed when there is no occlusion
of leaves, and the plants are not overlapped (Bakhshipour et al., 2017). A combination
of colour and texture features was used to overcome the issue (J. Ahmad et al., 2018;
Bakhshipour & Jafari, 2018; Bawden et al., 2017). Shape features of the plants or leaves
can also be considered to distinguish weeds in crops (Kazmi et al., 2015a; Swain et al.,
2011).

With the advancement of DL, many studies proposed convolutional neural network
(CNN) based techniques to classify weeds in crops (Hasan et al., 2021). CNN models can
automatically learn hierarchical representations of features from images using multiple
layers of interconnected artificial neurons (Shrestha & Mahmood, 2019). Subsequently,
the DL methods can adaptively learn the most discriminative features from images. Sev-
eral studies were conducted to compare the performance of DL models with traditional
machine learning techniques and showed that the classification accuracy of state-of-the-
art deep learning models is better (dos Santos Ferreira et al., 2017; Liang et al., 2019;
Sarvini et al., 2019; Tang et al., 2017; W. Zhang et al., 2018). However, the perfor-
mance of the state-of-the-art DL models may vary depending on the datasets and the
pre-processing techniques applied to the images (Hu et al., 2020; Kounalakis et al., 2018,
2019; Yu et al., 2019b). Although the pre-trained weights were not generated from any
crop-weed dataset, studies suggested that the performance of the CNN model could be
improved by fine-tuning the pre-trained networks (Bah et al., 2018; Suh et al., 2018;
Teimouri et al., 2018; Toğaçar, 2022; Valente et al., 2019; Yu et al., 2019b). Some other


studies suggested modifying well-known CNN architectures to achieve better classifica-


tion accuracy (Adhikari et al., 2019; Espejo-Garcia et al., 2020; Umamaheswari et al.,
2018). For instance, Espejo-Garcia et al. (2020) achieved better accuracy by replacing the
classifier layer of the deep learning models with the traditional machine learning-based
classifiers, i.e., support vector machine, logistic regression, and gradient boosting. On the
other hand, Adhikari et al. (2019) proposed a custom fully convolutional encoder-decoder
network using multiple VGGNet-like blocks. They reduced the computational complex-
ity by incorporating large convolutional kernels, skip layers and multiscale filters in the
traditional VGGNet-like blocks. This model performed better than several architectures
including Faster R-CNN, DeepLab-v3 and U-Net. Chechlinski et al. (2019) proposed a
hybrid CNN model of AlexNet and VGGNET, which outperformed the original model.
On the other hand, Trong et al. (2020) argued that it was difficult to achieve better accu-
racy by using a single DL model to classify weed species. They improved the performance
by fusing five state-of-the-art DL models.

Most crop-weed datasets have a small number of images for training while the DL
models are data hungry. Also, these datasets generally suffer from the class imbalance
problem, i.e., some species have lot more images than others, which can affect the perfor-
mance of the DL models (Attri et al., 2023; Kamilaris & Prenafeta-Boldú, 2018). Many
studies use data augmentation techniques to increase the number of training data to ad-
dress that. This approach generally improves the classification accuracy (D. Chen et al.,
2022b; Hasan et al., 2023b; Le et al., 2020a; Olsen et al., 2019; Sarvini et al., 2019). Ap-
plication of several image pre-processing techniques, such as resizing (Chechlinski et al.,
2019; Farooq et al., 2018a; Partel et al., 2020), removing background (Alam et al., 2020;
Bah et al., 2018; Y. Jiang et al., 2019), image enhancement (Nkemelu et al., 2018; A.
Wang et al., 2020) and denoising (Tang et al., 2017), can also improve the performance
of the models.

According to Hou et al. (2016), training a CNN model with high-resolution images is
computationally expensive and time-consuming. They proposed a patch-based CNN to
classify cells from microscopic high-resolution whole slide images of tissues. A. Sharma
et al. (2017) used a patch-based CNN architecture to classify land cover images using
Landsat. The proposed architecture outperformed pixel-based approaches in overall clas-
sification accuracy. Patch-based image classification is a technique used to classify images


by analysing smaller patches or sub-regions within the image rather than considering the
entire image as a whole (Ullah et al., 2023).

To summarise, there is a lack of benchmark datasets containing images of crops and


weeds, while deep learning models require a large amount of data to train. Besides,
most datasets have class-imbalance problems, high intra-class dissimilarity and inter-class
similarity. These shortcomings of the datasets affect the performance of deep learning
models. We propose a patch-based image classification approach that increases the data
in a dataset and addresses the class imbalance issue by dividing each image into several
patches. Moreover, we address the intra-class dissimilarity and inter-class similarity
problem by analysing multiple parts of an image instead of the entire image at once.

In this paper, we propose a patch-based image classification approach to improve
classification accuracy. The fundamental concept of our proposed technique is to divide
the input image into smaller overlapping patches. Each patch is then fed into a DL model
for processing and feature extraction. Not all patches are useful, since some of them may
contain only the soil background. We propose a novel patch selection technique that uses
the variance and the mean frequency of the image, obtained by the Laplacian and Fast
Fourier Transform (FFT) methods respectively, to identify the most discriminative
patches. The CNN processes each patch independently, extracting meaningful features
that capture local information within the patch. These features are learned through
training, where the DL model optimises its parameters based on labelled training data.
At the testing phase, the outputs of all test patches are combined through weighted
voting to infer the species in an image. With our proposed approach, the weed and crop
species can be classified more accurately despite several challenges in the datasets, e.g.,
intra-class dissimilarity, inter-class similarity, class imbalance and small training data.

Patch-based image classification focuses on a single part of an image at a time. It is
worth mentioning that the patch-based image classification approach differs from the
concept of Vision Transformers, which operate on the entire image (K. Han et al., 2022).
Moreover, patch-based image classification focuses on local information, while Vision
Transformers capture both local and global context (Shamsabadi et al., 2022). Overall,
the main contributions of this paper are as follows:


• It presents a novel patch-based crop and weed species classification approach to
improve classification accuracy.

• Instead of using all the patches of an image, a novel technique was developed to
select the relatively important patches. By avoiding less important patches and
only learning from the important ones, the performance of the DL models is
improved. Technically, we use a combination of the Laplacian method and the
Fast Fourier Transform to quantify the relative amount of information in a patch,
and select a patch if it is sufficiently informative.

• It proposes an alternative to data augmentation for addressing the issues that arise
from class-imbalanced datasets or an insufficient amount of data, which affect
classification accuracy.

5.2 Materials and methods

Deep learning based frameworks for classifying crop and weed species follow several
broad steps: dataset acquisition, preparing the data for training, training the deep
learning models, and evaluating the performance of the models. We have illustrated this
in Figure 5.1. We evaluated the proposed method on four public datasets. Each dataset
was divided into training, validation and test subsets. We applied several image
preprocessing techniques, e.g., enhancing and resizing the images, dividing the images
into patches (both overlapping and non-overlapping) and selecting the relatively more
important patches, and used them to train the deep learning models. Ten major deep
learning models were then trained and evaluated on the datasets. Finally, the models'
performance was evaluated using multiple benchmark evaluation metrics.

5.2.1 Datasets

Four publicly available crop weed datasets were selected to evaluate the performance
of our proposed approach. The datasets are DeepWeeds dataset (Olsen et al., 2019),
Cotton weed dataset (D. Chen et al., 2022b), Corn weed dataset (H. Jiang et al., 2020)
and Cotton Tomato weed dataset (Espejo-Garcia et al., 2020). A summary of the datasets

120
Chapter 5. Improving classification accuracy

is given in Table 5.1.

Table 5.1: A summary of the datasets used in this research. The number of images to
train, validate and evaluate the models are also shown.
Dataset (total images)       Crop/weed species                        Train   Validation   Test   Total
DeepWeeds (17,509)           Chinee apple (Ziziphus mauritiana)       675     225          226    1126
                             Lantana (Lantana camara)                 637     212          214    1063
                             Parkinsonia (Parkinsonia aculeata)       618     206          207    1031
                             Parthenium (Parthenium hysterophorus)    613     204          205    1022
                             Prickly acacia (Vachellia nilotica)      637     212          213    1062
                             Rubber vine (Cryptostegia grandiflora)   605     201          203    1009
                             Siam weed (Eupatorium odoratum)          644     214          216    1074
                             Snake weed (Stachytarpheta spp.)         609     203          204    1016
                             Negative                                 5463    1821         1822   9106
Cotton weed (5,187)          Carpet weeds (Mollugo verticillata)      457     152          154    763
                             Crabgrass (Digitaria sanguinalis)        66      22           23     111
                             Eclipta (Eclipta prostrata)              152     50           52     254
                             Goosegrass (Eleusine indica)             129     43           44     216
                             Morningglory (Ipomoea purpurea)          669     223          223    1115
                             Nutsedge (Cyperus rotundus)              163     54           56     273
                             Palmer Amaranth (Amaranthus palmeri)     413     137          139    689
                             Prickly Sida (Sida spinosa)              77      25           27     129
                             Purslane (Portulaca oleracea)            270     90           90     450
                             Ragweed (Ambrosia artemisiifolia)        77      25           27     129
                             Sicklepod (Senna obtusifolia)            144     48           48     240
                             Spotted Spurge (Euphorbia maculata)      140     46           48     234
                             Spurred Anoda (Anoda cristata)           36      12           13     61
                             Swinecress (Lepidium coronopus)          43      14           15     72
                             Waterhemp (Amaranthus tuberculatus)      270     90           91     451
Corn weed (6,000)            Bluegrass (Poa pratensis)                720     240          240    1200
                             Chenopodium album                        720     240          240    1200
                             Cirsium setosum                          720     240          240    1200
                             Corn (Zea mays)                          720     240          240    1200
                             Sedge (Cyperus compressus)               720     239          241    1200
Cotton Tomato weed (508)     Cotton (Gossypium herbaceum)             73      24           26     123
                             Tomato (Solanum lycopersicum)            32      10           12     54
                             Black nightshade (Solanum nigrum)        120     40           41     201
                             Velvet leaf (Abutilon theophrasti)       78      26           26     130

These datasets impose several challenges for deep learning models. One of them
is the inter-class similarity and intra-class dissimilarity. This limits the performance
of the deep learning models (Cacheux et al., 2019). Besides, an image may contain
other plants along with the soil background while capturing a picture of a target plant
image. In the image label annotation technique, an image with multiple plants is labelled
based on the target plant, and the rest are considered background. The plants in the
background are sometimes from the same class and sometimes different. Since the plants
in the background have similar morphology, they influence the class label prediction of


the deep learning model, which may lead to misclassification. Another challenging issue
is class-imbalanced training data, which affects the performance of a deep learning model
significantly (Q. Dong et al., 2018). Moreover, deep learning models generally require a
large volume of data to train on and to learn distinguishable features from the images
(Barbedo, 2018). It is therefore challenging to classify images with a small training
dataset.

5.2.1.1 DeepWeeds dataset

The dataset contains 17,509 images collected from eight locations in northern Aus-
tralia. More than 8,000 of them are from eight nationally significant weed species. The
rest are plants native to that region of Australia, but not weeds. Those images are classi-
fied as negative in the dataset. The weed species are: chinee apple, lantana, parkinsonia,
parthenium, prickly acacia, rubber vine, siam weed and snake weed. The DeepWeeds
dataset has inter-class similarity and intra-class dissimilarity problems, making the weed
species recognition task more challenging. The images were collected using a FLIR
Blackfly 23S6C Gigabit Ethernet high-resolution camera. Olsen et al. (2019) intentionally
captured the pictures from different heights, angles and locations and in several lighting
conditions to introduce variability in the dataset. All the images were resized to
256 × 256 pixels. Olsen et al. (2019) also reported that, due to lighting conditions, 3.4%
of the chinee apple images were classified as snake weed by the trained model, and 4.1%
vice versa. The models in their experiments also misclassified parkinsonia and prickly
acacia on several occasions since they are from the same genus. The dataset is available
through the GitHub repository: https://github.com/AlexOlsen/DeepWeeds.

5.2.1.2 Cotton weed dataset

The Cotton weed dataset was acquired from the cotton belts of the United States of
America, e.g., North Carolina and Mississippi. D. Chen et al. (2022b) captured the images
of weeds at their different growth stages and under various natural lighting conditions
using digital cameras or smartphones. The data was collected in the growing seasons
(from June to August) of 2020 and 2021. The dataset has 5,187 images of weeds from
fifteen different classes. D. Chen et al. (2022b) reported that the dataset was highly
imbalanced and contained both high and low-resolution images, affecting classification


accuracy. Moreover, the images of the weeds were collected at different growth stages
with variations in lighting conditions, plant background, leaf colour and structure. The
dataset also has both inter-class similarity and intra-class dissimilarity. These conditions
added some constraints to the classification accuracy. We used the dataset to evaluate
our proposed technique’s efficiency on those issues. This dataset is available through the
Kaggle repository: https://www.kaggle.com/yuzhenlu/cottonweedid15.

5.2.1.3 Corn weed dataset

A Canon PowerShot SX600 HS camera was used to collect the images of this dataset.
H. Jiang et al. (2020) collected the data from an actual corn seedlings field under natural
lighting conditions at different growth stages of the plants. The dataset has 6,000 images
of corn and four types of weed. The collected pictures were resized to a resolution of
800 × 600. The soil background of the plants is not uniform, and neither are the lighting
conditions, which posed a challenge for the deep learning models in achieving higher
accuracy. Although the Corn weed dataset has an equal number of images in each class,
most of the images contained multiple plants. The plants were sometimes from the same
class and sometimes different. The images with multiple plants were labelled based on one
of the plants, and the rest were considered background. Since the plants in the background
had similar textures, colours and shapes, they influenced the class label prediction of the
deep learning model. This dataset was taken to verify whether our proposed approach
can handle that issue. The dataset was shared by H. Jiang et al. (2020) through the
Github repository: https://github.com/zhangchuanyin/weed-datasets.

5.2.1.4 Cotton Tomato weed dataset

The images of this dataset were collected from different regions of Greece. The dataset
contains two types of crops and two types of weed plants. The pictures of the plants
were captured at their early growth stages. Several photographers collected the images
from different locations under various lighting conditions and soil backgrounds. The
dataset has only 508 images from four classes of crops and weeds. The images were
captured at a resolution of 2272 × 1704 pixels from one metre above the ground. The
Cotton Tomato weed dataset has relatively few images and was selected to evaluate the
models' performance on a small dataset.


Espejo-Garcia et al. (2020) made the dataset available for further research through the
Github repository: https://github.com/AUAgroup/early-crop-weed.

5.2.2 Split the datasets

In this research, each dataset was randomly divided into three parts for training,
validation and testing. Here, 60% of the data was used to train the deep learning models,
and 20% was kept to validate them. The remaining 20% was used to evaluate the
performance of the models. Table 5.1 shows the number of images in each dataset and
how they are split for training, validation and testing.
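
A minimal sketch of such a 60/20/20 split with scikit-learn is given below, assuming one
sub-folder per species; the stratified option keeps the class proportions similar across
subsets, although the original split is described only as random.

from glob import glob
from pathlib import Path
from sklearn.model_selection import train_test_split

# Hypothetical layout: one sub-folder per species, e.g. dataset/Chinee_apple/img001.jpg
paths = sorted(glob("dataset/*/*.jpg"))
labels = [Path(p).parent.name for p in paths]

# 60% training, then split the remaining 40% equally into validation and test sets.
train_p, rest_p, train_y, rest_y = train_test_split(
    paths, labels, test_size=0.4, stratify=labels, random_state=0)
val_p, test_p, val_y, test_y = train_test_split(
    rest_p, rest_y, test_size=0.5, stratify=rest_y, random_state=0)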

5.2.3 Deep learning models

The selection of the deep learning models for image classification depends on avail-
able computational resources and the trade-offs between the model complexity and per-
formance (Druzhkov & Kustikova, 2016; Y. Li et al., 2018). To test the performance
of our technique, we selected the following deep learning models: VGG16 (Simonyan &
Zisserman, 2014), VGG19 (Simonyan & Zisserman, 2014), ResNet-50 (K. He et al., 2016),
Inception-V3 (Szegedy et al., 2016), InceptionResNetV2 (Szegedy et al., 2017), Xception
(Chollet, 2017), DenseNet121 (G. Huang et al., 2017), DenseNet169 (G. Huang et al.,
2017), DenseNet201 (G. Huang et al., 2017) and MobileNetV2 (Sandler et al., 2018). For
the sake of brevity, we briefly summarise the main attributes of these techniques.

VGG16 and VGG19 are classical architectures that are well known for simplicity
and uniformity. These models are suitable for smaller datasets and can provide better
accuracy by fine-tuning the pre-trained network (Sukegawa et al., 2020). The models
have several drawbacks, such as vanishing gradient problems and loss of fine-grained
spatial information (M. Pan et al., 2020). On the other hand, ResNet-50 contains residual
connections, which can overcome the vanishing gradient problem and enable training very
deep networks. The model performs well on both large and small datasets (Al-Masni et
al., 2020).

Inception models (Inception-V3 and InceptionResNetV2) are computationally inten-


sive, yet they can provide high accuracy. The models can capture features at multiple

124
Chapter 5. Improving classification accuracy

scales using parallel convolution operations (C. Wang & Xiao, 2021). The Xception
model is an extension of the Inception architecture, which uses depth-wise separable con-
volutions to reduce computational complexity. The model balances the computational
efficiency and performance (Chollet, 2017; Kassani et al., 2019).

DenseNets (DenseNet121, DenseNet169, and DenseNet201) promote feature reuse


by connecting each convolutional layer to every other layer in a feed-forward fashion.
Compared to other deeper architectures, these memory-efficient models achieve better
performance on image classification tasks using fewer parameters (Jégou et al., 2017).
Finally, MobileNetV2 is chosen as a lightweight model, which is an efficient architecture
for real-time applications. The model balances speed and accuracy well using limited
computational resources (S. Wang et al., 2022).

Several studies have used these models on crop-weed datasets and achieved good perfor-
mance. For instance, Olsen et al. (2019) used ResNet-50 and Inception-V3. Hasan et al.
(2023b) compared the performance of VGG16, ResNet-50, Inception-V3, InceptionRes-
NetV2 and MobileNetV2 on a combined dataset containing twenty classes of images. An-
other study evaluated thirty-five state-of-the-art deep learning models (including the
models listed above) on the Cotton weed dataset, where these models achieved compar-
atively strong results with low inference time (D. Chen et al., 2022b). Sharpe et al. (2019)
applied VGGNet and DetectNet to classify weeds in strawberry plants with good results,
and Suh et al. (2018) used VGG19, ResNet-50 and Inception-V3 to classify images of
sugar beet and volunteer potato.

5.2.4 Performance metrics

The performance of image classification models can be evaluated using several metrics
which can provide valuable insights about their effectiveness. The choice of metrics
depends on the specific requirements and characteristics of the task. In our study, we
have chosen the following commonly used metrics to evaluate the efficacy of the deep
learning models:

• Accuracy: Accuracy measures the proportion of correctly classified images out of


the total number of images. The metric provides a general overview of model
performance.

\[
\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{True Positives} + \text{True Negatives} + \text{False Positives} + \text{False Negatives}}.
\]

In these metrics, true positives are correct positive predictions, true negatives are
correct negative predictions, false positives are incorrect positive predictions, and
false negatives are incorrect negative predictions.

• Precision: Precision measures the proportion of correctly classified positive samples


(true positive predictions) out of all positive predictions (sum of true positives and
false positives). It represents how precise the model's positive predictions are.

\[
\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}.
\]

• Recall: Recall measures the proportion of correctly classified positive samples (true
positive predictions) out of all actual positive samples (sum of true positives and
false negatives). It represents the ability of the model to detect positive samples.

\[
\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}.
\]

• F1-Score: The F1-Score is the harmonic mean of precision and recall. It represents
the balance between precision and recall, which can help measure the performance
of a model on a class-imbalanced dataset.

\[
\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.
\]

• Confusion Matrix: A confusion matrix tabulates the true positive, true negative,
false positive, and false negative counts. It provides a detailed breakdown of model
performance. It is helpful to visualise how well the deep learning model is perform-
ing and what prediction errors it is making.
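
As a hedged illustration, these metrics can be computed with scikit-learn as sketched
below; the macro averaging shown here is an assumption (any per-class averaging scheme
could be substituted), and y_true and y_pred denote the true and predicted class indices.

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

def evaluate(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Macro averaging weights every class equally, which is informative
        # for imbalanced datasets such as Cotton weed.
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }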


[Figure 5.1 (workflow diagram): the image dataset is acquired and randomly split into
training (60%), validation (20%) and test (20%) sets. In the traditional approach, the
images are resized, and the DL models are trained and then evaluated. In the proposed
approach, the images are enhanced using a GAN and resized, divided into disjoint or
overlapping patches, the relatively important patches are selected, and the DL models are
trained and evaluated with the selected patches.]

Figure 5.1: An illustration of the proposed workflow for classifying crop and weeds in
images.

5.2.5 Traditional approach

In this approach, the images were resized to 256 × 256 and used to train the models.
For this study, we have selected ten deep learning models as discussed in Section 5.2.3.
The performance of the models was evaluated using several well-known metrics (see Sec-
tion 5.2.4) such as accuracy, precision, recall and F1 score. We also used a confusion
matrix to show how accurate the models were in classifying different classes of images in
the datasets (Hasan et al., 2021; Kamilaris & Prenafeta-Boldú, 2018).


5.2.6 Proposed approach

We proposed a patch-based image classification approach to improve the accuracy of


the deep learning model. Figure 5.1 shows the major components of our research. The
steps in the proposed approach for classifying crop and weed species are as follows. At
first, we used either the original or the GAN-enhanced images and resized them using an
image processing technique (Section 5.2.6.1). The images were then divided into smaller
patches (Section 5.2.6.2), and a technique was used to select the relatively important
patches (Section 5.2.6.3). After that, the selected patches were used to
train the deep learning models (Section 5.2.6.4). Finally, the performance of the models
was evaluated (Section 5.2.6.5). The steps are described in the following sections.

5.2.6.1 Image enhancement and image resize

We use three different resolutions of images: 256 × 256, 512 × 512 and 1024 × 1024.
The images in all four datasets have variations in image size. Some are smaller than
256 × 256, and some are larger than 1024 × 1024 pixels in size. The purpose of the
image resize operation is to make the images uniform in resolution. If low-resolution
images are divided into patches, the resulting patches will be very small, and it may not
be possible to extract distinguishable features from them, which may affect the
classification accuracy (Y. Liu et al., 2021). We have chosen these
three image resolutions to verify the impact of image size on the deep learning models’
performance. The OpenCV “resize” module was used to perform this task (OpenCV,
2019). To resize the image, this module uses any of the four interpolation methods,
namely nearest-neighbour, bilinear, bicubic and Lanczos interpolation. We tested all four
methods and found that the choice of interpolation had no significant effect on the deep
learning models; therefore, nearest-neighbour interpolation was used in this research to
resize the images.
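
A minimal example of this resize step is shown below, assuming OpenCV (cv2) is
installed; the target size is passed in as a parameter.

import cv2

def resize_image(image, size=256):
    # Nearest-neighbour interpolation was used in this work; cv2.INTER_LINEAR,
    # cv2.INTER_CUBIC and cv2.INTER_LANCZOS4 are the other options tested.
    return cv2.resize(image, (size, size), interpolation=cv2.INTER_NEAREST)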

However, increasing the resolution of the smaller images with a traditional image
processing approach may reduce the quality, and the resultant data may be blurry (Y. S.
Chen et al., 2018). GAN-based image enhancement can increase the image’s resolution
without affecting the quality. This research used Enhanced Super-Resolution Generative
Adversarial Network (ESRGAN) to enhance low-resolution images, which were then resized


using the OpenCV “resize” module to a 1024 × 1024 pixel resolution. Figure 5.2 shows
the operations performed here.

[Figure 5.2 (workflow diagram): the original image is either resized directly with the
OpenCV "resize" module to 256 × 256, 512 × 512 or 1024 × 1024 pixels, or first enhanced
using ESRGAN and then resized with the OpenCV "resize" module to 1024 × 1024
pixels.]

Figure 5.2: The workflow for enhancing and resizing the images.
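
The enhance-then-resize path can be sketched as follows. The chapter does not name a
specific ESRGAN implementation, so the publicly available TensorFlow Hub ESRGAN
module used below is an assumption; it upscales an RGB image by a factor of four before
the final resize.

import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Assumed ESRGAN implementation (TensorFlow Hub); any ESRGAN model could be substituted.
esrgan = hub.load("https://tfhub.dev/captain-pool/esrgan-tf2/1")

def enhance_and_resize(image_bgr, target=1024):
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB).astype(np.float32)
    sr = esrgan(tf.expand_dims(rgb, 0))                  # 4x super-resolved image
    sr = tf.clip_by_value(sr, 0, 255)[0].numpy().astype(np.uint8)
    sr_bgr = cv2.cvtColor(sr, cv2.COLOR_RGB2BGR)
    # Final resize to the uniform 1024 x 1024 resolution used in this chapter.
    return cv2.resize(sr_bgr, (target, target), interpolation=cv2.INTER_NEAREST)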

5.2.6.2 Generate image patches

After resizing the original or enhanced images, they were divided into patches. We
generated both disjoint and overlapping patches; in the overlapping setting, adjacent
patches overlap by 50%. As such, each image was divided into 16 disjoint
patches or 49 overlapping patches. For instance, a 256 × 256 pixels image is split into 16
disjoint or 49 overlapping patches of size 64 × 64 pixels.
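
A sketch of this step is given below, assuming square input images: a patch side of one
quarter of the image side gives 16 disjoint patches (stride equal to the patch size) or 49
overlapping patches (stride equal to half the patch size).

def generate_patches(image, overlap=False):
    side = image.shape[0]                 # images are already resized to be square
    patch = side // 4
    stride = patch // 2 if overlap else patch
    patches = []
    for y in range(0, side - patch + 1, stride):
        for x in range(0, side - patch + 1, stride):
            patches.append(image[y:y + patch, x:x + patch])
    return patches                        # 16 (disjoint) or 49 (overlapping) patches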

5.2.6.3 Selection of relatively important image patches

Several studies indicate that blurry images have a negative impact on the classification
accuracy of deep learning models (Dodge & Karam, 2017; Q. Guo et al., 2020; Nimisha
et al., 2017). There are many approaches to detect whether an image is blurry or sharp.
Two of the techniques are based on Fast Fourier Transform (FFT) (Pertuz et al., 2013)
and variance of the Laplacian method (Bansal et al., 2016).

In an FFT-based approach, the frequencies at different points of the image are
calculated, and the image is identified as blurry or sharp depending on the level of those
frequencies. In the Laplacian method, the variance of the Laplacian of the pixel values is
used instead. In both cases, a threshold value is needed to make the decision: the mean
frequency for FFT and the variance for the Laplacian method. If the frequency (in FFT)
or the variance (in the Laplacian method) is lower than the threshold, the image is
identified as blurry.

We used this approach to detect whether a patch of an image contains plant parts and
relatively more discriminating information. Our research found that in most cases, the
patches with no plant parts have low frequency values or variance. This is because the
camera focuses on the crop or weed when a plant image is captured, and the background
soil gets slightly blurry. Both frequency (in FFT) and variance (in the Laplacian method)
are lower if a patch contains only a soil background. This helps us decide which patches
to select for training the models. The following steps were used to select the patches:

Steps to select the patches for training, validating and testing the models
1: P is a patch; there are N patches in an image
2: for each P in N do
3:    Calculate the mean frequency Pf_i
4:    Calculate the variance of the Laplacian method Pv_i
5: end for
6: Calculate the average of the mean frequencies APf = (Pf_1 + Pf_2 + ... + Pf_n)/n
7: Calculate the average of the variances APv = (Pv_1 + Pv_2 + ... + Pv_n)/n
8: for each Pf in (Pf_1, Pf_2, ..., Pf_n) and Pv in (Pv_1, Pv_2, ..., Pv_n) do
9:    if Pf > APf and Pv > APv then
10:     Add the patch to the selected set SP
11:   end if
12: end for
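
A possible Python implementation of these steps is sketched below, assuming OpenCV
and NumPy are available and that the patches are BGR images. The exact FFT score
used here (mean log-magnitude of the centred spectrum) is an assumption; any comparable
frequency measure would fit the same procedure.

import cv2
import numpy as np

def fft_score(gray):
    # Mean log-magnitude of the centred 2D spectrum: low for blurry/soil-only patches.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    return np.mean(20 * np.log(np.abs(spectrum) + 1e-8))

def laplacian_score(gray):
    # Variance of the Laplacian response: low for blurry/soil-only patches.
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def select_patches(patches):
    grays = [cv2.cvtColor(p, cv2.COLOR_BGR2GRAY) for p in patches]
    f = np.array([fft_score(g) for g in grays])
    v = np.array([laplacian_score(g) for g in grays])
    # Keep only the patches whose frequency AND variance exceed the image averages.
    keep = (f > f.mean()) & (v > v.mean())
    return [p for p, k in zip(patches, keep) if k]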

Figure 5.3 illustrates the patch selection process with an example. An image was
divided into 16 patches. Then the mean frequency (in FFT) and the variance (in the
Laplacian method) of each patch were calculated. We then calculated the average of the
mean frequency and the variance separately. This average value was used as a threshold
for that image. That means if the mean frequency of a patch is greater or equal to that
average, then the patch will be selected. The same approach was used to choose the more
informative patches using average variance. For this image, ten and eight patches were
selected by Laplacian and FFT techniques, respectively. Finally, we chose only those
patches which were selected by both methods. Here, seven patches were selected for
training or testing. It can be seen that the chosen patches have at least some plant parts.


[Figure 5.3 (diagram): the 16 patches of an example image are scored in two ways. The
mean frequency (in FFT) of each patch is calculated, the average of the mean frequencies
is computed (14.63 in this example), and the patches whose mean frequency is higher
than that average are selected. In parallel, the variance (in the Laplacian method) of
each patch is calculated, the average variance is computed (303.41 in this example), and
the patches whose variance is higher than that average are selected. Finally, only the
patches chosen by both methods are kept.]

Figure 5.3: An illustration of the image patch selection process.

5.2.6.4 Model training with selected patches

We trained the same deep learning models as mentioned in Section 5.2.5. For training,
we resized all the image patches to a resolution of 256 × 256 pixels. Although dividing
the images produced patches of lower resolutions, i.e., 64 × 64 and 128 × 128 pixels, they
were all converted to this uniform size.

5.2.6.5 Evaluation of the models

To predict the class label of an image, the image was first divided into patches. The
important patches were then selected using the approach described in Section 5.2.6.3. After
that, the model predicted the class label for each patch, and a weighted majority voting
technique was used to derive the class label of the image from the patch-level predictions.
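
A hedged sketch of this step is given below; the vote is described only at a high level in
this chapter, so using each patch's softmax confidence as its weight is an assumption.

import numpy as np

def predict_image_label(model, selected_patches):
    # model: a trained Keras classifier; selected_patches: list of preprocessed patch arrays.
    probs = model.predict(np.stack(selected_patches), verbose=0)  # (n_patches, n_classes)
    votes = np.zeros(probs.shape[1])
    for p in probs:
        votes[np.argmax(p)] += np.max(p)      # each patch votes with its confidence
    return int(np.argmax(votes))              # class with the largest weighted vote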


5.2.7 Experimental setup

The experiments were conducted using a desktop computer with an Intel Core i9-
9900X processor, 128 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti graphics card.
The deep learning models were developed using Python 3.8 and the TensorFlow 2.4 library.
The models were initialised with pre-trained weights learned on the ImageNet dataset.
Since those models were originally trained to classify 1000 classes of images, their
classification layers were replaced to fine-tune them on the crop-weed datasets. A global
average pooling layer followed by two dense layers with 1024 neurons and ReLU (Rectified
Linear Unit) activation replaced the fully connected layers of the original models. The final
output layer was another dense layer with a softmax activation function, with the number
of neurons depending on the number of classes. Although the maximum number of epochs
for training was set to 100, training usually finished earlier because of an early stopping
strategy based on the validation accuracy. The initial learning rate was set to 1 × 10^-4
and was gradually decreased to 10^-6 by monitoring the validation loss after every epoch.
We used the "Adam" optimiser and the "Categorical Cross Entropy" loss for training all
deep learning models. The input size for all the DL models was 256 × 256 × 3, and, given
the memory capacity of the computing device, the batch size was set to 32.
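
The setup above can be sketched in Keras as follows, shown for one backbone
(DenseNet201); the other nine models are built the same way. The callback patience
values are assumptions, as they are not specified in the text.

import tensorflow as tf

def build_model(num_classes, input_shape=(256, 256, 3)):
    base = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet", input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=10,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                                         min_lr=1e-6),
]
# model.fit(train_data, validation_data=val_data, epochs=100, batch_size=32,
#           callbacks=callbacks)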

5.3 Results

In this section, we present the experimental results of our proposed approach and
compare them with the traditional approach and the results achieved by the previous
studies on all the datasets.

5.3.1 Traditional pipeline

We first trained the models using the original images only, without any data aug-
mentation. Table 5.2 summarises the accuracy, precision, recall and F1 score of the ten
models. There is a significant variation between the models regarding the number of pa-
rameters and model depth. Overall, the DenseNet models achieved the best results on all
datasets. DenseNet169 and DenseNet201 attained the same accuracy of 95.01% for the


DeepWeeds dataset, but the precision, recall and F1 scores indicate that DenseNet169 is
better than DenseNet201. The dataset has inter-class similarity and intra-class dissimi-
larity issues, which affected the classification accuracy. Olsen et al. (2019) also mentioned
that images of chinee apple, snake weed, prickly acacia and parkinsonia
were falsely classified.

On the other hand, for the Cotton weed dataset, DenseNet201 achieved the highest
accuracy of 96%, and the precision, recall and F1 scores were also higher than other
models. This dataset is highly class imbalanced. According to D. Chen et al. (2022b),
the minority weed classes showed relatively low classification accuracy. We observed the
same in our experiments.

The Corn weed dataset was balanced, and the models showed promising performance
without any data augmentation. DenseNet169 and MobileNetV2 achieved the highest
and lowest accuracy of 99.67% and 98%, respectively. However, the Cotton Tomato
weed dataset had a relatively small number of images for training, and the dataset was
imbalanced. Those issues affected the performance of some models, such as MobileNetV2,
VGG16, VGG19, Inception-V3 and Xception. Moreover, for Inception-V3, we observed
that the model achieved a high precision rate (79.72%) but a low recall rate (69.23%).
This indicates that the model failed to recognise many samples of some classes. On the other hand, the
performance of MobileNetV2 was relatively poor.

5.3.2 Patch-based approach

In this research, we have proposed a patch based image classification approach to


improve the classification accuracy of the models. For that, the images were resized to
three different sizes: 256 × 256, 512 × 512 and 1024 × 1024.

In most cases, the models’ performance improved by increasing the image size (Figure
5.4). R. Wu et al. (2015) mentioned that the deep learning models could extract more
distinguishable features from high-resolution images. Here, we have shown the results
achieved using disjoint and overlapping patches. The results confirm that the
model performance can be improved by resizing the images to a relatively larger size
in a patch-based approach. Besides, all ten deep learning models showed better results


Table 5.2: Performance of ten deep learning models based on accuracy, precision, recall
and F1 score on four datasets using the traditional pipeline. All values are percentages.
The model columns are, in order: MobileNetV2, VGG16, VGG19, ResNet-50, Inception-V3,
InceptionResNetV2, Xception, DenseNet121, DenseNet169, DenseNet201.

DeepWeeds
  Accuracy   84.99  90.00  88.01  91.00  88.01  90.00  90.00  92.99  95.01  95.01
  Precision  84.50  90.23  88.35  91.29  88.29  90.25  89.46  93.40  94.94  94.67
  Recall     84.72  90.31  88.40  91.37  88.40  90.37  89.57  93.39  94.93  94.62
  F1 score   84.40  90.25  88.35  91.24  88.31  90.27  89.46  93.38  94.91  94.62

Cotton weed
  Accuracy   88.00  80.00  78.00  89.05  89.05  91.05  92.00  95.05  95.05  96.00
  Precision  87.49  80.27  77.87  89.80  88.95  91.65  91.77  95.32  95.59  95.88
  Recall     87.71  80.19  78.38  89.42  88.86  91.43  91.71  95.23  95.43  95.81
  F1 score   87.15  79.83  77.61  89.30  88.74  91.34  91.59  95.21  95.43  95.79

Corn weed
  Accuracy   98.00  99.00  98.00  99.00  99.00  99.00  99.00  99.50  99.67  99.50
  Precision  98.35  99.18  98.42  99.42  98.67  98.75  99.18  99.50  99.67  99.50
  Recall     98.33  99.16  98.42  99.42  98.67  98.75  99.17  99.50  99.67  99.50
  F1 score   98.34  99.17  98.42  99.42  98.67  98.75  99.17  99.50  99.67  99.50

Cotton Tomato weed
  Accuracy   38.46  90.38  78.84  99.05  69.23  99.03  96.15  99.03  99.05  99.05
  Precision  15.31  90.91  77.74  99.05  79.72  99.07  96.15  99.06  99.07  99.07
  Recall     38.46  90.38  78.84  99.05  69.23  99.03  96.15  99.03  99.07  99.07
  F1 score   21.90  90.46  77.69  99.05  67.32  99.03  96.15  99.03  99.07  99.07

in the patch-based system with overlapping patches than the traditional approaches for
three datasets. The DenseNet201 model achieved 96.38%, 99.83% and 100% accuracy for
DeepWeeds, Corn weed and Cotton Tomato weed datasets, respectively. The performance
was improved because of having more training samples from the images (P. Wang et al.,
2021). Notably, the classification accuracy of most deep learning models was below 70%
on the Cotton weed dataset using low-resolution (256 × 256) images. The performance
was improved significantly by increasing the image size.

On the other hand, the optimal image size for better classification accuracy depends on
the dataset and the CNN architecture (Thambawita et al., 2021). MobileNetV2 obtained
a recognition accuracy of 42.31% using 256 × 256 pixels images on the Cotton Tomato
weed dataset. However, the accuracies were 96.15% and 94.23% using the image resolution
of 512 × 512 and 1024 × 1024 pixels, respectively. This indicates that the optimal image


[Figure 5.4 (plots): one panel per dataset (DeepWeeds, Cotton weed, Corn weed, Cotton
Tomato weed) showing the deep learning models' accuracy for each image size, without
and with overlapping patches.]

Figure 5.4: Deep learning models' accuracy with respect to image size using
patch-based approach.

resolution for MobileNetV2 on this dataset is 512 × 512 pixels using the image resize
operation.


5.3.3 Performance improvement with enhanced image

We have enhanced the images using a GAN-based approach as mentioned in Section


5.2.6.1 and resized them to the resolution of 1024 × 1024 pixels. The deep learning
models trained with the enhanced images improved remarkably over their counterparts
trained with the resized images (Table 5.3).

DenseNet201 achieved the highest precision, recall and F1 score of 98.49% when the
images were enhanced using the GAN-based method. The performance was better than with
the resized images and improved by 2.19%. The classification accuracies of the other deep
learning models, except Inception-V3, also increased. Olsen et al. (2019) reported
the highest precision of 95.7% on DeepWeeds dataset. Our approach showed significant
improvement. We also have shown the improvement in the classification accuracy for
each species in Section 5.3.4. Classification accuracy was improved by almost 6% for the
InceptionResNetV2 model on the Cotton weed dataset. The model achieved the high-
est precision, recall and F1 score of 98.96%, 98.95% and 98.95%, respectively. All other
models showed improved results as well. We have demonstrated the performance of the
models classifying the minority class in Section 5.3.5.

The results on the Corn weed and Cotton Tomato weed datasets indicate that our
proposed approach is effective for balanced and small datasets as well. Most of the models
achieved close to 100% classification accuracy.

5.3.4 Performance evaluation on the dataset with inter-class sim-


ilarity and intra-class dissimilarity

The DeepWeeds dataset has high inter-class similarity and intra-class dissimilarity.
Olsen et al. (2019) reported that the model confuses chinee apple images with snake
weed and vice versa. They also added that the deep learning methods incorrectly classified
parkinsonia images as prickly acacia. Their results also indicated that many weed images
are classified as non-weed, and about 3% native plants (negative class) were classified as
various weed species.

Hu et al. (2020) proposed a graph-based deep learning approach to address the issues


Table 5.3: Performance comparison of the models between training with resized and
enhanced images for all four datasets.
                           DeepWeeds (%)        Cotton weed (%)      Corn weed (%)        Cotton Tomato weed (%)
Model        Performance   Resize  GAN  Change  Resize  GAN  Change  Resize  GAN  Change  Resize  GAN  Change
             metrics
Precision 93.84 95.29 1.55 90.77 96.44 6.24 95.85 99.58 3.90 99.08 99.08 0.00
MobileNetV2 Recall 93.84 95.30 1.55 90.56 96.38 6.42 95.33 99.58 4.46 99.04 99.04 0.00
F1 Score 93.76 95.26 1.60 90.50 96.38 6.50 95.39 99.58 4.40 99.04 99.04 0.00
Precision 89.23 93.70 5.01 92.98 97.65 5.03 99.17 99.42 0.25 100.00 100.00 0.00
VGG16 Recall 88.97 93.70 5.32 92.85 97.62 5.13 99.17 99.42 0.25 100.00 100.00 0.00
F1 Score 88.72 93.63 5.53 92.86 97.61 5.12 99.17 99.42 0.25 100.00 100.00 0.00
Precision 87.88 93.02 5.85 91.97 96.75 5.19 99.50 99.50 0.00 99.07 100.00 0.93
VGG19 Recall 87.92 92.96 5.74 91.71 96.66 5.41 99.50 99.50 0.00 99.04 100.00 0.97
F1 Score 87.61 92.83 5.96 91.64 96.65 5.46 99.50 99.50 0.00 99.03 100.00 0.98
Precision 91.71 95.71 4.37 92.12 96.33 4.57 98.45 99.75 1.32 100.00 100.00 0.00
ResNet-50 Recall 91.65 95.73 4.45 91.99 96.28 4.66 98.42 99.75 1.35 100.00 100.00 0.00
F1 Score 91.47 95.70 4.62 91.93 96.27 4.72 98.42 99.75 1.35 100.00 100.00 0.00
Precision 93.37 93.09 -0.30 92.06 97.27 5.66 98.85 99.50 0.66 99.07 99.08 0.00
Inception-V3 Recall 93.39 93.07 -0.34 91.90 97.24 5.81 98.83 99.50 0.67 99.04 99.04 0.00
F1 Score 93.32 93.01 -0.33 91.87 97.23 5.84 98.84 99.50 0.67 99.04 99.04 0.00
Precision 92.51 96.24 4.03 93.36 98.96 5.99 98.93 99.58 0.66 100.00 100.00 0.00
Inception-
Recall 92.53 96.21 3.97 93.23 98.95 6.14 98.92 99.58 0.67 100.00 100.00 0.00
ResNetV2
F1 Score 92.38 96.19 4.12 93.17 98.95 6.21 98.92 99.58 0.67 100.00 100.00 0.00
Precision 93.91 96.18 2.42 94.25 97.86 3.82 99.18 99.50 0.33 100.00 100.00 0.00
Xception Recall 93.90 96.15 2.40 94.18 97.81 3.85 99.17 99.50 0.34 100.00 100.00 0.00
F1 Score 93.85 96.14 2.44 94.12 97.80 3.91 99.17 99.50 0.34 100.00 100.00 0.00
Precision 94.63 96.69 2.18 93.63 98.30 4.98 99.50 99.83 0.33 100.00 100.00 0.00
DenseNet121 Recall 94.61 96.69 2.20 93.52 98.28 5.10 99.50 99.83 0.34 100.00 100.00 0.00
F1 Score 94.58 96.68 2.22 93.51 98.28 5.11 99.50 99.83 0.34 100.00 100.00 0.00
Precision 94.36 97.11 2.92 93.38 97.74 4.67 99.42 99.67 0.25 100.00 100.00 0.00
DenseNet169 Recall 94.39 97.12 2.90 93.23 97.71 4.81 99.42 99.67 0.25 100.00 100.00 0.00
F1 Score 94.35 97.11 2.92 93.19 97.71 4.85 99.42 99.67 0.25 100.00 100.00 0.00
Precision 96.38 98.49 2.19 94.12 97.54 3.64 99.83 99.83 0.00 99.08 100.00 0.93
DenseNet201 Recall 96.38 98.49 2.19 93.99 97.52 3.75 99.83 99.83 0.00 99.04 100.00 0.97
F1 Score 96.37 98.49 2.20 94.01 97.51 3.72 99.83 99.83 0.00 99.04 100.00 0.97

and achieved 98.1% overall accuracy. They used DenseNet202 as the backbone of the
Graph Convolutional Network (GCN). According to the results shown in their research,
the model obtained accuracy of 96.9%, 95.10% and 98.55% in classifying chinee apple,
snake weed and parkinsonia species, which is better than the previous results. The model
also classified 98.35% native plants correctly.

Our proposed technique (98.49%) outperformed the graph-based model (98.1%). The
DenseNet201 model achieved 97% and 99% accuracy for chinee apple and parkinsonia
weed, respectively. Although the performance was not improved for the snake weed species, the model


Figure 5.5: Confusion matrix for DeepWeeds dataset using DenseNet201 model.

yielded 100% accuracy in classifying native plants. This helps ensure that off-target
plants are not sprayed, reducing herbicide waste and protecting the native ecosystem (Olsen
et al., 2019).

5.3.5 Performance improvement on class imbalanced dataset

D. Chen et al. (2022b) reported that the classification accuracy of the minority classes
was relatively low in the Cotton weed dataset. Weed species such as prickly sida, ragweed,
crabgrass, swinecress and spurred anoda had fewer samples, and this class imbalance
affected the classification accuracy. They replaced the cross-entropy loss with a weighted
cross-entropy loss to improve the performance of the model.

Our proposed technique also achieved better results on recognising the minority
species of weed (Table 5.4). InceptionResNetV2, Xception, DenseNet169 and DenseNet201
classified prickly sida weed species with 100% accuracy, whereas the highest F1 score was
92% using the traditional approach. Although spurred anoda weed species had only
61 sample images, the classification accuracy of the proposed techniques was improved
significantly. The accuracy for other minority classes, including ragweed, crabgrass and


Table 5.4: F1 score of the deep learning models for weed species from Cotton weed
dataset using traditional and proposed technique

Weed species | Classification approach | F1 score (%) per model. The model columns are,
in order: MobileNetV2, VGG16, VGG19, ResNet-50, Inception-V3, InceptionResNetV2,
Xception, DenseNet121, DenseNet169, DenseNet201.
Traditional 94.90 89.68 90.43 94.87 94.43 97.42 97.07 98.03 98.37 98.68
Carpetweeds
Proposed 98.39 98.05 97.40 97.73 98.38 99.67 98.39 99.35 98.70 98.69
Traditional 88.89 76.19 63.41 88.37 86.36 87.80 88.89 91.30 93.02 90.91
Crabgrass
Proposed 97.78 97.87 100.00 100.00 100.00 100.00 97.78 100.00 100.00 100.00
Traditional 74.23 52.63 42.22 72.90 81.42 71.84 84.91 88.68 84.68 92.45
Eclipta
Proposed 92.45 98.08 96.15 92.45 97.14 98.08 98.08 96.15 98.11 99.05
Traditional 82.50 64.00 62.65 78.95 80.49 87.50 87.18 85.37 87.06 89.16
Goosegrass
Proposed 96.63 96.47 91.36 97.73 97.67 97.73 94.25 100.00 97.67 96.63
Traditional 91.87 89.91 90.02 93.36 95.13 96.43 95.18 98.19 99.10 97.33
Morningglory
Proposed 96.40 98.00 96.07 96.16 97.30 99.33 98.87 98.88 97.29 97.30
Traditional 92.44 89.26 87.18 88.19 94.12 94.83 94.83 94.12 96.55 94.92
Nutsedge
Proposed 98.18 99.10 96.43 93.46 97.25 99.10 95.33 99.10 97.25 92.45
Palmer Traditional 88.28 81.33 83.87 90.91 89.12 91.99 92.58 96.45 97.84 96.77
Amaranth Proposed 94.62 96.40 96.03 96.45 95.71 98.21 97.86 97.86 96.77 97.14
Prickly Traditional 53.66 52.83 30.43 75.56 70.83 73.91 66.67 88.00 87.50 92.00
Sida Proposed 96.15 96.15 96.30 96.30 96.30 100.00 100.00 98.11 100.00 100.00
Traditional 86.86 78.82 76.36 90.61 84.39 93.26 89.14 94.51 94.44 96.67
Purslane
Proposed 93.99 98.32 97.18 94.51 96.70 99.45 96.13 97.21 97.24 97.24
Traditional 83.64 86.27 79.17 90.57 91.23 92.86 96.43 96.30 96.43 98.18
Ragweed
Proposed 100.00 98.18 100.00 98.18 98.18 96.30 96.43 98.18 96.43 98.18
Traditional 90.11 71.29 71.43 87.64 84.00 91.84 89.58 94.85 96.91 93.88
Sicklepod
Proposed 97.92 96.84 95.92 96.91 96.84 97.92 96.97 95.74 95.83 95.92
Spotted Traditional 90.32 76.40 65.31 88.89 86.96 91.11 88.17 96.84 92.47 95.83
Spurge Proposed 96.84 96.84 98.97 95.74 96.84 98.97 97.92 98.97 100.00 97.92
Spurred Traditional 23.53 57.14 23.53 72.00 78.26 80.00 72.73 84.62 92.31 88.00
Anoda Proposed 90.91 96.00 90.91 90.91 95.65 95.65 100.00 100.00 100.00 100.00
Traditional 89.66 75.00 86.67 92.86 82.76 89.66 92.86 96.77 96.55 90.32
Swinecress
Proposed 100.00 96.77 93.33 96.77 96.77 100.00 100.00 96.77 96.77 96.77
Traditional 83.42 71.87 71.92 88.17 83.15 84.97 90.43 94.57 91.89 95.03
Waterhemp
Proposed 96.70 97.83 98.89 97.80 97.83 99.45 98.34 97.27 97.80 98.36

swinecress, also increased.


5.3.6 Comparison with traditional approach and prior studies

We have compared the classification accuracy of our proposed approach with the best
results of traditional approach and the best of the previous studies on the respective
datasets (Table 5.5). Our proposed approach showed significant improvement compared
to the traditional approaches on both the DeepWeeds and Cotton weed datasets. For some
weed species, the classification accuracy improved by more than 10%. Although the gains
were smaller, better accuracy was also observed with the proposed technique on the
other two datasets. We also report the best results from previous studies, noting that
different studies achieved their best results on different datasets. Our proposed approach
(with DenseNet169) outperformed all the prior techniques. Hu et al. (2020) achieved the
highest accuracy of 98.10% on the DeepWeeds dataset using GCN, whereas the accuracy
for our proposed technique was 98.49%. Moreover, the DenseNet201 model classified
most of the weed species more accurately. Similar outcomes were observed for the other
datasets using the proposed approach. D. Chen et al. (2022b), H. Jiang et al. (2020)
and Espejo-Garcia et al. (2020) reported the highest accuracy of 98.40%, 97.80% and
99.29% on Cotton weed, Corn weed and Cotton Tomato weed datasets, respectively. Our
patch-based technique outperformed previous approaches with respective average results
of 98.46%, 99.83% and 100%.

5.4 Discussion

The primary objective of this research was to improve the classification accuracy of
crop and weed species using deep learning techniques. The results indicate that using the
proposed patch-based approach, the deep learning model can achieve better classification
accuracy irrespective of having challenges such as the number of images in the dataset,
inter-class similarity, intra-class dissimilarity or class-imbalanced dataset.

5.4.1 Comparison to related work

Here we compare the performance of our proposed pipeline with that of the traditional
approach.


Table 5.5: Comparison among the classification accuracies achieved by traditional


approach, our proposed approach and the best results of prior studies

Dataset | Crop/weed species | Traditional approach (Accuracy) | Prior study (Accuracy) | Proposed approach (Accuracy)
Chinee apple 90.82 93.52 97.00
Lantana 92.63 97.08 98.00
Parkinsonia 95.28 98.93 99.00
Parthenium 95.54 98.14 98.00
Prickly acacia 88.00 96.33 97.00
DeepWeeds
Rubber vine 98.01 97.42 99.00
Siam weed 95.35 97.95 97.00
Snake weed 93.55 93.11 95.00
Negative 96.18 98.17 100.00
Overall Accuracy 95.01 (Hu et al., 2020) 98.10 98.49
Carpetweeds 99.33 99.00 99.67
Crabgrass 91.30 99.00 100.00
Eclipta 87.03 97.00 98.08
Goosegrass 92.10 98.00 97.93
Morningglory 99.08 99.00 99.33
Nutsedge 88.89 100.00 99.10
Palmer Amaranth 95.10 98.00 98.21
Prickly Sida 95.62 99.00 100.00
Cotton weed
Purslane 93.48 100.00 99.45
Ragweed 96.29 100.00 100.00
Sicklepod 93.88 99.00 97.92
Spotted Spurge 97.87 100.00 100.00
Spurred Anoda 84.62 92.00 95.65
Swinecress 93.75 96.00 100.00
Waterhemp 93.54 100.00 99.45
Overall Accuracy 95.32 (D. Chen et al., 2022b) 98.40 98.96
Bluegrass 99.17 98.20 100.00
Chenopodium album 99.16 98.90 99.58
Cirsium setosum 99.59 97.50 100.00
Corn weed
Corn 99.57 97.10 100.00
Sedge 99.18 97.20 100.00
Overall Accuracy 99.67 (H. Jiang et al., 2020) 97.80 99.83
Cotton 100.00 99.76 100.00
Cotton Tomato 97.62 98.81 100.00
Tomato Black nightshade 100.00 99.63 100.00
weed Velvet leaf 100.00 98.96 100.00
Overall Accuracy 99.04 (Espejo-Garcia et al., 2020) 99.29 100.00


5.4.1.1 Traditional vs patch-based pipeline

In the traditional approach, the results indicate that the deep models cannot handle
the issues with the dataset. For the DeepWeeds dataset, DenseNet169 and DenseNet201
models achieved the highest accuracy, but the prediction result was affected by inter-class
similarity and intra-class dissimilarity problems. Figure 5.6 shows a confusion matrix of
the DenseNet201 model on the DeepWeeds dataset. It can be seen that the misclassifica-
tion rates for chinee apple and snake weed are very high. Since the morphology of these
two weed species is very similar, it becomes challenging for the model to distinguish them
by comparing the features of only one part of an image. Besides, many weed
images were classified as non-weed, and some native plants were recognised as prickly
acacia or snake weed.

On the other hand, in a patch-based approach, the model can compare several parts of an
image. The deep learning model may classify some patches incorrectly, but
owing to the weighted majority voting technique, in most cases it still identifies the correct
label for the image. Table 5.3 shows that the classification accuracy improved significantly
using our proposed patch-based approach.

Figure 5.6: Confusion matrix for DenseNet201 model on DeepWeeds dataset.

Moreover, the models could not achieve the desired results when applied to the imbalanced


Cotton weed dataset. Figure 5.7 shows how the classification accuracy is affected by the
class imbalance issue. The spurred anoda weed had the lowest, and the morningglory
weed had the highest number of images in the dataset. The recognition accuracy for the
classes with fewer samples was relatively low.

Figure 5.7: Illustration of the relationship between the number of data in a class and
classification accuracy of the model.

The performance of the models on the Corn weed dataset was better since it is a bal-
anced dataset with an adequate number of images in each class; however, there is still
room for improvement. On the other hand, the classification accuracy of some of the
models on the Cotton Tomato weed dataset was relatively low because the dataset does
not have enough images to train the models.

5.4.1.2 Patch-based approach

We have seen here that the models' performance was improved using the patch-based
approach and by resizing the images to a resolution of 1024 × 1024. Moreover, not every
part of an image contributes equally to predicting its class label. The
deep learning model generally focuses on some regions of an image to make a prediction
(Selvaraju et al., 2017). In the traditional approach, a model predicts the class label based
on a specific region. If that region is not similar to the right class, then the prediction will
be wrong. On the other hand, in a patch-based approach, a model can look into several
regions of the image, increasing the probability of getting it right. It is also noticeable
that the model can classify images more accurately using overlapping patches.


Although the performance of the models was improved by using the patch-based
approach with higher resolution images for three datasets, it was not so for the Cotton
weed dataset. The images in this dataset were of various resolutions and captured by
different camera types. The image quality was distorted by resizing a low-resolution
image to a higher resolution, which affected the performance of the models (Dodge &
Karam, 2016). In this research, we enhanced the images using GAN based approach to
address that issue and then resized them to the desired size to maintain the quality of
the image.

5.4.1.3 Performance improvement with enhanced image

The Cotton Tomato weed dataset had relatively few images. Espejo-Garcia
et al. (2020) reported the highest classification accuracy of 99.29% using the DenseNet
model with Support Vector Machine as the classifier. The proposed approach achieved
100% recognition accuracy on that dataset using eight out of ten deep learning models.
The remaining two models, i.e., MobileNetV2 and VGG19, obtained more than 99%
accuracy.

On the other hand, we tested our technique on the Corn weed dataset, which had a
sufficient and equal number of images for each class. Our approach successfully
classified all the images except one using DenseNet201 and DenseNet121. One of the
Chenopodium album weed images was recognised as Bluegrass (Figure 5.8). Although
the image is labelled as Chenopodium album weed, it has more Bluegrass plants. When
the image was divided into patches, more patches were predicted as Bluegrass.

Figure 5.8: Image of Chenopodium album weed classified as Bluegrass.

The results on the DeepWeeds dataset show that the proposed technique can han-


dle inter-class similarity and intra-class dissimilarity problems more efficiently. The
DenseNet201 model achieved higher accuracy in distinguishing between chinee apple and
snake weed. Only one image of parkinsonia was classified as prickly acacia, which is a
significant improvement. Besides, very few native plants were misclassified (Figure 5.5).

Moreover, our proposed approach showcased high accuracy in classifying minority


classes in the imbalanced Cotton weed dataset. Imbalanced datasets incur additional chal-
lenges in image classification tasks (Q. Dong et al., 2018; S. Wang et al., 2016). The
technique demonstrated significant improvement in classifying the minority class images
without using data augmentation or a data generation approach.

5.4.2 Benchmarking of the results

In this part, we compared the results of our proposed pipeline with the related studies.

5.4.2.1 Performance evaluation on the dataset with inter-class similarity and


intra-class dissimilarity

We have taken an image of a chinee apple weed to explain how our proposed technique
achieved better accuracy than the traditional approach (Figure 5.9a). The image was
classified as snake weed using the DenseNet201 model in the traditional approach. We
have taken the output of the final convolutional layer (Figure 5.9b) by applying Gradient-
weighted Class Activation Mapping (Grad-CAM) (Selvaraju et al., 2017). The colour
scale used for Grad-CAM heatmaps typically ranges from cool to hot colours, where the
cool colour indicates low importance, and the hot colour signifies high significance. In
this case, red regions are highly important, and blue areas have less importance. The
yellow part has a medium contribution in predicting the image’s class. In Figure 5.9b,
the red regions were emphasised most in predicting the class of that image, and the model
found that part to be similar to snake weed.
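
For completeness, a minimal Grad-CAM sketch (following Selvaraju et al., 2017) for a
Keras model is given below; last_conv_name denotes the final convolutional layer of the
chosen backbone and is an assumption that depends on the architecture.

import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_name, class_index=None):
    grad_model = tf.keras.Model(
        model.input, [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))       # predicted class
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)               # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))      # global-average-pooled gradients
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)  # weighted sum of feature maps
    cam = tf.nn.relu(cam) / (tf.reduce_max(cam) + 1e-8)  # normalise to [0, 1]
    return cam.numpy()                                   # heatmap to overlay on the image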

In our proposed approach, 22 overlapping patches were extracted from that image.
Figure 5.10 shows the Grad-CAM for all the patches. Now the model has more options
to decide the class of the image. In this image, eleven patches were classified as chinee
apples, and eight were as snake weed. That indicates a large part of the image was


(a) Chinee apple classified as snake weed using the traditional approach. (b)
Gradient-weighted Class Activation Mapping (Grad-CAM) of the image.

Figure 5.9: Example of a chinee apple weed classified as snake weed.

similar to snake weed. However, the model successfully identified it as a chinee apple.
It also explains how our approach can achieve better accuracy in classifying datasets
with inter-class similarity or intra-class dissimilarity. Moreover, a cursory look in some
cases may show that the patches at the periphery exhibited greater significance compared
to the centre patches. However, a closer investigation would reveal the cause is rather
the availability of visible features. As shown in Figure 5.9a, the weed leaves are spread
throughout the whole image. However, there are several dark parts in the image where
there are no visible leaves. When the images were divided into patches and the relatively
more important patches were extracted, the classification model consequently focused on
the visible parts of the patches. The weed leaves in the centre of some patches are either
unclear (not visible) or insignificant to determine the class label. As a result, for those
patches, the parts at the periphery exhibited greater significance.

5.4.2.2 Performance improvement on class imbalanced dataset

Our proposed technique showed improved results in classifying the minority weed
classes. D. Chen et al. (2022b) reported that the highest accuracies for spurred anoda,
swinecress, crabgrass and prickly sida were 92%, 96.36%, 98.82% and 99%, respectively,
achieved by intro-


(a) Snake weed (b) Negative (c) Chinee apple (d) Chinee apple (e) Snake weed

(f) Snake weed (g) Chinee apple (h) Negative (i) Chinee apple (j) Chinee apple

(k) Chinee apple (l) Chinee apple (m) Chinee apple (n) Chinee apple (o) Snake weed

(p) Negative (q) Snake weed (r) Chinee apple (s) Snake weed (t) Snake weed

(u) Snake weed (v) Chinee apple

Figure 5.10: Grad-CAM of the extracted patches from the image in Figure 5.9a

ducing a weighted cross-entropy loss. However, our models showed better performance in
recognising those weed classes. Since the InceptionResNetV2 model achieved the highest
accuracy on the Cotton weed dataset, we compared the results with that model. The
model recognised all images of crabgrass, prickly sida and swinecress weed. Besides,
95.63% of the spurred anoda weed images were identified correctly, which is a significant im-
provement. Since the model can be trained on more data due to having multiple patches
from one image, the model can learn to classify the images of the minority classes more


accurately. Besides, the model can predict the class label of an image using a weighted
majority voting technique.

5.4.3 Limitations and future work

Our method has some limitations. For example, the method was tested on publicly
available datasets. The proposed pipeline has not been tested in field trials, which we
aim to do in a future growing season. Technically, the process of patch selection will
take additional computational time. However, it is compensated by the fact that only
the selected patches (i.e., a subset of image patches) are used in the training and testing.
Yet, our proposed pipeline achieved superior recognition accuracy. To mitigate the patch
selection time-impost, in future work we will investigate integrated patch selection and
learning using e.g., mutual information between patches, which will require fewer patches
for training and at the same time improve the model’s learning. Also, the proposed
approach can be evaluated in other contexts, beyond weed recognition; e.g., on large
image datasets with noise and perturbations.

5.5 Conclusion

In this study, we proposed a patch-based weed classification approach to improve the


classification accuracy of crop and weed species from images. Lack of data availability,
inter-class similarity, intra-class dissimilarity, class imbalanced dataset and image quality
affect the performance of the deep learning models in most cases. Our proposed technique
can overcome those issues and classify the images with relatively high accuracy. The
DenseNet models performed better than the other deep learning models in our research,
although the InceptionResNetV2 model achieved the highest accuracy (98.95%) on the
Cotton weed dataset. Moreover, many studies have used different image processing techniques
or deep learning methods to overcome the issues mentioned earlier with the dataset. On
the other hand, our proposed approach successfully handled those in most cases. One of
the applications of this approach could be the automatic annotation of data: if a large
number of images needs to be labelled, the model can be trained on a small subset of
the dataset and then used to annotate the rest. Moreover, these techniques


can be used to achieve better classification accuracy.

In terms of usability, we would like to highlight that artificial intelligence and deep learning
algorithms have already been used and investigated in weed recognition technologies. Our proposed
patch-based pipeline can seamlessly be integrated with the existing algorithms for higher
accuracy. In terms of utility, it will offer higher accuracy in weed detection, localisation,
and recognition, which can be applied in developing targeted weed management and
control strategies that minimise costs and environmental impact and improve yields.

Chapter 6

Generalised approach for weed


recognition

Automatic weed detection and classification can significantly reduce weed manage-
ment costs and improve crop yields and quality. Weed detection in crops from imagery
is inherently a challenging problem. Because both weeds and crops are of similar colour
(green on green), their growth and texture are somewhat similar; weeds also vary based
on crops, geographical locations, seasons and even weather patterns. This study proposes
a novel approach utilising object detection and meta-learning techniques for generalised
weed detection, transcending the limitations of varying field contexts. Instead of classi-
fying weeds by species, we classified them based on their morphological families aligned
with farming practices. An object detector, e.g., a YOLO model is employed for plant
detection, while a Siamese network, leveraging state-of-the-art deep learning models as
its backbone, is used for weed classification. We repurposed and used three publicly
available datasets, namely Weed25, Cotton weed and Corn weed. Each dataset
contained multiple species of weeds, which we grouped into three classes based
on their morphology. YOLOv7 achieved the best result as a plant detector, and the
VGG16 model as the feature extractor for the Siamese network. Moreover, the models
were trained on one dataset (Weed25) and applied to other datasets (Cotton weed and
Corn weed) without further training. We also observed that the classification accuracy
of the Siamese network was improved using the cosine similarity function for calculat-
ing contrastive loss. The YOLOv7 model obtained a mAP of 91.03% on the Weed25

dataset, which was used for training the model. The mAPs for the unseen datasets
were 84.65% and 81.16%. As mentioned earlier, the classification accuracies with the
best combination were 97.59%, 93.67% and 93.35% for the Weed25, Cotton weed and
Corn weed datasets, respectively. We also compared the classification performance of
our proposed technique with the state-of-the-art Convolutional Neural Network models.
The proposed approach advances weed classification accuracy and presents a viable solu-
tion for dataset independent, i.e., site-independent weed detection, fostering sustainable
agricultural practices.

6.1 Introduction

Weeds cause several detrimental effects on crops, impacting agricultural productiv-


ity, crop quality, and overall farm management. They compete with crops for essential
resources like sunlight, water, nutrients, and space. Weeds can drastically reduce crop
yields by depriving cultivated plants of resources (Gharde et al., 2018). Moreover, weeds
can also compromise the quality of crops and make them less desirable for market sale
due to contamination or altering taste and appearance (Westwood et al., 2018).

Technological developments have been revolutionising farming practices, fostering sus-


tainable and efficient crop management (Karunathilake et al., 2023). As advancements
continue, the potential for scalable, cost-effective, and environmentally conscious weed
detection using artificial intelligence technology has become increasingly tangible (Gill
et al., 2022). Deep learning (DL), a subset of artificial intelligence, empowers machines
to learn patterns and features within datasets and apply them in applications such as
image classification and object detection, anomaly detection, and object tracking (Sarker,
2021) that can be translated to agriculture. For instance, crop monitoring and disease
detection research focus on plant disease detection through image analysis. DL Models
are trained to identify disease symptoms, nutrient deficiencies, and crop stress factors,
allowing for early intervention (Chowdhury et al., 2021; Ferentinos, 2018; Ramcharan
et al., 2017). Moreover, DL models are employed to predict crop yields (Khaki & Wang,
2019; Nevavuori et al., 2019; Van Klompenburg et al., 2020), resource optimisation (H.
Chen et al., 2020; Shaikh et al., 2022), pest control and management (Amrani et al.,


2023a; Haque & Sohel, 2022; Y. He et al., 2019b; Kuzuhara et al., 2020; W. Li et al.,
2021), plant frost and stress monitoring (Khotimah et al., 2023; Shammi et al., 2023;
A. Singh et al., 2021), automated harvesting and sorting (Altaheri et al., 2019; Haggag
et al., 2019; Nasiri et al., 2019), decision support systems for farmers (Kukar et al., 2019;
Zhai et al., 2020) and many more.

When applied to weed detection, this technology analyses images captured in agricul-
tural fields, differentiating between crops and unwanted plants with remarkable accuracy.
The implications of DL in weed detection extend far beyond mere identification. This
technology offers precise and targeted interventions. By pinpointing areas infested with
weeds, farmers can implement specific, localised treatments, optimising the use of her-
bicides, reducing chemical inputs, and minimising environmental impact (Hasan et al.,
2021; Jin et al., 2021; Rai et al., 2023; Razfar et al., 2022; Yu et al., 2019b).

Several studies proposed Convolutional Neural Networks (CNN) based approaches for
weed detection and classification in precision agriculture (Hasan et al., 2021, 2023a; Rai
et al., 2023; A. Sharma et al., 2020). Initial studies focused on applying DL models for
classifying weed and crop images (Asad & Bais, 2020; Bosilj et al., 2020; Partel et al.,
2020; Ramirez et al., 2020; W. Zhang et al., 2018). Classification of weed species can
be more advantageous when applying specific management strategies. Many researchers
proposed DL models to recognise weed species as well (Espejo-Garcia et al., 2020; Hasan
et al., 2023b; Hu et al., 2020; Olsen et al., 2019; Sunil et al., 2022; Trong et al., 2020).

On the other hand, weed classification approaches do not localise the instances of
weeds in the image, which is essential for a real-time selective spraying system. More-
over, the classification will be inappropriate if an image contains multiple weed instances.
Several studies applied DL-based object detection methods such as Region-based Con-
volutional Neural Networks (R-CNN), You only look once (YOLO), and Single Shot De-
tector (SSD) to address the issue (Czymmek et al., 2019; Dang et al., 2023; Espinoza
et al., 2020; Gao et al., 2020; Y. Jiang et al., 2019; Le et al., 2021; Osorio et al., 2020;
Partel et al., 2019a; Patidar et al., 2020; Quan et al., 2019; Sharpe et al., 2020; Sivakumar
et al., 2020; W. Zhang et al., 2018).

The problem with most weed detection and classification approaches is that they are
very much data dependent, i.e., site-specific and depend on the crop, geographic location


and weather. One weed detection setting may not apply to others, even if weeds grow
in the same crop. The DL models need to be retrained with a part of the dataset to
be analysed. In this study, we propose a novel approach that is not dependent on the
crop. This approach allows for dataset-independent weed classification, accommodating
variations in environmental conditions and diverse weed species since we classified them
based on their morphology.

Weeds can be classified as broadleaf weeds, grass and sedge according to their morpho-
logical characteristics (Monaco et al., 2002). Broadleaf weeds typically have wider leaves
with a net-like vein structure (Mithila et al., 2011). Grass weeds resemble grasses and
usually have long, narrow leaves with parallel veins (Moore & Nelson, 2017). Sedges have
triangular stems and grass-like leaves but differ from grasses by having solid, three-sided
stems (Shi et al., 2021). An effective weed management plan that targets the specific
weeds in the field can be developed by classifying them according to their morphology
(Scavo & Mauromicale, 2020; Westwood et al., 2018). No herbicide or management tech-
nique can effectively control all types of weeds (Chauhan, 2020; Scavo & Mauromicale,
2020). An efficient weed classification according to their morphology can ensure a specific
management approach irrespective of crop or geographic location.

Meta-learning in deep learning refers to the process where a model learns how to learn
(Huisman et al., 2021). Instead of just learning to perform a specific task, a meta-learning
model learns the learning process. This means it gains the ability to adapt quickly to
new tasks or domains with minimal training data by leveraging knowledge gained from
previous tasks (Finn et al., 2017). The approach can be applied to weed classification
based on morphology. It involves leveraging meta-learning techniques to categorise weeds
based on visual characteristics such as leaf shape, colour, size, and other morphological
features. The method can facilitate understanding semantic similarities among different
weeds based on their morphology. The model learns to recognise and group weeds with
similar visual characteristics together, even if they belong to different species.

In this research, we propose a meta-learning-based weed classification approach, where


the weeds are classified into three categories (i.e., broadleaf weeds, grass and sedge) ac-
cording to their morphology. Several challenges were associated with this task (Figure
6.1). The occlusion of plants made it very difficult to detect plants from images, as shown


(a) Alligatorweed (b) Black nightshade (c) Field thistle

(d) Ceylon spinach (e) Asiatic smartweed (f) Barnyard grass

(g) Crab grass (h) Green foxtail (i) Sedge

Figure 6.1: Example weed images from the Weed25 dataset. Here, we show different
species of weed grouped by morphology. Figures (a), (b), (c), (d) and (e) are broadleaf
weeds, which have different leaf shapes, colours and textures. Figures (f), (g) and (h) are
grasses, and Figure (i) is a sedge weed. Grass and sedge weeds have quite similar
structures.

in Figure 6.1a. Moreover, several weed species were grouped into one category according to their morphology. Figures 6.1b, 6.1c, 6.1d and 6.1e show example images from the Weed25 dataset where weeds from different species are considered broadleaf weeds in our study. This imposes additional challenges for the DL classifier, since weed species in the same group show considerable dissimilarity in colour, texture and shape. Moreover, grass and sedge weeds have quite similar morphology, making them difficult to distinguish (Figures 6.1f, 6.1g, 6.1h and 6.1i). The main contributions of this study are (1) to propose a meta-learning


based dataset-independent weed classification approach, (2) to demonstrate a technique for detecting and classifying weeds according to their morphology, and (3) to repurpose public datasets for this task.

6.2 Materials and Methods

The proposed approach has two stages. In the first stage, a deep learning model detects weed plants: we trained an object detector on an annotated dataset in which every annotated plant was labelled "Plant", irrespective of its species, and then used the trained model to detect plants in an unseen dataset. The second stage classifies the detected plants according to their morphology. A Siamese network was trained on the same dataset used for plant detection, with state-of-the-art deep learning models as feature extractors, to predict a similarity score between plant images. The training dataset contains twenty-five weed species, so the network learned to predict similarity scores across 25 classes, with multiple species belonging to each morphological category (broadleaf, grass and sedge). The trained network was then used to compute similarity scores for images from unseen datasets; these scores were aggregated over the three morphological classes, and each image was classified accordingly.

Figure 6.2 shows the proposed pipeline for recognising weeds in crops. First, a YOLO model was trained on a dataset, referred to as dataset "A" for clarity. Dataset "A" contains several weed species. For training the YOLO model, we labelled all weed species in dataset "A" as plants, since the goal was to detect plants only, irrespective of their species. Then, we used pairwise similarity learning based on the Siamese network and trained the model with dataset "A". After that, we took an unseen dataset (named "B") and used the trained YOLO model to extract the plants from its images. A support set was prepared by selecting ten images randomly from dataset "A". The images in dataset "B" were considered the query set. For every image in the query set, we extracted the plant images using the YOLO model. Then, the trained Siamese network predicted each plant image's similarity score with respect to the support set. Dataset "A" had many classes, and we calculated the similarity score against every class in the dataset. We grouped the classes of the support set into three types: broadleaf, grass and sedge. Each group contained more than one class; therefore, we calculated the average similarity score of the classes in a group. The group with the highest average similarity score was taken as the class of the query plant image.
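The group-level decision rule at the end of this pipeline can be summarised in a few lines of code. The following is a minimal sketch, not the exact implementation used in this study; the class-to-group mapping shows only an illustrative subset of the 25 Weed25 classes, and the similarity values are placeholders.

from statistics import mean

# Illustrative mapping from support-set classes to morphological groups.
GROUPS = {
    "broadleaf": ["alligatorweed", "black nightshade", "velvetleaf"],
    "grass": ["barnyard grass", "crabgrass", "green foxtail"],
    "sedge": ["sedge"],
}

def classify_by_group(class_scores):
    """class_scores maps each support class to the query plant's similarity score."""
    group_means = {
        group: mean(class_scores[c] for c in classes)
        for group, classes in GROUPS.items()
    }
    # The group with the highest mean similarity is taken as the predicted class.
    return max(group_means, key=group_means.get)

# Example: a query plant whose scores are highest for the grass classes.
scores = {"alligatorweed": 0.21, "black nightshade": 0.18, "velvetleaf": 0.25,
          "barnyard grass": 0.74, "crabgrass": 0.69, "green foxtail": 0.71, "sedge": 0.55}
print(classify_by_group(scores))  # -> "grass"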

[Flowchart with two panels. Training phase: the Weed25 dataset is used to train an object detection model to detect plants (all species labelled "Plant"), the detected plants are used to prepare a dataset of plant images, and a Siamese network is trained on image pairs to predict similarity scores. Evaluation phase: unseen images pass through the trained plant detection model, each detected plant is compared with the Weed25 support images by the trained similarity score predictor, and the weed is grouped by morphology based on the per-group similarity scores.]

Figure 6.2: The proposed pipeline for classifying weeds. Here, we have shown how the
models were trained and how the trained models were used on unseen datasets.

6.2.1 Dataset

In this research, we have used three public datasets: Weed25 (P. Wang et al., 2022),
Cotton weed (Dang et al., 2023) and Corn weed (H. Jiang et al., 2020). The Weed25
dataset has 14,023 images in 25 categories of weeds. Weeds in the images were annotated
using bounding boxes. In our experiments, we used this dataset to train both the YOLO
models and the Siamese network. On the other hand, the Cotton weed and Corn weed
datasets were used to evaluate the performance of the models. The Cotton weed dataset has twelve weed species, whereas the Corn weed dataset has four. The
bounding box annotations of the Cotton weed dataset were available at https://zenodo.org/records/7535814. The Corn weed dataset was annotated by Hasan et al. (2024) using


bounding boxes. We removed the images of corn plants from this dataset for our study, since those were crops. Moreover, all classes in the datasets were relabelled as "Plant" for the YOLO models. Table 6.1 provides an overview of the datasets.

Table 6.1: Overview of the datasets


Dataset       Types of weed   Common name          Scientific name               Number of images

Weed25        Broadleaf       Horseweed            Calyptocarpus vialis          192
                              Field thistle        Cirsium discolor              565
                              Cocklebur            Xanthium strumarium           745
                              Indian aster         Kalimeris indica              510
                              Bidens               Bidens comosa                 612
                              Ceylon spinach       Basella alba                  536
                              Billygoat weed       Ageratum conyzoides           599
                              White smartweed      Persicaria attenuata          671
                              Asiatic smartweed    Persicaria perfoliata         490
                              Chinese knotweed     Persicaria chinensis          390
                              Alligatorweed        Alternanthera philoxeroides   637
                              Pigweed              Amaranthus albus              742
                              Shepherd purse       Capsella bursa-pastoris       224
                              Purslane             Portulaca oleracea            730
                              Common dayflower     Commelina benghalensis        562
                              Goosefoot            Chenopodium album             593
                              Plantain             Plantago lanceolata           556
                              Viola                Viola pratincola              523
                              Black nightshade     Solanum nigrum                606
                              Mock strawberry      Duchesnea indica              615
                              Velvetleaf           Abutilon theophrasti          622
              Grass           Barnyard grass       Echinochloa crus-galli        563
                              Crabgrass            Digitaria bicornis            594
                              Green foxtail        Setaria viridis               552
              Sedge           Sedge                Cyperus compressus            594

Cotton weed   Broadleaf       Morningglory         Calystegia occidentalis       1115
                              Palmer Amaranth      Amaranthus palmeri            763
                              Carpetweed           Mollugo verticillata          689
                              Waterhemp            Amaranthus tuberculatus       451
                              Purslane             Portulaca oleracea            450
                              Eclipta              Eclipta prostrata             254
                              Spotted Spurge       Euphorbia maculata            234
                              Sicklepod            Senna obtusifolia             240
                              Prickly Sida         Sida rhombifolia              129
                              Ragweed              Ambrosia artemisiifolia       129
                              Swinecress           Lepidium didymum              72
                              Spurred Anoda        Anoda cristata                61
              Grass           Goosegrass           Eleusine indica               216
                              Crabgrass            Digitaria bicornis            111
              Sedge           Nutsedge             Cyperus rotundus              273

Corn weed     Broadleaf       Goosefoot            Chenopodium album             1200
                              Field thistle        Cirsium discolor              1200
              Grass           Bluegrass            Poa pratensis                 1200
              Sedge           Sedge                Cyperus compressus            1197

In Table 6.1, we have grouped the weed classes into three categories since our objective


is to classify weeds into broadleaf, grass and sedge. It is worth mentioning that, according to their morphology, most weed species belong to the broadleaf category.

6.2.2 Plant detection model

Several object detection techniques are available in computer vision, such as YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector) and R-CNN (Region-based Convolutional Neural Networks) (A. Kumar et al., 2020). Among them, YOLO is known for its speed and efficiency, making it suitable for real-time object detection applications (Du, 2018). Moreover, it performs well across different types of objects and scenes, making it versatile and suitable for various applications (Diwan et al., 2023). In this study, we chose two of the latest iterations of the YOLO family, YOLOv7 (C.-Y. Wang et al., 2023b) and YOLOv8 (Jocher et al., 2023a). The models were trained on the Weed25 dataset, with all classes labelled as "Plant". After training, we used the models to obtain the bounding box coordinates of the detected "Plant" objects.
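As a concrete illustration of this relabelling step, the sketch below collapses every species label to a single class id before detector training. It is a minimal example, not the script used in this study, and it assumes YOLO-format annotation files (one "class x_center y_center width height" row per object); the directory path is hypothetical.

from pathlib import Path

def relabel_to_single_class(label_dir, plant_class_id=0):
    """Rewrite every object row so that all weed species share one 'Plant' class id."""
    for label_file in Path(label_dir).glob("*.txt"):
        rows = []
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if len(parts) == 5:                 # class, x_center, y_center, width, height
                parts[0] = str(plant_class_id)  # every species becomes "Plant"
                rows.append(" ".join(parts))
        label_file.write_text("\n".join(rows) + "\n")

# relabel_to_single_class("weed25/labels/train")  # hypothetical directory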

6.2.3 Prepare dataset to train the Siamese network

We have divided the images of the Weed25 dataset into 25 classes based on the object-
level annotation of the dataset. The images were cropped based on the bounding box
coordinates in the annotations. We have applied the following steps to prepare the dataset
for training the few-shot model:

• Resize images: Since the images in the dataset were of different sizes, we resized them to a consistent size (224 × 224).

• Create subsets of the dataset: The Weed25 dataset contains 14,023 images with one or more plants from 25 classes. Training the few-shot model with the entire dataset would require considerable computational resources. Therefore, we prepared 15 subsets of the dataset for episodic training, each containing 50 images from every class.

• Split the Dataset: We divided a subset of the dataset into training (80%), validation
(10%), and test (10%) sets.


[Diagram: labelled images from the Weed25 dataset are combined into image pairs; each image of a pair passes through an identical (shared) feature extractor, the resulting feature vectors are compared to calculate a similarity score, and the loss is calculated from that score.]

Figure 6.3: The training process of the Siamese network using Weed25 dataset.

• Create image pairs: Every image in the dataset is paired with every other image.
The pair is labelled as positive (1) if both images are from the same class and
negative (0) otherwise. We created train, test and validation pairs.
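A minimal sketch of this pair-generation step is shown below; the file names and class labels are placeholders rather than items from the actual dataset.

from itertools import combinations

def make_pairs(image_paths, labels):
    """Pair every image with every other image; 1 = same class, 0 = different class."""
    pairs, pair_labels = [], []
    for i, j in combinations(range(len(image_paths)), 2):
        pairs.append((image_paths[i], image_paths[j]))
        pair_labels.append(1 if labels[i] == labels[j] else 0)
    return pairs, pair_labels

# Example with four images from two classes.
paths = ["a1.jpg", "a2.jpg", "b1.jpg", "b2.jpg"]
classes = ["pigweed", "pigweed", "crabgrass", "crabgrass"]
pairs, y = make_pairs(paths, classes)   # 6 pairs; y == [1, 0, 0, 0, 0, 1]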

6.2.4 Siamese neural network architecture

This study used a Siamese neural network containing two identical sub-networks that
share the same configuration, parameters, and weights. Each identical network takes an
input image and extracts features by passing it through several convolutional, pooling
and fully connected layers. We used three state-of-the-art deep learning models as feature
extractors, namely VGG16 (Simonyan & Zisserman, 2014), ResNet50 (K. He et al., 2016)
and InceptionV3 (Szegedy et al., 2016), and compared their performance. The output of the
feature extractors is the feature vectors, which are then compared using several distance
metrics. We have evaluated the efficiency of two well-known distance metrics, namely
negative Euclidean distance (Melekhov et al., 2016) and cosine similarity (Chicco, 2021).
The role of the distance metric is to measure the similarity between two feature vectors,
where a higher value indicates similarity and a lower value indicates dissimilarity. We used
the contrastive loss function in our Siamese network to optimise the model’s parameters.
Here, the role of contrastive loss is to discriminate the features of the input images using
either the negative Euclidean distance or the cosine similarity function. Figure 6.3 shows the training process of the Siamese network, which was trained using the Weed25 dataset.
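The sketch below illustrates this set-up in PyTorch, assuming a VGG16 backbone and the Euclidean-distance form of the contrastive loss; the embedding size and margin are illustrative choices rather than the tuned values of this study, and in practice the ImageNet pre-trained weights would be loaded for the backbone.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SiameseNet(nn.Module):
    def __init__(self, embedding_dim=256):
        super().__init__()
        backbone = models.vgg16(weights=None)   # pre-trained weights would be used in practice
        self.features = backbone.features       # shared convolutional layers
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.embed = nn.Sequential(nn.Flatten(), nn.Linear(512 * 7 * 7, embedding_dim))

    def forward_once(self, x):
        return self.embed(self.pool(self.features(x)))

    def forward(self, img_a, img_b):
        # The two branches share the same parameters.
        return self.forward_once(img_a), self.forward_once(img_b)

def contrastive_loss(za, zb, label, margin=1.0):
    """label = 1 for a same-class pair, 0 for a different-class pair."""
    dist = F.pairwise_distance(za, zb)          # Euclidean distance between embeddings
    return torch.mean(label * dist.pow(2)
                      + (1 - label) * torch.clamp(margin - dist, min=0).pow(2))

# One illustrative step on a random batch of image pairs.
model = SiameseNet()
img_a, img_b = torch.randn(4, 3, 224, 224), torch.randn(4, 3, 224, 224)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
za, zb = model(img_a, img_b)
loss = contrastive_loss(za, zb, labels)
loss.backward()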

After training the Siamese network, we evaluated the model's performance using two


[Diagram: support images from the Weed25 dataset and a query image from an unseen dataset pass through the shared feature extractor; the similarity score of the query image is calculated against the support images of all 25 classes, and the mean similarity score is then computed separately for the broadleaf, grass and sedge groups.]
Figure 6.4: The evaluation process of our Siamese network.

unseen datasets. In the evaluation phase, the Weed25 dataset was used as the support set, and the unseen dataset's images were used as query images. The model extracted the features of the support and query images, compared the feature vectors and predicted the similarity score. Since our goal was to classify the images into three categories, the mean similarity score was calculated for each group of classes, and the model predicted the class label based on the highest mean similarity score. The process is shown in Figure 6.4.
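The sketch below outlines this evaluation step. Here, detect_plants and similarity are hypothetical stand-ins for the trained YOLO detector and the trained Siamese network, and the support images are assumed to be pre-grouped by morphology; this is an illustration of the procedure rather than the exact code used in this study.

from statistics import mean
from PIL import Image

def classify_query_image(image, detect_plants, similarity, support):
    """support maps each morphological group to a list of support crops (PIL images)."""
    predictions = []
    for box in detect_plants(image):              # stage 1: plant detection (x1, y1, x2, y2)
        crop = image.crop(box).resize((224, 224))
        group_scores = {                          # stage 2: mean similarity per group
            group: mean(similarity(crop, s) for s in support_images)
            for group, support_images in support.items()
        }
        predictions.append(max(group_scores, key=group_scores.get))
    return predictions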

6.3 Results and Discussion

6.3.1 Plant detection from images

Here, we trained the YOLOv7 and YOLOv8 models with the Weed25 dataset. We
used 80% of the data for training and the rest for testing the models. The models were
not trained on the Cotton weed and Corn weed datasets. The quantitative results are
presented in Table 6.2. We showed the models’ performance on the test data and the
entire Weed25 dataset. The performance of YOLO models for detecting bounding box
coordinates on “Cotton weed” and “Corn weed” datasets are also presented in Table 6.2.
The mean Average Precision (mAP) values of the YOLOv7 and YOLOv8 models were 92.37% and 91.19%, respectively, on the test set of the Weed25 dataset. When applied to the entire Weed25 dataset, the YOLOv7 model achieved a higher mAP (91.03%) than the YOLOv8 model (89.43%). We then used the trained models to detect plant objects from the "Cotton weed" and "Corn weed" datasets. The YOLOv7 model obtained mAPs of 84.65% and


Table 6.2: Performance of YOLO models in detecting plants. The models were trained
on the Weed25 dataset only.
Dataset                   Ground truth   Detected plants       mAP (%)
                          plants         YOLOv7    YOLOv8      YOLOv7    YOLOv8
Weed25 (20% test data)    8,723          7,951     7,793       92.37     91.19
Weed25                    43,527         42,435    41,689      91.03     89.43
Cotton weed               9,388          8,343     7,714       84.65     78.26
Corn weed                 7,225          6,156     5,868       81.16     77.37

81.16% on the "Cotton weed" and "Corn weed" datasets, respectively. However, the mAPs for YOLOv8 were 78.26% and 77.37% in those cases, both lower than those of YOLOv7.

Another important observation is the number of objects detected by the models (Table 6.2). There were 43,527 plant objects in the Weed25 dataset, whereas 42,435 and 41,689 were detected by the YOLOv7 and YOLOv8 models, respectively. Similarly, in the Cotton weed dataset, the YOLOv7 and YOLOv8 models identified 8,343 and 7,714 plants, although the total number of plants was 9,388. Both models also detected fewer plants than the ground truth in the Corn weed dataset. This happened because of occlusion and the diversity of plant morphology (Figure 6.5). However, this has little effect on a weed management system as long as the plants that are detected are recognised correctly, because the management technique is applied to the whole region detected by the models.

6.3.2 Similarity functions and models’ performance

We evaluated two similarity functions for computing the Siamese network's similarity score. Our study showed that the models achieved higher accuracy using cosine similarity than using the negative Euclidean distance function (Table 6.3). The Euclidean distance function is sensitive to variations in pixel intensities across images and mainly captures differences in magnitude. On the other hand, the cosine similarity function focuses on the orientation of the feature vectors rather than their magnitudes. According to Amer and Abdalla (2020) and Saha et al. (2020), cosine similarity measures similarity irrespective of magnitude, making it suitable for similarity-based categorisation. In this study, computing similarity scores based on pixel-intensity differences, as the Euclidean distance function does, was challenging; the cosine similarity function performed better because it focuses more on the spatial patterns and relationships in the images. The VGG16 model obtained an accuracy of 96.35%,


(a) The ground truth annotation has five plants; the YOLOv7 model detected one (Weed25). (b) The ground truth annotation has five plants; the YOLOv7 model detected two (Cotton weed). (c) The ground truth annotation has five plants; the YOLOv7 model detected three (Corn weed).

Figure 6.5: Ground truth images with annotations and the plants detected by the YOLOv7 model, illustrating why fewer plants are detected by the model.

89.07% and 90.97% on Weed25, Cotton weed and Corn weed datasets, respectively, using
negative Euclidean distance as the similarity function. The accuracies were improved by
1.24%, 4.60% and 2.38% on the datasets using the cosine similarity function. ResNet50
and InceptionV3 showed similar results.
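For reference, the two similarity functions compared in Table 6.3 can be written for a batch of feature-vector pairs as follows; this is a generic sketch rather than the exact code of this study, and the tensors are random placeholders. Both functions return higher values for more similar pairs, which is why the Euclidean distance is negated.

import torch
import torch.nn.functional as F

def negative_euclidean(za, zb):
    return -F.pairwise_distance(za, zb)        # sensitive to the magnitudes of the features

def cosine(za, zb):
    return F.cosine_similarity(za, zb, dim=1)  # depends only on the orientation of the features

za, zb = torch.randn(4, 256), torch.randn(4, 256)
print(negative_euclidean(za, zb), cosine(za, zb))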

Moreover, using both similarity functions, the VGG16 model achieved the highest accuracy among the DL models. The ResNet50 model obtained its best accuracies of 96.69%, 90.40% and 92.14% on the Weed25, Cotton weed and Corn weed datasets using the cosine similarity function. Although InceptionV3 also produced improved results with the cosine similarity function, its performance was lower than that of the other models.

6.3.3 Performance of the models on the datasets

All the models performed consistently better using the cosine similarity function,
as shown in Table 6.3. However, the models achieved better accuracy on the Weed25 dataset, since they were trained on that dataset. Table 6.4 shows the performance of
models in classifying the weed plants. The Siamese model obtained the highest precision


Table 6.3: Comparison between the two similarity functions. The results show the
classification accuracy (%) for three datasets using different feature extractors. Here,
the models were trained on the Weed25 dataset only.

Similarity Function            Feature Extractor   Weed25   Cotton Weed   Corn Weed
Negative Euclidean Distance    VGG16               96.35    89.07         90.97
                               ResNet50            94.61    88.01         89.74
                               InceptionV3         93.33    84.89         86.89
Cosine similarity              VGG16               97.59    93.67         93.35
                               ResNet50            96.69    90.40         92.14
                               InceptionV3         95.01    88.29         91.14

of 99.52% while classifying broadleaf weed in the Weed25 dataset using the VGG16
model as the feature extractor and cosine similarity function for calculating contrastive
loss. However, the performance was lower for grass (precision of 88.29%) and sedge (precision of 85.24%). Moreover, the recall for sedge weeds was 90.52%, much higher than the precision, indicating that a number of plants from other groups were incorrectly classified as sedge. Since most plants in the training set were broadleaf weeds, the models could not classify the grass and sedge weeds as reliably. Moreover, to train the model, we had to resize the plant images to 224 × 224. Some of the grass and sedge plants were small, and when their images were upscaled, their morphology appeared similar to that of broadleaf weeds. Besides, the grass and sedge weeds had quite similar shapes and sizes; therefore, many grass weeds were classified as sedge and vice versa.

In the Cotton weed dataset, the precision value for classifying broadleaf weed using
the cosine similarity function was 95.14%, 93.12% and 91.10% with VGG16, ResNet50
and InceptionV3, respectively. In broadleaf weed classification, the precision values were
higher than recall using all three feature extractors, indicating that most plants predicted as broadleaf were correct, but some broadleaf weeds were missed. While classifying grass weeds, the recall values were higher than the precision, indicating that the model labelled most of the actual grass weeds as grass; however, some of those labels were incorrect, as some sedge weeds were also labelled as grass, while actual grass weeds were rarely missed.

The performance of the models was better in the Corn weed dataset than in the
Cotton weed dataset since the images of the dataset had less diversity and fewer classes


Table 6.4: The performance (%) of the models in recognising the weed classes. Here, NED
stands for Negative Euclidean Distance and CS for Cosine Similarity.

                                              Broadleaf        Grass            Sedge
Dataset       Feature       Metric            NED     CS       NED     CS       NED     CS
              Extractor
Weed25        VGG16         Precision         99.28   99.52    85.18   88.29    81.03   85.24
                            Recall            97.14   98.29    93.47   85.19    86.35   90.52
                            F1 score          98.20   98.90    89.13   86.71    83.61   87.80
              ResNet50      Precision         97.26   98.16    83.46   90.38    81.76   87.21
                            Recall            97.26   98.16    85.62   92.69    76.87   83.18
                            F1 score          97.26   98.16    84.53   91.52    79.24   85.15
              InceptionV3   Precision         97.07   98.17    77.14   81.09    74.12   80.15
                            Recall            96.15   97.61    81.09   88.16    69.17   78.51
                            F1 score          96.61   97.89    79.07   84.48    71.56   79.32
Cotton weed   VGG16         Precision         92.11   95.14    81.09   89.07    71.08   86.14
                            Recall            96.15   98.17    54.90   69.26    72.09   86.14
                            F1 score          94.09   96.63    65.47   77.93    71.58   86.14
              ResNet50      Precision         91.10   93.12    79.26   81.19    76.67   78.15
                            Recall            97.16   97.16    74.11   54.08    78.15   79.27
                            F1 score          94.03   95.10    76.60   64.92    77.40   78.71
              InceptionV3   Precision         88.16   91.10    79.16   81.09    74.11   76.13
                            Recall            98.17   98.17    51.06   56.11    47.11   56.11
                            F1 score          92.90   94.50    62.08   66.33    57.60   64.60
Corn weed     VGG16         Precision         96.15   98.17    89.17   91.10    83.11   87.15
                            Recall            91.10   93.12    92.11   96.15    90.07   90.09
                            F1 score          93.56   95.58    90.62   93.56    86.45   88.60
              ResNet50      Precision         95.14   97.16    88.16   90.09    81.09   85.13
                            Recall            92.11   90.09    90.09   92.11    83.11   87.15
                            F1 score          93.60   93.49    89.11   91.09    82.09   86.13
              InceptionV3   Precision         91.91   94.01    83.24   86.95    82.10   92.25
                            Recall            91.21   93.11    86.96   92.89    78.15   84.13
                            F1 score          91.56   93.56    85.06   89.82    80.08   88.00

in the original labels. The average classification accuracy of the models was very close.
The F1 scores in recognising broadleaf weeds were 95.17%, 93.49% and 93.56% using
VGG16, ResNet50 and InceptionV3, respectively, with cosine similarity function. The
VGG16 model obtained the F1 scores of 93.56% and 88.60% for grass and sedge weeds,
respectively, which were the highest. The other two models showed similar accuracy in
classifying them.


(a) Example images with ground truth annotation and the detection of plants from Weed25
dataset

(b) Example images with ground truth annotation and the detection of plants from Cotton
weed dataset

(c) Example images with ground truth annotation and the detection of plants from Corn weed
dataset

Figure 6.6: Example of the detection and classification of weeds from images using the
proposed approach.


We have shown some example images to illustrate the performance of our proposed
approach in Figure 6.6. Since the Siamese network achieved the highest accuracy using
the VGG16 model as the feature extractor with cosine similarity function, we provided
the results for that only. Figure 6.6a shows the example images from the Weed25 dataset.
The first image here is a purslane weed, as annotated in the ground truth, which is a
broadleaf weed. Our model detected two plants and classified them as broadleaf weeds.
The second and third images have green foxtail and sedge weeds, which our technique
detected and classified correctly as grass and sedge. The images in Figure 6.6b are from
the Cotton weed dataset. In this case, all plants were recognised correctly. According to
the ground truth annotation, the plants were spurred anoda, goosegrass and nutsedge,
which were detected as broadleaf, grass and sedge, respectively. Figure 6.6c contains
images from the Corn weed dataset. Some of the plants were classified incorrectly here.
For instance, the first image had four bluegrass. The model detected five plants there,
and two of them were classified as broadleaf weeds. The rest were recognised as grass.
The second image contains two bluegrass and a goosefoot weed. Although they were
classified correctly, the model detected two plants.

The proposed technique holds high potential for real-time integration into precision
agriculture, fostering sustainable and efficient farming practices across diverse agricul-
tural landscapes. Unlike traditional methods that rely heavily on specific environmental
conditions or training datasets from particular locations, this approach can generalise
well across various geographies. Moreover, this versatility makes it suitable for site-
independent weed detection. This technique can effectively leverage pre-trained models
and extract relevant features even from limited labelled data. Our proposed method can
contribute to developing automated weed detection and classification systems, leading to
increased efficiency and reduced labour costs in agriculture.

6.3.4 The proposed Siamese network vs state-of-the-art CNN models based weed classification

We trained the state-of-the-art CNN models on the detected plants to assess the efficacy of our proposed technique. We trained VGG16, ResNet50 and InceptionV3
models with the images. Table 6.5 compares the performance of the CNN models with


our proposed method on the datasets. Here, we trained the models with the Weed25
dataset and tested the performance on the other two datasets (Cotton weed and Corn
weed) and the test set of the Weed25 dataset. We took 80% of the data from the Weed25
dataset for training and used the rest for testing. Since our proposed technique achieved
the best performance using the cosine similarity function for calculating contrastive loss
(Table 6.4), we compared our best result with the CNN models’ results in Table 6.5.

The CNN models showed promising results on the Weed25 dataset. The ResNet50 model
achieved the highest accuracy of 95.76%. The VGG16 and InceptionV3 models obtained
95.73% and 94.26% accuracy, respectively, on the dataset. On the other hand, our pro-
posed approach with VGG16 as the feature extractor achieved 97.59% accuracy on the
Weed25 dataset. It is important to notice here that the CNN models classified broadleaf
weeds more accurately than the grass and sedge weeds. The ResNer50 model achieved
the highest F1 score of 97.76% for broadleaf weeds, where the F1 scores were 88.25% and
82.31% for grass and sedge weeds, respectively. Since the dataset has fewer samples from
grass and sedge weed classes, the models were biased. In contrast, the classification accu-
racy of grass and sedge weeds was better using our proposed technique with the Siamese
network.

The differences between the two approaches are more noticeable when the trained
model is applied to the unseen dataset without further training. The accuracy of the
CNN models was much lower for the unseen Cotton weed and Corn weed datasets since
the models were not trained on those datasets. Among the CNN models, ResNet50 obtained the highest accuracies of 91.86% and 89.41% on the Cotton weed and Corn weed datasets, respectively, whereas the proposed technique achieved accuracies of 93.67% and 93.35% on those two datasets. However, the overall accuracy tells only part of the story. The CNN models found it challenging to classify the grass and sedge weeds
from those datasets. The F1 scores for grass were 55.09%, 62.07% and 51.39% using
VGG16, ResNet50 and InceptionV3 models, respectively, for the Cotton weed dataset.
The models obtained the F1 scores of 18.13%, 25.02% and 14.07%, respectively, for sedge
weeds. In contrast, the F1 scores for grass and sedge weeds for the Cotton weed dataset
were 77.93% and 86.14%, respectively, using the VGG16 model as a feature extractor.

The CNN models classified sedge weeds more accurately from the Corn weed dataset


Table 6.5: Performance (%) comparison between the state-of-the-art CNN models and our
proposed method. Here, the models were trained on the Weed25 dataset only, and their
performance was evaluated on the other datasets without further training. The CNN
columns give the classification results of the CNN models (VGG16, ResNet50 and
InceptionV3); the Proposed columns give the best results of our proposed approach.

                            Overall accuracy                Broadleaf          Grass              Sedge
Dataset      DL model       CNN     Proposed   Metric       CNN     Proposed   CNN     Proposed   CNN     Proposed
Weed25       VGG16          94.73   97.59      Precision    99.64   99.52      78.96   88.29      67.26   85.24
                                               Recall       95.01   98.29      93.01   85.19      94.17   90.52
                                               F1 score     97.27   98.90      85.41   86.71      78.47   87.80
             ResNet50       95.76   96.69      Precision    99.56   98.16      83.03   90.38      72.61   87.21
                                               Recall       96.03   98.16      94.17   92.69      95.01   83.18
                                               F1 score     97.76   98.16      88.25   91.52      82.31   85.15
             InceptionV3    94.26   95.01      Precision    99.72   98.17      77.18   81.09      67.88   80.15
                                               Recall       94.53   97.61      92.71   88.16      93.33   78.51
                                               F1 score     96.89   97.89      84.24   84.48      78.59   79.32
Cotton weed  VGG16          89.74   93.67      Precision    97.72   95.14      41.73   89.07      36.26   86.14
                                               Recall       95.02   98.17      81.04   69.26      12.09   86.14
                                               F1 score     96.35   96.63      55.09   77.93      18.13   86.14
             ResNet50       91.86   90.40      Precision    97.83   93.12      49.72   81.19      48.42   78.15
                                               Recall       97.02   97.16      82.57   54.08      16.85   79.27
                                               F1 score     97.42   95.10      62.07   64.92      25.02   78.71
             InceptionV3    87.75   88.29      Precision    97.40   91.10      38.03   81.09      22.41   76.13
                                               Recall       93.01   98.17      79.20   56.11      10.26   56.11
                                               F1 score     95.15   94.50      51.39   66.33      14.07   64.60
Corn weed    VGG16          86.02   93.35      Precision    85.65   98.17      83.33   91.10      88.60   87.15
                                               Recall       95.03   93.12      60.01   96.15      94.07   90.09
                                               F1 score     90.08   95.58      69.77   93.56      91.25   88.60
             ResNet50       89.41   92.40      Precision    89.36   97.16      90.54   90.09      88.73   85.13
                                               Recall       98.01   90.09      67.03   92.11      94.66   87.15
                                               F1 score     93.48   93.49      77.01   91.09      91.59   86.13
             InceptionV3    84.41   88.29      Precision    84.75   94.01      79.01   86.95      87.42   92.25
                                               Recall       93.13   93.11      58.04   92.89      93.41   84.13
                                               F1 score     88.74   93.56      66.89   89.82      90.31   88.00

since the sedge weeds in the Weed25 dataset were similar to those in the Corn weed dataset. The F1 score for sedge weeds using the ResNet50 model was 91.59%, whereas our proposed technique obtained 88.60%. However, the CNN models achieved very low accuracy while

classifying grasses in the dataset. The ResNet50 model achieved the highest F1 score of 77.01% for grass weeds, much lower than our proposed approach (93.56% using the VGG16 model). The performance of the CNN models suggests that they can classify weeds accurately by morphology only after further training on the new dataset.

6.4 Limitation and future works

This study showed a different direction in detecting and classifying weeds in the field.
Here, we have classified the weeds based on their morphology without retraining the
models with a new dataset and irrespective of the species of weeds. However, there are
a few limitations of this study. First, the dataset used to train the model had fewer samples of grass and sedge weeds; the performance may improve if more training samples were available for these classes. Secondly, we used state-of-the-art models as feature extractors; custom architectures or other state-of-the-art models should be evaluated to identify the most efficient model for this purpose. Finally, we trained the model using the contrastive loss with the negative Euclidean distance and cosine similarity functions; other loss functions available for Siamese networks should also be explored.

In the future, we will add more data belonging to grass and sedge weeds to train
the model and improve efficiency. Moreover, we will explore the efficacy of other deep
learning models to extract features from the images. Besides, the models' parameters will be optimised using other loss functions, such as the triplet loss, in future work.

6.5 Conclusion

In this study, we have proposed a technique to detect and classify weeds from im-
ages based on their morphology. We have broadly categorised weeds into three classes:
broadleaf, grass and sedge. In our proposed approach, first, we trained the YOLOv7 and
YOLOv8 models to detect plants from an image. The YOLOv7 model detected plants
more efficiently. We then trained the Siamese network to predict the similarity score
for the detected plants. The models were trained using the Weed25 dataset only. We
then applied the trained model on two unseen datasets (Cotton weed and Corn weed


dataset) for plant detection and predicting the similarity score. Our goal was to classify
weeds according to their morphology and not based on their species. Therefore, the weed
species in the Weed25 dataset were grouped into three classes, as mentioned earlier, and
the Siamese network predicted the similarity score for the groups. The YOLOv7 model
obtained the mAP of 91.03%, 84.65% and 81.16% for Weed25, Cotton weed and Corn
weed datasets, respectively. We used negative Euclidean distance and cosine similarity
function for training the Siamese network and observed that the models achieved better
accuracy using the cosine similarity function. Moreover, among the three deep learn-
ing models as feature extractors, the VGG16 model showed better performance. The
VGG16 model achieved the highest classification accuracy of 97.59%, 93.67% and 93.35%
on Weed25, Cotton weed and Corn weed datasets, respectively. Besides, the model clas-
sified broadleaf weed more accurately than grass and sedge since most weed species in
the Weed25 dataset were broadleaf weeds. In addition, we compared the performance
of the state-of-the-art CNN models with our proposed technique. The results showed
significant improvement in classification accuracy using the proposed Siamese network
based approach.

The primary objective of this study is to propose a site-independent weed detection and classification approach in which it is not necessary to retrain the model for each crop or geographic location. A pre-trained model can detect plants at any new agricultural site and classify the weed plants according to their morphology. The approach can support any weed management strategy, e.g., applying herbicides and mapping weed density.
The technique can also be implemented on automatic herbicide spraying systems.

Chapter 7

Conclusion

In conclusion, this thesis has investigated deep learning-based weed detection to ad-
dress the challenges and has explored the potential of automating agricultural practices.
The journey through the chapters has provided valuable insights into various aspects of
weed detection, classification, and overall precision agriculture. Here, we summarise the
key findings and contributions of this research.

7.1 Contributions

This thesis has presented several innovative contributions that enhance the real-time performance and accuracy of weed detection, species recognition and category recognition, thereby contributing significantly to the effectiveness of automatic weed control systems. The contributions of the thesis are summarised as follows.

7.1.1 Comprehensive literature review

In Chapter 2, a comprehensive survey is presented, encompassing deep learning-based


research focused on detecting and classifying weed species in value crops. A thorough
analysis of 70 relevant papers explores aspects such as data acquisition, dataset prepa-
ration, detection and classification methods, and model evaluation processes. Emphasis
is placed on highlighting publicly available datasets in the related field for the benefit of
prospective researchers. The chapter introduces a taxonomy of research studies in this

domain, summarising various approaches to weed detection. The findings indicate a dom-
inant use of supervised learning techniques, particularly leveraging state-of-the-art deep
learning models. These studies demonstrate enhanced performance and classification
accuracy by fine-tuning pre-trained models on diverse plant datasets. While achieving
remarkable accuracy, it is observed that the experiments excel primarily under specific
conditions, such as on small datasets involving a limited number of crops and weed
species. The computational speed in the recognition process emerges as a limiting factor,
especially concerning real-time applications on fast-moving herbicide spraying vehicles.

7.1.2 Weed classification pipeline and evaluation of deep learning

In Chapter 3, the study utilised four crop weed datasets with 20 different crop and
weed species from various geographical locations. Five state-of-the-art CNN models were
employed for image classification, focusing on comparing transfer learning and fine-tuning
approaches. Fine-tuning proved more effective in achieving accurate image classification.
Combining datasets introduced complexity, leading to a performance decrease due to
specific weed species. Data augmentation addressed class imbalance issues and improved
model performance, particularly in distinguishing challenging weed species. ResNet-50
emerged as the most accurate model. The study emphasised the role of transfer learning
in mitigating the need for extensive datasets when training models from scratch, as pre-
trained models capture generalised features that can be fine-tuned for specific tasks,
enhancing classification accuracy. The research also addresses the need for large-scale
benchmark weed datasets.

7.1.3 Weed detection and classification

In Chapter 4, a publicly available dataset with corn and associated weeds was repur-
posed, with object-level labelling through bounding boxes for object detection. YOLOv7
exhibited the best mean average precision (mAP). YOLOv7, YOLOv7-tiny, and YOLOv8x
demonstrated promising accuracy and inference times for real-time weed detection, out-
performing the Faster-RCNN model. The study highlights the importance of optimis-
ing inference time and improving detection accuracy through further research. More-


over, the study emphasises the potential for enhancing model performance by training
with a large and balanced dataset. Data augmentation techniques addressed class im-
balances and improved weed detection accuracy. Overall, the outcomes suggest that
YOLOv7 and YOLOv8 models are effective in detecting corn and associated weeds, of-
fering prospects for developing selective sprayers or automatic weed control systems. The
proposed method allows real-time localisation and classification of weeds in images or
video frames.

7.1.4 Enhancing classification accuracy

In Chapter 5, a patch-based weed classification approach is proposed to enhance


the accuracy of crop and weed species classification from images, addressing challenges
such as data scarcity, inter-class similarity, intra-class dissimilarity, imbalanced datasets,
and variable image quality. The technique, particularly leveraging the DenseNet model,
demonstrated robust performance. The proposed approach effectively handled various
challenges without relying on additional image processing techniques. Our proposed
technique holds potential applications, including automatic data annotation, where the
model trained on a subset of the dataset could annotate the remaining images. This
technique contributes to achieving improved classification accuracy. Notably, the patch-
based pipeline can seamlessly integrate with existing artificial intelligence and deep learn-
ing algorithms used in weed recognition technologies, offering higher accuracy in weed
detection, localisation, and recognition. The utility of this approach extends to develop-
ing targeted weed management strategies, aiming to minimise costs and environmental
impact and enhance crop yields.

7.1.5 Generalised weed recognition technique

In Chapter 6, we proposed a novel approach for weed detection and classification in


agricultural fields, utilising object detection and meta-learning techniques to transcend
the limitations of varying field contexts. Instead of classifying weeds by species, the study categorises them into morphological families aligned with farming practices. The
approach involves employing a YOLO model for plant detection and a Siamese network,


using state-of-the-art deep learning models as its backbone for weed classification. The
study repurposes and uses three publicly available datasets and groups the weeds into
three classes based on their morphology: broadleaf, grass, and sedge. The YOLOv7
model achieved the best result as a plant detector, and the VGG16 model was the feature
extractor for the Siamese network. The models were trained on one dataset and applied to
others without further training. The study also observed that the classification accuracy
of the Siamese network was improved using the cosine similarity function for calculating
contrastive loss.

7.2 Future work

In weed recognition using deep learning, potential avenues for future work abound.
Future studies may focus on advancing models to recognise and classify multiple weed
species concurrently, contributing to a more comprehensive weed management system.
Emphasis should be placed on optimising models for real-time applications to ensure
prompt responses in agricultural settings. Additionally, efforts should be directed to-
wards enhancing model adaptability to varied environmental conditions, such as different
lighting and weather conditions.

7.2.1 Benchmark dataset

A benchmark dataset is integral to the development and evaluation of deep learning


models for weed detection and classification. It serves as a standardised set of labelled
images for training, validating, and comparing models, ensuring consistency in perfor-
mance assessment. The diversity within a benchmark dataset, including samples from
different locations, lighting conditions, and weed species, allows for testing the generalisa-
tion capabilities of models. Cross-comparing different models using a shared benchmark
dataset enables the identification of state-of-the-art approaches and encourages healthy
competition within the research community.

Efforts should be made to expand benchmark datasets, incorporating more diverse


samples from different regions, varied lighting conditions, and additional weed species.
This expansion enhances the dataset's representativeness and reliability. Additionally,


adopting transfer learning offers a viable solution to circumvent the need for large datasets when training deep learning models from scratch. Models pre-trained on large datasets such as ImageNet can capture detailed, generalised features from visual data.
However, as ImageNet lacks specific categorical labelling for weeds or crops, fine-tuning
the pre-trained weights using datasets specific to crops and weeds becomes crucial. This
fine-tuning process enables the model to capture dataset-specific or task-specific features,
thereby improving overall classification accuracy.

Furthermore, we understand that labelling images for training deep learning networks
can be costly and time-consuming. To tackle this challenge and make weed detection
models more scalable, looking into weakly-supervised or self-supervised deep learning
methods would be helpful. Weakly-supervised learning uses less precise annotations or
partial labels during training. This method lets models learn from a broader range of data
without manual labelling. We can lessen the need for manual annotation by trying out
weakly-supervised approaches. Similarly, self-supervised learning methods allow models
to learn from unlabelled data. By using self-supervised techniques, we can reduce the
need for extensive manual labelling. Adding weakly-supervised or self-supervised learning
to our method could make the process more efficient.

7.2.2 Deep learning in weed detection and classification

In Chapter 3, we demonstrated the efficacy of data augmentation in addressing the


class imbalance problem and introducing greater diversity to the dataset. The variations
in training images contribute to enhanced accuracy in deep learning models. Another
effective strategy for mitigating class imbalance issues involves leveraging Generative
Adversarial Networks (GANs) for image sample generation, as proposed by Goodfellow
et al. (2014). Future research endeavours should prioritise investigating the impact of
GAN-based approaches on addressing class imbalances.

In Chapter 4, we highlighted that the dataset images were acquired under diverse
lighting conditions, and the presence of occluded plants significantly affected classifica-
tion accuracy. Future research efforts should consider overcoming this limitation and
enhancing overall model performance. Moreover, our future work will specifically con-
centrate on elevating the performance of the two-stage object detector, aiming to improve


accuracy and reduce inference time. While the approach demonstrated in this chapter
enables real-time detection and classification of weeds in image or video frames, on-field
trials are essential to test and validate the proposed techniques thoroughly.

The method presented in Chapter 5 to enhance classification accuracy has certain


limitations we plan to address in future work. The proposed technique was evaluated
using publicly available datasets and has not undergone field trials, which we aim to
conduct in upcoming research. While the patch selection process adds computational
time, this is compensated by using only the selected patches (a subset of image patches)
during training and testing, resulting in superior recognition accuracy. To alleviate the
time overhead of patch selection, our future investigations will explore integrated patch
selection and learning, leveraging mutual information between patches. This approach
is expected to require fewer patches for training while enhancing the model’s learning.
Furthermore, we intend to assess the proposed approach in diverse contexts, extending
its applicability beyond weed recognition, such as on large image datasets with noise and
perturbations.

In Chapter 6, our study presented a unique approach to weed detection and classifica-
tion by morphologically categorising weeds without retraining models or considering weed
species. However, certain limitations should be addressed in future works. Firstly, the
dataset used for model training had limited samples for grass and sedge weeds, potentially
leading to enhanced performance with a larger training set. Additionally, the decision
to employ a state-of-the-art model as the feature extractor may benefit from evaluating
custom or alternative models for efficiency. Lastly, our model training utilised contrastive
loss with negative Euclidean distance and cosine similarity function for Siamese networks;
exploring other loss functions, such as triplet loss, is an avenue for future investigation.
To address these limitations, our future work will augment the dataset with more grass
and sedge weeds samples, evaluate alternative deep learning models for feature extraction,
and optimise model parameters using different loss functions like triplet loss.

7.2.3 Field trial of the proposed models

Our future goal is to conduct field trials for the proposed weed detection models. The
aim is to validate their real-world applicability and effectiveness across diverse agricultural


settings. To accomplish this, we plan to deploy the models on a spray rig.

For image acquisition, we intend to utilise a combination of sensors, including high-


resolution DSLR cameras and, where feasible, cell phone cameras. These sensors will
enable us to capture images with sufficient detail and accuracy for weed detection across
different crops and growth stages. The in-field image collection will encompass various
agricultural environments, covering various crops and growth stages representative of
typical farming practices. We aim to collect imagery from diverse crop types at different
growth stages ranging from seedling emergence to maturity.

Our field trials will be conducted under varying environmental conditions, including
different soil types, lighting conditions, and weather patterns. This comprehensive ap-
proach will ensure that the proposed weed detection models are robust and adaptable to
the complexities of real-world agricultural operations.

Moreover, we will actively engage with farmers and agricultural professionals to gather
feedback on the usability and practicality of the models in field settings. This feedback
will be instrumental in refining the models and optimising their performance to meet the
specific needs and challenges faced by farmers.

To facilitate the development and testing of our prototype system, hardware con-
siderations will be integral to the process. The prototype system will be optimised for
deployment on various computing platforms, including resource-constrained devices com-
monly found in agricultural environments. This optimisation will enhance the accessibil-
ity and scalability of our solution, ensuring its widespread adoption and impact within
the agricultural community.

Bibliography

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S.,
Irving, G., Isard, M., et al. (2016). Tensorflow: A system for large-scale machine
learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 265–283.
Abdalla, A., Cen, H., Wan, L., Rashid, R., Weng, H., Zhou, W., & He, Y. (2019). Fine-
tuning convolutional neural network with transfer learning for semantic segmen-
tation of ground-level oilseed rape images in a field with high weed pressure.
Computers and Electronics in Agriculture, 167, 105091.
Abuhani, D. A., Hussain, M. H., Khan, J., ElMohandes, M., & Zualkernan, I. (2023). Crop
and weed detection in sunflower and sugarbeet fields using single shot detectors.
2023 IEEE International Conference on Omni-layer Intelligent Systems (COINS),
1–5. https://doi.org/10.1109/COINS57856.2023.10189257
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Süsstrunk, S. (2012). Slic
superpixels compared to state-of-the-art superpixel methods. IEEE transactions
on pattern analysis and machine intelligence, 34 (11), 2274–2282.
Adhikari, S. P., Yang, H., & Kim, H. (2019). Learning semantic graphics using convolu-
tional encoder-decoder network for autonomous weeding in paddy field. Frontiers
in plant science, 10, 1404.
Aggarwal, C. C., et al. (2018). Neural networks and deep learning. Springer, 10 (978), 3.
Ahmad, A., Saraswat, D., Aggarwal, V., Etienne, A., & Hancock, B. (2021). Perfor-
mance of deep learning models for classifying and detecting common weeds in
corn and soybean production systems. Computers and Electronics in Agriculture,
184, 106081.


Ahmad, J., Muhammad, K., Ahmad, I., Ahmad, W., Smith, M. L., Smith, L. N., Jain,
D. K., Wang, H., & Mehmood, I. (2018). Visual features based boosted classi-
fication of weeds for real-time selective herbicide sprayer systems. Computers in
Industry, 98, 23–33.
Alam, M., Alam, M. S., Roman, M., Tufail, M., Khan, M. U., & Khan, M. T. (2020). Real-
time machine-learning based crop/weed detection and classification for variable-
rate spraying in precision agriculture. 2020 7th International Conference on Elec-
trical and Electronics Engineering (ICEEE), 273–280.
Ali-Gombe, A., & Elyan, E. (2019). Mfc-gan: Class-imbalanced dataset classification using
multiple fake class generative adversarial network. Neurocomputing, 361, 212–221.
Al-Masni, M. A., Kim, D.-H., & Kim, T.-S. (2020). Multiple skin lesions diagnostics
via integrated deep convolutional networks for segmentation and classification.
Computer methods and programs in biomedicine, 190, 105351.
Alom, M. Z., Taha, T. M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M. S., Hasan,
M., Van Essen, B. C., Awwal, A. A., & Asari, V. K. (2019). A state-of-the-art
survey on deep learning theory and architectures. Electronics, 8 (3), 292.
Altaheri, H., Alsulaiman, M., & Muhammad, G. (2019). Date fruit classification for
robotic harvesting in a natural environment using deep learning. IEEE Access, 7,
117115–117133. https://doi.org/10.1109/ACCESS.2019.2936536
Amend, S., Brandt, D., Di Marco, D., Dipper, T., Gässler, G., Höferlin, M., Gohlke, M.,
Kesenheimer, K., Lindner, P., Leidenfrost, R., et al. (2019). Weed management of
the future. KI-Künstliche Intelligenz, 33 (4), 411–415.
Amer, A. A., & Abdalla, H. I. (2020). A set theory based similarity measure for text
clustering and classification. Journal of Big Data, 7, 1–43. https://doi.org/10.1186/s40537-020-00344-3
Amrani, A., Sohel, F., Diepeveen, D., Murray, D., & Jones, M. G. (2023a). Deep learning-
based detection of aphid colonies on plants from a reconstructed brassica image
dataset. Computers and Electronics in Agriculture, 205, 107587. https://doi.org/10.1016/j.compag.2022.107587
Amrani, A., Sohel, F., Diepeveen, D., Murray, D., & Jones, M. G. (2023b). Insect detec-
tion from imagery using yolov3-based adaptive feature fusion convolution network.
Crop and Pasture Science. https://doi.org/10.1071/CP21710


Andrea, C.-C., Daniel, B. B. M., & Misael, J. B. J. (2017). Precise weed and maize clas-
sification through convolutional neuronal networks. 2017 IEEE Second Ecuador
Technical Chapters Meeting (ETCM), 1–6.
Andreini, P., Bonechi, S., Bianchini, M., Mecocci, A., & Scarselli, F. (2020). Image gen-
eration by gan and style transfer for agar plate image segmentation. Computer
Methods and Programs in Biomedicine, 184, 105268.
Asad, M. H., & Bais, A. (2019). Weed detection in canola fields using maximum likelihood
classification and deep convolutional neural network. Information Processing in
Agriculture. https://doi.org/10.1016/j.inpa.2019.12.002
Asad, M. H., & Bais, A. (2020). Weed detection in canola fields using maximum likelihood
classification and deep convolutional neural network. Information Processing in
Agriculture, 7 (4), 535–545. https://doi.org/10.1016/j.inpa.2019.12.002
Attri, I., Awasthi, L. K., Sharma, T. P., & Rathee, P. (2023). A review of deep learning
techniques used in agriculture. Ecological Informatics, 102217. https://doi.org/10.1016/j.ecoinf.2023.102217
Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional
encoder-decoder architecture for image segmentation. IEEE transactions on pat-
tern analysis and machine intelligence, 39 (12), 2481–2495.
Bah, M. D., Hafiane, A., & Canals, R. (2018). Deep learning with unsupervised data
labeling for weed detection in line crops in uav images. Remote sensing, 10 (11),
1690.
Bakhshipour, A., & Jafari, A. (2018). Evaluation of support vector machine and artificial
neural networks in weed detection using shape features. Computers and Electronics
in Agriculture, 145, 153–160.
Bakhshipour, A., Jafari, A., Nassiri, S. M., & Zare, D. (2017). Weed segmentation using
texture features extracted from wavelet sub-images. Biosystems Engineering, 157,
1–12.
Banan, A., Nasiri, A., & Taheri-Garavand, A. (2020). Deep learning-based appearance
features extraction for automated carp species identification. Aquacultural Engi-
neering, 89, 102053.


Bansal, R., Raj, G., & Choudhury, T. (2016). Blur image detection using laplacian opera-
tor and open-cv. 2016 International Conference System Modeling & Advancement
in Research Trends (SMART), 63–67. https://doi.org/10.1109/SYSMART.2016.7894491
Barbedo, J. G. A. (2018). Impact of dataset size and variety on the effectiveness of
deep learning and transfer learning for plant disease classification. Computers and
electronics in agriculture, 153, 46–53. https://doi.org/10.1016/j.compag.2018.08.013
Barlow, H. B. (1989). Unsupervised learning. Neural computation, 1 (3), 295–311.
Barnes, E., Morgan, G., Hake, K., Devine, J., Kurtz, R., Ibendahl, G., Sharda, A., Rains,
G., Snider, J., Maja, J. M., et al. (2021). Opportunities for robotic systems and
automation in cotton production. AgriEngineering, 3 (2), 339–362. https://doi.org/10.3390/agriengineering3020023
Bawden, O., Kulk, J., Russell, R., McCool, C., English, A., Dayoub, F., Lehnert, C., &
Perez, T. (2017). Robot for weed species plant-specific management. Journal of
Field Robotics, 34 (6), 1179–1199. https://doi.org/10.1002/rob.21727
Bi, J., & Zhang, C. (2018). An empirical comparison on state-of-the-art multi-class
imbalance learning algorithms and a new diversified ensemble learning scheme.
Knowledge-Based Systems, 158, 81–93.
Binguitcha-Fare, A.-A., & Sharma, P. (2019). Crops and weeds classification using convo-
lutional neural networks via optimization of transfer learning parameters. Inter-
national Journal of Engineering and Advanced Technology (IJEAT), 8 (5), 2284–
2294.
Bini, D., Pamela, D., & Prince, S. (2020). Machine vision and machine learning for intel-
ligent agrobots: A review. 2020 5th International Conference on Devices, Circuits
and Systems (ICDCS), 12–16.
Bochkovskiy, A., Wang, C.-Y., & Liao, H.-Y. M. (2020). Yolov4: Optimal speed and
accuracy of object detection. arXiv preprint arXiv:2004.10934. https://doi.org/
10.48550/arXiv.2004.10934
Bosilj, P., Aptoula, E., Duckett, T., & Cielniak, G. (2020). Transfer learning between crop
types for semantic segmentation of crops versus weeds in precision agriculture.
Journal of Field Robotics, 37 (1), 7–19.
Brasseur, E. (n.d.). http://www.ericbrasseur.org/gamma.html?i=1
Brawn, P. T., & Snowden, R. J. (2000). Attention to overlapping objects: Detection and
discrimination of luminance changes. Journal of Experimental Psychology: Human
Perception and Performance, 26 (1), 342. https://psycnet.apa.org/doi/10.1037/0096-1523.26.1.342
Brilhador, A., Gutoski, M., Hattori, L. T., de Souza Inácio, A., Lazzaretti, A. E., & Lopes,
H. S. (2019). Classification of weeds and crops at the pixel-level using convolutional
neural networks and data augmentation. 2019 IEEE Latin American Conference
on Computational Intelligence (LA-CCI), 1–6.
Brown, R. B., & Noble, S. D. (2005). Site-specific weed management: Sensing require-
ments—what do we need to see? Weed Science, 53 (2), 252–258.
Buslaev, A., Iglovikov, V. I., Khvedchenya, E., Parinov, A., Druzhinin, M., & Kalinin,
A. A. (2020). Albumentations: Fast and flexible image augmentations. Informa-
tion, 11 (2), 125. https://doi.org/10.3390/info11020125
Cacheux, Y. L., Borgne, H. L., & Crucianu, M. (2019). Modeling inter and intra-class
relations in the triplet loss for zero-shot learning. Proceedings of the IEEE/CVF
International Conference on Computer Vision, 10333–10342. https://doi.org/10.
1109/ICCV.2019.01043
Cai, Z., & Vasconcelos, N. (2018). Cascade r-cnn: Delving into high quality object detec-
tion. Proceedings of the IEEE conference on computer vision and pattern recogni-
tion, 6154–6162.
Canavari, M., Castellini, A., & Spadoni, R. (2010). Challenges in marketing quality food
products.
Carranza-García, M., Torres-Mateo, J., Lara-Benítez, P., & García-Gutiérrez, J. (2021).
On the performance of one-stage and two-stage object detectors in autonomous
vehicles using camera data. Remote Sensing, 13 (1), 89. https://doi.org/10.3390/
rs13010089
Caruana, R., & Niculescu-Mizil, A. (2006). An empirical comparison of supervised learn-
ing algorithms. Proceedings of the 23rd international conference on Machine learn-
ing, 161–168.
César Pereira Júnior, P., Monteiro, A., Da Luz Ribeiro, R., Sobieranski, A. C., & Von
Wangenheim, A. (2020). Comparison of supervised classifiers and image features
for crop rows segmentation on aerial images. Applied Artificial Intelligence, 34 (4),
271–291.
Chaisattapagon, N. Z. C. (1995). Effective criteria for weed identification in wheat fields
using machine vision. Transactions of the ASAE, 38 (3), 965–974.
Chapelle, O., Scholkopf, B., & Zien, A. (2009). Semi-supervised learning (chapelle, o.
et al., eds.; 2006)[book reviews]. IEEE Transactions on Neural Networks, 20 (3),
542–542.
Chartrand, G., Cheng, P. M., Vorontsov, E., Drozdzal, M., Turcotte, S., Pal, C. J.,
Kadoury, S., & Tang, A. (2017). Deep learning: A primer for radiologists. Ra-
diographics, 37 (7), 2113–2131.
Chauhan, B. S. (2020). Grand challenges in weed management. https://doi.org/10.3389/
fagro.2019.00003
Chavan, T. R., & Nandedkar, A. V. (2018). Agroavnet for crops and weeds classification:
A step forward in automatic farming. Computers and Electronics in Agriculture,
154, 361–372.
Chebrolu, N., Läbe, T., & Stachniss, C. (2018). Robust long-term registration of uav
images of crop fields for precision agriculture. IEEE Robotics and Automation
Letters, 3 (4), 3097–3104.
Chebrolu, N., Lottes, P., Schaefer, A., Winterhalter, W., Burgard, W., & Stachniss, C.
(2017). Agricultural robot dataset for plant classification, localization and map-
ping on sugar beet fields. The International Journal of Robotics Research, 36 (10),
1045–1052.
Chechlinski, L., Siemikatkowska, B., & Majewski, M. (2019). A system for weeds and crops
identification—reaching over 10 fps on raspberry pi with the usage of mobilenets,
densenet and custom modifications. Sensors, 19 (17), 3787.
Chen, D., Lu, Y., Li, Z., & Young, S. (2022a). Performance evaluation of deep transfer
learning on multi-class identification of common weed species in cotton production
systems. Computers and Electronics in Agriculture, 198, 107091. https://doi.org/
10.1016/j.compag.2022.107091
Chen, D., Lu, Y., Li, Z., & Young, S. (2022b). Performance evaluation of deep transfer
learning on multi-class identification of common weed species in cotton production systems. Computers and Electronics in Agriculture, 198, 107091. https://doi.org/10.1016/j.compag.2022.107091
Chen, H., Chen, A., Xu, L., Xie, H., Qiao, H., Lin, Q., & Cai, K. (2020). A deep learning
cnn architecture applied in smart near-infrared analysis of water pollution for
agricultural irrigation resources. Agricultural Water Management, 240, 106303.
https://doi.org/https://doi.org/10.1016/j.agwat.2020.106303
Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu,
J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu, R.,
Wu, Y., . . . Lin, D. (2019). MMDetection: Open mmlab detection toolbox and
benchmark. arXiv preprint arXiv:1906.07155.
Chen, L.-C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convo-
lution for semantic image segmentation. arXiv preprint arXiv:1706.05587.
Chen, Y. S., Wang, Y. C., Kao, M. H., & Chuang, Y. Y. (2018). Deep photo enhancer:
Unpaired learning for image enhancement from photographs with gans. Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition, 6306–
6314. https://doi.org/10.1109/CVPR.2018.00660
Chicco, D. (2021). Siamese neural networks: An overview. Artificial neural networks, 73–
94. https://doi.org/10.1007/978-1-0716-0826-5_3
Choi, R. Y., Coyner, A. S., Kalpathy-Cramer, J., Chiang, M. F., & Campbell, J. P.
(2020). Introduction to machine learning, neural networks, and deep learning.
Translational vision science & technology, 9 (2), 14–14.
Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. Pro-
ceedings of the IEEE conference on computer vision and pattern recognition, 1251–
1258. https://doi.org/10.1109/CVPR.2017.195
Chollet, F., et al. (2015). Keras. https://github.com/fchollet/keras
Chowdhury, M. E., Rahman, T., Khandakar, A., Ayari, M. A., Khan, A. U., Khan, M. S.,
Al-Emadi, N., Reaz, M. B. I., Islam, M. T., & Ali, S. H. M. (2021). Automatic
and reliable leaf disease detection using deep learning techniques. AgriEngineering,
3 (2), 294–312. https://doi.org/10.3390/agriengineering3020020
Czymmek, V., Harders, L. O., Knoll, F. J., & Hussmann, S. (2019). Vision-based deep
learning approach for real-time detection of weeds in organic farming. 2019 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 1–5.
da Costa Lima, A., & Mendes, K. F. (2020). Variable rate application of herbicides
for weed management in pre-and postemergence. Pests, Weeds and Diseases in
Agricultural Crop and Animal Husbandry Production, 179.
Dang, F., Chen, D., Lu, Y., & Li, Z. (2023). Yoloweeds: A novel benchmark of yolo object
detectors for multi-class weed detection in cotton production systems. Computers
and Electronics in Agriculture, 205, 107655. https://doi.org/10.1016/j.compag.
2023.107655
Dargan, S., Kumar, M., Ayyagari, M. R., & Kumar, G. (2019). A survey of deep learning
and its applications: A new paradigm to machine learning. Archives of Computa-
tional Methods in Engineering, 1–22.
Dargan, S., Kumar, M., Ayyagari, M. R., & Kumar, G. (2020). A survey of deep learning
and its applications: A new paradigm to machine learning. Archives of Computa-
tional Methods in Engineering, 27, 1071–1092.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-
scale hierarchical image database. 2009 IEEE conference on computer vision and
pattern recognition, 248–255.
Deutsch, C. A., Tewksbury, J. J., Tigchelaar, M., Battisti, D. S., Merrill, S. C., Huey,
R. B., & Naylor, R. L. (2018). Increase in crop losses to insect pests in a warming
climate. Science, 361 (6405), 916–919. https://doi.org/10.1126/science.aat3466
Devi, N., Sarma, K. K., & Laskar, S. (2023). Design of an intelligent bean cultivation
approach using computer vision, iot and spatio-temporal deep learning structures.
Ecological Informatics, 75, 102044. https://doi.org/10.1016/j.ecoinf.2023.102044
Di Cicco, M., Potena, C., Grisetti, G., & Pretto, A. (2017). Automatic model based
dataset generation for fast and accurate crop and weeds detection. 2017 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS), 5188–5195.
Diwan, T., Anirudh, G., & Tembhurne, J. V. (2023). Object detection using yolo: Chal-
lenges, architectural successors, datasets and applications. Multimedia Tools and Applications, 82 (6), 9243–9275. https://doi.org/10.1007/s11042-022-13644-y
Dodge, S., & Karam, L. (2016). Understanding how image quality affects deep neural
networks. 2016 eighth international conference on quality of multimedia experience
(QoMEX), 1–6. https://doi.org/10.1109/QoMEX.2016.7498955
Dodge, S., & Karam, L. (2017). A study and comparison of human and deep learning
recognition performance under visual distortions. 2017 26th international confer-
ence on computer communication and networks (ICCCN), 1–7. https://doi.org/
10.1109/ICCCN.2017.8038465
DOĞAN, M. N., Ünay, A., Boz, Ö., & Albay, F. (2004). Determination of optimum weed
control timing in maize (zea mays l.) Turkish Journal of Agriculture and Forestry,
28 (5), 349–354.
Dong, Q., Gong, S., & Zhu, X. (2018). Imbalanced deep learning by minority class incre-
mental rectification. IEEE transactions on pattern analysis and machine intelli-
gence, 41 (6), 1367–1381. https://doi.org/10.1109/TPAMI.2018.2832629
Dong, S., Wang, P., & Abbas, K. (2021). A survey on deep learning and its applications.
Computer Science Review, 40, 100379.
dos Santos Ferreira, A., Freitas, D. M., da Silva, G. G., Pistori, H., & Folhes, M. T. (2017).
Weed detection in soybean crops using convnets. Computers and Electronics in
Agriculture, 143, 314–324.
dos Santos Ferreira, A., Freitas, D. M., da Silva, G. G., Pistori, H., & Folhes, M. T.
(2019). Unsupervised deep learning and semi-automatic data labeling in weed
discrimination. Computers and Electronics in Agriculture, 165, 104963.
DPIRD. (2021, June). Herbicide application: Page 3 of 5. https://www.agric.wa.gov.au/
grains/herbicide-application?page=0%2C2
Druzhkov, P. N., & Kustikova, V. D. (2016). A survey of deep learning methods and
software tools for image classification and object detection. Pattern Recognition
and Image Analysis, 26, 9–15.
Du, J. (2018). Understanding of object detection based on cnn family and yolo. Journal
of Physics: Conference Series, 1004, 012029. https://doi.org/10.1088/1742-6596/1004/1/012029
Duke, S. O. (2015). Perspectives on transgenic, herbicide-resistant crops in the united
states almost 20 years after introduction. Pest management science, 71 (5), 652–
657. https://doi.org/10.1002/ps.3863
Durand, T., Mordan, T., Thome, N., & Cord, M. (2017). Wildcat: Weakly supervised
learning of deep convnets for image classification, pointwise localization and seg-
mentation. Proceedings of the IEEE conference on computer vision and pattern
recognition, 642–651.
Dyrmann, M., Jørgensen, R. N., & Midtiby, H. S. (2017). Roboweedsupport-detection
of weed locations in leaf occluded cereal crops using a fully convolutional neural
network. Adv. Anim. Biosci, 8 (2), 842–847.
Dyrmann, M., Karstoft, H., & Midtiby, H. S. (2016). Plant species classification using
deep convolutional neural network. Biosystems Engineering, 151, 72–80.
Ehrlich, M., & Davis, L. S. (2019). Deep residual learning in the jpeg transform domain.
Proceedings of the IEEE International Conference on Computer Vision, 3484–
3493.
Eli-Chukwu, N. C. (2019). Applications of artificial intelligence in agriculture: A review.
Engineering, Technology & Applied Science Research, 9 (4).
Espejo-Garcia, B., Mylonas, N., Athanasakos, L., Fountas, S., & Vasilakoglou, I. (2020).
Towards weeds identification assistance through transfer learning. Computers and
Electronics in Agriculture, 171, 105306.
Espinoza, M. A. M., Le, C. Z., Raheja, A., & Bhandari, S. (2020). Weed identification
and removal using machine learning techniques and unmanned ground vehicles.
Autonomous Air and Ground Sensing Systems for Agricultural Optimization and
Phenotyping V, 11414, 114140J. https://doi.org/10.1117/12.2557625
Farooq, A., Hu, J., & Jia, X. (2018a). Analysis of spectral bands and spatial resolutions
for weed classification via deep convolutional neural network. IEEE Geoscience
and Remote Sensing Letters, 16 (2), 183–187.
Farooq, A., Hu, J., & Jia, X. (2018b). Weed classification in hyperspectral remote sensing
images via deep convolutional neural network. IGARSS 2018-2018 IEEE Interna-
tional Geoscience and Remote Sensing Symposium, 3816–3819.
Farooq, A., Jia, X., Hu, J., & Zhou, J. (2019). Multi-resolution weed classification via con-
volutional neural network and superpixel based local binary pattern using remote
sensing images. Remote Sensing, 11 (14), 1692.
Fawakherji, M., Youssef, A., Bloisi, D., Pretto, A., & Nardi, D. (2019). Crop and weeds
classification for precision agriculture using context-independent pixel-wise segmentation. 2019 Third IEEE International Conference on Robotic Computing (IRC), 146–152.
Ferentinos, K. P. (2018). Deep learning models for plant disease detection and diagnosis.
Computers and electronics in agriculture, 145, 311–318. https://doi.org/10.1016/
j.compag.2018.01.009
Fernández-Quintanilla, C., Peña, J., Andújar, D., Dorado, J., Ribeiro, A., & López-
Granados, F. (2018). Is the current state of the art of weed monitoring suitable
for site-specific weed management in arable crops? Weed research, 58 (4), 259–272.
Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adapta-
tion of deep networks. International conference on machine learning, 1126–1135.
Gabor, A., Leach, R., & Dowla, F. (1996). Automated seizure detection using a self-
organizing neural network. Electroencephalography and clinical Neurophysiology,
99 (3), 257–266.
Gallo, I., Rehman, A. U., Dehkordi, R. H., Landro, N., La Grassa, R., & Boschetti, M.
(2023). Deep object detection of crop weeds: Performance of yolov7 on a real case
dataset from uav images. Remote Sensing, 15 (2), 539. https://doi.org/10.3390/
rs15020539
Gando, G., Yamada, T., Sato, H., Oyama, S., & Kurihara, M. (2016). Fine-tuning deep
convolutional neural networks for distinguishing illustrations from photographs.
Expert Systems with Applications, 66, 295–301.
Gao, J., French, A. P., Pound, M. P., He, Y., Pridmore, T. P., & Pieters, J. G. (2020).
Deep convolutional neural networks for image-based convolvulus sepium detection
in sugar beet fields. Plant Methods, 16 (1), 1–12.
Gao, J., Nuyttens, D., Lootens, P., He, Y., & Pieters, J. G. (2018). Recognising weeds
in a maize crop using a random forest machine-learning algorithm and near-
infrared snapshot mosaic hyperspectral imagery. Biosystems Engineering, 170, 39–
50. https://doi.org/10.1016/j.biosystemseng.2018.03.006
Ge, Z., Liu, S., Wang, F., Li, Z., & Sun, J. (2021). Yolox: Exceeding yolo series in 2021.
arXiv preprint arXiv:2107.08430. https://doi.org/10.48550/arXiv.2107.08430
Geiger, A., Lenz, P., Stiller, C., & Urtasun, R. (2013). Vision meets robotics: The kitti
dataset. The International Journal of Robotics Research, 32 (11), 1231–1237.
Gharde, Y., Singh, P., Dubey, R., & Gupta, P. (2018). Assessment of yield and economic
losses in agriculture due to weeds in india. Crop Protection, 107, 12–18. https://doi.org/10.1016/j.cropro.2018.01.007
Gill, S. S., Xu, M., Ottaviani, C., Patros, P., Bahsoon, R., Shaghaghi, A., Golec, M.,
Stankovski, V., Wu, H., Abraham, A., et al. (2022). Ai for next generation com-
puting: Emerging trends and future directions. Internet of Things, 19, 100514.
https://doi.org/10.1016/j.iot.2022.100514
Girshick, R. (2015). Fast r-cnn. Proceedings of the IEEE international conference on
computer vision, 1440–1448.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for
accurate object detection and semantic segmentation. Proceedings of the IEEE
conference on computer vision and pattern recognition, 580–587.
Giselsson, T. M., Jørgensen, R. N., Jensen, P. K., Dyrmann, M., & Midtiby, H. S. (2017).
A public image database for benchmark of plant seedling classification algorithms.
arXiv preprint arXiv:1711.05458.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S.,
Courville, A., & Bengio, Y. (2014). Generative adversarial networks. arXiv preprint
arXiv:1406.2661.
Grisso, R. D., Alley, M. M., Thomason, W. E., Holshouser, D. L., & Roberson, G. T.
(2011). Precision farming tools: Variable-rate application.
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G.,
Cai, J., et al. (2018). Recent advances in convolutional neural networks. Pattern
Recognition, 77, 354–377.
Guo, Q., Juefei-Xu, F., Xie, X., Ma, L., Wang, J., Yu, B., Feng, W., & Liu, Y. (2020).
Watch out! motion is blurring the vision of your deep neural networks. Advances
in Neural Information Processing Systems, 33, 975–985.
Guo, Y., Shi, H., Kumar, A., Grauman, K., Rosing, T., & Feris, R. (2019). Spottune:
Transfer learning through adaptive fine-tuning. Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 4805–4814.
Haggag, M., Abdelhay, S., Mecheter, A., Gowid, S., Musharavati, F., & Ghani, S. (2019).
An intelligent hybrid experimental-based deep learning algorithm for tomato-sorting controllers. IEEE Access, 7, 106890–106898. https://doi.org/10.1109/ACCESS.2019.2932730
Hall, D., Dayoub, F., Perez, T., & Mccool, C. (2018). A rapidly deployable classification
system using visual data for the application of precision weed management. Com-
puters and Electronics in Agriculture, 148, 107–120. https://doi.org/10.1016/j.
compag.2018.02.023
Hamuda, E., Glavin, M., & Jones, E. (2016). A survey of image processing techniques
for plant extraction and segmentation in the field. Computers and electronics in
agriculture, 125, 184–199. https://doi.org/10.1016/j.compag.2016.04.024
Hamuda, E., Mc Ginley, B., Glavin, M., & Jones, E. (2017). Automatic crop detection
under field conditions using the hsv colour space and morphological operations.
Computers and electronics in agriculture, 133, 97–107.
Han, J.-W., Zuo, M., Zhu, W.-Y., Zuo, J.-H., Lü, E.-L., & Yang, X.-T. (2021). A com-
prehensive review of cold chain logistics for fresh agricultural products: Current
status, challenges, and future trends. Trends in Food Science & Technology, 109,
536–551.
Han, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., Tang, Y., Xiao, A., Xu, C., Xu,
Y., et al. (2022). A survey on vision transformer. IEEE transactions on pattern
analysis and machine intelligence, 45 (1), 87–110. https://doi.org/10.1109/TPAMI.2022.3152247
Hand, D. J. (2009). Measuring classifier performance: A coherent alternative to the area
under the roc curve. Machine learning, 77 (1), 103–123.
Haque, M., & Sohel, F. (2022). Deep network with score level fusion and inference-based
transfer learning to recognize leaf blight and fruit rot diseases of eggplant. Agri-
culture, 12 (8), 1160. https://doi.org/10.3390/agriculture12081160
Harker, K. N., & O’Donovan, J. T. (2013). Recent weed control, weed management, and
integrated weed management. Weed Technology, 27 (1), 1–11.
Hasan, A. M., Diepeveen, D., Laga, H., Jones, M. G., & Sohel, F. (2023a). Image patch-
based deep learning approach for crop and weed recognition. Ecological Informatics, 78, 102361.
Hasan, A. M., Diepeveen, D., Laga, H., Jones, M. G., & Sohel, F. (2024). Object-level
benchmark for deep learning-based detection and classification of weed species. Crop Protection, 177, 106561. https://doi.org/10.1016/j.cropro.2023.106561
Hasan, A. M., Sohel, F., Diepeveen, D., Laga, H., & Jones, M. G. (2021). A survey of deep
learning techniques for weed detection from images. Computers and Electronics in
Agriculture, 184, 106067.
Hasan, A. M., Sohel, F., Diepeveen, D., Laga, H., & Jones, M. G. (2023b). Weed recogni-
tion using deep learning techniques on class-imbalanced imagery. Crop and Pasture
Science. https://doi.org/10.1071/CP21626
Haug, S., & Ostermann, J. (2014). A crop/weed field image dataset for the evaluation of
computer vision based precision agriculture tasks. European Conference on Com-
puter Vision, 105–116.
Haug, S., & Ostermann, J. (2015). A crop/weed field image dataset for the evaluation of
computer vision based precision agriculture tasks. Computer Vision - ECCV 2014
Workshops, 105–116. https://doi.org/10.1007/978-3-319-16220-1_8
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. Proceedings of the
IEEE international conference on computer vision, 2961–2969.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition.
Proceedings of the IEEE conference on computer vision and pattern recognition,
770–778.
He, Y., Zhu, C., Wang, J., Savvides, M., & Zhang, X. (2019a). Bounding box regres-
sion with uncertainty for accurate object detection. Proceedings of the ieee/cvf
conference on computer vision and pattern recognition, 2888–2897.
He, Y., Zeng, H., Fan, Y., Ji, S., & Wu, J. (2019b). Application of deep learning in
integrated pest management: A real-time system for detection and diagnosis of
oilseed rape pests. Mobile Information Systems, 2019. https://doi.org/10.1155/2019/4570808
Heap, I. (2014). Global perspective of herbicide-resistant weeds. Pest management sci-
ence, 70 (9), 1306–1315. https://doi.org/10.1002/ps.3696
Hemming, J., & Rath, T. (2002). Image processing for plant determination using the
hough transform and clustering methods. Gartenbauwissenschaft, 67 (1), 1–10.
Hendrycks, D., & Dietterich, T. (2019). Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261. https://doi.org/10.48550/arXiv.1903.12261
Hentschel, C., Wiradarma, T. P., & Sack, H. (2016). Fine tuning cnns with scarce train-
ing data—adapting imagenet to art epoch classification. 2016 IEEE International
Conference on Image Processing (ICIP), 3693–3697.
Higgins, V., Bryant, M., Howell, A., & Battersby, J. (2017). Ordering adoption: Materi-
ality, knowledge and farmer engagement with precision agriculture technologies.
Journal of Rural Studies, 55, 193–202.
Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep
belief nets. Neural computation, 18 (7), 1527–1554.
Holt, J. S. (2004). Principles of weed management in agroecosystems and wildlands1.
Weed Technology, 18 (sp1), 1559–1562.
Hosseini, M.-P., Lu, S., Kamaraj, K., Slowikowski, A., & Venkatesh, H. C. (2020). Deep
learning architectures. In Deep learning: Concepts and architectures (pp. 1–24).
Springer.
Hou, L., Samaras, D., Kurc, T. M., Gao, Y., Davis, J. E., & Saltz, J. H. (2016). Patch-
based convolutional neural network for whole slide tissue image classification. Pro-
ceedings of the IEEE conference on computer vision and pattern recognition, 2424–
2433. https://doi.org/10.1109/CVPR.2016.266
Hu, K., Coleman, G., Zeng, S., Wang, Z., & Walsh, M. (2020). Graph weeds net: A graph-
based deep learning method for weed recognition. Computers and Electronics in
Agriculture, 174, 105520.
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected
convolutional networks. Proceedings of the IEEE conference on computer vision
and pattern recognition, 4700–4708. https://doi.org/10.1109/CVPR.2017.243
Huang, H., Deng, J., Lan, Y., Yang, A., Deng, X., Wen, S., Zhang, H., & Zhang, Y.
(2018a). Accurate weed mapping and prescription map generation based on fully
convolutional networks using uav imagery. Sensors, 18 (10), 3299.
Huang, H., Deng, J., Lan, Y., Yang, A., Deng, X., & Zhang, L. (2018b). A fully convolu-
tional network for weed mapping of unmanned aerial vehicle (uav) imagery. PloS
one, 13 (4), e0196302.
Huang, H., Lan, Y., Deng, J., Yang, A., Deng, X., Zhang, L., & Wen, S. (2018c). A
semantic labeling approach for accurate weed mapping of high resolution uav
imagery. Sensors, 18 (7), 2113.
Huang, H., Lan, Y., Yang, A., Zhang, Y., Wen, S., & Deng, J. (2020). Deep learning versus
object-based image analysis (obia) in weed mapping of uav imagery. International
Journal of Remote Sensing, 41 (9), 3446–3479.
Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna,
Z., Song, Y., Guadarrama, S., et al. (2017). Speed/accuracy trade-offs for modern
convolutional object detectors. Proceedings of the IEEE conference on computer
vision and pattern recognition, 7310–7311.
Huang, S.-W., Lin, C.-T., Chen, S.-P., Wu, Y.-Y., Hsu, P.-H., & Lai, S.-H. (2018). Auggan:
Cross domain adaptation with gan-based data augmentation. Proceedings of the
European Conference on Computer Vision (ECCV), 718–731.
Huisman, M., Van Rijn, J. N., & Plaat, A. (2021). A survey of deep meta-learning.
Artificial Intelligence Review, 54 (6), 4483–4541. https://doi.org/10.1007/s10462-021-10004-4
Hussain, M. (2023). Yolo-v1 to yolo-v8, the rise of yolo and its complementary nature
toward digital manufacturing and industrial defect detection. Machines, 11 (7),
677. https://doi.org/10.3390/machines11070677
Hussain, N., Farooque, A. A., Schumann, A. W., McKenzie-Gopsill, A., Esau, T., Abbas,
F., Acharya, B., & Zaman, Q. (2020). Design and development of a smart variable
rate sprayer using deep learning. Remote Sensing, 12 (24), 4091. https://doi.org/
10.3390/rs12244091
Iqbal, N., Manalil, S., Chauhan, B. S., & Adkins, S. W. (2019). Investigation of alternate
herbicides for effective weed management in glyphosate-tolerant cotton. Archives
of Agronomy and Soil Science, 65 (13), 1885–1899.
Ishak, A. J., Mokri, S. S., Mustafa, M. M., & Hussain, A. (2007). Weed detection utilizing
quadratic polynomial and roi techniques. 2007 5th Student Conference on Research
and Development, 1–5.
Jafari, A., Mohtasebi, S. S., Jahromi, H. E., & Omid, M. (2006). Weed detection in sugar
beet fields using machine vision. Int. J. Agric. Biol, 8 (5), 602–605.
Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., & Bengio, Y. (2017). The one hundred
layers tiramisu: Fully convolutional densenets for semantic segmentation. Proceed-
ings of the IEEE conference on computer vision and pattern recognition workshops,
11–19.
Jensen, T. A., Smith, B., & Defeo, L. F. (2020a). An automated site-specific fallow weed
management system using unmanned aerial vehicles.
Jensen, T. A., Smith, B., & Defeo, L. F. (2020b). An automated site-specific fallow weed
management system using unmanned aerial vehicles.
Jiang, H., Zhang, C., Qiao, Y., Zhang, Z., Zhang, W., & Song, C. (2020). Cnn feature
based graph convolutional network for weed and crop recognition in smart farming.
Computers and Electronics in Agriculture, 174, 105450.
Jiang, Y., Li, C., Paterson, A. H., & Robertson, J. S. (2019). Deepseedling: Deep con-
volutional network and kalman filter for plant seedling detection and counting in
the field. Plant methods, 15 (1), 141.
Jin, X., Che, J., & Chen, Y. (2021). Weed identification using deep learning and image
processing in vegetable plantation. IEEE Access, 9, 10940–10950. https://doi.org/10.1109/ACCESS.2021.3050296
Jocher, G., Chaurasia, A., & Qiu, J. (2023a). Ultralytics yolov8. https://github.com/
ultralytics/ultralytics
Jocher, G., Chaurasia, A., & Qiu, J. (2023b, January). YOLO by Ultralytics (Version 8.0.0).
https://github.com/ultralytics/ultralytics
Kamilaris, A., & Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: A survey.
Computers and electronics in agriculture, 147, 70–90.
Karimi, Y., Prasher, S., Patel, R., & Kim, S. (2006). Application of support vector ma-
chine technology for weed and nitrogen stress detection in corn. Computers and
electronics in agriculture, 51 (1-2), 99–109.
Karunathilake, E., Le, A. T., Heo, S., Chung, Y. S., & Mansoor, S. (2023). The path to
smart farming: Innovations and opportunities in precision agriculture. Agriculture,
13 (8), 1593. https://doi.org/10.3390/agriculture13081593
Kassani, S. H., Kassani, P. H., Khazaeinezhad, R., Wesolowski, M. J., Schneider, K. A.,
& Deters, R. (2019). Diabetic retinopathy classification using a modified xception architecture. 2019 IEEE international symposium on signal processing and information technology (ISSPIT), 1–6.
Kaya, Ö., Çodur, M. Y., & Mustafaraj, E. (2023). Automatic detection of pedestrian
crosswalk with faster r-cnn and yolov7. Buildings, 13 (4), 1070. https://doi.org/
10.3390/buildings13041070
Kazmi, W., Garcia-Ruiz, F., Nielsen, J., Rasmussen, J., & Andersen, H. J. (2015a). Ex-
ploiting affine invariant regions and leaf edge shapes for weed detection. Computers
and Electronics in Agriculture, 118, 290–299. https://doi.org/10.1016/j.compag.
2015.08.023
Kazmi, W., Garcia-Ruiz, F. J., Nielsen, J., Rasmussen, J., & Andersen, H. J. (2015b).
Detecting creeping thistle in sugar beet fields using vegetation indices. Computers
and Electronics in Agriculture, 112, 10–19.
Keceli, A. S., Kaya, A., Catal, C., & Tekinerdogan, B. (2022). Deep learning-based multi-
task prediction system for plant disease and species detection. Ecological Infor-
matics, 69, 101679. https://doi.org/10.1016/j.ecoinf.2022.101679
Khaki, S., & Wang, L. (2019). Crop yield prediction using deep neural networks. Frontiers
in plant science, 10, 621. https://doi.org/10.3389/fpls.2019.00621
Khan, S. H., Hayat, M., Bennamoun, M., Sohel, F. A., & Togneri, R. (2017). Cost-sensitive
learning of deep feature representations from imbalanced data. IEEE transactions
on neural networks and learning systems, 29 (8), 3573–3587.
Khoshdeli, M., Cong, R., & Parvin, B. (2017). Detection of nuclei in h&e stained sections
using convolutional neural networks. 2017 IEEE EMBS International Conference
on Biomedical & Health Informatics (BHI), 105–108.
Khotimah, W. N., Bennamoun, M., Boussaid, F., Xu, L., Edwards, D., & Sohel, F.
(2023). Mce-st: Classifying crop stress using hyperspectral data with a multiscale
conformer encoder and spectral-based tokens. International Journal of Applied
Earth Observation and Geoinformation, 118, 103286. https://doi.org/10.1016/j.
jag.2023.103286
Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional
networks. arXiv preprint arXiv:1609.02907.
Kirk, K., Andersen, H. J., Thomsen, A. G., Jørgensen, J. R., & Jørgensen, R. N. (2009).
Estimation of leaf area index in cereal crops using red–green images. Biosystems
Engineering, 104 (3), 308–317. https://doi.org/10.1016/j.compag.2008.03.009
Knoll, F. J., Czymmek, V., Harders, L. O., & Hussmann, S. (2019). Real-time clas-
sification of weeds in organic carrot production using deep learning algorithms.
Computers and Electronics in Agriculture, 167, 105097.
Kodagoda, S., Zhang, Z., Ruiz, D., & Dissanayake, G. (2008). Weed detection and clas-
sification for autonomous farming. Intelligent Production Machines and Systems.
Kogan, M. (1998). Integrated pest management: Historical perspectives and contempo-
rary developments. Annual review of entomology, 43 (1), 243–270.
Korres, N. E., Burgos, N. R., Travlos, I., Vurro, M., Gitsopoulos, T. K., Varanasi, V. K.,
Duke, S. O., Kudsk, P., Brabham, C., Rouse, C. E., et al. (2019). New direc-
tions for integrated weed management: Modern technologies, tools and knowledge
discovery. Advances in Agronomy, 155, 243–319.
Kounalakis, T., Malinowski, M. J., Chelini, L., Triantafyllidis, G. A., & Nalpantidis,
L. (2018). A robotic system employing deep learning for visual recognition and
detection of weeds in grasslands. 2018 IEEE International Conference on Imaging
Systems and Techniques (IST), 1–6.
Kounalakis, T., Triantafyllidis, G. A., & Nalpantidis, L. (2019). Deep learning-based vi-
sual recognition of rumex for robotic precision farming. Computers and Electronics
in Agriculture, 165, 104973.
Krawczyk, B. (2016). Learning from imbalanced data: Open challenges and future direc-
tions. Progress in Artificial Intelligence, 5 (4), 221–232.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep
convolutional neural networks. Advances in neural information processing systems,
25, 1097–1105.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). Imagenet classification with deep
convolutional neural networks. Communications of the ACM, 60 (6), 84–90. https:
//doi.org/10.1145/3065386
Kukar, M., Vračar, P., Košir, D., Pevec, D., Bosnić, Z., et al. (2019). Agrodss: A decision
support system for agriculture and farming. Computers and Electronics in Agricul-
ture, 161, 260–271. https://doi.org/10.1016/j.compag.2018.04.001
Kumar, A., Zhang, Z. J., & Lyu, H. (2020). Object detection in real time based on
improved single shot multi-box detector algorithm. EURASIP Journal on Wireless
Communications and Networking, 2020, 1–18. https://doi.org/10.1186/s13638-020-01826-x
Kumar, H. (2019, April). Data augmentation techniques. https://iq.opengenus.org/data-
augmentation/
Kussul, N., Lavreniuk, M., Skakun, S., & Shelestov, A. (2017). Deep learning classification
of land cover and crop types using remote sensing data. IEEE Geoscience and
Remote Sensing Letters, 14 (5), 778–782. https://doi.org/10.1109/LGRS.2017.
2681128
Kuzuhara, H., Takimoto, H., Sato, Y., & Kanagawa, A. (2020). Insect pest detection and
identification method based on deep learning for realizing a pest control system.
2020 59th Annual Conference of the Society of Instrument and Control Engineers
of Japan (SICE), 709–714. https://doi.org/10.23919/SICE48898.2020.9240458
Lal, R. (1991). Soil structure and sustainability. Journal of sustainable agriculture, 1 (4),
67–92.
Lam, O. H. Y., Dogotari, M., Prüm, M., Vithlani, H. N., Roers, C., Melville, B., Zimmer,
F., & Becker, R. (2020). An open source workflow for weed mapping in native
grassland using unmanned aerial vehicle: Using rumex obtusifolius as a case study.
European Journal of Remote Sensing, 1–18.
Lameski, P., Zdravevski, E., & Kulakov, A. (2018). Review of automated weed control
approaches: An environmental impact perspective. International Conference on
Telecommunications, 132–147.
Lameski, P., Zdravevski, E., Trajkovik, V., & Kulakov, A. (2017). Weed detection dataset
with rgb images taken under variable light conditions. International Conference
on ICT Innovations, 112–119.
Lammie, C., Olsen, A., Carrick, T., & Azghadi, M. R. (2019). Low-power and high-speed
deep fpga inference engines for weed classification at the edge. IEEE Access, 7,
51171–51184. https://doi.org/10.1109/ACCESS.2019
Le, V. N. T., Ahderom, S., & Alameh, K. (2020a). Performances of the lbp based algo-
rithm over cnn models for detecting crops and weeds with similar morphologies.
Sensors, 20 (8), 2193.
Le, V. N. T., Ahderom, S., Apopei, B., & Alameh, K. (2020b). A novel method for detect-
ing morphologically similar crops and weeds based on the combination of contour
masks and filtered local binary pattern operators. GigaScience, 9 (3), giaa017.
Le, V. N. T., Truong, G., & Alameh, K. (2021). Detecting weeds from crops under complex
field environments based on faster rcnn. 2020 IEEE Eighth International Confer-
ence on Communications and Electronics (ICCE), 350–355. https://doi.org/10.
1109/ICCE48956.2021.9352073
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. nature, 521 (7553), 436–444.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., &
Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition.
Neural computation, 1 (4), 541–551.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A.,
Tejani, A., Totz, J., Wang, Z., et al. (2017). Photo-realistic single image super-
resolution using a generative adversarial network. Proceedings of the IEEE con-
ference on computer vision and pattern recognition, 4681–4690.
Lee, D.-H. (2013). Pseudo-label: The simple and efficient semi-supervised learning method
for deep neural networks. Workshop on challenges in representation learning,
ICML, 3 (2).
Leminen Madsen, S., Mathiassen, S. K., Dyrmann, M., Laursen, M. S., Paz, L.-C., &
Jørgensen, R. N. (2020). Open plant phenotype database of common weeds in
denmark. Remote Sensing, 12 (8), 1246.
Li, P., He, D., Qiao, Y., & Yang, C. (2013). An application of soft sets in weed identifi-
cation. 2013 Kansas City, Missouri, July 21-July 24, 2013, 1.
Li, W., Zheng, T., Yang, Z., Li, M., Sun, C., & Yang, X. (2021). Classification and detec-
tion of insects from field images using deep learning for smart pest management:
A systematic review. Ecological Informatics, 66, 101460. https://doi.org/10.1016/
j.ecoinf.2021.101460
Li, Y., Zhang, H., Xue, X., Jiang, Y., & Shen, Q. (2018). Deep learning for remote sensing
image classification: A survey. Wiley Interdisciplinary Reviews: Data Mining and
Knowledge Discovery, 8 (6), e1264.
Li, Y., Guo, Z., Shuang, F., Zhang, M., & Li, X. (2022). Key technologies of machine
vision for weeding robots: A review and benchmark. Computers and Electronics
in Agriculture, 196, 106880. https://doi.org/10.1016/j.compag.2022.106880
Li, Y., Wang, T., Kang, B., Tang, S., Wang, C., Li, J., & Feng, J. (2020). Overcoming
classifier imbalance for long-tail object detection with balanced group softmax.
Proceedings of the IEEE/CVF conference on computer vision and pattern recogni-
tion, 10991–11000.
Liakos, K. G., Busato, P., Moshou, D., Pearson, S., & Bochtis, D. (2018). Machine learning
in agriculture: A review. Sensors, 18 (8), 2674.
Liang, W.-C., Yang, Y.-J., & Chao, C.-M. (2019). Low-cost weed identification system us-
ing drones. 2019 Seventh International Symposium on Computing and Networking
Workshops (CANDARW), 260–263.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object
detection. Proceedings of the IEEE international conference on computer vision,
2980–2988.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., &
Zitnick, C. L. (2014). Microsoft coco: Common objects in context. European con-
ference on computer vision, 740–755.
Lindblom, J., Lundström, C., Ljung, M., & Jonsson, A. (2017). Promoting sustainable
intensification in precision agriculture: Review of decision support systems devel-
opment and strategies. Precision agriculture, 18, 309–331.
Liu, B., & Bruch, R. (2020). Weed detection for selective spraying: A review. Current
Robotics Reports, 1 (1), 19–26.
Liu, J., Xiang, J., Jin, Y., Liu, R., Yan, J., & Wang, L. (2021). Boost precision agriculture
with unmanned aerial vehicle remote sensing and edge intelligence: A survey.
Remote Sensing, 13 (21), 4387. https://doi.org/10.3390/rs13214387
Liu, J., & Wang, X. (2021). Plant diseases and pests detection based on deep learning:
A review. Plant Methods, 17, 1–18. https://doi.org/10.1186/s13007-021-00722-9
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016).
Ssd: Single shot multibox detector. European conference on computer vision, 21–
37.
Liu, Y., Sun, P., Wergeles, N., & Shang, Y. (2021). A survey and performance evalu-
ation of deep learning methods for small object detection. Expert Systems with
Applications, 172, 114602. https://doi.org/10.1016/j.eswa.2021.114602
López-Correa, J. M., Moreno, H., Ribeiro, A., & Andújar, D. (2022). Intelligent weed
management based on object detection neural networks in tomato crops. Agron-
omy, 12 (12), 2953. https://doi.org/10.3390/agronomy12122953
López-Granados, F. (2011). Weed detection for site-specific weed management: Mapping
and real-time approaches. Weed Research, 51 (1), 1–11.
Lottes, P., Behley, J., Chebrolu, N., Milioto, A., & Stachniss, C. (2018a). Joint stem de-
tection and crop-weed classification for plant-specific treatment in precision farm-
ing. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), 8233–8238.
Lottes, P., Behley, J., Chebrolu, N., Milioto, A., & Stachniss, C. (2020). Robust joint stem
detection and crop-weed classification using image sequences for plant-specific
treatment in precision farming. Journal of Field Robotics, 37 (1), 20–34.
Lottes, P., Behley, J., Milioto, A., & Stachniss, C. (2018b). Fully convolutional networks
with sequential information for robust crop and weed detection in precision farm-
ing. IEEE Robotics and Automation Letters, 3 (4), 2870–2877.
Lou, H., Duan, X., Guo, J., Liu, H., Gu, J., Bi, L., & Chen, H. (2023). Dc-yolov8: Small-
size object detection algorithm based on camera sensor. Electronics, 12 (10), 2323.
https://doi.org/10.3390/electronics12102323
Lu, Y., Young, S., Wang, H., & Wijewardane, N. (2022). Robust plant segmentation of
color images based on image contrast optimization. Computers and Electronics in
Agriculture, 193, 106711. https://doi.org/10.1016/j.compag.2022.106711
Ma, X., Deng, X., Qi, L., Jiang, Y., Li, H., Wang, Y., & Xing, X. (2019). Fully convolu-
tional network for rice seedling and weed image segmentation at the seedling stage
in paddy fields. PloS one, 14 (4), e0215676.
Maimaitijiang, M., Sagan, V., Sidike, P., Hartling, S., Esposito, F., & Fritschi, F. B.
(2020). Soybean yield prediction from uav using multimodal data fusion and deep learning. Remote sensing of environment, 237, 111599. https://doi.org/10.1016/j.rse.2019.111599
Mayachita, I. (2020, August). Understanding graph convolutional networks for node clas-
sification. https://towardsdatascience.com/understanding- graph- convolutional-
networks-for-node-classification-a2bfdb7aba7b
McLeod, R. (2018, November). Annual costs of weeds in australia. https://invasives.com.
au/wp-content/uploads/2019/01/Cost-of-weeds-report.pdf
Medina-Pastor, P., & Triacchini, G. (2020). The 2018 european union report on pesticide
residues in food. EFSA Journal, 18 (4), e06057.
Melekhov, I., Kannala, J., & Rahtu, E. (2016). Siamese network features for image match-
ing. 2016 23rd international conference on pattern recognition (ICPR), 378–383.
https://doi.org/10.1109/ICPR.2016.7899663
Merfield, C. N. (2016). Robotic weeding’s false dawn? Ten requirements for fully au-
tonomous mechanical weed management. Weed Research, 56 (5), 340–344. https:
//doi.org/10.1111/wre.12217
Meyer, G., Mehta, T., Kocher, M., Mortensen, D., & Samal, A. (1998). Textural imaging
and discriminant analysis for distinguishing weeds for spot spraying. Transactions
of the ASAE, 41 (4), 1189.
Meyer, G. E., & Neto, J. C. (2008). Verification of color vegetation indices for automated
crop imaging applications. Computers and electronics in agriculture, 63 (2), 282–
293.
Milioto, A., Lottes, P., & Stachniss, C. (2017). Real-time blob-wise sugar beets vs weeds
classification for monitoring fields using convolutional neural networks. ISPRS
Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences,
4, 41.
Mithila, J., Hall, J. C., Johnson, W. G., Kelley, K. B., & Riechers, D. E. (2011). Evo-
lution of resistance to auxinic herbicides: Historical perspectives, mechanisms of
resistance, and implications for broadleaf weed management in agronomic crops.
Weed science, 59 (4), 445–457. https://doi.org/10.1614/WS-D-11-00062.1
Moazzam, S. I., Khan, U. S., Tiwana, M. I., Iqbal, J., Qureshi, W. S., & Shah, S. I. (2019).
A review of application of deep learning for weeds and crops classification in agriculture. 2019 International Conference on Robotics and Automation in Industry (ICRAI), 1–6.
Monaco, T. J., Weller, S. C., & Ashton, F. M. (2002). Weed science: Principles and
practices. John Wiley & Sons.
Moore, K. J., & Nelson, C. J. (2017). Structure and morphology of grasses. Forages,
volume 1: An introduction to grassland agriculture, 1, 19.
Najafabadi, M. M., Villanustre, F., Khoshgoftaar, T. M., Seliya, N., Wald, R., & Muharemagic,
E. (2015). Deep learning applications and challenges in big data analytics. Journal
of Big Data, 2 (1), 1.
Nasiri, A., Taheri-Garavand, A., & Zhang, Y.-D. (2019). Image-based deep learning au-
tomated sorting of date fruit. Postharvest biology and technology, 153, 133–141.
https://doi.org/10.1016/j.postharvbio.2019.04.003
Nevavuori, P., Narra, N., & Lipping, T. (2019). Crop yield prediction with deep convo-
lutional neural networks. Computers and electronics in agriculture, 163, 104859.
https://doi.org/10.1016/j.compag.2019.104859
Nielsen, M. A. (2015). Neural networks and deep learning (Vol. 25). Determination press
San Francisco, CA, USA.
Nimisha, T. M., Kumar Singh, A., & Rajagopalan, A. N. (2017). Blur-invariant deep
learning for blind-deblurring. Proceedings of the IEEE international conference on
computer vision, 4752–4760. https://doi.org/10.1109/ICCV.2017.509
Nkemelu, D. K., Omeiza, D., & Lubalo, N. (2018). Deep convolutional neural network
for plant seedlings classification. arXiv preprint arXiv:1811.08404.
Nosov, V., Zhenzhebir, V., Nurgaziev, R., Sleptsova, L., & Eryushev, M. (2020). Farming
and agricultural consumers’ cooperative: Challenges and opportunities. E3S Web
of Conferences, 161, 01067.
Okese, K. A., Kankam, T., Boamah, J., & Evans, O. M. (2020, July). Basic principles of
weeds control and management. https://blog.agrihomegh.com/principles-weeds-
control-management/
Olsen, A., Konovalov, D. A., Philippa, B., Ridd, P., Wood, J. C., Johns, J., Banks, W.,
Girgenti, B., Kenny, O., Whinney, J., et al. (2019). Deepweeds: A multiclass weed
species image dataset for deep learning. Scientific reports, 9 (1), 1–12.
OpenCV. (2019, December). Geometric image transformations. https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html#resize
Osorio, K., Puerto, A., Pedraza, C., Jamaica, D., & Rodríguez, L. (2020). A deep learning
approach for weed detection in lettuce crops using multispectral images. AgriEngi-
neering, 2 (3), 471–488.
Ozcift, A., & Gulten, A. (2011). Classifier ensemble construction with rotation forest to
improve medical diagnosis performance of machine learning algorithms. Computer
methods and programs in biomedicine, 104 (3), 443–451.
Padilla, R., Netto, S. L., & da Silva, E. A. (2020). A survey on performance met-
rics for object-detection algorithms. 2020 International Conference on Systems,
Signals and Image Processing (IWSSIP), 237–242. https://doi.org/10.1109/IWSSIP48289.2020.9145130
Pan, M., Liu, Y., Cao, J., Li, Y., Li, C., & Chen, C.-H. (2020). Visual recognition based
on deep learning for navigation mark classification. IEEE Access, 8, 32767–32775.
Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on
knowledge and data engineering, 22 (10), 1345–1359.
Partel, V., Kakarla, S. C., & Ampatzidis, Y. (2019a). Development and evaluation of a
low-cost and smart technology for precision weed management utilizing artificial
intelligence. Computers and electronics in agriculture, 157, 339–350.
Partel, V., Kim, J., Costa, L., Pardalos, P., & Ampatzidis, Y. (2019b). Smart sprayer for
precision weed control using artificial intelligence: Comparison of deep learning
frameworks. Association for the Advancement of Artificial Intelligence.
Partel, V., Kim, J., Costa, L., Pardalos, P. M., & Ampatzidis, Y. (2020). Smart sprayer
for precision weed control using artificial intelligence: Comparison of deep learning
frameworks. ISAIM.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin,
Z., Gimelshein, N., Antiga, L., et al. (2019). Pytorch: An imperative style, high-
performance deep learning library. Advances in neural information processing sys-
tems, 32.
Patel, D., & Kumbhar, B. (2016). Weed and its management: A major threats to crop
economy. J. Pharm. Sci. Bioscientific Res, 6 (6), 453–758.
Patidar, S., Singh, U., Sharma, S. K., et al. (2020). Weed seedling detection using mask re-
gional convolutional neural network. 2020 International Conference on Electronics
and Sustainable Communication Systems (ICESC), 311–316.
Patterson, J., & Gibson, A. (2017). Deep learning: A practitioner’s approach. O’Reilly Media, Inc.
Pearlstein, L., Kim, M., & Seto, W. (2016). Convolutional neural network application to
plant detection, based on synthetic imagery. 2016 IEEE Applied Imagery Pattern
Recognition Workshop (AIPR), 1–4.
Pertuz, S., Puig, D., & Garcia, M. A. (2013). Analysis of focus measure operators for
shape-from-focus. Pattern Recognition, 46 (5), 1415–1432. https://doi.org/10.1016/j.patcog.2012.11.011
Peteinatos, G., Reichel, P., Karouta, J., Andújar, D., & Gerhards, R. (2020). Weed iden-
tification in maize, sunflower, and potatoes with the aid of convolutional neural
networks. Remote Sensing, 12 (24), 4185.
Petrich, L., Lohrmann, G., Neumann, M., Martin, F., Frey, A., Stoll, A., & Schmidt,
V. (2019). Detection of colchicum autumnale in drone images, using a machine-
learning approach.
Precision spraying - weed sprayer. (n.d.). Retrieved January 25, 2021, from https://www.
weed-it.com/
PyTorch. (2020, August). Ai for ag: Production machine learning for agriculture. https:
//medium.com/pytorch/ai-for-ag-production-machine-learning-for-agriculture-
e8cfdb9849a1
Qian, Q., Chen, L., Li, H., & Jin, R. (2020). Dr loss: Improving object detection by
distributional ranking. Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, 12164–12172.
Qin, Z., Li, Z., Zhang, Z., Bao, Y., Yu, G., Peng, Y., & Sun, J. (2019). Thundernet:
Towards real-time generic object detection on mobile devices. Proceedings of the
IEEE/CVF International Conference on Computer Vision, 6718–6727.
Quan, L., Feng, H., Lv, Y., Wang, Q., Zhang, C., Liu, J., & Yuan, Z. (2019). Maize
seedling detection under different growth stages and complex field environments
based on an improved faster r–cnn. Biosystems Engineering, 184, 1–23. https://doi.org/10.1016/j.biosystemseng.2019.05.002
Radoglou-Grammatikis, P., Sarigiannidis, P., Lagkas, T., & Moscholios, I. (2020). A com-
pilation of uav applications for precision agriculture. Computer Networks, 172,
107148.
Rai, N., Zhang, Y., Ram, B. G., Schumacher, L., Yellavajjala, R. K., Bajwa, S., & Sun,
X. (2023). Applications of deep learning in precision weed management: A review.
Computers and Electronics in Agriculture, 206, 107698. https://doi.org/10.1016/j.compag.2023.107698
Raj, E., Appadurai, M., & Athiappan, K. (2021). Precision farming in modern agricul-
ture. In Smart agriculture automation using advanced technologies (pp. 61–87).
Springer. https://doi.org/10.1007/978-981-16-6124-2_4
Raja, R., Nguyen, T. T., Slaughter, D. C., & Fennimore, S. A. (2020). Real-time robotic
weed knife control system for tomato and lettuce based on geometric appearance
of plant labels. Biosystems Engineering, 194, 152–164.
Rakhmatulin, I., Kamilaris, A., & Andreasen, C. (2021). Deep neural networks to detect
weeds from crops in agricultural environments in real-time: A review. Remote
Sensing, 13 (21), 4486.
Ramcharan, A., Baranowski, K., McCloskey, P., Ahmed, B., Legg, J., & Hughes, D. P.
(2017). Deep learning for image-based cassava disease detection. Frontiers in plant
science, 8, 1852. https://doi.org/10.3389/fpls.2017.01852
Ramirez, W., Achanccaray, P., Mendoza, L., & Pacheco, M. (2020). Deep convolutional
neural networks for weed detection in agricultural crops using optical aerial im-
ages. 2020 IEEE Latin American GRSS & ISPRS Remote Sensing Conference
(LAGIRS), 133–137.
Raschka, S. (2018). Model evaluation, model selection, and algorithm selection in machine
learning. arXiv preprint arXiv:1811.12808. https://doi.org/10.48550/arXiv.1811.
12808
Rasmussen, J., Nørremark, M., & Bibby, B. M. (2007). Assessment of leaf cover and crop
soil cover in weed harrowing research using digital images. Weed Research, 47 (4),
299–310. https://doi.org/10.1111/j.1365-3180.2007.00565.x
Rasti, P., Ahmad, A., Samiei, S., Belin, E., & Rousseau, D. (2019). Supervised image
classification by scattering transform with application to weed detection in culture crops of high density. Remote Sensing, 11 (3), 249. https://doi.org/10.3390/rs11030249
Razfar, N., True, J., Bassiouny, R., Venkatesh, V., & Kashef, R. (2022). Weed detection
in soybean crops using custom lightweight deep learning models. Journal of Agri-
culture and Food Research, 8, 100308. https://doi.org/10.1016/j.jafr.2022.100308
Redmon, J. (n.d.). https://pjreddie.com/darknet/
Redmon, J., & Farhadi, A. (2017). Yolo9000: Better, faster, stronger. Proceedings of the
IEEE conference on computer vision and pattern recognition, 7263–7271.
Redmon, J., & Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv preprint
arXiv:1804.02767.
Reedha, R., Dericquebourg, E., Canals, R., & Hafiane, A. (2021). Vision transformers
for weeds and crops classification of high resolution uav images. arXiv preprint
arXiv:2109.02716. https://doi.org/10.3390/rs14030592
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object
detection with region proposal networks. arXiv preprint arXiv:1506.01497.
Rew, L., & Cousens, R. (2001). Spatial distribution of weeds in arable crops: Are current
sampling and analytical methods appropriate? Weed Research, 41 (1), 1–18. https:
//doi.org/10.1046/j.1365-3180.2001.00215.x
Rist, Y., Shendryk, I., Diakogiannis, F., & Levick, S. (2019). Weed mapping using very
high resolution satellite imagery and fully convolutional neural network. IGARSS
2019-2019 IEEE International Geoscience and Remote Sensing Symposium, 9784–
9787.
Robertson, M., Kirkegaard, J., Peake, A., Creelman, Z., Bell, L., Lilley, J., Midwood, J.,
Zhang, H., Kleven, S., Duff, C., et al. (2016). Trends in grain production and yield
gaps in the high-rainfall zone of southern australia. Crop and Pasture Science,
67 (9), 921–937.
Robocrop spot sprayer: Weed removal. (2018, July). Retrieved January 25, 2021, from
https://garford.com/products/robocrop-spot-sprayer/
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for
biomedical image segmentation. International Conference on Medical image com-
puting and computer-assisted intervention, 234–241.

206
Bibliography

Ruigrok, T., van Henten, E., Booij, J., van Boheemen, K., & Kootstra, G. (2020).
Application-specific evaluation of a weed-detection algorithm for plant-specific
spraying. Sensors, 20 (24), 7262. https://doi.org/10.3390/s20247262
Sa, I., Chen, Z., Popović, M., Khanna, R., Liebisch, F., Nieto, J., & Siegwart, R. (2017).
Weednet: Dense semantic weed classification using multispectral images and mav
for smart farming. IEEE Robotics and Automation Letters, 3 (1), 588–595.
Sa, I., Popović, M., Khanna, R., Chen, Z., Lottes, P., Liebisch, F., Nieto, J., Stachniss,
C., Walter, A., & Siegwart, R. (2018). Weedmap: A large-scale semantic weed
mapping framework using aerial multispectral imaging and deep neural network
for precision farming. Remote Sensing, 10 (9), 1423.
Sabottke, C. F., & Spieler, B. M. (2020). The effect of image resolution on deep learning
in radiography. Radiology: Artificial Intelligence, 2 (1), e190015.
Sabzi, S., Abbaspour-Gilandeh, Y., & Arribas, J. I. (2020). An automatic visible-range
video weed detection, segmentation and classification prototype in potato field.
Heliyon, 6 (5), e03685.
Saha, S., Ghosh, M., Ghosh, S., Sen, S., Singh, P. K., Geem, Z. W., & Sarkar, R. (2020).
Feature selection for facial emotion recognition using cosine similarity-based har-
mony search algorithm. Applied Sciences, 10 (8), 2816. https : / / doi . org / https :
//doi.org/10.3390/app10082816
Sahlsten, J., Jaskari, J., Kivinen, J., Turunen, L., Jaanio, E., Hietala, K., & Kaski, K.
(2019). Deep learning fundus image analysis for diabetic retinopathy and macular
edema grading. Scientific reports, 9 (1), 1–11.
Sakyi, L. (2019, February). Linda sakyi. https://greenrootltd.com/2019/02/19/five-
general-categories-of-weed-control-methods/
Saleem, M. H., Potgieter, J., & Arif, K. M. (2019). Plant disease detection and classifica-
tion by deep learning. Plants, 8 (11), 468. https://doi.org/10.3390/plants8110468
Saleem, M. H., Potgieter, J., & Arif, K. M. (2022). Weed detection by faster rcnn model:
An enhanced anchor box approach. Agronomy, 12 (7), 1580. https://doi.org/10.
3390/agronomy12071580
Saleem, S. R., Zaman, Q. U., Schumann, A. W., & Naqvi, S. M. Z. A. (2023). Variable
rate technologies: Development, adaptation, and opportunities in agriculture. In
Precision agriculture (pp. 103–122). Elsevier.

207
Bibliography

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). Mobilenetv2:
Inverted residuals and linear bottlenecks. Proceedings of the IEEE conference on
computer vision and pattern recognition, 4510–4520.
Sarker, I. H. (2021). Deep learning: A comprehensive overview on techniques, taxonomy,
applications and research directions. SN Computer Science, 2 (6), 420. https://
doi.org/https://doi.org/10.1007/s42979-021-00815-1
Sarvini, T., Sneha, T., GS, S. G., Sushmitha, S., & Kumaraswamy, R. (2019). Perfor-
mance comparison of weed detection algorithms. 2019 International Conference
on Communication and Signal Processing (ICCSP), 0843–0847.
Scavo, A., & Mauromicale, G. (2020). Integrated weed management in herbaceous field
crops. Agronomy, 10 (4), 466.
Schneider, U. A., Havlík, P., Schmid, E., Valin, H., Mosnier, A., Obersteiner, M., Böttcher,
H., Skalskỳ, R., Balkovič, J., Sauer, T., et al. (2011). Impacts of population growth,
economic development, and technical change on global food production and con-
sumption. Agricultural Systems, 104 (2), 204–215.
Seelan, S. K., Laguette, S., Casady, G. M., & Seielstad, G. A. (2003). Remote sensing
applications for precision agriculture: A learning community approach. Remote
sensing of environment, 88 (1-2), 157–169.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017).
Grad-cam: Visual explanations from deep networks via gradient-based localization.
Proceedings of the IEEE international conference on computer vision, 618–626.
https://doi.org/10.1007/s11263-019-01228-7
Shaikh, T. A., Rasool, T., & Lone, F. R. (2022). Towards leveraging the role of machine
learning and artificial intelligence in precision agriculture and smart farming. Com-
puters and Electronics in Agriculture, 198, 107119.
Shammi, S., Sohel, F., Diepeveen, D., Zander, S., & Jones, M. G. (2023). Machine
learning-based detection of frost events in wheat plants from infrared thermog-
raphy. European Journal of Agronomy, 149, 126900. https://doi.org/10.1016/j.
eja.2023.126900
Shammi, S., Sohel, F., Diepeveen, D., Zander, S., Jones, M. G., Bekuma, A., & Biddulph,
B. (2022). Machine learning-based detection of freezing events using infrared ther-

208
Bibliography

mography. Computers and Electronics in Agriculture, 198, 107013. https://doi.


org/10.1016/j.compag.2022.107013
Shamsabadi, E. A., Xu, C., Rao, A. S., Nguyen, T., Ngo, T., & Dias-da-Costa, D. (2022).
Vision transformer-based autonomous crack detection on asphalt and concrete
surfaces. Automation in Construction, 140, 104316. https://doi.org/10.1016/j.
autcon.2022.104316
Shaner, D. L., & Beckie, H. J. (2014). The future for weed control and technology. Pest
management science, 70 (9), 1329–1339.
Shanmugam, S., Assunção, E., Mesquita, R., Veiros, A., & Gaspar, P. D. (2020). Auto-
mated weed detection systems: A review. KnE Engineering, 271–284.
Shao, L., Zhu, F., & Li, X. (2014). Transfer learning for visual categorization: A survey.
IEEE transactions on neural networks and learning systems, 26 (5), 1019–1034.
Sharma, A., Jain, A., Gupta, P., & Chowdary, V. (2020). Machine learning applications
for precision agriculture: A comprehensive review. IEEE Access, 9, 4843–4873.
Sharma, A., Liu, X., Yang, X., & Shi, D. (2017). A patch-based convolutional neural
network for remote sensing image classification. Neural Networks, 95, 19–28. https:
//doi.org/10.1016/j.neunet.2017.07.017
Sharpe, S. M., Schumann, A. W., & Boyd, N. S. (2019). Detection of carolina geranium
(geranium carolinianum) growing in competition with strawberry using convolu-
tional neural networks. Weed Science, 67 (2), 239–245.
Sharpe, S. M., Schumann, A. W., & Boyd, N. S. (2020). Goosegrass detection in straw-
berry and tomato using a convolutional neural network. Scientific Reports, 10 (1),
1–8.
Shelhamer, E., Long, J., & Darrell, T. (2017). Fully convolutional networks for semantic
segmentation. IEEE transactions on pattern analysis and machine intelligence,
39 (4), 640–651.
Shi, B., Osunkoya, O. O., Chadha, A., Florentine, S. K., & Dhileepan, K. (2021). Biology,
ecology and management of the invasive navua sedge (cyperus aromaticus)—a
global review. Plants, 10 (9), 1851. https://doi.org/https://doi.org/10.3390/
plants10091851
Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation for
deep learning. Journal of Big Data, 6 (1), 60.

209
Bibliography

Shrestha, A., & Mahmood, A. (2019). Review of deep learning algorithms and architec-
tures. IEEE access, 7, 53040–53065. https : / / doi . org / 10 . 1109 / ACCESS . 2019 .
2912200
Shukla, B. K., Maurya, N., & Sharma, M. (2023). Advancements in sensor-based tech-
nologies for precision agriculture: An exploration of interoperability, analytics and
deployment strategies. Engineering Proceedings, 58 (1), 22.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556.
Sinapan, I., Lin-Kwong-Chon, C., Damour, C., Kadjo, J.-J. A., & Benne, M. (2023).
Oxygen bubble dynamics in pem water electrolyzers with a deep-learning-based
approach. Hydrogen, 4 (3), 556–572. https://doi.org/10.3390/hydrogen4030036
Singh, A., Jones, S., Ganapathysubramanian, B., Sarkar, S., Mueller, D., Sandhu, K., &
Nagasubramanian, K. (2021). Challenges and opportunities in machine-augmented
plant stress phenotyping. Trends in Plant Science, 26 (1), 53–69. https://doi.org/
10.1016/j.tplants.2020.07.010
Singh, R. K., Berkvens, R., & Weyn, M. (2021). Agrifusion: An architecture for iot and
emerging technologies based on a precision agriculture survey. IEEE Access, 9,
136253–136283.
Sivakumar, A. N. V., Li, J., Scott, S., Psota, E., J Jhala, A., Luck, J. D., & Shi, Y.
(2020). Comparison of object detection and patch-based classification deep learn-
ing models on mid-to late-season weed detection in uav imagery. Remote Sensing,
12 (13), 2136.
Skovsen, S., Dyrmann, M., Mortensen, A. K., Laursen, M. S., Gislum, R., Eriksen, J.,
Farkhani, S., Karstoft, H., & Jorgensen, R. N. (2019). The grassclover image
dataset for semantic and hierarchical species understanding in agriculture. Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops.
Slaughter, D., Giles, D., & Downey, D. (2008). Autonomous robotic weed control systems:
A review. Computers and electronics in agriculture, 61 (1), 63–78.
Smith, M. J. (2018). Getting value from artificial intelligence in agriculture. Animal Pro-
duction Science, 60 (1), 46–54.

210
Bibliography

Sportelli, M., Apolo-Apolo, O. E., Fontanelli, M., Frasconi, C., Raffaelli, M., Peruzzi, A.,
& Perez-Ruiz, M. (2023). Evaluation of yolo object detectors for weed detection
in different turfgrass scenarios. Applied Sciences, 13 (14), 8502. https://doi.org/
10.3390/app13148502
Steinberg, R. (2017, December). 6 areas where artificial neural networks outperform hu-
mans. Retrieved December 25, 2020, from https://venturebeat.com/2017/12/08/
6-areas-where-artificial-neural-networks-outperform-humans/
Steup, R., Dombrowski, L., & Su, N. M. (2019). Feeding the world with data: Visions
of data-driven farming. Proceedings of the 2019 on Designing Interactive Systems
Conference, 1503–1515.
Stewart, R. E. (2018, May). Weed control. https://www.britannica.com/technology/
agricultural-technology/Weed-control
Stewart, R., Andriluka, M., & Ng, A. Y. (2016). End-to-end people detection in crowded
scenes. Proceedings of the IEEE conference on computer vision and pattern recog-
nition, 2325–2333.
Su, W.-H. (2020). Advanced machine learning in point spectroscopy, rgb-and hyperspectral-
imaging for automatic discriminations of crops and weeds: A review. Smart Cities,
3 (3), 767–792.
Sudars, K., Jasko, J., Namatevs, I., Ozola, L., & Badaukis, N. (2020). Dataset of anno-
tated food crops and weed images for robotic computer vision control. Data in
Brief, 105833.
Suh, H. K., Ijsselmuiden, J., Hofstee, J. W., & van Henten, E. J. (2018). Transfer learning
for the classification of sugar beet and volunteer potato under field conditions.
Biosystems engineering, 174, 50–65.
Sukegawa, S., Yoshii, K., Hara, T., Yamashita, K., Nakano, K., Yamamoto, N., Nagat-
suka, H., & Furuki, Y. (2020). Deep neural networks for dental implant system
classification. Biomolecules, 10 (7), 984.
Sunil, G., Zhang, Y., Koparan, C., Ahmed, M. R., Howatt, K., & Sun, X. (2022). Weed and
crop species classification using computer vision and deep learning technologies
in greenhouse conditions. Journal of Agriculture and Food Research, 9, 100325.
https://doi.org/https://doi.org/10.1016/j.jafr.2022.100325

211
Bibliography

Swain, K. C., Nørremark, M., Jørgensen, R. N., Midtiby, H. S., & Green, O. (2011). Weed
identification using an automated active shape matching (aasm) technique. biosys-
tems engineering, 110 (4), 450–457. https://doi.org/10.1016/j.biosystemseng.2011.
09.01
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. (2017). Inception-v4, inception-resnet
and the impact of residual connections on learning. Proceedings of the AAAI Con-
ference on Artificial Intelligence, 31 (1).
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke,
V., & Rabinovich, A. (2015). Going deeper with convolutions. Proceedings of the
IEEE conference on computer vision and pattern recognition, 1–9.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the
inception architecture for computer vision. Proceedings of the IEEE conference on
computer vision and pattern recognition, 2818–2826.
Taherkhani, A., Cosma, G., & McGinnity, T. M. (2020). Adaboost-cnn: An adaptive
boosting algorithm for convolutional neural networks to classify multi-class imbal-
anced datasets using transfer learning. Neurocomputing, 404, 351–366.
Takahashi, R., Matsubara, T., & Uehara, K. (2018). Ricap: Random image cropping
and patching data augmentation for deep cnns. Asian Conference on Machine
Learning, 786–798.
Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., & Liu, C. (2018). A survey on deep
transfer learning. Artificial Neural Networks and Machine Learning–ICANN 2018:
27th International Conference on Artificial Neural Networks, Rhodes, Greece, Oc-
tober 4-7, 2018, Proceedings, Part III 27, 270–279.
Tang, J., Wang, D., Zhang, Z., He, L., Xin, J., & Xu, Y. (2017). Weed identification
based on k-means feature learning combined with convolutional neural network.
Computers and electronics in agriculture, 135, 63–70.
Tao, A., Barker, J., & Sarathy, S. (2016). Detectnet: Deep neural network for object
detection in digits. Parallel Forall, 4.
Teimouri, N., Dyrmann, M., Nielsen, P. R., Mathiassen, S. K., Somerville, G. J., &
Jørgensen, R. N. (2018). Weed growth stage estimator using deep convolutional
neural networks. Sensors, 18 (5), 1580.

212
Bibliography

Thambawita, V., Strümke, I., Hicks, S. A., Halvorsen, P., Parasa, S., & Riegler, M. A.
(2021). Impact of image resolution on deep learning performance in endoscopy
image classification: An experimental study using a large dataset of endoscopic
images. Diagnostics, 11 (12), 2183. https://doi.org/10.3390/diagnostics11122183
Tian, H., Wang, T., Liu, Y., Qiao, X., & Li, Y. (2020). Computer vision technology in
agricultural automation—a review. Information Processing in Agriculture, 7 (1),
1–19.
Tian, L., Slaughter, D., & Norris, R. (2000). Machine vision identification of tomato
seedlings for automated weed control. Transactions of ASAE, 40 (6), 1761–1768.
Toğaçar, M. (2022). Using darknet models and metaheuristic optimization methods to-
gether to detect weeds growing along with seedlings. Ecological Informatics, 68,
101519. https://doi.org/10.1016/j.ecoinf.2021.101519
Tong, K., Wu, Y., & Zhou, F. (2020). Recent advances in small object detection based
on deep learning: A review. Image and Vision Computing, 97, 103910. https :
//doi.org/10.1016/j.imavis.2020.103910
Trong, V. H., Gwang-hyun, Y., Vu, D. T., & Jin-young, K. (2020). Late fusion of mul-
timodal deep neural networks for weeds classification. Computers and Electronics
in Agriculture, 175, 105506.
Ullah, F., Salam, A., Abrar, M., & Amin, F. (2023). Brain tumor segmentation using a
patch-based convolutional neural network: A big data analysis approach. Mathe-
matics, 11 (7), 1635. https://doi.org/10.3390/math11071635
Umamaheswari, S., Arjun, R., & Meganathan, D. (2018). Weed detection in farm crops
using parallel image processing. 2018 Conference on Information and Communi-
cation Technology (CICT), 1–4.
Umamaheswari, S., & Jain, A. V. (2020). Encoder–decoder architecture for crop-weed
classification using pixel-wise labelling. 2020 International Conference on Artificial
Intelligence and Signal Processing (AISP), 1–6.
Valente, J., Doldersum, M., Roers, C., & Kooistra, L. (2019). Detecting rumex obtusifolius
weed plants in grasslands from uav rgb imagery using deep learning. ISPRS Annals
of Photogrammetry, Remote Sensing & Spatial Information Sciences, 4.

213
Bibliography

Van der Walt, S., Schönberger, J. L., Nunez-Iglesias, J., Boulogne, F., Warner, J. D.,
Yager, N., Gouillart, E., & Yu, T. (2014). Scikit-image: Image processing in python.
PeerJ, 2, e453.
Van Klompenburg, T., Kassahun, A., & Catal, C. (2020). Crop yield prediction using
machine learning: A systematic literature review. Computers and Electronics in
Agriculture, 177, 105709. https://doi.org/https://doi.org/10.1016/j.compag.2020.
105709
Viraf. (2020, June). Create a synthetic image dataset - the "what", the "why" and the
"how". https://towardsdatascience.com/create- a- synthetic- image- dataset- the-
what-the-why-and-the-how-f820e6b6f718
Wahyudi, D., Soesanti, I., & Nugroho, H. A. (2022). Toward detection of small objects
using deep learning methods: A review. 2022 14th International Conference on
Information Technology and Electrical Engineering (ICITEE), 314–319. https://
doi.org/10.1109/ICITEE56407.2022.9954101
Wäldchen, J., & Mäder, P. (2018). Plant species identification using computer vision
techniques: A systematic literature review. Archives of Computational Methods in
Engineering, 25 (2), 507–543.
Wang, A., Xu, Y., Wei, X., & Cui, B. (2020). Semantic segmentation of crop and weed
using an encoder-decoder network and image enhancement method under uncon-
trolled outdoor illumination. IEEE Access, 8, 81724–81734.
Wang, A., Zhang, W., & Wei, X. (2019). A review on weed detection using ground-based
machine vision and image processing techniques. Computers and electronics in
agriculture, 158, 226–240.
Wang, C., & Xiao, Z. (2021). Lychee surface defect detection based on deep convolutional
neural networks with gan-based data augmentation. Agronomy, 11 (8), 1500.
Wang, C.-Y., Bochkovskiy, A., & Liao, H.-Y. M. (2023a). Yolov7: Trainable bag-of-
freebies sets new state-of-the-art for real-time object detectors, 7464–7475.
Wang, C.-Y., Bochkovskiy, A., & Liao, H.-Y. M. (2023b). Yolov7: Trainable bag-of-
freebies sets new state-of-the-art for real-time object detectors. Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7464–7475.
Wang, M., Fu, B., Fan, J., Wang, Y., Zhang, L., & Xia, C. (2023). Sweet potato leaf
detection in a natural scene based on faster r-cnn with a visual attention mecha-

214
Bibliography

nism and diou-nms. Ecological Informatics, 73, 101931. https://doi.org/10.1016/


j.ecoinf.2022.101931
Wang, P., Tang, Y., Luo, F., Wang, L., Li, C., Niu, Q., & Li, H. (2022). Weed25: A deep
learning dataset for weed identification. Frontiers in Plant Science, 13. https :
//doi.org/10.3389/fpls.2022.1053329
Wang, P., Fan, E., & Wang, P. (2021). Comparative analysis of image classification algo-
rithms based on traditional machine learning and deep learning. Pattern Recogni-
tion Letters, 141, 61–67. https://doi.org/10.1016/j.patrec.2020.07.042
Wang, S., Liu, W., Wu, J., Cao, L., Meng, Q., & Kennedy, P. J. (2016). Training deep
neural networks on imbalanced data sets. 2016 international joint conference on
neural networks (IJCNN), 4368–4374. https : / / doi . org / 10 . 1109 / IJCNN . 2016 .
7727770
Wang, S., Pu, Z., Li, Q., & Wang, Y. (2022). Estimating crowd density with edge intel-
ligence based on lightweight convolutional neural networks. Expert Systems with
Applications, 206, 117823.
Weedseeker 2 spot spray system. (n.d.). Retrieved January 25, 2021, from https : / /
agriculture.trimble.com/product/weedseeker-2-spot-spray-system/
Wendel, A., & Underwood, J. (2016). Self-supervised weed detection in vegetable crops
using ground based hyperspectral imaging. 2016 IEEE international conference
on robotics and automation (ICRA), 5128–5135.
Westwood, J. H., Charudattan, R., Duke, S. O., Fennimore, S. A., Marrone, P., Slaughter,
D. C., Swanton, C., & Zollinger, R. (2018). Weed management in 2050: Perspec-
tives on the future of weed science. Weed science, 66 (3), 275–285.
Woebbecke, D., Meyer, G., Von Bargen, K., & Mortensen, D. (1995). Shape features for
identifying young weeds using image analysis. Transactions of the ASAE, 38 (1),
271–281.
Wu, D., Jiang, S., Zhao, E., Liu, Y., Zhu, H., Wang, W., & Wang, R. (2022). Detection of
camellia oleifera fruit in complex scenes by using yolov7 and data augmentation.
Applied Sciences, 12 (22), 11318. https://doi.org/10.3390/app122211318
Wu, R., Yan, S., Shan, Y., Dang, Q., & Sun, G. (2015). Deep image: Scaling up image
recognition. arXiv preprint arXiv:1501.02876, 7 (8), 4.
Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y., & Girshick, R. (2019). Detectron2.

215
Bibliography

Wu, Z., Chen, Y., Zhao, B., Kang, X., & Ding, Y. (2021). Review of weed detection
methods based on computer vision. Sensors, 21 (11), 3647. https://doi.org/10.
3390/s21113647
Yan, X., Deng, X., & Jin, J. (2020). Classification of weed species in the paddy field with
dcnn-learned features. 2020 IEEE 5th Information Technology and Mechatronics
Engineering Conference (ITOEC), 336–340.
Yi, Z., Yongliang, S., & Jun, Z. (2019). An improved tiny-yolov3 pedestrian detection
algorithm. Optik, 183, 17–23.
Yoo, D., Park, S., Lee, J.-Y., Paek, A. S., & So Kweon, I. (2015). Attentionnet: Ag-
gregating weak directions for accurate object detection. Proceedings of the IEEE
International Conference on Computer Vision, 2659–2667.
Yu, J., Schumann, A. W., Cao, Z., Sharpe, S. M., & Boyd, N. S. (2019a). Weed detection
in perennial ryegrass with deep learning convolutional neural network. Frontiers
in plant science, 10.
Yu, J., Sharpe, S. M., Schumann, A. W., & Boyd, N. S. (2019b). Deep learning for image-
based weed detection in turfgrass. European journal of agronomy, 104, 78–84.
Zhai, Z., Martínez, J. F., Beltran, V., & Martínez, N. L. (2020). Decision support systems
for agriculture 4.0: Survey and challenges. Computers and Electronics in Agricul-
ture, 170, 105256. https://doi.org/https://doi.org/10.1016/j.compag.2020.105256
Zhang, R., Wang, C., Hu, X., Liu, Y., Chen, S., et al. (2020). Weed location and recogni-
tion based on uav imaging and deep learning. International Journal of Precision
Agricultural Aviation, 3 (1).
Zhang, W., Hansen, M. F., Volonakis, T. N., Smith, M., Smith, L., Wilson, J., Ralston, G.,
Broadbent, L., & Wright, G. (2018). Broad-leaf weed detection in pasture. 2018
IEEE 3rd International Conference on Image, Vision and Computing (ICIVC),
101–105.
Zhang, X.-Y., Shi, H., Zhu, X., & Li, P. (2019). Active semi-supervised learning based on
self-expressive correlation with generative adversarial networks. Neurocomputing,
345, 103–113.
Zhao, Z.-Q., Zheng, P., Xu, S.-t., & Wu, X. (2019). Object detection with deep learning:
A review. IEEE transactions on neural networks and learning systems, 30 (11),
3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865

216
Bibliography

Zheng, Y., Zhu, Q., Huang, M., Guo, Y., & Qin, J. (2017). Maize and weed classifica-
tion using color indices with support vector data description in outdoor fields.
Computers and Electronics in Agriculture, 141, 215–222.
Zhou, Z.-H. (2018). A brief introduction to weakly supervised learning. National Science
Review, 5 (1), 44–53.
Zoph, B., Cubuk, E. D., Ghiasi, G., Lin, T.-Y., Shlens, J., & Le, Q. V. (2020). Learning
data augmentation strategies for object detection. European conference on com-
puter vision, 566–583. https://doi.org/10.1007/978-3-030-58583-9_34

217
