The Plant Phenome Journal - 2023 - Kar-1

Received: 23 January 2023 Accepted: 26 June 2023
DOI: 10.1002/ppj2.20079
ORIGINAL ARTICLE
Self-supervised learning improves classification of agriculturally

important insect pests in plants
Soumyashree Kar1 Koushik Nagasubramanian2 Dinakaran Elango1

Matthew E. Carroll1 Craig A. Abel3 Ajay Nair4 Daren S. Mueller5
Matthew E. O’Neal5 Asheesh K. Singh1 Soumik Sarkar2
Baskar Ganapathysubramanian2 Arti Singh1
1 Department of Agronomy, Iowa State University, Ames, IA, USA
2 Department of Mechanical Engineering, Iowa State University, Ames, IA, USA
3 USDA, Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, Ames, IA, USA
4 Department of Horticulture, Iowa State University, Ames, IA, USA
5 Department of Plant Pathology, Entomology and Microbiology, Iowa State University, Ames, IA, USA
Correspondence
Arti Singh, Department of Agronomy, Iowa Abstract
State University, Ames, IA, USA. Insect pests cause significant damage to food production, so early detection and
Email: [email protected]
efficient mitigation strategies are crucial. There is a continual shift toward machine
Assigned to Associate Editor Weizhen Liu. learning (ML)-based approaches for automating agricultural pest detection. Although
supervised learning has achieved remarkable progress in this regard, it is impeded by
Funding information
NSF, Grant/Award Number: 1952045; CPS the need for significant expert involvement in labeling the data used for model train-
Frontier, Grant/Award Number: 1954556; ing. This makes real-world applications tedious and oftentimes infeasible. Recently,
Agricultural Research Service; National
self-supervised learning (SSL) approaches have provided a viable alternative to train-
Institute of Food and Agriculture,
Grant/Award Numbers: 2019-67021-29938, ing ML models with minimal annotations. Here, we present an SSL approach to
2021-67021-35329, 2022-67013-37120; classify 22 insect pests. The framework was assessed on raw and segmented field-
U.S. Department of Agriculture,
Grant/Award Number: IOW04714
captured images using three different SSL methods, Nearest Neighbor Contrastive
Learning of Visual Representations (NNCLR), Bootstrap Your Own Latent, and Bar-
low Twins. SSL pre-training was done on ResNet-18 and ResNet-50 models using all
three SSL methods on the original RGB images and foreground segmented images.
The performance of SSL pre-training methods was evaluated using linear probing
of SSL representations and end-to-end fine-tuning approaches. The SSL-pre-trained
convolutional neural network models were able to perform annotation-efficient
Abbreviations: BYOL, Bootstrap Your Own Latent; CNN, convolutional neural network; DL, deep learning; FAW, fall armyworm; FN, false negatives; FP,
false positives; HTP, high throughput phenotyping; IA-IP22, IA insect-pest dataset 22; IPM, integrated pest management; ML, Machine Learning; NNCLR,
Nearest Neighbor Contrastive Learning of Visual Representations; RGB, red, green, and blue; sgd, stochastic gradient descent; SSL, self-supervised learning;
TN, true negatives; TP, true positives.
Soumyashree Kar and Koushik Nagasubramanian contributed equally.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided
the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2023 The Authors. The Plant Phenome Journal published by Wiley Periodicals LLC on behalf of American Society of Agronomy and Crop Science Society of America.
The Plant Phenome J. 2023;6:e20079. wileyonlinelibrary.com/journal/ppj2 1 of 20

https://doi.org/10.1002/ppj2.20079
25782703, 2023, 1, Downloaded from https://acsess.onlinelibrary.wiley.com/doi/10.1002/ppj2.20079 by Cochrane Saudi Arabia, Wiley Online Library on [13/09/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
2 of 20 KAR ET AL.
classification. NNCLR was the best performing SSL method for both linear and full
model fine-tuning. With just 5% annotated images, transfer learning with ImageNet
initialization obtained 74% accuracy, whereas NNCLR achieved an improved clas-
sification accuracy of 79% for end-to-end fine-tuning. Models created using SSL
pre-training consistently performed better, especially under very low annotation, and
were robust to object class imbalances. These approaches help overcome annotation
bottlenecks and are resource efficient.
1 INTRODUCTION DL architecture enables the extraction of a suite of features

from images using a multilayer neural network, such as Con-
Insect pests cause yield losses of up to 40% globally, with esti- vNet or ResNet (Li et al., 2019). Therefore, several studies
mated revenue losses of $220 billion (Gullino et al., 2021). have reported the comparative performance of multiple DL
Insect populations are influenced by temperature and other architectures with respect to conventional supervised methods
environmental conditions, so future climate change is pre- in classifying crop insect pests (Tetila et al., 2020; Then-
dicted to affect insect-pest outbreaks (Liebhold & Bentz, mozhi & Srinivasulu Reddy, 2019; Xia et al., 2018). The three
2011; Skendžić et al., 2021). Resistant varieties and integrated most reported convolutional neural network (CNN) models
pest management (IPM) strategies are effective methods to for insect-pest classification are versions of VGGNet, ResNet,
control insect pests. IPM and other management interven- and MobileNet, albeit some studies have reported 98% accu-
tions require the timely identification of different insect pests, racy in classifying multiple crop insect pests by fine-tuning
which also reduces the usage of excessive pesticides. There- models like GoogLeNet (Chen et al., 2020). The latter is,
fore, developing tools to identify diverse insects would benefit however, both resource- and time-intensive, hence not very
both farmers and the broader science community. The lack of common in this domain (Liu & Wang, 2021; Nanni et al.,
species-specific visual features (due to extreme visual similar- 2022). Considering the challenges in insect-pest classification
ities between various insects), very specific and short activity tasks, large insect datasets have been published, for example,
duration, mobile nature, and the propensity to hide under the the IP102 dataset (Wu et al., 2019), and iNaturalist plant–
leaves in clusters, and so on often lead to misidentification and insect interaction data (Gazdic & Groom, 2019). However,
make insect-pest detection an extremely challenging problem despite utilizing a combined deep-CNN and saliency-based
(Zhong et al., 2018). Timely pest detection would lower pro- approach and being trained on such datasets, models fail to
duction costs and adverse environmental impacts and help perform desirably due to small inter-class and large intra-class
contribute to better human health and food safety (Hao et al., variation in a multi-class pest detection problem (Singh et al.,
2020). 2021a; Tetila et al., 2020).
High-throughput phenotyping (HTP) tasks have been one Supervised DL methods provide promising results with
of the successful applications of machine learning (ML) very high classification accuracies; however, the amount of
and computer vision in the past decade including plant labeling needed to achieve desired accuracies is very high,
stress phenotyping (Singh et al., 2016; 2021a). Since 2016, making their applicability infeasible in many real-world cases
deep learning (DL)-based methods have been successfully (Tetila et al., 2020). Therefore, there is a pressing need to
deployed in a variety of applications to extract plant traits, build a DL-based classification framework to address the
such as pod counting (Riera et al., 2021), crop yield (Shook issue of inter- and intra-class variabilities with limited anno-
et al., 2021), weed detection (Bah et al., 2018; dos Santos Fer- tation. In agriculture or other domains where data labeling
reira et al., 2017; Osorio et al., 2020; Razfar et al., 2022), is difficult, costly, time-consuming, or complex, there is
insect identification (Ahmad et al., 2022; Bereciartua-Pérez a need to overcome the challenges of limited annotation,
et al., 2022; Li et al., 2021), disease detection (Ghosal et al., so that a robust DL method classification framework can
2018; Kulkarni, 2018; Mohanty et al., 2016; Rairdin et al., be created. In this context, a state-of-the-art self-supervised
2022; Rangarajan et al., 2018), nutrient deficiency detection learning (SSL) approach has been developed that learns use-
(Azimi et al., 2021; Bahtiar et al., 2020; Barbedo, 2019; ful latent representations from input data without human
Waheed et al., 2022; Yi et al., 2020), and root nodules (Jubery annotations. The efficiency of employing an SSL approach
et al., 2021). Although conventional DL-based supervised over the conventional supervised methods has been shown
classification and object detection are powerful models, they in diverse domains, for example, diagnosis from medical
require large volumes of labeled data (Singh et al., 2018). The imaging (Masood et al., 2015; Shurrab & Duwairi, 2022),
KAR ET AL. 3 of 20
autonomous navigation systems (Kahn et al., 2021), seis-

mic imaging (Wang et al., 2020), and plant phenotyping Core Ideas
(Nagasubramanian et al., 2022).
The SSL approach is built on a set of latent features, and ∙ Insect pests cause significant damage to food
therefore, carrying out downstream tasks gets very conve- production.
nient with the significantly reduced amount of labeled data ∙ Early detection and mitigation of insect pests are
while performance is comparable with that of supervised crucial in managing economic threshold level.
learning (Caron et al., 2021; Grill et al., 2020; Nagasubra- ∙ We developed a self-supervised learning (SSL)
manian et al., 2021). An integral aspect of SSL that enables model to identify insect pests with minimal anno-
the learning of latent and complex high-level features from tations.
non-labeled data is augmentation. Tuning different augmen- ∙ SSL models greatly improve the identification and
tation parameters allows the backbone architecture to learn classification tasks.
the underlying distortion-agnostic representations (Misra & ∙ Entropy-masking-based segmentation aids SSL
van der Maaten, 2020). Thus, the pre-trained models obtained effectiveness.
via SSL could be fine-tuned on annotated examples for target
transfer-learning tasks. This becomes even more applicable
where HTP is routinely utilized or deployed as a large trove of
data is created, and classification tasks are the goal (Agastya segmented images with 70.87% accuracy, whereas with raw
et al., 2021; Margapuri & Neilsen, 2021; Nagasubramanian images, the method was just 58.59% accurate. Additionally,
et al., 2021; Singh et al., 2021a). compared to supervised learning, SSL with segmented images
Our main objectives were to develop an efficient classi- yielded visible performance gains. Such findings will be
fication model for economically important 22 insect classes applicable to crop production and plant breeding (Singh et al.,
in field and horticultural crops in Iowa, generate insight into 2021b).
real-world challenges faced in processing a large dataset for
DL, present strategies to handle imbalanced dataset in var-
ious insect-pest classes using SSL, and solve fine-grained 2 MATERIALS AND METHODS
inter- and intra-class classification problems. As real-field
images of insect pests are confounded with larger and complex 2.1 Dataset
backgrounds compared to the foreground, we hypothesize
that image segmentation can aid with better latent repre- Although multiple insect-pest datasets have been reported,
sentations from the foreground that can improve the overall including open source (Gazdic & Groom, 2019; Wu et al.,
classification performance. Therefore, this work focuses on 2019), we emphasized real-world settings and did not include
demonstrating the role of efficient pre-training of the SSL any images sourced from the internet, to make the applica-
methods for a significant reduction in the need for human tion easier in real-life settings that farmers and agronomists
annotation, and comparative performance assessment on both will encounter in agriculture. Further, the available insect
raw and segmented images. Further, we show the ability datasets are mostly limited in the total image count, for exam-
of foreground-aware SSL in addressing the abovementioned ple, 200 images (Venugoban & Ramanan, 2014), 1440 images
challenges and improved model performance. In this con- (Xie et al., 2015), 5000 images (Tetila et al., 2020), and are
text, we present a novel insect-pest dataset (IA-IP22) collected sometimes crop-specific (Tetila et al., 2020; Venugoban &
from several fields in Iowa, comprising 22 insect-pest classes Ramanan, 2014). Limited size and variability in a dataset con-
and 14,665 images collected using smartphones. Using this strain the training of DL models in satisfactorily capturing
useful dataset, we investigated the efficacy of SSL in clas- the complex features for the detection or classification of the
sifying 22 insect-pest classes via a meticulous DL-based insect pests, which inherently possess significant inter- and
classification framework that involves comparative assess- intra-class variabilities (Wu et al., 2019).
ment across 3 SSL algorithms, Nearest-Neighbor Contrastive To create a novel insect-pest dataset with practical applica-
Learning of Visual Representations (NNCLR) (Dwibedi et al., tions, a team of agronomists visited several fields in the state
2021), Bootstrap Your Own Latent (BYOL) (Grill et al., of Iowa (IA), United States with the objective to collect insect-
2020), and Barlow Twins (Zbontar et al., 2021). The SSL pest images of common species present in different crops.
and conventional transfer learning methods were employed For real-world applicable images, smartphones (Android and
to address the inter- and intra-class variabilities using raw iOS) were used to take photos throughout the day with a team
and segmented images. In both methods, segmented images of five people who collected images over the course of sev-
produced better results. For instance, with just 3% training eral weeks in July–August 2021. This ensured varied image
data, the NNCLR self-supervised learner could classify the specification, image variation across people/camera, and time
4 of 20 KAR ET AL.
of the day. Due to the presence of different insect species in tered on a leaf or flower, creating an impression of
varying numbers, we have an imbalance dataset in different overlapping objects.
insect species classes, which was desirable for the objective vii. Insects from different classes were found together in
of this research project. Additional variation was created due the same image.
to the imaging of insects in various crops, leaf, or stem in the
background, different zooms while taking images, and vari- These variabilities (Figure 3) not only make the classi-
ation in types of insect species present. It was noticed that fication task challenging but also make the dataset unique,
the insects appeared at the top of the canopy mostly during because it unravels the opportunities for solving complex
the early morning or evening hours, when the temperature real-world computer vision problems (Singh et al. 2021c).
and environmental conditions were mild. This characteristic
in insect sightings was also reported by Tetila et al. (2020).
However, some insects like the Japanese beetles could be 2.3 Description of the SSL methods
found in clusters throughout the day. We did not experience
any challenge in collecting sufficient images for 21 of 22 SSL methods differ based on the augmentation approaches
insect species. However, fall armyworm (FAW) (Spodoptera and the loss function definitions, which control the selection
frugiperda) was difficult to collect images because those were of the constraints and the way an optimal solution is achieved.
rarely sighted compared to the other insects. Hence, the lar- The three SSL methods leveraged in this study are briefly
vae were first reared and grown in the lab and then imaged described below.
with varying background conditions to get a sizable number
of FAW images. 2.4 BYOL
To incorporate variability in the dataset, the images were
also taken from varying camera angles with an intent to serve BYOL is a distillation-based SSL method that does not rely on
as a natural augmentation technique in training the mod- negative samples, unlike contrastive methods. It rather works
els. Thus, the mentioned insect-pest dataset includes both on two same architecture networks, the online and the tar-
between- and within-species variability in terms of type, get network. The online network is tasked with learning the
size, shape, and overall visual features. All these phenotyping representations for an augmented view of an image, then pre-
efforts led to the creation of “IA insect-pest dataset 22,” that dicting the representations of the target network trained on
is, IA-IP22, which comprises 14,665 images across 22 insects another augmentation of the same image. Although the online
(Figure 1), and the number of insects per class varies from 95 network gets updated as per the prediction errors, the tar-
to 1653 (Figure 2). As few insects were extremely tiny to pho- get network weights are also simultaneously updated with the
tograph, a very close-range 5× zoomed mode was primarily moving averages of the online network weights. Thus, BYOL
used; however, the zoom level differed based on the insect enables self-supervision by learning interactively from two
type and their location on the plant canopy. In the following encoder networks (Grill et al., 2020).
section, the challenges faced with and the methodology for
handling such data are demonstrated.
2.5 Barlow Twins
2.2 Challenges in classifier training Barlow Twins also leverages two identical networks to learn
image features, like BYOL. However, in the Barlow Twins
Using this dataset is challenging from the ML perspective due method, embeddings from both the networks trained on dif-
to the following reasons: ferent augmentations of the same images are cross-correlated.
The model is optimized by making the cross-correlation
i. Several classes had large intra-class variability in size, matrix close to identity, such that the learned embeddings are
shape, color, patterns, and texture. distortion-agnostic providing maximized information. The
ii. Insects from different classes looked very similar, that objective function thus tries to minimize redundancy between
is, very small inter-class variability. the representations learned from the networks and works on a
iii. The dataset was highly class imbalanced. simpler concept than BYOL (Zbontar et al., 2021).
iv. There was a large background compared to the insect
or the foreground.
v. Due to varying illumination conditions in a day, 2.6 NNCLR
shadow effects were also found.
vi. Many images consisted of multiple instances of the NNCLR exploits a contrastive learning approach to find-
same insect, in cases where insects were found clus- ing positives from other samples closest in the latent space
5 of 20
F I G U R E 1 An illustration of some of the insect pest images collected from Iowa State University research fields in Iowa, USA. These
Plot representing the count of insect per class, arranged in descending order (top to bottom).
represent the variety, type, and quality of the collected images.
FIGURE 2
KAR ET AL.
6 of 20 KAR ET AL.
F I G U R E 3 (A) Single (left) and multiple (right) instances of the same insect, milkweed bug; (B) two examples of similar-looking insects from
different classes—(B-i) black soldier beetle (left) and sap beetle (right) and (B-ii) southern corn rootworm (left) and bean leaf beetle (right); (C) two
examples of camouflaging background effect with an instance of a northern corn rootworm in each; (D) intra-class variability in the same insect
class, bean leaf beetle; (E) multiple insect classes in the same image—(E-i) a lady beetle, one soldier beetle and two northern corn rootworms and
(E-ii) a northern and a western corn rootworm; (F) visual similarity between western corn rootworm (left) and striped cucumber beetle (right); (g)
instances of both noisy background and multiple insects in the same image—(G-i) northern and western corn rootworm and (G-ii) northern corn
rootworm and milkweed bug; (H) background and illumination effects on the foreground, an instance of bean leaf beetle in each.
than using augmentations of the same image. This enables sidering the complexity of the images, SSL performance
increasing semantic variability compared to the latter. The on raw and segmented images was compared via linear
networks thus learn beyond a single discriminative instance evaluation of the representations learned in both cases. Sub-
in providing better invariance to different viewpoints, defor- sequently, supervised fine-tuning was performed to compare
mations, and even intra-class variations. This not only makes supervised versus self-supervised results. In this process,
the method less reliant on complex data augmentations but two backbone architectures, ResNet-18 and -50 (RN 18/50),
also helps with significant improvement in performance in were examined for different sampling strategies, random,
downstream tasks. random-augmented, diverse, and diverse-augmented, with
label fractions of the sample varying from 0.1%, 0.3%, 0.5%,
and 100%. All the experiments were repeated three times,
2.7 Workflow and the average results from each method were used to
compare between SSL and SL performances. Thus, this paper
The classification framework comprises three major primarily aims to examine the minimum amount of training
steps: data pre-processing and extraction of segmented data needed to obtain at least 80% classification accuracy,
images, deriving latent representations through different and how efficiently SSL helps in handling class imbalance.
self-supervised procedures, and finally, classification. Con- The detailed methodology is illustrated in Figure 4.
KAR ET AL. 7 of 20
F I G U R E 4 Detailed methodology flowchart, representing the two input sets, raw/segmented, the backbone architectures ResNet-18 and 50
(RN 18/50), sampling strategies—random, random-augmented, diverse, and diverse-augmented, and the labeled fractions of sample varying from
0.1%, 0.3%, 0.5%, to 100%.
2.8 Data pre-processing (raw and is first removed via segmentation before executing the pre-
segmented) and pre-training setup training methods. The visual difference between the raw and
the segmented images is shown (Figure 5) in BGR format,
The dataset was first cleaned up by removing the duplicate and the default format used by the OpenCV library employed
empty images. Then it was partitioned in an approximately for the image pre-processing operations in this study. In the
70:15:15 ratio yielding 10,725 training images, 2081 valida- segmented images, much of the background is removed; how-
tion, and 1859 test images. All the images were resized to ever, essential visual patterns in the foreground are retained,
224 × 224 dimensions for processing efficiency, and then, for example, the bean leaf beetle, and the aphid images are
training samples were labeled with increasing proportion, desirably segmented despite very high similarity between the
that is, [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10, 30, 50, 70, 100]. foreground and background.
This approach would eventually help identify the amount For segmentation, the local entropy-based (Hržić et al.,
of training data ideally required for reasonable SSL perfor- 2019) masking approach was leveraged to segment an image
mance. In this study, four different sampling strategies were based on the level of complexity contained in a given neigh-
adopted: (a) random, (b) diverse (by selecting diverse samples borhood, defined by the structuring element, disk radius. The
from the latent space of the encoder output (Bortolato et al., entropy filter first detects subtle variations in the local gray
2022), (c) random-augmented, and (d) diverse-augmented. level distribution in the defined neighborhood and captures
Although the two former training sample sets (a, b) included the inherent properties of the transition regions. Image bina-
imbalanced classes, the latter two (c, d) were augmented rization was then performed using a threshold of 0.8 to obtain
via over-sampling for ensuring balanced classes. Again, this the mask. On applying the resultant mask to the grayscale
strategy was adopted to test the impact of an imbalanced image, only those portions of the image that exceeded the
dataset on SSL results. Thus, there were four training sets threshold were retained. The resultant entropy-masked image
that differed in the sampling strategy. The entire training was then converted back to the color image format, which
set was replicated, and each image was segmented, to cre- now represented the foreground, which was segmented from
ate the segmented training samples, such that both the raw the background. In this process, for each insect class, the
and segmented training data contained the same images. The foreground-object texture was selectively segregated using
classification framework was then parallelly employed for entropy, by varying the disk radius from 5 to 20. Satisfac-
subsequent analysis of any difference in performance. This tory segmentation results were empirically achieved for a disk
study hypothesizes a possible improvement in the perfor- radius of 20 for southern corn rootworm and flea beetle; for
mance of downstream tasks if much of the noisy background stink bug, northern corn rootworm, and flea beetle, it was 15,
8 of 20 KAR ET AL.
F I G U R E 5 Examples of raw (top row) and corresponding segmented (bottom row) images are shown (in BGR format) for specific insect
classes, northern corn rootworm, flea beetle, corn earworm larvae, bean leaf beetle, and aphids.
and for the remaining insect classes, 5. Similarly, the thresh- ResNet-50 model. We used different label fractions of train-
old for masking was also empirically chosen to be 0.8. This ing sets (0.1%, 0.3%, 0.5%, 0.7%, 1%, 3%, 5%, 7%, 10%, 30%,
segmentation method was adopted because it takes image tex- 50%, 70%, and 100%) for the classifier. All the linear prob-
ture into account rather than color variations and is simpler ing experiments were repeated three times. We also evaluated
and remarkably faster than other reported methods like the the SSL model initializations as shown in Figure 6b. For this,
Simple Linear Iterative Clustering super pixel segmentation we fine-tuned the model end-to-end using supervised learn-
(Stutz, 2015). Thus, once the datasets were prepared, pre- ing. We used different label fractions of training sets (0.1%,
training was performed for 800 epochs by employing SSL 0.3%, 0.5%, 0.7%, 1%, 3%, 5%, 7%, and 10%) for fine-tuning
methods described above. Two backbone architectures were the classifier. Unlike the linear probing evaluation, here we
compared during pre-training, ResNet-18 and ResNet-50, ini- focus on accessing performance when there is a limited bud-
tialized with ImageNet weights (Krizhevsky et al., 2017). get for labeling (set to 10% of the dataset). All the fine-tuning
The hyperparameters were fine-tuned for each of the methods experiments were repeated three times, and the average results
(Table 1), and the model checkpoint with the lowest train- from each method were used to compare between SSL and SL
ing and validation loss was saved for the downstream task. performances.
For training optimization, the stochastic gradient descent opti-
mizer was used for each of the experiments, and the models
were trained using ReLU activations in the convolutional and
2.10 Performance metrics
dense layers.
We calculate the multi-class classification accuracy from the
confusion matrix: true positives (TP), true negatives (TN),
2.9 Linear probing versus end-to-end false positives (FP), and false negatives (FN). TP and TN
fine-tuning are the samples that were correctly classified by the model
and are shown on the main diagonal of the confusion matrix.
We used two different types of evaluation for the SSL methods
FP and FN are the samples that were incorrectly classified
as shown in Figure 6. To evaluate the transfer of representa-
by the model. From these values, the classification accuracy,
tions, a popular evaluation protocol is to freeze the backbone
precision, recall, and F1-score are calculated as follows:
model and train a linear classifier on the final layer repre-
sentation (Kolesnikov et al., 2019) as shown in Figure 6a. 𝑇𝑃 + 𝑇𝑁
This method is used to understand the effectiveness of SSL 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = (1)
𝑇𝑃 + 𝑇𝑁 + 𝐹𝑃 + 𝐹𝑁
representations for downstream classification. Here, we froze
the ResNet backbone model and used the representation from
the final layer of the model to train a linear classifier. A lin- 𝑇𝑃
𝑃 𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 = (2)
ear classifier with 512 nodes was used for the ResNet-18 𝑇𝑃 + 𝐹𝑃
model, and a linear classifier with 2048 nodes was used for the
KAR ET AL. 9 of 20
TA B L E 1 The values of hyperparameters tuned during pre-training of each self-supervised learning (SSL) model.
Hyperparameter BYOL NNCLR Barlow Twins

num_crops_per_aug [1, 1, 6] [1, 1] [1, 1]
Brightness [0.4, 0.4, 0.4] [0.4, 0.4] [0.4, 0.4]
Contrast [0.4, 0.4, 0.4] [0.4, 0.4] [0.4, 0.4]
Saturation [0.2, 0.2, 0.2] [0.2, 0.2] [0.2, 0.2]
Hue [0.2, 0.2, 0.2] [0.1, 0.1] [0.4, 0.4]
color_jitter_prob [0.8, 0.8, 0.8] [0.8, 0.8] [0.8, 0.8]
gray_scale_prob [0.0, 0.0, 0.0] [0.2, 0.2] [0.2, 0.2]
horizontal_flip_prob [0.5, 0.5, 0.5] [0.5, 0.5] [0.5, 0.5]
gaussian_prob [0.1, 0.2, 0.3] [1.0, 0.1] [1.0, 0.1]
solarization_prob [0.0, 0.2, 0.4] [0.0, 0.2] [0.2, 0.4]
crop_size [128, 128, 64] [224, 224] [128, 128]
min_scale [0.08, 0.08, 0.08] [0.08, 0.08] [0.08, 0.08]
max_scale [1.0, 1.0, 1.0] [1.0, 1.0] [1.0, 1.0]
batch_size 128 128 64
Lr 0.02 0.02 0.01
classifier_lr 0.1 0.3 0.3
weight_decay 1.00E − 05 1.00E − 05 0.0001
Optimizer sgd sgd sgd
Note: The list is as per the hyperparameters provided in the solo-learn library (da Costa et al., 2022).
Abbreviations: BYOL, Bootstrap Your Own Latent; NNCLR, Nearest Neighbor Contrastive Learning of Visual Representations; sgd, stochastic gradient descent.
F I G U R E 6 Illustration of (A) linear classification and (B) end-to-end fine-tuning methods, which were used to compare the accuracy of
self-supervised learning (SSL) methods. In (A), only weights of the last fully connected layer are fine-tuned, and in (B), all model weights are
fine-tuned in the end-to-end evaluation.
10 of 20 KAR ET AL.
F I G U R E 7 The mean (across all four sampling strategies and three repetitions) self-supervised learning (SSL) performance with both
ResNet-18 and 50 (RN18/RN50) backbones is plotted for raw and segmented datasets.
𝑇𝑃 training and validation curves also showed that it rapidly

𝑅𝑒𝑐𝑎𝑙𝑙 = (3)
𝑇𝑃 + 𝐹𝑁 reaches the plateau, and further improvement in performance
drastically slows down beyond 200 epochs. This is fol-
lowed by NNCLR with 93.05%, and then Barlow Twins with
𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ⋅ 𝑟𝑒𝑐𝑎𝑙𝑙 88.98%. NNCLR was the most annotation-efficient method
𝐹1 = 2 (4)
𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑟𝑒𝑐𝑎𝑙𝑙 as it reached an accuracy of 90% with just 30% of the training
data.
The segmented images helped enhance the pre-training per-
3 RESULTS AND DISCUSSION formance compared to the raw images, as expected. With
less than 1% labeling of the training data, ∼10%–11% accu-
3.1 Linear probing racy improvement was observed in the case of segmented
images. The highest improvement was noticed in NNCLR.
The overall pre-training results (Figure 7), obtained by taking With just 3% sample size and ResNet-18 backbone, seg-
the mean across all the sampling strategies and three repe- mented images reached an accuracy of 70.87% compared
titions of each, show that with 100% training data, BYOL to 58.59% of raw images, with a remarkable increment of
achieves the highest classification accuracy (94.16%). The 12.27%. As the amount of sampling data increased this
KAR ET AL. 11 of 20
difference was reduced, still leading to an average of 3% incre- tried to learn all the data points including noise and inac-
ment with 100% training samples. This shows that entropy- curate values present in the dataset, thereby reducing model
based image segmentation combined with the NNCLR-SSL accuracy (Santos et al., 2018). There were very minor to no
method could be a highly annotation-efficient solution with differences noticed in the performance between the random
greater than 70% accuracy, even with very low sample size of and diverse sampling strategies. In addition, in the case of
3% (i.e., 3% of 10,725 = 322 images in this case). random sampling, results from both the imbalanced (raw) and
Currently, there are varied SSL implementations for solv- balanced (raw-augmented) datasets were almost similar with
ing fine-grained image classification problem, for exam- no noticeable difference. Thus, these findings confirmed that
ple, semantic learning from the discriminative feature- SSL methods are robust to class imbalance, also suggested in
representations of image parts (Yang et al., 2022; Yu et al., Liu, Zhang et al. (2021), and these methods can achieve better
2022), part-level contrastive learning (Wang et al., 2022), performance with segmented images. Therefore, the subse-
attentively identifying fine-grained images by interaction quent results demonstrate the performance difference between
(Zhuang et al., 2020). However, this study shows the ability linear probing and fine-tuning, based only on the randomly
of local entropy-mask segmentation in enhancing SSL per- sampled segmented images, and do not include the diverse
formance to classify insect pests from complex images, as and augmented cases.
segmentation helps retain mostly the foreground portions that
accentuate the learning of more meaningful representations
during the pretext task, compared to the raw images. In the 3.2 Fine-tuning evaluation
latter case, some of the latent representations could belong to
the image background, which is intuitively not very helpful Figure 9 shows the performance of end-to-end fine-tuning
in generalizing the downstream task. Utilizing image seg- results of ResNet-18 and ResNet-50 models. All the fine-
mentation for aiding supervised classification performance tuning experiments were repeated three times, and the mean
has been found to be beneficial in previous studies (Liu, classification accuracy across the three repetitions is shown in
HaoChen, et al., 2021; Mahbod et al., 2020). Additionally, it Figure 9a,b. NNCLR was the best performing SSL method.
may be noted that such improvement in model effectiveness For 5% of the labeled samples, the NNCLR method obtained
was achieved from “local” entropy-mask-based segmentation a mean classification accuracy of ∼79% for the ResNet-
that may still be influenced by external factors like illumina- 50 model and an accuracy of ∼74% for the ResNet-18
tion and occlusion. Hence, as a future research domain, the model. All the SSL pre-training methods outperformed super-
“locally adaptive” entropy-based thresholding (Zhang et al., vised baseline for end-to-end fine-tuning evaluation. The SSL
2022), which is rather a computationally expensive approach pre-training methods were more annotation efficient than
can be tested to determine the change in performance. ImageNet initialization for training fractions less than 5%. The
Regarding the backbone architecture, ResNet-50-based performance of ImageNet initialization was on-par with SSL
experiments yielded a 5%–9% increase in accuracy than the methods for training fractions greater than 5%. These results
ResNet-18-based experiments, when sample size was 100%. were as expected because evidence suggests that the benefit of
However, when the training size was just 1% or less, ResNet- SSL models increases with the availability of larger amounts
18-based experiments seemed to achieve an average of ∼3% of unlabeled data for pre-training. Among the SSL methods,
higher accuracy than the ResNet-50 ones. Such an effect was Barlow Twins had the lowest performance. For 10% train-
prominent in the BYOL and Barlow Twins methods. However, ing data, the ResNet-50 model obtained a mean classification
in the case of NNCLR, ResNet-50 proved beneficial across all accuracy of 86% and was ∼4% better than the ResNet-18
the sample sizes with a 4% increase in accuracy on an average, model.
both on raw and segmented datasets. This states that when the Figures 10 and 11 show the confusion matrices of the
training size is extremely low, simpler architectures are better ResNet-50 model with ImageNet and NNCLR initializations,
for information maximization or distillation-based SSL meth- respectively. The model was trained with 7% of labeled data,
ods. However, based on the overall result from the comparison and the input images were pre-processed with entropy-based
between the backbone architectures, the sampling strategies segmentation. For confounding classes, like bean leaf beetle
were examined for the ResNet-50-based models (Figure 8). and ladybird beetle, the NNCLR model performed better than
There was no improvement noticed with the augmented ImageNet initialization. The NNCLR initialization obtained
dataset containing balanced classes, on any of the three SSL an accuracy of 96% for bean leaf beetle, whereas the Ima-
methods. It was observed that classification accuracy rather geNet model obtained an accuracy of 78%. Similarly, for
dropped with diverse-augmented samples, particularly if the the confounding classes like FAW and corn earworm lar-
proportion of labeled samples in the training set was less than vae, the NNCLR model obtained accuracies of 97% and 90%,
10%. This could have potentially resulted from over-sampling respectively, whereas the ImageNet model obtained accura-
that led to overfitting for specific classes, where the model cies of 89% and 92%, respectively.
12 of 20 KAR ET AL.
F I G U R E 8 Comparison of the impact of different sampling strategies on each of the self-supervised learning (SSL) methods. For brevity, the
results are plotted for sample sizes of 1%, 5%, 10%, 50%, and 100%, which potentially capture the overall pattern of improvement in classification
accuracy as the sample size increases.
F I G U R E 9 End-to-end fine-tuning evaluation of (A) ResNet-18 and (B) ResNet-50 models using segmented images. The “Supervised” curve
corresponds to training from random initialization. The models were fine-tuned for different label percentage fractions (0.1%, 0.3%, 0.5%, 0.7%, 1%,
3%, 5%, 7%, and 10%).
KAR ET AL. 13 of 20
F I G U R E 1 0 Confusion matrix for ImageNet initialized ResNet-50. The model was trained with 7% of labeled images. The input images were
pre-processed with entropy-based segmentation for removing the background. The 22 classes are “Aphids”: 0, “Bean leaf beetle”: 1, “Corn earworm
larvae”: 2, “Fall armyworm”: 3, “Flea beetle”: 4, “Green lacewing”: 5, “Green leaf hopper”: 6, “Japanese beetle”: 7, “Ladybird beetle”: 8, “Maize
calligrapher”: 9, “Milkweed bug”: 10, “Northern corn rootworm beetle”: 11, “Sap beetle”: 12, “Silver spotted caterpillars”: 13, “Soldier beetle”: 14,
“Southern corn rootworm beetle”: 15, “Soybean nodule fly”: 16, “Stink bug”: 17, “Striped cucumber beetle”: 18, “Tarnished plant bug”: 19,
“Western corn rootworm beetle”: 20, “White fly”: 21.
14 of 20 KAR ET AL.
F I G U R E 1 1 Confusion matrix for Nearest Neighbor Contrastive Learning of Visual Representations (NNCLR) initialized ResNet-50 model
trained on segmented images. The model was trained with 7% of labeled images. The input images were pre-processed with entropy-based
segmentation for removing the background. The 22 classes are “Aphids”: 0, “Bean leaf beetle”: 1, “Corn earworm larvae”: 2, “Fall armyworm”: 3,
“Flea beetle”: 4, “Green lace wing”: 5, “Green leaf hopper”: 6, “Japanese beetle”: 7, “Ladybird beetle”: 8, “Maize calligrapher”: 9, “Milkweed bug”:
10, “Northern corn rootworm beetle”: 11, “Sap beetle”: 12, “Silver spotted caterpillars”: 13, “Soldier beetle”: 14, “Southern corn rootworm beetle”:
15, “Soybean nodule fly”: 16, “Stink bug”: 17, “Striped cucumber beetle”: 18, “Tarnished plant bug”: 19, “Western corn rootworm beetle”: 20,
“White fly”: 21.
KAR ET AL. 15 of 20
T A B L E 2 Precision obtained for each of the 22 classes at 5%, 7%, and 10% proportions of training data, from the ImagNet and Nearest
Neighbor Contrastive Learning of Visual Representations (NNCLR) models.
Precision
5p 7p 10p
ImageNet NNCLR ImageNet NNCLR ImageNet NNCLR
Aphids: 0 0.926 0.955 0.958 0.987 0.990 0.999
Bean leaf beetle: 1 0.546 0.503 0.696 0.653 0.728 0.685
Corn earworm larvae: 2 0.850 0.859 0.925 0.934 0.957 0.966
Fall armyworm: 3 0.820 0.992 0.825 0.997 0.857 0.999
Flea beetle: 4 0.833 0.764 0.888 0.819 0.920 0.871
Green lace wing: 5 0.541 0.884 0.556 0.899 0.676 0.991
Green leaf hopper: 6 0.884 0.763 0.909 0.863 0.941 0.955
Japanese beetle: 7 0.668 0.812 0.743 0.887 0.863 0.919
Ladybird beetle: 8 0.775 0.905 0.820 0.950 0.852 0.982
Maize calligrapher: 9 0.485 0.757 0.520 0.792 0.640 0.824
Milkweed bug: 10 0.976 1.000 0.976 1.000 0.978 1.000
Northern corn rootworm 0.417 0.401 0.467 0.451 0.787 0.483
beetle: 11
Sap beetle: 12 0.959 0.945 0.974 0.960 0.976 0.992
Silver spotted caterpillars: 13 0.967 0.956 0.972 0.981 0.974 0.983
Soldier beetle: 14 0.639 0.699 0.789 0.849 0.841 0.881
Southern corn rootworm 0.753 0.831 0.828 0.906 0.860 0.938
beetle: 15
Soybean nodule fly: 16 0.430 0.472 0.505 0.547 0.625 0.579
Stink bug: 17 0.776 0.643 0.871 0.738 0.903 0.770
Striped cucumber beetle: 18 0.880 0.860 0.935 0.915 0.967 0.947
Tarnished plant bug: 19 0.747 0.788 0.812 0.853 0.932 0.885
Western corn rootworm 0.724 0.659 0.789 0.724 0.821 0.756
beetle: 20
White fly: 21 0.644 0.872 0.699 0.947 0.731 0.979
The precision, recall, and F1-scores are presented in 12% higher than that of ImageNet. Contrarily, western corn
Tables 2–4. Overall, the NNCLR model yielded 4.9%, 5.43%, rootworm beetle was the only class for which the ImageNet
and 2.56% better precision than the ImageNet model with 5%, classifier performed better in all the three metrics, with a mean
10%, and 10% labeling of the training samples, respectively. increase of ∼6% (precision), 2% (recall), and 5% (F1-score)
Similarly, the NNCLR model’s recall was higher by 2.07%, across the three scenarios with 5%, 7%, and 10% labeled data.
4.12%, and 2.0%, whereas the F1-score improved by 2.46%, However, for the minority classes like the green lacewing, and
4.07%, and 0.52% for 5%, 7%, and 10% labeled fractions of the maize calligrapher, NNCLR performed remarkably bet-
the training set. Nevertheless, it was interesting to note that ter. In the case of green lacewing, precision and recall were
some classes, for example, bean leaf beetle, northern corn higher by 33.3% and 22.3%, whereas for maize calligrapher,
rootworm beetle, and stink bug, could be classified with bet- the respective scores were up by 24.3% and 15%. Another
ter precision by the ImageNet, while the corresponding recall notable example demonstrating the efficiency of the SSL-pre-
scores from the NNCLR model were higher. This implied that trained model in correctly classifying a confounding class is
the NNCLR model produced fewer FN, that is, it was bet- that of the southern corn rootworm beetle (with ∼8% higher
ter at identifying both positive and negative samples of the precision, recall, and F1-score), which looks very similar to a
classes with high intra-class variability like the bean leaf bee- bean leaf beetle (Figure 3b-ii).
tle, and the northern corn rootworm beetle that is tan to pale These classification results show that the NNCLR model
green in color and easily camouflages with the background that was trained on smaller in-domain unlabeled data was
in the field. Considering all the three sampling scenarios, able to obtain good accuracy for challenging classes with few
the NNCLR-based recall for the northern corn rootworm was labels compared to ImageNet model that was pre-trained on
16 of 20 KAR ET AL.
T A B L E 3 Recall obtained for each of the 22 classes at 5%, 7%, and 10% proportions of training data, from the ImagNet and Nearest Neighbor
Contrastive Learning of Visual Representations (NNCLR) models.
Recall
5p 7p 10p
Aphids: 0 0.811 0.922 0.836 0.947 0.891 0.972
Bean leaf beetle: 1 0.781 0.959 0.806 0.984 0.861 0.999
Fall armyworm: 3 0.915 0.903 0.920 0.958 0.945 0.983
Flea beetle: 4 0.699 0.519 0.849 0.669 0.944 0.819
Green lace wing: 5 0.230 0.412 0.480 0.662 0.505 0.812
Japanese beetle: 7 0.821 0.660 0.846 0.910 0.941 0.965
Ladybird beetle: 8 0.730 0.742 0.805 0.817 0.955 0.842
Milkweed bug: 10 0.683 0.683 0.883 0.883 0.938 0.908
beetle: 11
Sap beetle: 12 0.750 0.450 0.845 0.545 0.900 0.570
Soldier beetle: 14 0.858 0.964 0.898 0.994 0.923 0.999
beetle: 15
Stink bug: 17 0.675 0.741 0.805 0.834 0.900 0.859
beetle: 20
White fly: 21 0.778 0.741 0.793 0.891 0.888 0.946
large, labeled data from out-of-domain. This showed that SSL downstream tasks (Liu, Zhang, et al., 2021; Yang & Xu,
could solve fine-grained inter- and intra-class classification 2020). More specifically, SSL is not actuated by any labels,
problems, because the bean leaf beetle class contained the unlike the SL approach. Hence, SSL is not limited to learning
high intra-class variability, whereas the confounding classes only the label-relevant features that help predict the frequent
had fine-grained inter-class variability. As the proportion of classes, but rather a diverse set of generalizable represen-
labeled samples increased from 5% to 10%, the recall or the tations, including both label-relevant and irrelevant features
ability of the SSL method in correctly identifying the bean from unlabeled data. Learning during the pretext task also
leaf beetle images increased from 95.9% to 99.9%, compared contributes to the representation-invariance property of an
to a recall of 0.861 by the ImageNet model, when trained with SSL model (Tendle & Hasan, 2021), such that it captures
just 10% labeled samples. Similar patterns in the results were the ingrained characteristics of the input distribution, that are
also observed in the case of confounding classes like green generalizable or transferable to downstream tasks. Therefore,
lace wing and the green leaf hopper, also identified as one SSL methods can generalize to rare classes better than SL
of the minority classes in the dataset. Aphids is another class approaches. SSL’s robustness to class imbalance is thoroughly
with high fine-grained variability, which could be classified demonstrated by Liu, Zhang et al. (2021), and the general-
with 92% accuracy with 7% training using SSL, whereas the izability of self-supervised representations is discussed by
ImageNet method’s accuracy was 11% lower. Tendle and Hasan (2021).
Such robustness of SSL to dataset imbalance could be Overall, the SSL methods provide an exciting opportunity
attributed to its ability to learn richer features that are trans- and application in the plant science domain. At the same
ferable across layers to help classify the rare classes and time, there are several open questions that require future
KAR ET AL. 17 of 20
T A B L E 4 F1-Score obtained for each of the 22 classes at 5%, 7%, and 10% proportions of training data, from the ImagNet and Nearest
Neighbor Contrastive Learning of Visual Representations (NNCLR) models.
F1-Score
5p 7p 10p
Aphids: 0 0.864 0.938 0.893 0.967 0.938 0.985
Bean leaf beetle: 1 0.643 0.660 0.747 0.785 0.789 0.813
Fall armyworm: 3 0.865 0.945 0.870 0.977 0.899 0.991
Flea beetle: 4 0.760 0.618 0.868 0.736 0.932 0.844
Green lace wing: 5 0.323 0.562 0.515 0.762 0.578 0.892
Japanese beetle: 7 0.737 0.728 0.791 0.898 0.900 0.941
Ladybird beetle: 8 0.752 0.815 0.812 0.878 0.901 0.906
Milkweed bug: 10 0.804 0.811 0.927 0.938 0.958 0.952
beetle: 11
Sap beetle: 12 0.842 0.610 0.905 0.695 0.936 0.724
Soldier beetle: 14 0.733 0.810 0.840 0.916 0.880 0.936
beetle: 15
Stink bug: 17 0.722 0.689 0.837 0.783 0.902 0.812
beetle: 20
White fly: 21 0.705 0.801 0.743 0.918 0.802 0.962
research. The SSL-based insect-pest identification should without prior image segmentation, to circumvent data anno-
investigate (a) designing pretext classes specifically to insect- tation challenges that plague plant scientists as biological
pest classification, (b) using class-specific loss functions, (c) systems are inherently very complex. We found that SSL-
pre-training with both out-of-domain and in-domain data, and pre-trained models were annotation efficient for insect-pest
(d) developing a mobile application for farmers and breeders. classification. For learning with few labels, the model ini-
tializations and latent representation from NNCLR was better
than the ImageNet model. Pre-training with segmented input
4 CONCLUSIONS images provided better performance than the original images.
All the SSL methods performed better than the supervised
This paper presents an IA insect-pest dataset that gener- baseline for both linear probing and end-to-end evaluation.
ates exciting opportunities for researchers and practitioners The SSL-pre-trained models were robust to class imbalances
to utilize the dataset in ML model development. This dataset and were able to differentiate confounding insect classes.
includes (a) several classes with large intra-class variability in These results indicate the usefulness of SSL methods, espe-
size, shape, color, patterns, and texture; (b) insects from differ- cially with segmented images for data labeling/annotation
ent classes that look similar; (c) high class imbalance; (d) large challenges to save time, cost, physical resource, computa-
background noise compared to the insect or the foreground; tion, and integrate phone-based imaging with ML pipeline
(e) varying illumination conditions and shadows; (f) overlap- that can work across geographies to help identify and even-
ping objects in the image; (g) multiple insect-pest species tually control insect pests in the field. SSL models from our
in the same image frame. Using this insect-pest dataset, paper will be efficient in solving a variety of plant phenomics
we thoroughly investigated different SSL methods, with and problems, which includes the early detection of insect pests,
18 of 20 KAR ET AL.
species identification, damage assessment, yield loss due to REFERENCES

insect infestation, and provide vital information to farming Agastya, C., Ghebremusse, S., Anderson, I., Reed, C., Vahabi, H.,
community to maintain a healthy crop cycle. & Todeschini, A. (2021). Self-supervised contrastive learning for
irrigation detection in satellite imagery. ArXiv Preprint.
Ahmad, I., Yang, Y., Yue, Y., Ye, C., Hassan, M., Cheng, X., Wu, Y.,
AU T H O R C O N T R I B U T I O N S & Zhang, Y. (2022). Deep learning based detector YOLOv5 for iden-
Soumyashree Kar: Conceptualization; data curation; formal tifying insect pests. Applied Sciences, 12(19), 10167. https://doi.org/
analysis; investigation; methodology; validation; writing— 10.3390/app121910167
original draft; writing—review and editing. Koushik Naga- Azimi, S., Kaur, T., & Gandhi, T. K. (2021). A deep learning approach
subramanian: Data curation; formal analysis; investi- to measure stress level in plants due to Nitrogen deficiency. Mea-
gation; methodology; validation; writing—original draft. surement, 173, 108650. https://doi.org/10.1016/j.measurement.2020.
108650
Dinakaran Elango: Data curation; investigation; method-
Bah, M., Hafiane, A., & Canals, R. (2018). Deep learning with unsuper-
ology; writing—original draft; writing—review and edit- vised data labeling for weed detection in line crops in UAV images.
ing. Matthew E. Carroll: Writing—review and edit- Remote Sensing, 10(11), 1690. https://doi.org/10.3390/rs10111
ing. Craig A. Abel: Funding acquisition; methodology; 690
resources; writing—review and editing. Ajay Nair: Inves- Bahtiar, A. R., Pranowo, P., Santos, A. J., & Juhariah, J. (2020). Deep
tigation; resources; writing—review and editing. Daren S. learning detected nutrient deficiency in chili plant. 2020 8th Inter-
Mueller: Investigation; resources; writing—review and edit- national Conference on Information and Communication Technology
(ICoICT) (pp. 1–4). IEEE. https://doi.org/10.1109/ICoICT49345.
ing. Matthew E. O’Neal: Investigation; resources; writing—
2020.9166224
review and editing. Asheesh K. Singh: Conceptualization; Barbedo, J. G. A. (2019). Detection of nutrition deficiencies in plants
investigation; methodology; resources; validation; writing— using proximal images and machine learning: A review. Comput-
review and editing. Soumik Sarkar: Funding acquisi- ers and Electronics in Agriculture, 162, 482–492. https://doi.org/10.
tion; methodology; resources; writing—review and editing. 1016/j.compag.2019.04.035
Baskar Ganapathysubramanian: Conceptualization; data Bereciartua-Pérez, A., Gómez, L., Picón, A., Navarra-Mestre, R.,
curation; funding acquisition; investigation; methodology; Klukas, C., & Eggers, T. (2022). Insect counting through deep
learning-based density maps estimation. Computers and Electronics
project administration; resources; supervision; validation;
in Agriculture, 197, 106933. https://doi.org/10.1016/j.compag.2022.
visualization; writing—review and editing. Arti Singh: Con-
106933
ceptualization; data curation; formal analysis; funding acqui- Bortolato, B., Smolkovič, A., Dillon, B. M., & Kamenik, J. F. (2022).
sition; investigation; methodology; project administration; Bump hunting in latent space. Physical Review D, 105(11), 115009.
resources; supervision; validation; writing—original draft; https://doi.org/10.1103/PhysRevD.105.115009
writing—review and editing. Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski,
P., & Joulin, A. (2021). Emerging properties in self-supervised vision
transformers. Proceedings of the IEEE/CVF International Conference
AC K N OW L E D G M E N T S on Computer Vision (pp. 9650–9660). IEEE.
We thank team members and collaborators of Arti Singh and Chen, H., Chen, A., Xu, L., Xie, H., Qiao, H., Lin, Q., & Cai, K. (2020).
Asheesh K Singh’s group for their help in imaging and data A deep learning CNN architecture applied in smart near-infrared
collection. analysis of water pollution for agricultural irrigation resources. Agri-
Open access funding provided by the Iowa State University cultural Water Management, 240, 106303. https://doi.org/10.1016/j.
agwat.2020.106303
Library.
da Costa, V. G. T., Fini, E., Nabi, M., Sebe, N., & Ricci, E. (2022). Solo-
learn: A library of self-supervised methods for visual representation
C O N F L I C T O F I N T E R E ST STAT E M E N T learning. Journal of Machine Learning Research, 23, 1–6.
The authors declare no conflicts of interest. dos Santos Ferreira, A., Matte Freitas, D., Gonçalves da Silva, G., Pistori,
H., & Theophilo Folhes, M. (2017). Weed detection in soybean crops
using ConvNets. Computers and Electronics in Agriculture, 143, 314–
D A T A AVA I L A B I L I T Y S T A T E M E N T
324. https://doi.org/10.1016/j.compag.2017.10.027
Raw dataset and the Python code used for analysis can be Dwibedi, D., Aytar, Y., Tompson, J., Sermanet, P., & Zisserman,
accessed at https://github.com/SoylabSingh/Insect1. A. (2021). With a little help from my friends: Nearest-neighbor
contrastive learning of visual representations. Proceedings of the
ORCID IEEE/CVF International Conference on Computer Vision (pp. 9588–
Soumyashree Kar https://orcid.org/0000-0003-2158-2540 9597). IEEE.
Dinakaran Elango https://orcid.org/0000-0003-2226- Gazdic, M., & Groom, Q. (2019). iNaturalist is an unexploited source
of plant-insect interaction data. Biodiversity Information Science and
486X
Standards, 3, e37303. https://doi.org/10.3897/biss.3.37303
Asheesh K. Singh https://orcid.org/0000-0002-7522-037X
Ghosal, S., Blystone, D., Singh, A. K., Ganapathysubramanian, B.,
Arti Singh https://orcid.org/0000-0001-6191-9238 Singh, A., & Sarkar, S. (2018). An explainable deep machine vision
KAR ET AL. 19 of 20
framework for plant stress phenotyping. Proceedings of the National Liu, Y., Zhang, Z., Liu, X., Wang, L., & Xia, X. (2021). Efficient image
Academy of Sciences of the United States of America, 115(18), segmentation based on deep learning for mineral image classification.
4613–4618. https://doi.org/10.1073/pnas.1716999115 Advanced Powder Technology, 32(10), 3885–3903.
Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P. H., Mahbod, A., Tschandl, P., Langs, G., Ecker, R., & Ellinger, I. (2020).
Buchatskaya, E., Doersch, C., Pires, B. A., Guo, Z. D., Azar, M. The effects of skin lesion segmentation on the performance of der-
G., Piot, B., Kavukcuoglu, K., Munos, R., & Valko, M. (2020). matoscopic image classification. Computer Methods and Programs
Bootstrap your own latent: A new approach to self-supervised in Biomedicine, 197, 105725.
Learning. Advances in Neural Information Processing Systems, 33, Margapuri, V., & Neilsen, M. (2021). Classification of seeds using
21271–21284. domain randomization on self-supervised learning frameworks. 2021
Gullino, M., Albajes, R., Al-Jboory, I., Angelotti, F., Chakraborty, IEEE Symposium Series on Computational Intelligence (SSCI) (pp.
S., Garrett, K., Hurley, B., Juroszek, P., Makkouk, K., Pan, X., & 01–08). IEEE. https://doi.org/10.1109/SSCI50451.2021.9659998
Stephenson, T. (2021). Scientific review of the impact of climate Masood, A., Al-Jumaily, A., & Anam, K. (2015). Self-supervised
change on plant pests. FAO on behalf of the IPPC Secretariat. https:// learning model for skin cancer diagnosis. 2015 7th International
doi.org/10.4060/cb4769en IEEE/EMBS Conference on Neural Engineering (NER) (pp. 1012–
Hao, G.-F., Zhao, W., & Song, B.-A. (2020). Big data platform: An 1015). IEEE. https://doi.org/10.1109/NER.2015.7146798
emerging opportunity for precision pesticides. Journal of Agricultural Misra, I., & van der Maaten, L. (2020). Self-supervised learning
and Food Chemistry, 68(41), 11317–11319. https://doi.org/10.1021/ of pretext-invariant representations. Proceedings of the IEEE/CVF
acs.jafc.0c05584 Conference on Computer Vision and Pattern Recognition (pp. 6707–
Hržić, F., Štajduhar, I., Tschauner, S., Sorantin, E., & Lerga, J. 6717). IEEE.
(2019). Local-entropy based approach for X-ray image segmentation Mohanty, S. P., Hughes, D. P., & Salathé, M. (2016). Using deep learning
and fracture detection. Entropy, 21(4), 338. https://doi.org/10.3390/ for image-based plant disease detection. Frontiers in Plant Science, 7,
e21040338 1419. https://doi.org/10.3389/fpls.2016.01419
Jubery, T. Z., Carley, C. N., Singh, A., Sarkar, S., Nagasubramanian, K., Jubery, T., Fotouhi Ardakani, F., Mirnezami, S.
Ganapathysubramanian, B., & Singh, A. K. (2021). Using V., Singh, A. K., Singh, A., Sarkar, S., & Ganapathysubramanian, B.
machine learning to develop a fully automated Soybean Nodule (2021). How useful is active learning for image-based plant pheno-
Acquisition Pipeline (SNAP). Plant Phenomics, 2021, 9834746. typing? The Plant Phenome Journal, 4(1), e20020. https://doi.org/10.
https://doi.org/10.34133/2021/9834746 1002/ppj2.20020
Kahn, G., Abbeel, P., & Levine, S. (2021). BADGR: An autonomous Nagasubramanian, K., Singh, A. K., Singh, A., Sarkar, S., &
self-supervised learning-based navigation system. IEEE Robotics and Ganapathysubramanian, B. (2022). Plant phenotyping with limited
Automation Letters, 6(2), 1312–1319. https://doi.org/10.1109/LRA. annotation: Doing more with less. The Plant Phenome Journal, 5,
2021.3057023 e20051. https://doi.org/10.1002/ppj2.20051
Kolesnikov, A., Zhai, X., & Beyer, L. (2019). Revisiting self-supervised Nanni, L., Manfè, A., Maguolo, G., Lumini, A., & Brahnam, S. (2022).
visual representation learning. Proceedings of the IEEE/CVF Confer- High performing ensemble of convolutional neural networks for insect
ence on Computer Vision and Pattern Recognition (pp. 1920–1929). pest image detection. Ecological Informatics, 67, 101515. https://doi.
IEEE. org/10.1016/j.ecoinf.2021.101515
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet Osorio, K., Puerto, A., Pedraza, C., Jamaica, D., & Rodríguez, L. (2020).
classification with deep convolutional neural networks. Commu- A deep learning approach for weed detection in lettuce crops using
nications of the ACM, 60(6), 84–90. https://doi.org/10.1145/3065 multispectral images. AgriEngineering, 2(3), 471–488. https://doi.
386 org/10.3390/agriengineering2030032
Kulkarni, O. (2018). Crop disease detection using deep learning. 2018 Rairdin, A., Fotouhi, F., Zhang, J., Mueller, D. S.,
Fourth International Conference on Computing Communication Con- Ganapathysubramanian, B., Singh, A. K., Dutta, S., Sarkar, S., &
trol and Automation (ICCUBEA) (pp. 1–4). IEEE. https://doi.org/10. Singh, A. (2022). Deep learning-based phenotyping for genome wide
1109/ICCUBEA.2018.8697390 association studies of sudden death syndrome in soybean. Frontiers in
Li, W., Chen, P., Wang, B., & Xie, C. (2019). Automatic localization Plant Science, 13, 966244. https://doi.org/10.3389/fpls.2022.966244
and count of agricultural crop pests based on an improved deep learn- Rangarajan, A. K., Purushothaman, R., & Ramesh, A. (2018). Tomato
ing pipeline. Scientific Reports, 9(1), 7024. https://doi.org/10.1038/ crop disease classification using pre-trained deep learning algo-
s41598-019-43171-0 rithm. Procedia Computer Science, 133, 1040–1047. https://doi.org/
Li, W., Zheng, T., Yang, Z., Li, M., Sun, C., & Yang, X. (2021). 10.1016/j.procs.2018.07.070
Classification and detection of insects from field images using deep Razfar, N., True, J., Bassiouny, R., Venkatesh, V., & Kashef, R. (2022).
learning for smart pest management: A systematic review. Ecolog- Weed detection in soybean crops using custom lightweight deep learn-
ical Informatics, 66, 101460. https://doi.org/10.1016/j.ecoinf.2021. ing models. Journal of Agriculture and Food Research, 8, 100308.
101460 https://doi.org/10.1016/j.jafr.2022.100308
Liebhold, A., & Bentz, B. (2011). Insect disturbance and climate change. Riera, L. G., Carroll, M. E., Zhang, Z., Shook, J. M., Ghosal, S., Gao, T.,
USDA Forest Service, Climate Change Resource Center. www.fs. Singh, A., Bhattacharya, S., Ganapathysubramanian, B., Singh, A. K.,
usda.gov/ccrc/topics/insectdisturbance/insect-disturbance & Sarkar, S. (2021). Deep multiview image fusion for soybean yield
Liu, H., HaoChen, J. Z., Gaidon, A., & Ma, T. (2021). Self-supervised estimation in breeding applications. Plant Phenomics, 2021, 9846470.
learning is more robust to dataset imbalance. arXiv preprint. https://doi.org/10.34133/2021/9846470
Santos, M. S., Soares, J. P., Abreu, P. H., Araujo, H., & Santos, J. (2018).
Liu, J., & Wang, X. (2021). Plant diseases and pests detection based on
Cross-validation for imbalanced datasets: Avoiding overoptimistic
deep learning: A review. Plant Methods, 17, 22. https://doi.org/10.
and overfitting approaches [research frontier]. IEEE Computational
1186/s13007-021-00722-9
Intelligence Magazine, 13(4), 59–76.
20 of 20 KAR ET AL.
Shook, J., Gangopadhyay, T., Wu, L., Ganapathysubramanian, B., 30th ACM International Conference on Multimedia (pp. 6416–6424).
Sarkar, S., & Singh, A. K. (2021). Crop yield prediction integrating Association for Computing Machinery.
genotype and weather variables using deep learning. PLoS One, 16(6), Wang, M., Xu, S., & Zhou, H. (2020). Self-supervised learning for
e0252402. https://doi.org/10.1371/journal.pone.0252402 low frequency extension of seismic data. SEG Technical Program
Shurrab, S., & Duwairi, R. (2022). Self-supervised learning methods and Expanded Abstracts 2020 (pp. 1501–1505). SEG. https://doi.org/10.
applications in medical imaging analysis: A survey. PeerJ Computer 1190/segam2020-3427086.1
Science, 8, e1045. https://doi.org/10.7717/peerj-cs.1045 Wu, X., Zhan, C., Lai, Y.-K., Cheng, M.-M., & Yang, J. (2019). IP102:
Singh, A., Ganapathysubramanian, B., Singh, A. K., & Sarkar, S. (2016). A large-scale benchmark dataset for insect pest recognition. 2019
Machine learning for high-throughput stress phenotyping in plants. IEEE/CVF Conference on Computer Vision and Pattern Recognition
Trends in Plant Science, 21(2), 110–124. https://doi.org/10.1016/j. (CVPR) (pp. 8779–8788). IEEE. https://doi.org/10.1109/CVPR.2019.
tplants.2015.10.015 00899
Singh, A., Jones, S., Ganapathysubramanian, B., Sarkar, S., Mueller, Xia, D., Chen, P., Wang, B., Zhang, J., & Xie, C. (2018). Insect detection
D., Sandhu, K., & Nagasubramanian, K. (2021a). Challenges and and classification based on an improved convolutional neural network.
opportunities in machine-augmented plant stress phenotyping. Trends Sensors, 18, 4169. https://doi.org/10.3390/s18124169
in Plant Science, 26(1), 53–69. https://doi.org/10.1016/j.tplants.2020. Xie, C., Zhang, J., Li, R., Li, J., Hong, P., Xia, J., & Chen, P. (2015).
07.010 Automatic classification for field crop insects via multiple-task sparse
Singh, A. K., Ganapathysubramanian, B., Sarkar, S., & Singh, A. (2018). representation and multiple-kernel learning. Computers and Electron-
Deep learning for plant stress phenotyping: Trends and future per- ics in Agriculture, 119, 123–132. https://doi.org/10.1016/j.compag.
spectives. Trends in Plant Science, 23(10), 883–898. https://doi.org/ 2015.10.015
10.1016/j.tplants.2018.07.004 Yang, X., Wang, Y., Chen, K., Xu, Y., & Tian, Y. (2022). Fine-grained
Singh, A. K., Singh, A., Sarkar, S., Ganapathysubramanian, B., object classification via self-supervised pose alignment. Proceed-
Schapaugh, W., Miguez, F. E., Carley, C. N., Carroll, M. E., Chiozza, ings of the IEEE/CVF Conference on Computer Vision and Pattern
M. V., Chiteri, K. O., Falk, K. G., Jones, S. E., Jubery, T. Z., Recognition (pp. 7399–7408). IEEE.
Mirnezami, S. V., Nagasubramanian, K., Parmley, K. A., Rairdin, Yang, Y., & Xu, Z. (2020). Rethinking the value of labels for improv-
A. M., Shook, J. M., van der Laan, L., . . . Zhang, J. (2021b). High- ing class-imbalanced learning. Advances in Neural Information
throughput phenotyping in soybean. In J. Zhou & H. T. Nguyen Processing Systems, 33, 19290–19301.
(Eds.), High-throughput crop phenotyping. Concepts and strategies Yi, J., Krusenbaum, L., Unger, P., Hüging, H., Seidel, S. J., Schaaf, G., &
in plant sciences (1st ed., pp. 129–163). Springer. https://doi.org/10. Gall, J. (2020). Deep learning for non-invasive diagnosis of nutrient
1007/978-3-030-73734-4_7 deficiencies in sugar beet using RGB images. Sensors, 20(20), 5893.
Singh, D. P., Singh, A. K., & Singh, A. (2021c). Plant breeding and cul- https://doi.org/10.3390/s20205893
tivar development (1st ed.). Elsevier. https://doi.org/10.1016/C2018- Yu, X., Zhao, Y., & Gao, Y. (2022). SPARE: Self-supervised part erasing
0-01730-2 for ultra-fine-grained visual categorization. Pattern Recognition, 128,
Skendžić, S., Zovko, M., Živković, I. P., Lešić, V., & Lemić, D. (2021). 108691.
The impact of climate change on agricultural insect pests. Insects, Zbontar, J., Jing, L., Misra, I., LeCun, Y., & Deny, S. (2021).
12(5), 440. https://doi.org/10.3390/insects12050440 Barlow Twins: Self-supervised learning via redundancy reduction.
Stutz, D. (2015). Superpixel segmentation: An evaluation. In J. Gall, P. International Conference on Machine Learning (pp. 12310–12320).
Gehler, & B. Leibe (Eds.), Pattern recognition. DAGM 2015. Lecture PMLR.
notes in computer science: Vol. 9358 (pp. 555–562). Springer. https:// Zhang, M., Cheng, S., Cao, X., Chen, H., & Xu, X. (2022). Entropy-
doi.org/10.1007/978-3-319-24947-6_46 based locally adaptive thresholding for image segmentation. https://
Tendle, A., & Hasan, M. R. (2021). A study of the generalizability of papers.ssrn.com/sol3/papers.cfm?abstract_id=4010416
self-supervised representations. Machine Learning with Applications, Zhong, Y., Gao, J., Lei, Q., & Zhou, Y. (2018). A Vision-based
6, 100124. counting and recognition system for flying insects in intelligent
Tetila, E. C., Machado, B. B., Astolfi, G., de Souza Belete, N. A., agriculture. Sensors, 18(5), 1489. https://doi.org/10.3390/s18051
Amorim, W. P., Roel, A. R., & Pistori, H. (2020). Detection and clas- 489
sification of soybean pests using deep learning with UAV images. Zhuang, P., Wang, Y., & Qiao, Y. (2020). Learning attentive pairwise
Computers and Electronics in Agriculture, 179, 105836. https://doi. interaction for fine-grained classification. Proceedings of the AAAI
org/10.1016/j.compag.2020.105836 Conference on Artificial Intelligence, 34(07), 13130–13137.
Thenmozhi, K., & Srinivasulu Reddy, U. (2019). Crop pest classifica-
tion based on deep convolutional neural network and transfer learning.
Computers and Electronics in Agriculture, 164, 104906. https://doi.
org/10.1016/j.compag.2019.104906 How to cite this article: Kar, S., Nagasubramanian,
Venugoban, K., & Ramanan, A. (2014). Image classification of paddy
K., Elango, D., Carroll, M. E., Abel, C. A., Nair, A.,
field insect pests using gradient-based features. International Journal
of Machine Learning and Computing, 4, 1–5. https://doi.org/10.7763/
Mueller, D. S., O’Neal, M. E., Singh, A. K., Sarkar,
IJMLC.2014.V4.376 S., Ganapathysubramanian, B., & Singh, A. (2023).
Waheed, H., Zafar, N., Akram, W., Manzoor, A., Gani, A., & Islam, S. U. Self-supervised learning improves classification of
(2022). Deep learning based disease, pest pattern and nutritional defi- agriculturally important insect pests in plants. The
ciency detection system for “Zingiberaceae” crop. Agriculture, 12(6), Plant Phenome Journal, 6, e20079.
742. https://doi.org/10.3390/agriculture12060742 https://doi.org/10.1002/ppj2.20079
Wang, C., Fu, H., & Ma, H. (2022). PaCL: Part-level contrastive learn-
ing for fine-grained few-shot image classification. Proceedings of the

The Plant Phenome Journal - 2023 - Kar-1

Uploaded by

Copyright:

Available Formats

The Plant Phenome Journal - 2023 - Kar-1

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

The Plant Phenome Journal - 2023 - Kar-1

Uploaded by

Copyright:

Available Formats

Received: 23 January 2023 Accepted: 26 June 2023

Self-supervised learning improves classification of agriculturally

Soumyashree Kar1 Koushik Nagasubramanian2 Dinakaran Elango1

The Plant Phenome J. 2023;6:e20079. wileyonlinelibrary.com/journal/ppj2 1 of 20

1 INTRODUCTION DL architecture enables the extraction of a suite of features

autonomous navigation systems (Kahn et al., 2021), seis-

Hyperparameter BYOL NNCLR Barlow Twins

𝑇𝑃 training and validation curves also showed that it rapidly

species identification, damage assessment, yield loss due to REFERENCES

You might also like