
Scientia Horticulturae 293 (2022) 110684

Contents lists available at ScienceDirect

Scientia Horticulturae
journal homepage: www.elsevier.com/locate/scihorti

Fruit quality and defect image classification with conditional GAN data augmentation

Jordan J. Bird a,*, Chloe M. Barnes b, Luis J. Manso c, Anikó Ekárt b, Diego R. Faria c

a Department of Computer Science, Nottingham Trent University, Nottingham, United Kingdom
b College of Engineering and Physical Sciences, Aston University, Birmingham, United Kingdom
c ARVIS Lab, Aston University, Birmingham, United Kingdom

A R T I C L E  I N F O

Keywords: Fruit quality; Cultivar; Image classification; Data augmentation; Convolutional neural networks; Generative adversarial networks

A B S T R A C T

Contemporary Artificial Intelligence technologies allow for the employment of Computer Vision to discern good crops from bad, providing a step in the pipeline of selecting healthy fruit from undesirable fruit, such as those which are mouldy or damaged. State-of-the-art works in the field report high accuracy results on small datasets (<1000 images), which are not representative of the population regarding real-world usage. The goals of this study are to further enable real-world usage by improving generalisation with data augmentation, as well as to reduce overfitting and energy usage through model pruning. In this work, we suggest a machine learning pipeline that combines the ideas of fine-tuning, transfer learning, and generative model-based training data augmentation towards improving fruit quality image classification. A linear network topology search is performed to tune a VGG16 lemon quality classification model using a publicly-available dataset of 2690 images. We find that appending a 4096-neuron fully connected layer to the convolutional layers leads to an image classification accuracy of 83.77%. We then train a Conditional Generative Adversarial Network on the training data for 2000 epochs, and it learns to generate relatively realistic images. Grad-CAM analysis of the model trained on real photographs shows that the synthetic images can exhibit classifiable characteristics such as shape, mould, and gangrene. A higher image classification accuracy of 88.75% is then attained by augmenting the training with synthetic images, arguing that Conditional Generative Adversarial Networks have the ability to produce new data to alleviate issues of data scarcity. Finally, model pruning is performed via polynomial decay, where we find that the Conditional GAN-augmented classification network can retain 81.16% classification accuracy when compressed to 50% of its original size.

* Corresponding author.
E-mail addresses: [email protected] (J.J. Bird), [email protected] (C.M. Barnes), [email protected] (L.J. Manso), [email protected] (A. Ekárt), [email protected] (D.R. Faria).
URL: https://jordanjamesbird.com/ (J.J. Bird), https://www.chloembarnes.com/ (C.M. Barnes), https://ljmanso.com/ (L.J. Manso), https://cs.aston.ac.uk/fariad/ (D.R. Faria).
https://doi.org/10.1016/j.scienta.2021.110684
Received 12 April 2021; Received in revised form 8 October 2021; Accepted 18 October 2021
Available online 1 November 2021
0304-4238/© 2021 Elsevier B.V. All rights reserved.

1. Introduction

Recognition of fruit quality is important in smart agriculture to increase production efficiency. Contemporary Artificial Intelligence technologies allow for the employment of Computer Vision to discern good crops from bad, providing a step in the pipeline of selecting healthy fruit from undesirable fruit, such as those which are mouldy or damaged.

Even during modern times, collecting a dataset that generally represents a species of fruit poses difficulties. For example, the widely used Fruits 360 dataset (Mureşan and Oltean, 2018) contains 2134 images of apples belonging to one of thirteen cultivars. According to the United Nations, 87.2 million tonnes of apples were farmed globally in 2019 alone (Food and Agriculture Organization of the United Nations, 2021). In terms of smart agriculture, this highlights the issue of data scarcity. On the problem of crop quality recognition, much data is required to generalise to a population and thus become apt for real-world use. Considering the yield of fruit globally compared to practical data collection, bridging this gap manually, i.e., by collecting more data, is simply an impossible task.

Computer vision has long been noted as an important approach for fruit processing (Jahanbakhshi and Kheiralipour, 2020; Jahanbakhshi et al., 2020; Khojastehnazhand et al., 2019). Further work also explores varying techniques for computer vision in smart agriculture, such as transfer learning from pre-trained weights (Behera et al., 2020; Siddiqi, 2019; Xiang et al., 2019), and current research has started to explore the approach of data augmentation in improving such problems (Abbas et al., 2021; Chou et al., 2019). In this work, we focus on exploring a solution to this problem for lemon harvesting. Recording data for both lemons and limes, the United Nations noted that 20 million tonnes were harvested in 2019 (Food and Agriculture Organization of the United Nations, 2021). According to the CIA World Factbook (Central Intelligence Agency, 2020), lemon and lime fruit exports comprised over $3.3 billion USD of international exports in 2019. The largest exporters were Spain ($828 million USD), Mexico ($523 million USD), and the Netherlands ($339 million USD). Although the top 15 countries exported 93.5% of all lemons and limes in 2019, many countries are expanding their efforts; for example, Belize increased lemon and lime exports by 11,100% from 2018 to 2019, followed by Timor-Leste and Georgia, which increased exports by 4,200% and 2,115%, respectively. With this information in mind, the ability to autonomously select and reject lemon fruit based on health, e.g., sorting out those which have developed mould or gangrene, would allow for further increases in an already growing market by increasing production efficiency.

In this article, we explore a Conditional Generative Adversarial Network-based solution to data scarcity through training data augmentation. The main scientific contributions we present in chronological order are as follows:

• Exploration of Convolutional Neural Network (CNN) topologies for fruit quality recognition.
• Implementation of a Conditional Generative Adversarial Network (Conditional GAN) for the generation of synthetic healthy and unhealthy fruit images. The trained synthetic data generation model is made available for future work.¹
• Augmentation of the original dataset with synthetic images to improve the CNN performance.
• Exploration of features within synthetic images, showing that the Conditional GAN learns to generate healthy synthetic fruit images with no defects, as well as unhealthy synthetic fruit images with defects such as mould and gangrene.
• Model pruning, showing that a Conditional GAN-augmented classification network can retain 81.16% classification accuracy when compressed to 50% of its original size.

¹ Image generation model weights and code are available at: https://github.com/jordan-bird/synthetic-fruit-image-generator.
2. Background and related work

2.1. Fruit quality recognition

Fruit quality recognition is a technique where a fruit can be scored or classified autonomously by an algorithm given input features such as photographs. As previously mentioned, the ability to perform this task autonomously (and non-invasively (Vetrekar et al., 2015)) allows for an increase of production efficiency, since sorting can be performed by machines endowed with such an algorithm.

Reduction in cost of such a system is of particular interest in the field, given that several Lower Economically Developed Countries (LEDCs) are expanding the production and export of lemon fruit (Central Intelligence Agency, 2020). Solutions such as electronic noses can cost up to $100,000 USD (Chang and Subramanian, 2008), whereas a camera and computer are a fraction of the cost, arguing that image recognition is a more viable option when cost is an issue. It is worth noting, though, that low-cost electronic noses are currently an expanding line of research within the Sensors and Internet of Things (IoT) fields (García-Orellana et al., 2019). Electronic noses are indeed strongly performing solutions to fruit quality recognition, as shown by Brezmes et al. (2001), Di Natale et al. (2001), and Brezmes et al. (2005).

Deep learning approaches to fruit classification are abundant, as shown by multiple literature reviews on the subject (Dubey and Jalal, 2015; Hameed et al., 2018; Naik and Patel, 2017; Naranjo-Torres et al., 2020; Zawbaa et al., 2014). In comparison, the application of deep learning to the more fine-grained problem of quality classification is relatively rare. Bhargava and Bansal show that the field of automated fruit quality recognition is rapidly growing, in part due to the availability of newer technologies (Bhargava and Bansal, 2018).

Given that one can discern the quality of a fruit based on observation, visual features are often noted as of importance when it comes to fruit quality classification. Yamamoto et al. (2015) suggested that Linear Discriminant Analysis (LDA) of colour, shape, and size enabled the formation of a distance matrix which could be used to classify both the cultivar and quality of strawberries. Usage of a single LDA led to a classification accuracy of 42%, whereas combining the three analyses caused accuracy to rise to 68%. Capizzi et al. followed a similar technique through texture and gray-level features with a Radial Basis Probabilistic Neural Network, which scored around 97.25% on a limited set of images of orange fruits (Capizzi et al., 2015). Azizah et al. (2017) provided a solution to defect classification of the mangosteen fruit, attaining a mean 97.5% 4-fold classification accuracy, albeit with a limited dataset. This pattern of data scarcity continues, as would be expected in the field; in 2020, Fan et al. (2020) found that CNNs could classify apple defects with around 96.5% accuracy after processing 300 fruit images (150 per class). This study also notes the efficiency of deep learning algorithms post-training: the algorithm was capable of processing 5 fruit images per second (0.2 s each).

In this work, we take inspiration from Osako et al.'s approach to cultivar discrimination of lychee fruit (Osako et al., 2020). The study showed the success of fruit image classification (albeit for a different task of cultivar recognition) when fine-tune transfer learning with the VGG16 CNN (Simonyan and Zisserman, 2014), and predictions were analysed super-imposed upon the images via Grad-CAM (Selvaraju et al., 2017) in order to explain useful features for discrimination. We follow a similar approach in this work (applied to a new problem) in terms of fine-tuning of VGG16 and analysis with Grad-CAM, and go a step further in improving classification through data augmentation with a Conditional Generative Adversarial Network to self-regularise the network by creating new, synthetic fruit images.

2.2. Data scarcity and augmentation

Regarding the global fruit yield statistics versus the dataset size examples within the introduction, data scarcity in machine learning is the reliance of models on exhaustive labelling, providing a limitation to their real-world use (Zhang, 2020). Given that the use of a model is to aim towards generalisation of a population, a lack of data can lead to a situation wherein training accuracy scores are high and yet deployment to industry would lead to failure. As noted in the introduction, gathering enough data of cultivated fruit is impractical. Without enough data to properly represent the population, models will be prone to overfitting. Given this, other methods are required to prevent overfitting and encourage generalisation towards the real-world use of a machine learning model, outside the realm of simply collecting more data.

Data augmentation is the process of creating new training data by either slightly modifying the data at hand or generating new, synthetic data (Shorten and Khoshgoftaar, 2019). An augmented dataset thus provides more training examples for a given task.

Image recognition tasks for Convolutional Neural Network image classification are affected by data scarcity due to their data requirements (Andriyanov and Andriyanov, 2020; Bloice et al., 2017), and many generative models have been recommended to alleviate such issues (Nalepa et al., 2019; Tran et al., 2021). Generative models have also been noted to positively impact biological signal classification (Anicet Zanini and Luna Colombini, 2020; Bird et al., 2021), semantic Image-to-Image Translation (Arantes et al., 2020), speech processing (Bird et al., 2020; Qian et al., 2019), and Human Activity Recognition (Alnujaim et al., 2019; Erol et al., 2019), among many others. In this work, we use a Conditional GAN for data augmentation, which is described in the following section. This is based on the literature wherein GANs and Conditional GANs have been noted to perform particularly well in image classification (Frid-Adar et al., 2018; Han et al., 2019; Lee et al., 2019; Loey et al., 2020). We note specific inspiration from Fu et al. (2020), where Conditional GANs have been noted to perform well on fine-grained images, such as classification of bird and dog breeds (rather than classification of a whole species). This bears similarity to our problem, where finer details on generally similar images dictate which class they belong to.
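To ground the distinction above, the following is a minimal sketch (not taken from this paper's codebase) of the "slight modification" style of augmentation, using the Keras ImageDataGenerator; the folder layout and parameter values are illustrative assumptions.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Classical augmentation: each batch is a randomly perturbed view of an
# existing photograph, so no genuinely new fruit is ever synthesised.
classical_aug = ImageDataGenerator(
    rotation_range=20,             # small random rotations
    horizontal_flip=True,          # lemons have no canonical left/right
    brightness_range=(0.8, 1.2),   # mild lighting variation
    rescale=1.0 / 255.0,           # scale pixel values to [0, 1]
)

train_flow = classical_aug.flow_from_directory(
    "lemon_dataset/train",         # hypothetical layout: one subfolder per class
    target_size=(256, 256),
    batch_size=64,
    class_mode="binary",           # healthy vs. unhealthy
)
```

Such transforms only re-present photographs that already exist, which motivates the generative alternative described next.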


Fig. 1. Generalisation of a Conditional GAN topology.

2.3. GAN and conditional GAN

The Generative Adversarial Network (GAN) was first introduced in 2014 (Goodfellow et al., 2014). The idea behind the GAN is to have two neural networks compete in a zero-sum game (ergo, adversarial), i.e., the loss of one network is directly beneficial to the other and vice versa. To give an example of image generation, as this work performs, there are two networks: a generator network which creates images, and a discriminator network which classifies its inputs as either real or fake. As with most deep learning approaches, the gradients of each network are updated after each training batch with a stochastic gradient algorithm. The output of the generator network feeds directly into the discriminator network, and thus training of the two networks is automated via their competition. In terms of categorical cross-entropy, a score can be calculated as follows:

E_x[log(D(x))] + E_z[log(1 − D(G(z)))],  (1)

where the first part of the equation (E_x[log(D(x))]) is the recognition of real images and the second part (E_z[log(1 − D(G(z)))]) is the recognition of fake images. E_x and E_z are the expected values over all real and fake data, respectively; e.g., x is a real input from the dataset and z may begin as a random noise input to a generator. The function D(x) is the probability that a given input is real, and it is thus reversed (1 − D(·)) to discern fake images. Note that x is replaced by G(z) in the second part of the equation; this is because the input to the discriminator is Generator G's output when presented with random input vector z. This is known as a minimax loss, since the discriminator's aim is to maximise Eq. (1) while the generator aims to minimise it.

A Conditional Generative Adversarial Network (CGAN or Conditional GAN) (Mirza and Osindero, 2014) is an extension of the above technology, but with a given class label. That is, the generator now aims to learn to generate images belonging to one of n classes; in this work, this is a binary label of "healthy" and "unhealthy". Eq. (1) can be extended as follows:

E_x[log(D(x|y))] + E_z[log(1 − D(G(z|y)))],  (2)

where data objects x and G(z) are given class label y. Therefore, D(x|y) is the discriminator's probability that x is real given class label y, and G(z|y) is the output of the generator with random vector z given class label y. This minute difference in topology from a GAN, as can be observed in Fig. 1, allows for the generation of objects belonging to multiple classes. If the dataset in this work were presented to a vanilla GAN, the network would learn to generate fake fruit images by learning from real fruit; two networks would then be needed for the generation of either class, and said networks would have to train independently of one another. By using a Conditional GAN, we can specify to the network whether we want it to generate healthy or unhealthy fruit, learning not only to generate fruit in the general sense, but also learning from the significance of a class label.
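As a concrete illustration of Eq. (2), the following is a minimal TensorFlow sketch (not the paper's implementation) of one conditional training step. The generator and discriminator are assumed to take [noise, label] and [image, label] input pairs respectively, and the generator update uses the non-saturating form common in practice rather than the literal log(1 − D(G(z|y))) term.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

@tf.function
def train_step(real_images, labels, generator, discriminator, g_opt, d_opt):
    # One latent vector z per real image in the batch.
    z = tf.random.normal((tf.shape(real_images)[0], 100))
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fakes = generator([z, labels], training=True)                  # G(z|y)
        d_real = discriminator([real_images, labels], training=True)  # D(x|y)
        d_fake = discriminator([fakes, labels], training=True)        # D(G(z|y)|y)
        # Discriminator maximises Eq. (2): push real towards 1, fake towards 0.
        d_loss = (bce(tf.ones_like(d_real), d_real)
                  + bce(tf.zeros_like(d_fake), d_fake))
        # Non-saturating generator objective: push D's verdict on fakes towards 1.
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss
```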

Fig. 2. A general overview of our proposed approach for data augmentation towards fruit quality image classification.


Fig. 3. Visualisation of healthy and unhealthy lemons within the dataset. Mouldy, gangrenous, and those with a dark style remaining are all considered unhealthy.

3. Method

A general overview of the proposed approach can be observed in Fig. 2, where the training data is augmented with a Conditional GAN approach. In this section, we describe the method for each step of the experiments performed.

3.1. Data collection and preprocessing

Initially, an open source dataset of lemon images was acquired from SoftwareMill (Adamiak, 2020).² The dataset contains 2690 images of lemons at a resolution of 1056 × 1056 pixels, annotated in COCO format. Given that each COCO annotation describes one class, i.e., a fruit that exhibits both mould and gangrene will have two individual entries, we sort through the dataset to apply a single binary class label of "healthy" or "unhealthy" to each of the fruit images.

Given the computational complexity of the algorithms involved, the images are then resized to 256 × 256 pixels; this resolution still allows for the visualisation of undesirable features while reducing the total number of RGB pixel values from 3,345,408 (1056 × 1056 × 3) to 196,608 (256 × 256 × 3). This reduction to 5% of the original model inputs reduces the amount of memory required for training all models, since the use of full-resolution images is not feasible for consumer-grade hardware. In terms of deployment and real-world usage, robots themselves will have energy restrictions due to the processing cost and profit tradeoff regarding automation of fruit sorting. Thus, this reduction in image size increases the practicality of the approach.

To better discern noise throughout the generative learning process, the black background is replaced with white. Although it would have no effect on the training process of the model, the background is replaced so visual glitches can be better discerned through manual observation throughout training. For example, later in Fig. 7, several small glitches occur in the eighth generation of outputs that would have been more difficult to observe in the presence of a dark background.

Examples of some images in the preprocessed dataset can be found in Fig. 3. Healthy lemons are gathered for the healthy class, whereas mouldy and gangrenous fruits, and those with a dark style remaining, are gathered to form the unhealthy class.

² Note: None of the authors of this work are affiliated with SoftwareMill.
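A minimal sketch of the preprocessing described above; the paper does not state how background pixels were detected, so the near-black threshold used here is an illustrative assumption.

```python
import numpy as np
from PIL import Image

def preprocess(path, size=256, black_threshold=10):
    """Resize a 1056x1056 lemon photograph to 256x256 and swap its
    black background for white, per Section 3.1."""
    img = Image.open(path).convert("RGB").resize((size, size), Image.BILINEAR)
    arr = np.asarray(img).copy()
    # Pixels where all three channels are near zero are treated as background.
    background = (arr < black_threshold).all(axis=-1)
    arr[background] = 255
    return arr  # uint8 array of shape (256, 256, 3)
```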
3.2. Data augmentation via conditional GAN

For data augmentation, a Conditional GAN is utilised. The model is selected since it supports the concatenation of the generator and discriminator networks with a second input of the class label. That is, the context of a healthy or unhealthy fruit is specified, and so the model will learn to generate images belonging to either one of the two classes. The initial input to the generator is a vector representing a three-channel 8 × 8 pixel image (8 × 8 × 3). By using Convolutional Transpose layers, this is eventually upscaled to a 256 × 256 RGB image. The discriminator network downsamples twice with convolutional layers of 128 neurons, a kernel size of (3,3), and a stride of (2,2). Each layer utilises LeakyReLU activation (Maas et al., 2013), whereas the output is set as a hyperbolic tangent for scaling, and the ADAM optimiser (Kingma and Ba, 2014) is used to train. The latent space for class label interpretation is of size 100. Hyperparameter selections are based on the findings of the studies in Radford et al. (2015).

The Conditional GAN was initially trained for 500 epochs; manual exploration of the produced synthetic data showed promise, but several severe visual glitches still occurred. Due to this, the training was extended and performed for 2000 epochs in total with a batch size of 64. It was also observed that batch sizes below 64 caused the generator to cease training after around 10 epochs and fail to learn any further.
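A sketch of how such a generator and discriminator might be assembled in Keras. Only the 8 × 8 × 3 seed, the 256 × 256 tanh RGB output, the LeakyReLU activations, the latent size of 100, and the two 128-filter (3,3) stride-(2,2) discriminator convolutions come from the text; the label-embedding sizes, the number of upsampling blocks, and the generator filter counts are assumptions.

```python
from tensorflow.keras import layers, models

LATENT = 100
N_CLASSES = 2  # healthy / unhealthy

def build_generator():
    z = layers.Input((LATENT,))
    y = layers.Input((1,), dtype="int32")
    # Embed the class label and merge it with the latent vector.
    e = layers.Flatten()(layers.Embedding(N_CLASSES, 50)(y))
    h = layers.Concatenate()([z, e])
    h = layers.Dense(8 * 8 * 3)(h)
    h = layers.Reshape((8, 8, 3))(h)             # the 8x8x3 seed "image"
    for _ in range(5):                           # 8 -> 16 -> 32 -> 64 -> 128 -> 256
        h = layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding="same")(h)
        h = layers.LeakyReLU(0.2)(h)
    out = layers.Conv2D(3, (3, 3), padding="same", activation="tanh")(h)  # RGB in [-1, 1]
    return models.Model([z, y], out)

def build_discriminator():
    x = layers.Input((256, 256, 3))
    y = layers.Input((1,), dtype="int32")
    # Broadcast the label as an extra channel of the input image.
    e = layers.Flatten()(layers.Embedding(N_CLASSES, 256 * 256)(y))
    e = layers.Reshape((256, 256, 1))(e)
    h = layers.Concatenate()([x, e])
    for _ in range(2):                           # two strided downsampling blocks
        h = layers.Conv2D(128, (3, 3), strides=(2, 2), padding="same")(h)
        h = layers.LeakyReLU(0.2)(h)
    h = layers.Flatten()(h)
    out = layers.Dense(1, activation="sigmoid")(h)
    return models.Model([x, y], out)
```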


Fig. 4. Overview of the VGG16 topology and the custom interpretation and output layers used for binary image classification.

3.3. Classification, model analysis and pruning

The image classification network itself utilises the concept of fine-tune transfer learning from a large ImageNet-trained model, VGG16 (Simonyan and Zisserman, 2014). A diagram of the Convolutional Neural Network topology we use for the classification of fruit quality images can be observed in Fig. 4. The final three ReLU layers and SoftMax predictions have been replaced by a single interpretation layer and a single output neuron with a sigmoid activation function for the optimisation of binary cross-entropy.

The number of neurons within the interpretation layer is optimised through a linear search of (8, 16, 32, 64, ..., 8192) neurons. The network is given a maximum of 100 epochs to train, but training is stopped early if no further learning occurs within a window of 10 epochs. Hyperparameters are set to a batch size of 64, and the Adam optimiser has a learning rate of 0.001, β1 = 0.9, β2 = 0.999, and ϵ = 1e−07.
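A minimal sketch of the resulting classifier under the best-found configuration (a 4096-neuron interpretation layer; see Section 4.1); the ReLU activation of the interpretation layer and the early-stopping monitor are our assumptions, as the text does not specify them.

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16
from tensorflow.keras.callbacks import EarlyStopping

# ImageNet-pretrained convolutional base; weights stay trainable (fine-tuning).
base = VGG16(weights="imagenet", include_top=False, input_shape=(256, 256, 3))

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),  # interpretation layer found by the search
    layers.Dense(1, activation="sigmoid"),  # healthy vs. unhealthy
])

model.compile(
    optimizer=optimizers.Adam(learning_rate=0.001, beta_1=0.9,
                              beta_2=0.999, epsilon=1e-07),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Stop early if nothing improves for 10 epochs, up to a maximum of 100.
early_stop = EarlyStopping(monitor="val_accuracy", patience=10)
# model.fit(train_x, train_y, validation_data=(val_x, val_y),
#           epochs=100, batch_size=64, callbacks=[early_stop])
```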
Explainability in AI is important, especially when such algorithms are considered for real-world usage. Algorithms tend to operate in a black-box-like nature; for example, the algorithm in this work would take as input an image of a lemon fruit and produce a class label output corresponding to whether the fruit is healthy or not. With this in mind, regardless of the training accuracy scores attained, further analysis is needed to explain why decisions are made and predictions are given. We analyse several synthetic images through Gradient-weighted Class Activation Mapping (Grad-CAM) (Huff et al., 2021; Selvaraju et al., 2017). Class activation maps are produced by the convolutional neural network trained only on real images when given synthetic data as input. This allows us to confirm that undesirable characteristics are indeed both generated and classified as being important, since a GAN generator's goal is simply to learn to outperform the discriminator, and this may be done through other means, i.e., finding methods to trick the classifier.
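A minimal Grad-CAM sketch consistent with the analysis described; the layer name assumes the standard VGG16 naming (block5_conv3 is its final convolutional layer), and with a nested base model one would first index into the sub-model.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name="block5_conv3"):
    """Heatmap of where the classifier looks; `image` is a (256, 256, 3)
    array scaled as the model expects."""
    # If VGG16 is nested as a sub-model, fetch the layer from it instead.
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, pred = grad_model(image[np.newaxis, ...].astype("float32"))
        score = pred[:, 0]                     # sigmoid output of the binary task
    grads = tape.gradient(score, conv_out)
    # Channel weights: global-average-pooled gradients.
    weights = tf.reduce_mean(grads, axis=(1, 2))
    cam = tf.reduce_sum(conv_out * weights[:, tf.newaxis, tf.newaxis, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                   # keep only positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalised heatmap
```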
Considering that processing is performed on large quantities of fruit and that there may be time and energy restrictions, model pruning is performed on the network to explore the possibility of smaller model sizes, more apt for real-world usage (Fountsop et al., 2020; Molchanov et al., 2019), through polynomial decay. For each model, 9 pruning experiments were performed with weight sparsity ranging from 0.9 (10% of original size) to 0.1 (90% of original size). Pruning was performed on the whole model for 20 epochs, from a sparsity of 0 (full size) to the given value for the individual experiment (0, 0.1, 0.2, ..., 0.9).
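A sketch of one such pruning experiment using the TensorFlow Model Optimization toolkit, which provides the polynomial decay schedule named here; whether this exact library was used is our assumption, the 50% target corresponds to the half-size model discussed in Section 4.4, and the training-set size shown is illustrative.

```python
import tensorflow_model_optimization as tfmot

# One experiment: sparsify the whole model from 0% to a target (here 50%)
# over 20 epochs of fine-tuning, via polynomial decay.
batch_size, epochs, n_train = 64, 20, 2690   # n_train is illustrative
end_step = (n_train // batch_size) * epochs

schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,
    final_sparsity=0.5,        # 50% of weights zeroed = half-size model
    begin_step=0,
    end_step=end_step,
)
pruned = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=schedule)
pruned.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# pruned.fit(train_x, train_y, epochs=epochs, batch_size=batch_size,
#            callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```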
3.4. Experimental software and hardware

The models in this work were implemented in the Keras library with a TensorFlow backend. Models were trained on an RTX 2080Ti GPU (4352 CUDA cores).

4. Results

4.1. Non-augmentation results

Table 1 shows the exploration of interpretation neurons following the VGG16 ImageNet Convolutional Neural Network. Across the three tests, the lowest observed accuracy was that from 8 interpretation neurons, with a mean classification accuracy of 59.4%. The best model was the network with 4096 interpretation neurons, which scored a mean of 81.99% classification accuracy, and the best individual run was the second run of the 4096-neuron network, which scored 83.77% classification accuracy.

Table 1. A linear search for the best number of CNN interpretation neurons for the classification of the real dataset, repeated three times with different random seeds. Values are validation accuracy (%).

Interpretation Neurons | Seed = 1 | Seed = 2 | Seed = 3 | Mean
8 | 60.1 | 60.97 | 57.13 | 59.40
16 | 60.1 | 78.31 | 71.5 | 69.97
32 | 81.41 | 60.97 | 57.13 | 66.50
64 | 75.71 | 78.93 | 57.13 | 70.59
128 | 81.04 | 79.8 | 76.33 | 79.06
256 | 73.23 | 75.84 | 65.55 | 71.54
512 | 81.91 | 80.79 | 79.06 | 80.59
1024 | 73.85 | 83.02 | 80.17 | 79.01
2048 | 83.4 | 81.91 | 79.68 | 81.66
4096 | 82.28 | 83.77 | 79.93 | 81.99
8192 | 81.54 | 81.78 | 81.41 | 81.58


Fig. 5. Observed loss (errors) for the generator network of the Conditional GAN during training.

Fig. 6. Observed loss (errors) for the discriminator network of the Conditional GAN during training. Loss is calculated for recognition of real and fake images separately.

Fig. 7. Examples of generator outputs at the lowest observed loss compared to the final epoch.


Fig. 8. Examples of both real photographs and Conditional GAN outputs (Epoch 2000) for the two classes of healthy and unhealthy lemons.

4.2. Conditional GAN training

Separated into two graphs for readability purposes, Figs. 5 and 6 show the observed losses for the generator and discriminator networks of the Conditional GAN, respectively. Albeit with several anomalous spikes, it can be observed that the generator starts at a high loss of 5.5, which drops throughout the first few epochs. The discriminator's loss, as can be expected, starts low given the low quality of output from the generator. The generator's loss can be seen to rise steadily for the first 500 epochs before becoming relatively stable. The two discriminator network losses were lower throughout these first 500 epochs before showing an oscillatory nature during the remainder of the learning process.

At the end of the final epoch, the discriminator losses were 0.013 for the real images and 0.005 for the fake images produced by the generator. The generator loss was 6.648. Indeed, this value of 6.648 is by far not the lowest observed, but it is important to consider the nature of GANs: the losses of the two networks are relative to one another, i.e., it is an adversarial score. To provide an example of this, Fig. 7 shows a comparison of the images produced by the generator with the lowest observed loss (epoch 8, at 1.063) and those produced by the generator at the final epoch. Evidently, the images produced by the final generator are of much higher quality than those output by the generator when it experiences the lowest observed loss. For this reason, the final generator is selected as the synthetic data producing model in this work, although it is suggested that future work explores the output quality at multiple stages of training.

Fig. 9. Two synthetic images of lemons belonging to the "unhealthy" class generated by the Conditional GAN at full resolution. The lemons are seemingly presenting with symptoms of both mould and gangrene, showing that the generator model has learnt to generate fruit with undesirable characteristics. An unrealistic 'checker-boarding' effect has also been generated by the Conditional GAN on the surface of the fruit.

Fig. 8 shows 18 real photographs for healthy and unhealthy classed lemons, as well as 18 examples of Conditional GAN generator output for healthy and unhealthy classed lemons. Interestingly, many of the synthetic images seem to be more reminiscent of potatoes than lemons; given the nature of GANs, this is due to the model's generalisation of the dataset, which contained lemons photographed at different angles, and so this generalisation of a shape is reflected in the produced images. A more uniform colour can be observed in healthy synthetic lemons, whereas unhealthy lemons are given mould and dark styles, as well as several instances of gangrene. Similarly to the potato-like shape of the outputs, this could be attributed to the generalising nature of GANs: several patterns have been observed during training, and these patterns are then applied while generating new images. A higher resolution example of synthetic lemon images showing features of mould and gangrene can be observed in Fig. 9. The generator has seemingly learnt to cast light and shadows on the fruit as well as undesirable characteristics; both fruits have a texture and colour similar to mould on the surface, and the second fruit seems to have a darker patch towards the top.

Fig. 10 shows class activation maps on six synthetic images (three per class) from the convolutional neural network trained only on the real data. All six images were predicted to belong to their ground-truth classes. Note that on the bottom row, the Convolutional Neural Network focuses on issues on the flesh of the fruit, such as mould in the first two images and a dark patch which may indicate gangrene in the right-most bottom image. Additionally, the class activation maps for the healthy fruit exist more generally around the shape. These behaviours are seemingly reminiscent of how a human would analyse the images, either focusing on unhealthy characteristics if the fruit is bad or observing the general image when the fruit shows no undesirable features.


Fig. 10. Grad-CAM analysis of six outputs from the Conditional GAN. The top row shows images belonging to the "healthy" class and the bottom row shows images belonging to the "unhealthy" class. This Grad-CAM VGG16 CNN is trained only on real photographs; no training has been performed on synthetic images.

This analysis further shows that both desirable and undesirable characteristics are generalised and, to an extent, reproduced by the generative model. Indeed, the synthetic images are not perfect (as can be observed when comparing them via Fig. 8), but these activation maps provide insight into the useful synthetic knowledge existing within them.

4.3. Classification comparison

The results from the best CNN experiment, along with the augmentation approaches, can be observed in Fig. 11 and Table 2. All augmentation approaches scored higher than training only on the real images, showing that augmentation has had a positive effect on the learning process. The best set of results was achieved by augmenting the dataset with 400 images (13.51% of the whole dataset, 200 per class), which scored a classification accuracy of 88.75% when classifying unseen images of both healthy and unhealthy lemons.

Although the classification accuracies recorded varied, note that even the weakest augmentation approach (2200 images) caused the image recognition ability to rise from 83.77% to 85.02%. That is, of the 15 trials performed, all augmentation approaches outperformed the vanilla CNN. These results thus argue that Conditional GAN-based training data augmentation is a promising approach to improve fruit quality image classification. The generator model and weights are made publicly available for further exploration.

Fig. 11. Comparison of how the vanilla CNN (no augmentation) performs against the models which have additional synthetic training data present.

Table 2. Comparison of non-augmented (first row) to training data augmentation with Conditional GAN-generated images.

Synthetic Images Augmented (per Class) | Synthetic Data (Total) | Classification Accuracy (%)
0 | 0 | 83.77
100 | 200 | 87.25
200 | 400 | 88.75
300 | 600 | 87.63
400 | 800 | 87.14
500 | 1000 | 87.51
600 | 1200 | 87.75
700 | 1400 | 85.64
800 | 1600 | 86.51
900 | 1800 | 87.01
1000 | 2000 | 87.63
1100 | 2200 | 85.02
1200 | 2400 | 86.88
1300 | 2600 | 85.89
1400 | 2800 | 85.52
1500 | 3000 | 87.63
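A sketch of how class-balanced synthetic images might be drawn from the trained generator for augmentation; the input signature and label encoding follow the earlier sketches and are not necessarily the API of the released repository.

```python
import numpy as np

def synthesise(generator, n_per_class, latent_dim=100):
    """Sample class-balanced synthetic lemons from a trained conditional
    generator and rescale them from tanh's [-1, 1] range to [0, 255]."""
    images, labels = [], []
    for label in (0, 1):                  # 0 = healthy, 1 = unhealthy (assumed)
        z = np.random.normal(size=(n_per_class, latent_dim))
        y = np.full((n_per_class, 1), label)
        fakes = generator.predict([z, y])
        images.append((fakes + 1.0) * 127.5)
        labels.append(np.full(n_per_class, label))
    return np.concatenate(images), np.concatenate(labels)

# e.g. the best setting in Table 2: 200 extra images per class.
# synth_x, synth_y = synthesise(generator, n_per_class=200)
# train_x = np.concatenate([train_x, synth_x])
# train_y = np.concatenate([train_y, synth_y])
```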


4.4. Pruning

Table 3 and Fig. 12 show a comparison of the pruning experiments from 90% to 10% compression (through sparsity, i.e., zeroed weights). Although the final two models with all neurons present the best results, we noted earlier that a faster model would be ideal for the problem at hand, since fruits are processed in large quantities and the models might need to be run under time and energy restrictions. It is interesting to note from these results that the data-augmented network tends to outperform the vanilla network in all cases. It is also interesting to note that a data augmentation network compressed to half of its original size can still perform at an accuracy of 81.16%, effectively doubling production at a loss of 7.59% accuracy.

Table 3. Comparison of models when pruning with polynomial decay (for smaller model sizes) for both the vanilla and data augmentation approaches. Accuracies are post-pruning classification accuracy (%).

Pruned model size (% of original) | Final Sparsity | Vanilla | Augmented (200 synthetic images)
10 | 0.9 | 60.1 | 60.1
20 | 0.8 | 73.85 | 73.98
30 | 0.7 | 75.71 | 79.18
40 | 0.6 | 77.45 | 78.31
50 | 0.5 | 80.79 | 81.16
60 | 0.4 | 81.54 | 82.16
70 | 0.3 | 82.16 | 82.65
80 | 0.2 | 83.64 | 84.76
90 | 0.1 | 82.9 | 83.51
100 | 0 | 83.77 | 88.75

Fig. 12. Comparison of the polynomial decay pruning accuracies for both vanilla and data augmentation models.

5. Conclusion and future work

Lemon fruit are of growing financial importance in LEDC nations; in some cases, the percentage annual increase of exports was measured in the thousands. The possibility of increasing productivity and reducing the operating costs of those systems is thus particularly interesting. This work has suggested methods to improve autonomous fruit quality recognition, followed by pruning to reduce the complexity of the computer vision model.

To finally conclude, we first noted that autonomous sorting of healthy fruit from undesirable fruit is possible through contemporary computer vision technologies. We explored the concepts of fine-tuning, transfer learning, and Conditional GAN-based training data augmentation; our results showed that the recognition (and thus sorting) of fruit images was improved when augmentation was introduced. We found that introducing 400 synthetic data points had the largest impact, raising the recognition accuracy from 83.77% to 88.75% via a convolutional neural network. Finally, we performed Grad-CAM analysis on the model trained only on real photographs to show that the Conditional GAN was successful in imagining the undesirable characteristics of the flesh of the synthetic fruit generated. Thus, this work argues that Conditional Generative Adversarial Networks have the ability to produce new data to alleviate issues of data scarcity in the problem of fruit health classification.

Although we compared the lowest loss of the generator to the final loss, and a vast improvement was observed despite the higher final loss, further exploration of model weights throughout the training process may be useful. Given the behaviour of losses spiking, several of the later models should be explored in order to ascertain whether the final epoch did produce the best model for this approach. Although the Conditional GAN-based approach showed promise over the classical CNN, one limitation of this work is that we compare the best train-test CNN to the Conditional GAN; in future, given resource availability to quickly train several Conditional GANs, a better metric for evaluation would be to train k Conditional GANs during k-fold cross validation. This would allow for better scientific accuracy. Note that the Conditional GAN was trained for 17 hours on a leading GPU, and thus such a solution would only likely be viable in the future with better technology.

Following the augmentation approach, the classification accuracy rose from 83.77% to 88.75%. In future work, further exploration of the classification model should be performed to increase this level of recognition accuracy. In addition to further improvement of binary classification, future work could also discern the type of unhealthy attribute, for example, the specificity between damaged and rotten fruit, which would direct the fruit to the correct destination; in this case, a juicer or disposal, respectively.

Since the lemon dataset is relatively new, there are no published works to directly compare our results to. In the future, once the dataset has been explored by other works, it would be useful to have a direct comparison of all results when attempting to classify the data.

Pruning showed that the models could be presented at a fraction of their original size and only lose a small amount of classification ability. With this in mind, other pruning techniques could be explored in the future to further maximise the model's ability in terms of real-world usage.

Some future work is also needed for the real-world use of this approach. The nature of the dataset that these models are trained with dictates that image segmentation is required in order to separate the fruit from the background (such as a surface or other fruit). If a segmentation network is trained and applied prior to the preprocessing described in these techniques, then the approach can be tested on data in a much higher volume, i.e., fruit production.


Model and Code Availability

For replicability purposes as well as future research, the generator model and synthetic image generation Python code have been made available at: https://github.com/jordan-bird/synthetic-fruit-image-generator.

CRediT authorship contribution statement

Jordan J. Bird: Conceptualization, Methodology, Software, Writing – review & editing. Chloe M. Barnes: Conceptualization, Methodology, Software, Writing – review & editing. Luis J. Manso: Conceptualization, Methodology, Software, Writing – review & editing, Supervision. Anikó Ekárt: Conceptualization, Methodology, Software, Writing – review & editing, Supervision. Diego R. Faria: Conceptualization, Methodology, Software, Writing – review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Abbas, A., Jain, S., Gour, M., Vankudothu, S., 2021. Tomato plant disease detection using transfer learning with C-GAN synthetic images. Comput. Electron. Agric. 187, 106279.
Adamiak, M., 2020. Lemons quality control dataset. https://github.com/softwaremill/lemon-dataset. Accessed: 16th March 2021. doi:10.5281/zenodo.3965568.
Alnujaim, I., Oh, D., Kim, Y., 2019. Generative adversarial networks to augment micro-doppler signatures for the classification of human activity. IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, pp. 9459–9461.
Andriyanov, N., Andriyanov, D., 2020. The using of data augmentation in machine learning in image processing tasks in the face of data scarcity. Journal of Physics: Conference Series, 1661. IOP Publishing, p. 012018.
Anicet Zanini, R., Luna Colombini, E., 2020. Parkinson's disease EMG data augmentation and simulation with DCGANs and style transfer. Sensors 20 (9), 2605.
Arantes, R.B., Vogiatzis, G., Faria, D.R., 2020. CSC-GAN: Cycle and semantic consistency for dataset augmentation. International Symposium on Visual Computing. Springer, pp. 170–181.
Azizah, L.M., Umayah, S.F., Riyadi, S., Damarjati, C., Utama, N.A., 2017. Deep learning implementation using convolutional neural network in mangosteen surface defect detection. 2017 7th IEEE International Conference on Control System, Computing and Engineering (ICCSCE). IEEE, pp. 242–246.
Behera, S.K., Rath, A.K., Sethy, P.K., 2020. Maturity status classification of papaya fruits based on machine learning and transfer learning approach. Inf. Process. Agric.
Bhargava, A., Bansal, A., 2018. Fruits and vegetables quality evaluation using computer vision: a review. J. King Saud Univ.-Comput. Inf. Sci.
Bird, J.J., Faria, D.R., Premebida, C., Ekárt, A., Ayrosa, P.P., 2020. Overcoming data scarcity in speaker identification: dataset augmentation with synthetic MFCCs via character-level RNN. 2020 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC). IEEE, pp. 146–151.
Bird, J.J., Pritchard, M.G., Fratini, A., Ekárt, A., Faria, D., 2021. Synthetic biological signals machine-generated by GPT-2 improve the classification of EEG and EMG through data augmentation. IEEE Robot. Autom. Lett.
Bloice, M.D., Stocker, C., Holzinger, A., 2017. Augmentor: an image augmentation library for machine learning. arXiv:1708.04680.
Brezmes, J., Fructuoso, M.L., Llobet, E., Vilanova, X., Recasens, I., Orts, J., Saiz, G., Correig, X., 2005. Evaluation of an electronic nose to assess fruit ripeness. IEEE Sens. J. 5 (1), 97–108.
Brezmes, J., Llobet, E., Vilanova, X., Orts, J., Saiz, G., Correig, X., 2001. Correlation between electronic nose signals and fruit quality indicators on shelf-life measurements with pinklady apples. Sens. Actuators B 80 (1), 41–50.
Capizzi, G., Sciuto, G.L., Napoli, C., Tramontana, E., Woźniak, M., 2015. Automatic classification of fruit defects based on co-occurrence matrix and neural networks. 2015 Federated Conference on Computer Science and Information Systems (FedCSIS). IEEE, pp. 861–867.
Central Intelligence Agency, 2020. The world factbook field listing: exports – commodities. https://www.cia.gov/the-world-factbook/field/exports-commodities. Accessed: 16th March 2021.
Chang, J.B., Subramanian, V., 2008. Electronic noses sniff success. IEEE Spectr. 45 (3), 50–56.
Chou, Y.-C., Kuo, C.-J., Chen, T.-T., Horng, G.-J., Pai, M.-Y., Wu, M.-E., Lin, Y.-C., Hung, M.-H., Su, W.-T., Chen, Y.-C., et al., 2019. Deep-learning-based defective bean inspection with GAN-structured automated labeled data augmentation in coffee industry. Appl. Sci. 9 (19), 4166.
Di Natale, C., Macagnano, A., Martinelli, E., Paolesse, R., Proietti, E., D'Amico, A., 2001. The evaluation of quality of post-harvest oranges and apples by means of an electronic nose. Sens. Actuators B 78 (1–3), 26–31.
Dubey, S.R., Jalal, A.S., 2015. Application of image processing in fruit and vegetable analysis: a review. J. Intell. Syst. 24 (4), 405–424.
Erol, B., Gurbuz, S.Z., Amin, M.G., 2019. GAN-based synthetic radar micro-doppler augmentations for improved human activity recognition. 2019 IEEE Radar Conference (RadarConf). IEEE, pp. 1–5.
Fan, S., Li, J., Zhang, Y., Tian, X., Wang, Q., He, X., Zhang, C., Huang, W., 2020. On line detection of defective apples using computer vision system combined with deep learning methods. J. Food Eng. 286, 110102.
Food and Agriculture Organization of the United Nations, 2021. FAOSTAT: crops. http://www.fao.org/faostat/en/#data/QC. Accessed: 16th March 2021.
Fountsop, A.N., Ebongue Kedieng Fendji, J.L., Atemkeng, M., 2020. Deep learning models compression for agricultural plants. Appl. Sci. 10 (19), 6866.
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H., 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321, 321–331.
Fu, Y., Li, X., Ye, Y., 2020. A multi-task learning model with adversarial data augmentation for classification of fine-grained images. Neurocomputing 377, 122–129.
García-Orellana, C.J., Macías-Macías, M., González-Velasco, H.M., García-Manso, A., Gallardo-Caballero, R., 2019. Low-power and low-cost environmental IoT electronic nose using initial action period measurements. Sensors 19 (14), 3183.
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets. Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. MIT Press, Cambridge, MA, USA, pp. 2672–2680.
Hameed, K., Chai, D., Rassau, A., 2018. A comprehensive review of fruit and vegetable classification techniques. Image Vis. Comput. 80, 24–44.
Han, C., Rundo, L., Araki, R., Nagano, Y., Furukawa, Y., Mauri, G., Nakayama, H., Hayashi, H., 2019. Combining noise-to-image and image-to-image GANs: brain MR image augmentation for tumor detection. IEEE Access 7, 156966–156977.
Huff, D.T., Weisman, A.J., Jeraj, R., 2021. Interpretation and visualization techniques for deep learning models in medical imaging. Phys. Med. Biol. 66 (4), 04TR01.
Jahanbakhshi, A., Kheiralipour, K., 2020. Evaluation of image processing technique and discriminant analysis methods in postharvest processing of carrot fruit. Food Sci. Nutr. 8 (7), 3346–3352.
Jahanbakhshi, A., Momeny, M., Mahmoudi, M., Zhang, Y.-D., 2020. Classification of sour lemons based on apparent defects using stochastic pooling mechanism in deep convolutional neural networks. Sci. Horticult. 263, 109133.
Khojastehnazhand, M., Mohammadi, V., Minaei, S., 2019. Maturity detection and volume estimation of apricot using image processing technique. Sci. Horticult. 251, 247–251.
Kingma, D.P., Ba, J., 2014. Adam: a method for stochastic optimization. arXiv:1412.6980.
Lee, M.B., Kim, Y.H., Park, K.R., 2019. Conditional generative adversarial network-based data augmentation for enhancement of iris recognition accuracy. IEEE Access 7, 122134–122152.
Loey, M., Manogaran, G., Khalifa, N.E.M., 2020. A deep transfer learning model with classical data augmentation and CGAN to detect covid-19 from chest ct radiography digital images. Neural Comput. Appl. 1–13.
Maas, A.L., Hannun, A.Y., Ng, A.Y., 2013. Rectifier nonlinearities improve neural network acoustic models. Proc. ICML, 30. Citeseer, p. 3.
Mirza, M., Osindero, S., 2014. Conditional generative adversarial nets. arXiv:1411.1784.
Molchanov, P., Mallya, A., Tyree, S., Frosio, I., Kautz, J., 2019. Importance estimation for neural network pruning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11264–11272.
Mureşan, H., Oltean, M., 2018. Fruit recognition from images using deep learning. Acta Univ. Sapientiae Inform. 10 (1), 26–42.
Naik, S., Patel, B., 2017. Machine vision based fruit classification and grading - a review. Int. J. Comput. Appl. 170 (9), 22–34.
Nalepa, J., Marcinkiewicz, M., Kawulok, M., 2019. Data augmentation for brain-tumor segmentation: a review. Front. Comput. Neurosci. 13, 83.
Naranjo-Torres, J., Mora, M., Hernández-García, R., Barrientos, R.J., Fredes, C., Valenzuela, A., 2020. A review of convolutional neural network applied to fruit image processing. Appl. Sci. 10 (10), 3443.
Osako, Y., Yamane, H., Lin, S.-Y., Chen, P.-A., Tao, R., 2020. Cultivar discrimination of litchi fruit images using deep learning. Sci. Horticult. 269, 109360.
Qian, Y., Hu, H., Tan, T., 2019. Data augmentation using generative adversarial networks for robust speech recognition. Speech Commun. 114, 1–9.
Radford, A., Metz, L., Chintala, S., 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434.
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D., 2017. Grad-CAM: visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626.
Shorten, C., Khoshgoftaar, T.M., 2019. A survey on image data augmentation for deep learning. J. Big Data 6 (1), 1–48.
Siddiqi, R., 2019. Effectiveness of transfer learning and fine tuning in automated fruit image classification. Proceedings of the 2019 3rd International Conference on Deep Learning Technologies, pp. 91–100.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
Tran, N.-T., Tran, V.-H., Nguyen, N.-B., Nguyen, T.-K., Cheung, N.-M., 2021. On data augmentation for GAN training. IEEE Trans. Image Process. 30, 1882–1897.
Vetrekar, N., Gad, R., Fernandes, I., Parab, J., Desai, A., Pawar, J., Naik, G., Umapathy, S., 2015. Non-invasive hyperspectral imaging approach for fruit quality control application and classification: case study of apple, chikoo, guava fruits. J. Food Sci. Technol. 52 (11), 6978–6989.
Xiang, Q., Wang, X., Li, R., Zhang, G., Lai, J., Hu, Q., 2019. Fruit image classification based on MobileNetV2 with transfer learning technique. Proceedings of the 3rd International Conference on Computer Science and Application Engineering, pp. 1–7.
Yamamoto, K., Ninomiya, S., Kimura, Y., Hashimoto, A., Yoshioka, Y., Kameoka, T., 2015. Strawberry cultivar identification and quality evaluation on the basis of multiple fruit appearance features. Comput. Electron. Agric. 110, 233–240.
Zawbaa, H.M., Hazman, M., Abbass, M., Hassanien, A.E., 2014. Automatic fruit classification using random forest algorithm. 2014 14th International Conference on Hybrid Intelligent Systems. IEEE, pp. 164–168.
Zhang, J., 2020. Towards Robust Machine Learning Models for Data Scarcity. Ph.D. thesis. Arizona State University.
