
DCNN-Based Vegetable Image Classification Using Transfer Learning: A Comparative Study

This document discusses a study comparing different deep convolutional neural network (DCNN) models for vegetable image classification using transfer learning. The authors develop a CNN model from scratch and fine-tune pre-trained models like VGG16, InceptionV3, ResNet, and MobileNet on a dataset of 21,000 vegetable images from 15 classes. Experimental results are presented comparing the performance of the different CNN architectures. The study finds that transfer learning techniques using pre-trained models can achieve better classification accuracy than a traditional CNN when the training dataset is small.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/352846889

Conference Paper · May 2021


DOI: 10.1109/ICCCSP52374.2021.9465499

3 authors:
M. Israk Ahmed, Memorial University of Newfoundland
Shahriyar Mahmud Mamun, Daffodil International University
Asif Uz Zaman Asif, Daffodil International University

All content following this page was uploaded by M. Israk Ahmed on 17 July 2021.



2021 5th International Conference on Computer, Communication and Signal Processing (ICCCSP - 2021)

DCNN-Based Vegetable Image Classification Using Transfer Learning: A Comparative Study

M. Israk Ahmed, Shahriyar Mahmud Mamun, Asif Uz Zaman Asif
Dept. of CSE, Daffodil International University, Dhaka, Bangladesh

Abstract—This paper addresses accurate vegetable image classification. A dataset of 21,000 images from 15 classes is used. The convolutional neural network (CNN), a deep learning algorithm, is one of the most effective tools in machine learning for classification problems, but a CNN requires a large dataset to perform well on natural image classification. Here, we evaluate the performance of a CNN for vegetable image classification by developing a CNN model from scratch. Additionally, several pre-trained CNN architectures are employed with transfer learning to compare their accuracy with that of the typical CNN. This work compares the typical CNN with pre-trained architectures (VGG16, MobileNet, InceptionV3, ResNet, etc.) to establish which technique works best in terms of accuracy and effectiveness on a new image dataset. Experimental results are presented for all the proposed CNN architectures, and a comparative study is carried out between the developed CNN model and the pre-trained CNN architectures. The study shows that, by utilizing knowledge gained from related large-scale work, transfer learning can achieve better classification results than a traditional CNN on a small dataset. A further contribution of this paper is a vegetable image dataset of 15 categories with a total of 21,000 images.

Index Terms—Vegetable image classification, deep learning, CNN, VGG16, MobileNet, Inception-V3, ResNet.

I. INTRODUCTION

Vegetables are one of the most common food items in everyday meals worldwide, and people around the world produce many kinds of them. According to some surveys, hundreds of thousands of vegetable species exist on our planet [1]. Vegetables are important for human beings because of their nutrients, minerals, phytochemical compounds, and dietary fiber content. Many vegetable types are similar in color, texture, and shape, and the same vegetable can have different names in different countries. From production to delivery, several common steps, such as picking and sorting, are carried out manually. Recognizing a vegetable in the market is also hard for customers, because different vegetables look alike.

Because several steps from vegetable production to consumption still depend on manual operation and are constrained by the large amount of labor required, the commercialization of vegetable products is held back. To address this issue, the picking, sorting, and labeling of vegetables need to be automated, and a vegetable image classifier can enable this and save time and money. In agriculture today, classification and detection are fundamental research tasks. There are many kinds of vegetables, and many people are not familiar with them, so a vegetable classifier would also make people's lives easier. In addition, vegetables are sorted manually in super shops and distribution centers. This research is conducted to address these problems. The purpose of this paper is to classify vegetable images with higher accuracy using a CNN and pre-trained DCNNs with transfer learning. Convolutional neural networks are now widely used for classification, segmentation, image recognition, and similar tasks. The main strength of a CNN is its deep network architecture, which enables it to automatically learn mid- to high-level representations from new data [2], [3]. In this research, we propose a CNN model developed from scratch as well as four fine-tuned state-of-the-art CNN architectures (InceptionV3, VGG16, ResNet, MobileNet) for vegetable image classification. A comparative study of the performance of the CNN and these architectures is also carried out.

II. RELATED WORK

This research is about classifying vegetable images with higher accuracy and efficiency, and about finding a model and technique that perform well in terms of accuracy, time, and cost. In 2018, Om Patil et al. [4] used InceptionV3 (a later version of the GoogLeNet/Inception architecture) for vegetable classification tasks. By fine-tuning the Inception network and applying transfer learning, their model can classify four types of vegetables: carrots, onions, cucumbers, and tomatoes. The accuracy of their fine-tuned InceptionV3 is

978-1-6654-3277-1/21/$31.00 ©2021 IEEE


99% on a comparatively small dataset of around 1,200 images. Yuki Sakai et al. [5] proposed a deep neural network (DNN) that classifies vegetables by extracting features and learning the object. The recognition rate of their DNN model was 97.58%. They worked with a very small dataset of only 200 images covering 8 types of vegetables. The learning process took 3 million iterations to reach that accuracy, which is lengthy and expensive. Frida Femling et al. [6] used a transfer learning model based on deep convolutional neural networks and collected images with a Raspberry Pi. They worked only with InceptionV3 and MobileNet and obtained 96% accuracy with InceptionV3 and 97% accuracy with MobileNet. Their model is evaluated on two properties: propagation time, and the time it takes to classify a fruit or vegetable image. However, the dataset is small: it contains 4,000 images from ImageNet, for a total of 4,300 images across 10 classes. Zhu et al. [7] proposed an AlexNet network for vegetable image classification and compared it with a support vector machine classifier and a traditional back-propagation neural network. They worked with 5 types of vegetables: pumpkin, mushroom, broccoli, cauliflower, and cucumber. Images were obtained from the ImageNet dataset and expanded with a data expansion method to a total of 24,000 images in order to reduce overfitting. The highest accuracy in their experiment was 92.1%, obtained with the AlexNet network. Guoxiang Zeng [8] proposed an image saliency technique combined with a VGG architecture for classifying fruits and vegetables. To reduce unnecessary noise, dense features were extracted from each image and complicated backgrounds were filtered out. To determine the significant region of an input image, a bottom-up graph-based visual saliency (GBVS) model [9] was chosen. The dataset contained 12,173 images spanning 26 categories, of which 13 categories were vegetables (3,678 images): broccoli, celery, cowpea, green onion, garlic, cucumber, mushroom, carrot, onion, pumpkin, Chinese cabbage, tomato, and pepper. The classification accuracy of the model is 95.6%. Li et al. [10] proposed an improved VGG model (VGG-M-BN) and obtained 96.5% accuracy. They worked with 10 categories of vegetables. Images were mostly collected from the ImageNet dataset and expanded with a data expansion method.

Although these works are valuable, all of them have limitations in accuracy, efficiency, or effectiveness. The common problems are the size and the source of the dataset. In addition, the training time of most of these works is long, which is inefficient in terms of cost and time. Only a few works deal exclusively with vegetable classification [4], [5], [7], [10], and they too have limitations: the smallest number of vegetable types considered was four [4], and the largest was ten [10].

III. PROPOSED METHODOLOGY

This study of vegetable image classification proposes a CNN model developed from scratch and state-of-the-art CNN models with transfer learning. In 2012, a breakthrough occurred in image recognition through the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) on the ImageNet dataset [11]. Since then, many deep convolutional neural network architectures, varying in number of layers and complexity, have been introduced and made available for anyone to use. In the past few years, transfer learning concepts such as fine-tuning and layer freezing in CNN architectures have outperformed traditional machine learning models in performance and efficiency on image classification problems. Here, we use a CNN model and four state-of-the-art CNN architectures: VGG16, InceptionV3, ResNet, and MobileNet. These four models are pre-trained on the ImageNet dataset, a large-scale dataset that contains 1.2 million training images, mostly of animals and everyday objects. By applying transfer learning, the learned features of these DCNN models may help make a very deep network architecture effective for our dataset.

A. CNN

In the field of computer vision, the convolutional neural network, essentially a regularized form of the multilayer perceptron (MLP), has been the most influential innovation. In recent years, CNNs have dominated computer vision and image processing for large-scale image recognition, classification, and segmentation tasks. A typical CNN starts with an input layer and ends with an output layer, with multiple hidden layers in between. Convolutional, pooling, activation (ReLU), and fully connected layers are part of the hidden layers [12]. Fig. 1 shows typical CNN layers.

Fig. 1. CNN Layers [13].

The input layer takes the target image data as input. The image is then reshaped to a suitable size and forwarded to the next layer, a convolutional layer. A number of kernels, or filters, slide over the input and perform element-wise multiplications to extract features. An activation function then replaces negative values with zero and passes all other values through unchanged. The most widely used activation function is ReLU (Rectified Linear Unit), which is the default activation function for many types of neural networks. It is a non-linear function and is faster to compute than other activation functions such as Sigmoid, Scaled Exponential Linear
Unit (SELU), Exponential Linear Unit (ELU), and Gaussian Error Linear Unit (GELU). The features extracted by the convolutional layer are then sent to the pooling layer, which preserves only the important features of a large image and reduces the number of parameters. The fully connected layer then translates these highly filtered features into categories. Finally, another non-linear function, softmax, assigns each class a decimal probability between 0 and 1. In this research, a 6-layer convolutional neural network built completely from scratch is proposed. The input image size is set to 32×32 to reduce the overall computation time and keep the model efficient. Data augmentation techniques such as rotation, rescaling, shear, zoom, and horizontal flip are applied to the 32×32, 3-channel training images. ReLU is used as the activation function in each convolutional layer. To reduce the generalization error and counter overfitting, a dropout rate of 0.25 is used. Finally, softmax is used to obtain the probability of each class. Fig. 2 shows the developed CNN model architecture.

Fig. 2. Developed CNN Model.

B. Transfer learning

Building and training a deep convolutional neural network from scratch is time-consuming, costly, and hard. A deep network contains multiple layers, possibly including several convolutional layers in a specific sequence, in order to classify an image correctly. To learn the feature mapping, such a large and deep architecture needs a large-scale dataset when it is trained from scratch. Transfer learning is a technique of re-using a previously developed model on a second, related task. Its core idea is to use knowledge learned in a previous domain to improve and optimize learning in a new domain [12]. The concept of transfer learning in ML was first presented at a NIPS-95 workshop named “Learning to Learn”, whose agenda was a lifelong ML technique able to retain and reuse previously learned information [14]. Transfer learning can quickly transfer learned features from one (source) domain to another (target) domain using a smaller dataset [15]. Hence, transfer learning is adopted in this research, where the target domain is vegetable image classification. Two approaches are available for implementing transfer learning: the “Pre-trained Model Approach” and the “Develop Model Approach”. In deep learning, the most commonly used is the “Pre-trained Model Approach”, and it was selected for this research. In this approach, a pre-trained source model that was trained on large-scale data is selected, and the whole model or parts of it are used as the starting point for a model for another task. The model may then need fine-tuning on the input-output pairs of the target domain. Fine-tuning means keeping the weights and biases of some layers unfrozen and training them so that the pre-trained model performs well on the target data. This paper applies transfer learning with fine-tuning to four pre-trained state-of-the-art CNN architectures: VGG16, InceptionV3, ResNet50, and MobileNet.

C. Fine-tuning CNN Architectures for Transfer Learning

Fine-tuning refers to using the learned features, that is, the weights and biases, of a pre-trained deep CNN to initialize a target CNN model so that the target CNN can be trained on the target data in a supervised manner [2]. Because our target dataset is closely related to the source-domain dataset, we used layer-wise fine-tuning for each architecture. We fine-tuned the four state-of-the-art CNN architectures by freezing the convolutional base so that the previously learned weights and biases can be reused in our task. We used each full architecture as it is, except for the output layer, which is the last fully connected layer. Here, the convolutional base acts as a fixed feature extractor, and the extracted features are used to classify the input image. To retrain the transferred networks, we set the number of classes in the output layer to 15, matching our multi-class classification task. Finally, the final layer was retrained.

1) VGG16: The Visual Geometry Group (VGG) network family includes VGG16 and VGG19. VGG16 was the first runner-up in ILSVRC 2014 and consists of 16 weight layers (13 convolutional and three fully connected) and five max-pooling layers [3]. It has over 138 million parameters. It uses the ReLU activation function, applies dropout with the fully connected layers to reduce the generalization error, and uses the softmax function at the output. The model uses stacks of small 3×3 filters in place of large filters or kernels, which allows complex features to be extracted at a low cost.

2) InceptionV3: InceptionV3 is a state-of-the-art CNN architecture developed by Google, a later version of the Inception architecture first introduced as GoogLeNet. It has 48 layers and replaces the last fully connected layer with average pooling right after the last convolutional layer. As a result, the total number of parameters (24 million) is reduced, which makes the model more computationally efficient. InceptionV3 has eleven inception modules, each containing convolutional layers with the ReLU activation function, convolutional filters for dimension reduction, and a max-pooling layer; the network ends with a fully connected layer and an output layer with a softmax activation function [11]. It uses the ReLU activation function, and dropout with the fully connected layer to reduce the generalization error.
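To make the parameter saving from average pooling concrete, the following minimal sketch counts the weights of a 15-class output layer attached either to a flattened feature map or to a globally average-pooled one. The 8×8×2048 feature-map shape is an assumption chosen for illustration (it is the shape commonly reported for the last convolutional block of InceptionV3 at 299×299 input), and the function names are hypothetical rather than taken from the paper.

```python
def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def head_params(height: int, width: int, channels: int, n_classes: int) -> dict:
    """Compare a classifier head on flattened vs. average-pooled features."""
    flattened = height * width * channels  # every spatial position is an input
    pooled = channels                       # one averaged value per channel
    return {
        "flatten": dense_params(flattened, n_classes),
        "global_avg_pool": dense_params(pooled, n_classes),
    }


# Assumed final feature map: 8 x 8 x 2048; 15 vegetable classes as in this paper.
counts = head_params(8, 8, 2048, 15)
print(counts)
```

Under these assumptions the output layer shrinks from 1,966,095 to 30,735 parameters, a factor of about 64, which is the number of spatial positions that pooling averages away.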
Fig. 3. Example From Each Class.

3) MobileNet: MobileNet is a DCNN architecture that uses depthwise separable convolutions to construct lightweight deep convolutional neural networks suitable for mobile and embedded vision applications [16]. A depthwise separable convolution performs the filtering and combining steps of a standard convolution in two separate layers. Except for the first layer, MobileNet uses depthwise separable convolutions rather than standard convolutions to reduce the model size and increase computational efficiency. It has a total of 28 layers, each followed by batch normalization. It uses ReLU as the activation function and the softmax function at the output.
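The size reduction from separating filtering and combining can be illustrated by counting weights. The sketch below compares a standard convolution with a depthwise separable one for a single layer; the layer dimensions (3×3 kernel, 256 input and 256 output channels) are hypothetical values picked for illustration, and biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a standard convolution (biases omitted)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Depthwise filtering (one k x k filter per input channel)
    followed by a 1 x 1 pointwise convolution that combines channels."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 256 input channels, 256 output channels.
std = standard_conv_params(3, 256, 256)
sep = depthwise_separable_params(3, 256, 256)
print(std, sep, round(std / sep, 2))
```

For this assumed layer the standard convolution needs 589,824 weights and the depthwise separable version 67,840, roughly 8.7 times fewer.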
4) ResNet50: ResNet stands for Residual Network, a type of neural network introduced in 2015 by Microsoft [17]. ResNet50 is one network from the ResNet family. ResNet introduces a technique called “residual mapping” to overcome the degradation problem. It has 50 layers and over 23 million trainable parameters. Unlike some other standard DCNN architectures, it uses global average pooling instead of fully connected layers. Although it is much deeper than the other architectures used in this research, it is considerably lightweight.

IV. DATASET

This research aims to find the highest achievable accuracy for vegetable image classification. The initial experiment is done with 15 types of common vegetables that are found throughout the world: bean, bitter gourd, bottle gourd, brinjal, broccoli, cabbage, capsicum, carrot, cauliflower, cucumber, papaya, potato, pumpkin, radish, and tomato. Fig. 3 shows a random example from each class. A total of 21,000 images from 15 classes are used, where each class contains 1,400 images of size 224×224 in *.jpg format. All experiments in this paper use our own dataset, which is split 70% for training, 15% for validation, and 15% for testing.

V. RESULTS AND DISCUSSION

A. Experimental Results

In the developed 6-layer CNN model, the 3-channel input image size is 32×32 and the total number of parameters is over 1.4 million. The model has four Conv2D layers and two fully connected layers. The Adam optimizer is used with a learning rate of 0.0001, and training is done with a batch size of 64 for 50 epochs. This training configuration gave the best accuracy for the developed CNN model. Fig. 4 shows the training-validation loss and accuracy. The validation accuracy of the developed CNN model is 97.6% and the testing accuracy is 97.5%. Fig. 5 shows the confusion matrix of the 6-layer CNN model.

Fig. 4. 6-Layer CNN Training-Validation Loss and Accuracy.

VGG16 is a 16-layer CNN architecture with over 138 million parameters. The input image size is 224×224, and the built-in image pre-processing of the VGG architecture is applied to the images before they are passed to the network. The Adam optimizer is used with a learning rate of 0.0001, and training is done with a batch size of 64 for 10 epochs. Fig. 6 shows the training-validation loss and accuracy for VGG16. With fine-tuning, we obtained 99.8% validation accuracy and 99.7% testing accuracy from the VGG16 model. Fig. 7 shows the confusion matrix of the model.

InceptionV3 has 48 layers and 24 million parameters. The input image size is 299×299, and the built-in image pre-processing of the Inception architecture is applied to the images before they are passed to the network. The Adam optimizer is used with a learning rate of 0.0001, and training is done with a batch size of 64 for 10 epochs. Fig. 8 shows the training-validation loss and accuracy for InceptionV3. With fine-tuning, we obtained 99.6% validation accuracy and 99.7% testing accuracy. Fig. 9 shows the confusion matrix of the fine-tuned InceptionV3.

Fig. 8. InceptionV3 Training-Validation Loss and Accuracy.

Fig. 5. Confusion Matrix of Proposed 6-Layer CNN.

Fig. 9. Confusion Matrix of Fine-Tuned InceptionV3.


Fig. 6. VGG16 Training-Validation Loss and Accuracy.
Fig. 7. Confusion Matrix of Fine-Tuned VGG16.

MobileNet is a very lightweight CNN architecture compared with the other architectures used in this research; it has only 28 layers, and the input image size is 224×224. The built-in image pre-processing of the MobileNet architecture is applied to each image before it is passed to the network. MobileNet V1 is used for the experiment. The Adam optimizer is used with a learning rate of 0.0001, and training is done with a batch size of 64 for 10 epochs. With fine-tuning, we obtained 99.8% validation accuracy and 99.9% testing accuracy. Fig. 10 shows the training-validation loss and accuracy for MobileNet, and Fig. 11 shows the confusion matrix of the model.

ResNet, specifically ResNet50, is also used in this experiment; it has 50 layers, and the input image size is 224×224. The built-in image pre-processing of the ResNet architecture is applied to each image before it is passed to the network. The Adam optimizer is used with a learning rate of 0.0001, and training is done with a batch size of 64 for 10 epochs. Fig. 12 shows the training-validation loss and accuracy for ResNet50. With fine-tuning, we obtained 99.9% validation accuracy and 99.9% testing accuracy. Fig. 13 shows the confusion matrix of the fine-tuned ResNet50.
Fig. 10. MobileNet Training-Validation Loss and Accuracy.

Fig. 13. Confusion Matrix of Fine-Tuned ResNet50.
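The accuracies reported alongside the confusion matrices can be read off such a matrix directly. The sketch below shows one way to do this, assuming rows are true classes and columns are predicted classes; the 3-class matrix is a made-up toy example with 210 test images per class (15% of 1,400, assuming the split is balanced per class) and does not reproduce any matrix from the figures.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """For each true class (row), the share of its samples predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


# Hypothetical confusion matrix: rows = true class, columns = predicted class.
toy = [
    [208, 2, 0],
    [1, 209, 0],
    [0, 3, 207],
]

print(f"accuracy = {overall_accuracy(toy):.4f}")
print([f"{r:.4f}" for r in per_class_recall(toy)])
```

Per-class recall is useful next to the overall accuracy because it shows which vegetable classes account for the remaining errors.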

Fig. 11. Confusion Matrix of Fine-Tuned MobileNet.

Fig. 12. ResNet50 Training-Validation Loss and Accuracy.

B. Comparative Analysis

Building a CNN model from scratch is not an easy task, and with a small dataset it is harder to obtain the best accuracy from a developed CNN model. To achieve the best possible accuracy, it is important to tune the CNN model, for example by adding more layers, adding dropout, changing the activation function, and trying different optimizers and learning rates. The proposed 6-layer CNN is tuned and optimized for the vegetable dataset and gives an accuracy of 97.5%, which is the highest compared with all previous work that built a model from scratch. Table I summarizes the results along with the applied technique. Previous works that used state-of-the-art CNN models had limited impact: their accuracy is not conclusive, as most of the experiments were done on a smaller dataset or on images collected from ImageNet. Table II shows previous methods, dataset sizes, and results. With the proposed fine-tuning approach on state-of-the-art CNN architectures, the results are considerably better. All four DCNN architectures used with transfer learning give an accuracy of over 99%. The maximum accuracy, 99.9%, is achieved by MobileNet and ResNet.

TABLE I
RESULT SUMMARY

Technique | Method/Algorithm | Epochs | Accuracy | Training Time
Build from scratch | CNN | 50 | 97.5% | >1 hour
Transfer learning | VGG16 | 10 | 99.7% | <40 min.
Transfer learning | InceptionV3 | 10 | 99.7% | <1 hour
Transfer learning | MobileNet | 10 | 99.9% | <30 min.
Transfer learning | ResNet | 10 | 99.9% | <30 min.

TABLE II
EXISTING METHODS, DATASETS, AND RESULTS

Author | Method/Algorithm | Dataset Size | Dataset Source | Accuracy
Om Patil et al. [4] | Inception V3 | 1200 | Self collected | 99%
Yuki Sakai et al. [5] | DNN | 200 | Self collected | 97.38%
Frida Femling et al. [6] | MobileNet | 4300 | ImageNet | 96%
Frida Femling et al. [6] | Inception V3 | 4300 | ImageNet | 97%
Zhu L et al. [7] | AlexNet | 24000 | ImageNet | 92%
Guoxiang Zeng [8] | VGG | 3678 | Self collected | 95.6%
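As a rough cross-check of the training budgets summarized in Table I, the following sketch derives the split sizes from the 70/15/15 split of the 21,000 images and the number of weight updates implied by a batch size of 64. It assumes the split is exact and that the final partial batch of each epoch is kept; the helper names are hypothetical.

```python
import math


def split_sizes(n_images: int, percentages: dict) -> dict:
    """Number of images per subset for integer percentage shares."""
    return {name: n_images * pct // 100 for name, pct in percentages.items()}


def total_updates(n_train: int, batch_size: int, epochs: int) -> int:
    """Weight updates over training, keeping the last partial batch."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch * epochs


sizes = split_sizes(21000, {"train": 70, "val": 15, "test": 15})
scratch_updates = total_updates(sizes["train"], batch_size=64, epochs=50)
transfer_updates = total_updates(sizes["train"], batch_size=64, epochs=10)
print(sizes, scratch_updates, transfer_updates)
```

Under these assumptions the model built from scratch performs five times as many weight updates (11,500) as each fine-tuned model (2,300), which is consistent in direction with the longer training time listed for it.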
VI. CONCLUSION AND FUTURE RESEARCH

Agriculture is one of the most important sectors, yet it has received less attention in digitalization than others. Some previous work addressed vegetable classification, but with limited scope, very small datasets, and lower accuracy. This research was conducted to address those problems. In this study, vegetable image classification is performed with two techniques: a typical CNN model, and CNN-based pre-trained VGG16, InceptionV3, MobileNet, and ResNet50 models. The proposed typical CNN model has six layers and was built from scratch. The pre-trained state-of-the-art CNN architectures, on the other hand, are fine-tuned and applied using transfer learning. Of the many species of vegetables, only 15 types were selected for this initial study of vegetable image classification. A dataset of 21,000 images from 15 classes was created locally and used for training and testing. A comparative study of the typical CNN and the pre-trained CNNs was also carried out to check which is more accurate, more efficient, and less time-consuming. Experimental results for the different models and techniques are discussed, and the highest accuracy achieved is 99.9%. The experimental results indicate that pre-trained CNN architectures are a promising direction for machine vision. As of now, this is the highest accuracy reported for the vegetable classification task, which is promising. There are also options for future work. Various practical devices could be built on this work: the sorting and labeling of vegetables could be automated to save both time and human resources in super shops and warehouses. The work can also be extended with more classes and types added to the existing dataset, to make it more robust.

REFERENCES

[1] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning. PMLR, 2015, pp. 448–456.
[2] X. Li, T. Pang, B. Xiong, W. Liu, P. Liang, and T. Wang, “Convolutional neural networks based transfer learning for diabetic retinopathy fundus image classification,” in 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). IEEE, 2017, pp. 1–11.
[3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[4] P. Om, G. Vijay et al., “Classification of vegetables using tensorflow,” International Journal for Research in Applied Science and Engineering Technology, vol. 6, no. 4, pp. 2926–2934, 2018.
[5] Y. Sakai, T. Oda, M. Ikeda, and L. Barolli, “A vegetable category recognition system using deep neural network,” in 2016 10th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS). IEEE, 2016, pp. 189–192.
[6] F. Femling, A. Olsson, and F. Alonso-Fernandez, “Fruit and vegetable identification using machine learning for retail applications,” in 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). IEEE, 2018, pp. 9–15.
[7] L. Zhu, Z. Li, C. Li, J. Wu, and J. Yue, “High performance vegetable classification from images based on alexnet deep learning model,” International Journal of Agricultural and Biological Engineering, vol. 11, no. 4, pp. 217–223, 2018.
[8] G. Zeng, “Fruit and vegetables classification system using image saliency and convolutional neural network,” in 2017 IEEE 3rd Information Technology and Mechatronics Engineering Conference (ITOEC). IEEE, 2017, pp. 613–617.
[9] J. Harel, C. Koch, and P. Perona, “Graph-based visual saliency,” in Proceedings of the 19th International Conference on Neural Information Processing Systems, ser. NIPS’06. Cambridge, MA, USA: MIT Press, 2006, pp. 545–552.
[10] Z. Li, F. Li, L. Zhu, and J. Yue, “Vegetable recognition and classification based on improved vgg deep learning network model,” International Journal of Computational Intelligence Systems, vol. 13, no. 1, pp. 559–564, 2020.
[11] A. Rehman, S. Naz, M. I. Razzak, F. Akram, and M. Imran, “A deep learning-based framework for automatic brain tumors classification using transfer learning,” Circuits, Systems, and Signal Processing, vol. 39, no. 2, pp. 757–775, 2020.
[12] M. Hussain, J. J. Bird, and D. R. Faria, “A study on cnn transfer learning for image classification,” in UK Workshop on Computational Intelligence. Springer, 2018, pp. 191–202.
[13] Prabhu, “Understanding of convolutional neural network (cnn) — deep learning,” https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148, Mar. 2021.
[14] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2009.
[15] V. Chauhan, K. D. Joshi, and B. Surgenor, “Image classification using deep neural networks: Transfer learning and the handling of unknown images,” in International Conference on Engineering Applications of Neural Networks. Springer, 2019, pp. 274–285.
[16] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition (2015),” arXiv preprint arXiv:1512.03385, 2016.
