Classification of Mango (Mangifera Indica L.) Fruit Varieties Using Convolutional Neural Network

Sapan Naik
Babu Madhav Institute of Information Technology,
Uka Tarsadia University, Surat, India.
[email protected]

Hinal Shah
Babu Madhav Institute of Information Technology,
Uka Tarsadia University, Surat, India.
[email protected]
Abstract – Automation in the classification and grading of mango (Mangifera Indica L.) is important for farmers as well as consumers for identifying the quality of mango. This paper addresses the issue of classifying mango fruit using a non-destructive method. Fruit classification is the prerequisite stage of fruit grading. Advances in deep learning and convolutional neural networks (CNN) have proved to be a boon for image classification and recognition tasks and can be used for fruit recognition. In this paper, a pre-trained CNN is used for mango classification. Expert knowledge has been collected and a mango image dataset of 1028 images has been created with seven different categories of mango: Kesar, Rajapuri, Totapuri, Langdo, Aafush, Dahseri and Jamadar. The CNN is tuned and trained on the mango dataset. Four modern CNN architectures are compared, namely Inception, Xception, DenseNet and MobileNet. Experimental results show that the MobileNet model is the fastest and DenseNet is the slowest in terms of execution time among the four models. The Xception and DenseNet models give the highest accuracy of 91.42%. The accuracy achieved by Inception is 90% and the time it requires to grade a single mango is 9.78 seconds. Mango classification is also performed using a traditional feature-extraction method with a classifier, where Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT) and chain code methods are used as feature extractors and a multiclass Support Vector Machine (SVM) as the classifier. 80% accuracy is achieved using this method.

Index Terms—Convolutional Neural Network (CNN), Mango, Classification.

I. INTRODUCTION

Agriculture plays a crucial role in the economy of India, as it comprises 16.5% of GDP by sector (2016 est.), approximately 50% of the labor force (2014 est.) and 10% of total exports. In India, agriculture is the sole financial source for 70% of the agricultural labor force and the common man [1]. For a developing country like India, post-harvesting procedures are a major issue. The post-harvesting phase normally contains processes like cooling, cleaning, sorting, grading and packing. Sorting and grading are important aspects of analyzing fruits. Parameters for non-destructive fruit classification and grading include composition, defects, size, shape, strength, flavor and color [2].

Current grading systems have a few limitations: they are time-consuming, laborious, less efficient, monotonous and inconsistent, whereas automatic systems provide rapid, economic, hygienic, consistent and objective assessment. This motivated us to propose work in the field of post-harvesting, mainly in fruit grading. Classification being the initial stage of fruit grading, we have considered it here.

In this paper, mango (Mangifera Indica L.) classification is performed on seven different varieties of mango fruit, as mango is an extraordinary product that meets high quality standards and is filled with ample nutrients. There are 1,000 varieties of mango cultivated in India, but only a small number of varieties are cultivated commercially across India or in other countries. With the largest area under mango cultivation, Gujarat is a strong mango-growing state for economic growth, with varieties ranging from Jamadar, Totapuri, Dahseri, Neelum, Langdo, Kesar, Payri and Alphonso to Rajapuri [3].

The current trend shows the popularity of deep learning and convolutional neural networks (CNN) [4]. Developments in deep learning and CNNs have led the field of computer vision, and image classification in particular, for a long time. Deep learning instinctively acquires the features of images and extracts global features and contextual details, which drastically reduce errors in image recognition [5]. It all started when Hinton's team won the ImageNet image classification challenge; since then, close attention to deep learning has been observed [6]. QuocNet, AlexNet, Inception and BN-Inception-v2 are a few of the models proposed later, and they exhibit superior results. A 70% improvement in results was observed when Google trained a nine-layer neural network on 10 million random images and performed classification on an ImageNet dataset of 2000 categories [7]. PASCAL-VOC, the state-of-the-art detection framework [8], consists of two stages.
Color (RGB) and Near-Infrared (NIR) images are combined using early and late fusion methods in a Faster R-CNN model to detect seven different fruits [9]. The pre-trained R-CNN takes four hours to process fully and to train a new fruit. The fruit recognition system presented in [10] uses a selective search algorithm and the fruit image's entropy for selecting the fruit's region, which is given as input to a CNN; finally, a voting mechanism is used for classification. K-means feature learning is used as a pre-training process with a CNN for weed identification in [11], where 92.89% accuracy is achieved and it is concluded that fine-tuning can improve the results. For online prediction of food materials, a fast auto-clean CNN model is proposed in [12]; the model uses adaptive learning based on an auto-clean task and a multiclass prediction task and gives precise and fast output. Seven classes of mixed-crop images, mainly oil radish, barley, weed, stump, soil, equipment and unknown, are classified using a deep convolutional neural network in [13], where a modified version of VGG-16 is used for the implementation; 79% accuracy is achieved, which shows the potential of deep learning.

A multi-class kernel support vector machine (kSVM) with color histogram, texture and shape features is used for fruit classification in [14]. A split-and-merge algorithm is used for segmentation, and principal component analysis (PCA) is used for dimensionality reduction. Winner-Takes-All SVM, Max-Wins-Voting SVM and Directed Acyclic Graph SVM are used as multiclass SVMs, each with a linear kernel, a homogeneous polynomial kernel and a Gaussian radial basis kernel. The results conclude that Max-Wins-Voting SVM with the Gaussian radial basis kernel performs best with 88.2% accuracy, and Directed Acyclic Graph SVM is the fastest. Crop and weed plants are discriminated without segmentation in [15], where a Random Forest classifier, Markov Random Field and interpolation methods are used; experiments performed on organic carrot give 93.8% average accuracy. 86% classification accuracy is achieved on 15 classes of 2635 fruit images in [16], where color and texture features are used with a minimum distance classifier and co-occurrence and statistical features are computed from the sub-bands of the wavelet transform.

This paper presents a solution for classifying mango categories using four modern CNN architectures and also using a traditional feature-extraction method. The paper is organized as follows: materials and methods are discussed in Section II, the results of the experiments are discussed in Section III, and finally the work is concluded with future directions.

II. MATERIAL AND METHODS

For mango classification, we have assembled data on almost 100 different varieties of mangoes with their features from Navsari Agriculture University, Gujarat and Paria Farm. Seven easily available and popular mangoes in the south Gujarat region have been selected for the experiment: Kesar, Rajapuri, Totapuri, Langdo, Aafush, Dahseri and Jamadar. A mixed image dataset for mango classification has been created; details of the dataset are given in Fig. 1 below.
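The paper does not specify how the image dataset is laid out on disk. The following minimal Python sketch assumes one sub-folder per variety (the "dataset" path and folder layout are hypothetical; only the seven category names come from the paper) and simply counts the images per category.

import os

DATASET_DIR = "dataset"   # hypothetical root folder
VARIETIES = ["Kesar", "Rajapuri", "Totapuri", "Langdo",
             "Aafush", "Dahseri", "Jamadar"]

for variety in VARIETIES:
    folder = os.path.join(DATASET_DIR, variety)
    images = [f for f in os.listdir(folder)
              if f.lower().endswith((".jpg", ".jpeg", ".png"))]
    print(f"{variety}: {len(images)} images")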
The section below gives an overview of CNNs and of how to tune, train and implement a CNN. A CNN consists of multiple levels, where each level consists of multiple training stages. The input and output of each training stage are images or sets of images, known as feature maps [17].

1) Convolution Layer
The first layer of a CNN is always a convolutional layer, and the input of this layer is always an input image. To understand the working of the convolutional layer, suppose an image (A) with 32 x 32 x 3 pixel values. Assume that a spotlight shines at the top left corner of the image and covers a 3 x 3 area. Visualize this spotlight sliding across all areas of the input image, as shown in Fig. 2(A).
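To make the spotlight analogy concrete, the following minimal NumPy sketch slides one 3 x 3 filter over a 32 x 32 x 3 image with stride 1; the image and filter values are random placeholders, not values used in the paper.

import numpy as np

# Toy input: a 32 x 32 RGB image, as in the example above.
image = np.random.rand(32, 32, 3)

# One 3 x 3 x 3 filter -- the "spotlight" that slides over the image.
kernel = np.random.rand(3, 3, 3)

# Valid convolution with stride 1: the resulting feature map is 30 x 30.
out_h, out_w = image.shape[0] - 2, image.shape[1] - 2
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3, :]   # area lit by the spotlight
        feature_map[i, j] = np.sum(patch * kernel)

print(feature_map.shape)                     # (30, 30)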
The common pooling is done with a filter of size 2 x 2 and a stride of 2; it basically reduces the input to half its size. After this, in the flattening step, the feature map is converted into a single vector, because it is given as input to an artificial neural network.

4) Fully Connected Layer
In the fully connected layer of the neural network, each neuron receives input from the neurons of the previous layer. The output of this layer is computed using a matrix multiplication followed by a bias offset. All neurons in the previous layer connect to a single neuron to generate a specific output.
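Continuing the NumPy sketch above, the following illustrative code (all values are placeholders) shows 2 x 2 max pooling with stride 2, flattening, and a fully connected layer expressed as a matrix multiplication plus bias for the seven mango classes.

import numpy as np

feature_map = np.random.rand(30, 30)       # e.g. output of the convolution step

# 2 x 2 max pooling with stride 2 halves each spatial dimension: 30 x 30 -> 15 x 15.
pooled = feature_map.reshape(15, 2, 15, 2).max(axis=(1, 3))

# Flattening: the pooled map becomes a single vector.
vector = pooled.reshape(-1)                # shape (225,)

# Fully connected layer: matrix multiplication followed by a bias offset.
num_classes = 7                            # seven mango varieties
weights = np.random.rand(num_classes, vector.size)
bias = np.random.rand(num_classes)
output = weights @ vector + bias           # one score per class
print(output.shape)                        # (7,)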
B. Tuning CNN Model
To make use of a CNN, we first need to create its model. There are three phases through which tuning of a CNN model is done: i) training, ii) validation, and iii) testing.

In the training phase, the network that performs the classification is prepared. In the validation phase, calibration is provided for the network; it corrects the classification performed by the training phase. After all the corrections, the model is ready for the testing phase.
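As one illustration of these three phases, the hedged sketch below splits a list of image paths into training, validation and test sets. The 10% testing and validation fractions match the percentages reported for the CNN experiments later in this paper; the file names and seed are placeholders.

import random

def split_dataset(image_paths, test_pct=0.10, val_pct=0.10, seed=42):
    """Split a list of image paths into training, validation and test sets."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_test = int(len(paths) * test_pct)
    n_val = int(len(paths) * val_pct)
    test = paths[:n_test]
    val = paths[n_test:n_test + n_val]
    train = paths[n_test + n_val:]
    return train, val, test

# Example with placeholder file names:
train, val, test = split_dataset([f"img_{i:03d}.jpg" for i in range(100)])
print(len(train), len(val), len(test))     # 80 10 10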
When designing a neural network, one needs to decide many things, such as the arrangement of the layers, the types of layers used and the number of neurons in each layer. Designing the architecture of a neural network is complex, and it is difficult to prepare our own architecture; however, some standard architectures are available which can be used directly for our work, such as AlexNet, GoogleNet, Inception, ResNet and VGG. In the beginning, it is preferable to make use of standard network architectures [22].

Once the architecture of the network is decided, the next important decision concerns the weights and biases (the parameters of the network). Backward propagation is used to set the parameters in the best manner. Once the parameters are finalized and training is completed, all parameters and the architecture are saved in binary files; these files are known as the model. To test a new input image, this model is loaded and it predicts the output [22].

C. Implementation of CNN
Using the transfer learning technique, a lot of work is reduced. A fully trained model is taken which has been pre-trained for a set of categories such as ImageNet. On this model, the existing weights can be retrained for new classes: all the other layers remain untouched and only the final layer is retrained [23]. This process is faster and does not require a graphical processing unit (GPU). Instead of training a full new network, this is a better alternative and it gives very good results too.
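The paper implements this retraining with TensorFlow (see Section III). As one possible illustration of the idea, the hedged Keras sketch below loads an ImageNet-pre-trained Inception v3 base without its top layer, freezes it, and trains only a new final layer for the seven mango categories. The dataset path, image size, optimizer settings and epoch count are placeholders, not the paper's exact configuration.

import tensorflow as tf

# Pre-trained base with ImageNet weights and the old top layer removed.
base = tf.keras.applications.InceptionV3(weights="imagenet",
                                         include_top=False,
                                         pooling="avg",
                                         input_shape=(299, 299, 3))
base.trainable = False                      # lower layers stay untouched

# New final layer for the seven mango categories.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(7, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# "dataset/" is a hypothetical folder with one sub-folder per variety;
# in practice the images should also be preprocessed with
# tf.keras.applications.inception_v3.preprocess_input.
train_data = tf.keras.preprocessing.image_dataset_from_directory(
    "dataset", image_size=(299, 299), label_mode="categorical")
model.fit(train_data, epochs=10)            # epoch count is a placeholder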
III. EXPERIMENTS AND RESULT DISCUSSION

An algorithm for digit classification using HOG and a multiclass SVM is proposed in [24]. More features are found in the same reference, where the SIFT technique is used to extract features from the image. For the classification of mango, shape plays an important role, and the chain code is a good shape feature extractor [25]; we have therefore designed our own chain code for shape feature extraction. Based on this study, a basic classification method is implemented by combining features extracted using the HOG, SIFT and chain code techniques, with a multiclass SVM as the classifier.

The procedure is simple: the input image is segmented, features are extracted from the segmented image, and the extracted features are provided as input to the classifier. Due to the white background of our dataset, a simple thresholding method is used to segment the image. HOG, SIFT and chain code features are then extracted and provided as input to the multiclass SVM.
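The paper does not list its exact feature-extraction code. The following hedged sketch shows only the HOG-plus-multiclass-SVM part of such a pipeline using scikit-image and scikit-learn; SIFT and chain-code descriptors would be concatenated to the same feature vector. The image size, HOG parameters, paths and labels are placeholders.

import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.io import imread
from skimage.transform import resize
from sklearn.svm import SVC

def hog_features(path):
    """HOG descriptor of one mango image (grayscale, fixed size)."""
    img = resize(rgb2gray(imread(path)), (128, 128))
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def train_mango_svm(train_paths, train_labels):
    """Fit a multiclass SVM (one-vs-one by default) on HOG features."""
    X = np.array([hog_features(p) for p in train_paths])
    clf = SVC(kernel="linear")
    clf.fit(X, train_labels)
    return clf

# Usage (paths and labels are placeholders for the 120 training images):
# clf = train_mango_svm(["Kesar/img_001.jpg", ...], ["Kesar", ...])
# print(clf.predict([hog_features("test/img_050.jpg")]))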
We took 120 training images of Kesar, Aafush, Rajapuri, Totapuri, Jamadar and Dahseri (Langdo images are not considered; only six categories are used in this experiment), with 20 images per category. In the same way, 120 test images are taken. A script has been written in Python for the implementation. The experiments give 100% accuracy when the training images themselves are used for testing, and 80% accuracy is achieved on the test images. The time required for classifying a single mango is 4.1 seconds in our experiment.

The experiments are performed on a MacBook Pro (13-inch, 2016) with a 2.9 GHz Intel Core i5 processor, 8 GB 2133 MHz LPDDR3 memory and an Intel Iris Graphics 550 1523 MB graphics card. For implementing the CNN, we have used TensorFlow. For the initial experiments, the pre-trained Inception v3 model is used: the old top layer is removed from this model and retrained with our mango images, because none of the mango species are present in the original ImageNet classes. Other than the top layer, all the lower layers have already been trained for classifying the 1000 classes of the ImageNet dataset, and their weights and biases are used directly for the new object recognition task. This is the power of transfer learning, as discussed before [5].
An initial experiment is done to determine proper values for the number of training images and epochs for our dataset. We have tested the Inception v3 model by providing training for an individual mango category and testing the same category. We have used different numbers of training images (10, 20, 30 and 40) and epoch values (1500 and 2000). Based on this, the accuracy for classifying each individual mango category is derived; Table I shows the experiment results. The learning rate is set to 0.01, the training batch size is taken as 100, and the testing and validation percentages are kept at 10 for training the deep CNN. Based on this initial experiment, we have concluded that 60 images/samples of each category for training and 10 images of each category for testing are good enough, and the selected epoch value for the final experiment is 2000.
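The paper does not name the exact retraining script. Assuming the standard retrain.py script from the TensorFlow image-retraining example, and assuming the reported epoch value corresponds to that script's training steps, the hyperparameters above would map to an invocation like the hypothetical one below (the script path and image directory are assumptions; the numeric values are those reported above).

import subprocess

subprocess.run([
    "python", "retrain.py",
    "--image_dir", "dataset",
    "--learning_rate", "0.01",
    "--train_batch_size", "100",
    "--testing_percentage", "10",
    "--validation_percentage", "10",
    "--how_many_training_steps", "2000",
], check=True)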
TABLE I
ACCURACY (%) FOR DIFFERENT NUMBERS OF TRAINING IMAGES AND EPOCH VALUES ON SEVEN CATEGORIES OF MANGO

Category    Epochs    Training images
                      10     20     30     40
Aafush      1500      98     98     96     94
            2000      98     99     97     94
Dahseri     1500      67     68     58     50
            2000      74     75     66     50
Jamadar     1500      99     99     98     97
            2000      99     99     98     97
Kesar       1500      93     82     71     63
            2000      95     86     79     63
Langdo      1500      82     94     87     89
            2000      85     96     90     89
Totapuri    1500      94     89     86     70
            2000      95     91     90     70
Rajapuri    1500      95     97     96     91
            2000      97     98     97     91

… maintained. The reason for choosing only these four CNN models for the experiment is that the accuracy achieved by these models is better compared to other models [26].

Table II and Table III show the experiment results. The values in Table II represent the number of correct predictions out of 10 input images for each category of mango, for all four models. Table III shows the overall accuracy, error rate and time required for classification of a single mango for all four models.

TABLE II
RESULTS FOR NUMBER OF CORRECT PREDICTIONS (10 INPUT IMAGES FOR EACH CATEGORY)

Category    Inception v3    Xception    DenseNet    MobileNet
Aafush      9               10          10          10
Jamadar     7               8           10          10
Dahseri     10              9           8           10
Kesar       8               10          9           9
Langdo      9               10          10          10
Rajapuri    10              10          10          6
Totapuri    10              7           7           7

TABLE III
RESULTS SHOWING PERFORMANCE OF ALL CNN MODELS

Model           Accuracy (%)    Error Rate (%)    Time (seconds)
Inception v3    90              10                9.78
Xception        91.42           8.57              5.10
DenseNet        91.42           8.57              11.52
MobileNet       88.57           11.42             1.09
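As a quick consistency check, the overall figures in Table III can be recomputed from the per-category counts in Table II (70 test images per model), as in the short sketch below.

# Per-category correct predictions from Table II (10 test images each,
# in the order Aafush, Jamadar, Dahseri, Kesar, Langdo, Rajapuri, Totapuri).
correct = {
    "Inception v3": [9, 7, 10, 8, 9, 10, 10],
    "Xception":     [10, 8, 9, 10, 10, 10, 7],
    "DenseNet":     [10, 10, 8, 9, 10, 10, 7],
    "MobileNet":    [10, 10, 10, 9, 10, 6, 7],
}

for model, counts in correct.items():
    accuracy = 100 * sum(counts) / 70       # 7 categories x 10 images
    print(f"{model}: accuracy {accuracy:.2f}%, error {100 - accuracy:.2f}%")
# Inception v3 gives 90.00%, Xception and DenseNet 91.43% (reported as
# 91.42 in Table III) and MobileNet 88.57%.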