Deep Residual Networks For Plant Identification

Vinit Bodhwani et al. / Procedia Computer Science 152 (2019) 186–194
Available online at www.sciencedirect.com
ScienceDirect
www.elsevier.com/locate/procedia

Abstract

Advancing the knowledge and understanding of the plants growing around us plays a crucial role medicinally, economically, and in sustainable agriculture. The identification of plant images has become an interdisciplinary focus in the field of computer vision. Benefiting from the rapid advancement of computer vision and deep learning algorithms, a convolutional neural network (CNN) approach is taken into consideration to learn feature representations of 185 classes of leaves collected by Columbia University, the University of Maryland, and the Smithsonian Institution. For the large-scale classification of plants in the natural environment, a 50-layer deep residual learning framework consisting of 5 stages is designed. The proposed model achieves a recognition rate of 93.09 percent as the testing accuracy on the LeafSnap dataset, illustrating that deep learning is a highly promising technology for forestry.

© 2019 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the International Conference on Pervasive Computing Advances and Applications – PerCAA 2019.

Keywords: Plant Classification; Convolutional Neural Networks; Data Augmentation; Computer Vision
1. Introduction

Plants are an essential resource for human well-being, providing us with oxygen and food. Researchers, along with the breeding industry, are therefore making great efforts to sustain agriculture over the long term without interruptions. A good knowledge of plants is essential to help fully recognize new, different or uncommon plant species in order to support the ecosystem, promote the drug industry, and promote sustainability and productivity in agriculture. As deep learning technology advances, different modern and advanced models have been proposed for automatic plant identification. At the same time, classifying and learning an item in an image is a genuinely challenging task in computer vision. Today, most researchers use different variants of leaf properties as a comparable method for the study of new plants, and some leaf datasets, including Flavia, Swedish and ICL, have been made available, while fewer attempts have been made to extract local features of leaf, flower or fruit. However, much work has been devoted to identification and prediction in different applications. Additionally, computational intelligence techniques play a vital role in identification and prediction applications. For example, rough computing has been hybridized with neural networks [1, 2], genetic algorithms [3, 4], and soft sets [5].
Deep convolutional neural networks play a vital role in learning distinct features of an image for image classification. To handle higher-quality, higher-resolution images, deeper neural networks are established, and increasing the number of stacked layers of the network becomes crucial, as recent evidence shows [6, 7]. This large number of hidden layers, however, introduces the vanishing gradient problem [8], which could not be resolved by traditional machine learning techniques. To overcome the aforementioned challenges, and invigorated by the deep learning breakthrough in image recognition, a 50-layer deep learning model using residual networks is designed for uncontrolled plant identification on the LeafSnap dataset, which consists of 185 different tree species. The proposed model achieves an accuracy rate of 93.09 percent with 0.24 percent error.
2. Literature Survey
The detection of plants plays an important role in recognizing different or unusual plant species in order to support the drug industry, maintain the ecosystem, and increase agricultural productivity and prosperity. Neeraj Kumar et al. [9] proposed LeafSnap, an image recognition system that identifies plant species automatically. They built a mobile app that helps users recognize trees by capturing photos of their leaves, and its current version covers all species of the given dataset. They computed various curvature-based shape functions of the leaf at its boundary and identified plants based on these characteristics. Their recognition process mainly consists of four parts: classifying, segmenting, extracting and comparing. Furthermore, they followed a nearest-neighbors (NN) approach for identification of the leaf type.
Sue Han et al. [10] employed a well-trained convolutional neural network model to perform plant identification. They proposed a methodology to examine the learned features using deconvolutional networks (DN) instead of using the CNN directly. This approach was used to obtain visual insight into the features needed to recognize a leaf from various classes, thereby avoiding the need to design features manually. They worked on a new dataset called the MalayaKew Leaf Dataset with only 44 classes. They created a new dataset (called D2) by extending the given dataset (D1) through manual cropping and rotation of the images. They randomly selected 34,672 leaf patches for training and 8,800 for testing, which resulted in 99.6% accuracy on the D2 dataset and 97.7% on D1.
For the purpose of leaf detection at various scales, Jing Hu et al. [11] proposed an MSF-CNN (Multi-Scale Fusion Convolutional Neural Network). They down-sampled an input image into multiple low-resolution images with a series of bilinear interpolation operations. The images were then fed into the MSF-CNN architecture to learn different characteristics at various layers, step by step. The final feature for predicting the species of the input image is obtained by aggregating all last-layer information. They re-trained DeepPlant on the D1 dataset and predicted classes with an accuracy of 98.1% using a Support Vector Machine (SVM) model and 97.7% using a Multi-Layer Perceptron (MLP) approach. They observed that some of the classes were misclassified and concluded that the shape of plants is not a good choice for recognizing them. They then trained their model on the D2 dataset and achieved 99.6%, which is higher than on D1. They concluded that D2 performs better than D1 because venation of distinct orders is a more robust characteristic for plant recognition.
Mohamed Abbas Hedjazi et al. [12] modified a trained model to visually identify the leaves in images. They showed how a model already trained on a large dataset can be used on a small training dataset. Their results showed that this approach outperformed traditional machine learning methods based on local binary patterns (LBPs). They did not train their model from scratch; instead, they took a CNN model trained on ImageNet. They worked on the ImageClef2013 dataset, which includes images with clean as well as cluttered backgrounds. The shortage of training data led to the problem of overfitting and high variability. They therefore applied transfer learning to avoid overfitting and fine-tuned AlexNet with the help of the Caffe framework. They compared AlexNet trained from scratch with random initialization against the fine-tuned versions, which resulted in an accuracy of 71.17% on the validation dataset and 70.0% on the testing dataset.
Yu Sun et al. [13] used the BJFU100 dataset, which consists of 100 species of ornamental plants, where each category contains 100 different photos of 3120 × 4208 resolution. They compared a ResNet26 (26 layers) with ResNet models of 18, 34 and 50 layers and recommended ResNet26 as the best tradeoff between optimization difficulty and model capacity. They explained that ResNet26 contains enough trainable parameters to learn the discriminative features, which prevents both under-fitting and over-fitting and thus results in fast and robust convergence during SGD optimization. Their model achieved 91.78% accuracy, which enhances the overall performance by up to 2.51%.
Kaiming He et al. [14] presented a residual learning framework to speed up network training. Simply increasing the number of hidden layers in a plain network leads to the vanishing gradient problem. They therefore introduced the concept of shortcut connections, which results in higher accuracy and lower complexity of their model. On the ImageNet dataset, they developed a residual net model with a depth of up to 152 layers (8 times deeper than VGG nets). Using these residual networks, they achieved a 3.57% error on the test set.
In this paper, residual networks of 50 layers are chosen to train the model on the LeafSnap dataset. Since the dataset was not large enough, we applied data augmentation techniques using the open-source library Augmentor [15], which rotates, flips, zooms and resizes the given images into 64x64x3 dimensions. We tried to train our model on a CPU, but it trained on only 60% of our dataset in 24 hours. So, we trained our model on a Google Colab K80 GPU [16], which took 4-5 hours in training. Finally, we achieved an accuracy of 98.75% on the training dataset with a loss rate of 0.0462% and 93.0926% accuracy on the testing dataset with a loss rate of 0.247%.
The advantage of a very deep network is that it can learn distinct features of images at many different levels of abstraction and thus can represent very complex functions. However, it is not at all easy to work with deep networks, as they suffer from vanishing gradients, making gradient descent unbearably slow. During backpropagation, we propagate from the last layer back to the first (input) layer and multiply by the weight matrix [17] at each step (including the hidden layers), which can make the gradient decrease exponentially to zero, as shown in Fig. 1.
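The exponential decay described above can be illustrated with a small numerical sketch (this is an illustrative toy example, not the paper's model): each backpropagation step multiplies the gradient by a per-layer factor, and if those factors sit a little below 1 on average, the gradient collapses after a few dozen layers.

```python
import numpy as np

# Toy illustration of the vanishing gradient: backpropagating through
# `depth` layers multiplies the gradient by one factor per layer. The
# factors below (drawn from [0.8, 0.9]) stand in for the per-layer
# Jacobian norms; they are assumed values chosen for the demonstration.
rng = np.random.default_rng(42)
depth = 50
grad = 1.0
norms = []
for _ in range(depth):
    factor = 0.8 + 0.1 * rng.random()  # slightly below 1 on average
    grad *= factor
    norms.append(grad)

print(norms[0], norms[-1])
assert norms[-1] < 0.01  # after 50 layers the gradient has all but vanished
```

Shortcut connections sidestep exactly this: the additive identity path lets the gradient flow back without being repeatedly multiplied by such factors.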
Convolutional neural networks (CNNs) have been successfully used in pattern recognition and image classification [18, 19]. Their success rests on training specially structured networks on extensive data sets and on the availability of parallel computing hardware [20]. Deep convolutional layers help to avoid the feature extraction process that is essential in classical machine learning algorithms [21, 22]. However, to avoid overfitting, deep neural networks require a lot of data. When the training data is insufficient and limited, alternative approaches are much needed [23, 24].
As we have a huge number of images and classes, a traditional CNN approach would not work, as it requires extensive memory resources, time and computational power [25, 26]. The chosen dataset calls for a deep network, and thus we need to use residual networks. The residual learning framework expedites the training of deep networks, resulting in improved performance in both visual and non-visual tasks. These residual networks are much deeper than their 'plain' counterparts, yet they need a similar number of parameters (weights). Fig. 2 shows the basic building block without a skip connection and the residual building block with a skip connection.
Mathematically it is defined as:

F(x) = H(x) − x, i.e. H(x) = F(x) + x    (1)

In Eqn. (1), x and H(x) are the subnetwork's input and true output, and the function F(x) represents the residual mapping to be learned. Because the shortcut adds the input back, the stacked layers only have to learn the residual between the input and output of the subnetwork, and an identity mapping comes for free. The major difference between traditional neural nets and residual networks is that traditional neural nets try to learn H(x) directly, whereas ResNet models the layers to learn the residual between the input and output of the subnetwork.
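The relation H(x) = F(x) + x can be sketched numerically (a minimal NumPy illustration; the two-layer residual branch and its weights are assumptions for the demonstration, not the paper's architecture): when the learned residual F(x) is zero, the block collapses to an (activated) identity mapping, which is what keeps very deep stacks trainable.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    # F(x): a small two-layer residual branch (illustrative weights).
    f = relu(x @ W1) @ W2
    # H(x) = F(x) + x: the shortcut adds the input back before the ReLU.
    return relu(f + x)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))          # a batch of 4 feature vectors
W1 = rng.standard_normal((8, 8)) * 0.1
W2 = rng.standard_normal((8, 8)) * 0.1

out = residual_block(x, W1, W2)

# With zero weights the residual branch vanishes and H(x) = relu(x),
# i.e. the block falls back to an identity mapping.
identity_out = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
assert np.allclose(identity_out, relu(x))
```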
The overall network architecture of our 50-layer ResNet (i.e. ResNet50) model is depicted in Fig. 3. The model is designed in 5 stages. Initially, zero-padding pads the input image of dimension 64x64x3 with a pad of size (3, 3). The resulting image is fed into a 7 × 7 convolution layer, followed by batch normalization applied to the channels axis and a 3 × 3 max pooling layer. In stage 2, the convolutional block uses three sets of filters of sizes (64, 64, 256), followed by 2 stacked identity blocks using three sets of filters of sizes (64, 64, 256). The filter dimensions grow from stage to stage, and in the last stage the convolutional block uses three sets of filters of sizes (512, 512, 2048), followed by 2 stacked identity blocks using three sets of filters of sizes (512, 512, 2048). The network finally ends with average pooling, and the output is flattened and passed to a fully connected layer that reduces the output to the number of classes with softmax activation.
The proposed research design consists of a 50-layer deep residual network built from identity blocks and convolutional blocks. The identity block is the standard block used in ResNets and corresponds to the case where the input activation and output activation have the same dimensions. We have implemented a more powerful version (as shown in Fig. 4), in which 3 hidden layers are skipped rather than 2, which makes our model distinctive: it takes less time and results in good accuracy.
The purpose of the convolutional block is to match input and output dimensions. The difference from the identity block is that there is an additional CONV2D layer in the shortcut path, as shown in Fig. 5. This CONV2D layer resizes the input to a different dimension so that the dimensions match up in the final addition, where the shortcut value is added back to the main path.
5. Implementation
The implementation of our model is based on Keras, an open-source deep learning framework. All the experiments were conducted on the Google Cloud platform with a Tesla K80 GPU. We used the Augmentor library, a standalone Python library that is convenient because it enables finer-grained control over augmentation and implements the most relevant real-world augmentation techniques. It takes a stochastic approach, using building blocks to chain operations together in a pipeline. The dataset size [27] was increased by rotating, flipping and zooming the images to improve accuracy. All the images were resized to the 64x64x3 dimension, and each pixel value was then divided by 255.
Algorithm 1: Augmentation of images
initialization:
    N: number of images
    width: target width of the images
    height: target height of the images
    sample: after specifying the operations, sampling generates N augmented images based on those specifications
    path: path of the dataset where all images are present
begin
    initialize an empty pipeline
    pipeline ← Augmentor(path)
    pipeline.rotate(probability, max_left_rotation, max_right_rotation)
    pipeline.flip_left_right(probability)
    pipeline.zoom_random(probability, percentage_area = 0.8)
    pipeline.flip_top_bottom(probability)
    pipeline.resize(probability, width, height)
    sample ← pipeline.sample(N = 27000)
end
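The steps above map directly onto Augmentor's pipeline API. As a dependency-free illustration, the same geometric operations can be sketched in NumPy (a simplified stand-in, not the Augmentor implementation: rotation is omitted, and resizing is nearest-neighbour):

```python
import numpy as np

rng = np.random.default_rng(7)

def flip_left_right(img):
    return img[:, ::-1]

def flip_top_bottom(img):
    return img[::-1]

def resize(img, height, width):
    # Nearest-neighbour resize by sampling row/column indices.
    h, w = img.shape[:2]
    rows = np.arange(height) * h // height
    cols = np.arange(width) * w // width
    return img[rows][:, cols]

def zoom_random(img, percentage_area=0.8):
    # Crop a random window covering `percentage_area` of each side,
    # then resize the crop back to the original size.
    h, w = img.shape[:2]
    ch, cw = int(h * percentage_area), int(w * percentage_area)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return resize(img[top:top + ch, left:left + cw], h, w)

def augment(img, p=0.5):
    # Apply each operation with probability p, then resize to 64x64,
    # mirroring the pipeline in Algorithm 1.
    if rng.random() < p:
        img = flip_left_right(img)
    if rng.random() < p:
        img = flip_top_bottom(img)
    if rng.random() < p:
        img = zoom_random(img)
    return resize(img, 64, 64)

leaf = rng.integers(0, 256, size=(100, 120, 3), dtype=np.uint8)
sample = augment(leaf)
assert sample.shape == (64, 64, 3)
```

Calling `augment` N times per source image yields the sampled augmented set that Algorithm 1 describes.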
We converted our dataset (which consisted of images) into Hierarchical Data Format 5 (HDF5), a set of file formats designed to store and organize huge volumes of data. We chose the HDF5 format because of its hierarchical structure (similar to files and folders), its compression rate, and its fast accessibility. After that, we prepared 150 images of each class (out of 180 classes in total), which results in 27,000 images. We then split our dataset into 80% (21,600 images) for training and 20% (5,400 images) for testing.
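The conversion step can be sketched with the h5py library (a minimal example using synthetic data; the dataset names and group layout are illustrative assumptions, not the paper's exact file structure):

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-ins for the resized 64x64x3 leaf images and labels.
images = np.random.randint(0, 256, size=(30, 64, 64, 3), dtype=np.uint8)
labels = np.random.randint(0, 180, size=(30,), dtype=np.int64)

path = os.path.join(tempfile.mkdtemp(), "leaves.hdf5")

with h5py.File(path, "w") as f:
    # Groups ("train") mirror HDF5's file/folder-like hierarchy;
    # gzip compression gives the compact on-disk layout described above.
    f.create_dataset("train/images", data=images, compression="gzip")
    f.create_dataset("train/labels", data=labels, compression="gzip")

with h5py.File(path, "r") as f:
    loaded_images = f["train/images"][:]
    loaded_labels = f["train/labels"][:]
```

Slicing a dataset (e.g. `f["train/images"][:batch_size]`) reads only the requested chunks, which is what makes HDF5 access fast during training.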
Algorithm 2: Actual implementation of residual networks
initialization:
    filters: the number of filters in the CONV layers of the main path
    image: input image
    input_shape: shape of the images
    classes: number of classes, integer
    epochs: number of epochs
    batch_size: number of training examples utilised in one iteration
    F: number of filters
begin:
    training_dataset, test_dataset ← load dataset
    training_dataset ← normalize training dataset by dividing pixel values by 255
    testing_dataset ← convert test dataset using one-hot encoding
    model ← call ResNet50 function with input_shape 64x64x3 and classes 180
    model ← compile model using 'adam' optimizer and 'categorical_crossentropy' loss
    fit model with training_dataset, testing_dataset, epochs and batch_size as parameters

Function ResNet50(input_shape, classes)
    Stage 1:
        image ← zero_padding(padding_shape)
        image ← 2D_Convolution(F, filters)
        image ← Relu_Activation(image)
        image ← max_pooling(window_shape)
    Stage 2:
        filters ← (64, 64, 256); stage ← 2
        image ← convolutional_block(image, filters, stage, block 'a')
        image ← identity_block(image, filters, stage, block 'b')
        image ← identity_block(image, filters, stage, block 'c')
    Stage 3:
        filters ← (128, 128, 512); stage ← 3
        image ← convolutional_block(image, filters, stage, block 'a')
        image ← identity_block(image, filters, stage, block 'b')
        image ← identity_block(image, filters, stage, block 'c')
        image ← identity_block(image, filters, stage, block 'd')
    Stage 4:
        filters ← (256, 256, 1024); stage ← 4
        image ← convolutional_block(image, filters, stage, block 'a')
        image ← identity_block(image, filters, stage, block 'b')
        image ← identity_block(image, filters, stage, block 'c')
        image ← identity_block(image, filters, stage, block 'd')
        image ← identity_block(image, filters, stage, block 'e')
        image ← identity_block(image, filters, stage, block 'f')
    Stage 5:
        filters ← (512, 512, 2048); stage ← 5
        image ← convolutional_block(image, filters, stage, block 'a')
        image ← identity_block(image, filters, stage, block 'b')
        image ← identity_block(image, filters, stage, block 'c')
        image ← average_pooling(pool_size, padding_shape)
        image ← convert output into categorical values using the softmax activation function
    return compiled model

Function identity_block(image, filters)
    prev_image ← image
    image ← 2D_Convolution(filters)
    image ← BatchNormalization(image)
    image ← Relu_Activation(image)
    image ← Add(prev_image, image)
    image ← Relu_Activation(image)
    return image

Function convolutional_block(image, filters)
    prev_image ← image
    image ← 2D_Convolution(filters)
    image ← BatchNormalization(image)
    image ← Relu_Activation(image)
    prev_image ← 2D_Convolution(prev_image, filters)
    prev_image ← BatchNormalization(prev_image)
    image ← Add(prev_image, image)
    image ← Relu_Activation(image)
    return image
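The two block types can be rendered in Keras, the framework used in this paper (a hedged sketch consistent with the pseudocode and the bottleneck design in Fig. 4; the kernel sizes, strides, and the stage-2-style usage below are illustrative assumptions, not the authors' exact code):

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters):
    # Bottleneck identity block: 1x1 -> 3x3 -> 1x1 convolutions with
    # batch normalization; the unchanged input is added back on the
    # shortcut path, so input and output dimensions must match.
    f1, f2, f3 = filters
    shortcut = x
    x = layers.Conv2D(f1, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(f2, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([shortcut, x])
    return layers.Activation("relu")(x)

def convolutional_block(x, filters, strides=2):
    # Same main path, but the shortcut carries an extra strided 1x1
    # CONV2D layer so its dimensions match the main path at the Add.
    f1, f2, f3 = filters
    shortcut = layers.Conv2D(f3, 1, strides=strides)(x)
    shortcut = layers.BatchNormalization()(shortcut)
    x = layers.Conv2D(f1, 1, strides=strides)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(f2, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([shortcut, x])
    return layers.Activation("relu")(x)

# Stage-2-style usage: one convolutional block, then two identity blocks.
inp = layers.Input(shape=(16, 16, 64))
out = convolutional_block(inp, (64, 64, 256), strides=1)
out = identity_block(out, (64, 64, 256))
out = identity_block(out, (64, 64, 256))
model = tf.keras.Model(inp, out)
```

The strided 1x1 convolution on the shortcut is what lets the convolutional block change spatial resolution and channel count between stages, while the identity block keeps dimensions fixed.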
6. Result Analysis
We present a performance assessment of our CNN model using residual networks in this section. We used the "Adam" optimizer and the "categorical_crossentropy" loss function, as these suit a model predicting multiple mutually exclusive classes. The configuration parameters of our model are depicted in Table 1.
Fig. 6(a) shows the training process of our ResNet50 model (of 50 layers). After the very first epoch, training accuracy improves quickly, stabilizing after 32 epochs. Our model yields 99.07% training accuracy, and the error rate fell from 1.3133% to 0.0344%, as shown in Fig. 6(b). The proposed ResNet50 results in 93.09% testing accuracy with a loss rate of 0.247%, as shown in Table 2.
Figure 6. (a) Evolution of classification accuracy on training set. (b) Training Error Rate
7. Conclusion
This paper examined a deep learning methodology using CNNs to learn discriminative characteristics for plant categorization from leaf images. We demonstrated how skip connections in residual networks help to address the vanishing gradient problem, resulting in 93.09% accuracy on the test set and manifesting that deep learning is a promising technology for large-scale plant classification in the natural environment. In future work, we plan to evaluate different CNN architectures to increase performance. In particular, we plan to extend the deep learning model beyond classification to tasks such as prediction, disease segmentation, and insect detection.
Data Reference: Neeraj Kumar, Peter N. Belhumeur, Arijit Biswas, David W. Jacobs, W. John Kress, Ida C. Lopez, João V. B. Soares, LeafSnap
Dataset, 1.0, 2014, http://leafsnap.com/dataset/
References
[1] A. Anitha and D. P. Acharjya, (2015) “Neural Network and Rough Set Hybrid Scheme for Prediction of Missing Associations.” International Journal of Bioinformatics Research and Applications, Vol. 11 (6), pp. 503-524.
[2] A. Anitha and D. P. Acharjya, (2018) “Crop Suitability Prediction in Vellore District using Rough Set on Fuzzy Approximation Space and
Neural Network.” Neural Computing & Applications, Vol. 30 (12), pp. 3633–3650, DOI: 10.1007/s00521-017-2948-1.
[3] R. Rathi and D. P. Acharjya, (2018) “A Rule Based Classification for Vegetable Production for Tiruvannamalai District using Rough Set and
Genetic Algorithm.” International Journal of Fuzzy System Applications, Vol. 7 (1), pp. 74 – 100.
[4] R. Rathi and D. P. Acharjya, (2018) “A Framework for Prediction using Rough Set and Real Coded Genetic Algorithm.” Arabian Journal for
Science and Engineering, Vol. 43 (8), pp. 4215 – 4227.
[5] T. K. Das and D. P. Acharjya, (2014) “A Decision Making Model using Soft Set and Rough Set on Fuzzy Approximation Spaces.”
International Journal of Intelligent Systems Technologies and Applications, Vol. 13 (3), pp. 170-186.
[6] K. Simonyan and A. Zisserman, (2015) “Very deep convolutional networks for large-scale image recognition.” International Conference on
Learning Representations, arXiv:1409.1556.
[7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Er-han, V. Vanhoucke, and A. Rabinovich, (2015) “Going deeper with
convolutions.” Computer Vision and Pattern Recognition, pp. 1–9, DOI: 10.1109/CVPR.2015.7298594.
[8] X. Glorot and Y. Bengio. (2010) “Understanding the difficulty of training deep feedforward neural networks.” Artificial Intelligence and
Statistics, Vol. 9, pp. 249-256.
[9] Neeraj Kumar, Peter N Belhumeur, Arijit Biswas, David W Jacobs, W John Kress, Ida C Lopez, and João VB Soares, (2012) “Leafsnap: A
computer vision system for automatic plant species identification.” European Conference on Computer Vision, Vol. 7573, pp. 502–516, Springer.
[10] S. H. Lee, C. S. Chan, P. Wilkin, and P. Remagnino, (2015) “Deep- plant: Plant identification with convolutional neural networks.”
International Conference on Image Processing (IEEE), pp. 452–456, DOI: 10.1109/ICIP.2015.7350839.
[11] Jing Hu, Zhibo Chen, Meng Yang, Rongguo Zhang and Yaji Cui, (2018) “A Multi-Scale Fusion Convolutional Neural Network for Plant Leaf Recognition.” IEEE Signal Processing Letters, Vol. 25, pp. 853 – 857.
[12] Mohamed Abbas Hedjazi, Ikram Kourbane, Yakup Genc, (2017) “On Identifying Leaves: A Comparison of CNN with Classical ML Methods.” Signal Processing and Communications Applications Conference (SIU), IEEE, pp. 1-4, DOI: 10.1109/SIU.2017.7960257.
[13] Yu Sun, Yuan Liu, Guan Wang, and Haiyan Zhang, (2017) “Deep Learning for Plant Identification in Natural Environment.” Computational