Plant Disease Detection Based On Lightweight CNN Model
Abstract—The human population has kept increasing over the last decades, which requires a significant increase in agricultural production. However, agricultural production is greatly affected by various plant diseases. Timely and accurate identification of the types of leaf diseases is very important for plant disease control. The convolutional neural network (CNN) is one of the most popular approaches to image identification; it can automatically learn appropriate features from training data. In this paper, we propose a light-weight CNN model based on SqueezeNet. The proposed model is trained and tested using the open-source PlantVillage dataset. Testing results show that the proposed model can achieve an accuracy of 98.46% while its memory requirement is only 0.62 MB. This demonstrates the technical feasibility of light-weight CNNs for classifying plant diseases on embedded systems.

Keywords—plant disease detection, deep learning, transfer learning, light-weight CNN

Project fund: National Natural Science Foundation of China (51375210), Zhenjiang Key R&D Program (GZ2018004), Jiangsu Province Graduate Research and Practice Innovation Project (CXZZ12_0694), Nantong Science and Technology Plan Project (MSZ20155).
Yang Liu is a lecturer at Nantong Vocational University and a part-time Ph.D. candidate at Jiangsu University.

I. INTRODUCTION

The human population has kept increasing over the last decades. In order to support population growth, a large increase in agricultural production is required. However, agricultural production is greatly affected by various plant diseases. Accurate identification of leaf disease types is the prerequisite for crop disease control. However, diagnosing plant diseases through observation of the plant leaves is very complex work. Even experienced farmers are often unable to identify certain plant diseases correctly, leading to wrong conclusions and treatment methods. Thus, the traditional method of identifying crop diseases by visual observation is no longer suitable for modern agriculture.

Modern information technology is an effective way to identify crop diseases. Deep learning [1] refers to the use of artificial neural network architectures that contain a large number of processing layers. Deep learning is end-to-end learning: it predicts the outputs directly from the raw input data, and it automatically extracts features for its targets, which eliminates the need to select features manually.

The Convolutional Neural Network (CNN) [2] is a class of neural networks that specializes in processing two-dimensional image data. A CNN typically has three types of layers: convolutional layers, pooling layers and fully connected layers. The convolutional layer is the core building block of the CNN and carries the main portion of the network's computational load. The main function of the convolutional layer is to extract features from the image.

In order to improve the performance of CNNs, the network topology has been getting deeper and deeper. AlexNet [3] is considered the first deep CNN architecture, showing breakthrough results for image classification and recognition tasks. It is composed of five convolutional and three fully connected layers. VGG [4] is the next step after AlexNet and is composed of 19 layers. GoogleNet [5] is composed of 22 layers and shows a better performance than VGG. ResNet [6], proposed by He et al., is considered a continuation of deep networks. It introduces the concept of residual learning in CNNs. ResNet proposed a 152-layer deep CNN, which was 20 and 8 times deeper than AlexNet and VGG, respectively.

Although the performance of CNNs has been improved, the cost is the computational complexity of the network. The memory requirements and the computational complexity of a CNN are the bottleneck for embedded applications. Light-weight architecture design [7-10] is a promising solution to this problem. A light-weight CNN is a highly efficient CNN that reduces the number of parameters in the network without compromising performance. Light-weight architectures make CNNs applicable to resource-constrained hardware.

In this work we propose a light-weight CNN model based on SqueezeNet [11]. The proposed model was trained to identify 38 classes of plant leaves. The rest of the paper is organized as follows: Section II introduces the dataset and data pre-processing. Section III shows the details of the proposed light-weight CNN model. Section IV demonstrates the experiment results and data analysis. The conclusion is presented in Section V.

II. MATERIALS AND METHODS

A. Data Acquisition

In this paper, all the experiments are based on 'The PlantVillage Dataset' [12] (www.plantvillage.org). It is an open-access repository which contains 54,323 images. The dataset contains 38 types of leaves (including 26 types of diseased leaves and 12 types of healthy leaves) from 14 crops. However, the number of images of each type differs, ranging from 275 to 5,357. Normally, machine learning algorithms assume that the number of objects in each class is roughly similar; otherwise the algorithm will be biased towards the majority group [13].

To combat this data imbalance issue, two solutions are proposed. The first one is under-sampling [14]. The purpose of under-sampling is to reduce the number of objects in the majority groups such that the number of objects in each class is roughly similar. The disadvantage of this method is that some information will be lost. The second method is over-sampling. The purpose of over-sampling is to increase the number of objects in the smaller groups so that the number of objects in each group is balanced. An easy way to over-sample is to supplement the training data with multiple copies of some of the minority classes. The disadvantage of this method is that, because few new features can be learned from duplicated copies, it may lead to overfitting.
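As an illustration of the over-sampling idea, below is a minimal PyTorch sketch. It is a sketch only, not the paper's published code, and it assumes the PlantVillage images are arranged one folder per class; `WeightedRandomSampler` draws minority-class images more often so each batch is roughly balanced.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

# Illustrative sketch only: the paper does not publish its sampling code.
# Assumes the PlantVillage images are arranged one folder per class.
dataset = datasets.ImageFolder("plantvillage/train",
                               transform=transforms.ToTensor())

# Count images per class, then weight each sample by the inverse of its
# class frequency so minority classes are drawn more often.
targets = torch.tensor(dataset.targets)
class_counts = torch.bincount(targets)
sample_weights = (1.0 / class_counts.float())[targets]

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(dataset),
                                replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```

With `replacement=True`, minority-class images are drawn several times per epoch, which is equivalent to supplementing the training data with duplicate copies.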
[Fig. 1 Image augmentation. No. 1–No. 3 are original apple black rot leaves; No. 4 is the horizontal flip of No. 1; No. 5 is the vertical flip of No. 2 with increased brightness; No. 6 is the resized No. 3 with added noise.]
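The augmentations illustrated in Fig. 1 can be reproduced with standard torchvision transforms; a minimal sketch follows (the exact parameters are not given in the paper, so the values below are assumptions):

```python
import torch
from torchvision import transforms

# Sketch of augmentations matching Fig. 1 (flips, brightness, resize,
# noise); parameter values are illustrative, not the paper's.
augment = transforms.Compose([
    transforms.Resize((224, 224)),            # SqueezeNet's input size
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3),   # random brightness change
    transforms.ToTensor(),
    # add mild Gaussian pixel noise, then clamp back to [0, 1]
    transforms.Lambda(lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0, 1)),
])
```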
65
Authorized licensed use limited to: KLE Technological University. Downloaded on November 19,2024 at 05:14:03 UTC from IEEE Xplore. Restrictions apply.
The third strategy is to reduce the size of the feature map. This can be done by adjusting the location of the pooling layer. For example, if max pooling layer B is moved in front of fire module 2 (as shown in Fig. 4), the computational load of this fire module becomes 4 times lower (since the feature map after the pooling layer is 4 times smaller). The purpose of this strategy is to reduce the computational load while attempting to preserve the performance.

Mod-3 is the modification of mod-2 based on the third strategy: max pooling layer B is moved in front of fire module 2. Fig. 4 shows the detailed structure of mod-2 and mod-3.
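Since the structural figures are not reproduced here, below is a minimal PyTorch sketch of a SqueezeNet-style fire module [11] with a max pooling layer placed in front of it, illustrating the third strategy. The channel widths are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet-style fire module: a 1x1 squeeze layer followed by
    parallel 1x1 and 3x3 expand layers whose outputs are concatenated."""
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3,
                                   padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# Pooling placed *before* the fire module: the module now sees a feature
# map that is 4x smaller, so its computational load is roughly 4x lower.
block = nn.Sequential(
    nn.MaxPool2d(kernel_size=3, stride=2),   # "max pooling layer B"
    Fire(96, 16, 64, 64),                    # fire module 2 (illustrative)
)
```

In such a module, the mod-2 idea of replacing part of the 3x3 expand filters with 1x1 filters amounts to shifting channels from `expand3x3_ch` to `expand1x1_ch`.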
IV. EXPERIMENT RESULTS AND DATA ANALYSIS

All the light-weight CNN models proposed in this paper are built and trained using PyTorch (https://pytorch.org/). The dropout [18] rate is set to 0.5 to combat overfitting. The initial learning rate is 0.01, and the learning rate is decayed by a factor of 0.1 every 7 epochs. The number of training epochs is 30 and the batch size is 32. The computational complexity of the proposed models is estimated using flops-counter [19].
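A minimal sketch of this training configuration follows. The optimizer type is not stated in the paper, so SGD with momentum is an assumption; `loader` is a DataLoader such as the one sketched in Section II, and the torchvision SqueezeNet is a stand-in for the proposed model (its classifier already uses a dropout rate of 0.5).

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torchvision import models

# Hyper-parameters as stated in the text; the optimizer is an assumption.
model = models.squeezenet1_1(num_classes=38)  # stand-in for the proposed model
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size=7, gamma=0.1)  # x0.1 every 7 epochs

for epoch in range(30):                # 30 training epochs
    for images, labels in loader:      # batch size 32 (see sampler sketch)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```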
A. Performance evaluation

The performance of the proposed light-weight CNN models was evaluated using accuracy, precision, recall and F1 score [20]. A confusion matrix is a table that is used to describe the performance of a classification model. The terminology used in a confusion matrix is as follows. TP (true positive): the prediction is positive and the real value is also positive; TN (true negative): the prediction is negative and the real value is also negative; FP (false positive): the prediction is positive but the real value is negative; FN (false negative): the prediction is negative but the real value is positive.

Accuracy is defined as [20]:

$\mathrm{Acc} = \dfrac{TP + TN}{TP + TN + FP + FN}$
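For concreteness, a plain-Python sketch of these metrics computed from the four confusion-matrix counts; precision, recall and F1 follow their standard definitions [20], which this excerpt does not spell out.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy per the equation above; precision, recall and F1
    per their standard definitions [20]."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, precision, recall, f1
```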
B. Results and data analysis

Table I shows the performance of the four proposed light-weight CNN models. The model size, computational load, accuracy, recall, precision and F1 score are compared. It can be observed that mod-0 has the best performance (since it is based on transfer learning of the classical SqueezeNet, which has a lot of redundancy). However, its model size and computational load are also the largest.
Table I Modified model parameters and performance

Model  | Model size /MB | Compute load /MFLOPs | Accuracy /% | Recall /% | Precision /% | F1 /%
mod-0  | 2.91           | 272                  | 99.29       | 98.73     | 98.78        | 98.75
mod-1  | 0.908          | 185                  | 98.76       | 98.22     | 98.04        | 98.13
mod-2  | 0.62           | 131                  | 98.46       | 98.11     | 97.96        | 98.03
mod-3  | 0.62           | 111                  | 98.13       | 98.09     | 97.62        | 97.85
Mod-1 is the modification of mod-0. It removes fire modules 6, 7 and 8, so the total number of fire modules is reduced from 8 to 5. Therefore, the model size is reduced from 2.91 MB to 0.908 MB, which is only 31.2% of the previous one. The computational load is reduced from 272 MFLOPs (million floating-point operations) to 185 MFLOPs, which is only 68% of the previous one. However, its performance is almost the same as mod-0: its accuracy reaches 98.76%, which is only 0.53% lower than mod-0, and its recall rate, precision and F1 score are 98.22%, 98.04% and 98.13%, respectively. Compared with mod-0, the performance drops are only 0.51%, 0.74% and 0.62%, respectively. This shows that the PlantVillage task (54,323 images in 38 categories) is a relatively simple task compared with the ImageNet task (1 million images in 1,000 categories), and therefore there is a lot of redundancy in the CNN model.
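As a rough illustration of how such a truncation can be done, a sketch is given below. The paper does not publish its code, and its SqueezeNet variant and module indices may differ from torchvision's `squeezenet1_1`, so the slice below is an assumption rather than the paper's exact surgery.

```python
import torch.nn as nn
from torchvision import models

# Start from a pretrained SqueezeNet and drop the last fire modules
# (illustrative: indices below apply to torchvision's squeezenet1_1).
model = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.DEFAULT)

# keep the stem and the first five fire modules, drop the rest
model.features = nn.Sequential(*list(model.features.children())[:10])

# re-create the classifier for 38 PlantVillage classes; 384 matches the
# out-channels of the last kept fire module in this particular variant
model.classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Conv2d(384, 38, kernel_size=1),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d((1, 1)),
)
model.num_classes = 38
```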
[Fig. 5 Testing accuracy vs. training epoch]

Mod-2 is the modification of mod-1. The idea is to replace part of the 3x3 filters in the expand layers with 1x1 filters. The model size is reduced from 0.908 MB to 0.62 MB, which is only 68% of the previous one. The computational load is reduced from 185 MFLOPs to 131 MFLOPs, which is only 71% of the previous one. Its accuracy and F1 score are 98.46% and 98.03%, respectively. Compared with mod-1, the performance drops are 0.3% and 0.1%, respectively. It can be observed that the benefits of mod-2 in terms of model size and computational load are much larger (about 30%), while the cost in performance is much smaller (about 0.1%).
Mod-3 is the modification of mod-2. The difference is that max pooling layer B is moved in front of fire module 2 (as shown in Fig. 4). The computational load is reduced from 131 MFLOPs to 111 MFLOPs, which is 84.7% of the previous one. The accuracy and F1 score are 98.13% and 97.85%, respectively. Compared with mod-2, the performance drops are 0.33% and 0.18%, respectively. This shows that the computational load can be reduced by adjusting the location of the pooling layer; however, there is a performance drop if a smaller feature map size is used.

[Fig. 6 Testing loss vs. training epoch]

Fig. 5 shows the relationship between the testing accuracy and the training epoch. It can be observed that, due to the use of transfer learning, all four models converge very fast. Mod-0 converges fastest among the four models; its accuracy reaches 93.7% after the first epoch. After 3 epochs, the accuracy of the other three models also reaches 93% or higher. All the models converge after 7 epochs, since the learning rate decays by a factor of 0.1 after 7 epochs. Fig. 6 shows the relationship between the testing loss and the training epoch. A similar trend can be found in this curve, and the testing loss also converges after 7 epochs.

Fig. 7 shows the confusion matrix of four different types of apple leaves (scab, black rot, rust and healthy). Each row of the confusion matrix represents the true label, and each column represents the predicted label. The diagonal elements represent the number of samples for which the predicted label equals the true label, while the off-diagonal elements are those that are mislabeled by the classifier.
The higher the diagonal values of the confusion matrix, the better, indicating many correct predictions [20].

It can be observed that there are 378 apple scab leaves, of which 370 are correctly identified. There are 373 apple black rot leaves, one of which is misidentified as apple scab and another as apple rust, while the remaining 371 are correctly identified. There are 385 apple rust leaves, of which 381 are correctly identified. There are 329 healthy apple leaves, of which 327 are correctly identified. The recall rates of these four types of leaves are 97.88%, 99.46%, 98.96% and 99.39%, respectively. Fig. 8 shows sample images of the apple leaves shown in the confusion matrix.
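These per-class recall rates can be recomputed directly from the confusion-matrix rows quoted above (diagonal entry divided by row sum); a short check:

```python
import torch

# Per-class recall from the Fig. 7 confusion-matrix rows (rows are true
# labels). Classes: apple scab, black rot, rust, healthy.
correct = torch.tensor([370.0, 371.0, 381.0, 327.0])  # diagonal entries
total = torch.tensor([378.0, 373.0, 385.0, 329.0])    # images per true class
recall = 100 * correct / total
print(recall.round(decimals=2))  # tensor([97.88, 99.46, 98.96, 99.39])
```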
Recognition (CVPR). IEEE, 2015.
[6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Conference on Computer Vision and Pattern
Recognition (CVPR). IEEE, 2016, pp. 770–778 .
[7] Chollet, F.: “Xception: Deep learning with depthwise separable
convolutions”. arXiv preprint (2016).
[8] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W.,
Weyand, T., Andreetto, M., Adam, H.: “Mobilenets: Efficient
convolutional neural networks for mobile vision applications”. arXiv
preprint arXiv:1704.04861 (2017).
[9] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.:
“Inverted residuals and linear bottlenecks: Mobile networks for
classification, detection and segmentation”. arXiv preprint
arXiv:1801.04381 (2018).
[10] Zhang, X., Zhou, X., Lin, M., Sun, J.: “Shufflenet: An extremely
efficient convolutional neural network for mobile devices”. arXiv
preprint arXiv:1707.01083 (2017).
[11] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and
K. Keutzer, “Squeezenet: Alexnet-level accuracy with 50x fewer
parameters and< 0.5 mb model size,” arXiv preprint arXiv:1602.07360,
2016.
No.1: apple-scab leaf, No.2 apple black rotten leaf; No.3: [12] S. Mohanty, "spMohanty/PlantVillage-Dataset", GitHub. Available:
https://github.com/spMohanty/PlantVillageDataset
apple rust leaf, No.4: apple healthy leaf
[13] Krawczyk, B. “Learning from imbalanced data: open challenges and
Fig. 8 Sample images of apple leaves in confusion matrix future directions,” ProgArtifIntell 5, 221 㸫 232 (2016).
https://doi.org/10.1007/s13748-016-0094-0.
[14] García, S., Luengo, J., Herrera, F.: Data Preprocessing in Data Mining,
In: Intelligent Systems Reference Library, vol. 72. Springer, Berlin
V. CONCLUSION

In this work, a light-weight CNN model was proposed. The proposed model is a modification of SqueezeNet. The training and testing of the model were performed using the PlantVillage dataset, which comprises 14 plant species in 38 distinct classes (including healthy and diseased leaves). The proposed model achieved an accuracy of 98.46% and an F1 score of 98.03% in the identification of 14,665 plant leaf images (the testing dataset). The model size and computational complexity of the proposed model are 0.62 MB and 131 MFLOPs, respectively. Compared with other works, the proposed light-weight CNN model better balances performance and computational complexity. Thus, the proposed model is suitable for applications on embedded systems.

REFERENCES
[1] I. Goodfellow, Y. Bengio and A. Courville, "Deep Learning". MIT Press, 2016.
[2] R. Venkatesan and B. Li, "Convolutional Neural Networks in Visual Computing: A Concise Guide". CRC Press, 2017. ISBN 978-1-351-65032-8.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS), 2012, pp. 1097–1105.
[4] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[5] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015.
[6] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016, pp. 770–778.
[7] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," arXiv preprint, 2016.
[8] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "Mobilenets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[9] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation," arXiv preprint arXiv:1801.04381, 2018.
[10] X. Zhang, X. Zhou, M. Lin, and J. Sun, "Shufflenet: An extremely efficient convolutional neural network for mobile devices," arXiv preprint arXiv:1707.01083, 2017.
[11] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size," arXiv preprint arXiv:1602.07360, 2016.
[12] S. Mohanty, "spMohanty/PlantVillage-Dataset," GitHub. Available: https://github.com/spMohanty/PlantVillageDataset
[13] B. Krawczyk, "Learning from imbalanced data: open challenges and future directions," Progress in Artificial Intelligence, vol. 5, pp. 221–232, 2016. https://doi.org/10.1007/s13748-016-0094-0
[14] S. García, J. Luengo, and F. Herrera, Data Preprocessing in Data Mining. Intelligent Systems Reference Library, vol. 72. Springer, Berlin, 2015.
[15] C. Shorten and T. M. Khoshgoftaar, "A survey on Image Data Augmentation for Deep Learning," Journal of Big Data, vol. 6, article 60, 2019. doi:10.1186/s40537-019-0197-0
[16] A. Jung, "aleju/imgaug," GitHub. Available: https://github.com/aleju/imgaug
[17] E. S. Olivas, J. D. M. Guerrero, M. M. Sober, J. R. M. Benedito, and A. J. S. Lopez, Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques (2 volumes), 2009.
[18] N. Srivastava, G. Hinton, A. Krizhevsky, et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research, vol. 15, pp. 1929–1958, 2014.
[19] V. Sovrasov, "sovrasov/flops-counter.pytorch," GitHub. Available: https://github.com/sovrasov/flops-counter.pytorch
[20] D. M. W. Powers, "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.