
2021 4th International Conference on Information and Computer Technologies (ICICT)
DOI: 10.1109/ICICT52872.2021.00018

Plant Disease Detection Based on Lightweight CNN Model

Yang Liu, Guoqin Gao, Zhenhui Zhang
School of Electrical and Information Engineering, Jiangsu University
Zhenjiang, China
[email protected], [email protected], [email protected]

Abstract—The human population has kept increasing over the last decades, which requires a significant increase in agricultural production. However, agricultural production is greatly affected by various plant diseases. Timely and accurate identification of leaf disease types is very important for plant disease control. The convolutional neural network (CNN) is one of the most popular methods for image identification; it can automatically learn appropriate features from training data. In this paper, we propose a light-weight CNN model based on SqueezeNet. The proposed model is trained and tested using the open-source PlantVillage dataset. Testing results show that the proposed model can achieve an accuracy of 98.46% while the memory requirement of the model is only 0.62 MB. This demonstrates the technical feasibility of light-weight CNNs for classifying plant diseases on embedded systems.

Keywords—plant disease detection, deep learning, transfer learning, light-weight CNN

Project fund: National Natural Science Foundation of China (51375210), Zhenjiang Key R&D Program (GZ2018004), Jiangsu Province Graduate Research and Practice Innovation Project (CXZZ12_0694), Nantong Science and Technology Plan Project (MSZ20155). Yang Liu is a lecturer at Nantong Vocational University and a part-time Ph.D. candidate at Jiangsu University.
I. INTRODUCTION

The human population has kept increasing over the last decades. In order to support population growth, a large increase in agricultural production is required. However, agricultural production is greatly affected by various plant diseases. Accurate identification of leaf disease types is the prerequisite for crop disease control. However, diagnosing plant diseases by observing the leaves is very complex work. Even experienced farmers are often unable to correctly identify certain plant diseases, leading to wrong conclusions and treatment methods. Thus, the traditional method of identifying crop diseases by visual observation is no longer suitable for modern agriculture.

Modern information technology is an effective way to identify crop diseases. Deep learning [1] refers to the use of artificial neural network architectures that contain a large number of processing layers. Deep learning is end-to-end learning: it can predict outputs directly from raw input data, and it automatically extracts features for its targets, which eliminates the need to select features manually.

The convolutional neural network (CNN) [2] is a class of neural networks that specializes in processing two-dimensional image data. A CNN typically has three types of layers: convolutional layers, pooling layers and fully connected layers. The convolutional layer is the core building block of the CNN and carries the main portion of the network's computational load. Its main function is to extract features from the image.

In order to improve the performance of CNNs, network topologies have become deeper and deeper. AlexNet [3] is considered the first deep CNN architecture, showing breakthrough results for image classification and recognition tasks. It is composed of five convolutional and three fully connected layers. VGG [4] is the next step after AlexNet and is composed of 19 layers. GoogLeNet [5] is composed of 22 layers and shows better performance than VGG. ResNet [6], proposed by He et al., is considered a continuation of deep networks; it introduces the concept of residual learning in CNNs. ResNet proposed a 152-layer deep CNN, which was 20 and 8 times deeper than AlexNet and VGG, respectively.

Although the performance of CNNs has been improved, the cost is the computational complexity of the network. The memory requirements and computational complexity of CNNs are the bottleneck for embedded applications. Light-weight architecture design [7-10] is a promising solution to this problem. A light-weight CNN is a highly efficient CNN that reduces the number of parameters in the network without compromising performance. Light-weight architectures make CNNs applicable to resource-constrained hardware.

In this work we propose a light-weight CNN model based on SqueezeNet [11]. The proposed model was trained to detect 38 classes of plant leaves. The rest of the paper is organized as follows: Section II introduces the dataset and data pre-processing. Section III presents the details of the proposed light-weight CNN model. Section IV demonstrates the experimental results and data analysis. The conclusion is presented in Section V.

II. MATERIALS AND METHODS

A. Data Acquisition

In this paper, all the experiments are based on 'The PlantVillage Dataset' [12] (www.plantvillage.org). It is an open-access repository which contains 54,323 images. The dataset contains 38 types of leaves (26 types of diseased and 12 types of healthy leaves) from 14 crops. However, the number of images per type differs, ranging from 275 to 5,357. Normally, machine learning algorithms assume that the number of objects in each class is roughly similar; otherwise the algorithm will be biased towards the majority group [13].

To combat this data imbalance issue, two solutions are commonly used. The first is under-sampling [14], whose purpose is to reduce the number of objects in the majority group so that the number of objects in each class is roughly similar. The disadvantage of this method is that some information is lost. The second method is over-sampling, whose purpose is to increase the number of objects in the smaller groups so that the number of objects in each group is balanced. An easy way to over-sample is to supplement the training data with multiple copies of some of the minority classes. The disadvantage of this method is that, because there are few features that can be

learned from the minority classes, the algorithm is prone to overfitting [13]. Data augmentation is an efficient way to solve this problem. Data augmentation is a technique used to increase the amount of data by adding slightly modified copies of already existing data, or newly created synthetic data derived from existing data [15]. It acts as a regularizer and helps reduce overfitting.

B. Data Pre-Processing

In this paper, the data is pre-processed using both of the above-mentioned methods to solve the data imbalance issue. Image augmentation is carried out using the open-source Python library developed by Jung [16]. The augmentation techniques involved are: perspective transformations, contrast changes, added Gaussian noise, dropout of regions, hue/saturation changes, and cropping/padding. After data pre-processing, the number of leaf images increases to 73,327. The entire database was divided into a training set and a testing set by randomly splitting the 73,327 images, so that 80% of them formed the training set and 20% formed the testing set. Fig. 1 shows examples of image augmentation.

Fig. 1 Image augmentation. No. 1-No. 3 are original apple black rot leaves; No. 4 is the horizontal flip of No. 1; No. 5 is the vertical flip of No. 2 with increased brightness; No. 6 is the resize of No. 3 with added noise.
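A pipeline of this kind can be sketched with the imgaug library [16]. The augmenter choices below follow the techniques listed above (plus the flips shown in Fig. 1), but the magnitudes and probabilities are illustrative assumptions; the paper does not report its exact settings.

```python
import imgaug.augmenters as iaa
import numpy as np

# A sketch of the augmentation pipeline described above, built with imgaug [16].
# Parameter values are illustrative assumptions, not the paper's exact settings.
seq = iaa.Sequential([
    iaa.Fliplr(0.5),                                   # horizontal flip
    iaa.Flipud(0.5),                                   # vertical flip
    iaa.PerspectiveTransform(scale=(0.01, 0.10)),      # perspective transformations
    iaa.LinearContrast((0.75, 1.5)),                   # contrast changes
    iaa.AdditiveGaussianNoise(scale=(0, 0.05 * 255)),  # added Gaussian noise
    iaa.CoarseDropout(0.02, size_percent=0.15),        # dropout of regions
    iaa.AddToHueAndSaturation((-20, 20)),              # hue/saturation changes
    iaa.CropAndPad(percent=(-0.1, 0.1)),               # cropping/padding
], random_order=True)

# Example: augment a batch of leaf images of shape (N, H, W, 3) in uint8.
images = np.zeros((8, 224, 224, 3), dtype=np.uint8)   # placeholder batch
images_aug = seq(images=images)
```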

III. CROP LEAVES IDENTIFICATION MODEL

A. Classic SqueezeNet Model

SqueezeNet is a light-weight architecture proposed by Iandola et al. [11]. Fig. 2 shows the structure of the classic SqueezeNet. There are three max pooling layers, marked as A, B and C, respectively. There are two fire modules between max pooling layers A and B, two more between B and C, and another four fire modules after max pooling layer C.

Fig. 2 Structure of the classic SqueezeNet

The fire module is the key module in SqueezeNet. Fig. 3 shows the structure of a fire module. A fire module is composed of two layers: a squeeze layer and an expand layer. A fire module can be defined as fire(M, N, E1, E2), where M is the number of input channels, N is the number of output channels of the squeeze layer, E1 is the number of 1x1 filters in the expand layer, and E2 is the number of 3x3 filters in the expand layer. H and W stand for the height and width of the feature map. The squeeze layer contains only 1x1 filters; it is used to reduce the number of input channels from M to N (where M > N). The expand layer is composed of two branches: the left branch contains only 1x1 filters and the right branch contains only 3x3 filters. The numbers of output channels of the two branches are E1 and E2, respectively. The output of the expand layer is the concatenation of the two branches, so the number of output channels of the fire module is (E1+E2).

Fig. 3 Structure of the fire module: fire(M, N, E1, E2)
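For concreteness, a fire(M, N, E1, E2) block can be written in PyTorch roughly as follows. This is a sketch modeled on the public SqueezeNet implementation, not the authors' exact code.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Sketch of fire(M, N, E1, E2): squeeze to N channels with 1x1
    convolutions, then expand with E1 1x1 filters and E2 3x3 filters,
    and concatenate the two branches (output channels = E1 + E2)."""
    def __init__(self, M: int, N: int, E1: int, E2: int):
        super().__init__()
        self.squeeze = nn.Conv2d(M, N, kernel_size=1)                 # M -> N (M > N)
        self.expand1x1 = nn.Conv2d(N, E1, kernel_size=1)              # left branch
        self.expand3x3 = nn.Conv2d(N, E2, kernel_size=3, padding=1)   # right branch
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# e.g. fire module 1 of the modified model described below: fire(64, 16, 96, 32)
fire1 = Fire(64, 16, 96, 32)
out = fire1(torch.randn(1, 64, 56, 56))   # -> shape (1, 128, 56, 56)
```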
B. Design of the Modified SqueezeNet Model

Three strategies are used to design the modified SqueezeNet. The first strategy is to reduce the depth of the network. It is obvious that a relatively simple task does not need a very deep neural network, so the idea behind this strategy is to use a suitably sized network instead of a very complicated one. Our experiments show that for a 38-class image classification task, the performance drop of the modified SqueezeNet is only around 0.5% if the last three fire modules are removed. The main purpose of this strategy is to reduce the size of the model at the cost of a slight performance drop.

The second strategy is to replace part of the 3x3 filters in the expand layer with 1x1 filters. The main idea behind this strategy is that a 1x1 filter has 9 times fewer parameters than a 3x3 filter. The purpose of this strategy is to decrease the number of parameters while attempting to preserve accuracy.

The third strategy is to reduce the size of the feature map. This can be done by adjusting the location of a pooling layer. For example, if max pooling layer B is moved in front of fire module 2 (as shown in Fig. 4), the computational load of this fire module is 4 times smaller (since the feature map after the pooling layer is 4 times smaller). The purpose of this strategy is to reduce the computational load while attempting to preserve performance.

Fig. 4 Structure of the proposed CNN models: a. Mod-2, b. Mod-3

Based on the above-mentioned strategies, four different modified SqueezeNet models are proposed to detect plant diseases. Mod-0 is based on transfer learning [17] of the classic SqueezeNet. Transfer learning improves learning in a new task through the transfer of knowledge from a related task that has already been learned. It is a popular approach in deep learning, where pre-trained models are used as the starting point for a new task. The only modification in Mod-0 is to change the number of output channels of conv10 from 1000 to 38 (since we only need to detect 38 types of leaves).
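A minimal sketch of the Mod-0 construction is shown below: an ImageNet-pretrained SqueezeNet is loaded from torchvision and its conv10 classifier is replaced to output 38 channels. The SqueezeNet variant (1.1 here) is an assumption, as the paper does not state which one it starts from.

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 38  # 26 disease classes + 12 healthy classes

# Load an ImageNet-pretrained SqueezeNet (version 1.1 assumed here).
model = models.squeezenet1_1(pretrained=True)

# Mod-0: replace conv10 so the classifier outputs 38 channels instead of 1000.
model.classifier[1] = nn.Conv2d(512, NUM_CLASSES, kernel_size=1)
model.num_classes = NUM_CLASSES
```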
Mod-1 is the modification of Mod-0 based on the first strategy. The modifications are as follows: remove fire modules 6, 7 and 8, and change the parameters of fire module 5 to fire(256, 32, 256, 256), i.e., reduce the number of output channels of the squeeze layer to 32 and increase the number of output channels of the expand layer to 512.

Mod-2 is the modification of Mod-1 based on the second strategy. The modification is as follows: reduce the number of 3x3 filters in the expand layer by half and increase the number of 1x1 filters by the same amount. For example, before modification, fire module 1 can be expressed as fire(64, 16, 64, 64), which contains 11,264 parameters. After modification, fire module 1 can be expressed as fire(64, 16, 96, 32), which contains 7,168 parameters. The number of parameters after modification is thus 36% lower.

Mod-3 is the modification of Mod-2 based on the third strategy. The modification is to move max pooling layer B in front of fire module 2. Fig. 4 shows the detailed structures of Mod-2 and Mod-3.

IV. EXPERIMENTAL RESULTS AND DATA ANALYSIS

All the light-weight CNN models proposed in this paper are built and trained using PyTorch (https://pytorch.org/). The dropout [18] rate is set to 0.5 to combat overfitting. The initial learning rate is 0.01, and the learning rate is decayed by a factor of 0.1 every 7 epochs. The number of training epochs is 30 and the batch size is 32. The computational complexity of the proposed models is estimated using flops-counter [19].
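The reported training configuration can be sketched as follows. The learning-rate schedule, dropout rate, epoch count and batch size come from the paper; the optimizer (plain SGD) and the data loader are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

EPOCHS, BATCH_SIZE = 30, 32

# Stand-in model; the torchvision classifier already uses dropout p=0.5.
model = models.squeezenet1_1(num_classes=38)

# Optimizer choice is an assumption; the schedule matches the paper.
optimizer = SGD(model.parameters(), lr=0.01)            # initial learning rate 0.01
scheduler = StepLR(optimizer, step_size=7, gamma=0.1)   # decay by 0.1 every 7 epochs
criterion = nn.CrossEntropyLoss()

# Placeholder data; in practice this is the augmented PlantVillage training set.
dummy = TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 38, (64,)))
train_loader = DataLoader(dummy, batch_size=BATCH_SIZE, shuffle=True)

for epoch in range(EPOCHS):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```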

A. Performance Evaluation

The performance of the proposed light-weight CNN models was evaluated using accuracy, precision, recall and F1 score [20]. A confusion matrix is a table that is used to describe the performance of a classification model. The following terminology is used in a confusion matrix. TP (true positive): the prediction is positive and the real value is also positive; TN (true negative): the prediction is negative and the real value is also negative; FP (false positive): the prediction is positive but the real value is negative; FN (false negative): the prediction is negative but the real value is positive.

Accuracy is defined as [20]:

Acc = (TP + TN) / (TP + TN + FP + FN)    (1)

Accuracy is the most intuitive performance measure. It is the ratio of correctly predicted observations to the total number of observations. Accuracy is a good measure only when the dataset is balanced, i.e., when the numbers of false positives and false negatives are almost the same. Therefore, we have to look at other parameters to evaluate the performance of the classifier [20]. Precision is defined as:

P = TP / (TP + FP)    (2)

Precision is the ratio of correctly predicted positive observations to the total number of predicted positive observations. The higher the precision, the better the classifier. Recall is defined as:

R = TP / (TP + FN)    (3)

Recall is the ratio of correctly predicted positive observations to the total number of actual positive observations. The higher the recall, the better the classifier. The F1 score is defined as:

F1 = 2 * P * R / (P + R)    (4)

The F1 score is the harmonic mean of precision and recall. It takes both false positives and false negatives into account. The F1 score is usually more informative than accuracy, especially when the dataset is imbalanced [20].
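These measures follow directly from the confusion-matrix counts; a small sketch implementing Eqs. (1)-(4) is given below. For the 38-class problem the counts would be accumulated per class in a one-vs-rest fashion and averaged; the averaging scheme is not specified in the paper.

```python
import numpy as np

def metrics(y_true: np.ndarray, y_pred: np.ndarray):
    """Compute Eqs. (1)-(4) for a binary (one-vs-rest) labelling,
    where 1 marks the positive class and 0 the negative class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    acc = (tp + tn) / (tp + tn + fp + fn)   # Eq. (1)
    p = tp / (tp + fp)                      # Eq. (2)
    r = tp / (tp + fn)                      # Eq. (3)
    f1 = 2 * p * r / (p + r)                # Eq. (4)
    return acc, p, r, f1
```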

B. Results and Data Analysis

Table I shows the performance of the four proposed light-weight CNN models. The model size, computational load, accuracy, recall, precision and F1 score are compared. It can be observed that Mod-0 has the best performance (since it is based on transfer learning of the classic SqueezeNet, which has a lot of redundancy). However, its model size and computational load are also the largest.

Table I Modified model parameters and performance

Model   Model size /MB   Compute load /MFLOPs   Accuracy /%   Recall /%   Precision /%   F1 /%
mod-0   2.91             272                    99.29         98.73       98.78          98.75
mod-1   0.908            185                    98.76         98.22       98.04          98.13
mod-2   0.62             131                    98.46         98.11       97.96          98.03
mod-3   0.62             111                    98.13         98.09       97.62          97.85
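The compute-load column was estimated with flops-counter [19], distributed as the ptflops package. A minimal sketch of such a measurement is shown below; the 224x224 input resolution is an assumption, as the paper does not state it.

```python
from torchvision import models
from ptflops import get_model_complexity_info

# Estimate multiply-accumulate operations and parameter count [19].
# The model and the 224x224 input resolution are illustrative assumptions.
model = models.squeezenet1_1(num_classes=38)
macs, params = get_model_complexity_info(model, (3, 224, 224),
                                         as_strings=True,
                                         print_per_layer_stat=False)
print(f"compute load: {macs}, parameters: {params}")
```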
Mod-1 is the modification of Mod-0. It removes fire modules 6, 7 and 8, so the total number of fire modules is reduced from 8 to 5. As a result, the model size is reduced from 2.91 MB to 0.908 MB, which is only 31.2% of the previous size. The computational load is reduced from 272 MFLOPs (million floating-point operations) to 185 MFLOPs, which is only 68% of the previous load. However, its performance is almost the same as Mod-0. Its accuracy reaches 98.76%, which is only 0.53% lower than Mod-0. Its recall, precision and F1 score are 98.22%, 98.04% and 98.13%, respectively. Compared with Mod-0, the performance drops are only 0.51%, 0.74% and 0.62%, respectively. This shows that the PlantVillage task (54,323 images in 38 categories) is relatively simple compared with the ImageNet task (1 million images in 1000 categories), and that there is therefore a lot of redundancy in the CNN model.

Mod-2 is the modification of Mod-1. The idea is to replace part of the 3x3 filters in the expand layer with 1x1 filters. The model size is reduced from 0.908 MB to 0.62 MB, which is only 66% of the previous size. The computational load is reduced from 185 MFLOPs to 131 MFLOPs, which is only 71% of the previous load. Its accuracy and F1 score are 98.46% and 98.03%, respectively. Compared with Mod-1, the performance drops are 0.3% and 0.1%, respectively. It can be observed that the benefits of Mod-2 in terms of model size and computational load are much larger (about 30%) than its cost in performance (about 0.1%).

Mod-3 is the modification of Mod-2. The difference is that max pooling layer B is moved in front of fire module 2 (as shown in Fig. 4). The computational load is reduced from 131 MFLOPs to 111 MFLOPs, which is 84.7% of the previous load. The accuracy and F1 score are 98.13% and 97.85%, respectively. Compared with Mod-2, the performance drops are 0.33% and 0.18%, respectively. This shows that the computational load can be reduced by adjusting the location of the pooling layer, though there is a performance drop when a smaller feature map size is used.
Fig. 5 Testing accuracy vs. training epoch

Fig. 6 Testing loss vs. training epoch

Fig. 5 shows the relationship between the testing accuracy and the training epoch. It can be observed that, due to the use of transfer learning, all four models converge very fast. Mod-0 converges fastest among the four models; its accuracy reaches 93.7% after the first epoch. After 3 epochs, the accuracy of the other three models also reaches 93% or higher. All the models converge after 7 epochs, since the learning rate decays by a factor of 0.1 every 7 epochs. Fig. 6 shows the relationship between the testing loss and the training epoch. A similar trend can be found in this curve: the testing loss also converges after 7 epochs.

Fig. 7 Confusion matrix of four different apple leaves

Fig. 7 shows the confusion matrix of four types of apple leaves (scab, black rot, rust and healthy). Each row of the confusion matrix represents the true label, and each column represents the predicted label. The diagonal elements represent the number of samples for which the predicted label equals the true label, while off-

diagonal elements are those that are mislabeled by the classifier. The higher the diagonal values of the confusion matrix, the better, as they indicate many correct predictions [20].

It can be observed that there are 378 apple scab leaves, of which 370 are correctly identified. There are 373 apple black rot leaves, one of which is identified as apple scab and another as apple rust, while the remaining 371 are correctly identified. There are 385 apple rust leaves, of which 381 are correctly identified. There are 329 healthy apple leaves, of which 327 are correctly identified. The recall rates of these 4 types of leaves are therefore 97.88%, 99.46%, 98.96% and 99.39%, respectively.
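As a worked check, per-class recall is the number of correctly identified leaves divided by the total number of leaves of that class; the sketch below reproduces the four reported recall rates from the counts above.

```python
import numpy as np

# Per-class recall = diagonal entry / row sum of the confusion matrix.
# Only the correct/total counts reported in the text are used here.
correct = np.array([370, 371, 381, 327])   # scab, black rot, rust, healthy
total   = np.array([378, 373, 385, 329])
recall = correct / total
print(np.round(100 * recall, 2))           # [97.88 99.46 98.96 99.39]
```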
Fig. 8 shows sample images of the apple leaves in the confusion matrix.

Fig. 8 Sample images of apple leaves in the confusion matrix. No. 1: apple scab leaf; No. 2: apple black rot leaf; No. 3: apple rust leaf; No. 4: healthy apple leaf.

V. CONCLUSION

In this work, a light-weight CNN model was proposed. The proposed model is a modification of SqueezeNet. The training and testing of this model were performed using the PlantVillage dataset, which comprises 14 plant species in 38 distinct classes (including healthy and diseased leaves). The proposed model achieved an accuracy of 98.46% and an F1 score of 98.03% in the identification of 14,665 plant leaf images (the testing dataset). The model size and computational complexity of the proposed model are 0.62 MB and 131 MFLOPs, respectively. Compared with other works, the proposed light-weight CNN model better balances performance and computational complexity. Thus, the proposed model is suitable for applications using embedded systems.
REFERENCES

[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[2] R. Venkatesan and B. Li, Convolutional Neural Networks in Visual Computing: A Concise Guide. CRC Press, 2017. ISBN 978-1-351-65032-8.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS), 2012, pp. 1097-1105.
[4] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[5] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[6] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[7] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," arXiv preprint, 2016.
[8] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[9] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation," arXiv preprint arXiv:1801.04381, 2018.
[10] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," arXiv preprint arXiv:1707.01083, 2017.
[11] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," arXiv preprint arXiv:1602.07360, 2016.
[12] S. Mohanty, "spMohanty/PlantVillage-Dataset," GitHub. Available: https://github.com/spMohanty/PlantVillage-Dataset
[13] B. Krawczyk, "Learning from imbalanced data: open challenges and future directions," Progress in Artificial Intelligence, vol. 5, pp. 221-232, 2016. https://doi.org/10.1007/s13748-016-0094-0
[14] S. García, J. Luengo, and F. Herrera, Data Preprocessing in Data Mining, Intelligent Systems Reference Library, vol. 72. Springer, Berlin, 2015.
[15] C. Shorten and T. M. Khoshgoftaar, "A survey on image data augmentation for deep learning," Journal of Big Data, vol. 6, no. 60, 2019. doi:10.1186/s40537-019-0197-0
[16] A. Jung, "aleju/imgaug," GitHub. Available: https://github.com/aleju/imgaug
[17] E. S. Olivas, J. D. M. Guerrero, M. M. Sober, J. R. M. Benedito, and A. J. S. López, Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques (2 volumes), 2009.
[18] N. Srivastava, G. Hinton, A. Krizhevsky, et al., "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929-1958, 2014.
[19] V. Sovrasov, "sovrasov/flops-counter.pytorch," GitHub. Available: https://github.com/sovrasov/flops-counter.pytorch
[20] D. M. W. Powers, "Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37-63, 2011.
