Research on Flower Image Classification Method Based on YOLOv5
Abstract—The rapid development of deep learning has accelerated progress in related computer vision technologies, which have broad application prospects. Because of inter-class similarity and intra-class variation among flowers, flower image classification has essential research value. To achieve flower image classification, this paper proposes a deep learning method that uses the currently powerful object detection algorithm YOLOv5 to achieve fine-grained classification of flower images. Overlapping and occluded objects often appear in flower images, so the DIoU_NMS algorithm is used to select the target box and improve detection of occluded objects. The experimental dataset comes from the Kaggle platform, and experimental results show that the proposed model can effectively identify the five types of flowers in the dataset, with Precision reaching 0.942, Recall reaching 0.933, and mAP reaching 0.959. Compared with YOLOv3 and Faster-RCNN, this model has high recognition accuracy, good real-time performance, and good robustness. The mAP of this model is 0.051 higher than that of YOLOv3 and 0.102 higher than that of Faster-RCNN.
1. INTRODUCTION
The study of flowers has many aspects, such as the study of their ornamental, nutritional, and medicinal value[1]. Accurate classification of flowers is the primary work of related research. Researchers can classify flowers accurately given accumulated knowledge and access to data, but this takes much time. With the development of society and the improvement of living standards, people have an increasing demand for flower appreciation, yet ordinary people often find it challenging to accurately identify the types of flowers. Therefore, the automatic classification of flower images can assist researchers in flower research and provide popular science and convenience for the public.
Flower image classification belongs to fine-grained image classification[2], which classifies different
flower subcategories. There are three difficulties in the image classification of flowers. (1) There are
similarities between different categories of flowers, as shown in Figure 1; (2) Flowers of the same
category have different characteristics, as shown in Figure 2; (3) Plants usually have more than one flower,
so there are overlapping flowers in the image, as shown in Figure 3.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd 1
2nd International Conference on Computer Vision and Data Mining (ICCVDM 2021) IOP Publishing
Journal of Physics: Conference Series 2024 (2021) 012022 doi:10.1088/1742-6596/2024/1/012022
Figure 2. Intra-class difference between flowers of the same class ((a) daisy; (b) sunflower).
experienced development from v1 to v5 [10-13]. YOLOv5, released in 2020, has the advantages of small model size, fast speed, high precision, and implementation in the ecologically mature PyTorch framework. R-CNN belongs to the two-stage family of object detection algorithms, which can effectively improve the detection of targets in the image, but a two-stage model is more complex and computationally expensive than a one-stage model. Based on the above, the latest YOLOv5 object detection algorithm is selected to classify the flower dataset from the Kaggle platform.
2. YOLOV5 MODEL
YOLOv5 is the latest object detection algorithm in the YOLO family; it has high detection accuracy, fast speed, and good real-time performance. YOLOv5 includes four models, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, of which YOLOv5s has the smallest volume. This paper selects the YOLOv5s model, which consists of four parts: Input, Backbone, Neck, and Prediction. The network structure is shown in Figure 4.
2.1 Input
On the input side, YOLOv5 draws on the CutMix method and uses Mosaic data augmentation to effectively improve the recognition of small targets. Adaptive scaling is also added: images are uniformly scaled to the same size before being sent to the network for learning, which enhances the network's ability to process data.
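The adaptive scaling step described above is commonly implemented as letterbox resizing: scale the image uniformly and pad the remainder. The sketch below is illustrative; the function name and the 640-pixel target size are our assumptions, not taken from the paper.

```python
def letterbox_params(w, h, target=640):
    """Compute the uniform scale and symmetric padding that fit a
    w x h image into a target x target canvas without distortion."""
    scale = min(target / w, target / h)   # uniform scale, aspect ratio kept
    new_w, new_h = round(w * scale), round(h * scale)
    pad_w = (target - new_w) / 2          # padding added on each side
    pad_h = (target - new_h) / 2
    return scale, (new_w, new_h), (pad_w, pad_h)

# A 1280x720 image is halved to 640x360, then padded 140 px top and bottom.
print(letterbox_params(1280, 720))  # (0.5, (640, 360), (0.0, 140.0))
```

Because every image reaches the network at the same size, batches can be formed regardless of the original resolutions in the dataset.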
2.2 Backbone
Backbone includes CSP networks and the Focus structure, among others. The Focus structure contains four slice operations and one convolution with 32 kernels, transforming the original 608 × 608 × 3 image into a 304 × 304 × 32 feature map. CSPNet performs local cross-layer fusion, utilising the feature information of different layers to obtain richer feature maps.
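The slicing step of the Focus structure can be illustrated in pure Python on a single channel; the 32-kernel convolution that follows is omitted here. Four interleaved slices halve each spatial dimension and quadruple the channel count (3 → 12 channels before the convolution maps them to 32).

```python
def focus_slice(channel):
    """Split one H x W channel (a list of rows) into four interleaved
    H/2 x W/2 slices, as in YOLOv5's Focus structure."""
    even_rows, odd_rows = channel[0::2], channel[1::2]
    return [
        [row[0::2] for row in even_rows],  # top-left pixels
        [row[1::2] for row in even_rows],  # top-right pixels
        [row[0::2] for row in odd_rows],   # bottom-left pixels
        [row[1::2] for row in odd_rows],   # bottom-right pixels
    ]

# A 4x4 channel becomes four 2x2 slices; applied per channel to a
# 608x608x3 image this yields 304x304x12, which the conv maps to 304x304x32.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
slices = focus_slice(img)
```

No information is discarded: every input pixel appears in exactly one slice, which is why Focus can trade spatial resolution for channels losslessly.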
2.3 Neck
The Neck section contains both PANet and SPP. PANet (Path Aggregation Network) fully integrates image features from different layers: it aggregates the top-level feature information and the output features of the different CSP networks in top-down order, and then aggregates the shallow features bottom-up. SPP (spatial pyramid pooling) applies max pooling with four kernels of different sizes and then performs tensor concatenation.
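A minimal pure-Python sketch of the SPP idea on a single 2-D feature map: stride-1, same-size max pooling with several kernel sizes, whose outputs are concatenated channel-wise. The kernel sizes (1, 3, 5) here are illustrative; YOLOv5 uses larger ones such as 5, 9, and 13.

```python
def maxpool_same(x, k):
    """Stride-1 max pooling with kernel k and implicit padding,
    so the output has the same H x W as the input."""
    h, w, p = len(x), len(x[0]), k // 2
    return [[max(x[i2][j2]
                 for i2 in range(max(0, i - p), min(h, i + p + 1))
                 for j2 in range(max(0, j - p), min(w, j + p + 1)))
             for j in range(w)]
            for i in range(h)]

def spp(x, kernels=(1, 3, 5)):
    """Concatenate the input with its max-pooled versions along the
    channel axis, as spatial pyramid pooling does (k=1 is the identity)."""
    return [x] + [maxpool_same(x, k) for k in kernels if k > 1]

fmap = [[1, 2], [3, 4]]
pooled = spp(fmap)  # 3 same-size maps stacked channel-wise
```

Because every pooled map keeps the input's spatial size, the concatenation mixes receptive fields of different scales without changing the feature-map resolution.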
2.4 Prediction
2.4.1 Loss function: The loss function of YOLOv5s uses GIOU_Loss, which alleviates the situation in which IOU_Loss cannot handle two non-overlapping boxes. As shown in Figure 5, we assume that the minimum enclosing rectangle of the prediction box and the ground-truth box is C, the union of the prediction box and the ground-truth box is N, and their intersection is M. IOU is the ratio of the intersection to the union, as shown in (1). D is the difference set of C and N, as shown in (2). GIOU is IOU minus the ratio of D to C, as shown in (3); the formula for GIOU_Loss is then shown in (4).
IOU = M / N  (1)
D = |C − N|  (2)
GIOU = IOU − D / C  (3)
GIOU_Loss = 1 − GIOU = 1 − (M / N − D / C)  (4)
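Equations (1)-(4) can be checked with a short numeric sketch for two axis-aligned boxes given as (x1, y1, x2, y2); the helper names are illustrative, not from the paper.

```python
def giou_loss(a, b):
    """GIOU_Loss = 1 - GIOU for boxes a, b given as (x1, y1, x2, y2)."""
    area = lambda t: max(0, t[2] - t[0]) * max(0, t[3] - t[1])
    # M: intersection of the prediction box and the ground-truth box
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    M = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    N = area(a) + area(b) - M                 # union of the two boxes
    # C: area of the minimum enclosing rectangle of the two boxes
    C = area((min(a[0], b[0]), min(a[1], b[1]),
              max(a[2], b[2]), max(a[3], b[3])))
    D = C - N                                 # eq. (2): D = |C - N|
    giou = M / N - D / C                      # eq. (3): GIOU = IOU - D/C
    return 1 - giou                           # eq. (4)

# Two unit-offset 2x2 squares: IOU = 1/7, D/C = 2/9, loss ~= 1.079.
loss = giou_loss((0, 0, 2, 2), (1, 1, 3, 3))
```

Note that when the boxes do not overlap, M = 0 and IOU is always 0, while the D/C term still grows with their separation; this is exactly the degenerate case GIOU_Loss fixes.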
2.4.2 Non-Maximum Suppression: NMS locally removes redundant detection boxes, retaining the best one. YOLOv5 uses NMS to select the detection box, and this paper uses DIOU_NMS, which improves detection accuracy for overlapping and occluded targets.
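A minimal pure-Python sketch of DIoU-NMS, under the common definition DIoU = IoU − d²/c², where d is the distance between box centers and c the diagonal of the smallest enclosing box. The 0.5 threshold and the box layout below are illustrative assumptions.

```python
def diou(a, b):
    """DIoU of boxes (x1, y1, x2, y2): IoU minus the normalised
    squared distance between their centers."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    d2 = ((a[0] + a[2] - b[0] - b[2]) ** 2
          + (a[1] + a[3] - b[1] - b[3]) ** 2) / 4   # center distance^2
    c2 = ((max(a[2], b[2]) - min(a[0], b[0])) ** 2
          + (max(a[3], b[3]) - min(a[1], b[1])) ** 2)  # enclosing diagonal^2
    return inter / union - d2 / c2

def diou_nms(boxes, scores, thr=0.5):
    """Greedy NMS that suppresses a box only when its DIoU with an
    already-kept box exceeds thr."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(diou(boxes[i], boxes[j]) <= thr for j in keep):
            keep.append(i)
    return keep
```

Because the center-distance term lowers the DIoU of boxes whose centers are far apart, two overlapping but distinctly centred flowers are less likely to be merged into one detection than under plain IoU-based NMS.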
3.2 Dataset
This paper adopts the public flower dataset from the Kaggle platform, which includes flower images of daisy, dandelion, rose, sunflower, and tulip. See Figure 7 for a schematic diagram of the flower images. The flowers were labelled with LabelMe, and the dataset was divided into training, validation, and test sets in a ratio of 8:1:1. The dataset consists of 400 images of different sizes.
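The 8:1:1 split of 400 images (320/40/40) can be sketched as follows; the use of a fixed seed for reproducibility is our assumption, not stated in the paper.

```python
import random

def split_dataset(items, seed=0):
    """Shuffle and split a list of samples into train/val/test at 8:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)   # fixed seed -> reproducible split
    n = len(items)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(400))
# For 400 images this gives 320 training, 40 validation, 40 test samples.
```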
momentum factor is set to 0.95, the weight decay to 0.001, and the number of epochs to 300. The Precision, Recall, and mAP after model training are shown in Figure 8.
For category C, Precision is the ratio of the number of correctly detected samples to the total number of detected samples, as shown in (5). The Recall of category C is the ratio of the number of correctly detected samples to the total number of samples of that class, as shown in (6). AP is the area under the curve enclosed by P and R, and mAP is the average of the AP over all categories, as shown in (7).
Precision = TP / (TP + FP)  (5)
Recall = TP / (TP + FN)  (6)
Mean Average Precision = ΣAveragePrecision / C  (7)
4. CONCLUSION
Based on the powerful YOLOv5 object detection algorithm, this paper realised flower image detection and fine-grained classification. The experimental results show that the model can detect and recognise flower images with insect interference, overlapping flowers, and blur. Compared with the YOLOv3 and Faster-RCNN algorithms, the flower classification model proposed in this paper achieves good detection and classification results and obvious performance advantages. This experiment mainly focuses on the detection and classification of large flower targets in the image; the next step is to optimise the model further to classify and detect small flower targets.
REFERENCES
[1] Ou J, Yang C H. Quantitative evaluation of the ornamental value of wild herbal flowers [J].