Paper 14
Abstract: Deep learning for image recognition has received a lot of attention in recent years. In this paper we present a case study using two state-of-the-art deep learning libraries for image classification based on single-phase (Single Shot Detection - SSD) and two-phase (Faster Region-based Convolutional Neural Network - Faster-RCNN) deep learning technologies. The case study is based on classification of Lepidoptera: an order of insects that includes butterflies and moths. We describe the data that was collected to underpin this work. We also present the results and discuss the challenges with the work. Finally, we outline the implementation of a mobile application used as the client interface to the final solution.
Keywords: Computer vision, Deep learning, Tensorflow, Faster-RCNN, SSD, Butterfly, Moth, Detection.
I. INTRODUCTION
Machine learning and more recently deep learning have been gaining significant attention [1]. This is especially the case in image recognition. Numerous deep learning models and associated software libraries have been put forward to tackle different scenarios and diverse demands of business and research communities alike to detect and classify objects in images. Information in images can be represented in a variety of ways, e.g. as a vector of each pixel intensity value, or as a series of edges, or indeed the regions of particular shapes based on their colour, hue or saturation levels [2]. Using such information, it is possible to learn the representation of features present in given images and then use these to detect and/or classify objects in related images.
There are typically two main areas of focus for image recognition: object detection and object classification. The task of object detection is to find all objects of interest in an image and determine their category and location. Object detection has historically been a challenging problem in computer vision due to the different appearances, shapes and poses of various objects, as well as the impact of illumination and issues such as occlusion, e.g. partially hidden objects [3]. There are many approaches that have been applied for object detection. Single-stage and two-stage detection models are the most prominent.
A two-stage model processes images in two stages. Many of these models are based on region proposals. A Region-based CNN (R-CNN) involves two processes. The first process is to propose regions that may contain objects in a given image (so-called Region Proposals) using a Selective Search algorithm. The second process is to run a classification network on the proposed areas to get the categories of objects in each area.
Faster R-CNN is the latest evolution of the two-stage method. It replaces the Selective Search algorithm with a Region Proposal Network (RPN), enabling detection tasks to be done end-to-end by the neural network [4].
A one-stage model has no intermediate region detection process, and the prediction results are obtained directly from the image. This is also known as a Region-free method. One leading example of this approach is Single Shot Detector (SSD). With SSD, each stage can learn a feature map, and then carry out bounding box regression and classification [5]. SSD also supports multi-scale feature maps to improve the detection accuracy of small objects in complex images.
As these are two state-of-the-art approaches, in this paper we consider Faster R-CNN and SSD and their accuracy and performance for detecting and classifying a variety of species of butterflies and moths.
II. RELATED WORK
Deep learning methods have been used in many different fields, from object detection in images/video, to audio sampling, to voice recognition through to text-based analysis. Arguably the most widely known application has been face recognition. Yang et al. [6] applied deep learning algorithms for face detection and reached new levels of sophistication and sensitivity, e.g. they could detect faces under severe occlusion (overlapping faces) and unconstrained variations in facial poses.
Numerous researchers have applied deep learning to a range of image recognition scenarios. Examples of these include detection and classification of breeds of domestic cats [7], identification of individual feral cats using camera traps [8], trees/canopy cover to estimate the amount of foliage and hence amount of pesticide that might be needed for fruit growers [9], plants/flowers [10], counting fruit on trees to estimate yield for rural farmers [11], estimating the size of moving crowds [12], detection and classification of poisonous spiders [13], detection and classification of snakes [14], counting heavy vehicles (trucks/lorries) and their movement patterns on the road network and their direction of travel [15], and even detecting, classifying and blocking pornographic images for under-age Internet users [16]. Deep learning is also being used to tackle systemic issues in the Internet era, e.g. the challenges of fake news and being able to distinguish deep fakes [17]. Such challenges can have seismic impacts on society, e.g. influencing elections and the trust that people have in information and data on the Internet more generally.
The work on deep learning libraries and models and areas of application is continually evolving and new approaches are continually being put forward. Many mobile devices are also now offering dedicated hardware and software support for object detection and classification. For example, the Apple iPhone and Android offer software libraries and hardware support for machine learning and image identification. In the case of the iPhone, the CoreML library and Vision framework support image recognition directly. However, the approaches are generic and not targeted to distinguish unique classes, e.g. the different types of butterfly or moth that are the focus of this paper.
Authorized licensed use limited to: Frankfurt University of Applied Sciences. Downloaded on May 01,2024 at 07:03:25 UTC from IEEE Xplore. Restrictions apply.
same, the pattern distribution of different species can be used to distinguish them.
B. Shape
Shape is an important indicator for distinguishing different species of butterfly and moth. According to statistics, most moths have larger bodies than butterflies. This conclusion is also confirmed in our dataset except for one species of moth: Uraniidae. This moth has a body similar to a butterfly. However, it has swallow-tailed wings that make it somewhat unique.
C. Resting State
Most species of moth have unfolded wings when resting, whilst the wings of the butterfly are often closed. Such information can be used to distinguish butterfly and moth species, noting that this is not always guaranteed to be the case.
D. Antennae
The moth's antennae are feathery or filiform, whilst the butterfly's antennae are rod-shaped. This feature is well represented in most of the images and can often be used to distinguish butterflies and moths.
E. Size
Size is an important factor for humans when distinguishing between different kinds of butterfly and moth. However, size in photographic images is unreliable, e.g. the apparent size of an object varies with its distance to the camera. As a result, size could not be used to distinguish between species.
Table 2: General Butterfly vs Moth Classification using SSD
As can be seen, the Faster RCNN model achieves far greater accuracy when classifying an image to the correct species type, with accuracy over 95%. The moths in particular perform poorly when using SSD, with results that drop down to 38%.
B. Individual Species Accuracy Analysis
The results of the different accuracy experiments on categories of moths and butterflies are shown in Figures 5-8. As can be seen, the accuracy varied considerably across species: between 44%-92% and 29%-92% for moths and butterflies respectively using Faster RCNN, and between 11%-54% and 20%-72% for moths and butterflies using SSD.
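As a minimal sketch of how per-species accuracy figures of the kind reported in this section could be computed, the following Python function assumes that each test image has exactly one ground-truth label and one predicted label (for example, the class of the top-scoring detection). The function name and the labels are hypothetical illustrations and are not taken from the project's source code or data.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class name: fraction of that class's images predicted correctly}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    totals = defaultdict(int)   # number of test images per true class
    correct = defaultdict(int)  # number of correct predictions per true class
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return {name: correct[name] / totals[name] for name in totals}


# Hypothetical labels, invented purely for illustration (not the paper's data).
truth = ["emperor_gum_moth", "emperor_gum_moth", "leafwing", "leafwing"]
guess = ["emperor_gum_moth", "leafwing", "leafwing", "leafwing"]
accuracy = per_class_accuracy(truth, guess)
```

Reporting accuracy per class rather than as a single average makes weak species visible, which is how the wide spread between the best and worst species becomes apparent.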
Figure 8. Accuracy of butterfly classification using SSD
Faster RCNN clearly outperforms SSD in terms of accuracy, but it is noted that Faster RCNN also took three times as long to train. This result is also confirmed by [19,20]. In the following sections we discuss the reasons for the diverse results for classification of moths and butterflies using the different models.
C. Discussion
1) Challenges in Specific Butterfly Classification
The poorer results for some species are often due to the similarities between species. As one example, the white-banded plane was often erroneously classified as Euploea alcathoe. As shown in Figure 9, although the patterns of these two butterflies are almost the same, the positions of the patterns are slightly different. This is clear to see in these images, but this difference is much more challenging to identify from a side view.
Figure 10. Leafwing Butterfly
3) The position of the bounding box is incorrect
When we tested the accuracy of the model to identify Emperor Gum Moths, some of the images used for testing showed inaccurate/incomplete bounding boxes. For example, Figure 11 shows images where part of a moth's wing has been clipped.
Figure 11. Emperor Gum Moth with missing pixels
A high level of noise and missing pixels in the data impacts the performance of the model.
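One common way to quantify how far a bounding box departs from the annotated one is intersection over union (IoU). IoU is not reported in this work, so the sketch below is an illustration only. It assumes boxes are given as normalised (ymin, xmin, ymax, xmax) tuples, and the review threshold of 0.75 is a hypothetical value chosen for the example.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (ymin, xmin, ymax, xmax)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: the annotated box covers the whole moth, while the
# second box clips the right half of the wing span.
annotated = (0.0, 0.0, 1.0, 1.0)
clipped = (0.0, 0.0, 1.0, 0.5)
iou = intersection_over_union(annotated, clipped)

REVIEW_THRESHOLD = 0.75  # assumed cut-off, not a value from the paper
needs_review = iou < REVIEW_THRESHOLD
```

A check of this kind could be used to flag test images with clipped boxes, such as those in Figure 11, for manual inspection before they are counted in the accuracy results.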
Figure 12: Data Size versus Accuracy of Results
In general, the more images in the dataset for a given species, the higher the accuracy. One exception to this is the Emperor Gum Moth. This kind of moth has only 324 images in the database but achieved an accuracy of 0.75. This moth has a unique colour and pattern. The difference between the training dataset and the testing dataset is also small. This results in a higher degree of classification accuracy. As identified in [21], excluding other factors, an increase in the number and diversity of training data results in an increase in the accuracy of the model.
5) Overcoming Overfitting and Improving the Models
There are several approaches that can be taken to improve the overall results. As identified in [22,23], various measures can be applied to reduce overfitting and improve learning accuracy:
• Acquire more data: Getting more data from the web and/or using data augmentation methods can enrich the number of images in the dataset. Such enrichment can include automatically changing the aspect ratio and finer grained rotations, e.g. rotating by 10 degrees as opposed to 90 degrees. Care needs to be taken to ensure that the data set is not overly homogeneous, however.
• Improve the model: there are many other models available in the single-phase, e.g. You-Only-Look-Once (YOLO) [24], and two-phase approaches, e.g. Mask RCNN [25]. These may have other trade-offs in performance.
• Extending the model training time: currently the model training ceases when consecutive rounds of training do not bring sufficient levels of improvement. It is quite possible to continue training the models for extended time periods to obtain even slight performance improvements.
• Improved data cleaning: erroneously labelled images and/or incomplete images of butterflies/moths can impact on the accuracy of the model. Enhancing the quality of the data set would be a major factor in improving the performance of the model more generally.
• Combining multiple models: different models may be suitable for classification of different species. Development of multi-task, cascaded ensembles of models that collectively work to establish the best possible results from the combined models is another approach that might benefit the overall accuracy, e.g. [12].
Our objective here was to develop a mobile application as the end user interface, and the memory and storage demands of combining different models meant that this was not a viable approach.
V. IOS APPLICATION IMPLEMENTATION
In this section we introduce the iOS application implementation using the pre-trained Tensorflow-based model to detect and recognize butterflies and moths.
A. Design choice
Our goal is to develop a butterfly/moth detection and recognition system which is user-friendly. There are around 5 billion users of mobile devices globally and this number is increasing. A mobile application thus represents the most obvious way to deliver the final solution.
There are two ways to implement such a solution. The first approach is to embed the pre-trained model into the application directly to run the model locally on the mobile device. The other option is to develop a client-server application to run the model, i.e. the smartphone uploads a picture to the server and presents the results on the mobile device when they are received from the server.
There are several advantages of running the model locally on mobile devices. The application can run without any network limitations, and the developer does not need to maintain/support the cloud server or deal with scaling issues. However, the models that run on a mobile device have to deal with reduced memory, which naturally impacts on the overall performance. As such it was decided to build a client-server model. The advantage of this approach is that the server has more powerful computation capabilities and can run more advanced models.
B. User interface
The mobile application allows users to choose a photo or take a picture. This is then uploaded to the server running the model. The user interface was designed to be simple and easy to use. The user interface is shown in Figure 13.
Figure 13. User Interface of the iOS App
The iOS client was written in Swift. The Swift iOS client uploads an image to the Cloud-based server, which was based on the Google Cloud platform. This was chosen for
several reasons: it is freely available for academic use and provided a range of libraries and systems that benefited the work. For example, the solution uses a Firebase Function that makes (triggers) the prediction request in Node.js and saves the resulting prediction image and data to Cloud Storage and Firebase for access/download by the iOS client application. The model itself supports:
• Detection_boxes: used to define and draw a given bounding box around the butterfly/moth if they were detected in an image (see Figure 13 (right)).
• Detection_scores: that return a confidence value for each detection box (see Figure 13 (right)).
These are returned to the client application device for display to the user.
VI. CONCLUSIONS AND FUTURE WORK
As with many deep learning projects, the most challenging part is in establishing a high-quality data set for training the models. Over 13,000 images were crawled from the web and labelled using the Python library LabelImg. Two models, Faster-RCNN and SSD, were trained on this dataset. As shown, Faster-RCNN provided better results overall compared to SSD. This was due to the two-phase approach that it adopts in creating a bounding box for the regions of interest and then using these for the actual classification.
There are numerous areas where the work could be extended and improved. First and foremost, the amount and diversity of data should be increased. The work focused on a small subset of butterflies and moths. Any practical solution should ideally cover all species. It is noted that the species selected were all based in Australia and hence of local interest.
Secondly, the deep learning libraries are continually evolving and there is increased support for artificial intelligence on mobile devices, on both the hardware and the software side. Ideally the mobile application would run independently on the mobile device and not deal with networking and Cloud scaling challenges.
The source code for this work is available at: https://github.com/gitjin111/butterfly-moth-detection
REFERENCES
[1] Daffodil Software, Applications of Machine Learning from Day-to-Day, Retrieved from https://medium.com/app-affairs/9-applications-of-machine-learning-from-day-to-day-life-112a47a429d0
[2] Palmer, R., Borck, M., West, G., & Tan, T. (2012). Intensity and Range Image based Features for Object Detection in Mobile Mapping Data.
[3] Oliva, A., & Torralba, A. (2007). The role of context in object recognition. Trends in cognitive sciences, 11(12), 520-527.
[4] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91-99).
[5] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016, October). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21-37). Springer, Cham.
[6] Yang, S., Luo, P., Loy, C.C. and Tang, X., 2016. Wider face: A face detection benchmark. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5525-5533).
[7] L. Yang, X. Zhang, R.O. Sinnott, A Mobile Application for Cat Detection and Breed Recognition based on Deep Learning, 1st IEEE International Workshop on Artificial Intelligence for Mobile, Hangzhou, China, February 2019.
[8] J. Zhou, S. Wang, Y. Chen, R.O. Sinnott, A Web Application for Feral Cat Recognition through Deep Learning, International Conference on Big Data, Honolulu, Hawaii, June 2020.
[9] K. Wang, R. Huo, Y. Jia, R.O. Sinnott, A Mobile Application for Tree Classification and Canopy Calculation using Machine Learning, 1st IEEE International Workshop on Artificial Intelligence for Mobile, Hangzhou, China, February 2019.
[10] M. Gao, L. Lin, R.O. Sinnott, A Mobile Application for Plant Recognition using Deep Learning, e-Science Conference, Auckland, New Zealand, October 2017.
[11] H. Yu, S. Song, S. Ma, R.O. Sinnott, Predicting Yield: Identifying, Classifying and Counting Fruit through Deep Learning, 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Auckland, New Zealand, December 2019.
[12] P. Zhao, X. Lyu, S. Wei, R.O. Sinnott, Crowd-counting through a Cascaded, Multi-task Convolutional Neural Network, 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Auckland, New Zealand, December 2019.
[13] D. Yang, X. Ding, Z. Ye, R.O. Sinnott, Poisonous Spider Recognition through Deep Learning, Australia Computer Science Week, Melbourne, Australia, February 2020.
[14] Z. Yang, R.O. Sinnott, Snake Detection and Classification using Deep Learning, Hawaii International Conference on System Sciences (HICSS) Conference, Hawaii, USA, January 2021.
[15] L. Chen, Y. Jia, P. Sun, R.O. Sinnott, Identification and Classification of Trucks and Trailers on the Road Network through Deep Learning, 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Auckland, New Zealand, December 2019.
[16] F. Zhuang, L. Ren, Q. Dong, R.O. Sinnott, A Mobile Application using Deep Learning to Automatically Classify Adult-only Image Content, International Conference on AI and Mobile Services (AIMS 2020), Honolulu, Hawaii, June 2020.
[17] D. Pan, L. Sun, R. Wang, X. Zhang, R.O. Sinnott, Deepfake Detection through Deep Learning, International Conference on Big Data Computing, Applications and Technologies (BDCAT), Leicester, UK, December 2020.
[18] Koontz, R. M., & BandelinDacey. (2009). What's the difference between a butterfly and a moth?
[19] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016, October). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21-37). Springer, Cham.
[20] Hui, J. (2018, March 28). What do we learn from region based object detectors (Faster R-CNN, R-FCN, FPN)? https://medium.com/@jonathan_hui/what-do-we-learn-from-region-based-object-detectors-faster-r-cnn-r-fcn-fpn-7e354377a7c9
[21] Chen, H., Xiong, F., Wu, D., Zheng, L., Peng, A. (2017). Assessing impacts of data volume and data set balance in using deep learning approach to human activity recognition. IEEE International Conference on Bioinformatics & Biomedicine. IEEE Computer Society.
[22] Sun, X., Ren, X., Ma, S., & Wang, H. (2017). Meprop: sparsified back propagation for accelerated deep learning with reduced overfitting.
[23] Ashiquzzaman, A., Tushar, A. K., Islam, M. R., & Kim, J. M. (2017). Reduction of overfitting in diabetes prediction using deep learning neural network.
[24] Redmon, J., Divvala, S., Girshick, R. and Farhadi, A., 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
[25] K. He, G. Gkioxari, P. Dollár and R. Girshick, "Mask R-CNN," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017, pp. 2980-2988, doi: 10.1109/ICCV.2017.322.