Bird Species Identifier Using Convolutional Neural Network
https://doi.org/10.22214/ijraset.2022.47039
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue X Oct 2022- Available at www.ijraset.com
Abstract: There are more than 9,000 bird species in the world. Some species are rarely seen, and even when they are found,
identifying them is difficult. To address this problem, we present a simple and effective way to recognize bird species from
their visual features. People also tend to recognize birds more readily from images than from audio recordings, so we use
Convolutional Neural Networks (CNNs), a class of machine learning models that has proven efficient in image processing. In this
paper, a CNN-based system for classifying bird species is presented; it uses the Caltech-UCSD Birds 200 (CUB-200-2011) dataset
for both training and testing. Using this dataset together with a similarity-comparison algorithm, the system achieves good
results in practice. With this method, anyone can easily identify the name of a bird they want to know.
Keywords: Deep Learning, CNN Model, Classification and Prediction, TensorFlow, Keras
I. INTRODUCTION
Bird behaviour and population trends have become a significant concern nowadays. Birds help us detect changes in other life forms
on earth because they respond quickly to environmental change. However, collecting data about bird species requires enormous
human effort and is very costly. A reliable system that can process bird data at large scale would therefore be a valuable tool for
researchers, government agencies, and others. Bird species identification, that is, predicting which species the bird in a given
image belongs to, plays an important role here. Birds can be recognized from an image, audio, or video. Audio processing makes it
possible to recognize birds by capturing their calls, but mixed sounds in the environment, such as insects and other real-world
objects, make such data harder to process. People generally find images easier to interpret than sounds or recordings, so an
approach that classifies birds from an image is preferred over audio or video. Bird species identification is a challenging task
both for humans and for computational procedures that carry it out in an automated fashion.
As image-based classification systems improve, the task is moving to datasets with far more categories, such as Caltech-UCSD, and
recent work has seen much success in this area. Caltech-UCSD Birds 200 (CUB-200-2011) is a well-known dataset of bird images
covering 200 categories. The birds in the dataset are mostly found in North America. It consists of 11,788 images, each annotated
with 15 part locations, 312 binary attributes, and 1 bounding box. In this project, rather than recognizing a large number of
disparate categories, we investigate the problem of recognizing a large number of classes within one category: birds. Classifying
birds poses an additional challenge over classifying disparate categories because of the strong similarity between classes. In
addition, birds are non-rigid objects that can deform in many ways, so there is also large variation within classes. Previous work
on bird classification has dealt with a small number of classes or has relied on voice.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 517
B. Convolution Layer:
The convolutional layer is the core building block of a CNN. It comprises a set of independent feature detectors (filters). Each
detector is convolved with the input image independently to produce a feature map.
C. Pooling Layer:
The function of the pooling layer is to progressively reduce the spatial size of the representation, which reduces the number of
parameters and the amount of computation in the network. The pooling layer operates on each feature map independently.
E. Architecture
A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm that takes an input image, assigns importance
(learnable weights and biases) to various aspects or objects in the image, and learns to differentiate one from the other.
The preprocessing required by a ConvNet is much lower than for other classification algorithms. In primitive methods the filters
are hand-engineered, whereas a ConvNet, given enough training, can learn these filters and characteristics itself.
A ConvNet can capture the spatial and temporal dependencies in an image through the application of relevant filters. The
architecture fits the image dataset better because of the reduction in the number of parameters involved and the reuse of
weights. In other words, the network can be trained to understand the complexity of the image better.
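The reduction in parameters that comes from weight reuse can be made concrete with a little arithmetic. The sketch below compares a fully connected layer with a convolution layer; the image size (224x224 RGB), the 64 units, and the 3x3 kernel are assumed values chosen for illustration and are not taken from the paper.

```python
def dense_params(height, width, channels, units):
    """Weights plus biases of a fully connected layer on a flattened image."""
    return height * width * channels * units + units


def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a convolution layer; independent of image size."""
    return kernel_h * kernel_w * in_channels * filters + filters


# Hypothetical sizes: a 224x224 RGB input, 64 dense units versus 64 filters of 3x3.
dense_count = dense_params(224, 224, 3, 64)  # 9633856
conv_count = conv_params(3, 3, 3, 64)        # 1792
```

Because the same small kernel is reused at every image position, the convolution layer's parameter count does not grow with the image resolution.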
F. Convolution Layer
Filters (Convolution Kernels)
A filter (or kernel) is an integral component of the layered architecture. Generally, it refers to an operator applied to the
entire image that transforms the information encoded in the pixels. In practice, however, a kernel is a matrix of real-valued
entries that is smaller than the input dimensions of the image.
The values of the kernel matrix change with each learning iteration over the training set, which indicates that the network is
learning to identify which regions are significant for extracting features from the data.
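How a kernel is applied to an image can be sketched in a few lines of plain Python. This is a minimal illustration, assuming no padding and a stride of 1; the 4x4 image and the hand-picked 2x2 kernel are hypothetical, whereas in a trained CNN the kernel values are learned. As in most deep learning libraries, the kernel is not flipped, so the operation is strictly a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image with a vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to a dark-to-bright step from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
# Every row of the result is [0, 2, 0]: the response peaks at the edge.
```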
G. Pooling Layer
The pooling layer is responsible for reducing the spatial size of the convolved feature. This dimensionality reduction decreases
the computational power required to process the data. Furthermore, it is useful for extracting dominant features that are
rotationally and positionally invariant, which helps the model train effectively.
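The size reduction performed by pooling can be illustrated as follows. The paper does not state which pooling variant is used, so this sketch assumes max pooling with a 2x2 window and a stride of 2; the feature map values are hypothetical.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each `size` x `size` window, moving by `stride`."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool2d(feature_map)  # [[6, 2], [2, 9]]
```

The 4x4 map shrinks to 2x2 while the strongest response in each region is kept, so a small shift of a feature within a window leaves the output unchanged.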
V. PROJECT DESIGN
A. System Implementation
This subsection explains the implementation of the proposed deep learning platform for identifying bird species. An image of the
bird is given as input to the system, which holds a model trained on a dataset. The image is converted to grayscale and then to
matrix form. Various alignments of the image are considered, and each alignment is passed to the CNN for feature extraction. The
extracted features are given to the trained CNN model, and the resulting values are compared. Based on the compared values, the
classifier assigns the image to a category, and the predictive model reports the species of the bird as the final result. The
output layer of the network provides the parts of the input image that contain the bird.
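The grayscale and matrix conversion step can be sketched as below. The paper does not give the conversion formula, so this example assumes the common ITU-R BT.601 luma weights and scales the result to the range 0..1; the tiny 2x2 image is hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels in 0..255 to a matrix of floats in 0..1."""
    gray = []
    for row in rgb_image:
        gray.append([
            # Assumed BT.601 luma weights; the paper does not specify a formula.
            (0.299 * r + 0.587 * g + 0.114 * b) / 255.0
            for r, g, b in row
        ])
    return gray


# Hypothetical 2x2 image: white, black, pure red, pure blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
matrix = to_grayscale(image)
```

The resulting matrix has one value per pixel, which is the form a single-channel CNN input layer expects.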
B. Architecture Diagram
C. Modules Dataset
A dataset is a collection of data. For the bird-related tasks in this work, the Caltech-UCSD Birds 200 (CUB-200-2011) dataset
is used.
1) Feature Extraction: This subsection describes the feature extraction process used to identify bird species from bird images.
The primary task is to extract relevant and descriptive information from the raw input images for fine-grained object
recognition. To extract and learn features, CNNs apply a number of filters to the raw pixel data of an image.
2) Predictive Model: The feature vectors extracted from the raw data (the image of a bird) are given to the CNN, which is
trained on the training dataset. The extracted features are then passed to the predictive model, which compares the features
with the test data. The model then predicts the species of the bird in the image.
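The final prediction step can be illustrated with a short sketch. The paper does not describe the output layer, so a softmax over raw class scores is assumed here, as is common in CNN classifiers; the class labels and scores are illustrative and are not output from the paper's model.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Illustrative labels and raw scores for three classes.
class_names = ["Laysan Albatross", "Indigo Bunting", "Cardinal"]
name, prob = predict([1.0, 3.0, 0.5], class_names)
```

Here `name` is the label with the largest score, and `prob` expresses how strongly the model favours it over the alternatives.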
D. Hardware Requirements
1) Processor: Intel Core i3 or above
2) Processor Speed: 1.0 GHz or above
3) RAM: 4 GB or above
4) Hard Disk: 500 GB or above
E. Software Requirements
1) Operating System: Windows 7/10 or above
2) Front End: Python
3) Back End: SQLite3
VI. IMPLEMENTATION
Training: Testing:
Deep learning works in a way loosely similar to the human brain: it learns from data and makes inferences about features of new
data based on what it was trained on. A large and diverse dataset is therefore necessary to develop a good neural model. For this
purpose, we use data augmentation, which increases the number of training samples per class and reduces the effect of class
imbalance. Relevant image augmentation techniques are chosen so that the neural model can learn from a diverse dataset: Gaussian
noise, Gaussian blur, flip, contrast, hue, add (adding a value to each channel of a pixel), multiply (multiplying each channel of a
pixel by a value), sharpen, and affine transform. A large dataset also helps to avoid overfitting, which happens quite often when
training deep networks. Image datasets require more computation than text-based datasets. We try to reduce this requirement by
removing the unwanted part of each image, so that the neural model has fewer pixels to process. To eliminate background regions
and extract features only from the body of the bird, pretrained object detection networks are used. For this model, we use
Mask R-CNN to localize the bird in each image during both training and inference. We use the pre-trained Mask R-CNN weights
trained on the COCO dataset [6], which contains 1.5 million object instances in 80 object categories (including birds).
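Two of the ideas above, cropping to the detected bird and simple pixel-level augmentation, can be sketched in plain Python. This is an illustration under stated assumptions: the image is a single-channel matrix with values in 0..255, the binary mask is a hand-written stand-in for a Mask R-CNN output, and all function names are hypothetical.

```python
def hflip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def add_value(image, delta):
    """Add `delta` to every pixel, clipping to the 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def multiply_value(image, factor):
    """Multiply every pixel by `factor`, rounding and clipping to 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def crop_to_mask(image, mask):
    """Crop `image` to the tightest box around the nonzero entries of `mask`."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    if not rows:
        return image  # nothing detected: keep the full image
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


image = [
    [10, 20, 30, 40],
    [50, 200, 210, 60],
    [70, 220, 250, 80],
    [90, 100, 110, 120],
]
# Stand-in for a detector mask: 1 marks pixels that belong to the bird.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
bird = crop_to_mask(image, mask)
augmented = [hflip(bird), add_value(bird, 30), multiply_value(bird, 0.5)]
```

Cropping first means each augmented copy contains only the bird region, so the classifier processes fewer background pixels.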
Table 1 shows the scoresheet based on the results generated by the system. Analysis of these results shows that the species with
the highest score is predicted as the required species.
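Selecting the prediction from such a scoresheet amounts to ranking the species by score. The sketch below assumes the scoresheet is a mapping from species name to score; the names and values are placeholders and are not taken from Table 1.

```python
def top_predictions(scoresheet, k=3):
    """Return the `k` highest-scoring (species, score) pairs, best first."""
    ranked = sorted(scoresheet.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical scoresheet; the values are illustrative only.
scoresheet = {
    "species_a": 0.12,
    "species_b": 0.71,
    "species_c": 0.05,
    "species_d": 0.09,
}
ranking = top_predictions(scoresheet, k=2)
predicted_species = ranking[0][0]
```

Returning the top few candidates rather than a single label lets a user see close alternatives when two species score similarly.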
VIII. CONCLUSION
The main idea behind developing the identification website is to build awareness of bird-watching, birds, and their
identification, especially for birds found in India. It also addresses the need to simplify bird identification and thus make
bird-watching easier. The technology used in the experimental setup is the Convolutional Neural Network (CNN), which uses
feature extraction for image recognition. The method is good enough to extract features and classify images. The main purpose
of the project is to identify the bird species from an image supplied by the user. We used a CNN because it is suitable for
implementing advanced algorithms and gives good numerical precision. It is also general-purpose and scientific. We achieved an
accuracy of 85%-90%. We believe this project has considerable scope for extension. In wildlife research and monitoring, this
concept can be implemented in camera traps to record wildlife movement in specific habitats and the behaviour of any species.
REFERENCES
[1] Fagerlund, Seppo. "Bird species recognition using support vector machines." EURASIP Journal on Advances in Signal Processing 2007, no. 1 (2007): 038637.
[2] Marini, Andréia, Jacques Facon, and Alessandro L. Koerich. "Bird species classification based on color features." In 2013 IEEE International Conference on
Systems, Man, and Cybernetics, pp. 4336-4341. IEEE, 2013.
[3] Barar, Andrei Petru, Victor Neagoe and Nicu Sebe. "Image Recognition with Deep Learning Techniques." Recent Advances in Image, Audio and Signal
Processing: Budapest, Hungary, December 10-2 (2013).
[4] Qiao, Baowen, Zuofeng Zhou, Hongtao Yang, and Jianzhong Cao. "Bird species recognition based on SVM classifier and decision tree." First International
Conference on Electronics Instrumentation & Information Systems (EIIS), pp. 1-4, 2017.
[5] Branson, Steve, Grant Van Horn, Serge Belongie, and Pietro Perona. "Bird species categorization using pose normalized deep convolutional nets." arXiv
preprint arXiv:1406.2952 (2014).
[6] Tayal, Madhuri A., Atharva Mangrulkar, Purvashree Waldey, and Chitra Dangra. "Bird Identification by Image Recognition." Helix 8, no. 6: 4349-4352.
[7] Atanbori, John, Wenting Duan, John Murray, Kofi Appiah, and Patrick Dickinson. "Automatic classification of flying bird species using computer vision
techniques." Pattern Recognition Letters (2016): 53-62.