Birds Species Classification Using Deep Learning
SEMINAR REPORT ON
By
Akurdi, Pune.
CERTIFICATE
Submitted by
Dr. A.V.Patil
Principal
Bird watching is a common hobby, but identifying a species usually requires the assistance of
bird books. Some rare species are also present in the environment, and they are difficult to
identify and name. In natural scenes, birds appear in different sizes, shapes, colours, and
angles from the human perspective. Recognizing birds from images is easier than from other
parameters, for humans as well as for machines, and image-based results are also easier for
people to understand. We are therefore developing a deep learning platform to assist users in
recognizing bird species. The Raspberry Pi is a basic, low-cost single-board computer used to
reduce system complexity in real-time applications. It provides a Camera Serial Interface (CSI)
slot for connecting the Raspberry Pi camera, and we use images from this camera as external
data. A convolutional neural network (CNN) trained with a deep learning algorithm is applied
to image classification. First, we read images as input and build a deep learning dataset.
Once the data is organized, the next step is to train the CNN on it. The advantage of our
system is that users who are unfamiliar with a bird can easily identify its species. After
that, we run a PyTorch model on the operating system and train it. PyTorch is used mainly
through its Python interface.
Keywords:
I would like to take this opportunity to thank my internal guide, Prof. Anil Hulsure, for giving me all the
help and guidance I needed. I am really grateful for their kind support. Their valuable
suggestions were very helpful.
I am also grateful to Prof. P. P. Shevatekar, Head of Computer Engineering Department, Dr. D. Y. Patil
Institute of Engineering, Management & Research, for his indispensable support and suggestions.
Rohan Patil
T.E. Computer Engineering
Contents
Chapter 1- Introduction
Chapter 3- Methodology
Chapter 8- Conclusion
References
1.1 Motivation:
In the 21st century, the world is moving toward digitalization and effective monitoring systems in every sector.
Still, many people cannot recognize a bird simply by observing it. Today almost everyone has a mobile phone with
a capable camera, so anyone can take a picture and submit it to our system, which searches its database and
returns an accurate result. The motivation behind the project is to design a system that overcomes this
problem.
1.2 Background:
Artificial intelligence once sounded like a science fiction prophecy of a technological future. Today machine
learning has become a driving force behind technological advancements that people use on a daily basis. Image
recognition is one of its most accessible applications, and it is fuelling a visual revolution online. Machine
learning embedded in consumer websites and applications is changing the way visual data is organized and
processed. Visual recognition offers opportunities similar to the ones shown in science fiction movies.
Image recognition and detection have grown so effective because they use deep learning, a machine learning
method designed to resemble the way the human brain functions. That is how
computers are taught to recognize visual elements within an image. By noticing emerging patterns and relying on
large databases, machines can make sense of images and formulate relevant categories and tags.
Nowadays, bird behaviour and population trends have become an important issue. Birds help us to detect other
organisms in the environment (e.g. the insects they feed on) because they respond quickly to environmental
changes. However, gathering information about birds requires huge human effort and is very costly. A reliable
system that provides large-scale processing of information about birds is therefore required, and it would
serve as a valuable tool for researchers, governmental agencies, etc. Bird species identification means
predicting, from an image, which species a particular bird belongs to.
A class of algorithms called Convolutional Neural Networks (CNNs) has been developed to solve this problem.
The CNN is a state-of-the-art deep artificial neural network technique for image classification; it clusters
similar images and performs object recognition within regions. One of the earliest CNNs, LeNet-5, was created by
LeCun, Bottou, Bengio, and Haffner (1998) for handwritten digit recognition. CNN research then went quiet for
some years. In 2012, convolutional neural networks rose again at the ImageNet Large Scale Visual Recognition
Competition (ILSVRC-2012). Krizhevsky, Sutskever, and Hinton [4] scaled the structure of LeNet-5 into a much
larger neural network, AlexNet, which can detect more complex objects. It uses rectified linear units (ReLU) as
the non-linearity applied to the output of each convolutional and fully connected layer, the dropout technique
to reduce overfitting, and overlapping max pooling to avoid the averaging effect of average pooling. As shown in
Figure 1.1, the model structure is divided into two parts when training on two GTX580 GPUs, and the two parts
interact with each other only at certain layers. As a result, they won the competition by a large margin with a
deeper model: AlexNet achieved a 15.3% top-5 test error compared with 26.2% for the second-best entry. With
results far ahead of those from previous years, it started a revolution of CNNs in deep learning. In the
following years, more accurate and deeper techniques were developed by many teams, thanks to advances in
computing power and the inspiration of AlexNet. Following the success of AlexNet in the ImageNet Large Scale
Visual Recognition Competition, the teams entering the competition the next year trained their models using
CNNs.
The project is software based and uses Python, PyTorch, and Raspbian OS to perform bird classification. As
the project progresses, we will implement the same on a Raspberry Pi board for bird identification. The board
that we are going to use is the Raspberry Pi Model B. PyTorch is an open-source machine learning library
based on the Torch library, used for applications such as computer vision and natural language processing,
and it is programmed mainly in Python. Using Python, one can analyze data, develop algorithms, and create
models. PyTorch is primarily developed by Facebook's artificial intelligence research group. It is free and
open-source software released under the Modified BSD license.
Chapter 2
Literature survey:
Table 2.1 presents a brief literature survey of deep learning and the current state of this field.
Table 2.1 Literature Survey
Chapter 3
Methodology
This chapter focuses on the block diagram of our project and the function of its different blocks.
3.1 Block Diagram:
After going through the literature survey, we have come up with the following block diagram:
[Figure: block diagram with the blocks Training Set, Test Data, Predictive Model, and Predict]
3.2 Elements of Block Diagram:
1) Raw data.
2) Training set.
3) Deep learning CNN.
4) Test data.
5) Feature extraction.
6) Predictive model.
7) Output result.
1) Raw Data: Raw data is data in unstructured form, from which relevant information cannot be predicted
directly. Each raw sample represents a single, implicitly structured data item in a table.
2) Training Set: The training dataset comprises raw data samples that are fed into the training model to
determine specific feature parameters, perform correlation tasks, and create a related classification model.
3) Deep learning CNN: This module extracts unique features of birds with the CNN and predicts the most
probable labels for the input images. The CNN configuration for bird identification uses a stack of
convolutional layers together with an input layer, two fully connected (FC) layers, and one final output layer.
4) Test data: The test dataset will be used to test the classifier parameters and assess the actual prediction
performance of the network model. Once the features are extracted from the raw data, the trained prediction
model will be deployed to classify new input images.
5) Feature Extraction: Our primary task is to extract relevant and descriptive features from raw input images
for fine-grained object recognition. However, because of semantic and intra-class variance, feature extraction
will be challenging. We are going to extract the features at the relevant positions for each part of an image
separately, and then learn the model features that map directly to the corresponding parts.
6) Predictive Model: The proposed model can recognize an uploaded image of a bird as a bird. The proposed
system will also differentiate between images of various birds and predict their species.
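The roles of the training set and the test data described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the file names, the two species labels, the 80/20 split, and the helper names split_dataset and accuracy are hypothetical and are not taken from the project.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle (image_path, label) pairs and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted, actual):
    """Fraction of test images whose predicted species matches the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical file names and species labels, for illustration only.
samples = [
    (f"img_{i}.jpg", "sparrow" if i % 2 == 0 else "kingfisher")
    for i in range(10)
]
train_set, test_set = split_dataset(samples)
```

The training set would be used to fit the CNN, and accuracy would then be computed only on the held-out test set, so that the reported figure reflects images the model has not seen.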
2019-20
Department of E&TC, PCCOE
Chapter 4
Hardware Implementation
4.1 Circuit Diagram:
[Figure: circuit diagram showing 5V Power Supply, Raspberry Pi, Wi-Fi, HDMI to VGA, and Display]
4.2 Hardware Specifications:
USB ports: 4
Video outputs: HDMI, composite video (PAL and NTSC) via 3.5 mm jack
Bluetooth: 4.1
The Raspberry Pi Camera Module v2 replaced the original Camera Module in April 2016.
The v2 Camera Module has a Sony IMX219 8-megapixel sensor. It can be used to take
high-definition video as well as still photographs. It is easy to use for beginners but has
plenty to offer advanced users. There are many examples online of people using it for
time-lapse, slow-motion, and other video effects. The libraries bundled with the camera can
also be used to create effects.
Small size
2.8V Supply
Up to 15fps
Chapter 5
Software Implementation
This chapter focuses on the software that we are using and its specifications.
[Figure: software flow with the steps Feed image, Feature extraction, Apply classifier, and Show result]
Python & PyTorch: Every once in a while, a Python library is developed that has the
potential to change the landscape of deep learning. PyTorch is one such library.
Among the various deep learning libraries, PyTorch is one of the most flexible and easy to
use. It fits naturally into the Python programming methodology, because we do not have to
wait for the whole program to be written before finding out whether it works: we can run a
part of the code and inspect it in real time. PyTorch is a Python-based library built to
provide flexibility as a deep learning development platform. Its workflow is close to that of
NumPy, Python's scientific computing library.
Python support: As mentioned above, PyTorch integrates smoothly with the Python data
science stack. Its tensor operations are similar enough to NumPy's that the difference is
often barely noticeable.
CNN: In deep learning, a convolutional neural network (CNN) is a class of deep neural
network mostly used for analyzing visual images. It consists of an input layer and an output
layer as well as multiple hidden layers. Every layer is made up of a group of neurons. In a
convolutional layer each neuron is connected only to a small local region of the previous
layer, while in a fully connected layer each neuron is connected to all neurons of the
previous layer. The output layer is responsible for the prediction. A convolutional layer
takes an image as input and produces a set of feature maps as output. The input image can
contain multiple channels, such as the red, green, and blue colour channels, and the learned
feature maps respond to parts of a bird such as its wings, eyes, and beak. The convolutional
layer therefore performs a mapping from one 3D volume to another 3D volume, where the
three dimensions are width, height, and depth. A CNN has two components:
1) Feature extraction part: Features are detected when the network performs a series of
convolution and pooling operations.
2) Classification part: The extracted features are given to a fully connected layer, which
acts as the classifier.
The methodology proposed for achieving this uses a convolutional neural network.
The basic layers that form a CNN are the input layer, convolutional layer, ReLU layer,
pooling layer, fully connected layer, and output layer.
The input layer takes the image pixels with three colour channels (R, G, B) and passes them
to the convolutional layer. In a CNN, the neurons are arranged in three dimensions
(height × width × depth), where depth refers to the depth of the activation volume, instead
of in two dimensions as in a simple neural network.
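How the height and width of such a volume change from layer to layer follows the standard output-size relation for a sliding window, output = (input + 2 × padding - kernel) / stride + 1, rounded down. The sketch below applies it in Python. The 224-pixel input, the 3 x 3 filter with padding 1, and the 2 x 2 pooling window are assumed example values, not figures from this project, and conv_output_size is a hypothetical helper name.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output height (or width) of a convolution or pooling window sliding over `size` pixels."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed example: a 224-pixel-wide input (hypothetical value).
after_conv = conv_output_size(224, kernel=3, stride=1, padding=1)  # 3x3 filter, padding 1
after_pool = conv_output_size(after_conv, kernel=2, stride=2)      # 2x2 pooling, stride 2
print(after_conv, after_pool)  # 224 112
```

With these assumed values the convolution keeps the spatial size unchanged and the pooling step halves it, which is why stacking such layers gradually shrinks the width and height while the depth is set by the number of filters.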
The main purpose of the convolutional layer is to extract image features while preserving
the spatial relationships between the pixels of the input. It computes the outputs of its
neurons, transforming the input pixel values into an output volume of feature maps. The
convolution is performed by sliding a filter over the input image. Every image can be
represented as a matrix of pixel values, and the filter, also called a kernel (a concept
from image processing), is a matrix as well. The filter is moved over the input one pixel at
a time; at every position, the overlapping entries of the two matrices are multiplied
element by element and the products are summed to give one value of the convolved output.
Each convolutional layer not only computes the feature map but also applies an activation
function to the result, which depends on the input volume from the previous layer and on
the neuron parameters, namely the weights and biases.
Fig 5.3 CNN layers and activation.
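The sliding-filter operation described above can be written out in a few lines of plain Python. This is a minimal sketch for a single-channel image without padding; the 4 x 4 image, the 2 x 2 kernel values, and the function name convolve2d are made up for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (both lists of rows) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply overlapping entries element by element and sum the products.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Made-up 4x4 single-channel image and 2x2 kernel, for illustration only.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = convolve2d(image, kernel)
# feature_map == [[0, -1, -1], [-1, 1, 3], [2, 0, -2]]
```

In a trained CNN the kernel values are not chosen by hand as here; they are the weights learned during training, and a bias is normally added to each output value.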
In the ReLU layer, the feature map is passed through the rectified linear unit, which changes
all negative values to zero. Most real-world image data is non-linear, but convolution is a
linear operation. The ReLU function introduces non-linearity so that the CNN can learn
non-linear data.
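As a small illustration of this step, ReLU can be applied to a feature map element by element. The values below are arbitrary example numbers, and the function name relu is chosen here only for the sketch.

```python
def relu(feature_map):
    """Replace every negative value in the feature map with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# Arbitrary example feature map containing negative values.
feature_map = [
    [0, -1, -1],
    [-1, 1, 3],
    [2, 0, -2],
]
rectified = relu(feature_map)
# rectified == [[0, 0, 0], [0, 1, 3], [2, 0, 0]]
```

Positive values pass through unchanged, so the layer keeps the strength of detected features while discarding negative responses.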
The pooling function reduces the size of the feature map while preserving its most important
spatial details. The purpose of pooling is to reduce the number of parameters and the amount
of computation in the network, which also helps to control overfitting. There are many types
of pooling functions: max, sum, average, etc. In this report, max pooling will be used in the
pooling layer. In max pooling, a spatial neighbourhood such as [2 x 2] is defined first and
moved over the rectified feature map with a stride of 2, selecting the largest value in each
area.
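The [2 x 2] max pooling with stride 2 described above can be sketched as follows. The 4 x 4 input values and the function name max_pool2d are illustrative assumptions only.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window of the feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Made-up 4x4 rectified feature map, for illustration only.
rectified = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 0, 3, 4],
]
pooled = max_pool2d(rectified)
# pooled == [[6, 2], [2, 7]]
```

The 4 x 4 map shrinks to 2 x 2, so later layers process a quarter as many values while the strongest response in each neighbourhood is retained.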
A fully connected layer is a layer in which every neuron is connected to every neuron of
the next layer. A common activation function called softmax is used at the output of this
layer; it acts as multinomial logistic regression for multi-class classification. The softmax
function ensures that the output probabilities sum to one in the output layer.
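A minimal sketch of the softmax step is given below, assuming the fully connected layer has produced one raw score per species. The three score values are invented for illustration, and subtracting the maximum score is a common numerical-stability measure that does not change the result.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    highest = max(scores)
    # Subtracting the largest score avoids overflow in exp() without changing the result.
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Invented raw scores for three hypothetical species classes.
scores = [2.0, 1.0, 0.1]
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
```

The class with the highest probability (index 0 for these scores) is reported as the predicted species.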
Fig 5.5 Max Pooling
7.2 Application:
1) Our system can be deployed in forest areas to find out which species are present in a
particular forest.
Chapter 7
Expected Conclusion
The present study investigates a method to identify bird species using a deep learning
algorithm (supervised learning) on an image dataset for classification. The system will be
connected to a user-friendly interface where the user uploads a photo for identification and
receives the predicted species as output. The proposed system will work by detecting parts
of the bird and extracting CNN features from multiple convolutional layers. These features
will be given to the classifier for classification. On the basis of the results, we will try
to achieve maximum accuracy in the prediction of bird species. We will conduct a series of
experiments on a dataset composed of several images to achieve maximum efficiency.
References
[4] KRIZHEVSKY, A., I. SUTSKEVER and G. E. HINTON. ImageNet classification with
deep convolutional neural networks. Annual Conference on Neural Information Processing
Systems (NIPS). Harrah’s Lake Tahoe: Curran Associates, 2012, pp. 1097–1105.
ISBN 978-1627480031.
[5] SCHWARZ, M., H. SCHULZ and S. BEHNKE. RGB-D objects recognition and pose
estimation
[6] Tóth, B.P. and Czeba, B., 2016, September. Convolutional Neural Networks for Large
Scale Bird Song Classification in Noisy Environment. In CLEF (Working Notes) (pp. 560-568).
[7] Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. Google, Inc.
Websites:
[8] https://www.youtube.com/playlist?list=PLZbbT5o_s2xrfNyHZsM6ufI0iZENK9xgG
[9] https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
[10] https://github.com/otmhi/Bird-Image-Classification