
Bachelor Thesis

Electrical Engineering
June 2021

Bird Detection System


Based on Vision

Preetham Notla
Ganta Saaketh Reddy
Sandeep Jyothula

Dept. of Mathematics & Natural Sciences


Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden
This thesis is submitted to the Department of Mathematics and Natural Sciences at Blekinge
Institute of Technology in partial fulfillment of the requirements for the degree of Bachelor of
Science in Electrical Engineering with Emphasis on Telecommunication.

Contact Information:
Author(s):
Preetham Notla
E-mail: [email protected]
Ganta Saaketh Reddy
E-mail: [email protected]
Sandeep Jyothula
E-mail: [email protected]

University advisor:
Prof. Wlodek J. Kulesza
Dept. of Mathematics and Natural Sciences

Company advisor:
Dr Damian Dziak
Bioseco Sp. z o.o.
E-mail: [email protected]

Dept. of Mathematics & Natural Sciences
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden

Internet : www.bth.se
Phone : +46 455 38 50 00
Fax : +46 455 38 50 57
Abstract

Context. Wind, as a free resource, is used commercially in many ways. In earlier days, windmills were used to generate power, pump water, and produce electricity. The excessive establishment of wind turbines for commercial purposes has affected avifauna: many birds lose their lives in collisions with turbine blades, and turbines generating power near airports are also among the causes of the decline of birdlife. According to a 2011 survey in Canada, a total of 23,300 bird deaths were caused by wind turbines, and it is estimated that the number of deaths will increase to 233,000 over the following 10-15 years.
Objectives. The main objective of this thesis is to find a suitable software solution to detect birds in a series of grayscale images in real time, at a minimum of full HD resolution and with at least a 15 FPS rate. A User-Driven Design methodology is used for development; the tools are Python and OpenCV.
Methods. In this research, a system is designed to detect birds in HD video. The methods considered are convolutional neural networks (CNN), vision-based detection with background subtraction, contour detection, and classification evaluated with a confusion matrix. These methods detect birds in raw images and, with the help of a classifier, make it possible to see the bird at the desired pixel size in full resolution. We investigate a bird classification pipeline consisting of two steps: background subtraction followed by object classification. Background subtraction is a fundamental method for extracting moving objects from a fixed background. For object classification, we use the YOLO v3 model.
Results. The project is expected to result in a system design and prototype for bird identification in a grayscale video stream at a minimum of full HD resolution and 15 FPS. The bird should be distinguished from other moving objects such as wind turbine blades, trees, or clouds. The proposed solution should identify up to 5 birds simultaneously.
Conclusions. After completing each step and arriving at classification, the methods we tried first, such as Haar Cascades and MobileNet-SSD, did not provide the desired results, so we opted for YOLO v3, which gave the best accuracy in classifying the different objects. Using the YOLO v3 classifier, we detected birds with 95% accuracy, blades with 90%, clouds with 80%, and trees with 70%. Moreover, we conclude that further empirical validation of the models in full-scale industry trials is needed.
Keywords: Background Subtraction, Bird Detection, Classification, Contour Detection, Convolutional Neural Networks (CNN), Python, OpenCV.

Acknowledgments

We would like to express our special gratitude to the Bioseco company for
assigning the three of us to this project. We are grateful to Damian Dziak, who
provided all information regarding the project and also helped us with the software
and with completing this project. Thank you for guiding us and teaching us many
new things during this project.

We would also like to extend our special thanks to Prof. Wlodek J. Kulesza,
for providing this unique project opportunity and providing facilities that were re-
quired. We need to thank him for his support and motivation during the project.
We are inspired by his commitment to his work.

Also, we would like to thank our family and friends for supporting us during
this project.

This research was funded by a grant "The completion of R&D works lead-
ing to the implementation of MULTIREJESTRATOR PLUS, a new solution for
monitoring and controlling the power system to increase the operating efficiency,
extend the service life and optimise the environmental impact of wind farms."(No.
POIR.01.02.00-00-0247/17) from The National Centre for Research and Develop-
ment of Poland.

Contents

Abstract

Acknowledgments

List of Figures

List of Tables

Nomenclature

1 Introduction

2 Survey of Related Work
  2.1 Methods for Vision-Based Bird Detection
  2.2 Various Technologies for Bird Detection

3 Problem Statement, Objective, Hypothesis, and Main Contribution
  3.1 Problem Statement
  3.2 Objectives
  3.3 Hypothesis
  3.4 Main Contribution

4 Modeling and Design
  4.1 System Design
  4.2 Background Subtraction
  4.3 Contour Detection
  4.4 Classification with Confusion Matrix

5 Implementation
  5.1 Software Implementation
    5.1.1 Flowchart and its Working
    5.1.2 Implementing Background Subtraction
    5.1.3 Implementing Contour Detection
  5.2 Classification of Objects with YOLO V3

6 Testing and Validation
  6.1 Training and Testing the Dataset (Weights)
  6.2 Validation of Confusion Matrix

7 Conclusions and Future Work

References
List of Figures

4.1 User Driven Design
4.2 Background Subtraction Model for Bird Detection
4.3 Creating Bounding Boxes Around Objects
5.1 Flowchart of Vision Based Bird Detection System
5.2 Before Background Subtraction
5.3 Enlarged Image showing the birds
5.4 After Background Subtraction
5.5 Contour Detection
5.6 Three objects, classified as a bird, marked by a green rectangle, and the clouds marked by two larger black rectangles
5.7 Classified Image of blades, clouds, and trees
5.8 Classified Blade and Cloud
6.1 Results Obtained After Training
6.2 Confusion Matrix and Predicted Data Sample 1
6.3 Confusion Matrix and Predicted Data Sample 2
6.4 Confusion Matrix and Predicted Data Sample 3
List of Tables

4.1 User-Driven Design Table
4.2 Confusion Matrix Table
Nomenclature

ACC - Accuracy

ANN - Artificial Neural Networks

CUDNN - CUDA Deep Neural Network library

CNN - Convolutional Neural Networks

DCNN - Deep Convolutional Neural Networks

ERR - Error

FN - False Negatives

FP - False Positives

FPR - False Positive Rate

FPS - Frames Per Second

GPU - Graphics Processing Unit

HD - High Definition

MOG2 - Mixture of Gaussian method 2

OpenCV - Open Source Computer Vision Library

REC - Recall

RGB - Red-Green-Blue

SEN - Sensitivity

sklearn - Scikit learn

SPC - Specificity

SSD - Single Shot Detector

TN - True Negatives

TP - True Positives

TPR - True Positive Rate

UDD - User Driven Design

YOLO v3 - You Only Look Once version 3

Chapter 1
Introduction

As the population increases, technology develops in every aspect, bringing with it a massive expansion of man-made structures and the destruction of natural habitats, which forces wildlife to migrate [1]. Because of this development, natural habitats have been altered and climate change is progressing rapidly. Birds migrating in search of new places are killed when they collide with tall structures, wind turbines, airplanes, etc., and the damage is done to both sides [2]: these bird deaths harm the environment as well as the technology involved.

A bird's collision with an airplane or a structure damages both. Observations of migrating birds with tracking radar [3] show that birds often search for prey by looking down and do not see what is coming toward them. Many aircraft components, such as the windshield, engine, and wings, are susceptible to bird strikes; windshields in particular can suffer severe damage [4]. According to data from the Federal Aviation Administration, the number of bird strikes reported annually has increased six-fold, from 1,795 to 10,856 cases, over the past 15 years. Between 1912 and 2008, bird-strike accidents destroyed 103 aircraft and caused the loss of 262 lives.

The purpose of the thesis is to design a vision-based bird detection system that allows bird detection in a video stream, where the bird's image may be as small as 100x100 pixels. Rare and exotic birds should be detected with high effectiveness using OpenCV, MATLAB, Python, or C++ [5][6]. The bird should be detected regardless of other moving objects in the background of the provided videos and images. The system should detect the bird at full resolution using algorithms such as morphological preprocessing and feature extraction. We need to identify the bird's size, color, velocity, and species, with a validity greater than 90 percent. This can be done with motion detection based on background subtraction [7]. OpenCV can be used to classify a bird in an image taken when the bird is at least 300 m away from the mounted camera. We can also use deep learning methods such as convolutional neural networks (CNN) [8] combined with image pre-processing and motion detection. These methods detect birds in raw images and apply filters that make it possible to see the bird in the full-resolution image. The image pre-processing can feed into convolutional or artificial neural networks (ANN).

This report consists of a total of 7 chapters. Chapter 1 is the introduction, which explains the importance of bird detection, what happens when a bird collides with aircraft, and detection methods. Chapter 2 covers related work and gives a brief explanation of many possible technologies for bird detection; many research papers are cited in which the authors describe different kinds of methods and technologies for bird detection. Chapter 3 explains the problem statement, objectives, and main contributions of the project. Chapter 4 explains the modeling and the user-driven design. Chapter 5 presents the implementation of the project. Chapter 6 explains the testing and validation of the results. Chapter 7 consists of the conclusions and the future work of the project.
Chapter 2
Survey of Related Work

This chapter surveys related work done by other researchers.

2.1 Methods for Vision-Based Bird Detection


There are many ways to detect a bird based on vision, and many research papers have been published on bird detection in video streams. Vision-based systems for detecting birds have been developed since the 1950s. Vision-based detection methods fall into two types: single-camera units [9] and stereoscopic systems. Single-camera units [10] are used in small applications because they only detect the movement of an object and identify it; this solution is relatively inexpensive. Compared to the single-camera method [11], the stereoscopic method [12] is more advanced: it combines high-resolution cameras [13] and provides more information, such as position and size. Compared to other technologies, such as radar-based detection, vision-based detection is far more efficient and user-friendly, and it has been one of the best ways to detect threats [14].

It has been calculated that this vision-based detection can detect objects up to 45 meters away; this was analyzed by relating the total number of pixels covered by the target to its distance from the camera. Roberto Albertani [15] presents a method of determining bird presence and likelihood of collision based on object recognition using cascading classifiers and a backup tracking system. In addition to removing repeated false positives, the program also strengthens the detection system in the process, resulting in a powerful avian detection system based on a blade-mounted camera. To ensure that the selected components would operate correctly, hardware validation was conducted, and a 3D-printed on-blade enclosure was built as a housing for the camera, transmitter, and power supply.

On the other hand, there is a method [16] in which the bird is identified through artificial intelligence algorithms. Naturalists examined the system and concluded that it is robust enough to identify the bird; furthermore, it is able to categorize the birds according to their measurements.

According to Uma D. Nadimpalli [17], their system combined several methods to identify birds and was evaluated on three different images: image morphology, artificial neural networks (ANN), and template matching, which gave the expected outcome for the three images. The ANN approach is the most advanced of these, incorporates image morphology, and produced output that varied with the complexity of the images.

2.2 Various Technologies for Bird Detection


Among the latest technologies is computer-automated bird detection. According to Dominique Chabot [10], surveying birds with aerial photos and videos is less error-prone than using human observers, but it is time-consuming if the pictures are inspected manually. Thanks to advances in digital cameras and image-recognition systems, computer-automated bird detection is now possible in high-resolution aerial pictures. The authors provide an overview of image-analysis methods and evaluate the literature on this topic. Birds that stand out strongly against a uniform grayscale background are the most amenable to this detection, which requires only a basic image-recognition system.

A few methods [18] used to detect mammals can also be applied to birds, but aerial thermal-infrared imaging, which is often used for larger mammals, is of limited use for detecting birds that cover only a few pixels. Thanks to the continued development of camera and drone technology [19], researchers can reduce the time and resources spent monitoring bird populations by using automated bird detection and counting in aerial images.

In another technology, according to Jeongjin Jo [20], the problem of aircraft colliding with birds is being studied in various ways, and deep learning is currently being used in image-recognition research. Using convolutional neural networks (CNN), that paper describes how to process images and detect birds in all kinds of dynamic environments: dynamic backgrounds are removed through preprocessing so that the moving creatures are separated from them, and a learning model is built on the image of the bird remaining in the frame after preprocessing. The authors aim to improve small-object classification accuracy using the Inception-v3 neural network model.

One more bird-detection technology we have studied is based on radar [21]. Birds reflect radio waves, which makes it possible to detect a bird with radar [22]. According to the authors, the system they built observes birds and gathers details of their direction of movement. They also introduced a tool, bioRad [23], which plays a key role in collecting the details and analysing the data; it also helps in detecting birds from a long distance. They note that this can be done at any time of day, independent of light conditions [24].
Chapter 3
Problem Statement, Objective,
Hypothesis, and Main Contribution

3.1 Problem Statement


Considering the survey of related work, it is essential to detect and monitor birds around urban facilities such as windmills and airports. Facilities like airports and wind farms are more and more frequently built near the natural environment of wild animals [25]. This leads to dangerous close encounters between rare bird species and man-made structures [26]. Birds collide with wind turbines and airplanes, causing a loss in the bird population. Therefore, the development of a reliable bird detection system is of great importance.

A considerable number of methods exist for bird protection and preservation, but some of the methods and systems mentioned in the survey of related work are expensive and require many regulations to implement. Every bird detection system detects the bird and monitors the data required by the system. The detection system must be stable and should detect the bird in an image using as few pixels as possible. This project addresses the main challenges involved in the detection of birds: it analyses the possible technologies, selects the most relevant method for vision-based bird detection in video streams, and chooses the best method that works in real time at full HD resolution [27] and in grayscale. The selected method should be capable of distinguishing birds from other objects such as insects, turbines, etc.

In this project, we developed a system that is capable of separating a bird that occupies less than 20 pixels in an image from the other moving objects in full HD resolution videos and images. We also classify the detected objects in the image accordingly and create a confusion matrix that can calculate the true positives, true negatives, false positives, and false negatives of the classification result.


3.2 Objectives
Four research questions define the objective of the project.

1. Which method is most effective to detect a moving object in a series of grayscale images of resolution at least 1920x1440 pixels with a minimum 15 frames per second (FPS) rate?

2. Which algorithm is most effective to classify moving wind turbine blades in a series of grayscale images of resolution at least 1920x1440 pixels with a minimum 15 FPS rate?

3. Which algorithm is most effective to classify background trees moving with the wind in a series of grayscale images of resolution at least 1920x1440 pixels with a minimum 15 FPS rate?

4. Which algorithm is most effective to recognize bird-looking clouds in a series of grayscale images of resolution at least 1920x1440 pixels with a minimum 15 FPS rate?

There are many methods to detect moving objects in grayscale images or video sequences, such as shadow detection, the frame-difference method, background subtraction, etc. To detect a moving object at the required resolution and with the minimum FPS rate [28], the background subtraction technique is effective, as it can accurately detect moving objects in a series of grayscale images. To classify moving objects in a video, the convolutional neural network (CNN) technique is used; unlike generic neural networks, CNNs are based on pattern recognition. This method can classify moving objects by extracting them from the raw video to detect wind turbines, trees, and clouds. The Haar Cascade classifier is also used, as it applies filters representing the bird, blades, trees, and clouds to the background-subtracted sequence of images at the required resolution with a rate of at least 15 FPS.

3.3 Hypothesis
The hypothesis of our project is that the system consists of only two functionalities: bird detection and object classification. For bird detection, we use a background subtraction model, the Gaussian mixture model, which is a fundamental method for extracting moving objects from a fixed background. For object classification, we use convolutional neural networks (CNN) [29] and the Haar Cascade classifier [30]. These classifiers filter the bird, blades, trees, and clouds. Bird detection can be done with many methods but is commonly based on thresholding, such as gray-level, RGB, and size thresholds. These methods can separate the birds from the background moving objects using the required threshold levels, combined with neural network techniques to detect the bird.

3.4 Main Contribution


The vision-based bird detection system is designed using a user-driven design approach. The system was designed in such a way that the bird will be detected in streaming video of HD resolution, where the minimal size of the bird is less than 4x5 pixels. We use the Python programming language and OpenCV to create the bird detection system.

Many deep learning methods have been used for the detection of birds, including convolutional neural networks, image pre-processing, and motion detection techniques. These methods can be trained on a dataset created from the obtained images and videos and then used to identify and classify the birds and other objects in the provided data, making it possible to see a bird of the desired size in pixels at full resolution. Morphological image pre-processing can be used to separate the background moving objects from the bird, and the pre-processed images can be fed into convolutional or artificial neural networks [31]. Template matching is performed on the pre-processed images, where templates stored in a database are accessed one by one and each template is correlated with the input image. We propose a method to detect a bird whose image is smaller than 100x100 pixels. The proposed detection algorithm needs to work in real time on grayscale images at full HD resolution. Moreover, the system has to distinguish birds from the background and from other moving objects like wind turbine blades, trees, or clouds.

The code is developed so that it can detect four classes of objects, namely trees, clouds, windmill blades, and birds, and it uses grayscale background subtraction [32] to extract these objects from the background. The detected moving objects are shown in the background-subtracted frame. Therefore, the bird can be distinguished by removing the other moving objects, such as trees and clouds, during background subtraction.
Chapter 4
Modeling and Design

Figure 4.1: User Driven Design

As shown in Figure 4.1, the User-Driven Design methodology helps to achieve the objectives of the thesis. It helps in identifying the specific stages of the thesis and provides its overall flow. The initial step involves identifying the existing methods from the literature. The existing methods are analyzed to find the state-of-the-art method that suits the proposed problem statement. An algorithm is then proposed from the selected algorithms to address the constraints of the problem statement and is evaluated on the data provided by Bioseco. The User-Driven Design methodology shown in Figure 4.1 is based on the design methodology approach of [33].


This User-Driven Design consists of problem formulation and product development. The problem formulation [34] was created so that it covers the overall solutions of the system and the regulations created to monitor bird detection continuously. It states that the system should detect the required objects continuously; the objects that need to be detected continuously are birds, windmill blades, clouds, and trees. These detected objects can then be used further for classification purposes.

Product development [35] involves the technologies and algorithms with which a vision-based bird detection system can be developed. It consists of two stages: technology and algorithm selection, and modelling and prototyping. The possible technologies and algorithms that we consider using in the system are OpenCV [36], YOLOv3 [37], and the background subtraction technique [38]. The prototyping shows a preliminary version of the system.

4.1 System Design


Below is a summary of the user-driven design table (Table 4.1):

• The table is divided into three main sections: functionalities, particular constraints, and possible technologies and algorithms.

• The functionalities section is divided into two parts, general and itemized. These functionalities represent the main parameters of the project.

• The particular constraints consist of requirements such as accuracy, resolution, and time delay.

• The available technologies and algorithms consist of possible implementation methods and techniques that can be used for the project.

• As shown in Table 4.1, the general functionalities contain detection and object classification, which are the most important for bird detection.

• The itemized part states the purpose of the general functionalities: for detection, we must detect moving objects such as birds, and for object classification, we must classify the other moving objects such as blades, trees, and clouds.

• The constraint on detection is that it must be done at a minimum of 15 frames per second with resolutions of 2560x1920 pixels and 3200x2400 pixels in grayscale.

Functionalities                      Particular constraints              Possible technologies
General          Itemized                                                & algorithms

Detection        Moving objects      Min. 15 FPS; resolutions of         Vision system and
                                     2560x1920 and 3200x2400 pixels;     background subtraction
                                     grayscale

Object           Birds, blades,      Birds - 95% accuracy                YOLOv3, MobileNet SSD,
classification   trees and clouds    Blades - 90% accuracy               Haar Cascade classifier
                                     Clouds - 80% accuracy
                                     Trees - 70% accuracy
                                     Max. 2 seconds delay compared
                                     to detection

Table 4.1: User-Driven Design Table

• The object classification is constrained by the expected minimum accuracy for blades, clouds, and trees being equal to 90%, 80%, and 70%, respectively. The classification must be done with a maximum delay of 2 seconds.

• The technologies used for detection are background subtraction and a vision system, and for object classification we use CNN and the Haar Cascade classifier.

4.2 Background Subtraction


This section explains the process of detecting moving objects in the given input frames. The foreground mask is calculated, as in [39], by subtracting the reference background from the current frame; in general, this amounts to differentiating the observed scene from a reference model of the given input video/image.

Figure 4.2: Background Subtraction Model for Bird Detection



As shown in Figure 4.2, background subtraction consists of four major steps: pre-processing, background modeling, foreground detection, and data validation. Pre-processing prepares the raw data from the input video sequence for the next phase [40]. Background modeling maintains a stable frame in which moving objects are eliminated by updating it with new video frames and computing a statistical description of the background. Data validation examines the mask, eliminates spurious pixels that do not belong to moving objects, and provides the final foreground mask.

This method is mainly used to computationally subtract the moving (foreground) objects from the background in a series of frames or a video sequence. The idea of the background subtraction method is to subtract a reference image from the current image. We use a Gaussian mixture-based background subtraction [41] together with a morphological filter that operates on the bounded moving objects.

We chose the MOG2 method [42] because of its low memory consumption and low complexity. In MOG2 the background is modelled parametrically, and each pixel is represented by a particular number of Gaussian components. The model [43] is given as


P(X_t) = Σ_{i=1}^{k} ω_{i,t} · η(X_t, μ_{i,t}, σ_{i,t})          (4.1)

where

X_t = observation at time t
η = the i-th Gaussian component
ω_{i,t} = weight associated with the i-th component at time t
μ_{i,t} = mean intensity of the i-th component
σ_{i,t} = standard deviation of the i-th component
k = number of Gaussian components
t = time

4.3 Contour Detection


Early approaches to contour detection quantify the presence of a boundary at a given image location by local measurements; such early operators detect edges by combining a grayscale image with local derivatives. As shown in Figure 4.3, more recent local approaches also take color and image texture information into account and make use of learning techniques that combine these cues.

Figure 4.3: Creating Bounding Boxes Around Objects

Contour detection [44] is an important feature used to create bounding boxes around the detected objects, verifying that an object is being detected. In our project, we add contour detection to the background-subtracted output. The contours of the detected objects are then indicated by bounding boxes in the images with the subtracted background.

4.4 Classification with Confusion Matrix


The confusion matrix [45], also known as the error matrix, is used to determine the performance of a classifier on binary classification tasks. It is a square matrix whose rows and columns list the number of instances of each "actual class" vs. "predicted class" combination. The prediction error (ERR) and accuracy (ACC) provide information about how many samples are misclassified. The error is calculated by summing all the false predictions and dividing by the total number of predictions; the accuracy is calculated by summing all the correct predictions and dividing by the total number of predictions.

ERR = (FP + FN)/(FP + FN + TP + TN) = 1 − ACC          (4.2)

ACC = (TP + TN)/(FP + FN + TP + TN) = 1 − ERR          (4.3)


The True Positive Rate (TPR) and False Positive Rate (FPR) are used for imbalanced class problems. For example, in spam classification the main interest is detecting and filtering spam, but at the same time it is necessary to minimize the number of messages misclassified as spam (false positives). The True Positive Rate provides useful information about the fraction of positive samples that were correctly identified out of the total number of positives.

FPR = FP/N = FP/(FP + TN)          (4.4)

TPR = TP/P = TP/(FN + TP)          (4.5)



                                    Manual data (y-true)
                                    Detected            Not detected
Predicted data    Detected          True Positive       False Positive
(y-pred)          Not detected      False Negative      True Negative

Table 4.2: Confusion Matrix Table

where TPR, also known as recall or sensitivity, measures the recovery rate of positives, and specificity measures the recovery rate of negatives.

SEN = TPR = REC = TP/P = TP/(FN + TP)          (4.6)

SPC = TNR = TN/N = TN/(FP + TN)          (4.7)


The matrix is formed while the software is running, when the comparison of the manual data (y-true) and the predicted data (y-pred) begins. The manual data must be provided, whereas the predicted data is obtained after running the software. Since the classification of objects varies with every video or image, the manual data is also different for each one. Table 4.2 shows the overall structure of the confusion matrix and how it is obtained. The matrix consists of two aspects used to test the performance of a classification algorithm: the manual data and the predicted data. Each can have two states, object detected and object not detected. From these two states, if the detection occurred in both the manual data and the predicted data, the resulting detection is a True Positive (TP). If the detection occurred in the predicted data but not in the manual data, it is a False Positive (FP). If the detection occurred in the manual data but not in the predicted data, it is a False Negative (FN). If the detection occurred in neither the manual data nor the predicted data, it is a True Negative (TN).
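As an illustration of how the TP, TN, FP, and FN counts and the measures in equations (4.2)-(4.7) can be computed with Scikit-learn (listed in the nomenclature as sklearn), a minimal sketch with hypothetical per-frame detection labels is given below; it is not the thesis code itself:

    from sklearn.metrics import confusion_matrix

    # Hypothetical per-frame labels: 1 = object detected, 0 = not detected
    y_true = [1, 1, 0, 1, 0, 0, 1, 0]    # manual (ground-truth) data
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]    # data predicted by the classifier

    # With labels ordered (0, 1) the flattened matrix is [TN, FP, FN, TP]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

    acc = (tp + tn) / (tp + tn + fp + fn)   # equation (4.3)
    err = 1 - acc                           # equation (4.2)
    fpr = fp / (fp + tn)                    # equation (4.4)
    tpr = tp / (tp + fn)                    # equations (4.5) and (4.6)
    spc = tn / (tn + fp)                    # equation (4.7)
    print(tp, tn, fp, fn, acc, err, tpr, fpr, spc)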
Chapter 5
Implementation

In this section, we are going to specify the technical details of the project and
describe how the prototype is built in detail with all types of detection methods
and algorithms.

5.1 Software Implementation


The vision-based bird detection system is entirely based on developing software for bird detection and classification; therefore, no hardware components were used in this project. The developed software implements three methods, described further in this chapter. The code was written in the Python programming language, and we used a Google Colab notebook [46], which runs in the cloud, to train on the images.

5.1.1 Flowchart and its Working


Figure 5.1 represents the flowchart of the vision-based bird detection system. It covers the process from giving the input video to the programmed software through to obtaining the desired result, presenting the steps briefly in sequence.

The videos provided by the Bioseco company are given as input. Each video is converted into frames, and every frame is stored in a folder in the same directory as the video. The frames are then processed by the background subtraction algorithm, which removes the constant background from the grayscale frames and makes contour detection possible.

Contour detection is applied to the processed grayscale frames, where a bounding box is formed around the desired moving objects. These contour-detected frames are then passed on to the classification stage. The actual (manual) values are declared in the program so that the software can compare the actual and predicted values at a later step.


Figure 5.1: Flowchart of Vision Based Bird Detection System



The classification begins with reading the objects in every frame. The detected objects are classified from the initial frame to the last frame. In the processed frames, the detected objects are further marked with labels, and these labels are stored in data created by the program. This data is called the predicted (y-pred) data. The obtained object data is stored in the image folder.

This processing of frames continues until the last frame is completed. After obtaining the data from the last frame, the comparison of the actual data and the obtained data begins, using the confusion matrix. The matrix compares the actual data values (y-true) and the obtained data values (y-pred) and creates four values: TP, TN, FP, and FN. The confusion matrix is obtained and the result can be seen in the program output.

5.1.2 Implementing Background Subtraction


Background subtraction [47] is a method used to build the foreground mask. We used Gaussian mixture-based foreground/background subtraction in our software. We first read the given input video or image sequence using "cv::VideoCapture". After giving the input, we created the background subtraction model using the "cv::BackgroundSubtractor" class. In this subtractor, we set three parameters to initialize the background subtraction: history, varThreshold, and detectShadows. We set history to "2", varThreshold to "10" [48], and detectShadows to "False" in order to extract a high-resolution and clear output from the video. Later, the output of the applied background subtraction can be displayed using the "imshow" function. Figure 5.2 shows the given input video frame.
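A minimal Python/OpenCV sketch of the steps described above is shown below. The video file name is a placeholder, and the parameter values (history=2, varThreshold=10, detectShadows=False) follow the description in this section; the exact thesis code may differ:

    import cv2

    cap = cv2.VideoCapture("input_video.mp4")          # placeholder file name

    # MOG2 subtractor with the parameters described above
    subtractor = cv2.createBackgroundSubtractorMOG2(history=2, varThreshold=10,
                                                    detectShadows=False)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # work on grayscale frames
        fg_mask = subtractor.apply(gray)                # mask of moving objects
        cv2.imshow("foreground mask", fg_mask)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()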

Figure 5.2: Before Background Subtraction

Figure 5.3: Enlarged Image showing the birds

In Figure 5.4 the extracted grayscale image from the given input video frame
is shown. As one can see here, the three small dots together in the middle of the
image are birds, which can be found after doing the background subtraction to
the given input.

Figure 5.4: After Background Subtraction

5.1.3 Implementing Contour Detection


After the background subtraction, we used the contour detection method [49] to create a bounding box around each object in the image with the subtracted background. The background-subtracted output is given as input to the contour detection, and an empty list, "detections = []", is created to store the detections. We then go through each detected object using a "for" loop over the contours. To calculate the area and remove small elements, we use "cv2.contourArea(cnt)" [50], which counts the area of an object in pixels. An "if" statement, "if (cont_ar > 30 and cont_ar < 20000):" [51], keeps only objects whose area is greater than 30 pixels (and smaller than 20000 pixels) and stores them in the detection list. Using "cv2.boundingRect(cnt)" we obtain the four parameters needed to create a bounding box around the object. Figure 5.5 shows the output of the contour detection stage, where the detected bird and blade are indicated by bounding boxes.
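The following sketch illustrates the contour detection step described above, using the same area limits (30 and 20000 pixels). It assumes the foreground mask produced by the background subtraction step and a recent OpenCV version; it is not a verbatim copy of the thesis code:

    import cv2

    def detect_objects(fg_mask, min_area=30, max_area=20000):
        # Find external contours in the background-subtracted mask and keep only
        # objects whose area lies inside the range used in the thesis (30-20000 px)
        contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        detections = []
        for cnt in contours:
            cont_ar = cv2.contourArea(cnt)
            if min_area < cont_ar < max_area:
                x, y, w, h = cv2.boundingRect(cnt)   # bounding box around the object
                detections.append((x, y, w, h))
        return detections

    # Example usage: draw the boxes on the original frame
    # for (x, y, w, h) in detect_objects(fg_mask):
    #     cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)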

Figure 5.5: Contour Detection

5.2 Classification of Objects with YOLO V3


The classification of the birds, trees, clouds, and windmill blades is done in this project. The classification method applied in our project uses the YOLO v3 algorithm [52]. This algorithm was trained on a dataset before being used to classify the objects of interest in our work. We perform classification for four detected object classes: bird, blade, trees, and clouds. The accuracy for the bird must be 95% and for the blade 90%; clouds must reach 80% and trees 70% accuracy. We first extracted the grayscale image from the provided raw input video and then subtracted the background in the grayscale image. As shown in Figure 4.3, in the background-subtracted output we added a bounding box around the moving objects. To perform the classification, we take the enlarged part of each bounded moving object, so that a cropped image of the detected object is displayed; this cropped output is given as the input to the classifier.

We used the data provided by the Bioseco company to train the classifier weights. We first converted the provided data into frames. To apply the trained model to the frames, we need the model weights and model configuration. We need to specify the required classes for training and filter them. The frames are then fed into the trainer, and the weights file is obtained after training for a few hours. We then use this weights file, the configurations file, and the objects file in the classification code, and the obtained output can classify bird, blade, tree, and cloud.

The classification is demonstrated on an image in which the bird is detected and marked with a rectangular box [53]; the output gives the totals of True and False Positives and Negatives, and a threshold accuracy of 97.67% was obtained for the trained frame.

After training the weights file and the configurations file, we use these files in the development of the object classification. The libraries used in this software implementation are OpenCV, NumPy, time, and Scikit-learn [37]. We developed this software in the Python programming language, and the implementation proceeds in four phases.

In the first phase, the given input video is taken and converted into grayscale frames using the background subtraction method. We used mixture-of-Gaussians-based subtraction to implement the background subtraction. After converting the frames to grayscale, the moving objects can be identified easily.

In the next phase, contours are drawn around the moving objects. Object detection is thereby performed, with contours assigned to the moving objects in the video. From these moving objects, we need to classify the detected objects into the classes Birds, Blade, Trees, and Clouds.

In the later phase, which consists of classification [54], we use the weights file, the configurations file, and the objects file. From the objects file, we filter the four object classes that need to be classified; for this purpose, the configurations file is used to mark these four classes. While giving the raw video as input, we create a blob from each frame, with the height, width, and shape of the given object frames, which is passed to the network loaded from the weights file for classification. Figure 5.6 shows a classified image of a bird and clouds with their accuracy.

Figure 5.6: Three objects, classified as a bird, marked by a green rectangle, and
the clouds marked by two larger black rectangles.

To classify, we apply the trained weights to two inputs: the raw full frame and the cropped frame. By running the classifier on both inputs, we can show the classified objects in the full frame and in the cropped frame.

Figure 5.7: Classified Image of blades, clouds, and trees

In the final phase, we created a confusion matrix [55] that calculates the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These are computed by comparing the manual data with the predicted data: the manual data is what we provide, and the predicted data is what the software predicts. By comparing the two, we obtain the TP, TN, FP, and FN values. Figure 5.7 contains the image of classified blades, clouds, and trees with their accuracy, and Figure 5.8 contains the image of a classified blade and cloud.

Figure 5.8: Classified Blade and Cloud


Chapter 6
Testing and Validation

6.1 Training and Testing the Dataset (Weights)


To train the weights file, we first need to convert the provided data into frames [56]. The converted frames are stored in a zipped folder. We then download the darknet repository files from the internet. Darknet is a neural network framework [57] that supports the YOLO v3 object detection approach; it is fast, easy to install, and accurate, and it supports both CPU and GPU computation. These files contain the pre-trained weights, their configurations, and the frame names file. To train with these files, we change the relevant darknet Makefile flags from "0" to "1" to enable them; the three flags that need to be changed are OPENCV, CUDNN, and GPU. After enabling the Makefile flags, we filter the configurations file (removing unnecessary objects so that only the required four main objects are classified) by adding the required classes that need to be detected. There are 18 classes in this configurations file, and we add the 4 classes needed for the classification. After filtering the classes, we import the zipped folder and unzip it into the darknet repository folder. We then import the frame names file and change its objects into the required objects to classify. Next, we extract the name of each frame and put it in a text file alongside the repository files. Finally, we import the pre-trained weights into the repository folder and start the object classification training with the configurations file. We then obtain the trained weights file and configurations file, which can be used for the classification. Figure 6.1 shows the obtained results of the dataset training.
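The step of extracting the name of each frame and putting it into a text file can be done with a few lines of Python, for example as sketched below (the folder and file names are placeholders, not the ones used in the thesis):

    import glob
    import os

    # Placeholder folder: frames extracted from the videos, stored next to the
    # darknet repository files as described above
    frame_paths = sorted(glob.glob(os.path.join("frames", "*.jpg")))

    # Darknet expects a plain text file with one image path per line
    with open("train.txt", "w") as f:
        for path in frame_paths:
            f.write(path + "\n")

    print("Wrote", len(frame_paths), "frame paths to train.txt")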

6.2 Validation of Confusion Matrix


Figure 6.2 shows the obtained confusion matrix and the predicted data for detected birds and clouds. The obtained true-positive count for the bird is 268 and for the cloud 181. The number of frames saved in the output was 257. The training was done so that the detection priority of the classifier follows the inequalities defined in expression 6.1.

Figure 6.1: Results Obtained After Training

The confusion matrix shown in Figure 6.2 has been obtained using the frames from a video file; an example of a frame from that video file is shown in Figure 5.6.

Figure 6.2: Confusion Matrix and Predicted Data Sample 1

Bird > Blade > Cloud > Tree          (6.1)


The classifier was trained in this way to detect the moving bird in a streaming video first and then detect the other objects. The software counts the total detections of each object, creating the predicted values, which are compared with the actual values to form the confusion matrix. Figure 6.3 represents the obtained predicted values and the confusion matrix between blade, cloud, and tree. The obtained true-positive count for the blade is 119, for the cloud 65, and for the tree 36. The number of frames saved in the output was 184. The confusion matrix shown in Figure 6.3 has been obtained using the frames from a video file; an example of a frame from that video file is shown in Figure 5.7.

Figure 6.3: Confusion Matrix and Predicted Data Sample 2

Figure 6.4: Confusion Matrix and Predicted Data Sample 3

Figure 6.4 shows the predicted data and the calculated true positives for the blade and the cloud. In this figure, the true-positive count for the blade is 1405 and for the cloud 42. The confusion matrix shown in Figure 6.4 has been obtained using the frames from a video file; an example of a frame from that video file is shown in Figure 5.6. The number of frames saved in the output was 400. This also shows that the priority order defined by expression 6.1 works as intended: if the video does not contain any bird detections, the priority shifts to the blade, and so on.
Chapter 7
Conclusions and Future Work

In this developing world, software that can identify and classify birds and other objects is necessary. Protecting rare bird species and preventing damage to windmills decreases financial losses as well as saving the birds. This software helps identify moving objects in the air near windmills, terrestrial sites, and airports, and can detect these objects with great accuracy.

In this thesis, we used background subtraction and contour detection to detect objects in the frames of the video; these methods detect moving objects. For classification, we first used the Haar Cascade classifier to detect and classify the objects. This classifier is prone to false detections, which decreases the efficiency of the classification; it detects a bird even when no bird is present. So we tried different classifiers and switched to the YOLO v3 classifier. Although this classifier takes more time to train than we expected, the results obtained from training with it are reliable and very good.

We also tried the MobileNet-SSD classifier. It gave good results and could pinpoint and classify the detected objects, but with this classifier we were not able to apply the weights to two inputs simultaneously, and creating a confusion matrix from it was not possible. Therefore, we chose the YOLO v3 classifier.

The YOLO v3 classifier gave us impressive accuracy in classifying the different objects, namely birds, blades, trees, and clouds. Using the YOLO v3 classifier, we detected birds with 95% accuracy, blades with 90%, clouds with 80%, and trees with 70%. These accuracies are based on the classifier that we trained and on comparing the results obtained for different video streams.

From this thesis, we conclude that the YOLO v3 classifier gives the best accuracy in classifying the different objects compared to other classifiers such as the Haar Cascade and MobileNet-SSD classifiers. The YOLO v3 classifier supports different classes, gives better accuracy, and removes false positives, which increases the efficiency of the detection system.

This thesis provides improved classification, which can be extended to reduce classification time by using multi-threaded processing, enabling use on videos with high frame rates. The bounding-box classification can be extended to segmentation of the objects, and the classification can be improved by developing new deep learning classifiers with a stronger focus on eradicating false positives.

In this thesis, we used the YOLO v3 classifier to detect and classify birds, windmill blades, clouds, and trees. Future researchers can use different classifiers with more complex neural networks, such as Deep Convolutional Neural Networks (DCNN), which may give more accurate detection and classification. This software can be further developed to work in live operation. The classification tasks can also be expanded to differentiate more object types, classify rare bird species, estimate the distance between the birds and the windmills, and alert us and scare the bird away to prevent collision with the windmill blades.
References

[1] Silke Bauer, Judy Shamoun-Baranes, Cecilia Nilsson, Andrew Farnsworth,


Jeffrey F Kelly, Don R Reynolds, Adriaan M Dokter, Jennifer F Krauel,
Lars B Petterson, Kyle G Horton, et al. The grand challenges of migration
ecology that radar aeroecology can help answer. Ecography, 42(5):861–875,
2019.

[2] K Samantha Nichols, Tania Homayoun, Joanna Eckles, and Robert B Blair.
Bird-building collision risk: An assessment of the collision risk of birds with
buildings by phylogeny and behavior using two citizen-science datasets. PloS
one, 13(8):e0201558, 2018.

[3] Bruno Bruderer. The study of bird migration by radar. Naturwissenschaften,


84(1):1–8, 1997.

[4] Vyacheslav Merculov and Dmitry Ivchenko. Simulation of bird collision with
aircraft laminated glazing. In Advances in Design, Simulation and Man-
ufacturing III: Proceedings of the 3rd International Conference on Design,
Simulation, Manufacturing: The Innovation Exchange, DSMIE-2020, June
9-12, 2020, Kharkiv, Ukraine–Volume 2: Mechanical and Chemical Engi-
neering, page 179. Springer Nature, 2020.

[5] Da Li, Bodong Liang, and Weigang Zhang. Real-time moving vehicle de-
tection, tracking, and counting system implemented with opencv. In 2014
4th IEEE International Conference on Information Science and Technology,
pages 631–634. IEEE, 2014.

[6] Tuan Tu Trinh, Ryota Yoshihashi, Rei Kawakami, Makoto Iida, and Takeshi
Naemura. Bird detection near wind turbines from high-resolution video using
lstm networks. In World Wind Energy Conference (WWEC), volume 2,
page 6, 2016.

[7] Parisa Darvish Zadeh Varcheie, Michael Sills-Lavoie, and Guillaume-


Alexandre Bilodeau. A multiscale region-based motion detection and back-
ground subtraction algorithm. Sensors, 10(2):1041–1061, 2010.


[8] Thomas Grill and Jan Schlüter. Two convolutional neural networks for bird
detection in audio signals. In 2017 25th European Signal Processing Confer-
ence (EUSIPCO), pages 1764–1768. IEEE, 2017.

[9] Suk-Ju Hong, Yunhyeok Han, Sang-Yeon Kim, Ah-Yeong Lee, and Ghiseok
Kim. Application of deep-learning methods to bird detection using un-
manned aerial vehicle imagery. Sensors, 19(7):1651, 2019.

[10] Dominique Chabot and Charles M Francis. Computer-automated bird de-


tection and counts in high-resolution aerial images: A review. Journal of
Field Ornithology, 87(4):343–359, 2016.

[11] Christopher JW McClure, Luke Martinson, and Taber D Allison. Automated


monitoring for birds in flight: Proof of concept with eagles at a wind power
facility. Biological Conservation, 224:26–33, 2018.

[12] Roelof Frans May, Øyvind Hamre, Roald Vang, and Torgeir Nygård. Eval-
uation of the dtbird video-system at the smøla wind-power plant. detection
capabilities for capturing near-turbine avian behaviour. NINA rapport, 2012.

[13] Gustavo Gil, Giovanni Savino, Simone Piantini, and Marco Pierini. Is stereo
vision a suitable remote sensing approach for motorcycle safety? an analysis
of lidar, radar, and machine vision technologies subjected to the dynamics
of a tilting vehicle. Proceedings of the 7th Transport Research Arena TRA,
Vienna, Austria, 12, 2017.

[14] Xin Feng, Youni Jiang, Xuejiao Yang, Ming Du, and Xin Li. Computer
vision algorithms and hardware implementations: A survey. Integration,
69:309–320, 2019.

[15] William Gage Maurer. Bird and bat interaction vision-based detection sys-
tem for wind turbines. 2016.

[16] Dawid Gradolewski, Damian Dziak, Milosz Martynow, Damian Kaniecki,


Aleksandra Szurlej-Kielanska, Adam Jaworski, and Wlodek J Kulesza. Com-
prehensive bird preservation at wind farms. Sensors, 21(1):267, 2021.

[17] Uma D Nadimpalli, Randy R Price, Steven G Hall, and Pallavi Bomma. A
comparison of image processing techniques for bird recognition. Biotechnol-
ogy progress, 22(1):9–13, 2006.

[18] Gábor Bakó, Márton Tolnai, and Ádám Takács. Introduction and testing of a
monitoring and colony-mapping method for waterbird populations that uses
high-speed and ultra-detailed aerial remote sensing. Sensors, 14(7):12828–
12846, 2014.

[19] Sharon Dulava, William T Bean, and Orien MW Richmond. Environmental
reviews and case studies: applications of unmanned aircraft systems (uas)
for waterbird surveys. Environmental Practice, 17(3):201–210, 2015.

[20] Jeongjin Jo, Junwon Park, Jinyoung Han, Minsun Lee, and Anthony H
Smith. Dynamic bird detection using image processing and neural network.
In 2019 7th International Conference on Robot Intelligence Technology and
Applications (RiTA), pages 210–214. IEEE, 2019.

[21] David Lack and GC Varley. Detection of birds by radar. Nature,
156(3963):446–446, 1945.

[22] Anthony D Fox and Patrick DL Beasley. David lack and the birth of radar
ornithology. Archives of natural history, 37(2):325–332, 2010.

[23] Adriaan M Dokter, Peter Desmet, Jurriaan H Spaaks, Stijn van Hoey,
Lourens Veen, Liesbeth Verlinden, Cecilia Nilsson, Günther Haase, Hidde
Leijnse, Andrew Farnsworth, et al. biorad: biological analysis and visualiza-
tion of weather radar data. Ecography, 42(5):852–860, 2019.

[24] Hans van Gasteren, Karen L Krijgsveld, Nadine Klauke, Yossi Leshem, Is-
abel C Metz, Michal Skakuj, Serge Sorbi, Inbal Schekler, and Judy Shamoun-
Baranes. Aeroecology meets aviation safety: early warning systems in europe
and the middle east prevent collisions between birds and aircraft. Ecography,
42(5):899–911, 2019.

[25] William L Thompson. Towards reliable bird surveys: accounting for individ-
uals present but not detected. The Auk, 119(1):18–25, 2002.

[26] Scott R Loss, Tom Will, and Peter P Marra. Estimates of bird collision
mortality at wind facilities in the contiguous united states. Biological Con-
servation, 168:201–209, 2013.

[27] Holger Flatt, Holger Blume, and Peter Pirsch. Mapping of a real-time ob-
ject detection application onto a configurable risc/coprocessor architecture
at full hd resolution. In 2010 International Conference on Reconfigurable
Computing and FPGAs, pages 452–457. IEEE, 2010.

[28] Jing Yi Tou and Chen Chuan Toh. Optical flow-based bird tracking and
counting for congregating flocks. In Asian Conference on Intelligent Infor-
mation and Database Systems, pages 514–523. Springer, 2012.

[29] Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, and Gang Hua. A
convolutional neural network cascade for face detection. In Proceedings of
the IEEE conference on computer vision and pattern recognition, pages 5325–
5334, 2015.

[30] Sander Soo. Object detection using haar-cascade classifier. Institute of Com-
puter Science, University of Tartu, 2(3):1–12, 2014.

[31] Min-Seok Choi and Whoi-Yul Kim. A novel two stage template match-
ing method for rotation and illumination invariance. Pattern recognition,
35(1):119–129, 2002.

[32] J Cezar Silveira Jacques, Claudio Rosito Jung, and Soraia Raupp Musse.
Background subtraction and shadow detection in grayscale video sequences.
In XVIII Brazilian symposium on computer graphics and image processing
(SIBGRAPI’05), pages 189–196. IEEE, 2005.

[33] Damian Dziak, Bartosz Jachimczyk, and Wlodek J Kulesza. Iot-based in-
formation system for healthcare application: design methodology approach.
Applied sciences, 7(6):596, 2017.

[34] Roger J Volkema. Problem formulation in planning and design. Management
science, 29(6):639–652, 1983.

[35] Shona L Brown and Kathleen M Eisenhardt. Product development: Past
research, present findings, and future directions. Academy of management
review, 20(2):343–378, 1995.

[36] Gary Bradski and Adrian Kaehler. Opencv. Dr. Dobb’s journal of software
tools, 3:2, 2000.

[37] Liquan Zhao and Shuaiyang Li. Object detection algorithm based on im-
proved yolov3. Electronics, 9(3):537, 2020.

[38] Massimo Piccardi. Background subtraction techniques: a review. In 2004
IEEE International Conference on Systems, Man and Cybernetics (IEEE
Cat. No. 04CH37583), volume 4, pages 3099–3104. IEEE, 2004.

[39] Ahmed Elgammal, David Harwood, and Larry Davis. Non-parametric model
for background subtraction. In European conference on computer vision,
pages 751–767. Springer, 2000.

[40] Shahrizat Shaik Mohamed, Nooritawati Md Tahir, and Ramli Adnan. Back-
ground modelling and background subtraction performance for object de-
tection. In 2010 6th International Colloquium on Signal Processing & its
Applications, pages 1–6. IEEE, 2010.

[41] Sriram Varadarajan, Paul Miller, and Huiyu Zhou. Spatial mixture of gaus-
sians for dynamic background modelling. In 2013 10th IEEE International
Conference on Advanced Video and Signal Based Surveillance, pages 63–68.
IEEE, 2013.

[42] Thierry Bouwmans, Fida El Baf, and Bertrand Vachon. Background mod-
eling using mixture of gaussians for foreground detection-a survey. Recent
patents on computer science, 1(3):219–237, 2008.

[43] Fida El Baf, Thierry Bouwmans, and Bertrand Vachon. Fuzzy statistical
modeling of dynamic backgrounds for moving object detection in infrared
videos. In 2009 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition Workshops, pages 60–65. IEEE, 2009.

[44] Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik. Con-
tour detection and hierarchical image segmentation. IEEE transactions on
pattern analysis and machine intelligence, 33(5):898–916, 2010.

[45] Cinthia OA Freitas, Joao M De Carvalho, José Josemar Oliveira, Simone BK
Aires, and Robert Sabourin. Confusion matrix disagreement for multiple
classifiers. In Iberoamerican Congress on Pattern Recognition, pages 387–396.
Springer, 2007.

[46] Teddy Surya Gunawan, Arselan Ashraf, Bob Subhan Riza, Edy Victor
Haryanto, Rika Rosnelly, Mira Kartiwi, and Zuriati Janin. Development
of video-based emotion recognition using deep learning with google colab.
TELKOMNIKA, 18(5):2463–2471, 2020.

[47] Zoran Zivkovic. Improved adaptive gaussian mixture model for background
subtraction. In Proceedings of the 17th International Conference on Pattern
Recognition, 2004. ICPR 2004., volume 2, pages 28–31. IEEE, 2004.

[48] David E Allen, Michael McAleer, and Bernardo da Veiga. Modelling and fore-
casting dynamic var thresholds for risk management and regulation. Avail-
able at SSRN 926270, 2005.

[49] Xin-Yi Gong, Hu Su, De Xu, Zheng-Tao Zhang, Fei Shen, and Hua-Bin
Yang. An overview of contour detection approaches. International Journal
of Automation and Computing, 15(6):656–672, 2018.

[50] Ruchi Manish Gurav and Premanand K Kadbe. Real time finger tracking and
contour detection for gesture recognition using opencv. In 2015 International
Conference on Industrial Instrumentation and Control (ICIC), pages 974–
977. IEEE, 2015.

[51] Xhensila Poda and Olti Qirici. Shape detection and classification using
opencv and arduino uno. RTA-CSIT, 2280:128–36, 2018.

[52] Yiting Li, Haisong Huang, Qingsheng Xie, Liguo Yao, and Qipeng Chen.
Research on a surface defect detection algorithm based on mobilenet-ssd.
Applied Sciences, 8(9):1678, 2018.

[53] Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, and Xiangyu
Zhang. Bounding box regression with uncertainty for accurate object de-
tection. In Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition, pages 2888–2897, 2019.

[54] Wei Fang, Lin Wang, and Peiming Ren. Tinier-yolo: A real-time object
detection method for constrained environments. IEEE Access, 8:1935–1944,
2019.

[55] Nadav David Marom, Lior Rokach, and Armin Shmilovici. Using the confu-
sion matrix for improving ensemble classifiers. In 2010 IEEE 26-th Conven-
tion of Electrical and Electronics Engineers in Israel, pages 000555–000559.
IEEE, 2010.

[56] Guanqing Li, Zhiyong Song, and Qiang Fu. A new method of image detection
for small datasets under the framework of yolo network. In 2018 IEEE
3rd Advanced Information Technology, Electronic and Automation Control
Conference (IAEAC), pages 1031–1035. IEEE, 2018.

[57] Joseph Redmon. Darknet: Open source neural networks in c.
http://pjreddie.com/darknet/, 2013–2016.
Appendix

The desired output was achieved by developing and using the code listed below.
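As the in-code comments indicate, the script assumes that the trained weights (yolov3_custom2_last.weights), the network configuration (yolov3_custom2.cfg), the class-name file (obj.names) and the test video (birdcloud1.mp4) lie in the same folder as the script, and that an empty folder named "Images" exists for the saved frames; the output paths of the three recorded videos should be adapted to the local machine.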
import cv2            # install with "pip install opencv-python"
import numpy as np
import time
from sklearn import metrics

# a frame is saved once the object count in it reaches "maxObjPerFrame"
maxObjPerFrame = 2

# keep the weights and the .cfg file in the same folder as this script
net = cv2.dnn.readNet("yolov3_custom2_last.weights", "yolov3_custom2.cfg")

# parameters of the video-saving functions
codec = cv2.VideoWriter_fourcc('X', 'V', 'I', 'D')
framerate = 29
resolution = (640, 480)

classifierOut = cv2.VideoWriter("C:\\Users\\PC\\Desktop\\WorkSpace\\ImageProcessingProject\\secondSetData\\classifierResult.avi", codec, framerate, resolution)
croppedOut = cv2.VideoWriter("C:\\Users\\PC\\Desktop\\WorkSpace\\ImageProcessingProject\\secondSetData\\croppedResult.avi", codec, framerate, resolution)
substractionOut = cv2.VideoWriter("C:\\Users\\PC\\Desktop\\WorkSpace\\ImageProcessingProject\\secondSetData\\substractionResult.avi", codec, framerate, resolution)

classes = []
with open("obj.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

print(classes)

layer_names = net.getLayerNames()
# note: recent OpenCV versions return plain integers here, in which case
# "layer_names[i - 1]" should be used instead of "layer_names[i[0] - 1]"
outputlayers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]

colors = np.random.uniform(0, 255, size=(len(classes), 3))

# loading the video; keep it in the same folder as this script
cap = cv2.VideoCapture("birdcloud1.mp4")
backSub = cv2.createBackgroundSubtractorMOG2(history=2, varThreshold=10, detectShadows=False)
font = cv2.FONT_HERSHEY_DUPLEX
starting_time = time.time()
frame_id = 0

# Important: create a folder named "Images"; detected frames are saved into it
pictureCount = 1
def saveImage(img, pictureCount):
    name = './Images/image_' + str(pictureCount) + '.jpg'
    print('Creating...' + name)
    cv2.imwrite(name, img)

# background subtraction and cropping of moving objects
def otherWindows(image):
    # frame size is derived locally here (the original listing relied on the global
    # width/height, which are only set later in the main loop)
    height, width = image.shape[:2]
    output_img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # convert the frame to grayscale
    fgMask = backSub.apply(output_img)                      # apply background subtraction
    contours, _ = cv2.findContours(fgMask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in contours:                                    # go through each detected contour
        # the area is calculated and all small elements are removed
        cont_ar = cv2.contourArea(cnt)                      # contour area in pixels
        if cont_ar > 15 and cont_ar < 20000:                # keep objects larger than 15 px and smaller than 20000 px
            x, y, w, h = cv2.boundingRect(cnt)

            font = cv2.FONT_HERSHEY_SIMPLEX
            x_end = x + w + 250
            y_end = y + h + 250
            if x_end >= width:
                x_end = width
            if y_end >= height:
                y_end = height
            print(output_img.shape, y, x, y_end, x_end)
            cropped = image[y:y_end, x:x_end]               # cropped image, ready to be classified
            # cropped = cv2.resize(cropped, None, fx=2, fy=2)

            # saving the cropped video
            croppedOut.write(cropped)

            cv2.namedWindow('cropped', cv2.WINDOW_FREERATIO)
            cv2.imshow('cropped', cropped)
            # cv2.waitKey(0)   # uncomment to step through the frames manually

            cv2.rectangle(fgMask, (x, y), (x + w, y + h), (255, 0, 0), 3)

    # displaying and saving the background subtraction result
    cv2.namedWindow('Background subtraction method', cv2.WINDOW_NORMAL)
    cv2.imshow('Background subtraction method', fgMask)
    substractionOut.write(fgMask)


# second classifier function: runs the network on a cropped image and draws boxes
def rectangleFrame(img):
    height, width, channels = img.shape
    blob_c = cv2.dnn.blobFromImage(img, 0.00392, (320, 320), (0, 0, 0), True, crop=False)
    net.setInput(blob_c)
    outs_c = net.forward(outputlayers)

    for out in outs_c:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]

            if confidence > 0.3:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)

                x = int(center_x - w / 2)
                y = int(center_y - h / 2)

                cv2.rectangle(img, (x, y), (x + w, y + h), colors[class_id], 3)


# ground-truth labels used for the confusion matrix
y_true = ['bird', 'cloud', 'bird', 'bird', 'cloud', 'bird', 'bird', 'bird', 'bird',
          'bird', 'cloud', 'cloud', 'cloud', 'cloud', 'cloud', 'cloud', 'cloud', 'bird',
          'bird', 'bird', 'cloud', 'cloud', 'bird', 'bird', 'bird', 'bird', 'bird', 'bird',
          'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'bird',
          'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'cloud', 'bird', 'bird',
          'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'cloud', 'bird', 'bird',
          'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'bird',
          'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'bird', 'bird',
          'bird', 'bird', 'cloud', 'bird', 'bird', 'bird', 'bird', 'cloud', 'bird',
          'cloud', 'cloud', 'cloud']

y_pred = []   # labels predicted by the classifier, collected for the confusion matrix

while True:
    ret, frame = cap.read()
    if ret is False:
        break
    otherWindows(frame)        # background subtraction and cropping of the current frame
    copyFrame = frame.copy()

    frame_id += 1
    misCalc = 0.3              # (not used further in this version)
    height, width, channels = frame.shape

    # detecting objects with the YOLO network
    blob = cv2.dnn.blobFromImage(frame, 0.00392, (320, 320), (0, 0, 0), True, crop=False)

    net.setInput(blob)
    outs = net.forward(outputlayers)

    count = 0                  # object count in the current frame
    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]

            if confidence > 0.3:
                count += 1
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                label = str(classes[class_id])

                # cropped region around the detection
                y_min = 0 if (int(y) - 10) <= 0 else (int(y) - 10)   # top border of the cropped image
                x_min = 0 if (int(x) - 10) <= 0 else (int(x) - 10)   # left border of the cropped image

                cropped_img = copyFrame[y_min:int(y + h) + 50, x_min:int(x + w) + 50]   # creating the cropped image

                # rectangleFrame(cropped_img)   # optional: run the second classifier on the cropped frame
                cv2.rectangle(cropped_img, (0, 0), (abs(x_min - int(x + w)), abs(y_min - int(y + h))),
                              colors[class_id], 3)   # rectangle drawn directly on the cropped frame

                cv2.namedWindow(label, cv2.WINDOW_FREERATIO)
                cv2.imshow(label, cropped_img)

                cv2.rectangle(frame, (x, y), (x + w, y + h), colors[class_id], 3)
                l = len(label + " " + str(round(confidence, 2)))
                cv2.rectangle(frame, (x, y), (x + l * 25, y - 80), colors[class_id], -1)
                cv2.putText(frame, label + " " + str(round(confidence, 2)), (x, y - 30),
                            font, 1.5, (255, 255, 255), 2)

                if (frame_id % 1) == 0:
                    if label == "bird" or label == "blade":
                        y_pred.append(label)
                    elif (frame_id % 30) == 0:
                        y_pred.append(label)

                if count >= maxObjPerFrame and (frame_id % 30) == 0:
                    saveImage(frame, pictureCount)
                    pictureCount += 1

    elapsed_time = time.time() - starting_time
    fps = frame_id / elapsed_time

    frame = cv2.resize(frame, None, fx=0.4, fy=0.2)
    # cv2.putText(frame, "FPS:" + str(round(fps, 2)), (10, 50), font, 2, (0, 0, 0), 1)

    # saving the classifier video
    classifierOut.write(frame)

    # classifier output window; further drawing (e.g. contours) can be added here
    cv2.imshow("Output VIDEO", frame)

    # the Esc key stops the visualization window
    key = cv2.waitKey(1)
    if key == 27:
        break

# printing the values predicted by the classifier, for manual checking
print("=============>", y_pred)

# confusion matrix of the ground truth against the predictions
print("Confusion Matrix")
print(metrics.confusion_matrix(y_true, y_pred))

classifierOut.release()
croppedOut.release()
substractionOut.release()
cap.release()
cv2.destroyAllWindows()
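
In addition to the raw confusion matrix, per-class precision and recall and the overall accuracy can be derived from the same label lists. The following lines are a minimal evaluation sketch, not part of the original script; they assume the y_true and y_pred lists built above and simply trim both to a common length, since the script appends one prediction per detection.

from sklearn import metrics

# align the lists: the shorter one determines how many samples can be compared
n = min(len(y_true), len(y_pred))
t, p = y_true[:n], y_pred[:n]

print(metrics.confusion_matrix(t, p))
print(metrics.classification_report(t, p, zero_division=0))   # per-class precision/recall
print("accuracy:", metrics.accuracy_score(t, p))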
