
CLASSIFICATION OF ANIMALS BASED ON
CHARACTERISTIC BODY MARKINGS


A PROJECT REPORT
Submitted in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
BY

VADLANA SUBHASH (18331A05G3)
VOONNA ABHISHEK (18331A05G9)
SAGI GAYATHRI (18331A05C7)
TIRUMALARAJU MOHITH VARMA (18331A05F8)

Under the Supervision of

Mrs. M. BEULAH RANI
Assistant Professor
Computer Science and Engineering Department
M.V.G.R College of Engineering
Maharaj Vijayaram Gajapathi Raj (MVGR) College of Engineering (Autonomous)
Vizianagaram

CERTIFICATE

This is to certify that the project report entitled “Classification of Animals based on
Characteristic Body Markings”, being submitted by Vadlana Subhash, Voonna Abhishek,
Sagi Gayathri and Tirumalaraju Mohith Varma, bearing registered numbers 18331A05G3,
18331A05G9, 18331A05C7 and 18331A05F8 respectively, in partial fulfilment of the
requirements for the award of the degree of “Bachelor of Technology” in Computer
Science and Engineering, is a record of bona fide work done by them under my
supervision during the academic year 2021-2022.

HOD, CSE                                    SUPERVISOR

Dr. P. RAVIKIRAN VARMA                      Mrs. M. BEULAH RANI
Head of the Department,                     Assistant Professor,
Department of CSE,                          Department of CSE,
MVGR College of Engineering,                MVGR College of Engineering,
Vizianagaram.                               Vizianagaram.

External Examiner
DECLARATION

We hereby declare that the work presented in the dissertation entitled
“CLASSIFICATION OF ANIMALS BASED ON CHARACTERISTIC BODY
MARKINGS” has been carried out by us and is submitted in partial fulfilment of the
requirements for the award of credits in Bachelor of Technology in Computer Science
and Engineering at MVGR College of Engineering (Autonomous), affiliated to
Jawaharlal Nehru Technological University (Kakinada). The contents of this
dissertation have not been submitted for the award of any degree at any other
institution or university.
ACKNOWLEDGEMENTS
The success and final outcome of this project required a great deal of guidance and
assistance from many people, and we are extremely privileged to have received this
throughout the completion of our project. All that we have done is only due to such
supervision and assistance, and we will not forget to thank them.

We place on record our heartfelt appreciation and gratitude to Mrs. M. Beulah Rani for her
immense cooperation and guidance as our mentor in bringing out this project work under her
supervision. We are deeply indebted to her for her excellent, enlightened and enriched guidance.

We also thank Dr. K. V. L. Raju, Principal, and Dr. P. Ravi Kiran Varma, Head of the
Department, for extending their utmost support and co-operation in providing all the
provisions for the successful completion of the project.

We are also thankful and fortunate to have received constant encouragement and guidance
from our panel members Mrs. B. Aruna Kumari, Mrs. K. Santosh Jhansi and Mr. K. Leela
Prasad. We sincerely thank all the members of the staff in the Department of Computer
Science & Engineering for their sustained help in our pursuits. We thank all those who
contributed directly or indirectly in successfully carrying out this work.

Vadlana Subhash

Voonna Abhishek

Sagi Gayathri

Tirumalaraju Mohith Varma

ABSTRACT

Photographs of wild animals in their natural habitats can be recorded
unobtrusively via cameras that are triggered by nearby motion. The installation of
such camera traps is becoming increasingly common across the world. Although
this is a convenient source of invaluable data for biologists, ecologists and
conservationists, the arduous task of poring through potentially millions of
pictures each season introduces prohibitive costs and frustrating delays.

The objective of our project is to help classify the animals present in these
images into four species: Tiger, Cheetah, Jaguar and Hyena.

Monitoring the movements and populations of wild animals is a tiresome
process when done manually; yet without such monitoring, animals may cause
damage to villages in rural areas, and animal populations may decline to the point
of becoming endangered. Unmonitored wild animals are also more vulnerable to
poachers and illegal animal trafficking groups.

To reduce the burden on officials and conservationists, we developed a project
for the classification of animals based on their characteristic body markings. It
identifies the species of the animal present in a given input image among four
species: tiger, cheetah, jaguar and hyena.
LIST OF CONTENTS
CONTENTS                                              Page No

LIST OF ABBREVIATIONS                                       9
LIST OF FIGURES                                            10

1. INTRODUCTION                                            12
   1.1 PROJECT OVERVIEW                                    12
   1.2 PURPOSE                                             12
   1.3 OBJECTIVE                                           13
   1.4 LITERATURE SURVEY                                   13
   1.5 REQUIREMENTS SPECIFICATION                          17
       1.5.1 HARDWARE SPECIFICATIONS                       17
       1.5.2 SOFTWARE SPECIFICATIONS                       18

2. SYSTEM STUDY AND ANALYSIS                               20
   2.1 EXISTING SYSTEM                                     20
   2.2 PROPOSED SYSTEM                                     20
   2.3 THEORETICAL BACKGROUND                              21
       2.3.1 CLASSIFICATION                                21
       2.3.2 DEEP LEARNING                                 23
       2.3.3 CORE CONCEPTS OF DEEP LEARNING                24
       2.3.4 WHY DEEP LEARNING                             25
       2.3.5 CONVOLUTIONAL NEURAL NETWORK                  26
       2.3.6 CNN LAYERS                                    29
       2.3.7 RESNET                                        34

3. SYSTEM DESIGN                                           40
   3.1 BLOCK DIAGRAM                                       40

4. IMPLEMENTATION                                          41
   4.1 DATASET                                             41
   4.2 SAMPLE IMAGES                                       41
   4.3 CODE                                                43

5. EXPERIMENTAL RESULTS                                    53
   5.1 CLASSIFICATION RESULTS                              53
   5.2 GRAPHS                                              53
   5.3 TEST CASE SCENARIOS                                 54
   5.4 COMPARISON WITH OTHER MODELS                        55

6. CONCLUSION                                              57

7. REFERENCES                                              58
LIST OF ABBREVIATIONS

CNN - Convolutional Neural Network

ANN - Artificial Neural Network

AI - Artificial Intelligence

DSAA - Data Science and Advanced Analytics

SVM - Support Vector Machine

DL - Deep Learning

ML - Machine Learning

EDA - Exploratory Data Analysis

IDA - Initial Data Analysis
LIST OF FIGURES
Figure 2-1 Existing System Flow Chart
Figure 2-2 Proposed System Flow Chart
Figure 2-3 Deep Learning Model
Figure 2-4 Why Deep Learning
Figure 2-5 Deep Learning vs Machine Learning
Figure 2-6 Block Diagram of CNN
Figure 2-7 CNN General Architecture with Several Layers
Figure 2-8 Input and Filter for Convolution
Figure 2-9 Filter Slides over the Input Image to Form the Output Matrix
Figure 2-10 First Step of Convolution
Figure 2-11 The ReLU Operation
Figure 2-12 The Max Pooling Operation
Figure 2-13 The Average Pooling Operation
Figure 2-14 The Flattening Operation
Figure 2-15 The Fully Connected Layer Operation
Figure 2-16 The SoftMax Layer Operation
Figure 2-17 The ResNet Building Block
Figure 2-18 The Structure of the Improved ResNet-18 Model
Figure 3-1 Block Diagram for Proposed Model
Figure 4-1 Sample Images of Cheetah
Figure 4-2 Sample Images of Jaguar
Figure 4-3 Sample Images of Hyena
Figure 4-4 Sample Images of Tiger
Figure 5-1 CNN Loss Graph
Figure 5-2 CNN Accuracy Graph
CHAPTER 1: INTRODUCTION
1.1 PROJECT OVERVIEW
Monitoring the movements and populations of wild animals is a tiresome
process when done manually; yet without such monitoring, animals may cause damage
to villages in rural areas, and animal populations may decline to the point of becoming
endangered. Unmonitored wild animals are also vulnerable to poachers and illegal
trafficking groups.
To reduce the burden on officials and conservationists, we developed this
project on the classification of animals based on their characteristic body
markings. The project identifies the species of the animal present in a given
input image among four species: tiger, cheetah, jaguar and hyena.
To identify the species of the animal, we perform multi-class classification
among the four species using the ResNet-18 architecture, a convolutional
neural network (CNN) that is 18 layers deep. After training the model with the
images available in the dataset, the model becomes capable of identifying the
species of the animal.
After training, we use the model for prediction: we take an image as input
and display the species it most probably belongs to, along with its probability.
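The snippet below is a minimal sketch of this prediction step, not the exact project code: the checkpoint file name, the input image path and the class order are illustrative assumptions (torchvision's ImageFolder sorts class folders alphabetically).

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

classes = ['cheetah', 'hyena', 'jaguar', 'tiger']  # assumed alphabetical order

model = models.resnet18()
model.fc = torch.nn.Linear(model.fc.in_features, len(classes))  # 4-class head
model.load_state_dict(torch.load('animal_resnet18.pth', map_location='cpu'))  # assumed file name
model.eval()

transform = T.Compose([T.Resize((224, 224)), T.ToTensor()])
img = transform(Image.open('input.jpg').convert('RGB')).unsqueeze(0)  # add batch dimension

with torch.no_grad():
    probs = torch.softmax(model(img), dim=1)[0]
pred = probs.argmax().item()
print(classes[pred], float(probs[pred]))  # predicted species and its probability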

1.2 PURPOSE
The purpose of our project is to help people such as biologists, ecologists and
conservationists identify the species of the animals present in images. The
project can be used to help increase the population of wild animals by keeping an
eye on the animals present in a territory, as well as on their numbers.
It can also help keep poachers and illegal traffickers in check in
sanctuaries and wild forests: by keeping a close eye on the animals present in these
territories, it may help improve the numbers of a few endangered species of
animals.
1.3 OBJECTIVE
The objective of our project is to help classify the animals present in
images into four different species: Tiger, Cheetah, Jaguar and Hyena.
In general, it is quite difficult to distinguish among these four species of
animals, which look very similar to each other. Using this tool, we can
easily identify the species of the animal.
1.4 LITERATURE SURVEY
There already exist many image classification algorithms developed for
the cat family, but many of them perform only binary classification (a single
animal versus everything else). We implement multi-class classification over
4 similar species, achieving more than 90% accuracy.
[1] Animal Detection Using Deep Learning Algorithm — N. Banupriya, S. Saranya,
Rashmi Swaminathan, Sanchithaa Harikumar, Sukitha Palanisamy.
Many existing systems are implemented for different animals, for example tiger and
animals outside the cat family; we achieve higher accuracy than many of
them. Checking wild animals in their common environment is crucial. This
proposed work develops an algorithm to detect animals in the wild. Since there
are many different animals, manually identifying them can be a difficult task. The
algorithm classifies animals based on their images so we can monitor them more
efficiently. Animal detection and classification can help to prevent animal-vehicle
accidents, trace animals and prevent theft. This can be achieved by applying
effective deep learning algorithms.
Thus, this project uses the Convolutional Neural Network (CNN) algorithm to
detect wild animals. The algorithm classifies animals efficiently with good
accuracy, and the image of the detected animal is displayed for a better result,
so that it can be used for other purposes such as detecting wild animals
entering human habitats and preventing wildlife poaching.

[2] Cheetah or Jaguar: Image Classification | Convolutional Neural Network — an
image classification project that differentiates between two very similar-looking wild
cats, cheetahs and jaguars, using Python and TensorFlow.
This project uses a deep learning technique known as Convolutional Neural
Networks (CNN) for the classification problem. CNNs are artificial neural
networks that are very popular for image processing problems.
A CNN image classifier takes an input image, processes it and classifies it under
given classes (Cheetah or Jaguar in this case). Computers see an image as an array
of pixels, whose size depends on the image resolution. CNNs specialize in recognizing
and detecting patterns (edges, shapes, corners, etc.), which makes them well suited
for these types of projects.
In the referenced work (convolutional-neural-network-437534643262), the authors
classify cheetah versus jaguar and report Training accuracy = 87.5% and
Validation accuracy = 85.5%.
Classifying 2 animals using their images is a very basic yet foundational part of
a larger implementation idea.
The coat patterns of cheetahs and jaguars differ. Their coat patterns are one-of-a-
kind, just like human fingerprints. So, with the right datasets and possibly a larger
model, it should be possible to identify the precise individual animal merely from its
pictures (biometrics for animals).
This could replace the need for an expert to identify every animal spotted on, for
example, a night camera in a wildlife sanctuary or national park. Further, one could
create a pipeline that collects data from cameras installed in a sanctuary, classifies
and labels the individual, and stores it with relevant information such as camera spot,
individual name, date and time, etc.
This way, wildlife experts can prioritize their time on analyzing the collected
data to learn more about individual animal behaviour, territories,
breeding habits, hunting patterns, etc. This can also be helpful in case a new
individual is spotted, or for security purposes in terms of poaching.

[3] Automatically Counting and Identifying Breeds of Different Animals using
Neural Networks — Sapna Khatri, Anjali Rajput, Shreya Alhat, Vaishnavi
Gursal, Prof. Jyoti Deshmukh; Computer Department, Bhivarabai
Sawant Institute of Technology and Research.
Many studies have been carried out in the field of animal recognition using image
processing, and they are implemented in various applications, such as electronic
medical records for animals that help identify dogs using image processing. Some
researchers have worked on the detection of fiducial points on faces, which has
driven progress in the field of machine learning, and some researchers have built
applications that help find missing puppies by extracting facial features using
convolutional neural networks. These papers focus chiefly on image processing,
but in the proposed systems the process is complex, and pre-processing may not
be accurate if the animal in the image is partially hidden. Also, if there are many
different animals in the image, the proposed system does not recognize all of them.
Through this study, the authors try to overcome the shortcomings of these systems.
There were two important stages after data processing: a coarse stage for breed
classification, and a fine stage for dog identification, with an accuracy of 70.94%.
Advanced technology is used for animal identification together with breed
counts. The proposed method was tested via a case study on a publicly available
animal dataset. Ecologists have a strong interest in identifying individual
animals along with breed counts. Thus, different local and textual features
were extracted from the images using a CNN (Convolutional Neural Network).

[4] Machine Vision Classification of Animals — Mark Dunn, Faculty of Engineering
and Surveying, University of Southern Qld; Prof. John Billingsley, Faculty of
Engineering and Surveying, University of Southern Qld; Neal Finch, School of
Animal Studies, University of Queensland.
This paper proposes a machine vision system suitable for the automatic
classification of animal species. Edge tracking and silhouette encapsulation using
s-psi coding is used to match the outline shape of the animal against a normalised
library. An experimental system is described, and results from initial tests are
reported. The proposed system has great potential as a low-cost, robust system
to remotely monitor or classify animals.
A sample of six sheep and six goats were combined in an area where access
to a water trough led past a camera and blue screen. Data was obtained over two
days of the animals randomly passing in front of the camera. In all cases where the
animals passed singly in front of the camera (22 of 24 occurrences), they were
identified as the correct species. In the other two occurrences, an object was
identified, but the species could not be determined with high enough probability due
to multiple animals bunching together as they passed the camera.
The system developed demonstrates the feasibility of machine vision systems for
remote sensing and monitoring in the agricultural field. A high degree of accuracy
was achieved with simple, robust algorithms. The limitation of the current
system is that the animals must pass before the camera in single file: if there is
no space between animals, the silhouette boundary method breaks down.
Further work is being conducted to implement a probabilistic match on only the
top half of the object. Given the object direction, the system should be able to
identify the animal very early in the trace from the shape of the head and neck,
giving an approximate position for the start of the next animal.
[5] Animal Classification System Based on Image Processing & Support Vector
Machine — A. W. D. Udaya Shalika, Lasantha Seneviratne; Sri Lanka Institute of
Information Technology, Malabe, Sri Lanka.
This project is mainly focused on developing a system for animal researchers and
wildlife photographers to overcome the many challenges of their day-to-day work.
When engaged in such situations, they need to wait patiently for long hours,
maybe several days, in remote locations and under severe weather conditions until
they capture what they are interested in. There is also a big demand for rare
wildlife photographs. The proposed method automates the task using a
microcontroller-controlled camera, image processing and machine learning
techniques. First, with the aid of a microcontroller and four passive IR sensors, the
system automatically detects the presence of an animal and rotates the camera
toward that direction. Then a motion detection algorithm gets the animal into the
middle of the frame, and a high-end autofocus webcam captures it. The
captured images are sent to a PC and compared with a photograph database to
check whether the animal is exactly the photographer's choice. If the
captured animal is indeed the desired one, the system automatically captures
more images. Though several technologies are available, none of them is
capable of recognizing what it captures, and there is no detection of animal
presence from different angles. Most available equipment uses a set of PIR
sensors, and whatever disturbs the IR field is automatically captured and stored.
Night-time images are black and white and have less detail and clarity due to
infrared flash quality; if the infrared flash is designed for best image quality, range
must be sacrificed. The photographer might be interested in a specific animal, but
there is no facility to recognize automatically whether the captured animal is the
photographer's choice or not.

[6] J. P. Dominguez-Morales et al., "Wireless Sensor Network for Wildlife
Tracking and Behaviour Classification of Animals in Doñana," IEEE
Communications Letters, vol. 20, no. 12, pp. 2534-2537, Dec. 2016, doi:
10.1109/LCOMM.2016.2612652.
Existing systems produce good results; however, none of them obtains a high-accuracy
classification because of the lack of information. Doñana National Park is a very
rich environment with various endangered animal species. Thereby, this park
requires a more accurate and efficient monitoring system to act quickly against
animal behaviours that may endanger certain species. In this letter, the authors
propose a hierarchical wireless sensor network installed in this park to collect
information about animals' behaviours using intelligent devices placed on them,
which contain a neural network implementation to classify behaviour based on
sensory information. Once a behaviour is detected, the network redirects this
information to an external database for further treatment. This solution reduces
power consumption and facilitates the monitoring of animal behaviour for biologists.

1.5 REQUIREMENTS SPECIFICATION


1.5.1 HARDWARE SPECIFICATIONS
1.5.1.1 GPU
A graphics processing unit is a specialized electronic circuit designed to
rapidly manipulate and alter memory to accelerate the creation of images in a
frame buffer intended for output to a display device. GPUs are used in embedded
systems, mobile phones, personal computers, workstations, and game consoles.
1.5.1.2 RAM
RAM (Random Access Memory) is the hardware in a computing device
where the operating system, application programs and data in current use are kept
so they can be quickly reached by the device's processor. RAM is the main
memory in a computer, and it is much faster to read data from RAM than from
secondary storage. The RAM required is 8 GB.
1.5.1.3 HARD DISK
A hard disk drive (HDD) is a non-volatile computer storage device
containing magnetic disks or platters rotating at high speeds. It is a secondary
storage device used to store data permanently. A 250 GB hard disk is required.

1.5.2 SOFTWARE SPECIFICATIONS
1.5.2.1 OPERATING SYSTEM

An Operating System (OS) is an interface between a computer user and


computer hardware. An operating system is software that performs all the
basic tasks, like file management, memory management, process management,
handling input and output, and controlling peripheral devices such as disk drives
and printers. Some popular operating systems include Linux Operating System,
Windows Operating System, VMS, OS/400, AIX, z/OS, etc.
We have used Windows OS.

1.5.2.2 PYTHON 3.8


Python is an interpreted, object-oriented, high-level programming
language with dynamic semantics. Its high-level built-in data structures, combined
with dynamic typing and dynamic binding, make it very attractive for Rapid
Application Development, as well as for use as a scripting or glue language to
connect existing components together. Python's simple, easy to learn syntax
emphasizes readability and therefore reduces the cost of program maintenance.
Python supports modules and packages, which encourages program modularity
and code reuse. The Python interpreter and the extensive standard library are
available in source or binary form without charge for all major platforms, and can
be freely distributed.
Often, programmers fall in love with Python because of the increased
productivity it provides. Since there is no compilation step, the edit-test-debug
cycle is incredibly fast. Debugging Python programs is easy: a bug or bad input
will never cause a segmentation fault. Instead, when the interpreter discovers an
error, it raises an exception. When the program doesn't catch the exception, the
interpreter prints a stack trace. A source level debugger allows inspection of local
and global variables, evaluation of arbitrary expressions, setting breakpoints,
stepping through the code a line at a time, and so on. The debugger is written in
Python itself, testifying to Python's introspective power.

Libraries Used
1. Torch
PyTorch is an open-source machine learning library for Python.
It is used for applications such as natural language processing. It was initially
developed by Facebook's artificial-intelligence research group; Uber's Pyro
software for probabilistic programming is built on top of it.
The major features of PyTorch are mentioned below:
● Easy Interface − PyTorch offers an easy-to-use API; hence it is considered
very simple to operate, and it runs on Python. Code execution in this
framework is quite easy.
● Python usage − This library is considered Pythonic and smoothly
integrates with the Python data science stack. Thus, it can leverage all the
services and functionalities offered by the Python environment.
● Computational graphs − PyTorch provides an excellent platform which
offers dynamic computational graphs; thus a user can change them during
runtime. This is highly useful when a developer has no idea how much
memory is required for creating a neural network model.
2. Matplotlib
Matplotlib is a comprehensive library for creating static, animated and
interactive visualizations. Matplotlib produces publication-quality figures in a
variety of hardcopy formats and interactive environments across platforms.
Matplotlib can be used in Python scripts, the Python and IPython shells, web
application servers, and various graphical user interface toolkits.
3. OS
The os module in Python provides functions for interacting with the
operating system. It comes under Python's standard utility modules and
provides a portable way of using operating-system-dependent
functionality. The "os" and "os.path" modules include many functions to interact
with the file system.

CHAPTER 2: SYSTEM STUDY AND ANALYSIS

2.1 EXISTING SYSTEM


● Existing animal classification applications use a binary pattern
classification task.
● When given a new input image, the classifier decides whether the
sample belongs to a particular species or not.

Figure 2-1 Existing System flow chart


In the above flow chart, we can see that the result obtained is whether the given
input image is a Cheetah or not.
● It is done using machine learning and computer vision.

2.2 PROPOSED SYSTEM


The proposed system is an animal classification system built with deep
learning that helps to recognize animals easily among the four species.
● The project recognizes the animal in the image that is fed to the system.
● The major focus is on the body patterns of the animals, which help to identify
the animal easily.
● Once the animal is recognized, it is displayed along with the
probabilistic class.
● In this project we perform multi-class classification to identify the species
of the animal present in the image.

Figure 2-2 Proposed System flow chart

2.3 THEORETICAL BACKGROUND


2.3.1 CLASSIFICATION
The aim of classification is to determine which category an observation
belongs to, and this is done by understanding the relationship between the
dependent variable and the independent variables. Here the dependent variable is
categorical, while the independent variables can be numerical or categorical. As
there is a dependent variable that allows us to establish the relationship between
the input variables and the categories, classification is predictive modeling that
works under a supervised learning setup.
Depending upon the dependent variable's nature, different machine
learning classification techniques can be distinguished.
Thus, classification can be of multiple types, and depending upon the
business problem, we have to identify the kind of classification technique and the
algorithms involved in such techniques.

Of the various classification techniques, the most common ones are the following:

2.3.1.1 BINARY CLASSIFICATION


The most basic and commonly used form of classification is a binary
classification. Here, the dependent variable comprises two exclusive categories
that are denoted through 1 and 0, hence the term Binary Classification. Often 1
means True and 0 means False. For example, if the business problem is whether
the bank member was able to repay the loan and we have a feature/variable that
says “Loan Defaulter,” then the response will either be 1 (which would mean True,
i.e., Loan defaulter) or 0 (which would mean False, i.e., Non-Loan Defaulter). This
classification has often formed the basis of various classification algorithms and is
usually the first classification technique to be understood.

2.3.1.2 BINOMIAL CLASSIFICATION


This classification type is technically like Binary Classification only as the Y
variable comprises two categories. However, these categories may not be in the
form of True and False. For example, if we have a dataset with multiple features
that denote pixel density and a Y variable with two categories – "Car" or
"Bike" – this type of classification is known as Binomial Classification. From a
practical point of view, especially as far as Machine Learning is concerned, there
is no difference, as these two categories can also be encoded as 0 and
1, making this type look like Binary Classification.

2.3.1.3 MULTI-CLASS CLASSIFICATION


An advanced form of classification, multi-class classification, is when the Y
variable is comprised of more than two classes/categories. Here each observation
belongs to a class, and a classification algorithm has to establish the relationship
between the input variables and them. Therefore, during prediction, each
observation is assigned to a single exclusive class. For example, a business
problem where there is a need to categorize whether an observation is "Safe,"
"At-Risk," or "Unsafe" would be classed as a multi-class classification problem.
Note – Each observation can belong to only one class; multiple classes can't
be assigned to an observation. Thus, an observation will be either "Safe," "At-Risk,"
or "Unsafe," and can't be multiple things at once.

2.3.1.4 MULTI-LABEL CLASSIFICATION


This form of classification is similar to multi-class classification. Here, the
dependent variable has more than two categories; however, it differs from multi-
class classification because an observation can be mapped to more than one
category. Therefore, the classification algorithm here has to understand which
classes an observation can be related to and learn the patterns accordingly.
A common use-case of this type of classification problem is found in text-mining-
related classification, where an observation (e.g., text from a newspaper article) can
have multiple categories in its corresponding dependent variable (such as
"Politics," "Names of Politicians Involved," "Important Geographical Location,"
etc.).

Classification is a fascinating aspect of data science. While there are many
algorithms available that solve various business problems, a large number of these
algorithms belong to the field of classification. For any data scientist, learning
classification is of utmost importance, and to learn it properly, aspirants must
focus on all the dimensions of classification: business problems that can be solved
through classification, the inner workings of the algorithms, and the evaluation and
validation mechanisms of a classification model.
In this project we use multi-class classification to determine the species
name. There are many multi-class classification algorithms; here we use a
CNN with the ResNet-18 model in deep learning.

2.3.2 DEEP LEARNING


Deep learning is a particular kind of machine learning that achieves great power
and flexibility by learning to represent the world as a nested hierarchy of concepts,
with each concept defined in relation to simpler concepts, and more abstract
representations computed in terms of less abstract ones.

In deep learning, we create an artificial structure called an artificial neural
network, consisting of nodes or neurons. Some neurons hold input values and some
hold output values, and in between there may be many interconnected neurons in the
hidden layers.

Figure 2-3 Deep Learning Model


ARCHITECTURES
1. Deep Neural Network
A neural network with a certain level of complexity (multiple hidden
layers between the input and output layers), capable of modelling and
processing non-linear relationships.
2. Deep Belief Network
A class of deep neural network; a multi-layer belief network.
3. Recurrent Neural Network
Performs the same task for every element of a sequence. It allows for
parallel and sequential computation and is similar to the human brain
(a large feedback network of connected neurons). Recurrent networks can
remember important things about the input they received, which enables
them to be more precise.
2.3.3 CORE CONCEPTS OF DEEP LEARNING
● Logistic Regression
Regression analysis identifies a connection between input variables to
predict outcome variables. It is a simple classification algorithm for learning
to make decisions between different predicted variables.
● Activation Function
Nonlinear activation functions are applied to layers and allow neural
networks to identify complex decision boundaries.
● Artificial Neural Network
Input data is taken in, transformed and applied. Repeating these steps
allows the artificial neural network to learn several layers of non-linear
features, and it ultimately creates a prediction in the final (output) layer. It
learns by generating an error signal that measures the difference between
its predictions and the actual values.
● Layer
Deep learning models are composed of building blocks, of which layers are
the highest-level ones; the layers between the input and output layers are
the hidden layers. A layer transforms the weighted input it receives and
passes the result on as output to the next layer.
● Artificial Neuron or Unit
A unit refers to an activation function together with its numerous incoming
and outgoing connections. More complex units are referred to as long
short-term memory (LSTM) units.
2.3.4 WHY DEEP LEARNING?
Deep learning lies at the forefront of AI, helping shape the tools we use to
achieve tremendous levels of accuracy. Advances in deep learning have pushed
this tool to the point where it outperforms humans in some tasks,
such as classifying objects in images.

Figure 2-4 Why Deep learning?

The most important difference between deep learning and traditional
machine learning is its performance as the scale of data increases. When the data
is small, deep learning algorithms don’t perform that well. This is because deep
learning algorithms need a large amount of data to understand it perfectly. On the
other hand, traditional machine learning algorithms with their handcrafted rules
prevail in this scenario.
In traditional machine learning techniques, most of the applied features
need to be identified by a domain expert in order to reduce the complexity of the
data and make patterns more visible for learning algorithms to work on. The
biggest advantage of deep learning algorithms is that they try to learn high-level
features from data in an incremental manner. This eliminates the need for domain
expertise and hard-core feature extraction.

Figure 2-5 Deep Learning vs Machine Learning

Another major difference between deep learning and machine learning
techniques is the problem-solving approach. Deep learning techniques tend to
solve the problem end to end, whereas machine learning techniques need the
problem statement to be broken down into different parts, solved first, and then
their results combined at the final stage.

2.3.5 CONVOLUTIONAL NEURAL NETWORK


● A Convolutional Neural Network (CNN) is a Deep Learning algorithm
which can take in an input image, assign importance (learnable weights and
biases) to various aspects/objects in the image and be able to differentiate
one from the other.

● The architecture of a CNN is analogous to that of the connectivity pattern
of neurons in the human brain and was inspired by the organization of the
Visual Cortex.
● Individual neurons respond to stimuli only in a restricted region of the visual
field known as the Receptive Field. A collection of such fields overlap to
cover the entire visual area.
● A CNN generally consists of:
1. Input Layer
2. Hidden Layers
3. Output layer

Figure 2-6 Block diagram of CNN


Among neural networks, the convolutional neural network (CNN) is one of the
main categories used for image recognition, image classification, object detection,
face recognition, etc.
There are several differences to note between other methods and CNNs:
1. First, a CNN does not require hand-crafted feature extraction.
2. Second, CNN architectures do not necessarily require segmentation of the
objects of interest by human experts.
3. Third, a CNN is far more data-hungry because of the millions of learnable
parameters it must estimate; it is thus more computationally expensive,
requiring graphical processing units (GPUs) for model training.

A CNN image classifier takes an input image, processes it and classifies it under
certain categories. A computer sees an input image as an array of pixels, whose
size depends on the image resolution: it will see h x w x d
(h = height, w = width, d = dimension/depth).
For example, an image represented as a 6 x 6 x 3 array is an RGB matrix (3 refers
to the RGB channels), and an image represented as a 4 x 4 x 1 array is a grayscale matrix.
Technically, in deep learning CNN models, to train and test, each input image
is passed through a series of convolutional layers with filters (kernels), pooling
layers, fully connected (FC) layers, and a softmax function to classify an object
with probabilistic values between 0 and 1.

CNNs have two components:


1. The Hidden layers/Feature extraction part:
Here the network will perform a series of convolutions and pooling
operations during which the features are detected.
2. The Classification part:
Here, the fully connected layers serve as a classifier on top of the
extracted features: they assign a probability that the object in the image
is what the algorithm predicts it is.
The below figure is the complete flow of CNN to take an input image and classify
the objects based on the values.

Figure 2-7 CNN general architecture with several layers

2.3.6 CNN LAYERS
1. Convolution Layer
The convolutional layer is the core building block of a CNN. The layer’s
parameters consist of a set of learnable filters (or kernels), which have a small
receptive field, but extend through the full depth of the input volume.
Convolution Operation:

Convolution is the mathematical operation central to the efficacy of this
algorithm.
In purely mathematical terms, convolution is a function derived from two given
functions by integration, expressing how the shape of one is modified by the
other. For two functions f and g it is given by the formula:

$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$
Convolution in a CNN is performed on an input image using a filter or kernel. To
understand filtering and convolution, imagine scanning a screen starting from the
top left to the right, moving down a bit after covering the width of the screen, and
repeating the same process until the whole screen has been scanned.
For instance, suppose the input image and the filter look like the following:
For instance, if the input image and the filter look like following:

Figure 2-8 Input and filter for convolution

The filter slides over the input image one pixel at a time starting from the top left.
The filter multiplies its own values with the overlapping values of the image while
sliding over it and adds all of them up to output a single value for each overlap
until the entire image is traversed:

Figure 2-9 Filter slides over input image forms output matrix

In the above figure, the value 4 (top left) in the output matrix corresponds to
the filter overlap on the top left of the image, which is computed as:

(1×1 + 0×1 + 1×1) + (0×0 + 1×1 + 1×0) + (1×0 + 0×0 + 1×1) = 4


Figure 2-10 First step of convolution

Similarly, we compute the other values of the output matrix. Each value in our
output matrix is sensitive to only a particular region in our original image.
Thus, the feature map is derived.
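As a quick check, the sliding computation above can be reproduced with PyTorch's conv2d (which, like the description above, slides the filter without flipping it). The 5x5 image and 3x3 filter values below are taken from the classic worked example this section follows and are assumptions, not necessarily the exact values of Figure 2-8.

import torch
import torch.nn.functional as F

image = torch.tensor([[1., 1., 1., 0., 0.],
                      [0., 1., 1., 1., 0.],
                      [0., 0., 1., 1., 1.],
                      [0., 0., 1., 1., 0.],
                      [0., 1., 1., 0., 0.]])
kernel = torch.tensor([[1., 0., 1.],
                       [0., 1., 0.],
                       [1., 0., 1.]])

# conv2d expects (batch, channels, height, width)
out = F.conv2d(image.view(1, 1, 5, 5), kernel.view(1, 1, 3, 3))
print(out[0, 0])  # 3x3 feature map; the top-left value is 4.0, as computed above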
The size of the feature map (convolved feature) is controlled by three
parameters that we need to decide before the convolution step is performed
(see the sketch after this list):
● Depth: Depth corresponds to the number of filters we use for the
convolution operation. The number of distinct filters used produces the same
number of different feature maps. You can think of these feature maps as
stacked 2D matrices, so the 'depth' of the feature map is the number
of feature maps derived.
● Stride: Stride is the number of pixels by which we slide our filter matrix
over the input matrix. When the stride is 1, we move the filters one pixel
at a time. When the stride is 2, the filters jump 2 pixels at a time as we
slide them around. A larger stride produces smaller feature maps.
● Zero-padding: Sometimes it is convenient to pad the input matrix with
zeros around the border, so that we can apply the filter to the bordering
elements of our input image matrix. A nice feature of zero padding is that
it allows us to control the size of the feature maps. Using zero-padding is
also called wide convolution, and not using zero-padding is a
narrow convolution.
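The sketch below shows how these three parameters map onto PyTorch's Conv2d arguments; the concrete numbers are illustrative, not the project's settings.

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3,   # RGB input
                 out_channels=8,  # depth: 8 filters produce 8 feature maps
                 kernel_size=3,
                 stride=2,        # filter jumps 2 pixels at a time
                 padding=1)       # zero-padding of 1 pixel on each border

x = torch.randn(1, 3, 224, 224)   # (batch, channels, height, width)
print(conv(x).shape)              # torch.Size([1, 8, 112, 112])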

2. ReLU Layer
An additional operation called ReLU is used after every convolution operation.
ReLU is a non-linear operation whose output is given by:

Output = max(0, input)

ReLU is an element-wise operation (applied per pixel) that replaces all negative
pixel values in the feature map with zero. The purpose of ReLU is to introduce
non-linearity into our ConvNet.

Figure 2-11 The ReLU operation


Other non-linear functions such as tanh or sigmoid can also be used instead of
ReLU, but ReLU has been found to perform better in most situations.
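A tiny sketch of the element-wise ReLU operation on an example feature map (the values are made up for illustration):

import torch
import torch.nn as nn

fmap = torch.tensor([[-3.0, 2.0],
                     [0.5, -1.0]])
print(nn.ReLU()(fmap))  # tensor([[0.0000, 2.0000], [0.5000, 0.0000]])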

3. Pooling Layer
In a CNN architecture, the layer after a convolutional layer is usually a pooling
layer. It partitions the input image into a set of non-overlapping rectangles and,
for each such sub-region, outputs a value. The intuition is that the exact location
of a feature is less important than its rough location relative to other features.
There are three types of pooling
● Max Pooling
● Average Pooling
● Global Pooling
1. Max Pooling
Max pooling is a pooling operation that selects the maximum element from
the region of the feature map covered by the filter. Thus, the output after a max-
pooling layer is a feature map containing the most prominent features
of the previous feature map.

Figure 2-12 The Max pooling operation


Max pooling also acts as a noise suppressant: it discards noisy
activations altogether, performing de-noising along with dimensionality
reduction.
2. Average Pooling
Average pooling computes the average of the elements present in the region
of the feature map covered by the filter. Thus, while max pooling gives the most
prominent feature in a particular patch of the feature map, average pooling
gives the average of the features present in that patch.

Figure 2-13 The Average pooling operation


Average Pooling simply performs dimensionality reduction as a noise
suppressing mechanism. Hence, we can say that Max Pooling performs a lot
better than Average Pooling.
3. Global Pooling
Global pooling reduces each channel in the feature map to a single value.
Thus, an (nh x nw x nc) feature map is reduced to (1 x 1 x nc) feature map. This
is equivalent to using a filter of dimensions nh x nw i.e. the dimensions of the
feature map. Further, it can be either global max pooling or global average
pooling.
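
The short sketch below contrasts the three variants on one illustrative 4x4 feature map:

import torch
import torch.nn as nn

fmap = torch.arange(16.).view(1, 1, 4, 4)  # one 4x4 feature map

print(nn.MaxPool2d(2)(fmap))          # 2x2 map of region maxima
print(nn.AvgPool2d(2)(fmap))          # 2x2 map of region averages
print(nn.AdaptiveAvgPool2d(1)(fmap))  # global average pooling -> 1x1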

The convolutional layer and the pooling layer together form the i-th layer
of a convolutional neural network. Depending on the complexity of the
images, the number of such layers may be increased to capture low-level
details even further, but at the cost of more computational power.

4. Flattening Layer
In between the convolutional layer and the fully connected layer, there is a 'Flatten'
layer. Flattening transforms a two-dimensional matrix of features into a vector that
can be fed into a fully connected neural network classifier.

Figure 2-14 The Flattening operation

5. Fully Connected Layer

Fully connected layers in a neural network are those layers where all the inputs
from one layer are connected to every activation unit of the next layer. In most
popular machine learning models, the last few layers are fully connected layers,
which compile the data extracted by previous layers to form the final output. It
is the second most time-consuming layer, after the convolution layer.

Figure 2-15 The Fully Connected Layer operation
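
A minimal sketch of the flattening plus fully connected classification head; the feature-map dimensions are illustrative, not the project's exact sizes:

import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Flatten(),             # (batch, 8, 4, 4) -> (batch, 128)
    nn.Linear(8 * 4 * 4, 4))  # 4 output scores, one per species

fmap = torch.randn(2, 8, 4, 4)  # a batch of 2 feature maps
print(head(fmap).shape)         # torch.Size([2, 4])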

6. SoftMax Layer

The softmax layer is typically the final output layer in a neural network that
performs multi-class classification; its output can be interpreted as probabilistic
values.
The name comes from the softmax function, which takes as input a vector of score
values and squashes them into values in the range between 0 and 1 whose sum is
1. Therefore, they represent a true probability distribution.
The standard (unit) softmax function is defined by the formula:

$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \ldots, K$

Figure 2-16 The SoftMax Layer operation

Prior to applying softmax, some vector components could be negative or greater
than one, and might not sum to 1; but after applying softmax, each component
lies in the interval (0, 1) and the components add up to 1, so that they can be
interpreted as probabilities.
Softmax layers are great at determining multi-class probabilities; however, there
are limits. Softmax can become costly as the number of classes grows. In those
situations, candidate sampling can be an effective workaround.
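A quick numeric sketch of softmax squashing made-up raw scores into probabilities:

import torch

scores = torch.tensor([2.0, 1.0, 0.1, -1.0])  # raw class scores (logits)
probs = torch.softmax(scores, dim=0)
print(probs)        # approximately tensor([0.638, 0.235, 0.095, 0.032])
print(probs.sum())  # tensor(1.)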
2.3.7 ResNet

The approach used for animal classification is similar to that applied to general
image classification, which seeks deep-seated features by increasing the number
of DL layers. Therefore, our research focused on finding a DL model suitable for
animal multi-class classification by changing the ResNet hierarchy. This chapter
presents the achieved result: an improved version of the ResNet-18 baseline
CNN model. Due to the ResNet-18 characteristics, the CNN can extract more
features by increasing the number of convolutional layers while achieving
improved accuracy.

CNN AND RESNETS:
Compared with traditional neural networks, CNNs have two characteristics,
weight sharing and local connection, which greatly improve their ability to extract
features and lead to improved efficiency and a reduced number of training
parameters. The main structure of a traditional CNN includes an input layer, a
convolutional layer, a pooling layer, a fully connected layer, and an output layer,
whereby the output of one layer serves as the input of the subsequent layer in the
structure. Usually, the convolutional and pooling layers alternate in the
structure.
The convolutional layer, the core of a CNN, contains multiple feature maps,
whereby each feature map contains multiple neurons. When a CNN is used for
image classification, for example, this layer scans the image through the
convolution kernel and makes full use of the information of the adjacent areas in
the image to extract image features. After applying the activation function, the
feature map of the image is obtained as follows:

$X_j^{l+1} = f\left(\sum_{i \in M_j} X_i^l * k_{ij}^{l+1} + b_j^{l+1}\right)$

where $X_j^{l+1}$ represents the j-th feature map of the (l+1)-th convolutional layer,
$X_i^l$ represents the input feature, f represents the activation function (typically a
rectified linear unit, ReLU), $M_j$ represents the set of input feature maps, $*$
represents the convolution operation, k represents a convolution kernel, and b
represents the offset term. The role of the pooling layer, when used for image
classification, is to imitate the human visual system by reducing the dimensionality
of the data and representing the image with higher-level features through a pooling
operation, denoted ⊗. The main pooling methods include maximum pooling,
average pooling, and median pooling.
layer, the maximum likelihood function is used to calculate the probability of each
sample, and the learned features are mapped to the target label. The label with the
highest probability is used as the classification result to realize the CNN-based
classification. The deeper the CNN, the better its performance. However, with
deepening of the network, two major problems arise: (1) the gradient dissipates,
which affects the network convergence, and (2) the accuracy tends to saturate. In
order to solve the problems of gradient vanishing/explosion and the performance
degradation caused by the depth increase, residual networks (ResNets) were
proposed in [9]; these are easier to optimize and can gain accuracy from
considerably increased depth. The ResNet approach won first place in the
ILSVRC 2015 classification task.

Figure 2-17 The ResNet building block


The ResNet building block has input x and target output H(x). The block
employs a shortcut connection allowing it to directly learn the residual F(x) =
H(x) − x so as to make the target output F(x) + x, thus avoiding the problem of
performance degradation and accuracy reduction due to having too many
convolutional layers. Such shortcut connections can skip two or more layers and
directly perform identity mapping. The block uses the input x of each layer as a
reference, learning to form a residual function, instead of learning an unreferenced
function. This residual function is easier to optimize and allows greatly deepening
the number of network layers. The ResNet building block in Figure 2-17 has
two layers and uses the following residual mapping function:

$F = W_2\,\sigma(W_1 x)$

where σ represents the activation function ReLU. Then, through a shortcut
connection and a second ReLU, one gets the output y:

$y = F(x, \{W_i\}) + x$

When a change of the input and output dimensions is needed (e.g., changing the
number of channels), one can apply a linear transformation $W_s$ to x in the
shortcut, as follows:

$y = F(x, \{W_i\}) + W_s x$

Using the ResNet building block, residual networks of 18 and 34 layers (called
ResNet-18 and ResNet-34, respectively) were proposed and evaluated, where
it was noted that ResNet-18 is comparably accurate to ResNet-34 but converges
faster.
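The sketch below implements the two-layer building block described above, in the style of torchvision's BasicBlock; the 3x3 kernel size and the 1x1 projection used for $W_s$ follow the standard ResNet design and are assumptions about the figure.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Ws: a 1x1 convolution, used only when the dimensions change
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # first ReLU, between the two convolutions
        out = self.bn2(self.conv2(out))        # F(x)
        return F.relu(out + self.shortcut(x))  # y = F(x) + x, then the second ReLU

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64, 128, stride=2)(x).shape)  # torch.Size([1, 128, 28, 28])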
The training of a neural network model requires large datasets. When
the amount of input data increases, the number of neurons in the model needs to
be increased as well to improve the classification accuracy. A fully connected
neural network grows in size with the input data dimension and the number of
hidden-layer neurons, which increases the number of network parameters and, as
a result, affects the training speed of the model. As a solution to this problem, we
use a CNN with the characteristics of local connection and parameter sharing to
reduce the number of model parameters and accelerate the training of the model.
Therefore, the research method presented here enhances the CNN design with an
improved ResNet-18 model for the classification of animals. The proposed model
can extract multiple features of the animal data from the same input, which results
in an efficient representation, thus improving the classification accuracy.
The elaborated improved ResNet-18 model was used to realize high-precision
identification and classification of the four animal classes. Before preprocessing
and training the CNN, the model must be compiled. Parameters used during
training are declared, such as the optimizer, the loss function, and the learning
rate. The optimizer and the loss function are the key elements that enable the
CNN to process data properly; the setting of the optimizer determines the
learning rate of the neural network. The optimizer used in the model presented
here is stochastic gradient descent (SGD), proven to perform better than many
other optimizers. The loss function is an important criterion for measuring the
classification quality of the model; the proposed model uses the cross-entropy
loss function. The initial value of the learning rate is set to 0.1, and a step change
is adopted in the follow-up, providing a convenient way for the objective function
to converge better.
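
A minimal sketch of this training setup: SGD, cross-entropy loss, and an initial learning rate of 0.1 with a step change; the step size and decay factor are illustrative assumptions.

import torch
import torchvision.models as models

model = models.resnet18(num_classes=4)  # four animal classes
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # step change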

Figure 2-18 The structure of the improved ResNet-18 model
The structure of the elaborated improved ResNet-18 model consists of four
parts: a convolutional layer, a classic ResNet-18 layer, an improved ResNet-18
layer, and a fully connected layer. The first part, the convolutional layer, is used
mainly to perform basic feature extraction on the input data in order to prepare it
for the next, deeper level. The second part uses the classic ResNet-18, which is
known as one of the best models for animal multi-class classification. In this part,
the input data are convoluted twice, and the modified linear unit, ReLU, is applied
between the two convolutions. ReLU zeros the output of some neurons, which
makes the network sparse and reduces the interdependence of parameters; it also
alleviates overfitting problems. In parallel, the data before convolution are fed
into a maximum pooling layer, which divides the sample into feature regions and
uses the maximum value in a region as the region representative to reduce the
amount of calculation and the number of parameters. Finally, the two kinds of
data with the same dimension, after their different processing, are added together
to complete the block module. The purpose of this step is to inherit the
optimization effect of the previous step and let the model continue to converge.
In order to achieve better performance, we use an improved ResNet-18 in the third
part. A batch norm is added before the classical ResNet-18 structure to accelerate
the training of the neural network, increase the convergence speed, and maintain
the stability of the algorithm. The elaborated model goes through this structure
seven times, and then the data are sent to the fourth part, a fully connected layer.
Finally, the output data features are mapped from the fully connected layer to a
one-dimensional vector, and the vector is regressed by a softmax function [38]
(also called a normalized exponential function), which is suitable for multi-
objective classification. The goal is to transform the output feature vector of the
fully connected layer through an exponential function, mapping an n-dimensional
real vector into another n-dimensional vector. Finally, all the results are added
and normalized to present the multi-class classification results in the form of
probabilities. The softmax function used is defined as:

$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \quad i = 1, \ldots, n$
Importantly, $l_2$ regularization is added to all convolutional layers and to the
fully connected layer in the proposed model, in order to speed up the convergence
of the network and limit overfitting. The loss function with weight regularization
is defined as:

$\text{Loss} = (\theta - W^T x)^2 + \alpha \lVert W \rVert^2$

where α is the coefficient of the regularization term, W is the network weight, θ is
the predicted value of the animal category, x is the feature vector of the input
sample, and $W^T$ denotes the transpose of W.
Dropout (set to 0.5) was added to the convolutional layer to reduce the number
of parameters and training time.
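In PyTorch, $l_2$ weight regularization is commonly applied through the optimizer's weight_decay argument (playing the role of α above); the sketch below also shows one common way to add dropout with p = 0.5, though the exact placement in the improved model may differ.

import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(num_classes=4)
model.fc = nn.Sequential(nn.Dropout(p=0.5), model.fc)  # dropout before the classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)  # alpha (illustrative)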
In the conducted experiments, the improved ResNet-18 model was used to classify
the animals available in the dataset. Compared with other classification models,
the proposed model can classify complex mixed images in a shorter computational
time. Overall, for complex images, both the model training and testing times are
reduced.

CHAPTER 3: SYSTEM DESIGN
3.1 BLOCK DIAGRAM

Figure 3-1 is the block diagram for our project. First, we take the dataset
from Kaggle and split it into a train set and a test set: 80% of the dataset
goes to the train set and 20% to the test set. After splitting the dataset, we
train our model with ResNet-18 and then save that model for prediction.
For prediction, the user gives an input image to the saved model, which
then outputs the animal name.

Figure 3-1 Block diagram for proposed model
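A minimal sketch of the split step, using the same PyTorch utilities as the training code in Section 4.3 (the 80/20 proportion comes from the description above, and the folder path is the one used in our notebook):

from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
from torch.utils.data import random_split

dataset = ImageFolder('/content/ODS/training', transform=ToTensor())
train_size = int(0.8 * len(dataset))      # 80% of the images for training
test_size = len(dataset) - train_size     # the remaining 20% for testing
train_ds, test_ds = random_split(dataset, [train_size, test_size])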

CHAPTER 4: IMPLEMENTATION
4.1 DATASET
The dataset was collected from the Kaggle website.
It consists of images organized into two folders, Training and Validation, each of which in turn contains four subfolders representing the four species of animals: tiger, cheetah, hyena and jaguar. Each subfolder contains 900 images.
The model is trained using the images from each class in the Training folder, and the accuracy of the model is determined by its predictions on the images in the Validation folder. The expected folder layout is sketched below.
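For reference, the layout assumed by the training code in Section 4.3 looks as follows (the exact subfolder names are an assumption; ImageFolder assigns class indices alphabetically, which matches the classes dictionary used in the testing code):

ODS
├── training
│   ├── cheetah    (900 images)
│   ├── hyena      (900 images)
│   ├── jaguar     (900 images)
│   └── tiger      (900 images)
└── validation
    ├── cheetah
    ├── hyena
    ├── jaguar
    └── tiger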

4.2 SAMPLE IMAGES

4.2.1 CHEETAH

Figure 4-1 Cheetah

4.2.2 JAGUAR

Figure 4-2 Jaguar


4.2.3 HYENA

Figure 4-3 Hyena


4.2.4 TIGER

Figure 4-4 Tiger

4.3 CODE
TRAINING CODE:
from google.colab import drive
drive.mount('/content/drive')

from zipfile import ZipFile
import numpy as np

file_name = "/content/drive/MyDrive/ODS1.zip"
with ZipFile(file_name, 'r') as zip:
    zip.extractall()
print('Done')

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid
from torch.utils.data import random_split
from torch.utils.data.dataloader import DataLoader
import matplotlib.pyplot as plt

data_dir = '/content/ODS'
print('Folders :', os.listdir(data_dir))
classes = os.listdir(data_dir + "/training")
print(len(classes),'classes :', classes)

dataset = ImageFolder(data_dir + '/training', transform=ToTensor())


print('Size of training dataset :', len(dataset))

test = ImageFolder(data_dir + '/validation', transform=ToTensor())


print('Size of test dataset :', len(test))

img, label = dataset[0]


print(img.shape)
print(label)

def show_example(img, label):
    print('Label: ', dataset.classes[label], "(" + str(label) + ")")
    plt.imshow(img.permute(1, 2, 0))

show_example(*dataset[0])

torch.manual_seed(43)
val_size = 400
train_size = len(dataset) - val_size

train_ds, val_ds = random_split(dataset, [train_size, val_size])


len(train_ds), len(val_ds)

batch_size = 32
train_loader = DataLoader(train_ds, batch_size, shuffle=True,
                          num_workers=4, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size * 2,
                        num_workers=4, pin_memory=True)
test_loader = DataLoader(test, batch_size * 2,
                         num_workers=4, pin_memory=True)

for images, labels in train_loader:
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(make_grid(images, nrow=16).permute(1, 2, 0))
    break

def apply_kernel(image, kernel):
    ri, ci = image.shape              # image dimensions
    rk, ck = kernel.shape             # kernel dimensions
    ro, co = ri - rk + 1, ci - ck + 1 # output dimensions
    output = torch.zeros([ro, co])
    for i in range(ro):
        for j in range(co):
            output[i, j] = torch.sum(image[i:i+rk, j:j+ck] * kernel)
    return output

def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

device = get_default_device()
device

def to_device(data, device):
    """Move tensor(s) to the chosen device"""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a data loader to move each batch to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to the device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

train_loader = DeviceDataLoader(train_loader, device)


val_loader = DeviceDataLoader(val_loader, device)
test_loader = DeviceDataLoader(test_loader, device)

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['train_loss'], result['val_loss'], result['val_acc']))

class CnnModel(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 100, kernel_size=3, padding=1),   # 3 input channels (RGB)
            nn.ReLU(),
            nn.Conv2d(100, 150, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),   # output: 150 x 16 x 16 (assuming 32 x 32 inputs)

            nn.Conv2d(150, 200, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(200, 200, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),   # output: 200 x 8 x 8

            nn.Conv2d(200, 250, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(250, 250, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),   # output: 250 x 4 x 4

            nn.Flatten(),
            nn.Linear(250 * 4 * 4, 64),   # must match the flattened size above
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(32, 4))             # four output classes

    def forward(self, xb):
        return self.network(xb)
#

class CnnModel2(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        # Use a pretrained ResNet-18 backbone
        self.network = models.resnet18(pretrained=True)
        # Replace the last layer with a four-class head
        num_ftrs = self.network.fc.in_features
        self.network.fc = nn.Linear(num_ftrs, 4)

    def forward(self, xb):
        return torch.sigmoid(self.network(xb))
#
model = CnnModel2()
model.cuda()
#
for images, labels in train_loader:
    print('images.shape:', images.shape)
    out = model(images)
    print('out.shape:', out.shape)
    print('out[1]:', out[1])
    break
#
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training phase
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history
#
# train_loader and val_loader were already wrapped in DeviceDataLoader above,
# so they are simply reused here
train_dl = train_loader
val_dl = val_loader
to_device(model, device)
#
model = to_device(CnnModel2(), device)
#
evaluate(model, val_loader)
#
num_epochs = 10
opt_func = torch.optim.Adam
lr = 0.001
#
history = fit(num_epochs, lr, model, train_dl, val_dl, opt_func)
#
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend(['Validation'])
    plt.title('Accuracy vs. No. of epochs');
#
plot_accuracies(history)
#
def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs');
#
plot_losses(history)
#
evaluate(model, test_loader)
#
torch.save(model, '/content/drive/MyDrive/model_weights1.pt')
TESTING CODE
import warnings
warnings.filterwarnings('ignore')
#
from google.colab import drive
drive.mount('/content/drive')
#
from google.colab import files
uploaded = files.upload()
#
import os
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F        # needed by the loss calculations below
import torchvision
import torchvision.models as models    # needed to rebuild the ResNet-18 backbone
from google.colab.patches import cv2_imshow
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder
import matplotlib.pyplot as plt
from PIL import Image
import cv2
from skimage import io
import numpy as np
#
def accuracy(outputs, labels):
    # Helper from the training script; needed again here by validation_step
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['train_loss'], result['val_loss'], result['val_acc']))
class CnnModel(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 100, kernel_size=3, padding=1),   # 3 input channels (RGB)
            nn.ReLU(),
            nn.Conv2d(100, 150, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),   # output: 150 x 16 x 16 (assuming 32 x 32 inputs)

            nn.Conv2d(150, 200, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(200, 200, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),   # output: 200 x 8 x 8

            nn.Conv2d(200, 250, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(250, 250, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),   # output: 250 x 4 x 4

            nn.Flatten(),
            nn.Linear(250 * 4 * 4, 64),   # must match the flattened size above
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(32, 4))             # four output classes

    def forward(self, xb):
        return self.network(xb)
class CnnModel2(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        # Use a pretrained ResNet-18 backbone
        self.network = models.resnet18(pretrained=True)
        # Replace the last layer with a four-class head
        num_ftrs = self.network.fc.in_features
        self.network.fc = nn.Linear(num_ftrs, 4)

    def forward(self, xb):
        return torch.sigmoid(self.network(xb))
model = torch.load("/content/drive/MyDrive/model_weights1.pt")

train_transform = transforms.Compose([
    transforms.Resize([512, 512]),   # transforms.Scale is deprecated in torchvision
    transforms.ToTensor()])

url = input('Enter URL of Image')
img = io.imread(url)                           # skimage reads the image as RGB
disp = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)    # cv2_imshow expects BGR
cv2_imshow(disp)

img2 = Image.fromarray(img)    # reuse the array already read from the URL
img2 = train_transform(img2)
img2 = img2.unsqueeze(0)       # add a batch dimension
img2 = Variable(img2)
if torch.cuda.is_available():
    img2 = img2.cuda()

model.eval()
target = model(img2)
_, pred = torch.max(target.data, 1)
classes = {0: 'Cheetah', 1: 'Hyena', 2: 'Jaguar', 3: 'Tiger'}
a = int(pred[0].item())        # index of the predicted class
print('Prediction: ', classes[a])

CHAPTER 5: EXPERIMENTAL RESULTS
5.1 CLASSIFICATION RESULTS
The classification is applied to find out the species to which the animal in the given image belongs. The classification result of our model is measured by the metric of accuracy.
Accuracy: the higher the value, the more reliable our system is and the better trained our model is.
The accuracy of a model is obtained by the formula:
Accuracy = No. of Correct Predictions / Total No. of Predictions
The accuracy of our model was found to be 96.88%.

5.2 GRAPH
Two plots can be created from the trained animal-classification history:
1. A plot of loss on the training and validation datasets over the training epochs.
2. A plot of accuracy on the training and validation datasets over the training epochs.
The loss is a number indicating how bad the model's prediction was on a single example. If the model's prediction is perfect, the loss is zero; otherwise, the loss is greater. The loss is calculated on both the training and the validation set, and its interpretation is based on how well the model is doing on these two sets: it is the sum of the errors made for each example in the training or validation set.
An accuracy metric is used to measure the algorithm's performance in an interpretable way. The accuracy of a model is usually determined after the model parameters have been learned, is expressed as a percentage, and measures how well the model's predictions match the true labels.

5.3 TEST CASE SCENARIOS

The prediction of our model depends on the visibility of the body markings of the animal in the input image. Therefore, the prediction varies based on a few criteria, such as the brightness of the image, the posture of the animal, and the similarity of the animal in the image to our four given species.

5.4 COMPARISON WITH OTHER MODELS


We also tried to implement our project using a support vector machine (SVM) model, but we did not get the expected accuracy: the SVM reached only 65% after training, which is not suitable for consideration, so we looked into other models to acquire better accuracy.
After implementing both the SVM and ResNet algorithms, we chose ResNet because it gives far more accurate results (96%) compared with the SVM (65%).

Our model utilizes the ResNet-18 architecture; compared with the SVM, it is trained in less time and also makes predictions faster. A minimal sketch of the SVM baseline we compared against is given below.
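The SVM code itself is not reproduced in this report; the following is a minimal sketch of such a baseline using scikit-learn, assuming the images are resized to 64 x 64 and flattened into pixel vectors (the size, the RBF kernel choice, and the paths are illustrative assumptions, not our exact configuration):

import numpy as np
from PIL import Image
from pathlib import Path
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def load_split(root):
    # Load every image under root/<class>/ as a flattened 64x64 RGB vector
    X, y = [], []
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    for label, cls in enumerate(classes):
        for img_path in (Path(root) / cls).glob('*'):
            img = Image.open(img_path).convert('RGB').resize((64, 64))
            X.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
            y.append(label)
    return np.stack(X), np.array(y)

X_train, y_train = load_split('/content/ODS/training')
X_test, y_test = load_split('/content/ODS/validation')

clf = SVC(kernel='rbf')    # RBF-kernel support vector classifier
clf.fit(X_train, y_train)
print('SVM accuracy:', accuracy_score(y_test, clf.predict(X_test)))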

CHAPTER 6: CONCLUSION

In this project, an improved ResNet-18 model has been proposed for animal classification. It classifies the animals present in images into four different species, namely tiger, cheetah, jaguar and hyena, and shows that an improved ResNet-18 model can effectively be used to identify animal classes. Therefore, the model has great application prospects and is worthy of further study and elaboration. In future work, the model could be extended to identify the action being performed by the animals (eating, standing, walking, etc.) and to count the number of animals in an image.

CHAPTER 7: REFERENCES
[1] Gooliaff, T.J., Hodges, K.E., 2018. Measuring agreement among experts in classifying camera images of similar species. Ecology and Evolution 8, 11009–11021. https://doi.org/10.1002/ece3.4567

[2] Güthlin, D., Storch, I., Küchenhoff, H., 2014. Is it possible to individually identify red foxes from photographs? Wildl. Soc. Bull. 38, 205–210. https://doi.org/10.1002/wsb.377

[3] Hallgren, W., Santana, F., Low-Choy, S., Zhao, Y., Mackey, B., 2019. Species distribution models can be highly sensitive to algorithm configuration. Ecol. Model. 408, 108719. https://doi.org/10.1016/j.ecolmodel.2019.108719

[4] Mendoza, E., Martineau, P., Brenner, E., Dirzo, R., 2011. A novel method to improve individual animal identification based on camera-trapping data. J. Wildl. Manag. 75, 973–979. https://doi.org/10.1002/jwmg.120

[5] Nguyen, H., Maclagan, S.J., Nguyen, T.D., Nguyen, T., Flemons, P., Andrews, K., Ritchie, E.G., Phung, D., 2017. Animal recognition and identification with deep convolutional neural networks for automated wildlife monitoring. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA). https://doi.org/10.1109/DSAA.2017.31

[6] Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C., Clune, J., 2018. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proc. Natl. Acad. Sci. Unit. States Am. 115, E5716–E5725. https://doi.org/10.1073/pnas.1719367115

[7] Tabak, M.A., Norouzzadeh, M.S., Wolfson, D.W., Sweeney, S.J., Vercauteren, K.C., Snow, N.P., Halseth, J.M., Salvo, P.A.D., Lewis, J.S., White, M.D., Teton, B., Beasley, J.C., Schlichting, P.E., Boughton, R.K., Wight, B., Newkirk, E.S., Ivan, J.S., Odell, E.A., Brook, R.K., Lukacs, P.M., Moeller, A.K., Mandeville, E.G., Clune, J., Miller, R.S., 2019. Machine learning to classify animal species in camera trap images: applications in ecology. Methods in Ecology and Evolution 10, 585–590. https://doi.org/10.1111/2041-210X.13120

[8] Dataset: https://www.kaggle.com/kerneler/starter-cheetah-jaguar-leopard-and-79e225e8-8/data

