Dog Breed Identification Using Deep Learning
Dog Breed Identification Using Deep Learning
net/publication/377392464
CITATION READS
1 369
6 authors, including:
All content following this page was uploaded by Ajay Dureja on 03 July 2024.
Anurag Tuteja, Sumit Bathla, Pallav Jain, Utkarsh Garg, Aman Dureja,
and Ajay Dureja
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 515
A. Swaroop et al. (eds.), Proceedings of Data Analytics and Management,
Lecture Notes in Networks and Systems 785,
https://doi.org/10.1007/978-981-99-6544-1_39
516 A. Tuteja et al.
1 Introduction
With the aid of photographs, this study identifies dog breeds. This is a challenging
topic in fine-grained classification because all breeds of Canis lupus familiaris will
have identical body characteristics and general structure.
In addition to being difficult, this problem’s answer is also applicable to other
fine-grained categorization issues. The techniques employed to solve this issue, for
instance, might also be used to identify different cat and horse breeds, species of
animals and plants, or even different car types. A fine-grained classification issue
can be addressed for any set of classes with minimal variation within them.
Our primary goal in this paper is to use TensorFlow to develop an image clas-
sification system that uses deep learning and convolutional neural networks. These
days, computers can form temporary phrases and describe the many components of
pictures in addition to recognizing images. Convolutional neural networks (CNNs)
[1] carry to do this by identifying patterns in images. Using one of the biggest
databases of tagged photographs to train CNN. Frameworks for deep learning, such
as TensorFlow.
The main goal is to predict the breed of various dogs using deep learning tech-
niques. We will also take a peek at a trained model that was applied to over 40
thousand photographs of 120 different dog breeds. To modify the model and identify
trends in the training data, CNN will be crucial. Even if the dog is a puppy, this model
will be trained to identify the breed.
2 Related Work
Recent years have seen several studies on the use of deep learning to identify dog
breeds. Convolutional neural networks (CNNs) have been the basis for many of these
investigations, and transfer learning has been utilized to hone pre-trained models on
a dataset of dog image data.
Using a tweaked VGG-19 network, a study by Park et al. (2019) developed a dog
breed identification model and achieved an accuracy of 96.2% [2] on a dataset of
120 dog breeds. On the same dataset, a different study (Wang et al. 2018) that used
an Inception v3 model was 96.5% accurate [3].
On a dataset of 120 dog breeds, a study by Aziz pour et al. (2016) developed a
dog breed identification system combining CNNs and local binary patterns (LBP)
features, and it achieved an accuracy of 91.5% [4].
Using an improved Inception v3 model, Chen et al. (2018) achieved an accuracy
of 94.3% on a dataset of 120 dog breeds [5].
This research shows the value of employing pre-trained models and fine-tuning
them on a particular task while demonstrating the efficacy of CNNs with transfer
learning for dog breed identification.
Dog Breed Identification Using Deep Learning 517
Deep learning, a type of machine learning that involves training models with
numerous layers of artificial neural networks, has been used in several recent research
on dog breed identification.
These studies indicate the value of utilizing deep learning for identifying dog
breeds and that this approach can identify dog breeds with high levels of accuracy.
3 Problem Statement
What breed is my dog? This is a question you have probably asked yourself if you
got a mixed-breed dog from a rescue group. Maybe well-meaning family and friends
have asked the question. Based on your dog’s basic appearance, you might even
have some of your own beliefs and educated assumptions. Everyone loves a good
mystery, but occasionally it would be wonderful to have a more certain resolution!
Fortunately, you have a variety of tools at your disposal to aid in your search. Create
a fine-grained dog breed categorization model using images. Focus on achieving a
high level of accuracy with a sizable variance inside a single subcategory and a small
variance across subcategories.
Objective
Our primary goal in this paper is to use TensorFlow to develop an image classi-
fication system that uses deep learning and convolutional neural networks. These
days, computers can form temporary phrases and describe the many components of
pictures in addition to recognizing images. CNN carries this by identifying patterns
in images. Using one of the biggest databases of tagged photographs to train CNN.
Frameworks for deep learning, such as TensorFlow.
The main goal is to predict the breed of various dogs using deep learning tech-
niques. We will also take a peek at a trained model that was applied to over 40
thousand photographs of 120 different dog breeds. To modify the model and identify
trends in the training data, CNN will be crucial. Even if the dog is a puppy, this model
will be trained to identify the breed.
Motivation
1. The goal is to build a model that can classify a dog’s breed simply by “look-
ing” at its image. We began considering several methods to develop a model
for accomplishing this and the level of accuracy it would be able to reach. It
appears that the problem might be handled with a respectable degree of accu-
racy without expending excessive amounts of effort, time, or resources with the
help of contemporary machine learning frameworks like TensorFlow, publically
available datasets, and pre-trained models for picture recognition.
2. Create a fine-grained dog breed categorization model using images. What breed
is my dog? This is a question you have probably asked yourself if you got a
mixed-breed dog from a rescue group. Maybe well-meaning family and friends
have asked the question. Based on your dog’s basic appearance, you might even
518 A. Tuteja et al.
have some of your own beliefs and educated assumptions. Everyone loves a
good mystery, but occasionally it would be wonderful to have a more certain
resolution! Fortunately, you have a variety of tools at your disposal to aid in your
search. Focus on achieving a high level of accuracy with a sizable variance inside
a single subcategory and a small variance across subcategories.
4 Proposed Work
This paper’s implementation consists of three main phases, divided mainly into data
preparation, model training, and testing (Fig. 1).
The data preparation phase is necessary because the paper’s primary focus is
dog facial photographs. Then, the testing procedure and the training process are
separated. An estimator for dog breed models is the result of the training model.
Breed categorization and model evaluation use the model. The basic procedure for
the implementation of the paper is:
1. Understanding the problem: Getting the objectives of the paper and understanding
its implementation.
2. Data collection: Collect the data used to train the model. Data is collected from
[6]
3. Data preparation: Importing the data to the paper environment and making it
suitable for further analysis.
4. Exploratory data analysis: Learning more about the data along with handling the
errors in the data like missing values, null data, etc.
5. Modeling: Build a model for breed identification and make another model for
age estimation.
6. Model evaluation: Evaluate the performance of the model using the validation
dataset and make predictions based on the accuracy of the model.
5 Methodology
See Fig. 2.
A neural network is a type of machine learning technique that is based on how the
human brain functions. It is composed of interconnected “neurons” that communicate
with one another. Between the input and output layers, these neurons are stacked in
layers, with one or more hidden layers.
The essential unit of a neural network is the artificial neuron. It takes in inputs,
processes those inputs, and then generates outputs. Since the artificial neurons are
interconnected, the network allows for unfettered data flow.
A training dataset is a collection of input–output pairs used to calibrate the weights
of the connections between neurons in a neural network. For the neural network to
accurately anticipate the output given an input, the weights are modified during the
training phase (Fig. 3).
Numerous applications, such as speech recognition, natural language processing,
picture recognition, and many others, use neural networks. They have been employed
to achieve cutting-edge performance in a variety of industries, and they are
particularly well-suited for applications involving complex, high-dimensional data.
A “neural network” is a type of machine learning algorithm that is based on
the organization and functioning of the human brain, which is composed of inter-
connected synthetic neurons that process and transfer information. They are trained
using a dataset and utilized for a number of tasks, such as speech recognition, image
recognition, and natural language processing.
Convolutional neural networks (CNNs) are a special class of neural networks that
excel at processing and image recognition. It adapts a typical neural network with
several layers, including fully connected, pooling, and convolutional layers.
The foundational part of a CNN is the convolutional layer. The input image is
subjected to a series of filters, commonly referred to as kernels, in this layer. These
filters serve as feature detectors, spotting patterns like edges, textures, and forms in
the image.
Dog Breed Identification Using Deep Learning 521
The pooling layer is used to minimize the number of parameters in the model
and the spatial dimensions of the image. It operates by taking the output of the
convolutional layer and applying a function like a max or average pooling.
The image is categorized into one of several predetermined categories using
the fully connected layer. A probability distribution across the set of predefined
categories is CNN’s output.
Convolutional, pooling, and fully connected layer parameters are changed during
a CNN training process to reduce the difference between the predicted and actual
results (Fig. 4).
A class of neural networks known as convolutional neural networks (CNNs) excels
at processing and identifying pictures. It has a number of layers, including fully
connected, pooling, and convolutional layers. The pooling layer is used to reduce
the spatial dimensions of the image and the amount of parameters in the model,
while the fully connected layer is used to categorize the image into one of several
predetermined categories. The key element of the CNN is the convolutional layer.
The Google Brain Team created the open-source machine learning software package
known as TensorFlow. It is used for a variety of tasks including developing and
executing neural networks, training and deploying machine learning models, and
carrying out intricate mathematical operations on multi-dimensional data arrays or
tensors.
522 A. Tuteja et al.
TensorFlow is known for its flexibility, which enables programmers to build and
train models on a single machine or a cluster of machines. It also supports a large
number of programming languages, including Python, C++, and Java.
Along with a library of prebuilt models, visualization tools for analyzing model
performance, and support for distributed training, TensorFlow also offers a complete
set of tools for creating, training, and deploying machine learning models.
Additionally, TensorFlow has a sizable and vibrant community that offers help,
guides, and pre-trained models that can be used for a variety of applications, including
speech recognition, picture classification, and natural language processing.
In conclusion, TensorFlow is an open-source machine learning software library
created by the Google Brain Team. It is used for a variety of tasks including building
and running neural networks, training and deploying machine learning models, and
performing intricate mathematical operations on multi-dimensional data arrays. It is
adaptable, gives programmers the option to build and train models on a single machine
or a cluster of machines, and supports a variety of languages. Along with a library of
prebuilt models, visualization tools for analyzing model performance, and support
for distributed training, it also comes with a complete set of tools for developing,
training, and deploying machine learning models. Additionally, it features a sizable
and vibrant community that offers help, guides, and trained models.
A machine learning technique called transfer learning enables a model that has been
trained on one task to be modified and used for another, related task. This can be
accomplished by fine-tuning the pre-trained model on a fresh dataset after it has
already learned features from the original dataset (Fig. 5).
Fig. 7 Model
524 A. Tuteja et al.
the original dataset. Reusing a pre-trained model can save time and resources while
also enhancing the new model’s performance by utilizing the information gained
from the initial work.
The ResNet family of models includes the convolutional neural network (CNN)
architecture ResNet-50. It was created by Microsoft Research Asia and is frequently
used for computer vision applications including object recognition and image
categorization.
ResNet-50 is a deep CNN with 50 layers, comprising fully connected, convolu-
tional, and pooling layers. It is renowned for its capacity to train very deep networks
successfully and circumvent the issue of vanishing gradients, a typical difficulty in
deep neural networks.
The inclusion of residual connections, a crucial component of ResNet-50, enables
the network to efficiently learn an identity function that links inputs to outputs. This
makes the network more effective and precise than conventional CNNs by enabling
it to learn features at different levels of abstraction.
On the ImageNet dataset, which comprises over 14 million photographs and 1000
object categories, ResNet-50 has been pre-trained. For a variety of computer vision
tasks, this pre-trained model can serve as a jumping-off point for transfer learning
(Fig. 8).
In conclusion, ResNet-50 is a convolutional neural network (CNN) architecture
created by Microsoft Research Asia and is a member of the ResNet family of models.
With 50 layers, including convolutional, pooling, and fully linked layers, it is a
deep CNN. It is renowned for its capacity to train very deep networks successfully
and circumvent the issue of vanishing gradients, a typical difficulty in deep neural
networks. The network effectively learns an identity function that maps inputs to
outputs thanks to the utilization of residual connections, making it more effective
and accurate than conventional CNNs. It is suitable for transfer learning in a variety
of computer vision tasks because it has already been trained on the ImageNet dataset.
Using the model for breed identification, few predictions were produced using the
testing data after the model had been trained.
Output diagrams (Figs. 9, 10, and 11):
Fig. 9 Result 1
Fig. 10 Result 2
526 A. Tuteja et al.
Fig. 11 Result 3
Dataset
We used the Stanford Dogs dataset, which includes pictures of 120 different dog
breeds from throughout the world, some of which are shown in Fig. 1, for our
experiment. For the purpose of fine-grained picture categorization, this dataset was
created utilizing images and annotation from ImageNet.
Contents of dataset:
1. Number of classes: 120
2. Number of images: 10,222
3. Annotations: Class labels, bounding boxes.
In this paper, we trained our model with the following configuration (Table 1):
Performance Measurement
In this paper, we get an accuracy score of 87.53%, a precision score of 87.42%, and
the other scores that are as follows (Figs. 12 and 13):
Comparison
See Table 2.
Table 2 Performance
Method Accuracy (%)
comparison
Chen et al. [11] 52
Simon et al. [12] 68.61
Angelova [13] 73.45
Krause et al. [14] 82.6
Ours. ResNet-50 87.53
528 A. Tuteja et al.
7 Limitations
Variability within a breed: While each breed may have some distinguishing physical
characteristics, there can still be a lot of variation within a breed. This can make
it challenging for a breed identifier system to accurately identify the breed of a
particular dog.
Mixed-breed dogs: Many dogs are mixed breeds, which means they have genetic
traits from more than one breed. Identifying the breed of a mixed-breed dog can be
particularly challenging, as it may exhibit the physical characteristics of multiple
breeds.
Limited training data: A breed identifier system is only as accurate as the data it
has been trained on. If the system has been trained on a limited dataset or if the dataset
is biased toward certain breeds, the system may not perform well when presented
with dogs from other breeds.
Environmental factors: Environmental factors such as lighting, camera angle, and
background can all impact the accuracy of a breed identifier system. If the dog is not
in a well-lit area or if the camera angle is not ideal, it can be more difficult for the
system to accurately identify the breed.
Human error: Finally, it is important to note that even the best breed identifier
systems are not infallible. Human error can still impact the accuracy of the system,
particularly if the person inputting the data makes a mistake or misidentifies the
breed.
The main goal of this model is to learn how to categorize photographs, specifically
images of dog breeds, using a machine learning classification tool. The application
has been thoroughly shown with numerous dog photographs, and it consistently
produces accurate results. For each dog breed, this program now provides some
basic scraped data. Convolutional neural networks are a learning technique for data
analysis and forecasting that have recently gained enormous popularity for picture
categorization issues. Convolutional neural networks were used to construct a dog
breed classification system that uses input photographs to estimate the breed of each
image.
In the end, we concluded that, given enough data, the deep learning model with
ResNet-50 has a very high potential to surpass human capabilities. Deep learning
will eventually build another deep learning model all by itself, and deep learning
models will be able to write code and outperform humans. By analyzing images
from deep convolution neural networks, deep learning has a lot of potential in the
medical sciences. One of the potential causes of humanity’s demise is deep learning.
One of the deep learning papers that were created using the exception model and
Dog Breed Identification Using Deep Learning 529
cutting-edge neural networks was the dog breed classifier. By merging a prebuilt
model with the model we built, transfer learning has a lot of potential in the future.
Future study should look into the potential of convolutional neural networks for
predicting dog breeds. This strategy shows promise for the upcoming work given
the success of our keypoint detection network. However, because training neural
networks requires a lot of time, we were unable to employ our technique in many
iterations due to time constraints. We recommend more investigation into keypoint
detection using neural networks, particularly by training networks with different
designs and batch iterators to ascertain which approaches may be most efficient.
Given our success with neural networks and keypoint identification, we recommend
building a neural network for breed classification as well, as this has not been done
in the literature. We were unable to test this method because of the time constraints
of neural networks, but we think the outcomes would be on par with, if not better
than, those of our classification. In contrast to more traditional methods, neural
networks are strong classifiers and will increase prediction accuracy. In the end,
neural networks take a long time to train and iterate, which should be considered for
the next work.
References
11. Chen G, Yang J, Jin H, Shechtman E, Brandt J, Han TX (2015) Selective pooling vector for
fine-grained recognition. In: 2015 IEEE Winter conference on applications of computer vision,
pp 860–867
12. Simon M, Rodner E (2015) Neural activation constellations: Unsupervised part model discovery
with convolutional networks. In: Proceedings of the IEEE international conference on computer
vision, pp 1143–1151
13. Angelova A, Zhu S, Efficient object detection and segmentation for fine-grained recognition.
In: Proceedings of the IEEE conference on computer vision and pattern recognition
14. Krause J, Sapp B, Howard A, Zhou H, Toshev A, Duerig T, Philbin J, Fei-Fei L (2016) The
unreasonable effectiveness of noisy data for fine-grained recognition. In: Leibe B, Matas J,
Sebe N, Welling M (eds) Computer vision—ECCV 2016. Springer InternationalPublishing,
Cham, pp 301–320
15. Liu J, Kanazawa A, Jacobs D, Belhumeur P (2002) Dog breed classification using part localiza-
tion. In: Proceedings of the 12th European conference on computer vision, Springer, Florence,
Italy, pp 172–185
16. Srikant MR, Rajendra PK, Ramakrishna AS, A deep learning framework for dog breed iden-
tification” by where a deep learning model was trained on a dataset of images of dogs, and it
was able to achieve an accuracy of 98.4% in identifying dog breeds
17. Tong SG, Huang YY, Tong ZM (2019) A robust face recognition method combining LBP
with multi-mirror symmetry for images with various face interferences. Int J Autom Comput
16(5):671–682. https://doi.org/10.1007/s11633-018-1153-8
18. Zaman FK, Shafie AA, Mustafah YM (2016) Robust face recognition against expressions and
partial occlusions. Int J Autom Comput 13(4):319–337. https://doi.org/10.1007/s11633-016-
0974-6
19. Xue JR, Fang JW, Zhang P (2018) A survey of scene understanding by event reasoning in
autonomous driving. Int J Autom Comput 15(3):249–266. https://doi.org/10.1007/s11633-018-
1126-y
20. Chanvichitkul M, Kumhom P, Chamnongthai K (2007). Face recognition based dog breed
classification using coarse-to-fine concept and PCA. In: Proceedings of Asia-Pacific conference
on communications, IEEE, Bangkok, Thailand, pp 25–29