
Image Processing with Deep Learning

Tushar B. Kute,
http://tusharkute.com
ImageNet

• ImageNet is a large-scale image database designed for visual object recognition software research. It contains over 14 million images and 1,000 classes.
• The images are organized into a hierarchical taxonomy,
with each class representing a different object or scene.
• ImageNet was created by the Stanford Vision Lab and
first released in 2009.
• It has since become a benchmark for evaluating the
performance of visual object recognition software.
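As a concrete illustration, here is a minimal sketch of how an ImageNet-style dataset (one folder per class) is typically loaded with PyTorch and torchvision; the directory path is a placeholder and the normalization constants are the commonly used ImageNet channel statistics.

    import torchvision.transforms as T
    from torchvision.datasets import ImageFolder
    from torch.utils.data import DataLoader

    # Standard ImageNet-style preprocessing: resize, crop, normalize.
    transform = T.Compose([
        T.Resize(256),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # "path/to/imagenet/train" is a placeholder: one subfolder per class label.
    dataset = ImageFolder("path/to/imagenet/train", transform=transform)
    loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)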
ImageNet

• The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual competition that tests the accuracy of visual object recognition software on the ImageNet dataset.
• ImageNet has been used to train a number of successful
deep learning models, including AlexNet, VGGNet,
ResNet, and Inception.
• These models have achieved state-of-the-art results on
the ILSVRC challenge, and they have been used to
develop a variety of real-world applications, such as
self-driving cars and facial recognition software.
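The models named above are available as pretrained networks in common libraries. A minimal sketch using torchvision (the weights argument assumes torchvision 0.13 or newer; older versions use pretrained=True instead):

    import torch
    from torchvision import models

    # Load ImageNet-pretrained versions of the architectures discussed below.
    alexnet  = models.alexnet(weights="IMAGENET1K_V1")
    vgg16    = models.vgg16(weights="IMAGENET1K_V1")
    resnet50 = models.resnet50(weights="IMAGENET1K_V1")

    # Each maps a batch of 224x224 RGB images to scores over the 1,000 ImageNet classes.
    resnet50.eval()
    x = torch.randn(1, 3, 224, 224)
    print(resnet50(x).shape)  # torch.Size([1, 1000])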
ImageNet: Benefits

• It is a large and diverse dataset, which allows for the training of more accurate models.
• The images are organized into a hierarchical taxonomy,
which makes it easier to train models for specific tasks.
• The ImageNet Large Scale Visual Recognition Challenge
(ILSVRC) provides a benchmark for evaluating the
performance of visual object recognition software.
ImageNet: Limitations

• The images are all labeled by humans, which can be a time-consuming and expensive process.
• The dataset is biased towards Western culture, which
can make it less effective for training models for other
cultures.
• The dataset is not always up-to-date, which can make it
less effective for training models for new objects or
scenes.
ImageNet Dataset
ImageNet Web link
AlexNet

• AlexNet is a convolutional neural network (CNN) architecture that was proposed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in their 2012 paper, "ImageNet Classification with Deep Convolutional Neural Networks."
• AlexNet was the first CNN to achieve state-of-the-art
results on the ImageNet Large Scale Visual Recognition
Challenge (ILSVRC), and it is considered to be one of the
most influential papers in the field of deep learning.
AlexNet

• AlexNet consists of eight layers, including five convolutional layers and three fully connected layers.
• The convolutional layers and hidden fully connected layers use rectified linear units (ReLUs) as activation functions, and the final fully connected layer feeds a 1,000-way softmax.
• AlexNet also uses dropout regularization to prevent
overfitting.
• AlexNet was trained on the ImageNet dataset, which
contains over 14 million images and 1,000 classes.
• AlexNet achieved a top-5 error rate of 15.3% on the ILSVRC
2012 challenge, which was a significant improvement over
the previous state-of-the-art.
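A simplified PyTorch sketch of this eight-layer structure (five convolutional layers, three fully connected layers, ReLU activations, dropout). It is an illustrative approximation: the channel counts follow torchvision's AlexNet implementation rather than the exact figures in the original paper.

    import torch.nn as nn

    # Five convolutional layers (with ReLU and occasional max pooling)...
    features = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
    )

    # ...followed by three fully connected layers with dropout.
    # (In a full model the 256x6x6 feature maps are flattened before the classifier.)
    classifier = nn.Sequential(
        nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
        nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
        nn.Linear(4096, 1000),  # 1,000 ImageNet classes; softmax is applied by the loss
    )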
AlexNet

• AlexNet has been used as a baseline for many subsequent CNN architectures, and it has inspired a number of new research directions in deep learning.
• AlexNet is still a powerful CNN architecture, and it
can be used for a variety of image classification
tasks.
AlexNet

• Here are some of the key features of AlexNet:
– It is a deep CNN architecture, with eight layers.
– It uses ReLUs as activation functions.
– It uses dropout regularization to prevent
overfitting.
– It was trained on the ImageNet dataset.
– It achieved state-of-the-art results on the ILSVRC
2012 challenge.
AlexNet

• AlexNet is a significant milestone in the history of deep learning.
• It showed that deep CNNs could achieve state-of-
the-art results on challenging image classification
tasks.
• AlexNet has inspired a number of new research
directions in deep learning, and it is still a powerful
CNN architecture that can be used for a variety of
image classification tasks.
AlexNet

• The architecture consists of eight layers: five convolutional layers and three fully-connected layers. But this isn’t what makes AlexNet special; what sets it apart are several features that were new approaches to convolutional neural networks at the time:
– ReLU Nonlinearity.
• AlexNet uses Rectified Linear Units (ReLU) instead
of the tanh function, which was standard at the
time.
• ReLU’s advantage is in training time; a CNN using
ReLU was able to reach a 25% error on the CIFAR-10
dataset six times faster than a CNN using tanh.
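A tiny, illustrative comparison of the two activations in PyTorch (the speedup figure above comes from the AlexNet paper, not from this snippet):

    import torch

    x = torch.linspace(-3.0, 3.0, steps=7)
    print(torch.tanh(x))  # saturates toward -1/+1, so gradients shrink for large |x|
    print(torch.relu(x))  # zero for negative inputs, identity for positive inputs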
AlexNet

• Multiple GPUs. Back in the day, GPUs were still rolling around with 3 gigabytes of memory (nowadays those kinds of memory would be rookie numbers).
• This was especially bad because the training set had
1.2 million images.
• AlexNet allows for multi-GPU training by putting half
of the model’s neurons on one GPU and the other
half on another GPU.
• Not only does this mean that a bigger model can be
trained, but it also cuts down on the training time.
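AlexNet's original scheme was model parallelism (half of the feature maps on each GPU). A hedged sketch of the simpler data-parallel approach common today, which instead splits each batch across the available GPUs:

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.alexnet()
    if torch.cuda.device_count() > 1:
        # Replicate the model on every visible GPU and split each input batch across them.
        model = nn.DataParallel(model)
    model = model.to("cuda" if torch.cuda.is_available() else "cpu")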
AlexNet

• Overlapping Pooling. CNNs traditionally “pool” outputs of neighboring groups of neurons with no overlapping.
• However, when the authors introduced overlap,
they saw a reduction in error by about 0.5% and
found that models with overlapping pooling
generally find it harder to overfit.
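In code, the difference is simply whether the pooling window is larger than its stride; a minimal sketch:

    import torch.nn as nn

    # Traditional non-overlapping pooling: the window size equals the stride.
    non_overlapping = nn.MaxPool2d(kernel_size=2, stride=2)

    # Overlapping pooling as in AlexNet: a 3x3 window moved with stride 2,
    # so neighboring windows share a row/column of inputs.
    overlapping = nn.MaxPool2d(kernel_size=3, stride=2)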
AlexNet
AlexNet: Overfitting

• AlexNet had 60 million parameters, a major issue in terms of overfitting. Two methods were employed to reduce overfitting:
– Data Augmentation. The authors used label-
preserving transformation to make their data more
varied. Specifically, they generated image translations
and horizontal reflections, which increased the
training set by a factor of 2048.
– They also performed Principal Component Analysis
(PCA) on the RGB pixel values to change the
intensities of RGB channels, which reduced the top-1
error rate by more than 1%.
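A hedged sketch of comparable label-preserving augmentations using torchvision transforms; ColorJitter is used here only as a stand-in for the paper's PCA-based perturbation of RGB intensities, which torchvision does not provide directly:

    from torchvision import transforms as T

    train_transform = T.Compose([
        T.Resize(256),
        T.RandomCrop(224),         # random translations via cropping
        T.RandomHorizontalFlip(),  # horizontal reflections
        T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # stand-in for PCA color jitter
        T.ToTensor(),
    ])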
AlexNet: Dropout

• This technique consists of “turning off” neurons with a predetermined probability (e.g. 50%).
• This means that every iteration uses a different
sample of the model’s parameters, which forces
each neuron to have more robust features that can
be used with other random neurons.
• However, dropout also increases the training time
needed for the model’s convergence.
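A minimal sketch of dropout as used in AlexNet's fully connected layers; note that PyTorch disables it automatically in evaluation mode:

    import torch.nn as nn

    fc = nn.Sequential(
        nn.Linear(4096, 4096),
        nn.ReLU(),
        nn.Dropout(p=0.5),  # each activation is zeroed with probability 0.5 during training
    )

    fc.train()  # dropout active: a different random subset of units is dropped each step
    fc.eval()   # dropout becomes a no-op at inference time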
AlexNet: The Results

• On the 2010 version of the ImageNet competition, the best model achieved 47.1% top-1 error and 28.2% top-5 error.
• AlexNet vastly outpaced this with a 37.5% top-1 error
and a 17.0% top-5 error.
• AlexNet is able to recognize off-center objects and
most of its top five classes for each image are
reasonable.
• AlexNet won the 2012 ImageNet competition with a
top-5 error rate of 15.3%, compared to the second
place top-5 error rate of 26.2%.
AlexNet: What now?

• AlexNet is an incredibly powerful model capable of achieving high accuracies on very challenging datasets.
• However, removing any of the convolutional layers
will drastically degrade AlexNet’s performance.
• AlexNet is a leading architecture for any object-
detection task and may have huge applications in the
computer vision sector of artificial intelligence
problems.
• In the future, the ideas AlexNet introduced are likely to be adopted even more widely in CNNs for image tasks.
VGG

• VGG, or Visual Geometry Group, is a type of convolutional neural network (CNN) architecture that was proposed by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group (VGG) at the University of Oxford in 2014.
• The VGG model is characterized by its simplicity and its
use of small 3x3 convolution filters.
• The model was one of the most successful entries in the
2014 ImageNet Large Scale Visual Recognition
Challenge (ILSVRC), and it has been used as a basis for
many other CNN architectures.
VGG

• The VGG model is composed of a series of convolutional layers, followed by max pooling layers, and then fully connected layers.
• The convolutional layers use small 3x3 filters, which
helps to preserve spatial information in the images.
• The max pooling layers downsample the images,
which helps to reduce the computational
complexity of the model.
• The fully connected layers classify the images into
different classes.
VGG

• The VGG model has been shown to be very effective for image classification tasks.
• It achieved a top-5 error rate of about 7.3% in the ILSVRC 2014 classification task, which was a significant improvement over the previous state-of-the-art.
• The VGG model has also been used for other tasks,
such as object detection and segmentation.
VGG: Architecture

• The input to the convolutional neural network is a fixed-size 224 × 224 RGB image.
• The only preprocessing it does is subtracting the mean
RGB values, which are computed on the training
dataset, from each pixel.
• The image is then passed through a stack of convolutional (Conv.) layers, which use filters with a very small 3 × 3 receptive field, the smallest size that can capture the notions of left/right, up/down, and center.
VGG: Architecture

• In one of the configurations, it also utilizes 1 × 1 convolution filters, which can be viewed as a linear transformation of the input channels followed by a non-linearity.
• The convolutional strides are fixed to 1 pixel; the spatial
padding of convolutional layer input is such that the spatial
resolution is maintained after convolution, that is the
padding is 1 pixel for 3 × 3 Conv. layers.
• Spatial pooling is then carried out by five max-pooling layers, which follow some (but not all) of the Conv. layers.
• This Max-pooling is performed over a 2 × 2-pixel window,
with stride 2.
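A minimal PyTorch sketch of this pattern, assuming a simple helper that stacks 3 × 3 convolutions (stride 1, padding 1, so spatial resolution is preserved) followed by 2 × 2 max pooling with stride 2; the function name and channel counts are illustrative.

    import torch.nn as nn

    def vgg_block(in_channels, out_channels, num_convs):
        # A VGG-style block: 3x3 convolutions (stride 1, padding 1) with ReLU,
        # followed by a 2x2 max pool with stride 2 that halves the resolution.
        layers = []
        for _ in range(num_convs):
            layers += [nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
                       nn.ReLU(inplace=True)]
            in_channels = out_channels
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*layers)

    # The first two stages of a VGG-16-style network for 224x224 RGB input.
    stem = nn.Sequential(vgg_block(3, 64, 2), vgg_block(64, 128, 2))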
VGG: Architecture
VGG: Advantages

• It is simple and easy to understand.
• It is very effective for image classification tasks.
• It has been used as a basis for many other CNN
architectures.
VGG: Disadvantages

• It is computationally expensive to train.
• It is not as good as some newer CNN architectures for some tasks.
ResNet

• ResNet, short for Residual Network, is a type of convolutional neural network (CNN) architecture that was introduced in 2015 by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in their paper “Deep Residual Learning for Image Recognition”.
• ResNets are very deep CNNs, which means that they have a
large number of layers.
• However, training very deep CNNs can be difficult, because
the gradients can become very small as they propagate
through the network. This can make it difficult for the
network to learn.
ResNet

• ResNets solve this problem by introducing the concept of residual connections.
• A residual connection is a direct connection between
the input of a layer and the output of the layer.
• This means that the output of the layer is not simply
the result of the convolution operation, but it is also
the sum of the convolution operation and the input of
the layer.
• This allows the gradient to flow more easily through
the network, which makes it easier for the network to
learn.
Residual Block

• Residual blocks are an important part of the ResNet architecture. In older architectures such as VGG16, convolutional layers are stacked with batch normalization and nonlinear activation layers such as ReLU between them.
• This method works with a small number of
convolutional layers—the maximum for VGG models is
around 19 layers.
• However, subsequent research discovered that
increasing the number of layers could significantly
improve CNN performance.
Residual Block

• The ResNet architecture introduces the simple concept of adding a block's input to the output of its series of convolution layers. This is illustrated in the sketch below.
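A minimal PyTorch sketch of such a residual block, assuming two 3 × 3 convolutions with batch normalization whose output is added back to the block's input (the shortcut connection); the class name and channel handling are illustrative, and the original slide showed this as a diagram.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Two 3x3 conv + batch norm layers with a shortcut: y = F(x) + x."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = torch.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return torch.relu(out + x)  # add the input back in: the residual connection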
ResNet

• ResNets have been very successful for image classification tasks.
• They have achieved state-of-the-art results on the
ImageNet dataset, which is a large dataset of
images with their corresponding labels.
• ResNets have also been used for other tasks, such
as object detection and segmentation.
ResNet
ResNet: Advantages

• They are very deep, which allows them to learn complex features.
• They are able to learn even when the gradients are
very small.
• They have been shown to be very effective for
image classification tasks.
ResNet: Disadvantages

• They can be computationally expensive to train.
• They can be difficult to understand and debug.
ResNet: Architectures

• Here are some of the most popular ResNet architectures, listed by depth (a loading sketch follows):
– ResNet18
– ResNet34
– ResNet50
– ResNet101
– ResNet152
– ResNet200
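A hedged sketch of loading the common depths with torchvision (ResNet-200 is not included in torchvision and would come from another library such as timm):

    from torchvision import models

    # Randomly initialized ResNets of increasing depth; pass weights="IMAGENET1K_V1"
    # (torchvision >= 0.13) to get ImageNet-pretrained parameters instead.
    resnet18  = models.resnet18()
    resnet34  = models.resnet34()
    resnet50  = models.resnet50()
    resnet101 = models.resnet101()
    resnet152 = models.resnet152()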
Thank you
This presentation was created using LibreOffice Impress 7.4.1.2 and can be used freely as per the GNU General Public License.

@mITuSkillologies @mitu_group @mitu-skillologies @MITUSkillologies

Web Resources
https://mitu.co.in
@mituskillologies http://tusharkute.com @mituskillologies

[email protected]
[email protected]
