Ch3: Convolutional Neural Networks (CNN)
Basics of CNN
• The name “convolutional neural network” indicates that the network employs a
mathematical operation called convolution.
• A convolution is an element-wise multiplication of two matrices (a small kernel and a same-sized region of the input) followed by a sum of those products.
• Convolutions vs. cross-correlation: a true convolution "flips" the kernel before sliding it across the input, while cross-correlation applies the kernel as-is.
Fig: Input convolved with a kernel produces the feature map/output.
Fig: Animation of a kernel convolving across an input image.
• However, nearly all machine learning and deep learning libraries use the
simplified cross-correlation function.
• All this math amounts to is a sign change in how we access the coordinates of the image I; that is, we do not have to "flip" the kernel relative to the input when applying cross-correlation.
• A convolutional filter slides (i.e., convolves) across the image
Fig: A 3x3 convolutional filter sliding over an input matrix.
• Essentially, this tiny kernel sits on top of the big image and slides from left-to-
right and top-to-bottom, applying a mathematical operation (i.e., a convolution)
at each (x, y)-coordinate of the original image
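To make this sliding-window operation concrete, here is a minimal NumPy sketch (not code from the slides) of a stride-1, no-padding cross-correlation, which is the "convolution" that deep learning libraries actually implement. The 5x5 input and averaging kernel are illustrative choices.

import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` left-to-right, top-to-bottom, applying an
    element-wise multiply-and-sum at each (x, y) position (no kernel flip)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            region = image[y:y + kh, x:x + kw]   # kernel-sized window
            out[y, x] = np.sum(region * kernel)  # multiply element-wise, then sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging (blur) kernel
print(convolve2d(image, kernel).shape)            # (3, 3): the output shrinks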
Kernels
• We slide the kernel (the red region in the figure) from left to right and top to bottom along the original image.
• Kernels can be of arbitrary rectangular size M×N, provided that both M and N are odd integers.
• We use an odd kernel size to ensure there is a valid integer (x, y)-coordinate at the center of the kernel.
• On the left, we have a 3×3 matrix. The center of the matrix is located at x = 1, y = 1, where the top-left corner of the matrix is used as the origin and our coordinates are zero-indexed.
• But on the right, we have a 2×2 matrix. The center of this matrix would be located at x = 0.5, y = 0.5, which is not a valid integer coordinate.
Types of Kernels
• Prewitt Filters:
Used to detect vertical and horizontal edges. The horizontal (x-direction) filter helps detect edges in the image that cut perpendicularly through the horizontal axis, and vice versa for the vertical (y-direction) filter.
• Sobel Filters:
Just like the Prewitt operator, the Sobel operator is made up of a vertical and a horizontal edge detection filter. Edges detected using the Sobel filters are sharper than those from the Prewitt filters.
• Laplacian Filter:
The Laplacian filter is a single filter that detects edges of all orientations. From a mathematical standpoint, it computes second-order derivatives of pixel values, unlike the Prewitt and Sobel filters, which compute first-order derivatives.
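The slide images with the filter matrices are not reproduced here, so below are the standard textbook definitions of the Prewitt, Sobel, and Laplacian kernels as NumPy arrays; scipy.signal.correlate2d applies them without flipping, matching the cross-correlation convention discussed earlier. The random 8x8 array is a stand-in for a grayscale image.

import numpy as np
from scipy.signal import correlate2d

prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]])   # responds to vertical edges
prewitt_y = prewitt_x.T              # responds to horizontal edges

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])     # like Prewitt, but the center row is weighted
sobel_y = sobel_x.T

laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]])   # second-order derivative, all orientations

gray = np.random.rand(8, 8)                      # stand-in for a grayscale image
edges = correlate2d(gray, sobel_x, mode="same")  # same-size edge response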
Three extremely simple but effective filters are the sharpen, Laplacian, and emboss filters, each given by a 3x3 matrix (common definitions are sketched below).
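The 3x3 matrices themselves are not shown on the slide; the versions below are the commonly used textbook definitions and are therefore an assumption.

import numpy as np

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])   # boosts the center pixel against its neighbors

emboss = np.array([[-2, -1, 0],
                   [-1,  1, 1],
                   [ 0,  1, 2]])     # directional relief/shadow effect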
• Depending on its element values, a kernel can produce a wide range of effects.
• We must manually hand-define each of our kernels for each of our various image processing operations, such as smoothing, sharpening, and edge detection.
• Is it possible to define a machine learning algorithm that can look at our input images
and eventually learn these types of operators?
• This process of using the lower-level layers to learn high-level features is exactly the
compositionality of CNNs that we were referring to earlier.
• But exactly how do CNNs do this? The answer is by stacking a specific set of layers
in a purposeful manner.
• The last layer of a neural network (i.e., the “output layer”) is also fully-
connected and represents the final output classifications of the network.
• Let's again consider the CIFAR-10 dataset. Each image in CIFAR-10 is 32x32 with a Red, Green, and Blue channel, yielding a total of 32x32x3 = 3,072 total inputs to our network.
• A total of 3,072 inputs does not seem to amount to much, but consider if we were using 250x250 pixel images – the total number of inputs and weights would jump to 250x250x3 = 187,500 – and this number is only for the input layer alone!
• Surely, we would want to add multiple hidden layers with varying numbers of nodes per layer – these parameters can quickly add up, and given the poor performance of standard neural networks on raw pixel intensities, this bloat is hardly worth it.
• Instead, we can use Convolutional Neural Networks (CNNs) that take advantage of
the input image structure and define a network architecture in a more sensible way.
• Again consider the CIFAR-10 dataset: the input volume will have dimensions 32x32x3 (width, height, and depth, respectively).
• Neurons in subsequent layers will only be connected to a small region of the layer before it (rather than the fully-connected structure of a standard neural network) – we call this local connectivity, which enables us to save a huge number of parameters in our network.
Layer Types
• There are many types of layers used to build Convolutional Neural Networks,
but the ones you are most likely to encounter include:
• Convolutional (CONV)
• Activation (ACT or RELU, where we use the name of the actual activation function)
• Pooling (POOL)
• Fully-connected (FC)
• Batch normalization (BN)
• Dropout (DO)
• Stacking a series of these layers in a specific manner yields a CNN. We often use simple text diagrams to describe one, e.g., INPUT => CONV => RELU => FC => SOFTMAX.
• Of these layer types, CONV and FC (and to a lesser extent, BN) are the only layers that contain parameters that are learned during the training process. Activation and dropout layers are not considered true "layers" themselves, but are often included in network diagrams to make the architecture explicitly clear.
• Pooling layers (POOL), of equal importance to CONV and FC, are also included in network diagrams as they have a substantial impact on the spatial dimensions of an image as it moves through a CNN.
• CONV, POOL, RELU, and FC are the most important when defining your actual network architecture. That's not to say the other layers are not critical, but they take a backseat to this critical set of four, which define the actual architecture itself.
Convolutional Layers
• The CONV layer is the core building block of a Convolutional Neural Network.
• The CONV layer parameters consist of a set of K learnable filters (i.e., "kernels"), where each filter has a width and a height and is nearly always square.
• These filters are small (in terms of their spatial dimensions) but extend throughout the full
depth of the volume.
• For inputs to the CNN, the depth is the number of channels in the image (i.e., a depth of
three when working with RGB images, one for each channel). For volumes deeper in the
network, the depth will be the number of filters applied in the previous layer.
• Let's consider the forward pass of a CNN, where we convolve each of the K filters across the width and height of the input volume.
Fig: Left: At each convolutional layer in a CNN, there are K kernels applied to the
input volume. Middle: Each of the K kernels is convolved with the input volume. Right:
Each kernel produces a 2D output, called an activation map.
• We can think of each of our K kernels sliding across the input region, computing an element-wise multiplication, summing, and then storing the output value in a 2-dimensional activation map, as in the figure.
• After applying all K filters to the input volume, we have K 2-dimensional activation maps.
• We then stack our K activation maps along the depth dimension of our array to form the final output volume, as in the sketch below.
Fig: After obtaining the K activation maps, they are stacked together to form the
input volume to the next layer in the network.
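A minimal NumPy sketch (illustrative, not the lecture's code) of the forward pass just described: K filters of full input depth each produce a 2D activation map, and the maps are stacked along the depth axis. The 32x32x3 input and K = 8 filters mirror the CIFAR-10 example; the random values are placeholders.

import numpy as np

def conv_forward(volume, filters):
    """volume: (H, W, D); filters: (K, F, F, D). Stride 1, no padding."""
    H, W, D = volume.shape
    K, F, _, _ = filters.shape
    out = np.zeros((H - F + 1, W - F + 1, K))
    for k in range(K):                                # one activation map per filter
        for y in range(out.shape[0]):
            for x in range(out.shape[1]):
                region = volume[y:y + F, x:x + F, :]  # F x F x D local region
                out[y, x, k] = np.sum(region * filters[k])
    return out

volume = np.random.rand(32, 32, 3)           # e.g., one CIFAR-10 image
filters = np.random.rand(8, 3, 3, 3)         # K = 8 filters, each 3x3x3
print(conv_forward(volume, filters).shape)   # (30, 30, 8): 8 stacked activation maps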
• Every entry in the output volume is thus an output of a neuron that “looks” at
only a small region of the input. In this manner, the network “learns” filters that
activate when they see a specific type of feature at a given spatial location in the
input volume.
• In lower layers of the network, filters may activate when they see edge-like or
corner-like regions.
• Then, in the deeper layers of the network, filters may activate in the presence of
high-level features, such as parts of the face, the paw of a dog, the hood of a car, etc.
• This activation concept goes back to our neural network analogy: these neurons become "excited" and "activate" when they see a particular pattern in an input image.
• The concept of convolving a small filter with a large(r) input volume has special
meaning in Convolutional Neural Networks – specifically, the local connectivity
and the receptive field of a neuron.
• When utilizing CNNs, we choose to connect each neuron to only a local region of the input volume – we call the size of this local region the receptive field (or simply, the variable F) of the neuron.
• Let's return to our CIFAR-10 dataset, where the input volume has a size of 32x32x3. If our receptive field is of size 3x3, then each neuron in the CONV layer will connect to a 3x3 local region of the image, for a total of 3x3x3 = 27 weights.
• Simply put, the receptive field F is the size of the filter, yielding an FxF kernel that is convolved with the input volume.
• There are three parameters that control the size of an output volume:
• the depth,
• stride, and
• zero-padding
Depth
• The depth of an output volume controls the number of neurons (i.e., filters) in the CONV layer that connect to a local region of the input volume. Each filter produces an activation map that "activates" in the presence of oriented edges or blobs of color.
• For a given CONV layer, the depth of the activation map will be K, or simply the number of filters we are learning in the current layer. The set of filters that are "looking at" the same (x, y)-location of the input is called the depth column.
Stride
• In the convolution examples above, we took a step of one pixel each time. In the context of CNNs, the same principle applies: for each step, we create a new depth column around the local region of the image, where we convolve each of the K filters with the region and store the output in a 3D volume.
• When creating our CONV layers we normally use a stride step size S of either S
= 1 or S = 2.
• Smaller strides will lead to overlapping receptive fields and larger output
volumes. Conversely, larger strides will result in less overlapping receptive
fields and smaller output volumes.
• To make the concept of convolutional stride more concrete, consider the comparison below.
• Thus, we can see how convolution layers can be used to reduce the spatial
dimensions of the input volumes simply by changing the stride of the kernel.
Fig: Convolution of the same input with stride = 1 vs. stride = 2 (larger stride, smaller output).
Zero-padding
• Sometimes this shrinking of the output is desirable, and other times it is not; it simply depends on your application.
• However, in most cases, we want our output image to have the same
dimensions as our input image. To ensure the dimensions are the same, we
apply padding.
• Here we are simply replicating the pixels along the border of the image, such
that the output image will match the dimensions of the input image.
• We need to "pad" the borders of an image to retain the original image size when applying a convolution – the same is true for filters inside of a CNN.
• Using zero-padding, we can “pad” our input along the borders such that our
output volume size matches our input volume size. The amount of padding we
apply is controlled by the parameter P.
Fig: Convolution with padding = 1 and stride = 1 (the output size matches the input size).
• If we instead set P = 1, we can pad our input volume with zeros (right) to create
a 7x7 volume and then apply the convolution operation, leading to an output
volume size that matches the original input volume size of 5x5 (bottom).
• We can compute the size of an output volume as a function of the input volume size (W, assuming the input images are square, which they nearly always are), the receptive field size F, the stride S, and the amount of zero-padding P:
output size = ((W - F + 2P) / S) + 1
• If this value is not an integer, then the strides are set incorrectly, and the neurons cannot be tiled such that they fit across the input volume in a symmetric way.
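The formula translates directly into a small helper; this is a sketch, with the example values taken from the 5x5 input discussed above.

def conv_output_size(W, F, S, P):
    """Spatial output size of a CONV layer: ((W - F + 2P) / S) + 1."""
    if (W - F + 2 * P) % S != 0:
        raise ValueError("W, F, S, P do not tile the input symmetrically")
    return (W - F + 2 * P) // S + 1

print(conv_output_size(W=5, F=3, S=1, P=1))  # 5: padding preserves the input size
print(conv_output_size(W=5, F=3, S=2, P=0))  # 2: a larger stride shrinks the output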
Activation Layers
• After each CONV layer in a CNN, we apply a nonlinear activation function, such as ReLU, ELU, or one of the Leaky ReLU variants.
• Activation layers are not technically “layers” (due to the fact that no
parameters/weights are learned inside an activation layer) and are sometimes
omitted from network architecture diagrams as it’s assumed that an activation
immediately follows a convolution.
• An activation layer accepts an input volume of size W×H×D and applies the given activation function. Since the activation function is applied element-wise, the output of an activation layer always has the same dimensions as its input.
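For instance (a one-line sketch), ReLU applied element-wise leaves the volume's shape untouched:

import numpy as np

volume = np.random.randn(32, 32, 16)    # some intermediate W x H x D volume
activated = np.maximum(0, volume)       # element-wise ReLU
assert activated.shape == volume.shape  # dimensions are unchanged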
Pooling Layers
• There are two methods to reduce the size of an input volume: CONV layers with a stride > 1, and POOL layers. It is common to insert POOL layers in between consecutive CONV layers in a CNN architecture:
INPUT => CONV => RELU => POOL => CONV => RELU => POOL => FC
• The primary function of the POOL layer is to progressively reduce the spatial size
(i.e., width and height) of the input volume.
Benefits:
• Faster computation due to the reduced spatial dimensions
• Lower memory consumption
• Robustness to small shifts in feature positions: if a feature appears in a slightly different position than it did in the training data, it can still be classified accurately (translation invariance)
• Reduced overfitting
Average Pooling
o In average pooling, the features in a region are summarized by the average value of that region. Average pooling smooths the harsh edges of a picture and is used when such edges are not important.
Global Pooling
o Each channel in the feature map is reduced to a single value. The value depends on the type of global pooling, which can be min, max, or average.
o Global pooling is equivalent to applying a pooling window with the exact dimensions of the feature map.
• Max pooling is typically done in the middle of the CNN architecture to reduce spatial size, whereas average pooling is normally used as the final layer of the network (e.g., GoogLeNet, SqueezeNet, ResNet) when we wish to avoid FC layers entirely. The most common type of POOL layer is max pooling.
• Typically we'll use a pool size of 2x2, although deeper CNNs that use larger input images (> 200 pixels) may use a 3x3 pool size early in the network architecture.
• In summary, POOL layers accept an input volume of size W_in x H_in x D_in. They then require two parameters:
The receptive field size F (also called the "pool size").
The stride S.
• Applying the POOL operation yields an output volume of size W_out x H_out x D_out, where:
W_out = ((W_in - F) / S) + 1
H_out = ((H_in - F) / S) + 1
D_out = D_in
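A sketch of these formulas in action, using the common F = 2, S = 2 (non-overlapping) max pooling configuration; the 28x28x6 input is illustrative.

import numpy as np

def max_pool(volume, F=2, S=2):
    """Max pooling: replace each F x F block (per channel) with its maximum."""
    H, W, D = volume.shape
    out_h, out_w = (H - F) // S + 1, (W - F) // S + 1
    out = np.zeros((out_h, out_w, D))
    for y in range(out_h):
        for x in range(out_w):
            block = volume[y * S:y * S + F, x * S:x * S + F, :]
            out[y, x, :] = block.max(axis=(0, 1))   # max taken per channel
    return out

volume = np.random.rand(28, 28, 6)
print(max_pool(volume).shape)   # (14, 14, 6): the depth D is unchanged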
• In practice, we tend to see two types of max pooling variations:
• Type #1: F = 3, S = 2, which is called overlapping pooling and normally applied to images/input volumes with large spatial dimensions.
• Type #2: F = 2, S = 2, which is called non-overlapping pooling. This is the most common type of pooling and is applied to images with smaller spatial dimensions.
• For network architectures that accept smaller input images (in the range of 32-64 pixels) you may also see F = 2, S = 1 as well.
• To POOL or CONV?
In their 2014 paper, Striving for Simplicity: The All Convolutional Net, Springenberg et al. recommend discarding the POOL layer entirely and simply relying on CONV layers with a larger stride to handle downsampling the spatial dimensions of the volume.
It is becoming increasingly common not to use POOL layers in the middle of the network architecture and to use only average pooling at the end of the network, if FC layers are to be avoided.
Flattening:
We take the pooled feature map and convert it into a column: we read the numbers row by row and put them into one long column.
For each pooled feature map in the pooling layer, we apply this flattening, and the resulting columns together become one long vector of inputs for an artificial neural network (see the sketch below).
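Flattening is just a reshape; a two-line sketch (the 5x5x16 shape anticipates the LeNet table later in the chapter):

import numpy as np

pooled = np.random.rand(5, 5, 16)   # pooled feature maps
flat = pooled.reshape(-1)           # read out row by row into one long vector
print(flat.shape)                   # (400,)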
Fully-connected Layers
• Neurons in FC layers are fully connected to all activations in the previous layer, as is standard for feedforward neural networks. FC layers are typically placed at the end of the network, for example:
INPUT => CONV => RELU => POOL => CONV => RELU => POOL => FC => FC
Batch Normalization
• Batch normalization was introduced by Ioffe and Szegedy in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
• Although our input X was normalized, the outputs of later layers will no longer be on the same scale. As the data go through multiple layers of the neural network and activation functions are applied, the data undergo an internal covariate shift.
What exactly is internal covariate shift?
• Suppose we get a new set of images, consisting of non-white dogs. These new images will have a slightly different distribution from the previous images.
• The model will then change its parameters according to these new images. Hence the distribution of the hidden activations will also change, and the results will degrade.
• Batch normalization layers (or BN for short), as the name suggests, are used to
normalize the activations of a given input volume before passing it into the
next layer in the network.
• If we consider x to be our mini-batch of activations, then we can compute the normalized x̂ via the following equation:
x̂_i = (x_i - μ_B) / sqrt(σ_B² + ε)
where μ_B and σ_B² are the mean and variance of the mini-batch.
• We set ε equal to a small positive value such as 1e-7 to avoid dividing by zero. Applying this equation implies that the activations leaving a batch normalization layer will have approximately zero mean and unit variance (i.e., zero-centered).
• At testing time, we replace the mini-batch μ_B and σ_B with running averages of μ_B and σ_B computed during the training process.
• This ensures that we can pass images through our network and still obtain accurate predictions without being biased by the μ_B and σ_B of the final mini-batch passed through the network at training time.
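A NumPy sketch of the normalization equation above, applied over a mini-batch (the learnable scale-and-shift parameters and the running averages used at test time are omitted for brevity):

import numpy as np

def batch_norm(x, eps=1e-7):
    """Normalize each feature over the mini-batch (axis 0)."""
    mu = x.mean(axis=0)               # per-feature mini-batch mean
    var = x.var(axis=0)               # per-feature mini-batch variance
    return (x - mu) / np.sqrt(var + eps)

x = np.random.rand(64, 128) * 10 + 5  # mini-batch of 64 activation vectors
x_hat = batch_norm(x)
print(x_hat.mean(), x_hat.std())      # approximately 0 and 1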
• Benefits of BN:
Extremely effective at reducing the number of epochs it takes to train a neural network.
Helps "stabilize" training, allowing for a larger variety of learning rates and regularization strengths.
Helps prevent overfitting and allows us to obtain significantly higher classification accuracy in fewer epochs compared to the same network architecture without batch normalization.
Dropout
• Dropout is actually a form of regularization that aims to help prevent overfitting by
increasing testing accuracy, perhaps at the expense of training accuracy.
• For each mini-batch in our training set, dropout layers, with probability p, randomly
disconnect inputs from the preceding layer to the next layer in the network
architecture.
• After the forward and backward pass are computed for the minibatch, we re-connect
the dropped connections, and then sample another set of connections to drop.
• Dropout ensures there are multiple, redundant nodes that will activate when presented with similar inputs – this in turn helps our model to generalize.
... CONV => RELU => POOL => FC => DO => FC => DO => FC
• We may also apply dropout with smaller probabilities (i.e., p = 0.10 – 0.25) in earlier layers of the network as well (normally following a downsampling operation, either via max pooling or convolution). A minimal sketch of the operation follows.
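A sketch of dropout as it is usually implemented ("inverted" dropout, which rescales the surviving activations so test time needs no change; the slides describe the basic form, so the rescaling here is an implementation choice):

import numpy as np

def dropout(x, p=0.5, training=True):
    """Zero each activation with probability p; rescale survivors by 1/(1-p)."""
    if not training:
        return x                             # test time: no-op
    mask = np.random.rand(*x.shape) >= p     # keep with probability 1 - p
    return x * mask / (1.0 - p)

x = np.random.rand(4, 8)
print(dropout(x, p=0.5))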
• By far, the most common form of CNN architecture is to stack a few CONV and
RELU layers, following them with a POOL operation.
• We repeat this sequence until the volume width and height are small, at which point we apply one or more FC layers. Therefore, we can derive the most common CNN architecture using the following pattern:
INPUT => [[CONV => RELU]*N => POOL?]*M => [FC => RELU]*K => FC
Here the * operator implies one or more and the ? indicates an optional operation. Common choices for each repetition include:
0 <= N <= 3
M >= 0
0 <= K <= 2
Examples :
INPUT => [CONV => RELU => POOL] * 2 => FC => RELU => FC
INPUT => [CONV => RELU => CONV => RELU => POOL] * 3 => [FC => RELU] * 2 => FC
Here is an example of a very shallow CNN with only one CONV layer (N = M = K = 0):
INPUT => CONV => FC
And a deeper, AlexNet-like architecture:
INPUT => [CONV => RELU => POOL] * 2 => [CONV => RELU] * 3 => POOL => [FC => RELU => DO] * 2 => SOFTMAX
For deeper network architectures, such as VGGNet, we’ll stack two (or more) layers before
every POOL layer :
INPUT => [CONV => RELU] * 2 => POOL => [CONV => RELU] * 2 => POOL => [CONV =>
RELU] * 3 => POOL => [CONV => RELU] * 3 => POOL => [FC => RELU => DO] * 2 =>
SOFTMAX
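As a concrete (illustrative) realization of the first example pattern, here is a Keras sketch of INPUT => [CONV => RELU => POOL] * 2 => FC => RELU => FC for 32x32x3 inputs and 10 classes; the filter counts (32, 64) and FC width (128) are assumptions, not values from the slides.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),  # CONV => RELU
    layers.MaxPooling2D(pool_size=(2, 2)),                         # POOL
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),  # CONV => RELU
    layers.MaxPooling2D(pool_size=(2, 2)),                         # POOL
    layers.Flatten(),
    layers.Dense(128, activation="relu"),                          # FC => RELU
    layers.Dense(10, activation="softmax"),                        # final FC
])
model.summary()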
• We can build our own CNN architecture by varying the layers of the CNN model.
• Before we see the CNN Architectures, we should know some of the history
regarding why these architectures were developed.
• There is a worldwide competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which the ImageNet dataset is used for object detection and image classification at large scale. Work on this dataset was begun by Fei-Fei Li in 2006.
• ImageNet contains more than 20,000 categories, with a typical category, such as "balloon," "strawberry," "table," or "chair," consisting of several hundred images.
CNN Architectures:
The history of deep CNNs began with the appearance of LeNet. At that time, CNNs were restricted to handwritten digit recognition tasks and could not scale to all image classes.
• The LeNet architecture consists of two series of CONV => TANH =>
POOL layer sets followed by a fully-connected layer and softmax
output.
LAYER TYPE    OUTPUT SIZE    FILTER SIZE / STRIDE
INPUT IMAGE   32x32x1
CONV          28x28x6        5x5, K=6
POOL          14x14x6        pool 2x2, stride 2
CONV          10x10x16       5x5, K=16
POOL          5x5x16         pool 2x2, stride 2
FLATTEN       400
FC            120
FC            84
SOFTMAX       10
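A Keras sketch of the table above (LeNet's original subsampling layers are approximated here with average pooling, an assumption, and tanh activations follow the CONV => TANH => POOL pattern):

from tensorflow.keras import layers, models

lenet = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, (5, 5), activation="tanh"),           # -> 28x28x6
    layers.AveragePooling2D(pool_size=(2, 2), strides=2),  # -> 14x14x6
    layers.Conv2D(16, (5, 5), activation="tanh"),          # -> 10x10x16
    layers.AveragePooling2D(pool_size=(2, 2), strides=2),  # -> 5x5x16
    layers.Flatten(),                                      # -> 400
    layers.Dense(120, activation="tanh"),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),
])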
• But LeNet was not popular at the time because of the lack of hardware, especially GPUs.
• Since the success of AlexNet in 2012, CNNs have become the best choice for computer vision applications.
VGGNet Architecture
• VGG is a classical convolutional neural network architecture. It was based on an analysis
of how to increase the depth of such networks.
• The network utilises small 3x3 filters. Otherwise, the network is characterized by its simplicity: the only other components are pooling layers and fully connected layers.
• The input to a VGG-based ConvNet is a 224x224 RGB image. A preprocessing layer takes the RGB image with pixel values in the range 0-255 and subtracts the mean image values, calculated over the entire ImageNet training set. A block sketch follows below.
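A sketch of the VGG building pattern described above: stacked 3x3 same-padded convolutions followed by 2x2/stride-2 max pooling. Only the first two of VGG16's five blocks are shown, and mean subtraction is assumed to happen in preprocessing.

from tensorflow.keras import layers, models

vgg_start = models.Sequential([
    layers.Input(shape=(224, 224, 3)),                             # 224x224 RGB input
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),              # 224 -> 112
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),              # 112 -> 56
])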