Introducing Convolutional Neural Networks Slides

The document provides an overview of convolutional neural networks (CNNs). It explains that CNNs are inspired by biological processes in the visual cortex and consist of convolution and pooling layers. Convolution layers apply filters to input images to extract features, while pooling layers subsample the inputs to reduce dimensionality. Typical CNN architectures stack multiple convolution and pooling layers to learn increasingly complex patterns in images.

Uploaded by

Rahul Shetty
Copyright
© All Rights Reserved

Introducing Convolutional Neural Networks

Janani Ravi
CO-FOUNDER, LOONYCORN
www.loonycorn.com
Overview

Intuition behind Convolutional Neural Networks (CNNs)
Convolution layers and feature maps
Pooling layers to subsample inputs
Typical CNN architecture
How Do We See?
Viewing an Image

No single neuron in the eye sees the entire image
Each neuron has its own local receptive field
It reacts only to visual stimuli located in its receptive field
Some neurons react to more complex patterns that are combinations of lower-level patterns
Neural Networks

Layer 1 >> Layer 2 >> ... >> Layer N

Sounds like a classic neural network problem
Two Kinds of Layers in CNNs

Convolution: local receptive field
Pooling: subsampling of inputs

Convolution

In this context, a sliding window function applied to a matrix
e.g. a matrix of pixels representing an image
The function is often called a kernel or filter
The kernel is applied element-wise (multiply, then sum) in sliding-window fashion
Representing Images as Matrices

28 x 28 = 784 pixels

Matrix (6 x 6 = 36 pixels)
0    0    0    0    0    0
0.2  0.8  0    0.3  0.6  0
0.2  0.9  0    0.3  0.8  0
0.3  0.8  0.7  0.8  0.9  0
0    0    0    0.2  0.8  0
0    0    0    0.2  0.2  0

Kernel (3 x 3)
1  0  1
0  1  0
1  0  1
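A minimal Python sketch of this representation, assuming a grayscale image stored as a list of rows. The names `image`, `kernel`, `shape` and `pixel_count` are illustrative and do not come from the slides.

```python
# The 6 x 6 matrix of pixel values from the slide, one inner list per row.
image = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.3, 0.6, 0.0],
    [0.2, 0.9, 0.0, 0.3, 0.8, 0.0],
    [0.3, 0.8, 0.7, 0.8, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.2, 0.0],
]

# The 3 x 3 kernel from the slide.
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]


def shape(matrix):
    """Return (height, width) of a rectangular list-of-lists matrix."""
    return len(matrix), len(matrix[0])


def pixel_count(matrix):
    """Return the number of entries (pixels) in the matrix."""
    height, width = shape(matrix)
    return height * width


print(shape(image), pixel_count(image))    # (6, 6) 36
print(shape(kernel), pixel_count(kernel))  # (3, 3) 9
print(28 * 28)                             # 784
```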
Convolution

The 3 x 3 kernel is laid over a 3 x 3 window of the matrix; each pixel is multiplied by the kernel value on top of it (x1 or x0) and the products are summed
The kernel then slides one position at a time, left to right and top to bottom, producing one value of the result per position

Convolution Result (4 x 4)
1.0  1.2  1.1  0.9
2.1  2.7  2.5  1.9
1.0  2.1  2.4  1.4
1.0  1.8  2.0  1.8
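The sliding-window computation can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the presenter's code: stride 1, no padding, and the kernel applied without flipping (strictly a cross-correlation, which gives the same result here because the kernel is symmetric). The function name `convolve2d` is hypothetical, and results are rounded to one decimal to match the slides.

```python
def convolve2d(matrix, kernel):
    """Slide `kernel` over `matrix` (stride 1, no padding).

    At each position the window and the kernel are multiplied
    element-wise and the products are summed into one output value.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(matrix) - k_rows + 1
    out_cols = len(matrix[0]) - k_cols + 1
    result = []
    for top in range(out_rows):          # slide top to bottom
        row = []
        for left in range(out_cols):     # slide left to right
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += matrix[top + i][left + j] * kernel[i][j]
            row.append(round(total, 1))  # one decimal, as on the slides
        result.append(row)
    return result


image = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.3, 0.6, 0.0],
    [0.2, 0.9, 0.0, 0.3, 0.8, 0.0],
    [0.3, 0.8, 0.7, 0.8, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.2, 0.0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
# [1.0, 1.2, 1.1, 0.9]
# [2.1, 2.7, 2.5, 1.9]   first entry: 0.2 + 0 + 0.9 + 0.3 + 0.7 = 2.1
# [1.0, 2.1, 2.4, 1.4]
# [1.0, 1.8, 2.0, 1.8]
```

A 6 x 6 input and a 3 x 3 kernel give a 4 x 4 result because the kernel fits in 6 - 3 + 1 = 4 positions along each axis.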
Convolutional Layers

Convolution layers zoom in on specific bits of input
Extract structure and features in the input image
Successive layers aggregate inputs into higher-level features
Pixels >> Lines >> Edges >> Object
Feature Maps

Image pixels >> Feature map
A feature map is generated by applying a convolutional kernel to the input
Each neuron i in the convolutional layer is connected only to its local receptive field in the pixel layer
Number of neurons in receptive field = kernel size
Kernel Size

Convolutional kernel size is usually expressed in terms of the width and height of the receptive area
Use small convolutional kernels; they are more efficient
Stacking two 3x3 kernels is preferable to one 5x5 kernel (same 5x5 receptive field, 18 weights instead of 25)
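A quick arithmetic check of why small stacked kernels are cheaper, sketched in Python. It assumes stride-1 layers, a single input channel, and ignores biases; the helper names are hypothetical. Under these assumptions two stacked 3x3 kernels see a 5x5 region of the input, and it takes four stacked 3x3 kernels to see a 9x9 region.

```python
def stacked_receptive_field(kernel_sizes):
    """Side length (in input pixels) seen by one output neuron
    after stacking stride-1 convolutions with the given kernel sizes."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each stride-1 layer widens the field by k - 1
    return field


def weight_count(kernel_sizes):
    """Total weights of single-channel square kernels, biases ignored."""
    return sum(k * k for k in kernel_sizes)


for stack in ([3, 3], [5], [3, 3, 3, 3], [9]):
    print(stack, stacked_receptive_field(stack), weight_count(stack))
# [3, 3] 5 18
# [5] 5 25
# [3, 3, 3, 3] 9 36
# [9] 9 81
```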
Feature Maps

Stride: distance between successive receptive fields
Horizontal stride (along the width) and vertical stride (along the height)
Zero padding may be needed at the edges
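How stride and zero padding determine the size of a feature map can be sketched as below. The formula floor((n + 2p - k) / s) + 1 is the standard one for a single axis; the function names and the example sizes are assumptions for illustration. Horizontal and vertical stride are handled by calling `output_size` once per axis.

```python
def output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of kernel positions along one axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def zero_pad(matrix, padding):
    """Surround `matrix` with `padding` rows and columns of zeros."""
    width = len(matrix[0]) + 2 * padding
    padded = [[0.0] * width for _ in range(padding)]
    for row in matrix:
        padded.append([0.0] * padding + list(row) + [0.0] * padding)
    padded.extend([0.0] * width for _ in range(padding))
    return padded


print(output_size(6, 3))                        # 4: no padding, stride 1
print(output_size(6, 3, padding=1))             # 6: padding keeps the size
print(output_size(6, 3, stride=2))              # 2: stride 2 skips positions
print(output_size(28, 3, stride=2, padding=1))  # 14

for row in zero_pad([[1, 2], [3, 4]], 1):
    print(row)
# [0.0, 0.0, 0.0, 0.0]
# [0.0, 1, 2, 0.0]
# [0.0, 3, 4, 0.0]
# [0.0, 0.0, 0.0, 0.0]
```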
Feature Maps

Sparse, not dense
Notice also that neurons are not connected to all pixels
CNNs are sparse neural networks

All neurons in a feature map have the same weights and biases
Two big advantages over DNNs
- Dramatically fewer parameters to train
- CNN can recognize feature patterns independent of location

The parameters of all neurons in a feature map are collectively called the filter
Why filter?
Because weights highlight (filter) specific patterns from the input pixels
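The "dramatically fewer parameters" point can be made concrete with a small count. This is a sketch with assumed sizes (a 28 x 28 single-channel input, a dense layer with one neuron per pixel, and one feature map with a 3 x 3 kernel); none of these numbers come from the slides, and the function names are hypothetical.

```python
def dense_params(n_inputs, n_neurons):
    """Fully connected layer: one weight per (input, neuron) pair,
    plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons


def feature_map_params(kernel_height, kernel_width, in_channels=1):
    """One feature map: every neuron shares the same kernel weights
    and a single bias, so the count does not depend on image size."""
    return kernel_height * kernel_width * in_channels + 1


n_pixels = 28 * 28  # assumed input size
print(dense_params(n_pixels, n_pixels))  # 615440
print(feature_map_params(3, 3))          # 10
```

Because the same 10 parameters are reused at every position, a pattern learned at one location is detected at any other location as well.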
CNNs

Feature map >> Convolutional layer >> CNN

Convolutional Layer

Each convolutional layer consists of several feature maps of equal sizes
The different feature maps have different parameters
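A small sketch of one convolutional layer producing several feature maps of equal size from the same input. The toy 4 x 4 image and the two 2 x 2 kernels (a vertical-edge and a horizontal-edge detector) are invented for illustration, as are the function names; stride 1 and no padding are assumed.

```python
def convolve2d(matrix, kernel):
    """Stride-1, unpadded sliding-window sum of element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(matrix) - k_rows + 1
    out_cols = len(matrix[0]) - k_cols + 1
    return [
        [
            sum(matrix[top + i][left + j] * kernel[i][j]
                for i in range(k_rows) for j in range(k_cols))
            for left in range(out_cols)
        ]
        for top in range(out_rows)
    ]


def conv_layer(matrix, kernels):
    """Apply every kernel to the same input: one feature map per kernel."""
    return [convolve2d(matrix, kernel) for kernel in kernels]


# Toy image: dark left half, bright right half (a vertical edge).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]    # responds to left-to-right changes
horizontal_edge = [[-1, -1], [1, 1]]  # responds to top-to-bottom changes

feature_maps = conv_layer(image, [vertical_edge, horizontal_edge])
for feature_map in feature_maps:
    print(feature_map)
# [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
# [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Both feature maps are 3 x 3, but only the one whose parameters match the pattern in the image responds.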
Pooling Layers
Pooling

Matrix (4 x 4)
0.2  0.8  0.3  0.6
0.2  0.9  0.3  0.8
0.3  0.8  0.8  0.9
0    0    0.2  0.8

Max, 2x2 filter, stride = 2

Pooling Result (2 x 2)
0.9  0.8
0.8  0.9
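The max-pooling example above can be reproduced with a short Python sketch. The function name `max_pool2d` is hypothetical; the 2x2 filter and stride of 2 follow the slide.

```python
def max_pool2d(matrix, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    result = []
    for top in range(0, len(matrix) - size + 1, stride):
        row = []
        for left in range(0, len(matrix[0]) - size + 1, stride):
            window = [matrix[top + i][left + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        result.append(row)
    return result


matrix = [
    [0.2, 0.8, 0.3, 0.6],
    [0.2, 0.9, 0.3, 0.8],
    [0.3, 0.8, 0.8, 0.9],
    [0.0, 0.0, 0.2, 0.8],
]

pooled = max_pool2d(matrix)
for row in pooled:
    print(row)
# [0.9, 0.8]
# [0.8, 0.9]
```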


Pooling Layers

Neurons in a pooling layer have no weights or biases
A pooling neuron simply applies some aggregation function to all inputs
Max, sum, average

Why use them?
- Greatly reduce memory usage during training
- Mitigate overfitting (via subsampling)
- Make NN recognize features independent of location (location invariance)

Pooling layers typically act on each channel independently
So, usually, output area < input area but output depth = input depth
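A sketch of pooling with a pluggable aggregation function, applied to each channel independently. The input is assumed to be a list of channels, each a 2D list; the toy values and the names `pool2d`, `pool_channels` and `average` are illustrative only.

```python
def pool2d(matrix, aggregate, size=2, stride=2):
    """Apply `aggregate` (e.g. max, sum) to each size x size window.
    There are no weights or biases to learn."""
    result = []
    for top in range(0, len(matrix) - size + 1, stride):
        row = []
        for left in range(0, len(matrix[0]) - size + 1, stride):
            window = [matrix[top + i][left + j]
                      for i in range(size) for j in range(size)]
            row.append(aggregate(window))
        result.append(row)
    return result


def average(values):
    return sum(values) / len(values)


def pool_channels(volume, aggregate, size=2, stride=2):
    """Pool every channel on its own, so depth is unchanged."""
    return [pool2d(channel, aggregate, size, stride) for channel in volume]


channel = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(pool2d(channel, max))      # [[6, 8], [14, 16]]
print(pool2d(channel, sum))      # [[14, 22], [46, 54]]
print(pool2d(channel, average))  # [[3.5, 5.5], [11.5, 13.5]]

# Three 4 x 4 channels in, three 2 x 2 channels out.
volume = [channel, channel, channel]
pooled = pool_channels(volume, max)
print(len(volume), len(volume[0]), len(volume[0][0]))  # 3 4 4
print(len(pooled), len(pooled[0]), len(pooled[0][0]))  # 3 2 2
```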
CNN Architectures
Typical CNN Architecture

Convolutional >> ReLU >> Pooling >> Convolutional >> ReLU

Alternating groups of convolutional and pooling layers
Each group of convolutional layers is usually followed by a ReLU layer
The output of each layer is also an image
However, successive outputs are smaller and smaller (due to pooling layers)
As well as deeper and deeper (due to feature maps in the convolutional layers)
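The "smaller and smaller, deeper and deeper" progression can be traced with shape arithmetic alone. Every number below is an assumption chosen for illustration (28 x 28 x 1 input, 3x3 kernels with padding 1, 32 then 64 feature maps, 2x2 pooling with stride 2); the slides do not specify layer sizes. ReLU is omitted because it leaves the shape unchanged.

```python
def conv_shape(shape, n_feature_maps, kernel=3, stride=1, padding=1):
    """Output (height, width, depth) of a convolutional layer.
    Depth becomes the number of feature maps."""
    height, width, _depth = shape
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return (out_h, out_w, n_feature_maps)


def pool_shape(shape, size=2, stride=2):
    """Output shape of a pooling layer: area shrinks, depth is kept."""
    height, width, depth = shape
    return ((height - size) // stride + 1,
            (width - size) // stride + 1,
            depth)


shape = (28, 28, 1)            # assumed input image
print("input ", shape)
shape = conv_shape(shape, 32)  # hypothetical: 32 feature maps
print("conv 1", shape)         # (28, 28, 32)
shape = pool_shape(shape)
print("pool 1", shape)         # (14, 14, 32)
shape = conv_shape(shape, 64)  # hypothetical: 64 feature maps
print("conv 2", shape)         # (14, 14, 64)
shape = pool_shape(shape)
print("pool 2", shape)         # (7, 7, 64)
```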
Typical CNN Architecture

CNN layers >> Feed-forward layers (Fully Connected >> ReLU >> Fully Connected >> ReLU) >> SoftMax >> Prediction

The output of this entire set of layers is then fed into a regular, feed-forward NN
This feed-forward NN has a few fully connected layers with ReLU activation
The SoftMax output layer emits probabilities: P(Y = 0), P(Y = 1), ..., P(Y = 9)
CNN input is an image; outputs are probabilities
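A minimal sketch of the two activation functions named in the architecture, using only the Python standard library. The ten raw scores are invented for illustration (one per class 0 to 9); in a real network they would come from the last fully connected layer.

```python
import math


def relu(values):
    """ReLU: negative inputs become 0, positive inputs pass through."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1.
    Subtracting the maximum first avoids overflow in exp()."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(relu([-1.5, 0.0, 2.0]))  # [0.0, 0.0, 2.0]

# Hypothetical scores for the ten classes Y = 0 ... Y = 9.
scores = [0.1, 0.3, 2.5, 0.0, -1.0, 0.7, 0.2, 4.0, 0.5, 1.1]
probabilities = softmax(scores)
prediction = max(range(len(probabilities)), key=probabilities.__getitem__)

for label, p in enumerate(probabilities):
    print(f"P(Y = {label}) = {p:.3f}")
print("Prediction:", prediction)  # the class with the highest score, here 7
```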
Demo

Apply convolution and pooling filters to images
Summary

Intuition behind Convolutional Neural Networks (CNNs)
Convolution layers and feature maps
Pooling layers to subsample inputs
Typical CNN architecture
