
Fundamentals of Deep Learning

Pontificia Universidad Católica del Perú


Summer Camp en IA
2025

Notes adapted from Dr. César Beltrán (PUCP) and Dr. Ivan Serina (UNIBS)
Review
Scalars

● A scalar is a single number

● Integers, real numbers, rational numbers, etc.


Vectors

● A vector is a 1-D array of numbers:

● Can be real, binary, integer, etc.


● Example notation for type and size:
Matrices

● Multiplications (matrix and vector)


Matrix (Dot) Product
Tensors

A tensor is an array of numbers that may have

● zero dimensions, and be a scalar

● one dimension, and be a vector

● two dimensions, and be a matrix

● or more dimensions.
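
As a quick illustration (not on the original slide), NumPy arrays of increasing rank mirror these cases; the values are arbitrary:

import numpy as np

scalar = np.array(3.5)                  # 0-D tensor: a scalar
vector = np.array([1.0, 2.0, 3.0])      # 1-D tensor: a vector
matrix = np.array([[1, 2], [3, 4]])     # 2-D tensor: a matrix
tensor3 = np.zeros((2, 3, 4))           # 3-D tensor
print(scalar.ndim, vector.ndim, matrix.ndim, tensor3.ndim)  # 0 1 2 3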
Review Scalar Derivative
Gradients
Chain Rule
Gradient Descent
Approximate Optimization
History Review
Mark I Perceptron
Frank Rosenblatt ~1958
Mark I Perceptron
The first page of Rosenblatt's
article, “The Design of an
Intelligent Automaton,” in
Research Trends, a Cornell
Aeronautical Laboratory
publication, Summer 1958.

An image of the perceptron from Rosenblatt's "The Design of an Intelligent Automaton," Summer 1958.

Rosenblatt and the perceptron. Images courtesy of Cornell Chronicle (2019).
Perceptron
Perceptron training rule
Adaline/Madaline
Widrow and Hoff ~1960

Adaptive Linear Neuron (Adaline)

https://www.youtube.com/watch?v=IEFRtz68m-8
Neocognitron: a self-organizing neural network
model for a mechanism of pattern recognition
unaffected by shift in position.
Fukushima K. 1980
https://www.youtube.com/watch?v=Qil4kmvm2Sw
Learning representations by back-
propagating errors
Rumelhart et al., 1986
Sigmoid unit
Cost Function
Gradient Descent

Each weight is adjusted by a small amount in the direction (positive or negative) opposite to the gradient, i.e. the direction that most reduces the error E.
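
A minimal sketch of this update rule (illustrative only; the error function E and the learning rate below are made up for the example):

# Minimize E(w) = (w - 3)^2 by repeatedly stepping against the gradient.
w = 0.0
learning_rate = 0.1
for _ in range(100):
    grad = 2 * (w - 3)                # dE/dw at the current w
    w = w - learning_rate * grad      # small step in the direction that reduces E
print(w)  # converges to ~3.0, the minimizer of E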
Backpropagation Algorithm
Gradient-based learning applied to document
recognition
Y. LeCun et al., 1998
Reducing the
Dimensionality of
Data with Neural
Networks
Hinton and Salakhutdinov 2006
Imagenet classification with deep convolutional
neural networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton, 2012
Classification

[Krizhevsky 2012]

Detection Segmentation

[Faster R-CNN: Ren, He, Girshick, Sun 2015] [Farabet et al., 2012]

Convolutional Neural
Networks
CNN
● The main task of a CNN architecture is feature extraction, performed through 2D or 3D convolution operations.

● A simple CNN framework involves four layer types: convolutional, activation, pooling, and fully connected.

LeNet
What is a convolution?

1 0 1
0 1 0
1 0 1

Kernel
Convolution Layer

[Figure: a 32x32x3 input image (32 width, 32 height, 3 depth)]

http://setosa.io/ev/image-kernels/
Convolution Layer

[Figure: convolving a 32x32x3 image with a 5x5x3 filter produces a 28x28x1 activation map]
Convolution Layer

[Figure: the 5x5x3 filter slides over the 32x32x3 image; each dot product gives one entry of the 28x28x1 activation map]
Convolution Layer: a second filter

[Figure: a second 5x5x3 filter convolved with the same 32x32x3 image produces a second 28x28x1 activation map]
Convolution Layer

[Figure: stacking the activation maps produced by all the filters]

If we have 6 filters, the result has shape 28x28x6.

Convolution Layer

[Figure: 32x32x3 input convolved into a 28x28x6 output]

● Kernel size = 5
● # kernels = 6
● padding = 0
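
A small shape check of this configuration (a sketch using Keras; the zero-filled input is just a stand-in for a real image):

import numpy as np
from tensorflow.keras.layers import Conv2D

x = np.zeros((1, 32, 32, 3), dtype="float32")              # batch of one 32x32x3 image
y = Conv2D(filters=6, kernel_size=5, padding="valid")(x)   # 6 kernels of size 5, padding 0
print(y.shape)  # (1, 28, 28, 6)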
7x7 input, 3x3 filter

[Figure: sliding a 3x3 filter over a 7x7 input at stride 1 => 5x5 output]
Padding

[Figure: 7x7 input, 3x3 filter, padding 1 (a border of zeros around the input) => 7x7 output!]

https://ezyang.github.io/convolution-visualizer/index.html
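
The same kind of shape check applies to padding (a sketch; the input is a dummy 7x7 single-channel image):

import numpy as np
from tensorflow.keras.layers import Conv2D

x = np.zeros((1, 7, 7, 1), dtype="float32")
print(Conv2D(1, 3, padding="same")(x).shape)   # (1, 7, 7, 1): zero padding keeps the 7x7 size
print(Conv2D(1, 3, padding="valid")(x).shape)  # (1, 5, 5, 1): no padding shrinks it to 5x5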
Pooling layer

Max Pooling

Single depth slice (x across, y down):

1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

max pool with 2x2 filters and stride 2:

6 8
3 4
Avg Pooling

Single depth slice (x across, y down):

1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

avg pool with 2x2 filters and stride 2:

3.25 5.25
2 2
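
Both results can be reproduced with Keras pooling layers on the same 4x4 slice (a sketch, reshaped to the (batch, height, width, channels) layout Keras expects):

import numpy as np
from tensorflow.keras.layers import MaxPool2D, AveragePooling2D

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype="float32").reshape(1, 4, 4, 1)

print(MaxPool2D(pool_size=2, strides=2)(x).numpy()[0, :, :, 0])         # [[6. 8.] [3. 4.]]
print(AveragePooling2D(pool_size=2, strides=2)(x).numpy()[0, :, :, 0])  # [[3.25 5.25] [2. 2.]]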
Activation Function
Fully Connected Layer

[Figure: the 32x32x3 input (3072 values) is stretched into a single 3072-dimensional vector and connected to every output neuron]
Fully Connected Layer
Keras code
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten

model = Sequential([
    Conv2D(16, 3, activation='relu', input_shape=(28, 28, 1)),
    MaxPool2D(),
    Conv2D(32, 3, activation='relu'),
    MaxPool2D(),
    Flatten(),
    Dense(10, activation='softmax')
])
Well-known architectures
LeNet-5
[LeCun et al., 1998]

Conv filters were 5x5, applied at stride 1.
Subsampling (pooling) layers were 2x2, applied at stride 2.
i.e. the architecture is [CONV-POOL-CONV-POOL-CONV-FC]
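
A rough Keras sketch of this CONV-POOL-CONV-POOL-CONV-FC pattern (the filter counts 6/16/120/84 follow the 1998 paper; the activations and pooling type here are simplifying assumptions):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, AveragePooling2D, Flatten, Dense

lenet = Sequential([
    Conv2D(6, 5, strides=1, activation='tanh', input_shape=(32, 32, 1)),
    AveragePooling2D(2, strides=2),
    Conv2D(16, 5, strides=1, activation='tanh'),
    AveragePooling2D(2, strides=2),
    Conv2D(120, 5, strides=1, activation='tanh'),
    Flatten(),
    Dense(84, activation='tanh'),
    Dense(10, activation='softmax')
])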
AlexNet
[Krizhevsky et al. 2012]

Full (simplified) AlexNet architecture:
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)

Details/Retrospectives:
- first use of ReLU
- used Norm layers (not common anymore)
- heavy data augmentation
- dropout 0.5
- batch size 128
- SGD Momentum 0.9
- Learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
- L2 weight decay 5e-4
- 7 CNN ensemble: 18.2% -> 15.4%
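
The layer list above can be transcribed almost directly into Keras (a simplified sketch: the normalization layers are omitted and the dropout placement is an assumption):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

alexnet = Sequential([
    Conv2D(96, 11, strides=4, activation='relu', input_shape=(227, 227, 3)),  # CONV1
    MaxPool2D(3, strides=2),                                                  # MAX POOL1
    Conv2D(256, 5, padding='same', activation='relu'),                        # CONV2 (pad 2)
    MaxPool2D(3, strides=2),                                                  # MAX POOL2
    Conv2D(384, 3, padding='same', activation='relu'),                        # CONV3
    Conv2D(384, 3, padding='same', activation='relu'),                        # CONV4
    Conv2D(256, 3, padding='same', activation='relu'),                        # CONV5
    MaxPool2D(3, strides=2),                                                  # MAX POOL3
    Flatten(),
    Dense(4096, activation='relu'), Dropout(0.5),                             # FC6
    Dense(4096, activation='relu'), Dropout(0.5),                             # FC7
    Dense(1000, activation='softmax')                                         # FC8
])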
VGGNet
[Simonyan and Zisserman, 2014]

Only 3x3 CONV stride 1, pad 1


and 2x2 MAX POOL stride 2

best model
7.3% top 5 error
GoogLeNet

[Szegedy et al., 2014]

Inception module

ILSVRC 2014 winner (6.7% top 5 error)


Inception module (Keras code)

from tensorflow.keras.layers import Conv2D, MaxPool2D, concatenate

# 1x1 convolution tower
tower_1 = Conv2D(64, 1, padding='same', activation='relu')(input_img)

# 1x1 convolution followed by a 3x3 convolution
tower_2 = Conv2D(64, 1, padding='same', activation='relu')(input_img)
tower_2 = Conv2D(64, 3, padding='same', activation='relu')(tower_2)

# 1x1 convolution followed by a 5x5 convolution
tower_3 = Conv2D(64, 1, padding='same', activation='relu')(input_img)
tower_3 = Conv2D(64, 5, padding='same', activation='relu')(tower_3)

# 3x3 max pooling followed by a 1x1 convolution
tower_4 = MaxPool2D(3, strides=(1, 1), padding='same')(input_img)
tower_4 = Conv2D(64, 1, padding='same', activation='relu')(tower_4)

# concatenate the four towers along the channel axis
output = concatenate([tower_1, tower_2, tower_3, tower_4], axis=3)


GoogLeNet
ResNet [He et al., 2015] ILSVRC 2015 winner (3.6% top 5 error)

[Figure: the 224x224x3 input is reduced to a spatial dimension of only 56x56 early in the network]
ResNet [He et al., 2015]
- Batch Normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used

ResNet [He et al., 2015]
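
The core idea is the shortcut connection; a minimal sketch of one basic residual block follows (the 56x56x64 shape and filter counts are illustrative assumptions, not taken from the slides):

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, Add

x_in = Input(shape=(56, 56, 64))
x = Conv2D(64, 3, padding='same')(x_in)
x = BatchNormalization()(x)
x = ReLU()(x)
x = Conv2D(64, 3, padding='same')(x)
x = BatchNormalization()(x)
x = Add()([x, x_in])   # identity shortcut: add the block input back in
x = ReLU()(x)
block = Model(x_in, x)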
YOLO [Redmon et al., 2016]
SqueezeNet
[Iandola et al., 2017]
Thank You!

Susan Palacios Salcedo


PhD Candidate,
PUCP
[email protected]
