Python Deep Learning
A practical introduction with Keras and TensorFlow 2
PATC Courses | Barcelona - February 2020
Jordi TORRES.AI
1
Slides for teaching with the book #PythonDL
https://torres.ai/python-deep-learning/
2
About these slides:
● Version: 0.8 (Barcelona, 31/01/2020)
○ Current draft of the slides for the book «Python Deep Learning».
○ Some slides contain text in English. Over time we will keep «polishing» the slides (but we believe they can already be used as they are, which is why we are sharing them now).
3
Course content
PART 1: INTRODUCTION
1. What is Deep Learning?
2. Work environment
3. Python and its libraries
PART 2: FUNDAMENTALS OF DEEP LEARNING
4. Densely connected neural networks
5. Neural networks in Keras
6. How a neural network is trained
7. Parameters and hyperparameters in neural networks
8. Convolutional neural networks
PART 3: DEEP LEARNING TECHNIQUES
9. Stages of a Deep Learning project
10. Data to train neural networks
11. Data Augmentation and Transfer Learning
12. Advanced neural network architectures
PART 4: GENERATIVE DEEP LEARNING
13. Recurrent neural networks
14. Generative Adversarial Networks
4
Book resources
● Book website:
https://torres.ai/python-deep-learning
● Book GitHub repository:
https://github.com/JordiTorresBCN/python-deep-learning
● Additional book material for download:
https://marketing.marcombo.com + the book's promotional code
5
PART 2: FUNDAMENTALS OF DEEP LEARNING
4. Densely connected neural networks
5. Neural networks in Keras
6. How a neural network is trained
7. Parameters and hyperparameters in neural networks
8. Convolutional neural networks
6
PART 2: FUNDAMENTALS OF DEEP LEARNING
4. Densely connected neural networks
5. Neural networks in Keras
6. How a neural network is trained
7. Parameters and hyperparameters in neural networks
8. Convolutional neural networks
7
Case study
● The MNIST database
○ A dataset of handwritten digits for classification
○ 60,000 28x28 grayscale images of the 10 digits, along with a test
set of 10,000 images.
8
Case study
● The MNIST database
○ Features: matrix of 28x28 pixels with values [0, 255]
○ Labels: values [0, 9]
9
Case study
10
Basic machine learning terminology
● Model: defines the relation between features and labels
y=wx+b
○ y: labels
○ x: features
○ w: weights
○ b: bias
11
Basic machine learning terminology
12
A simple artificial neuron
13
A simple artificial neuron
14
A simple artificial neuron
15
A simple artificial neuron
16
A simple artificial neuron
● The sigmoid function
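For reference, the sigmoid activation mentioned here is the standard one:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$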
17
18
Perceptron (schematic view)
19
Neural network
20
Multilayer perceptron
21
Multilayer perceptron
22
Multilayer perceptron for classification
Neural networks are often used for classification, specifically when the classes are mutually exclusive. In this case the output layer is a softmax function, in which the output of each neuron corresponds to the estimated probability of the corresponding class.
23
Softmax activation function
● The softmax function has two main steps:
○ first, the “evidence” that an image belongs to each label is computed,
○ and then that evidence is converted into probabilities for each possible label.
24
Evidence of belonging
● One way to measure the evidence that a certain image belongs to a particular class is to compute a weighted sum over its pixels, where each weight expresses how strongly that pixel supports membership in the class.
To explain the idea I will use a visual example ->
25
Evidence of belonging
Let’s suppose that we already have the model learned for the number zero (28x28):
● Pixels in red represent negative weights (i.e., they reduce the evidence that the image belongs to the class).
● Pixels in blue represent positive weights (i.e., they increase that evidence).
● Black represents the neutral value.
26
Evidence of belonging
• Let’s imagine that we trace a zero over it.
• In general, the trace of our zero would fall on the blue zone.
• It is quite evident that if our stroke goes over the red zone, it is most likely that we are not writing a zero;
• therefore, a metric based on adding when we pass through the blue zone and subtracting when we pass through the red zone seems reasonable.
27
28
Evidence of belonging
● To confirm that this is a good metric, let’s imagine now that we draw a three.
● It is clear that the red zone in the center of the previous model, the one we used for the zero, will penalize the aforementioned metric,
● since, as we can see in the figure, when writing a three we pass over it.
29
Evidence of belonging
On the other hand, if the reference model is the one corresponding to the number 3, we can see that, in general, the different possible traces that represent a three mostly stay within the blue zone.
30
Probability of belonging
● The second step involves computing probabilities.
● Specifically we turn the sum of evidences into predicted
probabilities using this function:
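The function being referred to is the standard softmax. Written out in the usual notation (the symbols are chosen here for illustration, not taken from the slide):

$$\text{softmax}(\mathbf{x})_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$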
● softmax takes the exponential of each calculated evidence value and then normalizes these values so that they sum to one, forming a probability distribution.
31
Probability of belonging
● Intuitively, the effect obtained with the use of
exponentials is that one more unit of
evidence has a multiplier effect and one unit
less has the inverse effect.
● The interesting thing about this function is that:
○ a good prediction will have a single entry in the vector with a value close to 1, while the remaining entries will be close to 0;
○ in a weak prediction, there will be several possible labels with more or less the same probability.
32
PART 2: FUNDAMENTALS OF DEEP LEARNING
4. Densely connected neural networks
5. Neural networks in Keras
6. How a neural network is trained
7. Parameters and hyperparameters in neural networks
8. Convolutional neural networks
33
Preparing the execution environment
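A minimal sketch of the kind of check this step involves, assuming TensorFlow 2 is installed in the environment (for example in Colab); the exact setup used in the course may differ:

```python
# Check that TensorFlow 2 and its bundled Keras are available.
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)      # expected to start with "2."
print(keras.__version__)
```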
34
Preloading the data in Keras
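A sketch of how the MNIST data can be preloaded with the Keras datasets API (variable names are illustrative):

```python
from tensorflow.keras.datasets import mnist

# Downloads (if needed) and loads the dataset:
# 60,000 training and 10,000 test 28x28 grayscale images with labels 0-9.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape, y_train.shape)   # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)     # (10000, 28, 28) (10000,)
```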
35
36
37
Preprocessing the input data for a neural network
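A hedged sketch of the usual preprocessing for a densely connected network: flatten each 28x28 image into a 784-element vector and scale pixel values to [0, 1] (the variable names follow the previous snippet and are assumptions, not the slide's exact code):

```python
# Flatten the images and normalize pixel values from [0, 255] to [0, 1].
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
```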
38
39
one-hot encoding
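A short sketch of one-hot encoding the labels with the Keras utility function (assuming the y_train / y_test arrays from the previous snippets):

```python
from tensorflow.keras.utils import to_categorical

# Turn each label 0-9 into a 10-element vector with a single 1,
# e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
```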
40
one-hot encoding
41
Model definition
(Figure annotations: number of neurons per layer, type of activation function)
- The keras.models.Sequential class is a wrapper for the neural network model
- Keras will automatically infer the shape of all layers after the first layer
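A sketch consistent with the notes above for the densely connected MNIST model of this chapter; the exact layer sizes and activations in the slide's figure may differ:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# Only the first layer needs the input shape; Keras infers the rest.
model.add(Dense(10, activation='sigmoid', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

model.summary()
```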
42
Model definition
43
Model definition
44
Configuring the learning process
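A hedged sketch of how the learning process can be configured with compile(); the loss, optimizer and metric shown are common choices for this case, not necessarily the exact ones in the slide's figure:

```python
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
```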
45
Configuring the learning process
46
Training the model
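A sketch of the training call (the epoch and batch-size values are illustrative, not taken from the slide):

```python
# Train for a number of epochs using mini-batches of 100 examples.
model.fit(x_train, y_train, batch_size=100, epochs=5)
```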
47
Evaluating the model
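A sketch of evaluating the trained model on the held-out test set:

```python
# Returns the loss and the metrics configured in compile().
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)
```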
48
Confusion matrix
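One way to build the confusion matrix, assuming scikit-learn is available (the slides themselves may compute it differently) and that y_test was one-hot encoded as in the earlier sketch:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Predicted class = index of the highest softmax probability;
# true class = index of the 1 in the one-hot encoded label.
predictions = model.predict(x_test)
y_pred = np.argmax(predictions, axis=1)
y_true = np.argmax(y_test, axis=1)

print(confusion_matrix(y_true, y_pred))   # 10x10 matrix for the digits 0-9
```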
49
Confusion matrix
50
Generating predictions
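A minimal sketch of generating a prediction for a single image (the index is chosen arbitrarily here):

```python
import numpy as np

# Probability distribution over the 10 classes for one test image.
probabilities = model.predict(x_test[11:12])
print(probabilities)
print('Predicted digit:', np.argmax(probabilities))
```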
51
Generating predictions
52
53
It is time to get
your hands dirty!
54
Homework: Fashion-MNIST dataset
55
Using the same model
Equivalent to numpy.reshape(…, 784), which gives a new shape to an array without changing its data.
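A sketch of reusing the same kind of pipeline with Fashion-MNIST; the Flatten layer is one way to obtain the reshaping the note above describes (reshaping the NumPy arrays directly is an alternative), and the sparse loss avoids one-hot encoding the integer labels. Layer sizes are illustrative:

```python
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = Sequential([
    # Flatten reshapes each 28x28 image into a 784-element vector
    # without changing its data.
    Flatten(input_shape=(28, 28)),
    Dense(10, activation='sigmoid'),
    Dense(10, activation='softmax')
])
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
```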
56
Motivation for next chapter
57
PART 2: FUNDAMENTALS OF DEEP LEARNING
4. Densely connected neural networks
5. Neural networks in Keras
6. How a neural network is trained
7. Parameters and hyperparameters in neural networks
8. Convolutional neural networks
58
A neural network is parameterized
59
Loss function
60
Optimizer
61
Learning process
62
How does Deep Learning work?
63
[Figure: TRAINING stage. Many (X, Y) example pairs feed a single neuron: inputs x1 … xn with weights w1j … wnj and bias bj produce z_j = Σ_i w_ij · x_i + b_j, and the output is y_j = σ(z_j) = 1 / (1 + e^(-z_j)).]
TRAINING stage
64
[Figure: same neuron diagram as the previous slide, repeated.]
TRAINING stage
65
[Figure: same neuron diagram; training tunes the weights W and the bias b.]
TRAINING stage
66
[Figure: the trained neuron applied to new data: inputs x1 … xn produce z_j = Σ_i w_ij · x_i + b_j and the prediction Y = y_j = σ(z_j).]
67
Learning process
68
Key pieces of the backpropagation process
69
Parameter adjustment: Gradient Descent
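For reference, the update rule behind gradient descent in standard notation, where η is the learning rate (not copied from the slide figures):

$$w \leftarrow w - \eta \, \frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta \, \frac{\partial L}{\partial b}$$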
70
Parameter adjustment: Gradient Descent
71
Parameter adjustment: Gradient Descent
72
Types of Gradient Descent
● How often are the parameter values adjusted?
○ Stochastic Gradient Descent
○ Batch Gradient Descent
○ Mini-Batch Gradient Descent
● SGD (with batches)
73
Loss function
74
Optimizers
● SGD, RMSprop, AdaGrad, Adadelta, Adam, Adamax, Nadam …
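In Keras, any of these optimizers can be passed to compile() either by name or as an object; a minimal sketch, assuming the model defined earlier (the learning-rate value is illustrative):

```python
from tensorflow.keras.optimizers import RMSprop

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['accuracy'])
```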
75
PART 2: FUNDAMENTALS OF DEEP LEARNING
4. Densely connected neural networks
5. Neural networks in Keras
6. How a neural network is trained
7. Parameters and hyperparameters in neural networks
8. Convolutional neural networks
76
Parameters and hyperparameters
● Parameter: A variable of a model that the DL system
trains on its own. For example, weights are parameters
whose values the DL system gradually learns through
successive training iterations.
● Hyperparameters: The "knobs" that you tweak during
successive runs of training a model.
77
Parameters and hyperparameters
● Hyperparameters at the level of the structure and topology of the neural network:
○ number of layers,
○ number of neurons per layer,
○ their activation functions,
○ weight initialization,
○ etc.
78
Parameters and hyperparameters
● Hyperparameters at the level of the learning algorithm:
○ epochs,
○ batch size,
○ learning rate,
○ momentum,
○ etc.
79
Epochs and batch size
● Epoch: a single training pass over all batches, including both forward and backward propagation. This means 1 epoch is one forward and backward pass over the entire input data.
● Batch size: the number of examples in a batch, i.e., the set of examples used in one single update of the model's weights during training.
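A small worked example of how the two quantities relate (MNIST-sized numbers, chosen for illustration):

```python
num_examples = 60_000   # size of the training set
batch_size = 100
updates_per_epoch = num_examples // batch_size   # 600 weight updates per epoch
epochs = 10
total_updates = updates_per_epoch * epochs       # 6,000 updates in total
```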
80
Learning rate and learning rate decay
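One way to set both in tf.keras, as a hedged sketch (the values and the exponential schedule are illustrative, not the slide's configuration):

```python
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers.schedules import ExponentialDecay

# Learning rate starts at 0.1 and is multiplied by 0.96 every 1,000 steps.
lr_schedule = ExponentialDecay(initial_learning_rate=0.1,
                               decay_steps=1000,
                               decay_rate=0.96)
optimizer = SGD(learning_rate=lr_schedule)
```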
81
Momentum
82
Activation functions
● Sigmoid
83
Activation functions
● Tanh
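For reference, the hyperbolic tangent activation is:

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$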
84
Activation functions
● ReLU
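For reference, the rectified linear unit is:

$$\text{ReLU}(z) = \max(0, z)$$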
85
TensorFlow Playground
86
TensorFlow Playground
87
TensorFlow Playground
88
TensorFlow Playground
89
Classification with a single neuron
90
91
Classification with more than one neuron
92
93
94
95
Classification with several layers
96
97
98
PART 2: FUNDAMENTALS OF DEEP LEARNING
4. Densely connected neural networks
5. Neural networks in Keras
6. How a neural network is trained
7. Parameters and hyperparameters in neural networks
8. Convolutional neural networks
99
Review: Deep Learning mathematical models
100
101
[Figure: a forward pass through the network outputs the probability of the class “face”.]
Deep Learning - Supervised Learning
102
“TRAINING” phase
forward
F(x) = 70%
103
“TRAINING” phase
forward
F(x) = 70%
BACKPROPAGATION
(update of model weights)
error = 30%
104
“TRAINING” phase
forward
F(x) = 80%
BACKPROPAGATION
(update of model weights)
error = 20%
105
“TRAINING” phase
FORWARD PROPAGATION
LOSS estimation
BACKPROPAGATION (update of model weights)
106
“INFERENCE” phase
107
“INFERENCE” phase
108
Convolutional Neural Networks
● A CNN makes an explicit assumption that the inputs are images.
● Channel
○ is a conventional term used to refer to a certain component of an image.
○ For an RGB color image → 3 channels
109
Convolutional Neural Networks
• Intuitive Explanation of CNN.
“TRAINING” phase
[Figure: learned features progress from edges to edge combinations to object models.]
110
Convolutional Neural Networks
• Intuitive Explanation of CNN.
“INFERENCE” phase
[Figure: learned features progress from edges to edge combinations to object models.]
111
Convolutional Neural Networks
• Intuitive Explanation of CNN.
“INFERENCE” phase
[Figure: learned features progress from edges to edge combinations to object models.]
112
Basic components of a CNN
● The convolution operation
● The pooling operation
● Classification (Fully Connected Layer)
113
Basic components of a CNN
● The convolution operation
● The pooling operation
● Classification (Fully Connected Layer)
114
The convolution operation
In a CNN, not all the neurons of one layer are connected to all the neurons of the next layer, as happens in fully connected neural networks; instead, each neuron is connected only to a small, localized region of the space of input neurons.
115
The convolution operation
Sliding window
Use the same filter (the same W matrix of
weights and the same bias b) for all the
neurons in the next layer
116
The convolution operation: visual example
Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
117
The convolution operation
● Many filters (one for each feature that we want to detect)
118
Basic components of a CNN
● The convolution operation
● The pooling operation
● Classification (Fully Connected Layer)
119
The pooling operation
○ max-pooling
○ average-pooling
120
The pooling operation
We slide our 2 × 2 window by 2 cells (also called the ‘stride’) and take the maximum value in each region.
Source: http://cs231n.github.io/convolutional-networks/
121
The pooling operation
● The pooling maintains the spatial relationship
122
Convolutional+Pooling layers: Summary
123
Implementation with the Keras API
124
125
Basic components of a CNN
● The convolution operation
● The pooling operation
● Classification (Fully Connected Layer)
126
A simple model
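A sketch of the kind of simple CNN this slide refers to, consistent with the parameter counts on the next slide; the exact architecture in the figure may differ:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (5, 5), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (5, 5), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())                        # 4 x 4 x 64 = 1024 values
model.add(Dense(10, activation='softmax'))

model.summary()   # parameter counts match the breakdown on the next slide
```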
127
128
Number of parameters
● conv2D: 32 × (25 + 1) = 832 (each of the 32 filters has 5 × 5 = 25 weights plus 1 bias)
● conv2D: ((5 × 5 × 32) + 1) × 64 = 51,264
● Dense: 10 × 1024 + 10 = 10,250
129
130
131
Hyperparameters of the convolutional layer
● Size and number of filters
○ The size of the window (window_height × window_width) that keeps information about the spatial relationship of pixels is usually 3×3 or 5×5.
○ The number of filters (output_depth) indicates the number of features and is usually 32 or 64.
Conv2D(output_depth, (window_height, window_width))
132
Hyperparameters of the convolutional layer
[Example: convolving a 5×5 input with a 3×3 window produces a 3×3 output image.]
133
Hyperparameters of the convolutional layer
● Padding
○ Sometimes we want to get an output image of the same dimensions as the input.
○ We can add zeros around the input image before sliding the window over it.
134
Hyperparameters of the convolutional layer
In TensorFlow, the padding in the Conv2D layer is configured with the padding argument, which can take two values:
“same” indicates that as many rows and columns of zeros are added as necessary, so that the output has the same dimensions as the input.
“valid” indicates no padding (this is the default value of the argument in Keras/TensorFlow).
135
Hyperparameters of the convolutional layer
● Stride: the number of cells the sliding window moves at each step
○ Ex: stride 2
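Both hyperparameters appear as arguments of Conv2D; a minimal sketch with illustrative values:

```python
from tensorflow.keras.layers import Conv2D

# 'same' padding keeps the 28x28 spatial size of the output;
# strides=2 makes the window jump 2 cells, roughly halving the output size.
conv_same = Conv2D(32, (3, 3), padding='same', input_shape=(28, 28, 1))
conv_strided = Conv2D(32, (3, 3), strides=2)
```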
136
It is time to get
your hands dirty!
137
Homework: Fashion-MNIST dataset
138
139
140
Layers and optimizers
141
142
Dropout and BatchNormalization layers
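A hedged sketch of where these layers typically go in a Keras model; the rates and placement are illustrative, not the slide's exact configuration:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Dropout,
                                     BatchNormalization)

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(BatchNormalization())   # normalizes the activations of the previous layer
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.25))          # randomly drops 25% of activations during training
```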
143
Learning rate decay
● The LearningRateScheduler callback
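A sketch of the callback mentioned on the slide; the decay schedule itself is illustrative:

```python
import math
from tensorflow.keras.callbacks import LearningRateScheduler

# Halve the learning rate every 10 epochs, starting from 0.01.
def schedule(epoch):
    return 0.01 * math.pow(0.5, epoch // 10)

lr_callback = LearningRateScheduler(schedule)
# model.fit(x_train, y_train, epochs=30, callbacks=[lr_callback])
```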
144