Introduction to Neural Networks and Deep Learning
Đỗ Trọng Hợp
Faculty of Information Science and Engineering
University of Information Technology, Ho Chi Minh City
Introduction to Deep Learning
[Figure: housing price prediction — the size of a house (x) is fed into a single neuron, which outputs the predicted price (y).]
Supervised learning
• Standard NN
• CNN
• RNN
• Hybrid
Scale drives deep learning progress
• Data
• Computation
• Algorithm
Binary classification
Logistic Regression cost function
Given input $x$, the prediction is $\hat{y} = \sigma(w^{T}x + b)$, where $\sigma(z) = \dfrac{1}{1 + e^{-z}}$.

The squared-error loss $\mathcal{L}(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^{2}$ is not used, because it makes the resulting optimization problem non-convex. Instead:

Loss (error) function: $\mathcal{L}(\hat{y}, y) = -\bigl(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\bigr)$
• If $y = 1$: $\mathcal{L}(\hat{y}, y) = -\log \hat{y}$ ← $\hat{y}$ should be close to 1
• If $y = 0$: $\mathcal{L}(\hat{y}, y) = -\log(1 - \hat{y})$ ← $\hat{y}$ should be close to 0
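To see this numerically, here is a minimal NumPy check of the loss at a few prediction values (the helper name cross_entropy_loss is just for illustration):

import numpy as np

def cross_entropy_loss(y_hat, y):
    # L(y_hat, y) = -(y*log(y_hat) + (1 - y)*log(1 - y_hat))
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# y = 1: small loss when y_hat is near 1, large loss when it is near 0
print(cross_entropy_loss(0.99, 1))   # ~0.01
print(cross_entropy_loss(0.10, 1))   # ~2.30

# y = 0: small loss when y_hat is near 0, large loss when it is near 1
print(cross_entropy_loss(0.01, 0))   # ~0.01
print(cross_entropy_loss(0.90, 0))   # ~2.30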
Repeat {
    $w := w - \alpha \dfrac{\partial J(w, b)}{\partial w}$
    $b := b - \alpha \dfrac{\partial J(w, b)}{\partial b}$
}
Computation Graph
$J(a, b, c) = 3(a + bc)$ is computed in three steps: $u = bc$, $v = a + u$, $J = 3v$.

With $a = 5$, $b = 3$, $c = 2$: $u = bc = 6$, $v = a + u = 11$, $J = 3v = 33$.
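A minimal Python sketch of this forward pass through the computation graph (variable names mirror the slide):

a, b, c = 5, 3, 2
u = b * c      # u = 6
v = a + u      # v = 11
J = 3 * v      # J = 33
print(u, v, J)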
Derivatives with Computation Graphs
Chain rule: if $F(x) = f(g(x))$, then $F'(x) = f'(g(x)) \cdot g'(x)$.

Applying it backwards through the graph ($a = 5$, $b = 3$, $c = 2$, so $u = 6$, $v = 11$, $J = 33$):

$\dfrac{\partial J}{\partial v} = 3$
$\dfrac{\partial J}{\partial a} = \dfrac{\partial J}{\partial v}\dfrac{\partial v}{\partial a} = 3 \cdot 1 = 3$
$\dfrac{\partial J}{\partial u} = \dfrac{\partial J}{\partial v}\dfrac{\partial v}{\partial u} = 3 \cdot 1 = 3$
$\dfrac{\partial J}{\partial b} = \dfrac{\partial J}{\partial u}\dfrac{\partial u}{\partial b} = 3 \cdot 2 = 6$
$\dfrac{\partial J}{\partial c} = \dfrac{\partial J}{\partial u}\dfrac{\partial u}{\partial c} = 3 \cdot 3 = 9$
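The same backward pass in a minimal Python sketch:

a, b, c = 5, 3, 2
u = b * c
v = a + u
J = 3 * v

dv = 3.0         # dJ/dv
da = dv * 1.0    # dJ/da = dJ/dv * dv/da = 3
du = dv * 1.0    # dJ/du = dJ/dv * dv/du = 3
db = du * c      # dJ/db = dJ/du * du/db = 3*2 = 6
dc = du * b      # dJ/dc = dJ/du * du/dc = 3*3 = 9
print(da, du, db, dc)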
Logistic Regression Gradient Descent
Minimize ℒ
For a single training example, with $z = w^{T}x + b$ and $a = \sigma(z)$:

$da = \dfrac{\partial \mathcal{L}}{\partial a} = -\dfrac{y}{a} + \dfrac{1 - y}{1 - a}$
$dz = \dfrac{\partial \mathcal{L}}{\partial z} = \dfrac{\partial \mathcal{L}}{\partial a}\dfrac{\partial a}{\partial z} = da \cdot a(1 - a) = a - y$
$dw_1 = \dfrac{\partial \mathcal{L}}{\partial w_1} = \dfrac{\partial \mathcal{L}}{\partial z}\dfrac{\partial z}{\partial w_1} = x_1\, dz$,  $dw_2 = x_2\, dz$,  $db = dz$

Update: $w_1 := w_1 - \alpha\, dw_1$,  $w_2 := w_2 - \alpha\, dw_2$,  $b := b - \alpha\, db$
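A minimal NumPy sketch of one such gradient step on a single example (the inputs x1, x2, y, the initial parameters, and the learning rate are made-up values for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x1, x2, y = 1.0, 2.0, 1          # one training example (made up)
w1, w2, b = 0.1, -0.2, 0.0       # initial parameters (made up)
alpha = 0.1

z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

dz = a - y          # dL/dz
dw1 = x1 * dz       # dL/dw1
dw2 = x2 * dz       # dL/dw2
db = dz             # dL/db

w1 -= alpha * dw1
w2 -= alpha * dw2
b -= alpha * db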
Gradient Descent on m Examples
$\dfrac{\partial J(w, b)}{\partial w_j} = \dfrac{1}{m}\sum_{i=1}^{m} \dfrac{\partial \mathcal{L}(a^{(i)}, y^{(i)})}{\partial w_j} = \dfrac{1}{m}\sum_{i=1}^{m} dw_j^{(i)}$
Gradient Descent on m Examples
# accumulate cost and gradients over the m examples (two features x1, x2)
J = 0; dw1 = 0; dw2 = 0; db = 0
for i in range(m):
    z = w1 * x1[i] + w2 * x2[i] + b                       # z(i) = w^T x(i) + b
    a = sigmoid(z)                                        # a(i) = sigma(z(i))
    J += -(y[i] * np.log(a) + (1 - y[i]) * np.log(1 - a))
    dz = a - y[i]                                         # dz(i) = a(i) - y(i)
    dw1 += x1[i] * dz
    dw2 += x2[i] * dz
    db += dz
J /= m
dw1 /= m
dw2 /= m
db /= m
Vectorizing Logistic Regression
dw = np.zeros((n_x, 1))    # one vector instead of dw1, dw2, ...
dw += x(i) * dz(i)         # inside the loop over the m examples
dw /= m                    # after the loop
Vectorizing Logistic Regression
Z = np.dot(w.T, X) + b       # forward: shape (1, m)
A = sigmoid(Z)               # predictions
dZ = A - Y                   # gradient of the loss w.r.t. Z
dw = np.dot(X, dZ.T) / m     # gradient w.r.t. w, shape (n_x, 1)
db = np.sum(dZ) / m          # gradient w.r.t. b
w = w - alpha * dw           # gradient-descent update
b = b - alpha * db
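Putting the pieces together, a minimal end-to-end training loop might look like the sketch below; the toy data X, Y, the learning rate, and the number of iterations are assumptions for illustration:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, m = 3, 100                               # feature count, number of examples (made up)
rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))             # toy inputs
Y = (rng.random((1, m)) > 0.5).astype(float)  # toy labels

w = np.zeros((n_x, 1))
b = 0.0
alpha = 0.1

for _ in range(1000):
    Z = np.dot(w.T, X) + b                    # (1, m)
    A = sigmoid(Z)
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m                  # (n_x, 1)
    db = np.sum(dZ) / m
    w = w - alpha * dw
    b = b - alpha * db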
One hidden layer Neural Network
What is a neural network?
Notation: the superscript $[l]$ indexes the layer and the subscript $i$ indexes the node within that layer, so $a_i^{[l]}$ is the activation of node $i$ in layer $l$.

For the first hidden layer:
$z_1^{[1]} = w_1^{[1]T}x + b_1^{[1]}$,  $a_1^{[1]} = \sigma(z_1^{[1]})$
$z_2^{[1]} = w_2^{[1]T}x + b_2^{[1]}$,  $a_2^{[1]} = \sigma(z_2^{[1]})$
and similarly for the remaining nodes of the layer.
Stacking the four hidden units into one matrix operation ($W^{[1]}$ has shape (4, 3)):

$z^{[1]} = \begin{bmatrix} w_1^{[1]T} \\ w_2^{[1]T} \\ w_3^{[1]T} \\ w_4^{[1]T} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} b_1^{[1]} \\ b_2^{[1]} \\ b_3^{[1]} \\ b_4^{[1]} \end{bmatrix} = W^{[1]}x + b^{[1]}$
Shapes for a network with 3 inputs, 4 hidden units and 1 output:
$z^{[1]} = W^{[1]}x + b^{[1]}$: (4, 1) = (4, 3)(3, 1) + (4, 1),  $a^{[1]} = \sigma(z^{[1]})$: (4, 1)
$z^{[2]}$ and $a^{[2]}$: (1, 1)
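A quick NumPy sketch confirming these shapes (random values, purely illustrative):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.random.randn(3, 1)     # 3 input features
W1 = np.random.randn(4, 3)    # 4 hidden units
b1 = np.zeros((4, 1))
W2 = np.random.randn(1, 4)    # 1 output unit
b2 = np.zeros((1, 1))

z1 = W1 @ x + b1              # (4, 1)
a1 = sigmoid(z1)              # (4, 1)
z2 = W2 @ a1 + b2             # (1, 1)
a2 = sigmoid(z2)              # (1, 1)
print(z1.shape, a1.shape, z2.shape, a2.shape)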
tanh: $a = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
With a linear activation function, the output of every layer is still just a linear combination of the input, so adding hidden layers (even many of them) does not increase the expressive power of the network, as the short check below illustrates.
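A minimal sketch demonstrating the collapse: two stacked linear layers are equivalent to a single linear layer with W = W2 @ W1 and b = W2 @ b1 + b2 (random matrices, purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))
W1 = rng.standard_normal((4, 3)); b1 = rng.standard_normal((4, 1))
W2 = rng.standard_normal((1, 4)); b2 = rng.standard_normal((1, 1))

# two layers with the identity (linear) activation
out_two_layers = W2 @ (W1 @ x + b1) + b2

# one equivalent linear layer
W = W2 @ W1
b = W2 @ b1 + b2
out_one_layer = W @ x + b

print(np.allclose(out_two_layers, out_one_layer))   # True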
Derivatives of activation function
Sigmoid: $g'(z) = g(z)\bigl(1 - g(z)\bigr) = a(1 - a)$

tanh: $a = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$,  $g'(z) = 1 - \bigl(g(z)\bigr)^{2} = 1 - a^{2}$

ReLU: $g'(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases}$

Leaky ReLU: $g'(z) = \begin{cases} 0.01 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases}$
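A small NumPy sketch checking the sigmoid and tanh derivative formulas against a finite-difference approximation:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = 0.7
eps = 1e-6

a = sigmoid(z)
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(a * (1 - a), numeric)    # both ~0.2217

a = np.tanh(z)
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)
print(1 - a**2, numeric)       # both ~0.6347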
Gradient descent for neural network
For a two-layer network with $n^{[0]}$ inputs, $n^{[1]}$ hidden units and $n^{[2]} = 1$ output unit:

Parameters: $W^{[1]}$ of shape $(n^{[1]}, n^{[0]})$, $b^{[1]}$ of shape $(n^{[1]}, 1)$, $W^{[2]}$ of shape $(n^{[2]}, n^{[1]})$, $b^{[2]}$ of shape $(n^{[2]}, 1)$.

Cost function: $J(W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}) = \dfrac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat{y}^{(i)}, y^{(i)})$

Repeat {
    Forward: compute $z^{[1]}, a^{[1]}, z^{[2]}, a^{[2]}$
    Backward: $dW^{[2]} = \dfrac{\partial J}{\partial W^{[2]}}$, $db^{[2]} = \dfrac{\partial J}{\partial b^{[2]}}$, $dW^{[1]} = \dfrac{\partial J}{\partial W^{[1]}}$, $db^{[1]} = \dfrac{\partial J}{\partial b^{[1]}}$
    Update: $W^{[l]} := W^{[l]} - \alpha\, dW^{[l]}$, $b^{[l]} := b^{[l]} - \alpha\, db^{[l]}$
}

For a single example, with a sigmoid output unit and hidden-layer activation $g^{[1]}$:

$da^{[2]} = \dfrac{\partial \mathcal{L}}{\partial a^{[2]}} = -\dfrac{y}{a^{[2]}} + \dfrac{1 - y}{1 - a^{[2]}}$
$dz^{[2]} = \dfrac{\partial \mathcal{L}}{\partial z^{[2]}} = da^{[2]} \cdot a^{[2]}(1 - a^{[2]}) = a^{[2]} - y$
$dW^{[2]} = dz^{[2]}\, a^{[1]T}$,  $db^{[2]} = dz^{[2]}$
$da^{[1]} = W^{[2]T} dz^{[2]}$   (shape $(n^{[1]}, 1)$)
$dz^{[1]} = da^{[1]} * g^{[1]\prime}(z^{[1]}) = W^{[2]T} dz^{[2]} * g^{[1]\prime}(z^{[1]})$   (elementwise product)
$dW^{[1]} = dz^{[1]} x^{T}$,  $db^{[1]} = dz^{[1]}$
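A minimal NumPy sketch of these formulas for a single example; tanh as the hidden activation, the layer sizes, and the random toy values are assumptions for illustration:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
n0, n1 = 3, 4                          # input size, hidden size (made up)
x = rng.standard_normal((n0, 1))
y = 1.0

W1 = rng.standard_normal((n1, n0)) * 0.01; b1 = np.zeros((n1, 1))
W2 = rng.standard_normal((1, n1)) * 0.01;  b2 = np.zeros((1, 1))

# forward pass
z1 = W1 @ x + b1
a1 = np.tanh(z1)                       # g^[1] = tanh
z2 = W2 @ a1 + b2
a2 = sigmoid(z2)

# backward pass for this one example
dz2 = a2 - y                           # (1, 1)
dW2 = dz2 @ a1.T                       # (1, n1)
db2 = dz2
da1 = W2.T @ dz2                       # (n1, 1)
dz1 = da1 * (1 - a1 ** 2)              # elementwise, tanh'(z) = 1 - a^2
dW1 = dz1 @ x.T                        # (n1, n0)
db1 = dz1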
If you initialize all the weights to zero, every row of $W^{[1]}$ stays identical: all hidden units compute the same function and receive the same gradient, so the symmetry is never broken. Initialize the weights to small random values instead; the biases can start at zero.
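A minimal sketch of this random initialization (the factor 0.01 keeps the pre-activations small so sigmoid/tanh do not saturate; the layer sizes are illustrative):

import numpy as np

n_x, n_h, n_y = 3, 4, 1                  # example layer sizes

W1 = np.random.randn(n_h, n_x) * 0.01    # small random values break the symmetry
b1 = np.zeros((n_h, 1))                  # biases can safely start at zero
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))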
Deep L-Layer Neural Network
Deep neural network notation
Forward Propagation in a Deep Network
Implement forward propagation
def forwardprop(A_prev, W, b, activation):
    Z = np.dot(W, A_prev) + b        # linear step for this layer
    if activation == "sigmoid":
        A = sigmoid(Z)
    elif activation == "relu":
        A = relu(Z)
    elif activation == "tanh":
        A = tanh(Z)
    cache = (Z, W)                   # kept for the backward pass
    return A, cache
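A sketch of how this building block might be chained over L layers; storing the parameters in lists W[l], b[l] and the helper name L_model_forward are assumptions for illustration (ReLU in the hidden layers, sigmoid at the output):

def L_model_forward(X, W, b):        # W, b: lists of length L + 1, index 0 unused
    L = len(W) - 1
    caches = []
    A = X                            # A^[0] = X
    for l in range(1, L):
        A, cache = forwardprop(A, W[l], b[l], "relu")
        caches.append(cache)
    AL, cache = forwardprop(A, W[L], b[L], "sigmoid")
    caches.append(cache)
    return AL, caches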
Implement forward propagation
[Diagram: the forward pass starts from A[0] = X, goes through ReLU layers 1 … L−1 and a sigmoid output layer L to produce ŷ; the backward pass runs in the opposite direction, producing dA[l], dZ[l], dW[l], db[l] for each layer.]
Vectorization of Backward Propagation in a Deep Network
Implement backward propagation
def backprop(dA, cache, activation):
    Z, W = cache
    if activation == "relu":
        dZ = relu_backward(dA, Z)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, Z)
    return dZ

# the full model then walks the layers in reverse:
# for l in reversed(range(L - 1)):
#     ...
Implement backward propagation
ReLU: $a = \max(0, z)$,  $g'(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases}$

def relu_backward(dA, Z):
    dZ = np.array(dA, copy=True)   # copy so the incoming dA is not modified
    dZ[Z <= 0] = 0
    return dZ

Sigmoid: $g'(z) = g(z)\bigl(1 - g(z)\bigr) = a(1 - a)$

def sigmoid_backward(dA, Z):
    A = 1 / (1 + np.exp(-Z))
    dZ = dA * A * (1 - A)
    return dZ

tanh: $a = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$,  $g'(z) = 1 - \bigl(g(z)\bigr)^{2} = 1 - a^{2}$

def tanh_backward(dA, Z):          # wrapper name added for symmetry with the two above
    A = np.tanh(Z)                 # the tanh activation itself, not the sigmoid
    dZ = dA * (1 - A * A)
    return dZ
Implement backward propagation
# linear part of the backward step: dZ comes from the activation backward,
# A_prev and W from the layer's forward pass
m = A_prev.shape[1]
dW = np.dot(dZ, A_prev.T) / m
db = dZ.sum(axis=1, keepdims=True) / m
dA_prev = np.dot(W.T, dZ)
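One full layer's backward step might combine the activation backward and the linear backward above as in this sketch; the helper name linear_activation_backward and passing A_prev, W, Z explicitly (rather than through the (Z, W) cache used earlier) are assumptions for illustration:

def linear_activation_backward(dA, A_prev, W, Z, activation):
    if activation == "relu":
        dZ = relu_backward(dA, Z)
    else:                                   # "sigmoid"
        dZ = sigmoid_backward(dA, Z)
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = dZ.sum(axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db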
Implementation methodology
Backward Propagation in a Deep Network
Intuition about deep representation
Building blocks of deep neural networks
Implement cost function
Cross-entropy cost function: $J(w, b) = \dfrac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\dfrac{1}{m}\sum_{i=1}^{m}\Bigl[ y^{(i)}\log \hat{y}^{(i)} + (1 - y^{(i)})\log(1 - \hat{y}^{(i)}) \Bigr]$

def compute_cost(A, Y):                # wrapper completed around the slide's fragment
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    return cost
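Quick usage check on a tiny made-up batch:

import numpy as np

A = np.array([[0.9, 0.2, 0.7]])    # predictions for m = 3 examples
Y = np.array([[1.0, 0.0, 1.0]])    # true labels
print(compute_cost(A, Y))          # (0.105 + 0.223 + 0.357) / 3 ≈ 0.228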
Future topics
• Sequence Models