
Introduction to Neural Network and Deep Learning

Đỗ Trọng Hợp (Trong-Hop Do)
Faculty of Information Science and Engineering
University of Information Technology, Ho Chi Minh City

Introduction to Deep Learning

[Figure: housing price prediction with a single neuron: the input x (size) feeds one neuron whose output y is the price.]
Supervised learning

Standard NN

CNN

RNN

Hybrid

Scale drives deep learning progress

• Data

• Computation

• Algorithm

Binary classification

Logistic Regression cost function

$\hat{y} = \sigma(w^T x + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$

Given $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, we want $\hat{y}^{(i)} \approx y^{(i)}$.

The squared-error loss $\mathcal{L}(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$ is not used, because it makes the optimization problem non-convex. Instead:

Loss (error) function: $\mathcal{L}(\hat{y}, y) = -\left( y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right)$

• If $y = 1$: $\mathcal{L}(\hat{y}, y) = -\log \hat{y}$ ← $\hat{y}$ should be close to 1
• If $y = 0$: $\mathcal{L}(\hat{y}, y) = -\log(1 - \hat{y})$ ← $\hat{y}$ should be close to 0

Cross-entropy cost function:

$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]$
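To make the formulas concrete, a minimal NumPy sketch of the prediction and the cross-entropy cost (the names sigmoid and cross_entropy_cost and the toy numbers are illustrative, not from the slides):

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(w, b, X, Y):
    # X: (n_x, m) inputs, Y: (1, m) labels, w: (n_x, 1), b: scalar
    m = X.shape[1]
    Y_hat = sigmoid(np.dot(w.T, X) + b)                      # (1, m) predictions
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    return np.sum(losses) / m

# Toy check: two examples, two features
X = np.array([[1.0, 2.0], [0.5, -1.0]])
Y = np.array([[1, 0]])
w = np.zeros((2, 1)); b = 0.0
print(cross_entropy_cost(w, b, X, Y))                        # log(2) ≈ 0.693 for an uninformed model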
Gradient Descent

The cost $J(w)$ is convex, so gradient descent converges toward the global optimum ($\alpha$ is the learning rate). Where the slope $\frac{dJ(w)}{dw} < 0$, the update increases $w$; where the slope is positive, the update decreases $w$.

Repeat {
    $w := w - \alpha \frac{dJ(w)}{dw}$
}

With both parameters $w$ and $b$:

Repeat {
    $w := w - \alpha \frac{\partial J(w, b)}{\partial w}$
    $b := b - \alpha \frac{\partial J(w, b)}{\partial b}$
}
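A minimal sketch of this update rule on a simple convex function (a 1-D example chosen here for illustration, not from the slides):

# Minimize J(w) = (w - 3)^2 with gradient descent; dJ/dw = 2*(w - 3)
w = 0.0
alpha = 0.1                      # learning rate
for _ in range(100):
    dJ_dw = 2.0 * (w - 3.0)      # derivative of the cost at the current w
    w = w - alpha * dJ_dw        # gradient descent update
print(w)                         # ≈ 3.0, the global optimum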
Computation Graph

$J(a, b, c) = 3(a + bc)$ is computed in three steps:

$u = bc$,   $v = a + u$,   $J = 3v$

Forward pass with $a = 5$, $b = 3$, $c = 2$:   $u = 6$,   $v = 11$,   $J = 33$
Derivatives with Computation Graphs

Chain rule: if $F(x) = f(g(x))$, then $F'(x) = f'(g(x)) \, g'(x)$.

Backward pass through $u = bc$, $v = a + u$, $J = 3v$ with $a = 5$, $b = 3$, $c = 2$:

$\frac{\partial J}{\partial v} = 3$
$\frac{\partial J}{\partial a} = \frac{\partial J}{\partial v} \frac{\partial v}{\partial a} = 3 \cdot 1 = 3$
$\frac{\partial J}{\partial u} = \frac{\partial J}{\partial v} \frac{\partial v}{\partial u} = 3 \cdot 1 = 3$
$\frac{\partial J}{\partial b} = \frac{\partial J}{\partial u} \frac{\partial u}{\partial b} = 3 \cdot 2 = 6$
$\frac{\partial J}{\partial c} = \frac{\partial J}{\partial u} \frac{\partial u}{\partial c} = 3 \cdot 3 = 9$
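A small sketch of this forward and backward pass in plain Python (the variable names mirror the graph above):

# Forward pass through J = 3 * (a + b*c)
a, b, c = 5.0, 3.0, 2.0
u = b * c            # u = 6
v = a + u            # v = 11
J = 3 * v            # J = 33

# Backward pass (chain rule), from the output back to the inputs
dJ_dv = 3.0
dJ_da = dJ_dv * 1.0      # dv/da = 1      -> 3
dJ_du = dJ_dv * 1.0      # dv/du = 1      -> 3
dJ_db = dJ_du * c        # du/db = c = 2  -> 6
dJ_dc = dJ_du * b        # du/dc = b = 3  -> 9
print(J, dJ_da, dJ_db, dJ_dc)    # 33.0 3.0 6.0 9.0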
Logistic Regression gradient descent

For one example, with $z = w_1 x_1 + w_2 x_2 + b$ and $a = \sigma(z)$, minimize $\mathcal{L}(a, y)$:

$da = \frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}$

$dz = \frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial a} \frac{\partial a}{\partial z} = da \cdot a(1 - a) = a - y$

$dw_1 = \frac{\partial \mathcal{L}}{\partial w_1} = \frac{\partial \mathcal{L}}{\partial z} \frac{\partial z}{\partial w_1} = x_1 \, dz$,   $dw_2 = x_2 \, dz$,   $db = dz$

Updates: $w_1 := w_1 - \alpha \, dw_1$,   $w_2 := w_2 - \alpha \, dw_2$,   $b := b - \alpha \, db$
Gradient Descent on m Examples

The gradient of the cost is the average of the per-example gradients:

$\frac{\partial J(w, b)}{\partial w_1} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial \mathcal{L}(a^{(i)}, y^{(i)})}{\partial w_1} = \frac{1}{m} \sum_{i=1}^{m} dw_1^{(i)}$   (and likewise for $w_2$ and $b$)
Gradient Descent on m Examples

One pass over the training set with explicit loops (two features x1, x2):

J = 0; dw1 = 0; dw2 = 0; db = 0
for i in range(m):
    z[i] = w1*x1[i] + w2*x2[i] + b
    a[i] = sigmoid(z[i])
    J += -(y[i]*np.log(a[i]) + (1 - y[i])*np.log(1 - a[i]))
    dz[i] = a[i] - y[i]             # dz = a - y
    dw1 += x1[i]*dz[i]              # accumulate per-example gradients
    dw2 += x2[i]*dz[i]
    db += dz[i]
J /= m
dw1 /= m
dw2 /= m
db /= m
Vectorizing Logistic Regression

Replace the per-feature accumulators (dw1, dw2, ...) with a single vector dw:

dw = np.zeros((n_x, 1))
...
dw += x^(i) * dz^(i)      # inside the loop over examples: x^(i) is (n_x, 1), dz^(i) is a scalar
...
dw /= m
Vectorizing Logistic Regression

One fully vectorized iteration of gradient descent, with X of shape (n_x, m) and Y of shape (1, m):

Z = np.dot(w.T, X) + b        # (1, m)
A = sigmoid(Z)                # (1, m) predictions
dZ = A - Y                    # (1, m)
dw = 1/m*np.dot(X, dZ.T)      # (n_x, 1)
db = 1/m*np.sum(dZ)           # scalar

w = w - alpha*dw
b = b - alpha*db
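Wrapping the vectorized iteration above into a full training loop might look like the sketch below (the function name train_logistic_regression and the alpha/num_iterations defaults are illustrative, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, Y, alpha=0.01, num_iterations=1000):
    # X: (n_x, m), Y: (1, m)
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0
    for _ in range(num_iterations):
        Z = np.dot(w.T, X) + b          # forward pass
        A = sigmoid(Z)
        dZ = A - Y                      # backward pass
        dw = np.dot(X, dZ.T) / m
        db = np.sum(dZ) / m
        w = w - alpha * dw              # gradient descent update
        b = b - alpha * db
    return w, b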
One hidden layer Neural Network

What is a neural network?
Notation: $a_i^{[l]}$, where the superscript $[l]$ is the layer and the subscript $i$ is the node within that layer.

First two nodes of the hidden layer:

$z_1^{[1]} = w_1^{[1]T} x + b_1^{[1]}$,   $a_1^{[1]} = \sigma(z_1^{[1]})$
$z_2^{[1]} = w_2^{[1]T} x + b_2^{[1]}$,   $a_2^{[1]} = \sigma(z_2^{[1]})$
Vectorizing across the nodes of layer 1 (here 4 hidden units and 3 inputs):

$z^{[1]} = W^{[1]} x + b^{[1]}$, where $W^{[1]}$ of shape (4,3) stacks the row vectors $w_1^{[1]T}, w_2^{[1]T}, w_3^{[1]T}, w_4^{[1]T}$, $x$ is (3,1), and $b^{[1]} = (b_1^{[1]}, b_2^{[1]}, b_3^{[1]}, b_4^{[1]})^T$ is (4,1).

Shapes of the full forward pass:

$z^{[1]} = W^{[1]} x + b^{[1]}$:   (4,1) = (4,3)(3,1) + (4,1)
$a^{[1]} = \sigma(z^{[1]})$:   (4,1)
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$:   (1,1) = (1,4)(4,1) + (1,1)
$a^{[2]} = \sigma(z^{[2]})$:   (1,1)
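A shape check of this forward pass in NumPy (random parameters, purely to illustrate the dimensions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
x  = np.random.randn(3, 1)                          # one input example, 3 features
W1 = np.random.randn(4, 3); b1 = np.zeros((4, 1))   # hidden layer: 4 units
W2 = np.random.randn(1, 4); b2 = np.zeros((1, 1))   # output layer: 1 unit

z1 = np.dot(W1, x) + b1              # (4,1)
a1 = sigmoid(z1)                     # (4,1)
z2 = np.dot(W2, a1) + b2             # (1,1)
a2 = sigmoid(z2)                     # (1,1), the prediction y_hat
print(z1.shape, a1.shape, z2.shape, a2.shape)       # (4, 1) (4, 1) (1, 1) (1, 1)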
Activation functions:

tanh: $a = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
ReLU: $a = \max(0, z)$
Leaky ReLU: $a = \max(0.01 z, z)$
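The forward-propagation code later in these slides calls sigmoid, relu, and tanh without defining them; minimal NumPy versions (with leaky_relu added for completeness) might look like:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)                      # (e^z - e^-z) / (e^z + e^-z)

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z):
    return np.maximum(0.01 * z, z)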
With a linear activation function, the output of every later layer is still just a linear combination of the inputs, so the hidden layers (even many of them) add nothing: the whole network collapses into a single linear model. This is why hidden layers need a non-linear activation.
Derivatives of activation functions

sigmoid: $g(z) = \frac{1}{1 + e^{-z}}$,   $g'(z) = g(z)\big(1 - g(z)\big) = a(1 - a)$

tanh: $a = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$,   $g'(z) = 1 - \big(g(z)\big)^2 = 1 - a^2$

ReLU: $a = \max(0, z)$,   $g'(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \geq 0 \end{cases}$

Leaky ReLU: $a = \max(0.01 z, z)$,   $g'(z) = \begin{cases} 0.01 & \text{if } z < 0 \\ 1 & \text{if } z \geq 0 \end{cases}$
Gradient descent for neural network

Layer sizes: $n^{[0]}$ (input), $n^{[1]}$ (hidden), $n^{[2]} = 1$ (output).
Parameters: $W^{[1]}$ of shape $(n^{[1]}, n^{[0]})$, $b^{[1]}$ of shape $(n^{[1]}, 1)$, $W^{[2]}$ of shape $(n^{[2]}, n^{[1]})$, $b^{[2]}$ of shape $(n^{[2]}, 1)$.

Cost function: $J(W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)})$

Repeat {
    Forward propagation: compute $z^{[1]}, a^{[1]}, z^{[2]}, a^{[2]} = \hat{y}$
    $dW^{[2]} = \frac{\partial J}{\partial W^{[2]}}$,   $db^{[2]} = \frac{\partial J}{\partial b^{[2]}}$,   $dW^{[1]} = \frac{\partial J}{\partial W^{[1]}}$,   $db^{[1]} = \frac{\partial J}{\partial b^{[1]}}$
    $W^{[2]} := W^{[2]} - \alpha \, dW^{[2]}$,   $b^{[2]} := b^{[2]} - \alpha \, db^{[2]}$
    $W^{[1]} := W^{[1]} - \alpha \, dW^{[1]}$,   $b^{[1]} := b^{[1]} - \alpha \, db^{[1]}$
}
Gradient descent for neural network (backward pass, one example)

$da^{[2]} = \frac{\partial \mathcal{L}}{\partial a^{[2]}} = -\frac{y}{a^{[2]}} + \frac{1 - y}{1 - a^{[2]}}$

$dz^{[2]} = \frac{\partial \mathcal{L}}{\partial z^{[2]}} = \frac{\partial \mathcal{L}}{\partial a^{[2]}} \frac{\partial a^{[2]}}{\partial z^{[2]}} = da^{[2]} \, a^{[2]}(1 - a^{[2]}) = a^{[2]} - y$

$dW^{[2]} = \frac{\partial \mathcal{L}}{\partial W^{[2]}} = \frac{\partial \mathcal{L}}{\partial z^{[2]}} \frac{\partial z^{[2]}}{\partial W^{[2]}} = dz^{[2]} \, a^{[1]T}$   (shape $(1, n^{[1]})$)

$db^{[2]} = \frac{\partial \mathcal{L}}{\partial b^{[2]}} = \frac{\partial \mathcal{L}}{\partial z^{[2]}} \frac{\partial z^{[2]}}{\partial b^{[2]}} = dz^{[2]}$   (shape $(1, 1)$)

$da^{[1]} = \frac{\partial \mathcal{L}}{\partial a^{[1]}} = \frac{\partial \mathcal{L}}{\partial z^{[2]}} \frac{\partial z^{[2]}}{\partial a^{[1]}} = W^{[2]T} dz^{[2]}$   (shape $(n^{[1]}, 1)$)

$dz^{[1]} = \frac{\partial \mathcal{L}}{\partial z^{[1]}} = \frac{\partial \mathcal{L}}{\partial a^{[1]}} \frac{\partial a^{[1]}}{\partial z^{[1]}} = da^{[1]} * g^{[1]\prime}(z^{[1]}) = W^{[2]T} dz^{[2]} * g^{[1]\prime}(z^{[1]})$   (shape $(n^{[1]}, 1)$; $*$ is elementwise)

The same chain-rule steps give $dW^{[1]}$ and $db^{[1]}$.


Summary of gradient descent for neural network

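A minimal NumPy sketch of one full gradient-descent step for this one-hidden-layer network, summarizing the forward pass, the backward pass, and the update; it assumes a tanh hidden layer and a sigmoid output, and the function name two_layer_step is illustrative:

import numpy as np

def two_layer_step(X, Y, W1, b1, W2, b2, alpha=0.01):
    # X: (n0, m), Y: (1, m); tanh hidden layer, sigmoid output
    m = X.shape[1]
    # Forward propagation
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = 1.0 / (1.0 + np.exp(-Z2))
    # Backward propagation (vectorized versions of the formulas above)
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1**2)      # tanh'(z) = 1 - a^2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    # Gradient descent update
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2
    return W1, b1, W2, b2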
If you initialize the weights to zeros, every hidden unit computes the same function and receives the same gradient, so all rows of the matrix $W$ stay identical after every update; the symmetry is never broken. Initialize the weights with small random values instead (the biases can start at zero).
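A minimal sketch of such an initialization for the 4-unit hidden-layer example above (the 0.01 scale follows the usual small-random-value convention):

import numpy as np

n0, n1, n2 = 3, 4, 1                         # layer sizes: input, hidden, output
W1 = np.random.randn(n1, n0) * 0.01          # small random values break the symmetry
b1 = np.zeros((n1, 1))                       # biases may start at zero
W2 = np.random.randn(n2, n1) * 0.01
b2 = np.zeros((n2, 1))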
Deep L-Layer Neural Network

Deep neural network notation

Forward Propagation in a Deep Network

For layer $l$:   $z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$,   $a^{[l]} = g^{[l]}(z^{[l]})$,   with $a^{[0]} = x$.

Vectorizing across m training examples (with $A^{[0]} = X$):   $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$,   $A^{[l]} = g^{[l]}(Z^{[l]})$
Implement forward propagation

def forwardprop(A_prev, W, b, activation):
    Z = np.dot(W, A_prev) + b        # linear step: Z[l] = W[l] A[l-1] + b[l]
    if activation == "sigmoid":
        A = sigmoid(Z)
    elif activation == "relu":
        A = relu(Z)
    elif activation == "tanh":
        A = tanh(Z)
    cache = (Z, W, A_prev)           # keep what the backward pass will need
    return A, cache
Implement forward propagation

Layers 1 to L-1 use ReLU; the output layer L uses sigmoid.

def L_model_forward(X, parameters):
    # parameters holds W1, b1, ..., WL, bL
    caches = []
    A = X
    L = len(parameters) // 2
    for l in range(1, L):
        A_prev = A
        A, cache = forwardprop(A_prev, parameters["W" + str(l)],
                               parameters["b" + str(l)], activation='relu')
        caches.append(cache)
    AL, cache = forwardprop(A, parameters["W" + str(L)],
                            parameters["b" + str(L)], activation='sigmoid')
    caches.append(cache)
    return AL, caches
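L_model_forward assumes a parameters dictionary keyed "W1", "b1", ..., "WL", "bL". A minimal initializer under that assumption (the function name initialize_parameters and the layer_dims argument are illustrative, not from the slides):

import numpy as np

def initialize_parameters(layer_dims):
    # layer_dims = [n0, n1, ..., nL], e.g. [12288, 20, 7, 5, 1]
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters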
Backward Propagation in a Deep Network

Layer $l$ takes $da^{[l]}$ as input and, using the cached $z^{[l]}$, $W^{[l]}$ and $a^{[l-1]}$, outputs $da^{[l-1]}$, $dW^{[l]}$ and $db^{[l]}$. Starting from $da^{[L]}$, these blocks are chained back to the first layer.

With the cross-entropy cost function, backpropagation is initialized with:

$da^{[L]} = \frac{\partial \mathcal{L}}{\partial a^{[L]}} = -\frac{y}{a^{[L]}} + \frac{1 - y}{1 - a^{[L]}}$
Vectorization of Backward Propagation in a Deep Network

With the cross-entropy cost function (elementwise over the m columns):

$dA^{[L]} = \frac{\partial \mathcal{L}}{\partial A^{[L]}} = -\frac{Y}{A^{[L]}} + \frac{1 - Y}{1 - A^{[L]}}$
Implement backward propagation

def backprop(dA, cache, activation):
    Z, W, A_prev = cache                 # stored by forwardprop
    if activation == "relu":
        dZ = relu_backward(dA, Z)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, Z)
    elif activation == "tanh":
        dZ = tanh_backward(dA, Z)
    dA_prev, dW, db = linear_backward(dZ, A_prev, W)
    return dA_prev, dW, db
Implement backward propagation

The output layer L uses sigmoid; layers L-1 down to 1 use ReLU. With the cross-entropy cost function:

$da^{[L]} = \frac{\partial \mathcal{L}}{\partial a^{[L]}} = -\frac{y}{a^{[L]}} + \frac{1 - y}{1 - a^{[L]}}$

def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)                                   # number of layers
    # Initializing the backpropagation
    dAL = -(Y / AL) + (1 - Y) / (1 - AL)
    # Output layer (sigmoid)
    dA_prev, grads["dW" + str(L)], grads["db" + str(L)] = backprop(
        dAL, caches[L - 1], activation="sigmoid")
    # Hidden layers L-1, ..., 1 (ReLU)
    for l in reversed(range(1, L)):
        dA_prev, grads["dW" + str(l)], grads["db" + str(l)] = backprop(
            dA_prev, caches[l - 1], activation="relu")
    return grads                                      # dW and db of all layers
Implement backward propagation

ReLU: $a = \max(0, z)$,   $g'(z) = 0$ if $z < 0$, $1$ if $z \geq 0$

def relu_backward(dA, Z):
    dZ = np.array(dA, copy=True)     # dZ = dA * g'(Z)
    dZ[Z <= 0] = 0
    return dZ

sigmoid: $g'(z) = g(z)(1 - g(z)) = a(1 - a)$

def sigmoid_backward(dA, Z):
    A = 1/(1 + np.exp(-Z))
    dZ = dA * A * (1 - A)
    return dZ

tanh: $a = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$,   $g'(z) = 1 - \big(g(z)\big)^2 = 1 - a^2$

def tanh_backward(dA, Z):
    A = np.tanh(Z)
    dZ = dA * (1 - A*A)
    return dZ
Implement backward propagation

def linear_backward(dZ, A_prev, W):
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T)/m                  # dW[l] = (1/m) dZ[l] A[l-1].T
    db = dZ.sum(axis=1, keepdims=True)/m         # db[l] = (1/m) sum of dZ[l] over the examples
    dA_prev = np.dot(W.T, dZ)                    # dA[l-1] = W[l].T dZ[l]
    return dA_prev, dW, db
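Step 2d of the methodology on the next slide also needs a parameter-update step. A minimal sketch, assuming the parameters and grads dictionaries used above (the function name update_parameters is illustrative):

def update_parameters(parameters, grads, alpha):
    # One gradient-descent step: W[l] := W[l] - alpha*dW[l], b[l] := b[l] - alpha*db[l]
    L = len(parameters) // 2
    for l in range(1, L + 1):
        parameters["W" + str(l)] -= alpha * grads["dW" + str(l)]
        parameters["b" + str(l)] -= alpha * grads["db" + str(l)]
    return parameters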
Implementation methodology

1. Initialize parameters / Define hyperparameters

2. Loop for num_iterations:


a. Forward propagation
b. Compute cost function
c. Backward propagation
d. Update parameters (using parameters, and grads from backprop)

3. Use trained parameters to predict labels (a minimal end-to-end sketch of this loop follows)

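Putting the helper functions together, a sketch of the full training loop, assuming the helpers defined in these slides (initialize_parameters, L_model_forward, L_model_backward, update_parameters, and compute_cost from the cost-function slide); the name L_layer_model and the default hyperparameters are illustrative:

def L_layer_model(X, Y, layer_dims, alpha=0.0075, num_iterations=2500):
    # 1. Initialize parameters
    parameters = initialize_parameters(layer_dims)
    # 2. Loop for num_iterations
    for i in range(num_iterations):
        AL, caches = L_model_forward(X, parameters)                 # a. forward propagation
        cost = compute_cost(AL, Y)                                  # b. compute cost function
        grads = L_model_backward(AL, Y, caches)                     # c. backward propagation
        parameters = update_parameters(parameters, grads, alpha)    # d. update parameters
        if i % 100 == 0:
            print(f"Cost after iteration {i}: {cost}")
    # 3. The trained parameters are then used to predict labels
    return parameters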
Intuition about deep representation

Building blocks of deep neural networks

Implement cost function

Cross-entropy cost function:

$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]$

def compute_cost(AL, Y):
    m = Y.shape[1]
    cost = -np.sum(Y*np.log(AL) + (1 - Y)*np.log(1 - AL))/m
    return cost
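A quick sanity check of compute_cost on toy values (the numbers are illustrative):

import numpy as np

AL = np.array([[0.8, 0.9, 0.4]])       # predicted probabilities for 3 examples
Y  = np.array([[1,   1,   0  ]])       # true labels
print(compute_cost(AL, Y))             # -(log 0.8 + log 0.9 + log 0.6)/3 ≈ 0.28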
Future topics

• Introduction to Neural Network and Deep Learning

• Improving Deep Neural Networks

• Convolutional Neural Networks

• Sequence Models

• Deep Learning in data mining and big data analysis

• Other architectures and current research activity in deep learning


