Week9 CIV2020 Lecture Note Rev
[Figure: backpropagation on a small computational graph, e.g. x = -2, y = 5, z = -4; we want the gradients on the activations, and each node multiplies the upstream gradient by its "local gradient"]
Momentum update
- Physical interpretation: a ball rolling down the loss surface, with friction (coefficient mu).
- mu is usually ~0.5, 0.9, or 0.99 (sometimes annealed over time, e.g. from 0.5 -> 0.99).
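A minimal numpy-style sketch of this update in the standard formulation (variable names are illustrative):

def momentum_step(x, dx, v, learning_rate=1e-2, mu=0.9):
    """One SGD+momentum step: v is a velocity; mu acts like friction."""
    v = mu * v - learning_rate * dx  # decay the old velocity, add the gradient force
    x = x + v                        # move along the velocity
    return x, v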
SGD vs. Momentum: notice momentum overshooting the target, but overall getting to the minimum much faster.
Nesterov Accelerated Gradient (NAG)
[Figure: momentum update vs. Nesterov update; NAG takes the momentum step first, then evaluates the gradient at that look-ahead point]
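A minimal sketch of the Nesterov update, where the gradient is evaluated at the look-ahead point rather than the current position (grad_fn is an illustrative callback returning the loss gradient):

def nesterov_step(x, grad_fn, v, learning_rate=1e-2, mu=0.9):
    """One NAG step: peek ahead along the velocity before taking the gradient."""
    x_ahead = x + mu * v                           # where momentum is about to carry us
    v = mu * v - learning_rate * grad_fn(x_ahead)  # gradient at the look-ahead point
    x = x + v
    return x, v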
AdaGrad and RMSProp (per-parameter adaptive learning rates; sketched below)
RMSProp was introduced in a slide in Geoff Hinton's Coursera class, lecture 6, and is cited by several papers simply as that slide.
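A sketch of the two per-parameter updates in the usual formulation (eps and the hyperparameter defaults are conventional illustrative choices):

import numpy as np

eps = 1e-8  # smoothing term; avoids division by zero

def adagrad_step(x, dx, cache, learning_rate=1e-2):
    """AdaGrad: the cache grows monotonically, so the effective step size decays."""
    cache = cache + dx**2
    x = x - learning_rate * dx / (np.sqrt(cache) + eps)
    return x, cache

def rmsprop_step(x, dx, cache, learning_rate=1e-2, decay_rate=0.99):
    """RMSProp: a leaky cache keeps the step size from decaying to zero."""
    cache = decay_rate * cache + (1 - decay_rate) * dx**2
    x = x - learning_rate * dx / (np.sqrt(cache) + eps)
    return x, cache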
Adam: looks a bit like RMSProp with momentum. The update combines a momentum-like first moment m, an RMSProp-like second moment v, and a bias correction (only relevant in the first few iterations, when t is small).
The bias correction compensates for the fact that m and v are initialized at zero and need some time to "warm up".
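Putting the pieces together, a sketch of the Adam update in its standard formulation (t is the 1-based iteration count):

import numpy as np

def adam_step(x, dx, m, v, t, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: momentum-like m, RMSProp-like v, plus bias correction."""
    m = beta1 * m + (1 - beta1) * dx       # first moment (momentum)
    v = beta2 * v + (1 - beta2) * dx**2    # second moment (RMSProp-like)
    mt = m / (1 - beta1**t)                # bias correction: m starts at zero
    vt = v / (1 - beta2**t)                # bias correction: v starts at zero
    x = x - learning_rate * mt / (np.sqrt(vt) + eps)
    return x, m, v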
Solving for the critical point, we obtain the Newton parameter update:
x ← x − [H f(x)]⁻¹ ∇f(x)
where H f(x) is the Hessian of the loss at x.
L-BFGS
- Usually works very well in full-batch, deterministic mode: if you have a single, deterministic f(x), L-BFGS will probably work very nicely.
- Does not transfer very well to the mini-batch setting; gives bad results. Adapting L-BFGS to the large-scale, stochastic setting is an active area of research.
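To illustrate the full-batch case, a minimal sketch using SciPy's L-BFGS implementation on a deterministic objective (the quadratic here is made up for the example):

import numpy as np
from scipy.optimize import minimize

def f(x):
    """Deterministic objective with a known minimum at (1, 2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] - 2.0) ** 2

def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] - 2.0)])

res = minimize(f, x0=np.zeros(2), jac=grad_f, method="L-BFGS-B")
print(res.x)  # ~ [1. 2.]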
step decay: e.g. decay the learning rate by half every few epochs.
exponential decay: α = α₀ e^(−kt)
1/t decay: α = α₀ / (1 + kt)
In practice: step decay is the most common choice, since its hyperparameters (the decay factor and the interval in epochs) are easy to interpret.
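A sketch of the three schedules (the constants drop, epochs_per_drop, and k are illustrative hyperparameters):

import numpy as np

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
    """Halve the learning rate every epochs_per_drop epochs."""
    return lr0 * drop ** (epoch // epochs_per_drop)

def exponential_decay(lr0, t, k=0.1):
    """alpha = alpha0 * exp(-k t)."""
    return lr0 * np.exp(-k * t)

def one_over_t_decay(lr0, t, k=0.1):
    """alpha = alpha0 / (1 + k t)."""
    return lr0 / (1.0 + k * t)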
What is the effect of using regularization?
→ The model is able to generalize.
00. Review
Ways to prevent overfitting
Weight Regularization (see the sketch after this list)
L2 regularization
L1 regularization
Elastic net (L1 + L2)
Dropout
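A minimal sketch of the three weight penalties added to the data loss (lam and l1_ratio are illustrative hyperparameters):

import numpy as np

def weight_penalty(W, lam=1e-3, kind="l2", l1_ratio=0.5):
    """Regularization term added to the data loss."""
    if kind == "l2":
        return lam * np.sum(W * W)        # L2: discourage large weights
    if kind == "l1":
        return lam * np.sum(np.abs(W))    # L1: encourage sparse weights
    # Elastic net: convex combination of L1 and L2
    return lam * (l1_ratio * np.sum(np.abs(W)) + (1 - l1_ratio) * np.sum(W * W))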
02. Dropout
Regularization: Dropout
“randomly set some neurons to zero in the forward pass”
Example forward pass with a 3-layer network using dropout:
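The code on this slide did not survive extraction; the following is a reconstruction along the lines of the CS231n example (p = 0.5; W1-W3, b1-b3 are the layer parameters):

import numpy as np

p = 0.5  # probability of keeping a unit active; higher = less dropout

def train_step(X, W1, b1, W2, b2, W3, b3):
    """Forward pass of a 3-layer net with dropout on both hidden layers."""
    H1 = np.maximum(0, np.dot(W1, X) + b1)
    U1 = np.random.rand(*H1.shape) < p   # first dropout mask
    H1 *= U1                             # drop!
    H2 = np.maximum(0, np.dot(W2, H1) + b2)
    U2 = np.random.rand(*H2.shape) < p   # second dropout mask
    H2 *= U2                             # drop!
    out = np.dot(W3, H2) + b3
    return out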
Waaaait a second…
How could this possibly be a good idea?
Forces the network to have a redundant representation.
[Figure: a cat score built from features such as "has an ear", "has a tail", "is furry", "has claws", "mischievous look"; dropout randomly crosses out features (X), so the score cannot rely on any single one]
Another interpretation: dropout is training a large ensemble of models (that share parameters); each binary mask is one model, and it gets trained on only ~one datapoint.
At test time…
Ideally: want to integrate out all the noise (e.g. by a Monte Carlo approximation: many forward passes with different dropout masks, averaging all predictions).
Can in fact do this with a single forward pass! (approximately)
Leave all input neurons turned on (no dropout).
[Figure: a single unit a with inputs x, y and weights w0, w1]
during test: a = w0*x + w1*y
during train:
E[a] = ¼ * (w0*0 + w1*0
     + w0*0 + w1*y
     + w0*x + w1*0
     + w0*x + w1*y)
     = ¼ * (2*w0*x + 2*w1*y)
     = ½ * (w0*x + w1*y)
Slide from CS231n
With p = 0.5, using all inputs in the forward pass would inflate the activations by 2x from what the network was "used to" during training! Have to compensate by scaling the activations back down by ½.
Dropout Summary
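The summary slide's code was also lost; a sketch of the usual "inverted dropout" formulation, which applies the 1/p scaling at training time so the test-time forward pass is left untouched:

import numpy as np

p = 0.5  # probability of keeping a unit active

def dropout_train(H):
    """Inverted dropout: drop and rescale during training."""
    U = (np.random.rand(*H.shape) < p) / p  # the mask folds in the 1/p scaling
    return H * U

def dropout_test(H):
    """Nothing to do at test time: expected activations already match training."""
    return H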
Cross-validation: cycle through the choice of which fold is the validation fold, and average the results.
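A minimal k-fold sketch (train_and_eval is a hypothetical callback that trains a model and returns its validation score):

import numpy as np

def k_fold_score(X, y, k, train_and_eval):
    """Cycle through which fold is the validation fold; average the results."""
    folds = np.array_split(np.random.permutation(len(X)), k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        tr_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_eval(X[tr_idx], y[tr_idx], X[val_idx], y[val_idx]))
    return np.mean(scores)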
03. Deep Learning Model Training Process
Developing models with a variety of hyperparameters
Step 1: Set the training objective
• Goal: classify the MNIST data
• Desired outcome: a deep learning model that, given an image of a single handwritten character, automatically classifies which digit it shows
• Appropriate learning methodology: supervised learning
• Type and format of the training data:
- '0' (Class 1)
- '1' (Class 2)
- '2' (Class 3)
- '…' (Class m)
- '9' (Class 10)
Step 2: Preprocess the input data
• Labeling
* Dog (Class 1)
* Cat (Class 2)
* All other images (Class 3)
[Figure: an M×N pixel matrix (X data) paired with a 3×1 one-hot label vector (Y data); e.g. a Class 1 image maps to (1, 0, 0)ᵀ]
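A minimal sketch of this labeling step (using a 0-based class index in place of the slide's 1-based Class 1..3):

import numpy as np

def one_hot(class_index, num_classes=3):
    """Map a class index to a one-hot column vector, e.g. 0 -> (1, 0, 0)^T."""
    y = np.zeros((num_classes, 1))
    y[class_index] = 1.0
    return y

print(one_hot(0).ravel())  # Class 1 -> [1. 0. 0.]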
• Data normalization (see the sketch below)
* Gaussian normalization
* Min-max normalization
* Centering, etc.
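A minimal sketch of the three options (per-feature statistics; the 1e-8 guards against division by zero):

import numpy as np

def gaussian_normalize(X):
    """Zero mean, unit variance per feature (z-score)."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def min_max_normalize(X):
    """Rescale each feature to [0, 1]."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn + 1e-8)

def center(X):
    """Subtract the per-feature mean only."""
    return X - X.mean(axis=0)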
Step 3: Build the deep learning model
[Figure: a 2-layer network; input layer (CIFAR-10 images, 3072 numbers), hidden layer (50 hidden neurons), output layer (10 output neurons, one per class)]
• Network Architecture
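A minimal numpy sketch of the 3072-50-10 architecture in the figure (the 0.01 weight scale is an illustrative choice):

import numpy as np

rng = np.random.default_rng(0)

# 3072 inputs (a CIFAR-10 image) -> 50 hidden neurons -> 10 class scores
W1 = rng.normal(0, 0.01, (50, 3072)); b1 = np.zeros(50)
W2 = rng.normal(0, 0.01, (10, 50));   b2 = np.zeros(10)

def forward(x):
    """Two-layer net: ReLU hidden layer, linear class scores."""
    h = np.maximum(0, W1 @ x + b1)
    return W2 @ h + b2

scores = forward(rng.random(3072))  # 10 scores, one per class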
Step 4: Hyperparameter Fine-tuning
As model training proceeds, it is normal for the loss to decrease.
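A minimal sketch of random hyperparameter search in log space (train_and_eval is a hypothetical stand-in for training a model and returning its validation score):

import numpy as np

rng = np.random.default_rng(0)

def train_and_eval(lr, reg):
    """Hypothetical stand-in: train a model, return a validation score."""
    return -((np.log10(lr) + 3) ** 2 + (np.log10(reg) + 2) ** 2)  # toy surrogate

best = (None, -np.inf)
for _ in range(20):
    lr = 10 ** rng.uniform(-5, -1)   # sample the learning rate in log space
    reg = 10 ** rng.uniform(-4, 0)   # sample the regularization strength in log space
    score = train_and_eval(lr, reg)
    if score > best[1]:
        best = ((lr, reg), score)

print("best (lr, reg):", best[0])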
Step 5: Analyze the training results
[Figure: a training loss curve that stays flat before finally dropping; such a plateau makes bad initialization a prime suspect]
Step 5 (cont.): Evaluate training performance
[Figure: train vs. validation accuracy curves]
big gap = overfitting => increase regularization strength?
no gap => increase model capacity?
Step 6: Evaluate model accuracy
• Confusion Matrix

                 Class 1 Predicted   Class 2 Predicted
Class 1 Actual          TP                  FN
Class 2 Actual          FP                  TN
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 score = 2 × (Prec. × Rec.) / (Prec. + Rec.)
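A minimal sketch computing the four metrics from raw counts (the example counts are made up):

def metrics(tp, fn, fp, tn):
    """Binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(metrics(tp=40, fn=10, fp=5, tn=45))  # illustrative counts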
Thank you.