(DL) Ch04-Regularization
Regularization
2. Constrained optimization
3. Dataset Augmentation
5. Dropout
• Main regularization approaches: limiting the capacity of the model by adding a parameter
norm penalty Ω(θ) to the objective function J
J̃(θ; X, y) = J(θ; X, y) + αΩ(θ)    (1)
• Different choices of the parameter norm Ω can result in different solutions being preferred
• In NNs, Ω is chosen to penalize only the weights of the affine transformation at each layer
(leaving the biases unregularized)
• The most common parameter norm penalty: L2 (weight decay), also known as ridge
regression or Tikhonov regularization
J̃(w; X, y) = J(w; X, y) + αΩ(w) = J(w; X, y) + (α/2) wᵀw    (2)
with the corresponding parameter gradient
∇w J̃(w; X, y) = αw + ∇w J(w; X, y)    (3)
• Gradient step (illustrated in the sketch below)
w ← w − ε(αw + ∇w J(w; X, y)) = (1 − εα)w − ε∇w J(w; X, y)    (4)
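The following is a minimal NumPy sketch of the gradient step above, applied to a least-squares linear model; the loss, the penalty strength alpha, and the learning rate eps are illustrative assumptions, not from the slides.

```python
import numpy as np

def grad_step_l2(w, X, y, alpha=0.1, eps=0.01):
    """One step of w <- w - eps * (alpha * w + grad_w J(w; X, y))."""
    grad_J = X.T @ (X @ w - y) / len(y)    # gradient of the unregularized loss
    return w - eps * (alpha * w + grad_J)  # weight decay shrinks w toward 0

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=100)
w = np.zeros(5)
for _ in range(500):
    w = grad_step_l2(w, X, y)
print(w)  # estimates are shrunk slightly toward zero relative to plain least squares
```

Note that the update can be read as first multiplying the weights by the constant factor (1 − εα) and then taking the usual gradient step, which is where the name weight decay comes from.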
• L1 regularization
Ω(θ) = ‖w‖₁ = Σᵢ |wᵢ|    (5)
J̃(w; X, y) = J(w; X, y) + α‖w‖₁    (6)
∇w J̃(w; X, y) = α sign(w) + ∇w J(w; X, y)    (7)
• The regularization contribution to the gradient no longer scales linearly with each wᵢ; instead it is a constant factor with a sign equal to sign(wᵢ)
• L1 regularization results in a solution that is more sparse compared to L2: some
parameters have an optimal value of zero
• The sparsity property of L1 regularization has been used extensively as a feature-selection
mechanism (see the sketch below)
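A minimal sketch, in the same toy setting as the L2 example above, of the subgradient step implied by eq. (7); the data and the value of alpha are illustrative assumptions. Plain subgradient steps only drive irrelevant weights near zero; exact zeros would require a proximal (soft-thresholding) update, which is omitted here.

```python
import numpy as np

def grad_step_l1(w, X, y, alpha=0.1, eps=0.01):
    """One subgradient step: w <- w - eps * (alpha * sign(w) + grad_w J(w; X, y))."""
    grad_J = X.T @ (X @ w - y) / len(y)    # gradient of the unregularized loss
    return w - eps * (alpha * np.sign(w) + grad_J)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0]) + 0.1 * rng.normal(size=200)
w = rng.normal(size=5)
for _ in range(2000):
    w = grad_step_l1(w, X, y)
print(np.round(w, 3))  # coefficients of the irrelevant features are driven near zero
```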
Constrained optimization
• Find the maximal or minimal value of f (x) for values of x in some set S
• Feasible points: points x that lie within the set S
• Find a solution that is small in some sense
• Common approach: impose a norm constraint, such as ‖x‖ ≤ 1 (see the projection sketch below)
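One way to make this concrete is projected gradient descent: take an unconstrained gradient step, then project back onto the feasible set S = {x : ‖x‖ ≤ 1}. The quadratic objective, step size, and iteration count below are illustrative assumptions, not from the slides.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Return the closest point to x inside the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def f_grad(x, target):
    return 2.0 * (x - target)   # gradient of f(x) = ||x - target||^2

target = np.array([2.0, 2.0])   # unconstrained minimizer lies outside the unit ball
x = np.zeros(2)
for _ in range(100):
    x = project_l2_ball(x - 0.1 * f_grad(x, target))  # step, then project
print(x, np.linalg.norm(x))     # solution sits on the boundary ||x|| = 1
```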
• Injecting noise
• NNs prove not to be very robust to noise
• Unsupervised learning: denoising autoencoder
• Noise in hidden units: dataset augmentation at multiple levels of abstraction (sketched below)
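A minimal sketch of noise injection in a two-layer network: Gaussian noise is added to the input (as in a denoising autoencoder) and to the hidden activations, acting like dataset augmentation at multiple levels of abstraction. The helper name, layer sizes, and noise scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_with_noise(x, W1, W2, train=True, noise_std=0.1):
    """Forward pass that corrupts the input and hidden units during training only."""
    if train:
        x = x + noise_std * rng.normal(size=x.shape)   # input noise
    h = np.maximum(0.0, x @ W1)                        # hidden layer (ReLU)
    if train:
        h = h + noise_std * rng.normal(size=h.shape)   # hidden-unit noise
    return h @ W2

x = rng.normal(size=(4, 8))
W1 = rng.normal(size=(8, 16)) * 0.1
W2 = rng.normal(size=(16, 2)) * 0.1
print(forward_with_noise(x, W1, W2, train=True).shape)  # (4, 2)
```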
• Bagging
• The models are independent
• Each model is trained to convergence on its respective training set (see the bagging sketch below)
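A minimal sketch of bagging with a least-squares base model (chosen only for illustration): k models are fit independently, each on its own bootstrap resample of the training set, and their predictions are averaged.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.5 * rng.normal(size=100)

def fit_least_squares(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

k = 10
models = []
for _ in range(k):
    idx = rng.integers(0, len(y), size=len(y))   # bootstrap resample
    models.append(fit_least_squares(X[idx], y[idx]))

x_new = rng.normal(size=3)
pred = np.mean([x_new @ w for w in models])      # average the ensemble's predictions
print(pred)
```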
• Dropout
• Models share parameters
• Most models are not explicitly trained at all
• It is infeasible to sample all possible subnetworks within the lifetime of the universe
• Parameter sharing causes the remaining sub-networks to arrive at good settings of the parameters
• Dropout can represent an exponential number of models with a tractable amount of
memory (see the training sketch below)
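A minimal sketch of this parameter sharing: every forward pass samples a fresh binary mask µ over the hidden units, so each training step updates one sub-network, while all sub-networks read and write the same W1 and W2. The layer sizes and keep probability are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, W1, W2, keep_prob=0.5):
    """One forward pass through the sub-network selected by a freshly sampled mask mu."""
    h = np.maximum(0.0, x @ W1)              # hidden layer (ReLU)
    mu = rng.random(h.shape) < keep_prob     # binary mask sampled per step
    return (h * mu) @ W2, mu                 # sub-network defined by mu, weights shared

x = rng.normal(size=(4, 8))
W1 = rng.normal(size=(8, 16)) * 0.1
W2 = rng.normal(size=(16, 2)) * 0.1
out, mu = dropout_forward(x, W1, W2)
print(out.shape, mu.mean())   # roughly half the hidden units are kept on this step
```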
• Dropout:
• Each sub-model, defined by a mask vector µ, defines a probability distribution p(y | x, µ)
• The arithmetic mean over all masks is given by
Σµ p(µ) p(y | x, µ)    (13)
where p(µ) is the probability distribution that was used to sample µ at training time (approximated by sampling in the sketch below)
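The sum in eq. (13) runs over exponentially many masks, so it cannot be computed exactly; the sketch below approximates it by averaging p(y | x, µ) over a small number of masks sampled from p(µ) (sometimes called Monte Carlo dropout). The toy network and keep probability are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def p_y_given_x_mu(x, W1, W2, keep_prob=0.5):
    """p(y | x, mu) for one sampled mask mu ~ p(mu)."""
    h = np.maximum(0.0, x @ W1)
    mu = rng.random(h.shape) < keep_prob
    return softmax((h * mu) @ W2)

x = rng.normal(size=(1, 8))
W1 = rng.normal(size=(8, 16)) * 0.1
W2 = rng.normal(size=(16, 3)) * 0.1
samples = [p_y_given_x_mu(x, W1, W2) for _ in range(100)]
print(np.mean(samples, axis=0))   # Monte Carlo estimate of sum_mu p(mu) p(y | x, mu)
```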