Unit Online 1.3

The document discusses regularization techniques in deep learning to prevent overfitting, including L1 and L2 regularization, dropout, and early stopping. It explains the importance of model parameters and hyper-parameters in training deep neural networks, as well as the processes of model optimization and selection. Additionally, it covers concepts like training, validation, and test sets, along with error metrics such as Mean Squared Error and the Delta Learning Rule.


NEURAL NETWORKS & DEEP LEARNING

(21MCA24DB3)

Prepared & Presented By:


Dr. Balkishan
Assistant Professor
Department of Computer Science & Applications
Maharshi Dayanand University
Rohtak
Regularizing a Deep Network
(Technique to prevent overfitting)
• Regularization is a technique which makes slight modifications to the learning algorithm such that the model generalizes better.
• This in turn improves the model's performance on unseen data.
• It reduces the complexity of the model.
Regularization
• Regularization is a technique used to reduce errors by fitting the function appropriately on the given training set and avoiding overfitting.
The commonly used regularization techniques are:

- L2 regularization
- L1 regularization
- Dropout regularization
- Early stopping regularization

• A regression model that uses the L2 regularization technique is called Ridge regression. Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function (L).

• A regression model that uses the L1 regularization technique is called LASSO (Least Absolute Shrinkage and Selection Operator) regression. Lasso regression adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function (L).
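As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of how the two penalty terms attach to an ordinary least-squares loss; the function name, data shapes and α value are assumptions made for this example:

    import numpy as np

    def regularized_loss(w, X, y, alpha, kind="l2"):
        """Squared-error loss plus an L1 or L2 penalty on the weights w."""
        residuals = X @ w - y
        ls_obj = np.sum(residuals ** 2)      # least-squares objective
        if kind == "l2":                     # Ridge: alpha * sum(w_j^2)
            penalty = alpha * np.sum(w ** 2)
        else:                                # Lasso: alpha * sum(|w_j|)
            penalty = alpha * np.sum(np.abs(w))
        return ls_obj + penalty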
L2 Regularization
[Slide figure: Equation (1), a linear regression in which x is the independent variable, y is the dependent variable, and 0.7, 1.2, 21 and 39 are the regression coefficients, shown alongside a scaled-down version of equation (1) in which L2 regularization has shrunk the coefficients.]
What is Ridge Regression?

• Ridge regression is a model tuning method that is used to analyze data that suffers from multicollinearity.
• This method performs L2 regularization.
• When multicollinearity occurs, the least-squares estimates are unbiased but their variances are large, which results in predicted values being far from the actual values.
Important Observations
• In simple terms, the minimization objective = LS Obj + α (sum of the squares of the coefficients)
• where LS Obj is the Least Squares Objective, i.e. the linear regression objective without regularization.
• Here α is the tuning factor that controls the strength of the penalty term.
• If α = 0, the objective becomes the same as simple linear regression, so we get the same coefficients as simple linear regression.
• If α = ∞, the coefficients will be zero, because any nonzero coefficient would receive infinite weight from the penalty term and make the objective infinite.
• If 0 < α < ∞, the magnitude of α decides the weightage given to the different parts of the objective.
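To make the role of α concrete, here is a hedged NumPy sketch (the toy data are invented; the closed-form ridge solution w = (XᵀX + αI)⁻¹Xᵀy is standard but not from the slides) showing how a larger α shrinks the coefficients:

    import numpy as np

    # Toy data (invented for illustration): y depends on two correlated features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=50)   # near-multicollinear
    y = X @ np.array([2.0, 1.0]) + 0.1 * rng.normal(size=50)

    for alpha in [0.0, 1.0, 100.0]:
        # Closed-form ridge solution: w = (X^T X + alpha*I)^(-1) X^T y
        w = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ y)
        print(alpha, w)   # larger alpha shrinks the coefficients toward zero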
L1 Regularization
Dropout Regularization
• Randomly selected neurons are ignored ("dropped out") during each training step.
• Dropped neurons have no effect on the next layers.
• Dropped neurons are not updated during backward training.
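A minimal NumPy sketch of dropout as described above, in its common "inverted" form (the function name and drop rate are assumptions for this example; the slides do not specify the rescaling detail):

    import numpy as np

    def dropout(activations, drop_rate=0.5, training=True):
        """Inverted dropout: zero out random units and rescale the survivors."""
        if not training or drop_rate == 0.0:
            return activations
        keep_prob = 1.0 - drop_rate
        mask = np.random.rand(*activations.shape) < keep_prob
        # Scaling by 1/keep_prob keeps the expected activation unchanged,
        # so no rescaling is needed at inference time.
        return activations * mask / keep_prob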


Model Exploration and Hyper Parameter Tuning

Model Parameters (Learned during training)


• Model parameters are the entities learned via training from the training data.
• They are not set manually by the designer.
With respect to deep neural networks, the model parameters are:
- Weights
- Biases
Model Hyper-parameters (Control the parameters)
• These are parameters that govern (control) the determination of the model parameters during training.
- They are typically set manually via heuristics.
- They are tuned during a cross-validation phase.
Examples:
Learning rate, number of layers, number of units in each layer, activation functions, and many others.
• What is a model?
• A model is described by its hyper-parameters, because the hyper-parameters govern (control) the parameters of the network.

Implicitly, the model contains:

- The topology of the deep neural network (i.e., the layers and their interconnections)
- The learned parameters (i.e., the learned weights and biases)

The model depends on the hyper-parameters because the hyper-parameters determine the learned parameters (weights and biases).

Hyper-parameters include:

- Learning rate
- Number of layers
- Number of units in each layer
- Activation functions
- Etc.
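As a small illustration (the specific names and values below are invented for this example), the hyper-parameters can be gathered into a configuration that is fixed before training, while the weights and biases emerge from training itself:

    # Hyper-parameters: chosen by the designer before training (illustrative values).
    hyper_params = {
        "learning_rate": 0.01,
        "num_layers": 3,
        "units_per_layer": [128, 64, 10],
        "activation": "relu",
    }

    # Model parameters: learned from the training data, never set by hand.
    model_params = {"weights": [], "biases": []}   # filled in by the training loop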
Model Optimization

• To optimize the model (its inference-time behavior), a process known as model selection is performed.
• Model selection involves selecting the hyper-parameters that yield the best performance of the neural network.
• The hyper-parameters are tuned using an iterative process of either:
- Validation
- Cross-validation
• Many models may be evaluated during the validation/cross-validation phase, and the optimal model is selected.
• The optimal model is then evaluated on the test dataset to determine how well it performs on data never seen before.
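A minimal sketch of validation-based model selection (a ridge regression stands in for the neural network here, and all data, function names and α candidates are invented for this example):

    import numpy as np

    def fit_ridge(X, y, alpha):
        """Closed-form ridge fit, standing in for 'training a model'."""
        n_features = X.shape[1]
        return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

    def val_error(w, X, y):
        return np.mean((X @ w - y) ** 2)

    rng = np.random.default_rng(1)
    X_train, y_train = rng.normal(size=(80, 3)), rng.normal(size=80)
    X_val, y_val = rng.normal(size=(20, 3)), rng.normal(size=20)

    # Try several hyper-parameter candidates; keep the best on the validation set.
    best_alpha, best_err = None, np.inf
    for alpha in [0.01, 0.1, 1.0, 10.0]:
        w = fit_ridge(X_train, y_train, alpha)
        err = val_error(w, X_val, y_val)
        if err < best_err:
            best_alpha, best_err = alpha, err
    print("selected alpha:", best_alpha)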
Training Set, Validation Set and Test Set
• Training set – the data set used to learn the optimal model parameters (weights, biases).
• Validation ("dev") set – the data set used to perform model selection (tuning of the hyper-parameters).
• It is used to estimate the generalization error during training, allowing the hyper-parameters to be updated accordingly.
• Test set – the data set used to assess the fully trained model.
• A fully trained model is a model that has been selected via hyper-parameter tuning and has subsequently been trained to determine the optimal weights and biases (e.g., using back-propagation).
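A minimal NumPy sketch of the three-way split described above (the 60/20/20 proportions, seed and function name are assumptions for this example; X and y are assumed to be NumPy arrays):

    import numpy as np

    def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
        """Shuffle the data and split it into training, validation and test sets."""
        n = len(X)
        idx = np.random.default_rng(seed).permutation(n)
        n_test = int(n * test_frac)
        n_val = int(n * val_frac)
        test = idx[:n_test]
        val = idx[n_test:n_test + n_val]
        train = idx[n_test + n_val:]
        return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])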
Train, Validation and Test Sets
Gradient Descent Learning
Mean Squared Error
• Mean Squared Error (MSE) = SSE/n, where n is the number of instances in the data set.
– SSE means Sum of Squared Errors.
– Dividing by n normalizes the error for data sets of different sizes.
– MSE is the average squared error per pattern.
• Root Mean Squared Error (RMSE) is the square root of the MSE.
– This puts the error value back into the same units as the features and can thus be more intuitive.
– RMSE is the average distance (error) of the targets from the outputs, in the same scale as the features.
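The two error metrics in a minimal NumPy sketch (the function names are assumptions for this example):

    import numpy as np

    def mse(targets, outputs):
        """Mean Squared Error: SSE divided by the number of instances n."""
        sse = np.sum((targets - outputs) ** 2)   # Sum of Squared Errors
        return sse / len(targets)

    def rmse(targets, outputs):
        """Root Mean Squared Error: back in the same units as the targets."""
        return np.sqrt(mse(targets, outputs))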
Gradient Descent Learning: Minimize
(or Maximize) the Objective Function

[Slide figure: the error landscape, plotting SSE (Sum Squared Error, Σ (t_i − z_i)²) against the weight values.]
Delta Learning Rule
(Widrow-Hoff Rule)
• The goal is to decrease the overall error each time a weight is changed.
• The Total Sum of Squared Errors (SSE) is called the objective function: E = Σ (t_i − z_i)²
• The delta learning rule is valid only for continuous activation functions and in the supervised training mode.
• The delta rule may be stated as: "the adjustment made to a synaptic weight of a neuron is proportional to the product of the error signal and the input signal of the synapse".
Delta Rule for Single Output Unit
• The delta rule changes the weight of the connection to minimize the difference between the net input to the output unit, y_in, and the target value t.
The delta rule is given as

Δw_i = α (t − y_in) x_i

- where x is the vector of activations of the input units
- y_in is the net input to the output unit
- t is the target value and α is the learning rate
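A minimal NumPy sketch of one delta-rule update for a single output unit (the input activations, weights, target and learning rate are invented values for illustration):

    import numpy as np

    x = np.array([1.0, 0.5, -0.3])     # activations of the input units
    w = np.array([0.2, -0.1, 0.4])     # current weights
    t, alpha = 1.0, 0.1                # target value and learning rate

    y_in = w @ x                       # net input to the output unit
    delta_w = alpha * (t - y_in) * x   # delta rule: proportional to error * input
    w = w + delta_w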
Difference between Perceptron and Delta
(Widrow-Hoff) Learning Rule
• The Widrow-Hoff rule is very similar to the perceptron learning rule, but their origins are different.
• The perceptron learning rule originates from the Hebbian assumption, while the delta rule is derived from the gradient-descent method.
• The perceptron learning rule stops after a finite number of learning steps, but the gradient-descent approach continues forever, converging only asymptotically to the solution.
• The delta rule updates the weights of the connections so as to minimize the difference between the net input to the output unit and the target value.
