ML Concepts
(Assuming for the description below that we are building a linear regression model.)
When fitting a model to data (say a linear regression line), bias refers to the model's inability to capture the true relationship: if the data follows a curved pattern but the model fits a straight line, the model has high bias; if the model is changed to a curvy/zigzag line that passes through every point, the bias is reduced or becomes zero. Since we build the model using training data, and the model does not see the testing data until after fitting is completed, the trade-off is this: fitting a curvy line with low bias to the training data will likely cause overfitting, because it predicts the training data perfectly but performs poorly on the testing data, as the model has not generalised. Vice versa, a high-bias model that fits a straight line to curvy/zigzag data points may underfit the training data and so fail to properly capture the true relationship in the data.
The difference in fit between data sets (training vs testing) is called variance. Using the sum-of-squared-errors measure, the curvy/zigzag line has zero error on the training data (it fits the training points perfectly) but high error on the testing data, so it has very high variance (variability). Vice versa, the straight line fits the training data with some amount of error but does about the same on the testing data; the sum of squared errors is not very different between the two sets, so this model is generalising the data representation quite well, i.e. it has low variance (variability).
Ideally a model should have low bias and low variance, but since that is rarely achievable in the real world, the idea is to find a sweet spot between a simple model (say a straight line, y = mx + b) and a complex model (say a quadratic or higher-order polynomial), balancing bias and variance and ultimately striking the sweet spot between overfitting and underfitting too.
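A minimal sketch of this trade-off, assuming NumPy and scikit-learn are available (the sine-shaped data, noise level, and polynomial degrees below are illustrative choices, not taken from the notes above): a degree-1 fit underfits (high bias, similarly large train and test error), while a very high-degree fit overfits (near-zero training error, much larger testing error, i.e. high variance).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Noisy curved (sine) data: a straight line cannot capture the true relationship.
X = rng.uniform(0, 1, size=(100, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

for degree in (1, 15):  # degree 1 = simple straight line, degree 15 = very curvy fit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The sweet spot is usually some intermediate complexity where the testing error is lowest, even though the training error keeps dropping as the model gets more flexible.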
__Entropy / KL Divergence__
Cross-Entropy, H(P, Q): the average number of total bits needed to represent an event from P when using an encoding optimised for Q instead of P.
Relative Entropy (KL Divergence), KL(P || Q): the average number of extra bits needed to represent an event from P when using an encoding optimised for Q instead of P, i.e. H(P, Q) - H(P).
https://machinelearningmastery.com/cross-entropy-for-machine-learning/
Cross-Entropy (total bits): in classification, the target (real) distribution for each example is one-hot, i.e. probability 1.0 for the true class label and 0.0 for the rest, and the entropy of such a distribution is 0. So when the predicted distribution is identical to the target distribution, the cross-entropy is 0.0. Recall that when evaluating a model using cross-entropy on a training dataset we average the cross-entropy across all examples in the dataset. Therefore, a cross-entropy of 0.0 when training a model indicates that the predicted class probabilities are identical to the probabilities in the training dataset, e.g. zero loss. In practice, a cross-entropy loss of 0.0 often indicates that the model has overfit the training dataset, but that is another story.
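A small NumPy sketch of these quantities for discrete distributions given as probability vectors (the example distributions P and Q below are made up purely for illustration):

```python
import numpy as np

def entropy(p):
    """H(P) = -sum p * log2(p), skipping zero-probability events."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def cross_entropy(p, q):
    """H(P, Q) = -sum p * log2(q): total bits to encode events from P with a code built for Q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(q[nz]))

def kl_divergence(p, q):
    """KL(P || Q) = H(P, Q) - H(P): the extra bits from using Q's code instead of P's."""
    return cross_entropy(p, q) - entropy(p)

# Illustrative distributions over 3 events.
p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]

print(f"H(P)     = {entropy(p):.3f} bits")
print(f"H(P, Q)  = {cross_entropy(p, q):.3f} bits")
print(f"KL(P||Q) = {kl_divergence(p, q):.3f} bits")

# One-hot target vs an identical prediction: cross-entropy is 0.0 (zero loss).
print(f"one-hot match: {cross_entropy([0, 1, 0], [0, 1, 0]):.1f} bits")
```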
__Word Embeddings__
https://ai.stackexchange.com/questions/18634/what-are-the-main-differences-between-skip-gram-and-continuous-bag-of-words
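As a rough sketch of the two training objectives discussed at that link, assuming gensim 4.x is available (the toy corpus and parameter values are illustrative; in gensim's Word2Vec the `sg` flag switches between CBOW, `sg=0`, and skip-gram, `sg=1`):

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenised sentences (illustrative only).
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]

# CBOW: predict the centre word from its surrounding context words.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# Skip-gram: predict the surrounding context words from the centre word.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["cat"].shape)                      # (50,) embedding vector
print(skipgram.wv.most_similar("cat", topn=3))   # nearest neighbours in embedding space
```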
__Types of Optimizers__
- Gradient Descent: Gradient Descent is a fundamental optimization algorithm used
in machine learning. It updates the model parameters in the opposite direction of
the gradient of the loss function with respect to the parameters. It iteratively
adjusts the parameters to find the minimum of the loss function, aiming to
optimize the model.
- Stochastic Gradient Descent (SGD): SGD is a variant of Gradient Descent that computes the gradient from a single randomly selected training example (or, in the common mini-batch variant, a small random subset) rather than the full dataset, and updates the parameters with it. It introduces randomness into the optimization process and is far more computationally efficient per step for large datasets.
- AdaGrad (Adaptive Gradient Algorithm): AdaGrad adapts the learning rate for each parameter based on its historical gradients, dividing by the square root of the accumulated squared gradients. Parameters tied to infrequent features therefore keep a relatively large effective learning rate, while frequently updated parameters get a smaller one. AdaGrad is effective for sparse data and has been widely used in natural language processing tasks.
- RMSprop (Root Mean Square Propagation): RMSprop is an optimizer that
addresses the limitations of AdaGrad by maintaining an exponentially decaying
average of past squared gradients. It divides the learning rate by the root mean
square (RMS) of the past gradients, allowing for more stable and adaptive updates.
- Adam (Adaptive Moment Estimation): Adam is an adaptive optimization algorithm
that combines the advantages of both AdaGrad and RMSprop. It adapts the
learning rate for each parameter by considering the first and second moments of
the gradients. Adam is widely used in deep learning due to its efficiency and
effectiveness in optimizing complex models.
- Adadelta: Adadelta is an extension of AdaGrad that further improves its
limitations by addressing the rapidly decreasing learning rate. It replaces the
accumulation of past gradients with an exponentially decaying average, which
allows for adaptive learning rates without explicitly setting a global learning rate.
- Momentum: Momentum is an optimizer that adds a fraction of the previous
update to the current update, effectively creating momentum in the optimization
process. It helps accelerate convergence by dampening oscillations and speeding
up convergence along shallow directions in the optimization landscape.
- Nesterov Accelerated Gradient (NAG): NAG is an extension of the momentum optimizer that evaluates the gradient at the look-ahead position the momentum term is about to move to, rather than at the current parameters. This reduces oscillations and allows for faster convergence, especially in scenarios with sparse gradients.
Adam, RMSprop, AdaGrad, and Stochastic Gradient Descent (SGD) are all optimizers; plain SGD typically converges the slowest among them, since it uses a single global learning rate with no per-parameter adaptation or momentum. A small sketch of several of the update rules is shown below.
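A minimal NumPy sketch of a few of these update rules on a toy quadratic loss (the loss function, learning rates, decay factors, and step counts are illustrative assumptions, not values from the notes above):

```python
import numpy as np

# Toy objective: f(w) = 0.5 * w^T A w with an ill-conditioned A, so grad f(w) = A w.
A = np.diag([1.0, 25.0])
grad = lambda w: A @ w

def gd(w, lr=0.03, steps=200):
    # Plain (full-batch) gradient descent; SGD would use a noisy estimate of grad(w).
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def momentum(w, lr=0.03, beta=0.9, steps=200):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)                  # accumulate velocity from past gradients
        w = w - lr * v
    return w

def rmsprop(w, lr=0.03, beta=0.9, eps=1e-8, steps=200):
    s = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        s = beta * s + (1 - beta) * g**2        # decaying average of squared gradients
        w = w - lr * g / (np.sqrt(s) + eps)     # per-parameter scaled step
    return w

def adam(w, lr=0.03, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g               # first moment (mean of gradients)
        v = b2 * v + (1 - b2) * g**2            # second moment (uncentred variance)
        m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)  # bias correction
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w0 = np.array([5.0, 5.0])
for name, opt in [("GD", gd), ("Momentum", momentum), ("RMSprop", rmsprop), ("Adam", adam)]:
    print(f"{name:8s} final w = {opt(w0.copy())}")
```

On an ill-conditioned objective like this, the adaptive methods rescale each parameter's step individually, which is exactly the per-parameter behaviour the descriptions above refer to.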