Unit - 3 - ML - 24
Bias and Variance
Underfitting
• In the case of underfitting, the model is not able to learn enough from the training data, so accuracy drops and the model produces unreliable predictions.
• An underfitted model has high bias and low variance.
How to avoid underfitting:
• By increasing the training time of the model.
• By increasing the number of features (a sketch follows this list).
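A minimal sketch of the second remedy, adding features, assuming scikit-learn; the synthetic sine-shaped data and the degree-5 polynomial expansion are illustrative choices, not part of the slides.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # non-linear target

# A plain linear model cannot capture the sine shape: it underfits (high bias).
underfit = LinearRegression().fit(X, y)
print("linear R^2:", underfit.score(X, y))

# Expanding the feature set with polynomial terms gives the model enough capacity.
richer = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(X, y)
print("degree-5 R^2:", richer.score(X, y))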
Overfitting and how to reduce overfitting
What is overfitting?
• Building a model that matches the training data “too closely”, generating a complex model.
Why does it occur?
• Evaluating a model by testing it on the same data that was used to train it.
• Creating a model that is “too complex”.
What is the impact of overfitting?
• The model will do well on the training data, but won’t generalize to out-of-sample data, i.e., to test data.
• The model will have low bias, but high variance.
How to reduce overfitting
• Cross-validation: By splitting the data into training and testing sets multiple times, cross-validation can help identify whether a model is overfitting or underfitting, and can be used to tune hyperparameters to reduce variance.
• Feature selection: Choosing only the relevant features decreases the model’s complexity and can reduce the variance error.
• Regularization: We can use L1 or L2 regularization to reduce variance in machine learning models (see the sketch after this list).
• Ensemble methods: These combine multiple models to improve generalization performance. Bagging, boosting, and stacking are common ensemble methods that can help reduce variance and improve generalization.
• Simplifying the model: Reducing the complexity of the model, such as decreasing the number of parameters or layers in a neural network, can also help reduce variance and improve generalization performance.
• Early stopping: Early stopping prevents overfitting by stopping the training of a deep learning model when the performance on the validation set stops improving.
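A minimal sketch of two of the listed techniques, cross-validation and L2 regularization, assuming scikit-learn; the Ridge model, the synthetic data, and the alpha value are illustrative assumptions, not part of the slides.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Few samples, many features: a setting where an unregularized model tends to overfit.
X, y = make_regression(n_samples=100, n_features=50, noise=10.0, random_state=0)

# 5-fold cross-validation scores each model on data it was not trained on.
plain = cross_val_score(LinearRegression(), X, y, cv=5).mean()
ridge = cross_val_score(Ridge(alpha=10.0), X, y, cv=5).mean()  # L2 penalty shrinks the weights

print("unregularized CV R^2:", round(plain, 3))
print("ridge (L2) CV R^2:   ", round(ridge, 3))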
Bias Variance Tradeoff
Linear Regression
• Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method that is used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.
• The linear regression algorithm shows a linear relationship between a target or dependent variable (y) and one or more independent variables (x), hence the name linear regression. Since linear regression models a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
• The linear regression model provides a sloped straight line representing the relationship between the variables (see the sketch below).
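A minimal sketch of fitting that sloped straight line, assuming scikit-learn; the synthetic salary-versus-experience data and its true slope and intercept are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
experience = rng.uniform(0, 10, size=(50, 1))  # independent variable x (years of experience)
salary = 30_000 + 5_000 * experience.ravel() + rng.normal(scale=2_000, size=50)  # dependent variable y

model = LinearRegression().fit(experience, salary)
print("slope:", model.coef_[0])        # change in y per unit change in x
print("intercept:", model.intercept_)  # predicted y when x = 0
print("prediction for 4 years:", model.predict([[4.0]])[0])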
Lasso Regression
• Lasso stands for Least Absolute Shrinkage and Selection Operator. It is a technique where data points are shrunk towards a central point, like the mean. Lasso is also known as L1 regularization.
• It is applied when the model is overfitted or facing computational challenges (see the sketch below).
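A minimal sketch of Lasso (L1) regularization driving some coefficients to exactly zero, which is what gives it its selection behaviour; scikit-learn, the synthetic data, and the alpha value are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Only 5 of the 20 features carry signal; the rest are noise.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)  # alpha controls the strength of the L1 penalty
print("non-zero coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])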
Batch Gradient Descent and Stochastic Gradient Descent
• Batch Gradient Descent involves calculations over the full training set at each step, which is very slow on very large training data. Thus, it becomes very computationally expensive to do Batch GD. However, it is great for convex or relatively smooth error manifolds. Also, Batch GD scales well with the number of features.
• Stochastic Gradient Descent tries to solve the main problem in Batch Gradient Descent, which is the use of the whole training set to calculate gradients at each step. SGD is stochastic in nature, i.e., it picks a “random” instance of the training data at each step and then computes the gradient, making it much faster since there is much less data to manipulate at a single time, unlike Batch GD. A sketch of both update rules follows.
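A minimal sketch contrasting the two update rules for simple linear regression; NumPy, the synthetic data, the learning rate, and the iteration counts are illustrative assumptions, not part of the slides.

import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(0, 1, 100)]                  # bias column + one feature
y = 4 + 3 * X[:, 1] + rng.normal(scale=0.1, size=100)            # true intercept 4, slope 3
lr = 0.1

# Batch GD: each step uses the gradient averaged over the *whole* training set.
w_batch = np.zeros(2)
for _ in range(1000):
    grad = 2 / len(X) * X.T @ (X @ w_batch - y)
    w_batch -= lr * grad

# SGD: each step uses the gradient of a single randomly picked instance.
w_sgd = np.zeros(2)
for epoch in range(50):
    for i in rng.permutation(len(X)):        # shuffle the training set every epoch
        xi, yi = X[i], y[i]
        grad = 2 * xi * (xi @ w_sgd - yi)
        w_sgd -= lr * grad

print("batch GD weights:", w_batch)   # close to [4, 3]
print("SGD weights:     ", w_sgd)     # also close, but noisier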
Batch Gradient Descent | Stochastic Gradient Descent
Computes the gradient using the whole training set. | Computes the gradient using a single training sample.
Slow and computationally expensive algorithm. | Faster and less computationally expensive than Batch GD.
Not suggested for huge training samples. | Can be used for large training samples.
Gives the optimal solution, given sufficient time to converge. | Gives a good solution, but not the optimal one.
No random shuffling of points is required. | The data sample should be in a random order, which is why we shuffle the training set for every epoch.
Cannot escape shallow local minima easily. | SGD can escape shallow local minima more easily.
The learning rate is fixed and cannot be changed during training. | The learning rate can be adjusted dynamically.
It may suffer from overfitting if the model is too complex for the dataset. | It can help reduce overfitting by updating the model parameters more frequently.