Module 3 Modified

Presenter’s Name

Dr. Shital Bhatt


Associate Professor
School of Computational and Data Sciences

www.vidyashilpuniversity.com
Bias and Variance in Machine Learning
 There are various ways to evaluate a machine learning model. We can use MSE (Mean Squared Error) and Absolute Error for regression, and Precision, Recall, and the ROC (Receiver Operating Characteristic) curve for classification. In a similar way, bias and variance help us in parameter tuning and in deciding the better-fitted model among several we build.
 Bias is one type of error that occurs due to wrong assumptions about the data, such as assuming the data is linear when in reality it follows a complex function. Variance, on the other hand, is introduced by high sensitivity to variations in the training data; it too is a type of error, since we want our model to be robust against noise. There are two types of error in machine learning, reducible and irreducible, and bias and variance make up the reducible error.
What is Bias?
 Bias is the inability of a model to represent the data accurately, because of which some difference or error occurs between the model's predicted value and the actual value. These differences between actual (or expected) values and predicted values are known as bias error, or error due to bias. Bias is a systematic error that occurs due to wrong assumptions in the machine learning process.
 Let Y be the true value of a parameter, and let Y' be an estimator of Y based on a sample of data. Then the bias of the estimator Y' is given by:
 Bias(Y') = E(Y') − Y
 where E(Y') is the expected value of the estimator Y'. Bias measures how well the model fits the data.
• Low Bias: A low bias value means fewer assumptions are made about the form of the target function. In this case, the model will closely match the training dataset.
• High Bias: A high bias value means more assumptions are made about the target function. In this case, the model will not match the training dataset closely.
 A high-bias model is unable to capture the trend of the dataset. It is considered an underfitting model and has a high error rate, caused by an overly simplified algorithm.
 For example, a linear regression model may have a high bias if the data
has a non-linear relationship.
Ways to reduce high bias in Machine Learning:
• Use a more complex model: One of the main causes of high bias is an overly simplified model that cannot capture the complexity of the data. In such cases, we can make our model more complex, for example by increasing the number of hidden layers in a deep neural network, or by using a more expressive model such as polynomial regression for non-linear datasets, a CNN for image processing, or an RNN for sequence learning.
• Increase the number of features: Adding more features to the training dataset increases the complexity of the model and improves its ability to capture the underlying patterns in the data.
• Reduce regularization of the model: Regularization techniques such as L1 or L2 regularization help prevent overfitting and improve generalization, but if the model has high bias, reducing the strength of regularization, or removing it altogether, can improve performance.
• Increase the size of the training data: Increasing the size of the training data can help reduce bias by providing the model with more examples to learn from.
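 As a minimal sketch of the first remedy (synthetic data; numpy and scikit-learn assumed available): a straight line underfits quadratic data, while adding polynomial features removes the bias.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)   # true relationship is quadratic

linear = LinearRegression().fit(X, y)          # too simple: high bias, underfits
quadratic = make_pipeline(PolynomialFeatures(degree=2),
                          LinearRegression()).fit(X, y)

print("linear    MSE:", mean_squared_error(y, linear.predict(X)))     # large
print("quadratic MSE:", mean_squared_error(y, quadratic.predict(X)))  # small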
What is Variance?
 Variance is the measure of spread in data from its mean position. In machine learning, variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the training data. More specifically, it is the variability of the model: how sensitive it is to a particular subset of the training dataset, i.e., how much its fit adjusts on a new subset of the training data.
 Let Y be the actual values of the target variable, and Y' the predicted values. The variance of a model can be measured as the expected value of the squared difference between the predicted values and the expected value of the predictions:
 Variance = E[(Y' − E[Y'])²]
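 A minimal sketch that estimates this quantity empirically (synthetic data, illustrative model choice): retrain the same model on bootstrap resamples and measure the spread of its predictions at a fixed point.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)

x0 = [[1.0]]                                   # fixed query point
preds = []
for _ in range(100):                           # 100 bootstrap resamples
    idx = rng.integers(0, len(X), len(X))
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])  # unpruned tree: high variance
    preds.append(tree.predict(x0)[0])

print("estimated Variance = E[(Y' - E[Y'])^2] at x0:", np.var(preds))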
 Variance errors are classified as either low variance or high variance.
• Low variance: Low variance means that the model is less sensitive to
changes in the training data and can produce consistent estimates of
the target function with different subsets of data from the same
distribution. This is the case of underfitting when the model fails to
generalize on both training and test data.
• High variance: High variance means that the model is very sensitive to
changes in the training data and can result in significant changes in the
estimate of the target function when trained on different subsets of data
from the same distribution. This is the case of overfitting when the
model performs well on the training data but poorly on new, unseen test
data: the model fits the training data so closely that it fails to generalize to new data.
Ways to Reduce Variance in Machine Learning:

• Cross-validation: By splitting the data into training and testing sets multiple times, cross-
validation can help identify if a model is overfitting or underfitting and can be used to tune
hyperparameters to reduce variance.
• Feature selection: Choosing only the relevant features decreases the model's complexity and can reduce the variance error.
• Regularization: We can use L1 or L2 regularization to reduce variance in machine learning models.
• Ensemble methods: These combine multiple models to improve generalization performance. Bagging, boosting, and stacking are common ensemble methods that can help reduce variance.
• Simplifying the model: Reducing the complexity of the model, such as decreasing the
number of parameters or layers in a neural network, can also help reduce variance and improve
generalization performance.
• Early stopping: Early stopping is a technique used to prevent overfitting by stopping the
training of the deep learning model when the performance on the validation set stops
improving.
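 A hedged sketch combining two of the remedies above, cross-validation and L2 regularization (synthetic data; alpha=1.0 is an arbitrary illustrative choice): Ridge's penalty on large coefficients stabilizes an otherwise high-variance degree-15 polynomial fit.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 60)

unregularized = make_pipeline(PolynomialFeatures(15), LinearRegression())
regularized = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0))

for name, model in [("degree-15 OLS  ", unregularized),
                    ("degree-15 Ridge", regularized)]:
    # cross-validation exposes the variance of the unregularized fit
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(name, "cross-validated MSE:", round(mse, 3))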
Different Combinations of Bias-Variance
• High Bias, Low Variance: A model with high bias and low variance is
said to be underfitting.
• High Variance, Low Bias: A model with high variance and low bias is
said to be overfitting.
• High-Bias, High-Variance: A model has both high bias and high
variance, which means that the model is not able to capture the
underlying patterns in the data (high bias) and is also too sensitive to
changes in the training data (high variance). As a result, the model will
produce inconsistent and inaccurate predictions on average.
• Low Bias, Low Variance: A model that has low bias and low variance
means that the model is able to capture the underlying patterns in the
data (low bias) and is not too sensitive to changes in the training data
(low variance). This is the ideal scenario for a machine learning model,
as it is able to generalize well to new, unseen data and produce consistent and accurate predictions. In practice, however, this ideal is rarely fully achievable.
 Since the ideal case of low bias and low variance is rarely attainable in practice, we trade off between bias and variance to achieve a balance between the two.
 A model with balanced bias and variance is said to have optimal
generalization performance. This means that the model is able to
capture the underlying patterns in the data without overfitting or
underfitting. The model is likely to be just complex enough to capture
the complexity of the data, but not too complex to overfit the training
data. This can happen when the model has been carefully tuned to
achieve a good balance between bias and variance, by adjusting the
hyperparameters and selecting an appropriate model architecture.
Bias Variance Tradeoff
 If the algorithm is too simple (e.g., a hypothesis with a linear equation), it tends toward high bias and low variance and is thus error-prone. If the algorithm is too complex (e.g., a hypothesis with a high-degree equation), it tends toward high variance and low bias, and new inputs will not be predicted well. The sweet spot between these two conditions is known as the Bias-Variance Trade-off. This trade-off in complexity is why an algorithm cannot be more complex and less complex at the same time. Plotted against model complexity, the total error is minimized at the point where the falling bias curve and the rising variance curve balance each other.
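 A minimal sketch of that picture (synthetic data): as polynomial degree grows, the cross-validated error first falls (bias shrinking) and then rises again (variance growing).

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (80, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 80)

for degree in [1, 2, 3, 5, 9, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree {degree:2d}: cross-validated MSE = {mse:.3f}")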
What is the difference between parameter and hyperparameter?

• Model parameters: These are the parameters that are estimated by the model from the given data, for example the weights of a deep neural network.
• Model hyperparameters: These are the parameters that cannot be
estimated by the model from the given data. These parameters are used
to estimate the model parameters. For example, the learning rate in
deep neural networks.
Model Parameters:
 Model parameters are configuration variables that are internal to the model, and the model learns them on its own: for example, the weights or coefficients of the independent variables in a linear regression model or an SVM, the weights and biases of a neural network, or the cluster centroids in clustering. Some key points about model parameters are as follows:
• They are used by the model for making predictions.
• They are learned by the model from the data itself.
• They are usually not set manually.
• They are part of the model and key to a machine learning algorithm.
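 A minimal scikit-learn sketch of the first two points: the learned parameters appear on the fitted model as trailing-underscore attributes (the toy data is generated from y = 2x + 1).

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])            # generated from y = 2x + 1

model = LinearRegression().fit(X, y)
print("learned coefficient (weight):", model.coef_)    # approximately [2.]
print("learned intercept (bias):", model.intercept_)   # approximately 1.0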
Model Hyperparameters:

 Hyperparameters are those parameters that are explicitly defined by the user to control the learning process. Some key points about model hyperparameters are as follows:
• These are usually defined manually by the machine learning engineer.
• One cannot know the exact best value for hyperparameters for the
given problem. The best value can be determined either by the rule of
thumb or by trial and error.
• Some examples of hyperparameters are the learning rate for training a neural network and K in the KNN algorithm.
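 By contrast with learned parameters, a hyperparameter such as k is supplied by the user before training ever starts; a minimal scikit-learn sketch:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)   # k is set by us, never learned
print(knn.get_params()["n_neighbors"])      # 5: fixed before fit() is called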
What is hyperparameter tuning and why is it important?

 Hyperparameter tuning (or hyperparameter optimization) is the process of determining the right combination of hyperparameters that maximizes model performance. It works by running multiple trials in a single training process; each trial is a complete execution of your training application with values for your chosen hyperparameters, set within the limits you specify. Once finished, this process gives you the set of hyperparameter values best suited for the model to produce optimal results.
 Hyperparameter tuning is the process of selecting the optimal values for a
machine learning model’s hyperparameters. Hyperparameters are settings
that control the learning process of the model, such as the learning rate,
the number of neurons in a neural network, or the kernel size in a support
vector machine. The goal of hyperparameter tuning is to find the values
that lead to the best performance on a given task.
 Manual hyperparameter tuning
 Manual hyperparameter tuning involves experimenting with different sets of hyperparameters by hand, i.e., each trial with a set of hyperparameters is performed by you.
 Advantages of manual hyperparameter optimization:
• Tuning hyperparameters manually means more control over the process.
• If you’re researching or studying tuning and how it affects the network
weights then doing it manually would make sense.
 Disadvantages of manual hyperparameter optimization:
• Manual tuning is a tedious process since there can be many trials and
keeping track can prove costly and time-consuming.
• This isn’t a very practical approach when there are a lot of
hyperparameters to consider.
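 Despite these drawbacks, a minimal manual-tuning loop is easy to write (a sketch using scikit-learn's built-in iris dataset; the candidate values of k are arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = None, 0.0
for k in [1, 3, 5, 7, 9]:                     # each trial is run "by hand"
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                            X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print("best k:", best_k, "with accuracy:", round(best_score, 3))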
 Automated hyperparameter tuning
 Automated hyperparameter tuning utilizes already existing algorithms to
automate the process. The steps you follow are:
• First, specify a set of hyperparameters and limits to those hyperparameters’
values (note: every algorithm requires this set to be a specific data structure,
e.g. dictionaries are common while working with algorithms).
• Then the algorithm does the heavy lifting for you. It runs those trials and
fetches you the best set of hyperparameters that will give optimal results.
 Hyperparameters commonly tuned in either way include:
• The k in the kNN (K-Nearest Neighbour) algorithm
• Learning rate for training a neural network
• Train-test split ratio
• Batch Size
• Number of Epochs
• Branches in Decision Tree
• Number of clusters in Clustering Algorithm
Types of Hyperparameters

 Model-Specific Hyperparameters: These control the structure or complexity of the model.
 Example: the depth of a decision tree; the number of neurons in a neural network layer.
 Algorithm-Specific Hyperparameters: These control how the algorithm optimizes the model.
 Example: the learning rate in gradient descent; the number of iterations for training.
Examples of Common Hyperparameters

 1. Learning Rate (`alpha`):
 - Determines the step size at each iteration while moving toward a minimum of the loss function.
 - Small learning rates converge slowly, while large learning rates may cause the model to overshoot the minimum.
 2. Number of Epochs:
 - The number of times the learning algorithm works through the entire training dataset.
 - Too few epochs can lead to underfitting, while too many can lead to overfitting.
 3. Batch Size:
 - Number of training samples used to compute the gradient during optimization.
 - Smaller batches allow for more frequent updates, while larger batches provide a more accurate estimate of the gradient.
 4. Regularization Parameter (`lambda` or `alpha`):
 - Penalizes large weights in models to avoid overfitting. Common regularization techniques include:
   - Lasso (L1 regularization).
   - Ridge (L2 regularization).
 5. Number of Hidden Layers/Neurons in Neural Networks:
 - Controls the depth and capacity of the network.
 - More layers can learn more complex features but may lead to overfitting.
 6. Max Depth for Decision Trees:
 - Limits the number of splits in decision trees.
 - A deeper tree can model more complex patterns but may lead to overfitting.
 7. Dropout Rate (Neural Networks):
 - Used to randomly drop units (along with their connections) during training to prevent overfitting.
 8. Number of Neighbors in K-Nearest Neighbors (KNN):
 - Controls how many neighbors are considered when predicting a class.

Hyperparameter Tuning

 1. Grid Search:
 - Try every possible combination of hyperparameter values from a predefined set. This can be computationally expensive, but it guarantees that the best combination within the search space will be found.
 2. Random Search:
 - Instead of searching all possible combinations, randomly sample the hyperparameter space a fixed number of times.
 3. Bayesian Optimization:
 - Tries to intelligently explore the hyperparameter space by predicting performance based on previous trials. It can be more efficient than grid search and random search.
 4. Gradient-Based Optimization:
 - Methods like gradient descent or its variants can also be applied to hyperparameter tuning by treating the hyperparameter space as a continuous function.
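 A hedged sketch of the first two methods using scikit-learn (the parameter grid below is an illustrative assumption): GridSearchCV tries all nine combinations, while RandomizedSearchCV samples only four of them.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [2, 4, None]}

grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid, cv=5).fit(X, y)          # tries all 9 combos
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, n_iter=4, cv=5,
                          random_state=0).fit(X, y)      # samples 4 combos

print("grid search best:", grid.best_params_)
print("random search best:", rand.best_params_)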
Ensemble Methods
 Ensemble methods in machine learning are techniques that combine
predictions from multiple models to improve the performance,
accuracy, and robustness of predictive outcomes. They are widely
used because they help reduce issues like overfitting, enhance
prediction accuracy, and often provide more stable and
generalized results compared to single models. Here's a deep dive into the most common ensemble techniques: Bagging, Boosting, and Stacking, with a short illustrative sketch of each approach.
Types of Ensemble Methods

 The main types of ensemble methods are:
• Bagging (Bootstrap Aggregating): Creates multiple subsets of the
dataset, trains individual models on these subsets, and combines their
predictions.
• Boosting: Sequentially builds models that focus on correcting the errors
of previous models.
• Stacking: Combines different types of models, often in layers, to create
a more powerful prediction model.
 Bagging with Random Forest
 Bagging reduces variance by training multiple versions of a model on
random subsets of the data. A commonly used bagging method is the
Random Forest, which combines predictions from multiple decision trees,
reducing overfitting while maintaining flexibility.
• Random Forest trains multiple decision trees on random subsets of the data
and combines their predictions, reducing variance and improving accuracy.
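 A minimal Random Forest sketch (scikit-learn's built-in breast-cancer dataset is used purely for illustration):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample; predictions are combined
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))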
 Boosting with AdaBoost
 Boosting sequentially builds models that focus on the errors of previous
models. Each model tries to correct the mistakes of the previous one,
gradually improving the performance. AdaBoost (Adaptive Boosting) is a
popular boosting algorithm.
• In AdaBoost, a weak learner (a simple model, like a shallow decision tree) is
used in multiple iterations.
• Each new model is trained to give more weight to the misclassified
samples, focusing on harder cases.
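 A minimal AdaBoost sketch; by default, scikit-learn's AdaBoostClassifier uses depth-1 decision trees ("stumps") as the weak learner:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# each of the 50 rounds reweights the data toward previously misclassified samples
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
print("test accuracy:", ada.score(X_test, y_test))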
 Gradient Boosting with XGBoost
 Gradient Boosting minimizes the error by training each new model on
the residuals (errors) of the previous model. XGBoost (Extreme Gradient
Boosting) is a powerful library optimized for speed and performance.
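 A hedged sketch using scikit-learn's built-in GradientBoostingClassifier instead of the xgboost library (xgboost exposes a similar XGBClassifier interface when installed):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# each new tree is fitted to the residual errors of the current ensemble
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                random_state=0).fit(X_train, y_train)
print("test accuracy:", gb.score(X_test, y_test))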
 Stacking with Multiple Classifiers
 Stacking combines predictions from different types of models. Unlike
bagging or boosting, stacking uses a meta-learner to make the final
predictions based on the outputs of the individual base models.
• Each base model learns independently on the data, providing diverse
perspectives.
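 A minimal stacking sketch: two diverse base models feed a logistic-regression meta-learner (the model choices are illustrative).

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svc", SVC(random_state=0))],       # diverse base models
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-learner
).fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))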
Performance Metrics for Regression
 1. Mean Absolute Error (MAE)
 The Mean Absolute Error is the average of the absolute differences between the
predicted and actual values. It gives an idea of the average error in the
predictions.
• Lower MAE indicates a more accurate model.
• MAE is less sensitive to outliers than squared-error metrics, since it weights all errors linearly rather than squaring them.

 2. Mean Squared Error (MSE)
 Mean Squared Error squares the differences between actual and predicted
values. Squaring penalizes larger errors more, making MSE sensitive to outliers.
• Lower MSE indicates better model performance.
• Because of the squaring effect, MSE emphasizes large errors more than MAE.
 3. Root Mean Squared Error (RMSE)
 RMSE is simply the square root of MSE. It maintains the same unit as the
target variable, making interpretation easier.

• Lower RMSE is better.
 4. Mean Absolute Percentage Error (MAPE)
 MAPE provides an error rate as a percentage. It’s useful when
interpreting errors relative to the size of the actual values.
• Lower MAPE indicates better performance.
 5. R-Squared (R²)
 R-Squared measures the proportion of variance in the target variable explained by the model. It typically ranges from 0 to 1, with 1 indicating a perfect model, though it can fall below 0 for very poor fits.
• Higher R² values indicate a better fit.
• If R² is 0, the model does no better than the mean prediction, and if it is negative, the model performs worse than predicting the mean.
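 All five metrics can be computed with scikit-learn; a minimal sketch (the numbers are made up for illustration, and mean_absolute_percentage_error assumes a reasonably recent scikit-learn version):

import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

mse = mean_squared_error(y_true, y_pred)
print("MAE :", mean_absolute_error(y_true, y_pred))
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))                               # same unit as y
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))
print("R²  :", r2_score(y_true, y_pred))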
Classification Metrics
 1. Accuracy
 Accuracy measures the proportion of correctly classified instances out of the total
instances.

• Higher accuracy indicates better overall performance. However, if the dataset is imbalanced, accuracy alone may not be sufficient.
 2. Precision
 Precision is the ratio of true positives to all predicted positives, showing the
percentage of positive predictions that are actually correct.
 High precision is particularly useful when false positives have a high cost, such as in
marketing, where you want to avoid wasting resources on uninterested clients.
 3. Recall (Sensitivity)
 Recall measures the ability of a model to find all positive instances,
calculated as the ratio of true positives to all actual positives.

 High recall indicates that the model is effective at capturing positive instances. It's important in scenarios where missing positive cases is costly, like identifying potential customers.
 4. F1 Score
 The F1 Score is the harmonic mean of precision and recall. It provides a
balanced metric when there’s a trade-off between precision and recall.
 The F1 score is useful when you need a single metric that balances
precision and recall.
 5. Confusion Matrix
 The confusion matrix shows the counts of true positives, true negatives, false
positives, and false negatives.
• A confusion matrix provides insight into the types of errors the model makes. For
instance, false negatives indicate missed opportunities.
 6. ROC-AUC Score and ROC Curve
 The ROC-AUC (Receiver Operating Characteristic - Area Under Curve) score measures
the model's ability to distinguish between classes. The ROC curve shows the true
positive rate against the false positive rate at various threshold levels.
• ROC-AUC Score: A higher ROC-AUC score means the model is better at
distinguishing between the positive and negative classes.
• ROC Curve: The closer the curve is to the top-left corner, the better the model's
performance.
 Final Interpretation
• Accuracy gives an overall success rate, but alone may not be sufficient.
• Precision is useful when the cost of false positives is high.
• Recall is vital when you need to capture as many positives as possible.
• F1 Score is a balanced metric, helpful when you need to consider both
precision and recall.
• ROC-AUC Score provides insight into the model's ability to separate
classes.
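 A minimal sketch computing these metrics with scikit-learn (the labels and scores below are made up for illustration):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                   # actual classes
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]                   # thresholded predictions
y_score = [0.2, 0.6, 0.8, 0.9, 0.4, 0.1, 0.7, 0.3]  # predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_score))  # uses scores, not labels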
Clustering Evaluation Metrics
 Unlike supervised learning, clustering lacks a ground truth. Thus, we rely on
different metrics to evaluate the "quality" of clusters, often based on internal
criteria. Here are some commonly used clustering evaluation metrics:
 1. Silhouette Score
 The Silhouette Score measures how similar an object is to its own cluster
compared to other clusters. A higher score indicates that clusters are well-
separated and compact.
 The score for a single sample is s = (b − a) / max(a, b), where:
• a: The mean distance between a sample and all other points in the same
cluster.
• b: The mean distance between a sample and all points in the next nearest
cluster.
• A Silhouette Score closer to 1 indicates well-separated clusters, whereas a
score near 0 suggests overlapping clusters.
 2. Calinski-Harabasz Index
 The Calinski-Harabasz Index, or Variance Ratio Criterion, measures the
ratio of the sum of between-cluster dispersion to within-cluster
dispersion. A higher score indicates more distinct clusters.
• Higher values represent clusters that are dense and well-separated.
 3. Davies-Bouldin Score
 The Davies-Bouldin Score is the average similarity ratio of each cluster
with the cluster that is most similar to it. Lower values indicate better
clustering.
• Lower values of the Davies-Bouldin Score indicate well-separated
clusters. Values closer to zero are ideal.
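 A minimal sketch computing all three clustering metrics on a k-means result (synthetic blob data for illustration):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("Silhouette Score :", silhouette_score(X, labels))        # higher is better
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels)) # higher is better
print("Davies-Bouldin   :", davies_bouldin_score(X, labels))    # lower is better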
Model Selection
 Model selection involves choosing the best model for a particular dataset and
task, often by evaluating different algorithms or parameter settings. We use
various techniques such as cross-validation, hyperparameter tuning, and
metrics comparison to help select the most suitable model.
 Model Selection Process Overview
1. Data Preprocessing: Clean and preprocess the dataset.
2. Train-Test Split: Split the data into training and test sets.
3. Baseline Models: Train and evaluate different baseline models.
4. Cross-Validation: Use cross-validation to get more robust evaluation metrics.
5. Hyperparameter Tuning: Use techniques like Grid Search or Randomized
Search to optimize model parameters.
6. Model Evaluation: Compare models using metrics like accuracy, precision,
recall, F1 score, and AUC-ROC.
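 A hedged sketch of steps 2 to 4 of this process (the dataset and the two baseline models are illustrative choices):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)  # step 2

baselines = [("logistic regression", LogisticRegression(max_iter=5000)),
             ("random forest", RandomForestClassifier(random_state=0))]

for name, model in baselines:               # steps 3-4: baselines + cross-validation
    cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()
    print(name, "cross-validated accuracy:", round(cv_acc, 3))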
Case: Credit Card Fraud Detection
 Objective: Build a model to classify whether a given transaction is
fraudulent or not based on features available in the dataset.
 Overview of the Process
1. Load and Preprocess the Data: Read the dataset and perform
necessary cleaning and preprocessing.
2. Exploratory Data Analysis (EDA): Understand the dataset better
through visualizations and statistics.
3. Model Selection: Train several models on the dataset.
4. Evaluate Models: Use different metrics to assess model performance.
5. Hyperparameter Tuning: Optimize model parameters for better
performance.
6. Final Evaluation: Evaluate the best model on a test set.
