ML Unit4 Notes
4.1 CROSS VALIDATION
During the evaluation of machine learning (ML) models, the following question might arise:
Is this model the best one available from the hypothesis space of the algorithm, in terms of generalization error on unseen data?
Consider training a model using an algorithm on a given dataset. Using the same training data, you determine that the trained model has an accuracy of 95% or even 100%. What does this mean? Can this model be used for prediction?
No. This is because your model has been trained on this exact data; i.e., it knows the data and has simply fit (or memorized) it very well. In contrast, when you try to predict on a new set of data, it will most likely give you very poor accuracy, because it has never seen that data before and cannot generalize well over it. To deal with such problems, hold-out methods can be employed.
The hold-out method involves splitting the data into multiple parts, using one part for training the model and the rest for validating and testing it. It can be used for both model evaluation and model selection.
When every piece of data is used for training the model, there remains the problem of selecting the best model from a list of possible models. Primarily, we want to identify which model has the lowest generalization error, i.e., which model makes better predictions on future or unseen datasets than all of the others. We therefore need a mechanism that allows the model to be trained on one set of data and tested on another set of data. This is where the hold-out method comes into play.
Model evaluation using the hold-out method entails splitting the dataset into training and test datasets, evaluating model performance on the test set, and determining the most optimal model.
There are two parts to the dataset in this method. One split is held aside as a training set. The other is held back for testing or evaluation of the model. The split percentage is determined based on the amount of training data available. A typical split of 70-30% is used, in which 70% of the dataset is used for training and 30% is used for testing the model.
The objective of this technique is to select the best model based on its accuracy on the testing dataset, comparing it with other models. There is, however, the possibility that the model ends up well fitted to the test data when this technique is used: models get tuned to improve their accuracy on the test dataset, based on the assumption that the test dataset represents the population. As a result, the test error becomes an optimistic estimate of the generalization error. Obviously, this is not what we want: since the final model is tuned to fit (or overfit) the test data, it won't generalize well to unknown or future datasets.
Follow the steps below for using the hold-out method for model evaluation:
1. Split the dataset in two (preferably 70-30%; however, the split percentage can vary and should be random).
2. Train the model on the training dataset, using some fixed set of hyperparameters.
3. Evaluate the performance of the trained model on the test dataset.
4. Use the entire dataset to train the final model so that it can generalize better on future datasets.
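A minimal sketch of these steps using scikit-learn; the dataset (iris), the classifier (logistic regression), and the 70-30 split are illustrative assumptions, not part of the notes:

# Hold-out method: 70-30 train/test split (illustrative sketch)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Step 1: random 70-30 split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# Step 2: train with a fixed set of hyperparameters
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X_train, y_train)

# Step 3: evaluate on the held-out test set
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Step 4: retrain the final model on the full dataset
final_model = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)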
In this process, the dataset is split into training and test sets, and a fixed set of
hyperparameters is used to evaluate the model. There is another process in
which data can also be split into three sets, and these sets can be used to select a
model or to tune hyperparameters.
Follow the steps below for using the hold-out method for model selection:
1. Divide the dataset into three parts: training dataset, validation dataset, and test
dataset.
2. Now, different machine learning algorithms can be used to train different models.
You can train your classification model, for example, using logistic regression,
random forest, and XGBoost.
3. Tune the hyperparameters for models trained with different algorithms.
Change the hyperparameter settings for each algorithm mentioned in step 2
and come up with multiple models.
4. On the validation dataset, test the performance of each of these models (associated with each of the algorithms).
5. Choose the most optimal model from those tested on the validation dataset. The
most optimal model will be set up with the most optimal hyperparameters. Using the
example above, let’s suppose the model trained with XGBoost with the
most optimal hyperparameters is selected.
6. Finally, on the test dataset, test the performance of the most optimal
model.
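A rough sketch of hold-out based model selection with scikit-learn; the dataset, the candidate models (logistic regression and two random forest settings stand in for the logistic regression / random forest / XGBoost example above), and the 60-20-20 split are assumptions for illustration:

# Hold-out method for model selection: train / validation / test split (sketch)
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

# Step 1: three-way split (60% train, 20% validation, 20% test)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Steps 2-4: train candidate models (different algorithms / hyperparameters)
# and compare them on the validation set
candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "rf_100": RandomForestClassifier(n_estimators=100, random_state=0),
    "rf_300": RandomForestClassifier(n_estimators=300, random_state=0),
}
val_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[name] = accuracy_score(y_val, model.predict(X_val))

# Step 5: pick the model with the best validation accuracy
best_name = max(val_scores, key=val_scores.get)

# Step 6: report its performance on the untouched test set
best_model = candidates[best_name]
print(best_name, accuracy_score(y_test, best_model.predict(X_test)))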
K-fold cross-validation:
K-fold cross-validation divides the input dataset into K groups of samples of equal size, called folds. In each iteration, K-1 folds are used for training, and the remaining fold is used as the test set. This approach is a very popular CV approach because it is easy to understand, and the output is less biased than other methods.
o Split the dataset into K folds.
o In each iteration, take one fold as the test set and use the remaining folds as the training set.
o Fit the model on the training set and evaluate the performance of the model using the test set.
Let's take an example of 5-fold cross-validation, so the dataset is grouped into 5 folds. In the 1st iteration, the first fold is reserved for testing the model, and the rest are used to train it. In the 2nd iteration, the second fold is used to test the model, and the rest are used to train it. This process continues until each fold has been used once as the test fold.
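A minimal 5-fold cross-validation sketch in scikit-learn; the dataset and classifier are illustrative assumptions:

# 5-fold cross-validation: each fold serves once as the test set
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])  # train on the other 4 folds
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))  # test on the held-out fold

print("Fold accuracies:", scores)
print("Mean CV accuracy:", np.mean(scores))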
4.2.3 Stratified k-fold cross-validation:
This technique is similar to k-fold cross-validation with some small changes. It works on the concept of stratification, which is the process of rearranging the data so that each fold or group is a good representative of the complete dataset. It is one of the best approaches for dealing with bias and variance.
It can be understood with an example of housing prices: the prices of some houses can be much higher than those of other houses. To handle such situations, a stratified k-fold cross-validation technique is useful.
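A minimal stratified k-fold sketch; note that scikit-learn's StratifiedKFold stratifies on class labels, so a classification dataset is assumed here (for a continuous target such as house prices, one would typically bin the target first):

# Stratified 5-fold CV: each fold preserves the class distribution of the full dataset
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=skf)
print("Stratified CV accuracies:", scores)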
A related variant, leave-one-out cross-validation, uses a single data point as the test set in each iteration:
o In this approach, the bias is minimal, as all the data points are used.
o This approach leads to high variation in testing the effectiveness of the model, as we iteratively check against one data point.
4.3 Bias-Variance Trade-off
Bias
Bias is the difference between the predictions of the ML model and the correct values. High bias gives a large error on the training as well as the test data. It is recommended that an algorithm always be low-biased, to avoid the problem of underfitting. With high bias, the predicted values follow a straight-line (overly simple) pattern and thus do not fit the data in the dataset accurately. Such fitting is known as underfitting of the data. This happens when the hypothesis is too simple or linear in nature.
Variance
Variance is the amount by which the model's predictions would change if it were trained on a different training dataset. High variance means the model fits the training data too closely and performs poorly on new, unseen data.
Bias Variance Tradeoff
If the algorithm is too simple (a hypothesis with a linear equation), then it may be in a high-bias, low-variance condition and thus be error-prone. If the algorithm fits something too complex (a hypothesis with a high-degree equation), then it may be in a high-variance, low-bias condition; in the latter case, the model will not perform well on new entries. There is something between both of these conditions, known as the Trade-off, or the Bias-Variance Trade-off.
This tradeoff in complexity is why there is a tradeoff between bias and variance: an algorithm can't be more complex and less complex at the same time. The perfect tradeoff is the point of model complexity at which both errors are balanced; this is the best point chosen for training the algorithm, giving low error on the training as well as the test data.
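As a supporting point (a standard decomposition, not written out explicitly in these notes), the expected test error at a point can be split into the two competing terms plus irreducible noise:

Expected Error = Bias² + Variance + Irreducible Error
i.e., E[(y − f̂(x))²] = (Bias[f̂(x)])² + Var[f̂(x)] + σ²

The best point referred to above is the model complexity at which this sum is smallest.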
4.4 Regularization:
This technique allows us to keep all the variables or features in the model while reducing the magnitude of their coefficients. Hence, it maintains accuracy as well as the generalization of the model. It mainly regularizes, or shrinks, the coefficients of the features toward zero. In simple words, "in the regularization technique, we reduce the magnitude of the features' coefficients while keeping the same number of features."
Consider a linear regression model:
y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b
Linear regression tries to optimize β0 and b (and the other coefficients) to minimize the cost function. The loss function for linear regression is called RSS, the residual sum of squares, i.e., the sum of squared differences between the actual and predicted values:
RSS = Σi ( yi − (β0 + β1xi1 + β2xi2 + ⋯ + βnxin) )²
Regularization adds a penalty term to this loss function and optimizes the parameters so that the model can predict accurate values of y without overfitting.
Techniques of Regularization
There are mainly two types of regularization techniques, which are given below:
o Ridge Regression
o Lasso Regression
Ridge Regression
o Ridge regression is one of the types of linear regression in which a small amount of bias is introduced so that we can get better long-term predictions.
o Ridge regression is a regularization technique used to reduce the complexity of the model. It is also called L2 regularization.
o In this technique, the cost function is altered by adding a penalty term to it. The amount of bias added to the model is called the ridge regression penalty. We can calculate it by multiplying lambda by the squared weight of each individual feature.
o The equation for the cost function in ridge regression is (a code sketch follows this list):
Cost = Σi (yi − ŷi)² + λ Σj βj²
o In the above equation, the penalty term regularizes the coefficients of the model; hence ridge regression reduces the magnitudes of the coefficients, which decreases the complexity of the model.
o As we can see from the above equation, if the value of λ tends to zero, the equation becomes the cost function of the ordinary linear regression model. Hence, for a minimal value of λ, the model resembles the linear regression model.
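A minimal ridge regression sketch using scikit-learn; the diabetes dataset and the alpha value (alpha plays the role of λ above) are illustrative assumptions:

# Ridge (L2) regression: larger alpha => smaller coefficient magnitudes
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge

X, y = load_diabetes(return_X_y=True)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha is the L2 penalty strength

print("Unregularized coefficients:", plain.coef_.round(1))
print("Ridge coefficients:       ", ridge.coef_.round(1))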
Lasso Regression:
o Lasso regression is another regularization technique used to reduce the complexity of the model. It stands for Least Absolute Shrinkage and Selection Operator.
o It is similar to ridge regression except that the penalty term contains the absolute values of the weights instead of the squares of the weights.
o Since it takes absolute values, it can shrink a slope exactly to 0, whereas ridge regression can only shrink it close to 0.
o It is also called L1 regularization. The equation for the cost function of lasso regression is:
Cost = Σi (yi − ŷi)² + λ Σj |βj|
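A matching lasso sketch under the same assumptions; note how some coefficients are shrunk exactly to zero:

# Lasso (L1) regression: some coefficients become exactly zero (feature selection)
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)
lasso = Lasso(alpha=1.0).fit(X, y)  # alpha is the L1 penalty strength

print("Lasso coefficients:", lasso.coef_.round(1))
print("Features dropped (coefficient == 0):", int((lasso.coef_ == 0).sum()))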
Reasons for Overfitting
o Data used for training is not cleaned and contains noise (garbage values).
Ways to Tackle Underfitting
o Increase the number of features in the dataset.
o Increase model complexity.
o Reduce noise in the data.
o Increase the duration of training.
4.6 Ensemble Learning:
Suppose you want to buy a new car. You would likely browse a few web portals where people have posted their reviews, and compare different car models, checking their features and prices. You will also probably ask your friends and colleagues for their opinions. In short, you wouldn't directly reach a conclusion, but would instead make a decision considering the opinions of other people as well.
Ensemble models in machine learning operate on a similar idea. They combine the decisions from multiple models to improve the overall performance.
Statistical Problem –
The Statistical Problem arises when the hypothesis space is too large for the amount of available data. Hence, there are many hypotheses with the same accuracy on the data, and the learning algorithm chooses only one of them! There is a risk that the accuracy of the chosen hypothesis is low on unseen data!
Computational Problem –
The Computational Problem arises when the learning algorithm cannot guarantee finding the best hypothesis.
Representational Problem –
The Representational Problem arises when the hypothesis space does not
contain any good approximation of the target class(es).
Types of Ensemble Classifier –
1) Bagging
2) Boosting
3) Random Forest
4.6.1 Bagging:
Bagging, or Bootstrap AGGregating, gets its name because it combines Bootstrapping and Aggregation to form one ensemble model. Given a sample of data, multiple bootstrapped subsamples are pulled. A decision tree is formed on each of the bootstrapped subsamples. After each subsample's decision tree has been formed, an algorithm is used to aggregate over the decision trees to form the final, most efficient predictor.
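A minimal bagging sketch using scikit-learn's BaggingClassifier over decision trees; the dataset and the number of estimators are assumptions (recent scikit-learn versions take the base learner as estimator; older versions call it base_estimator):

# Bagging: decision trees fit on bootstrapped subsamples, predictions aggregated by voting
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base learner fit on each bootstrap sample
    n_estimators=50,                     # number of bootstrapped trees
    bootstrap=True,
    random_state=0,
)
print("Bagging CV accuracy:", cross_val_score(bag, X, y, cv=5).mean())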
4.6.2 Boosting:
Unlike bagging, which aggregates the prediction results at the end, boosting aggregates the results at each step. The results are aggregated using weighted averaging.
Weighted averaging involves giving the models different weights depending on their predictive power. In other words, it gives more weight to the model with the higher predictive power, because the learner with the highest predictive power is considered the most important.
1. We sample a number of subsets from the initial training dataset.
2. Using the first subset, we train the first weak learner.
3. We test the trained weak learner using the training data. As a result of the testing, some data points will be incorrectly predicted.
4. Each data point with a wrong prediction is sent into the second subset of data, and this subset is updated.
5. Using this updated subset, we train and test the second weak learner.
6. We continue with the following subsets until the total number of subsets is reached.
7. We now have the total prediction. The overall prediction has already been aggregated at each step, so there is no need to calculate it separately at the end.
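A minimal boosting sketch; AdaBoost is used here as a concrete stand-in for the generic procedure above (it re-weights wrongly predicted points rather than literally copying them into new subsets), and the dataset and hyperparameters are illustrative assumptions:

# AdaBoost: weak learners trained sequentially, misclassified points get more weight,
# and the per-step predictions are combined by a weighted vote
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

boost = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
print("AdaBoost CV accuracy:", cross_val_score(boost, X, y, cv=5).mean())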
4.6.3 Random Forest:
As the name suggests, "random forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of predictions, predicts the final output.
A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
<="" li="">
o It predicts output with high accuracy; even for a large dataset, it runs efficiently.
The working process can be explained in the steps below:
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 and 2 until N trees have been built.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data points to the category that wins the majority of votes.
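A minimal random forest sketch in scikit-learn; the dataset, split, and N = 100 trees are illustrative assumptions:

# Random forest: many trees on random subsets, majority vote for the final class
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # N = 100 trees
forest.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, forest.predict(X_test)))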
Max Voting
• The max voting method is generally used for classification problems.
• In this technique, multiple models are used to make predictions for each data point.
• The prediction by each model is considered as a 'vote', and the class predicted by the majority of the models is used as the final prediction.
Averaging
• In this method, we take an average of predictions from all the models and use it to make the final
prediction.
• Averaging can be used for making predictions in regression problems or while calculating
probabilities for classification problems.
Weighted Averaging
• All models are assigned different weights defining the importance of each model for prediction.
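A small NumPy sketch of max voting, averaging, and weighted averaging over the class-1 probabilities of three hypothetical models (all numbers and weights are made up for illustration):

# Max voting, averaging, and weighted averaging over three models' predictions
import numpy as np

# predicted probability of class 1 from three hypothetical models, for 4 samples
probs = np.array([
    [0.9, 0.2, 0.6, 0.4],   # model A
    [0.8, 0.4, 0.7, 0.3],   # model B
    [0.4, 0.1, 0.8, 0.6],   # model C
])

# Max voting: each model casts a hard vote (probability >= 0.5 -> class 1),
# and the class with the majority of the 3 votes wins
votes = (probs >= 0.5).astype(int)
max_voting = (votes.sum(axis=0) >= 2).astype(int)

# Averaging: mean of the predicted probabilities
averaging = probs.mean(axis=0)

# Weighted averaging: better models get larger weights (weights sum to 1)
weights = np.array([0.5, 0.3, 0.2])
weighted_avg = np.average(probs, axis=0, weights=weights)

print(max_voting, averaging.round(2), weighted_avg.round(2))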