
University Institute of Engineering
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Bachelor of Engineering (Computer Science & Engineering)
Artificial Intelligence and Machine Learning (21CSH-316)
Prepared by:
Sitaram Patel (E13285)

DISCOVER . LEARN . EMPOWER


Course Outcome

Course Objectives
Model validation
• What is Model Validation?
• Machine learning revolves around data: its quality, its quantity, and how we work with it. Typically we collect the data, clean and preprocess it, and then apply a suitable algorithm to obtain the best-fit model. Obtaining a model, however, does not finish the task; model validation is as important as training.
• Training a model and deploying it directly would be unwise. In sensitive areas such as healthcare, where predictions drive real-life decisions, the risk is high, and an error in the model can be very costly.
• Advantages of Model Validation
Model validation provides several advantages.
• Quality of the Model
The first and foremost advantage of model validation is insight into the quality of the model: by validating it, we can quickly gauge its performance.
• Flexibility of the Model
Secondly, validating the model makes it easier to judge its flexibility, and the validation results can guide changes that make the model more flexible.
Validation
• The process of deciding whether the numerical results quantifying hypothesized relationships between variables are acceptable as descriptions of the data is known as validation.
• Generally, an error estimate for the model is made after training, better known as evaluation of residuals.
• In this process, a numerical estimate of the difference between the predicted and original responses is computed, also called the training error.
• However, this only tells us how well our model does on the data used to train it.
• It is still possible that the model is underfitting or overfitting the data.
• The problem with this evaluation technique is therefore that it gives no indication of how well the learner will generalize to an independent, unseen data set.
• Estimating this generalization ability is the purpose of cross-validation.
Cross Validation
• To evaluate the performance of any machine learning model, we need to test it on some unseen data.
• Based on the model's performance on unseen data, we can say whether it is under-fitting, over-fitting, or well generalised.
• Cross-validation (CV) is one of the techniques used to test the effectiveness of a machine learning model; it is also a re-sampling procedure used to evaluate a model when we have limited data.
• To perform CV, we keep aside a sample/portion of the data that is not used to train the model, and later use this sample for testing/validation.
Cross Validation
• In machine learning, we cannot simply fit the model on the training data and claim that it will work accurately on real data.
• We must ensure that the model has learned the correct patterns from the data and is not picking up too much noise.
• For this purpose, we use the cross-validation technique.
• Cross-validation is a technique in which we train our model using a subset of the data-set and then evaluate it using the complementary subset of the data-set.
Steps
• Reserve some portion of the sample data-set.
• Train the model on the rest of the data-set.
• Test the model using the reserved portion of the data-set.
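The three steps above can be sketched in plain Python. This is a minimal illustration, not a production recipe: the data-set, the 20% reserve fraction, and the stand-in "model" (a least-squares slope fit through the origin) are all assumptions chosen to keep the example self-contained.

```python
import random

# Toy data-set: pairs of (feature, target); purely illustrative values.
random.seed(0)
data = [(x, 2 * x + random.uniform(-1, 1)) for x in range(10)]

# Step 1: reserve a portion (here 20%) of the data-set.
random.shuffle(data)
n_test = len(data) // 5
test_set, train_set = data[:n_test], data[n_test:]

# Step 2: train a model on the rest. As a stand-in "model" we fit
# the slope w of y = w * x by least squares on the training set.
num = sum(x * y for x, y in train_set)
den = sum(x * x for x, _ in train_set)
w = num / den

# Step 3: test the model on the reserved portion.
test_error = sum(abs(y - w * x) for x, y in test_set) / len(test_set)
print(f"held-out mean absolute error: {test_error:.3f}")
```

The model never sees the reserved points during fitting, so the reported error reflects performance on unseen data.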
Methods of Cross validation

• Train test split

• Hold out Method

• K-Fold Cross Validation

• Stratified K-fold Cross Validation

• Leave One Out Cross Validation


Train Test Split
• In this approach we randomly split the complete data into training and test sets, ideally in a 70:30 or 80:20 ratio.
• We then train the model on the training set and use the test set for validation.
• With this approach there is a possibility of high bias if we have limited data, because the model misses the information in the data that was held out of training.
• If our data is large and the test and train samples have the same distribution, then this approach is acceptable.
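An 80:20 split can be sketched with scikit-learn's `train_test_split`, assuming scikit-learn and NumPy are installed; the toy arrays and the `random_state` value are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, and binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# 80:20 split; random_state fixes the shuffle so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Fixing `random_state` matters here: as the slides note, different random splits can give noticeably different results on small data-sets.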
Hold Out Cross Validation
• A basic remedy for the training-error problem is to remove a part of the training data and use it to get predictions from a model trained on the rest of the data.
• The error estimate then tells us how our model is doing on unseen data, i.e. the validation set.
• This is a simple kind of cross-validation technique, also known as the holdout method.
• Although this method involves little computational overhead and is better than traditional validation, it still suffers from high variance.
• This is because it is not certain which data points will end up in the validation set, and the result might be entirely different for different sets.
K-fold Cross Validation
• The procedure has a single parameter called k that refers to the number of groups a given data sample is to be split into; hence the name k-fold cross-validation.
• When a specific value for k is chosen, it may be used in place of k in references to the method, e.g. k=10 becomes 10-fold cross-validation.
• If k=5, the dataset is divided into 5 equal parts and the process below runs 5 times, each time with a different holdout set:
• 1. Take one group as the holdout or test data set
• 2. Take the remaining groups as the training data set
• 3. Fit a model on the training set and evaluate it on the test set
• 4. Retain the evaluation score and discard the model
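The four-step loop above can be sketched with scikit-learn's `KFold`, assuming scikit-learn and NumPy are available; the toy regression data and the choice of `LinearRegression` as the model are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

# Toy regression data: y = 3x with a little noise.
rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + rng.normal(0, 0.1, size=20)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    # Steps 1-2: one fold is the holdout set, the rest form the training set.
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Step 3: fit a model on the training folds, evaluate on the holdout fold.
    model = LinearRegression().fit(X_train, y_train)
    # Step 4: retain the evaluation score (R^2 here) and discard the model.
    scores.append(model.score(X_test, y_test))

print(f"5-fold scores: {[round(s, 3) for s in scores]}")
```

Every sample appears in exactly one holdout fold, so averaging the retained scores uses all of the data for evaluation while never scoring a model on points it was trained on.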
K-fold Cross Validation
• The value for k is chosen such that each train/test group of data samples is large enough to be statistically representative of the broader dataset.
• A value of k=10 is very common in applied machine learning, and is recommended if you are struggling to choose a value for your dataset.
• If the chosen value of k does not evenly divide the data sample, then one group will contain the remainder of the examples.
• It is preferable to split the data sample into k groups with the same number of samples, so that the model skill scores across folds are comparable.
Leave p Out Cross Validation (LpOCV)
• This approach leaves p data points out of the training data: if there are n data points in the original sample, then n-p samples are used to train the model and p points are used as the validation set.
• This is repeated for every way in which the original sample can be separated like this, and the error is then averaged over all trials to give the overall effectiveness.
• The number of possible combinations is the number of ways of choosing p points out of n, i.e. C(n, p), which grows very quickly with n.
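The C(n, p) count can be checked with scikit-learn's `LeavePOut`, assuming scikit-learn and NumPy are installed; n = 6 and p = 2 are arbitrary small values chosen so the exhaustive enumeration stays tiny.

```python
from math import comb

import numpy as np
from sklearn.model_selection import LeavePOut

X = np.arange(12).reshape(6, 2)  # n = 6 samples
lpo = LeavePOut(p=2)

# Each split holds out a distinct pair of points: C(6, 2) = 15 splits in total.
n_splits = lpo.get_n_splits(X)
print(n_splits, comb(6, 2))  # 15 15

for train_idx, test_idx in lpo.split(X):
    # Every trial trains on n - p = 4 points and validates on p = 2 points.
    assert len(train_idx) == 4 and len(test_idx) == 2
```

Even at n = 100 and p = 2 this is already 4,950 model fits, which illustrates why LpOCV becomes infeasible so quickly.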
Leave p Out Cross Validation
• This method is exhaustive in the sense that it trains and validates the model for all possible combinations, and for moderately large p it becomes computationally infeasible.
• A particular case of this method is p = 1, known as Leave One Out cross-validation (LOOCV).
• LOOCV is generally preferred over the general case because it avoids the combinatorial explosion: the number of possible combinations is simply equal to the number of data points in the original sample, i.e. n.
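The p = 1 special case can be sketched with scikit-learn's `LeaveOneOut`, again assuming scikit-learn and NumPy are available; the 5-sample array is an arbitrary toy choice.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

X = np.arange(10).reshape(5, 2)  # n = 5 samples
loo = LeaveOneOut()

# LOOCV yields exactly n splits, each holding out a single point.
splits = list(loo.split(X))
print(len(splits))  # 5

for train_idx, test_idx in splits:
    # Each trial trains on n - 1 = 4 points and validates on 1 point.
    assert len(train_idx) == 4 and len(test_idx) == 1
```

With n splits instead of C(n, p), LOOCV scales linearly in the data-set size, which is why it is the one exhaustive variant used in practice.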
THANK YOU
