
University Institute of Engineering
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Bachelor of Engineering (Computer Science & Engineering)
Artificial Intelligence and Machine Learning (21CSH-316)
Prepared by:
Sitaram Patel (E13285)

DISCOVER . LEARN . EMPOWER


Course Outcome

Course Objectives
Model validation
• What is Model Validation?
• Machine learning revolves around data: its quality, its quantity, and how we work with it. Typically we collect the data, clean and preprocess it, and then apply a suitable algorithm to obtain the best-fit model. Obtaining a model, however, does not finish the task; model validation is as important as training.
• Training a model and deploying it directly would be unwise. In sensitive areas such as healthcare, where predictions drive real-life decisions, the risk is high, and an error in the model can be very costly.
• Advantages of Model Validation
Model validation provides several advantages.
• Quality of the Model
The first and foremost advantage of model validation is insight into the quality of the model: by validating it, we can quickly gauge its performance.
• Flexibility of the Model
Secondly, validating the model makes it easier to judge its flexibility, and the validation results can guide changes that make the model more flexible.
Validation
• The process of deciding whether the numerical results quantifying hypothesized relationships between variables are acceptable as descriptions of the data is known as validation.
• Generally, an error estimate for the model is made after training, better known as evaluation of residuals.
• In this process, a numerical estimate of the difference between the predicted and original responses is computed, also called the training error.
• However, this only tells us how well our model does on the data used to train it.
• It is still possible that the model is underfitting or overfitting the data.
• The problem with this evaluation technique is therefore that it gives no indication of how well the learner will generalize to an independent, unseen data set.
• Estimating this generalization ability is the purpose of cross-validation.
Cross Validation
• To evaluate the performance of any machine learning model, we need to test it on some unseen data.
• Based on the model's performance on unseen data, we can say whether it is under-fitting, over-fitting, or well generalised.
• Cross-validation (CV) is one of the techniques used to test the effectiveness of a machine learning model; it is also a re-sampling procedure used to evaluate a model when we have limited data.
• To perform CV, we keep aside a sample/portion of the data that is not used to train the model, and later use this sample for testing/validation.
Cross Validation
• In machine learning, we cannot simply fit the model on the training data and claim that it will work accurately on real data.
• We must ensure that the model has learned the correct patterns from the data and is not picking up too much noise.
• For this purpose, we use the cross-validation technique.
• Cross-validation is a technique in which we train our model using a subset of the data-set and then evaluate it using the complementary subset of the data-set.
Steps
• Reserve some portion of the sample data-set.
• Train the model on the rest of the data-set.
• Test the model using the reserved portion of the data-set.
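The three steps above can be sketched in plain Python. This is a minimal illustration, not a production recipe: the data-set, the 20% reserve fraction, and the stand-in "model" (a least-squares slope fit through the origin) are all assumptions chosen to keep the example self-contained.

```python
import random

# Toy data-set: pairs of (feature, target); purely illustrative values.
random.seed(0)
data = [(x, 2 * x + random.uniform(-1, 1)) for x in range(10)]

# Step 1: reserve a portion (here 20%) of the data-set.
random.shuffle(data)
n_test = len(data) // 5
test_set, train_set = data[:n_test], data[n_test:]

# Step 2: train a model on the rest. As a stand-in "model" we fit
# the slope w of y = w * x by least squares on the training set.
num = sum(x * y for x, y in train_set)
den = sum(x * x for x, _ in train_set)
w = num / den

# Step 3: test the model on the reserved portion.
test_error = sum(abs(y - w * x) for x, y in test_set) / len(test_set)
print(f"held-out mean absolute error: {test_error:.3f}")
```

The model never sees the reserved points during fitting, so the reported error reflects performance on unseen data.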
Methods of Cross validation

• Train test split

• Hold out Method

• K-Fold Cross Validation

• Stratified K-fold Cross Validation

• Leave One Out Cross Validation


Train Test Split
• In this approach we randomly split the complete data into training and test sets, ideally in a 70:30 or 80:20 ratio.
• We then train the model on the training set and use the test set for validation.
• With this approach there is a possibility of high bias if we have limited data, because the model misses the information in the data that was held out of training.
• If our data is large and the test and train samples have the same distribution, then this approach is acceptable.
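An 80:20 split can be sketched with scikit-learn's `train_test_split`, assuming scikit-learn and NumPy are installed; the toy arrays and the `random_state` value are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, and binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# 80:20 split; random_state fixes the shuffle so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Fixing `random_state` matters here: as the slides note, different random splits can give noticeably different results on small data-sets.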
Hold Out Cross Validation
• A basic remedy for the training-error problem is to remove a part of the training data and use it to get predictions from a model trained on the rest of the data.
• The error estimate then tells us how our model is doing on unseen data, i.e. the validation set.
• This is a simple kind of cross-validation technique, also known as the holdout method.
• Although this method involves little computational overhead and is better than traditional validation, it still suffers from high variance.
• This is because it is not certain which data points will end up in the validation set, and the result might be entirely different for different sets.
K-fold Cross Validation
• The procedure has a single parameter called k that refers to the number of groups a given data sample is to be split into; hence the name k-fold cross-validation.
• When a specific value for k is chosen, it may be used in place of k in references to the method, e.g. k=10 becomes 10-fold cross-validation.
• If k=5, the dataset is divided into 5 equal parts and the process below runs 5 times, each time with a different holdout set:
• 1. Take one group as the holdout or test data set
• 2. Take the remaining groups as the training data set
• 3. Fit a model on the training set and evaluate it on the test set
• 4. Retain the evaluation score and discard the model
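The four-step loop above can be sketched with scikit-learn's `KFold`, assuming scikit-learn and NumPy are available; the toy regression data and the choice of `LinearRegression` as the model are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

# Toy regression data: y = 3x with a little noise.
rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + rng.normal(0, 0.1, size=20)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    # Steps 1-2: one fold is the holdout set, the rest form the training set.
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Step 3: fit a model on the training folds, evaluate on the holdout fold.
    model = LinearRegression().fit(X_train, y_train)
    # Step 4: retain the evaluation score (R^2 here) and discard the model.
    scores.append(model.score(X_test, y_test))

print(f"5-fold scores: {[round(s, 3) for s in scores]}")
```

Every sample appears in exactly one holdout fold, so averaging the retained scores uses all of the data for evaluation while never scoring a model on points it was trained on.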
K-fold Cross Validation
• The value for k is chosen such that each train/test group of data samples is large enough to be statistically representative of the broader dataset.
• A value of k=10 is very common in applied machine learning, and is recommended if you are struggling to choose a value for your dataset.
• If the chosen value of k does not evenly divide the data sample, then one group will contain the remainder of the examples.
• It is preferable to split the data sample into k groups with the same number of samples, so that the model skill scores across folds are comparable.
Leave p Out Cross Validation (LpOCV)
• This approach leaves p data points out of the training data: if there are n data points in the original sample, then n-p samples are used to train the model and p points are used as the validation set.
• This is repeated for every way in which the original sample can be separated like this, and the error is then averaged over all trials to give the overall effectiveness.
• The number of possible combinations is the number of ways of choosing p points out of n, i.e. C(n, p), which grows very quickly with n.
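The C(n, p) count can be checked with scikit-learn's `LeavePOut`, assuming scikit-learn and NumPy are installed; n = 6 and p = 2 are arbitrary small values chosen so the exhaustive enumeration stays tiny.

```python
from math import comb

import numpy as np
from sklearn.model_selection import LeavePOut

X = np.arange(12).reshape(6, 2)  # n = 6 samples
lpo = LeavePOut(p=2)

# Each split holds out a distinct pair of points: C(6, 2) = 15 splits in total.
n_splits = lpo.get_n_splits(X)
print(n_splits, comb(6, 2))  # 15 15

for train_idx, test_idx in lpo.split(X):
    # Every trial trains on n - p = 4 points and validates on p = 2 points.
    assert len(train_idx) == 4 and len(test_idx) == 2
```

Even at n = 100 and p = 2 this is already 4,950 model fits, which illustrates why LpOCV becomes infeasible so quickly.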
Leave p Out Cross Validation
• This method is exhaustive in the sense that it trains and validates the model for all possible combinations, and for moderately large p it becomes computationally infeasible.
• A particular case of this method is p = 1, known as Leave One Out cross-validation (LOOCV).
• LOOCV is generally preferred over the general case because it avoids the combinatorial explosion: the number of possible combinations is simply equal to the number of data points in the original sample, i.e. n.
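The p = 1 special case can be sketched with scikit-learn's `LeaveOneOut`, again assuming scikit-learn and NumPy are available; the 5-sample array is an arbitrary toy choice.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

X = np.arange(10).reshape(5, 2)  # n = 5 samples
loo = LeaveOneOut()

# LOOCV yields exactly n splits, each holding out a single point.
splits = list(loo.split(X))
print(len(splits))  # 5

for train_idx, test_idx in splits:
    # Each trial trains on n - 1 = 4 points and validates on 1 point.
    assert len(train_idx) == 4 and len(test_idx) == 1
```

With n splits instead of C(n, p), LOOCV scales linearly in the data-set size, which is why it is the one exhaustive variant used in practice.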
THANK YOU
