ADS

List steps in data preparation. Give short description of each step.

Differentiate between data science and data analytics.


Explain types of data and properties of data
Explain Bowley’s Coefficient of Skewness method with formulas
Give significance of Z-test, T-test, F-Test.
ANOVA - Theory and numerical
Explain Type-I & Type-II errors with example
Short note: Cross Validation

Cross Validation in Machine Learning


In machine learning, we cannot simply fit a model on the training data and assume that it will work accurately on real data. We must ensure that the model has picked up the correct patterns from the data and is not absorbing too much noise. For this purpose, we use the cross-validation technique. In this article, we’ll delve into the process of cross-validation in machine learning.

What is Cross-Validation?

Cross validation is a technique used in machine learning to evaluate the performance of a model on unseen data. It involves dividing the available data into multiple folds or subsets, using one of these folds as a validation set, and training the model on the remaining folds. This process is repeated multiple times, each time using a different fold as the validation set. Finally, the results from each validation step are averaged to produce a more robust estimate of the model’s performance. Cross validation is an important step in the machine learning process and helps to ensure that the model selected for deployment is robust and generalizes well to new data.
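
A minimal sketch of this loop, assuming scikit-learn and its built-in iris dataset (both are illustrative choices, not part of the original notes): cross_val_score splits the data into folds, trains on the remaining folds, scores on the held-out fold, and returns one score per fold.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Train on 4 folds, validate on the 5th, repeat 5 times: one score per fold.
scores = cross_val_score(model, X, y, cv=5)
print(scores)          # per-fold accuracy
print(scores.mean())   # averaged estimate of generalization performance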

What is cross-validation used for?

The main purpose of cross validation is to prevent overfitting, which occurs when a
model is trained too well on the training data and performs poorly on new, unseen
data. By evaluating the model on multiple validation sets, cross validation provides a
more realistic estimate of the model’s generalization performance, i.e., its ability to
perform well on new, unseen data.

Types of Cross-Validation

There are several types of cross validation techniques, including k-fold cross validation, leave-one-out cross validation (LOOCV), holdout validation, and stratified cross-validation. The choice of technique depends on the size and nature of the data, as well as the specific requirements of the modeling problem.

1. Holdout Validation

In Holdout Validation, we perform training on 50% of the given dataset and the remaining 50% is used for testing. It’s a simple and quick way to evaluate a model. The major drawback of this method is that because we train on only 50% of the dataset, the remaining 50% may contain important information that the model never sees during training, i.e. higher bias.
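
A minimal sketch of holdout validation, assuming scikit-learn and its iris dataset (illustrative choices, not from the notes), with the 50/50 split described above:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 50% of the data for testing; train on the other 50%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out half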

2. LOOCV (Leave One Out Cross Validation)

In this method, we perform training on the whole dataset but leave out only one data point, and we iterate this for each data point in the dataset. In LOOCV, the model is trained on n−1 samples and tested on the single omitted sample, repeating this process for each data point. It has some advantages as well as disadvantages.

An advantage of using this method is that we make use of all data points, and hence it has low bias.

The major drawback of this method is that it leads to higher variation in the testing of the model, as we are testing against a single data point; if that data point is an outlier, it can lead to higher variation. Another drawback is that it takes a lot of execution time, since it iterates as many times as there are data points.
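
A minimal LOOCV sketch (again assuming scikit-learn and the iris dataset); note that the model is refit n times, which is why LOOCV is slow on large datasets:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# One train/test iteration per data point: train on n-1 samples, test on 1.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(len(scores))     # n scores, one per left-out sample
print(scores.mean())   # averaged LOOCV accuracy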

3. Stratified Cross-Validation

It is a technique used in machine learning to ensure that each fold of the cross-validation process maintains the same class distribution as the entire dataset. This is particularly important when dealing with imbalanced datasets, where certain classes may be underrepresented. In this method,

1.​ The dataset is divided into k folds while maintaining the proportion of
classes in each fold.
2.​ During each iteration, one-fold is used for testing, and the remaining folds
are used for training.
3.​ The process is repeated k times, with each fold serving as the test set exactly
once.

Stratified Cross-Validation is essential when dealing with classification problems where maintaining the balance of class distribution is crucial for the model to generalize well to unseen data.
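
A minimal sketch with scikit-learn’s StratifiedKFold on a small hypothetical imbalanced label vector (the data here is made up purely for illustration):

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)                 # 10 samples, 2 features
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])     # imbalanced: 8 vs 2

skf = StratifiedKFold(n_splits=2)
for train_idx, test_idx in skf.split(X, y):
    # Each test fold preserves the 8:2 class ratio (4 of class 0, 1 of class 1).
    print("test:", test_idx, "test labels:", y[test_idx])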

4. K-Fold Cross Validation

In K-Fold Cross Validation, we split the dataset into k subsets (known as folds), then we perform training on k−1 of the subsets and leave one subset for evaluation of the trained model. In this method, we iterate k times, with a different subset reserved for testing each time.

Note: A value of k = 10 is commonly suggested, since a lower value of k moves toward the simple holdout (validation-set) approach, while a higher value of k moves toward the LOOCV method.

Example of K Fold Cross Validation

The listing below shows an example of the training subsets and evaluation subsets generated in k-fold cross-validation. Here, we have 25 instances in total. In the first iteration we use the first 20 percent of the data for evaluation and the remaining 80 percent for training ([1-5] testing and [6-25] training), while in the second iteration we use the second subset of 20 percent for evaluation and the remaining data for training ([6-10] testing and [1-5, 11-25] training), and so on.

Total instances: 25

Value of k : 5

Iteration   Training set observations                                         Testing set observations
1           [ 5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]    [ 0  1  2  3  4]
2           [ 0  1  2  3  4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]    [ 5  6  7  8  9]
3           [ 0  1  2  3  4  5  6  7  8  9 15 16 17 18 19 20 21 22 23 24]    [10 11 12 13 14]
4           [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 20 21 22 23 24]    [15 16 17 18 19]
5           [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]    [20 21 22 23 24]
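
A minimal sketch that reproduces this split with scikit-learn’s KFold (scikit-learn is an assumption of the example, not part of the notes):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(25)        # 25 instances, indexed 0..24
kf = KFold(n_splits=5)   # k = 5

for i, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    print(f"Iteration {i}: train={train_idx.tolist()} test={test_idx.tolist()}")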

Comparison between cross-validation and hold out method

Advantages of train/test split:

1. This runs K times faster than K-fold cross-validation, because K-fold cross-validation repeats the train/test split K times.
2.​ Simpler to examine the detailed results of the testing process.

Advantages of cross-validation:

1. More accurate estimate of out-of-sample accuracy.
2. More “efficient” use of data, as every observation is used for both training and testing.

Advantages and Disadvantages of Cross Validation

Advantages:
1.​ Overcoming Overfitting: Cross validation helps to prevent overfitting by
providing a more robust estimate of the model’s performance on unseen
data.
2.​ Model Selection: Cross validation can be used to compare different models
and select the one that performs the best on average.
3. Hyperparameter tuning: Cross validation can be used to optimize the hyperparameters of a model, such as the regularization parameter, by selecting the values that result in the best performance on the validation set (see the sketch after this list).
4.​ Data Efficient: Cross validation allows the use of all the available data for
both training and validation, making it a more data-efficient method
compared to traditional validation techniques.
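
A minimal sketch of hyperparameter tuning with cross-validation, assuming scikit-learn’s GridSearchCV and treating logistic regression’s regularization strength C as the hyperparameter (illustrative choices, not from the notes):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.01, 0.1, 1, 10]}   # candidate regularization strengths

# Each candidate is scored with 5-fold cross-validation; the best mean score wins.
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)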

Disadvantages:

1. Computationally Expensive: Cross validation can be computationally expensive, especially when the number of folds is large or when the model is complex and requires a long time to train.
2.​ Time-Consuming: Cross validation can be time-consuming, especially when
there are many hyperparameters to tune or when multiple models need to be
compared.
3.​ Bias-Variance Tradeoff: The choice of the number of folds in cross
validation can impact the bias-variance tradeoff, i.e., too few folds may
result in high bias, while too many folds may result in high variance.
