Cross-Validation

Cross-validation is a technique for evaluating machine learning models by splitting the dataset into training and testing subsets, ensuring robust performance. In a K-Fold cross-validation example using the Iris dataset and a RandomForestClassifier, accuracy scores for five folds were calculated, resulting in a mean score of 0.9134 and a standard deviation of 0.034. This method helps assess the model's reliability across different data partitions.

Uploaded by

ayeshasadiqa148
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views21 pages

Cross Validation

Cross-validation is a technique for evaluating machine learning models by splitting the dataset into training and testing subsets, ensuring robust performance. In a K-Fold cross-validation example using the Iris dataset and a RandomForestClassifier, accuracy scores for five folds were calculated, resulting in a mean score of 0.9134 and a standard deviation of 0.034. This method helps assess the model's reliability across different data partitions.

Uploaded by

ayeshasadiqa148
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 21

Cross-Validation

Definition
• Cross-validation is a technique used to evaluate the performance of a machine learning model by partitioning the original dataset into training and testing subsets. This helps ensure that the performance estimate is robust and not dependent on one particular train/test split.
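As a point of reference, scikit-learn exposes this whole procedure as a one-line helper; a minimal sketch (note that for classifiers `cross_val_score` defaults to stratified folds, which differs slightly from the plain KFold walked through in this dry run, and the fixed `random_state=42` here is an illustrative choice):

```python
# Hedged sketch: high-level cross-validation with scikit-learn's helper.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=5)
print(scores.mean(), scores.std())  # one accuracy per fold, then mean and spread
```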
Dry Run of Cross-Validation with K-Fold
• Context:

• Dataset: Iris dataset with 150 samples and 4 features (sepal length, sepal width, petal length, petal width).
• Classes: 3 classes (0, 1, 2) representing different species of Iris.
• Model: RandomForestClassifier (default settings).
• Cross-Validation: KFold with 5 splits.
• Load Dataset:

• X: 150 samples with 4 features each.


• y: 150 target labels, each being 0, 1, or 2.

• KFold Definition:

• Number of splits (n_splits=5): 150 samples / 5 splits = 30 samples per fold.


• Shuffle: Enabled for randomness.
• Random State: Set to 42 for reproducibility.
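The setup described above can be sketched directly in code:

```python
# Load the Iris dataset and define the 5-fold splitter from the dry run.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)   # X: (150, 4) features, y: (150,) labels in {0, 1, 2}
kf = KFold(n_splits=5, shuffle=True, random_state=42)

print(X.shape, y.shape)   # (150, 4) (150,)
```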
Fold 1
• Data Splitting:

• Training data: 120 samples (4 of the 5 folds are used for training).
• Testing data: 30 samples (1 fold reserved for testing).

• For this dry run, assume the testing indices in the first fold are [0, 1, 2, ..., 29]. (With shuffle=True the actual indices would be a random subset; contiguous blocks are assumed here only to keep the walkthrough readable.)

• Training indices: [30, 31, 32, ..., 149]
• Testing indices: [0, 1, 2, ..., 29]
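The splitting step looks like this in code (with shuffle=True the test indices are a random subset, not the contiguous blocks assumed in the dry run, but the fold sizes match):

```python
# Iterate over the 5 folds and show the train/test split sizes.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    print(f"Fold {fold}: {len(train_idx)} train / {len(test_idx)} test samples")
    # Each fold: 120 train / 30 test samples
```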
Continue
• Training:
• The RandomForestClassifier is trained on the 120 training samples. Training builds an ensemble of decision trees.
• Prediction:
• The trained model predicts the classes of the 30 testing samples.
• Assume the predictions are [0, 0, 1, 1, 2, ..., 1] for these 30 samples.
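The train-and-predict step for the first fold can be sketched as follows (the `random_state=42` on the classifier is an added assumption for reproducibility; the slides use default settings):

```python
# Train on the first fold's 120 training samples, predict the 30 held-out ones.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
train_idx, test_idx = next(iter(kf.split(X)))    # first fold only

model = RandomForestClassifier(random_state=42)  # builds an ensemble of decision trees
model.fit(X[train_idx], y[train_idx])
y_pred = model.predict(X[test_idx])              # class labels for the 30 test samples
print(len(y_pred))   # 30
```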
Continue
• Evaluation:

• Calculate the accuracy by comparing the predicted labels with the actual labels.
• Assume the actual labels are [0, 0, 1, 1, 2, ..., 2].
• If 27 out of 30 predictions are correct, the accuracy score for this fold is 27/30 = 0.90.

• Store Score:

• Accuracy score of 0.90 is stored in the scores array.
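The evaluation step can be checked with a small constructed example (the specific labels below are hypothetical, chosen only to reproduce the assumed 27-of-30 outcome):

```python
# 30 labels with exactly 3 errors introduced -> accuracy 27/30 = 0.90.
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0]*10 + [1]*10 + [2]*10)
y_pred = y_true.copy()
y_pred[[3, 14, 25]] = (y_pred[[3, 14, 25]] + 1) % 3   # flip 3 labels to wrong classes

acc = accuracy_score(y_true, y_pred)
print(acc)   # 0.9
```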


Fold 2
• Data Splitting:

• Testing data: Another set of 30 samples, say indices [30, 31, 32, ..., 59].
• Training data: Remaining 120 samples.

• Training:

• The model is trained again from scratch on these new 120 training samples.

• Prediction:

• The model makes predictions on the new testing set of 30 samples.


• Assume predictions are [1, 1, 0, ..., 2].
Continue
• Evaluation:

• Assume actual labels are [1, 1, 0, ..., 1].


• If 28 out of 30 predictions are correct, the accuracy score is 28/30 ≈ 0.933.

• Store Score:

• Accuracy score of 0.933 is stored.


Fold 3
• Data Splitting:

• Testing data: Next set of 30 samples, say indices [60, 61, 62, ..., 89].
• Training data: Remaining 120 samples.

• Training:

• The model is retrained on the new 120 training samples.

• Prediction:

• Predictions made on the 30 testing samples.


• Assume predictions are [2, 2, 1, ..., 0].
• Evaluation:

• Actual labels are [2, 2, 1, ..., 1].


• If 29 out of 30 predictions are correct, accuracy is 29/30 ≈ 0.967.

• Store Score:

• Accuracy score of 0.967 is stored.


Fold 4
• Data Splitting:

• Testing data: Next set of 30 samples, say indices [90, 91, 92, ..., 119].
• Training data: Remaining 120 samples.

• Training:

• The model is retrained on the new training samples.

• Prediction:

• Predictions made on the testing samples.


• Assume predictions are [0, 0, 1, ..., 2].
• Evaluation:

• Actual labels are [0, 0, 1, ..., 2].


• If 26 out of 30 predictions are correct, accuracy is 26/30 ≈ 0.867.

• Store Score:

• Accuracy score of 0.867 is stored.


Fold 5
• Data Splitting:

• Testing data: Last set of 30 samples, say indices [120, 121, 122, ..., 149].
• Training data: Remaining 120 samples.

• Training:

• The model is trained one last time on the new training samples.

• Prediction:

• Predictions made on the testing samples.


• Assume predictions are [2, 2, 2, ..., 0].
• Evaluation:

• Actual labels are [2, 2, 2, ..., 0].


• If 27 out of 30 predictions are correct, accuracy is 27/30 = 0.90.

• Store Score:

• Accuracy score of 0.90 is stored.


Final Computation
• scores = [0.90, 0.933, 0.967, 0.867, 0.90]
• Mean Score: Calculated as the average of the scores.

• Mean = (0.90 + 0.933 + 0.967 + 0.867 + 0.90) / 5 = 0.9134


Steps to Calculate Standard Deviation
• Variance is the average of the squared differences from the mean (population formula, dividing by n = 5).

• Compute the squared differences from the mean:

  (0.900 − 0.9134)² = (−0.0134)² = 0.00018
  (0.933 − 0.9134)² = (0.0196)² = 0.000384
  (0.967 − 0.9134)² = (0.0536)² = 0.00287
  (0.867 − 0.9134)² = (−0.0464)² = 0.00215
  (0.900 − 0.9134)² = (−0.0134)² = 0.00018

• Sum of squared differences:
  0.00018 + 0.000384 + 0.00287 + 0.00215 + 0.00018 = 0.005764

• Variance:
  0.005764 / 5 = 0.001153

• Standard deviation is the square root of the variance:
  √0.001153 ≈ 0.034
Summary

• Mean Score: 0.9134


• Standard Deviation: 0.034
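The entire dry run can be reproduced end to end; a sketch follows (actual per-fold scores will differ from the assumed ones above, since they come from real model predictions, and the classifier's `random_state=42` is an added assumption for reproducibility):

```python
# Full K-Fold cross-validation loop: split, train, predict, score, aggregate.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []

for train_idx, test_idx in kf.split(X):
    model = RandomForestClassifier(random_state=42)   # retrained from scratch each fold
    model.fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])
    scores.append(accuracy_score(y[test_idx], y_pred))  # store this fold's accuracy

scores = np.array(scores)
print("Scores:", scores)
print("Mean:", scores.mean(), "Std:", scores.std())
```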
