Unit No. 4
4 Statistical Methods
Standard deviation is a statistical measure of the dispersion or variability of a set of values. It
quantifies how much the values in a dataset deviate from the mean (average). A low standard
deviation indicates that the values tend to be close to the mean, while a high standard deviation
indicates that the values are spread out over a wider range.
Mathematically, the standard deviation (σ) is calculated using the following formula:
σ = √( (1/N) Σ (xᵢ − μ)² )
where N is the number of values, xᵢ is each individual value, and μ is the mean of the dataset.
1. Interpretation:
o A small standard deviation indicates that the values are close to the mean, while
a large standard deviation indicates that the values are spread out.
o It provides a measure of the average distance between each data point and the
mean.
o It is expressed in the same units as the original data.
2. Relationship with Variance:
o Standard deviation is the square root of variance. Variance is the average of the
squared differences from the mean, while standard deviation is the square root
of this average.
o Standard deviation is preferred over variance in some cases because it is in the
same units as the original data, making it easier to interpret.
3. Applications:
o Standard deviation is widely used in various fields, including finance,
engineering, natural sciences, and social sciences.
o It is used to quantify risk, assess the variability of data, evaluate the precision
of measurements, and determine the spread of distributions.
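The definition above can be turned directly into a short computation. The sketch below uses only the Python standard library; the function name `std_dev` is illustrative, not from the text.

```python
import math

def std_dev(values):
    """Population standard deviation: square root of the average
    squared deviation from the mean."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean = 5
print(std_dev(data))              # 2.0
```

Note that the variance of this dataset is 4.0, and the standard deviation (2.0) is its square root, matching the relationship described in point 2.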
1. Normalization:
o Normalization is the process of scaling numerical features to a standard range
or distribution to ensure that they have a similar scale.
o It helps improve the performance and stability of machine learning algorithms,
especially those sensitive to the scale of input features.
o Common normalization techniques include Min-Max scaling and Z-score
normalization.
2. Feature Scaling:
o Feature scaling is a specific type of normalization that involves scaling each
feature (or variable) individually to a similar range.
o It ensures that features with larger magnitudes do not dominate those with
smaller magnitudes during the learning process.
o Feature scaling is particularly important for algorithms that use distance-based
metrics or gradient descent optimization.
3. Min-Max Scaling:
o Min-Max scaling is a normalization technique that rescales features to a fixed
range, typically between 0 and 1.
o It subtracts the minimum value of the feature and then divides by the difference
between the maximum and minimum values.
o Min-Max scaling preserves the shape of the original distribution but is
sensitive to outliers, since a single extreme value stretches the range.
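Both normalization techniques mentioned above can be sketched in a few lines of standard-library Python. The function names here are illustrative:

```python
import math

def min_max_scale(values):
    """Min-Max scaling: (x - min) / (max - min), mapping values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def z_score(values):
    """Z-score normalization: subtract the mean, divide by the
    (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    return [(x - mean) / std for x in values]

data = [10, 20, 30, 40, 50]
print(min_max_scale(data))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(data))        # roughly [-1.41, -0.71, 0.0, 0.71, 1.41]
```

After Min-Max scaling every feature lies in the same [0, 1] range; after Z-score normalization every feature has mean 0 and standard deviation 1.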
4. Bias and Variance:
o Bias and variance are two types of errors in machine learning models that affect
their predictive performance.
o Bias refers to the error introduced by approximating a real-world problem with
a simplified model. High bias can lead to underfitting.
o Variance refers to the error introduced by the model's sensitivity to small
fluctuations in the training data. High variance can lead to overfitting.
o Balancing bias and variance is crucial for building models that generalize well
to unseen data.
5. Regularization:
o Regularization is a technique used to prevent overfitting by adding a penalty
term to the model's loss function.
o It discourages complex models that fit the training data too closely by imposing
constraints on model parameters.
o Common regularization techniques include L1 regularization (Lasso), L2
regularization (Ridge), and ElasticNet regularization.
o Regularization helps improve the generalization ability of models by reducing
variance at the cost of introducing some bias.
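The penalty terms described above can be made concrete by adding them to a plain mean-squared-error loss. This is a minimal sketch (function names are illustrative), with the regularization strength written as `alpha`:

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def ridge_loss(y_true, y_pred, weights, alpha):
    """L2 (Ridge) regularized loss: MSE + alpha * sum of squared weights."""
    return mse(y_true, y_pred) + alpha * sum(w ** 2 for w in weights)

def lasso_loss(y_true, y_pred, weights, alpha):
    """L1 (Lasso) regularized loss: MSE + alpha * sum of absolute weights."""
    return mse(y_true, y_pred) + alpha * sum(abs(w) for w in weights)

# Same perfect fit, but larger weights incur a larger penalty:
print(ridge_loss([1, 2], [1, 2], weights=[3.0], alpha=0.5))  # 4.5
print(lasso_loss([1, 2], [1, 2], weights=[3.0], alpha=0.5))  # 1.5
```

Minimizing these losses therefore trades data fit against coefficient size, which is exactly how regularization discourages overly complex models.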
Normalization plays a crucial role in regularization techniques like Ridge Regression and Lasso
Regression. Let's explore how normalization interacts with these methods:
1. Ridge Regression:
o Ridge Regression is a linear regression technique that adds a penalty term to the
ordinary least squares (OLS) loss function.
o The penalty term is proportional to the squared magnitudes of the coefficients,
multiplied by a regularization parameter (λ or alpha).
o The goal of Ridge Regression is to shrink the coefficients towards zero,
effectively reducing the model's complexity and variance.
o Normalization is important in Ridge Regression because it ensures that all
features are on a similar scale, preventing features with larger magnitudes from
dominating the regularization term.
o When using Ridge Regression, it's common to normalize the features using
techniques like Min-Max scaling or Z-score normalization before fitting the
model.
2. Lasso Regression:
o Lasso Regression (Least Absolute Shrinkage and Selection Operator) is another
linear regression technique that adds a penalty term to the OLS loss function.
o Unlike Ridge Regression, Lasso Regression uses the L1 norm of the coefficients
as the penalty term.
o Lasso Regression has a tendency to shrink some coefficients all the way to zero,
effectively performing feature selection.
o Normalization is also important in Lasso Regression to ensure that all features
are on a similar scale, as it influences the magnitude of the penalty applied to
each coefficient.
o Like Ridge Regression, it's common to normalize the features before applying
Lasso Regression to prevent any single feature from dominating the penalty
term.
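The shrinkage effect of the Ridge penalty can be seen in the simplest possible case: one already-normalized feature and no intercept, where the closed-form solution is w = Σxy / (Σx² + α). This toy sketch (the function name is illustrative) shows the coefficient moving toward zero as α grows:

```python
def ridge_fit_1d(xs, ys, alpha):
    """Closed-form Ridge solution for a single centered/scaled feature,
    no intercept: w = sum(x*y) / (sum(x^2) + alpha)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

xs = [-2, -1, 0, 1, 2]   # a feature that has already been normalized
ys = [-4, -2, 0, 2, 4]   # y = 2x exactly

print(ridge_fit_1d(xs, ys, alpha=0.0))   # 2.0  (alpha = 0 recovers OLS)
print(ridge_fit_1d(xs, ys, alpha=10.0))  # 1.0  (coefficient shrunk toward zero)
```

Because α is added to Σx², a feature measured on a larger scale would see proportionally less shrinkage, which is why the features are normalized first.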
Cross-validation (CV) techniques are used to evaluate the performance of machine learning
models and to tune model hyperparameters. Here are some common cross-validation
techniques:
1. K-fold Cross-Validation:
o In K-fold cross-validation, the original dataset is randomly partitioned into K
equal-sized subsets or "folds".
o The model is trained K times, each time using K-1 folds for training and the
remaining fold for validation.
o The performance metrics (e.g., accuracy, error) are then averaged over the K
folds to obtain an overall estimate of model performance.
o K-fold cross-validation helps to reduce the variability in the performance
estimate compared to a single train-test split.
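The fold-splitting step of K-fold cross-validation can be sketched as follows, using only the standard library (shuffle the indices first for a randomized split; the function name is illustrative):

```python
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = k_fold_indices(6, 3)
print(folds)  # [[0, 1], [2, 3], [4, 5]]

# Each fold takes one turn as the validation set:
for i, val in enumerate(folds):
    train = [idx for j, f in enumerate(folds) if j != i for idx in f]
    print("val:", val, "train:", train)
```

Training once per fold and averaging the resulting scores yields the overall performance estimate described above.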
2. Leave-One-Out Cross-Validation (LOOCV):
o LOOCV is a special case of K-fold cross-validation where K equals the number
of samples in the dataset.
o For each iteration, one data point is left out as the validation set, and the model
is trained on the remaining data.
o LOOCV is computationally expensive, especially for large datasets; it
yields a nearly unbiased estimate of model performance, though that
estimate can have high variance.
3. Stratified K-fold Cross-Validation:
o Stratified K-fold cross-validation ensures that each fold contains approximately
the same proportion of target classes as the original dataset.
o It is particularly useful for imbalanced datasets where one class is much more
prevalent than the others.
o Stratified K-fold helps to produce more reliable performance estimates,
especially when the target classes are unevenly distributed.
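One simple way to build stratified folds is to group sample indices by class and then deal each class out round-robin across the folds, so every fold receives roughly the same class proportions. A minimal sketch (the function name is illustrative):

```python
from collections import defaultdict

def stratified_k_fold(labels, k):
    """Assign each sample index to a fold, round-robin within each class,
    so class proportions are roughly preserved in every fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

labels = ["A", "A", "A", "A", "B", "B"]   # imbalanced: 4 A's, 2 B's
print(stratified_k_fold(labels, 2))        # [[0, 2, 4], [1, 3, 5]]
```

Each of the two folds ends up with two "A" samples and one "B" sample, matching the 2:1 ratio of the full dataset.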
4. Grid Search Cross-Validation:
o Grid Search Cross-Validation is a technique used to tune hyperparameters by
exhaustively searching through a predefined grid of parameter values.
o For each combination of hyperparameters, the model is trained and evaluated
using K-fold cross-validation.
o The optimal hyperparameters are selected based on the performance metric
(e.g., accuracy) obtained during cross-validation.
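The exhaustive search over a parameter grid can be sketched with `itertools.product`. Here `cv_score` is a hypothetical callback that would run K-fold cross-validation for one parameter combination and return a score; a toy stand-in is used below:

```python
from itertools import product

def grid_search(param_grid, cv_score):
    """Evaluate every combination in the grid with cv_score and
    return the best (params, score) pair."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_score(params)  # would run K-fold CV in practice
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

param_grid = {"alpha": [0.1, 1.0, 10.0], "max_iter": [100, 500]}
# Toy scoring function standing in for cross-validation:
best, score = grid_search(param_grid, lambda p: -p["alpha"])
print(best)  # {'alpha': 0.1, 'max_iter': 100}
```

Note that the cost grows multiplicatively with the grid: here 3 × 2 = 6 combinations, each of which would require K model fits.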
5. Cross-Validation Error:
o Cross-validation error refers to the error estimate obtained during cross-
validation, typically averaged over multiple folds.
o It provides an estimate of the model's generalization error on unseen
data.
o Cross-validation error is commonly used to compare the performance of
different models or to select the best hyperparameters through techniques like
grid search.
Overall, cross-validation techniques are essential for assessing and optimizing the performance
of machine learning models, providing reliable estimates of model performance, and helping
to prevent overfitting.