Unit 4
STATISTICAL METHODS
Table of contents
1. Normalization
2. Feature Scaling
3. Regularization
4. Cross Validation
Normalization
Normalization
“The word ‘normalization’ is used informally in statistics, and so the term normalized data can have multiple meanings. In most cases, when you normalize data you eliminate the units of measurement, enabling you to more easily compare data from different places.”
Normalization I
In statistics and applications of statistics, normalization can have a range of meanings.
• In the simplest cases, normalization means adjusting values measured on different
scales to a notionally common scale, often prior to averaging.
• In more complicated cases, normalization may refer to more sophisticated adjust-
ments where the intention is to bring the entire probability distributions of ad-
justed values into alignment.
Normalization II
Normalization III
• There are different types of normalization in statistics: nondimensional ratios of errors, residuals, means, and standard deviations, which are hence scale invariant; some of them may be summarized as follows.
Normalization IV
Feature Scaling I
Feature scaling is used to bring all values into the range [0, 1]. This is also called unity-based normalization. It can be generalized to restrict the range of values in the dataset to an arbitrary interval [a, b].
Feature Scaling II
▶ Many methods (e.g. k-nearest neighbours) compute the distance between data points; if one feature has a much broader range of values than the others, that feature dominates the distance. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance.
▶ Another reason why feature scaling is applied is that gradient descent converges much faster with feature scaling than without it.
▶ It’s also important to apply feature scaling if regularization is used as part of the loss function (so that coefficients are penalized appropriately).
Feature Scaling III
$x' = \dfrac{x - \min(x)}{\max(x) - \min(x)}$

To rescale a range between an arbitrary set of values [a, b], the formula becomes:

$x' = a + \dfrac{(x - \min(x))(b - a)}{\max(x) - \min(x)}$
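As a quick illustration, here is a minimal NumPy sketch of the two rescaling formulas above; the array x and the target interval [a, b] are arbitrary example values, not part of the original slides.

```python
import numpy as np

x = np.array([2.0, 5.0, 10.0, 20.0])   # example feature values

# Rescale to [0, 1]
x_01 = (x - x.min()) / (x.max() - x.min())

# Rescale to an arbitrary interval [a, b]
a, b = -1.0, 1.0
x_ab = a + (x - x.min()) * (b - a) / (x.max() - x.min())

print(x_01)   # [0.         0.16666667 0.44444444 1.        ]
print(x_ab)   # [-1.         -0.66666667 -0.11111111  1.        ]
```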
Feature Scaling IV
Standardization (Z-score Normalization): We can handle various types of data, e.g. audio signals and pixel values for image data, and this data can include multiple dimensions. Feature standardization makes the values of each feature in the data have zero mean and unit variance.

$x' = \dfrac{x - \operatorname{mean}(x)}{\sigma}$
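A minimal sketch of z-score standardization with NumPy; the example matrix X is made up, and each column is treated as a feature.

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])           # rows = samples, columns = features

# Standardize each feature (column) to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))   # ~[0. 0.]
print(X_std.std(axis=0))    # [1. 1.]
```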
Bias & Variance I
Statistical bias is a feature of a statistical technique or of its results whereby the ex-
pected value of the results differs from the true underlying quantitative parameter be-
ing estimated.
Regularization I
Why Regularization?
• Regularization is necessary because least squares regression methods, where the
residual sum of squares is minimized, can be unstable. This is especially true if there
is multicollinearity in the model.
Regularization II
• However, the mere practice of model fitting comes with a major pitfall: any set of data can be fitted by some model, even if that model is ridiculously complex.
• For example, take a simple data set of two points. A set of two points can be fitted by multiple models, including a linear model (green) and an unlimited number of higher-degree polynomial models (red).
• Fitting a small amount of data will often lead to a complex, overfit model. A simpler model may be underfit and will perform poorly with predictions.
Regularization III
• Just because two data points fit a line perfectly doesn’t mean that a third point will
fall exactly on that line – in fact, it’s highly unlikely.
• Simply put, regularization penalizes more complex models in favor of simpler ones (models with smaller regression coefficients), without unduly sacrificing predictive power.
Regularization IV
Ridge Regression
Ridge regression is a way to create a parsimonious model when the number of pre-
dictor variables in a set exceeds the number of observations, or when a data set has
multicollinearity (correlations between predictor variables).
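A minimal scikit-learn sketch of ridge regression; the toy data, the deliberately collinear predictors, and the penalty strength alpha=1.0 are illustrative choices, not prescribed values.

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
# Make two predictors nearly collinear to mimic multicollinearity
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=20)
y = 3 * X[:, 0] + rng.normal(size=20)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)     # alpha controls the L2 penalty strength

print("OLS coefficients:  ", ols.coef_)    # typically large and unstable
print("Ridge coefficients:", ridge.coef_)  # shrunk toward zero, more stable
```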
Lasso Regression
Lasso regression is similar to ridge regression, but it penalizes the absolute size of the regression coefficients (an L1 penalty), which can shrink some coefficients exactly to zero and thus also performs variable selection.
Cross Validation I
Cross Validation II
Suppose I leave home every morning and need to reach the train station by 8.15 am. I can vary the time at which I leave home and the route I take to the station.
Cross Validation III
In the above example, I have two parameters (i.e., the time of departure from home and the route to take to the station), and I need to choose these parameters such that I reach the station by 8.15 am.
In order to solve this problem, I may try out different sets of ’parameters’ (i.e., different combinations of departure times and routes) on Mondays, Wednesdays, and Fridays, to see which combination is the ’best’ one. The idea is that once I have identified the best combination I can use it every day to achieve my objective.
Cross Validation IV
Problem of Overfitting: The problem with the above approach is that I may overfit, which essentially means that the best combination I identify may in some sense be unique to Mon, Wed and Fri, and that combination may not work for Tue and Thu. Overfitting may happen if, in my search for the best combination of times and routes, I exploit some aspect of the traffic situation on Mon/Wed/Fri which does not occur on Tue and Thu.
One Solution to Overfitting: Cross-Validation: Cross-validation is one solution to overfitting. The idea is that once we have identified our best combination of parameters (in our case, time and route), we test the performance of that set of parameters in a different context. Therefore, we may want to test on Tue and Thu as well, to ensure that our choices work for those days too.
Cross Validation V
Cross-validation didn’t become prevalent until huge datasets came into being. Prior to that, analysts preferred to use all the available data to test a model. With larger data sets, it makes sense to hold back a portion of the data to test the model. However, the question becomes: which portion of the data do you hold back? Most data isn’t homogeneous across its entire length, so if you choose the wrong chunk of data, you could invalidate a perfectly good model. Cross-validation solves this problem by using multiple, sequential holdout samples that cover all of the data.
Leave p-out cross-validation
• Leave-p-out cross-validation (LpOCV) is an exhaustive technique: p observations are held out for validation and the remaining (n-p) observations are used to train the model, repeated for every possible way of choosing p observations from the dataset.
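A minimal sketch of enumerating LpOCV splits with scikit-learn; the toy data, p=2, and the use of LeavePOut are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import LeavePOut

X = np.arange(8).reshape(4, 2)   # 4 samples, 2 features (toy data)
y = np.array([0, 1, 0, 1])

lpo = LeavePOut(p=2)             # hold out every possible pair of samples
for train_idx, test_idx in lpo.split(X):
    print("train:", train_idx, "test:", test_idx)
# With n=4 and p=2 there are C(4,2) = 6 train/test splits.
```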
Leave One-out cross-validation
• Leave-one-out cross-validation (LOOCV) is an exhaustive cross-validation technique.
It is a category of LpOCV with the case of p=1.
• For a dataset with n rows, the 1st row is selected for validation and the remaining (n-1) rows are used to train the model. In the next iteration, the 2nd row is selected for validation and the rest are used for training. The process is repeated until all n rows have been used for validation.
Both of the above cross-validation techniques are exhaustive: exhaustive cross-validation methods learn and test on every possible split of the data. They share the same pros and cons.
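A minimal scikit-learn sketch of LOOCV; the noiseless toy data and the choice of LinearRegression as the model are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression

X = np.arange(10, dtype=float).reshape(-1, 1)   # 10 samples, 1 feature
y = 2.0 * X.ravel() + 1.0

loo = LeaveOneOut()
errors = []
for train_idx, test_idx in loo.split(X):
    # Train on n-1 rows, validate on the single held-out row
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    errors.append((pred[0] - y[test_idx][0]) ** 2)

print("LOOCV mean squared error:", np.mean(errors))   # ~0 for this noiseless toy data
```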
Holdout cross-validation I
• In the case of holdout cross-validation, the dataset is randomly split into training and validation data. Generally, more data is allocated to training than to testing. The training data is used to induce the model, and the validation data is used to evaluate the performance of the model.
Holdout cross-validation II
The more data is used to train the model, the better the model tends to be. With the holdout method, however, a sizeable portion of the data is withheld from training.
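A minimal sketch of a holdout split using scikit-learn's train_test_split; the 80/20 split, the toy data, and the linear model are arbitrary example choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X = np.random.rand(100, 3)                                  # toy data: 100 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)

# Hold out 20% of the data for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print("Validation R^2:", model.score(X_val, y_val))
```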
k-fold cross-validation I
• The dataset is divided into k groups (folds) of approximately equal size. In each iteration, one group is used as the validation data and the remaining (k-1) groups are used as training data.
• The process is repeated k times, until each group has been used once as the validation data.
k-fold cross-validation II
The final accuracy of the model is computed by taking the mean of the validation accuracies of the k models:

$\mathrm{acc}_{cv} = \dfrac{1}{k}\sum_{i=1}^{k} \mathrm{acc}_i$
LOOCV is a variant of k-fold cross-validation where k=n.
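A minimal scikit-learn sketch of k-fold cross-validation and the mean accuracy above; k=5, the synthetic classification data, and logistic regression are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)

print("Per-fold accuracy:", scores)
print("Mean CV accuracy: ", scores.mean())   # acc_cv = (1/k) * sum(acc_i)
```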
Stratified k-fold cross-validation I
Stratified: The splitting of data into folds may be governed by criteria such as ensuring that each fold has the same proportion of observations with a given categorical value, such as the class outcome value. This is called stratified cross-validation.
• The cross-validation techniques discussed above may not work well with an imbalanced dataset. Stratified k-fold cross-validation addresses the problem of an imbalanced dataset.
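A minimal sketch of stratified splitting on an imbalanced toy label vector; the 90/10 class ratio and k=5 are example choices.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Imbalanced toy labels: 90 samples of class 0, 10 samples of class 1
X = np.random.rand(100, 4)
y = np.array([0] * 90 + [1] * 10)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each test fold keeps roughly the same 90/10 class proportion
    print(f"fold {fold}: class counts in test fold =", np.bincount(y[test_idx]))
```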
Stratified k-fold cross-validation II
Stratified k-fold cross-validation III
• The final score is computed by taking the mean of the scores across the k folds.