Linear Regression Multilinear Polynomial

The document introduces statistical learning techniques, focusing on machine learning concepts such as linear regression, multilinear regression, and polynomial regression. It explains the importance of algorithms, data, and training in creating machine learning models to predict outcomes based on input variables. Additionally, it discusses assumptions necessary for linear regression and methods to prevent overfitting through techniques like Lasso and Ridge regression.

Introduction to Statistical Learning Techniques

Linear Regression, Multilinear regression, Polynomial regression

Ashis Kumar Pati

Centre for Data Science


SOA University

Ashis Kumar Pati (ITER,SOA) Introduction to Statistical Learning Techniques 2024 1 / 16


Introduction

What is machine learning, or statistical machine learning?

Let the machines learn by themselves. Is it possible? We feed data to the machine and train it with an algorithm to learn from that data, so that it can make predictions for similar data.

Why is it called statistical machine learning?

Machine learning algorithms are based on statistical theories.

Essentially, statistical learning tries to figure out the relation between a variable $y$ and a set of variables $X_1, \dots, X_p$. That is, it assumes that $y = f(X_1, \dots, X_p) + \epsilon$, and tries to estimate $f$ with the goal of predicting the value of $y$ at a given new value of $X_1, \dots, X_p$.
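
As an illustration of the setup y = f(X) + ε, the sketch below simulates noisy data from a known f and then estimates f at a new point by averaging the k nearest observations. The function f(x) = 2x + 1, the noise level, the grid of x values, and the nearest-neighbour averaging are all assumptions chosen for this example; none of them comes from the slides.

```python
import random


def true_f(x):
    """Hypothetical 'unknown' relation, chosen only for illustration."""
    return 2.0 * x + 1.0


def simulate(n_points=101, noise_sd=0.5, seed=0):
    """Generate (x, y) pairs with y = f(x) + eps, eps ~ Normal(0, noise_sd)."""
    rng = random.Random(seed)
    xs = [10.0 * i / (n_points - 1) for i in range(n_points)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def estimate_f(xs, ys, x_new, k=5):
    """Estimate f(x_new) as the mean y of the k observations closest to x_new."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


xs, ys = simulate()
x_new = 5.0
y_hat = estimate_f(xs, ys, x_new)
print(f"estimate of f({x_new}) = {y_hat:.2f}, true value = {true_f(x_new):.2f}")
```

The estimate differs from the true value because of the noise term ε; averaging over several neighbours reduces that effect but does not remove it.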

Introduction

Terminologies:
1. Machine learning model: A mathematical representation created using a machine learning algorithm to learn patterns or relationships from data. It is used to make predictions, classifications, or decisions based on new, unseen data.
2. Data: The information used to train the model (training data) and to test the model (test data).
3. Algorithm: The method or process used to train the model.
4. Training: The process of feeding data into the algorithm to create the model.
5. Prediction: Once trained, the model can be used to make predictions or infer outcomes based on new input data.
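
The five terms above can be mapped onto a few lines of code. The sketch below is only an illustration: the data set (hours studied vs. exam score) is made up, the algorithm is assumed to be a least-squares straight-line fit, and the split into the first six points for training and the last two for testing is a simplification (in practice the split is usually random).

```python
def train(train_x, train_y):
    """Algorithm + training: least-squares fit of a line; returns the model (m, c)."""
    n = len(train_x)
    x_bar = sum(train_x) / n
    y_bar = sum(train_y) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(train_x, train_y))
    sxx = sum((x - x_bar) ** 2 for x in train_x)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def predict(model, x):
    """Prediction: apply the trained model to a new input."""
    m, c = model
    return m * x + c


# Data: hypothetical (hours studied, exam score) pairs.
data = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70), (6, 73), (7, 79), (8, 82)]
train_data, test_data = data[:6], data[6:]  # training data and test data

# Training produces the model.
model = train([x for x, _ in train_data], [y for _, y in train_data])

# The test data measures how well the model does on points it has not seen.
test_mse = sum((predict(model, x) - y) ** 2 for x, y in test_data) / len(test_data)
print("model (m, c):", model)
print("test MSE:", test_mse)
print("prediction for 9 hours:", predict(model, 9))
```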

Introduction

Machine Learning
    Supervised ML
        Regression: Linear, Multilinear, Polynomial
        Classification: LR (logistic regression), DT (decision tree), NB (naive Bayes), SVM, K-NN
    Unsupervised ML
        Dimension Reduction
        Clustering: K-means, Agglomerative
    Semisupervised ML
        Self-training
        Co-training


Introduction:

Questions to be addressed:
Is the target variable categorical or numerical?
How does the model work?
What are the assumptions?
What is the error?
How do we predict for a new data point?


Regression

Used when the target variable y is continuous.

Used for forecasting or predicting quantities such as price, temperature, or score.
Used to study how a change in the independent variable (x) affects the dependent variable (y).
Regression methods include:
Linear regression
Polynomial regression
Ridge regression, Lasso regression
Local polynomial approximation (piecewise linear approximation)
Wavelet regression
Decision tree regressor
Elastic Net regression
Quantile, Tobit, Ordinal, Cox, and many more...


Linear Regression

We approximate the data with a straight line $\hat{y} = m x + c$.

The straight line is parametrized by its slope ($m$) and intercept ($c$).
We want to find $m$ and $c$ such that the error is minimum.
Error:
Individual point error: $\hat{y}_i - y_i$
Total squared error: $E(m, c) = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \sum_{i=1}^{n} (m x_i + c - y_i)^2$
Mean squared error: $\mathrm{MSE} = \frac{1}{n} E(m, c)$
We can also consider the mean absolute error: $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|$
The mean squared error is convex in $m$ and $c$, so any critical point is a point of minimum.
By solving $\frac{\partial E(m, c)}{\partial m} = 0$ and $\frac{\partial E(m, c)}{\partial c} = 0$ we get

$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{and} \quad c = \bar{y} - m \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and the $y_i$.
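
The closed-form solution for the slope and intercept can be checked numerically. The following is a minimal sketch, assuming a small made-up data set; the function names are hypothetical. It computes m and c from the formulas, evaluates the mean squared error and mean absolute error, and confirms that moving the slope away from the fitted value increases the mean squared error.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c from the closed-form solution."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)  # must be non-zero: the x values cannot all be equal
    m = sxy / sxx
    c = y_bar - m * x_bar
    return m, c


def mean_squared_error(xs, ys, m, c):
    """Average of the squared differences between predictions m*x + c and y."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mean_absolute_error(xs, ys, m, c):
    """Average of the absolute differences between predictions m*x + c and y."""
    return sum(abs(m * x + c - y) for x, y in zip(xs, ys)) / len(xs)


# Made-up data lying close to the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

m, c = fit_line(xs, ys)
print("slope m:", m, "intercept c:", c)
print("MSE:", mean_squared_error(xs, ys, m, c))
print("MAE:", mean_absolute_error(xs, ys, m, c))

# Any other line has a larger mean squared error on the same data.
print("MSE with slope m + 0.1:", mean_squared_error(xs, ys, m + 0.1, c))
```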

Assumptions

Basic assumptions to check before doing linear regression:
Linear relationship between the dependent and independent variables (use a scatter plot).
No multicollinearity, that is, no linear dependence among the independent variables (check the correlation matrix).
Normality of residuals (check a histogram or Q-Q plot of the residuals; a plot of residuals vs. predicted values additionally shows whether their spread is constant).
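
The multicollinearity check can be sketched with a plain correlation matrix. The example below is an illustration under stated assumptions: the three feature columns are made up (x2 is deliberately close to 2 * x1), and the cut-off of 0.9 for flagging a pair is an arbitrary choice for this sketch, not a rule from the slides.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    var_a = sum((u - a_bar) ** 2 for u in a)
    var_b = sum((v - b_bar) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Correlation of every pair of named columns, keyed by (name, name)."""
    names = list(columns)
    return {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}


# Hypothetical independent variables.
features = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],  # roughly 2 * x1
    "x3": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}

corr = correlation_matrix(features)
threshold = 0.9  # assumed cut-off for "too correlated"
flagged = [(p, q) for (p, q), r in corr.items() if p < q and abs(r) > threshold]

for (p, q), r in corr.items():
    if p < q:
        print(f"corr({p}, {q}) = {r:.3f}")
print("highly correlated pairs:", flagged)
```

A flagged pair suggests that one of the two variables carries almost the same information as the other, so keeping both may make the fitted coefficients unstable.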

Multiple linear regression

Polynomial Regression

Multilinear vs polynomial


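Lasso and ridge Regression
When the number of variables is high, the model becomes more complex, and this sometimes leads to overfitting.
We can reduce the number of variables by setting some of the coefficients in linear regression to zero.
Along with minimizing the squared error, we can add a constraint on the total size of the coefficients:

$$\min_{\beta = (\beta_1, \dots, \beta_p)} \sum_{i=1}^{n} (y_i - \beta^T x_i)^2 \quad \text{such that} \quad \sum_{j=1}^{p} |\beta_j| \le t$$

where $n$ is the number of data points, $x_i$ is the vector of the $p$ variables for the $i$-th data point, and there is one coefficient $\beta_j$ per variable. The bound $t$ limits the sum of the absolute values of the coefficients; when $t$ is small, some coefficients are forced to be exactly zero, so only a subset of the variables stays in the model.
This can be rewritten as (using the Lagrange multiplier theorem)

$$\min_{\beta = (\beta_1, \dots, \beta_p)} \sum_{i=1}^{n} (y_i - \beta^T x_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

This penalized form is Lasso regression. Ridge regression replaces $|\beta_j|$ with $\beta_j^2$, which shrinks the coefficients without setting them exactly to zero.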
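
The effect of the penalty λ on the coefficients can be seen in a small numerical sketch. The code below minimizes the penalized objective (sum of squared errors plus λ times the sum of the absolute coefficients, with no intercept) by coordinate descent with soft-thresholding. Coordinate descent is one common way to solve this problem and is an assumption of this example, as are the made-up data and the function names; the code also assumes that no column of X is entirely zero.

```python
def soft_threshold(rho, threshold):
    """Shrink rho toward zero by threshold; values inside [-threshold, threshold] become 0."""
    if rho > threshold:
        return rho - threshold
    if rho < -threshold:
        return rho + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize sum_i (y_i - beta^T x_i)^2 + lam * sum_j |beta_j| one coefficient at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of column j with the residual that ignores column j
            z = 0.0    # squared length of column j
            for i in range(n):
                partial = sum(beta[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            # Setting the derivative of the objective in beta_j to zero gives this update.
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta


# Made-up data: y = 3 * x1 + 0.1 * x2, so the second variable matters very little.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-5.9, -3.1, 0.0, 2.9, 6.1]

beta_ols = lasso_coordinate_descent(X, y, lam=0.0)    # no penalty: ordinary least squares
beta_lasso = lasso_coordinate_descent(X, y, lam=2.0)  # penalty removes the weak variable

print("lambda = 0:", beta_ols)
print("lambda = 2:", beta_lasso)
```

With λ = 0 both coefficients are non-zero. With λ = 2 the coefficient of the weak second variable becomes exactly zero and the first coefficient is shrunk slightly, which is the variable-selection behaviour described above.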

Regularization

Penalized regression