Unit 2 Topic 1: REGRESSION

Dr. Anil Kumar Dubey


Associate Professor,
Computer Science & Engineering Department,
ABES EC, Ghaziabad
Affiliated to Dr. A.P.J. Abdul Kalam Technical University,
Uttar Pradesh, Lucknow
Basic
• Regression is a statistical method that tries to determine the strength and character of the relationship between one dependent variable and a series of other variables. It is used in finance, investing, and other disciplines.

• Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables. It can be utilized to assess the strength of the relationship between variables and for modeling the future relationship between them.
Conti…
Regression, a statistical approach, dissects
the relationship between dependent and
independent variables, enabling predictions
through various regression models.
Regression
Regression is a statistical approach used to
analyze the relationship between a
dependent variable (target variable) and one
or more independent variables (predictor
variables).
The objective is to determine the most suitable function that characterizes the connection between these variables.
Conti…
• It is a supervised machine learning technique used to predict the value of the dependent variable for new, unseen data. It models the relationship between the input features and the target variable, allowing for the estimation or prediction of numerical values.

• A problem is treated as a regression problem when the output variable is a real or continuous value, such as "salary" or "weight". Many different models can be used; the simplest is linear regression, which tries to fit the data with the best hyperplane through the points.
Linear Model Assumptions
• Linear regression analysis is based on six fundamental assumptions:
  ◦ The dependent and independent variables show a linear relationship (the model is linear in the slope and intercept).
  ◦ The independent variable is not random.
  ◦ The mean of the residual (error) is zero.
  ◦ The variance of the residual (error) is constant across all observations (homoscedasticity).
  ◦ The residual (error) values are not correlated across observations (no autocorrelation).
  ◦ The residual (error) values follow the normal distribution.
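As a quick, informal illustration of checking the residual-related assumptions above, the sketch below fits a line to synthetic data (all values are assumed for the example) and inspects the residuals' mean and spread; a proper diagnosis would use dedicated statistical tests and residual plots.

```python
# Minimal sketch (synthetic data): fitting a line and inspecting residuals
# to informally check the zero-mean and constant-variance assumptions.
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 100)
y = 1.5 + 2.0 * x + rng.normal(0, 1, size=x.size)   # assumed true line + noise

slope, intercept = np.polyfit(x, y, deg=1)           # ordinary least squares fit
residuals = y - (intercept + slope * x)

print("residual mean (should be ~0):", np.round(residuals.mean(), 3))
# Compare residual spread in the first and second half of the data
print("std, first half :", np.round(residuals[:50].std(), 3))
print("std, second half:", np.round(residuals[50:].std(), 3))
```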
Regression Types
• Simple Regression
  ◦ Used to predict a continuous dependent variable based on a single independent variable.
  ◦ Simple linear regression should be used when there is only a single independent variable.
Conti…
• Multiple Regression
  ◦ Used to predict a continuous dependent variable based on multiple independent variables.
  ◦ Multiple linear regression should be used when there are multiple independent variables.

• Nonlinear Regression
  ◦ The relationship between the dependent variable and independent variable(s) follows a nonlinear pattern.
  ◦ Provides flexibility in modeling a wide range of functional forms.
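As a hedged illustration of capturing a nonlinear pattern, the sketch below fits a quadratic trend to synthetic data with NumPy's polyfit; the data, noise level, and polynomial degree are all assumptions chosen only for the example.

```python
# Minimal sketch (synthetic data): fitting a quadratic (nonlinear) trend.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x**2 + rng.normal(0, 2, size=x.size)   # curved data + noise

# polyfit returns the best-fitting polynomial coefficients, highest degree first
coeffs = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coeffs, x)
print("fitted coefficients:", coeffs)
```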
Simple Linear Regression
• Simple linear regression is a model that assesses the
relationship between a dependent variable and an
independent variable. The simple linear model is
expressed using the following equation:
Y = a + bX + ϵ
Where:
Y – Dependent variable
X – Independent (explanatory) variable
a – Intercept
b – Slope
ϵ – Residual (error)
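A minimal sketch of the equation above, assuming small synthetic data: it estimates the intercept a and slope b with scikit-learn and makes one prediction.

```python
# Minimal sketch (synthetic data): estimating a (intercept) and b (slope)
# for Y = a + bX + error using scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])      # single independent variable
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])     # dependent variable

model = LinearRegression().fit(X, y)
print("intercept a:", model.intercept_)
print("slope b:", model.coef_[0])
print("prediction for X=6:", model.predict([[6]])[0])
```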
Multiple Linear Regression
• Multiple linear regression analysis is essentially similar
to the simple linear model, with the exception that
multiple independent variables are used in the model.
The mathematical representation of multiple linear
regression is:
Y = a + bX1 + cX2 + dX3 + ϵ
Where:
Y – Dependent variable
X1, X2, X3 – Independent (explanatory) variables
a – Intercept
b, c, d – Slopes
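A minimal sketch of this multi-variable form, again on assumed synthetic data: scikit-learn estimates the intercept a and the slopes b, c, d for three predictor columns X1, X2, X3.

```python
# Minimal sketch (synthetic data): multiple linear regression
# Y = a + b*X1 + c*X2 + d*X3 + error with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9],
              [4, 8, 10],
              [5, 9, 12]])                   # columns: X1, X2, X3
y = np.array([10.2, 13.1, 16.0, 19.4, 22.1])

model = LinearRegression().fit(X, y)
print("intercept a:", model.intercept_)
print("slopes b, c, d:", model.coef_)
```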
Terminologies related to Regression
• Response Variable: The primary factor to predict or understand in regression, also known as the dependent variable or target variable.

• Predictor Variable: Factors influencing the response variable, used to predict its values; also called independent variables.

• Outliers: Observations with significantly low or high values compared to others, potentially impacting results and best avoided.
Conti…
• Multicollinearity: High correlation among independent variables, which can complicate the ranking of influential variables (a small correlation check is sketched after this list).

• Underfitting and Overfitting: Overfitting occurs when an algorithm performs well on training but poorly on testing, while underfitting indicates poor performance on both datasets.
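A minimal sketch of the multicollinearity check mentioned above, using assumed synthetic features: it inspects the predictors' correlation matrix, where a pairwise correlation near 1 or -1 signals that two features carry nearly the same information.

```python
# Minimal sketch (synthetic features): detecting multicollinearity
# by inspecting the correlation matrix of the predictors.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.1, size=100)   # nearly a copy of x1
x3 = rng.normal(size=100)                        # independent feature

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)              # 3x3 correlation matrix
print(np.round(corr, 2))   # corr[0, 1] close to 1 => x1 and x2 are collinear
```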
Characteristics of Regression
• Continuous Target Variable: Regression deals with predicting continuous target variables that represent numerical values. Examples include predicting house prices, forecasting sales figures, or estimating patient recovery times.

• Error Measurement: Regression models are evaluated based on their ability to minimize the error between the predicted and actual values of the target variable. Common error metrics include mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
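A minimal sketch of the error metrics named above, with assumed example values for the true and predicted targets:

```python
# Minimal sketch (assumed example values): computing the common
# regression error metrics MAE, MSE, and RMSE.
import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.8])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))        # mean absolute error
mse = np.mean(errors ** 2)           # mean squared error
rmse = np.sqrt(mse)                  # root mean squared error
print(f"MAE={mae:.3f}, MSE={mse:.3f}, RMSE={rmse:.3f}")
```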
Conti…
• Model Complexity: Regression models range from simple linear models to more complex nonlinear models. The choice of model complexity depends on the complexity of the relationship between the input features and the target variable.

• Overfitting and Underfitting: Regression models are susceptible to overfitting and underfitting.

• Interpretability: The interpretability of regression models varies depending on the algorithm used. Simple linear models are highly interpretable, while more complex models may be more difficult to interpret.
Conti…
• In ML, "overfitting" means a model has a high accuracy on the training data but a significantly lower accuracy on the testing data, indicating that the model has learned the training data too closely and cannot generalize well to new data.

• In ML, "underfitting" refers to a situation where both the training accuracy and testing accuracy are low, indicating that the model is too simple to capture the patterns in the data and is not performing well on either the training data or new unseen data.
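To make the train-versus-test gap concrete, here is a small sketch on synthetic data (the data and polynomial degrees are assumptions): a degree-1 model tends to underfit the curved signal, while a very high-degree model tends to overfit, which shows up as a high training R² but a lower test R².

```python
# Minimal sketch (synthetic data): illustrating underfitting vs. overfitting
# by comparing train/test R^2 scores of polynomial models of different degrees.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 3, 60)).reshape(-1, 1)
y = np.sin(2 * X).ravel() + rng.normal(0, 0.2, 60)   # nonlinear signal + noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):   # too simple, reasonable, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  train R^2={model.score(X_train, y_train):.2f}"
          f"  test R^2={model.score(X_test, y_test):.2f}")
```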
Examples
• Predicting the age of a person
• Predicting the nationality of a person
• Predicting whether the stock price of a company will increase tomorrow
• Predicting whether a document is related to sightings of UFOs

• Note: Predicting the age of a person is a regression problem because the output is a real value. Predicting nationality is categorical, whether the stock price will increase is a discrete yes/no answer, and whether a document is related to UFOs is also a yes/no answer, so these three are classification problems rather than regression.
Linear Regression
Logistic Regression
Thanks
