Regression Unit-2

Regression analysis is a statistical method used to model the relationship between a dependent variable and one or more independent variables, enabling predictions of continuous values. It is a supervised learning technique primarily used for forecasting, determining causal relationships, and analyzing trends. Types of regression include simple linear regression, multiple linear regression, and nonlinear regression, with applications in various fields such as marketing, real estate, and traffic analysis.


Regression

Regression Analysis
Regression analysis is a statistical method for modeling the relationship between a dependent
(target) variable and one or more independent (predictor) variables. More specifically,
regression analysis helps us understand how the value of the dependent variable changes with
respect to one independent variable while the other independent variables are held fixed. It
predicts continuous/real values such as temperature, age, salary, price, etc.
We can understand the concept of regression analysis with the following example:
Example: Suppose a marketing company A runs various advertisements every year and earns
sales from them. A table (not reproduced here) lists the amount the company spent on
advertising in each of the last 5 years and the corresponding sales:

Now, the company plans to spend $200 on advertising in the year 2023 and wants to predict
the sales for that year. To solve such prediction problems in machine learning, we use
regression analysis.
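As a sketch of this workflow, the snippet below fits a straight line to hypothetical ad-spend/sales figures (the numbers are illustrative, not the company's actual table) and predicts sales for a $200 advertising budget:

```python
import numpy as np

# Hypothetical ad-spend vs. sales figures for the last 5 years
# (illustrative numbers only; the original table is not reproduced here)
ad_spend = np.array([90.0, 120.0, 150.0, 100.0, 130.0])
sales = np.array([1000.0, 1300.0, 1800.0, 1200.0, 1380.0])

# Fit y = a*x + b by least squares (polyfit returns slope first)
a, b = np.polyfit(ad_spend, sales, deg=1)

# Predict sales for a $200 advertisement budget
predicted_sales = a * 200 + b
print(round(predicted_sales, 1))
```

Because the fitted slope is positive and $200 lies above all past budgets, the prediction extrapolates beyond the observed sales.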
Regression is a supervised learning technique that helps find correlations between
variables and enables us to predict a continuous output variable from one or more
predictor variables. It is mainly used for prediction, forecasting, time-series modeling, and
determining cause-and-effect relationships between variables.
In regression, we fit a line or curve that best matches the given data points; using this fit,
the machine learning model can make predictions about the data.
In simple words, "regression finds a line or curve through the data points on the
target-predictor graph such that the vertical distance between the data points and the
regression line is minimized." The distance between the data points and the line tells whether
the model has captured a strong relationship or not.
Some examples of regression are:
 Prediction of rain using temperature and other factors
 Determining Market trends
 Prediction of road accidents due to rash driving.
Terminologies Related to the Regression Analysis:
 Dependent Variable: The main factor in regression analysis that we want to
predict or understand is called the dependent variable. It is also called the target variable.
 Independent Variable: The factors that affect the dependent variable, or that are
used to predict the values of the dependent variable, are called independent variables,
also called predictors.
Outliers: An outlier is an observation with a very low or very high value in
comparison to the other observed values. An outlier can distort the fitted model, so it should
be detected and handled carefully.
Below is the mathematical equation for linear regression:
Y = aX + b
Here, Y is the dependent (target) variable,
X is the independent (predictor) variable, and a and b are the linear coefficients.
A linear regression model's main aim is to find the best-fit line, i.e., the optimal values
of the intercept and coefficients, such that the error is minimized. The error is the difference
between the actual value and the predicted value, and the goal is to reduce this difference.

In a typical diagram of a fitted simple linear regression (figure not reproduced here):

 x is the independent variable, plotted on the x-axis, and y is the dependent
variable, plotted on the y-axis.
 Black dots are the data points, i.e., the actual values; b0 is the intercept (10 in the
original diagram) and b1 is the slope of the x variable.
 The blue line is the best-fit line predicted by the model, i.e., the predicted values lie on
the blue line.
The vertical distance between the data point and the regression line is known as error or
residual. Each data point has one residual and the sum of all the differences is known as the
Sum of Residuals/Errors.
Mathematical approach:
Residual/Error = Actual value − Predicted value
Sum of Residuals/Errors = Σ (Actual − Predicted)
Sum of Squared Residuals/Errors = Σ (Actual − Predicted)²
Note that the square is applied to each residual before summing; minimizing this sum of
squared errors is the basis of the least-squares method.
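The residual calculations above can be sketched in a few lines (the actual and predicted arrays are made-up values for illustration):

```python
import numpy as np

# Hypothetical actual and predicted values (illustrative only)
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 6.5, 9.5])

residuals = actual - predicted        # one residual per data point
sum_of_residuals = residuals.sum()    # can be near zero even for a poor fit
sse = (residuals ** 2).sum()          # sum of squared errors: what least squares minimizes
print(residuals, sum_of_residuals, sse)
```

Note how the plain sum of residuals cancels to zero here while the sum of squares does not, which is why least squares minimizes the squared errors.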

Some popular applications of linear regression are:


 Analyzing trends and sales estimates
 Salary forecasting
 Real estate prediction
 Arriving at ETAs in traffic.

Types of Linear Regression


Linear regression can be further divided into two types:
 Simple Linear Regression: If a single independent variable is used to predict the
value of a numerical dependent variable, the algorithm is
called simple linear regression.
 Multiple Linear Regression: If more than one independent variable is used to predict
the value of a numerical dependent variable, the algorithm
is called multiple linear regression.
Equation of simple linear regression, where b0 is the intercept, b1 is the coefficient (slope), x
is the independent variable, and y is the dependent variable:
y = b0 + b1x
Equation of multiple linear regression, where b0 is the intercept, b1, b2, b3, …, bn are the
coefficients (slopes) of the independent variables x1, x2, x3, …, xn, and y is the dependent
variable:
y = b0 + b1x1 + b2x2 + b3x3 + … + bnxn
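Both equations can be fitted by ordinary least squares. A minimal sketch for the multiple-regression case, using made-up data generated noiselessly from y = 1 + 1·x1 + 2·x2:

```python
import numpy as np

# Hypothetical data: two predictors (x1, x2) and a response y,
# generated exactly from y = 1 + 1*x1 + 2*x2
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([6.0, 5.0, 12.0, 11.0])

# Prepend a column of ones so the intercept b0 is estimated alongside b1, b2
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef
print(b0, b1, b2)
```

Since the data are noiseless, least squares recovers the generating coefficients b0 = 1, b1 = 1, b2 = 2 (up to floating-point error).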
Linear Regression:
 Straight-line regression analysis involves a response variable, y, and a single predictor
variable, x.
 It is the simplest form of regression and models y as a linear function of x. That is,
y = b + wx, where the variance of y is assumed to be constant, and b and w are regression
coefficients specifying the y-intercept and slope of the line.
 The regression coefficients, w and b, can also be thought of as weights, so we can
equivalently write y = w0 + w1x. These coefficients can be solved for by the method of
least squares, which estimates the best-fitting straight line as the one that minimizes
the error between the actual data and the estimate of the line.
 Let D be a training set consisting of values of the predictor variable, x, for some
population and their associated values of the response variable, y. The training set
contains |D| data points of the form (x1, y1), (x2, y2), …, (x|D|, y|D|).
The regression coefficients can be estimated using this method with the following equations:
w1 = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²
w0 = ȳ − w1 x̄
where x̄ is the mean value of x1, x2, …, x|D|, and ȳ is the mean value of y1, y2, …, y|D|. The
coefficients w0 and w1 often provide good approximations to otherwise complicated
regression equations.
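These closed-form estimates can be computed directly (the x and y values below are hypothetical, roughly following y ≈ 2x):

```python
import numpy as np

# Hypothetical training set D of (x, y) pairs
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least-squares estimates:
# w1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
# w0 = y_bar - w1 * x_bar
w1 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
w0 = y_bar - w1 * x_bar
print(w0, w1)
```

The fitted slope comes out close to 2, matching the trend the data were chosen to follow.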
Multiple Linear Regression:
 It is an extension of straight-line regression that involves more than one predictor
variable.
 It allows the response variable y to be modeled as a linear function of, say, n predictor
variables or attributes, A1, A2, …, An, describing a tuple, X.
An example of a multiple linear regression model based on two predictor attributes or
variables, A1 and A2, is
y = w0 + w1x1 + w2x2
where x1 and x2 are the values of attributes A1 and A2, respectively, in X.
 Multiple regression problems are commonly solved with statistical
software packages, such as SAS, SPSS, and S-Plus.
Nonlinear Regression:
 It can be modeled by adding polynomial terms to the basic linear model.
 By applying transformations to the variables, we can convert the nonlinear model into
a linear one that can then be solved by the method of least squares.
 Polynomial regression is a special case of multiple regression. That is, the addition of
higher-order terms like x², x³, and so on, which are simple functions of the single
variable x, can be considered equivalent to adding new independent variables.
Transformation of a polynomial regression model to a linear regression model:
Consider a cubic polynomial relationship given by
y = w0 + w1x + w2x² + w3x³
To convert this equation to linear form, we define new variables:
x1 = x, x2 = x², x3 = x³
It can then be converted to linear form by applying the above assignments, resulting in the
equation
y = w0 + w1x1 + w2x2 + w3x3
which is easily solved by the method of least squares using software for regression analysis.
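A minimal sketch of this transformation, using noiseless data generated from an assumed cubic (the coefficients are made up for illustration):

```python
import numpy as np

# Hypothetical data generated exactly from y = 1 + 2x + 0.5x^2 - 0.1x^3
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1 + 2 * x + 0.5 * x ** 2 - 0.1 * x ** 3

# Transformation: x1 = x, x2 = x^2, x3 = x^3, then fit an ordinary
# linear model y = w0 + w1*x1 + w2*x2 + w3*x3 by least squares
A = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)
```

Because the data are noiseless and the x values are distinct, least squares recovers the generating coefficients [1.0, 2.0, 0.5, −0.1] up to floating-point error, confirming that the polynomial model is linear in the transformed variables.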
