Linear Regression


Linear Regression

Regression:
 It is one of the most important statistical and machine learning tools.

 It is a parametric technique that allows us to make decisions based on data.

 It allows us to make predictions from data by learning the relationship between the input and output variables.

 The output variable, which depends on the input variables, is a continuous-valued real number.

 Regression helps us to understand how the value of the output variable changes with respect to changes in the input variables.

 Regression techniques are used for predicting prices, economic trends, and other continuous quantities.
Simple Linear Regression:
 It is the simplest form of linear regression, used when there is a single input variable for the output (target) variable.
 The input variable helps in predicting the value of the output variable.
 The input variable is referred to as X.
 The output or target variable is the variable that we want to predict, referred to as y.
Simple Linear Regression:
 ß0, called the intercept, shows the point where the estimated regression line crosses the y-axis.

 ß1 determines the slope of the estimated regression line.

 The random error describes the random component of the linear relationship between the independent and dependent variables.

 The true regression model is usually never known.

 The value of the random error term corresponding to the observed data points remains unknown.

 A regression model can be estimated by calculating the parameters of the model for an observed dataset.
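Putting these terms together (a reconstruction in the slides' notation, since the slide's equation is not reproduced in this text), the simple linear regression model is:

y = ß0 + ß1·X + e

where e is the random error term.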
Simple Linear Regression:
 The main aim of regression is to estimate the parameters ß0 and ß1 from the sample.

 Once we find the optimum values for these two parameters, a line of best fit can be used to find the values of Y given the values of X.


 We fit a line to capture the relationship between the input and output variables.

 The line is then used to predict the output for unseen inputs.


Simple Linear Regression:
 The ß0 and ß1 values are estimated using the Ordinary Least Squares (OLS) method.

 The main goal is to bring the distance from the data points (black dots) to the fitted line (red line) as close to zero as possible.
 This is done by minimizing the squared distances between the actual and predicted outcomes.

 The difference between the actual and predicted value is called the residual (e).

 It can be negative or positive depending on whether the model overpredicted or underpredicted the outcome.
 To calculate the net error, simply adding all the residuals can lead to cancellation of positive and negative terms and an underestimate of the net effect.


Simple Linear Regression:
 To avoid this, we take the sum of the squared error terms, which is called the residual sum of squares (RSS).
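With n observed data points and e(i) = y(i) − ŷ(i) denoting the ith residual, the residual sum of squares (a standard definition consistent with the slide, not reproduced from it) is:

RSS = e(1)² + e(2)² + … + e(n)² = Σ ( y(i) − ŷ(i) )²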
Simple Linear Regression:
 The ordinary least squares (OLS) method minimizes the residual sum of squares (RSS).

 Its objective is to fit a regression line that minimizes the distance from the observed values to the predicted values on the regression line.


Different Kinds Of Relationship:
 Positive Relationship: When the regression line between two variables has an upward slope, the variables are said to be positively correlated.
 If we increase the value of x (the independent variable), then we will see an increase in the dependent variable.

 Negative Relationship: When the regression line between two variables has a downward slope, the variables are said to be in a negative relationship.
 If we increase the value of the independent variable (x), we will see a decrease in the dependent variable (y).
Different Kinds Of Relationship:
 No Relationship: If the best-fit line is flat, then we can say that there is no relationship between the variables.
 The dependent variable won’t change by increasing or decreasing the independent variable.
Linear Regression Relationship:
 Covariance: This parameter tells us the direction of the relationship between x and y.

 It does not tell us how strong that positive or negative relationship is.

 If the covariance value is negative, then as the independent variable increases, the dependent variable decreases.

 Correlation: It is a statistical measure that tells us both the direction and the strength of the relationship.
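As a quick illustration (not part of the original slides; the data here is hypothetical), covariance and correlation can be computed with NumPy:

import numpy as np

# Hypothetical data: x increases while y tends to decrease
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 8.5, 7.0, 6.5, 4.0])

cov_xy = np.cov(x, y)[0, 1]        # sample covariance: its sign gives the direction
corr_xy = np.corrcoef(x, y)[0, 1]  # correlation: direction and strength, in [-1, 1]

print(cov_xy, corr_xy)             # negative covariance, correlation close to -1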
Applications:
 Predicting advertising expenses.
 Medical diagnosis.
 Agricultural research.
Advantages and Disadvantages:
Advantages:
It performs well for linearly separable data.
It is easy to implement and interpret, and training is fast.

Disadvantages:
It assumes a linear relationship between the independent and dependent variables.
It is prone to noise and overfitting.
Regression:
 A regression problem is one where the output variable is a continuous value, such as "salary" or "weight".
 Linear regression is a statistical method for finding the relationship between the independent and dependent variables.
Regression:
 Regression is a technique where correct (labelled) data is given and we need to find the relationship within the data.

 In a regression problem, the model always predicts a real or continuous value as the output.

 This example predicts the salary (dependent variable y) of a person based on the given independent variable (x).


Regression:
First, we need to identify the independent variable (the values used to predict the dependent variable) and the dependent variable (the value to be predicted) from the dataset.
We need to fit those variables in the linear regression cost function.
The cost function is used to measure the performance of the machine learning model for the given data.


Regression:
A regression plot is plotted, and when a new value (year) comes in, the salary of the person can be predicted with the help of the regression model.
Only one independent variable is taken, so this is also called linear regression with one variable, or univariate linear regression.
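As an illustrative sketch (not from the original slides; the data here is hypothetical), a univariate regression of salary on years of experience can be fitted with scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: years of experience (x) and salary (y)
x = np.array([[1.0], [2.0], [3.0], [5.0], [7.0], [10.0]])
y = np.array([40000, 48000, 55000, 72000, 90000, 120000])

model = LinearRegression()
model.fit(x, y)                       # learn theta0 (intercept_) and theta1 (coef_)

print(model.intercept_, model.coef_)  # fitted parameters
print(model.predict([[6.0]]))         # predicted salary for 6 years of experience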


Cost Function Of Linear Regression:
 The hypothesis for this simple linear regression is a linear function of the input; the cost function measures how well it fits the data.

 'x' is used to denote the input variable (years of experience).

 Inputs are also called input features.

 'y' is used to denote the "output" or target variable.

 y is nothing but the target variable (salary).

 When the target variable we are trying to predict is continuous, the learning problem is called a regression problem.
Cost Function Of Linear Regression:
 Theta0 and theta1 are the parameters of the model.

 x is the independent variable.

 The theta0 and theta1 values must be chosen such that h(x) is close to y.

 The linear regression algorithm aims to solve a minimization problem.

 The difference between h(x) and y should be small.

 We use the notation (x(i), y(i)) to denote the ith training example.

 We sum over the training set, i = 1 to m (training examples), the squared difference between the prediction of the hypothesis and the actual output.


Cost Function Of Linear Regression:
The accuracy of the hypothesis function can be measured by using the cost function.
It takes an average of the differences between the results of the hypothesis on the x inputs and the actual outputs y.

Cost Function Of Linear Regression:
 To break it apart, it is ½ · x̄, where x̄ is the mean of the squares of h_theta(x(i)) − y(i).

 h_theta(x(i)) − y(i) is nothing but the difference between the predicted value and the actual value.

 This function is called the "Mean Squared Error".

 This is the cost function.


Cost Function Of Linear Regression:
First, assign some random values to theta0 and theta1 and then find h(x).
Cost Function Of Linear Regression:
 In the above plot, the curved graph is drawn flat.

 The 3D surface is plotted as a 2D contour plot.

 We have to find the minimum value of J(theta0, theta1), which lies at the smallest oval (the global optimum).

 From the contour plot, some method like the OLS method is used to find the minimum of J(theta0, theta1).
 The corresponding values of theta0 and theta1 are taken for h(x).

 The regression line is then plotted on the data; this is how the cost function is used.
Ordinary Least Squares(OLS):
 We need to find the best-fit line for the dataset.

 In order to find the best-fit line, we use the OLS method:

 y = mx + b

 m – slope.

 x – independent variable.

 b – intercept.

 The OLS method is used to find the best-fit slope and intercept:
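The slide's formulas are not reproduced in this text; in the standard closed form (with x̄ and ȳ denoting the means of x and y), OLS gives:

m = Σ (x(i) − x̄)(y(i) − ȳ) / Σ (x(i) − x̄)²

b = ȳ − m·x̄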


Ordinary Least Squares(OLS):
 So our regression values are m (theta1) = 9449.96232 and b (theta0) = 25792.2002, giving the regression line ŷ = 9449.96232·X + 25792.2002.
Gradient Descent:
 It is an efficient method to find the minimum value of J(theta0, theta1).

 This method is not only used in linear regression but is also employed in other machine learning algorithms.
 First, the process starts with some random values of theta0 and theta1, and the values of theta0 and theta1 are then changed to reduce J(theta0, theta1).

 This step is done repeatedly until we end up at the minimum value.

 If we start at a point, the gradient descent algorithm will take small steps in order to find the local minimum.
 This is an important property of gradient descent.

 If we start at a different point, it may find a different local minimum.


Gradient Descent:
Following is the update equation of the gradient descent algorithm:
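In the same notation (a reconstruction of the standard update rule, since the slide's equation is not reproduced in this text):

repeat until convergence:
    theta_j := theta_j − alpha · ∂/∂theta_j J(theta0, theta1)    (for j = 0 and j = 1, updated simultaneously)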
Gradient Descent:
 j = 0, 1 -> It denotes the feature index number.

 alpha -> Learning rate.

 The term after alpha -> the partial derivative of J with respect to theta_j.

 At each iteration, one should simultaneously update all the parameters theta_0, theta_1, ..., theta_n.

 The parameters must be updated simultaneously in order to get a correct implementation of gradient descent.
Gradient Descent:
• Consider the partial derivative term with respect to theta1.

• The derivative of J(theta1) is nothing but the slope of the cost curve at the point theta1.

• This derivative term is used to find the slope with respect to theta_j (j = 0, 1).


Gradient Descent:
If the learning rate is too small, then gradient descent will take small steps and it will take more time to find the minimum value.
If the learning rate is too large, it will take huge steps; if the value is near the minimum but the learning rate is too high, it may fail to converge or even diverge.
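A minimal sketch of batch gradient descent for univariate linear regression (not from the original slides; the data and learning rate here are hypothetical choices):

import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 5.0, 7.2, 8.9, 11.1])

theta0, theta1 = 0.0, 0.0   # start from arbitrary values
alpha = 0.01                # learning rate
m = len(x)                  # number of training examples

for _ in range(5000):
    h = theta0 + theta1 * x                  # hypothesis h(x)
    grad0 = (1 / m) * np.sum(h - y)          # dJ/dtheta0
    grad1 = (1 / m) * np.sum((h - y) * x)    # dJ/dtheta1
    # simultaneous update of both parameters
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)  # should approach the OLS line for this data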
Multiple Linear Regression:
 It uses several explanatory variables in order to predict the outcome of a response variable.

 The main aim of the multiple linear regression model is to model the relationship between the independent variables and the response variable.

 Multiple linear regression is an extension of OLS regression because it involves more than one explanatory variable.

 It is used in econometrics and financial inference.


Multiple Linear Regression:
Why Multiple Linear Regression (MLR)?
 This type of algorithm is useful in situations where the number of variables is small.
 This algorithm is used for finding the correlation between the dependent and independent variables.
Multiple Linear Regression(MLR):
 y = m1·x + m2·z + c

 y is the dependent variable, that is, the variable that needs to be predicted.

 x is the first independent variable. It is the first input.

 m1 is the slope of x. It lets us know the angle of the line with respect to x.

 z is the second independent variable. It is the second input.

 m2 is the slope of z. It lets us know the angle of the line with respect to z.

 c is the intercept. A constant that gives the value of y when x and z are both 0.
Steps Of Multivariate Regression Analysis:
 Feature Selection:

 It is an important step in multivariate regression.

 This step is essential in order to pick important features for model building.

 Normalizing Features:

 The features should be scaled as it maintains general distribution and ratios in the data.

 Loss Function Selection and Hypothesis:

 The loss function measures the error of the predictions.

 The hypothesis is the predicted value computed from the features.


Steps Of Multivariate Regression Analysis:
 Minimize the loss function:

 The loss function should be minimized using a loss minimization algorithm on the dataset.

 Gradient descent is one of the commonly used algorithms for loss minimization.

 Test the Hypothesis function:

 The hypothesis function should be checked on test data to verify the values it predicts.


Cost Function:
 It is nothing but the sum of the squares of the differences between the predicted values and the actual values, divided by twice the length of the dataset.
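A minimal NumPy sketch of this cost function (illustrative only; the function and variable names are hypothetical, not from the slides):

import numpy as np

def cost(y_pred, y_actual):
    m = len(y_actual)                                    # length of the dataset
    return np.sum((y_pred - y_actual) ** 2) / (2 * m)    # sum of squared differences over 2m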


Multiple Linear Regression (MLR):

 It is a form of linear regression used when there are two or more predictor or independent variables.


 It includes some additional predictors.
Multiple Linear Regression (MLR):
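The equation discussed below is not reproduced in this text; in its standard form it reads:

y = ß0 + ß1·x1 + ß2·x2 + … + ßp·xp + e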
 The above equation is an extension of the simple linear regression equation.

 Here, each input has a corresponding slope coefficient (ß).

 ß0 is the intercept constant and is the value of y in the absence of all predictors (when all x terms are zero).
 As the number of features grows, the complexity of the model increases.

 It becomes more difficult to visualize our data.

 As there are more parameters in these models, we should be more careful while working with them.
 If we add more terms, the fit to the data will improve.

 This is dangerous because it can lead to a model that fits the data but doesn't mean anything useful.
Example:
 The advertising dataset consists of the sales of a product in 200 different markets.
 It contains advertising budgets for three different media: TV, radio and newspaper.
 The dataset is used to predict the amount of sales (dependent variable) based on the TV, radio and newspaper advertising budgets (independent variables).
 The formula is:
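(The slide's formula is not reproduced in this text; for this dataset it takes the standard form:)

sales = ß0 + ß1·TV + ß2·radio + ß3·newspaper + e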
Example:
 The ß values are found by minimizing the error function, fitting the best line or hyperplane (depending on the number of input variables).
 Load The Data and Describe the Data:
 Import the required libraries:

 import pandas as pd
 import numpy as np
 import seaborn as sns
 import matplotlib.pyplot as plt
 from sklearn.model_selection import train_test_split
 from sklearn.linear_model import LinearRegression
 from sklearn import metrics
 from sklearn.metrics import r2_score
 import statsmodels.api as sm
Example:
 Load the Dataset:
 df = pd.read_csv("Advertising.csv")

 Understand the Dataset and Describe it:


 df.head()
Example:
 Drop the first column 'Unnamed: 0' since we don't need it:
 df = df.drop(['Unnamed: 0'], axis=1)
 df.info()
Example:
 The dataset contains four columns, 200 records and no missing values.
 Visualize the relationship between the independent variables and the target variable.

 sns.pairplot(df)
Example:
 The relationship between TV and sales is very strong.

 There is some trend between radio and sales, while the relationship between newspaper and sales is almost non-existent.
 It can be verified numerically through a correlation map.

 mask = np.tril(df.corr())

 sns.heatmap(df.corr(), fmt='.1g', annot=True, cmap='cool', mask=mask)


Output:
Example:
 The strongest positive correlation happens between sales and TV.

 The correlation between sales and newspaper is close to 0.

 Select Features and Target Variable:

 Divide the variables into two sets: the dependent (or target) variable "y" and the independent (or feature) variables "X".

 X = df.drop(['sales'], axis=1)

 y = df['sales']
Example:
 Split the Dataset:

 For understanding the model performance, the dataset is divided into a training set and a testing set.

 By splitting the dataset into two separate sets, we can train the model using one set and test the performance of the model on unseen data using the other one.

 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

 The dataset is split into 70% train and 30% test.

 The random_state parameter is used for initializing the internal random number generator.

 If the random state is set to 0, we can compare the output over multiple runs of the code using the same parameter.
Example:
 print(X_train.shape,y_train.shape,X_test.shape,y_test.shape)

 From the output, we can observe the following:

 2 datasets of 140 records each, one with the 3 independent variables and one with the target variable.

 These will be used for training and producing the linear regression model.

 2 datasets of 60 records each, one with the 3 independent variables and one with the target variable, which will be used for testing the performance of the linear regression model.
Build Model:
 mlr = LinearRegression()

 Train the Model:


 The training data is fitted to the model; this is the training part of the modelling process.
 After it is trained, the model can be used to make predictions.

 mlr.fit(X_train, y_train)

 mlr.intercept_
Example:
 Print the values of the coefficients ß:
 coeff_df = pd.DataFrame(mlr.coef_, X.columns, columns=['Coefficient'])
 coeff_df
Example:
 The sales value can be estimated based on different budget values of "TV", "radio" and "newspaper".
Example:
 For example, if we set a budget value of 50 for TV, 30 for radio and 10 for newspaper, the estimated value of "sales" will be:

 example = [50, 30, 10]

 output = mlr.intercept_ + sum(example*mlr.coef_)

 output
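Equivalently (a usage note, not part of the original slides), the trained model's predict method gives the same estimate:

 output = mlr.predict([[50, 30, 10]])  # same value as the manual calculation above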
Test Model:

 A test dataset is a dataset that is independent of the training dataset.

 This test dataset is unseen data for your model, which gives a better view of its ability to generalize:

 y_pred = mlr.predict(X_test)
Evaluate Performance:
 The quality of the model is estimated by how well the predictions match up against the actual values of the testing dataset:

 print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
 print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
 print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
 print('R Squared Score is:', r2_score(y_test, y_pred))
Advantages and Disadvantages:
 This type of algorithm helps us to find the relationship between the various variables present in the dataset.
 It helps us in understanding the relation between the independent and the dependent variables.
Disadvantages:
 They are a bit complex and require a high level of mathematical calculation.

 They are not easy to interpret.

 The loss and error outputs produced may not be identical.

 They are not suitable for small datasets; they can be applied only to larger datasets.
Limitations:
 Mismeasurement: Factors might not be measured correctly.

 For example, aptitude is difficult to measure and there are well-known problems with IQ tests.

 As a result, regression using IQ might not properly control for aptitude.

 Too limited a focus:

 A regression coefficient provides information only about how small changes in one variable relate to changes in another variable.

 For example, it will show how a small change in education affects earnings, but it will not allow the researcher to generalize about the effect of large changes.


Multiple Linear Regression:
 The simple linear regression function allows us to make predictions about one variable based on the information that is available about another variable.

 The simple linear regression algorithm can only be used when one has two continuous variables – an independent variable and a dependent variable.


 The independent variable is the parameter that is used to calculate the dependent variable.

 A multiple linear regression model can be extended to several explanatory variables.


Multiple Linear Regression:
 There is a linear relationship between the dependent variable and the independent variable.

 The independent variables are not highly correlated with each other.

 Yi observations are selected independently and randomly from the population.

 Residuals should be normally distributed with a mean of 0 and a constant variance of sigma squared.

 The coefficient of determination (R-squared) is a statistical metric used to measure how much of the variation in the outcome can be explained by the variation in the independent variables.
Multiple Linear Regression:
 R^2 itself cannot be used to identify which predictors should be included in the model and which should be excluded.
 The R^2 value can only vary between 0 and 1.

 The value 0 indicates that the outcome cannot be predicted by any of the independent variables.

 The value 1 indicates that the outcome can be predicted without error from the independent variables.
 When we interpret the results of multiple regression, each beta coefficient is interpreted while holding all other variables constant.

 The output from a multiple regression can be displayed horizontally as an equation or vertically in table form.


How to Use Multiple Linear Regression:
 An analyst wants to know how the movement of the market affects the price of ExxonMobil (XOM).
 The linear equation will have the value of the S&P 500 index as the independent variable (predictor) and the price of XOM as the dependent variable.
 There are various factors that affect the outcome of an event.

 The price movement of ExxonMobil does not depend on just the performance of the overall market.
How to Use Multiple Linear Regression?
 There are other predictors, such as the price of oil, interest rates, and the price movement of oil futures, that can affect the price of XOM.

 They also affect the stock prices of other oil companies.

 In order to understand the relationship when two or more variables are present, multiple linear regression is used.
How to Use Multiple Linear Regression?
 Multiple Linear Regression (MLR) is used to establish a mathematical relationship between several random variables.

 This algorithm examines how multiple independent variables are related to one dependent variable.
 Once each of the independent factors has been determined to predict the dependent variable, the information on the multiple variables can be used to create an accurate prediction of the level of effect they have on the outcome variable.
 The model creates a relationship in the form of a straight line that best approximates all the individual data points.


How to Use Multiple Linear Regression?
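(The MLR equation referred to below is not reproduced in this text; in its standard form it reads:)

Yi = B0 + B1·Xi1 + B2·Xi2 + B3·Xi3 + B4·Xi4 + e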
 When we see the Multiple Linear Regression (MLR) equation above, we can see that:
 Yi = dependent variable – the price of XOM.
 Xi1 = interest rates.
 Xi2 = oil price.
 Xi3 = value of the S&P 500 index.
 Xi4 = price of oil futures.
 B0 = y-intercept at time 0.
 B1 = regression coefficient that measures the unit change in the dependent variable when Xi1 changes – the change in the XOM price when interest rates change.
How to Use Multiple Linear Regression?
 B2 = coefficient value that measures a unit change in the dependent variable when Xi2 changes – the change in the XOM price when oil prices change.

 The least squares estimates B0, B1, B2, …, Bp are usually computed by statistical software.

 Many different variables can be included in a regression model.

 Each independent variable is differentiated with a number – 1, 2, 3, 4, …, p.

 The multiple regression model allows an analyst to predict an outcome based on the information provided on multiple explanatory variables.


How to Use Multiple Linear Regression?
 The model is not perfectly accurate, as each data point can differ slightly from the outcome predicted by the model.
 The residual error, e, is the difference between the actual outcome and the predicted outcome.

 It is included in the model to account for such slight variations.

 If the prices of the other variables are held constant, then the price of XOM will increase by 7.8% if the price of oil in the markets increases by 1%.

 The model also shows that the price of XOM will decrease by 1.5% following a 1% rise in interest rates.
How to Use Multiple Linear Regression?
 R^2 indicates that 86.5% of the variation in the stock price of ExxonMobil can be explained by changes in the interest rate, oil price, oil futures and the S&P 500 index.
Difference Between Linear and Multiple Regression:
 The Ordinary Least Squares (OLS) method compares the response of a dependent variable with respect to a change in some explanatory variables.

 A dependent variable is rarely explained by only one variable.

 In that case, an analyst uses multiple regression.

 It attempts to explain a dependent variable using more than one independent variable.

 Multiple regressions can be linear and nonlinear.

 These regression algorithms are based on the assumption that there is a linear relationship between the dependent and the independent variables.
Difference Between Linear and Multiple Regression:
 It is also based on the assumption that there is no correlation between the independent variables.
What makes Multiple Regression Multiple?
 A multiple regression considers the effect of more than one explanatory variable on some outcome of interest.
 It evaluates the relative effect of these independent variables on the dependent variable while holding the other variables in the model constant.


Advantages Of Multiple Regression Over Simple OLS Regression:
 A dependent variable is rarely explained by only one variable.

 Multiple linear regression attempts to explain a dependent variable by more than one independent variable.

 The model assumes that there are no major correlations between the independent variables.

 Multiple regression models are complex.

 They become even more complex when more variables are included in the model or when the size of the data grows.

 To run multiple regression, we need to use specialized software or functions within programs like Excel.
How We Can Make Multiple Regressions To Be Linear:
 The multiple linear regression model calculates the best-fit line.

 It minimizes the variance of each of the variables included as it relates to the dependent variable.
 As it fits a line, it is considered a linear model.

 There are other non-linear regression models that involve multiple variables, such as logistic regression, quadratic regression and probit models.


Application Of Multiple Linear Regression Models In Finance:
Any econometric model that looks at more than one variable is considered a multiple regression model.
Factor models compare two or more factors to analyze the relationships between the variables and the resulting performance.
Limitations:
 Omitted Variables:

 It is necessary to have a good theoretical model to suggest variables that explain the dependent variable.

 Various factors should be considered to explain the dependent variable when dealing with a two-variable regression.

 Reverse Causality:

 Many theoretical models predict bidirectional causality – a dependent variable can cause changes in one or more explanatory variables.

 For instance, higher earnings may enable people to invest more in their education, which in turn raises their earnings.
