Linear Regression
Regression:
It is one of the most important statistical and machine learning tools.
It is defined as a parametric technique that allows us to make decisions based on data.
It allows us to make predictions based upon data by learning the relationship between input and
output variables.
The output variable, which depends on the input variables, takes continuous real values.
Regression helps us to understand how the value of the output variable changes with respect to the input variables.
Random error describes the random component of the linear relationship between the independent and dependent variables.
The value of the random error term corresponding to observed data points remains unknown.
A regression model can be estimated by calculating the parameters of the model for an observed dataset.
Simple Linear Regression:
In simple linear regression, the model takes the form y = β0 + β1x + e, and the main aim is to estimate the parameters β0 and β1 from the sample.
Once we find the optimum values for these two parameters, the line of best fit can be used to predict the outcome.
The main goal is to make the distance from the observed points to the fitted line as close to zero as possible.
This is done by minimizing the squared distances between actual and predicted outcomes.
The difference between the actual and predicted value is called the residual (e).
To calculate the net error, simply adding all the residuals can lead to the cancellation of positive and negative terms, so the residuals are squared first.
The objective is therefore to fit a regression line that minimizes the sum of squared distances between the line and the observed data points.
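As a small illustration of this point, the sketch below (with made-up numbers; the data and the candidate line are assumptions, not from the original example) shows how raw residuals can cancel while squared residuals cannot:
import numpy as np

# toy data (assumed for illustration only)
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.0])

# an assumed candidate line y_hat = b0 + b1*x
b0, b1 = 0.0, 2.0
y_hat = b0 + b1 * x

residuals = y - y_hat
print(residuals.sum())           # positive and negative residuals largely cancel out
print((residuals ** 2).sum())    # squared residuals do not cancel, so this is what we minimize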
Different Kinds Of Relationship:
Positive Relationship: When the regression line between two variables moves in the same direction with an upward slope, the variables are said to be positively correlated.
If we increase the value of the independent variable (x), then we will see an increase in the dependent variable (y).
Negative Relationship: When the regression line between two variables moves with a downward slope, the variables are said to be in a negative relationship.
If we increase the value of the independent variable (x), we will see a decrease in the dependent variable (y).
No Relationship: If the best fit line is flat, then we can say that there is no relationship between the variables.
The dependent variable won't change by increasing or decreasing the independent variable.
Linear Regression Relationship:
Covariance: This parameter tells us the direction of the relationship between x and y.
If the covariance value is negative, then the dependent variable decreases as the independent variable increases.
Correlation: It is a statistical measure that tells us the direction of the relationship as well as the strength of the relationship.
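As a minimal sketch (using assumed toy data), covariance and correlation can be computed with NumPy to check the direction and strength of the relationship:
import numpy as np

x = np.array([1, 2, 3, 4, 5])            # independent variable (assumed toy data)
y = np.array([2, 4, 5, 4, 6])            # dependent variable (assumed toy data)

cov_xy = np.cov(x, y)[0, 1]              # sign gives the direction of the relationship
corr_xy = np.corrcoef(x, y)[0, 1]        # value in [-1, 1] gives direction and strength
print(cov_xy, corr_xy)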
Applications:
Predicting advertising expenses.
Medical diagnosis.
Agricultural research.
Advantages and Disadvantages:
Advantages:
It performs well for linearly separable data.
It is easier to implement and interpret, and training can be done faster.
Disadvantages:
The assumption of linearity between independent and dependent variables.
It is prone to noise and overfitting.
Regression:
A regression problem is one where the output variable is a continuous value, such as “salary” or “weight”.
Linear regression is a statistical method of finding the relationship between the independent and
dependent variable.
Regression is a supervised technique where the correct output data is given and we need to find the correlation between the input and output variables.
This example is used to predict the salary (dependent variable y) of a person based on an independent input variable x; the salary of the person can be predicted with the help of a regression model.
When only one independent variable is taken, it is also called simple linear regression.
When the target variable we are trying to predict is continuous, the learning problem is called a regression problem.
Cost Function Of Linear Regression:
θ0 and θ1 are the parameters of the model.
The hypothesis is h(x) = θ0 + θ1x, and the values of θ0 and θ1 must be chosen such that h(x) is close to y.
The cost function is the sum over the training set, i = 1 to m (training examples), of the squared differences between the predicted and actual values:
J(θ0, θ1) = (1 / 2m) * Σ (h(x_i) - y_i)^2
It takes the average of the squared differences between the results of the hypothesis on the inputs x and the actual outputs y.
Each term is nothing but the difference between the predicted value and the actual value.
We have to find the minimum value of J(θ0, θ1), which corresponds to the smallest oval on the contour plot (the global optimum).
From the contour plot, a method such as OLS or gradient descent is used to find the minimum of J(θ0, θ1).
The corresponding values of θ0 and θ1 are taken for h(x).
The regression line with these parameter values is then plotted over the data; this is how the cost function is used to fit the line.
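A minimal sketch of the squared-error cost function J(θ0, θ1) described above, using assumed toy data:
import numpy as np

def cost(theta0, theta1, x, y):
    # hypothesis h(x) = theta0 + theta1 * x
    predictions = theta0 + theta1 * x
    m = len(x)
    # average of the squared differences between predicted and actual values
    return (1 / (2 * m)) * np.sum((predictions - y) ** 2)

x = np.array([1, 2, 3, 4])    # assumed toy inputs
y = np.array([3, 5, 7, 9])    # assumed toy outputs
print(cost(0.0, 1.0, x, y))   # a poor fit gives a large cost
print(cost(1.0, 2.0, x, y))   # the line y = 1 + 2x fits this toy data exactly, so the cost is zero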
Ordinary Least Squares (OLS):
We need to find the best fit line to the dataset.
In order to find the best fit line , we need to use the OLS method:
y = mx + b, where:
m – slope
x – independent variable
b – intercept
y – dependent variable
This method is not only used in linear regression but it is also employed in other machine learning
algorithms.
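A minimal sketch of the OLS estimates of the slope m and intercept b for the y = mx + b form above (the toy data is an assumption for illustration):
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)    # independent variable (assumed toy data)
y = np.array([2, 4, 5, 4, 6], dtype=float)    # dependent variable (assumed toy data)

x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)   # slope
b = y_mean - m * x_mean                                               # intercept
print(m, b)   # parameters of the best fit line y = m*x + b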
Gradient Descent:
First, the process is started with some random values of θ0 and θ1, and these values are updated iteratively to reduce the cost function.
If we start at a point, the gradient descent algorithm will take small steps in order to find the local minimum.
This is an important property of gradient descent .
At each iteration j, one should simultaneously update the parameters θ1, θ2, ..., θn.
This parameter should be updated properly in order to get the correct implementation of the
gradient descent.
Consider the partial derivative terms of the cost function with respect to θ0 and θ1; these determine how each parameter is updated at every step.
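A minimal sketch of batch gradient descent for θ0 and θ1, with an assumed learning rate and assumed toy data; both parameters are updated simultaneously at every iteration as described above:
import numpy as np

x = np.array([1, 2, 3, 4], dtype=float)   # assumed toy inputs
y = np.array([3, 5, 7, 9], dtype=float)   # assumed toy outputs

theta0, theta1 = 0.0, 0.0
alpha = 0.05          # learning rate (assumed)
m = len(x)

for _ in range(2000):
    predictions = theta0 + theta1 * x
    error = predictions - y
    # partial derivatives of J(theta0, theta1) with respect to each parameter
    grad0 = (1 / m) * np.sum(error)
    grad1 = (1 / m) * np.sum(error * x)
    # simultaneous update of both parameters
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)   # approaches the line y = 1 + 2x that generated the toy data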
Multiple Linear Regression:
Multiple linear regression (MLR) uses several explanatory variables in order to predict the outcome of a response variable.
The main aim of the multiple linear regression model is to model the relationship between the explanatory (independent) variables and the response (dependent) variable.
y is the dependent variable, that is, the variable that needs to be predicted; the model takes the form y = β0 + β1x1 + β2x2 + ... + βnxn + e.
Selecting Features: This step is essential in order to pick important features for model building.
Normalizing Features:
The features should be scaled, as this maintains the general distribution and ratios in the data.
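A minimal sketch of such scaling, here using scikit-learn's StandardScaler (one common choice; the original does not name a specific scaler, and the feature matrix below is an assumption):
from sklearn.preprocessing import StandardScaler
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])   # assumed toy feature matrix with very different scales

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # each column now has zero mean and unit variance
print(X_scaled)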
The loss function should be minimized using a loss minimization algorithm on the dataset.
Gradient descent is one of the commonly used algorithms for loss minimization.
β0 is the intercept constant and is the value of y in the absence of all predictors (when all x terms are zero).
As the number of features grows, the complexity of the model increases.
As there are more parameters in these models , we should be more careful while working with
them.
If we add more terms , it will improve the fit to the data.
This is dangerous because it leads to a model that fits the data but doesn’t mean anything useful.
Example:
The advertising dataset consists of the sales of a product in 200 different markets.
It contains advertising budgets for three different media : TV , radio and newspaper.
Dataset is used to predict the amount of sales(dependent variable) based on TV , radio and
newspaper advertising budgets(independent variables).
The formula is:
sales = β0 + β1 × TV + β2 × radio + β3 × newspaper + e
The β values are found by minimizing the error function, in order to fit the best line or hyperplane (depending on the number of input variables).
Load The Data and Describe the Data:
Import the required libraries:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.metrics import r2_score
import statsmodels.api as sm
Load the Dataset:
df = pd.read_csv("Advertising.csv")
sns.pairplot(df)
The relationship between TV and sales is very strong .
There is some trend between radio and sales, while the relationship between newspaper and sales is non-existent.
It can be verified numerically through a correlation map.
mask = np.tril(df.corr())
sns.heatmap(df.corr(), mask=mask, annot=True)  # assumed completion of this step: draw the correlation heatmap, hiding the masked lower triangle
Divide the variables into two sets: dependent (or target variable “y”) and independent (or feature variables “X”).
X = df.drop(['sales'], axis=1)
y = df['sales']
Split the Dataset:
To understand the model's performance, the dataset is divided into a training set and a testing set.
By splitting the dataset into two separate sets, we can train the model using one set and test it using the other.
The random_state parameter is used for initializing the internal random number generator.
If the random state is set to a fixed value such as 0, we can compare the output over multiple runs of the code using the same parameter.
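The split call itself is not shown above; a minimal sketch, assuming a 70/30 split (which matches the 140/60 record counts described below) and random_state=0 as mentioned:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)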
print(X_train.shape,y_train.shape,X_test.shape,y_test.shape)
Two datasets of 140 records each, one with the 3 independent variables and one with the target variable, which will be used for training and producing the linear regression model.
Two datasets of 60 records each, one with the 3 independent variables and one with the target variable, which will be used for testing the performance of the linear regression model.
Build Model:
mlr = LinearRegression()
mlr.fit(X_train, y_train)
mlr.intercept_
Print the values of the coefficients β:
coeff_df = pd.DataFrame(mlr.coef_, X.columns, columns=['Coefficient'])
coeff_df
Sales value can be estimated based on different budget values of “TV” , “radio” and
“newspaper”.
For example, if we set a budget value of 50 for TV, 30 for radio and 10 for newspaper, the model will output the predicted sales value.
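A minimal sketch of that prediction (the column order TV, radio, newspaper is an assumption that must match the columns of X):
new_budget = pd.DataFrame([[50, 30, 10]], columns=['TV', 'radio', 'newspaper'])
print(mlr.predict(new_budget))   # predicted sales for this assumed budget combination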
Test Model:
This test dataset is unseen data for the model, which will help in evaluating how well the model generalizes.
y_pred = mlr.predict(X_test)
Evaluate Performance:
The quality of the model is estimated by how well its predictions match up against the actual values in the test dataset.
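A minimal sketch of some common regression metrics using the imports above (y_test and y_pred come from the earlier steps):
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('R2:', r2_score(y_test, y_pred))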
Advantages:
It helps us in understanding the relation between the independent and the dependent variables.
Disadvantages:
They are a bit complex and require high levels of mathematical calculation.
The model's outputs contain some loss and error, so predictions are not identical to the actual values.
They are not suitable for small datasets; they can be applied only on larger datasets.
Limitations:
Mismeasurement: Factors might not be measured correctly.
For example , aptitude is difficult to measure and there are well known problems with IQ tests.
A regression coefficient provides information only about how small changes in one independent variable relate to changes in the dependent variable, with the other variables held constant.
The model also assumes that the independent variables are not highly correlated with each other.
R^2 tells us how much of the variation in the outcome can be explained by the variation in the independent variables.
R^2 itself cannot be used to identify which predictors should be included in the model and which
should be excluded.
R^2 value can only vary between 0 and 1.
The value 0 indicates that the outcome cannot be predicted by any of the independent variables.
The value 1 indicates that the outcome can be predicted without error from the independent
variables.
When we interpret the results of multiple regression, the beta coefficients are valid while holding all other variables constant (“all else equal”).
For example, an analyst may want to know how the movement of the overall market affects the price of ExxonMobil (XOM); the linear equation would then use the S&P 500 index as the independent variable, or predictor, and the price of XOM as the dependent variable.
There are various factors that affect the outcome of an event.
In reality, the price movement of ExxonMobil depends on more than just the performance of the overall market.
How to Use Multiple Linear Regression?
There are other predictors, such as the price of oil, interest rates, and the price movement of oil futures, that can affect the price of XOM.
In order to understand the relationship when two or more variables are present , multiple linear
regression is used.
Multiple Linear Regression (MLR) is used to establish a mathematical relationship between a number of independent (explanatory) variables and a dependent (response) variable.
Once each of the independent factors has been determined to predict the dependent variable , the
information on multiple variables can be used to create an accurate prediction on the level of
effect they have on the outcome variable.
The model creates a relationship in the form of a straight line that best approximates all the individual data points; in practice, the calculation is carried out with statistical software.
Many different variables can be included in a regression model.
A multiple regression model allows an analyst to predict an outcome based on the information provided by multiple explanatory variables.
The residual error , e is the difference between the actual outcome and the predicted outcome.
If the values of the other variables are held constant, the price of XOM will increase by 7.8% for a one-unit change in the corresponding predictor, such as interest rates.
In this example, R^2 indicates that 86.5% of the variation in the stock price of ExxonMobil can be explained by changes in the interest rate, the oil price, oil futures, and the S&P 500 index.
Difference Between Linear and Multiple Regression:
In simple linear regression, the ordinary least squares (OLS) method compares the response of a dependent variable to changes in a single explanatory variable.
Multiple regression, in contrast, attempts to explain a dependent variable using more than one independent variable.
These regression algorithms are based on the assumption that there is a linear relationship
between the dependent and the independent variables.
It is also based on the assumption that there is no correlation between the independent variables.
What makes Multiple Regression Multiple?
A multiple regression considers the effect of more than one explanatory variable on some
outcome of interest.
It evaluates the relative effect of these independent variables on the dependent variable while holding all the other variables in the model constant.
In the case of Multiple Linear Regression, it attempts to explain a dependent variable using more than one independent variable.
It becomes even more complex when more variables are included in the model or when the size of the data grows; in practice, such regressions are usually run with statistical software or spreadsheet tools such as Excel.
How Can a Multiple Regression Be Linear?
Multiple Linear Regression model calculates the best fit line .
It minimizes the variances of each of the variables included as it relates to the dependent
variable.
As it fits a line , it is considered as a linear model.
There are other, non-linear regression models that involve multiple variables, such as logistic regression.
It is necessary to have a good theoretical model to suggest variables that explain the dependent variable.
Various factors should be considered to explain the dependent variable while dealing with two-variable
regression.
Reverse Causality:
Many theoretical models predict bidirectional causality: a dependent variable can cause changes in one or more of the explanatory variables (for example, an outcome such as a person's earnings may both depend on and influence an explanatory variable).