CSL0777 L16
CSL0777 L16
CSL0777 L16
Unit No. 2
Supervised learning Part-1
Lecture No. 16
Multiple Linear Regression
Mr. Praveen
Gupta
Assistant Professor,
CSA/SOET
Outlines
• Multiple Linear Regression
• Multiple Linear Regression Model
• Implementation of Multiple Linear Regression Algorithm
using Python
• References
Student Effective Learning Outcomes(SELO)
01: Ability to understand subject related concepts clearly along with
contemporary issues.
02: Ability to use updated tools, techniques and skills for effective domain
specific practices.
03: Understanding available tools and products and ability to use it effectively.
Multiple Linear
•In Regression
the previous lecture , wehave learned about
Simple Linear Regression, where a single
Independent/Predictor(X) variable is used to model
the response variable (Y).
•But there may be various cases in which the
response variable is affected by more than one
predictor variable; for such cases, the Multiple
Linear Regression algorithm is used.
•Moreover, Multiple Linear Regression is an
extension of Simple Linear regression as it takes
more than one predictor variable to predict the
response variable.
4 / 22
Multiple Linear
Regression
Example:
Prediction of CO2 emission based on engine size and
number of cylinders in a car.
Some key points about MLR:
• For MLR, the dependent or target variable(Y) must
be the continuous/real, but the predictor or
independent variable may be of continuous or
categorical form.
• Each feature variable must model the linear
relationship with the dependent variable.
•MLR tries to fit a regression line through a
multidimensional space of data-points.
4 / 22
Multiple Linear Regression
Model
The Multiple Linear Regression model can be represented
using the below equation:
y = b 1x 1 + b 2x 2 + … + b n x n
Where,
Y= Output/Response variable
b0, b1, b2, b3 , bn....= Coefficients of the model.
x1, x2, x3, x4,...= Various Independent/feature variable
4 / 22
Assumptions for Multiple Linear Regression
•A linear relationship should exist between
the Target and predictor variables.
•The regression residuals must be normally
distributed.
•MLR assumes little or no
multicollinearity (correlation between the
independent variable) in data.
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
•We have a dataset of 50 start-up companies. This dataset
contains five main information: R&D Spend, Administration
Spend, Marketing Spend, State, and Profit for a financial
year.
•Our goal is to create a model that can easily determine
which company has a maximum profit, and which is the
most affecting factor for the profit of a company.
•Since we need to find the Profit, so it is the dependent
variable, and the other four variables are independent
variables.
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
To implement the Multiple Linear regression model in
machine learning using Python, we need to follow the below
steps:
Step-1: Data Pre-processing
The first step for creating the Simple Linear Regression
model is data pre-processing.
•First, we will import the three important libraries, which will
help us for loading the dataset, plotting the graphs, and
creating the Simple Linear Regression model.
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
Next, we will load the dataset into our code:
data_set= pd.read_csv('50_CompList.csv')
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
• After that, we need to extract the dependent and
independent variables from the given dataset.
•In previous output, we can clearly see that there are five
variables, in which four variables are continuous and one is
categorical variable.. Below is code for it:
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 4].values
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
Encoding Dummy Variables:
As we have one categorical variable (State), which cannot be
directly applied to the model, so we will encode it.
To encode the categorical variable into numbers, we will use
the LabelEncoder class.
But it is not sufficient because it still has some relational
order, which may create a wrong model.
So in order to remove this problem, we will
use OneHotEncoder, which will create the dummy
variables.
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
Encoding Dummy Variables:
#Catgorical data
from sklearn.preprocessing import LabelEncoder, OneHotEn
coder
labelencoder_x= LabelEncoder()
x[:, 3]= labelencoder_x.fit_transform(x[:,3])
onehotencoder= OneHotEncoder(categorical_features= [3])
x= onehotencoder.fit_transform(x).toarray()
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
•We should not use all the dummy variables at the same
time, so it must be 1 less than the total number of dummy
variables, else it will create a dummy variable trap.
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
Now we will split the dataset into training and test set. The
code for this is given below:
# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size
= 0.2, random_state=0)
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
• Training Dataset:
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
Step: 2- Fitting our MLR model to the Training set:
Now, we have well prepared our dataset in order to provide
training, which means we will fit our regression model to the
training set. It will be similar to as we did in Simple Linear
Regression model. The code for this will be:
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
Step: 3. Prediction of test set result:
•The last step for our model is checking the performance of
the model. We will do it by predicting the test set result. For
prediction, we will create a y_pred vector.
#Predicting the Test set result;
y_pred= regressor.predict(x_test)
•By executing the above lines of code, a new vector will be
generated under the variable explorer option. We can test
our model by comparing the predicted values and test set
values.
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
4 / 22
Implementation of Multiple Linear Regression
Algorithm using Python
We can also check the score for training dataset and test
dataset. Below is the code for it:
print('Train Score: ', regressor.score(x_train, y_train))
print('Test Score: ', regressor.score(x_test, y_test))
4 / 22
Learning Outcomes
using Python
References