Exercise#8 Instructions Linear Regression Model

Uploaded by

laylaydeanne


Week b Interactive Exercise#7a

A: Linear Regression Model (estimated time 30 minutes)

In this exercise we will do the following:

• Build a linear regression model using:

o The ols method from the statsmodels.formula.api module
o The scikit-learn package

Pre-requisites:

1- Install Anaconda
2- We will be using a public dataset called 'Advertising.csv'. Download it from the course shell.

Steps for building a linear regression model:

1- Open your Spyder IDE

2- Load the 'Advertising.csv' file into a dataframe. Name the dataframe data_firstname_adv, where
firstname is your first name, and carry out the following activities:
a. Display the column names
b. Display the shape of the dataframe, i.e. the number of rows and the number of columns
c. Display the main statistics of the data
d. Display the types of columns
e. Display the first five records

The code follows. Make sure you update the path to the folder where you placed the
files, and update the dataframe name accordingly:
# -*- coding: utf-8 -*-
"""
@author: viji
"""
import os
import pandas as pd
# Folder that contains the dataset; change this to your own location
path = "C:/A_COMP309/data/Datasets for Predictive Modelling/Datasets for Predictive Modelling with Python/Chapter 5"
filename = 'Advertising.csv'
fullpath = os.path.join(path, filename)
data_viji_adv = pd.read_csv(fullpath)
print(data_viji_adv.columns.values)  # a. column names
print(data_viji_adv.shape)           # b. (number of rows, number of columns)
print(data_viji_adv.describe())      # c. main statistics
print(data_viji_adv.dtypes)          # d. column types
print(data_viji_adv.head(5))         # e. first five records
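3- Let us check whether there is a correlation between advertising costs on TV and the resulting sales.
Remember the formula for the correlation coefficient between two variables x and y:

$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} $$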

a. Use the numpy package to build a function that calculates the correlation between each
input variable (TV, Radio and Newspaper) and the output Sales
b. Run the code snippet below. You should get the following results:
0.782224424861606
0.5762225745710553
0.22829902637616525
The code follows. Make sure you update the dataframe name to match the one you created
in step 2.
import numpy as np
def corrcoeff(df, var1, var2):
    # Deviations of each variable from its mean
    dev1 = df[var1] - np.mean(df[var1])
    dev2 = df[var2] - np.mean(df[var2])
    # Numerator: sum of the products of the deviations
    corrcoeffn = (dev1 * dev2).sum()
    # Denominator: square root of the product of the summed squared deviations
    corrcoeffd = np.sqrt((dev1 ** 2).sum() * (dev2 ** 2).sum())
    return corrcoeffn / corrcoeffd
print(corrcoeff(data_viji_adv, 'TV', 'Sales'))
print(corrcoeff(data_viji_adv, 'Radio', 'Sales'))
print(corrcoeff(data_viji_adv, 'Newspaper', 'Sales'))
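As a cross-check on a hand-written correlation function, NumPy's built-in np.corrcoef should return the same value. The sketch below is only an illustration: it uses a few made-up numbers rather than the Advertising data, and the helper name pearson_r is hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed directly from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Made-up illustration data, not taken from Advertising.csv
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r_manual = pearson_r(x, y)
# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)
```

The two printed values should agree up to floating-point rounding. With a pandas dataframe, the same check could be done with the Series.corr method.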
4- Use the matplotlib module to visualize the relationship between each of the inputs and the
output (sales), i.e. generate three scatter plots.

The code follows. Make sure you use the correct dataframe name:

import matplotlib.pyplot as plt

# Start a new figure for each plot so the three scatter plots are not drawn on top of each other
plt.figure()
plt.plot(data_viji_adv['TV'], data_viji_adv['Sales'], 'ro')
plt.title('TV vs Sales')
plt.figure()
plt.plot(data_viji_adv['Radio'], data_viji_adv['Sales'], 'ro')
plt.title('Radio vs Sales')
plt.figure()
plt.plot(data_viji_adv['Newspaper'], data_viji_adv['Sales'], 'ro')
plt.title('Newspaper vs Sales')
plt.show()

4. Use the ols method from the statsmodels.formula.api module to build a linear regression model
with TV costs as the predictor (input) and sales as the predicted variable, i.e. estimate the
parameters of the model. You should get the following results:
Intercept 7.032594
TV 0.047537
The code follows. Make sure you use the correct dataframe name:

import statsmodels.formula.api as smf

# 'Sales~TV' means: model Sales as a linear function of TV (an intercept is added automatically)
model1 = smf.ols(formula='Sales~TV', data=data_viji_adv).fit()
print(model1.params)
5- Generate the p-values, the R-squared and the model summary by running the following lines of code:

print(model1.pvalues)
print(model1.rsquared)
print(model1.summary())
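For a single predictor, the least-squares estimates that ols reports have a closed form: the slope is the sum of the products of the deviations divided by the sum of squared deviations of x, and the intercept is mean(y) minus slope times mean(x). The sketch below illustrates this on a few made-up numbers (not the Advertising data) and uses np.polyfit as an independent check; all variable names are illustrative.

```python
import numpy as np

# Made-up illustration data, not taken from Advertising.csv
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([7.6, 8.1, 8.4, 9.0, 9.4])

# Closed-form least-squares estimates for y = intercept + slope * x
dx = x - x.mean()
slope = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()

# R-squared: share of the variance in y explained by the fitted line
y_hat = intercept + slope * x
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r_squared = 1.0 - ss_res / ss_tot

# Independent check: a degree-1 polynomial fit returns (slope, intercept)
check_slope, check_intercept = np.polyfit(x, y, 1)
print(intercept, slope, r_squared)
```

The intercept and slope computed this way play the same role as the Intercept and TV entries of model1.params, and r_squared corresponds to model1.rsquared.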
6- Re-build the model with two predictors, TV and Radio, as input variables and print the
parameters, p-values, R-squared and summary. Then:
a. Create a new data frame with 2 new values for TV and Radio
b. Predict using the new values
c. Change the values and run the prediction again
d. Change the values again to two values already existing in the dataset and run the
prediction again
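7- Based on the output, our new formula has the form
Sales = b0 + b1 × TV + b2 × Radio,
where b0 is the Intercept value and b1 and b2 are the TV and Radio values printed by model3.params.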

The code follows. Make sure you use the correct dataframe name:

import pandas as pd
import statsmodels.formula.api as smf

# Two predictors: Sales as a linear function of TV and Radio
model3 = smf.ols(formula='Sales~TV+Radio', data=data_viji_adv).fit()
print(model3.params)
print(model3.pvalues)
print(model3.rsquared)
print(model3.summary())

# New observation to predict: the column names must match the formula
X_new2 = pd.DataFrame({'TV': [50], 'Radio': [40]})

# Predict sales for the new observation
sales_pred2 = model3.predict(X_new2)
print(sales_pred2)
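A prediction from a fitted linear model is simply the intercept plus each coefficient multiplied by its input value. The sketch below shows this on made-up data generated from a known rule (the numbers 3, 0.05 and 0.2 are assumptions chosen for illustration, not the Advertising estimates), fitting with np.linalg.lstsq instead of statsmodels.

```python
import numpy as np

# Made-up data generated from a known rule: sales = 3 + 0.05*tv + 0.2*radio
tv = np.array([10.0, 50.0, 100.0, 150.0, 200.0, 250.0])
radio = np.array([5.0, 40.0, 10.0, 30.0, 20.0, 45.0])
sales = 3.0 + 0.05 * tv + 0.2 * radio

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(tv), tv, radio])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
b0, b1, b2 = coef

# Prediction for a new observation: intercept + coefficient * input, summed
new_tv, new_radio = 50.0, 40.0
pred = b0 + b1 * new_tv + b2 * new_radio
print(pred)
```

Because the data contains no noise, the recovered coefficients match the generating rule and the printed prediction should be very close to 13.5. The same arithmetic applied to the values in model3.params should reproduce what model3.predict returns.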

Notice that in this exercise we used all the data for training. This is not the best approach: it is
better to split the data randomly into training and test sets, so that the model can be evaluated on
data it has not seen.

8- In this step we will build the model using the scikit-learn package, which is the more commonly
used package for data science projects. It offers more built-in methods for the routine tasks
associated with regression. Carry out the following:
a. Import the necessary modules
b. Split the dataset into 80% for training and 20% for testing
c. Print out the parameters
d. Test the model using the train/test split

The code follows. Make sure you use the correct dataframe name:

# Better solution than the previous method: train/test split
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
feature_cols = ['TV', 'Radio']
X = data_viji_adv[feature_cols]
Y = data_viji_adv['Sales']
# Hold out 20% of the rows for testing; the split is random, so results vary between runs
trainX, testX, trainY, testY = train_test_split(X, Y, test_size=0.2)
lm = LinearRegression()
lm.fit(trainX, trainY)
print(lm.intercept_)
print(lm.coef_)
print(list(zip(feature_cols, lm.coef_)))
# Example output from one run:
# [('TV', 0.045706061219705982), ('Radio', 0.18667738715568111)]
print(lm.score(trainX, trainY))  # R-squared on the training data
print(lm.score(testX, testY))    # R-squared on the held-out test data
print(lm.predict(testX))         # predicted sales for the test rows
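To judge how well a model generalizes, the held-out test rows can be scored with R-squared and the root mean squared error (RMSE). The sketch below is a self-contained illustration on synthetic data (the generating rule, noise level and seeds are assumptions); fixing random_state makes the split repeatable from run to run.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: two inputs, a known linear rule, plus random noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))
y = 3.0 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1.0, size=200)

# 80% training, 20% testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Evaluate only on rows the model did not see during fitting
test_r2 = model.score(X_test, y_test)
test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(test_r2, test_rmse)
```

RMSE is expressed in the same units as the target, so on the Advertising data it would be read in units of sales.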
9- Feature selection: to check which predictors are the best input variables for the model, run the
following scikit-learn code snippet, and don't forget to update the dataframe name:

from sklearn.feature_selection import RFE
from sklearn.svm import SVR
feature_cols = ['TV', 'Radio', 'Newspaper']
X = data_viji_adv[feature_cols]
Y = data_viji_adv['Sales']
estimator = SVR(kernel="linear")
# Recent scikit-learn versions require n_features_to_select to be passed by keyword
selector = RFE(estimator, n_features_to_select=2, step=1)
selector = selector.fit(X, Y)
print(selector.support_)   # True for each selected feature
print(selector.ranking_)   # 1 for selected features; larger numbers were eliminated earlier
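To read the two printed arrays: support_ is a boolean mask over the input columns, and ranking_ assigns 1 to every selected feature and higher numbers to features that were eliminated. The sketch below illustrates this on synthetic data in which the third column has no effect on the target. It swaps in LinearRegression as the estimator (an assumption made to keep the example fast, not the SVR used in the exercise), and the column names are reused only as labels.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: the third column does not influence the target
rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(200, 3))
y = 3.0 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1.0, size=200)

feature_cols = ['TV', 'Radio', 'Newspaper']  # labels only, data is synthetic
selector = RFE(LinearRegression(), n_features_to_select=2, step=1).fit(X, y)

# Pair each column name with its mask entry and its rank
for name, keep, rank in zip(feature_cols, selector.support_, selector.ranking_):
    print(name, keep, rank)

selected = [n for n, keep in zip(feature_cols, selector.support_) if keep]
print(selected)
```

With a linear estimator, RFE ranks features by the size of their coefficients, so the comparison is most meaningful when the inputs are on similar scales.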
