
Tutorial 4 - Multiple Linear Regression


The brake horsepower developed by an automobile engine on a dynamometer is thought to be a
function of the engine speed in revolutions per minute (rpm), the road octane number of the fuel,
and the engine compression. An experiment is run and the data can be found in "Tutorial 4
data.xlsx"

In [31]: #libraries
import numpy as np
import matplotlib.pyplot as plt
from tabulate import tabulate
import pandas as pd
import scipy.stats   # import the stats submodule explicitly so scipy.stats.f is available

Part 1 - Multiple Linear Regression on Dataset


Below, the data has been imported and separated into independent (x) and dependent variables
(y).

Create the data matrix X.

First create a vector of ones using np.ones((rows,columns)).

Next, combine that vector with x using np.hstack((vector,x)). This built-in function stacks your two
matrices horizontally as long as they have the same number of rows. Print the result to verify your
data matrix.

In [15]: df = pd.read_excel("Tutorial 4 data.xlsx").to_numpy()



#if your dataset begins in row 1 of the spreadsheet, the titles become headers and the data starts at index 0
x=df[0:12,2:5]
y=df[0:12,1]

#make a vector of ones and then the data matrix X

xvectors=np.ones((12,1))
xvector=np.hstack((xvectors,x))

Solve for the parameters of the multiple regression model and print the result.

$\beta = (X^T X)^{-1} X^T y$
Transpose of matrix X: X.T
Multiply matrices together: X@y
Inverse of a matrix: np.linalg.inv()

In [16]: #solve for beta and print the values here


beta=np.linalg.inv(xvector.T@xvector)@xvector.T@y
print(beta)

[-2.66031212e+02 1.07132079e-02 3.13480626e+00 1.86740943e+00]
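As a quick sanity check (not part of the tutorial prompt), NumPy's least-squares solver should recover the same parameters; a minimal sketch, assuming xvector and y from the cells above:

# check the normal-equation solution with a direct least-squares solve
beta_check, _, _, _ = np.linalg.lstsq(xvector, y, rcond=None)
print(beta_check)   # should match beta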

Part 2 - Residual Analysis


Use the parameters found in Part 1 to solve for $\hat{y}$:

$\hat{y} = X\beta$

Next, solve for the residuals and plot your results on subplots:

$e = y - \hat{y}$

The first subplot is set up for you below. We want to plot the residuals vs. the independent and
dependent variables (4 plots total), so we make a 2 x 2 grid of subplots. The layout='constrained'
option provides space to label each axis.

In [18]: # solve for yhat and residuals here


yhat=xvector@beta
residuals=y-yhat
print(yhat)
print(residuals)

#this is code for making the subplot and plotting the first residual plot.
fig,axs=plt.subplots(2,2,layout='constrained')
axs[0,0].scatter(x[:,0],residuals)
axs[0,0].set(xlabel='rpm',ylabel='residuals')

axs[0,1].scatter(x[:,1],residuals)
axs[0,1].set(xlabel='Road Octane Number',ylabel='residuals')

axs[1,0].scatter(x[:,2],residuals)
axs[1,0].set(xlabel='Compression',ylabel='residuals')

axs[1,1].scatter(y,residuals)
axs[1,1].set(xlabel='Brake Horsepower',ylabel='residuals')
plt.show()

[224.26871063 225.32824692 240.95847559 218.86255836 207.44420245
 267.10824644 243.78632464 237.12456005 235.90671156 221.12574828
 222.12606905 233.96014603]
[ 0.73128937 -13.32824692 -11.95847559 3.13744164 11.55579755
10.89175356 2.21367536 -0.12456005 -2.90671156 2.87425172
0.87393095 -3.96014603]

Are there any patterns in the residuals? Are there any outliers? Is there a better plot we can use to
determine this?

YES! Calculate the standardized residuals and determine if there are any outliers.

$d_i = \frac{e_i}{\sqrt{MSE}}$

In [25]: #Calculate standardized residuals here


SSE=np.sum((y-yhat)**2)      # sum of squared errors
print(SSE)
MSE=SSE/(12-4)               # n - p = 12 observations - 4 parameters
print(MSE)
sqrtMSE=np.sqrt(MSE)
print(sqrtMSE)
di=residuals/sqrtMSE         # standardized residuals
print(di)

#You can copy your subplot from a previous section and quickly modify it to plot the standardized residuals.
fig,axs=plt.subplots(2,2,layout='constrained')
axs[0,0].scatter(x[:,0],di)
axs[0,0].set(xlabel='rpm',ylabel='Standardized Residuals')

axs[0,1].scatter(x[:,1],di)
axs[0,1].set(xlabel='Road Octane Number',ylabel='Standardized Residuals')

axs[1,0].scatter(x[:,2],di)
axs[1,0].set(xlabel='Compression',ylabel='Standardized Residuals')

axs[1,1].scatter(y,di)
axs[1,1].set(xlabel='Brake Horsepower',ylabel='Standardized Residuals')
plt.show()

621.2650617774844
77.65813272218556
8.812385189163349
[ 0.08298427 -1.51244489 -1.35700782 0.35602638 1.31131326 1.23595977
0.25120048 -0.01413466 -0.32984391 0.32616047 0.09917076 -0.44938413]
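If you prefer a programmatic check to reading the plots, a minimal sketch (assuming di holds the standardized residuals computed above) is to flag any observation with |d_i| > 3:

# flag observations whose standardized residual exceeds 3 in magnitude
outliers = np.where(np.abs(di) > 3)[0]
print(outliers)   # an empty array means no outliers by the |d_i| > 3 rule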

Based on the figures above, there are no outliers, since no observation has $|d_i| > 3$.

Part 3 - Model Analysis


In [27]: ybar=np.mean(y)
SSR=np.sum((yhat-ybar)**2)    # regression sum of squares
MSR=SSR/(4-1)                 # p - 1 = 3 regression degrees of freedom
F=MSR/MSE
print(F)

table=[["Regression",3,SSR,MSR,F],["Residual Error",8,SSE,MSE],["Total",11,SSR+SSE]]
col2names=['DF','SS','MS','F']
print(tabulate(table,headers=col2names))

11.11596363636129
DF SS MS F
-------------- ---- -------- -------- ------
Regression 3 2589.73 863.245 11.116
Residual Error 8 621.265 77.6581
Total 11 3211
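Although the tutorial does not ask for it, the same sums of squares give the coefficient of determination; a minimal sketch, assuming SSR and SSE from the cells above:

# R^2 and adjusted R^2 from the ANOVA sums of squares
SST = SSR + SSE
R2 = SSR / SST
R2_adj = 1 - (SSE / (12 - 4)) / (SST / (12 - 1))
print(R2, R2_adj)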

Calculate the p-value for your ANOVA table and print the result below. You can solve for a p-value
using the built-in function

p_value = 1 - scipy.stats.f.cdf(F, dfn, dfd)   # F value, DOF in numerator, DOF in denominator

In [33]: #solve for the p-value here


p_value=1-(scipy.stats.f.cdf(F,3,8))
print(p_value)


0.0031699790971878583

What is your conclusion from your ANOVA Table and p-value?

At this point, have you determined that each regressor variable is significant in the model?

In [ ]: # The p-value (~0.0032) is well below 0.05, so from the ANOVA table we conclude the
# regression is significant overall. This does not mean each regressor variable is
# significant; individual t-tests on the coefficients are needed to determine that
# (a sketch follows below).
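Such a check is not part of the tutorial, but a minimal sketch of the per-coefficient t-tests, assuming beta, MSE, and xvector from the cells above (the term names are illustrative):

# per-coefficient t-tests: se(beta_j) = sqrt(MSE * C_jj), where C = (X^T X)^{-1}
C = np.linalg.inv(xvector.T @ xvector)
se = np.sqrt(MSE * np.diag(C))
t_stats = beta / se
p_vals = 2 * (1 - scipy.stats.t.cdf(np.abs(t_stats), 12 - 4))   # n - p = 8 DOF
print(tabulate(zip(['Intercept', 'rpm', 'Octane', 'Compression'], beta, t_stats, p_vals),
               headers=['Term', 'Coefficient', 't', 'p-value']))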
