Tutorial 4 - Jupyter Notebook
In [31]: #libraries
import numpy as np
import matplotlib.pyplot as plt
from tabulate import tabulate
import pandas as pd
import scipy.stats  # also binds the name scipy; scipy.stats provides the F distribution for the p-value
Next, combine that vector with x using np.hstack((vector, x)). This built-in function stacks your two
matrices horizontally, as long as they have the same number of rows. Print the result to verify your
data matrix.
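A minimal sketch of this step. It assumes `x` is an n × 3 array of regressor values and `vector` is the n × 1 column of ones for the intercept; the small data set is hypothetical, not the tutorial's data.

```python
import numpy as np

# Hypothetical data: 5 observations of 3 regressors (not the tutorial's data set)
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 5.0, 1.0],
              [4.0, 3.0, 2.0],
              [5.0, 4.0, 6.0]])

# Column of ones for the intercept term, one row per observation
vector = np.ones((x.shape[0], 1))

# Stack the ones column to the left of x; both arrays have x.shape[0] rows
X = np.hstack((vector, x))
print(X)
```

The first column of `X` is all ones and the remaining columns are `x` unchanged, so `X` has one more column than there are regressors.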
Solve for the parameters of the multiple regression model and print the result.
$\beta = (X^T X)^{-1} X^T y$
Transpose of matrix X: X.T
Multiply matrices together: X @ y
Inverse of a matrix: np.linalg.inv()
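A sketch of the normal-equation solve using only the three hints above (`.T`, `@`, `np.linalg.inv`). The data are hypothetical and built without noise as y = 1 + 2·x1 + 3·x2, so the fit should recover β = [1, 2, 3].

```python
import numpy as np

# Hypothetical data (not the tutorial's): 6 observations, 2 regressors
x = np.array([[1.0, 4.0],
              [2.0, 3.0],
              [3.0, 6.0],
              [4.0, 2.0],
              [5.0, 5.0],
              [6.0, 1.0]])
# Exact response y = 1 + 2*x1 + 3*x2, stored as an n x 1 column vector
y = (1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 1]).reshape(-1, 1)

# Design matrix with an intercept column of ones
X = np.hstack((np.ones((x.shape[0], 1)), x))

# Normal equations: beta = (X^T X)^{-1} X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)
```

Forming the inverse explicitly mirrors the formula; `np.linalg.solve(X.T @ X, X.T @ y)` gives the same β with better numerical behaviour when `X.T @ X` is nearly singular.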
$\hat{y} = X\beta$
Next, solve for the residuals and plot your results on subplots.
$e = y - \hat{y}$
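A sketch of the fitted values and residuals on hypothetical noisy data generated with a fixed seed. Because the model includes an intercept, the residuals should sum to (numerically) zero and be orthogonal to every column of `X`, which is a quick check on the algebra.

```python
import numpy as np

# Hypothetical data (not the tutorial's): 8 observations, 2 regressors plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(8, 2))
y = (5.0 + 1.5 * x[:, 0] - 0.8 * x[:, 1] + rng.normal(0, 1, size=8)).reshape(-1, 1)

X = np.hstack((np.ones((x.shape[0], 1)), x))
beta = np.linalg.inv(X.T @ X) @ X.T @ y

y_hat = X @ beta   # fitted values
e = y - y_hat      # residuals
print(e.ravel())
```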
The first subplot is set up for you below. We want to plot the residuals against the independent and
dependent variables (4 plots in total), so we make a 2 × 2 grid of subplots. The argument
layout='constrained' provides space to label each axis.
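A sketch of the 2 × 2 residual grid, assuming three regressors and hypothetical data. `layout="constrained"` is passed through `plt.subplots` to the figure (available in Matplotlib 3.5 and later); the `Agg` backend is selected only so the sketch runs without a display, and in a notebook you can drop that line.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data (not the tutorial's): 12 observations of 3 regressors
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(12, 3))
y = (2.0 + x @ np.array([1.0, -0.5, 0.8]) + rng.normal(0, 1, size=12)).reshape(-1, 1)

X = np.hstack((np.ones((12, 1)), x))
beta = np.linalg.inv(X.T @ X) @ X.T @ y
e = y - X @ beta

# 2 x 2 grid: one panel per regressor plus one for the response
fig, axs = plt.subplots(2, 2, layout="constrained")
columns = [x[:, 0], x[:, 1], x[:, 2], y.ravel()]
labels = ["x1", "x2", "x3", "y"]
for ax, col, label in zip(axs.flat, columns, labels):
    ax.scatter(col, e.ravel())
    ax.axhline(0, color="gray", linewidth=0.8)  # reference line at e = 0
    ax.set_xlabel(label)
    ax.set_ylabel("residual e")
```

A random scatter around the zero line in every panel is what a well-specified model should show; curvature or a funnel shape suggests a pattern the model misses.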
Are there any patterns in the residuals? Are there any outliers? Is there a better plot we can use to
determine this?
YES! Calculate the standardized residuals and determine if there are any outliers.
$d_i = \frac{e_i}{\sqrt{MSE}}$
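A sketch of the standardized residuals with MSE = SSE / (n − p), where p counts the intercept column. This matches the outputs below, where 621.265 / 8 = 77.658, consistent with 12 observations and 4 parameters. The data are hypothetical and `standardized_residuals` is a helper name introduced here for illustration.

```python
import numpy as np

def standardized_residuals(X, y):
    """Fit OLS and return SSE, MSE and d_i = e_i / sqrt(MSE) (hypothetical helper)."""
    n, p = X.shape  # p includes the intercept column
    beta = np.linalg.inv(X.T @ X) @ X.T @ y
    e = y - X @ beta
    sse = float(np.sum(e ** 2))
    mse = sse / (n - p)  # residual degrees of freedom: n - p
    return sse, mse, e / np.sqrt(mse)

# Hypothetical data (not the tutorial's): 12 observations, 3 regressors
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=(12, 3))
y = (3.0 + x @ np.array([0.5, 1.2, -0.7]) + rng.normal(0, 1, size=12)).reshape(-1, 1)
X = np.hstack((np.ones((12, 1)), x))

sse, mse, d = standardized_residuals(X, y)
print(sse, mse, np.sqrt(mse))
print(d.ravel())

# Flag observations whose standardized residual exceeds 3 in absolute value
outliers = np.flatnonzero(np.abs(d.ravel()) > 3)
print("possible outliers at indices:", outliers)
```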
SSE = 621.2650617774844
MSE = 77.65813272218556
sqrt(MSE) = 8.812385189163349
[ 0.08298427 -1.51244489 -1.35700782 0.35602638 1.31131326 1.23595977
0.25120048 -0.01413466 -0.32984391 0.32616047 0.09917076 -0.44938413]
Based on the standardized residuals above, there are no outliers, since no observation has
$|d_i| > 3$.
F0 = MSR / MSE = 11.11596363636129
                  DF        SS        MS       F
--------------  ----  --------  --------  ------
Regression         3   2589.73   863.245  11.116
Residual Error     8   621.265   77.6581
Total             11      3211
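A sketch of how the table's entries fit together, on hypothetical data: SST = Σ(y_i − ȳ)², SSE = Σe_i², SSR = SST − SSE, each mean square is SS / DF, and F0 = MSR / MSE. In the table above, 2589.73 + 621.265 ≈ 3211 and 863.245 / 77.6581 ≈ 11.116. The rows are printed with f-strings to keep the sketch dependency-free; the tutorial's `tabulate` import can format the same rows. `anova_table` is a hypothetical helper name.

```python
import numpy as np

def anova_table(X, y):
    """Return ANOVA rows [source, DF, SS, MS, F] for an OLS fit with intercept (hypothetical helper)."""
    n, p = X.shape  # p = k regressors + 1 intercept
    k = p - 1
    beta = np.linalg.inv(X.T @ X) @ X.T @ y
    e = y - X @ beta
    ss_e = float(np.sum(e ** 2))               # residual sum of squares
    ss_t = float(np.sum((y - y.mean()) ** 2))  # total (corrected) sum of squares
    ss_r = ss_t - ss_e                         # regression sum of squares
    ms_r = ss_r / k
    ms_e = ss_e / (n - p)
    return [
        ["Regression", k, ss_r, ms_r, ms_r / ms_e],
        ["Residual Error", n - p, ss_e, ms_e, None],
        ["Total", n - 1, ss_t, None, None],
    ]

# Hypothetical data (not the tutorial's): 12 observations, 3 regressors
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=(12, 3))
y = (1.0 + x @ np.array([2.0, 0.0, -1.0]) + rng.normal(0, 2, size=12)).reshape(-1, 1)
X = np.hstack((np.ones((12, 1)), x))

rows = anova_table(X, y)
print(f"{'':15} {'DF':>3} {'SS':>10} {'MS':>10} {'F':>8}")
for source, df, ss, ms, f in rows:
    ms_txt = f"{ms:10.3f}" if ms is not None else " " * 10
    f_txt = f"{f:8.3f}" if f is not None else ""
    print(f"{source:15} {df:>3} {ss:10.3f} {ms_txt} {f_txt}")
```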
Calculate the p-value for your ANOVA table and print the result below. You can solve for a p-value
using a built-in function.
0.0031699790971878583
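One way to obtain this p-value is the upper tail of the F(3, 8) distribution beyond the observed statistic. This sketch uses SciPy's survival function `scipy.stats.f.sf` with the F statistic and degrees of freedom from the ANOVA table above; it is an assumption that this matches the function the tutorial intends, which is not named in this copy.

```python
from scipy import stats

# Values taken from the ANOVA table above
f0 = 11.11596363636129  # F statistic for the regression
df_r, df_e = 3, 8       # regression and residual degrees of freedom

# P(F > f0) for F ~ F(3, 8); sf is 1 - cdf, computed without cancellation error
p_value = stats.f.sf(f0, df_r, df_e)
print(p_value)
```

`1 - stats.f.cdf(f0, df_r, df_e)` gives the same value here; `sf` is preferred because it stays accurate when the p-value is very small.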
At this point, have you determined that each regressor variable is significant in the model?
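In [ ]: # The p-value (about 0.0032) is well below 0.05, so the ANOVA F-test rejects H0: beta_1 = beta_2 = beta_3 = 0.
# This shows that at least one regressor is significant at the 5% level, not that each one is;
# individual t-tests on the coefficients are needed to judge each regressor separately.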
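The overall F-test cannot say which regressors matter. A minimal sketch of the usual follow-up, the partial t-test $t_j = \beta_j / \sqrt{MSE \cdot C_{jj}}$ with $C = (X^T X)^{-1}$ and n − p degrees of freedom, on hypothetical data in which the third regressor has a true coefficient of 0. `coefficient_t_tests` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

def coefficient_t_tests(X, y):
    """t statistics and two-sided p-values for H0: beta_j = 0, one per column of X (hypothetical helper)."""
    n, p = X.shape
    C = np.linalg.inv(X.T @ X)
    beta = C @ X.T @ y
    e = y - X @ beta
    mse = float(np.sum(e ** 2)) / (n - p)
    se = np.sqrt(mse * np.diag(C))  # standard error of each coefficient
    t = beta.ravel() / se
    p_values = 2 * stats.t.sf(np.abs(t), n - p)
    return t, p_values

# Hypothetical data (not the tutorial's): x3 has a true coefficient of 0
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=(12, 3))
y = (1.0 + x @ np.array([2.0, -1.5, 0.0]) + rng.normal(0, 1, size=12)).reshape(-1, 1)
X = np.hstack((np.ones((12, 1)), x))

t, p_values = coefficient_t_tests(X, y)
for name, tj, pj in zip(["intercept", "x1", "x2", "x3"], t, p_values):
    print(f"{name:10} t = {tj:7.3f}  p = {pj:.4f}")
```

A regressor is judged significant on its own only if its p-value falls below the chosen level; a significant overall F-test can coexist with one or more non-significant coefficients.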