
ISLR-Python: Chapter 3 -- Lab Linear Regression
• Load Datasets
• 3.6.2 Simple Linear Regression
• 3.6.3 Multiple Linear Regression
• 3.6.4 Interaction Terms
• 3.6.5 Non-linear Transformations of the Predictors
• 3.6.6 Qualitative Predictors
## perform imports and set-up
import numpy as np
import pandas as pd
import statsmodels.api as sm

from scipy import stats


from matplotlib import pyplot as plt
from sklearn.datasets import load_boston  # the boston data set is part of sklearn
from statsmodels.sandbox.regression.predstd import wls_prediction_std
from mpl_toolkits.mplot3d import Axes3D

%matplotlib inline
plt.style.use('ggplot') # emulate pretty r-style plots

Load Datasets
# Load Boston housing data set
boston = load_boston()

# Transform the data into a dataframe for analysis

# combine the predictors and responses for a dataframe


predictors = boston.data
response = boston.target
boston_data = np.column_stack([predictors,response])

# now get the column names of the data frame


col_names = np.append(boston.feature_names, 'MEDV')

# create the data frame


boston_df = pd.DataFrame(boston_data, columns = col_names)
boston_df.head()
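Note: load_boston was removed from scikit-learn in version 1.2. If the import above fails, a commonly suggested fallback is to read the data directly from the original CMU StatLib source; the sketch below is an assumption-laden alternative (it assumes that mirror is still reachable, and the feature names would have to be listed by hand).

# fallback sketch: rebuild the same predictor/response arrays without load_boston
raw = pd.read_csv('http://lib.stat.cmu.edu/datasets/boston', sep=r'\s+', skiprows=22, header=None)
boston_predictors = np.hstack([raw.values[::2, :], raw.values[1::2, :2]])  # the 13 predictor columns
boston_response = raw.values[1::2, 2]                                      # MEDV
# the feature names (CRIM, ZN, ..., LSTAT) are not in the raw file and must be supplied manually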

# Load the carseats data set


carseats_df = pd.read_csv('../data/Carseats.csv',index_col = 0)
3.6.2 Simple Linear Regression
We will regress MEDV (median home value) onto LSTAT (the percentage of the population with lower socioeconomic status). We will do this using two methods -- scipy.stats.linregress and statsmodels -- and look at several diagnostics that describe the quality of the fit.

Regress MEDV onto LSTAT (scipy)


# Create a figure to plot our data and OLS estimate.
fig, ax = plt.subplots(figsize=(8,6))

ax.scatter(boston_df.LSTAT.values,
boston_df.MEDV.values,facecolors='none', edgecolors='b',\
label="data");
ax.set_ylabel('MEDV');
ax.set_xlabel('LSTAT');

# call scipy's linregress, returning fit coefficients and simple statistics
beta1, beta0, r_value, p_value, stderr = stats.linregress(boston_df.LSTAT.values,
                                                          boston_df.MEDV.values)

# add the estimation to the data plot
ax.plot(boston_df.LSTAT.values, beta0 + beta1*boston_df.LSTAT.values,
        color='r', label="OLS");
ax.legend(loc='best');

# print the regression estimates returned from scipy


print('beta= [',round(beta0,3),',', round(beta1,3),']')
print('R={0:.3f}, p_value={1:.3f}, stderr={2:.3f}'.format(r_value, p_value, stderr))

beta= [ 34.554 , -0.95 ]
R=-0.738, p_value=0.000, stderr=0.039
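As a quick consistency check, the squared correlation returned by scipy, (-0.738)^2 ≈ 0.545, matches (up to rounding) the R-squared of 0.544 reported by statsmodels for the same fit below.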
Regress MEDV onto LSTAT (statsmodels)
# Another method is to use the package statsmodels.

# Create a design matrix


# set the independent variable as the LSTAT
X = boston_df.LSTAT
# We add a constant for the intercept term
X = sm.add_constant(X)

# set the dependent variable


Y = boston_df.MEDV

# create the model instance and fit


linear_model = sm.OLS(Y,X)
linear_results = linear_model.fit()
# data about the model is stored in summary
print(linear_results.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                   MEDV   R-squared:                       0.544
Model:                            OLS   Adj. R-squared:                  0.543
Method:                 Least Squares   F-statistic:                     601.6
Date:                Fri, 24 Jun 2016   Prob (F-statistic):           5.08e-88
Time:                        10:05:12   Log-Likelihood:                -1641.5
No. Observations:                 506   AIC:                             3287.
Df Residuals:                     504   BIC:                             3295.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|    [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         34.5538      0.563     61.415      0.000      33.448    35.659
LSTAT         -0.9500      0.039    -24.528      0.000      -1.026    -0.874
==============================================================================
Omnibus:                      137.043   Durbin-Watson:                   0.892
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              291.373
Skew:                           1.453   Prob(JB):                     5.36e-64
Kurtosis:                       5.319   Cond. No.                         29.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

# statsmodels includes functions for getting the prediction and confidence intervals.

# Get the prediction interval for each x
prstd, iv_l, iv_u = wls_prediction_std(linear_results)

# statsmodels also includes the confidence intervals for the fitted values (i.e. the mean fits),
# but they are buried in the results of the summary_table function.
from statsmodels.stats.outliers_influence import summary_table

# call the summary table at a (1-alpha)*100% confidence interval level
simpleTable, data, column_names = summary_table(linear_results, alpha=0.05)

# data contains the confidence intervals we want, but we need to make sure
# we get the right columns, so let's print off the names
print('column_names: ', column_names)

# Get confidence intervals for a given x


predicted_mean_ci_low, predicted_mean_ci_high = data[:,4:6].T

column_names: ['Obs', 'Dep Var\nPopulation', 'Predicted\nValue', 'Std Error\nMean Predict',
'Mean ci\n95% low', 'Mean ci\n95% upp', 'Predict ci\n95% low', 'Predict ci\n95% upp',
'Residual', 'Std Error\nResidual', 'Student\nResidual', "Cook's\nD"]
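Newer versions of statsmodels expose the same intervals more directly through the fitted results object; a small sketch (assuming a reasonably recent statsmodels) using get_prediction:

# mean confidence intervals and prediction intervals straight from the results object
pred_frame = linear_results.get_prediction().summary_frame(alpha=0.05)
# 'mean_ci_lower'/'mean_ci_upper' bound the mean fit; 'obs_ci_lower'/'obs_ci_upper' bound a new observation
print(pred_frame[['mean', 'mean_ci_lower', 'mean_ci_upper', 'obs_ci_lower', 'obs_ci_upper']].head())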

# Create a plot showing the data, the OLS estimate, and the prediction and confidence intervals
fig, ax = plt.subplots(figsize=(8,6))

# get numpy array values from dataframe


x = boston_df.LSTAT.values
y = boston_df.MEDV.values

# Plot the data


ax.scatter(x, y, facecolors='none', edgecolors='b', label="data")
# plot the models fitted values
ax.plot(x, linear_results.fittedvalues, 'g', label="OLS")

# plot the high and low prediction intervals


ax.plot(x, iv_u, color='0.75',label="Prediction Interval")
ax.plot(x, iv_l, color='0.75')

# plot the high and low mean confidence intervals


ax.plot(x,predicted_mean_ci_low, 'r', label="Predicted Mean CI")
ax.plot(x,predicted_mean_ci_high,'r')

ax.legend(loc='best');

plt.xlabel('LSTAT');
plt.ylabel('MEDV');
Diagnostic Plots for Linear Model
# Create plots of residuals
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(12,6))

# RESIDUALS
# The results contain the residuals
fitted_values = linear_results.fittedvalues.values
residuals = linear_results.resid.values

# Plot the residual for each fitted value


ax1.scatter(fitted_values, residuals, facecolors='none',
edgecolors='b');
ax1.set_xlabel('fitted values');
ax1.set_ylabel('residuals');
# The residual plot indicates significant non-linearity (a U-shaped pattern is clear)

# STUDENTIZED RESIDUALS
# To assess outliers we will look at the studentized residuals. These are in the data array
# returned from summary_table (column index 10)
studentized_residuals = data[:,10]

# Plot the studentized residuals


ax2.scatter(fitted_values,studentized_residuals, facecolors='none',
edgecolors='b');
ax2.set_ylabel('Studentized Residuals');
ax2.set_xlabel('fitted values');

# |studentized residual| > 3 are generally considered outliers

# We can also examine the leverages to identify points that may unduly alter the regression line
from statsmodels.stats.outliers_influence import OLSInfluence
leverage = OLSInfluence(linear_results).hat_matrix_diag  # leverage = diagonal of the hat matrix

fig, ax = plt.subplots(figsize=(8,6))
ax.scatter(leverage, studentized_residuals,facecolors='none',
edgecolors='b');
ax.set_xlabel('Leverage');
ax.set_ylabel('Studentized Residuals');
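As a small sketch using the arrays computed above, the rule-of-thumb cutoffs can be applied directly (the 2*(p+1)/n leverage threshold is a common convention and not part of the original lab):

# flag outliers (|studentized residual| > 3) and high-leverage points (leverage > 2*(p+1)/n)
n, p = len(boston_df), 1   # p = number of predictors in the simple regression
outlier_idx = np.where(np.abs(studentized_residuals) > 3)[0]
high_leverage_idx = np.where(leverage > 2 * (p + 1) / n)[0]
print('outliers:', outlier_idx)
print('high leverage points:', high_leverage_idx)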
3.6.3 Multiple Linear Regression
Here we will estimate MEDV using multiple linear regression. In the first example we will regress MEDV onto the predictors LSTAT and AGE.
# create our design matrix using LSTAT and AGE predictors
X = sm.add_constant(boston_df[['LSTAT','AGE']])

# set the dependent variable


Y = boston_df.MEDV

fig = plt.figure(figsize=(12, 8))


ax = fig.add_subplot(111, projection='3d',azim=-60, elev=5)

# create the model instance and estimate


model = sm.OLS(Y,X)
estimate = model.fit()
# data about the model is stored in summary
print('Model parameters:\n', estimate.params[:])

# Plot the data


ax.scatter(X.loc[:,'LSTAT'], X.loc[:,'AGE'], Y.values,
facecolors=(0,0,0,0),\
edgecolor='k', depthshade=True);
ax.set_xlabel('LSTAT');
ax.set_ylabel('AGE')
ax.set_zlabel('MEDV')

# Plot the OLS estimate


# create a grid of points
xx1, xx2 = np.meshgrid(np.linspace(X.LSTAT.min(), X.LSTAT.max(), 100),
                       np.linspace(X.AGE.min(), X.AGE.max(), 100))

# plot the plane by evaluating the parameters over the grid


Z = estimate.params[0] + estimate.params[1] * xx1 + estimate.params[2] * xx2
# plot plane
surf = ax.plot_surface(xx1, xx2, Z, cmap=plt.cm.RdBu_r, alpha=0.75,
linewidth=0)

Model parameters:
const 33.222761
LSTAT -1.032069
AGE 0.034544
dtype: float64
Now we will perform the regression over all 13 predictors in the boston housing
dataset.
# create our design matrix using all the predictors (the last column of boston_df is MEDV)
X = sm.add_constant(boston_df.iloc[:,0:-1])

# create the model instance and estimate


model = sm.OLS(Y,X)
est = model.fit()
# data about the model is stored in summary
print(est.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                   MEDV   R-squared:                       0.741
Model:                            OLS   Adj. R-squared:                  0.734
Method:                 Least Squares   F-statistic:                     108.1
Date:                Fri, 24 Jun 2016   Prob (F-statistic):          6.95e-135
Time:                        10:05:13   Log-Likelihood:                -1498.8
No. Observations:                 506   AIC:                             3026.
Df Residuals:                     492   BIC:                             3085.
Df Model:                          13
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|    [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         36.4911      5.104      7.149      0.000      26.462    46.520
CRIM          -0.1072      0.033     -3.276      0.001      -0.171    -0.043
ZN             0.0464      0.014      3.380      0.001       0.019     0.073
INDUS          0.0209      0.061      0.339      0.735      -0.100     0.142
CHAS           2.6886      0.862      3.120      0.002       0.996     4.381
NOX          -17.7958      3.821     -4.658      0.000     -25.302   -10.289
RM             3.8048      0.418      9.102      0.000       2.983     4.626
AGE            0.0008      0.013      0.057      0.955      -0.025     0.027
DIS           -1.4758      0.199     -7.398      0.000      -1.868    -1.084
RAD            0.3057      0.066      4.608      0.000       0.175     0.436
TAX           -0.0123      0.004     -3.278      0.001      -0.020    -0.005
PTRATIO       -0.9535      0.131     -7.287      0.000      -1.211    -0.696
B              0.0094      0.003      3.500      0.001       0.004     0.015
LSTAT         -0.5255      0.051    -10.366      0.000      -0.625    -0.426
==============================================================================
Omnibus:                      178.029   Durbin-Watson:                   1.078
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              782.015
Skew:                           1.521   Prob(JB):                    1.54e-170
Kurtosis:                       8.276   Cond. No.                     1.51e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.51e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

# Compute all the variance inflation factors
from statsmodels.stats.outliers_influence import variance_inflation_factor

VIFs = [(predictor, variance_inflation_factor(X.values, i))
        for i, predictor in enumerate(list(X))]  # list(X) returns the column names of the dataframe

print('Variance Inflation Factors')


for tup in VIFs:
print('{:10}'.format(tup[0]), '{:.3f}'.format(tup[1]))

Variance Inflation Factors


const 585.425
CRIM 1.773
ZN 2.299
INDUS 3.991
CHAS 1.074
NOX 4.395
RM 1.934
AGE 3.101
DIS 3.957
RAD 7.481
TAX 9.008
PTRATIO 1.799
B 1.346
LSTAT 2.938
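AGE has by far the largest p-value in the full fit (0.955), so one may wish to refit without it, as the ISLR R lab does. A minimal sketch using the formula interface (the same smf module imported again in the next section):

# sketch: refit on all predictors except AGE
import statsmodels.formula.api as smf
predictors_no_age = [c for c in boston_df.columns if c not in ('MEDV', 'AGE')]
est_no_age = smf.ols('MEDV ~ ' + ' + '.join(predictors_no_age), data=boston_df).fit()
print(est_no_age.params)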

3.6.4 Interaction Terms


Statsmodels uses the patsy package to convert formulas to matrices for fitting. This
allows for easy implementation of arbitrary functions of the predictors.
# import the statsmodels formula API (which uses patsy under the hood)
import statsmodels.formula.api as smf
# Construct model and fit
model = smf.ols(formula='MEDV ~ LSTAT*AGE', data=boston_df)
estimate = model.fit()
print(estimate.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                   MEDV   R-squared:                       0.556
Model:                            OLS   Adj. R-squared:                  0.553
Method:                 Least Squares   F-statistic:                     209.3
Date:                Fri, 24 Jun 2016   Prob (F-statistic):           4.86e-88
Time:                        10:05:13   Log-Likelihood:                -1635.0
No. Observations:                 506   AIC:                             3278.
Df Residuals:                     502   BIC:                             3295.
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|    [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     36.0885      1.470     24.553      0.000      33.201    38.976
LSTAT         -1.3921      0.167     -8.313      0.000      -1.721    -1.063
AGE           -0.0007      0.020     -0.036      0.971      -0.040     0.038
LSTAT:AGE      0.0042      0.002      2.244      0.025       0.001     0.008
==============================================================================
Omnibus:                      135.601   Durbin-Watson:                   0.965
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              296.955
Skew:                           1.417   Prob(JB):                     3.29e-65
Kurtosis:                       5.461   Cond. No.                     6.88e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.88e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

3.6.5 Non-linear Transformations of the Predictors


The statsmodels formula API (via patsy) also allows us to include non-linear terms such as I(LSTAT**2).
model = smf.ols('MEDV ~ LSTAT + I(LSTAT**2)', data=boston_df)
quadratic_results = model.fit()
print(quadratic_results.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                   MEDV   R-squared:                       0.641
Model:                            OLS   Adj. R-squared:                  0.639
Method:                 Least Squares   F-statistic:                     448.5
Date:                Fri, 24 Jun 2016   Prob (F-statistic):          1.56e-112
Time:                        10:05:13   Log-Likelihood:                -1581.3
No. Observations:                 506   AIC:                             3169.
Df Residuals:                     503   BIC:                             3181.
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                    coef    std err          t      P>|t|    [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept        42.8620      0.872     49.149      0.000      41.149    44.575
LSTAT            -2.3328      0.124    -18.843      0.000      -2.576    -2.090
I(LSTAT ** 2)     0.0435      0.004     11.628      0.000       0.036     0.051
==============================================================================
Omnibus:                      107.006   Durbin-Watson:                   0.921
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              228.388
Skew:                           1.128   Prob(JB):                     2.55e-50
Kurtosis:                       5.397   Cond. No.                     1.13e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.13e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

The near-zero p-value for the quadratic term suggests an improved model. We will plot the fit and perform some diagnostics.
fig, ax = plt.subplots(figsize=(8,6))

# get numpy array values from dataframe


x = boston_df.LSTAT.values
y = boston_df.MEDV.values

# Plot the data


ax.scatter(x, y, facecolors='none', edgecolors='b', label="data");
# plot the models fitted values
ax.plot(x, quadratic_results.fittedvalues, 'g',
marker='o',linestyle='none', label="OLS");

ax.legend(loc='best');

plt.xlabel('LSTAT');
plt.ylabel('MEDV');
Diagnostic tests of quadratic estimate
# import anova function
from statsmodels.stats.api import anova_lm

# perform the hypothesis test comparing the nested models
# (see the regression section of https://en.wikipedia.org/wiki/F-test)
anova_table = anova_lm(linear_results, quadratic_results)
print(anova_table)

   df_resid           ssr  df_diff     ss_diff           F        Pr(>F)
0       504  19472.381418        0         NaN         NaN           NaN
1       503  15347.243158        1  4125.13826  135.199822  7.630116e-28
The F-statistic is 135 with a p-value of ~0, indicating a large difference in the unexplained variance of the two models. This is not too surprising given the plot above. Now consider the residuals.
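As a sanity check, the F-statistic can be recomputed by hand from the residual sums of squares in the ANOVA table above:

# F = ((RSS_linear - RSS_quadratic) / df_diff) / (RSS_quadratic / df_resid_quadratic)
rss_linear, rss_quadratic = 19472.381418, 15347.243158
F = ((rss_linear - rss_quadratic) / 1) / (rss_quadratic / 503)
print(F)   # ~135.2, matching anova_lm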
# Create plots of residuals
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(12,6))

# RESIDUALS OF LINEAR MODEL


# The results contain the residuals
linear_fit_values = linear_results.fittedvalues.values
residuals = linear_results.resid.values

# Plot the residual for each fitted value for the linear model
ax1.scatter(linear_fit_values, residuals, facecolors='none',
edgecolors='b');
ax1.set_xlabel('fitted values');
ax1.set_ylabel('residuals');
ax1.set_title('Linear Model Residuals')

# RESIDUALS OF QUADRATIC MODEL


# The results contain the residuals
quadratic_fit_values = quadratic_results.fittedvalues.values
quadratic_residuals = quadratic_results.resid.values
ax2.scatter(quadratic_fit_values, quadratic_residuals,
facecolors='none', edgecolors='b');
ax2.set_title('Quadratic Model Residuals');
We can also try higher order polynomial fits:
formula = 'MEDV ~ 1 + ' + ' + '.join('I(LSTAT**{})'.format(i) for i in
range(1, 6))
print(formula)
model = smf.ols(formula, data=boston_df)
order_5_results = model.fit()
print(order_5_results.summary())

MEDV ~ 1 + I(LSTAT**1) + I(LSTAT**2) + I(LSTAT**3) + I(LSTAT**4) + I(LSTAT**5)
                            OLS Regression Results
==============================================================================
Dep. Variable:                   MEDV   R-squared:                       0.682
Model:                            OLS   Adj. R-squared:                  0.679
Method:                 Least Squares   F-statistic:                     214.2
Date:                Fri, 24 Jun 2016   Prob (F-statistic):          8.73e-122
Time:                        10:57:12   Log-Likelihood:                -1550.6
No. Observations:                 506   AIC:                             3113.
Df Residuals:                     500   BIC:                             3139.
Df Model:                           5
Covariance Type:            nonrobust
==============================================================================
                    coef    std err          t      P>|t|    [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept        67.6997      3.604     18.783      0.000      60.618    74.781
I(LSTAT ** 1)   -11.9911      1.526     -7.859      0.000     -14.989    -8.994
I(LSTAT ** 2)     1.2728      0.223      5.703      0.000       0.834     1.711
I(LSTAT ** 3)    -0.0683      0.014     -4.747      0.000      -0.097    -0.040
I(LSTAT ** 4)     0.0017      0.000      4.143      0.000       0.001     0.003
I(LSTAT ** 5)  -1.632e-05   4.42e-06     -3.692      0.000    -2.5e-05 -7.63e-06
==============================================================================
Omnibus:                      144.085   Durbin-Watson:                   0.987
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              494.545
Skew:                           1.292   Prob(JB):                    4.08e-108
Kurtosis:                       7.096   Cond. No.                     1.37e+08
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.37e+08. This might indicate that there are
strong multicollinearity or other numerical problems.

3.6.6 Qualitative Predictors


# Examine the head of the carseats data set
carseats_df.head()

   Sales  CompPrice  Income  Advertising  Population  Price ShelveLoc  Age  Education Urban   US
1   9.50        138      73           11         276    120       Bad   42         17   Yes  Yes
2  11.22        111      48           16         260     83      Good   65         10   Yes  Yes
3  10.06        113      35           10         269     80    Medium   59         12   Yes  Yes
4   7.40        117     100            4         466     97    Medium   55         14   Yes  Yes
5   4.15        141      64            3         340    128       Bad   38         13   Yes   No

# Construct the formula with two interaction terms


formula = 'Sales ~' + '+'.join(carseats_df.iloc[:, 1:].columns.tolist() +
                               ['Income:Advertising'] + ['Price:Age'])
print(formula)
print()

model = smf.ols(formula, data=carseats_df)


carseat_results = model.fit()
print(carseat_results.summary())

Sales ~CompPrice+Income+Advertising+Population+Price+ShelveLoc+Age+Education+Urban+US+Income:Advertising+Price:Age

                            OLS Regression Results
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.876
Model:                            OLS   Adj. R-squared:                  0.872
Method:                 Least Squares   F-statistic:                     210.0
Date:                Fri, 24 Jun 2016   Prob (F-statistic):          6.14e-166
Time:                        14:38:28   Log-Likelihood:                -564.67
No. Observations:                 400   AIC:                             1157.
Df Residuals:                     386   BIC:                             1213.
Df Model:                          13
Covariance Type:            nonrobust
=======================================================================================
                          coef    std err          t      P>|t|    [95.0% Conf. Int.]
---------------------------------------------------------------------------------------
Intercept               6.5756      1.009      6.519      0.000       4.592     8.559
ShelveLoc[T.Good]       4.8487      0.153     31.724      0.000       4.548     5.149
ShelveLoc[T.Medium]     1.9533      0.126     15.531      0.000       1.706     2.201
Urban[T.Yes]            0.1402      0.112      1.247      0.213      -0.081     0.361
US[T.Yes]              -0.1576      0.149     -1.058      0.291      -0.450     0.135
CompPrice               0.0929      0.004     22.567      0.000       0.085     0.101
Income                  0.0109      0.003      4.183      0.000       0.006     0.016
Advertising             0.0702      0.023      3.107      0.002       0.026     0.115
Population              0.0002      0.000      0.433      0.665      -0.001     0.001
Price                  -0.1008      0.007    -13.549      0.000      -0.115    -0.086
Age                    -0.0579      0.016     -3.633      0.000      -0.089    -0.027
Education              -0.0209      0.020     -1.063      0.288      -0.059     0.018
Income:Advertising      0.0008      0.000      2.698      0.007       0.000     0.001
Price:Age               0.0001      0.000      0.801      0.424      -0.000     0.000
=======================================================================================
Omnibus:                        1.281   Durbin-Watson:                   2.047
Prob(Omnibus):                  0.527   Jarque-Bera (JB):                1.147
Skew:                           0.129   Prob(JB):                        0.564
Kurtosis:                       3.050   Cond. No.                     1.31e+05
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.31e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

# The default coding for categoricals in patsy is treatment (dummy) coding with the first
# level as the reference; in this case 'Bad' is the baseline shelf location, so its dummy is 0.
# More contrast codings can be found in the patsy documentation, e.g.
# from patsy.contrasts import Treatment
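A minimal sketch of choosing a different reference level directly in the formula (using 'Good' as the baseline here is purely illustrative and not part of the original lab):

# refit a small model with 'Good' as the reference level for ShelveLoc
model_ref = smf.ols("Sales ~ C(ShelveLoc, Treatment(reference='Good')) + Price",
                    data=carseats_df)
print(model_ref.fit().params)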
