Simple Linear Regression

The document imports common Python libraries for data analysis and visualization, then loads blood pressure (BP) data from an Excel file and rental price data from a Stata file. Simple linear regression models are fit to predict BP (Y) from age (X) and rental price (rent) from rental price per square meter (rentsqm). In both models the predictor is statistically significant (p < 0.001): age for BP and rentsqm for rent.

Uploaded by

2023005

import numpy as np

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import scipy as sp
import statsmodels.api as sm
import statsmodels.formula.api as smf

%matplotlib inline

plt.rcParams['figure.dpi'] = 150
plt.rcParams['figure.figsize'] = (8, 6)
plt.rcParams.update({'font.size': 8})

bp = pd.read_excel('Copy of Age-BP.xlsx')

bp.head()

   i    Y   X
0  1  144  39
1  2  220  47
2  3  138  43
3  4  145  47
4  5  162  65

bp.tail()

     i    Y   X
25  26  158  53
26  27  154  63
27  28  130  29
28  29  125  25
29  30  175  69

bp.head(2)

   i    Y   X
0  1  144  39
1  2  220  47

plt.scatter(bp.X, bp.Y, c = 'maroon')


# grid
plt.grid(True)
# x-label
plt.xlabel('Age')
# y - label
plt.ylabel('Systolic Blood Pressure')
# show plot
plt.show()
Simple Linear Regression
model1 = smf.ols(formula = 'Y ~ X', data = bp)

model1_fit = model1.fit()

model1_fit.summary()

<class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:                      Y   R-squared:                       0.471
Model:                            OLS   Adj. R-squared:                  0.452
Method:                 Least Squares   F-statistic:                     24.92
Date:                Sat, 20 Jan 2024   Prob (F-statistic):           2.83e-05
Time:                        16:51:55   Log-Likelihood:                -126.35
No. Observations:                  30   AIC:                             256.7
Df Residuals:                      28   BIC:                             259.5
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     97.7273      9.655     10.122      0.000      77.951     117.504
X              1.0210      0.205      4.992      0.000       0.602       1.440
==============================================================================
Omnibus:                       44.814   Durbin-Watson:                   1.727
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              212.423
Skew:                           2.958   Prob(JB):                     7.46e-47
Kurtosis:                      14.616   Cond. No.                         148.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""
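
The headline statistics in this summary are tied together by simple identities, which makes the output easy to sanity-check. The sketch below, in plain Python, recomputes the t-statistic, F-statistic, R-squared and adjusted R-squared from the rounded values printed in the summary. The variable names are illustrative assumptions, and small discrepancies against the table come from rounding in the printed figures.

```python
# Values copied (rounded as printed) from the OLS summary for Y ~ X.
coef_x = 1.0210        # slope estimate for X (age)
se_x = 0.205           # standard error of the slope
t_reported = 4.992     # t-statistic printed for X
n_obs = 30             # number of observations
df_resid = n_obs - 2   # residual degrees of freedom (intercept + one slope)

# The t-statistic is the estimate divided by its standard error.
t_from_rounded = coef_x / se_x

# With a single predictor, the F-statistic equals the squared slope t-statistic.
f_stat = t_reported ** 2

# R-squared can be recovered from F and the residual degrees of freedom.
r_squared = f_stat / (f_stat + df_resid)

# Adjusted R-squared penalises R-squared for the number of fitted parameters.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

print(f"t (from rounded coef and std err): {t_from_rounded:.2f}")
print(f"F (t squared):                     {f_stat:.2f}")
print(f"R-squared:                         {r_squared:.3f}")
print(f"Adj. R-squared:                    {adj_r_squared:.3f}")
```

The recomputed F, R-squared and adjusted R-squared agree with the printed 24.92, 0.471 and 0.452 to the displayed precision; the t-statistic recomputed from rounded inputs differs slightly from 4.992 for the same rounding reason.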

Model:
Y = β₀ + β₁ × X

Fitted model:
Ŷ = 97.727 + 1.021 X
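
As a hedged illustration of how the fitted line is used, the snippet below plugs an age into the equation using the intercept and slope reported in the regression summary (97.7273 and 1.0210). The function name `predict_bp` and the example age of 50 are assumptions made for illustration. With the fitted results object, `model1_fit.params` holds the unrounded coefficients and `model1_fit.predict` can be used instead of retyping them.

```python
# Coefficients copied from the OLS summary for Y ~ X (rounded as printed).
INTERCEPT = 97.7273  # fitted BP at age 0 (an extrapolation, not a realistic value)
SLOPE = 1.0210       # estimated change in BP per additional year of age


def predict_bp(age):
    """Return the fitted systolic BP for a given age (hypothetical helper)."""
    return INTERCEPT + SLOPE * age


# Example: fitted value for a 50-year-old (age chosen purely for illustration).
bp_at_50 = predict_bp(50)
print(f"Predicted systolic BP at age 50: {bp_at_50:.1f}")
```

Under this fit, each additional year of age is associated with an increase of about 1.02 units in systolic blood pressure.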
rent = pd.read_stata('rent99.dta')

rent.head()

         rent    rentsqm   area   yearc  location  bath  kitchen  cheating  district
0  120.974358   3.456410   35.0  1939.0       1.0   0.0      0.0       0.0    1112.0
1  436.974335   4.201676  104.0  1939.0       1.0   1.0      0.0       1.0    1112.0
2  355.743591  12.267021   29.0  1971.0       2.0   0.0      0.0       1.0    2114.0
3  282.923096   7.254436   39.0  1972.0       2.0   0.0      0.0       1.0    2148.0
4  807.230774   8.321964   97.0  1985.0       1.0   0.0      0.0       1.0    2222.0

rent.tail()

            rent    rentsqm   area   yearc  location  bath  kitchen  cheating  district
3077  525.384644   7.841564   67.0  1971.0       2.0   0.0      0.0       1.0    2148.0
3078  712.615356   8.585729   83.0  1918.0       2.0   0.0      0.0       0.0     341.0
3079  833.230774   7.574826  110.0  1918.0       2.0   1.0      0.0       1.0     961.0
3080  557.333374  13.593497   41.0  1972.0       2.0   0.0      0.0       1.0     381.0
3081  360.820496   5.819687   62.0  1953.0       2.0   0.0      0.0       0.0     522.0

model2 = smf.ols(formula = 'rent ~ rentsqm', data = rent)

model2_fit = model2.fit()

model2_fit.summary()

<class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:                   rent   R-squared:                       0.253
Model:                            OLS   Adj. R-squared:                  0.252
Method:                 Least Squares   F-statistic:                     1041.
Date:                Sat, 20 Jan 2024   Prob (F-statistic):          6.28e-197
Time:                        17:25:52   Log-Likelihood:                -20186.
No. Observations:                3082   AIC:                         4.038e+04
Df Residuals:                    3080   BIC:                         4.039e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    172.4176      9.405     18.332      0.000     153.976     190.859
rentsqm       40.3613      1.251     32.257      0.000      37.908      42.815
==============================================================================
Omnibus:                      568.919   Durbin-Watson:                   1.917
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1590.958
Skew:                           0.972   Prob(JB):                         0.00
Kurtosis:                       5.934   Cond. No.                         23.6
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""
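
To make the second model concrete, the sketch below computes fitted values and residuals for the five rows displayed by `rent.head()`, using the coefficients printed in the summary. The helper name `fitted_rent` is hypothetical, and the row values are copied from the displayed output rather than read from `rent99.dta`.

```python
# Coefficients copied from the OLS summary for rent ~ rentsqm (rounded as printed).
INTERCEPT = 172.4176
SLOPE = 40.3613  # estimated change in rent per one-unit increase in rentsqm

# (rent, rentsqm) pairs copied from the first five rows of the data.
head_rows = [
    (120.974358, 3.456410),
    (436.974335, 4.201676),
    (355.743591, 12.267021),
    (282.923096, 7.254436),
    (807.230774, 8.321964),
]


def fitted_rent(rentsqm):
    """Return the fitted rent for a given rent per square meter (hypothetical helper)."""
    return INTERCEPT + SLOPE * rentsqm


# Residual = observed rent minus fitted rent.
residuals = [rent - fitted_rent(rentsqm) for rent, rentsqm in head_rows]

for (rent, rentsqm), resid in zip(head_rows, residuals):
    print(f"rentsqm={rentsqm:7.3f}  observed={rent:8.2f}  "
          f"fitted={fitted_rent(rentsqm):8.2f}  residual={resid:8.2f}")
```

Sizeable residuals on these rows are in line with the R-squared of 0.253: rentsqm alone leaves roughly three quarters of the variation in rent unexplained.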
