Regression Analysis - Notes

The document discusses using Ordinary Least Squares (OLS) regression in Python's Statsmodels library to analyze a housing price dataset. It demonstrates how to load data, select target and predictor variables, initialize an OLS model, fit the model to the data, and extract the R-squared value from the model summary. Key steps include assigning average rooms per dwelling ('RM') to X and housing prices ('target') to Y, initializing an OLS model with sm.OLS(Y, X), fitting the model, and printing the summary to get the R-squared metric.

##OLS in Python Statsmodels - hands on

Q. From the above output you can see the various attributes of the dataset.
The 'target' column holds the dependent values (housing prices), and the rest of the columns
are the independent values that influence the target values.
Let's find the relation between 'housing price' and 'average number of rooms per
dwelling' using statsmodels.
Assign the values of column "RM" (average number of rooms per dwelling) to variable
X.
Similarly, assign the values of the 'target' (housing price) column to variable Y.
Sample code: values = data_frame['attribute_name']

Ans:
X = dataset['RM']
Y = dataset['target']

Q. import statsmodels.api as sm

Ans:
import statsmodels.api as sm

Q. - Initialise the OLS model by passing target (Y) and attribute (X). Assign the
model to variable 'statsModel'.
- Fit the model and assign it to variable 'fittedModel'.
- Sample code for initialization: sm.OLS(target, attribute)

Ans:
statsModel = sm.OLS(Y, X)
fittedModel = statsModel.fit()

Q. Print the summary of fittedModel using the summary() function.

Ans:
print(fittedModel.summary())

Q. From the summary report, note down the R-squared value and assign it to variable
'r_squared' in the cell below.

Ans.
###Start code here
r_squared = fittedModel.rsquared
###End code(approx 1 line)
with open("output.txt", "w") as text_file:
    text_file.write("rsquared= %f\n" % r_squared)

Q. Print the value of r_squared.

Ans.
print(r_squared)
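
A caveat on reading this number: as far as the statsmodels documentation describes it, rsquared uses the uncentered total sum of squares when the model has no constant, and the centered one when it does. The NumPy-only sketch below works both versions out by hand for a through-the-origin fit. The data is synthetic and the variable names are illustrative assumptions.

```python
import numpy as np

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(4.0, 8.0, size=100)
y = -30.0 + 9.0 * x + rng.normal(0.0, 2.0, size=100)

# Least-squares slope for a line through the origin (no intercept).
slope = np.sum(x * y) / np.sum(x * x)

# Sum of squared residuals of that fit.
ssr = np.sum((y - slope * x) ** 2)

# Uncentered R-squared: compares residuals against sum(y^2).
r2_uncentered = 1.0 - ssr / np.sum(y ** 2)

# Centered R-squared: compares residuals against variation around the mean of y.
r2_centered = 1.0 - ssr / np.sum((y - y.mean()) ** 2)

print("uncentered:", r2_uncentered)
print("centered:  ", r2_centered)
```

Because sum(y^2) is never smaller than the variation around the mean, the uncentered value is never below the centered one, so an R-squared from a no-intercept model is not directly comparable with one from a model that includes an intercept.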

----------------------------
##

Q. Create a dataframe named 'X' that includes all the feature columns by
dropping the target column.
Assign the 'target' column to variable Y.

Ans.
X = dataset.drop('target', axis = 1)
Y = dataset['target']

Q.
Now the dataframe X has just the features that influence the target.
Print the correlation matrix for dataframe X. Use the '.corr()' function to compute
the correlation matrix.
From the correlation matrix, note down the correlation value between 'CRIM' and
'PTRATIO' and assign it to variable 'corr_value'.

Ans.
###Start code here
#print correlation matrix for X
print(X.corr())
corr_value = X['CRIM'].corr(X['PTRATIO'])
print(corr_value)
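
The same value can be read in more than one way. The sketch below, on synthetic stand-in data (the numbers are assumptions, only the column names 'CRIM' and 'PTRATIO' come from the exercise), shows that looking the value up in the matrix, calling Series.corr, and computing the Pearson coefficient by hand all agree.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for two of the feature columns; values are made up.
rng = np.random.default_rng(2)
crim = rng.exponential(3.0, size=150)
ptratio = 18.0 + 0.2 * crim + rng.normal(0.0, 2.0, size=150)
X = pd.DataFrame({"CRIM": crim, "PTRATIO": ptratio})

# Option 1: look the value up in the full correlation matrix.
corr_matrix = X.corr()
from_matrix = corr_matrix.loc["CRIM", "PTRATIO"]

# Option 2: correlate the two columns directly, as in the answer above.
from_series = X["CRIM"].corr(X["PTRATIO"])

# Option 3: Pearson correlation computed by hand from deviations about the mean.
dx = X["CRIM"] - X["CRIM"].mean()
dy = X["PTRATIO"] - X["PTRATIO"].mean()
by_hand = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(from_matrix, from_series, by_hand)
```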

Q. Import statsmodels as sm.
Initialize the OLS model with target Y and dataframe X (features).
Fit the model and print the summary.

Ans.
###Start code here
import statsmodels.api as sm
statsModel = sm.OLS(Y, X)
fittedModel = statsModel.fit()
print(fittedModel.summary())

###End code(approx 4 lines)

Q. From the summary report, note down the R-squared value and assign it to variable
'r_squared'.

Ans.
###Start code here
r_squared = fittedModel.rsquared
###End code(approx 1 line)
with open("output.txt", "w") as text_file:
    text_file.write("corr= %f\n" % corr_value)
    text_file.write("rsquared= %f\n" % r_squared)

Q. Print the value of r_squared.

Ans.
print(r_squared)
