Simple Explanation of Statsmodel Linear Regression Model Summary
Image by Author
Introduction
https://medium.com/towards-data-science/simple-explanation-of-statsmodel-linear-regression-model-summary-35961919868b 1/19
9/17/23, 11:47 AM Simple Explanation of Statsmodel Linear Regression Model Summary | by Md Sohel Mahmood | Towards Data Science
Regression analysis is the bread and butter of many statisticians and data
scientists. We perform simple and multiple linear regression for prediction and
always want a robust model that is as free from bias as possible. In this article,
I discuss the summary output of Python's statsmodels library using a simple
example and explain how the values reflect model performance.
For the purpose of demonstration, I will use Kaggle's Salary dataset (Apache 2.0
open source license). This dataset has two columns: years of experience and salary.
I have added two more columns: Projects and People_managing.
Sample data
When we use statsmodels with all three variables to predict Salary, we get the
following summary result.
Dep variable
The model is OLS, which stands for Ordinary Least Squares. It finds the linear
expression for the dataset that minimizes the sum of squared residuals.
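To make "minimizes the sum of squared residuals" concrete, here is a small sketch with NumPy on synthetic data (the data and the size of the perturbation are assumptions for illustration). Any coefficients other than the least-squares ones give a larger sum of squared residuals.

```python
import numpy as np

# Synthetic single-predictor data (illustrative values only).
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 30)
y = 25000 + 9500 * x + rng.normal(0, 5000, 30)

# Design matrix: a column of ones for the intercept plus the predictor.
X = np.column_stack([np.ones_like(x), x])

def ssr(beta):
    """Sum of squared residuals for a given coefficient vector."""
    resid = y - X @ beta
    return float(resid @ resid)

# Least-squares solution.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Nudge the coefficients away from the least-squares solution.
beta_other = beta_ols + np.array([500.0, -100.0])

ssr_ols = ssr(beta_ols)
ssr_other = ssr(beta_other)
print(f"SSR at OLS solution: {ssr_ols:.1f}")
print(f"SSR after nudging:   {ssr_other:.1f}")
```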
Covariance type
R-squared
Adj. R-squared
As we add more independent variables to our model, R-squared never decreases,
even when those variables contribute nothing towards explaining the dependent
variable. Therefore the addition of each unnecessary variable needs some sort of
penalty. Adjusted R-squared applies this penalty by correcting the original
R-squared for the number of variables in the model. In essence, we should always
look at the adjusted R-squared value while performing multiple linear regression.
For a single independent variable, R-squared and adjusted R-squared are nearly the
same, with the adjusted value slightly lower.
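The penalty can be written out with the standard adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below uses a hypothetical R-squared of 0.90 and 30 observations to show how the adjusted value drops as predictors are added.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical values: the same R-squared with more and more predictors.
r2 = 0.90
for p in (1, 3, 10):
    print(p, round(adjusted_r2(r2, n_obs=30, n_predictors=p), 4))
```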
The coef column lists the estimated coefficient for each independent variable
along with the intercept. Std err is the standard error of each coefficient, that
is, the estimated standard deviation of the coefficient estimate if the model were
refit on repeated samples. A smaller value means the coefficient is estimated more
precisely. When using only one predicting variable, the standard error can be
obtained from this two dimensional space as shown below.
Image by Author
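As a rough sketch of where the std err values come from, the code below computes them by hand for one predictor on synthetic data (the data is an assumption for illustration). It uses the textbook formula, the square root of the diagonal of s²(X'X)⁻¹, and checks it against the one-predictor shortcut for the slope.

```python
import numpy as np

# Synthetic single-predictor data (illustrative values only).
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 30)
y = 25000 + 9500 * x + rng.normal(0, 5000, 30)

n = len(x)
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Residual variance with n - 2 degrees of freedom (intercept + slope).
s2 = resid @ resid / (n - 2)

# General formula: standard errors for intercept and slope.
se_matrix = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

# One-predictor shortcut for the slope only.
se_slope = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))

print("std err (intercept, slope):", se_matrix)
print("std err (slope, shortcut): ", se_slope)
```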
F-statistics
The F-test checks all the independent variables together to see whether at least
one of them is related to the dependent variable. If Prob(F-statistic) is greater
than 0.05, there is no evidence of a relationship between any of the independent
variables and the output. If it is less than 0.05, we can say that at least one
variable is significantly related to the output. In our example, the p-value is
less than 0.05 and therefore one or more of the independent variables are related
to the output variable Salary. We have seen previously that YearsExperience is
significantly related to Salary but the others are not. Therefore, the F-test
result is consistent with the t-test outcomes. However, the two tests can
disagree: Prob(F-statistic) may be greater than 0.05 even though one of the
independent variables has a significant t-test. This is because each t-test checks
a single coefficient while the other variables are held in the model, whereas the
F-test checks the combined effect of all variables at once.
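For a model with an intercept, the overall F-statistic can be computed directly from R-squared as (R²/p) / ((1 - R²)/(n - p - 1)). The sketch below uses hypothetical inputs; the p-value would then come from an F distribution with p and n - p - 1 degrees of freedom, which is not computed here.

```python
def f_statistic(r2: float, n_obs: int, n_predictors: int) -> float:
    """Overall F-statistic for a regression with an intercept."""
    explained = r2 / n_predictors                        # per predictor
    unexplained = (1.0 - r2) / (n_obs - n_predictors - 1)  # per residual df
    return explained / unexplained

# Hypothetical values: R-squared 0.90, 30 observations, 3 predictors.
print(f_statistic(0.90, n_obs=30, n_predictors=3))
```

On a fitted statsmodels result, the same quantities are exposed as `fvalue` and `f_pvalue`.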
Log-likelihood
The log-likelihood value is a measure of how well the model fits the given data.
It is useful when we compare two or more models fitted to the same data. The
higher the log-likelihood, the better the model fits the given data. It can range
from negative infinity to positive infinity.
When all three independent variables are incorporated in the model, the
log-likelihood value is -310.21, which is higher than the -334.95 obtained when
only the Projects data is included. This means the first model fits the data
better. It also goes hand in hand with the R-squared values as seen above.
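For a linear model with normally distributed errors, the maximized log-likelihood depends only on the sum of squared residuals (SSR) and the number of observations. The sketch below uses that standard Gaussian formula with hypothetical SSR values to show that a smaller SSR gives a higher log-likelihood.

```python
import math

def gaussian_loglike(ssr: float, n_obs: int) -> float:
    """Maximized Gaussian log-likelihood of a linear model, given its SSR."""
    half_n = n_obs / 2.0
    return -half_n * (math.log(2 * math.pi) + math.log(ssr / n_obs) + 1.0)

# Hypothetical SSR values for two models fitted to the same 30 observations.
ll_better = gaussian_loglike(ssr=100.0, n_obs=30)
ll_worse = gaussian_loglike(ssr=400.0, n_obs=30)
print(ll_better, ll_worse)
```

This is why the log-likelihood moves together with R-squared when the models are compared on the same data.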
The Omnibus test checks the normality of the residuals once the model is fitted.
A value of zero means the residuals are perfectly normal. Here, in the example,
Prob(Omnibus) is 0.357. This is a p-value: if the residuals really were normally
distributed, a statistic at least this large would occur about 35.7% of the time,
so at the 0.05 level we cannot reject normality. It is not the probability that
the residuals are normal. For a model to be robust, besides checking R-squared and
other metrics, the residual distribution should ideally be normal. Separately, the
residuals should not follow any pattern when plotted against the fitted values.
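A sketch of the same kind of check using SciPy's `normaltest`, an omnibus test that combines skewness and kurtosis. To my understanding the Omnibus value in the summary is based on this test, but treat that as an assumption. The two residual samples below are synthetic: one drawn from a normal distribution and one deliberately skewed.

```python
import numpy as np
from scipy import stats

# Synthetic "residuals" (illustrative only).
rng = np.random.default_rng(3)
samples = {
    "normal": rng.normal(0, 1, 200),
    "skewed": rng.exponential(1.0, 200) - 1.0,
}

p_values = {}
for name, resid in samples.items():
    stat, p_value = stats.normaltest(resid)
    p_values[name] = p_value
    verdict = "cannot reject normality" if p_value > 0.05 else "reject normality"
    print(f"{name}: statistic={stat:.2f}, p={p_value:.4f} -> {verdict}")
```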
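The Skew value tells us the skewness of the residual distribution. Normally
distributed variables have a skew of 0. Kurtosis measures whether the distribution
is light-tailed or heavy-tailed compared to the normal distribution. High kurtosis
indicates heavier tails and more outliers, and low kurtosis indicates lighter
tails. Note that the summary reports plain (not excess) kurtosis, for which a
normal distribution scores 3. The common rule of thumb that a value between -2 and
+2 is consistent with normality refers to excess kurtosis (kurtosis minus 3),
which corresponds to a reported value between 1 and 5.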
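Both quantities are simple functions of the central moments of the residuals. The sketch below computes them in plain Python for two small hypothetical residual sets, one symmetric and one right-skewed, using population (biased) moments and plain kurtosis (normal = 3).

```python
def skew_and_kurtosis(values):
    """Return (skewness, plain kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical residuals.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [-1.0, -1.0, -1.0, 0.0, 3.0]

skew_sym, kurt_sym = skew_and_kurtosis(symmetric)
skew_right, kurt_right = skew_and_kurtosis(right_skewed)
print("symmetric:   ", skew_sym, kurt_sym)
print("right-skewed:", skew_right, kurt_right)
```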
Durbin-Watson
Jarque-Bera (JB) and Prob(JB) are similar to the Omnibus test: they measure the
normality of the residuals.
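The JB statistic combines skewness S and plain kurtosis K as JB = n/6 · (S² + (K - 3)²/4) and is compared against a chi-square distribution with 2 degrees of freedom, whose upper-tail probability has the closed form exp(-JB/2). A minimal sketch on hypothetical residuals:

```python
import math

def skew_and_kurtosis(values):
    """Return (skewness, plain kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def jarque_bera(values):
    """Return (JB statistic, p-value) for a sample of residuals."""
    n = len(values)
    skew, kurt = skew_and_kurtosis(values)
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Upper-tail probability of a chi-square distribution with 2 df.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical residuals
jb, p_value = jarque_bera(residuals)
print(jb, p_value)
```

As with Prob(Omnibus), a large p-value means normality cannot be rejected. With very few observations, as in this toy example, the test has little power.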
Condition Number
A high condition number indicates possible multicollinearity in the dataset. If
only one variable is used as a predictor, this value is usually low and can be
ignored. We can proceed as in stepwise regression and check whether the condition
number rises when additional variables are included.
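A small sketch of how multicollinearity shows up in the condition number, using `numpy.linalg.cond` on two synthetic design matrices (the data is an assumption for illustration). In the second matrix one predictor is almost a copy of the other.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
x1 = rng.normal(0, 1, n)
x2_independent = rng.normal(0, 1, n)
x2_collinear = x1 + rng.normal(0, 0.01, n)  # almost identical to x1

def design(*cols):
    """Design matrix with an intercept column followed by the predictors."""
    return np.column_stack([np.ones(n), *cols])

cond_independent = np.linalg.cond(design(x1, x2_independent))
cond_collinear = np.linalg.cond(design(x1, x2_collinear))
print(f"independent predictors: {cond_independent:.1f}")
print(f"collinear predictors:   {cond_collinear:.1f}")
```

The condition number also grows when predictors are on very different scales, so a large value is a prompt to investigate rather than proof of multicollinearity.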
Conclusion
We have discussed all the summary parameters from the statsmodels output. This
will be useful for readers who want to check all the metrics for a robust model.
Most of the time, we look at the R-squared value to make sure that the model
explains most of the variability, but we have seen that there is much more to it
than that.