predictive modelling outputs

The document outlines the performance evaluation of a linear regression model on both training and test datasets, reporting metrics such as RMSE, MAE, and MAPE. It also discusses the process of checking linear regression assumptions, including multicollinearity, linearity, independence, normality, and homoscedasticity, along with methods for feature selection based on p-values. Finally, the document presents the final model summary and performance metrics after addressing the assumptions and refining the model.

# checking model performance on train set (seen 70% of the data)
print("Training Performance\n")
olsmodel1_train_perf = model_performance_regression(olsmodel1, x_train, y_train)
olsmodel1_train_perf

Training Performance

       RMSE       MAE       MAPE
0  1.127269  0.844911  26.957451
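The helper model_performance_regression is not defined in this excerpt; a minimal sketch of what it likely computes, based on the metric columns shown above (the function name and return layout are taken from the calls in this document, the implementation itself is an assumption):

```python
import numpy as np
import pandas as pd

def model_performance_regression(model, predictors, target):
    # Sketch of the helper used in this notebook -- the original definition
    # is not shown, so this is a plausible reconstruction, not the real code.
    pred = model.predict(predictors)                 # predictions from the fitted model
    errors = np.asarray(target) - np.asarray(pred)   # prediction errors
    rmse = np.sqrt(np.mean(errors ** 2))                       # root mean squared error
    mae = np.mean(np.abs(errors))                              # mean absolute error
    mape = np.mean(np.abs(errors / np.asarray(target))) * 100  # mean absolute % error
    return pd.DataFrame({"RMSE": [rmse], "MAE": [mae], "MAPE": [mape]})
```

Any object with a predict method (such as a fitted statsmodels OLS model) can be passed as the first argument.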

# checking model performance on test set (unseen 30% of the data)
print("Test Performance\n")
olsmodel1_test_perf = model_performance_regression(olsmodel1, x_test, y_test)
olsmodel1_test_perf

Output:

Test Performance

       RMSE      MAE       MAPE
0  1.030785  0.81383  23.978281

Checking Linear Regression Assumptions

We will be checking the following Linear Regression assumptions:


1. No Multicollinearity
2. Linearity of variables
3. Independence of error terms
4. Normality of error terms
5. No Heteroscedasticity

TEST FOR MULTICOLLINEARITY

Dropping variables with high VIF

We will test for multicollinearity using the Variance Inflation Factor (VIF).

General rule of thumb:

- If VIF is 1, there is no correlation between that predictor and the remaining predictor variables.
- If VIF exceeds 5 or is close to 5, there is moderate multicollinearity. It can or cannot be treated, with proper reasoning.
- If VIF is 10 or more, it shows signs of high multicollinearity and must be treated.
Feature VIF

0 const 6.124153

1 capital 4.014583

2 patents 2.986430

3 randd 5.545531

4 employment 3.593570

5 tobinq 1.064449

6 value 2.799430

7 institutions 1.331542

8 sp500_yes 1.622238
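VIF tables like the one above can be produced with statsmodels' variance_inflation_factor. A sketch on synthetic stand-in data (the column names are borrowed from the table above for illustration; the data itself is randomly generated, not the actual dataset):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative stand-in for x_train: a constant plus a few predictors,
# with "randd" deliberately correlated with "capital" to inflate their VIFs.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "const": 1.0,
    "capital": rng.normal(size=100),
    "patents": rng.normal(size=100),
})
X["randd"] = X["capital"] * 0.8 + rng.normal(scale=0.5, size=100)

# VIF of each column = 1 / (1 - R^2) from regressing it on the other columns
vif = pd.DataFrame({
    "Feature": X.columns,
    "VIF": [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
})
print(vif)
```

The correlated pair ("capital" and "randd") gets noticeably higher VIFs than the independent column ("patents"); the constant's VIF is conventionally ignored.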

Dropping high p-value variables

We will drop the predictor variables having a p-value greater than 0.05 as they do not
significantly impact the target variable.

But sometimes p-values change after dropping a variable. So, we'll not drop all variables
at once.

Instead, we will do the following:

1. Build a model, check the p-values of the variables, and drop the column with the highest p-value.
2. Create a new model without the dropped feature, check the p-values of the variables, and drop the column with the highest p-value.
3. Repeat the above two steps till there are no columns with p-value > 0.05.

The above process can also be done manually by picking one variable at a time that has
a high p-value, dropping it, and building a model again. But that might be a little tedious
and using a loop will be more efficient.

# initial list of columns
cols = x_train.columns.tolist()

# setting an initial max p-value
max_p_value = 1

while len(cols) > 0:
    # defining the train set
    x_train_aux = x_train[cols]

    # fitting the model
    model = sm.OLS(y_train, x_train_aux).fit()

    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)

    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()

    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break

selected_features = cols
print(selected_features)

Output:
['const', 'employment', 'tobinq', 'value', 'institutions', 'sp500_yes']

# checking model performance on train set (seen 70% of the data)
print("Training Performance\n")
olsmodel2_train_perf = model_performance_regression(olsmodel2, x_train2, y_train)
olsmodel2_train_perf

Training Performance

RMSE MAE MAPE

0 1.131306 0.843946 26.941502

# checking model performance on test set (unseen 30% of the data)
print("Test Performance\n")
olsmodel2_test_perf = model_performance_regression(olsmodel2, x_test2, y_test)
olsmodel2_test_perf

Test Performance

       RMSE       MAE       MAPE
0  1.030857  0.812045  23.962577

TEST FOR LINEARITY AND INDEPENDENCE

We will test for linearity and independence by plotting the fitted values against the residuals and checking for patterns.

If there is no pattern, the model is linear and the residuals are independent. Otherwise, the model shows signs of non-linearity and the residuals are not independent.
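A sketch of such a fitted-vs-residuals plot with matplotlib, using randomly generated fitted values and residuals in place of the df_pred DataFrame built elsewhere in the notebook (the data here is purely illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical stand-ins for df_pred["Fitted Values"] and df_pred["Residuals"]
rng = np.random.default_rng(0)
fitted = rng.normal(6, 1.5, size=200)
residuals = rng.normal(0, 1, size=200)  # patternless scatter around zero

fig, ax = plt.subplots()
ax.scatter(fitted, residuals, alpha=0.6)
ax.axhline(0, color="red", linestyle="--")  # reference line at zero residual
ax.set_xlabel("Fitted Values")
ax.set_ylabel("Residuals")
ax.set_title("Fitted vs Residuals")
fig.savefig("fitted_vs_residuals.png")
```

A roughly even band of points around the zero line, with no funnel or curve, is the pattern-free picture described above.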

     Actual Values  Fitted Values  Residuals
652       5.772882       5.774955  -0.002073
366       6.340426       5.396227   0.944200
447       9.259054       9.546073  -0.287019
618       6.229126       5.588692   0.640434
610       5.455543       5.333378   0.122165

TEST FOR NORMALITY

We will test for normality of the error terms using the Shapiro-Wilk test. If the p-value is greater than 0.05, the residuals can be considered normally distributed.

stats.shapiro(df_pred["Residuals"])

ShapiroResult(statistic=0.9822825883697879, pvalue=1.4029046526104298e-06)

Since the p-value is less than 0.05, the residuals are not normally distributed.

TEST FOR HOMOSCEDASTICITY

We will test for homoscedasticity by using the Goldfeld-Quandt test.

If we get a p-value greater than 0.05, we can say that the residuals are homoscedastic.
Otherwise, they are heteroscedastic.

import statsmodels.stats.api as sms


from statsmodels.compat import lzip
name = ["F statistic", "p-value"]
test = sms.het_goldfeldquandt(df_pred["Residuals"], x_train2)
lzip(name, test)

[('F statistic', 1.0339466399164485), ('p-value', 0.38839487294695685)]

Since the p-value is greater than 0.05, the residuals are homoscedastic.

Final Model Summary

olsmodel_final = sm.OLS(y_train, x_train2).fit()


print(olsmodel_final.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                  sales   R-squared:                       0.667
Model:                            OLS   Adj. R-squared:                  0.664
Method:                 Least Squares   F-statistic:                     233.5
Date:                Sun, 05 Jan 2025   Prob (F-statistic):          1.09e-136
Time:                        15:56:45   Log-Likelihood:                -909.96
No. Observations:                 590   AIC:                             1832.
Df Residuals:                     584   BIC:                             1858.
Df Model:                           5
Covariance Type:            nonrobust
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
const            4.7867      0.115     41.533      0.000       4.560       5.013
employment       0.0053      0.001      3.947      0.000       0.003       0.008
tobinq          -0.1406      0.015     -9.555      0.000      -0.170      -0.112
value         7.475e-05   8.81e-06      8.488      0.000    5.74e-05     9.2e-05
institutions     0.0251      0.002     10.122      0.000       0.020       0.030
sp500_yes        1.4786      0.129     11.487      0.000       1.226       1.731
==============================================================================
Omnibus:                       25.118   Durbin-Watson:                   1.983
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               68.680
Skew:                          -0.020   Prob(JB):                     1.22e-15
Kurtosis:                       4.671   Cond. No.                     2.28e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.28e+04. This might indicate that there are strong multicollinearity or other numerical problems.

# checking model performance on train set (seen 70% of the data)
print("Training Performance\n")
olsmodel_final_train_perf = model_performance_regression(
olsmodel_final, x_train2, y_train
)
olsmodel_final_train_perf

Training Performance

RMSE MAE MAPE

0 1.131306 0.843946 26.941502

# checking model performance on test set (unseen 30% of the data)
print("Test Performance\n")
olsmodel_final_test_perf = model_performance_regression(olsmodel_final, x_test2, y_test)
olsmodel_final_test_perf

Test Performance

RMSE MAE MAPE

0 1.030857 0.812045 23.962577
