predictive modelling outputs
print("Training Performance\n")
olsmodel1_train_perf = model_performance_regression(olsmodel1, x_train, y_train)
olsmodel1_train_perf
Training Performance
0.84491
1.127269 26.95745
1
Output
Test Performance
   Feature       VIF
0  const         6.124153
1  capital       4.014583
2  patents       2.986430
3  randd         5.545531
4  employment    3.593570
5  tobinq        1.064449
6  value         2.799430
7  institutions  1.331542
8  sp500_yes     1.622238
Dropping high p-value variables
We will drop the predictor variables that have a p-value greater than 0.05, because they
are not statistically significant predictors of the target variable.
But sometimes p-values change after dropping a variable. So, we'll not drop all variables
at once.
Build a model, check the p-values of the variables, and drop the column with the highest
p-value.
Create a new model without the dropped feature, check the p-values of the variables,
and drop the column with the highest p-value.
Repeat the above two steps till there are no columns with p-value > 0.05.
The above process can also be done manually by picking one high p-value variable at a
time, dropping it, and building the model again. That is tedious, so using a loop is more
efficient.
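The loop described above can be sketched as follows. This is a minimal illustration, not the notebook's own code: `backward_eliminate` and `ols_pvalues` are hypothetical helper names, the intercept column is assumed to be called `const` and is never dropped, and `x_train` / `y_train` are assumed to be a pandas DataFrame and Series. statsmodels is imported lazily so the selection logic can be read and tested on its own.

```python
def backward_eliminate(columns, pvalues_for, threshold=0.05, keep=("const",)):
    """Drop the highest p-value column one at a time until none exceeds threshold.

    pvalues_for(cols) must refit the model on `cols` and return {column: p-value}.
    Columns listed in `keep` (assumed: the intercept) are never dropped.
    """
    selected = list(columns)
    while True:
        pvalues = pvalues_for(selected)  # refit on the current column set
        candidates = {c: pvalues[c] for c in selected if c not in keep}
        if not candidates:
            break
        worst = max(candidates, key=candidates.get)
        if candidates[worst] <= threshold:
            break  # every remaining column is significant
        selected.remove(worst)
    return selected


def ols_pvalues(x_train, y_train):
    """Return a callable that fits statsmodels OLS on a subset of columns."""
    import statsmodels.api as sm  # imported lazily; only needed for real fits

    def pvalues_for(cols):
        return sm.OLS(y_train, x_train[cols]).fit().pvalues.to_dict()

    return pvalues_for
```

Under these assumptions, a call such as `backward_eliminate(x_train.columns, ols_pvalues(x_train, y_train))` would return the list of retained column names. Refitting inside the loop matters because p-values can change each time a column is removed.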
Output
['const', 'employment', 'tobinq', 'value', 'institutions', 'sp500_yes']
Training Performance
Test Performance
We will test for linearity and independence by making a plot of fitted values vs residuals
and checking for patterns.
If there is no pattern, then we say the model is linear and residuals are independent.
Otherwise, the model is showing signs of non-linearity and residuals are not
independent.
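A minimal sketch of this check, assuming residuals are defined as actual minus fitted values. The helper names `residual_table` and `plot_residuals_vs_fitted` are hypothetical, and matplotlib is assumed for the plot (imported lazily so the residual calculation works without it).

```python
def residual_table(actual, fitted):
    """Pair each actual value with its fitted value and residual (actual - fitted)."""
    return [
        {"actual": a, "fitted": f, "residual": a - f}
        for a, f in zip(actual, fitted)
    ]


def plot_residuals_vs_fitted(rows):
    """Scatter residuals against fitted values with a zero reference line."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fitted = [r["fitted"] for r in rows]
    residuals = [r["residual"] for r in rows]
    fig, ax = plt.subplots()
    ax.scatter(fitted, residuals, alpha=0.6)
    ax.axhline(0, color="grey", linestyle="--")  # residuals should scatter around 0
    ax.set_xlabel("Fitted values")
    ax.set_ylabel("Residuals")
    ax.set_title("Fitted vs residuals")
    return ax
```

A random cloud around the zero line suggests no pattern. A curve or funnel shape would point to non-linearity or non-constant variance.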
     Actual Values  Fitted Values  Residuals
652  5.772882       5.774955       -0.002073
366  6.340426       5.396227        0.944200
447  9.259054       9.546073       -0.287019
618  6.229126       5.588692        0.640434
610  5.455543       5.333378        0.122165
stats.shapiro(df_pred["Residuals"])
ShapiroResult(statistic=0.9822825883697879, pvalue=1.4029046526104298e-06)
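The Shapiro-Wilk p-value above (about 1.4e-06) is below 0.05, so we reject the hypothesis
that the residuals are normally distributed.
For the homoscedasticity test, if we get a p-value greater than 0.05, we can say that the
residuals are homoscedastic. Otherwise, they are heteroscedastic.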
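The decision rule can be sketched as below. The source does not name the homoscedasticity test it uses; the Goldfeld-Quandt test from statsmodels (`het_goldfeldquandt`) is assumed here purely for illustration, and the helper names `is_homoscedastic` and `goldfeld_quandt_pvalue` are hypothetical.

```python
def is_homoscedastic(p_value, alpha=0.05):
    """Return True when the test fails to reject constant residual variance."""
    return p_value > alpha


def goldfeld_quandt_pvalue(residuals, x_train):
    """Run the Goldfeld-Quandt test (an assumed choice) and return its p-value."""
    import statsmodels.stats.api as sms  # imported lazily; only needed for the test

    f_stat, p_value, _ordering = sms.het_goldfeldquandt(residuals, x_train)
    return p_value
```

With residuals and training features in hand, `is_homoscedastic(goldfeld_quandt_pvalue(residuals, x_train))` would apply the 0.05 threshold stated above.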
Training Performance
Test Performance