Practical 6 Multiple Linear Regression Using SPSS

This document outlines the process of developing a multiple linear regression model in SPSS to predict monthly revenue from several independent variables. It discusses key metrics for evaluating regression performance, including R-squared, RMSE, and MAE, and highlights issues such as multicollinearity and non-normality of residuals. Ultimately, a simplified model using only the average room price variable is recommended, because the other variables are insignificant and the simplified model satisfies the regression assumptions.


Multiple Linear Regression Using SPSS


Assumptions of multiple linear regression
• Linearity of the relationship between the dependent and independent variables
• Independence of residuals
• Homoscedasticity (constant variance of residuals)
• Normality of residuals
• No multicollinearity among the independent variables
Metrics to evaluate regression
• 1. R-squared (R²):
• Definition: Measures the proportion of the variance in the dependent variable that is predictable from the independent variables (see the formula below).
• Range: 0 to 1.
• Higher is better: values closer to 1 indicate better model performance.
• Limitations:
• Does not convey the magnitude of prediction errors.
• Sensitive to overfitting (especially in high-dimensional datasets).
• Does not handle nonlinear relationships well unless the variables are transformed appropriately.
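For reference, R-squared can be written in terms of the residual and total sums of squares, where y_i are the actual values, ŷ_i the predictions, and ȳ the mean of the actual values:

R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}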
Metrics to evaluate regression…
2. Root Mean Squared Error (RMSE):
• Definition: The square root of the average squared difference between predicted and actual values (see the formula below).
• Range: 0 to ∞.
• Lower is better: a smaller RMSE indicates better model performance.
• Strengths:
• Penalizes large errors more than small ones (due to squaring).
• Useful when large deviations are more concerning.
• Limitations: Sensitive to outliers due to squaring.
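In symbols, following the definition above:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}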
Metrics to evaluate regression…
• 3. Mean Absolute Error (MAE):
• Definition: The average of the absolute differences between predicted and actual values (see the formula below).
• Range: 0 to ∞.
• Lower is better: a smaller MAE indicates better model performance.
• Strengths:
• Provides a straightforward measure of average error magnitude.
• Less sensitive to outliers compared to RMSE.
• Limitations: Does not account for the variance in error magnitudes.
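In symbols, following the definition above:

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|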
Dataset

Problem Statement: Develop a regression model to predict monthly revenue based on average room price, guest satisfaction score, and marketing expenses.
Enter data
• Input the dataset.
• Define the variable names Monthly_Revenue, Avg_Room_Price, Guest_Satisfaction, and Marketing_Expenses in the "Variable View" tab.
• Enter the dataset in the "Data View" tab.
• Go to Analyze > Regression > Linear.
• Select Monthly Revenue as the dependent variable.
• Select Average Room Price, Guest Satisfaction Score, and Marketing Expenses as independent variables (equivalent syntax is sketched below).
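The same model can also be requested through an SPSS syntax window instead of the dialogs. A minimal sketch, assuming the variable names defined above:

  * Minimal regression command; default statistics apply.
  REGRESSION
    /DEPENDENT Monthly_Revenue
    /METHOD=ENTER Avg_Room_Price Guest_Satisfaction Marketing_Expenses.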
Statistics Options
• Click on Statistics.
• Select Collinearity diagnostics to test the multicollinearity assumption.
• Under the Residuals options, select Durbin-Watson to test the independence-of-residuals assumption (see the matching subcommands below).
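In the pasted syntax, these dialog choices correspond to the following REGRESSION subcommands (COLLIN and TOL request the collinearity diagnostics; DURBIN requests the Durbin-Watson statistic):

  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /RESIDUALS DURBIN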
Plots options
• Click on Plots.
• Add *ZRESID (standardized residuals) to the Y axis.
• Add *ZPRED (standardized predicted values) to the X axis.
• This generates the scatter plot used to check the homoscedasticity assumption.
• Select Histogram and Normal probability plot; these test the normality of the residuals (see the matching subcommands below).
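In the pasted syntax, these plot choices correspond to:

  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID)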
Save options
• Click on the Save button.
• Under Predicted Values, select Unstandardized. This saves the regression's predicted values as a new variable in Data View.
• Under Residuals, select Unstandardized. This saves the errors/residuals to the data.
• Click Continue, then click OK.
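Putting all of the dialog choices together, the pasted syntax should look roughly like the sketch below (variable names as defined above; by default, SPSS names the saved predicted values PRE_1 and the saved residuals RES_1):

  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
    /DEPENDENT Monthly_Revenue
    /METHOD=ENTER Avg_Room_Price Guest_Satisfaction Marketing_Expenses
    /SCATTERPLOT=(*ZRESID ,*ZPRED)
    /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID)
    /SAVE PRED RESID.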
Output

Our model's R-square is 0.984, which indicates that 98.4% of the variation in monthly revenue is explained by the independent variables. This suggests a very good fit.
Durbin-Watson Test for Independence of Residuals

From the previous slide, the Durbin-Watson value is 1.448. Because this falls below the commonly used lower bound of 1.5, it suggests positive autocorrelation among the residuals.
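For reference, the Durbin-Watson statistic is computed from successive residuals e_t. It ranges from 0 to 4; values near 2 indicate no autocorrelation, and values well below 2 suggest positive autocorrelation:

d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}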
ANOVA for Regression

The ANOVA (Analysis of Variance) table in regression analysis evaluates whether the regression model explains a significant portion of the variation in the dependent variable. It essentially tests the null hypothesis that all regression coefficients (except the intercept) are zero.

Since the null hypothesis is rejected, we can say that at least one regression coefficient is nonzero.
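The F statistic in the ANOVA table is the ratio of explained to unexplained variance, with k predictors and n observations:

F = \frac{SS_{reg}/k}{SS_{res}/(n - k - 1)} = \frac{MS_{reg}}{MS_{res}}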
T-test for the regression coefficients

In regression analysis, the t-test for regression coefficients evaluates whether each independent variable significantly contributes to predicting the dependent variable. Specifically, it tests the null hypothesis (H0) that a coefficient (β) is equal to zero, indicating no effect.

Only average room price contributes significantly to predicting the dependent variable. We can remove the guest satisfaction and marketing expenses variables.

The VIF (Variance Inflation Factor) is high (>10) for Avg_Room_Price and Marketing_Expenses. Hence, multicollinearity exists here (see the formulas below).
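For reference, the t statistic for each coefficient and the VIF for each predictor are computed as follows, where R_j^2 is the R-squared from regressing predictor j on the remaining predictors; VIF values above 10 are commonly taken to indicate multicollinearity:

t_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}, \qquad VIF_j = \frac{1}{1 - R_j^2}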
Correlation between independent variables

Since the correlations are high, we can say that multicollinearity exists in the data.
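The correlation matrix can be produced via Analyze > Correlate > Bivariate, or with syntax (a sketch, using the variable names defined earlier):

  CORRELATIONS
    /VARIABLES=Avg_Room_Price Guest_Satisfaction Marketing_Expenses.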
Normality Testing

The residual histogram and normal probability plot indicate that the residuals are not normally distributed.


Testing Normality of Residuals Using Statistical Tests

The residuals are not normally distributed, as indicated by both statistical tests.
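The tests referred to here are presumably the Kolmogorov-Smirnov and Shapiro-Wilk tests, which SPSS produces via Analyze > Descriptive Statistics > Explore. A syntax sketch, assuming the residuals were saved as RES_1:

  * NPPLOT requests the normality plots and the tests-of-normality table.
  EXAMINE VARIABLES=RES_1
    /PLOT NPPLOT.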


Test for homoscedasticity

All the points in this scatter plot appear randomly distributed, with no discernible pattern. Therefore, the residuals exhibit homoscedasticity.
Regression line

The line of regression is Y = 2569.615 + 19.845 × X1 + 29.883 × X2 + 0.111 × X3, where X1 is the average room price, X2 is the guest satisfaction score, and X3 is the marketing expenses.
Evaluate the model
• Calculate the following model evaluation metrics (one way to compute them in SPSS is sketched below):
• R² = 0.983
• RMSE = 56.05
• MAE = 38.56
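SPSS does not report RMSE and MAE directly for linear regression, but both can be derived from the saved residuals. A sketch, assuming the residuals were saved as RES_1: MAE is the mean of the absolute errors, and RMSE is the square root of the mean of the squared errors reported by DESCRIPTIVES.

  * Derive absolute and squared errors from the saved residuals.
  COMPUTE abs_err = ABS(RES_1).
  COMPUTE sq_err = RES_1**2.
  EXECUTE.
  * Mean of abs_err = MAE; square root of mean of sq_err = RMSE.
  DESCRIPTIVES VARIABLES=abs_err sq_err
    /STATISTICS=MEAN.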
Assumptions violated
• Independence of residuals (Durbin-Watson = 1.448).
• No multicollinearity (high VIFs and high correlations between predictors).
• Normality of residuals.
Build a new regression model
• Here we will use a model with the average room price variable only, as the contributions of the other variables were found to be insignificant (a syntax sketch follows).
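The simplified model can be fitted in the same way, dropping the insignificant predictors. A sketch, with the same variable names and diagnostic options as before:

  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA
    /DEPENDENT Monthly_Revenue
    /METHOD=ENTER Avg_Room_Price
    /SCATTERPLOT=(*ZRESID ,*ZPRED)
    /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID)
    /SAVE PRED RESID.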
Check R-square and Durbin-Watson Test

The R-square is 0.98, and the Durbin-Watson statistic falls within the acceptable 1.5 to 2.5 range; hence, the residuals are independent of each other.
Test for Homoscedasticity

No pattern exists in the scatter plot; hence, the homoscedasticity assumption is satisfied.


Normality Test for Residuals

The residuals are normally distributed, as the p-value is greater than 0.05.
Regression line

The regression model is Y = 2473.982 + 25.340 × X1, where X1 is the average room price variable.
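As an illustration, with a hypothetical average room price of 100 (in the dataset's units), the predicted monthly revenue would be:

\hat{Y} = 2473.982 + 25.340 \times 100 = 5007.982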
Conclusions
• R-square is 0.98.
• RMSE = 62.47
• MAE = 47.13
• No multicollinearity, as there is only a single independent variable.
• RMSE and MAE are higher than for the model built with three variables.
• All of the regression assumptions are satisfied by the single-predictor model, although its RMSE and MAE values are higher.
A few other concepts related to models
• Training data
• Test data
• Overfitting
• Underfitting
