Business Analytics - Prediction Model
Evaluate Model 0:
Question 1. Using the information in the training set, what is a naive prediction (without building any prediction model)
for average dollars spent per week by a panelist? Why?
The naïve prediction is the benchmark that sets the baseline against which any prediction model is judged. Instead of building a model, we simply use the mean (or median) of the output variable in the training set as the prediction for every record, since that single value is the best representation of the output without any predictors. In my case the training-set mean comes out to be $3.54 (refer to Model_0 in the Excel sheet).
Question 2. What is the RMSE on the validation set using naïve prediction?
Using the naïve prediction, the RMSE on the validation set comes out to be $1.85 (refer to Model_0 in the Excel sheet).
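As an illustration of the two steps above, the sketch below computes the naïve benchmark and its validation RMSE in Python. The file and column names (training_set.csv, validation_set.csv, AVG_DOLLARS_SPENT) are assumptions for the example only; the actual figures come from the Model_0 worksheet.

```python
import numpy as np
import pandas as pd

# Hypothetical file/column names; the real calculation sits in the Model_0 worksheet.
train = pd.read_csv("training_set.csv")
valid = pd.read_csv("validation_set.csv")

# Naive benchmark: the training-set mean of the output (~$3.54 here).
naive_pred = train["AVG_DOLLARS_SPENT"].mean()

# Every validation record gets the same constant prediction.
residuals = valid["AVG_DOLLARS_SPENT"] - naive_pred
rmse = np.sqrt(np.mean(residuals ** 2))            # ~$1.85 here
print(f"Naive prediction: ${naive_pred:.2f}, validation RMSE: ${rmse:.2f}")
```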
Evaluate Model 1:
Predictor Variables:
Question 3. Please copy the Prediction Summary for the training and the validation sets into your report. How does the
performance compare across the training and validation sets? Which error metrics can be compared across the two
summary reports? Is there an indication of over-fitting? How does this model (Model 1) compare to the Naïve model
(Model 0) as per the RMSE of this model and that of the Naïve Model?
Comparing the two summary tables, model performance deteriorates when moving from the training set to the validation set: every error metric except SSE increases. In particular, RMSE rises from $1.65 to $1.81 and R2 falls from 0.12 to 0.03, which can be an indication of over-fitting.
All metrics other than SSE can be compared across the two reports. SSE depends on the number of records, and the training and validation sets are split 60:40, so the SSE values are not directly comparable. The other metrics involve an average over the number of records at some point in their calculation, which normalizes for sample size.
Model 1 looks slightly better than the naïve model, since its RMSE on both the training set ($1.65) and the validation set ($1.81) is lower than the naïve model's RMSE ($1.85). However, because R2 also drops sharply on the validation set, we can only say that Model_1 is slightly better than Model_0.
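A small sketch of how these comparable metrics could be recomputed outside the Excel workbook is shown below; the variable names (y_train, pred_train, etc.) are placeholders for the actual Model_1 outputs, not values taken from the report.

```python
import numpy as np

def error_report(actual, predicted):
    """Fit metrics; only the averaged/scale-free ones compare across sample sizes."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    resid = actual - predicted
    sse = np.sum(resid ** 2)                    # NOT comparable: grows with the number of records
    rmse = np.sqrt(np.mean(resid ** 2))         # comparable: averaged over records
    mad = np.mean(np.abs(resid))                # comparable: averaged over records
    r2 = 1 - sse / np.sum((actual - actual.mean()) ** 2)   # comparable: scale-free
    return {"SSE": sse, "RMSE": rmse, "MAD": mad, "R2": r2}

# With the Model_1 predictions loaded as y_train/pred_train and y_valid/pred_valid,
# a large gap between the two RMSE (or R2) values signals over-fitting:
# print(error_report(y_train, pred_train))
# print(error_report(y_valid, pred_valid))
```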
Question 4. Create a histogram for the training set residuals as obtained in Model 1. What does this chart tell us about
potential prediction errors---skewed / not skewed, the nature of positive or negative errors, etc.? How is this situation
typically handled?
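The chart itself is not reproduced here, but the histogram can be generated along the lines of the sketch below. The workbook path, sheet name, and Residual column name are assumptions; the actual residuals come from the Model_1 training worksheet.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file/sheet/column names for the Model_1 training residuals.
resid = pd.read_excel("Model_1.xlsx", sheet_name="Training")["Residual"]

plt.hist(resid, bins=30, edgecolor="black")
plt.axvline(0, color="red", linestyle="--")    # a symmetric, unskewed error distribution centres here
plt.xlabel("Training residual (actual - predicted, $)")
plt.ylabel("Count")
plt.title("Model 1: training-set residuals")
plt.show()
```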
Evaluate Model 2:
Question 5. Include the histogram of the training set residuals in Model 2 in your report. How does this histogram
compare to Model 1 histogram? Can you say which model ‘better’ fits the training data?
The training metrics for Model_2 look better than those for Model_1: the error measures are considerably smaller and R2 is slightly higher. However, these numbers cannot be compared directly, because the two models are fitted on different scales (Model_1 on the original dollar scale, Model_2 on the logarithmic scale), so the residuals are in different units. The Model_2 residual histogram is also much less skewed than the Model_1 histogram (the skewness is essentially eliminated, as noted under Question 8), so on its own scale Model_2 appears to fit the training data better.
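One way to compare the two residual distributions despite the different scales is to look at a unit-free statistic such as skewness; a rough sketch follows, with hypothetical workbook, sheet, and column names standing in for the actual worksheets.

```python
import pandas as pd
from scipy.stats import skew

# Hypothetical file/sheet/column names for the two training-set residual columns.
resid_m1 = pd.read_excel("Model_1.xlsx", sheet_name="Training")["Residual"]   # dollar scale
resid_m2 = pd.read_excel("Model_2.xlsx", sheet_name="Training")["Residual"]   # log scale

# Skewness is dimensionless, so it can be compared even though the residuals
# are measured in different units (dollars vs log-dollars).
print("Model 1 residual skewness:", skew(resid_m1))
print("Model 2 residual skewness:", skew(resid_m2))
```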
Question 7. Based on your explanation above, manually compute the RMSE of the training and the validation sets
corresponding to Model 2 so that this RMSE can be compared to the RMSE obtained in Model 1.
After taking the antilog (the inverse of the log transformation) of each individual prediction in the training and validation sets, the residuals and RMSE can be recomputed on the original dollar scale:
Refer to the Model_2 training worksheet: RMSE_Model_2 (training set): $1.69
Refer to the Model_2 validation worksheet: RMSE_Model_2 (validation set): $1.84
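A sketch of this back-transformation is given below. It assumes the Model_2 predictions are stored on the natural-log scale in a column named Predicted_Log_Spend, with the actual output in AVG_DOLLARS_SPENT; both column names (and the use of natural rather than base-10 logs) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def rmse_on_dollar_scale(path, sheet):
    """Back-transform Model_2's log-scale predictions and compute RMSE in dollars."""
    df = pd.read_excel(path, sheet_name=sheet)
    pred_dollars = np.exp(df["Predicted_Log_Spend"])      # antilog: back to the original scale
    resid = df["AVG_DOLLARS_SPENT"] - pred_dollars        # residual in dollars, comparable to Model 1
    return np.sqrt(np.mean(resid ** 2))

# print(rmse_on_dollar_scale("Model_2.xlsx", "Training"))    # ~1.69 in this data
# print(rmse_on_dollar_scale("Model_2.xlsx", "Validation"))  # ~1.84 in this data
```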
Question 8. Which model would you prefer for predicting average dollars spent if you go with RMSE? Why?
Going by RMSE alone, I would prefer Model_1, since its RMSE is lower than Model_2's on both the training set ($1.65 vs $1.69) and the validation set ($1.81 vs $1.84). On the other hand, the skewness of the residuals is essentially eliminated in Model_2, so if residual behaviour is taken into account as well, Model_2 would be the preferred model for prediction.
Model_1 (LinReg_Output):
Question 9. If we only use two predictors in addition to the intercept, which pair will you choose? Why?
Based on the LinReg_FS (feature selection) output, I would choose the following pair of predictors in addition to the intercept:
• HH_AGE
• CHILDREN_GROUP_CODE
These are the first two variables selected after the intercept, i.e. the strongest predictors according to the feature selection report.
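For reference, a comparable forward-selection step can be sketched in Python as below; the data file and target column name are assumptions, while HH_AGE and CHILDREN_GROUP_CODE are the variables named in the report.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

# Hypothetical file/column names for the training data.
train = pd.read_csv("training_set.csv")
X = train.drop(columns=["AVG_DOLLARS_SPENT"])
y = train["AVG_DOLLARS_SPENT"]

# Forward selection of exactly two predictors; the intercept is fitted automatically.
selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=2,
                                     direction="forward")
selector.fit(X, y)
print(X.columns[selector.get_support()])   # expected here: HH_AGE, CHILDREN_GROUP_CODE
```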
Question 10. Which subset of predictors seems to be most promising in terms of predictive power? Why?
Subset 12 (with 12 predictor variables) seems most promising in terms of predictive power. The rule of thumb for best-subset selection is to prefer a subset whose Mallows' Cp is close to the number of coefficients in the model; among the candidate subsets, subset 12 has the Cp value (~15) closest to its own number of predictors, indicating a good trade-off between fit and model size.
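To make the rule of thumb concrete, a minimal sketch of the Cp calculation is given below; the inputs (SSE of the subset, MSE of the full model, sample size) would come from the subset-selection output, and no values from the workbook are reproduced here.

```python
def mallows_cp(sse_subset, mse_full, n, num_params):
    """Mallows' Cp = SSE_subset / MSE_full - (n - 2 * num_params).

    num_params counts the intercept plus the predictors in the subset.
    A subset whose Cp is close to num_params carries little bias from
    the variables it leaves out, so such subsets are preferred.
    """
    return sse_subset / mse_full - (n - 2 * num_params)

# Usage (with SSE/MSE values read from the subset-selection report):
# cp_12 = mallows_cp(sse_subset=..., mse_full=..., n=..., num_params=13)
```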