Syndicate 6 - Assignment 3

The document compares different predictive models for wine quality and discusses their performance based on error metrics and a user-specific loss function. It finds that a neural network model has the lowest error and loss, while k-means clustering performs worst. Regression tree and k-NN models have similar errors to basic regressions. Underpredicting high quality wines leads to the highest losses.


Syndicate 6- Alister King, Radeyan Sazzad, Matt Lewis, Sope Dalley

Predictive Analytics Assignment 3


Comment on how your results from these methods compare to your analysis from Syndicate Task #1.
Compared to the results from Syndicate Task #1 shown in Q2, only the neural network improves on the basic regressions, achieving the lowest RMSE, MAE and user loss of any model. The regression tree and k-NN produce errors comparable to the Task #1 regressions, while the k-means clustering model is the least accurate of all models. All models outperform the naïve benchmark, as shown by MASE values below one.

Table 1: Prediction Error For All Models

Model          RMSE   MAE    MAPE   MASE   UserLoss
Linear         0.635  0.488  8.700  0.735   99.432
Stepwise       0.636  0.489  8.700  0.735   99.436
Non-Linear     0.627  0.488  8.710  0.734   99.437
RegTree        0.629  0.474  8.902  0.743   99.432
NeurNet        0.599  0.462  8.779  0.735   97.846
kMeansCluster  0.661  0.532  9.940  0.834  102.607
kNN            0.609  0.493  9.324  0.773   99.432
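The error metrics in Table 1 can be reproduced from a vector of actual and predicted quality scores. The sketch below is a minimal pure-Python illustration (the function name and the toy scores are our own, not the assignment's data); note that MASE is scaled here by the in-sample MAE of a lag-1 naïve forecast, which is one common convention.

```python
import math

def error_metrics(actual, predicted):
    """Return (RMSE, MAE, MAPE in %, MASE) for paired actual/predicted scores."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / a for a, e in zip(actual, errors)) / n
    # MASE scales MAE by the MAE of a naive lag-1 forecast,
    # so a value below 1 beats the naive case.
    naive_mae = sum(abs(actual[i] - actual[i - 1]) for i in range(1, n)) / (n - 1)
    mase = mae / naive_mae
    return rmse, mae, mape, mase

# Toy example with hypothetical quality scores
rmse, mae, mape, mase = error_metrics([5, 6, 7, 5], [5.5, 6.0, 6.0, 5.0])
```

A MASE below one, as for every model in Table 1, means the model's MAE beats this naïve benchmark.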

The regression tree shows a relatively high level of error, given the limited granularity a tree with few terminal nodes can achieve. Lowering the complexity parameter (CP) had little impact on the model's error, while a value of 0.05 made the tree too simple to be functional; for this reason 0.01 was taken as the optimal CP. Only one hidden layer was used for the neural network, as adding more layers did not improve its predictive performance; it was the most accurate of the machine learning models for MAE and MASE. Two clusters were chosen for the k-means clustering, as all three diagnostics (the elbow method, the gap statistic and the silhouette measure) indicated two as the optimal number of clusters. The k-NN regression model was the third most accurate of the machine learning models, with all errors higher than the neural network but lower than k-means clustering.
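For illustration, k-NN regression of the kind used here can be sketched in a few lines. This is a deliberately simplified version (unweighted average, squared Euclidean distance, no feature scaling), and the features and scores below are hypothetical rather than the wine data:

```python
def knn_regress(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Squared Euclidean distance is sufficient for ranking neighbours
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    nearest = by_distance[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical scaled features -> quality scores
X = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25], [0.85, 0.9]]
y = [5, 7, 5, 8]
pred = knn_regress(X, y, [0.12, 0.22], k=2)  # averages the two nearest points
```

In practice k is tuned on a validation set, and features should be standardised first so that no single physicochemical variable dominates the distance.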

Comparing the variable importance plot of the linear regression from the first assignment with the variable importance plot for the regression tree, Garson's relative importance and Olden's connection weights shows broad agreement on which variables matter most for the quality score.

Figure 1 - VIP of physicochemical properties from Task 1. Figure 2 – VIP of the regression tree.

The variable importance plot showed alcohol (Alc), density, sulphates and volatile acidity (VA) as the most important factors, noting that the magnitudes of the results cannot be interpreted directly. This differed from the correlation plot, in which density was only the second most important factor.

Figure 3 – Garson’s relative importance. Figure 4 – Olden’s connection weights.

The highest-magnitude factors from Garson's relative importance show citric acid (CA), alcohol (Alc), total sulfur dioxide (TSD) and fixed acidity (FA) to be the most important for the quality score of wine; note, however, that the direction of the response cannot be determined from this measure. This differed from the correlation table, in which sulphates had a higher correlation with quality score and VA had a strong negative correlation. Olden's connection weights show Alc, FA, free sulfur dioxide (FSD) and sulphates to be the most important variables for the quality score; here, however, the magnitudes of the variables cannot be interpreted, only their signs. This also differed from the correlation table, as FA had a lower correlation than other variables and CA showed a positive correlation despite a negative importance.
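Garson's algorithm itself is simple enough to sketch: it routes each input's importance through the absolute values of the input-to-hidden and hidden-to-output weights of a single-hidden-layer network (biases are ignored, as is standard for the method). The weights below are invented for illustration, not taken from the fitted model:

```python
def garson_importance(w_in, w_out):
    """Garson's relative importance for a single-hidden-layer network.

    w_in[i][h] is the weight from input i to hidden node h;
    w_out[h] is the weight from hidden node h to the output.
    Returns importances summing to 1. Signs are discarded, which is
    why the direction of the response cannot be recovered (Olden's
    method keeps the signed products w_in[i][h] * w_out[h] instead).
    """
    n_in, n_hid = len(w_in), len(w_out)
    # Contribution of input i routed through hidden node h
    contrib = [[abs(w_in[i][h]) * abs(w_out[h]) for h in range(n_hid)]
               for i in range(n_in)]
    hidden_totals = [sum(contrib[i][h] for i in range(n_in)) for h in range(n_hid)]
    raw = [sum(contrib[i][h] / hidden_totals[h] for h in range(n_hid))
           for i in range(n_in)]
    total = sum(raw)
    return [r / total for r in raw]

# Two inputs, two hidden nodes: each input feeds one hidden node equally,
# so the importances split evenly.
shares = garson_importance([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```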

Consider the scenario where you would like to use the predictions for forming pricing and marketing strategies.
The premium wine segment typically ranges in price from $50 to over $1,000 per bottle, compared with the mass-produced market at roughly $5-$20 per bottle. Assuming margins follow a similar pattern, a weighting of five times the marketing cost was allocated to the opportunity cost of the margin not realised on a premium wine. Using this logic, a user-specific loss function was developed that sums the instances where the prediction was above 7 but the actual score in the training set was below 7 (excess marketing expense), and the opposite, where the actual was above 7 but the prediction was below 7 (opportunity cost). The sum incorporates the magnitude of each error.

User-Specific Loss Function (USLF) = excess marketing expense + 5 × opportunity cost
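Under these assumptions the USLF can be written as a short function. The handling of a score of exactly 7 is our assumption (the text only says "above 7" and "below 7", so exact-7 cases contribute nothing here), and the scores in the example are hypothetical:

```python
PREMIUM_WEIGHT = 5   # premium margin assumed ~5x the marketing cost
THRESHOLD = 7        # quality score separating premium from mass-market

def user_loss(actual, predicted):
    """Sum error magnitudes for the two costly misprediction cases."""
    loss = 0.0
    for a, p in zip(actual, predicted):
        if p > THRESHOLD and a < THRESHOLD:
            # Predicted premium, actually not: excess marketing expense
            loss += p - a
        elif a > THRESHOLD and p < THRESHOLD:
            # Actually premium, predicted not: weighted opportunity cost
            loss += PREMIUM_WEIGHT * (a - p)
    return loss

# Toy example: one underpredicted premium wine (5 x 2 = 10)
# and one overpredicted cheap wine (2.5), total 12.5
loss = user_loss([8, 5], [6, 7.5])
```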



Regression Models (Task #1)


All three regression models (linear, stepwise, nonlinear) from Task #1 produced near-identical results for the loss function (99.432-99.437), indicating similar under- and over-prediction characteristics (see the residual plots below). Further inspection of the residuals showed a tendency to underpredict at the higher quality scores (6-7) and overpredict at the lower scores (3-4). This was the case for all three regression models, indicating a flatter prediction curve. Sensitivity analysis on the regression models found that the most significant underpredictions occurred at the most extreme quality scores (8), attracting the higher opportunity-cost penalty from the USLF. These larger residuals are also reflected in the histograms below. If the regression models were used to predict wine scores, this flat curve should be kept in mind: the models should potentially be reconsidered for the premium market, replaced with the segmentation methods discussed below, or supplemented by a separate regression tailored to premium quality scores.
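The under/over-prediction pattern described above can be checked numerically by averaging the residuals within each actual quality score; positive values indicate underprediction, negative values overprediction. A minimal sketch with invented scores that mimic a "flat" model:

```python
from collections import defaultdict

def mean_residual_by_score(actual, predicted):
    """Average residual (actual - predicted) for each actual quality score.

    Positive means the model underpredicts that score band,
    negative means it overpredicts.
    """
    buckets = defaultdict(list)
    for a, p in zip(actual, predicted):
        buckets[a].append(a - p)
    return {score: sum(res) / len(res) for score, res in sorted(buckets.items())}

# Hypothetical flat predictions clustered near the mean score:
# low scores come out negative (overpredicted), high scores positive.
actual = [3, 4, 5, 6, 7, 7]
predicted = [4.5, 4.8, 5.2, 5.6, 6.1, 6.3]
profile = mean_residual_by_score(actual, predicted)
```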

Figure 5– Histogram of fitted residuals for linear, stepwise, and nonlinear regressions.

Regression Tree
Regression tree analysis resulted in a similar USLF to the three regression models (99.432), indicating the same impact of residuals at the higher end and minimal benefit from segmenting the data. This is due to the small number of observations at the highest and lowest quality scores.

Neural Network
The USLF for the single-layer neural network was 97.846, roughly 2 points lower than the regression models and the tree, indicating it is slightly less costly to the user under the specified loss function.

K-Means Clustering
K-means clustering with 2 clusters showed a roughly 3-point higher USLF than the regression models, indicating a higher cost to the user. This is driven by its higher standard error, which increases both the opportunity cost and the marketing cost (larger residuals on both sides).

K-NN Regression
K-NN regression showed a similar USLF to the regression models (99.432), reflecting a similar level of error (RMSE ≈ 0.6) as outlined in Table 1, which drives the residuals and thus the USLF.

The neural network model resulted in the lowest cost to the user under a loss function weighted towards opportunity cost (i.e. one that significantly penalises underprediction). Sensitivity analysis, conducted by increasing the weight further, reinforced how dominant the cost of underprediction is. Larger amounts of data would allow greater segmentation at the extreme scores (7, 8), which would reduce the impact of underprediction. It is also worth noting that the actual scores are recorded as integers, which reduces data resolution and increases error across all models.
