
WINE QUALITY PREDICTION - LINEAR REGRESSION PROJECT

Submitted By: Revathy Prabhakaran
Student No: 8903669
Course Name: Multivariate Statistics-8031
Submitted To: Bip. Thapa
College: Conestoga College
Table of Contents

1. Abstract
2. Gathering Data
o Dataset Selection
o Dataset Description
3. Initial Modeling
o Variables Fixed
o Results of Missing Values
o Rows Removed
o Duplicates Found and Removed
o Structure of the Cleaned Dataset
4. Diagnostics
o Diagnostic Methods Employed
o Assumption Checks and Results
5. Model Selection
o List of Evaluated Models
6. Model Evaluation
o Evaluation Metrics
o Choosing the Best Model
7. Prediction
o Model Performance on Existing Data
8. Conclusion
o Key Findings
o Suggestions for Future Work
9. Appendix
o Dataset Link
o R program codes
o Diagnostic Plots
Abstract
This report explores using linear regression to predict wine quality. The analysis utilizes a
dataset containing measurements of various wine properties, like fixed acidity, volatile
acidity, and alcohol content. The goal is to develop a model predicting wine quality based
on these properties.
Following data exploration, an initial linear regression model was created. Tests were
conducted to assess the validity of the model's assumptions, and its performance was
evaluated using metrics like R-squared and root mean squared error (RMSE). Techniques
like logarithmic transformations and interaction terms were then applied to improve the
model.
Four models were evaluated based on their performance and adherence to linear
regression assumptions. The models are Multiple Linear Regression, Logarithmic
Transformations, Interaction Terms, and Forward Selection. Model 3 (Interaction Terms),
which incorporates interaction terms between predictors, was chosen as the best
performing model due to its superior predictive accuracy and model fit. Predictions were
made using Model 3, and the results were analyzed to identify the most influential
predictors of wine quality.
This analysis offers insights into the factors impacting wine quality and demonstrates the
application of linear regression for creating predictive models. Further research and
refinement of the model could enhance predictive accuracy and understanding of factors
influencing wine quality.

Gathering Data

Dataset Selection:
For this project, the Wine Quality dataset sourced from Kaggle was selected. It contains
various wine properties, including fixed acidity, volatile acidity, citric acid, residual sugar,
chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol content,
and wine quality scores ranging from 0 to 10. (Raj Parmar, 2016)
Dataset Description:
The dataset comprises 6497 observations and 12 variables. The response variable is wine
quality, and the predictors include fixed acidity, volatile acidity, citric acid, residual sugar,
chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol
content. The objective is to predict wine quality based on these properties.

Initial Modelling

Variables Fixed: The dataset initially contained 13 variables: "type", "fixed_acidity",
"volatile_acidity", "citric_acid", "residual_sugar", "chlorides", "free_sulfur_dioxide",
"total_sulfur_dioxide", "density", "pH", "sulphates", "alcohol", and "quality". The column
"type" was removed because it is a categorical variable outside the scope of this study.
Results of Missing Values: The missing_values output reports the count of missing entries
for each variable in the dataset, identifying which variables required handling.
Rows Removed: The number of rows removed can be inferred by comparing the dataset's row
count before and after cleaning: 6497 observations before versus 6463 after, so 34 rows
in total were dropped by the missing-value and duplicate filters combined.
Duplicates Found and Removed: The duplicated_rows output lists any duplicated rows
identified in the dataset; duplicates were then eliminated using the
!duplicated(wine_data) condition. The exact number of duplicates detected and removed is
not reported separately.
Structure of the Cleaned Dataset: The cleaned dataset consists of 6463 observations and
12 variables, encompassing both numerical and character types. Each row represents a
wine sample, with variables reflecting attributes such as acidity, residual sugar, and
alcohol content. Missing values and duplicate rows were addressed during the data
cleaning process.

Diagnostics

Diagnostic Methods Employed:


The analysis employed two main diagnostic methods: the Residuals vs Fitted Values plot and
the Normal Q-Q plot. The Residuals vs Fitted Values plot assessed linearity and
homoscedasticity, while the Normal Q-Q plot evaluated the normality of the residuals.
Residuals were also plotted against each predictor individually.

Assumption Checks and Results:


Upon applying these diagnostic methods, the following results were obtained:
1. Residuals vs Fixed Acidity: No significant violations were detected, and a reasonable
fit was observed. (Fig. 1)
2. Normal Q-Q Plot: A slight deviation from the normal distribution was noted. (Fig. 2)
3. Residuals vs Volatile Acidity: A tendency to underpredict quality at high volatile
acidity was observed. (Fig. 3)
4. Residuals vs Citric Acid: No major violations were found, and a reasonable fit was
observed. (Fig. 4)
5. Residuals vs Residual Sugar: Only a minimal relationship with the residuals was
observed. (Fig. 5)
6. Residuals vs Chlorides: No substantial violations were detected, and a reasonable fit
was observed. (Fig. 6)
7. Residuals vs Free Sulfur Dioxide: No significant violations were detected, and a
reasonable fit was observed. (Fig. 7)
8. Residuals vs Total Sulfur Dioxide: Possible issues with the model fit were identified.
(Fig. 8)
9. Residuals vs Density: No significant violations were observed, and a reasonable fit was
noted. (Fig. 9)
10. Residuals vs pH: No major violations were detected, and a reasonable fit was observed.
(Fig. 10)
11. Residuals vs Sulphates: Residuals tend to increase with increasing sulphates, although
some data points fall outside this trend. (Fig. 11)
12. Residuals vs Alcohol: No significant violations were found, and a reasonable fit was
observed. (Fig. 12)
To improve the model's performance, potential non-linear relationships and interactions
between predictors were explored. Four models were evaluated based on their
performance.

Model Selection

1. Model 1: Multiple Linear Regression - includes all predictors.
2. Model 2: Logarithmic Transformation - applies a log transformation (log(x + 1)) to
volatile acidity.
3. Model 3: Interaction Terms - adds an interaction term between fixed acidity and
volatile acidity.
4. Model 4: Forward Selection - uses forward stepwise selection to choose the best subset
of predictors.
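Written out, Model 3 takes the following form (a sketch based on the interaction term
constructed in the appendix code, which multiplies fixed acidity by volatile acidity while
the remaining predictors enter linearly):

```latex
\text{quality}_i = \beta_0 + \sum_{j=1}^{11} \beta_j x_{ij}
  + \beta_{12}\,\bigl(\text{fixed\_acidity}_i \times \text{volatile\_acidity}_i\bigr)
  + \varepsilon_i
```

The interaction coefficient $\beta_{12}$ lets the effect of fixed acidity on predicted
quality vary with the level of volatile acidity, which a purely additive model cannot
capture.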

Model Evaluation

The models' performances were evaluated using metrics such as R-squared, adjusted
R-squared, and root mean squared error (RMSE) to determine their goodness of fit and
predictive accuracy. The results were used to assess each model's effectiveness in
capturing the variability in wine quality.
After careful consideration, Model 3, incorporating interaction terms between predictors,
was selected as the optimal model due to its superior performance in terms of higher R-
squared, lower RMSE, and adherence to linear regression assumptions.
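For reference, the metrics used above are defined as follows, where $y_i$ are the observed
quality scores, $\hat{y}_i$ the model's fitted values, $\bar{y}$ their mean, $n$ the number
of observations, and $p$ the number of predictors:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},
\qquad
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}
```

A higher R-squared and a lower RMSE indicate a better fit, which is the basis on which
Model 3 was preferred.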

(R output for model evaluation)

Prediction

The assessment of the different models on the wine data indicates that Model 3 outperforms
the other models analyzed. Model 3's predictions on the existing wine dataset span
approximately 4.32 to 7.01, representing the predicted quality score for each observation.
The variation in predicted quality scores suggests that Model 3 effectively captures the
intricacies of wine sample quality. Thus, it can be inferred that Model 3 excels at
predicting the quality scores of wine samples from the dataset's features.
Conclusion

In conclusion, this study investigates the use of linear regression for predicting wine quality
using diverse properties. Among the models developed, Model 3, which incorporates
interaction terms, emerges as the most precise predictor. Its effectiveness in capturing the
subtleties of wine quality is evident from metrics such as R-squared and RMSE. This
analysis emphasizes the significance of linear regression in predicting wine quality and
proposes potential avenues for refining models to improve predictive precision.

Suggestions for Future Work:

1. Incorporating additional features beyond the current set of wine properties, for
example the variable "type", which was ignored in this study because it is categorical.
2. Applying more complex machine learning / deep learning algorithms such as Random
Forests, Decision Trees, Gradient Boosting, and Neural Networks, which may capture
non-linear relationships more effectively.
3. Validating the model's performance on another dataset to evaluate its generalizability.

Appendix

1. Raj Parmar. (2016). Wine Quality Dataset. Kaggle. Retrieved from
https://www.kaggle.com/datasets/rajyellow46/wine-quality
2. Program used for analysis using R:
# Load required libraries (they are already installed)
library(dplyr)
library(ggplot2)
library(tidyr)
library(MASS)
library(stargazer)
library(corrplot)

# Load dataset (assuming the file path is correct)
wine_data <-
  read.csv("C:\\Users\\revak\\OneDrive\\Desktop\\CaseStudy2\\winequalityN.csv")

# Fix variable names
names(wine_data) <- c("type", "fixed_acidity", "volatile_acidity", "citric_acid",
                      "residual_sugar", "chlorides",
                      "free_sulfur_dioxide", "total_sulfur_dioxide", "density", "pH",
                      "sulphates", "alcohol", "quality")

# Check for missing values
missing_values <- colSums(is.na(wine_data))
print(missing_values)

# Remove rows with missing values
wine_data <- wine_data[complete.cases(wine_data), ]

# Check for duplicated rows
duplicated_rows <- wine_data[duplicated(wine_data), ]
print(duplicated_rows)

# Remove duplicated rows
wine_data <- wine_data[!duplicated(wine_data), ]

# Check the structure of the cleaned dataset
str(wine_data)

# Diagnostics
# Fit the baseline model before plotting (the original listing plotted an
# undefined "model" object; "type" is excluded from the formula here, and the
# column itself is dropped below)
model <- lm(quality ~ . - type, data = wine_data)

# Residuals vs. Fitted Values Plot
plot(model, which = 1)

# Normal Q-Q Plot
qqnorm(model$residuals)
qqline(model$residuals)

# Create a dataframe of predictor variables
predictors <- wine_data[, -which(names(wine_data) == "quality")]

# Model Selection

# Remove type column
wine_data <- wine_data[, -which(names(wine_data) == "type")]

# 1. Multiple Linear Regression (Baseline)
model1 <- lm(quality ~ ., data = wine_data)
stargazer(model1, type = "text") # Print model summary using stargazer

# 2. Logarithmic Transformation
wine_data_log <- wine_data
wine_data_log$volatile_acidity <- log(wine_data_log$volatile_acidity + 1)
model2 <- lm(quality ~ ., data = wine_data_log)
stargazer(model2, type = "text") # Print model summary using stargazer

# 3. Interaction Terms
wine_data_interaction <- wine_data
wine_data_interaction$interaction_term <- wine_data_interaction$fixed_acidity *
wine_data_interaction$volatile_acidity
model3 <- lm(quality ~ . + interaction_term, data = wine_data_interaction)
stargazer(model3, type = "text") # Print model summary using stargazer

# Diagnostics for model3
# Residuals vs. Fitted Values Plot
plot(model3, which = 1)

# Normal Q-Q Plot
qqnorm(model3$residuals)
qqline(model3$residuals)

# Create a dataframe of predictor variables
predictors <- wine_data[, -which(names(wine_data) == "quality")]

# Residuals vs. Predictor Variables Plots
for (predictor in colnames(predictors)) {
  plot(predictors[[predictor]], model3$residuals, xlab = predictor,
       ylab = "Residuals", main = paste("Residuals vs.", predictor))
}

# Calculate VIF for each predictor variable (requires the car package)
vif <- car::vif(model3)
print(vif)

# 4. Forward selection using MASS::stepAIC (start from the intercept-only
# model so that terms can actually be added; starting from the full model,
# direction = "forward" would leave nothing to add)
null_model <- lm(quality ~ 1, data = wine_data)
full_scope <- formula(terms(quality ~ ., data = wine_data)) # expand "." into all predictors
forward_model <- stepAIC(null_model, scope = full_scope, direction = "forward")

# Extract the final model formula
best_model_formula <- formula(forward_model)

# Create the final model using lm
model4 <- lm(best_model_formula, data = wine_data)

# Summary and further analysis
stargazer(model4, type = "text") # Print model summary using stargazer

# Model Evaluation
# Function to calculate and store evaluation metrics
evaluate_model <- function(model) {
  # Calculate desired metrics (R-squared, RMSE, AIC)
  r_squared <- summary(model)$r.squared
  rmse <- sqrt(mean((model$residuals)^2))
  aic <- AIC(model)

  # Return a data frame with metrics, labelled with the model's name
  return(data.frame(model = deparse(substitute(model)), r_squared, rmse, aic))
}

# Evaluate each model
model_evaluations <- rbind(
  evaluate_model(model1),
  evaluate_model(model2),
  evaluate_model(model3),
  evaluate_model(model4)
)

print(model_evaluations)

# model3 has the best scores

# Make predictions on existing data using the best model
# (predict() takes newdata =, not data =)
predictions_existing_data <- predict(model3, newdata = wine_data_interaction)

# Print the predictions
print(predictions_existing_data)

3) Diagnostic Plots

Figs. 1-12: diagnostic plots (Residuals vs Fitted Values, Normal Q-Q, and residuals
versus each predictor), as referenced in the Diagnostics section.
