Performance Evaluation of Multiple Machine Learning Models For Wine Quality Prediction
1,2 S2 Ilmu Komputer, Fakultas Teknologi Informasi, Universitas Nusa Mandiri, Indonesia
1* [email protected], [email protected]
*: Corresponding author
Abstract

Keywords: wine quality, voting classifier, model evaluation

This study evaluates the performance of nine machine learning models in predicting wine quality using a dataset from the UCI repository. The machine learning models used are Logistic Regression, K-Nearest Neighbor (KNN), Decision Tree, Support Vector Machine (SVM), Random Forest, XGBoost, LightGBM, CatBoost, and Gradient Boosting. The wine dataset used consists of 1,599 samples with 12 parameters.
Telematika: Jurnal Informatika dan Teknologi Informasi ISSN: 1829-667X / E-ISSN: 2460-9021
Vol. 21, No. 2, June 2024, pp. 209-223, DOI: 10.31515/telematika.v21i2.13007
1. Introduction
Wine is an alcoholic beverage made by fermenting grapes and other fruits. The production process involves yeast fermenting the natural sugars in the fruit, converting them into alcohol and carbon dioxide (CO2). Wine quality is influenced by various factors, including grape variety, fermentation techniques, storage conditions, and the wine's age [1]. Wine quality is crucial in the alcoholic beverage industry, directly impacting consumer satisfaction and market price. Experts typically perform quality assessments using sensory methods, which require experience and are subjective [2]. While these conventional methods have proven effective, they are time-consuming and costly. Therefore, a more efficient and objective approach to wine quality assessment is needed. Advances in technology,
particularly in Artificial Intelligence (AI) and machine learning, offer opportunities to develop
more efficient and objective approaches for assessing wine quality. Traditional approaches to
predicting wine quality use statistical methods such as linear regression or discriminant analysis.
However, with advancements in machine learning, particularly deep learning, there is an
opportunity to improve prediction accuracy. Deep learning models, such as Deep Neural
Networks (DNN) and Convolutional Neural Networks (CNN), have shown success in various
predictive applications [3]. These approaches aim to reduce the subjectivity associated with
human assessment and improve consistency in determining wine quality. Machine learning
algorithms, like deep learning, enable more in-depth and precise analysis of the physicochemical
data related to wine [4].
The primary journal referenced in this research discusses wine quality prediction using machine
learning algorithms. The data originates from a public dataset that includes various chemical
components in wine. Researchers conducted data analysis and visualization, applying several
machine learning algorithms, including Random Forest, XGBoost, and Decision Tree, to predict
wine quality. The findings indicate that the Random Forest model has the best predictive
accuracy at 66.8%, followed by XGBoost at 60.1% and Decision Tree at 59.5%. The researchers
concluded that these machine learning models perform well in predicting high-quality wine but
less so for low-quality wine [5].
Previous studies have explored the use of machine learning methods for wine quality prediction.
Research by Jeffrey A. Clarin compared the performance of several regression algorithms in
predicting white wine quality using a dataset from the UCI Machine Learning Repository and
implemented using WEKA. The study found that the Random Forest algorithm provided the
best performance with a correlation coefficient of r = 0.7459. Among the input variables, alcohol and acidity were significantly correlated with wine quality, with values of r = 0.44 and r = −0.391, respectively [6].
Another study used a machine learning approach to examine 1,599 wine samples, each containing 11 input parameters, to identify the factors with the most significant effect on overall wine quality. The linear regression models used in that study showed that alcohol and acidity were the primary factors influencing wine quality. Furthermore, heat maps were used to display the relationships among these factors. Further analysis used box plots and three-dimensional scatter plots to reinforce the conclusions drawn from the linear regression model, providing more specific insights into the factors that have the greatest influence on wine quality [4].
Other studies compared the performance of several regression models and of combined regression and ensemble models in predicting wine quality using the wine quality dataset from the UCI Machine Learning Repository. This dataset comprises white and red Vinho Verde wines from northern Portugal, with 6,497 samples. Before the models were trained, the dataset underwent appropriate preprocessing steps to ensure data quality and consistency. Five regression algorithms, Linear Regression (LR), Random Forest Regressor (RF), Support Vector Regression (SVR), Decision Tree Regressor (DT), and Multi-layer Perceptron Regressor (MLP), were trained and tested on the dataset. Additionally, predictions from these individual regression models were combined with four ensemble models: XGB Regressor (XGB), AdaBoost Regressor (ABR), Bagging Regressor (BR), and Gradient Boosting Regressor (GBR). The results showed that, among the individual models, Random Forest (RF) achieved the best performance, with the lowest MAE, MSE, and RMSE values and the highest R² score. This suggests that RF is better suited to the red wine quality dataset than the other regression models. However, combining Random Forest with Bagging Regressor (RF and BR) outperformed the individual models, yielding lower errors and generally higher R² scores [7].
2. Method/Design
This research employs quantitative methods with multiple machine-learning models. The nine
machine learning models used are Logistic Regression, K-Nearest Neighbor (KNN), Decision
Tree, Support Vector Machine (SVM), Random Forest, XGBoost, LightGBM, CatBoost, and
Gradient Boosting. The selection of these nine models provides a broad spectrum, allowing for
a comprehensive evaluation and precise accuracy comparison. This approach also helps identify
which model best fits the data characteristics. The overall research steps are shown in Figure
1.
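As an illustration of this model setup, the comparison could be sketched with scikit-learn as follows. Six of the nine models ship with scikit-learn; XGBoost, LightGBM, and CatBoost come from their own packages and are indicated as comments so the sketch runs with scikit-learn alone. The synthetic data and parameters are illustrative assumptions, not the exact configuration used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Six of the nine models ship with scikit-learn; the remaining three come
# from their own packages, e.g.:
#   from xgboost import XGBClassifier
#   from lightgbm import LGBMClassifier
#   from catboost import CatBoostClassifier
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors (KNN)": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Support Vector Machine (SVM)": SVC(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Synthetic stand-in for the wine data: 11 numeric features, binary label.
X, y = make_classification(n_samples=300, n_features=11, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

scores = {name: model.fit(X_tr, y_tr).score(X_te, y_te)
          for name, model in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

Each model is trained and scored with the same train/test split, which is the precondition for a fair accuracy comparison.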
Twelve chemical parameters were tested on 1,599 wine samples with varying values, as shown
in Table 2, which provides an example of the dataset used in this study.
Standardization rescales feature values to have zero mean and unit variance, approximating a standard normal distribution. This is very useful in algorithms such as Support Vector Machines (SVM) that assume the data is normally distributed. With standardization, features with different scales can be treated equally by the model, potentially improving overall model performance [13].
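A minimal sketch of the two preprocessing schemes discussed in this section, z-score standardization and min-max normalization, using illustrative values rather than the actual dataset:

```python
import numpy as np

def standardize(X):
    """Z-score standardization: each column rescaled to mean 0 and std 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max_normalize(X):
    """Min-max normalization: each column rescaled to the [0, 1] range."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

# Two features on very different scales (illustrative values only,
# e.g. total sulfur dioxide vs. pH).
X = np.array([[34.0, 3.51],
              [67.0, 3.20],
              [54.0, 3.26],
              [60.0, 3.16]])
Z = standardize(X)
print(Z.mean(axis=0))  # approximately [0, 0]
print(Z.std(axis=0))   # approximately [1, 1]
```

After standardization both columns contribute on the same scale, regardless of their original units.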
2.6. Analysis
Subsequently, an analysis of the model performance evaluation results was conducted through
the confusion matrix. The confusion matrix provides an overview of the model's prediction
distribution and informs about the performance of the classification model by comparing
predicted values with actual values from the test data. The information includes the True
Positive (TP) value, which is the number of positive cases correctly predicted by the model,
meaning the model accurately identifies positive cases. True Negative (TN) is the number of
negative cases correctly predicted by the model, meaning the model accurately identifies
negative cases. False Positive (FP) is the number of negative cases incorrectly predicted as
positive by the model. False Negative (FN) is the number of positive cases incorrectly predicted
as negative by the model [14]. From these four values, further evaluation metrics such as
accuracy, precision, recall, and F1-score can be calculated [15].
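The four confusion-matrix counts described above can be computed directly from predicted and actual labels; the following sketch uses hypothetical labels for illustration:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification result."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 3 1 1
```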
Accuracy is the total percentage of correct predictions out of all predictions made by the model. It is calculated as shown in Equation (1) and Equation (2).

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (1)

Accuracy for more than one class:

Accuracy = (True Positive) / (Number of samples)    (2)

Precision shows the percentage of positive cases correctly predicted out of all positive predictions made by the model. It is calculated as shown in Equation (3).

Precision = TP / (TP + FP)    (3)

Recall (Sensitivity or True Positive Rate) is the percentage of positive cases correctly identified by the model out of all actual positive cases. It is calculated as shown in Equation (4).

Recall = TP / (TP + FN)    (4)

F1-score is the harmonic mean of precision and recall. The F1-score provides a balance between these two metrics and is useful when there is a class imbalance. It is calculated as shown in Equation (5).

F1-Score = 2 × (Recall × Precision) / (Recall + Precision)    (5)
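Equations (1) and (3)-(5) can be checked with a short sketch, using the hypothetical counts TP = 3, TN = 3, FP = 1, FN = 1 for illustration:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)            # Equation (1)
    precision = tp / (tp + fp)                            # Equation (3)
    recall = tp / (tp + fn)                               # Equation (4)
    f1 = 2 * (recall * precision) / (recall + precision)  # Equation (5)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration.
acc, prec, rec, f1 = classification_metrics(tp=3, tn=3, fp=1, fn=1)
print(acc, prec, rec, f1)  # 0.75 0.75 0.75 0.75
```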
Table 3. Frequency distribution of wine quality

Quality | Frequency
5       | 681
6       | 638
7       | 199
4       | 53
8       | 18
3       | 10
From Table 3, it is evident that the wine quality with the highest number of samples is 5 with
681 samples, followed by 6 with 638 samples. Meanwhile, the qualities with the least number
of samples are 3 and 8, with 10 and 18 samples, respectively. The frequency distribution of wine
quality can be seen in Figure 2.
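The frequency counts in Table 3 can be obtained by tallying the dataset's quality column; the sketch below uses a short hypothetical excerpt of that column for illustration:

```python
from collections import Counter

# Hypothetical excerpt of the dataset's "quality" column; applying the same
# tally to the full 1,599-row column yields the counts in Table 3.
quality = [5, 6, 5, 7, 6, 5, 4, 8, 5, 6, 3, 5]
freq = Counter(quality)
for grade, count in freq.most_common():
    print(grade, count)  # quality 5 is the most frequent in this excerpt
```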
The heatmap displays correlation values ranging from -1 to 1. A value of 1 indicates a perfect positive correlation, meaning that as one attribute increases, the other attribute also increases proportionally. A value of -1 indicates a perfect negative correlation, meaning that as one attribute increases, the other attribute decreases proportionally. A value of 0 indicates no correlation between the two attributes. Based on the color interpretation, lighter colors (toward white) indicate stronger correlations (both positive and negative), while darker colors (toward black) indicate weaker or no correlation.
The correlation between variables shows a strong positive correlation between fixed acidity and
density (0.67) and between citric acid and fixed acidity (0.67). Total sulfur dioxide and free
sulfur dioxide also show a very strong positive correlation (0.67). The anticipated strong
correlation between alcohol and quality turned out to be moderately strong based on the
correlation heatmap, with a value of 0.48. Additionally, the correlations between volatile acidity
and quality, and between citric acid and pH, show moderately strong negative correlations, with
values of -0.39 and -0.54 respectively.
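Each cell of the heatmap is a Pearson correlation coefficient; as an illustration, the coefficient for a pair of attributes can be computed with NumPy (the sample values below are illustrative, not the study's data):

```python
import numpy as np

# Illustrative values for two attributes (not the study's data).
fixed_acidity = np.array([7.4, 7.8, 11.2, 7.9, 6.8, 8.5])
density = np.array([0.9978, 0.9968, 1.0010, 0.9970, 0.9955, 0.9990])

# Pearson correlation coefficient, the quantity shown in each heatmap cell.
r = np.corrcoef(fixed_acidity, density)[0, 1]
print(round(r, 3))
```

A positive r indicates the two attributes rise together, matching the positive fixed acidity / density relationship reported above.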
3.1. Modeling
The researcher compared the accuracy of models before and after normalization and
standardization, as shown in Table 5 and Figure 6.
Table 5. Model accuracy before and after normalization and standardization

Model                               | Before Normalization and Standardization | After Normalization | After Standardization
Logistic Regression                 | 56.7%  | 58.90% | 58.70%
K-Nearest Neighbors (KNN)           | 69.6%  | 75.10% | 76.70%
Support Vector Classifier (SVC)     | 42.7%  | 72.90% | 77.00%
Random Forest                       | 85.7%  | 85.30% | 85.60%
Decision Tree                       | 78.1%  | 78.70% | 79.00%
Extreme Gradient Boosting (XGBoost) | 77.5%  | 77.80% | 77.80%
LightGBM                            | 87.8%  | 87.80% | 86.40%
CatBoost                            | 86.4%  | 86.60% | 86.60%
Gradient Boosting                   | 81.8%  | 82.00% | 82.00%
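The large gain for SVC in Table 5 is consistent with SVM's sensitivity to feature scales. The effect can be reproduced in a small sketch with synthetic data (illustrative parameters, not the study's setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data where one feature is on a much larger scale than the rest.
X, y = make_classification(n_samples=400, n_features=11, random_state=0)
X = X * np.array([1, 1000, 1, 1, 1, 1, 1, 1, 1, 1, 1])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The same classifier with and without standardization in front of it.
raw = SVC().fit(X_tr, y_tr).score(X_te, y_te)
scaled = make_pipeline(StandardScaler(), SVC()).fit(X_tr, y_tr).score(X_te, y_te)
print(f"without scaling: {raw:.3f}, with scaling: {scaled:.3f}")
```

Putting the scaler inside a pipeline ensures it is fit only on the training fold, avoiding leakage into the test score.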
[Figure 6: bar chart comparing the accuracy (0-100%) of Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), Random Forest, Decision Tree, Extreme Gradient Boosting (XGBoost), LightGBM, CatBoost, and Gradient Boosting.]
False Positive (FP): There are 1, 2, 3, 1, and 1 cases where the model incorrectly predicted "not
good" as "good".
False Negative (FN): There are 1, 2, 7, 8, 35, 12, and 4 cases where the model incorrectly
predicted "good" as "not good".
References
[1] R. Zhu, "Chemical Change and Quality Control in Winemaking," Scientific and Social Research, vol. 4, no. 7, pp. 62-67, 14 July 2022.
[2] M. H. Shahrajabian and W. Sun, "Assessment of Wine Quality, Traceability and Detection of Grapes Wine, Detection of Harmful Substances in Alcohol and Liquor Composition Analysis," Letters in Drug Design & Discovery, vol. 21, no. 8, pp. 1377-1399, doi: 10.2174/1570180820666230228115450, June 2024.
[3] L. Le, P. N. Hurtado, I. Lawrence, Q. Tian and B. Chen, "Applying Neural Networks in Wineinformatics with the New Computational Wine Wheel," Fermentation, vol. 9, no. 7, p. 629, doi: 10.3390/fermentation9070629, 2023.
[4] J. Dong, "Red Wine Quality Analysis based on Machine Learning Techniques," Highlights in Science, Engineering and Technology, vol. 49, pp. 208-213, doi: 10.54097/hset.v49i.8506, 2023.
[5] C. Zeng, J. Fang, Q. Yang, C. Xiang, Z. Zhao and Y. Lei, "Wine quality grade data analysis and prediction based on multiple machine learning algorithms," in Proceedings of the 2nd International Conference on Mechatronics and Smart Systems, 2024.
[6] J. A. Clarin, "Comparison of the Performance of Several Regression Algorithms in Predicting the Quality of White Wine in WEKA," International Journal of Emerging Technology and Advanced Engineering, vol. 12, no. 7, pp. 20-26, doi: 10.46338/ijetae0722_03, 3 July 2022.
[7] A. K., "Regression Modeling Approaches for Red Wine Quality Prediction: Individual and Ensemble," International Journal for Research in Applied Science & Engineering Technology (IJRASET), vol. 11, pp. 3621-3627, doi: 10.22214/ijraset.2023.54363, June 2023.
[8] N. Pourmoradi, "Red Wine Quality," Kaggle, 2023. [Online]. Available: https://www.kaggle.com/code/nimapourmoradi/red-wine-quality/input. [Accessed 21 June 2024].
[9] R. S. Jackson, Wine Science: Principles and Applications (3rd Edition), Burlington: Academic Press, 2008.
[10] A. Géron, Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd Edition), N. Tache, Ed., Sebastopol: O'Reilly Media, Inc., 2019.
[11] Ridwan, E. H. Hermaliani and M. Ernawati, "Penerapan Metode SMOTE Untuk Mengatasi Imbalanced Data Pada," Computer Science (CO-SCIENCE), vol. 4, no. 1, e-ISSN: 2774-9711, pp. 80-88, January 2024.
[12] D. A. Nasution, H. H. Khotimah and N. Chamidah, "Perbandingan Normalisasi Data untuk Klasifikasi Wine Menggunakan Algoritma K-NN," CESS (Journal of Computer Engineering System and Science), vol. 4, no. 1, pp. 78-82, January 2019.
[13] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in Proceedings of the 32nd International Conference on Machine Learning, PMLR, 2015.
[14] K. S. Nugroho, "Confusion Matrix untuk Evaluasi Model pada Supervised Learning," 13 November 2019. [Online]. Available: https://ksnugroho.medium.com/confusion-matrix-untuk-evaluasi-model-pada-unsupervised-machine-learning-bc4b1ae9ae3f. [Accessed 23 June 2024].
[15] S. Raschka and V. Mirjalili, Python Machine Learning, Birmingham: Packt Publishing Ltd., 2019.