Wine Quality Analysis

The document outlines a mini project focused on analyzing wine quality using machine learning techniques on a dataset of Portuguese 'Vinho Verde' wines. It includes sections on data preparation, exploratory data analysis, various regression methods employed, results evaluation, and conclusions drawn from the findings. Key insights reveal the importance of physicochemical properties, particularly alcohol content, in predicting wine quality, with models like Random Forest and SVM showing the best performance.

Uploaded by

erkshitizarora

Mini Project

UNC514: Data Science Fundamentals

Wine Quality Analysis

Submitted by: Kshitiz Arora (102215226)

Submitted to: Department of ECE

THAPAR INSTITUTE OF ENGINEERING & TECHNOLOGY, PATIALA, PUNJAB

November 2024

Table of Contents

1. Introduction (objectives and problem statement)
2. Data Preparation (detailed introduction of the dataset used)
3. Exploratory Data Analysis (the EDA operations applied)
4. Methods Used (definition and discussion of all classification/regression models used)
5. Results and Evaluation (all plots and graphs, numbered and named)
6. Conclusions
7. References (websites, books, research articles)

1. Introduction

1.1 Overview

Wine is more than just a drink; for thousands of years, humanity has been captivated by the
complexities of this timeless elixir. But what truly makes a wine exceptional? Is it the
sharpness of its acidity, the warmth of its alcohol, or the perfect harmony of sweetness and
bitterness?

Our goal is not to replace the human touch but to complement it—to provide winemakers,
enthusiasts, and consumers alike with a powerful tool that demystifies the science of wine.
By analyzing the physicochemical properties of wine, such as pH, alcohol content, residual
sugar, and acidity, we aim to uncover the secrets hidden in each drop, revealing the factors
that set apart an ordinary wine from an award-winning masterpiece.

1.2 Problem Definition and Scope

The process of evaluating wine quality has long been shrouded in subjectivity, relying on the
discerning taste of experts who may differ in their judgments. This subjectivity creates a
challenge for winemakers striving to consistently craft high-quality products. How can the
industry ensure a fair, consistent, and scalable method of assessing wine quality?

With this project, we tackle the pressing problem of developing a data-driven approach to
predict wine quality based on measurable attributes. The dataset at our disposal, rich with
information about red wine, offers the perfect canvas for this endeavor.

Through this journey, we embrace the challenge of translating the mystique of wine into
code, algorithms, and visualizations, all while preserving its soul. Because at the end of the
day, a good wine doesn’t just taste great—it tells a story. And with this project, we’re here to
write the next chapter.
2. Data Preparation
2.1 Dataset Used

The dataset utilized in this project is derived from the physicochemical and sensory
properties of Portuguese "Vinho Verde" wines.

● Name: Wine Quality Dataset

● Origin: Sourced from the UCI Machine Learning Repository; also available on Kaggle.
● Owner: The dataset was originally curated by Paulo Cortez and colleagues.
● Type: Structured dataset, primarily numerical in nature.
● Country: Portugal (specifically focused on the "Vinho Verde" wine region).
● Size: 1,599 instances.

Column Names and Descriptions

The dataset contains 11 input variables and 1 output variable:

Input Variables (Physicochemical Properties):

1. Fixed Acidity: Concentration of non-volatile acids (e.g., tartaric acid) that do not
evaporate.
2. Volatile Acidity: Concentration of volatile acids (e.g., acetic acid), which can
contribute to a vinegar-like taste if excessive.
3. Citric Acid: Enhances freshness and adds a citrus-like flavor.
4. Residual Sugar: Sugar left after fermentation; affects sweetness.
5. Chlorides: Salt content in wine; influences taste balance.
6. Free Sulfur Dioxide: Protects wine from oxidation and spoilage.
7. Total Sulfur Dioxide: The total amount of SO₂, including bound forms.
8. Density: Relates to the alcohol and sugar content.
9. pH: Indicates acidity/basicity of wine.
10. Sulphates: Adds to the wine's preservation and taste complexity.
11. Alcohol: Ethanol content in the wine.

Output Variable (Sensory Property):

12. Quality: Rated on a scale of 0 to 10 based on sensory evaluation by wine tasters.
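
The column layout above can be checked in code before any analysis starts. The following is a minimal sketch using only the Python standard library. It assumes the file uses lowercase column names and a semicolon delimiter (the Kaggle copy may use commas instead), and the two sample rows are made up for illustration, not taken from the dataset. The helper name `load_wine_rows` is hypothetical.

```python
import csv
import io

# The 11 inputs and 1 output listed above; the lowercase spelling is an assumption.
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]

# Two made-up rows for illustration only; they are NOT real dataset records.
SAMPLE_CSV = "\n".join([
    ";".join(EXPECTED_COLUMNS),
    "7.0;0.60;0.10;2.0;0.080;15;40;0.9970;3.40;0.60;10.0;5",
    "8.0;0.35;0.30;2.2;0.070;10;30;0.9960;3.30;0.70;11.5;7",
]) + "\n"


def load_wine_rows(text, delimiter=";"):
    """Parse CSV text into dicts with float features and an integer quality."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for record in reader:
        row = {name: float(value) for name, value in record.items()}
        row["quality"] = int(row["quality"])
        # Quality is described above as a score between 0 and 10.
        if not 0 <= row["quality"] <= 10:
            raise ValueError(f"quality out of range: {row['quality']}")
        rows.append(row)
    return rows


rows = load_wine_rows(SAMPLE_CSV)
print(len(rows), "rows,", len(rows[0]), "columns")
```

Failing fast on an unexpected header makes a wrong delimiter or a renamed column visible immediately instead of surfacing later as a modelling error.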
3. Exploratory Data Analysis
3.1 Getting Insights About the Dataset

● Dataset Shape and Size: The red wine dataset contains 1,599 samples and the
white wine dataset contains 4,898 samples; each has 12 columns.
● Data Types: All features are numerical and therefore suitable for direct use in
machine learning models without additional conversion.
● Summary Statistics: Used descriptive statistics to examine the distribution of each
feature, including mean, median, minimum, and maximum values, along with
standard deviation.
● Class Distribution: Analyzed the distribution of the quality variable to observe
imbalances. It was found that most wines fall into the average quality range (scores of
5–6), while high-quality and poor-quality wines are less frequent.
● Correlation Analysis: Computed the Pearson correlation coefficients between
features to identify relationships.
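
The class-distribution and correlation steps listed above can be sketched without any third-party library. The values below are made up for illustration and are not dataset records; the function name `pearson` is a hypothetical helper. In practice a data-frame library would normally do this in one call.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values (not taken from the dataset).
alcohol = [9.4, 9.8, 10.5, 11.2, 12.0, 9.5]
quality = [5, 5, 6, 6, 7, 5]

class_counts = Counter(quality)   # how many wines received each score
r = pearson(alcohol, quality)     # linear association between the two columns

print("class distribution:", dict(class_counts))
print("alcohol vs. quality correlation:", round(r, 3))
```

A strongly uneven `class_counts` is the imbalance mentioned above, and the sign of `r` indicates whether a feature rises or falls with the quality score.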

3.2 Data Encoding (If Applicable)

● No categorical data was present in the dataset, so no encoding was required. The data
was inherently numerical, simplifying preprocessing.

3.3 Data Visualization

● Distribution Plots:
○ Examined the distribution of numerical variables such as alcohol, pH and
citric acid.
○ Observed that variables like alcohol exhibit a right-skewed distribution, while
others like density are nearly symmetric.
● Box Plots:
○ Explored variations in wine quality against features like volatile acidity and
residual sugar.
● Correlation Heatmap:
○ Generated a heatmap to visualize the correlation matrix, identifying the
positive correlation between alcohol and quality and the negative
correlation between volatile acidity and quality.
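
The shape observations from the distribution plots can also be quantified. The sketch below computes the moment-based sample skewness (third central moment divided by the 1.5 power of the second). The example lists are made up for illustration and `skewness` is a hypothetical helper name.

```python
def skewness(values):
    """Moment-based skewness: positive means a longer right tail."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Made-up illustrative samples (not taken from the dataset).
symmetric_sample = [1, 2, 3, 4, 5]
right_tailed_sample = [1, 1, 1, 2, 10]

print("symmetric:", skewness(symmetric_sample))
print("right-tailed:", skewness(right_tailed_sample))
```

A value near zero suggests a symmetric distribution, while a clearly positive value matches what a histogram shows as a right skew.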
4. Methods Used
Regression Techniques:
4.1 Linear Regression:
● The simplest regression model, used to predict the wine quality score as a
linear combination of features. While interpretable, it struggled with
non-linear relationships.

4.2 Ridge Regression:

● A variant of linear regression, it included a penalty term to control overfitting
and handle multicollinearity among features. The regularization strength
(alpha) was tuned for optimal results.
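
To show what the penalty term does, here is a dependency-free sketch of ridge regression with a single feature. It assumes the objective sum((y - w*x - b)^2) + alpha * w^2 with an unpenalized intercept b, which gives the closed form w = Sxy / (Sxx + alpha) on centered data. The data and the name `ridge_fit_1d` are illustrative assumptions; this is not the project's code, which would more likely rely on a library implementation.

```python
def ridge_fit_1d(xs, ys, alpha):
    """Closed-form single-feature ridge fit; returns (slope, intercept)."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx + alpha == 0:
        raise ValueError("constant feature with alpha=0 has no unique fit")
    slope = sxy / (sxx + alpha)          # alpha shrinks the slope toward zero
    intercept = mean_y - slope * mean_x  # intercept is not penalized
    return slope, intercept


# Made-up illustrative data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

ols_slope, ols_intercept = ridge_fit_1d(xs, ys, alpha=0.0)    # plain least squares
ridge_slope, ridge_intercept = ridge_fit_1d(xs, ys, alpha=5.0)

print("alpha=0:", ols_slope, ols_intercept)
print("alpha=5:", ridge_slope, ridge_intercept)
```

With alpha set to zero the result equals ordinary linear regression; increasing alpha trades some fit for smaller coefficients, which is the overfitting control described above.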

4.3 Support Vector Regression:

● SVR extended SVM for regression tasks by fitting a hyperplane that
minimizes prediction error within a margin of tolerance. Non-linear kernels
like RBF were applied to model complex patterns.
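
Two ingredients of the description above can be written down directly: the epsilon-insensitive loss (errors inside the tolerance margin cost nothing) and the RBF kernel. This is a sketch of those two formulas only, not a full SVR solver. The parameter names `epsilon` and `gamma` and their default values are illustrative assumptions.

```python
from math import exp


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the tolerance margin, linear in the error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(a, b, gamma=1.0):
    """RBF similarity exp(-gamma * ||a - b||^2) between two feature vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return exp(-gamma * squared_distance)


# A prediction within the margin is not penalized; one outside it is.
print(epsilon_insensitive_loss(6.0, 6.05))
print(epsilon_insensitive_loss(6.0, 5.0))

# Identical points have similarity 1; distant points approach 0.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))
```

The kernel is what lets the model capture non-linear patterns: predictions depend on similarity to training points rather than on a single linear combination of features.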

4.4 Decision Tree Regression:

● Modeled the wine quality score using a decision tree, which segmented the
data space based on physicochemical features. Pruning techniques were used
to prevent overfitting.
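
The way a regression tree segments the data space can be illustrated with a single split. The sketch below searches one feature for the threshold that minimizes the total squared error of the two resulting groups. It is a simplified illustration with made-up data, the names `sse` and `best_split` are hypothetical, and pruning or depth limits are not shown.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best single split on one feature.

    The threshold is None when no split improves on the unsplit data.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold, best_total = None, sse(list(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_total:
            best_threshold, best_total = threshold, total
    return best_threshold, best_total


# Made-up illustrative data: low-alcohol wines score 5, high-alcohol wines 7.
alcohol = [9.0, 9.5, 10.0, 12.0, 12.5, 13.0]
quality = [5, 5, 5, 7, 7, 7]

threshold, total_error = best_split(alcohol, quality)
print("split at alcohol <", threshold, "with SSE", total_error)
```

A full tree repeats this search over every feature and recursively within each group; stopping early or pruning afterwards keeps the leaves from memorizing individual samples.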

4.5 Lasso Regression:

● Similar to ridge regression but employed L1 regularization, driving
coefficients of less important features to zero. This helped in feature selection
and simplified the model.
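
The way L1 regularization drives coefficients to exactly zero can be seen in the single-feature case, where the solution is a soft-thresholding step. The sketch assumes the objective (1/(2n)) * sum((y - w*x - b)^2) + alpha * |w|; that scaling, the made-up data, and the helper names are assumptions for illustration only.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it is within it."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_slope_1d(xs, ys, alpha):
    """Single-feature lasso slope on centered data via soft-thresholding."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var = sum((x - mean_x) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("feature is constant")
    return soft_threshold(cov, alpha) / var


# Made-up illustrative data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

for alpha in (0.0, 1.25, 3.0):
    print("alpha =", alpha, "-> slope =", lasso_slope_1d(xs, ys, alpha))
```

Once alpha exceeds the feature's covariance with the target, the slope becomes exactly zero. Ridge only shrinks coefficients, whereas this zeroing is what makes lasso usable for feature selection.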

4.6 Regression Analysis Using Artificial Neural Networks:

● A neural network regression model was trained to predict continuous quality
scores. The architecture consisted of multiple hidden layers, and the model
was trained using backpropagation. Dropout layers and batch normalization
were used to prevent overfitting and accelerate convergence.

4.7 Naïve Bayes Regression:

● Naïve Bayes Regression predicts wine quality by assuming Gaussian
distributions for features. It is simple and efficient but limited by its
independence assumption and struggles with complex relationships, making it
a baseline for comparison.
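
The Gaussian and independence assumptions can be made concrete with a small sketch. Gaussian Naïve Bayes is ordinarily a classifier, so this example treats each quality score as a class label; that reading of "Naïve Bayes Regression" is an assumption, as are the made-up data, the variance floor, and the function names.

```python
from collections import defaultdict
from math import log, pi


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log-prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for features, label in zip(X, y):
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor avoids division by zero when a feature is constant in a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (log(n / len(y)), means, variances)
    return model


def predict_gaussian_nb(model, features):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Independence assumption: per-feature log-likelihoods simply add up.
        log_likelihood = sum(
            -0.5 * log(2 * pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(features, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=log_posterior)


# Made-up illustrative samples: [alcohol, volatile acidity] -> quality score.
X = [[9.0, 0.70], [9.4, 0.65], [9.2, 0.72],
     [12.0, 0.30], [12.5, 0.35], [11.8, 0.28]]
y = [5, 5, 5, 7, 7, 7]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [9.1, 0.68]))
print(predict_gaussian_nb(model, [12.2, 0.31]))
```

Because each feature contributes independently, correlated inputs are effectively counted more than once, which is one reason the method serves mainly as a baseline.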
5. Results and Evaluation
5.1 Results

Figure 1. Comparison of R^2 score for various regression models (min. value)

Figure 2. Accuracy % comparison between all regression models (min. value)

Figure 3. MAE comparison (min. value)

Figure 4. RMSE comparison (min. value)

Figure 5. MAPE comparison (min. value)

Figure 6. Comparison of R^2 score for various regression models (max. value)

Figure 7. Accuracy % comparison between all regression models (max. value)

Figure 8. MAE comparison (max. value)

Figure 9. RMSE comparison (max. value)

Figure 10. MAPE comparison (max. value)

Figure 11. Comparison of R^2 score (mean value)

Figure 12. Accuracy % comparison between all regression models (mean value)

Figure 13. Comparison of R^2 score for various regression models (standard deviation)

Figure 14. Accuracy % comparison between all regression models (standard deviation)

Figure 15. Comparison of R^2 score (normal data)

Figure 16. Comparison of MAE, MAPE, and RMSE scores (mean value)

Figure 17. Comparison of accuracy % (normal data)

Figure 18. Comparison of MAE, MAPE, and RMSE scores (normal data)
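
For reference, the error metrics named in the figure captions can be computed as follows. This is a minimal sketch using the standard definitions of MAE, RMSE, MAPE and R^2. The "accuracy %" measure is not defined in the report and is therefore not reproduced here. The sample values are made up for illustration and are not the project's results.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; targets must be non-zero."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


# Made-up illustrative quality scores and predictions.
y_true = [5, 6, 7, 5]
y_pred = [5, 5, 7, 6]

print("MAE :", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("MAPE:", mape(y_true, y_pred))
print("R^2 :", r2_score(y_true, y_pred))
```

MAE and RMSE are in quality-score units (RMSE penalizes large misses more heavily), MAPE is relative to the true score, and R^2 compares the model against always predicting the mean.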
6. Conclusions
In this project, we delved into the intriguing world of wine quality prediction by applying a variety of
machine-learning models to the Vinho Verde wine dataset. The project’s primary objective was to
determine how physicochemical properties influence the quality of wine and to leverage these insights
for accurate predictions of wine quality scores.

Throughout the process, we performed thorough Exploratory Data Analysis (EDA), where we
visualized distributions, correlations, and potential outliers within the dataset. This helped in
understanding the inherent patterns in the data, such as the positive correlation between alcohol
content and wine quality, and the negative correlation between volatile acidity and quality. Data
preparation was also an essential step: we checked the dataset for missing values and confirmed that
no encoding was needed because all features are numerical.

We applied a wide range of classification and regression models, from simpler techniques like Naïve
Bayes and Decision Trees to more complex methods like Random Forests, Support Vector Machines
(SVM), and Neural Networks. Each model was tested for its ability to predict wine quality, either as a
categorical outcome (good vs. bad wine) or as a continuous variable (quality score). The classification
models showed varied success, with Random Forest and SVM achieving high accuracy in
distinguishing between "good" and "bad" wines. The regression models, particularly Random Forest
Regression and Neural Networks, demonstrated the best performance in predicting continuous quality
scores.

Key findings from the analysis include:

1. Feature Importance: The alcohol content was consistently one of the most important
predictors of wine quality.
2. Data Imbalance: The dataset's skewed distribution of wine quality classes necessitated special
attention to balance and model tuning.
3. Model Performance: Among the models, Random Forest and SVM outperformed others in
terms of predictive accuracy. The Neural Network provided a robust prediction for continuous
quality scores but was computationally more expensive.

Ultimately, the results highlight the value of machine learning in understanding complex data and
making predictions that can be applied to real-world problems, such as wine quality assessment in the
food and beverage industry. This project not only showcased the power of various algorithms but also
the importance of data preprocessing, feature selection, and model evaluation in building an effective
predictive system.

In conclusion, this analysis offers valuable insights into the factors influencing wine quality and
serves as a foundation for future improvements, such as incorporating additional features or
experimenting with advanced ensemble techniques. The lessons learned here can be applied to other
domains where prediction based on numerical features plays a crucial role in decision-making.
7. References

● P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Modeling wine preferences by
data mining from physicochemical properties. Decision Support Systems, Elsevier,
47(4):547-553, 2009.
● UCI Machine Learning Repository: Wine Quality Dataset
● Kaggle: Wine Quality Dataset
● Scikit-learn Documentation: https://scikit-learn.org
● Matplotlib Documentation: https://matplotlib.org
● Pandas Documentation: https://pandas.pydata.org
● Seaborn Documentation: https://seaborn.pydata.org
● Python Official Documentation: https://python.org
● Relevant Tutorials and Articles (Feature Engineering Techniques for Regression
Models)
Annexure 1:
