Xstkfinal

This project report from the Faculty of Applied Science at Ho Chi Minh City University of Technology focuses on analyzing the quality of wine using probability and statistics. It details the dataset properties, data collection methods, and the importance of physicochemical variables in predicting wine quality. The report includes sections on data handling, descriptive statistics, and multiple regression analysis to explore the relationships between various factors affecting wine quality.

Uploaded by

T.T.P

Vietnam National University Ho Chi Minh City

Ho Chi Minh City University of Technology

Faculty of Applied Science
🙞···☼···🙜

PROBABILITY AND STATISTICS

PROJECT REPORT

Class: CC07 Group: 2

Instructor: Dr. Nguyen Tien Dung

No. Student Student ID Faculty

1 Trần Duy Phát 2052644 Applied Science
2 Lý Thanh Thúy Vy 1852885 Chemical Engineering
3 Đào Các Tường 2050025 Chemical Engineering
4 Lý Phổ Phương 2153710 Chemical Engineering
5 Võ Đăng Hoàng Vũ 2153982 Chemical Engineering
Ho Chi Minh City – May, 2023

CONTRIBUTION OF MEMBERS

No. Student Student ID Contribution

1 Trần Duy Phát 2052644 100%
2 Lý Thanh Thúy Vy 1852885 100%
3 Đào Các Tường 2050025 100%
4 Lý Phổ Phương 2153710 100%
5 Võ Đăng Hoàng Vũ 2153982 100%
Table of Contents
CONTRIBUTION OF MEMBERS
1. INTRODUCTION
2. PROBLEM DEFINING
2.1. Definition
2.2. Datasets properties
2.3. Data collection
2.4. Hypothesis
3. HANDLING THE DATA
3.1. Import the dataset
3.2. Data cleaning
3.2.1. Checking N/A
3.2.2. Removing duplicate
3.2.3. Data summary
4. DESCRIPTIVE STATISTICS
4.1. Univariate Analysis
4.1.1. Quality of wine
4.1.2. Level of alcohol
4.1.3. Density of wine
4.1.4. Level of Volatile acidity
4.1.5. Level of Chlorides (Level of salt)
4.1.6. Summary
4.2. Bivariate Analysis
4.2.1. Correlation test.
4.2.1.1. Theories.
4.2.1.2. Methodologies.
4.2.2. Linear regression
4.2.3. Relationship visualization.
5. MULTIPLE LINEAR REGRESSION
5.1. What is multiple regression?
5.2. Applying into predicting the quality of the wine:
6. TOTAL SUMMARY
6.1 Variables affect each other
6.2 Variables affect the wine's quality
1. INTRODUCTION
Wine is an alcoholic drink typically made from fermented grapes. Yeast consumes
the sugar in the grapes and converts it to ethanol and carbon dioxide, releasing heat in the
process. Different varieties of grapes and strains of yeasts are major factors in different
styles of wine. These differences result from the complex interactions between the
biochemical development of the grape, the reactions involved in fermentation, the grape's
growing environment (terroir), and the wine production process. Many countries enact
legal appellations intended to define styles and qualities of wine. These typically restrict
the geographical origin and permitted varieties of grapes, as well as other aspects of wine
production. Wines can be made by fermentation of other fruit crops such as plum, cherry,
pomegranate, blueberry, currant and elderberry.
Wine has long played an important role in religion. Red wine was associated with
blood by the ancient Egyptians and was used by both the Greek cult of Dionysus and the
Romans in their Bacchanalia; Judaism also incorporates it in the Kiddush, and Christianity
in the Eucharist. Egyptian, Greek, Roman, and Israeli wine cultures are still connected to
these ancient roots. Similarly, the largest wine regions in Italy, Spain, and France have
heritages connected to sacramental wine. Likewise, viticulture traditions in the
Southwestern United States started within New Spain, as Catholic friars and monks first
produced wines in New Mexico and California.
Wine testing is a meticulous process that covers the appearance, the smell, and the
taste of a wine, and it concludes with a quality score. Wine is always tested before
it comes to the market. Testing is also needed in the R&D department
of any winery, because any small change in the winemaking process can change
the quality of the wine.
Wine testing has been a human-based process, which is slow and inefficient. To
produce more high-quality wines, machines need to assist. For that to happen,
machines need to understand the wine. Unlike humans, machines cannot judge which
smell is good or which taste is delicious; they can only test a wine by its
measurable properties. Our goal is to predict the quality of a wine from those properties.
The wine tested here is Vinho Verde, a wine originating from Portugal. Vinho
Verde is not a grape variety; the name refers to the Vinho Verde region, where the grapes
for the wine are grown. This wine was chosen because it comes in many varieties.
The data set is related to red variants of the wine. Due to privacy and logistic issues,
only physicochemical (inputs) and sensory (the output) variables are available (e.g. there
is no data about grape types, wine brand, wine selling price, etc.).
The dataset was obtained from:
Paulo Cortez, University of Minho, Guimarães, Portugal,
http://www3.dsi.uminho.pt/pcortez
A. Cerdeira, F. Almeida, T. Matos and J. Reis, Viticulture Commission of the Vinho
Verde Region (CVRVV), Porto, Portugal
@2009
2. PROBLEM DEFINING
2.1. Definition
As mentioned, wine testing is a meticulous, human-based process that is slow
and inefficient. To check the quality of mass-produced wine before it reaches the
market, machines must assist.
Machines do not understand intuitive qualities such as a good smell or a delicious taste. The
idea is to measure the physicochemical properties of the wine and, together with the quality
score given by a specialist, try to predict the quality of the wine from those properties.

2.2. Datasets properties

The dataset we obtained lists the physicochemical properties and the quality of each wine:

Variable | Unit | Description
Input variables
Fixed acidity | g/dm3 | Most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
Volatile acidity | g/dm3 | The amount of acetic acid in wine, which at too high levels can lead to an unpleasant, vinegar taste
Citric acid | g/dm3 | Found in small quantities, citric acid can add 'freshness' and flavor to wines
Residual sugar | g/dm3 | The amount of sugar remaining after fermentation stops; it is rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet
Chlorides | g/dm3 | The amount of salt (sodium chloride) in the wine
Free sulfur dioxide | mg/dm3 | The free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
Total sulfur dioxide | mg/dm3 | Amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
Density | g/cm3 | The density of wine is close to that of water, depending on the percent alcohol and sugar content
pH | - | Describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale
Sulphates | g/dm3 | Potassium sulphate, a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
Alcohol | % by volume | The percent alcohol content of the wine
Output variable
Quality | score between 0 and 10 | The quality of the wine

2.3. Data collection

The dataset is a list of wines that were tested and examined for their quality and
physicochemical properties. Each bottle of wine is examined, its properties are recorded, and
it is then given a score by specialists.
The wines in the list range from samples of new wine to wine from
a factory that must be checked before coming to the market. With the scores given in advance, we
hope to succeed in predicting the result.

2.4. Hypothesis
The plan is first to test the dependency among the input variables, and then to test
the dependency of the quality on the physicochemical properties. We can already state some
hypotheses that are likely to hold:
● The amount of CO2 (which gives the wine its famous gassy taste) can affect
the quality.
● More alcohol may affect the quality.
● The density of the wine can affect the quality.
● The amount of salt in the wine can also affect the quality.
● And more...
The main hypothesis is that the physicochemical properties may affect each other. We
do not yet know which one affects which; the analysis will give us the answer.

3. HANDLING THE DATA

3.1. Import the dataset
First, we need to load some packages (if a package is not installed, install
it first):
# packages for cleaning
# install.packages("janitor")
library(janitor)
library(dplyr)
# packages for plotting
# install.packages("ggplot2")
# install.packages("GGally")
library(ggplot2)
library(GGally)
# read the red wine dataset into a data frame
winequality <- read.csv('winequality-red.csv')
The last command imports the dataset and stores it under the name winequality for us to
use.
Note that from now on, the dataset in R is referred to as winequality.
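One detail of the import is worth illustrating: by default, read.csv() rewrites column headers into syntactic R names, which is why a header such as "volatile acidity" is later accessed as volatile.acidity. The sketch below uses a tiny inline CSV with made-up values instead of the real file, and it assumes a comma-separated file. If the copy of winequality-red.csv at hand uses semicolons as separators, sep = ";" would be needed in the read.csv() call.

```r
# Toy CSV text standing in for winequality-red.csv (values are illustrative only)
csv_text <- "fixed acidity,volatile acidity,quality
7.4,0.70,5
7.8,0.88,6"

# read.csv() uses check.names = TRUE by default, turning spaces into dots
toy <- read.csv(text = csv_text)

print(names(toy))  # column names after the automatic renaming
print(dim(toy))    # rows, columns
```

Checking names() right after the import is a cheap way to confirm which spelling of each column the later commands must use.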

3.2. Data cleaning

Before analyzing the data, we must check it and, if needed, clean it.
3.2.1. Checking N/A
First, we look over the dataset to see if there are any N/A values. This can be
done with the following commands:
# dimensions (rows, columns) before any cleaning
dim(winequality)
# number of N/A values in each column
colSums(is.na(winequality))

The result is the number of N/A values in each column:

● fixed acidity: 0
● volatile acidity: 0
● citric acid: 0
● residual sugar: 0
● chlorides: 0
● free sulfur dioxide: 0
● total sulfur dioxide: 0
● density: 0
● pH: 0
● sulphates: 0
● alcohol: 0
● quality: 0
We can see that there are no missing values in our data, which means no
cleaning is needed in this step.
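To show what the N/A check reports when values are missing, here is a minimal sketch on a made-up data frame (the column names mirror the dataset, the values are illustrative only):

```r
# Toy data frame with one missing value (illustrative values only)
toy <- data.frame(
  alcohol = c(9.4, NA, 10.2),
  quality = c(5, 6, 5)
)

# is.na() gives a logical matrix; colSums() counts the TRUE entries per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# complete.cases() flags the rows that have no missing value at all
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```

Had the real dataset contained missing values, dropping incomplete rows in this way would be one possible cleaning step among several.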

3.2.2. Removing duplicate

Next, we remove duplicate entries (if there are any) using this command:
winequality <- winequality %>% distinct()
To show that there were duplicate entries, we check the dimensions again:
dim(winequality)

The number of rows has decreased, which means there were duplicate rows.
Note that, because removing duplicates takes only one command, we skip the step of checking
for duplicates before removing them.
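If an explicit count of the duplicates is wanted before removing them, it can be obtained in base R. The following is a small sketch on made-up rows; unique() is used here as the base-R counterpart of distinct().

```r
# Toy data frame in which row 3 repeats row 1 (illustrative values only)
toy <- data.frame(
  alcohol = c(9.4, 9.8, 9.4, 10.0),
  quality = c(5, 5, 5, 6)
)

# duplicated() marks every row that repeats an earlier row
n_dup <- sum(duplicated(toy))

# unique() on a data frame drops those repeated rows
toy_unique <- unique(toy)

print(n_dup)
print(nrow(toy) - nrow(toy_unique))  # same number, seen as a drop in row count
```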

3.2.3. Data summary

After cleaning the dataset, we take a summary look at it. We investigate:
● Dimensions of the dataset, using:
dim(winequality)

● Min, max, mean, median, 1st quartile, and 3rd quartile, using:

summary(winequality)
● Structure of the dataset, using:
str(winequality)

After investigating the dataset, we make some observations:

● Mean residual sugar level is 5.4 g/l, but there is a sample of very sweet wine with
65.8 g/l (an outlier).
● Mean free sulfur dioxide is 30.5 ppm. The maximum value is 289 ppm, which is quite high
given that the 3rd quartile is 41 ppm.
● The pH of the wines ranges from 2.7 to 4, with a mean of 3.2.
● There are no basic wines in this dataset (no high pH levels).
● Alcohol: the lightest wine is 8%, the strongest is 14.9%.
● The minimum quality score is 3, the mean is 5.8, and the highest is 9.
Also, since there are outliers in our dataset, we need to keep them in mind when
analyzing the dataset further.
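The report does not state how outliers were identified. One common convention, used here purely as an assumption, is Tukey's rule: a value is flagged when it lies more than 1.5 times the interquartile range outside the quartiles. A minimal sketch with a hypothetical helper flag_outliers() and made-up sugar values:

```r
# Flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] (Tukey's rule).
# flag_outliers() is a hypothetical helper, not part of the original analysis.
flag_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr
}

# Illustrative residual sugar values with one very sweet sample
sugar <- c(1.2, 1.9, 2.1, 2.2, 2.5, 2.6, 3.0, 15.5)

print(sugar[flag_outliers(sugar)])
```

Applied to a column such as winequality$residual.sugar, the same helper would return the samples to keep in mind during the later analysis.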
4. DESCRIPTIVE STATISTICS
4.1. Univariate Analysis
First, we look at some of the variables and their plots, and determine
which ones to examine further.
4.1.1. Quality of wine
As our main concern is the quality of wine, we look at it first.
We have seen the summary for all the data, but we call it again for easier analysis.
We also add a command that builds a frequency table for us.
summary(winequality$quality)
length(winequality$quality) # quality is a vector, so length() is used instead of dim()
table(winequality$quality)

As we can see, the quality varies from 3 to 8, and there is no outlier. The scores are
integers, so we plot the distribution of quality with x-axis limits of 2 to 9 (larger than 2 and less than
9) and a bin width of 1.

# I("red") sets a constant fill colour instead of mapping the string "red" to a legend entry
qplot(quality, data = winequality, fill = I("red"), binwidth = 1) +
scale_x_continuous(breaks = seq(3, 8, 1), limits = c(2, 9)) +
scale_y_sqrt()
We can see that quality has an approximately normal distribution, peaking at 5 and 6 (5 is a
little higher).
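Because quality takes only integer values, its distribution can also be summarized numerically as an empirical probability mass function. A minimal sketch on ten made-up scores (not the real data):

```r
# Illustrative quality scores (not taken from the dataset)
quality <- c(5, 5, 6, 6, 5, 7, 4, 6, 5, 8)

counts <- table(quality)     # frequency of each score
pmf <- prop.table(counts)    # relative frequencies, which sum to 1
print(pmf)

# Score with the highest relative frequency
mode_score <- names(pmf)[which.max(pmf)]
print(mode_score)
```

On the real data, prop.table(table(winequality$quality)) gives the proportions that the histogram shows as bar heights.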

4.1.2. Level of alcohol

The alcohol level is one of the important properties that we want to look at.
We follow the same steps, starting with its summary. Because it varies far more
than quality, we do not build a table for it (the variation is visible once we plot it).

summary(winequality$alcohol)

Then we plot it:

The alcohol level distribution looks skewed. The most frequent alcohol level is 9.5%, and the mean is
10.49%.
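The plotting command for alcohol is not shown in the text. As a stand-in, the sketch below bins a handful of made-up alcohol values in base R and compares the mean with the median, which is a quick numerical check of right skew. The bin width of 0.5 is an assumption, not the value used in the report.

```r
# Illustrative alcohol levels in % by volume (not taken from the dataset)
alcohol <- c(9.2, 9.4, 9.5, 9.5, 9.5, 9.8, 10.0, 10.5, 11.2, 12.8)

# Bin the values in steps of 0.5 without drawing; plot(h) would draw the histogram
h <- hist(alcohol, breaks = seq(9, 13, by = 0.5), plot = FALSE)
print(h$counts)

# In a right-skewed distribution the long upper tail pulls the mean above the median
print(mean(alcohol))
print(median(alcohol))
```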

4.1.3. Density of wine

Density is another property that we want to look at. The steps are the same
as for the alcohol level, so we shorten the explanation:

summary(winequality$density)

Then we plot it. To see the distribution more clearly, we use a log10 scale:

qplot(density, data = winequality, fill = I("red"), binwidth = 0.0002) +
scale_x_log10(limits = c(min(winequality$density), 1.00370),
breaks = seq(min(winequality$density), 1.00370, 0.002))
Looking at the summary, we see that there are two outliers: 1.0103 and 1.03898.
To see the distribution of density more clearly, we used a log10 scale and limited the range
of the data. Now we can see that the density distribution of the wine is approximately normal.

4.1.4. Level of Volatile acidity

A high level of volatile acidity can lead to a bad taste, so we look at it as well:
Summary:
summary(winequality$volatile.acidity)

Plotting:
qplot(volatile.acidity, data = winequality, fill = I("red"), binwidth = 0.001) +
scale_x_log10(breaks = seq(min(winequality$volatile.acidity),
max(winequality$volatile.acidity), 0.1))
Volatile acidity has an approximately normal distribution.

4.1.5. Level of Chlorides (Level of salt)

As mentioned before, a high level of salt in the wine is not good, so we
look at it:
Summary:
summary(winequality$chlorides)

Plotting:
qplot(chlorides, data = winequality, fill = I("red"), binwidth = 0.01) +
scale_x_log10(breaks = seq(min(winequality$chlorides), max(winequality$chlorides),
0.1))
The chlorides distribution is initially skewed, so we used a log10 scale to see the distribution
more clearly.
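The effect of the log10 scale on a skewed variable can also be checked numerically. The sketch below defines a hypothetical skewness() helper (third central moment divided by the cubed standard deviation) and applies it to made-up chloride values before and after the transform:

```r
# Sample skewness: third central moment over the cubed (population) standard deviation.
# skewness() is a hypothetical helper written for this illustration.
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Illustrative chloride values with a long right tail (not taken from the dataset)
chlorides <- c(0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.20, 0.40, 0.61)

print(skewness(chlorides))         # skewness on the original scale
print(skewness(log10(chlorides)))  # smaller after the log10 transform
```

A value closer to zero after the transform is consistent with the histogram looking more symmetric on the log10 axis.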

4.1.6. Summary
After examining all the properties, we decided to show only some of them, as it is
unnecessary to show every plot.
We can see that some of the variables are approximately normally distributed and some are quite skewed.

4.2. Bivariate Analysis

4.2.1. Correlation test.
4.2.1.1. Theories.
Correlation analysis is a statistical method used to evaluate the strength of the
relationship between two or more quantitative variables. A high correlation coefficient
means that the variables have a strong relationship with each other, while a weak
correlation means that the variables are hardly associated.
Statistical correlation is measured by the coefficient of correlation (𝑟). Its numerical
value ranges from −1.0 to +1.0. It indicates both the strength and the direction
of the relationship between variables.
In general, 𝑟 > 0 indicates a positive relationship while 𝑟 < 0 signals a negative
relationship. 𝑟 = 0 indicates no linear relationship (the variables are uncorrelated,
although they are not necessarily independent of each other).
𝑟 = +1.0 describes a perfect positive correlation and 𝑟 = −1.0 describes a
perfect negative correlation. The closer the coefficient is to +1.0 or −1.0, the greater
the strength of the linear relationship between the variables.

4.2.1.2. Methodologies.
The method we use to perform the correlation analysis is:
• Pearson correlation formula:
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
The p-value can then be determined via the t-value, which follows the t-distribution
with 𝑛 − 2 degrees of freedom:
$$ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} $$
If the p-value < 5%, then the correlation between the variables is significant.
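The two formulas above can be checked against R's built-in functions. The sketch below computes r, the t-value, and a two-sided p-value by hand on made-up data and compares them with cor.test(); the vectors x and y are illustrative and not taken from the dataset.

```r
# Illustrative paired measurements (not taken from the dataset)
x <- c(7.4, 7.8, 11.2, 7.9, 6.7, 8.9, 10.1, 5.6)
y <- c(0.00, 0.04, 0.56, 0.06, 0.08, 0.30, 0.45, 0.02)
n <- length(x)

# Pearson correlation coefficient from its definition
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# t-value with n - 2 degrees of freedom and the two-sided p-value
t_manual <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# Built-in test for comparison
builtin <- cor.test(x, y, method = "pearson")
print(c(manual = r_manual, builtin = unname(builtin$estimate)))
print(c(manual = p_manual, builtin = builtin$p.value))
```

On the real data, cor.test(winequality$density, winequality$alcohol) would report the same quantities for one pair of variables.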

4.2.2. Linear regression

Regression analysis is a collection of statistical tools used to model and
explore relationships between variables that are related in a nondeterministic manner.
Linear regression attempts to model a linear relationship between a dependent
variable (response) and one or more independent variables (predictors/regressors).
To model the dataset using simple linear regression, consider the pairs (𝑋𝑖, 𝑌𝑖) where i =
1, 2, 3, ..., n. The model states that:
$$ Y_i = \alpha + \beta X_i + \varepsilon_i $$
where 𝛼, 𝛽 are regression coefficients and 𝜀𝑖 is a random error that follows the normal
distribution, $\varepsilon_i \sim N(0, \sigma^2)$.

In practice, we can extend this to multiple predictors 𝑋𝑖1, 𝑋𝑖2, ..., 𝑋𝑖𝑗 for each response 𝑌𝑖, which
gives the multiple linear regression model:
$$ Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_j X_{ij} + \varepsilon_i $$
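For the single-predictor model, the least-squares estimates have a closed form: the slope is the covariance of X and Y divided by the variance of X, and the intercept follows from the means. The sketch below checks this against lm() on made-up data (the values are illustrative only):

```r
# Illustrative predictor and response (not taken from the dataset)
x <- c(8.0, 9.0, 9.5, 10.0, 11.0, 12.0, 12.5, 13.0)
y <- c(4, 5, 5, 5, 6, 6, 7, 7)

# Closed-form least-squares estimates for Y = alpha + beta * X + error
beta_hat  <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta_hat * mean(x)

# The same model fitted with lm()
fit <- lm(y ~ x)
print(c(alpha = alpha_hat, beta = beta_hat))
print(coef(fit))
```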

4.2.3. Relationship visualization.

As stated in the hypothesis, we also want to find relationships between the variables, if
there are any. We calculate the correlation between the variables.
We can calculate the correlations between all the variables at once, then look only at the pairs
with high correlation. The correlations are calculated with the Pearson correlation
formula.
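Calculating all correlations at once and keeping only the strong pairs can be sketched as follows. The data frame is made up, and the cutoff of 0.5 in absolute value is an assumption chosen to match the pairs reported in this section.

```r
# Toy data frame: a, b and c are strongly related, d is not (illustrative values)
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.0),
  c = c(6, 5, 4, 3, 2, 1),
  d = c(1, -1, 1, -1, 1, -1)
)

r <- cor(toy)  # matrix of Pearson correlations (the default method)

# Keep each pair once (upper triangle) and only if |r| reaches the assumed cutoff
idx <- which(upper.tri(r) & abs(r) >= 0.5, arr.ind = TRUE)
strong <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)
print(strong)
```

Running the same steps on cor(winequality) would produce the list of candidate pairs without reading them off a plot.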
We can also plot the data for every pair of variables, together with the pairwise
correlations, using a single command:
ggpairs(winequality)
From the plot, we can see that the following pairs have high
correlation:
● citric acid and fixed acidity: 0.667.
● density and fixed acidity: 0.670.
● pH and fixed acidity: -0.687.
● citric acid and volatile acidity: -0.551.
● pH and citric acid: -0.550.
● total sulfur dioxide and free sulfur dioxide: 0.667.
● alcohol and density: -0.505.
These correlations need to be double-checked by plotting. We ignore the pairs
between pH and the acids or acidity, as their relationship is already well known. The method we
use is linear regression: we draw the data points and the fitted line.
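To do so, we write our own helper function for easier coding:
# Scatter plot of two columns of `dataset` (names passed as strings) with a fitted line
f <- function(dataset, x, y) {
ggplot(dataset, aes(x = .data[[x]], y = .data[[y]])) + # map the columns by name
geom_point(alpha = 1/5, position = position_jitter(height = 0), size = 2) + # semi-transparent points
geom_smooth(method = 'lm') + # least-squares regression line
labs(x = x, y = y) # readable axis labels
}
The function plots the points and the fitted regression line for us.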
● Citric acid and Fixed acidity: 0.667.
# Citric acid and fixed acidity
p <- f(winequality, "citric.acid", "fixed.acidity")
p + coord_cartesian(xlim=c(min(winequality$citric.acid),max(winequality$citric.acid)),
ylim=c(min(winequality$fixed.acidity),max(winequality$fixed.acidity)))

● Density and Fixed acidity: 0.670.

# Density and fixed acidity
p <- f(winequality, "density", "fixed.acidity")
p + coord_cartesian(xlim=c(min(winequality$density),max(winequality$density)),
ylim=c(min(winequality$fixed.acidity),max(winequality$fixed.acidity)))
● Citric acid and Volatile acidity: -0.551
# citric acid and volatile acidity
p <- f(winequality, "citric.acid", "volatile.acidity")
p + coord_cartesian(xlim=c(min(winequality$citric.acid),max(winequality$citric.acid)),
ylim=c(min(winequality$volatile.acidity),max(winequality$volatile.acidity)))
● Total sulfur dioxide and free sulfur dioxide: 0.667.
# total sulfur dioxide and free sulfur dioxide
p <- f(winequality, "free.sulfur.dioxide", "total.sulfur.dioxide")
p + coord_cartesian(xlim=c(min(winequality$free.sulfur.dioxide),max(winequality$free.sulfur.dioxide)),
ylim=c(min(winequality$total.sulfur.dioxide),max(winequality$total.sulfur.dioxide)))

● Alcohol and density: -0.505.

# density vs. alcohol plot
p <- f(winequality, "density", "alcohol")
p + coord_cartesian(xlim=c(min(winequality$density),max(winequality$density)),
ylim=c(min(winequality$alcohol),max(winequality$alcohol)))
The plots confirm all the calculated correlations.
Summary:
After analyzing the variables in pairs, we found relationships between:
● Citric acid and Fixed acidity, correlation value: 0.667.
● Density and Fixed acidity, correlation value: 0.670.
● Citric acid and Volatile acidity, correlation value: -0.551.
● Total sulfur dioxide and Free sulfur dioxide, correlation value: 0.667.
● Alcohol and Density, correlation value: -0.505.
5. MULTIPLE LINEAR REGRESSION
5.1. What is multiple regression?
In practice, checking dependency pair by pair is inefficient. Multiple linear
regression can calculate all the coefficients at once, especially since we use a machine for the calculation.
From the linear regression formula, we can extend the model to multiple predictors
𝑋𝑖1, 𝑋𝑖2, ..., 𝑋𝑖𝑗 for each response 𝑌𝑖:
$$ Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_j X_{ij} + \varepsilon_i $$
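How the machine calculates all the coefficients at once can be illustrated with the normal equations of least squares, (X'X) b = X'y, where X is the matrix of predictors with a leading column of ones for 𝛼. This is a sketch on made-up data; lm() itself uses a numerically more stable QR decomposition, but it arrives at the same estimates.

```r
# Illustrative data with two predictors (not taken from the dataset)
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9)

# Design matrix with a leading column of ones for the intercept alpha
X <- cbind(intercept = 1, x1 = x1, x2 = x2)

# Solve the normal equations (X'X) b = X'y for the coefficient vector b
beta_hat <- solve(crossprod(X), crossprod(X, y))

# The same model fitted with lm()
fit <- lm(y ~ x1 + x2)
print(drop(beta_hat))
print(coef(fit))
```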

5.2. Applying into predicting the quality of the wine:

We use multiple linear regression to test which variables affect the wine's
quality.
We use the following code:
abc <- glm(quality ~ fixed.acidity + volatile.acidity + citric.acid + residual.sugar +
chlorides + free.sulfur.dioxide + total.sulfur.dioxide + density + pH +
sulphates + alcohol, data = winequality)

summary(abc)
The first command builds the model with all the variables; the second displays it
on the screen.
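A note on the choice of function: glm() called without a family argument uses the gaussian family with an identity link, so it fits the same least-squares model as lm(). The sketch below confirms this on a made-up data frame with two predictors (the values are illustrative only):

```r
# Toy data frame (illustrative values, column names mirror the dataset)
toy <- data.frame(
  alcohol   = c(9.4, 9.8, 10.0, 10.5, 11.2, 12.0, 12.8, 9.5),
  sulphates = c(0.56, 0.68, 0.65, 0.58, 0.75, 0.82, 0.80, 0.47),
  quality   = c(5, 5, 5, 6, 6, 7, 7, 5)
)

fit_glm <- glm(quality ~ alcohol + sulphates, data = toy)  # family = gaussian by default
fit_lm  <- lm(quality ~ alcohol + sulphates, data = toy)

print(coef(fit_glm))
print(coef(fit_lm))
```

Either function could therefore be used here; lm() additionally reports R-squared in its summary.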

We look at the variables whose p-value is nearly 0 (marked with
*** in the output):
● volatile acidity with the coefficient of: -1.1204370
● chlorides with the coefficient of: -1.9302567
● total sulfur dioxide with the coefficient of: -0.0027073
● sulphates with the coefficient of: 0.9147023
● alcohol with the coefficient of: 0.2895307
Each coefficient describes how the variable affects the quality score. For example, with the
coefficient of -1.12, each one-unit increase in volatile acidity decreases the predicted quality
score by about 1.12 points, with the other variables held fixed.
As we can see, many variables affect the result.
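The meaning of a coefficient can be made concrete with predict(): raising one predictor by one unit while holding the others fixed changes the predicted response by exactly that predictor's coefficient. The sketch below uses a made-up data frame and two hypothetical wines, base_wine and more_acid:

```r
# Toy data frame (illustrative values, column names mirror the dataset)
toy <- data.frame(
  volatile.acidity = c(0.70, 0.88, 0.76, 0.28, 0.66, 0.60, 0.65, 0.58, 0.50, 0.32),
  alcohol          = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 10.0, 9.5, 10.5, 11.0),
  quality          = c(5, 5, 5, 6, 5, 5, 5, 7, 5, 6)
)
fit <- lm(quality ~ volatile.acidity + alcohol, data = toy)

# Two hypothetical wines that differ by one unit of volatile acidity only
base_wine <- data.frame(volatile.acidity = 0.5, alcohol = 10)
more_acid <- data.frame(volatile.acidity = 1.5, alcohol = 10)

change <- predict(fit, more_acid) - predict(fit, base_wine)
print(unname(change))                             # change in predicted quality
print(unname(coef(fit)["volatile.acidity"]))      # the fitted coefficient
```

Since volatile acidity in real wines spans well under one unit, a smaller step such as 0.1 (with a tenth of the effect) is often the more realistic way to read the coefficient.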

6. TOTAL SUMMARY
6.1 Variables affect each other
As predicted at the beginning, there are variables that affect each other
besides the obvious ones. The results are these pairs:
● Citric acid and Fixed acidity, correlation value: 0.667.
● Density and Fixed acidity, correlation value: 0.670.
● Citric acid and Volatile acidity, correlation value: -0.551
● Total sulfur dioxide and Free sulfur dioxide, correlation value: 0.667.
● Alcohol and Density, correlation value: -0.505.

6.2 Variables affect the wine's quality

There are quite a lot of variables that affect the wine quality:
● Volatile acidity with the coefficient of: -1.1204370
● Chlorides with the coefficient of: -1.9302567
● Total sulfur dioxide with the coefficient of: -0.0027073
● Sulphates with the coefficient of: 0.9147023
● Alcohol with the coefficient of: 0.2895307