Wine Quality Prediction Using Machine Learning Algorithms
Wine Quality Prediction Using Machine Learning Algorithms
Abstract: Wine classification is a difficult task since taste is the least understood of the human senses. A good wine quality
prediction can be very useful in the certification phase, since currently the sensory analysis is performed by human tasters, being
clearly a subjective approach. An automatic predictive system can be integrated into a decision support system, helping the
speed and quality of the performance. Furthermore, a feature selection process can help to analyze the impact of the analytical
tests. If it is concluded that several input variables are highly relevant to predict the wine quality, since in the production process
some variables can be controlled, this information can be used to improve the wine quality. Classification models used here are
1) Random Forest 2) Stochastic Gradient Descent 3) SVC 4)Logistic Regression .
www.ijcat.com 385
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 385-388, 2019, ISSN:-2319–8656
Dataset Description: The two datasets are related Other than that the selection is being done randomly with
to red wine of the Portuguese "Vinho Verde" wine. For uniform distribution.
more details, consult: [Web Link] or the reference [Cortez Various classification and regression algorithms are used
et al., 2009]. Due to privacy and logistic issues, only to fit the model. The algorithms used in this paper are as
physicochemical (inputs) and sensory (the output) follows:
variables are available (e.g. there is no data about grape
types, wine brand, wine selling price, etc.). For classification:
These datasets can be viewed as classification or Random Forest Decision Trees classifier
regression tasks. The classes are ordered and not balanced
(e.g. there are many more normal wines than excellent or Support Vector Machine classifier
poor ones). Outlier detection algorithms could be used to
Stochastic gradient descent
detect the few excellent or poor wines. Also, we are not
sure if all input variables are relevant. So it could be Logistic Regression classifier
interesting to test feature selection methods.
Preprocessing: Label Encoding is used to convert
1)fixed acidity the labels into numeric form so as to convert it into the
2) volatile acidity machine-readable form. It is an important pre-processing
3) citric acid step for the structured dataset in supervised learning. We
4) residual sugar have used label encoding to label the quality of data as
5) chlorides good or bad. Assigning 1 to good and 0 to bad.
6)free sulfur dioxide
7)total sulfur dioxide
8)density
9)pH Feature Selection:
10) sulphates
As we can clearly see, volatile acidity and residual sugar
11) alcohol
are both not very impact full of the quality of wine. Hence
Output variable (based on sensory data):
we can eliminate these features. Though we are selecting
12)quality (score between 0 and 10)
these features, they will change according to the domain
experts.
IV. DATA PROCESSING METHODS
www.ijcat.com 386
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 385-388, 2019, ISSN:-2319–8656
www.ijcat.com 387
International Journal of Computer Applications Technology and Research
Volume 8–Issue 09, 385-388, 2019, ISSN:-2319–8656
CONCLUSION
Based on the bar plots plotted we come to an conclusion
that not all input features are essential and affect the data,
for example from the bar plot against quality and residual
sugar we see that as the quality increases residual sugar is
moderate and does not have change drastically. So this
feature is not so essential as compared to others like
alcohol and citric acid, so we can drop this feature while
feature selection.
1) Logistic Regression
4) Random Forest
References:
[1] Yunhui Zeng1 , Yingxia Liu1 , Lubin Wu1 , Hanjiang
Dong1. “Evaluation and Analysis Model of Wine Quality
Based on Mathematical Model ISSN 2330-2038 E-ISSN
2330-2046,Jinan University, Zhuhai,China.
www.ijcat.com 388