Wine Quality Prediction Using Machine Learning

Beginner, Classification, Machine Learning, Python, Resource, Structured Data, Supervised

This article was published as a part of the Data Science Blogathon.

Overview

Basic understanding of wine

Data description
Importing modules
Study dataset
Visualization
Handle null values
Split dataset
Normalization
Applying model
Save model
Endnote

Introduction

“Wine is the most healthful and most hygienic of beverages.”

– Louis Pasteur

If you think about it, the quote above seems right: wine is popular all over the world, and only about 5% of the population doesn’t know what wine is.

We have all come across grapes, which taste sweet, but grapes are not just for eating; they are used to make many different products. Wine is one of them: an alcoholic drink made from fermented grapes. If you know wine, you will have noticed that it comes in two types, red and white, because of the different grape varieties used.

You may be surprised to hear that the worldwide distribution of wine amounts to 31 million tonnes, which is a huge number.
But how can you tell wines apart by their quality? That is the big question.

According to experts, wine is judged by its smell, flavor, and color, but we are not wine experts who can say whether a wine is good or bad. What can we do then? This is where machine learning comes in: we will use ML techniques, discussed below, to check wine quality.

To build an ML model we first need data, and you don’t need to go far: the wine quality dataset (winequalityN.csv) is available on Kaggle.

Now we start our journey toward predicting wine quality. As you can see in the data, it contains both red and white wines along with several other features. Let’s start:

Description of Dataset

If you download the dataset, you will see several features that we will use to classify wine quality. Many of them are chemical properties, so we need a basic understanding of them.

volatile acidity : The gaseous (volatile) acids present in wine, mainly acetic acid.

fixed acidity : The primary fixed acids found in wine are tartaric, succinic, citric, and malic acids.

residual sugar : Amount of sugar left after fermentation.

citric acid : A weak organic acid found naturally in citrus fruits.

chlorides : Amount of salt present in wine.

free sulfur dioxide : The free (unbound) form of SO2, which protects wine from oxidation and microbial spoilage.

total sulfur dioxide : The total amount of SO2 in the wine, free plus bound forms.

pH : Indicates how acidic the wine is; a lower pH means a more acidic wine.

density : Density of the wine.

sulphates : Added sulfites preserve freshness and protect wine from oxidation and bacteria.

alcohol : Percentage of alcohol present in wine.

Besides the chemical features, there is one feature named Type that contains the type of wine, red or white. Note that white wines outnumber red ones in this dataset (roughly 4,900 white samples versus 1,600 red).
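To check the share of each wine type yourself, you only need to count the labels. The following is a minimal plain-Python sketch; the list wine_types is a hypothetical sample, not data from winequalityN.csv, and type_shares is an illustrative helper name.

```python
from collections import Counter

# Hypothetical sample of values from the wine-type column;
# the real column has one entry per wine in winequalityN.csv.
wine_types = ["white", "white", "red", "white", "red", "white", "white", "white"]

def type_shares(labels):
    """Return each label's share of the total as a percentage."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100 * n / total for label, n in counts.items()}

shares = type_shares(wine_types)
print(shares)  # {'white': 75.0, 'red': 25.0}
```

With pandas, assuming the column is named type as in the Kaggle CSV, df['type'].value_counts(normalize=True) gives the same proportions as fractions instead of percentages.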

Next, we import some important libraries:

Importing modules

Let’s import,

# import pandas
import pandas as pd
# import numpy
import numpy as np
# import seaborn
import seaborn as sb
# import matplotlib
import matplotlib.pyplot as plt

Briefly: pandas is used for data analysis, NumPy for n-dimensional arrays, and seaborn and matplotlib, which have similar functionality, for visualization.

The next step is to read the wine quality dataset and look at its information:

Study dataset

Next, we check what technical information the data contains:

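# creating DataFrame object (adjust the path to where you saved the file)
df = pd.read_csv(R'D://xdatasets/winequalityN.csv')
print(df.head())
# info() prints its summary directly, so it does not need print()
df.info()
print(df.describe())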

[Output: the first rows of the data, followed by the info() and describe() summaries]

As the output shows, we get vital information about each feature (data type, non-null count, and summary statistics), which we use in the next steps.
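describe() reports, for each numeric column, the count, mean, standard deviation, minimum, quartiles and maximum. pandas uses the sample standard deviation and linearly interpolated percentiles, which Python's statistics module should reproduce with stdev and quantiles(method="inclusive"). A sketch for one hypothetical column (describe here is an illustrative helper, and statistics.quantiles needs Python 3.8 or later):

```python
import statistics

def describe(values):
    """Summary statistics in the spirit of pandas' describe() for one column."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }

# Hypothetical alcohol percentages, not taken from the dataset.
alcohol = [9.0, 9.5, 10.0, 11.0, 12.5]
summary = describe(alcohol)
print(summary)
```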

Visualization

A picture is worth a thousand words, and this is where visualization comes in: we use it to explain the data. In other words, visualization is a graphical representation of data that helps us find useful information.

# display a histogram for every numeric column
df.hist(bins=25, figsize=(10,10))
plt.show()

[Output: a histogram for each numeric feature]

The histograms show how the values of each feature are distributed.
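With bins=25, df.hist splits each column's range into 25 equal-width intervals and counts how many values fall into each; the last interval includes the maximum, as in numpy.histogram. A plain-Python sketch of that counting on hypothetical residual sugar values, using 3 bins to keep the output short; histogram_counts is an illustrative helper and assumes the values are not all identical.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be identical")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin instead of opening a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

# Hypothetical residual sugar values (g/L), not taken from the dataset.
sugar = [0, 1, 2, 2, 4, 5, 7, 9]
counts = histogram_counts(sugar, bins=3)
print(counts)  # [4, 2, 2]
```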

Now we plot a bar graph to check how the alcohol value relates to quality.

plt.figure(figsize=[10,6])
# plot bar graph
plt.bar(df['quality'], df['alcohol'], color='red')
# label x-axis
plt.xlabel('quality')
# label y-axis
plt.ylabel('alcohol')
plt.show()

[Output: bar graph of alcohol against quality]
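Because plt.bar draws one bar per row, all bars for the same quality level are drawn over each other at the same x position, so the chart effectively shows the largest alcohol value per quality level rather than a typical one. Averaging alcohol within each quality level gives a clearer summary. Below is a plain-Python sketch of that grouped mean on hypothetical values (mean_by_group is an illustrative helper); in pandas the equivalent is df.groupby('quality')['alcohol'].mean(), which can be plotted with .plot(kind='bar').

```python
from collections import defaultdict

def mean_by_group(keys, values):
    """Average `values` separately for each distinct key, sorted by key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Hypothetical (quality, alcohol) pairs, not taken from the dataset.
quality = [5, 5, 6, 6, 6, 7, 7]
alcohol = [9.0, 10.0, 10.0, 11.0, 12.0, 12.0, 13.0]
means = mean_by_group(quality, alcohol)
print(means)  # {5: 9.5, 6: 11.0, 7: 12.5}
```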
When we perform any machine learning task, we have to study the data features in depth, and there are many ways to tell the features apart. Now we compute the correlations in the data to see how strongly the features are related to each other.

Correlation:-

To check correlation we use a statistical measure that quantifies the strength and direction of the relationship between two features.
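By default df.corr() computes Pearson's correlation coefficient: the covariance of two features divided by the product of their standard deviations. It ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive linear relationship). A plain-Python sketch on hypothetical sulfur dioxide readings; pearson is an illustrative helper name.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical free vs. total sulfur dioxide readings (mg/L), not from the dataset.
free_so2 = [10, 20, 30, 40, 50]
total_so2 = [40, 70, 85, 130, 150]
print(round(pearson(free_so2, total_so2), 3))  # 0.99
```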

# plotting heatmap (numeric_only=True skips the text column Type)
plt.figure(figsize=[19,10], facecolor='blue')
sb.heatmap(df.corr(numeric_only=True), annot=True)

[Output: correlation heatmap of the numeric features]
Now we look for features that are highly correlated with each other; this lets us reduce the number of features in the data.

Why discard correlated features? Highly correlated features carry largely the same information and have a similar effect on the model, so we keep one of them and delete the other.

# compute the correlation matrix once and scan its lower triangle
corr = df.corr(numeric_only=True)
for a in range(len(corr.columns)):
    for b in range(a):
        if abs(corr.iloc[a, b]) > 0.7:
            name = corr.columns[a]
            print(name)

Here we write a Python loop that finds features with a high correlation. As the program shows, we set the threshold at 0.7: any feature whose absolute correlation with another feature is above 0.7 is treated as highly correlated. In the end, we find that total sulfur dioxide satisfies this condition.
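The loop above visits only the lower triangle of the correlation matrix (b < a), because the matrix is symmetric and its diagonal is always 1; for each pair over the threshold it prints the later column, which is the one we drop. The same logic as a plain-Python function over a hypothetical 3x3 matrix (the correlation values are made up for illustration, and correlated_pairs is an illustrative helper):

```python
def correlated_pairs(names, corr, threshold=0.7):
    """Return (later_feature, earlier_feature, r) for every |r| above threshold.

    Only the lower triangle (b < a) is scanned, because a correlation matrix
    is symmetric and its diagonal is always 1.
    """
    pairs = []
    for a in range(len(names)):
        for b in range(a):
            if abs(corr[a][b]) > threshold:
                pairs.append((names[a], names[b], corr[a][b]))
    return pairs

# Hypothetical correlation matrix, not computed from the dataset.
names = ["free sulfur dioxide", "total sulfur dioxide", "alcohol"]
corr = [
    [1.00, 0.72, -0.18],
    [0.72, 1.00, -0.27],
    [-0.18, -0.27, 1.00],
]
print(correlated_pairs(names, corr))
# [('total sulfur dioxide', 'free sulfur dioxide', 0.72)]
```

The first name in each tuple is the column the article's loop prints and drops; its partner stays in the data.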

So, we drop that feature:

new_df = df.drop('total sulfur dioxide', axis=1)

Handle null values

The dataset contains some missing (null) values, which would affect the accuracy of our ML model. In machine learning there are many ways to handle null or missing values; now we use one of them to clean up our data.

new_df.isnull().sum()
There are only a few null values in our data, so we simply fill them using the fillna() function.
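Mean imputation replaces each missing entry with the average of the values that are present in the same column; pandas' fillna() with a column-mean Series does this column by column, and pandas skips NaN when computing the mean. A minimal plain-Python sketch, using None to mark missing values in a hypothetical column (fill_with_mean is an illustrative helper):

```python
def fill_with_mean(column):
    """Replace None entries with the mean of the non-missing entries."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]

# Hypothetical pH column with two missing readings.
ph = [3.0, None, 3.5, 3.25, None]
print(fill_with_mean(ph))  # [3.0, 3.25, 3.5, 3.25, 3.25]
```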

# fill missing numeric values with each column's mean
new_df = new_df.fillna(new_df.mean(numeric_only=True))

This handles only the numerical variables, because it fills gaps with mean(), and a mean is not defined for categorical variables. For categorical variables:

# categorical vars
next_df = pd.get_dummies(new_df, drop_first=True)
# display new dataframe
next_df

The get_dummies() function handles categorical columns. In this dataset the ‘Type’ feature contains two values, red and white; with drop_first=True, red is encoded as 0 and white as 1.
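get_dummies creates one indicator column per category, named with the original column name as a prefix; with drop_first=True the alphabetically first category (red) is dropped and becomes the baseline, which is why red ends up as 0 and white as 1. A plain-Python sketch of that encoding, assuming a column named type; one_hot is an illustrative helper, and recent pandas versions return True/False instead of 1/0 unless dtype=int is passed.

```python
def one_hot(values, prefix, drop_first=False):
    """Encode a categorical column as 0/1 indicator columns, like pd.get_dummies.

    Categories are sorted, and drop_first removes the first one, which then
    becomes the implicit baseline (all indicators equal to 0).
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in values] for c in categories}

wine_type = ["red", "white", "white", "red"]
print(one_hot(wine_type, prefix="type", drop_first=True))
# {'type_white': [0, 1, 1, 0]}
```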

# label wines with quality 7 or higher as best quality (1), the rest as 0
next_df['best quality'] = [1 if x >= 7 else 0 for x in next_df.quality]
print(next_df)

Splitting dataset
Now we perform a split operation on our dataset:

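from sklearn.model_selection import train_test_split
# assumed definitions of x and y: features are all columns except the two
# quality columns, and the target is the 0/1 best quality label
x = next_df.drop(['quality', 'best quality'], axis=1)
y = next_df['best quality']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=40)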
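train_test_split shuffles the rows and holds out test_size (here 20%) of them for testing; random_state=40 fixes the shuffle so the split is reproducible. The sketch below reproduces that idea with Python's random module on row indices. It will not pick the same rows as scikit-learn, whose shuffling differs, and split_indices is a hypothetical helper; like scikit-learn, it rounds the test size up.

```python
import math
import random

def split_indices(n, test_size=0.2, seed=40):
    """Shuffle row indices with a fixed seed and cut off a test fraction."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n * test_size)
    # Return (train indices, test indices).
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))  # 8 2
```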

Normalization

We normalize the numerical data because the features are on very different scales: some take large values and others small ones. MinMaxScaler rescales each feature to the range 0 to 1.

# importing module
from sklearn.preprocessing import MinMaxScaler
# creating normalization object
norm = MinMaxScaler()
# fit the scaler on the training data only, then apply it to both sets
norm_fit = norm.fit(x_train)
new_xtrain = norm_fit.transform(x_train)
new_xtest = norm_fit.transform(x_test)
# display values
print(new_xtrain)
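MinMaxScaler does not turn values into 0 and 1; it maps each feature linearly with x' = (x - min) / (max - min), where min and max come from the training data. Training values therefore land in [0, 1], while test values outside the training range can fall outside it. A plain-Python sketch with hypothetical rows; fit_min_max and transform_min_max are illustrative helpers, and constant columns are simply mapped to 0.0 here.

```python
def fit_min_max(rows):
    """Learn the per-column minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def transform_min_max(rows, mins, maxs):
    """Scale each value to (x - min) / (max - min) using the training statistics."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Hypothetical [alcohol, free sulfur dioxide] rows, not taken from the dataset.
train = [[9.0, 10.0], [11.0, 30.0], [13.0, 50.0]]
test = [[10.0, 60.0]]
mins, maxs = fit_min_max(train)
print(transform_min_max(train, mins, maxs))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(transform_min_max(test, mins, maxs))   # [[0.25, 1.25]]
```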

Applying Model

This is the last step, where we apply a suitable model that gives high accuracy. Here we use RandomForestClassifier, because among the models tried it gave the best accuracy, about 88%.

RandomForestClassifier:-

# importing modules
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, mean_squared_error
# creating RandomForestClassifier constructor
rnd = RandomForestClassifier()
# fit data
fit_rnd = rnd.fit(new_xtrain, y_train)
# predicting score (accuracy on the test set)
rnd_score = rnd.score(new_xtest, y_test)
print('score of model is : ', rnd_score)
# display error rate
print('calculating the error')
y_predict = rnd.predict(new_xtest)
# calculating mean squared error
rnd_MSE = mean_squared_error(y_test, y_predict)
# calculating root mean squared error
rnd_RMSE = np.sqrt(rnd_MSE)
print('mean squared error is : ', rnd_MSE)
print('root mean squared error is : ', rnd_RMSE)
# true labels first, predictions second
print(classification_report(y_test, y_predict))
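The target here is a 0/1 label, so mean squared error is not an independent metric: each wrong prediction contributes (1 - 0)^2 = 1 and each correct one contributes 0, so MSE equals the misclassification rate, 1 - accuracy, and RMSE is its square root. A small plain-Python check on hypothetical labels (accuracy and mse are illustrative helpers):

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error between two numeric sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 0/1 "best quality" labels and predictions.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
acc = accuracy(y_true, y_pred)
err = mse(y_true, y_pred)
print(acc, err, math.sqrt(err))  # 0.75 0.25 0.5
```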

We are now at the end of the article, and we can compare the predicted values with the actual values.
# predict on the scaled test set, because the model was trained on scaled data
x_predict = list(rnd.predict(new_xtest))
predicted_df = {'predicted_values': x_predict, 'original_values': list(y_test)}
# creating new dataframe
pd.DataFrame(predicted_df).head(20)
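Comparing predicted and original values is also what classification_report summarizes: for each class it lists precision, recall, F1 score and support (the number of true samples of that class), and it expects the true labels first. The sketch below computes the first three for one class from hypothetical predicted and original values; precision_recall_f1 is an illustrative helper, not a scikit-learn function.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Per-class precision, recall and F1, as classification_report lists them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical original and predicted "best quality" labels.
original = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 0]
print(precision_recall_f1(original, predicted))
print(precision_recall_f1(original, predicted, positive=0))
```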

Saving Model

At last, we save our machine learning model:

import pickle
file = 'wine_quality'
# save file (the with block closes it after writing)
with open(file, 'wb') as f:
    pickle.dump(rnd, f)
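To use the saved model later, open the file in binary read mode and call pickle.load. The round trip below uses a plain dictionary as a hypothetical stand-in for the fitted classifier so that it runs without scikit-learn, and a temporary directory replaces the working directory used above. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the fitted RandomForestClassifier; pickle stores
# and restores any picklable Python object the same way.
model = {"name": "hypothetical model", "threshold": 7}

path = os.path.join(tempfile.mkdtemp(), "wine_quality")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later (for example in another script), load the object back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```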

With this step, our wine quality prediction is complete.

End Notes

This is one of the most interesting articles I have written, because it is about machine learning, one of today’s top technologies. I have used basic language to explain it, so you should have no difficulty understanding it.

If you have any questions about this article, feel free to ask in the comment section below.

Thank you.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

Article Url - https://www.analyticsvidhya.com/blog/2021/04/wine-quality-prediction-using-machine-learning/
mayurbadole2407
