ML PR
COLLEGE OF ENGINEERING
BHARATI VIDYAPEETH (DEEMED TO BE UNIVERSITY)
COLLEGE OF ENGINEERING
CERTIFICATE
This is to certify that the requirements for the project report entitled ‘Wine Quality Prediction’ have
been successfully completed by the following students:
In partial fulfillment of B. Tech in the Department of CSE, BVDU DET, during the Academic Year
2023 – 2024.
Subject In charge
DECLARATION
We declare that this written submission for the B.TECH project entitled “Wine Quality Prediction” represents our ideas in our
own words, and where others' ideas or words have been included, we have adequately cited and referenced the original
sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not
misrepresented, fabricated, or falsified any idea, data, fact, or source in our submission. We understand that any
violation of the above will lead to disciplinary action by the institute and may also evoke penal action from the sources
that have not been properly cited or from whom prior permission has not been taken when needed.
Abstract
The main goal of this project is to predict whether a wine's quality is good or bad. For centuries,
tasting has been done by humans, who judge quality through their senses. In recent times, industries
have been adopting newer technologies and applying them in many areas, yet some areas, such as
product quality assurance, still depend on human expertise. This has become an expensive process
as demand for the product grows over time. Therefore, this project explores different machine
learning techniques, such as the MLP classifier, Decision Tree classifier, and Support Vector
Machines (SVM), for product quality assurance. These techniques carry out the quality assurance
process using the available characteristics of the product and automate it by minimizing human
intervention. The "Machine Learning-Based Wine Quality Prediction" project is a data-driven effort
that uses machine learning algorithms to forecast the quality of wines from various physicochemical
and sensory attributes. In the ever-expanding world of wine production, the ability to predict wine
quality accurately is valuable for winemakers and consumers alike. This project uses a comprehensive
dataset of red and white wines, with attributes such as acidity, alcohol content, residual sugar,
and more, to build predictive models that estimate wine quality.
Index
1 Introduction
2 Implementation
3 Results
4 Conclusion
5 References
Chapter 1
1.1 INTRODUCTION
This project predicts wine quality on the test data of the Red Wine Quality
dataset and measures the accuracy of the model using Logistic Regression. The
work involves importing the dataset, checking the quality of the data (data
wrangling), and performing exploratory data analysis (univariate and bivariate)
using histograms, boxplots, and scatter plots. The dataset is then modelled
using various machine learning algorithms.
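The train-then-score workflow described above can be sketched as follows. This is only an illustration under stated assumptions: the data is a synthetic stand-in for the 11 physicochemical features rather than the actual wine file, the binary "good"/"bad" label is hypothetical, and the 75/25 split, the scaling step, and `max_iter` are choices made for the sketch, not settings taken from the report.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 11 physicochemical features (assumption: real
# data would be loaded from the wine quality CSV instead).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 11))
# Hypothetical binary label (1 = "good", 0 = "bad") driven by two features.
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# Hold out 25% of the rows as test data (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit logistic regression on the training split only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Accuracy is measured on rows the model never saw during fitting.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means its statistics are learned from the training split alone, so the test accuracy is not influenced by the held-out rows.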
1.3 OBJECTIVE
o Build a Jupyter notebook in Anaconda, import the data, and view the data
loaded into the notebook.
o Use pandas to clean and prepare the data.
o Use scikit-learn to create the machine learning model.
o Use Matplotlib to visualize the model's performance.
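The pandas cleaning objective might look roughly like the sketch below. The small table is made up for illustration, and dropping incomplete and duplicated rows is just one possible policy, assumed here; the report does not state which cleaning steps were applied.

```python
import numpy as np
import pandas as pd

# Tiny in-memory stand-in for the wine table (values are illustrative only).
raw = pd.DataFrame(
    {
        "alcohol": [9.4, 9.8, np.nan, 9.8, 10.5],
        "pH": [3.51, 3.20, 3.26, 3.20, 3.30],
        "quality": [5, 5, 6, 5, 7],
    }
)

# Quality checks: missing values per column and fully duplicated rows.
missing_per_column = raw.isna().sum()
duplicate_rows = int(raw.duplicated().sum())
print(missing_per_column)
print("Duplicate rows:", duplicate_rows)

# One possible cleaning policy (an assumption): drop incomplete rows,
# then drop exact duplicates, and renumber the remaining rows.
clean = raw.dropna().drop_duplicates().reset_index(drop=True)
print(clean.shape)
```

Running the checks before cleaning makes it clear how many rows each step removes, which is worth recording alongside the model's accuracy.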
Chapter 2
Implementation
from tkinter import *
import numpy as np

def showQuality():
    # Read the 11 feature values from the entry fields, in the dataset's column order
    entries = [e1, e2, e3, e4, e5, e6, e7, e8, e9, e10, e11]
    new = np.array([[float(e.get()) for e in entries]])
    Ans = RF_clf.predict(new)
    fin = str(Ans)[1:-1]  # strip the surrounding [ ] from the array's string form
    quality.delete(0, END)  # clear any earlier prediction before showing the new one
    quality.insert(0, fin)
#------------------------------------------------------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# For this kernel, I am only using the red wine dataset
data = pd.read_csv('winequality-red.csv')
data.head()
#Summary statistics
data.describe()
# Let's proceed to separate 'quality' as the target variable and the rest as features.
y = data.quality # set 'quality' as target
X = data.drop('quality', axis=1) # rest are features
print(y.shape, X.shape)
#Let's look at the correlation among the variables using Correlation chart
colormap = plt.cm.viridis
plt.figure(figsize=(12,12))
plt.title('Correlation of Features', y=1.05, size=15)
# Pass the colormap defined above, which was otherwise unused
sns.heatmap(data.astype(float).corr(), linewidths=0.1, vmax=1.0, square=True,
            cmap=colormap, linecolor='white', annot=True)
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, log_loss
from sklearn.metrics import confusion_matrix
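# Split the data into training and test sets.
# NOTE: the seed value and the 80/20 split are assumptions; the listing uses
# seed, X_train and X_test without showing where they are defined.
seed = 42
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=seed)
#Train and evaluate the Random Forest Classifier with Cross Validation
# Instantiate the Random Forest Classifier
RF_clf = RandomForestClassifier(random_state=seed)
# Compute k-fold cross validation on training dataset and see mean accuracy score
cv_scores = cross_val_score(RF_clf, X_train, y_train, cv=10, scoring='accuracy')
print('Mean cross-validation accuracy:', cv_scores.mean())
#Perform predictions
RF_clf.fit(X_train, y_train)
pred_RF = RF_clf.predict(X_test)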
#------------------------------------------------------------------------
master = Tk()
e1 = Entry(master)
e2 = Entry(master)
e3 = Entry(master)
e4 = Entry(master)
e5 = Entry(master)
e6 = Entry(master)
e7 = Entry(master)
e8 = Entry(master)
e9 = Entry(master)
e10 = Entry(master)
e11 = Entry(master)
quality = Entry(master)
e1.grid(row=0, column=1)
e2.grid(row=1, column=1)
e3.grid(row=2, column=1)
e4.grid(row=3, column=1)
e5.grid(row=4, column=1)
e6.grid(row=5, column=1)
e7.grid(row=6, column=1)
e8.grid(row=7, column=1)
e9.grid(row=8, column=1)
e10.grid(row=9, column=1)
e11.grid(row=10, column=1)
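quality.grid(row=13, column=1)
# Button that runs the prediction; without it showQuality() is never called.
# (The button's label and grid position are assumptions; the listing omits this widget.)
Button(master, text='Predict quality', command=showQuality).grid(row=12, column=1)
mainloop()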
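The imported `accuracy_score` and `confusion_matrix` functions can be applied to predictions such as `pred_RF` along the following lines. The label arrays here are hypothetical stand-ins for `y_test` and `pred_RF`, chosen only to make the output easy to follow; they are not results from this project.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical quality labels standing in for y_test and pred_RF.
y_true = np.array([5, 5, 6, 6, 6, 7, 5, 6])
y_pred = np.array([5, 6, 6, 6, 5, 7, 5, 6])

# Fraction of samples whose predicted label matches the true label.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, ordered as in `labels`.
matrix = confusion_matrix(y_true, y_pred, labels=[5, 6, 7])

print(f"Accuracy: {accuracy:.3f}")
print(matrix)
```

The confusion matrix shows which quality scores are mistaken for one another, detail that a single accuracy figure hides when some scores are much rarer than others.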
Chapter 3
Results
Chapter 4
Conclusion
In conclusion, our wine quality prediction project demonstrated that wine quality scores can be
predicted from an analysis of key attributes. The chosen machine learning model performed well
and provided insight into the factors that influence wine quality. While the dataset has
acknowledged limitations, this project lays the foundation for practical applications in the wine
industry, helping winemakers and distributors make informed decisions about production and
quality control. Future work can explore enhancements and expanded datasets to further refine
the predictive accuracy of the model, contributing to the ongoing improvement of wine quality
assessment processes.
References
https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009
Research papers:
Links:
1. https://www.verzeo.in/
2. https://www.tutorialspoint.com/machine_learning/what_is_machine_learning.htm
3. https://towardsdatascience.com/exploratory-data-analysis-8fc1cb20fd15
4. https://en.wikipedia.org/wiki/Machine_learning