Wine 9
By
Aryan Negi (171220)
Parvesh Sharma (171256)
Certificate
This is to certify that the above statement made by the candidate is true to the best
of my knowledge.
We are also thankful to all other faculty members for their constant
motivation and for helping us bring improvements to the project.
Finally, we would like to thank our family and friends for their constant
support. Without their contribution it would have been impossible to
complete our work.
ABSTRACT
The main goal of this project is to predict whether a wine's quality is good or
bad. For centuries, tasting has been done by humans, who have always judged
quality on the basis of their senses. In recent times, industries have been
adopting newer technologies and applying them in all kinds of areas, yet there
are still many areas, such as product quality assurance, in which human
expertise is needed. This has become an expensive process as demand for the
product grows over time. Therefore, this project explores different
machine learning techniques, such as the MLP classifier, the Decision Tree
classifier, and Support Vector Machines (SVM), for product quality assurance.
These techniques carry out the quality assurance process using the available
characteristics of the product and automate it, minimizing human
intervention.
List of Abbreviations
ACRONYM DEFINITIONS
List of Figures
1. Figure 1: Histogram
2. Figure 2: Boxplot
Chapter – 1
1.1 INTRODUCTION
The present may well be remembered as one of the most defining periods of human
history, as computing moved from mainframes to PCs to the cloud and now to
artificial intelligence. An important area of artificial intelligence that has
come into the limelight, machine learning, allows computers to learn on their own
from data without being explicitly programmed. With the concepts and ideas of
machine learning, we have been able to move from small, exact computations to
iterating over big data, and to do so at remarkable speed. This trend has been
gaining momentum over the last several years.
Data mining, on the other hand, involves discovering data and sorting through the
large data sets available in order to identify the required patterns and establish
relationships, with the aim of solving problems through data analysis. Broadly
speaking, machine learning and data mining use the same types of methods and
processes, except that the kind of data preprocessing and the end prediction vary.
This project draws on these two core areas to predict and present the most
accurate results possible.
1.3 OBJECTIVE
Build a Jupyter notebook in Anaconda, import data, and view the data loaded
into the notebook.
Use Pandas to clean and prepare data.
Use scikit-learn to create the machine learning model.
Use Matplotlib to visualize the model's performance.
Chapter – 2
Jupyter notebooks are highly interactive, and since they can include
executable code, they provide an ideal platform for manipulating data
and building predictive models from it.
3. Click the Run button to execute the code. Confirm that the output
resembles the output below.
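The code cell for this step did not survive in this copy of the report. A minimal sketch of loading the data and previewing it might look like the following. The file name mentioned in the comment, the in-memory CSV text, and the sample rows are illustrative assumptions, used only so that the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the wine CSV file: a few illustrative rows with a
# subset of the columns. In the notebook this would be a file on disk, e.g.
# pd.read_csv("winequality-red.csv") (the file name is an assumption).
csv_text = """fixed acidity,pH,sulphates,alcohol,quality
7.4,3.51,0.56,9.4,5
7.8,3.20,0.68,9.8,5
11.2,3.16,0.58,9.8,6
7.9,3.30,0.46,9.4,5
7.3,3.39,0.47,10.0,7
6.7,3.28,0.54,9.2,5
"""

df = pd.read_csv(io.StringIO(csv_text))

# head() returns only the first five rows, however long the DataFrame is.
print(df.head())

# shape reports (rows, columns) for the whole DataFrame.
print(df.shape)
```

Calling `df.shape` alongside `df.head()` is a quick way to see the full size of the data even though only five rows are displayed.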
The DataFrame that we created contains the amounts of all the constituents
present in red wine, together with the wine's quality. It has
more than 1000 rows and 12 columns. (The output says "5 rows" because
DataFrame's head function only returns the first five rows.) Each row represents
the amount of each constituent in one wine as well as its quality. We'll
look at the data more closely a bit later in this section.
Chapter – 3
2. Take a moment to examine the 12 columns in the dataset. Here is a complete
list of the columns in the dataset.
Column    Explanation
pH        pH value (acidity level) of the wine
One of the most important aspects of preparing a dataset for use in machine
learning is selecting the "feature" columns that are relevant to the outcome
we are trying to predict while filtering out columns that do not affect the
outcome, could bias it in a negative way, or might produce multicollinearity.
Another important task is to eliminate missing values, either by removing the
affected rows or by filling them with the average value of that column. In this
exercise, we will check for rows and columns with missing values.
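As a rough illustration of the two strategies just mentioned, the sketch below drops incomplete rows in one case and fills the gap with the column mean in the other. The small DataFrame and its values are made up for the example and are not taken from the wine dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; one alcohol value is missing.
df = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 11.2],
    "alcohol": [9.0, np.nan, 10.0],
    "quality": [5, 5, 6],
})

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: fill each gap with the mean of its own column.
filled = df.fillna(df.mean())

print(dropped)
print(filled)
```

Dropping rows loses data, while filling with the mean keeps every row at the cost of inventing a plausible value, so the choice depends on how many values are missing.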
1. One of the first things data scientists typically look for in a dataset is
missing values. There's an easy way to check for missing values in Pandas.
To demonstrate, execute the following code in a cell at the end of the
notebook:
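The original cell is missing from this copy. One common way to make this check in Pandas is sketched here, assuming the DataFrame is named `df`; the three rows are placeholders rather than real wine data.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the wine data (assumption).
df = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 11.2],
    "alcohol": [9.4, 9.8, 9.8],
    "quality": [5, 5, 6],
})

# True if at least one cell anywhere in the frame is missing.
has_missing = df.isnull().values.any()
print(has_missing)

# For contrast: the same check after deliberately blanking one cell.
df_with_gap = df.copy()
df_with_gap.loc[1, "alcohol"] = np.nan
has_missing_after_gap = df_with_gap.isnull().values.any()
print(has_missing_after_gap)
```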
2. The next step is to find out where the missing values are. To do so,
execute the following code:
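The cell itself is not preserved here. A per-column count of missing values can be obtained along these lines; the DataFrame name `df` and its contents are assumptions made for the sketch.

```python
import pandas as pd

# Small made-up frame standing in for the wine data (assumption).
df = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 11.2],
    "alcohol": [9.4, 9.8, 9.8],
    "quality": [5, 5, 6],
})

# isnull() marks each missing cell as True; sum() then counts them per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```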
Confirm that we see the following output, which lists a count of zero missing
values in each column:
The dataset is "clean" in the sense that it has been checked for missing values
and none of them need to be replaced.
Figure 2: Box plot for one variable
The figures below show a bivariate analysis using a scatter plot of the ‘fixed acidity’
and ‘quality’ columns, which reflects uniformly spread data.
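The figures themselves are not reproduced in this copy. Plots of this kind could be drawn with Matplotlib roughly as follows. The data is synthetic and generated only for the sketch, and the `Agg` backend line is an assumption that lets the code run outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the wine data (assumption, not the real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "fixed acidity": rng.normal(8.3, 1.7, 200),
    "quality": rng.integers(3, 9, 200),
})

fig, (ax_box, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate view: box plot of a single column.
ax_box.boxplot(df["fixed acidity"].to_numpy())
ax_box.set_title("Box plot of fixed acidity")
ax_box.set_ylabel("fixed acidity")

# Bivariate view: one column plotted against another.
ax_scatter.scatter(df["fixed acidity"], df["quality"])
ax_scatter.set_xlabel("fixed acidity")
ax_scatter.set_ylabel("quality")
ax_scatter.set_title("fixed acidity vs. quality")

fig.tight_layout()
```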
A heat map is an extremely powerful way to visualize relationships between
variables in high-dimensional space. For example, in this case a correlation
matrix with heat map colouring is shown below. A correlation matrix is a
table showing correlation coefficients between sets of variables. Each
variable in the table is correlated with each of the other variables in the
table. This allows us to see which pairs have the highest correlation.
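The heat map image is not included in this copy. The underlying correlation matrix can be computed with Pandas as in this sketch, which also picks out the most strongly correlated pair. The data is synthetic, with quality deliberately constructed to rise with alcohol, so the result illustrates the method only and says nothing about the real wine data.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): quality is built to increase with alcohol,
# while pH is generated independently of both.
rng = np.random.default_rng(0)
n = 200
alcohol = rng.normal(10.4, 1.0, n)
df = pd.DataFrame({
    "alcohol": alcohol,
    "pH": rng.normal(3.3, 0.15, n),
    "quality": np.round(0.8 * alcohol - 2.5 + rng.normal(0.0, 0.5, n)),
})

# Pearson correlation (the pandas default) between every pair of columns.
corr = df.corr()
print(corr.round(2))

# Ignore the diagonal (each column with itself), then find the strongest pair.
strength = corr.abs().to_numpy(copy=True)
np.fill_diagonal(strength, 0.0)
i, j = np.unravel_index(strength.argmax(), strength.shape)
strongest_pair = (corr.index[i], corr.columns[j])
print(strongest_pair)
```

The `corr` table is what a heat map colours: each cell's shade would encode the coefficient printed here.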
Chapter – 4
To create a machine learning model, we need two datasets: one for training
and one for testing. In practice, we often have only one dataset, so we split it
into two. In this exercise, we will perform a 70-30 split on the DataFrame we
prepared in the previous chapter so that it can be used to train a machine
learning model. We will also separate the DataFrame into feature columns and
label columns. The former contains the columns used as input to the model (for
example, the fixed acidity, alcohol, sulphates, etc.), while the latter contains
the column that the model will try to predict: in this case, the quality column,
which indicates whether a wine is good or bad.
1. In a new cell at the end of the notebook, enter and execute the following
statements:
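The statements are missing from this copy. A 70-30 split with scikit-learn might be written as below. The synthetic DataFrame, the `random_state` value, and the choice of columns are assumptions; the variable names follow the `X_test` naming used later in the report.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared wine DataFrame (assumption).
rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "fixed acidity": rng.normal(8.3, 1.7, n),
    "sulphates": rng.normal(0.66, 0.17, n),
    "alcohol": rng.normal(10.4, 1.0, n),
    "quality": rng.integers(3, 9, n),
})

# Feature columns are everything except the label; the label is quality.
X = df.drop(columns="quality")
y = df["quality"]

# test_size=0.3 keeps 30% of the rows for testing and 70% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the training and test sets on every run.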
2. Now use this command to show the number of rows and columns in the
DataFrame containing the feature columns used for training:
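The command is not preserved here. With Pandas, the `shape` attribute gives this information. The sketch repeats a small synthetic split so that it runs on its own; the data and the `X_train` name are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 100 rows, 2 features, 1 label.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "alcohol": rng.normal(10.4, 1.0, 100),
    "sulphates": rng.normal(0.66, 0.17, 100),
    "quality": rng.integers(3, 9, 100),
})

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="quality"), df["quality"], test_size=0.3, random_state=42
)

# shape is a (rows, columns) tuple for the training features.
train_rows, train_cols = X_train.shape
print(X_train.shape)
```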
There are various types of machine learning models. One of the most common is
the KNN (k-nearest neighbours) model, which stores all the available data and
classifies a new data point based on its similarity to the stored points. This
means that when new data appears, it can easily be assigned to the best-suited
category using the K-NN algorithm.
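The idea can be sketched with scikit-learn's `KNeighborsClassifier`. The two well-separated groups of synthetic wines, the "good"/"bad" labels, and the choice of five neighbours are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, well-separated data (assumption): "good" wines are given
# higher alcohol values than "bad" ones.
rng = np.random.default_rng(1)
n = 100
bad = pd.DataFrame({"alcohol": rng.normal(9.5, 0.4, n),
                    "sulphates": rng.normal(0.6, 0.1, n)})
good = pd.DataFrame({"alcohol": rng.normal(12.0, 0.4, n),
                     "sulphates": rng.normal(0.7, 0.1, n)})
X = pd.concat([bad, good], ignore_index=True)
y = np.array(["bad"] * n + ["good"] * n)

# fit() keeps the training data for later lookup.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

# A new wine is labelled by majority vote among its 5 nearest stored wines.
new_wine = pd.DataFrame({"alcohol": [12.2], "sulphates": [0.7]})
new_label = knn.predict(new_wine)[0]
print(new_label)
```

Because KNN compares raw distances, features on very different scales are usually standardized first; that step is left out here to keep the sketch short.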
One of the benefits of using scikit-learn is that we don't have to build these
models, or implement the algorithms that they use, by hand. Scikit-learn
includes a variety of classes for implementing common machine learning
models. One of them is the Decision Tree classifier, which organizes a series of
test questions and conditions in a tree structure.
2. Now call the predict method to test the model using the values in X_test,
followed by the score method to determine the mean accuracy of the model:
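Neither the training cell nor this one survives in this copy. A sketch of fitting a Decision Tree classifier and then calling `predict` and `score` is given below. The synthetic data, the rule that labels a wine as good when its alcohol exceeds 10.5, and the `random_state` values are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): the label depends only on alcohol.
rng = np.random.default_rng(7)
n = 300
X = pd.DataFrame({
    "fixed acidity": rng.normal(8.3, 1.7, n),
    "sulphates": rng.normal(0.66, 0.17, n),
    "alcohol": rng.normal(10.4, 1.0, n),
})
y = (X["alcohol"] > 10.5).astype(int)  # 1 = good, 0 = bad

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train the tree on the training portion only.
model = DecisionTreeClassifier(random_state=13)
model.fit(X_train, y_train)

# predict() returns one label per test row; score() is the mean accuracy.
predicted = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(accuracy)
```

The accuracy on this artificial data will be much higher than on real wine measurements, where quality does not follow a single clean threshold.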
In the real world, a trained data scientist would look for ways to make the model
even more accurate. Among other things, they would try different algorithms
and take steps to tune the chosen algorithm to find the optimum combination of
parameters. Another likely step would be to expand the dataset to millions of
rows rather than just over a thousand. But for our purposes, the model is fine
as-is.
CONCLUSION
Machine learning requires massive data sets to train on, and these should be
inclusive, unbiased, and of good quality. There can also be times when one
must wait for new data to be generated. ML needs enough time to let the
algorithms learn and develop enough to fulfil their purpose with a considerable
degree of accuracy and relevance. It also needs massive resources to function,
which can mean additional requirements for computing power. Another
major challenge is the ability to accurately interpret the results generated by
the algorithms. The algorithms must also be chosen carefully for the purpose at
hand. Machine learning can be incredibly powerful when used in the right ways
and in the right places.