
Wine DS

This document provides an overview of analyzing red-wine quality data with machine learning. It covers: 1) importing and exploring the data, which contains physicochemical properties and a sensory quality rating for each wine; 2) exploratory data analysis, including distributions, outliers and correlations between variables; 3) examining the relationship between each predictor and the ordinal quality rating with box plots; and 4) building a support vector classifier (SVC) to predict wine quality from the physicochemical tests and evaluating its performance.

Uploaded by

ARCHANA R


Source

Wine Quality Data Set: Document Flow

i. Dataset information

1. Data Ingestion

2. EDA

3. Preprocessing

4. Model Building (SVC/SVR)

5. Evaluation

Abstract: Two datasets are included, related to red and white vinho verde wine samples, from the north of
Portugal. The goal is to model wine quality based on physicochemical tests.

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult the reference by Cortez et al. (2009) cited below. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

Scope of Work:

These datasets can be viewed as classification or regression tasks.

1. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor
ones).
2. Outlier detection algorithms could be used to detect the few excellent or poor wines.
3. We are not sure if all input variables are relevant. So it could be interesting to test feature selection
methods.

Attribute Information:

Input variables (based on physicochemical tests):

1. Feature columns

* fixed acidity | Continuous Data
* volatile acidity | Continuous Data
* citric acid | Continuous Data
* residual sugar | Continuous Data
* chlorides | Continuous Data
* free sulfur dioxide | Continuous Data
* total sulfur dioxide | Continuous Data
* density | Continuous Data
* pH | Continuous Data
* sulphates | Continuous Data
* alcohol | Continuous Data

2. Target column

* quality | Ordinal data (score between 0 and 10; only scores 3 to 8 occur in the red-wine data)

Citation:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining
from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

1. Data Ingestion

Library Import

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix, roc_auc_score, roc_curve
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

# from pandas_profiling import ProfileReport
# ! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Data Import

# the UCI file is semicolon-delimited, hence sep=';'
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', header=0, sep=';')
data.head()

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4      ...
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8      ...
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8      ...
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8      ...
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4      ...

2. EDA

data.columns

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality'],
      dtype='object')

data.info()

RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64
dtypes: float64(11), int64(1)
memory usage: 150.0 KB

Shape of dataset

data.shape

(1599, 12)

Summary of dataset

data.describe().T

                       count       mean        std      min      25%       50%        75%       max
fixed acidity         1599.0   8.319637   1.741096  4.60000   7.1000   7.90000   9.200000  15.90000
volatile acidity      1599.0   0.527821   0.179060  0.12000   0.3900   0.52000   0.640000   1.58000
citric acid           1599.0   0.270976   0.194801  0.00000   0.0900   0.26000   0.420000   1.00000
residual sugar        1599.0   2.538806   1.409928  0.90000   1.9000   2.20000   2.600000  15.50000
chlorides             1599.0   0.087467   0.047065  0.01200   0.0700   0.07900   0.090000   0.61100
free sulfur dioxide   1599.0  15.874922  10.460157  1.00000   7.0000  14.00000  21.000000  72.00000
total sulfur dioxide  1599.0  46.467792  32.895324  6.00000  22.0000  38.00000  62.000000 289.00000
density               1599.0   0.996747   0.001887  0.99007   0.9956   0.99675   0.997835   1.00369
pH                    1599.0   3.311113   0.154386  2.74000   3.2100   3.31000   3.400000   4.01000
sulphates             1599.0   0.658149   0.169507  0.33000   0.5500   0.62000   0.730000   2.00000
alcohol               1599.0  10.422983   1.065668  8.40000   9.5000  10.20000  11.100000  14.90000
quality               1599.0   5.636023   0.807569  3.00000   5.0000   6.00000   6.000000   8.00000

Checking for null values

data.isna().sum()

fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64

Proportion (%) of each class in the "quality" column

data.quality.unique()
round(data.quality.value_counts() / len(data) * 100, 2)

5    42.59
6    39.90
7    12.45
4     3.31
8     1.13
3     0.63
Name: quality, dtype: float64

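Observation:

1. The dataset is imbalanced: classes 5 and 6 together account for about 82% of the samples, while classes 3, 4 and 8 together account for only about 5%.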
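To make the imbalance concrete, the sketch below recomputes the class percentages from per-class counts and reports the ratio between the largest and smallest class. The counts are an assumption on our part: they are the integer counts that reproduce the percentages above for 1,599 rows.

```python
from collections import Counter

# Per-class counts for the red-wine file (assumed; they reproduce the
# percentages printed above for 1,599 rows).
counts = Counter({5: 681, 6: 638, 7: 199, 4: 53, 8: 18, 3: 10})
total = sum(counts.values())

# Percentage of samples in each quality class, largest class first.
percentages = {q: round(n / total * 100, 2) for q, n in counts.most_common()}

# Imbalance ratio: size of the largest class divided by the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(total)            # 1599
print(percentages)
print(imbalance_ratio)  # 68.1
```

A ratio of roughly 68 to 1 between class 5 and class 3 explains why the rare classes receive zero precision and recall in the evaluation reports.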

Univariate analysis

Numerical Columns

plt.figure(figsize=(15, 15))
plt.suptitle('Univariate Analysis of Numerical Features', fontsize=20, fontweight='bold', alpha=0.8, y=1.)
numerical_features = [col for col in data.columns if data[col].dtypes != 'O']

for i in range(0, len(numerical_features)):
    plt.subplot(5, 3, i+1)
    # 'fill' replaces the 'shade' argument deprecated in seaborn 0.11
    sns.kdeplot(x=data[numerical_features[i]], fill=True, color='b')
    plt.xlabel(numerical_features[i])
    plt.tight_layout()

Observations:

1. Approximately normally distributed columns: 'density', 'pH'.
2. Columns with prominent outliers (long right tails): 'residual sugar', 'chlorides'.
3. Right-skewed columns (long right tail; the mean is above the median in the summary table): 'fixed acidity', 'free sulfur dioxide', 'total sulfur dioxide', 'sulphates', 'alcohol'.
4. Left-skewed columns: none.
5. Columns with irregular distributions: 'volatile acidity', 'citric acid'.
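The skew labels above can be checked numerically instead of by eye. Below is a minimal sketch of the moment-based (Fisher-Pearson) skewness coefficient: a positive value means a long right tail, with the mean pulled above the median. pandas' `Series.skew()` applies a bias-corrected variant of the same statistic, so its values differ slightly but share the sign. The sample values are made up for illustration.

```python
from statistics import fmean, median

def sample_skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical values with a long right tail, similar in shape to 'residual sugar'.
right_tailed = [1.9, 2.0, 2.1, 2.2, 2.3, 2.6, 15.5]
# A symmetric sample has zero skewness.
symmetric = [1.0, 2.0, 3.0]

print(sample_skewness(right_tailed) > 0)           # True: right (positive) skew
print(fmean(right_tailed) > median(right_tailed))  # True: mean pulled above the median
print(sample_skewness(symmetric))                  # 0.0
```

On the real data, `data.skew()` would return one value per column; positive values correspond to the right-skewed columns listed above.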

Multivariate analysis

Multivariate analysis is the analysis of more than one variable.

Checking for multicollinearity

# Correlation matrix of the numerical columns (lower triangle only)
plt.figure(figsize=(15, 10))
corr = data.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))  # hide the diagonal and upper triangle
sns.heatmap(corr, annot=True, mask=mask)
plt.yticks(rotation=45)
plt.show()

Observations:

1. Very highly correlated columns (|corr| > 0.90): None

2. Highly correlated columns (|corr| > 0.70): None
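The thresholds above can also be applied mechanically. The sketch below computes Pearson's r for every pair of columns and lists the pairs whose absolute correlation exceeds a threshold; it is a plain-Python stand-in for scanning `data.corr()`, and the column names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlated_pairs(columns, threshold=0.90):
    """Return (col_a, col_b, r) for every column pair with |r| > threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical columns: 'b' is an exact linear function of 'a'; 'c' is only weakly related.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 4.1, 6.1, 8.1, 10.1],
    "c": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(correlated_pairs(columns))  # [('a', 'b', 1.0)]
```

Using the absolute value matters: a strong negative correlation signals multicollinearity just as much as a strong positive one.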

Relation between feature and label column

feature = data.drop(columns='quality')
feature.columns

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol'],
      dtype='object')

Here we examine the relationship between the feature columns and the target column.

Feature columns are:

'fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol'

Label column:

'quality'

# Number of samples in each 'quality' category
sns.countplot(x='quality', data=data)

feature_continous = [col for col in feature.columns if data[col].dtypes != 'O']

fig = plt.figure(figsize=(15, 50))
plt.suptitle('Box plots of each continuous feature by quality', fontsize=20, y=1)

for i in range(0, len(feature_continous)):
    ax = plt.subplot(10, 3, i+1)
    sns.boxplot(data=data, x='quality', y=feature_continous[i])
    plt.tight_layout()

Preliminary Conclusions (the box plots show associations with quality, not causal effects):

1. Higher sulphate levels are associated with higher wine quality.
2. Higher alcohol content is associated with higher wine quality.
3. Lower volatile acidity is associated with higher wine quality.
4. Higher citric acid content is associated with higher wine quality.
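The center line of each box is the median of the feature within one quality level, so the trends above can also be read as a table of per-level medians. The sketch below does this in plain Python on a handful of hypothetical (quality, alcohol) pairs; on the real DataFrame the equivalent would be `data.groupby('quality')['alcohol'].median()`.

```python
from collections import defaultdict
from statistics import median

def medians_by_level(pairs):
    """Group (level, value) pairs by level and return {level: median value}, sorted by level."""
    groups = defaultdict(list)
    for level, value in pairs:
        groups[level].append(value)
    return {level: median(values) for level, values in sorted(groups.items())}

# Hypothetical (quality, alcohol) observations, for illustration only.
pairs = [(5, 9.4), (5, 9.8), (5, 9.6), (6, 10.2), (6, 10.5), (7, 11.0), (7, 11.6)]
result = medians_by_level(pairs)
print(result)

# A monotone rise in the medians is what "higher alcohol goes with higher quality" means here.
levels = list(result)
print(all(result[a] < result[b] for a, b in zip(levels, levels[1:])))  # True
```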

sns.pairplot(data)

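3. Preprocessing
The features span very different ranges (for example, total sulfur dioxide reaches 289 while density stays between about 0.990 and 1.004), so they are standardized to zero mean and unit variance before modeling. This matters for an SVC with the RBF kernel, which is based on Euclidean distances: without scaling, features with large numeric ranges would dominate the distance. Scaling also typically helps the solver converge faster.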

x = data.drop(columns='quality')
y = data['quality']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=0)

scaler = StandardScaler()
x_train_tf = scaler.fit_transform(x_train)  # learn per-feature mean and std on the training split
# inspect the learned per-feature means
scaler.mean_

array([ 8.33737295,  0.53      ,  0.27218139,  2.55316654,  0.08702424,
       16.03283815, 47.11415168,  0.99675746,  3.30982799,  0.6590774 ,
       10.41399531])

x_test_tf = scaler.transform(x_test)  # reuse the training-split statistics on the test split
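`StandardScaler` maps each feature to z = (x - mean) / std. The mean and the population standard deviation (ddof=0) are learned in `fit_transform` on the training split and reused unchanged by `transform` on the test split, so no test-set information leaks into preprocessing. A plain-Python sketch of that fit/transform split for a single feature, with made-up numbers:

```python
from statistics import fmean, pstdev

def fit(train_values):
    """Learn the mean and population standard deviation (ddof=0, as StandardScaler does)."""
    return fmean(train_values), pstdev(train_values)

def transform(values, mean, std):
    """Standardize values with previously learned parameters."""
    return [(v - mean) / std for v in values]

train = [7.0, 8.0, 9.0, 10.0]  # hypothetical 'fixed acidity' values in the training split
test = [8.5, 12.0]             # hypothetical values in the test split

mean, std = fit(train)
train_z = transform(train, mean, std)
test_z = transform(test, mean, std)  # uses the training mean/std, not the test set's own

print(mean, std)  # 8.5 and sqrt(1.25), about 1.118
print(train_z)    # mean 0 and population std 1 by construction
print(test_z)     # the first value is 0.0 because 8.5 equals the training mean
```

Note that the standardized test values need not have mean 0 or std 1; only the training split is guaranteed to.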

4. Model Building

from sklearn.svm import SVC

Raw SVC

model_svc = SVC()  # defaults: C=1.0, kernel='rbf', gamma='scale'
model_svc.fit(x_train_tf, y_train)
# accuracy on the training split (test-set accuracy is reported under Evaluation)
print(f"Accuracy score is: {model_svc.score(x_train_tf, y_train)}")
predict_raw = model_svc.predict(x_test_tf)

Accuracy score is: 0.6669272869429241
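With its defaults, `SVC()` uses `kernel='rbf'` and `gamma='scale'`, which scikit-learn documents as `1 / (n_features * X.var())`. Because `X` here is the standardized training matrix, its overall variance is 1, so the default works out to about 1/11 for the 11 features. A plain-Python arithmetic check on a small hypothetical matrix:

```python
from statistics import fmean, pstdev

def standardize_columns(rows):
    """Standardize each column to mean 0 and population std 1."""
    scaled_cols = []
    for col in zip(*rows):
        m, s = fmean(col), pstdev(col)
        scaled_cols.append([(v - m) / s for v in col])
    return [list(r) for r in zip(*scaled_cols)]

def scale_gamma(rows):
    """gamma='scale' as documented by scikit-learn: 1 / (n_features * X.var())."""
    n_features = len(rows[0])
    flat = [v for r in rows for v in r]
    mean = fmean(flat)
    var = fmean([(v - mean) ** 2 for v in flat])  # variance over all entries, ddof=0
    return 1.0 / (n_features * var)

# Hypothetical raw data: 4 samples, 3 features on very different scales.
raw = [[7.4, 0.70, 34.0], [7.8, 0.88, 67.0], [11.2, 0.28, 60.0], [8.1, 0.56, 40.0]]
X = standardize_columns(raw)
print(round(scale_gamma(X), 6))  # 0.333333, i.e. 1 / n_features

# For the 11 standardized wine features the same rule gives:
print(round(1 / 11, 4))  # 0.0909
```

This puts the default gamma roughly an order of magnitude below the values (0.8 to 1.3) searched in the tuning cells.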

Tuning the SVC

model_svc_tune = SVC()
params = [{'C': [.5, .9, 1, 1.2, 1.3, 1.5]}]
clf = GridSearchCV(model_svc_tune, params, cv=10, scoring='accuracy')
clf.fit(x_train_tf, y_train)
print(f'best value of C is {clf.best_params_}')

model_svc_tune = SVC()
# 'degree' is only used by the 'poly' kernel; it is ignored for 'rbf', 'linear' and 'sigmoid'
params = {'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
          'degree': [2, 3, 4, 5, 6]}
clf = GridSearchCV(model_svc_tune, params, cv=10, scoring='accuracy')
clf.fit(x_train_tf, y_train)
print(clf.best_params_)

model_svc_tune = SVC()
params = {'gamma': [0.8, 0.9, 1, 1.1, 1.2, 1.3]}
clf = GridSearchCV(model_svc_tune, params, cv=10, scoring='accuracy')
clf.fit(x_train_tf, y_train)
print(clf.best_params_)

best value of C is {'C': 1.2}

{'degree': 2, 'kernel': 'rbf'}

{'gamma': 1}
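The `gamma` values searched above control the width of the RBF kernel, K(x, z) = exp(-gamma * ||x - z||^2): a larger gamma makes the similarity between two samples fall off faster with distance, which gives a more flexible and more overfitting-prone decision boundary. A minimal plain-Python sketch of the kernel on hypothetical points:

```python
from math import exp

def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and z)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-gamma * sq_dist)

x = [0.0, 0.0]
z = [1.0, 1.0]  # squared distance from x is 2.0

# Same pair of points, increasing gamma: the similarity shrinks.
for gamma in (0.09, 1.0, 1.3):
    print(gamma, round(rbf_kernel(x, z, gamma), 4))
```

Any point has similarity 1.0 with itself regardless of gamma; only the decay with distance changes.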

params = {
    'C': [.9, 1, 1.2, 1.3],
    'kernel': ['rbf', 'linear'],
    'gamma': [.9, 1, 1.1]  # gamma has no effect when kernel='linear'
}

clf = GridSearchCV(model_svc_tune, params, cv=10, scoring='accuracy')
clf.fit(x_train_tf, y_train)
print(clf.best_params_)

{'C': 1.3, 'gamma': 1.1, 'kernel': 'rbf'}
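Grid search fits one model per parameter combination per fold. The plain-Python sketch below expands the combined grid above the way scikit-learn's `ParameterGrid` enumerates it and counts the fits implied by `cv=10`:

```python
from itertools import product

# The combined grid from the cell above.
params = {
    'C': [.9, 1, 1.2, 1.3],
    'kernel': ['rbf', 'linear'],
    'gamma': [.9, 1, 1.1],
}

# Every combination of one value per parameter.
keys = list(params)
candidates = [dict(zip(keys, values)) for values in product(*params.values())]

cv_folds = 10
print(len(candidates))             # 24 candidate settings
print(len(candidates) * cv_folds)  # 240 cross-validation fits
```

With the default `refit=True`, `GridSearchCV` then refits the best candidate once more on the full training split. Since gamma is ignored by the linear kernel, 8 of the 24 candidates duplicate other linear-kernel settings.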

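# Final model. The combined grid search above selected gamma=1.1; the scores
# reported under Evaluation were produced with gamma=1.3 as written here.
model_svc_tune = SVC(C=1.3, kernel='rbf', gamma=1.3)
model_svc_tune.fit(x_train_tf, y_train)
predict_tuned = model_svc_tune.predict(x_test_tf)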

5. Evaluation

Raw Model

print(f'Accuracy Score: {accuracy_score(y_test, predict_raw)}')

Accuracy Score: 0.64375

print('Classification report')
print(classification_report(y_test, predict_raw))

Classification report

              precision    recall  f1-score   support

           3       0.00      0.00      0.00         2
           4       0.00      0.00      0.00        11
           5       0.67      0.75      0.71       135
           6       0.63      0.68      0.66       142
           7       0.50      0.30      0.37        27
           8       0.00      0.00      0.00         3

    accuracy                           0.64       320
   macro avg       0.30      0.29      0.29       320
weighted avg       0.61      0.64      0.62       320

Tuned Model

print(f'Accuracy Score: {accuracy_score(y_test, predict_tuned)}')

Accuracy Score: 0.675

print('Classification Report')
print(classification_report(y_test, predict_tuned))

Classification Report

              precision    recall  f1-score   support

           3       0.00      0.00      0.00         2
           4       0.00      0.00      0.00        11
           5       0.66      0.79      0.72       135
           6       0.70      0.68      0.69       142
           7       0.71      0.44      0.55        27
           8       0.00      0.00      0.00         3

    accuracy                           0.68       320
   macro avg       0.34      0.32      0.33       320
weighted avg       0.65      0.68      0.66       320
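The `macro avg` and `weighted avg` rows differ because the macro average gives every class equal weight, while the weighted average weights each class by its support. The sketch below reproduces, to two decimals, the f1-score averages of both reports from the per-class values printed above, and recovers the number of correct test predictions implied by each accuracy.

```python
def macro_and_weighted(scores, supports):
    """Unweighted mean of per-class scores, and the support-weighted mean."""
    macro = sum(scores) / len(scores)
    weighted = sum(s * n for s, n in zip(scores, supports)) / sum(supports)
    return macro, weighted

# Per-class f1-scores for classes 3..8, copied from the two reports above.
supports = [2, 11, 135, 142, 27, 3]
f1_raw = [0.00, 0.00, 0.71, 0.66, 0.37, 0.00]
f1_tuned = [0.00, 0.00, 0.72, 0.69, 0.55, 0.00]

results = {}
for name, f1 in (("raw", f1_raw), ("tuned", f1_tuned)):
    macro, weighted = macro_and_weighted(f1, supports)
    results[name] = (round(macro, 2), round(weighted, 2))
    print(name, *results[name])
# raw 0.29 0.62
# tuned 0.33 0.66

# Correct predictions implied by each accuracy on the 320 test samples.
print(round(0.64375 * 320), round(0.675 * 320))  # 206 216
```

The three rare classes (3, 4 and 8) contribute zeros that drag the macro average down, while the weighted average is dominated by classes 5 and 6, which hold 277 of the 320 test samples.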

