Exercise#9 Instructions 2021

Uploaded by laylaydeanne


Week 10 Interactive Exercise#9

Clustering
In this exercise, we will do the following:

- Explore a dataset
- Visualize the clusters using matplotlib and seaborn
- Build a clustering model using the k-means clustering algorithm

Note: You must complete step 10 and present the results to your professor before you leave the lab to earn any grade for this lab.

Pre-requisites:
1- Install Anaconda.
2- We will be using several public datasets. They are available at https://goo.gl/zjS4C6 in a folder named "Datasets for Predictive Modelling with Python", organized by the chapters of the textbook Python: Advanced Predictive Analytics. The files for chapter 7 are required.

Steps for exploring the data and building a k-means clustering model:

1- Open the Spyder IDE.
2- Load the 'wine.csv' file into a data frame named data_firstname_wine, where firstname is your first name, and carry out the following activities:
a. Display the column names
b. Display the shape of the data frame, i.e., the number of rows and columns
c. Display the main statistics of the data
d. Display the data type of each column
e. Display the first five records
f. Find the unique values of the quality attribute
g. Find the mean of each chemical component across samples for each wine quality group

Following is the code. Make sure you update the path to the folder where you placed the files and update the data frame name correctly:
import pandas as pd
import os
# Update the path to the folder where you placed wine.csv
path = "C:/A_COMP309/data/"
filename = 'wine.csv'
fullpath = os.path.join(path,filename)
# The file is semicolon-separated
data_viji_wine = pd.read_csv(fullpath,sep=';')
pd.set_option('display.max_columns',15)
# a. Column names
print(data_viji_wine.columns.values)
# b. Shape: (number of rows, number of columns)
print(data_viji_wine.shape)
# c. Main statistics
print(data_viji_wine.describe())
# d. Data type of each column
print(data_viji_wine.dtypes)
# e. First five records
print(data_viji_wine.head(5))
# f. Unique values of quality, and how many samples have each value
print(data_viji_wine['quality'].unique())
print(data_viji_wine['quality'].value_counts())
# g. Mean chemical composition for each quality group
print(data_viji_wine.groupby('quality').mean())
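To make step 2g concrete, here is a minimal plain-Python sketch of what groupby('quality').mean() computes: rows are split by their quality value and each other column is averaged within its group. The helper name mean_by_group and the three rows are hypothetical illustrations, not part of pandas or of wine.csv.

```python
from collections import defaultdict


def mean_by_group(rows, group_key):
    """Return {group value: {column: mean}} for every column except group_key.

    A plain-Python illustration of DataFrame.groupby(group_key).mean().
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        group = row[group_key]
        counts[group] += 1
        for column, value in row.items():
            if column != group_key:
                sums[group][column] += value
    # Divide each column total by the number of rows in that group.
    return {
        group: {column: total / counts[group] for column, total in columns.items()}
        for group, columns in sums.items()
    }


# Hypothetical rows using two of the wine columns; the values are made up.
rows = [
    {"quality": 5, "volatile acidity": 0.70, "sulphates": 0.56},
    {"quality": 5, "volatile acidity": 0.60, "sulphates": 0.60},
    {"quality": 7, "volatile acidity": 0.30, "sulphates": 0.80},
]
means = mean_by_group(rows, "quality")
print(means)
```

Comparing these per-group means row by row is how observations such as the ones below are read off the real output.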

Some observations:
- The lower the volatile acidity and chlorides, the higher the wine quality.
- The higher the sulphate and citric acid content, the higher the wine quality.
- Density and pH do not vary much across the wine quality groups.

3- Plot a histogram to see the number of wine samples for each quality score.
Following is the code; make sure you update the data frame name correctly:

import matplotlib.pyplot as plt
plt.hist(data_viji_wine['quality'])
plt.xlabel('Quality')
plt.ylabel('Number of samples')
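The histogram is a visual tally of how many samples share each quality score, the same information value_counts() prints. A minimal stdlib sketch of that tally follows; the function name count_per_quality and the scores are hypothetical.

```python
from collections import Counter


def count_per_quality(qualities):
    """Tally how many samples have each quality score, sorted by score."""
    return dict(sorted(Counter(qualities).items()))


# Hypothetical quality scores, not taken from wine.csv.
sample_quality = [5, 6, 5, 7, 6, 5, 8]
print(count_per_quality(sample_quality))
```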

4- Use the seaborn library to generate different plots (histograms, pair plots, heatmaps, etc.) and investigate the correlations.
Following are the code snippets; make sure you update the data frame name correctly:
# Use the seaborn library to generate different plots
import matplotlib.pyplot as plt
import seaborn as sns
# Histogram with a density curve (sns.distplot is deprecated in recent seaborn versions)
plt.figure()
sns.histplot(data_viji_wine['quality'], kde=True)
# Plot only the density function, with a rug showing the individual observations
plt.figure()
sns.kdeplot(data_viji_wine['quality'], color='g')
sns.rugplot(data_viji_wine['quality'], color='g')
# Change the direction of the plot (density along the vertical axis)
plt.figure()
sns.kdeplot(y=data_viji_wine['quality'])
sns.rugplot(y=data_viji_wine['quality'])
# Check all pairwise relationships; this plots every pair of columns and takes a while to run
sns.pairplot(data_viji_wine)
# Subset three columns (x) and two columns (y)
x = data_viji_wine[['fixed acidity','chlorides','pH']]
y = data_viji_wine[['chlorides','pH']]
# Check the pairwise relationships of the subset
sns.pairplot(x)

# Generate heatmaps
plt.figure()
sns.heatmap(data_viji_wine[['fixed acidity']])
plt.figure()
sns.heatmap(x)
plt.figure()
sns.heatmap(x.corr())
plt.figure()
sns.heatmap(x.corr(), annot=True)
plt.figure(figsize=(10,9))
sns.heatmap(x.corr(), annot=True, cmap='coolwarm', linewidths=0.5)
# Line plot of two variables
plt.figure(figsize=(20,9))
sns.lineplot(data=y)
plt.figure(figsize=(20,9))
sns.lineplot(data=y, x='chlorides', y='pH')
# Line plot of three variables
plt.figure(figsize=(20,9))
sns.lineplot(data=x)
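The numbers in the x.corr() heatmap are Pearson correlation coefficients (the default method of DataFrame.corr()), ranging from -1 to 1. The sketch below computes the same statistic in plain Python to show what each heatmap cell means; the helper names and the column values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build {name: {name: r}} for a dict of equal-length numeric columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Hypothetical values for the three columns used in the heatmap above.
columns = {
    "fixed acidity": [7.4, 7.8, 11.2, 6.5],
    "chlorides": [0.076, 0.098, 0.075, 0.062],
    "pH": [3.51, 3.20, 3.16, 3.40],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

The diagonal is always 1 (each column with itself) and the matrix is symmetric, which is why the heatmap mirrors across its diagonal.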


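5- Normalize the data before clustering. Min-max scaling maps every column to the range [0, 1] using the following formula, where the minimum and maximum are taken over each column:

$$x_{\text{norm}} = \frac{x - \min(x)}{\max(x) - \min(x)}$$

Following is the code; make sure you update the data frame name correctly: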
# Normalize the data in order to apply clustering (min-max scaling, column by column)
data_viji_wine_norm = (data_viji_wine - data_viji_wine.min()) / (data_viji_wine.max() - data_viji_wine.min())
print(data_viji_wine_norm.head())
The output should look like this, with every value between 0 and 1:
[Figure: first five rows of the normalized data frame]
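Normalization matters because k-means compares observations by Euclidean distance, so a column with a wide numeric range would otherwise dominate the clustering. The sketch below applies the min-max formula column by column in plain Python. The function name and data are hypothetical; mapping a constant column to 0.0 is an assumption of this sketch (the pandas expression above would produce NaN there, because it divides 0 by 0).

```python
def min_max_normalize(columns):
    """Scale each column to [0, 1] with (x - min) / (max - min).

    Assumption: a constant column (max == min) is mapped to 0.0 instead of NaN.
    """
    normalized = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        normalized[name] = [(v - lo) / span if span else 0.0 for v in values]
    return normalized


# Hypothetical columns; "constant" shows the zero-range edge case.
columns = {
    "alcohol": [9.4, 9.8, 10.0],
    "quality": [5, 5, 6],
    "constant": [1.0, 1.0, 1.0],
}
normalized = min_max_normalize(columns)
print(normalized)
```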
6- Generate some additional plots for the normalized data.
Following is the code; make sure you update the data frame name correctly:

# Check some plots after normalizing the data
x1 = data_viji_wine_norm[['fixed acidity','chlorides','pH']]
y1 = data_viji_wine_norm[['chlorides','pH']]
plt.figure(figsize=(20,9))
sns.lineplot(data=y1)
plt.figure(figsize=(20,9))
sns.lineplot(data=x1)
plt.figure(figsize=(20,9))
sns.lineplot(data=y1, x='chlorides', y='pH')

7- Cluster the data (observations) into 6 clusters using the k-means clustering algorithm.
8- Following is the code; make sure you update the model name correctly:

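from sklearn.cluster import KMeans

# Fit k-means with 6 clusters; n_init and random_state make the run repeatable (42 is an arbitrary seed)
model = KMeans(n_clusters=6, n_init=10, random_state=42)
model.fit(data_viji_wine_norm)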
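To show what model.fit does internally, here is a minimal sketch of Lloyd's algorithm, the classic k-means procedure: assign every point to its nearest centroid, move each centroid to the mean of its members, and repeat until nothing moves. This is a teaching sketch, not scikit-learn's implementation: it starts from k randomly chosen points, whereas KMeans defaults to k-means++ seeding and can keep the best of several runs (n_init). All names and points here are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    labels = assign(points, centroids)  # final labels match the final centroids
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Two hypothetical, well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, centroids, inertia = kmeans(points, k=2)
print(labels, inertia)
```

The three values returned correspond to model.labels_, model.cluster_centers_ and model.inertia_, which step 9 inspects.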

9- Check the results as follows:

a. Print the model labels.
b. Append the cluster label to each record in the data frame, i.e., add a new column for the clusters.
c. Find the final centroid of each cluster.
d. Calculate the J-score, which is the sum of the squared distances between each point and the centroid of the cluster it belongs to. For a given number of clusters, the J-score should be as low as possible, since a lower value means more compact clusters.
e. Plot a histogram of the cluster variable to see the number of observations in each cluster.

Following is the code; make sure you update the model name correctly:

# a. Print the cluster label assigned to each observation
print(model.labels_)
# b. Append the clusters to each record in the data frame, i.e. add a new column for clusters
md = pd.Series(model.labels_)
data_viji_wine_norm['clust'] = md
print(data_viji_wine_norm.head(10))
# c. Find the final centroid of each cluster
print(model.cluster_centers_)

# d. Calculate the J-score: the sum of the squared distances between each point
# and the centroid of its cluster. For a given number of clusters, lower is better.
print(model.inertia_)

# e. Plot a histogram of the clusters
import matplotlib.pyplot as plt

plt.figure()
plt.hist(data_viji_wine_norm['clust'])
plt.title('Histogram of Clusters')
plt.xlabel('Cluster')
plt.ylabel('Frequency')
# Scatter plots of pH and chlorides against the cluster label
plt.figure()
plt.scatter(data_viji_wine_norm['clust'], data_viji_wine_norm['pH'])
plt.figure()
plt.scatter(data_viji_wine_norm['clust'], data_viji_wine_norm['chlorides'])
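Steps 9d and 9e can be checked by hand on a small example. The sketch below recomputes the J-score (what scikit-learn reports as inertia_) from labels and centroids, and counts the observations per cluster, which are the bar heights of the cluster histogram. The function names, points and centroids are hypothetical; the centroids are the means of their members, as they would be after k-means converges.

```python
from collections import Counter


def j_score(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its own centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


def cluster_sizes(labels):
    """Number of observations per cluster, sorted by cluster label."""
    return dict(sorted(Counter(labels).items()))


# Hypothetical 2-D points already assigned to two clusters.
points = [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0), (1.0, 0.8), (1.3, 0.9)]
labels = [0, 0, 1, 1, 1]
centroids = [(0.1, 0.0), (1.1, 0.9)]  # mean of each cluster's members
print(j_score(points, labels, centroids), cluster_sizes(labels))
```

Because the J-score generally falls as the number of clusters grows, compare J-scores only between runs with the same number of clusters, or look at how much it drops when clusters are added.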

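10- Re-cluster the data into three clusters and check the results. Drop the 'clust' column added in step 9 before fitting again; otherwise the previous cluster labels become one of the clustering features. Show the results to your professor.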
