DSE Lab Assignment - Writeup - 7

The document discusses performing clustering analysis on workout data using K-means clustering in Python. It includes loading and exploring the data, selecting K-means modeling, training the model to assign clusters, and evaluating performance.

Uploaded by

1032212420

B.Tech Electrical and Computer Engineering

Semester: VI Subject: Data Science for Engineers

Name: Abhishek Agrawal Class: TY El&CE
Roll no.: 03 Batch: A3

Experiment No.: 07
Name of the Experiment: Clustering using Python

Aim:
Write a Python program to perform clustering on the workout data given below.
Date      Distance_km  Duration_min  Delta_last_workout  Day_category
10/17/17  4.3          21.58         1                   0
11/04/17  1.9          9.25          18                  1
11/18/17  1.9          9.0           14                  1
11/23/17  1.9          8.93          5                   0
11/28/17  2.3          11.94         5                   0
11/29/17  2.8          14.05         1                   0

To keep track of your performance, you need to identify similar workout sessions. Clustering
can help you partition the data into distinct groups so that the data points within each
group are similar to each other. Perform the following steps:
i. Load the data.
ii. Exploratory data analysis: draw a pair plot, a scatter plot of distance versus workout
duration, and a scatter plot of distance versus duration marked with the number of days
since the last workout, to get an idea of the correlation between different features.
iii. Select K-means clustering for the model and get the clusters.
iv. Evaluate the performance of the model.
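
Steps i and ii could be sketched roughly as follows. The six sample rows are inlined as CSV text here, which is an assumption; in the lab they would more likely be read from a file whose name is not given. The helper name `plot_exploration` is hypothetical, and its plotting imports are kept inside the function so that the numeric part runs without a display.

```python
import io

import pandas as pd

# The six sample rows from the Aim, inlined as CSV text (an assumption;
# real lab data would be read with pd.read_csv("<path to file>")).
RAW = """Date,Distance_km,Duration_min,Delta_last_workout,Day_category
10/17/17,4.3,21.58,1,0
11/04/17,1.9,9.25,18,1
11/18/17,1.9,9.0,14,1
11/23/17,1.9,8.93,5,0
11/28/17,2.3,11.94,5,0
11/29/17,2.8,14.05,1,0
"""

df = pd.read_csv(io.StringIO(RAW))
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Numeric features used for the exploration.
features = ["Distance_km", "Duration_min", "Delta_last_workout"]

print(df.describe())
corr = df[features].corr()  # Pearson correlation between the features
print(corr.round(2))


def plot_exploration(frame):
    """Draw the pair plot and the distance/duration scatter plot (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Pair plot: every feature against every other feature.
    sns.pairplot(frame[features])

    # Distance versus duration, coloured by days since the last workout.
    fig, ax = plt.subplots()
    points = ax.scatter(
        frame["Distance_km"],
        frame["Duration_min"],
        c=frame["Delta_last_workout"],
        cmap="viridis",
    )
    ax.set_xlabel("Distance (km)")
    ax.set_ylabel("Duration (min)")
    fig.colorbar(points, ax=ax, label="Days since last workout")
    plt.show()
```

Calling `plot_exploration(df)` in an environment with matplotlib and seaborn installed produces the plots; the printed correlation matrix already shows how strongly distance and duration move together.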

Theory:
Clustering based Machine Learning:
The task of grouping data points based on their similarity to each other is called clustering
or cluster analysis. It belongs to the branch of unsupervised learning, which aims to gain
insights from unlabeled data points; that is, unlike supervised learning, there is no target
variable. Clustering aims to form groups of homogeneous data points from a heterogeneous
dataset. It evaluates similarity using a metric such as Euclidean distance, cosine similarity,
or Manhattan distance, and then groups the most similar points together.
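
As a small illustration of these metrics, the sketch below computes them in plain Python for two sessions taken from the workout table in the Aim, using only (Distance_km, Duration_min). The function and variable names are chosen here for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# (Distance_km, Duration_min) for the 10/17/17 and 11/04/17 sessions.
long_run = (4.3, 21.58)
short_run = (1.9, 9.25)

print("Euclidean:", euclidean(long_run, short_run))
print("Manhattan:", manhattan(long_run, short_run))
print("Cosine similarity:", cosine_similarity(long_run, short_run))
```

Cosine similarity compares direction only, so two runs at a similar pace score close to 1 regardless of their length, whereas Euclidean and Manhattan distance separate a long run from a short one. Which metric is appropriate depends on what "similar" should mean for the data.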
For example, in the graph given below, we can clearly see three roughly circular clusters
forming on the basis of distance.

Clusters need not be circular; their shape can be arbitrary, and many algorithms work well
at detecting arbitrarily shaped clusters.
For example, in the graph given below, we can see that the clusters formed are not circular
in shape.

Types of Clustering:
● Hard Clustering: Each data point is assigned to exactly one cluster; it either belongs
to a cluster completely or does not belong to it at all.
● Soft Clustering: Instead of assigning each data point to a single cluster, the
probability or likelihood of that point belonging to each cluster is evaluated.
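
The difference can be seen in a minimal sketch, assuming scikit-learn is available. K-means is used as the hard-clustering example and a Gaussian mixture model as the soft-clustering example; the two blobs are made-up synthetic data, not the workout table.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two synthetic 2-D blobs (illustrative data only).
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2)),
])

# Hard clustering: one integer label per point.
hard_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Soft clustering: one membership probability per point per cluster.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft_memberships = gmm.predict_proba(X)

print(hard_labels[:5])
print(soft_memberships[:5].round(3))
```

Each row of `soft_memberships` sums to 1, so a point lying between the two blobs would receive a split such as partial membership in both clusters, while `hard_labels` always commits to one cluster.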
Types of Clustering Algorithms:
1. Centroid-based Clustering (Partitioning methods)
2. Density-based Clustering
3. Connectivity-based Clustering (Hierarchical clustering)
4. Distribution-based Clustering (Model-based methods)
Uses of Clustering: Clustering algorithms are mainly used for:
● Market Segmentation – Businesses use clustering to group their customers and use
targeted advertisements to attract a larger audience.
● Market Basket Analysis – Shop owners analyze their sales to figure out which items
are frequently bought together by customers.
● Social Network Analysis – Social media sites use your data to understand your
browsing behavior and provide you with targeted friend recommendations or content
recommendations.
● Medical Imaging – Doctors use clustering to find diseased areas in diagnostic
images such as X-rays.
● Anomaly Detection – Clustering can be used to find outliers in real-time data
streams or to flag potentially fraudulent transactions.
K-means Clustering:
Unsupervised machine learning is the process of letting an algorithm operate on unlabeled,
unclassified data without supervision. Without any prior training on labeled data, the
machine’s job in this case is to organize unsorted data according to similarities, patterns,
and differences.
K-means clustering assigns each data point to one of K clusters depending on its distance
from the center (centroid) of each cluster. It starts by randomly placing K cluster centroids
in the feature space. Each data point is then assigned to the cluster whose centroid is
nearest. After every point has been assigned, each centroid is recomputed as the mean of the
points in its cluster. These two steps repeat until the assignments stop changing or a
maximum number of iterations is reached.
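
The loop described above can be sketched from scratch with NumPy. This is a minimal illustration rather than a production implementation: the function names are chosen here, the centroids are initialised by picking K distinct data points at random (one common choice among several), and the features are left unscaled for simplicity.

```python
import numpy as np


def nearest_centroid(X, centroids):
    """Return, for every row of X, the index of its closest centroid."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialisation: k distinct data points chosen at random (an assumption).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = nearest_centroid(X, centroids)
        # Update step: each centroid moves to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    # Final assignment so the labels match the returned centroids.
    return nearest_centroid(X, centroids), centroids


# (Distance_km, Duration_min) for the six sessions in the Aim.
X = np.array([
    [4.3, 21.58],
    [1.9, 9.25],
    [1.9, 9.0],
    [1.9, 8.93],
    [2.3, 11.94],
    [2.8, 14.05],
])

labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids.round(2))
```

Because the result depends on the random starting centroids, different seeds can give different groupings; library implementations usually rerun the algorithm several times and keep the best result.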

Procedure:

1. Load the Data: Begin by loading the workout data into a Python environment. You can
use a library such as pandas to read the data from a CSV file into a DataFrame.
2. Data Exploratory Analysis: Perform exploratory analysis on the data to understand its
structure and characteristics. Create visualizations such as pair plots to visualize the
relationships between different features. Plot distance versus workout duration, distance
versus duration with the number of days, and correlation scatter plots to identify any
correlations between features.
3. Select K-means Clustering: Choose K-means clustering as the clustering algorithm for
the model. Determine the optimal number of clusters (K) using techniques such as the
elbow method or silhouette analysis.
4. Train the Model and Get Clusters: Train the K-means clustering model using the
workout data. Assign each data point to a cluster based on its proximity to the cluster
centroids.
5. Evaluate the Performance of the Model: Evaluate the performance of the clustering
model using metrics such as silhouette score or inertia. Visualize the clusters to gain
insights into the patterns and groupings within the data.
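
Steps 3 to 5 might look like the sketch below, assuming scikit-learn is available and using the six sample rows from the Aim. Standardising the features and restricting K to the range 2 to 4 are assumptions made for this tiny sample (the silhouette score needs fewer clusters than data points); neither is prescribed by the assignment.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Numeric features of the six sample sessions.
df = pd.DataFrame({
    "Distance_km": [4.3, 1.9, 1.9, 1.9, 2.3, 2.8],
    "Duration_min": [21.58, 9.25, 9.0, 8.93, 11.94, 14.05],
    "Delta_last_workout": [1, 18, 14, 5, 5, 1],
})

# Put all features on a comparable scale (an assumption, see above).
X = StandardScaler().fit_transform(df)

# Try several values of K and record inertia and silhouette score.
scores = {}
for k in range(2, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores[k] = (model.inertia_, silhouette_score(X, model.labels_))
    print(f"k={k}: inertia={scores[k][0]:.3f}, silhouette={scores[k][1]:.3f}")

# Pick the K with the highest silhouette score, then fit the final model.
best_k = max(scores, key=lambda k: scores[k][1])
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
df["Cluster"] = final_model.labels_
print(df)
```

Inertia (the within-cluster sum of squared distances) always falls as K grows, so it is read by looking for an elbow, while the silhouette score can be compared directly across values of K. A scatter plot of Distance_km against Duration_min coloured by the Cluster column is one way to visualize the resulting groups.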
Conclusion:
In this lab, we performed clustering analysis on workout data using the K-means clustering
algorithm. By grouping similar workout sessions together, we can gain insights into patterns
and trends in the data. Clustering analysis can be a valuable tool for organizing and analyzing
large datasets, helping to uncover hidden patterns and relationships.

Post Lab Questions:

1. What are some real-world applications of clustering algorithms, and how do they
benefit from clustering?
2. Explain the different types of clustering algorithms.
3. Discuss different techniques to calculate the distance between centroids and data
elements.
4. Describe the various performance measures that can be used for clustering
algorithms.
