Clustering in Python-Dr. Afsaneh Javadi

The document provides an overview of clustering in Python using libraries such as Sklearn, NumPy, SciPy, and Matplotlib. It includes examples of K-means and DBSCAN clustering algorithms, along with explanations of data manipulation and visualization techniques. The document emphasizes the importance of preprocessing data and tuning hyperparameters for effective clustering results.

Sklearn (short for Scikit-learn) is a free, open-source machine learning library for Python. It provides
simple and efficient tools for data analysis and is built on top of NumPy, SciPy, and matplotlib. Sklearn
supports various supervised and unsupervised learning algorithms, such as classification, regression,
clustering, and dimensionality reduction. It also includes tools for data preprocessing, model selection,
and evaluation. Sklearn is widely used in academia and industry for a variety of machine learning tasks.

NumPy is a Python library used for performing numerical operations in scientific computing. It provides
efficient multi-dimensional arrays and matrices, along with a large variety of mathematical functions to
operate on these arrays. NumPy is widely used in data science, machine learning, and scientific
computing domains where numerical computations and large data sets are common.

Here are some examples of how to use NumPy:

1. Creating an array:

import numpy as np

arr = np.array([1, 2, 3])

2. Reshaping an array:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

new_arr = arr.reshape(3, 2)

3. Applying math functions to an array:

import numpy as np

arr = np.array([1, 2, 3])

new_arr = np.sqrt(arr)

4. Indexing and slicing an array:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr[3])

print(arr[1:4])

5. Combining two arrays:

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

new_arr = np.concatenate((arr1, arr2))

6. Applying statistical functions to an array:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

mean = np.mean(arr)

median = np.median(arr)

std_dev = np.std(arr)
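
The snippets above do not display their results. As a minimal sketch, the following script repeats several of them and prints what each operation returns. The variable names reshaped, values, and combined are chosen here for illustration and do not come from the original examples.

```python
import numpy as np

# Reshaping: the same six values, rearranged from 2 rows x 3 columns to 3 x 2.
arr = np.array([[1, 2, 3], [4, 5, 6]])
reshaped = arr.reshape(3, 2)
print(reshaped.shape)        # (3, 2)

# Indexing and slicing: indices start at 0, and the slice end is exclusive.
values = np.array([1, 2, 3, 4, 5])
print(values[3])             # 4
print(values[1:4])           # [2 3 4]

# Combining: concatenate joins the arrays end to end.
combined = np.concatenate((np.array([1, 2, 3]), np.array([4, 5, 6])))
print(combined)              # [1 2 3 4 5 6]

# Statistics: np.std computes the population standard deviation by default.
mean = np.mean(values)
median = np.median(values)
std_dev = np.std(values)
print(mean, median)          # 3.0 3.0
```

Note that np.std divides by the number of elements by default; passing ddof=1 gives the sample standard deviation instead.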

SciPy is a Python-based open-source software library that is used for scientific computing, data analysis,
and numerical optimization. It provides a range of powerful tools and functions for various fields like
mathematics, engineering, physics, and statistics.

Here are some examples of common use cases for SciPy:

1. Optimization: using the scipy.optimize module to find the minimum or maximum of a function,
or to solve constrained optimization problems.

2. Integration: using the scipy.integrate module to calculate definite integrals of functions on a
given interval.

3. Interpolation: using the scipy.interpolate module to fit a curve to a set of data points, or to
generate new values from existing data.

4. Linear algebra: using the scipy.linalg module to perform matrix operations, such as finding
eigenvalues and eigenvectors, solving linear systems of equations, or computing matrix
decompositions.

5. Signal processing: using the scipy.signal module to perform a variety of signal processing tasks,
such as filtering, convolution, and spectral analysis (Fourier transforms themselves are provided by
the scipy.fft module).

6. Statistics: using the scipy.stats module to generate random numbers, calculate summary
statistics, or perform hypothesis testing and statistical inference.

7. Image processing: using the scipy.ndimage module to process images, such as filtering or
enhancing contrast.

8. Sparse matrices: using the scipy.sparse module to work with large, sparse matrices efficiently.
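
As a rough illustration of the first few use cases in this list, the sketch below calls scipy.optimize, scipy.integrate, and scipy.linalg on small problems whose answers can be checked by hand. The functions being minimized and integrated, and the matrix A and vector b, are made up for this example.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Optimization: minimize f(x) = (x - 2)^2, whose minimum lies at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)
x_min = result.x

# Integration: definite integral of x^2 on [0, 3], which is exactly 9.
# quad returns the estimate together with an error bound.
integral, abs_error = integrate.quad(lambda x: x ** 2, 0, 3)

# Linear algebra: solve the system 3x + y = 9, x + 2y = 8 (solution x = 2, y = 3).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

print(x_min, integral, solution)
```

The numerical results are approximations, so they should be compared with the exact answers using a tolerance rather than strict equality.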

Matplotlib is a Python library for creating static, interactive, and animated data visualizations.
It provides a wide variety of plots, such as line plots, bar plots, histograms, scatter plots, 3D
plots, and heatmaps. Matplotlib allows for easy customization of the plots, including color and font
choices, line and marker styles, and axis formatting. It is widely used in scientific and engineering fields
for data analysis and visualization.

Line plot: A line plot shows how a dependent variable changes over time or over another continuous
variable.

Scatter plot: A scatter plot can be used when we have two numeric variables and we want to see their
relationship.

Histogram: A histogram shows how the values of a numeric variable are distributed.

Bar plot: A bar plot can be used when we have categorical variables and we want to compare them.

Heatmap: Heatmap is a graphical representation of data where values are displayed as colors.

Box plot: A box plot gives us an idea about the shape of the distribution and the presence of outliers.

Pie chart: Pie chart can be used to show the proportion of different categories in a dataset.

3D plot: Matplotlib provides tools to create 3D plots which can be used to visualize data in a 3D space.

Polar plot: Polar plot is used to plot data points in a polar coordinate system.

Subplots: Using subplots we can create multiple plots in the same figure for a better comparison of data.
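
A short sketch can make a few of these plot types concrete. The script below draws a line plot, a scatter plot, a histogram, and a bar plot as subplots of one figure. All data is synthetic, the Agg backend is selected only so that the script runs without a display, and the output file name plot_types.png is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the synthetic data is reproducible
x = np.linspace(0, 10, 50)

# One figure with a 2 x 2 grid of axes, one plot type per cell.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, np.sin(x))                        # trend of one variable
axes[0, 0].set_title("Line plot")

axes[0, 1].scatter(x, x + rng.normal(size=x.size))   # relationship of two variables
axes[0, 1].set_title("Scatter plot")

axes[1, 0].hist(rng.normal(size=500), bins=20)       # distribution of one variable
axes[1, 0].set_title("Histogram")

axes[1, 1].bar(["A", "B", "C"], [5, 3, 7])           # comparison across categories
axes[1, 1].set_title("Bar plot")

fig.tight_layout()
fig.savefig("plot_types.png")   # placeholder file name
```

In an interactive session, the backend line can be dropped and plt.show() used in place of fig.savefig.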

Example 1:

There are many clustering algorithms that you can use depending on your data and the problem you are
trying to solve. Here is an example of how to write code for K-means clustering algorithm in Python:

from sklearn.cluster import KMeans

import numpy as np

# Assume you have a dataset of features that you want to cluster

# The features are stored in a NumPy array called "features"

num_clusters = 3

# Create a KMeans object with the number of clusters you want to find

kmeans = KMeans(n_clusters=num_clusters)

# Fit the KMeans object to your dataset

kmeans.fit(features)

# Get the labels of each data point (i.e., which cluster it belongs to)

labels = kmeans.labels_

# Get the centroids of each cluster

centroids = kmeans.cluster_centers_

In the code above, we first import the KMeans class from the scikit-learn library. We then define the
number of clusters we want to find (num_clusters) and create a KMeans object with that number of
clusters. We fit the KMeans object to our dataset of features (features) using the fit() method. We then
use the labels_ attribute to get the cluster labels for each data point in our dataset, and the
cluster_centers_ attribute to get the centroids of each cluster.

Keep in mind that there are many different clustering algorithms, and the implementation details may
differ depending on the algorithm you choose to use. Additionally, you may need to preprocess your
data or tune the hyperparameters of your clustering algorithm to get good results.
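
The K-means code in this example assumes that an array called features already exists. The following sketch is one way to make it runnable end to end, under stated assumptions: the data is synthetic (generated with make_blobs), the preprocessing step is standard scaling with StandardScaler, and the n_init and random_state values are arbitrary choices made here for reproducibility, not settings from the original example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for "features": 300 two-dimensional points around 3 centers.
features, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# Preprocessing: scale each feature to zero mean and unit variance so that
# no single feature dominates the distance computation.
scaled = StandardScaler().fit_transform(features)

# Hyperparameters: number of clusters, number of restarts, and a fixed seed.
num_clusters = 3
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)
kmeans.fit(scaled)

labels = kmeans.labels_               # cluster index of each data point
centroids = kmeans.cluster_centers_   # one row per cluster, in scaled units
print(labels.shape)                   # (300,)
print(centroids.shape)                # (3, 2)
```

Because the model is fitted on scaled data, the centroids are expressed in scaled units as well; the scaler's inverse_transform method maps them back to the original feature space.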

Example 2:

Here is an example implementation of the DBSCAN algorithm in Python using the scikit-learn library:

In this example, we load the iris dataset and define a DBSCAN instance with an epsilon value (eps) of
0.5 and a min_samples value of 5, which is the minimum number of points (including the point itself)
that must lie within eps of a point for it to count as a core point. We then fit the model to the data
and retrieve the resulting cluster labels. Finally, we print out the predicted labels for each data
point; points that DBSCAN treats as noise receive the label -1. Note that the DBSCAN algorithm does not
require us to specify the number of clusters in advance - it discovers them automatically based on the
input parameters.
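
The code listing for this example is not reproduced in the text. The sketch below is a reconstruction that follows the description (iris dataset, eps of 0.5, min_samples of 5, fit, then print the labels); it may differ in detail from the original listing. The variable names and the final cluster and noise counts are additions made here.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris

# Load the iris measurements: 150 samples with 4 features each.
X = load_iris().data

# Define the model with the parameters named in the description.
dbscan = DBSCAN(eps=0.5, min_samples=5)

# Fit the model and retrieve one label per data point.
dbscan.fit(X)
labels = dbscan.labels_
print(labels)

# DBSCAN marks noise points with the label -1, so exclude it when counting clusters.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())
print(n_clusters, n_noise)
```

The number of clusters found depends on eps and min_samples, and on whether the features are scaled first, so these values usually need tuning for a given dataset.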
