Clustering

Machine learning: Supervised vs Unsupervised
Supervised learning - discover patterns in the data that relate data
attributes with a target (class) attribute.
• There must be a training data set in which the solution is already
known.
Unsupervised learning - the outcomes are unknown, or the data have no
target attribute.
• Cluster the data to reveal meaningful partitions and hierarchies.
• We have to explore the data to find some intrinsic structures in
them.
INTRODUCTION
What is clustering?
• Clustering is the classification of objects into
different groups, or more precisely, the
partitioning of a data set into subsets (clusters),
so that the data in each subset (ideally) share
some common trait - often according to some
defined distance measure
Examples
• Let us see some real-life examples
• Example 1: group people of similar sizes
together to make “small”, “medium” and
“large” T-Shirts.
– Tailor-made for each person: too expensive
– One-size-fits-all: does not fit all.
• Example 2: In marketing, segment customers
according to their similarities
– To do targeted marketing.
Current Applications
• Document Classification: Cluster documents in
multiple categories based on tags, topics, and the
content of the document. This is a very standard
classification problem and k-means is a highly
suitable algorithm for this purpose.
• Identifying Crime Localities: With data related
to crimes available in specific localities in a city,
the category of crime, the area of the crime, and
the association between the two can give quality
insight into crime-prone areas within a city
Current Applications
• Customer Segmentation: Clustering helps
marketers improve their customer base, work on
target areas, and segment customers based
on purchase history, interests, or activity
monitoring.
• Insurance Fraud Detection: Utilizing historical
data on fraudulent claims, it is possible to isolate
new claims based on their proximity to clusters
that indicate fraudulent patterns.
Current Applications
• Rideshare Data Analysis: The publicly
available Uber ride information dataset
provides a large amount of valuable data
around traffic, transit time, peak pickup
localities, and more. Analyzing this data is
useful not just in the context of Uber but also
in providing insight into urban traffic patterns
and helping us plan for the cities of the future.
Current Applications
• Social network analysis - Facebook
"smartlists"
• Organizing computer clusters and data
centers for network layout and location
• Astronomical data analysis - Understanding
galaxy formation
Illustration
• The data set has three natural groups of data
points, i.e., 3 natural clusters.
Aspects of clustering
• A distance (similarity, or dissimilarity)
function
• Clustering quality
– Inter-cluster distance: maximized
– Intra-cluster distance: minimized
• The quality of a clustering result depends on
the algorithm, the distance function, and the
application.
Types of clustering
• Hierarchical algorithms: these find
successive clusters
1. Agglomerative ("bottom-up"): Agglomerative
algorithms begin with each element as a
separate cluster and merge them into
successively larger clusters.
2. Divisive ("top-down"): Divisive algorithms begin
with the whole set and proceed to divide it into
successively smaller clusters.
Types of clustering
• Partitional clustering: Partitional algorithms
determine all clusters at once. They include
k-means and its derivatives.
• The k-means algorithm is an algorithm to cluster n
objects based on attributes into k partitions, where k
< n.
• It assumes that the object attributes form a vector
space.
Other Approaches
• Density-based
• Mixture model
• Spectral methods
K-means clustering
• k-means clustering is an algorithm to classify or to
group the objects, based on attributes/features, into K
groups.
• K is a positive integer.
• The grouping is done by minimizing the sum
of squares of distances between data and the
corresponding cluster centroid.
How does it work?
Algorithm
• Step 1: Begin with a decision on the value of k =
number of clusters.
• Step 2: Put any initial partition that classifies the
data into k clusters. You may assign the
training samples randomly, or systematically
as follows:
1. Take the first k training samples as single-
element clusters.
2. Assign each of the remaining (N-k) training
samples to the cluster with the nearest
centroid. After each assignment, recompute
the centroid of the gaining cluster.
Algorithm
• Step 3: Take each sample in sequence and
compute its distance from the centroid of each of
the clusters. If a sample is not currently in the
cluster with the closest centroid, switch this
sample to that cluster and update the centroid of
the cluster gaining the new sample and the cluster
losing the sample.
• Step 4: Repeat step 3 until convergence is
achieved, that is, until a pass through the
training samples causes no new assignments.
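A minimal NumPy sketch may help make the iteration concrete. This is a batch (Lloyd-style) version rather than the one-sample-at-a-time update of Step 3, and the function name, random initialization and iteration cap are illustrative choices, not part of the original slides.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: X is an (n, d) array of points, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: choose k and take k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign every point to the cluster with the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop when a full pass changes nothing
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```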
Simple example
• Take some random values.
• Choose the number of clusters and the centroids randomly.
• 3, 8, 24, 91, 53, 75, 31, 9, 6, 44, 62, 15
• Two clusters? Consider the mid points as 24 and 62.
• With 24: 3, 8, 31, 9, 6, 15
• With 62: 44, 53, 91, 75
• What about three clusters with 15, 44 and 75 as mid
points?
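As a quick check, the kmeans() sketch above can be run on this 1-D list (reshaped to a column vector). It typically recovers the same split as the hand partition around 24 and 62, although the exact centroid values depend on the random initialization.

```python
import numpy as np

data = np.array([3, 8, 24, 91, 53, 75, 31, 9, 6, 44, 62, 15], dtype=float)
labels, centroids = kmeans(data.reshape(-1, 1), k=2)   # kmeans() from the sketch above
for j, c in enumerate(centroids):
    print(f"cluster around {c[0]:.1f}:", sorted(data[labels == j]))
```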
A Simple example showing the implementation
of k-means algorithm
(using K=2)
Step 1:
Initialization: Randomly we choose following two centroids
(k=2) for two clusters.
In this case the 2 centroids are: m1 = (1.0, 1.0) and
m2 = (5.0, 7.0).
Step 2:
• Thus, we obtain two clusters containing:
{1,2,3} and {4,5,6,7}.
• Their new centroids are recomputed from the members of each cluster.
Step 3:
• Now using these centroids we compute the Euclidean
distance of each object, as shown in the table.
• Therefore, the new clusters are: {1,2} and {3,4,5,6,7}
• Next centroids are:
m1 = (1.25, 1.5) and
m2 = (3.9, 5.1)
• Step 4:
The clusters obtained are:
{1,2} and {3,4,5,6,7}
• Therefore, there is
no change in the clusters.
• Thus, the algorithm
comes to a halt here and the
final result consists of
2 clusters: {1,2} and
{3,4,5,6,7}.
Plot: the resulting clusters; the slides also illustrate a second run of the
algorithm with K = 3 (Steps 1 and 2).
Limitations
• K-means is extremely
sensitive to cluster
center initializations
• Bad initialization can lead to poor convergence speed
• Bad initialization can lead to bad overall clustering
Choosing the ‘k’ value
• Elbow method
• Within-Group Sum of Squares (WGSS)
• The value of k at which WGSS stops improving appreciably
(the point of convergence, or “elbow”) will be chosen;
see the sketch below.
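One way to make the elbow method concrete is to compute WGSS for several values of k and look for the bend in the resulting curve. The sketch below reuses the kmeans() function defined earlier; the helper name wgss and the use of the 1-D example data are illustrative.

```python
import numpy as np

def wgss(X, labels, centroids):
    """Within-Group Sum of Squares for one clustering."""
    return sum(((X[labels == j] - c) ** 2).sum() for j, c in enumerate(centroids))

data_1d = np.array([3, 8, 24, 91, 53, 75, 31, 9, 6, 44, 62, 15],
                   dtype=float).reshape(-1, 1)
for k in range(1, 6):
    labels, centroids = kmeans(data_1d, k)   # kmeans() from the earlier sketch
    print(k, round(wgss(data_1d, labels, centroids), 1))
# The k after which WGSS stops dropping sharply (the "elbow") is chosen.
```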
Practise
• We have 4 medicines as our training data points
object and each medicine has 2 attributes. Each
attribute represents coordinate of the object. We have
to determine which medicines belong to cluster 1 and
which medicines belong to the other cluster.
Object        Attribute 1 (X): weight index    Attribute 2 (Y): pH
Medicine A    1                                1
Medicine B    2                                1
Medicine C    4                                3
Medicine D    5                                4
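One possible way to check the exercise is to reuse the kmeans() sketch from earlier on these four points; the small pairwise distances suggest A and B end up in one cluster and C and D in the other. The variable names and printed comments are illustrative.

```python
import numpy as np

medicines = np.array([[1, 1],    # Medicine A: (weight index, pH)
                      [2, 1],    # Medicine B
                      [4, 3],    # Medicine C
                      [5, 4]],   # Medicine D
                     dtype=float)
labels, centroids = kmeans(medicines, k=2)   # kmeans() from the earlier sketch
print(labels)      # expected grouping: A and B together, C and D together
print(centroids)   # e.g. (1.5, 1.0) and (4.5, 3.5)
```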
Hierarchical Clustering
• Use distance matrix as clustering criteria. This method does
not require the number of clusters k as an input, but needs a
termination condition
Figure: agglomerative clustering (AGNES) runs from Step 0 to Step 4, merging
a, b, c, d, e into ab, de, cde and finally abcde; divisive clustering (DIANA)
runs the same steps in reverse, from Step 4 back to Step 0.
AGNES (Agglomerative Nesting)
• Introduced in Kaufmann and Rousseeuw (1990)
• Implemented in statistical packages, e.g., Splus
• Use the single-link method and the dissimilarity matrix
• Merge nodes that have the least dissimilarity
• Go on in a non-descending fashion
• Eventually all nodes belong to the same cluster
Figure: three scatter plots (axes 0-10) showing nearby points being merged
into progressively larger clusters.
Dendrogram: Shows How Clusters are Merged

Decompose data objects into several levels of nested partitioning (a tree
of clusters), called a dendrogram.

A clustering of the data objects is obtained by cutting the dendrogram at
the desired level; each connected component then forms a cluster.
DIANA (Divisive Analysis)

• Introduced in Kaufmann and Rousseeuw (1990)


• Implemented in statistical analysis packages, e.g., Splus
• Inverse order of AGNES
• Eventually each node forms a cluster on its own

Figure: scatter plots (axes 0-10) showing the whole data set being split
into progressively smaller clusters.
Example of converting data points into a distance matrix
• Clustering analysis with the agglomerative algorithm: starting from the
data matrix of point coordinates, pairwise Euclidean distances are computed
to form the distance matrix, as in the following example.
Example
Point   X      Y
A       0.40   0.53
B       0.22   0.38
C       0.35   0.32
D       0.26   0.19
E       0.08   0.41
F       0.45   0.30
Example
     A      B      C      D      E      F
A    0
B    0.23   0
C    0.22   0.15   0
D    0.37   0.20   0.15   0
E    0.34   0.14   0.28   0.29   0
F    0.23   0.25   0.11   0.22   0.39   0
Example
       A      B      C,F    D      E
A      0
B      0.23   0
C,F    0.22   0.15   0
D      0.37   0.20   0.15   0
E      0.34   0.14   0.28   0.29   0
Example
       A      B,E    C,F    D
A      0
B,E    0.23   0
C,F    0.22   0.15   0
D      0.37   0.20   0.15   0
Example
               A      (B,E),(C,F)   D
A              0
(B,E),(C,F)    0.22   0
D              0.37   0.15          0
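For reference, the same single-link (MIN) merging can be reproduced with SciPy. The coordinates below are the A-F points from the example table; linkage, pdist and dendrogram are standard SciPy calls.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

points = np.array([[0.40, 0.53],   # A
                   [0.22, 0.38],   # B
                   [0.35, 0.32],   # C
                   [0.26, 0.19],   # D
                   [0.08, 0.41],   # E
                   [0.45, 0.30]])  # F

# Condensed pairwise Euclidean distances, then single-link (MIN) merging
Z = linkage(pdist(points), method="single")
print(Z)            # each row: the two clusters merged and the merge distance
# dendrogram(Z)     # with matplotlib available, draws the merge tree
```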
Practise
     1    2    3    4    5
1    0
2    9    0
3    3    7    0
4    6    5    9    0
5    11   10   2    8    0
MIN or Single Link
Inter-cluster distance
• The distance between two clusters is
represented by the distance of the closest pair
of data objects belonging to different clusters.
• Determined by one pair of points, i.e., by one
link in the proximity graph
MAX or Complete Link
Inter-cluster distance
• The distance between two clusters is
represented by the distance of the farthest pair
of data objects belonging to different clusters
Distance between Clusters
• Single link: smallest distance between an element in one cluster and an
element in the other, i.e., dist(Ki, Kj) = min{ dist(t_ip, t_jq) }
• Complete link: largest distance between an element in one cluster and an
element in the other, i.e., dist(Ki, Kj) = max{ dist(t_ip, t_jq) }
• Average: average distance between an element in one cluster and an element
in the other, i.e., dist(Ki, Kj) = avg{ dist(t_ip, t_jq) }
• Centroid: distance between the centroids of two clusters, i.e.,
dist(Ki, Kj) = dist(Ci, Cj)
• Medoid: distance between the medoids of two clusters, i.e.,
dist(Ki, Kj) = dist(Mi, Mj)
– Medoid: a chosen, centrally located object in the cluster
Centroid, Radius and Diameter of a Cluster
(for numerical data sets)
• Centroid: the “middle” of a cluster
  $C_m = \frac{\sum_{i=1}^{N} t_{ip}}{N}$
• Radius: square root of the average distance from any point of the cluster
to its centroid
  $R_m = \sqrt{\frac{\sum_{i=1}^{N} (t_{ip} - c_m)^2}{N}}$
• Diameter: square root of the average mean squared distance between all
pairs of points in the cluster
  $D_m = \sqrt{\frac{\sum_{i=1}^{N} \sum_{j=1}^{N} (t_{ip} - t_{jq})^2}{N(N-1)}}$
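A small NumPy sketch of these three quantities for a single cluster; the three points used here are purely illustrative.

```python
import numpy as np

cluster = np.array([[1.0, 1.0], [2.0, 1.0], [1.5, 2.0]])   # illustrative points
N = len(cluster)

centroid = cluster.mean(axis=0)
radius = np.sqrt(((cluster - centroid) ** 2).sum() / N)
# All pairwise squared distances (the i == j terms contribute 0)
pair_sq = ((cluster[:, None, :] - cluster[None, :, :]) ** 2).sum(axis=2)
diameter = np.sqrt(pair_sq.sum() / (N * (N - 1)))
print(centroid, radius, diameter)
```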
Parametric vs Non Parametric
Estimation
Learning a Function

• Machine learning can be summarized as


learning a function (f) that maps input
variables (X) to output variables (Y).
Y = f(X)
• An algorithm learns this target mapping
function from training data
• Different algorithms make different
assumptions or biases about the form of the
function and how it can be learned.
Parametric Machine Learning
Algorithms
• Assumptions can greatly simplify the learning
process, but can also limit what can be learned.
Algorithms that simplify the function to a known
form are called parametric machine learning
algorithms.
• A learning model that summarizes data with a set
of parameters of fixed size (independent of the
number of training examples) is called a
parametric model.
• No matter how much data you throw at a
parametric model, it won’t change its mind about
how many parameters it needs.
The algorithms involve two steps:
1. Select a form for the function.
2. Learn the coefficients for the function from
the training data.
• An easy to understand functional form for the
mapping function is a line, as is used in linear
regression: b0 + b1*x1 + b2*x2 = 0
• Where b0, b1 and b2 are the coefficients of the
line that control the intercept and slope, and x1
and x2 are two input variables.
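As a sketch of these two steps, the following fixes a straight-line form and learns its coefficients by least squares. The data values are made up for illustration and the design-matrix approach is just one common way to do the fit.

```python
import numpy as np

# Step 1: fix the form y = b0 + b1*x.  Step 2: learn b0, b1 from data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # illustrative inputs
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])     # illustrative outputs
A = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)
print(b0, b1)   # the fixed-size set of parameters that summarizes the data
```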
Parametric Estimation
• Assuming the functional form of a line greatly
simplifies the learning process. Now, all we need to
do is estimate the coefficients of the line equation and
we have a predictive model for the problem.
• Some more examples of parametric machine learning
algorithms include
– Logistic Regression
– Linear Discriminant Analysis
– Perceptron
– Naive Bayes
– Simple Neural Networks
Benefits of Parametric Machine
Learning Algorithms:
• Simpler: These methods are easier to
understand, and their results are easy to interpret.
• Speed: Parametric models are very fast to
learn from data.
• Less Data: They do not require as much
training data and can work well even if the fit
to the data is not perfect.
Limitations of Parametric Machine
Learning Algorithms:
• Constrained: By choosing a functional form
these methods are highly constrained to the
specified form.
• Limited Complexity: The methods are more
suited to simpler problems.
• Poor Fit: In practice the methods are unlikely
to match the underlying mapping function.
Nonparametric Machine Learning
Algorithms
• Algorithms that do not make strong assumptions
about the form of the mapping function are called
nonparametric machine learning algorithms. By
not making assumptions, they are free to learn any
functional form from the training data.
• Nonparametric methods are good when you have
a lot of data and no prior knowledge, and when
you don’t want to worry too much about choosing
just the right features.
Nonparametric Estimation
• Nonparametric methods seek to best fit the training data
in constructing the mapping function, whilst
maintaining some ability to generalize to unseen data.
As such, they are able to fit a large number of
functional forms.
• An easy to understand nonparametric model is the k-
nearest neighbors algorithm that makes predictions
based on the k most similar training patterns for a new
data instance. The method does not assume anything
about the form of the mapping function other than that
patterns which are close are likely to have a similar
output variable.
Nonparametric Estimation
• Some more examples of popular non
parametric machine learning algorithms are:
• k-Nearest Neighbours
• Decision Trees like CART and C4.5
• Support Vector Machines
Benefits of Nonparametric Machine
Learning Algorithms:
• Flexibility: Capable of fitting a large number
of functional forms.
• Power: No assumptions (or weak
assumptions) about the underlying function.
• Performance: Can result in higher
performance models for prediction.
Limitations of Nonparametric Machine
Learning Algorithms:
• More data: Require a lot more training data to
estimate the mapping function.
• Slower: A lot slower to train as they often have
far more parameters to train.
• Overfitting: More of a risk to overfit the
training data and it is harder to explain why
specific predictions are made.
K Nearest Neighbour Classification
• It classifies new points based on a similarity (distance) measure.
• It uses data points that are already separated into several classes to
predict the classification of a new sample point.
K Nearest Neighbor Classification
• Step 1: Initialize ‘k’.
• Step 2: For each sample in the training data,
– Calculate the distance between the query point and the current point.
– Add the distance and the index of the example to an ordered collection.
• Step 3: Sort the ordered collection of distances and indexes from small
to large.
• Step 4: Pick the first ‘k’ entries from the list.
• Step 5: Get the labels of the selected ‘k’ entries and predict the most
common (majority) label among them (a code sketch follows the example
data below).
K Nearest Neighbor Classification
Height (cm)   Weight (kg)   T-Shirt Size
158           58            M
158           59            M
158           63            M
160           59            M
160           60            M
163           60            M
163           61            M
160           64            L
163           64            L
165           61            L
165           62            L
165           65            L
168           62            L
168           63            L
168           66            L
170           63            L
170           64            L
170           68            L
For K = 5, predict the T-shirt size for an input height of 161 cm and a
weight of 61 kg.
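A hedged sketch of the k-NN steps applied to this table: the arrays simply transcribe the 18 rows above, and Counter-based majority voting is one common way to pick the label.

```python
import numpy as np
from collections import Counter

# Training data transcribed from the table above: (height, weight) -> size
X = np.array([[158, 58], [158, 59], [158, 63], [160, 59], [160, 60],
              [163, 60], [163, 61], [160, 64], [163, 64], [165, 61],
              [165, 62], [165, 65], [168, 62], [168, 63], [168, 66],
              [170, 63], [170, 64], [170, 68]], dtype=float)
y = np.array(["M"] * 7 + ["L"] * 11)

query = np.array([161, 61], dtype=float)
dists = np.linalg.norm(X - query, axis=1)        # Euclidean distance to each sample
nearest = np.argsort(dists)[:5]                  # indices of the 5 closest samples
print(Counter(y[nearest]).most_common(1)[0][0])  # majority label among them ('M' here)
```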
K Nearest Neighbor Classification-
Visualization
KNN vs. K-means
• K-means is an unsupervised learning technique (no
dependent variable), whereas KNN is a supervised
learning algorithm (a dependent variable exists).
• K-means is a clustering technique which tries to
split data points into K clusters such that the
points in each cluster tend to be near each other,
whereas K-nearest neighbors tries to determine the
classification of a point by combining the
classifications of the K nearest points.
K Nearest Neighbor Classification -
Practise
Perform the KNN classification algorithm on the
following dataset and predict the class for P1 = 3
and P2 = 7. Consider k = 3.
P1   P2   Class
7    7    False
7    5    False
5    6    False
3    4    True
2    3    True
4    3    True
Voronoi Diagram
Nonparametric Regression:
Smoothing Models
Regression
• In regression, given the training set X = {x^t, r^t}
where r^t ∈ R, we assume
r^t = g(x^t) + ε,
where ε is random noise.
• In parametric regression, we assume a
polynomial of a certain order and compute its
coefficients that minimize the sum of squared
error on the training set.
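For instance, a parametric fit with an assumed second-order polynomial might look like the sketch below; the data are synthetic and np.polyfit computes the least-squares coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
r = 2.0 * x**2 - x + 0.5 + 0.05 * rng.normal(size=x.size)   # synthetic data

coeffs = np.polyfit(x, r, deg=2)   # coefficients minimizing the sum of squared error
g_hat = np.poly1d(coeffs)          # the fitted parametric g(x)
print(coeffs, g_hat(0.5))
```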
Nonparametric regression
• Nonparametric regression is used when no
such polynomial can be assumed;
• we only assume that close x have close g(x)
values.
• As in nonparametric density estimation, given
x, our approach is to find the neighborhood of
x and average the r values in the neighborhood
to calculate ĝ(x).
Nonparametric regression
• The nonparametric regression estimator is also
called a smoother and the estimate is called a
smooth
Regressogram
• a commonly used simple non-parametric
method
Regressogram
• This is an analysis of astronomy data. On the
X-axis is the galaxy distance to some
cosmological structure and on the Y-axis is the
correlation for some features of this galaxy. We
bin the data according to galaxy distance,
take the mean within each bin as a
landmark (or summary), and show how this
landmark changes along galaxy distance.
Regressogram
• Note that the range of Y in the raw scatter plot is
(0, 1), while in the regressogram the range is
(0.7, 0.8). If you want to visualize the data, the raw
scatter plot will not be helpful. The regressogram,
however, is a simple approach to visualize hidden
structure within this complicated data.
• Here are the steps for constructing a regressogram.
First we bin the data according to the X-axis
(shown by red lines):
• Then we compute the mean within each bin
(shown by the blue points):
• We can show only the blue points (and the blue
curves, which just connect the points) so
that the result looks much more concise:
• However, since the range of the Y-axis is too large, this
does not show the trend. So we zoom in and compute
the error for estimating the mean within each bin.
• The advantage of the regressogram is its simplicity.
Since we are summarizing the whole data by
points representing the mean within each bin, the
interpretation is very straightforward.
• Also, it shows the trend (and error bars) for the
data, so that we have a rough idea of what is going on.
Moreover, no matter how complicated the original
plot is, the regressogram uses only a few
statistics (the mean within each bin) to summarize
the whole data. Notice that we do not make any
assumption about the distribution of the data (such as
being normally distributed); thus, the regressogram is a
non-parametric method.
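A minimal sketch of the binning-and-averaging idea on synthetic data; the bin count, data and print formatting are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)                      # synthetic inputs
r = np.sin(x) + 0.3 * rng.normal(size=x.size)    # synthetic responses

n_bins = 10
edges = np.linspace(x.min(), x.max(), n_bins + 1)
bin_of = np.digitize(x, edges[1:-1])             # bin index (0 .. n_bins-1) per sample
for b in range(n_bins):
    center = 0.5 * (edges[b] + edges[b + 1])
    print(f"bin center {center:4.1f}: mean r = {r[bin_of == b].mean():+.2f}")
```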
Kernel smoother
• Kernel density estimation (KDE):
  $\hat{f}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)$
• K – kernel function (non-negative)
• h – smoothing parameter (bandwidth)
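A small sketch of the KDE formula above with a Gaussian kernel; the sample values and bandwidth are illustrative.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x, sample, h):
    """Density estimate at x: average of kernels centred on the sample points."""
    return gaussian_kernel((x - sample) / h).sum() / (len(sample) * h)

sample = np.array([1.0, 1.2, 1.9, 2.1, 5.0, 5.2])   # illustrative observations
for x in (1.5, 3.5, 5.1):
    print(x, round(kde(x, sample, h=0.5), 3))
```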
