Lesson 4.1 - Unsupervised Learning Partitioning Methods
December 3, 2018
Roadmap
1 Basic Concepts
2 K-Means
3 K-Medoids
Cluster Analysis
Principle
Maximizing intra-class similarity and minimizing inter-class similarity
Typical Applications
WWW, social networks, marketing, biology, libraries, etc.
Partitioning Methods
Given
A data set of n objects
k, the number of clusters to form
Organize the objects into k partitions (k ≤ n), where each partition represents a cluster
The clusters are formed to optimize an objective partitioning criterion
Objects within a cluster are similar
Objects of different clusters are dissimilar
1 Eager learning
Given a training set, constructs a classification model before receiving new (e.g., test) data to classify
e.g., decision tree induction, Bayesian classification, rule-based classification
2 Lazy learning
Simply stores the training data (or does only minor processing) and waits until it is given a new instance
Lazy learners take less time in training but more time in predicting
e.g., k-nearest-neighbor classifiers, case-based reasoning classifiers
Lazy Learning
K-Means
Goal
Create 3 clusters (partitions)
K-Means Algorithm
Input
k : the number of clusters
D : a data set containing n objects
Output: a set of k clusters
Method:
1 Arbitrarily choose k objects from D as the initial cluster centers
2 Repeat
3 Reassign each object to the most similar cluster, based on the mean value of the objects in the cluster
4 Update the cluster means
5 Until no change
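A minimal sketch of this procedure in Python, assuming the objects are rows of a NumPy array and Euclidean distance is the similarity measure; the function name kmeans, the empty-cluster guard, and the toy points are illustrative choices, not part of the lesson.

import numpy as np

def kmeans(D, k, max_iter=100, seed=0):
    # D: (n, d) array of objects; k: number of clusters to form
    rng = np.random.default_rng(seed)
    # 1. Arbitrarily choose k objects from D as the initial cluster centers
    centers = D[rng.choice(len(D), size=k, replace=False)]
    for _ in range(max_iter):
        # 3. Reassign each object to the closest center (Euclidean distance)
        dists = np.linalg.norm(D[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 4. Update the cluster means (keep the old center if a cluster went empty)
        new_centers = np.array([D[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        # 5. Stop when the means no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy usage: six 2-D points grouped into 3 clusters, as in the goal above
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0], [1.0, 9.0], [1.5, 8.5]])
labels, centers = kmeans(points, k=3)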
K-Means Properties
E : the sum of squared error over all objects in the data set
p : the data point in space representing an object
mi : the mean of cluster Ci
K-Means works well when the clusters are compact clouds that are rather well separated from one another
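Written out, the squared-error criterion these symbols define is the standard K-Means objective:

E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - m_i|^2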
K-Means: Advantages
K-Means: Disadvantages
K-Means demo
Variations of K-Means
k-Nearest-Neighbor Classifiers
Example:
We are interested in classifying the type of drug a patient should be prescribed
Based on the age of the patient and the patient’s sodium/potassium ratio (Na/K)
The data set includes 200 patients
Scatter plot
Main questions:
How many neighbors should we consider? That is, what is k?
How do we measure distance?
Should all points be weighted equally, or should some points have more influence than others?
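A minimal k-NN sketch in Python that makes these choices concrete, assuming unweighted majority voting, Euclidean distance, and k = 3; the function name knn_predict and the handful of (age, Na/K) records are made-up illustrations, not the lesson's 200-patient data set.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Distance choice: Euclidean distance to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Neighbor choice: indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Weighting choice: every neighbor counts equally (unweighted majority vote)
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up (age, Na/K) records and drug labels, for illustration only
X_train = np.array([[23.0, 25.4], [47.0, 10.1], [61.0, 27.2], [35.0, 8.0]])
y_train = np.array(["drugY", "drugX", "drugY", "drugX"])
print(knn_predict(X_train, y_train, np.array([40.0, 26.0]), k=3))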
k-Nearest-Neighbor Classifiers
Min-max normalization: rescales all attribute values so that they lie between 0 and 1
For more information on normalization methods, refer to the data preprocessing section
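The standard min-max rescaling referred to here, for an attribute A with observed minimum min_A and maximum max_A, maps each value v to

v' = \frac{v - \min_A}{\max_A - \min_A}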
K-Medoids Method
E : the sum of absolute error for all objects in the data set
p : the data point in space representing an object
Oi : the representative object (medoid) of cluster Ci
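These symbols define the standard k-medoids absolute-error criterion (applied in the worked example that follows):

E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - O_i|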
Data Objects
      A1  A2
O1     2   6
O2     3   4
O3     3   8
O4     4   7
O5     6   2
O6     6   4
O7     7   3
O8     7   4
O9     8   5
O10    7   6

Goal: create two clusters
Choose randomly two medoids:
O2 = (3, 4)
O8 = (7, 4)
Assign each object to the closest representative object.
Using the L1 metric (Manhattan distance), we form the following clusters:
Cluster1 = {O1, O2, O3, O4}
Cluster2 = {O5, O6, O7, O8, O9, O10}
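Here the L1 (Manhattan) distance between two objects x = (x1, x2) and y = (y1, y2) is

d_{L1}(x, y) = |x_1 - y_1| + |x_2 - y_2|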
Compute the absolute error criterion for the set of medoids (O2, O8):

E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - O_i|
  = |O1 - O2| + |O3 - O2| + |O4 - O2| + |O5 - O8| + |O6 - O8| + |O7 - O8| + |O9 - O8| + |O10 - O8|
  = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20
Choose a random object, O7
Swap O8 and O7
Compute the absolute error criterion for the set of medoids (O2, O7)
Compute the cost of the swap:
S = Absolute error for (O2, O7) − Absolute error for (O2, O8)
  = 22 − 20 = 2
Since S > 0, the swap does not improve the clustering, so O8 is kept as the medoid.
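A short Python sketch that reproduces both cost values from the data table (20 for medoids (O2, O8) and 22 for (O2, O7)); the helper names l1 and total_cost are illustrative.

# The ten objects from the data table: name -> (A1, A2)
objects = {
    "O1": (2, 6), "O2": (3, 4), "O3": (3, 8), "O4": (4, 7), "O5": (6, 2),
    "O6": (6, 4), "O7": (7, 3), "O8": (7, 4), "O9": (8, 5), "O10": (7, 6),
}

def l1(p, q):
    # Manhattan (L1) distance between two 2-D points
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_cost(medoids):
    # Absolute-error criterion: each object contributes its distance to the nearest medoid
    return sum(min(l1(p, objects[m]) for m in medoids) for p in objects.values())

print(total_cost(["O2", "O8"]))  # 20
print(total_cost(["O2", "O7"]))  # 22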
In this example, changing the medoid of cluster 2 did not change the assignments of objects to clusters.
What are the possible cases when we replace a medoid by another object?
K-Medoids
(In the cases below, B denotes the medoid being replaced, "the new B" its replacement candidate, and A any other medoid that is kept.)
First case
Currently P is assigned to A
The assignment of P to A does not change
Second case
Currently P is assigned to B
P is reassigned to A
Third case
Currently P is assigned to B
P is reassigned to the new B
Fourth case
Currently P is assigned to A
P is reassigned to the new B
The K-Medoids Algorithm (PAM)
Input
k : the number of clusters
D : a data set containing n objects
Output: a set of k clusters
Method:
1 Arbitrarily choose k objects from D as representative objects (seeds)
2 Repeat
3 Assign each remaining object to the cluster with the nearest representative object
4 For each representative object Oj
5 Randomly select a non-representative object Orandom
6 Compute the total cost S of swapping representative object Oj with Orandom
7 If S < 0 then replace Oj with Orandom
8 Until no change
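A minimal sketch of this procedure in Python, using the same L1 distance as the worked example; the function names pam, cost, and l1 are illustrative, and for simplicity the sketch tries every non-representative object as a swap candidate instead of sampling one at random.

import random

def l1(p, q):
    # Manhattan (L1) distance between two points
    return sum(abs(a - b) for a, b in zip(p, q))

def cost(D, medoids):
    # Absolute-error criterion E: each object contributes its distance to the nearest medoid
    return sum(min(l1(p, D[m]) for m in medoids) for p in D)

def pam(D, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(range(len(D)), k)              # 1. arbitrarily choose k representative objects
    for _ in range(max_iter):
        changed = False
        for j in range(k):                              # 4. for each representative object Oj
            for r in range(len(D)):                     # 5. consider a non-representative object Orandom
                if r in medoids:
                    continue
                candidate = medoids[:j] + [r] + medoids[j + 1:]
                S = cost(D, candidate) - cost(D, medoids)   # 6. total cost of swapping Oj with Orandom
                if S < 0:                               # 7. keep the swap only if it lowers the cost
                    medoids[j] = r
                    changed = True
        if not changed:                                 # 8. until no change
            break
    labels = [min(range(k), key=lambda j: l1(p, D[medoids[j]])) for p in D]  # 3. nearest-medoid assignment
    return [D[m] for m in medoids], labels

# The ten objects from the worked example, k = 2
D = [(2, 6), (3, 4), (3, 8), (4, 7), (6, 2), (6, 4), (7, 3), (7, 4), (8, 5), (7, 6)]
print(pam(D, k=2))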
PAM Properties