K-Means Clustering and K-Nearest Neighbors Algorithm
K-Means
Clustering
What is K-Means Clustering
Algorithm?
K-means clustering is a method of vector quantization. It
aims to partition n observations into k clusters in which
each observation belongs to the cluster with the nearest
mean. That mean (the cluster centroid) serves as a
prototype of the cluster.
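One common way to state this goal precisely is as the minimization of the within-cluster sum of squared distances. The symbols below (S_i for the i-th cluster, mu_i for its mean) are notation assumed here for illustration, not taken from the slides:

```latex
\underset{S_1,\dots,S_k}{\arg\min}\;
\sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{\lvert S_i \rvert} \sum_{x \in S_i} x
```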
K-means clustering is a type of unsupervised learning
method. An unsupervised learning method is one in which we
draw inferences from datasets consisting of input data
without labeled responses. Generally, it is used to find
meaningful structure, explanatory underlying processes,
generative features, and groupings inherent in a set of
examples.
What is
Clustering?
Clustering is the task of dividing a population
or set of data points into groups such that
data points in the same group are more similar
to each other than to data points in other
groups. It is basically a grouping of objects
on the basis of the similarity and
dissimilarity between them.
For example
In the graph below, data points that lie close together can be treated as a single
group. The clusters are easy to distinguish, and we can identify 3 clusters in the
picture.
Clusters do not have to be spherical.
For example:
Why Clustering?
Clustering is important because it reveals the intrinsic
grouping among unlabeled data. There is no single
criterion for a good clustering. It depends on the user
and on which criteria satisfy their needs.
Clustering Methods:
1. Density-Based Methods
2. Hierarchical Methods
3. Partitioning Methods
4. Grid-Based Methods
Applications of Clustering in
different fields:
1. Marketing: It can be used to characterize
& discover customer segments for
marketing purposes.
2. Biology: It can be used for classification
among different species of plants and
animals.
3. Libraries: It is used in clustering different
books on the basis of topics and
information.
4. Insurance: It is used to profile customers
and their policies and to identify
fraud.
5. City Planning: It is used to make groups of
houses and to study their values based on
their geographical locations and other
factors present.
6. Earthquake studies: By studying
earthquake-affected areas, we can identify
dangerous zones.
Pseudocode of K-Means
Clustering:
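Initialize k means with random values.
For a given number of iterations:
    Iterate through items:
        Find the mean closest to the item.
        Assign the item to that mean.
    Update each mean to the average of its assigned items.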
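The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration under some assumptions, not a definitive implementation: the names `k_means` and `squared_distance` are hypothetical, the means are initialized from randomly chosen data points (one common choice), and the loop runs for a fixed number of iterations rather than checking for convergence.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(items, k, iterations=10, seed=0):
    """Cluster `items` (a list of numeric tuples) around k means."""
    rng = random.Random(seed)
    # Assumption: initialize the k means from randomly chosen items.
    means = [list(p) for p in rng.sample(items, k)]
    assignments = [0] * len(items)
    for _ in range(iterations):
        # Assign every item to the index of its closest mean.
        for i, item in enumerate(items):
            assignments[i] = min(
                range(k), key=lambda j: squared_distance(item, means[j])
            )
        # Move each mean to the average of the items assigned to it.
        for j in range(k):
            members = [items[i] for i in range(len(items)) if assignments[i] == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = [sum(c) / len(members) for c in zip(*members)]
    return means, assignments


# Made-up sample data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
means, labels = k_means(points, k=2)
```

Here `labels[i]` is the index of the mean that `points[i]` was assigned to. Which cluster gets index 0 depends on the random initialization.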
K-Nearest
Neighbors
What is K-Nearest Neighbors
Algorithm?
In pattern recognition, the k-nearest neighbors algorithm
(k-NN) is a non-parametric method used for classification
and regression. In both cases, the input consists of the k
closest training examples in the feature space. The output
depends on whether k-NN is used for classification or
regression:
K-NN Approach
k-NN assumes that all instances are points
in some n-dimensional space and defines
neighbors in terms of distance (usually
Euclidean distance in R^n).
k is the number of neighbors considered.
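As a small illustration of this approach, the sketch below computes the Euclidean distance and picks the k closest points. The function names `euclidean_distance` and `k_nearest` and the sample points are hypothetical, chosen only for this example.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points of the same dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_nearest(query, points, k):
    """Return the k points closest to `query`, nearest first."""
    return sorted(points, key=lambda p: euclidean_distance(query, p))[:k]


# Made-up sample points in 2-D space.
sample = [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]
nearest_two = k_nearest((0.0, 0.0), sample, k=2)
```

Sorting all points is the simplest option and is fine for small datasets. Larger datasets usually call for a spatial index, which is outside the scope of this sketch.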
K-NN Classification
In k-NN classification, the output is a class
membership. An object is classified by a plurality vote
of its neighbors, with the object being assigned to the
class most common among its k nearest neighbors (k is
a positive integer, typically small). If k = 1, then the
object is simply assigned to the class of that single
nearest neighbor.
Example of k-NN classification. The test sample (green dot) should be classified
either to blue squares or to red triangles. If k = 3 (solid line circle) it is assigned to the
red triangles because there are 2 triangles and only 1 square inside the inner circle. If
k = 5 (dashed line circle) it is assigned to the blue squares (3 squares vs. 2 triangles
inside the outer circle).
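The plurality vote in this example can be sketched in Python as below. The coordinates are hypothetical, picked only so that the neighbor counts match the description (2 triangles and 1 square among the 3 nearest, 3 squares and 2 triangles among the 5 nearest), and `knn_classify` is an illustrative name rather than a library function.

```python
import math
from collections import Counter


def knn_classify(query, training, k):
    """Classify `query` by plurality vote among its k nearest neighbors.

    `training` is a list of (point, label) pairs.
    """
    neighbors = sorted(training, key=lambda pl: math.dist(query, pl[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    # This sketch does not treat ties specially.
    return votes.most_common(1)[0][0]


# Hypothetical layout around a test sample at the origin.
training = [
    ((0.5, 0.0), "triangle"),
    ((0.0, 0.6), "triangle"),
    ((0.7, 0.0), "square"),
    ((0.0, -1.0), "square"),
    ((1.1, 0.0), "square"),
    ((3.0, 3.0), "triangle"),
]
test_sample = (0.0, 0.0)
label_k3 = knn_classify(test_sample, training, k=3)
label_k5 = knn_classify(test_sample, training, k=5)
```

With this data, `label_k3` is `"triangle"` and `label_k5` is `"square"`, which shows how the choice of k can change the predicted class.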
K-NN Regression
In k-NN regression, the output is the property value for
the object. This value is the average of the values
of its k nearest neighbors.
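A minimal sketch of this averaging, assuming an unweighted mean over the k nearest neighbors. The function name `knn_regress` and the sample data are made up for illustration.

```python
import math


def knn_regress(query, training, k):
    """Predict a value for `query` as the mean of its k nearest neighbors' values.

    `training` is a list of (point, value) pairs.
    """
    neighbors = sorted(training, key=lambda pv: math.dist(query, pv[0]))[:k]
    return sum(value for _, value in neighbors) / len(neighbors)


# Made-up 1-D training data: (point, property value).
training = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
prediction = knn_regress((2.2,), training, k=3)
```

Here the three nearest neighbors of 2.2 are the points at 2.0, 3.0, and 1.0, so the prediction is the average of 20, 30, and 10, which is 20.0. The distant point at 10.0 does not affect the result.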
Thanks!
Any Questions?
You can find us at:
@OurHome