
Clustering

Lecture 1: Basics

Jing Gao
SUNY Buffalo

1
Class Structure

• Topics
– Clustering, Classification
– Network mining
– Anomaly detection
• Expectation
– Sign-in
– Take quiz in class
– Two more projects on clustering and classification
– One more homework on network mining or anomaly detection
• Website
– http://www.cse.buffalo.edu/~jing/cse601/fa12/

2
Outline
• Basics
– Motivation, definition, evaluation
• Methods
– Partitional
– Hierarchical
– Density-based
– Mixture model
– Spectral methods
• Advanced topics
– Clustering ensemble
– Clustering in MapReduce
– Semi-supervised clustering, subspace clustering, co-clustering,
etc.

3
Readings

• Tan, Steinbach, Kumar. Introduction to Data Mining. Chapters 8 and 9.


• Han, Kamber, Pei. Data Mining: Concepts and Techniques.
Chapters 10 and 11.
• Additional readings posted on website

4
Clustering Basics

• Definition and Motivation


• Data Preprocessing and Similarity Computation
• Objective of Clustering
• Clustering Evaluation

5
Clustering
• Finding groups of objects such that the objects in a group will
be similar (or related) to one another and different from (or
unrelated to) the objects in other groups

Intra-cluster distances are minimized; inter-cluster distances are maximized.

6
Application Examples

• A stand-alone tool: explore data distribution


• A preprocessing step for other algorithms
• Pattern recognition, spatial data analysis, image processing,
market research, WWW, …
– Cluster documents
– Cluster web log data to discover groups of similar access
patterns

7
Clustering Co-expressed Genes
Gene Expression Data Matrix Gene Expression Patterns

Co-expressed Genes

Why look for co-expressed genes?

– Co-expression indicates co-function
– Co-expression also indicates co-regulation

8
Gene-based Clustering
[Three plots of expression value versus time point, each showing a group of genes with a coherent expression pattern]

Examples of co-expressed genes and coherent patterns in gene expression data (Iyer's data [2])

[2] Iyer, V.R. et al. The transcriptional program in the response of human fibroblasts to serum. Science, 283:83–87, 1999.
9
Other Applications

• Land use: Identification of areas of similar land use in an earth observation database
• Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
• City-planning: Identifying groups of houses according to their house type, value, and geographical location
• Climate: Understanding earth climate by finding patterns in atmospheric and ocean data

10
Two Important Aspects

• Properties of input data


– Define the similarity or dissimilarity between points
• Requirement of clustering
– Define the objective and methodology

11
Clustering Basics

• Definition and Motivation


• Data Preprocessing and Distance computation
• Objective of Clustering
• Clustering Evaluation

12
Data Representation

• Data: Collection of data objects and their attributes

• An attribute is a property or characteristic of an object
  – Examples: eye color of a person, temperature, etc.
  – Attribute is also known as dimension, variable, field, characteristic, or feature

• A collection of attributes describes an object
  – Object is also known as record, point, case, sample, entity, or instance

Example (rows are objects, columns are attributes):

  Tid  Refund  Marital Status  Taxable Income  Cheat
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  7    Yes     Divorced        220K            No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes

13
Data Matrix

• Represents n objects with p variables


– An n by p matrix
– x_if is the value of the i-th object on the f-th attribute (rows are objects, columns are attributes)

$$
\begin{bmatrix}
x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\
\vdots &        & \vdots &        & \vdots \\
x_{n1} & \cdots & x_{nf} & \cdots & x_{np}
\end{bmatrix}
$$

14
Gene Expression Data

          sample 1  sample 2  sample 3  sample 4  …
gene 1      0.13      0.72      0.10      0.57
gene 2      0.34      1.58      1.05      1.15
gene 3      0.43      1.10      0.97      1.00
gene 4      1.22      0.97      1.00      0.85
gene 5     -0.89      1.21      1.29      1.08
gene 6      1.10      1.45      1.44      1.12
gene 7      0.83      1.15      1.10      1.00
gene 8      0.87      1.32      1.35      1.13
gene 9     -0.33      1.01      1.38      1.21
gene 10     0.10      0.85      1.03      1.00
…

• Clustering genes
  – Genes are objects
  – Experiment conditions are attributes
  – Find genes with similar function

15
Similarity and Dissimilarity

• Similarity
– Numerical measure of how alike two data objects are
– Is higher when objects are more alike
– Often falls in the range [0,1]
• Dissimilarity
– Numerical measure of how different two data objects are
– Lower when objects are more alike
– Minimum dissimilarity is often 0
– Upper limit varies

16
Distance Matrix

• Represents pairwise distance in n objects


– An n by n matrix
– d(i,j): distance or dissimilarity between objects i and j
– Nonnegative
– Close to 0: similar

$$
\begin{bmatrix}
0      &        &        &        &   \\
d(2,1) & 0      &        &        &   \\
d(3,1) & d(3,2) & 0      &        &   \\
\vdots & \vdots & \vdots & \ddots &   \\
d(n,1) & d(n,2) & \cdots & \cdots & 0
\end{bmatrix}
$$
17
Data Matrix -> Distance Matrix

Original Data Matrix:

       s1     s2    s3    s4   …
g1     0.13   0.72  0.10  0.57
g2     0.34   1.58  1.05  1.15
g3     0.43   1.10  0.97  1.00
g4     1.22   0.97  1.00  0.85
g5    -0.89   1.21  1.29  1.08
g6     1.10   1.45  1.44  1.12
g7     0.83   1.15  1.10  1.00
g8     0.87   1.32  1.35  1.13
g9    -0.33   1.01  1.38  1.21
g10    0.10   0.85  1.03  1.00
…

Distance Matrix:

      g1   g2      g3      g4      …
g1    0    d(1,2)  d(1,3)  d(1,4)
g2         0       d(2,3)  d(2,4)
g3                 0       d(3,4)
g4                         0
…

18
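As a quick illustration (not from the slides; NumPy and SciPy are assumed to be available), the sketch below builds a Euclidean distance matrix from the first four genes and samples of the data matrix above. pdist returns the n(n-1)/2 pairwise distances and squareform expands them into the symmetric n-by-n matrix with zeros on the diagonal.

```python
# Sketch: from a data matrix to a pairwise distance matrix.
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([
    [0.13, 0.72, 0.10, 0.57],   # g1
    [0.34, 1.58, 1.05, 1.15],   # g2
    [0.43, 1.10, 0.97, 1.00],   # g3
    [1.22, 0.97, 1.00, 0.85],   # g4
])

# n(n-1)/2 pairwise Euclidean distances, expanded into an n-by-n matrix.
D = squareform(pdist(X, metric="euclidean"))
print(np.round(D, 3))
```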
Types of Attributes

• Discrete
– Has only a finite or countably infinite set of values
– Examples: zip codes, counts, or the set of words in a collection of
documents
– Note: binary attributes are a special case of discrete attributes
• Ordinal
– Has only a finite or countably infinite set of values
– Order of values is important
– Examples: rankings (e.g., pain level 1-10), grades (A, B, C, D)
• Continuous
– Has real numbers as attribute values
– Examples: temperature, height, or weight
– Continuous attributes are typically represented as floating-point
variables

19
Similarity/Dissimilarity for Simple Attributes

p and q are the attribute values for two data objects.

Dissimilarity and similarity between p and q depend on the attribute type:

• Discrete (nominal): d = 0 if p = q, d = 1 if p ≠ q; s = 1 − d
• Ordinal: map the values to ranks 0, 1, …, n−1 and scale to [0, 1]; then d = |p − q| / (n − 1) and s = 1 − d
• Continuous: d = |p − q|; s = −d, or s = 1 / (1 + d)
20
Minkowski Distance—Continuous Attribute
• Minkowski distance: a generalization
$$
d(i, j) = \left( |x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \cdots + |x_{ip} - x_{jp}|^q \right)^{1/q} \quad (q > 0)
$$

• If q = 2, d is Euclidean distance
• If q = 1, d is Manhattan distance

Example: for x_i = (1, 7) and x_j = (7, 1), the Manhattan distance (q = 1) is 12 and the Euclidean distance (q = 2) is 8.48.
21
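A minimal sketch of the Minkowski distance above (NumPy assumed), reproducing the example points x_i = (1, 7) and x_j = (7, 1):

```python
# Sketch: Minkowski distance d(x, y) = (sum_k |x_k - y_k|^q)^(1/q), q > 0.
import numpy as np

def minkowski(x, y, q):
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** q) ** (1.0 / q)

xi, xj = (1, 7), (7, 1)
print(minkowski(xi, xj, q=1))   # Manhattan distance: 12.0
print(minkowski(xi, xj, q=2))   # Euclidean distance: ~8.48
```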
Standardization

• Calculate the mean absolute deviation


$$
m_f = \tfrac{1}{n}\,(x_{1f} + x_{2f} + \cdots + x_{nf})
$$

$$
s_f = \tfrac{1}{n}\,(|x_{1f} - m_f| + |x_{2f} - m_f| + \cdots + |x_{nf} - m_f|)
$$

• Calculate the standardized measurement (z-score)

$$
z_{if} = \frac{x_{if} - m_f}{s_f}
$$

22
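A small sketch of this standardization (NumPy assumed; the data matrix here is made up for illustration): compute the per-attribute mean m_f, the mean absolute deviation s_f, and the z-scores column by column.

```python
# Sketch: z-score standardization with the mean absolute deviation.
import numpy as np

def standardize(X):
    m = X.mean(axis=0)                 # m_f: per-attribute mean
    s = np.abs(X - m).mean(axis=0)     # s_f: mean absolute deviation
    return (X - m) / s                 # z_if = (x_if - m_f) / s_f

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0]])
print(standardize(X))
```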
Mahalanobis Distance
1
d ( p, q)  ( p  q)  ( p  q) T

 is the covariance matrix of the


input data X

1 n
 j ,k   ( X ij  X j )( X ik  X k )
n  1 i 1

Belongs to the family of bregman


divergence

For red points, the Euclidean distance is 14.7, Mahalanobis distance is 6.


23
Mahalanobis Distance
Covariance matrix:

$$
\Sigma = \begin{bmatrix} 0.3 & 0.2 \\ 0.2 & 0.3 \end{bmatrix}
$$

Points: A = (0.5, 0.5), B = (0, 1), C = (1.5, 1.5)

Mahal(A, B) = 5
Mahal(A, C) = 4

24
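These numbers can be checked directly. The sketch below (NumPy assumed) implements d(p, q) = (p − q) Σ⁻¹ (p − q)ᵀ exactly as defined above (without the square root that some libraries apply) and reproduces Mahal(A, B) = 5 and Mahal(A, C) = 4.

```python
# Sketch: Mahalanobis distance as defined on the slide.
import numpy as np

def mahalanobis(p, q, cov):
    diff = np.asarray(p) - np.asarray(q)
    return diff @ np.linalg.inv(cov) @ diff

cov = np.array([[0.3, 0.2],
                [0.2, 0.3]])
A, B, C = (0.5, 0.5), (0.0, 1.0), (1.5, 1.5)
print(mahalanobis(A, B, cov))   # 5.0
print(mahalanobis(A, C, cov))   # 4.0
```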
Common Properties of a Distance

• Distances, such as the Euclidean distance, have


some well known properties
1. d(p, q) ≥ 0 for all p and q, and d(p, q) = 0 only if p = q. (Positive definiteness)
2. d(p, q) = d(q, p) for all p and q. (Symmetry)
3. d(p, r) ≤ d(p, q) + d(q, r) for all points p, q, and r. (Triangle Inequality)

where d(p, q) is the distance (dissimilarity) between points (data objects) p and q.

• A distance that satisfies these properties is a


metric

25
Similarity for Binary Attributes
• Common situation is that objects, p and q, have only
binary attributes
• Compute similarities using the following quantities
M01 = the number of attributes where p was 0 and q was 1
M10 = the number of attributes where p was 1 and q was 0
M00 = the number of attributes where p was 0 and q was 0
M11 = the number of attributes where p was 1 and q was 1

• Simple Matching and Jaccard Coefficients


SMC = number of matches / total number of attributes
= (M11 + M00) / (M01 + M10 + M11 + M00)

J = number of matches / number of not-both-zero attributes values


= (M11) / (M01 + M10 + M11)

26
SMC versus Jaccard: Example

p= 1000000000
q= 0000001001

M01 = 2 (the number of attributes where p was 0 and q was 1)


M10 = 1 (the number of attributes where p was 1 and q was 0)
M00 = 7 (the number of attributes where p was 0 and q was 0)
M11 = 0 (the number of attributes where p was 1 and q was 1)

SMC = (M11 + M00)/(M01 + M10 + M11 + M00) = (0+7) / (2+1+0+7) = 0.7

J = (M11) / (M01 + M10 + M11) = 0 / (2 + 1 + 0) = 0

27
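The same counts and coefficients can be reproduced with a few lines (NumPy assumed):

```python
# Sketch: SMC and Jaccard for the binary vectors p and q from the example.
import numpy as np

p = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
q = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 1])

m11 = np.sum((p == 1) & (q == 1))
m00 = np.sum((p == 0) & (q == 0))
m10 = np.sum((p == 1) & (q == 0))
m01 = np.sum((p == 0) & (q == 1))

smc = (m11 + m00) / (m01 + m10 + m11 + m00)   # 0.7
jaccard = m11 / (m01 + m10 + m11)             # 0.0
print(smc, jaccard)
```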
Document Data
• Each document becomes a `term' vector,
– each term is a component (attribute) of the vector,
– the value of each component is the number of times the
corresponding term occurs in the document.

              team  coach  play  ball  score  game  win  lost  timeout  season
Document 1     3     0      5     0     2      6     0    2      0        2
Document 2     0     7      0     2     1      0     0    3      0        0
Document 3     0     1      0     0     1      2     2    0      3        0

28
Cosine Similarity
• If d1 and d2 are two document vectors, then
cos(d1, d2) = (d1 • d2) / (||d1|| ||d2||),
where • indicates the vector dot product and ||d|| is the length of vector d.

• Example:
d1 = 3 2 0 5 0 0 0 2 0 0
d2 = 1 0 0 0 0 0 0 1 0 2

d1 • d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
||d1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = (42)^0.5 = 6.481
||d2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 1*1 + 0*0 + 2*2)^0.5 = (6)^0.5 = 2.449

cos(d1, d2) = 5 / (6.481 * 2.449) = 0.3150

29
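A sketch reproducing the cosine computation above (NumPy assumed):

```python
# Sketch: cosine similarity of the two document vectors in the example.
import numpy as np

d1 = np.array([3, 2, 0, 5, 0, 0, 0, 2, 0, 0])
d2 = np.array([1, 0, 0, 0, 0, 0, 0, 1, 0, 2])

cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
print(round(cos, 4))   # 0.315
```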
Correlation

• Correlation measures the linear relationship between objects


• To compute correlation, we standardize data objects, p and q,
and then take their dot product (continuous attributes)

$$
p'_k = (p_k - \mathrm{mean}(p)) / \mathrm{std}(p)
$$

$$
q'_k = (q_k - \mathrm{mean}(q)) / \mathrm{std}(q)
$$

$$
s(p, q) = p' \bullet q'
$$
30
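A sketch of this computation (NumPy assumed; the vectors p and q are made up for illustration). Note that for the standardized dot product to equal the Pearson correlation exactly, it must be divided by n − 1 when the sample standard deviation is used; np.corrcoef returns the same value.

```python
# Sketch: correlation as a dot product of standardized objects.
import numpy as np

p = np.array([3.0, 2.0, 0.0, 5.0])
q = np.array([1.0, 0.0, 0.0, 2.0])

ps = (p - p.mean()) / p.std(ddof=1)
qs = (q - q.mean()) / q.std(ddof=1)
corr = ps @ qs / (len(p) - 1)   # dividing by n-1 gives the Pearson correlation
print(round(corr, 4), round(np.corrcoef(p, q)[0, 1], 4))
```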
Common Properties of a Similarity

• Similarities also have some well-known properties.
1. s(p, q) = 1 (or maximum similarity) only if p = q.

2. s(p, q) = s(q, p) for all p and q. (Symmetry)

where s(p, q) is the similarity between points (data


objects), p and q.

31
Characteristics of the Input Data Are Important

• Sparseness
• Attribute type
• Type of Data
• Dimensionality
• Noise and Outliers
• Type of Distribution
• => Conduct preprocessing and select the appropriate
dissimilarity or similarity measure
• => Determine the objective of clustering and choose
the appropriate method

32
Clustering Basics

• Definition and Motivation


• Data Preprocessing and Distance computation
• Objective of Clustering
• Clustering Evaluation

33
Considerations for Cluster Analysis
• Partitioning criteria
– Single level vs. hierarchical partitioning (often, multi-level hierarchical
partitioning is desirable)
• Separation of clusters
– Exclusive (e.g., one customer belongs to only one region) vs. overlapping
(e.g., one document may belong to more than one topic)

• Hard versus fuzzy


– In fuzzy clustering, a point belongs to every cluster with some weight
between 0 and 1
– Weights must sum to 1
– Probabilistic clustering has similar characteristics

• Similarity measure and data types


• Heterogeneous versus homogeneous
– Clusters of widely different sizes, shapes, and densities
34
Requirements of Clustering

• Scalability
• Ability to deal with different types of attributes
• Minimal requirements for domain knowledge to determine
input parameters
• Able to deal with noise and outliers
• Discovery of clusters with arbitrary shape
• Insensitive to order of input records
• High dimensionality
• Incorporation of user-specified constraints
• Interpretability and usability
• What clustering results do we want to get?
35
Notion of a Cluster can be Ambiguous

How many clusters?

[The same set of points grouped into two clusters, four clusters, and six clusters]

36
Partitional Clustering

[Figure: the input data and a partitional clustering of the same points]

37
Hierarchical Clustering

[Figure: two hierarchical clusterings of points p1–p4, each shown as nested groupings with the corresponding dendrogram: Clustering Solution 1 and Clustering Solution 2]
38
Types of Clusters: Center-Based

• Center-based
– A cluster is a set of objects such that an object in a cluster is closer
(more similar) to the “center” of a cluster, than to the center of any
other cluster
– The center of a cluster is often a centroid, the average of all the
points in the cluster, or a medoid, the most “representative” point
of a cluster

4 center-based clusters

39
Types of Clusters: Density-Based

• Density-based
– A cluster is a dense region of points, which is separated from other
regions of high density by low-density regions.
– Used when the clusters are irregular or intertwined, and when noise
and outliers are present.

6 density-based clusters

40
Clustering Basics

• Definition and Motivation


• Data Preprocessing and Distance computation
• Objective of Clustering
• Clustering Evaluation

41
Cluster Validation

• Cluster validation
– Quality: “goodness” of clusters
– Assess the quality and reliability of clustering
results

• Why validation?
– To avoid finding clusters formed by chance
– To compare clustering algorithms
– To choose clustering parameters
• e.g., the number of clusters

42
Aspects of Cluster Validation

• Comparing the clustering results to ground truth


(externally known results)
– External Index
• Evaluating the quality of clusters without reference
to external information
– Use only the data
– Internal Index
• Determining the reliability of clusters
– To what confidence level the clusters are not formed by chance
– Statistical framework

43
Comparing to Ground Truth

• Notation
– N: number of objects in the data set
– P={P1,…,Ps}: the set of “ground truth” clusters
– C={C1,…,Ct}: the set of clusters reported by a clustering
algorithm
• The “incidence matrix”
– N × N (both rows and columns correspond to objects)
– Pij = 1 if Oi and Oj belong to the same “ground truth” cluster
in P; Pij=0 otherwise
– Cij = 1 if Oi and Oj belong to the same cluster in C; Cij=0
otherwise

44
Rand Index and Jaccard Coefficient

• A pair of data object (Oi,Oj) falls into one of the


following categories
– SS: Cij=1 and Pij=1; (agree)
– DD: Cij=0 and Pij=0; (agree)
– SD: Cij=1 and Pij=0; (disagree)
– DS: Cij=0 and Pij=1; (disagree)

• Rand index

$$
\text{Rand} = \frac{|SS| + |DD|}{|\text{Agree}| + |\text{Disagree}|} = \frac{|SS| + |DD|}{|SS| + |SD| + |DS| + |DD|}
$$

– may be dominated by DD

• Jaccard coefficient

$$
\text{Jaccard} = \frac{|SS|}{|SS| + |SD| + |DS|}
$$

45
Clustering (incidence matrix C):

      g1  g2  g3  g4  g5
g1     1   1   1   0   0
g2     1   1   1   0   0
g3     1   1   1   0   0
g4     0   0   0   1   1
g5     0   0   0   1   1

Groundtruth (incidence matrix P):

      g1  g2  g3  g4  g5
g1     1   1   0   0   0
g2     1   1   0   0   0
g3     0   0   1   1   1
g4     0   0   1   1   1
g5     0   0   1   1   1

Counts over all N × N entries:

                          Clustering
                          Same cluster   Different cluster
Ground truth  Same             9                4
              Different        4                8

$$
\text{Rand} = \frac{|SS| + |DD|}{|SS| + |SD| + |DS| + |DD|} = \frac{17}{25}
\qquad
\text{Jaccard} = \frac{|SS|}{|SS| + |SD| + |DS|} = \frac{9}{17}
$$
46
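The example can be reproduced by building the two incidence matrices from the cluster labels and counting agreements over all N × N entries, as the slide does (NumPy assumed):

```python
# Sketch: Rand index and Jaccard coefficient from incidence matrices.
import numpy as np

clustering = np.array([0, 0, 0, 1, 1])   # cluster labels of g1..g5
truth      = np.array([0, 0, 1, 1, 1])   # ground-truth labels of g1..g5

C = (clustering[:, None] == clustering[None, :]).astype(int)   # C_ij
P = (truth[:, None] == truth[None, :]).astype(int)             # P_ij

ss = np.sum((C == 1) & (P == 1))
dd = np.sum((C == 0) & (P == 0))
sd = np.sum((C == 1) & (P == 0))
ds = np.sum((C == 0) & (P == 1))

rand = (ss + dd) / (ss + sd + ds + dd)   # 17/25 = 0.68
jaccard = ss / (ss + sd + ds)            # 9/17 ≈ 0.529
print(ss, dd, sd, ds, rand, jaccard)
```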
Entropy and Purity

• Notation
– |C_k ∩ P_j|: the number of objects in both the k-th cluster of the clustering solution and the j-th cluster of the groundtruth
– |C_k|: the number of objects in the k-th cluster of the clustering solution
– |P_j|: the number of objects in the j-th cluster of the groundtruth

• Purity

$$
\text{Purity} = \frac{1}{N}\sum_{k} \max_{j} |C_k \cap P_j|
$$

• Normalized Mutual Information

$$
NMI = \frac{I(C, P)}{\sqrt{H(C)\,H(P)}}, \qquad
I(C, P) = \sum_{k}\sum_{j} \frac{|C_k \cap P_j|}{N}\,\log\frac{N\,|C_k \cap P_j|}{|C_k|\,|P_j|}
$$

$$
H(C) = -\sum_{k} \frac{|C_k|}{N}\,\log\frac{|C_k|}{N}, \qquad
H(P) = -\sum_{j} \frac{|P_j|}{N}\,\log\frac{|P_j|}{N}
$$
47
Example
        P1    P2    P3    P4    P5    P6   Total
C1        3     5    40   506    96    27    677
C2        4     7   280    29    39     2    361
C3        1     1     1     7     4   671    685
C4       10   162     3   119    73     2    369
C5      331    22     5    70    13    23    464
C6        5   358    12   212    48    13    648
Total   354   555   341   943   273   738   3204

$$
\text{Purity} = \frac{1}{N}\sum_{k}\max_j |C_k \cap P_j|
= \frac{506 + 280 + 671 + 162 + 331 + 358}{3204} = 0.7203
$$

NMI is computed from the same table using the formulas on the previous slide:

$$
NMI = \frac{I(C, P)}{\sqrt{H(C)\,H(P)}}, \qquad
I(C, P) = \sum_{k}\sum_{j}\frac{|C_k \cap P_j|}{N}\log\frac{N\,|C_k \cap P_j|}{|C_k|\,|P_j|}
$$
48
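A sketch (NumPy assumed) that computes purity and NMI directly from the contingency table above; purity comes out to 0.7203 as on the slide, and NMI uses the I(C, P), H(C), H(P) definitions from the previous slide with the square-root normalization assumed there.

```python
# Sketch: purity and NMI from a contingency table (rows C1..C6, columns P1..P6).
import numpy as np

table = np.array([
    [  3,   5,  40, 506,  96,  27],
    [  4,   7, 280,  29,  39,   2],
    [  1,   1,   1,   7,   4, 671],
    [ 10, 162,   3, 119,  73,   2],
    [331,  22,   5,  70,  13,  23],
    [  5, 358,  12, 212,  48,  13],
], dtype=float)
N = table.sum()                                # 3204

purity = table.max(axis=1).sum() / N           # 0.7203

pc = table.sum(axis=1) / N                     # |C_k| / N
pp = table.sum(axis=0) / N                     # |P_j| / N
pcp = table / N                                # |C_k ∩ P_j| / N
with np.errstate(divide="ignore", invalid="ignore"):
    terms = pcp * np.log(pcp / np.outer(pc, pp))
mi = np.nansum(terms)                          # I(C, P)
h_c = -np.sum(pc * np.log(pc))                 # H(C)
h_p = -np.sum(pp * np.log(pp))                 # H(P)
nmi = mi / np.sqrt(h_c * h_p)
print(round(purity, 4), round(nmi, 4))
```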
Internal Index

• “Ground truth” may be unavailable


• Use only the data to measure cluster quality
– Measure the “cohesion” and “separation” of clusters
– Calculate the correlation between clustering results
and distance matrix

49
Cohesion and Separation
• Cohesion is measured by the within cluster sum of squares
$$
WSS = \sum_{i}\sum_{x \in C_i} (x - m_i)^2
$$

• Separation is measured by the between cluster sum of squares

$$
BSS = \sum_{i} |C_i|\,(m - m_i)^2
$$

where |C_i| is the size of cluster i, m_i is the centroid of cluster i, and m is the centroid of the whole data set

• BSS + WSS = constant


• WSS (Cohesion) measure is called Sum of Squared Error (SSE)—a
commonly used measure
• A larger number of clusters tends to result in a smaller SSE

50
Example

Points: 1, 2, 4, 5 on a line; overall centroid m = 3; for K = 2, cluster centroids m1 = 1.5 and m2 = 4.5.

K = 1:
$$
WSS = (1-3)^2 + (2-3)^2 + (4-3)^2 + (5-3)^2 = 10, \qquad
BSS = 4 \times (3-3)^2 = 0, \qquad \text{Total} = 10
$$

K = 2:
$$
WSS = (1-1.5)^2 + (2-1.5)^2 + (4-4.5)^2 + (5-4.5)^2 = 1, \qquad
BSS = 2 \times (3-1.5)^2 + 2 \times (4.5-3)^2 = 9, \qquad \text{Total} = 10
$$

K = 4:
$$
WSS = (1-1)^2 + (2-2)^2 + (4-4)^2 + (5-5)^2 = 0, \qquad
BSS = 1\times(1-3)^2 + 1\times(2-3)^2 + 1\times(4-3)^2 + 1\times(5-3)^2 = 10, \qquad \text{Total} = 10
$$

51
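A sketch reproducing the three cases above for the points {1, 2, 4, 5} (NumPy assumed); WSS + BSS stays at 10 for every K.

```python
# Sketch: within-cluster (WSS) and between-cluster (BSS) sums of squares.
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])
m = x.mean()                               # overall centroid, 3.0

def wss_bss(labels):
    wss = bss = 0.0
    for k in np.unique(labels):
        pts = x[labels == k]
        mk = pts.mean()                    # cluster centroid
        wss += np.sum((pts - mk) ** 2)
        bss += len(pts) * (m - mk) ** 2
    return wss, bss

print(wss_bss(np.array([0, 0, 0, 0])))     # K=1: (10.0, 0.0)
print(wss_bss(np.array([0, 0, 1, 1])))     # K=2: (1.0, 9.0)
print(wss_bss(np.array([0, 1, 2, 3])))     # K=4: (0.0, 10.0)
```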
Silhouette Coefficient
• Silhouette Coefficient combines ideas of both cohesion and separation

• For an individual point, i


– Calculate a = average distance of i to the points in its cluster
– Calculate b = min (average distance of i to points in another cluster)
– The silhouette coefficient for a point is then given by

s = 1 − a/b if a < b (s = b/a − 1 if a ≥ b, not the usual case)

– Typically between 0 and 1
– The closer to 1 the better

• Can calculate the Average Silhouette width for a cluster or a clustering

52
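A minimal sketch of the per-point silhouette definition above, applied to a small made-up 1-D example with two clusters (NumPy assumed):

```python
# Sketch: silhouette coefficient of each point under the a/b definition.
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])
labels = np.array([0, 0, 1, 1])

def silhouette(i):
    same = (labels == labels[i]) & (np.arange(len(x)) != i)
    a = np.mean(np.abs(x[same] - x[i]))                      # avg. distance within own cluster
    b = min(np.mean(np.abs(x[labels == k] - x[i]))
            for k in np.unique(labels) if k != labels[i])    # min avg. distance to another cluster
    return 1 - a / b if a < b else b / a - 1

print([round(silhouette(i), 3) for i in range(len(x))])
```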
Correlation with Distance Matrix

• Distance Matrix
– Dij is the distance between objects Oi and Oj
• Incidence Matrix
– Cij=1 if Oi and Oj belong to the same cluster, Cij=0
otherwise
• Compute the correlation between the two
matrices
– Only n(n-1)/2 entries need to be calculated
• High correlation indicates good clustering

53
Correlation with Distance Matrix

• Given Distance Matrix D = {d11,d12, …, dnn } and Incidence


Matrix C= { c11, c12,…, cnn } .

• Correlation r between D and C is given by

$$
r = \frac{\displaystyle\sum_{i,j} (d_{ij} - \bar{d})(c_{ij} - \bar{c})}
        {\sqrt{\displaystyle\sum_{i,j} (d_{ij} - \bar{d})^2}\;\sqrt{\displaystyle\sum_{i,j} (c_{ij} - \bar{c})^2}}
$$

54
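A sketch (NumPy and SciPy assumed; the four 2-D points are made up) that computes r between the distance matrix and the incidence matrix using only the n(n−1)/2 upper-triangular entries; for a good clustering the correlation is strongly negative, as in the examples on the following slides.

```python
# Sketch: correlation between a distance matrix and an incidence matrix.
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[0.1, 0.2], [0.15, 0.22], [0.9, 0.8], [0.95, 0.85]])
labels = np.array([0, 0, 1, 1])

D = squareform(pdist(X))                                   # distance matrix
C = (labels[:, None] == labels[None, :]).astype(float)     # incidence matrix

iu = np.triu_indices(len(X), k=1)                          # n(n-1)/2 entries
r = np.corrcoef(D[iu], C[iu])[0, 1]
print(round(r, 3))   # strongly negative: small distances go with same-cluster pairs
```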
Are There Clusters in the Data?
[Four scatter plots of the same set of random points in the unit square: Random Points, DBSCAN, K-means, Complete Link; each clustering algorithm reports clusters even though the data are random]
55
Measuring Cluster Validity Via Correlation

• Correlation of incidence and distance matrices for the K-


means clusterings of the following two data sets

[Two data sets in the unit square with their K-means clusterings and the corresponding correlations between incidence and distance matrices: Corr = -0.9235 for the data with well-separated clusters, Corr = -0.5810 for the random data]

56
Using Similarity Matrix for Cluster Validation

• Order the similarity matrix with respect to cluster


labels and inspect visually.
[Left: scatter plot of three well-separated clusters. Right: the 100 × 100 similarity matrix with points ordered by cluster label; bright blocks appear along the diagonal]

57
Using Similarity Matrix for Cluster Validation

• Clusters in random data are not so crisp


[Left: scatter plot of random points. Right: the corresponding similarity matrix ordered by cluster label; the diagonal blocks are much less distinct]

58
Reliability of Clusters

• Need a framework to interpret any measure

– For example, if our measure of evaluation has the value 10, is that good, fair, or poor?

• Statistics provide a framework for cluster validity


– The more “atypical” a clustering result is, the more
likely it represents valid structure in the data

59
Statistical Framework for SSE
• Example
– Compare SSE of 0.005 against three clusters in random data
– SSE Histogram of 500 sets of random data points of size 100—
lowest SSE is 0.0173
[Left: scatter plot of the three clusters whose clustering gives SSE = 0.005. Right: histogram of the SSE values obtained on 500 random data sets of 100 points, ranging from about 0.016 to 0.034; an SSE of 0.005 is well below anything produced by chance]
60
Determine the Number of Clusters Using SSE

• SSE curve
[Left: clustering of the input data. Right: SSE plotted against the number of clusters K (K = 2 to 30); SSE decreases as K grows, and the knee of the curve suggests the natural number of clusters]

Clustering of Input Data / SSE wrt K

61
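A sketch of the SSE-versus-K curve (assuming scikit-learn is available; the three-blob data set is made up): KMeans exposes the within-cluster SSE as inertia_, and printing it for increasing K shows the steep drop followed by a flat tail that marks the knee.

```python
# Sketch: tracing SSE (k-means inertia) as a function of K.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated 2-D blobs.
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(50, 2))
               for c in [(0, 0), (2, 2), (4, 0)]])

for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 2))   # inertia_ is the within-cluster SSE
```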
Take-away Message

• What’s clustering?
• Why clustering is important?
• How to preprocess data and compute
dissimilarity/similarity from data?
• What’s a good clustering solution?
• How to evaluate the clustering results?

62
