

SOLVED NUMERICAL EXAMPLES IN MACHINE LEARNING

S.No.  Topic

1      KNN
2      Decision tree
3      METRICS
4      Naïve Bayes
5      K-means clustering
6      Hierarchical clustering
7      Dimensionality reduction techniques (PCA)

KNN Numerical Example (hand computation)



Numerical Example of K Nearest Neighbor Algorithm


Here is a step-by-step procedure for the K-nearest neighbors (KNN) algorithm:
1. Determine the parameter K = the number of nearest neighbors.
2. Calculate the distance between the query instance and all the training samples.
3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance.
4. Gather the category Y of the nearest neighbors.
5. Use the simple majority of the categories of the nearest neighbors as the predicted value for the query instance.

Example
We have data from a questionnaire survey (asking people's opinions) and from objective testing, with two attributes (acid durability and strength), to classify whether a special paper tissue is good or not. Here are four training samples:
X1 = Acid Durability   X2 = Strength        Y = Classification
(seconds)              (kg/square meter)

7                      7                    Bad
7                      4                    Bad
3                      4                    Good
1                      4                    Good

Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7. Without another expensive survey, can we guess what the classification of this new tissue is?

1. Determine the parameter K = the number of nearest neighbors

Suppose we use K = 3.

2. Calculate the distance between the query instance and all the training samples

The coordinates of the query instance are (3, 7). Instead of calculating the Euclidean distance we compute the squared distance, which is faster to calculate (no square root is needed).

X1 = Acid Durability   X2 = Strength        Squared distance to query instance (3, 7)
(seconds)              (kg/square meter)

7                      7                    (7-3)² + (7-7)² = 16
7                      4                    (7-3)² + (4-7)² = 25
3                      4                    (3-3)² + (4-7)² = 9
1                      4                    (1-3)² + (4-7)² = 13

3. Sort the distance and determine nearest neighbors based on the K-th minimum distance

X1 = Acid Durability   X2 = Strength        Squared distance to        Rank (minimum   Included in the 3
(seconds)              (kg/square meter)    query instance (3, 7)      distance)       nearest neighbors?

7                      7                    (7-3)² + (7-7)² = 16       3               Yes
7                      4                    (7-3)² + (4-7)² = 25       4               No
3                      4                    (3-3)² + (4-7)² = 9        1               Yes
1                      4                    (1-3)² + (4-7)² = 13       2               Yes

4. Gather the category Y of the nearest neighbors. Notice in the second row, last column, that the category (Y) of that neighbor is not included because its rank is greater than 3 (= K).

X1 = Acid Durability   X2 = Strength        Squared distance to        Rank (minimum   Included in the 3       Y = Category of
(seconds)              (kg/square meter)    query instance (3, 7)      distance)       nearest neighbors?      nearest neighbor

7                      7                    (7-3)² + (7-7)² = 16       3               Yes                     Bad
7                      4                    (7-3)² + (4-7)² = 25       4               No                      -
3                      4                    (3-3)² + (4-7)² = 9        1               Yes                     Good
1                      4                    (1-3)² + (4-7)² = 13       2               Yes                     Good

5. Use the simple majority of the categories of the nearest neighbors as the predicted value for the query instance

We have 2 Good and 1 Bad. Since 2 > 1, we conclude that the new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7 belongs to the Good category.
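The same hand computation can be reproduced with a short Python sketch (a minimal illustration using the four training samples above; the variable names are my own, not part of the original example):

from collections import Counter

# Training samples from the table above: (X1 acid durability, X2 strength, class)
training = [(7, 7, "Bad"), (7, 4, "Bad"), (3, 4, "Good"), (1, 4, "Good")]
query = (3, 7)   # the new tissue: X1 = 3, X2 = 7
K = 3

# Step 2: squared distance to every training sample (no square root needed)
distances = [((x1 - query[0]) ** 2 + (x2 - query[1]) ** 2, label)
             for x1, x2, label in training]

# Step 3: sort by distance and keep the K nearest neighbors
neighbors = sorted(distances)[:K]

# Steps 4-5: majority vote over the neighbor labels
prediction = Counter(label for _, label in neighbors).most_common(1)[0][0]
print(neighbors)    # [(9, 'Good'), (13, 'Good'), (16, 'Bad')]
print(prediction)   # Good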

Decision Tree

Data set
For instance, the following table records the factors that influenced the decision to play tennis outside over the previous 14 days.

Day Outlook Temp. Humidity Wind Decision

1 Sunny Hot High Weak No

2 Sunny Hot High Strong No

3 Overcast Hot High Weak Yes

4 Rain Mild High Weak Yes

5 Rain Cool Normal Weak Yes

6 Rain Cool Normal Strong No

7 Overcast Cool Normal Strong Yes

8 Sunny Mild High Weak No

9 Sunny Cool Normal Weak Yes

10 Rain Mild Normal Weak Yes

11 Sunny Mild Normal Strong Yes

12 Overcast Mild High Strong Yes

13 Overcast Hot Normal Weak Yes

14 Rain Mild High Strong No

We can summarize the ID3 algorithm as illustrated below

Entropy(S) = ∑ – p(I) . log2p(I)

Gain(S, A) = Entropy(S) – ∑ [ p(S|A) . Entropy(S|A) ]



Entropy

We need to calculate the entropy first. The Decision column consists of 14 instances with two labels: Yes and No. There are 9 decisions labeled Yes and 5 labeled No.

Entropy(Decision) = – p(Yes) . log2p(Yes) – p(No) . log2p(No)

Entropy(Decision) = – (9/14) . log2(9/14) – (5/14) . log2(5/14) = 0.940

Now, we need to find the most dominant attribute for the decision.

Wind factor on decision


Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Decision|Wind) . Entropy(Decision|Wind) ]

Wind attribute has two labels: weak and strong. We would reflect it to the formula.

Gain(Decision, Wind) = Entropy(Decision)
– [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ]
– [ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ]

Now, we need to calculate (Decision|Wind=Weak) and (Decision|Wind=Strong) respectively.

Weak wind factor on decision

Day Outlook Temp. Humidity Wind Decision

1 Sunny Hot High Weak No

3 Overcast Hot High Weak Yes

4 Rain Mild High Weak Yes

5 Rain Cool Normal Weak Yes

8 Sunny Mild High Weak No

9 Sunny Cool Normal Weak Yes

10 Rain Mild Normal Weak Yes

13 Overcast Hot Normal Weak Yes

There are 8 instances for weak wind. The decision is No for 2 of them and Yes for 6, as illustrated below.

1- Entropy(Decision|Wind=Weak) = – p(No) . log2p(No) – p(Yes) . log2p(Yes)

2- Entropy(Decision|Wind=Weak) = – (2/8) . log2(2/8) – (6/8) . log2(6/8) = 0.811

Strong wind factor on decision

Day Outlook Temp. Humidity Wind Decision

2 Sunny Hot High Strong No

6 Rain Cool Normal Strong No

7 Overcast Cool Normal Strong Yes

11 Sunny Mild Normal Strong Yes

12 Overcast Mild High Strong Yes

14 Rain Mild High Strong No

Here, there are 6 instances for strong wind. Decision is divided into two equal parts.

1- Entropy(Decision|Wind=Strong) = – p(No) . log2p(No) – p(Yes) . log2p(Yes)

2- Entropy(Decision|Wind=Strong) = – (3/6) . log2(3/6) – (3/6) . log2(3/6) = 1

Now, we can return to the Gain(Decision, Wind) equation.

Gain(Decision, Wind) = Entropy(Decision)
– [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ]
– [ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ]
= 0.940 – [ (8/14) . 0.811 ] – [ (6/14) . 1 ] = 0.048

The calculation for the Wind column is done. Now we need to apply the same calculation to the other columns to find the most dominant factor on the decision.

Other factors on decision

We have applied the same calculation to the other columns.

1- Gain(Decision, Outlook) = 0.246

2- Gain(Decision, Temperature) = 0.029

3- Gain(Decision, Humidity) = 0.151



As seen, the Outlook attribute produces the highest gain. That is why Outlook appears at the root node of the tree.
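These entropy and gain values can be checked with a small Python sketch that applies the Entropy(S) and Gain(S, A) formulas given earlier to the 14-row table (a minimal illustration; function and variable names are my own):

import math
from collections import Counter

# Play-tennis data from the table above: (Outlook, Temp, Humidity, Wind, Decision)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
attributes = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    # Entropy(S) = sum over labels of -p * log2(p)
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain(rows, index):
    # Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)
    total, remainder = len(rows), 0.0
    for value in set(r[index] for r in rows):
        subset = [r for r in rows if r[index] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder

print(f"Entropy(Decision) = {entropy(data):.3f}")              # 0.940
for name, index in attributes.items():
    print(f"Gain(Decision, {name}) = {gain(data, index):.3f}")
# Outlook ~0.247, Temp ~0.029, Humidity ~0.152, Wind ~0.048; the tiny differences from
# the figures quoted above come from rounding the intermediate entropies. Outlook is largest.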

Root decision on the tree

Now, we need to repeat the process on the subsets of the data set defined by each value of the Outlook attribute.

Overcast outlook on decision

Basically, the decision is always Yes when the outlook is Overcast.

Day Outlook Temp. Humidity Wind Decision

3 Overcast Hot High Weak Yes

7 Overcast Cool Normal Strong Yes

12 Overcast Mild High Strong Yes

13 Overcast Hot Normal Weak Yes

Sunny outlook on decision

Day Outlook Temp. Humidity Wind Decision

1 Sunny Hot High Weak No

2 Sunny Hot High Strong No

8 Sunny Mild High Weak No

9 Sunny Cool Normal Weak Yes

11 Sunny Mild Normal Strong Yes



Here, there are 5 instances for sunny outlook. The decision is No for 3 of them and Yes for 2 (probabilities 3/5 and 2/5).

1- Gain(Outlook=Sunny|Temperature) = 0.570

2- Gain(Outlook=Sunny|Humidity) = 0.970

3- Gain(Outlook=Sunny|Wind) = 0.019

Now, Humidity is chosen as the next split because it produces the highest gain when the outlook is Sunny.

At this point, the decision is always No if the humidity is High.

Day Outlook Temp. Humidity Wind Decision

1 Sunny Hot High Weak No

2 Sunny Hot High Strong No

8 Sunny Mild High Weak No

On the other hand, the decision is always Yes if the humidity is Normal.

Day Outlook Temp. Humidity Wind Decision

9 Sunny Cool Normal Weak Yes

11 Sunny Mild Normal Strong Yes

Finally, this means that when the outlook is Sunny, we need to check the humidity to decide.

Rain outlook on decision

Day Outlook Temp. Humidity Wind Decision

4 Rain Mild High Weak Yes

5 Rain Cool Normal Weak Yes

6 Rain Cool Normal Strong No

10 Rain Mild Normal Weak Yes

14 Rain Mild High Strong No



1- Gain(Outlook=Rain | Temperature) = 0.01997309402197489

2- Gain(Outlook=Rain | Humidity) = 0.01997309402197489

3- Gain(Outlook=Rain | Wind) = 0.9709505944546686

Here, Wind produces the highest gain when the outlook is Rain. That is why we need to check the Wind attribute at the second level when the outlook is Rain.

So, it is revealed that the decision is always Yes if the wind is Weak and the outlook is Rain.

Day Outlook Temp. Humidity Wind Decision

4 Rain Mild High Weak Yes

5 Rain Cool Normal Weak Yes

10 Rain Mild Normal Weak Yes

What's more, the decision is always No if the wind is Strong and the outlook is Rain.

Day Outlook Temp. Humidity Wind Decision

6 Rain Cool Normal Strong No

14 Rain Mild High Strong No

So, the decision tree construction is complete. We can use the following rules for making decisions.

Final version of the decision tree
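The rules of the final tree can also be written as a small function; a minimal sketch (the function name is illustrative, and the rules are the ones derived above):

def play_tennis(outlook, humidity, wind):
    # Rules read off the final tree: Outlook at the root, Humidity under Sunny,
    # Wind under Rain, and Overcast always Yes.
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"
    raise ValueError(f"unknown outlook: {outlook}")

print(play_tennis("Sunny", "High", "Weak"))    # No  (matches day 1 in the table)
print(play_tennis("Rain", "High", "Strong"))   # No  (matches day 14 in the table)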



METRICS IN ML

Naive Bayes Example


Say you have 1000 fruits which could be either ‘banana’, ‘orange’ or ‘other’.
These are the 3 possible classes of the Y variable.

We have data for the following X variables, all of which are binary (1 or 0):

• Long
• Sweet
• Yellow

The first few rows of the training dataset look like this:

Fruit Long (x1) Sweet (x2) Yellow (x3)


Orange 0 1 0
Banana 1 0 1
Banana 1 1 1
Other 1 1 0

For the sake of computing the probabilities, let’s aggregate the training data
to form a counts table like this.

So the objective of the classifier is to predict if a given fruit is a ‘Banana’ or


‘Orange’ or ‘Other’ when only the 3 features (long, sweet and yellow) are
known.

Let's say you are given a fruit that is Long, Sweet and Yellow; can you predict what fruit it is?

This is the same as predicting Y when only the X variables in the testing data are known. Let's solve it by hand using Naive Bayes.

The idea is to compute the 3 probabilities, that is, the probability of the fruit being a banana, an orange or other. Whichever fruit type gets the highest probability wins.

All the information to calculate these probabilities is present in the above


tabulation.
Step 1: Compute the 'Prior' probabilities for each class of fruit.

That is, the proportion of each fruit class out of all the fruits from the
population. You can provide the ‘Priors’ from prior information about the
population. Otherwise, it can be computed from the training data.

For this case, let’s compute from the training data. Out of 1000 records in
training data, you have 500 Bananas, 300 Oranges and 200 Others. So the
respective priors are 0.5, 0.3 and 0.2.

P(Y=Banana) = 500 / 1000 = 0.50



P(Y=Orange) = 300 / 1000 = 0.30

P(Y=Other) = 200 / 1000 = 0.20

Step 2: Compute the probability of the evidence that goes in the denominator.

This is simply the product of P(X) for all the X features. This is an optional step because the denominator is the same for all the classes and so does not affect the relative probabilities.

P(x1=Long) = 500 / 1000 = 0.50

P(x2=Sweet) = 650 / 1000 = 0.65

P(x3=Yellow) = 800 / 1000 = 0.80

Step 3: Compute the likelihood of the evidence that goes in the numerator.

It is the product of the conditional probabilities of the 3 features. If you refer back to the formula, it says P(X1 | Y=k). Here X1 is 'Long' and k is 'Banana', so this is the probability that the fruit is Long given that it is a Banana. In the above table, you have 500 Bananas; out of those, 400 are long. So, P(Long | Banana) = 400/500 = 0.8.

Here, I have done it for Banana alone.

Probability of Likelihood for Banana

P(x1=Long | Y=Banana) = 400 / 500 = 0.80

P(x2=Sweet | Y=Banana) = 350 / 500 = 0.70

P(x3=Yellow | Y=Banana) = 450 / 500 = 0.90

So, the overall probability of Likelihood of evidence for Banana = 0.8 * 0.7 *
0.9 = 0.504

Step 4: Substitute all the 3 equations into the Naive Bayes formula, to get the
probability that it is a banana.

Similarly, you can compute the probabilities for ‘Orange’ and ‘Other fruit’.
The denominator is the same for all 3 cases, so it’s optional to compute.

Clearly, Banana gets the highest probability, so that will be our predicted
class.
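Using only the numbers given above, the Banana posterior can be checked with a few lines of Python (a minimal sketch; computing the Orange and Other posteriors would need the corresponding counts from the aggregated counts table):

# Numbers taken from the worked example above
prior_banana = 500 / 1000                                      # P(Y = Banana)
likelihood_banana = (400 / 500) * (350 / 500) * (450 / 500)    # P(Long|B) * P(Sweet|B) * P(Yellow|B) = 0.504
evidence = (500 / 1000) * (650 / 1000) * (800 / 1000)          # P(Long) * P(Sweet) * P(Yellow) = 0.26

posterior_banana = prior_banana * likelihood_banana / evidence
print(round(posterior_banana, 3))   # 0.969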

What is Laplace Correction?


The value of P(Orange | Long, Sweet and Yellow) was zero in the above example because P(Long | Orange) was zero; that is, there were no 'Long' oranges in the training data.

It makes sense, but when you have a model with many features, the entire probability can become zero because a single feature's conditional probability is zero. To avoid this, we increase the count of the zero-count value by a small amount (usually 1) in the numerator, so that the overall probability does not become zero.

This correction is called ‘Laplace Correction’. Most Naive Bayes model


implementations accept this or an equivalent form of correction as a
parameter.
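As an illustration, here is a minimal sketch of standard add-one (Laplace) smoothing; the helper function and its parameters are illustrative, not part of the original text:

def smoothed_likelihood(feature_and_class_count, class_count, n_values=2, alpha=1):
    # Add-alpha (Laplace) estimate of P(feature value | class).
    # n_values is the number of values the feature can take (2 for a binary feature).
    return (feature_and_class_count + alpha) / (class_count + alpha * n_values)

# With no 'Long' oranges out of 300, the raw estimate is 0 but the smoothed one is not:
print(0 / 300)                       # 0.0
print(smoothed_likelihood(0, 300))   # 1/302, roughly 0.0033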

ANOTHER TEXT EXAMPLE (NAÏVE BAYES)



K-Means Clustering

ANOTHER EXAMPLE FOR


K-Means Clustering

ANOTHER EXAMPLE OF K-MEANS (One Dimension)

Suppose we want to group the visitors to a website using just their age (one-dimensional space) as follows:
n = 19
15,15,16,19,19,20,20,21,22,28,35,40,41,42,43,44,60,61,65

Initial clusters (random centroid or average):


k=2
c1 = 16
c2 = 22

Iteration 1 (assignments use c1 = 16 and c2 = 22; the new centroids computed below are):
c1 = 15.33
c2 = 36.25
xi c1 c2 Distance 1 Distance 2 Nearest Cluster New Centroid
15 16 22 1 7 1
15 16 22 1 7 1 15.33
16 16 22 0 6 1
19 16 22 3 3 2
19 16 22 3 3 2
20 16 22 4 2 2
20 16 22 4 2 2
21 16 22 5 1 2
22 16 22 6 0 2
28 16 22 12 6 2
35 16 22 19 13 2
36.25
40 16 22 24 18 2
41 16 22 25 19 2
42 16 22 26 20 2
43 16 22 27 21 2
44 16 22 28 22 2
60 16 22 44 38 2
61 16 22 45 39 2
65 16 22 49 43 2

Iteration 2 (assignments use c1 = 15.33 and c2 = 36.25; the new centroids computed below are):
c1 = 18.56
c2 = 45.90
xi c1 c2 Distance 1 Distance 2 Nearest Cluster New Centroid
15 15.33 36.25 0.33 21.25 1
15 15.33 36.25 0.33 21.25 1
16 15.33 36.25 0.67 20.25 1
19 15.33 36.25 3.67 17.25 1
19 15.33 36.25 3.67 17.25 1 18.56
20 15.33 36.25 4.67 16.25 1
20 15.33 36.25 4.67 16.25 1
21 15.33 36.25 5.67 15.25 1
22 15.33 36.25 6.67 14.25 1
28 15.33 36.25 12.67 8.25 2
35 15.33 36.25 19.67 1.25 2
40 15.33 36.25 24.67 3.75 2
41 15.33 36.25 25.67 4.75 2
42 15.33 36.25 26.67 5.75 2
45.9
43 15.33 36.25 27.67 6.75 2
44 15.33 36.25 28.67 7.75 2
60 15.33 36.25 44.67 23.75 2
61 15.33 36.25 45.67 24.75 2
65 15.33 36.25 49.67 28.75 2

Iteration 3 (assignments use c1 = 18.56 and c2 = 45.90; the new centroids computed below are):
c1 = 19.50
c2 = 47.89
xi c1 c2 Distance 1 Distance 2 Nearest Cluster New Centroid
15 18.56 45.9 3.56 30.9 1
15 18.56 45.9 3.56 30.9 1
16 18.56 45.9 2.56 29.9 1
19 18.56 45.9 0.44 26.9 1
19 18.56 45.9 0.44 26.9 1
19.50
20 18.56 45.9 1.44 25.9 1
20 18.56 45.9 1.44 25.9 1
21 18.56 45.9 2.44 24.9 1
22 18.56 45.9 3.44 23.9 1
28 18.56 45.9 9.44 17.9 1

35 18.56 45.9 16.44 10.9 2


40 18.56 45.9 21.44 5.9 2
41 18.56 45.9 22.44 4.9 2
42 18.56 45.9 23.44 3.9 2
43 18.56 45.9 24.44 2.9 2 47.89
44 18.56 45.9 25.44 1.9 2
60 18.56 45.9 41.44 14.1 2
61 18.56 45.9 42.44 15.1 2
65 18.56 45.9 46.44 19.1 2

Iteration 4 (assignments use c1 = 19.50 and c2 = 47.89; the centroids remain unchanged):
c1 = 19.50
c2 = 47.89
xi c1 c2 Distance 1 Distance 2 Nearest Cluster New Centroid
15 19.5 47.89 4.50 32.89 1
15 19.5 47.89 4.50 32.89 1
16 19.5 47.89 3.50 31.89 1
19 19.5 47.89 0.50 28.89 1
19 19.5 47.89 0.50 28.89 1
19.50
20 19.5 47.89 0.50 27.89 1
20 19.5 47.89 0.50 27.89 1
21 19.5 47.89 1.50 26.89 1
22 19.5 47.89 2.50 25.89 1
28 19.5 47.89 8.50 19.89 1
35 19.5 47.89 15.50 12.89 2
40 19.5 47.89 20.50 7.89 2
41 19.5 47.89 21.50 6.89 2
42 19.5 47.89 22.50 5.89 2
43 19.5 47.89 23.50 4.89 2 47.89
44 19.5 47.89 24.50 3.89 2
60 19.5 47.89 40.50 12.11 2
61 19.5 47.89 41.50 13.11 2
65 19.5 47.89 45.50 17.11 2

No change in the cluster assignments occurs between iterations 3 and 4, so the algorithm stops. Two groups have been identified: ages 15-28 and ages 35-65. The initial choice of centroids can affect the output clusters, so the algorithm is often run multiple times with different starting conditions in order to get a fair view of what the clusters should be.
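The whole iteration can be reproduced with a short Python sketch on the same ages (a minimal one-dimensional K-means; the tie for age 19 in the first pass may be broken differently than in the table, but the final centroids and clusters are the same):

# Ages from the example above, with the same initial centroids c1 = 16, c2 = 22
ages = [15, 15, 16, 19, 19, 20, 20, 21, 22, 28, 35, 40, 41, 42, 43, 44, 60, 61, 65]
centroids = [16.0, 22.0]

for _ in range(20):                      # iterate until the centroids stop moving
    clusters = [[], []]
    for x in ages:                       # assign each age to the nearest centroid
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        clusters[nearest].append(x)
    new_centroids = [sum(c) / len(c) for c in clusters]
    if new_centroids == centroids:
        break
    centroids = new_centroids

print([round(c, 2) for c in centroids])   # [19.5, 47.89]
print(clusters)                           # ages 15-28 in one group, 35-65 in the other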

Video Tutorials
1. K Means Clustering Algorithm: URL: https://www.youtube.com/watch?v=1XqG0kaJVHY&feature=emb_logo
2. K Means Clustering Algorithm: URL: https://www.youtube.com/watch?v=EItlUEPCIzM

Hierarchical Clustering
Hierarchical clustering is another unsupervised learning algorithm that is used to group together unlabeled data points having similar characteristics. Hierarchical clustering algorithms fall into the following two categories.

Agglomerative hierarchical algorithms − In agglomerative hierarchical algorithms, each data point is initially treated as a single cluster, and pairs of clusters are then successively merged (agglomerated) in a bottom-up fashion. The hierarchy of the clusters is represented as a dendrogram or tree structure.

Divisive hierarchical algorithms − On the other hand, in divisive hierarchical algorithms, all the data
points are treated as one big cluster and the process of clustering involves dividing (Top-down approach)
the one big cluster into various small clusters.

Agglomerative clustering
In the agglomerative or bottom-up clustering method we first assign each observation to its own cluster. Then we compute the similarity (e.g., distance) between each pair of clusters and join the two most similar clusters. Finally, we repeat the previous two steps until only a single cluster is left. The related algorithm is shown below.

Before any clustering is performed, it is required to determine the proximity matrix containing the distance between each pair of points, using a distance function. Then, the matrix is updated to display the distance between each cluster. The following three methods differ in how the distance between each cluster is measured.

An example: working of the algorithm

Measuring the distance of two clusters

There are a few ways to measure the distance between two clusters, and they result in different variations of the algorithm:

o Single link
o Complete link
o Average link
o Centroids
o …

Single link method


In single linkage hierarchical clustering, the distance between two clusters is defined as the shortest
distance between two points in each cluster. For example, the distance between clusters “r” and “s” to
the left is equal to the length of the arrow between their two closest points.

Complete link method


In complete linkage hierarchical clustering, the distance between two clusters is defined as the longest
distance between two points in each cluster. For example, the distance between clusters “r” and “s” to
the left is equal to the length of the arrow between their two furthest points.

Average link method:

In average linkage hierarchical clustering, the distance between two clusters is defined as the average distance between each point in one cluster and every point in the other cluster. For example, the distance between clusters "r" and "s" to the left is equal to the average length of the arrows connecting the points of one cluster to the other.

Centroid method:
• In this method, the distance between two clusters is the distance between their centroids.

The complexity
• All the algorithms are at least O(n²), where n is the number of data points.
• Single link can be done in O(n²).
• Complete and average links can be done in O(n² log n).
• Due to the complexity, these methods are hard to use for large data sets; common workarounds are:

o Sampling
o Scale-up methods (e.g., BIRCH).

An Example

Let’s now see a simple example: a hierarchical clustering of distances in kilometers between some Italian
cities. The method used is single-linkage.

Input distance matrix (L = 0 for all the clusters):

     BA   FI   MI   NA   RM   TO
BA    0  662  877  255  412  996
FI  662    0  295  468  268  400
MI  877  295    0  754  564  138
NA  255  468  754    0  219  869
RM  412  268  564  219    0  669
TO  996  400  138  869  669    0

The nearest pair of cities is MI and TO, at distance 138. These are merged into a single cluster called
"MI/TO". The level of the new cluster is L(MI/TO) = 138 and the new sequence number is m = 1.
Then we compute the distance from this new compound object to all other objects. In single link
clustering the rule is that the distance from the compound object to another object is equal to the
shortest distance from any member of the cluster to the outside object. So the distance from "MI/TO" to
RM is chosen to be 564, which is the distance from MI to RM, and so on.

After merging MI with TO we obtain the following matrix:

BA FI MI/TO NA RM

BA 0 662 877 255 412

FI 662 0 295 468 268

MI/TO 877 295 0 754 564

NA 255 468 754 0 219

RM 412 268 564 219 0



min d(i,j) = d(NA,RM) = 219 => merge NA and RM into a new cluster called NA/RM
L(NA/RM) = 219
m=2

BA FI MI/TO NA/RM
BA 0 662 877 255
FI 662 0 295 268
MI/TO 877 295 0 564
NA/RM 255 268 564 0

min d(i,j) = d(BA,NA/RM) = 255 => merge BA and NA/RM into a new cluster called BA/NA/RM
L(BA/NA/RM) = 255
m=3

BA/NA/RM FI MI/TO
BA/NA/RM 0 268 564
FI 268 0 295
MI/TO 564 295 0

min d(i,j) = d(BA/NA/RM,FI) = 268 => merge BA/NA/RM and FI into a new cluster called BA/FI/NA/RM
L(BA/FI/NA/RM) = 268
m=4

BA/FI/NA/RM MI/TO
BA/FI/NA/RM 0 295
MI/TO 295 0

Finally, we merge the last two clusters at level 295.



The process is summarized by the following hierarchical tree
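The same sequence of single-linkage merges (at levels 138, 219, 255, 268 and 295) can be reproduced with SciPy, assuming NumPy and SciPy (and Matplotlib for the plot) are available; a minimal sketch using the distance matrix above:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

labels = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = np.array([
    [  0, 662, 877, 255, 412, 996],
    [662,   0, 295, 468, 268, 400],
    [877, 295,   0, 754, 564, 138],
    [255, 468, 754,   0, 219, 869],
    [412, 268, 564, 219,   0, 669],
    [996, 400, 138, 869, 669,   0],
], dtype=float)

# linkage() expects a condensed distance matrix; method='single' gives single linkage
Z = linkage(squareform(D), method="single")
print(Z[:, 2])                  # merge levels: [138. 219. 255. 268. 295.]
dendrogram(Z, labels=labels)    # draws the hierarchical tree (needs Matplotlib)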

ANOTHER HIERARCHICAL EXAMPLE

What is hierarchical clustering (agglomerative) ?


Clustering is a data mining technique to group a set of objects in a way such that objects in the
same cluster are more similar to each other than to those in other clusters.
In hierarchical clustering, we assign each object (data point) to a separate cluster. Then compute
the distance (similarity) between each of the clusters and join the two most similar clusters. Let’s
understand further by solving an example.

Objective: For the one-dimensional data set {7, 10, 20, 28, 35}, perform hierarchical clustering and plot the dendrogram to visualize it.
Solution: First, let's visualize the data.

Observing the plot above, we can intuitively conclude that:

The first two points (7 and 10) are close to each other and should be in the same cluster.
Also, the last two points (28 and 35) are close to each other and should be in the same cluster.
It is not obvious which cluster the center point (20) should join.

Let's solve the problem by hand using both types of agglomerative hierarchical clustering:
1. Single Linkage: In single-link hierarchical clustering, at each step we merge the two clusters whose two closest members have the smallest distance.

Using single linkage two clusters are formed :


Cluster 1 : (7,10)
Cluster 2 : (20,28,35)

2. Complete Linkage: In complete-link hierarchical clustering, at each step we merge the two clusters whose maximum pairwise distance is smallest.

Using complete linkage two clusters are formed :


Cluster 1 : (7,10,20)
Cluster 2 : (28,35)
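Both results can be reproduced with SciPy, assuming it is available; a minimal sketch that cuts the tree into two clusters:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

points = np.array([[7.0], [10.0], [20.0], [28.0], [35.0]])   # the one-dimensional data set

for method in ("single", "complete"):
    Z = linkage(pdist(points), method=method)
    groups = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 clusters
    print(method, groups)
# Single linkage groups (7, 10) vs (20, 28, 35); complete linkage groups (7, 10, 20) vs (28, 35).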
Conclusion: Hierarchical clustering is mostly used when the application requires a hierarchy, e.g. the creation of a taxonomy. However, it is expensive in terms of its computational and storage requirements.

Video:

1. Hierarchical Clustering: URL: https://www.youtube.com/watch?v=tlIv3IT_hHk&feature=emb_logo
2. Hierarchical Clustering: URL: https://www.youtube.com/watch?v=9U4h6pZw6f8

PRINCIPAL COMPONENT ANALYSIS



EXTERNAL RESOURCE:
1. https://youtu.be/FgakZw6K1QQ
2. https://builtin.com/data-science/step-step-explanation-principal-component-analysis
3. https://towardsdatascience.com/the-mathematics-behind-principal-component-analysis-fff2d7f4b643
