GR20A3123
UNIT – V
UNSUPERVISED LEARNING
Unsupervised Learning : Reinforcement Learning
Step-05:
Keep repeating the procedure from Step-03 to Step-05 until any of the following stopping criteria is met-
Centers of the newly formed clusters do not change
Data points remain in the same cluster
The maximum number of iterations is reached
PRACTICE PROBLEMS BASED ON K-MEANS CLUSTERING ALGORITHM-
Problem-01:
Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
Initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2).
The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as-
Ρ(a, b) = |x2 – x1| + |y2 – y1|
Use K-Means Algorithm to find the three cluster centers after the second iteration.
Solution-
We follow the above discussed K-Means Clustering Algorithm-
Iteration-01:
We calculate the distance of each point from the center of each of the three clusters.
The distance is calculated by using the given distance function.
Calculating Distance Between A1(2, 10) and C1(2, 10)-
Ρ(A1, C1)
= |x2 – x1| + |y2 – y1|
= |2 – 2| + |10 – 10|
= 0
Calculating Distance Between A1(2, 10) and C2(5, 8)-
Ρ(A1, C2)
= |x2 – x1| + |y2 – y1|
= |5 – 2| + |8 – 10|
= 3 + 2
= 5
Calculating Distance Between A1(2, 10) and C3(1, 2)-
Ρ(A1, C3)
= |x2 – x1| + |y2 – y1|
= |1 – 2| + |2 – 10|
= 1 + 8
= 9
In a similar manner, we calculate the distance of the other points from the center of each of the three clusters. We then draw a table showing all the results and use it to decide which point belongs to which cluster: a given point belongs to the cluster whose center is nearest to it.

Given Points | Distance from center (2, 10) of Cluster-01 | Distance from center (5, 8) of Cluster-02 | Distance from center (1, 2) of Cluster-03 | Point belongs to Cluster
A1(2, 10) | 0 | 5 | 9 | C1
A2(2, 5) | 5 | 6 | 4 | C3
A3(8, 4) | 12 | 7 | 9 | C2
A4(5, 8) | 5 | 0 | 10 | C2
A5(7, 5) | 10 | 5 | 9 | C2
A6(6, 4) | 10 | 5 | 7 | C2
A7(1, 2) | 9 | 10 | 0 | C3
A8(4, 9) | 3 | 2 | 10 | C2
From here, the new clusters are-
Cluster-01:
First cluster contains points-
A1(2, 10)
Cluster-02:
Second cluster contains points-
A3(8, 4) A4(5, 8) A5(7, 5) A6(6, 4) A8(4, 9)
Cluster-03:
Third cluster contains points-
A2(2, 5) A7(1, 2)
Now, we re-compute the new cluster centers. The new cluster center is computed by taking the mean of all the points contained in that cluster.
For Cluster-01:
Center of Cluster-01
= (2, 10)
(Cluster-01 contains only A1, so its center remains unchanged.)
For Cluster-02:
Center of Cluster-02
= ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5)
= (6, 6)
For Cluster-03:
Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
This completes Iteration-01. After the first iteration, the centers of the three clusters are C1(2, 10), C2(6, 6) and C3(1.5, 3.5).
Iteration-02:
We calculate the distance of each point from the center of each of the three clusters.
The distance is calculated by using the given distance function.
The following illustration shows the calculation of distance between point A1(2, 10) and the center of each of the three clusters-
Calculating Distance Between A1(2, 10) and C1(2, 10)-
Ρ(A1, C1)
= |x2 – x1| + |y2 – y1|
= |2 – 2| + |10 – 10| = 0
Calculating Distance Between A1(2, 10) and C2(6, 6)-
Ρ(A1, C2)
= |x2 – x1| + |y2 – y1|
= |6 – 2| + |6 – 10|
= 4 + 4 = 8
Calculating Distance Between A1(2, 10) and C3(1.5, 3.5)-
Ρ(A1, C3)
= |x2 – x1| + |y2 – y1|
= |1.5 – 2| + |3.5 – 10|
= 0.5 + 6.5 = 7
In a similar manner, we calculate the distance of the other points from the center of each of the three clusters.
Next,
We draw a table showing all the results.
Using the table, we decide which point belongs to which cluster.
The given point belongs to that cluster whose center is nearest to it.
Given Points | Distance from center (2, 10) of Cluster-01 | Distance from center (6, 6) of Cluster-02 | Distance from center (1.5, 3.5) of Cluster-03 | Point belongs to Cluster
A1(2, 10) | 0 | 8 | 7 | C1
A2(2, 5) | 5 | 5 | 2 | C3
A3(8, 4) | 12 | 4 | 7 | C2
A4(5, 8) | 5 | 3 | 8 | C2
A5(7, 5) | 10 | 2 | 7 | C2
A6(6, 4) | 10 | 2 | 5 | C2
A7(1, 2) | 9 | 9 | 2 | C3
A8(4, 9) | 3 | 5 | 8 | C1
From here, the new clusters are-
Cluster-01:
First cluster contains points-
A1(2, 10) A8(4, 9)
Cluster-02:
Second cluster contains points-
A3(8, 4) A4(5, 8) A5(7, 5) A6(6, 4)
Cluster-03:
Third cluster contains points-
A2(2, 5) A7(1, 2)
Now, we re-compute the new cluster centers.
For Cluster-01:
Center of Cluster-01
= ((2 + 4)/2, (10 + 9)/2) = (3, 9.5)
For Cluster-02:
Center of Cluster-02
= ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4) = (6.5, 5.25)
For Cluster-03:
Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5)
This completes Iteration-02.
After the second iteration, the centers of the three clusters are-
C1(3, 9.5)
C2(6.5, 5.25)
C3(1.5, 3.5)
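To check the arithmetic above, here is a minimal Python sketch of K-Means with the given Manhattan distance function; the variable names are illustrative and no particular library is assumed.

# Minimal K-Means sketch with Manhattan distance, reproducing Problem-01
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
centers = [(2, 10), (5, 8), (1, 2)]  # initial centers A1, A4, A7

def manhattan(a, b):
    # P(a, b) = |x2 - x1| + |y2 - y1|
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

for iteration in range(2):
    # Assignment step: each point joins the cluster with the nearest center
    clusters = [[] for _ in centers]
    for p in points:
        d = [manhattan(p, c) for c in centers]
        clusters[d.index(min(d))].append(p)
    # Update step: each center becomes the mean of its cluster's points
    centers = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
               for c in clusters]

print(centers)  # after two iterations: [(3.0, 9.5), (6.5, 5.25), (1.5, 3.5)]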
Problem-02:
Use the K-Means Algorithm with k = 2 to cluster the points A(2, 2), B(3, 2), C(1, 1), D(3, 1) and E(1.5, 0.5). Take A(2, 2) and C(1, 1) as the initial cluster centers, and use the Euclidean distance as the distance function.
Solution-
We calculate the distance of each point from the center of each of the two clusters.
The distance is calculated by using the Euclidean distance formula.
The following illustration shows the calculation of distance between point A(2, 2) and the center of each of the two clusters-
Ρ(A, C1)
= sqrt [ (x2 – x1)² + (y2 – y1)² ]
= sqrt [ (2 – 2)² + (2 – 2)² ] = sqrt [ 0 + 0 ] = 0
Ρ(A, C2)
= sqrt [ (x2 – x1)² + (y2 – y1)² ]
= sqrt [ (1 – 2)² + (1 – 2)² ] = sqrt [ 1 + 1 ] = sqrt [ 2 ] = 1.41
In a similar manner, we calculate the distance of the other points from the center of each of the two clusters.
• We draw a table showing all the results.
• Using the table, we decide which point belongs to which cluster.
• The given point belongs to that cluster whose center is nearest to it.
Given Points | Distance from center (2, 2) of Cluster-01 | Distance from center (1, 1) of Cluster-02 | Point belongs to Cluster
A(2, 2) | 0 | 1.41 | C1
B(3, 2) | 1 | 2.24 | C1
C(1, 1) | 1.41 | 0 | C2
D(3, 1) | 1.41 | 2 | C1
E(1.5, 0.5) | 1.58 | 0.71 | C2
From here, the new clusters are-
Cluster-01:
First cluster contains points-
A(2, 2) B(3, 2) D(3, 1)
Cluster-02:
Second cluster contains points-
C(1, 1) E(1.5, 0.5)
Now,
We re-compute the new cluster centers.
The new cluster center is computed by taking the mean of all the points contained in that cluster.
For Cluster-01:
Center of Cluster-01
= ((2 + 3 + 3)/3, (2 + 2 + 1)/3) = (2.67, 1.67)
For Cluster-02:
Center of Cluster-02
= ((1 + 1.5)/2, (1 + 0.5)/2) = (1.25, 0.75)
After the first iteration, the centers of the two clusters are C1(2.67, 1.67) and C2(1.25, 0.75).
K-MODES CLUSTERING ALGORITHM
• K-Means cannot handle categorical data points, because for categorical values we cannot calculate distances (or means) in the usual way. So we go for the K-Modes algorithm.
• K-Modes measures the dissimilarities (total mismatches) between data points: the lesser the dissimilarities, the more similar our data points are. It uses modes instead of means.
HOW K-MODES ALGORITHM WORKS?
• Step 1: Pick K observations at random and use them as the initial leaders/cluster modes.
• Step 2: Calculate the dissimilarities and assign each observation to its closest cluster.
• Step 3: Define new modes for the clusters.
• Step 4: Repeat Steps 2 and 3 until there are no re-assignments of observations.
EXAMPLE
• The dissimilarity between a leader and an observation is the number of attribute positions at which their values differ. For instance, comparing leader/cluster P1 to the observation P2 gives 3 (1 + 1 + 1) dissimilarities; other comparisons in the example give 1 and 2 dissimilarities.
• Note: If all the clusters have the same dissimilarity with an observation, assign the observation to any one of the clusters.
• Likewise, calculate all the dissimilarities and put them in a matrix. Assign each observation to its closest cluster.
• After Step 2, the observations P1, P2, P5 are assigned to Cluster 1; P3, P7 are assigned to Cluster 2; and P4, P6, P8 are assigned to Cluster 3.
• After computing new modes and repeating the assignment, the observations P1, P2, P5 are again assigned to Cluster 1; P3, P7 to Cluster 2; and P4, P6, P8 to Cluster 3.
• We stop here as we see there is no change in the assignment of observations.
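The dissimilarity computation and the assignment step described above can be sketched in a few lines of Python. The observations and leaders below are hypothetical placeholders, since the actual attribute values from the example are not reproduced in these notes.

# Matching dissimilarity: the number of attribute positions that differ
def dissimilarity(a, b):
    return sum(x != y for x, y in zip(a, b))

def assign(observations, leaders):
    # Each observation joins the leader with the fewest mismatches
    assignment = {}
    for name, obs in observations.items():
        d = [dissimilarity(obs, leader) for leader in leaders]
        assignment[name] = d.index(min(d)) + 1  # 1-based cluster index
    return assignment

# Hypothetical categorical data, for illustration only
observations = {"P2": ("A", "B", "D"), "P5": ("A", "B", "C"),
                "P7": ("X", "Y", "Z"), "P8": ("M", "N", "O")}
leaders = [("A", "B", "C"), ("X", "Y", "Z"), ("M", "N", "O")]
print(assign(observations, leaders))  # {'P2': 1, 'P5': 1, 'P7': 2, 'P8': 3}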
K-PROTOTYPES CLUSTERING ALGORITHM
WHAT IS K-PROTOTYPES CLUSTERING?
• K-Prototypes was created to handle clustering with mixed data types (numerical and categorical variables).
• K-Prototypes is a clustering method based on partitioning.
• Its algorithm is an improvement over the K-Means and K-Modes clustering algorithms, designed to handle mixed data types.
• In k-prototypes clustering, we select k prototypes randomly at the start.
• After that, we calculate the distance between each data point and the prototypes. Accordingly, each data point is assigned to the cluster associated with its nearest prototype.
DISTANCE MEASURES IN K-PROTOTYPES CLUSTERING :
• As the k-prototypes clustering algorithm deals with data having numerical as well as
categorical attributes, it uses different measures for both data types.
• For numerical data, k-prototypes clustering uses the squared Euclidean distance as the distance measure.
• For instance, if we are given two data points (1, 2, 3) and (4, 3, 3), the distance between these two data points will be calculated as (1 – 4)² + (2 – 3)² + (3 – 3)² = 9 + 1 + 0 = 10.
• For categorical attributes, the k-prototypes clustering algorithm follows matching dissimilarity.
If you have two records (A, B, C, D) and (A, D, C, C) with categorical attributes, the matching
dissimilarity is the number of different values at each position in the records.
• In the given records, the values differ at two positions only. Hence, the matching dissimilarity between the records will be 2.
• For records having mixed attributes, we calculate the distance between
categorical and numerical attributes separately.
• After that, we use the sum of the dissimilarity scores as the distance between
two records. For instance, consider that we have two records [‘A’, ‘B’, ‘F’, 155,
53] and [‘A’, ‘A’, ‘M’, 174, 70].
• To find the distance between these two records, we will first find the
dissimilarity score between [‘A’, ‘B’, ‘F’] and [‘A’, ‘A’, ‘M’]. The score is 2 as
two attributes out of three have different values.
• Next, we will calculate the squared Euclidean distance between [155, 53] and [174, 70]. Here, (155 – 174)² + (53 – 70)² = 361 + 289 = 650.
• Now, we can directly calculate the total dissimilarity score as the sum of the matching dissimilarity score between the categorical attributes and the squared Euclidean distance between the numerical attributes. Here, the sum will be equal to 650 + 2 = 652.
• Observe that the matching dissimilarity score of the categorical attributes is almost negligible compared to the squared Euclidean distance between the numerical attributes. Hence, the categorical attributes will have little or no effect on clustering.
• To solve this problem, we can scale the values in the numeric attributes within a range of, say, 0 to 5.
• Alternatively, we can take a weighted sum of the matching dissimilarity score and the squared Euclidean distance.
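A minimal Python sketch of this mixed distance is shown below; it assumes the categorical attributes come first in each record, and uses a weight gamma for the categorical part (gamma = 1 reproduces the plain sum used above).

def mixed_distance(a, b, n_cat, gamma=1.0):
    # The first n_cat positions are categorical, the rest are numerical
    matching = sum(x != y for x, y in zip(a[:n_cat], b[:n_cat]))
    squared_euclidean = sum((x - y) ** 2 for x, y in zip(a[n_cat:], b[n_cat:]))
    # Weighted sum: gamma balances the categorical contribution
    return squared_euclidean + gamma * matching

r1 = ["A", "B", "F", 155, 53]
r2 = ["A", "A", "M", 174, 70]
print(mixed_distance(r1, r2, n_cat=3))  # 650 + 2 = 652.0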
CHOICE OF NEW PROTOTYPES IN K-PROTOTYPES CLUSTERING :
• Once a cluster is formed, we need to calculate a new prototype for the cluster
using the data points in the current cluster.
• To calculate the new prototype for any given cluster, we take the mode of the categorical attributes of the data points in the cluster.
• For numerical attributes, we will use the mean of the values to calculate new
prototype for the cluster.
• For example, suppose that we have the following data points in a cluster.
EXAMPLE :
[The cluster table with EQ Rating, IQ Rating, Gender, Height, and Weight attributes is not reproduced here.]
• In the above cluster, we will take the mode of values in the EQ Rating, IQ
Rating, and Gender attributes.
• For the attributes Height and Weight, we will take the mean of the values to
calculate the new prototype.
• Hence, the prototype for the above cluster will be [B, A, F, 4.71978, 3.999999].
• Here, B is the mode of values in the EQ Rating column. A is the mode of the
values in the IQ Rating column, and F is the mode of the values in
the Gender column.
• Similarly, 4.71978 and 3.999999 are the means of the Height and Weight attributes respectively.
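The mode/mean computation for a new prototype can be sketched as follows; the cluster values below are made-up placeholders rather than the table from the example.

from statistics import mean, mode

def new_prototype(cluster, n_cat):
    # Mode for the first n_cat (categorical) columns, mean for the rest
    columns = list(zip(*cluster))
    return [mode(col) if i < n_cat else round(mean(col), 6)
            for i, col in enumerate(columns)]

# Hypothetical cluster: EQ Rating, IQ Rating, Gender, Height, Weight
cluster = [["B", "A", "F", 4.8, 4.1],
           ["B", "C", "F", 4.6, 3.9],
           ["A", "A", "M", 4.7, 4.0]]
print(new_prototype(cluster, n_cat=3))  # ['B', 'A', 'F', 4.7, 4.0]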
K–PROTOTYPES CLUSTERING ALGORITHM :
1. First, we select K data points from the input dataset as initial prototypes.
2. We then find the distance of each data point from the current prototypes. The
distances are calculated as discussed in the previous sections.
3. After finding the distance of each data point from the prototypes, we assign
data points to clusters. Here, each data point is assigned to the cluster with the
prototype nearest to the data point.
4. After assigning data points to the clusters, we calculate new prototypes for
each cluster. To calculate the prototypes, we take the mean of numeric attributes
and the mode of categorical attributes as discussed previously.
5. If the new prototypes are the same as the previous prototypes, we say that
the algorithm has converged. Hence, the current clusters are finalized.
Otherwise, we go to Step 2.
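Putting the pieces together, a minimal sketch of the whole loop is shown below. It reuses mixed_distance and new_prototype from the earlier sketches, and assumes the categorical attributes come first in each record and that no cluster ever becomes empty.

def k_prototypes(data, k, n_cat, max_iter=100):
    prototypes = [list(r) for r in data[:k]]  # Step 1: initial prototypes
    clusters = []
    for _ in range(max_iter):
        # Steps 2-3: assign every record to its nearest prototype
        clusters = [[] for _ in range(k)]
        for record in data:
            d = [mixed_distance(record, p, n_cat) for p in prototypes]
            clusters[d.index(min(d))].append(record)
        # Step 4: recompute prototypes (mode of categorical, mean of numeric)
        updated = [new_prototype(c, n_cat) for c in clusters]
        # Step 5: converged when the prototypes no longer change
        if updated == prototypes:
            break
        prototypes = updated
    return prototypes, clusters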
NUMERICAL EXAMPLE :
• Let us now work through a numerical example of k-prototypes clustering.
[The example dataset is not reproduced here.]
• In the dataset, the Height and Weight attributes have been normalized to the range 0 to 5, so the distances between the numeric attributes will not be very large compared to the matching dissimilarities of the categorical attributes.
NUMERICAL EXAMPLE ITERATION 1 :
• We assign each data point to the cluster with the nearest prototype, and then compute a new prototype for each cluster by taking the mode of the categorical attributes and the mean of the numerical attributes.
[The distance and assignment tables for iteration 1 are not reproduced here.]
• Hence, the prototype for cluster 3 is [A, B, F, 4.580062, 3.773109].
• After iteration 1, we have the following prototypes.
• prototype1= [B, A, F, 4.656593, 3.970588]
• prototype2= [A, C, M, 4.615384, 4.411764]
• prototype3=[A, B, F, 4.580062, 3.773109]
• You can observe that the current prototypes are not the same as the initial
prototypes. Hence, we will calculate the distance of the data points in the
dataset to these prototypes and reassign the points to the clusters.
NUMERICAL EXAMPLE ITERATION 2 :
[The distance and assignment tables for iteration 2 are not reproduced here.]
Advantages of EM algorithm –
• It is always guaranteed that the likelihood will increase with each iteration.
• The E-step and M-step are often easy to implement for many problems.
• Solutions to the M-step often exist in closed form.
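To illustrate the E-step/M-step structure and the closed-form M-step updates, here is a minimal sketch of EM for a one-dimensional mixture of two Gaussians on synthetic data (not an example from these notes).

import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

mu, sigma, w = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: responsibility of each component for each data point
    r = w * gaussian(data[:, None], mu, sigma)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: closed-form updates for the weights, means, and variances
    n = r.sum(axis=0)
    w = n / len(data)
    mu = (r * data[:, None]).sum(axis=0) / n
    sigma = np.sqrt((r * (data[:, None] - mu) ** 2).sum(axis=0) / n)

print(mu)  # the estimated means approach the true values 0 and 5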
Reinforcement Learning
What is Reinforcement Learning?
Unlike supervised and unsupervised learning, reinforcement learning is a feedback-based approach in which an agent learns by performing actions and observing their outcomes. Based on whether an action is good or bad, the agent gets positive or negative feedback. For each positive feedback the agent is rewarded, whereas for each negative feedback it is penalized.
Key points in Reinforcement Learning
Reinforcement learning does not require any labeled data for the learning process. The agent learns through the feedback on the actions it performs. Moreover, in reinforcement learning, agents also learn from past experiences.
Reinforcement learning methods are used to solve tasks where decision-making is sequential and the goal is long-term, e.g., robotics, online chess, etc.
The agent aims to get maximum positive feedback so that it can improve its performance.
Reinforcement learning involves a cycle of taking an action, observing the (changed or unchanged) state, and getting feedback. Based on these interactions, the agent learns and explores the environment.
Exploitation in Reinforcement Learning:
Exploitation means using the agent's current knowledge to choose the action with the highest known reward, in contrast to exploration, which tries new actions to gather more information about the environment.
HIDDEN MARKOV MODELS :
A Hidden Markov Model (HMM) is specified by the following parameters-
o An initial state probability distribution, π = {π1, π2, ..., πN}, which specifies the probability of starting in each hidden state.
o A transition probability matrix, A = [aij], which defines the probability of moving from one hidden state to another.
o An emission probability matrix, B = [bjk], which defines the probability of emitting an observation from a given hidden state.
o The basic idea behind an HMM is that the hidden states generate the observations, and the observed data is used to estimate the hidden state sequence. This is often done using the forward-backward algorithm.
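To make the parameters π, A, and B concrete, here is a minimal Python sketch of the forward pass for a toy HMM with two hidden states and two observation symbols; all the probability values are made up for illustration.

import numpy as np

pi = np.array([0.6, 0.4])       # initial state distribution
A = np.array([[0.7, 0.3],       # transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],       # emission probabilities b_jk
              [0.2, 0.8]])

def forward(observations):
    # alpha[j] = P(o_1 ... o_t, state_t = j), updated step by step
    alpha = pi * B[:, observations[0]]
    for o in observations[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()  # likelihood of the whole observation sequence

print(forward([0, 1, 0]))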
APPLICATIONS :
• Speech Recognition :
• One of the most well-known applications of HMMs is speech recognition. In this field, HMMs are used to model the different sounds and phones that make up speech. The hidden states, in this case,
correspond to the different sounds or phones, and the observations are the acoustic signals that are
generated by the speech. The goal is to estimate the hidden state sequence, which corresponds to
the transcription of the speech, based on the observed acoustic signals. HMMs are particularly well-
suited for speech recognition because they can effectively capture the underlying structure of the
speech, even when the data is noisy or incomplete. In speech recognition systems, the HMMs are
usually trained on large datasets of speech signals, and the estimated parameters of the HMMs are
used to transcribe speech in real time.
APPLICATIONS :
• Natural Language Processing :
Another important application of HMMs is natural language processing. In this field, HMMs
are used for tasks such as part-of-speech tagging, named entity recognition, and text
classification. In these applications, the hidden states are typically associated with the underlying
grammar or structure of the text, while the observations are the words in the text. The goal is to
estimate the hidden state sequence, which corresponds to the structure or meaning of the text, based
on the observed words. HMMs are useful in natural language processing because they can effectively
capture the underlying structure of the text, even when the data is noisy or ambiguous.
APPLICATIONS:
• Bioinformatics :
HMMs are also widely used in bioinformatics, where they are used to model sequences of
DNA, RNA, and proteins. The hidden states, in this case, correspond to the different types of
residues, while the observations are the sequences of residues. The goal is to estimate the hidden state
sequence, which corresponds to the underlying structure of the molecule, based on the observed
sequences of residues. HMMs are useful in bioinformatics because they can effectively capture the
underlying structure of the molecule, even when the data is noisy or incomplete.
APPLICATIONS :
• Finance :
Finally, HMMs have also been used in finance, where they are used to model stock prices,
interest rates, and currency exchange rates. In these applications, the hidden states correspond to
different economic states, such as bull and bear markets, while the observations are the stock prices,
interest rates, or exchange rates. The goal is to estimate the hidden state sequence, which
corresponds to the underlying economic state, based on the observed prices, rates, or exchange
rates.
LIMITATIONS OF HIDDEN MARKOV MODELS :
• Now, we will explore some of the key limitations of HMMs and discuss how they can impact the accuracy and performance
of HMM-based systems.
• Limited Modeling Capabilities:
One of the key limitations of HMMs is that they are relatively limited in their modelling capabilities. HMMs are designed to
model sequences of data, where the underlying structure of the data is represented by a set of hidden states. However, the
structure of the data can be quite complex, and the simple structure of HMMs may not be enough to accurately capture all the
details. For example, in speech recognition, the complex relationship between the speech sounds and the corresponding acoustic
signals may not be fully captured by the simple structure of an HMM.
LIMITATIONS OF HIDDEN MARKOV MODELS :
Overfitting :
Another limitation of HMMs is that they can be prone to overfitting, especially when the number of
hidden states is large, or the amount of training data is limited. Overfitting occurs when the model fits
the training data too well and is unable to generalize to new data. This can lead to poor performance
when the model is applied to real-world data and can result in high error rates. To avoid overfitting, it
is important to carefully choose the number of hidden states and to use appropriate regularization
techniques.
LIMITATIONS OF HIDDEN MARKOV MODELS :
• Lack of Robustness:
HMMs are also limited in their robustness to noise and variability in the data. For
example, in speech recognition, the acoustic signals generated by speech can be subjected to a
variety of distortions and noise, which can make it difficult for the HMM to accurately estimate the
underlying structure of the data. In some cases, these distortions and noise can cause the HMM to
make incorrect decisions, which can result in poor performance. To address these limitations, it is
often necessary to use additional processing and filtering techniques, such as noise reduction and
normalization, to pre-process the data before it is fed into the HMM.
LIMITATIONS OF HIDDEN MARKOV MODELS :
• Computational Complexity:
Finally, HMMs can also be limited by their computational complexity, especially when dealing with large amounts of data or when using complex models. The computational complexity of HMMs is due to the need to estimate the parameters of the model and to compute the likelihood of the data given the model.
Q Learning :
Let’s say that a robot has to cross a maze and reach the end point.
There are mines, and the robot can only move one tile at a time. If
the robot steps onto a mine, the robot is dead. The robot has to
reach the end point in the shortest time possible.
The scoring/reward system is as below:
1. The robot loses 1 point at each step. This is done so that the robot takes
the shortest path and reaches the goal as fast as possible.
2. If the robot steps on a mine, the point loss is 100 and the game ends.
3. If the robot gets power ⚡️, it gains 1 point.
4. If the robot reaches the end goal, the robot gets 100 points.
Now, the obvious question is: How do we train a robot to reach the end
goal with the shortest path without stepping on a mine?
Introducing the Q-Table
Q-Table is just a fancy name for a simple lookup
table where we calculate the maximum expected
future rewards for action at each state. Basically,
this table will guide us to the best action at each
state.
There are four possible actions at each non-edge tile: when the robot is at a state, it can move up, down, right, or left.
In the Q-Table, the columns are the actions, and the rows are the states.
Each Q-table score will be the maximum expected future reward that
the robot will get if it takes that action at that state. This is an iterative
process, as we need to improve the Q-Table at each iteration.
But the questions are:
How do we calculate the values of the Q-table?
Are the values available or predefined?
To learn each value of the Q-table, we use the Q-Learning algorithm.
Mathematics: the Q-Learning algorithm
Q-function
The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a).
Using the above function, we get the values of Q for the cells in
the table.
When we start, all the values in the Q-table are zeros.
There is an iterative process of updating the values. As we start to explore the
environment, the Q-function gives us better and better approximations by
continuously updating the Q-values in the table.
Now, let’s understand how the updating takes place.
We will choose an action (a) in the state (s) based on the Q-Table.
But, as mentioned earlier, when the episode initially starts, every Q-value is 0.
Now we have taken an action and observed an outcome and reward. We need
to update the function Q(s,a).
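Concretely, the standard Q-learning update rule derived from the Bellman equation is:

New Q(s, a) = Q(s, a) + α [ R(s, a) + γ · max Q(s′, a′) − Q(s, a) ]

Here α is the learning rate, γ is the discount factor, R(s, a) is the reward received for taking action a in state s, and max Q(s′, a′) is the maximum expected future reward over all actions a′ in the new state s′.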
In the case of the robot game, to reiterate, the scoring/reward structure is:
power = +1
mine = -100
end = +100
We will repeat this again and again until the learning is stopped. In this way, the Q-Table will be updated.
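A minimal Python sketch of this update loop for a grid game like the robot's maze is shown below; the state numbering, the epsilon-greedy action choice, and the constants are simplified placeholders rather than the exact game described above.

import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration
ACTIONS = ["up", "down", "left", "right"]

# Q-Table: one entry per (state, action) pair, all zeros at the start
Q = {(s, a): 0.0 for s in range(16) for a in ACTIONS}

def choose_action(state):
    # Epsilon-greedy: usually exploit the best known action, sometimes explore
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # Q-learning update rule based on the Bellman equation
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])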