ML Unit5 Notes

The document discusses unsupervised learning techniques, focusing on clustering methods such as K-means, K-modes, and K-prototypes, as well as reinforcement learning concepts. Unsupervised learning identifies patterns in unlabeled data, while reinforcement learning involves agents learning from actions and feedback. Key algorithms and their applications are explained, emphasizing the importance of choosing optimal parameters and understanding the strengths and weaknesses of each method.


UNIT – V

Unsupervised Learning: Clustering - K-Means, K-Modes, K-Prototypes, Gaussian Mixture Models, Expectation-Maximization.
Reinforcement Learning: Exploration and exploitation trade-offs, non-associative learning, Markov decision processes, Q-learning.

Unsupervised Learning:

Unsupervised learning is a machine learning technique in which models are not supervised using a labeled training dataset. Instead, the model itself finds hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain while learning new things. It can be defined as:

Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.

Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.

Example: Suppose an unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never trained on the given dataset, which means it has no prior idea about the features of the dataset. Its task is to identify the image features on its own. The algorithm performs this task by clustering the image dataset into groups according to the similarities between images.
Below are some main reasons which describe the importance of Unsupervised Learning:

o Unsupervised learning is helpful for finding useful insights from data.
o Unsupervised learning is similar to how a human learns to think from their own experiences, which makes it closer to true AI.
o Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
o In the real world, we do not always have input data with corresponding output, so to solve such cases we need unsupervised learning.

The working of unsupervised learning can be understood as follows:

Here, we take unlabeled input data, which means it is not categorized and no corresponding outputs are given. This unlabeled input data is fed to the machine learning model in order to train it. First, the model interprets the raw data to find the hidden patterns in it, and then applies a suitable algorithm such as k-means clustering or association rule mining.

Once a suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects.

The unsupervised learning algorithm can be further categorized into two types of
problems:
o Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have little or no similarity with the objects of another group. Cluster analysis finds the commonalities between data objects and categorizes them according to the presence and absence of those commonalities.
o Association: An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules make marketing strategies more effective: for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical application of association rules is Market Basket Analysis.
K-Means Clustering

K-Means clustering is an unsupervised iterative clustering technique.
It partitions the given data set into k predefined distinct clusters.
A cluster is defined as a collection of data points exhibiting certain similarities.
It partitions the data set such that-
Each data point belongs to the cluster with the nearest mean.
Data points belonging to one cluster have a high degree of similarity.
Data points belonging to different clusters have a high degree of dissimilarity.

K-Means Clustering Algorithm- (Refer class notes)

K-Means Clustering Algorithm involves the following steps-

Step-01:
1. Choose the number of clusters K.
Step-02:
1. Randomly select any K data points as cluster centers.
2. Select cluster centers in such a way that they are as far as possible from each other.
Step-03:
1. Calculate the distance between each data point and each cluster center.
2. The distance may be calculated either by using a given distance function or by using the Euclidean distance formula.
Step-04:
1. Assign each data point to some cluster.
2. A data point is assigned to the cluster whose center is nearest to it.
Step-05:
1. Re-compute the center of each newly formed cluster.
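These steps translate almost directly into code. Below is a minimal NumPy sketch of the loop (an illustrative implementation, not from the notes; the function name and signature are our own choices):

```python
import numpy as np

def kmeans(X, centers, n_iter=100):
    """Minimal K-means (Lloyd's loop). X: (n, d) data, centers: (k, d) initial centers."""
    centers = centers.astype(float).copy()
    for _ in range(n_iter):
        # Step-03/04: distance of every point to every center, assign to the nearest.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step-05: re-compute each center as the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(len(centers))])
        if np.allclose(new_centers, centers):  # stop when centers no longer change
            break
        centers = new_centers
    return centers, labels
```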

Example: Cluster the following eight points (with (x, y) representing locations) into three clusters: A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9).
Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).

Euclidean distance between two points P(x1, y1) and Q(x2, y2): d(P, Q) = √((x2 − x1)² + (y2 − y1)²)

Computing these distances from each of the three initial centers, the new clusters are-


Cluster-01:
First cluster contains points-
A1(2, 10)
Cluster-02:
Second cluster contains points-
A3(8, 4)
A4(5, 8)
A5(7, 5)
A6(6, 4)
A8(4, 9)

Cluster-03:
Third cluster contains points-
A2(2, 5)
A7(1, 2)

Now,
We re-compute the new cluster centers.
The new cluster center is computed by taking the mean of all the points contained in that cluster.

For Cluster-01:
We have only one point A1(2, 10) in Cluster-01.
So, cluster center remains the same.

For Cluster-02:
Center of Cluster-02
= ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5)
= (6, 6)
For Cluster-03:
Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
This completes Iteration-01.
Iteration-02:

We calculate the distance of each point from each of the centers of the three clusters.
The distance is calculated by using the given distance function.
From here, the new clusters are-
Cluster-01:
First cluster contains points-
A1(2, 10)
A8(4, 9)
Cluster-02:
Second cluster contains points-
A3(8, 4)
A4(5, 8)
A5(7, 5)
A6(6, 4)

Cluster-03:
Third cluster contains points-
A2(2, 5)
A7(1, 2)
Now,
We re-compute the new cluster centers.
The new cluster center is computed by taking the mean of all the points contained in that cluster.

For Cluster-01:
Center of Cluster-01
= ((2 + 4)/2, (10 + 9)/2)
= (3, 9.5)
For Cluster-02:
Center of Cluster-02
= ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4)
= (6.5, 5.25)
For Cluster-03:
Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)
This completes Iteration-02.
After the second iteration, the centers of the three clusters are-
C1(3, 9.5)
C2(6.5, 5.25)
C3(1.5, 3.5)

Continue the iterations until the new clusters are the same as the previous clusters.
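As a sanity check, the same example can be run with scikit-learn's KMeans (assuming scikit-learn is available), seeded with the same three initial centers. Continuing the iterations by hand past Iteration-02, the centers settle after two more rounds:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
              [7, 5], [6, 4], [1, 2], [4, 9]])   # A1..A8
init = np.array([[2, 10], [5, 8], [1, 2]])       # initial centers A1, A4, A7

km = KMeans(n_clusters=3, init=init, n_init=1).fit(X)
print(km.cluster_centers_)  # converges to approximately (3.67, 9), (7, 4.33), (1.5, 3.5)
print(km.labels_)           # final cluster index of each point A1..A8
```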
Stopping Criteria for K-Means Clustering

There are essentially three stopping criteria that can be adopted to stop the K-means
algorithm:

1. Centroids of newly formed clusters do not change

2. Points remain in the same cluster

3. Maximum number of iterations is reached

We can stop the algorithm if the centroids of newly formed clusters are not changing. Even after multiple iterations, if we are getting the same centroids for all the clusters, we can say that the algorithm is not learning any new pattern, and it is a sign to stop the training.

Another clear sign that we should stop the training process is if the points remain in the same cluster even after training the algorithm for multiple iterations.

How to choose optimal k value?

We saw in the previous section that the value of k needs to be chosen beforehand. The performance of the K-means clustering algorithm depends on forming optimal, compact clusters. So, let's see how to choose the optimal number of clusters using the technique given below:
Elbow Method

The Elbow method is the go-to method for finding the optimal number of clusters. It uses the concept of the WCSS value. WCSS stands for Within-Cluster Sum of Squares and measures the total variation within a cluster. In simple words, it is the sum of the squared distances between each data point and the centroid of its cluster. To measure the distance between data points and a centroid, we can use any method such as Euclidean distance, Manhattan distance, or cosine distance.

Within-Cluster Sum of Squares:
WCSS = Σ (over clusters j = 1..k) Σ (over points x in cluster Cj) ||x − μj||², where μj is the centroid of cluster Cj.


To find the optimal value of clusters, the elbow method follows the below steps:

1. Perform K-means clustering multiple times using various values of k (e.g., from 1 to 10).
2. For each value of k, calculate the WCSS value.
3. Plot a curve between the calculated WCSS values (sum of squared distances) and the number of clusters k.
4. Look for a sharp bend in the curve (it looks like an arm or elbow); that point is considered the optimal value of k.

The plot of WCSS against the number of clusters k shows a sharp bend that looks like an elbow; the value of k at that bend is chosen. This is why it is called the elbow method.
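A typical way to produce this plot, sketched here with scikit-learn and matplotlib (both assumed to be installed); `inertia_` is scikit-learn's attribute holding the WCSS of a fitted model:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)    # toy data; replace with your own dataset

wcss = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)  # within-cluster sum of squares for this k

plt.plot(ks, wcss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("WCSS")
plt.title("Elbow Method")
plt.show()
```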

Advantages of k-means

1. Simple and easy to implement: The k-means algorithm is easy to understand and
implement, making it a popular choice for clustering tasks.

2. Fast and efficient: K-means is computationally efficient and can handle large datasets
with high dimensionality.

3. Scalability: K-means can handle large datasets with a large number of data points and
can be easily scaled to handle even larger datasets.
4. Flexibility: K-means can be easily adapted to different applications and can be used with
different distance metrics and initialization methods.

Disadvantages of K-Means:

1. Sensitivity to initial centroids: K-means is sensitive to the initial selection of centroids and can converge to a suboptimal solution.

2. Requires specifying the number of clusters: The number of clusters k needs to be specified before running the algorithm, which can be challenging in some applications.

3. Sensitive to outliers: K-means is sensitive to outliers, which can have a significant impact on the resulting clusters.

K-MODE CLUSTERING

K-mode clustering is an unsupervised machine-learning technique used to group a set of data objects into a specified number of clusters based on their categorical attributes. The algorithm is called "K-Mode" because it uses modes (i.e., the most frequent values) instead of means or medians to represent the clusters.

K-means clustering can be applied to categorical data only after converting it into numerical form, and this does not give good results, especially for high-dimensional data. So, some changes are made to handle categorical data:
 Replace the Euclidean distance with a dissimilarity metric.
 Replace the mean with the mode for cluster centers.
 Apply a frequency-based method in each iteration to update the modes.
This is then called K-MODE clustering because of the use of the MODE.

Dissimilarity measurements between two data objects

K-modes is an algorithm for clustering categorical data. It is used to partition a dataset into a specified number of clusters, where each cluster is characterized by a mode, which is the most frequent categorical value in the cluster.

Similarity and dissimilarity measurements are used to determine the distance between the data objects in the dataset. In the case of K-modes, these distances are calculated using a dissimilarity measure called the Hamming distance. The Hamming distance between two data objects is the number of categorical attributes that differ between the two objects.
Let x and y be two categorical data objects defined by m features or attributes. Then the Hamming distance is

d(x, y) = Σ (j = 1..m) δ(xj, yj)

where δ(xj, yj) = 0 if xj = yj, and δ(xj, yj) = 1 if xj ≠ yj.
For example, consider the following dataset with three categorical attributes:

Object | Attribute 1 | Attribute 2 | Attribute 3
1      | A           | B           | C
2      | A           | B           | D
3      | A           | C           | E
4      | B           | C           | E
To calculate the Hamming distance between data objects 1 and 2, we compare their values for each attribute and count the number of differences. In this case, there is one difference (Attribute 3 is C for object 1 and D for object 2), so the Hamming distance between objects 1 and 2 is 1.

To calculate the Hamming distance between objects 1 and 3, we again compare their values for each attribute and count the number of differences. In this case, there are two differences (Attribute 2 is B for object 1 and C for object 3, and Attribute 3 is C for object 1 and E for object 3), so the Hamming distance between objects 1 and 3 is 2.

To calculate the Hamming distance between objects 1 and 4, we again compare their values for each attribute and count the number of differences. In this case, there are three differences (Attribute 1 is A for object 1 and B for object 4, Attribute 2 is B for object 1 and C for object 4, and Attribute 3 is C for object 1 and E for object 4), so the Hamming distance between objects 1 and 4 is 3.

Data objects with a smaller Hamming distance are considered more similar, while objects with a larger Hamming distance are considered more dissimilar.
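This calculation is a few lines of Python; the helper below is a hypothetical illustration that simply counts mismatched attributes for the four objects above:

```python
def hamming(x, y):
    """Hamming distance: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

obj1 = ("A", "B", "C")
obj2 = ("A", "B", "D")
obj3 = ("A", "C", "E")
obj4 = ("B", "C", "E")

print(hamming(obj1, obj2))  # 1
print(hamming(obj1, obj3))  # 2
print(hamming(obj1, obj4))  # 3
```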

The K-modes algorithm consists of the following steps (a code sketch follows the list).

1. Randomly select K data objects as the initial cluster modes.
2. Calculate the dissimilarities and allocate each data object to the nearest cluster.
3. Calculate the new modes for all clusters.
4. Repeat steps 2 and 3 until the clusters become stable.
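A compact sketch of this loop, reusing the Hamming distance from above (an illustrative implementation with our own function names, not a library routine):

```python
import random
from collections import Counter

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def kmodes(data, k, n_iter=100):
    """Minimal K-modes: data is a list of equal-length categorical tuples."""
    modes = random.sample(data, k)                       # step 1: initial modes
    for _ in range(n_iter):
        # step 2: allocate each object to the cluster with the nearest mode
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda i: hamming(x, modes[i]))
            clusters[j].append(x)
        # step 3: new mode = most frequent value per attribute in each cluster
        new_modes = [tuple(Counter(col).most_common(1)[0][0] for col in zip(*c))
                     if c else modes[i] for i, c in enumerate(clusters)]
        if new_modes == modes:                           # step 4: stop when stable
            break
        modes = new_modes
    return modes, clusters
```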

Overall, the goal of K-modes clustering is to minimize the dissimilarities between the
data objects and the centroids (modes) of the clusters, using a measure of
categorical similarity such as the Hamming distance.

K-Prototypes clustering
K-Prototypes clustering is a partitioning clustering algorithm. We use k-prototypes clustering to cluster datasets that have categorical as well as numerical attributes. The K-Prototypes clustering algorithm is a combination of the k-means and k-modes clustering algorithms; hence, it can handle both numerical and categorical data.

In k-prototypes clustering, we select k prototypes randomly at the start. After that, we calculate the distance between each data point and the prototypes. Accordingly, all the data points are assigned to the clusters associated with the different prototypes.

After assigning data points to the clusters, we calculate a new prototype for each cluster (the mean for numerical attributes and the mode for categorical attributes). After that, we recalculate the distance of the prototypes from the data points and reassign the clusters. This process is continued until the clusters converge, as sketched below.
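A sketch of the mixed distance that k-prototypes typically uses: squared Euclidean distance on the numerical attributes plus a weight gamma times the Hamming distance on the categorical attributes. The function name and the gamma value here are illustrative assumptions:

```python
import numpy as np

def prototype_distance(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Distance of a mixed-type point from a prototype: squared Euclidean
    distance on numeric attributes plus gamma times the number of
    mismatched categorical attributes."""
    num_part = np.sum((np.asarray(x_num, dtype=float) -
                       np.asarray(proto_num, dtype=float)) ** 2)
    cat_part = sum(a != b for a, b in zip(x_cat, proto_cat))
    return num_part + gamma * cat_part

# A point with numeric features (1.0, 2.0) and categorical features ("A", "X"),
# compared against a prototype; gamma weighs the categorical part.
print(prototype_distance([1.0, 2.0], ["A", "X"], [0.0, 2.0], ["A", "Y"], gamma=0.5))
# 1.0 (numeric part) + 0.5 * 1 (categorical part) = 1.5
```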

Reinforcement Learning

Unlike supervised and unsupervised learning, reinforcement learning is a feedback-based approach in which an agent learns by performing actions and observing their outcomes. Based on the status of an action (good or bad), the agent gets positive or negative feedback. For each positive feedback the agent gets a reward, whereas for each negative feedback it gets a penalty.

Def: "Reinforcement learning is a type of machine learning technique where an intelligent agent (computer program) interacts with the environment, explores it by itself, and takes actions within it."

o Reinforcement learning does not require any labeled data for the learning process. The agent learns through the feedback on the actions it performs. Moreover, in reinforcement learning, agents also learn from past experiences. Reinforcement learning methods are used to solve tasks where decision-making is sequential and the goal is long-term, e.g., robotics, online chess, etc.
o Reinforcement learning aims to get maximum positive feedback so that the agent can improve its performance.
o Reinforcement learning involves various activities, which include taking an action, changing or retaining the state, and getting feedback. Based on these actions, agents learn and explore the environment.

Exploration and Exploitation in Reinforcement Learning:

Before going into a technical description of exploration and exploitation, let's first understand these terms in simple words. In reinforcement learning, agents face situations in which they have to make a difficult choice between continuing with what they already know and exploring something new. This situation results in the Exploration-Exploitation Dilemma, because the knowledge of an agent about the states, actions, rewards, and resulting states is always partial.

Now we will discuss exploitation and exploration in technical terms.

Exploitation in Reinforcement Learning

Exploitation is defined as a greedy approach in which the agent tries to get more reward by using the estimated values rather than the actual values. In this technique, the agent makes the best decision based on current information.

Exploration in Reinforcement Learning

Unlike exploitation, in exploration the agent primarily focuses on improving its knowledge about each action instead of getting more reward, so that it can obtain long-term benefits. In this technique, the agent works on gathering more information to make the best overall decision.

Examples of Exploitation and Exploration in Machine Learning

Let's understand exploitation and exploration with an interesting real-world example.

Example 1: Say we have a scenario of online restaurant selection for a food order, where you have two options for selecting the restaurant. In the first option, you choose your favorite restaurant, from where you have ordered food in the past; this is called exploitation, because you rely only on information you already have about a specific restaurant. In the second option, you try a new restaurant to explore new varieties and tastes of food; this is called exploration. The food quality might be better in the first option, but it is also possible that the food is more delicious at the new restaurant. A common way to balance these two options is the epsilon-greedy rule, sketched below.
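With probability epsilon the agent explores a random option; otherwise it exploits the option with the highest estimated value. A minimal sketch (the names and values are our own, not from the notes):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (exploration),
    otherwise pick the action with the highest estimated value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

q = [0.4, 0.9, 0.1]                  # estimated value of each restaurant
choice = epsilon_greedy(q, epsilon=0.2)
print("order from restaurant", choice)
```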
Non-Associative Learning

In reinforcement learning, non-associative learning refers to a type of learning that does not involve forming associations or relationships between different stimuli or actions. It is a simpler form of learning compared to associative learning, which involves linking different stimuli or actions together.

Non-associative learning is typically observed in situations where an agent's behavior changes in response to a single stimulus or repeated exposure to the same stimulus. There are two common types of non-associative learning: habituation and sensitization.

1. Habituation: Habituation occurs when an agent's response to a particular stimulus decreases over time with repeated exposure. It is a form of adaptive behavior where the agent learns to ignore irrelevant or harmless stimuli. For example, if a robot is repeatedly exposed to a loud noise that is not associated with any reward or punishment, it may gradually stop reacting to the noise and become habituated to it.

2. Sensitization: Sensitization is the opposite of habituation and occurs when an agent's response to a stimulus increases over time with repeated exposure. It involves an increased sensitivity or responsiveness to a stimulus. For example, if a robot is repeatedly exposed to a painful stimulus, it may become more sensitive to that stimulus and show an increased response.

Non-associative learning is not directly related to reinforcement learning, as reinforcement learning primarily focuses on associative learning, where agents learn to associate their actions with rewards or punishments. However, non-associative learning mechanisms can play a role in shaping an agent's behavior and influencing its responses to stimuli, which can indirectly impact the learning process in reinforcement learning scenarios.
Markov Decision Process

Reinforcement Learning is a type of Machine Learning. It allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize performance. Simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal.

There are many different algorithms that tackle this issue. In fact, Reinforcement Learning is defined by a specific type of problem, and all its solutions are classed as Reinforcement Learning algorithms. In the problem, an agent is supposed to decide the best action to select based on its current state. When this step is repeated, the problem is known as a Markov Decision Process.
A Markov Decision Process (MDP) model contains:

A set of possible world states S.
A set of Models.
A set of possible actions A.
A real-valued reward function R(s, a).
A policy, the solution of the Markov Decision Process.

State:

A State is a set of tokens that represent every state that the agent can be in.

Model:
A Model (sometimes called a Transition Model) gives an action's effect in a state. In particular, T(S, a, S') defines a transition T where being in state S and taking an action 'a' takes us to state S' (S and S' may be the same). For stochastic actions (noisy, non-deterministic) we also define a probability P(S'|S, a), which represents the probability of reaching state S' if action 'a' is taken in state S. Note that the Markov property states that the effects of an action taken in a state depend only on that state and not on the prior history.

Actions
An Action A is the set of all possible actions. A(s) defines the set of actions that can be taken in state S.

Reward
A Reward is a real-valued reward function. R(s) indicates the reward for simply being in state S. R(S, a) indicates the reward for being in state S and taking action 'a'. R(S, a, S') indicates the reward for being in state S, taking action 'a', and ending up in state S'.

Policy
A Policy is a solution to the Markov Decision Process. A policy is a mapping from S to A. It indicates the action 'a' to be taken while in state S.

Let us take the example of a grid world:

An agent lives in a 3*4 grid. The grid has a START state (grid no 1,1). The purpose of the agent is to wander around the grid and finally reach the Blue Diamond (grid no 4,3). Under all circumstances, the agent should avoid the Fire grid (grid no 4,2). Also, grid no 2,2 is a blocked grid; it acts as a wall, hence the agent cannot enter it.

The agent can take any one of these actions: UP, DOWN, LEFT, RIGHT.
Walls block the agent's path, i.e., if there is a wall in the direction the agent would have moved, the agent stays in the same place. So, for example, if the agent chooses LEFT in the START grid, it would stay put in the START grid.
First Aim: To find the shortest sequence getting from START to the Diamond. Two such sequences can be found:
RIGHT RIGHT UP UP RIGHT
UP UP RIGHT RIGHT RIGHT


Let us take the second one (UP UP RIGHT RIGHT RIGHT) for the subsequent discussion.
The move is now noisy. 80% of the time the intended action works correctly. 20% of
the time the action agent takes causes it to move at right angles. For example, if the
agent says UP the probability of going UP is 0.8 whereas the probability of going LEFT is
0.1, and the probability of going RIGHT is 0.1 (since LEFT and RIGHT are right angles to UP).

The agent receives rewards each time


step:-

Small reward each step (can be negative when can also be term as punishment,

in the above example entering the Fire can have a reward of -1).
Big rewards come at the end (good or bad).
The goal is to Maximize the sum of rewards.
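The noisy transition model can be written down directly: 0.8 probability for the intended move and 0.1 for each of the two right-angle moves. A small sketch (the (dx, dy) encoding of actions is our own choice):

```python
# Actions encoded as (dx, dy) moves on the grid.
UP, DOWN, LEFT, RIGHT = (0, 1), (0, -1), (-1, 0), (1, 0)

# The two directions at right angles to each intended action.
PERPENDICULAR = {UP: (LEFT, RIGHT), DOWN: (LEFT, RIGHT),
                 LEFT: (UP, DOWN), RIGHT: (UP, DOWN)}

def transition_probs(action):
    """P(actual move | intended action) for the noisy grid world."""
    side1, side2 = PERPENDICULAR[action]
    return {action: 0.8, side1: 0.1, side2: 0.1}

print(transition_probs(UP))  # {(0, 1): 0.8, (-1, 0): 0.1, (1, 0): 0.1}
```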

Q-learning
Q-learning is a model-free, value-based, off-policy algorithm that will find the best
series of actions based on the agent's current state. The “Q” stands for quality.
Quality represents how valuable the action is in maximizing future rewards.

Model-based algorithms use transition and reward functions to estimate the optimal policy and create a model of the environment. In contrast, model-free algorithms learn the consequences of their actions through experience, without a transition or reward function.

Value-based methods train the value function to learn which state is more valuable, and act accordingly. On the other hand, policy-based methods train the policy directly to learn which action to take in a given state.

In an off-policy method, the algorithm evaluates and updates a policy that differs from the policy used to take actions. Conversely, an on-policy algorithm evaluates and improves the same policy that is used to take actions.

Before we jump into how Q-learning works, we need to learn a few useful
terminologies to understand Q-learning's fundamentals.
States (s): the current position of the agent in the environment.

Action (a): a step taken by the agent in a particular state.

Rewards: for every action, the agent receives a reward or a penalty.

Episodes: the end of a stage, where the agent can't take a new action. It happens when the agent has achieved the goal or has failed.

Q(St+1, a): the expected optimal Q-value of taking the action in a particular state.

Q(St, At): the current estimate of Q(St+1, a).

Q-Table: the agent maintains a Q-table of sets of states and actions.

Temporal Differences (TD): used to estimate the expected value of Q(St+1, a) by using the current state and action and the previous state and action.

Q-Table

The agent will use a Q-table to take the best possible action based on the expected reward for each state in the environment. In simple words, a Q-table is a data structure of sets of actions and states, and we use the Q-learning algorithm to update the values in the table.

Q-Function

The Q-function uses the Bellman equation and takes state (s) and action (a) as input. The equation simplifies the calculation of state values and state-action values. The update rule derived from it is:

Q(St, At) ← Q(St, At) + α [Rt+1 + γ max_a Q(St+1, a) − Q(St, At)]

where α is the learning rate and γ is the discount factor.

Q-Learning Algorithm

In outline: initialize the Q-table, choose an action (e.g., with an epsilon-greedy rule), perform the action, observe the reward and the next state, update the Q-value with the rule above, and repeat over many episodes until the Q-values converge.
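A minimal tabular Q-learning loop tying these pieces together. The environment interface (env.reset, env.step, env.n_actions) is an assumption in the style of common RL toolkits, not something defined in these notes:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    Q = defaultdict(lambda: [0.0] * env.n_actions)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Choose action: explore with probability epsilon, else exploit.
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda i: Q[s][i])
            s_next, r, done = env.step(a)  # assumed environment interface
            # Bellman update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            td_target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (td_target - Q[s][a])
            s = s_next
    return Q
```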
