
Machine Learning (BCS602)

MODULE 5
CHAPTER 13

CLUSTERING ALGORITHMS
• Clustering: the process of grouping a set of objects into classes of similar objects

• Documents within a cluster should be similar.

• Documents from different clusters should be dissimilar.

• Finding similarities between data according to the characteristics found in the data and
grouping similar data objects into clusters.
• Unsupervised learning: no predefined classes.
• Example: a figure of data points with two features, drawn as differently shaded samples, illustrates such natural groups.

When there are only a few features, grouping can be done manually; but when examples have many features, manual grouping is not feasible, so automatic clustering is required.

Clusters are represented by centroids.

Example: consider the points (3,3), (2,6) and (7,9).
Centroid: ((3+2+7)/3, (3+6+9)/3) = (4,6). The clusters should not overlap, and every
cluster should represent only one class.
Difference between Clustering and Classification

S.No. | Clustering | Classification
1 | Unsupervised learning; cluster formation is done by trial and error, as there is no supervisor | Supervised learning, with a supervisor to provide training and testing data
2 | Unlabelled data | Labelled data
3 | No prior knowledge of the domain is needed | Knowledge of the domain is a must to label the samples of the dataset
4 | Cluster results are dynamic | Once a label is assigned, it does not change

Applications of Clustering
1. Grouping based on customer buying patterns
2. Profiling of customers based on lifestyle
3. In information retrieval applications (like retrieval of a document from a collection
of documents)
4. Identifying the groups of genes that influence a disease
5. Identification of organs that are similar in physiology functions
6. Taxonomy of animals, plants in Biology
7. Clustering based on purchasing behaviour and demography
8. Document indexing
9. Data compression by grouping similar objects and finding duplicate objects

Advantages and Disadvantages

Sl.No | Advantages | Disadvantages
1 | Cluster analysis algorithms can handle missing data and outliers | Cluster analysis algorithms are sensitive to initialization and to the order of the input data
2 | Can help classifiers in labelling unlabelled data: semi-supervised algorithms use cluster analysis to label the unlabelled data and then use classifiers to classify them | Often, the number of clusters present in the data has to be specified by the user

Challenges of Clustering Algorithms

1. Handling data with a large number of dimensions.
2. Designing a suitable proximity measure.
3. The curse of dimensionality.

PROXIMITY MEASURES
Clustering algorithms need a measure to find the similarity or dissimilarity among the
objects to group them. Similarity and Dissimilarity are collectively known as proximity
measures. This is used by a number of data mining techniques, such as clustering, nearest
neighbour classification, and anomaly detection.

Distance measures are known as dissimilarity measures, as they indicate how one object
differs from another.
Measures like cosine similarity indicate the similarity among objects.
Distance measures and similarity measures are two sides of the same coin: a larger distance
indicates less similarity, and vice-versa.

A distance measure is called a metric if it satisfies the conditions of non-negativity, identity (d(x, y) = 0 if and only if x = y), symmetry, and the triangle inequality.
Some of the proximity measures are:
1. Quantitative variables


a) Euclidean distance: One of the most important and commonly used distance measures, also called the L2 norm.

Advantage: the distance does not change with the addition of new objects.
Disadvantages: (i) if the unit of measurement changes, the resulting Euclidean or squared Euclidean distance changes drastically; (ii) computational complexity is high, because it involves squaring and a square root.

b) City Block Distance: Also known as Manhattan distance or the L1 norm.

c) Chebyshev Distance: Also known as maximum value distance; it is the maximum of the absolute differences between the coordinates of a pair of objects. This distance is also called the supremum distance, Lmax, or L∞ norm.

d) Minkowski Distance: In general, all the above distance measures can be generalized as shown below.
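The formulas themselves did not survive extraction; the standard definitions, for two points $x = (x_1, \ldots, x_d)$ and $y = (y_1, \ldots, y_d)$, are:

$$d_{L_2}(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}, \qquad d_{L_1}(x, y) = \sum_{i=1}^{d} |x_i - y_i|$$

$$d_{L_\infty}(x, y) = \max_{i} |x_i - y_i|, \qquad d_{L_p}(x, y) = \left( \sum_{i=1}^{d} |x_i - y_i|^p \right)^{1/p}$$

Setting p = 1 in the Minkowski form gives the city block distance, p = 2 gives the Euclidean distance, and p → ∞ gives the Chebyshev distance.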


Binary Attributes: Binary attributes have only two values. The distance measures
discussed above cannot be applied to objects that have binary attributes. For finding
the distance among objects with binary attributes, the contingency table is used.


Hamming Distance: Hamming distance is a metric for comparing two binary data strings.
While comparing two binary strings of equal length, Hamming distance is the number of bit
positions in which the two bits are different. It is used for error detection or error correction
when data is transmitted over computer networks.

Example
Suppose there are two strings 1101 1001 and 1001 1101.

11011001 ⊕ 10011101 = 01000100. Since, this contains two 1s, the Hamming distance,
d(11011001, 10011101) = 2.
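The example can be checked with a one-line sketch in Python, since XOR marks exactly the differing bit positions:

```python
# Hamming distance of two equal-length binary strings:
# XOR the values, then count the 1-bits in the result.
a, b = 0b11011001, 0b10011101
print(bin(a ^ b).count("1"))  # prints 2
```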

Categorical Variables

In many cases, categorical values are used. A categorical value is just a code or symbol that represents a value. For
example, for the attribute Gender, code 1 can be given to female and code 0 to male. To
calculate the distance between two objects represented by categorical variables, we only need to find
whether they are equal or not. This is given as shown below.
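The formula referred to above is the standard simple-matching rule:

$$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{otherwise} \end{cases}$$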

Ordinal Variables
Ordinal variables are like categorical variables but with an inherent order. For example, designation is an
ordinal variable. If job designations are coded 1, 2 and 3, then code 1 is higher than code 2 and code 2 is higher than
code 3; the ranking is 1 > 2 > 3.

Cosine Similarity
 Cosine similarity is a metric used to measure how similar the documents are
irrespective of their size.
 It measures the cosine of the angle between two vectors projected in a multi-
dimensional space.
 The cosine similarity is advantageous because even if the two similar documents are
far apart by the Euclidean distance (due to the size of the document), chances are
they may still be oriented closer together.
 The smaller the angle, the higher the cosine similarity.


 Consider two documents P1 and P2, represented as vectors.

◦ If the distance between them is more, they are less similar.
◦ If the distance between them is less, they are more similar.
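The cosine similarity formula referred to here is the standard one:

$$\cos \theta = \frac{P_1 \cdot P_2}{\lVert P_1 \rVert \, \lVert P_2 \rVert}$$

where $P_1 \cdot P_2$ is the dot product of the two document vectors and $\lVert \cdot \rVert$ denotes vector length.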

Hierarchical Clustering Algorithms


Hierarchical clustering creates clusters that have a predetermined ordering from
top to bottom.
For example, all files and folders on the hard disk are organized in a hierarchy.
The hierarchical relationship is shown in the form of a dendrogram.
There are two types of hierarchical clustering:
◦ Divisive and Agglomerative.

 Divisive method: In the divisive or top-down clustering method, we assign all of the
observations to a single cluster and then partition that cluster into the two least similar clusters.
We then proceed recursively on each cluster until there is one cluster for each observation.
There is evidence that divisive algorithms produce more accurate hierarchies than
agglomerative algorithms in some circumstances, but they are conceptually more complex.


 Agglomerative method: In the agglomerative or bottom-up clustering method, we assign each
observation to its own cluster, then compute the similarity (e.g., distance) between each pair of
clusters and join the two most similar clusters. These steps are repeated until there is
only a single cluster left. A sketch of the related algorithm is shown below.
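The algorithm figure did not survive extraction; the following is a minimal single-linkage sketch in Python (the function and variable names are mine, not from the original):

```python
import math

def agglomerative(points, target_k=1):
    """Bottom-up clustering: start with singleton clusters and repeatedly
    merge the two closest clusters (single linkage) until target_k remain."""
    clusters = [[p] for p in points]          # each observation is its own cluster
    while len(clusters) > target_k:
        best = (float("inf"), 0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: shortest distance between any two members
                d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        print(f"merge {clusters[i]} + {clusters[j]} at distance {d:.2f}")
        clusters[i] += clusters.pop(j)        # join the two most similar clusters
    return clusters

agglomerative([(3, 5), (7, 8), (12, 5), (16, 9), (20, 8)])
```

Replacing min with max, or with an average over all pairs, gives complete and average linkage respectively.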

 The following three methods differ in how the distance between each cluster is measured.
1. Single Linkage
2. Average Linkage
3. Complete Linkage
Single Linkage or MIN algorithm
In single-linkage hierarchical clustering, the distance between two clusters is defined
as the shortest distance between two points, one from each cluster. For example, the distance
between clusters "r" and "s" is equal to the length of the arrow between
their two closest points.


 Complete Linkage: In complete-linkage hierarchical clustering, the distance between two
clusters is defined as the longest distance between two points, one from each cluster. For example,
the distance between clusters "r" and "s" is equal to the length of the arrow
between their two furthest points.



 Average Linkage: In average-linkage hierarchical clustering, the distance between two
clusters is defined as the average distance from each point in one cluster to every point in
the other cluster. For example, the distance between clusters "r" and "s" is equal to
the average length of the arrows connecting the points of one cluster to those of the other.


Mean-Shift Algorithm

Problem: Use the following dataset, apply hierarchical clustering methods, and show the dendrogram.


SNo. | X | Y
1 | 3 | 5
2 | 7 | 8
3 | 12 | 5
4 | 16 | 9
5 | 20 | 8

Table: Sample Data


Solution

The pairwise Euclidean distances among the five points are computed and shown in the following proximity matrix.

Table: Proximity Matrix
Objects | 1 | 2 | 3 | 4 | 5
1 | - | 5 | 9 | 13.60 | 17.26
2 | | - | 5.83 | 9.06 | 13
3 | | | - | 5.66 | 8.54
4 | | | | - | 4.12
5 | | | | | -

The minimum distance is 4.12, between items 4 and 5, so these two items are clustered together first. The distances between the group {4,5} and the remaining items are computed using the single-linkage (minimum) rule:

The distance between {4,5} and {1} is: Minimum {(4,1), (5,1)} = Minimum {13.60, 17.26} = 13.60
The distance between {4,5} and {2} is: Minimum {(4,2), (5,2)} = Minimum {9.06, 13} = 9.06
The distance between {4,5} and {3} is: Minimum {(4,3), (5,3)} = Minimum {5.66, 8.54} = 5.66

Table: After Iteration 1
Clusters | {4,5} | 1 | 2 | 3
{4,5} | - | 13.60 | 9.06 | 5.66
1 | | - | 5 | 9
2 | | | - | 5.83
3 | | | | -

The minimum distance in this table is 5, between items 1 and 2; therefore {1,2} is formed next.

The distance between {1,2} and {3} is: Minimum {9, 5.83} = 5.83
The distance between {1,2} and {4,5} is: Minimum {13.60, 17.26, 9.06, 13} = 9.06

Table: After Iteration 2
Clusters | {4,5} | {1,2} | 3
{4,5} | - | 9.06 | 5.66
{1,2} | | - | 5.83
3 | | | -

The minimum is now 5.66, so item 3 joins {4,5} to give {3,4,5}. Finally, {1,2} merges with {3,4,5} at distance 5.83.

Therefore, the order of clustering is {4,5}, then {1,2}, then {3,4,5}, and finally all five points in a single cluster.
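These merges can be cross-checked with SciPy, as a small sketch (assuming scipy, and matplotlib for the plot, are installed):

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

pts = np.array([[3, 5], [7, 8], [12, 5], [16, 9], [20, 8]])
# Each row of the linkage matrix is (cluster_i, cluster_j, distance, size);
# indices 0-4 correspond to items 1-5 of the table.
Z = linkage(pts, method="single")
print(Z)
dendrogram(Z)   # draws the dendrogram asked for in the problem
```

Changing method to "complete" or "average" reproduces the other two linkage variants.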
Complete Linkage or MAX or Clique
The first merge is the same, since the smallest entry of the proximity matrix is still 4.12: {4,5} is formed. The distances from {4,5} to the other items are then computed with the maximum rule:

The distance between {4,5} and {1} is: Max {13.60, 17.26} = 17.26
The distance between {4,5} and {2} is: Max {9.06, 13} = 13
The distance between {4,5} and {3} is: Max {5.66, 8.54} = 8.54

This results in the following table.

Clusters | {4,5} | 1 | 2 | 3
{4,5} | - | 17.26 | 13 | 8.54
1 | | - | 5 | 9
2 | | | - | 5.83
3 | | | | -

The minimum is 5, so {1,2} is combined. Then:

The distance between {1,2} and {3} is: Max {9, 5.83} = 9
The distance between {1,2} and {4,5} is: Max {13.60, 17.26, 9.06, 13} = 17.26

Clusters | {4,5} | {1,2} | 3
{4,5} | - | 17.26 | 8.54
{1,2} | | - | 9
3 | | | -

The minimum is 8.54, so item 3 joins {4,5} to form {3,4,5}. Finally, {1,2} and {3,4,5} merge at 17.26. The order of clustering is {4,5}, then {1,2}, then {3,4,5}, and finally all points.

Hint: The same procedure is used for the average-link algorithm, where the average distance over all pairs of points across the two clusters is used to form clusters.

Consider the data shown in the following table. Use the k-means algorithm with k = 2 and show
the result.
Table: Sample Data
SNO | X | Y
1 | 3 | 5
2 | 7 | 8
3 | 12 | 5
4 | 16 | 9

Solution

Let us assume the seed points are (3,5) and (16,9). These are shown in the following table as the
starting clusters.
Table Initial Cluster Table

Cluster 1 Cluster 2
(3,5) (16,9)

Centroid (3,5) Centroid (16,9)

Iteration 1: Compare each data point with the two centroids and assign it to the nearest one.

Take sample object 2, (7,8), and compare it with the two centroids:

Dist(2, centroid 1) = √((7 − 3)² + (8 − 5)²) = 5
Dist(2, centroid 2) = √((16 − 7)² + (9 − 8)²) = 9.06

Object 2 is closer to the centroid of cluster 1 and is therefore assigned to cluster 1. For object 3, (12,5):

Dist(3, centroid 1) = √((12 − 3)² + (5 − 5)²) = 9
Dist(3, centroid 2) = √((16 − 12)² + (9 − 5)²) = 5.66

Object 3 is closer to the centroid of cluster 2 and is therefore assigned to cluster 2.

This is shown in the following table.


Table: Cluster Table After Iteration 1

Cluster 1 | Cluster 2
(3,5) | (12,5)
(7,8) | (16,9)
Centroid ((3+7)/2, (5+8)/2) = (5, 6.5) | Centroid ((12+16)/2, (5+9)/2) = (14, 7)

The second iteration starts with the new centroids. For object 2, (7,8):

Dist(2, centroid 1) = √((7 − 5)² + (8 − 6.5)²) = 2.5
Dist(2, centroid 2) = √((7 − 14)² + (8 − 7)²) = 7.07

Object 2 is closer to the centroid of cluster 1 and hence remains in the same cluster. For object 3, (12,5):

Dist(3, centroid 1) = √((12 − 5)² + (5 − 6.5)²) = 7.16
Dist(3, centroid 2) = √((12 − 14)² + (5 − 7)²) = 2.83

Object 3 is closer to the centroid of cluster 2 and remains in the same cluster. No assignment changes, so the algorithm has converged.

Therefore, the resultant clusters are
{(3,5), (7,8)} and {(12,5), (16,9)}.
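As a sketch (assuming scikit-learn is installed), the result can be reproduced with KMeans by passing the same seed points as the initial centroids:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[3, 5], [7, 8], [12, 5], [16, 9]])
init = np.array([[3, 5], [16, 9]])          # the seed points chosen above
km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)
print(km.labels_)            # [0 0 1 1] -> {(3,5),(7,8)} and {(12,5),(16,9)}
print(km.cluster_centers_)   # [[ 5.   6.5] [14.   7. ]]
```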


PARTITIONAL CLUSTERING ALGORITHMS

It is a type of clustering that divides the data into non-hierarchical groups. It is also known as
the centroid-based method. The most common example of partitioning clustering is the K-Means
Clustering algorithm.

In this type, the dataset is divided into a set of k groups, where K is used to define the number of
pre-defined groups. The cluster center is created in such a way that the distance between the data
points of one cluster is minimum as compared to another cluster centroid.

K-means can be viewed as a greedy algorithm, as it involves partitioning n samples into k clusters
so as to minimize the sum of squared error (SSE). SSE is a measure of error that gives the sum
of the squared Euclidean distances from each data point to its closest centroid.

$$\text{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} \text{dist}(c_i, x)^2$$

Here, c_i is the centroid of the i-th cluster C_i and x is a sample data point.
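As a direct translation of the formula into code (a small sketch; the function name is mine):

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(np.sum((points[labels == i] - c) ** 2)
               for i, c in enumerate(centroids))
```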

Advantages
1. Simple
2. Easy to implement
Disadvantages
1. It is sensitive to the initialization process, as a change of initial points leads to different clusters.
2. If the number of samples is large, the algorithm takes a lot of time.


Complexity
The complexity of the k-means algorithm depends on n, the number of samples; k, the number of
clusters; I, the number of iterations; and d, the number of attributes, giving O(nkId). In the worst
case, the complexity of the k-means algorithm is O(n²).


Density-Based Clustering

A cluster is a dense region of points that is separated from other regions of high density by
regions of low density.
Density-based clustering is used when the clusters are irregular or intertwined, and when noise and outliers are present.
Density-based clustering refers to unsupervised learning methods that identify distinctive
groups/clusters in the data, based on the idea that a cluster in data space is a contiguous region of
high point density, separated from other such clusters by contiguous regions of low point density.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is the base algorithm for
density-based clustering. It can discover clusters of different shapes and sizes from a large amount
of data that contains noise and outliers.


The DBSCAN algorithm uses two parameters:


minPts: The minimum number of points (a threshold) clustered together for a region to be
considered dense.
eps (ε): A distance measure that will be used to locate the points in the neighborhood of any point.
These parameters can be understood if we explore two concepts called Density Reachability and
Density Connectivity.
Reachability in terms of density establishes that a point is reachable from another if it lies within a
particular distance (eps) of it. Connectivity, on the other hand, involves a transitivity-based
chaining approach to determine whether points are located in a particular cluster. For example, points p
and q could be connected if p → r → s → t → q, where a → b means b is in the neighborhood of a.

There are three types of points after the DBSCAN clustering is complete:

• Core — a point that has at least minPts points within distance ε of itself.
• Border — a point that has at least one core point within distance ε.
• Noise — a point that is neither a core nor a border point; it has fewer than minPts points
within distance ε of itself.

The following connectedness measures are used in this algorithm.

1. Directly density reachable — The point X is directly density reachable from Y if:
(a) X is in the ε-neighborhood of Y, and
(b) Y is a core point.
2. Density reachable — The point X is density reachable from Y if there is a chain of core points
that leads from Y to X.
3. Density connected — X and Y are density connected if there exists a core point Z such that both
X and Y are density reachable from Z.

Advantages
1. No need for specifying the number of clusters beforehand
2. The algorithm can detect clusters of any shapes
3. Robust to noise
4. Few parameters are needed
The complexity of this algorithm is O(nlogn).
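A minimal usage sketch (assuming scikit-learn is available), with the two parameters discussed above:

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3],      # one dense group
              [8, 7], [8, 8], [25, 80]])   # a second group plus an outlier
db = DBSCAN(eps=3, min_samples=2).fit(X)
print(db.labels_)   # [ 0  0  0  1  1 -1]; the label -1 marks a noise point
```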

Grid-Based Approaches
A grid-based clustering method takes a space-driven approach by partitioning the embedding
space into cells, independent of the distribution of the input objects.

The grid-based clustering approach uses a multiresolution grid data structure. It quantizes the
object space into a finite number of cells that form a grid structure on which all of the operations
for clustering are performed.

The main advantage of the approach is its fast processing time, which is typically independent
of the number of data objects, yet dependent on only the number of cells.

There are three important concepts that need to be mastered for understanding the grid-based
schemes. They are:
1. Subspace clustering


2. Concept of dense cells


3. Monotonicity property

Subspace Clustering

Grid-based algorithms are useful for clustering high-dimensional data, that is, data with many attributes.
Some data like gene data may have millions of attributes. Every attribute is called a dimension. But all the
attributes are not needed, as in many applications one may not require all the attributes. For example, an
employee's address may not be required for profiling his diseases. Age may be required in that case. So,
one can conclude that only a subset of features is required.

CLIQUE is a density-based and grid-based subspace clustering algorithm, useful for finding
clustering in subspace.
Concept of Dense Cell
CLIQUE partitions each dimension into several overlapping intervals and thereby divides the data
space into cells. The algorithm then determines whether each cell is dense or sparse. A cell is
considered dense if the number of points in it exceeds a threshold value, where density is defined
as the ratio of the number of points to the volume of the region.

In one pass, the algorithm finds the number of cells, the number of points in each cell, and so on,
and then combines the dense cells. For that, the algorithm uses contiguous intervals and the set of
dense cells.

MONOTONICITY Property
CLIQUE uses the anti-monotonicity (Apriori) property: all the subsets of a frequent itemset are
frequent; similarly, if a subset is infrequent, then all its supersets are infrequent.


The algorithm works in two stages: first, it identifies the subspaces and the dense cells that contain clusters; second, it combines adjacent dense cells into clusters and generates a minimal description for each cluster.

Advantages of CLIQUE
1. Insensitive to input order of objects
2. No assumptions of underlying data distributions
3. Finds subspace of higher dimensions such that high-density clusters exist in those
subspaces

Disadvantage
The disadvantage of CLIQUE is that tuning of grid parameters, such as grid size, and finding
optimal threshold for finding whether the cell is dense or not is a challenge.


Reinforcement Learning

Overview of Reinforcement Learning


What is Reinforcement Learning?
Reinforcement Learning (RL) is a machine learning paradigm that mimics
how humans and animals learn through experience.
Humans interact with the environment, receive feedback (rewards or penalties),
and adjust their behavior accordingly.
Example: A child touching fire learns to avoid it after experiencing pain (negative
reinforcement)

How RL Works in Machines


• RL simulates real-world scenarios for a computer program (agent) to learn by
trial and error.
• The agent executes actions, receives positive or negative rewards, and optimizes
its future actions based on these experiences.

Types of Reinforcement Learning


1. Positive Reinforcement Learning
Rewards encourage good behavior (reinforce correct actions).
Example: A robot gets +10 points for reaching a goal successfully.
Effect: Increases the likelihood of repeating the rewarded action.
2. Negative Reinforcement Learning
Negative rewards discourage unwanted actions.
Example: A game agent loses -10 points for stepping into a danger zone.
Effect: Helps the agent learn to avoid negative outcomes.


Characteristics of RL
1. Sequential Decision Making
o The goal is achieved through a sequence of decisions.
o One wrong decision can lead to failure.
o RL involves learning a proper sequence of steps to reach the target.
2. Delayed Feedback
o Rewards or feedback are not given immediately.
o It may take several steps or moves before knowing whether the action was good
or bad.
3. Interdependent Actions
o Each action affects future actions.
o A wrong move now can cause problems later.
4. Time-related Actions
o All actions are linked to time.
o Actions are naturally ordered in a timeline.

Challenges of Reinforcement Learning


1. Reward Design
o It is difficult to define appropriate rewards in many games.
o Determining the value of rewards is a major challenge.
2. Absence of a Model
o Some games (e.g., chess) have defined rules, but many environments do not.
o Without a model, simulation is necessary to learn.
3. Partial Observability
o Not all states are fully visible or known.
o Example: In weather forecasting, complete information is not always available.
4. Time-Consuming Operations
o Large state spaces and multiple possible actions take more time to process.
o This increases the overall training and computation time.
5. Complexity
o Games like GO have many possible board states and actions.
o Lack of labeled data and high complexity make RL design difficult.


Applications of Reinforcement Learning


Reinforcement Learning (RL) is used in many real-world applications, including:
• Industrial Automation
o For improving efficiency and decision-making in manufacturing processes.
• Resource Management
o Optimizing allocation of limited resources in various domains.
• Traffic Light Control
o To minimize traffic congestion using smart decision-making systems.
• Personalized Recommendations
o In platforms like news apps or streaming services to suggest content based on
user behavior.
• Ad Bidding Systems
o Automatically deciding the best ads to show in real time for maximum impact.
• Customized Applications
o Tailor-made solutions based on user-specific data and goals.
• Driverless Cars
o Learning to drive through interactions with the environment and improving
performance.
• Games and Deep Learning
o Used in playing complex games like Chess and GO alongside deep learning.
• DeepMind Applications
o Advanced AI models that generate images, programs, or solve complex
problems.

Reinforcement Learning vs Supervised Learning

Reinforcement Learning | Supervised Learning
No supervisor and no labelled dataset initially | Presence of a supervisor and labelled data
Decisions are dependent and are made sequentially | Decisions are independent and based on the input given in training
Feedback is not instantaneous and is delayed in time | Feedback is usually instantaneous once the model is created
Agent actions affect the next input data | Depends on the initial input or the input given at the start
No target values; only goal-oriented | Target class is predefined by the problem
Example: Chess, GO, Atari games | Example: classifiers

Reinforcement Learning vs Unsupervised Learning

Reinforcement Learning | Unsupervised Learning
Mapping from input to output is present | Mapping from input to output is not present
Gets constant feedback from the environment | No feedback from the environment

Components of Reinforcement Learning


Reinforcement Learning consists of four key components:
• Environment
• Agent
• Actions
• Rewards

Reinforcement Problems
There are two types of problems in RL:
1. Learning Problems
o The environment is unknown.
o The agent learns through trial and error.
o It interacts with the environment to improve its policy (behavior strategy).
2. Planning Problems
o The environment is known.
o The agent performs computations with the model (planning) to improve the policy.
Environment and Agent
• Environment:
o The external system where all actions happen.
o It includes input, output, and reward definitions.
o It describes the state or state variables (the starting state is called the initial state).
o Example: In a self-driving car system, maps, rules, and road obstacles are part
of the environment.
• Agent:
o An autonomous entity that observes the environment and performs actions.
o It could be a robot, chatbot, or software that learns from the environment.


States and Actions


• In Reinforcement Learning, the input given to the agent is called a state.
• The output or the response made by the agent is called an action.
• Example: A graph (like Figure 14.5) may show how an agent moves up or down based
on its chosen actions.

States
• A state is the current situation or position of the agent (e.g., location in a maze or city).
• Example states: A, B, C, D, E, F, G, H, I
• A = Starting state, G = Goal/Target state
• Notations:
o S – general state
o s – specific state
o sₜ – state at time t
🔹 Types of Nodes
1. Goal Node (also called Terminal or Absorbing State)
o Final destination with the highest reward.
2. Non-terminal Nodes
o Intermediate states before reaching the goal.
3. Start Node
o The initial position of the agent.
🔹 Actions

• Actions are transitions between states (e.g., moving from A → B).


• Example: Action UP causes the agent to move from A to B.
• Notations:
o A – general action (e.g., UP, DOWN)
o a – specific action
o aₜ – action at time t
🔹 Rewards
• After each action, the agent receives a reward that indicates how good or bad the action
was.

Episodes in Reinforcement Learning


• An episode = sequence of steps from the start state to the goal state.
• Each episode includes states, actions, and rewards.
Types of Episodes:
1. Episodic Tasks
o Has a clear start and end (goal state).
o Example paths (episodes):
▪ A→B→D→F→G
▪ A→B→E→G
▪ A successful episode: E → F → C → B → A → D → G
2. Continuous Tasks
o No terminal or goal state.
o The task continues indefinitely without a specific endpoint.

Markov Decision Process (MDP)


What is a Markov Chain?
• A Markov chain is a stochastic (random) process that satisfies the Markov property.
• It is a sequence of random variables:
X0,X1,X2,...,Xn such that the next state depends only on the current state, not the past states.
Example: University Student Transitions
• Two states: University A and University B
• Scenario:
o 80% of students in University A move to University B
o 20% stay in University A


o 60% of students in University B move to University A


o 40% stay in University B
Markov Chain Diagram
States: A and B

Transition Matrix (P)


The transition probabilities are stored in a matrix:
• Each row represents the current state.
• Each column represents the next state.

• Each row must sum to 1, as it represents a probability distribution.
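For the university example, the stripped matrix can be reconstructed from the percentages above, with rows and columns ordered (A, B):

$$P = \begin{pmatrix} 0.2 & 0.8 \\ 0.6 & 0.4 \end{pmatrix}$$

Row A says that a student in University A stays with probability 0.2 and moves to B with probability 0.8; row B says a student in B moves to A with probability 0.6 and stays with probability 0.4. Each row sums to 1.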


Mathematical Notation
• Transition matrix at time t: the entry P_ij gives the probability of moving to state j at time t + 1 given that the chain is in state i at time t, i.e., P_ij = P(X_{t+1} = j | X_t = i).

MULTI-ARM BANDIT PROBLEM AND REINFORCEMENT PROBLEM TYPES

Reinforcement learning uses trial and error to learn a series of actions that maximize the total
reward. There are two sub-problems in the reinforcement problem.
1. The first problem is predicting the total reward, called policy evaluation or value


estimation. It uses a state-value function, estimated using Temporal


Difference (TD) Learning.
2. The second problem is choosing actions that maximize rewards, called policy
improvement. Together, policy evaluation and policy improvement form
policy iteration to find the optimal policy.

A common problem in reinforcement learning is the multi-arm (or N-arm) bandit problem.
Imagine a 5-armed slot machine in a casino (Figure 14.7).
Each lever, when pulled, gives a random reward between $1 and $10.
There can be N levers, each with unknown reward behavior.
The goal is to select levers wisely to maximize total money earned in a limited number of
tries.

The money returned by each lever is called the reward.


The question is which lever is the best one to pull in order to get maximum profit. A possible answer to
this question is: the lever whose average return is higher than that of the other levers is the best slot
machine. Let us formalize the problem. Given k chances to play an N-arm slot
machine, with actions of pulling a lever and their associated rewards, the expected reward of an
action a is given by the action-value function, or Q function:

Q(a) = (sum of rewards received when action a was taken) / (number of times action a was taken)

This indicates the value of taking a particular action a.
The best action (highest Q value) is the one that returns the highest average reward and is the
indicator of the action quality.


Selection Procedure
A simple selection procedure is the ε-greedy strategy: with probability 1 − ε, the agent exploits by pulling the lever with the highest current Q value; with probability ε, it explores a random lever.
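A minimal runnable sketch of such an ε-greedy bandit agent in Python (all names and the Gaussian reward model are illustrative assumptions, not from the original):

```python
import random

def run_bandit(true_means, steps=1000, eps=0.1):
    """epsilon-greedy selection on an N-arm bandit with Gaussian rewards."""
    n_arms = len(true_means)
    q = [0.0] * n_arms            # running average reward per lever (the Q values)
    n = [0] * n_arms              # number of times each lever was pulled
    total = 0.0
    for _ in range(steps):
        if random.random() < eps:                  # explore a random lever
            a = random.randrange(n_arms)
        else:                                      # exploit the best estimate so far
            a = max(range(n_arms), key=lambda i: q[i])
        r = random.gauss(true_means[a], 1.0)       # sample this lever's reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                  # incremental average update
        total += r
    return q, total

q, total = run_bandit([1.0, 2.5, 2.0, 0.5, 1.5])   # 5 levers; lever 2 pays best
print([round(v, 2) for v in q], round(total, 1))
```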

Reinforcement Learning Agent Types


An agent can be classified based on how it learns:
1. Value-Based Approaches
• Aim: Optimize the value function V(s)
• V(s): the expected future reward from a state, discounted over time.
• The agent chooses actions leading to states with higher values.
2. Policy-Based Approaches
• Aim: Find the optimal policy
• A policy maps each state to a probability distribution over actions.
• Focus: Choose the best action in each state directly.
3. Hybrid Methods (Actor-Critic)
• Combine value-based and policy-based approaches.
• Actor updates the policy; Critic evaluates it using value functions.
4. Model-Based Approaches


• Build a model of the environment first.


• Use methods like Markov Decision Process (MDP) for planning.
5. Model-Free Methods
• No model of the environment is used.
• Learn from experience using methods like:
o Temporal Difference (TD) Learning
o Monte Carlo Methods

Reinforcement algorithm

Solving reinforcement problems with conventional methods: there are two main algorithms for
solving reinforcement learning problems using conventional methods:
1. Value Iteration 2. Policy Iteration

Policy iteration consists of two main steps:


1. Policy Evaluation
2. Policy Improvement
Policy Evaluation
Initially, for a given policy π, the algorithm starts with v(s) = 0 (no reward). The Bellman
equation is used to obtain v(s), and the process continues iteratively until the optimal
v(s) is found.
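The Bellman expectation equation used here is, in its standard form (the original equation did not survive extraction):

$$v_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\left[ R(s, a, s') + \gamma\, v_k(s') \right]$$

where π is the policy being evaluated, P is the transition probability, R is the reward, and γ is the discount factor.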


Policy Improvement
The policy improvement process is performed as follows:
1. Evaluate the current policy using policy evaluation.
2. Solve the Bellman equation for the current policy to obtain v(s).
3. Improve the policy by applying the greedy approach to maximize expected
rewards.
4. Repeat the process until the policy converges to the optimal policy.
Algorithm
1. Start with an arbitrary policy π.
2. Perform policy evaluation using Bellman’s equation.
3. Improve the policy greedily.
4. Repeat until convergence.

Monte-Carlo Methods (MC)


• Monte-Carlo (MC) methods learn from experience without assuming any model of the
environment.
• These methods simulate and interact with the environment to gain knowledge, similar
to trial-and-error learning.

MC makes the following assumptions:


1. All episodes must terminate – Each episode must eventually end regardless of the
starting point.
2. Use of value-action functions – Values are calculated after the episode ends (not
during).
o Hence, MC methods are incremental.

Working of MC
• Rewards are collected only at the end of an episode.
• These are used to calculate the maximum expected future reward (called return).
• Empirical return is used instead of expected return.
• The return is averaged over multiple episodes for each state.

Value Function Update


• n(sₜ): the number of times state sₜ has been visited.
• Gᵢ: the return from episode i.

🔹 Incremental Update Rule (Every-Visit MC)


After each episode ends, the value function is updated as:
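The update formula itself was lost in extraction; the standard every-visit form, using the quantities defined above, is:

$$V(s_t) \leftarrow V(s_t) + \alpha \left[ G_t - V(s_t) \right]$$

where α is a step size; choosing α = 1/n(sₜ) makes V(sₜ) the running average of the observed returns.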

Difference between Monte-Carlo Method and Temporal Difference Method

Monte-Carlo Method | Temporal Difference Method
Easy to understand; does not exploit the Markov property | Exploits the Markov property
Requires episode termination before values are updated | Does not require episode termination

Temporal Difference (TD) Learning


• Alternative to Monte Carlo method
• Model-free learning method (no need for environment model)
• Learns from experience and interaction with environment
• Based on bootstrapping:
o Uses current estimate and future reward to update values
• Estimates reward at every state step-by-step
• Updates occur after every step, not after full episode
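Concretely, the bootstrapped TD(0) update performed after every step is (standard form, stated here since the notes give no equation):

$$V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]$$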


Q-Learning
Q-Learning Algorithm
1. Initialize Q-table:
Create a table Q(s,a) with states s and actions a.
Initialize Q-values with random or zero values.
2. Set parameters:
Learning rate α (typically between 0 and 1).
Discount factor γ (typically close to 1).
Exploration–exploitation trade-off strategy (e.g., ε-greedy policy).
3. Repeat for each episode:
Start from an initial state s.
Repeat until reaching a terminal state:
a. Choose an action a in s (e.g., ε-greedy on the Q-table).
b. Execute a; observe the reward r and the next state s′.
c. Update Q(s, a) ← Q(s, a) + α [r + γ max over a′ of Q(s′, a′) − Q(s, a)].
d. Set s ← s′.
4. End the training once convergence is reached (Q-values become stable).
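A compact runnable sketch of this loop on a toy chain environment (the environment and all names are illustrative assumptions, not from the original):

```python
import random

N_STATES, GOAL = 6, 5            # states 0..5 on a line; reaching state 5 ends an episode
ACTIONS = (-1, +1)               # move left or right
alpha, gamma, eps = 0.5, 0.9, 0.3
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: Q[state][action index]

for _ in range(500):                               # episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy selection; break ties randomly so early episodes explore
        if random.random() < eps or Q[s][0] == Q[s][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)   # take the step, clamped to the line
        r = 1.0 if s2 == GOAL else 0.0                   # reward only at the goal
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# values grow toward the goal; the terminal state itself is never updated
print([round(max(q), 2) for q in Q])
```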


SARSA Learning
SARSA Algorithm (State-Action-Reward-State-Action)
SARSA follows the same loop as Q-learning, but the update uses the next action a′ actually chosen
by the current (e.g., ε-greedy) policy in the next state s′:
Q(s, a) ← Q(s, a) + α [r + γ Q(s′, a′) − Q(s, a)]

Differences between SARSA and Q-Learning
SARSA is on-policy: it evaluates and improves the same policy that selects its actions, updating
with the action actually taken next. Q-learning is off-policy: it updates with the greedy value
max over a′ of Q(s′, a′), regardless of which action the agent actually takes next.
