
Machine Learning

By
R.Sanghavi
Asst Professor
CSE(DS)
MALLA REDDY ENGINEERING COLLEGE (Autonomous)
Module 5:
Reinforcement Learning
Syllabus
Reinforcement Learning (Q-Learning, Deep Q-Networks) – Transfer Learning and Pretrained Models – Markov Chain Monte Carlo Methods – Sampling – Proposal Distribution – Markov Chain Monte Carlo – Graphical Models – Bayesian Networks – Markov Random Fields – Case Studies: Real-World Machine Learning Applications – Future Trends in Machine Learning
Reinforcement Learning
• Reinforcement Learning (RL) is a branch of machine learning that teaches agents how
to make decisions by interacting with an environment to achieve a goal. In RL, an
agent learns to perform tasks by trying different strategies to maximize cumulative
rewards based on feedback received through its actions.

• Agent: The decision-maker that performs actions.


• Environment: The world or system in which the agent operates.
• State: The situation or condition the agent is currently in.
• Action: The possible moves or decisions the agent can make.
• Reward: The feedback or result from the environment based on the agent’s action.
How RL Works?
• The RL process involves an agent performing actions in an environment,
receiving rewards or penalties based on those actions, and adjusting its
behavior accordingly. This loop helps the agent improve its decision-making
over time to maximize the cumulative reward.
RL components:
• Policy: A strategy that the agent uses to determine the next action based on
the current state.
• Reward Function: A function that provides feedback on the actions taken,
guiding the agent towards its goal.
• Value Function: Estimates the future cumulative rewards the agent will receive
from a given state.
• Model of the Environment: A representation of the environment that predicts
future states and rewards, aiding in planning.
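The loop and components described above can be written as a short program. Below is a minimal sketch; the `env` and `policy` objects are hypothetical placeholders (not a specific library API) assumed only for illustration.

```python
def run_episode(env, policy, max_steps=100):
    """One pass of the RL loop: observe state, act, receive reward, move to the next state."""
    state = env.reset()                               # environment provides the initial state
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                        # the agent's policy picks an action
        next_state, reward, done = env.step(action)   # environment returns feedback
        total_reward += reward                        # accumulate the cumulative reward
        state = next_state
        if done:                                      # stop at a terminal state
            break
    return total_reward
```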
RL Example: Navigating a Maze
Imagine a robot navigating a maze to reach a diamond
while avoiding fire hazards. The goal is to find the optimal
path with the least number of hazards while maximizing
the reward:
• Each time the robot moves correctly, it receives a
reward.
• If the robot takes the wrong path, it loses points.

The robot learns by exploring different paths in the maze. By trying various moves, it evaluates the rewards and penalties for each path. Over time, the robot determines the best route by selecting the actions that lead to the highest cumulative reward.
The robot’s learning process is as follows:
• Exploration: The robot starts by exploring all possible paths in the maze,
taking different actions at each step (e.g., move left, right, up, or down).
• Feedback: After each move, the robot receives feedback from the
environment:
• A positive reward for moving closer to the diamond.
• A penalty for moving into a fire hazard.
• Adjusting Behavior: Based on this feedback, the robot adjusts its behavior to
maximize the cumulative reward, favoring paths that avoid hazards and bring
it closer to the diamond.
• Optimal Path: Eventually, the robot discovers the optimal path with the least
number of hazards and the highest reward by selecting the right actions based
on past experiences.
Types of RL Algorithms

1. Model-Based RL
• In a model-based reinforcement learning algorithm, the agent builds a model of
the environment's dynamics. This model predicts the next state and the
reward given the current state and action. The agent uses this model to plan
actions by simulating possible future scenarios before deciding on the best
action. This type of RL is appropriate for environments where building an
accurate model is feasible, allowing for efficient exploration and planning.
2. Model-Free RL
• A model-free reinforcement learning algorithm does not require a model of the
environment. Instead, the agent learns directly from interactions with the
environment by trial and error. The agent learns to associate actions with
rewards and uses this experience to improve decision-making over time. This
type of reinforcement learning is suitable for complex environments where
modeling the environment's dynamics is difficult or impossible.
RL Models
Traditional reinforcement learning models
• Traditional reinforcement learning models are based on the foundational
principles of RL, where an agent learns to make decisions through trial and error
by interacting with an environment. These models often rely on tabular methods,
like Q-learning and SARSA, which use a table or matrix to store and update the
values of different actions in various states.
• Q-Learning is a value-based method in which the agent learns the value of taking a
particular action in a specific state, aiming to maximize the cumulative reward over
time.
• SARSA is similar to Q-learning, but the agent updates its value estimates using the
action taken rather than the best possible action.
Q-learning
• Q-learning is a model-free reinforcement learning algorithm used to train
agents (computer programs) to make optimal decisions by interacting with an
environment. It helps the agent explore different actions and learn which ones
lead to better outcomes. The agent uses trial and error to determine which
actions result in rewards (good outcomes) or penalties (bad outcomes).
• Over time, it improves its decision-making by updating a Q-table, which stores
Q-values representing the expected rewards for taking particular actions in
given states.
• While Q-learning works well for small state-action spaces, it struggles with scalability when dealing with high-dimensional environments like images or continuous states.
Working of Q-Learning

Q-learning models follow an iterative process, where different components work together to train the agent:
• Agent: The entity that makes decisions and takes actions within the
environment.
• States: The variables that define the agent’s current position in the
environment.
• Actions: The operations the agent performs when in a specific state.
• Rewards: The feedback the agent receives after taking an action.
• Episodes: A sequence of actions that ends when the agent reaches
a terminal state.
• Q-values: The estimated rewards for each state-action pair.
Algorithm of Q-Learning
• Initialization: The agent starts with an initial Q-table, where Q-
values are typically initialized to zero.
• Exploration: The agent chooses an action based on the ϵ-greedy
policy (either exploring or exploiting).
• Action and Update: The agent takes the action, observes the next state, and receives a reward. The Q-value for the state-action pair is updated using the TD update rule: Q(s, a) ← Q(s, a) + α[r + γ · max_{a′} Q(s′, a′) - Q(s, a)] (see the sketch after this list).
• Iteration: The process repeats for multiple episodes until the
agent learns the optimal policy.
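These steps can be expressed as a short tabular Q-learning sketch. The `env` interface (`reset()` and `step(action)` returning next state, reward, and a done flag) and the hyperparameter values are assumptions made only for illustration.

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: initialize, explore with epsilon-greedy, apply the TD update, iterate."""
    Q = defaultdict(float)                     # Q-table: (state, action) -> value, initialized to 0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Exploration: random action with probability epsilon, otherwise exploit the Q-table.
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # TD update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in range(n_actions))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```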
Deep Reinforcement Learning Models
• Deep reinforcement learning (Deep RL) models combine the principles of
traditional RL with deep learning, allowing the agent to handle complex
environments with high-dimensional inputs, such as images or continuous
action spaces.
• Deep RL models are powerful in handling complex and large-scale problems,
such as playing video games, robotics, and autonomous driving. They can
process high-dimensional data and learn features automatically without
manual feature engineering. Instead of using tables to store values, Deep RL
models utilize neural networks to approximate the value functions or policies,
explained in detail below:
Deep Q-Networks (DQN)
• DQN is a powerful algorithm in the field of RL. It
combines the principles of deep neural networks
with Q-learning, enabling agents to learn
optimal policies in complex environments.
• Traditional Q-Learning uses a table to store Q-
values for each state-action pair, which becomes
impractical for large state spaces.
• Instead of maintaining a table of Q-values for
each state-action pair, DQNs approximate the Q-
value function using a neural network
parameterized by weights θ. The network takes
a state as input and outputs Q-values for all
possible actions.
Architecture of Deep Q-Networks
1. Neural Network
The network approximates the Q-value function Q(s, a; θ), where θ represents the trainable parameters.
For example, in Atari games, the input might be raw pixels from the game
screen, and the output is a vector of Q-values corresponding to each possible
action.
2. Experience Replay
To stabilize training, DQNs store past experiences (s, a, r, s′) in a replay buffer.
During training, mini-batches of experiences are sampled randomly from the
buffer, breaking the correlation between consecutive experiences and
improving generalization.
3. Target Network
A separate target network with parameters θ⁻ is used to compute the
target Q-values during updates. The target network is periodically updated
with the weights of the main network to ensure stability.
Loss Function

• The loss function measures the difference between the predicted Q-values and the target Q-values:

L(θ) = E[(r + γ · max_{a′} Q(s′, a′; θ⁻) - Q(s, a; θ))²]
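This loss can be written out in code. Below is a minimal PyTorch sketch; the framework choice, network sizes, and batch format are assumptions for illustration, since the slides do not prescribe an implementation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def dqn_loss(online_net, target_net, batch, gamma=0.99):
    """L(theta) = E[(r + gamma * max_a' Q(s', a'; theta-) - Q(s, a; theta))^2]."""
    states, actions, rewards, next_states, dones = batch   # mini-batch sampled from the replay buffer
    q_sa = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s, a; theta)
    with torch.no_grad():                                   # the target network is held fixed here
        max_q_next = target_net(next_states).max(dim=1).values             # max_a' Q(s', a'; theta-)
        targets = rewards + gamma * (1.0 - dones) * max_q_next             # terminal transitions keep only r
    return nn.functional.mse_loss(q_sa, targets)
```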
Advantages

• High-Dimensional State Spaces: Traditional Q-Learning requires storing a Q-table, which becomes infeasible for large state spaces. Neural networks can generalize across states, making them suitable for complex environments.
• Continuous Input Data: Many real-world problems involve continuous inputs, such as pixel data from video frames. Neural networks excel at processing such data.
• Scalability: By leveraging the representational power of deep learning, DQNs can scale to solve tasks that were previously unsolvable with tabular methods.
Applications of RL
• Gaming: Training AI to outperform humans in complex games like chess, Go, and multiplayer online games.
• Autonomous Vehicles: Developing decision-making systems for self-driving cars, drones, and other
autonomous systems to navigate and operate safely.
• Robotics: Teaching robots to perform tasks such as assembly, walking, and complex manipulation through
adaptive learning.
• Finance: Enhancing strategies in trading, portfolio management, and risk assessment.
• Healthcare: Personalizing medical treatments, managing patient care, and assisting in surgeries with robotic
systems.
• Supply Chain Management: Optimizing logistics, inventory management, and distribution networks.
• Energy Management: Managing and distributing renewable energy in smart grids to enhance efficiency and
sustainability.
• Advertising: Optimizing ad placements and bidding strategies in real-time to maximize engagement and
revenue.
• Manufacturing: Automating and optimizing production lines and processes.
Transfer learning
• Transfer learning is a machine learning technique where knowledge gained from
solving one problem is applied to a different but related problem. Instead of
training a model from scratch for a new task, a pre-trained model on a similar
task is used, saving time and resources.
• It involves reusing a model pre-trained on one task and fine-tuning it for a new,
related task.
• It's particularly beneficial when you have limited data for the new task, as the
pre-trained model already has learned a lot of useful features.
• Typically, training a model takes a large amount of compute resources, data and
time. Using a pretrained model as a starting point helps cut down on all three, as
developers don't have to start from scratch, training a large model on what would
be an even bigger data set.
How to use Transfer Learning
• Transfer learning can be accomplished in several ways. One way is to find a related
learned task -- labeled as Task B -- that has plenty of transferable labeled data. The
new model is then trained on Task B. After this training, the model has a starting
point for solving its initial task, Task A.
• Another way to accomplish transfer learning
is to use a pretrained model. This process is
easier, as it involves the use of an already
trained model. The pretrained model should
have been trained using a large data set to
solve a similar task as task A. Models can be
imported from other developers who have
published them online.
Types of transfer learning
• Transfer learning methods fall into one of the following three
categories:

• Transductive transfer. Target tasks are the same but use different data
sets.
• Inductive transfer. Source and target tasks are different, regardless of
the data set. Source and target data are typically labeled.
• Unsupervised transfer. Source and target tasks are different, but the
process uses unlabeled source and target data. Unsupervised learning
is useful in settings where manually labeling data is impractical.
Transfer learning examples
• In machine learning, knowledge or data gained while solving one problem is
stored, labeled and then applied to a different but related problem. In NLP, for
example, a data set from an old model that understands the vocabulary used in
one area can be used to train a new model whose goal is to understand dialects
in multiple areas. An organization could then apply this for sentiment analysis.
• Transfer learning is also useful during the deployment of upgraded technology,
such as a chatbot. If the new domain is similar enough to previous deployments,
transfer learning can assess which knowledge should be transplanted. Using
transfer learning, developers can decide what knowledge and data is reusable
from the previous deployments and transfer that information for use when
developing the upgraded version.
Pre-trained model
• A pre-trained model is a model created by someone else to solve a similar problem. Instead of building a model from scratch, you use a model trained on another problem as a starting point.
• For example, if you want to build a self-driving car, you can spend years building a decent image recognition algorithm from scratch, or you can take the Inception model (a pre-trained model) from Google, which was trained on ImageNet data, to identify objects in images.
• A pre-trained model may not be 100% accurate in your application, but it saves the huge effort required to re-invent the wheel, as the following examples show.
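As a concrete illustration of starting from a pretrained model, here is a minimal fine-tuning sketch using torchvision's ResNet-18 in place of Inception (an assumed substitution; the same pattern applies, and it requires torchvision 0.13 or newer for the `weights` argument).

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pre-trained on ImageNet (torchvision >= 0.13 accepts a weights name here).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained feature extractor so its learned features are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for the new task (10 classes is a placeholder).
model.fc = nn.Linear(model.fc.in_features, 10)

# Only model.fc is now trainable; fine-tune it on the new, smaller dataset with a normal training loop.
```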
Some real-world examples
Examples where pre-trained NLP models are used
• Sentiment Analysis is an NLP task where a model tries to identify if the
given text has positive, negative, or neutral sentiment. Sentiment analysis can
be used in many real-world scenarios like customer support chatbots and
spam detection. Pre-trained NLP models for sentiment analysis are provided
by open-source models and libraries such as BERT, NLTK, spaCy, and Stanford NLP.
• Text Summarization is an NLP task where a model tries to summarize the
input text into a shorter version in an efficient way that preserves all
important information from the input text. NER, NMT, and Sentiment Analysis
models are often used as part of the pipeline for pre-processing input text
before sending it over to a summarization model.
• Automated Question Answering Systems
• Speech Recognition
Markov Chain Monte Carlo Methods –
Sampling
• Markov Chain Monte Carlo (MCMC) is a family of algorithms used to sample from a
probability distribution, especially when direct sampling is difficult or impossible.
• It works by constructing a Markov chain whose stationary distribution is the target
distribution. The Markov chain property ensures that the next sample depends only
on the current sample, not the entire history.
• MCMC methods are a family of algorithms that use Markov Chains to perform Monte Carlo estimates.
• The name gives us a hint, that it is composed of two components – Monte Carlo and
Markov Chain. Let us understand them separately and in their combined form.
• Instead of sampling independently, MCMC constructs a Markov Chain whose
stationary distribution is the target distribution. By simulating the chain for a long
time, you obtain samples approximately distributed according to the target.
Monte Carlo Sampling
• Markov Chain: a stochastic process where the next state depends only on the current state.
• Monte Carlo sampling: a technique for sampling from a probability distribution and using those samples to approximate a desired quantity. In other words, it uses randomness to estimate some deterministic quantity of interest.
• Say we have an expectation to estimate; this could be a highly complex integral, or even intractable to compute directly. Using the Monte Carlo method, we approximate such quantities by averaging over samples.

Original expectation to be calculated: E[f(x)] = ∫ f(x) p(x) dx
Approximate expectation, generated by simulating a large number of samples x_i ~ p(x): E[f(x)] ≈ (1/N) Σ_{i=1}^{N} f(x_i)
• Computing the average over a large number of samples could reduce the
standard error and provide us with a fairly accurate approximation.
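A tiny sketch of this averaging idea follows, using a standard normal distribution as an easy-to-sample example; the choice of distribution and function is purely illustrative.

```python
import random

def monte_carlo_expectation(f, sampler, n=100_000):
    """Approximate E[f(X)] by averaging f over samples drawn from X's distribution."""
    return sum(f(sampler()) for _ in range(n)) / n

# Example: E[X^2] for X ~ N(0, 1) is exactly 1; the estimate should land close to it.
estimate = monte_carlo_expectation(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
print(estimate)
```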
• "This method has a limitation, for it assumes to easily sample from a probability
distribution, however doing so is not always possible. Sometimes, we can’t even
sample from the distribution. In such cases, we make use of Markov chains to
efficiently sample from an intractable probability distribution."
Sampling
• Sampling is a way to approximately estimate certain characteristics of the
whole population by taking a subset of the population into the study.
Sampling has various use cases :
• It could be used to approximate an intractable sum or integral.
• It could be used to provide a significant speedup in estimating tractable but
costly sum or integral.
• In some cases, like density estimation, it could simply be used to approximate a probability distribution and then impute missing data.
• Few Sampling techniques – Ancestral Sampling, Inverse Transform Sampling,
Rejection Sampling, Importance Sampling, Monte Carlo Sampling, MCMC
Sampling.
• The goal is to sample from the posterior distribution (in Bayesian
methods) or from a joint distribution when exact computation
(e.g., normalization) is intractable.
Common Sampling Strategies:
• Direct Sampling (if the distribution is simple)
• Rejection Sampling (see the sketch after this list)
• Importance Sampling
• MCMC Sampling (when above methods fail due to complexity)
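As one concrete example of these strategies, here is a minimal rejection-sampling sketch; the triangular target density, uniform proposal, and bound m = 2 are illustrative assumptions.

```python
import random

def rejection_sample(target_pdf, proposal_sample, proposal_pdf, m, n=1000):
    """Draw n samples from target_pdf, assuming target_pdf(x) <= m * proposal_pdf(x) everywhere."""
    samples = []
    while len(samples) < n:
        x = proposal_sample()                                           # candidate from the easy proposal
        if random.random() <= target_pdf(x) / (m * proposal_pdf(x)):    # accept with this probability
            samples.append(x)
    return samples

# Example: sample the triangular density p(x) = 2x on [0, 1] using a uniform proposal and m = 2.
samples = rejection_sample(
    target_pdf=lambda x: 2 * x,
    proposal_sample=lambda: random.random(),
    proposal_pdf=lambda x: 1.0,
    m=2.0,
)
```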
Algorithms:

• Metropolis-Hastings: Uses a proposal distribution and an acceptance rule to decide whether to accept a proposed sample (a minimal sketch follows below).
• Gibbs Sampling: A special case where you sample each variable conditional on all others, used when the conditionals are easier to sample from.
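A minimal Metropolis-Hastings sketch, assuming the target density is known only up to a normalizing constant and using a symmetric Gaussian random-walk proposal (so the proposal ratio cancels in the acceptance rule):

```python
import math
import random

def metropolis_hastings(unnorm_pdf, x0=0.0, n_samples=10_000, step=1.0):
    """Sample a 1-D distribution known up to a constant, via a Gaussian random-walk proposal."""
    x = x0
    samples = []
    for _ in range(n_samples):
        candidate = x + random.gauss(0.0, step)                          # propose a state near the current one
        accept_prob = min(1.0, unnorm_pdf(candidate) / unnorm_pdf(x))    # symmetric proposal cancels out
        if random.random() < accept_prob:
            x = candidate                                                # accept the proposed state
        samples.append(x)                                                # otherwise the chain stays put
    return samples

# Example: unnormalized standard normal density exp(-x^2 / 2); the chain approximates N(0, 1).
chain = metropolis_hastings(lambda x: math.exp(-0.5 * x * x))
```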
Graphical Models
• Graphical models use graphs to represent the conditional
dependence structure between random variables.
Types:
• Directed: Bayesian Networks (DAGs)
• Undirected: Markov Random Fields (MRFs)
They provide a compact representation of joint distributions, allow
efficient inference, and help visualize relationships.
Bayesian Networks (BNs)

• A Bayesian Network is a directed acyclic graph (DAG) where nodes represent variables and edges indicate direct probabilistic dependencies.
Key Features:
• Each node has a conditional probability distribution (CPD) given
its parents.
• The joint distribution is factorized as:
P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i))
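To make the factorization concrete, here is a small sketch that evaluates the joint probability of a full assignment from per-node CPTs. It uses the three-variable disease/test/symptom network from the following slides, with CPT values that are illustrative assumptions.

```python
# CPTs for the network A -> B, A -> C; the numeric values are illustrative assumptions.
P_A = {True: 0.01, False: 0.99}                               # P(A): prior probability of disease
P_B_given_A = {True: {True: 0.95, False: 0.05},               # P(B | A): test result given disease
               False: {True: 0.10, False: 0.90}}
P_C_given_A = {True: {True: 0.80, False: 0.20},               # P(C | A): symptoms given disease
               False: {True: 0.15, False: 0.85}}

def joint(a, b, c):
    """P(A, B, C) = P(A) * P(B | A) * P(C | A), i.e. the DAG factorization above."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_A[a][c]

# Example: disease present, test positive, symptoms present.
print(joint(True, True, True))   # 0.01 * 0.95 * 0.80 = 0.0076
```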
Markov Random Fields (MRFs)

• An undirected graphical model where nodes represent variables and edges encode conditional independence assumptions.
Key Features
• The joint distribution is represented using potential functions over cliques (fully connected subgraphs):
P(X) = (1/Z) ∏_{C ∈ cliques} φ_C(X_C)

Z is the normalization constant (partition function).

• Widely used in computer vision, image processing, and spatial statistics.
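Below is a toy sketch of the clique-potential formula for a two-variable binary MRF; the potential values are illustrative assumptions, and Z is computed by brute force, which is only feasible for very small models.

```python
from itertools import product

# A single pairwise clique potential over binary variables X1, X2 that favors agreement.
def phi(x1, x2):
    return 2.0 if x1 == x2 else 1.0

# Partition function Z: sum the unnormalized product of potentials over all configurations.
Z = sum(phi(x1, x2) for x1, x2 in product([0, 1], repeat=2))   # 2 + 1 + 1 + 2 = 6

def p(x1, x2):
    """P(X) = (1/Z) * product over cliques of phi_C(X_C); here there is only one clique."""
    return phi(x1, x2) / Z

print(p(0, 0), p(0, 1))   # agreeing states are twice as likely: 1/3 vs 1/6
```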
The following case combines three of the ideas above:
• Graphical models (Bayesian Networks)
• Sampling and inference
• MCMC (specifically, Gibbs Sampling)
Consider an example: Medical Diagnosis
Suppose you have a Bayesian Network that models the relationships
among the following binary variables:
• A: Has a disease (True/False)
• B: Test result (Positive/Negative)
• C: Has symptoms (Present/Absent)
The dependencies: A → B and A → C
Medical Diagnosis Example
The Bayesian Network can be formed as follows:

    A
   / \
  B   C

Each variable has a conditional probability table (CPT):
• P(A)
• P(B | A)
• P(C | A)
Goal: compute the posterior P(A | B = Positive, C = Present).
Direct computation is hard if the network is large or the CPTs are complex, so we use MCMC (specifically, Gibbs Sampling) to approximate the posterior.
Cont...
Note: Since B and C are observed, we fix them during sampling.
1. Initialize all variables randomly (e.g., A = False, B = Positive, C = Present).

2. Loop: At each step, sample only the unobserved variables, which is just A here.
To sample A, use the conditional distribution
P(A | B, C) ∝ P(A) · P(B | A) · P(C | A)
Since B and C are fixed, we compute this for both A = True and A = False, normalize, and sample a new value of A.

3. Repeat many times (e.g., 10,000 iterations), discarding early samples (burn-in).

4. Estimate Posterior:
After sampling, compute

P̂(A = True | B = Positive, C = Present) = (number of samples where A = True) / (total samples after burn-in)
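A minimal sketch of these steps in code follows, reusing the illustrative CPT values assumed in the earlier factorization sketch (the probabilities are not given on the slides). Since only A is unobserved, the Gibbs loop resamples just that one variable.

```python
import random

# Illustrative CPT values (assumed, not from the slides), with B = Positive and C = Present observed.
P_A = {True: 0.01, False: 0.99}
P_B_given_A = {True: 0.95, False: 0.10}    # P(B = Positive | A)
P_C_given_A = {True: 0.80, False: 0.15}    # P(C = Present | A)

def gibbs_posterior(n_iter=10_000, burn_in=1_000):
    """Estimate P(A = True | B = Positive, C = Present); only A is unobserved, so only A is resampled."""
    a = random.random() < 0.5                              # step 1: random initialization of A
    count_true = kept = 0
    for i in range(n_iter):
        # Step 2: unnormalized P(A | B, C) proportional to P(A) * P(B | A) * P(C | A) for each value of A.
        w_true = P_A[True] * P_B_given_A[True] * P_C_given_A[True]
        w_false = P_A[False] * P_B_given_A[False] * P_C_given_A[False]
        a = random.random() < w_true / (w_true + w_false)  # normalize and sample a new value of A
        if i >= burn_in:                                   # step 3: discard the burn-in samples
            kept += 1
            count_true += int(a)
    return count_true / kept                               # step 4: fraction of samples with A = True

print(gibbs_posterior())
```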
Concept | Type | Key Feature
MCMC | Sampling method | Uses Markov Chains to sample from complex distributions
Sampling | General concept | Drawing samples from a distribution
Proposal Distribution | MCMC component | Suggests new candidate states
Metropolis-Hastings / Gibbs | MCMC algorithms | Methods to simulate Markov Chains
Graphical Models | Modeling tool | Represent probabilistic dependencies with graphs
Bayesian Networks | Directed graph | Encode causal/conditional dependencies
Markov Random Fields | Undirected graph | Encode symmetric relationships using potentials
Case Studies: Real-World ML Applications

Case 1: Google DeepMind – Diabetic Retinopathy Detection

• Diabetic retinopathy is a leading cause of blindness, and early detection is crucial.
• DeepMind developed an AI model that analyzes eye images to
detect signs of the disease.
• The system achieved expert-level accuracy, enabling faster and
more scalable diagnoses, especially in underserved areas.
Case 2: PayPal – Fraud Detection

• Problem: Online transactions are vulnerable to fraud, leading to financial losses.
• Solution: PayPal implemented a machine learning system that
analyzes millions of transactions in real time to detect anomalies.
• Results: The AI-driven approach significantly improved fraud detection, reducing unauthorized transactions.
Future Trends in Machine Learning
• Quantum Computing & ML – Quantum algorithms will enable
faster and more complex computations, revolutionizing areas like
cryptography and optimization.
• Edge Computing for Real-Time Analytics – Processing data closer
to its source will enhance efficiency in applications like
autonomous vehicles and industrial automation.
• AI-Driven Predictive Analytics – Improved forecasting models will
provide deeper insights in finance, healthcare, and climate
science.
• Self-Learning Robots – Robotics will advance with AI-driven
adaptability, making machines more efficient in manufacturing
and logistics.
• Autonomous Transportation – Machine learning will continue to
refine navigation and safety in self-driving vehicles.
• Autonomous Agents – AI-powered agents will become more prevalent, handling complex tasks independently.
