Unit - 5
By
R.Sanghavi
Asst Professor
CSE(DS)
MALLA REDDY ENGINEERING COLLEGE (Autonomous)
Module 5:
Reinforcement Learning
Syllabus
Reinforcement Learning (Q-Learning, Deep Q-Networks) – Transfer Learning and Pretrained Models – Markov Chain Monte Carlo Methods – Sampling – Proposal Distribution – Markov Chain Monte Carlo – Graphical Models – Bayesian Networks – Markov Random Fields – Case Studies: Real-World Machine Learning Applications – Future Trends in Machine Learning
Reinforcement Learning
• Reinforcement Learning (RL) is a branch of machine learning that teaches agents how
to make decisions by interacting with an environment to achieve a goal. In RL, an
agent learns to perform tasks by trying different strategies to maximize cumulative
rewards based on feedback received through its actions.
1. Model-Based RL
• In a model-based reinforcement learning algorithm, the agent builds a model of
the environment's dynamics. This model predicts the next state and the
reward given the current state and action. The agent uses this model to plan
actions by simulating possible future scenarios before deciding on the best
action. This type of RL is appropriate for environments where building an
accurate model is feasible, allowing for efficient exploration and planning.
2. Model-Free RL
• A model-free reinforcement learning algorithm does not require a model of the
environment. Instead, the agent learns directly from interactions with the
environment by trial and error. The agent learns to associate actions with
rewards and uses this experience to improve decision-making over time. This
type of reinforcement learning is suitable for complex environments where
modeling the environment's dynamics is difficult or impossible.
RL Models
Traditional reinforcement learning models
• Traditional reinforcement learning models are based on the foundational
principles of RL, where an agent learns to make decisions through trial and error
by interacting with an environment. These models often rely on tabular methods,
like Q-learning and SARSA, which use a table or matrix to store and update the
values of different actions in various states.
• Q-Learning is a value-based method in which the agent learns the value of taking a
particular action in a specific state, aiming to maximize the cumulative reward over
time.
• SARSA is similar to Q-learning, but the agent updates its value estimates using the
action taken rather than the best possible action.
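Written out, the two update rules differ only in their target term (standard textbook forms; α is the learning rate and γ the discount factor):
Q-learning: Q(s, a) ← Q(s, a) + α · [r + γ · max_a' Q(s', a') − Q(s, a)]
SARSA: Q(s, a) ← Q(s, a) + α · [r + γ · Q(s', a') − Q(s, a)], where a' is the action actually taken in state s'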
Q-learning
• Q-learning is a model-free reinforcement learning algorithm used to train
agents (computer programs) to make optimal decisions by interacting with an
environment. It helps the agent explore different actions and learn which ones
lead to better outcomes. The agent uses trial and error to determine which
actions result in rewards (good outcomes) or penalties (bad outcomes).
• Over time, it improves its decision-making by updating a Q-table, which stores
Q-values representing the expected rewards for taking particular actions in
given states.
• While Q-Learning works well for small state-action spaces, it struggles to scale to high-dimensional environments such as images or continuous state spaces.
Working of Q-Learning
When Q-learning is scaled up with a neural network (a Deep Q-Network), the network parameters θ are trained by minimizing the following loss, where θ⁻ denotes the parameters of a periodically updated target network:
L(θ) = E[(r + γ · max_a' Q(s', a'; θ⁻) − Q(s, a; θ))²]
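As a minimal sketch of how tabular Q-learning works in practice (the toy 5-state corridor environment, reward values, and hyperparameters below are illustrative assumptions, not from the text):

import numpy as np

# Toy environment: a 1-D corridor of 5 states; action 0 moves left, action 1 moves
# right; reaching the rightmost state gives a reward of 1 and ends the episode.
n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.3      # learning rate, discount, exploration rate
Q = np.zeros((n_states, n_actions))        # the Q-table of expected rewards

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

rng = np.random.default_rng(0)
for episode in range(2000):
    s = 0
    for t in range(100):                   # cap episode length
        # Epsilon-greedy: usually exploit the best known action, sometimes explore.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if done:
            break

print(Q)   # after training, action 1 (move right) should have the higher value in every state

Deep Q-Networks replace the table Q with a neural network whose parameters are trained on the loss above.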
Transfer Learning
• Transfer learning reuses knowledge gained on a source task to improve learning on a related target task. It is commonly grouped into three types:
• Transductive transfer. Target tasks are the same but use different data
sets.
• Inductive transfer. Source and target tasks are different, regardless of
the data set. Source and target data are typically labeled.
• Unsupervised transfer. Source and target tasks are different, but the
process uses unlabeled source and target data. Unsupervised learning
is useful in settings where manually labeling data is impractical.
Transfer learning examples
• In machine learning, knowledge or data gained while solving one problem is
stored, labeled and then applied to a different but related problem. In NLP, for
example, a data set from an old model that understands the vocabulary used in
one area can be used to train a new model whose goal is to understand dialects
in multiple areas. An organization could then apply this for sentiment analysis.
• Transfer learning is also useful during the deployment of upgraded technology,
such as a chatbot. If the new domain is similar enough to previous deployments,
transfer learning can assess which knowledge should be transplanted. Using
transfer learning, developers can decide what knowledge and data is reusable
from the previous deployments and transfer that information for use when
developing the upgraded version.
Pre-trained model
• a pre-trained model is a model created by some one else to solve
a similar problem. Instead of building a model from scratch to
solve a similar problem, you use the model trained on other
problem as a starting point.
• For example, if you want to build a self learning car. You can
spend years to build a decent image recognition algorithm from
scratch or you can take inception model (a pre-trained model)
from Google which was built on ImageNet data to identify images
in those pictures.
• A pre-trained model may not be 100% accurate in your
application, but it saves huge efforts required to re-invent the
wheel. Let me show this to you with a recent example.
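A minimal sketch of this idea in code, assuming PyTorch and torchvision are installed (the 10-class output layer is an illustrative assumption; older torchvision versions use pretrained=True instead of the weights argument):

import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pre-trained on ImageNet as the starting point.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained feature extractor so its weights are not updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with one sized for the new task (10 classes here).
model.fc = nn.Linear(model.fc.in_features, 10)

# The model can now be fine-tuned on the new data set; only model.fc is trained.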
Some real-world examples
Examples where pre-trained NLP models are used
• Sentiment Analysis is an NLP task where a model tries to identify whether the given text has positive, negative, or neutral sentiment. Sentiment analysis can be used in many real-world scenarios, such as customer support chatbots and spam detection. Pre-trained models for sentiment analysis are available through models and open-source NLP libraries such as BERT, NLTK, spaCy, and Stanford NLP (see the sketch after this list).
• Text Summarization is an NLP task where a model tries to summarize the
input text into a shorter version in an efficient way that preserves all
important information from the input text. NER, NMT, and Sentiment Analysis
models are often used as part of the pipeline for pre-processing input text
before sending it over to a summarization model.
• Automated Question Answering Systems
• Speech Recognition
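As a minimal sketch of using a pre-trained sentiment analysis model (here via the Hugging Face transformers library, an assumption on top of the libraries listed above; the first call downloads a default pre-trained model):

from transformers import pipeline

# Load a ready-made, pre-trained sentiment-analysis pipeline.
classifier = pipeline("sentiment-analysis")

# Classify a customer-support style message.
print(classifier("The support team resolved my issue quickly, great service!"))
# Expected output format: [{'label': 'POSITIVE', 'score': 0.99...}]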
Markov Chain Monte Carlo Methods –
Sampling
• Markov Chain Monte Carlo (MCMC) is a family of algorithms used to sample from a
probability distribution, especially when direct sampling is difficult or impossible.
• It works by constructing a Markov chain whose stationary distribution is the target
distribution. The Markov chain property ensures that the next sample depends only
on the current sample, not the entire history.
• MCMC methods are a family of algorithms that use Markov chains to perform Monte Carlo estimation.
• The name gives us a hint that it is composed of two components – Monte Carlo and Markov Chain. Let us understand them separately and in their combined form.
• Instead of sampling independently, MCMC constructs a Markov Chain whose
stationary distribution is the target distribution. By simulating the chain for a long
time, you obtain samples approximately distributed according to the target.
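A minimal Metropolis-Hastings sketch of this idea, assuming a standard normal target density and a Gaussian random-walk proposal distribution (both choices are illustrative):

import numpy as np

def target_unnorm(x):
    # Unnormalized target density (here a standard normal, for illustration).
    return np.exp(-0.5 * x ** 2)

def metropolis_hastings(n_samples=10_000, proposal_std=1.0, seed=0):
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x = 0.0                                        # current state of the chain
    for i in range(n_samples):
        x_new = x + rng.normal(0.0, proposal_std)  # propose a move (random-walk proposal)
        # Accept the move with probability min(1, p(x_new) / p(x)).
        if rng.random() < target_unnorm(x_new) / target_unnorm(x):
            x = x_new
        samples[i] = x                             # next sample depends only on the current one
    return samples

samples = metropolis_hastings()
print(samples.mean(), samples.std())               # should be close to 0 and 1

The accepted states form a Markov chain whose stationary distribution is the target, so the later samples behave approximately like draws from it.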
Markov Chains and Monte Carlo Sampling
• Markov Chain: a stochastic process in which the next state depends only on the current state.
• Monte Carlo sampling: a technique for sampling from a probability distribution and using those samples to approximate a desired quantity. In other words, it uses randomness to estimate some deterministic quantity of interest.
• Say we have an expectation to estimate; this could be a highly complex or even intractable integral. Using the Monte Carlo method, we approximate such quantities by averaging over samples.
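A minimal sketch of such a Monte Carlo estimate, assuming we want E[X²] for X ~ N(0, 1), whose true value is 1:

import numpy as np

# Approximate E[X^2] for X ~ N(0, 1) by averaging over random samples.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=100_000)   # draw samples of X
estimate = np.mean(samples ** 2)               # Monte Carlo estimate of E[X^2]
print(estimate)                                # should be close to the true value 1.0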
Gibbs Sampling Example
• Consider a small Bayesian network with structure A → B and A → C, specified by the distributions P(A), P(B∣A), and P(C∣A). Suppose B and C are observed and we want the posterior over A.
1. Initialize: fix B and C to their observed values and give A an arbitrary starting value.
2. Loop: at each step, sample only the unobserved variables, which is just A here. To sample A, use the conditional distribution
P(A∣B,C) ∝ P(A) · P(B∣A) · P(C∣A)
Since B and C are fixed, we compute this for both A = True and A = False, normalize, and sample a new value of A.
3. Collect samples: repeat the loop many times, recording the sampled value of A at each iteration.
4. Estimate Posterior: after sampling, compute
P(A = True ∣ B, C) ≈ (number of samples with A = True) / (total number of samples)
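A minimal sketch of this Gibbs sampler in code (the probability tables below are illustrative assumptions, not values from the text):

import numpy as np

# P(A), P(B=True | A) and P(C=True | A) for the network A -> B, A -> C (illustrative values).
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: 0.8, False: 0.2}
P_C_given_A = {True: 0.9, False: 0.4}

def gibbs_posterior_A(b_obs=True, c_obs=True, n_iter=50_000, seed=0):
    rng = np.random.default_rng(seed)
    count_true = 0
    for _ in range(n_iter):                        # 2. Loop: resample the unobserved variable A
        # Unnormalized P(A | B, C) is proportional to P(A) * P(B|A) * P(C|A), for both values of A.
        weights = {}
        for val in (True, False):
            p_b = P_B_given_A[val] if b_obs else 1 - P_B_given_A[val]
            p_c = P_C_given_A[val] if c_obs else 1 - P_C_given_A[val]
            weights[val] = P_A[val] * p_b * p_c
        p_true = weights[True] / (weights[True] + weights[False])   # normalize
        a = rng.random() < p_true                  # sample a new value of A
        count_true += a
    return count_true / n_iter                     # 4. Posterior estimate of P(A=True | B, C)

print(gibbs_posterior_A())

Because A is the only unobserved variable, each Gibbs step here samples directly from the exact conditional; with several unobserved variables, each would be resampled in turn while the others are held fixed.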