
AI UNIT 3

Difference between active and passive reinforcement learning

Passive Learning vs Active Learning

• Passive learning uses a large set of pre-labeled data to train the algorithm; active learning starts with a small set of labeled data and requests additional data from the user.
• In passive learning, the algorithm does not interact with the user; in active learning, the algorithm interacts with the user to acquire additional data.
• Passive learning does not require user input after training is complete; active learning may continue to request additional data until a satisfactory level of accuracy is achieved.
• Passive learning is suitable for applications where a large dataset is available; active learning is suitable for applications where labeled data is scarce or expensive to acquire.

Passive Learning:

Passive learning, also known as batch learning, is a method of acquiring data by processing a large set of pre-labeled data. In passive learning, the
algorithm uses all the available data to learn and improve its performance.
The algorithm does not interact with the user or request additional data to
improve its accuracy.
Example:- An example of passive learning is training a machine learning
model to classify emails as spam or not spam. The algorithm is fed a large
dataset of labeled emails and uses it to learn how to identify spam emails.
Once the training is complete, the algorithm can accurately classify new
emails without any further input from the user.
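As a rough illustration of this batch setting, here is a minimal sketch (assuming scikit-learn is available; the tiny in-line emails and labels are invented for the example) of training a spam classifier once on pre-labeled data and then using it without further user input:

```python
# Passive (batch) learning sketch: train once on a fixed, pre-labeled dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting agenda for monday",
          "cheap loans click here", "lunch at noon?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (pre-labeled by a human)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)      # bag-of-words features

model = MultinomialNB()
model.fit(X, labels)                      # single batch training pass

# After training, the model classifies new emails with no further user input.
print(model.predict(vectorizer.transform(["free prize waiting for you"])))
```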

Active Learning:

Active learning is a method of acquiring data where the algorithm interacts with the user to acquire additional data to improve its accuracy. In active
learning, the algorithm starts with a small set of labeled data and requests the
user to label additional data. The algorithm uses the newly labeled data to
improve its performance and may continue to request additional data until a
satisfactory level of accuracy is achieved.
Example:- An example of active learning is training a machine learning model
to recognize handwritten digits. The algorithm may start with a small set of
labeled data and ask the user to label additional data that the algorithm is
uncertain about. The algorithm uses the newly labeled data to improve its
accuracy, and the process repeats until the algorithm can accurately
recognize most handwritten digits.
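A minimal sketch of this query loop is below, assuming scikit-learn and simple uncertainty sampling (one of several possible query strategies). In a real setting the newly requested labels would come from the user; here they are simply read from the stored targets for illustration:

```python
# Active learning sketch: start with few labels, repeatedly query the most
# uncertain samples and add their labels to the training pool.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
labeled = list(range(20))                      # small initial labeled pool
unlabeled = list(range(20, len(X)))

model = LogisticRegression(max_iter=1000)
for _ in range(5):                             # five query rounds
    model.fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[unlabeled])
    uncertainty = 1 - probs.max(axis=1)        # low top-class probability = uncertain
    query = np.argsort(uncertainty)[-10:]      # 10 most uncertain samples
    newly_labeled = [unlabeled[i] for i in query]
    labeled += newly_labeled                   # "user" supplies these labels
    unlabeled = [i for i in unlabeled if i not in newly_labeled]

print("accuracy on remaining pool:", model.score(X[unlabeled], y[unlabeled]))
```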

Explain the policy and types of policy


A policy in machine learning is a set of rules or strategies that defines the
agent’s behavior in a given environment, guiding its decision-making process
to achieve specific objectives.
In machine learning, a policy refers to a set of rules or strategies that dictate
the decision-making process of an agent within a specific environment. This
concept is often associated with reinforcement learning, a type of machine
learning where an agent interacts with an environment, takes actions, and
receives feedback in the form of rewards or penalties.
The primary goal of a policy is to guide the agent in making decisions that
maximize cumulative rewards over time. The policy serves as a mapping
function that takes the current state of the environment as input and outputs
the action the agent should take in that particular state. Essentially, it defines
the behavior of the agent by providing a strategy for selecting actions in
different situations.
There are two main types of policies in reinforcement learning:
1. Deterministic Policy:
• In a deterministic policy, the action to be taken in a specific state is fixed
and does not vary. Given the same state, the agent will always take the
same action.
2. Stochastic Policy:
• A stochastic policy, on the other hand, involves randomness in decision-
making. Even in the same state, the agent may choose different actions
with certain probabilities. This introduces exploration in the learning
process, allowing the agent to discover potentially better strategies.
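As a small illustrative sketch (the states, actions, and probabilities below are invented for the example), the contrast between the two policy types can be seen directly in code:

```python
# Deterministic vs stochastic policy sketch.
import numpy as np

ACTIONS = ["left", "right", "up", "down"]

def deterministic_policy(state):
    # Always returns the same action for the same state.
    table = {"s0": "right", "s1": "up"}
    return table[state]

def stochastic_policy(state, rng=np.random.default_rng(0)):
    # Samples an action from a state-dependent distribution,
    # which naturally injects exploration.
    table = {"s0": [0.7, 0.1, 0.1, 0.1], "s1": [0.25, 0.25, 0.25, 0.25]}
    return rng.choice(ACTIONS, p=table[state])

print(deterministic_policy("s0"))   # 'right' every time
print(stochastic_policy("s0"))      # usually 'right', sometimes another action
```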
The learning process in reinforcement learning involves optimizing the policy
to improve the agent’s performance over time. This is often done through
iterative exploration and exploitation, where the agent explores new actions
to discover their effects and exploits known actions to maximize rewards.
Several algorithms, such as Q-learning and policy gradient methods, are
employed to train the agent’s policy. Q-learning, for example, focuses on
learning a value function that estimates the expected cumulative rewards for
taking a particular action in a given state. Policy gradient methods directly
optimize the policy by adjusting its parameters based on the observed
rewards.
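For reference, the standard one-step Q-learning update (the usual textbook form, not spelled out in these notes) is, with learning rate $\alpha$ and discount factor $\gamma$:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$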

Explain the ON policy and OFF policy of reinforcement learning


ON-Policy vs. OFF-Policy Reinforcement Learning

Reinforcement Learning is a machine learning technique where an agent learns to make decisions by interacting with an environment. The agent's goal is to maximize a reward signal over time. There are two main approaches to reinforcement learning: ON-Policy and OFF-Policy.

ON-Policy Learning

• Definition: In ON-Policy learning, the agent learns a policy and follows that policy while interacting with the environment. The agent's behavior is directly influenced by the policy it is learning.
• Key Characteristics:
• The agent's actions are determined by the current policy.
• The agent's experiences are used to update the policy.
• Examples of ON-Policy algorithms: SARSA, Policy Gradient.

OFF-Policy Learning

• Definition: In OFF-Policy learning, the agent learns a target policy while following a different behavior policy. This allows the agent to learn from past experiences, even if they were generated by a different policy.
• Key Characteristics:
• The agent follows a behavior policy to interact with the environment.
• The agent's experiences are used to improve a target policy.
• Examples of OFF-Policy algorithms: Q-Learning, Deep Q-Networks (DQN).

Why use OFF-Policy learning?

• Exploration: OFF-Policy methods can explore the environment more effectively, as they are not strictly tied to a single policy.
• Learning from past experiences: OFF-Policy methods can learn from past experiences, even if they were generated by a different policy.
• Handling complex environments: OFF-Policy methods can be more suitable for complex environments where the optimal policy is difficult to learn directly.
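The on-policy/off-policy distinction shows up directly in the one-step update rules. Below is a sketch (Q is assumed to be a dict mapping (state, action) pairs to values; the helper names are mine, not from the notes): SARSA bootstraps from the action the agent actually takes next, while Q-learning bootstraps from the greedy action regardless of what the behavior policy does.

```python
# One-step SARSA (on-policy) vs Q-learning (off-policy) updates.
# alpha = learning rate, gamma = discount factor.

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # ON-policy: bootstrap from the action the agent actually takes next.
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # OFF-policy: bootstrap from the best action in the next state,
    # even if the behavior policy (e.g. epsilon-greedy) picks another one.
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```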

Write in short about various reinforcement algorithms

Reinforcement learning algorithms can be broadly categorized into two main types: Value-Based and Policy-Based.

Value-Based Algorithms

• Q-Learning: Learns a Q-function that estimates the expected future reward for taking a particular action in a given state.
• SARSA: Similar to Q-Learning, but uses on-policy learning, meaning the agent's actions are determined by the current policy.
• Deep Q-Networks (DQN): Combines Q-Learning with deep learning to handle complex state spaces.

Policy-Based Algorithms

• Policy Gradient: Directly optimizes the policy function to maximize expected rewards.
• Actor-Critic: Combines value-based and policy-based methods, using a critic to estimate the value function and an actor to update the policy.

Model-Free vs. Model-Based Algorithms

• Model-Free: Learn directly from experience without building a model of the environment.
• Model-Based: Build a model of the environment to predict the consequences of actions and plan accordingly.

Other Notable Algorithms:

• Monte Carlo Methods: Estimate the value function based on the average return from multiple simulations.
• Temporal Difference (TD) Learning: Combines elements of Monte Carlo methods and dynamic programming.

The choice of algorithm depends on the specific problem and its characteristics, such as the complexity of the environment, the nature of the reward function, and the desired trade-offs between exploration and exploitation.
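To make the value-based, model-free idea concrete, here is a minimal tabular Q-learning sketch on a made-up five-state corridor (purely illustrative; a real environment would replace the step function):

```python
# Tabular Q-learning on a tiny corridor: action 1 moves right, action 0 moves
# left, and only reaching the last state gives reward +1.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

for episode in range(300):
    s = 0
    for t in range(100):                          # cap episode length
        if rng.random() < epsilon:                # explore
            a = rng.integers(n_actions)
        else:                                      # exploit, breaking ties randomly
            best = np.flatnonzero(Q[s] == Q[s].max())
            a = rng.choice(best)
        s_next, r, done = step(s, a)
        # off-policy TD target uses the greedy value of the next state
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if done:
            break

# Greedy policy read off from Q: the first four states should prefer action 1 (right).
print(np.argmax(Q, axis=1))
```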

Explain various applications of reinforcement learning


Reinforcement Learning is a sub-field of Machine Learning which itself is a sub-
field of Artificial Intelligence. It implies:
Artificial Intelligence -> Machine Learning -> Reinforcement Learning

1. RL in Marketing

Marketing is all about promoting and then selling the products or services of your own brand or someone else's. A key challenge in marketing is finding the audience that yields the largest return on the investment you or your company is making.
This is one of the reasons companies are investing heavily in managing various digital marketing campaigns. Through real-time bidding, which is well suited to the fundamental capabilities of RL, your company and others, smaller or larger, can expect:

• more display ad impressions in real-time.


• increased ROI, profit margins.
• predicting the choices, reactions, and behavior of customers towards your
products/services.
2. RL in Broadcast Journalism

Through different types of Reinforcement Learning, attracting likes and views along with tracking reader behavior becomes much simpler. Besides, recommending news that suits the frequently changing preferences of readers and other online users becomes achievable, since journalists can now be equipped with an RL-based system that keeps an eye on intuitive news content as well as the headlines. Reinforcement Learning offers further advantages to readers all around the world:

• News producers are now able to receive the feedback of their users
instantaneously.
• Increased communication, as users are more expressive now.
• Less room for disinformation and hatred.

3. RL in Healthcare

Healthcare is an important part of our lives, and through DTRs (Dynamic Treatment Regimes, a sequence-based use case of RL), doctors can discover the treatment type, appropriate doses of drugs, and the timings for taking such doses. How is this possible? DTRs are equipped with:

• a sequence of rules which confirm the current health status of a patient.


• Then, they optimally propose treatments for diseases such as diabetes, HIV, cancer, and mental illness.
If required, these DTRs can reduce or remove the delayed impact of treatments through their multi-objective healthcare optimization solutions.

4. RL in Robotics

Robotics, without a doubt, involves training a robot in such a way that it can perform tasks just like a human being. But there is still a bigger challenge the robotics industry faces today: robots are not able to use common sense while making various moral and social decisions. Here, a combination of Deep Learning and Reinforcement Learning, i.e. Deep Reinforcement Learning, comes to the rescue, equipping robots with a "Learn How To Learn" model. With this, robots can now:

• manipulate various objects visible to them by grasping them well.
• solve complicated tasks that even humans struggle with, since robots now know what and how to learn from the different levels of abstraction of the datasets available to them.

5. RL in Gaming

Gaming is something that a huge number of people today can't live without. With game optimization through Reinforcement Learning algorithms, we can expect better performance in our favorite adventure, action, or mystery games.
AlphaGo is a good example. This computer program defeated a professional Go (a challenging classical board game) player in October 2015 and went on to defeat the world's strongest Go players. The key to AlphaGo's success was Reinforcement Learning, which kept making it stronger as it was constantly exposed to unexpected game situations.
Like AlphaGo, there are many other examples. You can also optimize your favorite games by applying prediction models that learn how to win even in complex situations through RL-enabled strategies.

6. RL in Image Processing

Image Processing is another important method of enhancing the current version of an image to extract some useful information from it. There are some steps associated with it, such as:

• Capturing the image with machines like scanners.


• Analyzing and manipulating it.
• Using the output image obtained after analysis for representation,
description-purposes.
Here, ML models like Deep Neural Networks (often trained within a Reinforcement Learning framework) can be leveraged to simplify this trending image processing method. With Deep Neural Networks, you can either enhance the quality of a specific image or hide its information, and later use it for any of your computer vision tasks.

7. RL in Manufacturing

Manufacturing is all about producing goods that satisfy our basic needs and essential wants. Cobot manufacturers (makers of collaborative robots that can perform various manufacturing tasks alongside a workforce of more than 100 people) are helping a lot of businesses with their own RL solutions for packaging and quality testing. Their use is making the manufacturing of quality products faster, which helps avoid negative customer feedback. And the less negative feedback there is, the better the product's performance and sales margins.
Difference between negative and positive reinforcement
learning

Positive Reinforcement

One important strategy for changing behavior is positive reinforcement. It provides rewards or incentives after desired actions, encouraging individuals
to continue the behavior in the future.
Sticker charts are useful tools for implementing positive reinforcement. They
allow children to earn stickers for completing tasks. These stickers can lead
to bigger rewards, teaching the child about setting goals and waiting for
rewards. It's important to customize the rewards to fit the individual's
preferences. Verbal praise or affection may be more effective for some, while
others may prefer tangible rewards. Common examples of positive reinforcers include:

• Additional playtime or screen time


• Exclusive excursions or events
• Small indulgences or playthings
• Benefits, such as selecting a family film or meal
• Motivating expressions or actions, like a congratulatory hand gesture or
an embrace

It is essential to maintain consistency and timeliness when implementing positive reinforcement. It requires promptly rewarding the desired behavior
each time it happens to reinforce the link between the behavior and the reward,
which boosts the chances of it being repeated in the future. Positive
reinforcement, which involves adding something desirable like praise or
rewards after the behavior, can effectively promote desired behaviors when
tailored to individual preferences and needs.

Negative Reinforcement

By eliminating an unpleasant input when a desired action is exhibited, negative reinforcement works to enhance the likelihood of that behavior occurring.
Eliminating the unpleasant stimulus reinforces the behavior and serves as a
reward, increasing the likelihood that it will occur again. Negative
reinforcement concentrates on getting rid of something unwanted, as opposed
to positive reinforcement, which adds something desired.
When a kid wakes up early and turns off an annoying alarm, for instance, the unpleasant stimulus (the alarm) is removed after the desirable behavior (waking up on time) is achieved. By taking away the alarm, you are rewarding the child for deciding to get up on time and making it more likely that they will do so in the future.

Other instances of negative reinforcement include:

• A student putting in effort to avoid a low grade on a test


• A driver reducing speed to avoid getting a speeding ticket
• Completing work quickly in order to avoid receiving negative feedback
from their supervisor
• A person taking medicine to relieve pain or discomfort

It is critical to recognize the distinction between punishment and negative reinforcement. While both entail an unpleasant stimulus, the goal of
punishment is to make undesirable conduct less frequent by attaching an
unpleasant consequence. Negative reinforcement, on the other hand, seeks to
encourage a desired behavior by eliminating an unpleasant experience.

Define reinforcement learning and explain various terms related to it

Reinforcement Learning (RL) is a branch of machine learning focused on
making decisions to maximize cumulative rewards in a given situation. Unlike
supervised learning, which relies on a training dataset with predefined answers,
RL involves learning through experience. In RL, an agent learns to achieve a
goal in an uncertain, potentially complex environment by performing actions
and receiving feedback through rewards or penalties.
Key Concepts of Reinforcement Learning

• Agent: The learner or decision-maker.


• Environment: Everything the agent interacts with.
• State: A specific situation in which the agent finds itself.
• Action: All possible moves the agent can make.
• Reward: Feedback from the environment based on the action taken.

How Reinforcement Learning Works

RL operates on the principle of learning optimal behavior through trial and error.
The agent takes actions within the environment, receives rewards or penalties,
and adjusts its behavior to maximize the cumulative reward. This learning
process is characterized by the following elements:

• Policy: A strategy used by the agent to determine the next action based on
the current state.
• Reward Function: A function that provides a scalar feedback signal based
on the state and action.
• Value Function: A function that estimates the expected cumulative reward
from a given state.
• Model of the Environment: A representation of the environment that helps
in planning by predicting future states and rewards.
Main points in Reinforcement learning –
• Input: The input should be an initial state from which the model will start
• Output: There are many possible outputs as there are a variety of solutions
to a particular problem
• Training: The training is based upon the input. The model will return a state, and the user will decide to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
Types of Reinforcement:

1. Positive: Positive reinforcement occurs when an event, occurring because of a particular behavior, increases the strength and frequency of that behavior. In other words, it has a positive effect on behavior.
Advantages of positive reinforcement:

• Maximizes performance
• Sustains change for a long period of time
Disadvantage: too much reinforcement can lead to an overload of states, which can diminish the results.
2. Negative: Negative reinforcement is defined as the strengthening of a behavior because a negative condition is stopped or avoided.
Advantages of negative reinforcement:

• Increases behavior
• Encourages meeting a minimum standard of performance
Disadvantage: it only provides enough to meet the minimum behavior.

Elements of Reinforcement Learning

i) Policy: Defines the agent’s behavior at a given time.


ii) Reward Function: Defines the goal of the RL problem by providing
feedback.
iii) Value Function: Estimates long-term rewards from a state.
iv) Model of the Environment: Helps in predicting future states and rewards for
planning.
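The terms above map onto code roughly as follows (an illustrative skeleton only; the number-line environment, its reward, and the random policy are invented for the example):

```python
# Agent-environment interaction loop illustrating the RL vocabulary.
import random

class Environment:
    """Everything the agent interacts with."""
    def __init__(self):
        self.state = 0                              # state: the agent's current situation

    def step(self, action):
        # action: a move chosen by the agent (+1 or -1 on a number line, floored at -3)
        self.state = max(-3, self.state + action)
        reward = 1 if self.state == 3 else 0        # reward: feedback for the action taken
        done = self.state == 3
        return self.state, reward, done

class Agent:
    """The learner or decision-maker."""
    def policy(self, state):
        # policy: a strategy mapping the current state to the next action
        return random.choice([-1, 1])

env, agent = Environment(), Agent()
state, done, total_reward = env.state, False, 0
while not done:
    action = agent.policy(state)
    state, reward, done = env.step(action)
    total_reward += reward                          # cumulative reward the agent tries to maximize
print("episode return:", total_reward)
```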

Short note on EM algorithm


What is an EM algorithm?

The Expectation-Maximization (EM) algorithm is defined as the combination of various unsupervised machine learning algorithms, which is used to determine
the local maximum likelihood estimates (MLE) or maximum a posteriori
estimates (MAP) for unobservable variables in statistical models. Further, it is
a technique to find maximum likelihood estimation when the latent variables
are present. It is also referred to as the latent variable model.

A latent variable model consists of both observable and unobservable variables, where observable variables can be predicted while unobserved ones are inferred
from the observed variable. These unobservable variables are known as latent
variables.

• It is known as the latent variable model to determine MLE and MAP parameters for latent variables.
• It is used to predict values of parameters in instances where data is
missing or unobservable for learning, and this is done until convergence
of the values occurs.

EM Algorithm

The EM algorithm is the combination of various unsupervised ML algorithms, such as the k-means clustering algorithm. Being an iterative approach, it
consists of two modes. In the first mode, we estimate the missing or latent
variables. Hence it is referred to as the Expectation/estimation step (E-step).
Further, the other mode is used to optimize the parameters of the models so
that it can explain the data more clearly. The second mode is known as the
maximization-step or M-step.
• Expectation step (E - step): It involves the estimation (guess) of all
missing values in the dataset so that after completing this step, there
should not be any missing value.
• Maximization step (M - step): This step involves the use of estimated data
in the E-step and updating the parameters.
• Repeat E-step and M-step until the convergence of the values occurs.

The primary goal of the EM algorithm is to use the available observed data of
the dataset to estimate the missing data of the latent variables and then use
that data to update the values of the parameters in the M-step.

What is Convergence in the EM algorithm?

Convergence is defined as the specific situation in probability based on intuition, e.g., if there are two random variables that have a very small difference in their probability, then they are said to have converged. In other words,
whenever the values of given variables are matched with each other, it is called
convergence.
Steps in EM Algorithm

The EM algorithm is completed mainly in 4 steps, which include the Initialization Step, Expectation Step, Maximization Step, and Convergence Step. These steps
are explained as follows:

• 1st Step: The very first step is to initialize the parameter values. Further,
the system is provided with incomplete observed data with the
assumption that data is obtained from a specific model.
• 2nd Step: This step is known as Expectation or E-Step, which is used to
estimate or guess the values of the missing or incomplete data using the
observed data. Further, E-step primarily updates the variables.
• 3rd Step: This step is known as Maximization or M-step, where we use
complete data obtained from the 2nd step to update the parameter values.
Further, M-step primarily updates the hypothesis.
• 4th step: The last step is to check if the values of latent variables are
converging or not. If it gets "yes", then stop the process; else, repeat the
process from step 2 until the convergence occurs.
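As a concrete sketch of these four steps, here is EM for a two-component, one-dimensional Gaussian mixture (the synthetic data and the choice of model are assumptions made for illustration, not part of the notes):

```python
# EM sketch for a 1-D two-component Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1.5, 300)])

# 1st step: initialise the parameters
pi = np.array([0.5, 0.5])          # mixing weights
mu = np.array([-1.0, 1.0])         # means
var = np.array([1.0, 1.0])         # variances

def gaussian(x, mean, variance):
    return np.exp(-(x - mean) ** 2 / (2 * variance)) / np.sqrt(2 * np.pi * variance)

prev_ll = -np.inf
for iteration in range(100):
    # 2nd step (E-step): responsibility of each component for each data point
    weighted = np.stack([pi[k] * gaussian(data, mu[k], var[k]) for k in range(2)], axis=1)
    resp = weighted / weighted.sum(axis=1, keepdims=True)

    # 3rd step (M-step): update parameters using the responsibilities
    Nk = resp.sum(axis=0)
    pi = Nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / Nk
    var = (resp * (data[:, None] - mu) ** 2).sum(axis=0) / Nk

    # 4th step: check convergence of the log-likelihood
    ll = np.log(weighted.sum(axis=1)).sum()
    if abs(ll - prev_ll) < 1e-6:
        break
    prev_ll = ll

print("weights", pi, "means", mu, "variances", var)
```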

Applications of EM algorithm

The primary aim of the EM algorithm is to estimate the missing data in the latent
variables through observed data in datasets. The EM algorithm or latent
variable model has a broad range of real-life applications in machine learning.
These are as follows:

• The EM algorithm is applicable in data clustering in machine learning.


• It is often used in computer vision and NLP (Natural language processing).
• It is used to estimate the values of parameters in mixed models such as the Gaussian Mixture Model, and in quantitative genetics.
• It is also used in psychometrics for estimating item parameters and
latent abilities of item response theory models.
• It is also applicable in the medical and healthcare industry, for example in image reconstruction, and in structural engineering.
• It is used to determine the Gaussian density of a function.

Advantages of EM algorithm

• The first two basic steps of the EM algorithm, the E-step and the M-step, are very easy to implement for many machine learning problems.
• It is almost always guaranteed that the likelihood will increase after each iteration.
• It often generates a solution for the M-step in the closed form.
Disadvantages of EM algorithm

• The convergence of the EM algorithm is very slow.


• It may converge only to a local optimum.
• It takes both forward and backward probabilities into account (as in the forward-backward algorithm for hidden Markov models), in contrast to numerical optimization, which uses only forward probabilities.

*Expectation-Maximization (EM) Algorithm*

The Expectation-Maximization (EM) algorithm is an iterative optimization method that combines different unsupervised machine learning algorithms to
find maximum likelihood or maximum posterior estimates of parameters in
statistical models that involve unobserved latent variables. The EM algorithm
is commonly used for latent variable models and can handle missing data. It
consists of an estimation step (E-step) and a maximization step (M-step),
forming an iterative process to improve model fit.
In the E-step, the algorithm estimates the latent variables, i.e., it computes the expectation of the log-likelihood using the current parameter estimates.
In the M step, the algorithm determines the parameters that maximize the
expected log-likelihood obtained in the E step, and corresponding model
parameters are updated based on the estimated latent variables.
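In the usual textbook notation (X = observed data, Z = latent variables, θ = parameters; this formulation is standard, not quoted from the notes), the two steps can be written as:

$$\text{E-step:}\quad Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\big[\log p(X, Z \mid \theta)\big]$$

$$\text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta}\; Q(\theta \mid \theta^{(t)})$$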

By iteratively repeating these steps, the EM algorithm seeks to maximize the
likelihood of the observed data. It is commonly used for unsupervised
learning tasks, such as clustering, where latent variables are inferred and has
applications in various fields, including machine learning, computer vision,
and natural language processing.
*Key Terms in Expectation-Maximization (EM) Algorithm*
Some of the most commonly used key terms in the Expectation-Maximization
(EM) Algorithm are as follows:
Latent Variables: Latent variables are unobserved variables in statistical
models that can only be inferred indirectly through their effects on observable
variables. They cannot be directly measured but can be detected by their
impact on the observable variables.
Likelihood: It is the probability of observing the given data given the
parameters of the model. In the EM algorithm, the goal is to find the
parameters that maximize the likelihood.

Log-Likelihood: It is the logarithm of the likelihood function, which measures the goodness of fit between the observed data and the model. The EM algorithm
seeks to maximize the log-likelihood.
Maximum Likelihood Estimation (MLE): MLE is a method to estimate the
parameters of a statistical model by finding the parameter values that
maximize the likelihood function, which measures how well the model
explains the observed data.

Posterior Probability: In the context of Bayesian inference, the EM algorithm can be extended to estimate the maximum a posteriori (MAP) estimates,
where the posterior probability of the parameters is calculated based on the
prior distribution and the likelihood function.
Expectation (E) Step: The E-step of the EM algorithm computes the expected
value or posterior probability of the latent variables given the observed data
and current parameter estimates. It involves calculating the probabilities of
each latent variable for each data point.

Maximization (M) Step: The M-step of the EM algorithm updates the parameter
estimates by maximizing the expected log-likelihood obtained from the E-
step. It involves finding the parameter values that optimize the likelihood
function, typically through numerical optimization methods.
Convergence: Convergence refers to the condition when the EM algorithm
has reached a stable solution. It is typically determined by checking if the
change in the log-likelihood or the parameter estimates falls below a
predefined threshold.

*How Expectation-Maximization (EM) Algorithm Works:*


The essence of the Expectation-Maximization algorithm is to use the available
observed data of the dataset to estimate the missing data and then use that
data to update the values of the parameters.

Initialization:
Initially, a set of initial values of the parameters is considered. A set of
incomplete observed data is given to the system with the assumption that the
observed data comes from a specific model.

E-Step (Expectation Step): In this step, we use the observed data in order to
estimate or guess the values of the missing or incomplete data. It is basically
used to update the variables.
Compute the posterior probability or responsibility of each latent variable
given the observed data and current parameter estimates.

Estimate the missing or incomplete data values using the current parameter
estimates.

Compute the log-likelihood of the observed data based on the current parameter estimates and estimated missing data.

M-step (Maximization Step): In this step, we use the complete data generated
in the preceding “Expectation” – step in order to update the values of the
parameters. It is basically used to update the hypothesis.
Update the parameters of the model by maximizing the expected complete
data log-likelihood obtained from the E-step.
This typically involves solving optimization problems to find the parameter
values that maximize the log-likelihood.
The specific optimization technique used depends on the nature of the
problem and the model being used.

Convergence: In this step, it is checked whether the values are converging or not; if yes, then stop; otherwise repeat step 2 and step 3, i.e. the "Expectation" step and the "Maximization" step, until convergence occurs.
Check for convergence by comparing the change in log-likelihood or the
parameter values between iterations.
If the change is below a predefined threshold, stop and consider the
algorithm converged.
Otherwise, go back to the E-step and repeat the process until convergence is
achieved.
