Ai Unit 3
Passive Learning: The algorithm does not interact with the user.
Active Learning: The algorithm interacts with the user to acquire additional data.
In reinforcement learning, the agent's goal is to maximize a reward signal over time. There are two main approaches to learning a policy: on-policy and off-policy learning.
ON-Policy Learning
In on-policy learning, the agent learns the value of the policy it is currently following and improves that policy while interacting with the environment. The agent's behavior policy and the policy being learned are the same.
• Key Characteristics:
• The policy used to generate behavior is the same policy that is evaluated and improved.
OFF-Policy Learning
In off-policy learning, the agent learns about a target policy while following a different behavior policy. This allows the agent to learn from experience generated by another policy.
• Key Characteristics:
• The agent follows a behavior policy to interact with the environment.
• The agent evaluates and improves a separate target policy.
• Example: Deep Q-Networks (DQN).
Beyond the on-policy/off-policy distinction, reinforcement learning algorithms are commonly grouped as follows:
• Value-Based Algorithms: learn a value function that estimates expected rewards, and derive the policy from those estimates.
• Policy-Based Algorithms: optimize the policy to learn directly which action to take in each state, without requiring an explicit value function.
• Actor-Critic Algorithms: combine the two, using a critic to estimate the value function and an actor to update the policy.
• Model-Based Algorithms: learn a model of the environment and use it for planning.
Whatever the approach, the agent must balance exploration and exploitation.
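As a concrete illustration of the on-policy/off-policy distinction, the sketch below (an illustrative example, not part of the original notes; the Q-table `q` and the constants `alpha` and `gamma` are assumed names) contrasts the SARSA update, which bootstraps from the action the behavior policy actually takes next, with the Q-learning update, which bootstraps from the greedy action regardless of what the behavior policy does.

```python
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.9                  # learning rate and discount factor (assumed values)
q = np.zeros((n_states, n_actions))      # action-value estimates

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the target uses the action the current policy actually selected next.
    q[s, a] += alpha * (r + gamma * q[s_next, a_next] - q[s, a])

def q_learning_update(s, a, r, s_next):
    # Off-policy: the target uses the greedy (maximizing) action in the next state.
    q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])
```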
1. RL in Marketing
Marketing is about promoting and then selling products or services, whether for your own brand or someone else's. In the process of marketing, finding the right audience, the one that yields the largest return on investment, is a challenge in itself. It is one of the reasons companies invest heavily in managing their digital marketing campaigns. Through real-time bidding, which builds on the fundamental capabilities of RL, companies large and small can expect better-targeted campaigns and a higher return on investment.
2. RL in News and Media
• News producers are now able to receive the feedback of their users
instantaneously.
• Increased communication, as users are more expressive now.
• Less room for disinformation and hatred.
3. RL in Healthcare
4. RL in Robotics
Robotics is concerned with training a robot so that it can perform tasks just as a human being can. Still, there is a bigger challenge the robotics industry faces today: robots are not able to use common sense when making moral and social decisions. Here, the combination of Deep Learning and Reinforcement Learning, i.e. Deep Reinforcement Learning, comes to the rescue by equipping robots with a "learn how to learn" model.
5. RL in Gaming
6. RL in Image Processing
7. RL in Manufacturing
Manufacturing is about producing goods that satisfy our basic needs and essential wants. Manufacturers of cobots (collaborative robots that can perform various manufacturing tasks alongside a workforce of more than 100 people) are helping many businesses with their own RL solutions for packaging and quality testing. Their use makes the manufacturing of quality products faster, which helps avoid negative customer feedback; and the less negative feedback there is, the better the product's performance and sales margins.
Difference between negative and positive reinforcement learning
• Positive Reinforcement: a behavior is strengthened by adding a desirable stimulus (a reward) when the desired action occurs.
• Negative Reinforcement: a behavior is strengthened because a negative condition is stopped or avoided.
RL operates on the principle of learning optimal behavior through trial and error.
The agent takes actions within the environment, receives rewards or penalties,
and adjusts its behavior to maximize the cumulative reward. This learning
process is characterized by the following elements:
• Policy: A strategy used by the agent to determine the next action based on
the current state.
• Reward Function: A function that provides a scalar feedback signal based
on the state and action.
• Value Function: A function that estimates the expected cumulative reward
from a given state.
• Model of the Environment: A representation of the environment that helps
in planning by predicting future states and rewards.
Main points in Reinforcement learning –
• Input: The input should be an initial state from which the model will start
• Output: There are many possible outputs as there are a variety of solutions
to a particular problem
• Training: The training is based upon the input; the model returns a state, and the user decides whether to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
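The following minimal sketch ties these points together for a hypothetical toy environment (a six-cell corridor invented here for illustration): the agent starts from an initial state, follows an epsilon-greedy policy, receives rewards from the environment, and keeps updating its value estimates, so the best behavior is the one that accumulates the maximum reward.

```python
import numpy as np

# Hypothetical toy environment: a corridor of 6 cells; reaching the right end
# (state 5) gives reward +1, every other transition gives reward 0.
N_STATES, ACTIONS = 6, [0, 1]            # actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # assumed hyperparameters
q = np.zeros((N_STATES, len(ACTIONS)))   # value estimates for each state-action pair
rng = np.random.default_rng(0)

def step(state, action):
    """Environment: returns next state, reward, and whether the episode ended."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def policy(state):
    """Epsilon-greedy policy: usually exploit the value estimates, sometimes explore."""
    if rng.random() < epsilon:
        return int(rng.choice(ACTIONS))
    best = np.flatnonzero(q[state] == q[state].max())
    return int(rng.choice(best))         # break ties randomly

for episode in range(200):               # training: each episode starts from the initial state
    state, done = 0, False
    while not done:
        action = policy(state)
        next_state, reward, done = step(state, action)
        # the reward signal adjusts the value estimates toward higher cumulative reward
        q[state, action] += alpha * (reward + gamma * np.max(q[next_state]) - q[state, action])
        state = next_state

print(np.argmax(q, axis=1))              # learned greedy action for each state
```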
Types of Reinforcement:
1. Positive: Positive reinforcement occurs when an event, occurring because of a particular behavior, increases the strength and frequency of that behavior.
Advantages:
• Maximizes performance
• Sustains change for a long period of time
Disadvantage:
• Too much reinforcement can lead to an overload of states, which can diminish the results
2. Negative: Negative reinforcement is defined as the strengthening of a behavior because a negative condition is stopped or avoided.
Advantages:
• Increases behavior
• Provides defiance to a minimum standard of performance
Disadvantage:
• It only provides enough to meet the minimum behavior
EM Algorithm
The primary goal of the EM algorithm is to use the available observed data of
the dataset to estimate the missing data of the latent variables and then use
that data to update the values of the parameters in the M-step.
• 1st Step: The very first step is to initialize the parameter values. Further,
the system is provided with incomplete observed data with the
assumption that data is obtained from a specific model.
• 2nd Step: This step is known as Expectation or E-Step, which is used to
estimate or guess the values of the missing or incomplete data using the
observed data. Further, E-step primarily updates the variables.
• 3rd Step: This step is known as Maximization or M-step, where we use
complete data obtained from the 2nd step to update the parameter values.
Further, M-step primarily updates the hypothesis.
• 4th Step: The last step is to check whether the values of the latent variables are converging. If they are, stop the process; otherwise, repeat from the 2nd step until convergence occurs.
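As a concrete illustration of these four steps, the sketch below runs EM on a two-component one-dimensional Gaussian mixture; the synthetic data, the component count, and the starting values are assumptions made for this example, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic 1-D data drawn from two Gaussians; the component labels are the
# latent (unobserved) variables that EM must infer.
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 200)])

# Step 1: initialize the parameter values (mixing weights, means, variances).
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gaussian(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

prev_ll = -np.inf
for iteration in range(100):
    # Step 2 (E-step): responsibilities = posterior probability of each
    # component given the data and the current parameter estimates.
    weighted = pi * gaussian(data[:, None], mu, var)      # shape (n, 2)
    resp = weighted / weighted.sum(axis=1, keepdims=True)

    # Step 3 (M-step): update the parameters using the "completed" data.
    nk = resp.sum(axis=0)
    pi = nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    var = (resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk

    # Step 4: check convergence of the log-likelihood.
    ll = np.log(weighted.sum(axis=1)).sum()
    if ll - prev_ll < 1e-6:
        break
    prev_ll = ll

print(pi, mu, var)   # estimated mixture parameters
```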
Applications of EM algorithm
The primary aim of the EM algorithm is to estimate the missing data in the latent variables through the observed data in a dataset. The EM algorithm, or latent variable model, has a broad range of real-life applications in machine learning, including clustering with Gaussian mixture models, computer vision, and natural language processing.
Advantages of EM algorithm
• The two basic steps of the EM algorithm, the E-step and the M-step, are easy to implement for many machine learning problems.
• The likelihood is guaranteed not to decrease after each iteration.
• It often yields a closed-form solution for the M-step.
Disadvantages of EM algorithm
• Convergence can be slow.
• It may converge only to a local optimum of the likelihood rather than the global one.
Expectation-Maximization in the EM Algorithm
By iteratively repeating these steps, the EM algorithm seeks to maximize the
likelihood of the observed data. It is commonly used for unsupervised
learning tasks, such as clustering, where latent variables are inferred, and it has applications in various fields, including machine learning, computer vision, and natural language processing.
Key Terms in Expectation-Maximization (EM) Algorithm
Some of the most commonly used key terms in the Expectation-Maximization
(EM) Algorithm are as follows:
Latent Variables: Latent variables are unobserved variables in statistical
models that can only be inferred indirectly through their effects on observable
variables. They cannot be directly measured but can be detected by their
impact on the observable variables.
Likelihood: The probability of observing the given data under the current parameters of the model. In the EM algorithm, the goal is to find the parameters that maximize the likelihood.
Expectation (E) Step: The E-step computes the expected complete-data log-likelihood, using the posterior probabilities of the latent variables given the observed data and the current parameter estimates.
Maximization (M) Step: The M-step of the EM algorithm updates the parameter estimates by maximizing the expected log-likelihood obtained from the E-step. It involves finding the parameter values that optimize the likelihood function, typically through numerical optimization methods.
Convergence: Convergence refers to the condition when the EM algorithm
has reached a stable solution. It is typically determined by checking if the
change in the log-likelihood or the parameter estimates falls below a
predefined threshold.
Initialization:
First, a set of initial values for the parameters is chosen. A set of incomplete observed data is given to the system, with the assumption that the observed data comes from a specific model.
E-Step (Expectation Step): In this step, we use the observed data in order to
estimate or guess the values of the missing or incomplete data. It is basically
used to update the variables.
Compute the posterior probability or responsibility of each latent variable
given the observed data and current parameter estimates.
Estimate the missing or incomplete data values using the current parameter
estimates.
M-step (Maximization Step): In this step, we use the complete data generated in the preceding Expectation step to update the values of the parameters. It is basically used to update the hypothesis.
Update the parameters of the model by maximizing the expected complete
data log-likelihood obtained from the E-step.
This typically involves solving optimization problems to find the parameter
values that maximize the log-likelihood.
The specific optimization technique used depends on the nature of the
problem and the model being used.
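For a Gaussian mixture model, for example, these two steps take the following standard closed form (a sketch in the usual notation, where the responsibilities γ_ik, mixing weights π_k, means μ_k, and covariances Σ_k are symbols introduced here for illustration, not defined in the notes above):

```latex
\begin{aligned}
\text{E-step:}\quad
\gamma_{ik} &= \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)} \\[4pt]
\text{M-step:}\quad
N_k &= \sum_{i=1}^{N} \gamma_{ik}, \qquad
\pi_k = \frac{N_k}{N}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik}\, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^{\top}
\end{aligned}
```

Here the E-step computes the responsibilities (posterior probabilities of the latent component labels), and the M-step re-estimates the parameters by maximizing the expected complete-data log-likelihood, which for this model has the closed form shown.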