Reinforcement ML
COURSE INSTRUCTOR:
Dr. R. UMamaheswari
Assoc. Prof. & HoD, ECM
Reinforcement learning
The goal of reinforcement learning is to train an agent to complete a task within an
unknown environment. The agent receives observations and a reward from the
environment and sends actions to the environment. The reward is a measure of
how successful an action is with respect to completing the task goal.
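The interaction described above can be sketched as a simple loop. This is a minimal illustration, not a specific library's API: the `CorridorEnv` class, its `reset`/`step` methods, and the `run_episode` helper are hypothetical names chosen for this example.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial observation

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # reward measures task success
        return self.position, reward, done


def run_episode(env, choose_action, max_steps=100):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(observation)            # agent sends an action
        observation, reward, done = env.step(action)   # environment replies
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
random_total = run_episode(CorridorEnv(), lambda obs: rng.choice([-1, 1]))
always_right_total = run_episode(CorridorEnv(), lambda obs: 1)
```

Here `choose_action` stands in for the agent's policy: any function from an observation to an action can be plugged in.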
The policy is a mapping that selects actions based on the observations from the
environment. Typically, the policy is a function approximator with tunable
parameters, such as a deep neural network.
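As a rough sketch of a policy with tunable parameters, the example below uses a linear model followed by a softmax instead of a deep neural network, to keep it short. The class name `LinearSoftmaxPolicy` and its methods are assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LinearSoftmaxPolicy:
    """Maps an observation vector to a probability for each action."""

    def __init__(self, n_features, n_actions):
        # Tunable parameters: one weight row per action, initialized to zero.
        self.weights = [[0.0] * n_features for _ in range(n_actions)]

    def action_probabilities(self, observation):
        scores = [sum(w * x for w, x in zip(row, observation))
                  for row in self.weights]
        return softmax(scores)

    def select_action(self, observation, rng):
        probs = self.action_probabilities(observation)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]


policy = LinearSoftmaxPolicy(n_features=2, n_actions=3)
uniform = policy.action_probabilities([1.0, -0.5])   # zero weights: all actions equally likely
policy.weights[2] = [2.0, 0.0]                       # changing parameters changes the mapping
shifted = policy.action_probabilities([1.0, -0.5])
chosen = policy.select_action([1.0, -0.5], random.Random(0))
```

With all weights at zero every action is equally likely; adjusting the weights shifts probability toward particular actions, which is what a learning algorithm does when it tunes the policy.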
The learning algorithm continuously updates the policy parameters based on the
actions, observations, and reward. The goal of the learning algorithm is to find an
optimal policy that maximizes the cumulative reward received during the task.
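The cumulative reward can be computed as in the short sketch below. The `discount` parameter is an assumption that is not part of the text above: with the default value of 1.0 the function is a plain sum, while values below 1.0 weight earlier rewards more heavily, as many learning algorithms do.

```python
def cumulative_reward(rewards, discount=1.0):
    """Return r0 + discount*r1 + discount**2 * r2 + ... for one episode."""
    total = 0.0
    # Work backward so each step adds its reward to the discounted future sum.
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


plain_sum = cumulative_reward([1.0, 2.0, 3.0])
discounted = cumulative_reward([0.0, 0.0, 1.0], discount=0.9)
```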
As an example, consider training a computer to park a vehicle automatically.
To learn how to generate the correct actions from the observations, the computer
repeatedly tries to park the vehicle using a trial-and-error process.
To guide the learning process, you provide a signal that is one when the car
successfully reaches the desired position and orientation and zero otherwise
(reward).
During each trial, the computer selects actions using a mapping (policy) initialized
with some default values.
After each trial, the computer updates the mapping to maximize the reward
(learning algorithm).
This process continues until the computer learns an optimal mapping that
successfully parks the car.
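The trial-and-error process above can be sketched on a heavily simplified parking problem. Everything in this example is an assumption made for illustration: the car moves along a line of seven cells, the parking spot is cell 3, orientation is ignored, and the learning algorithm is tabular Q-learning applied to the recorded trial, which the text above does not prescribe.

```python
import random

TARGET = 3           # index of the parking spot (hypothetical layout)
N_CELLS = 7          # car positions 0..6 along a line
ACTIONS = (-1, +1)   # reverse, forward


def step(position, action):
    """Move the car one cell; reward is 1 when parked and 0 otherwise."""
    new_position = max(0, min(N_CELLS - 1, position + action))
    parked = new_position == TARGET
    return new_position, (1.0 if parked else 0.0), parked


def greedy_action(q, position, rng):
    """Pick the highest-valued action index, breaking ties at random."""
    best = max(q[position])
    return rng.choice([i for i, v in enumerate(q[position]) if v == best])


def run_trial(q, start, rng, epsilon, max_steps=50):
    """One parking attempt with a fixed mapping; returns the transitions seen."""
    trajectory, position = [], start
    for _ in range(max_steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))      # occasional exploration
        else:
            a = greedy_action(q, position, rng)
        nxt, reward, parked = step(position, ACTIONS[a])
        trajectory.append((position, a, reward, nxt, parked))
        position = nxt
        if parked:
            break
    return trajectory


def update(q, trajectory, alpha=0.5, gamma=0.9):
    """After the trial, apply Q-learning updates to the recorded transitions."""
    for s, a, r, nxt, parked in reversed(trajectory):
        target = r if parked else r + gamma * max(q[nxt])
        q[s][a] += alpha * (target - q[s][a])


def train(n_trials=500, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]     # mapping starts at default values
    starts = [p for p in range(N_CELLS) if p != TARGET]
    for _ in range(n_trials):
        update(q, run_trial(q, rng.choice(starts), rng, epsilon))
    return q


q_table = train()
learned_moves = {p: ACTIONS[q_table[p].index(max(q_table[p]))]
                 for p in range(N_CELLS) if p != TARGET}
```

After training, `learned_moves` holds the preferred action for each starting cell; in this toy setting the car should drive forward from cells left of the spot and reverse from cells right of it.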
Formulate problem — Define the task for the agent to learn, including
how the agent interacts with the environment and any primary and
secondary goals the agent must achieve.
Create environment — Define the environment within which the agent
operates, including the interface between agent and environment and the
environment dynamic model.
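As one possible example of an environment dynamic model, the sketch below advances a simple kinematic car model by one time step. The model, the function name `step_dynamics`, and the time step `dt` are assumptions for illustration; a real parking environment would likely use a more detailed vehicle model.

```python
import math


def step_dynamics(pose, speed, turn_rate, dt=0.1):
    """Advance a simple kinematic car model by one time step.

    pose is (x, y, heading) with heading in radians.
    """
    x, y, heading = pose
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt
    heading = (heading + turn_rate * dt) % (2 * math.pi)
    return (x, y, heading)


# Drive straight along the x axis for ten steps of 0.1 s at 1 m/s.
pose = (0.0, 0.0, 0.0)
for _ in range(10):
    pose = step_dynamics(pose, speed=1.0, turn_rate=0.0)

# Turn in place for one step of 0.5 s at 1 rad/s.
turned = step_dynamics((0.0, 0.0, 0.0), speed=0.0, turn_rate=1.0, dt=0.5)
```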
Define reward — Specify the reward signal that the agent uses to
measure its performance against the task goals and how to calculate
this signal from the environment.
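For the parking example, the reward could be calculated from the car's pose roughly as follows. The tolerance values and the `(x, y, heading)` pose layout are assumptions chosen for this sketch.

```python
import math


def parking_reward(pose, target_pose,
                   position_tol=0.2, heading_tol=math.radians(5)):
    """Return 1.0 when the car is at the target position and orientation, else 0.0.

    Poses are (x, y, heading) tuples with heading in radians.
    """
    x, y, heading = pose
    tx, ty, target_heading = target_pose
    distance = math.hypot(x - tx, y - ty)
    # Wrap the heading difference into [-pi, pi) before taking its size.
    heading_error = abs((heading - target_heading + math.pi) % (2 * math.pi) - math.pi)
    parked = distance <= position_tol and heading_error <= heading_tol
    return 1.0 if parked else 0.0


on_target = parking_reward((0.1, 0.0, 0.01), (0.0, 0.0, 0.0))
too_far = parking_reward((1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
wrong_heading = parking_reward((0.0, 0.0, math.pi / 2), (0.0, 0.0, 0.0))
```

A reward this sparse gives the agent no hint until it succeeds; a shaped reward that also depends on the distance to the target is a common alternative.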
Create agent — Create the agent, which includes defining a policy
representation and configuring the agent learning algorithm.
Train agent — Train the agent policy representation using the
defined environment, reward, and agent learning algorithm.
Validate agent — Evaluate the performance of the trained agent by
simulating the agent and environment together.
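Validation can be sketched as running the policy, without exploration or further learning, from a set of start states and measuring how often it succeeds. The one-dimensional parking model and both policies below are hypothetical stand-ins for a real environment and a trained agent.

```python
TARGET = 3  # index of the parking spot in a line of cells 0..6


def step(position, action):
    """Hypothetical 1-D parking model: move one cell, reward 1 on reaching TARGET."""
    position = max(0, min(6, position + action))
    parked = position == TARGET
    return position, (1.0 if parked else 0.0), parked


def toward_target(position):
    """Stand-in for a well-trained policy: always drive toward the spot."""
    return 1 if position < TARGET else -1


def away_from_target(position):
    """Stand-in for a poorly trained policy: always drive away from the spot."""
    return -1 if position < TARGET else 1


def success_rate(policy, starts, max_steps=20):
    """Simulate the policy from each start and return the fraction that park."""
    successes = 0
    for start in starts:
        position = start
        for _ in range(max_steps):
            position, reward, parked = step(position, policy(position))
            if parked:
                successes += 1
                break
    return successes / len(starts)


starts = [0, 1, 2, 4, 5, 6]
good_rate = success_rate(toward_target, starts)
bad_rate = success_rate(away_from_target, starts)
```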
Deploy policy — Deploy the trained policy representation using, for
example, generated GPU code.
[Figure labels: training settings, policy representation, environment dynamics]
Thank You