Reinforcement ML

Machine learning concept

Topic: Introduction to ML Course: Machine Learning

Machine Learning (VR17)


IV B.Tech – I Semester
UNIT-1
Lecture:3
Topic: Reinforcement Machine Learning

COURSE INSTRUCTOR:
Dr.R.UMamaheswari
Assoc. Prof. & HoD, ECM

Department of Electronics and Computer Engineering Slide No. 1


Topic: Reinforcement learning Course: Machine Learning

Reinforcement learning

Reinforcement learning is a goal-directed computational approach where a
computer learns to perform a task by interacting with an unknown dynamic
environment.

This learning approach enables a computer to make a series of decisions to
maximize the cumulative reward for the task without human intervention and
without being explicitly programmed to achieve the task.
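
One common way to make "cumulative reward" precise is the discounted return. The notation here is an assumption for illustration, since the slides do not define symbols: r_k is the reward received at step k, T is the final step of the task, and γ is a discount factor that weights later rewards less.

```latex
G_t = \sum_{k=t+1}^{T} \gamma^{\,k-t-1} \, r_k , \qquad 0 \le \gamma \le 1
```

With γ = 1 this reduces to the plain sum of the rewards collected from step t+1 to the end of the task.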

The goal of reinforcement learning is to train an agent to complete a task within an
unknown environment. The agent receives observations and a reward from the
environment and sends actions to the environment. The reward is a measure of
how successful an action is with respect to completing the task goal.

The agent contains two components: a policy and a learning algorithm.

The policy is a mapping that selects actions based on the observations from the
environment. Typically, the policy is a function approximator with tunable
parameters, such as a deep neural network.

The learning algorithm continuously updates the policy parameters based on the
actions, observations, and reward. The goal of the learning algorithm is to find an
optimal policy that maximizes the cumulative reward received during the task.
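
The split between policy and learning algorithm can be sketched in a few lines of Python. Everything named here is hypothetical and chosen only for illustration: a toy environment with two levers, an epsilon-greedy policy over a small value table (instead of a deep neural network), and a running-average update as the learning algorithm.

```python
import random


class TwoLeverEnvironment:
    """Hypothetical environment: lever 1 pays off more often than lever 0."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.payoff_prob = (0.2, 0.8)  # assumed reward probabilities

    def reset(self):
        return 0  # a single dummy observation

    def step(self, action):
        # The environment receives an action and returns (observation, reward).
        reward = 1.0 if self.rng.random() < self.payoff_prob[action] else 0.0
        return 0, reward


class Agent:
    """Agent = policy (action selection) + learning algorithm (parameter update)."""

    def __init__(self, n_actions=2, epsilon=0.1, seed=1):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.value = [0.0] * n_actions  # tunable policy parameters
        self.count = [0] * n_actions

    def policy(self, observation):
        # Mostly pick the action with the highest estimated value,
        # but explore a random action with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.value))
        return max(range(len(self.value)), key=lambda a: self.value[a])

    def learn(self, observation, action, reward):
        # Running average of the rewards observed for this action.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]


def run(steps=2000):
    env, agent = TwoLeverEnvironment(), Agent()
    observation = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        action = agent.policy(observation)             # agent -> environment
        next_observation, reward = env.step(action)    # environment -> agent
        agent.learn(observation, action, reward)       # update policy parameters
        observation = next_observation
        total_reward += reward
    return agent, total_reward
```

Calling `run()` returns the trained agent and the cumulative reward it collected. In this toy setting the value estimate for lever 1 should end up higher than the one for lever 0.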


In other words, reinforcement learning involves an agent learning the optimal
behaviour through repeated trial-and-error interactions with the environment
without human involvement.
As an example, consider the task of parking a vehicle using an automated driving
system.
The goal of this task is for the vehicle computer (agent) to park the vehicle in the
correct position and orientation.
To do so, the controller uses readings from cameras, accelerometers,
gyroscopes, a GPS receiver, and lidar (observations) to generate steering, braking,
and acceleration commands (actions).
The action commands are sent to the actuators that control the vehicle.
The resulting observations depend on the actuators, sensors, vehicle dynamics,
road surface, wind, and many other less-important factors.
All these factors, that is, everything that is not the agent, make up
the environment in reinforcement learning.


To learn how to generate the correct actions from the observations, the computer
repeatedly tries to park the vehicle using a trial-and-error process.

To guide the learning process, you provide a signal that is one when the car
successfully reaches the desired position and orientation and zero otherwise
(reward).
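
A reward signal of this kind might be written as a small function. This is a sketch under stated assumptions: the pose representation (x, y, heading in radians), the target pose, and both tolerances are hypothetical values, not taken from the slides.

```python
import math


def parking_reward(x, y, heading,
                   target=(0.0, 0.0, 0.0),
                   pos_tol=0.2,
                   heading_tol=math.radians(5)):
    """Return 1.0 if the car is within tolerance of the target pose, else 0.0.

    All tolerances and the target pose are assumed values for illustration.
    """
    tx, ty, th = target
    position_error = math.hypot(x - tx, y - ty)
    # Wrap the heading difference into [-pi, pi) before taking its magnitude.
    heading_error = abs((heading - th + math.pi) % (2 * math.pi) - math.pi)
    parked = position_error <= pos_tol and heading_error <= heading_tol
    return 1.0 if parked else 0.0
```

For example, `parking_reward(0.1, 0.0, 0.0)` is inside both tolerances and yields the reward, while `parking_reward(1.0, 0.0, 0.0)` does not.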

During each trial, the computer selects actions using a mapping (policy) initialized
with some default values.

After each trial, the computer updates the mapping to maximize the reward
(learning algorithm).

This process continues until the computer learns an optimal mapping that
successfully parks the car.
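
The trial-and-error loop can be illustrated on a heavily simplified parking task. The sketch below uses tabular Q-learning, which is one possible learning algorithm and not necessarily the one the slides have in mind. The one-lane lot, the cell indices, and all hyperparameters are assumptions. It also updates the mapping after every step, not once per trial.

```python
import random

N_CELLS = 7          # hypothetical one-lane lot with positions 0..6
TARGET = 5           # index of the parking slot
MOVES = (-1, +1)     # action 0 rolls backward, action 1 rolls forward


def best_action(q_row):
    """Index of the higher-valued action; ties go to 'forward'."""
    return 0 if q_row[0] > q_row[1] else 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50, seed=0):
    rng = random.Random(seed)
    # The mapping (policy) starts from default values: all zeros.
    q = [[0.0, 0.0] for _ in range(N_CELLS)]
    for _ in range(episodes):                 # one episode = one parking trial
        pos = rng.randrange(N_CELLS)          # random starting position
        for _ in range(max_steps):
            if pos == TARGET:
                break
            # Epsilon-greedy selection: explore sometimes, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = best_action(q[pos])
            nxt = min(max(pos + MOVES[action], 0), N_CELLS - 1)
            reward = 1.0 if nxt == TARGET else 0.0   # one when parked, zero otherwise
            future = 0.0 if nxt == TARGET else max(q[nxt])
            # Q-learning update toward reward plus discounted future value.
            q[pos][action] += alpha * (reward + gamma * future - q[pos][action])
            pos = nxt
    return q


def learned_move(q, pos):
    """Move (-1 or +1) that the learned mapping prefers at a position."""
    return MOVES[best_action(q[pos])]
```

After `q = train()`, the learned mapping should point toward the slot from either side: `learned_move(q, 0)` moves forward and `learned_move(q, 6)` moves backward.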


Reinforcement Learning Workflow

Formulate problem — Define the task for the agent to learn, including
how the agent interacts with the environment and any primary and
secondary goals the agent must achieve.
Create environment — Define the environment within which the agent
operates, including the interface between agent and environment and the
environment dynamic model.

Define reward — Specify the reward signal that the agent uses to
measure its performance against the task goals and how to calculate
this signal from the environment.
Create agent — Create the agent, which includes defining a policy
representation and configuring the agent learning algorithm.
Train agent — Train the agent policy representation using the
defined environment, reward, and agent learning algorithm.
Validate agent — Evaluate the performance of the trained agent by
simulating the agent and environment together.
Deploy policy — Deploy the trained policy representation using, for
example, generated GPU code.
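
The last two workflow steps can be sketched on a toy problem. The names and the one-lane parking lot here are hypothetical, the policy table is a hand-written stand-in for a trained policy, and deployment is shown as a simple JSON export, with no generated GPU code involved.

```python
import json


def simulate(policy, start, target=5, n_cells=7, max_steps=20):
    """Run the policy without learning; return steps needed to park, or None."""
    pos = start
    for step in range(max_steps + 1):
        if pos == target:
            return step
        # Apply the move the policy prescribes, staying inside the lot.
        pos = min(max(pos + policy[pos], 0), n_cells - 1)
    return None


def validate(policy, n_cells=7):
    """Validate agent: simulate from every start cell and report the success rate."""
    results = {start: simulate(policy, start) for start in range(n_cells)}
    success_rate = sum(r is not None for r in results.values()) / n_cells
    return success_rate, results


def deploy(policy):
    """Deploy policy: serialize the table so another program can load it."""
    return json.dumps({"policy": policy})


# Stand-in for a trained policy: the move to make at each of the 7 cells.
trained_policy = [+1, +1, +1, +1, +1, 0, -1]
```

`validate(trained_policy)` returns the fraction of starting cells from which the car reaches the slot, together with the step count per start. If the rate is too low, that is a signal to go back to an earlier workflow step before calling `deploy`.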


Training an agent using reinforcement learning is an iterative process. Decisions
and results in later stages can require you to return to an earlier stage in the
learning workflow. The aspects you may need to revisit before retraining include:

Training settings

Learning algorithm configuration

Policy representation

Reward signal definition

Action and observation signals

Environment dynamics


Thank You
