
Unit-I

Reinforcement Learning

1. The Reinforcement Learning Problem


1.1 Reinforcement Learning
 Machine learning is a branch of AI focused on building computer systems that learn
from data, enabling software to improve its performance over time. There are three
main types of machine learning.
Supervised learning
 Supervised learning is a category of machine learning that uses labelled datasets to
train algorithms to predict outcomes and recognize patterns. There are two types of
supervised learning.
 Classification - Classification algorithms are used to classify data by predicting a
categorical label or output variable based on the input data. One of the most
common examples of classification algorithms in use is the spam filter in your email
inbox. Here, a supervised learning model is trained to predict whether an email is
spam or ham.
 Regression - Regression algorithms are used to predict a real or continuous value,
where the algorithm detects a relationship between two or more variables. A
common example of a regression task might be predicting a salary based on work
experience.
Unsupervised Learning
 Unsupervised learning is a type of machine learning in which models are trained on
unlabelled datasets and are allowed to act on that data without any supervision.
 There are three types of unsupervised learning.
 Clustering - groups objects into clusters such that objects with the most similarities
remain in the same group and have few or no similarities with objects in other
groups.
 Association rule mining - is a rule-based approach to reveal interesting
relationships between data points in large datasets.
 Dimensionality reduction - is an unsupervised learning technique that reduces the
number of features, or dimensions, in a dataset.
Reinforcement Learning
 Before jumping into reinforcement learning, let us first ask: what is meant by
learning? How do humans and animals learn?
 Humans and animals learn by interacting with their surroundings, or environment.
This is also called the natural way of learning.
 For example, a newborn baby does not have any information about the
environment. It acquires knowledge by interacting with the environment: it collects
experiences and learns from them. There is no explicit teacher available for this
learning.
 For example, if the baby touches a hot cup, the baby’s sensory system immediately
triggers feedback in the form of pain. The baby then understands that this action
should not be repeated. Throughout our lives we acquire a wealth of information by
interacting with the environment, and from this information we learn.
 Learning from interaction with the environment is thus a hallmark of intelligence. In
this course we discuss a computational approach to learning from experience. This
computational approach is called reinforcement learning.
So, what is reinforcement learning?
 Reinforcement learning is learning to map situations (also called states) to actions
so as to maximize a numerical reward signal.
 In reinforcement learning, the agent interacts with the environment and explores it
on its own. The primary goal of the agent is to improve its performance by
collecting the maximum positive reward.

 RL solves a specific type of problem in which decision making is sequential and the
goal is long-term, such as game playing, robotics, etc.
 It is a core part of artificial intelligence. We do not need to pre-program the
agent; instead, it learns from its own experience without any human intervention.
Example: Suppose an AI agent is placed in a maze environment and its goal is to find a
diamond. The agent interacts with the environment by performing actions; based on those
actions, the state of the agent changes, and it also receives a reward or penalty as
feedback.
Key Features of Reinforcement Learning
 Reinforcement learning problems are closed-loop problems because the learning
system's actions influence its later inputs.
 The agent is not told which actions to take; instead, it must discover on its own which
actions yield the most reward by trying them out.
 Actions may affect not only the immediate reward but also all subsequent rewards
and future states.
 The agent may receive a delayed reward.
 The environment is stochastic, and the agent needs to explore it to collect the
maximum positive reward.
How is reinforcement learning different from supervised and unsupervised learning?
 Reinforcement learning is different from supervised learning. In supervised learning
the agent is told what action to take in a particular situation, whereas in
reinforcement learning the agent is not told what action to take; it is the agent's
responsibility to decide, on its own, what action to take in each situation.
 Reinforcement learning is also different from unsupervised learning. The goal of
unsupervised learning is to extract hidden patterns or structure from unlabelled
data, whereas in reinforcement learning, instead of finding hidden structure, we
focus on maximizing the reward.
So, reinforcement learning is different from both supervised and unsupervised learning,
and we consider it the third paradigm in the space of learning and computational
intelligence.
One of the challenges that arise in reinforcement learning is the trade-off between
exploration and exploitation.
Exploitation - using the accumulated knowledge to choose the actions currently believed
to yield the highest expected reward.
Exploration - trying new actions in order to acquire information that may lead to higher
rewards in the future.
Suppose two people, A and B, are digging in a coal mine in the hope of finding a diamond.
Person B succeeds in finding a diamond before person A and walks off happily. After
seeing this, person A becomes a bit greedy and thinks that he too might find a diamond at
the same spot where person B was digging. This action by person A is called a greedy
action, and this policy is known as a greedy policy. But person A does not know that a
bigger diamond was buried at the place where he was originally digging, so the greedy
policy fails in this situation.
In this example, person A only has knowledge of the place where person B was digging
but no knowledge of what lies elsewhere. In reality, the diamond could be buried at the
place where he was digging initially or somewhere else entirely. Hence, with only partial
knowledge about rewards, a reinforcement learning agent faces a dilemma: whether to
exploit its partial knowledge to receive some reward, or to explore unknown actions that
could result in greater rewards.
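The Python sketch below illustrates this trade-off with a simple epsilon-greedy rule. The
action names, the value estimates, and the value of EPSILON are all made up for the
illustration; it is only a sketch of the idea, not a prescribed algorithm.

```python
import random

# Hypothetical action-value estimates (e.g. average reward observed so far)
# for three possible actions; the numbers are invented for this example.
value_estimates = {"action_a": 0.6, "action_b": 0.3, "action_c": 0.1}

EPSILON = 0.1  # fraction of the time we explore instead of exploit

def select_action():
    """Epsilon-greedy rule: exploit the best-known action most of the time,
    but explore a random action with probability EPSILON."""
    if random.random() < EPSILON:
        return random.choice(list(value_estimates))          # exploration
    return max(value_estimates, key=value_estimates.get)     # exploitation

# Over many selections we mostly exploit, with occasional exploration.
counts = {a: 0 for a in value_estimates}
for _ in range(1000):
    counts[select_action()] += 1
print(counts)
```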

Restaurant Selection
Exploitation: go to your favorite restaurant
Exploration: try a new restaurant

Online Banner Advertisements
Exploitation: show the most successful advert
Exploration: show a different advert

Oil Drilling
Exploitation: drill at the best-known location
Exploration: drill at a new location

Game Playing
Exploitation: play the move you believe is best
Exploration: play an experimental move

1.2 Examples of Reinforcement Learning


Game Playing
 Reinforcement learning has been used extensively in games such as chess, Go, and
checkers.
 In 2016, AlphaGo, an RL-based program developed by Google DeepMind, defeated
Lee Sedol in a five-game match of Go.
 In 1997, IBM’s Deep Blue defeated world chess champion Garry Kasparov. Deep
Blue could examine around 200 million possible chess positions per second.

Data Center Cooling


 Energy consumption is one of the world’s most challenging problems.
 Data centers consume a large amount of energy to keep their servers cool.
 Google has developed an automatic control system that optimizes energy
consumption by adjusting various parameters. Google reported that this system
reduced the energy used for cooling by about 40%.

Trading and Finance


 Reinforcement learning can be used for predicting stock prices and making trading
decisions.
 An RL agent can be used to decide whether to hold, buy, or sell a stock.
 For example, IBM has developed an RL-based platform for live trading of stocks.

Robotics
 Robotics is an important sector that uses RL extensively, for example in training
robots to grasp and manipulate objects.
 For example, a vacuum cleaner robot automatically cleans rooms without
human intervention.
 Similarly, a robotic arm in a manufacturing plant can automatically fit various
parts of a car.

1.3 Elements of Reinforcement Learning


There are four basic elements of reinforcement learning.
1. Policy
 A policy is the core of an RL agent; it determines the agent's behaviour at a
given time.
 The policy defines a mapping from states to actions.
 In some cases the policy may be a simple function or a lookup table; in other cases
it may involve complex computation such as a search process.
 Policies may also be stochastic, specifying a probability for each action.

2. Reward Signal
 A reward signal defines the goal of a reinforcement learning problem.
 The reward signal defines the good and bad events for the agent.
 The objective of the agent is to maximize the total reward received over the long
run.

3. Value Function
 The value of a state is the total amount of reward an agent can expect to
accumulate in the future, starting from that state.
 The reward signal indicates what is good immediately, whereas the value indicates
what is good in the long run.
 A state may have a low immediate reward but still have a high value. How is that
possible? Because the state may be followed by states that yield high rewards.

4. Model of the Environment


 A model of the environment is something that mimics the behaviour of the
environment (i.e., how the environment responds when the agent performs an action).
 Given a state and an action, the model can predict the resulting next state and reward.
 Models are used for planning: they let us predict future states before they are
actually experienced.
 RL methods can be either model-based or model-free. If an RL method uses a
model, it is called a model-based method; if it does not use a model, it is called a
model-free method.
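The toy Python sketch below puts these four elements side by side. The two-state
environment, its state and action names, and all the numbers are invented purely for
illustration; it is a minimal sketch of the vocabulary, not an implementation from the notes.

```python
import random

# A toy two-state, two-action environment ("cold"/"hot", "heat"/"cool"),
# invented only to illustrate the four elements of RL.
STATES = ["cold", "hot"]
ACTIONS = ["heat", "cool"]

# 1. Policy: a mapping from states to actions (here a simple lookup table).
policy = {"cold": "heat", "hot": "cool"}

# 2. Reward signal: immediate feedback for taking an action in a state.
def reward(state, action):
    good = {("cold", "heat"), ("hot", "cool")}
    return 1.0 if (state, action) in good else -1.0

# 3. Value function: estimated long-term reward from each state
#    (initialized to 0, to be learned from experience).
value = {"cold": 0.0, "hot": 0.0}

# 4. Model of the environment: predicts the next state and reward,
#    which an agent can use for planning before actually acting.
def model(state, action):
    next_state = "hot" if action == "heat" else "cold"
    return next_state, reward(state, action)

state = random.choice(STATES)
action = policy[state]                   # act according to the policy
next_state, r = model(state, action)     # plan: predict the outcome with the model
print(state, action, "->", next_state, "reward:", r)
```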

1.4 Limitations and Scope


 Most of the reinforcement learning methods we study in this course are structured
around estimating value functions.
 Other methods, such as genetic algorithms, genetic programming, simulated
annealing, and other optimization methods, have also been used to approach
reinforcement learning problems.
 These methods evaluate the lifetime behaviour of many agents, each using a
different policy for interacting with the environment, and keep the policies that
obtain the most reward.
 We call these methods evolutionary methods because their operation is
analogous to biological evolution.
 Our focus is on reinforcement learning methods that learn while interacting with the
environment, which evolutionary methods do not do.
 It is our belief that methods that take advantage of individual behavioural
interactions can be much more efficient than evolutionary methods in many cases.
 Evolutionary methods do not use the fact that the policy is a function from states to
actions; they do not notice which states an individual passes through during its
lifetime, or which actions it selects. Hence they can be inefficient.
 However, we do include some methods that, like evolutionary methods, search
directly in the space of policies; these policy gradient methods have proven useful
in many problems.

1.5 An Extended Example: Tic-Tac-Toe


Consider the game of tic-tac-toe. Two players play this game on a 3x3 board. One player
plays Xs and the other Os until one player wins by placing three marks in a row,
horizontally, vertically, or diagonally, as the X player has done in this game:

If the board fills up with neither player getting three in a row, the game is a draw.
Although this is a simple problem, it cannot readily be solved in a satisfactory way through
classical techniques. Reinforcement learning solves this problem by estimating the
values of states. The following figure shows the state-space tree of the tic-tac-toe game,
in which each node represents a state.
Reinforcement learning uses a value function to estimate the values of states. First, we
set up a table of numbers, one for each possible state of the game. Each number is the
latest estimate of the probability of winning from that state. We treat this estimate as the
state's value, and the whole table is the learned value function. If state A has a higher
value than state B, then the probability of winning from A is higher than from B.
In the tic-tac-toe game, we initially know only the values of the terminal states: a terminal
state in which we have won has value 1, and a terminal state in which we have lost or
drawn has value 0.
For all other states, the value is updated after each move using the following rule:

V(S) ← V(S) + α [ V(S′) − V(S) ]

where S is the current state, S′ is the new state after the move, α is the learning rate,
and V(s) denotes the value of state s.
Initially, the values of all non-terminal states are assumed to be 0. Suppose the current
state is S4. From S4 several moves are possible. Among these possible moves, the agent
moves to the state with the maximum value; in this example the agent moves to S9, as it
has the maximum value.
In this way the agent estimates the values of states using the value function and selects
moves according to the state values.
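The Python sketch below illustrates the tabular value-update idea described above. It is a
minimal sketch, not a complete tic-tac-toe player: the board encoding, the helper
functions, and the epsilon-greedy move selection are assumptions made for this example,
and the rewards from finished games are not handled.

```python
import random

# Value table: board state (a tuple of 9 cells) -> estimated probability of
# winning from that state. Unseen states default to 0, matching the
# initialization described above.
values = {}

ALPHA = 0.1      # learning rate (alpha in the update rule)
EPSILON = 0.1    # probability of making an exploratory move

def possible_moves(state, player):
    """All states reachable by placing `player` on an empty cell."""
    moves = []
    for i, cell in enumerate(state):
        if cell == " ":
            nxt = list(state)
            nxt[i] = player
            moves.append(tuple(nxt))
    return moves

def choose_move(state, player):
    """Epsilon-greedy selection over the values of the possible next states."""
    moves = possible_moves(state, player)
    if random.random() < EPSILON:
        return random.choice(moves)                        # explore
    return max(moves, key=lambda s: values.get(s, 0.0))    # exploit

def td_update(state, next_state):
    """V(S) <- V(S) + alpha * (V(S') - V(S)): move the value of the current
    state toward the value of the state we moved to."""
    v_s = values.get(state, 0.0)
    v_next = values.get(next_state, 0.0)
    values[state] = v_s + ALPHA * (v_next - v_s)

# One move from a made-up mid-game position, playing "X".
current = ("X", "O", " ",
           " ", "X", " ",
           " ", " ", "O")
nxt = choose_move(current, "X")
td_update(current, nxt)
print("Updated value of current state:", values[current])
```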

1.6 History of Reinforcement Learning


The history of RL includes three main threads:
1. The first thread concerns the idea of trial-and-error learning.
2. The second thread concentrates on the problem of optimal control and its solution.
3. The third thread concentrates on temporal-difference learning methods.

First thread: trial and error


 Edward Thorndike expressed the importance of trial-and-error learning (the “Law of
Effect”).
 Minsky (1954), in his Ph.D. dissertation, studied computational models of
reinforcement learning based on analog neuron-like machines.
 Donald Michie developed a trial-and-error learning system (MENACE) to play
tic-tac-toe.
 John Holland developed classifier systems, trial-and-error learning systems based
on selectional principles.
Second thread: optimal control and its solution
 The term optimal control refers to designing a controller that optimizes a measure of
a dynamical system's behaviour over time.
 Richard Bellman (1957) proposed the Bellman equation, which is used extensively
in RL to find optimal policies.
 The Markov decision process (MDP), a discrete stochastic version of the optimal
control problem, is the standard framework for making decisions optimally in RL.
 Ron Howard (1960) devised the policy iteration method for solving MDPs.
Third thread: temporal-difference methods
 Temporal-difference (TD) learning is one of the key classes of methods for solving
RL problems.
 Arthur Samuel (1959) used TD-like learning in his checkers-playing program.
 Claude Shannon (1950) suggested that a computer could play chess using an
evaluation function of board positions.
 Sutton (1988) formalized temporal-difference learning methods.
 Chris Watkins (1989) developed Q-learning, which is widely used to solve RL
problems.
 Gerald Tesauro developed TD-Gammon, a backgammon-playing program based on
TD learning.
