Assignment 15 Modern AI

The document discusses reinforcement learning (RL), a method where agents learn to optimize actions based on interactions with their environment to maximize long-term rewards. It outlines key components of RL, including policy, reward signals, value functions, and models of the environment, while also highlighting the challenges of balancing exploration and exploitation. Examples illustrate RL applications, and the document emphasizes its significance in artificial intelligence and decision-making processes.


Assignment - 15

Ameen Aazam
EE23BTECH11006

Learning from interaction is a foundational idea in theories of learning and intelligence.
Reinforcement learning, studied as goal-directed learning from interaction, is the theme of the book.
It investigates several learning methods, studies idealized learning settings, and considers how to
design effective machines for scientific or economic problems.

1 Reinforcement Learning
Reinforcement learning (RL) is a form of learning in which an agent learns to map states to actions
so as to maximize a numerical reward signal. RL differs from supervised learning: the agent is not
told which actions to take, but must discover, through trial-and-error search and despite delayed
rewards, which actions yield the most reward over time.
RL is also distinct from unsupervised learning. Whereas unsupervised learning searches for hidden
structure in unlabeled data, RL agents must learn from their own experience, without external help,
how to act so as to obtain the greatest reward from interacting with the environment.
A central challenge in RL is the exploration–exploitation trade-off: the agent must explore new
actions to discover potentially better rewards, while exploiting actions already known to pay off;
concentrating too heavily on either can impede success.
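One common way to balance this trade-off is the epsilon-greedy rule: explore with a small probability, exploit otherwise. The sketch below is illustrative, not a method prescribed by the text; the function name and the example value estimates are assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon set to 0 the agent always exploits; with epsilon set to 1 it always explores.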
The agent's interaction with the environment is typically modeled as a Markov decision process
(MDP), in which the goal is to choose actions that, despite the uncertainty of the environment,
optimize a long-term objective through state sensing, action selection, and reward feedback.
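The sense-act-reward cycle of an MDP can be sketched as a simple interaction loop. The toy corridor environment and the `run_episode` helper below are hypothetical illustrations, assuming an environment exposing `reset()` and `step()`:

```python
class CorridorEnv:
    """Toy MDP: positions 0..3 on a line; reward 1 for reaching position 3."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = move left, 1 = move right
        self.pos = max(0, min(3, self.pos + (1 if action == 1 else -1)))
        done = (self.pos == 3)
        return self.pos, (1.0 if done else 0.0), done

def run_episode(env, policy):
    """One episode of the agent-environment loop: sense state, act, receive reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total
```

An always-move-right policy, `run_episode(CorridorEnv(), lambda s: 1)`, collects the terminal reward of 1.0.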
In addition, RL connects to other fields such as psychology, neuroscience, statistics, and
optimization. Its core algorithms draw inspiration from biological learning systems, and RL has in
turn contributed improved models of animal learning and of the brain's reward system. This
interdisciplinary reach places RL at the center of artificial intelligence (AI) as the field focuses
increasingly on general learning principles rather than task-specific heuristics.

2 Examples
Reinforcement learning is illustrated through diverse examples:

• A chess player improves their strategy using planning and intuition.
• An adaptive controller optimizes a refinery’s operations in real time.
• A newborn gazelle learns to stand and run within minutes.
• A robot decides between tasks based on battery levels and past experiences.
• Phil prepares breakfast by making complex decisions involving goal–subgoal behavior.

These examples share common elements: an agent interacts with its environment, makes decisions to
reach goals, faces uncertainty, and learns from experience to improve over time. Because its actions
affect future states, the agent needs foresight, adaptation, and ongoing feedback for optimal
performance.

Preprint. Under review.


3 Elements of Reinforcement Learning
3.1 Policy

A policy defines the agent's behavior at any given time. It maps perceived environmental states to
actions (a stimulus-response mapping). It may be a simple function, a lookup table, or involve
extensive computation. Policies can be deterministic or stochastic, i.e., specifying probabilities
over actions.
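The two kinds of policy can be sketched in a few lines. The state and action names below are hypothetical (loosely echoing the robot example above), chosen only to illustrate the lookup-table and probabilistic forms:

```python
import random

# Deterministic policy: a lookup table from state to action.
table_policy = {"low_battery": "recharge", "high_battery": "search"}

def act_deterministic(state):
    return table_policy[state]

# Stochastic policy: a probability distribution over actions for each state.
stochastic_policy = {"high_battery": [("search", 0.8), ("wait", 0.2)]}

def act_stochastic(state):
    actions, probs = zip(*stochastic_policy[state])
    return random.choices(actions, weights=probs)[0]
```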

3.2 Reward Signal

The reward signal defines the goal of the agent. At each time step the environment provides a
numerical reward, and the agent seeks to maximize the total reward over the long run. Rewards
indicate whether an action was good or bad and drive adjustments to the agent's behavior.

3.3 Value Function

The value of a state is the total reward the agent can expect to accumulate starting from that
state; it serves as a proxy for the long-term desirability of the state. Whereas rewards indicate
immediate outcomes, values predict outcomes over the long run, so value estimation is central to
good decision making.
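One standard way to make "expected future reward" concrete is the discounted return, where a discount factor gamma weights near-term rewards more heavily than distant ones. The helper name and the choice of gamma are illustrative assumptions:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of future rewards, each discounted by gamma per step:
    G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):  # fold from the last reward backward
        g = r + gamma * g
    return g
```

For example, `discounted_return([0, 0, 1], gamma=0.5)` gives 0.25: a reward two steps away is worth a quarter of an immediate one.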

3.4 Model of the Environment

Some reinforcement learning systems use a model of the environment to predict how it will behave,
e.g., predicting the next state and reward that follow an action. A model enables planning: the
agent can choose actions by considering possible future scenarios before they occur. Methods that
use models and planning are called model-based, while model-free methods rely on trial-and-error
learning.
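A minimal sketch of planning with a model: simulate each action's predicted outcome, then pick the best. The model entries, state names, and value estimates below are hypothetical, for illustration only:

```python
# Model: (state, action) -> (predicted next state, predicted reward).
model = {
    ("s0", "a"): ("s1", 1.0),
    ("s0", "b"): ("s2", 0.0),
}

def plan_one_step(state, actions, value):
    """Choose the action whose predicted reward plus next-state value is highest."""
    def score(a):
        next_state, reward = model[(state, a)]
        return reward + value.get(next_state, 0.0)
    return max(actions, key=score)
```

Here `plan_one_step("s0", ["a", "b"], {"s1": 0.0, "s2": 0.5})` prefers action "a" (1.0 + 0.0 beats 0.0 + 0.5) without the agent ever taking a real step.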

4 Limitations and Scope


In reinforcement learning, the notion of state gives agents the environmental information they need
to make decisions; the methods considered here concern choosing actions based on the available state
information. Evolutionary methods such as genetic algorithms and simulated annealing evolve policies
based on overall performance without estimating value functions, and they can be effective in
environments where the complete state is not accessible. Nevertheless, most reinforcement learning
methods, which learn from direct interaction with the environment, are more efficient, because they
exploit detailed state-action relationships.

5 An Extended Example: Tic-Tac-Toe


Tic-tac-toe is a two-player game in which each player tries to place three of their marks in a row
on a 3×3 board. Minimax is a traditional way to play, but it makes assumptions about the opponent's
behavior and cannot adapt without prior information about the opponent. A better approach is to
learn a model of the opponent through experience over many games. Evolutionary methods assess
overall performance and modify strategies only after a number of games. Reinforcement learning
instead maintains a value function over game states, where the value of each state represents the
estimated probability of winning from it; players can then adapt their strategy dynamically
according to their actual game experience. Reinforcement learning also applies to more complex
problems, including continuous problems and settings without an explicit adversary. It can handle
large or infinite state spaces and can operate at multiple levels within a hierarchical learning
framework for complex problem solving. Key features of reinforcement learning include the ability
to incorporate prior knowledge, implicit representations, and methods both with and without models
of the environment.
Reinforcement learning is a computational approach to goal-directed learning and decision making,
addressing computational questions about learning from direct interaction with an environment.
It frames the interaction of a learning agent with its environment using the Markov decision
process framework of states, actions, and rewards. This framework embodies the central problems of
artificial intelligence: cause and effect, uncertainty, and explicit goals. In reinforcement
learning methods, the concepts of value and value function make it possible to understand and
automate the learning process.
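The value-function scheme for tic-tac-toe described above can be sketched as a temporal-difference update: after moving from state s to s', nudge the value of s toward the value of s'. Representing states as board strings and initializing unseen values at 0.5 are assumptions made for this sketch:

```python
def td_update(values, state, next_state, alpha=0.1):
    """Move V(state) a fraction alpha toward V(next_state).
    Values are interpreted as estimated win probabilities;
    unseen states default to 0.5 (no information either way)."""
    v_s = values.get(state, 0.5)
    v_next = values.get(next_state, 0.5)
    values[state] = v_s + alpha * (v_next - v_s)
    return values[state]
```

Repeated over many games, states that tend to lead toward wins drift toward 1 and states that lead toward losses drift toward 0, which is what lets the player adapt its strategy from actual play.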
