Assignment 15 Modern AI

The document discusses reinforcement learning (RL), a method where agents learn to optimize actions based on interactions with their environment to maximize long-term rewards. It outlines key components of RL, including policy, reward signals, value functions, and models of the environment, while also highlighting the challenges of balancing exploration and exploitation. Examples illustrate RL applications, and the document emphasizes its significance in artificial intelligence and decision-making processes.


Assignment - 15

Ameen Aazam
EE23BTECH11006

Learning from interaction is a foundational idea in theories of learning and intelligence.
Reinforcement learning, studied as goal-directed learning from interaction, is the theme of the book.
It investigates several learning methods, studies idealized learning settings, and considers how to
design effective machines for scientific or economic problems.

1 Reinforcement Learning
Reinforcement learning (RL) is a form of learning in which an agent learns to map states to actions
so as to maximize a numerical reward signal. RL differs from supervised learning: the agent is not
told which actions to take, but must discover, through trial-and-error search and despite delayed
rewards, which actions yield the most reward over time.
RL is also distinct from unsupervised learning. Whereas unsupervised learning searches for hidden
structure in unlabeled data, RL agents must learn from their own experience, without external help,
how to act so as to obtain the greatest reward from interacting with the environment.
A central challenge in RL is the exploration–exploitation trade-off: the agent must explore new
actions to discover potentially better rewards, while exploiting actions already known to pay off;
concentrating too heavily on either can impede success.
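One common way to balance this trade-off is the epsilon-greedy rule: explore with a small probability, exploit otherwise. The sketch below is illustrative, not a method prescribed by the text; the function name and the example value estimates are assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon set to 0 the agent always exploits; with epsilon set to 1 it always explores.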
The agent's interaction with the environment is typically modeled as a Markov decision process
(MDP), in which the goal is to choose actions that, despite the uncertainty of the environment,
optimize a long-term objective through state sensing, action selection, and reward feedback.
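The sense-act-reward cycle of an MDP can be sketched as a simple interaction loop. The toy corridor environment and the `run_episode` helper below are hypothetical illustrations, assuming an environment exposing `reset()` and `step()`:

```python
class CorridorEnv:
    """Toy MDP: positions 0..3 on a line; reward 1 for reaching position 3."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = move left, 1 = move right
        self.pos = max(0, min(3, self.pos + (1 if action == 1 else -1)))
        done = (self.pos == 3)
        return self.pos, (1.0 if done else 0.0), done

def run_episode(env, policy):
    """One episode of the agent-environment loop: sense state, act, receive reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total
```

An always-move-right policy, `run_episode(CorridorEnv(), lambda s: 1)`, collects the terminal reward of 1.0.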
In addition, RL connects to other fields such as psychology, neuroscience, statistics, and
optimization. Its core algorithms draw inspiration from biological learning systems, and RL has in
turn contributed improved models of animal learning and of the brain's reward system. This
interdisciplinary reach places RL at the center of artificial intelligence (AI) as the field focuses
increasingly on general learning principles rather than task-specific heuristics.

2 Examples
Reinforcement learning is illustrated through diverse examples:

• A chess player improves their strategy using planning and intuition.
• An adaptive controller optimizes a refinery’s operations in real time.
• A newborn gazelle learns to stand and run within minutes.
• A robot decides between tasks based on battery levels and past experiences.
• Phil prepares breakfast by making complex decisions involving goal–subgoal behavior.

These examples share common elements: an agent interacts with its environment, makes decisions to
reach goals, faces uncertainty, and learns from experience to improve over time. Because its actions
affect future states, the agent needs foresight, adaptation, and ongoing feedback for optimal
performance.

Preprint. Under review.


3 Elements of Reinforcement Learning
3.1 Policy

A policy defines the agent's behavior at any given time. It maps perceived environmental states to
actions (a stimulus-response mapping). It may be a simple function, a lookup table, or involve
extensive computation. Policies can be deterministic or stochastic, i.e., specifying probabilities
over actions.
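The two kinds of policy can be sketched in a few lines. The state and action names below are hypothetical (loosely echoing the robot example above), chosen only to illustrate the lookup-table and probabilistic forms:

```python
import random

# Deterministic policy: a lookup table from state to action.
table_policy = {"low_battery": "recharge", "high_battery": "search"}

def act_deterministic(state):
    return table_policy[state]

# Stochastic policy: a probability distribution over actions for each state.
stochastic_policy = {"high_battery": [("search", 0.8), ("wait", 0.2)]}

def act_stochastic(state):
    actions, probs = zip(*stochastic_policy[state])
    return random.choices(actions, weights=probs)[0]
```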

3.2 Reward Signal

The reward signal defines the goal of the agent. At each time step the environment provides a
numerical reward, and the agent seeks to maximize the total reward over the long run. Rewards
indicate whether an action was good or bad and drive adjustments to the agent's behavior.

3.3 Value Function

The value of a state is the total reward the agent can expect to accumulate starting from that
state; it serves as a proxy for the long-term desirability of the state. Whereas rewards indicate
immediate outcomes, values predict outcomes over the long run, so value estimation is central to
good decision making.
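One standard way to make "expected future reward" concrete is the discounted return, where a discount factor gamma weights near-term rewards more heavily than distant ones. The helper name and the choice of gamma are illustrative assumptions:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of future rewards, each discounted by gamma per step:
    G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):  # fold from the last reward backward
        g = r + gamma * g
    return g
```

For example, `discounted_return([0, 0, 1], gamma=0.5)` gives 0.25: a reward two steps away is worth a quarter of an immediate one.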

3.4 Model of the Environment

Some reinforcement learning systems use a model of the environment to predict how it will behave,
e.g., predicting the next state and reward that follow an action. A model enables planning: the
agent can choose actions by considering possible future scenarios before they occur. Methods that
use models and planning are called model-based, while model-free methods rely on trial-and-error
learning.
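A minimal sketch of planning with a model: simulate each action's predicted outcome, then pick the best. The model entries, state names, and value estimates below are hypothetical, for illustration only:

```python
# Model: (state, action) -> (predicted next state, predicted reward).
model = {
    ("s0", "a"): ("s1", 1.0),
    ("s0", "b"): ("s2", 0.0),
}

def plan_one_step(state, actions, value):
    """Choose the action whose predicted reward plus next-state value is highest."""
    def score(a):
        next_state, reward = model[(state, a)]
        return reward + value.get(next_state, 0.0)
    return max(actions, key=score)
```

Here `plan_one_step("s0", ["a", "b"], {"s1": 0.0, "s2": 0.5})` prefers action "a" (1.0 + 0.0 beats 0.0 + 0.5) without the agent ever taking a real step.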

4 Limitations and Scope


In reinforcement learning, the notion of state gives agents the environmental information they need
to make decisions; the methods considered here concern choosing actions based on the available state
information. Evolutionary methods such as genetic algorithms and simulated annealing evolve policies
based on overall performance without estimating value functions, and they can be effective in
environments where the complete state is not accessible. Nevertheless, most reinforcement learning
methods, which learn from direct interaction with the environment, are more efficient, because they
exploit detailed state-action relationships.

5 An Extended Example: Tic-Tac-Toe


Tic-tac-toe is a two-player game in which each player tries to place three of their marks in a row
on a 3×3 board. Minimax is a traditional way to play, but it makes assumptions about the opponent's
behavior and cannot adapt without prior information about the opponent. A better approach is to
learn a model of the opponent through experience over many games. Evolutionary methods assess
overall performance and modify strategies only after a number of games. Reinforcement learning
instead maintains a value function over game states, where the value of each state represents the
estimated probability of winning from it; players can then adapt their strategy dynamically
according to their actual game experience. Reinforcement learning also applies to more complex
problems, including continuous problems and settings without an explicit adversary. It can handle
large or infinite state spaces and can operate at multiple levels within a hierarchical learning
framework for complex problem solving. Key features of reinforcement learning include the ability
to incorporate prior knowledge, implicit representations, and methods both with and without models
of the environment.
Reinforcement learning is a computational approach to goal-directed learning and decision making,
addressing computational questions about learning from direct interaction with an environment.
It frames the interaction of a learning agent with its environment using the Markov decision
process framework of states, actions, and rewards. This framework embodies the central problems of
artificial intelligence: cause and effect, uncertainty, and explicit goals. In reinforcement
learning methods, the concepts of value and value function make it possible to understand and
automate the learning process.
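The value-function scheme for tic-tac-toe described above can be sketched as a temporal-difference update: after moving from state s to s', nudge the value of s toward the value of s'. Representing states as board strings and initializing unseen values at 0.5 are assumptions made for this sketch:

```python
def td_update(values, state, next_state, alpha=0.1):
    """Move V(state) a fraction alpha toward V(next_state).
    Values are interpreted as estimated win probabilities;
    unseen states default to 0.5 (no information either way)."""
    v_s = values.get(state, 0.5)
    v_next = values.get(next_state, 0.5)
    values[state] = v_s + alpha * (v_next - v_s)
    return values[state]
```

Repeated over many games, states that tend to lead toward wins drift toward 1 and states that lead toward losses drift toward 0, which is what lets the player adapt its strategy from actual play.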
