Assignment 15 Modern AI
Ameen Aazam
EE23BTECH11006
The idea of learning from interaction is fundamental to theories of learning and intelligence.
Reinforcement learning, understood as goal-directed learning from interaction, is the theme of this
book. It investigates several learning methods, studies idealized learning settings, and considers how
to design effective learning machines for scientific and economic problems.
1 Reinforcement Learning
Reinforcement learning (RL) is a form of learning in which an agent learns to map states to actions
so as to maximize a numerical reward signal. RL differs from supervised learning: the learner is not
told which actions to take but must discover them through trial-and-error search, and rewards may be
delayed. The agent thus strives to find which actions yield the most reward over time.
RL is also distinct from both supervised and unsupervised learning in that agents must learn from
their own experience without external guidance. Unlike unsupervised learning, which searches for
patterns in unlabeled data, RL is about learning behavior that earns the greatest reward from
interacting with the environment.
RL is challenging because agents must balance the exploration-exploitation trade-off: trying new
actions whose rewards are unknown versus exploiting actions already known to be rewarding.
Concentrating exclusively on either can impede success.
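The exploration-exploitation balance described above is often handled with an epsilon-greedy rule. The following is a minimal sketch (the function name and the example action values are invented for illustration): with probability epsilon the agent explores a random action, otherwise it exploits the action with the highest estimated value.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore a random action;
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# With epsilon = 0 the rule is purely greedy and always exploits.
best = epsilon_greedy([1.0, 3.0, 2.0], epsilon=0.0)
```

Setting epsilon to zero recovers pure exploitation; setting it to one yields pure exploration, so epsilon tunes the trade-off directly.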
The agent’s interaction with the environment is typically modeled as a Markov decision process
(MDP), in which the goal is to choose actions that, despite the uncertainty of the environment,
optimize a long-term objective through sensing states, taking actions, and receiving reward feedback.
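The sense-act-reward loop of an MDP can be sketched over a toy environment. The two states, the action names, and the transition table below are invented for illustration; the structure of the loop is the point.

```python
# A minimal agent-environment loop over a toy two-state MDP.
# transitions[state][action] = (next_state, reward)
transitions = {
    "s0": {"stay": ("s0", 0.0), "move": ("s1", 1.0)},
    "s1": {"stay": ("s1", 0.0), "move": ("s0", 0.0)},
}

def run_episode(policy, start="s0", steps=5):
    state, total_reward = start, 0.0
    for _ in range(steps):
        action = policy(state)                        # agent senses state, chooses action
        state, reward = transitions[state][action]    # environment responds
        total_reward += reward                        # scalar feedback signal
    return total_reward

always_move = lambda s: "move"  # a trivial hypothetical policy
```

Running `run_episode(always_move)` accumulates reward only on the s0-to-s1 transitions, showing how the reward signal scores a behavior over an episode.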
In addition, RL connects to other fields such as psychology, neuroscience, statistics, and optimization.
Core RL algorithms were inspired by biological learning systems, and RL has in turn contributed
improved models of animal learning and of the brain’s reward system. This interdisciplinary reach
places RL at the center of artificial intelligence (AI) as the field becomes more focused on general
learning principles rather than task-specific heuristics.
2 Examples
Reinforcement learning is illustrated through diverse examples. These examples share common
elements: an agent interacts with its environment, makes decisions to reach goals, faces uncertainty,
and learns from experience to improve over time. Moreover, its actions affect future states, so the
agent needs foresight, adaptation, and ongoing feedback for optimal performance.
3 Elements of Reinforcement Learning
A policy defines the agent’s behavior at any given time. It maps perceived environmental states to
actions (a stimulus-response mapping). It might be a simple function, a lookup table, or involve
extensive computation. Policies may be deterministic or stochastic, i.e., they may specify probabilities
for each action.
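The deterministic/stochastic distinction can be made concrete. In this sketch the state names, action names, and probabilities are invented: a deterministic policy is a plain state-to-action mapping, while a stochastic policy assigns each action a probability.

```python
import random

# Deterministic policy: a direct lookup from state to action.
deterministic_policy = {"s0": "move", "s1": "stay"}

# Stochastic policy: a probability over actions in each state.
stochastic_policy = {
    "s0": {"move": 0.8, "stay": 0.2},
    "s1": {"move": 0.5, "stay": 0.5},
}

def sample_action(policy, state):
    """Draw one action according to the policy's probabilities."""
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs)[0]
```

The deterministic case is just a table lookup; the stochastic case requires sampling, which is one way an agent can keep exploring.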
The reward signal defines the agent’s goal. At each time step the environment provides a numerical
reward, and the agent seeks to maximize the total reward over the long run. Rewards indicate whether
an action was good or bad and drive adjustments to the agent’s behavior.
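"Maximizing reward over the long run" is commonly formalized as a discounted return, the sum of rewards weighted by powers of a discount factor gamma. A minimal sketch (the function name and the sample reward sequence are invented):

```python
def discounted_return(rewards, gamma=0.9):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
    Computed backwards for numerical simplicity."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example: rewards [1, 0, 2] with gamma = 0.5
# give 1 + 0.5*0 + 0.25*2 = 1.5.
g = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
```

A gamma near zero makes the agent myopic; a gamma near one makes it weigh distant rewards almost as heavily as immediate ones.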
A value function estimates the expected future reward starting from a state, serving as a proxy for the
long-term desirability of that state. Rewards indicate immediate outcomes, whereas values predict
cumulative outcomes over the future, so value estimation is central to good decision making.
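One standard way to estimate such values from experience is a temporal-difference update, which nudges the value of a state toward the observed reward plus the discounted value of the next state. The sketch below shows a single TD(0) step; the state names and numbers are invented for illustration.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V[state] a fraction alpha toward
    the bootstrapped target reward + gamma * V[next_state]."""
    V[state] += alpha * (reward + gamma * V[next_state] - V[state])

# Hypothetical example: state "a" transitions to "b" with reward 1.
V = {"a": 0.0, "b": 1.0}
td0_update(V, "a", 1.0, "b")
# V["a"] moves from 0.0 toward 1 + 0.9*1.0 = 1.9, by a tenth of the gap.
```

Repeated over many transitions, updates of this kind converge toward the long-run desirability of each state.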
Some reinforcement learning systems use a model of the environment to predict how it will behave,
for example predicting future states and rewards. A model allows the agent to plan, i.e., to choose
actions by considering possible future scenarios. Model-based methods rely on such planning,
whereas model-free methods learn purely by trial and error.
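The simplest form of planning with a model is a one-step lookahead: for each action, the model predicts the next state and reward, and the agent picks the action with the best backed-up value. The model table, states, and values below are invented for illustration.

```python
# model[(state, action)] = (predicted_next_state, predicted_reward)
model = {
    ("s0", "move"): ("s1", 1.0),
    ("s0", "stay"): ("s0", 0.0),
}

def plan_one_step(state, actions, V, gamma=0.9):
    """Choose the action whose predicted reward plus discounted
    value of the predicted next state is greatest."""
    def backed_up(a):
        next_state, reward = model[(state, a)]
        return reward + gamma * V[next_state]
    return max(actions, key=backed_up)

V = {"s0": 0.0, "s1": 0.5}
choice = plan_one_step("s0", ["move", "stay"], V)
```

A model-free method would instead try both actions and learn their worth from the rewards actually received, never consulting such a table.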
Reinforcement learning thus addresses core problems of artificial intelligence: acting under prescribed
conditions, coping with uncertainty, and pursuing goals. In reinforcement learning methods, the
concepts of value and value function make it possible to understand and automate the learning process.