Reinforcement Learning

Reinforcement learning (RL) is a machine learning approach that mimics human learning by allowing agents to make decisions in an environment through trial and error, receiving rewards or penalties based on their actions. The concept has roots in psychology, with foundational theories from Edward Thorndike, Ivan Pavlov, and B.F. Skinner, focusing on how behavior is shaped by consequences. RL has diverse applications, including automated robots, natural language processing, marketing, gaming, and healthcare, showcasing its potential to optimize performance and decision-making across various fields.

Uploaded by

vaidipkorde64

Reinforcement Theory in Psychology and Artificial Intelligence

What is reinforcement learning?

Reinforcement learning is the closest that digital systems and machines can get to human
learning. Through this training, machine learning models can be taught to follow instructions,
conduct tests, operate equipment, and much more. 2

Reinforcement learning is centered around a digital agent that is placed in a specific environment
to learn. Similar to the way that we learn new things, the agent faces a game-like situation and
must make a series of decisions to try to achieve the correct outcome. 3 Through trial and error,
the agent learns what to do (and what not to do) and is rewarded or punished accordingly.
Every reward it receives reinforces the behavior and signals the agent to employ the
same tactics again next time.

History & Background

The foundations for reinforcement learning in psychology were laid over 100 years ago, and it
is said to have a two-pronged origin. The first is rooted in animal learning and the
“Law of Effect,” coined by Edward Thorndike. Thorndike described the Law of Effect in 1911
as the notion that an animal will repeat actions if they produce satisfaction, and it will be
deterred from actions that produce discomfort. Furthermore, the greater the level of pleasure
or pain, the greater the pursuit or deterrence from the action. 4 The Law of Effect combines both
selectional and associative learning. With selectional learning, the animal tries a few
different options and routes and selects among them based on how they went. In associative
learning, the animal chooses its options based on the situations it associates them with, and
whether those associations are positive or negative.

Although Thorndike established the essence of reinforcement learning, the term
“reinforcement” wasn’t formally used until 1927 by Ivan Pavlov. He described reinforcement
as “the strengthening of a pattern of behavior due to an animal receiving a stimulus—a
reinforcer— in a time-dependent relationship with another stimulus or with a response.”4 In
other words, when animals receive a reaction to something they’ve done shortly after they’ve
done it, it affects whether or not they’ll do it again, in the same way, in the future.

B. F. Skinner was an American psychologist best known for his seminal work on behavior, and he
is known as the father of operant conditioning. Arguing that classical conditioning was too
simplistic to fully explain the complexity of human behavior, Skinner believed that people’s
behavior was a result of how they have been conditioned by the consequences of their past
behavior. He introduced the concept of reinforcement schedules, categorized by the time
interval between two reinforcements and by the number of responses given by the organism.
These schedules are:

1) Fixed Interval Schedule: reinforcement is given after a fixed time interval.
2) Variable Interval Schedule: reinforcement is given after an unpredictable time interval.
3) Fixed Ratio Schedule: reinforcement is given after a fixed number of responses.
4) Variable Ratio Schedule: reinforcement is given after an unpredictable number of responses.
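
The four schedules can be made concrete with a short simulation. The following Python sketch is illustrative only: the class and function names, the uniform and integer random draws used for the "variable" schedules, and the one-response-per-second example are all assumptions, not part of Skinner's formulation.

```python
import random

class Schedule:
    """Decides whether a response made at time t earns a reinforcement."""

    def __init__(self, draw_requirement, by_time):
        self.draw = draw_requirement   # returns the next time or response requirement
        self.by_time = by_time         # True: interval schedule, False: ratio schedule
        self.required = self.draw()
        self.last_time = 0.0           # time of the most recent reinforcement
        self.responses = 0             # responses since the most recent reinforcement

    def respond(self, t):
        self.responses += 1
        progress = (t - self.last_time) if self.by_time else self.responses
        if progress >= self.required:
            self.last_time, self.responses = t, 0
            self.required = self.draw()  # variable schedules redraw the requirement
            return True
        return False

def fixed_interval(seconds):
    return Schedule(lambda: seconds, by_time=True)

def variable_interval(mean_seconds, rng):
    # Assumed distribution: uniform between 0 and twice the mean.
    return Schedule(lambda: rng.uniform(0, 2 * mean_seconds), by_time=True)

def fixed_ratio(n):
    return Schedule(lambda: n, by_time=False)

def variable_ratio(mean_n, rng):
    # Assumed distribution: integers from 1 to 2 * mean_n - 1 (average mean_n).
    return Schedule(lambda: rng.randint(1, 2 * mean_n - 1), by_time=False)

if __name__ == "__main__":
    rng = random.Random(0)
    schedules = {
        "fixed interval (5 s)": fixed_interval(5),
        "variable interval (mean 5 s)": variable_interval(5, rng),
        "fixed ratio (3)": fixed_ratio(3),
        "variable ratio (mean 3)": variable_ratio(3, rng),
    }
    for name, schedule in schedules.items():
        # One response per second for 20 seconds.
        reinforced_at = [t for t in range(1, 21) if schedule.respond(t)]
        print(name, reinforced_at)
```

With one response per second, the fixed schedules reinforce at regular points, while the variable schedules reinforce at points the responder cannot predict.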

Types of Reinforcement:

1. Positive Reinforcement: Positive reinforcement occurs when an event that follows a
particular behavior increases the strength and frequency of that behavior. In other words, it
has a positive effect on behavior.

Advantages:
• Maximizes performance
• Sustains change for a long period of time

Disadvantage:
• Too much reinforcement can lead to an overload of states, which can diminish the results

2. Negative Reinforcement: Negative reinforcement strengthens a behavior because an
unpleasant condition is stopped or avoided when the behavior occurs. It should not be confused
with punishment, which weakens a behavior.

Advantages:
• Increases the behavior that ends or avoids the unpleasant condition
• Enforces a minimum standard of performance

Disadvantage:
• Provides only enough incentive to meet the minimum standard of behavior

Operant conditioning has been used to explain various human and animal behaviors, including
learning processes, addiction, and language acquisition. 7 This method primarily concerns
voluntary behaviors as it involves learning through consequences—rewards or punishments—
based on individual choices. These behaviors are typically controlled by the individual, like
studying to achieve good grades or attempting to quit smoking.

Reinforcement Learning in Artificial Intelligence and Machine Learning

Reinforcement Learning (RL) is a branch of machine learning focused on making decisions to
maximize cumulative rewards in a given situation. Unlike supervised learning, which relies on
a training dataset with predefined answers, RL involves learning through experience. In RL,
an agent learns to achieve a goal in an uncertain, potentially complex environment by
performing actions and receiving feedback through rewards or penalties.

Key Concepts of Reinforcement Learning

• Agent: The learner or decision-maker.
• Environment: Everything the agent interacts with.
• State: A specific situation in which the agent finds itself.
• Action: A move the agent can make; the set of all possible moves is the action space.
• Reward: Feedback from the environment based on the action taken.
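
A minimal Python sketch of how these pieces interact is shown below. The corridor environment, the reward values (+1.0 at the goal, -0.1 per other step), and the random agent are hypothetical choices made only to illustrate the roles of agent, environment, state, action, and reward.

```python
import random

class Corridor:
    """Environment: positions 0..4 in a line; reaching position 4 ends the episode."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action is -1 (left) or +1 (right); the walls clamp the position.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1   # assumed reward scheme
        return self.state, reward, done

class RandomAgent:
    """Agent: picks an action at random, ignoring the state (no learning yet)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice([-1, +1])

def run_episode(env, agent, max_steps=1000):
    """Run one episode and return (cumulative reward, number of steps taken)."""
    state = env.reset()
    total = 0.0
    steps = 0
    for steps in range(1, max_steps + 1):
        action = agent.act(state)                # agent chooses an action
        state, reward, done = env.step(action)   # environment returns state and reward
        total += reward
        if done:
            break
    return total, steps

if __name__ == "__main__":
    total, steps = run_episode(Corridor(), RandomAgent(seed=0))
    print(f"cumulative reward {total:.1f} after {steps} steps")
```

A learning agent would replace `RandomAgent.act` with a rule that uses past rewards to prefer better actions.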

How Reinforcement Learning Works

RL operates on the principle of learning optimal behavior through trial and error. The agent
takes actions within the environment, receives rewards or penalties, and adjusts its behavior to
maximize the cumulative reward. This learning process is characterized by the following
elements:
• Policy: A strategy used by the agent to determine the next action based on the current state.
• Reward Function: A function that provides a scalar feedback signal based on the state and
action.
• Value Function: A function that estimates the expected cumulative reward from a given
state.
• Model of the Environment: A representation of the environment that helps in planning by
predicting future states and rewards.
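
To show how the four elements fit together, here is a small Python sketch that evaluates a fixed policy on a five-state corridor. The corridor, the discount factor of 0.9, the reward of 1.0 for entering the last state, and all function names are assumptions chosen for illustration.

```python
GAMMA = 0.9   # discount factor (assumed)
N = 5         # states 0..4; state 4 is terminal

def model(state, action):
    """Model of the environment: predicts the next state for a state-action pair."""
    return min(max(state + action, 0), N - 1)

def reward_fn(state, action):
    """Reward function: scalar feedback for a state-action pair."""
    return 1.0 if model(state, action) == N - 1 else 0.0

def policy(state):
    """Policy: maps the current state to an action (here, always move right)."""
    return +1

def evaluate(policy, sweeps=100):
    """Value function: discounted cumulative reward obtained from each state."""
    values = [0.0] * N
    for _ in range(sweeps):
        for s in range(N - 1):   # the terminal state keeps value 0
            a = policy(s)
            values[s] = reward_fn(s, a) + GAMMA * values[model(s, a)]
    return values

if __name__ == "__main__":
    print([round(v, 3) for v in evaluate(policy)])
    # prints [0.729, 0.81, 0.9, 1.0, 0.0]
```

States closer to the goal have higher values because the reward is discounted less by the time it arrives.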

Example: Navigating a Maze
The problem is as follows: We have an agent and a reward, with many hurdles in between. The
agent is supposed to find the best possible path to reach the reward. The following example
makes the problem concrete.

The maze contains a robot, a diamond, and fire. The goal of the robot is to get the reward,
which is the diamond, and avoid the hurdles, which are the fires. The robot learns by trying the
possible paths and then choosing the path that gives it the reward with the fewest hurdles.
Each right step gives the robot a reward, and each wrong step subtracts from the reward of the
robot. The total reward is calculated when it reaches the final reward, the diamond.
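
One common way to learn such a path is tabular Q-learning, sketched below in Python. The source does not name an algorithm, so Q-learning is a swapped-in choice here. The grid layout, the reward values (+10 for the diamond, -10 for fire, -1 per step), and the hyperparameters are likewise assumptions for illustration.

```python
import random

# Hypothetical maze: S = start, D = diamond, F = fire, . = free cell.
GRID = [
    "S..F",
    ".F..",
    "...D",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Move one cell (walls clamp the move) and return (next_state, reward, done)."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr = min(max(r + dr, 0), len(GRID) - 1)
    nc = min(max(c + dc, 0), len(GRID[0]) - 1)
    cell = GRID[nr][nc]
    if cell == "D":
        return (nr, nc), 10.0, True    # reached the diamond
    if cell == "F":
        return (nr, nc), -10.0, True   # stepped into fire
    return (nr, nc), -1.0, False       # ordinary step costs a little

def train(episodes=3000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(len(GRID)) for c in range(len(GRID[0]))}
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(4)                          # explore
            else:
                action = max(range(4), key=lambda a: q[state][a])  # exploit
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from the start until the episode ends."""
    state, path, done = (0, 0), [(0, 0)], False
    while not done and len(path) <= max_steps:
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
    return path

if __name__ == "__main__":
    print(greedy_path(train()))
```

After training, following the greedy actions from the start should lead to the diamond while avoiding both fire cells.
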

Applications of Reinforcement Learning

Reinforcement learning is on the rise and its future is just as vibrant. Here, we’ll take a look at
some of the current ways RL is working in the real world.

1. Automated Robots

While most robots don’t look like pop culture has led us to believe, their capabilities are just
as impressive. The more robots learn using RL, the more accurate they become, and the quicker
they can complete a previously difficult task. They can also perform duties that would be
dangerous for people, with far fewer consequences. For these reasons, aside from requiring some
oversight and regular maintenance, robots are a cost-effective and efficient alternative to
manual labor.

For example, some restaurants use robots to deliver food to tables. Grocery stores are using
robots to identify where shelves are low and order more product. In common settings,
automated robots have been used thus far to assemble products; inspect for defects; count,
track, and manage inventory; deliver goods; travel long and short distances; input, organize,
and report on data; and grasp and handle objects of all different shapes and sizes. As we
continue to test robotic abilities, new features are being introduced to expand their potential.

2. Natural Language Processing

Predictive text, text summarization, question answering, and machine translation are all
examples of natural language processing (NLP) that uses reinforcement learning. By studying
typical language patterns, RL agents can mimic and predict how people speak to each other
every day. This includes the actual language used, as well as syntax (the arrangement of words
and phrases) and diction (the choice of words).

In 2016, researchers from Stanford University, Ohio State University, and Microsoft Research
used this learning to generate dialogue, like what’s used for chatbots. Using two virtual agents,
they simulated conversations and used policy gradient methods to reward important attributes
such as coherence, informativity, and ease of answering. 5 This research was unique in that it
didn’t only focus on the question at hand, but also on how an answer could influence future
outcomes. This approach to reinforcement learning in NLP is now widely adopted and used by
customer service departments in many major organizations.

3. Marketing and Advertising

Both brands and consumers can use reinforcement learning to their benefit. Brands selling
to target audiences can use real-time bidding platforms, A/B testing, and automatic ad
optimization. This means that they can place a series of advertisements in the marketplace and
the host will automatically serve the best-performing ads in the best spots for the lowest
prices.2,5 Although brands post and set up the campaigns themselves, marketing and
advertising platforms are also learning which types of ads are resonating with audiences and
will display those ads more frequently and prominently.

From a consumer perspective, you might notice that the ads you receive are usually from
companies whose websites you’ve visited before, whom you have bought from before, or are
in the same industry as a company from which you’ve made a purchase. That’s because
marketing and advertising platforms can use reinforcement learning to associate similar
companies, products, and services to prioritize for certain customers. If they try certain options
and receive a click or other engagement, it signals that they were ‘correct’ and should employ
the same strategy again.2
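
The "try an option, observe a click, repeat what worked" loop described above resembles an epsilon-greedy multi-armed bandit. The Python sketch below simulates one under stated assumptions: the click rates, the exploration rate, and the function name are hypothetical, and real advertising platforms are not claimed to work this way.

```python
import random

def run_campaign(click_rates, rounds=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy ad selection against simulated users.

    click_rates holds the true (hidden) click probability of each ad; the
    algorithm only ever sees its own counts of shows and clicks.
    """
    rng = random.Random(seed)
    n = len(click_rates)
    shows = [0] * n
    clicks = [0] * n

    def estimate(i):
        # Observed click rate so far; ads never shown are estimated at 0.
        return clicks[i] / shows[i] if shows[i] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            ad = rng.randrange(n)            # explore: show a random ad
        else:
            ad = max(range(n), key=estimate)  # exploit: show the best ad so far
        shows[ad] += 1
        if rng.random() < click_rates[ad]:   # simulated user clicks: the reward
            clicks[ad] += 1
    return shows, clicks

if __name__ == "__main__":
    shows, clicks = run_campaign([0.02, 0.05, 0.20])
    for i, (s, c) in enumerate(zip(shows, clicks)):
        print(f"ad {i}: shown {s} times, clicked {c} times")
```

Over many rounds, the ad with the highest click rate ends up being shown far more often than the others, while the small exploration rate keeps testing the alternatives.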

4. Image Processing

Have you ever taken a security test that asked you to identify objects in frames, such as “Click
on the photos that have a street sign in them”? This is similar to what learning machines can
do, although they approach it in a different way.

When asked to process an image, RL agents will search an entire image as their starting point,
then identify objects sequentially until everything is registered. Artificial vision systems also
use deep convolutional neural networks, trained on large, labeled datasets, to map images to
human-generated scene descriptions from simulation engines. 2

Some more examples of reinforcement learning in image processing include: 2

• Robots equipped with visual sensors to learn their surrounding environment
• Scanners to understand and interpret text
• Image pre-processing and segmentation of medical images, like CT scans
• Traffic analysis and real-time road processing by video segmentation and frame-by-frame
image processing
• CCTV cameras for traffic and crowd analytics

5. Recommendation Systems

The “Frequently Bought Together” section on Amazon, a “Customers Also Liked” tab online
at Target, and the “Recommended Reading” articles from news outlets all utilize learning
machines to generate recommendations. Specifically for news reading, RL agents can track the
types of stories, topics, and even author names someone prefers so that the system can queue
the next story they think they would enjoy. That includes the details of exactly how they interact
with the content, e.g., clicks and shares, and aspects such as timing and freshness of the news.
A reward is then defined based on these user behaviors. 5

Recommendation systems also analyze past behaviors to try to predict future ones. So if, for
example, a hundred people who bought ski pants then went on to buy ski boots, a company’s
system learns to send ads for ski boots to anyone who just bought ski pants. If the ads are
unsuccessful, they might try to display ads for ski jackets, instead, and see how the results
compare.

6. Gaming

From creating a new game, to testing its bugs, to defeating its levels, RL is an efficient and
relatively easy resource on which programmers can rely. Compared to traditional video games
that require complex behavioral trees to craft the logic of the game, training an RL model is
much simpler. Here, the agent will learn by itself in the simulated game environment through
navigation, defense, attack, and strategizing. 2 Through trial and error, it will begin to perform
the necessary actions to reach the desired goal.

RL agents are also used in bug detection and game testing. This is due to their ability to run a
large number of iterations without human input, stress test the game, and create situations that
expose potential bugs.2

7. Energy Conservation

As much of the world works to lower its effect on the climate, reducing energy consumption
is at the top of the list. A prime example is the partnership between DeepMind and Google to
cool massive and essential Google data centers. With a fully functioning AI system, the
centers saw up to a 40% reduction in the energy used for cooling without the need for human
intervention, though there is still some supervision from data center experts. 5,6

The system works in the following way:5

• Taking snapshots of data from the data centers every five minutes and feeding this to
deep neural networks
• Predicting how different combinations will affect future energy consumption
• Identifying actions that will lead to minimal power consumption while maintaining a
set standard of safety criteria
• Sending and implementing these actions at the data center
• Verifying the actions by the local control system
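
The predict-then-select steps above can be sketched in Python as follows. Everything here is hypothetical: `predict` is a hand-written stand-in for the trained neural networks, and the snapshot fields, the coefficients, the candidate setpoints, and the 27 °C limit are invented for illustration rather than taken from the real system.

```python
def predict(snapshot, setpoint_c):
    """Hypothetical stand-in for the trained predictor.

    Returns (predicted energy in kW, predicted maximum temperature in deg C)
    for a candidate cooling setpoint. The formulas are invented for illustration.
    """
    load = snapshot["it_load_kw"]
    outside = snapshot["outside_temp_c"]
    energy_kw = load * 0.1 + max(0.0, outside - setpoint_c) * 12.0  # colder costs more
    max_temp_c = setpoint_c + load * 0.01
    return energy_kw, max_temp_c

def choose_setpoint(snapshot, candidates, temp_limit_c=27.0):
    """Pick the candidate with the lowest predicted energy that meets the safety limit."""
    safe = []
    for setpoint in candidates:
        energy_kw, max_temp_c = predict(snapshot, setpoint)
        if max_temp_c <= temp_limit_c:        # safety criterion
            safe.append((energy_kw, setpoint))
    if not safe:
        return None                           # no safe action: defer to local control
    return min(safe)[1]

if __name__ == "__main__":
    snapshot = {"it_load_kw": 500, "outside_temp_c": 30}  # one five-minute snapshot
    print(choose_setpoint(snapshot, [18, 20, 22, 24]))    # prints 22
```

In this toy example, the 24 °C setpoint would use the least energy but is rejected because its predicted temperature exceeds the limit, so the sketch returns 22.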

Another example may be an Eco setting on your thermostat, or motion-activated lights that
offer different settings based on the level of light already in the room.

8. Traffic Control

Civil engineers have been struggling with traffic for centuries, but reinforcement learning is
working to help solve that. Continuous traffic monitoring in complex urban networks helps
build a literal and figurative “map” of traffic patterns and vehicle behavior. Due to their
data-driven nature, RL agents can start to learn when traffic is heaviest, which directions it’s
coming from, and how quickly cars are moving through each light color. 2 Then, they adapt
accordingly and continue to test and learn across times, climates, and seasons.

9. Healthcare

Healthcare employs machine learning and artificial intelligence in much of its work, and RL is
no exception. It has been used in automated medical diagnosis, resource scheduling, drug
discovery and development, and health management. 5

One important avenue for deploying reinforcement learning is in dynamic treatment regimes
(DTRs). To create a DTR, someone must input a set of clinical observations and assessments
of a patient. Using previous outcomes and patient medical history, the learning system will
then output a suggestion on treatment type, drug dosages, and appointment timing for every
stage of the patient’s journey. This is extremely beneficial for making time-dependent decisions
for the best treatment for a patient at a specific time without expending much time, energy, or
effort to consult with multiple parties. 2
