Reinforcement Learning
Reinforcement learning is the closest that digital systems and machines can get to human
learning. Through this training, machine learning models can be taught to follow instructions,
conduct tests, operate equipment, and much more. 2
Reinforcement learning centers on a digital agent that is placed in a specific environment
to learn. Similar to the way that we learn new things, the agent faces a game-like situation and
must make a series of decisions to try to achieve the correct outcome. 3 Through trial and error,
the agent learns what to do (and what not to do) and is rewarded or punished accordingly.
Every reward it receives reinforces the behavior that earned it and signals the agent to employ
the same tactics again next time.
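The trial-and-error loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the action names, the starting preference of 1.0, the step size of 0.5, and the +1/-1 reward values are all hypothetical and do not come from a specific system.

```python
import random

def train_agent(actions, correct_action, episodes=200, seed=0):
    """Trial-and-error loop: rewarded actions become more likely to be chosen."""
    rng = random.Random(seed)
    # Every action starts with the same preference (hypothetical starting value).
    preference = {a: 1.0 for a in actions}
    for _ in range(episodes):
        # Choose an action with probability proportional to its preference.
        weights = [preference[a] for a in actions]
        action = rng.choices(actions, weights=weights)[0]
        # The environment rewards the correct action and punishes the others.
        reward = 1.0 if action == correct_action else -1.0
        # Reinforce: raise the preference after a reward, lower it after a
        # punishment, but never below a small floor so every action stays possible.
        preference[action] = max(0.1, preference[action] + 0.5 * reward)
    return preference

prefs = train_agent(["left", "right", "jump"], correct_action="jump")
best = max(prefs, key=prefs.get)
print(best, prefs)
```

Because only the rewarded action ever gains preference, the agent ends up choosing it far more often than the alternatives, which is the reinforcement effect in miniature.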
The foundations for reinforcement learning in psychology were laid over 100 years ago, and the
field is said to have a two-pronged origin. The first is rooted in animal learning and the
“Law of Effect,” coined by Edward Thorndike. Thorndike described the Law of Effect in 1911
as the notion that an animal will repeat actions if they produce satisfaction, and it will be
deterred from actions that produce discomfort. Furthermore, the greater the level of pleasure
or pain, the greater the pursuit of or deterrence from the action. 4 The Law of Effect combines
both selectional and associative learning. With selectional learning, the animal will try a few
different options and routes and select among them based on how they went. In associative
learning, the animal chooses its options based on the situations it associates them with, and
whether those associations are positive or negative.
B. F. Skinner, an American psychologist best known for his seminal work on behavior, is
known as the father of operant conditioning. Arguing that classical conditioning was too
simplistic to fully explain the complexity of human behavior, Skinner believed that people’s
behavior was a result of how they have been conditioned by the consequences of their past
behavior. He introduced the concept of the reinforcement schedule, categorized on the basis of
the time interval between two reinforcements and the number of responses given
by the organism. These schedules are:
1) Fixed Interval Schedule: reinforcement is given after a fixed time interval.
2) Variable Interval Schedule: reinforcement is given after an unpredictable time interval.
3) Fixed Ratio Schedule: reinforcement is given after a fixed number of responses.
4) Variable Ratio Schedule: reinforcement is given after an unpredictable number of responses.
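The four schedules can be expressed as two small classes, one counting responses and one tracking elapsed time. This is a minimal sketch: the class names, the `respond` method, and the way the variable schedules draw their next target (uniformly around a mean) are assumptions made for illustration, not definitions from Skinner's work.

```python
import random

class RatioSchedule:
    """Reinforce after a number of responses (fixed, or varying around a mean)."""

    def __init__(self, responses, variable=False, seed=0):
        self.responses = responses
        self.variable = variable
        self.rng = random.Random(seed)
        self.count = 0
        self.target = self._next_target()

    def _next_target(self):
        # Variable ratio: draw a target that averages to `responses`.
        if self.variable:
            return self.rng.randint(1, 2 * self.responses - 1)
        return self.responses

    def respond(self, time=None):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.target:
            self.count = 0
            self.target = self._next_target()
            return True
        return False


class IntervalSchedule:
    """Reinforce the first response after a time interval (fixed or varying)."""

    def __init__(self, interval, variable=False, seed=0):
        self.interval = interval
        self.variable = variable
        self.rng = random.Random(seed)
        self.last_reinforced = 0.0
        self.wait = self._next_wait()

    def _next_wait(self):
        # Variable interval: draw a wait that averages to `interval`.
        if self.variable:
            return self.rng.uniform(0, 2 * self.interval)
        return self.interval

    def respond(self, time):
        """Register a response at `time`; return True if it is reinforced."""
        if time - self.last_reinforced >= self.wait:
            self.last_reinforced = time
            self.wait = self._next_wait()
            return True
        return False


fixed_ratio = RatioSchedule(responses=3)
print([fixed_ratio.respond() for _ in range(9)])   # every third response

fixed_interval = IntervalSchedule(interval=5)
print([t for t in range(1, 13) if fixed_interval.respond(t)])  # reinforced times
```

Passing `variable=True` to either class gives the variable ratio and variable interval schedules; the organism can no longer predict which response will be reinforced.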
Types of Reinforcement:
Positive Reinforcement: Positive reinforcement occurs when an event that follows a
particular behavior increases the strength and frequency of that behavior. In other words, it
has a positive effect on behavior.
Advantages of positive reinforcement are:
Maximizes performance
Sustains change for a long period of time
A disadvantage is that too much reinforcement can lead to an overload of states, which can
diminish the results.
Operant conditioning has been used to explain various human and animal behaviors, including
learning processes, addiction, and language acquisition. 7 This method primarily concerns
voluntary behaviors as it involves learning through consequences—rewards or punishments—
based on individual choices. These behaviors are typically controlled by the individual, like
studying to achieve good grades or attempting to quit smoking.
The above image shows a robot, a diamond, and fire. The goal of the robot is to get the reward,
which is the diamond, and avoid the hurdles, which are the fires. The robot learns by trying all
the possible paths and then choosing the path that gives it the reward with the fewest hurdles.
Each right step gives the robot a reward, and each wrong step subtracts from the robot’s
reward. The total reward is calculated when the robot reaches the final reward, the diamond.
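One common way to implement this kind of path learning is tabular Q-learning, sketched below. The technique is swapped in for illustration; the text does not say which algorithm the robot uses. The 3x3 grid layout, the fire positions, the reward values (+10 for the diamond, -10 for fire, -1 per step), and the learning parameters are all hypothetical, since the original image is not reproduced here.

```python
import random

SIZE = 3
START, DIAMOND = (0, 0), (2, 2)
FIRES = {(1, 1), (2, 0)}  # hypothetical fire positions
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Move the robot; return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    # Moves that would leave the grid keep the robot in place.
    row = min(max(state[0] + dr, 0), SIZE - 1)
    col = min(max(state[1] + dc, 0), SIZE - 1)
    nxt = (row, col)
    if nxt == DIAMOND:
        return nxt, 10.0, True    # final reward
    if nxt in FIRES:
        return nxt, -10.0, True   # hurdle: large penalty, episode ends
    return nxt, -1.0, False       # small cost per step favors short paths

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        for _ in range(50):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update toward reward plus discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
            if done:
                break
    return q

def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from the start state."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path

q_table = train()
path = greedy_path(q_table)
print(path)
```

After training, following the highest-valued action from the start should lead the robot to the diamond while steering around the fire cells.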
Applications of Reinforcement Learning
Reinforcement learning is on the rise and its future is just as vibrant. Here, we’ll take a look at
some of the current ways RL is working in the real world.
1. Automated Robots
While most robots don’t look like pop culture has led us to believe, their capabilities are just
as impressive. The more robots learn using RL, the more accurate they become, and the quicker
they can complete a previously difficult task. They can also perform duties that would be
dangerous for people with far fewer consequences. For these reasons, aside from requiring some
oversight and regular maintenance, robots are a cost-effective and efficient alternative to
manual labor.
For example, some restaurants use robots to deliver food to tables. Grocery stores are using
robots to identify where shelves are low and order more product. In common settings,
automated robots have been used thus far to assemble products; inspect for defects; count,
track, and manage inventory; deliver goods; travel long and short distances; input, organize,
and report on data; and grasp and handle objects of all different shapes and sizes. As we
continue to test robotic abilities, new features are being introduced to expand their potential.
2. Natural Language Processing
Predictive text, text summarization, question answering, and machine translation are all
examples of natural language processing (NLP) tasks that use reinforcement learning. By studying
typical language patterns, RL agents can mimic and predict how people speak to each other
every day. This includes the actual language used, as well as syntax (the arrangement of words
and phrases) and diction (the choice of words).
In 2016, researchers from Stanford University, Ohio State University, and Microsoft Research
used reinforcement learning to generate dialogue, like that used for chatbots. Using two virtual
agents, they simulated conversations and used policy gradient methods to reward important
attributes such as coherence, informativity, and ease of answering. 5 This research was unique in
that it focused not only on the question at hand, but also on how an answer could influence future
outcomes. This approach to reinforcement learning in NLP is now widely adopted and used by
customer service departments in many major organizations.
3. Marketing and Advertising
From a consumer perspective, you might notice that the ads you receive are usually from
companies whose websites you’ve visited before, from which you’ve bought before, or that are
in the same industry as a company from which you’ve made a purchase. That’s because
marketing and advertising platforms can use reinforcement learning to associate similar
companies, products, and services to prioritize for certain customers. If the platforms try certain
options and receive a click or other engagement, it signals that they were ‘correct’ and should
employ the same strategy again. 2
4. Image Processing
Have you ever taken a security test that asked you to identify objects in frames, such as “Click
on the photos that have a street sign in them”? This is similar to what learning machines can
do, although they approach it in a different way.
When asked to process an image, RL agents will search the entire image as their starting point,
then identify objects sequentially until everything is registered. Artificial vision systems also
use deep convolutional neural networks, trained on large, labeled datasets, to map images to
human-generated scene descriptions from simulation engines. 2
Other uses of image processing include:
Robots equipped with visual sensors to learn their surrounding environment
Scanners to understand and interpret text
Image pre-processing and segmentation of medical images, like CT scans
Traffic analysis and real-time road processing by video segmentation and frame-by-frame image processing
CCTV cameras for traffic and crowd analytics
5. Recommendation Systems
The “Frequently Bought Together” section on Amazon, a “Customers Also Liked” tab online
at Target, and the “Recommended Reading” articles from news outlets all utilize learning
machines to generate recommendations. Specifically for news reading, RL agents can track the
types of stories, topics, and even author names someone prefers so that the system can queue
the next story it predicts the reader would enjoy. That includes the details of exactly how readers
interact with the content, e.g., clicks and shares, and aspects such as timing and freshness of the
news. A reward is then defined based on these user behaviors. 5
Recommendation systems also analyze past behaviors to try to predict future ones. So if, for
example, a hundred people who bought ski pants then went on to buy ski boots, a company’s
system learns to send ads for ski boots to anyone who just bought ski pants. If the ads are
unsuccessful, the system might display ads for ski jackets instead and see how the results
compare.
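Trying one ad, measuring engagement, and comparing it with an alternative is the pattern a multi-armed bandit captures. The sketch below uses an epsilon-greedy strategy, which is one simple option and not necessarily what any real ad platform runs. The click rates, the number of rounds, and the exploration rate are hypothetical values chosen only to make the example run.

```python
import random

def run_ad_bandit(click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy choice between ads; a click counts as a reward of 1."""
    rng = random.Random(seed)
    ads = list(click_rates)
    shown = {ad: 0 for ad in ads}    # how often each ad was displayed
    value = {ad: 0.0 for ad in ads}  # running estimate of each ad's click rate
    for _ in range(rounds):
        if rng.random() < epsilon:
            ad = rng.choice(ads)            # explore: try any ad
        else:
            ad = max(ads, key=value.get)    # exploit: best estimate so far
        # Simulated shopper: clicks with the ad's (hypothetical) true rate.
        reward = 1.0 if rng.random() < click_rates[ad] else 0.0
        shown[ad] += 1
        # Incremental average of the observed rewards for this ad.
        value[ad] += (reward - value[ad]) / shown[ad]
    return shown, value

shown, value = run_ad_bandit({"ski boots": 0.30, "ski jackets": 0.10})
print(shown)
print(value)
```

The small exploration rate keeps the system occasionally showing the weaker ad, so it can notice if shopper behavior changes over time.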
6. Gaming
From creating a new game, to testing it for bugs, to defeating its levels, RL is an efficient and
relatively easy resource on which programmers can rely. Traditional video games require
complex behavior trees to craft the logic of the game; by comparison, training an RL model is
much simpler. Here, the agent learns by itself in the simulated game environment through
navigation, defense, attack, and strategizing. 2 Through trial and error, it begins to perform
the necessary actions to reach the desired goal.
RL agents are also used in bug detection and game testing. This is due to their ability to run a
large number of iterations without human input, stress test the game, and create situations that
expose potential bugs. 2
7. Energy Conservation
As much of the world works to lower its effect on the climate, reducing energy consumption
is at the top of the list. A prime example is the partnership between DeepMind and Google to
cool massive and essential Google data centers. With a fully functioning AI system, the
centers saw a 40% reduction in the energy used for cooling without the need for human
intervention, though there is still some supervision from data center experts. 5,6
The system works by:
Taking snapshots of data from the data centers every five minutes and feeding this to deep neural networks
Predicting how different combinations will affect future energy consumption
Identifying actions that will lead to minimal power consumption while maintaining a set standard of safety criteria
Sending and implementing these actions at the data center
Verifying the actions by the local control system
Another example may be an Eco setting on your thermostat, or motion-activated lights that
offer different settings based on the level of light already in the room.
8. Traffic Control
Civil engineers have been struggling with traffic for centuries, but reinforcement learning is
working to help solve that. Continuous traffic monitoring in complex urban networks helps
build a literal and figurative “map” of traffic patterns and vehicle behavior. Because they are
data-driven, RL agents can start to learn when traffic is heaviest, which directions it’s
coming from, and how quickly cars are moving through each light color. 2 Then, they adapt
accordingly and continue to test and learn across times, climates, and seasons.
9. Healthcare
Healthcare employs machine learning and artificial intelligence in much of its work, and RL is
no exception. It has been used in automated medical diagnosis, resource scheduling, drug
discovery and development, and health management. 5
One important avenue for deploying reinforcement learning is in dynamic treatment regimes
(DTRs). To create a DTR, someone must input a set of clinical observations and assessments
of a patient. Using previous outcomes and patient medical history, the learning system will
then output a suggestion on treatment type, drug dosages, and appointment timing for every
stage of the patient’s journey. This is extremely beneficial for making time-dependent decisions
about the best treatment for a patient at a specific time without expending much time, energy, or
effort consulting multiple parties. 2