Applications of Reinforcement Learning

Uploaded by

शून्य अद्वैत


In this article, we'll look at some of the real-world applications of reinforcement learning.

Applications in self-driving cars

Various papers have proposed Deep Reinforcement Learning for autonomous driving. A self-driving car has to account for many things at once, such as local speed limits, drivable zones, and collision avoidance, to mention a few.

Some of the autonomous driving tasks where reinforcement learning could be applied include trajectory optimization, motion planning, dynamic pathing, controller optimization, and scenario-based learning policies for highways.

For example, parking can be achieved by learning automatic parking policies. Lane changing can be learned with Q-learning, and overtaking can be handled by learning an overtaking policy that avoids collisions and maintains a steady speed afterwards.
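
To make the Q-learning idea mentioned above more concrete, here is a minimal tabular sketch. The action names, state labels, and hyperparameters are hypothetical and chosen only for illustration; a real driving system would use a far richer state and, most likely, function approximation instead of a table.

```python
import random

ACTIONS = ("left", "stay", "right")  # hypothetical discrete lane-change actions


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Value of the best action available in the next state (0.0 if unseen).
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy action selection over the current Q-table."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

In this sketch a positive reward for a safe lane change raises the value of that action in that state, so the greedy policy gradually prefers it.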

AWS DeepRacer is an autonomous racing car designed to test RL on a physical track. It uses cameras to view the track and a reinforcement learning model to control the throttle and steering.

Wayve.ai has successfully applied reinforcement learning to train a car to drive in a day. They used a deep reinforcement learning algorithm to tackle the lane following task. Their network architecture was a deep network with 4 convolutional layers and 3 fully connected layers. The example below shows the lane following task. The image in the middle represents the driver's perspective.

Industry automation with Reinforcement Learning

In industry, reinforcement learning-based robots are used to perform various tasks. These robots can be more efficient than human beings, and they can also perform tasks that would be dangerous for people.

A great example is the use of AI agents by DeepMind to cool Google data centers. This led to a 40% reduction in the energy used for cooling. The centers are now controlled by the AI system without the need for human intervention, although data center experts still supervise it. The system works in the following way:

Snapshots of data from the data centers are taken every five minutes and fed to deep neural networks
The networks predict how different combinations of actions will affect future energy consumption
Actions that lead to minimal power consumption while satisfying a set of safety criteria are identified
These actions are sent to the data center and implemented

The actions are verified by the local control system.
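
The selection step in the loop above can be sketched as follows. This is only an illustration under stated assumptions: `pick_action`, `predict_energy`, and `is_safe` are hypothetical names, and the predictor and safety check are placeholders for whatever models and constraints a real system would use. It is not DeepMind's implementation.

```python
from typing import Callable, Iterable, Optional, Sequence

Action = Sequence[float]  # e.g. hypothetical cooling setpoints


def pick_action(
    snapshot: dict,
    candidates: Iterable[Action],
    predict_energy: Callable[[dict, Action], float],
    is_safe: Callable[[dict, Action], bool],
) -> Optional[Action]:
    """Return the safe candidate with the lowest predicted energy use.

    Returns None when no candidate passes the safety check, so the caller
    can fall back to the existing control system.
    """
    safe = [a for a in candidates if is_safe(snapshot, a)]
    if not safe:
        return None
    return min(safe, key=lambda a: predict_energy(snapshot, a))
```

Filtering on safety before minimizing keeps the two concerns separate: the predictor never gets to trade safety against energy savings.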

Reinforcement Learning applications in trading and finance

Supervised time series models can be used for predicting future sales as well as predicting stock prices. However, these models don't determine the action to take at a particular stock price. Enter Reinforcement Learning (RL). An RL agent can make that decision: whether to hold, buy, or sell. The RL model is evaluated against market benchmarks to check how well it is performing.

This automation brings consistency into the process, unlike previous methods where analysts would have to make every single decision. IBM, for example, has a sophisticated reinforcement learning based platform that can make financial trades. It computes the reward function based on the loss or profit of every financial transaction.
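
A profit-and-loss reward of the kind described above might look like the sketch below. The action encoding, the one-unit trade size, and the function names are assumptions made for illustration; this is not IBM's platform and it ignores fees, slippage, and risk.

```python
HOLD, BUY, SELL = 0, 1, 2  # hypothetical action encoding


def step(position, cash, price, action):
    """Apply one action at the given price and return (position, cash)."""
    if action == BUY and cash >= price:
        return position + 1, cash - price
    if action == SELL and position > 0:
        return position - 1, cash + price
    # HOLD, or an action that cannot be executed, leaves the portfolio as is.
    return position, cash


def reward(prev_value, position, cash, price):
    """Reward is the change in portfolio value (profit or loss) since the last step."""
    return (cash + position * price) - prev_value
```

For example, buying one unit at 10 and seeing the price move to 12 yields a reward of 2 on the next step, while a drop to 9 yields a reward of -1.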

Reinforcement Learning in NLP (Natural Language Processing)

In NLP, RL can be used in text summarization, question answering, and machine translation, just to mention a few.

The authors of this paper, Eunsol Choi, Daniel Hewlett, and Jakob Uszkoreit, propose an RL based approach for question answering given long texts. Their method works by first selecting a few sentences from the document that are relevant for answering the question. A slow RNN is then employed to produce answers from the selected sentences.

A combination of supervised and reinforcement learning is used for abstractive text summarization in this paper by Romain Paulus, Caiming Xiong, and Richard Socher. Their goal is to solve the problems that attentional, RNN-based encoder-decoder models face when summarizing longer documents. The authors propose a neural network with a novel intra-attention that attends over the input and the continuously generated output separately. Their training method combines standard supervised word prediction and reinforcement learning.

On the machine translation side, authors from the University of Colorado and the University of Maryland propose a reinforcement learning based approach to simultaneous machine translation. The interesting thing about this work is that it learns when to trust the predicted words and uses RL to determine when to wait for more input.

Researchers from Stanford University, Ohio State University, and Microsoft Research have proposed Deep RL for use in dialogue generation. Deep RL can be used to model future rewards in a chatbot dialogue. Conversations are simulated using two virtual agents. Policy gradient methods are used to reward sequences that display important conversation attributes such as coherence, informativity, and ease of answering.
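
One way to picture the policy gradient setup described above is a scalar reward that mixes the three attributes, plugged into a REINFORCE-style loss. The equal weights, the function names, and the assumption that each attribute is already scored in [0, 1] are hypothetical; the paper's actual scoring functions and weights are not reproduced here.

```python
import math


def combined_reward(coherence, informativity, ease, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Mix the three conversation attributes into one scalar reward."""
    w_c, w_i, w_e = weights
    return w_c * coherence + w_i * informativity + w_e * ease


def reinforce_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE surrogate loss for one generated reply.

    Minimizing this loss raises the log-probability of replies whose
    reward exceeds the baseline and lowers it for replies below it.
    """
    return -(reward - baseline) * sum(token_log_probs)
```

In practice the log-probabilities would come from the dialogue model and the loss would be differentiated by a deep learning framework; plain floats are used here only to show the arithmetic.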

More NLP applications can be found here or here.

Reinforcement Learning applications in healthcare

In healthcare, patients can receive treatment from policies learned by RL systems. RL is able to find optimal policies using previous experiences, without needing prior information about the mathematical model of the biological system. This makes the approach more widely applicable than other control-based systems in healthcare.

RL in healthcare is categorized as dynamic treatment regimes (DTRs) in chronic disease or critical care, automated medical diagnosis, and other general domains.

In DTRs, the input is a set of clinical observations and assessments of a patient. The outputs are the treatment options for every stage. In RL terms, the inputs play the role of states and the outputs the role of actions. Applying RL to DTRs is advantageous because it can make time-dependent decisions about the best treatment for a patient at a specific time.

The use of RL in healthcare also enables improvement of long-term outcomes by factoring in the delayed effects of treatments.
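
Delayed effects enter RL through the discounted return, which adds up rewards over all later stages. A minimal sketch follows; the reward values and the discount factor `gamma` are hypothetical.

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of rewards over treatment stages, discounted by gamma per stage."""
    total = 0.0
    # Work backwards so each stage's return includes everything after it.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

A treatment whose benefit only shows up two stages later still contributes to the return of the earlier decision, which is how an RL agent can prefer it over an option with a small immediate payoff.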

RL has also been used for the discovery and generation of optimal DTRs for
chronic diseases.

You can dive deeper into RL applications in healthcare by exploring this paper.

Reinforcement Learning applications in engineering
On the engineering front, Facebook has developed an open-source reinforcement learning platform called Horizon. The platform uses reinforcement learning to optimize large-scale production systems. Facebook has used Horizon internally:

to personalize suggestions
to deliver more meaningful notifications to users
to optimize video streaming quality.

Horizon also contains workflows for:

simulated environments
a distributed platform for data preprocessing
training and exporting models in production.

A classic example of reinforcement learning in video display is serving a user a low or high bit rate video based on the state of the video buffers and estimates from other machine learning systems.

Horizon is capable of handling production-like concerns such as:

deploying at scale
feature normalization
distributed learning
serving and handling datasets with high-dimensional data and thousands of
feature types.

Reinforcement Learning in news recommendation

User preferences can change frequently, so recommendations based on reviews and likes can become obsolete quickly. With reinforcement learning, the RL system can track the reader's return behaviors.

Construction of such a system would involve obtaining news features, reader features, context features, and reader-news features. News features include but are not limited to the content, headline, and publisher. Reader features refer to how the reader interacts with the content, e.g., clicks and shares. Context features include news aspects such as timing and freshness of the news. A reward is then defined based on these user behaviors.

Reinforcement Learning in gaming

Let's look at an application in gaming, specifically AlphaGo Zero. Using reinforcement learning, AlphaGo Zero was able to learn the game of Go from scratch. It learned by playing against itself. After 40 days of self-training, AlphaGo Zero was able to outperform the version of AlphaGo known as Master, which had defeated world number one Ke Jie. It used only the black and white stones on the board as input features, and a single neural network. A simple tree search that relies on this single neural network is used to evaluate positions and sample moves, without using any Monte Carlo rollouts.

Real-time bidding: Reinforcement Learning applications in marketing and advertising

In this paper, the authors propose real-time bidding with multi-agent reinforcement learning. The large number of advertisers is handled by clustering them and assigning each cluster a strategic bidding agent. To balance the trade-off between competition and cooperation among advertisers, a Distributed Coordinated Multi-Agent Bidding (DCMAB) method is proposed.

In marketing, the ability to accurately target an individual is crucial, because the right targets lead to a high return on investment. The study in this paper was based on Taobao, the largest e-commerce platform in China. The proposed method outperforms the state-of-the-art single-agent reinforcement learning approaches.

Reinforcement Learning in robotics manipulation
Deep learning and reinforcement learning can be used to train robots that are able to grasp various objects, even those unseen during training. This can, for example, be used in building products on an assembly line.

This is achieved by combining large-scale distributed optimization and a variant of deep Q-Learning called QT-Opt. QT-Opt's support for continuous action spaces makes it suitable for robotics problems. A model is first trained offline and then deployed and fine-tuned on the real robot.
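
Q-learning over continuous actions needs some way to find the action that maximizes the Q-function. One common choice, and to our understanding the one the QT-Opt authors describe, is the cross-entropy method (CEM). The sketch below illustrates CEM on a toy Q-function; the function name, sample counts, and Gaussian sampling scheme are assumptions made for illustration, not the QT-Opt implementation.

```python
import random
import statistics


def cem_argmax(q_fn, state, dim, iters=8, samples=64, elites=6, seed=0):
    """Approximate argmax over continuous actions of q_fn(state, action) with CEM."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    for _ in range(iters):
        # Sample candidate actions from the current Gaussian.
        batch = [
            [rng.gauss(mean[i], std[i]) for i in range(dim)]
            for _ in range(samples)
        ]
        # Keep the candidates with the highest Q-values.
        batch.sort(key=lambda a: q_fn(state, a), reverse=True)
        top = batch[:elites]
        # Refit the sampling distribution to the elite actions.
        mean = [statistics.fmean(a[i] for a in top) for i in range(dim)]
        std = [statistics.pstdev([a[i] for a in top]) + 1e-6 for i in range(dim)]
    return mean
```

Because CEM only needs Q-values for sampled actions, it works with any Q-network and does not require a separate actor network.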

Google AI applied this approach to robotic grasping, where 7 real-world robots ran for 800 robot hours over a 4-month period.

In this experiment, the QT-Opt approach succeeds in 96% of the grasp attempts across 700 trial grasps on objects that were previously unseen. Google AI's previous method had a 78% success rate.

Final thoughts

Although reinforcement learning is still a very active research area, significant progress has been made to advance the field and apply it in real life.

In this article, we have barely scratched the surface as far as application areas
of reinforcement learning are concerned. Hopefully, this has sparked some
curiosity that will drive you to dive in a little deeper into this area. If you want