Efficient Elevator Algorithm
5-2020
Owen Barbour
University of Tennessee, Knoxville, [email protected]
Carl Edwards
University of Tennessee, Knoxville, [email protected]
Daniel Nichols
University of Tennessee, Knoxville, [email protected]
Austin Day
University of Tennessee, Knoxville, [email protected]
Recommended Citation
Toll, Sean M.; Barbour, Owen; Edwards, Carl; Nichols, Daniel; and Day, Austin, "Efficient Elevator Algorithm"
(2020). Chancellor’s Honors Program Projects.
https://trace.tennessee.edu/utk_chanhonoproj/2338
Efficient Elevator Algorithm
ECE401/402: Final Detailed Design Report
Austin Day, Carl Edwards, Daniel Nichols, Sean Toll, Owen Barbour
24 April 2020
Executive Summary
Team 28’s goal is to leverage modern deep reinforcement learning techniques to
optimize elevator efficiency in tall, population-dense buildings. Such a feat will
ultimately improve the lives of both elevator users and elevator owners. For this
project, the top five Engineering Characteristics are: 1) integrating the reward
functionality and simulations, 2) creating the reward function by optimizing
the transportation of people, 3) creating a good training set, 4) simulating the
physics and other variables (number of floors, distance between floors, etc.), and
5) developing a base model to determine the performance increase due to the
implementation.
Contents
1 Problem Definition
2 Background
3 Requirements Specification
  3.1 Customer Requirements
  3.2 Engineering Requirements
4 Technical Approach
  4.1 Simulation
  4.2 Control Algorithm
6 Embodiment Design
  6.1 Algorithm & Control Flow
    6.1.1 Simulation
    6.1.2 Reinforcement Learning
  6.2 Best Practices
7 Test Plan
  7.1 Testing the Simulation
    7.1.1 Testing the Building Simulation
    7.1.2 Testing the Person Simulation
  7.2 Testing Reinforcement Algorithm
    7.2.1 Testing Implementation
    7.2.2 Testing the Reward Function
8 Deliverables
9 Project Management
10 Budget
11 Summary of Progress
12 References
13 Appendix
List of Figures
1 Engineering design pipeline for simulation creation.
2 Project Class Overview
3 Example Of The Building Visualization
4 This shows the return per epoch on a 10 floor system. The return approaches a value near zero, which indicates that few people are waiting during the episodes once it is trained.
5 This shows the running average of actions taken at each step for an elevator system with 20 floors. This is the same experiment as Figure 19. Note the collapse in return happens when the action only goes up or down.
6 This shows the return for each epoch for an elevator system with 20 floors.
7 This is a visualization of the neural network making decisions for 10 floors. Check out the gif at: https://raw.githubusercontent.com/EfficientElevator28/ReinforcementLearning/master/up_down_10.gif
8 This shows the loss for each step for an elevator system with 5 floors. Note that all these figures show running average values as well.
9 This shows the loss for each epoch for an elevator system with 5 floors.
10 This shows the reward for each step for an elevator system with 5 floors.
11 This shows the action for each step for an elevator system with 5 floors.
12 This shows the loss for each step for an elevator system with 10 floors.
13 This shows the loss for each epoch for an elevator system with 10 floors.
14 This shows the reward for each step for an elevator system with 10 floors.
15 This shows the action for each step for an elevator system with 10 floors.
16 This shows the loss for each step for an elevator system with 20 floors.
17 This shows the loss for each epoch for an elevator system with 20 floors.
18 This shows the reward for each step for an elevator system with 20 floors.
19 This shows the action for each step for an elevator system with 20 floors.
List of Tables
1 Customer Requirements
2 Engineering Requirements (continued on next page)
3 Engineering Requirements Continued
4 Simulation Design Decisions
5 Control Algorithm Design Decisions
6 Performance Prediction for Design Concept: Programming a Simulation with Custom Data Set via Simulation
7 Performance Prediction for Design Concept: Programming a Simulation with Existing Elevator Data Set
8 Performance Prediction for Design Concept: Programming a Simulation By Measuring Real World Data Manually
9 Weighted Decision Matrix (Programming a simulation). Weight * Rating = Score
10 Performance Prediction for Design Concept: Optimization Algorithm with Deep Q-learning
11 Performance Prediction for Design Concept: Optimization Algorithm with Policy Gradient
12 Performance Prediction for Design Concept: Optimization Algorithm with Tabular Q-Learning
13 Weighted Decision Matrix (Using deep Q-learning algorithm). Weight * Rating = Score
14 Deadlines
1 Problem Definition
“Why is this elevator taking so long?” This is a common thought that has
popped into everyone’s head at some point in their lives, spurred on by elevators
that leave much to be desired. Some are slow, taking ages to go from floor to
floor. Others are baffling, going up when someone wants to go down or sending
the most distant unit to pick up a user when there is clearly a closer one nearby.
Granted, the speed of the elevator is limited by the physics of a massive metal
box moving vertically through a building. However, the frequency of weird
actions, the ones that are triggered by a programmable algorithm, can surely
be reduced. While no elevator algorithm will ever reach perfection, it can certainly get closer. That is the objective of the team's project.
The success of Team 28's project will chiefly benefit two customer groups: elevator users and elevator owners. For users, the benefit is clear: a better elevator experience. For owners, the benefit is more indirect, yet no less significant: a higher profit margin. To explain, regardless of the business sector, happier people mean better business. These people can either be employees or customers,
and, therefore, this increase can take the form of greater productivity, increased
sales, or more positive reviews. No matter the channel, the end result remains
the same. Improved elevators will make people happier, or, at the very least,
will make people less unhappy. This effect will cascade until it results in more
money for the business the elevator is supporting. Assuming the owner of the
elevator has some stake in the business, if they are not already a part of it, they
will reap the benefits of this increase. In this way, one elevator will have made
a bigger difference than most would have anticipated. Granted, nothing ever
works out as perfectly as the example above describes. However, it remains true
that even small things like elevators can punch far above their weight.
2 Background
Although most elevator manufacturers offer their own proprietary algorithms
which are mostly treated as trade secrets [1], this problem and its hypothetical
optimizations have been tackled in several studies available to the public. In a
1996 study by Robert Crites and Andrew Barto [2], it was found that training an
elevator algorithm using reinforcement learning offered notable improvements
over older dynamic programming approaches to solving large scale optimization
problems in various aspects, including performance and memory requirements.
“These results demonstrate the utility of [Reinforcement Learning] on a very
large scale dynamic optimization problem. [...] RL avoids the need of con-
ventional [Dynamic Programming] algorithms to exhaustively sweep the state
set. By storing information in artificial neural networks, it avoids the need to
maintain large lookup tables” [2]. A later study by the same researchers in
the year 1998 went deeper into this, comparing various reinforcement learning
techniques to existing methods: “We used a team of RL agents, each of which
was responsible for controlling one elevator car. Results obtained in simulation surpassed the best of the heuristic elevator control algorithms of which we are aware. Performance was also robust in the face of decreased levels of state information” [3]. Other more recent studies also exist; however, most of these
do not account for all of the factors which the group intends to include in the
study. In particular, the monitoring of foot traffic and consideration of time of
day and past foot traffic patterns will be used to add more context to optimize
the algorithm, and hopefully achieve new improved results.
3 Requirements Specification
3.1 Customer Requirements
Table 1: Customer Requirements

Accurate training set development: The use case is high-density buildings with a large number of floors.
Normal implementation base model: A metric to determine the effectiveness of the improved algorithm.
Deep reinforcement learning implementation: Takes into account the number of people waiting per floor, the waiting time per person, and the number of elevators.
Collection of data from UT campus elevator*: Base model for implementation on UT's campus.
3.2 Engineering Requirements
Tables 2 and 3: Engineering Requirements (Requirement | Description | Future Expansion | Type)

Reinforcement learning implementation | Integrates the reward and simulations | Further research and use of modern, deep reinforcement learning techniques | DV
Training set requesting | Variable number of elevator requests | Take into account different densities in sections of floors to better model skyscraper distribution | DC
Training set physics simulation | Models people arriving and travel time; accounts for multiple elevators, number of floors, and distance between floors | More complex variables such as elevator weight | DC
Installation of sensors* | Sensors for collecting the number of people, waiting times, button presses, etc. | Expand to other elevators at the UT campus | DC
4 Technical Approach
4.1 Simulation
The approach to the elevator simulation can be broken down into three main
steps: research, programming, and testing (Figure 1).
In the research phase the focus will be on understanding how elevators work.
While this will of course necessitate a high level investigation of elevator physics,
the specific goal of this phase will be to discern how the various aspects of
elevators (weight, speed, maximum load, etc.) interact with each other. For
example, the trade-off between the weight of an elevator and its resulting speed
will be a key area of focus, among many others. This research phase will mostly
consist of online searches and physics calculations. However, it will also be
necessary to verify the findings by observing a real-life elevator (such as one of
the elevators in the Min Kao building). Through this additional analysis it will
be possible to incorporate some elements of practical reality into the simulation,
since theoretical calculations can only go so far.
In the programming phase the objective is simple: create the simulation. To do this, Python will be used, because it is easy to work with and has extensive library support. Further, the entire group is familiar with Python. As such, it is the most straightforward choice. Importantly, this simulation will by no means need to be perfect or complete by the end of the phase; however, the core functionality will need to be implemented. Additionally, it is critical that the simulation be designed with future expansion and modification in mind. As the project goes on, it is quite likely that new parameters will need to be added and old ones will need to be adjusted or even discarded. If the code is poorly structured, this revision process will be much more difficult. Therefore, it will be imperative that the simulation is coded well.
Once the simulation is fully implemented the testing phase will begin. The
goal here is two-fold: ensure that the simulated elevator obeys the laws of physics
and then improve its applicability to real-world elevators. The first part of the
goal will require physics calculations and other mathematical checks. It is in some ways the most vital part to get right, as an optimized elevator algorithm is worthless if it is not physically possible. Once the physical constraints are successfully accounted for, the second part, real-world application, can begin in earnest. This will first necessitate that the parameters and performance of elevators around campus be recorded. Then these parameters will be inputted into the simulation, and its performance will be compared to that of the campus elevators. The goal is to make the simulation's performance as similar as possible. If this is not the case after a round of testing, then it will be necessary to go back to the programming phase or even the research phase. This cyclical process will continue until the simulation is to the group's, and more importantly the customer's, satisfaction.
A summary of the major design decisions and the relevant engineering characteristics can be found in Table 4 below.
Table 4: Simulation Design Decisions
$$L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right)^2\right]$$
In this equation, s is the current state, a is the action taken, r is the reward for taking that action, s' is the next state, and a' is the action taken from the next state. γ is the discounting factor. The Adam optimizer will be used to minimize this loss function. Q is the action-value function that will be learned (the multilayer perceptron). U(D) is a uniform distribution over the transitions in the replay buffer D. θ_i are the model parameters, and θ_i^- are the parameters of the target network used to compute the bootstrapped target.
In order to program this algorithm, Team 28 will create an environment wrapper class for the simulation. This will include a reset function to create a new episode and a step function that advances one timestep and returns the resulting reward. Since elevator control is a non-episodic task, resetting the episode is not strictly necessary. The group will collect state transitions into a replay buffer and sample minibatches from it to train the network.
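As an illustration, a minimal environment wrapper along these lines might look like the sketch below. The class and method names (initialize, get_state, apply_action, advance_time, num_people_waiting) are hypothetical stand-ins for the simulator's interface, not the team's actual code.

```python
import random
from collections import deque

class ElevatorEnv:
    """Hypothetical reset/step wrapper around the elevator simulator."""

    def __init__(self, simulator):
        self.sim = simulator

    def reset(self):
        self.sim.initialize()              # start a fresh episode
        return self.sim.get_state()        # initial state vector

    def step(self, action):
        self.sim.apply_action(action)      # e.g. the next floor to travel to
        self.sim.advance_time()            # run physics until the next decision point
        next_state = self.sim.get_state()
        reward = -self.sim.num_people_waiting()   # negative number of people waiting
        return next_state, reward

# Replay buffer of (s, a, r, s') transitions; minibatches are sampled from it for training.
replay_buffer = deque(maxlen=100_000)

def sample_minibatch(batch_size=32):
    return random.sample(replay_buffer, batch_size)
```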
To implement this model the team will create this deep Q-learning algorithm
in Python 3 using the deep learning library PyTorch. PyTorch is a good choice
because it is a flexible, efficient library for deep learning calculations while still
being relatively easy to use.
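To make this concrete, a minimal PyTorch sketch of the Q-network and the loss above is shown below; the hidden-layer size, learning rate, and variable names are illustrative assumptions rather than the team's final implementation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """MLP that outputs Q(s, a) for every action at once (layer sizes are assumptions)."""

    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Mean-squared TD error corresponding to the loss equation above."""
    states, actions, rewards, next_states = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
    targets = rewards + gamma * max_next_q
    return nn.functional.mse_loss(q_sa, targets)

# The Adam optimizer minimizes this loss, e.g.:
# optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
```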
A summary of the major design decisions and the relevant engineering characteristics can be found in Table 5 below.

Table 5: Control Algorithm Design Decisions (excerpt)

Using a Markov model with one state and up/down actions* | Relevant engineering characteristics: Training set requesting, Reward function, floor priority, Reinforcement learning
Performance Predictions:
Table 6: Performance Prediction for Design Concept: Programming a Simulation with Custom Data Set via Simulation

Training set requesting (rating 5): With this option, the group will be able to completely customize what elevator requests are going into the simulation.
Training set physics simulation (rating 5): The physics are measured, so they are completely accurate.
Normal implementation base model (rating 3): The base model will not be as accurate without custom data for the research focus.
Table 8: Performance Prediction for Design Concept: Programming a Simulation By Measuring Real World Data Manually

Training set physics simulation (rating 5): The physics are measured, so they are completely accurate.
Normal implementation base model (rating 2): Any elevator that can be measured is not the use case of high floor count.
Table 9: Weighted Decision Matrix (Programming a simulation). Weight * Rating = Score

Criteria      | Weight | Simulation (Rating / Score) | Existing Dataset (Rating / Score) | New Dataset (Rating / Score)
Applicability | 0.3    | 1 / 0.3                     | 3 / 0.9                           | 2 / 0.6
Utility       | 0.3    | 3 / 0.9                     | 0 / 0.0                           | 0 / 0.0
Time Spent    | 0.3    | 1 / 0.3                     | 3 / 0.9                           | -3 / -0.9
Creativity    | 0.1    | 3 / 0.3                     | -1 / -0.1                         | 0 / 0.0
Total         |        | 1.8                         | 1.7                               | -0.3
Rank          |        | 1                           | 2                                 | 3
Continue?     |        | yes                         | no                                | no
Using an existing dataset would not be useful for the later optimization algorithms, if a suitable one even exists. The same can be said for creating a new dataset, except doing so would require far too much work. Both of these concepts are also quite boring. Finally, while they would provide greater real-world applicability than a simulation, that does not outweigh their decreased usefulness for the later optimization algorithms. Therefore, programming a simulation is the superior design concept.
Selection Explanation:
A simulation is better for this project because it will allow for directly manipulating the different parameters. This will allow the group to draw causal relationships between the parameters and the simulated elevator's performance. This will come in handy when developing the optimization algorithm.
Selection Explanation:
Python will be used because it is easy to work with and has a lot of helpful library support. Additionally, the entire group is familiar with it.
Table 11: Performance Prediction for Design Concept: Optimization Algorithm
with Policy Gradient
Table 13: Weighted Decision Matrix (Using deep Q-learning algorithm). Weight * Rating = Score

Criteria                   | Weight | Deep Q-Learning (Rating / Score) | Policy Gradient (Rating / Score) | Tabular Q-learning (Rating / Score)
Implementation Efficiency  | 0.1    | 0 / 0                            | 0 / 0                            | 3 / 0.3
Implementation Cost        | 0.2    | 1 / 0.2                          | 0 / 0.2                          | 2 / 0.4
Implementation Difficulty  | 0.3    | -1 / -0.1                        | -2 / -0.6                        | 2 / 0.6
Elevator Algorithm Quality | 0.4    | 3 / 1.2                          | 3 / 1.2                          | -2 / -0.8
Total                      |        | 1.3                              | 0.8                              | 0.5
Rank                       |        | 1                                | 2                                | 3
Continue?                  |        | Yes                              | Maybe                            | No
model complexity.
Selection Explanation:
Deep Q-learning will allow the algorithm to learn the best action to take. It extends tabular Q-learning to continuous input domains, which is desirable for many tasks. It can also produce a deterministic action policy, which policy gradient methods do not.
Selection Explanation:
Team 28 chose the more complex Markov model with up/down actions first because it will give the algorithm more freedom to find a better solution to the problem. If a good solution is not found, then an alternative concept with a discrete state space will be used.
Selection Explanation:
Adam [4] is used because it is commonly used in many reinforcement learning papers.
also consider using spiking neural networks instead.
Selection Explanation:
The group will use a multilayer perceptron (MLP) because it is simple yet powerful. It can approximate a wide range of functions, and since the group does not have a clear idea of the underlying statistical correlation structure of the Markov model transition data, a potentially generalizable model with dense, fully-connected hidden layers is desirable. An MLP is also relatively simple and efficient, so it is a great starting point.
Selection Explanation:
PyTorch is a commonly used deep learning framework which is easy to use but
allows for sufficient implementation detail complexity.
Selection Explanation:
We don't anticipate that our model will require the amount of computation a GPU would provide.
6 Embodiment Design
6.1 Algorithm & Control Flow
The algorithm used in this project is split into two core functionalities as men-
tioned before: the simulation and the reinforcement learning algorithm.
6.1.1 Simulation
From a high level, the simulation is split into six fundamental classes, a step
function, and the visualization.
The classes are as follows: Simulator, Building, Elevator, Floor, Person, and the Person Scheduler. The Simulator class runs the step function, serving as the most abstracted layer of the simulator. The Building class is one step deeper, handling the storage of elevators, floors, and measurement specifications of certain aspects of the building. The Elevator class contains variables for keeping track of passengers and velocity, as well as functions for physics and loading/unloading passengers. The Floor class stores the people waiting and queued floor button presses (up/down). The Person class stores the person's original floor, destination, waiting state, and waiting time. The Person Scheduler generates the people who arrive on each floor over the course of the simulation.
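For illustration, the relationships among these classes could be sketched roughly as follows; the attribute names and default values here are guesses for exposition, not the field names in the team's actual code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Person:
    origin_floor: int
    destination_floor: int
    waiting: bool = True
    wait_time: float = 0.0

@dataclass
class Floor:
    number: int
    people_waiting: List[Person] = field(default_factory=list)
    up_pressed: bool = False
    down_pressed: bool = False

@dataclass
class Elevator:
    position: float = 0.0          # vertical position of the car
    velocity: float = 0.0
    passengers: List[Person] = field(default_factory=list)

@dataclass
class Building:
    floors: List[Floor]
    elevators: List[Elevator]
    floor_height: float = 3.0      # distance between floors

class Simulator:
    """Top-level object that owns the building and advances time."""

    def __init__(self, building: Building):
        self.building = building

    def step(self, dt: float):
        ...  # physics update, loading/unloading, bookkeeping (sketched below)
```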
The basic step function has several jobs. First, it checks whether any elevators will finish their loading/unloading cycle within the time allotted. If so, it shortens the allotted step time to end exactly when that cycle finishes; this allows the reinforcement algorithm to determine the direction of travel. Next, it moves each elevator to a new position based on the specified passage of time. As part of this, it unloads people and loads the people who are traveling in the elevator's previous direction. Finally, the step function increments state such as each person's waiting time.
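A rough sketch of that control flow, simplified and using hypothetical helper names on the classes above, might read:

```python
def step(self, dt):
    # Shrink the timestep if an elevator would finish loading/unloading mid-step,
    # so the RL algorithm gets to act exactly at that boundary.
    for elevator in self.building.elevators:
        remaining = elevator.time_until_doors_close()
        if 0 < remaining < dt:
            dt = remaining

    # Move every elevator and handle passengers at stops.
    for elevator in self.building.elevators:
        elevator.update_physics(dt)
        if elevator.stopped_at_floor():
            elevator.unload_passengers()
            elevator.load_passengers(direction=elevator.previous_direction)

    # Bookkeeping: everyone still waiting accumulates wait time.
    for floor in self.building.floors:
        for person in floor.people_waiting:
            person.wait_time += dt
    return dt
```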
The visualization is simply given the building as part of its initialization and updates the building/elevator/passenger states accordingly. This component uses a wrapper around OpenGL primitives to display each elevator's position, current state (the elevator outline), the elevator's passengers, the number of people waiting on each floor, and the floor up/down button presses. Having a visualization is helpful for checking that the physics are working properly, and it is useful for comparing the behavior of the reinforcement learning algorithm against that of the SCAN algorithm.
6.1.2 Reinforcement Learning
Interfacing with the Simulation
Implementations:
1. First, a simple problem with one elevator will be tackled as a proof of concept. Inside the RL step function, the algorithm will call the physics step function until the elevator reaches a given floor. Then, the algorithm will calculate a reward and return it along with the new state. The RL algorithm will calculate an action, which will be the next floor to go to.
• Pros: This will be very simple. The algorithm won’t need to plan
more than one RL step ahead. This will show that the elevator can
learn on a simple problem.
• Cons: This algorithm can only be applied to one elevator. This is
because elevators might not all reach their destination at the same
time. Waiting for that to happen would be a huge waste of time.
2. In order to tackle multiple elevators, the physics and RL step functions will be merged together. Due to the safety constraints mentioned above, the elevator cannot change its destination floor while traveling. In order to avoid this issue, a 2-element queue will be used (see the sketch after this list). The elevator will constantly travel to the first element, and the RL algorithm can change the second element whenever it wants (even while traveling between floors). When an elevator arrives, the second element will become the first element.
• Pros: By merging the physics and RL timesteps (only call the physics
timestep once in the RL step function), the program will be able to
have multiple elevators running in parallel. This is because the algo-
rithm won’t have to wait for all elevators to reach their destination
before moving on (Implementation 1’s method would necessitate do-
ing so).
• Cons: Since the RL algorithm can only change the second element
(the next floor to travel to after the destination), the algorithm will
receive a delayed reward for its actions. This delayed reward signal
will make training slower and more difficult.
3. At Dr. Sadovnik’s suggestion, we will also consider having an RL algo-
rithm for each elevator. Each elevator will learn its own neural network
and only take a new action when it reaches its destination like in Imple-
mentation 1.
• Pros: The physics and RL timestep functions can be unmerged.
The physics timestep will be called in a loop, and each elevator’s
unique RL step function will be called when that elevator reaches its
destination. This would allow us to take a more direct approach like
in Implementation 1 while using multiple elevators. This may also
help scale up to many elevators, since each elevator will have its own
specific neural network. (One neural network might take too long to learn to control several elevators by itself.)
• Cons: The program will not be able to control each elevator with
one central neural network, which might cause communication issues
between the elevators and reduce the efficiency improvement of the
algorithm. However, the simplicity for each individual elevator might
make this approach easier for learning a good algorithm.
The implementation of this step is also much harder.
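As referenced in Implementation 2 above, the 2-element destination queue could look something like this sketch (the class and method names are illustrative, not taken from the team's code):

```python
class DestinationQueue:
    """Two-slot queue: slot 0 is the committed destination, slot 1 is editable by the RL agent."""

    def __init__(self, current_floor):
        self.slots = [current_floor, current_floor]

    def committed_destination(self):
        return self.slots[0]

    def set_next(self, floor):
        # The RL algorithm may overwrite the second slot at any time,
        # even while the elevator is moving toward the first slot.
        self.slots[1] = floor

    def arrive(self):
        # On arrival, the editable slot becomes the new committed destination.
        self.slots[0] = self.slots[1]
        return self.slots[0]
```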
Reward Calculation
To calculate the rewards, we will first use the negative of the number of people waiting as the reward, which is easy to implement and will incentivize the elevator to move people to their desired floors quickly. After this, we will also try changing the reward function to the negative of the sum of the time each person has waited. This should further incentivize the algorithm to treat passengers fairly and not abandon someone for hours.
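Both candidate reward functions are simple to express. A sketch, written against the hypothetical class layout used earlier, might be:

```python
def reward_people_waiting(building):
    """First reward: negative count of people currently waiting on any floor."""
    return -sum(len(floor.people_waiting) for floor in building.floors)

def reward_total_wait_time(building):
    """Second reward: negative sum of how long each waiting person has waited."""
    return -sum(person.wait_time
                for floor in building.floors
                for person in floor.people_waiting)
```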
7 Test Plan
7.1 Testing the Simulation
7.1.1 Testing the Building Simulation
It is essential for the later parts of the project to have a robust elevator simulation underpinning the models. Thus, precise and complete tests are needed to ensure the robustness of the simulated elevators.
To test this, a series of simulated buildings will be created to cover all of the
possible elevator configurations and edge cases:
• Single elevator
• Multiple elevators
• Elevators covering different ranges of floors
• Building with floors that are not accessible by elevator
Each of these buildings will be fed a predefined list of passengers arriving at specific times in order to test the simulation's capability to deliver passengers to all accessible floors in a building and to deal with high load without failing. This simulation test will run on a pass-or-fail basis, with the only condition being that, after new passengers stop arriving, the simulation eventually reaches a state where there are no more waiting people. This test should pass for all buildings that are not intentionally designed to have inaccessible floors.
7.1.2 Testing the Person Simulation
In order to acquire an adequate amount of accurate data and testing conditions for a machine learning algorithm to improve the results, an adequate simulation is required for the random coming and going of new passengers, accounting for differences in the popularity of floors in a building and in the density of foot traffic depending on the time of day. To do this, a simulator was created to generate passengers based on a probability density function (PDF) describing the likelihood of a button press for each floor, relative to the passage of time. In order to ascertain the accuracy of the person scheduling simulator, sufficient testing must be done.

Given a specific building and the set of PDFs for it, this test will run the person scheduling simulation and will monitor and analyze the queuing up of new passengers. After the simulation has run for a long enough duration, the actual average rates of button presses will be compared to the PDF, and an error value will be obtained. The test will pass if the error falls within an acceptable margin of ten percent.
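A sketch of such a check, assuming a hypothetical per-floor arrival-rate specification and a toy generator, could be:

```python
import random

def simulate_arrivals(rates_per_second, duration_seconds):
    """Toy generator: rates_per_second maps floor -> expected button presses per second."""
    counts = {floor: 0 for floor in rates_per_second}
    for _ in range(int(duration_seconds)):          # one-second steps
        for floor, rate in rates_per_second.items():
            if random.random() < rate:              # crude Bernoulli approximation of arrivals
                counts[floor] += 1
    return counts

def passes_rate_test(rates_per_second, duration_seconds=100_000, tolerance=0.10):
    """Compare empirical press rates to the specified rates; pass if within ten percent."""
    counts = simulate_arrivals(rates_per_second, duration_seconds)
    for floor, rate in rates_per_second.items():
        empirical = counts[floor] / duration_seconds
        if abs(empirical - rate) > tolerance * rate:
            return False
    return True
```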
8 Deliverables
Upon completion of the project, the plan is to have a working simulation of the elevators and reinforcement learning algorithms that find the optimal behavior of the elevator. Thus, Team 28 will use these as deliverables to work towards as the various sections of the project are completed.
Since elevators are expensive, none can be purchased for this project. Additionally, the team's algorithms will be most useful for multi-elevator systems, which would require even more elevators. The simulation will instead take several important parameters: the number of elevators, the speed of the elevators, and the weight capacity of the elevators. Additionally, the amount of foot traffic will be parameter-based. This flexibility will allow for the simulation of nearly all real-world elevator systems while still retaining a reasonable budget. Additionally, Team 28 can measure elevator use within UTK buildings in order to replicate the real-world elevator systems here, such as the Min Kao building or SERF elevator systems. Since an important aim of the team's project is to target very tall, multi-elevator systems, Team 28 will attempt to model elevator systems from well-known towers elsewhere, such as the Cathedral of Learning in Pittsburgh. Finally, the group will test the team's algorithms on speculative elevator designs, which may be useful in future megastructures.
The next deliverable is the creation of a reward function, which will allow
us to run the team’s reinforcement learning algorithms. This is critical for the
team’s machine learning algorithms, since it tells the algorithm how it should
respond to the simulation state. For example, the group can choose to prioritize
how many passengers are outside an elevator door or to prioritize certain floors
over others.
The other major deliverable in this project will be the modern deep reinforcement learning algorithms. This deliverable will use the prior deliverables of the
simulation and reward functions in order to determine the optimal actions that
an elevator should take at any given point in time. For example, algorithms
such as deep Q-networks will be applied, which use neural networks for function
approximation.
Since the team’s project is mainly computational, these first deliverables
will be in the form of several source code files written in the Python language.
Finally, Team 28 will write and design a report on the results of the team’s
project. The report will highlight the team’s methods and their effectiveness on
various elevators and associated simulations. This deliverable is a critical step
in communicating the results of the team’s work to a wider audience.
9 Project Management
In order to complete the deliverables, the group will break the next semester down into the following segments, organized by time and order of completion.
Table 14: Deadlines

Month 1: Create a simulation to handle the physics of elevator movement. The simulation should model people arriving, elevator travel time, and elevator weight. The state should be given to a reinforcement learning algorithm. Since this is the backbone of the team's system, the group will dedicate the first month to its creation.

Month 1.5: Find simulation parameters to model elevator systems within UT and other potential scenarios. The group will model several test elevator systems from both real buildings and hypothetical future structures. It is important to have a solid collection of elevator systems to experiment on.

Month 3: Implement modern deep reinforcement learning techniques to interface with the simulation and reward function. The group will spend one month implementing and modifying various algorithms from the literature. The group will also collect the results from these algorithms for the different elevator systems created in the above deliverables.

Month 3.5: Generate a project report based on existing findings. This will be the final project report for the class.
Since the group consists of five computer science majors, every member is
capable of contributing similarly to the project. Carl Edwards has more ex-
perience in machine learning and is taking a graduate reinforcement learning
class, so he will help steer the direction of the reinforcement learning algorithm deliverables.
10 Budget
There is no designated budget for this project. The simulation and reinforcement learning components are software only and use only open-source tools.
11 Summary of Progress
There are two major components to the project: the simulation and the rein-
forcement learning aspect. For the simulation, it is completely functioning. The
simulation supports any number of elevators with any number of floors. Realis-
tic physics are used with options to set acceleration/deceleration, max velocity,
floor distance, etc. This is all displayed on an OpenGL visualization. Addi-
tionally on the visualization, one can see the floor button presses (up/down),
the number of people waiting, the number of people on each elevator, and the
current state of each elevator. Finally, the simulation uses a uniform poisson
distribution for spawning people on floors with their desired destination floor.
As for the reinforcement learning component, we have achieved learning on
a single elevator systems of 5 and 10 floors using negative people waiting as
a reward and the desired floor as the action. Additionally, we also learn on
20 floors systems but solutions have a tendency to collapse. This is likely due
to the way the algorithm interacts with the simulation being non-Markovian
(there is a different amount of simulation time when we travel to floors farther
away). We use [5] and [6] help determine hyperparameters. Initially, we tried
to learn without episodes. This resulted in 5 floors working and sometimes 10.
Unfortunately, 20 floors didn’t work because people showed up faster than the
initial random exploration policy could handle; this resulted in the Q-values
essentially running away as the number of people increased to infinity. 20 floors
worked when we reduced the rate people showed up to 20% of the previous value
(from 0.5 to 0.1). The takeaway is that on taller elevator systems (20 floors),
when the Poisson rate is the same as smaller buildings, the elevator has trouble
keeping up at the beginning of learning. At lower Poisson rates it is also possible
to learn better actions (and if Poisson rate is too high it won’t learn at all).
Figure 4: This shows the return per epoch on a 10 floor system. The return
approaches a value near zero, which indicates that few people are waiting during
the episodes once it is trained.
Learning works better with the negative of the number of people waiting as the reward
(we don’t use the cumulative wait time per floor states either). 5 and 10 floors
learn nicely. 20 floors begins to learn but it suffers from some type of solution
collapse. It’s interesting to note that the average action while learning was
about 0.5. This is desirable because 1 is up and 0 is down, so an average of 0.5
indicates the elevator is moving up and down.
Figure 5: This shows the running average of actions taken at each step for an
elevator system with 20 floors. This is the same experiment as Figure 19. Note
the collapse in return happens when the action only goes up or down.
Figure 6: This shows the return for each epoch for an elevator system with 20
floors.
Whenever the solution collapsed, the elevator started either only going up or only going down. Long episodes seem to work better (20,000 seconds or infinite) because people pile up, which helps the elevator learn faster (bigger differences in values). However, for infinite episodes on large buildings, people pile up before the agent can learn; it gets overwhelmed and does not learn (the Q function keeps growing as more people show up). We balance this with 20,000-second episodes so that the length advantage exists, but we can still apply what was learned by resetting.
The 20-floor solution collapse is theoretically impossible in a Markovian system in traditional reinforcement learning due to the policy improvement theorem [8], but due to the approximation function for Q and the non-Markovian elements of the simulation, it happened here in practice. Merging the RL step and simulation timestep functions would probably help fix this. Also, although 5 and 10 floors worked, those policies would still abandon people on rare occasions. We think a hybrid algorithm between RL and classical algorithms might be the best bet for a consistent, semi-explainable algorithm. It might also be useful to remove the option to go up on the top floor and down on the bottom floor. Using state-of-the-art models might also help; we don't believe that policy gradient methods would help, though this is untested. Another future path for research is to fill the elevator system up with 10-20 people and then stop letting new people show up. After this, train the elevator on this scenario until nobody is waiting, and repeat until fully trained.
The Covid-19 pandemic substantially delayed our project, but we managed
to still achieve some results.
12 References
[1] Gina Barney and Lutfi Al-Sharif. Elevator traffic handbook: theory and
practice. Routledge, 2015.
[2] Robert H Crites and Andrew G Barto. Improving elevator performance
using reinforcement learning. In Advances in neural information processing
systems, pages 1017–1023, 1996.
[3] Robert H Crites and Andrew G Barto. Elevator group control using multiple
reinforcement learning agents. Machine learning, 33(2-3):235–262, 1998.
[4] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic opti-
mization. arXiv preprint arXiv:1412.6980, 2014.
[5] Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A Ortega, Tom Everitt,
Andrew Lefrancq, Laurent Orseau, and Shane Legg. AI safety gridworlds.
arXiv preprint arXiv:1711.09883, 2017.
[6] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel
Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fid-
jeland, Georg Ostrovski, et al. Human-level control through deep reinforce-
ment learning. Nature, 518(7540):529, 2015.
[7] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Brad-
bury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein,
Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary De-
Vito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner,
Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative
style, high-performance deep learning library. In H. Wallach, H. Larochelle,
A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances
in Neural Information Processing Systems 32, pages 8024–8035. Curran As-
sociates, Inc., 2019.
13 Appendix
The business model canvas is attached on the next page. Our Gantt chart is attached on the page after. The 3rd page is a test plan matrix. The 4th through 7th pages are a preliminary patent application. The 8th page is the poster. The remaining pages are reinforcement learning results.
The Business Model Canvas
Designed for: Efficient Elevator Algorithm. Designed by: Team 28. Date: 10/16/2019. Version: 2.

Key Partners
• Partners: elevator manufacturers for tall buildings (ex. downtown hotels, offices)
• Suppliers: microcontroller suppliers (if additional hardware is required)
• Resources (from partners): none* (run in simulations until actual implementation)
• Key activities (that partners will perform): service/update existing elevators in certain buildings per our algorithm

Key Activities
• Key activities required by value propositions: creation or collection of elevator data sets, algorithm and metric development, marketing to building owners and tenants (specifically companies), installation of additional hardware for customers
• Distribution channels: partnership(s) with building owners
• Customer relationships: periodic (yearly) check-ins with customers to see if they are satisfied; analysis of actual data to see if time savings have occurred (automated)
• Revenue streams: one-time installation fee and possible additional service fee

Value Propositions
• Value to customer: smaller* wait times for elevators leading to increased happiness (*the exact metric for measuring what counts as a smaller wait time is not necessarily straightforward)
• Customer problems we solve: frustration from elevator users; the lack of elevators that were originally installed in buildings will be mitigated. These can lead to people/businesses being more likely to rent
• Products/services to each customer segment: algorithm product to building owners
• Customer needs satisfied: better convenience/usability

Customer Relationships
• Elevator owners (personalized assistance): free remote debugging; paid in-person debugging
• Elevator users (active community that provides feedback if willing): will maintain an online portal for submitting feedback on elevators (both positive and negative); a link will be posted in each elevator

Customer Segments
• Building owners: ranging from corporations to landlords (not very diverse)
• Users: anyone who uses the elevator, ranging from workers to residents to visitors (very diverse)

Key Resources
• For the value proposition: existing example hardware used by the elevators that we will be working with
• For customer relationships: for later automated analysis of actual elevator data, potentially additional hardware and an internet connection will be required for data upload
• For distribution channels: website demonstrating the effectiveness of the algorithm
• For revenue streams: subscription fee from clients for maintenance and updates

Channels
• Website with contact/purchasing information: will include a forum/portal for users to provide feedback on elevators (both positive and negative); will also include a separate portal for owners allowing them to report issues/provide feedback, download new updates (both functional and security), and request in-person/remote assistance
• To increase awareness: booths at appropriate conventions; calls/emails to various businesses, apartment complexes, etc.
• To aid installation: partnerships with appropriate electronics providers if necessary

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
[Gantt chart: timeline marked at the 1st and 15th of each month, running through April 2020.]
BACKGROUND
Most elevators run a proprietary variation of the SCAN algorithm. The SCAN
algorithm is a simple algorithm used in various applications, wherein the object
in question moves in single-directional sweeps across its range of access points
until it has satisfied all requests before it can change directions. For this specific
case, once the elevator has started its motion up or down, it will continue in
that direction until it has stopped at every floor that has had its button pressed
in that direction. Once it has no more requests at any floors further down that
trajectory, it has the freedom to go any direction to satisfy the next request for
a floor.
This algorithm has been used since the introduction of some of the earliest
mechanical elevators, and continues to be used for its effectiveness, with few
changes outside of proprietary heuristic tuning to allow the algorithm to better
suit the needs of more complex elevator setups and systems. The algorithm is
nonetheless prone to certain shortcomings in more complex scenarios, such as when it must choose between a direction where passengers are nearer and a direction where there is a higher probability of many passengers, and choosing the wrong direction could waste time.
By incorporating a well-tuned reinforcement learning element into the decision-making of an elevator system, the system is able to make more educated decisions based on a greater variety of factors, thus improving overall performance by decreasing the average time lost by people waiting on elevators.
BRIEF SUMMARY OF THE INVENTION
The presented invention comprises a novel algorithm for scheduling elevators in various structures. The algorithm is powered by reinforcement learning, specifically deep Q-learning, and a software simulation of the building used to find optimal parameters. The invention is designed to work with both simple and complex structures with varying numbers of elevator shafts and elevators per shaft. While the optimal schedule is not known in advance, the simulation gives the reinforcement learning model access to all of its parameters and, with some training time, it can find a suitably optimal elevator control.
deployed model). The proposed invention uses a 2-layer Multi-Layer Perceptron network with loss calculated via mean squared error.
The proposed loss function is written as
$$L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i)\right)^2\right]$$
In this equation, s is the current state, a is the action taken, r is the reward for taking that action, s' is the next state, and a' is the action taken from the next state. γ is the discounting factor. The Adam optimizer will be used to minimize this loss function. Q is the action-value function that will be learned (the Multi-Layer Perceptron). U(D) is a uniform distribution over the transitions in the replay buffer D. θ are the model's parameters.
To effectively measure loss, the model needs to take into account an appropriate objective. This is done via the reward function, which is modular and customizable in this invention. It can optimize waiting time, travel time, or any other desired objective (assuming appropriate mathematical properties). Given this modularity, the invention is also claimed for finding poor, or sub-optimal, schedules, since the objective can be set to the longest average wait, for instance.
While this loss is standard in deep Q-learning, its use in elevator scheduling
paired with a simulation is unique to this invention.
In implementation, this invention uses a proprietary code base developed by
the authors in the open source, freely available Python language. The learning
algorithm uses open source deep learning frameworks, which are freely available
under the Modified BSD license.
ABSTRACT
The presented invention describes a novel approach to combining advanced reinforcement learning techniques with fully customizable, modular simulations in order to discover optimal elevator scheduling for a desired building configuration. After the simulation is configured with the appropriate building setup, elevator parameters, and people frequencies, and provided with the desired optimization objective, it will find an optimal schedule, which will be better than current SCAN-style approaches to scheduling. Also included with the invention is custom visualization software for viewing elevator behavior before actually integrating with a real system. Once an optimal network is found, the product could be deployed to control a system in a building, given a main, central control system.
Figure 1: Workflow Overview
Efficient Elevator Algorithm
Owen Barbour, Austin Day, Carl Edwards, Daniel Nichols, Sean Toll
Abstract
Our goal was to improve elevator efficiency by running a custom reinforcement learning algorithm on an elevator simulation of our own creation. We had success learning elevator systems with a lower number of floors and a lower Poisson rate.

Overview
Elevators are not perfect, but they can be improved. There is currently a gap in the literature surrounding elevator optimization using modern reinforcement learning techniques. We attempted to fill in this gap. Our end goal was to create an elevator algorithm that could potentially handle future super-structures.

Solution

Figure 1 - Elevator Simulation

Results
• Algorithm ran successfully with one elevator on 5 and 10 floors (Figures 2 and 3 show 10 floors)
• With 20 floors, the algorithm collapsed during training due to the non-Markovian nature of the simulation

Conclusion/Future Work
We successfully simulated an elevator. With limited success, we optimized the simulation using a reinforcement learning algorithm. Regarding future development on this project idea, we could:
• Make the simulation more Markovian
• Make a hybrid reinforcement/classical algorithm to guarantee consistency
Figure 8: This shows the loss for each step for an elevator system with 5 floors. Note that all these figures show running average values as well.
Figure 9: This shows the loss for each epoch for an elevator system with 5 floors.
Figure 10: This shows the reward for each step for an elevator system with 5 floors.
Figure 11: This shows the action for each step for an elevator system with 5 floors.
Figure 12: This shows the loss for each step for an elevator system with 10 floors.
Figure 13: This shows the loss for each epoch for an elevator system with 10 floors.
Figure 14: This shows the reward for each step for an elevator system with 10 floors.
Figure 15: This shows the action for each step for an elevator system with 10 floors.
Figure 16: This shows the loss for each step for an elevator system with 20 floors.
Figure 17: This shows the loss for each epoch for an elevator system with 20 floors.
Figure 18: This shows the reward for each step for an elevator system with 20 floors.
Figure 19: This shows the action for each step for an elevator system with 20 floors.