Bus Optimization

Using over 100 million GPS data points and 1 million automated passenger count (APC) data points from Indiana University's campus bus service, the authors developed a system to analyze and predict future public transportation demand. They created a new model that maximizes passenger satisfaction on a college campus. By optimizing routes using real-time GPS and APC data, they achieved a 7% reduction in mileage and 2.7% reduction in fuel usage while increasing the number of trips and reducing wait times. This data-driven approach to schedule optimization could materially improve outcomes for public transportation systems.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/330877942

Using Data Analytics to Optimize Public Transportation on a College Campus

Conference Paper · October 2018


DOI: 10.1109/DSAA.2018.00059

6 authors, including: Hasan Kurban, Mark Jenne, and Mehmet Dalkilic (Indiana University Bloomington)

All content following this page was uploaded by Hasan Kurban on 20 July 2019.

Using Data Analytics to Optimize Public
Transportation on a College Campus
Kurt Zimmer, Hasan Kurban, Mark Jenne, Logan Keating, Perry Maull & Mehmet M. Dalkilic

Abstract: Using a large volume of bus data in the form of GPS coordinates (over 100 million data points) and automated passenger count data (over 1 million data points), we have developed (1) a system for the analysis and prediction of future public transportation demand and (2) a new model that uses concepts specific to college campuses to maximize passenger satisfaction. Using these concepts we improve the service of a model college public transportation service, specifically the Indiana University Campus Bus Service (IUCBS).

Index Terms: public transportation, bus, GPS, APC, big data
I. INTRODUCTION

Public transport (PT) is the movement of groups of people over established routes (networks of public and private infrastructure) using public conveyances (e.g., cart, stagecoach, ferry, or bus). Timetables are listings of triples (location, conveyance, time) that are posted along the routes and indicate when and where a conveyance will stop for entry and exit of passengers. The timetables directly affect public participation and that, itself, affects the timetables. They are the cornerstone of PT. In the U.S., most PT systems are municipally owned and, therefore, have additional state and federal constraints, e.g., labor laws, that often have nothing to do per se with PT.

The movement to urban areas, explosive population growth, the increasing density of urban areas, and increased traffic, among other factors, are placing enormous burdens on existing PT systems that have traditionally not used a data science approach to improving PT. Optimization of PT is along three dimensions: cost, execution, and rider perception. Cost includes fuel, labor, and repairs. Execution is the actual process of moving people. To improve execution time, for example, we might minimize wait time (to enter the conveyance), minimize travel time between locations, or maximize so-called rubber-time, which is either the absolute time the conveyance moves over a route or the ratio of movement to movement and stopping time. Rider perception is the goodwill or satisfaction passengers hold toward the PT that changes the likelihood of participation. Fig. 1 illustrates some of the complexities of this problem with regard to the search space of an optimal solution. Increasing goodwill increases, for example, waiting time. Decreasing cost generally increases goodwill. The interaction itself, however, is simple with three principal elements: passenger, conveyance, and schedule, where the schedule is the crux of PT.

Fig. 1: Optimization of PT over three dimensions: cost, execution, and goodwill. For cost, the aim is to always minimize and for goodwill to maximize. For execution, there exist many competing goals that either minimize (e.g., wait time) or maximize (rubber-time). The three dimensions are not independent, making the discovery of an optimal solution difficult, while the system itself has only three principal components: passenger, conveyance, and schedule.

Solutions to improve PT have focused largely on one dimension of the problem and usually on treating routes as graphs with nodes as locations that conveyances use as stops or pass through and edges as distances, thus establishing optimal routes. This approach involves creating a subgraph over the total infrastructure, resulting in a smaller search space, but also often reflecting traffic laws. Schedule optimization is the production of schedules for passengers that optimize execution, most often throughput. We observe that being agnostic toward data, i.e., the actual location of the conveyance as well as the number of people conveyed, makes finding a good solution nearly impossible, let alone an optimal one. We observe too that there is a remarkably good fit for both data science and big data, e.g., volume of people, velocity of real-time GPS data (also known as Automated Vehicle Location, AVL), variety of not only constraints on effective scheduling, but also passengers, roads, drivers, value to the municipality, issues affecting load or rates (for example, weather or traffic conditions), etc.

Our contribution is using both conveyance location ∆GPS at 2 second intervals as well as ∆APC (Automated Passenger Counts) to decrease wait times, yielding a decrease in mileage (and fuel costs) and improved goodwill (measured by increased participation). Our results were dramatic: we obtained a 7.0% reduction in total mileage that translated into a 2.7% reduction of fuel. Additionally, we increased trips, decreasing wait times. To our best knowledge, we are the first group to not only approach schedule optimization using this kind of data, but to demonstrate materially improved outcomes from schedules driven by data analytics. Throughout this paper we have tried to minimize the parlance of PT, but we occasionally use PT terms, explaining them as they are introduced. The remaining
discussion is as follows: background, methodology, results, related work, summary and future work.

II. BACKGROUND

A. IU Campus Bus Service (IUCBS)

Indiana University is located in the city of Bloomington, Indiana, USA. Bloomington currently has a population of around 85K, making it the 7th largest city in Indiana. Indiana University itself has an enrollment of around 50K students with an additional 5K faculty and staff. To model our system for testing we use data (∆) from IUCBS, which is the PT management arm of Indiana University. ∆GPS_i is initially a 1-2 second snapshot of a bus's position, shown in Fig. 2a. ∆APC_i records passenger events at every bus stop on every route, shown in Fig. 2c. IUCBS has 25 total buses, with 19 buses operating at peak times running four routes on any given weekday, with the A route accounting for approximately 45% of total ridership. Total system ridership is approximately 300K passengers per month. Additionally, there is a municipality owned and managed bus service that slightly overlaps with IUCBS. There is currently no coordination between the two and, therefore, we do not include it in our work.

Fig. 2: Example of GPS data (a), original APC data (b), and cleaned APC data (c). We took the data generated from IUCBS and represented it in a relational model with supporting tables. Before data reduction there were ~96,000,000 GPS data points from 2016-04-08 to 2018-04-11 and ~7,000,000 APC data points from 2017-08-21 to 2018-04-01; afterwards there are 16,877,381 GPS data points and 640,597 APC data points.

Fig. 3: Map of IUCBS routes with a partial satellite image of the center. Each route begins and ends at a single point north of campus. They are composed of a series of stops represented by the black and yellow dots. Some stops are located in busier areas of campus (see Fig. 12) and the travel through that area can affect the overall time of the bus at certain times of day. The A route, for example, is approximately 7.48 km and circumscribes the campus of 7.82 km², and the terminus is Memorial Stadium. The yellow arrows point to the same location. The satellite image shows the number of different paths and modes of transportation that students can choose.

The goal for IUCBS is to increase the quality of service available to passengers while minimizing costs. Measuring quality directly is prohibitively difficult and falls outside the current scope of this research. Some remarks, however, should be made. To move between classes, students have many choices: walking, biking, driving, public transportation, taxi service, or some combination of these. Furthermore, there are numerous paths that students can choose as well. Without consideration of campus registration fees (e.g., bike registration is $20), the choices are relatively indistinguishable with respect to cost: the city transportation fee is $.50 per day, translating into about $40-$50 per semester, while the others are free. Biking is appealing, since both the campus and city have dedicated bike paths (https://tinyurl.com/ybu78rrt). Alternatively, the campus provides Zipcars (www.zipcar.com), and vehicle registration comes with access to student parking.

Fig. 4: Percentage of transportation options used by IU students on campus [28].

We point to two approaches to assessing goodwill: microeconomics via utility [26] or directly as a function of some distinguished factors [21] [31] [24]. For utility (Fig. 5 (Left)), we can build indifference curves constructed over two axes: choosing the campus bus and alternative transportation. The utility (combination of choices) is determined by, in this case, time: the student desires to maximize utility using the equi-marginal principle [26]. Determining this family of curves in this scenario, however, is overly complex for our needs, and the benefit is unlikely to exceed the cost of building the curves.

A simpler, more straightforward approach is directly surveying riders. In [18] (Fig. 5 (Right)), the authors, for example, determine goodwill (customer satisfaction) from service, access, availability, time, and environment, which are commonly regarded determinants for goodwill. In our setting, surveying is not feasible. We determine goodwill solely by a function of time. We provide two curves that model goodwill as a real-valued scalar function on [0, 1], where 0 means the passenger will not enter and 1 means the passenger always enters. The curve is effectively a function of the difference between the posted schedule time (expected) and the observed bus arrival time. We currently presume a flat curve (5A) that drops to zero when outside ±[4, 5] minutes, but likely it is somewhat Gaussian in shape (5B). Our intent is to model goodwill as a probability in the future, but the complexity of the optimization led us to believe this simplification would not affect the initial outcome, which is consistent with our results.

Fig. 5: Measuring goodwill. (Left) Illustrative indifference curve. The arrow points to maximum utility. (Right) Goodwill is measured implicitly through the difference between the posted time and actual arrival time. Currently, we measure goodwill as a number in [0, 1] that is a function of this difference (A). Likely, the actual goodwill is something akin to (B); this will be part of our future work, in addition to studying the degree to which the quality of this model affects the optimization.

TABLE I: Challenges faced by IUCBS and our solutions. Similar challenges exist for other transportation problems, as well as within different domains. The challenges presented will eventually inform the choice of optimization.

Challenge | Solution
Wide variation in traffic density and passenger load throughout the day | Use ∆ from past to adjust service throughout the day through optimization techniques
Timetable does not match actual times of stops | ∆GPS is used to fill in timetable
Schedule does not line up with passengers' needs (e.g., stop is not close to beginning or end of classes) | Use ∆APC to determine when students arrive at stops
IUCBS operates with a fixed cost (fixed fleet size, etc.), while most frameworks include cost in their optimization | Reduce optimization problem by eliminating terms that IUCBS has no control over
Passenger demand causing platooning and overflowed buses | Implement Ghost Bus service to alleviate demand during troublesome periods. The Ghost Bus is a bus not assigned to a route that alleviates issues by being assigned dynamically to routes that are underserved during times of high demand

Additionally, the University has mandated that operations across the state-wide institution address sustainability; for IUCBS this means reducing the PT carbon footprint as well. Before our solution, IUCBS used a schedule developed heuristically that did not take into account the wealth of data available to them. We will propose a method that takes into account these factors and optimizes the bus service on a college campus using several data driven algorithms.
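The time-only goodwill measure described above (curves A and B of Fig. 5) can be illustrated with a minimal sketch. The function names, the 5-minute tolerance (the upper end of the stated ±[4, 5] minute window), and the Gaussian width are assumptions made for illustration; they are not values taken from the IUCBS system.

```python
import math


def goodwill_flat(diff_min: float, tolerance_min: float = 5.0) -> float:
    """Curve (A): goodwill is 1 inside the tolerance window and 0 outside.

    diff_min is observed arrival minus posted time, in minutes.
    tolerance_min = 5.0 is an assumed value within the +/-[4, 5] range.
    """
    return 1.0 if abs(diff_min) <= tolerance_min else 0.0


def goodwill_gaussian(diff_min: float, sigma_min: float = 2.5) -> float:
    """Curve (B): bell-shaped goodwill peaking at 1 for an on-time bus.

    sigma_min is a hypothetical width, chosen only for illustration.
    """
    return math.exp(-0.5 * (diff_min / sigma_min) ** 2)


def mean_goodwill(posted_min, observed_min, curve=goodwill_flat) -> float:
    """Average goodwill over paired posted/observed stop times (minutes)."""
    diffs = [obs - post for post, obs in zip(posted_min, observed_min)]
    return sum(curve(d) for d in diffs) / len(diffs)


# Hypothetical stop times, in minutes after the start of service.
posted = [0, 6, 12, 18]
observed = [1, 6, 19, 18.5]
print(mean_goodwill(posted, observed))                     # flat curve (A)
print(mean_goodwill(posted, observed, goodwill_gaussian))  # Gaussian curve (B)
```

Under the flat curve only the bus that is 7 minutes late contributes zero goodwill, whereas the Gaussian curve also penalizes the small deviations, which is the qualitative difference between (A) and (B).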
B. Previous Work

IUCBS, like any other college campus bus service [34] [27], faces several unique challenges to schedule creation. Campus transportation services can vary wildly, but all are focused primarily on serving the student population [11]. The first challenge for non-urban college campuses is the existence of several periods of high traffic throughout the day, in contrast to the usual 2-3 so-called rush hour periods that a traditional transit system experiences. Indiana University operates on a fixed class schedule starting at 8:00 a.m. and proceeding in one hour and five minute increments until approximately 2:30 p.m.; consequently, the service experiences peak demand and traffic several times throughout a given service day, which can be addressed by algorithms that take into account past data.

Traditional heuristic scheduling practices, with fixed departure intervals and fixed intervals between stops, do not adequately address the challenges faced by IUCBS. The result is a schedule that is sub-optimal for both consumers and bus operators. The consumer is unable to trust the schedule, which reduces passenger satisfaction, since the bus is liable to be exceptionally late during times of peak demand or unacceptably early during down times. Passengers might also be unable to determine whether their regularly scheduled bus will be overfilled. Furthermore, the typical scheduling process does not incorporate local events such as the end of classes, instead relying on the set schedule hopefully lining up with passenger demand without actual data to corroborate. The bus operators are also unable to count on regular breaks, which conflicts with their collective bargaining agreement specifying that a minimum percentage of a scheduled trip be recovery time.
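To make the limitation of the traditional practice concrete, the following minimal sketch builds a rigid timetable with a fixed departure interval and fixed intervals between stops, then measures how far a trip's observed times drift from it. The function names, the stop offsets, and the observed times are hypothetical values chosen for illustration, not IUCBS data.

```python
from datetime import datetime, timedelta


def fixed_headway_timetable(first_departure, last_departure,
                            headway_min, stop_offsets_min):
    """Build a rigid timetable: one trip every `headway_min` minutes, with
    the same fixed offsets (minutes after departure) to each stop."""
    trips = []
    t = first_departure
    while t <= last_departure:
        trips.append([t + timedelta(minutes=off) for off in stop_offsets_min])
        t += timedelta(minutes=headway_min)
    return trips


def deviations_min(posted_trip, observed_trip):
    """Observed minus posted time at each stop, in minutes (positive = late)."""
    return [(obs - post).total_seconds() / 60.0
            for post, obs in zip(posted_trip, observed_trip)]


day = datetime(2018, 1, 8)  # arbitrary example date
trips = fixed_headway_timetable(day.replace(hour=9, minute=50),
                                day.replace(hour=10, minute=14),
                                headway_min=6,
                                stop_offsets_min=[0, 4, 9])  # assumed offsets

# Hypothetical observed times for the first trip during a class change.
observed_first_trip = [day.replace(hour=9, minute=50),
                       day.replace(hour=9, minute=57),
                       day.replace(hour=10, minute=5)]
print(deviations_min(trips[0], observed_first_trip))  # [0.0, 3.0, 6.0]
```

Because the offsets never change, any congestion during a peak period accumulates as lateness along the trip, which is the mismatch between posted and actual times that the data-driven approach targets.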
Before the advent of large data sources of GPS or APC data, the transit scheduling problem was solved heuristically with a rigid schedule that could not adapt to shifting passenger demand. Data driven methods typically use load profile data, from either manual counts or ∆APC (automated passenger count data), and ∆GPS (GPS data) from the buses [15]. Previous work has even explored dynamic scheduling of trips from ∆GPS [13]. Data generated from bus services can show that historical methods of generating schedules, without incorporating passenger data, are insufficient in modelling actual transit times [16]. These traditional methods do not adequately address challenges experienced by IUCBS, and likely by other similar urban areas. We found no other methods that incorporated both APC and GPS data as part of their model from beginning to end for scheduling.

III. METHODOLOGY

In Fig. 6 we provide an overview of the solution. As a bus moves along a route (6A) its transponder emits a signal that, when next to a stationary drum (stationed along routes), triggers a save (locally on the bus computer). A set of dual lasers placed at the entry and exit triggers GPS-time data that is saved as well. Events are then assigned and grouped by stop (not a physical stop, but a location that has two different times with one GPS). A running total of passengers on a bus is computed at each stop and cleared at the completion of a route. This yields both ∆GPS_i and ∆APC_i, where i indicates the year and i + 1 the following year. At the end of the route (6B) the bus uploads the raw data via Wi-Fi. The data is cleaned, transformed, and reduced (6C) both for noise and redundancy using empirically derived rules. For example, data from a stoplight or stop sign is generally erroneous. Another example is GPS coordinates that, because of noise, give spurious locations (outside the route). Lastly, the lasers sometimes signal passenger movement when no passengers are either entering or exiting. The application for Leave Time Optimization (LTO) and Time Tables (TT) works in concert with metrics on the fleet (6D), e.g., platooning, which captures the Manhattan distance between buses. The schedule for the year is produced (6E). The following year's data (same process up to (6D)) is used to assess the data-driven schedule. We also include the bus drivers and supervisors in the process and communicate the results to them at each step. They can then suggest changes, based on their own personal heuristics, that can improve the scheduling.

Fig. 6: Data Flow and Process. A bus collects GPS and passenger data along a route (orange arrows indicate movement) that is stored on the bus (A). At the end of the route, the data is uploaded (B), cleaned, transformed, and reduced (C), driving algorithms (E) for schedule production. The subsequent year's data is used to assess the schedule (F).

When modeling this kind of transportation, we face an optimization problem whose dimensions are at odds with each other. On the one hand, for example, we want to reduce travel time, the number of buses, and the fuel used. On the other we want to increase passenger count and goodwill. The most common approach is to ultimately focus on reducing time, bounding other dimensions by physical or optimally required values, e.g., number of passengers, buses, or travel time. One of the many formulations of the optimization problem for designing both routes and timetables can be described as [3]:

\[
\min \left\{ c_1 \left[ \sum_{j=1}^{n} \sum_{i=1}^{n} d_{ij}\, t_{ij} \right] + c_2 \left[ \sum_{\text{all } k \in SR} f_k T_k \right] \right\} \tag{1}
\]

Subject to:

Frequency feasibility:
\[
f_k \ge f_{min} \quad \forall k \in SR \tag{2}
\]

Load factor constraint:
\[
LF_k = \frac{(Q_k)_{max}}{f_k \, CAP} \le LF_{max} \quad \forall k \in SR \tag{3}
\]

Fleet size constraint:
\[
\sum_{\forall k \in SR} N_k = \sum_{\forall k \in SR} \left[ f_k T_k \right] \le W \quad \forall k \in SR \tag{4}
\]

Where these are the variables of interest:
1) $d_{ij}$ = Passenger demand (number of passengers) from i to j
2) $t_{ij}$ = Total travel time from i to j, equal to $t_{Wt,ij} + t_{invtt,ij}$
3) $f_k$ = Frequency of buses on route k
4) $T_k$ = Total route time from beginning to end
5) $LF_k$ = Passenger load (load factor)
6) $t_{Wt,ij}$ = Waiting time between i and j
7) $t_{invtt,ij}$ = In-vehicle travel time between i and j
8) $(Q_k)_{max}$ = Maximum flow on route k
9) $CAP$ = Seating capacity
10) $W$ = Fleet size
11) $N_k$ = Number of buses on k
12) $SR$ = Set of total routes

Where i and j are nodes on the transportation network of route k, and $c_1$, $c_2$ are relative weights that describe the relative importance of both sides. Other formulations exist [9], but they are all very similar. This problem is a non-linear, mixed integer programming problem, which makes it NP-hard, so a direct (exact) solution is impractical for a realistic transit system [15]. This is even before we introduce the fact that, in the real world, the transit network is considered a stochastic process with multiple possible outcomes of each trip on the route. These and other factors indicate the overall difficulty of the problem [8].

Fig. 7: A representative model of routes for IUCBS. Each route has a spur, where dorms and external campus parking are located, off a central rotation around campus where classes and offices are located. This is to provide an easily understood underlying mechanism of public transportation for IUCBS that necessitates a specific model.

TABLE II: An example of constant headway optimization versus times supported by actual data. Headway is the distance between buses on a route. Constant headway is assumed to be 6 minutes. GPS data leave times more accurately represent the actual travel times of the buses on route since they use historical data [16] [1].

Constant Headway Leave Times | GPS Data Leave Times
9:50am | 9:50am
9:56am | 9:57am
10:02am | 10:03am
10:08am | 10:10am
10:14am | 10:15am

IUCBS' primary concern does not lie with the design of routes, since they feel that the current system has performed adequately and there is a general expectation that routes will be around and through the institution. Their interest is, instead, in the automatic scheduling of both schedules (to provide a more efficient service) and employees (to save on man hours). These constraints are within control and can be changed or even fixed, allowing simplification of the optimization problem.
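As an illustration of formulation (1)-(4), the sketch below evaluates the objective and checks the three constraints for one candidate set of route frequencies. It is only a way to read the formulas, not a solver; the function names and all numeric values are hypothetical, and frequencies are assumed to be in buses per hour with route times in hours, so that $f_k T_k$ counts buses.

```python
def objective(demand, travel_time, freq, route_time, c1=1.0, c2=1.0):
    """Eq. (1): c1 * sum_ij d_ij * t_ij + c2 * sum_k f_k * T_k."""
    n = len(demand)
    user_cost = sum(demand[i][j] * travel_time[i][j]
                    for i in range(n) for j in range(n))
    operator_cost = sum(freq[k] * route_time[k] for k in freq)
    return c1 * user_cost + c2 * operator_cost


def feasible(freq, route_time, max_flow, cap, f_min, lf_max, fleet_size):
    """Check constraints (2)-(4) for every route k in the route set."""
    for k in freq:
        if freq[k] < f_min:                          # (2) frequency feasibility
            return False
        if max_flow[k] / (freq[k] * cap) > lf_max:   # (3) load factor
            return False
    buses_needed = sum(freq[k] * route_time[k] for k in freq)
    return buses_needed <= fleet_size                # (4) fleet size


# Hypothetical two-stop network served by a single route "A".
demand = [[0, 30], [20, 0]]           # d_ij, passengers
travel_time = [[0, 0.25], [0.2, 0]]   # t_ij, hours
freq = {"A": 6}                       # f_k, buses per hour
route_time = {"A": 0.5}               # T_k, hours

print(objective(demand, travel_time, freq, route_time))
print(feasible(freq, route_time, max_flow={"A": 150}, cap=40,
               f_min=2, lf_max=1.0, fleet_size=4))
```

Fixing the fleet-related quantities turns the operator term and constraints (2)-(4) into constants, which is why the remaining effort can concentrate on the passenger term.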
Seating capacity, fleet size, number of buses, frequency of buses, the set of routes, and the maximum load cannot be changed, since those involve higher level budgetary decisions. The combination of limiting route design and not having transfer concerns prevents IUCBS from applying techniques such as those from [36]. Therefore the limited values that are studied are:

1) $d_{ij}$ = Passenger demand (number of passengers) from i to j
2) $t_{ij}$ = Total travel time from i to j (from passenger arrival at stop to alighting)

Minimizing the entire function for IUCBS (and any bus service with a fixed fleet, route and budget) is therefore reduced to minimizing these two factors, allowing us to not only disregard traditional optimization, but also directly reach a solution that directly affects the campus, e.g., fuel costs and passenger participation. To accomplish this we created two different synergistic algorithms: one which involves setting up the leave times from the beginning of each route, and one that fills in the timetable for each trip. Both of these are agnostic toward factors such as fleet size, route design and man hours. These algorithms solve the optimization problem by producing a schedule for the entire semester, as a fixed master schedule for students to rely on during the year is required by IUCBS. Previous optimization methods fail for multiple reasons on IUCBS' data, and we suspect on urban settings similar to ours. A small representative example of IUCBS routes, shown in Fig. 7, illustrates how our method succeeds where typical optimization methods fail. Stops are indicated by nodes s1, s2, ..., s7 with edges. An undirected edge indicates a bidirectional edge. The two icons, people and academic hat, represent student housing and classrooms, respectively. We observe that the vast majority of student traffic is from [s1, s2, s3, s4] → [s5, s6, s7], [s1, s2, s3, s4] → [s1, s2, s3, s4] or [s5, s6, s7] → [s1, s2, s3, s4]. Student traffic mostly consists of travel from housing (located around [s5, s6, s7]) to classes (located around [s1, s2, s3, s4]), classes to classes, or classes back to housing. This kind of movement eliminates the presence of transfer calculations, disallowing formulations like [22]. We remark that transfer considerations are the principal approach to optimizing scheduling [19]. Additionally, the asynchronous bus movement prevents congestion, viz., different buses along different routes will produce and worsen congestion when stopping at a shared route. Alternatively, platooning is synchronicity of different buses along the same route to reduce congestion as well as ensure goodwill. Secondly, IUCBS' system is a very small network that has a fixed route system, eliminating both the need for and the improvement from optimization methods that include route design. Lastly, and most importantly, the majority of models
do not take in ∆GPS or ∆APC data, which hinders the effectiveness of an optimization when assessed against the data [16] [1]. Once transfer considerations are removed, and the actual data is ignored, the optimal scheduling is one with the smallest fixed headway given the constraints (2)-(4) (shown by (5) and (6)). However, applying data to the problem allows us to update the headway and make other considerations that will improve the resulting schedule. The difference between the two methods is shown in TABLE II.

1) Original Solution: The first attempt we made to leverage ∆GPS and ∆APC was to adapt the schedule within a month for new academic semesters (August). We developed a heuristic that allowed us to have an acceptable solution in a very short time frame, which was necessary given the real-world constraints of IUCBS. We took the schedule that IUCBS used in a previous Fall semester and adapted it by generating data-driven heuristics such as in Fig. 10. Using these generated heuristics, we then adjusted the schedule to match what was actually happening daily. This was sufficient for a modest improvement (see Fig. 8). There were still better solutions available, which we explored for future semesters by building upon the transit optimization literature adapted for a campus environment.

2) Leave times on routes: We then developed a leave time algorithm (LTO) to maximize the efficiency of the route and best match the schedule to the real world data. We first fixed the number of buses and the work schedule of the drivers based on the heuristics from IUCBS. After fixing these variables, the optimization problem is reduced to one that minimizes the length of time between 2 buses arriving at a particular stop, assuming a uniform random distribution of passengers. First, the total travel time is defined as in-vehicle travel time plus waiting time:

\[
\min \left\{ \sum_{j=1}^{n} \sum_{i=1}^{n} d_{ij}\, t_{ij} \right\} = \min \left\{ \sum_{j=1}^{n} \sum_{i=1}^{n} d_{ij} \left( t_{invtt,ij} + t_{Wt,ij} \right) \right\} \tag{5}
\]

If we assume invariant travel times from i to j, as IUCBS routes and traffic are fixed and unable to be optimized, we can eliminate the term $t_{invtt,ij}$. Next we assume, for the first naive method, that passenger arrival is uniformly distributed ($t_{Wt,ij}$ being the time difference between passenger arrival $t_{p,ij}$ and closest bus arrival $t_{b,ij}$) and that the cost of waiting function is identical and convex among all passengers, the same assumption used in [12].

\[
\min \left\{ \sum_{j=1}^{n} \sum_{i=1}^{n} d_{ij}\, t_{Wt,ij} \right\} = \min \left\{ \sum_{j=1}^{n} \sum_{i=1}^{n} d_{ij} \left( t_{p,ij} - t_{b,ij} \right) \right\} \tag{6}
\]

Then $t_{ij}$ itself is minimized when the number of buses that travel from i to j is maximal, as that reduces the sums of all $(t_{p,ij} - t_{b,ij})$ with uniform $t_{p,ij}$. This maximization can be performed by utilizing GPS data to determine the times that each bus should leave.

To calculate the leave times using this algorithm, we calculate the optimal headway (the smallest distance between buses given GPS constraints, and therefore the maximal number of buses travelling on the route) based on the amount of time the buses have historically taken around their routes plus additional break time. This headway value is then updated at each bus departure interval using the same method.

Algorithm 1 Leave times on Routes - Naive Method
gpsLeeway ← 5 min
breakTime ← 10 min
for r in Routes do
    for d in Days do
        t0 ← Begin(r, d) {First trip on the route}
        t1 ← End(r, d) {Last trip on the route}
        LeaveTimes[r][d] ← [] {Initialize time list}
        t ← t0
        while t between t0 and t1 do
            LeaveTimes[r][d].append(t) {Add current time to leave times}
            tripTime ← AVG(GPS(t − gpsLeeway, t + gpsLeeway, r, d)) {Get next leave time based off of GPS data}
            t ← t + tripTime + breakTime {Increment to next leave time}
        end while
    end for
end for
return LeaveTimes

Where Begin(r, d) returns the beginning of the schedule for route r on day d, End(r, d) returns the end of the schedule for route r on day d, and GPS(t1, t2, r, d) returns the set of all round trip times between t1 and t2 on r on days d. The complexity of this algorithm is dependent solely on the number of GPS data points used in the calculation. If gpsLeeway ∗ 2 is less than the difference between leave times, then the averaging windows do not overlap and all of the calculations of AVG together cannot exceed the size of ∆GPS. Therefore this algorithm is on the order of O(|∆GPS|).

Lastly, the strongest method includes APC data, since we can now implement it using $d_{ij}$ as part of the optimization. $d_{ij}$ varies widely throughout the day for most areas, especially for a college campus, so as part of our optimization problem, we vary $d$ and $p$ by time as well as by stops. Our improved algorithm takes this into account by calculating

\[
\min \left( \sum^{p} \sum_{j=1}^{n} \sum_{i=1}^{n} d_{ij,p}\, t_{ij,p} \right)
\]

for each route at each leave time p, subject to the constraints below. We first precalculate $d_{ij,p}$ at time p, using the APC data to determine the number of riders that get on bus x between $x_i$ and $y_i$, where $b_i$ is the time bus b arrives at stop i, and bus y is the bus that was last at stop i. We require a distribution over these times; however, we only have access to the number of students that get on the bus at time $x_i$. There is no direct way to determine when exactly between $x_i$ and $y_i$ the rider arrives. A Poisson distribution would model the number of passengers given the rate between times $x_i$ and $y_i$, but in order to determine the rate from ∆APC, we are forced to assume that passengers arrive at the stop in a stochastic, uniform manner between
bus stop events. This could be slightly imprecise if, for example, most riders arrive shortly before the bus does because they watch the live bus tracker or consult the schedule. We claim, however, that the time the passengers actually arrive and the time that is most convenient for them are different, which removes the issue when the times at which people want to leave are uniformly distributed (see the Future Work section below for more details on improved estimation).

We can then run an optimization algorithm on the minimum to generate the leave times that minimize formula 1, subject additionally to:
1) $t_{ij,p} = \mathrm{GPS}(i, j, p) + t_{Wt,i,j}$
2) $T_{k,p} = \mathrm{GPS}(i, j, p) + breakTime$ at time $p$

3) Intermediate Timetables: After determining the headway value at each time point, we fill in the remainder of the schedule based on the historical GPS data. The times were generated from the distribution of those GPS points by minimizing how long a passenger would wait if she had arrived at that stop at the posted time.

Algorithm 2 Determination of intermediate stop times
gpsLeeway ← 5 min
for r, d, t in Routes, Days, LeaveTimes[r][d] do
    TimeTables[r][d][t] ← [t] {Loop over all the leave times, add the first to the timetable}
    for i in range(1, Stops(r)) do
        betweenStops ← AVG(GPS_STOPS(t − gpsLeeway, t + gpsLeeway, i, i − 1, r, d)) {Increment to next stop based on GPS data within gpsLeeway}
        TimeTables[r][d][t][i] ← TimeTables[r][d][t][i − 1] + betweenStops {Add stop time to the timetable}
    end for
end for
return TimeTables

Here Stops(r) gets the major stops on route r, and GPS_STOPS(t1, t2, i, i − 1, r, d) returns the set of all round trip times between t1 and t2 on r on days d between stops i and i − 1. As with the naive leave times algorithm, if 2 · gpsLeeway is less than the difference between consecutive leave times, then the averaging windows do not overlap, so all of the AVG calculations together cannot exceed the size of ∆GPS. Therefore this algorithm is on the order of O(∆GPS).

IV. DATA

A. Summary

GPS and Automated Passenger Count (APC) data total approximately 112 million data points over the past 2 years. To work with this large volume of data, we preprocessed it for faster access by our algorithms. After this was completed, we stored the data in a relational database with other supporting tables, including the IUCBS routes, stops, previous timetables, registrar data, and a model of the campus' transportation network.

B. Preprocessing

The GPS snapshot at each point in time was preprocessed to save computation time by (interestingly) enrichment: each snapshot was annotated with the current stop location and its alightings (passengers exiting the bus), boardings, and cumulative load at that position. To accomplish the data addition, we used a data set of passenger boardings and alightings from the buses at each stop. We adjusted for situations where the APC data was erroneous because of incorrect passenger readings. A filter was applied to remove both situations where the passenger load exceeded the maximum possible load or was below the minimum load, and occurrences where the number of passengers getting on or off triggered an outlier detection.

1) Data Reduction and Storage: We wanted to reduce ∆ to improve running time by selecting the data needed beforehand. First we applied a data reduction routine that allowed us to determine the exact bus load, boardings, and alightings at each GPS point identified above, instead of having multiple singular passenger entry and exit events. Additionally, we reset the passenger count on every trip, since every IUCBS route vacates the bus after the completion of each route; this keeps counting errors from building up over time. The routine kept a running total of the current passenger load, taking into account the corrected APC data from preprocessing. We were able to reduce the number of APC data points from 1,785,888 passenger entrances and exits to 641,009 stop events. This cut the input to our algorithms to roughly N/3, where N is the overall size of the APC data, with a corresponding reduction in runtime. ∆GPS was also reduced by eliminating all points that were not associated with a specific stop on the bus' current route. This allowed us to reduce the data size from around 100 million data points to around 20 million. The data is then stored as seen in Figs. 2a and 2c.

V. RESULTS

Using the data and analyses, we were able to improve the performance of the bus system while decreasing passenger wait times. The previous schedule contained 119 trips and required 88.2 man hours, resulting in 1.35 trips per man hour. The new schedule contains 130 trips and requires 88.3 man hours, resulting in 1.47 trips per man hour. This is a 9.12% increase in efficiency over the previous schedule. After the implementation of the new schedule, total fuel consumption was reduced from 93,402 gallons to 90,877 gallons, a reduction of 2.7% despite the additional trips. We believe this is because no additional time in route for the fleet was added compared to the original schedule. This resulted from our elimination of erroneous schedule estimations by better estimating how the buses are actually being run on each route.

Conversely, the previous schedule had an average recovery percentage of 35.49%, while the new schedule has an average recovery percentage of 35.08%. The industry standard for recovery percentage is 14.00% [2]. Previous schedules were drafted with exceptionally large recovery times in order to compensate for the inability to predict campus conditions, and therefore total transit time. The additional recovery time was
used as a cushion in the event of late buses. We were able to make these changes without adjusting break times or causing more work for our drivers. Given that the new schedule should accurately predict demand and transit times, it should further result in a larger amount of usable break time for drivers, as the recovery time is needed less as a cushion. As such, it should be possible to reduce the recovery percentage in the future to be more in line with the industry standard, without significantly affecting driver break time.

In addition, the new schedule improved the accuracy of the schedule. Fig. 8 shows the overall improvement from the old schedule to the new in relation to when buses actually arrive at their stops, i.e., the difference in punctuality has been significantly reduced.

[Fig. 8 plot, "Bus Punctuality": x-axis "Difference from Posted Time" (−10 to 10), y-axis "Frequency". Legend: Spring 2016 (mean −0.107, std dev 2.746), Fall 2016 (mean 0.01, std dev 2.479), Spring 2017 (mean −0.123, std dev 2.026).]

Fig. 8: Bus punctuality on the A route across several semesters. Spring 2016 was the last semester before this project, Fall 2016 was the initial first pass solution, and Spring 2017 was after the full optimization solution. The most important measure of bus punctuality is a low standard deviation, representing a tighter grouping of when the buses actually arrive around the posted schedule. This shows the improvement we've made over the traditional schedule using ∆GPS and ∆APC.

A. Heuristics

Many bus services, including IUCBS, do not leverage ∆GPS and ∆APC in evaluation of their transit network. However, there has been significant work in both determining the effectiveness of the transit network [25] [17] [32] and leveraging the data for real time updates [29] [10]. We similarly chose to determine the success of our newly generated schedules by developing some new tools, using inspiration from the literature, to act on these data sources. While we use past control data to develop our current schedule, we have the opportunity to use current data to evaluate that same schedule, some of which is explained here.

To determine the efficiency of the schedule, we generated a large set of visualizations that allowed the bus operators to determine how well the schedule was meeting their goals. We leveraged ∆APC to determine the mean passenger load at each stop (see Fig. 9), as well as outlier/full bus detection to observe both the quality of the data and potential problem areas. Additionally, we used ∆GPS during the semester to check whether the schedule was correctly estimating bus travel times (see Fig. 10). We also kept track of platooning, where two buses serving the same route are close enough to impact service (see Fig. 11). Platooning causes both longer wait times and insufficiently filled buses, and can happen during periods of intermittent and excessive traffic conditions.

[Fig. 9 plot, "A Route Load Profile": x-axis "Stop on Route" (stops 67, 39, 38, 37, 41, 1, 4, 6, 8, 10, 11, 12, 13, 14, 36, 30, 35), y-axis "Average Number of Passengers at Stop".]

Fig. 9: A Route load profile. Data was taken from APC data accumulated over almost a full year. After removing outliers, we averaged the ridership on each stop of each route by both time and day. This figure shows the average of those plots for the A Route Monday-Thursday.

VI. RELATED WORK

Transit scheduling has attracted the attention of scientists for almost 50 years [14], with varying degrees of success [6]. Most of the literature on the subject has focused on the labor portion [35] [5] of the problem or the optimization of trip times [23], with the occasional focus on solving both problems simultaneously [20]. In Su et al. [30], the authors used big data to improve market sharing rate. Other ideas have been put forth using novel ideas and heuristics for the optimization
part of the transit problem. One such idea involves using Constraint Programming, a programming paradigm that codes the constraints of the problem to develop a solution using software packages [4]. Other ideas involve using Mixed Integer Non-Linear Programming (MINLP), a group of techniques that solve an objective function and a series of constraints [22] [33]. One interesting approach for scheduling fixes the cost of waiting to be identical for all passengers [12]; however, that model requires knowing passengers' desired travel times instead of assuming they are uniformly distributed, as ours does (see Future Work). Several surveys of the literature in this area have a complete summary of these methods and others [6] [19] [7].

Measuring customer satisfaction in regards to scheduling is central to many transportation industries, especially airlines. Regularly the "best" airlines for on-time flights are published and are a critical determiner of customer satisfaction. There is a concerted effort to have a large percentage of flights being on-time compared to the schedule, which can cause airlines to schedule later arrivals to meet the deadlines more often (https://tinyurl.com/ybsckdad). In our model, however, we established that early arrivals are detrimental to the customer experience; viz., if a passenger arrives at the bus stop according to the schedule, but the bus left 2 minutes early, there could be an extremely long wait for the next bus. In fact, because of this tandem phenomenon, arriving later is better than arriving early, which is accounted for in our model.

Fig. 10: Indiana Memorial Union (IMU) (the most shared stop). Actual stop times difference from posted times (A Route). This plot shows the difference between the schedule and the actual arrival times of buses. We took the set of points of buses on each route within 50 feet of the stop of interest. The earliest point for each set of points was selected as the arrival time for that particular bus. We then calculated the difference between the arrival time and the closest scheduled stop on the timetable.

[Fig. 11 plot, "Heatmap of Platooning": color scale "level" with ticks at 20000, 40000, and 60000.]

Fig. 11: A Route platooning heatmap. Data was generated from the GPS data set. We applied a filter where we tracked whether buses were within 100 feet of each other going the same direction on the same route within 5 seconds. Each instance of this was counted as an example of platooning and added to the plot as an average of the two buses' GPS coordinates.

VII. FUTURE WORK

We are currently researching ways to use the data from a university's Registrar's Office to determine the student traffic on campus more accurately, since using ∆APC alone is insufficient to determine the actual times passengers arrive at, or wish to arrive at, the bus stop for travel to their destination. As shown in Fig. 12, there is a large difference in needs for buses at particular times on a general campus. We are also designing an app that allows students to signal either their location or intended location to create a dynamic schedule. Lastly, we are working with several university towns similar to Bloomington to help improve PT.

Acknowledgements: The authors thank the anonymous reviewers for their informative, encouraging comments and suggestions. This work was partially supported by NCI Grant 1R01CA213466-01.

REFERENCES

[1] Luca Allulli, Giuseppe F. Italiano, and Federico Santaroni. Exploiting gps data in public transport journey planners. pages 295–306, 2014.
[2] American Planning Association. Planning and urban design standards. John Wiley & Sons, 2006.
[3] M. Hadi Baaj and Hani S. Mahmassani. An ai-based approach for transit route system planning and design. Journal of Advanced Transportation, 25(2):187–209, 1991.
[4] Alexandre Barra, Luis Carvalho, Nicolas Teypaz, Van-Dat Cung, and Ronaldo Balassiano. Solving the Transit Network Design problem with Constraint Programming. In 11th World Conference in Transport Research - WCTR 2007, Berkeley, United States, June 2007.
[5] Avishai Ceder. Transit scheduling. Journal of Advanced Transportation, 25(2):137–160, 1991.
[6] Avishai Ceder. Urban transit scheduling: framework, review and examples. Journal of Urban Planning and Development, 128(4):225–244, 2002.
[7] Avishai Ceder and Nigel H.M. Wilson. Bus network design. Transportation Research Part B: Methodological, 20(4):331–344, 1986.
[8] Partha Chakroborty. Genetic algorithms for optimal urban transit network design. Computer-Aided Civil and Infrastructure Engineering, 18(3):184–200, 2003.
[9] Partha Chakroborty, Kalyanmoy Deb, and P. S. Subrahmanyam. Optimal scheduling of urban transit systems using genetic algorithms. Journal of Transportation Engineering, 121(6), 1995.
[10] Mei Chen, Xiaobo Liu, and Jingxin Xia. Dynamic prediction method with schedule recovery impact for bus arrival time.
[11] John Daggett and Richard Gutkowski. University transportation survey: Transportation in university communities. Transportation Research Record: Journal of the Transportation Research Board, (1835):42–49, 2003.
[12] André de Palma and Robin Lindsey. Optimal timetables for public transportation. Transportation Research Part B: Methodological, 35(8):789–813, 2001.
[13] Daniel Delling, Giuseppe F. Italiano, Thomas Pajor, and Federico Santaroni. Better transit routing by exploiting vehicle gps data. In Proceedings of the 7th ACM SIGSPATIAL International Workshop on Computational Transportation Science, IWCTS '14, pages 31–40, New York, NY, USA, 2014. ACM.
[14] SE Elias. The use of digital computers in the economic scheduling for both man and machine in public transportation. 1964.
[15] Reza Zanjirani Farahani, Elnaz Miandoabchi, W.Y. Szeto, and Hannaneh Rashidi. A review of urban transportation network design problems. European Journal of Operational Research, 229(2):281–302, 2013.
[16] Donatella Firmani, Giuseppe F. Italiano, Luigi Laura, and Federico Santaroni. Is timetabling routing always reliable for public transport? 33:15–26, September 2013.
[17] Peter G Furth, Brendon Hemily, Theo HJ Muller, and James G Strathman. Using archived avl-apc data to improve transit performance and management. 2006.
[18] Maria Elisa Alen Gonzalez, Lorenzo Rodriguez Comesana, and Jose Antonio Fraiz Brea. Assessing tourist behavioral intentions through perceived service quality and customer satisfaction. Journal of Business Research, 60(2):153–160, February 2007.
[19] Valérie Guihaire and Jin-Kao Hao. Transit network design and scheduling: A global review. Transportation Research Part A: Policy and Practice, 42(10):1251–1273, 2008.
[20] Knut Haase, Guy Desaulniers, and Jacques Desrosiers. Simultaneous vehicle and crew scheduling in urban mass transit systems. Transportation Science, 35(3):286–303, 2001.
[21] Rabiul Islam, Mohammed S Chowdhury, Mohammad Sumann Sarker, and Salauddin Ahmed. Measuring customer's satisfaction on bus transportation. 2014.
[22] Leise Neel Jansen, Michael Berliner Pedersen, and Otto Anker Nielsen. Minimizing passenger transfer times in public transport timetables. In 7th Conference of the Hong Kong Society for Transportation Studies, Transportation in the information age, Hong Kong, pages 229–239, 2002.
[23] Natalia Kliewer, Taieb Mellouli, and Leena Suhl. A time–space network based exact optimization model for multi-depot bus scheduling. European Journal of Operational Research, 175(3):1616–1627, 2006.
[24] Wen-Tai Lai and Ching-Fu Chen. Behavioral intentions of public transit passengers: the roles of service quality, perceived value, satisfaction and involvement. Transport Policy, 18(2):318–325, 2011.
[25] Ahmed M. El-Geneidy, Jessica Horning, and Kevin J. Krizek. Analyzing transit service reliability using detailed data from automatic vehicular locator systems. Journal of Advanced Transportation, 45(1):66–79, 2011.
[26] K. R. MacCrimmon and M. Toda. The Experimental Determination of Indifference Curves. Review of Economic Studies, 36(4):433–451, 1969.
[27] James H Miller. Transportation on college and university campuses. Number Project J-7, Topic SA-11. 2001.
[28] IU Office of Sustainability. Transportation: Student commuting. http://sustain.iu.edu/overview/indicators/transportation.php, 2012.
[29] Amer Shalaby and Ali Farhan. Prediction model of bus arrival and departure times using avl and apc data. Journal of Public Transportation, 7(1):3, 2004.
[30] Jau-Ming Su, Nomungerel Erdenebat, Liang-Hua Ho, and Yu-Ting Zhan. Integration of transit demand and big data for bus route design in taiwan. pages 19–26, 2016.
[31] Yannis Tyrinopoulos and Constantinos Antoniou. Public transit user satisfaction: Variability and policy implications. Transport Policy, 15(4):260–272, 2008.
[32] Niels van Oort, Daniel Sparing, Ties Brands, and Rob M. P. Goverde. Data driven improvements in public transport: the dutch example. Public Transport, 7(3):369–389, Dec 2015.
[33] Quentin K. Wan and Hong K. Lo. A mixed integer formulation for multiple-route transit network design. Journal of Mathematical Modelling and Algorithms, 2(4):299–308, 2003.
[34] Michael E Williams and Kathleen L Petrait. U-pass: A model transportation management program that works. Transportation Research Record, (1404), 1993.
[35] Anthony Wren and David O Wren. A genetic algorithm for public transport driver scheduling. Computers & Operations Research, 22(1):101–110, 1995.
[36] Fang Zhao and Xiaogang Zeng. Optimization of transit route network, vehicle headways and timetables for large-scale transit networks. European Journal of Operational Research, 186(2):841–855, 2008.

[Fig. 12 plot: campus heat map; color scale "level" with ticks at 10000 and 20000.]

Fig. 12: Heat map of student class sizes on the IU campus. Data was taken from Indiana University's Registrar's Office. Different areas of campus have vastly varying class sizes, whether aggregated across all times of day, as in this figure, or per hour. We can use this data to improve the public transportation schedule based on where and when on campus the student traffic originates.
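The intermediate stop-time procedure of Algorithm 2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `build_timetables` and `segment_times`, the representation of times as minutes since midnight, and the decision to raise an error when no GPS samples fall inside the window are all hypothetical choices made for the example.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

GPS_LEEWAY_MIN = 5.0  # half-width of the averaging window (gpsLeeway in Algorithm 2)

# Hypothetical lookup standing in for GPS_STOPS: observed travel times (minutes)
# from stop i-1 to stop i on a route and day type, for trips between t_lo and t_hi.
SegmentLookup = Callable[[float, float, int, str, str], Sequence[float]]


def build_timetables(
    leave_times: Dict[Tuple[str, str], List[float]],
    stops_per_route: Dict[str, int],
    segment_times: SegmentLookup,
    leeway: float = GPS_LEEWAY_MIN,
) -> Dict[Tuple[str, str, float], List[float]]:
    """Fill in intermediate stop times from historical segment travel times."""
    timetables: Dict[Tuple[str, str, float], List[float]] = {}
    for (route, day), departures in leave_times.items():
        for t in departures:
            times = [t]  # the first stop uses the scheduled leave time
            for i in range(1, stops_per_route[route]):
                samples = segment_times(t - leeway, t + leeway, i, route, day)
                if not samples:
                    # Assumption: the paper does not say what happens without data.
                    raise ValueError(f"no GPS samples for {route}/{day}, stop {i}")
                # Next stop time = previous stop time + mean observed travel time.
                times.append(times[-1] + mean(samples))
            timetables[(route, day, t)] = times
    return timetables
```

Because each leave time only reads the samples inside its own window, non-overlapping windows mean every GPS point is visited at most once, which matches the O(∆GPS) bound stated for Algorithm 2.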

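The data reduction routine described under "Data Reduction and Storage" (collapsing individual passenger entry and exit events into one record per stop, keeping a running load, and resetting it on every trip) might look roughly like the sketch below. The record types, field names, the capacity of 60, and the policy of dropping an impossible reading and clamping the running load are assumptions for illustration; the paper does not specify them.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List


@dataclass(frozen=True)
class ApcEvent:
    """One raw APC reading (hypothetical field names)."""
    trip_id: str
    stop_seq: int  # position of the stop within the trip
    delta: int     # +1 for a boarding, -1 for an alighting


@dataclass(frozen=True)
class StopEvent:
    """One aggregated record per (trip, stop)."""
    trip_id: str
    stop_seq: int
    boardings: int
    alightings: int
    load: int      # passengers on board when the bus leaves the stop


def reduce_apc(events: Iterable[ApcEvent], capacity: int = 60) -> List[StopEvent]:
    """Collapse per-passenger events into per-stop events with a running load."""
    ordered = sorted(events, key=lambda e: (e.trip_id, e.stop_seq))
    reduced: List[StopEvent] = []
    for trip_id, trip_events in groupby(ordered, key=lambda e: e.trip_id):
        load = 0  # reset on every trip: the bus is vacated at the end of a route
        for stop_seq, at_stop in groupby(trip_events, key=lambda e: e.stop_seq):
            deltas = [e.delta for e in at_stop]
            on = sum(d for d in deltas if d > 0)
            off = -sum(d for d in deltas if d < 0)
            load += on - off
            if 0 <= load <= capacity:
                reduced.append(StopEvent(trip_id, stop_seq, on, off, load))
            else:
                # Assumed policy: drop the impossible reading, clamp the load.
                load = min(max(load, 0), capacity)
    return reduced
```

Resetting `load` at the start of each trip is what keeps a single miscounted passenger from biasing every later stop event on the same vehicle.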

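The platooning filter described in the Fig. 11 caption (two buses on the same route, heading the same direction, within 100 feet of each other within 5 seconds, plotted at the average of their coordinates) can be sketched as below. The `Ping` record, the haversine distance, and the brute-force pairwise comparison are assumptions chosen for clarity; over millions of GPS points one would first bucket the pings by route and time rather than compare every pair.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Iterable, List, Tuple

FEET_PER_METER = 3.28084
EARTH_RADIUS_M = 6_371_000.0


@dataclass(frozen=True)
class Ping:
    """One GPS reading (hypothetical field names)."""
    bus_id: str
    route: str
    direction: str
    t: float    # timestamp in seconds
    lat: float  # degrees
    lon: float  # degrees


def distance_feet(a: Ping, b: Ping) -> float:
    """Great-circle (haversine) distance between two pings, in feet."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h)) * FEET_PER_METER


def platoon_points(
    pings: Iterable[Ping], max_feet: float = 100.0, max_seconds: float = 5.0
) -> List[Tuple[float, float]]:
    """Return the (lat, lon) midpoint of every ping pair that counts as platooning."""
    points: List[Tuple[float, float]] = []
    for a, b in combinations(list(pings), 2):
        same_service = a.route == b.route and a.direction == b.direction
        if (a.bus_id != b.bus_id and same_service
                and abs(a.t - b.t) <= max_seconds
                and distance_feet(a, b) <= max_feet):
            points.append(((a.lat + b.lat) / 2, (a.lon + b.lon) / 2))
    return points
```

The returned midpoints are what a heat map such as Fig. 11 would aggregate.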