Open Data: Challenges and Opportunities For Transit Agencies (2015)
DETAILS
123 pages | 8.5 x 11 | PAPERBACK
ISBN 978-0-309-43272-6 | DOI 10.17226/22195
CONTRIBUTORS
Carol L. Schweiger; Transit Cooperative Research Program; Transportation
Research Board; National Academies of Sciences, Engineering, and Medicine
All downloadable National Academies titles are free to be used for personal and/or non-commercial
academic use. Users may also freely post links to our titles on this website; non-commercial academic
users are encouraged to link to the version on this website rather than distribute a downloaded PDF
to ensure that all users are accessing the latest authoritative version of the work. All other uses require
written permission. (Request Permission)
This PDF is protected by copyright and owned by the National Academy of Sciences; unless otherwise
indicated, the National Academy of Sciences retains copyright to all materials in this PDF with all rights
reserved.
Open Data: Challenges and Opportunities for Transit Agencies
Consultant
Carol L. Schweiger
TranSystems Corporation
Boston, Massachusetts
Subject Areas
Planning and Forecasting • Public Transportation
The nation’s growth and the need to meet mobility, environmental, and energy objectives place demands on public transit systems. Current systems, some of which are old and in need of upgrading, must expand service area, increase service frequency, and improve efficiency to serve these demands. Research is necessary to solve operating problems, to adapt appropriate new technologies from other industries, and to introduce innovations into the transit industry. The Transit Cooperative Research Program (TCRP) serves as one of the principal means by which the transit industry can develop innovative near-term solutions to meet demands placed on it.

The need for TCRP was originally identified in TRB Special Report 213—Research for Public Transit: New Directions, published in 1987 and based on a study sponsored by the Federal Transit Administration (FTA). A report by the American Public Transportation Association (APTA), Transportation 2000, also recognized the need for local, problem-solving research. TCRP, modeled after the longstanding and successful National Cooperative Highway Research Program, undertakes research and other technical activities in response to the needs of transit service providers. The scope of TCRP includes a variety of transit research fields including planning, service configuration, equipment, facilities, operations, human resources, maintenance, policy, and administrative practices.

TCRP was established under FTA sponsorship in July 1992. Proposed by the U.S. Department of Transportation, TCRP was authorized as part of the Intermodal Surface Transportation Efficiency Act of 1991 (ISTEA). On May 13, 1992, a memorandum agreement outlining TCRP operating procedures was executed by the three cooperating organizations: FTA, the National Academy of Sciences, acting through the Transportation Research Board (TRB); and the Transit Development Corporation, Inc. (TDC), a nonprofit educational and research organization established by APTA. TDC is responsible for forming the independent governing board, designated as the TCRP Oversight and Project Selection (TOPS) Committee.

Research problem statements for TCRP are solicited periodically but may be submitted to TRB by anyone at any time. It is the responsibility of the TOPS Committee to formulate the research program by identifying the highest priority projects. As part of the evaluation, the TOPS Committee defines funding levels and expected products.

Once selected, each project is assigned to an expert panel, appointed by TRB. The panels prepare project statements (requests for proposals), select contractors, and provide technical guidance and counsel throughout the life of the project. The process for developing research problem statements and selecting research agencies has been used by TRB in managing cooperative research programs since 1962. As in other TRB activities, TCRP project panels serve voluntarily without compensation.

Because research cannot have the desired impact if products fail to reach the intended audience, special emphasis is placed on disseminating TCRP results to the intended end users of the research: transit agencies, service providers, and suppliers. TRB provides a series of research reports, syntheses of transit practice, and other supporting material developed by TCRP research. APTA will arrange for workshops, training aids, field visits, and other activities to ensure that results are implemented by urban and rural transit industry practitioners.

The TCRP provides a forum where transit agencies can cooperatively address common operational problems. The TCRP results support and complement other ongoing transit research and training programs.

Project J-7, Topic SA-34
ISSN 1073-4880
ISBN 978-0-309-27171-4
Library of Congress Control Number 2014959838
© 2015 Transportation Research Board

COPYRIGHT PERMISSION
Authors herein are responsible for the authenticity of their materials and for obtaining written permissions from publishers or persons who own the copyright to any previously published or copyrighted material used herein.

Cooperative Research Programs (CRP) grants permission to reproduce material in this publication for classroom and not-for-profit purposes. Permission is given with the understanding that none of the material will be used to imply TRB, AASHTO, FAA, FHWA, FMCSA, FTA, or Transit Development Corporation endorsement of a particular product, method, or practice. It is expected that those reproducing the material in this document for educational and not-for-profit uses will give appropriate acknowledgment of the source of any reprinted or reproduced material. For other uses of the material, request permission from CRP.

NOTICE
The project that is the subject of this report was a part of the Transit Cooperative Research Program conducted by the Transportation Research Board with the approval of the Governing Board of the National Research Council. Such approval reflects the Governing Board’s judgment that the project concerned is appropriate with respect to both the purposes and resources of the National Research Council.

The members of the technical advisory panel selected to monitor this project and to review this report were chosen for recognized scholarly competence and with due consideration for the balance of disciplines appropriate to the project. The opinions and conclusions expressed or implied are those of the research agency that performed the research, and while they have been accepted as appropriate by the technical panel, they are not necessarily those of the Transportation Research Board, the Transit Development Corporation, the National Research Council, or the Federal Transit Administration of the U.S. Department of Transportation.

Each report is reviewed and accepted for publication by the technical panel according to procedures established and monitored by the Transportation Research Board Executive Committee and the Governing Board of the National Research Council.

The Transportation Research Board of The National Academies, the Transit Development Corporation, the National Research Council, and the Federal Transit Administration (sponsor of the Transit Cooperative Research Program) do not endorse products or manufacturers. Trade or manufacturers’ names appear herein solely because they are considered essential to the clarity and completeness of the project reporting.

Published reports of the TRANSIT COOPERATIVE RESEARCH PROGRAM are available from:
Transportation Research Board
Business Office
500 Fifth Street, NW
Washington, DC 20001
and can be ordered through the Internet at
http://www.national-academies.org/trb/bookstore
Printed in the United States of America
The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished schol-
ars engaged in scientific and engineering research, dedicated to the furtherance of science and technology
and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in
1863, the Academy has a mandate that requires it to advise the federal government on scientific and techni-
cal matters. Dr. Ralph J. Cicerone is president of the National Academy of Sciences.
The National Academy of Engineering was established in 1964, under the charter of the National Acad-
emy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration
and in the selection of its members, sharing with the National Academy of Sciences the responsibility for
advising the federal government. The National Academy of Engineering also sponsors engineering programs
aimed at meeting national needs, encourages education and research, and recognizes the superior achieve-
ments of engineers. Dr. C. D. Mote, Jr., is president of the National Academy of Engineering.
The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the
services of eminent members of appropriate professions in the examination of policy matters pertaining
to the health of the public. The Institute acts under the responsibility given to the National Academy of
Sciences by its congressional charter to be an adviser to the federal government and, upon its own
initiative, to identify issues of medical care, research, and education. Dr. Victor J. Dzau is president of
the Institute of Medicine.
The National Research Council was organized by the National Academy of Sciences in 1916 to associate
the broad community of science and technology with the Academy’s purposes of furthering knowledge and
advising the federal government. Functioning in accordance with general policies determined by the Acad-
emy, the Council has become the principal operating agency of both the National Academy of Sciences
and the National Academy of Engineering in providing services to the government, the public, and the
scientific and engineering communities. The Council is administered jointly by both Academies and the
Institute of Medicine. Dr. Ralph J. Cicerone and Dr. C. D. Mote, Jr., are chair and vice chair, respectively,
of the National Research Council.
The Transportation Research Board is one of six major divisions of the National Research Council. The mission of the Transportation Research Board is to provide leadership in transportation innovation and progress through research and information exchange, conducted within a setting that is objective, interdisciplinary, and multimodal. The Board’s varied activities annually engage about 7,000 engineers, scientists, and other transportation researchers and practitioners from the public and private sectors and academia, all of whom contribute their expertise in the public interest. The program is supported by state transportation departments, federal agencies including the component administrations of the U.S. Department of Transportation, and other organizations and individuals interested in the development of transportation.
www.TRB.org
www.national-academies.org
CHAIR
BRAD J. MILLER, Pinellas Suncoast Transit Authority, St. Petersburg, FL
MEMBERS
DONNA DeMARTINO, San Joaquin Regional Transit District, Stockton, CA
MICHAEL FORD, Ann Arbor Transportation Authority, Ann Arbor, MI
BOBBY J. GRIFFIN, Griffin and Associates, Flower Mound, TX
ROBERT H. IRWIN, Consultant, Sooke, BC, Canada
JEANNE KRIEG, Eastern Contra Costa Transit Authority, Antioch, CA
PAUL J. LARROUSSE, Rutgers, The State University of New Jersey, New Brunswick
DAVID A. LEE, Connecticut Transit, Hartford
ELIZABETH PRESUTTI, Des Moines Area Regional Transit Authority–DART
ROBERT H. PRINCE, JR., AECOM Consulting Transportation Group, Inc., Boston, MA
FTA LIAISON
JARRETT W. STOLTZFUS, Federal Transit Administration
TRB LIAISON
JENNIFER L. WEEKS, Transportation Research Board
FOREWORD
Transit administrators, engineers, and researchers often face problems for which information already exists, either in documented form or as undocumented experience and practice. This information may be fragmented, scattered, and unevaluated. As a consequence, full knowledge of what has been learned about a problem may not be brought to bear on its solution. Costly research findings may go unused, valuable experience may be overlooked, and due consideration may not be given to recommended practices for solving or alleviating the problem.
There is information on nearly every subject of concern to the transit industry. Much
of it derives from research or from the work of practitioners faced with problems in their
day-to-day work. To provide a systematic means for assembling and evaluating such use-
ful information and to make it available to the entire transit community, the Transit Coop-
erative Research Program Oversight and Project Selection (TOPS) Committee authorized
the Transportation Research Board to undertake a continuing study. This study, TCRP
Project J-7, “Synthesis of Information Related to Transit Problems,” searches out and syn-
thesizes useful knowledge from all available sources and prepares concise, documented
reports on specific topics. Reports from this endeavor constitute a TCRP report series,
Synthesis of Transit Practice.
This synthesis series reports on current knowledge and practice, in a compact format,
without the detailed directions usually found in handbooks or design manuals. Each report
in the series provides a compendium of the best knowledge available on those measures
found to be the most successful in resolving specific problems.
PREFACE
By Donna L. Vlasak
Senior Program Officer
Transportation Research Board
The report documents the current state of the practice in the use of open data for improved transit planning, service quality, and customer information; the implications of open data and open documentation policies; and the impact of open data on transit agencies, and the public and private sectors. It focuses on successful practices in open transit data policies, use, protocols, and licensing. This synthesis is intended for transit agencies, the public, and the private sector.
A literature review and detailed survey responses from 67 of 67 agencies surveyed around
the world, including Canada and 14 European countries, a response rate of 100 percent,
are provided. Also, four case examples offer more detailed information from agencies and
organizations that have significant experience with providing open data.
Carol L. Schweiger, TranSystems Corporation, Boston, Massachusetts, collected and
synthesized the information and wrote the report, under the guidance of a panel of experts
in the subject area. The members of the topic panel are acknowledged on the preceding
page. This synthesis is an immediately useful document that records the practices that were
acceptable within the limitations of the knowledge available at the time of its preparation.
As progress in research and practice continues, new knowledge will be added to that now
at hand.
CONTENTS
SUMMARY
REFERENCES
SUMMARY
In the past 5 years, more and more transit agencies have begun making schedule and real-time operational data available to the public. “Open data” provide opportunities for agencies to inform the public in a variety of ways about transit agency services.
The purpose of this synthesis is to document the current state of the practice and policies
in the use of open data for improved transit planning, service quality, and customer infor-
mation; the implications of open data and open documentation policies; and the impact of
open data on transit agencies and the public and private sectors. The synthesis focuses on
successful practices in open transit data policies, use, protocols, and licensing. A literature
review and survey collected key information about open transit data. The survey was sent
to 67 transit agencies around the world and had a 100% response rate. Of the 67 sur-
veys received, three were from Canadian agencies and 14 from European agencies. U.S.
responses represent agencies that carry a total of more than 5.4 billion passengers annually
(annual unlinked trips), with U.S. agencies’ annual ridership ranging from 1.8 million (a
county transit system in Florida) to 2.6 billion (Metropolitan Transportation Authority in
New York City).
The background of open transit data in the United States is as follows. Prior to 1998, data
generated by technologies deployed by public transit agencies were not made available to
the public. In 1998, Bay Area Rapid Transit released schedule data in the comma-separated
values (.csv) format; this was the first known release of transit data to the public. Tri-County
Metropolitan Transportation District of Oregon (TriMet) worked with Google in the cre-
ation of General Transit Feed Specification (GTFS, originally developed by Google and
containing static schedule information for transit agencies, including stop location, route
geometrics, and stop times) in 2005. Massachusetts Bay Transportation Authority (MBTA)
opened the agency’s data in 2009. As of April 2014, according to City-Go-Round, almost
29% of U.S. transit agencies provided open data. In 2003, the Digital Agenda for Europe, Public Sector Information Directive was issued, requiring all European Union member states to release public sector information, including open public transport data. Many public transit agencies in the Asia-Pacific region are beginning to open their data as well, such
as a recent initiative to combine and release the data from many public transit operators in
Tokyo, Japan.
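To make the GTFS description above concrete, the following is a minimal sketch of how a consumer might read two of the static tables. The column names (stop_id, stop_name, stop_lat, stop_lon, trip_id, arrival_time, departure_time, stop_sequence) follow the published GTFS field names; the stop names, coordinates, trip identifiers, and the helper functions read_table and departures_by_stop are hypothetical and are not taken from any agency feed or from this report.

```python
import csv
import io

# Hypothetical sample rows. A real feed ships these tables as .txt files
# inside a zip archive published by the agency.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Main St & 1st Ave,45.5120,-122.6790
S2,Main St & 5th Ave,45.5150,-122.6750
"""

STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:04:00,08:04:00,S2,2
T2,08:15:00,08:15:00,S1,1
"""


def read_table(text):
    """Parse one GTFS text table into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def departures_by_stop(stops, stop_times):
    """Map each stop name to its departure times, sorted as strings."""
    names = {row["stop_id"]: row["stop_name"] for row in stops}
    result = {name: [] for name in names.values()}
    for row in stop_times:
        result[names[row["stop_id"]]].append(row["departure_time"])
    for times in result.values():
        # String sort is adequate only because times are zero-padded HH:MM:SS.
        times.sort()
    return result


schedule = departures_by_stop(read_table(STOPS_TXT), read_table(STOP_TIMES_TXT))
for stop_name, times in schedule.items():
    print(stop_name, times)
```

Because the tables are plain comma-separated text, a third-party developer can build a simple departure board like this with standard tooling, which is part of why a common format lowers the barrier to reuse.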
In addition, not only have public transit authorities benefited from providing open data,
but the public, private, and independent sectors also have realized benefits. Transit authorities
that have embraced transparency by providing open data have improved the perception and
increased the visibility of transit. They also have been able to use the data they are releasing
to the public to make internal improvements. The public now has access to many free appli-
cations that provide real-time and static transit information, which greatly facilitates travel
using transit. Private businesses have been created or expanded to work with open transit data
and have developed innovative applications that, in some cases, could not have been devel-
oped in a public agency. Finally, the independent sector, consisting of academic institutions
and research and development organizations, has been instrumental in researching, analyz-
ing, using, and promoting the creation and use of open transit data.
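This synthesis examines and documents the state of the practice in open transit data using
the following five elements:
• Characteristics of open transit data
• Legal and licensing issues and practices
• Uses of open data
• Costs and benefits of providing open data
• Opportunities and challenges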
In addition to the four case examples conducted as part of this synthesis project and pre-
sented in chapter seven, two examples show the value and power of open transit data:
• Moscow’s transit authority relied on open data to determine whether more investment
in rail networks was necessary or if other services could better meet demand. Instead of
building a new rail line at considerable expense, the authority restructured bus service,
which allowed flexibility for future shifts in population. Not only did the authority avoid
incurring more than $1 billion in infrastructure costs, but the restructured bus service
also saved an average of 3 minutes per trip during the morning commute, amounting to
10 hours of travel time for each rider every year.
• In New Jersey, NJ Transit released data on passenger flows to the public in 2012. Third
parties quickly analyzed ridership at different times of day and were able to pinpoint
underutilized rail stops, which led to more express trains and a savings of 6 minutes in
the average commuting time during peak hours.
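The Moscow figure can be checked with simple arithmetic. The sketch below assumes one morning trip on each of roughly 200 commuting days per year; that day count is an assumption chosen for illustration and is not stated in the source.

```python
MINUTES_SAVED_PER_TRIP = 3    # from the Moscow example in the text
COMMUTE_DAYS_PER_YEAR = 200   # assumption: not given in the source


def annual_hours_saved(minutes_per_trip, trips_per_year):
    """Convert a per-trip saving in minutes into hours saved per year."""
    return minutes_per_trip * trips_per_year / 60


hours = annual_hours_saved(MINUTES_SAVED_PER_TRIP, COMMUTE_DAYS_PER_YEAR)
print(hours)
```

Under that assumption the result is 10 hours per rider per year, which matches the reported total; a different number of commuting days would scale the total proportionally.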
In summary, based on the literature review, the responses to the questionnaire, and the
case examples, the key findings of this synthesis project are as follows:
• The benefits to the agency strongly support open transit data. The availability of open
transit data encourages innovation that could not be accomplished solely by agency staff.
The top five overall benefits experienced by survey respondents were (1) increased
awareness of our services; (2) empowered our customers; (3) encouraged innovation
outside of the agency; (4) improved the perception of our agency (e.g., openness/trans-
parency); and (5) provided opportunities for private businesses.
• Engaging application developers, other data users, and customers is an approach that
can accomplish several critical tasks, including:
–– Obtaining feedback on data anomalies and data quality issues;
–– Ensuring that some portion of the applications developed by third parties meets the
needs of customers; and
–– Finding out more about how people want to use/reuse the data.
There are several ways to engage developers and customers. Results of the survey indi-
cated that the most effective methods are conducting face-to-face events, conferences,
and “meetups.” Meetups are informal meetings to discuss particular topics, such as
application development.
• The results of the literature review and survey indicate that standards and commonly
used formats can be used to facilitate the generation and use of open data. Further, using
standards makes it easier to transfer applications from one agency to another.
• Open transit data result in innovation that could not be accomplished within a transit
agency. That is not to say that sufficient intellect does not exist in a transit agency;
rather, it is an issue of having sufficient resources to develop applications and conduct
analyses at the scale that can be done in an open market.
–– The top three steps that respondents took to disclose their data publicly were convert-
ing transit data into formats suitable for public use; improving data quality to ensure
accuracy and reliability; and adopting an open, nonproprietary data standard.
• Uses of open data:
–– The top five types of customer applications that have been developed as a result of
providing open data are (in descending order of frequency) trip planning, mobile
applications, real-time transit information (arrival/departure times, delays, detours),
maps, and data visualization;
–– The top five decision-support tools that have been developed are data visualization,
service planning and evaluation, route layout and design, performance analysis, and
travel time and capacity analysis;
–– Almost two-thirds (33 or 63.5%) of respondents do not track usage of their open data;
–– The two most prevalent methods of tracking are to monitor data downloads and keep
track of applications developed;
–– For mobile applications, an equal number of respondents reported Android and iOS
applications; and
–– Sixteen respondents reported a total of almost 266 million Application Programming
Interface (API) calls per month.
• Costs and benefits of providing open data:
–– The top five types of costs associated with providing open data are staff time to
update, fix, and maintain data as needed; internal staff time to convert data to an open
format; staff time needed to validate and monitor the data for accuracy; staff time to
liaise with data users and developers; and web service for hosting data;
–– Almost 90% (43 or 89.4%) of respondents could not quantify how much time is spent
on any of these activities;
–– There was limited information regarding the actual labor required from specific staff
in the organization and the costs associated with open data; and
–– The top three benefits experienced by survey respondents are increased awareness of
their services, empowerment of their customers, and encouragement of innovation
outside of the agency.
• Opportunities and challenges:
–– Almost 70% (33 or 69.6%) of respondents engage or have a dialogue with existing
and potential data users and reusers;
–– Twenty-five or 75.8% of respondents engage data users and reusers to obtain feed-
back on data anomalies and data quality issues. Twenty-four or 60% of the respon-
dents use face-to-face events to engage these groups;
–– The organizational impacts on the agency resulting from opening the data ranged
from increased transparency to better and more accurate internal data to lower costs
to provide information. The majority of negative impacts were related to resources
required to maintain an open data program;
–– Impacts on the customer were numerous, including better and more accessible infor-
mation for customers; better perception, visibility and awareness of services, and
improved customer satisfaction;
–– In terms of impacts on the public, creating and improving access to additional
and higher quality public services was mentioned, along with improving public
perception/image of transit, making transit more competitive, providing better
regional coordination of services, encouraging innovation, and providing a better
transit experience;
–– The impacts on the private sector are primarily providing business/commercial and
development opportunities, including new and expanded companies (e.g., creating
a new ecosystem of private entrepreneurs); enabling innovation and the creation of
applications that may not have been created by the public sector; and adding value to
existing public services; and
–– Challenges were noted by survey respondents in five areas: (1) resources and orga-
nizational issues; (2) data quality and timeliness issues; (3) standards and formatting
issues; (4) marketing issues relating to making the open data known and addressing
branding issues; and (5) technical issues.
Several conclusions can be drawn from the results of the synthesis project, including:
chapter one
INTRODUCTION
PROJECT BACKGROUND AND OBJECTIVES

The primary focus of this synthesis is on the current state of the practice and policies in the use of open data for improved transit planning; service quality, customer information, and customer experience; implications of open data and open documentation policies; and their impact on transit agencies and public support. In addition, successful practices in open data policy, use, and protocols and licensing in the United States and abroad are documented.

Within the past 5 years, more and more transit agencies have made schedule and real-time operational data available to the public. “Open data” provide opportunities for agencies to inform the public in a variety of ways about a transit agency’s services. For example, there is significant value to having web-based and mobile applications that are developed by people outside the transit agency—these applications allow riders to navigate public transit systems more easily. In this example, the agency does not bear the costs associated with the application development and encourages innovation in terms of how to present transit information to the public. Open data are being used to create enterprise-facing, decision-support tools that can help to optimize operations in real time, improve maintenance, and support capital planning and programs. However, in addition to opportunities, open data present challenges for agency operations and other business functions.

The use of open transit data was first reported by TRB in 2011’s TCRP Synthesis 91 (1). At the time that synthesis was prepared, there was a keen interest on the part of several U.S. transit agencies to provide open data, and legislation had just been passed that governed the required open data in parts of Europe. In addition, local, state, and federal requirements for open data (including the Open Government Directive issued by President Obama in 2009) were under development in the United States. Since that report was prepared, many more agencies have embraced and provided open data, and have realized benefits well beyond what was originally thought. This development occurs within a context of agencies requiring opening of their data. This synthesis documents the current state of the practice and policies in the use of open data for improved transit planning, service quality and customer information, and implications of open data and open documentation policies; and their impact on transit agencies and the public. It focuses on successful practices in open data policies, use, protocols and standards, and licensing.

This synthesis documents but is not limited to the following five major elements.

• Characteristics of open transit data
–– Reasons for choosing to provide open data
–– Standards and protocols for providing open data
–– Underlying technology used to generate open data
• Legal and licensing issues and practices
–– Legal and licensing issues
–– Public disclosure practices
• Uses of open data
–– Applications
–– Decision-support tools
–– Visualizations
• Costs and benefits of providing open data
• Opportunities and challenges
–– Techniques for engaging users and reusers of data
–– Challenges associated with providing open data
–– Impacts on transit agencies and the public and private sectors

A literature review, survey of selected transit agencies and other stakeholders, and detailed case examples or profiles were done to report on the state of the practice, including innovations, lessons learned, challenges, and gaps in information.

A review of the relevant literature in the field is combined with surveys of selected transit agencies and other appropriate stakeholders to report on the current state of the practice. Based on survey results, four case examples were developed to describe innovative and successful practices, as well as lessons learned and gaps in information.

TECHNICAL APPROACH TO PROJECT

This synthesis project was conducted in five major steps. First, a literature review was performed to identify the characteristics of open data; legal and licensing issues and practices; uses of open data; and the costs, benefits, challenges, and opportunities resulting from providing open data. See the References section for a list of sources.

Second, a survey was conducted to collect information on factors such as the reasons for choosing to provide open data; standards and protocols being used; public disclosure practices; customer applications and other data uses; techniques for engaging actual and potential users and reusers of the data; and challenges associated with providing open data. In addition, information regarding the impacts on transit agencies and the public and private sectors was explored through the survey. The survey instrument is shown in Appendix A, and the list of agencies and staff titles responding to the survey is shown in Appendix B.

Third, the survey results were documented and summarized. Fourth, telephone interviews were conducted with key personnel at four agencies and organizations that have significant experience with providing open data; case examples from those four agencies are presented in chapter seven. Finally, the results and conclusions were prepared and documented.

REPORT ORGANIZATION

This report is organized as follows:

• Chapter one presents the goals and objectives of the synthesis and describes the technical approach used to conduct the project.
• Chapter two summarizes the literature review.
• Chapter three describes the characteristics of open data, including the reasons for choosing to provide open data; underlying technology used to generate the data provided to the public; and standards, protocols, and formats being used to provide open data.
• Chapter four describes the legal and licensing issues and practices, and public disclosure practices.
• Chapter five discusses the uses of open data, including customer applications, decision-support tools, other uses of open data (e.g., nontransit applications), and open data applications statistics.
• Chapter six presents information about the costs, benefits, challenges, and opportunities resulting from providing open data.
• Chapter seven presents case examples from selected agencies that have significant experience with open data.
• Chapter eight summarizes the results of the synthesis and presents conclusions.
• An Abbreviations and Acronyms section lists those elements.
• The References section contains the list of literature that was reviewed and referred to in this report.
• Appendix A contains the survey instrument.
• Appendix B shows the list of responding agencies and staff titles.
• Appendix C provides supplemental information regarding conferences, meetings, and agency events dedicated to open transit data.
• Appendix D shows the total annual ridership for each responding agency.
• Appendix E contains website addresses for agency license agreements and terms of use.
• Appendix F shows examples of customer information-related applications that are available through agency websites.
• Appendix G provides examples of applications noted through the Transport Innovation Deployment for Europe (TIDE) project.
• Appendix H has a list of Transport for London (TfL) open data available from the London Datastore.
chapter two
LITERATURE REVIEW
The literature review revealed that a wide variety of reports, papers, articles, and press releases have been written about open data in transit. The literature review is divided into the following sections, including the five elements identified in chapter one:

• Characteristics of open data, including standard(s) used for open data
• Legal and licensing issues
• Uses of open data, including customer and other applications
• Costs and benefits
• Opportunities and challenges, including:
  – Engaging existing and potential data users and reusers
  – Impacts of open data

The first step of the literature review was to conduct an online Transport Research International Documentation (TRID) search. This TRID search yielded 30 documents, the most relevant of which were reviewed and used as input for this report. The second step was to obtain and review articles, press releases, and website information directly from agencies and open data organizations across the world. The third step was to review research reports from the FTA, FHWA, and TCRP. Finally, other papers and articles were obtained from a variety of sources, including the following:

• TRB Annual Meetings;
• APTA conferences and publications;
• Intelligent Transportation Society of America (ITSA) Annual Meetings;
• Intelligent Transportation Society World Congresses;
• Open Data Organizations;
• European Commission project documentation; and
• Internet searches.

The sources used in this synthesis are listed in the References section. Supplemental information regarding conferences, meetings, and agency events dedicated to open transit data can be found in Appendix C.

INTRODUCTION TO OPEN TRANSIT DATA

Before describing the types of data that are being released by transit agencies according to the literature, it is important to define the term “open data.” Many definitions include the basic principle that the data are free to use, reuse, and redistribute. According to the Open Data Institute (2),

Open data is information that is available for anyone to use, for any purpose, at no cost. Open data has to have a license that says it is open data. Without a license, the data can’t be reused. The license might also say:

• that people who use the data must credit whoever is publishing it (this is called attribution); and
• that people who mix the data with other data have to also release the results as open data (this is called share-alike).

In addition, the Open Data Institute describes what constitutes “good” open data:

• Can be shared easily;
• Is available in a standard, structured format;
• Has guaranteed availability and consistency; and
• Is traceable, through processing, back to where it originates.

Finally, the conditions of open data are defined as follows (3):

• Complete: taking privacy into consideration;
• Primary: being as close as possible to the source;
• Actual: as automatic as possible in the exchange;
• Accessible: in digital format for as many users as possible and as many purposes as possible;
• Readable machine to machine;
• Free: mostly accessible for no cost and with no restrictions for use; and
• In an open format and to follow a standard [e.g., Extensible Markup Language (XML)].

Organizations that provide guidance about making a business case for open data; how to open the data; engaging with data consumers and reusers; licensing; and the effects of open data include:

• Open Knowledge Foundation
• Sunlight Foundation
• Open Data Institute
• The Open Data Foundation
• Project Open Data
• The Open Data Center Alliance
• Open Mobile Alliance
• Public Data Transit Community
On January 21, 2009, President Obama issued an Open Government Directive, which directed executive departments and agencies to take actions that support transparency, participation, and collaboration:

Transparency promotes accountability by providing the public with information about what the Government is doing. Participation allows members of the public to contribute ideas and expertise so that their government can make policies with the benefit of information that is widely dispersed in society. Collaboration improves the effectiveness of Government by encouraging partnerships and cooperation within the Federal Government, across levels of government, and between the Government and private institutions (4).

In 2003, the Digital Agenda for Europe, Public Sector Information Directive was issued, significantly affecting the release and “re-use of public sector information,” including open public transport data (5). Through this directive, open data are “obligatory for all [European Union] member states.” In February 2012, Claudia Schwegmann reported that developing countries are just now beginning to open their data, including public transport data (6).

The U.S. DOT supports and promotes open transportation data, as evidenced in its Open Government Plan for 2012 to 2014 (7). As shown in Figure 1, five initiatives in this plan include several open transit data-related activities.

The literature reviewed for this synthesis included several other concepts that are directly related to open transit data, including “open data movement” and “open transport.” The open data movement and its relationship to public transit is defined by Eros et al. as coming

from philosophical principles of open government, transparency, and accountability, and practical motivations related to increased returns on public investment, downstream wealth creation, more potential brainpower brought to examining complex problems,
and enhanced public policy and service delivery. For transportation, the open data movement has fundamentally shifted how agencies communicate with users as an increasing number move from tightly controlling data and the products derived from them, towards generating and releasing data with minimal control over the end products. In transportation, the confluence of open data, GTFS [General Transit Feed Specification] (originally developed by Google and containing static schedule information for transit agencies, including stop locations, route geometrics, and stop times), and increasingly ubiquitous mobile computing, sensing, and communication technologies (epitomized by the ‘smartphone’), has spurred numerous technical innovations from a range of actors. Tools include applications that assist with trip planning, ridesharing, timetable creation, data visualization, planning analysis, interactive voice response, and real-time information provision. Together, GTFS and GTFS-RealTime (containing real-time information related to vehicle positions, service alerts, and trip updates such as delays, cancellations, etc.) enable transit agencies and operators to engage the power of the software developer community and citizenry more generally to create new forms of information services about public transportation (8).

Open transport, according to the World Bank Open Transport Team,

defines the next generation of tools and methodologies for managing and planning transport systems in resource-constrained environments. Open Transport is defined by three principles: Open data standards, open source software and open data. Open data standards are freely and publicly available, with no required use agreements. Open source software features universal access and redistribution rights via free licenses to a product’s design, code, or blueprint (9).

From the open transit data perspective, “while not all transport data lends itself to be open, there are benefits to releasing some data, such as public transit service information, to achieve economies of scale in generation of third-party applications to support wider and more efficient use of a transit network” (9).

An overall history of the open data movement between 2006 and 2011 is shown in Figure 2, which shows “the evolution and role of [application] API in building influential and essential tools and applications for web users, plus the demographics of public data usage around the world” (10). This evolution had a significant impact on the introduction of open transit data.

A timeline of transit open data between 2005 and 2012, as detailed by Francisca Rojas, is shown in Figure 3.

As agencies saw that disseminating transit information by electronic means was valuable to their customers, they began to create
the points of interaction, or Application Programming Interfaces (APIs), needed for third-party software developers to access dynamic, real-time data feeds of bus and train location information. The solid-line boxes in Figure 3 indicate when agencies made available real-time data feeds to the public. This process gained momentum in 2009 and by the end of 2011 all major transit agencies in the United States had 1) posted their routes and schedules on Google Maps, 2) publicly released GTFS files of static transit information, and 3) created APIs for access to real-time information by third-parties (11).

Because this timeline ends in 2012, it does not display events in 2013 or 2014, such as Metropolitan Transportation Authority’s (MTA’s) completion of the Bus Time implementation and release of GTFS-realtime data.

Rojas suggests that TriMet, one of the U.S. transit agencies to pioneer the release and use of open data, started the spread of open transit data to some of the largest transit agencies because of the following factors (11):

• A de facto data standard offered by the GTFS format facilitated the process for agencies to integrate schedules and routes into Google Maps (now Google Transit), and for broader public disclosure of those same data sets;
• Demand from technologically savvy, networked transit riders for customized transit information because of the wide adoption of short message service (SMS)-enabled cell phones and location-aware smartphones, which enabled riders to view real-time information while traveling;
• Software developer communities that were eager to learn how to code mobile applications and sought available data sets to develop technical skills, and improve and expand access to transit information;
• Agencies willing to adapt intelligent transportation systems used for internal management of operations into data formats suitable for public disclosure; and
• Open data champions who built networks within agencies to share experiences and seek advice on technical and policy aspects of data disclosure.

The literature reviewed presented the questions that many transit agencies ask as they are determining whether or not to open their data (12):

1. What data should we provide?
2. How do we get the data?
3. Will the app developers use our data?
4. Will the developers provide a reliable service?
5. What are good examples of apps?
6. Do we need to produce an app ourselves?
7. What standards should we use?
8. How should we ensure data quality?
9. Should we preprocess our data?
10. Can we charge for our data?

A tool developed by the Center for Technology in Government at the University of Albany “provides a series of questions that take agencies through a review of their existing and proposed open government plans to quickly assess the public value of their open government initiatives” (13). This tool, called the Open Government Portfolio Public Value Assessment Tool (PVAT), evaluates each open government initiative using a “multistep question process, which
devices. In this magnitude, static data are all transit schedules/routes/stops, and real-time data are all estimated arrivals/vehicle positions/service alerts; and
• “Faucet,” which is a precise subset of transit data and is suitable for mobile devices. In this magnitude, the data are specific. For example, static data could be “Stop ID 10 is served by Route 5,” and real-time data could be “It is 2 minutes until Route 5 bus arrives at Stop ID 10.”

The McKinsey Global Institute characterizes open data in terms of four characteristics (26):

• Accessibility: A wide range of users is permitted to access the data;
• Machine readability: The data can be processed automatically;
• Cost: Data can be accessed free or at negligible cost; and
• Rights: Limitations on the use, transformation, and distribution of data are minimal.

Figure 7 shows how data are open or closed based on these four characteristics. According to the McKinsey Global Institute,

Open data sets also are defined in relation to other types of data, especially big data. ‘Big data’ refers to data sets that are voluminous, diverse, and timely. Open data are often big data, but ‘small’ data sets can also be open. We view open and big data as distinct concepts. ‘Open’ describes how liquid and transferable data are, and ‘big’ describes size and complexity of data sets. The degree to which big data is liquid indicates whether or not the data are open (26, p. 4).

The New Zealand government described a five-level model for open data.

The World Wide Web Consortium (W3C) has developed a five star model to describe different characteristics of open data, and its usefulness for people wishing to reuse it. It is being used globally as a model for assessing data readiness for re-use. Applying this five star data model along with metadata standards will result in well understood and ‘mashable’ datasets (datasets easily joined together to create a new dataset). The three star level is considered the minimum standard for release of government’s
public data for re-use: non-proprietary, machine-readable, and accessible via the web, and licensed for reuse (27).

FIGURE 6 U.S. City Open Data Census: State of Open Transit Data (22).

Those five levels are as follows:

1. Data are visible and licensed for reuse but require considerable effort to reuse: 1 star, on the web with an open license.
2. Data are visible, licensed, and easy to reuse but not necessarily by all: 2 star, machine-readable data.
3. Data are visible and easy to reuse by all (not restricted to using specific software): 3 star, nonproprietary formats.
4. Data are visible, easy to use, and described in a standard fashion: 4 star, Resource Description Framework (RDF) standards.
5. Data are visible, easy to use, and described in a standard fashion, and meaning is clarified by being linked to a common definition: 5 star, linked RDF.

In describing how Bay Area Rapid Transit (BART) creates value with open transportation data, Timothy Moore, web services manager, uses the graphic in Figure 8 to “demonstrate the flow of information in an open data ecosystem. Information flows in a continuous path clockwise from Customers to BART to Data to Developers” (and back to customers) (28). Rojas further defines these entities: the transit agency is the “discloser”; the developers are the “intermediaries”; and the customers are the “end users” (11, pp. 28, 32, and 33).

FIGURE 8 Open data ecosystem (28).

Hans Nobbe suggests the data can be provided at different levels of service, while still being open and free (29):

• Bronze: data is supplied with a limited bandwidth. There is no guarantee that data is supplied in time. The capacity of the system is maximized.
• Silver: data is supplied with a high bandwidth and guaranteed delivery. The extent of this level is determined unilaterally but in consultation with end users. For this service, a fixed fee will be charged.
• Gold: data is supplied by a mutual agreement. The fee is dependent on the contents of the mutual agreement. Which may be higher than in silver (for example, because 24 * 7 service requested) or lower cost (for instance as the end user guarantees transmission of safety related information).

A more multimodal perspective on open data types was discussed by Lee et al. (30). “Organizations of the government are recently supporting the common use of public information by preparing plans about information opening. Therefore, if public information is open and OPEN-API technology is introduced, then it would help promote the use of public information and improve the quality of information.” The open data they included covered the following:

• Traffic flow information
• Traffic control information
• Incident information
• Closed-circuit television (CCTV) information
• Static and real-time transit information
• Bus arrival prediction information
• Bus transfer information
• Parking lot location operating information
• Parking lot guidance information

STANDARDS AND FORMATS USED FOR OPEN DATA

The use of standards in providing open transit data is critical and discussed in many pieces of literature. Kaufman identifies the basic standards and file formats for open transit data, as shown in Table 1 (23). Barbeau reports that successful open data formats are:

• GTFS (https://developers.google.com/transit/gtfs/reference): The General Transit Feed Specification, originally developed by Google, contains static schedule information for transit agencies, including stop locations, route geometries and stop times. “GTFS consists of a package of comma-delimited text files, each of which contains one aspect of the transit information and a set of rules on how to record it: six mandatory files (agency, stops, routes, trips, stop times, and calendar) and seven optional files (calendar dates, fare attributes, fare rules,
shapes, frequencies, transfers and feed info)” (8, p. 1). “The market success of GTFS has led to an unprecedented adoption rate by transit agencies as shown by total unlinked passenger trips for agencies with GTFS” (32, p. 1). For schedule data, GTFS adoption has substantially outpaced the Transit Communications Interface Protocols and Service Interface for Real Time standards in North America due to its relative ease of use for transit agencies to describe, implement and maintain data feeds (34, p. 3).
  GTFS has evolved over the years to meet expanding requirements. The group that collaborates on these changes is the GTFS Changes Group (https://groups.google.com/forum/#!forum/gtfs-changes). The rules governing this group and how it is managed can be found at https://groups.google.com/forum/#!searchin/gtfs-changes/welcome/gtfs-changes/C5dgsKGkpDA/kyxN1DCS-dQJ.
• GTFS-realtime (https://developers.google.com/transit/gtfs-realtime/): GTFS-realtime contains real-time information related to vehicle positions, service alerts, and trip updates (including delays and cancellations).
• SIRI (http://www.kizoom.com/standards/siri/): The Service Interface for Real Time Information is a real-time data standard predominant in Europe and making significant inroads into the U.S. market, notably at the Metropolitan Transportation Authority (MTA) in New York. Recent change proposals to the SIRI standard include the definition of a structure for SIRI web services. The SIRI standard includes a component for schedule data (see later) but is designed for real time and thus is more complex than some other standards (33).
• TCIP (http://www.aptatcip.com/): The Transit Communications Interface Protocols is an APTA standard with components that deal with passenger information and scheduling, as well as a host of other business divisions in transit. This standard was developed in the early stages of the transit information systems era; early studies encouraged its use for static and real-time data standardization. The standard’s complexity results from its attempt to account for all the various operational procedures and service types offered by all transit agencies (33). “Some of the data elements needed for TCIP have since become part of the GTFS specification, including items such as agency name, block, route identifiers, trip identifiers and other similar information” (33, p. 3).
• NextBus (http://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf): A number of transit agencies use the NextBus XML API to deliver real-time arrival information.

TABLE 1 COMMON DATA STANDARDS AND FILE FORMATS

Other formats reported in the literature that are being used for open transit data are as follows:

• Comma-separated values (CSV): a file format used as a portable representation of a database. Each line is one entry or record, and the fields in a record are separated by commas (34). Agencies using GTFS have committed to producing and maintaining their schedule data in standardized CSV tables to display their system on
Google Transit’s trip planner and, increasingly, opening these data to other third-party application developers (32, p. 1).
• Geo JavaScript Object Notation (GeoJSON) is a format for encoding a variety of geographic data structures. It is a geospatial data interchange format based on JavaScript Object Notation (JSON).
• Identification of Fixed Objects in Public Transport (IFOPT) defines a model and identification principles for the main fixed objects related to public access to public transport (e.g., stop points, stop areas, stations, connection links, entrances, etc.). IFOPT Standard builds on the TransModel Standard to define four related submodels (35).
• JavaScript Object Notation (JSON) is a data-interchange and text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages (http://json.org/).
• Network Exchange (NeTEx) is intended to be a general purpose format capable of exchanging timetables for rail, bus, coach, ferry, air, or any other mode of public transport. It includes full support for rail services, and can be used to exchange (36).
• Protocol Buffers: GTFS-realtime data exchange format based on Protocol Buffers. Protocol Buffers are a language- and platform-neutral mechanism for serializing structured data (think XML, but smaller, faster, and simpler). The data structure is defined in a GTFS-realtime.proto file, which then is used to generate source code to easily read and write your structured data from and to a variety of data streams, using a variety of languages (37).
• Resource Description Framework (RDF) is a standard model for data interchange on the web (38).
• Representational state transfer (REST) is a distributed system framework that uses web protocols and technologies (39).
• Really Simple Syndication or Rich Site Summary (RSS) is a format for delivering regularly changing web content (40).
• Simple Object Access Protocol (SOAP) is a method of transferring messages or small amounts of information over the Internet. SOAP messages are formatted in XML and are typically sent using HTTP (hypertext transfer protocol) (41).
• TransModel is the European Reference Data Model for Public Transport; it provides an abstract model of common public transport concepts and structures that can be used to build many different kinds of public transport information systems, including for timetabling, fares, operational management, real time data, and so forth (42).
• TransXChange (TxC) is the U.K. nationwide standard for exchanging bus schedules and related data (43). TxC provides a means to exchange bus routes and timetables between different computer systems, together with related operational data (44).
• Extensible Markup Language (XML) is more robust than GTFS in its abilities to represent large complex models, but the approach is more common in Europe and raises standardization challenges in the face of hyperflexibility (33).

Wong et al. said of GTFS:

The GTFS, first introduced in 2005, is the result of a project between Google and TriMet in Portland to create a transit trip-planner using the Google Maps web application. Because of the collaborative approach to its development, the specification was designed specifically to be simple for agencies to create, easy for programmers to access and comprehensive enough to describe an intricate transit system. GTFS identifies a series of comma separated files which together describe the stops, trips, routes and fare information about an agency’s service. Google opened the feed for general use in mid-2007 and it propagated widely as agencies translated their transit schedules into the format. The feed is the most used standard for static transit data exchange in the United States today (33, pp. 2–3).

As mentioned, according to data from the GTFS Data Exchange, as of March 11, 2014, data are available for 726 worldwide transit agencies (http://www.gtfs-data-exchange.com/agencies), and 239 agencies’ feeds are available on http://code.google.com/p/googletransitdatafeed/wiki/PublicFeeds, a guide to GTFS data that was written by Wong (45). A GTFS tool that can be used by small transit agencies was prepared by Williams and Sherrod (46).

Pioneering Open Data Standards: The GTFS Story describes TriMet’s experience in helping develop GTFS and the effects that it has had on the transit industry (47, pp. 126–128). In summary, this story recounts the combined efforts of TriMet and Google in developing what would become a de facto standard for anyone to use to conduct transit trip planning anywhere in the world. The impact of GTFS was far reaching; 8 months after Google Transit was launched, five more transit agencies were added. “Within a year, Google Transit launched with fourteen more transit agencies in the United States and expanded internationally to Japan” (47, p. 128).

Those agencies that release their data through GTFS are required to provide the following data, at a minimum (48):

• Name or identification of the transit agency(ies) providing the data;
• Individual locations where vehicles pick up or drop off passengers;
• Routes, which are defined as groups of trips that are displayed to riders as single services;
• Trips for each route, which are sequences of two or more stops that occur at specific times;
• Times that a vehicle arrives at and departs from individual stops for each trip; and
• When service starts and ends, as well as days of the week when service is available.
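
The minimum GTFS content listed above corresponds to the six mandatory comma-delimited files named earlier in this chapter (agency, stops, routes, trips, stop times, and calendar). The sketch below is illustrative only and is not part of the GTFS specification or any agency tool. It assumes a feed is delivered as a zip archive using the file and column names from the GTFS reference (for example, stops.txt and stop_id); the sample agency, stop, and route values are made up, and the helper names (missing_required_files, read_table, routes_serving_stop, build_sample_feed) are hypothetical. It checks which mandatory files are absent and answers a narrow question of the kind quoted in this chapter ("Stop ID 10 is served by Route 5").

```python
import csv
import io
import zipfile

# The six mandatory files cited in the report, by their GTFS file names.
REQUIRED_FILES = (
    "agency.txt", "stops.txt", "routes.txt",
    "trips.txt", "stop_times.txt", "calendar.txt",
)


def missing_required_files(feed: zipfile.ZipFile) -> list[str]:
    """Return the mandatory GTFS files that are absent from the archive."""
    present = set(feed.namelist())
    return [name for name in REQUIRED_FILES if name not in present]


def read_table(feed: zipfile.ZipFile, name: str) -> list[dict[str, str]]:
    """Parse one comma-delimited GTFS file into a list of row dictionaries."""
    with feed.open(name) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
        return list(csv.DictReader(text))


def routes_serving_stop(feed: zipfile.ZipFile, stop_id: str) -> list[str]:
    """Join stop_times to trips to find the route IDs that call at a stop."""
    trip_ids = {row["trip_id"] for row in read_table(feed, "stop_times.txt")
                if row["stop_id"] == stop_id}
    return sorted({row["route_id"] for row in read_table(feed, "trips.txt")
                   if row["trip_id"] in trip_ids})


def build_sample_feed() -> zipfile.ZipFile:
    """Build a tiny in-memory feed (made-up data) that omits calendar.txt."""
    tables = {
        "agency.txt": "agency_name,agency_url,agency_timezone\n"
                      "Demo Transit,http://example.com,America/New_York\n",
        "stops.txt": "stop_id,stop_name,stop_lat,stop_lon\n"
                     "10,Main St,42.36,-71.06\n",
        "routes.txt": "route_id,route_short_name,route_type\n5,5,3\n",
        "trips.txt": "route_id,service_id,trip_id\n5,WKDY,T1\n",
        "stop_times.txt": "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
                          "T1,08:00:00,08:00:00,10,1\n",
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, body in tables.items():
            archive.writestr(name, body)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


feed = build_sample_feed()
missing = missing_required_files(feed)    # the sample feed lacks calendar.txt
stops = read_table(feed, "stops.txt")     # one row per stop
routes = routes_serving_stop(feed, "10")  # route IDs serving stop 10
```

A real feed would normally be opened with zipfile.ZipFile on a downloaded file rather than built in memory, and a production check would also validate required columns and cross-file references, which this sketch does not attempt.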
Optional data types that can be provided through GTFS are as follows:

• Exceptions for when service starts and ends, and days of the week when service is available;
• Fare information for a transit organization’s routes;
• Rules for applying fare information for a transit organization’s routes;
• Rules for drawing lines on a map to represent a transit organization’s routes;
• Headway (time between trips) for routes with variable frequency of service;
• Rules for making connections at transfer points between routes; and
• Additional information about the feed itself, including publisher, version, and expiration information.

Use of GTFS-realtime dictates that the following data, in addition to what is provided through GTFS (static information), are released (49):

• Real-time update on the progress of a vehicle along a trip (this is required information). This can specify a trip that proceeds along the schedule; a trip that proceeds along a route but has no fixed schedule; and a trip that has been added or removed with regard to schedule;
• Timing information for a single predicted event, either arrival or departure (this is optional). Timing consists of delay and/or estimated time, and uncertainty;
• Real-time update for arrival and/or departure events for a given stop on a trip (optional);
• Real-time positioning information for a given vehicle (optional);
• An alert, indicating some sort of incident in the public transit network (including cause and effect) (optional);
• Geographic position of a vehicle (required); and
• Identification information for the vehicle performing the trip (optional).

The SIRI standard (50) contains the following data elements (D.A. Laidig, Systems Engineering Manager, Metropolitan Transportation Authority, personal communication, June 18, 2014):

• SIRI-PT (Production Timetable): Exchanges planned timetables
• SIRI-ET (Estimated Timetable): Exchanges real-time updates to timetables
• SIRI-ST (Stop Timetable): Provides timetable information about stop departures and arrivals
• SIRI-SM (Stop Monitoring): Provides real-time information about stop departures and arrivals
• SIRI-VM (Vehicle Monitoring): Provides real-time information about vehicle movements
• SIRI-CT (Connection Timetable): Provides timetabled information about feeder and distributor arrivals and departures at a connection point
• SIRI-CM (Connection Monitoring): Provides real-time information about feeder and distributor arrivals and departures at a connection point. Can be used to support “connection protection”
• SIRI-GM (General Message): Exchanges general information messages between participants
• SIRI-FM (Facility Monitoring): Provides real-time information about facilities
• SIRI-SX (Situation Exchange): Provides real-time information about incidents.

MTA and the Utah Transit Authority (UTA) use open SIRI feeds. Of the list of SIRI messages, MTA Bus Time only supports VM, SM, and SX. UTA supports SM and VM (D.A. Laidig, Systems Engineering Manager, Metropolitan Transportation Authority, personal communication, June 18, 2014).

APTA conducted a survey of member agencies in 2013 to determine how transit agencies are providing static and real-time information to customers (51). The survey results show that GTFS was the most common format used by agencies, followed by proprietary formats from companies that provide scheduling software.

Four out of ten agencies use a variety of other formats, and nearly two out of ten are using an internal agency format. Only two out of the 75 respondents said they did not use formats and standards. These tools and standards help agencies organize their routes and schedules internally, but they can also be used to create value for customers. The data organized by these standards can drive tools that customers can use to plan a trip, or they can create data streams that feed information to apps so customers can access this information on the go (51, p. 8).

Figure 9 shows the standards used by respondents to provide static information. The 25 respondent agencies that provided more than 25 million trips in fiscal year (FY) 2010 are classified as large agencies, and those providing fewer than 25 million trips are smaller agencies.

According to APTA,

Looking at the split between large and smaller agencies, large agencies were more likely to use most of the listed formats, because those agencies were more likely to use multiple formats than the smaller agencies. A big majority, 88%, of large agencies used multiple formats for static data. Only 58% of smaller agencies did so. Large agencies were much more likely to use tools from [technology companies] than smaller agencies. Smaller agencies were more likely to use a format in the ‘other’ category; these agencies used a variety of platforms provided by smaller software companies (51, p. 8).

The use of both GTFS and SIRI is exemplified in OneBusAway, an application whose “primary function is to share real-time public transit information with riders across a variety of interfaces” (52). Iley [in OneBusAway Application Suite (53)] describes OneBusAway as

an open-source transit information software system, including several mobile apps, that was originally developed at the University of
FIGURE 9 Results of APTA survey: Standards used to provide static data (51, p. 8).
Washington and deployed in the Puget Sound area of Washington state. OneBusAway leverages the GTFS data format for schedule transit data, but the original Puget Sound deployment did not use a standardized interface for sharing real-time transit data with mobile apps. As part of their real-time Bus Time API pilot project in early 2011, Metropolitan Transportation Authority (MTA) in New York leveraged the OneBusAway software to build their own transit information system. MTA implemented a modified version of SIRI in their OneBusAway server to share their data with mobile app developers. In 2012, MTA [moved] on to the second step of deploying the same technology to other NYC boroughs.
In 2014, MTA completed its full five-borough rollout. Currently, OneBusAway provides real-time transit information in the Atlanta, New York City, Puget Sound, and Tampa regions.
Hazarika (54) describes the characteristics of two standards, SIRI and TransXChange (TxC), that are used in the United Kingdom to integrate various sources of public transit data. This is based on a survey conducted of “Local Authorities (LA’s), Passenger Transport Executives (PTE’s) and bus operators about their understanding, experiences and investment plans for these standards.” This reference describes “the future growth/usage of the key features of TxC and SIRI against their customer’s industry segments” (54, p. 4), and shows the key success factors (KSF), strengths, and weaknesses of SIRI and TxC (54, p. 33).
STANDARDS FOR OPEN PARATRANSIT DATA
One issue associated with using GTFS is how to portray paratransit services within this format because it was developed primarily for fixed-route transit service. Chambers, with Ride Connection in Portland, Oregon, addresses this issue in his presentation (55). He answers “yes” to his statement on page 7: “If a transit service can’t be described in the GTFS, does it exist?”
Further, two other cases show how to use GTFS to describe paratransit or demand-response service. First, Klopp et al. (56) describe, in their Nairobi study,
how paratransit networks can be mapped along with the collection of important transit data using off-the-shelf mobile phone technology. We also demonstrate the utility of the GTFS format when adapted for paratransit data collection. We found that GTFS allows for the incorporation of rich and useful metadata in a structured way. By using the GTFS data with Open Street Maps or Google maps, we created some of the first comprehensive visualizations of the Nairobi matatu paratransit system for the public and planners. By trying to fit aspects of the paratransit system into a GTFS format, however, it also became more clear where the fit is often hard to make because the standard was developed for planned formal transport system with fixed stops and schedules (even if they are not always strictly adhered to) and not the demand responsive and flexible paratransit system. Overall, it appears that modifications need to be made to GTFS to account for key differences between paratransit and more formal, planned systems.
Eros et al. (8, pp. 13 and 14) state that
many cities across Latin America, Africa and Asia share this predicament; research indicates that flexible transport services constitute more than 90% of transit trips in cities such as Algiers. In Mexico a work-around was found by creating a variant to the GTFS feed based on defining fixed stops at regular intervals combined with the possibility for users to assess travel times and connections from any point between stations. Headway estimates, based on existing knowledge (including vehicle counts and speed data) substituted for schedules. Teams working in two cities, Manila and Dhaka, also had to deal with this challenge. Like Mexico City, Manila chose to avoid schedules, instead providing headway estimates for their jeepneys. In terms of stop locations, the Dhaka team included stop location based on where the bus stopped during the data collection ride. Manila’s stops were interpolated every 500 meters along the route.
This discussion of standards for incorporating paratransit data directly relates to the use of GTFS for integrated trip planning. York Region Transit outside Toronto, Ontario, “is saving significant money having customers use fixed route for a portion of their paratransit trip” (information from Rajeev Roy, Manager, Transit Management Systems, The Regional Municipality of York-Transportation and Community Planning).
APPLICATION PROGRAMMING INTERFACES
Before leaving the standards discussion, it is important to draw the distinction between standards and an API. According to Open Data Handbook Documentation, an API is “A way computer programs talk to one another. [It c]an be understood in terms of how a programmer sends instructions between programs” (57).
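As a minimal sketch of this definition, the Python example below shows one program preparing an instruction (a request URL) and interpreting another program's reply. The endpoint `https://api.example-transit.org/arrivals`, the `stop_id` and `max_results` parameters, and the JSON field names are hypothetical and are not taken from any agency API discussed in this report; the reply is a hard-coded sample so that the sketch runs without network access.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
BASE_URL = "https://api.example-transit.org/arrivals"


def build_request_url(stop_id, max_results=3):
    """Encode the caller's instruction (which stop, how many results) as a URL."""
    query = urlencode({"stop_id": stop_id, "max_results": max_results})
    return f"{BASE_URL}?{query}"


def parse_arrivals(response_text):
    """Turn the assumed JSON reply into a list of (route, minutes_away) pairs."""
    payload = json.loads(response_text)
    return [(a["route"], a["minutes_away"]) for a in payload["arrivals"]]


# Hard-coded sample reply standing in for what a server might return.
SAMPLE_RESPONSE = json.dumps(
    {
        "stop_id": "1234",
        "arrivals": [
            {"route": "10", "minutes_away": 4},
            {"route": "10", "minutes_away": 17},
        ],
    }
)

if __name__ == "__main__":
    print(build_request_url("1234"))
    for route, minutes in parse_arrivals(SAMPLE_RESPONSE):
        print(f"Route {route}: {minutes} min")
```

A standard such as GTFS-realtime or SIRI fixes what the reply looks like, whereas the API is the agreed way of asking for it; in this sketch the two roles correspond to `parse_arrivals` and `build_request_url`, respectively.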
FIGURE 10 Results of APTA survey: APIs used to provide real-time information (51, p. 5).
As discussed, APTA’s survey examined the use of APIs.
Two-thirds of those agencies with real-time information provide an interface so app developers can utilize that information independent from the agency. Providing an interface for third-party applications allows developers to find innovative ways to provide this information to transit customers. Just more than one-third [37%] of all agencies provide an application programming interface (API) for third-party developers. APIs allow third-party applications to read and display data provided by transit agencies. The most common API used is NextBus, followed by GTFS-realtime (51, p. 1).
Real-time information is provided using the APIs, as shown in Figure 10.
LEGAL AND LICENSING ISSUES
In the Intelligent Transportation Systems (ITS) community, discussions regarding open data started 10 years ago at the 2004 ITS World Congress with “An Open Platform for Telematics” (58). Although the paper describes the technical aspects of telematics (defined as “the wireless provision of information and services to vehicles”), it raises issues about the functions and responsibilities among vehicle manufacturers, wireless carriers, information service providers, and call centers. These issues lead the reader to consider the legal and licensing ramifications of the development and implementation of the open platform.
When transit agencies began to consider providing their data openly, some became concerned about legal issues, including “misrepresentation of the agency, logo usage and brand identity” (33). In addition, agencies releasing data often are concerned about lawsuits and bad press (59). Other risks cited in the literature include “legal exposure due to the lack of accuracy of data, loss of advertising revenue on the agency homepage (if Internet traffic is directed to other sites, such as Google Transit, that provide transit services), and loss of control of dissemination of transit service information” (60, pp. 5–6).
For example, of the 50 reasons public agencies do not release data, there are several related to legal, privacy, and misuse issues (61), including the following.
• We can’t legally do that.
• It will be misunderstood/misused.
• If we share our data/code, we’ll be hacked.
• It might be presented in ways that result in people misunderstanding it. The media will misreport.
• People don’t understand my data. It’s complex/magical/for experts.
• The data source is a mess.
• The data might have errors or mistakes and could misinform the public.
• Privacy.
As shown in the discussion of the survey results, only one survey respondent (of 67) has experienced a legal issue arising from providing open transit data.
McCann addresses a few of these legal concerns (61). For example, the concern that “people might sue us” can be addressed by making “a policy that balances the public interest in accessing the data with the privacy concerns and stick[ing] to it.” Another example is “we don’t think it would be good PR to open this.” This can be addressed by being “more proactive about good PR. Explain what the data means and why you are opening it.”
Several open data organizations provide guidance regarding legal issues and licensing samples. For example, Open Data Commons presents sample legal tools and licenses in “Open Data Commons Public Domain Dedication and License (PDDL)” on http://opendatacommons.org/licenses/pddl/. The Open Knowledge Foundation provides numerous legal and licensing resources on http://opendefinition.org/licenses/.
In terms of open transit data, many agencies have developer license agreements and terms of use on their websites. Several examples are shown in Appendix E.
Timothy Moore, with BART, suggests that a simple license may be the most appropriate for open transit data (62). BART’s license has the following characteristics:
• Short + sweet: 258 words
• We reserve the trademark
• Data provided ‘as is’ and ‘as available’
• You don’t have to sign anything (62).
Antrim and Barbeau (60), and The Finnish Transport Agency (63) report that open transit data agreements generally contain the following statements:
• The agency reserves the rights to its logo and all trademarks. These marks are an indicator used for official information from the agency only.
• The data are provided without warranties.
• No availability guarantees are expressed or implied.
• The agency retains full rights to the data (60).
• The license is free of charge and is made between the [agency] and a licensee.
• The licensee may freely copy and deliver; modify and use (e.g., for commercial purposes); and combine and use as part of an application or service.
• It is not mandatory, but we highly recommend, that the name of the licensor (e.g., Finnish Transport Agency) is shown (63).
APPLICATIONS
The literature has numerous examples of open data applications. Examples of applications related to customer information that are driven by open data are rapidly evolving, and are available on agency websites or can be found on developer sites. Examples are provided in Appendix F.
An application that provides real-time information for MTA buses in New York City is called Bus Time. The application was piloted with route B63 on February 1, 2011, and was fully deployed citywide in 2014. Bus Time displays bus location and distance from stops, not time-based arrival predictions, on mobile platforms (64, 65). Bus Time provides open data between the MTA and software developers and customers, is based on open standards [between hardware components, between bus and server, and between server and other MTA/New York City Transit (NYCT) systems], and uses open source software (software code [OneBusAway] and APIs). Figure 11 shows how Bus Time works. Bus Time has had a positive impact on developers, with several apps developed: interactive phone application, arrival time predictions, and smartphone apps.
Samtrafiken, a nonprofit organization owned by 34 public transport operators and authorities in Sweden, has been a champion in open transit data. Their innovation manager, Elias Arnestrand, recognized that when developers started screen-scraping public transport data from various websites to create apps, agencies that “own” that data needed to consider releasing it. In addition, he realized that agencies could not keep up with the number of platforms that mobile devices were using. Further, apps that used screen-scraping technology could bring an agency’s information systems to a halt because of the volume of data requests.
[In] 2009 [we] created Trafiklab [http://www.trafiklab.se/]. It was formulated as an initiative to start to work with open data and open APIs. We wanted to make it simple to access this data
and even make it fun for our industry and third party developers to discuss these issues. [T]his industry initiative [was] all together on one site instead of each public transport entity creating their own channel, data sources, set of agreements, and different types of APIs. This would have created a huge burden on third parties that wanted to access the complete set of public transportation data and services in Sweden. We looked at the external drivers that motivate developers [and recognized] that developers were driven by finding challenge, the satisfaction of getting their app to work, and the ability to showcase their work to the greater public. [W]e focused on [these drivers] to help get this initiative successfully off the ground.
Today, our open APIs are a very important part of our strategy in providing customers with relevant public transport information and services. For example, for public transportation in Stockholm, more than 50% of the requests come from services created by third parties. APIs are a marketing and distribution channel for our public transportation information and services. We’ve realized that APIs are the cheapest and fastest way to build applications. And most importantly, APIs let third parties extend our products and services (66).
Another application that is based on open data is OpenTripPlanner (OTP), which was developed by OpenPlans. OTP “provides a robust multi-modal, multi-agency trip planner. This tool allows for multi-modal travel as one of the trip planning options for those looking to travel to transit via walking or bike” (67). This type of application greatly facilitates multimodal/multiagency coordination. It also uses crowdsourcing; OTP uses OpenStreetMap (OSM), a crowd-source open data set designed for routing. TriMet’s customization of OTP “utilizes all open data including OpenStreetMap, GTFS, and the USGS National Elevation Dataset” (68, p. 10).
As mentioned, in 2012, APTA conducted a survey of its membership regarding customer information. A portion of this survey related to open transit data (51).
Overall, a large percentage of agencies (80%) are providing static data such as schedules, routes, and fares to customers in some fashion. Around two-thirds participate in Google Transit, and a similar number make their data available to third-party developers. Just over six in ten agencies said that they make their static data available to third party apps. Large agencies were more likely to encourage third-party activities; 88% of those agencies make their data available (51, pp. 10–11).
Figure 12 shows the percentage of APTA survey respondents that provide static data to third-party applications. Overall, just more than 40% of APTA survey respondents had developers using their open static data; 68% of large agencies reported developers using their static data. “Around one-quarter of agencies surveyed indicated that app developers are using their real-time data. Forty-four percent of large agencies indicated that developers use their data and fourteen percent of smaller agencies indicated that this is the case [see Figure 13]” (51, p. 5).
Antrim and Barbeau describe some of the types of applications that use open transit data (60):
• Trip planning and maps: applications that assist a transit customer in planning a trip from one location to another using public transportation. Examples include Google Maps, OpenTripPlanner, Bing Maps, HopStop, MapQuest, and rome2rio.
• Ridesharing: applications that assist people in connecting with potential ridesharing matches. Examples include Parkio and Carma (formerly known as Avego).
• Timetable creation: applications that create a printed list of the agency’s schedule in a timetable format. An example is TimeTable Publisher.
• Mobile applications: applications for mobile devices that provide transit information. Examples include Google Maps, Transit App for iOS 6 and beyond, Nokia Transport, RouteShout, and Tiramisu.
• Data visualization: applications that provide graphic visualizations of transit routes, stops, and schedule data. Examples include Walk Score, Apartment Search, and Mapnificent.
• Accessibility: applications that assist transit riders with disabilities in using public transportation. Examples include Sendero Group BrailleNote GPS and Travel Assistant Device (TAD).
• Planning analysis: applications that assist transit professionals in assessing the current or planned transit network. Examples include:
  – OpenTripPlanner: Analyst Extension
  – Graphserver
  – Regional Public Transportation GIS Architecture and Data Model
  – Transit Boardings Estimation and Simulation Tool (TBEST)
  – TransCAD 6.0
  – GTFS-based Planning and Research
• Interactive Voice Response (IVR): applications that provide transit information over the phone by means of an automated speech recognition system.
• Real-time transit information: applications that use GTFS data along with a real-time information source to provide estimated arrival information to transit riders. Examples include OneBusAway, NextBus, and TransLōc.
As mentioned in TCRP Synthesis 104 (69), open data are being used to power electronic signs that display real-time information (see Figure 14 for an example). The digital signs originally developed by Mobility Labs have become commercially available. That synthesis reported,
The widening world of open data availability is generating a deluge of mobile apps designed to deliver transit information to people on the move. But there is still an active (and growing) market for static displays, linked, among other factors in the United States, to the increasing importance of transit-accessible location in real estate markets. The system is for use by building owners (typically in lobbies) in any large metropolitan area providing open transit data. It draws together data streams from distinct agencies for presentation on large screens in multi-occupied residential and commercial properties.
In Europe, another vendor provides real-time transport displays (see Figure 15) in private homes and commercial properties (70). Another example of information displays driven by open transit data is shown in Figure 16 (70).
Wong described the use of GTFS to conduct several transit planning analyses (18). Table 2 “summarizes the fixed-route transit service measures from the [Transit Capacity and Quality of Service Manual] TCQSM and identifies those where GTFS feeds can be used as a data source. Two of the six measures can be calculated exclusively with GTFS feeds and three others can be calculated using GTFS feeds with supplemental data” (18, pp. 3–4).
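Figure 17 later in this section plots stop-route level headways derived from a GTFS feed. As a hedged illustration of how a measure of that kind could be computed from GTFS alone, the Python sketch below derives headways at one stop for one route from rows shaped like the GTFS `stop_times.txt` and `trips.txt` files. The field names (`trip_id`, `route_id`, `stop_id`, `departure_time`) follow the GTFS specification; the sample rows, the function names, and the decision to ignore service calendars are assumptions made for illustration and do not reproduce Wong's method.

```python
def to_seconds(hhmmss):
    """Convert a GTFS time such as '25:10:00' (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def headways_minutes(stop_times, trips, route_id, stop_id):
    """Gaps, in minutes, between successive departures of one route at one stop."""
    # Trips belonging to the route of interest (trips.txt links trips to routes).
    route_trips = {t["trip_id"] for t in trips if t["route_id"] == route_id}
    # Departure times of those trips at the stop, in chronological order.
    departures = sorted(
        to_seconds(st["departure_time"])
        for st in stop_times
        if st["stop_id"] == stop_id and st["trip_id"] in route_trips
    )
    return [(later - earlier) / 60 for earlier, later in zip(departures, departures[1:])]


# Invented sample rows; a real feed would be read from the CSV files.
TRIPS = [
    {"trip_id": "t1", "route_id": "R1"},
    {"trip_id": "t2", "route_id": "R1"},
    {"trip_id": "t3", "route_id": "R1"},
    {"trip_id": "t4", "route_id": "R2"},
]
STOP_TIMES = [
    {"trip_id": "t1", "stop_id": "S1", "departure_time": "07:00:00"},
    {"trip_id": "t2", "stop_id": "S1", "departure_time": "07:15:00"},
    {"trip_id": "t3", "stop_id": "S1", "departure_time": "07:45:00"},
    {"trip_id": "t4", "stop_id": "S1", "departure_time": "07:05:00"},
]

if __name__ == "__main__":
    print(headways_minutes(STOP_TIMES, TRIPS, "R1", "S1"))
```

A production analysis would also need to filter trips by service day using `calendar.txt` and `calendar_dates.txt`, which this sketch deliberately leaves out.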
TABLE 2
DATA REQUIREMENTS IN TCQSM ANALYSES
FIGURE 17 Distribution of stop-route level daily headways for the SEPTA bus system (18, p. 9).
FIGURE 18 Length and number of stops for SEPTA bus routes (18, p. 9).
Mapping and data visualizations are effective tools for communicating the robust data and information within the GTFS by providing clarity to service levels which the data doesn’t naturally produce. This can help articulate the impact of an agency’s service changes and service cuts (75).
Open transit (and other types of) data allow many different types of visualizations that display data graphically so that they can be easily interpreted. One example is showing the accessibility of Welsh schools by public transport using a program called Mapumental (76).
In 2013, there was an interest in showing the shortest time using public transit to get to any secondary school in Wales from any point in the country. Figure 20 shows this visualization.
Time bands are in 15-minute increments, with red areas being those where schools are accessible within a 15-minute journey (the centres of the red dots therefore also represent the positions of the schools). Purple areas are those where journey time is between 1.75 and 2 hours, and the colours in between run in the order you see bottom right of the map. White areas (much of which are mountainous and sparsely populated) are outside the two-hour transit time (77).
There are several visualizations using open data from the Washington, D.C., area’s Capital Bikeshare (78) shown in Figures 21 and 22.
Catalá et al. (75, pp. 32–41) provide several visualizations using GTFS data, including a Marey graph that shows the distribution of PATH trips and vehicles per hour at a particular stop (see Figure 23). The report also describes how GTFS data can be used to calculate service and performance evaluation metrics. Finally, the report describes opportunities to combine GTFS data with performance-related information (e.g., APCs).
There are many other articles and reports about using open data to conduct analysis and provide customer information (79–94), including an article about a visualization (see Figure 24) that shows access to jobs on public transit. “Specifically, it tells us how many jobs are accessible within 30 minutes (using the key at right) from each location by public transit, during the 7–9 a.m. peak morning window. The darker green areas have the greatest accessibility to jobs; the lighter green areas have the least. The red lines show transit routes” (80). Further, an initiative funded by the Virginia Department of Rail and Public Transportation is based on three factors: (1) open transport data standards such as GTFS and OpenStreetMap; (2) multimodal trip planning engines such as OpenTripPlanner; and (3) web-based visualization tools such as the D3 (Data-Driven Documents; e.g., https://github.com/mbostock/d3/wiki/Gallery) library (82).
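The accessibility measure quoted above (the number of jobs reachable within 30 minutes by public transit) reduces to a thresholded sum once travel times are known. The Python sketch below assumes that zone-to-zone travel times have already been produced by a trip planning engine; the zone names, job counts, and travel times are invented for illustration and are not taken from the cited article.

```python
def jobs_within(travel_minutes, jobs_by_zone, origin, threshold=30):
    """Sum the jobs in every zone reachable from `origin` within `threshold` minutes."""
    return sum(
        jobs_by_zone[destination]
        for destination, minutes in travel_minutes[origin].items()
        if minutes <= threshold
    )


# Invented example data: jobs located in each zone.
JOBS_BY_ZONE = {"A": 1200, "B": 5000, "C": 800}

# Invented transit travel times in minutes, origin -> destination.
TRAVEL_MINUTES = {
    "A": {"A": 0, "B": 25, "C": 40},
    "B": {"A": 25, "B": 0, "C": 28},
    "C": {"A": 40, "B": 28, "C": 0},
}

if __name__ == "__main__":
    for zone in TRAVEL_MINUTES:
        print(zone, jobs_within(TRAVEL_MINUTES, JOBS_BY_ZONE, zone))
```

Mapping each origin's total to a color scale would give a shaded map of the kind the quotation describes.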
FIGURE 20 Transit times by public transport to secondary schools in Wales, with an arrival time of 9:00 a.m. (77).
FIGURE 22 Three-dimensional visualization of trip history Capital Bikeshare data.
• Converting data to mainstream formats (e.g., GTFS), which may include an additional cost to purchase proprietary scheduling software (which has modules that
automatically generate GTFS feeds) but allows for minimal human input which minimizes errors in data translation to GTFS;
• Web service for hosting data;
• Personnel time to update and maintain data as needed; and
• Personnel time to liaise with data users.
One example of costs is provided in Wong et al. (33).
Staff at BART discussed creating the original GTFS feed as an internal staff project, originally in less than one day. Since then, the agency reported spending less than $3,500 over the lifetime of a software product that was commissioned to specifically output GTFS from their existing scheduling database. This information is consistent with consultant estimates from the literature suggesting a cost for small agencies on the order of $3,000–5,000 based on simple networks with limited stops. Several free tools exist for agency use to generate and edit GTFS feeds including a project funded by the Transportation Research Board’s IDEA program (33).
However, the benefits of open transit data are cited in several pieces of literature. Open transit data are often mentioned as having some of the most significant benefits because of the relationship of such data to riders. For example, Maltby (95) states that
one of the most successful and prolific areas where open data has gone into mass public use has been through the multitude of transport information apps that allow citizens to better plan
their rail, tube or bus travel, find a parking space, or evade road works, or which through Google Now even help predict planning for journeys you are about to take. The Deloitte report for Stephan Shakespeare’s independent review of Public Sector Information found there had been more than 4 million downloads of apps using transport data in London alone. Beyond the services delivered directly to the public, companies such as Placr are developing a thriving business aggregating transport data from a variety of sources and providing this as a service for app developers and organisations including Transport for London.
TCRP Legal Research Digest 37 (96), Traveline Information Ltd (TIL) (a partnership of transport operators and local authorities formed to provide impartial and comprehensive information about public transport in Scotland, England, and Wales) (97), and Lee (23, p. 14), identified the following benefits of open data:
• Perception that an agency is more “open” and transparent than other agencies that do not share their data;
• “Halo effect” of being involved in innovative third-party platforms and uses;
• Partnerships between agencies and their local developer communities;
• Higher quality (including accuracy) information for customers, resulting in improved customer service and experience, and potentially increased ridership;
• No customer confusion about the origin or location of “official” (agency) information;
• No additional customer services complaints;
• Third parties have developed applications that an agency:
  – may not have thought of;
  – did not know that customers wanted; and
  – could not afford to procure or develop;
• Time saved by agencies in developing customized applications;
• Crowdsourcing of data quality checking;
• Better understanding of the demand for data and customers’ needs; and
• Better understanding of what services can be made available commercially, and those that cannot and need to be funded to ensure inclusion and accessibility.
The Polis Position Paper (98) reports that
in addition to transparency, open data offers local authorities an opportunity to meet other local transport policies, notably to promote sustainable travel choices, by enabling them to:
• Relook at their own business and improve internal processes, notably (i) seeking to understand which data a public body holds/gathers, (ii) thinking strategically about the value of data, (iii) improving the quality of data through feedback from the developer community.
• Improve the quality of service to users by harnessing the creativity of the apps developer community to produce innovative services based on one or more data feeds.
• Reduce the cost of service provision, by allowing the local authority to focus on data acquisition and management while the private sector takes on some of the burden of disseminating information to users.
• Promote economic development, especially for local information services providers (see previous point).
TCRP Synthesis 91 discusses some of the benefits just being realized by open transit data (1, pp. 16–17).
. . . Jay Walder, the New York Metropolitan Transportation Authority (MTA) Chief Executive Officer, stated that he hoped ‘that the tools that might be developed using the agency’s data would help transform the city’s transit system into an even more useful resource for residents much faster and cheaper than it could do so itself.’ Further, at the end of 2009, the Massachusetts Department of Transportation (MassDOT) launched the first phase of its open data initiative by releasing real-time information for five bus routes. The data released to software developers included real-time GPS locations of buses and arrival countdown information for every bus route. Within just one hour of releasing these data, a developer built an application showing real-time bus positions. Within two months, more than a dozen applications had been created including websites, smart phone applications, SMS text message services, and 617 phone numbers. All of these applications were created at no cost to MassDOT or the Massachusetts Bay Transportation Authority (MBTA).
One important set of benefits for the transit industry results from MTA’s Bus Time project, as shown in Table 3. Transit agencies have traditionally procured proprietary solutions rather than open solutions, which is what Bus Time is based on.
McHugh (47, p. 130) notes that
at TriMet, our process is automated, so there is very little overhead. TriMet has four major service changes a year, in addition to minor changes and adjustments in between. We may update and publish our GTFS data as frequently as twice a month. TriMet has not incurred any direct costs for this specific project, except resource time, which is a very small investment in comparison to the returns.
Now that agencies have made GTFS freely available as open data, hundreds of applications have spawned worldwide. We found that by making our data easily and openly accessible, developers are getting very creative and expanding its use. This is not only beneficial because it expands the number of product offerings available, but it can also have emergent economic benefits for developers and the communities that they live in. In addition, because the standard allows for interoperability between cities, applications built to serve one city can be readily deployed to serve other cities for a much lower cost and effort than if the data wasn’t standardized.
TABLE 3
PROPRIETARY VERSUS OPEN SOLUTION
ENGAGING EXISTING AND POTENTIAL DATA USERS AND REUSERS
Lewis (99) describes a number of engagement strategies being used by transit agencies, as follows:
• MTA sponsors contests and hackathons, such as the MTA App Quest, which was their second challenge;
• SEPTA has sponsored three hackathons; and
• Madison Metro has developed “strong relationships with the local software community and universities.”
. . . Our advice would be to do your best to work with the developers ahead of time. Tell them your concerns and make sure to impress upon them the needs of your riders and the need for it to be accurate.
After all, third-party developers aren’t public transit employees or vendors, so system employees need excellent communication and a shared understanding of the goals and considerations in public transit if the project is to succeed. One of the students who developed a Madison Metro app continues to maintain it even though he’s been hired by Google and moved to California, Rusch said. ‘It’s about fostering relationships with these people because they don’t work for you directly and you’re not purchasing a service from them,’ he said.
As mentioned earlier in the report, APTA’s survey regarding real-time information reported that app contests were held by 8% of the APTA survey respondents. Twenty percent of larger agencies used this engagement technique, as shown in Figure 26.
BART suggests that meeting with data users and reusers encourages discussing their needs and generating ideas. BART has used 10 engagement techniques (62):
• Meetups
• One-on-ones
• Google groups
• RSS feeds
• E-mail lists
• Developer challenges
• Hack days
• Media events
• Transit camps
• “Find a politician willing to get in front of a camera!”
is held, and the winning apps may not be maintainable. For example, in 2012, the winning entry in an MTA app contest was later purchased by Apple. Thus, offering a reward to maintain an app for at least a certain period of time may be helpful.
An example of open data creating a positive impact is the District of Columbia DC Circulator Dashboard (circulatordashboard.dc.gov) (104), which was created to:

• Advance government transparency and accountability
• Facilitate data-driven decision-making
• Improve information availability to the community
• Engage the public on operations, routing, and safety
• Identify areas needing improvement
• Highlight success for replication.

The cost to develop the dashboard was minimal ($5,000 and two part-time staff over 6 months) and users need only a basic Internet browser and free plug-in to access it.

Other impacts are discussed by Eros et al., including the potential to

. . . lower the barrier to innovation and enhance cross-fertilization of tools, approaches and ideas. In Mexico City, for example, almost immediately after the release of the GTFS feed, a number of apps made use of these data to provide value to users. The suite of apps is already growing; all have been created by American, Canadian, and Israeli developers as transfers of previously existing apps into the Mexico City environment. The nature of the GTFS format facilitates easy innovation transfers between different problems and contexts; as one city develops apps around a particular problem, others can benefit with relatively little additional investment. This is not limited to public-facing apps—it includes data collection as well as analysis and planning tools. However, more must be done, particularly in expanding the reach of this open-data culture to ‘traditional’ transport planning tools (8, p. 13).

Additional impacts were noted by Suzanne Hoadley in her presentation at the 2013 Annual Polis Conference held in Brussels, Belgium, on December 3 and 4, 2013: “Opening Up Transport Data” (105). (Polis is a network of European cities and regions working together to develop innovative technologies and policies for local transport. Since 1989, European local and regional authorities have been working together within Polis to promote sustainable mobility through the deployment of innovative transport solutions. See http://www.polisnetwork.eu/about/about-polis.) In some cities, the practice of opening up data has

. . . helped create a relationship of trust with app developers; improved the quality of data itself; and genuinely harnessed the creativity of developer community (105, p. 3). . . . Further, the perceived benefits are improve[d] internal processes, including data inventory, data value and data quality; improve[d] quality of service by harness[ing the] creativity of developer community; reduce[d] cost of information service provision; and promot[ion of] local economic development (105, p. 4).

However, many challenges in opening data were mentioned and will be covered later.

The value of open data in the transportation area was described as follows by Barbeau (25).

There are three major levers for unlocking value with open data in transportation:

• Improved infrastructure planning and management. Open data on passenger flows and door-to-door travel times allows network operators (including municipal transit systems) to improve capacity and throughput. Open data already has been used to improve the design of transportation networks. For example, when Moscow’s transit authority was modernizing its public transit system in 2012, it depended on open or shared data to determine where commuters lived and where they worked in the Russian capital. Officials used mobile phone location data along with government information on the ages, professions, and home neighborhoods of workers who commuted to specific business districts. Moscow then used this information to determine if greater investment was necessary in rail networks or if other services could do a better job of meeting demand. Based on the research, the city decided not to make a costly investment in a new rail line and instead met transportation needs by redrawing 100 bus routes. This limited Moscow’s upfront investment costs and ensured that services could be flexible enough to meet the needs of a shifting population. In addition to avoiding more than $1 billion in infrastructure costs, the new bus routes reduced average morning commute times by three minutes per trip, saving ten hours of travel time for each rider every year. In New Jersey, NJ Transit released data on passenger flows to the public in 2012. Third parties quickly analyzed ridership at different times of day and were able to pinpoint underutilized rail stops, which led to more express trains and a saving of six minutes from the average commuting time during rush hour.
• Optimized fleet investment and management. Real-time open data about vehicle location and condition and benchmarking of vehicle cost and maintenance information can help operators purchase, deploy, and maintain fleets more efficiently.
• Better-informed customer decision making. Detailed open data about costs, reliability, environmental impact, and other factors can allow customers to make better decisions about which mode of travel to use and when (25, p. 32). . . .

Based on our analysis, we estimate the global potential economic value that could be unlocked through these open data levers in transportation to be $720 billion to $920 billion per year. Optimized fleet operations (fuel savings, more effective maintenance, higher utilization) could enable as much as $370 billion a year in value. Improved infrastructure planning and management and improved consumer decision making can each lead to value of as much as $280 billion per year (25, p. 32).

The challenges associated with opening public transit data are covered extensively in the literature. In “Open Data Presents Opportunity, Challenge for Public Transit Systems” (99), many challenges were noted.

The innovation potential of open source data brings with it associated challenges, not the least of which are cost and developing a sound process for releasing data and maintaining oversight of its use. Any public transit system interested in jumping on the bandwagon of open data—or expanding existing open data programs—could learn valuable lessons from the experiences of other agencies making the transition. ‘You want to be in a position where you’re giving information, you’re supporting ideas, and you’re encouraging creativity,’ said Ron Hopkins, assistant general manager of operations for Philadelphia’s Southeastern Pennsylvania Transportation Authority (SEPTA), which collects 1.5 million data points each month to measure on-time performance. ‘The concern was, how do we manage all that?’ (99).
In terms of challenges, James Wong, Landon Reed, Kari Watkins, and Regan Hammond discuss data integrity and maintenance as critical issues (33). In terms of data integrity, because open data can be used as input to traveler information tools, such as trip planners, “any inaccuracies in the GTFS cascade down to inaccuracies among [these] tools. Data maintenance relies on not only the agency maintaining a public file with up-to-date information, but also software developers who commit to update the information on their own projects when the data is updated” (33, pp. 6–7).

Antrim and Barbeau (60) discuss the challenges in terms of resource requirements as follows.

Transit agencies must make the decision whether to format and maintain a GTFS dataset using their own personnel, or if they are going to outsource this task. It is important to consider that a new GTFS dataset will need to be produced every time there is a change to the schedule to keep the transit services based on GTFS data up-to-date. Major schedule changes can occur 3 to 4 times a year for large agencies, although, depending on the impact on the transit rider, the agency may want to update their GTFS data more frequently to reflect smaller changes in service on a weekly or monthly basis. Therefore, when identifying a GTFS creation process, the maintenance and sustainability of the process must be considered (60, p. 4).

Additional challenges are discussed in the Polis Position Paper (98), by the Finnish Transport Agency (106), by Marples et al. (107), by Traveline Information Limited (TIL) (108), in the McKinsey report (25), by Beasley of the Reading Borough Council (12), and by Watkins (14, pp. 27–30, 40–42):

• Opposition from information service providers, due in some cases to the fear of the threat of competition. Deployment of open data principles for data exchange between private sector firms and service providers could alleviate this challenge.
• Data control and ownership, for instance where data are owned by a cross-agency institution (e.g., passenger transport authority) or data are provided by a contractor. Some concerns cited by contractors include “competitor or commercially sensitive,” “fear of use for measuring operational performance,” and “extra burden on operations.” Usage agreements could be helpful in expressing agency concerns.
• Organizational in the sense that there may not be a clear process/practical framework to guide transport authorities in opening their data. In addition, there may be a lack of understanding of the value of open data to improve performance and a lack of capability/expertise to implement an open data program. Also, having staff-level champions and strong leadership often leads to successful deployments.
• Architectural in that some systems are not designed for publishing open data. Further, they may have been developed ad hoc for a single operator, making it challenging to integrate the data among multiple agencies. Integrating data from multiple sources requires consideration of system capacity, load and response time, frequency of updates, nonstandard feed formats, different interpretations of standard protocols, nonstandard referencing, and data complexity. An example is where location references between one mode and another may differ. In addition, a gap in data standards may challenge release and use of open transit data.
• Data coverage, quality, privacy/confidentiality, accuracy, and timeliness: data may need significant amounts of “cleansing” or anonymizing before publication. One way to overcome this challenge is for agencies to take no responsibility for the data or information provided. However, having processes in place to report problems, see progress, and achieve fixes helps address this issue. Definitions for minimum quality of service requirements also would help.
• Unrealistic expectations or dependency from the public around the authority’s capacity to provide consistent, convenient, and reliable data all the time (e.g., data latency following the detection of an incident). This challenge includes managing public reactions and expectations about changes in the transportation system that arise from the use of open data. Further, the data may not be what is desired by the public, which highlights the fact that the determination of which data to open should be based on the data users’ needs. Sometimes “the private sector is better placed to provide the end user services and can help advise on what data [an agency] should focus on” (12).
• Cost of opening up data, which is pertinent in the current climate of public sector cuts and in view of the fact that most authorities do not have a dedicated budget for their open data activity. This cost does not just relate to building and providing the open data facility but also relates to the ongoing costs of maintaining open data (ensuring that authorities have the resources to update/refresh the data once it is published) as well as the support that must be provided to the developer community (98).
• Developer relationships need to be at different levels of engagement and promote support for mutual customers.
• Working with app developers should be sustainable and holistic, and include open communication lines.
• Performance measures are to be used to track success, including number of app downloads, number of apps developed, an app accessibility inventory, and market research surveys.
• Consider accessibility and equity.
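
The data-integrity concern noted above (inaccuracies in a GTFS feed cascade into every tool that consumes it) suggests a basic check before a feed is published. The following is a minimal Python sketch, not a description of any agency's actual process. The function name `missing_gtfs_files` and both constants are hypothetical, and the file list is an assumption based on the files the GTFS reference has traditionally treated as required; a real validator would also check field values, cross-file references, and service dates.

```python
import io
import zipfile

# Assumption: files traditionally listed as required in a GTFS feed.
REQUIRED_FILES = {"agency.txt", "stops.txt", "routes.txt",
                  "trips.txt", "stop_times.txt"}
# Assumption: at least one of these calendar files must be present.
CALENDAR_FILES = {"calendar.txt", "calendar_dates.txt"}


def missing_gtfs_files(feed):
    """Return a list of core GTFS files absent from a feed archive.

    `feed` may be a path to a .zip file or a binary file-like object.
    """
    with zipfile.ZipFile(feed) as archive:
        names = set(archive.namelist())
    missing = sorted(REQUIRED_FILES - names)
    if not (CALENDAR_FILES & names):
        missing.append("calendar.txt or calendar_dates.txt")
    return missing


if __name__ == "__main__":
    # Build a deliberately incomplete feed in memory for illustration.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("agency.txt", "stops.txt", "routes.txt"):
            archive.writestr(name, "")
    buffer.seek(0)
    print(missing_gtfs_files(buffer))
```

A check of this kind could run automatically each time a schedule change produces a new feed, which is consistent with the maintenance burden Antrim and Barbeau describe.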
chapter three
The synthesis survey covered several key characteristics of open transit data, including the justifications and reasons for choosing to provide or not provide open data; the underlying technology being used to generate the data; and the standards, protocols, and formats used in providing the data. Table 4 and Appendix B list the 67 responding agencies. Before examining these characteristics, the study team noted the overall annual ridership and modes operated by each respondent. U.S. responses represent agencies that carry a total of just more than 5.4 billion passengers annually (annual unlinked trips), with U.S. agencies’ annual ridership ranging from 1.8 million (a county transit system in Florida) to 2.6 billion (MTA). The total annual ridership for each agency is shown in Appendix D.

JUSTIFICATIONS AND REASONS FOR PROVIDING OPEN DATA

The first question in the survey was “Has your agency provided open data?” Of the 69 (two agencies provided two responses each from different departments of the agency) responding agencies, 57 (82.6%) provide open data and 12 (17.4%) do not. Half of the agencies began providing open data in the 2010 to 2012 time frame. One agency started providing open data in 1981, two in the mid-1990s, 21 in the 2000 to 2009 time frame, and three in 2013.

Fifty-one agencies (almost 90% of those agencies that provide open data) provide it to increase information access to transit riders. The next most prevalent reason (49 responses, almost 86%) is to improve upon existing customer information and customer service or create new customer information services. The next most prevalent reason for providing open data is to foster a more positive perception of transit or encourage more people to try public transit. Table 5 shows all of the reasons agencies provide open data.

The most prevalent reason transit agencies are not providing open data is that it is too much effort to produce the data or the agency does not have the time or people to do the work required. The next most prevalent reason is that it takes too much effort to clean the data. All of the reasons are shown in Table 6.

The survey results show that almost all agencies providing open data see it as a way to maintain or increase ridership. Fifty-four or more than 96% of responding agencies that provide open data did not use any evaluation measures to assist them in deciding to open their data.

The major factor in agencies deciding which data to open is based on the ease of releasing the data. The next major decision is based on observing what other transit agencies have done regarding open data, and the third most frequent decision was done internally without asking any groups outside their agencies. All the survey responses are shown in Table 7 and Figure 27.

UNDERLYING TECHNOLOGY

In terms of the underlying technology that is generating the open data, the survey results indicate that scheduling software is the primary system being used. The next most used is GIS. The third most used is computer-aided dispatch (CAD)/automatic vehicle location (AVL). All survey responses about underlying technologies used to generate open transit data are shown in Figure 28. The “other” category includes the following responses:

• AVL/CAD vendor-supplied
• GTFS generated in Microsoft (MS) Excel
• Manually entered data
• Schedules. Scheduled information only
• Trapeze
• Trillium Transit
• We collect and host data from transit operators in the region
• All auto-generated from our enterprise relational database management system (RDBMS)
• Open source editing tool (https://github.com/conveyal/gtfs-editor)
• Ride checks
• Clever Devices (CD) BusTime Developer’s API, database script to convert scheduling information in Hastus and CD Bustools to GTFS
• Ontologies (protégé) SPARQL.
TABLE 4
RESPONDING AGENCIES
Agency Name City State/Province/Country
Alameda–Contra Costa Transit District (AC Transit) Oakland CA
Ann Arbor Area Transportation Authority (AAATA) Ann Arbor MI
Arlington Transit (ART) Arlington VA
AtB AS Trondheim Norway
Atlanta Regional Commission Atlanta GA
Bangor Area Comprehensive Transportation System Brewer ME
Bay Area Rapid Transit (BART) Oakland CA
Bilbao City Council Bilbao Bizkaia, Spain
Blacksburg Transit Blacksburg VA
Chittenden County Transportation Authority Burlington VT
Capital Metropolitan Transportation Authority Austin TX
Central Florida Regional Transportation Authority Orlando FL
Central New York Regional Transportation Authority Syracuse NY
Champaign–Urbana Mass Transit District (CUMTD) Urbana IL
Charlotte Area Transit System (CATS) Charlotte NC
Chicago Transit Authority (CTA) Chicago IL
Consorcio Regional de Transportes de Madrid Madrid Spain
Des Moines Area Regional Transit Authority (DART) Des Moines IA
Delaware Transit Corporation (DTC) Wilmington DE
Empresa Municipal de Transportes de Madrid, S.A. Madrid Spain
Fairfax County DOT/Fairfax Connector Fairfax VA
Grand River Transit (Region of Waterloo) Kitchener Ontario, Canada
Greater Bridgeport Transit Bridgeport CT
Greater Cleveland Regional Transit Authority (GCRTA) Cleveland OH
Helsinki Regional Transport Authority Helsinki Finland
Interurban Transit Partnership (ITP) Grand Rapids MI
Kansas City Area Transportation Authority (KCATA) Kansas City MO
King County Metro Seattle WA
Los Angeles County Metropolitan Transportation Authority Los Angeles CA
Manatee County Area Transit (MCAT) Bradenton FL
Massachusetts Bay Transportation Authority (MBTA) Boston MA
Metrolinx Toronto ON
Metropolitan Atlanta Rapid Transit Authority (MARTA) Atlanta GA
Metropolitan Transportation Authority (MTA) New York NY
Metropolitan Transportation Commission (MTC) Oakland CA
Monterey–Salinas Transit District Monterey CA
NJ Transit Newark NJ
Norwegian Public Roads Administration (NPRA) Trondheim Sør-Trøndelag, Norway
New Hampshire DOT Concord NH
North County Transit District (NCTD) Oceanside CA
Norwalk Transit District Norwalk CT
Orange County Transportation Authority (OCTA) Orange CA
Oregon DOT Rail + Public Transit Division Salem OR
Pennsylvania Public Transportation Association Harrisburg PA
Pace Suburban Bus Arlington Heights IL
Pinellas Suncoast Transit Authority St. Petersburg FL
Roaring Fork Transportation Authority Aspen CO
Regional Transportation Commission of Washoe County Reno NV
Regional Transportation District (RTD) Denver CO
Stark Area Regional Transit Authority Canton OH
Suburban Mobility Authority for Regional Transportation Detroit MI
Samenwerkingsverband Regio Eindhoven (SRE) Eindhoven Brabant, Netherlands
Syndicat des transports d'Île-de-France (STIF) Paris France
San Mateo County Transit District San Carlos CA
Société de transport de Laval Laval Quebec, Canada
Sound Transit Seattle WA
Tampere City Public Transport Tampere Pirkanmaa, Finland
Transport for London (TfL) London United Kingdom
Transport for Greater Manchester Manchester United Kingdom
Tri-County Metropolitan Transportation District of Oregon (TriMet) Portland OR
Organization of Urban Transportation of Thessaloniki Thessaloniki Greece
Urban Transport Administration Gothenburg Vastra Gotaland, Sweden
Utah Transit Authority Salt Lake City UT
Votran South Daytona FL
Wiener Linien Vienna Austria
Worcester Regional Transit Authority (WRTA) Worcester MA
York Region Transit Richmond Hill Ontario, Canada
TABLE 5
REASONS FOR PROVIDING OPEN DATA
Reason for Providing Open Data Number of Respondents Percent of Agencies Providing Open Data
Increase information access to transit riders 51 89.5
Improve upon existing customer information and customer service or create new customer information services 49 86.0
Foster a more positive perception of transit/encourage more people to try public transit 44 77.2
Foster/encourage innovation around the agency’s data or help third parties develop skills and services (e.g., with which the agency can contract) 36 63.2
Facilitate information sharing within the agency and with partners and customers 34 59.6
Agency transparency 33 57.9
Availability of data standard(s) for transit information (e.g., GTFS) 33 57.9
Improve effectiveness of the agency and its services 32 56.1
Increase customization for customer information 31 54.4
There was demand for us to open our data/we were requested to provide open data 29 50.9
Help achieve other agency goals (e.g., by providing a wider audience for published information) 27 47.4
Provide ways to better understand and use transit information within our agency 26 45.6
Participate in the latest trend in the transit industry 26 45.6
Improve or provide new private products and services 25 43.9
An information gap existed that could be bridged by better public data 20 35.1
Cut costs to our agency 12 21.1
Provide incentives for others to help maintain data sets, reducing the maintenance cost for the agency 12 21.1
Other (only one respondent specified that their “other” response meant “part of agency culture, esp. information technology.”) 6 10.5
Measure the impact of transit on the community(ies) that are served 6 10.5
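
The percentages in Table 5 appear to be each response count divided by the 57 agencies that provide open data, rounded to one decimal place. The short Python sketch below reproduces a few of them as a consistency check. The helper name `percent_of` is hypothetical, and the choice of 57 as the denominator is an inference from the survey text rather than something the table states.

```python
def percent_of(count, total, digits=1):
    """Return count as a percentage of total, rounded to `digits` places."""
    return round(100.0 * count / total, digits)


# Inferred denominator: agencies that reported providing open data.
AGENCIES_PROVIDING_OPEN_DATA = 57

# A few (reason, count, published percent) rows copied from Table 5.
rows = [
    ("Increase information access to transit riders", 51, 89.5),
    ("Agency transparency", 33, 57.9),
    ("Cut costs to our agency", 12, 21.1),
]

for reason, count, published in rows:
    computed = percent_of(count, AGENCIES_PROVIDING_OPEN_DATA)
    status = "matches" if computed == published else "differs"
    print(f"{reason}: computed {computed}, published {published} ({status})")
```

The same helper applied to the first survey question gives 57 of 69 responses as 82.6% and 12 of 69 as 17.4%, in agreement with the text.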
TABLE 6
REASONS FOR NOT PROVIDING OPEN DATA
TABLE 7
FACTORS IN DECIDING WHICH DATA TO OPEN
Decision Factor Number of Respondents Percent
Based on the ease of releasing the data 33 58.9
Based on observing what other transit agencies have done regarding open data 21 37.5
Decided internally without asking any groups outside our agency 17 30.4
Asked potential users of the data 11 19.6
Based on the cost associated with producing or cleaning the data 11 19.6
Asked the community in which your agency operates service 8 14.3
Asked riders 1 1.8
Other:
Approached by Google.
Approached by transit enthusiast.
Based upon what our AVL/CAD vendor provided.
I don’t know.
Open Government Data (OGD) Vienna.
Requests to access data.
Some in the developer community encouraged us to release items.
User demands.
Based on demand.
Defaults to GTFS and availability of Clever Devices Bustime API.
Worked with developers.
We were already using web services for internal purposes, we merely exposed it with documentation for the third party developers; a no brainer.
Supported University of Washington graduate study project to provide scheduled data to the public via third-party application (One Bus Away).
Based on requests from third party service providers.
Asked experts in the University field.
Decided both internally, and from developer community.
FIGURE 28 Underlying technologies that generate open transit data (from survey responses).
TABLE 8
TYPES OF OPEN TRANSIT DATA
Types of Information Number of Respondents Percent
Route data 51 89.5
Schedule data 50 87.7
Station/stop locations 49 86.0
Real-time information 33 57.9
Park-and-ride locations 17 29.8
Fare media sales locations 14 24.6
Ridership data 14 24.6
Other 12 21.1
Budgetary data 10 17.5
Performance data 8 14.0
From survey responses.

The degree to which the data are open was examined from four different perspectives, as shown in Figure 7. The survey results for each of these four characteristics are shown in Figures 30 through 33.

GTFS is the format most commonly used to provide open transit data. The survey results indicate that a number of other standards and formats are being used, as shown in Table 9.

The agency’s website is the outlet used most frequently through which open transit data are provided. The GTFS Exchange website is the next most commonly used, followed by APIs. All survey responses to this question are shown in Table 10.
FIGURE 29 Frequency with which open data are updated or modified (from survey responses).
FIGURE 30 Degree of access of open data for percentage of survey respondents (from survey responses).
FIGURE 31 Machine readability of open data for percentage of survey respondents (from survey responses).
FIGURE 32 Cost of open data for percentage of survey respondents (from survey responses).
FIGURE 33 Rights to open data for percentage of survey respondents (from survey responses).
TABLE 9
STANDARDS AND FORMATS USED BY SURVEY RESPONDENTS
Standards and Formats Number of Respondents Percent
TABLE 10
WHERE OPEN TRANSIT DATA ARE MADE AVAILABLE
How Agencies Make Data Available Number of Responses Percent
chapter four
Just more than half (29 or 50.9%) of the survey responses show that agencies require a license or agreement to use the agency’s open data. Almost 60% of the respondents (16 responses of 27) said that they require acknowledgment of a license agreement before a third party accesses the open data. Just more than 83% of the respondents (25 responses of 30) note that they do not require another type of registration before a third party accesses and uses the data.

The respondents’ license agreements cover a variety of items, the most common of which is the right to use the agency’s data. The next most common item is the limitation on data availability (nonguarantee of data availability), accuracy or timeliness, and the third most common is the liability limitations for missing or incorrect data. The remaining items and the frequency of their occurrence are shown in Table 11.

Almost all of the survey respondents (54 or 98.2%) have experienced no legal issues resulting from the release of their data to the public.

Figure 34 shows the steps taken by the survey respondents to disclose their data publicly. Those who responded that they took “other” steps to disclose their data reported the following:

• Accepted Google’s request
• All of above (meaning all of the choices for this question in the survey) were done for corporate reasons, but enabled open data publication
• Developers page on agency website
• Open data are available through a local development environment (ITS Factory)
• Developed license agreement
• Work with developers
• Our data were already accurate and used via web services developed for internal purposes. Internal purposes turned out to be similar needs for external developers, just a few minor tweaks based on comments that actually improved it
• Only provide GTFS and website information to public. All other data are restricted to the Metropolitan Planning Organization (MPO), government agencies and research entities by request.

The survey asked what agencies do if they discover irresponsible users. Among the responses were 15 indicating that this situation has not occurred. Several responders indicated that they have a policy in place to handle this situation but have never had to exercise it.

• Limit or terminate/cancel access (some by revoking the key that developers receive when they register to use the data).
• Block the data.
• Contact/follow-up directly with the publisher and user, and try to resolve the issue.
• Limit access and monitor the public sites or API.
• If it is an incorrect time/broken app, do nothing and let the market sort itself out through reviews and other means.
• Ignore them.
• Probably would let the active developer community publicize the offender.
• We identify our and data consumers’ responsibilities in the terms of use. We take no responsibility if someone violates those terms.
• No misuse of data to date, but periodically ask new developers to please comply with the terms of use if they are not. The few that were encountered apologized and corrected immediately.
• 1. Verbal warning to stop. 2. Remove their access to the feed.
• We issue a license key for each user however so we would simply revoke the key of any irresponsible user.
• If a trademark or copyright violation, it would be referred to legal for review.
• For real-time data access can be blocked and the key can be revoked.
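
Several of the responses above rely on issuing a key to each registered developer and revoking it if the data are misused. The Python sketch below illustrates that pattern in its simplest form. It is an illustrative assumption, not a description of any respondent's system: the class name `KeyRegistry` and its methods are hypothetical, and a real deployment would persist keys, log usage, and apply rate limits.

```python
import secrets


class KeyRegistry:
    """Hypothetical in-memory registry of developer API keys."""

    def __init__(self):
        # Maps each active key to the developer it was issued to.
        self._active = {}

    def register(self, developer):
        """Issue a new random key to a developer and return it."""
        key = secrets.token_hex(16)
        self._active[key] = developer
        return key

    def revoke(self, key):
        """Revoke a key; return True if it was active, False otherwise."""
        return self._active.pop(key, None) is not None

    def is_allowed(self, key):
        """Report whether a request carrying this key should be served."""
        return key in self._active


if __name__ == "__main__":
    registry = KeyRegistry()
    key = registry.register("example-developer")
    print(registry.is_allowed(key))  # key is active after registration
    registry.revoke(key)
    print(registry.is_allowed(key))  # key is rejected after revocation
```

Because each key maps to one developer, revoking a single key cuts off one irresponsible user without affecting other consumers of the same feed.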
TABLE 11
ITEMS COVERED IN LICENSE AGREEMENTS
Items Covered in License Agreement Number of Respondents Percent
Right to use the agency’s data 20 71.4
Nonguarantee of data availability, accuracy, or timeliness 19 67.9
Liability limitations for missing or incorrect data 18 64.3
Agency’s right to alter data without notice or liability 17 60.7
Data ownership 15 53.6
Use and placement of copyrighted logos and images 14 50.0
Indemnity from technical malfunctions due to users’ use of data 13 46.4
Limitations on use of data 13 46.4
Indemnity from legal actions against data users 12 42.9
Indemnification 9 32.1
Termination 5 17.9
Licensing fees and royalties 3 10.7
Quality control 3 10.7
Other:
Applicable law
Caveat to be provided when data published
Do not overwork servers by requesting more data than necessary
License is implicit with the use of the data
We use acceptance of terms of use
Differs for web services and static data sets. It is very liberal.
http://www.transitchicago.com/developers/terms.aspx
No license in place yet, only agreement with app developers, MA DOT provides
GTFS access without license.
From survey responses.
chapter five
APPLICATIONS

The survey contained several questions regarding how open data were being used in terms of customer applications, decision-support tools used by the agency itself, and nontransit applications. Further, there were two questions regarding how agencies monitor use of the open data.

Customer applications are the most prevalent use of open transit data. As shown in Table 12, trip planning is the most common use of open data, followed by mobile applications and real-time transit information.

The types of decision-support tools that use the open data are shown in Table 13. Data visualization is the most common tool, followed by service planning and evaluation, and route layout and design.

Survey respondents reported numerous applications that used their open data. The applications reported to be used most frequently are:

• Google Maps
• Google Transit
• HopStop
• OneBusAway
• Open Trip Planner
• Rome2Rio
• RouteShout
• TimeTable Publisher
• WalkScore

Although the other applications reported by respondents are too numerous to list and are, for the most part, locally developed, the survey responses indicate that even the smallest agencies have more than one application that uses open data.

Almost two-thirds (33 or 63.5%) of respondents stated that they were not aware of other uses of their agency’s open data. Further, the same number of respondents do not track usage of their open data. Table 14 shows the methods used by the agencies that do track usage.

DOWNLOADS

When survey recipients were asked to estimate how much data are being used or downloaded over a certain time frame, there was a wide variety of answers. Of the 31 responses to this question, just less than half (15) reported that they either do not know or cannot estimate how much data are being used or downloaded. Those who could estimate the volume reported the following:

• Per day
  – 1,800,000 queries per day
  – 2,000,000 API calls per day
  – About 250,000 unique user accesses daily
  – Data download average is 4 gigabytes per day
  – 100,000 API requests per day
  – Number of daily transactions approximately between 20,000 and 35,000 per day
  – Approximately 85,000 requests per day
• Per month
  – Less than 30 megabytes per month
  – 250 megabytes per month
  – Three terabytes per month
• Per year
  – 7,213 downloads in the past year (about 40 gigabytes)
  – For the real-time feed, 18,045 gigabytes in the last 12 months
  – Approximately 100 megabytes per year, with a GTFS file slightly less than 1 megabyte
• Per download: 100 megabytes per download.

Obviously, the volume is based on several factors, including the amount of data being accessed, the size of the agency, and the number of applications accessing the data.

The types of applications reported by survey respondents were as follows:

• Thirty-seven respondents (88.1% of respondents) who indicated that they have mobile applications counted a total of 764 applications;
• Twenty-eight respondents (66.7%) who indicated that they have web-based applications counted 191 of them; and
TABLE 12
TYPES OF CUSTOMER APPLICATIONS USING OPEN TRANSIT DATA
Customer Applications Number of Respondents Percent
Trip planning 41 75.9
Mobile applications 38 70.4
Real-time transit information (arrival/departure times, delays, detours) 32 59.3
Maps 31 57.4
Data visualization 21 38.9
Timetable creation 17 31.5
Interactive voice response (IVR) 14 25.9
Accessibility 12 22.2
Other 8 14.8
Ridesharing 8 14.8
Crowdsourcing 5 9.3
From survey responses.
TABLE 15
TYPES OF MOBILE PLATFORMS USING OPEN TRANSIT DATA
Platforms for Mobile Applications Number of Respondents Percent
Android 35 91.9
iOS (Apple) 35 91.9
Windows Mobile 11 29.7
Blackberry 7 18.9
Nokia 6 16.2
Mobile Linux 1 2.7
Other:
Text messaging app (Dabnab)
HTML5
Jolla
Palm WebOS
Pebble
Short message service (SMS)
Windows 7
OSX
Mobile web apps
Spotbros.
chapter six
TABLE 16
TYPES OF COSTS ASSOCIATED WITH OPEN DATA
Types of Costs Associated with Providing Open Data Number of Respondents Percent
Staff time to update, fix, and maintain data as needed 38 76.0
Internal staff time to convert data to an open format 35 70.0
Staff time needed to validate and monitor the data for accuracy 28 56.0
Staff time to liaise with data users/developers 25 50.0
Web service for hosting data 23 46.0
Publicity/marketing 12 24.0
Consultant time to convert data to an open format 11 22.0
Other:
Contract management
Cost to develop prediction software or use prediction Software as a Service (SaaS)
Everything above is already done for internal purposes and it is all automated
Investigation project agreement with the Faculty of Computing Sciences
Consultant time to build editing tool
License Routing service Mentz
No additional costs are incurred.
TABLE 17
LABOR HOURS PER OPEN DATA ACTIVITY
Activity Number of Respondents Range of Labor Hours per Month
Internal staff time to convert data to an open format 4 3–40
Staff time needed to validate and monitor the data for accuracy 4 1–10
Staff time to update, fix and maintain data as needed 3 2–20
Publicity/marketing 3 0.1–2
Staff time to liaise with data users/developers 2 0.25–6
Consultant time to convert data to an open format 2 20
Web service for hosting data 1 1
From survey responses.
TABLE 18
BENEFITS OF OPEN DATA
Benefits Number of Respondents Percent
Increased awareness of our services 39 78.0
Empowered our customers 37 74.0
Encouraged innovation outside of the agency 37 74.0
Improved the perception of our agency (e.g., openness/transparency) 33 66.0
Provided opportunities for private businesses 24 48.0
Encouraged innovation internally 21 42.0
Improved our market reach 18 36.0
Become more efficient and effective as an agency 11 22.0
Increased our return-on-investment from existing web services 10 20.0
Experienced cost savings 5 10.0
Been able to reassign staff 3 6.0
From survey responses.
TABLE 19
REASONS FOR ENGAGING DATA USERS AND REUSERS
Reasons for Engagement Number of Respondents Percent
Obtain feedback on data anomalies and data quality issues 25 75.8
Find out more about how people want to use/reuse your data 21 63.6
Expose your data to a wider audience 21 63.6
Provide technical support 20 60.6
Announce updates, modifications, etc. 19 57.6
Find out more about the demand for our data 18 54.5
Suggesting features to improve the functionality of applications 17 51.5
Find out more about prospective users/reusers 15 45.5
Enable existing and prospective users/reusers to find out more about your data 14 42.4
Explain transit jargon and definitions 12 36.4
Solicit requests for future data 11 33.3
Enable prospective and existing users to meet each other 7 21.2
From survey responses.
–– More scrutiny because of increased visibility of data accuracy, including third-party users wanting zero downtime
• Neutral Impacts
–– Thinking about data reuse versus public policies
–– Public awareness of what agencies are doing and how they are doing it.

The impacts on the public sector (e.g., riders, community citizens) of providing open data also were explored. The following were reported:

• Creating and improving access to additional and higher quality public services, including more and free applications
• Providing better, more accessible, and more timely public information and tools
• Resulting in improved and sustainable mobility
• Improving transparency and accountability
• Improving customer service, customer satisfaction, and public perception/image of transit, including service reliability
• Empowering the public
• Making transit more competitive (reducing “costs” of trip related to customer uncertainty) and easier to use
• Providing more visual information
• Providing more innovative applications that government agencies may not be able to provide
• Providing a better transit experience
• Increasing ridership
• Increasing competition for transit riders
• Providing better regional coordination
• Encouraging the development of third-party tools and applications
TABLE 20
DATA USER AND REUSER ENGAGEMENT TECHNIQUES
Engagement Techniques Number of Respondents Percent
Face-to-face events 24 60.0
Conferences 18 45.0
Meetups 12 30.0
Hackathons 10 25.0
Application competitions 7 17.5
Unconferences/BarCamps (conferences with no set agenda; the agenda is set at the time of the conference by the participants) 5 10.3
Speed Geek events (participation process used to quickly view a number of presentations within a fixed period of time) 2 5.1
Other:
Local Open Data Advocacy Group
Email
Various online industry and developer forums
The impacts on the private sector (e.g., developers) were reported by survey respondents as follows:

• Providing business/commercial and development opportunities, including new and expanded companies that could create a new eco-system of private entrepreneurs
• Enabling innovation and the creation of applications
• Providing data to cover new needs
• Decreasing the need for agency to develop apps on a multitude of differing platforms, which would be costly to do internally or to outsource
• Providing more visual information
• Providing a broader reach for customers
• Adding value to existing services
• Private sector interacting with transit more comfortably because they know more about transit
• Adding data by large trip planning services (Google, Bing, HopStop)
• Improving access with high potential for untapped growth areas that the public sector cannot fund or have access to
• Creating interest in agency and desire to show off coding ability

Many of the impacts to the agencies, public sector, and private sector are repeated, proving that opening transit data has a significant value for all three groups.

CHALLENGES

The survey asked about the challenges associated with providing open data and how the challenges were overcome. The survey responses include the following:

• Resources and organizational issues
–– Limited dedicated resources (both time and staff) responsible for managing open data (including data conversion/cleaning and validation)
–– The process/philosophy is still not fully understood
–– Securing management support
–– Agency coordination challenges
–– The lack of technical know-how within the agency
–– Challenging internal parties who believe that we should be charging for the release of data
–– Helping internal groups see the benefit/value of participation and demonstrating how this can reduce manual publication load
–– Closer attention to change management
–– Internal fear that we should not do it because not all predictions would be accurate and we would be criticized for that
• Data quality and timeliness
–– Ensure/preserve data quality, completeness and equity, and timely release of data
–– Necessary to clean the data
–– Interaction with regional systems and inconsistencies with what data are used in which of their systems
–– Improve/preserve rider perception of accuracy of arrival predictions, given operational impacts such as adverse weather, reroutes, construction
–– Ensure safety/security of files and information disseminated
• Standards and formatting
–– Standards help overcome formatting issues
–– Better organization of the marketing of available information to public
–– Managing the evolution of internal data model
–– Data scalability
• Marketing
–– Making the data uniform resource locators (URLs) known
–– Partnering with an organization (e.g., Mobility Lab) to publicize availability
–– Initially, resistance because of branding issues
• Technical issues
–– Tracking users and developers
–– Process of making the data available when new schedules are released
–– Finding ways to represent some of the unique aspects of rural transit (such as deviations, or stops not always reached in the same order) in a standardized format
–– Ensuring the route changes are reported in a timely manner to the individual responsible for maintaining the data feed
–– The ability to get the data out to developers at the speed they want
–– Local development environment
–– Developers want a wide array of features
–– Slow data retrieval
–– Allowing direct access to data through agency firewall
–– How to provide large amounts of data in a timely manner

LESSONS LEARNED

The lessons learned noted by survey respondents cover four major areas as follows:

• Data quality and accuracy are critical to the success of an open data program. Respondents mentioned the following:
–– Put quality checks in place when opening data;
–– Be as open as possible, but test the data before releasing it to developers;
–– Start small in terms of the amount of open data offered and then grow that when confident of data quality of new sources/data sets;
–– It is important to have good, clean data; things that you understand internally as a transit agency don’t always translate well to people who are less familiar with your operations; and
–– Data must be compatible with or identical among the different formats in which they are made available.
• Open data are not free.
–– If you do not have staff to support open data (planning, engineering and maintenance, especially), do not implement such a program;
–– Providing open data requires a lot of technical understanding when establishing it. Think of the costs that come with providing such data;
–– Use standards to make it easier to provide open data; and
–– Select a technology vendor that supports open data or require it in the contract with the vendor.
• Recognize that opening data will create changes within and outside the agency. Respondents summarized this point as follows:
–– There is a shift that agencies have to get comfortable with: from providing solutions to providing data;
–– Open data will not solve every customer requirement, and agencies will still have to stay in the game (e.g., SMS, accessible services);
–– Customers are smart: they can tell which third-party services are best, and they will not hold the agency responsible for third-party services that are poor;
–– It is important that agencies not interfere with the market to ensure that the benefits of competition can be realized;
–– Further, an open data program should be supported by a project champion;
–– Carefully assign staff roles and skills;
–– Having buy-in from coordinating agencies is crucial;
–– Consider open data a fundamental part of the overall information system; and
–– Ensure that data reuse complies with public policies.
• Engagement and developing relationships with developers are key to success as well. Respondents mentioned the following in relation to engaging data users and reusers:
–– Early engagement with potential users is key. Find out what they want and how they want it. Try to track who is developing what, particularly to understand the successes and failures;
–– Respond quickly to opportunities. Developers work on much shorter schedules than planners;
–– Developers will know the latest mobile platforms and can utilize these with your data;
–– Make it as easy as possible for developers to access your data, and make the license understandable and unintimidating; and
–– Developers will help you determine the quality of the data if you provide a forum for this type of feedback.
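As one illustration of the lesson to put quality checks in place before releasing data, the sketch below runs two simple checks on schedule data laid out like the GTFS stops.txt and stop_times.txt files: stop coordinates must fall in a valid range, and every stop referenced by a trip must exist. The sample rows, the function names, and the choice of checks are assumptions made for this example; no surveyed agency reported using this particular procedure.

```python
import csv
import io

# Hypothetical sample data in the GTFS text-file layout (stops.txt, stop_times.txt).
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Main St,42.3601,-71.0589
S2,Oak Ave,42.3655,-71.0540
S3,Bad Coord,142.0,-71.0
"""

STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:05:00,08:05:00,S2,2
T1,08:09:00,08:09:00,S9,3
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def check_feed(stops_text, stop_times_text):
    """Return a list of human-readable problems found in the two files."""
    problems = []
    known_ids = set()
    for row in read_rows(stops_text):
        known_ids.add(row["stop_id"])
        lat, lon = float(row["stop_lat"]), float(row["stop_lon"])
        # Latitude and longitude must be valid WGS 84 values.
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            problems.append(f"stop {row['stop_id']}: coordinates out of range")
    for row in read_rows(stop_times_text):
        # Every stop referenced by a trip must be defined in stops.txt.
        if row["stop_id"] not in known_ids:
            problems.append(
                f"trip {row['trip_id']}: unknown stop_id {row['stop_id']}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_feed(STOPS_TXT, STOP_TIMES_TXT):
        print(problem)
```

A check of this kind could be run each time a new schedule is exported, so that problems are caught before developers see them.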
chapter seven
CASE EXAMPLES
Several of the transit agencies and organizations that responded to the synthesis survey were interviewed by telephone to obtain more detailed information on their open data programs. The results of the interviews are presented in this section as case examples.

MASSACHUSETTS BAY TRANSPORTATION AUTHORITY (BOSTON)

In July 2009, the Executive Office of Transportation (now Massachusetts Department of Transportation) began an open transportation data program by creating a developer’s web page that contained a variety of transportation information, including route and schedule data for the Massachusetts Bay Transportation Authority (MBTA) and Massachusetts regional transit authorities (109). On November 14, 2009, the First Annual MassDOT Developers Conference was held at the Massachusetts Institute of Technology (MIT) to encourage the development of applications based on the newly opened transportation data. Two important announcements were made at this conference: (1) the winners of the 2009 MassDOT Developers Challenge were announced; and (2) the MBTA announced the availability of real-time bus information for selected routes (110). Laurel Ruma referred to the MBTA opening its data at Ignite 2010, where she presented “Better than Winning the World Series: Boston Opens Real-Time Transit Data” (http://igniteshow.com/videos/better-winning-world-series-boston-opens-real-time-transit-data). Her speech covered the eight steps that led to the successful opening of the MBTA’s data: (1) build a community, (2) learn the lingo, (3) open data, (4) hold a contest, (5) be prepared to be blown away, (6) award unique prizes (e.g., subway pass for a year), (7) tell the world, and (8) repeat.

According to David Barker, there were three primary reasons the MBTA opened its data (D. Barker, manager of operations technology for the MBTA, personal communication, March 10, 2014):

1. The MBTA wanted to get its schedule and real-time data out to customers, realizing that this information is the ultimate in advertising MBTA services;
2. Customers wanted real-time information; and
3. There was enthusiasm for open government ideas and for trying something new (the MBTA wanted to be at the forefront).

The MBTA independently opened its schedule data and some real-time information, but initially did not share the real-time information. They wanted to put the real-time information in a data feed on a trial basis before opening it to the public. Further, the market was right for developers: they could use open data for web applications, phone apps, and electronic signs. Before the open data revolution, there were far fewer options available to developers. The MBTA got a positive response after opening its data, and that provided the momentum for continuing the open data program.

The choice of standards used for the open data was based on what was needed within the API and what was available in the marketplace. When the MBTA reviewed potential standards, agency personnel wanted to support GTFS-realtime and wanted to be on Google. (The MBTA was one of the first transit agencies in the United States to provide real-time information on Google Transit.) They looked at SIRI but found it to be verbose and somewhat complicated, so they decided against using it. However, if SIRI does become more prevalent among developers, they could support it. TCIP was well-suited for communications within an agency, but not for communications with developers, so it was not selected for use with the open data.

The MBTA developed an API in order to retrieve smaller sets of information than what is contained in GTFS-realtime (GTFS-RT) and include some information that is not in GTFS-RT. The MBTA selected XML format for its API because it is an industry standard for APIs. The documentation describing the MBTA’s real-time open data can be found in MBTA and IBI Group (111).

MBTA’s engagement with developers consists of doing a survey of developers and having a newsgroup in which developers can ask and answer questions. The MBTA received a better response from the survey than the newsgroup. In addition, the MBTA suggests running an event (developer’s conference) when an agency has something to announce that is related to open transit data.

Although the MBTA has not conducted a survey that directly addresses the effects of the agency’s open data program, a survey was conducted by MIT examining the impacts of real-time information displayed on the electronic signs in MBTA subway stations. The data displayed on the signs, the arrival times of the next two subway trains, are driven by the same open data available to developers (112, 113). The conclusions of the impact study were as follows.
“Countdown signs have significantly altered how passengers view their wait for public transit in Boston. Passengers reduce estimates of wait time on average by 0.85 minutes. After controlling for service disruptions (when the countdown predictions are less accurate), passengers reduce wait time estimates by 1.3 minutes. This corresponds to a reduction of approximately 17% in total passenger wait time estimation, and a 50% reduction in wait time overestimation. This coincides with improved wait satisfaction for headways lower than 5 minutes, but decreased for headways greater than 9 minutes. Wait time overestimation remained high at 34% even with countdown timers” (112, p. 9).

As of early April 2014, the MBTA’s next steps were integrating with Twitter and taking steps to reduce message volume to customers who subscribe to T-Alerts. The agency considered moving from e-mailing alerts to providing them by means of text messages (SMS), but the cost was too high. The improvements in message formatting, which have been implemented, were:

• Long subject lines are replaced with auto-generated information from metadata that summarizes the subject; and
• Messages are automatically abbreviated to fit within Twitter length requirements.

Further, the MBTA is interested in conducting another contest or hackathon. In addition, the agency wants to leverage its API internally. For example, the agency is interested in placing a real-time information sign in one station showing the time until the next bus(es). Currently, there is a sign like this in Ruggles Station, but the data come directly from a vendor’s product rather than the MBTA’s API. The agency wants to replace this sign, and place signs in other stations, with signs that use the MBTA’s own feeds and include alerts for services leaving that station, including real-time bus and subway times, and elevator outages. The more the MBTA uses its own API, the more the agency will discover improvements that developers might not think of and the more it will reduce vendor lock-in.

In a discussion about opportunities and challenges related to open transit data, Mr. Barker mentioned the following:

• There is less development in the open data area currently. The original open data products are being maintained.
• Adding staff to handle open data is challenging. One way this has been overcome is being able to refer to “good press,” which is helping to get support to add staff.

In terms of costs and benefits, the NextBus contract costs $20 per bus per month to make and publish predictions. Further, the public address/two-display countdown signs have to be maintained. However, the ROI is “fantastic,” and many customer applications are free.

The MBTA reports six lessons learned regarding open data:

1. Start small and iterate;
2. Listen to and interact with developers;
3. Know that when you first release the data, there will be a grace period during which you will discover issues;
4. Plan for how you will sustain an open data program;
5. Capitalize on good press. Although it is hard to measure improvements resulting from open data, positive press helps to demonstrate the value of open data; and
6. It is important to maintain the momentum and demonstrate an open data project’s performance.

In addition to numerous customer applications, several visualizations have been developed using MBTA open data. One visualization, shown in Figure 36, “shows a bit more
than 24 hours’ worth of bus location data [on November 4, 2011] with colored lines representing the speed of each vehicle. Red indicates speeds less than 10 miles per hour, yellow is 10–25 mph, and blue is faster than 25 mph. It’s drawn from 2,058,574 data points in all” (114).

Bostonography.com contains several visualizations based on MBTA open data, as shown in Figures 37 through 40. Figure 40 shows the MBTA Orange Line as a 24-hour clock. Each ring is a train station, and the thickness of the line represents the number of people on the train at that time.

Another visualization (Figures 41 and 42) shows “24 hours of MBTA ticket swipes. The color represents the train line and the thickness represent[s] the amount of people riding at that hour” (http://thunderhead.esri.com/readonlyurl/MBTA/MBTA1.html). A visualization shown in Figures 43 and 44 displays the patterns of MBTA commuters across all subway lines and stations, or for a particular line or station over a single day.

In June 2014, another visualization of MBTA data was published at http://mbtaviz.github.io/. This visualization is an in-depth analysis of how MBTA subways operate using a Marey diagram and a heat map that shows the average number of people who enter and exit subway stations for every hour throughout the month of February 2014.

TRANSPORT FOR LONDON (LONDON, UNITED KINGDOM)

As discussed in TCRP Synthesis 91 (1), Transport for London (TfL) began providing open data in June 2010. According to Phil Young, Head of Online at TfL (P. Young, personal communication, Feb. 25, 2014), the chronology of events leading to providing open data through early April 2014 is:

• 2007: Launched embeddable “widgets” for live travel news, map, and Journey Planner
• 2009: Special area for developers launched on TfL website
• 2010
–– Additional real-time feeds launched, with hundreds of developers registered
–– Greater London Authority (GLA): digital advisory board, mayoral and deputy mayor support, mayor’s live event, London Datastore
FIGURE 37 Visualization of MBTA Orange Line ridership for August 12, 2009 (http://vanderlin.cc/projects/mbta_visualizations, accessed on April 3, 2014).
FIGURE 38 Visualization of MBTA Silver Line ridership for August 12, 2009 (http://vanderlin.cc/projects/mbta_visualizations, accessed on April 3, 2014).
FIGURE 39 Snapshot of MBTA subway traffic for 24 hours in 5 minutes (James Kebinger “MBTA in Motion,” August 30, 2009: http://www.youtube.com/watch?v=0tuzjxEBto4).
–– U.K. government: Public Sector Transparency Board 2010, central government launch of data.gov.uk to drive forward the release of “machine readable” data
• 2011
–– London Underground train location and Journey Planner APIs launched. Registered developers rise to more than 1,000.

• TfL will achieve an integrated, coherent, relevant presence across digital media, including
–– Online presence: web, tablet, mobile
–– On-system digital information
–– Open data
–– Social media
–– Digital marketing.
FIGURE 43 Visualization for MBTA Orange Line over 24 hours (“A Day of MBTA” http://adayofmbta.com/).
FIGURE 44 Visualization for all MBTA subway lines over 24 hours (“A Day of MBTA” http://adayofmbta.com/).
• For open data, TfL will deliver “Data openly syndicated to third parties, where commercially, technically and legally feasible, while TfL engages developers where necessary to meet our business objectives.”

Examples of developers’ products include the following:

• London’s Nearest Bus: allows users to find the nearest buses and live departure times from their location. Users can also set individual bus alerts to trigger when a bus is due;
• Station Master: offers detailed accessibility information for every London Underground station;
• Tube Tracker: a multiservice app that finds the nearest station to the user with directions. Provides automatically updated live departure information, a journey planning function, first/last Tubes, and Tube status alerts;
• Colour Blind Tube Map: displays the London Underground map in various formats for easier viewing by people with all forms of vision impairment.

Young reports that a global survey of transport companies by the International Association of Public Transport (UITP) shows that 54% of transport operators are now making their data openly available, with only 6% levying any charge to developers. The industry is also seeing it as an important way in which to demonstrate openness and transparency to customers and stakeholders.

At the same time, TfL is experiencing continued and rapid growth in visits to the agency’s other information services. For example, the agency’s website is getting 8 million unique users a month, and the number of people following the agency’s Twitter feeds has risen to more than 1 million.

The reasons TfL opened its data are as follows:

1. Public data: as a public body funded by fares and taxpayers, the agency’s transport information is seen to be owned by the public.
2. Reach: the agency’s goal is to ensure persons needing travel information can get it wherever and whenever they wish in any way they wish. Open data allows them to extend the reach of their information.
3. Optimal use of transport network: by enabling customers to make more informed choices, TfL makes the most efficient use of the capacity of the transport network.
4. Economic benefit: open data saves customer time (up to £58 million per annum, according to a recent study) and facilitates the growth of small and medium technology companies, generating employment and a highly skilled workforce.
5. Innovation: by having thousands of developers building applications, services, and tools with TfL data and APIs, the agency stimulates innovation through competition.

Each developer must register with TfL to gain access to feeds and APIs. This ensures that the agency can maintain a relationship with them and provide information about changes, maintenance, or new services. In summary, the TfL license states the following.

• Data are free to access and use, including for commercial purposes.
• Applications can be created from the data and commercialized.
• The developer must not pretend to be TfL or use the TfL brand.
• The developer must not “screen-scrape” login-based services.

TfL’s open data are presented in three main ways: (1) static data files (which rarely change); (2) feeds (data files refreshed at regular intervals); and (3) APIs, which enable a query from an application to receive a bespoke response. Data are presented as XML when possible. In addition, TfL is moving toward the use of U.K. standards NaPTAN and TransXChange.

Young describes five lessons learned related to open data.

1. Transport data are in great demand, particularly real-time information, such as bus and train status and arrivals/departures.
2. Developers need support: processes and resources are needed to support, answer queries, and engage to deliver improvements to data and services.
3. Something is better than nothing: developers are highly creative and would prefer the feeds and data to be released early even if not perfect. They can still do great things with it.
4. Open data can improve data quality: by opening up data, feedback is produced, which refines the data quality and can be used to improve source systems.
5. Open data are good value: emerging research indicates that open data provide ROI of up to 50 times the sums invested in terms of customer benefit.

TfL’s next steps in open data are as follows:

• New TfL website in early 2014: built upon the principle of APIs, making integration, build, and open data better and easier to deliver.
• Single TfL API: a single TfL API consolidating all existing and new APIs into a single normalized model.
• App Garden: to showcase applications “powered by” TfL data. This will give consumers choice and ensure apps meet minimum standards. Aspects of the Garden are:
–– Draw in customer ratings from app stores;
–– Improve branding guidelines so customers can see which are “powered by TfL”;
–– Improve control of app developer access through a new access portal; and
–– Minimum standards check before applications are added to the App Garden.

A wealth of TfL open data is available from the London Datastore (http://data.london.gov.uk/taxonomy/organisations/tfl), as shown in Appendix H.

Many applications have been developed using TfL open data. One of the earliest applications was a live map of the London Underground, London’s subway system. This app, which won second place at the 2011 Open Data Challenge competition that featured 430 entries from 24 European Union member states, is shown in Figure 45 (115). The same app developer created a similar real-time map showing bus locations (see Figure 46) (116). The app uses OpenStreetMap and open data from TfL.

Another visualization, shown in Figure 47, is described by Hargreaves: “It takes live position data from the 100s of buses that travel through the TFL network. The lines are the routes and the dots are the individual buses. The contours of the landscape have been exaggerated to emphasise the sense of space within the visualisation” (117).

Another visualization uses smartcard (called the Oyster card) data from TfL to show a day’s worth of transactions, as shown in Figure 48.

“Green indicates the number of passengers in the transit system, whether on a bus or in one of several rail modes. Blue indicates the presence of riders prior to their first transaction of the day or after their last: it is assumed that the location of a rider’s first or last transaction approximates their place of residence. Red indicates cardholders who are between transit trips, whether transferring, engaging in activities, or traveling outside the transit system” (118).

Several additional visualizations using TfL data are shown in Reades (119).

BAY AREA RAPID TRANSIT (OAKLAND, CALIFORNIA)

BART’s open data initiative was first reported by TRB (1). In 1998, BART was approached by students from the University of California at Berkeley who wanted to develop one website with schedules for all 26 transit agencies in the Bay Area. At the time, there was no single source of transit
information. BART knew that it would be virtually impossible to manage schedules for all agencies, so this request was a compelling value proposition. The students could have “scraped” the schedules. So BART gave the students schedules in .csv format. The site built by the students eventually grew and became the Bay Area’s 511.org. It was positive for BART to support a third-party initiative that led to a sustainable regional initiative.

Another request for data came when the Westfield Mall above Powell Street wanted to install a touch-screen kiosk with BART schedules on it. They requested the schedules in a specific format. BART agreed to provide the schedule data. At this point, BART did not consider making anyone sign a contract to access the data on the kiosk or use schedule data in general. This point is critical to understanding BART’s simple use agreement that does not need to be signed.

Because other people began to ask for schedules, fulfilling all of these requests would have been “one-offs” if BART had not opened their data. For example, trying to do an embedded quick planner or electronic displays was not the agency’s core business. Then BART saw GTFS and recognized that using GTFS was a way to solve these problems. The thinking was that if the agency published data in this format, it could take care of requests. So it solved the challenges associated with responding to the data requests.

BART was one of the few agencies to pilot GTFS-realtime, but the agency had already been providing real-time feeds in XML format (the first real-time feed was provided in 2008). This XML feed was updated every 30 seconds and has since been retired. Currently, the use of the agency’s API allows more granular calls that provide data such as car lengths. Several data items of interest are not in GTFS or GTFS-RT, such as station information and load factors. There are more requests about entries and exits with as much granularity as possible.

The overall chronology of BART’s open data program, which has been in existence more than 15 years, is as follows (28, p. 6):

• 1998: Schedules provided in .csv format
• 2005: Embedded Quick Planner (iframe format) (retired in November 2013)
• 2006: DIY display [in HyperText Markup Language (html) format] (retired in November 2013)
• 2007: Delay, elevator advisories (RSS)
• 2008: Real-time estimated times of arrival (ETAs) (XML format) (retired in November 2013)
• 2010: Trip plans, station info (API)
• 2011: Real-time ETAs, advisories (GTFS-realtime)
• 2012: App Map + Geospatial (KML format)

One key aspect of BART’s open data program, according to Timothy Moore (interview, March 20, 2014), is that BART used an organic process to open the agency’s data. BART’s universe of data is manageable because of the agency’s size and because there is only one pick (operator signup) per year.
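
To make the schedule formats discussed in this case example more concrete, the following minimal Python sketch reads data laid out like a GTFS stop_times.txt file, which is a comma-separated text file, and lists the departures at one stop. The column names follow the GTFS specification; the sample rows, the stop and trip identifiers, and the helper name departures_at_stop are hypothetical and are not BART data or BART code.

```python
import csv
import io

# Hypothetical sample in the layout of a GTFS stop_times.txt file.
# Column names come from the GTFS specification; the trip IDs, stop IDs,
# and times are invented for illustration and are not BART data.
SAMPLE_STOP_TIMES = """\
trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:30,STOP_A,1
T1,08:05:00,08:05:30,STOP_B,2
T2,08:15:00,08:15:30,STOP_A,1
T2,08:20:00,08:20:30,STOP_B,2
"""


def departures_at_stop(stop_times_text, stop_id):
    """Return (departure_time, trip_id) pairs at one stop, earliest first."""
    reader = csv.DictReader(io.StringIO(stop_times_text))
    departures = [
        (row["departure_time"], row["trip_id"])
        for row in reader
        if row["stop_id"] == stop_id
    ]
    # Zero-padded HH:MM:SS strings sort chronologically as plain text.
    return sorted(departures)


if __name__ == "__main__":
    for departure_time, trip_id in departures_at_stop(SAMPLE_STOP_TIMES, "STOP_B"):
        print(departure_time, trip_id)
```

Because the layout is shared across agencies, a third-party developer could in principle point the same function at any agency's published file, which is the reuse benefit the case example attributes to adopting a common format.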
Moore also described the value for BART (28, p. 8):

• Cost savings
• Labor reallocation (one BART full-time equivalent)
• Increased ROI from existing web services
• Scale, improved market reach/time to market
• Empowered customers (choice, competition)
• Innovation and “trickle up”
• Increased awareness of BART’s service
• Positive perception: openness/transparency.

BART’s lessons learned regarding open data include the following:

• Releasing the data and adopting standards that gain traction are critical.
• Make the license readable by non-lawyers, and do not require that the license be signed.
• Create multiple paths into using the data:
–– Provide simple tools to the casual user
–– Provide RSS feeds to the medium-level user
–– Provide GTFS-RT and geospatial and the agency’s API to the advanced user.
• Open data are to be documented and released using a transmission format that is accessible.
• Do not play favorites in the market. Do not interfere in determining the most effective apps; the public should be trusted to do that.
• Stay responsive to customer needs and developer needs.
• Provide information to developers to ensure that customers’ needs are being met; synchronizing customer needs and developer skills is critical.
• Promote data and apps by using free advertising and car cards announcing that data are available.
• Open data needs to be part of the agency’s culture. It helps customers understand that BART is not responsible for every app.
• Hackathons can be good for launching an open data program, but they may not be effective on an ongoing basis. Now BART participates in more organized events, such as TransportationCamp West, and uses Google Groups and an e-mail list to stay engaged with developers.

Of the transit agencies in major metropolitan areas, BART has the smallest number of riders per app, as shown in Figure 49.

FIGURE 49 Apps per rider (28, p. 5).

BART will be releasing a new multiplatform, free, smartphone app to report crimes and take pictures of and report suspicious items or hazards. Riders have been asking for a safe, silent and discreet way to communicate with BART when they are on a train or in a station. BART will be the first transit agency to offer both Spanish and Simplified Chinese options for the app. The app will feature a silent photo and flash-free feature and is GPS enabled.

Several visualizations have been developed using BART open data. One example, which was funded by the Knight Foundation, is shown in Figure 50 (http://barthood.news21.com/). The source for this visualization was mainly BART’s station profile study (the raw data can be found at http://www.bart.gov/about/reports/profile).

Several data visualizations address the BART strike in 2013, including ridership and regional traffic impacts (http://enjalot.github.io/bart/). Figure 51 shows one of the visualizations related to ridership by station. Figure 52 shows the passenger trips between Balboa Park and other stations. Figure 53 shows ridership from a different perspective at BART stations.

WORCESTER REGIONAL TRANSIT AUTHORITY—WORCESTER, MASSACHUSETTS

The Worcester Regional Transit Authority (WRTA) in Worcester, Massachusetts, is a transit authority in central Massachusetts with 48 fixed-route buses and 50 paratransit vehicles. Beginning in 2009, the WRTA embarked on a program to implement technology on all of the agency’s vehicles, resulting in, among other things, providing real-time information to the public. Once the technology implementation was completed, the agency’s information technology consultant, Christopher Hamman, began working on opening the agency’s data. The reason the WRTA decided to open data was to show transparency and encourage developers to create new and better applications for their riders (Interview with Christopher Hamman on March 20, 2014). In addition, the administrator of the agency is visionary and fully supported opening the data to the public to get applications in the hands of customers. The WRTA pursued developers who had created apps for the
• Excel timetable format for each route
• Microsoft Access table
• Images and maps
• Schedules in portable document format (pdf).

Other techniques used by the WRTA include the following:

• Using source control, such as GitHub, to provide quality assurance and control for each release.
• Modifying the website to publish to RSS, Twitter, Facebook, and WordPress automatically. These free broadcast media expand the reach of real-time updates to riders.
• Using developer’s API.
• Implementing a do-it-yourself (DIY) kiosk program with nine community partners, including schools and social services (e.g., Quinsigamond Community College and Family Health Center). The WRTA licenses the asset (e.g., electronic sign showing both WRTA and partner’s information) to each community partner. This allows partners to show their internal stakeholder information and WRTA bus times and transit-related information. Further, it is a less expensive way of getting WRTA information disseminated. So far, 15 kiosks have been deployed.
• WRTA’s Open Checkbook initiative (http://www.therta.com/about/open-checkbook/).

The Worcester Regional Transit Authority is committed to providing citizens with open and transparent government. As part of this proactive approach to civic engagement, the WRTA has developed this Open Checkbook webpage. Open Checkbook is meant to be a window into the authority and to provide the public with access to the authority’s spending information. Open Checkbook will detail vendor payments, identifying who was paid and when, how much was paid, and what was the purpose of the payment.
chapter eight

CONCLUSIONS

SUMMARY OF PROJECT SCOPE

The primary purpose of this synthesis is to determine transit’s experience with open data, how agencies have opened their data, and the uses of the data. A survey was used to collect key information about open transit data and was sent to 67 transit agencies around the world. There was a 100% response rate. Of the 67 surveys received, three were from Canadian agencies and 14 from European agencies.

The project examined and documented the state of the practice in open data using the following five elements:

• Characteristics of open transit data
–– Reasons for choosing to provide open data
–– Standards and protocols for providing open data
–– Underlying technology used to generate the open data
• Legal and licensing issues and practices
–– Legal and licensing issues
–– Public disclosure practices
• Uses of open data
–– Applications
–– Decision-support tools
–– Visualizations
• Costs and benefits of providing open data
• Opportunities and challenges
–– Techniques for engaging users and reusers of data
–– Challenges associated with providing open data
–– Impacts on transit agencies and the public and private sectors

The project was conducted in four major steps as follows:

• Literature review;
• Survey to collect information on a variety of factors;
• Analysis of survey results; and
• Interviews conducted with key personnel at agencies that have experience with open transit data.

This section of the report contains the project’s findings, lessons learned, and conclusions.

PROJECT FINDINGS

Key statistics from the study are as follows:

• Fifty-seven or almost 83% of the survey respondents provide open data; and
• The top four reasons for not providing open data are:
–– Too much effort to produce the data/not enough time or people to do the work required;
–– Too much effort to clean the data;
–– Concern that the agency cannot control what someone will do with the data; and
–– Concern regarding the accuracy of the data.

KEY FINDINGS

In summary, based on the literature review, the responses to the questionnaire, and the case examples, there are four key findings of this synthesis project:

1. Although the costs of providing open data are not well understood, the benefits to the agency, public, and community strongly support open transit data. The availability of open transit data encourages innovation that could not be accomplished solely by agency staff. The rapid creation of new mobile and Internet platforms, requiring new information technology (IT) development, would create a strain on typically limited agency resources. By focusing the limited resources on providing accurate, reliable, and timely open data, an agency can cost-effectively provide its information to the public, relying on third parties (e.g., application developers) to create customer applications and conduct data analyses.
The overall benefits experienced by survey respondents included the following:
• Increased awareness of the agency’s services;
• Empowerment of customers;
• Encouragement of innovation outside of the agency;
• Improvement in the perception of the agency (e.g., openness/transparency);
• Provision of opportunities for private businesses;
• Encouragement of innovation internally;
• Improvement in our market reach;
• Greater efficiency and effectiveness as an agency;
• Increased return on investment (ROI) from existing web services;
• Cost savings; and
• Ability to reassign staff.
The Moscow case study shows the power of open data to change decision processes.
2. Engaging application developers, other data users, and customers is an approach that can accomplish several critical tasks, including:
• Obtaining feedback on data anomalies and data quality issues;
• Ensuring that some portion of the applications developed by third parties meet the needs of customers; and
• Learning more about how people want to use/reuse agency data.
There are several ways to engage developers and customers. Results of the survey indicated that the most effective methods are conducting face-to-face events, conferences, and “meetups.” Meetups are informal meetings to discuss particular topics, such as app development. For example, Mobility Lab in Arlington, Virginia, hosts meetups to discuss transportation issues and support programmers who are interested in transit; biking and walking; and open data, data visualization, and mapping.
3. The results of the literature review and survey indicate that standards and commonly used formats are to be used to facilitate the generation and use of the open data. The literature discusses how standards are used to generate the open data, such as the case of many scheduling software packages providing schedule data in General Transit Feed Specification (GTFS) format. Further, using standards makes it easier to transfer applications from one agency to another, which was the case when Worcester Regional Transit Authority (WRTA) was looking for applications; it was easy to take applications that were developed for Chicago Transit Authority (CTA) and adapt them for WRTA because of the standards used. Without standards, planning and operations analyses, such as those described by Wong (19) and Catalá et al. (75), could not be accomplished easily.
4. Opening transit data results in innovation that could not be accomplished within a transit agency. That is not to say that the intellect does not exist in a transit agency; it is an issue of having sufficient resources to develop applications and conduct analyses at the scale that can be done in an open market. Stephen Goldsmith, in the article “Open Data’s Road to Better Transit” (102), mentions that “some members of the American Public Transportation Association believe that open data initiatives have catalyzed more innovation throughout the industry than any other factor in the last three decades.”

FINDINGS BASED ON FIVE ELEMENTS

directly with the high use of GTFS, which requires these data elements, among others. Further, these data types are required to perform trip planning, which is the subject of many customer applications developed using open transit data. The next most common type of open data is real-time information, which is provided by more than half of the survey respondents. This corresponds to the use of either GTFS-realtime or Service Interface for Real Time Information (SIRI) by almost half of the survey respondents.

The most prevalent underlying technologies that produce open data are scheduling software, geographic information system (GIS) software, computer-aided dispatch (CAD)/automatic vehicle location (AVL), and real-time arrival prediction software. This finding is expected, given the types of open data reported by survey respondents.

The overwhelming reasons for opening transit data are related to customer information: increasing access to this information, and improving the information and customer service. This result corresponds with almost all of the survey respondents indicating that providing open data is a way to maintain or increase ridership. Improving perception of the transit system and fostering innovation were the next most frequently reported reasons for opening data.

The factors that went into the decision about what data to open were driven primarily by the ease of releasing the data (more than half of the survey respondents indicated this). The next two most prevalent decision factors were observing what other transit agencies have done regarding open data and deciding internally without asking any groups outside the agency.

A variety of standards and formats are being used, including GTFS (47 or 83.9% of respondents), Extensible Markup Language (XML) (26 or 46.4%), and comma-separated values (.csv) (18 or 32.1%), followed by GTFS-realtime (15 or 26.8%). The degree of openness in the four categories mentioned is as follows:

• Thirty-two or 57.4% of the respondents reported that the data are completely open (everyone has access).
• Forty-seven or 83.6% reported that the data are available in formats that are easily retrieved and processed.
• Forty-nine or 87.3% reported that there is no cost for the open data.
• Forty-three or 79.2% reported that there are unlimited rights to use, reuse, and redistribute data.
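
The counts and percentages reported above for standards and formats (GTFS, XML, .csv, and GTFS-realtime) do not state how many respondents answered that question. As a worked check, the short Python sketch below backs out the implied number of respondents from each count and percentage pair and then recomputes the shares. The figure of 56 respondents is an inference from the reported numbers, not a value stated in the survey results, and the function names are hypothetical.

```python
# Counts and percentages as reported in the text for standards and formats.
REPORTED = {
    "GTFS": (47, 83.9),
    "XML": (26, 46.4),
    "CSV": (18, 32.1),
    "GTFS-realtime": (15, 26.8),
}


def implied_denominator(count, percent):
    """Estimate how many respondents answered, given a count and its share."""
    return round(count / (percent / 100))


def share(count, total):
    """Percentage of total represented by count, to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for name, (count, percent) in REPORTED.items():
        total = implied_denominator(count, percent)
        print(name, "implied respondents:", total, "recomputed share:", share(count, total))
```

All four pairs imply the same base of 56 respondents, and recomputing each share from that base reproduces the reported percentage, which suggests these four figures were calculated on a common denominator.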
for missing or incorrect data. Almost 60% (16 responses of 27) of respondents indicate their agency requires acknowledgment of a license agreement before data can be accessed. Only one respondent reported agency legal issues resulting from the release of open data to the public.

According to the respondents, the top three steps that agencies took to publicly disclose data are to (1) convert transit data into formats suitable for public use; (2) improve data quality to ensure accuracy and reliability; and (3) adopt an open, nonproprietary data standard.

Uses of Open Data

The top five types of customer applications that have been developed as a result of providing open data are (in descending order of frequency) trip planning, mobile applications, real-time transit information (arrival/departure times, delays, detours), maps, and data visualization. The top five decision-support tools that have been developed are data visualization, service planning and evaluation, route layout and design, performance analysis, and travel time and capacity analysis.

Almost two-thirds (33 or 63.5%) of respondents reported their agencies do not track usage of open data. The two most prevalent methods of tracking are to monitor data downloads and to keep track of applications developed. For mobile applications, an equal number of respondents reported Android and iOS applications. Sixteen respondents reported a total of almost 266 million API calls per month.

Costs and Benefits of Providing Open Data

The top five types of costs associated with providing open data are staff time to update, fix, and maintain data as needed; internal staff time to convert data to an open format; staff time needed to validate and monitor the data for accuracy; staff time to liaise with data users/developers; and web service for hosting data. Almost 90% (43 or 89.4%) of respondents could not quantify how much time is spent on any of these activities.

Although activities required to provide open data were identified by respondents, resource requirements varied widely. There was limited information regarding the actual labor required from specific staff in the organization or the costs associated with open data.

Finally, the top three benefits experienced by survey respondents are (1) increased awareness of their services, (2) empowerment of customers, and (3) encouragement of innovation outside of the agency. Almost 70% (33 or 69.6%) of the respondents engage or have a dialogue with existing and potential data users and reusers. Twenty-five or 75.8% of the respondents engage data users and reusers to obtain feedback on data anomalies and data quality issues. Twenty-four or 60% of the respondents use face-to-face events to engage these groups.

Opportunities and Challenges

In terms of impacts on the agency and the public and private sectors, the majority of impacts reported by respondents were positive. The organizational impacts on the agency ranged from increased transparency to better and more accurate internal data to lower costs to provide information. Impacts on the customer were numerous, including better and more accessible information for customers; better perception, visibility, and awareness of services; and improved customer satisfaction. The majority of negative impacts were related to resources needed to maintain an open data program.

In terms of impacts on the public, creating and improving access to additional and higher quality public services was mentioned, along with improving public perception/image of transit, making transit more competitive, providing better regional coordination of services, encouraging innovation, and providing a better transit experience.

The impacts on the private sector are primarily providing business/commercial and development opportunities, including new and expanded companies (e.g., creating a new ecosystem of private entrepreneurs), enabling innovation and the creation of applications that may not have been created by the public sector, and adding value to existing public services.

Challenges were noted by survey respondents in five areas, as follows:

• Resources and organizational issues, which largely consist of limited resources and securing support for an open data program;
• Data quality and timeliness issues, which largely describe having to ensure data quality, completeness, timeliness, accuracy, and equity;
• Standards and formatting issues;
• Marketing issues relating to making the open data known and addressing branding issues;
• Technical issues, which consist of tracking users, including who has built what apps and how successful they have been; complying with developers’ wishes; how to provide large amounts of data in a timely manner; and having a process in place to make the data available when new schedules are released.

LESSONS LEARNED

Respondents cite the following lessons learned:

• Data quality and accuracy
–– Put quality checks in place when opening data
–– Test the data before releasing it to developers
–– Start small in terms of the amount of open data offered and then grow that when confident of data quality of new sources/data sets
–– Ensure data are compatible with or identical among the different formats in which they are made available
• Cost of open data
–– Staff to support an open data program is needed to implement such a program
–– Use standards to make it easier to provide open data
–– Select a technology vendor that supports open data or require it in the contract with the vendor
• Organizational and institutional effects, including changes within and external to the agency
–– Agencies have to get comfortable with providing data when they are accustomed to providing only transit service.
–– Open data will not solve every customer requirement.
–– Customers will recognize which third-party services are most effective, and they will not hold the agency responsible for poor third-party services.
–– It is important that agencies not interfere with the market to ensure that the benefits of competition can be realized.
–– An open data program should be supported by a project champion.
–– Staff roles must be carefully assigned.
–– Buy-in from coordinating agencies is crucial.
–– Open data are a fundamental part of an overall information system.
–– Agencies must ensure that data reuse complies with public policies.
• Developing relationships with and engaging data users and reusers:
–– Early engagement with potential users is key. Find out what they want and how they want it. Try to track who is developing what, particularly to understand the successes and failures.
–– Respond quickly to opportunities. Developers work on much shorter schedules than planners.
–– Make it as easy as possible for developers to access data, and make the license understandable.
–– Developers will help determine the quality of the data if the agency provides a forum for this type of feedback.
–– Developers will know the latest mobile platforms and can use these with the data.

CONCLUSIONS

Several conclusions can be drawn from the results of the synthesis.

• The benefits of providing open transit data far outweigh the costs. Benefits accruing to the agency itself, customers, and the public and private sectors are far-reaching. Several of the survey respondents discussed using open data as a way to improve their agency’s ability to conduct analyses internally and the perception of public transit within the community. In addition, the agency’s transparency as a result of open data has had more positive than negative effects. In a time when public agencies are being scrutinized more than ever, providing data about their operations and internal processes reflects overcoming the old thinking that data should not be released beyond providing paper schedules.
–– The impacts of open transit data on customers and the general public are significant. Now customers (and those who have not yet taken transit) have free tools that essentially break down the barriers to using transit, such as interpreting a paper schedule or map. Further, real-time information makes it easier to plan trips. In addition, the tools resulting from open data satisfy the desire many people have for obtaining travel information almost instantaneously. However, one important factor in assessing the customer and public impact is ensuring that the tools being developed actually fulfill the customers’ needs.
–– The impacts on the private sector have been encouraging over the past several years. Applications and visualizations that perhaps could not have been conceived or developed by a transit agency have been created. These apps have changed the nature of travel, where in some cases, the public transit option is more prominent and understood. Further, this has resulted in businesses being established that may not have existed if not for open transit data. Finally, developers are creating innovative ways in which to analyze the data, resulting in potential improvements in service.
• The legal fears often thought to be barriers to opening transit data have not been realized. The survey results show that only one agency responding to the survey experienced any legal issues resulting from the release of open data to the public. The literature, survey results, and case examples indicate that simple agreements with data users and reusers can accomplish what is needed to ensure proper use and distribution of the data, along with rules regarding the use of logos, images, and so forth. In addition, as stated in the survey responses, having a plan in place to handle irresponsible users is critical. For example, several techniques for managing irresponsible users included contacting developers and discussing the problem, and limiting or terminating access to the data.
• Standards greatly facilitate the use of open transit data. Although this sometimes requires additional effort in producing the open data, it makes it much easier for the data to be used. Clearly, from the literature review and survey, GTFS has become a de facto standard, with at least 726 agencies using it. Further, it is being used, as reported, in a number of agencies that are just beginning to open their data, particularly those that have provided only paper-based data (e.g., schedules) for fixed-route services (and in some cases, no information at all about other services). In addition, the use of standards has facilitated traditional planning and analysis
of transit data, as reported extensively in the literature. Further, even vendors with proprietary products have developed “translators” that reformat the data within their software to one of the standard formats. Finally, standards are still evolving. Open standards and open tools, such as GTFS, OpenStreetMap, and OpenTripPlanner, have led the way in the transit industry and are being used extensively to create new applications.
• Engaging with data users and reusers has the potential to increase the value of the applications and visualizations that are developed. Engaging with developers and the public will ensure that developers are taking customers’ needs into consideration. Further, there are many different ways to engage users. In addition, the survey responses indicate that methods of engagement might be based on the sophistication of the agency in terms of open data.
• Several factors lead to a successful open data program.
–– Obtaining and maintaining management-level support for such a program and avoiding bureaucratic delays. This factor speaks to embracing transparency, realizing that transit will be more visible in the community and that there is the potential to improve the perception of transit as a result of providing open data.
–– Recognizing the need for the appropriate level of resources needed to provide and maintain open data.
–– Establishing ways to monitor data accuracy, timeliness, reliability, quality, usage, and maintenance is a key component of an open data program. Making a decision as to whether each application based on the open data will be tested is part of this factor. Some agencies let the market decide if an application is good or not, and others test each application.
–– Creating and maintaining licensing or registration to ensure that if a data user or reuser is misusing the data, action can be taken with minimal effort. As suggested by Bay Area Rapid Transit (BART) and Massachusetts Department of Transportation (MassDOT), a license or registration should be simple, conveying the basic principles associated with using the data.
–– Having an ongoing dialogue with developers and customers regarding the open data program has been shown to increase the value of the data and products that are based on the data.

SUGGESTIONS FOR FUTURE STUDY

Based on the survey results and literature review, the following areas are suggested for future study:

• Using open data to support performance measurements. Although the literature covers visualizations that examine transit performance, there is no guidance for transit agencies regarding effectively using open data to perform these types of analyses.
• Amount of staff time that is required. This will vary by the volume of data opened and depends on the system architecture. A study that examines just how much time is required by various departments and staff and a discussion of the actual costs associated with an open data program could be helpful. For example, if a tool is used to export the data from a scheduling system, this one-time investment facilitates reuse afterward (effectively lowering ongoing costs). Such a study should include examining how much time and resources are spent on public records requests, which sometimes are made when open data are not available.
• Guidance describing each step in setting up an open data program. Such a document would contain sections relating to the factors mentioned in the final conclusion and detail about the process that many agencies use to issue open data when there is a change to their data (e.g., when there is a schedule change) and when they provide new data elements. In addition, this guidance could contain items describing various types of engagements with data users and reusers.
• Guidance to use GTFS to depict nonfixed-route transit services. Although work is being conducted by organizations, such as The World Bank and Ride Connection in Portland, Oregon, more guidance would be helpful in this area, given the large number of demand response systems in the United States.
• Guidance to create accessible applications. Guidance for application developers could include the key elements of an accessible application. Because this issue has not been directly addressed by the Americans with Disabilities Act (ADA), it is a topic that could be helpful to developers as they are conceptualizing their products.
• Open fare system data. Most open transit data are operational in nature, but such data do not always contain fare information because of the sensitive nature of the data. With the proliferation of electronic fare collection, particularly smartcard and mobile fare payments, developing and using open fare data is of interest to transit agencies.
• Changing the corporate culture. There is a gap in knowledge regarding how to show that open transit data have value to the corporate culture within the transit industry. Further, describing practices that encourage the development and dissemination of open transit data, rather than hindering them (e.g., bureaucratic processes), would be helpful.
• Open transit data from the developers’ perspective. This report is written primarily from the transit industry experience with open transit data. More information is needed from the developers to better understand their needs and concerns.
• Policies on app centers. Transit agencies could use guidance on the policies regarding making applications available on their websites; this is another gap in the literature. It would be helpful if this guidance included information about how to determine which apps are included in an app center, disclaimers to consider, and so forth.
• Visualization and other open transit data tools. There is an evolution of tools that use open data to visualize important aspects of transit operations (e.g., performance), but there is limited information about these tools.
• Open transit data ROI. Although the literature contains information discussing the value of open transit data, there is a gap in describing how to calculate ROI. If open transit data have value, it is helpful to quantify it to factor into decision making about which data sets to open and how to open them. For example, the Open Government Portfolio Public Value Assessment Tool (PVAT), described earlier in the report, can help agencies determine the public value of their open government initiatives.
• How to use metadata. Metadata, which describes the characteristics of data, is a critical part of any data set, including open transit data. The use of metadata in open transit data is not covered in the current literature. For example, Metropolitan Transportation Authority (MTA) in New York is conducting a travel pattern project in which mobile phone metadata are being analyzed to understand trip flows.
• Crowdsourcing to combine open transit data with other data and open source software evolution. This is another gap in the current literature. For example, as mentioned in the report, TriMet created a new open source, multimodal trip planner (OpenTripPlanner) that uses OpenStreetMap (OSM), a crowdsourced open data set designed for routing. Guidance to agencies regarding the use of crowdsourced data/information and open source software would be extremely valuable as it would help agencies move away from proprietary solutions.
• Open transit data as an element of enterprise architecture. Information on this topic is lacking in the literature and would be most helpful to transit agencies for accommodating open data as they build or rebuild their IT infrastructure. This includes identifying automation (e.g., automated generation of open data from scheduling software), relational database management systems and use of a cloud-based IT framework to facilitate inclusion of open transit data.
• Procurement processes that support open transit data and open source software. The literature and transit industry practices are lacking regarding how to procure solutions that support open data and open source software. Guidance for agencies to procure “best value” solutions that support open data would be helpful to move the industry away from proprietary solutions.
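
As an illustration of the automation mentioned under enterprise architecture (automated generation of open data from scheduling software), the following is a minimal sketch only, not a description of any agency's or vendor's process. It assumes a hypothetical scheduling-system export in which each stop is a record with the keys "id", "name", "lat", and "lon"; those key names, the function name stops_to_gtfs, and the sample stops are invented for this example. The output columns (stop_id, stop_name, stop_lat, stop_lon) are the stop fields defined for the stops.txt file in the General Transit Feed Specification.

```python
import csv
import io

# Column names used by the GTFS stops.txt file.
GTFS_STOP_FIELDS = ["stop_id", "stop_name", "stop_lat", "stop_lon"]


def stops_to_gtfs(records):
    """Render stop records as the text of a GTFS stops.txt file.

    `records` is a HYPOTHETICAL export format: dicts with the keys
    "id", "name", "lat", and "lon". A real scheduling package will
    use different names, so this mapping is the part to adapt.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=GTFS_STOP_FIELDS, lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow({
            "stop_id": record["id"],
            "stop_name": record["name"],
            # Six decimal places is an illustrative choice of precision.
            "stop_lat": f"{record['lat']:.6f}",
            "stop_lon": f"{record['lon']:.6f}",
        })
    return buffer.getvalue()


# Made-up sample data, for illustration only.
SAMPLE_RECORDS = [
    {"id": "1001", "name": "Main St & 1st Ave", "lat": 42.3601, "lon": -71.0589},
    {"id": "1002", "name": "Central Station, Bay 2", "lat": 42.3655, "lon": -71.0612},
]

if __name__ == "__main__":
    print(stops_to_gtfs(SAMPLE_RECORDS), end="")
```

Using the csv module rather than joining strings by hand means that stop names containing commas are quoted correctly, which matters for any feed that downstream developers will parse automatically. A production export would also need the other files a GTFS feed requires (routes, trips, stop times, and so on), which this sketch does not attempt.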
APPENDIX A
Survey Questionnaire
APPENDIX B
Agencies and Staff Titles Responding to the Survey
IT Manager
Ann Arbor Area Transportation Authority
Ann Arbor, Michigan

CTO
Charlotte Area Transit System (CATS)
Charlotte, North Carolina
APPENDIX C
Transportation Organizations and Conferences
Discussing and Promoting Open Transit Data
Many transportation organizations are discussing and promoting open transit data. They include:

• Mobility Lab in Arlington, Virginia, sponsors events like Transportation Techies (a group for programmers interested in transit, biking and walking) Hack Nights, in which applications and visualizations using open transit data are created and discussed. For example, on April 3, 2014, “Bus Hack Night” was held at Mobility Lab to discuss bus technologies and ways to use data that is collected from buses (http://www.meetup.com/Transportation-Techies/, accessed on April 1, 2014).
• The subject of open public transport data was prominent in IT-TRANS, an international conference and exhibition on Information Technology (IT) Solutions for Public Transport organized by Karlsruher Messe- und Kongress-GmbH (KMK) and the International Association of Public Transportation (UITP) that was held from February 18–20, 2014.
• TransportationCamp, a series of unconferences that started in 2011 and have been held in Washington, D.C.; San Francisco, California; New York, New York; Montreal, Quebec, Canada; Cambridge, Massachusetts; and Atlanta, Georgia, has contained many sessions related to open transit data.
• Several transit agencies, including the Massachusetts Bay Transportation Authority (MBTA) and Metropolitan Transportation Authority (MTA) in New York City, have held challenges and contests for application developers since 2009.
• Between February 22 and April 30, 2014, U.S. DOT sponsored a Data Innovation Challenge, which promoted and recognized the creation of the most innovative applications, tools and visualizations of publicly-available transportation data.
• Open transport is now the subject of conferences such as the Open Data Transport day organized by a French association of transport authorities and held on June 3, 2013 (C1).
• Data Days 2014 conference, which was held February 17–19, 2014, in Ghent, Belgium, covered the idea that open transport is creating a path to more efficient multimodal mobility. “Personal urban mobility is changing. Cars are no longer ‘# 1’ anymore, and commuters are less exclusive in the way they move around in their daily life. Indeed, they use bike-share systems, ride a bus to the train station, or take a taxi back home when working late. They might also do something entirely different the following day depending on such day’s particular circumstances like weather, traffic, or personal engagements. In this new context, having accurate information and data has become increasingly useful. Thus, it is good news that as never before, we can now benefit from so many ready-to-use technologies [smartphones, global positioning system (GPS), mobile Internet, etc.]. Major transport companies (national railway, local transport or taxi companies) have begun implementing apps to help people find their way around and to plan the best route possible. However, the best solution is yet to be developed: one that would meet every single need. Maybe it is not their responsibility anymore because our needs are getting more individual and unique. Or maybe, simply because the ‘ultimate’ solution will come from the users themselves or from the infinite creativity of the internet business world” (C2).
• OpenDataLab contest sponsored by Régie Autonome des Transports Parisiens (RATP) (http://www.tom.travel/2013/05/opendatalab-la-ratp-recompense-les-applications-voyageurs/) to promote the creation of new applications for travelers. This event had 110 participants in 12 teams. There were three winning projects.
• Tampere (Finland) region’s organizations, specialists and developers are committed to an open data approach. There have been four “Open Data Tampere Meets” since early 2013 (C3).
• ITS Innovation Stockholm Kista is an “innovation competition which is organised by The City of Stockholm, Swedish Transport Administration, Stockholm Public Transport, Swedish ITS Council and Kista Science City, and financed by Sweden’s Innovation Agency. [It] is arranged as a precommercial procurement, the first ever in Sweden. The challenge for competitors is to develop innovative solutions that meet the demand of more effective travels and transports to and from the outer Stockholm district Kista. In the long run, solutions are to be scalable and equipped with proficient business models so that they will serve citizens in the larger Stockholm region after competition closure. In order to facilitate data access, the competition organisers have developed a data market where competitors get free access to some forty datasets through one API” (C4).
APPENDIX D
Total Annual Ridership for Each Responding Agency
This section contains the total annual ridership for each responding agency.
*Figures come from the National Transit Database, Annual Transit Profiles (available at
http://www.ntdprogram.gov/ntdprogram/data.htm), or directly from survey responses.
**N/A means that total ridership was not available.
APPENDIX E
Example Developer License Agreements and Terms of Use
This section contains examples of agency developer license agreements and terms of use.

• Bay Area Rapid Transit (BART): http://www.bart.gov/schedules/developers/developer-license-agreement
• Champaign-Urbana Mass Transit District (CUMTD): https://developer.cumtd.com/terms-of-use
• Chicago Transit Authority (CTA): http://www.transitchicago.com/developers/terms.aspx
• City of Madison Metro Transit: http://www.cityofmadison.com/metro/Apps/terms.cfm
• East Japan Railway Company: http://www.jreast.co.jp/e/termsofuse/index.html
• GO Transit: http://www.gotransit.com/public/en/schedules/goapps/web/goapps.aspx
• King County Transit: http://www.kingcounty.gov/transportation/kcdot/MetroTransit/Developers/TermsOfUse.aspx
• Massachusetts Department of Transportation (MassDOT): https://www.massdot.state.ma.us/DevelopersData.aspx
• Metropolitan Atlanta Rapid Transit Authority (MARTA): http://www.itsmarta.com/developers/data-sources.aspx
• Metropolitan Transportation Authority (MTA) (New York City): http://web.mta.info/developers/
• New Jersey Transit (NJT): https://www.njtransit.com/mt/mt_servlet.srv?hdnPageAction=MTDevLoginTo
• Southeastern Pennsylvania Transportation Authority (SEPTA): http://www2.septa.org/developer/
• Transport for London (TfL): http://beta.tfl.gov.uk/corporate/terms-and-conditions/transport-data-service
• TriMet: http://developer.trimet.org/
• VIA Metropolitan Transit: http://www.viainfo.net/Opportunities/DevLicense.aspx
• Washington Metropolitan Area Transit Authority (WMATA): https://www.wmata.com/rider_tools/license_agreement.cfm
• York Region Transit (YRT): http://yrt.ca/en/aboutus/GTFS.asp
APPENDIX F
Examples of Open Transit Data Applications
This section contains examples of open transit data applications.
• BART: http://www.bart.gov/schedules/appcenter
• CUMTD: http://www.cumtd.com/maps-and-schedules/app-garage
• CTA: http://www.transitchicago.com/apps/
• City of Madison Metro Transit: https://www.cityofmadison.com/metro/apps/
• East Japan Railway Company: https://play.google.com/store/apps/developer?id=East+Japan+Railway+Company+ICT (Android only)
• GO Transit: http://www.gotransit.com/public/en/schedules/goapps/web/goapps.aspx
• King County Transit: http://www.kingcounty.gov/transportation/kcdot/MetroTransit/Developers/AppCenter.aspx
• MARTA: http://www.itsmarta.com/developers/app-station.aspx
• MBTA: http://www.mbta.com/rider_tools/apps/Default.asp
• MTA: http://web.mta.info/apps/
• SEPTA: http://appsforsepta.org/apps
• Transport for London (TfL): http://data.london.gov.uk/search/node/TfL
• TriMet: http://trimet.org/apps/
• VIA Metropolitan Transit: http://www.viainfo.net/BusService/Mobile.aspx
• WMATA: http://developer.wmata.com/Application_Gallery
APPENDIX G
Examples of Applications from Transport Innovation
Deployment for Europe (TIDE) Project
APPENDIX H
TfL Open Transit Data Available from the London Datastore
This section contains open transit data available from the London Datastore (http://data.london.gov.uk/taxonomy/organisations/tfl).
• TfL Live Traffic Cameras: Live traffic camera feeds of London’s streets
• TfL Station Locations: Locations and facilities of London’s Underground, Overground, and DLR stations
• Number of Bicycle Hires: Total number of hires of the Barclays Cycle Hire Scheme, by day, month, and year
• Public Transport Journeys by Type of Transport: Number of journeys by TfL reporting period, by type of transport. Data are broken down by bus, underground, DLR, tram, and Overground.
• London Underground Performance Reports: London Underground periodic performance reports, and TfL’s key London Underground performance measures
• Vehicles entering c-charge zone by month: Total number of vehicles that entered the Congestion Charging Zone during charging hours
• Cycle Flows on the TfL Road Network: Cycle flows on the Transport for London Road Network (TLRN)
• Transport Crime in London: Number of crimes and crime rate by type of public transport, including bus, LU/DLR, London Overground, and London Tramlink
• Bus Use and Supply Data 1999–2022: Technical analysis for London Assembly Transport Committee report on bus services in London (October 2013)
• Number of Buses by Type of Bus in London: Number of buses in the TfL fleet by type of bus in London
• Key Performance Indicators on the TfL Road Network: Number of hours of Serious and Severe Disruption on the road network by planned and unplanned status, journey time reliability, total number of works undertaken on the road network, and number of cycle hires with average hire time
• Killed and Seriously Injured Casualties, London: Killed and seriously injured casualties in Greater London 1994–2012
• Public Transport Accessibility Levels: Transport for London’s 2008 Public Transport Accessibility Levels (PTALs)
• GLA Group Land Assets: Data showing the location of the GLA Group’s land and property holdings and development opportunities
• Low Carbon Generators: Listing of low carbon energy generators installed on GLA group properties
• TfL Rolling Origin and Destination Survey: Rolling programme to capture information about journeys on the London Underground network
• TfL Complaints Reports: The first TfL Complaints Report for 2011/12
• TfL Live Bus Arrivals API: Developer’s API for the Live bus arrivals data
• Travel Patterns and Trends, London: A summary of the key travel patterns and trends relating to the TfL network and Airports
• Tube Network Upgrade Data from Transport Committee Report: Data on the Tube upgrade programme between 2003/04 and 2010/11, requested by the London Assembly
• Tube Network Performance Data from Transport Committee Report: Data on the performance of the Tube between 2003/04 and 2010/11, requested by London Assembly
• Journey Planner API Beta: Developer’s API for the TfL Journey Planner
• Cycle Hire availability: The TfL feed contains details of Barclays Cycle Hire docking stations including the number of available bikes and docking points.
• TfL Expenditure Over £500: Transport for London expenditure over £500
• Tube Departure Boards, Line Status, and Station status: Train prediction and London Underground station and line status
• TfL Station Facilities: A Geo-coded KML feed of most London Underground, DLR and London Overground stations.
• Dial a Ride Usage Statistics: Quarterly Report, by London Borough, of Dial-a-Ride usage
• Coach Parking Locations: Details of coach parking facilities and other useful information for coach drivers in London
• Estimated Number of Londoners with Reduced Mobility: Estimated number of Londoners with reduced mobility in 2010, 2018, and 2031
• TfL Live Roadside Message Signs: Information on the location and live message on every sign currently displaying information in London.
• TfL Live Traffic Disruptions: Information about the location, nature, impact and timing of a range of disruptions being monitored by TfL’s 24/7 traffic control centre
• TfL Timetable Listings: Multimodal working timetable data from the TfL Journey Planner, including Tube, Bus, and DLR
• TfL Cycle Hire Locations: Name and coordinates of the Barclays Cycle Hire docking station locations launched in Summer 2010
• TfL Bus Stop Locations and Routes: Locations of more than 19,000 bus stops and details of all of TfL’s contracted bus routes
• Oyster Ticket Stop Locations: Locations of more than 3,800 Oyster Ticket Stop outlets across London
• London Underground Signals Passed at Danger: The number of signals passed at danger by London Underground or on London Underground’s infrastructure
• TfL Pier Locations: Details of TfL pier locations
• River Boat Timetables: Timetables for river boat crossings
• Accessibility of London Underground Stations: Detail relating to the physical accessibility of London Underground stations
• TfL Investment Programme 2009/10 to 2017/18: Data incorporated in Transport for London’s Investment Programme Reports 2009/10
• TfL Business Plan 2009/10 to 2017/18: Data incorporated in Transport for London’s Business Plan