Optimization of The Public Transport System Using Data Analysis Methods
Abstract: The movement of public transport in an urban environment is investigated using modern methods of data analysis. This approach makes it possible to optimize the operation of public transport and thereby improve the quality of life of the population. For the first time, the problem is addressed with an interdisciplinary approach that combines elements of linear algebra, graph theory and statistical data processing. Solutions for planning the topology of a public transport system are proposed that deal with duplicating routes according to their duplication coefficients. The practical task of analyzing the public transport traffic pattern in order to identify duplicate routes was solved for the city of Izhevsk (Udmurt Republic). The studied transport system includes 323 stopping points and 55 routes of urban public transport. The numerical calculations used the information published on the site of the interactive geoinformation system (IGIS). The route duplication coefficients were calculated by methods of correlation analysis, and their significance was assessed with Student’s criterion. Based on the coefficients and the duplication variants, specific methods for planning the topology of a transport network are proposed. The calculations show that, to overcome route duplication in the public transport system of Izhevsk, more than 28 routes should be synchronized, and 6 public transport routes should be replaced by 3 routes as a result of their partial merging.

Keywords: public transport, route duplication coefficient, traffic optimization, route matrix, Student’s test

Thus, to solve the existing problem of optimizing the operation of urban public transport, the set of methods in use has to be expanded, in particular by developing a methodology for determining route duplication in the urban transport network. The research results can be used in modeling and monitoring the business processes of various transport systems [9, 10].

The purpose of the research is to determine the duplication of public transport routes in order to develop recommendations for optimizing the city’s transport system. The technique has been tested on the transport network of the city of Izhevsk in the Udmurt Republic.

II. PROBLEM FORMULATION

Route duplication is the coincidence of urban public transport routes on separate sections of the city’s road network, or their complete overlap [11].

The degree of route duplication is usually assessed as the ratio of the length of the road section shared by the coinciding routes to the total length of the routes under consideration. This assessment is complicated by the need to determine the length of each of the routes under consideration. Since the path that public transport follows between stopping points does not affect where passengers board and alight, a method is needed that ignores this path and is based instead on the analysis of the stops (nodes) that make up the routes.
duplicates the section of the second route. Solution method: elimination of the short route, the long second route becomes the optimal one.

• Variant No. 2: The initial routes have a joint section of traffic on which the final stops of each route are located. Solution method: the most requested route is lengthened so that the other route is fully included in the lengthened optimal route. This reduces the case to the first variant of route duplication, so the shorter route is excluded.

• Variant No. 3: The initial routes have a joint section of traffic, on which one of the routes deviates from the other for a small section (no more than three stops) and ends with a final stop. Solution method: the most requested route is lengthened so that the other route is fully included in the extended optimal route, while the remaining route passes through the branch section and returns.

• Variant No. 4: The original routes have coinciding trajectories but deviate from each other on several sections of the path, by no more than 500 meters. Solution method: the optimal route includes the final stops of both routes and goes along the higher priority streets.

So, the task is to optimize the transport network by searching for and eliminating duplication of public transport routes. Based on the degree and variant of route duplication, various methods are proposed for solving the problem of planning the optimal topology of a transport network.

III. THE METHOD FOR DETERMINING ROUTE DUPLICATION

Using graph theory, we represent a public transport scheme by a route matrix. The elements of the matrix record which nodes of the network belong to which routes.

The route matrix $W$ is a binary data table of dimension $N \times M$, where $N$ is the number of nodes (stopping points) in the city’s transport network and $M$ is the number of public transport routes. If the i-th node is included in the j-th route, then $W_{ij} = 1$, otherwise $W_{ij} = 0$.

The duplication coefficient of the k-th route for the j-th route is defined as the ratio of the number of stops shared by the two routes to the number of stops on the k-th route:

$$d_{m_j m_k} = \frac{\sum_{i=1}^{N} W_{ik} \cdot W_{ij}}{\sum_{i=1}^{N} W_{ik}^{2}}. \tag{1}$$

Based on the calculated coefficients, a route duplication matrix $D = \left( d_{m_j m_k} \right)_{M \times M} \cdot 100\%$ is formed, the elements of which characterize, in percent, the degree to which the k-th route is contained in the j-th route.

We use correlation analysis to assess the significance of the duplication coefficients between routes. In statistics, the relationship between binary variables is assessed with the contingency coefficient, the analogue of which is the Pearson coefficient:

$$\rho(m_k, m_j) = \frac{\operatorname{cov}(m_k, m_j)}{\sigma(m_k) \cdot \sigma(m_j)}, \tag{3}$$

where $\operatorname{cov}(m_k, m_j)$ is the covariance between columns of the route matrix, and $\sigma(m_k)$, $\sigma(m_j)$ are the standard deviations of the route matrix columns.

According to formulas (2) and (3), the correlation coefficient between routes can be determined through the duplication coefficient:

$$\rho(m_k, m_j) = d_{m_j m_k} \cdot \frac{m_k}{m_j}. \tag{4}$$

To check the significance of the correlation coefficient, Student’s t-statistic can be used:

$$t_{kj} = \frac{\rho(m_k, m_j) \sqrt{n_{kj} - 2}}{\sqrt{1 - \rho(m_k, m_j)^{2}}}, \tag{5}$$

$$n_{kj} = \sum_{i=1}^{N} \left( W_{ik} + W_{ij} - W_{ik} \cdot W_{ij} \right). \tag{6}$$

The resulting value $t_{kj}$ is compared with the tabular value of Student’s t-distribution with $n_{kj} - 2$ degrees of freedom at significance level $\alpha$. If $t_{kj} > t_{\alpha}(n_{kj} - 2)$, then the correlation coefficient of the routes $\rho(m_k, m_j)$ is statistically significant and, therefore, the assessment of the duplication of these routes is significant.

Based on the values of the duplication coefficients, the check of their significance and the duplication variant, a specific managerial solution to the problem of planning the topology of the transport network is proposed to overcome the problem of route duplication.
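The quantities defined by formulas (1), (3), (5) and (6) can be illustrated with a minimal sketch. The toy route matrix `W`, the helper names `t_statistic` and `is_significant`, the level `alpha = 0.05` and the two-sided comparison are all assumptions made for illustration; the text does not state whether a one-sided or two-sided critical value was used.

```python
import numpy as np
from scipy import stats

# Hypothetical toy network: 6 stops (rows) and 3 routes (columns).
W = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
])

common = W.T @ W          # common[j, k]: number of stops shared by routes j and k
stops = np.diag(common)   # stops[k] = sum_i W_ik^2: number of stops on route k

# Formula (1) in percent: D[j, k] is the share of route k's stops lying on route j.
D = 100.0 * common / stops[np.newaxis, :]

# Formula (6): n[k, j] is the number of stops served by route k or route j.
n = stops[:, None] + stops[None, :] - common

# Formula (3): Pearson correlation between the columns of the route matrix.
rho = np.corrcoef(W, rowvar=False)


def t_statistic(r, n_kj):
    """Student's t-statistic of formula (5)."""
    return r * np.sqrt(n_kj - 2) / np.sqrt(1.0 - r ** 2)


def is_significant(r, n_kj, alpha=0.05):
    """Compare |t| with the Student quantile (two-sided test is an assumption)."""
    critical = stats.t.ppf(1.0 - alpha / 2.0, df=n_kj - 2)
    return abs(t_statistic(r, n_kj)) > critical


j, k = 0, 1
print(f"d = {D[j, k]:.1f}%, rho = {rho[k, j]:.3f}, n = {n[k, j]}, "
      f"t = {t_statistic(rho[k, j], n[k, j]):.3f}, "
      f"significant = {is_significant(rho[k, j], n[k, j])}")
```

Formula (5) is undefined when the correlation equals 1 in absolute value or when two routes together serve no more than two stops, so such pairs (including each route paired with itself) have to be skipped before the test is applied.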
Authorized licensed use limited to: Universidad Tecnologica de Pereira. Downloaded on September 20,2024 at 02:17:56 UTC from IEEE Xplore. Restrictions apply.
transport system of the city includes 323 stopping points and 55 routes of urban public transport.

To search for duplicated routes in the transport network of the city of Izhevsk, information on the city’s public transport routes was taken from the website of the interactive geographic information system (IGIS).

The initial data, consisting of information about each stop in the city and the routes passing through it, were recorded in a data set using the Pandas library in the Python programming language. Based on these data, a matrix of public transport routes of the city of Izhevsk with dimensions 323 × 55 was filled. Using the NumPy library (an open source Python library for working with multidimensional data arrays that supports high-level mathematical functions), the elements of the route duplication matrix D were calculated from the route matrix with formula (2) (see Fig. 1).

The Seaborn library for creating statistical graphs was used to construct a heatmap of the route duplication matrix.

Further, based on the calculated coefficients, the degree of route duplication was determined and its significance was assessed using the stats module of the SciPy library, which contains a large number of probability distributions, correlation functions and statistical tests, including functions for calculating Pearson’s correlation coefficients (3) and Student’s test (5).

The first duplication group ($d \le 40\%$) includes 4 routes. They do not need optimization.

The second group in terms of the degree of duplication ($40\% < d \le 60\%$) includes 17 routes of public transport. For these routes, the schedules need to be synchronized. To synchronize the route schedules, it is proposed to use a mathematical model [12] that simulates the movement of public transport in an urban network in order to identify congested stopping points, which are “bottlenecks” in the route network due to the presence of duplicate routes.

The third group in terms of the degree of duplication ($60\% < d \le 80\%$) includes 28 routes. Fig. 2 shows an example of bus routes No. 2, No. 6 and No. 41, which duplicate each other in the range of 65–80%. In this case, some of the routes can be excluded by merging routes into one, or the traffic intervals along these routes can be adjusted.

The fourth and last group of routes by the degree of duplication ($80\% < d \le 100\%$) includes 6 public transport routes: bus No. 8, No. 9, No. 11, minibus No. 50, trolleybus No. 2, No. 7. For these routes, specific recommendations for their optimization and planning are developed according to the duplication variant. Fig. 3 shows the intersection of routes No. 9, No. 11 and No. 73. Route No. 11 deviates from No. 9 by two final stops (variant No. 3). Since route No. 11 is the most popular, it is proposed to exclude route No. 9 from the city’s transport network, while route No. 11 runs along the section of the deleted route and then returns to its own path. Route No. 73 deviates from routes No. 9 and No. 11 in the central section, leaving for a parallel street for three stops (variant No. 4).

Fig. 2. Scheme of bus routes in the city of Izhevsk: No. 2 (red color), No. 6 (purple color), No. 41 (light blue color)

Similarly, route No. 8 coincides with route No. 34 by 83% (see Fig. 4), but this case of route duplication belongs to variant No. 4. To eliminate the duplication, the longer route No. 34 should be kept, but it is proposed to run it along a different central section, following Baranova Street, which is parallel to Klubnaya Street (route No. 34) and Zarechnoye Shosse (route No. 8). Thus, the 6 public transport routes included in this duplication group should be transformed into 3 routes as a result of exclusion and adjustment.
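The data preparation and grouping steps described in this section can be sketched as follows. The record layout (columns `stop` and `route`), the toy data and the rule of assigning each route to a group by its largest duplication coefficient with any other route are assumptions made for illustration; the text does not specify the IGIS data format or the exact assignment rule.

```python
import numpy as np
import pandas as pd

# Hypothetical stop-to-route records; the real IGIS layout is not given in the text.
records = pd.DataFrame({
    "stop":  ["A", "B", "C", "D", "A", "B", "C", "C", "D", "E", "F"],
    "route": ["r1", "r1", "r1", "r1", "r2", "r2", "r2", "r3", "r3", "r3", "r3"],
})

# Binary route matrix W: rows are stops, columns are routes.
W = (pd.crosstab(records["stop"], records["route"]) > 0).astype(int)

# Duplication matrix in percent: D.loc[j, k] = shared stops / stops on route k.
common = W.T @ W
stops_per_route = np.diag(common.to_numpy())
D = 100.0 * common / stops_per_route  # the array is aligned with the columns

# Assumed grouping rule: take the largest coefficient of route k with any
# other route j, ignoring the diagonal (a route always duplicates itself).
off_diagonal = D.to_numpy().copy()
np.fill_diagonal(off_diagonal, 0.0)
max_duplication = pd.Series(off_diagonal.max(axis=0), index=D.columns)

# The four duplication groups used in the text.
groups = pd.cut(
    max_duplication,
    bins=[0, 40, 60, 80, 100],
    labels=["<=40%", "40-60%", "60-80%", "80-100%"],
    include_lowest=True,
)
print(groups)
```

A heatmap of `D` could then be drawn with `seaborn.heatmap(D)`, in line with the visualization step mentioned above.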
Fig. 3. Scheme of bus routes in the city of Izhevsk: No. 9 (light blue color), No. 11 (green color), No. 73 (red color)

Fig. 4. Duplication assessment of bus routes No. 8 (red color) and No. 34 (blue color) in the city of Izhevsk

V. CONCLUSION

The article is devoted to the data analysis of the urban public transport system to solve the problem of route duplication. For the first time, a method is presented for determining the duplication of routes based on the integrated use of mathematical methods of graph theory, linear algebra and statistical analysis.

To assess the degree of route duplication, it is proposed to divide routes into 4 groups: route coincidence of up to 40%; from 40% to 60%; from 60% to 80%; and more than 80%.

To calculate the route duplication coefficients, it is proposed to compose a route matrix and to calculate pairwise Pearson correlation coefficients between the route vectors. The significance of the duplication coefficients should then be checked with Student’s criterion.

Using the developed methodology, the degree of duplication between the 55 routes of the city of Izhevsk in the Udmurt Republic was assessed. Specific methods for solving the problem of planning the transport network topology are proposed, based on the values of the duplication coefficients and the route duplication variant. So, to overcome the problem of route duplication in the public transport system of the city of Izhevsk, it is proposed to synchronize the timetables of more than 28 routes and to replace 6 public transport routes with 3 routes as a result of their partial merging.

The method for determining duplicated routes proposed by the authors is universal; it can be applied to the problem of optimizing the public transport system of any urban agglomeration, both in the Russian Federation and abroad.

REFERENCES

[1] D. V. Petrova, Modern approaches to organizing monitoring of public transport passenger traffic in urban agglomerations, International Journal of Open Information Technologies, vol. 8, 2020, pp. 47–57.
[2] K. V. Ketova and E. A. Saburova, Addressing a problem of regional socio-economic system control with growth in the social and engineering fields using an index method for building a transitional period, Advances in Intelligent Systems and Computing, vol. 1295, Springer, Cham, 2020, pp. 385–396. doi: 10.1007/978-3-030-63319-6_35.
[3] L. A. Merlin, M. Singer and J. Levine, Influences on transit ridership and transit accessibility in US urban areas, Transportation Research Part A: Policy and Practice, vol. 150, 2021, pp. 63–73. doi: 10.1016/j.tra.2021.04.014.
[4] J. Xie, S. Zhan, S. C. Wong and S. M. Lo, A schedule-based timetable model for congested transit networks, Transportation Research Part C: Emerging Technologies, vol. 124, 2021, 102925. doi: 10.1016/j.trc.2020.102925.
[5] C. Du, W. Ren and J. Chen, Transfer performance evaluation indicators and method on urban public transportation network, Journal of Wuhan University of Technology (Transportation Science and Engineering), vol. 38, 2014, pp. 418–421. doi: 10.3963/j.issn.2095-3844.2014.02.040.
[6] S. F. Marques and C. S. Pitombo, Ridership Estimation Along Bus Transit Lines Based on Kriging: Comparative Analysis Between Network and Euclidean Distances, Journal of Geovisualization and Spatial Analysis, vol. 5, 2021. doi: 10.1007/s41651-021-00075-w.
[7] E. Kasatkina, D. Nefedov and E. Saburova, Mathematical model of adaptive control in fuel supply logistic system, Studies in Systems, Decision and Control, vol. 199, 2019, pp. 577–593. doi: 10.1007/978-3-030-12072-6_47.
[8] V. E. Gozbenko, M. N. Kripak, O. A. Lebedeva and S. K. Kargapolcev, Improvement in the functioning of transport network of urban passenger transport by using automation of optimal rolling stock selection model, Modern technologies. System analysis. Modeling, vol. 2, 2017, pp. 203–208.
[9] A. L. Zolkin, R. V. Faizullin and V. V. Dragulenko, Application of the modern information technologies for design and monitoring of business processes of transport and logistics system, Journal of Physics: Conference Series, vol. 1679, 2020, 032083. doi: 10.1088/1742-6596/1679/3/032083.
[10] E. V. Kasatkina and D. D. Vavilova, Computer simulation of traffic flows, Journal of Physics: Conference Series, vol. 1694, 2020, 012009. doi: 10.1088/1742-6596/1694/1/012009.
[11] A. A. Carikov, V. G. Bondarenko and M. S. Pjatanov, Organization of routes of urban public passenger transport taking into account free-of-charge transfers, Innovative transport, vol. 2, 2020, pp. 18–26. doi: 10.20291/2311-164X-2020-2-18-26.
[12] O. N. Larin and A. A. Kazhaev, Questions of formation of disputed situations on routing networks of municipal formations, Bulletin of Brest State Technical University, vol. 5, 2010, pp. 60–63.