DataFITS: A Heterogeneous Data Fusion Framework for Traffic and Incident Prediction
Abstract— This paper introduces DataFITS (Data Fusion on Intelligent Transportation System), an open-source framework that collects and fuses traffic-related data from various sources, creating a comprehensive dataset. We hypothesize that a heterogeneous data fusion framework can enhance information coverage and quality for traffic models, increasing the efficiency and reliability of Intelligent Transportation System (ITS) applications. Our hypothesis was verified through two applications that utilized traffic estimation and incident classification models. DataFITS collected four data types from seven sources over nine months and fused them in a spatiotemporal domain. Traffic estimation models used descriptive statistics and polynomial regression, while incident classification employed the k-nearest neighbors (k-NN) algorithm with Dynamic Time Warping (DTW) and Wasserstein metric as distance measures. Results indicate that DataFITS significantly increased road coverage by 137% and improved information quality for up to 40% of all roads through data fusion. Traffic estimation achieved an R² score of 0.91 using a polynomial regression model, while incident classification achieved 90% accuracy on binary tasks (incident or non-incident) and around 80% on classifying three different types of incidents (accident, congestion, and non-incident).

Index Terms— Intelligent transportation systems, heterogeneous data fusion, traffic estimation, incident classification.

I. INTRODUCTION

highs of 48.5 million cars (2022) and 12.7 billion carried passengers (2019, before the pandemic) [2], [3]. As a result, urban areas experience an increasing number of traffic-related incidents (e.g., congestion and accidents), increasing time delays, emissions, and fuel consumption [4].

For this reason, academia and industry have driven efforts to create the next generation of transportation systems that are eco-friendly, cost-efficient, and powered by data analysis and communication technology. We hypothesize that a heterogeneous data fusion framework can enhance the coverage and quality of information serving as input for traffic models, thus increasing the efficiency and reliability of ITS applications. Therefore, we propose the Data Fusion on Intelligent Transportation System (DataFITS) framework, providing a spatiotemporal fusion of data used to train models for two ITS applications, traffic estimation, and incident classification. DataFITS collects and combines real heterogeneous data (e.g., weather, traffic, incident) from various sources (e.g., open databases, map applications), preparing them by fixing errors, adapting the data structure, and finally fusing them in the exact location and point in time. Our hypothesis is verified using data characterization to quantify the benefits of combining heterogeneous data sources and the proposal of two ITS
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
ZIßNER et al.: DataFITS: A HETEROGENEOUS DATA FUSION FRAMEWORK 11467
compares them against our solution. The design of DataFITS and the traffic data applications are described in Section III. Section IV evaluates the performance of our framework and the effectiveness of our traffic estimation and incident classification models using the heterogeneous fused data, verifying our hypothesis. Finally, we conclude this paper in Section V, highlighting open problems for future investigations.

II. RELATED WORK

This section reviews the literature on three main topics related to our proposed solution: (i) data collection and fusion, (ii) traffic estimation, and (iii) incident classification. Finally, we summarize and compare the literature with our proposal.

A. Data Collection and Fusion

To develop ITS applications, significant data is required from real or virtual sensors [5]. Vítor et al. [4] present a platform to collect, process, and export heterogeneous data from smart city sensors, providing different statistics and visualizations. However, their platform concentrates on securing data. Similarly, [6] proposes a smart city data platform containing information from various cities. In contrast, our framework focuses on improving the quantity and quality of the information by fusing data, and we assess the advantages of using fused data through two ITS applications.

Data fusion combines data from multiple sources, enriching spatiotemporal information [7], [8], [9], [10]. Several applications benefit from data fusion, such as emergency management [11] and path planning [12]. However, fusing heterogeneous data requires additional preprocessing to combine various data types and features [13], [14]. This investigation focuses on two applications supported through data fusion, traffic estimation and incident classification, and on the methods used to achieve their goals, such as data acquisition, fusion, machine learning, correlation, and different data types.

B. Traffic Estimation

Traffic estimation is a crucial smart city application for better transportation management. This review focuses on data fusion, spatiotemporal correlation, and machine learning techniques to achieve accurate and reliable traffic estimation using historical data. The increasing availability of open databases (kept by governmental authorities) and Application Programming Interfaces (APIs) to commercial applications (Bing, Google Maps, etc.) results in a vast collection of traffic-related data, making big data an opportunity for heterogeneous data fusion [15]. The challenge is to combine stationary sensor data (e.g., traffic cameras or loop detectors) and probe vehicle information (e.g., cameras, GPS, cellular data, or vehicular sensors). Anand et al. [16] used a Kalman filter to fuse traffic flow values (from cameras) and travel time (from GPS), improving a traffic estimation approach.

Many recent traffic estimation models use Machine Learning (ML) [17], [18], [19], [20], [21], [22], [23], [24], [25]. Reference [17] proposes an auto-regressive model that uses data from a traffic simulator and adapts to events like accidents. Their results showed that estimation up to 30 minutes ahead has an error of 12%. Meanwhile, [18] employs deep learning algorithms for traffic estimation, showing an improvement in accuracy and efficiency. These approaches discuss the usage of ML to create accurate models for traffic estimation but do not consider further methods, such as data fusion or correlation.

Some ML approaches use spatiotemporal correlation to improve traffic estimation quality. In [19], a neural network (NN)-based estimation using Graph Convolutional Network (GCN) and Gated Recurrent Unit (GRU) models is proposed with full public access. The GCN captures spatial dependencies from the road network, and the GRU detects dynamic changes in traffic data and captures temporal dependencies. Other NN-based approaches, such as [20] and [21], show similar improvements in accuracy using data correlation. Wang et al. [22] propose an open-source deep learning framework using GCN to estimate network-wide traffic multiple steps ahead in time. Zheng et al. [23] introduce another open-source solution, the Graph Multi Attention Network (GMAN), using an encoder-decoder architecture to provide long-term traffic estimation up to one hour ahead. These approaches also include correlation to improve the discussed models and offer access to their data but do not propose a solution for collecting or fusing data.

Limited literature combines data fusion, spatiotemporal correlation, and ML to estimate traffic, similar to our solution. In [26], the authors fuse traffic data from stationary and dynamic sensors, considering the spatiotemporal correlation between traffic levels of road segments. A Multiple Linear Regression (MLR) model processes the fused information to enhance traffic estimation accuracy. Unlike our solution, this approach relies solely on traffic data from sensors and does not consider different data types and sources. Zhao et al. [24] propose a general platform for spatiotemporal data fusion to enhance traffic estimation. The approach introduces a fusion method to improve accuracy by combining direct and indirect traffic-related data as input for two different ML models. The indirect traffic-related data features contain information about weather and points of interest and are used to improve the estimation quality. However, their model uses pre-existing datasets, offering no solution for data collection. Moreover, our study focuses on incident-related data, while the authors in [24] consider points of interest and weather conditions.

In [27], the authors introduce a model to estimate traffic within a small urban area in Zurich, with data acquired as part of a video measurement campaign. Their solution fuses information from loop detectors, traffic lights, and other sensors (e.g., video plus license plate recognition, thermal cameras) and trains different MLR models with this data. Finally, they evaluate the various sensors’ accuracy and robustness. In contrast to our solution, they investigate the quality of a regression model using different sensor data fused to stationary data. Furthermore, their data is acquired using sensors that are not publicly available, covering only a small urban area.

Finally, [25] proposes a traffic speed prediction by integrating heterogeneous data from various sensors, including exogenous data like weather, into a hybrid spatiotemporal features space. The main contributions are a hybrid model
11468 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 24, NO. 10, OCTOBER 2023
necessary for creating the road trajectories for data entries that are only represented by a start and an end point within the data. This process is performed on each record of the vehicular and incident data sources, while the geo-location of the traffic data sources is only matched once for each area, as those are static and do not change between data acquisitions. This strategy significantly reduces the execution time and computation required for map-matching. Instead of processing all data points within each acquisition, the bulk of the data points, namely the traffic-related information, is only matched once. With our nine-month time frame and a 10-minute acquisition interval, this amounts to a single matching procedure instead of 38,800 procedures, significantly reducing the runtime and required computation power.

The spatial data fusion process combines the fused input dataset with the map-matching output, adding the enriched information about opath, cpath, and matched GPS points. To provide the data for all given road segments, the information is rearranged by extracting all fid values from the cpath array of each data entry. The amount of computing power and memory for grouping data points is a non-linear function of the input data size. Therefore, the framework splits the data into chunks, reducing the memory requirement and allowing multi-threading to speed up the process.

4) Data Usage: The last stage, (4) in Fig. 2, describes different use cases of the fused dataset, e.g., as an input to various data applications or being characterized through different types of statistics and visualizations for spatiotemporal data analysis. For example, DataFITS provides heat maps and density plots separated by each source and different features, such as the number of observations, traffic levels, speed, and types of incidents. In the scope of temporal analysis, DataFITS provides time-series statistics for a specific time window (e.g., by the hour, day of the week, month, and season) and shows the correlation between different features. Moreover, the fused data is exported in different data structures, allowing it to be used by various data applications, such as our proposed models or other third-party tools (e.g., ArcGIS).

B. Applications

1) Traffic Estimation: The proposed traffic estimation application is organized into two phases, as shown in Fig. 3. Phase (1) prepares the data, groups it by intersecting areas, identifies similar traffic regions based on correlating traffic patterns, and performs a train-test split. A traffic region is defined as the set of connected paths (road segments) reported from a data source, represented through unique road identifiers (fids). By intersecting areas, we obtain a list of unique traffic regions and are able to measure the similarities between them. In phase (2), the prepared data is used to create and evaluate two traffic estimation models using: i) descriptive statistics (naive); and ii) polynomial regression. Each model estimates traffic values for a single area within an arbitrarily defined time interval and can also utilize data from correlating regions with similar traffic behavior. Furthermore, the process considers optional input parameters like weekday, weather, and road type to create more specific models for the given characteristics. This research mainly focuses on the regression-based model but also discusses the model based on descriptive statistics and gives a comparative evaluation of both approaches in Section IV.

Fig. 3. Design of the traffic estimation application.

a) Preprocessing: The fused data from DataFITS is cleaned, removing all incident-related information, as it is not required by the model, and grouped into traffic areas containing one or multiple road segments. Using a data aggregation over the array of road identifiers (cpath), we create a list of areas contributing traffic information to the dataset.

Because the data fusion merges traffic areas from different sources, the grouped data may contain overlapping areas. Those intersecting areas describe the same spatial region but with minor differences in the covered road segments. Combining them removes potential duplicated areas, resulting in a final set of unique traffic areas. The underlying function iterates through all existing areas, calculates pairwise intersections, and combines them if the overlapping road segments exceed a predefined threshold $th_{overlap}$. Finally, the initial set of fused data is re-grouped according to the new set of combined traffic areas, resulting in an input dataset for the traffic estimation models that contains the combined information for each area.

Furthermore, the design covers a procedure to add data points from other regions that show similar traffic patterns based on correlation. The goal is to increase the volume of data points in areas with insufficient training data. Therefore, by correlating the traffic patterns (traffic level/relative speed) from different regions, the highly correlated areas can be merged, increasing the training dataset and thus benefiting the accuracy of the traffic estimation. To identify such regions, we calculate data similarity based on the traffic values, aiming for a more precise representation of the traffic situation within the original area. The corresponding function implements a modified version of the Pearson Correlation and the DTW to identify correlated traffic areas with similar traffic patterns. The correlation between two time series was defined in [36] and adapted for our proposed methodology.

$$X_{i,j} = \frac{\sum_{t=1}^{L} \left(S_i(t) - \bar{S}_i\right)\left(S_j(t) - \bar{S}_j\right)}{\sqrt{\sum_{t=1}^{L-t} \left(S_i(t) - \bar{S}_i\right)^2} \cdot \sqrt{\sum_{t=1}^{L-t} \left(S_j(t) - \bar{S}_j\right)^2}} \tag{1}$$

Using Eq. (1), we calculate the respective correlation between two time series of any traffic data feature $S_i(t)$ for
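As a minimal sketch of the correlation in Eq. (1), the snippet below computes the plain Pearson correlation between two equally long traffic time series over their full length. The function name `pearson_correlation`, the sample values, and the handling of constant series are illustrative assumptions; the modifications the authors applied to the measure from [36] are not reproduced here.

```python
import math


def pearson_correlation(s_i, s_j):
    """Plain Pearson correlation between two equally long time series."""
    if len(s_i) != len(s_j) or len(s_i) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_i = sum(s_i) / len(s_i)
    mean_j = sum(s_j) / len(s_j)
    # Numerator: summed product of the deviations from each series' own mean.
    num = sum((a - mean_i) * (b - mean_j) for a, b in zip(s_i, s_j))
    # Denominator: product of the square roots of the summed squared deviations.
    den_i = math.sqrt(sum((a - mean_i) ** 2 for a in s_i))
    den_j = math.sqrt(sum((b - mean_j) ** 2 for b in s_j))
    if den_i == 0 or den_j == 0:
        # Constant series: correlation is undefined; returning 0.0 is an assumption.
        return 0.0
    return num / (den_i * den_j)


# Hypothetical relative-speed series for three traffic areas.
area_a = [0.9, 0.8, 0.4, 0.3, 0.7]
area_b = [0.95, 0.85, 0.5, 0.35, 0.75]  # similar pattern to area_a
area_c = [0.3, 0.4, 0.8, 0.9, 0.5]      # opposite pattern to area_a

print(pearson_correlation(area_a, area_b))
print(pearson_correlation(area_a, area_c))
```

Areas whose correlation exceeds a chosen cutoff could then be treated as candidates for merging their training data; the cutoff itself is not specified in this section.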
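The combination of overlapping traffic areas described in the preprocessing step of the traffic estimation application can be sketched as follows. This is only an illustration under stated assumptions: each area is modeled as a set of road identifiers (fids), the overlap is measured as the share of the smaller area's segments that both areas have in common, and the name `merge_overlapping_areas` and the default threshold are hypothetical. The source does not state how the overlap ratio is defined.

```python
def merge_overlapping_areas(areas, th_overlap=0.5):
    """Merge areas (sets of road ids) whose overlap ratio exceeds th_overlap."""
    merged = []
    for area in areas:
        current = set(area)
        if not current:
            continue  # skip empty areas to avoid dividing by zero
        changed = True
        while changed:
            changed = False
            for other in merged:
                shared = len(current & other)
                # Assumed overlap measure: shared segments relative to the smaller area.
                ratio = shared / min(len(current), len(other))
                if ratio > th_overlap:
                    current |= other      # combine both areas
                    merged.remove(other)  # the old area is absorbed
                    changed = True        # re-check against the remaining areas
                    break
        merged.append(current)
    return merged


# Hypothetical areas reported by different sources.
areas = [{1, 2, 3, 4}, {3, 4, 5}, {10, 11}, {2, 3, 4}]
unique_areas = merge_overlapping_areas(areas, th_overlap=0.5)
print(sorted(sorted(a) for a in unique_areas))
```

Restarting the inner comparison after every merge keeps the result free of pairs that still exceed the threshold, at the cost of extra passes over the list.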
before and after the reported incident start time from the data source, representing a time interval that includes data prior to and after the measurable incident effect. We chose the values of 90 and 120 minutes based on an exploratory data analysis (briefly indicated in Table IV), showing that the majority of incidents did not impact the traffic behavior for more than 120 min, except for a few cases during congestion and disabled vehicle incidents. The second approach tries to estimate the incident start time, iterating over the traffic data to find significant changes in the traffic pattern and setting the start point based on this observation. The estimated approach aims to provide a more realistic representation of the incident start time, which is evaluated later in Section IV.

Moreover, each incident report is validated by identifying samples with high noise in the corresponding traffic data. Noise has a negative contribution to the model and adds a potential bias to its accuracy. Therefore, it is detected and removed from the input dataset using three different strategies to validate each incident report, where at least one must be satisfied: i) Comparing the absolute difference between the traffic at the start and the end of the incident time interval, checking for a noticeable difference given by a deviation of more than a predefined threshold; or ii) Calculating the standard deviation over the entire incident time interval and comparing it to a certain threshold; or iii) Iterating over the data points close to the incident start and end time and checking for a point-wise traffic variation above a defined threshold. Based on these methods, we extract incidents that reflect a clear traffic pattern that shows a measurable impact of the incident on the traffic behavior and remove all other patterns that could be confused with a non-incident traffic pattern or a biased sensor report.

Lastly, to create the final dataset for the classification model, three further data processes are performed:
• Add “normal traffic data”: Adding non-incident samples to the input data is essential in the design of our incident model. We add observations similar in time, weekday, and location to get the most comparable data. Therefore, these reports can accurately identify a non-incident situation for every incident in the dataset.
• Data Interpolation: As a result of measurement errors or other problems in the data collection, there is a possibility of missing traffic values within the incident duration. Therefore, we use linear data interpolation to fill gaps in the traffic data if required.
• Train-Test-Split: Finally, the input data is split into training and testing datasets. The former is used to train the ML model, allowing it to be generalized. Furthermore, the testing dataset evaluates the classification quality. The incident cases are randomly sampled and used within the training or testing dataset.

b) The model: We use the k-NN algorithm, a well-known supervised learning approach, to solve the classification problem. To train it, each incident entry has multiple features and a label referring to a particular incident type (accident, congestion, or non-incident). The data features represent a time series with the corresponding traffic level, speed (absolute and relative to the speed limit), and road type. k-NN was trained using two different distance metrics suitable for time series data. The DTW metric is used to measure the distance between two time series in the classification model. Furthermore, we train a model using the Wasserstein metric, a function to calculate the distance between two probability distributions µ and ν, defined in Eq. (5) [44].

$$W_p(\mu, \nu) := \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{\mathbb{R} \times \mathbb{R}} d(x, y)^p \, d\gamma(x, y) \right)^{1/p} \tag{5}$$

We based our model on the k-NN implementation K Nearest Neighbors with Dynamic Time Warping,² modified and extended to support the Wasserstein metric. k-NN uses a parameter n (the number of neighbors) and the maximum warping window for the DTW, limiting the number of elements to compare and therefore reducing the execution time.

Moreover, data over- and under-sampling are used to reduce the under-representation of accident samples in our imbalanced input dataset. To reduce the severity of this problem, two methods are used: i) Oversampling: Adds new samples of the minority class to the training data, using the information from the already existing data points. Our implementation includes Random Oversampling and SMOTE Oversampling; ii) Undersampling: Does the opposite by removing samples from a majority class. Our implementation includes NearMiss Undersampling. These data sampling approaches are evaluated later in Section IV, further comparing the model quality using an imbalanced dataset.

Finally, the classification model is created using the parameters k (number of neighbors), warping window (comparison constraint), and metric (DTW or Wasserstein). Next, the model is trained, and each test data sample is classified using the trained model, returning a class label together with the corresponding probability. This method is implemented by calculating a distance matrix that contains the respective distance (DTW or Wasserstein) between all data samples regarding the chosen data feature. Using this matrix, our proposed algorithm can find the k closest neighbors and extract the most representative label for all test data samples.

IV. EVALUATION

This section evaluates DataFITS by quantifying the improvements in data quality and quantity and presents a data characterization analysis from a real heterogeneous dataset. Finally, the enhanced fused data is used to evaluate the traffic estimation and incident classification models.

A. The Data Fusion Framework

1) The Data: The data acquisition process started on December 1st, 2021, and covers nine months of heterogeneous data from Bonn and Cologne. The acquired dataset from Bonn contains 13,700,000 entries with a total size of 14 GB, whereas the dataset from the neighboring city Cologne has 28,700,000 entries with a total size of 31 GB. The data is structured

² github.com/markdregan/K-Nearest-Neighbors-with-Dynamic-Time-Warping
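Returning to the incident-report validation described in Section III, where at least one of three checks must be satisfied, a minimal sketch could look as follows. The function name `is_clear_incident`, all threshold values, the `window` size, and the reading of strategy ii) as "standard deviation above a threshold" are assumptions made for illustration; the source does not give these details.

```python
import statistics


def is_clear_incident(traffic, th_diff=0.2, th_std=0.1, th_point=0.15, window=3):
    """Return True if at least one of the three checks signals a traffic change."""
    if len(traffic) < 2:
        return False
    # i) Absolute difference between the first and last value of the interval.
    start_end = abs(traffic[0] - traffic[-1]) > th_diff
    # ii) Population standard deviation over the entire interval.
    spread = statistics.pstdev(traffic) > th_std
    # iii) Change between consecutive samples near the start and near the end.
    edges = (list(traffic[:window + 1]), list(traffic[-(window + 1):]))
    pointwise = any(
        abs(b - a) > th_point
        for segment in edges
        for a, b in zip(segment, segment[1:])
    )
    return start_end or spread or pointwise


# Hypothetical relative traffic levels around two reported incidents.
clear_drop = [0.9, 0.9, 0.85, 0.5, 0.4, 0.4, 0.45, 0.5, 0.8, 0.9, 0.9, 0.9]
flat_noise = [0.5, 0.52, 0.49, 0.51, 0.5, 0.5]

print(is_clear_incident(clear_drop))
print(is_clear_incident(flat_noise))
```

Reports for which the function returns False would be dropped from the input dataset, since their traffic pattern cannot be told apart from a non-incident pattern.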
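The k-NN classification with DTW and a warping window described in Section III can be illustrated with the short, self-contained sketch below. It is not the referenced implementation: the functions `dtw_distance` and `knn_classify`, the toy time series, and the use of the vote share among the k neighbors as the returned probability are all assumptions made for this example.

```python
from collections import Counter


def dtw_distance(a, b, window=None):
    """DTW distance with absolute-difference cost and an optional warping window."""
    n, m = len(a), len(b)
    # The window must be at least |n - m|, otherwise no warping path exists.
    w = max(window if window is not None else max(n, m), abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells within the warping window around the diagonal are filled.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_classify(train, labels, sample, k=3, window=None):
    """Return the majority label of the k nearest series and its vote share."""
    dists = sorted((dtw_distance(x, sample, window), y) for x, y in zip(train, labels))
    neighbors = dists[:k]
    votes = Counter(y for _, y in neighbors)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbors)


# Hypothetical relative-speed series with their incident labels.
train = [
    [0.9, 0.9, 0.3, 0.2, 0.3, 0.9],
    [0.9, 0.8, 0.2, 0.2, 0.4, 0.9],
    [0.9, 0.9, 0.9, 0.8, 0.9, 0.9],
    [0.8, 0.9, 0.9, 0.9, 0.9, 0.8],
    [0.9, 0.3, 0.2, 0.3, 0.9, 0.9],
]
labels = ["incident", "incident", "non-incident", "non-incident", "incident"]
sample = [0.9, 0.9, 0.9, 0.3, 0.2, 0.8]

print(knn_classify(train, labels, sample, k=3, window=2))
```

A smaller warping window restricts how far the two series may be shifted against each other and reduces the number of cells that have to be computed, which matches the execution-time argument given in the text.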
TABLE IV
GENERAL INCIDENT DATA STATISTICS

TABLE VI
PERFORMANCE OF THE REGRESSION MODEL ON VARIOUS INPUT DATASETS

TABLE VII
COMPARISON OF STATISTICAL AND REGRESSION MODEL
traffic and incident data to classify events into accident, congestion, or non-incidents.

Using real heterogeneous data from two German cities, we quantified the advantages of DataFITS by compiling a fused dataset. Our results indicate that DataFITS integrated data from multiple sources for 40% of all roads, thereby increasing the overall road coverage by 137%. In addition, the traffic estimation model, which uses polynomial regression, outperformed our previous approach based on descriptive statistics, achieving a high R² score of 0.91 and low error metrics of 0.05, and providing accurate traffic estimations using the fused dataset. Compared to using a single-source dataset, the fused dataset estimation showed minor accuracy improvements but drastically improved the spatiotemporal coverage of the estimated areas. Our incident classification model relies on the fusion of traffic and incident data, achieving a 90% binary classification accuracy rate within our evaluation. Preprocessing the data, such as removing unclear traffic patterns, improved accuracy by an average of 29%. The classification of incidents into different categories resulted in a slightly lower accuracy of 86%, with unequal performance among classes indicated by F1 scores. To mitigate this problem, we oversampled the training dataset to create a more uniform representation of the data, resulting in an 80% accuracy for each class. Collecting more accident data can also solve this problem.

We plan to expand the DataFITS framework by collecting and fusing more data types, improving its performance and data quality, and expanding its data analysis. We focus on data types such as social media and images, which require methods such as Natural Language Processing (NLP) and image processing. For ITS applications, we aim to use automated machine learning to explore different models and hyper-parameters and compare them with our current models. We also plan to analyze the correlation between traffic and incidents and incorporate it into the traffic estimation models.

In addition, we intend to explore the use of big data in military scenarios, combining information from the civilian and military fields to support strategic operations in urban warfare. To this end, our framework can be enhanced to collect and combine different types of information (image, text) to create common operational pictures and verify/authenticate information, thereby avoiding misinformation that may influence political decisions.

REFERENCES

[1] L. Zhu, F. R. Yu, Y. Wang, B. Ning, and T. Tang, “Big data analytics in intelligent transportation systems: A survey,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 1, pp. 383–398, Jan. 2019.
[2] Umweltbundesamt. (2022). Verkehrsinfrastruktur und fahrzeugbestand. Accessed: Dec. 12, 2022. [Online]. Available: https://www.umweltbundesamt.de/daten/verkehr/verkehrsinfrastruktur-fahrzeugbestand
[3] German Federal Statistical Office (Destatis). (2022). Passengers Carried in Germany. Accessed: Jul. 12, 2022. [Online]. Available: https://www.destatis.de/EN/Themes/Economic-Sectors-Enterprises/Transport/Passenger-Transport/Tables/passengers-carried.html
[4] G. Vítor, P. Rito, and S. Sargento, “Smart city data platform for real-time processing and data sharing,” in Proc. IEEE Symp. Comput. Commun. (ISCC), Sep. 2021, pp. 1–7.
[5] A. B. Campolina, P. H. L. Rettore, M. Do Val Machado, and A. A. F. Loureiro, “On the design of vehicular virtual sensors,” in Proc. 13th Int. Conf. Distrib. Comput. Sensor Syst. (DCOSS), Jun. 2017, pp. 134–141.
[6] S. Jeong, S. Kim, and J. Kim, “City data hub: Implementation of standard-based smart city data platform for interoperability,” Sensors, vol. 20, no. 23, p. 7000, Dec. 2020. [Online]. Available: https://www.mdpi.com/1424-8220/20/23/7000
[7] L. Zhang, Y. Xie, L. Xidao, and X. Zhang, “Multi-source heterogeneous data fusion,” in Proc. Int. Conf. Artif. Intell. Big Data (ICAIBD), May 2018, pp. 47–51.
[8] P. H. L. Rettore, B. P. Santos, A. B. Campolina, L. A. Villas, and A. A. F. Loureiro, “Towards intra-vehicular sensor data fusion,” in Proc. IEEE 19th Int. Conf. Intell. Transp. Syst. (ITSC), Nov. 2016, pp. 126–131.
[9] P. H. L. Rettore, A. B. Campolina, L. A. Villas, and A. A. F. Loureiro, “A method of eco-driving based on intra-vehicular sensor data,” in Proc. IEEE Symp. Comput. Commun. (ISCC), Jul. 2017, pp. 1122–1127.
[10] P. H. L. Rettore, A. B. Campolina, A. Souza, G. Maia, L. A. Villas, and A. A. F. Loureiro, “Driver authentication in VANETs based on intra-vehicular sensor data,” in Proc. IEEE Symp. Comput. Commun. (ISCC), Jun. 2018, pp. 00078–00083.
[11] G. L. Foresti, M. Farinosi, and M. Vernier, “Situational awareness in smart environments: Socio-mobile and sensor data fusion for emergency response to disasters,” J. Ambient Intell. Humanized Comput., vol. 6, no. 2, pp. 239–257, Apr. 2015.
[12] H. Wen, Y. Lin, and J. Wu, “Co-evolutionary optimization algorithm based on the future traffic environment for emergency rescue path planning,” IEEE Access, vol. 8, pp. 148125–148135, 2020.
[13] P. H. Rettore, G. Maia, L. A. Villas, and A. A. F. Loureiro, “Vehicular data space: The data point of view,” IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2392–2418, 3rd Quart., 2019.
[14] S. A. Kashinath et al., “Review of data fusion methods for real-time and multi-sensor traffic flow analysis,” IEEE Access, vol. 9, pp. 51258–51276, 2021.
[15] W. Jiang and J. Luo, “Big data for traffic estimation and prediction: A survey of data and tools,” Appl. Syst. Innov., vol. 5, no. 1, p. 23, Feb. 2022.
[16] R. A. Anand, L. Vanajakshi, and S. C. Subramanian, “Traffic density estimation under heterogeneous traffic conditions using data fusion,” in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2011, pp. 31–36.
[17] A. Abadi, T. Rajabioun, and P. A. Ioannou, “Traffic flow prediction for road transportation networks with limited traffic data,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 653–662, Apr. 2015.
[18] G. Meena, D. Sharma, and M. Mahrishi, “Traffic prediction for intelligent transportation system using machine learning,” in Proc. 3rd Int. Conf. Emerg. Technol. Comput. Eng., Mach. Learn. Internet Things (ICETCE), Feb. 2020, pp. 145–148.
[19] L. Zhao et al., “T-GCN: A temporal graph convolutional network for traffic prediction,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 9, pp. 3848–3858, Sep. 2020.
[20] J. Tang, L. Li, Z. Hu, and F. Liu, “Short-term traffic flow prediction considering spatio-temporal correlation: A hybrid model combing type-2 fuzzy C-means and artificial neural network,” IEEE Access, vol. 7, pp. 101009–101018, 2019.
[21] X. Di, Y. Xiao, C. Zhu, Y. Deng, Q. Zhao, and W. Rao, “Traffic congestion prediction by spatiotemporal propagation patterns,” in Proc. 20th IEEE Int. Conf. Mobile Data Manage. (MDM), Jun. 2019, pp. 298–303.
[22] X. Wang, X. Guan, J. Cao, N. Zhang, and H. Wu, “Forecast network-wide traffic states for multiple steps ahead: A deep learning approach considering dynamic non-local spatial correlation and non-stationary temporal dependency,” Transp. Res. C, Emerg. Technol., vol. 119, Oct. 2020, Art. no. 102763. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0968090X20306756
[23] C. Zheng, X. Fan, C. Wang, and J. Qi, “GMAN: A graph multi-attention network for traffic prediction,” in Proc. AAAI, vol. 34, no. 1, 2020, pp. 1234–1241.
[24] B. Zhao, X. Gao, J. Liu, J. Zhao, and C. Xu, “Spatiotemporal data fusion in graph convolutional networks for traffic prediction,” IEEE Access, vol. 8, pp. 76632–76641, 2020.
[25] N. Zafar, I. U. Haq, J.-U.-R. Chughtai, and O. Shafiq, “Applying hybrid LSTM-GRU model based on heterogeneous data sources for traffic speed prediction in urban areas,” Sensors, vol. 22, no. 9, p. 3348, Apr. 2022. [Online]. Available: https://www.mdpi.com/1424-8220/22/9/3348
[26] Z. Shan, Y. Xia, P. Hou, and J. He, “Fusing incomplete multisensor heterogeneous data to estimate urban traffic,” IEEE MultimediaMag., vol. 23, no. 3, pp. 56–63, Jul. 2016.
[27] A. Genser, N. Hautle, M. Makridis, and A. Kouvelas, “An experimental urban case study with various data sources and a model for traffic estimation,” Sensors, vol. 22, no. 1, p. 144, Dec. 2021.
[28] L. Wenqi, L. Dongyu, and Y. Menghua, “A model of traffic accident prediction based on convolutional neural network,” in Proc. 2nd IEEE Int. Conf. Intell. Transp. Eng. (ICITE), Sep. 2017, pp. 198–202.
[29] S.-H. Park, S.-M. Kim, and Y.-G. Ha, “Highway traffic accident prediction using VDS big data analysis,” J. Supercomput., vol. 72, no. 7, pp. 2815–2831, Jul. 2016.
[30] H. Ren, Y. Song, J. Wang, Y. Hu, and J. Lei, “A deep learning approach to the citywide traffic accident risk prediction,” in Proc. 21st Int. Conf. Intell. Transp. Syst. (ITSC), Nov. 2018, pp. 3346–3351.
[31] Q. Shang, L. Feng, and S. Gao, “A hybrid method for traffic incident detection using random forest-recursive feature elimination and long short-term memory network with Bayesian optimization algorithm,” IEEE Access, vol. 9, pp. 1219–1232, 2021.
[32] Z. Liu and C. Wang, “Design of traffic emergency response system based on Internet of Things and data mining in emergencies,” IEEE Access, vol. 7, pp. 113950–113962, 2019.
[33] K. R. Sanjana, S. Lavanya, and Y. B. Jinila, “An approach on automated rescue system with intelligent traffic lights for emergency services,” in Proc. Int. Conf. Innov. Inf., Embedded Commun. Syst. (ICIIECS), Mar. 2015, pp. 1–4.
[34] P. H. L. Rettore, B. P. Santos, R. Rigolin F. Lopes, G. Maia, L. A. Villas, and A. A. F. Loureiro, “Road data enrichment framework based on heterogeneous data fusion for ITS,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 4, pp. 1751–1766, Apr. 2020.
[35] A. Salas, P. Georgakis, and Y. Petalas, “Incident detection using data from social media,” in Proc. IEEE 20th Int. Conf. Intell. Transp. Syst. (ITSC), Oct. 2017, pp. 751–755.
[36] S. Guo et al., “Identifying the most influential roads based on traffic correlation networks,” EPJ Data Sci., vol. 8, no. 1, pp. 1–17, Dec. 2019.
[37] Z. Liu, Z. Li, M. Li, W. Xing, and D. Lu, “Mining road network correlation for traffic estimation via compressive sensing,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 7, pp. 1880–1893, Jul. 2016.
[38] Y. Zhu, Z. Li, H. Zhu, M. Li, and Q. Zhang, “A compressive sensing approach to urban traffic estimation with probe vehicles,” IEEE Trans. Mobile Comput., vol. 12, no. 11, pp. 2289–2302, Nov. 2013.

Paulo H. L. Rettore received the B.Sc. and M.Sc. degrees in computer science in 2009 and 2012, respectively, and the Ph.D. degree in computer science from the Federal University of Minas Gerais (UFMG) in 2019. He is currently a Scientist with Fraunhofer FKIE, Bonn, Germany. Sitting with the Communication Systems Department (KOM), he has been focused on measuring the performance bounds of tactical systems over ever-changing scenarios. His research interests include computer networks, mobile ad-hoc networks, tactical networks, software-defined networking, ubiquitous computing, the Internet of Things, intelligent transportation systems, and smart mobility.

Bruno P. Santos received the bachelor’s degree from Universidade Estadual de Santa Cruz (UESC) and the M.S. and Ph.D. degrees in computer science from Universidade Federal de Minas Gerais (UFMG). He is currently a Professor of computer science with the Federal University of Bahia (UFBA). His research interests include computer networks, distributed systems, ubiquitous computing, the Internet of Things, intelligent transportation systems, and smart mobility.
[39] B. P. Santos, P. H. L. Rettore, H. S. Ramos, L. F. M. Vieira, and Johannes F. Loevenich received the B.Sc. degree in
A. A. F. Loureiro, “Enriching traffic information with a spatiotemporal computer science and the B.Sc. degree in mathemat-
model based on social media,” in Proc. IEEE Symp. Comput. Commun. ics from Rheinische Friedrich-Wilhelms-Universität
(ISCC), Jun. 2018, pp. 00464–00469. Bonn. He is currently pursuing the Ph.D. degree in
[40] G. Boeing, “OSMnx: New methods for acquiring, constructing, analyz- computer science/mathematics with the Distributed
ing, and visualizing complex street networks,” Comput., Environ. Urban Systems Department, University of Osnabrück. He is
Syst., vol. 65, pp. 126–139, Sep. 2017. a Scientist with the Communication Systems Depart-
[41] C. Yang and G. Gidófalvi, “Fast map matching, an algorithm integrating ment (KOM), Fraunhofer FKIE, Bonn, Germany.
hidden Markov model with precomputation,” Int. J. Geographical Inf. His research interests include computer systems,
Sci., vol. 32, no. 3, pp. 547–570, Mar. 2018. computer networks, distributed systems, data sci-
[42] R. Tavenard. An Introduction to Dynamic Time Warping. ence, optimization theory, artificial intelligence, and
Accessed: Sep. 14, 2022. [Online]. Available: https://rtavenar. game theory.
github.io/blog/dtw.html
[43] P. Zißner, P. H. L. Rettore, B. P. Santos, R. R. F. Lopes, and P. Sevenich,
“Road traffic density estimation based on heterogeneous data fusion,” in
Proc. IEEE Symp. Comput. Commun. (ISCC), Jun. 2022, pp. 1–6.
[44] S. Kolouri, S. R. Park, M. Thorpe, D. Slepcev, and G. K. Rohde,
“Optimal mass transport: Signal processing and machine-learning appli-
cations,” IEEE Signal Process. Mag., vol. 34, no. 4, pp. 43–59, Jul. 2017.
Roberto Rigolin F. Lopes (Member, IEEE) received
the B.Sc. degree in computer science from UFMT,
Brazil, the M.Sc. degree in computer science from
UFSCar, Brazil, and the Ph.D. degree in com-
puter science from USP, Brazil. During his Ph.D.,
he also visited Twente, The Netherlands, and Ottawa,
Canada. After his Ph.D., he got a post-doctoral
scholarship from the European Research Consortium
Philipp Zißner received the B.Sc. and M.Sc.
for Informatics and Mathematics (ERCIM) to join
degrees in computer science from Rheinische
NTNU, Norway, for four years, and a Scientist with
Friedrich-Wilhelms-Universität Bonn in 2020 and
Fraunhofer FKIE, Germany, for six years. He is
2022, respectively. He is currently a Scientist with
currently a Scientist with Thales Deutschland, Ditzingen, Germany. Sitting
the Communication Systems Department (KOM),
with the Secure Communications and Information Systems (SIX), he has
Fraunhofer FKIE, Bonn, Germany. His research
been attacking problems in computer networks and distributed systems with
interests include intelligent transportation systems,
a particular interest in the performance bounds of tactical systems over
smart mobility, the Internet of Things, and tactical
ever-changing communication scenarios. His academic life triggered interest-
networks.
ing life experiences, but he has been rebuilding his own education following
curiosity freely by reading books on physics, mathematics, and philosophy.