Traffic Flow Prediction
By Arsalan Ahmad Rahi
Submitted to the University of Hertfordshire in partial fulfilment of the requirements of the degree
of Master of Philosophy (MPhil)
DECLARATION STATEMENT
I, Arsalan Ahmad Rahi, the author of this project, hereby declare that this research study titled
“Machine Learning Approach for Traffic Flow Prediction” is my own genuine work from beginning to
end. I certify that all materials, sources, and any published or unpublished work of other persons
used in this thesis have been duly acknowledged by means of IEEE numeric referencing.
(ref. UPRAS/C/6.1, Appendix I, Section 2 – Section on cheating and plagiarism)
Signed: ………………………………………………………………
Date: 01/05/2019
ABSTRACT
Intelligent Transport Systems (ITS) has emerged rapidly as a field in recent years. A competitive
solution, coupled with the big data gathered for ITS applications, needs the latest AI to drive ITS
towards smart and effective public transport planning and management. There is a strong need for ITS
applications such as Advanced Route Planning (ARP) and Traffic Control Systems (TCS) to take charge
while requiring the minimum possible human intervention. This thesis develops models that can predict
traffic link flows at a junction level, such as the road traffic flows of a freeway or highway, for
all traffic conditions.
The research first reviews the state-of-the-art time series data prediction techniques, with a deep
focus on the field of transport engineering, along with the existing statistical and machine learning
methods and their applications to freeway traffic flow prediction. This review establishes a firm
footing from which to examine the superiority, in terms of prediction performance, of individual
statistical or machine learning models over one another. Detailed theoretical attention is given to
the structure and working of each chosen prediction model in relation to the traffic flow data.
In modelling the traffic flows from a real-world dataset gathered by Highway England (HE), a traffic
flow objective function for highway road prediction models is proposed within a 3-stage framework:
the topological breakdown of the traffic network into virtual patches, further into nodes, and down
to the estimation of basic link flow profile behaviours. The proposed objective function is tested
with ten different prediction models, including statistical, shallow and deep learning constructed
hybrid models, for bi-directional link flow prediction. The proposed objective function greatly
enhances the accuracy of traffic flow prediction, regardless of the machine learning model used.
The proposed objective-function-based prediction framework gives a new approach to modelling the
traffic network, to better understand the unknown traffic flow waves and the resulting congestion
caused at a junction level. In addition, the results of the applied machine learning models indicate
that RNN-variant LSTM-based models, in conjunction with neural networks and deep CNNs, outperform the
other chosen machine learning methods for link flow prediction when applied through the proposed
objective function. The practical findings from the experimentation reveal that, to arrive at an
efficient, robust, offline and accurate prediction model, apart from feeding the ML model with the
correct representation of the network data, attention should be paid to the deep learning model
structure, the data pre-processing (i.e. normalisation) and the error metrics used for data
behavioural learning. The proposed framework can in future be utilised to address one of the main
aims of smart transport systems, i.e. to reduce the error rates in network-wide congestion
predictions and the inflicted general traffic travel time delays in real time.
TABLE OF CONTENTS
DECLARATION STATEMENT..................................................................................................................... 2
ABSTRACT................................................................................................................................................ 3
TABLE OF CONTENTS.............................................................................................................................. 4
LIST OF FIGURES ...................................................................................................................................... 8
LIST OF TABLES ...................................................................................................................................... 11
1. Introduction.................................................................................................................................... 12
Problem Statement ............................................................................................................... 12
Aims and Research Questions............................................................................................... 13
RQ1:................................................................................................................................................... 13
RQ2:................................................................................................................................................... 13
RQ3:................................................................................................................................................... 13
RQ4:................................................................................................................................................... 13
Research Method .................................................................................................................. 13
Contributions ........................................................................................................................ 13
What is Machine Learning?................................................................................................... 14
1.5.1 How Machine Learning Works ...................................................................................... 14
1.5.2 Innovations in Machine Learning .................................................................................. 14
1.5.3 Self-Driving Cars ............................................................................................................ 14
1.5.4 Recommendation Systems............................................................................................ 14
1.5.5 Social Media Sentiment Analysis................................................................ 15
1.5.6 Online Credit Card Fraud Protection............................................................................. 15
1.5.7 Spam Email Filtering ....................................................................... 15
1.5.8 Network Intrusion Protection .............................................................................................. 15
Commonly Used Machine Learning Algorithms ................................................................... 15
1.6.1 Artificial Neural Networks ............................................................................................. 15
1.6.2 Decision Trees ............................................................................................................... 16
1.6.3 Other ML Techniques .................................................................................................... 16
What is Smart Transportation? ............................................................................................. 18
1.7.1 From Commercial Transport Operators Point of View ................................................. 18
1.7.2 Congestion as a Cause of Flow Restriction ................................................................... 19
Thesis Structure .................................................................................................................... 19
2. Review of Traffic Flow Prediction Methods from Traditional to the State-of-the-Art Techniques ............ 20
Introduction .......................................................................................................................... 20
Aims....................................................................................................................................... 20
History and Short Overview of Traffic Flow Analyses and Predictions from Literature Study ............ 20
Study of Factors Influencing Traffic Prediction Models in the light of the Literature Review ............ 21
2.4.1 Context of Implementation for Road Traffic Predictions.............................................. 21
2.4.2 Input variables for Traffic Prediction ............................................................................ 21
2.4.3 Effects of Using Purely Machine Learning Approaches ................................................ 22
2.4.4 Input Data Resolution for Traffic Prediction ................................................................. 24
2.4.5 Prediction Steps in Traffic Flow Prediction ................................................................... 26
2.4.6 Seasonal Effects and Spatial-Temporal Patterns in Traffic Flow Prediction ................. 27
2.4.7 Various Road Conditions in Traffic Flow Prediction ...................................................... 27
Traffic Predictions in Other Domains Closely Related to Traffic Flow .................................. 28
Various Approaches for Traffic Flow and Congestion Behaviour Modelling and the Associated Limitations in the Light of Literature Review ................................................................. 31
2.6.1 Parametric, Naïve and Macroscopic Simulation based Approaches: ........................... 31
2.6.2 Non-Parametric and Data-Driven Machine Learning Methods: ...................... 32
2.6.3 Hybrid Models ............................................................................................................... 34
Established Theoretical Relevance for the Proposed Methodology from Literature Review ............ 35
Summary ............................................................................................................................... 36
3. Models and Architectures ............................................................................................................. 37
Selected Models Theory ....................................................................................................... 37
3.1.1 Historical Moving Average (HA) .................................................................................... 37
3.1.2 Seasonal Autoregressive Integrated Moving Average Model (SARIMA) ...................... 37
3.1.3 Random Forest Regressor (RFR) .................................................................. 38
3.1.4 Support Vector Regressor (SVR) ................................................................................... 39
3.1.5 Feed Forward Backpropagation Neural Network (FFBNN) ........................................... 40
3.1.6 Deep Belief Network (DBN) .......................................................................................... 41
3.1.7 Convolutional Neural Network (CNN) ........................................................................... 42
3.1.8 Long Short-Term Memory (LSTM) ................................................................................ 44
3.1.9 Backpropagation Long Short-Term Memory - Neural Network (B-LSTM-ANN) ........... 45
3.1.10 Deep Convolutional Neural Network - Long Short-Term Memory (DCNN-LSTM) ........ 47
Hardware and Software Implementation Details ................................................................. 49
3.2.1 Data Exploration Library ............................................................................................... 49
3.2.2 ML Implementation Library .......................................................................... 49
4. Research Methodology and Contributions ................................................................................... 50
Introduction .......................................................................................................................... 50
Study Area ............................................................................................................................. 50
Data Collection ...................................................................................................................... 50
Data Description ................................................................................................................... 55
4.4.1 MIDAS/TAME/TMU Dataset................................................................................. 55
4.4.2 AADF Dataset ....................................................................................................... 60
Data Preparation ................................................................................................................... 60
4.5.1 Data Cleaning ................................................................................................................ 62
4.5.2 Data Integration ............................................................................................................ 62
4.5.3 Data Normalisation ....................................................................................................... 62
4.5.4 Data Reduction.............................................................................................................. 62
4.5.5 Data Discretisation ........................................................................................................ 62
4.5.6 Dependent and Independent Data Variables ............................................................... 62
Preliminary Analysis .............................................................................................................. 63
4.6.1 How Are Network Patches and Nodes Defined? .............................................................. 63
4.6.2 Preparing the Dataset Subset for Each Node of a System Patch .................................. 64
Methodology......................................................................................................................... 65
4.7.1 Traffic Network Representation on a Junction Level .................................................... 66
4.7.2 Formulation of Network Flow Estimation Function...................................................... 66
4.7.3 Node Level Traffic Flow Mathematical Representation ............................................... 66
Summary ............................................................................................................................... 68
5. Experiments and Results: Evaluation of The Proposed Frameworks ........................................... 69
Experimental Settings ........................................................................................................... 69
5.1.1 Performance Metrics .................................................................................................... 69
5.1.2 Evaluation Settings ........................................................................................................ 69
5.1.3 Empirical Error Distributions ......................................................................................... 70
5.1.4 Error Distribution Comparisons .................................................................................... 70
Experiments .......................................................................................................................... 70
Case 1: Prediction Interval ............................................................................................................ 70
Case 2: Inclusion of Related Variable ............................................................................................ 71
Correlation Analysis .............................................................................................................. 71
5.3.1 Auto-Correlation ........................................................................................................... 71
5.3.2 Cross-Correlation .......................................................................................................... 72
5.3.3 Relation Between Traffic Flow Profiles and Times of the Day ...................................... 74
5.3.4 Seasonality and Trends in Traffic Flows ........................................................................ 77
5.3.5 Seasonality and Trends in Traffic Flows ........................................................................ 78
Experimental Environment ................................................................................................... 79
Experimental Results ............................................................................................................ 80
5.2.1 Case 1: Experiment with Different Prediction Intervals ............................................... 80
5.2.2 Case 2: Experiment with Inclusion of the Related Variables ........................................ 81
Summary ............................................................................................................................... 82
6. Evaluation and Conclusion ............................................................................................................ 83
Evaluation ............................................................................................................................. 83
6.1.1 Case 1: Evaluation of Experiment Results with Different Prediction Intervals .................... 83
6.1.2 Case 2: Evaluation of Experimental Results with Inclusion of the Related Variables ... 84
Discussion.............................................................................................................................. 87
6.2.1 Limitations..................................................................................................................... 88
Conclusions ........................................................................................................................... 88
RQ1:................................................................................................................................................... 88
RQ2:................................................................................................................................................... 89
RQ3:................................................................................................................................................... 89
RQ4:................................................................................................................................................... 89
Contributions ........................................................................................................................ 90
6.4.1 Thorough ITS Literature Study on Prediction Models................................................... 90
6.4.2 Junction level Proposed Flow Prediction Objective Function ....................................... 90
Future Works ........................................................................................................................ 90
References ............................................................................................................................................ 91
Appendix A : Hyperparameters Tuning Results ................................................................................... 97
A.1 Experiment Case 1: Best Search Hyperparameters Used for Multi Prediction Horizons ........ 97
Appendix B : Continuation of Discussion of Selected Models ............................................................ 114
B.1 Long Short-Term Memory (LSTM) ........................................................................................ 114
B.2 Traditional Neural Networks (NN) vs Recurrent Neural Networks (RNN): ............................. 114
Appendix C : Future Works ................................................................................................................. 117
C.1 Flow Rate Network Bottleneck Identification ....................................................................... 117
C.1.2 Average Congestion Speed and Average Travel Time Calculations: .................................. 118
C.1.3 Naïve Bayes Based Links Flow Rate Estimations:............................................................... 119
C.1.4 Flow Rate Trend Analysis in Probability Distributions at Nodes:....................................... 120
C.1.5 Initial Insights into Conservation of Travel Time Delays: .................................... 120
Appendix D: Published Work ............................................................................................................. 121
LIST OF FIGURES
Figure 1.1 Multi-Layer Perceptron (MLP) Network.............................................................................. 16
Figure 1.2 Supervised machine learning prediction algorithms breakdown. ...................................... 17
Figure 1.3 Typical Wait categories as seen by the transport operators. ............................................ 18
Figure 4. 1 a) Original sample chosen test area with circles (yellow for MIDAS sites and blue for
TAME sites). b) Sensors installed at the test sites by the Highway England authority. c) Square red
line boxes indicating the virtually divided network. ............................................................. 54
Figure 4. 2 Highway England Dataset Breakdown. ............................................................................... 55
Figure 4. 3 DFT Dataset Breakdown. .................................................................................................... 60
Figure 4. 4 a) First and b) last three days of pre-processed data from Patch 1, Node 2 associated
Links. ..................................................................................................................................................... 61
Figure 4. 5 Systems as Network of Patches. ......................................................................................... 63
Figure 4. 6 a) P1-N2, highway junction under consideration (Google Maps, 2018). b) Node illustration retaining the junction's original topology. .................................................. 64
Figure 4. 7 General Network Node Link Dependencies Written in An Analogy with The General
Function Definition. .............................................................................................................................. 65
Figure 4. 8 a) Extension of traffic network at node i showing three links and their associated inflows and outflows. b) A simple traffic network at a node i with 3 links. It shows the distribution of incoming traffic dispersed as outgoing traffic at the node................................................... 67
Figure 4. 9 Implementation Steps for The Proposed Methodology. ................................................... 68
Figure 5. 1 Original flow features auto-correlation for the incoming link L1in. ................................
Figure 5. 2 Cross-Correlation of Link L1in with its Time-Lagged Versions.......................... 73
Figure 5. 3 Cross-Correlation of Connected Links for The Past Six Time Steps. ................................... 74
Figure 5. 4 Link L1in Normalised Flow Profiles with Respect to The Times of The Day. .................... 75
Figure 5. 5 a) Correlation Between Non-Lagged Interconnected Link Pair Normalised Flows vs Time
of the Day. b) Correlation Between Non-Lagged Interconnected Link Pairs Normalised Flows vs Time
of the Day. ............................................................................................................................................. 76
Figure 5. 6 Links Seasonality Breakdown .............................................................................................. 77
Figure 5. 7 Link L1in Flow Profiles with Respect to The Times of The Day Along with the Days of the Week Breakdown. .......................................................................................... 78
Figure 5. 8 Averaged Monthly Traffic Flows. ....................................................................................... 79
Figure 5. 9 Stationary Test: Augmented Dickey Fuller Test Results...................................................... 80
Figure 6. 1 Empirical CDF Plot of Absolute Mean Square Error Score on the Short-Term Prediction
Results. .................................................................................................................................................. 84
Figure 6. 2 Empirical CDF Plot of Absolute Mean Square Error Score on the Medium-Term Prediction
Results. .................................................................................................................................................. 85
Figure 6. 3 Empirical CDF Plot of Absolute Mean Square Error Score on the Long-Term Prediction
Results. .................................................................................................................................................. 85
Figure 6. 4 Empirical CDF Plot of Absolute Mean Square Error Score on the Short-Term Prediction
Results with Multi Link Proposed Flow Learning. ................................................................................. 86
Figure 6. 5 Empirical CDF Plot of Absolute Mean Square Error Score on the Medium-Term Prediction
Results with Multi Link Proposed Flow Learning. ................................................................................. 86
Figure 6. 6 Empirical CDF Plot of Absolute Mean Square Error Score on the Long-Term Prediction
Results with Multi Link Proposed Flow Learning. ................................................................................. 87
Figure A.1 ARIMA Hyperparameter Grid Search for Short, Medium- and Long-Term Prediction
Horizon. ................................................................................................................................................. 97
Figure A.2 RFR Hyperparameter Grid Search for Short Term Prediction Horizon. ............................... 98
Figure A.3 RFR Hyperparameter Grid Search for Medium Term Prediction Horizon. .......................... 99
Figure A.4 RFR Hyperparameter Grid Search for Long Term Prediction Horizon. ................................ 99
Figure A.5 SVR Hyperparameter Grid Search for Short Term Prediction Horizon. ............................. 100
Figure A.6 SVR Hyperparameter Grid Search for Medium Term Prediction Horizon. ........................ 101
Figure A.7 SVR Hyperparameter Grid Search for Long Term Prediction Horizon. .............................. 101
Figure A.8 FFBNN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with
Respect to Optimizers and Activation Functions. ............................................................................... 102
Figure A.9 FFBNN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon
with Respect to Optimizers and Number of Epochs. .......................................................................... 102
Figure A.10 FFBNN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon
with Respect to No of Neurons and Batch Sizes. ................................................................................ 103
Figure A.11 FFBNN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon
with Respect to No of Neurons and Epochs. ...................................................................................... 103
Figure A.12 FFBNN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon
with Respect to Optimizers and Activation Functions. ....................................................................... 104
Figure A.13 FFBNN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon
with Respect to Optimizers and Number of Epochs. .......................................................................... 104
Figure A.14 FFBNN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon
with Respect to No of Neurons and Batch Sizes. ................................................................................ 105
Figure A.15 FFBNN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon with Respect to No of Neurons and Epochs. ......................................................................... 105
Figure A.16 FFBNN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to Optimizers and Activation Functions. ............................................................... 106
Figure A.17 FFBNN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon
with Respect to Optimizers and Number of Epochs. .......................................................................... 106
Figure A.18 FFBNN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon
with Respect to No of Neurons and Batch Sizes. ................................................................................ 107
Figure A.19 FFBNN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to No of Neurons and Epochs. ......................................................................... 107
Figure A.20 DBN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with
Respect to the First RBM Layer Iterations and First Layer RBMs Batch Size. ..................................... 108
Figure A.21 DBN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with
Respect to The Second RBM Layer Iterations and Second Layer RBMs Batch Size. ........................... 108
Figure A.22 DBN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with
Respect to The Second RBM Layer Iterations and Second Layer RBMs Numbers. ............................ 109
Figure A.23 DBN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with Respect to the Number of Neurons and Epochs. ............................................................. 109
Figure A.24 DBN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon
with Respect to The First & Second RBM Layer Iterations and RBM numbers and the Model
Activation and Optimizer Functions.................................................................................................... 110
Figure A.25 DBN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon with Respect to the Number of Neurons and Second RBM Numbers. ............................................ 110
Figure A.26 DBN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon
with Respect to The Neural Layer Batch Size and Number of Epochs. ............................................... 111
Figure A.27 DBN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon
with Respect to The RBM Layer Batch Sizes. ...................................................................................... 111
Figure A.28 DBN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with
Respect to The First & Second RBM Layer Iterations and RBM numbers and the Model Activation and
Optimizer Functions. ........................................................................................................................... 112
Figure A.29 DBN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to the Number of Neurons and Second RBM Numbers. .................................... 112
Figure A.30 DBN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with
Respect to The Neural Layer Batch Size and Number of Epochs. ....................................................... 112
Figure A.31 DBN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to The RBM Layer Batch Sizes.......................................................................... 113
Figure B.1 Data Flow and Operations in Long Short-Term Memory (LSTM) Unit Structure which
Contains The Forget, Input, Output, and Update Gates. .................................................................... 114
Figure B.2 General Recurrent Neural Network Structure. Unlike in a feed-forward neural network, each
neuron feeds its output not only to the next neuron in the next layer but also to the next-in-line
neuron in the same layer. Each neuron therefore has two sources of input, the current and the recent
past, which are combined to determine how it responds to new data. .............................................. 115
LIST OF TABLES
Table 5. 1 MAE and RMSE Results for The Short-Term Prediction Horizon......................................... 81
Table 5. 2 MAE and RMSE Results for The Medium-Term Prediction Horizon..................................... 81
Table 5. 3 MAE and RMSE Results for The Long-Term Prediction Horizon........................................... 81
Table 5. 4 MAE and RMSE Aggregated Results of The Short-Term Prediction Horizon for The Multi
Feature Inclusion................................................................................................................... 82
Table 5. 5 MAE and RMSE Aggregated Results of The Medium-Term Prediction Horizon for The Multi
Feature Inclusion................................................................................................................... 82
Table 5. 6 MAE and RMSE Aggregated Results of The Long-Term Prediction Horizon for The Multi
Feature Inclusion................................................................................................................... 82
1. Introduction
This introductory chapter presents the topic of study with some background. The motivation for the
subject and the research questions, together with the aims and contributions, are also discussed.
There has been a vast increase in the big data available through the advent of smart things, smart
cities, smart transportation and the Internet of Things. The question that naturally arises for
organisations is how to put this big data to meaningful use. This question motivated the author to
research how the big data gathered in the field of transportation can be used in conjunction with AI
techniques to build useful systems.
Intelligent Transport Systems (ITS) are about providing end users with innovative and advanced
services to seamlessly use different modes of transportation, to manage traffic for timely and
effective planning, and to empower users to make smarter choices in modern multimodal transportation
systems. A system that can reasonably predict transport changes would be of great importance to
transportation authorities, government and public institutions. We use public and private transport
on our everyday commute and take it for granted that the buses arrive on schedule. With an
ever-increasing population, the public needs better travelling experiences on commercial transport.
Problem Statement
The issues affecting an ordinary commuter include preparing the right currency change beforehand for
an on-board ticket purchase. The bus company ultimately pays the cost of every extra moment a
passenger takes while boarding the bus. Once aboard, the passenger constantly looks out of the bus
window for the desired stop. The time the bus takes to reach its destination is affected by the
variable congested regions along the route. This cycle continues day in, day out. There is nothing
smart about this cycle of action; this is the normal life of bus running operations.
Public transportation has thus not evolved enough over the past years, in step with growing
technology, for us to call it an efficient means of operation. With the right technology, fare prices
may drop, flexible congestion-aware dynamic routes may be provided, and bus driving behaviour and
reliability issues can ultimately be solved. A system that can effectively predict traffic behaviour
would not only reduce costs but also help reduce carbon footprints. This thesis addresses
improvements in some of the most distinctive fields of the transportation industry that together
constitute the smart transportation industry.
The closest up-to-date system deployed in England today is available at this website1 by Highway
England. The system gives close to real-time traffic information for most of the highways, motorways
and major A roads in England. The traffic information displayed includes the average traffic flows
for each junction in the clockwise and counter-clockwise directions on the corresponding motorways.
The CCTV images are taken at a frequency of one per minute. The data sources are the loop detectors
and microwave sensors deployed on site at specific locations on the road.
1 https://m.highwaysengland.co.uk/#flow
Some issues can be inferred from the currently implemented system:
o The system relies on data collection sensors deployed at certain major road locations, which
makes it easy to ascertain average road traffic speeds, but at present it cannot meaningfully
infer the state of the nearby links.
o In its current state the system cannot make any prediction from the current data; it just
displays the instantaneously averaged speed and generates the control signals specifying the
delays for the e-signs on the roads. There is therefore a need to employ the latest AI-based deep
machine learning techniques to produce effective forecasts by analysing the behaviour of the
closely related traffic flow link data.
Aims and Research Questions
RQ1:
What are the potential challenges hindering the practical implementation of road traffic parameter
forecasting systems?
RQ2:
What do conventional neural network-based techniques have to offer for traffic parameter
prediction?
RQ3:
What are the state-of-the-art machine learning architectures for traffic flow forecasting, and what
effect does the proposed methodology have on the chosen models' performance?
RQ4:
What do deep machine learning approaches have to offer, compared with conventional or shallow
machine learning techniques, on traffic flow data?
Research Method
A background study and a review of the state-of-the-art literature were performed to answer the
first three research questions. The proposed methodology, and the null hypothesis containing the
flow prediction objective function, were put to the test in an iterative manner for different
models. Experiments were then conducted on state-of-the-art deep ML models, comparing them
quantitatively with conventional ML and statistical forecasting approaches, and a results-based
hypothesis was drawn. This hypothesis was further tested with multi-time-dependent model
predictions. Conclusions are then drawn at the end.
Contributions
This thesis contributes by gathering the most representative machine learning techniques, from
shallow to state-of-the-art deep learning approaches, to predict traffic flows while optimising the
proposed junction-level highway traffic flow objective function. The bi-directional flow function of
individual roads is reported considering the net inflows and outflows obtained by a topological
breakdown of the highway network. The proposed approach is modular and can be adopted for
network-wide traffic flow behavioural learning. The technique can further help in identifying
bottlenecks for congestion analysis.
predicting the likelihood of products being purchased by a particular customer. The set of products
offered to customers is selected according to their buying power, or the dynamic price group already
assigned using the k-means clustering technique. Finally, a trained binary logistic regression
classifier is called on the test data to predict the product purchases by the potential customers.
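As an illustration of the pipeline just described, the following is a minimal sketch assuming
scikit-learn and entirely synthetic customer data (all feature names and values are hypothetical,
not taken from the source): customers are first grouped into price bands with k-means, and a binary
logistic-regression classifier then predicts purchase likelihood.

```python
# Hypothetical sketch of the described pipeline (synthetic data, assumed features).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Toy customer features: [annual_spend, avg_basket_value] (illustrative).
X_customers = rng.uniform(0, 1000, size=(500, 2))

# Stage 1: assign each customer to a dynamic price group via k-means.
price_group = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_customers)

# Stage 2: toy binary purchase labels (higher spenders buy more often).
y = (X_customers[:, 0] + 50 * rng.standard_normal(500) > 500).astype(int)

# The cluster id enters the classifier as an extra feature.
X = np.column_stack([X_customers, price_group])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("purchase-prediction accuracy:", clf.score(X_test, y_test))
```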
possible unique variation techniques in the training data. The test data is then utilised to measure
the final model performance based on its true prediction and classification power on hand-drawn
electrical circuit components [4]. A more general neural network architecture with deep hidden
layers is shown in Figure 1.1, with the generalised model equations given by Equation 1.1.
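Equation 1.1 itself did not survive the document conversion. As a hedged reconstruction only, a
generalised multi-layer perceptron of the kind Figure 1.1 depicts is usually written layer by layer
(the weights W^(t) and activation σ are assumed notation, not the thesis's own symbols):

$$ \mathbf{h}^{(0)} = \mathbf{x}, \qquad \mathbf{h}^{(t)} = \sigma\!\left(W^{(t)}\,\mathbf{h}^{(t-1)} + \varepsilon_t\right),\ t = 1,\dots,T, \qquad \hat{y} = \mathbf{h}^{(T)} $$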
Given that ε_t is the bias term for the parameter calculations at the different layers.
Association and sequence discovery, in conjunction with other tools, help classify data. To classify
incoming data against already existing data, it is necessary to transform the data records into
ontology-based event graphs. These graphs are visual representations of event sequences through
time. This mapping in terms of events helps resolve data conflicts among aggregated records plotted
in the form of events [7]. In an analogous manner, a hybrid artificial neural network (ANN) -
support vector machine (SVM) is put to use to forecast building energy consumption under the
ever-growing human population [8]. Nearest-neighbour mapping, k-means clustering, self-organising
maps, local optimal search techniques (genetic algorithms), expectation maximisation, Bayesian
networks, principal component analysis (PCA), kernel density estimation, singular value
decomposition, Gaussian mixture models and sequential covering rule building are some of the
developed ML algorithms that are often used singly or in combination on a series of datasets to find
optimal solutions for classifying, making predictions and fetching useful insights out of data. A
breakdown of standardised supervised prediction techniques and basic ML techniques used previously
in the literature is shown in Figure 1.2.
Artificial Intelligence and Machine Learning have become popular subjects in almost all the applied
sciences in modern times. The single greatest capability of AI and ML is the ability to generalise
the behaviour of a process from a large set of data gathered under various conditions, which we call
Big Data. Different state-of-the-art ML algorithms have been developed over time and are considered
benchmarking standards; all newly proposed algorithms are benchmarked against these standard
algorithms in performance and accuracy. The literature discussed in later chapters shows that ML
models can be developed or tuned through various parameters to generalise the behaviour of data,
whether unsupervised or supervised, and can be catered to a specific dataset or data-driven
application. An algorithm trained on one dataset may not be a feasible solution for behavioural
classification or prediction on another dataset.
An example of a sample smart transportation model which uses ML algorithms to model, say, bus
deviation behaviour, is given in its general form as:
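The equation referenced below was lost in conversion; purely as an illustrative sketch (not the
thesis's actual Equation 1.2), a model of this kind can be written as

$$ \hat{d}_{t+h} = F_{ML}\big(\phi_{AI}(x_t);\, \theta\big), $$

where x_t collects the raw inputs (e.g. vision and IoT sensor streams), φ_AI denotes the AI-based
pre-processing, F_ML is the learned predictor (e.g. a NN using k-means-derived features) with
parameters θ, and d̂ is the predicted bus deviation. All symbols here are assumptions.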
Equation 1.2 is the general form of the model that describes mathematically what a practical ML
architecture looks like for the bus deviation prediction scenario. Equation 1.2 consists of terms for
Artificial Intelligence (AI) techniques, such as smart vision, data pre-processing and Internet of
Things devices, and ML techniques such as Neural Networks (NN) and k-means Clustering (KC). These
terms become much clearer along the way to the algorithm development and in the final implementation
phase of the thesis.
The aim of this research work is to adapt AI algorithms catered specifically towards the development
of transportation problem solutions, building on existing algorithms for better performance.
Multimodal scheduling and an improved, seamless mode of travel, based on congestion and real-time
travel time estimation, are the tipping points to aim for in the future development of the smart
transportation system.
Thesis Structure
Different chapters, with sections and subsections, are devoted to building the scenario from the
problem definition, through the research questions and the proposed methodology, to the
state-of-the-art machine learning model architectures, before concluding the thesis with the
conclusions and possible future work. The rest of the thesis is organised as follows: Chapter 2
presents a comprehensive review of the conventional statistical, machine learning and deep learning
approaches for road traffic parameter forecasting. Chapter 3 presents the theoretical details of the
chosen state-of-the-art prediction architectures implemented in this thesis. The research
methodology and contributions are presented in Chapter 4, while Chapter 5 discusses the experiments,
results and quantitative evaluation of the proposed frameworks. Finally, Chapter 6 concludes the
thesis with a final say on the prediction model performances and provides the future directions to
be built upon this thesis.
2. Review of Traffic Flow Prediction Methods from Traditional to
the State-of-the-Art Techniques
Introduction
In the previous chapter a general introduction to the traffic prediction problem was given. A more
detailed background literature study is presented in this chapter. The chapter contains a literature
review of traffic parameter prediction studies, with an extensive comparison of each manuscript
studied: the adopted approach, the features selected for decision making, the performance measures,
the experimental setup using statistical or data-driven machine learning (ML) algorithmic models,
and the sample algorithm test simulations and results as reported in the original manuscript.
Applications of data prediction in general engineering domains closely related to traffic
engineering are also reviewed. Finally, based on this thorough background study, an understanding of
the state-of-the-art conceptual and implemented frameworks is developed at the end of the chapter,
highlighting the key research gaps in the existing literature.
Aims
Chapter 2 presents a detailed background study of road flow and closely related traffic travel time
inference models and approaches. The chapter then synthesises the types of statistical and machine
learning methods presented in the literature. Finally, the identified gaps drive the selection of
the best algorithms for our model development, discussed further in Chapter 3.
using macroscopic prediction techniques are the complex parameter estimations and the real struggle
to generate a close-to-real-world simulation test environment. The predictions are also highly
influenced by the quality of the estimated traffic parameters [11]. Both the statistical ML and the
macroscopic approaches are useful for ideal traffic flow prediction model development.
This research, however, focuses purely on data-driven methods, from statistical to complex ML, for
traffic-related predictions. The major difference between ML and conventional analytical models is
that an ML model is treated as a black box that learns the relationships between the inputs and the
outputs to predict traffic variables. While ML models are complicated to optimise during learning,
they are computationally efficient when calculating the final prediction once trained. Continued
training allows ML models to adapt to the changing behaviour displayed in the data. A detailed
description of the selected models can be found in Chapter 3. According to the literature, for a
comparison to be meaningful the same traffic data needs to be used for both the statistical and the
computational learning models; however, such a comparison across the literature, with the same data
used in different comparison scenarios, is difficult to find.
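To make the black-box framing concrete, the sketch below casts flow prediction as supervised
learning on lagged flow values, using a random forest regressor (one of the models listed in
Chapter 3). The synthetic data, window length and hyperparameters are illustrative assumptions, not
the thesis's configuration.

```python
# Minimal sketch: traffic flow forecasting as supervised learning on lags.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic flow series (vehicles per 15-min interval) with a daily cycle.
t = np.arange(2000)
flow = 400 + 200 * np.sin(2 * np.pi * t / 96) \
       + np.random.default_rng(1).normal(0, 20, t.size)

lags = 6  # previous six intervals predict the next one (assumed window)
X = np.column_stack([flow[i:i - lags] for i in range(lags)])
y = flow[lags:]

split = int(0.8 * len(y))  # chronological train/test split
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:split], y[:split])
print("held-out MAE:", np.mean(np.abs(model.predict(X[split:]) - y[split:])))
```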
in traffic flows in a piecewise switched linear manner, describing the model as an aggregated set of
partial derivatives of the involved parameters [27].
In [28], a queueing-based model for general road delays and blocked lane duration (BLD) is compared,
taking the road delays, incident severity and road incident locations as input parameters. For the
proposed decision tree (DT) performance measure, the blocked lane duration and the general road
delays from a particular incident are considered the final input parameters. The model quantifies
the average delay per number of cars as the final performance measure for an effective Traffic
Incident Management (TIM) system. The fact that the proposed DT [28] does not require additional
data makes it favourable for traffic predictions. Similarly, lane blocking incident data is used as
an input parameter in an automatic traffic incident detection system that hybridises time series
analysis (TSA) and machine learning (ML), utilising the theory of fault diagnosis [29]. TSA is used
to predict the normal traffic based upon the past normal traffic data, while ML is used to detect
traffic incidents from real-time behavioural learning, the already predicted normal traffic data,
and the differences between the two [29]. The approach proposed in [29] is claimed to have a better
detection rate and a lower mean time to detect incidents, under the constant condition of the same
false alarm rate (FAR), when compared to other standard algorithms. In [30], traffic features
recorded during driving, i.e. acceleration and other action-based parameters, are used as input
parameters and clustered using k-means learning in a supervised fashion; the clusters are then used
to categorise the overall vehicle driving behaviour and later to predict potential traffic accidents
in a driving simulation analysis. Another automatic traffic incident severity classification system
comparing different machine learning techniques is presented in [31]. The input data contains not
only the standard traffic incident parameters (e.g. incident location, date, time and affected
lanes); incident severity levels are also considered an important deciding parameter for issuing
control commands. The ML model approach proposed in [31] is developed and tested to help traffic
incident management controllers automate the network traffic control process, instead of manually
breaking the information down into pre-determined categories, and to minimise the effects an
incident could have on the network [31].
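The hybrid TSA-ML idea in [29] — predict the normal traffic, then flag large deviations — can be
sketched minimally as follows. The moving-average baseline, threshold rule and synthetic data are
assumptions chosen for illustration, not the actual method of [29].

```python
# Sketch: baseline "normal traffic" prediction plus residual thresholding.
import numpy as np

rng = np.random.default_rng(2)
flow = 500 + rng.normal(0, 15, 288)   # one day of 5-minute counts (synthetic)
flow[200:220] -= 250                  # injected incident: sudden flow drop

window = 12
baseline = np.convolve(flow, np.ones(window) / window, mode="same")  # "normal" traffic
residual = flow - baseline

threshold = 3 * residual[:150].std()  # calibrated on incident-free intervals
alarms = np.where(np.abs(residual) > threshold)[0]
print("intervals flagged as possible incidents:", alarms)
```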
function based outer layer, for network-wide traffic congestion prediction, is given in [36]. The
SAE model's prediction accuracy was compared with the performance of other algorithms, including
back-propagation neural networks (BP NN), the random walk (RW) forecast method, support vector
machines (SVM) and the radial basis function neural network (RBF-NN) [36]. An overall prediction
accuracy of 93% was shown by the SAE model when tested on 15-minute average durations, traffic flow
rates larger than 450 vehicles and different numbers of hidden layers, disregarding whether other
road parameters (weather, speed, density, traffic incidents) had a direct or indirect effect on the
traffic volumes [36]. Similarly, in [19], a behavioural study of prominent individual-vehicle
features (speed, acceleration and lane-changing ratio) is used for pre-effective traffic incident
detection in an urban environment, using mobile sensor data instead of fixed road sensors. Four
different road scenarios of normal and incident traffic conditions were considered, with each
variable passed through the Kolmogorov-Smirnov (K-S) test. Final consideration of the empirical
cumulative distribution function (ECDF) against an initial null hypothesis revealed the relative
importance of these variables in effectively detecting different types of traffic incidents [21].
However, [21] does suggest that for higher vehicular flow volumes (>500 veh/h) the chosen variables
do not play a very significant role in differentiating between normal and incident road conditions;
a better implementation of the incident detection system (IDS) must therefore also consider the
traffic volumes and flows.
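The Kolmogorov-Smirnov screening used in [19]/[21] can be illustrated with a short sketch: each
candidate variable's empirical distribution under normal versus incident conditions is compared, and
a small p-value suggests the variable discriminates well. The speed samples below are assumed for
illustration, not data from those studies.

```python
# Sketch: K-S test comparing a feature's distribution across conditions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)
speed_normal = rng.normal(95, 8, 400)     # km/h, free-flow (synthetic)
speed_incident = rng.normal(60, 15, 400)  # km/h, incident (synthetic)

stat, p_value = ks_2samp(speed_normal, speed_incident)
print(f"K-S statistic = {stat:.3f}, p = {p_value:.2e}")  # tiny p => good discriminator
```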
A data-driven approach with GPS-collected speed data to predict traffic congestion evolution, using
spatio-temporal features learned by a recurrent neural network and restricted Boltzmann machine
(RNN-RBM), is given in [37]. To assist transport professionals in congestion prediction and
planning, [37] ruled out the common assumption that traffic flow dynamics over networks follow a
power-law distribution at all times, an assumption generally made due to the lack of sufficient
traffic sensor data. The proposed RNN-RBM is tested on different road networks and compared with the
performance accuracy of SVM and conventional neural networks, with a sensitivity analysis over
different speed thresholds; [37] reports an overall prediction accuracy of 88%, a training accuracy
of 95%, and an overall algorithm execution time of less than 6 minutes. The results are further
visualised on a GIS map for congested-road planning. Proposed future techniques include model
pre-training using the Hessian-free optimisation method for rational parameter initialisation, and
consideration of spatial road interactions for more precise training and prediction accuracy [37].
A support vector regressor (SVR), a Bayesian classifier and a linear regressor are used as the main
algorithms for traffic flow estimation by predicting spatio-temporal traffic features in [38]. The
traffic flow input data and its relations are modelled in graph form using a 3D Markov random field
in the spatio-temporal domain. Based upon the cliques of cones obtained in the spatio-temporal
domain and the overlap between successive cones, multiple SVRs and linear regressors were used to
predict the dependencies on each defined cone [38]. Finally, to predict the traffic flow for future
time stamps, the speed level was found by minimising the energy function [38]. The SVR-based
prediction (84.64%) showed higher accuracy than the linear approach (~76%) when tested on data with
multiple cone zones that not only contain complex geometries but also represent noisy traffic flow
conditions [38].
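Echoing the SVR-versus-linear comparison reported in [38], the sketch below fits both model families
to the same lagged flow features and scores them on held-out data; the features, kernel choice and
synthetic data are assumptions, not the setup of [38].

```python
# Sketch: SVR vs linear regression on identical lagged flow features.
import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

t = np.arange(1500)
flow = 300 + 150 * np.sin(2 * np.pi * t / 96) \
       + np.random.default_rng(4).normal(0, 25, t.size)

lags = 4
X = np.column_stack([flow[i:i - lags] for i in range(lags)])
y = flow[lags:]
split = int(0.8 * len(y))

for name, model in [("SVR (RBF)", make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))),
                    ("Linear", LinearRegression())]:
    model.fit(X[:split], y[:split])
    mae = np.mean(np.abs(model.predict(X[split:]) - y[split:]))
    print(f"{name}: held-out MAE = {mae:.1f}")
```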
In [39], a real-time distributed VANET approach is proposed, not just for detecting road incident-based congestion in an urban environment but also for classifying congestion into recurrent and non-recurrent (NRC) types (road incidents, work zones, special events and adverse weather conditions). The proposed model considers spatial-temporal causality (cause/effect) measures, with training data produced synthetically from a real case study [39]. The algorithms tested, with their prediction accuracies, include: decision tree classifier (DT) (88.63%), Naïve Bayes (87.83%), random forest (89.51%), and a boosting technique (89.17%). Future add-on techniques suggested in [39] include a voting process, a likelihood evaluation, or a model to value the density of information in the data. Also, in the case of real-world data with a connected-vehicles strategy, the Ann Arbor automated vehicle operational test could be performed in a test environment for congestion estimation. Another novel statistical approach is discussed in [18] to detect traffic congestion from vehicle flow/density data. The statistical model developed in [18] uses a piecewise switched linear (PWSL) traffic model to describe the traffic flow dynamics from the data, with the deterministic residuals left over from the PWSL model fed to an exponentially weighted moving average (EWMA) chart to detect traffic congestion. Since EWMA performance degraded in the presence of real noisy data, [18] suggests multiscale filtering before applying the EWMA chart.
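As an illustration of the EWMA chart idea, the following is a minimal sketch assuming a plain univariate series as input (in [18] the chart is fed with the PWSL model residuals); congestion is flagged whenever the smoothed statistic drifts outside its control limits:

```python
import numpy as np

def ewma_congestion_chart(series, lam=0.2, L=3.0):
    """EWMA control chart: smooth the series and return the time indices
    where the statistic drifts outside the control limits."""
    x = np.asarray(series, dtype=float)
    mu, sigma = x.mean(), x.std()
    z, alarms = mu, []
    for t, value in enumerate(x):
        z = lam * value + (1.0 - lam) * z          # EWMA recursion
        # time-varying control limit of the EWMA statistic
        width = L * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * (t + 1))))
        if abs(z - mu) > width:
            alarms.append(t)
    return alarms
```

A small smoothing constant lam makes the chart sensitive to small, persistent shifts (slowly building congestion), while a larger lam reacts faster but triggers more false alarms on noisy data, which is consistent with the multiscale filtering recommendation in [18].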
A detailed study of the literature also shows that a fair number of researchers have contributed to mitigating road traffic congestion. Reinforcement learning has been used to control variable speed limits and relieve congestion at recurrent freeway bottlenecks [40]. A Q-learning (QL) model in offline mode and a variable speed limit (VSL) model in online mode are used in conjunction with one another [40]. The VSL controller agent, which works online, is pre-trained with the optimal speed limits to be observed under different traffic conditions. A modified cell-based traffic model is used to evaluate the prediction-based control of the trained VSL controller at a recurrent freeway bottleneck [40]. According to [40], the proposed QL-VSL control strategy showed much better congestion optimisation (travel time reductions of ~49.34% in stable conditions and ~21.89% in fluctuating traffic conditions) than a simple feedback-based VSL controller on a long-term performance basis. According to [40], future development may include: more sophisticated prediction models, combining further traffic flow estimation techniques with the RL-VSL model for better performance; bottlenecks related to incidents, lane reductions, work zones, events, merge positions of road links, or even multiple bottlenecks together; and other single or multiple reward functions as part of better RL. Furthermore, [40] recommends including advanced deep learning techniques in the VSL and overall model development strategy to better relieve traffic congestion.
Title & Reference: Data Resolution / Time Interval
1. On the capacity of bus transit systems [35]: 1-hour recorded data resolution
2. Traffic origins: A simple visualization technique to support traffic incident analysis [9]: 15-minute recorded data resolution
3. Traffic incident data analysis and performance measures development [28]: 5-minute aggregated data
4. A Hybrid Approach for Automatic Incident Detection [29]: 1-minute data resolution
5. Traffic Flow Prediction with Big Data: A Deep Learning Approach [36]: 5-minute aggregated data
6. Large-scale transportation network congestion evolution prediction using deep learning theory [37]: 2-minute recorded data resolution; 5-, 10-, 30- and 60-minute aggregated data
7. Predicting Spatiotemporal Traffic Flow based on Support Vector Regression and Bayesian Classifier [38]: 30-second and 1-minute recorded data resolution; 1-minute average aggregated data
8. Effective Variables for Urban Traffic Incident Detection [19]: 1-second recorded data resolution
9. Automatic classification of traffic incident's severity using machine learning approaches [31]: 1-day recorded data resolution
10. Fuzzy Deep Learning based Urban Traffic Incident Detection [32]: 100-second aggregated data
11. Learning traffic as images: A deep convolutional neural network for large-scale transportation network speed prediction [39]: 1-minute recorded data resolution; 2-minute aggregated data
12. Distributed Classification of Urban Congestion Using VANET [40]: multi-step (5-15 minutes) multi-variable predictions
21. Bus Dwell Time Prediction Based on KNN [48]: --------
Different researchers have treated traffic conditions in their own ways. Generally, road traffic conditions can be divided into two main categories: normal and abnormal traffic conditions. In [55], two extreme traffic conditions are considered for the proposed deep LSTM model, i.e. peak hours and post-accident conditions. Likewise, heterogeneous traffic conditions are the focus of the proposed artificial neural network (ANN) based model in [15]. Other researchers have also focussed on exploiting condition-based traffic data for their proposed prediction models [44][50].
The literature suggests that passenger wait time at bus stops has generally been grouped into three main categories, based on the time logic estimation or inference. The first category covers waiting time with microscopic simulation-level models, which involve different types of buses stopping at various types of bus stops [57]. The simulation model developed in [58] simulates stop operations and their working capacity, with recorded delays that result in queues at the bus stops; a basic-level simulator has been used to create a virtual environment that simulates the cases under study in [57][35]. In the second category, transit travel studies focus mainly on finding the statistical relationship between the actual waiting times and those recorded by the passengers [59]. Lastly, waiting times at bus stops are inferred by manipulating the vehicles' arrival information [60][61]. The wait times that passengers encounter at bus stops can be deduced using a classical probabilistic approach and a queueing model [62]. The queueing model takes into account the stop arrival headways, the bus or service operating times and the total number of services throughout the day serving that same bus stop. This is one way to estimate the passenger waiting time at bus stops.
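To make the queueing estimate concrete, a standard random-incidence result (stated here under the assumption of Poisson passenger arrivals, and not necessarily the exact formulation used in [62]) gives the expected wait in terms of the headway $H$:

$$E[W] \;=\; \frac{E[H^2]}{2\,E[H]} \;=\; \frac{E[H]}{2}\left(1 + \frac{\operatorname{Var}(H)}{E[H]^2}\right)$$

Thus perfectly regular 10-minute headways imply a 5-minute expected wait, while irregular headways with the same mean strictly increase it.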
The bus occupies the stop berth only for the passenger dwell time and is absent during the headway until the next bus arrives at the berth. It is therefore more practical to calculate the average waiting time by applying a vacation queueing model rather than the classical queueing model [63].
According to [64], a more practical approach to modelling wait times at bus stops is the vacation queueing model, since the classical queueing model is incapable of modelling the periods of absence between adjacent bus headways, which is precisely when the passenger actually waits for the bus to arrive. Vacation queueing theory classifies the wait times of the system in its equilibrium state into two types: wait times at the stop modelled using the classical queueing model without considering the vacant service periods, and, in addition, the wait derived from the length of the vacant service period using the actual vacation model. A stochastic process can be defined as a sequence of events in which the outcome always depends on certain probabilities [62]. A Markov process is a kind of stochastic process in which the outcome at any stage depends only on the previous event, the outcome possibilities are always finite, and the probabilities considered for the problem remain constant until the next event happens. In [65], the passenger wait time is estimated as the duration from the moment the bus boarding starts, when the bus door opens, to the moment the last passenger boards the bus; this last instant is then termed the 'achievement instant'. Considering the achievement instant as the last finite point of the Markov finite chain [62], the Laplace transform of the vacation queueing model helps deduce the wait time in [64]. The arrival of passengers at the bus stops is assumed to follow a Poisson process in [64].
A bus headway probability prediction model using the relevance vector machine (RVM) utilises time-series headway data, travel times and passenger demands at previous stops [66]. With the relevance vector algorithm in [66], the upper and lower bounds of the bus headway probability are predicted with a confidence level of up to 95%, and the algorithm generally outperforms SVM, genetic SVM, Kalman filters, k-NN and ANN in terms of testing accuracy and confidence levels. Passenger and transit rider behaviour is greatly affected by the reliability of bus headway information: it allows transit riders to plan their trips more effectively and, on the other hand, transit operators to maintain a smooth transit flow through intelligent bus scheduling [66]. Predictions based on the accelerated survival model proposed in [67] not only estimate the bus travel times down to the bus stops but also estimate the uncertainty associated with them. When predicting travel time estimates, the simultaneous survival prediction models showed roughly the same root mean squared (RMS) and mean absolute errors (MAEs) as linear regression models, but the survival models do much better at stating the uncertainty associated with each prediction [67]. A least squares support vector machine (LS-SVM) is utilised to highlight bus bunching with headway pattern predictions [68]. The proposed model in [68] captures bus headway irregularity based on transit smart card data, along with past headway, passenger demand and travel time data, to predict travel-time-pattern-based bus bunching at the following stops. The LS-SVM in [68] exhibits 95% accuracy and greater sensitivity and specificity in predicting bus bunching occurrences, and hence travel times, when compared with KNN, ANN, RF and Gaussian process regression (GPR) algorithms.
Based on the literature study in [69], bus travel time predictions are classified into naïve approaches, data-driven approaches and traffic-flow-based approaches. The algorithm proposed in [69] assumes spatiotemporal variations in the travel time data, whereas most studies consider either the spatial or the temporal data alone. In the model proposed in [69], the vehicle conservation equations are rewritten in terms of traffic speeds instead of flow and density, following a differential approach using traffic stream models. The Godunov scheme is then utilised to discretise the vehicle equation, which is fed to a Kalman-filter-based predictor to predict the travel times. The model developed in [69] outperformed the classical average, regression, ANN and simple temporal and spatial methods based on past data.
Socio-economic conditions, weather conditions and trip-specific characteristics, including but not limited to infrastructure facility usage, have a great effect on a transit rider's total travel time. A comparison between passenger-perceived travel time and actual travel time in [70] shows that passengers do perceive the travel time to be greater than the actual time at every stage of the transit journey. Three linear-regression-based models, applied to transit survey data at all stages of a journey to predict perceived walking, waiting and in-vehicle time, confirm the effect that socioeconomic characteristics and trip stages have on travel time perceptions [70]. An interval-based sampling approach is also thought to be a better way to estimate the uncertainties in stop arrival and overall bus travel times, as it covers the ramifications of both late and early bus arrivals. The interval-based travel time estimation model in [34] made use of a similar approach by generating probability distributions for close-to-accurate predictions of arrival times at the respective stops. Probability distributions of travel times over systematically chosen distance intervals (8 km) are found and compared with different distance interval thresholds. Lognormal and normal distributions were found to be the better estimates of travel time behaviour before and after the cut-off horizon interval of 7-8 km, respectively [34]. A similar route-stop-based segmentation approach has also been adopted in [71] to predict journey travel time from scheduled time data, using a combination of a queueing theory model and an ML decision tree algorithm. In the snapshot (queueing theory) model of [71], some segments use learning (hybrid queueing theory and ML) to predict, while others are learning-free. However, learning-free segments based on queueing theory do encounter prediction outliers, which are then effectively identified using a decision tree ML predictor trained on historical data [71].
Travel time predictions are subject to a lot of variability in urban environments, and this variability is a nightmare for transit riders. The variability may arise from the schedule and design of the transit lines themselves, and from different operators running different trip schedules according to their own needs and demands. Buses sharing lanes with other public vehicles also introduce a factor of variability. Recent advances in data-gathering techniques and technology have made it possible to research the root causes of this variability. An experimental justification of the variability is given in [72], in which automated vehicle counter (AVC) and automated vehicle location (AVL) data are put to the test in a basic analytical study. The results in [72] show a strong similarity between urban traffic temporal patterns and travel times, and the possibility of more short-term travel time predictions in an urban environment.
Recent work [73] predicts travel times based on recorded GPS trajectory traces of routes. In [73], the issue of sparsity in the GPS-recorded data is addressed, as some routes may not be travelled even once by GPS-equipped vehicles in the designated time. A tensor-based modelling approach is adopted that models different drivers' travel times on different road segments at different scheduled times. Missing tensor values are filled with a context-aware decomposition approach, using geospatial, historical and temporal data learned from past trajectories and map data [73]. An optimal estimation of the missing tensor trajectory value concatenation is finally used for the considered time slots [73]. Not directly related, but a similar parametric-learning spatial-temporal hidden Markov model (STHMM) is used in [74] to model the dependencies of different traffic time-series GPS tracking data and to infer future travel costs in a transportation network. Data sparsity, heterogeneity and spatial-temporal correlations are the major driving forces for learning the STHMM parameters [74]. In contrast to [74], a spatial-temporal random field (STRF) estimation of future traffic conditions is implemented in [75] for the purpose of dynamic route planning; travel time is the main factor in future predictions and dynamic route planning, although both use non-ML techniques in their models for future traffic estimation. Further, in [75], incomplete data fields are estimated through a Gaussian process regression technique, with spatial regression conditioned on intermediate predictions in discrete probabilistic graphical models for a better explanation of inter-dependencies through historical and real-time data. Real-time traffic speed, flow and travel time measurements, and statistical bundles of information, are estimated in applications implemented in an IBM InfoSphere Streams environment, to be accessible to the masses [76]. Probability distribution functions (PDFs) of arrival times, PDF smoothing and classification techniques have been used to estimate the parameters in both online and offline modes, utilising historical sparse, noisy automated vehicle location (AVL) data [76].
A missing time-series traffic data imputation technique based on gap-sensitive windowed k-nearest neighbours (GSW-KNN) is presented in [77]. The results show a 34% improvement in accuracy over general KNN and benchmarking methods. This method can be used towards our aim of predicting travel time from past series data. Short-term travel time predictions between two points using conventional feed-forward backpropagation ANN algorithms are put to use in [78]. The data used consisted of two points, as start and stop points, that detected vehicle movement through them, with the times recorded. Not directly related in terms of aims, but the mechanism presented in [79], denoising stacked autoencoders working on temporal and spatial factors for traffic data imputation, can be used for travel time prediction. The proposed model [79], as reported, shows better performance compared with ARIMA and BP NN models. A geographically weighted regression (GWR) model is proposed in [80] and compared with traditional ordinary least squares (OLS) multiple regression models for the application of forecasting at the train station; a similar approach can be used to forecast the passenger wait time and the total bus travel time. A Bayesian network and neural network based double-star modular framework has been formulated, considering the spatial and temporal relations, to predict traffic network speeds, and compared with the seasonal autoregressive moving average (SARMA) model [22]. Different time series models provide a priori prediction estimates to the Bayesian network [22].
Models and their limitations:
• Autoregressive Integrated Moving Average (ARIMA) Model: can be used for more than one time-interval prediction, but prediction performance degrades as the prediction horizon increases.
• Seasonal Autoregressive Integrated Moving Average (SARIMA): like ARIMA but incorporates the seasonality of the time series. Works best for non-stationary seasonal series, as a stationary seasonal component is difficult to tackle.
• Kalman Filter model: cannot predict well enough, owing to the stochastic and non-linear nature of traffic data; only a simple bias update of the net value is incorporated, which lacks behavioural modelling.
• Auto Regressive Moving Average (ARMA) Model: better for short-term forecasts, as with ARIMA, and better suited to stationary time series, so the integrated part has little effect on the final forecasts. Performance can be enhanced by integrating a seasonal component.
• Exponential Smoothing, Simple Smoothing, Complex Time Series Analysis and Filtering methods: biased towards the most recent observations; cannot handle real-time trends well. Poor performance for unbalanced classes or seasonal components.
• Weighted average method, Weighted moving average (WMA), Geographical Weighted Regression (GWR): importance is given to the observations with heavy weights, and deciding custom weights is complex for traffic flow data and for trend filtering.
• Mean speed based on two-dimensional linear interpolation, Spatial-Temporal Hidden Markov models (STHMM): macroscopic approach; inference for the estimated variable is more accurate with higher-resolution data observations.
• Piecewise Method, Bivariate Movelets: a pattern library must be constructed from the data for this technique to work effectively, which is difficult in a multi-trend traffic environment.
• Gaussian Model for Flow Data Imputation: additional parameters inferred from the data are required (e.g. mean and covariance among data), which produces bad estimates for data that is already incomplete; prior data knowledge is difficult to estimate.
• Tensor based multi-dimensional modelling; Linear, Polynomial, Power and Exponential curve data fitting: the modelling approach needs an objective function that considers the trade-offs between the variables concerned.
Table 2. 3 Summary of Parametric Models.
The problem with most of these parametric approaches is that they can effectively be employed for only one time-interval prediction and cannot predict well enough, given the stochastic and nonlinear nature of traffic data. They better suit short-term forecasts only, which are heavily biased towards the most recent observations in the data; this makes the parametric approaches incapable of handling real-world trends.
2.2.2 Non-Parametric and Data-Driven Machine Learning Methods:
Models without a fixed structure and without a pre-defined number of fixed parameters fall into this category, e.g. deep learning long short-term memory (LSTM), gated recurrent units (GRU), neural networks (NN) and recurrent neural networks (RNN). Non-parametric approaches mostly constitute data-driven models as well. They utilise empirical underlying algorithms to provide the predictions, and they make minimal assumptions about the data formation and uncertainty as they estimate the model parameters; a classic example of this approach is the neural network.
Machine learning (ML) based traffic parameter prediction algorithms [48] have been in use for some years. These data-driven approaches are also termed non-parametric approaches, as they have no fixed structure and no pre-defined number of fixed parameters. The most commonly tested non-parametric approaches for spatiotemporal traffic forecasting include the k-nearest neighbours (KNN) [49][50] and support vector regression (SVR) [47]. However, these shallow ML algorithms mostly work in a supervised manner, which makes their performance dependent on the dataset's manual feature selection criteria.
Models and their limitations:
• K-Nearest Neighbour (KNN) Models, K-means and Hierarchical clustering, Linear Regression, Random Forest (RF): exhibit better performance when data correlation is low, and algorithm performance diminishes significantly in high-dimensional data classification; traffic series data, however, is highly correlated.
• Support Vector Regression (SVR); Back-Propagation Neural Network (BPNN), Multi-Layer Feed Forward Neural Networks (ANN) and variants of NN: outperform conventional linear parametric models but struggle, during the data learning phase for time series, to find the absolute global minimum, since there may be multiple minima for trendy, non-stationary traffic series.
• Recurrent Neural Network (RNN) and its variants (Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU)): designed to cope with time series prediction problems; give the option to learn across varying time steps at once. Due to the recurrent feedback, the relative prediction performance is better than that of simple NNs.
• Deep Learning Models, Convolutional Neural Network (CNN) with multiple layers: raw deep learning methods are devised mostly for image learning purposes and need some modification for time-stamped data learning, as dependency can be a big issue due to high data correlations and scattered trends; in image classification this is less obvious (the same group of pixels tends to appear together, which makes the classification job easier). A similar concept is employed to exploit the spatial traffic series data if the input to the CNN is structured properly.
• Bayesian Networks, Naïve Bayes, and Self-organizing maps: naïve Bayes can mine spatiotemporal performance trends at the network level and for individual links. Bayesian networks, along with naïve Bayes, reduce computational complexity by treating correlated features as mutually independent events when calculating their probability distributions. Further, at the link level they can be employed for conditional-probability-based flow predictions.
• Principal Component Analysis (PCA): PCA is in the first place a dimensionality reduction technique, primarily based on finding the most relevant eigen-matrices of the data variables. PCA-based models may retain parameters that have little or no say in the final prediction, so PCA is not favourable for traffic predictions and is mostly used for feature dimensionality reduction.
Table 2. 4 Summary of Non-Parametric Models.
With the advancement in ML algorithms, a somewhat more sophisticated dense supervised learning approach has been applied to traffic prediction, using backpropagation in artificial neural networks (ANN) [25][51]. Although an ANN outperforms conventional linear parametric models, it struggles with simple time series data learning and with finding the global minimum. Recently, deep recurrent neural networks (RNN) have shown great promise for dynamic sequential modelling, especially in the field of speech recognition [81][82]. Simple RNNs, however, suffer from vanishing and exploding gradients when trained on very long sequences, which results in information loss and reduced performance [83]. Fu R et al. [42] have used the RNN variants called long short-term memory (LSTM) [84] and gated recurrent units (GRU) for traffic forecasting because of their ability, using the output and forget gates, to retain and pass on the information that is necessary and forget what is redundant.
Models and their limitations:
• Extremely randomised Trees (ET), Spatial-Temporal Random Field (STRF), Spatial-Temporal Hidden Markov Model (STHMM) for parameter learning: the assumed additive (and uncorrelated) structure of the segmented model is less accurate for large amounts of correlated data, where previous data closely resembles the current data (for example, evening and seasonal, easily predictable trends).
• Least Square Support Vector Machine (LS-SVM), Temporal Window based Support Vector Machine (v-SVR), Relevance Vector Machine Regression (RVM): LS-SVM fails to predict extreme high future flow rates, the possible reason being sudden increases in traffic congestion and unanticipated trends. RVM sometimes needs the upper and lower boundary values of the variable to be predicted estimated beforehand.
• Fuzzy Logic Controlled Deep Neural Network (FCDN): FCDN uses fuzzy logic in its backpropagation weight training, and the fuzzy rules need to be defined beforehand, which makes it semi-unsupervised learning and difficult to implement for traffic problems.
• Denoising Stacked Autoencoders, Stacked Autoencoder (SAE): overfit the input data and lack the inherent generalisation ability needed for time-series data. The learning of the SAE can be enhanced by adding recurrent layers.
• AdaBoost: deals with an ensemble built iteratively by reweighting the learning samples based on how well their target variable is predicted by the current ensemble; the worse the prediction, the higher the weight becomes. The overall performance accuracy is therefore dependent on the base prediction algorithm itself. The working principle of AdaBoost resembles that of reinforcement learning (RL).
• Gap-Sensitive Windowed KNN (GSW-KNN), KNN-PCA, KNN-RFE, Random Forests-PCA, Random Forests-RFE: combine the power of two ML algorithms, and their shortcomings as well. GSW-KNN seems a promising approach for missing data imputation in time-series traffic parametric data.
• Convolutional Neural Network (CNN)-Recurrent Neural Network (RNN) (DeepTransport), Convolutional LSTM NN (ConvLSTM NN): deep learning models that combine the spatial feature-exploiting power of CNNs with the temporal power of LSTMs. Generally good for dealing with structurally missing data. Can learn adjacent roads' spatiotemporal features based on their impact on a road section and its local network; critical road sections can be ranked according to distance and thus distinguished by their order. Good for learning road topology.
Table 2. 5 Summary of Hybrid Models.
where $\eta_i$ and $\tau_i$ respectively are the mean travel time and the standard deviation of trip $i$. Equation 2.1 serves as the bare-minimum formulation of the concept of finding traffic flow predictions at a network level.
Summary
This chapter explored the series data prediction techniques for traffic variables found in the literature, reviewing the state of the art in traffic prediction. The focus was on traffic flow modelling techniques using machine learning, but the latest statistical flow forecasting methods were also explored. Since traffic prediction models and techniques are adaptable, some areas of closely related fields were also covered: passenger wait time forecasting at bus stops, bus stop headway times and congestion prediction. Closely related models were grouped into three categories based on how they are constructed and how they treat the input traffic time series data, and their usability and performance limitations were compared. In the next chapter, the techniques extracted from the literature are compared for their advantages and disadvantages in terms of their adaptability to the proposed methodology.
3. Models and Architectures
In this chapter the specific chosen models are discussed in detail; these models are then implemented in the experiments. The reasons for choosing these specific models are explained in section 3.1, whereas section 3.2 lists the complete implementation details, with the frameworks used, along with each individual model's pipeline for searching for the best performing model parameters.
For the univariate training data $X_T = (x_1, x_2, \dots, x_t)$, with window size $k > 0$, the $k$-th moving average or historical average (HA) forecast at time $t$ is given as:

$$\mathrm{HA}_t = \frac{1}{k} \sum_{i=t-k+1}^{t} x_i \qquad (3.1)$$
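As a minimal sketch of equation 3.1 (the series below is an illustrative placeholder, not thesis data), the HA forecast amounts to averaging the last k observations:

```python
import numpy as np

def ha_forecast(x, k):
    """Historical average: the mean of the last k observations (eq. 3.1)."""
    x = np.asarray(x, dtype=float)
    return x[-k:].mean()

flows = [310, 295, 330, 350, 342, 361]   # e.g. 15-minute link flow counts
print(ha_forecast(flows, k=4))           # forecast for the next time slice
```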
ARIMA expects the input time series to be stationary, so the series is subjected to a stationarity check using the Augmented Dickey-Fuller (ADF) test [86, p. 9], whose null hypothesis is that the series is non-stationary (contains a unit root); this test is performed in the experiments section. SARIMA (p, d, q, P, D, Q, m), as the name suggests, deals with seasonal trendy data and contains four more parameters: $P$, $D$, $Q$ and $m$. $P$ is the order of the seasonal AR component, $D$ is the order of seasonal integration, $Q$ is the order of the seasonal MA component, and $m$ represents the cyclic seasonal period, i.e. the number of time steps considered for the seasonal lag. According to [87, p. 208], the SARIMA model applied to a time series $y_t$ is given by the following expression:

$$\mathrm{SARIMA}(p, d, q) \times (P, D, Q)_m: \quad \Phi(L^m)\,\phi(L)\,\Delta^d \Delta_m^D\, y_t = \theta_0 + \Theta(L^m)\,\theta(L)\,\epsilon_t \qquad (3.2)$$

where $m$ is the seasonal length, $L$ is the lag operator and $\epsilon_t$ is a Gaussian white noise process with zero mean and constant variance. $\Delta^d$ and $\Delta_m^D$ are the difference and seasonal difference operators, with $d$ and $D$ representing their orders respectively. The differencing operations help transform a non-stationary time series into a stationary one. The AR part of this model is obtained by multiplying the autoregressive lag polynomials $\phi(L)$ and $\Phi(L^m)$, and the MA part is represented by the moving average lag polynomials $\theta(L)$ and $\Theta(L^m)$.
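As a hedged illustration of how such a model might be fitted in practice (the synthetic series, the model orders and the seasonal period m = 96 for 15-minute data are illustrative assumptions, not the thesis's calibrated values), statsmodels provides both the ADF test and a SARIMA implementation:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic stand-in for a 15-minute link flow series (96 slices per day).
idx = pd.date_range("2019-01-01", periods=96 * 14, freq="15min")
daily = 100 + 40 * np.sin(2 * np.pi * np.arange(len(idx)) / 96)
flows = pd.Series(daily + np.random.normal(0, 5, len(idx)), index=idx)

# ADF test: the null hypothesis is a unit root (non-stationarity);
# a small p-value rejects non-stationarity.
adf_stat, p_value = adfuller(flows)[:2]
print(f"ADF statistic = {adf_stat:.2f}, p-value = {p_value:.3f}")

# SARIMA(p,d,q)(P,D,Q)_m with a daily season of m = 96 time steps.
model = SARIMAX(flows, order=(1, 0, 1), seasonal_order=(0, 1, 1, 96))
result = model.fit(disp=False)
print(result.forecast(steps=4))          # the next hour of 15-minute flows
```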
3.1.4 Support Vector Regressor (SVR)
The Support Vector Regressor (SVR) is chosen to deal with nonlinear data prediction, given its capability to fit regression functions to a set of data points. The SVR model is non-parametric and can be applied with ease, without any prior data knowledge. Support Vector Machine (SVM) theory, the predecessor of SVR, with its methods for estimating the indicators of a real-valued function, was first discussed in [88, p. 443]. SVR is a classical statistical-learning-theory-based model that implements the structural risk minimisation principle from computational learning theory. It works like pattern recognition: the basic aim is to map the input data vector $x$ into a high-dimensional feature space $F$ using a non-linear mapping function $\Phi$. Linear regression is then carried out in the higher space $F$, which corresponds to non-linear regression in the low-dimensional input space. The corresponding regression function is given as:

$$f(x) = w \cdot \Phi(x) + b \qquad (3.3)$$

$w$ represents the weight vector in the feature space, $\Phi(x)$ is the mapping function applied to the input $x$, and $b$ is the threshold. The mapping function is usually realised through a kernel function; four different kernels are considered: linear, polynomial of third degree, sigmoid and radial basis function (RBF). Replacing the dot product with a kernel function makes the mapping into the higher-dimensional feature space easy. RBF is usually the most popular choice for nonlinear mapping, chosen here because of its robustness in short-state predictions [66], and is defined as:

$$k(x_i, x_j) = \exp\!\big(-\gamma \, \lVert x_i - x_j \rVert^2\big) \qquad (3.4)$$
$\gamma$ in equation 3.4 represents the Gaussian bandwidth. The aim is to find the optimal weight $w$ and bias $b$. To treat the regression problem, the flatness of the weights and the error generated through the empirical risk estimation process are considered [88, p. 473]. The value of $w$ is optimised by minimising the sum of the empirical risk $R_{emp}(f)$ and the complexity term $|w|^2$. The regularised risk functional is given as:

$$R_{reg}(f) = R_{emp}(f) + \frac{\lambda}{2}\,|w|^2 = C \sum_{i=1}^{N} \Gamma\big(f(x_i) - y_i\big) + \frac{\lambda}{2}\,|w|^2 \qquad (3.5)$$

where $C$ is a curve-fitting constant usually defined beforehand, $N$ is the sample size and $\Gamma$ is the loss function, with $\lambda$ being the regularisation constant. Different loss functions are considered which, when substituted into equation 3.5, reduce it to a quadratic optimisation problem defined as:

$$\text{minimise} \quad \frac{1}{2}\sum_{i,j=1}^{N}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\,k(x_i, x_j) - \sum_{i=1}^{N}\big(\alpha_i(y_i - \varepsilon) - \alpha_i^{*}(y_i + \varepsilon)\big) \qquad (3.6)$$

$\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers, found subject to the constraints of equation 3.7. Equation 3.7 expresses the weight term in terms of the data which, when substituted back into the original equation 3.3, gives the regression function the form:

$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*})\,k(x_i, x) + b \qquad (3.8)$$
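A minimal scikit-learn sketch of the above follows (the sliding-window setup, the synthetic series and the hyper-parameter values are illustrative assumptions, not the tuned values used in the experiments):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def make_windows(x, k):
    """Build (previous k values -> next value) supervised pairs."""
    X = np.array([x[i:i + k] for i in range(len(x) - k)])
    y = np.array(x[k:])
    return X, y

# Synthetic stand-in for a univariate flow series.
flow = np.sin(np.linspace(0, 20, 500)) + np.random.normal(0, 0.1, 500)
X, y = make_windows(flow, k=8)

# RBF kernel: C is the curve-fitting constant, gamma the Gaussian
# bandwidth of eq. 3.4, and epsilon the width of the insensitive tube.
svr = make_pipeline(StandardScaler(),
                    SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=0.05))
svr.fit(X[:-50], y[:-50])
print(svr.score(X[-50:], y[-50:]))       # R^2 on held-out windows
```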
3.1.5 Feed Forward Backpropagation Neural Network (FFBNN)
The Neural Network (NN) is another popular non-parametric model, which lies at the basis of Artificial Intelligence science. The basic idea of the NN is to mimic the human brain and its decision-making power. The Feed Forward Backpropagation Neural Network (FFBNN) is one form of NN, the other being the recursive NNs. An FFBNN can harvest useful nonlinear mappings and insights from the input data features and approximate them close to their real values. Neurons are the basic building blocks of the FFBNN: each neuron receives an input signal and processes it through an activation function to generate an output according to the weight value associated with the neuron. The neuron weights are then calibrated in the feedback training process using the weight error difference from the network output, hence the name backpropagation. These neurons are arranged in networks of layers, making a feed-forward network. The simplest of neurons is the perceptron. For a set of inputs $(x_1, x_2, \dots, x_n)$, the network output $y$ for the weighted sum of perceptron inputs $w_{ij}x_j$, with a threshold bias value $\vartheta_i$, subject to an activation function $\Phi(\cdot)$, is given as:

$$y_i = \Phi\Big(\sum_{j=1}^{N} w_{ij}x_j - \vartheta_i\Big) \qquad (3.9)$$

A detailed discussion of an MLP and its graphical illustration is also given in section 1.1. Groups of different perceptrons form the different layers of the network; the layers inner to the visible input and output layers are termed hidden-layer perceptrons. A network with multiple perceptrons is often called a multi-layer perceptron (MLP). At each hidden layer of the network the data features are computed. There are many different types of activation functions; the set of activation functions considered in this thesis is given in table 3.1. FFBNN training involves two phases: forward and backward passes.
Table 3.1 Activation functions considered:
1. sigmoid: $\Phi(x) = \dfrac{1}{1 + e^{-x}}$; $\Phi(x) \in [0, 1]$
2. softmax: $\Phi(x)_j = \dfrac{e^{x_j}}{\sum_{k=1}^{K} e^{x_k}}$, $j = 1, 2, \dots, K$; $\Phi(x)_j \in [0, 1]$
3. tanh: $\Phi(x) = \dfrac{1 - e^{-2x}}{1 + e^{-2x}}$; $\Phi(x) \in [-1, +1]$
5. linear: $\Phi(x) = x$
Nomenclature: softmax represents the normalised exponential function, used for multiclass logistic flow values in our case; it maps a K-dimensional vector x to values in the range [0, 1] that all sum to 1.
1) Forward Pass
This is the first step in the feed-forward network, where the training data is passed through the network and the error estimate $\Delta f$ is calculated based on the loss (cost) function and the final network output.
2) Backward Pass
Given the error estimate calculated in the forward pass, the weights of the network are altered in an iterative manner in the backward pass. Different network weight update (optimisation) techniques have been used; table 3.2 gives the optimisation techniques considered in the scope of this thesis. In table 3.2, SGD is the most commonly used technique, which involves calculating the gradient, which is then adjusted back into the weights matrix. Adaptive SGD, or Adagrad, on the other hand inherits gradient descent with adaptive qualities, whereas RMSprop takes a root mean square of the recent squared gradients and adjusts the weights with it.
Nomenclature: $w_i = (\hat{y}_i - y_i)^2$, $\eta$ is the learning rate, $\alpha$ is the learning momentum factor, $g_t$ is the iteration gradient, and $G_t = \sum_{i=1}^{N} g_{t,i}^2$ is the diagonal accumulator.
FFBNN Convergence
After several successful forward and backward passes, the FFBNN starts to converge towards a local minimum of the error curve. The best performing optimiser for a single-variable problem may differ from the best performing optimiser for a multi-variable problem, as each is more or less suited to making the network converge when learning the data. For this reason, each optimiser uses a definable learning rate $\eta$, which can make a difference while the optimiser's goal is to converge. A large $\eta$ can make the network step over the local error minimum, away from the direction of convergence, whereas a very small $\eta$ can make the network take much longer to converge. The training batch size can also make a difference: different optimisers performed differently with different batch sizes. Epochs are the number of iterations carried out to go through each data sample feature once.
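A minimal Keras sketch of an FFBNN trained with backpropagation follows (the layer sizes, activations and synthetic data are illustrative assumptions; the actual tuned parameters are given in appendix A):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

k = 8                                    # lagged flow values per sample
X_train = np.random.rand(256, k)         # placeholder training windows
y_train = np.random.rand(256)            # next-step flow targets

model = Sequential([
    Dense(32, activation="sigmoid", input_shape=(k,)),  # hidden layer
    Dense(16, activation="tanh"),                       # hidden layer
    Dense(1, activation="linear"),                      # flow output
])

# learning_rate and momentum are the eta and alpha discussed above;
# swapping SGD for Adagrad or RMSprop changes only this line.
model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss="mse")
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
```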
The energy function of an RBM, with visible units $v_i$, hidden units $h_j$, weights $w_{ij}$ and biases $b_i$ and $a_j$ collected in the parameter set $\theta$, is given as:

$$E(v, h \mid \theta) = -\sum_i b_i v_i - \sum_j a_j h_j - \sum_i \sum_j w_{ij} v_i h_j \qquad (3.10)$$

From this, the joint probability distribution between the hidden and visible layers is given as:

$$p(v, h \mid \theta) = \frac{\exp\big(-E(v, h \mid \theta)\big)}{\sum_{v,h} \exp\big(-E(v, h \mid \theta)\big)} \qquad (3.11)$$
Figure 3. 1 RBM Structure (left) and DBN Model (right).
To obtain the optimal parameter set $\theta$ for a given data vector $v$, the derivative approach is used. For this, the gradient of the log-likelihood is calculated as below:

$$\frac{\partial \log p(v \mid \theta)}{\partial w_{ij}} = \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model}$$

$$\frac{\partial \log p(v \mid \theta)}{\partial a_j} = \langle h_j \rangle_{data} - \langle h_j \rangle_{model} \qquad (3.13)$$

$$\frac{\partial \log p(v \mid \theta)}{\partial b_i} = \langle v_i \rangle_{data} - \langle v_i \rangle_{model}$$

where $\langle \cdot \rangle$ denotes the expectation under the corresponding distribution. There are no connections between the units within an RBM layer, so the layer distributions are easily estimated through the conditional probability distributions, given as:

$$p(h_j \mid v, \theta) = \frac{1}{1 + \exp\big(-\sum_i w_{ij} v_i - a_j\big)} \qquad (3.14)$$

$$p(v_i \mid h, \theta) = \frac{1}{1 + \exp\big(-\sum_j w_{ij} h_j - b_i\big)}$$
The weights of the RBM components are fine-tuned by contrastive divergence (CD) [89, p. 3], by default through a fast and greedy unsupervised method; with one additional layer of neurons at the end, the overall model weights are trained with backpropagation using a supervised learning approach. An activation function is also used in the last output layer.
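As a rough illustration only (scikit-learn's BernoulliRBM is trained with persistent contrastive divergence, and this pipeline performs just the greedy layer-wise pre-training plus a supervised output fit, without the full backpropagation fine-tuning described above; the data and layer sizes are synthetic assumptions):

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
X = rng.rand(500, 16)            # 16 lagged flow values, scaled to [0, 1]
y = X[:, -4:].mean(axis=1)       # toy target: mean of the recent lags

# Two stacked RBMs learn features greedily; a supervised output layer
# is then fitted on top, mimicking the DBN training scheme.
dbn = Pipeline([
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("out", LinearRegression()),
])
dbn.fit(X, y)
print(dbn.score(X, y))           # R^2 of the supervised head
```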
Figure 3. 2 Steps in Convolution Operation (left) and CNN (C-FBNN) Model (right).
A convolution of two functions expresses the effect of filtering one of them by the other. The general convolution operator applied to two discrete functions $f$ and $g$ can be given as a summation of the following form:

$$(f * g)(n) = \sum_{m} f(m)\, g(n - m) \qquad (3.15)$$

The input to a CNN is an image representing one state, whose pixel values represent the scaled input values. The convolutional layers act as filters which, when built into the model, emphasise certain characteristics of the input feature vectors; a layer behaves like an automatic custom detector of feature patterns, creating a feature map. For an input feature matrix, the commonly applied filter functions are two to five elements per dimension and are declared zero on the remaining elements when strided onto the input feature matrices. The resulting small matrices, which represent the groups of filter functions, are called kernels. At a given position of the convolutional kernel, each kernel cell value is multiplied element-wise with the corresponding feature value that overlaps the kernel cell, and the sum of these products is taken. For a kernel $w$ of width and height $m$ slid over the input $x$, the convolutional outputs $h$ are given as:

$$h_{i,j} = \sum_{k=1}^{m} \sum_{l=1}^{m} w_{k,l}\; x_{i+k-1,\, j+l-1} \qquad (3.16)$$
Pooling layers after the convolution operation make the CNN output translation-invariant. Two pooling mechanisms are commonly used: average pooling and max pooling. Average pooling, with batch normalisation, is used in the CNN model in this thesis. The max pooling operation outputs the maximum of the values produced by the kernel operations that fall into the kernel range, compensating for the number of strides used to slide the kernel. This is mathematically given as below:

$$p_{i,j} = \max_{1 \le k,\, l \le m} h_{i+k-1,\, j+l-1} \qquad (3.17)$$

The CNN model's final output $y_{i,l}$ is given by equation 3.18, where $f_a$ is the output layer activation function and $b_{i,j}$ is the bias term. As shown in figure 3.2, the feature map generated by the convolutional layers is then passed through a simple FFBNN that is activated for certain pattern or feature values actually present in the input. During training, the parameters of each convolutional kernel layer are optimised so that the useful features are reflected to the FFBNN. The FFBNN is trained using the BP algorithm with a suitable optimiser, as already discussed in section 3.1.5.
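A minimal Keras sketch of this convolution-pooling-FFBNN chain follows (a 1-D convolution over lagged flow values; the shapes, layer sizes and synthetic data are illustrative assumptions, not the tuned model):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, MaxPooling1D,
                                     BatchNormalization, Flatten, Dense)

k = 16                                   # lagged flow values per sample
X = np.random.rand(512, k, 1)            # (samples, time steps, channels)
y = np.random.rand(512)

# Convolution extracts local patterns (eq. 3.16), pooling adds the
# translation invariance (eq. 3.17), and the flattened feature map
# feeds an FFBNN head, as sketched in figure 3.2.
model = Sequential([
    Conv1D(16, kernel_size=3, activation="relu", input_shape=(k, 1)),
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(16, activation="relu"),
    Dense(1, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
```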
$$c_t = f_t \cdot c_{t-1} + i_t \cdot \tanh\big(x_t W_{xc} + h_{t-1} W_{hc} + b_c\big) \qquad (3.22)$$
Figure 3. 3 LSTM Memory Unit Structure (left) and Stacked LSTM Model (right).
$x_t$ is the feature input to the memory unit, whereas $i_t$, $f_t$, $o_t$, $c_t$, $h_t$ represent the output of the input gate, the output of the forget gate, the output of the output gate, the final cell state output and the final memory unit output, respectively. $W_{xi}$, $W_{xf}$, $W_{xo}$ represent the weights between the input layer and the input gate, the forget gate and the output gate, respectively. Similarly, $W_{hi}$, $W_{hf}$, $W_{ho}$ are the weights assigned between the recurrent hidden layer and the input gate, the forget gate and the output gate, respectively. Likewise, as the subscripts suggest, $W_{ci}$, $W_{cf}$, $W_{co}$ are the weights associated with the cell state and the input gate, the forget gate and the output gate, respectively. All the variables represented by $b$ are the biases associated with each of the gates, as given in equations (3.19-3.22), and $\sigma$ represents the sigmoid activation function used. The hidden recurrent unit output $h_t$ is passed from the previous LSTM memory unit to the next LSTM unit, and from the final LSTM memory unit of one layer to the memory unit of the next layer as an input. The LSTM model layers are trained using backpropagation for different optimisers and layer parameters, and the best performing parameters are chosen for the final model forecasts. The model structure used for calibrating the parameters is shown in figure 3.3. As in the CNN model, one max pooling layer is inducted, and every LSTM layer is succeeded by a batch normalisation layer that normalises the batch vectors during each training iteration. Equation 3.23 represents the overall model output when iteratively calculated by following equations (3.19-3.23).
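A minimal Keras sketch of a stacked LSTM with a batch normalisation layer after each recurrent layer follows (shapes and synthetic data are illustrative assumptions; the max pooling layer and the tuned parameters of the actual model are omitted for brevity):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, BatchNormalization, Dense

k = 12                                   # time steps per training window
X = np.random.rand(512, k, 1)            # (samples, time steps, features)
y = np.random.rand(512)

# return_sequences=True passes h_t of every step to the next LSTM layer,
# so the layers stack exactly as in figure 3.3.
model = Sequential([
    LSTM(32, return_sequences=True, input_shape=(k, 1)),
    BatchNormalization(),
    LSTM(16),
    BatchNormalization(),
    Dense(1, activation="linear"),
])
model.compile(optimizer="rmsprop", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
```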
The B-LSTM-ANN model harnesses the combined power of the LSTM for time-dependent recurrent sequence learning and the usual feed-forward network (ANN) for state-space learning. B-LSTM-ANN is a much better performing model than the LSTM model alone for time series predictions [52]. The B-LSTM-ANN model can learn using the optimal time lags when combined with the recurrent memory for pattern determination. LSTMs are good at recurrent-value adaptive learning, using gating to control information flow; the ANN, on the other hand, helps memorise the overall pattern attributes of a series when it is appended after the LSTM layers. Another reason to test the B-LSTM-ANN model is that simple RNNs and typical LSTMs do suffer somewhat from gradient explosion when trained on long data series, which makes the model a little unstable and unreliable, but the combination of LSTM and ANN makes it a more reliable model by keeping the network error constant. Also, there are very few instances where B-LSTM-NN has been applied to transportation problems. Figure 3.4 shows the stacked LSTM layers with the subsequent ANN layers attached, forming the B-LSTM-ANN model.
The LSTM's purpose can be defined as the estimation of the conditional probability $p(y_1, y_2, \dots, y_{T'} \mid x_1, x_2, \dots, x_T)$, given that $(x_1, x_2, \dots, x_T)$ is an input sequence and $(y_1, y_2, \dots, y_{T'})$ is the corresponding output sequence; the lengths $T'$ and $T$ may differ. The deep LSTM computes the conditional probability by first computing the fixed-dimensional representation $v$ of the input sequence from the last hidden memory state of the LSTM layer [81, p. 3]. The hidden state $h_t$ for each individual LSTM unit is calculated as given by equation 3.22. Accordingly, the standard LSTM network for the $i$-th node, with internal hidden states $v$ of the corresponding inputs $(x_1, x_2, \dots, x_T)$, is given by equation 3.24:

$$y_j = p(y_1, y_2, \dots, y_{T'} \mid x_1, x_2, \dots, x_T) = \sum_{t=1}^{T'} P\big(y_t \mid v, (y_1, y_2, \dots, y_{t-1})\big) \qquad (3.24)$$

With a set of inputs $(y_1, y_2, \dots, y_{T'})$ to the dense ANN from the stacked LSTM output, the final network output $y_0$ for the weighted sum of ANN inputs $w_{ij} y_j$, with the corresponding threshold bias value $\vartheta_i$, subject to an activation function $\Phi(\cdot)$, is given as:

$$y_{0i} = \Phi\Big(\sum_{j=1}^{N} w_{ij}\, y_j - \vartheta_i\Big) \qquad (3.25)$$
B-LSTM-ANN training is done in a truncated backpropagation through time (BPTT) manner. This is the same as normal backpropagation except that it involves gradient descent optimisation across time intervals, specifically for recurrent networks. Error signals would otherwise tend to decay forever; they are truncated when they pass through the output gate of a memory cell, which makes the error decay follow an exponential process outside the memory cell. For this reason, the B-LSTM-ANN has the ability to learn arbitrarily long dependencies [52, p. 191].
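A minimal Keras sketch of the B-LSTM-ANN idea, stacked LSTM layers followed by dense ANN layers, is given below (truncated BPTT is handled internally by the framework; the sizes and synthetic data are illustrative assumptions):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

k = 12
X = np.random.rand(512, k, 1)            # (samples, time steps, features)
y = np.random.rand(512)

# Stacked LSTM layers learn the temporal dependencies; the dense ANN
# layers on top learn the overall pattern attributes (eq. 3.25).
model = Sequential([
    LSTM(32, return_sequences=True, input_shape=(k, 1)),
    LSTM(16),                      # last recurrent layer returns h_T only
    Dense(16, activation="relu"),  # ANN part appended after the LSTMs
    Dense(1, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
```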
For the DCNN-LSTM model, if the input to the DCNN is $A_T = \{x_{k,l}\}$, where $k$ and $l$ represent the input data dimensions, $k$ being the number of samples and $l$ the number of features respectively, then the output $X_T = \{x_t\}$ at time $t$ is given as in equations (3.16-3.18), for all $1 \le k \le m$ and $1 \le l \le m$. $\Phi$ represents a nonlinear activation function. After convolution, max pooling is employed for more prominent feature selection and to reduce the number of learning parameters for the densely connected NN layers. The output of the DCNN is the input of the LSTMs, and the output of the LSTM is the final output of the model. For the temporal features, the cell states and the gate outputs computed from the DCNN's output are calculated by following the sequential equations (3.19-3.23). The final predicted outcome of the model is given as:

$$Y^{T+1} = \Phi\big[\, p\big(Y^{T'} \mid X^{T}\big) \big] = \Phi\Big(\sum_{t=1}^{T'} P\big(x_t \mid v, (x_1, x_2, \dots, x_{t-1})\big)\Big) \qquad (3.28)$$

where $v$ represents the fixed representations calculated using the hidden LSTM units and $\Phi$ represents the output activation function. Like the B-LSTM-ANN training, the DCNN-LSTM model is trained end-to-end using the BP training mechanism. The feature vectors after the DCNN are reshaped to make the input compatible with the first LSTM layer: by default, an LSTM unit requires time steps (intervals) as one of its input dimensions, so inducting a single time step when reshaping the feature vector was mandatory.
Figure 3. 5 Deep Convolutional Neural Network- Long Short-Term Memory (DCNN-LSTM) Model.
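A minimal Keras sketch of the DCNN-LSTM chain follows, including the mandatory reshape that inducts a single time-step dimension before the LSTM (the sizes and synthetic data are illustrative assumptions, not the tuned model of figure 3.5):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, MaxPooling1D, Flatten,
                                     Reshape, LSTM, Dense)

k = 16
X = np.random.rand(512, k, 1)            # (samples, time steps, channels)
y = np.random.rand(512)

# Convolution + max pooling extract the spatial features; the flattened
# feature map is then reshaped to (time steps=1, features) so that it
# satisfies the LSTM input contract, as discussed above.
model = Sequential([
    Conv1D(16, kernel_size=3, activation="relu", input_shape=(k, 1)),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Reshape((1, -1)),             # induct a single time-step dimension
    LSTM(16),
    Dense(1, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
```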
Hardware and Software Implementation Details
All development and experiments are carried out using the popular programming language Python 3.7. Anaconda, an open-source distribution compiling data science and machine learning libraries, dependencies and binaries, was used. Being a popular dynamically interpreted language, Python is fast to develop in and suitable for real-time processing applications. The popularity of Python has risen because existing libraries for scientific computing and heavy processing written in C are easily integrable with Python; among the other factors is the vast and growing community of Python contributors.
TensorFlow, the low-level Python API used by Keras, was developed by Google in 2015. TensorFlow handles model computation in the form of graphs, which makes it possible to develop new architectures from the basic unit constructs. TensorFlow's features include easy model development and graph computations that can be carried out on both CPUs and GPUs, which makes using the Keras API on top of it even easier.
To pre-process the data before training and to prepare the training and validation datasets, the scikit-learn Python library was used. For end-to-end model training with the best parameters, model flow pipelines were written, which made it easy to find the best performing parameters for each individual model through the grid search function in the scikit-learn library. The best performing model parameters are given in appendix A. Conveniently, scikit-learn functions are written to be compatible with Keras's high-level prediction functions.
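A hedged sketch of such a pipeline is given below, using the KerasRegressor wrapper available in the Keras of this era (newer stacks would use the scikeras package instead); the model builder, parameter grid and synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

def build_model(units=16, optimizer="adam"):
    """Build a small regression network for the grid search to vary."""
    model = Sequential([
        Dense(units, activation="relu", input_shape=(8,)),
        Dense(1, activation="linear"),
    ])
    model.compile(optimizer=optimizer, loss="mse")
    return model

X = np.random.rand(256, 8)
y = np.random.rand(256)

# Grid search over the layer width and the optimiser: the same kind of
# sweep used to pick each model's best performing configuration.
search = GridSearchCV(
    KerasRegressor(build_fn=build_model, epochs=20, batch_size=32, verbose=0),
    param_grid={"units": [8, 16, 32], "optimizer": ["adam", "rmsprop"]},
    cv=3, scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```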
1 https://www.anaconda.com/distribution/
2 https://pandas.pydata.org/
3 https://keras.io/
4 https://www.tensorflow.org/
5 https://scikit-learn.org/
4. Research Methodology and Contributions
Introduction
Chapter 2 reviewed the literature with an overview of different methods for modelling traffic flows, travel times and road congestion analysis, positioning the prediction models around the latest state-of-the-art machine learning algorithms, while chapter 3 discussed the chosen models in detail. This chapter covers details of the potential datasets, the breakdown of the more favourable datasets and their suitability for the research methodology, and the proposed mechanism for the chosen machine learning models.
Study Area
This section explains the approach and the criteria used to gather the traffic flow characteristics data. The initial aim of this research was to analyse data in the Hertfordshire (UK) area; due to the type, availability and format of the data, it was decided to consider the UK traffic road networks as a case study for this research. This section presents an explanation of the study area and the related datasets. Owing to its comprehensive data availability, the defined area considered is shown in figure 4.1 a. A broader application of this research outcome would follow the same procedure for whole road networks.
Data Collection
Before getting into the proposed algorithm, it is important to know the procedures adopted to gather the data. After an extensive search, some datasets with comparatively reliable sources were shortlisted in table 4.1, which lists the potential open datasets suitable for a proof-of-concept implementation. Only the datasets that carry enough information for our model validation and testing are discussed. The on-site sensor loops and radar technology used to log the data at discrete points make the collected data a complete set of suitable datasets for our research methodology.
Considering UK-only data, Highways England (refer to table 4.1, No. 10) is responsible for most motorways and major category A roads in England. Highways England has outsourced its National Traffic Information Service (NTIS) for seven years to a joint venture between Mouchel and Thales called Network Information Services (NIS) Ltd. NTIS has installed equipment at regional control centres to interface with the various subsystems of the Highways England Traffic Management Systems (HATMS). This equipment provides access to the Motorway Incident Detection and Automatic Signalling (MIDAS) traffic data and high-occupancy alerts, with the ability to set variable message signs and to receive variable message sign (VMS) network signal settings via the message sign and signal subsystems of HATMS. Traffic data is also collected from traffic monitoring units, and travel data from automatic number plate recognition (ANPR) cameras located at strategic locations on the network. Both categories of data are collected at 5-minute intervals; once processed, the data is accessed by the subscribers. NTIS collects traffic data from various sensors and makes it available in two different forms, as isolated sensor data or fused sensor data [91]. HE has categorised its data collection sites into three types: MIDAS, TAME and TMU. MIDAS sites are mostly equipped with inductive loops, although a few sites are also being used for radar technology trials. Some sites collect data for traffic appraisal, modelling and economics (TAME) purposes only, using inductive loops. In analogy to the TAME sites, some sites are equipped with traffic monitoring units (TMU) only.
1. Bus Breakdown and Delays. Description: the Bus Breakdown and Delay system collects information from school bus vendors operating out in the field in real time (2015-2017). Location: New York. Suitability: the dataset can be exploited to consider transport operator performance and the delays caused by their vehicle breakdowns impacting the overall road traffic delays. Source: Kaggle, https://www.kaggle.com/zx4724/bus-breakdown-and-delays-analysis/data
2. Annual Traffic Volume (ATV) (Major and Minor Roads) - DFT Dataset. Description: traffic figures give the total volume of traffic on a stretch of road for the whole year and are calculated by multiplying the Annual Average Daily Flow (AADF) estimates by the corresponding length of road and by the number of days in the year. Location: UK (covers most sites, with minor road data). Suitability: this dataset can be used to create a prediction model for urban road occupancy and can be used as a separate feature with other traffic prediction models as well. Source: Department for Transport (DFT), https://www.dft.gov.uk/traffic-counts/download.php [92]
5. Road Traffic Estimates statistics in Great Britain. Description: the National Statistics publications of road traffic estimates for Great Britain are released on an annual and quarterly basis and provide summary statistics at national, regional and local authority level. Location: UK. Suitability: there is not much junction- or road-level information in the data, so it cannot readily be employed in the junction-level prediction models. Source: Department for Transport (DFT), https://www.gov.uk/government/collections/road-traffic-statistics
6. 1.6 million UK Traffic Accidents. Description: the UK government amassed traffic data from 2000 to 2016, recording over 1.6 million accidents in the process and making this one of the most comprehensive traffic datasets out there; it is a huge picture of a country undergoing change. Location: UK. Suitability: it is an incident log, so it can be used to predict road incidents according to the weather, road conditions, accident severity and seasonality, along with the time of day. However, the traffic incident data can be used as an additional feature for the flow prediction model. Source: Kaggle, https://www.kaggle.com/daveianhickey/2000-16-traffic-flow-england-scotland-wales
7. Road Traffic Accidents (Leeds). Description: information on accidents across Leeds; the data includes location, number of people and vehicles involved, road surface, weather conditions and severity of any casualties. Location: Leeds, UK. Suitability: it is an incident log, so it can be used to predict road incidents according to the weather, road conditions, accident severity and seasonality, along with the time of day. However, the traffic incident data can be used as an additional feature for the flow prediction model. Source: UK Government Data Website, https://data.gov.uk/dataset/road-traffic-accidents
8. Road Safety Data - Accidents and Casualties 2016. Description: these files provide detailed road safety data about the circumstances of personal injury road accidents in GB from 1979, the types (including make and model) of vehicles involved and the consequential casualties. Location: UK. Suitability: this dataset can be incorporated into the flow prediction model to predict the breakdown likelihood of a specific vehicle. Source: UK Government Data Website, https://data.gov.uk/dataset/road-accidents-safety-data/resource/91789e37-03e5-48cf-9720-2d13639c32b9
9. University Bus Company (UNO) Automated Vehicle Locations (AVL), Automatic Passenger Counts (APC), Scheduled Bus Arrival and Departure vs Real-time Data. Description: bus route transactions and GPS routes. Location: UK. Suitability: useful to explore route performance and for developing prediction models for passenger counts at stops and for bus stop arrival times. Source: University Bus Company ticketing system logs.
10. Highways England Network Journey Time and Traffic Flow Data - MIDAS/TAME/TMU Dataset. Description: highway and major road statistics; contains the logs of the speeds and the average speed and traffic flow. Location: UK (selected sites). Suitability: the most compact and comprehensive dataset found expressing the traffic flows and their average speeds; suitable for traffic flow prediction models. Source: UK Government Data Website, https://data.gov.uk/dataset/highways-england-network-journey-time-and-traffic-flow-data; Highways England Portal, http://tris.highwaysengland.co.uk/ [93]
Figure 4. 1 a) Original sample chosen test area with circles (yellow for MIDAS sites and blue for TAME sites) showing the sensors installed at the test sites by the Highways England authority. b) Red square boxes indicate the virtually divided network.
Data Description
Different data sources have different recorded parameters, some of which are common, e.g. the timestamp of the log, vehicle flow, etc. Data is recorded through sensor activity at the model sites. In the case of the Highways England dataset, the sensors were loop-based traffic monitoring units (TMU), and journey time was inferred using ANPR equipment. The sensor loops in the road surface measured the actual speeds, vehicle flows and occupancy, whilst travel times between two points were measured using ANPR camera recognition. If a loop on a site was deemed faulty, this was reported and the flow value was imputed from previous values, though the vehicle category and speeds were not [91]. The following two datasets were chosen due to their suitability for the testing and validation of our proposed network methodology, and are sketched out in detail below.
Data Field Description
Total Carriageway Flow The number of vehicles detected on any lane within the 15-minute time slice.
Total Flow vehicles less than 5.2m The number of vehicles less than 5.2m detected on any lane within the 15-minute time slice.
Total Flow vehicles 5.21m - 6.6m Number of vehicles between 5.21m - 6.6m detected on any lane within the 15-minute time
slice.
Total Flow vehicles 6.61m - 11.6m The number of vehicles between 6.61m - 11.6m detected on any lane within the 15-minute time slice.
Total Flow vehicles above 11.6m The Number of vehicles above 11.6m detected on any lane within the 15-minute time slice.
Speed Value The average speed in km/h of all vehicles for all lanes measured by the site over the 15-minute period.
Day Type The following are valid:
• 0 - First working day of normal week;
• 1 - Normal working Tuesday;
• 2 - Normal working Wednesday;
• 3 - Normal working Thursday;
• 4 - Last working day of normal week;
• 5 - Saturday, but excluding days falling within type 14;
• 6 - Sunday, but excluding days falling within type 14;
• 7 - First day of school holidays;
• 9 - Middle of week - school holidays, but excluding days falling within type 12, 13 or
14;
• 11 - Last day of week - school holidays, but excluding days falling within type 12,13 or
14;
• 12 - Bank Holidays, including Good Friday, but excluding days falling within type 14;
• 13 - Christmas period holidays between Christmas day and New Year’s Day;
• 14 - Christmas Day/New Year’s Day.
Quality Index The Indication of the quality of the data provided. The number of valid one-minute records
reported and used to generate the Total Traffic Flow and speed. A quality index of 0 indicates
no valid records.
Network Link Id An identifier unique to the NTIS link.
Data Field Description
Average Speed in MPH The average speed of vehicles per NTIS link for the 15-minute time slices.
Category 1 Speed Count The average count of vehicles detected by the TAME site with a speed less than 10 mph in the 15-minute time interval for all lanes.
Category 2 Speed Count The average count of vehicles detected by the TAME site with a speed between 10 to 15 mph in
the 15-minute time interval for all lanes.
Category 3 Speed Count The average count of vehicles detected by the TAME site with a speed between 15 to 20 mph in
the 15-minute time interval for all lanes.
Category 4 Speed Count The average count of vehicles detected by the TAME site with a speed between 20 to 25 mph in
the 15-minute time interval for all lanes.
Category 5 Speed Count The average count of vehicles detected by the TAME site with a speed between 25 to 30 mph in
the 15-minute time interval for all lanes.
Category 6 Speed Count The average count of vehicles detected by the TAME site with a speed between 30 to 35 mph in
the 15-minute time interval for all lanes.
Category 7 Speed Count The average count of vehicles detected by the TAME site with a speed between 35 to 40 mph in
the 15-minute time interval for all lanes.
Category 8 Speed Count The average count of vehicles detected by the TAME site with a speed between 40 to 45 mph in
the 15-minute time interval for all lanes.
Category 9 Speed Count The average count of vehicles detected by the TAME site with a speed between 45 to 50 mph in
the 15-minute time interval for all lanes.
Category 10 Speed Count The average count of vehicles detected by the TAME site with a speed between 50 to 55 mph in
the 15-minute time interval for all lanes.
Category 11 Speed Count The average count of vehicles detected by the TAME site with a speed between 55 to 60 mph in
the 15-minute time interval for all lanes.
Category 12 Speed Count The average count of vehicles detected by the TAME site with a speed between 60 to 70 mph in
the 15-minute time interval for all lanes.
Category 13 Speed Count The average count of vehicles detected by the TAME site with a speed between 70 to 80 mph in
the 15-minute time interval for all lanes.
Category 14 Speed Count The average count of vehicles detected by the TAME site with a speed greater than 80 mph in
the 15-minute time interval for all lanes.
Category speed counts included flag This denotes whether there are speed bin values present. Possible values are:
• 0 - Not Present;
• 1 - Present.
Table 4. 2 Traffic flow: additional field names and descriptions unique to the TAME dataset [94].
ONS GOR Name Former Government Office Region that the CP sits within.
ONS LA Name Local authority that the CP sits within.
Road This is the road name (for instance M25 or A3).
R Category The classification of the road type.
iDir Direction of travel.
S Ref E Easting coordinates of the CP location.
S Ref N Northing coordinates of the CP location.
A-Junction The road name of the start junction of the link.
B-Junction The road name of the end junction of the link.
LenNet_miles Total length of the network road link for that CP (in miles).
FdPC AADF for pedal cycles.
Fd2WMV AADF for two-wheeled motor vehicles.
FdCar AADF for Cars and Taxis.
FdBus AADF for Buses and Coaches.
FdLGV AADF for LGVs.
FdHGVR2 AADF for two-rigid axle HGVs.
FdHGVR3 AADF for three-rigid axle HGVs.
FdHGVR4 AADF for four or more rigid axle HGVs.
FdHGVA3 AADF for three or four-articulated axle HGVs.
FdHGVA5 AADF for five-articulated axle HGVs.
FdHGVA6 AADF for six-articulated axle HGVs.
FdHGV AADF for all HGVs.
FdAll_MV AADF for all motor vehicles.
Table 4. 3 AADF dataset common data field names and descriptions [93].
4.4.2 AADF Dataset
Of the available datasets in table 4.1, the AADF dataset is managed by the DFT. It contains the annual average daily flows (AADF) and traffic flow counts, which give the opportunity to analyse the network at a street and minor-road level. Unlike the Highways England dataset, this dataset covers not only the motorways and A-class roads but also the minor roads, including B-class, C-class and certain urban unclassified roads. Overall, however, the minor roads data is not as comprehensive as that for the major roads, because the minor road estimates were only gathered at some of the sample points [92]. Some of the AADF major roads data fields that made the DFT dataset favourable for study, and a better fit for use in the experimentation and testing, are shown in table 4.3. AADF figures are produced for each junction-to-junction link on the major roads for every year. An AADF is the average, over a full year, of the number of vehicles passing a point in the road network each day. Figure 4.3 gives the DFT dataset's basic network road topology-based breakdown.
Data Preparation
The operational process of flow-based predictions is a multi-stage process (refer figure 4.9). The process starts with a series of live data streams containing the time series data covering all the concerned nodes, or junctions, of the road network initially chosen in the study area. As with any machine learning algorithm, the incoming real-time data is tested on the trained model to predict the target variables. In a conventional machine learning implementation, validation scores are calculated on the validation set to compare the performance efficiency and prediction accuracies of the tested algorithms.
All the experiments are performed on the MIDAS traffic flow dataset for the Hatfield, Hertfordshire, UK area junctions, as shown in figure 4.1 a & b. The dataset used contained traffic flow information aggregated at two-hour intervals from 1st April 2015 to 31st December 2015 for the highway roads. Plots of the first three and last three days of the dataset for the patch 1, node 2 links (refer figure 4.1 b) are shown in figure 4.4. After data collection, the data preparation process involved gathering the relevant data fields for the model development. As mentioned in section 4.4, the data was collected for the number of passing vehicles using loop detector technology installed at both ends of the selected highway links. The data pre-processing is carried out using the steps below:
Figure 4. 4 a) First and b) last three days of pre-processed data from Patch 1, Node 2 associated links.
4.5.1 Data Cleaning
The raw link dataset had approximately fifteen percent of its values missing. Because the data contains ongoing trends comprising seasonality and other environmental factors, it is very important to retain the inherent trends in the traffic data. So, the missing values are imputed using the backward fill approach, in which a missing flow value is imputed using the next interval's originally recorded value. This imputation process was continued until all the missing values were imputed. Although this resolved the data inconsistency, the technique can endanger the inherent data properties if the missing-value rate is too high.
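As an illustration, the following is a minimal sketch of this backward-fill imputation using pandas; the DataFrame layout and the column name are hypothetical, not taken from the thesis pipeline.

import numpy as np
import pandas as pd

# Hypothetical link-flow frame indexed by timestamp; NaN marks missing records.
idx = pd.date_range("2015-04-01", periods=8, freq="2H")
flows = pd.DataFrame(
    {"L1_in": [210.0, np.nan, np.nan, 340.0, 295.0, np.nan, 180.0, 165.0]},
    index=idx,
)

# Backward fill: each missing value takes the next interval's recorded value.
flows_imputed = flows.bfill()

# Guard: a high missing rate would make this imputation unsafe, as noted above.
missing_rate = flows["L1_in"].isna().mean()
assert missing_rate < 0.2, "too many gaps for backward fill to be trustworthy"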
the network area, compiled for different time intervals, i.e. 15, 30, 45 and 60 minutes. An initial insight into the data is developed, along with trend filtering for different independent variables such as the type of day. Figure 4.4 a & b shows the results after data preparation, as outlined in sub-sections 4.4.1-4.4.4, for the sample area considered for experimentation as illustrated in figure 4.2 a & b.
Preliminary Analysis
Preliminary data analysis is the second phase of the methodology process (refer figure 4.9). The preliminary phase involves sampling the network by dividing it into several patches, and subsequently each patch into nodes, according to the proposed strategy.
In our study the focus is mainly on the traffic flow, the causes of bottlenecks and the effects on the overall traffic travel times. The rules that we defined in our study to declare a possible virtual network patch are as follows (a small sketch encoding these rules is given after the list):
• First and foremost, the defined patch is considered an enclosed, virtual geo-fence-bounded system, used to study the effects of the dependent variables (i.e. the total flows for the links, the resulting journeys and the inflicted travel times) in that patch.
• A patch must contain a minimum of one node. A node is defined by the sensor site with the aggregate of the current data availability for all the intersecting road links.
• A patch (n) acts like an independent system, with its own inputs (traffic flows) originating from the preceding patch (n-1) and, likewise, its output dumped into the successive patch (n+1), as shown in figure 4.5.
• The order of exploiting the patches' data in our ML model is of extreme importance: the traffic flow output of one system is the input of the next system in line.
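A minimal sketch of how these patch rules might be encoded; the class and field names are illustrative assumptions, not identifiers from the thesis.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    node_id: int                 # node number N_x within its patch
    links: List[str]             # e.g. ["L1_in", "L1_out", "L2_out", "L3_out"]

@dataclass
class Patch:
    patch_id: int                # ordering index n: inputs come from patch n-1
    nodes: List[Node] = field(default_factory=list)

    def validate(self) -> None:
        # Rule: a patch must contain a minimum of one node.
        assert self.nodes, "a patch must contain at least one node"

# Patches are processed in order, so one system's outflow is the next one's inflow.
patches = [
    Patch(1, [Node(2, ["L1_in", "L1_out", "L2_out", "L3_out"])]),
    Patch(2, [Node(1, ["L1_in", "L1_out"])]),
]
for patch in patches:
    patch.validate()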
4.6.2 Preparing the Dataset Subset for Each Node of a System Patch
After identifying the nodes in a patch, the next step is to gather and prepare the dataset subset for the patch; this is achieved by compiling the dataset for all the nodes and their associated bidirectional links. The subset dataset for each patch includes the total traffic flows for each link on all the nodes. Initially, the methodology and experiments are kept simple by considering just one junction, as shown in figure 4.6 a).
Figure 4. 6 a) P1-N2, highway junction under consideration (Google Maps, 2018). b) Node illustration retaining the junction's original topology.
Figure 4.6 a) shows the original patch one, node junction two, with the node N2 marked along with the associated bi-directional road links forming the node. This node comprises four road links (L1, L2, L3 and L4). All the road links in figure 4.6 a) are two-way links, which signifies that they bear not only the burden of the incoming traffic flow but also the outgoing traffic flow. All the links considered in this case belong to the motorway category of highways; more specifically, L2 and L3 belong to the A1(M), whereas L1 is part of the North Orbital Road. Links are numbered according to a clockwise rule, with the first link being the one that falls in the zero-to-ninety-degree range and the rest following in sequence. Table 4.4 shows the final subset dataset for node N3, given only the links division with field header variables, for the whole of patch P1 (refer figure 4.1 b). The patch P1 filtered dataset header fields are shown in table 4.4 to convey the concept of link flow divisions, because it is an example of a diverse patch which contains a varying number of links Ni, i.e. three, four and four links for nodes N2, N1 and N3 respectively.
Table 4. 4 Links Divisions for Patch P1 (refer figure 4.1 b).
Methodology
In this section, the proposed traffic model representation is presented. In view of the proposed methodology, the traffic model is considered as a set of nodes with corresponding input and output links. The traffic flow for a set of input links will have an influence on the traffic flows of the output links. The traffic model is considered as a black box interpreting and modulating the system inputs. The system is governed by a set of rules associated with the fixed and dynamic states, which are mapped to the outputs. This is shown in graphical form with the mathematical expressions in figure 4.7: each individual road link of a node can be modelled as an objective function consisting of variable parameters.
Figure 4. 7 General Network Node Link Dependencies Written in An Analogy with The General Function Definition.
4.7.1 Traffic Network Representation on a Junction Level
Each spatially located junction with its inflows and outflows is an independent system. Each network junction is designated as a node denoted by N_x, where x gives the node number in the patch to which the node belongs. As the highway links are bidirectional, a link, represented by L(x)_in or L(x)_out, can be an inflow (in) or an outflow (out), where x is the number of the link associated with the node under consideration. As an example, for the experimentation and consideration of the proposed mechanism, a simple node in figure 4.6 a) is considered, and its equivalent representation using the node and link configuration is illustrated in figure 4.6 b). Further, the bidirectional arrows indicate the bidirectional traffic flows of the nodes. Here outflow implies traffic flow moving away from the node, and inflow traffic flow moving into the node.
Where, in equation 4.4, the ratio $F'(L(j)_{in}) / F(L(j)_{in})$ represents the fraction of the traffic inflow that is contributed to the outflow of a specific link.
This mathematical representation is further illustrated graphically in figure 4.8 a & b. In figure 4.8 a) the circle represents a node i with three links. The thick blue arrow indicates the traffic inflow of link L(1)_in, which disperses into the node and flows through the rest of the links. This shows that the flow contributes to the outflow of the rest of the links, including itself. This dispersion is indicated by thin blue arrows in figure 4.8 a). The outflow of each of the links in figure 4.8 b) is shown by green arrows. The symbol ∃_{1-j} indicates that part of the inflow of link F(L(1)_in) contributes to the outflows of the links F(L(j)_out). The sum of the traffic flow of F(L(1)_in) inside the node, represented by the thin blue arrows, is equal to the traffic inflow of L(1), represented by the thick blue arrow, at a given time instant, as shown in figure 4.8 a). This applies to the traffic inflow of all other links at the node, as shown in figure 4.8 b).
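To make the conservation property concrete, here is a small illustrative check; the inflow value and dispersion fractions are invented for the example.

import numpy as np

# Inflow of link L(1)_in at one time instant, in vehicles per interval.
inflow_L1 = 480.0

# Hypothetical fractions of that inflow dispersed to each outflow at the node.
dispersion = {"L1_out": 0.10, "L2_out": 0.35, "L3_out": 0.55}

# Contribution of L(1)_in to each link's outflow inside the node.
contributions = {link: frac * inflow_L1 for link, frac in dispersion.items()}

# Flow conservation: the dispersed parts must sum back to the original inflow.
assert np.isclose(sum(contributions.values()), inflow_L1)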
Figure 4. 8 a) Extension of the traffic network at node i showing three links and their associated inflows and outflows. b) A simple traffic network at a node i with 3 links, showing the distribution of incoming traffic dispersed as outgoing traffic at the node.
Figure 4. 9 Implementation Steps for The Proposed Methodology.
Summary
In this chapter, the possible gathered datasets were discussed in detail, along with their suitability to the aim of this research and the chosen study area for the data collection. After a thorough data description, the MIDAS dataset, as the chosen dataset, is passed through a set of data preparation steps which involve cleaning, integration, normalisation, data reduction and discretisation techniques. Possible dependent and independent variables in the dataset are also explored. The network division into patches, and further into nodes with attached traffic road links, is presented. The proposed topology-based network methodology to be employed for the ML models is discussed in detail at the end of this chapter.
5. Experiments and Results: Evaluation of The Proposed
Frameworks
In this chapter, a preliminary analysis is done to decide upon the best analytical methods. Then the proposed research methodology for predicting highway traffic flows in the Hatfield area is discussed. The main aim of this chapter rests upon presenting the findings with an in-depth analysis of traffic flow prediction using hybrid DNN techniques. This chapter introduces the structure of how the experiments are performed based on the proposed methodology and the chosen ML model techniques, along with the reported results. Section 5.1 describes the experiment settings for the experimental scenarios, which are given in section 5.2. The data correlation study is carried out in section 5.3. The experimental setup is described in section 5.4, and section 5.5 lists the actual experimental results in detail.
Experimental Settings
In this section the performance metrics used to report the best performing individual models, and the evaluation methods for comparing different models, are introduced. The chosen dataset is analysed further for correlation, along with the training and testing of the proposed models. Further, the merits and demerits of the proposed methods and the fusion of different modular architectures for traffic flow prediction are discussed.
Where y_t and y'_t are the actual and predicted traffic flows at time t, respectively. These performance indices complement each other: the linear score (MAE) averages the prediction errors with equal weight, whereas RMSE assigns larger weights to larger errors and MRE allows a relative residual error measurement. It is also important to see how the different models perform for the different node links among different junctions; this is done with empirical distribution function plots. Model accuracies are also analysed during rush peak and normal non-peak hours.
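A sketch of these indices as conventionally computed is given below; taking MRE to be the mean relative error is an assumption, since the defining equations fall outside this excerpt.

import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Linear score: every residual carries the same weight.
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Squaring assigns larger weights to larger errors.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mre(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-9) -> float:
    # Relative residual error; eps guards against zero flows.
    return float(np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + eps)))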
5.1.3 Empirical Error Distributions
Different models generate multivariate-level predictions for the traffic flows. The factor that differentiates the models, based on their performance, is how well they predict for every link on the considered nodes. Error measurements among the different models are highlighted using the cumulative distribution function (CDF) in the form of an EDF. Let us assume, for example, x ∈ X, where X is the performance measure of the model in the form of the calculated RMSE or MRE for the prediction results of each node link. The distribution tells us how x is distributed, from the value of 0 up to the maximum cumulative level of 1 over the sample space.
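A minimal sketch of computing such an EDF over per-link error scores; the scores array is a stand-in for the RMSE or MRE values, one per node link.

import numpy as np

def empirical_cdf(scores: np.ndarray):
    # Return the sorted scores and their cumulative proportions in (0, 1].
    x = np.sort(scores)
    y = np.arange(1, len(x) + 1) / len(x)   # cumulative level at each sample
    return x, y

# Illustrative per-link RMSE scores for one model.
x, y = empirical_cdf(np.array([3.1, 2.4, 5.0, 2.9, 4.2]))
# Plotting x against y gives the EDF curve used to compare models.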
Experiments
This section sheds light on the experiments performed and the reasoning behind them. Firstly, the experiments on model performance for different prediction time horizons are discussed, based on the chosen model architecture and on which input data lag gives the better prediction results. In the second scenario, the effects on the deep model results of including more variables besides the flow data are reported. It is the combination of Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) variants, i.e. LSTM and GRU, that makes up the deep learning models.
The existing models are to be outperformed if the newly proposed model is to be considered a better alternative. Thus, the chosen benchmarks, including Auto-Regressive Integrated Moving Average (ARIMA), Historical Average (HA), Random Walk (RW) or Random Forest Regressor (RFR), Support Vector Regressor (SVR), Feed Forward Backpropagation Neural Networks (FFBNNs), Deep Belief Networks (DBNs), Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTMs), Backpropagation-Long Short-Term Memory Neural Network (B-LSTM-ANN) and Deep Convolutional Neural Network-Long Short-Term Memory (DCNN-LSTM) (refer chapter 3), are compared. HA and RFR constitute the bare-bones simplest models considered here, as the two signify different baseline abilities. Firstly, any methodology to be considered a better model must perform better than RFR, as this suggests that the compared model has the ability of meaningful data learning; HA, meanwhile, is taken as the baseline model for trendiness, and any performance worse than HA would suggest an inability and slackness in learning from the data.
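As an illustration of the HA baseline, a sketch follows; grouping by a time-of-day column is an assumption about the exact averaging keys used in the experiments.

import pandas as pd

def historical_average_forecast(train: pd.DataFrame, test: pd.DataFrame) -> pd.Series:
    # HA baseline: predict each flow as the mean of past flows that share
    # the same time-of-day slot (the grouping key is assumed, not specified).
    profile = train.groupby("time_of_day")["flow"].mean()
    return test["time_of_day"].map(profile)

# Any candidate model should beat this trend-following baseline; a model
# performing worse than HA has not learned anything useful from the data.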
fifteen minutes, or one time interval. But, as mentioned in the data pre-processing, time intervals two hours apart are considered for this experiment. So essentially the next time step means the flow recorded in the last fifteen minutes, while the preceding time step is two hours apart from the last recorded sample. To give a better understanding of the traffic flows, these intervals help assess the short-term (fifteen minutes, one time step), medium-term (thirty minutes, two time steps) and long-term (forty-five minutes, three time steps ahead) reliability of the prediction models.
• Vehicle speed value (the average speed in km/h of all vehicles for all lanes measured by the site over the 15-minute period)
• Total carriageway flow (The number of vehicles detected on any lane within the 15-minute time
slice.)
So, 15 minutes is the time index by which the data is recorded. Some of the potential recorded variables over each time interval (15 minutes) from the MIDAS dataset, which can also be used in conjunction with the flow and speed features as a more meaningful feature vector and form the basis of the further problem solution, are listed below (a sketch of framing such lagged features for supervised learning follows the list):
• Day type (day of the week; normal week working days; first, middle and last day of the week; school holidays and bank holidays; day of the year).
• Time of the day (the interval index of 15 minutes, or one time step). Times of the day are used for further multi-feature deep model training, keeping the proposed objective function intact.
• Flow of different vehicle categories (total flow of vehicles less than 5.2m, total flow of vehicles in the range 5.21m - 6.6m, total flow of vehicles in the range 6.61m - 11.6m, total flow of vehicles above 11.6m).
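The following sketch shows how such lagged and calendar features can be framed for supervised learning; the column names are illustrative assumptions.

import pandas as pd

def make_supervised(df: pd.DataFrame, target: str, n_lags: int) -> pd.DataFrame:
    # Frame a flow series for supervised learning: n_lags past intervals plus
    # calendar columns predict the next-interval flow.
    out = df.copy()
    for k in range(1, n_lags + 1):
        out[f"{target}_lag{k}"] = out[target].shift(k)   # flow k steps back
    out["y"] = out[target].shift(-1)                     # next-step target
    return out.dropna()

# df would hold one link's flows together with 'day_type' and 'time_of_day'
# columns; the lagged columns mirror the L1_in_1 ... L1_in_6 notation used
# in the correlation analysis that follows.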
Correlation Analysis
In this section we analyse the dataset to study the relevance of the selected features to their own past values (auto-correlation) and the relevance of the main selected features to the other considered features (cross-correlation). The primary feature is the total carriageway flow, as the main selected feature variable, and the secondary considered features are the time-lagged versions of the link flows.
5.3.1 Auto-Correlation
To check the dependence between different time intervals of the carriageway flow values, an auto-correlation test is performed using time-lagged versions of the flow data. This analysis tells us how many previous time intervals (n steps) are relevant and have an effective correlation with the future values at the corresponding ahead time intervals, so that the optimal number of interval steps can be considered in the prediction model. Figure 5.1 shows the auto-correlation graph for the original traffic flow data of the incoming link L1_in. Each lag is one 15-minute interval time step. The blue shaded area represents the 95% confidence interval for the correlation coefficient; any correlation coefficient beyond the confidence interval shows the existence of significant autocorrelation between the traffic flow at time (t) and (t - interval step). Traffic flow values exhibit a significant correlation for even up to forty lags into the past, as can be seen from figure 5.1, although after about twenty lags the correlation becomes periodic, depicting the trendiness in the flow values, so the lags past the twentieth can be discarded. Similar autocorrelation coefficient behaviour is observed in the outgoing and incoming traffic of the other connected links as well. Next, we analyse the cross-correlation of the different connected road links.
Figure 5. 1 Original Flow features auto-correlation for the incoming link L1_in.
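A sketch of this auto-correlation test with statsmodels; the synthetic series below merely stands in for the L1_in flows.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Synthetic stand-in for the L1_in flow series, one value per 15-minute step,
# with a daily cycle (96 steps per day) plus noise.
rng = np.random.default_rng(0)
t = np.arange(960)
flow = pd.Series(300 + 120 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 15, t.size))

# The shaded band marks the 95% confidence interval; bars outside it indicate
# significant autocorrelation at that lag.
plot_acf(flow, lags=40, alpha=0.05)
plt.show()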
5.3.2 Cross-Correlation
To check the dependence between the traffic flows of the different junction-connected links at different time steps, cross-correlation is performed. To properly understand the traffic flow parameters that determine the shape of the traffic flow profiles, it is necessary to investigate the cross-correlation of connected traffic links. In this case we analyse the cross-correlation for the past six time intervals for L1_in; the results are shown in figure 5.2. The labels L1_in, L1_out, L2_out, L3_out, L1_in_1, L1_out_1, L2_out_1, L3_out_1, etc. in figure 5.2 refer to the traffic flows of link one (inflow), link one (outflow), link two (outflow), link three (outflow), link one (inflow lagged by one time interval), link one (outflow lagged by one time interval), link two (outflow lagged by one time interval), link three (outflow lagged by one time interval), and so on, respectively. The cross-correlation between each pair of links is given by real numbers along with a colour map plot representing each pair's cross-correlation. The final plotted set of link lags is shown in figure 5.3. To quantify the lagged link pairs' cross- and auto-correlation, the Pearson coefficient was considered [97]. There exists a high auto-correlation of the links with their own time-lagged versions (Pearson coefficient > +0.5), but the cross-correlation fades away towards no correlation (Pearson coefficient = 0) and becomes inverse (Pearson coefficient < -0.5) as the link lag increases. As in figure 5.3, the link L1_in versus L1_in_1, L1_in_2, L1_in_3, L1_in_4, L1_in_5 and L1_in_6 exhibits zero cross-correlation, and the same is true for the same linked lagged pairs, where auto-correlation is maximum but cross-correlation is zero. But as further time lags are considered, the correlations become either non-correlated or inverse. It is interesting to note that at later lag periods the cross-correlation becomes negative, i.e. less than zero. For example, L3_out_5 and L3_out_6 each individually have insignificant correlation with the other lag pairs, except when compared with the other links' fifth and sixth lags (Lx_in_5, Lx_in_6), where they exhibit very high cross-correlations resulting in linear, or close to linear, correlations, i.e. a greater-than-zero Pearson coefficient.
Figure 5. 2 Cross Correlation of Link L1_in with its Time Lagged Versions.
From the plot in figure 5.3 the following corollaries can be drawn:
• The cross-correlation of the links with their own lagged versions is almost zero at any interval, whereas their auto-correlation depends on the time lags, as shown in figure 5.1.
• The lagged pairs have a more linear correlation with the links' nearest lagged versions, suggesting that the trend flows through to the next time lag and fades away gradually in the subsequent lags.
• There is a significant trendiness across links at the same time lag, hence they are more cross-correlated.
• Considering any one time lag suggests that the traffic flow is not distributed evenly to the joined links, but follows the flow conservation principle, so much so that a connected road link can have an inverse relation to the master inflow link regardless of the other links on the same junction. This fact supports our intuition for the proposed methodology mentioned in section 4.7.
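A sketch of building the lagged link matrix behind such a plot; the link column names follow the labels used above, while the function name is illustrative.

import pandas as pd

def lagged_link_matrix(links: pd.DataFrame, n_lags: int = 6) -> pd.DataFrame:
    # Append time-lagged copies of every link flow column, mirroring the
    # L1_in_1 ... L3_out_6 labels of figure 5.2.
    frames = {col: links[col] for col in links.columns}
    for k in range(1, n_lags + 1):
        for col in links.columns:
            frames[f"{col}_{k}"] = links[col].shift(k)
    return pd.DataFrame(frames).dropna()

# With links holding columns ["L1_in", "L1_out", "L2_out", "L3_out"], the
# Pearson matrix behind the colour map is:
#   corr = lagged_link_matrix(links).corr(method="pearson")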
5.3.3 Relation Between Traffic Flow Profiles and Times of the Day
Further, we examine the flow profiles of the connected links with respect to the time of the day. Figure 5.5 consists of two plots: a) the link correlation pair plots with no time lags, and b) the same plots with flows from six time steps ago. It is very clear from figure 5.5 that the time of the day has a significant effect on the link flows. Although the incoming flow from L1_in is distributed into L1_out, L2_out and L3_out, the distribution of this division is time dependent. As can be seen in figure 5.5 b), L1_in_6 exhibits a high-density flow distribution at the 4th, 6th and 22nd hours of the day. Compared to this, for L1_out_6 the traffic flows at the 6th, 8th and 20th hours are mostly dominant, because these hours fall into the category of peak hours. On the contrary, L2_out_6 gets its maximum share of flow at the 6th, 10th and 16th hours of the day, and most of the flow out of L1_in_6 is taken by L3_out_6, which is further apparent from the similar flow density distribution profiles of both links. Comparison of the other link pairs confirms that the peak flow in one link is not distributed evenly to the connected flow-receiving links. But the most crucial thing to note is the time of the day. Figure 5.4 exhibits the flow profile of L1_in with respect to the times of the day. Based on these results the traffic flow profile can be divided into four time slots: 1) Morning Peak Hours (04:00-10:00), 2) Normal Hours (10:00-16:00), 3) Evening Peak Hours (16:00-20:00), 4) Late-Night Off-Peak Hours (20:00-04:00). The average correlation for the evening peak hours between the link pairs falls below 0.5; this is because the traffic flow is reversing in the opposite direction, and some of the original morning flow will not go through the actual incoming link channel and may take some other route. For the late-night off-peak hours, the flow is at first minimal, due to minimal traffic on the road, but it then starts to pick up during its last hours, building into the morning peak hours.
Figure 5. 4 Link L1_in Normalised Flow Profiles with Respect to The Times of The Days.
Figure 5. 5 a) Correlation Between Non-Lagged Interconnected Link Pair Normalised Flows vs Time of the Day. b) Correlation Between Six-Step Lagged Interconnected Link Pair Normalised Flows vs Time of the Day.
5.3.4 Seasonality and Trends in Traffic Flows
The final link flows are plotted in figure 5.5 a & b and, as illustrated in figure 5.4, the link flow pairs have a positive correlation for most times of the day. The obvious point is that the traffic volume increases during peak hours and decreases during off-peak hours as the daylight dims. Furthermore, it is significant that traffic flow is inversely correlated with the speeds of the vehicles: during adverse weather and low-visibility conditions the average speed is lower than normal, resulting in decreased traffic flows. These changing factors have a major effect on traffic flows, especially traffic congestion. Traffic congestion may also be a function of seasonality and trends. From figure 5.5, in both plot pairs there is some level of flow at a certain time of the day that is highly correlated, but not so correlated in the lagged versions of the pair plots. This shows that traffic flow profiles are highly season-dependent. Also, in winter conditions with low visibility, the gap between the peak and normal hours is shortened. This difference in flow behaviour is more discernible in the density distribution curve shift of the link pairs for the morning peak hours and normal hours of the day, as can be seen in the trend plots of figure 5.6. The traffic flow trend may remain constant for most of the year, as it is a function of the regular road users, but it is the seasonality that makes the flow exhibit major variations, due to the density of the road being used at any time, as shown in figure 5.6. Figure 5.6 shows the seasonality breakdown of the four links previously discussed. Since the original observed data is gathered at two-hour intervals, an additive decomposition at a frequency of two months (sixty days) allows the periodicity in the trends to be seen very well.
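A sketch of that additive decomposition with statsmodels; since the observations are two-hourly, a sixty-day period corresponds to 720 samples, and the synthetic series below stands in for a real link flow.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic stand-in for a link flow series sampled every two hours.
idx = pd.date_range("2015-04-01", "2015-12-31", freq="2H")
rng = np.random.default_rng(1)
flow = pd.Series(
    300 + 50 * np.sin(2 * np.pi * np.arange(len(idx)) / 720)
    + rng.normal(0, 10, len(idx)),
    index=idx,
)

# Additive decomposition; 60 days x 12 two-hour samples/day = 720-sample period.
result = seasonal_decompose(flow, model="additive", period=720)
result.plot()   # trend, seasonal and residual components, as in figure 5.6
plt.show()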
From the results of figure 5.6 it can be inferred that there is a significant seasonal component involved in the traffic flows. Eliminating the seasonal component and the trendiness in the flow profiles gives the unaffected residual traffic flows, which remain pretty much the same for any link. As expected, there is a clear seasonal reverse shift in the seasonal plots, which is captured in the trend plots as well, indicating that the summer traffic volume does change when the winter season starts. The trend plots, on the other hand, reflect that the traffic volume dips for a while when the days are shorter and the winter time change occurs, but then starts to rise when the days return to their nominal length at the end of December. Likewise, the flow densities may differ for different days of the week, but the overall flow profiles closely resemble the general flow profiles. An example of this behaviour is shown in figure 5.7 for L1_in.
The conclusions made in these sub-sections confirm that interconnected links do influence the intra-link traffic flows for a specific time of the day, and that this effect is a seasonal one based on the localisation of the links. Therefore, these hidden features are further explored in our proposed flow prediction architectures.
Figure 5. 7 Link L1_in Flow Profiles with Respect to The Times of The Day Along with the Days of the Week Breakdown.
Figure 5. 8 Averaged Monthly Traffic Flows.
Figure 5.9 shows the result of the ADF statistical test on the L1_in flow series. The ADF test is also called the unit root test; it indicates how much trendiness there is in the time series. ADF uses an auto-regressive model. The null hypothesis is that the time series can be represented by a unit root, i.e. that the series is non-stationary; the alternate hypothesis is that the series is stationary. By comparing the ADF test statistic to the critical values, we can accept or reject the null hypothesis. Figure 5.9 shows that the test statistic value of -8.62 is much lower than all three critical threshold values of the test, suggesting that the flow series has no unit root and rejecting the null hypothesis. The time series is therefore stationary and has no time-dependent structure, but it does have a seasonality component, so it can be classified as seasonally stationary rather than strictly stationary. The same test was performed for all the link flows, and they all rejected the null hypothesis and exhibited strong seasonal stationarity. Also, as a double check, the p-value is less than 0.05, which affirms our intuition that the series is in fact stationary, so no further steps are needed to make the flow data series strictly stationary.
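A sketch of running the ADF test with statsmodels, reusing a flow series like the synthetic one from the auto-correlation sketch above.

from statsmodels.tsa.stattools import adfuller

# flow: a 1-D array or pandas Series holding the L1_in flows.
stat, pvalue, usedlag, nobs, critical, icbest = adfuller(flow, autolag="AIC")

print(f"ADF statistic: {stat:.2f}, p-value: {pvalue:.4f}")
for level, threshold in critical.items():        # '1%', '5%' and '10%' levels
    print(f"critical value ({level}): {threshold:.2f}")

# Reject the null hypothesis (unit root, i.e. non-stationary) when the test
# statistic is below the critical values and the p-value is under 0.05.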
Experimental Environment
The experimental setup was deployed on a single personal laptop, which made off-campus working with continued development easier, as certain top-end deep machine learning algorithms took longer than expected for the best parameter estimation and for the ML models to be trained.
Figure 5. 9 Stationarity Test: Augmented Dickey-Fuller Test Results.
Experimental Results
In this section the experimental results are presented. The results comprise the comparison of mean absolute error (MAE) and root mean square error (RMSE) results. The MAE and RMSE are both calculated for the training and the test data respectively, and overall for the links data. Further, we discuss the different prediction cases as defined in section 5.2. A more detailed explanation of the performance measures mentioned in both cases is presented in the conclusions section.
Table 5. 1 MAE and RMSE Results for The Short-Term Prediction Horizon.
Table 5. 2 MAE and RMSE Results for The Medium-Term Prediction Horizon.
Table 5. 3 MAE and RMSE Results for The Long-Term Prediction Horizon.
Table 5. 4 MAE and RMSE aggregated Results of The Short-Term Prediction Horizon for The Multi Feature Inclusion.
Table 5. 5 MAE and RMSE aggregated Results of The Medium-Term Prediction Horizon for The Multi Feature Inclusion.
Table 5. 6 MAE and RMSE aggregated Results of The Long-Term Prediction Horizon for The Multi Feature Inclusion.
Summary
In this chapter, the experiments done to carry out the simulation of the models were discussed, beginning with the initial correlation analysis of the traffic flow data and a detailed dataset breakdown. At the end, the experimental results for the different scenarios were presented. These results are further discussed in the evaluation and conclusion chapter 6.
6. Evaluation and Conclusion
This chapter presents the evaluation of the experimental results from the experiments performed in section 5.2. The performance results are evaluated in section 6.1, further discussion of the results is presented in section 6.2, and the conclusions of this study are presented at the end of the chapter.
Evaluation
In this section, the forecasting results of the performed experiments are evaluated, case by case. Section 6.1.1 explains the performance measures for the case of considering just the flow variables over three different prediction horizons: short-, medium- and long-term predictions. In section 6.1.2, the prediction performances from the extended link flow variables, based on the proposed time-dependent flow optimisation function, are discussed in detail.
engineering that the models had to do as a result of the data pre-processing needed to make the data more suitable for medium-horizon predictions.
6.1.2 Case 2: Evaluation of Experimental Results with Inclusion of the Related Variables
In this section the experimental results from section 5.2.2, with the inclusion of the related time-based flow link variables utilising the proposed objective function, are evaluated. Only the deep learning models, LSTM-ANN and DCNN-LSTM, were considered for this experiment case. The preliminary analysis in section 5.3.2 showed that the traffic flows are highly correlated and that there exists a strong correlation with respect to the time of the day. Thus LSTM-ANN and DCNN-LSTM explored the spatial-temporal features for better prediction accuracy under adverse circumstances where the other, shallow models failed. The ECDF score plots for the case scenarios for the short, medium and long term are given in figures 6.1-6.3, respectively, whereas the ECDF plots with multi-feature learning are given in figures 6.4-6.6.
Figure 6. 1 Empirical CDF Plot of Absolute Mean Square Error Score on the Short-Term Prediction Results.
Figure 6. 2 Empirical CDF Plot of Absolute Mean Square Error Score on the Medium-Term Prediction Results.
Figure 6. 3 Empirical CDF Plot of Absolute Mean Square Error Score on the Long-Term Prediction Results.
Figure 6. 4 Empirical CDF Plot of Absolute Mean Square Error Score on the Short-Term Prediction Results with Multi Link
Proposed Flow Learning.
Figure 6. 5 Empirical CDF Plot of Absolute Mean Square Error Score on the Medium-Term Prediction Results with Multi Link
Proposed Flow Learning.
Figure 6. 6 Empirical CDF Plot of Absolute Mean Square Error Score on the Long-Term Prediction Results with Multi Link
Proposed Flow Learning.
Discussion
Consistently across figures 6.1 & 6.2 it was found that SVR gave the best results in both short- and medium-term predictions. The ECDF was calculated on the k-fold validation result scores, where the error considered was the mean absolute error, and was plotted to get a better understanding of the individual model performances. Further detailed model mean scores are given in appendix A for each individual model with respect to its hyperparameter grid search. The reason for SVR exhibiting the best performance in these two cases is its regression mechanism, which led to classifying the regressed nodes easily when no other related features were considered, as also reported in [47]. This is apparent in the later experiments, i.e. figures 6.3-6.6: when the feature data was increased, SVR struggled to keep up with the deep learning models.
From the ECDFs it was clear that the models that can not only learn the time series data but also predict with relative ease and less error than the statistical techniques were the neural-based models, and the specific RNN-based LSTM exhibited performance superior to the simple feed-forward neural network.
the learning and testing of advanced models, to further rule out the redundant models. So different morning, evening or day conditions can be considered specifically for training, or for class balancing as part of the data pre-processing.
By extending the feature vector with the proposed link objective function, the advanced deep models' performances showed an improvement, as can be seen in figures 6.4-6.6. This is because, with each horizon increase as part of the lag, one more lagged feature was considered during supervised pre-processing, which brought stability and more learning capability to these models. So, the LSTM-ANN* model performances improved while the DCNN-LSTM* prediction accuracy went down for long-term predictions, though the models could be further compared by considering other learned variables before having a final say. This can be attributed to the fact that the CNNs are unable to preserve the pattern integrity of the original data in their bid to generate feature vectors through convolutional layers, whereas the LSTMs did a good job of forgetting what is not a likely outcome using their forget and output gates; such a mechanism is missing in DCNNs. Almost similar experimental observations are reported in [36].
6.2.1 Limitations
Conclusions
The research questions raised in section 1.2 are addressed in this section:
RQ1:
What are the potential challenges hindering the practical implementation of road traffic parameter forecasting systems?
RQ2:
What are the state-of-the-art traffic prediction machine learning architectures for traffic flow
forecasting and what effect does the proposed methodology have on the chosen model performances?
Research questions one and two (RQ1 & RQ2) are answered by the detailed literature review in sections 2.3 and 2.4. The main hindrance found through the literature review was that different prediction models considered different traffic datasets, with no common datasets across the literature; this was one of the key issues. Ideally, model performance merits must be judged using common datasets, as it otherwise becomes difficult to establish which model is state-of-the-art, given the dynamic nature of the gathered traffic network data. Recent advancements in deep learning algorithms have defined new limits for the state of the art, which holds true within the scope of this thesis as well, as demonstrated by the experimental results.
Deep learning networks that evolved from different neural network-based forecasting models have been extensively studied in the literature [44]. They have been integrated into deep belief networks (DBNs) and later into convolutional networks (CNNs) with much success. Currently, however, the focus of researchers is mostly on deep learning and hybrid data-driven models, i.e. CNN-LSTMs and DCNN-LSTMs.
RQ3:
What are the state-of-the-art traffic prediction machine learning architectures for traffic flow
forecasting?
From the latest literature review, state-of-the-art deep machine learning techniques are now being freshly considered in the field of ITS. The techniques proposed in this thesis utilise a modular approach on a road junction level, which employs the good features of a model to tackle the dynamic nature of the data. This has led researchers to propose an amalgam of hybrid data-driven techniques that mostly centre around RNNs and ANNs.
RQ4:
What do deep machine learning approaches have to offer when compared to conventional or shallow machine learning techniques, considering the traffic flow data?
Deep learning techniques have the added advantage of adaptability and continuous model training, which makes them favourable candidates for big data problems. Where shallow machine learning techniques like SVR and RFR reach their limits, as in this thesis, deep learning models take charge. In ITS, researchers are mostly focussing on spatial-temporal transport data, which in deep learning models is handled by different parts, i.e. LSTMs handle the temporal data learning while ANNs or CNNs handle the spatially based data.
The bi-directional flow function of individual roads is reported considering the net inflows and outflows through a topological breakdown of the highway network. Further, the proposed objective function is optimised and compared for the constraints involved using statistical and neural-based machine learning models, considering different loss functions and training optimisation strategies. Finally, we report the best-fitting machine learning model parameters for the proposed flow objective function for better prediction accuracy. The deep learning models are also tested in a separate experiment case for the features that are time dependent. Although every flow time series is time dependent, how the input data is fed to the models with respect to time does matter, because the models exploit the features that they see; for the proposed methodology in this thesis, this was partly incorporated in the data pre-processing phase.
The driving force of deep modular learning models is the hyperparameter tuning of each individual model, which took a lot of the author's time during the experimentation as well. But this is the key to getting the best out of a deep ML model; without it, the shallow ML models might perform better than the deep learning techniques.
Conclusively, the results from the experiments exhibit that shallow machine learning techniques can be used if the data is sparse enough to be categorically predicted, as in the case of SVR and RFR; if not, then the patterns in the data need to be learned properly using FFBNN- and LSTM-based deep learning techniques, since the latter performed better in highly correlated sparse data conditions. Also, the proposed network breakdown for the machine learning implementation does influence the performance of the final model, which in our experiments improved over models with no objective function considering the traffic network flow links [44].
Contributions
There are two main contributions made in this thesis.
Future Works
This thesis shows that deep learning-based methods can be applied to the traffic flow data from Highways England (HE). The MIDAS dataset (refer section 4.4.1) has various gathered traffic parameters which can be used in conjunction with the flow features. The incorporation of new feature vectors (e.g. fused average lane speeds, local weather conditions, etc.) could greatly lift the performance of the deep learning models, which could then be deployed as more reliable real-time traffic prediction systems for the public. This would further modify the objective functions and would make them a more elaborate and complete representation of the network path (refer Appendix C), which would lead to the identification of emerging congestion bottleneck points and their effect on individual link flow forecasting. The trained model performances can be further subjected to different flow conditions, which will lead to more insights into how the models will perform under varying traffic conditions. These forecasting techniques would help the public and transport providers to adopt safety measures before an event is about to happen. Further detailed future works that could be adopted are given in appendix C.
References
[1] P. Domingos, “A few useful things to know about machine learning,” Commun. ACM, vol. 55, no. 10,
p. 78, 2012.
[2] R. Gupta and C. Pathak, “A machine learning framework for predicting purchase by online
customers based on dynamic pricing,” Procedia Comput. Sci., vol. 36, no. C, pp. 599–605, 2014.
[3] R. C. Staudemeyer and C. W. Omlin, “Extracting salient features for network intrusion detection
using machine learning methods,” South African Comput. J., vol. 52, no. July, pp. 82–96, 2014.
[4] M. Rabbani, R. Khoshkangini, H. S. Nagendraswamy, and M. Conti, “Hand Drawn Optical Circuit
Recognition,” Procedia Comput. Sci., vol. 84, pp. 41–48, 2016.
[5] B. van Riessen, R. R. Negenborn, and R. Dekker, “Real-time container transport planning with
decision trees based on offline obtained optimal solutions,” Decis. Support Syst., vol. 89, pp. 1–16,
2016.
[6] A. Verikas, A. Gelzinis, and M. Bacauskiene, “Mining data with random forests: A survey and results
of new tests,” Pattern Recognit., vol. 44, no. 2, pp. 330–349, 2011.
[7] M. Schuh, J. Sheppard, S. Strasser, R. Angryk, and C. Izurieta, “An IEEE standards-based visualization
tool for knowledge discovery in maintenance event sequences,” IEEE Aerosp. Electron. Syst. Mag.,
vol. 28, no. 7, pp. 30–39, 2013.
[8] A. S. Ahmad et al., “A review on applications of ANN and SVM for building electrical energy
consumption forecasting,” Renew. Sustain. Energy Rev., vol. 33, pp. 102–109, 2014.
[9] A. Anwar, T. Nagel, and C. Ratti, “Traffic origins: A simple visualization technique to support traffic
incident analysis,” IEEE Pacific Vis. Symp., pp. 316–319, Mar. 2014.
[10] J. W. C. van Lint, “Reliable Travel Time Prediction for Freeways,” TU Delft, 2004.
[11] A. Abadi, T. Rajabioun, and P. A. Ioannou, “Traffic Flow Prediction for Road Transportation Networks
With Limited Traffic Data,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 653–662, 2015.
[12] C. Hsu and F. Lian, “A Case Study on Highway Flow Model Using 2-D Gaussian Mixture Modeling,” in Proceedings of the 2007 IEEE Intelligent Transportation Systems Conference, 2007, pp. 790–794.
[13] S. Oh, Y. J. Byon, K. Jang, and H. Yeo, “Short-term Travel-time Prediction on Highway: A Review of
the Data-driven Approach,” Transp. Rev., vol. 35, no. 1, pp. 4–32, 2015.
[14] C. Goves, R. North, R. Johnston, and G. Fletcher, “Short Term Traffic Prediction on the UK Motorway
Network Using Neural Networks,” Transp. Res. Procedia, vol. 13, pp. 184–195, 2016.
[15] K. Kumar, M. Parida, and V. K. Katiyar, “Short term traffic flow prediction in heterogeneous
condition using artificial neural network,” Transport, vol. 30, no. 4, pp. 397–405, 2015.
[16] Z. Abdelhafid, F. Harrou, and Y. Sun, “An Efficient Statistical-based Approach for Road Traffic
Congestion Monitoring,” in 5th Int. Conf. Electr. Eng. - Boumerdes, 2017, vol. 2017–Janua, pp. 1–5.
[17] R. Li and G. Rose, “Incorporating uncertainty into short-term travel time predictions,” Transp. Res.
Part C Emerg. Technol., vol. 19, no. 6, pp. 1006–1018, 2011.
[18] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Short-term traffic forecasting: Where we are and where we're going,” Transp. Res. Part C, vol. 43, pp. 3–19, 2014.
[19] C. Siripanpornchana, S. Panichpapiboon, and P. Chaovalit, “Effective variables for urban traffic
incident detection,” IEEE Veh. Netw. Conf. VNC, vol. 2016–Janua, pp. 190–195, Dec. 2016.
[20] Y. Zhu, Z. Li, H. Zhu, M. Li, and Q. Zhang, “A compressive sensing approach to urban traffic
estimation with probe vehicles,” IEEE Trans. Mob. Comput., vol. 12, no. 11, pp. 2289–2302, 2013.
[21] Z. Duan, Y. Yang, K. Zhang, Y. Ni, and S. Bajgain, “Improved Deep Hybrid Networks for Urban Traffic
Flow Prediction Using Trajectory Data,” IEEE Access, vol. 6, pp. 31820–31827, 2018.
[22] G. Fusco, C. Colombaroni, and N. Isaenko, “Short-term speed predictions exploiting big data on large
urban road networks,” Transp. Res. Part C Emerg. Technol., vol. 73, pp. 183–201, 2016.
[23] F. Schimbinschi, L. Moreira-Matias, V. X. Nguyen, and J. Bailey, “Topology-regularized universal
vector autoregression for traffic forecasting in large urban areas,” Expert Syst. Appl., vol. 82, pp.
301–316, Oct. 2017.
[24] F. Su, H. Dong, L. Jia, Y. Qin, and Z. Tian, “Long-term forecasting oriented to urban expressway traffic
situation,” Adv. Mech. Eng., vol. 8, no. 1, pp. 1–16, 2016.
[25] S. Oh, Y. Kim, and J. Hong, “Urban Traffic Flow Prediction System Using a Multifactor Pattern
Recognition Model,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 5, pp. 2744–2755, 2015.
[26] Z. Yuan and C. Tu, “Short-term Traffic Flow Forecasting Based on Feature Selection with Mutual
Information,” in Materials Science, Energy Technology, and Power Engineering I AIP Conf. Proc.,
2017, vol. 020179, no. 1, pp. 1–9.
[27] A. Zeroual, N. Messai, S. Kechida, and F. Hamdi, “A piecewise switched linear approach for traffic
flow modeling,” Int. J. Autom. Comput., vol. 14, no. 6, pp. 729–741, 2017.
[28] Q. Li, S. Li, and Y. Wang, “Traffic incident data analysis and performance measures development,”
IEEE Conf. Intell. Transp. Syst. Proceedings, ITSC, no. 086, pp. 65–69, 2007.
[29] J. Wang, X. Li, S. S. Liao, and Z. Hua, “A Hybrid Approach for Automatic Incident Detection,” IEEE
Trans. Intell. Transp. Syst., vol. 14, no. 3, pp. 1176–1185, 2013.
[30] R. Kalsoom and Z. Halim, “Clustering The Driving Features Based On Data Streams,” IEEE, pp. 89–94,
Dec. 2013.
[31] H. Nguyen, C. Cai, and F. Chen, “Automatic classification of traffic incident’s severity using machine
learning approaches,” IET Intell. Transp. Syst., vol. 11, no. 10, pp. 615–623, Dec. 2017.
[32] C. E. L. Hatri and J. Boumhidi, “Fuzzy deep learning based urban traffic incident detection,” 2017
Intell. Syst. Comput. Vis., pp. 1–6, Apr. 2017.
[33] J. Guo, Z. Liu, W. Huang, Y. Wei, and J. Cao, “Short-term traffic flow prediction using fuzzy
information granulation approach under different time intervals,” IET Intell. Transp. Syst., vol. 12,
no. 2, pp. 143–150, 2018.
[34] M. M. Rahman, S. C. Wirasinghe, and L. Kattan, “Analysis of bus travel time distributions for varying
horizons and real-time applications,” Transp. Res. Part C Emerg. Technol., vol. 86, no. December
2017, pp. 453–466, 2018.
[35] R. Fernandez and R. Planzer, “On the capacity of bus transit systems,” Transp. Rev., vol. 22, no. 3,
pp. 267–293, 2002.
[36] Y. Lv, Y. Duan, W. Kang, Z. Li, and F. Y. Wang, “Traffic Flow Prediction with Big Data: A Deep Learning
Approach,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 865–873, 2015.
[37] X. Ma, H. Yu, Y. Wang, and Y. Wang, “Large-scale transportation network congestion evolution
prediction using deep learning theory,” PLoS One, vol. 10, no. 3, 2015.
[38] J. Y. Ahn, E. Ko, and E. Kim, “Predicting Spatiotemporal Traffic Flow Based on Support Vector
Regression and Bayesian Classifier,” 2015 IEEE Fifth Int. Conf. Big Data Cloud Comput., pp. 125–130,
2015.
[39] X. Ma, Z. Dai, Z. He, J. Ma, Y. Y. Wang, and Y. Y. Wang, “Learning traffic as images: A deep
convolutional neural network for large-scale transportation network speed prediction,” Sensors
(Switzerland), vol. 17, no. 4, p. 818, Apr. 2017.
[40] R. Al Mallah, A. Quintero, and B. Farooq, “Distributed Classification of Urban Congestion Using
VANET,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 9, pp. 2435–2442, Sep. 2017.
[41] Z. Li, P. Liu, C. Xu, H. Duan, and W. Wang, “Reinforcement Learning-Based Variable Speed Limit
Control Strategy to Reduce Traffic Congestion at Freeway Recurrent Bottlenecks,” IEEE Trans. Intell.
Transp. Syst., vol. 18, no. 11, pp. 3204–3217, 2017.
[42] R. Fu, Z. Zhang, and L. Li, “Using LSTM and GRU neural network methods for traffic flow prediction,”
Proc. - 2016 31st Youth Acad. Annu. Conf. Chinese Assoc. Autom. YAC 2016, pp. 324–328, 2017.
[43] G. Yang, Y. Wang, H. Yu, Y. Ren, and J. Xie, “Short-term traffic state prediction based on the
spatiotemporal features of critical road sections,” Sensors (Switzerland), vol. 18, no. 7, 2018.
[44] X. Cheng, R. Zhang, J. Zhou, and W. Xu, “DeepTransport: Learning Spatial-Temporal Dependency for
Traffic Condition Forecasting,” Proc. Int. Jt. Conf. Neural Networks, vol. 2018–July, pp. 1–8, 2018.
[45] S. V. Kumar and L. Vanajakshi, “Short-term traffic flow prediction using seasonal ARIMA model with
limited input data,” Eur. Transp. Res. Rev., vol. 7, no. 3, pp. 1–9, 2015.
[46] J. Guo, W. Huang, and B. M. Williams, “Adaptive Kalman filter approach for stochastic short-term
traffic flow rate prediction and uncertainty quantification,” Transp. Res. Part C Emerg. Technol., vol.
43, pp. 50–64, 2014.
[47] M. T. Asif et al., “Spatiotemporal patterns in large-scale traffic speed prediction,” IEEE Trans. Intell.
Transp. Syst., vol. 15, no. 2, pp. 794–804, 2014.
[48] J. Xin and S. Chen, “Bus Dwell Time Prediction Based on KNN,” Procedia Eng., vol. 137, pp. 283–288,
2016.
[49] P. Cai, Y. Wang, G. Lu, P. Chen, C. Ding, and J. Sun, “A spatiotemporal correlative k-nearest neighbor
model for short-term traffic multistep forecasting,” Transp. Res. Part C Emerg. Technol., vol. 62, pp.
21–34, 2016.
[50] D. Xia, B. Wang, H. Li, Y. Li, and Z. Zhang, “A distributed spatial-temporal weighted model on
MapReduce for short-term traffic flow forecasting,” Neurocomputing, vol. 179, pp. 246–263, 2016.
[51] J. Amita, S. S. Jain, and P. K. Garg, “Prediction of Bus Travel Time Using ANN: A Case Study in Delhi,”
Transp. Res. Procedia, vol. 17, no. December 2014, pp. 263–272, 2016.
[52] X. Ma, Z. Tao, Y. Y. Wang, H. Yu, and Y. Y. Wang, “Long short-term memory neural network for
traffic speed prediction using remote microwave sensor data,” Transp. Res. Part C Emerg. Technol.,
vol. 54, pp. 187–197, 2015.
[53] H. Yu, Z. Wu, S. Wang, Y. Wang, and X. Ma, “Spatiotemporal recurrent convolutional networks for
traffic prediction in transportation networks,” Sensors (Switzerland), vol. 17, no. 7, pp. 1–16, 2017.
[54] C.-M. Hsu, F.-L. Lian, and C.-M. Huang, “A Systematic Spatiotemporal Modeling Framework for
Characterizing Traffic Dynamics Using Hierarchical Gaussian Mixture Modeling and Entropy
Analysis,” IEEE Syst. J., vol. 8, no. 4, pp. 1126–1135, 2014.
[55] R. Yu, Y. Li, C. Shahabi, U. Demiryurek, and Y. Liu, “Deep Learning: A Generic Approach for Extreme
Condition Traffic Forecasting,” Proc. 2017 SIAM Int. Conf. Data Min., pp. 777–785, 2017.
[56] W. Fan and R. B. Machemehl, Characterizing Bus Transit Passenger Waiting Times, vol.
SWUTC/99/1, no. 1. 1999.
[57] R. Fernández, “Modelling public transport stops by microscopic simulation,” Transp. Res. Part C
Emerg. Technol., vol. 18, no. 6, pp. 856–868, 2010.
[58] National Research Council (U.S.) et al., “Guidelines for the design and location of Bus Stops,” Transit
Coop. Res. Progr., 1994.
[59] D. B. Hess, J. Brown, and D. Shoup, “Waiting for the bus,” J. Public Transp., vol. 7, no. 4, pp. 67–84, 2004.
[60] P. G. Furth and T. H. J. Muller, “Service Reliability and Hidden Waiting Time: Insights from Automatic
Vehicle Location Data,” Transp. Res. Board, vol. 1955, 2006.
[61] F. McLeod, “Estimating bus passenger waiting times from incomplete bus arrivals data,” J. Oper.
Res. Soc., vol. 58, no. 11, pp. 1518–1525, 2007.
[62] N. E. Myridis, “Probability, Random Processes, and Statistical Analysis, by H. Kobayashi, B.L. Mark
and W. Turin,” Contemp. Phys., vol. 53, no. 6, pp. 533–534, Nov. 2012.
[63] O. C. Ibe, O. A. Isijola, O. A. Isijola-Adakeja, and O. C. Ibe, “M/M/1 multiple vacation queueing
systems with differentiated vacations and vacation interruptions,” IEEE Access, vol. 2, pp. 1384–
1395, 2014.
[64] G. Xin and W. Wang, “Model Passengers’ Travel Time for Conventional Bus Stop,” J. Appl. Math.,
vol. 2014, pp. 1–9, Apr. 2014.
[65] D. A. Wu and H. Takagi, “M/G/1 queue with multiple working vacations,” Perform. Eval., vol. 63, no.
7, pp. 654–681, Jul. 2006.
[66] H. Yu, Z. Wu, D. Chen, and X. Ma, “Probabilistic Prediction of Bus Headway Using Relevance Vector
Machine Regression,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 7, pp. 1772–1781, Jul. 2017.
[67] Z. Yu, J. S. Wood, and V. V. Gayah, “Using survival models to estimate bus travel times and
associated uncertainties,” Transp. Res. Part C Emerg. Technol., vol. 74, pp. 366–382, 2017.
[68] H. Yu, D. Chen, Z. Wu, X. Ma, and Y. Wang, “Headway-based bus bunching prediction using transit
smart card data,” Transp. Res. Part C Emerg. Technol., vol. 72, pp. 45–59, 2016.
[69] B. A. Kumar, L. Vanajakshi, and S. C. Subramanian, “Bus travel time prediction using a time-space
discretization approach,” Transp. Res. Part C Emerg. Technol., vol. 79, pp. 308–332, 2017.
[70] M. Meng, A. Rau, and H. Mahardhika, “Public transport travel time perception: Effects of
socioeconomic characteristics, trip characteristics and facility usage,” Transp. Res. Part A Policy
Pract., no. xxxx, pp. 0–1, 2018.
[71] A. Gal, A. Mandelbaum, F. Schnitzler, A. Senderovich, and M. Weidlich, “Traveling time prediction in
scheduled transportation with journey segments,” Inf. Syst., vol. 64, pp. 266–280, 2017.
[72] A. Comi, A. Nuzzolo, S. Brinchi, and R. Verghini, “Bus travel time variability: Some experimental
evidences,” Transp. Res. Procedia, vol. 27, pp. 101–108, 2017.
[73] Y. Wang, Y. Zheng, and Y. Xue, “Travel time estimation of a path using sparse trajectories,” Proc.
20th ACM SIGKDD Int. Conf. Knowl. Discov. data Min. - KDD ’14, no. 5, pp. 25–34, 2014.
[74] B. Yang, C. Guo, and C. S. Jensen, “Travel cost inference from sparse, spatio-temporally correlated
time series using markov models,” Proc. VLDB Endow., vol. 6, no. 9, pp. 769–780, 2013.
[75] T. Liebig, N. Piatkowski, C. Bockermann, and K. Morik, “Dynamic route planning with real-time
traffic predictions,” Inf. Syst., vol. 64, pp. 258–265, 2017.
[76] L. Gasparini, E. Bouillet, F. Calabrese, O. Verscheure, B. O’Brien, and M. O’Donnell, “System and
analytics for continuously assessing transport systems from sparse and noisy observations: Case
study in Dublin,” IEEE Conf. Intell. Transp. Syst. Proceedings, ITSC, no. April 2015, pp. 1827–1832,
2011.
[77] B. Sun et al., “An improved k-nearest neighbours method for traffic time series imputation,” ©IEEE
CAC 2017, vol. 10, no. October, pp. 7346–7351, 2017.
[78] M. Moniruzzaman, H. Maoh, and W. Anderson, “Short-term prediction of border crossing time and
traffic volume for commercial trucks: A case study for the Ambassador Bridge,” Transp. Res. Part C
Emerg. Technol., vol. 63, pp. 182–194, 2016.
[79] Y. Duan et al., “An efficient realization of deep learning for traffic data imputation,” Transp. Res.
Part C Emerg. Technol., vol. 72, no. 10, pp. 168–181, 2016.
[80] O. D. Cardozo, J. C. García-Palomares, and J. Gutiérrez, “Application of geographically weighted
regression to the direct forecasting of transit ridership at station-level,” Appl. Geogr., vol. 34, no. 4,
pp. 548–558, 2012.
[81] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” in Neural
Information Processing Systems Conference, 2014, pp. 1–9.
[82] L. Deng and N. Jaitly, “Deep Discriminative and Generative Models for Pattern Recognition,” pp. 1–
26, 2015.
[83] G. B. Zhou, J. Wu, C. L. Zhang, and Z. H. Zhou, “Minimal gated unit for recurrent neural networks,”
Int. J. Autom. Comput., vol. 13, no. 3, pp. 226–234, 2016.
[84] V. Sze, Y. H. Chen, T. J. Yang, and J. S. Emer, “Efficient Processing of Deep Neural Networks: A
Tutorial and Survey,” Proc. IEEE, vol. 105, no. 12, pp. 2295–2329, 2017.
[85] K. Yin, W. Wang, X. Bruce Wang, and T. M. Adams, “Link travel time inference using entry/exit
information of trips on a network,” Transp. Res. Part B Methodol., vol. 80, pp. 303–321, 2015.
[86] F. N. Savas, “Forecast Comparison of Models Based on SARIMA and the Kalman Filter for Inflation,” 2013.
[87] P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting, 2nd ed., Springer Texts in
Statistics, 2003.
[88] V. N. Vapnik, Statistical learning theory. 1998.
[89] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Comput., vol. 14,
no. 8, pp. 1771–1800, 2002.
[90] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780,
1997.
[91] T. M. Units, “National Traffic Information Service DATEX II Service,” 2018.
[92] DfT, “Road traffic statistics,” pp. 1–13, 2014.
[93] Highways England, “Highways England – Data.gov.uk – Journey Time and Traffic Flow Data April
2015 onwards – User Guide,” no. April, pp. 1–14, 2015.
[94] A. Rahi and S. Ramalingam, “Empirical Formulation of Highway Traffic Flow Prediction Objective
Function Based on Network Topology,” Int. J. Adv. Res. Sci. Eng. Technol., vol. 5, no. November,
2018.
[95] D. Zhang and M. R. Kabuka, “Combining Weather Condition Data to Predict Traffic Flow: A GRU
Based Deep Learning Approach,” in 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure
Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data
Intelligence and Computing and Cyber Science and Technology
Congress(DASC/PiCom/DataCom/CyberSciTech), 2017, pp. 1216–1219.
[96] Y. Jia, J. Wu, and M. Xu, “Traffic flow prediction with rainfall impact using a deep learning method,”
J. Adv. Transp., vol. 2017, 2017.
[97] M. Shardlow, “An Analysis of Feature Selection Techniques,” Studentnet.Cs.Manchester.Ac.Uk, pp.
1–7, 2007.
[98] D. A. Dickey, Stationarity Issues in Time Series Models.
[99] W. Fan and Z. Gurmu, “Dynamic Travel Time Prediction Models for Buses Using Only GPS Data,” Int.
J. Transp. Sci. Technol., vol. 4, no. 4, pp. 353–366, 2015.
[100] Y. Liu and H. Wu, “Prediction of Road Traffic Congestion Based on Random Forest,” 2017 10th Int.
Symp. Comput. Intell. Des., pp. 361–364, 2017.
[101] S. Sun, C. Zhang, and G. Yu, “A Bayesian network approach to traffic flow forecasting,” Intell. Transp.
Syst. IEEE Trans., vol. 7, no. 1, pp. 124–132, 2006.
[102] A. Pascale and M. Nicoli, “Adaptive Bayesian network for traffic flow prediction,” 2011 IEEE Stat.
Signal Process. Work., pp. 177–180, 2011.
[103] R. O. Duda, P. E. Hart, and D. G. Stork, “Pattern classification (2nd edition),” in John Wiley & Sons,
Inc, no. 2nd ed., 2000.
[104] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Short-term traffic forecasting: Where we are and
where we’re going,” Transp. Res. Part C Emerg. Technol., vol. 43, pp. 3–19, 2014.
[105] W. Feng, “Analyses of Bus Travel Time Reliability and Transit Signal Priority at the Stop-To-Stop Segment
Level,” PDXScholar, 2014.
[106] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” 2014.
Appendix A : Hyperparameter Tuning Results
In this appendix, the hyperparameter search results for each individual model are presented for the different
forecasting horizons. Except for the Historical Average (HA) model, every model went through validation curve
routines and a grid search for its hyperparameters. The best scoring hyperparameters were used in the final
training. The grid searches were performed for each prediction horizon.
A.1 Experiment Case 1: Best Search Hyperparameters Used for Multiple Prediction Horizons
The best performing hyperparameter search results for each model are presented below.
Figure A.1 ARIMA Hyperparameter Grid Search for Short, Medium- and Long-Term Prediction Horizon.
The ARIMA model was then trained with the hyperparameter values exhibiting the least AIC in the grid search.
Figure A.2 RFR Hyperparameter Grid Search for Short Term Prediction Horizon.
Figure A.3 RFR Hyperparameter Grid Search for Medium Term Prediction Horizon.
Figure A.4 RFR Hyperparameter Grid Search for Long Term Prediction Horizon.
Table A.3 RFR Model Fit Final Best-Chosen Parameters.
Figure A.5 SVR Hyperparameter Grid Search for Short Term Prediction Horizon.
Figure A.6 SVR Hyperparameter Grid Search for Medium Term Prediction Horizon.
Figure A.7 SVR Hyperparameter Grid Search for Long Term Prediction Horizon.
Table A.4 SVR Model Fit Final Best-Chosen Parameters.
Figure A.8 FFBNN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with Respect to Optimizers and
Activation Functions.
Figure A.9 FFBNN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with Respect to Optimizers and
Number of Epochs.
Figure A.10 FFBNN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with Respect to No of Neurons
and Batch Sizes.
Figure A.11 FFBNN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with Respect to No of Neurons
and Epochs.
Figure A.12 FFBNN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon with Respect to Optimizers
and Activation Functions.
Figure A.13 FFBNN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon with Respect to Optimizers and
Number of Epochs.
Figure A.14 FFBNN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon with Respect to No of Neurons
and Batch Sizes.
Figure A.15 FFBNN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon with Respect to No of
Neurons and Epochs.
Figure A.16 FFBNN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to Optimizers and
Activation Functions.
Figure A.17 FFBNN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to Optimizers and
Number of Epochs.
Figure A.18 FFBNN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to No of Neurons and
Batch Sizes.
Figure A.19 FFBNN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to No of
Neurons and Epochs.
Table A.5 FFBNN Model Fit Final Best-Chosen Parameters.
Figure A.20 DBN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with Respect to the First RBM Layer
Iterations and First Layer RBMs Batch Size.
Figure A.21 DBN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with Respect to The Second RBM
Layer Iterations and Second Layer RBMs Batch Size.
Figure A.22 DBN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with Respect to The Second RBM
Layer Iterations and Second Layer RBMs Numbers.
Figure A.23 DBN Hyperparameter Grid Search, Mean Results for Short Term Prediction Horizon with Respect to the Number of
Neurons and Epochs.
Figure A.24 DBN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon with Respect to The First &
Second RBM Layer Iterations and RBM numbers and the Model Activation and Optimizer Functions.
Figure A.25 DBN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon with Respect to the Number of
Neurons and Second RBM Numbers.
Figure A.26 DBN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon with Respect to The Neural Layer
Batch Size and Number of Epochs.
Figure A.27 DBN Hyperparameter Grid Search, Mean Results for Medium Term Prediction Horizon with Respect to The RBM Layer
Batch Sizes.
Figure A.28 DBN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to The First & Second
RBM Layer Iterations and RBM numbers and the Model Activation and Optimizer Functions.
Figure A.29 DBN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to the Number of
Neurons and Second RBM Numbers.
Figure A.30 DBN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to The Neural Layer
Batch Size and Number of Epochs.
Figure A.31 DBN Hyperparameter Grid Search, Mean Results for Long Term Prediction Horizon with Respect to the RBM
Layer Batch Sizes.
Figure B.1 Data Flow and Operations in a Long Short-Term Memory (LSTM) Unit Structure, which Contains the Forget, Input,
Output, and Update Gates.
The difference between a feedforward and a recurrent neural network is that a trained feedforward network can
be trained further on as many data structures of one class as desired, but this will not necessarily alter the
classification accuracy for the other training data classes. An RNN, in contrast, ingests its own memory
(F(n−1)) together with the information in the sequence itself. The RNN uses this past information to perform
decisions that a feedforward network is incapable of performing.
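For reference, the basic recurrence behind this memory can be written in its standard textbook form (this is
not one of the thesis's numbered equations; W_x, W_h and b are learned parameters):

h_t = tanh(W_x · x_t + W_h · h_{t−1} + b)

so the hidden state h_t at each step mixes the current input x_t with the previous state h_{t−1}, which is the
past information referred to above.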
Why use GRU and LSTM as the modified RNN? Larger simple RNNs exhibit the gradient exploding phenomenon when
learning long sequences during gradient descent. LSTM and GRU solve this issue by controlling the flow of
information in the RNN using various gates [83]. The basic structures of an LSTM and a GRU are shown in figure
B.1. Only the GRU structure, with its mathematical model, is discussed here, although the final approach is to
compare the base RNN algorithm performance in our proposed model against the LSTMs and GRUs and, if possible,
to propose a change to the gated recurrent units in the end. A gated recurrent unit (GRU) resembles the LSTM in
structure and working, except that it does not contain the output gate, which means the content of the memory
cell is written to the output at every time step.
A normal LSTM data flow model (refer to figure B.1) mimics the operations of the forget gate (ft), output gate
(ot), input gate (it) and hidden memory update gate (ct) by using mathematical operations built from the
sigmoid function, tanh, and element-wise vector multiplications and additions. The LSTM unit inherits an
additional sequential input (ct) for better sequence memory keeping, but the whole learning process becomes
complex over time due to too many parameters, albeit with increased performance over a simple RNN. The GRU, on
the other hand, presented after the LSTM, makes its structure less complex by eliminating the need to pass the
additional sequential data value (ct); instead, it determines the hidden input (ht) update through an update
gate (Wz) and a reset gate (Wr), thus eliminating the need for the output gate. The reset gate resembles in
functioning pretty much that of the forget gate in the LSTM. The lower overall computational complexity, with
fewer parameters involved compared to the LSTM while still performing better than the LSTM, makes it a
favourable model. Various other versions of the GRU have been presented in the literature, but only the base
models of both LSTM and GRU are presented here. The mathematical model representing the GRU data flow through a
single unit, shown as arrows and operations in figure 4.10, is presented in equations (5)-(9).
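For reference, one standard formulation of the GRU update equations (a common textbook form consistent with the
gating description above, not necessarily the thesis's exact equations (5)-(9); W_z, W_r and W_h are learned
weight matrices and ⊗ denotes element-wise multiplication) is:

z_t = σ(W_z · [h_{t−1}, x_t])    (update gate)
r_t = σ(W_r · [h_{t−1}, x_t])    (reset gate)
h̃_t = tanh(W_h · [r_t ⊗ h_{t−1}, x_t])    (candidate state)
h_t = (1 − z_t) ⊗ h_{t−1} + z_t ⊗ h̃_t    (hidden state update)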
After the GRU-NN model is developed, the next step is to train the model on the individual links' flow rate
dataset that was already filtered in the data preliminary analysis (refer to section 3.6). According to the
overall proposed model (refer to the section 5.7 methodology framework), different instances of the GRU-NN
model are trained separately on the time series flow rate dataset (refer to table 5.5) for all the links, with
inflows and outflows trained as separate data models. Utilising the flow rate probability prediction for the
link whose unidirectional flow rate data is used for the training, the idea is to identify the bottleneck link
at each node by considering the flow rate likelihood probability of all the other links on a single node. The
bottleneck link is the one that limits the flow of traffic when compared to the other links in a node. The
bottleneck link identification is done relative to all the links in a node; this is because every node has
different flow rates due to a number of factors, the main one being its location in the road network.
Gaussian Mixture Model (GMM) Distribution Estimation on Historical Links Flow Rate Data:
The predicted flow rate probability P(f_{i,t})_L1 gives the probability for link 1. The effect of the other
links in a node is taken care of by estimating their likelihood flow rate probabilities, one at a time, for all
the instances. Now links can be regarded as cause links and effect links; a somewhat similar methodology has
been proposed in [101][102], which describes a Bayesian network as a Gaussian mixture model (GMM). We opt to
use the GMM model for each individual link flow rate estimation using past data for different time instances,
using the predefined parameters as given in equation (9).
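A minimal sketch of this per-link estimation step, assuming scikit-learn's GaussianMixture (which fits the
mixture by EM); the array shapes, component count and placeholder data are illustrative, not the thesis
implementation:

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_link_gmm(flow_rates: np.ndarray, m_components: int = 3) -> GaussianMixture:
    # Fit an M-component Gaussian mixture to one link's historical flow rates (EM).
    return GaussianMixture(n_components=m_components).fit(flow_rates.reshape(-1, 1))

def flow_likelihood(gmm: GaussianMixture, flow: float) -> float:
    # Likelihood of observing a given flow rate under the fitted mixture.
    return float(np.exp(gmm.score_samples([[flow]]))[0])

gmm = fit_link_gmm(np.random.gamma(5.0, 40.0, size=3252))  # placeholder historical data
print(flow_likelihood(gmm, 180.0))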
Appendix C : Future Works
As a general understanding, when a flow rate bottleneck is created at a node position, the resulting congestion
propagates patch by patch, right from the point of origin of the constriction and in the opposite direction of
the traffic flow. Since the RNNGB model considers the direction of the congestion waves, bottleneck
identification becomes easy for each direction of traffic flow. Figure C.2 explains the basic concept of the
bound flow Q_max (bottleneck point) to be found for each individual directional link using the RNNGB model. As
per the general approach shown in figure C.2, a traffic flow link is considered congested if it falls below
Q_max, into the stable portion of the traffic density with ever decreasing traffic flux. The location of Q_max
and D_max will differ for each link at different interval time aggregations for each node.
Figure C.2 Traffic Flux versus Traffic Density generalised observation with optimum traffic flux point Q_max differentiating the
unstable and stable unidirectional flow for a single link in any node [99].
Figure C.3 Flow Rate Space-time Diagram for a single link in a node considering one direction only.
Considering the space-time critical congestion flow (Q_cong) graph for a single link, the corresponding
space-time mean congestion link density (ρ_cong) is given by equation C.1, and the corresponding space-time
mean congestion speed is

v_cong = q_cong / ρ_cong    (C.2)
where d is the average distance that a user travels through the considered study area. The dependence on a
bottleneck and the effects of a bottleneck propagate from the links to the nodes and finally to the patches.
Equation C.4 is the general Gaussian mixture distribution estimation for a function, where M is the number of
components, β_m(·|μ_m, C_m) is the m-th Gaussian distribution term with mixture weight w_m, and μ_m, C_m
represent the vector of mean values and the covariance matrix respectively. These parameters are estimated
using the expectation-maximisation (EM) algorithm. The cause links are the ones used for the flow estimations
of the affected ones (effect links) under consideration, and the final output represents the flow states for
all the links in a node. One link can be a part of two or more effect links depending upon the network
structure.

p(f) = Σ_{m=1..M} w_m β_m(f | μ_m, C_m)    (C.4)
The basic Bayesian theory assumptions suggest that the flow rate data gathered for all the links are
independent of each other at any time interval, i.e. P(A) is independent of P(B) and vice versa. If each link
instance is considered an independent event, then the marginal conditional distribution for a link L is given
as

p(f_{i,t} | f_links(i,t)) = p(f_{i,t}) / p(f_links(i,t))

because of the overall joint probability density or distribution p(f_{i,t}) for all the links in a node.
The conditional probabilities cannot usually be considered commutative, i.e. P(A|B) ≠ P(B|A). The conditional
probability relation is given by the naive Bayes theorem as

P(A|B) = P(A ∩ B) / P(B)  ↔  P(A|B) = P(B|A) P(A) / P(B)

so that P(A|B) = P(B|A) if and only if P(A) ≈ P(B), i.e. the probability likelihoods of the flow rates for the
links being considered are the same. Once we have calculated the likelihood probability using the GMM of
equation C.4 for the effect links, each for different time intervals, we then estimate the conditional
distribution using equation C.5, where f_links(i,t), being the prior probability, is obtained from the GRU-NN
prediction model for each link and normalised over all the links in a node. Equation C.5 is the joint
distribution or probability density of all the conditional probabilities of each link on a given node, provided
the normalised probabilities of all the cause links are taken simultaneously.
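A hedged sketch of this combination step (function and variable names are illustrative, not taken from the
thesis code): the GMM likelihoods of the cause links are combined with the GRU-NN prior for each link and then
normalised across the node.

import numpy as np

def node_flow_states(priors: np.ndarray, likelihoods: np.ndarray) -> np.ndarray:
    # Element-wise Bayes numerator per link: GRU-NN prior times GMM likelihood,
    # normalised so the states of all links in the node sum to one.
    posterior = priors * likelihoods
    return posterior / posterior.sum()

print(node_flow_states(np.array([0.5, 0.3, 0.2]), np.array([0.2, 0.5, 0.3])))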
1,2* Doctoral Student, 3 Senior Lecturer
1,3 Centre for Engineering Research, School of Engineering and Technology, University of Hertfordshire (UH), Hatfield,
Hertfordshire, United Kingdom, AL10 9AB.
2 Research Assistant at University Bus Limited (UNO), Hatfield, Hertfordshire, United Kingdom, AL10 9BS.
ABSTRACT: Accurate highway road predictions are necessary for timely decision making by the transport authorities.
In this paper, we propose a traffic flow objective function for a highway road prediction model. The bi-directional flow
function of individual roads is reported considering the net inflows and outflows by a topological breakdown of the
highway network. Further, we optimise and compare the proposed objective function for constraints involved using
stacked long short-term memory (LSTM) based recurrent neural network machine learning model considering different
loss functions and training optimisation strategies. Finally, we report the best fitting machine learning model
parameters for the proposed flow objective function for better prediction accuracy.
KEY WORDS: Intelligent Transportation Systems, Machine Learning, LSTM, Flow Estimation, Hyper Parameter
Optimisation.
I.INTRODUCTION
With the understanding of how intelligent transport systems (ITS) operate in a modern city, their reliance on an
accurately predicted regional traffic flow and congestions changes have become inevitable. This gives rise to the quest
for finding the better formula to forecast traffic parameters for as close as possible to the real world observed
parameters [104]. But for ITS and transport operators to rely on traffic parametric forecasts, systems must be reliable,
and this is only possible when the forecasting systems represent the traffic network on a smallest unit as offered by the
network which consists of junction and the inter road links. Based on this criterion we set out the flow of this paper.
We report the unique significance of the proposed system in section II; section III sheds detailed light on
what has already been done on the relevant subject in response to the advancements in machine learning
techniques and traffic flow predictions. Section IV lists the proposed strategy, with subsequent subsections
detailing the dataset and pre-processing involved, along with the system design and the performance metrics
considered. Sections V and VI deal with the experimental results and the conclusion with future suggestions,
respectively.
II. SIGNIFICANCE OF THE SYSTEM
The paper mainly focuses on predicting real traffic flow by retaining the traffic network topology in the form
of a dynamic objective function and using a data-driven, time series, spatiotemporal machine learning model to
optimise it for more accurate individual road flow predictions across the highway network.
Traffic flow forecasting has been in research discussions for quite some time and can be broadly classified
into two distinct categories, as follows:
Parametric: Conventional approaches that use statistical methods for time series forecasting are normally
termed parametric model approaches. Prior knowledge of the data distribution is assumed in parametric
approaches. The most notable of these approaches are the auto-regressive integrated moving average (ARIMA) and
its variant, the seasonal auto-regressive integrated moving average (SARIMA) [45], Kalman filters [46] and
exponential smoothing [47]. The problem with most of these parametric approaches is that they can effectively
be employed for only one time-interval-ahead prediction and cannot predict well enough due to the stochastic
and nonlinear nature of the traffic data. They better suit short-term forecasts only, which are well biased
towards the most recent observations in the data; this makes the parametric approaches incapable of handling
real-world trends.
Non-Parametric: In recent years, machine learning (ML) strategy based traffic parameter prediction algorithms
have been utilised [48]. These data-driven approaches are also termed non-parametric approaches. The most
commonly tested non-parametric approaches for spatiotemporal traffic forecasting include the k-nearest
neighbours (KNN) [49][50] and support vector regression (SVR) [47]. However, these shallow ML algorithms work
in a supervised manner, which makes their performance dependent upon the dataset's manual feature selection
criteria.
With the advancement in ML algorithms, a more sophisticated dense supervised learning approach has been applied
for traffic predictions by using back-propagation techniques in artificially connected neural networks (ANN)
[25][51]. Although the ANN outperforms conventional linear parametric models, it struggles with simple time
series data learning and with finding the global minimum. Recently, deep recurrent neural networks (RNN) have
shown great promise for dynamic sequential modelling, especially in the field of speech recognition [81][82].
Simple RNNs, however, suffer from gradient explosion for extra-long sequence training, which results in
information loss and reduced performance [83]. Fu et al. [42] used the RNN variants called long short-term
memory (LSTM) [84] and gated recurrent units (GRU) for traffic forecasting because of their ability to retain
and pass on the information that is necessary and forget what is redundant using the output and forget gates.
Haiyang Yu et al. proposed spatiotemporal traffic feature learning utilising a deep convolutional LSTM network,
where the LSTM network learns the temporally dependent patterns in the data. The LSTM memory blocks make the
vanishing gradient problem fade off during back-propagation error training and enable accurate prediction for
much longer sequences [52]. For this very reason we employ LSTM in our proposed methodology to learn the
temporal features, whereas to keep the training and the model architecture simple we incorporate a feed forward
connected ANN layer at the end for the spatial feature learning, and then we train the whole architecture in a
back-propagation manner. This is further discussed in the system design section.
IV. METHODOLOGY
In this section, we represent a traffic model as consisting of a set of nodes and input-output links. The
traffic flow of a set of input links will have an influence on the traffic flow of the output links. This model
acts as a black box interpreting and manipulating the system inputs. A system is governed by a set of rules
associated with a combination of the inputs' fixed and dynamic states mapped to outputs and represented in
mathematical terms [105]. Such a system, modelled as an objective function consisting of variable parameters,
is shown in figure 1.
Figure 1. A general function definition.
A) Definitions
We consider a highway junction spatially, with inflows and outflows, to be an independent system and designate
each junction system as a node denoted by N. The links L serve as both the inputs and the outputs of a node
with bidirectional highway links. As an example, consider a single sample node of an actual highway junction in
Hertfordshire, UK, shown in figure 2.a; its equivalent representation using the nodes and links configuration
is given in figure 2.b. Further, the bidirectional arrows indicate the bidirectional traffic flow of the node.
Here outflow refers to traffic flow moving away from the node and inflow to traffic moving into the node.
Figure 2. a) Highway junction under consideration (Google Maps, 2018). b) Node illustration retaining junction original topology.
To predict the outflow of traffic for each individual link on a single node, all the incoming link flows are to
be considered in the output flow forecast objective function. The outflow of a node's link is determined by the
summation of the inflows of the individual links of the node. Figure 2.b shows that the output flow associated
with a link is dependent on the inflows of every other link in the same node. The estimated traffic outflow for
link L1 is given by equation (1), showing the dependency of the objective function on the inflows associated
with the rest of the links of the same node. Equation (2) is a more general mathematical representation of the
objective function, which describes the conservation of flow within a node system, where x is the link for
which the flow is being calculated and n is the total number of links on the same N. This makes the objective
function retain the correlations in the flow characteristics for each individual node link when the single node
is considered as the basic unit level in the traffic network.
L1_out = f(L2_in + L3_in + L1_in),  { L1, L2, L3 ∈ (same N) }    (1)

L(x)_out = f( L(n − x)_in ),  { x, n ∈ (same N) and x < n }    (2)
Figure 3. a) Extension of traffic network at node i showing three links and their associated inflows and outflows. b) A simple traffic
network at a node i with 3 links. It shows the distribution of incoming traffic dispersed as outgoing traffic at the node.
With reference to figure 2.b, let us consider a node i consisting of a set of links L(j) that are associated
with bi-directional traffic flow F_i. Each link L(j) at the node i is associated with a traffic inflow
F_i(L(j)_in) and a corresponding outflow indicated by F_i(L(j)_out). The function for the traffic flow of links
L(j) considers the fact that the traffic inflow of every link contributes partially (to a certain degree) to
the outflow of each of the other links at the same node. In other words, the traffic outflow of a link is a
function of the traffic inflow of all the other links, including its own, at the node. This notion is modelled
as follows:
F_i(L(j)_out) = Σ_{j=1..n} [ F′(L(j)_in) / F_i(L(j)_in) ]    (3)

where F′(L(j)_in) / F_i(L(j)_in) represents the fraction of the traffic inflow that contributes to the outflow
of a specific link.
As an example, consider figure 3.b, in which the circle represents a node i with three links. The thick blue
arrow indicates the traffic inflow of link L(1)_in, which gets dispersed into the node and flows through the
rest of the links, contributing to the outflows of the rest of the links including itself. This dispersion is
indicated by thin blue arrows in figure 3.b. The outflow of each of the links is shown in green arrows. The
symbol ∃_{1−j} indicates that part of the inflow of link F(L(1)_in) contributes to the outflow of the links
L(j)_out. The sum of the traffic flow of F(L(1)_in) inside the node, represented by thin blue arrows, is equal
to the traffic inflow of L(1), represented by a thick blue arrow, at a time instant. This applies to the
traffic inflow of all other links at the node as shown in figure 3.b.
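As an illustrative numeric check of this dispersion idea (the flow values and fractions below are made up, not
taken from the dataset):

# The outflow of link L1 is assembled from fractions of every link's inflow
# at the same node, mirroring the dispersion arrows of figure 3.b.
inflows = {"L1": 120.0, "L2": 200.0, "L3": 80.0}      # vehicles per interval (made up)
frac_to_L1 = {"L1": 0.10, "L2": 0.45, "L3": 0.30}     # share of each inflow leaving via L1
L1_out = sum(frac_to_L1[j] * inflows[j] for j in inflows)
print(L1_out)  # 12.0 + 90.0 + 24.0 = 126.0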
We will show, in sub-section E, the use of the above model in the proposed system design.
C) Dataset Description
We perform all the experiments on the traffic flow dataset for the chosen Hatfield, Hertfordshire, UK area
junction shown in figure 2.a. The dataset is obtained from the Gov.uk open datasets, which contain public
sector information licensed under the Open Government Licence v3.0 [93]. The used dataset contains traffic flow
information in two-hour aggregated intervals from the start of 1st April 2015 to the end of 31st December 2015
for the highway roads. The first three and last three raw dataset plots for the links are shown in figure 4.
The data is collected as the number of passing vehicles using the loop detectors installed on both ends of the
selected highway links.
D) Dataset Pre-processing
• Data Cleaning: As with all real-world gathered data, the raw links flow dataset had approximately 15% missing
values. Due to the ongoing trends comprising seasonality and other environmental factors, it is very important
to retain the inherent trends in the traffic data. So, these values are imputed using the backward fill
approach: the backward filling approach takes the value logged at the next interval and imputes it for the
previous interval. This imputing process continues until all the missing imputations are done, through which
all the inconsistencies are resolved.
• Data Integration: A total of 3252 data samples are used for each considered link. Using equation (1), they
are reshaped to form an array of dimensions 3252x4, where 4 corresponds to the links considered as given by
equation (1). The sample plot from the dataset containing the newly shaped L1_in and the two outflows, i.e.
F(L(1)_out, L(2)_out), for the first and last three days of the gathered dataset is shown with twelve two-hour
intervals in figure 4.
• Data Transformation: After the data aggregation and reshaping is done, the data is further generalised and
normalised by scaling to the minimum and maximum values within each data column, i.e. intra-flow links
normalisation. Further, the reshaped dataset is lagged by one time interval to make it suitable for supervised
training.
• Data Reduction: With the aim of generating the training and validation sets to train and validate the ML
model, we consider 20% of the original dataset as the validation set. Since it is a time series of consecutive
intervals, the order of the training and validation ensemble is very important. Therefore, we consider the
tail-end 20% for the validation of the trained model after each training iteration.
• Data Discretization: Among the originally reported dataset there are twelve intervals in a twenty-four-hour
time window; we consider these twelve intervals, which are two hours apart each, to make the ML model training
not only fast but also a more generalised representation of the sequential data throughout the day. A sketch of
this pre-processing pipeline is given after this list.
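A minimal sketch of the pre-processing steps above, assuming pandas and a DataFrame with one column per link
flow; the column layout, split fraction and function name are illustrative rather than the thesis code:

import pandas as pd

def preprocess(df: pd.DataFrame, val_frac: float = 0.2):
    # Data cleaning: backward-fill missing flow values from the next logged interval.
    df = df.bfill()
    # Data transformation: per-column min-max scaling (intra-flow links normalisation).
    df = (df - df.min()) / (df.max() - df.min())
    # Lag by one time interval so each row's inputs predict the next interval's flows.
    X, y = df.shift(1).dropna(), df.iloc[1:]
    # Data reduction: keep the tail-end 20% as the validation set (order preserved).
    split = int(len(X) * (1 - val_frac))
    return X.iloc[:split], y.iloc[:split], X.iloc[split:], y.iloc[split:]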
E) System Design
In this section the machine learning model used to fit the pre-processed data is discussed. We discuss the
architecture of the LSTM and the proposed hybrid architecture based on the combination of the LSTM and NN
architectures.
• Feed Forward-Long Short-Term Memory (LSTM): As the first part, we consider the recurrent neural network (RNN)
variants called long short-term memory (LSTM) units, iterated in a feed forward manner, as the main time series
data learners of our ML architecture, along with conventional connected feed forward neural networks (NN). The
hybrid LSTM-NN architecture is shown in figure 6. This part of the architecture consists of two layers of LSTM
units and one layer of densely connected NN, with an activation function between each layer. The LSTM model is
defined [12] by the following set of equations:
Figure 5. Structural data flow in a Long Short-Term Memory (LSTM) unit [11].
f_t = σ(w_f · [h_{t−1}, x_t] + b_f)    (4)
i_t = σ(w_i · [h_{t−1}, x_t] + b_i)    (5)
C̄_t = tanh(w_c · [h_{t−1}, x_t] + b_c)    (6)
C_t = f_t ⊗ C_{t−1} + i_t ⊗ C̄_t    (7)
o_t = σ(w_o · [h_{t−1}, x_t] + b_o)    (8)
h_t = o_t ⊗ tanh(C_t)    (9)
The LSTM's general purpose can be defined as the estimation of the conditional probability
p(y_1, y_2, …, y_T′ | x_1, x_2, …, x_T), given that (x_1, x_2, …, x_T) is an input sequence and
(y_1, y_2, …, y_T′) is the corresponding output sequence; the lengths T′ and T may differ. The deep LSTM
computes the conditional probability by first computing the fixed dimensional representation v of the input
sequence from the last hidden memory state of the LSTM layer [106]. The hidden state h_t for each individual
LSTM unit is calculated as given by equation (9). Accordingly, for the proposed objective function in (3), the
standard LSTM network for the i-th node, with internal hidden states v of the corresponding inputs
(Σ_{j=1..n} F_i(L(j)_in)_1, Σ_{j=1..n} F_i(L(j)_in)_2, …, Σ_{j=1..n} F_i(L(j)_in)_T), is given by equation (10):

p( F_i(L(k)_out)_1, F_i(L(k)_out)_2, …, F_i(L(k)_out)_T′ | Σ_{j=1..n} F_i(L(j)_in)_1,
   Σ_{j=1..n} F_i(L(j)_in)_2, …, Σ_{j=1..n} F_i(L(j)_in)_T )    (10)

H_j = f(I_j);  I_j = Σ_{j=1..n} W_{kj} X_j    (12)
O_k = f(I_k);  I_k = Σ_{j=1..T′} W_{kj} H_j    (13)
Substituting (11) and (12) in (13), we get:
The activation functions g tested within the scope of this paper are given in table 1, along with their
mathematical representations. In our model, the pre-processed data of shape (2602, 1, 4), with three inflows
and one outflow according to equation (1), is fed into the model, and the respective link inflow and outflow
values for the next time interval are generated through the LSTM-NN. The dimensional values in (2602, 1, 4)
represent the number of samples, the batch number and the variable features (corresponding link values),
respectively. For each model iteration a separate validation set of similar shape (650, 1, 4) to the training
data is used for the performance analysis measures. The final model parameters, including the number of LSTMs
and NNs chosen along with the activation functions, are further discussed in the experiments section.
Table 1. Activation Function (g) and Mathematical Implementation:
6. sigmoid: σ(x) = 1 / (1 + e^(−x)); σ(x) ∈ [0, 1]
7. softmax: σ(x)_j = e^(x_j) / Σ_{k=1..K} e^(x_k), j = 1, 2, …, K; σ(x)_j ∈ [0, 1]
8. tanh: tanh(x) = (1 − e^(−2x)) / (1 + e^(−2x)); tanh(x) ∈ [−1, +1]
9. relu: f(x) = max(0, x); f(x) ∈ [0, ∞)
Nomenclature: softmax represents the normalised exponential function (for multiclass logistic flow values in our case) that maps a
K-dimensional vector x to values in the range [0, 1] that all add up to 1.
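As a hedged illustration of this part of the design (a sketch assuming the Keras API, not the authors' original
code; the layer sizes shown follow the (45, 20, 20) configuration discussed in the experiments):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_lstm_nn(n1=45, n2=20, n3=20, g=("sigmoid", "sigmoid", "sigmoid")):
    # Two LSTM layers and one densely connected NN layer, as in figure 6; the
    # input shape (1, 4) matches one batch step of the (samples, 1, 4) data.
    model = Sequential([
        LSTM(n1, activation=g[0], return_sequences=True, input_shape=(1, 4)),
        LSTM(n2, activation=g[1]),
        Dense(n3, activation=g[2]),
        Dense(4),  # next-interval flow values for the four links of equation (1)
    ])
    model.compile(loss="mae", optimizer="rmsprop")
    return model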
• Feed Backward-Loss and Optimiser Function: The second part of the system design considers the optimisation
function and the loss function used while updating the feed forward model weights before the next iteration.
The iterative back-propagation allows the LSTM architecture to learn the temporal correlations amongst the
intra-node links, whereas the connected NN layer helps learn the spatial dependencies. The sets of optimisation
strategies and loss functions considered in the experiments are given in tables 2 and 3, respectively, and
their relative performances are evaluated in the process.
F) Performance Metrics
For the performance measure of the proposed model, we consider the root mean square error (RMSE), as widely
used by the researcher community in the field of machine learning. We consider the validation RMSE as our major
model performance indicator. The formula given in equation (15) is the mathematical representation of the RMSE.
RMSE = ( (1/N) Σ_{n=1..N} ( |ŷ_n − y_n| )² )^(1/2)    (15)

where, in equation (15), N represents the number of validation samples used for the error calculation, ŷ_n is
the predicted output and y_n is the original value observed by the model.
V. EXPERIMENTAL RESULTS
In this section we show how the hyper-parameters of the proposed LSTM-NN network are optimised based on the
network's performance using the Hatfield node junction data. The following notation is observed: let
g → activation function, X → optimisation function, J → loss function, n → number of nodes in a hidden layer;
J_opt and X_opt are the optimised output values of J and X respectively.
Hyper-parameter optimisation is carried out as a three-stage process, whereby we first determine the optimal
values of J and X using Algorithm A. These optimal parameters are in turn used by Algorithm B to determine the
optimal value of n. It is worth noting that n takes only 2 sets of values in Algorithm A to determine J_opt and
X_opt, whereas in principle several other combinations exist; they are not considered at this point and are
instead optimised in the second stage using Algorithm B.
Firstly, we compare the performance measure by changing the loss functions J along with the optimisation
techniques X. We compare nine different loss functions for our data model, including the most common ones used
in data regression problems, such as mean square error, mean absolute error, mean squared logarithmic error,
Poisson, the probability based logarithmic hyperbolic cosine, cosine proximity, hinge and, lastly, the
cross-entropy based Kullback-Leibler divergence. The best performing loss function J_opt is declared based on
the minimum RMSE error.
The hybrid LSTM-NN model training is carried out with two different layer configurations of n = (35, 5, 5) and
(45, 20, 20) at different instances, each with three different optimisers. Each layer configuration corresponds
to (LSTM-layer1, LSTM-layer2, NN-layer), respectively. For each of them the activation function g for the
respective layers was kept constant, i.e. (sigmoid, sigmoid, sigmoid), for the loss function versus optimiser
function performance test. The optimisers we used range from the simple stochastic gradient descent (SGD) to
the adaptive gradient algorithm (Adagrad) and the running average-based root mean squared propagated gradient
descent (RMSprop). The performance bar graphs in figure 7 show that the minimum validation RMSE is achieved by
RMSprop among all three optimisers, which is expected in our case, as the learning rate of this optimiser
adapts to the running average of the time series rather than simply considering the previous time interval. The
least RMSE is achieved by the (45, 20, 20) layer configuration. The training loss, accuracy and validation RMSE
for each of the instances are shown in figure 7. All three metrics reflect one and the same result.
Algorithm A: Hyper-parameter Optimisation – Loss (J_opt) and Optimisation Functions (X_opt)
function Hyper parameter Optimisation ( J, X, g, n, J_opt, X_opt )
1. Input: Performance evaluate loss functions J (dimensionality = 9)
2. Compute RMSE
3. Output: J_opt
Algorithm B: Hyper-parameter Optimisation – Number of Hidden Layer Nodes, n_i of LSTM Layers
function Hyper parameter Optimisation ( J_opt, X_opt, n_opt )
1. Input: Performance evaluate the number of hidden layer nodes, n (dimensionality = 20)
2. Compute RMSE
3. Output: n_opt
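A minimal sketch of Algorithm A as a grid search (Algorithm B follows the same pattern over the layer-size
sets). It assumes the hypothetical build_lstm_nn helper sketched in the system design section and arrays shaped
as described there; the epoch and batch settings are illustrative:

import numpy as np
# build_lstm_nn is the hypothetical constructor sketched in the system design section.

def algorithm_a(X_tr, y_tr, X_val, y_val, losses, optimisers, n=(45, 20, 20)):
    # Stage one: grid over the nine loss functions J and the optimisers X with a
    # fixed layer configuration n and fixed activations g.
    best, best_rmse = None, np.inf
    for J in losses:
        for X in optimisers:            # X here is the optimiser, per the paper's notation
            m = build_lstm_nn(*n)
            m.compile(loss=J, optimizer=X)  # recompile with the candidate pair
            m.fit(X_tr, y_tr, epochs=50, batch_size=32, verbose=0)
            rmse = float(np.sqrt(np.mean((m.predict(X_val, verbose=0) - y_val) ** 2)))
            if rmse < best_rmse:
                best, best_rmse = (J, X), rmse
    return best  # (J_opt, X_opt), selected on minimum validation RMSE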
In the third stage, we analyse the architecture based on the choice of the different layer activation functions
g. From Algorithms A and B, we take the determined optimum performing RMSprop optimiser (X_opt), MAE as the
loss function (J_opt) and n_opt = (65, 65, 5) as the chosen final LSTM layer configuration, because the
(65, 65, 5) combination exhibits the lowest mean validation RMSE of all the configurations tested, as shown in
figure 8. Algorithm C tests all the combinations of layer activation functions from table 1. We find that the
least validation RMSE of 0.1398 is exhibited by the relu-tanh-relu configuration, as shown in figure 9. The
experimental result heat map in figure 8 shows that tanh generalises the objective function well compared to
softmax and sigmoid. This is because tanh, as given in table 1, has a range of [−1, +1], and its first
derivative for negative inputs is not constant, unlike the sigmoid and softmax activation functions.
Algorithm C: Hyper-parameter Optimisation – Layer Activation Functions (g_opt)
function Hyper parameter Optimisation ( J, X, n, J_opt, X_opt, g_opt )
VI. CONCLUSION
To forecast the traffic flow in transportation networks, several methods have been proposed by many
researchers. From the survey it is seen that flow prediction using conventional statistical and the latest
machine learning techniques, from the simple KNN to the latest deep ANNs and time series LSTMs, is highly
effective at determining the spatiotemporal features that are crucial to traffic flow forecasting. In this
paper we showed the spatiotemporal flow data remodelling in the form of a topological objective function and
exhibited the performance comparison of the LSTM-NN under architecture parameter tunings. The LSTM and the ANN
learn the temporal and spatial features respectively. The network is simple and fast enough for online data
learning with dedicated geographical junction weight matrices for future training models. Future
recommendations might include combining local weather and incident data with the objective function.
REFERENCES
[1] P. Domingos, “A few useful things to know about machine learning,” Commun. ACM, vol. 55, no. 10, p. 78, 2012.
[2] R. Gupta and C. Pathak, “A machine learning framework for predicting purchase by online customers based on dynamic pricing,”
Procedia Comput. Sci., vol. 36, no. C, pp. 599–605, 2014.
[3] R. C. Staudemeyer and C. W. Omlin, “Extracting salient features for network intrusion detection using machine learning
methods,” South African Comput. J., vol. 52, no. July, pp. 82–96, 2014.
[4] M. Rabbani, R. Khoshkangini, H. S. Nagendraswamy, and M. Conti, “Hand Drawn Optical Circuit Recognition,” Procedia Comput.
Sci., vol. 84, pp. 41–48, 2016.
[5] B. van Riessen, R. R. Negenborn, and R. Dekker, “Real-time container transport planning with decision trees based on offline
obtained optimal solutions,” Decis. Support Syst., vol. 89, pp. 1–16, 2016.
[6] A. Verikas, A. Gelzinis, and M. Bacauskiene, “Mining data with random forests: A survey and results of new tests,” Pattern
Recognit., vol. 44, no. 2, pp. 330–349, 2011.
[7] M. Schuh, J. Sheppard, S. Strasser, R. Angryk, and C. Izurieta, “An IEEE standards-based visualization tool for knowledge discovery
in maintenance event sequences,” IEEE Aerosp. Electron. Syst. Mag., vol. 28, no. 7, pp. 30–39, 2013.
[8] A. S. Ahmad et al., “A review on applications of ANN and SVM for building electrical energy consumption forecasting,” Renew.
Sustain. Energy Rev., vol. 33, pp. 102–109, 2014.
[9] A. Anwar, T. Nagel, and C. Ratti, “Traffic origins: A simple visualization technique to support traffic incident analysis,” IEEE Pacific
Vis. Symp., pp. 316–319, Mar. 2014.
[10] J. W. C. van Lint, “Reliable Travel Time Prediction for Freeways,” TU Delft, 2004.
[11] A. Abadi, T. Rajabioun, and P. A. Ioannou, “Traffic Flow Prediction for Road Transportation Networks With Limited Traffic Data,”
IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 653–662, 2015.
[12] C. Hsu and F. Lian, “A Case Study on Highway Flow Model Using 2-D Gaussian Mixture Modeling,” in Proceedings of the 2007 IEEE
Intelligent Transportation Systems Conference, 2007, pp. 790–794.
[13] S. Oh, Y. J. Byon, K. Jang, and H. Yeo, “Short-term Travel-time Prediction on Highway: A Review of the Data-driven Approach,”
Transp. Rev., vol. 35, no. 1, pp. 4–32, 2015.
[14] C. Goves, R. North, R. Johnston, and G. Fletcher, “Short Term Traffic Prediction on the UK Motorway Network Using Neural
Networks,” Transp. Res. Procedia, vol. 13, pp. 184–195, 2016.
[15] K. Kumar, M. Parida, and V. K. Katiyar, “Short term traffic flow prediction in heterogeneous condition using artificial neural
network,” Transport, vol. 30, no. 4, pp. 397–405, 2015.
[16] Z. Abdelhafid, F. Harrou, and Y. Sun, “An Efficient Statistical-based Approach for Road Traffic Congestion Monitoring,” in 5th Int.
Conf. Electr. Eng. - Boumerdes, 2017, vol. 2017–Janua, pp. 1–5.
[17] R. Li and G. Rose, “Incorporating uncertainty into short-term travel time predictions,” Transp. Res. Part C Emerg. Technol., vol.
19, no. 6, pp. 1006–1018, 2011.
[18] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Short-term traffic forecasting: Where we are and where we’re going,” Transp.
Res. Part C, vol. 43, pp. 3–19, 2014.
[19] C. Siripanpornchana, S. Panichpapiboon, and P. Chaovalit, “Effective variables for urban traffic incident detection,” IEEE Veh.
Netw. Conf. VNC, vol. 2016–Janua, pp. 190–195, Dec. 2016.
[20] Y. Zhu, Z. Li, H. Zhu, M. Li, and Q. Zhang, “A compressive sensing approach to urban traffic estimation with probe vehicles,” IEEE
Trans. Mob. Comput., vol. 12, no. 11, pp. 2289–2302, 2013.
[21] Z. Duan, Y. Yang, K. Zhang, Y. Ni, and S. Bajgain, “Improved Deep Hybrid Networks for Urban Traffic Flow Prediction Using
Trajectory Data,” IEEE Access, vol. 6, pp. 31820–31827, 2018.
[22] G. Fusco, C. Colombaroni, and N. Isaenko, “Short-term speed predictions exploiting big data on large urban road networks,”
Transp. Res. Part C Emerg. Technol., vol. 73, pp. 183–201, 2016.
[23] F. Schimbinschi, L. Moreira-Matias, V. X. Nguyen, and J. Bailey, “Topology-regularized universal vector autoregression for traffic
forecasting in large urban areas,” Expert Syst. Appl., vol. 82, pp. 301–316, Oct. 2017.
[24] F. Su, H. Dong, L. Jia, Y. Qin, and Z. Tian, “Long-term forecasting oriented to urban expressway traffic situation,” Adv. Mech. Eng.,
vol. 8, no. 1, pp. 1–16, 2016.
[25] S. Oh, Y. Kim, and J. Hong, “Urban Traffic Flow Prediction System Using a Multifactor Pattern Recognition Model,” IEEE Trans.
Intell. Transp. Syst., vol. 16, no. 5, pp. 2744–2755, 2015.
[26] Z. Yuan and C. Tu, “Short-term Traffic Flow Forecasting Based on Feature Selection with Mutual Information,” in Materials
Science, Energy Technology, and Power Engineering I AIP Conf. Proc., 2017, vol. 020179, no. 1, pp. 1–9.
[27] A. Zeroual, N. Messai, S. Kechida, and F. Hamdi, “A piecewise switched linear approach for traffic flow modeling,” Int. J. Autom.
Comput., vol. 14, no. 6, pp. 729–741, 2017.
[28] Q. Li, S. Li, and Y. Wang, “Traffic incident data analysis and performance measures development,” IEEE Conf. Intell. Transp. Syst.
Proceedings, ITSC, no. 086, pp. 65–69, 2007.
[29] J. Wang, X. Li, S. S. Liao, and Z. Hua, “A Hybrid Approach for Automatic Incident Detection,” IEEE Trans. Intell. Transp. Syst., vol.
14, no. 3, pp. 1176–1185, 2013.
[30] R. Kalsoom and Z. Halim, “Clustering The Driving Features Based On Data Streams,” IEEE, pp. 89–94, Dec. 2013.
[31] H. Nguyen, C. Cai, and F. Chen, “Automatic classification of traffic incident’s severity using machine learning approaches,” IET
Intell. Transp. Syst., vol. 11, no. 10, pp. 615–623, Dec. 2017.
[32] C. E. L. Hatri and J. Boumhidi, “Fuzzy deep learning based urban traffic incident detection,” 2017 Intell. Syst. Comput. Vis., pp. 1–
6, Apr. 2017.
[33] J. Guo, Z. Liu, W. Huang, Y. Wei, and J. Cao, “Short-term traffic flow prediction using fuzzy information granulation approach
under different time intervals,” IET Intell. Transp. Syst., vol. 12, no. 2, pp. 143–150, 2018.
[34] M. M. Rahman, S. C. Wirasinghe, and L. Kattan, “Analysis of bus travel time distributions for varying horizons and real-time
applications,” Transp. Res. Part C Emerg. Technol., vol. 86, no. December 2017, pp. 453–466, 2018.
[35] R. Fernandez and R. Planzer, “On the capacity of bus transit systems,” Transp. Rev., vol. 22, no. 3, pp. 267–293, 2002.
[36] Y. Lv, Y. Duan, W. Kang, Z. Li, and F. Y. Wang, “Traffic Flow Prediction with Big Data: A Deep Learning Approach,” IEEE Trans.
Intell. Transp. Syst., vol. 16, no. 2, pp. 865–873, 2015.
[37] X. Ma, H. Yu, Y. Wang, and Y. Wang, “Large-scale transportation network congestion evolution prediction using deep learning
theory,” PLoS One, vol. 10, no. 3, 2015.
[38] J. Y. Ahn, E. Ko, and E. Kim, “Predicting Spatiotemporal Traffic Flow Based on Support Vector Regression and Bayesian Classifier,”
2015 IEEE Fifth Int. Conf. Big Data Cloud Comput., pp. 125–130, 2015.
[39] X. Ma, Z. Dai, Z. He, J. Ma, Y. Y. Wang, and Y. Y. Wang, “Learning traffic as images: A deep convolutional neural network for large-
scale transportation network speed prediction,” Sensors (Switzerland), vol. 17, no. 4, p. 818, Apr. 2017.
[40] R. Al Mallah, A. Quintero, and B. Farooq, “Distributed Classification of Urban Congestion Using VANET,” IEEE Trans. Intell. Transp.
Syst., vol. 18, no. 9, pp. 2435–2442, Sep. 2017.
[41] Z. Li, P. Liu, C. Xu, H. Duan, and W. Wang, “Reinforcement Learning-Based Variable Speed Limit Control Strategy to Reduce
Traffic Congestion at Freeway Recurrent Bottlenecks,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 11, pp. 3204–3217, 2017.
[42] R. Fu, Z. Zhang, and L. Li, “Using LSTM and GRU neural network methods for traffic flow prediction,” Proc. - 2016 31st Youth
Acad. Annu. Conf. Chinese Assoc. Autom. YAC 2016, pp. 324–328, 2017.
[43] G. Yang, Y. Wang, H. Yu, Y. Ren, and J. Xie, “Short-term traffic state prediction based on the spatiotemporal features of critical road sections,” Sensors, vol. 18, no. 7, 2018.
[44] X. Cheng, R. Zhang, J. Zhou, and W. Xu, “DeepTransport: Learning Spatial-Temporal Dependency for Traffic Condition Forecasting,” in Proc. Int. Jt. Conf. Neural Networks (IJCNN), Jul. 2018, pp. 1–8.
[45] S. V. Kumar and L. Vanajakshi, “Short-term traffic flow prediction using seasonal ARIMA model with limited input data,” Eur.
Transp. Res. Rev., vol. 7, no. 3, pp. 1–9, 2015.
[46] J. Guo, W. Huang, and B. M. Williams, “Adaptive Kalman filter approach for stochastic short-term traffic flow rate prediction and
uncertainty quantification,” Transp. Res. Part C Emerg. Technol., vol. 43, pp. 50–64, 2014.
[47] M. T. Asif et al., “Spatiotemporal patterns in large-scale traffic speed prediction,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 2,
pp. 794–804, 2014.
[48] J. Xin and S. Chen, “Bus Dwell Time Prediction Based on KNN,” Procedia Eng., vol. 137, pp. 283–288, 2016.
[49] P. Cai, Y. Wang, G. Lu, P. Chen, C. Ding, and J. Sun, “A spatiotemporal correlative k-nearest neighbor model for short-term traffic
multistep forecasting,” Transp. Res. Part C Emerg. Technol., vol. 62, pp. 21–34, 2016.
[50] D. Xia, B. Wang, H. Li, Y. Li, and Z. Zhang, “A distributed spatial-temporal weighted model on MapReduce for short-term traffic flow forecasting,” Neurocomputing, vol. 179, pp. 246–263, 2016.
[51] J. Amita, S. S. Jain, and P. K. Garg, “Prediction of Bus Travel Time Using ANN: A Case Study in Delhi,” Transp. Res. Procedia, vol. 17, pp. 263–272, 2016.
[52] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short-term memory neural network for traffic speed prediction using remote microwave sensor data,” Transp. Res. Part C Emerg. Technol., vol. 54, pp. 187–197, 2015.
[53] H. Yu, Z. Wu, S. Wang, Y. Wang, and X. Ma, “Spatiotemporal recurrent convolutional networks for traffic prediction in transportation networks,” Sensors, vol. 17, no. 7, pp. 1–16, 2017.
[54] C.-M. Hsu, F.-L. Lian, and C.-M. Huang, “A Systematic Spatiotemporal Modeling Framework for Characterizing Traffic Dynamics
Using Hierarchical Gaussian Mixture Modeling and Entropy Analysis,” IEEE Syst. J., vol. 8, no. 4, pp. 1126–1135, 2014.
[55] R. Yu, Y. Li, C. Shahabi, U. Demiryurek, and Y. Liu, “Deep Learning: A Generic Approach for Extreme Condition Traffic
Forecasting,” Proc. 2017 SIAM Int. Conf. Data Min., pp. 777–785, 2017.
[56] W. Fan and R. B. Machemehl, Characterizing Bus Transit Passenger Waiting Times, vol. SWUTC/99/1, no. 1. 1999.
[57] R. Fernández, “Modelling public transport stops by microscopic simulation,” Transp. Res. Part C Emerg. Technol., vol. 18, no. 6,
pp. 856–868, 2010.
[58] National Research Council (U.S.) et al., “Guidelines for the design and location of Bus Stops,” Transit Coop. Res. Progr., 1994.
[59] D. B. Hess and J. Brown, “Waiting for the bus,” J. Public Transp., vol. 7, no. 4, pp. 67–84, 2004.
[60] P. G. Furth and T. H. J. Muller, “Service Reliability and Hidden Waiting Time: Insights from Automatic Vehicle Location Data,” Transp. Res. Rec., vol. 1955, 2006.
[61] F. McLeod, “Estimating bus passenger waiting times from incomplete bus arrivals data,” J. Oper. Res. Soc., vol. 58, no. 11, pp.
1518–1525, 2007.
[62] N. E. Myridis, “Probability, Random Processes, and Statistical Analysis, by H. Kobayashi, B.L. Mark and W. Turin,” Contemp. Phys.,
vol. 53, no. 6, pp. 533–534, Nov. 2012.
[63] O. C. Ibe and O. A. Isijola-Adakeja, “M/M/1 multiple vacation queueing systems with differentiated vacations and vacation interruptions,” IEEE Access, vol. 2, pp. 1384–1395, 2014.
[64] G. Xin and W. Wang, “Model Passengers’ Travel Time for Conventional Bus Stop,” J. Appl. Math., vol. 2014, pp. 1–9, Apr. 2014.
[65] D. A. Wu and H. Takagi, “M/G/1 queue with multiple working vacations,” Perform. Eval., vol. 63, no. 7, pp. 654–681, Jul. 2006.
[66] H. Yu, Z. Wu, D. Chen, and X. Ma, “Probabilistic Prediction of Bus Headway Using Relevance Vector Machine Regression,” IEEE
Trans. Intell. Transp. Syst., vol. 18, no. 7, pp. 1772–1781, Jul. 2017.
[67] Z. Yu, J. S. Wood, and V. V. Gayah, “Using survival models to estimate bus travel times and associated uncertainties,” Transp.
Res. Part C Emerg. Technol., vol. 74, pp. 366–382, 2017.
[68] H. Yu, D. Chen, Z. Wu, X. Ma, and Y. Wang, “Headway-based bus bunching prediction using transit smart card data,” Transp. Res.
Part C Emerg. Technol., vol. 72, pp. 45–59, 2016.
[69] B. A. Kumar, L. Vanajakshi, and S. C. Subramanian, “Bus travel time prediction using a time-space discretization approach,”
Transp. Res. Part C Emerg. Technol., vol. 79, pp. 308–332, 2017.
[70] M. Meng, A. Rau, and H. Mahardhika, “Public transport travel time perception: Effects of socioeconomic characteristics, trip characteristics and facility usage,” Transp. Res. Part A Policy Pract., 2018.
[71] A. Gal, A. Mandelbaum, F. Schnitzler, A. Senderovich, and M. Weidlich, “Traveling time prediction in scheduled transportation
with journey segments,” Inf. Syst., vol. 64, pp. 266–280, 2017.
[72] A. Comi, A. Nuzzolo, S. Brinchi, and R. Verghini, “Bus travel time variability: Some experimental evidences,” Transp. Res.
Procedia, vol. 27, pp. 101–108, 2017.
[73] Y. Wang, Y. Zheng, and Y. Xue, “Travel time estimation of a path using sparse trajectories,” in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. (KDD ’14), 2014, pp. 25–34.
[74] B. Yang, C. Guo, and C. S. Jensen, “Travel cost inference from sparse, spatio-temporally correlated time series using markov
models,” Proc. VLDB Endow., vol. 6, no. 9, pp. 769–780, 2013.
[75] T. Liebig, N. Piatkowski, C. Bockermann, and K. Morik, “Dynamic route planning with real-time traffic predictions,” Inf. Syst., vol.
64, pp. 258–265, 2017.
[76] L. Gasparini, E. Bouillet, F. Calabrese, O. Verscheure, B. O’Brien, and M. O’Donnell, “System and analytics for continuously assessing transport systems from sparse and noisy observations: Case study in Dublin,” IEEE Conf. Intell. Transp. Syst. Proceedings, ITSC, pp. 1827–1832, 2011.
[77] B. Sun et al., “An improved k-nearest neighbours method for traffic time series imputation,” in Proc. Chinese Automation Congress (CAC), Oct. 2017, pp. 7346–7351.
[78] M. Moniruzzaman, H. Maoh, and W. Anderson, “Short-term prediction of border crossing time and traffic volume for commercial
trucks: A case study for the Ambassador Bridge,” Transp. Res. Part C Emerg. Technol., vol. 63, pp. 182–194, 2016.
[79] Y. Duan et al., “An efficient realization of deep learning for traffic data imputation,” Transp. Res. Part C Emerg. Technol., vol. 72,
no. 10, pp. 168–181, 2016.
[80] O. D. Cardozo, J. C. García-Palomares, and J. Gutiérrez, “Application of geographically weighted regression to the direct
forecasting of transit ridership at station-level,” Appl. Geogr., vol. 34, no. 4, pp. 548–558, 2012.
[81] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” in Neural Information Processing Systems Conference, 2014, pp. 1–9.
[82] L. Deng and N. Jaitly, “Deep Discriminative and Generative Models for Pattern Recognition,” pp. 1–26, 2015.
[83] G. B. Zhou, J. Wu, C. L. Zhang, and Z. H. Zhou, “Minimal gated unit for recurrent neural networks,” Int. J. Autom. Comput., vol. 13,
no. 3, pp. 226–234, 2016.
[84] V. Sze, Y. H. Chen, T. J. Yang, and J. S. Emer, “Efficient Processing of Deep Neural Networks: A Tutorial and Survey,” Proc. IEEE,
vol. 105, no. 12, pp. 2295–2329, 2017.
[85] K. Yin, W. Wang, X. Bruce Wang, and T. M. Adams, “Link travel time inference using entry/exit information of trips on a
network,” Transp. Res. Part B Methodol., vol. 80, pp. 303–321, 2015.
[86] F. N. Savas, “Forecast Comparison of Models Based on SARIMA and the Kalman Filter for Inflation,” 2013.
[87] P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting, 2nd ed., Springer Texts in Statistics. 2003.
[89] G. E. Hinton, “Training Products of Experts by Minimizing Contrastive Divergence,” Neural Comput., vol. 14, no. 8, pp. 1771–1800, 2002.
[90] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[93] Highways England, “Highways England – Data.gov.uk – Journey Time and Traffic Flow Data April 2015 onwards – User Guide,” pp. 1–14, Apr. 2015.
[94] A. Rahi and S. Ramalingam, “Empirical Formulation of Highway Traffic Flow Prediction Objective Function Based on Network Topology,” Int. J. Adv. Res. Sci. Eng. Technol., vol. 5, no. 11, Nov. 2018.
[95] D. Zhang and M. R. Kabuka, “Combining Weather Condition Data to Predict Traffic Flow: A GRU Based Deep Learning Approach,” in 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), 2017, pp. 1216–1219.
[96] Y. Jia, J. Wu, and M. Xu, “Traffic flow prediction with rainfall impact using a deep learning method,” J. Adv. Transp., vol. 2017,
2017.
[97] M. Shardlow, “An Analysis of Feature Selection Techniques,” Studentnet.Cs.Manchester.Ac.Uk, pp. 1–7, 2007.
[99] W. Fan and Z. Gurmu, “Dynamic Travel Time Prediction Models for Buses Using Only GPS Data,” Int. J. Transp. Sci. Technol., vol.
4, no. 4, pp. 353–366, 2015.
[100] Y. Liu and H. Wu, “Prediction of Road Traffic Congestion Based on Random Forest,” 2017 10th Int. Symp. Comput. Intell. Des., pp.
361–364, 2017.
[101] S. Sun, C. Zhang, and G. Yu, “A Bayesian network approach to traffic flow forecasting,” IEEE Trans. Intell. Transp. Syst., vol. 7, no. 1, pp. 124–132, 2006.
[102] A. Pascale and M. Nicoli, “Adaptive Bayesian network for traffic flow prediction,” 2011 IEEE Stat. Signal Process. Work., pp. 177–
180, 2011.
[103] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. John Wiley & Sons, 2000.
[104] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Short-term traffic forecasting: Where we are and where we’re going,” Transp.
Res. Part C Emerg. Technol., vol. 43, pp. 3–19, 2014.
[105] W. Feng, “Analyses of Bus Travel Time Reliability and Transit Signal Priority at the Stop-to-Stop Segment Level,” Ph.D. dissertation, Portland State University (PDXScholar), 2014.
[106] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” in Advances in Neural Information Processing Systems, 2014.
AUTHORS’ BIOGRAPHIES
Arsalan Rahi graduated in Electronics Engineering from the Ghulam Ishaq Khan Institute (GIKI), Pakistan, in 2014. He completed his MSc in Embedded Intelligent Systems at the University of Hertfordshire (UH), United Kingdom, in 2015. He has been a PhD candidate in the Biometrics and Media Processing department at UH since 2016 and works as a data scientist at University Bus Limited (UNO). His fields of interest include smart transport management systems, IoT, artificial intelligence and data analytics, with research interests in implementations of the latest machine learning algorithms.
Dr Soodamani Ramalingam has been a Senior Lecturer in the School of Engineering and Technology, University of Hertfordshire, since 2006. She has several years of academic and research experience in the UK, Singapore and Melbourne. She received her PhD (CSE) from the University of Melbourne, Australia, in 1997, and her M.E. (CS) and B.E. (ECE) degrees from PSG College of Technology, Bharathiar University, India. Her research expertise is in Computer Vision and Machine Learning, Biometrics, Image Processing and Fuzzy Logic. Application areas include Automatic Number Plate Recognition (ANPR), 3D Face Recognition, Intelligent Transportation Systems and Energy. She has over 65 international conference and 30 journal publications in related areas of research. She is a member of the IEEE and the Biometrics Institute (UK).