
Traffic Graph Convolutional Recurrent Neural Network with LSTM and Transformers

Training Seminar Technical Report

28/08/2023

Submitted by

Shreya Singh (20117119)

Bachelor of Technology in Mechanical Engineering

Under the guidance of

Prof. Anil Kumar and Prof. Krishnan Murugesan

MECHANICAL AND INDUSTRIAL ENGINEERING DEPARTMENT

INDIAN INSTITUTE OF TECHNOLOGY ROORKEE


ROORKEE-247667
August 2023

CANDIDATE’S DECLARATION

I declare that the work carried out in this report entitled “Traffic Graph
Convolutional Recurrent Neural Network with LSTM and Transformers” is
presented in partial fulfilment of the course MIN-499 submitted to the
Mechanical and Industrial Engineering Department, Indian Institute of
Technology Roorkee.
I further certify that the work presented in this report has not been submitted
anywhere else for the award of any other degree, diploma, or certification.
Date: 28/08/2023
Place: Roorkee, India

Shreya Singh

Table of Contents:

1. Acronyms
2. Acknowledgement
3. Abstract
4. Introduction
5. Theoretical Background
6. Objectives
7. Motivation
8. Problem Statement
9. Model Architecture
10. Methods
    10.1. LSTM Model
    10.2. Transformers
    10.3. Graph Convolutional Networks
11. Dataset Analysis
    11.1. Study Area
    11.2. Data Format
12. Methodology
    12.1. TGC-LSTM
    12.2. Transformers
13. Results
14. Conclusion
15. References

Acronyms

● LSTM : Long Short-Term Memory

● GC-LSTM : Graph Convolutional Long Short-Term Memory Neural Network

● GCN-TRANS : Graph Convolutional Networks with Transformers

● LSGC-LSTM : Localised Spectral Graph Convolutional Long Short-Term Memory Neural Network

● MAE (L1-norm) : Mean Absolute Error

● RMSE (L2-norm) : Root Mean Squared Error
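
For reference, the two error metrics reported in the Results section follow the standard definitions, where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, and $n$ the number of samples:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$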

Acknowledgment

I would like to extend my sincere gratitude to all those who have contributed to
the success of this project. Firstly, I would like to express my thanks to
Professor Neetesh Kumar (CSE Dept., IIT Roorkee) for his guidance, insights,
and unwavering support throughout the project. His expertise and mentorship
have been instrumental in shaping this project.
I also want to thank Ms. Nisha Singh Chauhan (PhD student at IIT Roorkee)
for her valuable insights in completing the project.
I am grateful to have had the opportunity to undertake this project.

Abstract

Traffic forecasting is a particularly challenging application of spatiotemporal
forecasting, due to the time-varying traffic patterns and the complicated spatial
dependencies on road networks. To address this challenge, we learn the traffic
network as a graph and propose a novel deep learning framework comprising a
Traffic Graph Convolutional Long Short-Term Memory Neural Network
(TGC-LSTM), a Long Short-Term Memory Neural Network (LSTM), a Localised
Spectral Graph Convolutional Long Short-Term Memory Neural Network
(LSGC-LSTM), and Graph Convolutional Networks with Transformers
(GCN-TRANS), to learn the interactions between roadways in the traffic network
and forecast the network-wide traffic state.
We define the traffic graph convolution based on the physical network topology.
The relationship between the proposed traffic graph convolution and the spectral
graph convolution is also discussed. An L1-norm on the graph convolution
weights and an L2-norm on the graph convolution features are added to the
model’s loss function to enhance the interpretability of the proposed model.
Experimental results show that the proposed model outperforms baseline methods
on two real-world traffic state datasets. The visualisation of the graph convolution
weights indicates that the proposed framework can recognise the most influential
road segments in real-world traffic networks.

4. Introduction

Traffic forecasting is one of the most challenging components of Intelligent
Transportation Systems (ITS). The goal of traffic forecasting is to predict future
traffic states in the traffic network given a sequence of historical traffic states
and the physical roadway network. Since the volume and variety of traffic data
have been increasing in recent years, data-driven traffic forecasting methods have
shown considerable promise in their ability to outperform conventional and
simulation-based methods.

5. Theoretical Background

Deep learning models have shown superior capabilities in capturing nonlinear
spatiotemporal effects for traffic forecasting. Ever since the precursory study
using a feed-forward NN for vehicle travel time estimation was proposed, many
other NN-based models, including fuzzy NNs, recurrent NNs, convolutional NNs,
deep belief networks, auto-encoders, generative adversarial networks, and
combinations of these models, have been applied to forecast traffic states. With
the capability of capturing temporal dependencies, the recurrent NN and its
variants, such as LSTM and GRU, have been widely adopted as components of
traffic forecasting models to forecast traffic speed, travel time, and traffic flow.
Multiple novel LSTM-based models, such as bidirectional LSTM, deep LSTM,
shared hidden LSTM, and nested LSTM, have been designed by reorganising and
combining single LSTM models, and applied to capture comprehensive temporal
dependencies for traffic prediction. In addition, sequence-to-sequence (seq2seq)
architecture based models have also been used for traffic state sequence
forecasting.

6. Objectives

The following objectives have been proposed for this project:

1) A traffic graph convolution operator is proposed to accommodate the physical
specialties of traffic networks and extract comprehensive features.
2) A traffic graph convolutional LSTM neural network with Transformers is
proposed to learn the complex spatial and dynamic temporal dependencies
present in traffic data.
3) To make the learned localised graph convolution features more consistent and
interpretable, we propose two regularisation terms: an L1-norm on the traffic
graph convolution weights and an L2-norm on the traffic graph convolution
features.

7. Motivation

We model the traffic network as a graph and define a traffic graph convolution
operation to capture spatial features from the traffic network. The traffic graph
convolution incorporates the adjacency matrix and the proposed free-flow
reachable (FFR) matrix to extract localised features from the graph (a
construction sketch for the FFR matrix is given at the end of this section). We
propose a traffic graph convolutional LSTM neural network and Transformers to
forecast network-wide traffic states.
We also design two regularisation terms on the TGC weights and TGC features,
respectively, that can be added to the model’s loss function to help the learned
TGC weights be more stable and interpretable.
By evaluating on real-world traffic datasets, our approach is shown to be
superior to the compared baseline models. In addition, the learned TGC weights
help identify the most influential roadways and thus enhance the
interpretability of the proposed model.
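
As a rough illustration of the free-flow reachable matrix idea, the sketch below marks node j as reachable from node i when the road distance between them can be covered within one time step at free-flow speed. The function name and the distance-matrix input are illustrative assumptions, not the report's actual code:

```python
import numpy as np

def free_flow_reachable(dist, free_flow_speed, dt):
    # dist: (N, N) matrix of road-network distances between sensor
    # nodes (an assumed input); free_flow_speed: speed in the same
    # distance unit per unit time; dt: length of one time step.
    # FFR[i, j] = 1 if node j is reachable from node i within one
    # time step when travelling at free-flow speed, else 0.
    return (dist <= free_flow_speed * dt).astype(np.float32)
```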

8. Problem Statement

A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting on a Real-World Dataset.

9. Model Architecture

The architecture of the proposed Traffic Graph Convolution LSTM is shown
on the right side of the figure. The traffic graph convolution (TGC), as a
component of the proposed model, is shown in detail on the left side by
unfolding the traffic graph convolution at time t, in which the k-hop matrices
Ã^k and the FFR (Free-Flow Reachable) matrix with respect to a red star node
are demonstrated.

Fig. Traffic graph convolution on the left and LSTM on the right.


10. Methods

1. LSTM Model:

An LSTM, a variant of the RNN (recurrent neural network), takes sequential
data as input. Each time step (h_t) depends on the previous step (h_{t-1}) and
the outside input (x_t), produces an output (o_t) at that moment, and passes
its state (h_t) on to the next step. Finally, a fully connected layer maps the
output of the last time-step cell to the output node.
The spatial-temporal data relations were dealt with using LSTMs. The LSTM
has three hidden layers, and each layer has 50 hidden units. The non-sequential
data are appended to the last hidden state, which is then fully connected to
the output layer. Other parameters, such as the maximum number of iterations,
mini-batch size, ReLU activation function, dropout, learning rate, and
optimiser, were adjusted according to the model validation requirements. We
calculate the output for each input feature at each time step and for all input
features over the time series.

Fig 2. Flow chart for the LSTM.
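
As a minimal sketch of an LSTM of this shape in PyTorch (assuming the project used PyTorch; the class name, tensor shapes, and hyperparameter defaults below are illustrative, not the report's actual code):

```python
import torch
import torch.nn as nn

class SpeedLSTM(nn.Module):
    """Stacked LSTM with a fully connected output layer (illustrative)."""
    def __init__(self, num_nodes, hidden_size=50, num_layers=3, dropout=0.2):
        super().__init__()
        # three stacked LSTM layers, 50 hidden units each, as in the text
        self.lstm = nn.LSTM(input_size=num_nodes, hidden_size=hidden_size,
                            num_layers=num_layers, dropout=dropout,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, num_nodes)

    def forward(self, x):
        # x: (batch, time_steps, num_nodes) historical speed readings
        out, _ = self.lstm(x)
        # map the last time step's hidden state to next-step speeds
        return self.fc(out[:, -1, :])
```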



2. Transformers Model:

The Transformer is an architecture for transforming one sequence into another
with the help of two parts, an Encoder and a Decoder. In the standard diagram,
the Encoder is on the left and the Decoder is on the right. Both the Encoder
and the Decoder are composed of modules that can be stacked on top of each
other multiple times, which is indicated by Nx in the figure.
The modules consist mainly of Multi-Head Attention and Feed-Forward layers.
The inputs and outputs (target sequences) are first embedded into an
n-dimensional space, since we cannot use strings directly.
One slight but important part of the model is the positional encoding of the
different elements. Since there are no recurrent networks that can remember
how a sequence is fed into the model, we need to give every element in the
sequence a relative position, because a sequence depends on the order of its
elements. These positions are added to the embedded representation
(n-dimensional vector) of each element.
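
A common way to compute these positions is the sinusoidal encoding from the original Transformer paper; a minimal sketch (assuming an even d_model, in the same illustrative PyTorch style as above):

```python
import math
import torch

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding:
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pe = torch.zeros(seq_len, d_model)
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # added element-wise to the (seq_len, d_model) embeddings
```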

3. Graph Convolutional Network:


The core idea of a graph convolution layer is to extract localised features
from input data in a graph structure. Thus, the product of the neighbourhood
matrix Ã, the input data x_t, and a trainable weight matrix W, i.e. Ã x_t W,
can be considered as a graph convolution operation that extracts features from
the one-hop neighbourhood. The receptive field of this graph convolution
operation on a node is therefore the one-hop neighbourhood. We extend the
receptive field of the graph convolution by replacing the one-hop neighbourhood
matrix Ã with the k-hop neighbourhood matrix Ã^k.

To enrich the feature space, the features extracted from the different orders
(from 1 to K) of the traffic graph convolution with respect to x_t are
concatenated together as follows:

GC_t^{K} = [GC_t^1, GC_t^2, ..., GC_t^K] ∈ R^(N×K)

The matrix GC_t^{K} contains all K orders of traffic graph convolutional
features. In this study, after operating the TGC on the input data x_t, the
generated GC_t^{K} is fed into the following layer of the proposed neural
network structure.
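
A minimal sketch of this K-order concatenation (assumed shapes, purely illustrative: an (N, 1) input vector, an (N, N) one-hop neighbourhood matrix, and one trainable (N, N) weight matrix per order):

```python
import torch

def traffic_graph_convolution(x_t, A_tilde, weights):
    # x_t: (N, 1) traffic state vector; A_tilde: (N, N) one-hop
    # neighbourhood matrix; weights: list of K trainable (N, N)
    # weight matrices (one per convolution order).
    features = []
    A_k = torch.eye(A_tilde.shape[0])
    for W_k in weights:
        A_k = A_k @ A_tilde                 # k-hop neighbourhood (A~)^k
        features.append((W_k * A_k) @ x_t)  # weighted k-hop features, (N, 1)
    # concatenating orders 1..K gives GC_t^{K} in R^(N x K)
    return torch.cat(features, dim=1)
```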

11. Dataset Analysis

1. Study Area:

In this project, the data is collected by inductive loop detectors deployed on
freeways in the Seattle area. The network covers the I-5, I-405, I-90, and
SR-520 freeways. This dataset contains spatial-temporal speed information for
the freeway system. In the picture, each blue icon represents the loop
detectors at a milepost. The speed information at a milepost is averaged over
multiple loop detectors on the main lanes in the same direction at that
specific milepost. The time interval of the dataset is 5 minutes.

Fig. Loop detectors deployed on freeways in Seattle area



2. Data Format:

A demo of the speed_matrix_2015 file is shown in the following figure. The
horizontal header denotes the mileposts and the vertical header indicates the
timestamps. This was further converted to NumPy arrays to be used in the code.

The name of each milepost header contains 11 characters:

• char 1: 'd' or 'i', i.e. decreasing or increasing direction
• chars 2-4: route name, e.g. '405' denotes route I-405
• chars 5-6: 'es', which has no meaning here
• chars 7-11: milepost, e.g. '15036' denotes milepost 150.36
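
For illustration, such a header could be decoded as follows (a hedged sketch; the function name is an assumption, and the example string is assembled from the pieces above, not taken from the dataset):

```python
def parse_milepost(header: str):
    # e.g. 'i405es15036' -> ('i', '405', 150.36)
    direction = header[0]                  # 'd' decreasing / 'i' increasing
    route = header[1:4]                    # '405' -> route I-405
    # header[4:6] is 'es' and carries no meaning
    milepost = int(header[6:11]) / 100.0   # '15036' -> milepost 150.36
    return direction, route, milepost
```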


12. Methodology

1. TGC-LSTM: Given the traffic state data x_t and the graph-related matrices
as input, the model generates the final output h_t after t steps of iteration.
The parameters are learned through mini-batch gradient descent with
backpropagation-based updates (a training-step sketch is given after this
section).

2. Transformers Model: For confidentiality reasons, the details of the
Transformers model cannot be shared.

Fig. Flow chart of the model.
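
A minimal sketch of one mini-batch update with the two regularisation terms from the Objectives section (everything here is an illustrative assumption: the model is taken to expose its TGC weight matrices as `tgc_weights` and to return the TGC features alongside the prediction):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x_batch, y_batch, lam1=1e-4, lam2=1e-4):
    optimizer.zero_grad()
    pred, gc_features = model(x_batch)   # assumed model interface
    loss = F.mse_loss(pred, y_batch)
    # L1-norm on the TGC weights, encouraging sparse, interpretable weights
    loss = loss + lam1 * sum(w.abs().sum() for w in model.tgc_weights)
    # L2-norm on the TGC features, keeping learned features consistent
    loss = loss + lam2 * gc_features.pow(2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```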


13. Results

Model        K    MAE (L1-norm)       RMSE (L2-norm)
LSTM         -    2.98492495211943    0.148709757091356
LSGC-LSTM    1    4.26501764010497    0.292260112906983
LSGC-LSTM    2    4.14056689761565    0.31158408806262
LSGC-LSTM    3    4.1438384588285     0.321233727728875
LSGC-LSTM    4    4.20053284966008    0.283045182342036
LSGC-LSTM    5    4.11924856439494    0.293083399091012
GC-LSTM      1    3.0253767472294     0.174000397808078
GC-LSTM      2    2.76522413252443    0.129908970591243
GC-LSTM      3    2.66644555451595    0.119873028215239
GC-LSTM      4    2.64367515846024    0.123952958671975
GC-LSTM      5    3.89547943482374    0.140417291946409

Fig. Validation loss vs. number of epochs at K=3.



Fig. Training time (in sec) vs. number of epochs at K=3.

Fig. Training loss vs. number of epochs at K=3.



14. Conclusion

● We have used Graph Convolutional Networks (GCNs) in combination with
Transformers for traffic flow prediction, which has proven effective in
capturing spatial-temporal relationships in traffic data. GCNs are designed to
work with data structured as graphs, where nodes represent sensors deployed on
the road and edges represent the connectivity among the sensors.
● GCNs are particularly useful for capturing spatial relationships in data, as
they allow information to be propagated through the graph's structure,
enabling nodes to gather information from their neighbours.
● Transformers, on the other hand, are known for their ability to model
sequential and temporal relationships in data through self-attention
mechanisms. They have been highly successful in tasks like natural language
processing and time series analysis. When dealing with traffic data, it is
important to consider both the spatial and the temporal aspects.
● GCNs help capture the spatial relationships between different road segments
or intersections, while Transformers capture the temporal patterns and
dependencies within the traffic data.

15. References

[1] Z. Cui, K. Henrickson, R. Ke, and Y. Wang, “Traffic graph convolutional
recurrent neural network: A deep learning framework for network-scale traffic
learning and forecasting,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 11,
Nov. 2020.
[2] D. Park and L. R. Rilett, “Forecasting freeway link travel times with a multilayer
feedforward neural network,” Comput. Civil Infrastruct. Eng.,
vol. 14, no. 5, pp. 357–367, Sep. 1999.
[3] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Short-term traffic forecasting:
Where we are and where we’re going,” Transp. Res. C, Emerg. Technol.,
vol. 43, pp. 3–19, Jun. 2014.
[4] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short-term memory
neural network for traffic speed prediction using remote microwave sensor data,”
Transp. Res. C, Emerg. Technol., vol. 54, pp. 187–197, May 2015.
[5] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, “Learning traffic as images:
A deep convolutional neural network for large-scale transportation network speed
prediction,” Sensors, vol. 17, no. 4, p. 818, 2017.
[6] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,”
Statist. Comput., vol. 14, no. 3, pp. 199–222, Aug. 2004.
[7] M. M. Hamed, H. R. Al-Masaeid, and Z. M. B. Said, “Short-term prediction of
traffic volume in urban arterials,” J. Transp. Eng., vol. 121, no. 3, pp. 249–254,
May 1995.
[8] J. W. C. Van Lint, S. P. Hoogendoorn, and H. J. Van Zuylen, “Freeway travel
time prediction with state-space neural networks: Modeling statespace dynamics
with recurrent neural networks,” Transp. Res. Rec.,
vol. 1811, no. 11, pp. 30–39, 2002.
[9] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic
flow prediction: Deep belief networks with multitask learning,”
IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 2191–2201, Oct. 2014.
