Reinforcement Learning Based Multiagent System For Network Traffic Signal Control
Abstract: A challenging application of artificial intelligence systems involves the scheduling of traffic signals
in multi-intersection vehicular networks. This paper introduces a novel use of a multi-agent system and
reinforcement learning (RL) framework to obtain an efficient traffic signal control policy. The latter is aimed at
minimising the average delay, congestion and likelihood of intersection cross-blocking. A five-intersection
traffic network has been studied in which each intersection is governed by an autonomous intelligent agent.
Two types of agents, a central agent and an outbound agent, were employed. The outbound agents schedule
traffic signals by following the longest-queue-first (LQF) algorithm, which has been proved to guarantee
stability and fairness, and collaborate with the central agent by providing it with local traffic statistics. The central
agent learns a value function driven by its local and neighbours’ traffic conditions. The novel methodology
proposed here utilises the Q-Learning algorithm with a feedforward neural network for value function
approximation. Experimental results clearly demonstrate the advantages of multi-agent RL-based control over
LQF governed isolated single-intersection control, thus paving the way for efficient distributed traffic signal
control in complex settings.
learning way, rather than the conventional traffic scheduling. The coordination among the intersections is difficult to characterise explicitly and is nonlinear in nature. An AI-based learning algorithm can capture this inner nonlinear relation, which cannot be provided by the traditional approaches. Simulation results clearly indicate that the proposed RL control scheme outperforms the LQF strategy by yielding a lower delay and cross-blocking frequency, particularly for medium and high traffic arrival rates.

The rest of the paper is structured as follows. Section 2 presents the basic principles of RL. In Section 3, a summary of work in traffic signal scheduling is provided. Section 4 describes the multi-intersection RL system model, including definitions pertaining to the state, action and reward function. Section 5 extends the algorithm by introducing a neural network-based function approximation module. Section 6 presents and discusses the simulation results, and in Section 7, the conclusions are drawn.

2 Basic principles of RL

RL is a field of study in machine learning where an agent, by interacting with and receiving feedback from its environment, attempts to learn an optimal action selection policy. RL algorithms typically learn and progress in an iterative manner. During each iteration, the agent observes its current environment, from which it infers the environment's state, then executes an action that leads the agent to the subsequent state. Next, the agent evaluates this action by the reward or penalty it incurs and updates a value function accordingly. The value function is the utility construct that the agent attempts to maximise (or minimise). A commonly used RL algorithm is Q-Learning [3]. The latter is a model-free RL algorithm, which assumes that the agent has no explicit knowledge of its environment's behaviour prior to interacting with it. Interaction with the environment is what offers the agent knowledge regarding both state transitions (as a function of actions taken) and their related long-term reward prospects. The goal of the agent is to maximise such long-term reward by learning a good policy, which is a mapping from perceived states to actions. In summary, RL methods provide a way to solve complex, real-world control problems, and one such challenge is traffic signal scheduling in multi-intersection settings.

3 Work in traffic signal scheduling

3.1 Adaptive signal control systems

In the field of adaptive signal control systems, well-known systems include SCOOT [4] and SCATS [5]. SCOOT is a centralised system that continuously measures traffic volumes and occupancies, which serve to adjust signal timings with the primary objective of minimising the sum of the average queues in a specific area. A heuristic optimisation evaluates potential timing plans adjusting the signal timings. In larger networks, the system defines smaller regions for modelling and optimisation purposes. It uses in real time the same traffic flow modelling scheme used in TRANSYT (a steady-state model using platoon-dispersion equations). SCATS does not require modelling. It is an automated, real-time, traffic-responsive signal control strategy using local controllers and regional computers for tactical and strategic control. Constant monitoring of traffic allows the system to choose the appropriate signal timings from a library, minimising vehicle stops with light demand, minimising delay with normal demand and maximising throughput with heavy demand. Systems such as SCOOT and SCATS suffer from inefficient handling of saturated conditions due to inadequate real-time adaptability [6].

Other approaches such as OPAC [7-9] and RHODES [7-9] calculate switching times by solving dynamic optimisation problems in a real-time manner. These approaches do not explicitly consider cycle, split and offsets. To obtain the optimal switching times, a dynamic optimisation problem is solved in real time employing dynamic traffic models and traffic measurements. The typical performance index to be minimised is the total intersection delay. Such systems suffer exponential complexities that diminish their chances of being deployed on a large scale [9].

Another real-time control strategy is TUC [10]. Based on a store-and-forward modelling of the urban network traffic and using linear-quadratic regulator theory, the design of TUC leads to a multivariable regulator for traffic-responsive coordinated network-wide signal control that is particularly suitable also for saturated traffic conditions. Real-time decisions in TUC cannot be taken more frequently than the maximum employed signal cycle, and the strategy needs to be redesigned in the case of modifications and expansions of the controlled network. TUC was compared with fixed-time signal control, producing reductions in total waiting time and total travel time in the system.

3.2 Multi-agent systems

On the basis of the TUC system, but aiming to cope with large networks and to allow distributed reconfiguration, de Oliveira and Camponogara [11] propose a framework for a network of distributed agents to control linear dynamic systems which are put together by interconnecting linear subsystems with local input constraints. The framework decomposes the optimisation problem arising from the model predictive control (MPC) approach into a network of coupled, but small, subproblems to be solved by the agent network. Each agent senses and controls the variables of its subsystem, while communicating with agents in the vicinity to obtain the neighbourhood variables and coordinate their actions. A problem decomposition and coordination protocol ensures convergence of the agents' iterates to a global optimum of
the MPC problem. The proposed approach achieved performance comparable to the TUC system.

A distributed and interactive network of agents to manage real-time traffic control was proposed in [12]. Each agent in the cooperative ensemble is able to dynamically determine the size of its cooperative zone. To this end, a stochastic cooperative parameter update algorithm was designed, improving the online learning and update process for the agent.

In [7], a collaborative RL (CRL) system using a local adaptive round-robin phase switching model was employed at each signalised junction of a network. Each signalised junction collaborates with neighbouring agents in order to learn appropriate phase timing based on traffic patterns. Traffic patterns were of a steady and uniform nature. The approach was compared with a non-adaptive fixed-time system and with a saturation balancing algorithm, producing a reduction in average waiting time per arrived vehicle.

In [13], multi-agent RL was applied to schedule traffic signals at six intersections by constructing a vehicle-based model. The RL systems learn value functions estimating expected waiting times for cars given different settings of traffic lights. The selected setting of the traffic lights results from combining the predicted waiting times of all cars involved. Results show that the proposed algorithm can outperform non-adaptable traffic light controllers.

3.3 Additional work in traffic signal scheduling

It is argued that the use of a model-based RL approach adds unnecessary complexities compared with using model-free Q-Learning RL. In [14], the model-free Q-Learning RL-based method was applied to derive an optimal and adaptive traffic control policy for an isolated, two-phase traffic signal. Performance was compared with that of a commonly used pre-timed signal controller, resulting in significantly lower delays with variable traffic flows.

In [15], RL and approximate dynamic programming were used to develop a self-sufficient adaptive traffic signal controller that substantially reduced vehicle delays when compared with fixed-time control in an isolated intersection. Two learning techniques, temporal difference (TD) RL and perturbation learning, were explored. The TD method constantly tracks the difference between the current estimate and the actual observation of state values and propagates the difference back to the functional parameters so as to update the approximation. The perturbation learning method directly estimates the gradients of the approximate function by applying a perturbing signal to the system state.

The problem of finding the optimal traffic signal timing plans has been solved as a decision-making problem for a controlled Markov process in [16]. The Markovian model developed as the system model for signal control incorporates Robertson's platoon dispersion traffic model between intersections and employs the value iteration algorithm to find the optimal decision for the controlled Markov process.

Three self-organising methods for traffic light control were proposed in [17]. The author defended schemes that are distributed and non-cyclic. The methods presented distinguished themselves in that no direct communication between traffic lights was utilised, only local rules.

Finally, in previous work, the authors proposed an LQF traffic signal scheduling algorithm [18] for an isolated intersection. The LQF algorithm was designed for a signal control problem employing concepts drawn from the field of packet switching in computer networks. The novel method proposed utilised a maximal weight matching algorithm to minimise the queue sizes at each approach, yielding significantly lower average vehicle delay through the intersection. LQF has been proved to be stable and shown to yield strong performance attributes under various traffic scenarios. However, scheduling a multi-intersection network, where a phase scheduling decision at one intersection largely impacts the traffic conditions at its neighbouring intersections, is a more complex task. For such settings, LQF, as well as many other existing schemes, is inherently limited in the sense that it is unable to take the conditions of neighbouring intersections into consideration. It will be demonstrated that RL offers the capability to provide distributed control as needed for scheduling multiple intersections. At its core, the RL algorithm learns a nonlinear mapping between intersection elements, from which it can derive a high-performance policy for scheduling traffic signals at a network of intersections.

4 System model

4.1 Notation and terminology

We begin by defining the terms and notation used throughout the rest of the paper. Traffic throughput in our study is defined as the average number of vehicles per unit of time that successfully traverse the intersection. Traffic congestion, typically occurring in multi-intersection settings, is a condition in which a link is increasingly occupied by queued vehicles. Highly congested intersections often cause cross-blocking, whereby vehicles moving upstream fail to cross an intersection due to a lack of queuing positions at a designated link. Low traffic throughput and high congestion both lead to an increase in vehicle delay, a fundamental metric in evaluating traffic signal controller performance. In our study, as well as in other multi-intersection scheduling schemes, the ultimate goal is to maximise traffic throughput, avoid traffic congestion and intersection cross-blocking and reduce overall vehicle delay.
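To make these metrics concrete, the following minimal Python sketch (illustrative only; the record fields and function names are assumptions, not the authors' implementation) computes throughput and mean vehicle delay from per-vehicle records collected during a simulation run.

def throughput(vehicles, sim_duration):
    # Average number of vehicles per time unit that successfully crossed the intersection.
    crossed = [v for v in vehicles if v.get("exit_time") is not None]
    return len(crossed) / sim_duration

def mean_delay(vehicles):
    # Mean per-vehicle delay, taken here as the time between entering and clearing
    # the intersection; this is the primary controller-evaluation metric.
    delays = [v["exit_time"] - v["enter_time"]
              for v in vehicles if v.get("exit_time") is not None]
    return sum(delays) / len(delays) if delays else 0.0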
4.2 Intersection network configuration

The network of intersections under study is illustrated in Fig. 1. This is a five-intersection traffic network, in which the intersection at the middle is referred to as the central intersection. The other four intersections are labelled as outbound intersections. In multi-agent systems, agent cooperation does not mean that an agent can extract information from all other agents, but rather that agents are able to inquire about regional (local) information from their neighbouring agents. Hence, we assume an intersection agent can only communicate with its immediate neighbours. Consequently, the network in Fig. 1 is a basic computational component for larger multi-intersection networks, without loss of generality.

Individually, a four-way intersection is the most common intersection in real life, and is therefore the most suitable to be considered in our approach. Even though the capacity of intersections might diverge, the queues that impact the intersections are within a certain range from the intersection. Therefore, considering the same maximum capacity for all intersections yields a more generic solution.

During each simulation time step, new vehicles are generated, as governed by a Poisson process, outside each outbound intersection. They are placed at the end of the queue of their respective destination lanes. No vehicles are generated at the central intersection. Based on our previous study in [18], eight phase-combination schemes are available to each intersection (Section 4.3.2). Vehicles in lanes denoted by even numbers can either go straight or turn right. Vehicles in the odd lanes turn left into their designated queues. All lanes can queue at most 40 vehicles. In general, vehicles cross either only one intersection or three intersections (outbound-central-outbound) prior to leaving the network. In our performance analysis, we collect statistics pertaining to vehicles which have crossed the central intersection.

4.3 Definition of the RL elements

An RL problem is defined once states, actions and rewards are clearly defined. To that end, we next describe these basic constructs in the context of our problem domain.

4.3.1 System states: At each simulation time step, the local state of an intersection is based on local traffic statistics. The state is represented by an eight-dimensional feature vector, with each element representing the relative traffic flow at one of the lanes. The relative traffic flow is defined as the total delay of vehicles in a lane divided by the average delay at all lanes in the intersection. For the outbound intersection agents, only local traffic statistics are considered (as suggested by the LQF algorithm). However, the central intersection agent is assumed to have access to all states of its neighbouring intersections. Intuitively, such additional information allows the central intersection to better predict upstream traffic flow, thereby improving its signal scheduling behaviour. There is a nonlinear relation between the action selection and the traffic statistics in the network. Given the large state space that is spanned under these assumptions, we employ a feedforward neural network providing nonlinear function approximation.
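As an illustration of the state just described, the following Python sketch builds the eight-element vector of relative traffic flows and shows one way a central agent might append its neighbours' vectors; variable names and the simple concatenation are assumptions for illustration, not the authors' code.

def local_state(lane_total_delays):
    # Relative traffic flow per lane: total delay in the lane divided by the
    # average delay over all (eight) lanes of the intersection.
    avg = sum(lane_total_delays) / len(lane_total_delays)
    if avg == 0:
        return [0.0] * len(lane_total_delays)
    return [d / avg for d in lane_total_delays]

def central_state(own_delays, neighbour_delays):
    # The central agent also observes the local states of its four neighbours;
    # concatenating them into one feature vector is an assumed representation.
    state = local_state(own_delays)
    for nd in neighbour_delays:
        state += local_state(nd)
    return state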
Actions are to be taken at regular intervals of time (20 time units, for example), innovating on the green-time constraints concept. In other words, smaller (minimum) intervals can be set at which a new action will take place, or at which the algorithm will continue to choose the same action until it no longer yields the best reward. Therefore, the proposed algorithm deviates from the established notion of coordination (split, cycle and offset) and explores the concept that the proposed RL-based multi-agent system can potentially guarantee better overall network performance, minimising the average delay, congestion and likelihood of intersection cross-blocking.

4.3.3 Reward function: For both types of intersection agents, a reward is provided to an intersection agent after executing a given action. The reward ranges from -1 to 1, where positive reward values are received if the current delay is lower than that of the previous time step. On the other hand, the agent is subject to a penalty (negative value) if an increased average delay is observed. In [19], a weighted exit traffic flows metric was used to quantify the reward. In our study, the local reward is directly affected by vehicle volume and vehicular delay at an intersection, which is defined as

The Q value is updated as follows

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\,r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\,]$    (2)

where $a_t$ is the action executed while in state $s_t$, leading to the subsequent state $s_{t+1}$ and yielding a reward $r_{t+1}$; $\alpha$ is the step-size parameter used in the incremental method described above, which changes from time step to time step; and $\gamma$ is the discount rate for the rewards.

With a continuous task, the reward to be maximised could easily grow to infinity. The discount rate $\gamma$ determines the present value of future rewards: a reward received n time steps in the future is worth $\gamma^{n-1}$ of its immediate value. In a vehicular traffic network, an intersection agent is charged with selecting a phase combination for a given traffic condition and updating the value function approximation using the resulting reward. In time, as the agent becomes proficient, the value function approximation error is expected to become marginal.
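As a concrete reading of update (2), the sketch below performs one tabular Q-Learning step in Python; the paper itself approximates Q with a feedforward neural network (Section 5), and the reward shown only mimics the sign convention described in Section 4.3.3 (positive when the average delay decreases, bounded to [-1, 1]), not the authors' exact expression.

from collections import defaultdict

Q = defaultdict(float)        # Q[(state, action)]; states must be hashable (e.g. tuples)
alpha, gamma = 0.1, 0.95      # step size (assumed); discount rate as quoted in Section 6.1

def q_update(s, a, r, s_next, actions):
    # One application of update (2).
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def reward(prev_avg_delay, curr_avg_delay):
    # Illustrative reward only: positive if the average delay decreased,
    # negative otherwise, clipped to [-1, 1]; not the paper's expression (1).
    return max(-1.0, min(1.0, prev_avg_delay - curr_avg_delay))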
Since the weights of the neural network are initially randomly assigned, the system is expected to be exposed to sufficient experience so as to gain proficiency. Therefore, a large exploration rate is necessary at the very beginning as a means of promoting exposure to more states. The exploration rate is reduced as the agent gains proficiency, to a point where very little exploration is required.

5.3 Convergence of Q-Learning with function approximation

Although Q-Learning in a tabular form has been proved to converge deterministically to an optimal policy [22], Q-Learning with function approximation can only be guaranteed to reach a suboptimal policy [23], primarily because of the limitations of the function approximation module. Having said that, several techniques can be effectively utilised to improve the overall performance of neural networks in the context of value function approximation, with the following techniques used in the setup described here.

1. Data standardisation: Large variance in input signal magnitudes would belittle the contributions of inputs with a relatively small dynamic range. To address this problem, the input values should be normalised within a specific range. For each element in the state vector, we impose a dynamic range of [-1, 1].

2. Activation function: Instead of using the common sigmoid function, an anti-symmetric sigmoid function which is centred at zero is taken as the activation function.

3. Learning rate adaptation: The learning rate controls how fast the Q value is updated. If the learning rate is smaller than the second-order derivative of the delta error, a local minimum can be found. A more efficient way to achieve this goal is to use the Boltzmann learning rate [24], which gradually decreases in time, proportionally to the reduction in the mean error. In our simulations, the learning rate is discounted by a fixed value and is lower-bounded by 0.01.

4. Momentum: During the learning process, should the error result in negligible changes to the state-action value estimates, an agent would assume it has found the optimal policy. In all other cases, a momentum element is used to modify the weight changes such that

$\omega_{k+1} = \omega_k + (1 - c_k)\,\Delta\omega_k^{\mathrm{bp}} + c_k(\omega_k - \omega_{k-1})$    (3)

where $\omega_k$ is the weight at time step k, $\Delta\omega_k^{\mathrm{bp}}$ the error gradient and $c_k$ the learning rate.

5. Weight decay: Because of the random initial values of the weights when constructing the neural network, a weight decay scheme has a similar impact to that of pruning methods and often leads to better performance. In our Q-Learning realisation, a simple weight decay approach was taken, such that $\omega_{\mathrm{new}} = (1 - \epsilon)\,\omega_{\mathrm{old}}$.

6 Simulation results

6.1 Simulation setup and parameters

All simulations were executed in the Matlab environment using a discrete-event framework. One time unit refers to one time step in the discrete simulation environment, and one discrete simulation time step is defined as one second. Intersection agents took actions once every 20 time units, and no more than one vehicle was allowed to traverse the intersection at each time unit. If a light signal transitions from red to green, there is a 2-unit red-clearance delay. Traffic arrivals followed the Poisson distribution with average arrival rates ranging from 0.1 to 1.0. As mentioned above, to evaluate the performance, we only collected statistics pertaining to vehicles which passed the central intersection. All simulations pertained to the five-intersection network described in Section 4.2. The duration of each simulation run was 20 000 time units. The weight decay factor was 0.05, and the Q-Learning discount factor ($\gamma$) was chosen to be 0.95.

6.2 Results and discussion

Promising results were reported by the authors in their previous work [18] pertaining to the case of a single intersection operating under the LQF scheme. Additional work on traffic signal scheduling (Section 3) is usually compared with a fixed-time strategy, which can be considered an unfair comparison for adaptive, responsive or intelligent systems, owing to its static nature. In contrast, the LQF algorithm was compared not only with an optimised fixed-time strategy but with a vehicle-actuated controller as well, yielding the best results at higher relative traffic loads. Accordingly, the comparison of the RL system with the novel LQF system is intrinsically valid. The primary goal of the simulations presented next was to contrast results drawn from a five-intersection traffic network performing solely under the novel LQF algorithm with the results obtained using the proposed CRL framework. Therefore, the RL-based agent was compared with one running the LQF algorithm, in which every agent only considers its own local traffic volume and thus controls its traffic signals in isolation. Prior to measuring and evaluating the performance of the central agent, the latter was given the opportunity to run for 10 000 steps so as to learn the environment with a decreasing exploration rate. Following the learning phase, the state-action value function updating scheme was allowed to continue with an exploration rate of 0.02. It should be noted that the outbound agents only apply the LQF algorithm without RL and are therefore not required to go through the learning phase.

An initial goal was to investigate the mean delay experienced by vehicles traversing the central intersection under either scheme. Fig. 2 depicts the mean delay per vehicle, averaged over ten simulation runs. As expected,
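To illustrate how the five techniques listed in Section 5.3 can be combined in a single weight update, the following Python sketch uses tanh as the zero-centred (anti-symmetric) activation, normalises inputs to [-1, 1], applies the momentum rule (3), multiplicative weight decay and a decaying learning rate lower-bounded by 0.01. The network shape and initial rates are illustrative assumptions, not the authors' Matlab implementation; only the 0.01 lower bound and the 0.05 decay factor are taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(8, 1))      # randomly initialised weights
w_prev = None
lr, lr_min, c, eps = 0.5, 0.01, 0.3, 0.05   # c plays the role of c_k in (3); eps is the decay factor

def normalise(x):
    # Data standardisation: squash each input feature into the range [-1, 1].
    return np.clip(x / (np.abs(x).max() + 1e-9), -1.0, 1.0)

def train_step(x, target):
    global w, w_prev, lr
    x = normalise(np.asarray(x, dtype=float)).reshape(-1, 1)
    y = np.tanh(w.T @ x)                      # anti-symmetric (zero-centred) activation
    grad = (y - target) * (1.0 - y ** 2) * x  # backprop gradient of the squared error
    step = (1.0 - c) * (-lr * grad)           # (1 - c_k) * delta_w^bp term of (3)
    if w_prev is not None:
        step += c * (w - w_prev)              # c_k * (w_k - w_{k-1}) momentum term of (3)
    w_prev = w.copy()
    w = (1.0 - eps) * (w + step)              # multiplicative weight decay
    lr = max(lr_min, 0.999 * lr)              # decaying, lower-bounded learning rate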
References

[3] SUTTON R., BARTO A.: 'Reinforcement learning: an introduction' (MIT Press, 1998)

[4] JAYAKRISHNAN R., MATTINGLY S., MCNALLY M.: 'Performance study of SCOOT traffic control system with non-ideal detectorization: field operational test in the city of Anaheim'. 80th Ann. Meeting of the Transportation Research Board, Washington, DC, 2001

[5] WOLSHON B., TAYLOR W.: 'Analysis of intersection delay under real-time adaptive signal control', Transp. Res. Part C, 1999, 7, pp. 53-72

[6] FEHON P.K.: 'Adaptive traffic signals are we missing the boat?'. ITE District 6 Ann. Meeting, DKS Associates, 2004

[7] SALKHAM A., CUNNINGHAM R., GARG A., CAHILL V.: 'A collaborative reinforcement learning approach to urban traffic control optimization'. Proc. 2008 IEEE/WIC/ACM Int. Conf. on Web Intelligence and Intelligent Agent Technology, Sydney, Australia, December 2008, pp. 560-566

[8] PAPAGEORGIOU M., DIAKAKI C., DINOPOULOU V., KOTSIALOS A., WANG Y.: 'Review of road traffic control strategies', Proc. IEEE, 2003, 91, (12), pp. 2043-2067

[9] PAPAGEORGIOU M., BEN-AKIVA M., BOTTOM J., BOVY P., HOOGENDOORN S., HOUNSELL N., KOTSIALOS A., MCDONALD M.: 'ITS and traffic management', in BARNHART C., LAPORTE G. (EDS.): 'Handbook in operations research & management science: transportation' (Elsevier, North Holland), vol. 14, pp. 715-774

[10] DIAKAKI C., PAPAGEORGIOU M., ABOUDOLAS K.: 'A multivariable regulator approach to traffic-responsive network-wide signal control', Control Eng. Pract., 2002, 10, pp. 183-195

[11] DE OLIVEIRA L.-B., CAMPONOGARA E.: 'Multi-agent model predictive control of signaling split in urban traffic networks', Transp. Res., 2010, 18C, (1), pp. 120-139

[14] ABDULHAI B.: 'Reinforcement learning for the true adaptive traffic signal control', J. Transp. Eng., 2003, 129, (3), pp. 278-285

[15] CAI C., WONG C.K., HEYDECKER B.G.: 'Adaptive traffic signal control using approximate dynamic programming', Transp. Res. Part C, 2009, 17, (5), pp. 456-474

[16] YU X.-H., RECKER W.W.: 'Stochastic adaptive control model for traffic signal systems', Transp. Res., 2006, 14C, (4), pp. 263-282

[17] GERSHENSON C.: 'Self-organizing traffic lights', Complex Syst., 2005, 16, (1), pp. 29-53

[18] WUNDERLICH R., LIU C., ELHANANY I., URBANIK T.: 'A novel signal scheduling algorithm with quality of service provisioning for an isolated intersection', IEEE Trans. Intell. Transp. Syst., 2008, 9, (3), pp. 536-547

[19] JACOB C., ABDULHAI B.: 'Automated adaptive traffic corridor control using reinforcement learning', Transp. Res. Rec.: J. Transp. Res. Board, 2006, 1959, pp. 1-8

[20] ALBUS J.S.: 'A theory of cerebellar function', Math. Biosci., 1971, 2, pp. 25-61

[21] BROOMHEAD D.S., LOWE D.: 'Multivariable functional interpolation and adaptive networks', Complex Syst., 1988, 2, pp. 321-355

[22] WATKINS C.J.C.H., DAYAN P.: 'Q-learning', Mach. Learn., 1992, 8, pp. 279-292

[23] HAYKIN S.: 'Neural networks: a comprehensive foundation' (Prentice-Hall, 1998, 2nd edn.), pp. 670-700

[24] YEGNANARAYANA B.: 'Artificial neural networks' (PHI Learning Pvt. Ltd, 2004), pp. 190-192