
World Applied Sciences Journal 5 (5): 525-530, 2008
ISSN 1818-4952
© IDOSI Publications, 2008

Coordination between Traffic Signals Based on Cooperative

1 S.S. Shamshirband, 2 H. Shirgahi, 3 M. Gholami and 4 B. Kia

1 Department of Computer, Islamic Azad University, Chalous Branch, Chalous, Iran
2 Department of Computer, Islamic Azad University, Jouybar Branch, Jouybar, Iran
3 Iranian Academic Center for Education, Culture and Research, Mazandaran Branch, Sari, Iran
4 Department of Science, Islamic Azad University, Chalous Branch, Chalous, Iran

Corresponding Author: Dr. S.S. Shamshirband, Department of Computer, Islamic Azad University, Chalous Branch, Chalous, Iran

Abstract: A single traffic signal control agent improves its control ability through multi-agent learning. This paper proposes a new cooperative learning method called weighted strategy sharing (WSS). In this method, each agent measures the expertness of its teammates, assigns a weight to their knowledge and learns from them accordingly. The presented methods are tested on three traffic lights. The effect of communication noise, as a source of uncertainty, on the cooperative learning method is also studied. Moreover, the Q-table of one of the cooperative agents is changed randomly and its effect on the presented methods is examined. Results using cooperative traffic agents are compared with results of control simulations in which non-cooperative agents were deployed. The results indicate that the new coordination method proposed in this paper is effective.

Key words: Multi-agent systems • Cooperative learning • Expertness • Q-learning

INTRODUCTION

The increase in urbanisation and traffic congestion creates an urgent need to operate our transportation systems with maximum efficiency. Real-time traffic signal control is an integral part of modern Urban Traffic Control Systems aimed at achieving optimal utilization of the road network. Providing effective real-time traffic signal control for a large, complex traffic network is an extremely challenging distributed control problem. Signal system operation is further complicated by the recent trend that views the traffic signal system as a small component of an integrated multimodal transportation system. Optimization of traffic signals and other control devices for the efficient movement of traffic on streets and highways constitutes a challenging part of the advanced traffic management system of intelligent transportation systems [1-6].

For a large-scale traffic management system, it may be difficult or impossible to tell whether the traffic network is flowing smoothly and to assess its current state. Over the past few years, multi-agent systems have become a crucial technology for effectively exploiting the increasing availability of diverse, heterogeneous and distributed information sources. Researchers over the years have adopted numerous techniques and used various tools to implement multi-agent systems for their problem domains. As researchers gain a better understanding of these autonomous multi-agent systems, more features are incorporated into them to enhance their performance and the enhanced systems can then be used for more complex application domains.

An intelligent software agent is an autonomous computer program which interacts with and assists an end user in certain computer-related tasks [1]. In any agent there is always a certain level of intelligence, which can vary from pre-determined roles and responsibilities to a learning entity. A multi-agent system is an aggregate of agents whose object is to decompose a large system into several small systems which communicate and coordinate with each other and can be extended easily.

Agent-based simulations are models in which multiple entities sense and stochastically respond to conditions in their local environments, mimicking complex large-scale system behavior [2]. The urban traffic system is a very complex system involving many entities, and the relationships among them are complicated. Therefore, the application of MAS to the simulation of traffic systems is suitable and efficient [3].

One of the most important issues for a learner agent is the assessment of the behavior and the intelligence level of the other agents. In addition, the learner agent must assign a relative weight to the other agents' knowledge and use it accordingly. In general, these issues are very complex and need careful attention. Therefore, this paper pays attention to finding some solutions for homogeneous, independent and cooperative Q-learning agents. In some studies a new cooperative learning strategy, called weighted strategy sharing (WSS), and some expertness measuring methods are introduced [3, 4].

Some studies assumed that the learner agents cooperate only with the more expert agents. They also assumed that the communication is perfect and that all of the agents are reliable, so that all of the agents could learn from each other. In addition, the effects of communication noise, as a source of uncertainty, on cooperative learning are studied. Moreover, the Q-table of one of the cooperative agents is changed randomly and its effects on the presented method are examined [5, 6].

In this paper, a traffic signal control agent is developed in an agent-based simulation environment and the coordination strategy between the control agents is introduced in detail. In the next section, WSS is briefly introduced and some expertness measures are presented. Section 3 introduces the details of the coordination between more than two traffic control agents. In Section 4, the effectiveness of the coordination strategy is demonstrated in the simulation system. Finally, the conclusion of this paper is given in Section 5.

TRAFFIC SIGNAL CONTROL AGENT (TSCA)

According to the difference in control scope, there are three methods for realizing the traffic light control agent:

• Every agent controls only a phase of an intersection [4-7]. In this situation, when there are many intersections in the road network, the number of agents is too large and, as a result, the communication and coordination between the agents is very complex.
• Every agent controls all the phases of an intersection [5-7]. A control agent of this kind can coordinate the benefit of all the phases of an intersection. The coordination between different intersections depends on social rules and game theory.
• Every agent controls an area of intersections [8]. The separation of the areas must be done first and is then hard to change. The shortcoming of this method is that it is not flexible.

We design our control agent on the basis of method (2); its model is shown in Fig. 1.

Fig. 1: Model of the control agent (interface agent, knowledge database, decision maker, communication module, coordination module, machine learning module, executive module and vehicle detector, connected to the neighboring control agents and the environment)

The process of control is as follows: first, the vehicle detector and the neighboring control agents send information to the agent; then, the agent makes a decision based on the received information and the knowledge it owns; finally, the decision is put into control action by the executive module.

In RL, an agent tries to maximize a scalar evaluation (reward or punishment) of its interaction with the environment. The goal of an RL system is to find an optimal policy which maps the state of the environment to an action which in turn will maximize the accumulated future rewards. Most RL techniques are based on Finite Markov Decision Processes (FMDP), assuming finite state and action spaces. The main advantage of RL is that it does not use any knowledge database, as most other forms of machine learning do, making this class of learning suitable for online learning. The main disadvantages are a longer convergence time and the lack of generalization among continuous variables; the latter is one of the most active research topics in RL [9-13].

The control actions of the traffic light control agent are 'Extend' or 'Terminate'. 'Extend' means to extend the original lamp state to the next time interval; 'Terminate' means to change the lamp state. We suppose that the states of the lamp are only green and red; the yellow state is eliminated.
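To make the two control actions concrete, the following minimal Python sketch shows one plausible reading of the control cycle described above; it is not the authors' implementation, and names such as LampState, Action and TSCA are illustrative assumptions. The agent receives detector and neighbor information, chooses 'Extend' or 'Terminate' and the executive step either keeps the lamp state for the next interval or flips it between green and red (no yellow).

from enum import Enum

class LampState(Enum):
    GREEN = "green"
    RED = "red"              # the yellow state is eliminated, as in the paper

class Action(Enum):
    EXTEND = "extend"        # keep the current lamp state for the next time interval
    TERMINATE = "terminate"  # change the lamp state

class TSCA:
    """Hypothetical traffic signal control agent skeleton (not the authors' code)."""

    def __init__(self):
        self.lamp = LampState.GREEN

    def decide(self, detector_info, neighbor_info):
        # Placeholder rule; the paper's agent uses fuzzy rules and Q-learning here.
        keep_green = detector_info.get("queue_current_green", 0) >= detector_info.get("queue_next_phase", 0)
        return Action.EXTEND if keep_green else Action.TERMINATE

    def execute(self, action):
        # Executive module: apply the chosen action to the lamp.
        if action is Action.TERMINATE:
            self.lamp = LampState.RED if self.lamp is LampState.GREEN else LampState.GREEN
        return self.lamp

if __name__ == "__main__":
    agent = TSCA()
    action = agent.decide({"queue_current_green": 3, "queue_next_phase": 7}, neighbor_info={})
    print(action, agent.execute(action))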

In this paper, the reward of the control agent is a fuzzy reward that determines whether to extend or terminate the current green phase based on a set of fuzzy rules. The fuzzy input variables are:

QC = average queue length on the lanes served by the current green, in veh/lane.
QN = average queue length on lanes with red which may receive green in the next phase, in veh/lane.
AR = average arrival rate on lanes with the current green, in veh/sec/lane.

Fig. 2: Fuzzy set for traffic flow

Fig. 3: Fuzzy set for delay time

The decision-making process is based on a set of fuzzy rules which take into account the traffic conditions of the current and next phases. The general format of the fuzzy rules is as follows:

If {QC is X1} and {AR is X2} and {QN is X3} Then {E or T}.

where X1, X2, X3 are natural-language expressions of the traffic conditions of the respective variables.

The Q-value is a function of the main factors influencing the control strategy: the traffic flow of the green phase (AR), the number of waiting vehicles in the red phase (QN) and the average queue length on the lanes served by the current green, in veh/lane (QC). The Q-value can then be determined by the following function:

\hat{Q} = f((AR, QN, QC), a, \theta)    (1)

where (AR, QN, QC) is the input state, a is the chosen action and \theta is the weight vector of the neural network.

The probability of choosing action a is determined by the following function:

P_a = \frac{e^{Q(a)/\tau}}{\sum_{b=1}^{n} e^{Q(b)/\tau}}    (2)

where n is the number of actions, Q(a) is the evaluation value of action a and \tau is a positive number called the temperature. The higher the temperature, the more uniformly every action is selected.

Fig. 4: Weighted strategy sharing

COORDINATION MECHANISM

WSS method: In the WSS method [16] (Fig. 4), it is assumed that n homogeneous one-step Q-learning agents learn in some distinct environments and no hidden state is produced [10-17].

The agents learn in two modes: individual learning mode and cooperative learning mode (Fig. 5). At first, all of the agents are in individual learning mode. Agent i executes t_i learning trials. Each learning trial starts from a random state and ends when the agent reaches the goal. After a specified number of individual trials, all agents switch to cooperative learning mode.
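As a bridge to the algorithm in Fig. 5, the following is a minimal, hypothetical Python sketch (not the authors' implementation; all function and variable names are illustrative assumptions) of one individual-learning step of a TSCA: the state is a discretized (QC, QN, AR) triple, the action is chosen with the Boltzmann rule of Eq. (2) and the Q-table is updated with the one-step Q-learning rule used in line (12) of Fig. 5.

import math
import random
from collections import defaultdict

ACTIONS = ["extend", "terminate"]

# Q-table over discretized (QC, QN, AR) states; unseen (state, action) pairs default to 0.
Q = defaultdict(float)

def select_action(state, tau=0.5):
    """Boltzmann (softmax) action selection, Eq. (2): P(a) = exp(Q(s,a)/tau) / sum_b exp(Q(s,b)/tau)."""
    prefs = [math.exp(Q[(state, a)] / tau) for a in ACTIONS]
    total = sum(prefs)
    r, acc = random.random() * total, 0.0
    for a, p in zip(ACTIONS, prefs):
        acc += p
        if r <= acc:
            return a
    return ACTIONS[-1]

def q_update(state, action, reward, next_state, beta=0.1, gamma=0.9):
    """One-step Q-learning update, as in line (12) of Fig. 5:
       Q(x, a) := (1 - beta) Q(x, a) + beta (r + gamma * max_b Q(y, b))."""
    v_next = max(Q[(next_state, b)] for b in ACTIONS)
    Q[(state, action)] = (1 - beta) * Q[(state, action)] + beta * (reward + gamma * v_next)

if __name__ == "__main__":
    # Illustrative use: the state is a tuple of coarse linguistic bins of (QC, QN, AR).
    s = ("QC_long", "QN_short", "AR_high")
    a = select_action(s)
    r = 1.0 if a == "extend" else -0.5       # stand-in for the fuzzy reward
    s_next = ("QC_medium", "QN_short", "AR_high")
    q_update(s, a, r, s_next)
    print(a, Q[(s, a)])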

(1)  Initialize
(2)  While not end of learning do
(3)  Begin
(4)    If in individual learning mode then
(5)    Begin individual learning
(6)      x_i ← Find Current State()
(7)      a_i ← Select Action(x_i)
(8)      Do Action(a_i)
(9)      r_i ← Get Reward()
(10)     y_i ← Go To Next State()
(11)     V(y_i) ← max_{b ∈ actions} Q(y_i, b)
(12)     Q_i^{new}(x_i, a_i) := (1 − β_i) Q_i^{old}(x_i, a_i) + β_i (r_i + γ_i V(y_i))
(13)     e_i ← Update Expertness(r_i)
(14)   End
(15)   Else Cooperative Learning
(16)   Begin
(17)     For j := 1 to n do
(18)       e_j ← Get Expertness(A_j)
(19)     Q_i^{new} ← 0
(20)     For j := 1 to n do
(21)     Begin
(22)       W_{ij} ← Compute Weights(i, j, e_1 … e_n)
(23)       Q_j^{old} ← GetQ(A_j)
(24)       Q_i^{new} ← Q_i^{new} + W_{ij} · Q_j^{old}
(25)     End

Fig. 5: Weighted strategy sharing algorithm for agent A_i

In cooperative learning mode, each learning agent assigns some weights to the other agents' Q-tables with respect to their relative expertness. Then, each agent takes the weighted average of the others' Q-tables and uses the resulting table as its new Q-table:

Q_i^{new} \leftarrow \sum_{j=1}^{n} W_{ij} \, Q_j^{old}    (3)

Expertness criteria: In the WSS method, W_{ij} is a measure of agent i's reliance on the knowledge and experience of agent j. Here we argue that this weight is a function of the agents' relative expertness. In the simple strategy-sharing method, the expertness of the agents is assumed to be equal. Some studies used user judgment for specifying the expert agent; this method requires continuous human supervision. However, some studies specified the expert agents by means of their successes and failures during current moves and considered the expertness criterion to be an algebraic sum of the reinforcement signals in that time interval. This means that more successes and fewer failures are considered a sign of a higher degree of expertness. This expertness measuring method is not optimal in some situations. For example, an agent that has faced many failures has some useful knowledge to be learned from it. In other words, it is possible that this agent does not know the ways of arriving at the goal, but it is aware of those not leading to its target and can avoid them. Also, an agent at the beginning of its learning process is less expert than those that have learned for a longer time and has naturally confronted more failures. Considering these discussions, one expertness measure is introduced; it is the following [15-17].

A) Absolute (Abs): the sum of the absolute values of the reinforcement signals,

e_i^{Abs} = \sum_{t=1}^{now} | r_i(t) |    (4)

Abs considers both rewards and punishments as a sign of being experienced.

B) Weight assigning mechanisms

Learning from All (LA): It can be said that all agents have some valuable knowledge to be learned. When using all agents' knowledge, the simplest formula with which the learner can assign a weight to agent j's knowledge is

W_{ij} = \frac{e_j}{\sum_{k=1}^{n} e_k}    (5)

where n is the number of agents and e_k is the amount of expertness of agent k. In this method, the effect of agent j's knowledge on all learners is equal, i.e.

W_{1j} = W_{2j} = ... = W_{nj}

Also, all of the Q-tables become homogeneous after each cooperation step.
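A compact, hypothetical Python sketch of this cooperative step (again not the authors' implementation; all names are illustrative) combines the Abs expertness of Eq. (4), the Learning-from-All weights of Eq. (5) and the weighted Q-table merge of Eq. (3):

from collections import defaultdict

def abs_expertness(reward_history):
    """Eq. (4): e_i^Abs = sum_t |r_i(t)| -- both rewards and punishments count as experience."""
    return sum(abs(r) for r in reward_history)

def la_weights(expertness):
    """Eq. (5): W_ij = e_j / sum_k e_k; the same weight vector is used by every learner."""
    total = sum(expertness)
    if total == 0:
        return [1.0 / len(expertness)] * len(expertness)   # fall back to equal weights
    return [e / total for e in expertness]

def wss_merge(q_tables, weights):
    """Eq. (3): Q_i_new <- sum_j W_ij * Q_j_old, computed entry-wise over all (state, action) keys."""
    merged = defaultdict(float)
    for w, q in zip(weights, q_tables):
        for key, value in q.items():
            merged[key] += w * value
    return merged

if __name__ == "__main__":
    # Two illustrative agents with toy Q-tables and reward histories.
    q1 = {("s0", "extend"): 1.0, ("s0", "terminate"): 0.2}
    q2 = {("s0", "extend"): 0.4, ("s0", "terminate"): 0.9}
    e = [abs_expertness([1.0, -0.5, 1.0]), abs_expertness([0.5, 0.5])]
    w = la_weights(e)                     # here w = [2.5/3.5, 1.0/3.5]
    new_q = wss_merge([q1, q2], w)        # every agent would adopt this same merged table
    print(w, dict(new_q))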

IMPLEMENTATION

We have constructed a prototype traffic simulator program to test the efficiency of the coordination mechanism we proposed. The programming language we used to build the simulator is VC#.Net.

The simulator prototype: The simulator prototype is programmed mainly to verify the efficiency of the coordination mechanism proposed in this paper. The traffic environment includes 2-lane roads, 3 intersections, traffic light control agents and vehicles. The main reason we chose only 3 intersections is that the computational complexity of more than 3 intersections is too high and the work of this paper is just an exploration. Further study should be done in the future to simulate the coordination among more than three intersections.

RESULT AND DISCUSSION

The road network in the simulator is shown in Fig. 5.

Fig. 5: The road network

In this study it is supposed that there are only two phases at each of the three intersections. The proportion of vehicles turning left is 0.2 and the default vehicle input is 40 vehicles.

Figures 6 and 7 show the results of the simulation. From Fig. 6 we can see that the coordination mechanism proposed in this paper is efficient, especially when the traffic flow in the horizontal direction is much greater than in the vertical direction.

Fig. 6: Comparison of the results with and without coordination (total delay time vs. cycle)

Fig. 7: Comparison of the proposed method with SCAT and fixed split time (total delay time vs. cycle)

CONCLUSION

In this paper, one weight-assigning procedure for the Weighted Strategy Sharing (WSS) method was introduced. Also, some criteria to measure the expertness of the agents were presented. The introduced methods were tested on the traffic lights problem. Detection of agents with incorrect knowledge and minimizing their effects on cooperative group learning is another direction for future research. To make the mechanism suitable for more intersections, the algorithm should be optimized to reduce the learning time of the TSCAs. The simulator prototype of this paper is only a preliminary system; to become a more complete and universal traffic simulator, many of its elements should be improved in future work.

REFERENCES

1. Gary, S., H. Tan and K.L. Hui, 1998. Applying intelligent agent technology as the platform for simulation. Proceedings of the Simulation Symposium, pp: 180-187.


2. Sanchez, S.M. and T.W. Lucas, 1972. Exploring the world of agent based simulations: simple models, complex analyses. Proceedings of the Simulation Conference, pp: 116-126.
3. Zhongzhi, S., 1998. The Advanced Artificial Intelligence. Beijing: Science Press, Chapter 10, pp: 223-226.
4. Pan, G.C. and B. Maddox, 1995. A Framework for Distributed Reinforcement Learning. Lecture Notes in Artificial Intelligence, Adaptation and Learning in Multi-Agent Systems, pp: 97-112.
5. Liu, Z., 2007. A Survey of Intelligence Methods in Urban Traffic Signal Control. International Journal of Computer Science and Network Security, 7(7): 105-112.
6. Lucia, A. and C. Bazan, 1994. Traffic Signal Coordination Based on Distributed Problem Solving. 7th Fanfares Symposium on Transportation Systems: Theory and Application of Advanced Technology, Tianjin, China, pp: 957-962.
7. Ming, L.X. and F.Y. Wang, 2001. Study of city area traffic coordination control on the basis of agent. Proceedings of the IEEE Intelligent Transportation Systems Conference, Singapore, September, pp: 758-761.
8. Ossowski, S., J. Cuena and A. Garcia, 1998. A Case of Multiagent Decision Support: Using autonomous agents for urban traffic control. Lecture Notes in Artificial Intelligence, 1484: 100-111.
9. Chen, W. and K. Decker, 2004. Developing Alternative Mechanisms for Multiagent Coordination. Lecture Notes in Computer Science, Springer-Verlag Heidelberg, 2413: 63-76.
10. Sutton, R.S., 1988. Learning to predict by the methods of temporal differences. Machine Learning, 3: 9-44.
11. Sutton, R.S., 1998. Machine Learning: Special Issue on Reinforcement Learning. Cambridge, MA: MIT Press, 8: 3-4.
12. Yamaguchi, T., Y. Tanaka and M. Yachida, 1997. Speed up reinforcement learning between two agents with adaptive mimetism. Proc. IEEE Conf. Intl. Robot. Syst. (IROS), pp: 594-600.
13. Friedrich, H., M. Kaiser, O. Ragalla and R. Dillmann, 1996. Learning and communication in multi-agent systems. In: Distributed Artificial Intelligence Meets Machine Learning, Weiss, G. (Ed.). New York: Springer-Verlag, 1221: 259-275.
14. Watkins, C.J., 1989. Learning from delayed rewards. Ph.D. dissertation, King's College, Cambridge, UK.
15. Watkins, C.J. and P. Dayan, 1998. Q-learning (technical note). Machine Learning: Special Issue on Reinforcement Learning. Cambridge, MA: MIT Press, pp: 55-68.
16. Tan, M., 1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. In: Proc. 10th Intl. Conf. Machine Learning, Amherst, MA.
17. Friedrich, H., 1996. Learning and communication in multi-agent systems. In: Distributed Artificial Intelligence Meets Machine Learning, Weiss, G. (Ed.). New York: Springer-Verlag, 1221: 259-275.
