

Proceedings of Innovative Application Research and Education (ICIARE2014)

A Reinforcement Learning System with Neuro-Fuzzy Network and its Applications

Takashi KUREMOTO 1, Masanao OBAYASHI 1, Kunikazu KOBAYASHI 2, Shingo MABU 1

1 Graduate School of Science and Engineering, Yamaguchi University, Japan
2 School of Information Science and Technology, Aichi Prefectural University, Japan
1 {wu, m.obayas, mabu}@yamaguchi-u.ac.jp, 2 [email protected]

Abstract: It is important for intelligent systems such as autonomous robots and software agents to distinguish different states from their input information. In this paper, a self-organizing fuzzy neural network (SOFNN) which can deal with this problem is introduced. The SOFNN can be applied to time series forecasting, adaptive action learning of agents, and other fields. As a reinforcement learning system, the SOFNN can use Kimura et al.'s SGA (stochastic gradient ascent) as its learning rule to solve problems in partially observable Markov decision processes (POMDPs). In this paper, the reinforcement learning system with the SOFNN and the SGA learning algorithm is introduced, and applications to agent behavior learning and time series prediction are reported.

Key-Words: Self-Organizing Fuzzy Neural Network (SOFNN), Reinforcement Learning (RL), Stochastic Gradient Ascent (SGA).

1. Introduction

The development of intelligent systems has been making an impact on society for decades. Autonomous robots, for example, have recently been entering our daily lives, and artificial intelligence is showing its attraction and power to society ever more clearly.

Facing an unknown or variable environment, an intelligent system is required to identify the situation, extract the knowledge hidden in its observed information, and decide on adaptive actions to accomplish its tasks. Tasks for intelligent systems, for example hunting a prey, finding the shortest path through a maze, or predicting the future value of a time series, usually involve uncertain elements. In other words, intelligence, or "know-how", needs to be obtained through the system design and the learning algorithm of the autonomous system.

Reinforcement learning (RL), an active unsupervised machine learning method, has shown its usefulness in the fields of adaptive control, system identification, pattern recognition, time series prediction, and so on for several decades [1] [2]. In this paper, we show how an RL system can be applied to different fields such as adaptive behavior learning of an agent (or autonomous mobile robot) [3]-[8], swarm learning of agents [5]-[8], and time series forecasting [9]-[11]. The RL system was proposed in our previous works; it uses a self-organizing fuzzy neural network (SOFNN) as a state classifier and the stochastic gradient ascent (SGA) learning algorithm proposed by Kimura et al. [12].

2. RL system with SOFNN and SGA

2.1 SOFNN

A self-organizing fuzzy neural network (SOFNN) was proposed by Obayashi et al. [3] [4] (Figure 1). For an n-dimensional input state space x(t) = (x_1(t), x_2(t), ..., x_n(t)), a fuzzy inference net is designed with a hidden layer composed of units of fuzzy membership functions B_i^k(x_i(t)), i.e., Eq. (1), to classify the input states:

  B_i^k(x_i(t)) = \exp\left( -\frac{(x_i(t) - c_i^k)^2}{2 (\sigma_i^k)^2} \right)    (1)

Here c_i^k and \sigma_i^k denote the mean and the deviation of the kth membership function of the ith input dimension x_i(t), respectively.

Let K(t) be the number of fuzzy rules; then for each rule k = 1, ..., K(t) we have Eq. (2):

  if ( x_1(t) is B_1^k(x_1(t)), ..., x_n(t) is B_n^k(x_n(t)) ) then
  \lambda^k(x(t)) = \prod_{i=1}^{n} B_i^k(x_i(t))    (2)
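As an illustration of Eqs. (1) and (2), the following is a minimal Python/NumPy sketch (not from the original paper; array names such as centers and widths are hypothetical) that evaluates the Gaussian memberships and the firing strength of each fuzzy rule.

```python
import numpy as np

def membership(x, centers, widths):
    """Eq. (1): Gaussian membership B_i^k(x_i) for every rule k and input dimension i.
    x: (n,) input state; centers, widths: (K, n) arrays holding c_i^k and sigma_i^k."""
    return np.exp(-((x - centers) ** 2) / (2.0 * widths ** 2))

def rule_strength(x, centers, widths):
    """Eq. (2): firing strength lambda^k(x) = prod_i B_i^k(x_i) of each of the K rules."""
    return membership(x, centers, widths).prod(axis=1)

# Example: two rules over a 2-dimensional input state
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.full((2, 2), 0.5)                 # empirically fixed widths
lam = rule_strength(np.array([0.2, 0.1]), centers, widths)
```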

To determine the number of membership functions and rules of the fuzzy net, the self-organizing fuzzy neural network (SOFNN) constructs adaptive membership functions and rules automatically, driven by the training data and thresholds [5] [6]. The self-organizing process of the SOFNN is given as follows.

Only one membership function is generated by the first input data (for example, the position of the agent); the value of its center equals the value of the input data, and the width of all Gaussian function units is fixed to an empirical value. The number of rules is one at first, and the output of the rule R_1 equals \lambda^1(x(1)) = \prod_{i=1}^{n} B_i^1(x_i(1)) according to Eq. (2).

For the next input state x(t) = (x_1(t), x_2(t), ..., x_n(t)), a new membership function is generated if Eq. (3) is satisfied:

  \max_s B_{i,s}(x_i(t)) \le F    (3)

Here B_{i,s}(x_i(t)) denotes the value of the existing membership functions calculated by Eq. (1), s = 1, 2, ..., L_i(t) indicates the sth membership function with the maximum number L_i(t), and F denotes a threshold value deciding whether an input state is evaluated well enough by the existing membership functions. A new rule is generated automatically when a new membership function is added, according to Eq. (2). Iteratively, the fuzzy net is completed to adapt to the input states.

Figure 1 A self-organizing fuzzy neural network (SOFNN)

Eq. (2) plays the role of state classification. The outputs of the fuzzy rule nodes are summed with modifiable weights to form the output of the system. In the case of Figure 1, the outputs are used as the parameters of a Gaussian distribution function, \mu and \sigma [9]-[11]. They can also be the value function of actions or states [5] [6], or the value function of a state-action pair (Q value) [7].
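The self-organizing step described above can be sketched as follows; this is a simplified illustration that assumes one membership function per rule and per input dimension, and the threshold F and the fixed width sigma0 are placeholder values, not taken from the paper.

```python
import numpy as np

def grow_sofnn(x, centers, widths, F=0.3, sigma0=0.5):
    """Add a membership function (and hence a rule) centred on the input x
    when Eq. (3) holds for some dimension, i.e. max_s B_{i,s}(x_i) <= F."""
    if centers.shape[0] == 0:                 # first input: first rule, centre = input value
        return x[np.newaxis, :].copy(), np.full((1, x.size), sigma0)
    B = np.exp(-((x - centers) ** 2) / (2.0 * widths ** 2))   # Eq. (1), shape (K, n)
    if np.any(B.max(axis=0) <= F):            # no existing unit covers x well enough
        centers = np.vstack([centers, x])
        widths = np.vstack([widths, np.full(x.size, sigma0)])
    return centers, widths
```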


wi t   (rt  b) Di t  (5) autonomous agents (robots) explored the minimum



Di t   ln  yˆ t  1, w, xt   Di t  1 (6)
w path from their fixed start position to a goal area
 (0    1) i is a discount factor, wi
denotes ith internal variable vector, (Figure 4). Four direction actions, up, down, left, right
b denotes the reinforcement baseline.
Step 5. For next time step t+1, return to can be observed and moved to them by the agent.
step 1.
・ ・
The finish condition of training iteration is ・
Learner
usually given by the convergence of Eq. (5).
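A minimal sketch of one SGA update (Steps 4-5 with Eqs. (4)-(6)) is given below; the gradient grad_log_pi of ln \pi with respect to the internal variables depends on the concrete policy of Eqs. (7)-(9) and is passed in as an argument here, and the numeric defaults are placeholders.

```python
import numpy as np

def sga_update(w, D, grad_log_pi, r, b=0.0, gamma=0.9, alpha=0.01):
    """One SGA step for the internal variables w of the SOFNN.
    grad_log_pi: vector of d/dw_i ln pi(y_hat(t+1), w, x(t)) at the chosen output."""
    D = grad_log_pi + gamma * D        # Eq. (6): eligibility trace D_i(t)
    delta_w = (r - b) * D              # Eq. (5): eligibility scaled by the baselined reward
    w = w + alpha * delta_w            # Eq. (4): stochastic gradient ascent
    return w, D
```

In a training loop, Steps 1-3 supply x(t), ŷ(t+1), and r_t; the returned w and D are carried over to the next time step, and training stops when Eq. (5) converges.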

3. Applications

In this paper, we show how the system with the SOFNN and SGA can efficiently solve the above problem in the unknown-environment exploration of multiple agents.

3.1 SOFNN with SGA

The relationship between the RL system and its environment is shown in Figure 2.

Figure 2 The SOFNN used as a part of the internal model of an autonomous robot, i.e., an RL system

The policy of action selection is given by an asymmetric probability distribution function (APDF) [3] [4]:

  \pi(n_a) = \begin{cases} p \,\beta\, e^{\beta (n_a - \mu)} & (n_a \le \mu) \\ (1 - p) \,\beta\, e^{-\beta (n_a - \mu)} & (n_a > \mu) \end{cases}    (7)

  p = \frac{\sum_k w_k^p \lambda^k(x(t))}{\sum_k \lambda^k(x(t))}    (8)

  \mu = \frac{\sum_k w_k^\mu \lambda^k(x(t))}{\sum_k \lambda^k(x(t))}, \qquad \beta = \frac{\sum_k w_k^\beta \lambda^k(x(t))}{\sum_k \lambda^k(x(t))}    (9)
where n_a is a random number provided by the probability \pi(n_a), \lambda^k(x(t)) is given by Eq. (1) and Eq. (2), and w_k^p, w_k^\mu, w_k^\beta are the connection weights between the fuzzy rules and the parameters, for fuzzy rules k = 1, 2, ..., K(t).

Figure 3 shows a sample of the APDF, which decides the selection probability of the candidate actions. When a random number z in (0.0, 1.0) is generated according to a uniform distribution, n_a is given as follows:

  n_a = \begin{cases} \frac{1}{\beta} \ln\frac{z}{p} + \mu & (0 < z \le p) \\ -\frac{1}{\beta} \ln\frac{1 - z}{1 - p} + \mu & (p < z < 1.0) \end{cases}    (10)

Then the probabilities of the candidate actions may be assigned to different areas of the distribution, as shown in Figure 3.

Figure 3 An asymmetric probability distribution function
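Putting Eqs. (8)-(10) together, action selection can be sketched as follows (Python, with hypothetical weight names w_p, w_mu, w_beta; the mapping from the sampled n_a to a concrete action follows the area labelling of Figure 3 and is omitted here).

```python
import numpy as np

def apdf_params(lam, w_p, w_mu, w_beta):
    """Eqs. (8)-(9): p, mu, beta as rule-strength-weighted averages of the connection weights."""
    s = lam.sum()
    return w_p @ lam / s, w_mu @ lam / s, w_beta @ lam / s

def sample_na(lam, w_p, w_mu, w_beta, rng=np.random):
    """Eq. (10): inverse-transform sampling of n_a from the APDF of Eq. (7)."""
    p, mu, beta = apdf_params(lam, w_p, w_mu, w_beta)
    z = rng.uniform(0.0, 1.0)
    if z <= p:
        return mu + np.log(z / p) / beta
    return mu - np.log((1.0 - z) / (1.0 - p)) / beta
```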
3.2 Experiments and Results

The experiment was a simulation in which two autonomous agents (robots) explored the shortest path from their fixed start positions to a goal area (Figure 4). The four neighboring cells in the directions up, down, left, and right can be observed by an agent, and the agent can move to them. So the input to the SOFNN had 4 dimensions, each with the value 0 (aisle) or 1 (occupied). Reaching the goal area gave a high reward of 100.0, and a crash into an obstacle, another agent, or a wall gave -1.0. When a suitable distance between the 2 agents was considered as a positive reward and added into Eq. (5), the learning method is called "swarm learning"; "individual learning" means learning without this constraint.

Figure 4 An environment with obstacles and goals
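As a hedged illustration of the reward setting just described (the goal reward of 100.0 and the crash penalty of -1.0 are from the text; the distance range and bonus for the swarm-learning term are assumptions):

```python
def reward(reached_goal, crashed, dist_to_other=None, d_min=2.0, d_max=4.0, bonus=1.0):
    """Reward r_t for the maze simulation: +100.0 at the goal, -1.0 for a crash.
    For swarm learning, a positive term is added when the inter-agent distance is
    suitable; d_min, d_max, and bonus are illustrative values, not from the paper."""
    r = 100.0 if reached_goal else (-1.0 if crashed else 0.0)
    if dist_to_other is not None and d_min <= dist_to_other <= d_max:
        r += bonus                     # swarm-learning term that enters Eq. (5) through r_t
    return r
```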

The learning performances of the different learning methods are shown in Figure 5. Swarm learning showed its superiority over individual learning with faster convergence. The trajectories of the learning results are shown in Figure 6. All agents found the goal and avoided the obstacle in the center of the environment; however, swarm learning reached it in fewer steps (232) than individual learning (327).

Figure 5 Learning performance

Figure 6 Trajectories of 2 agents: swarm learning result (left) and individual learning result (right)

4. Conclusion

An RL system with a SOFNN and SGA was introduced and applied, for the first time, to the swarm learning of multiple autonomous agents. Simulations of unknown-environment exploration using the proposed method were performed, and the results showed the effectiveness of the method. Additionally, actions that result in suitable distances between agents were obtained by the swarm learning method in the simulation.

Acknowledgement

A part of this work was supported by Grant-in-Aid for Scientific Research of JSPS (No. 25330287 and No. 26330254).

References

[1] R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction, The MIT Press, Cambridge, 1998.
[2] L. P. Kaelbling, M. L. Littman, and A. W. Moore: Reinforcement Learning: A Survey, Journal of Artificial Intelligence Research, Vol. 4, 1996, pp. 237-285.
[3] M. Obayashi, A. Iseki, and K. Umesako: Self-organized reinforcement learning using fuzzy inference for stochastic gradient ascent method, Proceedings of the International Conference on Control, Automation and Systems (ICCAS 2001), pp. 735-738, 2001.
[4] M. Obayashi, T. Kuremoto, and K. Kobayashi: A self-organized fuzzy-neuro reinforcement learning system for continuous state space for autonomous robots, Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA'08), pp. 552-559, 2008.
[5] T. Kuremoto, M. Obayashi, and K. Kobayashi: Adaptive swarm behavior acquisition by a neuro-fuzzy system and reinforcement learning algorithm, International Journal of Intelligent Computing and Cybernetics, Vol. 2, No. 4, 2009, pp. 724-744.
[6] T. Kuremoto, Y. Yamano, M. Obayashi, and K. Kobayashi: An improved internal model for swarm formation and adaptive swarm behavior acquisition, Journal of Circuits, Systems, and Computers, Vol. 18, No. 8, 2009, pp. 1517-1531.
[7] T. Kuremoto, Y. Yamano, L.-B. Feng, K. Kobayashi, and M. Obayashi: A fuzzy neural network with reinforcement learning algorithm for swarm learning, Lecture Notes in Electrical Engineering (LNEE), Vol. 144, 2011, pp. 101-108.
[8] T. Kuremoto, M. Obayashi, and K. Kobayashi: Neuro-Fuzzy Systems for Autonomous Mobile Robots. In Horizons in Computer Science Research, Vol. 8 (ed. Thomas S. Clary), pp. 67-90, Nova Science Publishers, 2013.
[9] T. Kuremoto, M. Obayashi, A. Yamamoto, and K. Kobayashi: Neural Prediction of Chaotic Time Series Using Stochastic Gradient Ascent Algorithm. In Proceedings of the 35th ISCIE International Symposium on Stochastic Systems Theory and Its Applications (SSS'03), pp. 17-22, 2003.
[10] T. Kuremoto, M. Obayashi, and K. Kobayashi: Forecasting Time Series by SOFNN with Reinforcement Learning. In Proceedings of the 27th Annual International Symposium on Forecasting (ISF 2007), p. 99, 2007.
[11] T. Kuremoto, M. Obayashi, and K. Kobayashi: Neural Forecasting Systems. In Reinforcement Learning: Theory and Applications (Eds.: C. Weber, M. Elshaw, and N. M. Mayer), pp. 1-20, In-Tech (Online Open Access), 2008.
[12] H. Kimura, M. Yamamura, and S. Kobayashi: Reinforcement learning in partially observable Markov decision processes: a stochastic gradient method, Journal of the Japanese Society for Artificial Intelligence, Vol. 11, No. 5, 1996, pp. 85-92 (in Japanese).
