A Reinforcement Learning System With Neuro-Fuzzy Network
T. Kuremoto 1, M. Obayashi 1, K. Kobayashi 2, S. Mabu 1
1 Graduate School of Science and Engineering, Yamaguchi University, Japan
2 School of Information Science and Technology, Aichi Prefectural University, Japan
{wu, m.obayas, mabu}@yamaguchi-u.ac.jp, [email protected]
Abstract: It is important for intelligent systems such as autonomous robots and software agents to distinguish different states from their input information. In this paper, a self-organizing fuzzy neural network (SOFNN) which can deal with this problem is introduced. SOFNN can be applied to time series forecasting, adaptive action learning of agents, and other fields. As a reinforcement learning system, SOFNN can use Kimura et al.'s SGA (Stochastic Gradient Ascent) as its learning rule to solve problems in partially observable Markov decision processes (POMDPs). In this paper, the reinforcement learning system with SOFNN and the SGA learning algorithm is introduced, and applications to behavior learning of agents and to time series prediction are reported.
Key-Words: Self-Organizing Fuzzy Neural Network (SOFNN), Reinforcement Learning (RL), Stochastic Gradient Ascent
(SGA).
1. Introduction
The development of intelligent systems has had an impact on society for decades. Autonomous robots, for example, have recently been entering our daily lives, and artificial intelligence is showing its attraction and power to society more and more clearly.
Facing an unknown or variable environment, an intelligent system is required to identify the situation, extract the knowledge hidden in its observed information, and decide on adaptive actions to accomplish its tasks. Tasks for intelligent systems, for example, hunting a prey, finding the shortest path through a maze, or predicting the future value of a time series, usually involve uncertain elements. In other words, intelligence or “know-how” needs to be obtained through the system design and the learning algorithm of the autonomous system.
Reinforcement learning (RL), an active unsupervised machine learning method, has shown its usefulness in fields such as adaptive control, system identification, pattern recognition, and time series prediction for several decades [1] [2]. In this paper, we show how an RL system can be applied to different fields such as adaptive behavior learning of an agent (or autonomous mobile robot) [3]-[8], swarm learning of agents [5]-[8], and time series forecasting [9]-[11]. The RL system was proposed in our previous works; it uses a self-organizing fuzzy neural network (SOFNN) as a state classifier together with the stochastic gradient ascent (SGA) learning algorithm proposed by Kimura et al.

2. RL system with SOFNN and SGA
2.1 SOFNN
A self-organizing fuzzy neural network (SOFNN) was proposed by Obayashi et al. [1] (Figure 1).
For an n-dimensional input state x(t) = (x_1(t), x_2(t), ..., x_n(t)), a fuzzy inference net is designed with a hidden layer composed of units of fuzzy membership functions B_i^k(x_i(t)), i.e., Eq. (1), to classify the input states.

B_i^k(x_i(t)) = exp( -(x_i(t) - c_i^k)^2 / (2 (σ_i^k)^2) )    (1)

Here c_i^k and σ_i^k denote the mean and the deviation of the kth membership function corresponding to the ith dimension of the input x_i(t), respectively.

For the next input state x(t) = (x_1(t), x_2(t), ..., x_n(t)), a new membership function is generated if Eq. (3) is satisfied.

max_s B_{i,s}(x_i(t)) < F    (3)

Here B_{i,s}(x_i(t)) denotes the value of the existing membership functions calculated by Eq. (1), s = 1, 2, ..., L_i(t) indicates the sth membership function with the maximum number L_i(t), and F denotes a threshold value of whether an input state is evaluated well enough by the existing membership functions. A new rule is generated automatically when a new membership function is added according to Eq. (2). Iteratively, the fuzzy net is completed to adapt to the input states.

The learning procedure with SGA is summarized by the following steps:
Step 1. Observe an input x(t) from the environment or from the data of a time series.
Step 2. Predict a future datum or an adaptive action ŷ(t+1) according to a probability π(ŷ(t+1), w, x(t)).
Step 3. Receive the immediate reward r_t from the environment or by calculating the prediction error.
Step 4. Improve the policy π(ŷ(t+1), w, x(t)) by renewing its internal variable w according to Eq. (4).

w ← w + α_s Δw(t)    (4)

Here Δw(t) = (Δw_1(t), Δw_2(t), ..., Δw_i(t), ...) denotes the updates of the synaptic weights and other internal variables of SOFNN, and α_s is a positive learning rate.
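To make the self-organizing behavior of Eqs. (1) and (3) concrete, the following Python sketch computes the membership values of one input dimension and adds a new membership function whenever no existing one evaluates the input well enough. The class name, the threshold value, and the initial deviation are illustrative assumptions rather than settings from the original system.

import math

class SOFNNDimension:
    """Membership functions B_i^k for one input dimension i (a sketch of Eqs. (1) and (3))."""

    def __init__(self, threshold_f=0.3, initial_sigma=0.5):
        self.centers = []                   # c_i^k: means of the Gaussian membership functions
        self.sigmas = []                    # sigma_i^k: deviations of the membership functions
        self.threshold_f = threshold_f      # F in Eq. (3), an illustrative value
        self.initial_sigma = initial_sigma  # deviation given to a newly created function

    def membership(self, x_i):
        """Eq. (1): B_i^k(x_i) = exp(-(x_i - c_i^k)^2 / (2 (sigma_i^k)^2)) for every existing k."""
        return [math.exp(-(x_i - c) ** 2 / (2.0 * s ** 2))
                for c, s in zip(self.centers, self.sigmas)]

    def observe(self, x_i):
        """Eq. (3): if max_s B_{i,s}(x_i) < F, add a new membership function centered on x_i."""
        values = self.membership(x_i)
        if not values or max(values) < self.threshold_f:
            self.centers.append(x_i)
            self.sigmas.append(self.initial_sigma)
            values = self.membership(x_i)
        return values

Feeding a stream of scalar inputs to observe() grows the set of membership functions only where the data actually appears, which is the self-organizing behavior described above.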
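Steps 1-4 above form a loop. The fragment below sketches that loop in Python. The env and policy objects, their method names, and the way Δw(t) is approximated are all assumptions made only for illustration, since the full SGA update of Kimura et al. (eligibility traces and reward baseline) is not reproduced here.

def sga_learning_loop(env, policy, alpha_s=0.01, steps=1000):
    """Schematic of Steps 1-4 with the update of Eq. (4): w <- w + alpha_s * delta_w(t).
    env.observe(), env.step(), policy.sample(), policy.grad_log() and policy.w are
    hypothetical interfaces; policy.w is assumed to behave like a NumPy array."""
    for _ in range(steps):
        x = env.observe()                          # Step 1: observe the input x(t)
        y_hat = policy.sample(x)                   # Step 2: draw y^(t+1) from pi(y^, w, x(t))
        r = env.step(y_hat)                        # Step 3: immediate reward r_t (or prediction error)
        delta_w = r * policy.grad_log(x, y_hat)    # simplified stand-in for delta_w(t)
        policy.w = policy.w + alpha_s * delta_w    # Step 4: Eq. (4)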
p = ( Σ_k w_kp t_k(x(t)) ) / ( Σ_k t_k(x(t)) )    (8)
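Reading Eq. (8) as a normalized weighted sum of the rule activations (the exact grouping of symbols is partly an assumption, since the source formula is damaged), the output can be computed as below; weights and activations are plain Python lists here.

def rule_weighted_output(weights, activations):
    """Eq. (8) read as p = sum_k(w_kp * t_k(x(t))) / sum_k(t_k(x(t))).
    weights: connection weights w_kp; activations: rule firing strengths t_k(x(t))."""
    total = sum(activations)
    if total == 0.0:
        return 0.0   # no rule fires for this input; a new rule should be generated first
    return sum(w * t for w, t in zip(weights, activations)) / total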
n_a = (1 / ln p) ln z             (0 ≤ z ≤ p)
n_a = (1 / ln(1 - p)) ln(1 - z)   (p < z ≤ 1.0)    (10)

Then the probabilities of actions, for example, may be labeled in different areas respectively, as shown in Figure 3.

3.2 Experiments and Results
The experiment was a simulation in which two agents moved in an environment with obstacles and goals (Figure 4).

Figure 4. An environment with obstacles and goals.

So the input to SOFNN had 4 dimensions, each taking the value 0 (aisle) or 1 (occupied). The goal area gave a high reward of 100.0, and a crash into an obstacle, another agent, or a wall gave a reward of -1.0. When a suitable distance between the two agents was also considered as a positive reward and added into Eq. (5), the learning method is called “swarm learning”.
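The observation and reward design described above can be summarized in a short sketch. Only the 4-dimensional 0/1 observation, the goal reward of 100.0, the crash reward of -1.0, and the idea of a distance-based bonus for swarm learning come from the text; the names observe_neighbors and reward, the grid dictionary, the distance range, and the bonus value are illustrative assumptions.

def observe_neighbors(grid, x, y):
    """4-dimensional binary input for SOFNN: 0 = aisle (free), 1 = occupied,
    for the four neighboring cells. grid is a dict mapping (x, y) -> 0 or 1;
    cells outside the map are treated as occupied."""
    return [grid.get((x, y - 1), 1), grid.get((x, y + 1), 1),
            grid.get((x - 1, y), 1), grid.get((x + 1, y), 1)]

def reward(cell_type, dist_to_other_agent=None, swarm_bonus=1.0, suitable_range=(1, 3)):
    """Reward of the behavior-learning experiment; the swarm term is a sketch of the
    distance-based bonus added for swarm learning."""
    if cell_type == "goal":
        return 100.0                            # goal area: high reward
    if cell_type in ("obstacle", "agent", "wall"):
        return -1.0                             # crash into obstacle, other agent, or wall
    r = 0.0
    # Swarm learning: positive reward when the two agents keep a suitable distance.
    if dist_to_other_agent is not None and suitable_range[0] <= dist_to_other_agent <= suitable_range[1]:
        r += swarm_bonus
    return r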