Neural Networks Reinforcement Learning
Abhijit Gosavi
A. Gosavi
Missouri S & T [email protected]
Outline
Q-Learning
[Figure 1 diagram: the RL Algorithm (Agent) sends an Action a to the Simulator (environment), which returns Feedback r(i,a,j) to the agent.]
Figure 1: Trial and error mechanism of RL. The action selected by the RL
agent is fed into the simulator. The simulator simulates the action, and the
resultant feedback obtained is fed back into the knowledge-base (Q-factors)
of the agent. The agent uses the RL algorithm to update its knowledge-base,
becomes smarter in the process, and then selects a better action.
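The trial-and-error loop in Figure 1 can be sketched in a few lines. The algorithm details did not survive in this text, so the block below assumes the standard discounted Q-learning update; the two-state simulator, its transition probabilities and rewards, and the parameter names (alpha, gamma, epsilon) are all hypothetical illustration values, not taken from the slides.

```python
import random


def simulate(i, a, rng):
    """Hypothetical 2-state, 2-action simulator.

    Returns the next state j and the feedback r(i, a, j).
    The probabilities and rewards are made-up illustration values.
    """
    p_stay = 0.7 if a == 0 else 0.4
    j = i if rng.random() < p_stay else 1 - i
    r = 6.0 if j == 0 else -5.0
    return j, r


def q_learning(num_steps=5000, alpha=0.1, gamma=0.8, epsilon=0.1, seed=0):
    """Tabular Q-learning (assumed standard discounted form)."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[i][a]: the agent's knowledge base
    i = 0
    for _ in range(num_steps):
        # Agent: mostly pick the best-known action, sometimes explore.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda b: Q[i][b])
        # Simulator: simulate the action and return the feedback.
        j, r = simulate(i, a, rng)
        # Agent: update the Q-factor for the (state, action) pair just tried.
        target = r + gamma * max(Q[j])
        Q[i][a] += alpha * (target - Q[i][a])
        i = j
    return Q
```

Calling `q_learning()` returns a 2-by-2 table of Q-factors; the action with the larger Q-factor in each row is the one the agent currently prefers in that state.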
Q-Learning: Feedback
Q-Learning: Algorithm
Incremental or Batch?
$\text{output} = \sum_{j=0}^{k} w(j)\,x(j)$, where
w(j) is the jth weight of the neuron and x(j) is the jth input.
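As a small illustration of the weighted sum above, the sketch below computes the output of a single linear neuron. The function name `neuron_output` is hypothetical, and treating x(0) = 1 as a bias input is an assumption that the surviving text does not state.

```python
def neuron_output(w, x):
    """Return the sum over j = 0..k of w(j) * x(j) for one linear neuron."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length (k + 1 entries)")
    return sum(wj * xj for wj, xj in zip(w, x))


# Assumed convention: x[0] = 1.0 acts as the bias input.
weights = [0.5, 2.0, -1.0]
inputs = [1.0, 3.0, 4.0]
print(neuron_output(weights, inputs))  # 0.5*1 + 2*3 - 1*4 = 2.5
```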
Step 2b: Update each w(i) for i = 0, 1, . . . , k using:
Q_next(1) = w(1, 1) + w(2, 1) j; Q_next(2) = w(1, 2) + w(2, 2) j.
Step 3b. The current step in turn may contain a number of steps
and involves the neural network updating. Set m = 0, where m is
the number of iterations used within the neural network. Set
m_max, the maximum number of iterations for neuronal updating,
Remark 2. The step size µ is the step size of the neuron, and it can
also be decayed with every iteration m.
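One way to decay µ with the iteration count m is sketched below. The update equation itself is missing from this text, so the block assumes the common Widrow-Hoff (least-mean-squares) rule w(i) ← w(i) + µ (target − output) x(i). The schedule µ = mu0 / (1 + m / tau), the names `mu0`, `tau`, and `m_max`, and the training data are hypothetical choices for illustration.

```python
def train_neuron(samples, mu0=0.1, tau=100.0, m_max=2000):
    """Fit a linear neuron with a step size that decays in m.

    samples: list of (x, target) pairs, where x is a list of inputs.
    Assumes the Widrow-Hoff (LMS) update; the source's rule is not shown.
    """
    w = [0.0] * len(samples[0][0])
    for m in range(m_max):
        mu = mu0 / (1.0 + m / tau)  # assumed decay schedule
        for x, target in samples:
            output = sum(wj * xj for wj, xj in zip(w, x))
            error = target - output
            # Move each weight in proportion to the error and its input.
            w = [wj + mu * error * xj for wj, xj in zip(w, x)]
    return w


# Made-up data generated from target = 1 + 2 * x(1), with x(0) = 1 as bias.
data = [([1.0, 0.0], 1.0), ([1.0, 0.5], 2.0), ([1.0, 1.0], 3.0)]
print(train_neuron(data))  # weights should approach [1, 2]
```

A larger `tau` keeps the step size near `mu0` for longer, while a smaller one damps the updates sooner.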
Backpropagation
Future Directions
References