This document provides an overview of real-time machine learning using PyBrain. It discusses PyBrain and alternative machine learning frameworks, how reinforcement learning differs from other types of machine learning problems, and how PyBrain supports reinforcement learning. It then gives examples of using PyBrain for real-time reinforcement learning, including an inverted pendulum and maze navigation problem. The document concludes by outlining some potential modifications to PyBrain's source code for reinforcement learning tasks.


Real-Time Machine Learning with PyBrain

Boris Mocialov

Engineering & Physical Sciences


Heriot-Watt University,
Edinburgh Centre For Robotics

2015

Outline

PyBrain

Alternatives

Short on RL

PyBrain RL

Examples

Source Alterations

PyBrain

- Easy to use (see the minimal example below)
- Algorithms for ANN, UL, SL, RL, evolution
- Modular
- FF/R-NN, LSTM, Deep Belief Nets, Boltzmann Machines

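To give a feel for the "easy to use, modular" points, here is a minimal supervised-learning example in the style of the PyBrain quickstart; the network shape and the single training sample are placeholder choices, not taken from the deck:

from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

net = buildNetwork(2, 3, 1)             # feed-forward net: 2 inputs, 3 hidden, 1 output
ds = SupervisedDataSet(2, 1)            # dataset with 2-dim inputs, 1-dim targets
ds.addSample((0, 1), (1,))              # a single placeholder training sample
trainer = BackpropTrainer(net, ds)
trainer.train()                         # one epoch of backpropagation
print(net.activate([0, 1]))             # query the trained network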
ML Alternatives
- FANN (.c/.cpp)
  Fast, evolving topologies, adjust parameters on-the-fly
- Encog (.java)
  Multi-threaded; SVM, ANN, GP, BN, HMM, GA
- Theano (.py)
  Number-crunching framework, tight integration with NumPy, fast, many sub-projects:
  - Pylearn, Theanets [scientific]
  - Lasagne [lightweight (FF/C/R-NN), LSTM, CPU/GPU]
  - Keras [modular, minimalistic, (C/R-NN), CPU/GPU]
- Caffe (.cpp)
  Models defined separately, CPU/GPU
- Accord (.net)
  Combined with audio/video processing libraries; backprop, DBN, BM

etc.
Short on RL

- Data is spread out in the environment and states are distinguished
- The algorithm (the agent) must learn a mapping between input and output (behaviour)
- The agent must explore the environment
- The agent receives reinforcement based on state transitions (a minimal loop sketch follows below)

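A minimal, self-contained sketch of that loop, using a toy one-dimensional corridor and a purely random agent; this is plain Python for illustration, not PyBrain's API:

import random

def step(state, action):
    # toy corridor environment: positions 0..4, goal at position 4
    state = max(0, min(4, state + action))
    reward = 1.0 if state == 4 else -0.02
    return state, reward

state, total = 0, 0.0
for _ in range(20):
    action = random.choice([-1, +1])      # the agent explores (here: purely at random)
    state, reward = step(state, action)   # each state transition yields a reinforcement
    total += reward
print(total)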
PyBrain RL

[Figure: PyBrain source tree, pybrain.rl.environments.mazes]

[Figure: PyBrain source tree, pybrain.rl.learners.valuebased]
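The two modules shown correspond to the pieces used in the examples that follow. Roughly, they provide the following classes (import paths as in the PyBrain 0.3 source tree; worth checking against the installed version):

# maze world and its MDP task wrapper
from pybrain.rl.environments.mazes import Maze, MDPMazeTask
# tabular action-value function and the value-based learners
from pybrain.rl.learners.valuebased import ActionValueTable, NFQ, Q, SARSA
# epsilon-greedy exploration (the module altered later in this deck)
from pybrain.rl.explorers.discrete.egreedy import EpsilonGreedyExplorer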


Examples
- Inverted Pendulum (aka pole balancing)
  - Continuous states
  - Certain (deterministic) transitions
  - Neuro-Fitted Q-Learning (NFQ)
  - Epsilon-Greedy
  - Stationary
  - Fully Observable
  - Finite Horizon

- Maze
  - Discrete states
  - Certain (deterministic) transitions
  - Q-Learning
  - Epsilon-Greedy
  - (Non-)Stationary
  - Fully Observable
  - Finite Horizon

Both pair a learned Q-function with epsilon-greedy exploration; a compact tabular sketch follows below.
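A compact, self-contained sketch of that combination in plain Python; this is not PyBrain's implementation, and the learning rate, discount, and epsilon are arbitrary illustrative values:

import random

def epsilon_greedy(q_row, epsilon=0.3):
    # with probability epsilon take a random action, otherwise the greedy one
    if random.random() < epsilon:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.99):
    # tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

Q = [[0.0] * 4 for _ in range(81)]    # 81 states x 4 actions, as in the maze example
a = epsilon_greedy(Q[0])
q_update(Q, 0, a, -0.02, 1)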
Source Alterations
- pybrain.rl.environments.mazes.maze

  class Maze(Environment, Named):
      initPos = None

      def __init__(self, topology, goal, **args):
          ...
          # choose a free starting position if none was given
          if self.initPos is None:
              self.initPos = self._freePos()

      def _freePos(self):
          ...
          # never list a punishing state as a possible starting position
          if self.punishing_states is not None:
              if (i, j) not in self.punishing_states:
                  res.append((i, j))

- pybrain.rl.environments.mazes.tasks

  class MDPMazeTask(Task):
      def getReward(self):
          if self.env.goal == self.env.perseus:
              self.env.reset()
              reward = 1
          elif (self.env.punishing_states is not None
                and self.env.perseus in self.env.punishing_states):
              self.env.reset()
              reward = -1
          else:
              reward = -0.02
          return reward

- pybrain.rl.explorers.discrete.egreedy

  class EpsilonGreedyExplorer(DiscreteExplorer):
      ...
      # self.epsilon *= self.decay   (decay commented out so epsilon does not shrink over time)
Maze Real-Time Learning Set-Up
envmatrix = array([[1, 1, 1, 1, 1, 1, 1, 1, 1],
                   ...])

# maze topology and goal at (1, 7); the two extra lists are
# arguments for the altered Maze (previous slide)
env = Maze(envmatrix, (1, 7), [(1, 1)], [(1, 6)])

# create task
task = MDPMazeTask(env)

# create value table (81 states x 4 actions) and initialize with zeros
table = ActionValueTable(81, 4)
table.initialize(0.)

# create learner
learner = Q()

# create agent with the value table (controller) and the learner
agent = LearningAgent(table, learner)

# create experiment
experiment = Experiment(task, agent)

for i in range(5000):
    # interact with the environment (here in batch mode)
    experiment.doInteractions(200)
    agent.learn()
    agent.reset()

    # halfway through, remove the punishing states so the agent
    # has to adapt to the changed maze in real time
    if i == 2500:
        env.clearPunishingStates()
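Continuing the listing above, one way to inspect what was learned is to read the greedy values back out of the table, as in the PyBrain RL tutorial; this assumes the 81-state, 4-action table and the 9 x 9 maze created above:

# max Q-value per maze cell, laid out on the 9 x 9 grid
values = table.params.reshape(81, 4).max(1).reshape(9, 9)
print(values)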

Results

[Figure: first 2500 iterations]

[Figure: second 2500 iterations]
