
NEUROMORPHIC ENGINEERING

SYSTEMS AND APPLICATIONS

Topic Editors
André van Schaik, Tobi Delbruck and
Jennifer Hasler

NEUROSCIENCE
FRONTIERS COPYRIGHT STATEMENT
© Copyright 2007-2015 Frontiers Media SA. All rights reserved.
All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA (Frontiers) or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers. The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply. Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.
Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book. As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials. All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

ISSN 1664-8714
ISBN 978-2-88919-454-4
DOI 10.3389/978-2-88919-454-4

ABOUT FRONTIERS
Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

FRONTIERS JOURNAL SERIES
The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

DEDICATION TO QUALITY
Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews. Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

WHAT ARE FRONTIERS RESEARCH TOPICS?
Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: [email protected]



NEUROMORPHIC ENGINEERING
SYSTEMS AND APPLICATIONS

Topic Editors:
André van Schaik, University of Western Sydney, Australia
Tobi Delbruck, University of Zurich and ETH Zurich, Switzerland
Jennifer Hasler, Georgia Institute of Technology, USA

Neuromorphic engineering has just reached its 25th year as a discipline. In the first two decades neuromorphic engineers focused on building models of sensors, such as silicon cochleas and retinas, and building blocks such as silicon neurons and synapses. These designs have honed our skills in implementing sensors and neural networks in VLSI using analog and mixed mode circuits.

[Photo caption: Malcolm Slaney (Google) leads the Neuromorphs team down the July 4th parade route in Telluride, Colorado in 2012. Copyright owner is S.C. Liu.]

Over the last decade the address event representation has been used to interface devices and computers from different designers and even different groups. This facility has been essential for our ability to combine sensors, neural networks, and actuators into neuromorphic systems. More recently, several big projects have emerged to build very large scale neuromorphic systems.

The Telluride Neuromorphic Engineering Workshop (since 1994) and the CapoCaccia Cognitive Neuromorphic Engineering Workshop (since 2009) have been instrumental not only in creating a strongly connected research community, but also in introducing different groups to each other's hardware. Many neuromorphic systems are first created at one of these workshops. With this special research topic, we showcase the state-of-the-art in neuromorphic systems.



Table of Contents

05 Research Topic: Neuromorphic Engineering Systems and Applications. A Snapshot of Neuromorphic Systems Engineering
André van Schaik, Tobi Delbruck and Jennifer Hasler
07 Adaptive Pulsed Laser Line Extraction for Terrain Reconstruction Using a
Dynamic Vision Sensor
Christian Brandli, Thomas A. Mantel, Marco Hutter, Markus A. Höpflinger,
Raphael Berner, Roland Siegwart and Tobi Delbruck
16 Robotic Goalie with 3ms Reaction Time at 4% CPU Load using Event-Based
Dynamic Vision Sensor
Tobi Delbruck and Manuel Lang
23 Event-Driven Visual Attention for the Humanoid Robot iCub
Francesco Rea, Giorgio Metta and Chiara Bartolozzi
34 On the use of Orientation Filters for 3D Reconstruction in Event-Driven Stereo
Vision
Luis A. Camuñas-Mesa, Teresa Serrano-Gotarredona, Sio H. Ieng, Ryad B. Benosman
and Bernabe Linares-Barranco
51 Asynchronous Visual Event-Based Time-to-Contact
Xavier Clady, Charles Clercq, Sio-Hoi Ieng, Fouzhan Houseini, Marco Randazzo,
Lorenzo Natale, Chiara Bartolozzi and Ryad Benosman
61 Real-Time Classification and Sensor Fusion with a Spiking Deep Belief Network
Peter O'Connor, Daniel Neil, Shih-Chii Liu, Tobi Delbruck and Michael Pfeiffer
74 Event-Driven Contrastive Divergence for Spiking Neuromorphic Systems
Emre Neftci, Srinjoy Das, Bruno Pedroni, Kenneth Kreutz-Delgado and Gert
Cauwenberghs
88 Compiling Probabilistic, Bio-Inspired Circuits on a Field Programmable Analog
Array
Bo Marr and Jennifer Hasler
97 An Adaptable Neuromorphic Model of Orientation Selectivity Based on Floating
Gate Dynamics
Priti Gupta and C.M. Markan
118 A Mixed-Signal Implementation of a Polychronous Spiking Neural Network
with Delay Adaptation
Runchun M. Wang, Tara J. Hamilton, Jonathan C. Tapson and André van Schaik
134 Real-Time Biomimetic Central Pattern Generators in an FPGA for Hybrid
Experiments
Matthieu Ambroise, Timothée Levi, Sébastien Joucla, Blaise Yvert and Sylvain Saïghi



145 Dynamic Neural Fields as a Step Toward Cognitive Neuromorphic Architectures
Yulia Sandamirskaya
158 A Robust Sound Perception Model Suitable for Neuromorphic Implementation
Martin Coath, Sadique Sheik, Elisabetta Chicca, Giacomo Indiveri, Susan L. Denham
and Thomas Wennekers
168 An Efficient Automated Parameter Tuning Framework for Spiking Neural
Networks
Kristofor D. Carlson, Jayram Moorkanikara Nageswaran, Nikil Dutt and
Jeffrey L. Krichmar



EDITORIAL
published: 19 December 2014
doi: 10.3389/fnins.2014.00424

Research topic: neuromorphic engineering systems and applications. A snapshot of neuromorphic systems engineering

Tobi Delbruck 1, André van Schaik 2* and Jennifer Hasler 3
1 Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
2 Bioelectronics and Neuroscience, The MARCS Institute, University of Western Sydney, Sydney, NSW, Australia
3 School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
*Correspondence: [email protected]

Edited and reviewed by:


Giacomo Indiveri, University of Zurich and ETH Zurich, Switzerland

Keywords: neuromorphic engineering, neural networks, event-based, spiking neural networks, dynamic vision sensor, floating gate, neural simulation,
synaptic plasticity

The 14 papers in this research topic were solicited primarily from attendees to the two most important hands-on workshops in neuromorphic engineering: the Telluride Neuromorphic Cognition Engineering Workshop (www.ine-web.org) and the Capo Caccia Cognitive Neuromorphic Engineering Workshop (capocaccia.ethz.ch). The papers show the results of feasibility studies of new concepts, as well as neuromorphic systems that have been constructed from more established neuromorphic technologies. Five papers exploit neuromorphic dynamic vision sensor (DVS) events that mimic the asynchronous and sparse spikes on biology's optic nerve fiber (Delbruck and Lang, 2013; O'Connor et al., 2013; Rea et al., 2013; Brandli et al., 2014; Camuñas-Mesa et al., 2014; Clady et al., 2014). Two papers are on the hot topic (based on largest number of views) of event-driven computation in deep belief networks (DBNs) (O'Connor et al., 2013; Neftci et al., 2014). Two papers use floating gate technology for neuromorphic analog circuits (Gupta and Markan, 2014; Marr and Hasler, 2014). The collection is rounded out by papers on central pattern generators (Ambroise et al., 2013), neural fields for cognitive architectures (Sandamirskaya, 2014), sound perception (Coath et al., 2014), polychronous spiking networks (Wang et al., 2014), and automatic parameter tuning for large network simulations (Carlson et al., 2014).

REGARDING THE EVENT-BASED VISION PAPERS
Brandli et al. (2014) report on a novel method for rapidly and cheaply tracking a flashing laser line using a DVS, which is aimed at building a fast obstacle detector for small ground-based robots, such as vacuum cleaners or toy cars. Particular novelties in this paper are the adaptive temporal filter and the efficient algorithmic update of the laser line position.
Delbruck and Lang (2013) report on the detailed implementation of a fun robotic goalie, which uses a DVS to help track the balls and robot arm. The paper includes measurements of USB latency. The novelty in this paper is the self-calibration of the goalie arm so that it can be rapidly placed in a particular place in visual space. Both of the preceding two papers include YouTube citations to videos and both have open-source implementations.
Rea et al. (2013) report on the integration of a stereo pair of DVS sensors with the iCub robot, and how they are used for quick, low power saliency detection for the iCub. In particular, the Itti-Koch visual saliency model was adapted to event-driven sensors and many experiments were done to characterize its effectiveness and efficiency.
Camuñas-Mesa et al. (2014) report on a stereo vision system that uses a pair of their DVS cameras together with an FPGA that computes low level oriented features. This paper has a wealth of characterization results.
Clady et al. (2014) report the first results of computing the old problem of time to contact (TTC) for a moving entity from DVS events. They take a geometrical approach in order to extract low level motion features from the DVS events to obtain the TTC information. This paper includes robotic experiments.

REGARDING EVENT-DRIVEN DEEP NETWORKS
O'Connor et al. (2013) present the first of the papers to focus on event-based learning and networks. Their system also uses a DVS, as well as an AEREAR2 binaural silicon cochlea, to build a spike-based DBN for recognizing MNIST digits presented in conjunction with pure tones. They demonstrate that a DBN constructed from stacks of restricted Boltzmann machines (RBMs) is valuable for learning and computing sensor fusion. They also show that a DBN's recurrent persistent activity is useful particularly with sparse event-driven sensor input. This network was trained off-line, and then the weights were transferred onto the spiking network.
Neftci et al. (2014) report on the same target application of MNIST digit recognition, but their paper takes a further step by proposing how a network of integrate and fire neurons can implement an RBM, and can be trained with an event-driven version of the well-known contrastive divergence training algorithm for RBMs.

REGARDING FLOATING GATE TECHNOLOGY
Two papers show the versatility of floating-gate (FG) circuit approaches. Marr and Hasler (2014) describe a collaborative project started and effectively completed during the Telluride 2008 workshop as a representative of the possible opportunity at any of these workshops. In this case, the opportunity was enabled through the use of large-scale field programmable analog arrays (FPAA) as a mixed mode processor for which functions can be compiled, enabling a range of circuit, system,



and application design. The focus was on stochastic computations that are dynamically controllable via voltage-controlled amplifiers and comparator thresholds. From Bernoulli variables it is shown that exponentially distributed random variables, and random variables of an arbitrary distribution, can be computed. The trajectory of a biological system computed stochastically with this probabilistic hardware results in a 127X performance improvement over current software approaches.
Gupta and Markan (2014) report on an FG adaptive system for investigating self-organization of image patterns. They describe adaptive feature selectivity as a mechanism by which nature optimizes resources so as to have greater acuity for more abundant features. The authors look to exploit hardware dynamics to build adaptive systems utilizing time-staggered winner-take-all circuits, exploiting the adaptation dynamics of FG transistors, to model an adaptive cortical cell.

REGARDING OTHER TOPICS IN NETWORK ARCHITECTURES
Wang et al. (2014) report results from a polychronous multi-neuron chip. Polychronization is the process in which spikes travel down axons with specific delays to arrive at a common target neuron simultaneously and cause it to fire, despite the source neurons firing asynchronously. This paper shows digital and analog tradeoffs and offers advice for scaling to future technologies.
Ambroise et al. (2013) describe a neuromorphic implementation of a network of 240 Central Pattern Generator modules modeling the leech heartbeat neural network on a field programmable gate array. It uses the Izhikevich neuron model, implemented as a single computational core, time multiplexed to update all the neurons in the network. In order to fit the digital implementation to the data from the biological system without implementing all the detailed synaptic dynamics, which would take up too many resources, they propose a new synaptic adaptation model: an activity-dependent depression synapse.
Sandamirskaya (2014) leverages the relationship between dynamic field theory networks and neuromorphic circuits using soft winner-take-all circuits (WTA) to formally describe the equivalence between the two and establish a common ground. It sets a possible roadmap for the development of cognitive neuromorphic systems using WTA implementations.
Coath et al. (2014) describe a pattern recognition network implemented using a column of three neurons in which the columns are connected via axons with delays that explicitly depend on the distance between the columns. The network is trained using spike-timing dependent plasticity and it is shown that the performance of the network is robust to natural variations in the input stimuli.
Carlson et al. (2014) address the significant problem of finding solutions in the enormous parameter space found in implementations of spiking neural networks by proposing an automated tuning framework. Their approach uses evolutionary algorithms implemented on graphics processing units for speed. They use an objective function based on the Efficient Coding Hypothesis to tune these networks. In their example, they demonstrate the evolution of V1 simple cell responses. Using GPU parallelization, they report 65x speedups over CPU implementations.

SUMMARY
Amidst the promises offered by projects with major chunks of funding in neuromorphic engineering like HBP, BrainScaleS, SpiNNaker, and TrueNorth, this research topic offers a refreshing glimpse into some of the current actual accomplishments in neuromorphic systems engineering and applications.

REFERENCES
Ambroise, M., Levi, T., Joucla, S., Yvert, B., and Saïghi, S. (2013). Real-time biomimetic central pattern generators in an FPGA for hybrid experiments. Front. Neurosci. 7:215. doi: 10.3389/fnins.2013.00215
Brandli, C., Mantel, T. A., Hutter, M., and Delbruck, T. (2014). Adaptive pulsed laser line extraction for terrain reconstruction using a dynamic vision sensor. Front. Neurosci. 7:275. doi: 10.3389/fnins.2013.00275
Camuñas-Mesa, L. A., Serrano-Gotarredona, T., Ieng, S. H., Benosman, R. B., and Linares-Barranco, B. (2014). On the use of orientation filters for 3D reconstruction in event-driven stereo vision. Front. Neurosci. 8:48. doi: 10.3389/fnins.2014.00048
Carlson, K. D., Nageswaran, J. M., Dutt, N., and Krichmar, J. L. (2014). An efficient automated parameter tuning framework for spiking neural networks. Front. Neurosci. 8:10. doi: 10.3389/fnins.2014.00010
Clady, X., Clercq, C., Ieng, S.-H., Houseini, F., Randazzo, M., Natale, L., et al. (2014). Asynchronous visual event-based time-to-contact. Front. Neurosci. 8:9. doi: 10.3389/fnins.2014.00009
Coath, M., Sheik, S., Chicca, E., Indiveri, G., Denham, S., and Wennekers, T. (2014). A robust sound perception model suitable for neuromorphic implementation. Front. Neurosci. 7:278. doi: 10.3389/fnins.2013.00278
Delbruck, T., and Lang, M. (2013). Robotic goalie with 3 ms reaction time at 4% CPU load using event-based dynamic vision sensor. Front. Neurosci. 7:223. doi: 10.3389/fnins.2013.00223
Gupta, P., and Markan, C. M. (2014). An adaptable neuromorphic model of orientation selectivity based on floating gate dynamics. Front. Neurosci. 8:54. doi: 10.3389/fnins.2014.00054
Marr, B., and Hasler, J. (2014). Compiling probabilistic, bio-inspired circuits on a field programmable analog array. Front. Neurosci. 8:86. doi: 10.3389/fnins.2014.00086
Neftci, E., Das, S., Pedroni, B., Kreutz-Delgado, K., and Cauwenberghs, G. (2014). Event-driven contrastive divergence for spiking neuromorphic systems. Front. Neurosci. 7:272. doi: 10.3389/fnins.2013.00272
O'Connor, P., Neil, D., Liu, S.-C., Delbruck, T., and Pfeiffer, M. (2013). Real-time classification and sensor fusion with a spiking deep belief network. Front. Neurosci. 7:178. doi: 10.3389/fnins.2013.00178
Rea, F., Metta, G., and Bartolozzi, C. (2013). Event-driven visual attention for the humanoid robot iCub. Front. Neurosci. 7:234. doi: 10.3389/fnins.2013.00234
Sandamirskaya, Y. (2014). Dynamic neural fields as a step toward cognitive neuromorphic architectures. Front. Neurosci. 7:276. doi: 10.3389/fnins.2013.00276
Wang, R. M., Hamilton, T. J., Tapson, J., and van Schaik, A. (2014). A mixed-signal implementation of a polychronous spiking neural network with delay adaptation. Front. Neurosci. 8:51. doi: 10.3389/fnins.2014.00051

Conflict of Interest Statement: The Associate Editor Giacomo Indiveri declares that, despite being affiliated to the same institution as author Tobi Delbruck, the review process was handled objectively and no conflict of interest exists. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 24 November 2014; accepted: 03 December 2014; published online: 19 December 2014.
Citation: Delbruck T, van Schaik A and Hasler J (2014) Research topic: neuromorphic engineering systems and applications. A snapshot of neuromorphic systems engineering. Front. Neurosci. 8:424. doi: 10.3389/fnins.2014.00424
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.
Copyright 2014 Delbruck, van Schaik and Hasler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.



ORIGINAL RESEARCH ARTICLE
published: 17 January 2014
doi: 10.3389/fnins.2013.00275

Adaptive pulsed laser line extraction for terrain reconstruction using a dynamic vision sensor

Christian Brandli 1*, Thomas A. Mantel 2, Marco Hutter 2, Markus A. Höpflinger 2, Raphael Berner 1, Roland Siegwart 2 and Tobi Delbruck 1
1 Department of Information Technology and Electrical Engineering, Institute of Neuroinformatics, ETH Zurich and University of Zurich, Zurich, Switzerland
2 Autonomous Systems Lab, Department of Mechanical and Process Engineering, ETH Zurich, Zurich, Switzerland

Edited by: André van Schaik, The University of Western Sydney, Australia
Reviewed by: Christoph Posch, Université Pierre et Marie Curie, France; Viktor Gruev, Washington University in St. Louis, USA; Garrick Orchard, National University of Singapore, Singapore
*Correspondence: Christian Brandli, Department of Information Technology and Electrical Engineering, Universität Zürich, Winterthurerstr. 190, 8057 Zurich, Switzerland. e-mail: [email protected]

Mobile robots need to know the terrain in which they are moving for path planning and obstacle avoidance. This paper proposes the combination of a bio-inspired, redundancy-suppressing dynamic vision sensor (DVS) with a pulsed line laser to allow fast terrain reconstruction. A stable laser stripe extraction is achieved by exploiting the sensor's ability to capture the temporal dynamics in a scene. An adaptive temporal filter for the sensor output allows a reliable reconstruction of 3D terrain surfaces. Laser stripe extractions up to pulsing frequencies of 500 Hz were achieved using a line laser of 3 mW at a distance of 45 cm using an event-based algorithm that exploits the sparseness of the sensor output. As a proof of concept, unstructured rapid prototype terrain samples have been successfully reconstructed with an accuracy of 2 mm.

Keywords: neuromorphic, robotics, event-based, address-event representation (AER), dynamic vision sensor (DVS), silicon retina

INTRODUCTION
Motion planning in mobile robots requires knowledge of the terrain structure in front of and underneath the robot; possible obstacles have to be detected and their size has to be evaluated. Especially legged robots need to know the terrain on which they are moving so that they can plan their steps accordingly. A variety of 3D scanners such as the Microsoft Kinect (Palaniappa et al., 2011) or LIDAR (Yoshitaka et al., 2006; Raibert et al., 2008) devices can be used for this task but these sensors and their computational overhead typically consume on the order of several watts of power while having a sample rate limited to tens of Hertz. Passive vision systems partially overcome these limitations but they exhibit a limited spatial resolution because their terrain reconstruction is restricted to a small set of feature points (Weiss et al., 2010).
Many of the drawbacks in existing sensor setups (active as well as passive) arise from the fact that investigating visual scenes as a stroboscopic series of (depth) frames leads to redundant data that occupies communication and processing bandwidth and limits sample rates to the frame rate. If the redundant information is already suppressed at the sensor level and the sensor asynchronously reports its output, the output can be evaluated faster and at a lower computational cost. In this paper such a vision sensor, the so called dynamic vision sensor (DVS; Lichtsteiner et al., 2008), is combined with a pulsed line laser, forming an active sensor to reconstruct the terrain in front of the system while it is moved. This terrain reconstruction is based on a series of surface profiles based on the line laser pulses. The proposed algorithm allows extracting the laser stripe from the asynchronous temporal contrast events generated by the DVS using only the event timing so that the laser can be pulsed at arbitrary frequencies from below 1 Hz up to 500 Hz. The flexibility in choosing the pulsing frequencies allows fast and detailed surface reconstructions for fast robot motions as well as saving laser power for slow motions.

THE DYNAMIC VISION SENSOR (DVS)
The DVS used in this setup is inspired by the functionality of the retina and senses only changes in brightness (Lichtsteiner et al., 2008). Each pixel reports a change in log-illuminance larger than a given threshold by sending out an asynchronous address-event: if it becomes brighter it generates a so called ON event, and if darker, it generates an OFF event. The asynchronously generated address-events are communicated to a synchronous processing device by a complex programmable logic device (CPLD) which also transmits the time in microseconds at which the event occurred. Each event contains the pixel horizontal and vertical address (u,v), its polarity (ON/OFF) and the timestamp. After the event is registered, it is written into a FIFO buffer which is transferred through a high-speed USB 2.0 interface to the processing platform. Real-time computations on the processing platform operate on the basis of so called event packets which can contain a variable number of events but are delivered at a minimum frequency of 1 kHz. This approach of sensing a visual scene has the following advantages:

1. The absence of a global exposure time lets each pixel settle to its own operating point which leads to a dynamic range of more than 120 dB.
2. Because the pixels only respond to brightness changes, the output of the sensor is non-redundant. This leads to a decrease in processor load and therefore to a reduction in power consumption of the system.



FIGURE 1 | Wheel spinning at 3000 rpm. (A) Still image. (B) Events generated in 30 ms: ON events rendered white, OFF events in black. (C) Events generated in 200 µs.

3. The asynchronous readout allows a low latency of as little as 15 µs. This latency allows control loops to be closed very quickly, as demonstrated in Delbruck and Lichtsteiner (2007); Conradt et al. (2009); Ni et al. (2012). Figure 1 shows the speed of the DVS, which is capable of resolving fast movements such as a wheel spinning at 3000 rpm.
4. Since the events are timestamped as they occur (with a temporal resolution of 1 µs), the output allows a detailed analysis of the dynamics in a scene and the processing of the output using temporal filters.

In the following, the output of the DVS is described as a set of events, and each event Ev carries its u- and v-address, a timestamp and its polarity as a value of +1 if it is an ON event and -1 for OFF events [with notation adapted from Ni et al. (2012)]:

Ev(u, v, t) = \begin{cases} +1, & \text{if } \Delta \ln(I_{u,v}) > \theta_{ON} \\ -1, & \text{if } \Delta \ln(I_{u,v}) < \theta_{OFF} \end{cases} \quad (1)

where \Delta \ln(I_{u,v}) denotes the change in illumination at the pixel with coordinates u,v since the last event. \theta_{ON} and \theta_{OFF} denote the event thresholds that must be crossed to trigger an event. These thresholds can be set independently, which allows balancing the number of ON and OFF events.
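For readers who want to experiment with this event representation, the sketch below models the address-events of Equation (1) in Java, the language of the jAER tools used later in the paper. The class name, fields and threshold values are illustrative assumptions and do not correspond to the actual jAER event classes.

```java
/** Minimal sketch of a DVS address-event as described by Equation (1).
 *  Illustrative only; jAER defines its own event classes. */
public class DvsEvent {
    public final short u, v;        // pixel address
    public final long timestampUs;  // microsecond timestamp from the CPLD
    public final int polarity;      // +1 = ON, -1 = OFF

    public DvsEvent(short u, short v, long timestampUs, int polarity) {
        this.u = u;
        this.v = v;
        this.timestampUs = timestampUs;
        this.polarity = polarity;
    }

    /** Emulates the pixel decision of Equation (1): an event is generated only
     *  when the change in log-illuminance crosses one of the two thresholds.
     *  Returns +1, -1, or 0 (no event). Threshold values are hypothetical. */
    public static int polarityFor(double deltaLnI, double thetaOn, double thetaOff) {
        if (deltaLnI > thetaOn)  return +1; // brighter: ON event
        if (deltaLnI < thetaOff) return -1; // darker: OFF event (thetaOff < 0)
        return 0;                           // below threshold: no event
    }
}
```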
In addition to these visually triggered events, the DVS allows the injection of special, timestamped trigger events into the output stream by applying a pulse to a pin on the back of the sensor. These Et events are numbered in software so that they carry a pulse number and a timestamp:

Et_n = t. \quad (2)

MATERIALS AND METHODS
HARDWARE SETUP
As reviewed in Forest and Salvi (2002), there are several variations of combining a line laser and a camera to build a 3D scanner. Since it is intended to apply this scanner setup on a mobile robot that already has a motion model for the purpose of navigation, a mirror free, fixed geometry setup was chosen. As shown in Figure 2, a red line laser (Laser Components GmbH LC-LML-635) with a wavelength of 635 nm and an optical power of about 3 mW was mounted at a fixed distance above the DVS. (The laser power consumption was 135 mW.) The relative angle of the laser plane and the DVS was fixed. To run the terrain reconstruction, the system is moved over the terrain while the laser is pulsed at a frequency fp. Each pulse of the laser initiated the acquisition of a set of events for further analysis and laser stripe extraction. The background illumination level was a brightly-lit laboratory at approximately 500 lx.

FIGURE 2 | Setup of the DVS together with the line laser. (A) Schematic view of the setup. (B) Photo of the DVS128 camera with line laser: the rigid laser mount allows a constant distance and inclination angle of the laser with respect to the camera. The optical filter is mounted on the lens.

For the measurements described in the results section, the system was fixed and the terrain to scan was moved on an actuated sled on rails underneath it. This led to a straight-forward camera motion model controlled by the speed of the DC motor that pulled the sled toward the sensor system. The sled was fixed to rails which locked the system in one dimension and led to highly repeatable measurements.




The DVS was equipped with a lens having a focal length of 10 mm and it was aimed at the terrain from a distance of 0.45 m. The laser module was placed at a distance of 55 mm from the sensor at an inclination angle θL of 8° with respect to the principal axis of the DVS. The system observed the scene at an inclination angle θC of 39°.
To enhance the signal to noise ratio, i.e., the percentage of events originating from the pulsed laser line, the sensor was equipped with an optical band pass filter (Edmund Optics NT65-167) centered at 636 nm. The filter has a full width at half maximum (FWHM) of 10 nm and a transmittance of 85% in the pass band and less than 0.01% in the stop band (optical density 4.0). To mark the laser pulses within the event stream, the event trigger pin on the back of the DVS was connected to the function generator triggering the laser.

CALIBRATION
To extract the laser stripe, i.e., the pixels whose events originate from the laser line, the sensor is calibrated based on the approach described in Siegwart (2011). The model was simplified by the following assumptions:

1. For the intrinsic camera model, rectangular pixels with orthogonal coordinates u,v are assumed. This leads to the following transformation from pixel coordinates to camera coordinates x_C, y_C, z_C:

u = \frac{k f_l}{z_C} x_C + u_0 \quad (3)

v = \frac{k f_l}{z_C} y_C + v_0 \quad (4)

where k denotes the inverse of the pixel size, f_l the focal length in pixels, and u_0, v_0 the center pixel coordinates.

2. For the extrinsic camera model it was assumed that the rail restricts the origin of the camera x_{C0}, y_{C0}, z_{C0} to a planar translation (by t_y and t_z) within a plane spanned by the y- and z-axis of the world reference frame x_R, y_R, and z_R, as depicted in Figure 3. In the setup used for the measurement, the rotational degrees of freedom of the system were constrained so that the camera could only rotate (by θ_C) around its x-axis, which leads to the following transformation from camera to world coordinates:

\begin{pmatrix} x_R \\ y_R \\ z_R \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta_C + \frac{\pi}{2}) & -\sin(\theta_C + \frac{\pi}{2}) \\ 0 & \sin(\theta_C + \frac{\pi}{2}) & \cos(\theta_C + \frac{\pi}{2}) \end{pmatrix} \begin{pmatrix} x_C \\ y_C \\ z_C \end{pmatrix} + \begin{pmatrix} 0 \\ t_y \\ t_z \end{pmatrix} \quad (5)

FIGURE 3 | The coordinate systems used along the scanning direction. y_R, z_R are the real world coordinates, y_C, z_C the ones of the camera. x_L is the distance of the laser line plane perpendicular to n_L from the camera origin. θ_C is the inclination angle of the sensor with respect to the horizontal plane and θ_L the laser inclination angle with respect to the camera.

The fact that the DVS does not produce any output for static scenes makes it difficult to find and align correspondences and therefore the typical checkerboard pattern could not be used for calibration. As an alternative, the laser was pulsed onto two striped blocks of different heights as depicted in Figure 4. The black stripes on the blocks absorb sufficient laser light to not excite any events in the DVS. This setup allows finding sufficient correspondence points between the real world coordinates and the pixel coordinates to solve the set of calibration equations (Equations 3-5). This procedure is done manually in Matlab but needs only to be done once.

FIGURE 4 | The calibration setup. The pulsed laser shines onto two striped blocks of different height. (A) Schematic view. (B) Schematic of the DVS output: the laser is absorbed by the black stripes and only the white stripes generate events.
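To make Equations (3)-(5) concrete, the sketch below maps a point from camera coordinates to pixel coordinates and from camera to world coordinates. The parameter names mirror the symbols in the text, but the numerical values are placeholders and the sign conventions follow the reconstruction of Equation (5) above; treat it as a sketch, not the authors' Matlab calibration code.

```java
/** Sketch of the simplified camera model of Equations (3)-(5).
 *  Placeholder values; the real calibration is solved in Matlab by the authors. */
public class CameraModel {
    double k  = 1.0 / 40e-6;            // inverse pixel size (assumed 40 um pixels)
    double fl = 10e-3;                  // focal length (10 mm lens as in the setup)
    double u0 = 64, v0 = 64;            // optical center of the 128x128 array (assumed)
    double thetaC = Math.toRadians(39); // camera inclination from the text
    double ty = 0.0, tz = 0.45;         // translation of the camera origin (assumed)

    /** Equations (3) and (4): camera coordinates to pixel coordinates. */
    public double[] cameraToPixel(double xC, double yC, double zC) {
        double u = k * fl / zC * xC + u0;
        double v = k * fl / zC * yC + v0;
        return new double[] { u, v };
    }

    /** Equation (5): camera coordinates to world coordinates
     *  (rotation by thetaC + pi/2 about the x-axis plus a translation). */
    public double[] cameraToWorld(double xC, double yC, double zC) {
        double a = thetaC + Math.PI / 2;
        double xR = xC;
        double yR = Math.cos(a) * yC - Math.sin(a) * zC + ty;
        double zR = Math.sin(a) * yC + Math.cos(a) * zC + tz;
        return new double[] { xR, yR, zR };
    }
}
```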

LASER STRIPE EXTRACTION
The stripe extraction method is summarized in Figure 5. Most laser stripe extraction algorithms perform a simple column-wise maximum computation to find the peak in light intensity, e.g., Robinson et al. (2003); Orghidan et al. (2006). Accordingly, for the DVS the simplest approach to extract the laser stripe would be to accumulate all events after a laser pulse and find the column-wise maximum in activity. This approach performs poorly due to background activity: Even with the optical filter in place, contrast edges that move relative to the sensor also induce events which corrupt the signal to noise ratio. For a more robust laser stripe extraction, spatial constraints could be introduced but this would restrict the generality of the approach (Usamentiaga et al., 2010). Instead the proposed approach exploits the highly resolved temporal information of the output of the DVS.




With the help of the laser trigger events Et_n, the event stream can be sliced into a set of time windows W_n, each containing a set of events S_n, where n denotes the nth trigger event. ON and OFF events are placed into separate sets (for simplicity only the formulas for the ON events are shown):

W_n = \{ t : t > Et_n \wedge t < Et_{n+1} \} \quad (6)

S_n^{ON} = \{ Ev(u, v, t) : t \in W_n \wedge Ev > 0 \} \quad (7)

The timing of the events is jittered by the asynchronous communication and is also dependent on the sensor's bias settings and light conditions. Our preliminary experiments showed that it is not sufficient to only accumulate the events in a fixed time window after the pulse. Instead a stable laser stripe extraction algorithm must adaptively collect relevant events. This adaptation is achieved by using a temporal scoring function P which is continually updated as illustrated in Figure 6.

FIGURE 6 | Scoring function: examples of event histograms of the laser pulsed at 1 kHz at the relief used for the reconstruction. (A) Measured histograms of ON and OFF events following laser pulse ON and OFF edges. (B) Resulting OFF and ON scoring functions after normalization and mean subtraction.

The scoring function is used as follows: Each event obtains a score s = P(Ev) depending only on its time relative to the last trigger. From these s a score map M_n (Figure 5) is established where each pixel (u,v) of M_n contains the sum of the scores of all the events with address (u,v) within the set S_n [these subsets of S_n are denoted as C_n(u, v)]. In other words, M_n is a 2D histogram of event scores. This score map tells us for each pixel how well-timed the events were with respect to the nth trigger event, and it is computed by Equations 8-9:

C_n^{ON}(u, v) = \{ Ev(u', v', t) : Ev \in S_n^{ON} \wedge u' = u \wedge v' = v \} \quad (8)

M_n(u, v) = \sum_{C_n^{ON}(u,v)} P_n^{ON}(Ev) + \sum_{C_n^{OFF}(u,v)} P_n^{OFF}(Ev) \quad (9)

FIGURE 5 | Schematic overview of the laser stripe extraction filter. At the arrival of each laser pulse the temporal histograms are used to adapt the scoring function P, and each event's score is calculated and mapped on the score maps. The maps are averaged and the laser stripe is extracted by selecting the maximum scoring pixel for each column, if it is above the threshold θ_peak.
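The sketch below shows one way Equations (6)-(9) can be turned into code: events are assigned to the window opened by the last trigger event and their scores, looked up from the current scoring function, are summed per pixel into the score map. Class and method names are hypothetical, and the sketch collapses the separate ON and OFF scoring functions of the paper into a single lookup table for brevity; the actual jAER filter is organized differently (see Algorithm 1 below).

```java
/** Sketch of the score-map accumulation of Equations (6)-(9).
 *  One map per laser pulse window W_n; names and sizes are illustrative. */
public class ScoreMap {
    private final double[][] map = new double[128][128]; // DVS128 resolution
    private final double[] scoringFunction;              // P, indexed by time bin
    private final double binWidthUs;                     // bin width = 1/(f*k), in microseconds
    private long lastTriggerUs = 0;                       // Et_n of the current window

    public ScoreMap(double[] scoringFunction, double binWidthUs) {
        this.scoringFunction = scoringFunction;
        this.binWidthUs = binWidthUs;
    }

    /** Called for each trigger event Et_n: starts a new window W_n. */
    public void onTrigger(long triggerTimestampUs) {
        lastTriggerUs = triggerTimestampUs;
        for (double[] row : map) java.util.Arrays.fill(row, 0.0);
    }

    /** Called for each DVS event in the current window: adds its score
     *  s = P(Ev) to the pixel (u,v), building M_n as in Equation (9). */
    public void addEvent(int u, int v, long timestampUs) {
        int bin = (int) ((timestampUs - lastTriggerUs) / binWidthUs);
        if (bin < 0 || bin >= scoringFunction.length) return; // outside the window
        map[u][v] += scoringFunction[bin];
    }

    public double scoreAt(int u, int v) { return map[u][v]; }
}
```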




The scoring function P that assigns each event a score indicating how probable it is that it was caused by the laser pulse Et_n is obtained by using another histogram-based approach. The rationale behind this approach is the following: All events that are caused by the laser pulse should be temporally correlated with it while noise events should show a uniform temporal distribution. In a histogram with binned relative times the events triggered by the laser pulse should form peaks. In the proposed algorithm, the histogram H_n consists of k bins B_n of width 1/(fk). For stability, H_n is an average over m laser pulses. H_n is constructed by Equations 10-12:

D_n^{ON}(l) = \{ Ev(u, v, t) : Ev \in S_n^{ON} \wedge \frac{l}{fk} \le t - Et_n \wedge t - Et_n < \frac{l+1}{fk} \} \quad (10)

B_n^{ON}(l) = \frac{1}{m} \sum_{i = n - m}^{n} \sum_{Ev \in D_i^{ON}(l)} Ev \quad (11)

H_n^{ON} = \{ B_n^{ON}(l) : l \in [0, k-1] \} \quad (12)

where f is the laser frequency, l is the bin index, k is the number of bins, D_n(l) is a temporal bin of the set S_n, B_n(l) is a bin of the histogram averaged over the m pulses and the histogram H_n is the set of all bins B_n. It is illustrated in Figure 6A.
To obtain the scoring function P, the H_n^{ON} and H_n^{OFF} histograms are normalized by the total number T of events in them. To penalize bins that have a count below the average, i.e., bins that are dominated by the uniformly distributed noise, the average bin count T/k is subtracted from each bin. An event can have a negative score. This is the case if it is more probable that it is noise than signal. T_n is computed from Equation 13:

T_n^{ON} = \sum \{ B_n^{ON} : B_n^{ON} \in H_n^{ON} \} \quad (13)

The nth scoring function P_n (illustrated in Figure 6B) is computed from Equation 14:

P_n^{ON}(Ev) = \frac{ \{ B_n^{ON} : Ev \in B_n^{ON} \} - \frac{T_n^{ON}}{k} }{ T_n^{ON} } \quad (14)

To extract the laser stripe, the last o score maps are averaged and the maximum score s(u,v) and its y value are determined for each column. If the maximum value is above a threshold θ_peak it is considered to be a laser stripe pixel. If the neighboring pixels are also above the threshold, a weighted average is applied among them to determine the center of the laser stripe. The positions of the laser stripe are then transformed into real world coordinates using Equations 3-5 and thus mapped as surface points.
The pseudo-code shown in Algorithm 1 illustrates how the algorithm is executed: Only on the arrival of a new laser trigger event, the histograms are averaged, the score maps are averaged to an average score map and the laser stripe is extracted. Otherwise, for each DVS event only its contribution to the current score map is computed, using the current scoring function. The laser stripe extraction and computation of the scoring function operate on different time scales. While the length o of the moving average of the score maps is chosen as small as possible to ensure a low latency, the number of histograms m to be averaged for the scoring function is chosen as large as possible to obtain higher stability and dampen the effect of variable background activity.

Algorithm 1 | Pseudo code for the laser stripe extraction.

    //iterate over all events in a packet
    for event:packet
        //the laser stripe extraction is only done at
        //the arrival of a new pulse
        if(event.isTrigger)
            lastTrigger = event.timestamp
            histogramAverage.removeOldest()
            histogramAverage.add(histogram)
            histogram.clear()
            //update done according to Equation (14)
            scoreFunction.update(histogramAverage)
            averageMap.removeOldest()
            averageMap.add(scoreMap)
            laserLine = averageMap.findColumnPeaks()
        else
            //update of histogram
            deltaT = event.timestamp - lastTrigger
            binIndex = deltaT*k/period
            histogram.bin[binIndex]++
            //update of score map
            score = scoreFunction.get(binIndex)
            scoreMap[event.u][event.v] += score
        end if

Algorithm optimization
To reduce the memory consumption and the computational cost of this frame-based algorithm, the computations of the scoring function, the accumulation of evidence into a score map, and the search for the laser line columns were optimized to be event-based.
The average histogram changes only on a long time scale (depending on lighting conditions and sensor biasing) and this fact is exploited by only updating the averaged histogram every mth pulse. The m histograms do not have to be memorized and each event only increases the bin count. The new score function is computed from the accumulated histogram by normalizing it only after the mth pulse.
The score map computation is optimized by accumulating event scores for o laser pulses. Each event requires a lookup of its score and a sum into the score map. After each sum, if the new score value is higher than the previous maximum score for that column, then the new maximum score value and its location are stored for that column. This accumulation increases the latency by a factor of o, but is necessary in any case when the DVS events are not reliably generated by each pulse edge.
After the o laser pulses are accumulated, the search of the column-wise maxima laser line pixels is based on the maximum values and their locations stored during accumulation. For each column, the weighted mean location of the peak is computed starting at the stored peak value and iterating over pixels up and down from the peak location until the score drops below the threshold value. This way, only a few pixels of the score map are inspected for each column.
The final step is to reset the accumulated score map and peak values to zero. This low-level memory reset is done by microprocessor logic hardware and is very fast.
Results of these optimizations are reported in Results.
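To make Equations (10)-(14) and the column-wise extraction concrete, the sketch below derives a scoring function from an averaged relative-time histogram (normalize by the total count, then subtract the mean bin count so that noise-dominated bins score negatively) and extracts the laser line per column with a weighted mean around each column's peak. It is a simplified single-polarity version with hypothetical names, not the jAER FilterLaserLine implementation.

```java
/** Sketch of Equations (13)-(14) and of the column-wise peak extraction.
 *  Single polarity only; the paper keeps separate ON and OFF histograms. */
public class LaserLineExtraction {

    /** Equations (13)-(14): turn an averaged histogram B_n(l) into a scoring
     *  function P_n. Bins below the average bin count get a negative score. */
    public static double[] scoringFunction(double[] averagedHistogram) {
        double total = 0;                        // T_n, Equation (13)
        for (double b : averagedHistogram) total += b;
        int k = averagedHistogram.length;
        double[] p = new double[k];
        if (total <= 0) return p;                // no events yet: all scores zero
        for (int l = 0; l < k; l++) {
            p[l] = (averagedHistogram[l] - total / k) / total; // Equation (14)
        }
        return p;
    }

    /** Column-wise extraction: take the maximum score of each column and, if it
     *  exceeds thetaPeak, refine it with a weighted average over the neighboring
     *  rows that are also above threshold. Returns the laser row per column,
     *  or NaN where no laser pixel was found. */
    public static double[] extractLine(double[][] scoreMap, double thetaPeak) {
        int cols = scoreMap.length, rows = scoreMap[0].length;
        double[] line = new double[cols];
        for (int u = 0; u < cols; u++) {
            int vMax = 0;
            for (int v = 1; v < rows; v++) if (scoreMap[u][v] > scoreMap[u][vMax]) vMax = v;
            if (scoreMap[u][vMax] < thetaPeak) { line[u] = Double.NaN; continue; }
            double weightedSum = 0, weightTotal = 0;
            for (int v = vMax; v < rows && scoreMap[u][v] >= thetaPeak; v++) {
                weightedSum += v * scoreMap[u][v]; weightTotal += scoreMap[u][v];
            }
            for (int v = vMax - 1; v >= 0 && scoreMap[u][v] >= thetaPeak; v--) {
                weightedSum += v * scoreMap[u][v]; weightTotal += scoreMap[u][v];
            }
            line[u] = weightedSum / weightTotal;
        }
        return line;
    }
}
```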




PARAMETER SETTINGS
Because the DVS does analog computation at the pixel level, the behavior of the sensor depends on the sensor bias settings. These settings can be used to control parameters such as the temporal contrast cutoff frequency and the threshold levels. For the experiments described in the following, the bias settings were optimized to report small as well as fast changes. These settings lead to an increase in noise events which does not affect the performance because they are filtered out successfully with the algorithm described previously. Furthermore, the biases are set to produce a clear peak in the temporal histogram of the OFF events (Figure 6). The variation in the peak form for ON and OFF events is caused by the different detection circuits for the two polarities in the pixel (Lichtsteiner et al., 2008) and different starting illumination conditions before the pulse edges.

Table 1 | Performance of the line extraction algorithm.

Frequency (Hz) | False positives (%)
50             | 0.14
100            | <0.01
200            | 0.03
500            | 5.75

The line laser is not strong enough to perform well at frequencies above 200 Hz.
The parameters for the algorithm are chosen heuristically: The bin size is fixed to 50 µs, the scoring function average is taken over a sliding window of m = 1000 histograms, the stripe detection is set to average o = 3 probability maps, and the peak threshold for the line detection is chosen to be θ_peak = 1.5.
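For reference, these heuristic parameter choices can be collected in one place, as in the sketch below; the constant names are illustrative and are not those used in the jAER filter.

```java
/** Heuristic parameter choices reported in the text (names are illustrative). */
public final class ExtractionParameters {
    public static final double BIN_SIZE_US      = 50.0; // width of a histogram bin
    public static final int    HISTOGRAM_WINDOW = 1000; // m: pulses averaged for the scoring function
    public static final int    SCORE_MAP_WINDOW = 3;    // o: score maps averaged for stripe detection
    public static final double PEAK_THRESHOLD   = 1.5;  // theta_peak for the column-wise peak

    private ExtractionParameters() {}
}
```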
Firstly, the performance of the stripe extraction algorithm was measured. Because the performance of the system is limited by the strength of the laser used, the capabilities of the DVS using a stronger laser were characterized to investigate the limits of the approach. Finally, a complex 3D terrain was used to assess the performance under more realistic conditions.

RESULTS
The laser stripe extraction results presented in the following were run in real-time as the open-source jAER filter FilterLaserLine (jAER, 2007) on an Intel Core i7 975 @ 3.33 GHz Windows 7 64-bit platform using Java 1.7u45. The 3D reconstruction was run off-line in Matlab on the same platform.
Comparing the computational cost to process an event (measured in CPU time) between the frame-based and the event-based algorithm with o = 10 pulses showed an 1800% improvement from 900 to 50 ns per event. This improvement is a direct result of the sparse sensor output: For each laser line point update, only a few active pixels around the peak value in the score map column are considered, rather than the entire column. At the typical event rate of 500 keps observed in the terrain reconstruction example, using a laser pulse frequency of 500 Hz, a single core of this (powerful) PC is occupied 2.5% of its available processor time using the event-based algorithm. Turning off the scoring function histogram update further decreases compute time to an average of 30 ns/event, only 25 ns more than processing event packets with a no-operation jAER filter that iterates over packets of DVS events without doing anything else.

EXTRACTION PERFORMANCE
To assess the line-detection performance of the stripe extraction algorithm, a ground truth was manually established for a scenario in which a plain block of uniform height was passed under the setup. The block was moved at about 2 cm/s to investigate the performance of the laser stripe extraction algorithm at different frequencies. In Table 1, the results of these measurements are displayed: "False positives" designates the ratio of events wrongly associated to the line over the total number of events. The performance of the algorithm drops at a frequency of 500 Hz and, because the DVS should be capable of detecting temporal contrasts in the kHz regime, this was further investigated. For optimal algorithm performance, each pulse should at least excite one event per column. This is not the case for the line laser pulsed at 500 Hz because the pixel bandwidth at the laser intensity used is limited to about this frequency. Therefore, not every pulse results in a DVS event, and so the laser stripe can only be found in a few columns, which leads to a degradation of the reconstruction quality.
To explore how fast the system could go, another laser setup was used: A stronger point laser (4.75 mW, Class C) was pulsed using a mechanical shutter to avoid artifacts from the rise and fall time of the electronic driver. This point was recorded with the DVS to investigate whether it can elicit at least one event per polarity and pulse at high frequencies. The measurements in Figure 7 show that even at frequencies exceeding 2 kHz sufficient events are triggered by the pulse. The mechanical shutter did not allow pulsing the laser faster than 2.1 kHz so the DVS might even go faster. The increase of events per pulse above 1.8 kHz is probably caused by resonances in the DVS photoreceptor circuits which facilitate the event generation. These findings indicate that a system using a sufficiently strong line laser should be capable of running at up to 2 kHz.

FIGURE 7 | Number of events at a pixel per laser pulse of a 4.75 mW point laser. Although the event count drops with higher frequencies, the average does not drop below 1 event per cycle even at 2 kHz.

TERRAIN RECONSTRUCTION
As a proof of concept, and as well as for studying possible applications and shortcomings of the approach, an artificial terrain was designed with a CAD program and it was fabricated on a 3D printer (Figure 8). The sensor setup of Figure 2 was used together with the sled to capture data at a speed of 1.94 cm/s over this terrain using a laser pulse frequency of 200 Hz, translating in the ty direction (Equation 5). (This slow speed was a limitation of




the DC motor driving the sled.) Figure 9 shows results of these measurements: Figure 9A shows the CAD model and Figure 9B shows the raw extracted line data after transformation through Equation 5 using the calibration parameters and the measured sled speed. The blind spots where the laser did not reach the surface and the higher sampling density on front surfaces are evident. These blind spots were filled by applying the MATLAB function TriScatteredInterp on the sample points as shown in Figure 9C. Finally, Figure 9D shows the error between the reconstruction and model as explained in the next paragraph.

FIGURE 8 | Artificial 3D rapid prototype terrain used for proof of concept reconstruction. Blue: area depicted in Figure 9, Red: laser line, Black: scan direction.

To quantify the error, the data was compared to the ground truth of the CAD model. However, the model and data lack alignment marks and therefore they were first aligned by hand using a global translation. Next, the alignment was refined using the iterative closest point algorithm (ICP; Besl and McKay, 1992), which slightly adjusted the global translation and rotation to minimize the summed absolute distance errors. Thirdly the closest 3D point of the model was determined for each point of the non-interpolated Figure 9B raw data and fourthly the distance to this model point was measured. The resulting accuracy, i.e., the mean 3D distance between these two points in the 3D data, is 1.7 ± 1.1 mm, i.e., the mean absolute distance between the sample and data points is 1.7 mm but the errors vary with a standard deviation of 1.1 mm. This accuracy represents ±0.25 pixel precision of measurement of the laser line given the geometry of the measurement setup. In the resampled, linearly interpolated data shown in Figure 9D, most of the error originates from the parts of the surface where the line laser is occluded by the surface, which are interpolated as flat surfaces, and in particular the bottoms of the valleys show the worst error, as could be expected.
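The error figure quoted above can in principle be reproduced with a nearest-point comparison such as the sketch below, which computes the mean and standard deviation of the distances from each measured point to its closest model point. The hand alignment and the ICP refinement (Besl and McKay, 1992) are assumed to have been applied beforehand, and the brute-force search stands in for whatever the authors used in Matlab; names and structure are illustrative.

```java
/** Sketch of the accuracy metric: mean and standard deviation of the distance
 *  from each measured 3D point to the closest point of the (already aligned)
 *  CAD model point set. Brute force; illustrative only. */
public class ReconstructionError {
    public static double[] meanAndStd(double[][] measured, double[][] model) {
        double[] d = new double[measured.length];
        for (int i = 0; i < measured.length; i++) {
            double best = Double.MAX_VALUE;
            for (double[] m : model) {
                double dx = measured[i][0] - m[0];
                double dy = measured[i][1] - m[1];
                double dz = measured[i][2] - m[2];
                best = Math.min(best, Math.sqrt(dx * dx + dy * dy + dz * dz));
            }
            d[i] = best;
        }
        double mean = 0;
        for (double v : d) mean += v;
        mean /= d.length;
        double var = 0;
        for (double v : d) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / d.length);
        return new double[] { mean, std }; // e.g., about {1.7 mm, 1.1 mm} for this dataset
    }
}
```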
An online movie showing the stripe extraction for the terrain reconstruction using a higher laser pulse frequency of 500 Hz is available (Adaptive filtering of DVS pulsed laser line response for terrain surface reconstruction, 2013). This video also shows various stages of the sensor output and laser line extraction. This recording is done at a sled speed of about 1 m/s using a free-falling sled on an incline, which was not limited by the DC motor speed. In this movie it is also clear that some parts of the terrain where the laser hits the surface at a glancing angle do not generate line data. The movie also shows that background DVS activity caused by image contrast is also effectively filtered out by the algorithm, although at this high frequency many pixels do not generate events on each laser pulse.

DISCUSSION
In this paper the first application of a DVS as a sensing device for terrain reconstruction was demonstrated. An adaptive event-based filtering algorithm for efficiently extracting the laser line position was proposed. The proposed application of DVSs in active sensor setups such as 3D scanners allows terrain reconstruction with high temporal resolution without the necessity of using a power-consuming high-speed camera and subsequent high frame rate processing or any moving parts. The event-based output of DVSs has the potential to reduce the computational load and thereby decrease the latency and power consumption of such systems. The system benefits from the high dynamic range and the sparse output of the sensor as well as the highly resolved time information on the dynamics in a scene. With the proposed algorithm, temporal correlations between the pulsed stimulus and the recorded signal can be extracted as well as used as a filtering criterion for the stripe extraction.
Further improvements to the system are necessary to realize the targeted integration into mobile robots. The Java and jAER overhead would have to be removed and the algorithm would have to be implemented in a lower level programming language (such as C) using the optimized event-based algorithm. A camera motion model and surface reconstruction would have to be integrated into the software, and for portability of the system it would need to be embedded in a camera such as the eDVS (Conradt et al., 2009). Motion models could be obtained from 3D surface SLAM algorithms (Newcombe et al., 2011) and/or inertial measurement units (IMUs). The use of DVSs with a higher sensitivity (Serrano-Gotarredona and Linares-Barranco, 2013) would allow using weaker lasers to save power. Higher resolution sensors that include a static readout (Posch et al., 2011; Berner et al., 2013) would facilitate the calibration and increase the resolution. The use of a brighter line laser would allow higher laser pulsing frequencies, a wider sensing range as well as possible outdoor applications.
But despite its immature state, the proposed approach compares well to existing commercial depth sensing systems like the Microsoft Kinect and a LIDAR optimized for mobile robots such as the SOKUIKI (comparison shown in Table 2). The system has a higher maximal sampling rate than the other sensors, a much lower average latency of 5 ms at a 200 Hz pulse rate, and it is more accurate at short distances. These features are crucial for motion planning and obstacle avoidance in fast moving robots. The latency of the proposed approach is, however, dependent on the reliability of the DVS pixel responses, so there is a tradeoff between latency and noise that has not yet been fully




FIGURE 9 | The reconstructed surface. (A) CAD model of the surface. (B) Measured data points. (C) Interpolated reconstruction of the surface using Matlab's TriScatteredInterp function. (D) Distance between closest reconstruction point and model aligned using ICP (Besl and McKay, 1992). This section of the reconstruction was chosen for display because in the surrounding area border effects were observed caused by the Gaussian profile of the laser line that reduced the DVS event rate to be too low to result in acceptable reconstruction.

Table 2 | Performance comparison of the proposed approach with existing depth sensors.

                              This work                        Microsoft Kinect for Xbox 360    LIDAR (SOKUIKI)

Spatial resolution (pixels)   128                              320 × 240 (b)                    680 (e)
Field of view (degree)        28                               58 × 44 (b)                      240 (e)
Output data                   Surface profile                  Depth map                        Range profile
Accuracy                      2 mm @ 0.45 m (0.45%)            1.5 cm @ 3 m (0.5%) (c)          3 cm @ 1 m (3%) (e)
Power consumption             USB camera + laser: 535 mW (a)   2.25–4.7 W (active) (b)          2.5 W (e)
Max sample rate (Hz)          500                              30 (d)                           10 (e)
Average latency (ms)          5 (f)                            45 (g) – 120 (h)                 100 (e)

a: DVS: 400 mW + Laser: 135 mW.
b: Nominally 640 × 480 (Viager, 2011) but spatial pattern used reduces to 1 pixel in each direction (Andersen et al., 2012).
c: Khoshelham and Elberink, 2012.
d: Kinect for Windows Sensor Components and Specifications.
e: URG-04LX-UG01.
f: 200 Hz laser pulse rate.
g: VGA depth map output with Core2 E6600 CPU @ 2.4 GHz (Specs about OpenNI compliant 3D sensor Carmine 1.08 | OpenNI, 2012).
h: Skeleton model w/ 1 skeleton tracked (Livingston et al., 2012).
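One way to read the latency row, assuming only the figures given in the table and in the text: at the 200 Hz pulse rate used for this work (footnote f), one laser pulse period is 1/200 Hz = 5 ms, so on average a new surface profile becomes available within roughly one pulse period, which is the 5 ms average latency listed above.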

studied, and this tradeoff will also depend on other conditions such as background lighting and surface reflectance. On the downside, the system's spatial resolution is limited by the use of the first-generation DVS128 camera and the field of view for the proposed system is narrow. But these drawbacks are not fundamental and they can easily be improved (e.g., by using newer sensors, shorter lenses and stronger lasers). The limitation that the system does not deliver depth maps but surface profiles could be overcome by projecting sparse 2D light patterns instead of a laser line. The power consumption of 500 mW for the USB camera and laser does not include the power to process the events nor to reconstruct the surface, but because the sensor system power consumption is comparably lower, the data processing will probably fit into the power budget of the other two approaches when embedded into a 32-bit ARM-based microcontroller, e.g., as in Conradt et al. (2009). In summary, this paper demonstrates the applicability of DVSs combined with pulsed line lasers to provide surface profile measurement with low latency and low computational cost, but integration onto mobile platforms will require further work.




ACKNOWLEDGMENTS Ni, Z., Pacoret, C., Benosman, R., Ieng, S.-H., and Rgnier, S. (2012). Asynchronous
This research was supported by the European Union funded event-based high speed vision for microparticle tracking. J. Microsc. 245,
236244. doi: 10.1111/j.1365-2818.2011.03565.x
project SeeBetter (FP7-ICT-2009-6), the Swiss National Science Orghidan, R., Salvi, J., and Mouaddib, E. M. (2006). Modelling and accuracy esti-
Foundation through the NCCR Robotics, ETH Zurich, and the mation of a new omnidirectional depth computation sensor. Pattern Recognit.
University of Zurich. The authors thank the reviewers for their Lett. 27, 843853. doi: 10.1016/j.patrec.2005.12.015
helpful critique which had a big impact on the final form of this Palaniappa, R., Mirowski, P., Ho, T. K., Steck, H., Whiting, P.,
paper. and MacDonald, M. (2011). Autonomous RF Surveying Robot for
Indoor Localization and Tracking. Gumaraes. Available online at:
http://ipin2011.dsi.uminho.pt/PDFs/Shortpaper/49_Short_Paper.pdf
SUPPLEMENTARY MATERIAL Posch, C., Matolin, D., and Wohlgenannt, R. (2011). A QVGA 143 dB dynamic
The Supplementary Material for this article can be found range frame-free PWM image sensor with lossless pixel-level video compres-
online at: http://www.frontiersin.org/journal/10.3389/fnins. sion and time-domain CDS. IEEE J. Solid-State Circuits 46, 259275. doi:
2013.00275/abstract 10.1109/JSSC.2010.2085952
Raibert, M., Blankespoor, K., Nelson, G., Playter, R., and Big Dog Team.
(2008). BigDog, the Rough-Terrain Quadruped Robot. Seoul. Available online at:
REFERENCES http://web.unair.ac.id/admin/file/f_7773_bigdog.pdf
Adaptive filtering of DVS pulsed laser line response for terrain surface reconstruc- Robinson, A., Alboul, L., and Rodrigues, M. (2003). Methods for
tion. (2013). Available online at: http://youtu.be/20OGD5Wwe9Q. (Accessed: indexing stripes in uncoded structured light scanning systems, in
December 23, 2013). International Conference in Central Europe on Computer Graphics,
Andersen, M. R., Jensen, T., Lisouski, P., Mortensen, A. K., Hansen, M. K., Visualization and Computer Vision (Plzen; Bory). Available online at:
Gregersen, T., et al. (2012). Kinect Depth Sensor Evaluation for Computer http://wscg.zcu.cz/WSCG2004/Papers_2004_Full/I11.pdf
Vision Applications. Aarhus: Aarhus University, Department of Engineering. Serrano-Gotarredona, T., and Linares-Barranco, B. (2013). A 128 128 1.5% con-
Available online at: http://eng.au.dk/fileadmin/DJF/ENG/PDF-filer/Tekniske trast sensitivity 0.9% FPN 3 s latency 4 mW asynchronous frame-free dynamic
rapporter/TechnicalReportECE-TR-6-samlet.pdf. (Accessed: December 11, vision sensor using transimpedance preamplifiers. IEEE J. Solid-State Circuits
2013). 48, 827838. doi: 10.1109/JSSC.2012.2230553
Berner, R., Brandli, C., Yang, M., Liu, S.-C., and Delbruck, T. (2013). Siegwart, R. (2011). Introduction to Autonomous Mobile Robots. 2nd Edn.
A 240 180 10 mW 12 us latency sparse-output vision sensor Cambridge, MA: MIT Press.
for mobile applications, in Symposium on VLSI Circuits (Kyoto), Specs about OpenNI compliant 3D sensor Carmine 1.08 |OpenNI. (2012).
C186C187. Available online at: http://www.openni.org/rd1-08-specifications/. (Accessed:
Besl, P. J., and McKay, N. D. (1992). A Method for Registration of 3-D November 12, 2013).
shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14, 239256. doi: 10.1109/34. URG-04LX-UG01. Scanning Range Finder URG-04LX-UG01. Available online
121791 at: http://www.hokuyo-aut.jp/02sensor/07scanner/urg_04lx_ug01.html.
Conradt, J., Cook, M., Berner, R., Lichtsteiner, P., Douglas, R., and Delbruck, (Accessed: October 24, 2013).
T. (2009). A pencil balancing robot using a pair of AER dynamic vision Usamentiaga, R., Molleda, J., and Garca, D. F. (2010). Fast and robust laser stripe
sensors, in IEEE International Symposium on Circuits and Systems (ISCAS) extraction for 3D reconstruction in industrial environments. Mach. Vis. Appl.
2009 (Taipei), 781784. Available online at: http://ieeexplore.ieee.org/lpdocs/ 23, 179196. doi: 10.1007/s00138-010-0288-6
epic03/wrapper.htm?arnumber=5117867. (Accessed: August 13, 2013). doi: Viager, M. (2011). Analysis of Kinect for Mobile Robots. Lyngby: Technical University
10.1109/ISCAS.2009.5117867 of Denmark.
Delbruck, T., and Lichtsteiner, P. (2007). Fast sensory motor control based Weiss, S., Achtelik, M., Kneip, L., Scaramuzza, D., and Siegwart, R. (2010). Intuitive
on event-based hybrid neuromorphic-procedural system, in International 3D maps for MAV terrain exploration and obstacle avoidance. J. Intell. Robot.
Symposium on Circuits and Systems (ISCAS) 2007 (New Orleans, LA), Syst. 61, 473493. doi: 10.1007/s10846-010-9491-y
845848. Available online at: http://ieeexplore.ieee.org/lpdocs/epic03/ Yoshitaka, H., Hirohiko, K., Akihisa, O., and Shinichi, Y. (2006). Mobile robot
wrapper.htm?arnumber=4252767. (Accessed: August 13, 2013). doi: localization and mapping by scan matching using laser reflection intensity
10.1109/ISCAS.2007.378038 of the SOKUIKI sensor, in IECON 2006 - 32nd Annual Conference on IEEE
Forest, J., and Salvi, J. (2002). A review of laser scanning three-dimensional digi- Industrial Electronics (Paris), 30183023.
tisers, in IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS) (Lausanne: IEEE), 7378. Available online at: http://ieeexplore.ieee.org/ Conflict of Interest Statement: One of the Authors (Tobi Delbruck) is one of
lpdocs/epic03/wrapper.htm?arnumber=1041365. (Accessed: August 13, 2013). the research topic editors. One of the Authors (Tobi Delbruck) has a financial
jAER. (2007). JAER Open Source Proj. Available online at: http://jaerproject.net. participation in iniLabs, the start-up which commercially distributes the DVS
(Accessed: September 17, 2013). camera prototypes. The authors declare that the research was conducted in the
Khoshelham, K., and Elberink, S. O. (2012). Accuracy and resolution of kinect absence of any commercial or financial relationships that could be construed as a
depth data for indoor mapping applications. Sensors 12, 14371454. doi: potential conflict of interest.
10.3390/s120201437
Kinect for Windows Sensor Components and Specifications. Available online Received: 23 August 2013; accepted: 23 December 2013; published online: 17 January
at: http://msdn.microsoft.com/en-us/library/jj131033.aspx. (Accessed: October 2014.
23, 2013). Citation: Brandli C, Mantel TA, Hutter M, Hpflinger MA, Berner R, Siegwart R
Lichtsteiner, P., Posch, C., and Delbruck, T. (2008). A 128 128 120 dB 15 s and Delbruck T (2014) Adaptive pulsed laser line extraction for terrain reconstruction
latency asynchronous temporal contrast vision sensor. IEEE J. Solid-State using a dynamic vision sensor. Front. Neurosci. 7:275. doi: 10.3389/fnins.2013.00275
Circuits 43, 566576. doi: 10.1109/JSSC.2007.914337 This article was submitted to Neuromorphic Engineering, a section of the journal
Livingston, M. A., Sebastian, J., Ai, Z., and Decker, J. W. (2012). Performance Frontiers in Neuroscience.
measurements for the Microsoft Kinect skeleton, in 2012 IEEE Virtual Copyright 2014 Brandli, Mantel, Hutter, Hpflinger, Berner, Siegwart and
Reality Short Papers and Posters (VRW) (Costa Mesa, CA), 119120. doi: Delbruck. This is an open-access article distributed under the terms of the Creative
10.1109/VR.2012.6180911 Commons Attribution License (CC BY). The use, distribution or reproduction in other
Newcombe, R. A., Davison, A. J., Izadi, S., Kohli, P., Hilliges, O., Shotton, J., et al. forums is permitted, provided the original author(s) or licensor are credited and that
(2011). KinectFusion: real-time dense surface mapping and tracking, in 2011 the original publication in this journal is cited, in accordance with accepted academic
10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR) practice. No use, distribution or reproduction is permitted which does not comply with
(Basel), 127136. these terms.



ORIGINAL RESEARCH ARTICLE
published: 21 November 2013
doi: 10.3389/fnins.2013.00223

Robotic goalie with 3 ms reaction time at 4% CPU load


using event-based dynamic vision sensor
Tobi Delbruck* and Manuel Lang
Department of Information Technology and Electrical Engineering, Institute of Neuroinformatics, UNI-ETH Zurich, Zurich, Switzerland

Edited by: Conventional vision-based robotic systems that must operate quickly require high video
Andr van Schaik, The University of frame rates and consequently high computational costs. Visual response latencies are
Western Sydney, Australia
lower-bound by the frame period, e.g., 20 ms for 50 Hz frame rate. This paper shows how
Reviewed by:
an asynchronous neuromorphic dynamic vision sensor (DVS) silicon retina is used to build
Jorg Conradt, Technische Universitt
Mnchen, Germany a fast self-calibrating robotic goalie, which offers high update rates and low latency at
Gregory K. Cohen, Bioelectronics low CPU load. Independent and asynchronous per pixel illumination change events from
and Neuroscience Research Group the DVS signify moving objects and are used in software to track multiple balls. Motor
at the MARCS Institute, Australia
actions to block the most threatening ball are based on measured ball positions and
*Correspondence:
velocities. The goalie also sees its single-axis goalie arm and calibrates the motor output
Tobi Delbruck, Department of
Information Technology and map during idle periods so that it can plan open-loop arm movements to desired visual
Electrical Engineering, Institute of locations. Blocking capability is about 80% for balls shot from 1 m from the goal even with
Neuroinformatics, Winterhurerstr. the fastest-shots, and approaches 100% accuracy when the ball does not beat the limits of
190, UNI-ETH Zurich, CH-8057
Zurich, Switzerland
the servo motor to move the arm to the necessary position in time. Running with standard
e-mail: [email protected] USB buses under a standard preemptive multitasking operating system (Windows), the
goalie robot achieves median update rates of 550 Hz, with latencies of 2.2 ± 2 ms from ball
movement to motor command at a peak CPU load of less than 4%. Practical observations
and measurements of USB device latency are provided1 .
Keywords: asynchronous vision sensor, address-event representation, AER, high-speed visually guided robotics,
high frame rate, neuromorphic system, soccer

INTRODUCTION

The notion of a frame of video data is embedded in machine vision. High speed frame-based vision is expensive because it is based on a series of pictures taken at a constant rate. The pixels are sampled repetitively even if their values are unchanged. Short-latency vision problems require high frame rates and produce massive amounts of input data. At high frame rate, few CPU instructions are available for processing each pixel. For example, a VGA 640 × 480 pixel image sensor at 1 kHz frame rate delivers data at a rate of 307 M pixels/s, or a pixel every 3.3 ns. At usable instruction rates of 1 GHz a computer would only be able to dedicate 3 instructions per pixel to processing this information. This high data rate, besides requiring specialized computer interfaces and cabling (Wilson, 2007), makes it expensive in terms of power to deal with the data, especially in real time or embedded devices. Specialized high-frame-rate machine vision cameras with region of interest (ROI) or binning (sub-sampling) capabilities can reduce the amount of data significantly, but the ROI and binning must be controlled by software and the ROI is limited to a single region, reducing its usefulness for tracking multiple objects. Tracking a single object requires steering the ROI to follow the object. The latency of this ROI control must be kept short to avoid losing the object and ROI control can become quite complex to implement. Ref. (Graetzel et al., 2006), for example, describes a fruit-fly wing-beat analyzer that uses Kalman filtering to move the ROI in anticipation of where it should be according to the Kalman filter parameters, and even to time-multiplex the ROI between different parts of the scene. The computer must process all the pixels for each ROI or binned frame of data and ROI control latencies must be kept short if the object motion is not predictable.

By contrast, in the camera used for this paper, data are generated and transmitted asynchronously only from pixels with changing brightness. In a situation where the camera is fixed and the illumination is not varying, only moving objects generate events. This situation reduces the delay compared to waiting for and processing an entire frame. Also, processor power consumption is related to the scene activity and can be reduced by shorter processing time and longer processor sleep phases between processing cycles.

This paper describes the results of experiments in low-latency visual robotics using an asynchronous dynamic vision sensor (DVS) (Lichtsteiner et al., 2006, 2007) as the input sensor, a standard PC as the processor, standard USB interfaces, and a standard hobby servo motor as the output.

1 During this work the authors were with the Inst. of Neuroinformatics, Winterthurerstr. 190, UNI-ETH Zurich, CH-8057 Zurich, Switzerland; e-mail: [email protected], phone: +41 (44) 635-3038.

Specifically, this paper demonstrates that independent pixel event data of a DVS are well-suited for object tracking and real-time visual feedback control. The simple but highly efficient object-tracking algorithm is implemented on a general purpose CPU. The experiments show that such a robot, although based on traditional, cheap, ubiquitous PC components like USB and a standard preemptive operating system (Windows) a simple




programmable Java control application achieves reaction times


on par with high speed conventional machine vision hardware
running on dedicated real-time operating systems consuming the
resources of an entire computer.
This paper expands on a brief conference report (Delbruck
and Lichtsteiner, 2007) by including the new feature of self-
calibration, more detailed descriptions of the algorithms, and
new measurements of performance and latency particularly relat-
ing to USB interfaces. Other related work that has integrated
an event-based neuromorphic vision sensor in a robot includes
CAVIAR, a completely spike-hardware based visual tracking sys-
tem (Serrano-Gotarredona et al., 2009), a pencil balancing robot
using a pair of embedded-processor DVS cameras (Conradt et al.,
2009a), which was first prototyped using two DVS cameras
interfaced by USB (Conradt et al., 2009b), a demonstration of
real-time stereo distance estimation computed on an FPGA with
2 DVS cameras (Domnguez-Morales et al., 2012), an embed-
ded FPGA-based visual feedback system using a DVS (Linares-
Barranco et al., 2007), and a micro gripper haptic feedback system
(Ni et al., 2013) which uses a DVS as one of the two input sensors.
FIGURE 1 | Goalie robot illustration and a photo of the setup, showing the placement of vision sensor, goalie arm, and goal. The white or orange balls have a diameter of 3 or 4 cm and are viewed against the light brown wood table. The reflectance ratio between balls and table is about 1.3. The retina view extends out to 1 m from the goal line. The goalie hand is 5 cm wide and the goal is 45 cm wide.

MATERIALS AND METHODS: GOALIE ARCHITECTURE
The application presented here is a self-calibrating soccer goalie robot (Figure 1). The robotic goalie blocks balls shot at a goal using a single-axis arm with only a single degree of freedom. Figure 1 shows our goalie robot hardware architecture. Players attempt to score by shooting balls at the goal (either by rolling or flicking with their fingernails) and the goalie robot tries to block all balls from entering the goal. Only balls that roll or slide along or near the table surface can be blocked, and this limitation is what enables the solution to the blocking problem without stereo vision or some other means of determining the height of the ball over the table. The fact that the balls move along the surface of the table means that their 3D position can (implicitly in this application) be determined from the ball's 2D image position. The goalie is self-calibrating, i.e., by visual observation it learns the motor control to arm position relationship. When turned on, the goalie is in one of 4 distinct states. In the active state, the goalie has determined that a ball is approaching the goal that can be blocked and tries to block it. Between balls, the goalie is relaxed to the middle position. When no definite balls have been seen for a few seconds, the goalie enters the sleeping state where it does not respond to every movement in the scene. This state reduces apparently spastic movements in response to people walking by, hands, etc. After several minutes in the sleeping state the goalie enters the learning state, in which it recalibrates itself. The goalie wakes up from sleeping to become active when it again sees a definite ball.
The rest of this section will describe the individual components of the system.

DYNAMIC VISION SENSOR
Conventional image sensors see the world as a sequence of frames, each consisting of many pixels. In contrast, the DVS is an example of a sensor that outputs digital address events (spikes) in response to temporal contrast at the moments that pixels see changing intensity (Lichtsteiner et al., 2006, 2007; Delbruck et al., 2010) (Figure 2). Like an abstraction of some classes of retinal ganglion cell spikes seen in biology, each event that is output from the DVS indicates that the log intensity at a pixel has changed by an amount T since the last event. T is a global event threshold which is typically set to about 15% contrast in this goalie robot application. In contrast to biology, the serial data path used requires the events to carry address information of which pixel has changed. The address encodes the positive or negative brightness changes (ON or OFF) with one bit and the rest of the bits encode the row and column addresses of the triggering pixel. This representation of change in log intensity encodes scene reflectance change, as long as the illumination is constant over time, but not necessarily over space. Because this computation is based on a compressive logarithmic transformation in each pixel, it also allows for wide dynamic range operation (120 dB, compared with e.g., 60 dB for a high quality traditional image sensor).

This neuromorphic abstraction of the transient pathway seen in biology turns out to be useful for a number of reasons. The wide dynamic range means that the sensor can be used with uncontrolled natural lighting, even when the scene illumination is non-uniform and includes strong shadows, as long as they are not moving. The asynchronous response property also means that the events have the timing precision of the pixel response rather than being quantized to the traditional frame rate. Thus, the effective frame rate is typically several kHz and is set by the available illumination which determines the pixel bandwidth. The temporal redundancy reduction reduces the output data rate for scenes in which most pixels are not changing. The design of the pixel also allows for uniformity of response: the mismatch between pixel contrast thresholds is 2.1% contrast and the event threshold can be set down to 10% contrast, allowing the device




FIGURE 3 | Snapshot of action showing 128 events (all events within 2.9 ms) from the vision sensor. It shows 5 tracked objects (the middle 3 are real balls, the top one is the shooter's hand, and the bottom object is the goalie arm). The attacking ball rolling toward the goal (and being blocked) is marked with a circle; other balls are tracked but ignored. The thin squares represent potential clusters that have not received sufficient support. The velocity vectors of each ball are also shown as a slightly thicker line and have been computed by least squares linear regression over the past 10 packets of events. The goalie arm is being moved to the left bar and the presently tracked location of the arm is shown as a light bar inside the arm cluster. The state of the goalie is indicated as active, meaning a tracked ball is being blocked. The balls generate average event rates of 330 keps (kilo events per second). The mean event rate for this packet was 44 keps.
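The velocity vectors mentioned in the Figure 3 caption come from a least squares linear regression of recent cluster locations against time, recomputed as each packet is processed. A minimal sketch of such an estimator is shown below. It is an illustration only, not the jAER implementation: the class and method names are invented here, and for clarity the sums are recomputed over the window on each query rather than updated incrementally as described later in the text.

import java.util.ArrayDeque;
import java.util.Deque;

/** Rolling least-squares estimate of cluster velocity (pixels/s) from the
 *  cluster locations at the ends of the last N event packets. */
final class RollingVelocityEstimator {
    private static final class Sample {
        final double t, x, y;
        Sample(double t, double x, double y) { this.t = t; this.x = x; this.y = y; }
    }

    private final int capacity;
    private final Deque<Sample> window = new ArrayDeque<>();

    RollingVelocityEstimator(int nPackets) { this.capacity = nPackets; }

    /** Call once per processed packet with the packet end time (s) and cluster location (pixels). */
    void addSample(double timeSeconds, double x, double y) {
        if (window.size() == capacity) window.removeFirst();   // drop the oldest location
        window.addLast(new Sample(timeSeconds, x, y));
    }

    /** Returns {vx, vy} as the least-squares slopes of x(t) and y(t); zero until 2 samples exist. */
    double[] velocity() {
        int n = window.size();
        if (n < 2) return new double[] { 0, 0 };
        double st = 0, sx = 0, sy = 0, stt = 0, stx = 0, sty = 0;
        for (Sample s : window) {
            st += s.t; sx += s.x; sy += s.y;
            stt += s.t * s.t; stx += s.t * s.x; sty += s.t * s.y;
        }
        double denom = n * stt - st * st;                      // least-squares denominator
        if (denom == 0) return new double[] { 0, 0 };
        return new double[] { (n * stx - st * sx) / denom, (n * sty - st * sy) / denom };
    }
}

With N = 10 packets this corresponds to the regression window of roughly the last 10-30 ms of ball locations used in the goalie.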

FIGURE 2 | Characteristics of the dynamic vision sensor (Tmpdiff128). (A) The dynamic vision sensor with its lens and USB2.0 interface. (B) A die photograph. Pixels generate address-events, with the address formed from the x, y location and ON or OFF type. (C) An abstracted schematic of the pixel, which responds with events to fixed-size changes of log intensity. (D) How the ON and OFF events are internally represented and output in response to an input signal. Figure adapted from Lichtsteiner et al. (2006).

to sense real-world contrast signals rather than only artificial high contrast stimuli. The vision sensor has integrated digitally controlled biases that minimize chip-to-chip variation in parameters and temperature sensitivity. Equipped with a USB2.0 high-speed interface, the DVS camera delivers the time-stamped address-event representation (AER) address-events to a host PC with a timestamp resolution of 1 µs.

EVENT-DRIVEN TRACKING ALGORITHM
Events from the DVS are processed inside jAER, an open-source Java software infrastructure for processing event-based sensor outputs (2007). The goalie implementation consists of about 3 k non-comment lines of code. The goalie software implementation is open-sourced in jAER.
The ball and arm tracker is an event-driven cluster tracker described briefly in (Lichtsteiner et al., 2006; Litzenberger et al., 2006) (Figure 3) and further enhanced in this work. This algorithm is inspired by the mean-shift approach used in frame-based vision (Cheng, 1995; Comaniciu and Ramesh, 2000). Each cluster models a moving object as a source of events. Visible clusters are indicated by the boxes in Figure 3. Events that fall within the cluster move the cluster position, and a cluster is only considered supported (visible) when it has received a threshold number of events. Clusters that lose support for a threshold period are pruned. Overlapping clusters are merged periodically at 1 ms intervals. Cluster positions are updated by using a mixing factor that mixes the old position with the new observations using fixed factors. Thus, the time constant governing cluster position is inversely proportional to the evidence (event rate).
The advantages of the cluster tracker are:

(1) There is no frame correspondence problem because the events continuously update the cluster locations during the movement of the objects, and the faster the objects move, the more events they generate.
(2) Only pixels that generate events need to be processed. The cost of this processing is dominated by the search for the nearest existing cluster, which is a cheap operation because there are only a few clusters.
(3) Memory cost is low because there is no full frame memory, only cluster memory, and each cluster requires only a few hundred bytes of memory.
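The following sketch illustrates the core of an event-driven cluster tracker of the kind described above: each incoming event either updates the nearest cluster with a small mixing factor or seeds a new cluster, and clusters that stop receiving events are pruned. It is a simplified illustration under invented class names, not the goalie's jAER code, and it omits cluster merging, velocity prediction, and the perspective-dependent cluster radius discussed in the next section.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Minimal event-driven cluster tracker: nearest-cluster search, mixing-factor
 *  position update, support counting, and time-based pruning. */
final class SimpleClusterTracker {
    static final class Cluster {
        double x, y;            // current location (pixels)
        int support;            // number of events received
        int lastEventTimeUs;    // timestamp of last supporting event
        Cluster(double x, double y, int t) { this.x = x; this.y = y; this.lastEventTimeUs = t; }
    }

    private final List<Cluster> clusters = new ArrayList<>();
    private final double radiusPx;      // cluster radius in pixels
    private final double alpha;         // mixing factor, e.g. 0.01
    private final int pruneDelayUs;     // e.g. 10 ms = 10_000 us
    private final int maxClusters;      // e.g. 20 potential clusters

    SimpleClusterTracker(double radiusPx, double alpha, int pruneDelayUs, int maxClusters) {
        this.radiusPx = radiusPx; this.alpha = alpha;
        this.pruneDelayUs = pruneDelayUs; this.maxClusters = maxClusters;
    }

    /** Process one DVS event (pixel address x, y and timestamp in microseconds). */
    void onEvent(int ex, int ey, int timeUs) {
        Cluster nearest = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Cluster c : clusters) {                     // search for the nearest cluster
            double d = Math.hypot(ex - c.x, ey - c.y);
            if (d < bestDist) { bestDist = d; nearest = c; }
        }
        if (nearest != null && bestDist <= radiusPx) {
            // Mix the old location with the event location: x <- (1 - alpha) * x + alpha * e.
            nearest.x = (1 - alpha) * nearest.x + alpha * ex;
            nearest.y = (1 - alpha) * nearest.y + alpha * ey;
            nearest.support++;
            nearest.lastEventTimeUs = timeUs;
        } else if (clusters.size() < maxClusters) {
            clusters.add(new Cluster(ex, ey, timeUs));   // seed a new potential cluster
        }
        prune(timeUs);
    }

    /** Remove clusters that have not been supported recently. */
    private void prune(int nowUs) {
        for (Iterator<Cluster> it = clusters.iterator(); it.hasNext(); ) {
            if (nowUs - it.next().lastEventTimeUs > pruneDelayUs) it.remove();
        }
    }
}

In the actual goalie the update mixes the predicted location (old position advanced by the velocity estimate) rather than the last position, which is what gives the tracker the predictive momentum described in the algorithm steps below.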




In the goalie application the objects have a known size and roll on a flat surface, so tracked clusters have an image space radius determined by their perspective location in the scene.
The algorithm runs on each packet of combined events received from USB transmission, typically 128 (or fewer):

(1) Pruning: Iterate over all existing clusters, pruning out those clusters that have not received sufficient support. A cluster is pruned if it has not received an event to support it within a given time, typically 10 ms in this application.
(2) Merging: Iterate over all clusters to merge overlapping clusters. This merging operation is necessary because new clusters can be formed when an object grows larger as it approaches the vision sensor. For each cluster rectangle that overlaps the rectangle of another cluster, merge the two clusters into a new cluster and discard the previous clusters. The new cluster takes on the history of the older two clusters and its position is the weighted average of the locations of the source clusters. The averaging is weighted by the number of events in each source cluster. This weighting reduces the jitter in the cluster location caused by merging. This iteration continues as long as there are overlapping clusters.
(3) Positioning: For each event, find the nearest cluster that contains the event. The predicted location of each cluster that is considered in this step is computed using its present cluster location combined with the present cluster velocity estimate and the time between this event and the last one that updated the cluster. This way, an event can be in a cluster's predicted location even if it is not inside the last location of the cluster.

(a) If the event is within the cluster, add the event to the cluster by pushing the cluster a bit toward the event and updating the last event time of the cluster. The new cluster location x_{n+1} is given by mixing the predicted value of the old location (x_n + v_n Δt), where v_n is the cluster velocity and Δt is the time between this event and the last one that updated this cluster, with the event location e, using an adjustable mixing factor α of about 0.01:

x_{n+1} = (1 - α)(x_n + v_n Δt) + α e

This step implements a predictive tracker by giving the clusters a kind of momentum that helps keep clusters attached to rapidly moving objects even if they emit few events. If the present event appears at the predicted location of the cluster, the cluster's location is only modified to the predicted location. Events from the leading edge of the object pull the cluster forward and speed it up, while events at the cluster's trailing edge pull the cluster back and slow it down.
(b) If the event is not in any cluster, seed a new cluster if there are spare unused clusters to allocate. The goalie typically uses 20 potential clusters.

A cluster is not marked as visible until it receives a certain number of events (typically 10 in the goalie) and is moving at a minimum speed (typically 20 pixels/s in the goalie).

The goalie robot determines the ball object as the cluster that will next hit the goal line, based on the cluster positions and velocities. The ball cluster's location and velocity measurement are used to position the servo to intercept the ball. If there is no threatening ball, the goalie relaxes.
Accurate and rapid measurement of cluster velocity is important in the goalie application because it allows forward prediction of the proper position of the arm. A number of algorithms for estimating cluster velocity were tried. Low-pass filtering the instantaneous cluster velocity estimates that come from the cluster movements caused by each event is cheap to compute, but was not optimal because the lowpass filter takes too long to settle to an accurate estimate. The method presently used is a rolling least squares linear regression on the cluster locations at the ends of the last N packets of events. This method is almost as cheap to compute because it only updates least-squares summary statistics by adding in the new location and removing the oldest location, and it instantaneously settles to an optimal estimate. A value of N = 10 computes velocity estimates over about the last 10-30 ms of ball location.

GOALIE SELF-CALIBRATION
In an earlier version of the goalie (Delbruck and Lichtsteiner, 2007) the arm position was specified by adjustable offset and gain parameters that mapped a motor command to a certain arm position. It was difficult to calibrate this goalie accurately, and every time the aim of the vision sensor was adjusted or moved accidentally (the goal bounces around quite a bit due to the arm movements) laborious manual calibration had to be done again. The goalie arm was also not visible to the goalie and so there was no straightforward way for the goalie to calibrate itself. In the present goalie, the orientation of the arm was changed so that it swings on a horizontal plane rather than hanging like a pendulum, and a wide angle lens (3.6 mm) is used that allows the vision sensor to see both incoming balls and the goalie's hand. The horizontal arm orientation has the additional advantage that it allows the goalie to block corner shots much better.
Goalie calibration occurs in the learning state. When active, the arm position is tracked by using a motion tracker like the ball tracker but with a single cluster sized to the size and aspect ratio of the arm (Figure 3). The x position of the arm tracker is the arm coordinate in image space. The motor is controlled in coordinates chosen in software to span [0-1] units. The calibration algorithm has the following steps, demonstrated by the data shown in Figure 4:

(1) The present calibration is checked by randomly placing the arm in 5 pixel positions (using the current calibration parameters to determine the mapping) and measuring the actual arm position in pixel coordinates. If the average absolute error is smaller than a threshold (typically 5 pixels) calibration is finished. In the situation shown in Figure 4A, the calibration is initially very incorrect, and learning is initiated.
(2) If calibration is needed, the algorithm places the arm in randomly chosen motor positions within a range specified in a GUI interface to be roughly in the center of the field of view. (The GUI allows interactive determination of the




to rotate 60° with no load in 100 ms. It can move the 40 cm long


20 g mass arm across the goal in about 100 ms and is slightly
(10%) underdamped with the goalie arm as only load. Other
fast servos can be severely underdamped and actually oscillate
(e.g., the Futaba S9253). The remaining overshoot with the HiTec
servo is enough that the servo occasionally overshoots its intended
location enough that the ball is not blocked.
A custom board based on the Silicon Labs C8051F320 USB2.0
full-speed microcontroller (www.silabs.com) interfaces between
the PC and the servo motor. The microcontroller accepts com-
FIGURE 4 | Goalie self-calibration. (A) Desired and actual arm position
mands over a USB bulk endpoint (Axelson, 2001) that program
before and after calibration (the desired positions are the inverse mapping the PWM output width. The servo motor is powered directly
from motor commands in the allowed physical range to the pixel space). (B) by the 5V USB VBUS and 0.5F of ultracapacitor on the con-
Example measured arm positions vs. servo command used for least troller board helps to ballast the 5V USB VBUS voltage. The servo
squares linear regression calibration. controller design is open-sourced in jAER (2007).
The servo arm is constructed from a paint stirrer stick with a
balsa wood hand glued to its end. A goal of this project was to
servo motor limits). For each placement position, the actual make this hand as small as possible to demonstrate the precision
pixel position is measured from the arm tracker. Typically of tracking. The hand width used in this study was about 1.5 times
20 points are collected. A least-squares linear regression then the ball width (Figure 1).
determines the linear mapping from desired pixel position
to motor position (Figure 4B). The algorithm then goes RESULTS
back to step 1. In Figure 4A, the calibration is checked after Ordinarily a good shooter can aim most of the shots within the
fitting and is satisfactory, so the calibration algorithms is goal; thus a good shooter can potentially score on most shots. In
terminated. a trial with several experienced shooters who were told they could
take as much time as they needed to shoot, it required an aver-
Calibration typically achieves accuracy within 25 pixels over the age of 40 shots to score 10 goals. This means that each ball had to
entire range. The linear approximation sin(x) = xnear x = 0 was be shot about 4 times to score once, representing a shot success
sufficiently accurate that it was not necessary to account for the rate of 25%. A post experiment analysis of the data showed that
sinusoidal relation between servo command and location of the the shooters could potentially have scored on 75% of their shots,
arm across the goal. with the rest of the shots representing misses wide of the goal
(the shooters were intentionally aiming at the corners of the goal).
USB INTERFACES AND SERVO CONTROL Therefore, they had 30 shots on the goal and the goalie blocked 20
Both the vision sensor and the servo controller use the Java of these shots. The missed blocks consisted of a mixture of shots
interface provided by the Thesycon Windows USB driver develop- were not blocked for three reasons, ranked from highest to lowest
ment kit for Windows (www.thesycon.de). The servo commands occurrence: (1) they were so hard that they exceeded the ability
are sent to the microcontroller in a separate writer thread that of the servo to move the arm to the correct position in time; (2)
takes commands placed in a queue by the retina event process- tracking noise so that the arm position was not correctly com-
ing thread. This decoupling allows for full speed USB 2.0 event puted well-enough; (3) servo overshoot, where the servo tries to
processing although servo controller commands are transmit- move the arm to the correct position but because of the under-
ted using USB full-speed protocol at 12 Mbps (Axelson, 2001). damped dynamics, the arm momentarily overshoots the correct
The servo motor control command rate is 500 Hz, because each position, allowing the ball to pass by.
command requires 2 polls from the host controller and the min- The cluster tracker algorithm is effective for ignoring dis-
imal possible USB2.0 full-speed polling interval of 1 ms. The tracters In Figure 3 four balls are simultaneously tracked. The
command queue length is set to one to minimize latency. New topmost ball is probably the shooters hand. Two balls are
commands replace old ones if they have not yet been transmitted. rolling away from the goal and are thus ignored. One is approach-
Likewise, the incoming DVS events are transmitted in 128-event ing the goal and the arm is moving to block it, based on the
(or smaller) blocks and processed in a high priority thread that balls position and velocity. Ignoring the many distracters would
runs independently from the GUI or rendering threads. The DVS be impossible using a simpler method of ball tracking, such as
uses a USB2.0 high-speed interface with a data rate of 480 Mbps median event location. Figure 5 shows the dynamics of a single
and a polling interval of 128 us. The USB interface threads were blocking event for a ball that was shot quite fast, so that that it
set to high priority, with highest priority given to the servo writ- covers the distance from the top of the scene to the goal in about
ing thread. Javas maximum priority is equivalent to Windows 100 ms. During the balls 100 ms approach, about 50 packets of
TIME_CRITICAL priority (Oaks and Wong, 2004). events, and thus samples of the ball position (ballx and bally),
A HiTec HS-6965 MG digital servo moves the goalie arm. This are captured by the tracker. The bounce off the arm is visible as
$120 hobby digital servo accepts pulse-width modulation (PWM) the inflection in bally. The desired arm position is shown also
input up to at least the183 Hz frequency that we used and is rated as a function of time and is computed from ballx, bally, and the




FIGURE 5 | Single shot dynamics. (A) 2D histogram of spike activity


caused by balls and goalie arm over 160 ms. (B) Time course of blocking
one ball.

ball x and y velocities (not shown). The ball velocities are esti-
mated by rolling linear regressions over the past 10 ball position
samples for ballx and bally vs. time. The actual arm position FIGURE 6 | Statistics. (A) Latency measurements. Stimulus was flashing
is the position of the arm as measured by the arm tracker and LED turned on at time 0. (B) Host packet processing interval distribution
it can be seen that the arm requires about 80 ms to move to the during normal operation while goalie is under attack. (C) Histogram of
number of events per packet during normal operation. (D) Processor load
correct blocking position and also exhibits about 10% overshoot
during normal operation (2 balls/second attack).
which is due to slight under-damping in the servos controller.
The response latency is dominated by the arm movement and
the delay between knowing the desired arm position and the
initiation of arm movement. a servo motor command. The start of PWM output from
Events are processed by the goalie software at a rate of 2 Meps the servo controller and the actual start of motor movement
(million events per second) on a 2.1 GHz Pentium M laptop run- were measured. (The motor movement was measured from the
ning Windows XP, Java JVM version 1.6. During typical goalie power supply drop on the servo power supply). The measured
operation, the average event rate is 20 keps, varying between median latency of 2.2 ms between the beginning of the LED
<1 keps when idle to a maximum of 100 keps during active flashing and the microcontroller output is the response latency
10 ms windows of time. For buffers of 128 events processing the leaving out the latency of the random PWM phase and the
goalie code requires about 60 us. Figure 6B shows a histogram servo motor (Figure 6A). This latency was achieved by setting
of processing intervals as recorded on the host PC using Javas the servo controller USB2.0 full speed interrupt polling inter-
System.nanoTime(). The median interval is 1.8 ms (the peak in the val to 1 ms in the devices USB descriptor (Axelson, 2001);
histogram at 10 ms is caused by forced transfers of data from the using the default polling interval of 10 ms resulted in sub-
vision sensor at 100 Hz rate even when the USB FIFOs have not stantially higher median latency of 5.5 ms that varied approx-
filled). During processing the computers CPU load never rises imately bi-modally between 3 and 10 ms. The total latency
over 4% (Figure 6D). for actuating the motor (515 ms) is dominated by the vari-
In this system sensor-to-computer latency is dominated by able delay of PWM phase. The 183 Hz servo pulse frequency
the USB FIFO filling time. The vision sensor pixel latency is used in the robot has a period of 5.5 ms. A custom servo
inversely proportional to illumination (Lichtsteiner et al., 2007) which directly accepted USB commands could reduce servo
and is about 100 us at normal indoor office illumination levels latency to about 12 ms, the delay to send a single USB1.1
of 500 lux. A single ball that produces events at a peak rate of command.
100 keps causes a device-side 128-event USB packet about every
1 ms, although bursts of events can cause USB transfers that are CONCLUSION
received as often as every 128 us, the minimum USB2.0 high- The main achievement of this work is the concrete demonstra-
speed polling interval. Increased retina activity (caused, say, by tion of a spike-event driven hybrid of a neuromorphic-sensor
the arm movement) actually reduces this latency, but only because coupled to conventional procedural processing for low latency
the USB device FIFO buffers are filled more rapidly. We used host object tracking, sensory motor processing, and self-calibration.
side USB packet sizes of 256 events to match the maximum 500 Hz Secondary achievements are developments of robust and high
rate of writing commands to the servo motor, and the distribution speed event-based object tracking and velocity estimation algo-
of packet sizes reflects this (Figure 6C). rithms. This paper also reports practical observations on the use
To measure latency, an artificial stimulus consisting of a flash- of USB interfaces for sensors and actuators.
ing LED was set up so that it could be activated in bursts to The goalie robot can successfully block balls even when these
mimic an instantaneously appearing ball. The servo controller are low contrast white-on-gray objects and there are many back-
was programmed to toggle an output pin when it received ground distracters. Running with standard USB buses for vision




sensor input and servo-motor output under a standard preemp- Delbruck, T., and Lichtsteiner, P. (2007). Fast sensory motor control based on
tive multitasking operating system, this system achieves median event-based hybrid neuromorphic-procedural system, in ISCAS 2007 (New
Orleans, LA), 845848.
update rates of 550 Hz, with latencies of 2.2 ± 2 ms from ball
Delbruck, T., Linares-Barranco, B., Culurciello, E., and Posch, C. (2010). Activity-
movement to motor command at a peak CPU load of less driven, event-based vision sensors, Presented at the IEEE International
than 4%. Symposium on Circuits and Systems (Paris), 24262429.
A comparable system based on using a standard image sen- Domnguez-Morales, M., Jimenez-Fernandez, A., Paz-Vicente, R., Jimenez, G.,
sor would require a frame rate of at least 500 Hz. At the same and Linares-Barranco, A. (2012). Live demonstration: on the distance
estimation of moving targets with a stereo-vision AER system, in IEEE
spatial resolution (16 k pixels), a computer would need to con-
International Symposium on Circuits and Systems (ISCAS) 2012, (Seoul),
tinuously process 16 MBps of raw pixel information (with an 721725.
8-bit sensor output) to extract the basic visual information about Graetzel, C. G., Fry, S. N., and Nelson, B. J. (2006). A 6000 Hz computer vision
changing pixels. Although this computation is certainly possi- system for real-time wing beat analysis of Drosophila. Biorob 2006, 278284.
ble, the scaling to higher resolution is very unfavorable to the doi: 10.1109/BIOROB.2006.1639099
Lichtsteiner, P., Posch, C., and Delbruck, T. (2006). A 128128 120dB 30mW asyn-
frame-based approach. Increasing the resolution to VGA resolu- chronous vision sensor that responds to relative intensity change, in Visuals
tion (640 480) at 1 kHz, for instance, would require processing Supplement to ISSCC Digest of Technical Papers (San Francisco, CA), 508509
307 MBps, about 3 times the effective capacity of a high speed (27.9).
USB 2.0 interface and would allow only 3.3 ns per pixel of process- Lichtsteiner, P., Posch, C., and Delbruck, T. (2007). A 128128 120dB 15us latency
ing time. A VGA-sized DVS would generate about 18 times more asynchronous temporal contrast vision sensor. IEEE J. Solid State Circuits 43,
566576. doi: 10.1109/JSSC.2007.914337
data than the 128 128 sensor used for this paper if the objects Linares-Barranco, A., Gmez-Rodrguez, F., Jimnez, A., Delbruck, T., and
filled a proportionally larger number of pixels, but even then Lichtsteiner, P. (2007). Using FPGA for visuo-motor control with a sil-
the processing of the estimated 400 keps from the sensor would icon retina and a humanoid robot, in ISCAS 2007, (New Orleans, LA),
barely load a present-days microprocessor CPU load and would 11921195.
Litzenberger, M., Posch, C., Bauer, D., Schn, P., Kohn, B., Garn, H., et al. (2006).
be within the capabilities of modestly-powered embedded proces-
Embedded vision system for real-time object tracking using an asynchronous
sors. As demonstrated by this work and other implementations transient vision sensor, in IEEE Digital Signal Processing Workshop 2006 (Grand
(Linares-Barranco et al., 2007; Conradt et al., 2009a; Domnguez- Teton, WY), 173178.
Morales et al., 2012; Ni et al., 2013), the use of event-driven Ni, Z., Bolopion, A., Agnus, J., Benosman, R., and Regnier, S. (2013). Asynchronous
sensors can enable faster and lower-power robots of the future. event-based visual shape tracking for stable haptic feedback in microrobotics.
IEEE Transactions on Robotics 28, 10811089. doi: 10.1109/TRO.2012.2198930
Oaks, S., and Wong, H. (2004). Java Threads. OReilly.
ACKNOWLEDGMENTS Serrano-Gotarredona, R., Oster, M., Lichtsteiner, P., Linares-Barranco, A., Paz-
This work was supported by the University of Zurich and Vicente, R. Gomez-Rodriguez, F., et al. (2009). CAVIAR: A 45k Neuron,
5M Synapse, 12G Connects/s AER hardware sensoryprocessing learning
ETH Zurich, the Swiss NCCR Robotics, and the EU project
actuating system for high-speed visual object recognition and track-
SEEBETTER. The authors gratefully acknowledge the opportu- ing. IEEE Trans. Neural Netw. 20, 14171438. doi: 10.1109/TNN.2009.
nity to prototype this system at the Telluride Neuromorphic 2023653
Engineering Workshop. Wilson, A. (2007). Beyond camera link: looking forward to a new cam-
era/frame grabber interface standard. Vis. Syst. Design. 12, 7983. Available
online at: http://www.vision-systems.com/articles/print/volume-12/issue-
SUPPLEMENTARY MATERIAL 10/features/product-focus/beyond-camera-link.html
The Supplementary Material for this article can be found
online at: http://www.frontiersin.org/journal/10.3389/ Conflict of Interest Statement: The spinoff inilabs GmbH of the Inst. of
fnins.2013.00223/abstract Neuroinformatics is actively marketing dynamic vision sensor technology, selling
vision sensor prototypes, and supporting users of the technology. The authors
declare that the research was conducted in the absence of any commercial or
REFERENCES financial relationships that could be construed as a potential conflict of interest.
(2007). jAER Open Source Project: Real Time Sensory-Motor Processing for Event-
Based Sensors and Systems. Available online at: http://www.jaerproject.org Received: 07 October 2013; paper pending published: 02 November 2013; accepted: 05
Axelson, J. (2001). USB Complete. Madison, WI: Lakeview Research. November 2013; published online: 21 November 2013.
Cheng, Y. (1995). Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Citation: Delbruck T and Lang M (2013) Robotic goalie with 3 ms reaction time at
Analysis Mach. Intell. 17, 790799. doi: 10.1109/34.400568 4% CPU load using event-based dynamic vision sensor. Front. Neurosci. 7:223. doi:
Comaniciu, D., and Ramesh, V. (2000). Mean shift and optimal prediction for 10.3389/fnins.2013.00223
efficient object tracking, in IEEE International Conference on Image Processing This article was submitted to Neuromorphic Engineering, a section of the journal
(ICIP) (Vancouver, BC), 7073. Frontiers in Neuroscience.
Conradt, J., Berner, R., Cook, M., Delbruck, T. (2009a). An embedded AER Copyright 2013 Delbruck and Lang. This is an open-access article distributed under
dynamic vision sensor for low-latency pole balancing, in 5th IEEE Workshop the terms of the Creative Commons Attribution License (CC BY). The use, distribution
on Embedded Computer Vision (in Conjunction with ICCV 2009) (Kyoto), 16. or reproduction in other forums is permitted, provided the original author(s) or licen-
Conradt, J., Cook, M., Berner, R., Lichtsteiner, P., Douglas, R. J., and Delbruck, sor are credited and that the original publication in this journal is cited, in accordance
T. (2009b). Live demonstration: a pencil balancing robot using a pair of AER with accepted academic practice. No use, distribution or reproduction is permitted
dynamic vision sensors, in ISCAS 2009, (Taipei), 781785. which does not comply with these terms.



ORIGINAL RESEARCH ARTICLE
published: 13 December 2013
doi: 10.3389/fnins.2013.00234

Event-driven visual attention for the humanoid robot iCub


Francesco Rea 1*, Giorgio Metta 2 and Chiara Bartolozzi 2
1
Robotics, Brain and Cognitive Science, Istituto Italiano di Tecnologia, Genova, Italy
2
iCub Facility, Istituto Italiano di Tecnologia, Genova, Italy

Edited by: Fast reaction to sudden and potentially interesting stimuli is a crucial feature for safe and
Tobi Delbruck, University of Zurich reliable interaction with the environment. Here we present a biologically inspired attention
and ETH Zurich, Switzerland
system developed for the humanoid robot iCub. It is based on input from unconventional
Reviewed by:
event-driven vision sensors and an efficient computational method. The resulting system
Theodore Yu, Texas Instruments
Inc., USA shows low-latency and fast determination of the location of the focus of attention. The
Nabil Imam, Cornell University, USA performance is benchmarked against an instance of the state of the art in robotics artificial
*Correspondence: attention system used in robotics. Results show that the proposed system is two orders
Francesco Rea, Robotics, Brain and of magnitude faster than the benchmark in selecting a new stimulus to attend.
Cognitive Science, Istituto Italiano di
Tecnologia, via Morego 30, 16163 Keywords: visual attention, neuromorphic, humanoid robotics, event-driven, saliency map
Genova, Italy
e-mail: [email protected]

1. INTRODUCTION Selective attention is a key component of artificial sensory sys-


For successfully interacting with the environment in daily tasks, tems; in robotics, it is the basis for object segmentation (Qiaorong
it is crucial to quickly react to ubiquitous dynamic stimuli. et al., 2009), recognition (Miau et al., 2001; Walther et al., 2005)
However, reaction time of state-of-the-art robotic platforms is and tracking (Ouerhani et al., 2005), for scene understanding
limited by the low temporal resolution of sensory data acqui- and action selection for visual tracking and object manipulation.
sition and by the time required to process the corresponding It is also used in navigation, for self-localization and simulta-
sensory data. neous localization and mapping (SLAM) (Frintrop and Jensfelt,
In conventional robotic systems, sensory information is avail- 2008). Moreover, the implementation of biologically inspired
able in a sequence of snapshots taken at regular intervals. Highly models of attention is helpful in robots that interact with human
redundant data are received at fixed frame-rate. High dynamics beings. Engaging attention on similar objects can be the basis for
can be sensed only by increasing the sampling rate, at the cost a common understanding of the environment, of shared goals
of increasing the quantity of data that needs to be transmitted, and hence of successful cooperation. State-of-the art artificial
stored and processed. attention systems, based on traditional video acquisition, suffer
Additionally, the available bandwidth limits the amount of from the high computational load needed to process each frame.
information that can be transmitted, and the available comput- Extreme computational demand limits the speed of the selection
ing platforms limit the speed at which data can be processed, of new salient stimuli and therefore the dynamics of the attention
forcing a compromise between resolution and speed. As a result, scan path. Specific implementations of such models have been
current robotic systems are less efficient in reacting appropriately explicitly developed for real-time applications, exploiting either
to unexpected, dynamic events (Delbruck, 2008). For example, parallelization on several CPUs (Itti, 2002; Siagian et al., 2011)
in robotic soccer competitions (e.g., Robocup, 2011), the perfor- or dedicated hardware (Ouerhani and Hgli, 2003), or the opti-
mance strongly depends on the latencies in the perception loop, mization and simplification of the algorithms (Itti et al., 1998;
where the robot has to detect, track and predict the trajectory Frintrop et al., 2007) for the extraction of features from images,
of the ball, to plan where and when it should be catched. For or combination of them (Bumhwi et al., 2011).
the same reason, in manipulation tasks, unexpected failures of the An alternative approach is the implementation of simplified
grasping are difficult to correct online, resulting in the fall of the models of attention systems based on frame-less event-driven
object to be grasped. On the contrary, robotic systems equipped neuromorphic vision sensors, so far realized with the design of
with vision neuromorphic chips show remarkable performance in ad hoc dedicated hardware devices (Bartolozzi and Indiveri, 2009;
tracking (Serrano-Gotarredona et al., 2009), ball goalkeeping and Sonnleithner and Indiveri, 2012).
pencil balancing (Conradt et al., 2009). Along this line of research, we developed an event-driven,
Differently from main-stream state-of-the art vision systems attention system capable of selecting interesting regions of the
that repeatedly sample the visual input, event-driven vision visual input with a very short latency. The proposed system
(Camunas-Mesa et al., 2012; Wiesmann et al., 2012) samples exploits low latency, high temporal resolution and data compres-
changes in the visual input, being driven by the stimuli, rather sion given by event-driven dynamic vision sensors, as well as an
than by an external clock. As such, event-driven systems are inher- efficient strategy for the computation of the attention model that
ently more efficient because they acquire, transmit and perform directly uses the output spikes from the sensors. The proposed
computation only when and where a change in the input has implementation is therefore fully event-driven, exploiting the
been detected, removing redundancies at the lowest possible level. advantages offered by neuromorphic sensors at its maximum.




Intermediate hybrid approaches can be implemented by recon- General Address Event Processor (GAEP) (Hofstaetter et al.,
structing frames from the events and applying the vast collection 2010). The FPGA merges the data streams from left and right
of available standard machine vision algorithms. However, this camera sensors and interfaces them with the GAEP. The GAEP
approach would suffer from errors in the frame reconstruction provides effective data processing, protocol verification and accu-
due to drifts in the gray level calculation, it would increase the rate time-stamping of the events, with a temporal resolution of
latency of the response and loose the temporal resolution gained 160 ns. Processed events are connected to the rest of the sys-
by the use of event-driven sensors, hindering the full exploitation tem thanks to an USB connection to a PC104 embedded CPU.
of the neuromorphic approach advantages. The PC104 gathers the data and passes them to the processing
The output of the Event-Driven Visual Attention (EVA) system infrastructure of the iCub (Metta et al., 2006).
has been implemented for the humanoid robot iCub which will
therefore be able to quickly orient its gaze, scrutinize and act on 2.2. SOFTWARE
the selected region and react to unexpected, dynamical events. An application running on the embedded PC104 configures the
Additionally, it can be of generic interest for robotics systems with sensors in the preferred operating state. The same software mod-
fast actuation. ule reads the data through the USB port, checking for protocol
In the following, we will describe EVA, show the improved errors and formatting the stream of asynchronous events. Each
latency in the selection of salient stimulus and compare its address event (AE) is composed of the emitting pixel address and
performance with the well-known state-of-the art frame-based the corresponding time-stamp. The application sends the received
selective attention system from the iLab Neuromorphic Vision collection of events on the gigabit network where distributed pro-
C++ Toolkit (iNVT), 1 developed at the University of Southern cessing takes advantage from middleware YARP 2 library. From
California. this point, any process connected to the network can acquire data
and perform computation. There is no limit in the number of
2. METHODS nodes that can be recruited for processing events.
The selective attention system described in this work has been Finally, specific classes are used to efficiently transmit and
developed on the iCub humanoid robot (www.icub.org) and is (un)mask the AER stream into a dedicated format. The AE format
entirely based on the input from non-standard sensors. Such consists in: address event, polarity, timestamp and type. The struc-
sensors use a new way of encoding information based on a ture transparently manages events from the DVS sensor, as well as
custom asynchronous communication protocol Address Event generic events such as complex objects deriving from clustering
Representation (AER). In the following we shortly describe the and feature extraction (Wiesmann et al., 2012).
custom hardware and software modules developed for the atten- Buffers of asynchronous data are handled with a two-threads
tion system implementation. method. N-buffering is used to guarantee concurrent access to
data in process, thus avoiding conflicts and allowing each mod-
2.1. HARDWARE ule to run at the desired rate irrespective of the incoming flow of
The robot is equipped with two asynchronous bio-inspired event. Examples of developed modules are used to: display DVS
Dynamic Vision Sensors (DVS) (Lichtsteiner et al., 2008). It fea- activity, generate feature maps, perform weighted combination of
tures three degrees of freedom in the eyes to realize the tilt, multiple feature maps.
vergence and version movements required for the implementa-
tion of active vision. As opposed to the traditional frame-based 2.3. EVENT DRIVEN VISUAL ATTENTIONEVA
approach, in the DVS each pixel responds to local variations EVA is an event-driven reduced implementation of the saliency-
of contrast. It emits an asynchronous digital pulse (spike or map based attention model proposed by Koch and Ullman (1985)
event) when the change of the logarithm of light intensity and Itti and Koch (2001). In this foundational work, the authors
exceeds a pre-defined threshold. This bio-inspired sensory trans- propose a biophysically plausible model of bottomup attention
duction method is inherently efficient, as it discards redundancies where multiple feature maps concur to form a unique saliency
at the lowest level, reducing the data acquisition, transfer, storage map used to compute the location of the focus of attention. Each
and processing needs. This technique preserves the high dynamic feature map encodes for a characteristic of the visual input such as
content of the visual scene with a temporal granularity of few color opponency, orientation, contrast, flicker, motion, etc. com-
hundreds of nanoseconds. puted at different spatial scales. These maps are then normalized
The visual system is entirely based on the AER protocol (Deiss and summed together to form the final saliency map. The saliency
et al., 1998). The sensors asynchronously send digital spikes or map topologically encodes for local scene conspicuity, irrespec-
events that signal a relative contrast change in the pixel. The tive of the feature dimension that has contributed to its salience.
address transmitted with the event corresponds to the identity of That is, an active location in the saliency map encodes the fact
the active pixel. Information is self encoded in the timings of the that this location is salient, no matter whether it corresponds to a
spikes. 45 oriented object in a field of prevalent orientation of 90 , or to
A dedicated printed circuit board located in the head of a stimulus moving in a static background. Eventually, a winner-
the robot hosts a Field Programmable Gate Array (FPGA) and take-all (WTA) network selects regions in the map in order of
an embedded processor specialized for asynchronous data, the decreasing saliency, and guides the deployment of the focus of

1 http://ilab.usc.edu/toolkit/home.shtml 2 Yet Another Robotic platform.

Frontiers in Neuroscience | Neuromorphic Engineering December 2013 | Volume 7 | Article 234 | 24


Rea et al. Event-driven attention

attention and gaze. In EVA, events from the visual sensor con- DOG (Difference of Gaussians) and Gabor filters, respectively.
cur to generate a number of feature maps. Figure 1 shows the On the contrary, EVA uses a much simpler and efficient imple-
model and the distribution of the diverse computing modules mentation: the mapping. In the mapping, a RF is defined as a
on the hardware platform described in paragraph 2.1. Once col- look-up table. The level of activation of the RF increases when
lected by the dedicated hardware, the sensors events are sent to it receives ON-spikes from the DVS pixel located in the ON-
software modules that extract diverse visual features. The corre- region of the RF and OFF-spikes in the OFF-region. If the neuron
sponding feature maps are then normalized and summed. The does not receive any spike over time, the activation decreases.
resulting saliency map is then transmitted to a WTA network that When the neuron activation crosses a threshold, it generates a
generates the attentional shifts. new event in the corresponding location of the feature map.
Figures 2A,B show two center-surround RFs. Each RF has a
2.3.1. Feature extraction defined location in the visual space and a specific size. The algo-
In EVA a number of features are extracted from the DVS output rithms below explain the procedure that generates the response
to populate diverse feature maps. As the DVS does not convey of the RF.
information about color or absolute intensity, we implemented The visual field is covered with RFs following a multiscale
a subset of feature maps from Itti and Koch (2001): contrast, approach. In the current implementation, we use two different
orientation (0 , 45 , 90 , 45 ) and flicker map. Specifically, scales with 4 4 and 8 8 pixels receptive fields. Figures 2C,D
the flicker map encodes for the scene temporal changes and show the RFs of oriented cells with different sizes. Sub-regions
in EVA it is implemented by directly using the sensors contribute to facilitation for aligned RFs: spikes from the visual
output. Contrast and orientation feature maps are generated by field contributing to the activation of the RF at the border of the
the output of filters inspired by receptive fields of center-surround elongated central region (in green) contribute to the activation of
retinal ganglion cells and simple cells of primary visual cor- neighboring RFs aligned along the same orientation. This feature
tex (Hubel and Wiesel, 1962; Movshon et al., 1978; De Valois enhances the representation of long oriented edges (Ferster and
et al., 1982; Kandel et al., 2000), respectively. Receptive field Koch, 1987) by reinforcing the activity of RFs responding to the
activation is usually obtained by convolving the image with same oriented edge.

FIGURE 1 | Structure of EVA and its implementation on the diverse HW AexGrabber module. From then on, any SW module can acquire the events
modules: the DVS cameras send asynchronous events, the FPGA buffer and perform a specific computation (feature extraction, normalization,
merges left and right DVS events, the GAEP assigns a timestamp to linear weighted sum, and WTA selection). A dedicated connection between
each event. The resulting list of addresses and timestamps is sent to the the feature extracting modules and the Event-Selective-Attention module
PC104 that makes them available to the YARP network through the avoids interferences between feature maps and other trains of events.

www.frontiersin.org December 2013 | Volume 7 | Article 234 | 25


Rea et al. Event-driven attention

FIGURE 2 | Receptive fields of cells used for the mapping 8 8 cell. ON- and OFF-regions in orange and white, respectively.
procedure: center-surround cells. (A) Four 4 4 cells, (B) one In green, the pixels that contribute to the activity of neighboring
8 8 cell. Simple oriented cells: (C) two 4 4 cells, (D) one cells.

Data: RF: receptive field; r : event


Result: update of the activation of RF
if RF.type == ON then
if r.polarity == ON then
Data: bRE: buffer of retina events, featMap: mapping related if r RF.center then
to the feature map RF.response := RF.response + c;
Result: bFE: buffer of feature-maps events end
c = constant; else
foreach event bRE do RF.response := RF.response c;
mapEvent = map(event, featMap); end
RF = belong(mapEvent); end
updateActivation(RF); else
if affectNeighbor(mapEvent) then if r RF.center then
RFNeighbor = lateralConnection(mapEvent); RF.response := RF.response c;
updateActivation(RFNeighbor); end
if RFNeighbor.activation > positiveThreshold then else
featureEvent = RF.response := RF.response + c;
end generateFeatureEvent(RFNeighbor); end
featureEvent.polarity = positive; end
end end
if RF.activation < negativeThreshold then else
featureEvent = generateFeatureEvent(RF); if r.polarity == ON then
featureEvent.polarity = negative; if r RF.center then
end RF.response := RF.response + c;
end end
if RF.activation > positiveThreshold then else
featureEvent = generateFeatureEvent(RF); RF.response := RF.response c;
featureEvent.polarity = positive; end
end end
if RF.activation < negativeThreshold then else
featureEvent = generateFeatureEvent(RF); if r RF.center then
featureEvent.polarity = negative; RF.response := RF.response c;
end end
else
RF.response := RF.response + c;
end
end
end

Frontiers in Neuroscience | Neuromorphic Engineering December 2013 | Volume 7 | Article 234 | 26


Rea et al. Event-driven attention

The mapping is less computationally demanding than a tra- to the oculomotor controllers to direct the robots gaze toward
ditional convolution operation. Additionally, with this approach, salient regions with a saccade command. It continuously updates
the feature maps are only updated at the arrival of a new spike, the saliency map and the resulting focus of attention location,
without calculation of the RF activation for the complete image allowing for fast reaction to unexpected, dynamic events and for
at each time step. To further reduce the computational load, a more natural interaction of robots with the environment.
we implemented non-overlapping receptive fields, at the cost of The improvement in computation latency is obtained thanks
reducing the output resolution. However, in EVA the final goal to many factors, among which the asynchronous low-latency and
is to obtain short latency in relation to saliency map resolution low-redundancy input, efficient sensory encoding, and efficient
that guarantees reliable gaze shift. As a result the selected region computing strategy (the mapping).
is focused in the sensors fovea for detailed inspection. To assess its performances and validate our results, we tested
EVA in three different experimental setups. Unfortunately, a
2.3.2. Saliency map and attention selection direct quantitative comparison of the performance of EVA with
The final saliency map is obtained through weighted linear com- literature state-of-the art artificial bottomup attention systems
bination of the computed contrast (I), orientation (O) feature cannot be performed as each has its own characteristics in terms
maps and flicker feature map (F): of feature maps, methods for feature map calculation, hardware,
software implementation, and stimuli (Borji and Itti, 2012). For
S = Norm (kI I + kO O + kF F) (1) this reason, we rather preferred to benchmark our implemen-
tation against the state-of-the art main-stream system based on
The weights kI , kO , and kF can be changed in real-time to the Itti and Koch (2001) model: the iLab Neuromorphic Vision
bias saliency computation toward behaviorally relevant features, Toolkit (iNVT) (Itti et al., 1998, 2003; Navalpakkam and Itti,
implementing a task-dependent bias (Itti and Koch, 2001). 2005) sharing the same number and type of feature maps, hard-
Finally, a WTA module selects the most conspicuous location of ware platform and stimuli. The iNVT algorithm is based on
the saliency map, defining the current focus of attention. Feature traditional frame-based cameras and convolution operation for
extraction can be performed in parallel by multiple modules, the calculation of the feature maps.
however, the normalization and sum of feature maps into the The two systems are at the two opposite extremes, one is fully
saliency map is sequential and requires time. The data-driven sys- event-driven, the other fully frame-based. Other intermediate
tem further improves the speed of computation, as the saliency solutions might be implemented, where the output of the DVS
map is updated only with the last train of events, avoiding a is first translated into frames by integrating spikes over time, then
complete generation of the entire map. iNVT is used on the resulting sensory output. However, the nec-
In iNVT, as well as in most of saliency map based selective essary transition from event-driven to frame-based information
attention models, the currently selected location is deselected coding spoils some of the advantages of event-driven acquisition,
thanks to a self-inhibition mechanism, known as Inhibition of such as temporal resolution and low latency and brings addi-
Return (IOR). This mechanism prevents the system from imme- tional costs and relevant overhead in the computation. It is worth
diately re-select the current winner, and allows for a scan of many to further detail at which extent the performance improvement
points of the saliency map in order of decreasing conspicuity. inherits from the use of DVS sensor as compared to the use of
However, in our setup neither EVA nor iNVT implement IOR, event-based algorithm implementation. As shown in the sum-
rather, the shifts of the focus of attention are determined by mary table 2, the latency of EVA amounts to 23 us, of which 15us
intrinsic noise in the system. can be attributed to the characteristic latency of the DVS sensor
2.3.3. Ocular movements
(Lichtsteiner et al., 2008) and the remaining 8 s as result of the
event-based algorithm. On the contrary, in frame-based scenario,
A dedicated module implements saccades or gaze shifts toward
the latency is affected by both the acquisition time (for 30 fps
salient regions selected by EVA. Tremor and microsaccades are
acquisition the acquisition time interval is 33 ms) and the frame-
used to generate motion of static visual stimuli on the DVS sen-
based algorithm for the image processing which we measured in
sor focal plane, to elicit activity of the pixels that only respond to
23 ms. The performance of such systems would be in terms of
stimulus changes. This approach is similar to the mammals visual
qualitative performance and computational cost in between the
system, where small eye movements counteract photoreceptors
two extremes that are analyzed in the following.
bleaching adaptation (Kowler, 2011). Tremor is implemented as
The two systems are implemented on the iCub robot using
an omnidirectional movement of 0.45 amplitude with frequency
respectively the DVS and the standard robots Dragonfly cam-
of 500 Hz and random direction, superimposed on microsaccades
eras. They simultaneously run on two identical machines3; both of
of amplitude 0.75 and frequency 2.5 Hz in exclusively horizontal
them distribute the processing over the four available CPU cores.
direction.
To correctly compare the two systems, we implemented the same
3. PERFORMANCE AND BENCHMARK type and number of feature maps in both, restricting the numer-
The absolute novelty of EVA is in the short latency of the atten- ous feature maps of iNVT to intensity, orientation and flicker4. In
tional shifts that guarantees fast reaction times. The proposed
processing of the attention system generates short latency that 3 Intel Core 2 Quad Cpu Q9950 @2.83GHz
hardly compares with the performance of frame-based attention 4 ezvisionin=raster:*.ppm display=display -T -j 4 input-frames=@30Hz
systems. The selected attended location can be communicated textlog=iNVTLog.log vc-chans=IOF ior-type=None use-random.

www.frontiersin.org December 2013 | Volume 7 | Article 234 | 27


Rea et al. Event-driven attention

order to remove any overhead to the computation time, the iNVT events associated to feature maps from the moment a new stim-
program processes a batch of camera images. ulus arrives. The latter represents the time interval to process the
The stimulus is placed at a distance d in front of the robot and generated trains of events and determine attentional shift. In both
centered in the fovea of both the Dragonfly and DVS cameras, measures, a sequence of events is needed to alter the output of the
such that it is completely visible and the quantity of received light module. The frequency of redeployment of the focus of attention
is comparable for both sensors. The sensors have been configured depends on the time needed to acquire enough visual data and
with typical parameters (see Table 1) and have not been specif- the time required to extract features, compute saliency and per-
ically tuned for the experiments, in order to assess the systems form the winner-take-all selection. On the contrary, for iNVT we
performance in typical use cases. present a single frame and we measure the time interval necessary
For each experiment we report the diffuse scene light measure5 for the system to process the camera image.
since the performance of both sensors and, consequently, of the CPU utilization and data rate give an accurate measure of
two attention systems depend on the illumination level. the computation demand of both implementations. To obtain
For all of the validation setups we report the focus of an unbiased measure, we normalized by the number of atten-
attentions scan path generated by the two systems, giving an tional shifts and report the computational load per shift. The
immediate qualitative evaluation of the computation time. For benchmark comprises three test experiments. The first uses typi-
a quantitative assessment the benchmark comprises a set of cal stimuli for visual experiments, such as oriented gratings, and
predefined measurements: is run under two different illumination conditions. The second
shows the performance of the EVA system with a fast unpre-
Number of shifts of the focus of attention over time FEVA and dictable stimulus such as a chaotic pendulum. The third indicates
FiNVT and the correspondent time interval between consecu- how performance changes with the increase of the information to
tive shifts tEVA and tiNVT process.
CPU utilization UEVA and UiNVT 6
Data rate DEVA and DiNVT 3.1. FIRST EXPERIMENT, GRATINGS WITH DIFFERENT ORIENTATIONS
Latency time interval LEVA and LiNVT Figure 3A shows the stimulus used in the first characterization
setup: two horizontal and two vertical gratings of 4 4 cm with
The time interval between two consecutive shifts in the selec- a gaussian profile, each positioned at the distance d = 20 cm
tive attention is a good measure of the frequency of attentional from the camera. In this scenario the stimuli are static and the
redeployments. The latency measure gives an estimate of the DVS output is generated with the use of microsaccades (see
minimum reaction time to the new stimuli. section 2.3.3).
We measure the latency in both systems as the time interval
3.1.1. Case A, bright illumination
from the instant a novel stimulus is presented to a complete pro-
cessing of the visual input. In EVA, the latency interval comprises The focus of attention locations selected by EVA and iNVT and
the time interval for feature extraction and WTA selection. The their hit frequency are shown in Figures 3C,B, respectively. Both
former represents the time necessary to generate a new flow of systems select conspicuous locations corresponding to the ori-
ented gratings, with slightly different patterns. As we disabled
inhibition of return, the specific focus of attention scan-path
depends on the computed saliency and on the noise present in the
Table 1 | Setup parameters of DVS and Dragonfly sensors.
system. Small differences in stimulus illumination and noise pat-
Parameter dragonfly Value Bias DVS Value (A) tern can contribute to slightly different computed saliency for the
same grating placed in different regions; the missing inhibition
Width 320 (pixel) cas 0.094 of selected areas over a long period of time leads to the selection
Height 640 (pixel) injg 0.0182 of fewer stimuli with very similar salience, as shown in Figure 3B.
Shutter 0.913 reqPd 3.0 Two of the oriented gratings are not selected by iNVT, despite they
Gain 0.312 pux 1.4401 should have had exactly the same salience. In this scenario, EVA
White balance A 0.506 diffoff 2.378e5 is capable of selecting more stimuli, reducing the latency, prob-
White balance B 0.494 req 0.0287 ably thanks to the different pattern of noise, that is intrinsically
Sharpness 0.5 refr 1.688e4 generated by the hardware.
Hue 0.48 puy 3.0 In EVA, the data rate depends on the lighting condition and
Gamma 0.4 diffon 0.1143 on the stimulus, under these conditions it is about 7 kAE/s.
Saturation 0.271 diff 0.0054 Conversely, the data rate produced by a traditional color cam-
Framerate 30 (fps) foll 3.576e6 era only depends on intrinsic parameters of the systems such as
pr 1.431e6 number of pixels, color resolution and frame rate, being indepen-
dent from the stimulus; for the Dragonfly used on the iCub this
amounts to 530 Mbits/s. The lower amount of data corresponds
5 Measured by portable hand-held exposure meter Gossen Lunasix F. to lower processing demand and, hence, in a faster computation
6 Measurements performed with SAR, a program that directly measures the of the focus of attention location. Consequently this results in
computational load on the processor over a user-defined time interval. higher shifts frequency generated by EVA with respect to iNVT,

Frontiers in Neuroscience | Neuromorphic Engineering December 2013 | Volume 7 | Article 234 | 28


Rea et al. Event-driven attention

FIGURE 3 | First scenario. Case A: comparison of the shifts the use of artificial light). The (x,y) coordinates of the two attention
generated by iNVT (B) and EVA (C) in response to four oriented systems correspond to the image coordinates of the sensors
gratings (A) under bright background illumination (55.6 LUXindoor (240 320 for the Dragonfly and 128 128 for the DVS). (D) Mean
illumination of a diffuse bright natural light). Case B: comparison of and standard deviation of CPU system percentage of utilization
the shifts generated by iNVT (E) and EVA (F) under dim background (green), temporal distance between consecutive shifts (blue) over 10
illumination (2.7 LUXdim illumination that typically would require repetitions of 10 trials in both illumination conditions.

Table 2 | Quantitative benchmark: f : frequency of attentional shifts [shifts/s], L : Latency time interval [s], U : normalized cpu utilization [%] ,
D : data rate of input [Kbit/s], : duration of the acquisition[s].

Experiment 1 Experiment 2

Bright (55.6 LUX) Dim (2.7 LUX) 27.2 LUX

iNVT EVA iNVT EVA EVA

hor. top 60.85% 33.53% 100% 0% .


hor. bot 0% 33.19% 0% 0% .
ver. top 39.15% 15.06% 0% 0% .
ver. bot 0% 18.21% 0% 100% .
f (shifts/s) 1.89 158.2 18.08 3.72 1708.80
L (s) (5.60 0.3)e2 (23.2 3)e4 (5.56 0.3)e2 (23.1 3)e4 (3.72 1)e3
U (%) 6.79 0.2 8.4 1.3 0.2
D (Kbit/s) 530e3 226.72 0.078 530e3 20.32 1.2 2.1e3
t (s) 100 100 100 100 2.68

First experiment: number of hits clustered on the horizontally (top and bottom) and vertically (top and bottom) oriented grating stimuli under bright and dim
illumination. Second experiment: performance of the EVA in details.

and higher number of attention relocation, as shown in Table 2. the frame acquisition frequency (30 ms) and the time between
EVA, because of the temporal resolution of the visual signal and two attentional shifts amounts to 50 ms. The latencies of the
the low computational demand, can generate a shift of attention two systems differ of two orders of magnitude, showing the high
approximately every 1.5 ms, on the contrary, iNVT is limited by responsiveness of EVA to sudden and potentially relevant stimuli.

www.frontiersin.org December 2013 | Volume 7 | Article 234 | 29


Rea et al. Event-driven attention

Figure 3 shows that the computation demand of EVA is lower Figure 3 shows the aggregated performance measures for
than iNVT of at least one order of magnitude, as expected from case A and B for both EVA and iNVT; Despite the latency
the lower Data Rate and the different computational load of the of both systems remains largely unchanged, EVA outperforms
mapping procedure. This different performance is also reflected iNVT, while the normalized computational load increases, with
in the shift latency that amounts to about 300 ns for EVA and different slopes. In case B (low illumination), iNVT abso-
0.4 ms for iNVT. lute CPU usage remains unchanged but it is normalized for
a lower number of shifts; in EVA both the number of shifts
3.1.2. Case B, dim illumination and the CPU utilization decrease, as a result of a lower
One of the advantages of using the logarithmic encoding in input data rate, as shown in Table 2. The resulting nor-
the DVS is the wider dynamic range with respect to traditional malized computation load increases less than what observed
sensors. Thus, we tested the attention systems in the scenario for iNVT.
described above, but with reduced ambient light. The result-
ing focus of attention scan path is shown in Figures 3E,F. The 3.2. SECOND EXPERIMENT: CHAOTIC PENDULUM
selection of the top horizontally oriented grating in the iNVT sys- We used a chaotic pendulum to test EVA with fast unpredictable
tem and the selection of the bottom vertical oriented grating in stimuli. The chaotic pendulum shown in Figure 4A is composed
EVA are the results of a strong decrease of the response strength. of two black bars (22 and 18 cm) connected by a low friction
The lower illumination affects both systems by drastically reduc- joint and attached to a fixed support via a second low friction
ing the number of shifts. A way to improve this behavior in joint. In this configuration, the first bar can freely rotate with
EVA would be the implementation of adaptive firing threshold, respect to the support and its movement is influenced by the sec-
that can be dynamically set according to the level of background ond bar that revolves independently around the first joint. The
illumination (Delbruck and Oberhof, 2004). pendulum is mounted over a white background and we used an

FIGURE 4 | Second scenario. (A) The chaotic pendulum is located 50 on the head with fixed supports. (B) Raster representation of the
cm far from the robot to keep the whole stimulus in the cameras field activation of pixels in the DVS and relative location of the WTA
of view. In the setup, DVS sensors are embedded in the iCubs eyes selected by EVA. (C) Events generated by the chaotic pendulum, (D)
to exploit ocular movements, while the Dragonfly cameras are mounted focus of attention scan-path generated by EVA.

Frontiers in Neuroscience | Neuromorphic Engineering December 2013 | Volume 7 | Article 234 | 30


Rea et al. Event-driven attention

average lighting condition of 27.6LUXcorresponding to diffuse


illumination where artificial light is not required.
The stimulus is so fast that neither the Dragonfly, nor the
human eye, can successfully perceive its full motion. In this
scenario, iNVT hardly relocates the focus of interest on the pen-
dulum without introducing an evident delay. In iNVT, such shift
is clearly shown in the video provided in Technical Materials. 7
Conversely, we accurately assess the performance of EVA as a
viable to technological solution for fast dynamic stimuli in a wide
range of operating conditions.
The fast movement of the pendulum generates a higher data
rate with respect to the previous scenario. The resulting perfor-
mance parameters are listed in Table 2.
To estimate the quality of the attention system, in Figure 4
we compare the trajectory generated by the pendulum with the
location of the attention shifts over time. To achieve this, we syn-
chronized the generation of attention shifts with the batch data FIGURE 5 | Expected evolution of the normalized computation
of the generated events, using the temporal information stored in demand for increasing the sensors output by increasing the sensors
the timestamp. size, in green iNVT and in red EVA. In the inset the measured data for
EVA, for increasing number of events, generated by increasing number of
3.3. THIRD EXPERIMENT: PERFORMANCE SCALING WITH QUANTITY black rotating edges repeated, respectively, at 360 , 180 , 90 , 45 , and
22.5 , in red the fit from which we extrapolate the computational demand
OF INFORMATION
for bigger sensors.
In this scenario, we assess how the performance of EVA scales
with increasing number of events. In EVA, the number of events
can increase for cluttered scenes and for higher resolution sen-
sors, increasing the computational demand of the system. This correspondence of quantity of information generated by different
happens also for iNVT, when higher resolution sensors are used. fixed size sensors (128 128, 320 240, 640 480).
We estimate how the computation demand expressed in CPU uti- To estimate these numbers, we select speed of rotating bar that
lization scales with the processed information (number of bits). covers typical use (4.2284 rad/s). Even though in normal scene
In order to perform such analysis we determined how the com- operation the DVS activation is about 30%, in this test, we con-
putation demand of the two systems change when the quantity sider the worst case scenario where all the pixels in the sensor
of information scales. For EVA, we control the number of gener- show the maximum level of activation (27 events per pixel).
ated events (and then quantity of information) by increasing the Thus, we estimate the maximum computation load associ-
number of black edges printed on a white disk rotating at con- ated to sensors that have dimension 128 128, like the DVS,
stant speed. We use five different configurations, where the edge 320 240, like the Dragonfly used for iNVT and 640 480. As
is repeated every 360 , 180 , 90 , 45 , and 22.5 . Figure 5 shows the processing required by EVA sets well below iNVT, we con-
the normalized CPU utilization measured in this experiment (red clude that for any possible situation, the required processing of
dots in the inset); we do a linear fit of the normalized CPU uti- EVA results less impacting on the performance than iNVT.
lization in relation with the increasing quantity of information This confirms the assumption that relevant saving in compu-
processed. We use this function as reference to estimate the com- tation demand is associated to the design of the processing in EVA
putation demand of EVA at arbitrary number of generated events. and it is not limited to the hardware of the system.
Similarly, for iNVT, we provide different sets of images. The sets
differ only for the dimension of the images, the stimulus is the 4. DISCUSSION
same of Figure 4A. The computation demand increases with the In this manuscript we described EVA, a real-time implemen-
quantity of processed information (green dots in Figure 5) and tation of selective attention based on the bio-inspired model
we fit a first order curve that best describes the distribution. proposed in the foundational work of Itti and Koch (2001), that
Figure 5 shows that the required level of processing for EVA uses a frame-less asynchronous vision sensor as input. The goal
(in red) never exceeds the required level of processing of iNVT of the implementation was to offer a flexible and light com-
(in green) as it increases with a more gradual slope. Figure 5 putational paradigm for the computation of the feature maps,
shows that the required level of processing for EVA (in red) never exploiting the mapping mechanism. The overall performance of
exceeds the required level of processing of iNVT (in green) as it the developed system takes advantage of the efficient informa-
increases with a more gradual slope. This observed divergence tion encoding operated of the sensor, its high dynamic range,
indicates the increasingly better performance of EVA, as more low response latency and high temporal resolution. The use of
processing is required. Key points (arrows in figure) help identi- non-conventional sensors coupled with an efficient processing
fying the estimated computation demand for both the systems in results in a unprecedented fast attentional selection. We report the
behavior of the system in three different experiments that high-
7 http://youtu.be/Nqd3uRbjXHE light performance in detail. The three experiments give insights

www.frontiersin.org December 2013 | Volume 7 | Article 234 | 31


Rea et al. Event-driven attention

on the three major benefits of EVA: low computation demand, chips (Serrano-Gotarredona et al., 2008) and using the SAC (or
high responsiveness to high frequency stimuli and favorable higher resolution implementations) for WTA and IOR. Both
scalability. The attention system EVA requires lower computation implementations would probably be faster than the software
utilization up to one order of magnitude with respect to iNVT mapping procedure described in this manuscript, for example,
when stimulated by identical stimulus. This positive characteris- the ConvNet chip can start providing the output of oriented filters
tic does not degrade the quality of the generated attention shifts. with a 1 s latency and is shown to perform pseudo-simultaneous
The characteristic of low response latency and high temporal res- object recognition. This system, with the appropriate miniatur-
olution resulting from the efficient design of the attention system ization and integration with topdown modules implemented on
EVA allow remarkable performance in highly dynamic scenarios. the robot, will be able to give a fast estimate of the focus of atten-
The attention system accurately and swiftly redeploys the atten- tion, leaving the computational units of the iCub free for other
tional foci on the most salient regions in the visual field even in more complex tasks.
situations where frame-based algorithms of visual attention fail
in obtaining clear interpretation of the stimulus. The second sce- ACKNOWLEDGMENTS
nario shows that the high temporal resolution allows the attention This work has been inspired by fruitful discussions at the
system to track very fast stimuli, expanding the application range Capocaccia Cognitive Neuromorphic Engineering Workshop.
of the system from humanoid robotics to even more demanding The present work benefited form the great support of Giacomo
use cases. We presented a solution that, by sensing via efficient Indiveri who provided valuable comments and ideas. The authors
event-driven hardware sensor, provides outperforming selective would like to thank iLab and Prof. L. Itti for making the iNVT
attention mechanism with low latency and high dynamic range. toolkit freely available.
In addition, for increasing the information load, e.g., for higher
resolution sensors, EVAs CPU utilization increases with lower rate FUNDING
than iNVTs. The design of efficient processing in EVA guaran- This work has been supported by the EU grant eMorph (ICT-
tees, when compared with iNVT, relative superior performance of FET-231467).
growing effectiveness with the amount of processed information.
SUPPLEMENTARY MATERIAL
Finally, the last benchmark shows that the computational advan-
The Supplementary Material for this article can be found online
tage of EVA is not restricted to the specific stimuli and sensor
at: http://www.frontiersin.org/journal/10.3389/fnins.2013.
dimension used in this experimental setup, rather is more general.
00234/abstract
Most attention systems designed for real-time applications
report the computational cost in terms of time needed to pro-
cess a frame. The relative saliency map is often obtained in about REFERENCES
Bartolozzi, C., and Indiveri, G. (2009). Selective attention in multi-chip address-
5060 ms, slower than typical image frame-rate (30 frames per event systems. Sensors 9, 50765098. doi: 10.3390/s90705076
second) (Frintrop et al., 2007). This time scale is appropriate Bartolozzi, C., Rea, F., Clercq, C., Hofsttter, M., Fasnacht, D., Indiveri,
to reproduce typical attentional scan-paths, nevertheless, 50 ms G., et al. (2011). Embedded neuromorphic vision for humanoid robots,
(plus 30 ms for frame acquisition) is the lower bound for reacting in IEEE Computer Society Conference on Computer Vision and Pattern
to the onset of a new potentially interesting or threatening stimu- Recognition Workshops (CVPRW) (Colorado Springs, CO), 129135. doi:
10.1109/CVPRW.2011.5981834
lus. With EVA, this limit is estimated to be as small as about 1 ms Borji, A., and Itti, L. (2012). State-of-the-art in visual attention modeling. IEEE
[plus 15 s of sensor latency (Lichtsteiner et al., 2008)], thanks to Trans. Pattern Anal. Mach. Intell. 35, 185207. doi: 10.1109/TPAMI.2012.89
the low-latency event-driven front-end data acquisition and the Bumhwi, K., Hirotsugu, O., Tetsuya, Y., and Minho, L. (2011). Implementation
low computational cost of the attention system. This property is of visual attention system using artificial retina chip and bottom-up saliency
map model, in Neural Information Processing. Volume 7064 of Lecture Notes
crucial in robotics systems, as it allows the robot to plan actions
in Computer Science, eds B.-L. Lu, L. Zhang, and J. Kwok (Berlin; Heidelberg:
for reacting to unforeseen situations and sudden stimuli. Springer), 416423. doi: 10.1007/978-3-642-24965-5_47
EVA has been developed to equip the iCub with a fast and low Camunas-Mesa, L., Zamarreno-Ramos, C., Linares-Barranco, A., Acosta-Jimenez,
weight attention system, exploiting the event-driven vision sys- A., Serrano-Gotarredona, T., and Linares-Barranco, B. (2012). An event-driven
tem of the iCub (Bartolozzi et al., 2011). The mapping procedure multi-kernel convolution processor module for event-driven vision sensors.
IEEE J. Solid State Circ. 47, 504517. doi: 10.1109/JSSC.2011.2167409
for events filtering and feature maps generation derives from AER
Conradt, J., Cook, M., Berner, R., Lichtsteiner, P., Douglas, R., and Delbruck, T.
implementations (Bartolozzi and Indiveri, 2009; Sonnleithner (2009). A pencil balancing robot using a pair of AER dynamic vision sensors,
and Indiveri, 2012), where a simple mapping is realized to use in International Symposium on Circuits and Systems, (ISCAS), 2009 (Taipei:
the sensor output as feature map. The resulting saliency map IEEE), 781784. doi: 10.1109/ISCAS.2009.5117867
is sent to a dedicated hardware platform that implements the De Valois, R. L., Albrecht, D. G., and Thorell, L. G. (1982). Spatial frequency selec-
tivity of cells in macaque visual cortex. Vis. Res. 22, 545559. doi: 10.1016/0042-
winner-takes-all selection enriched with dedicated inhibition of 6989(82)90112-2
return mechanism (the Selective Attention Chip, SAC). The Deiss, S., Douglas, R., and Whatley, A. (1998). A pulse-coded communications
modules developed in this work and EVA can easily be inte- infrastructure for neuromorphic systems, in Pulsed Neural Networks, chapter 6,
grated with such a system and further optimized. For example, eds W. Maass and C. Bishop (Cambridge, MA: MIT Press), 157178.
maximization of the performance can be achieved by imple- Delbruck, T. (2008). Frame-free dynamic digital vision, in Proceedings of the
International Symposium on Secure-Life Electronics, Advanced Electronics for
menting the mapping procedure and the relative feature maps Quality Life and Society (Tokyo, Japan), 2126. doi: 10.5167/uzh-17620
on an embedded FPGA (Bartolozzi et al., 2011; Fasnacht and Delbruck, T., and Oberhof, D. (2004). Self biased low power adaptive photorecep-
Indiveri, 2011) or implementing fast convolution on ConvNet tor. Intl. Symp. Circ. Syst. 4, 844847. doi: 10.1109/ISCAS.2004.1329136

Frontiers in Neuroscience | Neuromorphic Engineering December 2013 | Volume 7 | Article 234 | 32


Rea et al. Event-driven attention

Fasnacht, D., and Indiveri, G. (2011). A PCI based high-fanout AER mapper Ouerhani, N., and Hgli, H. (2003). Real-time visual attention on a massively
with 2 GiB RAM look-up table, 0.8 s latency and 66 mhz output event-rate, parallel simd architecture. Real Time Imag. 9, 189196. doi: 10.1016/S1077-
in Conference on Information Sciences and Systems, CISS 2011 (Johns Hopkins 2014(03)00036-6
University), 16. doi: 10.1109/CISS.2011.5766102 Qiaorong, Z., Guochang, G., and Huimin, X. (2009). Image segmentation
Ferster, D., and Koch, C. (1987). Neuronal connections underlying orientation based on visual attention mechanism. J. Multimedia 4, 363370. doi:
selectivity in cat visual cortex. Trends Neurosci. 10, 487492. doi: 10.1016/0166- 10.4304/jmm.4.6.363-370
2236(87)90126-3 Robocup, T. (2011). RoboCup official site. URL: http://www.robocup.org/
Frintrop, S., and Jensfelt, P. (2008). Active gaze control for attentional visual Serrano-Gotarredona, R., Oster, M., Lichtsteiner, P., Linares-Barranco, A., Paz-
slam, in IEEE International Conference on Robotics and Automation, ICRA 2008 Vicente, R., Gmez-Rodriguez, F., et al. (2009). CAVIAR: a 45k neuron, 5M
(Pasadena, CA), 36903697. doi: 10.1109/ROBOT.2008.4543777 synapse, 12G connects/s aer hardware sensoryprocessing learningactuating
Frintrop, S., Klodt, M., and Rome, E. (2007). A real-time visual attention system system for high-speed visual object recognition and tracking. IEEE Trans. Neural
using integral images, in In Proceedings of the 5th International Conference on Netw. 20, 14171438. doi: 10.1109/TNN.2009.2023653
Computer Vision Systems (ICVS) (Bielefeld). doi: 10.2390/biecoll-icvs2007-66 Serrano-Gotarredona, R., Serrano-Gotarredona, T., Acosta-Jimenez, A., Serrano-
Hofstaetter, M., Schoen, P., and Posch, C. (2010). A SPARC-compatible general Gotarredona, C., Perez-Carrasco, J., Linares-Barranco, A., et al. (2008).
purpose address-event processor with 20-bit 10ns-resolution asynchronous sen- On real-time aer 2d convolutions hardware for neuromorphic spike
sor data interface in 0.18 m CMOS, in International Symposium on Circuits based cortical processing. IEEE Trans. Neural Netw. 19, 11961219. doi:
and Systems, ISCAS (Paris), 42294232. doi: 10.1109/ISCAS.2010.5537575 10.1109/TNN.2008.2000163
Hubel, D. H., and Wiesel, T. N. (1962). Receptive fields, binocular interaction and Siagian, C., Chang, C., Voorhies, R., and Itti, L. (2011). Beobot
functional architecture in the cats visual cortex. J. Physiol. 160, 106154. 2.0: cluster architecture for mobile robotics. J. Field Robot. 28,
Itti, L. (2002). Real-time high-performance attention focusing in outdoors color 278302. doi: 10.1002/rob.20379
video streams, in Proceedings of the SPIE 4662, Human Vision and Electronic Sonnleithner, D., and Indiveri, G. (2012). A real-time event-based selective atten-
Imaging VII (San Jose, CA), 235. doi: 10.1117/12.469519 tion system for active vision, in Advances in Autonomous Mini Robots, eds U.
Itti, L., Dhavale, N., and Pighin, F. (2003). Realistic avatar eye and head animation Ruckert, S. Joaquin, and W. Felix (Berlin; Heidelberg: Springer), 205219. doi:
using a neurobiological model of visual attention, in Proceedings of the SPIE 10.1007/978-3-642-27482-4_21
48th Annual International Symposium on Optical Science and Technology. Vol. Walther, D., Rutishauser, U., Koch, C., and Perona, P. (2005). Selective visual atten-
5200, eds B. Bosacchi, D. B. Fogel, and J. C. Bezdek (Bellingham, WA: SPIE tion enables learning and recognition of multiple objects in cluttered scenes.
Press), 6478. doi: 10.1117/12.512618 Comput. Vis. Image Underst. 100, 4163. doi: 10.1016/j.cviu.2004.09.004
Itti, L., and Koch, C. (2001). Computational modelling of visual attention. Nat. Rev. Wiesmann, G., Schraml, S., Litzenberger, M., Belbachir, A., Hofstatter, M.,
Neurosci. 2, 194203. doi: 10.1038/35058500 and Bartolozzi, C. (2012). Event-driven embodied system for feature
Itti, L., Koch, C., and Niebur, E. (1998). A model of saliency-based visual attention extraction and object recognition in robotic applications, in 2012 IEEE
for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 20, 12541259. Computer Society Conference on Computer Vision and Pattern Recognition
doi: 10.1109/34.730558 Workshops (CVPRW) (Providence, RI), 7682. doi: 10.1109/CVPRW.2012.
Kandel, E., Schwartz, J. H., and Jessell, T. M. (2000). Principles of Neural Science. 6238898
4th Edn. New York, NY: McGraw-Hill Medical. doi: 10.1036/0838577016
Koch, C., and Ullman, S. (1985). Shifts in selective visual-attention towards the
Conflict of Interest Statement: The authors declare that the research was con-
underlying neural circuitry. Hum. Neurobiol. 4, 219227.
ducted in the absence of any commercial or financial relationships that could be
Kowler, E. (2011). Eye movements: the past 25 years. Vis. Res. 51, 14571483. doi:
construed as a potential conflict of interest.
10.1016/j.visres.2010.12.014
Lichtsteiner, P., Posch, C., and Delbruck, T. (2008). An 128 128 120dB 15 s-
latency temporal contrast visi on sensor. IEEE J. Solid State Circ. 43, 566576. Received: 13 August 2013; paper pending published: 08 September 2013; accepted: 20
doi: 10.1109/JSSC.2007.914337 November 2013; published online: 13 December 2013.
Metta, G., Fitzpatrick, P., and Natale, L. (2006). YARP: yet another robot platform. Citation: Rea F, Metta G and Bartolozzi C (2013) Event-driven visual attention for
Intl. J. Adv. Robot. Syst. 3, 4348. doi: 10.5772/5761 the humanoid robot iCub. Front. Neurosci. 7:234. doi: 10.3389/fnins.2013.00234
Miau, F., Papageorgiou, C., and Itti, L. (2001). Neuromorphic algorithms for This article was submitted to Neuromorphic Engineering, a section of the journal
computer vision and attention. Proc. SPIE 46, 1223. doi: 10.1117/12.448343 Frontiers in Neuroscience.
Movshon, J., Thompson, I., and Tolhurst, D. (1978). Spatial summation in the Copyright 2013 Rea, Metta and Bartolozzi. This is an open-access article dis-
receptive fields of simple cells in the cats striate cortex. J. Physiol. 283, 5377. tributed under the terms of the Creative Commons Attribution License (CC BY).
Navalpakkam, V., and Itti, L. (2005). Modeling the influence of task on attention. The use, distribution or reproduction in other forums is permitted, provided
Vis. Res. 45, 205231. doi: 10.1016/j.visres.2004.07.042 the original author(s) or licensor are credited and that the original publica-
Ouerhani, N., Bur, A., and Hgli, H. (2005). Robot self-localization using visual tion in this journal is cited, in accordance with accepted academic practice. No
attention, in Proceedings of the CIRA 2005, (Ancona, Italy), 309314. doi: use, distribution or reproduction is permitted which does not comply with these
10.1109/CIRA.2005.1554295 terms.

www.frontiersin.org December 2013 | Volume 7 | Article 234 | 33


ORIGINAL RESEARCH ARTICLE
published: 31 March 2014
doi: 10.3389/fnins.2014.00048

On the use of orientation filters for 3D reconstruction in


event-driven stereo vision
Luis A. Camuas-Mesa 1*, Teresa Serrano-Gotarredona 1 , Sio H. Ieng 2 , Ryad B. Benosman 2 and
Bernabe Linares-Barranco 1
1
Instituto de Microelectrnica de Sevilla (IMSE-CNM), CSIC y Universidad de Sevilla, Sevilla, Spain
2
UMR_S968 Inserm/UPMC/CNRS 7210, Institut de la Vision, Universit de Pierre et Marie Curie, Paris, France

Edited by: The recently developed Dynamic Vision Sensors (DVS) sense visual information
Tobi Delbruck, INI Institute of asynchronously and code it into trains of events with sub-micro second temporal
Neuroinformatics, Switzerland
resolution. This high temporal precision makes the output of these sensors especially
Reviewed by:
suited for dynamic 3D visual reconstruction, by matching corresponding events generated
Theodore Yu, Texas Instruments
Inc., USA by two different sensors in a stereo setup. This paper explores the use of Gabor
Jun Haeng Lee, Samsung filters to extract information about the orientation of the object edges that produce the
Electronics, South Korea events, therefore increasing the number of constraints applied to the matching algorithm.
*Correspondence: This strategy provides more reliably matched pairs of events, improving the final 3D
Luis A. Camuas-Mesa, Instituto de
reconstruction.
Microelectrnica de Sevilla
(IMSE-CNM), CSIC y Universidad de Keywords: stereovision, neuromorphic vision, Address Event Representation (AER), event-driven processing,
Sevilla, Av. Amrico Vespucio, s/n, convolutions, gabor filters
41092 Sevilla, Spain
e-mail: [email protected]

INTRODUCTION an equivalent sampling rate higher than 100 KFrames/s. Exploiting


Biological vision systems are known to outperform any mod- this fine time resolution provides a new mean for achieving stereo
ern artificial vision technology. Traditional frame-based systems vision with fast and efficient algorithms (Rogister et al., 2012).
are based on capturing and processing sequences of still frames. Stereovision processing is a very complex problem for conven-
This yields a very high redundant data throughput, imposing tional frame-based strategies, due to the lack of precise timing
high computational demands. This limitation is overcome in bio- information as used by the brain to solve such tasks (Meister and
inspired event-based vision systems, where visual information is Berry II, 1999). Frame-based methods usually process sequen-
coded and transmitted as events (spikes). This way, much less tially sets of images independently, searching for several features
redundant information is generated and processed, allowing for like orientation (Granlund and Knutsson, 1995), optical flow
faster and more energy efficient systems. (Gong, 2006) or descriptors of local luminance (Lowe, 2004).
Address Event Representation (AER) is a widely used bio- However, event-based systems can compute stereo information
inspired event-driven technology for coding and transmitting much faster using the precise timing information to match pixels
(sensory) information (Sivilotti, 1991; Mahowald, 1992; Lazzaro between different sensors. Several studies have applied events tim-
et al., 1993). In AER sensors, each time a pixel senses relevant ing together with additional constraints to compute depth from
information (like a change in the relative light) it asynchronously stereo visual information (Marr and Poggio, 1976; Mahowald
sends an event out, which can be processed by event-based pro- and Delbrck, 1989; Tsang and Shi, 2004; Kogler et al., 2009;
cessors (Venier et al., 1997; Choi et al., 2005; Silver et al., 2007; Domnguez-Morales et al., 2012; Carneiro et al., 2013; Serrano-
Khan et al., 2008; Camuas-Mesa et al., 2011, 2012; Zamarreo- Gotarredona et al., 2013).
Ramos et al., 2013). This way, the most important features pass In this paper, we explore different ways to improve 3D object
through all the processing levels very fast, as the only delay is reconstruction using Gabor filters to extract orientation informa-
caused by the propagation and computation of events along the tion from the retinas events. For that, we use two DVS sensors
processing network. Also, only pixels with relevant information with high contrast sensitivity (Serrano-Gotarredona and Linares-
send out events, reducing power and bandwidth consumption. Barranco, 2013), whose output is connected to a convolutional
These properties (high speed and low energy) are making AER network hardware (Zamarreo-Ramos et al., 2013). Different
sensors very popular, and different sensing chips have been Gabor filter architectures are implemented to reconstruct the 3D
reported for vision (Lichtsteiner et al., 2008; Leero-Bardallo shape of objects. In section Neuromorphic Silicon Retina, we
et al., 2010, 2011; Posch et al., 2011; Serrano-Gotarredona and describe briefly the DVS sensor used. Section Stereo Calibration
Linares-Barranco, 2013) or auditory systems (Lazzaro et al., 1993; describes the calibration method used in this work. In section
Cauwenberghs et al., 1998; Chan et al., 2007). Event Matching, we detail the matching algorithm applied, while
The development of Dynamic Vision Sensors (DVS) was very section 3D Reconstruction shows the method for reconstructing
important for high speed applications. These devices can track the 3D coordinates. Finally, section Results provides experimental
extremely fast objects with standard lighting conditions, providing results.

www.frontiersin.org March 2014 | Volume 8 | Article 48 | 34


Camuas-Mesa et al. Orientation filters for 3D reconstruction

Let us use lower case to denote a 2D point in the retina


sensing plane as m = [x y]T , and capital letter to denote the cor-
responding 3D point in real space as M = [X Y Z]T . Augmented
vectors are built by adding 1 as the last element: m = [x y 1]T and
M = [X Y Z 1] . Under the assumptions of the pinhole camera
T

model, the relationship between m and M is given by Hartley and


Zisserman (2003):
m = Pi M (1)

where Pi is the projection matrix for camera i. In order to obtain


the projection matrices of a system, many different techniques
have been proposed, and they can be classified into the following
two categories (Zhang, 2000):

Photogrammetric calibration: using a calibration object with


known geometry in 3D space. This calibration object usu-
ally consists of two or three planes orthogonal to each other
(Faugeras, 1993).
Self-calibration: the calibration is implemented by moving the
cameras in a static scene obtaining several views, without using
any calibration object (Maybank and Faugeras, 1992).
FIGURE 1 | Data driven asynchronous event generation for two
equivalent pixels in Retina 1 and Retina 2. Because of intra-die pixel In this work, we have implemented a calibration technique based
mismatch and inter-die sensor mismatch, both response curves differ. on a known 3D object, consisting of 36 points distributed in two
orthogonal planes. Using this fixed pattern, we calibrate two DVS.
STEREO CALIBRATION
Before using a pair of retinas for sensing, matching pairs of corresponding events and reconstructing each event in 3D, the relative positions and orientations of both retinas need to be calibrated. Let us use lower case to denote a 2D point in the retina sensing plane as m = [x y]^T, and capital letters to denote the corresponding 3D point in real space as M = [X Y Z]^T. Augmented vectors are built by adding 1 as the last element: m̃ = [x y 1]^T and M̃ = [X Y Z 1]^T. Under the assumptions of the pinhole camera model, the relationship between m and M is given by Hartley and Zisserman (2003):

m̃ = Pi M̃    (1)

where Pi is the projection matrix for camera i. In order to obtain the projection matrices of a system, many different techniques have been proposed, and they can be classified into the following two categories (Zhang, 2000):

Photogrammetric calibration: using a calibration object with known geometry in 3D space. This calibration object usually consists of two or three planes orthogonal to each other (Faugeras, 1993).
Self-calibration: the calibration is implemented by moving the cameras in a static scene, obtaining several views, without using any calibration object (Maybank and Faugeras, 1992).

In this work, we have implemented a calibration technique based on a known 3D object, consisting of 36 points distributed in two orthogonal planes. Using this fixed pattern, we calibrate two DVS. A blinking LED was placed in each one of these 36 points. LEDs blinked sequentially one at a time, producing trains of spikes in several pixels at both sensors. From these trains of spikes, we needed to extract the 2D calibration coordinates m_i^j, where i = 1, 2 represents each silicon retina and j = 1, ..., 36 represents the calibration points (see Figure 2). There are two different approaches to obtain these coordinates: with pixel or sub-pixel resolution. In the first one, we decided that the corresponding 2D coordinate for a single LED was represented by the pixel which responded with the highest firing rate. In the second one, we selected a small cluster of pixels which responded to that LED with a firing rate above a certain threshold, and we calculated the average coordinate, obtaining sub-pixel accuracy.
After calculating m_1^j and m_2^j (j = 1, ..., 36) and knowing M^j, we can apply any algorithm that was developed for traditional frame-based computer vision (Longuet-Higgins, 1981) to extract P1 and P2 (Hartley and Zisserman, 2003). More details can be found in Calculation of Projection Matrix P in Supplementary Material.
The fundamental matrix F relates the corresponding points obtained from two cameras, and is defined by the equation:

m̃1^T F m̃2 = 0    (2)

where m̃1 and m̃2 are a pair of corresponding 2D points in both cameras (Luong, 1992). This system can be solved using the 36 pairs of points mentioned before (Benosman et al., 2011).
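The Supplementary Material is not reproduced here, but a projection matrix can be recovered from the 36 LED correspondences with a standard Direct Linear Transform. The sketch below is only an illustration of that standard technique, assuming the 2D and 3D calibration coordinates are already available as arrays; it is not the authors' implementation.

```python
import numpy as np

def estimate_projection_matrix(pts3d, pts2d):
    """Standard DLT: estimate a 3x4 matrix P such that [x y 1]^T ~ P [X Y Z 1]^T
    for each 3D/2D correspondence. pts3d: (N, 3) LED positions in space,
    pts2d: (N, 2) measured retina coordinates."""
    rows = []
    for (X, Y, Z), (x, y) in zip(pts3d, pts2d):
        Mh = [X, Y, Z, 1.0]
        rows.append([0, 0, 0, 0] + [-v for v in Mh] + [y * v for v in Mh])
        rows.append(Mh + [0, 0, 0, 0] + [-x * v for v in Mh])
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)          # P (flattened) is the null vector of A
    return Vt[-1].reshape(3, 4)

# Hypothetical usage with the 36 calibration LEDs seen by each retina:
# P1 = estimate_projection_matrix(leds_3d, leds_2d_retina1)
# P2 = estimate_projection_matrix(leds_3d, leds_2d_retina2)
```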


EVENT MATCHING
In stereo vision systems, a 3D point in space M is projected onto the focal planes of both cameras in pixels m1 and m2, therefore generating events e(m_1^i, t) and e(m_2^i, t). Reconstructing the original 3D point requires matching each pair of events produced by point M at time t (Carneiro et al., 2013). For that, we implemented two different matching algorithms (A and B) based on a list of restrictions applied to each event in order to find its matching pair. These algorithms are described in the following subsections.

FIGURE 2 | Photograph of the calibration structure, with 36 LEDs distributed in two orthogonal planes. The size of the object is shown in the figure.

FIGURE 3 | Temporal match. Two events can be considered as candidates to match if they are generated within a certain time interval Δt.
RETINAS EVENTS MATCHING ALGORITHM (A)
This first algorithm (Carneiro et al., 2013) consists of applying the following restrictions (1-4) to the events generated by the silicon retinas. Therefore, for each event generated by retina 1 we have to find out how many events from retina 2 satisfy the 4 restrictions. If the answer is only one single event, it can be considered its matching pair. Otherwise, it is not possible to determine the corresponding event, and it will be discarded.

Restriction 1: temporal match
One of the most useful advantages of event-driven DVS based vision sensing and processing is the high temporal resolution, down to fractions of microseconds (Lichtsteiner et al., 2008; Posch et al., 2011; Serrano-Gotarredona and Linares-Barranco, 2013). Thus, in theory, two identical DVS cameras observing the same scene should produce corresponding events simultaneously (Rogister et al., 2012). However, in practice, there are many non-ideal effects that end up introducing appreciable time differences (up to many milliseconds) between corresponding events:

(a) inter-pixel and inter-sensor variability in the light-dependent latency from when a luminance change is sensed by the photodiode until it is amplified, processed and communicated out of the chip;
(b) presence of noise at various stages of the circuitry;
(c) variability in inter-pixel and inter-sensor contrast sensitivity; and
(d) randomness of pixel initial conditions when a change of light happens.

Nonetheless, corresponding events occur within a millisecond-range time window, depending on ambient light (the lower the light, the wider the time window). As a consequence, this first restriction implies that for an event e(m_1^i, t1), only those events e(m_2^i, t2) with |t1 - t2| < Δt/2 can be candidates to match, as shown in Figure 3. In our experimental setup we used a value of Δt = 4 ms, which gave the best possible result under standard interior lighting conditions.

Restriction 2: epipolar restriction
As described in detail in Hartley and Zisserman (2003), when a 3D point in space M is projected onto pixel m1 in retina 1, the corresponding pixel m2 lies on an epipolar line in retina 2 (Carneiro et al., 2013). Using this property, a second restriction is added to the matching algorithm, using the fundamental matrix F to calculate the epipolar line Ep2 in retina 2 corresponding to event m1 in retina 1 (Ep2(m1) = F^T m̃1). Therefore, only those events e(m_2^i, t2) whose distance to Ep2 is less than a given limit ΔEpi can be candidates to match. In our experiments we used a value of ΔEpi = 1 pixel.

Restriction 3: ordering constraint
For a practical stereo configuration of retinas where the angle between their orientations is small enough, a certain geometrical constraint can be applied to each pair of corresponding events. In general, the horizontal coordinate of the events generated by one retina is always larger than the horizontal coordinate of the corresponding events generated by the other retina.

Restriction 4: polarity
The silicon retinas used in our experimental setup generate output events when they detect a change in luminance in a pixel, indicating in the polarity of the event whether that change means increasing or decreasing luminance (Lichtsteiner et al., 2008; Posch et al., 2011; Serrano-Gotarredona and Linares-Barranco, 2013). Using the polarity of events, we can impose the condition that two corresponding events in both retinas must have the same polarity.
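A schematic software version of matching algorithm A is sketched below. It is not the authors' implementation: the event layout (t, x, y, polarity), the sign of the ordering constraint and the brute-force candidate search are assumptions, while the Δt = 4 ms and ΔEpi = 1 pixel values come from the text.

```python
import numpy as np

def epipolar_line(F, m1):
    """Epipolar line in retina 2 for pixel m1 = (x, y) of retina 1: l = F^T m1~."""
    return F.T @ np.array([m1[0], m1[1], 1.0])

def point_line_distance(line, m2):
    a, b, c = line
    return abs(a * m2[0] + b * m2[1] + c) / np.hypot(a, b)

def match_events(events1, events2, F, dt=4e-3, d_epi=1.0):
    """Schematic algorithm A. Each event is a tuple (t, x, y, polarity).
    An event of retina 1 is kept only if exactly one event of retina 2
    satisfies the four restrictions; otherwise it is discarded."""
    matches = []
    for (t1, x1, y1, p1) in events1:
        line = epipolar_line(F, (x1, y1))
        candidates = [
            e2 for e2 in events2
            if abs(t1 - e2[0]) < dt / 2                              # restriction 1
            and point_line_distance(line, (e2[1], e2[2])) < d_epi    # restriction 2
            and x1 >= e2[1]                # restriction 3 (assumed sign of ordering)
            and p1 == e2[3]                                          # restriction 4
        ]
        if len(candidates) == 1:
            matches.append(((t1, x1, y1, p1), candidates[0]))
    return matches
```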


GABOR FILTER EVENTS MATCHING ALGORITHM (B)


We propose a new algorithm where we use the orientation of the
object edges to improve the matching, increasing the number of
correctly matched events.
If the focal planes of two retinas in a stereo vision system are
roughly vertically aligned and have a small horizontal vergence,
the orientation of observed edges will be approximately equal
provided that the object is not too close to the retinas. A static
DVS produces events when observing moving objects, or more
precisely, when observing the edges of moving objects. Therefore,
correspondent events in the two retinas are produced by the same
moving edges, and consequently the observed orientation of the
edge should be similar in both retinas. An edge would appear
with a different angle in both retinas only when it is relatively
close to them; in practice this does not happen, for two reasons¹:

(1) Since both cameras have small horizontal vergence, the object would leave the overlapping field of view of the two retinas well before getting that close. In that case, we do not have stereo vision anymore.
(2) The minimal focusing distance of the cameras' lenses limits the maximal vergence.

¹ There is, however, a pathological exception: a very thin and long object, perfectly centred between the two retinas, with its long dimension perpendicular to the retina planes, may produce different angles at both retinas.

Considering that, we can assume that the orientation of an edge


will be approximately the same in both retinas under our working
conditions. Under different conditions, an epipolar rectification should be applied to the stereo system to ensure that the orientations of the edges are identical in the two cameras. This operation
consists in estimating the homographies mapping and scaling
the events of each retina into two focal planes parallel to the
stereo baseline (Loop and Zhang, 1999). Lines in the rectified
focal planes are precisely the epipolar lines of the stereo system.
This rectification should be carried out at the same time as the retinas' calibration.
The application of banks of Gabor filters to the events gener-
ated by both retinas provides information about the orientation
of the object edges that produce the events as shown in Figure 4.
This way, by using Gabor filters with different angles we can apply
the previously described matching algorithm to pairs of Gabor
filters with the same orientation. Thus, the new matching algo-
rithm is as follows. The events coming out of retinas R1 and R2
are processed by Gabor filters G1x and G2x , respectively (with
x = 1, 2, ..., N, with N the number of orientation filters for each retina). Then, for each pair of Gabor filters G1x and G2x, conditions 1-4 are applied to obtain matched events for each orientation. Therefore, the final list of matched events is obtained as the union of all the lists of matched events obtained for each orientation.

FIGURE 4 | Illustration of the use of 3 Gabor filters with different orientations applied to the output of both retinas. The events generated by the filters carry additional information, as they represent the orientation of the edges.
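The sketch below illustrates, under stated assumptions, the two ingredients of algorithm B: a small bank of Gabor kernels with angles uniformly spread between -90° and 90° (the 11 × 11 kernel size and the angle convention follow the text, while sigma and wavelength are arbitrary), and the per-orientation matching whose results are merged by union. The `matcher` argument stands for any pair-wise matching routine implementing restrictions 1-4, such as the earlier sketch.

```python
import numpy as np

def gabor_kernel(theta_deg, size=11, sigma=3.0, wavelength=6.0):
    """One oriented Gabor kernel of shape (size, size)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    theta = np.deg2rad(theta_deg)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

def gabor_bank(n_orientations):
    """N kernels with angles uniformly distributed between -90 and 90 degrees
    (for N = 4 this gives -90, -45, 0, 45, as in Figure 11)."""
    angles = np.linspace(-90, 90, n_orientations, endpoint=False)
    return {a: gabor_kernel(a) for a in angles}

def match_with_orientations(events1_by_orient, events2_by_orient, matcher):
    """Algorithm B, schematically: run the pair-wise matching on the outputs
    of Gabor filters sharing the same orientation and take the union."""
    matched = []
    for angle in events1_by_orient:
        matched += matcher(events1_by_orient[angle], events2_by_orient[angle])
    return matched
```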
3D RECONSTRUCTION
The result provided by the previously described matching algorithm is a train of pairs of corresponding events. Each pair consists of two events with coordinates m1 = (x1, y1)^T and m2 = (x2, y2)^T. The relationship between m̃ and M̃ for both retinas is given by:

m̃1 × P1 M̃ = 0
m̃2 × P2 M̃ = 0    (3)

where P1 and P2 represent the projection matrices calculated during calibration, and M̃ is the augmented vector corresponding to the 3D coordinate that must be obtained.


These equations can be solved as a linear least squares minimization problem (Hartley and Zisserman, 2003), giving the final 3D coordinates M = [X Y Z]^T as a solution. More details can be found in Calculation of Reconstructed 3D Coordinates in Supplementary Material.
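A minimal sketch of that least-squares triangulation step (a standard homogeneous DLT, not the authors' code) is given below; it builds the linear system implied by equations (3) for one matched pair and takes the SVD null vector as the homogeneous 3D point.

```python
import numpy as np

def triangulate(P1, P2, m1, m2):
    """Linear triangulation for one matched pair. P1, P2 are 3x4 projection
    matrices; m1, m2 are (x, y) pixel coordinates in Retina 1 and Retina 2."""
    x1, y1 = m1
    x2, y2 = m2
    A = np.vstack([
        x1 * P1[2] - P1[0],
        y1 * P1[2] - P1[1],
        x2 * P2[2] - P2[0],
        y2 * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1]
    return M[:3] / M[3]          # back from homogeneous coordinates to [X Y Z]

# Hypothetical usage for one matched pair of events:
# XYZ = triangulate(P1, P2, (x1, y1), (x2, y2))
```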
RESULTS
In this section, we describe briefly the hardware setup used for the experiments, then we show a comparison between the different calibration methods, after that we characterize the 3D reconstruction method, and finally we present results on the reconstruction of 3D objects.

HARDWARE SETUP
The event-based stereo vision processing has been tested using two DVS sensor chips (Serrano-Gotarredona and Linares-Barranco, 2013) whose outputs are connected to a merger board (Serrano-Gotarredona et al., 2009) which sends the events to a 2D grid array of event-based convolution modules implemented within a Spartan6 FPGA. This scheme has been adapted from a previous one that used a Virtex6 (Zamarreño-Ramos et al., 2013). The Spartan6 was programmed to perform real-time edge extraction on the visual flow from the retinas. Finally, a USBAERmini2 board (Serrano-Gotarredona et al., 2009) was used to timestamp all the events coming out of the Spartan6 board and send them to a computer through a high-speed USB2.0 port (see Figure 5).
The implementation of each convolution module in the FPGA is represented in Figure 6. It consists of two memory blocks (one to store the pixel values, and the other to store the kernel), a control block that performs the operations, a configuration block that receives all the programmable parameters, and an output block that sends out the events. When an input event arrives, it is received by the control block, which implements the handshaking and calculates which memory positions must be affected by the operation. In particular, it must add the kernel values to the pixels belonging to the appropriate neighborhood around the address of the input event, as done in previous event-driven convolution processors (Serrano-Gotarredona et al., 1999, 2006, 2008, 2009; Camuñas-Mesa et al., 2011, 2012). At the same time, it checks if any of the updated pixels has reached its positive or negative threshold, in which case it resets the pixel and sends a signed event to the output block. A programmable forgetting process decreases the value of all the pixels linearly and periodically, making the pixels behave like leaky integrate-and-fire neurons.
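The following is a rough software model of one such convolution module, intended only to fix ideas; the array size, threshold, leak rate and the multiplication of the kernel by the event polarity are assumptions, not values taken from the paper.

```python
import numpy as np

class ConvModule:
    """Software model of an event-driven convolution module: each input event
    adds the (signed) kernel to a neighborhood of the pixel array; pixels that
    cross the threshold emit a signed output event and are reset; a periodic
    linear decay makes pixels behave like leaky integrate-and-fire neurons."""
    def __init__(self, kernel, size=128, threshold=100.0, leak=1.0):
        self.k = kernel
        self.state = np.zeros((size, size))
        self.threshold = threshold
        self.leak = leak

    def input_event(self, x, y, polarity):
        half = self.k.shape[0] // 2
        out = []
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                px, py = x + dx, y + dy
                if 0 <= px < self.state.shape[1] and 0 <= py < self.state.shape[0]:
                    self.state[py, px] += polarity * self.k[dy + half, dx + half]
                    if abs(self.state[py, px]) >= self.threshold:
                        out.append((px, py, int(np.sign(self.state[py, px]))))
                        self.state[py, px] = 0.0      # reset the pixel after firing
        return out

    def forget(self):
        """Periodic forgetting: decrease the magnitude of every pixel linearly."""
        self.state -= np.sign(self.state) * np.minimum(np.abs(self.state), self.leak)
```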

Several convolutional modules can be arranged in a 2D mesh, each one communicating bidirectionally with all four neighbors, as illustrated in Figure 7 (Zamarreño-Ramos et al., 2013). Each module is characterized by its module coordinate within the array. Address events are augmented by adding either the source or destination module coordinate. Each module includes an AER router which decides how to route the events (Zamarreño-Ramos et al., 2013). This way, any network architecture can be implemented, like the one shown in Figure 4, with any number of Gabor filters. Each convolutional module is programmed to extract a specific orientation by writing the appropriate kernel. In our experiments, the resolution of the convolutional blocks is 128 × 128 pixels.
In order to compensate for the mismatch between the two DVS chips, an initial procedure must be implemented. This procedure consists of setting the values of the bias signals which control the sensitivity of the photosensors to obtain approximately the same number of events in response to a fixed stimulus in both retinas.

FIGURE 5 | Experimental stereo setup.

FIGURE 6 | Block diagram for the convolutional block implemented on FPGA.

CALIBRATION RESULTS


In order to calibrate the setup with both DVS retinas (with a baseline distance of 14 cm, the retinas being approximately aligned and the focal length of the lenses 8 mm), we built a structure of 36 blinking LEDs distributed in two orthogonal planes, with an array of 6 × 3 LEDs with known 3D coordinates in each plane (see Figure 2). The horizontal distance between LEDs is 5 cm, while the vertical separation is 3.5 cm. This structure was placed in front of the DVS stereo setup at approximately 1 m distance, and the events generated by the retinas were recorded by the computer. The LEDs would blink sequentially, so that when one LED produces events no other LED is blinking.

FIGURE 7 | Block diagram for a sample network with 3 × 3 convolutional blocks implemented on FPGA.

FIGURE 8 | 3D reconstruction of the coordinates of the calibration LEDs. (A) With pixel resolution and (B) with sub-pixel resolution. Blue circles represent the real location of the LEDs, while red crosses indicate the reconstructed coordinate. (C,D) Show the measured errors' absolute value in cm for approaches 1 and 2, respectively. Red lines represent the mean error.

FIGURE 9 | Measurement of the disparity (distance) between a pixel in Retina 1 and its corresponding epipolar line in Retina 2. The minimum disparity point separates Regions A and B.


FIGURE 11 | Kernels used for the 4-orientation configuration. Each row represents a different scale (from smaller to larger kernels). The maximum kernel value is 15 and the minimum is -7. Kernel size is 11 × 11 pixels.

FIGURE 10 | Characterization of the 3D reconstruction of the epipolar lines for different pixels in Retina 1. Each color represents a different pixel. (A) Distance between the reconstructed points and the retinas for different disparity values. The dashed lines represent the upper and lower limits associated with the allowed deviation around the epipolar line. (B) Reconstruction error for 3D points closer to the retinas, Region A. (C) Reconstruction error for points farther from the retinas, Region B.
FIGURE 12 | Photograph of the three objects used to test the 3D
reconstruction algorithm: a pen, a ring, and a cube.
This way, during a simultaneous event burst in both cameras, there is only one LED in 3D space blinking, resulting in a unique spatial correspondence between the events produced in both retinas and the original 3D position. This recording was processed offline to obtain the 2D coordinates of the LEDs projected in both retinas following two different approaches:

(1) We represent a 2D image coding the number of spikes generated by each pixel. This way, for each LED we obtain a cluster of pixels with large values. The coordinate of the pixel with the largest value in each cluster is considered to be the 2D projection of the LED. The accuracy of this measurement is one pixel.
(2) Using the same 2D image, the following method is applied. First, all those pixels with a number of spikes below a certain threshold are set to zero, while all those pixels above the threshold are set to one, obtaining a binarization of the image. Figure S1 in Calculation of Projection Matrix P in Supplementary Material shows an example of a 2D binarized image obtained for one DVS, where the 36 clusters represent the responses to the blinking LEDs. Then, for each cluster of pixels we calculate the mean coordinate, obtaining the 2D projection of the LEDs with sub-pixel resolution (a short sketch of this clustering step follows this list).

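The sketch below is one possible implementation of that clustering step (illustrative only; the connected-component grouping and the threshold handling are assumptions consistent with the description above, not the authors' code).

```python
import numpy as np

def led_coordinates(spike_count_img, threshold):
    """Binarize the per-pixel spike-count image, group active pixels into
    connected clusters, and return the mean coordinate of each cluster
    (sub-pixel accuracy). A plain flood fill avoids extra dependencies."""
    active = spike_count_img >= threshold
    visited = np.zeros_like(active, dtype=bool)
    centroids = []
    for y, x in zip(*np.nonzero(active)):
        if visited[y, x]:
            continue
        stack, cluster = [(y, x)], []
        visited[y, x] = True
        while stack:                          # flood fill one cluster
            cy, cx = stack.pop()
            cluster.append((cx, cy))
            for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                if (0 <= ny < active.shape[0] and 0 <= nx < active.shape[1]
                        and active[ny, nx] and not visited[ny, nx]):
                    visited[ny, nx] = True
                    stack.append((ny, nx))
        centroids.append(np.mean(cluster, axis=0))   # (x, y) mean coordinate
    return centroids
```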
In both cases, these 2D coordinates, together with the known 3D positions of the LEDs in space, are used to calculate the projection matrices P1 and P2 and the fundamental matrix F following the methods described in section Stereo Calibration. To validate the calibration, P1 and P2 were used to reconstruct the 3D calibration pattern following the method described in section 3D Reconstruction, obtaining the results shown in Figures 8A,B.


Table 1 | Comparison of the 3D reconstruction results for the pen.


Scale 1 Scale 2
Orientations 0 2 3 4 5 6 7 8 2 3 4 5 6 7 8
Nev 100 71 65 78 77 87 100 105 73 78 85 98 121 128 146
Nm 28 15 14 15 14 16 17 18 15 15 16 18 22 24 27
Matching rate 28 21 21 19 19 18 17 17 21 20 19 18 18 19 19
Isolated events 2.9 5.6 6.4 5.4 5.8 5.1 4.5 4.1 5.1 4.9 4.7 4.1 3.0 2.6 2.1
Merr 8.0 4.1 3.9 4.2 3.9 4.1 3.9 4.1 3.6 3.6 3.7 3.6 3.6 3.6 3.6
Nmcorrect 24.9 14 13 14 13 15 16 17 14 14 15 17 21 23 25
Scale 3 Scale 4
Orientations 2 3 4 5 6 7 8 2 3 4 5 6 7 8
Nev 74 80 87 106 131 154 169 77 79 85 99 129 145 170
Nm 16 17 17 21 26 31 34 19 19 19 22 30 34 39
Matching rate 22 21 20 19 20 20 20 24 24 23 23 23 23 23
Isolated events 5.0 4.7 4.5 3.3 2.4 1.8 1.5 3.3 3.3 3.2 2.6 1.6 1.4 1.0
Merr 5.2 5.2 5.2 5.1 4.9 4.9 5.0 8.3 8.3 8.3 8.3 8.2 8.1 7.8
Nmcorrect 14 15 15 19 24 29 32 17 17 17 20 27 31 36

The first column (0 orientations) presents the results obtained applying the matching algorithm to the retinas events (algorithm A, section Event Matching), while
the rest of the columns are related to the pair-wise application of the matching algorithm to the outputs of the Gabor filters (algorithm B, section Event Matching),
from Scale 1 (smaller kernels) to Scale 4 (larger kernels). For each scale, different numbers of orientations are considered (from 2 to 8), as indicated in the first
row (Orientations). Second row (Nev ) shows the number of events processed (in Kevents) by the matching algorithm in each case (i.e., the total number of events
generated by all the filters). Third row (Nm ) presents the number of matched events (in Kevents) produced by the algorithm, while fourth row (Matching Rate) shows
the ratio of matched events over the total number of events generated by the Gabor filters (Matching Rate = 100 Nm /Nev , in %). Fifth row (Isolated events) shows
the ratio of isolated events over the total number of matched events (in %). Sixth row (Merr ) presents the ratio of wrongly matched events over the total number
of matched events (in %). The last row (Nmcorrect) combines the number of matched events with the ratios of isolated and wrongly matched events, presenting the number of correctly matched events (Nmcorrect = Nm - (Isolated events/100)·Nm - (Merr/100)·Nm, in Kevents).

The reconstruction error is measured as the distance between each original 3D point and its corresponding reconstructed position, giving the results shown in Figures 8C,D. As can be seen in the figure, the mean reconstruction error for approach 1 is 7.3 mm with a standard deviation of 4.1 mm, while for approach 2 it is only 2 mm with a standard deviation of 1 mm. This error is comparable to the size of each LED (1 mm).

PRECISION CHARACTERIZATION
Using the calibration results obtained in the previous subsection, we performed the following evaluation of the 3D reconstruction method. For a fixed pixel m_1^1 in Retina 1, we used the fundamental matrix F to calculate the corresponding epipolar line Ep_2^1 in Retina 2, as represented in Figure 9. Although a perfect alignment between the two retinas would produce an epipolar line parallel to the x-axis and crossing the pixel position [minimum disparity point coincident with (x1, y1)], we represent a more general case, where the alignment is performed manually and is not perfect. This case is illustrated in Figure S1 (see Calculation of Projection Matrix P in Supplementary Material), where we show the 2D images representing the activity recorded by both retinas during calibration. The orientations of the epipolar lines indicate that the alignment is not perfect. The mean disparity for the LEDs' coordinates is 24.55 pixels. Considering that we admit a deviation of ΔEpi = 1 pixel around the epipolar line in the matching algorithm, we calculated two more lines, an upper and a lower limit, given by the distance of 1 pixel to the epipolar line. Using projection matrices P1 and P2, we reconstructed the 3D coordinates for all the points on these three lines. We repeated the procedure for a total of four different pixels in Retina 1, m_1^i (i = 1, 2, 3, 4), distributed around the visual space, obtaining four sets of 3-dimensional lines. In Figure 10A, we represent the distance between these 3D points and the retinas for each disparity value [the disparity measures the 2D euclidean distance between the projections of a 3D point in both retinas, (x1, y1) and (x2, y2)], where each color corresponds to a different pixel m_1^i in Retina 1, and the dashed lines represent the upper and lower limits given by the tolerance of 1 pixel around the epipolar lines. As can be seen in the figure, each disparity has two different associated distance values, which represent the two possible points in Ep_2^i which are at the same distance from m_1^i. This effect results in two different zones in each trace (Regions A and B in Figure 9), which correspond to two different regions in 3D space, where the performance of the reconstruction changes drastically. Therefore, we consider both areas separately in order to estimate the reconstruction error. Using the range of distances given by Figure 10A between each pair of dashed lines, we calculate the reconstruction error for each disparity value as (dmax - dmin)/d̄, where dmax and dmin represent the limits of the range of distance at that point, and d̄ is the mean value. Figure 10B shows the obtained error for the 3D points located in the closer region (A), while Figure 10C corresponds to the points farther from the retinas (Region B).

In both figures, each line represents a different pixel m_1^i in Retina 1. As shown in Figure 10B, the reconstruction error in the area of interest (around 1 m distance from the retinas) is less than 1.5%. Note that the minimum disparity value is around 20 pixels (while a perfect alignment would give 0), showing the robustness of the method for manual approximate alignment.

FIGURE 13 | Illustration of enhancing edges and noise reduction by a Gabor filter. (A) Input events representing a discontinuous edge with noise. (B) Output events generated by the Gabor filter, with the reconstructed edge without noise. (C) Gabor kernel. All axes represent pixels, the visual space in (A,B) being 128 × 128 and the size of the kernel in (C) 11 × 11.

3D RECONSTRUCTION
For the experimental evaluation of the 3D reconstruction, we analyzed the effect of several configurations of Gabor filters on the event matching algorithm B in order to compare them to algorithm A. For each configuration, we tested different numbers of orientation Gabor filters (from 2 to 8). All filters always had the same spatial scale, and we tested 4 different scales. Identical filters were applied to both retina outputs. Each row in Figure 11 shows an example of the kernels used in a configuration of 4 orientations (-90°, -45°, 0°, 45°), each configuration for a given spatial scale. In general, the different angles implemented in each case are uniformly distributed between -90° and 90°. This strategy was used to reconstruct in 3D the three objects shown in Figure 12: a 14 cm pen, a 22 cm diameter ring, and a 15 cm side metal wire cube structure.

Pen
A swinging pen of 14 cm length was moved in front of the two retinas for half a minute, with approximately 100 Kevents generated by each retina. Table 1 summarizes the results of the 3D reconstruction, in terms of events. The column labeled Orientations 0 corresponds to applying the matching algorithm directly to the retina pair outputs (algorithm A). When using Gabor filters (algorithm B), experiments with four different scales were conducted. For each scale, a different number of simultaneous filter orientations were tested, ranging from 2 to 8. In order to compare the performance of the stereo matching algorithm applied directly to the retinas (algorithm A, see section Event Matching) and applied to the outputs of the Gabor filters (algorithm B, see section Event Matching), the second row in Table 1 (Nev) shows the number of events processed by the algorithm in both cases. We show only the number of events coming originally from Retina 1, as both retinas have been configured to generate approximately the same number of events for a given stimulus. When the algorithm is applied directly to the output of the retinas, the number of matched pairs of events obtained is around 28 Kevents (28% success rate). The third row in Table 1 (Nm) shows the number of matched events for the different configurations of Gabors. If we calculate the percentage of success obtained by the algorithm for each configuration of filters, in order to compare it with the 28% provided by the retinas alone, we obtain the values shown in the fourth row of Table 1 (Matching Rate).
Although these results show that the matching rate of the algorithm is smaller when we use Gabor filters to extract information about the orientation of the edges that generated the events, we should consider that the performance of 3D reconstruction is determined by the total number of matched events, not the relative proportion. Note that the Gabor filters are capable of edge filling when detecting somewhat sparse or incomplete edges from the retina, thus enhancing edges and providing more events for these edges.


Figure 13 shows an example where a weak edge (in Figure 13A) is filled by a Gabor filter (with the kernel shown in Figure 13C), producing the enhanced noise-less edge in Figure 13B and increasing the number of edge events from 24 to 70 while removing all retina-noise events. The more matched events, the better the 3D reconstruction. For that reason, we consider that a bank of 8 Gabor filters with kernels of scale 4 gives the best result, with more than 39 Kevents that can be used to reconstruct the 3D sequence, using 100 Kevents generated by the retinas. This application of Gabor filters for edge filling was first demonstrated in Lindenbaum et al. (1994), and has also been used for fingerprint image enhancement (Hong et al., 1998; Greenberg et al., 2002).

FIGURE 14 | Illustration of matching errors.

FIGURE 15 | Graphical representation of Table 1. Each subplot corresponds to a different row of the table, showing the obtained values for each number of orientations and scale. The black horizontal lines indicate the values obtained using algorithm A (0 orientations).

FIGURE 16 | Sequence of disparity maps. They were reconstructed with Tframe = 50 ms and they correspond to the movement of the swinging pen (from A-I). The disparity scale goes from dark blue to red to encode events from far to near.

Another parameter that can be used to measure the quality of the 3D reconstruction is the proportion of isolated events in the matched sequence. We define an isolated event as an event which is not correlated to any other event in a certain spatio-temporal window, meaning that no other event has been generated in its neighbor region within a limited time range. A non-isolated event (an event generated by an edge of the object) will be correlated to some other events generated by the same edge, which will be close in space and time.


Note that these isolated matched events correspond to false matches. These false matches can be produced when an event in one retina is matched by mistake with a noise event in the other retina, or when two or more events that happen almost simultaneously in 3D space are cross-matched by the matching algorithm. With this definition of isolated events, the 28 Kevents that were matched for the retinas without any filtering were used to reconstruct the 3D coordinates of these events, resulting in only 2.93% of isolated events. After applying the same methodology to all the Gabor filter configurations, the results in the fifth row of Table 1 (Isolated events) are obtained. These results show that several configurations of Gabor filters give a smaller proportion of isolated events.
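The following sketch shows one possible way to count isolated events, using the matched-pair format of the earlier matching sketch; the spatio-temporal window (3 pixels, 5 ms) is an arbitrary choice, since the paper does not specify its size.

```python
import numpy as np

def isolated_event_ratio(matched_pairs, dt=5e-3, radius=3):
    """Percentage of matched events (taken on the Retina 1 side) that have no
    neighbor within a spatio-temporal window of 'radius' pixels and 'dt' seconds.
    matched_pairs: list of ((t1, x1, y1, p1), (t2, x2, y2, p2)) tuples."""
    ev = np.asarray([[m[0][0], m[0][1], m[0][2]] for m in matched_pairs], dtype=float)
    isolated = 0
    for t, x, y in ev:
        near = (np.abs(ev[:, 0] - t) < dt) & \
               (np.abs(ev[:, 1] - x) <= radius) & \
               (np.abs(ev[:, 2] - y) <= radius)
        if near.sum() <= 1:          # only the event itself falls in the window
            isolated += 1
    return 100.0 * isolated / len(ev)
```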
FIGURE 17 | Result of the 3D reconstruction of the swinging pen recording. Each plot (from A-I) corresponds to a 50 ms-frame representation of the 3D coordinates of the matched events.

In order to remove the retina-noise events, it is also possible to insert a noise removal block directly at the output of the retina (jAER, 2007). However, this introduces a small extra latency before the events can be processed, thus limiting event-driven stereo vision for very high speed applications (although it can be a good solution when timing restrictions are not too critical). The effect of Gabor filters on noise events is also illustrated in Figure 13, where all the events that were not part of an edge with the appropriate orientation are removed by the filter. However, it is possible that some noise events add their contributions together, producing noise events at the output of the Gabor filters. Two different things can happen with these events: (1) the stereo matching algorithm does not find a corresponding event in the other retina; (2) there is a single event which satisfies all the restrictions, so a 3D point will be reconstructed from a noise event, producing a wrongly matched event, as described in the next paragraph.
Although the object used in this first example is very simple, we must consider the possibility that the algorithm wrongly matches some events. In particular, if we think about a wide object, we can have events generated simultaneously by two far-apart edges: the left one and the right one. Therefore, it can happen that an event corresponding to the left edge in Retina 1 does not have a proper partner in Retina 2, but another event generated by the right edge in Retina 2 might satisfy all the restrictions imposed by the matching algorithm. Figure 14 illustrates the mechanism that produces this error. Let us assume that the 3D object has its left and right edges located at positions A and B in 3D space. Locations A and B produce events at x_1^A and x_1^B in Retina 1, and at x_2^A and x_2^B in Retina 2. These events are the projections onto the focal points R1 and R2 of both retinas, activating pixels (x_i^j, y_i^j), with i = 1, 2 and j = A, B. Therefore, an event generated in Retina 1 with coordinates (x_1^A, y_1^A) should match another event generated in Retina 2 with coordinates (x_2^A, y_2^A).


Table 2 | Comparison of the 3D reconstruction results for the ring.


Scale 1 Scale 2
Orientations 0 2 3 4 5 6 7 8 2 3 4 5 6 7 8
Nev 115 78 75 100 109 131 151 168 78 95 119 143 177 197 229
Nm 17 8 8 9 10 12 14 16 8 10 12 15 19 21 25
Matching rate 15 10 11 9 10 10 9 9 10 11 10 10 11 11 11
Isolated events 5.9 7.8 7.1 6.5 5.4 4.9 4.1 3.9 7.6 6.2 5.0 3.9 3.0 2.6 1.9
Merr 12.0 9.9 9.5 9.3 8.7 8.5 8.7 8.9 9.3 9.0 8.4 8.1 8.2 8.0 7.8
Nmcorrect 14 7 7 8 9 10 12 14 7 8 10 13 17 19 23
Scale 3 Scale 4
Orientations 2 3 4 5 6 7 8 2 3 4 5 6 7 8
Nev 82 103 122 157 185 217 245 83 107 131 161 201 229 266
Nm 8 10 12 16 19 22 25 6 9 11 14 17 20 23
Matching rate 9 10 10 10 10 10 10 8 8 8 9 9 9 9
Isolated events 7.5 6.3 4.8 3.5 2.9 2.3 2.0 7.7 6.3 5.1 3.9 3.0 2.5 2.0
Merr 8.9 7.7 7.3 6.8 6.5 6.6 6.4 8.4 6.5 6.2 5.7 5.9 5.8 5.6
Nmcorrect 7 9 11 14 17 20 23 5 8 10 13 15 18 21

The meaning of the columns and rows is as in Table 1.

FIGURE 18 | Graphical representation of Table 2. Each subplot corresponds to a different row of the table, showing the obtained values for each number of
orientations and scale. The black horizontal lines indicate the values obtained using algorithm A (0 orientations).

   
However, note that in Figure 14, an edge at position D is captured by Retina 1 at the same pixel as an edge at A, and in Retina 2 they would lie on the same epipolar lines. The same happens for edges at positions B and C. Consequently, it can happen that no event is produced in Retina 2 at coordinate (x_2^A, y_2^A) at the same time, but another event with coordinates (x_2^B, y_2^B) is generated within a short time range by the opposite, simultaneously moving edge, those coordinates being on the same epipolar line. In that case, the algorithm might match (x_1^A, y_1^A) with (x_2^B, y_2^B), reconstructing a wrong 3D point at coordinate D. The opposite combination would produce a wrong 3D event at point C. This effect could produce false edges in the 3D reconstruction, especially when processing more complex objects.


However, the introduction of the Gabor filters to extract the orientation of the edges will reduce the possibility of matching wrong pairs of events. In order to measure the proportion of wrongly matched events, we consider that all the good pairs of events will follow certain patterns of disparity, so all the events which are close in time will be included within a certain range of disparity values. Calculating continuously the mean and standard deviation of the distribution of disparities, we define the range of acceptable values, and we identify as wrongly matched all those events whose disparity is outside that range. Using this method, we calculate the proportion of wrongly matched events and present it (in %) in the sixth row of Table 1 (Merr).
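A minimal version of that disparity-based consistency check might look as follows (the pair format follows the earlier sketches; the +/- 3 sigma acceptance band is an assumption, as the text only states that the accepted range is derived from the mean and standard deviation of the disparity distribution).

```python
import numpy as np

def disparities(matched_pairs):
    """Disparity of each matched pair: 2D euclidean distance between the pixel
    in Retina 1 and the pixel in Retina 2 (large disparity = close object)."""
    p1 = np.asarray([[m[0][1], m[0][2]] for m in matched_pairs], dtype=float)
    p2 = np.asarray([[m[1][1], m[1][2]] for m in matched_pairs], dtype=float)
    return np.hypot(p1[:, 0] - p2[:, 0], p1[:, 1] - p2[:, 1])

def wrong_match_ratio(matched_pairs, n_sigma=3.0):
    """Label as wrongly matched the pairs whose disparity falls outside
    mean +/- n_sigma * std of the disparity distribution; return the %."""
    d = disparities(matched_pairs)
    mu, sigma = d.mean(), d.std()
    wrong = np.abs(d - mu) > n_sigma * sigma
    return 100.0 * wrong.sum() / len(d)
```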
Finally, the last row presents the number of correctly matched events, subtracting both the isolated and wrongly matched events from the total number of matched events: Nmcorrect = Nm - (Isolated events/100)·Nm - (Merr/100)·Nm. All these results are presented graphically in Figure 15, where the colored vertical bars represent the results obtained applying algorithm B with different numbers of orientations and scales, while the black horizontal lines indicate the values obtained using algorithm A (no Gabor filters). From this figure, we decide that the best case is 8 orientations and Scale 4, as it provides the largest number of correctly matched events. However, it could also be argued that 8 orientations and Scale 3 gives a smaller number of wrongly matched events, but in that case the number of correctly matched events is also smaller.
Using the sequence of matched events provided by the algorithm in the best case (8 orientations, Scale 4), we computed the disparity map. The underlying reasons why this configuration provides the best result are: (a) Scale 4 better matches the scale of the object edges in this particular case, and (b) given the object geometry and its tilting in time, a relatively fine orientation angle detection was required. If we compare this case with the results obtained applying algorithm A without Gabor filters (first column in Table 1), we observe an increase of 39% in the number of matched events, while the proportions of isolated events and wrongly matched pairs have decreased by 65 and 2.5%, respectively. Moreover, the number of correctly matched events has increased by 44%. In order to compute the disparity map, we calculated the euclidean distance between both pixels in each pair of events (from Retina 1 and Retina 2). This measurement is inversely proportional to the distance between the represented object and the retinas, as farther objects produce a small disparity and closer objects produce a large disparity value. Figure 16 shows 9 consecutive frames of the obtained disparity sequence, with a frame time of 50 ms. The disparity scale goes from dark blue to red to encode events from far to close.
Applying the method described in section 3D Reconstruction, the 3-dimensional coordinates of the matched events are calculated. Figure 17 shows 9 consecutive frames of the resultant 3D reconstruction, with a frame time of 50 ms. The shape of the pen is clearly represented as it moves around 3D space. Using this sequence, we measured manually the approximate length of the pen by calculating the distance between the 3D coordinates of pairs of events located at the upper and lower limits of the pen, respectively. This gave an average length of 14.85 cm, the real length being 14 cm, which means an error of 0.85 cm. For an approximate distance to the retinas of 1 m, the maximum error predicted in Figure 10 would be below 1.5%, resulting in 1.5 cm. Therefore, we can see that the 0.85 cm error is smaller than the maximum predicted by Figure 10.

FIGURE 19 | Results obtained for the rotating ring. (A) Disparity map reconstructed with Tframe = 50 ms corresponding to the rotation of the ring. (B) Result of the 3D reconstruction of the same frame of the ring recording.

Ring
A ring with a diameter of 22 cm was rotating slowly in front of the two retinas for half a minute, with approximately 115 Kevents generated by each retina. As in the previous example, the matching algorithm was applied both to the events generated by the retinas (see section Event Matching, algorithm A) and to the events generated by the Gabor filters (see section Event Matching, algorithm B), in order to compare both methods. Table 2 shows all the results for all the configurations of Gabor filters (from 2 to 8 orientations, with scales 1-4). All these results are presented graphically in Figure 18, where the colored vertical bars represent the results obtained applying algorithm B with different numbers of orientations and scales, while the black horizontal lines indicate the values obtained using algorithm A (no Gabor filters). We can see in the table how the largest number of matched events (25 K) is obtained for 8 orientations and both scales 2 and 3.


Table 3 | Comparison of the 3D reconstruction results for the cube.


Scale 1 Scale 2
Orientations 0 2 3 4 5 6 7 8 2 3 4 5 6 7 8
Nev 118 54 68 100 112 132 153 178 50 93 125 152 183 205 243
Nm 11 6 10 13 15 18 21 24 6 11 14 17 21 24 28
Matching rate 9 12 14 13 14 14 14 14 11 12 11 11 11 11 12
Isolated events 14.0 5.2 5.0 4.5 4.3 3.8 3.7 3.3 5.0 5.0 4.1 4.1 3.9 3.4 3.1
Merr 20.3 17.0 15.5 15.1 15.0 15.1 15.8 14.1 17.9 14.2 11.9 11.1 13.3 12.0 10.3
Nmcorrect 6 5 8 10 12 15 17 20 5 9 12 14 17 20 24
Scale 3 Scale 4
Orientations 2 3 4 5 6 7 8 2 3 4 5 6 7 8
Nev 54 130 170 219 256 300 346 51 145 190 235 285 329 386
Nm 5 12 14 20 23 27 31 3 10 12 16 19 21 25
Matching rate 9 9 8 9 9 9 9 6 7 6 7 7 7 7
Isolated events 5.2 4.2 4.1 3.6 3.1 3.2 3.0 4.8 3.7 3.2 3.1 2.9 2.7 2.8
Merr 19.0 15.1 12.7 11.3 11.9 11.2 10.9 27.4 15.0 12.9 11.4 13.7 12.2 10.7
Nmcorrect 4 10 12 17 20 23 27 2 8 10 14 16 18 22

The meaning of the columns and rows is as in Table 1.

FIGURE 20 | Graphical representation of Table 3. Each subplot corresponds to a different row of the table, showing the obtained values for each number of
orientations and scale. The black horizontal lines indicate the values obtained using algorithm A (0 orientations).

Although the ratio of noise events is very similar for both of them (1.9% for Scale 2 and 2.0% for Scale 3), Scale 3 provides a smaller ratio of wrongly matched events (7.8% for Scale 2 and 6.4% for Scale 3). Therefore, we conclude that the best performance is found with 8 orientations and Scale 3, as it is more appropriate to the geometry of the object. If we compare this case with the results obtained applying algorithm A without Gabor filters (first column in Table 2), we observe an increase of 47% in the number of matched events, while the proportions of isolated events and wrongly matched pairs have decreased by 66 and 46%, respectively. Therefore, the number of correctly matched events has increased by 64%. A frame reconstruction of the disparity map and the 3D sequence are shown in Figure 19.
The diameter of the reconstructed ring was measured manually by selecting pairs of events with the largest possible separation. This gave an average diameter of 21.40 cm, which implies a reconstruction error of 0.6 cm. This error is also smaller than the maximum predicted in Figure 10.
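As a rough automatic counterpart of that manual measurement, the largest pairwise separation among the reconstructed 3D events of a frame can be computed as below (illustrative only; it assumes the events of one frame are available as an (N, 3) array of X, Y, Z coordinates).

```python
import numpy as np

def max_separation(points_3d):
    """Largest euclidean distance between any two reconstructed 3D events;
    suitable for the modest number of events contained in one 50 ms frame."""
    P = np.asarray(points_3d, dtype=float)
    diff = P[:, None, :] - P[None, :, :]          # all pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1)).max()

# e.g. ring_diameter = max_separation(frame_events_xyz)
```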


Cube
Finally, a cube with an edge length of 15 cm was rotating in
front of the retinas, with approximately 118 Kevents
generated by each retina in approximately 20 s. The same proce-
dure performed in previous examples was repeated, obtaining the
results shown in Table 3. All these results are presented graphi-
cally in Figure 20, where the colored vertical bars represent the
results obtained applying algorithm B with different number of
orientations and scales, while the black horizontal lines indicate
the values obtained using algorithm A (no Gabor filters). In this
case, the largest number of matched events (31 K) is given by 8
orientations and Scale 3, while both the ratio of isolated events
and the ratio of wrongly matched events are very similar for
the four different scales with 8 orientations (around 3% noise
and 10.9% wrong matches). Therefore, the best performance is
given by 8 orientations and Scale 3. If we compare this case with
the results obtained applying algorithm A without Gabor fil-
ters (first column in Table 3), we observe an increase of 181%
in the number of matched events, while the proportions of iso-
lated events and wrongly matched pairs have decreased by 78 and
46%, respectively. The number of correctly matched events has
increased by 350%.
A reconstruction of the disparity map and the 3D sequence
is shown in Figure 21. The ratio of wrongly matched events is
much larger than on the ring example (about twice as much).
That is because this object has many parallel edges, increasing
the number of events in the same epipolar line which are can-
didates to be matched and which the orientation filters do not
discriminate. While Figure 14 shows a situation where 2 different
positions in 3D space (A and B) can generate events that could
be wrongly matched, in this case we could find at least 4 different
positions in 3D space (as we have 4 parallel edges) with the same properties.
The edge length of the reconstructed 3D cube was measured manually on the reconstructed events, giving an average length of 16.48 cm, which implies a reconstruction error of 1.48 cm. This error is smaller than the maximum predicted in Figure 10.

FIGURE 21 | Results obtained for the cube. (A) Disparity map reconstructed with Tframe = 50 ms corresponding to the rotation of the cube. (B) Result of the 3D reconstruction of the same frame of the cube recording.
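For reference, a disparity-map sequence like the ones shown in Figures 16, 19, and 21 can be assembled by binning matched pairs into 50 ms windows, as sketched below (the pair format follows the earlier sketches; the 128 × 128 map size matches the sensor resolution, while everything else is illustrative).

```python
import numpy as np

def disparity_frames(matched_pairs, t_frame=0.05, size=128):
    """Group matched event pairs into consecutive windows of t_frame seconds
    and render one disparity map per window: each active pixel of Retina 1
    takes the euclidean disparity of its matched event."""
    frames = {}
    for (t1, x1, y1, _p1), (_t2, x2, y2, _p2) in matched_pairs:
        idx = int(t1 // t_frame)
        frame = frames.setdefault(idx, np.zeros((size, size)))
        frame[int(y1), int(x1)] = np.hypot(x1 - x2, y1 - y2)
    return [frames[k] for k in sorted(frames)]
```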
CONCLUSION
This paper analyzes different strategies to improve 3D stereo reconstruction in event-based vision systems. First of all, a comparison between stereo calibration methods showed that by using a calibration object with LEDs placed in known locations and measuring their corresponding 2D projections with sub-pixel resolution, we can extract the geometric parameters of the stereo setup. This method was tested by reconstructing the known coordinates of the calibration object, giving a mean error comparable to the size of each LED.
Event matching algorithms have been proposed for stereo reconstruction, taking advantage of the precise timing information provided by DVS sensors. In this work, we have explored the benefits of using Gabor filters to extract the orientation of the object edges and match events from pair-wise filters directly. This imposes the restriction that the distance from the stereo cameras to the objects must be much larger than the focal length of the lenses, so that edge orientations appear similar in both cameras. By analyzing different numbers of filters with several spatial scales, we have shown that we can increase the number of reconstructed events for a given sequence, reducing the number of both noise events and wrong matches at the same time. This improvement has been validated by reconstructing three different objects in 3D. The size of these objects was estimated from the 3D reconstruction, with an error smaller than theoretically predicted by the method (1.5%).

ACKNOWLEDGMENTS
This work has been funded by ERANET grant PRI-PIMCHI-2011-0768 (PNEUMA) funded by the Spanish Ministerio de Economía y Competitividad, Spanish research grants (with support from the European Regional Development Fund) TEC2009-10639-C04-01 (VULCANO) and TEC2012-37868-C04-01 (BIOSENSE), Andalusian research grant TIC-6091 (NANONEURO) and by the French national Labex program Life-senses. The authors also benefited from both the CapoCaccia Cognitive Neuromorphic Engineering Workshop, Sardinia, Italy, and the Telluride Neuromorphic Cognition Engineering Workshop, Telluride, Colorado.


SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnins.2014.00048/abstract

REFERENCES
Benosman, R., Ieng, S., Rogister, P., and Posch, C. (2011). Asynchronous event-based Hebbian epipolar geometry. IEEE Trans. Neural Netw. 22, 1723-1734. doi: 10.1109/TNN.2011.2167239
Camuñas-Mesa, L., Acosta-Jiménez, A., Zamarreño-Ramos, C., Serrano-Gotarredona, T., and Linares-Barranco, B. (2011). A 32x32 pixel convolution processor chip for address event vision sensors with 155ns event latency and 20Meps throughput. IEEE Trans. Circuits Syst. I 58, 777-790. doi: 10.1109/TCSI.2010.2078851
Camuñas-Mesa, L., Zamarreño-Ramos, C., Linares-Barranco, A., Acosta-Jiménez, A., Serrano-Gotarredona, T., and Linares-Barranco, B. (2012). An event-driven multi-kernel convolution processor module for event-driven vision sensors. IEEE J. Solid State Circuits 47, 504-517. doi: 10.1109/JSSC.2011.2167409
Carneiro, J., Ieng, S., Posch, C., and Benosman, R. (2013). Asynchronous event-based 3D reconstruction from neuromorphic retinas. Neural Netw. 45, 27-38. doi: 10.1016/j.neunet.2013.03.006
Cauwenberghs, G., Kumar, N., Himmelbauer, W., and Andreou, A. G. (1998). An analog VLSI chip with asynchronous interface for auditory feature extraction. IEEE Trans. Circuits Syst. II 45, 600-606. doi: 10.1109/82.673642
Chan, V., Liu, S. C., and van Schaik, A. (2007). AER EAR: a matched silicon cochlea pair with address event representation interface. IEEE Trans. Circuits Syst. I 54, 48-59. doi: 10.1109/TCSI.2006.887979
Choi, T. Y. W., Merolla, P., Arthur, J., Boahen, K., and Shi, B. E. (2005). Neuromorphic implementation of orientation hypercolumns. IEEE Trans. Circuits Syst. I 52, 1049-1060. doi: 10.1109/TCSI.2005.849136
Domínguez-Morales, M. J., Jiménez-Fernández, A. F., Paz-Vicente, R., Jiménez-Moreno, G., and Linares-Barranco, A. (2012). Live demonstration: on the distance estimation of moving targets with a stereo-vision AER system. Int. Symp. Circuits Syst. 2012, 721-725. doi: 10.1109/ISCAS.2012.6272137
Faugeras, O. (1993). Three-Dimensional Computer Vision: a Geometric Viewpoint. Cambridge, MA: MIT Press.
Gong, M. (2006). Enforcing temporal consistency in real-time stereo estimation. ECCV 2006, Part III, 564-577. doi: 10.1007/11744078_44
Granlund, G. H., and Knutsson, H. (1995). Signal Processing for Computer Vision. Dordrecht: Kluwer. doi: 10.1007/978-1-4757-2377-9
Greenberg, S., Aladjem, M., and Kogan, D. (2002). Fingerprint image enhancement using filtering techniques. Real Time Imaging 8, 227-236. doi: 10.1006/rtim.2001.0283
Hartley, R., and Zisserman, A. (2003). Multiple View Geometry in Computer Vision. New York, NY: Cambridge University Press. doi: 10.1017/CBO9780511811685
Hong, L., Wan, Y., and Jain, A. (1998). Fingerprint image enhancement: algorithm and performance evaluation. IEEE Trans. Pattern Anal. Mach. Intell. 20, 777-789. doi: 10.1109/34.709565
jAER Open Source Project. (2007). Available online at: http://jaer.wiki.sourcefourge.net
Khan, M. M., Lester, D. R., Plana, L. A., Rast, A. D., Jin, X., Painkras, E., et al. (2008). SpiNNaker: mapping neural networks onto a massively-parallel chip multiprocessor, in Proceedings International Joint Conference on Neural Networks, IJCNN 2008 (Hong Kong), 2849-2856. doi: 10.1109/IJCNN.2008.4634199
Kogler, J., Sulzbachner, C., and Kubinger, W. (2009). Bio-inspired stereo vision system with silicon retina imagers, in 7th ICVS International Conference on Computer Vision Systems, Vol. 5815 (Liege), 174-183. doi: 10.1007/978-3-642-04667-4_18
Lazzaro, J., Wawrzynek, J., Mahowald, M., Sivilotti, M., and Gillespie, D. (1993). Silicon auditory processors as computer peripherals. IEEE Trans. Neural Netw. 4, 523-528. doi: 10.1109/72.217193
Leñero-Bardallo, J. A., Serrano-Gotarredona, T., and Linares-Barranco, B. (2010). A five-decade dynamic-range ambient-light-independent calibrated signed-spatial-contrast AER retina with 0.1-ms latency and optional time-to-first-spike mode. IEEE Trans. Circuits Syst. I 57, 2632-2643. doi: 10.1109/TCSI.2010.2046971
Leñero-Bardallo, J. A., Serrano-Gotarredona, T., and Linares-Barranco, B. (2011). A 3.6 μs latency asynchronous frame-free event-driven dynamic-vision-sensor. IEEE J. Solid State Circuits 46, 1443-1455. doi: 10.1109/JSSC.2011.2118490
Lichtsteiner, P., Posch, C., and Delbrück, T. (2008). A 128x128 120dB 15 μs latency asynchronous temporal contrast vision sensor. IEEE J. Solid State Circuits 43, 566-576. doi: 10.1109/JSSC.2007.914337
Longuet-Higgins, H. (1981). A computer algorithm for reconstructing a scene from two projections. Nature 293, 133-135. doi: 10.1038/293133a0
Loop, C., and Zhang, Z. (1999). Computing rectifying homographies for stereo vision. IEEE Conf. Comp. Vis. Pattern Recognit. 1, 125-131. doi: 10.1109/CVPR.1999.786928
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91-110. doi: 10.1023/B:VISI.0000029664.99615.94
Luong, Q. T. (1992). Matrice Fondamentale et Auto-Calibration en Vision Par Ordinateur. Ph.D. Thesis, Université de Paris-Sud, Centre d'Orsay.
Mahowald, M. (1992). VLSI Analogs of Neural Visual Processing: a Synthesis of Form and Function. Ph.D. dissertation, California Institute of Technology, Pasadena, CA.
Mahowald, M., and Delbrück, T. (1989). Cooperative stereo matching using static and dynamic image features, in Analog VLSI Implementation of Neural Systems, eds C. Mead and M. Ismail (Boston, MA: Kluwer Academic Publishers), 213-238. doi: 10.1007/978-1-4613-1639-8_9
Marr, D., and Poggio, T. (1976). Cooperative computation of stereo disparity. Science 194, 283-287. doi: 10.1126/science.968482
Maybank, S. J., and Faugeras, O. (1992). A theory of self-calibration of a moving camera. Int. J. Comp. Vis. 8, 123-152. doi: 10.1007/BF00127171
Meister, M., and Berry II, M. J. (1999). The neural code of the retina. Neuron 22, 435-450. doi: 10.1016/S0896-6273(00)80700-X
Lindenbaum, M., Fischer, M., and Bruckstein, A. M. (1994). On Gabor's contribution to image enhancement. Pattern Recognit. 27, 1-8. doi: 10.1016/0031-3203(94)90013-2
Posch, C., Matolin, D., and Wohlgenannt, R. (2011). A QVGA 143 dB dynamic range frame-free PWM image sensor with lossless pixel-level video compression and time-domain CDS. IEEE J. Solid State Circuits 46, 259-275. doi: 10.1109/JSSC.2010.2085952
Rogister, P., Benosman, R., Ieng, S., Lichsteiner, P., and Delbruck, T. (2012). Asynchronous event-based binocular stereo matching. IEEE Trans. Neural Netw. 23, 347-353. doi: 10.1109/TNNLS.2011.2180025
Serrano-Gotarredona, R., Oster, M., Lichtsteiner, P., Linares-Barranco, A., Paz-Vicente, R., Gómez-Rodríguez, F., et al. (2009). CAVIAR: a 45k-Neuron, 5M-Synapse, 12G-connects/sec AER hardware sensory-processing-learning-actuating system for high speed visual object recognition and tracking. IEEE Trans. Neural Netw. 20, 1417-1438. doi: 10.1109/TNN.2009.2023653
Serrano-Gotarredona, R., Serrano-Gotarredona, T., Acosta-Jimenez, A., and Linares-Barranco, B. (2006). A neuromorphic cortical-layer microchip for spike-based event processing vision systems. IEEE Trans. Circuits Syst. I 53, 2548-2556. doi: 10.1109/TCSI.2006.883843
Serrano-Gotarredona, R., Serrano-Gotarredona, T., Acosta-Jimenez, A., Serrano-Gotarredona, C., Perez-Carrasco, J. A., Linares-Barranco, A., et al. (2008). On real-time AER 2D convolutions hardware for neuromorphic spike-based cortical processing. IEEE Trans. Neural Netw. 19, 1196-1219. doi: 10.1109/TNN.2008.2000163
Serrano-Gotarredona, T., Andreou, A. G., and Linares-Barranco, B. (1999). AER image filtering architecture for vision processing systems. IEEE Trans. Circuits Syst. I 46, 1064-1071. doi: 10.1109/81.788808
Serrano-Gotarredona, T., and Linares-Barranco, B. (2013). A 128x128 1.5% contrast sensitivity 0.9% FPN 3 μs latency 4mW asynchronous frame-free dynamic vision sensor using transimpedance amplifiers. IEEE J. Solid State Circuits 48, 827-838. doi: 10.1109/JSSC.2012.2230553
Serrano-Gotarredona, T., Park, J., Linares-Barranco, A., Jiménez, A., Benosman, R., and Linares-Barranco, B. (2013). Improved contrast sensitivity DVS and its application to event-driven stereo vision. IEEE Int. Symp. Circuits Syst. 2013, 2420-2423. doi: 10.1109/ISCAS.2013.6572367
Silver, R., Boahen, K., Grillner, S., Kopell, N., and Olsen, K. L. (2007). Neurotech for neuroscience: unifying concepts, organizing principles, and emerging tools. J. Neurosci. 27, 11807-11819. doi: 10.1523/JNEUROSCI.3575-07.2007


Sivilotti, M. (1991). Wiring Considerations in Analog VLSI Systems With Application to Field-Programmable Networks. Ph.D. dissertation, California Institute of Technology, Pasadena, CA.
Tsang, E. K. C., and Shi, B. E. (2004). "A neuromorphic multi-chip model of a disparity selective complex cell," in Advances in Neural Information Processing Systems, Vol. 16, eds S. Thrun, L. K. Saul, and B. Schölkopf (Vancouver, BC: MIT Press), 1051–1058.
Venier, P., Mortara, A., Arreguit, X., and Vittoz, E. A. (1997). An integrated cortical layer for orientation enhancement. IEEE J. Solid State Circuits 32, 177–186. doi: 10.1109/4.551909
Zamarreño-Ramos, C., Linares-Barranco, A., Serrano-Gotarredona, T., and Linares-Barranco, B. (2013). Multi-casting mesh AER: a scalable assembly approach for reconfigurable neuromorphic structured AER systems. Application to ConvNets. IEEE Trans. Biomed. Circuits Syst. 7, 82–102. doi: 10.1109/TBCAS.2012.2195725
Zhang, Z. (2000). A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 22, 1330–1334. doi: 10.1109/34.888718

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 25 September 2013; accepted: 23 February 2014; published online: 31 March 2014.
Citation: Camuñas-Mesa LA, Serrano-Gotarredona T, Ieng SH, Benosman RB and Linares-Barranco B (2014) On the use of orientation filters for 3D reconstruction in event-driven stereo vision. Front. Neurosci. 8:48. doi: 10.3389/fnins.2014.00048
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.
Copyright © 2014 Camuñas-Mesa, Serrano-Gotarredona, Ieng, Benosman and Linares-Barranco. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.



ORIGINAL RESEARCH ARTICLE
published: 07 February 2014
doi: 10.3389/fnins.2014.00009

Asynchronous visual event-based time-to-contact


Xavier Clady 1*, Charles Clercq 1,2, Sio-Hoi Ieng 1, Fouzhan Houseini 2, Marco Randazzo 2, Lorenzo Natale 2, Chiara Bartolozzi 2 and Ryad Benosman 1,2

1 Vision Institute, Université Pierre et Marie Curie, UMR S968 Inserm, UPMC, CNRS UMR 7210, CHNO des Quinze-Vingts, Paris, France
2 iCub Facility, Istituto Italiano di Tecnologia, Genoa, Italy

Edited by: Jennifer Hasler, Georgia Institute of Technology, USA
Reviewed by: Ueli Rutishauser, California Institute of Technology, USA; Leslie S. Smith, University of Stirling, UK; Scott M. Koziol, Baylor University, USA
*Correspondence: Xavier Clady, Vision Institute, Université Pierre et Marie Curie, UMR S968 Inserm, UPMC, CNRS UMR 7210, CHNO des Quinze-Vingts, 17 rue Moreau, 75012 Paris, France. e-mail: [email protected]

Reliable and fast sensing of the environment is a fundamental requirement for autonomous mobile robotic platforms. Unfortunately, the frame-based acquisition paradigm at the basis of mainstream artificial perceptive systems is limited by low temporal dynamics and a redundant data flow, leading to high computational costs. Hence, conventional sensing and the associated computation are incompatible with the design of high-speed sensor-based reactive control for mobile applications, which pose strict limits on energy consumption and computational load. This paper introduces a fast obstacle avoidance method based on the output of an asynchronous event-based time-encoded imaging sensor. The proposed method relies on an event-based Time To Contact (TTC) computation built on visual event-based motion flows. The approach is event-based in the sense that every incoming event adds to the computation process, thus allowing fast avoidance responses. The method is validated indoors on a mobile robot, comparing the event-based TTC with a laser range finder TTC, showing that event-based sensing offers new perspectives for mobile robotics sensing.

Keywords: neuromorphic vision, event-based computation, time to contact, robotics, computer vision

1. INTRODUCTION
A fundamental navigation task for autonomous mobile robots is to detect and avoid obstacles in their path. This paper introduces a full methodology for the event-based computation of Time To Contact (TTC) for obstacle avoidance, using an asynchronous event-based sensor.

Sensors such as ultrasonic sensors, laser range finders or infrared sensors are often mounted on-board robotic platforms in order to provide the distance to obstacles. Such active devices are used to measure signals transmitted by the sensor and reflected by the obstacle(s). Their performance is essentially dependent on how the transmitted energy (ultrasonic waves, light, ...) interacts with the environment (Everett, 1995; Ge, 2010).

These sensors have limitations. In the case of ultrasonic sensors, corners and oblique surfaces, or even temperature variations, can produce artifacts in the measurements. Infrared-based sensors (including the recently emerged Time-of-Flight or RGB-D cameras) are sensitive to sunlight and can fail if the obstacle absorbs the signal. Laser range finder readings may also be erroneous because of specular reflections; additionally, potential eye-safety problems limit the use of many laser sensors to environments where humans are not present. In addition, most of these sensors have restrictions in terms of field-of-view and/or spatial resolution, requiring a mechanical scanning system or a network of several sensors. This leads to severe restrictions in terms of temporal responsiveness and computational load.

Vision can potentially overcome many of these restrictions; visual sensors often provide better resolution and wider range at faster rates than active scanning sensors. Their capacity to detect the natural light reflected by the objects or the surrounding areas paves the way to biologically-inspired approaches.

Several navigation strategies using vision have been proposed; the most common consist of extracting depth information from the visual information. Stereo-vision techniques can also produce accurate depth maps if the stability of the calibration parameters and a sufficient inter-camera distance can be ensured. However, these are strong requirements for high-speed and small robots. Another class of algorithms (Lorigo et al., 1997; Ulrich and Nourbakhsh, 2000) is based on color or texture segmentation of the ground plane. Even if this approach works on a single image, it requires the assumption that the robot is operating on a flat and uni-colored/textured surface and that all objects have their bases on the ground.

Another extensively studied strategy is based on the evaluation of the TTC, noted τ. This measure, introduced by Lee (1976), corresponds to the time that would elapse before the robot reaches an obstacle if the current relative motion between the robot and the obstacle itself were to continue without change. As the robot can navigate through the environment following a trajectory decomposed into straight lines (which is a classic and efficient strategy for autonomous robots in most environments), a general definition of the TTC can be expressed as follows:

\tau = \frac{Z}{\frac{dZ}{dt}} \qquad (1)

where Z is the distance between the camera and the obstacle, and dZ/dt corresponds to the relative speed.

The Time-to-Contact can be computed considering only visual information, without extracting relative depth information and speed, as demonstrated by Camus (1995) (see Section 3.2). Its computation has the advantage of working with a single camera,


without camera calibration or binding assumptions about the environment. Several techniques for the measurement of the TTC have been proposed. In Negre et al. (2006) and Alyena et al. (2009), it is approximated by measuring the local scale change of the obstacle, under the assumption that the obstacle is planar and parallel to the image plane. This approach requires either precisely segmenting the obstacle in the image or computing complex features in a multi-scale representation of the image. Most studied methods of TTC rely on the estimation of optical flow. Optical flow conveys all necessary information from the environment (Gibson, 1978), but its estimation on natural scenes is well known to be a difficult problem. Existing techniques are computationally expensive and are mostly used off-line (Negahdaripour and Ganesan, 1992; Horn et al., 2007). Real-time implementations, using gradient-based, feature-matching-based (Tomasi and Shi, 1994) or differential approaches, do not deal with large displacements. Multi-scale processing, as proposed by Weber and Malik (1995), can cope with this limitation, at the cost of computing time and hardware memory to store and process frames at different scales and timings.

Rind and Simmons (1999) proposed a bio-inspired neural network modeling the lobula giant movement detector (LGMD), a visual part of the optic lobe of the locust that responds most strongly to approaching objects. In order to process the frames provided by a conventional camera, existing implementations proposed by Blanchard et al. (2000) and Yue and Rind (2006) required a distributed computing environment (three PCs connected via Ethernet). Another promising approach consists of VLSI architectures implementing functional models of similar neural networks, but it will require huge investments to go beyond the single proof of concept, such as the 1-D architecture of 25 pixels proposed by Indiveri (1998) modeling locust descending contralateral movement detector (DCMD) neurons. The hardware systems constructed in Manchester and Heidelberg, described respectively by Bruderle et al. (2011) and Furber et al. (2012), could be an answer to this issue.

Globally, most of these approaches suffer from the limitations imposed by the frame-based acquisition of visual information in conventional cameras, which output a large and redundant data flow at a relatively low temporal frequency. Most of the calculations are operated on uninformative parts of the images, or are dedicated to compensating for the lack of temporal precision. Existing implementations are often a trade-off between accuracy and efficiency and are restricted to mobile robots moving relatively slowly. For example, Low and Wyeth (2005) and Guzel and Bicker (2010) present experiments on the navigation of a wheeled mobile robotic platform using optical-flow-based TTC computation applied with an embedded conventional camera. Their software runs at approximately 5 Hz and the maximal speed of the mobile robot is limited to 0.2 m/s.

In this perspective, the frame-free acquisition of neuromorphic cameras (Guo et al., 2007; Lichtsteiner et al., 2008; Lenero-Bardallo et al., 2011; Posch et al., 2011) can introduce significant improvements in robotic applications. The operation of such sensors is based on independent pixels that asynchronously collect and send their own data when the processed signal exceeds a tunable threshold. The resulting compressed stream of events includes the spatial location of active pixels and an accurate time stamp at which a given signal change occurs. Events can be processed locally while encoding the additional temporal dynamics of the scene.

This article presents an event-based methodology to measure the TTC from the event stream provided by a neuromorphic vision sensor mounted on a wheeled robotic platform. The TTC is computed and then updated for each incoming event, minimizing the computational load of the robot. The performance of the developed event-based TTC is compared with a laser range finder, showing that event-driven sensing and computation, with their sub-microsecond temporal resolution and inherent redundancy suppression, are a promising solution for vision-based technology for high-speed robots.

In the following, we briefly introduce the neuromorphic vision sensor used (Section 2), describe the event-based approach proposed to compute the TTC (Section 3) and present experimental results validating the accuracy and the robustness of the proposed technique on a mobile robot moving in an indoor environment (Section 4).

2. TIME ENCODED IMAGING
Biomimetic, event-based cameras are a novel type of vision device that, like their biological counterparts, are driven by events happening within the scene, and not by artificially created timing and control signals (i.e., the frame clock of conventional image sensors) that have no relation whatsoever with the source of the visual information. Over the past few years, a variety of these event-based devices, reviewed in Delbruck et al. (2010), have been implemented, including temporal contrast vision sensors that are sensitive to relative light intensity change, gradient-based sensors sensitive to static edges, edge-orientation sensitive devices and optical-flow sensors. Most of these vision sensors encode visual information about the scene in the form of asynchronous address events (AER) (Boahen, 2000), using time rather than voltage, charge or current.

The ATIS (Asynchronous Time-based Image Sensor) used in this work is a time-domain encoding image sensor with QVGA resolution (Posch et al., 2011). It contains an array of fully autonomous pixels that combine an illuminance change detector circuit and a conditional exposure measurement block.

As shown in the functional diagram of the ATIS pixel in Figure 1, the change detector individually and asynchronously initiates the measurement of an exposure/gray scale value only if a brightness change of a certain magnitude has been detected in the field-of-view of the respective pixel. The exposure measurement circuit encodes the absolute instantaneous pixel illuminance into the timing of asynchronous event pulses, more precisely into inter-event intervals.

Since the ATIS is not clocked, the timing of events can be conveyed with a very accurate temporal resolution on the order of microseconds. The time-domain encoding of the intensity information automatically optimizes the exposure time separately for each pixel instead of imposing a fixed integration time for the entire array, resulting in an exceptionally high dynamic range and an improved signal-to-noise ratio. The pixel-individual change detector driven operation yields almost ideal temporal


redundancy suppression, resulting in a sparse encoding of the image data.

Figure 2 shows the general principle of asynchronous imaging in a spatio-temporal representation. Frames are absent from this acquisition process. They can, however, be reconstructed, when needed, at frequencies limited only by the temporal resolution of the pixel circuits (up to hundreds of kiloframes per second) (Figure 2, top). Static objects and background information, if required, can be recorded as a snapshot at the start of an acquisition. Henceforward, moving objects in the visual scene describe a spatio-temporal surface at very high temporal resolution (Figure 2, bottom).

3. EVENT-BASED TTC COMPUTATION
3.1. EVENT-BASED VISUAL MOTION FLOW
The stream of events from the silicon retina can be mathematically defined as follows: let e(p, t) = (p, t)^T be a triplet giving the position p = (x, y)^T and the time t of an event. We can then define locally the function Σ_e that maps to each p the time t:

\Sigma_e : \mathbb{N}^2 \to \mathbb{R}, \quad p \mapsto \Sigma_e(p) = t \qquad (2)

Time being an increasing function, Σ_e is a monotonically increasing surface in the direction of the motion.
resolution (Figure 2 bottom). ing surface in the direction of the motion.

FIGURE 1 | Functional diagram of an ATIS pixel (Posch, 2010). Two types of asynchronous events, encoding change and brightness information, are generated and transmitted individually by each pixel in the imaging array.

FIGURE 2 | Lower part: the spatio-temporal space of imaging events. Static objects and scene background are acquired first; then, dynamic objects trigger pixel-individual, asynchronous gray-level events after each change. Frames are absent from this acquisition process. Samples of generated images from the presented spatio-temporal space are shown in the upper part of the figure.


We then set the first partial derivatives with respect to the parameters as Σ_ex = ∂Σ_e/∂x and Σ_ey = ∂Σ_e/∂y (see Figure 3). We can then write Σ_e as:

\Sigma_e(p + \Delta p) = \Sigma_e(p) + \nabla\Sigma_e^T \Delta p + o(\|\Delta p\|) \qquad (3)

with \nabla\Sigma_e = \left(\frac{\partial \Sigma_e}{\partial x}, \frac{\partial \Sigma_e}{\partial y}\right)^T.

The partial functions of Σ_e are functions of a single variable, whether x or y. Time being a strictly increasing function, Σ_e is a surface with nonzero derivatives at any point. It is then possible to use the inverse function theorem to write, around a location p = (x, y)^T:

\left(\frac{\partial\Sigma_e}{\partial x}(x, y_0), \frac{\partial\Sigma_e}{\partial y}(x_0, y)\right)^T = \left(\frac{d\Sigma_e|_{y=y_0}}{dx}(x), \frac{d\Sigma_e|_{x=x_0}}{dy}(y)\right)^T = \left(\frac{1}{v_{nx}(x, y_0)}, \frac{1}{v_{ny}(x_0, y)}\right)^T \qquad (4)

Σ_e|_{x=x_0} and Σ_e|_{y=y_0} being Σ_e restricted respectively to x = x_0 and y = y_0, and v_n(x, y) = (v_{nx}, v_{ny})^T represents the normal component of the visual motion flow; it is perpendicular to the object boundary (describing the local surface Σ_e).

The gradient of Σ_e, or ∇Σ_e, is then:

\nabla\Sigma_e(p, t) = \left(\frac{1}{v_{nx}(x, y_0)}, \frac{1}{v_{ny}(x_0, y)}\right)^T \qquad (5)

FIGURE 3 | General principle of visual flow computation: the surface of active events Σ_e is derived to provide an estimation of orientation and amplitude of motion.

The vector ∇Σ_e measures the rate and the direction of change of time with respect to space; its components are also the inverse of the components of the velocity vector estimated at p.

The flow definition given by Equation 5 is sensitive to noise since it consists in estimating the partial derivatives of Σ_e at each individual event. One way to make the flow estimation robust against noise is to add a regularization process to the estimation. To achieve this, we assume local velocity constancy. This hypothesis is satisfied in practice for small clusters of events. It is then equivalent to assuming Σ_e to be locally planar, since its partial spatial derivatives are the inverse of the speed, hence constant velocities produce a constant spatial rate of change in Σ_e. Finally, the slope of the fitted plane with respect to the time axis is directly proportional to the motion velocity. The regularization also compensates for absent events in the neighborhood of active events where motion is being computed. The plane fitting provides an approximation of the timing of still non-active spatial locations due to the non-idealities and the asynchronous nature of the sensor. The reader interested in the computation of motion flow can refer to Benosman et al. (2014) for more details. A full characterization of its computational cost is proposed there; it shows that the event-based calculation requires much less computation time than the frame-based one.
By deriving the pinhole models equations, Camus (1995)
The vector e measures the rate and the direction of change demonstrates that, if the coordinates pf = (xf , yf )T of the FOE
of time with respect to the space, its components are also the are known, the following relation is satisfied:
inverse of the components of the velocity vector estimated at p.
The flow definition given by Equation 5 is sensitive to noise Zc y yf x xf
since it consists in estimating the partial derivatives of e at each = = = , where
Z c y x
individual event. One way to make the flow estimation robust
against noise is to add a regularization process to the estima- dZc dx dy
Z c = , x = , y = . (6)
tion. To achieve this, we assume a local velocity constancy. This dt dt dt
hypothesis is satisfied in practice for small clusters of events. It is
then equivalent to assume e being locally planar since its par- With our notation, this is equivalent to:
tial spatial derivatives are the inverse of the speed, hence constant
velocities produce constant spatial rate of change in e . Finally, (p, t)v(p, t) = p pf (7)
the slope of the fitted plane with respect to the time axis is directly
proportional to the motion velocity. The regularization also com- The TTC is then obtained at pixel p according to the relation:
pensates for absent events in the neighborhood of active events
where motion is being computed. The plane fitting provides an vT (p, t)(p pf )
(p, t) = (8)
approximation of the timing of still non active spatial locations ||v(p, t)||2
due the non idealities and the asynchronous nature of the sensor.
The reader interested in the computation of motion flow can refer The TTC as defined is a signed real value because of the scalar
to Benosman et al. (2014) for more details. A full characterization product. Its sign refers to the direction of the motion: when
of its computational cost is proposed; it shows that the event- is positive, the robot is going toward the obstacle and, vicev-
based calculation required much less computation time than the ersa, for negative it is getting away. This equality shows also
frame-based one. that can be determined only if the velocity v at p is known or
can be estimated for any p at anytime t. There is unfortunately
3.2. TIME-TO-CONTACT no general technique for estimating densely the velocity v from
Assuming parts of the environment are static, while the camera is the visual information. However, optical flow techniques allow
moving forward, the motion flow diverges around a point called to compute densely the vector field of velocities normal to the


FIGURE 4 | 3D obstacle velocity V projected into the camera focal plane as v. The dotted letters refer to temporal derivatives of each component.

edges, noted as v_n. The visual flow technique presented in Section 3.1 is the ideal technique to compute τ, not only because of its event-based formulation, but also because it shows that the normal-to-the-edge component of v is sufficient for the determination of τ. From Equation 7, we apply the scalar product of both sides with ∇Σ_e:

\tau(p, t)\, v(p, t)^T \nabla\Sigma_e(p, t) = (p - p_f)^T \nabla\Sigma_e(p, t) \qquad (9)

Because v can be decomposed as the sum of a tangential vector v_t and a normal vector v_n, the left-hand side of Equation 9 simplifies into:

v^T(p, t)\,\nabla\Sigma_e(p, t) = \left(v_t(p, t) + v_n(p, t)\right)^T \nabla\Sigma_e(p, t) = v_n^T(p, t)\,\nabla\Sigma_e(p, t) = 2 \qquad (10)

since v_t^T ∇Σ_e = 0, the tangential component being orthogonal to ∇Σ_e. Therefore τ is given by:

\tau(p, t) = \frac{1}{2}(p - p_f)^T \nabla\Sigma_e(p, t) \qquad (11)

3.3. FOCUS OF EXPANSION
The FOE is the projection of the observer's direction of translation (or heading) on the sensor's image plane. The radial pattern of flow depends only on the observer's heading and is independent of 3D structure, while the magnitude of flow depends on both heading and depth. Thus, in principle, the FOE could be obtained by triangulation of two vectors in a radial flow pattern. However, such a method would be vulnerable to noise. To calculate the FOE, we use the redundancy in the flow pattern to reduce errors.

The principle of the approach is described in Algorithm 1. We consider a probability map of the visual field, where each point represents the likelihood of the FOE being located at the corresponding point in the field. Every flow vector provides an estimation of the location of the FOE in the visual field; indeed, because the visual flow is diverging from the FOE, the FOE belongs to the negative semi-plane defined by the normal motion flow vector. So, for each incoming event, all the corresponding potential locations of the FOE are computed (step 3 in Algorithm 1) and their likelihood is increased (step 4). Finding the location of the probability map with maximum value, the FOE is shifted toward this location (step 5). This principle is illustrated in Figure 5A. The area with the maximum probability is highlighted as the intersection of the negative semi-planes defined by the normal motion flow vectors. Finally, an exponentially decreasing function is applied to the probability map; it allows updating the location of the FOE, giving more importance to the contributions provided by the most recent events and their associated flow.

Figures 5B,C show real results obtained viewing a densely textured pattern (the same as used in Experiment 1, see Figure 7). Figure 5B shows the probability map defined as an accumulative table and the resulting FOE. The corresponding motion flow is given in Figure 5C; the normal motion vectors (with an amplitude superior to a threshold) computed in a time interval Δt = 10 ms are represented as yellow arrows. Globally, the estimated FOE is consistent with the motion flow. However, some small groups of vectors (an example is surrounded by a white dotted ellipse) seem to converge toward, instead of diverging from, the FOE. Such flow events do not occur at the same time as the others; they are most probably generated by a temporary micro-motion (vibration, unexpected roll, pitch or yaw motion). The cumulative process allows filtering such noise motions and keeps the FOE stable.


For an incoming event e(p, t) with a velocity vector v_n, we can define the following algorithm to estimate the FOE:

Algorithm 1 | Computation of the Focus Of Expansion.
Require: M_prob ∈ R^m × R^n and M_time ∈ R^m × R^n (M_prob is the probability map and holds the likelihood for each spatial location, and M_time the last time when its likelihood has been increased).
1: Initiate the matrices M_prob and M_time to 0
2: for every incoming e(p, t) at velocity v_n do
3:   Determine all spatial locations p_i such that (p − p_i)^T · v_n > 0
4:   for all p_i: M_prob(p_i) = M_prob(p_i) + 1 and M_time(p_i) = t_i
5:   ∀ p_i ∈ R^m × R^n, update the probability map M_prob(p_i) = M_prob(p_i) · e^{−(t_i − M_time(p_i))/Δt}
6:   Find p_f = (x_f, y_f)^T, the spatial location of the maximum value of M_prob, corresponding to the FOE location
7: end for
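The following sketch combines Algorithm 1 with the TTC of Equation (11): each event, with its normal flow vector, votes for the FOE in a probability map with exponential forgetting, and the per-event TTC is then read out from the current FOE estimate. The map size and the forgetting constant are assumed values, and the code is an illustration rather than the authors' implementation.

```python
import numpy as np

class EventBasedTTC:
    """Sketch of Algorithm 1 combined with Equation (11)."""

    def __init__(self, width=320, height=240, decay=0.1):
        self.m_prob = np.zeros((height, width))     # M_prob: FOE likelihood map
        self.m_time = np.zeros((height, width))     # M_time: last increase time
        self.decay = decay                          # assumed forgetting constant
        ys, xs = np.mgrid[0:height, 0:width]
        self.grid = np.stack([xs, ys], axis=-1).astype(float)  # candidate p_i

    def update_foe(self, p, vn, t):
        votes = ((p - self.grid) @ vn) > 0          # step 3: (p - p_i)^T . vn > 0
        self.m_prob[votes] += 1.0                   # step 4: increase likelihood
        self.m_time[votes] = t
        self.m_prob *= np.exp(-(t - self.m_time) / self.decay)  # step 5: forgetting
        iy, ix = np.unravel_index(np.argmax(self.m_prob), self.m_prob.shape)
        return np.array([ix, iy], dtype=float)      # step 6: FOE = argmax of M_prob

    def ttc(self, p, vn, t):
        p_f = self.update_foe(p, vn, t)
        grad = np.zeros(2)                          # Equation (5): (1/vnx, 1/vny)
        nz = np.abs(vn) > 1e-9
        grad[nz] = 1.0 / vn[nz]
        return 0.5 * float((p - p_f) @ grad)        # Equation (11)
```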

4. EXPERIMENTAL RESULTS
The method proposed in the previous sections is validated in the experimental setup illustrated in Figure 6. The neuromorphic camera is mounted on a Pioneer 2 robotic platform, equipped with a Hokuyo laser range finder (LRF) providing the actual distance between the platform and the obstacles. In an experimental environment free of specular or transparent objects (as in the first proposed experiment), the TTC based on the LRF can be estimated using Equation 1 and is used as the ground truth measure against which the event-based TTC is benchmarked.

FIGURE 6 | Experimental setup: (A) the Pioneer 2, (B) the asynchronous event-based ATIS camera, (C) the Hokuyo laser range finder (LRF).

FIGURE 5 | Computation of the focus of expansion: (A) the focus of expansion lies under the normal flow; we can then vote for an area of the focal plane shown in (B); the FOE is the max of this area. (C) Motion flow vectors obtained during a time period of Δt = 10 ms and superimposed over a corresponding snapshot (realized using the PWM grayscale events; the viewed pattern is the same as used in Experiment 1, cf. Figure 7). Note that only the vectors with high amplitude are represented in order to enhance the readability of the figure. Most of the motion flow vectors are diverging from the estimated FOE. The white ellipse in the upper left corner shows a group of inconsistent motion flow vectors: they are probably due to a temporary noise micro-motion (vibration, unexpected roll, pitch, or yaw motion).


In the first experiment, the robot is moving forward and backward in the direction of a textured obstacle as shown in Figure 7; the corresponding TTC estimated by both sensors (LRF and ATIS) is shown in Figure 8A. The TTC is expressed in the coordinate system of the obstacle: the vertical axis corresponds to time and the horizontal axis to the size of the obstacle. The extremes (and the white parts of the plots) correspond to the changes of direction of the robot: when its speed tends to 0, the LRF-based TTC tends to infinity and the vision-based TTC cannot be computed because too few events are generated. In order to show comparable results, only the TTC obtained with a robot speed superior to 0.1 m/s is shown; under this value, the robot motion is relatively unstable, the robot tilting during the acceleration periods.

Figure 8B shows the relative error of the event-based TTC with respect to the ground truth calculated with the LRF TTC. The error is large during the phases of positive and negative acceleration of the robot. There are two potential explanations. The estimation of the speed of the robot based on the LRF is relatively inaccurate during the changes of velocity. In addition, brutal changes of velocity could generate fast pitch motions which produce an unstable FOE. Globally, more than 60% of the relative errors are inferior to 20%, showing that the event-based approach is robust and accurate when the motion of the robot is stable.

In the second experiment, the robot moves along a corridor. In these conditions, multiple objects reflect the light from the LRF, which fails to detect obstacles; on the contrary, the event-based algorithm succeeds in estimating the TTC relative to the obstacles. Figure 9 shows the robot's trajectory: during the first stage the robot navigates toward an obstacle (portion A-B of the trajectory). An avoidance maneuver is performed during portion B-C that leads the robot to continue its trajectory and enter the warehouse (portion C-D). The estimated TTC to the closest obstacle is shown as red plots in Figure 9 and compared to the ground truth given by the odometer's data (in
FIGURE 7 | First experiment: (A) setup and location of the coordinate system (X_O, Y_O, Z_O) related to the obstacle; (B) distance between the robot and the obstacle, velocity of the robot and the relative estimated TTC over time, computed based on the odometer of the robot. Only the TTC computed while the velocity of the robot is superior to 0.1 m/s is given, because it tends to infinity when the velocity tends to 0.

FIGURE 8 | Comparison of the results obtained while the robot is moving forward and backward in the direction of an obstacle. Results are expressed related to time and the coordinate system of the obstacle. (A) TTC computed using the LRF (right) and the ATIS (left). (B) Relative errors between both TTC estimations, illustrated using a color map, blue to red for increasing TTC.


FIGURE 9 | Results of the second experiment: the top figure represents the trajectory followed by the robot, on a schematic view of the warehouse; both middle figures represent data collected from the odometer (the trajectory and the speed of the robot); and finally, the bottom figures represent the time-to-contact estimated during the two time intervals during which the TTC is estimated; the red curves correspond to the TTC estimated from the neuromorphic camera's data, compared to an estimation of the TTC (blue curves) using the odometer's data and the knowledge of the obstacles' locations in the map.

blue). It corresponds to the TTC collected in a region of interest of 60 × 60 pixels matching the closest obstacle. The image plane is segmented into four regions of interest (ROI) of 60 × 60 pixels (the 4 squares represented in Figure 10) around the x-coordinate of the FOE. Only the normal flow vectors in the lower ROIs, in which the activity, expressed as the number of events per second, is superior to a threshold (>5000 events/s), are considered, assuming that the closest obstacle is on the ground and is therefore viewed in the bottom part of the visual field.
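As a small illustration of this selection step, the sketch below keeps only the regions of interest whose event rate exceeds the 5000 events/s threshold mentioned above; the dictionary layout and the example counts are assumptions made purely for the example.

```python
def active_rois(event_counts, window_s=1.0, threshold=5000):
    """Return the ROIs whose activity (events per second) exceeds the threshold."""
    return [name for name, count in event_counts.items()
            if count / window_s > threshold]

# Example: events collected during 1 s in the four 60x60 ROIs around the FOE.
counts = {"lower_left": 12000, "lower_right": 3100,
          "upper_left": 800, "upper_right": 400}
print(active_rois(counts))   # -> ['lower_left']
```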


FIGURE 10 | TTC computation: the yellow arrows represent the motion flow vectors obtained during a time period of 1 ms. These flow vectors are superimposed over a corresponding image of the scene (realized using the PWM grayscale events). In order to enhance the readability of the figure, only the 10% of vectors with high strengths and orientations close to π/2 have been drawn. The red square corresponds to the ROI where the measure of TTC is estimated.

The low shift between them can be explained by the drift in the odometer's data (especially after the avoidance maneuver) (Everett, 1995; Ge, 2010); a difference of 0.5 m has been observed between the real position of the robot and the odometer-based estimate of the ending location. This is an expected effect, as odometers always drift in the same measured proportions (Ivanjko et al., 2007). In addition, the estimations are slightly less precise once the robot is in the warehouse, where the poor environment with white walls without texture or objects produces fewer events and the computation degrades. This shows the robustness of the technique even in poorly textured environments.

All programs have been written in C++ under Linux and run in real time. We estimated the average time per event spent to compute the Time-to-Contact: it is approximately 20 μs per event on the computer used in the experiments (Intel Core i7 at 2.40 GHz). When the robot is at its maximum speed, the data stream acquired during 1 s is processed in 0.33 s. The estimation of the visual flow is the most computationally expensive task (>99% of the total computational cost), but could easily be run in parallel to further accelerate it.

The most significant result of this work is that the TTC can be processed at an unprecedented rate and with a low computational cost. The output frequency of our method reaches over 16 kHz, which is largely superior to what can be expected from any conventional camera, limited by its frame-based acquisition and the processing load needed to process the data.

5. CONCLUSIONS AND PERSPECTIVES
The use of vision-based navigation with conventional frame-based cameras is impractical for the limited resources usually available on-board autonomous robots. The corresponding large amount of data to process is not compatible with fast and reactive navigation commands, especially when parts of the processing are allocated to extracting the useful information. Such computational requirements are out of the reach of most small robots. Additionally, the temporal resolution of frame-based cameras trades off with the quantity of data that needs to be processed, posing limits on the robot's speed and computational demand. In this paper, we gave an example of a simple collision avoidance technique based on the estimation of the TTC, combining the use of an event-based vision sensor and a recently developed event-based optical flow. We showed that event-based techniques can solve vision tasks in a more efficient way than traditional approaches, which rely on complex and computationally hungry algorithms.

One remarkable highlight of this work is how well the event-based optical flow presented in Benosman et al. (2014) helped in estimating the TTC. This is because we have ensured the preservation of the high temporal dynamics of the signal from its acquisition to its processing. The precise timing conveyed by the neuromorphic camera allows processing locally around each event for a low computational cost, whilst ensuring a precise computation of the visual motion flow and thus of the TTC. The experiments carried out on a wheeled robotic platform support this statement, as the results are as reliable as the ones obtained with a laser range finder, at a much higher frequency. With event-based vision, the motion behavior of a robot could be controlled with a time delay far below the one that is inherent to frame-based acquisition in conventional cameras.

The method described in this work stands on the constant velocity hypothesis, since Equation 1 is a result derived from that assumption. For this reason, the normal-to-the-edges velocity is sufficient for the TTC estimation. For more general motion, the proposed method should be modified, for example by assuming the velocity to be constant only locally.

This work supports the observation that event-driven (bio-inspired) asynchronous sensing and computing are opening promising perspectives for autonomous robotic applications. Event-based approaches would allow small robots to avoid obstacles in natural environments at speeds that have never been achieved until now. Extending our approach to more complex scenarios than those exposed in this paper, and proposing a complete navigation system able to deal with motion or uncontrolled environments, requires combining the visual information with other information provided by top-down processes and proprioceptive sensing, as for humans or animals.

ACKNOWLEDGMENTS
This work benefitted from the fruitful discussions and collaborations fostered by the CapoCaccia Cognitive Neuromorphic Engineering Workshop and the NSF Telluride Neuromorphic Cognition workshops.

FUNDING
This work has been supported by the EU grant eMorph (ICT-FET-231467). The authors are also grateful to the Lifesense Labex.

REFERENCES
Alyena, G., Negre, A., and Crowley, J. L. (2009). "Time to contact for obstacle avoidance," in European Conference on Mobile Robotics (Dubrovnik).
Benosman, R., Clercq, C., Lagorce, X., Ieng, S.-H., and Bartolozzi, C. (2014). Event-based visual flow. IEEE Trans. Neural Netw. Learn. Syst. 25, 407–417. doi: 10.1109/TNNLS.2013.2273537


Blanchard, M., Rind, F., and Verschure, P. F. (2000). Collision avoidance using a model of the locust LGMD neuron. Robot. Auton. Syst. 30, 17–38. doi: 10.1016/S0921-8890(99)00063-9
Boahen, K. A. (2000). Point-to-point connectivity between neuromorphic chips using address-events. IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process. 47, 416–434. doi: 10.1109/82.842110
Bruderle, D., Petrovici, M. A., Vogginger, B., Ehrlich, M., Pfeil, T., Millner, S., et al. (2011). A comprehensive workflow for general-purpose neural modeling with highly configurable neuromorphic hardware systems. Biol. Cybern. 104, 263–296. doi: 10.1007/s00422-011-0435-9
Camus, T. (1995). Calculating time-to-contact using real-time quantized optical flow. National Institute of Standards and Technology NISTIR 5609.
Delbruck, T., Linares-Barranco, B., Culurciello, E., and Posch, C. (2010). "Activity-driven, event-based vision sensors," in IEEE International Symposium on Circuits and Systems (Paris), 2426–2429. doi: 10.1109/ISCAS.2010.5537149
Everett, H. (1995). Sensors for Mobile Robots: Theory and Applications. Natick, MA: A K Peters/CRC Press.
Furber, S., Lester, D., Plana, L., Garside, J., Painkras, E., Temple, S., et al. (2012). Overview of the SpiNNaker system architecture. IEEE Trans. Comput. 62, 2454–2467. doi: 10.1109/TC.2012.142
Ge, S. (2010). Autonomous Mobile Robots: Sensing, Control, Decision Making and Applications. Automation and Control Engineering. Boca Raton, FL: Taylor and Francis.
Gibson, J. J. (1978). The ecological approach to the visual perception of pictures. Leonardo 11, 227–235. doi: 10.2307/1574154
Guo, X., Qi, X., and Harris, J. (2007). A time-to-first-spike CMOS image sensor. IEEE Sens. J. 7, 1165–1175. doi: 10.1109/JSEN.2007.900937
Guzel, M., and Bicker, R. (2010). "Optical flow based system design for mobile robots," in Robotics Automation and Mechatronics (RAM), 2010 IEEE Conference on (Singapore), 545–550. doi: 10.1109/RAMECH.2010.5513134
Horn, B., Fang, Y., and Masaki, I. (2007). "Time to contact relative to a planar surface," in Intelligent Vehicles Symposium, 2007 IEEE (Istanbul), 68–74. doi: 10.1109/IVS.2007.4290093
Indiveri, G. (1998). Analog VLSI model of locust DCMD neuron response for computation of object approach. Prog. Neural Process. 10, 47–60. doi: 10.1142/9789812816535_0005
Ivanjko, E., Komsic, I., and Petrovic, I. (2007). "Simple off-line odometry calibration of differential drive mobile robots," in Proceedings of 16th Int. Workshop on Robotics in Alpe-Adria-Danube Region-RAAD (Ljubljana).
Lee, D. N. (1976). A theory of visual control of braking based on information about time-to-collision. Perception 5, 437–459. doi: 10.1068/p050437
Lenero-Bardallo, J., Serrano-Gotarredona, T., and Linares-Barranco, B. (2011). A 3.6 μs latency asynchronous frame-free event-driven dynamic-vision-sensor. J. Solid-State Circ. 46, 1443–1455. doi: 10.1109/JSSC.2011.2118490
Lichtsteiner, P., Posch, C., and Delbruck, T. (2008). A 128×128 120 dB 15 μs latency asynchronous temporal contrast vision sensor. J. Solid-State Circ. 43, 566–576. doi: 10.1109/JSSC.2007.914337
Lorigo, L., Brooks, R., and Grimson, W. (1997). "Visually-guided obstacle avoidance in unstructured environments," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Vol. 1 (Grenoble), 373–379. doi: 10.1109/IROS.1997.649086
Low, T., and Wyeth, G. (2005). "Obstacle detection using optical flow," in Proceedings of Australasian Conference on Robotics and Automation (Sydney, NSW). doi: 10.1109/IVS.1992.252254
Negahdaripour, S., and Ganesan, V. (1992). "Simple direct computation of the FOE with confidence measures," in Computer Vision and Pattern Recognition (Champaign, IL), 228–235. doi: 10.1109/CVPR.1992.22327
Negre, A., Braillon, C., Crowley, J. L., and Laugier, C. (2006). "Real-time time-to-collision from variation of intrinsic scale," in International Symposium of Experimental Robotics (Rio de Janeiro), 75–84. doi: 10.1007/978-3-540-77457-0_8
Posch, C. (2010). "High-DR frame-free PWM imaging with asynchronous AER intensity encoding and focal-plane temporal redundancy suppression," in Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on Circuits and Systems (Paris). doi: 10.1109/ISCAS.2010.5537150
Posch, C., Matolin, D., and Wohlgenannt, R. (2011). A QVGA 143 dB dynamic range frame-free PWM image sensor with lossless pixel-level video compression and time-domain CDS. J. Solid-State Circ. 46, 259–275. doi: 10.1109/JSSC.2010.2085952
Rind, F. C., and Simmons, P. J. (1999). Seeing what is coming: building collision-sensitive neurones. Trends Neurosci. 22, 215–220. doi: 10.1016/S0166-2236(98)01332-0
Tomasi, C., and Shi, J. (1994). "Good features to track," in Proceedings CVPR '94, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1994 (Seattle, WA), 593–600. doi: 10.1109/CVPR.1994.323794
Ulrich, I., and Nourbakhsh, I. R. (2000). "Appearance-based obstacle detection with monocular color vision," in Proceedings of the International Conference on AAAI/IAAI (Austin, TX), 866–871.
Weber, J., and Malik, J. (1995). Robust computation of optical flow in a multi-scale differential framework. Int. J. Comput. Vis. 14, 67–81. doi: 10.1007/BF01421489
Yue, S., and Rind, F. C. (2006). Collision detection in complex dynamic scenes using an LGMD-based visual neural network with feature enhancement. IEEE Trans. Neural Netw. 17, 705–716. doi: 10.1109/TNN.2006.873286

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 30 September 2013; accepted: 16 January 2014; published online: 07 February 2014.
Citation: Clady X, Clercq C, Ieng S-H, Houseini F, Randazzo M, Natale L, Bartolozzi C and Benosman R (2014) Asynchronous visual event-based time-to-contact. Front. Neurosci. 8:9. doi: 10.3389/fnins.2014.00009
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.
Copyright © 2014 Clady, Clercq, Ieng, Houseini, Randazzo, Natale, Bartolozzi and Benosman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.



ORIGINAL RESEARCH ARTICLE
published: 08 October 2013
doi: 10.3389/fnins.2013.00178

Real-time classification and sensor fusion with a spiking deep belief network
Peter O'Connor, Daniel Neil, Shih-Chii Liu, Tobi Delbruck and Michael Pfeiffer*
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland

Edited by: André van Schaik, The University of Western Sydney, Australia
Reviewed by: Bernabe Linares-Barranco, Instituto de Microelectrónica de Sevilla, Spain; Eugenio Culurciello, Purdue University, USA
*Correspondence: Michael Pfeiffer, Institute of Neuroinformatics, University of Zurich and ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland. e-mail: [email protected]

Deep Belief Networks (DBNs) have recently shown impressive performance on a broad range of classification problems. Their generative properties allow better understanding of the performance, and provide a simpler solution for sensor fusion tasks. However, because of their inherent need for feedback and parallel update of large numbers of units, DBNs are expensive to implement on serial computers. This paper proposes a method based on the Siegert approximation for Integrate-and-Fire neurons to map an offline-trained DBN onto an efficient event-driven spiking neural network suitable for hardware implementation. The method is demonstrated in simulation and by a real-time implementation of a 3-layer network with 2694 neurons used for visual classification of MNIST handwritten digits with input from a 128 × 128 Dynamic Vision Sensor (DVS) silicon retina, and sensory fusion using additional input from a 64-channel AER-EAR silicon cochlea. The system is implemented through the open-source software in the jAER project and runs in real time on a laptop computer. It is demonstrated that the system can recognize digits in the presence of distractions, noise, scaling, translation and rotation, and that the degradation of recognition performance by using an event-based approach is less than 1%. Recognition is achieved in an average of 5.8 ms after the onset of the presentation of a digit. By cue integration from both silicon retina and cochlea outputs we show that the system can be biased to select the correct digit from otherwise ambiguous input.

Keywords: deep belief networks, spiking neural network, silicon retina, sensory fusion, silicon cochlea, deep learning, generative model

1. INTRODUCTION
Deep Learning architectures, which subsume convolutional networks (LeCun et al., 1998), deep autoencoders (Hinton and Salakhutdinov, 2006), and in particular DBNs (Bengio et al., 2006; Hinton et al., 2006; Hinton and Salakhutdinov, 2006), have excelled among machine learning approaches in pushing the state-of-the-art in virtually all relevant benchmark tasks to new levels. In this article we focus on DBNs, which are constructed as hierarchies of recurrently connected simpler probabilistic graphical models, so-called Restricted Boltzmann Machines (RBMs). Every RBM consists of two layers of neurons, a hidden and a visible layer, which are fully and symmetrically connected between layers, but not connected within layers (see Figure 1). Using unsupervised learning, each RBM is trained to encode in its weight matrix a probability distribution that predicts the activity of the visible layer from the activity of the hidden layer. By stacking such models, and letting each layer predict the activity of the layer below, higher RBMs learn increasingly abstract representations of sensory inputs, which matches well with representations learned by neurons in higher brain regions, e.g., of the visual cortical hierarchy (Gross et al., 1972; Desimone et al., 1984). The success of Deep Learning rests on the unsupervised layer-by-layer pre-training with the Contrastive Divergence (CD) algorithm (Hinton et al., 2006; Hinton and Salakhutdinov, 2006), on which supervised learning and inference can be efficiently performed (Bengio et al., 2006; Erhan et al., 2010). This avoids typical problems of training large neural networks with error backpropagation, where overfitting and premature convergence pose problems (Hochreiter et al., 2001; Bengio et al., 2006). The data required for pre-training does not have to be labeled, and can thus make use of giant databases of images, text, sounds, videos, etc. that are now available as collections from the Internet. An additional attractive feature is that the performance of deep networks typically improves with network size, and there is new hope of achieving brain-like artificial intelligence simply by scaling up the computational resources.

With the steady increase in computing power, DBNs are becoming increasingly important for an increasing number of commercial big data applications. Using gigantic computational resources, industry leaders like Google or Microsoft have started to invest heavily in this technology, which has thus been recently named one of the Breakthrough Technologies of 2013 (MIT Technology Review, 2013), and has led to what has been called the second reNNaissance of neural networks (Ciresan et al., 2010). This is the result of the success stories of Deep Learning approaches for computer vision (Larochelle et al., 2007; Lee et al., 2009; Ciresan et al., 2010; Le et al., 2012), voice recognition (Dahl et al., 2012; Hinton et al., 2012; Mohamed et al., 2012), or machine transcription and translation (Seide et al., 2011; MIT Technology Review, 2013). Despite this potential, the sheer number of neurons and connections in deep neural networks requires massive computing power, time, and energy, and thus makes their use in real-time applications, e.g., on mobile devices or autonomous robots, infeasible. Instead of speculating on Moore's

law to achieve progress through faster and cheaper computing resources in the future, we argue that fast and energy-efficient inference in DBNs is already possible now, and is an ideal use case for neuromorphic circuits (Indiveri et al., 2011), which emulate neural circuits and event-based, asynchronous communication architectures in silicon. This is motivated by the fact that in the brain, having many neurons and connections is not a factor that constrains the processing time, since all units operate in parallel, and only the arrival of spike events triggers processing, so the neural circuits can adapt the processing speed to the rate at which input spikes occur. This scheme would allow the system to remain silent, consuming little power, in potentially long silent periods, and still allow fast recognition when bursts of input activity arrive, a scenario that is realistic for natural organisms. These advantages have been recently realized for event-based convolutional networks using convolution chips (Camuñas-Mesa et al., 2010; Farabet et al., 2012), but a principled way of building DBN models out of spiking neurons, in which both feed-forward and feed-back processing are implemented, has been lacking.

FIGURE 1 | Boltzmann and Restricted Boltzmann Machines. A Boltzmann machine is fully connected within and between layers, whereas in a RBM, the lateral connections in the visible and hidden layers are removed. As a result, the random variables encoded by hidden units are conditionally independent given the states of the visible units, and vice versa.

This paper presents the first proof-of-concept of how to transform a DBN model trained offline into the event-based domain. This allows exploiting the aforementioned advantages in terms of processing efficiency, and provides a novel and computationally powerful model for performing recognition, sampling from the model distribution, and fusion of different sensory modalities. Although our current implementation is in software, and not on neuromorphic VLSI, inference with small DBNs runs in real time on a standard laptop, and thus provides the first necessary step toward the goal of building neuromorphic hardware systems that efficiently implement deep, self-configuring architectures. In particular, the novel framework allows us to apply state-of-the-art computer vision and machine learning techniques directly to data coming from neuromorphic sensors that naturally produce event outputs, like silicon retinas (Lichtsteiner et al., 2008) and cochleas (Liu et al., 2010).

Our main contribution is a novel method for adapting conventional CD training algorithms for DBNs with spiking neurons, using an approximation of the firing rate of a Leaky Integrate-and-Fire (LIF) spiking neuron (Siegert, 1951; Jug et al., 2012). After training with a time-stepped model, the learned parameters are transferred to a functionally equivalent spiking neural network, in which event-driven real-time inference is performed. In this article we explicitly perform learning of the network offline, rather than with spike-based learning rules, but note that there is high potential for future event-driven DBNs that could exploit spike-timing based learning for recognizing dynamical inputs. We evaluate the spiking DBNs by demonstrating that networks constructed in this way are able to robustly and efficiently classify handwritten digits from the MNIST benchmark task (LeCun et al., 1998), given either simulated spike-train inputs encoding static images of digits, or live inputs from neuromorphic vision sensors. In addition we present an event-based DBN architecture that can associate visual and auditory inputs, and combine multiple uncertain cues from different sensory modalities in a near-optimal way. The same architecture that is used for inference of classes can also be used in a generative mode, in which samples from the learned probability distribution are generated through feed-back connections.

The aspect of combining feed-back and feed-forward streams of information is an important deviation from traditional purely feed-forward hierarchical models of information processing in the brain (Van Essen and Maunsell, 1983; Riesenhuber and Poggio, 1999), and DBNs provide a first step toward linking state-of-the-art machine learning techniques and modern models of Bayesian inference and predictive coding in the brain (Rao and Ballard, 1999; Hochstein and Ahissar, 2002; Friston, 2010; Markov and Kennedy, 2013). The importance of recurrent local and feed-back connections in the cortex seems obvious from the anatomy (da Costa and Martin, 2009; Douglas and Martin, 2011; Markov et al., 2012) and in vivo experiments (Lamme et al., 1998; Kosslyn et al., 1999; Bullier, 2001; Murray et al., 2002), but the precise role of feed-back processing is still debated (Lamme et al., 1998; Bullier, 2001; Kersten and Yuille, 2003). One hypothesized role is in multisensory integration, and as generative Bayesian models, DBNs are very well suited to perform such tasks, e.g., by combining visual and auditory cues for improved recognition (Hinton et al., 2006). We will thus discuss the potential impact of DBNs as abstract functional models for cortical computation and learning.

The structure of this article is as follows: the mathematical framework and the algorithms used for training and converting conventional DBNs into spiking neural networks are presented in Section 2. Section 3 shows the application of the framework to simulated spike-train inputs and real visual and auditory inputs from neuromorphic sensors. Implications of this new framework are discussed in Section 4.

2. MATERIALS AND METHODS
2.1. DEEP BELIEF NETWORKS
A DBN (Bengio et al., 2006; Hinton et al., 2006) is a multi-layered probabilistic generative model. The individual layers consist of simpler undirected graphical models, so-called Restricted Boltzmann Machines (RBMs), typically with stochastic binary units. A RBM has a bottom layer of visible units, and a top layer of hidden units, which are fully and bidirectionally connected with symmetric weights. The difference between standard Boltzmann machines and RBMs is that in the restricted model


2.1.1. Training a RBM
During learning, the visible units are clamped to the actual inputs, which are seen as samples from the data distribution. The task for learning is to adapt the parameters θ such that the marginal distribution p(v | θ) = \sum_h p(v, h | θ) becomes maximally similar to the true observed data distribution p*(v), i.e., the log-likelihood of generating the observed data needs to be maximized. Hinton et al. (2006) have shown that this gradient ascent on the log-likelihood w.r.t. the weights w_ij can be efficiently approximated by a Gibbs-sampling procedure, which alternates between stochastically updating the hidden and visible units respectively. For the RBM this leads to the learning rule

\Delta w_{ij} = \eta \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \right),    (5)

where ⟨·⟩_data denotes an average over samples with visible units clamped to actual inputs, ⟨·⟩_model denotes an average over samples when the network is allowed to sample all units freely, and η is the learning rate.

Using a sampling approximation normally requires creating enough samples such that the network can settle into an equilibrium. However, for a RBM the CD algorithm (Hinton et al., 2006) has been developed, which uses only a single sample for the data and model distribution, and performs very well in practice. CD first samples new values for all hidden units in parallel, conditioned on the current input, which gives a complete sample (v_data, h_data) for the data distribution. It then generates a sample for the visible layer, conditioned on the hidden states h_data sampled in the first step, and then samples the hidden layer again, conditioned on this new activity in the visible layer. This generates a sample (v_model, h_model) from the model distribution. The weight update can then be computed as

\Delta w_{ij} = \eta \left( v_{i,data}\, h_{j,data} - v_{i,model}\, h_{j,model} \right).    (6)
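The CD-1 update in Equation (6) can be sketched as follows for a mini-batch of binary input vectors. This is a generic illustration of the algorithm, not the authors' training code; using the hidden-unit probabilities rather than binary samples in the update is a common practical choice and an assumption on our part.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, W, b_v, b_h, lr, rng):
    """One CD-1 step (Equation 6) for a batch of binary visible vectors v_data (n x 784)."""
    # Positive phase: sample hidden states conditioned on the data
    p_h_data = sigmoid(v_data @ W + b_h)
    h_data = (rng.random(p_h_data.shape) < p_h_data).astype(float)
    # Negative phase: one Gibbs step gives the "model" sample
    p_v_model = sigmoid(h_data @ W.T + b_v)
    v_model = (rng.random(p_v_model.shape) < p_v_model).astype(float)
    p_h_model = sigmoid(v_model @ W + b_h)
    # Weight and bias updates, averaged over the batch
    n = v_data.shape[0]
    W += lr * (v_data.T @ p_h_data - v_model.T @ p_h_model) / n
    b_v += lr * (v_data - v_model).mean(axis=0)
    b_h += lr * (p_h_data - p_h_model).mean(axis=0)
    return W, b_v, b_h
```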


2.1.2. Persistent CD and transient weights
Since the form of sampling induced by CD strongly biases the samples from the model distribution toward the most recently seen data, one can alternatively use so-called Persistent Contrastive Divergence (Tieleman, 2008). In this approach, the model distribution is initialized arbitrarily, and at every iteration of the training process further samples are created by sampling conditioned on the most recently sampled hidden states, which are maintained between data points.

There is a delicate balance between sampling and learning in Persistent CD: Although fast learning is generally desirable, too fast learning can result in too fast changes of the encoded joint probability distribution, which can cause the equilibrium distribution to change too fast for the Markov chain of model states to ever settle in. Nevertheless, high learning rates have turned out to be beneficial in practice, since they increase the mixing rates of the persistent Markov chains (Tieleman and Hinton, 2009). Following the suggestions in Tieleman and Hinton (2009) we used so-called fast weights, which are added to the regular weights of the network, and decay exponentially with each training step. When sampling from the model distribution, the fast weights are updated with the rule:

\Delta w_{ij}^{fast} = -\eta \, \langle v_i h_j \rangle_{model}.    (7)

We will later show that such transient weight changes can be interpreted as short-term plasticity in a spiking neural network implementation.

2.1.3. Constructing DBNs by stacking RBMs
As discussed previously, DBNs can be constructed by stacking RBMs and interpreting the hidden layer of the lower RBM as the visible layer of the next layer. It has been shown that adding hidden layers and applying the previously discussed unsupervised learning methods for RBMs is guaranteed to increase the lower bound on the log-likelihood of the training data (Hinton et al., 2006). Higher layers will tend to encode more abstract features, which are typically very informative for classification tasks. The top layer of the DBN can then be trained with supervised learning methods, and the whole multi-layer network can be optimized for the task through error backpropagation (Hinton and Salakhutdinov, 2006; Hinton et al., 2006).

DBNs can also be used for associating different sets of inputs, e.g., from different sensory modalities. In this case one can build pre-processing hierarchies for both inputs independently, and then treat the top layers of these hierarchies as a common visible layer for a new association layer on top of them (Hinton et al., 2006). DBNs are therefore not necessarily single hierarchies, but can also exhibit tree-like architectures.

2.2. DISCRETE-TIME AND EVENT-DRIVEN NEURON MODELS
Traditional RBMs are, like most machine-learning models, simulated in time-stepped mode, where every neuron in a layer gets updated at every time step, and the size of this time step Δt is fixed throughout the simulation. While training is typically easier to achieve with continuous and time-stepped neuron models, the event-driven model has the potential to run faster and more precisely. This is because the states of LIF neurons in the event-based network are only updated upon the arrival of input spikes, and only at these times do the neurons decide whether to fire or not. Temporal precision is limited only by the numerical representation of time in the system (as opposed to the duration of the time-step parameter). A drawback is that not all neuron models, e.g., smooth conductance-based models, can be easily converted into event-driven models.

In the standard formulation (Hinton et al., 2006), units within RBMs are binary, and states are sampled according to the sigmoidal activation probabilities from Equations (3) and (4). We call such neuron models sigmoid-binary units. In Nair and Hinton (2010) it was shown that an equivalent threshold-linear model can be formulated, in which zero-mean Gaussian noise N(0, σ_n²) with variance σ_n² is added to the activation functions:

h_j = \max\Big(0, \sum_i w_{ij} v_i + b_j^{(h)} + N(0, \sigma_n^2)\Big),    (8)

and similarly for the sampling of visible units.

A threshold-linear function can also be used to approximate the expected firing rates of simplified spiking neurons under constant current stimulation, such as the LIF neuron (Gerstner and Kistler, 2002), which is one of the simplest, yet biologically relatively plausible models for spiking neurons. In this model each incoming event adds to the membrane potential V_m according to the strength w_ij of the synapse along which the event occurred. Incoming spikes within an absolute refractory period t_ref after an output spike are ignored. Spikes are generated deterministically whenever the membrane potential crosses the firing threshold V_th, otherwise the membrane potential decays exponentially with time constant τ. Simple versions of LIF neurons can be simulated in an event-based way, since membrane potentials only need to be updated upon the arrival of input spikes, and spikes can only be created at the times of such input events. For a LIF neuron representing h_j, which receives a constant input current s_j = \sum_i w_{ij} v_i corresponding to the weighted sum of inputs from connected visible units, the expected firing rate ν_j(s_j) is:

\nu_j(s_j) = \begin{cases} \left( t_{ref} - \tau \log\left(1 - \frac{V_{th}}{s_j}\right) \right)^{-1} & \text{if } s_j \ge V_{th} \\ 0 & \text{otherwise} \end{cases}    (9)

The above equation holds when the neuron is injected with a constant input, but under realistic conditions the neuron receives a continuous stream of input spike trains, each arriving to first approximation as samples from a Poisson process with some underlying firing rate. For this case, a more accurate prediction of the average firing rate can be obtained using Siegert neurons (Siegert, 1951; Jug et al., 2012). Siegert neurons have transfer functions that are mathematically equivalent to the input-rate output-rate transfer functions of LIF neurons with Poisson-process inputs. In order to compute the Siegert transformation for a neuron receiving excitatory and inhibitory inputs with rates (λ_e, λ_i) and weights (w_e, w_i) respectively, we first have to compute the auxiliary variables

\mu_Q = \tau (w_e \lambda_e + w_i \lambda_i), \qquad \sigma_Q^2 = \frac{\tau}{2} (w_e^2 \lambda_e + w_i^2 \lambda_i),
\Upsilon = V_{rest} + \mu_Q, \qquad \Gamma = \sigma_Q,
k = \sqrt{\tau_{syn} / \tau}, \qquad \gamma = |\zeta(1/2)|,

where τ_syn is the synaptic time constant (for our purposes considered to be zero), and ζ is the Riemann zeta function. Then the average firing rate out of the neuron with resting potential V_rest and reset potential V_reset can be computed as (Jug et al., 2012):

\nu_{out} = \left[ t_{ref} + \frac{\tau}{\Gamma} \sqrt{\frac{\pi}{2}} \int_{V_{reset} + k\gamma\Gamma}^{V_{th} + k\gamma\Gamma} \exp\left(\frac{(u - \Upsilon)^2}{2\Gamma^2}\right) \left(1 + \mathrm{erf}\left(\frac{u - \Upsilon}{\Gamma\sqrt{2}}\right)\right) du \right]^{-1}.    (10)

A RBM trained using Siegert units can thus be easily converted into an equivalent network of spiking LIF neurons: By normalizing the firing rate in Equation (10) relative to the maximum firing rate 1/t_ref, ν_out can be converted into activation probabilities as required to sample RBM units in Equations (3, 4) during standard CD learning with continuous units. After learning, the parameters and weights are retained, but instead of sampling every time step, the units generate Poisson spike trains with rates computed by the Siegert formula Equation (10).
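The Siegert transfer function in Equation (10) can be evaluated by straightforward numerical integration. The sketch below uses SciPy and is ours; V_th, V_reset, V_rest, and t_ref follow values quoted in the text, while τ and the example input rates and weights are illustrative assumptions (and the code assumes a non-zero input, i.e., σ_Q > 0).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

def siegert_rate(rate_e, rate_i, w_e, w_i,
                 tau=0.02, t_ref=0.002, v_rest=0.0, v_reset=0.0, v_th=0.005,
                 tau_syn=0.0):
    """Mean firing rate of a LIF neuron with Poisson inputs (Equation 10)."""
    gamma = 1.4603545088        # |zeta(1/2)|
    mu_q = tau * (w_e * rate_e + w_i * rate_i)
    sigma_q = np.sqrt(tau / 2.0 * (w_e**2 * rate_e + w_i**2 * rate_i))
    ups = v_rest + mu_q          # Upsilon
    gam = sigma_q                # Gamma; must be > 0 for the integral below
    k = np.sqrt(tau_syn / tau)

    def integrand(u):
        z = (u - ups) / gam
        return np.exp(z**2 / 2.0) * (1.0 + erf(z / np.sqrt(2.0)))

    integral, _ = quad(integrand, v_reset + k * gamma * gam, v_th + k * gamma * gam)
    mean_isi = t_ref + (tau / gam) * np.sqrt(np.pi / 2.0) * integral
    return 1.0 / mean_isi

# Normalizing by the maximum rate 1/t_ref gives the activation probability
# used in place of the sigmoid during CD training with Siegert units.
p_active = siegert_rate(rate_e=200.0, rate_i=0.0, w_e=0.001, w_i=0.0) * 0.002
```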


2.3. TRAINING THE NETWORK
2.3.1. Task
The network was trained on a visual classification task on the MNIST benchmark dataset for machine learning (LeCun et al., 1998). This set consists of a collection of 28 × 28 gray-scale images of handwritten digits, of which 60,000 form a training set, and 10,000 an independent test set. In order to make the network more robust, we modified the training set by adding small random translations (±15%), rotations (±3°) and scalings (±10%). The modified training set contains 120,000 images.

2.3.2. Network Architecture
For the visual classification task we trained a DBN with one input layer of 784 visual input units (corresponding to the pixels of 28 × 28 input images), a 500-unit Visual Abstraction Layer, a 500-unit Association Layer, and a 10-unit Label Layer, with units corresponding to the 10 digit classes. The architecture of the network is shown in Figure 2. Since our goal in this article is to demonstrate a proof-of-concept for spiking DBNs, the 784-500-500-10 network we used is substantially smaller than the 784-500-500-2000-10 network used previously for the MNIST task (Hinton et al., 2006), or the state-of-the-art network in Ciresan et al. (2010).

FIGURE 2 | Architecture of the DBN for handwritten digit recognition. The connections between layers represent the weights of a RBM.

2.3.3. Training
Each RBM in Figure 2 was first trained in a time-stepped mode with Siegert neurons as individual units, for which we fixed the parameters for resting and reset potential, membrane time constants, and refractory period. Since the output rates of Siegert neurons are not constrained to the interval [0, 1] like in sigmoid-binary units, the outputs were normalized, such that the maximum possible firing rate (given by 1/t_ref) had a value of 1. As training algorithm for RBMs we applied Persistent Contrastive Divergence learning (Hinton et al., 2006) and the fast-weights heuristics described in Section 2.1.2. We also applied a modification to the training process proposed by Goh et al. (2010) to encourage sparse and selective receptive fields in the hidden layer.

Learning proceeded in a bottom-up fashion, starting by training the weights between the Visual Input and the Visual Abstraction Layers. Next, the weights of the Associative Layer were trained, using input from the previously trained Visual Abstraction Layer and the supervised information in the Label Layer as the joint visible layer of the RBM. For each layer we trained for 50 iterations over the complete training set.

2.4. SIMULATION OF AN EVENT-DRIVEN DBN
We created simulators for arbitrary event-driven DBNs in Matlab and Java. The simulation can either be run in Recognition mode, where input is applied at the bottom layer, and the label has to be inferred through bottom-up processing, or in Generation mode, where the activity of the label layer is fixed, and the network samples activity in the Visual Input Layer through top-down connections, according to the learned generative model. Bottom-up and top-down processing can also be activated simultaneously.

In Recognition mode, the DBN is shown a number of test images, which are transformed into spike trains that activate the Visual Input Layer. A Poisson spike train is created for each pixel with a rate proportional to the pixel intensity, and all firing rates are scaled such that the total input rate summed over all 28 × 28 pixels is constant (between 300 and 3000 spikes per second). The goal is to compute the correct classification in the Label Layer. Classification can be done in one of two ways: first, we can turn on only bottom-up connections from the Visual Input Layer toward the Label Layer, and observe which of the neurons in the Label Layer spikes the most within a fixed time interval. The second variant is to use only bottom-up connections between Visual Input and Visual Abstraction Layer, but activate all recurrent connections in the other RBMs. Information about previous inputs is stored both within the membrane potentials and the recurrent spiking activity within the network. Recognition is thus achieved through a modulation of the persistent network activity by input spike trains. In the absence of input, the network will continue to be active and drift randomly through the space of possible states according to the encoded generative model.

This principle is exploited in the Generation mode, where units within the Label Layer are stimulated, and activation propagates recurrently through the top-level RBM, and top-down to the Visual Input Layer. Thus, analyzing these samples from the generative model provides a way to visualize what the network has learned so far. If the DBN is activated in this way, it might settle into a particular state, but could become stuck there, if this state corresponds to a local minimum of the energy landscape according to (1). This can be avoided by using a short-term depressing STDP kernel in Generation mode, which temporarily reduces the weights of synapses where pre- and post-synaptic neurons are active within the same short time window (see Figure 3). These short-term modifications vanish over time, and the weights return to their original values. This modification is inspired by the idea of using auxiliary fast weights for learning (Tieleman and Hinton, 2009), which transiently raise the energy of any state that the network is currently in, thereby slightly pushing it out of that state. The effect is that the network, instead of settling into an energy well and remaining there, constantly explores the whole space of low-energy states. This is a useful feature for search and associative memory tasks, where the network represents a cost function through the encoded energy landscape, and the task is to converge to a maximally likely state starting from an arbitrary initial state, e.g., an incomplete or ambiguous input. We demonstrate this in Section 3.4 in the context of multi-sensory integration.
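The Recognition-mode input encoding described above (per-pixel Poisson rates, rescaled to a fixed total rate over the image) can be sketched as follows. The function name, the default total rate, and the 200 ms presentation window are our own illustrative choices, not taken from the original Matlab/Java simulators; the sketch also assumes the image has at least one non-zero pixel.

```python
import numpy as np

def image_to_poisson_events(image, total_rate=1000.0, duration=0.2, rng=None):
    """Convert a 28x28 gray-scale image into Poisson input events.

    Each pixel fires at a rate proportional to its intensity; rates are rescaled
    so that the summed rate over all pixels equals `total_rate` (Hz).
    Returns (times, pixel_indices) sorted by time.
    """
    rng = rng or np.random.default_rng()
    intensities = image.astype(float).ravel()
    rates = total_rate * intensities / intensities.sum()
    times, idx = [], []
    for i, r in enumerate(rates):
        if r <= 0:
            continue
        n = rng.poisson(r * duration)                  # number of spikes for this pixel
        times.append(rng.uniform(0.0, duration, n))    # spike times within the window
        idx.append(np.full(n, i))
    times = np.concatenate(times) if times else np.array([])
    idx = np.concatenate(idx) if idx else np.array([], dtype=int)
    order = np.argsort(times)
    return times[order], idx[order]
```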


FIGURE 3 | Short-term plasticity kernel for Generation mode. The fast-weight STDP kernel temporarily depresses all synapses in which the pre- and post-synaptic neurons were active shortly after each other, depending on the spike timing difference t_pre − t_post. As a result, the network is constantly being pushed out of its present state.

2.5. REAL-TIME IMPLEMENTATION
2.5.1. Neuromorphic visual input
We developed a real-time variant of the event-driven DBN which receives inputs from neuromorphic sensors. Visual input was obtained from the DVS (Lichtsteiner et al., 2008), an event-generating image sensor consisting of 128 × 128 pixels, which asynchronously outputs streams of address events in response to local relative light-intensity changes. The events are tagged with the address of the creating pixel, a time-stamp, and an ON or OFF polarity tag, which indicates whether the event was created in response to an increase or decrease of light intensity over that pixel. Events are transmitted via a USB port to a computer, and processed in the open-source jAER software framework written in Java (Delbruck, 2013). The networks were first trained in Matlab, and then transferred into the jAER software, where they could run in real-time in response to event stream inputs. We did not use the polarity information for our purposes, and down-sampled the 128 × 128 pixels to a resolution of 28 × 28, which matched the resolution of the images in the MNIST training set. These events were fed into the Visual Input Layer (see Figure 2) while the DVS was moved by hand across several hand-drawn images.

2.5.2. Multi-sensory fusion
We also created a task in which visual stimuli from a silicon retina and auditory stimuli from a silicon cochlea (see Section 2.5.3) were associated with each other in real-time. During training the presentation of a pure tone was always paired with the presentation of an image of a handwritten digit. Table 1 shows the tones and frequencies that were used, and the visual-auditory pairing scheme. The network thus had to learn to associate the two sensory domains, e.g., by resolving ambiguity in one sensory stream through information from the other stream.

The DBN architecture for sensory fusion is described in detail in Section 3.4 and shown in Figure 8.

Table 1 | Paired tones and digits in multi-sensory fusion task.

Tone        A4      B4      C5      D5      E5      F5      G#5     A5      B5      C6
Freq. (Hz)  440.0   493.9   523.3   587.3   659.3   698.5   830.6   880.0   987.8   1046.5
Digit       0       1       2       3       4       5       6       7       8       9

During training, pure tones with the given frequencies (upper rows) were paired with an associated digit (bottom row).

2.5.3. Neuromorphic Auditory Input
Auditory input was received from the AER-EAR2 (Liu et al., 2010) neuromorphic auditory sensor, which was built to mimic the biological cochlea. The device transforms input sounds into streams of spikes in 64 channels responsive to different frequency ranges. We found that since spikes of the silicon cochlea tend to be phase-locked to the sound waveform to which they are responding, the distribution of Inter-spike Intervals (ISIs) was a more precise indicator of the frequency of pure input tones than the distributions of channels from which the spikes originated. We preprocessed the auditory spikes with an event-based ISI histogramming method wherein 100 ISI bins were distributed logarithmically between 0.833 and 2.85 ms (350–1200 Hz), and for each bin an input LIF unit was assigned which was stimulated every time an ISI occurred on any channel that was within the unit's designated frequency range. The output events of these units were then routed to the Auditory Input Layer (see Section 3.4 and Figure 8).

As stimuli we chose the pure tones from Table 1 from the A-minor harmonic scale, ranging from A4 (440 Hz) to C6 (1046.5 Hz), which were played for 1 s each into the silicon cochlea. We recorded the spike response of neurons in the Auditory Input Layer, which fired whenever enough input events from AER-EAR2 in their ISI range were received. For training in the time-stepped domain we constructed data vectors for auditory data by computing the average firing rates of Auditory Input Layer neurons over time bins of 100 ms, evaluated every 30 ms.
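The event-based ISI histogramming described above can be sketched as follows: 100 logarithmically spaced ISI bins between 0.833 and 2.85 ms, with one input unit per bin stimulated whenever a matching ISI occurs on any cochlea channel. The function name and data layout are illustrative; the original preprocessing ran inside jAER.

```python
import numpy as np

# 100 ISI bins spaced logarithmically between 0.833 ms and 2.85 ms (roughly 350-1200 Hz)
N_BINS = 100
bin_edges = np.logspace(np.log10(0.833e-3), np.log10(2.85e-3), N_BINS + 1)

def cochlea_events_to_isi_units(timestamps, channels):
    """Map cochlea spikes to ISI-bin unit indices (event-based preprocessing sketch).

    For every spike we compute the inter-spike interval on its own channel and
    emit one event to the input unit whose ISI bin contains that interval.
    Returns a list of (time, unit_index) pairs.
    """
    last_spike = {}          # channel -> time of previous spike on that channel
    out_events = []
    for t, ch in zip(timestamps, channels):
        if ch in last_spike:
            isi = t - last_spike[ch]
            unit = np.searchsorted(bin_edges, isi) - 1
            if 0 <= unit < N_BINS:            # ignore ISIs outside the binned range
                out_events.append((t, unit))
        last_spike[ch] = t
    return out_events
```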


3. RESULTS
This section presents the classification performance, shows the generative mode of operation, and presents sensor fusion experiments. For the results in Sections 3.1 and 3.2 we use simulated spike-train input (see Section 2.4). Inputs from neuromorphic sensors (Section 2.5) are directly used in the results of Sections 3.3 and 3.4.

3.1. CLASSIFICATION PERFORMANCE
Three variants of DBNs were trained, using the architecture shown in Figure 2 for the MNIST visual classification task: the first two variants are time-stepped models using sigmoid-binary or Siegert neurons respectively (see Section 2.2), the third is an event-driven DBN using LIF neurons that were converted from Siegert neurons used during training. The networks were all trained in time-stepped mode for 50 iterations over the modified 120,000-example MNIST dataset using a variant of Contrastive Divergence learning (see Section 2.3). Figure 4 shows the features learned by a subset of the neurons in the RBM for the Visual Abstraction Layer. One can see that this first layer has learned through unsupervised learning to extract useful features for the discrimination of handwritten digits, in this case parts of digits.

FIGURE 4 | Analysis of weights learned in the DBN. Visualization of the weights learned by a subset of neurons in the Visual Abstraction Layer for 28 × 28 images in the MNIST task. Each image shows the vector of weights feeding into one neuron.

The classification performance shown in Table 2 was evaluated on images from the MNIST test set, using simulated Poisson spike trains with a total rate of 300 spikes per second for the whole image as input for event-based models. The size of our DBN is substantially smaller than in current state-of-the-art deep network approaches for MNIST, e.g., Ciresan et al. (2010), but Table 2 shows that the performance is in a very good range (above 94%). More importantly for this proof-of-concept study, the performance loss when switching to spiking neuron models is small (on the order of 1%), and can possibly be further improved when going to larger network sizes.

Table 2 | Classification performance on the MNIST test set for two time-stepped and one event-based LIF neuron model.

Neuron model     Domain       % correct
Sigmoid-Binary   time-step    97.48
Siegert          time-step    95.2
LIF              event-based  94.09

Inputs for the event-based model were simulated Poisson spike trains (see Section 2.4).

3.2. GENERATION MODE
In Generation mode the network does not receive external input at the bottom layers. Instead one of the top layers (in our case the Label Layer in Figure 2) is stimulated, and activity spreads in the top-down direction through the network. This provides a way to visualize what has been learned in the probabilistic generative model encoded in the bi-directional weights.

Since the network is event-driven, and neurons fire only upon the arrival of input spikes, an initial stimulus in at least one of the layers is needed to push the network from a silent state into one of self-sustaining activity, provided that the neuron parameters and recurrent connectivity allow this. We performed an exhaustive empirical parameter search over firing thresholds V_th and membrane time constants τ in a fully trained network of LIF neurons and measured the mean firing rate within the network after 1 s of 100 Hz stimulation of one Label Layer unit, and 5 s without external stimulation. This allowed us to identify parameter regimes that allow self-sustained activity of about 20 Hz average activity in Generation mode (τ = 800 ms, V_reset = 0, V_th = 0.005).
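Whether the network sustains activity in Generation mode depends on the LIF parameters above. A minimal event-driven LIF update, following the description in Section 2.2, is sketched below; the default τ, V_reset, and V_th follow the Generation-mode values reported in the text, while t_ref is an illustrative assumption.

```python
import math

class EventDrivenLIF:
    """Minimal event-driven LIF unit (sketch, not the original Java implementation).

    The membrane potential is only updated when an input event arrives: it first
    decays exponentially for the time elapsed since the last update, then the
    synaptic weight is added. Events arriving during the absolute refractory
    period are ignored, and threshold crossings produce deterministic spikes.
    """

    def __init__(self, tau=0.8, v_th=0.005, v_reset=0.0, t_ref=0.002):
        self.tau, self.v_th, self.v_reset, self.t_ref = tau, v_th, v_reset, t_ref
        self.v = 0.0                      # membrane potential
        self.t_last = 0.0                 # time of last membrane update
        self.t_spike = -float("inf")      # time of last output spike

    def receive(self, t, weight):
        """Process one input event of strength `weight` at time t.
        Returns True if the unit emits an output spike."""
        if t - self.t_spike < self.t_ref:
            return False                  # inside refractory period: ignore the event
        self.v *= math.exp(-(t - self.t_last) / self.tau)   # passive decay since last event
        self.v += weight                  # integrate the incoming event
        self.t_last = t
        if self.v >= self.v_th:           # deterministic threshold crossing
            self.v = self.v_reset
            self.t_spike = t
            return True
        return False
```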


To visualize the activity of the DBN in Generation mode we modified the architecture in Figure 2 that was used for training on the MNIST dataset. In the new architecture shown in Figure 5 the lowest layer is split up after training into two Visual Input Layers, one projecting only bottom-up from inputs to the Visual Abstraction Layer, and another copy that is purely driven by top-down connections. The weight matrices for bottom-up and top-down connections are identical. Thus, the top layers of the network form the recurrent model that encodes the data distribution, whereas the bottom part either projects inputs through bottom-up connections in Recognition mode, or visualizes the activity of the top layers through top-down connections in Generation mode. If both bottom-up and top-down connections are activated at the same time, the top-down Visual Input Layer visualizes a processed image of what the network believes it is seeing in the bottom-up Visual Input Layer. This process performs probabilistic inference by which evidence from the current input is combined with the prior distribution over likely MNIST images encoded in the DBN weights, and a posterior estimate of the most likely input is generated.

FIGURE 5 | DBN architecture for recognition and generation. The Visual Input Layer was split into a bottom-up and a top-down part, used for projecting inputs in Recognition mode, or visualizing top-down activity in Generation mode.

Figure 6A illustrates the generation of samples from the encoded probabilistic model after activating a unit in the Label Layer. This induces spiking activity in the intermediate Associative and Visual Abstraction Layers, and ultimately stimulates units in the top-down Visual Input Layer, which can be visualized. Figure 6A shows the response of the network when the label unit corresponding to the class '4' is stimulated. The snapshot shows the induced activity in the lower layers, and one can clearly see that the response in the Visual Input Layer closely resembles the handwritten digits in the MNIST set that were used for training. By using short-term depressing synapses as described in Section 2.1.2 and in Figure 3, the network does not just sample one single example of a '4', but iterates through different variations that are compatible with the variance over inputs in the learned generative model. This can be best seen in Video 1 of the supplementary material.

Figure 6B shows the spiking activity in the different layers of the network in Generation mode, both during a forced stimulation, and in a free self-sustained mode. The network is initially stimulated for 2 s by forcing firing of neurons in the Label Layer corresponding to digit classes 1, 2, and 3 (shaded region). One can see that through the recurrent connectivity activity spreads throughout the layers of the network. After 2 s the input to the Label Layer is turned off, and the network is allowed to freely generate samples from the encoded probability distribution. We can see that in the Label Layer the network jumps between different digits, whereas in the other layers smoother transitions are found. Switches between visually similar digits (e.g., '4' and '9') occurred more often on average than between very different digits (e.g., '0' and '1').

FIGURE 6 | Generation mode of the event-driven DBN. (A) Screen capture of the network while generating samples of input activations corresponding to class '4'. The neuron corresponding to label 4 was stimulated in the Label Layer (left), and activity propagated through the whole network. The snapshot shows a single example of activity in the Visual Input Layer (right) that is sampled from the generative model encoded in the weights of the DBN. Through short-term depressing synapses (see Figure 3) the network starts to drift through the space of all potential compatible input activations. (B) Raster plot of the DBN in Generation mode. The Label Layer (bottom) is initially stimulated for 2 s (shaded region) to fire in sequence for digits 1, 2, and 3. Afterwards, the network freely samples from the encoded generative model. Although activity in the Label Layer jumps between digits, activity in the Visual Input Layer transitions smoothly.

3.3. REAL-TIME VISUAL RECOGNITION
For this task the event-driven DBN was connected to a neuromorphic vision sensor, the 128 × 128 pixel DVS (Lichtsteiner et al., 2008). Events indicating local light intensity changes are used as inputs in the bottom-up Visual Input Layer. The whole system works in real-time, i.e., while the DVS is recording visual input, the DBN simultaneously computes the most likely interpretation of the input. By splitting up the connections between Visual Input and Visual Abstraction Layer into a bottom-up and a top-down pathway as in Figure 5, we can simultaneously classify the input in real-time, and also visualize in the top-down Visual Input Layer the interpretation of the input after recurrent processing in the DBN.


The real-time system runs as a filter in the jAER software package (Delbruck, 2013) on a standard laptop, after training the weights of the DBN offline in Matlab on the MNIST database. Figure 7 shows snapshots of the activity within the different layers of the network during operation on various stimuli recorded in real-time with the DVS. In Figure 7A the DVS was moved over a hand-drawing of the digit 5, which was not included in the training set. The left panel shows the input into the Visual Input Layer. The digit was correctly classified as a '5' in the Label Layer. On the right we can see the reconstruction of the image, which closely resembles the actual input. In Figure 7B an ambiguous input was presented, which can either be interpreted as a '3' or a '5'. The network iterated between both interpretations; in this snapshot the reconstruction on the right shows that the network currently interprets the input as a '3', adding the missing parts of the input to match the actual shape of a digit. In Figure 7C the network is shown an input from an unknown input class, namely the letter A. Since the generative model learned in the network knows only digits, it classifies the input as the most similar digit, in this case '9', and reconstructs the input as a mixture between the actual DVS input and the entrained model of the digit. In Figure 7D a digit 4 with a distracting stimulus on top was shown. It was correctly classified and reconstructed in the top-down Visual Input Layer without the distracting stimulus.

FIGURE 7 | Screen captures of the real-time spiking DBN in operation during visual handwritten digit recognition. Each row displays a snapshot of the activity in the different layers of the network (see Figure 5) for a different visual input recorded with the DVS (left column). Neurons in the Label Layer (column 5) are arranged such that the first column represents classes 0–4 (top to bottom), and the second column classes 5–9. The rightmost column shows the top-down reconstruction of the Visual Input Layer. (A) The network recognizes the digit 5. (B) For an ambiguous input, the network alternates between the two possible interpretations 3 and 5. The top-down reconstruction shows the current interpretation. (C) For an unfamiliar input (letter A), the network classifies it as the closest resembling digit class 9, and reconstructs a mixture between the actual input and the generative model for class 9. (D) For an input containing a distractor, the network still classifies it as the most likely input, and reconstructs an image without the distractor.

In general, the network reliably recognized all tested classes of handwritten digits in real-time, even in the presence of strong distractors, with slightly rotated images, or variations in scale or translation of the image. It can also do so very quickly: at a typical input firing rate of 3000 input spikes per second over the whole image, the DBN submits its first correct guess of the output label within an average of 5.8 ms after the onset of the simulated Poisson spike train input. Firing rates in the intermediate layers are higher, resulting in 58,800 spikes/s in the 500-neuron Visual Abstraction Layer (see Figure 2), 147,600 spikes/s in the 500-neuron Association Layer, and 1800 spikes/s in the 10-neuron Label Layer.

3.4. REAL-TIME SENSORY FUSION
We trained a DBN to associate visual stimuli from a silicon retina and auditory stimuli from a silicon cochlea, in order to classify them in real-time by integrating both input streams. Table 1 shows the respective association of digit images recorded with the DVS (Lichtsteiner et al., 2008) and tones of different frequencies recorded with the AER-EAR2 silicon cochlea (Liu et al., 2010). We used the DBN architecture shown in Figure 8, in which a bidirectional connection between the top-level Association Layer and the Auditory Input Layer is added.

FIGURE 8 | DBN architecture of the multi-sensory fusion network. In addition to the architecture for visual recognition in Figure 5, the Auditory Input Layer is bidirectionally connected to the top-level Association Layer. Thus, associations between visual inputs, auditory inputs, and classification results in the Label Layer can be learned during training, and classification can be achieved in real-time.

During training a network of Siegert neurons was presented with input images from the MNIST database and pre-recorded activations of Auditory Input Layer neurons in response to the tones in Table 1 (see Section 2.5.3). After the training phase, the DBN was converted into an event-driven DBN as described previously, which was run in real-time in the jAER software package.

One key aspect of sensory fusion is the ability to integrate multiple, possibly noisy or ambiguous cues from different sensory domains to decide on the actual state of the world. We tested this by providing simultaneous visual and auditory stimuli to the DBN, such that the combination of both stimuli would provide more conclusive evidence of the true label than the single modalities. The auditory stimulus was a mixture of A4 and F5 tones corresponding to '0' and '5' digits, with four times as many input spikes corresponding to class '0' as to class '5'. Thus, if given only the audio input, the DBN should identify a '0'. Conversely, the visual input shows an ambiguous input that is consistent with either a '3' or a '5', but very unlikely for a '0'. Figure 9 demonstrates the audio-visual fusion using an ambiguous visual input and the auditory input favoring class '0'. While each input stream on its own favors an incorrect interpretation of either '3' or '0', class '5' is correctly chosen as the most consistent representation for the combined visual-auditory input stream.

FIGURE 9 | Cue integration with a multi-sensory spiking DBN. (A) When presenting only an ambiguous visual input to the DVS, the network in the absence of auditory input will alternate between recognizing a '3' or a '5' (see also Figure 7). (B) When presenting only an ambiguous auditory input to the cochlea, the network in the absence of visual input will alternate between recognizing a '0' or a '5'. (C) By combining the two inputs (mixing at 50%), the network reliably classifies the two ambiguous patterns as class '5', which is the only consistent interpretation.

FIGURE 10 | Proportion of output spikes for 3 different mixture ratios of auditory and visual input in a multi-sensory spiking DBN. Red, green, and blue encode the ratio of '0', '3', and '5' choices (spikes) relative to the total number of spikes emitted from the Label Layer (averaged over 10 trials). The horizontal axis sweeps the probability that visual input spikes are chosen from either a '3' digit or an aligned '5' digit. Auditory input consists of a mixture of '0' and '5' inputs, with four times more spikes indicating a '0'. Over a wide range of mixture values, the network correctly infers the only consistent interpretation of the multi-modal input, which is class '5'. Inputs that are inconsistent with the dominating sensory domain ('3' in A, '0' in B,C) are mostly suppressed.

In Figure 10 we analyzed how this depends on the relative strength of the visual and auditory input streams and on the ambiguity of the visual input by (1) changing the relative proportion of input spikes coming from the audio stream, and (2) interpolating the visual input between an image showing '3' and another one showing '5'. We varied the mixture of firing rates of input neurons such that 80% (Figure 10A), 20% (Figure 10B), and 10% (Figure 10C) of all input spikes came from the auditory stream, and measured the proportion of output spikes for the three classes '0', '3', and '5'. In panels A and C the classes that are inconsistent with the dominating auditory respectively visual input are almost completely suppressed, and class '5' is favored. One can also see from the difference between Figures 10A,B that an increase of a few spikes favoring an alternative interpretation can dramatically adjust the output choice: in this case 10% more spikes favoring the interpretation '5' are enough to bias the classification toward the interpretation consistent with both visual and auditory input over a wide range of visual ambiguity.
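The mixing protocol used for Figure 10 can be sketched as follows: a combined input stream in which a chosen fraction of all spikes is drawn from the auditory stream and the rest from the visual stream. This is an illustrative reconstruction of the described experiment, not the original code; the subsampling scheme and the function signature are assumptions.

```python
import numpy as np

def mix_input_streams(visual_events, auditory_events, frac_auditory,
                      total_rate, duration, rng=None):
    """Merge two (time, unit) event lists so that a fraction `frac_auditory`
    of the combined spikes comes from the auditory stream."""
    rng = rng or np.random.default_rng()
    n_total = int(total_rate * duration)
    n_aud = int(frac_auditory * n_total)
    n_vis = n_total - n_aud

    def subsample(events, n):
        # Pick n events at random without replacement (or all of them if fewer exist)
        idx = rng.choice(len(events), size=min(n, len(events)), replace=False)
        return [events[i] for i in idx]

    merged = subsample(auditory_events, n_aud) + subsample(visual_events, n_vis)
    return sorted(merged, key=lambda e: e[0])   # merge by time stamp
```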


4. DISCUSSION
The great potential of DBNs is widely recognized in the machine learning community and industry (MIT Technology Review, 2013). However, due to the high computational costs, and the capability to integrate large amounts of unlabeled data that is freely available on the web, applications so far have strongly concentrated on big data problems (Le et al., 2012). Surprisingly little effort has gone into making this technology available for real-time applications, although the scenarios in which DBNs excel, e.g., visual object recognition, speech recognition, or multi-sensory fusion, are extremely important tasks in fields like robotics or mobile computing. An exception is the work of Hadsell et al. (2009), who use small and mostly feed-forward deep networks for long-range vision in an autonomous robot driving off road. In general, previous attempts to reduce the running time have mostly attempted to restrict the connectivity of networks (Lee et al., 2008; Le et al., 2012), e.g., by introducing weight-sharing, pooling, and restricted receptive fields. In speech processing on mobile phones, data is first communicated to a central server where it is processed by a large DBN before the result is sent back to the mobile device (Acero et al., 2008). Online and on-board processing would be very important for mobile applications where such communication infrastructure is not available, e.g., for exploration robots in remote areas, underwater, or on other planets, but this requires fast and efficient processing architectures that conventional DBNs currently cannot provide.

We think that this presents a big opportunity for neuromorphic engineering, which has always pursued the goal of providing fast and energy efficient alternatives to conventional digital computing architectures for real-time brain-inspired cognitive systems (Indiveri et al., 2011). Here we have presented a novel method for converting a fully trained DBN into a network of spiking LIF neurons. Even though we have only shown a proof-of-concept in software, this provides the necessary theoretical framework for an architecture that can in the future be implemented on neuromorphic VLSI chips, and first experiments in this direction are promising. The event-driven approach can be energy efficient, in particular since the required processing power depends dynamically on the data content, rather than on the constant dimensionality of the processed data. Furthermore, as we have shown, spiking DBNs can process data with very low latency, without having to wait for a full frame of data, which can be further improved if individual units of the DBN compute in parallel, rather than updating each unit in sequence. This advantage has been recognized for many years for feed-forward convolutional networks, in which almost all operations can be efficiently parallelized, and has led to the development of custom digital hardware solutions and spike-based convolution chips (Camuñas-Mesa et al., 2010; Farabet et al., 2012), which, through the use of the Address Event Representation (AER) protocol, can also directly process events coming from event-based dynamic vision sensors (Lichtsteiner et al., 2008). For such architectures, Pérez-Carrasco et al. (2013) have recently developed a similar mapping methodology between frame-based and event-driven networks that translates the weights and other parameters of a fully trained frame-based feed-forward network into the event-based domain, and then optimizes them with simulated annealing. In comparison, this offers increased flexibility to change neuronal parameters after training, whereas our method uses the accurate Siegert approximation of spike rates already during the training of a bi-directional network, and does not require an additional optimization phase. The advantages of spike-based versus digital frame-based visual processing in terms of processing speed and scalability have been compared in Farabet et al. (2012), where it was also suggested that spike-based systems are more suitable for systems that employ both feed-forward and feed-back processing.

Although our model is event-based, the Siegert model (Siegert, 1951) does not make use of the precise timing of spikes. The basic theoretical framework of DBNs is not suitable for inputs that vary in time, and thus requires modifications to the network architecture (Taylor et al., 2007), or a transformation of inherently time-dependent inputs (Dahl et al., 2012). Learning with STDP-like rules in spiking DBNs provides an intriguing future possibility for a direct handling of dynamic inputs. In our current network, the short-time memory of previously seen inputs carried in the membrane potentials of LIF neurons allows us to process inputs from asynchronous neuromorphic sensors, in which complete frames are never available (Lichtsteiner et al., 2008; Liu et al., 2010). We can therefore for the first time apply the state-of-the-art machine learning technique of DBNs directly to inputs from event-based sensors, without any need to convert input signals, and can classify the input while also completing the input signals using feed-back connections.

Feed-back connections are rarely used in models of biologically inspired vision, e.g., HMAX (Riesenhuber and Poggio, 1999), but as we show e.g., in Figure 7, feed-back and recurrency are essential for implementing general probabilistic inference,


e.g., to infer missing, ambiguous, or noisy values in the input. Only in recent years have models become available that directly link spiking activity in recurrent neural networks to inference and learning in probabilistic graphical models. Nessler et al. (2013) have shown that learning via STDP in cortical microcircuits can lead to the emergence of Bayesian computation for the detection of hidden causes of inputs. They interpret spikes as samples from a posterior distribution over hidden variables, which is also the essential idea for neural sampling approaches (Büsing et al., 2011), in which spiking neurons implement inference in a Boltzmann machine via Markov Chain Monte Carlo sampling. Using clock-like waves of inhibition, Merolla et al. (2010) showed an alternative implementation of single Boltzmann machines with spiking neurons.

In biology, the precise role of feed-back processing is still debated, but the deficiencies of purely feed-forward architectures for processing the kind of clutter, occlusions, and noise inherent to natural scenes point at least to a role in modulation by attention signals, and in the integration of multiple cues, possibly from different modalities as well as memory and high-level cognitive areas (Lamme et al., 1998; Bullier, 2001; Kersten and Yuille, 2003). A proposal from Hochstein and Ahissar (2002) even suggests a reverse hierarchy for conscious vision, whereby fast feed-forward perception is used for a quick estimate of the gist of the scene, and for activating top-down signals that focus attention on low-level features necessary to resolve the details of the task. Such a model can explain the fast pop-out effect of image parts that violate model expectations, and also provides a model for fast learning without changes in the early sensory processing stages. This is consistent with a variety of theories that the brain encodes Bayesian generative models of its natural environment (Kersten and Yuille, 2003; Knill and Pouget, 2004). The hierarchical organization of sensory cortices would then naturally correspond to a hierarchy of prior distributions from higher to lower areas that can be optimally adapted to the statistics of the real world in order to minimize surprise (Friston, 2010). Rao and Ballard (1999) suggested that inference in such hierarchical generative models could be efficiently performed through predictive coding. In this framework, feed-back connections would signal a prediction from higher to lower layers, whereas feed-forward connections would encode the error between prediction and actual input. In Rao and Ballard (1999) it was shown that such a model can account for several phenomena concerning the non-linear interaction of center and surround of receptive fields, and fMRI data support the theory by reporting reduced V1 activity when recognition-related activity in higher areas increases (Murray et al., 2002).

The framework of Bayesian generative models also provides a principled way of associating and integrating potentially uncertain cues from different sources, e.g., across sensory modalities (Knill and Pouget, 2004). It is well known that humans use all available cues for solving tasks, e.g., by using visual cues to improve their understanding of speech (Kayser and Logothetis, 2007; Stein and Stanford, 2008). Although traditional models have assumed that multi-sensory integration occurs only at higher association areas like the superior colliculus (Felleman and Van Essen, 1991), feed-back connections from higher to lower areas or between sensory streams are likely to be involved in sensory fusion tasks. Recent studies have revealed the existence of anatomical connections that would enable cross-modal interactions also at lower levels (Falchier et al., 2002; Markov et al., 2012), and functional studies have provided some (but not conclusive) evidence of co-activations of early sensory areas by stimulation of different modalities [see Kayser and Logothetis (2007) for a review]. Integration might also be required within the same sensory modality, since, e.g., the visual pathway splits up into at least two separate major ventral and dorsal streams.

All these arguments indicate that the traditional concept of sensory processing in the cortex as a feed-forward hierarchy of feature detectors with increasing levels of abstraction in higher layers (Gross et al., 1972; Van Essen and Maunsell, 1983; Desimone et al., 1984) needs to be reassessed (Markov and Kennedy, 2013). A closer look at the anatomy of intra- and inter-areal cortical connectivity reveals an abundance of feed-back and recurrent connections. Every brain area receives inputs from a large number of cortical and subcortical sources (Douglas and Martin, 2011; Markov et al., 2012), and feed-forward connections actually make up only a relatively small fraction of inputs to neurons along the hypothesized pathways (da Costa and Martin, 2009). Many studies have demonstrated feed-back effects, in which the activation or deactivation of a higher area alters activity in lower sensory areas (Lamme et al., 1998; Bullier, 2001; Murray et al., 2002), e.g., activation of V1 through a high-level cognitive process like visual imagery (Kosslyn et al., 1999).

DBN models can play an important role in capturing many of those effects, and the event-based framework presented in this article provides a model in which the dynamics and short-term memory properties of spiking neurons can be exploited for dealing with realistic input sequences, in our case coming from bio-inspired sensors. There are still plenty of open research questions, in particular concerning the integration of spike-timing based learning in the DBN framework, and the exploitation of spike timing for dealing with sequences of inputs. This will likely require an adaptation of the simple RBM model used as the building block of DBNs, and will have to include recurrent lateral connections. Similar mechanisms for the processing of input sequences have been proposed in the framework of Hierarchical Temporal Memory (Hawkins and Blakeslee, 2004), which opens up new directions for combining machine learning approaches with cortical modeling.

ACKNOWLEDGMENTS
This project was partially supported by the FP7 SeeBetter project (FP7-ICT-2009-6), the Swiss National Foundation EARS project (200021_126844), and the Samsung Advanced Institute of Technology. Michael Pfeiffer has been supported by a Forschungskredit grant of the University of Zurich. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnins.2013.00178/abstract


REFERENCES
Acero, A., Bernstein, N., Chambers, R., Ju, Y.-C., Li, X., Odell, J., et al. (2008). "Live search for mobile: web services by voice on the cellphone," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Las Vegas, NV: IEEE), 5256–5259.
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2006). "Greedy layer-wise training of deep networks," in Advances in Neural Information Processing Systems 19. Vancouver: MIT Press.
Bullier, J. (2001). Integrated model of visual processing. Brain Res. Rev. 36, 96–107. doi: 10.1016/S0165-0173(01)00085-6
Büsing, L., Bill, J., Nessler, B., and Maass, W. (2011). Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons. PLoS Comput. Biol. 7:e1002211. doi: 10.1371/journal.pcbi.1002211
Camuñas-Mesa, L., Pérez-Carrasco, J., Zamarreño-Ramos, C., Serrano-Gotarredona, T., and Linares-Barranco, B. (2010). "On scalable spiking ConvNet hardware for cortex-like visual sensory processing systems," in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), (Paris), 249–252.
Ciresan, D. C., Meier, U., Gambardella, L. M., and Schmidhuber, J. (2010). Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 22, 3207–3220. doi: 10.1162/NECO_a_00052
da Costa, N. M., and Martin, K. A. C. (2009). The proportion of synapses formed by the axons of the lateral geniculate nucleus in layer 4 of area 17 of the cat. J. Comp. Neurol. 516, 264–276. doi: 10.1002/cne.22133
Dahl, G. E., Yu, D., Deng, L., and Acero, A. (2012). Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 30–42. doi: 10.1109/TASL.2011.2134090
Delbruck, T. (2013). jAER Open Source Project. Available online at: http://sourceforge.net/apps/trac/jaer/wiki.
Desimone, R., Albright, T. D., Gross, C. G., and Bruce, C. (1984). Stimulus-selective properties of inferior temporal neurons in the macaque. J. Neurosci. 4, 2051–2062.
Douglas, R. J., and Martin, K. A. C. (2011). What's black and white about the grey matter? Neuroinformatics 9, 167–179. doi: 10.1007/s12021-011-9106-1
Erhan, D., Bengio, Y., Courville, A., Manzagol, P., Vincent, P., and Bengio, S. (2010). Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11, 625–660.
Falchier, A., Clavagnier, S., Barone, P., and Kennedy, H. (2002). Anatomical evidence of multimodal integration in primate striate cortex. J. Neurosci. 22, 5749–5759.
Farabet, C., Paz, R., Pérez-Carrasco, J., Zamarreño, C., Linares-Barranco, A., LeCun, Y., et al. (2012). Comparison between frame-constrained fix-pixel-value and frame-free spiking-dynamic-pixel ConvNets for visual processing. Front. Neurosci. 6:32. doi: 10.3389/fnins.2012.00032
Felleman, D. J., and Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47. doi: 10.1093/cercor/1.1.1
Friston, K. (2010). The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138. doi: 10.1038/nrn2787
Gerstner, W., and Kistler, W. (2002). Spiking Neuron Models. Single Neurons, Populations, Plasticity. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511815706
Goh, H., Thome, N., and Cord, M. (2010). "Biasing restricted Boltzmann machines to manipulate latent selectivity and sparsity," in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, (Whistler, BC).
Gross, C. G., Roche-Miranda, G. E., and Bender, D. B. (1972). Visual properties of neurons in the inferotemporal cortex of the macaque. J. Neurophysiol. 35, 96–111.
Hadsell, R., Sermanet, P., Ben, J., Erkan, A., Scoffier, M., Kavukcuoglu, K., et al. (2009). Learning long-range vision for autonomous off-road driving. J. Field Robot. 26, 120–144. doi: 10.1002/rob.20276
Hawkins, J., and Blakeslee, S. (2004). On Intelligence. New York, NY: Times Books.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., et al. (2012). Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29, 82–97. doi: 10.1109/MSP.2012.2205597
Hinton, G., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554. doi: 10.1162/neco.2006.18.7.1527
Hinton, G. E., and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science 313, 504–507. doi: 10.1126/science.1127647
Hinton, G. E., and Sejnowski, T. J. (1986). Learning and Relearning in Boltzmann Machines. Cambridge, MA: MIT Press 1, 282–317.
Hochreiter, S., Bengio, Y., Frasconi, P., Schmidhuber, J., and Elvezia, C. (2001). "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Neural Networks, eds S. C. Kremer and J. F. Kolen (New York, NY: IEEE Press), 237–244.
Hochstein, S., and Ahissar, M. (2002). View from the top: hierarchies and reverse hierarchies review. Neuron 36, 791–804. doi: 10.1016/S0896-6273(02)01091-7
Indiveri, G., Linares-Barranco, B., Hamilton, T., van Schaik, A., Etienne-Cummings, R., Delbruck, T., et al. (2011). Neuromorphic silicon neuron circuits. Front. Neurosci. 5:73. doi: 10.3389/fnins.2011.00073
Jug, F., Cook, M., and Steger, A. (2012). "Recurrent competitive networks can learn locally excitatory topologies," in International Joint Conference on Neural Networks (IJCNN), (Brisbane), 1–8.
Kayser, C., and Logothetis, N. K. (2007). Do early sensory cortices integrate cross-modal information? Brain Struct. Funct. 212, 121–132. doi: 10.1007/s00429-007-0154-0
Kersten, D., and Yuille, A. (2003). Bayesian models of object perception. Curr. Opin. Neurobiol. 13, 150–158. doi: 10.1016/S0959-4388(03)00042-4
Knill, D. C., and Pouget, A. (2004). The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719. doi: 10.1016/j.tins.2004.10.007
Kosslyn, S. M., Pascual-Leone, A., Felician, O., Camposano, S., Keenan, J. P., Thompson, W. L., et al. (1999). The role of Area 17 in visual imagery: convergent evidence from PET and rTMS. Science 284, 167–170. doi: 10.1126/science.284.5411.167
Lamme, V. A. F., Supèr, H., and Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in the visual cortex. Curr. Opin. Neurobiol. 8, 529–535. doi: 10.1016/S0959-4388(98)80042-1
Larochelle, H., Erhan, D., Courville, A., Bergstra, J., and Bengio, Y. (2007). "An empirical evaluation of deep architectures on problems with many factors of variation," in Proceedings of ICML, 473–480. ACM.
Le, Q. V., Ranzato, M. A., Monga, R., Devin, M., Chen, K., Corrado, G., et al. (2012). "Building high-level features using large scale unsupervised learning," in Proceedings of ICML, (Edinburgh).
LeCun, Y. L., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324. doi: 10.1109/5.726791
Lee, H., Ekanadham, C., and Ng, A. (2008). "Sparse deep belief net model for visual area V2," in Advances in Neural Information Processing Systems, Vol. 20, (Vancouver), 873–880.
Lee, H., Grosse, R., Ranganath, R., and Ng, A. Y. (2009). "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of ICML, (Montreal), 609–616. doi: 10.1145/1553374.1553453
Lichtsteiner, P., Posch, C., and Delbruck, T. (2008). A 128 × 128 120 dB 15 μs latency asynchronous temporal contrast vision sensor. IEEE J. Solid-State Circ. 43, 566–576. doi: 10.1109/JSSC.2007.914337
Liu, S., Van Schaik, A., Minch, B., and Delbruck, T. (2010). "Event-based 64-channel binaural silicon cochlea with Q enhancement mechanisms," in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), (Paris), 2027–2030.
Markov, N., and Kennedy, H. (2013). The importance of being hierarchical. Curr. Opin. Neurobiol. 23, 187–194. doi: 10.1016/j.conb.2012.12.008
Markov, N. T., Ercsey-Ravasz, M. M., Ribeiro Gomes, A. R., Lamy, C., Magrou, L., Vezoli, J., et al. (2012). A weighted and directed interareal connectivity matrix for macaque cerebral cortex. Cereb. Cortex 1–20. doi: 10.1093/cercor/bhs270
Merolla, P., Ursell, T., and Arthur, J. (2010). The Thermodynamic Temperature of a Rhythmic Spiking Network. ArXiv preprint arXiv:1009.5473.
MIT Technology Review (2013). 10 breakthrough technologies 2013: Deep learning. Available online at: http://www.technologyreview.com/featuredstory/513696/deep-learning/
Mohamed, A.-R., Dahl, G. E., and Hinton, G. (2012). Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20, 14–22. doi: 10.1109/TASL.2011.2109382


Murray, S. O., Kersten, D., Olshausen, B. A., Schrater, P., and Woods, D. L. (2002). Shape perception reduces activity in human primary visual cortex. Proc. Natl. Acad. Sci. U.S.A. 99, 15164–15169. doi: 10.1073/pnas.192579399
Nair, V., and Hinton, G. (2010). Rectified linear units improve Restricted Boltzmann Machines, in Proceedings of ICML, (Haifa), 807–814.
Nessler, B., Pfeiffer, M., Buesing, L., and Maass, W. (2013). Bayesian computation emerges in generic cortical microcircuits through spike-timing-dependent plasticity. PLoS Comput. Biol. 9:e1003037. doi: 10.1371/journal.pcbi.1003037
Pérez-Carrasco, J., Zhao, B., Serrano, C., Acha, B., Serrano-Gotarredona, T., Chen, S., et al. (2013). Mapping from frame-driven to frame-free event-driven vision systems by low-rate rate-coding and coincidence processing. Application to feedforward ConvNets. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2706–2719. doi: 10.1109/TPAMI.2013.71
Rao, R. P. N., and Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87. doi: 10.1038/4580
Riesenhuber, M., and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025. doi: 10.1038/14819
Seide, F., Li, G., and Yu, D. (2011). Conversational speech transcription using context-dependent deep neural networks, in Proceedings of Interspeech, (Florence), 437–440.
Siegert, A. J. F. (1951). On the first passage time probability problem. Phys. Rev. 81:617. doi: 10.1103/PhysRev.81.617
Stein, B. E., and Stanford, T. R. (2008). Multisensory integration: current issues from the perspective of the single neuron. Nat. Rev. Neurosci. 9, 255–266. doi: 10.1038/nrn2331
Taylor, G., Hinton, G., and Roweis, S. (2007). Modeling human motion using binary latent variables, in Advances in Neural Information Processing Systems, (Vancouver), 1345–1352.
Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient, in Proceedings of ICML, (Helsinki: ACM), 1064–1071.
Tieleman, T., and Hinton, G. (2009). Using fast weights to improve persistent contrastive divergence, in Proceedings of ICML, (Montreal: ACM), 1033–1040.
Van Essen, D. C., and Maunsell, J. H. R. (1983). Hierarchical organization and functional streams in the visual cortex. Trends Neurosci. 6, 370–375. doi: 10.1016/0166-2236(83)90167-4

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 12 June 2013; accepted: 17 September 2013; published online: 08 October 2013.
Citation: O'Connor P, Neil D, Liu SC, Delbruck T and Pfeiffer M (2013) Real-time classification and sensor fusion with a spiking deep belief network. Front. Neurosci. 7:178. doi: 10.3389/fnins.2013.00178
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.
Copyright © 2013 O'Connor, Neil, Liu, Delbruck and Pfeiffer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.



ORIGINAL RESEARCH ARTICLE
published: 30 January 2014
doi: 10.3389/fnins.2013.00272

Event-driven contrastive divergence for spiking neuromorphic systems

Emre Neftci 1*, Srinjoy Das 1,2, Bruno Pedroni 3, Kenneth Kreutz-Delgado 1,2 and Gert Cauwenberghs 1,3

1 Institute for Neural Computation, University of California, San Diego, La Jolla, CA, USA
2 Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, CA, USA
3 Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA

Edited by: André Van Schaik, The University of Western Sydney, Australia
Reviewed by: Michael Schmuker, Freie Universität Berlin, Germany; Philip De Chazal, University of Western Sydney, Australia
*Correspondence: Emre Neftci, Institute for Neural Computation, University of California, San Diego, 9500 Gilman Drive - 0523, La Jolla, CA-92093, USA. e-mail: [email protected]

Restricted Boltzmann Machines (RBMs) and Deep Belief Networks have been demonstrated to perform efficiently in a variety of applications, such as dimensionality reduction, feature learning, and classification. Their implementation on neuromorphic hardware platforms emulating large-scale networks of spiking neurons can have significant advantages from the perspectives of scalability, power dissipation and real-time interfacing with the environment. However, the traditional RBM architecture and the commonly used training algorithm known as Contrastive Divergence (CD) are based on discrete updates and exact arithmetic which do not directly map onto a dynamical neural substrate. Here, we present an event-driven variation of CD to train an RBM constructed with Integrate & Fire (I&F) neurons, that is constrained by the limitations of existing and near-future neuromorphic hardware platforms. Our strategy is based on neural sampling, which allows us to synthesize a spiking neural network that samples from a target Boltzmann distribution. The recurrent activity of the network replaces the discrete steps of the CD algorithm, while Spike Time Dependent Plasticity (STDP) carries out the weight updates in an online, asynchronous fashion. We demonstrate our approach by training an RBM composed of leaky I&F neurons with STDP synapses to learn a generative model of the MNIST hand-written digit dataset, and by testing it in recognition, generation and cue integration tasks. Our results contribute to a machine learning-driven approach for synthesizing networks of spiking neurons capable of carrying out practical, high-level functionality.

Keywords: synaptic plasticity, neuromorphic cognition, Markov chain Monte Carlo, recurrent neural network, generative model

1. INTRODUCTION Currently, RBMs and the algorithms used to train them are
Machine learning algorithms based on stochastic neural network designed to operate efficiently on digital processors, using batch,
models such as RBMs and deep networks are currently the state- discrete-time, iterative updates based on exact arithmetic calcula-
of-the-art in several practical tasks (Hinton and Salakhutdinov, tions. However, unlike digital processors, neuromorphic systems
2006; Bengio, 2009). The training of these models requires sig- compute through the continuous-time dynamics of their compo-
nificant computational resources, and is often carried out using nents, which are typically Integrate & Fire (I&F) neurons (Indiveri
power-hungry hardware such as large clusters (Le et al., 2011) et al., 2011), rendering the transfer of such algorithms on such
or graphics processing units (Bergstra et al., 2010). Their imple- platforms a non-trivial task. We propose here a method to con-
mentation in dedicated hardware platforms can therefore be very struct RBMs using I&F neuron models and to train them using
appealing from the perspectives of power dissipation and of an online, event-driven adaptation of the Contrastive Divergence
scalability. (CD) algorithm.
Neuromorphic Very Large Scale Integration (VLSI) systems We take inspiration from computational neuroscience to
exploit the physics of the device to emulate very densely the identify an efficient neural mechanism for sampling from the
performance of biological neurons in a real-time fashion, while underlying probability distribution of the RBM. Neuroscientists
dissipating very low power (Mead, 1989; Indiveri et al., 2011). argue that brains deal with uncertainty in their environments
The distributed structure of RBMs suggests that neuromorphic by encoding and combining probabilities optimally (Doya et al.,
VLSI circuits and systems can become ideal candidates for such a 2006), and that such computations are at the core of cogni-
platform. Furthermore, the communication between neuromor- tive function (Griffiths et al., 2010). While many mechanistic
phic components is often mediated using asynchronous address- theories of how the brain might achieve this exist, a recent neu-
events (Deiss et al., 1998) enabling them to be interfaced with ral sampling theory postulates that the spiking activity of the
event-based sensors (Liu and Delbruck, 2010; Neftci et al., 2013; neurons encodes samples of an underlying probability distribu-
OConnor et al., 2013) for embedded applications, and to be tion (Fiser et al., 2010). The advantage for a neural substrate
implemented in a very scalable fashion (Silver et al., 2007; Joshi in using such a strategy over the alternative one, in which
et al., 2010; Schemmel et al., 2010). neurons encode probabilities, is that it requires exponentially


fewer neurons. Furthermore, abstract model neurons consis- with r(u(t)) proportional to exp(u(t)), where u(t) is the
tent with the behavior of biological neurons can implement membrane potential and r is an absolute refractory period dur-
Markov Chain Monte Carlo (MCMC) sampling (Buesing et al., ing which the neuron cannot fire. (u(t), t t  ) describes the
2011), and RBMs sampled in this way can be efficiently trained neurons instantaneous firing rate as a function of u(t) at time
using CD, with almost no loss in performance (Pedroni et al., t, given that the last spike occurred at t  . It can be shown that the
2013). We identify the conditions under which a dynamical average firing rate of this neuron model for stationary u(t) is the
system consisting of I&F neurons performs neural sampling. sigmoid function:
These conditions are compatible with neuromorphic imple-
mentations of I&F neurons (Indiveri et al., 2011), suggest- (u) = (r + exp(u))1 . (2)
ing that they can achieve similar performance. The calibra-
tion procedure necessary for configuring the parameters of the Second, the membrane potential of neuron i is equal to the linear
spiking neural network is based on firing rate measurements, sum of its inputs:
and so is easy to realize in software and in hardware plat-
forms. 
N

In standard CD, weight updates are computed on the basis ui (t) = bi + wij zj (t), i = 1, . . . , N, (3)
j=1
of alternating, feed-forward propagation of activities (Hinton,
2002). In a neuromorphic implementation, this translates to
where bi is a constant bias, and zj (t) represents the pre-synaptic
reprogramming the network connections and resetting its
spike train produced by neuron j defined as being equal to 1 when
state variables at every step of the training. As a consequence,
the pre-synaptic neuron spikes for a duration r , and equal to zero
it requires two distinct dynamical systems: one for normal
otherwise. The terms wij zj (t) are identified with the time course
operation (i.e., testing), the other for training, which is highly
of the PostSynaptic Potential (PSP), i.e., the response of the
impractical. To overcome this problem, we train the neural RBMs
membrane potential to a pre-synaptic spike. The two conditions
using an online adaptation of CD. We exploit the recurrent
above define a neuron model, to which we refer as the abstract
structure of the network to mimic the discrete construction
neuron model. Assuming the network states are binary vectors
and reconstruction steps of CD in a spike-driven fashion, and
[z1 , . . . , zk ], it can be shown that, after an initial transient, the
Spike Time Dependent Plasticity (STDP) to carry out the weight
sequence of network states can be interpreted as MCMC samples
updates. Each sample (spike) of each random variable (neuron)
of the Boltzmann distribution:
causes synaptic weights to be updated. We show that, over longer
periods, these microscopic updates behave like a macroscopic 1  
CD weight update. Compared to standard CD, no additional p(z1 , . . . , zk ) = exp E(z1 , . . . , zk ) , with
Z
connectivity programming overhead is required during the 1  (4)
training steps, and both testing and training take place in the E(z1 , . . . , zk ) = Wij zi zj bi zi ,
2
same dynamical system. ij i
Because RBMs are generative models, they can act simulta-   
neously as classifiers, content-addressable memories, and carry where Z = z1 ,...,zk exp E(z1 , . . . , zk ) is a constant such that
out probabilistic inference. We demonstrate these features in a p sums up to unity, and E(z1 , . . . , zk ) can be interpreted as an
MNIST hand-written digit task (LeCun et al., 1998), using an energy function (Haykin, 1998).
RBM network consisting of one layer of 824 visible neurons An important fact of the abstract neuron model is that, accord-
and one layer of 500 hidden neurons. The spiking neural net- ing to the dynamics of zj (t), the PSPs are rectangular and
work was able to learn a generative model capable of recognition non-additive since no two presynaptic spikes can occur faster than
performances with accuracies up to 91.9%, which is close to the the refractive period. The implementation of synapses producing
performance obtained using standard CD and Gibbs sampling, such PSPs on a large scale is very difficult to realize in hardware,
93.6%. when compared to first-order linear filters that result in alpha-
shaped PSPs (Destexhe et al., 1998; Bartolozzi and Indiveri, 2007).
This is because, in the latter model, the synaptic dynamics are lin-
2. MATERIALS AND METHODS ear, such that a single hardware synapse can be used to generate
2.1. NEURAL SAMPLING WITH NOISY I&F NEURONS the same current that would be generated by an arbitrary num-
We describe here conditions under which a dynamical system ber of synapses (see also next section). As a consequence, we will
composed of I&F neurons can perform neural sampling. It has use alpha-shaped PSPs instead of rectangular PSPs in our mod-
been proven that abstract neuron models consistent with the els. The use of the alpha PSP over the rectangular PSP is the major
behavior of biological spiking neurons can perform MCMC sam- source of degradation in sampling performance, as we will discuss
pling of a Boltzmann distribution (Buesing et al., 2011). Two in section 2.2.
conditions are sufficient for this. First, the instantaneous firing the refractory period. The implementation of synapses producing
rate of the neuron verifies: 2.1.1. Stochastic I&F neurons
A neuron whose instantaneous firing rate is consistent with

0 if t t  < r Equation (1) can perform neural sampling. Equation (1) is a gen-
(u(t), t t  ) = , (1) eralization of the Poisson process to the case when the firing prob-
r(u(t)) t t  r ability depends on the time of the last spike (i.e., it is a renewal
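As a concrete illustration of Equations (2) and (4), the following minimal Python/numpy sketch evaluates the sigmoid rate function of the abstract neuron and the exact Boltzmann distribution of a small network of binary units with symmetric couplings and no intra-layer connections. The weight and bias statistics follow the values listed in Table 1 for the Figure 4 experiments; the function names, the random seed, and the printout are illustrative choices of ours, not part of the original simulation code.

# Illustrative sketch (not from the original code) of Equations (2) and (4).
import itertools
import numpy as np

def rate_sigmoid(u, tau_r=4e-3):
    # Equation (2): average firing rate of the abstract neuron model.
    return 1.0 / (tau_r + np.exp(-u))

def energy(z, W, b):
    # Equation (4): E(z) = -1/2 * sum_ij W_ij z_i z_j - sum_i b_i z_i
    return -0.5 * z @ W @ z - b @ z

def boltzmann(W, b):
    # Exact Boltzmann distribution over all 2^N binary states (small N only).
    N = len(b)
    states = np.array(list(itertools.product([0, 1], repeat=N)))
    E = np.array([energy(z, W, b) for z in states])
    p = np.exp(-E)
    return states, p / p.sum()

# A 5 + 5 unit RBM-like network: couplings only between the two groups,
# drawn as in Table 1 (W ~ N(0.75, 1.5), biases ~ N(-1.5, 0.5)).
rng = np.random.default_rng(0)
Nv, Nh = 5, 5
Wvh = rng.normal(0.75, 1.5, size=(Nv, Nh))
W = np.zeros((Nv + Nh, Nv + Nh))
W[:Nv, Nv:] = Wvh
W[Nv:, :Nv] = Wvh.T
b = rng.normal(-1.5, 0.5, size=Nv + Nh)
states, p = boltzmann(W, b)
print("most probable state:", states[p.argmax()], "with p =", p.max())
print("rate at u = 0:", rate_sigmoid(0.0), "Hz")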


process), and so can be verified only if the neuron fires stochasti- For a neuron j in layer h,
cally (Cox, 1962). Stochasticity in I&F neurons can be obtained
through several mechanisms, such as a noisy reset potential, Ij (t) = Ijh (t),
noisy firing threshold, or noise injection (Plesser and Gerstner,
2000). The first two mechanisms necessitate stochasticity in the d h  N (8)
syn Ij = Ijh + qij i (t) + qbj bhj (t),
neurons parameters, and therefore may require specialized cir- dt
i=1
cuitry. But noise injection in the form of background Poisson
spike trains requires only synapse circuits, which are present in
where I h is the feedback from the visible layer, and (t) and bh (t)
many neuromorphic VLSI implementation of spiking neurons
are Poisson spike trains of the visible neurons and the bias neu-
(Bartolozzi and Indiveri, 2007; Indiveri et al., 2011). Furthermore,
rons, defined similarly as in Equation (7). The dynamics of I h and
Poisson spike trains can be generated self-consistently in balanced
I correspond to a first-order linear filter, so each incoming spike
excitatory-inhibitory networks (van Vreeswijk and Sompolinsky,
results in PSPs that rise and decay exponentially (i.e., alpha-PSP)
1996), or using finite-size effects and neural mismatch (Amit and
(Gerstner and Kistler, 2002).
Brunel, 1997).
Can this neuron verify the conditions required for neural sam-
We show that the abstract neuron model in Equation (1) can
pling? The membrane potential is already assumed to be equal to
be realized in a simple dynamical system consisting of leaky I&F
the sum of the PSPs as required by neural sampling. So to answer
neurons with noisy currents. The neurons membrane potential
the above question we only need to verify whether Equation (1)
below firing threshold is governed by the following differential
holds. Equation (5) is a Langevin equation which can be analyzed
equation:
using the FokkerPlanck equation (Gardiner, 2012). The solution
to this equation provides the neurons input/output response, i.e.,
d its transfer curve (for a review, see Renart et al., 2003):
C ui = gL ui + Ii (t) + (t), ui (t) (, ), (5)
dt   u0 1
V
2
(u0 ) = r + m urst u0
dx exp(x )(1 + erf(x)) , (9)
where C is a membrane capacitance, ui is the membrane potential V
of neuron i, gL is a leak conductance, (t) is a white noise term of
amplitude (which can for example be generated by background where erf is the error function (the integral of the normal dis-
activity), Ii (t) its synaptic current and is the neurons firing tribution), u0 = gIL is the stationary value of the membrane
threshold. When the membrane potential reaches , an action potential when injected with a constant current I, m = C
gL is the
potential is elicited. After a spike is generated, the membrane membrane time constant, urst is the reset voltage, and V2 (u) =
potential is clamped to the reset potential urst for a refractory 2 /(gL C).
period r . According to Equation (2), the condition for neural sampling
In the case of the neural RBM, the currents Ii (t) depend on the requires that the average firing rate of the neuron to be the sig-
layer the neuron is situated in. For a neuron i in layer moid function. Although the transfer curve of the noisy I&F
neuron Equation (9) is not identical to the sigmoid function, it
was previously shown that with an appropriate choice of param-
Ii (t) = Iid (t) + Ii (t), eters, the shape of this curve can be very similar to it (Merolla
h N (6) et al., 2010). We observe that, for a given refractory period r ,
d
syn Ii = Ii + qhji hj (t) + qbi bi (t), the smaller the ratio Vurst in Equation (5), the better the trans-
dt fer curve resembles a sigmoid function (Figure 1). With a small
j=1
urst
V , the transfer function of a neuron can be fitted to

1
where Iid (t) is a current representing the data (i.e., the external 1 exp(I)
input), I is the feedback from the hidden layer activity and the (I) = 1+ , (10)
r r
bias, and the qs are the respective synaptic weights, and b (t) is
a Poisson spike train implementing the bias. Spike trains are rep- where and are the parameters to be fitted. The choice of
resented by a sum of Dirac delta pulses centered on the respective the neuron model described in Equation (5) is not critical for
spike times: neural sampling: A relationship that is qualitatively similar to
Equation (9) holds for neurons with a rigid (reflective) lower
  boundary (Fusi and Mattia, 1999) which is common in VLSI
bi (t) = (t tk ), hj (t) = (t tk ) (7) neurons, and for I&F neurons with conductance-based synapses
k Spi k Spj (Petrovici et al., 2013).
This result also shows that synaptic weights qi , qhj , which have
the units of charge are related to the RBM weights Wij by a factor
where Spi and Spj are the set of the spike times of the bias neuron 1 . To relate the neural activity to the Boltzmann distribution,
bi and the hidden neuron hj , respectively, and (t) = 1 if t = 0 Equation (4), each neuron is associated to a binary random vari-
and 0 otherwise. able which is assumed to take the value 1 for a duration r after the
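The stochastic I&F dynamics of Equation (5) can be sketched with a plain Euler-Maruyama integration. In the snippet below the membrane parameters (C, gL, the firing threshold, urst, tau_r) and the noise amplitude follow Table 1; the integration step, the simulation length, and the range of test currents are our own illustrative assumptions and do not reproduce the exact protocol behind Figure 2.

# Illustrative Euler-Maruyama integration of the noisy leaky I&F neuron, Equation (5).
import numpy as np

def simulate_rate(I, T=1.0, dt=1e-5, C=1e-12, gL=1e-9, theta=0.1,
                  u_rst=0.0, tau_r=4e-3, sigma=3e-11, seed=0):
    # Returns the mean firing rate (Hz) for a constant input current I plus white noise.
    rng = np.random.default_rng(seed)
    u, t_last, n_spikes = u_rst, -np.inf, 0
    for k in range(int(T / dt)):
        t = k * dt
        if t - t_last < tau_r:            # absolute refractory period: u clamped to reset
            u = u_rst
            continue
        u += (-gL * u + I) * dt / C + (sigma / C) * np.sqrt(dt) * rng.standard_normal()
        if u >= theta:                    # threshold crossing: emit a spike and reset
            n_spikes += 1
            t_last = t
            u = u_rst
    return n_spikes / T

# Transfer curve: firing rate vs. injected current (compare the shape of Figure 2, bottom).
for I in np.linspace(-5e-9, 1e-9, 7):
    print(f"I = {I:+.1e} A  ->  {simulate_rate(I):7.1f} Hz")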


FIGURE 1 | Transfer curve of a leaky I&F neuron for three different parameter sets, where u0 = I/gL and 1/τr = 250 Hz (dashed gray). In this plot, σV is varied to produce different ratios (θ − urst)/σV. The three plots above show that the fit with the sigmoid function (solid black) improves as the ratio decreases.

neuron has spiked, and zero otherwise, similarly to Buesing et al. FIGURE 2 | Transfer function of I&F neurons driven by background
white noise Equation (5). We measure the firing rate of the neuron as a
(2011). With this encoding, the network state is characterized by
function of a constant current injection to estimate (u0 ), where for
a binary vector having the same number of entries as the number constant Iinj , u0 = Iinj /gL . (Top) The transfer function of noisy I&F neurons
of neurons in the network. The relationship between this ran- in the absence of refractory period [(u) = r(u), circles]. We observe that
dom vector and the I&F neurons spiking activity is illustrated in is approximately exponential over a wide range of inputs, and therefore
Figure 3. The membrane potential of the neuron (black) evolves compatible with neural sampling. Crosses show the transfer curve of
neurons implementing the abstract neuron Equation (1), exactly. (Bottom)
in a random fashion until it spikes, after which it is clamped to With an absolute refractory period the transfer function approximates the
urst for a duration r (gray). While the neuron is in the refrac- sigmoid function. The firing rate saturates at [250]Hz due to the refractory
tory period, the random variable associated to it is assumed to period chosen for the neuron.
take the value 1. This way, the state of the network can always
be associated with a binary vector. According to the theory, the
dynamics in the network guarantees that the binary vectors are
which can be problematic. However, by selecting a large enough
samples drawn from a Boltzmann distribution.
noise amplitude and a slow enough input synapse time con-
2.1.2. Calibration protocol stant, the fluctuations due to the background input are much
In order to transfer the parameters from the probability distribu- larger than the fluctuations due to the inputs. In this case, and
tion Equation (4) to those of the I&F neurons, the parameters remain approximately constant during the sampling.
, in Equation (10) need to be fitted. An estimate of a neu- Neural mismatch can cause and to differ from neuron to
rons transfer function can be obtained by computing its spike neuron. From Equation (10) and the linearity of the postsynaptic
rate when injected with different values of constant inputs I. The currents I(t) in the weights, it is clear that this type of mismatch
refractory period r is the inverse of the maximum firing rate can be compensated by scaling the synaptic weights and biases
of the neuron, so it can be easily measured by measuring the accordingly. The calibration of the parameters and quan-
spike rate for very high input current I. Once r is known, the titatively relate the spiking neural networks parameters to the
parameter estimation can be cast into a simple linear regression RBM. In practice, this calibration step is only necessary for map-
problem by fitting log((i)1 r ) with I + log(). Figure 2 ping pre-trained parameters of the RBM onto the spiking neural
shows the transfer curve when r = 0 ms, which is approximately network.
exponential in agreement with Equation (1). Although we estimated the parameters of software simulated
The shape of the transfer curve is strongly dependent on the I&F neurons, parameter estimation based on firing rate measure-
noise amplitude. In the absence of noise, the transfer curve is a ments were shown to be an accurate and reliable method for VLSI
sharp threshold function, which softens as the amplitude of the I&F neurons as well (Neftci et al., 2012).
noise is increased (Figure 1). As a result, both parameters and
are dependent on the variance of the input currents from other 2.2. VALIDATION OF NEURAL SAMPLING USING I&F NEURONS
neurons I(t). Since q = w, the effect of the fluctuations on the The I&F neuron verifies Equation (1) only approximately, and the
network is similar to scaling the synaptic weights and the biases PSP model is different from the one of Equation (3). Therefore,
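The calibration described in section 2.1.2 amounts to a linear regression. The sketch below recovers gamma and beta of Equation (10) from rate measurements by fitting log(1/nu - tau_r) against the injected current; the "measurements" are synthesized from assumed ground-truth values (the gamma, beta, and tau_r quoted in Table 1) only to show that the fit recovers them, and would normally come from firing-rate measurements of the simulated or hardware neurons.

# Illustrative calibration fit for Equation (10): nu(I) = 1/tau_r * (1 + exp(-beta*I)/(gamma*tau_r))^-1.
import numpy as np

def predicted_rate(I, gamma, beta, tau_r):
    return 1.0 / tau_r / (1.0 + np.exp(-beta * I) / (gamma * tau_r))

def fit_transfer(I, nu, tau_r):
    # log(1/nu - tau_r) = -beta * I - log(gamma): a linear regression in I.
    y = np.log(1.0 / np.asarray(nu) - tau_r)
    slope, intercept = np.polyfit(I, y, 1)
    return np.exp(-intercept), -slope          # gamma, beta

tau_r, gamma_true, beta_true = 4e-3, 8808.0, 2.044e9   # values listed in Table 1
I = np.linspace(-5e-9, 1e-9, 10)
nu = predicted_rate(I, gamma_true, beta_true, tau_r)   # stand-in for measured rates
gamma_fit, beta_fit = fit_transfer(I, nu, tau_r)
print(f"gamma = {gamma_fit:.1f} Hz, beta = {beta_fit:.3e} 1/A")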


the following two important questions naturally arise: how accu- those obtained through the histogram constructed with the sam-
rately does the I&F neuron-based sampler outlined above sample pled events. To construct this histogram, each spike was extended
from a target Boltzmann distribution? How well does it perform to form a box of length r (as illustrated in Figure 3), the spiking
in comparison to an exact sampler, such as the Gibbs sampler? To activity was sampled at 1 kHz, and the occurrences of all the pos-
answer these questions we sample from several neural RBMs con- sible 2^10 states of the random vector z were counted. We added 1
sisting of five visible and five hidden units for randomly drawn to the number of occurrences of each state to avoid zero probabil-
weight and bias parameters. At these small dimensions, the proba- ities. The histogram obtained from a representative run is shown
bilities associated to all possible values of the random vector z can in Figure 4 (left).
be computed exactly. These probabilities are then compared to A common measure of similarity between two distributions p
and q is the KL divergence: D(p||q) = \sum_i p_i \log(p_i / q_i).

If the distributions p and q are identical then D(p||q) = 0, other-


wise D(p||q) > 0. The right panel of Figure 4 shows D(p||Pexact )
as a function of sampling duration, for distributions p obtained
from three different samplers: the abstract neuron based sam-
pler with alpha PSPs (PNS,Abstract ), the I&F neuron-based sampler
(PNS ), and the Gibbs sampler (PGibbs ).
In the case of the I&F neuron-based sampler, the average KL
divergence for 48 randomly drawn distributions after 1000 s of
sampling time was 0.059 0.049. This result is not significantly
FIGURE 3 | Neural Sampling in an RBM consisting of 10 stochastic I&F different if the abstract neuron model Equation (1) with alpha
neurons, with five neurons in each layer. Each neuron is associated to a PSPs is used (average KL divergence 0.10 0.049), and in both
binary random variable which take values 1 during a refractory period r
after the neuron has spiked (gray shadings). The variables are sampled at
cases the KL divergence did not tend to zero as the number
1 kHz to produce binary vectors that correspond to samples of the joint of samples increased. The only difference in the latter neuron
distribution p(z). In this figure, only the membrane potential and the model compared to the abstract neuron model of Buesing et al.
samples produced by the first five neurons are shown. The vectors inside (2011), which tends to zero when sampling time tends to infin-
the brackets are example samples of the marginalized distribution
ity, is the PSP model. This indicates that the discrepancy is largely
p(z1 , z2 , z3 , z4 , z5 ) produced at the time indicated by the vertical lines. In
the RBM, there are no recurrent connections within a layer.
due to the use of alpha-PSPs, rather than the approximation of
Equation (1) with I&F neurons.

FIGURE 4 | (Left) Example probability distribution obtained by neural (PGibbs ), which is the common choice for RBMs. For comparison with the
sampling of the RBM of Figure 3. The bars are marginal probabilities neural sampler, we identified the duration of one Gibbs sampling iteration
computed by counting the events [00000], [00001], . . . , [11110], [11111], with one refractory period r = 4 ms. The plot shows that up to 104 ms, the
respectively. PNS is the distribution obtained by neural sampling and P is the two methods are comparable. After this, the KL divergence of the neural
exact probability distribution computed with Equation (4). (Right) The degree sampler tends to a plateau due to the fact that neural sampling with our I&F
to which the sampled distribution resembles the target distribution is neural network is approximate. In both figures, PNS, Abstract refers to the
quantified by the KL divergence measured across 48 different distributions, marginal probability distribution obtained by using the abstract neuron model
and the shadings correspond to its standard deviation. This plot also shows Equation (1). In this case, the KL divergence is not significantly different from
the KL divergence of the target distribution sampled by Gibbs Sampling the one obtained with the I&F neuron model-based sampler.
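The comparison of section 2.2 can be reproduced schematically as follows: spikes are extended into boxes of length tau_r, the resulting binary vectors are sampled at 1 kHz, the 2^N state occurrences are counted with add-one smoothing, and the histogram is compared to a target distribution with the KL divergence. The spike times and the uniform target below are placeholders; only the bookkeeping is meant to match the procedure described in the text.

# Illustrative bookkeeping for the validation of section 2.2.
import numpy as np

def states_from_spikes(spike_times, n_neurons, T, tau_r=4e-3, fs=1000.0):
    # spike_times: one array of spike times (in seconds) per neuron.
    t_grid = np.arange(0.0, T, 1.0 / fs)
    z = np.zeros((len(t_grid), n_neurons), dtype=int)
    for i, times in enumerate(spike_times):
        for t in times:
            z[(t_grid >= t) & (t_grid < t + tau_r), i] = 1   # box of length tau_r
    return z

def state_histogram(z):
    # Add-one-smoothed probability of each of the 2^N binary states.
    n = z.shape[1]
    codes = z @ (1 << np.arange(n))                # encode each state as an integer
    counts = np.bincount(codes, minlength=2 ** n) + 1.0
    return counts / counts.sum()

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

spikes = [np.array([0.010, 0.050]), np.array([0.012]), np.array([0.030, 0.055])]
z = states_from_spikes(spikes, n_neurons=3, T=0.1)
p_ns = state_histogram(z)
p_target = np.full(2 ** 3, 1.0 / 2 ** 3)           # placeholder target distribution
print("KL(P_NS || P_target) =", kl_divergence(p_ns, p_target))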


The standard sampling procedure used in RBMs is Gibbs population of class neurons associated to the same label that has
Sampling: the neurons in the visible layer are sampled simultane- the highest population firing rate.
ously given the activities of the hidden neurons, then the hidden To reconstruct a digit from a class label, the class neurons
neurons are sampled given the activities of the visible neurons. belonging to a given digit are clamped to a high firing rate. For
This procedure is iterated a number of times. For comparison testing the discrimination performance of an energy-based model
with the neural sampler, the duration of one Gibbs sampling iter- such as the RBM, it is common to compute the free-energy F(vc )
ation is identified with one refractory period r = 4 ms. At this of the class units (Haykin, 1998), defined as:
scale, we observe that the speed of convergence of the neural sam- 
pler is similar to that of the Gibbs sampler up to 104 ms, after exp(F(vc )) = exp(E(vd , vc , h)), (11)
which the neural sampler plateaus above the D(p||q) = 102 line. vd ,h
Despite the approximations in the neuron model and the synapse
model, these results show that in RBMs of this size, the neural Table 1 | List of parameters used in the software simulations.a
sampler consisting of I&F neurons sample from a distribution
that has the same KL divergence as the distribution obtained after bias Mean firing rate of All figures 1000 Hz
bias Poisson spike
104 iterations of Gibbs sampling, which is more than the typical
train
number of iterations used for MNIST hand-written digit tasks in
Noise amplitude All figures, except Figure 1 3 1011 A/s0.5
the literature (Hinton et al., 2006).
Figure 1 (left) 2 1011 A/s0.5
Figure 1 (right) 3 1010 A/s0.5
2.3. NEURAL ARCHITECTURE FOR LEARNING A MODEL OF MNIST
Figure 1 (bottom) 1 109 A/s0.5
HAND-WRITTEN DIGITS
Exponential factor All figures 2.044 109 A1
We test the performance of the neural RBM in a digit recognition (fit)
task. We use the MNIST database, whose data samples consist Baseline firing rate All figures 8808 Hz
of centered, gray-scale, 28 28-pixel images of hand-written (fit)
digits 09 (LeCun et al., 1998). The neural RBMs network r Refractory period All figures 4 ms
architecture consisted of two layers, as illustrated in Figure 5. syn Time constant of All figures 4 ms
The visible layer was partitioned into 784 sensory neurons (vd ) recurrent, and bias
and 40 class label neurons (vc ) for supervised learning. The synapses
pixel values of the digits were discretized to two values, with br Burn-in time of the All figures 10 ms
low intensity pixel values (p ≤ 0.5) mapped to 10^-5 and high neural sampling
intensity values (p > 0.5) mapped to 0.98. A neuron i in d gL Leak conductance All figures 1 nS
stimulated each neuron i in layer v, with synaptic currents fi such urst Reset potential All figures 0V
that P(i = 1) = (fi )r = pi , where 0 pi 1 is the value of C Membrane All figures 1012 F
pixel i. The value fi is calculated by inverting capacitance
the transfer function

of the neuron: fi = 1 (s) = log ssr 1 . Using this RBM, Firing threshold All figures 100 mV
W RBM weight matrix Figure 4 N(0.75, 1.5)
classification is performed by choosing the most likely label given ( RN Nh )
the input, under the learned model. This amounts to choosing the b , bh RBM bias for layer Figure 4 N(1.5, 0.5)
and h
N , Nh Number of visible Figure 4 5, 5
and hidden units
in the RBM Figures 7, 8, 7 824, 500
Nc Number of class Figures 7, 8, 7 40
label units
2T Epoch duration Figures 4, 7, 8 100 ms
Figure 9 300 ms
Tsim Simulation time Figure 2 5s
Figure 4 1000 s
Figure 7 0.2 s
Figure 9 0.85 s
Figure 8 (testing) 1.0 s
Figure 8 (learning) 2000 s
STDP Learning time Figure 7 4 ms
FIGURE 5 | The RBM network consists of a visible and a hidden layer. window
The visible layer is partitioned into 784 sensory neurons (vd ) and 40 class
Learning rate Standard CD 0.1 102
label neurons (vc ) for supervised learning. During data presentation, the
activities in the visible layer are driven by a data layer d, consisting of a digit Event-driven CD 3.2 102
and its label (1 neuron per label). In the RBM, the weight matrix between
a Software simulation scripts are available online (https://github.com/
the visible layer and the hidden layer is symmetric.
eneftci/eCD).


and selecting vc such that the free-energy is minimized. The The main result of this paper is an online variation of the CD
spiking neural network is simulated using the BRIAN simula- rule for implementation in neuromorphic hardware. By virtue
tor (Goodman and Brette, 2008). All the parameters used in the of neural sampling the spikes generated from the visible and
simulations are provided in Table 1. hidden units can be used to compute the statistics of the prob-
ability distributions online (further details on neural sampling
3. RESULTS in the Materials and Methods section 2.1). Therefore a possi-
3.1. EVENT-DRIVEN CONTRASTIVE DIVERGENCE ble neural mechanism for implementing CD is to use synapses
A Restricted Boltzmann Machine (RBM) is a stochastic neural whose weights are governed by synaptic plasticity. Because the
network consisting of two symmetrically interconnected layers spikes cause the weight to update in an online, and asynchronous
composed of neuron-like unitsa set of visible units v and a set fashion, we refer to this rule as event-driven CD.
of hidden units h, but has no connections within a layer. The weight update in event-driven CD is a modulated, pair-
The training of RBMs commonly proceeds in two phases. At based STDP rule:
first the states of the visible units are clamped to a given vec- d
tor from the training set, then the states of the hidden units qij = g(t) STDPij (i (t), hj (t)) (13)
dt
are sampled. In a second reconstruction phase, the network is
allowed to run freely. Using the statistics collected during sam- where g(t) R is a zero-mean global gating signal controlling
pling, the weights are updated in a way that they maximize the data vs. reconstruction phase, qij is the weight of the synapse
the likelihood of the data (Hinton, 2002). Collecting equilib- and i (t) and hj (t) refer to the spike trains of neurons i and hj ,
rium statistics over the data distribution in the reconstruction defined as in Equation (7).
phase is often computationally prohibitive. The CD algorithm As opposed to the standard CD rule, weights are updated after
has been proposed to mitigate this (Hinton, 2002; Hinton and every occurrence of a pre-synaptic and post-synaptic event. While
Salakhutdinov, 2006): the reconstruction of the visible units this online approach slightly differentiates it from standard CD, it
activity is achieved by sampling them conditioned on the values is integral to a spiking neuromorphic framework where the data
of the hidden units (Figure 6). This procedure can be repeated samples and weight updates cannot be stored. The weight update
k times (the rule is then called CDk ), but relatively good con- is governed by a symmetric STDP rule with a symmetric temporal
vergence is obtained for the equilibrium distribution even for window K(t) = K(t), t:
one iteration. The CD learning rule is summarized as follows:
STDPij (i (t), hj (t)) = i (t)Ahj (t) + hj (t)Ai (t),
 t
wij = ( i hj
data i hj
recon ), (12)
Ahj (t) = A dsK(t s)hj (s), (14)
where i and hj are the activities in the visible and hidden lay- t
ers, respectively. This rule can be interpreted as a difference Ai (t) = A dsK(s t)i (s),

of Hebbian and anti-Hebbian learning rules between the visi-
ble and hidden neurons sampled in the data and reconstruc- with A > 0 defining the magnitude of the weight updates. In
tion phases. In practice, when the data set is very large, weight our implementation, updates are additive and weights can change
updates are calculated using a subset of data samples, or mini- polarity.
batches. The above rule can then be interpreted as a stochas-
tic gradient descent (Robbins and Monro, 1951). Although 3.1.1. Pairwise STDP with a global modulatory signal
the convergence properties of the CD rule are the subject of approximates CD
continuing investigation, extensive software simulations show The modulatory signal g(t) switches the behavior of the synapse
that the rule often converges to very good solutions (Hinton, from LTP to LTD (i.e., Hebbian to Anti-Hebbian). The temporal
2002). average of g(t) must vanish to balance LTP and LTD, and must

FIGURE 6 | The standard Contrastive Divergence (CD)k procedure, weight update follows a STDP rule modulated by a zero mean signal g(t). This
compared to event-driven CD. (A) In standard CD, learning proceeds iteratively signal switches the behavior of the synapse from Long-Term Potentiation (LTP)
by sampling in construction and reconstruction phases (Hinton, 2002), to Long-Term Depression (LTD), and partitions the training into two phases
which is impractical in a continuous-time dynamical system. (B) We propose a analogous to those of the original CD rule. The spikes cause microscopic
spiking neural sampling architecture that folds these updates on a continuous weight modifications, which on average behave as the macroscopic CD weight
time dimension through the recurrent activity of the network. The synaptic update. For this reason, the learning rule is referred to as event-driven CD.
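A compact way to see how Equations (13)-(15) operate is to process the visible and hidden spikes event by event: each spike reads out an exponentially filtered trace of the other layer's recent spikes (the symmetric STDP window) and adds a weight change whose sign is set by the global signal g(t). In the sketch below tau_STDP, the epoch length 2T, and the burn-in time follow Table 1, while the update amplitude A and the toy spike trains are arbitrary placeholders.

# Illustrative event-driven CD update, Equations (13)-(15).
import numpy as np

def g_signal(t, T=0.05, t_br=0.01):
    # Equation (15): +1 during (t_br, T) of each epoch, -1 during (T + t_br, 2T), 0 otherwise.
    phase = t % (2 * T)
    if t_br < phase < T:
        return 1.0
    if T + t_br < phase < 2 * T:
        return -1.0
    return 0.0

def event_driven_cd(spikes_v, spikes_h, Nv, Nh, A=1e-3, tau_stdp=4e-3, T=0.05, t_br=0.01):
    # spikes_v / spikes_h: lists of (time, neuron index) pairs.
    q = np.zeros((Nv, Nh))
    trace_v, trace_h = np.zeros(Nv), np.zeros(Nh)     # filtered spike trains (STDP window)
    t_prev = 0.0
    events = sorted([(t, 0, i) for t, i in spikes_v] + [(t, 1, j) for t, j in spikes_h])
    for t, layer, k in events:
        decay = np.exp(-(t - t_prev) / tau_stdp)
        trace_v *= decay
        trace_h *= decay
        t_prev = t
        if layer == 0:                                 # visible spike pairs with recent hidden spikes
            q[k, :] += g_signal(t, T, t_br) * A * trace_h
            trace_v[k] += 1.0
        else:                                          # hidden spike pairs with recent visible spikes
            q[:, k] += g_signal(t, T, t_br) * A * trace_v
            trace_h[k] += 1.0
    return q

sv = [(0.012, 0), (0.020, 1), (0.062, 0)]              # toy spike trains
sh = [(0.013, 1), (0.021, 0), (0.063, 1)]
print(event_driven_cd(sv, sh, Nv=2, Nh=2))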


vary on much slower time scales than the typical times scale of we can write (up to a negligible error Kempter et al., 2001)
the network dynamics, denoted br , so that the network samples

from its stationary distribution when the weights are updated.
i (t)Ahj (t)
td = A i+ h + dK(). (18)
The time constant br corresponds to a burn-in time of MCMC j
0
sampling and depends on the overall network dynamics and can-
not be computed in the general case. However, it is reasonable In the uncorrelated case, the second term in Cij contributes the
to assume br to be in the order of a few refractory periods of the same amount, leading to:
neurons (Buesing et al., 2011). In this work, we used the following
modulation function g(t): Cij = i+ h +
j .


1 if mod(t, 2T) (br , T) br 
with := 2A T 2T 0 dK(). Similar arguments apply to the
g(t) = 1 if mod(t, 2T) (T + br , 2T) , (15) averages in the time interval tr :


0 otherwise

where mod is the modulo function and T is a time interval. Rij = 2A dK() i (t)hj (t )
tr = i h
j .
0
The data is presented during the time intervals (2iT, (2i + 1)T),
where i is a positive integer. With the g(t) defined above, no with i h
j := i (t)
tr hj (t )
tr . The average update in
weight update is undertaken during a fixed period br . This allows
(0, 2T) then becomes:
us to neglect the transients after the stimulus is turned on and
off (respectively in the beginning of the data and reconstruction  
d
phases). In this case and under further assumptions discussed qij = i+ h + i h
j j . (19)
dt (0,2T)
below, the event-driven CD rule can be directly compared with
standard CD as we now demonstrate. The average weight update
during (0, 2T) is: According to Equation (18), any symmetric temporal win-
dow that is much shorter than T can be used. For sim-
  plicity, we choose an exponential temporal window K() =
d
qij = Cij + Rij , exp(|/STDP |) with decay rate STDP T (Figure 6B). In this
dt (0,2T)
case, = 2A T2T STDP .
br

T br The modulatory function g(t) partitions the training into


Cij = ( i (t)Ahj (t)
td + hj (t)Ai (t)
td )
2T epochs of duration 2T. Each epoch consists of a LTP phase during
T br which the data is presented (construction), followed by a free-
Rij = ( i (t)Ahj (t)
tr + hj (t)Ai (t)
tr ), running LTD phase (reconstruction). The weights are updated
2T
(16) asynchronously during the time interval in which the neural
sampling proceeds, and Equation (19) tells us that its average
where td = (br , T) and tr = (T + br , 2T) denote the intervals resembles Equation (12). However, it is different in two ways:
during the positive and negative phases of g(t), and
(a,b) = the averages are taken over one data and reconstruction phase
1
b rather than a mini-batch of data samples and their reconstruc-
b a a dt. tions; and more importantly, the synaptic weights are updated
We write the first average in Cij as follows:
during the data and the reconstruction phase, whereas in the CD
 T  t rule, updates are carried out at the end of the reconstruction
1
i (t)Ahj (t)
td = A dt dsK(t s)i (t)hj (s), phase. In the derivation above the effect of the weight change on
T br br the network during an epoch 2T was neglected for mathematical
 T  simplicity. In the following, we verify that despite this approxima-
1
=A dt dK()i (t)hj (t ), tion, the event-driven CD performs nearly as well as standard CD
T br br 0
 in the context of a common benchmark task.
=A dK() i (t)hj (t )
td .
0 3.2. LEARNING A GENERATIVE MODEL OF HAND-WRITTEN DIGITS
(17) We train the RBM to learn a generative model of the MNIST
handwritten digits using event-driven CD (see section 2.3 for
If the spike times are uncorrelated the temporal averages become details). For training, 20,000 digits selected randomly (with
a product of the average firing rates of a pair of visible and hidden repetition) from a training set consisting of 10,000 digits were
neurons (Gerstner and Kistler, 2002): presented in sequence, with an equal number of samples for each
digit.
i (t)hj (t )
td = i (t)
td hj (t )
td =: i+ h +
j . The raster plots in Figure 7 show the spiking activity of each
layer before and after learning for epochs of duration 100 ms. The
If we choose a temporal window that is much smaller than T, and top panel shows the population-averaged weight. After training,
assume the network activity is stationary in the interval (br , T), the sum of the upwards and downward excursions of the average
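The averaging argument above can be checked numerically for a single synapse driven by independent Poisson spike trains: accumulating the gated, pairwise STDP updates over many epochs and dividing by the elapsed time should approach epsilon * (nu_v+ nu_h+ - nu_v- nu_h-), with epsilon = 2 A tau_STDP (T - t_br) / (2T). The script below is such a check under our own choice of rates and epoch count; agreement is only expected up to boundary effects of order tau_STDP / (T - t_br), since spike pairs straddling the phase edges are ignored here.

# Numerical check of Equation (19) for one synapse and independent Poisson inputs.
import numpy as np

rng = np.random.default_rng(1)
A, tau_stdp, T, t_br = 1e-3, 4e-3, 0.05, 0.01       # A arbitrary; others follow Table 1

def poisson_train(rate, lo, hi):
    return np.sort(rng.uniform(lo, hi, rng.poisson(rate * (hi - lo))))

def causal_pairs(post, pre):
    # Each post spike reads an exponentially filtered trace of earlier pre spikes (Equation 14).
    return sum(np.sum(np.exp(-(t - pre[pre < t]) / tau_stdp)) for t in post)

def average_update(nu_data, nu_recon, n_epochs=2000):
    dq = 0.0
    for e in range(n_epochs):
        t0 = 2 * T * e
        for (nu_v, nu_h), sign, (lo, hi) in [(nu_data, +1.0, (t0 + t_br, t0 + T)),
                                             (nu_recon, -1.0, (t0 + T + t_br, t0 + 2 * T))]:
            sv, sh = poisson_train(nu_v, lo, hi), poisson_train(nu_h, lo, hi)
            dq += sign * A * (causal_pairs(sv, sh) + causal_pairs(sh, sv))
    return dq / (n_epochs * 2 * T)

nu_data, nu_recon = (40.0, 30.0), (25.0, 10.0)       # arbitrary test rates (Hz)
eps = 2 * A * tau_stdp * (T - t_br) / (2 * T)
print("measured :", average_update(nu_data, nu_recon))
print("predicted:", eps * (nu_data[0] * nu_data[1] - nu_recon[0] * nu_recon[1]))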


FIGURE 7 | The spiking neural network learns a generative model of the bias, and Id is the data. The timing of the clamping (Id ) and g differ due to an
MNIST dataset using the event-driven CD procedure. (A) Learning curve, interval br where no weight update is undertaken to avoid the transients
shown here up to 10, 000 samples. (B) Details of the training procedure, (see section 2). Before learning and during the reconstruction phase, the
before and after training (20,000 samples). During the first half of each 0.1 s activity of the visible layer is random. But as learning progresses, the activity
epoch, the visible layer v is driven by the sensory layer, and the gating in the visible layer reflects the presented data in the reconstruction phase.
variable g is 1, meaning that the synapses undergo LTP. During the second This is very well visible in the layer class label neurons vc , whose activity
half of each epoch, the sensory stimulus is removed, and g is set to 1, so persists after the sensory stimulus is removed. Although the firing rates of
the synapses undergo LTD. The top panels of both figures show the mean of the hidden layer neurons before training is high (average 113 Hz), this is only a
the entries of the weight matrix. The second panel shows the values of the reflection of the initial conditions for the recurrent couplings W . In fact, at the
modulatory signal g(t). The third panel shows the synaptic currents of a end of the training, the firing rates in both layers becomes much sparser
visible neuron, where Ih is caused by the feedback from the hidden and the (average 9.31 Hz).

weight is much smaller than before training, because the learn- that the network learned a model that is consistent with the
ing is near convergence. The second panel shows the value of the mathematical description of the RBM.
modulatory signal g(t). The third panel shows the input current In an energy-based model like the RBM the free-energy min-
(Id ) and the current caused by the recurrent couplings (Ih ). imization should give the upper bound on the discrimination
Two methods can be used to estimate the overall recognition performance (Haykin, 1998). For this reason, the fact that the
accuracy of the neural RBM. The first is to sample: the visible recognition accuracy is higher when sampling as opposed to using
layer is clamped to the digit only (i.e., d ), and the network is the free-energy method may appear puzzling. However, this is
run for 1s. The known label is then compared with the posi- possible because the neural RBM does not exactly sample from
tion of the group of class neurons that fired at the highest rate. the Boltzmann distribution, as explained in section 2.2. This
The second method is to minimize free-energy: the neural RBMs suggests that event-driven CD compensates for the discrepancy
parameters are extracted, and for each data sample, the class label between the distribution sampled by the neural RBM and the
with the lowest free-energy (see section 2) is compared with the Boltzmann distribution, by learning a model that is tailored to
known label. In both cases, recognition was tested for 1000 data the spiking neural network.
samples that were not used during the training. The results are Excessively long training durations can be impractical for
summarized in Figure 8. real-time neuromorphic systems. Fortunately, the learning using
As a reference we provide the best performance achieved using event-driven CD is fast: Compared to the off-line RBM train-
the standard CD and one unit per class label (Nc = 10) (Figure 8, ing (250, 000 presentations, in mini-batches of 100 samples) the
table row 1), 93.6%. By mapping these parameters to the event-driven CD training succeeded with a smaller number of
neural sampler, the recognition accuracy reached 92.6%. The dis- data presentations (20, 000), which corresponded to 2000 s of
crepancy is expected since the neural sampler does not exactly simulated time. This suggests that the training durations are
sample from the target Boltzmann distribution (see section 2.2). achievable for real-time neuromorphic systems.
When training a neural RBM of I&F neurons using event-
driven CD, the recognition result was 91.9% (Figure 8, table 3.2.1. The choice of the number of class neurons Nc
row 2). The performance of this RBM obtained by minimizing its Event-driven CD underperformed in the case of 1 neuron per
free-energy was 90.8%. The learned parameters performed well class label (Nc = 10), which is the common choice for standard
for classification using the free-energy calculation which suggests CD and Gibbs sampling. This is because a single neuron firing
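The sampling-based readout described above reduces to counting spikes per class group. Below is a minimal sketch, assuming 4 class neurons per label (40 in total) and a list of recorded (time, neuron index) spikes from the class population during the test window; the spike list itself is a made-up placeholder.

# Illustrative readout: the label whose group of class neurons fired most wins.
import numpy as np

def readout(class_spikes, n_labels=10, units_per_label=4):
    counts = np.zeros(n_labels * units_per_label)
    for _, idx in class_spikes:
        counts[idx] += 1
    group_rates = counts.reshape(n_labels, units_per_label).sum(axis=1)
    return int(np.argmax(group_rates))

spikes = [(0.11, 13), (0.20, 12), (0.35, 14), (0.42, 13), (0.77, 5)]   # placeholder spikes
print("predicted digit:", readout(spikes))                             # group 3 is most active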


FIGURE 8 | To test recognition accuracy, the trained RBMs are sampled event-driven CD discretized to 8 and 5 bits. In all scenarios, the accuracy after
using the I&F neuron-based sampler for up to 1 s. The classification is 50 ms of sampling was above 80% and after 1 s the accuracies typically
read out by identifying the group of class label neurons that had the highest reached their peak at around 92%. The dashed horizontal lines show the
activity. This experiment is run for RBM parameter sets obtained by standard recognition accuracy obtained by minimizing the free-energy (see text). The
CD (black, CD) and event-driven CD (green, eCD). To test for robustness to fact that the eCD curve (solid green) surpasses its free-energy line suggests
finite precision weights, the RBM was run with parameters obtained by that a model that is tailored to the I&F spiking neural network was learned.

at its maximum rate of 250 Hz cannot efficiently drive the rest of bits, it degrades more substantially to 89.4%. In both cases, the
the network without tending to induce spike-to-spike correlations RBM still retains its discriminative power, which is encouraging
(e.g., synchrony), which is incompatible with the assumptions for implementation in hardware neuromorphic systems.
made for sampling with I&F neurons and event-driven CD. As
a consequence, the generative properties of the neural RBM 3.3. GENERATIVE PROPERTIES OF THE RBM
degrade. This problem is avoided by using several neurons per We test the neural RBM as a generative model of the MNIST
class label (in our case four neurons per class label) because the dataset of handwritten digits, using parameters obtained by run-
synaptic weight can be much lower to achieve the same effect, ning the event-driven CD. The RBMs generative property enables
resulting in smaller spike-to-spike correlations. it to classify and generate digits, as well as to infer digits by com-
bining partial evidence. These features are clearly illustrated in
3.2.2. Neural parameters with finite precision the following experiment (Figure 9). First the digit 3 is presented
In hardware systems, the parameters related to the weights and (i.e., layer d is driven by layer d) and the correct class label in
biases cannot be set with floating-point precision, as can be done vc activated. Second, the neurons associated to class label 5 are
in a digital computer. In current neuromorphic implementations clamped, and the network generated its learned version of the
the synaptic weights can be configured at precisions of about digit. Third, the right-half part of a digit 8 is presented, and the
8 bits (Yu et al., 2012). We characterize the impact of finite- class neurons are stimulated such that only 3 or 6 are able to acti-
precision synaptic weights on performance by discretizing the vate (the other class neurons are inhibited, indicated by the gray
weight and bias parameters to 8 bits and 5 bits. The set of possi- shading). Because the stimulus is inconsistent with 6, the network
ble weights were spaced uniformly in the interval ( 4.5, + settled to 3 and reconstructed the left part of the digit.
4.5), where , are the mean and the standard deviation of The latter part of the experiment illustrates the integration of
the parameters across the network, respectively. The classifica- information between several partially specified cues, which is of
tion performance of MNIST digits degraded gracefully. In the 8 interest for solving sensorimotor transformation or multi-modal
bit case, it degrades only slightly to 91.6%, but in the case of 5 sensory cue integration problems (Deneve et al., 2001; Doya


FIGURE 9 | The recurrent structure of the network allows it to classify, right-half part of a digit 8 is presented, and the class neurons are stimulated
reconstruct and infer from partial evidence. (A) Raster plot of an such that only 3 or 6 can activate (all others are strongly inhibited as indicated
experiment illustrating these features. Before time 0s, the neural RBM runs by the gray shading). Because the stimulus is inconsistent with 6, the
freely, with no input. Due to the stochasticity in the network, the activity network settles to a 3 and attempts to reconstruct it. The top figures show
wanders from attractor to attractor. At time 0s, the digit 3 is presented (i.e., the digits reconstructed in layer d . (B) Digits 09, reconstructed in the same
layer d is driven by d), activating the correct class label in c ; At time manner. The columns correspond to clamping digits 09, and each is
t = 0.3 s, the class neurons associated to 5 are clamped to high activity and different, independent run. (C) Population firing rate of the experiment
the rest of the class label neurons are strongly inhibited, driving the network presented in (A). The network activity is typically at equilibrium after about
to reconstruct its version of the digit in layer d ; At time t = 0.6 s, the 10r = 40 ms (black bar).

et al., 2006; Corneil et al., 2012). This feature has been used for auditory-visual sensory fusion in a spiking Deep Belief Network (DBN) model (O'Connor et al., 2013). There, the authors trained a DBN with visual and auditory data, which learned to associate the two sensory modalities, very similarly to how class labels and visual data are associated in our architecture. Their network was able to resolve a similar ambiguity as in our experiment in Figure 9, but using auditory inputs instead of a class label.

During digit generation, the trained network had a tendency to be globally bistable, whereby the layer vd completely deactivated layer h. Since all the interactions between vd and vc take place through the hidden layer, vc could not reconstruct the digit. To avoid this, we added populations of I&F neurons that were wired to layers vd and h, respectively. The parameters of these neurons and their couplings were tuned such that each layer was strongly excited when its average firing rate fell below 5 Hz.

4. DISCUSSION
Neuromorphic systems are promising alternatives for large-scale implementations of RBMs and deep networks, but the common procedure used to train such networks, Contrastive Divergence (CD), involves iterative, discrete-time updates that do not straightforwardly map on a neural substrate. We solve this problem in the context of the RBM with a spiking neural network model that uses the recurrent network dynamics to compute these updates in a continuous-time fashion. We argue that the recurrent activity coupled with STDP dynamics implements an event-driven variant of CD. Event-driven CD enables the system to learn on-line, while being able to carry out functionally relevant tasks such as recognition, data generation and cue integration.

The CD algorithm can be used to learn the parameters of probability distributions other than the Boltzmann distribution (even those without any symmetry assumptions). Our choice for the RBM, whose underlying probability distribution is a special case of the Boltzmann distribution, is motivated by the following facts: they are universal approximators of discrete distributions (Le Roux and Bengio, 2008); the conditions under which a spiking neural circuit can naturally perform MCMC sampling of a Boltzmann distribution were previously studied (Merolla et al., 2010; Buesing et al., 2011); and RBMs form the building blocks of many deep learning models such as DBNs, which achieve state-of-the-art performance in many machine learning tasks (Bengio, 2009). The ability to implement RBMs with spiking neurons and train them using event-based CD paves the way toward on-line training of DBNs of spiking neurons (Hinton et al., 2006).
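For reference, the discrete-time CD-1 update that event-driven CD approximates can be written in a few lines. The sketch below is a minimal NumPy implementation of standard CD-1 for a Bernoulli RBM; it is not the authors' spiking, event-driven rule, and the function and variable names are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_v, b_h, v_data, lr=0.01, rng=None):
    """One standard CD-1 step for a Bernoulli RBM on a batch of visible vectors."""
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden activations driven by the data.
    h_pos = sigmoid(v_data @ W + b_h)
    h_samp = (rng.random(h_pos.shape) < h_pos).astype(float)
    # Negative phase: one reconstruction step (v', then h').
    v_neg = sigmoid(h_samp @ W.T + b_v)
    h_neg = sigmoid(v_neg @ W + b_h)
    # Contrastive Divergence update: data statistics minus reconstruction statistics.
    n = v_data.shape[0]
    W   += lr * (v_data.T @ h_pos - v_neg.T @ h_neg) / n
    b_v += lr * (v_data - v_neg).mean(axis=0)
    b_h += lr * (h_pos - h_neg).mean(axis=0)
    return W, b_v, b_h
```

Event-driven CD replaces the two distinct phases with a continuously running spiking network and a global modulation signal, but the quantity being estimated, the difference between data-driven and reconstruction-driven correlations, is the same.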
We chose the MNIST handwritten digit task as a benchmark for testing our model. When the RBM was trained with standard CD, it could recognize up to 926 out of 1000 of out-of-training samples. The MNIST handwritten digit recognition task was previously shown in a digital neuromorphic chip (Arthur et al., 2012), which performed at 89% accuracy, and in a software simulated visual cortex model (Eliasmith et al., 2012). However, both implementations were configured using weights trained off-line. A recent article showed the mapping of off-line trained DBNs onto spiking neural networks (O'Connor et al., 2013). Their results demonstrated hand-written digit recognition using neuromorphic event-based sensors as a source of input spikes. Their performance reached up to 94.1% using leaky I&F neurons. The


use of an additional layer explains to a large extent their better performance compared to ours (91.9%). Our work extends (O'Connor et al., 2013) with on-line training that is based on synaptic plasticity, testing its robustness to finite weight precision, and providing an interpretation of spiking activity in terms of neural sampling.

To achieve the computations necessary for sampling from the RBM, we have used a neural sampling framework (Fiser et al., 2010), where each spike is interpreted as a sample of an underlying probability distribution. Buesing et al. proved that abstract neuron models consistent with the behavior of biological spiking neurons can perform MCMC, and have applied it to a basic learning task in a fully visible Boltzmann Machine. We extended the neural sampling framework in three ways: First, we identified the conditions under which a dynamical system consisting of I&F neurons can perform neural sampling; Second, we verified that the sampling of RBMs was robust to finite-precision parameters; Third, we demonstrated learning in a Boltzmann Machine with hidden units using STDP synapses.

In neural sampling, neurons behave stochastically. This behavior can be achieved in I&F neurons using noisy input currents, created by a Poisson spike train. Spike trains with Poisson-like statistics can be generated with no additional source of noise, for example by the following mechanisms: balanced excitatory and inhibitory connections (van Vreeswijk and Sompolinsky, 1996), finite-size effects in a large network, and neural mismatch (Amit and Brunel, 1997). The latter mechanism is particularly appealing, because it benefits from fabrication mismatch and operating noise inherent to neuromorphic implementations (Chicca and Fusi, 2001).

Other groups have also proposed to use I&F neuron models for computing the Boltzmann distribution. Merolla et al. (2010) have shown that the activation function of noisy I&F neurons is approximately a sigmoid as required by the Boltzmann machine, and have devised a scheme whereby a global inhibitory rhythm drives the network to generate samples of the Boltzmann distribution. O'Connor et al. (2013) have demonstrated a deep belief network of I&F neurons that was trained off-line using standard CD and tested it using the MNIST database. Independently and simultaneously to this work, Petrovici et al. (2013) demonstrated that conductance-based I&F neurons in a noisy environment are compatible with neural sampling as described in Buesing et al. (2011). Similarly, Petrovici et al. find that the choice of non-rectangular PSPs and the approximations made by the I&F neurons are not critical to the performance of the neural sampler. Our work extends all of those above by providing an online, STDP-based learning rule to train RBMs sampled using I&F neurons.

4.1. APPLICABILITY TO NEUROMORPHIC HARDWARE
Neuromorphic systems are sensitive to fabrication mismatch and operating noise. Fortunately, the mismatch in the synaptic weights and in the activation function parameters is not an issue if the biases and the weights are learned, and the functionality of the RBM is robust to small variations in the weights caused by discretization. These two findings are encouraging for neuromorphic implementations of RBMs. However, at least two conceptual problems of the presented RBM architecture must be solved in order to implement such systems on a larger scale. First, the symmetry condition required by the RBM does not necessarily hold. In a neuromorphic device, the symmetry condition is impossible to guarantee if the synapse weights are stored locally at each neuron. Sharing one synapse circuit per pair of neurons can solve this problem. This may be impractical due to the very large number of synapse circuits in the network, but may be less problematic when using Resistive Random-Access Memory (RRAM) (also called memristor) crossbar arrays to emulate synapses (Kuzum et al., 2011; Cruz-Albrecht et al., 2013; Serrano-Gotarredona et al., 2013). RRAMs are a new class of nanoscale devices whose current-voltage relationship depends on the history of other electrical quantities (Strukov et al., 2008), and so act like programmable resistors. Because they can conduct currents in both directions, one RRAM circuit can be shared between a pair of neurons. A second problem is the number of recurrent connections. Even our RBM of modest dimensions involved almost two million synapses, which is impractical in terms of bandwidth and weight storage. Even if a very high number of weights are zero, the connections between each pair of neurons must exist in order for a synapse to learn such weights. One possible solution is to impose sparse connectivity between the layers (Murray and Kreutz-Delgado, 2007; Tang and Eliasmith, 2010) and implement synaptic connectivity in a scalable hierarchical address-event routing architecture (Joshi et al., 2010; Park et al., 2012).

4.2. OUTLOOK: A CUSTOM LEARNING RULE
Our method combines I&F neurons that perform neural sampling and the CD rule. Although we showed that this leads to a functional model, we do not know whether event-driven CD is optimal in any sense. This is partly due to the fact that CDk is an approximate rule (Hinton, 2002), and it is still not entirely understood why it performs so well, despite extensive work in studying its convergence properties (Carreira-Perpinan and Hinton, 2005). Furthermore, the distribution sampled by the I&F neurons does not exactly correspond to the Boltzmann distribution, and the average weight updates in event-driven CD differ from those of standard CD, because in the latter they are carried out at the end of the reconstruction step.

A very attractive alternative is to derive a custom synaptic plasticity rule that minimizes some functionally relevant quantity (such as the Kullback-Leibler divergence or Contrastive Divergence), given the encoding of the information in the I&F neuron (Deneve, 2008; Brea et al., 2013). A similar idea was recently pursued in Brea et al. (2013), where the authors derived a triplet-based synaptic learning rule that minimizes an upper bound of the Kullback-Leibler divergence between the model and the data distributions. Interestingly, their rule had a similar global signal that modulates the learning rule, as in event-driven CD, although the nature of this resemblance remains to be explored. Such custom learning rules can be very beneficial in guiding the design of on-chip plasticity in neuromorphic VLSI and RRAM nanotechnologies, and will be the focus of future research.

ACKNOWLEDGMENTS
This work was partially funded by the National Science Foundation (NSF EFRI-1137279, CCF-1317560), the Office of


Naval Research (ONR MURI 14-13-1-0205), and the Swiss National Science Foundation (PA00P2_142058).

REFERENCES
Gerstner, W., and Kistler, W. (2002). Spiking Neuron Models. Single Neurons, Populations, Plasticity. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511815706
Goodman, D., and Brette, R. (2008). Brian: a simulator for spiking neural networks in Python. Front. Neuroinform. 2:5. doi: 10.3389/neuro.11.005.2008
Amit, D., and Brunel, N. (1997). Model of global spontaneous activity and local Griffiths, T., Chater, N., Kemp, C., Perfors, A., and Tenenbaum, J. B. (2010).
structured activity during delay periods in the cerebral cortex. Cereb. Cortex 7, Probabilistic models of cognition: exploring representations and inductive
237252. doi: 10.1093/cercor/7.3.237 biases. Trends Cogn. Sci. 14, 357364. doi: 10.1016/j.tics.2010.05.004
Arthur, J., Merolla, P., Akopyan, F., Alvarez, R., Cassidy, A., Chandra, S., Haykin, S. (1998). Neural Networks: A Comprehensive Foundation. 2nd Edn.
et al. (2012). Building block of a programmable neuromorphic sub- Prentice Hall. Available online at: http://www.amazon.com/exec/obidos/
strate: a digital neurosynaptic core, in The 2012 International Joint redirect?tag=citeulike07-20&path=ASIN/0132733501
Conference on Neural Networks (IJCNN) (Brisbane, QLD: IEEE), 18. doi: Hinton, G., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep
10.1109/IJCNN.2012.6252637 belief nets. Neural Comput. 18, 15271554. doi: 10.1162/neco.2006.18.7.1527
Bartolozzi, C., and Indiveri, G. (2007). Synaptic dynamics in analog VLSI. Neural Hinton, G., and Salakhutdinov, R. (2006). Reducing the dimensionality of data with
Comput. 19, 25812603. doi: 10.1162/neco.2007.19.10.2581 neural networks. Science 313, 504507. doi: 10.1126/science.1127647
Bengio, Y. (2009). Learning deep architectures for ai. Found. Trends Mach. Learn. 2, Hinton, G. E. (2002). Training products of experts by minimizing contrastive
1127. doi: 10.1561/2200000006 divergence. Neural Comput. 14, 17711800. doi: 10.1162/089976602760128018
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., et al. Indiveri, G., Linares-Barranco, B., Hamilton, T., van Schaik, A., Etienne-
(2010). Theano: a CPU and GPU math expression compiler, in Proceedings Cummings, R., Delbruck, T., et al. (2011). Neuromorphic silicon neuron
of the Python for Scientific Computing Conference (SciPy). Vol. 4 (Austin, TX). circuits. Front. Neurosci. 5, 123. doi: 10.3389/fnins.2011.00073
Available online at: http://deeplearning.net/software/theano/ Joshi, S., Deiss, S., Arnold, M., Park, J., Yu, T., and Cauwenberghs, G.
Brea, J., Senn, W., and Pfister, J.-P. (2013). Matching recall and storage in (2010). Scalable event routing in hierarchical neural array architecture with
sequence learning with spiking neural networks. J. Neurosci. 33, 95659575. doi: global synaptic connectivity, in 12th International Workshop on Cellular
10.1523/JNEUROSCI.4098-12.2013 Nanoscale Networks and Their Applications (Berkeley, CA: IEEE), 16. doi:
Buesing, L., Bill, J., Nessler, B., and Maass, W. (2011). Neural dynamics as sampling: 10.1109/CNNA.2010.5430296
a model for stochastic computation in recurrent networks of spiking neurons. Kempter, R., Gerstner, W., and Van Hemmen, J. (2001). Intrinsic stabilization of
PLoS Comput. Biol. 7:e1002211. doi: 10.1371/journal.pcbi.1002211 output rates by spike-based hebbian learning. Neural Comput. 13, 27092741.
Carreira-Perpinan, M. A., and Hinton, G. E. (2005). On contrastive doi: 10.1162/089976601317098501
divergence learning. Artif. Intell. Stat. 2005, 17. Available online at: Kuzum, D., Jeyasingh, R. G., Lee, B., and Wong, H.-S. P. (2011). Nanoelectronic
http://www.gatsby.ucl.ac.uk/aistats/AIabst.htm programmable synapses based on phase change materials for brain-inspired
Chicca, E. and Fusi, S. (2001). Stochastic synaptic plasticity in deterministic computing. Nano Lett. 12, 21792186. doi: 10.1021/nl201040y
aVLSI networks of spiking neurons, in Proceedings of the World Congress on Le, Q. V., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G. S., et al.
Neuroinformatics, ARGESIM Reports, ed. F. Rattay (Vienna: ARGESIM/ASIM (2011). Building high-level features using large scale unsupervised learning.
Verlag), 468477. arXiv preprint: arXiv:1112.6209.
Corneil, D., Sonnleithner, D., Neftci, E., Chicca, E., Cook, M., Indiveri, G., Le Roux, N., and Bengio, Y. (2008). Representational power of restricted boltz-
et al. (2012). Function approximation with uncertainty propagation in a mann machines and deep belief networks. Neural Comput. 20, 16311649. doi:
VLSI spiking neural network, in International Joint Conference on Neural 10.1162/neco.2008.04-07-510
Networks, IJCNN (Brisbane: IEEE), 29902996. doi: 10.1109/IJCNN.2012. LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based
6252780 learning applied to document recognition. Proc. IEEE 86, 22782324. doi:
Cox, D. (1962). Renewal Theory. Vol. 1. London: Methuen. 10.1109/5.726791
Cruz-Albrecht, J. M., Derosier, T., and Srinivasa, N. (2013). A scalable neural chip Liu, S.-C., and Delbruck, T. (2010). Neuromorphic sensory systems. Curr. Opin.
with synaptic electronics using cmos integrated memristors. Nanotechnology 24, Neurobiol. 20, 288295. doi: 10.1016/j.conb.2010.03.007
384011. doi: 10.1088/0957-4484/24/38/384011 Mead, C. (1989). Analog VLSI and Neural Systems. Reading, MA: Addison-Wesley.
Deiss, S., Douglas, R., and Whatley, A. (1998). A pulse-coded communications doi: 10.1007/978-1-4613-1639-8
infrastructure for neuromorphic systems, chapter 6, in Pulsed Neural Networks, Merolla, P., Ursell, T., and Arthur, J. (2010). The thermodynamic temperature of
eds W. Maass and C. Bishop (Cambridge, MA: MIT Press), 157178. a rhythmic spiking network. CoRR. abs/1009.5473, ArXiv e-prints. Available
Deneve, S. (2008). Bayesian spiking neurons I: inference. Neural Comput. 20, online at: http://arxiv.org/abs/1009.5473
91117. doi: 10.1162/neco.2008.20.1.91 Murray, J. F., and Kreutz-Delgado, K. (2007). Visual recognition and inference
Deneve, S., Latham, P., and Pouget, A. (2001). Efficient computation and cue using dynamic over complete sparse learning. Neural Comput. 19, 23012352.
integration with noisy population codes. Nature Neurosci. 4, 826831. doi: doi: 10.1162/neco.2007.19.9.2301
10.1038/90541 Neftci, E., Binas, J., Rutishauser, U., Chicca, E., Indiveri, G., and Douglas, R. J.
Destexhe, A., Mainen, Z., and Sejnowski, T. (1998). Kinetic models of synaptic (2013). Synthesizing cognition in neuromorphic electronic systems. Proc. Natl.
transmission , in Methods in Neuronal Modelling, from Ions to Networks, eds C. Acad. Sci. U.S.A. 110, E3468E3476. doi: 10.1073/pnas.1212083110
Koch and I. Segev (Cambridge, MA: MIT Press), 125. Neftci, E., Toth, B., Indiveri, G., and Abarbanel, H. (2012). Dynamic state and
Doya, K., Ishii, S., Pouget, A., and Rao, R. (2006). Bayesian Brain Probabilistic parameter estimation applied to neuromorphic systems. Neural Comput. 24,
Approaches to Neural Coding. Cambridge, MA: MIT Press. doi: 10.7551/ 16691694. doi: 10.1162/NECO_a_00293
mitpress/9780262042383.001.0001 OConnor, P., Neil, D., Liu, S.-C., Delbruck, T., and Pfeiffer, M. (2013). Real-
Eliasmith, C., Stewart, T., Choo, X., Bekolay, T., DeWolf, T., Tang, Y., et al. (2012). time classification and sensor fusion with a spiking deep belief network. Front.
A large-scale model of the functioning brain. Science 338, 12021205. doi: Neurosci. 7:178. doi: 10.3389/fnins.2013.00178
10.1126/science.1225266 Park, J., Yu, T., Maier, C., Joshi, S., and Cauwenberghs, G. (2012). Live
Fiser, J., Berkes, P., Orbn, G., and Lengyel, M. (2010). Statistically optimal per- demonstration: Hierarchical address-event routing architecture for
ception and learning: from behavior to neural representations: perceptual reconfigurable large scale neuromorphic systems, in Circuits and
learning, motor learning, and automaticity. Trends Cogn. Sci. 14, 119. doi: Systems (ISCAS), 2012 IEEE International Symposium on, (Seoul), 707,
10.1016/j.tics.2010.01.003 711, 2023. doi: 10.1109/ISCAS.2012.6272133. Available online at:
Fusi, S., and Mattia, M. (1999). Collective behavior of networks with lin- http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6272133
ear (VLSI) integrate and fire neurons. Neural Comput. 11, 633652. doi: Pedroni, B., Das, S., Neftci, E., Kreutz-Delgado, K., and Cauwenberghs, G. (2013).
10.1162/089976699300016601 Neuromorphic adaptations of restricted boltzmann machines and deep belief
Gardiner, C. W. (2012). Handbook of Stochastic Methods. Berlin: Springer. doi: networks, in International Joint Conference on Neural Networks, IJCNN. (Dallas,
10.1007/978-3-662-02377-8 TX).


Petrovici, M. A., Bill, J., Bytschok, I., Schemmel, J., and Meier, K. (2013). Stochastic van Vreeswijk, C., and Sompolinsky, H. (1996). Chaos in neuronal networks
inference with deterministic spiking neurons. arXiv preprint: arXiv:1311. with balanced excitatory and inhibitory activity. Science 274, 17241726. doi:
3211. 10.1126/science.274.5293.1724
Plesser, H. E., and Gerstner, W. (2000). Noise in integrate-and-fire neurons: Yu, T., Park, J., Joshi, S., Maier, C., and Cauwenberghs, G. (2012) 65k-neuron
from stochastic input to escape rates. Neural Comput. 12, 367384. doi: integrate-and-fire array transceiver with address-event reconfigurable synaptic
10.1162/089976600300015835 routing, in Biomedical Circuits and Systems Conference (BioCAS), IEEE,
Renart, A., Song, P., and Wang, X.-J. (2003). Robust spatial working mem- (Hsinch), 21, 24. 2830. doi: 10.1109/BioCAS.2012.6418479. Available online
ory through homeostatic synaptic scaling in heterogeneous cortical networks. at: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6418479
Neuron 38, 473485. doi: 10.1016/S0896-6273(03)00255-1
Robbins, H., and Monro, S. (1951). A stochastic approximation method. Ann. Conflict of Interest Statement: The authors declare that the research was con-
Math. Stat. 22, 400407. doi: 10.1214/aoms/1177729586 ducted in the absence of any commercial or financial relationships that could be
Schemmel, J., Brderle, D., Grbl, A., Hock, M., Meier, K., and Millner, S. (2010). construed as a potential conflict of interest.
A wafer-scale neuromorphic hardware system for large-scale neural model-
ing, in International Symposium on Circuits and Systems, ISCAS (Paris: IEEE), Received: 07 October 2013; accepted: 22 December 2013; published online: 30 January
19471950. doi: 10.1109/ISCAS.2010.5536970 2014.
Serrano-Gotarredona, T., Masquelier, T., Prodromakis, T., Indiveri, G., and Linares- Citation: Neftci E, Das S, Pedroni B, Kreutz-Delgado K and Cauwenberghs G
Barranco, B. (2013). Stdp and stdp variations with memristors for spiking neu- (2014) Event-driven contrastive divergence for spiking neuromorphic systems. Front.
romorphic learning systems. Front. Neurosci. 7:2. doi: 10.3389/fnins.2013.00002 Neurosci. 7:272. doi: 10.3389/fnins.2013.00272
Silver, R., Boahen, K., Grillner, S., Kopell, N., and Olsen, K. (2007). Neurotech for This article was submitted to Neuromorphic Engineering, a section of the journal
neuroscience: unifying concepts, organizing principles, and emerging tools. J. Frontiers in Neuroscience.
Neurosci. 27, 11807. doi: 10.1523/JNEUROSCI.3575-07.2007 Copyright 2014 Neftci, Das, Pedroni, Kreutz-Delgado and Cauwenberghs. This
Strukov, D. B., Snider, G. S., Stewart, D. R., and Williams, R. S. (2008). The missing is an open-access article distributed under the terms of the Creative Commons
memristor found. Nature 453, 8083. doi: 10.1038/nature06932 Attribution License (CC BY). The use, distribution or reproduction in other forums
Tang, Y. and Eliasmith, C. (2010). Deep networks for robust visual is permitted, provided the original author(s) or licensor are credited and that the orig-
recognition, in Proceedings of the 27th International Conference on inal publication in this journal is cited, in accordance with accepted academic practice.
Machine Learning (ICML-10) (Haifa), 10551062. Available online at: No use, distribution or reproduction is permitted which does not comply with these
http://www.icml2010.org/papers/370.pdf terms.



ORIGINAL RESEARCH ARTICLE
published: 07 May 2014
doi: 10.3389/fnins.2014.00086

Compiling probabilistic, bio-inspired circuits on a field programmable analog array
Bo Marr 1* and Jennifer Hasler 2
1 Raytheon, Space and Airborne Systems, Manhattan Beach, CA, USA
2 Georgia Institute of Technology, Atlanta, GA, USA

Edited by: Tobi Delbruck, University of Zurich and ETH Zurich, Switzerland
Reviewed by: Emre O. Neftci, ETH Zurich, Switzerland; Davide Badoni, University Tor Vergata, Italy
*Correspondence: Bo Marr, Raytheon, Space and Airborne Systems, Manhattan Beach, 2000 E. El Segundo Blvd, El Segundo, CA 90245, USA. e-mail: [email protected]

A field programmable analog array (FPAA) is presented as an energy and computational efficiency engine: a mixed mode processor for which functions can be compiled at significantly less energy costs using probabilistic computing circuits. More specifically, it will be shown that the core computation of any dynamical system can be computed on the FPAA at significantly less energy per operation than a digital implementation. A stochastic system that is dynamically controllable via voltage controlled amplifier and comparator thresholds is implemented, which computes Bernoulli random variables. From Bernoulli variables it is shown that exponentially distributed random variables, and random variables of an arbitrary distribution, can be computed. The Gillespie algorithm is simulated to show the utility of this system by calculating the trajectory of a biological system computed stochastically with this probabilistic hardware, where over a 127X performance improvement over current software approaches is shown. The relevance of this approach is extended to any dynamical system. The initial circuits and ideas for this work were generated at the 2008 Telluride Neuromorphic Workshop.

Keywords: FPAA, probabilistic hardware, reconfigurable analog, dynamical system, bio-inspired, hardware accelerator, biological computational model, probability theory

1. INTRODUCTION
Due to the large computational efficiency gap that is theorized between classic digital computing and neuromorphic style computing, particularly in biological systems, this work seeks to explore the potential efficiency gains of a neuromorphic approach using stochastic circuits to solve dynamical systems.

There is wide demand for a technology to compute dynamical systems much more efficiently, with some recent examples being to calculate quantum equations to aid in the development of quantum computers, or in the search for new meta-materials and pharmaceuticals using high computational throughput search methods for new meta-compounds. Standard digital computers, even super computers, have proven to be inefficient at these tasks, limiting our ability to innovate here.

Stochastic functions can be computed in a more efficient way if these stochastic operations are done natively in probabilistic hardware. Neural connections in the cortex of the brain are just such an example and occur on a stochastic basis (Douglas, 2008). The neurotransmitter release is probabilistic in regards to synapse firings (Goldberg et al., 2001) in neural communication. Most germane to this work, many chemical and biological reactions occur on a stochastic basis and are modeled here via probabilistic circuits compiled on a reconfigurable field programmable analog array (Gillespie, 1976).

Gillespie effectively showed that molecular reactions occur probabilistically and gave a method for translating a system of N chemical equations, normally specified by deterministic differential equations, into a system of probabilistic, Markov processes. This will be the dynamic system described and computed herein.

It has been shown that in a system with a sufficiently small number of molecules, the stochastic method is more accurate than its deterministic counterpart. It was further proven that in the thermodynamic limit (large number of molecules) of such a system, the deterministic and stochastic forms are mathematically equivalent (Oppenheim et al., 1969; Kurtz, 1972).

Mathematical results will be extended from Gillespie's algorithm to show that any dynamical system can be computed using a system of stochastic equations. The efficiency in computing such a system is greatly increased by a direct, probabilistic hardware implementation that can be done in the analog co-processor we present. However, how to express a general dynamical system stochastically will not be discussed, only that a formulation exists that is more efficient when computed with natively probabilistic hardware.

Several encryption algorithms and other on-chip solutions for a uniformly random number generator in hardware have been shown for microprocessors (Ohba et al., 2006). Generating static Bernoulli trials, where a 1 is generated with fixed probability p and 0 is generated with fixed probability 1 − p, was proposed using amplified thermal noise across digital gates and is also not a novel concept, but the work in which this concept was described was not fabricated, measured, or applied in hardware, and only existed in theory (Chakrapani et al., 2006). Static probabilities, or those with a fixed p-value, will not allow the performance gains that are possible in many stochastic systems, because without dynamic p-values stochastic processes cannot be fully realized in hardware.

There has been a hardware solution proposed for dynamic Bernoulli trial generators, where the probability p can be dynamically reconfigured via reprogramming a floating gate


transistor (Xu et al., 1972). While this latter work illustrates a note-worthy solution and is a unique implementation to provide dynamic probability adjustment, there is an overhead cost in terms of time to readjust the probability p due to the nature of floating gate programming, which was not designed in that work for the continuous updates that are required for the applications presented here.

The topic of generating stochastic variables with hardware circuits has also been addressed previously in Genov and Cauwenberghs (2001), but not in this manner. We make a contribution to the literature by showing that not only can we produce stochastic variables, but we can tune the probability of these stochastic variables in real time through a software controllable input to the Bernoulli trial generator circuit via the FPAA. Further, we show how an array of these can be compiled on hardware and where outputs are input to a priority encoder to create an exponentially distributed random variable for the first time known to the authors. Finally, this paper shows how the FPAA, with tunable stochastic variables which can be dynamically tuned in real time, results in significant performance gains of 127X for computing dynamical systems.

In short, this paper will present several novel contributions including:
- A novel circuit for fast dynamic Bernoulli random number generation.
- A compiled chaos circuit to generate environment independent probabilistic variables.
- A novel circuit for fast dynamic exponentially distributed random number generation.
- Analysis of the performance gains provided by the latter circuits over current methods for Gillespie's algorithm that apply to many biological applications and stochastic applications in general.
- Extension of the latter methods for applicability to any dynamical system and the result that all dynamic systems calculated stochastically require exponentially distributed random numbers.
- A method for going from concept to circuit measurements in 2 weeks using a novel reconfigurable chipset developed in part by the authors.

Section 2 introduces the technology behind building probabilistic function generators in hardware using thermal noise characteristics. Section 3 reviews implementation of dynamical systems in general and specifically Gillespie's Algorithm to give a context for why these circuits are important. Section 4 will review the chipset that was built in part by the authors and how it can be used for faster stochastic computation. Section 5 will discuss the hardware results and experimental measurements. Section 6 will conclude the paper and discuss future directions.

2. EXTREMELY EFFICIENT STOCHASTIC CIRCUITS
Stochastic computation has been shown to be a powerful tool to compute solutions to systems that would otherwise require complex continuous-time differential equations. However, the efficacy of stochastic methods is lost if complex computations are needed to produce the digital representation of these stochastic results. We present several circuits to solve these issues that can uniquely be compiled in analog technology, and thus on our FPAA co-processor.

2.1. A PROGRAMMABLE THERMAL NOISE CIRCUIT
Thermal noise is a well-defined, well-behaved phenomenon that we show can be used as a computational resource within the FPAA fabric. This work will show that it can be used not only as a resource for random number generation, but for the generation of arbitrarily complex probabilistic functions.

The current through a transistor, and hence the voltage at the drain or source node of a transistor, shows the probabilistic thermal noise effect as shown in Figure 1, and can be used as the basic building block of any probabilistic function generator.

Thermal noise present in integrated circuits, also known as Johnson noise, is generated by the natural thermal excitation of the electrons in a circuit. When modeled on a capacitive load, the root-mean-square voltage level of the thermal noise is given by U_T = √(kT/C), where k is Boltzmann's constant, T is temperature, and C is the capacitance of the load. The likelihood of the voltage level of thermal noise is modeled as a Gaussian probability function and has an equivalent magnitude throughout its frequency spectrum and is thus known as white Gaussian noise (Kish, 2002).

To take advantage of the well-defined properties of thermal noise trapped on a capacitor, the circuit in Figure 2A was developed. This circuit can be broken down into the circuit components seen in Figure 2B.

Thermal noise voltage on a 0.35 μm process, which is the process size used for testing in this paper, with capacitor values in the range of C = 500 fF, has an RMS noise level of roughly 100 μV. Even with a 100 mV supply, the thermal noise voltage that would cause a probabilistic digital bit flip is 1000× down from supply, giving the probabilistic function designer limited options. Hence, in these experiments the thermal voltage signal was routed through two operational transconductance amplifiers (OTA) with gain Ai. These circuits are shown in Figure 3.
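As a quick check on the numbers quoted above, the kT/C expression can be evaluated directly; the values below (room temperature of 300 K, the 500 fF capacitor mentioned in the text) are assumptions consistent with the description.

```python
from math import sqrt

k_B = 1.380649e-23   # Boltzmann's constant, J/K
T   = 300.0          # assumed room temperature, K
C   = 500e-15        # capacitor value quoted above, 500 fF

v_rms = sqrt(k_B * T / C)           # U_T = sqrt(kT/C)
print(f"{v_rms * 1e6:.0f} uV RMS")  # ~91 uV, i.e. roughly 100 uV as stated
```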
By routing the amplified thermal noise signal through a comparator, and then comparing this signal to a user selectable voltage signal, a Bernoulli trial is created where the comparator outputs

FIGURE 1 | (A) The noise characteristics generated by a single transistor can be used as a well-defined noise source. The current through, and thus the voltage at, the drain or source node of the transistor produces noise due to the thermal agitation of charge carriers. (B) Voltage measured from a transistor on a 0.35 μm test chip.


a digital 1 if the amplified thermal voltage signal is greater than the probability select and a 0 is output otherwise. A probability p can be set for the Bernoulli trial by setting the input voltage to the comparator such that it is less or more likely for a randomly varying thermal voltage to surpass this value, Probability Select. The integral from the input voltage to Vdd of the thermal noise function is the probability of a digital 1, and the probability of a digital 0 is the integral from ground to the input voltage. This concept is illustrated in Figure 4.

Note that the reconfigurable FPAA used to program this circuit has floating-gate controllable current biases such that they can be used to offset temperature effects.

A Bernoulli random variable is generated with a probability p dynamically selectable by the user using these techniques. A useful property of Bernoulli trials is that, when used in large numbers (as the number of Bernoulli trials N → ∞), they can be used to create an arbitrary probability distribution. This phenomenon is illustrated in Figure 5.

An exponential random variable is generated from a Bernoulli variable in the following way. X is defined here as the number of Bernoulli trials needed to produce a success, and this variable X is exponentially distributed. For example, to require six coin flips to produce a heads is exponentially less likely than to require two flips to get a head, since this is nothing more than a standard geometric sequence. The shape of this exponential distribution is controlled by the probability p of the Bernoulli trials.

Pr(X = k) = (1 − p)^(k−1) p    (1)

Figure 6 shows how these Bernoulli random variables are used to create an exponentially distributed number by being placed as inputs to a priority encoder. Recall that a priority encoder works by encoding the output to represent in binary which input was the first in priority order (from top to bottom, for example) to be a 1. Also recall from Equation (1) that the number of Bernoulli trials needed to get a 1 is exponentially distributed. The top Bernoulli trial in the figure is considered our first trial, the second from the top our second trial, etc. So the priority encoder in Figure 6 is encoding for us how many trials are needed to get a success (1), exactly our exponential distribution.
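Equation (1) is the geometric distribution, the discrete counterpart of the exponential. A short simulation (illustrative code, not the hardware measurement) confirms that counting trials until the first success, as the priority encoder does, reproduces it.

```python
import numpy as np

def trials_until_success(p, rng):
    """Count Bernoulli trials until the first 1, as the priority encoder does."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

rng = np.random.default_rng(0)
p = 0.3
samples = np.array([trials_until_success(p, rng) for _ in range(100_000)])

# Empirical Pr(X = k) against (1 - p)**(k - 1) * p from Equation (1)
for k in range(1, 6):
    print(k, round(float(np.mean(samples == k)), 4), round((1 - p) ** (k - 1) * p, 4))
```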

Using these methods, an exponentially distributed random number can be generated in two clock cycles, a vast improvement

FIGURE 2 | Circuit used to take advantage of the well-defined thermal noise properties on a capacitor, where the root-mean-square (RMS) voltage of the noise on a capacitor is Vn = √(kT/C). The thermal noise on this capacitor is used as the gate voltage for PMOS and NMOS transistors where a current with the same spectrum as the thermal noise is generated. This ultimately produces a voltage controlled current through the diode-connected transistor.

FIGURE 3 | Circuits for generating a Bernoulli random variable (1 with probability p and 0 with probability 1 − p). (A) Dynamic Bernoulli Probability Circuit. Thermal noise is trapped on the capacitor, amplified twice through two operational transconductance amplifiers (OTA) with gain of Ai, then put through a comparator with a probability select input. Note that these three OTAs are the same basic circuit programmed to different functionality. (B) Nine-transistor OTA used in (A) for the amplifier and comparator circuits.

FIGURE 4 | The probability distribution function of thermal noise and the probability of generating a P(1) or P(0) in the Bernoulli variable generating circuit. The probability of a 1 is the integral under the probability distribution function of the thermal noise from the comparator voltage to the supply voltage of the circuit, Vdd.


FIGURE 5 | A Bernoulli probability trial can be used to generate an arbitrary probability distribution with the correct transfer function. This result will
be taken advantage of to greatly increase the performance of dynamical systems in this paper.

FIGURE 6 | Illustration of how to transform N Bernoulli trials into an exponential distribution. Bernoulli trials put through a priority encoder as inputs result in an exponentially distributed probability function, where the shape of the exponential function can be tuned through the threshold inputs to the Bernoulli trials.

over software and other current methods, which will be explained in the next section.

2.2. PROGRAMMING BERNOULLI TRIALS AT TELLURIDE WORKSHOP
All of the above circuits from the previous section were conceived of, designed, built, measured, and compiled in about 2 weeks at the 2008 Telluride Neuromorphic workshop. This accomplishment is both a testament to the productivity that the Telluride Neuromorphic workshop allows as well as a testament to the quick prototyping capability of the FPAA.

The Telluride Neuromorphic workshop is unique in that some of the most brilliant minds in the country get together for an extended several week session dedicated to teaching, working with real hardware, and producing results such that students are completely immersed in a world class engineering environment, day and night, for 2–3 weeks.

The FPAA is analogous to the FPGA for logic circuits in that circuits can be conceived of, programmed onto the chip, and measured in the same day. The FPAA has a mature toolset where an analog designer can conceive of a circuit, such as during a Telluride lecture session on analog design, and can simply create a netlist. The FPAA tools automatically take this netlist and optimally compile it to the fabric using a bitstream to program the on-board floating gate devices to set switches allowing networks of active and passive devices, set current sources, bias currents, and amplifier characteristics, and calibrate out device mismatch. A standard multi-meter is connected to the FPAA test board via built-for-test pinned out circuit leads. The multi-meter in this instance was connected back to the computer via GPIB that was producing the netlist, to allow a full hardware-in-the-loop programmable environment. Current toolsets are even more advanced, allowing Simulink and other Simulink-like tools to build the circuit netlists.

2.3. TEMPERATURE INVARIANT BERNOULLI TRIALS
The thermal noise circuits used to create Bernoulli trials shown in the previous section have the well known side effect that their accuracy is highly dependent on temperature. And although methods such as adjusting bias currents with temperature are available on the FPAA, we present a temperature invariant method here to address this potential variability in the Bernoulli trials presented previously. These chaos circuits were built as follow-up work to the Telluride workshop.

Chaos circuits were chosen to exemplify a more temperature invariant method. The model and explanation for the low-power chaos circuit used in this paper is first presented in Dudek and Juncu (2005).

A chaos circuit works by introducing a seed value to a non-linear chaos map circuit which is itself chaotic. The sample and hold circuit then captures a continuous voltage value for consumption by a stochastic algorithm. The chaos map from Dudek and Juncu (2005) was chosen because of its proven results, but also because it only requires nine transistors and is extremely energy efficient.

The resulting chaos map with a tunable control voltage to dictate the probability characteristics is shown in Figure 7.

While further reading may be needed to understand the chaos circuit map shown in Figure 7, this map is very close to the results expected as shown in the literature. The general idea is that a given output voltage will result in a random assignment to the chaos map, allowing us to generate random variables in a temperature invariant way. The idea is that this chaos map could be used in place of the thermal noise circuits should the designer be concerned about temperature.

These circuits all have something in common: they can be used to directly compute a stochastic function, cannot be compiled on a digital chip, and compute more efficiently than a digital system.

Next we show the usefulness of these circuits in a dynamical system.

3. GILLESPIE'S ALGORITHM FOR STOCHASTIC COMPUTATION
The previous findings are used to generate the results of a chemical and biological system using Gillespie's algorithm in this


FIGURE 7 | Measured y(x) voltage vs. Vc for the chaos map circuit compiled on the FPAA.

section. This section will also review the expense to calculate the trajectory of stochastic systems in software as a comparison. Gillespie's algorithm is a natively probabilistic algorithm that takes advantage of the naturally stochastic trajectories of molecular reactions (Gillespie, 1976); this algorithm is described below.

3.1. GILLESPIE'S ALGORITHM
1. Initialize. Set the initial number of each type of molecule in the system and time, t = 0.
2. For each reaction i, calculate the propensity function to obtain parameter value, ai.
3. For each i, generate a reaction time τi according to an exponential distribution with parameter ai.
4. Let μ be the reaction whose time τμ is least.
5. Change the number of molecules to reflect execution of reaction μ. Set t = t + τμ.
6. If initialized time or reaction constraints are met, finished. If not, go to step 2.

We use complexity analysis, or big-Oh analysis, to analyze the algorithms here, where O(x) gives an expression, x, that represents the worst case running time of the algorithm. Only algorithmic improvements in software have been made to computing Gillespie's algorithm, until this work, such as Gibson et al., who have improved the running time of the algorithm from O(Er) to O(r + E log r), where E is the number of reaction events in the trajectory and r is the number of different reaction types (Gibson and Bruck, 1998). Several orders of magnitude improvement in energy efficiency and performance can be realized by computing the exponentially distributed random variable τi in hardware. Note that the big-Oh function does not change, just the way we implement this algorithm is much improved.

The generation of the exponentially distributed random number is the bottleneck of the algorithm, and the computational complexity of each step is calculated to show this. The metric used to judge computational cost is the number of clock cycles it takes to do a given calculation on a modern general purpose CPU as described in Patterson and Hennessy (2004).

A load instruction is used to initialize a variable in Step 1. In the best case, with a multiple data fetch scheme such as in the cell processor, this requires a single computational step. The propensity function ai in Step 2 is calculated by a floating point multiplication (FPMUL), which takes five computational steps in a modern processor per reaction (Gillespie, 1976; Patterson and Hennessy, 2004). All r reactions, assuming FPMUL units are available, take 5r total computational steps. In Step 4, finding the minimum reaction time takes r − 1 compare operations. Assuming ALU (compare) units are available, Step 4 takes r − 1 computational steps. Step 5 involves r − 1 integer addition/subtraction operations, taking again r − 1 computational steps. Step 6 is an update in the program counter, resulting in a single step. Step 3 is a key step where each τi is generated according to an exponential distribution. Generating an exponentially distributed random number is complex and deserves a bit more treatment.

This is believed to be the first method to generate an exponentially distributed random number in hardware, and the random numbers generated by other current methods are only pseudo-random. The Park–Miller algorithm on a modern advanced processor is the best known software method, where a uniformly pseudo-random number is generated in 88 computational steps (Park and Miller, 1998; Patterson and Hennessy, 2004). The equation to transform a uniformly distributed random variable U to one with an exponential distribution E with parameter λ is shown in Equation (2).

E = −ln(U) / λ    (2)

The natural logarithm function, ln, is extremely expensive, and even in the best case of computing this on a modern digital signal processor (DSP) it takes 136 computational steps by itself according to Yang et al. (2002). Thus, counting the FP multiply and the FP divide taking 5 steps and 32 steps, respectively (Patterson and Hennessy, 2004), it takes a total of 261 computational steps to generate a single exponentially distributed pseudo-random variable in software. Thus Step 3 alone takes 261r computational steps to generate τi for all i reactions. The number of computational steps for each part of the algorithm is summarized below.

3.2. COMPUTATIONAL STEPS IN GILLESPIE'S ALGORITHM
Algorithmic step                     Computational steps
(1) Initialize.                      1
(2) Multiply to find each ai.        5r
(3) Generate each τi.                261r
(4) Find τμ.                         r − 1
(5) Update.                          r − 1
(6) Go to step 2.                    1

Thus for a conservative value of r = 2, generating each exponentially distributed τi in Step 3 takes approximately 98% of the computational steps for a single iteration of Gillespie's algorithm. Seen in this light, the problem of improving exponential random variable generation becomes quite an important one.
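A compact software version of the algorithm in section 3.1 makes this cost structure explicit: every iteration draws one exponentially distributed reaction time per reaction, which is exactly the work the hardware generator removes. The sketch below is illustrative only, and the reaction network and rate constants in the example are invented for demonstration.

```python
import numpy as np

def gillespie_first_reaction(x0, stoich, propensities, t_end, rng):
    """First-reaction SSA: draw tau_i ~ Exp(a_i) per reaction, fire the smallest."""
    t, x = 0.0, np.array(x0, dtype=float)
    trajectory = [(t, x.copy())]
    while t < t_end:
        a = propensities(x)                            # Step 2: propensity a_i per reaction
        if a.sum() == 0.0:
            break
        taus = np.full(len(a), np.inf)
        taus[a > 0] = rng.exponential(1.0 / a[a > 0])  # Step 3: exponential reaction times
        mu = int(np.argmin(taus))                      # Step 4: reaction with the least time
        x = x + stoich[mu]                             # Step 5: update molecule counts
        t += taus[mu]                                  #         and advance time
        trajectory.append((t, x.copy()))
    return trajectory

# Toy reversible reaction A <-> B with assumed rate constants 0.5 and 0.3
rng = np.random.default_rng(1)
stoich = [np.array([-1.0, 1.0]), np.array([1.0, -1.0])]
propensities = lambda x: np.array([0.5 * x[0], 0.3 * x[1]])
traj = gillespie_first_reaction([100, 0], stoich, propensities, t_end=10.0, rng=rng)
```

The comparison, update, and bookkeeping steps are cheap; the exponential draws in Step 3 are the part the analog Bernoulli/priority-encoder array is intended to replace.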


3.3. EXPANSION TO ANY DYNAMICAL SYSTEM
It has been shown in Gillespie (1976) and Kurtz (1972) that the trajectory of a chemical system consisting of N different reactant concentrations can be expressed as both a system of deterministic differential equations and a system of stochastic processes. For deeper understanding of Equations (3–5) that follow, the reader is encouraged to read these aforementioned references. This concept can be generalized to any dynamical system where the time evolution of a system of N state variables has been described in a classical, state-based deterministic model as

dx_1/dt = f_1(x_1, x_2, x_3, ...)
dx_2/dt = f_2(x_1, x_2, x_3, ...)
...

and in general as:

\dot{X} = F(X)    (3)

This system can also be expressed as a stochastic, Markov process. For the stochastic system, the probability that state variable x_μ is updated during the next time interval is assigned, P(τ, μ)dτ. More formally, this is the joint probability density function at time t expressing the probability that the next state update will occur in the differential time interval (t + τ, t + τ + dτ) and that this update will occur to state variable x_μ, for μ = 1 . . . N and 0 ≤ τ < ∞.

Given the probability P_0(τ) that no state-space update occurs in the interval (t, t + τ), and the probability, λ_μ dτ, that an update to state x_μ will occur in the differential time interval (t + τ, t + τ + dτ), we have the general form for the joint probability function:

P(τ, μ) dτ = P_0(τ) λ_μ dτ    (4)

Note that λ_μ is based on the state of the system X. Also note that determining λ_μ is the critical factor in determining the Markov process representation and no general method for this is given here. In the chemical system example, λ_μ is the probability that reaction R_μ is going to occur in the differential time interval and is a function of the number of each type of molecule currently in the system. The probability that more than one state update will occur during the differential time interval is shown to be small and thus ignored (Kurtz, 1972; Gillespie, 1976). Finally some function g must be given describing the update to state variable x_μ once an update occurs. We then have the stochastic, Markov process defined for the system:

Pr[X(t + τ + dτ) = G(X(t)) | X(t)] = P(τ, μ)    (5)

Note that this formulation does not make the assumption that the infinitesimal dτ need be approximated by a finite time step Δt, which is a source of error in many Monte Carlo formulations.

To solve this system using computational methods, random numbers are generated according to the probability distributions described by Equation (5). No matter what dynamical system is involved, exponentially distributed random numbers will always be needed. To calculate P_0(τ) from Equation (4), we break the interval (t, t + τ) into K subintervals of equal length Δτ = τ/K, and calculate the probability that no state update occurs in the first Δτ subinterval (t, t + Δτ), which is:

\prod_{i=1}^{N} [1 − λ_i Δτ + o(Δτ)] = 1 − \sum_{i=1}^{N} λ_i Δτ + o(Δτ)    (6)

This probability is equal for every subinterval (t, t + 2Δτ), (t, t + 3Δτ), and so on. Thus the probability P_0(τ) over all K subintervals is:

P_0(τ) = \lim_{K→∞} [1 − \sum_{i=1}^{N} λ_i Δτ + o(Δτ)]^K    (7)
       = \lim_{K→∞} [1 − \sum_{i=1}^{N} λ_i τ/K + o(τ/K)]^K    (8)

where o(Δτ) is the probability that more than one event occurs in the time interval Δτ. Following the analysis in Gillespie (1976), we assume that as our incremental time interval goes to zero our function o(Δτ) → 0 as well. With o(τ/K) → 0 we are left with the probabilistic, exponential function in Equation (9).

P_0(τ) = exp(−\sum_{i=1}^{N} λ_i τ)    (9)

Thus we prove how this work can be extended to any dynamical system.
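The limit in Equations (7–9) can be checked numerically for an arbitrary set of rates; the rate values below are made-up examples, not parameters from the measured system.

```python
import numpy as np

rates = np.array([0.7, 1.2, 0.4])   # example lambda_i values (assumed)
tau = 0.5
total = rates.sum()

for K in (10, 100, 10_000):
    print(K, (1.0 - total * tau / K) ** K)   # finite-K product from Equation (8)

print("limit", np.exp(-total * tau))          # exponential form of Equation (9)
```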
4. RECONFIGURABLE ANALOG HARDWARE FOR STOCHASTIC COMPUTATION
These complex probability functions are generated on a reconfigurable platform, reviewed in this section. More specifically, a dynamic Bernoulli trial generator is illustrated on chip. This novel method involves using the reconfigurable analog signal processor (RASP) chip that was recently introduced (Basu et al., 2008). This device allows one to go from concept to full functionality for analog circuits in a matter of weeks or even days instead of months or years for large-scale, integrated circuits. The design presented went from concept to measured hardware in a matter of 2 weeks. The other useful feature is that many FPGA type architectures allow a designer to build only a subset of the possible circuits available with the RASP. Circuits such as probabilistic function generators could not be produced on strictly digital reconfigurable architectures, although digital designs can be built on the RASP. The RASP chip is shown in Figure 8.

4.1. STOCHASTIC CIRCUIT ARCHITECTURE
The macroarchitecture and details of the algorithmic implementation via Bernoulli trials, and how this is built on the RASP chip, are explored here. The RASP has 32 reconfigurable, computational analog blocks (CABs). The elements of a CAB and the elements that are used in this design are shown in Figure 9.


FIGURE 8 | Micrograph of the Reconfigurable Analog Signal Processor (RASP), also referred to as the Field Programmable Analog Array (FPAA). The circuits were compiled onto this device. Computational Analog Blocks (CABs) populate the chip, where both the computational elements and the routing are configured via floating gates. Buffers, capacitors, transmission gates, NMOS, PMOS, floating gates, current biases, adaptive amplifiers, and analog multiply arrays are all available and fully programmable on the chip. Development is shared by many in the CADSP group at Georgia Tech.

An array of up to 32 Bernoulli trials can be calculated simultaneously on a single RASP device. New versions of the RASP have been fabricated in 350, 130, and 45 nm; an entire family of RASP 2.9 chip variants exists for different application spaces, which allow as much as 10X this number of Bernoulli trials, and this scales with Moore's law. The RASP chipset and accompanying tools also have the ability to be linked together easily for a multi-core RASP chipset should more Bernoulli generators be needed. The RASP chipset is useful for a proof of concept here. Since each Bernoulli generator only takes 30 transistors, many thousands of these circuits could be built in custom hardware if needed.

5. CHIP MEASUREMENTS AND EXPERIMENTAL RESULTS
To gather data, the Probability Select line described in Figure 3A and the output producing the random numbers were routed to the primary inputs/outputs of the chip. A 40-channel, 14-bit digital-to-analog-converter (DAC) chip was interfaced with the RASP chip on a printed circuit board, which we used as our testing apparatus, so that any arbitrary voltage could be input to the Probability Select line. This chip is 40 channel in the sense that there are 40 independent outputs of the DAC. The outputs of the RASP chip were connected to oscilloscope probes so that the noise spectrum and random numbers could be captured.

FIGURE 9 | The typical routing structure and available analog computational elements in one of the 32 computational analog blocks (CABs) present on the RASP chip. The elements used in the present design are highlighted in red. Alternating CABs have NMOS and PMOS transistors as the bottom CAB element, and although only three of the MOSFETs are used in our circuit from Figure 3, instead of the four that are circled, this is meant to show that several different combinations of these four transistors can be used to make the three transistor circuit. The top triangle elements are OTAs showing the two inputs and output being routed back into the fabric, which is a mesh of lines with switches able to connect any two lines that cross in the mesh. The element below the OTAs is a capacitor and the bottom elements circled are NMOS and PMOS available to be routed.

The distribution of the Bernoulli trial circuits was measured in the following way: 2500 random numbers were captured at each Probability Select voltage. The number of successes was divided by the total number of samples captured at each voltage to calculate the probability of a Bernoulli success (probability of randomly generating a 1). The results are shown in Figure 10.

An example of a voltage signal from the Dynamic Bernoulli Probability Circuit producing a digital 1 with probability p = 0.90 is shown in Figure 11. The voltage signal was recorded via on-chip measurement circuits and transmitted to a PC through a USB connection to the chipset.

The array of Bernoulli trials was encoded and the exponential distribution of reaction times, τi, was generated. The resulting distribution is what one would expect and matches a true, exponential distribution as shown in Figure 12.


FIGURE 10 | Measurements from the output of the Dynamic Bernoulli Probability Circuit shown in Figure 3. The Probability Select voltage was adjusted from 0.1 volts up to 1.4 volts and the resulting probability of a digital 1 being produced was recorded with a 95% confidence interval. Measurements were only taken down to p = 0.5 since a Bernoulli trial is symmetric about this value.

FIGURE 11 | Voltage output from the Dynamic Bernoulli Probability Circuit, also called a random number generator (RNG) circuit as labeled in the graph, corresponding to the output of the third (final) OTA circuit from Figure 3. A 1 volt offset was arbitrarily chosen for the comparator, but other voltage offset values were anecdotally observed to have undesired noise effects resulting in spurious switching at the output of the comparator. The noise amplifiers, and thus comparators, were observed to switch at approximately 208 ps as a maximum rate; thus independent Bernoulli variables could be produced at most at this time period in this particular device. A digital 1 is produced with probability p = 0.90 and a 0 with 1 − p = 0.10. When the voltage is above the threshold, Vout > (Vdd − Vss)/2, it is considered a digital 1 and otherwise a 0, where the threshold happens to be 0 volts in this case. The samples in this circuit change much faster than they can be consumed, and thus random samples are taken from the output of this circuit at a slower rate than the rate at which it changes state, preserving randomness.

5.1. VALIDATION OF RANDOMNESS
A probabilistic output and a random output are differing concepts, and the ability to control this difference is the strength of the proposed circuits. They are linked together and defined through Shannon's entropy (Shannon, 1949). Formally, entropy and thus the randomness of a function are defined by H in Equations (10, 11). Let

    Hn = -(1/n) Σ_{i,j,...,s} p(i, j, ..., s) log2 p(i, j, ..., s)        (10)

Then entropy is

    H = lim_{n→∞} Hn        (11)

where p(i, j, ..., s) is the probability of the sequence of symbols i, j, ..., s, and the sum is over all sequences of n symbols.

By the same definition, a function exhibits the most randomness if H is maximized, which occurs when all output sequences are equally likely, or equivalently, if all possible outputs have an equal probability of occurring (Shannon, 1949). From this work, a function is defined as random if all outputs have a uniform probability of occurring. Conversely, we define a function as probabilistic if the function has an entropy 0 < H < log2 n.
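As a concrete single-symbol (n = 1) illustration of Equations (10, 11), the short sketch below evaluates the entropy of a Bernoulli source; the two probability values chosen (p = 0.5, and the p = 0.90 setting of Figure 11) are only examples.

    import math

    def entropy_bits(p):
        # H1 = -sum_x p(x) log2 p(x) for a two-symbol (Bernoulli) source,
        # i.e., the n = 1 case of Equation (10).
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

    print(entropy_bits(0.5))   # 1.0 bit: maximal entropy, a random output
    print(entropy_bits(0.9))   # ~0.469 bits: 0 < H < log2(2), a probabilistic output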
There exist statistical measures of randomness, developed by the National Institute of Standards and Technology (NIST), in the form of a suite consisting of 16 independent tests to measure how random a sequence of numbers truly is (Random Number Generation and Testing, 2014). However, these tests only measure the performance of random functions and not probabilistic ones such as the circuits presented in this work, although random number generation (when p = 0.5) is a subset function of these circuits.

Each of the tests is measured on a scale from 0 to 1, where a passing mark is considered >0.93 and higher marks indicate a higher quality sequence. For a random output p = 0.5, these circuits with thermal noise as the source of randomness passed all but the Overlapping Template Matching Test and the Lempel-Ziv Complexity Test, and even these two tests received high marks >0.80. They also perform consistently better than the software-generated Park-Miller pseudo-random numbers used by most algorithms, which failed half the tests in the suite, with some failing badly <0.10 (Chakrapani et al., 2006).

6. CONCLUSIONS AND FUTURE DIRECTIONS
It was shown in section 3 that to generate an exponentially distributed random variable in software takes a minimum of 261 computational steps with the Park-Miller algorithm, and with the hardware random number generators shown in previous microprocessor works, only uniformly random numbers were available. Bernoulli trials are generated here in hardware with a single computational step, and an exponentially distributed random number is generated with two computational steps.
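To make the operation count concrete, the sketch below shows the inverse-CDF step that maps a uniform word to an exponentially distributed Gillespie reaction time, essentially one logarithm and one divide; treating the 32 parallel p = 0.5 Bernoulli outputs as a single uniform word, and the reaction-rate value used, are assumptions made purely for illustration.

    import numpy as np

    def exponential_from_bits(bits, rate):
        # Interpret a word of p = 0.5 Bernoulli outputs as a uniform u in (0, 1),
        # then apply the inverse-CDF step tau = -ln(u) / rate: one log, one divide.
        weights = 2.0 ** np.arange(len(bits))
        u = (bits @ weights + 1.0) / (2.0 ** len(bits) + 1.0)
        return -np.log(u) / rate

    bits = (np.random.rand(32) < 0.5).astype(int)   # stand-in for the hardware array
    print(exponential_from_bits(bits, rate=2.0))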


Because of the high gain of our amplifiers, the thermal noise distribution used to generate probabilistic distributions with our hardware is extremely sensitive to perturbations such as ambient electrostatic interactions, device variations, and changes in ambient temperature. Environment-invariant chaos circuits were compiled and measured to mitigate these concerns. Because the Bernoulli trial circuits presented here can be controlled via a programmable input signal, software calibration can be done to mitigate these concerns as well.

An estimated performance increase of approximately 130X is realized, based on measured results, to generate exponentially distributed random numbers. With the assumption used in section 3 that generating exponential random variables takes up 98% of the computation time of a single iteration through Gillespie's algorithm, our system could potentially speed up the calculation of the trajectory of this algorithm by approximately 127X.

Further, it was shown that these performance increases via hardware-generated probabilistic distributions can be applied to any dynamical system and possibly have a much wider impact than the field of biological computations.

Such a method to increase computational efficiency by two orders of magnitude is believed to be widely useful in calculating biological, statistical, or quantum mechanical systems. The search for meta-materials, new medicines, or any other host of applications could benefit. Future directions for this specific work include attempting to hardware accelerate quantum algorithms and venturing into the world of software-defined analog radios.

FIGURE 12 | The histogram (and thus a scaled probability distribution function) of the exponentially distributed Gillespie reaction times, τi, generated.

ACKNOWLEDGMENTS
This project was funded in part by the National Science Foundation, NSF award ID 0726969, and the Defense Advanced Research Projects Agency. The authors would also like to thank the organizers and participants of the 2008 Telluride Neuromorphic Workshop and the Institute for Neuromorphic Engineering. Dr. Marr is currently at Raytheon company and Dr. Hasler is a professor at Georgia Tech, where Dr. Marr performed the work. Finally, the authors would like to thank Raytheon for providing funding to continue and publish the research.

REFERENCES
Basu, A., Twigg, C. M., Brink, S., Hasler, P., Petre, C., Ramakrishnan, S., et al. (2008). "RASP 2.8: a new generation of floating-gate based field programmable analog array," in Proceedings, Custom Integrated Circuits Conference (CICC) (San Jose, CA).
Chakrapani, L., Akgul, B. E. S., Cheemalavagu, S., Korkmaz, P., Palem, K., and Seshasayee, B. (2006). "Ultra-efficient (embedded) SOC architectures based on probabilistic CMOS (PCMOS) technology," in Proceedings of Design Automation and Test in Europe (DATE) (Munich).
Douglas, R. (2008). Modeling Development of the Cortex in 3D: From Precursors to Circuits. Available online at: https://neuromorphs.net/ws2008/
Dudek, P., and Juncu, V. (2005). "An area and power efficient discrete-time chaos generator circuit," in Proceedings of the 2005 European Conference on Circuit Theory and Design, Vol. 2 (IEEE), II-87.
Genov, R., and Cauwenberghs, G. (2001). "Stochastic mixed-signal VLSI architecture for high-dimensional kernel machines," in Advances in Neural Information Processing Systems (Vancouver, BC), 1099-1105.
Gibson, M., and Bruck, J. (1998). An Efficient Algorithm for Generating Trajectories of Stochastic Gene Regulation Reactions. Technical Report, California Institute of Technology.
Gillespie, D. T. (1976). A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 22, 403-434.
Goldberg, D. H., Cauwenberghs, G., and Andreou, A. G. (2001). "Analog VLSI spiking neural network with address domain probabilistic synapses," in International Symposium on Circuits and Systems (Sydney).
Kish, L. B. (2002). End of Moore's law: thermal (noise) death of integration in micro and nano electronics. Phys. Lett. A 305, 144-149. doi: 10.1016/S0375-9601(02)01365-8
Kurtz, T. G. (1972). The relationship between stochastic and deterministic models for chemical reactions. J. Chem. Phys. 57, 2976-2978. doi: 10.1063/1.1678692
Ohba, R., Matsushita, D., Muraoka, K., Yasuda, S., Tanamoto, T., Uchida, K., et al. (2006). "Si nanocrystal MOSFET with silicon nitride tunnel insulator for high-rate random number generator," in Emerging VLSI Technologies and Architectures (Karlsruhe).
Oppenheim, I., Shuler, K. E., and Weiss, G. H. (1969). Stochastic and deterministic formulation of chemical rate equations. J. Chem. Phys. 50, 460-466. doi: 10.1063/1.1670820
Park, S., and Miller, K. (1988). Random number generators: good ones are hard to find. Commun. ACM 31, 10.
Patterson, D., and Hennessy, J. (2004). Computer Organization and Design, 3rd Edn. Boston: Morgan Kaufmann.
Random Number Generation and Testing. (2014). Available online at: http://csrc.nist.gov/rng/
Shannon, C. (1949). "Communication in the presence of noise," in Proceedings of the I.R.E. (New York, NY), 10-21. doi: 10.1109/JRPROC.1949.232969
Xu, P., Horiuchi, T. K., and Abshire, P. A. (2006). Compact floating-gate true random number generator. Electron. Lett. 42, 23.
Yang, M., Wang, Y., Wang, J., and Zheng, S. Q. (2002). "Optimized scheduling and mapping of logarithm and arctangent functions on TI TMS320C67x processor," in IEEE International Conference on Acoustics, Speech, and Signal Processing (Orlando, FL). doi: 10.1109/ICASSP.2002.5745319

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 01 October 2013; accepted: 04 April 2014; published online: 07 May 2014.
Citation: Marr B and Hasler J (2014) Compiling probabilistic, bio-inspired circuits on a field programmable analog array. Front. Neurosci. 8:86. doi: 10.3389/fnins.2014.00086
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.
Copyright 2014 Marr and Hasler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.



ORIGINAL RESEARCH ARTICLE
published: 02 April 2014
doi: 10.3389/fnins.2014.00054

An adaptable neuromorphic model of orientation selectivity based on floating gate dynamics
Priti Gupta* and C. M. Markan
VLSI Design Technology Lab, Department of Physics and Computer Science, Dayalbagh Educational Institute, Agra, Uttar Pradesh, India

Edited by: Jennifer Hasler, Georgia Institute of Technology, USA
Reviewed by: Shantanu Chakrabartty, Michigan State University, USA; Milutin Stanacevic, Stony Brook University, USA; Bradley A. Minch, Franklin W. Olin College of Engineering, USA
*Correspondence: Priti Gupta, VLSI Design Technology Lab, Department of Physics and Computer Science, Faculty of Science, Dayalbagh Educational Institute, Agra-282005, Uttar Pradesh, India
e-mail: [email protected]

The biggest challenge that the neuromorphic community faces today is to build systems that can be considered truly cognitive. Adaptation and self-organization are the two basic principles that underlie any cognitive function that the brain performs. If we can replicate this behavior in hardware, we move a step closer to our goal of having cognitive neuromorphic systems. Adaptive feature selectivity is a mechanism by which nature optimizes resources so as to have greater acuity for more abundant features. Developing neuromorphic feature maps can help design generic machines that can emulate this adaptive behavior. Most neuromorphic models that have attempted to build self-organizing systems follow the approach of modeling abstract theoretical frameworks in hardware. While this is good from a modeling and analysis perspective, it may not lead to the most efficient hardware. On the other hand, exploiting hardware dynamics to build adaptive systems, rather than forcing the hardware to behave like mathematical equations, seems to be a more robust methodology when it comes to developing actual hardware for real world applications. In this paper we use a novel time-staggered Winner Take All circuit, which exploits the adaptation dynamics of floating gate transistors, to model an adaptive cortical cell that demonstrates Orientation Selectivity, a well-known biological phenomenon observed in the visual cortex. The cell performs competitive learning, refining its weights in response to input patterns resembling different oriented bars, becoming selective to a particular oriented pattern. Different analyses performed on the cell, such as orientation tuning, application of abnormal inputs, and response to spatial frequency and periodic patterns, reveal close similarity between our cell and its biological counterpart. Embedded in an RC grid, these cells interact diffusively, exhibiting cluster formation, making way for adaptively building orientation selective maps in silicon.

Keywords: feature maps, orientation selectivity, time-staggered WTA, floating gate synapses

1. INTRODUCTION
The past decade has been a landmark decade in the progress of Neuromorphic Engineering. Technological advances have paved the way for large scale neural chips having millions of neurons and synapses (Indiveri et al., 2006; Bartolozzi and Indiveri, 2007; Wijekoon and Dudek, 2008). We now have silicon cochleas and retinas (Chan et al., 2007; Lichtsteiner et al., 2008). A number of groups around the world have built large scale multichip neuromorphic systems for real time sensory processing with programmable network topologies and reusable AER infrastructure (Serrano-Gotarredona et al., 2005; Chicca et al., 2007; Merolla et al., 2007; Schemmel et al., 2008). All these approaches can be broadly classified into analog, digital or hybrid approaches. The analog approach interfaces well with the real world, emulates bio-inspired behavior more closely and is most suited for modeling local neural computations. Digital systems on the other hand efficiently exploit addressing mechanisms to emulate long distance communication in the brain. Therefore, an amalgamation of the digital and analog approaches, i.e., the hybrid approach, is most appropriate for implementing large scale neuromorphic systems. The challenge that now lies ahead is to develop truly brain-like cognitive systems, systems that can adapt, self-organize and learn according to cues in their environment (Indiveri et al., 2009; Indiveri and Horiuchi, 2011).

A major step toward building such systems would be to understand the underlying principles that the brain uses to accomplish adaptation. It is well accepted now that very early in development the brain has a generic cortical structure that adapts to the environment by forming neural connections during the critical learning period (Sur and Leamey, 2001; Horng and Sur, 2006). This kind of adaptation leads to the formation of feature maps or interconnectivity patterns between hierarchically organized layers of the cortices. The lower layers extract basic features from the input space so that higher layers can extract more complex features, using the information from the lower layers. Both Nature (genetic biases) and Nurture (environmental factors) play a crucial role in the formation of these feature maps. Different hardware and software approaches have been explored to model self-organization. Each approach has a set of mechanisms that exploit the available techniques. While models built in software prefer to use mathematical equations, attempting to do the same in hardware can turn out to be extremely cumbersome (Kohonen, 1993, 2006; Martín-del-Brío and Blasco-Alberto, 1995; Hikawa et al., 2007). On the other hand, understanding


the hardware dynamics and then building adaptive algorithms around it seems to be a more robust approach for building real world applications.

To emulate activity dependent adaptation of synaptic connections in electronic devices, we look towards the developing brain for inspiration. In the developing brain, different axons connecting to a post synaptic cell compete for the maintenance of their synapses. This competition results in synapse refinement leading to the loss of some synapses, or synapse elimination (Lichtman, 2009; Misgeld, 2011; Turney and Lichtman, 2012; Carrillo et al., 2013). Temporally correlated activity prevents this competition whereas uncorrelated activity seems to enhance it (Wyatt and Balice-Gordon, 2003; Personius et al., 2007). Moreover, precise spike timing plays a key role in this process, e.g., when activity at two synapses is separated by 20 ms or less, the activity is perceived as synchronous and the elimination is prevented (Favero et al., 2012). Apart from the biological relevance, synapse elimination as a means of honing neural connections is also suitable for implementation in large scale VLSI networks, because in analog hardware it is difficult to create new connections but it is possible to stop using some connections. Although some digital approaches work around this by using virtual connections through the Address Event Representation, in purely analog designs, for ease of management of large scale connections, synapse elimination is best suited. In order to implement synapse pruning we need to have non-volatile adaptable synapses, which are best represented by floating gate synapses or memristors (Zamarreño-Ramos et al., 2011). While memristor technology is still in development, floating gate transistors have gained widespread acceptance due to their capacity to retain charge for very long periods and the ease and accuracy with which they can be programmed during operation (Srinivasan et al., 2005). Floating gate memories are being used for various applications like pattern classification (Chakrabartty and Cauwenberghs, 2007), sensor data logging (Chenling and Chakrabartty, 2012), reducing mismatch (Shuo and Basu, 2011), etc. They have also found extensive application in neuromorphic systems (Diorio et al., 1996; Hsu et al., 2002; Markan et al., 2013). We therefore extend the study of the adaptive behavior of floating gate pFETs and demonstrate how this adaptive, competitive and cooperative behavior can be used to design neuromorphic hardware that exhibits orientation selectivity, a widely studied phenomenon observed in the visual cortex.

Prior efforts toward hardware realization of orientation selectivity can be classified into two categories: (1) Ice Cube models, (2) Plastic models. Ice cube models, e.g., the model by Choi et al. (2005), assume prewired feed-forward and lateral connections. Another similar model by Shi et al. (2006) uses DSP and FPGA chips to build a multichip modular architecture. They use Gabor filters to implement orientation selectivity. This approach provides an excellent platform for experimentation with feature maps, however, it falls short when it comes to compactness and power efficiency. Moreover, these models do not capture the developmental aspects of orientation selectivity. Some plastic models that try to capture the developmental aspects include the model by Chicca et al. (2007), which uses a mixed software/hardware approach to simulate a biologically realistic algorithm on a PC that is interfaced with a neuromorphic vision sensor. Another model by Boahen et al. (Taba and Boahen, 2002; Lam et al., 2005) uses activity dependent axon remodeling by using the concept of axonal growth cones and implements virtual connections by re-routing address events. Their design is biologically realistic but hardware intensive, since they use an additional latency circuit to decide the winning growth cone. Therefore, what is needed is an approach that is more autonomous in terms of deciding the winner in the competition. Through our approach, which is based on the biologically inspired synapse elimination process, we have attempted to build an analog design that can be used by both analog and hybrid systems. The design has minimum hardware requirements and is capable of self-organized clustering. Our effort in designing a minimal competitive circuit, the time-staggered Winner Take All (ts-WTA) (Figures 1A–D), that exploits the adaptation dynamics of floating gate pFETs (Markan et al., 2013), and then using a collective network of these ts-WTA cells to exhibit orientation selectivity (Markan et al., 2007), is a small yet significant effort toward bridging the gap between a biological phenomenon and its neuromorphic equivalent. The simulations were performed using Tanner T-Spice v13.0 and Cadence Spectre v7.1 with BSIM3 level 49 SPICE models for a 0.35 μm CMOS process.

Section 2 attempts to highlight the salient features of the ts-WTA circuit and discusses the motivation behind its design. Section 3 describes the development of a framework for multi-dimensional feature selectivity, which is then extended to create an orientation selective cortical cell model that learns and eventually recognizes patterns resembling bars of different orientations. In sections 3 and 4, experiments performed on the orientation selective cortical cell, which highlight how close the cortical cell is to its biological counterpart, are discussed. Section 5 describes a framework for diffusive interaction and cluster formation between many orientation selective cells that has implications in orientation selective map formation. Section 6 includes the results and discussion.

2. TIME-STAGGERED WINNER TAKE ALL
A novel CMOS time-staggered Winner Take All (ts-WTA) circuit has been described in Markan et al. (2013). The ts-WTA is built with two arms, each representing a weighted connection, implemented by means of floating gate pFET synapses (Figure 1A) (Rahimi et al., 2002). These arms connect at a common source node, Vs. Current through a bias pFET, also connected at Vs, drives the two arms of the ts-WTA and ensures resource limitation. A buffer device (D) separating Vs from Vi is introduced to ensure that Vs is not influenced directly by the neighboring cells. However, the voltages at Vs and Vi are nearly the same. A feedback mechanism modifies the floating gate voltages of the two floating gate pFET synapses as a function of the activation node voltage Vi. The Tunnel (T) and Injection (I) devices, which are a part of the feedback network (Figure 2), transform Vi to appropriate ranges that make tunnel and injection feasible. The initial floating gate voltages of the two synapses are chosen randomly with a small voltage difference ΔVfg. Inputs to the cell are applied in the form of pulses of high (6 V) and low (1 V) voltage represented by 1 and 0, respectively. A {0,0} input means both the synapses are stimulated with 1 V, which is equivalent to saying they are both off.


FIGURE 1 | (A) Actual circuit of the ts-WTA learning cell and (B) its abstract model. In (A), (Vfg)i1, (Vfg)i2, and in (B), W1, W2 show the floating gate based weighted connections. x1, x2 are inputs and the node voltage Vi is the activation of the cell, which is equivalent to A in (B). (C) Shows the ts-WTA evolution of the floating gate voltages. (D) Starting with nearly equal weak connections (left), the cell strengthens the stronger of the two connections at the cost of the other (right, shows both possibilities); the two arrow styles denote connections representing the two different features.

An input {1,1} means that both synapses are stimulated with a 6 V pulse at the same time, which is how conventional WTA circuits receive inputs. The inputs {1,0} and {0,1} mean that the synapses are stimulated alternately or in an uncorrelated manner. The ts-WTA is designed to work on this uncorrelated scheme of inputs. When inputs from the sets {0,1} and {1,0} are applied at x1 and x2 in a random-inside-epoch order (i.e., within an epoch both synapses are equally stimulated but the order in which they are stimulated is randomized for every epoch), competition between the two arms starts taking place. The equation below expresses the adaptation dynamics of the floating gate voltage (Vfg) of any branch (synapse) as a function of the Vfg of the stimulated branch:

    d(Vfg)ij/dt = FT(T{Vi}, (Vfg)ij) − FI(I{Vi}, (Vfg)ij) · xj        (1)

The first part of the equation represents tunneling and the second part represents injection feedback. The second part has an additional term xj, which is 1 when the pFET is ON and 0 when it is OFF, taking into consideration that injection works only when the floating gate transistor is ON whereas tunneling works at all times irrespective of the state of the floating gate transistor. In the first and second parts, Vi is equivalent to f(Σj (Vfg)ij xj) (which means we can express Vi in terms of the floating gate voltages of the individual branches under the condition that only one xj is 1 and the other is 0 at any given time). In the first part, T{Vi} leads to a tunnel voltage Vtun which, along with the floating gate voltage (Vfg), determines the tunneling current (Itunnel), and in the second part, I{Vi} leads to an injection voltage Vinj which, along with Vfg, determines the injection current (Iinjection) (please refer to Markan et al., 2013 and Rahimi et al., 2002 for detailed equations). Injection works by lowering the floating gate voltage, Vfg, thus making the transistor more and more ON, whereas tunneling causes Vfg to increase gradually, causing the pFET to slowly drift toward the OFF state. On stimulation by uncorrelated inputs over a period of time, injection amplifies the voltage difference between the two floating gates. Tunneling, on the other hand, helps in setting an upper limit on the strength of the active connection, and in pruning the strength of the inactive connection. According to Grossberg (1976), Winner Take All action requires that self-excitation of a neuron must be accompanied by global lateral inhibition. This occurs in the ts-WTA with self-excitation in the form of injection and global lateral inhibition in the form of tunneling.
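A minimal numerical sketch of Equation (1) is given below. The constant tunneling rate, the exponential injection term, the clipping range, and all numerical values are simplified placeholders rather than the device equations (which are given in Markan et al., 2013 and Rahimi et al., 2002); the sketch is only meant to reproduce the qualitative behavior described above, where injection amplifies the initial difference between the two floating gates and tunneling slowly pushes the losing branch toward the OFF state.

    import numpy as np

    A_TUN, A_INJ, V0, UT = 0.010, 0.020, 5.15, 0.05   # illustrative constants only

    def step(vfg, x):
        # One update of Equation (1) for the two branches of a ts-WTA cell:
        # tunneling (first term) acts on both branches and raises Vfg toward OFF;
        # injection (second term, gated by x_j) acts only on the stimulated branch
        # and lowers Vfg, more strongly the more ON that branch already is.
        inject = A_INJ * np.exp((V0 - vfg) / UT) * x
        vfg = vfg + (A_TUN - inject)
        return np.clip(vfg, 4.8, 5.5)        # crude stand-in for the physical limits

    vfg = np.array([5.14, 5.16])             # small random initial bias
    for epoch in range(100):
        for x in np.random.permutation([[1, 0], [0, 1]]):   # random-inside-epoch order
            vfg = step(vfg, np.asarray(x, dtype=float))
    print(vfg)   # one branch saturates ON (low Vfg), the other drifts toward OFF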


FIGURE 2 | Shows the circuit level description of the Injection (I), Tunnel (T) and Buffer (D) devices that are a part of the ts-WTA circuit shown in Figure 1A. The Injection (I) and Tunnel (T) devices modify the voltage at Vi to appropriate ranges that enable injection and tunneling to occur in the floating gate pFETs. The buffer device (D) shields the common source node (Vs) from the loading effect of neighboring cells. The graphs show how the tunnel (Vtun) and injection (Vinj) voltages vary with the common source voltage. Here Vinj(min) is set to 0.65 V, Vtun(max) is set to 13.6 V and VDD is 6 V.

FIGURE 3 | Shows Lazzaro's WTA (L-WTA) circuit. This can be compared with the ts-WTA of Figure 1A. In both circuits, a bias transistor in saturation acts like a current source with constant current Ib. This current ensures resource limitation, forcing only one of the input arms or synapses to survive the competition. In the ts-WTA the inputs x1 and x2 represent voltage pulses of 6 V and 1 V (1 and 0) applied alternately (time-staggered inputs) to both the arms. The inputs y1 and y2 are voltages (equivalent to x1·w1 and x2·w2, respectively) that are applied to the two arms of the L-WTA both at the same time. Here w1 and w2 represent the weights of the two floating gate pFET synapses in the ts-WTA. The L-WTA performs instantaneous comparison between the two inputs and does not have any memory element. The ts-WTA has floating gate pFET based memory and Tunnel and Injection feedback devices that modify the floating gate voltages as a function of the response voltage (Vi). This allows it to perform competition based on memory of prior activity, unlike the L-WTA which can only perform instantaneous comparison between two inputs that are simultaneously applied.
If over many epochs the synapse strengthens more than it weakens (there is more injection than tunneling), the floating gate pFET turns more and more ON, but if the synapse weakens more than it strengthens (tunneling is more than injection), then after several epochs it reaches a stage of no recovery where the floating gate pFET completely switches OFF. The synapse that strengthens more emerges as the Winner. However, the ts-WTA has an additional interesting dimension according to which, if the weaker connection is stimulated more, then that emerges as the winner. Interestingly, this ts-WTA competition can be extended to any two contrasting input synapses (e.g., Left/Right eye in Ocular Dominance, ON/OFF cells in Orientation Selectivity and Lagged/Non-Lagged cells in Direction Selectivity) to perform feature selectivity. It can also be extended to other modalities like auditory, somatosensory etc. Thus the ts-WTA is a very generic cell and can be an essential core around which different feature selectivity models can be built. This is necessary for eventual integration of different feature maps into one universal framework. The ts-WTA has been studied under various stimulation schemes and has been tested for stability under device parameter variations (Markan et al., 2013) and is thus a robust circuit which closely emulates brain-like competition and learning and is therefore suitable to build brain-like feature maps.

Amongst the various CMOS WTA circuits that have been designed, Lazzaro's WTA (L-WTA) (Lazzaro et al., 1989) has gained widespread acceptance (Figure 3). It is an elegant circuit that performs instantaneous comparison between two or more input values and brings about suppression of the outputs associated with lower input values as compared to the highest value, giving rise to Winner Take All action. Our ts-WTA is inspired by the L-WTA, however, there are significant differences. In both circuits, a current source restricts the amount of current that can flow in the two competing branches. As a result, the branch that draws more current forces the transistor of the other branch to switch off, thus emerging as a winner. When both inputs are applied at the same time, both ts-WTA and L-WTA behave in much the same way. However, the ts-WTA brings in an interesting innovation in the form of long term memory retention using floating gate dynamics. So, in fact, the ts-WTA is a learning WTA cell that is capable of computing a winner based on which input is statistically more significant over many epochs, unlike the L-WTA which only computes the winner based on an instantaneous comparison. Another interesting WTA circuit (inspired by L-WTA) that incorporates a sense of time by using floating gate transistors has been developed by Kruger et al. (1997). Their motivation to introduce adaptation is to add a fatigue or refraction time to each cell that wins. Their application is to form saliency maps where there is a need to ensure that the saliency of all inputs is considered and the WTA operation chooses different winners at different times instead of just locking on to the most significant input. Another interesting variant of the L-WTA is the one introduced in Indiveri (2001, 2008). In this circuit, by using local excitatory feedback and a lateral excitatory coupling mechanism, the authors realize distributed hysteresis using which the network is able to lock onto an input with the strongest amplitude and track it as it shifts. They have shown an interesting application of this in adaptive visual tracking sensors (Indiveri et al., 2002). Both these circuits work on the conventional {1,1} or simultaneously applied inputs. They are both ingenious circuits, however, their motivation and design vary significantly from ours.

The true strength of the ts-WTA lies in the way it works on uncorrelated inputs or inputs applied staggered over time.

The inspiration for using time-staggered or uncorrelated inputs feature maps in the cortex. Therefore, using this inspiration to
comes from the way the brain is designed. In the brain, many build artificial feature maps in silicon would help us bridge the
pre-synaptic neurons connect to a single post-synaptic neuron gap between actual neural phenomenon and its neuromorphic
through many afferent connections. It is only through correlated equivalent.
or uncorrelated activity between the many pre-synaptic cell affer- The use of ts-WTA to build Ocular Dominance (OD) Maps
ents that the post-synaptic cell can tell to which pre-synaptic has been described in Markan et al. (2013). In order to build a
cell the afferents belong. The activity, from all the afferents of generic framework for cortical feature map formation in neuro-
one pre-synaptic neuron is perfectly correlated whereas between morphic hardware, our ultimate goal, we wanted to extend our
two different pre-synaptic neurons the activities are uncorre- model to a larger input space. Orientation Selectivity (OR), a
lated and this is the basis on which synapse elimination happens property exhibited by neurons in the visual cortex, is a natural
(Stent, 1973). Hence, uncorrelated activity between different pre- extension to OD. OD is the selective preference cortical neurons
synaptic neuron helps the post-synaptic neuron to decide, which show toward inputs from either the left eye or the right eye. The
connection is relevant and which is not. This selection pro- input space in OD is only two dimensional. OR on the other hand,
cess happens over a period of time and not instantaneously and is the selective preference cortical neurons show toward light or
therefore L-WTA is not suitable for such selection that involves dark bars or edges of different orientations. Since orientations can
retaining some information of prior neural activity. One of the vary anywhere from 0 to 180 , the input space is truly multi-
most important aspect of neural information processing is feature dimensional. The following sections describe how from the basic
extraction and formation of feature maps. Formation of feature building block of ts-WTA, we build an adaptable framework for
maps requires that cells with similar feature selectivity cluster multi-dimensional input features and how we extend it to build
together. For this to happen each cell should be able to uniquely an adaptable circuit that is able to learn and eventually respond to
convey its feature preference at its output node which requires different orientations.
that each cell has to be identically stimulated by a selected pat-
tern. Then on the basis of the responses of different cells for 3. ORIENTATION SELECTIVITY
that pattern, cells that are selective to that pattern can be iden- Cells in the primary visual cortex are known to respond to dark
tified. Similarly, by applying other patterns cells can be marked and bright oriented bars. This property of the cortical cells,
for feature selectivity toward those patterns. Therefore, learn- known as Orientation Selectivity, was first discovered by Hubel
ing has to be deferred over an epoch so that all patterns are and Wiesel (1959). Hubel and Wiesel identified the receptive
stimulated once and cells with similar feature preference can clus- fields of Simple Cells in the Primary Visual Cortex and then
ter together. In an L-WTA this is not possible for two reasons. showed bars of different orientations to the eye. Interestingly they
Firstly, because in L-WTA inputs are applied simultaneously or observed that a single cell gave maximum response to a bar of
in the {1,1} manner. This is analogous to applying all patterns only one particular orientation. They also observed that if the
at the same time and hence cells cannot be uniquely identified bar was in the center of the receptive field, it gave the highest
for their feature preference. Secondly, because the L-WTA lacks response. In earlier experiments on retinal ganglion cells and lat-
a mechanism for long term retention of modified weights which eral geniculate nucleus cells (Kuffler, 1953) it was observed that
is needed for forming clusters. The ts-WTA on the other hand is the receptive fields of these cells are divided into 2 parts (cen-
perfectly suited as a learning cell for developing feature maps in ter/surround), one of which is excitatory or ON, the other
silicon. inhibitory or OFF. For an ON/OFF center/surround cell, a spot
It may be apt to mention here that over and above facilitat- of light shown on the inside (center) of the receptive field elicits
ing synapse elimination, time-staggered or uncorrelated inputs spikes, while light falling on the outside ring (surround) sup-
play a major role in the formation of feature maps and this presses firing below the baseline rate. Results are opposite for
has been brought out in many seminal papers in neuroscience. an OFF/ON cell. Hubel and Wiesel were proponents of the the-
For example Weliky and Katz (1997) reported that by artifi- ory that receptive fields of cells at one level of the visual system
cially inducing correlated activity in both the eyes of the ferret, are formed by inputs from cells at a lower level of the visual
they found that the number of cells in the primary visual cor- system, emphasizing that there is a hierarchical arrangement in
tex with clear orientation and direction selectivity was markedly the cortex, where in the higher layers extract statistically rele-
reduced when compared to un-stimulated controls. In a simi- vant information from the lower layers. Hence, they advanced
lar experiment on kittens, Stryker and Strickland (1984) found the theory that small, simple receptive fields could be combined
that segregation in ocular dominance columns was promoted to form large, complex receptive fields. Later theorists also elab-
when neural activity is synchronized in each eye but not corre- orated this simple, hierarchical arrangement by allowing cells
lated between the eyes. In other similar experiments on cortical at one level of the visual system to be influenced by feedback
feature map development in visual (Elliott and Shadbolt, 1998; from higher levels. In their theory of orientation selectivity, Hubel
Jegelka et al., 2006) as well as auditory cortex (Zhang et al., and Wiesel proposed that Simple cells have receptive fields com-
2002) it has been reported time and again that spatiotemporal posed of elongated ON and OFF sub-regions (Hubel and Wiesel,
relation between the inputs to both eyes/ears are the key to for- 1959, 1962), which seem to originate from single synaptic input
mation of feature maps. Hence, it comes as a deduction from from ON and OFF centered lateral geniculate cells. The circu-
the above evidences that uncorrelated or time-staggered activ- larly symmetric receptive fields of neurons in LGN, that excite a
ity is an underlying biological mechanism for the formation of cortical cell, are arranged in a row creating elongated receptive


fields see Figures 4B,C. These elongated sub-fields are sufficient diffusion of leaking chemicals (that lower the threshold of the
for generating a weakly tuned orientation response, which is neighboring cells and make them fire more readily on receiv-
then amplified by local intra-cortical connections. Unlike Ocular ing same stimulus) are biological phenomenon acting in the
Dominance, that seems to develop only after eye opening, ori- brain both before birth and after (Cellerino and Maffei, 1996;
entation selective responses have been observed to be present Elliott and Shadbolt, 1998; McAllister et al., 1999). Models based
in primates, cats and ferrets as early as the first recordings can on this competitive and cooperative behavior have been able
be made (Chapman et al., 1996). However, how the genicu- to explain aspects of feature map formation of both orienta-
late afferents organize themselves into segregated ON and OFF tion selectivity and ocular dominance (Markan, 1996; Bhaumik
sub-regions during the prenatal period, in the absence of visual and Markan, 2000; Bhaumik and Mathur, 2003). Our model
input, is still not clear. Some researchers attribute this develop- is inspired by the three layered model proposed by Bhaumik
ment to spontaneous waves of activity that flow in the retina and Mathur (2003) (see Figure 4A for the abstract sketch of
and LGN affecting cortical development (Mooney et al., 1996), the model). However, there are some differences. While their
and some attribute it to intra-cortical long range connections model aims to describe the formation of oriented receptive
that exist before birth, forming a scaffold for orientation maps fields prior to eye opening, our model also takes into account
that later mature with visual inputs (Shouval et al., 2000). In the influence of visual experience or cortical plasticity observed
order to gauge to what extent, visual experience influences the after eye opening. They use competition based on both pre
development of orientation maps, visual cortex of kittens reared and post synaptic resource limitation and diffusion between
in a single striped environment was studied using optical imag- ON/ON center and OFF/OFF center cells, requiring precise ini-
ing techniques. It was found that even though kittens reared in tial connections between cells. Our resource limitation is only
a striped environment responded to all orientations, however, post synaptic and is enforced by limiting current in the bias
twice the area of the cortex was devoted to the experienced ori- transistor representing the cortical cell. The diffusion in our
entation as compared to the orthogonal one (Sengpiel et al., model happens between all neighboring cells irrespective of their
1999). This effect is due to an instructive role of visual experi- type.
ence whereby some neurons shift their orientation preferences To build a hardware model of a cortical cell that exhibits
toward the experienced orientation. Thus, it is now generally orientation selectivity, from the building block of a single ts-
accepted that although orientation maps are fairly stable at the WTA circuit, systematic scaling up was required. The next section
time of birth, abnormal visual experience can alter the neu- describes how this scaling up was done and how diffusive interac-
ronal responses of a large percentage of cells to the exposed tion between ts-WTA cells was introduced.
oriented contours. Under normal conditions, the prenatal tun-
ing properties of neurons are retained and get refined with visual 3.1. BUILDING A FRAMEWORK FOR MULTIDIMENSIONAL FEATURE
stimulus. SELECTIVITY
A number of models suggesting possible formation of orien- Any attempt at building self-organizing feature maps in hard-
tation selective cells in cortex have been proposed. These have ware, requires neighborhood interaction to happen in such a way
two main shortcomings. First, they employ a Mexican hat cor- that local clusters are formed autonomously. We showed previ-
relation function in the cortex (some use it in the LGN as well ously that this can be achieved by means of diffusive coupling
Miller, 1994). In the developing cortex, it is highly unlikely that between neighboring cells by means of an RC network (Markan
this structure exists (Buzs et al., 2001; Yousef et al., 2001; Roerig et al., 2013). Biologically this happens through leaking chemicals
and Chen, 2002). Second, competition in these models is brought from active neurons and as more recently shown through gap
in through synaptic normalization (multiplicative or subtractive). junction coupling (Li et al., 2012; Mrsic-Flogel and Bonhoeffer,
Normalization has its own associated problems, for linear synap- 2012). In order to extend our design for feature selectivity over
tic weight update multiplicative normalization does not permit multi-dimensional input space, we took four ts-WTA cells and
positively correlated afferent to segregate, while under subtractive connected them in a row, with their outputs tied together in
normalization, a synapse either reaches the maximum allowed a feed-forward manner through MOSFETs (see Figure 5). This
value or decays to zero (Miller and MacKay, 1994). These short- can be understood as a three-layered model where the first layer
comings have brought in the necessity of introducing models is the retina, the second layer is the Lateral Geniculate Nucleus
that are biologically more plausible (Miller, 1996; Elliott and (LGN) and the third layer is the visual cortex. While there is
Shadbolt, 1998). It has been observed that although the hori- one-to-one mapping between cells in layer 1 and layer 2, there
zontal intra-cortical connections are still clustered at birth, the is many-to-one mapping from layer 2 to layer 3 cells, we call
thalamo-cortical connections are well defined (Sur and Leamey, these layer 2 cells the receptive field of that layer 3 cortical cell.
2001). This indicates that the Orientation selectivity observed Therefore, now we have a cortical cell with a 1 4 receptive field.
at birth could be manifesting out of the relatively well devel- Individual ts-WTAs are connected to their neighbors with a 10k
oped thalamo-cortical connections or the receptive fields of the diffusive resistor (RD ). The output of the cortical cell is fed back
cortical cell. These findings suggest the existence of some com- to the individual ts-WTA cells, through a resistive feedback net-
mon biological mechanisms that could be responsible for the work (RF ), also of 10 k, as can be seen in Figure 5. The purpose
emergence of receptive field structure and thus orientation selec- of these resistances (RF ) is to reinforce the initial bias so that the
tivity in the visual cortex. It has been shown that competition responses of the cells become fine-tuned ensuring that the pat-
for neurotropic factors and neighborhood cooperation through tern learnt is one of the applied patterns. The diffusion capacitor


FIGURE 4 | (A) Shows the three layer abstract feed-forward model of Orientation Selectivity. The first layer, the retina, is the layer that receives inputs. The second layer is the LGN. There is one-to-one mapping between retina and LGN cells. The third layer is the cortex. Many LGN ON/OFF center cells innervate at a single cortical cell, forming its receptive field. (B) Shows the elongated ON-Centered, OFF-Surround receptive field of a cortical cell (inspired by Hubel and Wiesel's model of Orientation Selectivity). (C) Shows the elongated OFF-Centered, ON-Surround receptive field of a cortical cell.

FIGURE 5 | Shows 4 ts-WTA cells connected in a row by means of diffusive resistors (RD). The output of each cell (Vs) is connected in a feed forward manner using MOSFETs with their drains connected together at node out, which is the feed forward path conveying the self activation or response of the cell. The activation node of each cell (Vi) is connected at the diffusion node, dno, with feedback resistances (RF). This forms the feedback network of the cell. A small resistance Ro connects out and dno to keep both these voltages nearly the same. The bias transistor mo represents the cortical cell. Here VDD is 6 V.
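To make the stimulation protocol for such a 1 × 4 receptive field concrete, the sketch below lists the complementary ON-centre/OFF-centre pattern pairs described in the text and shuffles them afresh each epoch; the scheduling function itself is an illustrative assumption, only the pattern set is taken from the description.

    import numpy as np

    # Complementary (ON-centre branch / OFF-centre branch) patterns for a 1 x 4 field
    PATTERN_PAIRS = [("1100", "0011"), ("1001", "0110"), ("0110", "1001"),
                     ("0011", "1100"), ("1010", "0101"), ("0101", "1010")]

    def random_inside_epoch(rng):
        # Each pattern pair is applied exactly once per epoch, in a freshly
        # randomized order, so no fixed ordering biases the competition.
        return [PATTERN_PAIRS[i] for i in rng.permutation(len(PATTERN_PAIRS))]

    rng = np.random.default_rng(1)
    print(random_inside_epoch(rng))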

(CD ) connected at node dno, is of 10 pF. To achieve cluster forma- connected diffusively in a ring fashion (not shown in the figure).
tion on a larger scale, it is important to achieve cluster formation This ensures that the receptive field develops into only one of the
locally. To ensure formation of local clusters within the set of four four patterns (0011), (1100), (0110), (1001), in which 11 and 00
ts-WTA cells, the first and fourth ts-WTA of the cortical cell are are always clustered. We took 2 such cortical cells with a 1 4


(ts-WTAs) receptive field. The development of the receptive fields on whether a branch is favored by the initial conditions more
was analyzed in two situations. First, when the cortical cells are in or is stimulated more or both, either the ON-Centered or the
isolation and second, when they are diffusively coupled with each OFF-Centered branch wins. The resultant receptive field (i.e., the
other (Figures 6.1A,B). floating gate voltage profile of each branch of the four ts-TWAs)
Both the cortical cells are stimulated with the same random- looks like one of the input patterns applied. Figure 6.2A repre-
inside-epoch order of input patterns, however, their initial biases sents the 1 4 receptive field of cortical cell 1 and Figure 6.2B
(initial floating gate voltages of LGN cells/layer 2) are different. represents the 1 4 receptive field of cortical cell 2 when they
The initial biases are randomly generated floating gate volt- develop in isolation. When the two cortical cells are isolated, their
ages varying between 5.15 and 5.16 v. We assume that the left receptive fields evolve into different patterns. Here cell 1 s recep-
branch of each ts-WTA represents an ON-Centered synapse tive field has evolved into 1100 whereas cell 2 s receptive field
and the right branch represents an OFF-Centered synapse. has evolved into 0011. However, in the second case, when the
The inputs patterns are from the set (1100/0011), (1001/0110), two cells are diffusively coupled, their receptive fields evolve into
(0110/1001), (0011/1100), (1010/0101), (0101/1010). The nota- similar patterns (1100) (Figures 6.3A,B). This happens because
tion (1100/0011) means that when the ON-Centered synapses the diffusive node (dno) voltage, of the two cells becomes cou-
(left branches) of the four ts-WTAs of a cortical cell are stimulated pled. When the input patterns are applied, if one of the cells has a
by 1100 the OFF-Centered synapses (right branches) are stimu- stronger bias for a particular input pattern the voltage at its node
lated by 0011 (as described in section 2, 1 here represents a high dno becomes high. Since both the cells receive the same random-
voltage (+6 v), and 0 represents a low voltage (1 v) applied for inside-epoch order of inputs, the other cell also experiences this
0.02 s). This is to emulate time-staggered or uncorrelated inputs. raised voltage at its node dno for the same pattern. The feedback
Please note that the patterns 0001 and 1000 are omitted from the resistors convey this high response back to the tunnel (T) and
set because they have unequal number of 0 s and 1 s and thus injection (I) devices (Figure 1A) which modify the floating gate
do not stimulate both the branches equally). When the input voltages of all the ts-WTA cells reinforcing this pattern on them.
patterns are applied in a random-inside-epoch fashion, competi- Over many epochs, the difference between the ON-Centered and
tion between the two arms of each ts-WTA cell begins. Depending OFF-Centered branches of each ts-WTA cell gets amplified and

FIGURE 6.1 | Shows 2 cortical cells, Cell 1 and Cell 2, with a 1 × 4 (ts-WTA) receptive field. In (A) the two cells develop independently. In (B) the two cells are connected at the diffusion node (dno) by means of a resistance RDIFF for diffusive interaction. Figures 6.2, 6.3 show how the receptive fields (floating gate voltages) evolve for the two cells in both situations.


FIGURE 6.2 | Shows the development of the floating gate voltages of the two cortical cells of Figure 6.1A. Here blue represents the floating gate voltage of the ON-Centered synapse and green represents the floating gate voltage of the OFF-Centered synapse. The cells develop differently according to their individual initial biases and inputs. (A) Shows the four ts-WTAs of Cell 1; the pattern of the receptive field is 1100. (B) Shows the four ts-WTAs of Cell 2; the receptive field has evolved into 0011.

the floating gate voltages get developed for the pattern that evoked the highest response at node dno during the initial few epochs. The pattern toward which the competition tilts depends on the initial biases and the patterns applied, and can be changed by changing either. Hence, promising results in the form of cooperation between neighboring cells are visible when two cortical cells are diffusively coupled.

To see neighborhood cooperation and cluster formation on a larger scale, we then diffusively connected 10 cortical cells, each with a 1 × 4 receptive field, with the tenth cortical cell connected to the first in a ring fashion. By giving all the cells different initial biases but subjecting them to the same sequence of random-inside-epoch patterns, interesting cluster formation was observed (see Figure 7). Figure 7A shows the development of the 10 cortical cells in isolation, whereas Figure 7B shows their development under diffusive interaction. In the latter, two clusters of different patterns (or feature preference) are clearly visible. Between two opposite feature preferences (0011 and 1100), there is gradual variation between the feature preferences (1001) [see Figure 7B, cells 2, 3, and 4 (rows 2, 3, and 4 from the top)]. With this idea of extendibility to multi-dimensional inputs and a framework for neighborhood interaction and clustering in place, we now build the circuit of a cortical cell that is capable of adapting and self-organizing to become selective to patterns resembling different orientations.

3.2. ORIENTATION SELECTIVE CELL MODEL AND SIMULATION
The previous section described the architecture of a cortical cell with a receptive field of 1 × 4 (ts-WTA) LGN cells. These cells, when connected on an RC grid, show diffusive interaction and cluster formation. The Orientation Cell model has a similar three-layer topology, with retinal, LGN, and cortical cells, except that instead of a 1 × 4 receptive field, the orientation selective cortical cell has a two-dimensional, 9 × 9 (ts-WTA) receptive field, with some differences in component values to balance out the effect of the larger neighborhood. The values of the diffusion (RD) and feedback (RF) resistances are now 1 kΩ each. A 3 × 3 simplified subsection of the circuit representing the receptive field of the cortical cell is shown in Figure 8. The capacitance connected at the node dno is 10 pF. The feed-forward MOSFETs connecting the common source nodes of the individual ts-WTA cells to the cortical cell (bias transistor mo) ensure that the self-activation of each cell is conveyed appropriately at the OR cell output; however, since there cannot be any current in the reverse direction, the OR cell's output will not affect the common source voltage at each ts-WTA. The purpose of the diffusive and feedback resistances remains the same, i.e., to ensure proper neighborhood interaction and to fine-tune the cell's response, respectively.

A set of input patterns resembling ON-Centered and OFF-Centered oriented bars of angles 0°, 45°, 90°, and 135° was created (Figure 9A). Each pattern comprises 9 × 9 blocks, in which a bright block means stimulation with a +6 V pulse given


[Figure 6.3 plots: (A) Cell 1 and (B) Cell 2, four ts-WTA traces each; y-axis: Floating Gate Voltages (V), 4.8–5.2 V; x-axis: Time, 0–20 s.]

FIGURE 6.3 | Shows the development of floating gate voltages of the diffusively coupled cortical cells in Figure 6.1B. Cell 1, which seems to have a stronger bias, influences the development of Cell 2, which modifies its original response to become similar to Cell 1. (A) Shows the unchanged response of Cell 1 (1100) and (B) shows the response of Cell 2 under strong influence of the neighborhood (1100).

for 0.02 s and a dark block means stimulation with a 1 V pulse given for the same duration. To ensure that the learning is not biased towards the order in which patterns are applied, these bars were applied in a random-inside-epoch manner, however, with two constraints. (1) Within each epoch, when the left synapses of the 9 × 9 ts-WTA receptive field are stimulated with an ON-Centered oriented bar input, the right synapses are stimulated with the same orientation but with an OFF-Centered oriented bar. This is analogous to applying uncorrelated inputs to each ts-WTA branch. (2) Just after that, this order is reversed, meaning the left synapses are now stimulated with the OFF-Centered oriented bar and the right branches with the ON-Centered oriented bar of the same orientation angle. This is analogous to applying an orientation-grating-like input pattern that is necessary for orientation map formation. Gratings ensure that all the cells in the 9 × 9 receptive field are stimulated with the same oriented bar. This is necessary for cluster formation, since clusters are formed when the cells group together according to similar feature preferences, and whether two cells have the same feature preference or not can be known only when they receive the same inputs. Interestingly, in the prenatal brain, when external inputs are absent, retinal waves have been identified to play the role of grating-like input patterns that help in building a scaffold for orientation selectivity even before birth (Wong, 1999; Akerman et al., 2002). At the onset of simulation, the receptive field of the orientation selective cell, i.e., the 9 × 9 LGN cells, is given random initial biases within 5.15–5.16 V. By applying the eight different input patterns in a random-inside-epoch manner, transient analysis on the circuit is performed for 80 epochs. As the simulation progresses, the synaptic connections from the ON-Centered and OFF-Centered LGN cells to the cortical cell compete and only one of the connections survives; the other gets eliminated (ts-WTA action). The local interaction between LGN cells is both competitive and cooperative: competitive because of the resource limitation in each ts-WTA cell, where only one of the connections (either ON-Centered or OFF-Centered) survives, and cooperative by means of diffusive interaction between the neighboring ts-WTA cells, implemented by diffusive resistive coupling (RD) of the 9 × 9 ts-WTA cells, in a way similar to the Ocular Dominance model implementation. Details on the feedback mechanism acting on the floating gate pFETs in the individual ts-WTA cells and Ocular Dominance Map formation can be found in Markan et al. (2013). The orientation input pattern for which the voltage at node dno is the highest, or a pattern that is statistically more significant, gets reinforced through the feedback resistors (RF) and the injection and tunnel feedback mechanisms of each ts-WTA cell (as discussed in the case of a 1 × 4 receptive field), and we say that the cell is selective to that particular orientation. Multiple simulations performed with different random initial biases of LGN cells (floating gate voltages) and different random-inside-epoch orders of input patterns result in the cell learning different oriented patterns, with equal likelihood of learning any one of the applied eight patterns. A statistical analysis over 100 simulations is presented in Tables 1A, 1B. The results show that each of the eight patterns


[Figure 7 plots: panels (A) and (B), rows 1–10 (one per cell); y-axis: Floating gate voltages, 4.5–5.5 V; x-axis: Time, 0–40 s.]

FIGURE 7 | (A) Shows the development of floating gate voltages of 10 (1 × 4) ts-WTA cells in isolation. Here Cell 1 is the top row, Cell 2 is the 2nd row, and so on. The black, white, and gray squares represent the feature preference of the ts-WTAs. Black represents an OFF-Centered cell, white represents an ON-Centered cell, and gray represents an unbiased cell. The cells develop differently according to individual initial biases and inputs. (B) Shows the same 10 cells when they interact diffusively. Near-neighbor cells begin to cluster, developing similar feature preference. Between two opposite patterns (e.g., 1100 and 0011), there is a gradual variation (1001); see responses of Cells 2, 3, and 4.

is learnt at least 10% of the times. A video of how the receptive field of the orientation cell evolves, starting from initial random biases of LGN cells to an oriented bar pattern, can be found in the supplementary material.

3.3. ORIENTATION TUNING AND PERFORMANCE UNDER ABNORMAL STIMULATION
Experiments done on many mammals demonstrate that during the early postnatal periods, recordings from a cortical neuron show nearly equal responses to many orientations or only a slight bias toward a particular orientation. If the response of the cell is plotted against different orientation angles, it is a flat curve showing faint selectivity to many different orientations. As the orientation selectivity of the cell develops, as a result of stimulus-dependent activity, the tuning curve becomes sharper at a particular orientation (Somers et al., 1995; Dragoi et al., 2000; Seriès et al., 2004). Similar orientation tuning is exhibited by our orientation selective cell. Once the cell has learnt a particular orientation, i.e., the floating gate voltages of the cell have matured, the injection and tunnel voltages can be modified in a way that stops further learning (see the learning rate parameter in Markan et al., 2013). The cell's response to any orientation can then be obtained by observing the output node voltage (OR cell output node) on the application of that oriented pattern as input. Figure 9B shows the orientation tuning curve of our cell at different stages of receptive field development. The development of orientation tuning is clearly visible from the shape of the curve. Initially the cell responds equally to all orientations, depicted by the nearly flat curve, gradually becoming selective to only one, represented by the rising peak at one of the orientations. The Half Width at Half Height (HWHH) was computed for each receptive field for the 100 simulations mentioned in the previous subsection. For receptive fields that were not very finely tuned, or that seemed close to more than one input pattern, e.g., receptive field (5,5) in Table 1A, the HWHH was computed for each case and the receptive field was categorized (see Table 1B) according to the lower HWHH value. The best HWHH, i.e., the HWHH for a highly tuned receptive field such as (1,4) in Table 1A, is 30°, and the worst HWHH is 40°, for a receptive field similar to (5,5).

Some experimental results also suggest that if, at the onset of vision, animals are reared in an abnormal environment, such as one with only single stripes, the orientation tuning of a large number of cells that were initially tuned to different orientations adjusts to respond to the orientation of the striped environment in which they are reared (Sengpiel et al., 1999; Yoshida et al., 2012), and the cortical space that was initially shared equally by all orientations becomes exceedingly large for the orientation shown. In other words, the orientations shown take up the cortical space of the orientations that were never shown. To test if similar behavior is shown by our orientation selective cell,


[Figure 8 schematic: OR cell output (out) on a 6 V supply via bias transistor mo (Vbias); diffusion node (dno) at about 5 V; RD = 1 kΩ; RF = 1 kΩ; Ro = 1 Ω; panel labels (A)–(D).]

FIGURE 8 | Simplified and distributed layout of a 3 × 3 portion of the 9 × 9 receptive field of our orientation selective cell. (A) Shows the symbolic representation of a ts-WTA cell. In subsequent figures, the gray square represents a ts-WTA. (B) Is the feed-forward MOSFET network that takes the output of the individual ts-WTAs and feeds them to the OR Cell output. This is a read-out node from where self-activation of the cell can be recorded. (C) Shows the diffusive resistance network consisting of RD, which connects the ts-WTA cells to all their neighbors. (D) Shows the feedback resistive network consisting of RF that feeds the output of the cell from dno back to the individual ts-WTAs. Out and dno are connected by Ro, which can be replaced by a buffer device discussed in section 5.2 (see Figure 5 for the lateral view; this is a top view).
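As a rough companion to Figure 8, the sketch below reduces each ts-WTA to a scalar output voltage, treats the star of equal RF resistors as an averaging of those voltages at dno, and models RD as a single nearest-neighbour smoothing pass. These reductions, the 5.15/5.40 V levels, and the periodic boundary implied by np.roll are assumptions made only to show why an oriented bar that matches the stored receptive field raises the dno voltage more than a non-matching one.

import numpy as np

def bar(angle, n=9):
    # 9x9 binary oriented bar; only 0, 45, 90 and 135 degrees handled here
    y, x = np.mgrid[0:n, 0:n] - n // 2
    mask = {0: y == 0, 90: x == 0, 45: x == y, 135: x == -y}[angle]
    return mask.astype(float)

stored = bar(45)                          # receptive field the cell has learnt
v_rest, v_act = 5.15, 5.40                # resting / active ts-WTA output (V)

def dno_response(pattern, rd_steps=1):
    # ts-WTA output is high only where the stored synapse matches the input
    v = v_rest + (v_act - v_rest) * stored * pattern
    for _ in range(rd_steps):             # RD: one nearest-neighbour diffusion step
        v = (v + np.roll(v, 1, 0) + np.roll(v, -1, 0)
               + np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 5.0
    return v.mean()                       # equal-RF star: dno sits near the average

print("dno for the matching 45-degree bar  :", round(dno_response(bar(45)), 4))
print("dno for an orthogonal 135-degree bar:", round(dno_response(bar(135)), 4))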

[Figure 9 plots: (A) Applied input patterns; (B) Development of Oriented Receptive Field, tuning curves with y-axis: Response Voltage (volts), 5.15–5.4 V, and x-axis: Angles (degrees), −100 to 100; HWHH = 30° and HWHH = 40° are marked.]

FIGURE 9 | (A) Shows the input patterns that are applied to the orientation cell. (B) Shows the orientation tuning curve. Initially the response of the cell is low and similar for all input patterns. As the receptive field develops (see on the right, bottom to top), there is increased response toward that specific pattern, as can be seen from the sharpening of the tuning curve. The half width at half height (HWHH) parameter for the best and the worst receptive field has been marked. The sharper the tuning, the lower is the value of HWHH.
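The HWHH values marked in Figure 9B can be extracted from any sampled tuning curve with a short helper such as the one below; the Gaussian-shaped curve used to exercise it is synthetic, and its width is an arbitrary choice rather than measured data.

import numpy as np

def hwhh(angles_deg, response):
    # Half Width at Half Height of a sampled tuning curve. The crossing points
    # are approximated by the outermost samples that stay above the half-height
    # level (no interpolation), which is adequate for a 5-degree sampling grid.
    r = np.asarray(response, dtype=float)
    a = np.asarray(angles_deg, dtype=float)
    half = r.min() + 0.5 * (r.max() - r.min())
    above = a[r >= half]
    return 0.5 * (above.max() - above.min())

angles = np.arange(-90, 91, 5)                                   # degrees
resp = 5.15 + 0.25 * np.exp(-0.5 * ((angles - 45) / 27.0) ** 2)  # synthetic curve (V)
print("HWHH ~", hwhh(angles, resp), "degrees")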

two sets of experiments were performed. In the first experiment, for 20 different initial conditions, 8 different orientation patterns were applied. It was found that, for the 20 simulations, the receptive fields developed into one of the eight patterns with nearly equal probability. Now, for the same set of initial conditions, we applied only six patterns (the two horizontal patterns, one ON-Centered and one OFF-Centered, were omitted). The results are summarized in Tables 2, 3. It was observed that for 20 simulations, the cell


Table 1A | Results of 100 simulations of the orientation selective cell performed with different random initial biases and different
random-inside-epoch inputs.

[Table 1A body: a 10 × 10 grid (rows and columns indexed 1–10) of the evolved 9 × 9 receptive fields, one per simulation.]

now developed according to the 6 patterns applied with nearly equal probability. Therefore, the space that was earlier occupied by eight patterns was now equally distributed amongst six patterns. The cell demonstrated adaptive cortical plasticity by developing receptive fields according to the applied patterns. However, if the initial biases very strongly favor one of the missing patterns, as in Table 3, 2nd row, 4th column, the receptive field develops according to the initial bias rather than the applied patterns. This kind of adaptive plasticity to accommodate abnormal inputs may not be possible in the model by Bhaumik and Mathur (2003), since their model does not take into account the effect of external stimulation.

3.4. ANALYZING THE EFFECT OF NATURE VS. NURTURE
It is known that both Nature (genetic biases) and Nurture (environmental factors) play an important role in feature map


Table 1B | Analysis of 100 simulations.

Orientation receptive field*: the eight oriented patterns (images in the original table)
Appearance (no. of times) in 100 simulations: 10, 19, 12, 11, 13, 12, 13, 9

*The evolved receptive field sometimes resembles two different orientations. In such cases the response towards both the orientations was noted and HWHH was computed in each case. The categorization was done on the basis of the lower HWHH.
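To make the stimulation protocol behind Tables 1A and 1B easier to replay, the sketch below builds the eight ON/OFF-Centered oriented-bar patterns and orders them random-inside-epoch. The bar geometry, the contrast inversion used for the OFF-Centered versions, and the simple ON-then-OFF pairing (standing in for constraints (1) and (2) of section 3.2) are simplifying assumptions.

import random
import numpy as np

V_BRIGHT, V_DARK = 6.0, 1.0           # pulse levels quoted in section 3.2 (V)

def oriented_bar(angle, polarity, n=9):
    # 9x9 pattern: the bar takes the bright level for ON-Centered bars and the
    # dark level for OFF-Centered bars (contrast inversion is an assumption)
    y, x = np.mgrid[0:n, 0:n] - n // 2
    mask = {0: y == 0, 90: x == 0, 45: x == y, 135: x == -y}[angle]
    hi, lo = (V_BRIGHT, V_DARK) if polarity == "ON" else (V_DARK, V_BRIGHT)
    return np.where(mask, hi, lo)

patterns = {(a, p): oriented_bar(a, p)
            for a in (0, 45, 90, 135) for p in ("ON", "OFF")}

def epoch_order(rng):
    # one epoch: angles shuffled, each angle presented as an ON/OFF pair
    angles = [0, 45, 90, 135]
    rng.shuffle(angles)
    return [(a, p) for a in angles for p in ("ON", "OFF")]

rng = random.Random(0)
for epoch in range(3):                # the paper runs 80 epochs per simulation
    for key in epoch_order(rng):
        stim = patterns[key]          # 9x9 array of pulse voltages
        # ... apply `stim` to the receptive field for 0.02 s here ...
print("example epoch:", epoch_order(rng))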

Table 2 | Summary of 20 simulations of orientation selective cell with all 8 oriented patterns applied as inputs.

Table 3 | Summary of 20 simulations of orientation selective cell with horizontal patterns missing.

Table 4A | Summary of 20 simulations of orientation selective cell with same initial conditions but different random-inside-epoch order of
input patterns.

formation. To understand how our orientation selective cell responds to nature (initial biases) vs. nurture (pattern stimulation), and to gauge how close it is to biology, two sets of experiments were performed. In the first experiment, repeated simulations were performed by keeping the initial biases over the 9 × 9 LGN cells the same, but changing the random-inside-epoch order of input patterns over all the epochs. Statistical analysis over 20 simulations showed that 80% of the times the cell learnt a different oriented pattern, highlighting that stimulus-driven activity can override the orientation bent due to the initial floating gate voltages in most of the cells. In the second experiment, the random-inside-epoch order in which inputs are applied was kept constant (creating preference for one of the patterns) over all the simulations, but the initial biases were changed every time. It was observed that although 70% of the times the cell developed the same oriented receptive field, 30% of the times it


Table 4B | Summary of 20 simulations of orientation selective cell with different initial conditions but same random-inside-epoch order of
input patterns.

[Figure 10 panels (9 × 9 grids with OR Cell symbols; response colour scale 10–60): (A) Input patterns with different spatial frequencies; (B) Response to spatial frequency; (C) Input patterns in the form of gratings of different orientations; (D) Response to periodic patterns.]

FIGURE 10 | Shows the response of the Orientation Cell to patterns of different spatial frequencies and periodic patterns of different orientations. (A) Shows patterns of different spatial frequencies that are applied as inputs to the OR Cell. (B) Shows the patterns that the cell learns. Each simulation results in the circuit learning one of the input patterns with equal probability. (C) Shows periodic patterns of different orientations that are applied as inputs to the OR cell. (D) Shows the periodic patterns that the cell learns.

did learn other patterns. This experiment brings out that it is not just the input patterns applied, but the unique combination of the inputs and the initial biases, that decides which oriented pattern the cell will learn or become selective to, bearing close analogy to experimental findings. The results are summarized in Tables 4A, 4B.

4. RESPONSE TO SPATIAL FREQUENCY AND PERIODIC PATTERNS
Cells in the primary visual cortex are also known to respond to the spatial frequency of visual inputs (Maffei and Fiorentini, 1973; Tootell et al., 1981; De Valois et al., 1982; Everson et al., 1998). Some cells respond to low spatial frequencies, some to high spatial frequencies, essentially forming spatial low-pass, band-pass, and high-pass filters that act on the visual inputs. To test if our cell could also be selective to the spatial frequency of applied inputs, we presented the circuit with patterns of different spatial frequencies (Figure 10A). The simulations were performed in the same way as described in section 3.2, except for the new input patterns that have orientations of different spatial frequencies. We took only two spatial frequencies (low and high). Repeated simulations resulted in the cell learning orientations of different spatial frequencies (Figure 10B). However, it was observed that the learning time of the cell increased as compared to when all inputs are of the same spatial frequency.

Certain cells in the visual cortex are also known to be selective to periodic patterns (Von der Heydt et al., 1992). These cells respond vigorously to gratings but not so much to bars or edges. Since these cells are not sensitive to the spatial frequencies of the gratings but are only specialized for the detection of periodic patterns, they seem to have a role in the perception of texture. In order to test if our circuit could have a similar response to periodic patterns, we presented our circuit with input patterns that resembled gratings of different orientations (see Figure 10C). After several epochs of simulation it was observed that the cell's receptive field developed according to one of the grating patterns



FIGURE 11 | In order to isolate the OR cell output (out), which conveys the self-activation of the cell, from the diffusion node (dno), at which other orientation cells connect, and to prevent loading of node out, a buffer device is created. (A) Shows the characteristic response of the buffer device. The device is linear, and has a double inverting effect on the voltage at node out. The VDD is 6 V. (B) Shows a typical design of the buffer device. (C) Shows an abstract symbol for the orientation selective cell along with the buffer device.

(Figure 10D). Repeated simulations with different initial biases and different random-inside-epoch orders of inputs resulted in the cell's receptive field evolving into one of the eight grating patterns with equal probability. These experiments show that the cell developed is generic and is extendable to recognizing many different patterns.

5. DIFFUSIVE INTERACTION OF CELLS
Feature map formation is based on three important tenets: continuity, diversity, and global order. Continuity requires that nearby cells share the same feature preference. Diversity means that there is equal representation of all possible feature preferences, and global order implies that there is a periodic organization of different features over the entire cortical surface. The literature cites several mechanisms that coordinate the development of feature selectivity of single cells under neighborhood influence (Grossberg and Olson, 1994). The essence of these mechanisms is that if cells have overlapping receptive fields and receive similar inputs, then, if they can be forced to have similar responses, the Hebbian learning mechanism will ensure that the individual cells' receptive fields develop to form clusters. As discussed earlier, this poses certain requirements on the behavior of the learning cell and the neighborhood function. Firstly, it demands that the learning cell should allow modulation of its feature selectivity under neighborhood influence. Secondly, it demands a neighborhood function that is capable of generating an appropriate signal that can modulate the development of feature selectivity of a cell in concordance with other cells in the cluster.

Diffusive-Hebbian learning based on the biological phenomenon of reaction-diffusion has been shown to be effective in forming clusters of cells with similar feature preference and has also been used to model Ocular Dominance and Orientation Selectivity Map Formation (Markan, 1996; Krekelberg, 1997; Markan and Bhaumik, 1999; Bhaumik and Markan, 2000; Bhaumik and Mathur, 2003). Biologically, this happens by means of leaking chemicals coming out of an active cell that lower the threshold of the neighboring cells. Reaction-diffusion can be easily implemented by an RC network, as shown in Shi (2009) and Markan et al. (2013). The development of individual cells and of cells under diffusive interaction varies significantly. If the cells have different initial biases then, in the absence of diffusive coupling, they develop into cells with different orientation preferences. On the other hand, the presence of diffusive coupling causes nearby cells to have a similar voltage (at node dno), and hence the injection and tunnel feedback that they receive is also the same. Therefore, if the two cells receive similar inputs, they develop to have similar feature preference. The stronger cell (the cell that generates a higher voltage at node dno) tends to influence the development of the weaker cells around it.
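A toy version of this diffusive-Hebbian interplay can be run with scalar orientation preferences on a ring of cells; the single-number preference, the responsiveness window, and both rate constants below are invented for illustration and merely stand in for the floating-gate (reaction) and RC-grid (diffusion) dynamics of the actual circuit.

import numpy as np

def circ_diff(a, b):
    # smallest signed difference between two orientations (period 180 degrees)
    return (a - b + 90.0) % 180.0 - 90.0

rng = np.random.default_rng(3)
pref = rng.uniform(0, 180, 10)                  # random initial biases of 10 cells
inputs = np.array([0.0, 45.0, 90.0, 135.0])     # the applied oriented patterns
eta_hebb, eta_diff = 0.05, 0.20                 # reaction and diffusion rates

for epoch in range(400):
    for stim in rng.permutation(inputs):        # random-inside-epoch presentation
        respond = np.abs(circ_diff(pref, stim)) < 30.0   # only responsive cells learn
        pref += eta_hebb * (-circ_diff(pref, stim)) * respond
    left, right = np.roll(pref, 1), np.roll(pref, -1)    # ring neighbours
    pref += eta_diff * (circ_diff(left, pref) + circ_diff(right, pref)) / 2.0
    pref %= 180.0

print(np.round(pref, 1))   # neighbouring cells end up with similar preferred angles

With eta_diff set to zero the cells keep their independent preferences, which is essentially the contrast between Figures 7A and 7B.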


5.1. MODIFICATION OF ORIENTATION TUNING UNDER NEIGHBORHOOD INFLUENCE
As discussed earlier, for any map formation, diffusive interaction between cells should happen in such a way that it leads to the formation of clusters of cells having similar feature selectivity. This is possible if some of the cells change their feature preference when they are surrounded by strongly biased cells, forming clusters showing gradual variation in orientation selectivity between clusters. This means that some kind of mechanism needs to be present that helps the cell overcome its initial orientation bias and develop an orientation preference according to the neighborhood influence. In our orientation cell this is achieved by ensuring two things: (1) keeping the time constant of the diffusive RC network (τDiffusion) much smaller than the time constant of the orientation cell (τReaction), and (2) limiting the amount of learning in each iteration by applying input patterns for a very short duration (0.02 ms). The first condition ensures that diffusion has precedence over reaction and that the strong neighborhood influence is able to modify the individual bias of an orientation cell, and the second condition makes sure that the learning in the orientation cell is at a pace that is suitable for diffusion to influence its development, i.e., the floating gate voltages are allowed to change by only a small amount in every iteration. This is required because once the difference between the floating gate voltages of the two arms of the ts-WTA becomes large, it cannot be reversed.

It may be noted that the diffusion node (dno) voltage varies between 5.1 and 5.4 volts as the receptive field develops. After development, the response of a developed cell to the pattern that it favors, measured at the diffusion node (dno), is around 5.4 volts. Interestingly, if we apply 5.4 volts to node dno externally, for a pattern of our choice, and do this repeatedly, the circuit begins to develop a preference for that orientation instead of its natural bias. Therefore, the receptive field development of the orientation selective cell can be modulated externally by applying an appropriate voltage at the dno node of the cell for a particular pattern. This way we force a high response for a pattern of our choice, which causes the feedback mechanism to reinforce the desired pattern onto the individual ts-WTA cells in the 9 × 9 receptive field. It was observed that as the floating gate voltages become more developed (developed floating gate voltages mean that the difference between the floating gate voltages of the two synapses of the individual ts-WTA cells has become large), it becomes difficult to modulate the orientation preference of the cell. For fully developed floating gate voltages, i.e., strong orientation preference, modulation does not happen at all, and the cells preserve their original response as expected. Details of how the floating


FIGURE 12 | (A) Shows the independent development of receptive fields of three orientation selective cells with different initial biases and the same random-inside-epoch order of inputs. (B) Shows the development of the same three cells with the initial conditions and order of inputs the same as (A), but with diffusive interaction between neighbors. All the cells develop similar feature preference. (C) Two more examples of cells developing independently under the same random-inside-epoch order of inputs but different initial biases. (D) Shows the development of the same cells as (C) under diffusive coupling. Diffusion causes the cells to develop the same feature preference in each case.


gate voltages vary during unlearning and the influence of injection and tunnel voltages are examined critically in Markan et al. (2013).

5.2. BUFFER DEVICE FOR DIFFUSIVE COUPLING
When more than one orientation cell is connected to the others diffusively using resistances at the diffusion node (dno), the increased current at the node dno tends to undesirably load the output node, or OR cell output (out) (Figures 5, 8). Since the OR cell output (out) node conveys the self-activation of each cell, this value should not get altered. In order to avoid this loading effect, we designed a buffer device (B) that shields the orientation cell output (activation) from the excessive current coming to the node dno of each orientation cell from other diffusively coupled cells. This device ensures that the self-activation (feedforward network) of the orientation cell driving the voltage at node out can influence the voltage at node dno, which drives the feedback network, but node dno cannot influence the voltage at node out directly. This buffer device is essentially a linear device that inverts the voltage at the OR cell output (out) twice and feeds it to the dno node (see Figures 8, 11). This way, current only flows in one direction, i.e., out of the OR cell output node and not into it. A typical design of the buffer device is shown in Figure 11B; however, any other device performing the same function can be used as well.

5.3. SIMULATION OF DIFFUSIVE INTERACTION BETWEEN CELLS
In order to test if our orientation cell fulfills the premise laid down for diffusive interaction between cells, we performed multiple simulations with orientation cells having different initial biases but similar random-inside-epoch order of inputs, and we let them develop under two conditions: (1) independently, i.e., without any diffusive interaction, and (2) with diffusive interaction. As discussed previously, the voltage at node dno affects the feedback that regulates the response of the cell. If we connect two orientation cells at the diffusive node (dno) by means of a resistance, then, on receiving similar inputs, the cell with the higher voltage at node dno starts to influence the response of the other cell by making the injection and tunnel feedback mechanisms of both cells similar, thus enforcing the same pattern on each of the cells. By changing the value of the diffusion resistance (by increasing the resistance we reduce the diffusion constant, and by reducing its value we increase the diffusion) we can modify the extent of interaction we want between the cells. Several experiments were performed with different diffusion constants, different biases, and different inputs. Each time, for moderate (300 Ω > Rdiff > 100 Ω) and high values of the diffusion constant (100 Ω > Rdiff > 0 Ω), it was found that the response of the two cells became similar. To which side the orientation preference tilts depends on which cell has a stronger bias. The simulations were done for two and three cells connected in a row. Figure 12 shows some of the interesting results. Irrespective of the way the cells develop independently, whether one is ON-Centered and the other OFF-Centered, or whether their orientation preferences are totally opposite of each other, i.e., 135° and 45°, with diffusion they become selective to the same orientation. It is important to note that the lateral diffusive network and the feedback network are only important as long as the learning is taking place and the receptive fields of the cells are developing. Once the receptive fields have evolved, the lateral connectivity, i.e., the RC diffusive network, and the cell's feedback network become ineffective and the cell works in a feed-forward mode wherein, on applying a set of inputs, the cell responds according to its developed orientation preference. The power dissipation also varies according to the learning profile of the cell: e.g., between two orientation cells connected by a 100 Ω resistance, the current through the diffusive resistor is maximum (150 μA) during learning but reduces drastically (10 μA) once the learning is over (Figure 13). The power can be reduced by shifting the whole resistance regime of the cell to larger values while keeping the necessary ratio between τDiffusion and τReaction intact.
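The current levels quoted above can be sanity-checked with a lumped two-node model: each cell's dno is driven toward its own activation through a source resistance (standing in for the buffered feed-forward path of section 5.2), and the two nodes are tied together through Rdiff. Rdiff = 100 Ω and the 10 pF node capacitance follow the text; the 1 kΩ source resistance, the drive voltages, and the explicit-Euler integration are assumptions.

# Two orientation cells coupled at their diffusion nodes through Rdiff
RS, RDIFF, C, DT = 1e3, 100.0, 10e-12, 1e-10   # ohm, ohm, farad, seconds

def settle(drive1, drive2, steps=20000):
    v1 = v2 = 5.1                               # resting dno voltage (V)
    for _ in range(steps):
        i_diff = (v1 - v2) / RDIFF              # current through Rdiff (A)
        v1 += ((drive1 - v1) / RS - i_diff) * DT / C
        v2 += ((drive2 - v2) / RS + i_diff) * DT / C
    return round(v1, 3), round(v2, 3), round(1e6 * (v1 - v2) / RDIFF, 1)

# learning phase: the cells favour different patterns (different drives)
print("learning:", settle(5.40, 5.15), "(v1, v2, coupling current in uA)")
# after learning: both cells respond to the same pattern (similar drives)
print("after   :", settle(5.40, 5.39))

The coupling current is large while the two drives differ and collapses once both cells favour the same pattern, which is qualitatively the behaviour plotted in Figure 13.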

FIGURE 13 | Shows the variation of current in the resistance, Rdiff, connecting two orientation selective cells. The top boxes show the evolution of the receptive field of orientation cell 1 and the bottom boxes show the evolution of the receptive field of orientation cell 2. The current is high (150 μA) during the learning phase. Once the orientation has been learnt, or the floating gate voltages have matured, the current reduces and remains constant thereafter.

6. RESULTS AND DISCUSSION
Time-staggered or uncorrelated inputs have been shown to be essential for feature map formation (Stryker and Strickland, 1984; Weliky and Katz, 1997; Buffeli et al., 2002; Zhang et al., 2002). The time-staggered Winner Takes All algorithm, based on uncorrelated inputs, has previously been shown to be biologically more realistic and a mechanism underlying the formation of Ocular Dominance Maps (Markan et al., 2013). This paper introduces the design of a cortical cell that is built using ts-WTA cells comprising ON/OFF-Centered synapses, forming a three-layered structure similar to the visual sensory system in the brain. On application of patterns resembling different orientations, the floating gate dynamics, the diffusive interaction, and the feedback regime act in a way that the cell is able to develop orientation selectivity. Repeated simulations show that the orientation selectivity develops according to two major factors, initial biases (nature) and the inputs applied (nurture), and that there is an equal likelihood


of the circuit becoming selective to any of the eight patterns applied. Embedded in an RC grid, these orientation selective cells are able to modify their feature preference under strong neighborhood influence to form clusters of cells with similar feature preference. The cell also responds to periodic patterns and spatial frequency, just like experimentally observed cells of the visual cortex. This is a significant step toward developing neuromorphic equivalents of biological phenomena that could have diverse applications in artificial vision systems.

Diffusive Hebbian learning based on reaction-diffusion and competition for neurotrophic factors (Markan, 1996; Markan and Bhaumik, 1999; Bhaumik and Mathur, 2003) has strong biological support as a basis for explaining local computation and organization in the brain. It is now well known that the developing cortex is a generic neural structure that gets compartmentalized for processing different sensory inputs through an adaptive learning process. It therefore becomes important to explore the basic learning paradigms that are active in the brain, which are able to extract statistically relevant information from the sensory input space and map it onto the cortex, so that such principles can be applied in artificial systems. In this sense, the model developed is very generic and can be applied to inputs from any sensory modality, such as olfactory, gustatory, somatosensory, and auditory. Some preliminary work also demonstrates the applicability of the model to abstract pattern recognition. In the brain no sensory system works in isolation. Rather, it is a combination of sensory inputs to different sensory modalities that the brain responds best to. Eventual integration of feature maps, corresponding to different sensory systems, onto a common platform could act as a database for higher cognitive algorithms to work on. The work presented in this paper is a small yet significant step toward the goal of building truly cognitive neuromorphic systems because it presents a novel approach towards incorporating adaptability and learning in artificial systems by modeling the developmental aspects of feature selectivity and feature map formation in the brain. While reaction diffusion has been able to address local-range, non-axonal interactions in the brain and explain how cortical feature maps evolve to a large extent, more recent research has highlighted the role of gap junctions in lateral information processing in the brain (Hameroff, 2010; Ebner and Hameroff, 2011; Gupta and Markan, 2013). Experiments have revealed that sibling neurons connected by gap junctions develop to have the same feature preference (Li et al., 2012; Mrsic-Flogel and Bonhoeffer, 2012). Since gap junctions can form networks of neurons spanning large areas of the cortex, understanding how they function could give us new insights into multi-modal information processing in the brain. It seems interesting to explore gap junctions and see how similar behavior can be emulated in hardware.

ACKNOWLEDGMENTS
This work was funded by research grants to C. M. Markan, (III.6(74)/99-ST(PRU)) under SERC Robotics and Manufacturing PAC, and (SR/CSI/22/2008-12) under the Cognitive Science Research Initiative, Department of Science and Technology, Govt. of India. The authors wish to acknowledge the funding sources and the Department of Physics and Computer Science, Dayalbagh Educational Institute, Agra, India for the support.

SUPPLEMENTARY MATERIAL
A video showing the development of the orientation receptive field has been made available as a part of the online supplementary data. A document on the Monte-Carlo analysis of the cell under device parameter variations has also been provided in the supplementary section. The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnins.2014.00054/abstract

REFERENCES
Akerman, C. J., Smyth, D., and Thompson, I. D. (2002). Visual experience before eye-opening and the development of the retinogeniculate pathway. Neuron 36, 869–879. doi: 10.1016/S0896-6273(02)01010-3
Bartolozzi, C., and Indiveri, G. (2007). Synaptic dynamics in analog VLSI. Neural Comput. 19, 2581–2603. doi: 10.1162/neco.2007.19.10.2581
Bhaumik, B., and Markan, C. M. (2000). Orientation map: a reaction diffusion based model, in Proceedings of IJCNN 2000 (Como, Italy).
Bhaumik, B., and Mathur, M. (2003). A cooperation and competition based simple cell receptive field model and study of feed-forward linear and nonlinear contributions to orientation selectivity. J. Comput. Neurosci. 14, 211–227. doi: 10.1023/A:1021911019241
Buffeli, M., Busetto, G., Cangiano, L., and Cangiano, A. (2002). Perinatal switch from synchronous to asynchronous activity of motoneurons: link with synapse elimination. Proc. Natl. Acad. Sci. U.S.A. 99, 13200–13205. doi: 10.1073/pnas.202471199
Buzás, P., Eysel, U. T., Adorján, P., and Kisvárday, Z. F. (2001). Axonal topography of cortical basket cells in relation to orientation, direction, and ocular dominance maps. J. Comp. Neurol. 437, 259–285. doi: 10.1002/cne.1282
Carrillo, J., Nishiyama, N., and Nishiyama, H. (2013). Dendritic translocation establishes the winner in cerebellar climbing fiber synapse elimination. J. Neurosci. 33, 7641–7653. doi: 10.1523/JNEUROSCI.4561-12.2013
Cellerino, A., and Maffei, L. (1996). The action of neurotrophins in the development and plasticity of the visual cortex. Prog. Neurobiol. 49, 53–71. doi: 10.1016/S0301-0082(96)00008-1
Chakrabartty, S., and Cauwenberghs, G. (2007). Sub-microwatt analog VLSI trainable pattern classifier. IEEE J. Solid State Circ. 42, 1169–1179. doi: 10.1109/JSSC.2007.894803
Chan, V., Liu, S. C., and van Schaik, A. (2007). AER EAR: a matched silicon cochlea pair with address event representation interface. IEEE Trans. Circ. Syst. I Reg. Pap. 54, 48–59. doi: 10.1109/TCSI.2006.887979
Chapman, B., Stryker, M. P., and Bonhoeffer, T. (1996). Development of orientation preference maps in ferret primary visual cortex. J. Neurosci. 16, 6443–6453.
Chenling, H., and Chakrabartty, S. (2012). An asynchronous analog self-powered CMOS sensor-data-logger with a 13.56 MHz RF programming interface. IEEE J. Solid-State Circ. 47, 1–14. doi: 10.1109/JSSC.2011.2172159
Chicca, E., Whatley, A. M., Lichtsteiner, P., Dante, V., Delbruck, T., Del Giudice, P., et al. (2007). A multichip pulse-based neuromorphic infrastructure and its application to a model of orientation selectivity. IEEE Trans. Circ. Syst. I Reg. Pap. 54, 981–993. doi: 10.1109/TCSI.2007.893509
Choi, T. Y. W., Merolla, P. A., Arthur, J. V., Boahen, K. W., and Shi, B. E. (2005). Neuromorphic implementation of orientation hypercolumns. IEEE Trans. Circ. Syst. I 52, 1049–1060. doi: 10.1109/TCSI.2005.849136
De Valois, R. L., Albrecht, D. G., and Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vis. Res. 22, 545–559. doi: 10.1016/0042-6989(82)90112-2
Diorio, C., Hasler, P., Minch, B., and Mead, C. (1996). A single transistor silicon synapse. IEEE Trans. Electron Devices 43, 1972–1980. doi: 10.1109/16.543035
Dragoi, V., Sharma, J., and Sur, M. (2000). Adaptation-induced plasticity of orientation tuning in adult visual cortex. Neuron 28, 287–298. doi: 10.1016/S0896-6273(00)00103-3
Ebner, M., and Hameroff, S. (2011). Lateral information processing by spiking neurons: a theoretical model of the neural correlate of consciousness. Comput. Intell. Neurosci. 2011:11. doi: 10.1155/2011/xya247879


Elliott, T., and Shadbolt, N. R. (1998). Competition for neurotrophic factors: ocular dominance columns. J. Neurosci. 18, 5850–5858.
Everson, R. M., Prashanth, A. K., Gabbay, M., Knight, B. W., Sirovich, L., and Kaplan, E. (1998). Representation of spatial frequency and orientation in the visual cortex. Proc. Natl. Acad. Sci. U.S.A. 95, 8334–8338. doi: 10.1073/pnas.95.14.8334
Favero, M., Busetto, G., and Cangiano, A. (2012). Spike timing plays a key role in synapse elimination at the neuromuscular junction. Proc. Natl. Acad. Sci. U.S.A. 109, E1667–E1675. doi: 10.1073/pnas.1201147109
Grossberg, S. (1976). Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biol. Cybern. 23, 121–134. doi: 10.1007/BF00344744
Grossberg, S., and Olson, S. J. (1994). Rules for the cortical map of ocular dominance and orientation columns. Neural Netw. 7, 883–894. doi: 10.1016/S0893-6080(05)80150-9
Gupta, P., and Markan, C. M. (2013). Exploring a quantum-Hebbian approach towards learning and cognition. NeuroQuantology 11, 416–425. doi: 10.14704/nq.2013.11.3.669
Hameroff, S. (2010). The conscious pilot-dendritic synchrony moves through the brain to mediate consciousness. J. Biol. Phys. 36, 71–93. doi: 10.1007/s10867-009-9148-x
Hikawa, H., Harada, K., and Hirabayashi, T. (2007). Hardware feedback self-organizing map and its application to mobile robot location identification. JACIII 11, 937–945.
Horng, S. H., and Sur, M. (2006). Visual activity and cortical rewiring: activity-dependent plasticity of cortical networks. Prog. Brain Res. 157, 33–81. doi: 10.1016/S0079-6123(06)57001-3
Hsu, D., Figueroa, M., and Diorio, C. (2002). Competitive learning with floating-gate circuits. IEEE Trans. Neural Netw. 13, 732–744. doi: 10.1109/TNN.2002.1000139
Hubel, D. H., and Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. J. Physiol. 148, 574–591.
Hubel, D. H., and Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. 160, 106.
Indiveri, G. (2001). A current-mode hysteretic winner-take-all network, with excitatory and inhibitory coupling. Analog Integr. Circ. Signal Process. 28, 279–291. doi: 10.1023/A:1011208127849
Indiveri, G. (2008). Neuromorphic VLSI models of selective attention: from single chip vision sensors to multi-chip systems. Sensors 8, 5352–5375. doi: 10.3390/s8095352
Indiveri, G., Chicca, E., and Douglas, R. (2006). A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity. IEEE Trans. Neural Netw. 17, 211–221. doi: 10.1109/TNN.2005.860850
Indiveri, G., Chicca, E., and Douglas, R. J. (2009). Artificial cognitive systems: from VLSI networks of spiking neurons to neuromorphic cognition. Cogn. Comput. 1, 119–127. doi: 10.1007/s12559-008-9003-6
Indiveri, G., and Horiuchi, T. K. (2011). Frontiers in neuromorphic engineering. Front. Neurosci. 5:118. doi: 10.3389/fnins.2011.00118
Indiveri, G., Oswald, P., and Kramer, J. (2002). An adaptive visual tracking sensor with a hysteretic winner-take-all network. IEEE Int. Symp. Circ. Syst. 2, 324–327. doi: 10.1109/ISCAS.2002.1010990
Jegelka, S., Bednar, J. A., and Miikkulainen, R. (2006). Prenatal development of ocular dominance and orientation maps in a self-organizing model of V1. Neurocomputing 69, 1291–1296. doi: 10.1016/j.neucom.2005.12.094
Kohonen, T. (1993). Physiological interpretation of the self organizing map algorithm. Neural Netw. 6, 895–905. doi: 10.1016/S0893-6080(09)80001-4
Kohonen, T. (2006). Self-organizing neural projections. Neural Netw. 19, 723–733. doi: 10.1016/j.neunet.2006.05.001
Krekelberg, B. (1997). Modelling cortical self-organization by volume learning. London: Doctoral dissertation.
Kruger, W. F., Hasler, P., Minch, B. A., and Koch, C. (1997). An adaptive WTA using floating gate technology. Adv. Neural Inform. Processing Syst. 720–726.
Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol. 16, 37–68.
Lam, S. Y. M., Shi, B. E., and Boahen, K. (2005). Self-organized cortical map formation by guiding connections. IEEE Int. Symp. Circ. Syst. 5, 5230–5233.
Lazzaro, J., Ryckebusch, S., Mahowald, M. A., and Mead, C. A. (1989). Winner-Take-All Networks of O(n) Complexity. NIPS 1. San Mateo, CA: Morgan Kaufman Publishers.
Li, Y., Lu, H., Cheng, P. L., Ge, S., Xu, H., Shi, S. H., et al. (2012). Clonally related visual cortical neurons show similar stimulus feature selectivity. Nature 486, 118–121. doi: 10.1038/nature11110
Lichtman, J. W. (2009). It's lonely at the top: winning climbing fibers ascend dendrites solo. Neuron 63, 6–8. doi: 10.1016/j.neuron.2009.07.001
Lichtsteiner, P., Posch, C., and Delbruck, T. (2008). A 128 × 128 120 dB 15 μs latency asynchronous temporal contrast vision sensor. IEEE J. Solid-State Circ. 43, 566–576. doi: 10.1109/JSSC.2007.914337
Maffei, L., and Fiorentini, A. (1973). The visual cortex as a spatial frequency analyser. Vis. Res. 13, 1255–1267. doi: 10.1016/0042-6989(73)90201-0
Markan, C. M. (1996). Sequential development of orientation and ocular dominance maps: reaction diffusion approach. Ph.D. thesis, Department of Electrical Engineering (Delhi: IIT).
Markan, C. M., and Bhaumik, B. (1999). A diffusive Hebbian model for cortical orientation maps formation, in Proceedings of IJCNN 99 (Washington, DC). doi: 10.1109/IJCNN.1999.831482
Markan, C. M., Gupta, P., and Bansal, M. (2007). Neuromorphic building blocks for adaptable cortical feature maps, IFIP International Conference on VLSI, 15–17 Oct 2007 (Atlanta, GA).
Markan, C. M., Gupta, P., and Bansal, M. (2013). An adaptive neuromorphic model of Ocular Dominance map using floating gate synapse. Neural Netw. 45, 117–133. doi: 10.1016/j.neunet.2013.04.004
Martín-del-Brío, B., and Blasco-Alberto, J. (1995). Hardware-oriented models for VLSI implementation of self-organizing maps, in From Natural to Artificial Neural Computation, eds J. Mira and F. Sandoval (Berlin: Springer), 712–719. doi: 10.1007/3-540-59497-3_242
McAllister, A. K., Katz, L. C., and Lo, D. C. (1999). Neurotrophins and synaptic plasticity. Annu. Rev. Neurosci. 22, 295–318. doi: 10.1146/annurev.neuro.22.1.295
Merolla, P. A., Arthur, J. V., Shi, B. E., and Boahen, K. A. (2007). Expandable networks for neuromorphic chips. IEEE Trans. Circ. Syst. I Reg. Pap. 54, 301–311. doi: 10.1109/TCSI.2006.887474
Miller, K. D. (1994). A model for the development of simple cell receptive fields and the ordered arrangement of orientation columns through activity-dependent competition between ON and OFF-center inputs. J. Neurosci. 14, 409–409.
Miller, K. D. (1996). Receptive fields and maps in the visual cortex: models of ocular dominance and orientation columns, in Models of Neural Networks III (New York, NY: Springer), 55–78. doi: 10.1007/978-1-4612-0723-8_2
Miller, K. D., and MacKay, D. J. (1994). The role of constraints in Hebbian learning. Neural Comput. 6, 100–126. doi: 10.1162/neco.1994.6.1.100
Misgeld, T. (2011). Lost in elimination: mechanisms of axonal loss. e-Neuroforum 2, 21–34. doi: 10.1007/s13295-011-0017-2
Mooney, R., Penn, A. A., Gallego, R., and Shatz, C. J. (1996). Thalamic relay of spontaneous retinal activity prior to vision. Neuron 17, 863–874. doi: 10.1016/S0896-6273(00)80218-4
Mrsic-Flogel, T. D., and Bonhoeffer, T. (2012). Neuroscience: sibling neurons bond to share sensations. Nature 486, 41–42. doi: 10.1038/486041a
Personius, K. E., Chang, Q., Mentis, G. Z., O'Donovan, M. J., and Balice-Gordon, R. J. (2007). Reduced gap junctional coupling leads to uncorrelated motor neuron firing and precocious neuromuscular synapse elimination. Proc. Natl. Acad. Sci. U.S.A. 104, 11808–11813. doi: 10.1073/pnas.0703357104
Rahimi, K., Diorio, C., Hernandez, C., and Brockhausen, M. D. (2002). A simulation model for floating-gate MOS synapse transistors, in IEEE International Symposium on Circuits and Systems, 2002. ISCAS 2002, Vol. 2 (Phoenix-Scottsdale, AZ: IEEE), II-532. doi: 10.1109/ISCAS.2002.1011042
Roerig, B., and Chen, B. (2002). Relationships of local inhibitory and excitatory circuits to orientation preference maps in ferret visual cortex. Cereb. Cortex 12, 187–198. doi: 10.1093/cercor/12.2.187
Schemmel, J., Fieres, J., and Meier, K. (2008). Wafer-scale integration of analog neural networks, in IEEE International Joint Conference on Neural Networks, 2008. IJCNN 2008 (IEEE World Congress on Computational Intelligence) (Hong Kong: IEEE), 431–438. doi: 10.1109/IJCNN.2008.4633828
Sengpiel, F., Stawinski, P., and Bonhoeffer, T. (1999). Influence of experience on orientation maps in cat visual cortex. Nat. Neurosci. 2, 727–732. doi: 10.1038/11192
Seriès, P., Latham, P. E., and Pouget, A. (2004). Tuning curve sharpening for orientation selectivity: coding efficiency and the impact of correlations. Nat. Neurosci. 7, 1129–1135. doi: 10.1038/nn1321


Serrano-Gotarredona, R., Oster, M., Lichtsteiner, P., Linares-Barranco, A., Paz-Vicente, R., Gomez-Rodriguez, F., et al. (2005). AER building blocks for multi-layer multi-chip neuromorphic vision systems, in NIPS (Vancouver, BC).
Shi, B. E. (2009). The effect of mismatch in current- versus voltage-mode resistive grids. Int. J. Circ. Theor. Appl. 37, 53–65. doi: 10.1002/cta.494
Shi, B. E., Tsang, E. K. S., Lam, S. Y., and Meng, Y. (2006). Expandable hardware for computing cortical feature maps, in Proceedings 2006 IEEE International Symposium on Circuits and Systems, ISCAS 2006 (Island of Kos: IEEE). doi: 10.1109/ISCAS.2006.1693407
Shouval, H. Z., Goldberg, D. H., Jones, J. P., Beckerman, M., and Cooper, L. N. (2000). Structured long-range connections can provide a scaffold for orientation maps. J. Neurosci. 20, 1119–1128.
Shuo, S., and Basu, A. (2011). Analysis and reduction of mismatch in silicon neurons, in Proceedings of IEEE Biomedical Circuits and Systems Conference (San Diego, CA), 257–260.
Somers, D. C., Nelson, S. B., and Sur, M. (1995). An emergent model of orientation selectivity in cat visual cortical simple cells. J. Neurosci. 15, 5448–5465.
Srinivasan, V., Graham, D. W., and Hasler, P. (2005). Floating-gates transistors for precision analog circuit design: an overview, in 48th Midwest Symposium on Circuits and Systems, 2005 (Covington, KY: IEEE), 71–74. doi: 10.1109/MWSCAS.2005.1594042
Stent, G. S. (1973). A physiological mechanism for Hebb's postulate of learning. Proc. Natl. Acad. Sci. U.S.A. 70, 997–1001. doi: 10.1073/pnas.70.4.997
Stryker, M. P., and Strickland, S. L. (1984). Physiological segregation of ocular dominance columns depends on the pattern of afferent electrical activity. Invest. Opthalmol. Vis. Sci. 25, 278.
Sur, M., and Leamey, C. A. (2001). Development and plasticity of cortical areas and networks. Nat. Rev. Neurosci. 2, 251–262. doi: 10.1038/35067562
Taba, B., and Boahen, K. (2002). Topographic map formation by silicon growth cones. Proc. NIPS, 1139–1146.
Tootell, R. B., Silverman, M. S., and De Valois, R. L. (1981). Spatial frequency columns in primary visual cortex. Science 214, 813–815. doi: 10.1126/science.7292014
Turney, S. G., and Lichtman, J. W. (2012). Reversing the outcome of synapse elimination at developing neuromuscular junctions in vivo: evidence for synaptic competition and its mechanism. PLoS Biol. 10:e1001352. doi: 10.1371/journal.pbio.1001352
Von der Heydt, R., Peterhans, E., and Dursteler, M. R. (1992). Periodic-pattern-selective cells in monkey visual cortex. J. Neurosci. 12, 1416–1434.
Weliky, M., and Katz, L. (1997). Disruption of orientation tuning in visual cortex by artificially correlated neuronal activity. Nature 386, 680–685. doi: 10.1038/386680a0
Wijekoon, J., and Dudek, P. (2008). Compact silicon neuron circuit with spiking and bursting behaviour. Neural Netw. 21, 524–534. doi: 10.1016/j.neunet.2007.12.037
Wong, R. O. (1999). Retinal waves and visual system development. Annu. Rev. Neurosci. 22, 29–47. doi: 10.1146/annurev.neuro.22.1.29
Wyatt, R. M., and Balice-Gordon, R. J. (2003). Activity-dependent elimination of neuromuscular synapses. J. Neurocytol. 32, 777–794. doi: 10.1023/B:NEUR.0000020623.62043.33
Yoshida, T., Ozawa, K., and Tanaka, S. (2012). Sensitivity profile for orientation selectivity in the visual cortex of goggle-reared mice. PLoS ONE 7:e40630. doi: 10.1371/journal.pone.0040630
Yousef, T., Tóth, E., Rausch, M., Eysel, U. T., and Kisvárday, Z. F. (2001). Topography of orientation centre connections in the primary visual cortex of the cat. Neuroreport 12, 1693–1699. doi: 10.1097/00001756-200106130-00035
Zamarreño-Ramos, C., Camuñas-Mesa, L. A., Perez-Carrasco, J. A., Masquelier, T., Serrano-Gotarredona, T., and Linares-Barranco, B. (2011). On spike-timing-dependent-plasticity, memristive devices, and building a self-learning visual cortex. Front. Neurosci. 5:26. doi: 10.3389/fnins.2011.00026
Zhang, L., Bao, S., and Merzenich, M. (2002). Disruption of primary auditory cortex by synchronous auditory inputs during critical period. Proc. Natl. Acad. Sci. U.S.A. 99, 2309–2314. doi: 10.1073/pnas.261707398

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 31 August 2013; accepted: 09 March 2014; published online: 02 April 2014.
Citation: Gupta P and Markan CM (2014) An adaptable neuromorphic model of orientation selectivity based on floating gate dynamics. Front. Neurosci. 8:54. doi: 10.3389/fnins.2014.00054
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.
Copyright © 2014 Gupta and Markan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.



ORIGINAL RESEARCH ARTICLE
published: 18 March 2014
doi: 10.3389/fnins.2014.00051

A mixed-signal implementation of a polychronous spiking neural network with delay adaptation
Runchun M. Wang*, Tara J. Hamilton, Jonathan C. Tapson and André van Schaik
Bioelectronics and Neuroscience, The MARCS Institute, University of Western Sydney, Sydney, NSW, Australia

Edited by: Jennifer Hasler, Georgia Institute of Technology, USA
Reviewed by: Christian G. Mayr, Dresden University of Technology, Germany; Arindam Basu, Nanyang Technological University, Singapore
*Correspondence: Runchun M. Wang, Bioelectronics and Neuroscience, The MARCS Institute, University of Western Sydney, Locked Bag 1797, Penrith, NSW 2751, Australia. e-mail: [email protected]

We present a mixed-signal implementation of a re-configurable polychronous spiking neural network capable of storing and recalling spatio-temporal patterns. The proposed neural network contains one neuron array and one axon array. Spike Timing Dependent Delay Plasticity is used to fine-tune delays and add dynamics to the network. In our mixed-signal implementation, the neurons and axons have been implemented as both analog and digital circuits. The system thus consists of one FPGA, containing the digital neuron array and the digital axon array, and one analog IC containing the analog neuron array and the analog axon array. The system can be easily configured to use different combinations of each. We present and discuss the experimental results of all combinations of the analog and digital axon arrays and the analog and digital neuron arrays. The test results show that the proposed neural network is capable of successfully recalling more than 85% of stored patterns using both analog and digital circuits.

Keywords: mixed-signal implementation, polychronous spiking neural network, analog implementation, multiplexed neuron array, neuromorphic engineering

INTRODUCTION
Increasing evidence has been found that the mammalian neural system uses spatio-temporal coding in at least some of its operations (Van Rullen and Thorpe, 2001; Masuda and Aihara, 2003), largely due to this coding's potential to reduce energy consumption (Levy and Baxter, 1996). An artificial network that can learn and recall spatially and temporally encoded spike information will have significant benefits in terms of modeling these biological systems.

A polychronous spiking neural network is a candidate for implementing a memory for spatio-temporal patterns. Polychronization is the process in which spikes travel down axons with specific delays to arrive at a common target neuron simultaneously and cause it to fire, despite the source neurons firing asynchronously (Izhikevich, 2006). This time-locked relation between the firing of different neurons is the key feature of spatio-temporal patterns. Neural networks based on this principle are referred to as polychronous neural networks and are capable of storing and recalling quite complicated spatio-temporal patterns. Figure 1 shows an example of a spatio-temporal pattern involving five neurons. The threshold voltage of each neuron is set so that it will fire if two pre-synaptic spikes arrive simultaneously. Whenever a neuron fires, its spike is transmitted to all connected neurons via its axonal connections, each of which has its own independent delay. These spikes will then generate post-synaptic currents at the connected neurons. The example pattern starts when neuron 1 fires at time 0 and neuron 5 fires at time T1. The spikes from both neurons will arrive at neuron 3 at time T1+T2, and together they will induce neuron 3 to fire at time T1+T2. In the same manner, the spikes from neuron 5 and neuron 3 arrive at neuron 2 simultaneously at time T1+T2+T3 and will cause neuron 2 to fire. This process will continue as long as at least two spikes arrive simultaneously at a neuron in the network.
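To make the timing relations above concrete, the following minimal sketch (in Python, with invented delay values standing in for T1 through T4) replays a pattern of this kind: every connection carries its own delay, and a neuron fires as soon as two spikes reach it at exactly the same time. It is only an illustration of the principle, not the implementation described in this paper.

    import heapq
    from collections import defaultdict

    # Connectivity of a five-neuron example; the delays (in ms) are invented
    # placeholders, not the T1..T4 of Figure 1.
    connections = {1: [(3, 6.0)],
                   5: [(3, 3.0), (2, 7.0)],
                   3: [(2, 4.0), (4, 9.0)],
                   2: [(4, 5.0)],
                   4: []}

    def replay(stimulus, threshold=2):
        """Propagate spikes; a neuron fires when `threshold` spikes coincide."""
        events = list(stimulus)              # (time, neuron) firings injected from outside
        heapq.heapify(events)
        arrivals = defaultdict(int)          # (arrival time, target) -> coincident spike count
        fired, order = set(stimulus), []
        while events:
            t, n = heapq.heappop(events)
            order.append((t, n))
            for target, delay in connections[n]:
                key = (t + delay, target)
                arrivals[key] += 1
                if arrivals[key] == threshold and key not in fired:
                    fired.add(key)           # coincidence reached: the target neuron fires
                    heapq.heappush(events, key)
        return order

    # Neuron 1 fires at time 0 and neuron 5 at time 3 (playing the role of T1):
    print(replay([(0.0, 1), (3.0, 5)]))
    # -> [(0.0, 1), (3.0, 5), (6.0, 3), (10.0, 2), (15.0, 4)]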
Izhikevich (2006) calls these spatio-temporal patterns "groups," and concludes that spiking networks with delays have more groups than neurons after presenting a network developed based on this polychronous principle. The groups in Izhikevich's network emerge in a randomly connected network of spiking neurons with axonal delays, following persistent stimulation and Spike Timing Dependent Plasticity (STDP) (Gerstner et al., 1996). However, one of the open problems of the theoretical model is to find patterns (groups): "Our algorithm for finding polychronous groups considers various triplets firing with various spiking patterns and determines the groups that are initiated by the patterns. Because of the combinatorial explosion, it is extremely inefficient" (Izhikevich, 2006). The method used by Izhikevich will take months of simulation time just to find these spatio-temporal patterns. Moreover, the polychronous groups emerge randomly and the same stimulus is not likely to result in the same polychronous groups every time. This makes the Izhikevich polychronous network unsuitable for practical applications such as pattern recognition. Finally, this model is not efficient for hardware implementations, which we will discuss in detail in section Discussion.

To solve the problems presented above, we have proposed a digital implementation of a reconfigurable polychronous spiking neural network that can, in real time, learn specific patterns and retrieve them (Wang et al., 2013b). Furthermore, our proposed polychronous neural network can use all the available hardware resources to store patterns. Test results show that the proposed neural network is capable of successfully recalling more than 95% of all spikes for 96% of the stored patterns. Unlike biological neural networks, the digital implementation is totally free of mismatch and noise. Therefore, we also designed an analog implementation, which is naturally subject to process variation and device mismatch, and which more closely emulates the analog computation in biological neurons.


Mixed-signal implementations of spiking neural networks benefit from many of the advantages of both analog and digital implementations. Analog implementations can realize biological behaviors of neurons in a very efficient manner, whereas digital implementations can provide the re-configurability needed for rapid prototyping of spiking neural networks. As a result, mixed-signal implementations offer an attractive approach to building neural networks, and many designs have been proposed for such systems (Goldberg et al., 2001; Gao and Hammerstrom, 2007; Mirhassani et al., 2007; Vogelstein et al., 2007; Harkin et al., 2008, 2009; Schemmel et al., 2008; Saighi et al., 2010; Yu and Cauwenberghs, 2010; Zaveri and Hammerstrom, 2011; Minkovich et al., 2012).

These proposed systems tend to employ programmable devices such as FPGAs and ASICs to route the spikes between analog computation modules. Some programmable platforms use floating gates (Basu et al., 2010; Brink et al., 2013). Furthermore, most of these systems use DACs to configure the analog modules to emulate different biological behaviors. Implementations of spiking neural networks with time-multiplexed analog circuits are described in Mirhassani et al. (2007), Yu and Cauwenberghs (2010), and Minkovich et al. (2012), and a version that uses nanotechnology is described in Gao and Hammerstrom (2007) and Zaveri and Hammerstrom (2011).

Here, we report on a mixed-signal platform which combines both our analog and digital implementations, and we provide test results. Section Proposed Polychronous Network gives an overview of the proposed polychronous neural network. Section Design Choice presents the design choices that have been made for the neuromorphic implementation of the proposed polychronous network. The analog building blocks of the polychronous network (i.e., the neurons, axons, and other analog components) are detailed in section Analog Implementation. Section Mixed-signal Implementation presents the proposed mixed-signal implementation, which includes the multiplexed analog neuron array and the interface between the asynchronous communication of the analog array and the (synchronous) FPGA. Measured results and a comparison to the fully digital implementation are given in section Results. In section Discussion we discuss the performance of the different implementations and the key elements that influence the capacity and scaling of electronic realizations of polychronous networks, and we conclude in section Conclusions.

FIGURE 1 | Example of a spatio-temporal pattern. The neurons fire asynchronously while their spikes arrive at the destination neurons synchronously, after traveling along axons with appropriate delays. This time-locked relation is the key feature of the spatio-temporal patterns.

MATERIALS AND METHODS
PROPOSED POLYCHRONOUS NETWORK
Training and recalling patterns
Two procedures are needed to use our proposed polychronous network to memorize and recall spatio-temporal patterns. The first is a training procedure in which the connection delay values of the axon paths between neurons are configured in order to meet the required timing relations of a given pattern. The second is a recall procedure, needed to retrieve a pattern that has been stored in the neural network through training. A pattern can be recalled by presenting the first few spikes of the pattern to the network, after which the network will complete the pattern if it is recognized. For example, to recall the example pattern shown above, neuron 1 needs to fire at time 0 and neuron 5 needs to fire at time T1. Together they will cause neuron 3 to fire and the remainder of the pattern will be induced by the network. The network is also capable of recalling parts of patterns that start somewhere in the middle, e.g., neuron 2 firing at time T1+T2+T3 and neuron 4 firing at time T1+T2+T3+T4 will retrieve the remainder of the example pattern.

The goal of the training procedure is to assign appropriate connection delays to axons in the polychronous neural network so that it is able to recall a specific pattern. We propose two mechanisms, delay programming and delay adaptation, to implement this function. Delay programming relies on a connection storing the delay value between a spike from its input neuron and a spike from its output neuron when both are induced to fire by some external training signal. It is not a biologically plausible method, but it is efficient in training and reduces testing time in scenarios where the result will not be affected by the training method. We therefore commonly use it to initialize a network.

Inspired by STDP, we developed a delay adaptation method, Spike Timing Dependent Delay Plasticity (STDDP), to fine-tune the delays during the training phase. We decrease the delay value of one axon by a small amount if the destination neuron fires (generating the post-synaptic spike) before the pre-synaptic spike arrives (at the synapse of the destination neuron), and we increase the delay in the opposite case. This procedure is repeated until the pre-synaptic spike arrives at the synapse simultaneously with the post-synaptic spike being generated. In the training phase, delay adaptation causes the connections to attain the desired delays through repeated presentation of the desired spatio-temporal patterns. The delay programming method can be regarded as a special case of the delay adaptation method in which the delay adaptation is completed in just a single step and the delay is never altered subsequently. With the delay adaptation method, every time a pattern is recalled the delay values in the pattern will be updated, allowing the learned delays to be modified over time. Hardware implementations of non-polychronous networks that also adapt axonal delays can be found in (Hussain et al., 2012, in press).


Neural network structure
The structure of the proposed neural network is shown in Figure 2. It contains two functional parts: a neuron array and an axon array. The neurons and the axons communicate with each other via Address-Event Representation (AER) buses (Boahen, 2000). Each neuron in the neuron array is identical in structure and has a unique AER address. The axon modules in the axon array are also identical in structure, and have both a unique physical address (their position in the array) and configurable input and output addresses, to place an axon between two neurons. The axon modules generate pre-synaptic spikes, which are received by the neurons. The neurons will then generate post-synaptic spikes if more than a certain number of pre-synaptic spikes arrive simultaneously. To decrease the likelihood of cross-talk between patterns, i.e., that a coincidence detecting neuron would be set off by a random coincidence, we used coincidence detectors with four inputs and a threshold of three spikes (Wang et al., 2013b).

FIGURE 2 | Structure of the proposed polychronous neural network. The neuron array generates post-synaptic spikes and then sends them to the axon array, which propagates these post-synaptic spikes, with programmable axonal delays, and generates the pre-synaptic spikes at the end of the axons. These pre-synaptic spikes are sent to the neuron array to cause the neurons to fire. The connectivity and delay of all the axons in the axon array are configurable.

The post-synaptic spikes are sent to the axon modules in the axon array. The axon array propagates these post-synaptic spikes with axonal-specific delay values and generates pre-synaptic spikes at the end of the axons. In the proposed neural network, the communication between any two neurons must be conducted via the axon modules in order to implement the polychronous network. This axon array, with reconfigurable input and output addresses, is capable of achieving much higher resource utilization than the method we have used previously (Wang et al., 2011), which generated spatio-temporal patterns based on fixed connectivity between neurons. That approach always resulted in networks where some axons remained unused. Our current approach is to generate delay paths de novo, so that only connections that actually appear in the training patterns will be created, by configuring the appropriate input and output addresses for each axon. Additionally, we configured the system such that there can be any number of axonal delay paths between any two neurons in the network. In other words, several axons can have identical input and output addresses, placing them between the same two neurons. They would still be able to have different delay values, so that a spike originating from the input neuron would arrive at the output neuron multiple times after different delays, emulating the case where a neuron makes multiple synapses with another neuron.

The axon module (see Figure 3) has five address registers, one ramp generator, and four identical axonal delay paths. The address registers are used to store the input address and the four output addresses for the axonal delay paths. To place one axon module between neurons, we need to configure its address registers. At the beginning of the training, axon module[0] (see Figure 2) is enabled and all the other axon modules are disabled. When the first post-synaptic spike in a training pattern arrives, axon module[0] will latch the address of this spike as its input address and enable axon module[1]. The output addresses will be configured after the input address is configured. As there are four output addresses, one for each of the destination neurons, it will take four iterations for one axon module to finish the configuration of its output addresses (using the addresses of the next four sequential post-synaptic spikes in the training pattern after its input address is configured).

FIGURE 3 | Structure of the axon module. The axon module receives post-synaptic spikes generated by the neuron in the neuron array via the AER post-synaptic bus. The axon module propagates these spikes with axonal-specific programmable delays and generates pre-synaptic spikes at the end of the axons. The address registers are used to store the input address and the four output addresses for the axonal delay paths.
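The configuration sequence just described can be sketched behaviorally in a few lines of Python: the first training spike latches the input address and starts the ramp generator, the next four spikes latch the output addresses and their delays, and afterwards the module replays those delays whenever its input address appears on the bus. Class and variable names below are ours and do not correspond to signals in the actual FPGA or chip design.

    class AxonModuleSketch:
        """Behavioral model of one axon module (illustration only)."""

        def __init__(self):
            self.input_addr = None
            self.ramp_start = None
            self.outputs = []                      # up to four (address, delay) pairs

        def train(self, addr, t):
            """Feed one post-synaptic training spike; returns True once fully configured."""
            if self.input_addr is None:
                self.input_addr = addr             # first spike: latch the input address
                self.ramp_start = t                # and start the ramp generator
            elif len(self.outputs) < 4:
                # next four spikes: latch an output address and record the delay
                self.outputs.append((addr, t - self.ramp_start))
            return len(self.outputs) == 4

        def recall(self, addr, t):
            """Return the pre-synaptic spikes (address, time) emitted for this input spike."""
            if addr != self.input_addr:
                return []
            return [(out, t + delay) for out, delay in self.outputs]

Training such a module on five consecutive spikes of a pattern and then calling recall() with the first of them reproduces the four stored inter-spike intervals.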


Delay programming is carried out in the same way as the address configuration. When the first post-synaptic spike arrives at axon module[0], it will start a ramp generator, which will send its value (ramp_out) to the four axonal delay paths. The delay of each axonal delay path is programmed when the output addresses are being configured (i.e., when the next four sequential post-synaptic spikes from the training pattern arrive). After delay programming, when a post-synaptic spike arrives and its address matches the input address of one axon module, it will start the ramp generator again. The axonal delay path will compare the output of the ramp generator with the programmed delay. A pre-synaptic spike will be generated, with the address stored in the output address register, when the output of the ramp generator exceeds the programmed delay. The delays can also be configured using delay adaptation rather than delay programming. In this case the axonal delay is increased or decreased based on the delay between the pre-synaptic spike and the post-synaptic spike, using one of three strategies: exact correction of the delay error in one step, correction of the error by a fixed amount each time, or correction by an amount proportional to the error. We have implemented all three strategies in the digital axon module. The first method is identical to just using the delay programming method. The second method, which uses a small fixed step, is very slow and produces similar results to the third method with a coefficient of 0.5. The digital axon presented here uses the third strategy. Slightly differently, the delay of the analog axon is programmed in an initial phase followed by a number of iterations of delay adaptation with a fixed update step, which was the simplest method to implement. An analog implementation that implements all three strategies would be too large for practical implementation on silicon.
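As a summary of the three update rules, the sketch below (Python; function and argument names are ours) adjusts a single axonal delay from the timing error between the arrival of the pre-synaptic spike and the firing of the post-synaptic neuron. It is an illustration of the rules as described in the text, not the HDL or circuit implementation.

    def stddp_update(delay, t_pre_arrival, t_post_fire, strategy="proportional",
                     step=0.1, coeff=0.5):
        """Return the adapted delay for one axonal delay path.

        A positive error means the pre-synaptic spike arrived after the
        destination neuron fired, so the delay is decreased; a negative error
        means it arrived too early, so the delay is increased.
        """
        error = t_pre_arrival - t_post_fire
        if error == 0:
            return delay                      # already time-locked
        if strategy == "exact":
            return delay - error              # one-step correction (= delay programming)
        if strategy == "fixed":
            return delay - step if error > 0 else delay + step
        return delay - coeff * error          # proportional rule used by the digital axon

With coeff = 0.5 the proportional rule corresponds to the half-error correction used in the delay-adaptation experiments reported in section Results.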
DESIGN CHOICE
Topology
Figure 4 shows the topology of the proposed mixed-signal platform. It consists of one FPGA and one analog chip containing an analog neuron array and an analog axon array. The FPGA contains the digital axon array, the digital neuron array, a pattern generator and checker module for training and testing, and a router. The function of the router is to remap the addresses of the spikes between the digital implementation and the analog implementation; but in practice the router also needs to synchronize the spikes from the analog circuits before it can remap the addresses for these spikes. This is due to the analog circuits operating asynchronously and therefore without a clock, whereas the router is a fully digital design, which does require a clock. The spikes from the analog circuit therefore have to be synchronized to the clock domain in which the router works. We will present the design of an interface circuit for synchronization, followed by a circuit to implement the address remapping, in section Synchronization Interface Circuit.

FIGURE 4 | Topology of the mixed-signal platform. The FPGA contains the digital axon and neuron array, a router to control the destinations of spikes on the bus, and a pattern generator and checker for testing purposes. A separate IC contains the analog implementations of the axon and neuron arrays.

The system contains two types of implementations for the axon array and two for the neuron array, resulting in four potential combinations, which are presented below:

1. Digital axon array and digital neuron array: This is simply the default FPGA implementation.
2. Digital axon array and analog neuron array: In this configuration, the router is required to re-map the addresses of the spikes transmitted between the analog neuron array and the digital axon array.
3. Analog axon array and digital neuron array: In this configuration, the router is also required to re-map the addresses of the spikes transmitted between the digital neuron array and the analog axon array.
4. Analog axon array and analog neuron array: Despite having only analog implementations, the router is still required to transmit spikes between the analog axon array and the analog neuron array, as the addresses still require remapping. This is done to multiplex the analog neurons, so that inactive neurons in the network are not using hardware resources. This increases the size of the analog neuron array significantly. We will present the details of this approach in section Mixed-signal Implementation.

The neurons in the neuron array work as coincidence detectors that detect how many pre-synaptic spikes have arrived simultaneously. The FPGA implementation of these neurons uses four timers and one comparator (see Wang et al., 2013b). The analog version of these neurons is implemented using simple Leaky Integrate and Fire (LIF) neurons, which will be described in detail in section Analog Neuron Array. Since no complicated biological behaviors, such as spike rate adaptation or bursting, are required for the neurons in a polychronous network, we chose to implement LIF neurons, instead of more complex neuron models, e.g., the Izhikevich neuron model (Izhikevich, 2003) and the Mihalas-Niebur neuron model (Mihalas and Niebur, 2009), to keep the size of the neuron circuit to a minimum.


For the axon module, the FPGA implementation uses a counter to implement the ramp generator, and registers to store the delay values. In the analog implementation, the ramp generator is implemented with a circuit that starts charging a MOS capacitor after receiving a spike on the AER bus. The axonal delay is generated by comparing a programmable voltage, stored on a capacitor, with the output signal of the ramp generator. The design and implementation of the ramp generator and the delay path can be found in Wang et al. (2013a).

AER bus
There are two different AER buses in the proposed neural network: the AER post-synaptic bus and the AER pre-synaptic bus. The first is used to transmit post-synaptic spikes generated by the neurons to the axon modules. The second is used to transmit pre-synaptic spikes generated by the axon modules to the neurons (see Figure 3). The AER bus and protocol used in this system differ slightly from the standard AER bus and protocol (Boahen, 2000). We do not use handshaking, so we have omitted the request and acknowledge signals. Instead we use active lines to tell the receiver (neurons or axon modules) that a spike has been placed on the bus. Each neuron receives input from four neurons via four axons in our network. The pre-synaptic bus therefore uses four active lines, one for each synapse of the neuron. A further difference in our AER implementation is that there is no arbiter to deal with collisions when two addresses are placed on the bus simultaneously. We will address this issue in detail in section Discussion.

In our digital implementation, a single minimum-width binary address is used to reduce hardware costs, as the wiring for the bus will entail more resources than the implementation of the encoders/decoders in large scale FPGA designs (Harkin et al., 2008). This structure, however, does not suit our analog implementation, in which a full encoder/decoder would cost more area than the analog neuron itself in a 0.6 μm technology (typically each bit needs one XOR gate with 16 transistors in a full decoder). The AER buses in the analog neuron array use active lines and a 3/8-bit (three out of eight) address for which the encoding and decoding can be efficiently implemented in aVLSI, as will be shown in section Analog Neuron Array. The number of different addresses, C, for this code is given by the binomial coefficient:

    C_M^N = \frac{M!}{N!(M-N)!}        (1)

where M is the width of the bus and N is the number of bits that are HIGH in each address. In our implementation, M and N are set to 8 and 3, respectively, so that 56 addresses exist, which suffices for the size of our implementation. Both pre- and post-synaptic buses use this 3/8 bit code. The post-synaptic bus uses one active line in addition to the address to indicate an address has been placed on the bus, while the pre-synaptic bus uses four active lines, one for each of the four synapses an axon can target.

The addresses of the AER buses in the analog axon array are encoded in a format of 4 out of 9 high bits, yielding 126 addresses, one for each neuron. Increasing the bus width would allow more neurons at the cost of additional area for the bus and the decoder. The choice of 4/9 for this bus is a trade-off between performance and the cost of silicon.
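A short Python check of the address arithmetic above (the function names are ours): Equation (1) gives 56 valid 3-of-8 codes and 126 valid 4-of-9 codes, and the validity of a sampled address reduces to counting its high bits.

    from itertools import combinations
    from math import comb

    assert comb(8, 3) == 56      # Equation (1) with M = 8, N = 3 (neuron array buses)
    assert comb(9, 4) == 126     # 4-of-9 code of the analog axon array

    def is_valid_3of8(addr):
        """True if exactly three of the eight address lines are high."""
        return bin(addr & 0xFF).count("1") == 3

    # One possible enumeration of the 56 codes; how codes are assigned to
    # individual neurons is an implementation choice and is not specified here.
    codebook = [sum(1 << bit for bit in bits) for bits in combinations(range(8), 3)]
    assert len(codebook) == 56 and all(is_valid_3of8(c) for c in codebook)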
ANALOG IMPLEMENTATION
Analog neuron array
The proposed LIF neuron comprises four identical charge-and-discharge synapses, one for each active line on the pre-synaptic bus. The structure of the synapse was first proposed by Arthur and Boahen (2004). Figure 5A shows the schematic of the charge-and-discharge synapse, which will generate a post-synaptic current for every incoming pre-synaptic spike. This synapse comprises a reset circuit (N1-N4), a MOS capacitor (Csyn, 100 fF), a voltage-to-current conversion circuit (P1-P2) and a DC current source (Iexp, set to 12 pA).

The 3/8 high bits of the pre-synaptic address are connected to N1-N3. On arrival of a pre-synaptic spike with these three bits HIGH and the appropriate active line high, N1-N4 will conduct and pull Vsyn down to ground. After that, Vsyn will be pulled up to Vdd by Iexp. The voltage-to-current conversion circuit will transduce Vsyn into Isyn, the post-synaptic current, which will decay exponentially, due to the linearly increasing Vsyn. To reduce power consumption, P1, a diode-connected pMOS transistor, is added to limit the gate-source voltage of P2. Isyn will be injected into the soma for integration. All four synapses of a LIF neuron are identical, using the same 3/8 bit address, but are connected to different active lines.

FIGURE 5 | Circuit diagram of the analog synapse (A) and soma (B).


Figure 5B shows the schematic of the soma. The post-synaptic currents from the four synapses are sent to a current mirror (N1-N2) for summing. The current mirror will convey Isyn, the sum of the post-synaptic currents, to IP1, which is the input current of a first-order low-pass filter. Furthermore, by changing the width/length ratio of N1 or N2, the input current to the low-pass filter can be easily scaled or amplified.

The low-pass filter, which was first proposed in Python and Enz (2001), is the basic building block of the soma. In our previous work (Wang et al., 2011), we have shown that its output current Iout has the following equation:

    \tau_{mem} \frac{dI_{out}}{dt} + I_{out} = I_{P1}        (2)

where the time constant of the implementation is given by:

    \tau_{mem} = \frac{n U_T C_{mem}}{I_t}        (3)

where UT is the thermal voltage, n is the weak inversion slope factor, and It is a DC current source (set to 1 nA). More details can be found in Wang et al. (2011).
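As a rough numerical illustration of Equation (3), the following Python lines plug in representative values. Only Cmem = 0.6 pF and It = 1 nA are quoted in the text; the thermal voltage and slope factor below are typical room-temperature values assumed by us.

    n, U_T = 1.5, 25.9e-3        # assumed slope factor and thermal voltage (V)
    C_mem, I_t = 0.6e-12, 1e-9   # capacitance (F) and bias current (A) quoted in the text
    tau_mem = n * U_T * C_mem / I_t
    print(tau_mem)               # about 2.3e-05 s for these assumed values,
                                 # i.e. a membrane time constant of tens of microseconds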
To generate the post-synaptic spike, the output current of this low-pass filter, Iout, is compared with a constant current Ithres introduced by N7. The value of Ithres is set by Vthres to a value such that three pre-synaptic spikes arriving within 1 ms will make Iout strong enough to pull Vcmp up to Vdd. When Vcmp exceeds the threshold of N8, N8 will conduct and pull Vpulse down to ground. Vpulse is sent to an inverter to generate the post-synaptic spike, which is HIGH when Vpulse is lower than the threshold of the inverter.

The refractory period is implemented by a circuit composed of N9, P7, a MOS capacitor (Crf, 100 fF) and a DC current source (Irf, set to 12 pA). When the post-synaptic spike is HIGH, N9 will conduct and pull Vrf down to ground. After that, Vrf will be pulled up to Vdd by Irf. P7 will conduct and pull Vmem up to Vdd when Vrf is lower than the threshold of P7. The time when Vmem is at Vdd is the refractory period, during which the low-pass filter will not do any integration. Since this refractory time is active when Vrf is lower than the threshold of P7, the refractory time is thus controlled by the size of Crf, the capacitor, and Irf, the charging current.

When Vmem is pulled up to Vdd and Iout is reset to 0, Vcmp will be pulled down to ground by Ithres. N8 will stop conducting when Vcmp is low and Vpulse will then be pulled up to Vdd by a constant current Ipw. The post-synaptic spike, which is the inverted signal of Vpulse, will then be reset. A feedback circuit (P8) will pull Vpulse up to Vdd quickly once Vpulse exceeds the threshold voltage of the inverter, to reduce power consumption. The pulse width of the post-synaptic spike, which is the time when Vpulse is lower than the threshold of the inverter, is controlled by Ipw, which is used to pull Vpulse up.

An address encoder (N10-N13, using four minimum-sized nMOS transistors to drive the active line and 3/8-bit address of the AER post-synaptic bus) will convert the voltage-mode post-synaptic spike into a current-mode spike. The current-mode spike will be sent to the AER post-synaptic bus. As the AER post-synaptic bus needs to be driven in parallel by all the analog LIF neurons, an implementation with voltage-mode spikes would need a high fan-in OR gate or an arbiter, which would take up a significant amount of area in the layout. Furthermore, using voltage-mode spikes for on-chip routing would take up significant area as each spike needs one wire, whereas the current-mode spikes can share one bus, e.g., one wire can be shared by the active lines from all 50 neurons.

As a trade-off between fabrication cost and the size of the neuron array, we chose to implement 50 analog LIF neurons in the analog neuron array, which led to the choice of the 3/8-bit address format. The layout of the analog LIF neuron is as compact as possible and all signals are routed across the neuron. In this way, the placement of the neurons in an array is quite straightforward; the neurons are placed in one row.

All transistors are 2.4 μm wide and 3.6 μm long (except that P8, N3, N4, and N8 are 0.6 μm long, N1 is 4.5 μm wide, and P7 is 4.8 μm wide and 0.6 μm long). The inverter I1 uses transistors that are 2.4 μm wide and 0.6 μm long. The MOS capacitor values are: Cmem = 15 × 24 μm (0.6 pF) and Crfc = 3.6 × 2.4 μm (0.02 pF). In the layout of the neuron array, for each neuron, we just need to connect the three transistors that form the address decoder (N1-N3) in the current synapse (see Figure 5A) to three bits in the address of the AER pre-synaptic bus, according to the unique 3/8-bit address of that neuron. An active line on the AER pre-synaptic bus is connected to N4 of a current synapse. Each of the four current synapses will have its own active line on the AER pre-synaptic bus. Similarly, for each neuron, we just need to connect the four transistors that compose the address encoder (N10-N13) in Figure 5B to the active line and to the three high bits in the address on the current-mode AER post-synaptic bus, according to the unique 3/8-bit address of that neuron. In this way, the layout of the neuron array remains compact as no extra routing of the AER buses is needed.

Analog axon array
The structure of the analog axon module is shown in Figure 3. It comprises three parts: a ramp generator, four axonal delay paths, and an AER interface circuit. The AER interface circuit carries out the function of the address configuration, the address decoding, and the address encoding. The ramp generator will start when receiving a spike on the AER bus. The details of the design and implementation of the ramp generator and the delay path can be found in Wang et al. (2013a).

The analog axon array contains 100 identical analog axon modules connected serially. Due to the size of the axon module, we cannot place these 100 axon modules physically in one row (it would be 20 mm long), but instead the array is folded to create a 10 × 10 2-D array, as shown in Figure 6. As in the layout of the neuron module, all the AER buses, control signals, and bias currents are routed horizontally across the axon module so that neighboring modules in a row are simply connected by placing them next to each other. The horizontal buses in each row are connected to two vertical buses placed on both sides of the axon array for interconnection. As for the neuron array, the spikes generated by the axon modules are all current-mode spikes within the chip and they are converted to voltage-mode spikes for off-chip transmission.

FIGURE 6 | Layout of the axon array. Arrows show how the axon modules are placed in a 1-D array.


MIXED-SIGNAL IMPLEMENTATION
Multiplexed analog neuron array
The motivation for developing a multiplexed analog neuron array is to increase the size of the analog neuron array without increasing the cost of the system significantly. A polychronous neural network composed of a neuron array with 50 neurons will suffer from severe cross-talk between patterns, which occurs when a neuron belonging to one pattern fires accidentally as a result of pre-synaptic spikes from other patterns or another part of the same pattern. The effect of cross-talk depends on the overlap (correlation) of the patterns and can be regarded as noise. The more overlap there is, the higher the possibility that a pattern plus some noise spikes will also set off a different pattern. Also, the more input connections a neuron has, i.e., the more patterns this neuron is a member of, the more likely this neuron is to get three simultaneous inputs as a result of noise. In severe cases of cross-talk, all neurons in the network will fire continuously in an uncontrolled manner. To mitigate this problem, we need to increase the sparsity of the neural network, i.e., decrease the number of patterns to which each neuron is sensitive. This can be achieved by increasing the size of the neuron array, as the patterns generated by the pattern generator are evenly distributed over the whole network. The conventional approach to increase the size of the analog neuron array is to simply add more physical neurons. As expected, hardware costs increase linearly in relation to the size of the neuron array if all the neurons are to be implemented physically.

Inspired by the multiplexed neuron array used in the digital implementation (Wang et al., 2013b), we propose a similar approach to implement a multiplexed analog neuron array. We can use the fact that in a typical polychronous network, only a small percentage (less than 5%) of the neurons are active at any given time, and only those active neurons need to be physically implemented.

The structure of the multiplexed analog neuron array is shown in Figure 7. It consists of two sub-blocks: a physical neuron array and a controller. They communicate with each other via two internal AER buses: the AER physical pre-synaptic bus and the AER physical post-synaptic bus. The controller receives pre-synaptic spikes from the axon array and assigns them to the physical neurons for the generation of post-synaptic spikes, which will be sent to the axon array. From the point of view of the axon array, the multiplexed neuron array appears as a neuron array with 4k neurons. The addresses of the spikes between the controller (a single minimum-width binary address) and the analog neuron array (the 3/8-bit address format) need to be remapped by the router, which will also synchronize the spikes from the analog circuits. For simplicity, in the following description, we assume the controller is connected to the analog neuron array without synchronization and address remapping.

FIGURE 7 | Structure of the multiplexed analog neuron array. The controller and router map virtual addresses from the AER busses to physical addresses on the analog neuron array, so that only active neurons in the network are using hardware resources.

The controller dynamically assigns analog neurons to each incoming pre-synaptic spike. The analog neurons are used to detect how many pre-synaptic spikes have arrived within 1 ms of each other. When a spike arrives from the axon array and an analog neuron has already been assigned for that spike's address, the spike will be sent to that neuron. The address of this incoming spike will have been latched in a register linked to that analog neuron. If no neuron has been assigned for the arriving address, the spike will be sent to an unassigned neuron, which will then be labeled as assigned by the controller, by latching the address of the spike. The controller will also start a timer linked to that analog neuron. Once the timer of that neuron has expired (after 1 ms), the neuron will be freed and labeled as unassigned by the controller. When a post-synaptic spike is generated by an analog neuron, the controller will send it to the axon array with the address that is stored in its register. More details about the controller can be found in Wang et al. (2013b).


Based on this structure, a neuron array with 4k virtual analog neurons can be achieved using only 50 physical neurons. This multiplexed analog neuron array is thus 80 times more efficient in silicon area on the analog side. It does, however, require a controller implemented on an FPGA. This does not increase the cost of the system significantly as the FPGA is needed anyway to carry out other tasks such as pattern generation, address remapping and other miscellaneous tasks. Furthermore, this mixed-signal implementation offers a much higher degree of extensibility as the LIF neurons used in this implementation could easily be replaced with other neuron models if desired.
To use the asynchronous analog circuits with the FPGA, synchro- parallel bus is to use a handshake protocol to guarantee that the
nization with its clock domain is needed. In digital circuit design, receiver will only sample the data when the data is stable (Weste
a general method used to do this is to use two (or more) serially and Harris, 2005). In other words, the sender needs to inform the
connected flip-flops to sample the input (Weste and Harris, 2005). receiver when to sample the data. The drawback of this method
This scheme works well for 1-bit signals but it does not extend to is that it requires extra logic circuits on both the sender and the
catering for parallel signals, such as the address bus and data bus, receiver. In cases where there is more than one sender on the bus
due to potential timing skew on these buses that could cause each trying to send data, some form of arbitration is required, fur-
bit in the bus to arrive at a slightly different time. This can lead to ther increasing the circuit complexity and the cost of hardware
race conditions and hazard problems, and can ultimately lead to resources.
the wrong address being sampled (Weste and Harris, 2005). Instead of the above methods, we chose to synchronize the
In our design, this timing skew comes from two sources. The spikes from the analog implementations by using an interface
first is the analog circuit that converts the current-mode spikes circuit to carry out the synchronization in three steps without
to voltage-mode spikes. Due to process variation and parasitic requiring a handshake protocol. For illustration, we will use the
capacitors between the wires and transistors, the conversion for AER bus of the analog neuron array in the following explanation.
each line of the bus will take a slightly different amount of time. The interface circuit handles the AER bus of the analog axon array
For the very same reasons, the pulse width of each active line and in the same way.
each bit in the address will also be slightly different. The second The first step is to synchronize each active line and each bit of
source of timing skew is caused by the propagation delay of the the address of the incoming spike in the conventional manner by
signals along the tracks of the Printed Circuit Board on their way using a circuit composed of a serial connected flip-flop for each
to the FPGA. of them (four in total). The output values of the flip-flops for the
Figure 8 illustrates a waveform of a post-synaptic spike from address and active lines are referred to as the synchronized address
an analog LIF neuron (the waveform from the analog axon is quite and the synchronized active line, respectively. The address of the
similar). In the figure, the timing skew can clearly be seen as each post-synaptic spike is encoded in the 3/8-bit format, which means
bit in the bus arrives at a slightly different time. Besides the timing that any address that does not have exactly three out of eight bits
skew, there is also an additional problem in the form of glitches, active is invalid.
which are brief digital pulses, up to tens of nanoseconds long. The second step is then to latch the synchronized address and
active line only when a valid address is present, i.e., when exactly
three bits are HIGH, and store it in a register. We have imple-
mented this register as a 329 bit FIFO, using eight bits for the
address and one bit for the active line. We use a counter to deter-
mine how many bits are HIGH in the synchronized address and
we can distinguish two situations that need an action when a valid
address is detected:

1. The arrival of a spike with a valid address when the address


at the previous clock cycle was invalid. In this condition, the
value of the counter in current clock cycle is three, whilst the
value of the counter at previous clock cycle was not equal to
three. The address of the spike is latched in the FIFO
2. The arrival of a spike with a valid address that is different from
FIGURE 8 | Waveform of a spike from an analog neuron on the
post-synaptic AER bus showing timing skew and glitches.
a valid address at the previous clock cycle. In this case, the
value of the counter in the current clock cycle and previous


In all other cases, including when a valid address is detected that is the same as in the previous clock cycle, the data on the bus is ignored. In this way, the asynchronous spikes from the analog neuron array are synchronized and stored in the FIFO. The third step is to generate spikes with a fixed pulse width (four clock cycles) by reading the FIFO. If the FIFO is empty, all the synchronized pre-synaptic spikes have been read out and no new spikes will be generated.

The interface circuit for the spikes from the analog axon array operates in the same way with the exception that a third condition needs to be handled:

3. The arrival of a spike with a valid address that is the same as the last one that arrived, but on a different synapse. In this case, the values of the counter in the current clock cycle and the previous clock cycle are both four (4/9-bit format) and the value of the synchronized address in both cycles is the same, but the value of the synchronized active lines is different. The new address and active line are stored in a 32 × 13 bit FIFO (nine bits for the address and four bits for the active lines).

The interface circuit effectively eliminates the problems of timing skew and glitches on the bus. It is also capable of sampling the asynchronous spikes from the analog circuits with a high temporal accuracy, as shown by the results that will be presented in section Performance of the Interface Circuit. For spikes that need to be sent to the analog chip, we use the conventional means of synchronizing them to the system clock by using flip-flops on the FPGA to minimize the timing skew on the address lines (Weste and Harris, 2005).
Address remapping
Address remapping is the second function of the router. The controller can be configured for multiplexed analog neuron arrays or multiplexed digital neuron arrays. When it is configured for a multiplexed analog neuron array, the router needs to carry out the remapping for the addresses of spikes traveling between the controller and the analog neuron array. To use the analog axon array, the router needs to carry out the address remapping for the spikes traveling between the analog axon array and the controller, regardless of whether it is configured for a multiplexed analog or digital neuron array.

The router was implemented using four look-up tables, one for each of the four address remapping possibilities. For spikes from the analog axon/neuron array, the router synchronizes them using the interface circuit first. These synchronized spikes are then compared to the look-up tables in order to convert their addresses to the corresponding binary-encoded addresses. These spikes are then sent to the controller for processing. Spikes generated by the controller are also compared against the look-up tables to convert their addresses to either 3/8-bit or 4/9-bit addresses. After being converted, these spikes are sent to the analog axon/neuron array.

RESULTS
The proposed polychronous neural network is designed to train and recall patterns rather than to randomly react to some spatio-temporal patterns (groups) that have emerged in the network, as is the case in Izhikevich (2006). Performance in our network is therefore measured as the rate of success in recalling the trained patterns. The advantage of our approach is that the network can be used as a memory that can learn spatio-temporal patterns. Furthermore, this approach optimizes the use of the available hardware, so that in our approach all available neurons and axons in the hardware arrays can be used, while in the original polychronous network some neurons and many connections are not part of any pattern and thus never used. The disadvantage of our approach is that overlap between patterns (cross-talk) has to be limited and it is not possible to store near identical patterns.

There are four possible combinations of analog or digital axons and neurons. The fully digital (FPGA) combination implements the proposed neural network faithfully with hardly any effect of noise and process variations. The measurements from this combination therefore present the optimal performance of our polychronous neural network model. The results of the other three combinations will be compared with the results of the fully digital implementation in the sections Digital Axon Array and Analog Neuron Array to Analog Axon Array and Analog Neuron Array. Section Performance of the Interface Circuit first discusses the performance of the interface circuit described in section Synchronization Interface Circuit.

PERFORMANCE OF THE INTERFACE CIRCUIT
Testing the interface circuit is the first step in testing the whole system. To obtain a direct measurement of the ability of the interface circuit to synchronize and latch addresses correctly, we use the FPGA to send a pre-synaptic spike to an analog neuron to induce it to fire. The interface circuit is then used to synchronize and latch the spike from the analog neuron with the FPGA's clock. We then compare the address of this latched post-synaptic spike with the expected address, as determined by which neuron the FPGA induced to fire. If their addresses match, this means the interface circuit works correctly.

Sometimes the interface circuit samples the same address twice. This is caused by glitches that can cause a valid address to become briefly invalid, when more than three address lines are high, before returning to the valid address as the glitches subside. This double sampling could be solved by adding an internal timer to the interface circuit to guarantee that an address could only be sampled once within a short period (say 1 μs). However, we have not employed this method as the second spike sampled will only cause a small offset (<1 μs) in the axonal delay, which starts on the arrival of a post-synaptic spike. This offset will not affect the performance of the proposed polychronous neural network at all.


Figure 9 shows the results of the tests. All 50 addresses (one for each analog neuron) were tested 128 times (with an interval time of 5 ms to guarantee there will be one post-synaptic spike each time). This test was then repeated 10 times. In each of the 10 runs, for approximately 75% of the time the correct address was sampled once, while for the remainder of the cases, the correct address was sampled twice in succession. No wrong addresses were sampled in these tests.

FIGURE 9 | Performance of the interface circuit. Dark gray: valid address sampled once; Light gray: valid address sampled twice in succession.

DIGITAL AXON ARRAY AND ANALOG NEURON ARRAY
Delay programming
In the setup for the delay programming tests, a single axon array was used in the neural network, yielding 4k axon modules with 16k (16384) axonal delay paths (connections). Note that unlike in Izhikevich (2006), no connections are shared between two patterns, so that the number of available connections directly determines the maximum number of inter-spike intervals that can be programmed into our network. Each axon module contains four axonal delay paths (see Figure 3), and for each spike in the polychronous pattern, 4 delay paths are needed from the four previous spikes in the pattern. Thus, the number of inter-spike intervals that our neural network can store is simply equal to the number of axon modules. If, for instance, the patterns to be stored each contain 50 inter-spike intervals, the maximum number of such patterns that can be stored in the neural network is 82 (4k/51).

The patterns are trained only once when using delay programming. There is also only one recall test as there is no adaptation, and the result of a recall will be the same each time. For each configuration of the neural network, 10 test runs were conducted. The pattern generator & checker module generates spatio-temporal patterns for training and for testing whether the patterns can be recalled successfully. We tested neuron array sizes ranging from 128 to 4k neurons and the test results are shown in Figure 10A. For the configurations consisting of 128 and 256 neurons (not shown in Figure 10A) and trained with 82 patterns having 51 spikes each, the neural network enters an all-firing state in which all the neurons fire simultaneously, showing that a network of this size using analog neurons cannot cope with that number of patterns. In the digital implementation, this only happens for configurations consisting of 128 neurons, while a network with 256 neurons achieves an average success rate of about 80%. To achieve a similar success rate when using analog LIF neurons, the network needs at least 512 neurons. Furthermore, the results for a network with 1k and 2k analog neurons are also slightly worse than their digital counterparts. Only the result for 4k analog neurons matches the digital implementation. As an aside, this proves that the proposed interface circuit is capable of sampling the asynchronous spikes from the analog circuits correctly, because otherwise the performance would be much worse than in the digital implementation.

FIGURE 10 | Percentage of stored patterns successfully recalled for different neuron array sizes. (A) delay programming and (B) delay adaptation, respectively. The results for the fully digital implementation are added for comparison purposes. Error bars are standard errors of the mean.

The results indicate that the effects of cross-talk are more serious when using the multiplexed analog neuron array, so that a network with analog neurons performs worse than one with digital neurons when the size of the network is small. Due to process variation and device mismatch, the analog neurons cannot be perfectly tuned to all generate a post-synaptic spike only when at least 3 out of 4 pre-synaptic spikes arrive within 1 ms. In other words, the analog neuron is not as precise a coincidence detector as the digital neuron. Moreover, due to the parasitic capacitances on chip, the analog LIF neuron will sometimes generate spikes by accident, e.g., the firing of one neuron will trigger its neighboring neuron to fire, which increases cross-talk. Increasing the size of the network increases the sparsity (i.e., decreases the number of patterns to which a neuron belongs; Wang et al., 2013b), and the difference in the performance between the analog neurons and the digital neurons will become negligible for larger networks.

Delay adaptation
In the tests for the delay-adaptation mode, each pattern was trained five times and recalled one time. The strategy used adapted the delay by half the time difference between the pre- and post-synaptic spikes each time a neuron fired. The same settings used in the delay programming scenario were used for these tests, but all delays were initialized with random values. We again tested neuron array sizes from 128 to 4k neurons and the test results are shown in Figure 10B. For the networks with a size smaller than 2k neurons, only a few patterns can be recalled successfully and their results are therefore not included in Figure 10B. The results in Figure 10 also show that the performance drops more in delay adaptation mode than in the delay programming mode when compared with the digital implementation. This is again the result of the larger sensitivity to cross-talk in the analog neuron array.


adapted the delay by half the time difference between the pre- and for 1000 patterns and the successful recall rate is about 95% on
post-synaptic spikes each time a neuron fired. The same settings average which is quite close to the result of the fully digital imple-
used in the delay programming scenario were used for these tests, mentation (Wang et al., 2013b). With 1200 patterns the recall
but all delays were initialized with random values. We again tested no longer works as the effect of cross-talk becomes too severe,
neuron array sizes from 128 to 4k neurons and the test results indicating that once cross-talk reaches a critical level, it quickly
are shown in Figure 10B. For the networks with a size smaller becomes catastrophic. Two reasons caused this performance drop.
than 2k neurons, only a few patterns can be recalled success- The first reason is that the mixed-signal system suffers more noise
fully and their results are therefore not included in Figure 10B. compared to the fully digital implementation, the successful rate
The results in Figure 10 also show the performance drops more of which is 95% for 1200 patterns. The second reason is that the
in delay adaptation mode than in the delay programming mode theoretical maximum firing rate of the pre-synaptic spikes that
when compared with the digital implementation. This is again the the multiplexed analog neuron array can handle is only 50/128
result of the larger sensitivity to cross-talk in the analog neuron 40% of the maximum firing rate that the digital one can handle, as
array. the number of the physical neurons is only 50, whereas the digital
implementation has 128 physical neurons.
Effect of noise
In this set of tests, random noise was injected into the network. ANALOG AXON ARRAY AND DIGITAL NEURON ARRAY
The Poisson rate of the noise, generated by a LFSR, was varied Unlike the results presented in section Digital axon array and
from 2 to 128 spikes per second. This firing rate represents the Analog Neuron Array, the testing scenarios for the combination
number of additional spikes, i.e., not belonging to any of the of analog axon array and digital neuron array will focus on the
trained patterns, presented to the network in a one second win- percentage of spikes in a pattern that have been recalled success-
dow. As each spike is generated by a randomly chosen neuron, fully. This is because the capacity of the analog axon array is much
the spike rate measures the total noise input, not the firing rate of smaller than that of the digital axon array, which means that only
individual neurons. a few patterns can be stored in this network, so that the percentage
All other settings were kept the same as in the delay- of patterns recalled is a much less accurate measure of perfor-
programming mode and the delay-adaptation mode with a neu- mance. Furthermore, the dynamics caused by process variation
ron array consisting of 4k neurons. In both modes, no noise was and device mismatch causes variations in the number of spikes
added during the first training time. Figure 11 shows the result, that are correctly recalled in each pattern.
which proves that the system is fairly robust to noise when the For this test, we only had access to one analog axon array with
sparsity of the neural network is large. 100 analog axon modules, each with 4 axonal delay paths. The
FIGURE 11 | Recall percentage for various Poisson rates of the noise generator. The firing rate represents the total number of additional random spikes per second in the network. For comparison, the firing rate of a stored pattern is about 100 spikes per second (50 events in about 500 ms). Light gray: delay programming; dark gray: delay adaptation. Error bars are standard errors of the mean.

Capacity for storing spatio-temporal patterns
To test the capacity for storing spatio-temporal patterns when using the multiplexed analog neuron array, it was configured with 4k neurons and 80k axon modules. Delay programming and delay adaptation were both used with a pattern length of 51 spikes. For this pattern length, we tested storing and recalling 1000 and 1200 patterns. Ten test runs were conducted. The system works well for the 1000 pattern case. Figure 12 shows the results.

FIGURE 12 | Result for capacity testing with 1000 stored patterns of 51 spikes each. The network consists of 4k neurons and 80k axon modules. Both methods of delay configuration resulted in approximately 95% of the stored patterns being successfully recalled. Error bars are standard errors of the mean.

ANALOG AXON ARRAY AND DIGITAL NEURON ARRAY
Unlike the results presented in section Digital Axon Array and Analog Neuron Array, the testing scenarios for the combination of analog axon array and digital neuron array will focus on the percentage of spikes in a pattern that have been recalled successfully. This is because the capacity of the analog axon array is much smaller than that of the digital axon array, which means that only a few patterns can be stored in this network, so that the percentage of patterns recalled is a much less accurate measure of performance. Furthermore, the dynamics caused by process variation and device mismatch cause variations in the number of spikes that are correctly recalled in each pattern.
For this test, we only had access to one analog axon array with 100 analog axon modules, each with 4 axonal delay paths. The maximum accessible address of the 4/9-bit bus on the analog axon array is 126, which means the maximum size of the digital neuron array that can be used is 126 neurons. As the experimental results in Wang et al. (2013b) show, a neural network consisting of only 126 neurons will be affected seriously by cross-talk. To measure the performance of the analog axon array without the effect of this cross-talk, we used specially generated random patterns with no overlap (correlation) for testing.
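As an illustration of what "random patterns with no overlap" means in this test, the sketch below draws every pattern from a disjoint pool of neuron addresses, so that no neuron, and therefore no trained delay path, is shared between patterns. The pattern sizes and time base are assumptions chosen for the example, not the exact generator used here.

```python
import random

def make_disjoint_patterns(num_patterns, pattern_len, num_neurons, step_ms=10, seed=1):
    """Return spatio-temporal patterns that share no neurons.

    Each pattern is a list of (time_ms, neuron) pairs; neurons are drawn
    without replacement across all patterns, so the delay paths trained for
    one pattern cannot be excited by another (no overlap, no cross-talk).
    """
    rng = random.Random(seed)
    needed = num_patterns * pattern_len
    if needed > num_neurons:
        raise ValueError("not enough neurons for disjoint patterns")
    pool = rng.sample(range(num_neurons), needed)
    patterns = []
    for p in range(num_patterns):
        neurons = pool[p * pattern_len:(p + 1) * pattern_len]
        patterns.append([(i * step_ms, n) for i, n in enumerate(neurons)])
    return patterns

# e.g., two 50-spike patterns for a 126-neuron digital neuron array
for pattern in make_disjoint_patterns(2, 50, 126):
    print(pattern[:3], "...")
```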


Delay programming and delay adaptation were both used with


pattern lengths of 20, 25, 33, and 50 spikes. The patterns were
trained with a single presentation in the delay programming
mode and for 20 presentations in the delay adaptation mode.
As there are 400 axons in the analog axon array, for the pattern
length of 20, 25, 33, 50 spikes, the maximum number of such pat-
terns that can be stored in the neural network is five, four, three,
and two, respectively. For each pattern length, 127 test runs were
conducted.
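The pattern counts quoted above follow from a simple budget on the 400 available axonal delay paths. The sketch below reproduces the figures of five, four, three, and two patterns under the assumption that every spike in a pattern after the first consumes four delay paths (one from each of the neurons that trigger it); this bookkeeping is our reading of the architecture, not a formula stated explicitly in the text.

```python
def max_storable_patterns(total_axons, pattern_len, fan_in=4):
    """Rough capacity estimate for the 400-axon analog array.

    Assumes each spike after the first needs `fan_in` programmed delay paths.
    """
    axons_per_pattern = fan_in * (pattern_len - 1)
    return total_axons // axons_per_pattern

for length in (20, 25, 33, 50):
    print(length, "spikes ->", max_storable_patterns(400, length), "patterns")
# prints 5, 4, 3, and 2 patterns, respectively
```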
Figure 13 shows, for each pattern stored in the neural network,
what percentage of spikes were recalled correctly. As discussed
in section Analog Axon Array, the delay of the analog axon is
programmed in an initial phase followed by a number of itera-
tions of delay adaptation with a fixed delay update step. This is to
reduce the errors in delay that result from the initial delay pro-
gramming step. Figure 13 shows that after 20 iterations of delay
adaptation, the percentage of the spikes in the patterns that have
been correctly recalled has been slightly increased for the patterns
with 50 spikes. For the other pattern lengths, the improvement is
negligible. The average percentage of spikes in each pattern correctly recalled across the four pattern lengths (over 127 test runs)
using delay programming is 86.2% and using delay adaptation
is 87%.
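The delay-adaptation procedure referred to above can be pictured as a fixed-step correction: after each presentation, the stored delay moves toward the observed spike interval by one constant increment until the residual error is smaller than the step. The following sketch is only a behavioral caricature; the step size, units, and stopping rule are illustrative assumptions rather than the circuit's actual parameters.

```python
def adapt_delay(stored_delay_us, target_delay_us, step_us=50, iterations=20):
    """Nudge a programmed axonal delay toward the desired value in fixed steps."""
    for _ in range(iterations):
        error = target_delay_us - stored_delay_us
        if abs(error) <= step_us:          # residual error below one update step
            break
        stored_delay_us += step_us if error > 0 else -step_us
    return stored_delay_us

# A 10% programming offset on a 5 ms delay shrinks to under one step
# (here 50 us) within the 20 adaptation iterations used in the experiments.
print(adapt_delay(stored_delay_us=5500, target_delay_us=5000))
```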
Compared to the test results presented in Wang et al. (2013b),
which used the fully digital implementation, the combination of
analog axon array and digital neuron array has an 8% drop in
performance, which is mainly because the analog axon cannot
be as precisely programmed and tuned as the digital axon. As
the experimental results of one axon module presented in Wang et al. (2013a) show, the offset between the actual programmed
and the desired value is about 10%, after delay programming.
When the ramp generator's voltage is latched by the analog mem-
ory (for delay programming), there is always a slight deviation
(10 mV) between the programmed voltage and the desired volt-
age, as a combined result of charge injection (Liu et al., 2002)
and the inaccuracy of the ramp generator itself. The ramp gen-
erator will not charge at exactly the same speed each time due
to noise in the charging current. The analog axon will therefore
propagate each incoming pre-synaptic event with an offset com-
pared to the desired axonal delay. After delay adaptation, this
error can be reduced to less than 300 µs throughout the work-
ing range of a single axonal delay path (Wang et al., 2013a), but
due to process variation and device mismatch, it is impossible
to tune all axonal delay paths with such accuracy. This offset,
when large enough, will destroy the time-locked relations that are
the basis of polychronous spiking neural networks. We will dis-
cuss possible solutions for this issue in section Analog vs. Digital
Implementations. Another factor in the drop in performance is
the fact that the analog axon will sometimes generate spikes due
to on-chip parasitic coupling between axons, so that the firing of
one axonal delay path can trigger its neighboring paths to fire by accident.

FIGURE 13 | Percentage of spikes in pattern correctly recalled for different pattern lengths: (A) 50 spikes, (B) 33 spikes, (C) 25 spikes, and (D) 20 spikes. These results are from the combination of analog axons and digital neurons. For most patterns across all four pattern lengths, more than 85% of spikes are recalled successfully.

ANALOG AXON ARRAY AND ANALOG NEURON ARRAY
In this section, we will present the experimental results of the combination with an analog axon array and an analog neuron


array. For the same reasons as presented in the previous section,


the testing scenarios will also focus on the percentage of spikes in
a pattern that have been recalled successfully, and the setup for
testing is the same as described in the previous section.
Figure 14 shows for each pattern stored in the analog axon
array how many spikes were recalled correctly. Figure 14 shows
that more than 70% of the spikes are correctly recalled for nearly
all the patterns across three pattern lengths (20, 25, and 33 spikes)
in both delay programming mode and delay adaptation mode.
For the longest patterns (50 spikes) the probability of correctly
recalling the full pattern is significantly lower, with only 57.4%
of the spikes successfully recalled on average, as mismatch and
noise are more likely to destroy the time-locked relations, result-
ing in the final part of the pattern not being recalled. Figure 14
also shows that for these longest patterns, 20 iterations of delay
adaptation improve the percentage of the spikes in the patterns
that have been correctly recalled to 64.7%. The average percentages of spikes in each pattern correctly recalled across the four pattern lengths (20, 25, 33, and 50 spikes) using delay programming are 77.2, 78, 72.8, and 57.4%, respectively. After 20 iterations of delay adaptation, these numbers improve to 78.1, 78.6, 73.9, and 64.7%, respectively. Compared to the results presented in section Analog Axon Array and Digital Neuron Array for the analog axon array and digital neuron array, the fully analog combination has an overall performance drop of about
14%. Compared to the test results presented in section Digital
Axon Array and Analog Neuron Array for the digital axon array
and analog neuron array, the performance drop increases to
about 20%.
These drops are the result of two major factors. The first one
is that the analog axon and neuron arrays both generate spuri-
ous spikes due to on-chip parasitic coupling. The second factor
is that the analog axon fails to perfectly produce the time-locked
relations as the digital axon does. Both factors play a larger role
the longer the pattern is (in terms of number of spikes). Together,
these effects cause the combination of the analog axon and ana-
log neuron array to have the lowest performance of the four
combinations.
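The strong dependence on pattern length can be captured with a toy model of the time-locked relation: if every link in the chain of delayed spikes lands inside its coincidence window with probability p, an L-spike pattern survives to the end with probability of roughly p^(L-1). The value of p below is an assumed number for illustration, not a measured property of the chips.

```python
def full_recall_probability(per_link_hit_prob, pattern_len):
    """Probability that every delayed spike in an L-spike chain stays inside
    its coincidence window, so the whole pattern is recalled."""
    return per_link_hit_prob ** (pattern_len - 1)

# With a 99% chance that any single delayed spike still falls in the window,
# short patterns almost always complete while 50-spike patterns often die out.
for length in (20, 25, 33, 50):
    print(length, "spikes:", round(full_recall_probability(0.99, length), 3))
```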

DISCUSSION
PERFORMANCE COMPARISON
Efficiency of the implementation
In Izhikevich (2006), the polychronous network is created with
random delays, and STDP is used to prune the connections.
Patterns are not stored or programmed into the network, but
rather, random patterns emerge. A single connection between
neurons could be active in a number of patterns, while other
connections will become totally inactive. In our implementation,
patterns can be directly programmed into the network and all
connections are used when the maximum number of patterns
has been programmed into the network. We aimed to avoid inactive connections, since hardware would still be dedicated to these inactive connections but never used.

FIGURE 14 | Percentage of spikes in each pattern correctly recalled for different pattern lengths: (A) 50 spikes, (B) 33 spikes, (C) 25 spikes, and (D) 20 spikes. These results are from the full analog system.

A drawback of a polychronous neural network is that a common sequence of four spikes occurring in multiple patterns would initiate all patterns containing that sequence whenever it occurs. To distinguish between two patterns with identical sub-sequences, it will be necessary to set up the network so that continuous input is needed from the input pattern to keep the pattern going, for example by setting the threshold to 5 simultaneous input spikes (4 from the previous neurons in the pattern and 1 from the input). Such a system would then only follow a pattern if it had been


previously learned, and if it corresponded with the input pattern. One of the two potential patterns (with identical starts) would die out once the input signal identified which of the two patterns is being presented.
The probability of overlap between patterns can be reduced by setting a higher threshold at each neuron and connecting it to more of the previous neurons in the pattern. The number of patterns a network can store decreases linearly with the number of neurons each neuron is connected to, so this would come at the cost of a decreased storage capacity.
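The 5-input threshold rule sketched above can be written down directly: a neuron that belongs to a stored pattern fires only when the four delayed spikes from its predecessors coincide with one spike from the external input, so a candidate pattern dies out as soon as the input stops agreeing with it. The helper below is a hypothetical illustration of that rule, not the hardware's actual decision logic.

```python
def pattern_neuron_fires(delayed_inputs_active, external_input_active, threshold=5):
    """Coincidence rule: 4 delayed inputs from predecessor neurons + 1 external spike."""
    active = delayed_inputs_active + (1 if external_input_active else 0)
    return active >= threshold

# A pattern supported by the external input keeps propagating...
print(pattern_neuron_fires(delayed_inputs_active=4, external_input_active=True))   # True
# ...while an identical-looking pattern without matching input dies out.
print(pattern_neuron_fires(delayed_inputs_active=4, external_input_active=False))  # False
```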
Analog vs. digital implementations
The experimental results show that, on average, the fully digital implementation has the best performance. For comparison, the combination of the digital axon array and the analog neuron array achieves a similar performance when the network is sparse. The combination of the analog axon array and digital neuron array has a considerable performance drop, even when care has been taken to remove all cross-talk from the spatio-temporal patterns. Finally, the combination of the analog axon and neuron array has the worst performance of the four combinations. The fully digital implementation has the strongest time-locked relation, whereas the fully analog implementation has the weakest, due to the offset between the actual programmed and the desired delay during programming; the analog implementation is further hampered by noise and spurious spikes. As a result, we may conclude that the most important requirement of a hardware implementation of a polychronous network is to provide a strong time-locked relation.
For the analog axon, as presented in section Analog Axon Array and Digital Neuron Array, the error is introduced when the ramp generator is writing its output voltage to the analog memory (for delay programming), as a combined result of charge injection and the inaccuracy of the ramp generator. As the results presented in Wang et al. (2013a) show, the offset will still be about 300 µs even after adaptation. One possible solution is to use analog-to-digital conversion and then store these digital values in digital memories (Horio et al., 1990; Cauwenberghs, 1996). This method has a major advantage in that data can be stored in non-volatile digital memory. The drawback is also quite obvious: it requires at least one analog-to-digital converter (ADC) for storage and usually one digital-to-analog converter (DAC) for read out. This problem will become critical when massive storage is required, as each analog cell will either have its own ADC or share one ADC, which will increase the complexity of the circuit. Other factors, such as the accuracy and the bandwidth of the converters, will lead to the requirement for a high-precision ADC. The second possible solution is to use floating-gate devices, which employ programmable elements that could be used to store the analog values in a non-volatile memory (Basu et al., 2010; Brink et al., 2013; Hasler and Marr, 2013). This is a promising alternative for the implementation of our polychronous spiking neural network. On the other hand, the time-multiplexed digital axon achieves an excellent balance between hardware cost and performance and is therefore the preferred choice when using FPGAs. For a custom design, this choice needs to be carefully investigated because the cost will be highly process dependent.
While it is commonly assumed in neuromorphic engineering that analog circuits provide superior simulation of biological neurons as a result of their continuous and noisy representation of signals, these results show that in this application the analog implementation is consistently poorer in performance and scalability than the digital implementation. This emphasizes that the use of analog circuits comes at a significant cost and should not necessarily be an automatic choice in all applications.

Comparison with other solutions
For the analog implementation of the axonal delay, a similar approach was implemented by charging a capacitor using a transistor operating in sub-threshold (Dowrick et al., 2013), so that the duration of the delay can be programmed by adjusting the gate voltage of the charging transistor. However, their implementation is not able to learn delays, as the value of the gate voltage was assigned externally and the authors have not addressed the issues of obtaining and maintaining this voltage. In contrast, our circuit is capable of learning and storing the axonal delay between two spikes. In Sheik et al. (2012, 2013), the authors show how slow dynamics of analog synapses, combined with the variability of neuromorphic analog circuits, can be used to generate a range of temporal delays. Again, this work is used to generate the desired delay rather than learn the delay.
For the digital implementation of the (axonal) delay, another approach is to use a look-up table for the axonal delay values and a delay sorter directly before the neurons (Scholze et al., 2011). The delay sorter records the arrival time of a spike and will re-emit the spike when the axonal delay time found in the look-up table is reached. Our polychronous network generates delay paths de novo, so that only connections that actually appear in the training patterns will be created. Each axon module of our polychronous network not only propagates the post-synaptic spike with a programmable axonal delay but also transmits the pre-synaptic spike to the destination neuron (using address remapping by configuring the input and output addresses). An implementation with a look-up table would need the axon module to store the address of the desired axonal delay from the look-up table, and would need to receive a notification from the look-up table when that axonal delay is reached. Address remapping would then have to be carried out by the axon module through the configuration of its input and output addresses. An implementation using look-up tables would therefore be more complex and larger than our proposed implementation.

SCALING
The performance of the proposed polychronous network (the number of storable patterns) will scale linearly with the number of axons as long as the average number of connections per neuron is kept below 1/4 of the number of neurons in the network to ensure that cross-talk is not much of an issue (Wang et al., 2013b). In other words, the number of neurons needs to be increased proportionally to the number of axons to maintain performance.
The fully digital implementation of the polychronous neural network is a scalable design. The number of time-multiplexed


axons implemented by one physical axon will increase linearly with the amount of available on-chip SRAM, as long as the multiplexing rate keeps the time resolution of the system within the biological time scale, which is generally less than 1 ms. The number of physical axons (i.e., the ones that could be activated simultaneously) will increase linearly with the number of available Slice LUTs, which is indeed the bottleneck for large-scale FPGA designs. The total number of virtual axons therefore scales linearly with the quantity of both the available on-chip SRAM and Slice LUTs. The number of physical neurons also scales directly with the number of available Slice LUTs. Finally, the timing requirement will become quite critical when the utilization becomes high, e.g., 90% of the LUTs on an FPGA, due to the difficulties in routing. A good balance between the number of physical axons, the multiplexing rate, and the number of physical neurons is therefore the key to the implementation of a large-scale polychronous network with a good time resolution and a high utilization of the available hardware resources on the FPGA.
The analog implementation is nowhere near as scalable as the digital implementation, since it can only be scaled up by implementing more physical copies of the neurons and axons. However, the introduction of the multiplexed analog neuron array, making use of the fact that only a few neurons are active at any given time in a polychronous network, allows the number of virtual neurons to be about 80 times larger than the number of physical neurons. In systems that need slow dynamics or memory of past events, i.e., using neurons with longer time constants than we have used here, the multiplexing rate would go down and we would need more physical neurons.
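The scaling trade-off described above can be summarized as a small resource estimate: the number of virtual axons is bounded both by how many virtual-axon updates fit into the 1 ms time step and by the SRAM needed to hold each axon's state. The clock frequency, cycles per update, and state size below are invented placeholders, not figures taken from the implementation.

```python
def virtual_axon_budget(physical_axons, clock_hz, cycles_per_update,
                        sram_bits, bits_per_axon, time_step_s=1e-3):
    """Upper bound on time-multiplexed (virtual) axons for a given FPGA budget."""
    updates_per_step = int(clock_hz * time_step_s) // cycles_per_update
    mux_limit = physical_axons * updates_per_step   # limited by time resolution
    sram_limit = sram_bits // bits_per_axon         # limited by on-chip SRAM
    return min(mux_limit, sram_limit)

# Placeholder example: 128 physical axons, 100 MHz clock, 10 cycles per
# virtual-axon update, 4 Mbit of block RAM, 48 bits of state per axon.
print(virtual_axon_budget(128, 100e6, 10, 4 * 1024 * 1024, 48))
```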
LESSONS LEARNED
Some lessons have been learnt from the implementation of this mixed-signal platform and these are discussed below.
Virtualization, i.e., the mapping of a larger address space onto a smaller number of physical components through multiplexing these components, is one of the key ideas for implementing large-scale spiking neural networks, because physical components are costly. Virtualization, when simulating neural networks, is supported by biological observations that only 1% of neurons in our brains are active on average at any moment (Johansson and Lansner, 2007), which means it is not necessary to implement all neurons physically on silicon.
A mixed-signal system appears to be a powerful tool for real-time emulation of large-scale neural networks as it can use analog circuits for computation while keeping the flexibility of programmable devices such as FPGAs. As the on-chip topology of the analog circuits is generally fixed after fabrication, it is better to implement the whole system in an FPGA for prototyping and optimization before fabricating the analog circuits.
For the sake of multiplexing analog building blocks such as neurons and axons in a neuromorphic system, these circuits must be designed as standardized building blocks with a standard protocol for communication (such as AER) with programmable devices. Furthermore, for the maximum utilization of a fixed-sized analog chip, it is best to reduce the on-chip routing as much as possible, as the routing can be carried out off-chip by FPGAs with more flexibility and extensibility.
Our polychronous network stores spatiotemporal patterns. A certain amount of jitter can be tolerated in the initial spikes when recalling a stored pattern, which is controlled by setting a time window for coincidence detection in the FPGA implementation, and by the neuronal time constant in the analog implementation. If the patterns are to be generated by a neuromorphic sensor, then care needs to be taken that the sensor reliably produces (near) identical spatiotemporal patterns for identical input signals.

CONCLUSIONS
We have presented a mixed-signal implementation of a polychronous spiking neural network composed of both an analog implementation and a digital implementation of the axon array and the neuron array. A multiplexed analog neuron array with 4k analog neurons was achieved by multiplexing 50 physical analog neurons. Compared to conventional time-multiplexing systems that operate serially and have to store and retrieve analog variables, our scheme operates in parallel and does not require analog storage. A novel interface circuit for synchronizing the spikes from the analog circuits has also been presented. The proposed interface circuit effectively eliminates the problems of timing skew and glitches on the bus and is capable of sampling the asynchronous spikes from the analog circuits correctly. The test results using the four possible configurations of analog or digital components have been compared and discussed. We compared our mixed-signal implementation with our fully digital implementation and addressed the key factor that most influences the performance of the neural network: that of generating accurate time-locked relations. The proposed implementation can be linearly scaled up with the quantity of available hardware resources, although the digital implementations are significantly easier to scale than the analog equivalents, owing to the generic FPGA platforms used.

ACKNOWLEDGMENTS
This work has been supported by the Australian Research Council Grant DP0881219.

REFERENCES
Arthur, J. V., and Boahen, K. (2004). "Recurrently connected silicon neurons with active dendrites for one-shot learning," in 2004 IEEE International Joint Conference on Neural Networks (Vancouver, BC: IEEE), 1699–1704. doi: 10.1109/IJCNN.2004.1380858
Basu, A., Ramakrishnan, S., Petre, C., Koziol, S., Brink, S., and Hasler, P. E. (2010). Neural dynamics in reconfigurable silicon. IEEE Trans. Biomed. Circuits Syst. 4, 311–319. doi: 10.1109/TBCAS.2010.2055157
Boahen, K. (2000). Point-to-point connectivity between neuromorphic chips using address events. IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 47, 416–434. doi: 10.1109/82.842110
Brink, S., Nease, S., Hasler, P., Ramakrishnan, S., Wunderlich, R., Basu, A., et al. (2013). A learning-enabled neuron array IC based upon transistor channel models of biological phenomena. IEEE Trans. Biomed. Circuits Syst. 7, 71–81. doi: 10.1109/TBCAS.2012.2197858
Cauwenberghs, G. (1996). "Analog VLSI long-term dynamic storage," in 1996 IEEE International Symposium on Circuits and Systems, ISCAS '96 (Atlanta, GA: IEEE), 334–337. doi: 10.1109/ISCAS.1996.541601


Dowrick, T., Hall, S., and McDaid, L. (2013). A simple programmable axonal delay scheme for spiking neural networks. Neurocomputing 108, 79–83. doi: 10.1016/j.neucom.2012.12.004
Gao, C., and Hammerstrom, D. (2007). Cortical models onto CMOL and CMOS architectures and performance/price. IEEE Trans. Circuits Syst. I Regul. Pap. 54, 2502–2515. doi: 10.1109/TCSI.2007.907830
Gerstner, W., Kempter, R., van Hemmen, J. L., and Wagner, H. (1996). A neuronal learning rule for sub-millisecond temporal coding. Nature 383, 76–81. doi: 10.1038/383076a0
Goldberg, D., Cauwenberghs, G., and Andreou, A. (2001). Probabilistic synaptic weighting in a reconfigurable network of VLSI integrate-and-fire neurons. Neural Netw. 14, 781–793. doi: 10.1016/S0893-6080(01)00057-0
Harkin, J., Morgan, F., Hall, S., Dudek, P., Dowrick, T., and McDaid, L. (2008). "Reconfigurable platforms and the challenges for large-scale implementations of spiking neural networks," in 2008 International Conference on Field Programmable Logic and Applications (Heidelberg: IEEE), 483–486. doi: 10.1109/FPL.2008.4629989
Harkin, J., Morgan, F., McDaid, L., Hall, S., McGinley, B., and Cawley, S. (2009). A reconfigurable and biologically inspired paradigm for computation using network-on-chip and spiking neural networks. Int. J. Reconfig. Comput. 2009, 1–13. doi: 10.1155/2009/908740
Hasler, J., and Marr, B. (2013). Finding a roadmap to achieve large neuromorphic hardware systems. Front. Neurosci. 7:118. doi: 10.3389/fnins.2013.00118
Horio, Y., Yamamoto, M., and Nakamura, S. (1990). Active analog memories for neuro-computing. IEEE Int. Symp. Circuits Syst. 4, 2986–2989. doi: 10.1109/ISCAS.1990.112638
Hussain, S., Basu, A., Wang, R., and Hamilton, T. (in press). Delay learning architectures for memory and classification. Neurocomputing 127.
Hussain, S., Basu, A., Wang, M., and Hamilton, T. J. (2012). "DELTRON: neuromorphic architectures for delay based learning," in 2012 IEEE Asia Pacific Conference on Circuits and Systems (Kaohsiung: IEEE), 304–307. doi: 10.1109/APCCAS.2012.6419032
Izhikevich, E. M. (2003). Simple model of spiking neurons. IEEE Trans. Neural Netw. 14, 1569–1572. doi: 10.1109/TNN.2003.820440
Izhikevich, E. M. (2006). Polychronization: computation with spikes. Neural Comput. 18, 245–282. doi: 10.1162/089976606775093882
Johansson, C., and Lansner, A. (2007). Towards cortex sized artificial neural systems. Neural Netw. 20, 48–61. doi: 10.1016/j.neunet.2006.05.029
Levy, W. B., and Baxter, R. A. (1996). Energy efficient neural codes. Neural Comput. 8, 531–543. doi: 10.1162/neco.1996.8.3.531
Liu, S., Kramer, J., Indiveri, G., Delbrück, T., and Douglas, R. (2002). Analog VLSI: Circuits and Principles. Cambridge, MA: MIT Press.
Masuda, N., and Aihara, K. (2003). Duality of rate coding and temporal coding in multilayered feedforward networks. Neural Comput. 15, 103–125. doi: 10.1162/089976603321043711
Mihalas, S., and Niebur, E. (2009). A generalized linear integrate-and-fire neural model produces diverse spiking behaviors. Neural Comput. 21, 704–718. doi: 10.1162/neco.2008.12-07-680
Minkovich, K., Srinivasa, N., Cruz-Albrecht, J. M., Cho, Y., and Nogin, A. (2012). Programming time-multiplexed reconfigurable hardware using a scalable neuromorphic compiler. IEEE Trans. Neural Netw. Learn. Syst. 23, 889–901. doi: 10.1109/TNNLS.2012.2191795
Mirhassani, M., Ahmadi, M., and Miller, W. C. (2007). A feed-forward time-multiplexed neural network with mixed-signal neuron–synapse arrays. Microelectron. Eng. 84, 300–307. doi: 10.1016/j.mee.2006.02.014
Python, D., and Enz, C. C. (2001). A micropower class-AB CMOS log-domain filter for DECT applications. IEEE J. Solid State Circuits 36, 1067–1075. doi: 10.1109/4.933462
Saïghi, S., Levi, T., Belhadj, B., Malot, O., and Tomas, J. (2010). "Hardware system for biologically realistic, plastic, and real-time spiking neural network simulations," in 2010 International Joint Conference on Neural Networks (Barcelona), 1–7. doi: 10.1109/IJCNN.2010.5596979
Schemmel, J., Fieres, J., and Meier, K. (2008). "Wafer-scale integration of analog neural networks," in 2008 International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (Hong Kong), 431–438. doi: 10.1109/IJCNN.2008.4633828
Scholze, S., Schiefer, S., Partzsch, J., Hartmann, S., Mayr, C. G., Höppner, S., et al. (2011). VLSI implementation of a 2.8 Gevent/s packet-based AER interface with routing and event sorting functionality. Front. Neurosci. 5:117. doi: 10.3389/fnins.2011.00117
Sheik, S., Chicca, E., and Indiveri, G. (2012). "Exploiting device mismatch in neuromorphic VLSI systems to implement axonal delays," in 2012 International Joint Conference on Neural Networks (Brisbane, QLD), 1–6. doi: 10.1109/IJCNN.2012.6252636
Sheik, S., Pfeiffer, M., Stefanini, F., and Indiveri, G. (2013). "Spatio-temporal spike pattern classification in neuromorphic systems," in Biomimetic and Biohybrid Systems, eds N. F. Lepora, A. Mura, H. G. Krapp, P. F. M. J. Verschure, and T. J. Prescott (Heidelberg: Springer), 262–273. doi: 10.1007/978-3-642-39802-5_23
Van Rullen, R., and Thorpe, S. J. (2001). Rate coding versus temporal order coding: what the retinal ganglion cells tell the visual cortex. Neural Comput. 13, 1255–1283. doi: 10.1162/08997660152002852
Vogelstein, R. J., Mallik, U., Vogelstein, J. T., and Cauwenberghs, G. (2007). Dynamically reconfigurable silicon array of spiking neurons with conductance-based synapses. IEEE Trans. Neural Netw. 18, 253–265. doi: 10.1109/TNN.2006.883007
Wang, R., Cohen, G., Hamilton, T. J., Tapson, J., and van Schaik, A. (2013a). "An improved aVLSI axon with programmable delay using spike timing dependent delay plasticity," in 2013 IEEE International Symposium on Circuits and Systems (ISCAS) (Beijing: IEEE), 2–5.
Wang, R., Cohen, G., Stiefel, K. M., Hamilton, T. J., Tapson, J., and van Schaik, A. (2013b). An FPGA implementation of a polychronous spiking neural network with delay adaptation. Front. Neurosci. 7:14. doi: 10.3389/fnins.2013.00014
Wang, R., Tapson, J., Hamilton, T. J., and van Schaik, A. (2011). "An analogue VLSI implementation of polychronous spiking neural networks," in 2011 Seventh International Conference on Intelligent Sensors, Sensor Networks and Information Processing (Adelaide, SA: IEEE), 97–102. doi: 10.1109/ISSNIP.2011.6146572
Weste, N., and Harris, D. (2005). CMOS VLSI Design: A Circuits and Systems Perspective, 3rd Edn. Boston, MA: Addison-Wesley.
Yu, T., and Cauwenberghs, G. (2010). "Log-domain time-multiplexed realization of dynamical conductance-based synapses," in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (Paris), 2558–2561. doi: 10.1109/ISCAS.2010.5537114
Zaveri, M. S., and Hammerstrom, D. (2011). Performance/price estimates for cortex-scale hardware: a design space exploration. Neural Netw. 24, 291–304. doi: 10.1016/j.neunet.2010.12.003

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 25 September 2013; accepted: 26 February 2014; published online: 18 March 2014.
Citation: Wang RM, Hamilton TJ, Tapson JC and van Schaik A (2014) A mixed-signal implementation of a polychronous spiking neural network with delay adaptation. Front. Neurosci. 8:51. doi: 10.3389/fnins.2014.00051
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.
Copyright © 2014 Wang, Hamilton, Tapson and van Schaik. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.



ORIGINAL RESEARCH ARTICLE
published: 21 November 2013
doi: 10.3389/fnins.2013.00215

Real-time biomimetic Central Pattern Generators in an


FPGA for hybrid experiments
Matthieu Ambroise 1, Timothée Levi 1*, Sébastien Joucla 2, Blaise Yvert 2 and Sylvain Saïghi 1
1
Laboratoire IMS, UMR Centre National de la Recherche Scientifique, University of Bordeaux, Talence, France
2
Laboratoire INCIA (Institute for Cognitive and Integrative Neuroscience), UMR Centre National de la Recherche Scientifique, University of Bordeaux, Talence,
France

Edited by: André van Schaik, The University of Western Sydney, Australia
Reviewed by: Jörg Conradt, Technische Universität München, Germany; Runchun M. Wang, University of Western Sydney, Australia; M. Anthony Lewis, Qualcomm, QTI, USA
*Correspondence: Timothée Levi, Laboratoire IMS, UMR Centre National de la Recherche Scientifique 5218, Université de Bordeaux, 351 Cours de la Libération, 33405 Talence, France
e-mail: timothee.levi@ims-bordeaux.fr

This investigation of the leech heartbeat neural network system led to the development of a low-resource, real-time, biomimetic digital hardware for use in hybrid experiments. The leech heartbeat neural network is one of the simplest central pattern generators (CPG). In biology, CPGs provide the rhythmic bursts of spikes that form the basis for all muscle contraction orders (heartbeat) and locomotion (walking, running, etc.). The leech neural network system was previously investigated and this CPG formalized in the Hodgkin–Huxley neural model (HH), the most complex devised to date. However, the resources required for a neural model are proportional to its complexity. In response to this issue, this article describes a biomimetic implementation of a network of 240 CPGs in an FPGA (Field Programmable Gate Array), using a simple model (Izhikevich), and proposes a new synapse model: an activity-dependent depression synapse. The network implementation architecture operates on a single computation core. This digital system works in real time, requires few resources, and has the same bursting activity behavior as the complex model. The implementation of this CPG was initially validated by comparing it with a simulation of the complex model. Its activity was then matched with pharmacological data from rat spinal cord activity. This digital system opens the way for future hybrid experiments and represents an important step toward hybridization of biological tissue and artificial neural networks. This CPG network is also likely to be useful for mimicking the locomotion activity of various animals and developing hybrid experiments for neuroprosthesis development.

Keywords: central pattern generator, biomimetic, neuron model, spiking neural networks, digital hardware, FPGA

INTRODUCTION
Millions of people worldwide are affected by neurological disorders which disrupt connections between brain and body, causing paralysis or affecting cognitive capabilities. The number is likely to increase over the next few years and current assistive technology is still limited. In recent decades, extensive research has been devoted to Brain-Machine Interfaces (BMIs) and neuroprosthesis in general (Hochberg et al., 2006, 2012; Nicolelis and Lebedev, 2009), working toward effective treatment for these disabilities. The development of these devices has had and, hopefully, will continue to have a profound social impact on these patients' quality of life. These prostheses are designed on the basis of our knowledge of interactions with neuronal cell assemblies, taking into account the intrinsic spontaneous activity of neuronal networks and understanding how to stimulate them into a desired state or produce a specific behavior. The long-term goal of replacing damaged neural networks with artificial devices also requires the development of neural network models that match the recorded electrophysiological patterns and are capable of producing the correct stimulation patterns to restore the desired function. The hardware set-up used to interface the biological component is a Spiking Neural Network (SNN) system implementing biologically realistic neural network models, ranging from the electrophysiological properties of a single neuron to large-scale neural networks.
Our study describes the development of a neuromorphic hardware device containing a network of real-time biomimetic Central Pattern Generators (CPG). The main goal of this research is to create artificial CPGs that will be connected to ex vivo spinal cord of rats and guinea pigs, thus achieving one main objective of the Brainbow European project (Brainbow, 2012) toward hybridization. Hardware-based SNN systems were developed for hybrid experiments with biological neurons and the description of those pioneer platforms was reported in the literature (Jung et al., 2001; Le Masson et al., 2002; Vogelstein et al., 2006). The Brainbow project will go further by using a large-scale neural network instead of a few neurons to substitute the functions of a biological sub-network. The final goal is the development of a new generation of neuro-prostheses capable of restoring the lost communication between neuronal circuitries.
Locomotion is one of the most basic abilities of animals. Neurobiologists have established that locomotion results from the activity of half-center oscillators that provide alternating bursts. The first half-center oscillator was proposed by Brown (1914). Pools of interneurons control flexor and extensor motor neurons with reciprocal inhibitory connections. Most rhythmic


movements are programmed by central pattern-generating net-


works consisting of neural oscillators (Marder and Bucher, 2001;
Ijspeert, 2008). CPGs are neural networks capable of producing
rhythmic patterned outputs without rhythmic sensory or central
input. CPGs underlie the production of most rhythmic motor
patterns and have been extensively studied as models of neural
network function (Hooper, 2000). Half-center oscillators con-
trol swimming in xenopus, salamander (Ijspeert et al., 2007),
and lamprey (Cohen et al., 1992), as well as leech heartbeat
(Cymbalyuk et al., 2002), as described in numerous publications.
One key article on modeling the leech heartbeat system is Hill
et al. (2001), where the Hodgkin–Huxley formalism is used to
reproduce the CPG.
The main novelty of this research was to implement the leech
heartbeat system neural network with minimum resources while
maintaining its biomimetic activity. Indeed, the final application
is a hybrid experiment that requires spike detection, spike sort-
ing, and micro-electrode stimulation. All of these modules are
implemented in the same digital board. To achieve this, the Hill
et al. (2001) model and results were reproduced using a sim-
pler model (Izhikevich, 2004), implemented in an FPGA (Field
Programmable Gate Array) board. This digital board made it pos-
sible to design a frugal, real-time network of several CPGs (in this
case, a network of 240 CPGs implemented on a Spartan6 FPGA
board). For instance, this CPG network is capable of mimicking
the activity of a salamander, which requires 40 CPGs (Ijspeert,
FIGURE 1 | Electrical activity of the leech heartbeat system and
2001), or developing hybrid experiments (Le Masson et al., 2002)
diagram of the CPG. Neuron cell bodies are represented by circles. Axons
for neuroprosthesis development (Brainbow, 2012). and neurite processes are represented by lines. Inhibitory chemical
The first part of this article describes the biological leech heart- synapses are represented by small filled dots. (A) Electrical activity of two
beat system, based on one segmental CPG. The next section heart interneurons recorded extracellularly from a chain of ganglia (Hill
focuses on choosing a frugal neuron model to match the same et al., 2001). (B) A diagram of the elemental oscillator in the leech heartbeat
system. (C) A diagram of the segmental oscillator in the leech heartbeat
biological behavior. The following section explains the topol-
system, including two elemental oscillators, L3/R3 and L4/R4, and two
ogy of a single neuron and its implementation in the hardware, pairs of coordination neurons, L1/R1 and L2/R2.
followed by its extension to a neuron computation core for
increasing the size of the neural network. The next stage was to
develop a new synaptic model reproducing activity-dependent unit that capable of producing robust oscillations under normal
depression phenomena to fit the biological activity of a leech conditions. These neurons oscillate in alternation with a period
heartbeat. The architecture of this digital system is then described of about 1012 s (Krahl and Zerbst-Boroffka, 1983; Calabrese
in full, including the various blocks. Finally, the system was used et al., 1989; Olsen and Calabrese, 1996) demonstrated that the
to design a CPG network, validated by comparing our measure- synaptic connections among interneurons and from interneu-
ments with ex vivo rat spinal cord locomotion results following rons to motor neurons were inhibitory. The synaptic interaction
pharmacological stimulation. between reciprocally inhibitory heart interneurons consists of a
graded component in addition to spike-mediated synaptic trans-
MATERIALS AND METHODS missions (Angstadt and Calabrese, 1991). This kind of synapse is
DESCRIPTION OF THE LEECH BIOLOGICAL HEARTBEAT SYSTEM really difficult to implement in hardware as it contains sigmoid
All leech heartbeat studies agree that the CPG (Figure 1C) functions, differential equations, memory of last spikes, and so
responsible for this activity (Figure 1A) requires few neurons, on. A description of our synapse model reproducing the same
making it an ideal candidate system for elucidating the various behavior is included below.
biomechanisms governing CPG behavior. Nadim et al. (1995) and Olsen et al. (1995) developed a bio-
Modeling studies indicate that the burst duration of a leech physical model of a pair of reciprocally inhibitory interneurons in
heart interneuron in an elemental oscillator is regulated by the the leech heartbeat system. This model included synaptic ionic
interneuron itself and by the opposite interneuron (see L3 and currents based on voltage-clamp data. Synaptic transmissions
R3 in Figure 1B) (Calabrese, 1995; Nadim et al., 1995; Olsen between the interneurons consist of spike-mediated and graded
et al., 1995; Hill et al., 2001; Jezzini et al., 2004; Norris et al., synaptic currents. The Hill et al. (2001) model was derived from
2007). Figure 1A shows the electrical activity in the leech heart- a previous two-cell, elemental oscillator model (Nadim et al.,
beat system from extracellular recordings. The pair of neurons 1995) by incorporating intrinsic and synaptic current modifica-
is known as an elemental oscillator (Figure 1B), i.e., the smallest tions based on the results of a realistic waveform voltage-clamp

Frontiers in Neuroscience | Neuromorphic Engineering November 2013 | Volume 7 | Article 215 | 135
Ambroise et al. Biomimetic CPGs for Hybrid experiments

study (Olsen and Calabrese, 1996). This new, segmental oscilla- Choice and presentation of the Izhikevich model
tor model behaves more similarly to biological systems. Figure 1C In designing a SNN, the first step is the choice of a biologically
shows a model of the system. The real-time digital segmental realistic model. Indeed, a mathematical model based differen-
oscillator model design will be based on this architecture. The tial equations is capable of reproducing a behavior quite similar
next part will describe the system modeling the leech heartbeat to that of a biological cell. The choice of model was based on
with the goal of implementing it in hardware. The leech heartbeat two criteria: the family of neurons able to be reproduced and
CPG was chosen for the long duration of the burst. the number of equations. These criteria were used to compare
several models, including the Leaky Integrate and Fire model
SYSTEM MODELING FOR HARDWARE IMPLEMENTATION (LIF) (Indiveri, 2007), the HodgkinHuxley model (HH), and the
State of art Izhikevich model (IZH).
Some previous studies used silicon neurons (Indiveri et al., 2011) Hill et al. (2001) used the HH to reproduce the leech heart-
to simulate the leech heartbeat system (Simoni et al., 2004; Simoni beat system with eight neurons (Figure 1C). From the equations
and DeWeerth, 2007). Sorensen et al. (2004) created a hybrid defined in this paper, it was established that the eight neurons
system of a heart interneuron and a silicon neuron. The silicon in the heartbeat leech behaved like regular spiking ones (RS).
neuron provides real-time operation and implements a version Indeed, this model was composed of nine voltage-dependent
of the Hodgkin–Huxley formalism (Hodgkin and Huxley, 1952).
However, due to the complexity of the model, it was only possi- The HH model reproduces all types of neurons with good
ble to use a small number of silicon neurons and, therefore, only accuracy (spike timing and shape). Its main drawbacks are the
one CPG. This study describes the same results using a large CPG large number of parameters and the equations required. In the
network (240 CPGs on a Spartan6 FPGA board), in preparation heartbeat network, the main focus is on excitatory neurons, like
for future hybrid experiments with different CPGs. For instance, RS. The HH model required 32 parameters for an RS and 26 for a
in the salamander model (Ijspeert, 2001), the body CPG consists fast-spiking neuron (FS) (Grassia et al., 2011). Furthermore, sim-
of 40 interconnected segmental networks. ulating an RS neuron required four ionic channels (dynamics of
When a silicon neuron and heart interneuron are connected potassium and sodium ions, leak current, and slow potassium).
with reciprocal inhibitory synapses of appropriate strength, In contrast, LIF only involves two equations but is only capable of
they form a hybrid elemental oscillator that produces oscilla- simulating a few types of neurons.
tions remarkably similar to those seen in the living system. The IZH represents a good solution, as it is based on two equa-
Olypher et al. (2006) described the control of burst dura- tions and is capable of reproducing many different families of
tion in heart interneurons using a hybrid system, where a liv- neurons by changing four parameters. Furthermore, according
ing, pharmacologically-isolated, heart interneuron was connected to Izhikevich (2004), this model is resource-frugal, a key advan-
with artificial synapses to a model heart interneuron running tage when the aim is to design a large CPG network embedded in
in real-time (software). Using an FPGA board will make it pos- the same board as other modules required for hybrid experiments
sible to operate in real time using a large number of neurons, (spike detection, spike sorting, stimulation, etc.).
together with customized systems for various applications (hybrid The IZH model depends on four parameters, which make it
experiments). possible to reproduce the spiking and bursting behavior of spe-
A few studies (Torres-Huitzil and Girau, 2008; Rice et al., cific types of cortical neurons. From a mathematical standpoint,
2009; Serrano-Gotarredona et al., 2009; Barron-Zambrano et al., the model is described by a two-dimensional system of ordinary
2010; Barron-Zambrano and Torres-Huitzil, 2013) reported on differential equations (Izhikevich, 2003):
CPG in FPGA for robotic applications. These studies used sim-
ple neuron-models and were more bio-inspired than biomimetic.
dv
Guerrero-Riberas et al. (2006) implemented a network of LIF = 0.04v2 + 5v + 140 u + IIzh (1)
neurons with synapses and plasticity, but not in biological time, dt
so it was impossible to perform hybrid experiments. While du
= a(bv u) (2)
multi-legged robots need CPG to move or coordinate their dt
movements, they implement an AmariHopfield CPG (Amari,
1972) or basic CPGs (Van Der Pol, 1928), modeled as non- with the after-spike resetting conditions:
linear oscillators. Those models provide sinusoidal oscillations 
that are not biorealistic. The ultimate goal of these studies is vc
if v 30 mV (3)
to create a robot that mimics biological behavior but these uu+d
systems cannot be used for hybrid experiments. Analog hard-
ware has also been implemented (Linares-Barranco et al., 1993; In equation (3), v is the membrane potential of the neuron, u is a
Still and Tilden, 1998; Lewis et al., 2001; Nakada, 2003; Still membrane recovery variable, which takes into account the activa-
et al., 2006; Lee et al., 2007; Wijekoon and Dudek, 2008). tion of potassium and inactivation of sodium channels, and IIzh
However, it is very difficult to tune analog circuits due to param- describes the input current from other neurons.
eter mismatch. For these works, they either design bio-inspired The IZH model was chosen to emulate the behavior of the
oscillators for creating CPG or implement few biomimetic excitatory cells for its simplicity and its capacity to implement
neurons. various families of neurons. The next step was to determine the




network system topology. The next section describes the design


of one neuron and its extension to a neuron computation core,
then the different synapse models implemented, and, finally, the
topology of the network.

SYSTEM TOPOLOGY
Topology of one neuron core: architecture and implementation
In order to make the Izhikevich neural network more biomimetic,
the IIzh current from equation (1) was split into three: Ibias , Iexc ,
and Iinh . Ibias is the biasing current, Iexc is the positive con-
tribution due to excitatory synapses, and Iinh is the negative
contribution of inhibitory synapses. Those currents will be
detailed in Synapse Model. As suggested in Cassidy and Andreou
(2008), equation (1) was multiplied by 0.78125 to make it easier
to implement on a digital board. These modifications gave (4),
where the u coefficient is still 1 thanks to Ibias current.

dv 1 2
= 32 v + 4v + 109.375 u + Ibias + Iexc + Iinh
dt
(4)
du
= a (bv u)
dt FIGURE 2 | Architecture of the u and v pipelines in the neural
computation core. The computation cycles are separated by dotted lines.
v[n+1]v[n]
dt =
Moreover, dv t and, as the time step of the IZH
model is equal to one millisecond (t = 1):
the same time as the first multiplication (step 3). By multiplexing
 operands, the same multiplier is used for the following multipli-
v [n + 1] = 1 32 v [n]2 + 5v [n] + 109.375 u [n] cations in different computation cycles. In step 4, v2 is obtained
by another multiplication. A simple two-bit shift makes it possi-
+ Ibias [n] + Iexc [n] + Iinh [n] (5)
ble to obtain 4v and add it to v. At the same time, u is used in two
u [n + 1] = u [n] + a.(b.v [n] u [n]) subtractions. Step 5 consists of a 5-bit shift to obtain (1/32)v2 , an
addition, and the last multiplication. In step 6, the computation
One neuron was implemented on the FPGA board according of both u and v is completed. In the next step, the v value is tested
to these equations and specifications. This neuron was then against the threshold to determine whether the neuron has emit-
extended into a neuron computation core that updated the u and ted a spike or not. This test gives the next u and v values for this
v values of all neurons in the network. Consequently, the neu- neuron to be stored in the RAM.
ron implementation became a neuron computation core. For An RS neuron with a = 0.002, b = 0.2, c = 65, and d = 8
instance, around 2000 independent neurons could be imple- was used to implement the CPG.
mented on our digital board. In this system, the type of neuron Once the neuron computation core was implemented, the
is defined by the four Izhikevich parameters: a, b, c, and d from synaptic model was chosen and implemented.
equations (2) and (3). Moreover, the state of a neuron is defined
by values u and v, and the three current values. Those 9 val- Synapse model
ues were saved in a RAM for use in the next millisecond in the A network is defined by a group of neurons and a group of
step computation. By extension, the same process can be used for synapses. Once the neuron model had been chosen, it was obvi-
every neuron in the network. ously necessary to choose a synapse model. Like the neuron
Each u and v computation step is run in parallel, using two model, this model had to be biomimetic but frugal in its use of
pipelines based on the architecture presented in [9]. The topology resources. In biology, synapses are described as links between neu-
is presented in Figure 2. All parameters from equations (2), (3), rons that transmit different types of synaptic currents to each
and (4), as well as the u and v values used in the computation are other to either excite or inhibit neuron activity. In our imple-
synchronized in one cycle before going through the pipelines (not mentation, a synaptic weight (Wsyn ) was added to the synaptic
shown in Figure 2). current. When Wsyn was positive, it was added to Iexc (excitatory
To resume, each neuron is represented by one v and u synaptic current) and when Wsyn was negative, it was added to
value, four Izhikevich coefficients (a, b, c, and d), and three Iinh (inhibitory synaptic current).
currents (Ibias , Iexc , and Iinh ). Thanks to AMPA and GABA effects, all synaptic current exci-
Iexc , Iinh , and Ibias are added in two cycles at the beginning of tations or inhibitions, respectively decay exponentially (Ben-Ari
the v pipeline, while the u pipeline is still inactive (steps 1 et al., 1997). AMPA is an excitatory neurotransmitter that depo-
and 2). The current sum is added to the constant 109.375 and at larizes the neuron membrane whereas GABA is an inhibitory


neurotransmitter with a hyperpolarizing effect. Depolarization syn equals zero there is no depression on Wsyn and when syn
or hyperpolarization are represented by a positive or negative equals one there is maximum depression on Wsyn and the synapse
contribution on the synaptic current. is exhausted.
The synaptic current Isyn was implemented with a time Ws was used instead of Wsyn as the synaptic weight for each
constant syn for the exponential decay, as follows: synapse. Then, according to the activity-dependent depression
effect, when there is a spike, Ws is added to the synaptic current:

 Isyn (t + T) Isyn (t) Ws [n] = Wsyn syn [n] Wsyn (11)


Isyn (t) = syn Isyn (t) = syn . (6)
T
The other effect of activity-dependent depression is to increase
 
T syn after each spike, thanks to the percentage dissipation (P).
Isyn (t + T) = 1 Isyn (t) (7)
syn
syn [n + 1] = syn [n] + P (1 syn [n]) (12)
When computation step T equals one millisecond and syn is in
ms:   The regeneration or reloading of synaptic vesicles is represented
1 by syn decreasing to zero. Thus, syn decays exponentially when
Isyn (t + 1) = 1 Isyn (t) (8)
syn no spike is emitted. So, using the method described in Synapse
Model:
1
1 syn [n + 1] = syn [n] syn [n] (13)
Isyn [n + 1] = Isyn [n] Isyn [n] (9) reg
syn
To summarize, all synapses are now represented by (12), (13), (14)
Adding the synaptic weight to the synaptic current, the new and:
equation is:
1
Isyn[n + 1] = Isyn[n] − (1/τsyn) Isyn[n] + Wsyn[n]   (10)

The synaptic computation core implementation is based on the same principle as the neuron computation core. However, this model is not adequate to fit biological data. It was, therefore, decided to implement an activity-dependent depression, where the new synaptic weight, Ws, was dependent on Wsyn.

Activity-dependent depression
As the synaptic behavior described in Hill et al. (2001) requires too many resources to be implemented on an FPGA, the method chosen to fit the overall biological behavior was activity-dependent depression (Tabak et al., 2000). Activity-dependent depression of synapses is another biological phenomenon, consisting of reducing a synaptic weight after a spike. In biology, each synapse contribution is provided by a synaptic vesicle. These vesicles contain ions that empty out at each spike and then regenerate, following an exponential rule. According to Matsuoka (1987), four methods provide a stable rhythm within a network (regulation of stimulus intensity, change in input, alteration of stimuli, and change in synaptic weight). The phenomenon known as activity-dependent depression changes the synaptic weight depending on the activity of the network.
This phenomenon has been reported in the neurobiology literature but no model had been devised. This paper proposes a model of this activity-dependent depression that was implemented in digital hardware to improve our CPG network.
As previously explained, each time a neuron emits a spike, the synapse adds a synaptic weight (Wsyn) to the synaptic current. At the same time, the factor (δsyn) indicating the level of depression of a synaptic weight increases. Furthermore, δsyn regulates Wsyn. The value of δsyn is between 0 and 1. Consequently, when

Isyn[n + 1] = Isyn[n] − (1/τsyn) Isyn[n] + Ws[n]   (14)

The main parameters are: synaptic weight, Wsyn; level of depression, δsyn; and percentage dissipation, P. All these parameters are stored in the RAM on the digital board. Furthermore, this computation required greater precision due to the sensitivity of the parameters. The 26-bit signed fixed-point representation chosen had 1 bit for the sign, 9 bits for the whole numbers, and 16 bits for the decimals.
Once the neuron and synapse models had been designed, it was possible to develop the neural network topology.

Network topology
Three elementary blocks. The architecture was based on three main blocks: the neuron implemented (or neuron computation core), a synapse, and the RAM. The connectivity between those blocks is shown in Figure 3.
So far, the neuron computation core can update the state (u and v variables) of each neuron. In the digital network, the role of the synapse is to update all synaptic currents and weights related to the activity of all neurons, so the synapse block exhibits two behaviors (spiking or not). These two behaviors are summarized in Table 1.
The IZH model has a time step of one millisecond, so the other computation was synchronized with this time step. The new values of u and v, the exponential decay of Isyn, and the new values of each synaptic current are computed in the same millisecond. Moreover, a biological neural network is composed of Nn neurons and Ns synapses. To define which neuron is connected to which and with which kind of synapse (excitatory or inhibitory), the network is described using two matrixes: connectivity and synaptic weight (see Figure 3). To save RAM, both matrixes are implemented as sparse matrixes with Nn lines. The ith line in the
connectivity matrix corresponds to the connectivity of presynaptic neuron Ni to the other neurons. The synapses are identified by the postsynaptic neuron addresses. For example, the connection to neuron Nj is identified by the number j on the ith line. In the worst case, each neuron is connected to itself and all the others, giving Nn columns. Each matrix line ends with a virtual neuron (address Nn + 1). This implementation is not optimum for the worst case, but the gain is significant for biologically plausible networks, where the total number of synapses is at least four times smaller. Marom and Shahaf (2002) and Garofalo et al. (2009) estimated the average connectivity level of neural networks at their mature phase: each neuron is mono-synaptically connected to 10-30% of all the other neurons.
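To make the row format concrete, here is a small Python sketch of the sparse connectivity and weight rows with the virtual-neuron terminator described above. It is an illustration only, not the authors' hardware description; all names and values are chosen for the example.

```python
# Illustrative sketch: one RAM row per presynaptic neuron, listing postsynaptic
# addresses and ending with a virtual neuron of address Nn + 1; the weight row
# mirrors it position for position (negative = inhibitory, positive = excitatory).
Nn = 8                      # neurons in one CPG (assumed for illustration)
VIRTUAL = Nn + 1            # terminator address marking the end of a row

# Neuron 2 connects to neuron 1 (inhibitory) and neuron 3 (excitatory),
# as in the Figure 4 example.
connectivity = {2: [1, 3, VIRTUAL]}
weights      = {2: [-5.1, 5.1, 0.0]}

def synapses_of(pre, connectivity, weights):
    """Walk one sparse row until the virtual-neuron terminator is reached."""
    for post, w in zip(connectivity[pre], weights[pre]):
        if post == VIRTUAL:
            break
        yield post, w

print(list(synapses_of(2, connectivity, weights)))   # [(1, -5.1), (3, 5.1)]
```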
There is a direct link between the matrixes: the synaptic weight matrix is the same size as the connectivity matrix, i.e., the same number of lines and columns, with the virtual neurons in the same position (Figure 4). The connectivity between two neurons described by the coordinates (k, l) in the connectivity matrix has the weight shown in box (k, l) in the synaptic weight matrix. A third matrix based on the same principle completes the system: the percentage efficiency matrix, which gives the percentage dissipation, P, of each synapse in a network, as defined in the previous section on activity-dependent depression. We will now describe the state machine of the neural network.

FIGURE 3 | Global architecture of the spiking neural network.

FIGURE 4 | Example of matrix design depending on the neural network. Neuron 2 (N2) is connected by an inhibitory synapse to neuron 1 (N1) and by an excitatory synapse to neuron 3 (N3). Then, on line 2 in the connectivity matrix, N2 is connected to N1, N3, and a virtual neuron (VN), indicating the end of the connection. In the synaptic weight matrix, the synapses for neuron 2 (S2) have a negative weight for the inhibitory synapse and a positive weight for the excitatory synapse. Note the correspondence of its position in both matrixes.

Table 1 | Description of the equations for synaptic currents and activity-dependent depression.

                                   When a spike is emitted                        When no spike is emitted
Synaptic current                   Iexc[n+1] = Iexc[n] + Ws[n]                    Iexc[n+1] = Iexc[n]
                                   or Iinh[n+1] = Iinh[n] + Ws[n]                 and Iinh[n+1] = Iinh[n]
Activity-dependent depression      δsyn[n+1] = δsyn[n] + P(1 − δsyn[n])           δsyn[n+1] = δsyn[n] − δsyn[n]/τreg
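The following Python sketch folds the exponential decay of Equations (10, 14) and the Table 1 update rules into a single function, purely for illustration; in the hardware these steps occur in different machine states and use 26-bit fixed point. The form Ws = Wsyn(1 − δsyn) is an assumption consistent with the statement that δsyn regulates Wsyn, not a formula quoted from the paper.

```python
# Minimal floating-point sketch of the per-millisecond synapse update.
def update_synapse(I_syn, delta_syn, spike, W_syn, P, tau_syn, tau_reg):
    I_syn -= I_syn / tau_syn                     # exponential decay, Eq. (10)/(14)
    if spike:                                    # Table 1, "when a spike is emitted"
        Ws = W_syn * (1.0 - delta_syn)           # assumed depressed weight
        I_syn += Ws
        delta_syn += P * (1.0 - delta_syn)       # depression builds up
    else:                                        # Table 1, "when no spike is emitted"
        delta_syn -= delta_syn / tau_reg         # vesicles recover
    return I_syn, delta_syn

# One second of 1 ms steps with a presynaptic spike every 100 ms:
I, d = 0.0, 0.0
for n in range(1000):
    I, d = update_synapse(I, d, spike=(n % 100 == 0),
                          W_syn=5.1, P=0.0149,          # 1.49% dissipation as a fraction
                          tau_syn=100.0, tau_reg=4444.0)
```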
Network machine states. The synaptic current is computed in three successive steps:

EXT state: for closed-loop experiments, we implement this state in which external feedback can interact with the artificial neural network. This first state consists of using the synaptic block to update the synaptic current. In this case, presynaptic spikes are external events (see Figure 3), such as stimulation from biological neurons in the case of a neuroprosthesis. This state makes it possible to stimulate each neuron.
NEUR state: during this step, the neuron membrane (u and v from Figure 2) and all exponential decay values are computed in parallel.
SYN state: the last step consists of updating the synaptic current to reflect the presynaptic spikes computed in the NEUR state. These updated current values are used in the EXT state during the next cycle.

The EXT, NEUR, and SYN states must be completed within a one millisecond time step. If the computation of all three states is completed in less than 1 ms, an IDLE state is implemented until the end of the cycle. Moreover, the blocks (neuron computation core and synapse computation core) described in Figure 3 are multiplexed in time to reduce the implementation area in large-scale neural networks.
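The sequencing can be summarized by the sketch below. The three update functions are empty placeholders standing in for the hardware blocks; only the EXT, NEUR, SYN and IDLE scheduling within the 1 ms real-time step is illustrated, and the software timing mechanism is an assumption of this example.

```python
import time

def ext_state(net, external_events):   # update synaptic currents from external events
    pass

def neur_state(net):                    # update u, v and exponential decays; return spikes
    return []

def syn_state(net, spikes):             # update synaptic currents from internal spikes
    pass

def run(net, get_external_events, step_s=0.001, n_steps=100):
    for _ in range(n_steps):
        t0 = time.perf_counter()
        ext_state(net, get_external_events())
        spikes = neur_state(net)
        syn_state(net, spikes)
        # IDLE: wait out the remainder of the 1 ms step to stay in real time
        budget = step_s - (time.perf_counter() - t0)
        if budget > 0:
            time.sleep(budget)

run(net={}, get_external_events=lambda: [])
```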
Our architecture has two main limits: the number of available cycles (Nc) in one millisecond and the size of the RAM used to save all parameters. Two equations derived from these limits determine the maximum size of the implementable neural network, in terms of number of neurons (Nn) and synapses (Ns).
In the EXT state, all synaptic currents are updated in 10 cycles for each neuron, i.e., 10 Nn cycles. Each neuron requires 11 cycles to compute the NEUR state, i.e., 11 Nn cycles. The synaptic current update during state SYN requires 10 cycles per synapse, i.e.,
10 Ns. Figure 2 describes 7 cycles for the neuron computation core, but 4 more cycles are required to read and save the various parameters in the RAM.
This leads to the following equation for computing the maximum number of neurons that may be implemented, depending on the number of cycles available:

10 Nn + 11 Nn + 10 Ns ≤ Nc   (15)
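As a quick check of Equation (15), the snippet below evaluates the cycle budget for the CPG network reported in the Results (240 CPGs of 8 neurons and 14 synapses each). The value of Nc depends on the FPGA clock, which is not restated here; the 100 MHz clock (Nc = 100,000 cycles per 1 ms step) is purely an assumption for illustration.

```python
# Back-of-the-envelope check of Equation (15).
def cycles_needed(Nn, Ns):
    return 10 * Nn + 11 * Nn + 10 * Ns        # left-hand side of Eq. (15)

Nc = 100_000                                   # assumed: 100 MHz clock, 1 ms time step
n_cpg = 240
Nn, Ns = 8 * n_cpg, 14 * n_cpg                 # 8 neurons and 14 synapses per CPG
print(cycles_needed(Nn, Ns), "<=", Nc, cycles_needed(Nn, Ns) <= Nc)
# 73920 <= 100000 True
```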

Having built all the component parts of this real-time, biomimetic digital system, it was possible to validate it by several experiments, presented in the following section.

RESULTS
A CPG is defined by the number of neurons and the families of neurons and synapses. The leech heartbeat neural network was simulated by an appropriate CPG configuration.
Hill et al. (2001) presented an elemental oscillator, based on two excitatory neurons linked by inhibitory synapses. A segmental oscillator may consist of 4-10 neurons. A two-neuron network (elemental oscillator from Figure 1B) was chosen to validate our topology, followed by an eight-neuron neural network (segmental oscillator from Figure 1C). The activity of our system was then compared with that of an ex vivo rat spinal cord, stimulated with pharmacological solutions. It was also demonstrated that the period of bursting activity could be modified depending on one parameter. This will be useful in future closed-loop hybrid experiments.
Biological CPGs provide specifications concerning their behavior. Indeed, their activity is characterized by periodic long bursts (lasting many seconds). Each burst begins by a quick rise in spike frequency to a maximum and ends with a low final spike frequency.

COMPARISON OF BIOLOGICAL/DIGITAL ELEMENTAL OSCILLATOR
The first example of a CPG was the elemental oscillator (with only two neurons). To reproduce activity accurately, it was necessary to obtain the following values: τampa (time constant of the inhibitory synaptic current exponential decay), τreg (time constant of the recovery of synaptic vesicles), and P (percentage dissipation). These values will be the same for each synapse. The following values were chosen to match biological behavior: τcurrent = 100 ms and τreg = 4444 ms (so 1/τcurrent = 0.01 and 1/τreg = 0.0002). The Ibias current was equal to 8 for both neurons. The synaptic weights are 5.1 and the percentages of dissipation are 1.49.
This model was validated by comparing its implementation with the complex model in Hill et al. (2001) (see Figure 5).

FIGURE 5 | Comparison between the elemental oscillator (Figure 1B) bursting activity in the complex model simulated by Scilab, as described in Hill et al. (2001), and the elemental oscillator presented above, measured with a logic analyzer. The time scale is the same.

In this case, the activity of one neuron inhibits the second neuron. Due to activity-dependent depression and the GABA effect, the inhibition ends and lets the second neuron fire again. In both cases (biological modeling system and digital system), the bursting activity was similar in terms of period and duty cycle, thus validating the simplified elemental oscillator against the complex one. The next step was to validate the segmental oscillator and compare its implementation with biological data.

COMPARISON OF BIOLOGICAL/DIGITAL SEGMENTAL OSCILLATOR
Keeping the time constant, the biological behavior of the eight-neuron network was duplicated using the following parameters. This time, an eight-neuron CPG was implemented using the same values for τcurrent and τreg as those used for the elemental oscillator. The use of 8 neurons made it possible to maintain the period without variation (see Table 2) by slowing down the two pairs of oscillators with coordination neurons (De Schutter, 2000).

FIGURE 6 | Logic analyzer measurements of the digital eight-neuron CPG. L3 and R3 show the activity of the first oscillator. L4 and R4 show the activity of the second oscillator. L1/R1 and L2/R2 are the coordination neurons.

In Figures 1C, 6, L3/R3 and L4/R4 correspond to the two elemental oscillators and are coupled to the L1/R1 and L2/R2 coordination neurons. The connectivity between each neuron follows Figure 1C. The synaptic weights are 7 and the percentage of dissipation is 2.65.
The mean period, duty cycle, and variations in spike frequency depending on their position in the burst were measured to quantify the overlap of bursting activity (Table 2). The mean period of this digital implementation was similar to biological values. Note that the segmental oscillator exhibited less variation than the elemental system, thanks to its coordination neurons.
Also, in general, the spike frequency of our implementation was similar to that of the biological system. Due to our synapse
model, the frequency reached a maximum in each spike burst but remained on a plateau instead of decreasing to the minimum frequency immediately. In the biological system, the behavior described is due to the enhancement and attenuation of variations in conductance. However, the IZH model does not include conductance, so it cannot be as biomimetic as the HH model. This highlights a weak point of the implementation presented here, but even the HH model of Hill et al. (2001) was unable to mimic this biological variation in spike frequency in a single burst. One discrepancy between the model and the biological system is that the initial and final spike frequencies of a burst were consistently lower in the biological system. In both implementations, the most inconvenient drawback was the variation in the duty cycle, explained by the stability of the IZH model. One perspective of this work to ensure stability is described in the discussion section.
These experiments validated the implementation of our elemental and segmental oscillators. This table also confirms that designing a biomimetic system was a good choice. Indeed, the variations of the duty cycle and the period of the bursting activity could not be reproduced by bio-inspired oscillators. The next step was to identify one parameter that would modify the bursting activity period, which would be useful in closed-loop applications.

Table 2 | Comparison of burst characteristics in the two digital implementations and the biological system.

                            Biological system       Elemental oscillator    Segmental oscillator
                            (Hill et al., 2001)     (digital)               (digital)
Mean period                 10-12 s                 12.6 ± 1.4 s            11.2 ± 1 s
Mean duty cycle             57.2 ± 2.9%             54.7 ± 6%               46.1 ± 6%
Mean spike frequency        11.9 ± 2.1 Hz           12.1 ± 1 Hz             11.2 ± 1 Hz
Initial spike frequency     4.3 ± 0.7 Hz            8.5 ± 0.2 Hz            8.6 ± 0.4 Hz
Peak spike frequency        17.5 ± 3.2 Hz           13 ± 0 Hz               12.5 ± 0 Hz
Final spike frequency       5.8 ± 1.0 Hz            8.1 ± 0.2 Hz            9.3 ± 3 Hz

VARIATION IN THE MEAN PERIOD DEPENDING ON ONE PARAMETER
A CPG is defined here by the number of neurons and the type of synapses involved, the static currents of each neuron, the percentage dissipation, and the synaptic efficiency time constant.
Changing the synaptic efficiency time constant τreg modifies the period of each spike burst (Table 3). The variation in τreg affects the period and duration of each burst, as well as the duty cycle and the variability of these parameters: the greater the value of 1/τreg, the longer the mean period of bursting activity.
The possibility of modifying the period using a single parameter is very useful and was applied in a closed-loop hybrid experiment concerning locomotion behaviors.

Table 3 | Variation in the mean period depending on the τreg parameter.

1/τreg (ms-1)    Mean period (s)
0.09             4.4 ± 1.6
0.15             7.2 ± 1.2
0.20             11.2 ± 1
0.22             12.9 ± 1.1

FPGA RESOURCES
Originally, a CPG consisted of 8 neurons and 12 synapses, but 2 additional synapses per CPG were required to create a network of CPGs, by connecting one CPG to another. Thus, each CPG consisted of 8 neurons and 14 synapses. In terms of cycles and available memory, this implementation was capable of running 240 CPGs on a Spartan 6 digital board [see equation (15) and Table 4]. The power consumption of one CPG is 8 mW, and for CPGs it is 20 mW. We could reduce it in the future by designing a custom ASIC. For neuroprosthesis applications, the power consumption should be lower than the 80 mW/cm² chronic heat dissipation level considered to prevent tissue damage (Zumsteg et al., 2005).

Table 4 | Resources required for one CPG on a Spartan 6 digital board.

Resources       Total available    Used for one CPG    Used for 240 CPGs
Slice FFs       184,304            1,093 (0.6%)        1,459 (0.8%)
Slice LUTs      92,152             1,037 (1.2%)        1,756 (1.9%)
DSP48A1         180                10 (5.6%)           10 (5.6%)
RAMB16BWER      268                1                   42 (756 kb)
Total RAM       4,824 kb           9 kb (0.2%)         765 kb (16%)

COMPARISON WITH EX-VIVO RAT SPINAL CORD RESULTS USING PHARMACOLOGICAL STIMULATION
The final validation of this system consisted of comparing the CPG output with ex vivo physiological data obtained from the spinal cord of newborn rat [postnatal day (P)12]. Bursting locomotor-like activity was induced by bath-application of aCSF (artificial cerebrospinal fluid) mixed with N-methyl-DL-aspartate (NMA; 10 µM), serotonin (5HT; 5 µM), and dopamine (DA; 50 µM) (all purchased from Sigma-Aldrich, France).
For the elemental oscillator, neuron N1 (corresponding to neuron L3 in Figure 1B) is connected to neuron N2 (corresponding to neuron R3 in Figure 1B) by an inhibitory synapse with a synaptic weight of 7 and a percentage dissipation of 12%. The Ibias current is equal to 7 for both neurons.
Figure 7 shows that the digital system fits the biological recordings of the newborn rat spinal cord. The period and duty cycle of the bursting activity are the same, confirming that the digital system was suitable for hybrid experiments. Instead of using pharmacological stimulation, the digital board will be used in the near future to create a hybrid experiment involving the ex vivo spinal cord and the digital CPGs. A closed-loop is also possible
thanks to the possibility of changing the mean period of bursting activity by modifying a single parameter (τreg).

FIGURE 7 | Comparison of pharmacological in-vitro spinal cord with digital CPG.

DISCUSSION
One key step in designing a neuroprosthesis is to produce a large, resource-frugal biomimetic SNN. A biologically realistic CPG (i.e., the leech heartbeat system neural network) was implemented with a minimum resource cost in terms of neuron model, while maintaining its biomimetic activity, as shown in the Results. The first step was to model the biological leech heartbeat system using a single, segmental CPG. The next stage was to choose an efficient neuron model that required few resources for its digital implementation but remained biorealistic enough to match the behavior of biological cells. The topology and hardware implementation of a single neuron were then extended to form a neuron computation core built into a large-scale neural network: 240 CPGs on a Spartan 6 FPGA board. Furthermore, the new synaptic model proposed reproduced the activity-dependent depression phenomenon, which had only previously been described in the biology literature. The architecture of the entire real-time system was described in detail. Finally, the system was validated by several experiments comparing both elemental and segmental oscillators with biological data, and comparing the segmental oscillator with an ex vivo rat spinal cord stimulated by pharmacological solutions.
The short-term prospect of this work is to improve the stability of the system using another neuron model. Currently our work is focused on the quartic model (Touboul, 2009), which is more stable than the Izhikevich one and also requires few resources. As described in Table 2, this system is subject to variations in duty cycle and mean period, likely to be reduced by using the new model. However, these variations also exist in biology, so it is necessary to study the actual effect of these variations in the biological system to determine whether they should be eliminated or not.
In the medium term, this system will be included in a hybrid experiment using an ex vivo rat spinal cord. The experiment board includes several modules, including an MEA (Micro-Electrode Array) and spike detection block, to detect and record neural activity in the spinal cord. All these modules, together with the CPG network, will be implemented in the same FPGA. Our neurophysiologist colleagues will identify the best spinal cord sites to stimulate and record bursting activity. These sites will be hybridized to the output of the artificial CPG described in this paper and, in turn, its activity will drive the various ventral root outputs of the spinal cord into full locomotor-like activity. These future experiments aim to demonstrate that hybrid artificial/biological networks provide possible solutions for rehabilitating lost central nervous system function.
Our CPG network could also be used to study the locomotion of different animals. Indeed, according to Ijspeert (2001), the locomotion activity of a salamander requires 40 CPGs, so the 240 CPGs implemented on the Spartan 6 digital board would be suitable for studying more complex locomotion. Our system will be used in a closed-loop system with different sensors and actuators.

ACKNOWLEDGMENTS
This work is supported by the European Union's Seventh Framework Programme (ICT-FET FP7/2007-2013, FET Young Explorers scheme) under grant agreement n° 284772 BRAINBOW (www.brainbowproject.eu).

REFERENCES
Amari, S. (1972). Characteristic of the random nets of analog neuron-like elements. IEEE Trans. Syst. Man Cybern. 2, 643-657. doi: 10.1109/TSMC.1972.4309193
Angstadt, J. D., and Calabrese, R. L. (1991). Calcium currents and graded synaptic transmission between heart interneurons of the leech. J. Neurosci. 11, 746-759.
Barron-Zambrano, J. H., and Torres-Huitzil, C. (2013). FPGA implementation of a configurable neuromorphic CPG-based locomotion controller. Neural Netw. 45, 50-61. doi: 10.1016/j.neunet.2013.04.005
Barron-Zambrano, J. H., Torres-Huitzil, C., and Girau, B. (2010). FPGA-based circuit for central pattern generator in quadruped locomotion. Aust. J. Intell. Inform. Process. Syst. 12, 24-29.
Ben-Ari, Y., Khazipov, R., Leinekugel, X., Caillard, O., and Gaiarsa, J. L. (1997). GABAA, NMDA and AMPA receptors: a developmentally regulated ménage à trois. Trends Neurosci. 20, 523-529. doi: 10.1016/S0166-2236(97)01147-8
Brainbow. (2012). Brainbow Project. European Union's Seventh Framework Programme (ICT-FET FP7/2007-2013, FET Young Explorers scheme) under grant agreement n° 284772. Available online at: www.brainbowproject.eu
Brown, T. (1914). On the nature of the fundamental activity of the nervous centres; together with an analysis of the conditioning of rhythmic activity in progression, and a theory of the evolution of function in the nervous system. J. Physiol. 48, 18-46.
Calabrese, R. L. (1995). Half-center oscillators underlying rhythmic movements, in The Handbook of Brain Theory and Neural Networks, ed M. A. Arbib (Cambridge, MA: MIT Press), 444-447.
Calabrese, R. L., Angstadt, J., and Arbas, E. (1989). A neural oscillator based on reciprocal inhibition. Perspect. Neural Syst. Behav. 10, 33-50.
Cassidy, A., and Andreou, A. G. (2008). Dynamical digital silicon neurons, in IEEE Biomedical Circuits and Systems Conference, BioCAS 2008, 289-292. doi: 10.1109/BIOCAS.2008.4696931
Cohen, A. H., Ermentrout, G. B., Kiemel, T., Kopel, N., Sigvardt, K. A., and Williams, T. L. (1992). Modelling of intersegmental coordination in the lamprey central pattern generator for locomotion. Trends Neurosci. 15, 434-438. doi: 10.1016/0166-2236(92)90006-T
Cymbalyuk, G. S., Gaudry, Q., Masino, M. A., and Calabrese, R. L. (2002). Bursting Zaghloul, J. Meador, and R. Newcomb (Boston: Kluwer Academic Publishers),
in leech heart interneurons: cell-autonomous and network-based mechanisms. 199247.
J. Neurosci. 22, 1058010592. Marder, E., and Bucher, D. (2001). Central pattern generators and the con-
De Schutter, E. (ed.) (2000). Computational Neuroscience: Realistic Modeling for trol of rhythmic movements. Curr. Biol. 11, 986996. doi: 10.1016/S0960-
Experimentalists. Boca Raton, FL: CRC Press. doi: 10.1201/9781420039290 9822(01)00581-4
Garofalo, M., Nieus, T., Massobrio, P., and Martinoia, S. (2009). Evaluation of Marom, S., and Shahaf, G. (2002) Development, learning and memory in large
the performance of information theory-based methods and cross-correlation random networks of cortical neurons: lessons beyond anatomy. Q. Rev. Biophys.
to estimate the functional connectivity in cortical networks. PLoS ONE 4:e6482. 35: 6387. doi: 10.1017/S0033583501003742
doi: 10.1371/journal.pone.0006482 Matsuoka, K. (1987). Mechanism of frequency and pattern control in the neural
Grassia, F., Buhry, L., Levi, T., Tomas, J., Destexhe, A., and Saighi, S. (2011). Tunable rhythm generators. Biol. Cybern. 56, 345353. doi: 10.1007/BF00319514
neuromimetic integrated system for emulating cortical neuron models. Front. Nadim, F., Olsen, O. H., De Schutter, E., and Calabrese, R. L. (1995). Modeling
Neurosci. 5:134. doi: 10.3389/fnins.2011.00134 the leech heartbeat elemental oscillator. J. Comput. Neurosci. 2, 215235. doi:
Guerrero-Riberas, R., Morrison, A., Diesmann, M., and Pearce, T. (2006). 10.1007/BF00961435
Programmable logic construction kits for hyper-real-time neuronal modeling. Nakada, K. (2003). An analog cmos central pattern generator for interlimb coordi-
Neural Comput. 18, 26512679. doi: 10.1162/neco.2006.18.11.2651 nation in quadruped locomotion. IEEE Tran. Neural Netw. 14, 13561365. doi:
Hill, A. A., Lu, J., Masino, M. A., Olsen, O. H., and Calabrese, R. L. (2001). A model 10.1109/TNN.2003.816381
of a segmental oscillator in the leech heartbeat neuronal network. J. Comput. Nicolelis, M. A. L., and Lebedev, M. A. (2009). Principles of neural ensemble physi-
Neurosci. 10, 281302. doi: 10.1023/A:1011216131638 ology underlying the operation of brain-machine interfaces. Nat. Rev. Neurosci.
Hochberg, L. R., Bacher, D., Jarosiewicz, B., Masse, N. Y., Simeral, J. D., 10, 530540. doi: 10.1038/nrn2653
Vogel, J., et al. (2012). Reach and grasp by people with tetraplegia using Norris, B., Weaver, A., Wenning, A., Garcia, P., and Calabrese, R. L. (2007). A cen-
a neurally controlled robotic arm. Nature 485, 372375. doi: 10.1038/ tral pattern generator producing alternative outputs: phase relations of leech
nature11076 hear motor neurons with respect of premotor synaptic input. J. Neurophysiol.
Hochberg, L. R., Serruya, M. D., Friehs, G. M., Mukand, J. A., Saleh, M., Caplan, A. 98, 29832991. doi: 10.1152/jn.00407.2007
H., et al. (2006). Neuronal ensemble control of prosthetic devices by a human Olsen, O. H., and Calabrese, R. L. (1996). Activation of intrinsic and synaptic
with tetraplegia. Nature 442, 164171. doi: 10.1038/nature04970 currents in leech heart interneurons by realistic waveforms. J. Neurosci. 16,
Hodgkin, A. L., and Huxley, A. F. (1952). A quantitative description of membrane 49584970.
current and its applications to conduction and excitation in nerve. J. Physiol. Olsen, O. H., Nadim, F., and Calabrese, R. L. (1995). Modeling the leech heartbeat
117, 500544. elemental oscillator: II. Exploring the parameter space. J. Comput. Neurosci. 2,
Hooper, S. (2000). Central pattern generators. Curr. Biol. 10, 176177. doi: 10.1016/ 237257. doi: 10.1007/BF00961436
S0960-9822(00)00367-5 Olypher, A., Cymbalyuk, G., and Calabrese, R. L. (2006). Hybrid systems analysis of
Ijspeert, A. (2001). A connectionist central pattern generator for the aquatic and the control of burst duration by low-voltage-activated calcium current in Leech
terrestrial gaits of a simulated salamander. J. Biol. Cybern. 84, 331348. doi: heart interneurons. J. Neurophysiol. 96, 28572867. doi: 10.1152/jn.00582.2006
10.1007/s004220000211 Rice, K. L., Bhuiyan, P. A., Taha, T. M., Vutsinas, C. N., and Smith, M. C. (2009).
Ijspeert, A. (2008). Central pattern generators for locomotion control in animals FPGA Implementation of Izhikevich Spiking Neural Network for Character
and robots: a review. J. Neural Netw. 21, 642653. doi: 10.1016/j.neunet.2008. Recognition, in International Conference on Reconfigurable Computing and
03.014 FPGAs, (Cancun), 451456.
Ijspeert, A., Crespi, A., Ryczko, D., and Cabelguen, J. (2007). From swimming to Serrano-Gotarredona, R., Oster, M., Lichtsteiner, P., Linares-Barranco, A., Paz-
walking with a salamander robot driven by a spinal cord model. Science 315, Vicente, R., Gmez-Rodrguez, F., et al. (2009). CAVIAR: A 45k-Neuron,
14161420. doi: 10.1126/science.1138353 5M-Synapse, 12G-connects/sec AER Hardware Sensory-Processing-Learning-
Indiveri, G. (2007). Synaptic plasticity and spike-based computation in VLSI Actuating System for High Speed Visual Object Recognition and Tracking. IEEE
networks of integrate-and-fire neurons. Neural Inform. Process. Lett. Rev. 11, Trans. Neural Netw. 20, 14171438. doi: 10.1109/TNN.2009.2023653
135146. Simoni, M., and DeWeerth, S. (2007). Sensory feedback in a half-center oscil-
Indiveri, G., Linares-Barranco, B., Hamilton, T., Van Schaik, A., Etienne- lator model. IEEE Trans. Biomed. Eng. 54, 193204. doi: 10.1109/TBME.
Cummings, R., and Delbruck, T. (2011). Neuromorphic silicon neuron circuits. 2006.886868
Front. Neurosci. 5:73. doi: 10.3389/fnins.2011.00073 Simoni, M., Cymbalyuk, G., Sorensen, M., R. Calabrese, R. L., and DeWeerth, S.
Izhikevich, E. M. (2003). Simple model of spiking neurons. IEEE Trans. Neural (2004). A multi-conductance silicon neuron with biologically matched conduc-
Netw. 14, 15691572. doi: 10.1109/TNN.2003.820440 tances. IEEE Trans. Biomed. Eng. 51, 342354. doi: 10.1109/TBME.2003.820390
Izhikevich, E. M. (2004). Which model to use for cortical spiking neurons. IEEE Sorensen, M., DeWeerth, S., Cymbalyuk, G., and Calabrese, R. L. (2004). Using a
Trans. Neural Netw. 15, 10631070. doi: 10.1109/TNN.2004.832719 hybrid neural system to reveal regulation of neuronal network activity by an
Jezzini, S., Hill, A. A., Kuzyk, P., and Calabrese, R. L. (2004). Detailed model of intrinsic current. J. Neurosci. 24, 54275438. doi: 10.1523/JNEUROSCI.4449-
intersegmental coordination in the timing network of the leech heartbeat cen- 03.2004
tral pattern generator. J. Neurophysiol. 91, 958977. doi: 10.1152/jn.00656.2003 Still, S., and Tilden, M. W. (1998). Controller for a four legged walking machine,
Jung, R., Brauer, E. J., and Abbas, J. J. (2001) Real-time interaction between a neuro- in Neuromorphic Systems: Engineering Silicon from Neurobiology, eds L. S. Smith
morphic electronic circuit and the spinal cord. IEEE Trans. Neural Syst. Rehabil. and A. Hamilton, (World Scientific Publishing Co Pte Ltd), 138148 doi:
Eng. 9, 319326. doi: 10.1109/7333.948461 10.1142/9789812816535_0012
Krahl, B., and Zerbst-Boroffka, I. (1983). Blood pressure in the leech. J. Exp. Biol. Still, S., Hepp, K., and Douglas, R. J. (2006). Neuromorphic walking gait control.
107, 163168. IEEE Trans. Neural Netw. 17, 496508. doi: 10.1109/TNN.2005.863454
Le Masson, G., Renaud-Le Masson, S., Debay, D., and Bal, T. (2002). Feedback inhi- Tabak, J., Senn, W., ODonovan, M., and Rinzel, J. (2000). Modeling of sponta-
bition controls spike transfer in hybrid thalamic circuits. Nature 417, 854858. neous activity in developing spinal cord using activity-dependent depression in
doi: 10.1038/nature00825 an excitatory network. J. Neurosci. 20, 30413056.
Lee, Y. J., Lee, J., Kim, K. K., Kim, Y. B., and Ayers, J. (2007). Low power cmos Torres-Huitzil, C., and Girau, B. (2008). Implementation of central pattern gen-
electronic central pattern generator design for a biomimetic underwater robot. erator in an FPGA-based embedded system. 18th International Conference
Neurocomputing 71, 284296. doi: 10.1016/j.neucom.2006.12.013 on Artificial Neural Networks, Vol. 5164, 179187. doi:10.1007/978-3-540-
Lewis, M. A., Hartmann, M. J., Etienne-Cummings, R., and Cohen, A. H. (2001). 87559-8_19
Control of a robot leg with an adaptive VLSI CPG chip. Neurocomputing 3840, Touboul, J. (2009). Importance of the cutoff value in the quadratic adaptive
14091421. doi: 10.1016/S0925-2312(01)00506-9 integrate-and-fire model. Neural Comput. 21, 21142122. doi: 10.1162/neco.
Linares-Barranco, B., Snchez-Sinencio, E., Rodrguez-Vzquez, A., and Huertas, 2009.09-08-853
J. L. (1993). CMOS Analog Neural Network Systems Based on Oscillatory Van Der Pol, B. (1928). The heartbeat considered as a relaxation oscillation, and an
Neurons, in Silicon Implementation of Pulse Coded Neural Networks, eds M. electrical model of the heart. Philos. Mag. 6, 763775.

Vogelstein, R. J., Tenore, F., Etienne-Cummings, R., Lewis, M. A., and Cohen, A. H. Received: 06 August 2013; accepted: 29 October 2013; published online: 21 November
(2006). Dynamic control of the central pattern generator for locomotion. Biol. 2013.
Cybern. 95, 555566. doi: 10.1007/s00422-006-0119-z Citation: Ambroise M, Levi T, Joucla S, Yvert B and Saghi S (2013) Real-time
Wijekoon, J., and Dudek, P. (2008). Compact silicon neuron circuit with spiking biomimetic Central Pattern Generators in an FPGA for hybrid experiments. Front.
and bursting behavior. Neural Netw. 21, 524534. doi: 10.1016/j.neunet.2007. Neurosci. 7:215. doi: 10.3389/fnins.2013.00215
12.037 This article was submitted to Neuromorphic Engineering, a section of the journal
Zumsteg, Z., Kemere, C., ODriscoll, S., Santhanam, G., Ahmed, R. E., Shenoy, K. Frontiers in Neuroscience.
V., et al. (2005). Power feasibility of implantable digital spike sorting circuits for Copyright 2013 Ambroise, Levi, Joucla, Yvert and Saghi. This is an open-
neural prosthetic systems. IEEE Trans. Neural Syst. Rehabil. Eng. 13, 272279. access article distributed under the terms of the Creative Commons Attribution
doi: 10.1109/TNSRE.2005.854307 License (CC BY). The use, distribution or reproduction in other forums is permit-
ted, provided the original author(s) or licensor are credited and that the original
Conflict of Interest Statement: The authors declare that the research was con- publication in this journal is cited, in accordance with accepted academic practice.
ducted in the absence of any commercial or financial relationships that could be No use, distribution or reproduction is permitted which does not comply with these
construed as a potential conflict of interest. terms.



METHODS ARTICLE
published: 22 January 2014
doi: 10.3389/fnins.2013.00276

Dynamic neural fields as a step toward cognitive neuromorphic architectures
Yulia Sandamirskaya*
Chair for Theory of Cognitive Systems, Institute for Neural Computation, Ruhr-University Bochum, Bochum, Germany

Dynamic Field Theory (DFT) is an established framework for modeling embodied cognition. In DFT, elementary cognitive functions such as memory formation, formation of grounded representations, attentional processes, decision making, adaptation, and learning emerge from neuronal dynamics. The basic computational element of this framework is a Dynamic Neural Field (DNF). Under constraints on the time-scale of the dynamics, the DNF is computationally equivalent to a soft winner-take-all (WTA) network, which is considered one of the basic computational units in neuronal processing. Recently, it has been shown how a WTA network may be implemented in neuromorphic hardware, such as an analog Very Large Scale Integration (VLSI) device. This paper leverages the relationship between DFT and soft WTA networks to systematically revise and integrate established DFT mechanisms that have previously been spread among different architectures. In addition, I also identify some novel computational and architectural mechanisms of DFT which may be implemented in neuromorphic VLSI devices using WTA networks as an intermediate computational layer. These specific mechanisms include the stabilization of working memory, the coupling of sensory systems to motor dynamics, intentionality, and autonomous learning. I further demonstrate how all these elements may be integrated into a unified architecture to generate behavior and autonomous learning.

Keywords: dynamic neural fields, cognitive neuromorphic architecture, soft winner-take-all, autonomous learning, neural dynamics

Edited by: André Van Schaik, The University of Western Sydney, Australia
Reviewed by: Dylan R. Muir, University of Basel, Switzerland; Jonathan Binas, University of Zurich and ETH Zurich, Switzerland
*Correspondence: Yulia Sandamirskaya, Chair for Theory of Cognitive Systems, Institute for Neural Computation, Ruhr-University Bochum, 150, 44780 Bochum, Germany. e-mail: yulia.sandamirskaya@ini.rub.de

1. INTRODUCTION
Organisms, such as animals and humans, are remarkable in their ability to generate behavior in complex and changing environments. Their neural systems solve challenging problems of perception and movement generation in the real world with a flexibility, adaptability, and robustness that surpasses the capabilities of any technical system available today. The question of how biological neural systems cope with the complexity and dynamics of real-world environments and achieve their behavioral goals does not have a simple answer. Processes such as memory formation, attention, adaptation, and learning all play crucial roles in the biological solution to the problem of behavior generation in real-world environments. Understanding how these processes are realized by the neural networks of biological brains is at the core of understanding biological cognition and building cognitive artifacts that successfully contend with real-world constraints.
The field of neuromorphic engineering may contribute to the ambitious goal of understanding these cognitive processes by offering platforms in which neural models may be implemented in hardware using VLSI (Very Large Scale Integration) technology. The analog neuromorphic hardware shares several properties with biological neural networks, such as the presence of inherent noise, the potential mismatch of computing elements, constraints on connectivity, and a limited number of learning mechanisms. Apart from these shared constraints, artificial and biological neural networks also maintain the advantages of pervasive parallel computation, redundant systems to handle sensory and motor noise, and low power consumption. Success in the implementation of cognitive models on neuromorphic hardware may lead to breakthroughs both in understanding the neural basis of human cognition and in the development of performant technical systems (robots) acting in real-world environments.
VLSI technology allows one to implement large neural networks in hardware by configuring the VLSI device to simulate the dynamics and connectivity of a network of spiking neurons. Such networks may be efficiently configured, connected to sensors and motors, and operate in real time (Mead and Ismail, 1989; Indiveri et al., 2009, 2011). However, a challenging question remains: how to develop these neuromorphic systems beyond simple feed-forward reactive architectures toward architectures capable of cognitive behavior?
Soft winner-take-all (WTA) connectivity has been recently proposed as an important milestone on the way toward such functional cognitive neuromorphic systems (Indiveri et al., 2009; Rutishauser and Douglas, 2009). Soft WTA networks are computational elements that are hypothesized to play a central role in cortical processing (Douglas and Martin, 2004; Rutishauser and Douglas, 2009). Recently, a wide variety of WTA networks of spiking neurons have been implemented in hardware (Indiveri et al., 2001; Abrahamsen et al., 2004; Oster and Liu, 2004; Indiveri et al., 2009). These initial architectures have made use of WTA connectivity to enable the effective processing of sensory information (Liu and Delbruck, 2010) and the implementation of finite state machines (Neftci et al., 2013). Soft WTAs introduce
a cognitive layer to the neuromorphic hardware systems, which enables reliable processing on unreliable elements (Neftci et al., 2013). The WTA networks contribute to making neuromorphic systems more cognitive, because they stabilize localized attractor patterns in neural networks. These stable attractors organize the dynamics of the neural system in a macroscopical way and enable the coupling of the network to sensors and motors despite noise, fluctuations, and neural mismatch. WTA connectivity therefore introduces macroscopic neural dynamic states which may persist long enough to interact with other parts of the neural-dynamic architecture, thus moving neuromorphic systems beyond mere reactive behavior.
However, there are still open questions on the way toward cognitive processing with hardware WTAs. The first question concerns representational power: How can we add contents to the state in a WTA network and link this network state to perceptual or motor variables? How can the system represent associations and concepts such as "a red ball on the table" or "a hand moving toward an object" in this framework? The second line of open questions concerns movement generation and the motor behavior: How should the system represent and control movements in this framework? How should it decide when to initiate or terminate a movement? Finally, questions regarding learning also arise: How may a system learn the WTA connectivity of its neural network? How may the system learn the connections between WTA networks in a complex architecture? Such questions are often addressed in the fields of psychophysics, cognitive science, and artificial intelligence, but the proposed models and solutions are often not compatible with neural implementations.
Here, I propose that Dynamic Field Theory (DFT) is a framework which may make such cognitive models feasible for neuromorphic implementation because it formulates the principles of cognitive representations and processes in a language compatible with neuromorphic soft WTA architectures. Identifying the computational and architectural principles underlying these cognitive models may facilitate the development of large-scale neuromorphic cognitive systems.
DFT is a mathematical and conceptual framework which was developed to model embodied human cognition (Schoner, 2008). DFT is an established framework in modeling many aspects of human cognition and development, including visual and spatial working memory, object and scene representation, sequence generation, and spatial language (Johnson et al., 2008). DFT cognitive models have been used to control robots and demonstrate that the developed architectures can function autonomously in the real world (Erlhagen and Bicho, 2006; Sandamirskaya et al., 2013). DFT builds on Dynamic Neural Fields (DNFs), which, as I will discuss in the Methods section, are analogous to soft WTAs in their dynamics and lateral connectivity within networks (Neftci et al., 2010). Accordingly, their dynamical and structural principles may be applied to the design of neuromorphic WTA architectures.
In this paper, I discuss computational and architectural principles recently developed in DFT that may be applied to WTA neuromorphic networks. These principles can increase the representational power and autonomy of such networks, and thus contribute to the greater scalability and robustness of neuromorphic architectures. In particular, these principles enable the coupling of DNFs of differing dimensionality, the coupling of the architectures to sensors and motors, cognitive control over behavior, and autonomous learning. On a simple exemplar architecture, I demonstrate how these principles enable autonomous behavior and learning in a neural-dynamic system coupled to real-world sensors and motors. I also discuss the possibility of implementing DNF architectures in neuromorphic hardware.

2. MATERIALS AND METHODS
2.1. DYNAMIC NEURAL FIELDS: BASIC DYNAMICS AND INSTABILITIES
A DNF is a mathematical description of the activation dynamics of a neuronal population in response to certain parameters of the agent's behavioral state. The behavioral parameters, such as a perceptual feature, location, or motor control variable, span the dimension(s) over which the DNFs are defined (Schoner, 2008). The dynamics of a DNF may be mathematically formalized as a differential equation, Equations (1-3), which was first analyzed by Amari (1977) and used to model neuronal dynamics on a population level (Wilson and Cowan, 1973; Grossberg, 1988; Ermentrout, 1998).

τ ∂u(x, t)/∂t = −u(x, t) + h + ∫ f(u(x′, t)) ω(x − x′) dx′ + S(x, t),   (1)

ω(x − x′) = cexc exp(−(x − x′)²/(2σexc²)) − cinh exp(−(x − x′)²/(2σinh²)),   (2)

f(u(x, t)) = 1/(1 + exp[−β u(x, t)]).   (3)

In Equation (1), u(x, t) is the activation of the DNF over dimension x, to which the underlying neuronal population is responsive. h is a negative resting level and S(x, t) is an external input driving the DNF. The lateral interactions in DFT are shaped by a symmetrical homogeneous interaction kernel, Equation (2), with a short-range excitation and a long-range inhibition (Ellias and Grossberg, 1975); σexc, σinh, cexc, and cinh are the widths and the amplitudes of the excitatory and the inhibitory parts of the interaction kernel, respectively. The sigmoidal non-linearity, Equation (3), shapes the output of the DNF in such a way that only sufficiently activated field locations contribute to neural interactions; β determines the slope of the sigmoid.
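A minimal numerical sketch of Equations (1-3) on a discretized feature dimension is given below (Euler integration in Python/NumPy). All parameter values are illustrative only and are not taken from a specific DFT architecture.

```python
import numpy as np

def kernel(d, c_exc=1.0, sigma_exc=3.0, c_inh=0.5, sigma_inh=9.0):
    # Eq. (2): local excitation, longer-range inhibition
    return (c_exc * np.exp(-d**2 / (2 * sigma_exc**2))
            - c_inh * np.exp(-d**2 / (2 * sigma_inh**2)))

def sigmoid(u, beta=4.0):
    # Eq. (3): sigmoidal output non-linearity
    return 1.0 / (1.0 + np.exp(-beta * u))

x = np.arange(100)
omega = kernel(x[:, None] - x[None, :])                  # interaction matrix over the grid
h, tau, dt = -5.0, 10.0, 1.0
u = np.full(100, h)                                      # field starts at the resting level
S = 6.0 * np.exp(-(x - 50)**2 / (2 * 4.0**2))            # localized external input

for _ in range(200):                                     # Eq. (1), Euler step
    u += (dt / tau) * (-u + h + omega @ sigmoid(u) + S)

print(u.argmax(), round(u.max(), 2))    # activation is localized around x = 50
```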
An example of how a DNF may be linked to the activity of a neuronal population is shown in Figure 1: First, each neuron in the population contributes its tuning curve with respect to the behavioral parameter of interest as a (virtual) input to the DNF. The tuning curve is determined as the dependence of the mean firing rate or the action potential of the neuron on the value of the behavioral parameter (Figure 1A). Second, the tuning curves of the neurons in the population are summed, each weighted by the current activation level (e.g., mean firing rate) of the respective neuron. The resulting Distribution of Population Activity [DPA, introduced by Bastian et al. (2003) to derive a DNF description of neuronal data on movement preparation in studies of reaching movements in monkeys] represents the overall activity of the selected neuronal population in response to a given stimulus or state of the behaving neural
system (Figures 1B,C). Finally, the neurons in the population are assumed to be interconnected so that the nearby (in the behavioral space) locations exert excitatory influence on each other, and the far-off locations inhibit each other (on-center, off-surround connectivity; Ellias and Grossberg, 1975). The resulting activation function, u(x, t), is the activation of the DNF. A sigmoidal non-linearity, f(u(x, t)), shapes the output of the DNF, which impacts on the DNF itself through the lateral connections and on the other parts of the neural architecture connected to this DNF.

FIGURE 1 | Illustration of the relationship between neuronal activity and a DNF. (A) Five exemplar neurons (neuronal populations) and their tuning curves in the color dimension. (B) The tuning curves are scaled by the mean firing rate (activation) of the neurons. (C) By summing the scaled tuning curves, the Dynamic Population Activity [DPA, Bastian et al. (2003)] curve in response to a given color stimulus is constructed. (D) The DNF dynamics adds lateral interactions between neurons according to Equation (1). The activation of the DNF is shown as a blue line, the red line shows the output (sigmoided activation) of the DNF, the green line is the DPA [same as in (C)].

The pattern of lateral connectivity of DNFs results in the existence of a localized-bump solution in their dynamics (Figure 1D), which is at the core of the properties of DNFs to exert elementary cognitive functions, discussed further. In the realm of modeling human cognition, activity peaks bridge the low-level, graded sensory-motor representations to categorical, symbol-like representations. The localized (and stabilized, i.e., sustainable over macroscopical time intervals) representation facilitates perception, action generation, and learning.
The connectivity pattern within a DNF also makes it a soft WTA architecture. Indeed, a WTA-connected network may be formalized in terms of two neuronal populations, an excitatory and an inhibitory one (Rutishauser and Douglas, 2009):

τ dxi/dt = −xi + f(Ii + α xi − β1 xN − Ti),   (4)

τ dxN/dt = −xN + f(β2 Σj=1..N xj − TN).   (5)

In Equations (4, 5), the excitatory population of nodes (neurons) xi has an attractor dynamics driven by the external input, Ii, the resting level potential, Ti, the self-excitatory term with strength α, and the inhibitory term with strength β1. The inhibition is shared by all excitatory nodes and is provided by the inhibitory neuron, xN, which also follows an attractor dynamics, driven by activity in the excitatory population and the resting level TN.
In these equations, the excitation constant, α, is analogous to the excitatory part of the interaction kernel of a DNF, cexc in Equation (2), and the strength of the coupling of the inhibitory population onto the excitatory population, β1, corresponds to the inhibitory part of the interaction kernel with the strength cinh. In the DNF equation, the inhibition is coupled into the field's dynamics without delay, which is present in the WTA network of Equations (4, 5).
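A small numerical sketch of the soft-WTA dynamics of Equations (4, 5) is shown below: several excitatory nodes with self-excitation share a single inhibitory node. The threshold-linear output function and all parameter values are assumptions made for this illustration, not taken from a particular hardware configuration.

```python
import numpy as np

def f(a):                                   # threshold-linear output, one common choice
    return np.maximum(a, 0.0)

x = np.zeros(6)                             # 5 excitatory nodes + 1 inhibitory node
I = np.array([1.0, 1.2, 0.9, 1.1, 0.8])     # external inputs to the excitatory nodes
alpha, beta1, beta2 = 1.2, 2.0, 1.0
T_e, T_N = 0.1, 0.1
tau, dt = 10.0, 1.0

for _ in range(500):
    xe, xN = x[:-1], x[-1]
    dxe = -xe + f(I + alpha * xe - beta1 * xN - T_e)     # Eq. (4)
    dxN = -xN + f(beta2 * np.sum(xe) - T_N)              # Eq. (5)
    x[:-1] += (dt / tau) * dxe
    x[-1]  += (dt / tau) * dxN

print(np.round(x[:-1], 2))    # the node receiving the largest input dominates
```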
In several studies on the development of working memory and spatial cognition in infants and toddlers, a more general DNF equation is used, in which a separate inhibitory layer is introduced [e.g., Johnson et al. (2006, 2008)]. A separate inhibitory layer
leads to a delay in the inhibitory interaction among neural field locations, which allows one to model fine-grained effects in the competition among items in working memory depending on the timing of their presentation. The separate inhibitory layer is also used to create a shared inhibition among perceptual and working memory neural fields, which plays a critical role in a change detection process.
When DNF architectures are created to generate behavior in an embodied agent, DFT postulates that only attractor states impact on the behavior of the controlled agent, and thus the dynamics of DNFs is typically tuned to relax as fast as possible to the attractor state. Since this holds for the separate inhibitory layer, the presence of the delay in the inhibitory dynamics is negligible in robotic DNF architectures. For this reason, when DNFs are used to control robots, only single-layer dynamics are used, where inhibition and excitation are integrated in a single equation. Since the WTA dynamics in Equations (4, 5) is a more general formulation than the DNFs discussed in this paper, the equivalence between these two mathematical structures requires a constraint on the timing constant of the inhibitory population, which needs to be faster than the timing constant of the excitatory population, which in its turn is faster than the dynamics of sensor inputs to the field.
The stable localized activity peak solution of the DNF dynamics is the DNF variant of soft-WTA behavior. Intuitively, the short-range excitatory interactions stabilize the peak solution against decay and the long-range inhibitory interactions stabilize peaks against spread by diffusion. The sites of the DNF which have above-zero activity are the winners of the DNF dynamics. The sigmoidal non-linearity increases the stability of the localized peak. The important contribution of DFT to understanding the dynamics of soft WTA networks is the characterization of stable states and instabilities between them, based on the analysis of Equation (1) (Amari, 1977; Schoner, 2008; Sandamirskaya et al., 2013):

The detection instability separates a quiescent state of the DNF from an active state. In the quiescent state, the inputs are not strong enough to collectively drive the DNF over the activation threshold. The DNF produces no output in this state; it is invisible for the down-stream structures driven by the DNF. To the contrary, when inputs are strong enough to drive the field over the activation threshold in one or several locations, an activity peak emerges in the field, which provides input to the down-stream structures, or the motor system.
The DNF's inputs may drive the field over the threshold at several locations. In this case, the field may build several activation peaks or it may select and amplify activity at one location only, depending on the spread of the lateral inhibition. In the latter case, a selection instability separates an inactive state from an activated state of the DNF dynamics.
If the lateral interactions are strong enough, a peak in the DNF may be sustained even if the input which initiated the peak ceases. This working memory instability separates the state of the field with no activation from the state in which an external inhibiting input is needed to deactivate the field.
A negative external input or a decrease of the excitatory input may lead to an extinction of the activity peak. This causes a reverse detection instability, or forgetting instability, which separates an active state from the quiescent state.

The localized-peak stable states and the instabilities between them form the basis for more complex DNF architectures, just as WTA networks form the basis for state-based spiking network architectures. In the following, I present additional components of DFT which may be translated into VLSI WTA networks and enhance their scalability and autonomy.

2.2. COUPLING DYNAMIC NEURAL FIELDS TO SENSORY SYSTEMS
Figure 2 shows a small DNF architecture which exemplifies the coupling structures in DFT: coupling the DNFs to each other, to sensors, and to motors. Here, I will introduce the principles behind these coupling structures, while referring to the figure for a concrete example. Overall, the simple system in Figure 2 performs saliency computations based on color or spatial cues by means of neuronal dynamics (DNF or WTA computation) and will be a building block used in the example presented in Section 3.
In Figure 2, a two-dimensional perceptual color-space DNF receives input from the robotic camera. Camera input to this DNF is constructed in the following way. The raw hue value of every pixel corresponds to the vertical location in the DNF, the location of the pixel on the horizontal axis of the image to the horizontal location in the DNF, and the saturation value of the pixel to the intensity value of the sensory input. Thus, the input to the perceptual DNF is an unsegmented stream of color-space associations. If the input is strong enough to pass the activation threshold and is localized in space, a peak of suprathreshold activity evolves, which represents the perceived object. In Figure 2A, the camera input is not sufficient to activate the perceptual DNF; only subthreshold hills of activity represent the four salient objects in the visual scene. However, when the perceptual DNF receives an additional input, which specifies the color of the target object and which overlaps with one of the subthreshold hills, an activity peak evolves in the perceptual DNF and signals the selection of an object of interest (Figure 2B). The additional input arrives from another (color) DNF, which is coupled to the perceptual DNF, as described in Section 2.3.
Another example of coupling a sensor to the DNF is shown in Figure 3. Here, a neuromorphic embedded Dynamic Vision Sensor [eDVS, Conradt et al. (2009)] drives the perceptual DNF. In the eDVS, each pixel sends an event when it senses luminance changes. Consequently, the sensor naturally detects moving objects. If the object of interest is not moving too fast relative to the motor capabilities of the agent, the perceptual DNF may be used to stabilize the representation of the instantaneous position of the moving object in order to use this position to parametrize the motor action (e.g., to direct the agent's gaze toward the object). If the object is moving too fast for the behaving system, a predictive mechanism needs to be built into the DNF's dynamics (Erlhagen and Bicho, 2006).
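The camera-to-DNF mapping described above (hue to the vertical axis, horizontal pixel position to the horizontal axis, saturation to input strength) can be sketched as follows. The array sizes, the OpenCV-style HSV value ranges, and the normalization are assumptions made for this illustration.

```python
import numpy as np

def camera_to_dnf_input(hsv_frame, n_hue_bins=60, n_x_bins=80):
    """hsv_frame: (H, W, 3) array, hue in [0, 180) and saturation in [0, 255]."""
    height, width, _ = hsv_frame.shape
    S_input = np.zeros((n_hue_bins, n_x_bins))
    hue = hsv_frame[:, :, 0].astype(float)
    sat = hsv_frame[:, :, 1].astype(float) / 255.0
    for row in range(height):
        for col in range(width):
            hue_bin = int(hue[row, col] / 180.0 * n_hue_bins) % n_hue_bins
            x_bin = int(col / width * n_x_bins)
            S_input[hue_bin, x_bin] += sat[row, col]      # saturation sets the intensity
    return S_input / max(S_input.max(), 1e-9)             # rough normalization

frame = np.random.randint(0, 180, size=(120, 160, 3)).astype(np.uint8)
print(camera_to_dnf_input(frame).shape)    # (60, 80) input to the color-space DNF
```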

FIGURE 2 | A small DNF architecture, which consists of a two-dimensional color-space DNF (center), one-dimensional color- and space-DNFs, coupled to the perceptual DNF, a camera input (top), and an attractor motor dynamics (bottom). (A) The camera input alone is not sufficient to activate the perceptual DNF, the system is quiescent and produces neither output nor behavior. (B) A color cue creates an activity peak in the color DNF over the hue value of the named color. This activity peak is projected onto the 2D perceptual DNF as a subthreshold activity ridge, which overlaps with the camera input for the green object. The resulting activity peak in the 2D DNF provides input to the spatial DNF, which, in its turn, sets an attractor for the motor dynamics. The latter drives the motor system of the agent, initiating an overt action.

FIGURE 3 | The neuromorphic Dynamic Vision Sensor [eDVS, Conradt et al. (2009)] on a pan-tilt unit, the output of the eDVS, integrated over a time window of 100 ms, and the instantaneous output of the perceptual DNF. The perceptual DNF enhances the perceptual input in a selected region (which reached the activation threshold first), and inhibits all other locations in the visual array, performing an elementary object segregation operation.

2.3. DYNAMIC NEURAL FIELDS OF HIGHER DIMENSIONALITY AND COUPLINGS
A single DNF describes the activation of a neuronal population which is sensitive to a particular behavioral parameter. The activity of any behaving agent, however, is characterized by many such parameters from different sensory-motor modalities. In DFT, there are two ways to represent such multi-modality of a system: multidimensional DNFs and coupled DNFs.
The multidimensional DNFs are sensitive to combinations of two or several behavioral parameters. The perceptual color-space field in Figure 2 is an example of a two-dimensional DNF, which may be activated by combinations of color and locations in space. Such multidimensional DNFs have typically low dimensionality.
Two DNFs of the same or different dimensionality may be coupled with weighted connections, according to Equation (7) (Zibner et al., 2011).

τ ∂u1(x, t)/∂t = −u1(x, t) + h + ∫ f(u1(x′, t)) ω(x − x′) dx′ + S(x, t),   (6)

τ ∂u2(y, t)/∂t = −u2(y, t) + h + ∫ f(u2(y′, t)) ω(y − y′) dy′ + ∫ W(x, y) f(u1(x, t)) dx.   (7)
which may be activated by combinations of color and locations + W(x, y) f (u1 (x, t)). (7)


Here, u1(x, t) and u2(y, t) are two DNFs, defined over two different behavioral spaces, x and y. The first DNF provides an additive input to the second DNF through the (adaptable) connection weights matrix, W(x, y), which maps the dimensions of the space x onto dimensions of the space y.
For example, the one-dimensional color DNF in Figure 2 represents distributions in the color (hue) dimension. This DNF projects its activation onto the two-dimensional color-space DNF. In particular, since the two DNFs share one dimension (color), the output of the one-dimensional DNF is copied along the not shared dimension (space) of the two-dimensional DNF. This typically results in a ridge-shaped input to the two-dimensional DNF (stemming from the Gaussian shape of the activity peak in the one-dimensional DNF). If this ridge overlaps with a localized subthreshold input in the two-dimensional DNF, an activity peak evolves over the cued (in this case, by color) location (Zibner et al., 2011).
Further, the localized output of the two-dimensional perceptual DNF in Figure 2 is in its turn projected onto a one-dimensional spatial DNF, which represents locations on the horizontal axis of the image plane. This projection may be either a sum or a maximum of the DNF's output in the dimension not shared between the two DNFs (here, color). An example of an adaptive coupling between DNFs of the same dimensionality is presented in Section 2.6.2.
In terms of WTA networks, coupling between two DNFs is equivalent (under the constraints stated in Section 2.1) to two WTA networks, one of which receives output from the other one as an external input, which is mapped through synaptic connections.
behavioral organization in robotics (Richter et al., 2012).
2.4. COUPLING THE DNF TO ATTRACTOR MOTOR DYNAMICS The crucial element that gives a neural architecture the desired
In order to close the behavioral loop, DNF architectures have to autonomy of executive control is based on the principle of inten-
be coupled to the motor system of a behaving agent. The control tionality (Searle, 1983; Sandamirskaya and Schoner, 2010a). In
of motor actions may be expressed mathematically as an attractor practice, this principle amounts to a structural extension of DNFs,
dynamics, where the neural system sets attractors for motor vari- so that every behavioral state of the system has two components
ables, such as position, velocity, or force of the effector. Deviations a representation of an intention, which eventually drives the
from the attractor due to an external or an internal perturba- motor system of the agent, and a representation of the condition-
tion are then actively corrected by the neural controller in the of-satisfaction (CoS), which is activated by the sensory input
motor system. Such motor attractor dynamics have been probed when the action is finished and which inhibits the respective
in control of mobile robots (Bicho and Schoner, 1997) and multi intention. The CoS DNF is biased, or preshaped, by the intention
degrees of freedom actuators (Schaal et al., 2003; Iossifidis and DNF to be sensitive to particular sensory input, characteristics for
Schner, 2004; Reimann et al., 2011), and also used to model the action outcome. This coupling from the intention to the CoS
human motor control (Latash et al., 2007). DNF carries a predictive component of the intentional behav-
In order to couple the DNF dynamics to the attractor dynam- ior, which may be shaped in a learning process (Luciw et al.,
ics for motor control, the space-code representation of the DNF 2013). Together, the intention and the CoS comprise an elemen-
(in terms of locations of activity peaks) has to be mapped onto tary behavior (EB, Richter et al., 2012), which generally has the
the rate-code representation of the motor dynamics (in terms dynamics of Equations (10).

of the value of the control variable). Figure 2 (bottom) and  
uint (x, t) = uint (x, t) + h + f uint (x , t) (x x )dx
Equation (89) show how the space-code of a DNF may be trans-

lated into the rate-code of attractor dynamics through a weighted
projection to the rate-coding neural node. The weights (or gain + S1 (x, t) c1 f (uCoS (y, t))dy, (10)
field, (x)) of this projection may be subject to learning (or 
 
adaptation) (see Section 3). uCoS (y, t) = uCoS (y, t) + h + f uCoS (y , t) (y y )dy

  + S2 (y, t) + c2 W(x, y)f (uint (x, t))
u(x, t) = u(x, t) + h + f u(x , t) (x x )dx + S(x, t), (8)
  Here, uint (x, t) is a DNF which represents possible intentions of

(t) = f (u(x, t))dx + (x)f (u(x, t))dx. (9) the agent. These intentions may be motor or perceptual goals,


which the agent aims to achieve through contact with the environment. For instance, "locate a red object" is a typical perceptual intention, "turn 30 degrees to the left" is an example of a motor intention. x is a perceptual or motor variable, which characterizes the particular intention; S_1(x, t) is an external input which activates the intention. This input may be sensory (condition of initiation) or motivational (task input) (Sandamirskaya et al., 2011). u_CoS(y, t) is the condition-of-satisfaction DNF, which receives a localized input from the intention DNF through a neuronal mapping W(x, y) (as introduced in Section 2.3). This input makes the CoS DNF sensitive to a particular part of the sensory input, S_2(y, t), which is characteristic for the termination conditions of the intended perceptual or motor act. The mapping W(x, y) may be learned (Luciw et al., 2013). When the CoS DNF is activated, it inhibits the intention DNF by shifting its resting level below the threshold of the forgetting instability.
The DNF structure of an elementary behavior (EB) further stabilizes the behavioral state of the neural system. Thus, the intentional state of the system is kept active as long as needed to achieve the behavioral goal. The CoS autonomously detects that the intended action is successfully accomplished and inhibits the intention of the EB. Extinction of the previously stabilized intention gives way to the next EB to be activated. With this dynamics, the exact duration of an upcoming action does not need to be represented in advance (and action durations may vary to a large degree in real-world environments). The intentional state will be kept active until the CoS signals that the motor action has reached its goal. This neural-dynamic mechanism of intentionality enables autonomous activation and deactivation of different modalities of a larger neuronal architecture.
Since the intention and the CoS are interconnected DNFs, their WTA implementation may be achieved as described in Section 2.3.

2.6. LEARNING IN DFT
The following learning mechanisms are available in the DFT framework.

2.6.1. Memory trace of previous activity
The most basic learning mechanism in DFT is the memory trace formation, also called preshape. The memory trace changes the subsequent dynamics of a DNF and thus is considered an elementary form of learning. In neural terms, the memory trace amounts to a local increase in excitability of neurons, which may be counterbalanced with homeostatic processes.
Formally, the preshape is an additional layer over the same dimensions as the associated DNF. The preshape layer receives input from the DNF, which is integrated into the preshape dynamics as an attractor that is approached with a time-constant τ_l / λ_build, Equation (11). This build-up constant is slower than the time-constant of the DNF dynamics. When there is no activity in the DNF, the preshape decays with an even slower time-constant, τ_l / λ_decay in Equation (11).

\tau_l \dot{P}(x, t) = \lambda_{build} \big( -P(x, t) + f(u(x, t)) \big)\, f(u(x, t)) - \lambda_{decay}\, P(x, t) \big( 1 - f(u(x, t)) \big).   (11)

Here, P(x, t) is the strength of the memory trace at site x of the DNF with activity u(x, t) and output f(u(x, t)); λ_build and λ_decay are the rates of build-up and decay of the memory trace. The build-up of the memory trace is active on sites with a high positive output f(u(x, t)), the decay is active on the sites with a low output. The memory trace P(x, t) is an additive input to the DNF dynamics.
The memory trace formation can be used to account for one-shot learning of object categories (Faubel and Schöner, 2009), representation of visual scenes (Zibner et al., 2011), or action sequences (Sandamirskaya and Schoner, 2010b).
In a neuromorphic WTA implementation, the memory trace, or preshape, may be interpreted as the strength of synaptic connections from the DNF (or WTA), u(x, t), to a memory population. This memory population activates the preshape by transmitting its activation through the learned synaptic connections, P(x, t). Learning of the synaptic connections amounts to attractor dynamics [as in the first parenthesis of Equation (11)], in which the pattern of synaptic connections approaches the pattern of the DNF's (WTA's) output. This learning dynamics may also be implemented as a simple Hebbian rule: the synaptic weights which connect active sites of the DNF (WTA) with the memory population are strengthened. Another possible interpretation of the preshape, as a change in the resting levels of individual nodes in the DNF (WTA), is harder to implement in neuromorphic WTA networks.

2.6.2. Learning mappings and associations
When the memory trace dynamics is defined within a structure with a higher dimensionality than the involved DNFs, the preshape dynamics leads to learning of mappings and associations. The dynamics of an associating map is similar to the memory trace dynamics, Equation (12).

\dot{W}(x, y, t) = \eta(t) \big( -W(x, y, t) + f(u_1(x, t)) \otimes f(u_2(y, t)) \big).   (12)

The weights function, W(x, y, t), which couples the DNFs u_1(x, t) and u_2(y, t) in Equation (12), as well as in Equations (4, 5), has an attractor at the intersection between positive outputs of the DNFs u_1 and u_2. The intersection is computed as a sum between the output of u_1, expanded along the dimensions of u_2, and the output of u_2, expanded in the dimensions of u_1, augmented with a sigmoidal threshold function (this neural-dynamic operation is denoted by the ⊗ symbol here). The shunting term η(t) limits learning to time intervals when a rewarding situation is perceived, as exemplified in the architecture in Section 3.
This learning mechanism is equivalent to a (reward-gated) Hebbian learning rule: the sites of the DNFs u_1 and u_2 become coupled more strongly if they happen to be active simultaneously when learning is facilitated by the (rewarding) signal η(t). Through the DNF dynamics, which builds localized activity peaks in the functionally relevant states, the learning dynamics has the properties of the adaptive resonance networks (ART, Carpenter et al., 1991), which emphasize the need for localization of the learning processes in time and in space.
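A minimal numerical sketch of this reward-gated map learning follows. The thresholded-sum implementation of the "intersection" and all parameter values are illustrative choices rather than the exact form used in the architecture of Section 3.

```python
# Sketch of the gated associative learning of Equation (12): the coupling matrix W
# relaxes toward the "intersection" of the two fields' outputs, but only while the
# gating signal is non-zero. Threshold, steepness and rates are illustrative choices.
import numpy as np

sig = lambda z: 1.0 / (1.0 + np.exp(-10.0 * z))

def update_map(W, f_u1, f_u2, gate, dt=1.0, tau=50.0):
    """One Euler step of dW/dt = gate * (-W + intersection(f_u1, f_u2)) / tau."""
    # Sum of the two outputs, each expanded along the other field's dimension,
    # passed through a sigmoidal threshold: high only where both are active.
    intersection = sig(f_u1[:, None] + f_u2[None, :] - 1.5)
    return W + dt * gate * (-W + intersection) / tau

nx, ny = 32, 20
W = 0.01 * np.random.rand(nx, ny)               # weak random initial coupling
f_u1 = np.zeros(nx); f_u1[10:13] = 1.0           # localized peak in field u1
f_u2 = np.zeros(ny); f_u2[5:8] = 1.0             # simultaneously active peak in field u2

for _ in range(200):
    W = update_map(W, f_u1, f_u2, gate=1.0)      # gate stays open while "reward" is perceived
print(round(W[11, 6], 2), round(W[0, 0], 4))     # co-active sites ~1, inactive pairs decay toward 0
```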


2.6.3. Adaptation
Adaptation [Equation (13)] is considered a learning process, which amounts to an unnormalized change of the coupling weights (gains) in a desired direction. A typical example is learning in the transition from the DNF's space-code to the rate-code of motor dynamics.

\dot{\lambda}(x, t) = \eta(t)\, f(u(x, t)), \qquad \eta(t) = \text{error} \times \text{time window}.   (13)

Here, λ(x, t) is a matrix of weights, or gains, defined over the dimension of the DNF, u(x, t), which is coupled to the motor dynamics, as in Equation (9). The gain changes in proportion to the output of the driving DNF, u(x, t), in a learning window, defined by the term η(t). The learning window is non-zero in a short time window when an intended action within the EB, to which the DNF u(x, t) belongs, is finished (the respective u_CoS is active), but activity in the intention DNF is not yet extinguished. The error is determined in a DNF system, which compares the outcome of an action with the intended value of the motor variable and determines the direction of change of the weights in λ(x, t).
Now that all neural-dynamic structures developed within DFT are presented, which may be implemented in hardware neuronal networks through the WTA architecture, I will introduce an exemplar robotic architecture, which integrates these mechanisms in a neural-dynamic system, which generates behavior and learns autonomously.

3. AN EXAMPLE OF AN ADAPTIVE ARCHITECTURE IN DFT
3.1. THE SCENARIO AND SETUP
The simple, but functioning in a closed loop, learning architecture presented in this section employs several of the principles presented above, such as categorization properties of DNFs, coupling between DNFs of different dimensionality, coupling to sensors and motors, autonomous action initiation and termination, as well as learning.
The robot, used to demonstrate the closed-loop behavior of a neuromorphic agent, consists of an eDVS camera and a pan-tilt unit. The eDVS camera has 128x128 event-based pixels, each sending a signal when a luminance change is detected. The pan-tilt unit consists of two servo motors, which take position signals in the range 0–2000 and are controlled to take the corresponding pose with a small latency. The task for this robot is to direct its gaze at a small blinking circle, which is moved around on a computer screen in front of the robot. A successful looking movement leads to the blinking circle settled in the central portion of the robot's camera array.
In order to accomplish this task, the robot, similarly to an animal, needs to detect the target in the visual input and, in particular, estimate and represent its location relative to the center of the field of view of the robot. Next, according to the current location of the target, the system needs to select a motor command, which will bring the target into the center of the field of view. Thus, the system needs to select the desired values for pan and tilt, which will be sent to the servo motors.
This simple task embraces the following fundamental problems. First, the mapping between the target location and the required motor command is a priori unknown. The system needs to calibrate itself autonomously. In particular, the system needs to learn a mapping between the position of the input in the camera array and the motor command, which will bring the target in the center of the visual field. The second fundamental problem revealed in this setting is that when the camera moves, the perceived location of the target on the surface of the sensor changes, and the system needs a mechanism to keep the initial location of the target in memory in order to learn the mapping between the visually perceived locations and the motor commands. The third problem is the autonomy of the looking behavior: the system needs a mechanism to update the target representation after both successful and unsuccessful looking actions.
Figure 4 shows the scheme of the DNF architecture, which demonstrates how all these problems may be addressed in a closed-loop system. Next, I will present the dynamical structures, which constitute the architecture.

FIGURE 4 | The DFT architecture for looking. See main text for details.

3.2. THE NEURAL-DYNAMICS ARCHITECTURE
3.2.1. Perceptual DNF
The perceptual DNF is coupled to the eDVS, as described in Section 2.2, and effectively performs a low-pass filter operation on the camera input in time and in space. This DNF builds peaks of activation at locations where events are concentrated in time and in space in the visual array of the robot.

3.2.2. Visual intention DNF
This DNF builds sustained activity peaks that represent the target locations (Figure 5). The peaks are sustained even if the input, which initiated them, ceases or moves. Thus, even during or after a gaze movement, the representation of the current target is stably represented. This allows, on the one hand, the robust coupling to the motor system (the attractor, set for the motor system, is guaranteed to be kept constant for the time of the movement). On the other hand, this memory system enables learning, since the representation of the previous target is still active when a rewarding input is perceived after a successful gaze.

3.2.3. Motor intention DNF
The visual intention DNF represents the target of the current gaze action in sensory, here visual, coordinates. The movement generation system takes attractors in the motor coordinates, however (here, the desired pan and tilt). The motor intention DNF is


FIGURE 5 | The cascade from the visual input to perceptual DNF to the visual intention (target) DNF segregates and stabilizes the selected region in
the input stream.

defined over the motor coordinates, and an activity peak in this DNF creates an attractor for the motor dynamics and initiates a gaze movement.

3.2.4. Condition of satisfaction node
The CoS DNF in this architecture is a zero-dimensional CoS node, since it monitors directly the state of the motor system, which is characterized by two single-valued variables, pan and tilt. The CoS node is activated when the motor action is accomplished [Equation (14)].

\tau \dot{v}_{cos}(t) = -v_{cos}(t) + h + c_{exc}\, f(v_{cos}(t)) + c \int f\big(u_{mot}(y, t)\big)\,dy + c_a\, f_{diff},   (14)

where v_cos(t) is the activation of the CoS node for either the pan or the tilt movement (the CoS of the overall movement is a thresholded sum of the two CoSs). The CoS node is activated if (1) there is activity in the motor intention DNF, u_mot, and (2) the detector f_diff = f(0.5 - |pan* - pan|) signals that the state variable for the pan or the tilt dynamics reaches the respective attractor, pan*. c and c_a are scaling constants for these two contributions, c_exc is the strength of self-excitation of the CoS node.
The activated CoS node inhibits both the motor and the visual intention DNFs below the activation threshold. The otherwise self-sustained activity peaks in these DNFs cease, which causes the CoS node to lose its activation as well. The intention DNFs are released from inhibition and regain their initial resting levels, allowing the sensory input to induce a stabilized representation of the next target.

3.2.5. The transformation array
The transformation between the visual and the motor coordinates, needed to achieve a particular behavioral goal, e.g., center the target object in the visual field, is a priori unknown. In the DFT architecture presented here, this transformation is represented by a randomly initialized coupling matrix, which implements a potential all-to-all connectivity between the two DNFs. Thus, an active visual intention DNF initially induces a peak at a random location in the motor DNF. The lateral interactions in the motor DNF ensure that a peak may be built, although the connection matrix is random (and sparse) in the beginning of the learning process.
In the transformation array, a learning dynamics is implemented [Equation (12)]. The learning window, η(t), is defined by the activity in the visual match DNF, which signals when the visual input falls onto the central part of the camera array.

3.2.6. The visual match DNF
The visual match DNF receives a preshape in the center of the field when the visual intention DNF is active. This preshaping input is equivalent to an expectation to perceive the target in the visual field, which biases the visual match DNF to be sensitive to the respective sensory input. The connectivity which enables this predicting coupling is assumed to be given here, but could potentially emerge in a developmental process [e.g., similar to Luciw et al. (2013)].

\tau \dot{u}_{match}(x, t) = -u_{match}(x, t) + h + \int f\big(u_{match}(x', t)\big)\,w(x - x')\,dx' + f\big(u_{perc}(x, t)\big) + c_G(x, t) \int f\big(u_{vis}(x, t)\big)\,dx,   (15)

In Equation (15), the visual match DNF, u_match(x, t), is defined over the same (visual, 2D here) coordinates as the perceptual DNF, u_perc, and the visual intention DNF, u_vis, and receives a one-to-one input from the perceptual DNF, as well as a Gaussian-shaped input, c_G(x, t), if there is activity in the visual intention DNF. When the visual match DNF is active, it drives learning in the transformation array, according to Equations (16, 12).

\eta(t) = \int f\big(u_{match}(x, t)\big)\,dx.   (16)

3.3. THE DYNAMICS OF THE ARCHITECTURE
Figure 6 shows the DNF architecture for looking at work. When salient visual input is perceived by the eDVS sensor, one or several activity peaks emerge in the perceptual DNF (Figure 6, left); the most salient of these peaks (i.e., the one that reached the activation threshold first) drives the visual intention DNF (Figure 6, middle) and induces a self-sustained activity peak in this DNF. The peak in the visual intention DNF is sustained even when the camera starts moving and the visual input shifts, representing the instantaneous goal of the upcoming camera movement.
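As a compact illustration of the space-to-rate-code read-out of Equation (9) and of the completion check performed by the CoS node (cf. Equation 14), a numerical sketch could look as follows. The gain field λ(x) = c·x, the detector threshold and all other values are illustrative choices rather than the parameters of the robotic setup.

```python
# Sketch of Equations (9) and (14): the rate-coded motor variable phi relaxes toward an
# attractor set by the peak in the motor DNF, and a simple detector reports when the
# attractor is reached. Gains, threshold and the linear gain field are illustrative.
import numpy as np

n = 64
x = np.arange(n, dtype=float)
lam = 2.0 * x                                   # gain field lambda(x) = c*x (field site -> pan value)
f = lambda u: (u > 0).astype(float)             # hard threshold output, for brevity

u_mot = np.full(n, -1.0); u_mot[40:44] = 1.0    # motor intention DNF with a peak around site 42
phi, dt = 0.0, 0.1                              # pan variable, integration step

for _ in range(500):
    out = f(u_mot)
    phi += dt * (-phi * out.sum() + (lam * out).sum())   # Equation (9)

target = (lam * f(u_mot)).sum() / f(u_mot).sum()         # attractor implied by the peak location
f_diff = 1.0 if abs(target - phi) < 0.5 else 0.0          # completion detector, cf. Equation (14)
print(round(phi, 2), target, f_diff)                      # 83.0 83.0 1.0
```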


FIGURE 6 | The DFT architecture for autonomous looking and learning the sensorimotor map. The robotic camera provides input to the perceptual DNF, which performs initial segregation of object-like regions in the visual stream. The visual intention DNF selects and stabilizes the spatial representation (in visual coordinates) of a single target for the upcoming looking action. Through adaptive weights, the visual intention DNF provides input to the motor intention DNF, which generates attractors for the motor dynamics. Motor dynamics signals completion of the looking act through the CoS node, which inhibits the intention DNFs. If the looking action brings the object into the foveal (central) region of the field of view, the adaptive weights are updated according to the current (decaying) activation in the visual and motor intention DNFs.

The visual intention DNF induces an activity peak in the motor intention DNF through the coupling weights, which are random in the beginning of the learning process. A localized activity peak emerges in the motor intention DNF, formed by the lateral interactions in this field. The motor intention peak sets an attractor for the dynamics of the pan and the tilt control variables, which drive the robotic pan-tilt unit. When the control variables are close to the attractor, the CoS node is activated and inhibits the visual and the motor intention DNFs. Activity in the motor intention DNF ceases in a forgetting instability, which leads the CoS node to lose its activation as well. The inhibitory influence on the intention DNFs is released and the visual intention DNF may build a new activity peak from the perceptual input.
When the camera movement is finished (an event detected by the CoS node), if the input falls onto the central part of the visual array, the visual match DNF is activated and triggers the learning process in the adaptive weights. In particular, the weights are strengthened between the currently active positions in the visual intention DNF and the currently active positions in the motor intention DNF, which correspond to the just-been-active intentions. When the CoS node inhibits the intention DNFs, learning stops and a new gazing action is initiated.
Figure 7 shows the activity of the motor variables during the gaze movements in the learning process and Figure 8 shows the 2D projections of the 4D transformation matrix, learned over several hundred gaze movements to different target locations (Sandamirskaya and Conradt, 2013).

4. DISCUSSION
4.1. GENERAL DISCUSSION
The principles of DFT presented in this paper set a possible roadmap for the development of neuromorphic architectures capable of cognitive behavior. As a modeling framework, DFT is remarkable in its capacity to address issues of embodiment, autonomy, and learning using neural dynamics throughout. In this paper, I have reviewed the DFT mechanisms that provide for the creation of stabilized sensory representations, learned associations, coupled sensory-motor representations, intentionality, and autonomous behavior and learning. In an exemplar architecture, I demonstrated how the computational and architectural principles of DFT come together in a neural-dynamic architecture that coupled a neuromorphic sensor to motors and autonomously generated looking behavior while learning in a closed behavioral loop. The categorization properties of DNFs achieve the stabilization of the visual input against sensory noise, while the memory mechanisms allow the relevant representations to be kept active long enough to parameterize and initiate motor actions and also drive the learning process after a successful movement. Adaptive couplings between DNFs together with a mechanism that enables autonomous activation and deactivation of intentions


FIGURE 7 | Top: Time-course of the activation of the motor variable (velocity of the pan joint) during four steps of the learning procedure. Middle: The value of
the pan variable. Bottom: Activation of the CoS node.

FIGURE 8 | Two exemplar projections of the learned 4D transformation array between the visual and the motor intention DNFs of the agent. (A) Weights strength at the given visual-intention DNF horizontal position as function of motor intention field coordinates (overlayed projections along y_mot). (B) Weights strength at the given visual-intention DNF vertical position.

make for an architecture in which autonomous learning accompanies behavior.
In order to translate the language of behavior-based attractor dynamics of DFT to spiking networks implemented in VLSI, several possibilities have been reported recently. One solution (Neftci et al., 2013) constitutes a method to set parameters of the neuromorphic hardware in relation to parameters of a more abstract WTA layer. By measuring the activity of hardware units, the parameter mappings are calibrated in an automated procedure. Another way to translate DNF dynamics to spiking networks is to use the vector-encoding of a dynamical system in the neural-dynamic framework of Eliasmith (2005). This framework allows one to implement the attractor dynamics of DNFs in terms of a network of spiking units, which in its turn may define the parametrization for a VLSI neuromorphic network.
These powerful tools allow one to translate between levels of description and can be used to implement different models of cognition in order to facilitate the development of behaving, neuromorphic cognitive systems. DFT is one of the frameworks that defines the principles and constraints critical to this goal. There are of course several other frameworks that may be used for this purpose, each with its own advantages and limitations. Thus, the probabilistic framework allows one to use noisy and incomplete sensory information to infer hidden states of the environment and weigh alternative actions, which may bring the agent closer to its goals. Such a Bayesian framework has been applied both in the field of modeling human cognition [e.g., Griffiths et al. (2008)] and in robotics (Thrun et al., 2005). However, this framework has two limitations with respect to modeling human cognition. First, the probabilistic models focus on the functional or behavioral


aspects of cognition and not the neuronal mechanisms underlying cognitive processing. They often require normalization procedures which are not trivial to implement neurally. Second, the probabilistic models often need an external mechanism to make inferences on the probability distributions and do not account for the process of decision making. Thus, the Bayesian architectures may achieve powerful performance and may be used to account for empirical data on human cognition, but they do not provide a process model of cognitive functions or offer a mechanism of how these functions are achieved or realized neurally. On the contrary, in neuronal modeling, the developed architectures are anchored in neuronal data and focus on the mechanisms and processes behind cognition. However, their functional implementations (i.e., embodiment) are typically limited and fail to address important problems such as representational coupling, autonomy, and development. DFT aims at bridging the two approaches to understanding cognitive processing, the functional (behavioral) and the mechanistic (neuronal), and thus naturally fits the goal of providing for a tool to implement neuromorphic cognition. The scaling of DFT toward higher cognitive functions, such as concept representation, language, and complex action sequencing, is currently under way.
This paper aims to reveal the formalized DFT principles and concepts developed in embodied cognition and autonomous robotics in such a way that they may be integrated into the language of spiking neural networks in VLSI hardware through the structure of WTA networks. DNFs may be considered a functional description of soft WTA networks. The successful implementation of soft WTA networks in VLSI devices to date opens the way to employing the architectural elements of DFT in spiking hardware architectures. These structural elements, as summarized here, are (1) coupling between fields of different dimensionality, (2) coupling to sensors through space-coding, (3) coupling to rate-coded motor dynamics, (4) application of principles of autonomy (intentionality), and (5) autonomous neural-dynamic learning. Some of the DFT principles, such as categorization and memory formation, are already probed in VLSI WTA networks, resulting in a framework of state-based computing in spiking networks. In addition, this paper formalizes mechanisms that allow for autonomous transition between stable states through the introduction of elementary behavior structures, namely the intention and the conditions-of-satisfaction. This formalization also enables autonomous learning and the robust coupling of WTAs to each other, to sensors, and to motor dynamics.
The DFT approach considers cognitive systems from a behavioral perspective while neuromorphic hardware system development aims at understanding the neuronal mechanisms underlying cognition. The fact that these two approaches converge to a mathematically equivalent object, a DNF or a soft WTA, as an elementary computational unit in the development of cognitive neuromorphic systems is a strong argument for the fundamental character of this computational element. Here, I aimed at establishing a common ground for future collaborative projects that can facilitate progress in both fields. The VLSI networks could scale up to produce cognitive autonomous behavior and the DFT framework could gain access to a neural implementation which is not only more efficient and biologically grounded, but also open to empirical links between the behavioral and neuronal dynamics. Bringing principles of DFT onto VLSI chips will, on the one hand, allow one to model human cognition and make predictions under both neuronal and behavioral constraints. On the other hand, the cooperation between the two fields could foster the development of powerful technical cognitive systems based on a parallel, low-power implementation with VLSI.

ACKNOWLEDGMENTS
The author gratefully acknowledges support from the organizers of the Telluride Cognitive Neuromorphic Engineering workshop 2012 and the Capo Caccia Neuromorphic Cognition 2013 workshop, as well as of Prof. J. Conradt for providing the hardware setup.

FUNDING
The project was funded by the DFG SPP Autonomous Learning within the Priority Program 1527.

REFERENCES
Abrahamsen, J. P., Hafliger, P., and Lande, T. S. (2004). "A time domain winner-take-all network of integrate-and-fire neurons," in Proceedings of the 2004 IEEE International Symposium on Circuits and Systems (ISCAS '04), Vol. 5 (Vancouver), V-361.
Amari, S. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biol. Cyber. 27, 77–87. doi: 10.1007/BF00337259
Bastian, A., Schoner, G., and Riehle, A. (2003). Preshaping and continuous evolution of motor cortical representations during movement preparation. Eur. J. Neurosci. 18, 2047–2058. doi: 10.1046/j.1460-9568.2003.02906.x
Bicho, E., Mallet, P., and Schöner, G. (2000). Target representation on an autonomous vehicle with low-level sensors. I. J. Robotic Res. 19, 424–447. doi: 10.1177/02783640022066950
Bicho, E., and Schoner, G. (1997). The dynamic approach to autonomous robotics demonstrated on a low-level vehicle platform. Robot. Auton. Syst. 21, 23–35. doi: 10.1016/S0921-8890(97)00004-3
Buss, A. T., and Spencer, J. P. (2012). When seeing is knowing: the role of visual cues in the dissociation between children's rule knowledge and rule use. J. Exp. Child Psychol. 111, 561–569. doi: 10.1016/j.jecp.2011.11.005
Carpenter, G. A., Grossberg, S., and Rosen, D. B. (1991). Art 2-a: an adaptive resonance algorithm for rapid category learning and recognition. Neural Netw. 4, 493–504. doi: 10.1016/0893-6080(91)90045-7
Conradt, J., Berner, R., Cook, M., and Delbruck, T. (2009). "An embedded aer dynamic vision sensor for low-latency pole balancing," in IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops) (Kyoto), 780–785.
Douglas, R. J., and Martin, K. A. C. (2004). Neural circuits of the neocortex. Ann. Rev. Neurosci. 27, 419–451. doi: 10.1146/annurev.neuro.27.070203.144152
Eliasmith, C. (2005). A unified approach to building and controlling spiking attractor networks. Neural Comput. 7, 1276–1314. doi: 10.1162/0899766053630332
Ellias, S. A., and Grossberg, S. (1975). Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks. Biol. Cyber. 20, 69–98. doi: 10.1007/BF00327046
Erlhagen, W., and Bicho, E. (2006). The dynamic neural field approach to cognitive robotics. J. Neural Eng. 3, R36–R54. doi: 10.1088/1741-2560/3/3/R02
Ermentrout, B. (1998). Neural networks as spatio-temporal pattern-forming systems. Rep. Prog. Phys. 61, 353–430. doi: 10.1088/0034-4885/61/4/002
Faubel, C., and Schöner, G. (2009). "A neuro-dynamic architecture for one shot learning of objects that uses both bottom-up recognition and top-down prediction," in Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (St. Louis, MO: IEEE Press).
Griffiths, T. L., Kemp, C., and Tenenbaum, J. B. (2008). "Bayesian models of cognition," in Cambridge Handbook of Computational Cognitive Modeling (Cambridge: Cambridge University Press), 59–100.

Grossberg, S. (1988). Nonlinear neural networks: principles, mechanisms, and architectures. Neural Netw. 1, 17–61. doi: 10.1016/0893-6080(88)90021-4
Indiveri, G., Chicca, E., and Douglas, R. J. (2009). Artificial cognitive systems: from vlsi networks of spiking neurons to neuromorphic cognition. Cogn. Comput. 1, 119–127. doi: 10.1007/s12559-008-9003-6
Indiveri, G., Linares-Barranco, B., Hamilton, T. J., van Schaik, A., Etienne-Cummings, R., Delbruck, T., et al. (2011). Neuromorphic silicon neuron circuits. Front. Neurosci. 5:73. doi: 10.3389/fnins.2011.00073
Indiveri, G., Murer, R., and Kramer, J. (2001). Active vision using an analog vlsi model of selective attention. IEEE Trans. Cir. Syst. II 48, 492–500. doi: 10.1109/82.938359
Iossifidis, I., and Schöner, G. (2004). "Autonomous reaching and obstacle avoidance with the anthropomorphic arm of a robotic assistant using the attractor dynamics approach," in Proceedings of the IEEE 2004 International Conference on Robotics and Automation (New Orleans, LA).
Johnson, J. S., Spencer, J. P., and Schöner, G. (2006). "A dynamic neural field theory of multi-item visual working memory and change detection," in Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci 2006) (Vancouver, BC), 399–404.
Johnson, J. S., Spencer, J. P., and Schoner, G. (2008). Moving to higher ground: the dynamic field theory and the dynamics of visual cognition. New Ideas Psychol. 26, 227–251. doi: 10.1016/j.newideapsych.2007.07.007
Latash, M. L., Scholz, J. P., and Schoner, G. (2007). Toward a new theory of motor synergies. Motor Control 11, 276–308.
Liu, S. C., and Delbruck, T. (2010). Neuromorphic sensory systems. Curr. Opin. Neurobiol. 20, 288–295. doi: 10.1016/j.conb.2010.03.007
Luciw, M., Kazerounian, S., Lakhmann, K., Richter, M., and Sandamirskaya, Y. (2013). "Learning the perceptual conditions of satisfaction of elementary behaviors," in Robotics: Science and Systems (RSS), Workshop Active Learning in Robotics: Exploration, Curiosity, and Interaction (Berlin).
Mead, C., and Ismail, M. (1989). Analog VLSI Implementation of Neural Systems. Norwell, MA: Springer.
Neftci, E., Binas, J., Rutishauser, U., Chicca, E., Indiveri, G., and Douglas, R. J. (2013). Synthesizing cognition in neuromorphic electronic systems. Proc. Natl. Acad. Sci. U.S.A. 110, E3468–E3476. doi: 10.1073/pnas.1212083110
Neftci, E., Chicca, E., Cook, M., Indiveri, G., and Douglas, R. (2010). "State-dependent sensory processing in networks of vlsi spiking neurons," in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS) (Paris), 2789–2792.
Oster, M., and Liu, S. C. (2004). "A winner-take-all spiking network with spiking inputs," in Proceedings of the 2004 11th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2004) (Tel-Aviv), 203–206.
Reimann, H., Iossifidis, I., and Schoner, G. (2011). "Autonomous movement generation for manipulators with multiple simultaneous constraints using the attractor dynamics approach," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (Shanghai).
Richter, M., Sandamirskaya, Y., and Schoner, G. (2012). "A robotic architecture for action selection and behavioral organization inspired by human cognition," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (Algarve).
Rutishauser, U., and Douglas, R. J. (2009). State-dependent computation using coupled recurrent networks. Neural Comput. 21, 478–509. doi: 10.1162/neco.2008.03-08-734
Sandamirskaya, Y., and Conradt, J. (2013). "Learning sensorimotor transformations with dynamic neural fields," in International Conference on Artificial Neural Networks (ICANN) (Sofia).
Sandamirskaya, Y., Richter, M., and Schoner, G. (2011). "A neural-dynamic architecture for behavioral organization of an embodied agent," in IEEE International Conference on Development and Learning and on Epigenetic Robotics (ICDL EPIROB 2011) (Frankfurt).
Sandamirskaya, Y., and Schoner, G. (2010a). An embodied account of serial order: how instabilities drive sequence generation. Neural Netw. 23, 1164–1179. doi: 10.1016/j.neunet.2010.07.012
Sandamirskaya, Y., and Schoner, G. (2010b). An embodied account of serial order: how instabilities drive sequence generation. Neural Netw. 23, 1164–1179. doi: 10.1016/j.neunet.2010.07.012
Sandamirskaya, Y., Zibner, S., Schneegans, S., and Schoner, G. (2013). Using dynamic field theory to extend the embodiment stance toward higher cognition. New Ideas Psychol. 30, 1–18. doi: 10.1016/j.newideapsych.2013.01.002
Schaal, S., Ijspeert, A., and Billard, A. (2003). Computational approaches to motor learning by imitation. Philos. Trans. R. Soc. Lond. B 358, 537–547. doi: 10.1098/rstb.2002.1258
Schoner, G. (2008). "Dynamical systems approaches to cognition," in Cambridge Handbook of Computational Cognitive Modeling, ed R. Sun (Cambridge: Cambridge University Press), 101–126.
Searle, J. R. (1983). Intentionality: An Essay in the Philosophy of Mind. Cambridge: Cambridge University Press.
Thrun, S., Burgard, W., and Fox, D. (2005). Probabilistic Robotics. Vol. 1. Cambridge: MIT Press.
Wilson, H. R., and Cowan, J. D. (1973). A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Kybernetik 13, 55–80. doi: 10.1007/BF00288786
Zibner, S. K. U., Faubel, C., Iossifidis, I., and Schöner, G. (2011). Dynamic neural fields as building blocks for a cortex-inspired architecture of robotic scene representation. IEEE Trans. Auton. Ment. Dev. 3, 74–91. doi: 10.1109/TAMD.2011.2109714

Conflict of Interest Statement: The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 07 October 2013; accepted: 25 December 2013; published online: 22 January 2014.
Citation: Sandamirskaya Y (2014) Dynamic neural fields as a step toward cognitive neuromorphic architectures. Front. Neurosci. 7:276. doi: 10.3389/fnins.2013.00276
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.
Copyright © 2014 Sandamirskaya. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.



ORIGINAL RESEARCH ARTICLE
published: 17 January 2014
doi: 10.3389/fnins.2013.00278

A robust sound perception model suitable for neuromorphic implementation
Martin Coath 1,2*, Sadique Sheik 3, Elisabetta Chicca 4, Giacomo Indiveri 3, Susan L. Denham 1,2 and Thomas Wennekers 1,5

1 Cognition Institute, Plymouth University, Plymouth, UK
2 Faculty of Health and Human Sciences, School of Psychology, Plymouth University, Plymouth, UK
3 Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
4 Faculty of Technology, Cognitive Interaction Technology Center of Excellence, Bielefeld University, Bielefeld, Germany
5 Faculty of Science and Environment, School of Computing and Mathematics, Plymouth University, Plymouth, UK

Edited by: André van Schaik, The University of Western Sydney, Australia
Reviewed by: John Harris, University of Florida, USA; Dylan R. Muir, University of Basel, Switzerland
*Correspondence: Martin Coath, Cognition Institute, Plymouth University, A222, Portland Square, Drake Circus, Plymouth, Devon PL48AA, UK. e-mail: [email protected]

We have recently demonstrated the emergence of dynamic feature sensitivity through exposure to formative stimuli in a real-time neuromorphic system implementing a hybrid analog/digital network of spiking neurons. This network, inspired by models of auditory processing in mammals, includes several mutually connected layers with distance-dependent transmission delays and learning in the form of spike timing dependent plasticity, which effects stimulus-driven changes in the network connectivity. Here we present results that demonstrate that the network is robust to a range of variations in the stimulus pattern, such as are found in naturalistic stimuli and neural responses. This robustness is a property critical to the development of realistic, electronic neuromorphic systems. We analyze the variability of the response of the network to noisy stimuli which allows us to characterize the acuity in information-theoretic terms. This provides an objective basis for the quantitative comparison of networks, their connectivity patterns, and learning strategies, which can inform future design decisions. We also show, using stimuli derived from speech samples, that the principles are robust to other challenges, such as variable presentation rate, that would have to be met by systems deployed in the real world. Finally we demonstrate the potential applicability of the approach to real sounds.

Keywords: auditory, modeling, plasticity, information, VLSI, neuromorphic

1. INTRODUCTION
Neurons in sensory cortex are highly adaptive, and are sensitive to an organism's sensory environment. This is particularly true during early life and an epoch known as the critical period (Zhang et al., 2001; Insanally et al., 2009). For many organisms sounds of ecological importance, such as communication calls, are characterized by time-varying spectra. Understanding how to build auditory processing systems that can cope with time-varying spectra is important. However, most neuromorphic auditory models to date have focused on distinguishing mainly static patterns, under the assumption that dynamic patterns can be learned as sequences of static ones.
One strategy for devices that implement artificial sensory systems is to emulate biological principles. Developing this approach holds out the hope that we might be able to build devices that approach the efficiency and robustness of biological systems and, in doing so, new insights in to neural processing might be gained. If, as is widely believed, the perception of complex sensory stimuli in vivo is based upon the population response of spiking neurons that are tuned to stimulus features then important questions arise, including what are these features, and how do they come in to existence? The situation for artificial auditory perception is complicated by the fact that the way in which sounds are represented in mammalian auditory cortex is not well understood, and neither are the neural mechanisms underlying the learning of dynamic sound features.
Neural mechanisms thought to underlie, for example, sensitivity to frequency sweeps include differential latency between excitatory inputs (Razak and Fuzessery, 2008), or excitatory and inhibitory inputs (Razak and Fuzessery, 2010), and asymmetric inhibition (Zhang et al., 2003; Razak and Fuzessery, 2009), all of which have been shown to correlate with sweep direction and/or rate preference. However, these studies have focussed primarily on local neural mechanisms (Ye et al., 2010) whereas anatomical studies of the auditory system reveal widespread lateral connections and nested recurrent loops, and in many cases feedback connections outnumbering feed-forward ones (Friston, 2005).
We have demonstrated previously that it is possible to address the problem of sensitivity to dynamic stimuli, including but not limited to frequency modulated (FM) sweeps, with a biophysically plausible model of auditory processing (Coath et al., 2010). We have validated the model with a real-time physical system implemented using neuromorphic electronic circuits (Sheik et al., 2011). However, neither of these studies has investigated the robustness of the system to stimuli that exhibit variation, either in spike pattern, or presentation rate, or to the order of similar


stimuli when sets of stimuli are presented continuously. In addition, the spectro-temporal patterns used as stimuli in these earlier studies are not derived from, or related to, those found in natural sounds such as speech, or other communication calls of animals. All of these considerations are important if the principles involved are to be implemented in artificial sensory systems that can be deployed in realistic environments.
In the present paper we provide evidence that the approach first presented in Sheik et al. (2011) is suitable for real-world deployment in that we extend the hardware results to an investigation of responses to noisy stimuli. We also present results from a software simulation that replicates the hardware as closely as possible using stimuli derived from speech and presented continuously at different rates. Robustness to both of these types of stimulus variation is a necessary condition for any practical system. Finally we predict the results from networks with comparable architectures trained on real world stimuli. This approach is useful in that it provides guidelines that can be used to inform the design of more complex neuromorphic processing systems that could be implemented in the future.

2. METHODS
2.1. NETWORK
2.1.1. Schematic
A schematic representation of the network, as implemented in both hardware and software, is shown in Figure 1. The horizontal axis in the figure represents the tonotopic arrangement of the auditory system, divided in to a number of frequency channels representing positions on the basilar membrane. The pattern of spiking in the A neurons thus represents the output of an artificial cochlea (Chan et al., 2007). Three channels only are shown in Figure 1, the central channel is labeled and the two flanking channels are shown dimmed to illustrate how the neurons within each channel are laterally connected to other channels. The hardware implementation and the software simulation both use 32 tonotopic channels. Where real stimuli are processed (see section 2.1.5) the cochlea uses a linear gammatone filter bank, followed by half wave rectification and low pass filtering to simulate the phase locking characteristics of auditory nerve firing, and center frequencies ranging from 50 to 8000 Hz equally spaced on the Equivalent Rectangular Bandwidth scale (Glasberg and Moore, 1990).
The input neuron A, at each tonotopic position, projects to a B1 and a B2 neuron in the same channel via excitatory synapses. The output of the network is taken to be the activity of the B2 neurons. This activity is derived from the input, but controlled by excitatory and inhibitory projections from B1 neurons. However, the excitatory B1→B2 projections originate only from other tonotopic channels, these connections exhibit distance dependent propagation delays, and terminate with plastic synapses which are the loci of Spike Timing Dependent Plasticity (STDP) (see section 2.1.2). Each B1 neuron is connected to a number of B2 neurons via these delayed connections that have a fan out of 14 neurons on either side. The learning rule implemented at the synapses associated with these connections (shown as filled triangles in Figure 1) ensures that the B2 neurons are active only if there are coincidences between spikes within the channel and delayed spikes from other channels; it is this feature that allows the network to learn dynamic spectro-temporal patterns. The units marked C represent the delays in the B1→B2 connections which are implemented differently in hardware and software, see sections 2.1.3 and 2.1.4.

2.1.2. Spike timing dependent plasticity
Plasticity in both the hardware and software networks is implemented in each of the B1→B2 synapses in the form of an


FIGURE 1 | Schematic representation of the network as implemented in hardware and software. Neurons are arranged in groups representing positions on the tonotopic axis. Three channels only are shown, and of these only the central channel is labeled for clarity. The two flanking channels are shown dimmed to illustrate how the channels are interconnected by delayed excitatory projections that terminate in plastic synapses. The two populations of B neurons receive input from the same A neurons within the same channel. B2 neurons are excited by B1 neurons from outside their own channel.

Frontiers in Neuroscience | Neuromorphic Engineering January 2014 | Volume 7 | Article 278 | 159
Coath et al. Robust neuromorphic sound perception

STDP-like model of synaptic plasticity described fully in Brader et al. (2007). In the absence of activation, the synaptic weight, or efficacy, drifts toward one of two stable values, 0 or 1; and although it can take on other values, it is bounded by these two values and stays constant at one of them unless further learning events occur. This has the advantage of preventing instabilities in the adaptation, such as the unbounded growth of connection strengths.
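A toy sketch of this bistable drift is given below; the drift rate and threshold are illustrative choices, not the parameters of the Brader et al. (2007) rule or of the hardware.

```python
# Sketch of the bistable weight dynamics described above: between learning events the
# efficacy drifts toward 0 or 1 depending on which side of a threshold it is on, so it
# remains bounded and settles at one of the two stable values.
def drift(w, dt_ms, theta=0.5, rate=0.001):
    """Relax the synaptic efficacy w toward its nearer stable state (0 or 1)."""
    target = 1.0 if w >= theta else 0.0
    step = rate * dt_ms
    if w < target:
        w = min(target, w + step)
    elif w > target:
        w = max(target, w - step)
    return w

w = 0.6
for _ in range(1000):          # with no further learning events ...
    w = drift(w, dt_ms=1.0)
print(w)                       # ... the weight has settled at the stable value 1.0
```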

2.1.3. Hardware implementation
The first set of results presented in section 3.1 were obtained using a hybrid analog/digital hardware implementation of the network model, which consists of a real-time, multi-chip set-up as described in Sheik et al. (2011). Three multi-neuron spiking chips and an Address Event Representation (AER) mapper (Fasnacht and Indiveri, 2011) are used, connected in a serial loop. The multi-neuron chips were fabricated using a standard AMS 0.35 μm CMOS process.

FIGURE 2 | Illustration of sample spike patterns used as probe stimuli in the hardware experiments described in section 2.1.3 to investigate the robustness of the network response to variability in the probe stimulus. In all cases these patterns of spikes are prepared off-line using a model integrate-and-fire neuron, a 5.5 ms current pulse, and a noise current that extends over the whole stimulus period. From left to right the ratio between the noisy current and the current pulse used to generate the spike in a channel increases from 0 to 1. The range illustrated is greater than that used in the experiments, where the highest level of noise is σ = 0.45.

The hardware does not directly support propagation delays between neurons. To overcome this limitation, long synaptic and neuronal time constants are exploited, which due to the variability in hardware have a range of values (Sheik et al., 2012). Given that the weights associated with the synapses of a neuron are strong enough to produce a single output spike, the time difference between the pre-synaptic spike and the post-synaptic spike is considered equivalent to a propagation/transmission delay. Therefore, every projection in the model that requires a delay is passed through an additional neuron, referred to as a delay neuron. The delay neurons are labeled C in Figure 1.

2.1.3.1. Frequency modulated stimuli. Trials with the hardware network were conducted with stimuli representing Frequency Modulated (FM) sweeps. These were prepared off-line by injecting current into integrate-and-fire neurons. A current pulse of duration 5.5 ms is used in each channel in turn to generate the burst of input spikes representing the activity of the A neurons (see Figure 1) when presented with a frequency modulated stimulus. In order to evaluate the robustness of the network response to stimulus variation, or noise, an additional noisy current signal is added to the injection current used to generate the input spikes, as illustrated in Figure 2. Noise is generated from an Ornstein-Uhlenbeck (OU) process with zero mean using the forward Euler method (Bibbona et al., 2008). We define the noise level, σ, as the ratio between the standard deviation of the OU process and the magnitude of the actual noise-free current signal used to generate the spikes.
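A minimal sketch of this probe-stimulus construction is given below (Python/NumPy). The pulse amplitude, OU time constant, and integration step are illustrative placeholders rather than values taken from the hardware set-up; only the overall recipe, a 5.5 ms current pulse plus forward-Euler OU noise scaled to a chosen noise level σ, follows the description above.

```python
import numpy as np

def ou_noise(n_steps, dt, tau, rng):
    """Zero-mean Ornstein-Uhlenbeck process integrated with forward Euler."""
    x = np.zeros(n_steps)
    for i in range(1, n_steps):
        x[i] = x[i-1] - (x[i-1] / tau) * dt + np.sqrt(dt) * rng.standard_normal()
    return x

def probe_current(noise_level, dt=1e-4, t_total=0.05, pulse_amp=1.0, tau=5e-3, seed=0):
    """Noise-free 5.5 ms pulse plus OU noise whose std is noise_level * pulse_amp."""
    rng = np.random.default_rng(seed)
    n = int(t_total / dt)
    current = np.zeros(n)
    current[: int(5.5e-3 / dt)] = pulse_amp                    # 5.5 ms current pulse
    noise = ou_noise(n, dt, tau, rng)
    noise *= noise_level * pulse_amp / (noise.std() + 1e-12)   # enforce the noise level sigma
    return current + noise                                     # injected into an integrate-and-fire neuron

i_inj = probe_current(noise_level=0.45)   # the noisiest condition used in the experiments
```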
2.1.3.2. FM sweep trials and analysis. Trials for the hardware and software versions of the network consisted of two parts: first the exposure phase, using the exposure stimulus (ES), followed by a probe phase using a number of different probe stimuli (PS) presented many times. During the exposure phase the learning rule forces the weight, or efficacy, of each B1 to B2 plastic synapse to either one or zero; this effects a pattern of stimulus-driven connectivity. The selection by the learning rule of only a few high efficacy connections is the origin of the difference in response characteristics between the B1 and the B2 neurons (Sheik et al., 2011).

The method adopted in the first set of experiments (section 3.1) using the hardware implementation of the network and FM sweep stimuli was the same as that described in Sheik et al. (2011). We reset the neurons to their resting state at the beginning of each ES and the plastic synapses to their low state, that is, with effectively null synaptic efficacy. Input patterns were presented 30 times over a period of 3 seconds, during which time the network learns. We then measured the response of the exposed network by probing with a set of PS that consisted of linear frequency sweeps with different velocities; during each of these presentations the number of spikes in the B2 neurons was recorded. Stimuli representing each of the 10 sweep rates were presented 100 times for each noise level during the probe phase. These results were used to determine the Stimulus Specific Information (SSI) as described below.

2.1.3.3. Stimulus Specific Information. As artificial sensory systems become increasingly complex it will become increasingly important to make principled decisions about their design. In the majority of cases choices will have to be made where the detailed neurobiological data is incomplete or difficult to interpret. This inevitably leads to a requirement to quantify the performance of the network (for comparison with in vivo data, and to guide choices of architecture, learning rule, etc.) where no clear guidance is available from physiology.

A measure that has been used to characterize neuronal acuity is the Stimulus Specific Information (SSI), which is a formalization of the intuitive view that a stimulus is well encoded if it produces an unambiguous response; that is, a response that is associated with a unique, or very small number of, stimuli. Where this is true the stimulus is readily identified when one of these responses, or a response in the correct range, appears (Butts and Goldman, 2006). This characterization has the advantage of not being dependent on the design or performance of a classifier.


The specific information of a response i_sp(r) given a set of stimuli Θ can be written:

i_{sp}(r) = -\sum_{\theta \in \Theta} p(\theta) \log_2 p(\theta) + \sum_{\theta \in \Theta} p(\theta|r) \log_2 p(\theta|r)

where it is defined in terms of the entropy of the stimulus ensemble, and that of the stimulus distribution conditional on a particular response. This makes the i_sp(r) a measure of the reduction in uncertainty about the stimulus Θ gained by measuring a particular response r. Thus the value of the i_sp(r) is high for unambiguous responses and low for ambiguous responses. The SSI is simply the average specific information of the responses that occur when a particular stimulus, θ, is present:

i_{SSI}(\theta) = \sum_{r} p(r|\theta)\, i_{sp}(r)

We show that the performance of the network can be characterized by the SSI, which combines features of the tuning curve, where information is encoded in the rate of response, and of the Fisher Information, where the high-slope regions of the tuning curve are the most informative (Butts and Goldman, 2006).
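These two expressions translate directly into a few lines of code. The sketch below (Python/NumPy; illustrative, not the analysis code used for the reported results) computes i_sp(r) and the SSI from a joint histogram of stimulus classes and discretized responses such as spike counts.

```python
import numpy as np

def specific_information(joint):
    """joint[s, r]: counts of stimulus class s occurring with response bin r.
    Returns i_sp(r) for every response bin, in bits."""
    p = joint / joint.sum()
    p_s = p.sum(axis=1, keepdims=True)             # p(theta)
    p_r = p.sum(axis=0, keepdims=True)             # p(r)
    p_s_given_r = p / np.where(p_r > 0, p_r, 1.0)  # p(theta | r)
    with np.errstate(divide="ignore", invalid="ignore"):
        h_s = -np.nansum(p_s * np.log2(p_s))                       # entropy of the stimulus ensemble
        h_s_given_r = -np.nansum(np.where(p_s_given_r > 0,
                                          p_s_given_r * np.log2(p_s_given_r), 0.0), axis=0)
    return h_s - h_s_given_r                        # i_sp(r): reduction in uncertainty given r

def ssi(joint):
    """i_SSI(theta) = sum_r p(r | theta) * i_sp(r), one value per stimulus class."""
    i_sp = specific_information(joint)
    p_r_given_s = joint / joint.sum(axis=1, keepdims=True)
    return p_r_given_s @ i_sp

# Example: 10 sweep-rate classes x 20 discretized B2 spike-count responses.
rng = np.random.default_rng(0)
counts = rng.integers(1, 50, size=(10, 20)).astype(float)
print(ssi(counts))   # each value is bounded above by log2(10), about 3.32 bits
```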
2.1.3.4. Receiver Operating Characteristics. A Receiver Operating Characteristic (ROC) can be used as a measure of performance of classifiers (Fawcett, 2006). ROC graphs have their origin in signal detection theory but are also popular in other fields, including the evaluation and comparison of machine learning algorithms. The output from the network can be interpreted as a binary classifier if we designate the Exposure Stimulus as the target for identification by setting a detection threshold. The ROC is then a graph of the False Positive Rate (FPR) against the True Positive Rate (TPR) for all values of the detection threshold. The TPR is simply the ratio between the number of stimuli of the target class correctly identified (True Positives, TP) and the total number of stimuli belonging to this class (Positives, P):

TPR = TP / P

Likewise, the FPR is the ratio between the number of stimuli incorrectly identified as belonging to the target class (False Positives, FP) and the total number of stimuli not belonging to this class (Negatives, N):

FPR = FP / N
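A sketch of this procedure (Python/NumPy; illustrative rather than the analysis code behind the reported results) is given below: the detection threshold is swept over the observed spike counts, TPR and FPR are computed at each threshold, and the resulting curve is reduced to the single Area Under Curve value discussed next by summing trapezoids.

```python
import numpy as np

def roc_auc(spike_counts, is_target):
    """spike_counts: B2 spike count per probe presentation.
    is_target: True where the presentation matched the Exposure Stimulus class."""
    spike_counts = np.asarray(spike_counts, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    thresholds = np.r_[np.inf, np.unique(spike_counts)[::-1], -np.inf]
    tpr, fpr = [], []
    for th in thresholds:
        detected = spike_counts >= th
        tpr.append(np.mean(detected[is_target]))     # TP / P
        fpr.append(np.mean(detected[~is_target]))    # FP / N
    return np.trapz(tpr, fpr), np.array(fpr), np.array(tpr)

# Toy example: target presentations tend to evoke more spikes than non-targets.
rng = np.random.default_rng(1)
counts = np.r_[rng.poisson(20, 100), rng.poisson(10, 600)]
labels = np.r_[np.ones(100, bool), np.zeros(600, bool)]
auc, fpr, tpr = roc_auc(counts, labels)
print(round(auc, 3))   # between 0.5 (chance) and 1.0 (perfect)
```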
The ROC curve is a two-dimensional visualization of the system's potential as a classifier. We also make use of a common method to reduce this to a single scalar value, that is, to calculate the area under the ROC curve, abbreviated AUC; this is achieved by adding the areas of successive trapezoids (Fawcett, 2006). The Area Under Curve (AUC) is used to quantify the relative overall ability of the network to discriminate between the two classes of stimuli; that is, those that match the class of the Exposure Stimulus and those that do not. This method has been widely used in the characterization of classifiers and is believed to perform very well (Fawcett, 2006). In all cases the AUC will be between 0.5, representing a network that will not function as a classifier, and 1.0, which represents a perfect classifier at all thresholds. Although useful, unlike the SSI this ignores the information present in the response concerning any of the other six classes.

2.1.4. Software implementation
A second set of results, presented in section 3.2, were obtained using a network implemented in custom C code closely based on the hardware implementation. The learning rule implemented is also the same as in the hardware implementation (see section 2.1.2). In these software simulations of the hardware implementation the lateral, or B1 to B2, projections exhibit distance-dependent delays that cover the same range of values as the hardware network; however, these delays were implemented in a queued data structure, whereas in the hardware these delays are implemented by exploiting variability of time constants that result from the fabrication of the chip (Sheik et al., 2011, 2012). Besides this difference the software model was designed to be close to the hardware implementation in order to allow for reliable predictions of the hardware's learning and recognition capabilities. Because the hardware operates in real biological time, use of an emulated software version allowed us to run a large number of tests, which would have been impossible in hardware.

2.1.4.1. Stimuli derived from speech. The stimuli used in these experiments using the software network were derived from speech and represent the formant tracks of a set of English words. Formants are peaks in the frequency response of sounds caused by resonances in the vocal tract. These peaks are the characteristics that identify vowels, and in most cases the first two formants are enough to disambiguate a vowel. This approach was chosen as it results in stimuli that increase the complexity and realism from the single, and double, FM sweeps used in the first set of experiments.

A vocabulary of seven words was chosen: And, Of, Yes, One, Two, Three, Four, and three examples of each were recorded using a male speaker (the first author). Seven words were chosen because they exhibit a variety of vowel sounds, and hence their formant tracks exhibit a range of spectrotemporal correlations; also they are monosyllabic and (almost) free of diphthongs. The formant tracks of these words exhibit spectrotemporal correlations, for example changes in frequency over time and maxima at two different spectral positions at the same time, that we have shown can be learned by the network; there is more on the mechanism of this learning in section 3.1.

The first and second formant tracks of these seven classes were extracted using LPC, which yields position (frequency) and magnitude parameters for formants (Ellis, 2005). The results are shown in Figure 3, in which parts of the stimulus indicated with a thicker line (in blue) are those with an LPC magnitude of greater than 15% of the maximum value, indicating the position of the vowel. The thin line sections (in gray) correspond to the parts of the sound files that were silent or contained consonants.

FIGURE 3 | Illustration of the derivation of simplified stimuli consisting of the first and second formant tracks for the seven words extracted using Linear Predictive Coding. The words were And, Of, Yes, One, Two, Three, Four, as labeled in the titles of the subfigures. The thin line segments (in gray) are the parts of the sound files that were silent or contained consonants. Formant tracks of vowels, shown in thicker blue line segments, were smoothed and down-sampled to produce the patterns of current injection that were a highly simplified representation of the speech stimuli, see Figure 5.

FIGURE 4 | Formant tracks extracted using LPC for three examples of two different stimuli, And and Yes. Sections of the individual stimuli corresponding to vowels are indicated by thicker red, green, and blue segments; this Figure clearly shows that there is some variation in the formant tracks among sets of stimuli of the same class even when recorded from a single speaker.

FIGURE 5 | An example of a stimulus sequence, or sentence, used as a Probe Stimulus (PS) for the second set of experiments in software simulations. The stimulus is a concatenation of simplified formant tracks drawn from the set of words illustrated in Figure 3. The labels on the upper abscissa (in blue) show the stimulus class. Each word is arranged to be 250 ms long, hence the presentation rate in the trials referred to as normal is 4 stimuli per second, see section 2.1.4.

Figure 4 shows how the three examples of each word have formant tracks that are comparable. For clarity only two of the seven words are shown in Figure 4, and the extracted formant tracks are highlighted using thicker colored lines as in Figure 3.

The formant tracks were then smoothed and down-sampled to produce the patterns of current injection that were a simplified representation of the stimulus, see Figure 5. These patterns of current injection derived from the formant tracks are stored as 32 × 25 binary patterns used as inputs to the network simulation, with each of the 32 rows representing a frequency channel and each of the 25 columns representing temporal bins of 10 ms. Thus, with each monosyllabic word occupying 250 ms, presentation at the normal or 100% presentation rate is 4 stimuli per second, a realistic rate for speech. We use the same stimuli presented at other rates (60, 150, 200% of the normal rate of 4 stimuli per second) to investigate robustness to time warping, see section 2.1.4.
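One way to carry out this binarization is sketched below (Python/NumPy). The channel-to-frequency mapping and the frequency limits are assumptions made only for the example, since they are not specified here; only the 32 × 25 layout with 10 ms bins follows the description above.

```python
import numpy as np

def formants_to_pattern(f1_hz, f2_hz, t_s, n_channels=32, n_bins=25, bin_ms=10.0,
                        f_lo=100.0, f_hi=4000.0):
    """Binarize first/second formant tracks (Hz, sampled at times t_s in seconds)
    into an n_channels x n_bins pattern; f_lo/f_hi are assumed channel limits."""
    pattern = np.zeros((n_channels, n_bins), dtype=np.uint8)
    # Log-spaced channel centre frequencies (an assumption, not from the paper).
    centres = np.geomspace(f_lo, f_hi, n_channels)
    for b in range(n_bins):
        t0, t1 = b * bin_ms / 1e3, (b + 1) * bin_ms / 1e3
        in_bin = (t_s >= t0) & (t_s < t1)
        if not in_bin.any():
            continue
        for track in (f1_hz, f2_hz):
            f = np.mean(track[in_bin])                       # mean formant frequency in this bin
            if np.isfinite(f):
                pattern[np.argmin(np.abs(centres - f)), b] = 1
    return pattern

# A 250 ms word sampled every 5 ms, with two slowly rising formants.
t = np.arange(0, 0.25, 0.005)
pattern = formants_to_pattern(500 + 400 * t / 0.25, 1500 + 800 * t / 0.25, t)
print(pattern.shape)   # (32, 25)
```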


Random concatenations of the 21 stimuli produced simplified, formant-based representations of nonsense sentences of the type "three and one of four two four yes and one of two three yes one" etc., an example of which is shown in Figure 5. The sentences were arranged to contain equal numbers of each stimulus and were presented during the exposure phase without gaps.

2.1.4.2. Formant track trials and analysis. In the exposure phase of the second set of experiments reported in section 3.2 the network was exposed to 20 repetitions (5 s) of all three examples of a single utterance; during this time the learning was switched on. This was followed by the probe phase, where all stimuli were presented 50 times in randomized order, without gaps, and with the learning switched off. The output spikes from the B2 neurons were counted for each stimulus and the total number of spikes recorded. This allows the SSI to be calculated for the speech-derived stimuli in the same way as for the FM sweeps in the hardware results, using the methods detailed in section 2.1.3. Sample results are shown in Figure 9.

In addition to the SSI it is possible, because these experiments can be interpreted as a set of keyword spotting trials based on spiking rate, to characterize the network as a binary classifier. The output spikes from the B2 neurons were counted during each stimulus, and the total number of spikes recorded; from these data we can construct the ROC and hence the AUC of the responses of the network.
2.1.5. Learning predictions
The third set of results in section 3.3 deals with analytical predictions of what the network, either hardware or software, would learn in ideal circumstances if exposed to an arbitrary stimulus. These analytical predictions of what pattern of learning would result from exposure to a particular stimulus are based on the principle, mentioned in section 2, that the function of the B2 neurons is to learn correlations between activity at different times at different tonotopic channels. Calculating the strength of these correlations should therefore give us an approximation of the connectivity pattern that would result from exposure to any arbitrary stimulus.

We calculate the strength of the correlation, and hence the predicted strength of connectivity, between two network channels x and y after exposure. This can be written C(x, y) and is calculated as the sum of the products of the stimulus activity A, over all times t, over all pairs of frequency channels x, y, taking into account the time difference caused by the delays in the lateral connections, Δt, and the time difference between the pre- and post-synaptic spikes that is required by the STDP rule, δ. The STDP rule also penalizes any activity in y that precedes activity in x; thus the pattern of connectivity can be approximated by:

C(x, y) = \sum_{t} \left( A_{x,t}\, A_{y,\, t + \Delta t + \delta} - A_{x,t}\, A_{y,\, t + \Delta t - \delta} \right)

The value of Δt is a function of the channel separation between x and y, and the time taken for the activity to propagate between adjacent channels, τ:

\Delta t = \tau \, |x - y|

FIGURE 6 | The learning of a simple correlation in the network. (A) Activity in a channel followed by activity in another channel some time later, represented by the dots a1 and a2, which are in channels 5 and 9 in this example. The propagation of activity through the lateral connections has a fixed offset and a velocity, each represented by broken gray lines in (A). (B) Plastic synapses connecting the neuron excited by a1 to the neuron excited by a2 are potentiated if they are on the broken gray line representing the propagation of activity. These synapses are at position (5, 9) on the weight matrix, shown by a dot. The distance from the diagonal, v, is proportional to the apparent sweep velocity from a1 to a2.
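The prediction defined by the two equations above can be computed directly, as in the sketch below (Python/NumPy; the values of tau and delta are placeholders expressed in time bins rather than hardware values). Here A is a binary channels-by-time matrix representing the stimulus.

```python
import numpy as np

def predicted_connectivity(A, tau, delta):
    """C[x, y] approximates the learned connection strength from channel x to y.
    A: binary activity matrix (channels x time bins); tau, delta in time bins."""
    n_ch, n_t = A.shape
    C = np.zeros((n_ch, n_ch))
    for x in range(n_ch):
        for y in range(n_ch):
            d_t = int(round(tau * abs(x - y)))           # propagation delay, tau * |x - y|
            plus, minus = d_t + delta, d_t - delta
            for t in range(n_t):
                if t + plus < n_t:
                    C[x, y] += A[x, t] * A[y, t + plus]   # potentiation term
                if 0 <= t + minus < n_t:
                    C[x, y] -= A[x, t] * A[y, t + minus]  # penalty for activity in y arriving too early
    return C

# Toy rising sweep: channel k active at time 2k.
A = np.zeros((16, 64))
for k in range(16):
    A[k, 2 * k] = 1
C = predicted_connectivity(A, tau=2, delta=1)
```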
It is important to note that the range of effective values of τ is extremely limited in the current hardware due to the implementation of the delays using the variability of time constants that result from the fabrication of the chip (Sheik et al., 2011, 2012). However, although this limitation is taken into account in the software model results, future hardware designs need not exhibit these limitations if the delays are implemented differently. It is partly to explore these possibilities that the results in section 3.3 include examples that employ a wide range of values for τ.

A simple example of how correlation in the stimulus leads to potentiation of a small set of synapses is illustrated in Figure 6. The left subfigure shows activity in a channel followed by activity in another channel some time later, represented by two dots. The propagation of activity through the lateral connections has a fixed offset and a velocity, represented by horizontal and sloping broken gray lines respectively. The right subfigure shows that the synapses connecting neurons in two channels are potentiated if they lie on the broken gray line representing the propagation.

In a second more complex example, shown in Figure 7, the two labeled dots are in exactly the same position as Figure 6 for comparison. In this case however the stimulus consists of two tones, both rising in frequency but with different rates and starting times. The network can learn the fact that there are two sweep velocities present at the same time, as indicated by the predicted connectivity pattern. Because the sweeps are linear the potentiated synapses in red and blue are parallel to the diagonal in the weight matrix. The black synapses are potentiated by the apparent up velocities between pairs of points of different colors as they diverge. Note, there will be no corresponding apparent down correlations (below the diagonal) until the sweeps are further apart, because of the fixed propagation offset.

3. RESULTS
3.1. FM SWEEPS
The first set of results were obtained by recording spikes from silicon neurons in a hardware implementation of the network shown in Figure 1.

Using the spikes recorded from the B2 neurons it is possible to calculate the SSI with respect to all the FM Probe Stimuli (PS) after using each of these as Exposure Stimuli (ES). These results are shown in Figure 8, which summarizes the SSI for all Exposure-Probe stimulus combinations at four noise levels. Figure 8 shows that the maximum of the SSI occurs often, but not always, at the sweep rate representing the ES. This is in contrast to what we would expect if we were measuring tuning curves. The SSI measures the reduction in uncertainty, or informativeness, provided by the response, which is not necessarily at the same place as the response maximum.

FIGURE 7 | Correlations in a more complex stimulus. (A) Stimulus consisting of two rising tones; the two dots are in exactly the same position as Figure 6 for comparison. (B) The network can learn the fact that there are two sweep velocities present at the same time, as indicated by the colors. Because the sweeps are linear the potentiated synapses representing the individual sweeps (red and blue) are parallel to the diagonal in the weight matrix. The black synapses are potentiated by the apparent up velocities between pairs of points of different colors as they diverge.

FIGURE 8 | Robustness to variation in the stimulus using the hardware implementation and synthetic FM sweeps. These plots illustrate the Stimulus Specific Information for the trained network and FM sweeps using noisy stimuli. Subfigures represent increasing values of added noise that causes the spike pattern to be added to and disrupted from the simple sweep produced by current injection into successive channels, as illustrated in Figure 2. Color scale is in bits. The maximum value is log2(10) ≈ 3.32, as there are 10 classes of stimuli, interpreting each FM rate as a separate class.

3.2. FORMANT TRACKS
The second set of results comes from the software version of the network using the simplified formant track stimuli. These results are collected in the same way as for the results in section 3.1. Figure 9 shows the SSI for two of the seven classes of Exposure Stimuli, Two and Four. The SSI values are shown for the no noise condition (σ = 0.00, in blue), and for the noisiest condition (σ = 0.45, in red). The maximum value for the SSI is log2(7) ≈ 2.80, there being 7 classes of stimulus. The maximum SSI is approached in the no noise condition for the ES class in both cases; it is however also clear that there is information in the network response concerning all classes, not only for the class of the Exposure Stimuli. These results are representative of those obtained with all other ES classes.

FIGURE 9 | Stimulus Specific Information results for two of the seven classes of stimuli, Two and Four. In blue is the no noise (σ = 0.00) condition and in red the noisiest (σ = 0.45) condition for comparison.

The next results, shown in Figure 10, are ROC curves for trials using one of the seven stimulus classes for training, And. Unlike the SSI results these figures can be obtained only by designating the Exposure Stimuli as belonging to the class to be detected by the network after training; that is, treating the network as a binary classifier. Two presentation rates (Rate = 100 and 200%) combined with two noise levels (σ = 0.0 and 0.45) are shown so as to generate four conditions including the best and worst cases. Other results for this class are intermediate, and this pattern is repeated for all other ES classes. Full summary results from the ROC curves, presented as Area Under Curve (AUC) for all presentation rates and noise levels, are shown for four representative classes in Table 1. Results for the remaining three classes are comparable.

3.3. PREDICTED PATTERNS OF LEARNING
The third and final set of results shows the predicted pattern of connectivity that would result from the exposure of an idealized network to spectrographic representations derived from real sounds. These results are derived from the analytical approach described in section 2.1.5.

First the approach is validated using sound files designed to mimic the simple patterns used in the other experiments previously reported.


Figure 11 shows the predicted connectivity pattern derived from a spectrographic representation of a sound file, alongside a previously reported result from Sheik et al. (2011) using a synthetic stimulus pattern in the hardware implementation. A range of simple patterns give comparable results in hardware and software.

An example of this approach using a recording of a biological communication call is shown in Figure 13. The example chosen is a recording of a call from a Weddell seal; the cochleagraphic representation of this call can be seen in Figure 12. These results show the predicted connection patterns that would result from training a network similar to that used in the hardware and simulation experiments. However, the results require a wider range of propagation rates between channels than can be achieved with the current hardware.

Four results are illustrated in Figure 13, each using a different value of τ, the time taken for activity to propagate between adjacent channels. Figure 13A shows the result with the lowest value for τ; note the emphasis on connections below the diagonal indicating down-sweeps, and the distance from the diagonal to the lower left maximum of the connectivity represents the chirp FM rate of the successive downward sweeps in the seal call. In contrast to A, the predicted connectivity in C results from an apparent up sweep. This apparent up activity in fact represents correlations between successive down-sweeps, that is, the relationship between the maxima of each down-sweep (at low frequency) and the majority of the succeeding down-sweep at higher frequency. B contains features visible in A and C and so best characterizes the stimulus, while the longest value for τ in Figure 13D captures few, if any, of the dynamic features of the stimulus.

Table 1 | Combined table showing Area Under Curve (AUC) results for all noise conditions and all presentation rates for four of the Exposure Stimuli: And, Of, Yes, Four.

                      Rate
            60%     100%    150%    200%
And
σ = 0.00    0.97    0.97    0.95    0.95
σ = 0.15    0.94    0.95    0.96    0.93
σ = 0.35    0.92    0.92    0.92    0.90
σ = 0.45    0.87    0.86    0.85    0.85
Of
σ = 0.00    0.88    0.93    0.86    0.84
σ = 0.15    0.88    0.88    0.87    0.84
σ = 0.35    0.85    0.84    0.85    0.80
σ = 0.45    0.78    0.76    0.81    0.78
Yes
σ = 0.00    0.95    0.98    0.96    0.95
σ = 0.15    0.95    0.95    0.95    0.94
σ = 0.35    0.92    0.92    0.91    0.90
σ = 0.45    0.88    0.85    0.86    0.81
Four
σ = 0.00    0.96    0.97    0.97    0.95
σ = 0.15    0.95    0.96    0.96    0.92
σ = 0.35    0.90    0.94    0.93    0.88
σ = 0.45    0.86    0.87    0.85    0.82

Example ROC curves for the And stimulus can be seen in Figure 10. Other ROC and AUC results are comparable in all seven classes.

FIGURE 10 | Representative Receiver Operating Characteristic (ROC) curves for the network with exposure stimulus And. ROC curves plot the False Positive Rate (FPR, or fall-out) against the True Positive Rate (TPR, sometimes called recall) for all detection thresholds. For clarity only four conditions are shown, with two representing the best and worst case. The solid black line represents the best case, with no added noise (σ = 0.00) and with the probe stimuli presented at the normal rate (Rate = 100%). The broken blue line represents the worst case across all conditions, with the added noise at 45% (σ = 0.45) and the presentation rate at twice the normal rate (Rate = 200%).

FIGURE 11 | Comparison between hardware result using a synthetic stimulus pattern (A,B) and learning prediction using a real sound file (C,D). Top row shows raster of synthetic exposure stimulus (A) and resulting network connectivity after exposure (B) for the hardware network; these figures are taken from Sheik et al. (2011). Bottom row shows spectrogram of comparable sound file (C) and the analytically predicted pattern of connectivity (D) based on correlations in the stimulus representation as described in section 2.1.5.


FIGURE 12 | An example of a natural communication call, in this case a Weddell seal, shown here as a spectrogram. This pattern was used to derive the predicted learning patterns shown in Figure 13.

FIGURE 13 | This figure shows the predicted connectivity of the network, as Figure 11D, but for the Weddell seal call; the spectrogram for this call is shown in Figure 12. There are four predictions based on a range of values for the propagation rate between channels. (A-D) show the predicted pattern of connectivity with increasing values of τ, corresponding to the network adapting to correlations at progressively longer time scales. Note that in (A) the distance from the diagonal to the lower left local maximum represents the chirp FM rate. In (C) a clear apparent up-sweep is caused by the low frequency maxima of each chirp and the majority of the next successive chirp being at higher frequency. (B) shows features of (A,C). (D) captures less of the dynamic nature of the stimulus.

4. DISCUSSION AND CONCLUSION
The results presented here show that the previously published results and approach (Sheik et al., 2011) are not limited to simple stereotypical stimuli and, even in this highly challenging arena, that there is scope for implementing systems that are robust to realistic signal variability. The stimuli used in these studies exhibit a range of different spectro-temporal properties, are presented continuously rather than in isolation, exhibit wide variability due to added noise, and have a variable presentation rate. All of these complications, and distortions, represent a substantial challenge and are necessary prerequisites to the development of systems that can be deployed in real situations.

In section 2.1.5 we discuss how the network is capable of simultaneously representing the position, rate of change, and spectral distance (and to a more limited extent temporal distance) between features in the stimuli. Adaptive sensitivity to all of these has been demonstrated in hardware and software. The robustness in the system is derived from the fact that, although noise and variable presentation rate alter or degrade these patterns of features, it requires either or both types of variability to be present to a very large degree for the degradation to cause the correlations to be masked completely.

We have introduced an information theoretic characterization of the performance of the network, the SSI, based on the variability of the stimuli and the consequent range of responses to a single stimulus class. This represents a method of quantifying the performance of a hardware system that has not been previously reported in an engineering context, but has direct parallels in physiological measurements. The substitution of an information theoretic measure for a classifier is deliberate, because it focusses on the information present in the response rather than the design or performance of the classifier. Our results, summarized in Figures 8 and 9, indicate that the adaptation of the network to the formative stimulus produces a differential response that is informative with respect to all classes.

Sensory stimuli, in particular auditory stimuli, contain both short and long range temporal correlations. The techniques currently employed in the hardware implementation primarily address correlations only over time scales of the order of synaptic or membrane time constants, up to those represented by the propagation of excitation to adjacent regions. However, we have shown that the principles embodied in the network could be extended to longer time scales, making it feasible to build systems capable of adapting to complex stimuli, such as animal communication calls. In hardware, longer time scales could be addressed using many levels of recurrence between widely separated layers, as is observed in the mammalian auditory system. Alternatively, from a pragmatic perspective, it could be tackled with working memory and neuromorphic implementations of state machine based approaches (Neftci et al., 2013).

Alongside our previously reported results (Sheik et al., 2011) we pointed out that in order to be useful, the properties of the neuromorphic system we described would have to be validated against noise and other variations in the stimulus, and be shown to work with more realistic stimuli. We also promised to go beyond the demonstration of emergent sensitivity to a stimulus parameter, and to quantify the increase in acuity in information-theoretic terms, thus providing a basis for the quantitative comparison of networks, connectivity patterns, and learning strategies in the future. In this work we have made significant progress in all of these aims. The approach has been shown to be capable of handling considerable stimulus variation, changes in presentation rate, and the increased complexity of stimulus. Had it fallen at any of these hurdles then the feasibility of the approach would have been called into question. It is clear, then, that each of these new results is evidence that the approach could lead to a neuromorphic subsystem engineered for dynamic pattern recognition in real world applications.


ACKNOWLEDGMENTS
We would like to thank our colleague Robert Mill for many helpful suggestions and contributions made during the work underlying this manuscript.

FUNDING
This work was supported by the European Community's Seventh Framework Programme (grants no. 231168-SCANDLE and no. 257219-neuroP), and by the Cluster of Excellence 277 (CITEC, Bielefeld University).

REFERENCES
Bibbona, E., Panfilo, G., and Tavella, P. (2008). The Ornstein-Uhlenbeck process as a model of a low pass filtered white noise. Metrologia 45, S117. doi: 10.1088/0026-1394/45/6/S17
Brader, J. M., Senn, W., and Fusi, S. (2007). Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural Comput. 19, 2881-2912. doi: 10.1162/neco.2007.19.11.2881
Butts, D. A., and Goldman, M. S. (2006). Tuning curves, neuronal variability, and sensory coding. PLoS Biol. 4:e92. doi: 10.1371/journal.pbio.0040092
Chan, V., Liu, S.-C., and van Schaik, A. (2007). AER EAR: a matched silicon cochlea pair with address event representation interface. IEEE Trans. Cir. Syst. I 54, 48-59. doi: 10.1109/TCSI.2006.887979
Coath, M., Mill, R., Denham, S., and Wennekers, T. (2010). "The emergence of feature sensitivity in a recurrent model of auditory cortex with spike timing dependent plasticity," in Proceedings of BICS 2010 (Madrid).
Ellis, D. P. W. (2005). Sinewave Speech Analysis/Synthesis in Matlab. Web resource, available online at: http://www.ee.columbia.edu/ln/labrosa/matlab/sws/
Fasnacht, D., and Indiveri, G. (2011). "A PCI based high-fanout AER mapper with 2 GiB RAM look-up table, 0.8 μs latency and 66 MHz output event-rate," in Conference on Information Sciences and Systems, CISS 2011 (Baltimore, MD: Johns Hopkins University), 1-6.
Fawcett, T. (2006). An introduction to ROC analysis. Patt. Recogn. Lett. 27, 861-874. doi: 10.1016/j.patrec.2005.10.010
Friston, K. (2005). A theory of cortical responses. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 815-836. doi: 10.1098/rstb.2005.1622
Glasberg, B. R., and Moore, B. C. (1990). Derivation of auditory filter shapes from notched noise data. Hear. Res. 47, 103-138. doi: 10.1016/0378-5955(90)90170-T
Insanally, M. N., Köver, H., Kim, H., and Bao, S. (2009). Feature-dependent sensitive periods in the development of complex sound representation. J. Neurosci. 29, 5456-5462. doi: 10.1523/JNEUROSCI.5311-08.2009
Neftci, E., Binas, J., Rutishauser, U., Chicca, E., Indiveri, G., and Douglas, R. (2013). Synthesizing cognition in neuromorphic electronic systems. Proc. Natl. Acad. Sci. U.S.A. 110, E3468-E3476. doi: 10.1073/pnas.1212083110
Razak, K. A., and Fuzessery, Z. M. (2008). Facilitatory mechanisms underlying selectivity for the direction and rate of frequency modulated sweeps in the auditory cortex. J. Neurosci. 28, 9806-9816. doi: 10.1523/JNEUROSCI.1293-08.2008
Razak, K. A., and Fuzessery, Z. M. (2009). GABA shapes selectivity for the rate and direction of frequency-modulated sweeps in the auditory cortex. J. Neurophysiol. 102, 1366-1378. doi: 10.1152/jn.00334.2009
Razak, K. A., and Fuzessery, Z. M. (2010). Development of parallel auditory thalamocortical pathways for two different behaviors. Front. Neuroanat. 4:134. doi: 10.3389/fnana.2010.00134
Sheik, S., Chicca, E., and Indiveri, G. (2012). "Exploiting device mismatch in neuromorphic VLSI systems to implement axonal delays," in The 2012 International Joint Conference on Neural Networks (IJCNN) (Brisbane), 1-6.
Sheik, S., Coath, M., Indiveri, G., Denham, S., Wennekers, T., and Chicca, E. (2011). Emergent auditory feature tuning in a real-time neuromorphic VLSI system. Front. Neurosci. 6:17. doi: 10.3389/fnins.2012.00017
Ye, C., Poo, M., Dan, Y., and Zhang, X. (2010). Synaptic mechanisms of direction selectivity in primary auditory cortex. J. Neurosci. 30, 1861-1868. doi: 10.1523/JNEUROSCI.3088-09.2010
Zhang, L. I., Bao, S., and Merzenich, M. M. (2001). Persistent and specific influences of early acoustic environments on primary auditory cortex. Nat. Neurosci. 4, 1123-1130. doi: 10.1038/nn745
Zhang, L. I., Tan, A. Y. Y., Schreiner, C. E., and Merzenich, M. M. (2003). Topography and synaptic shaping of direction selectivity in primary auditory cortex. Nature 424, 201-205. doi: 10.1038/nature01796

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 24 September 2013; accepted: 30 December 2013; published online: 17 January 2014.
Citation: Coath M, Sheik S, Chicca E, Indiveri G, Denham SL and Wennekers T (2014) A robust sound perception model suitable for neuromorphic implementation. Front. Neurosci. 7:278. doi: 10.3389/fnins.2013.00278
This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.
Copyright © 2014 Coath, Sheik, Chicca, Indiveri, Denham and Wennekers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

ORIGINAL RESEARCH ARTICLE
published: 04 February 2014
doi: 10.3389/fnins.2014.00010

An efficient automated parameter tuning framework for spiking neural networks

Kristofor D. Carlson 1, Jayram Moorkanikara Nageswaran 2, Nikil Dutt 3 and Jeffrey L. Krichmar 1,3*
1 Department of Cognitive Sciences, University of California Irvine, Irvine, CA, USA
2 Brain Corporation, San Diego, CA, USA
3 Department of Computer Science, University of California Irvine, Irvine, CA, USA

Edited by: Tobi Delbruck, ETH Zurich and University of Zurich, Switzerland
Reviewed by: Michael Schmuker, Freie Universität Berlin, Germany; Siddharth Joshi, University of California, San Diego, USA
*Correspondence: Jeffrey L. Krichmar, Department of Cognitive Sciences, University of California Irvine, 2328 Social and Behavioral Sciences Gateway, Irvine, CA 92697-5100, USA
e-mail: [email protected]

As the desire for biologically realistic spiking neural networks (SNNs) increases, tuning the enormous number of open parameters in these models becomes a difficult challenge. SNNs have been used to successfully model complex neural circuits that explore various neural phenomena such as neural plasticity, vision systems, auditory systems, neural oscillations, and many other important topics of neural function. Additionally, SNNs are particularly well-adapted to run on neuromorphic hardware that will support biological brain-scale architectures. Although the inclusion of realistic plasticity equations, neural dynamics, and recurrent topologies has increased the descriptive power of SNNs, it has also made the task of tuning these biologically realistic SNNs difficult. To meet this challenge, we present an automated parameter tuning framework capable of tuning SNNs quickly and efficiently using evolutionary algorithms (EA) and inexpensive, readily accessible graphics processing units (GPUs). A sample SNN with 4104 neurons was tuned to give V1 simple cell-like tuning curve responses and produce self-organizing receptive fields (SORFs) when presented with a random sequence of counterphase sinusoidal grating stimuli. A performance analysis comparing the GPU-accelerated implementation to a single-threaded central processing unit (CPU) implementation was carried out and showed a speedup of 65× of the GPU implementation over the CPU implementation, or 0.35 h per generation for GPU vs. 23.5 h per generation for CPU. Additionally, the parameter value solutions found in the tuned SNN were studied and found to be stable and repeatable. The automated parameter tuning framework presented here will be of use to both the computational neuroscience and neuromorphic engineering communities, making the process of constructing and tuning large-scale SNNs much quicker and easier.

Keywords: spiking neural networks, parameter tuning, evolutionary algorithms, GPU programming, self-organizing receptive fields, STDP

INTRODUCTION
Although much progress has been made in simulating large-scale spiking neural networks (SNNs), there are still many challenges to overcome before these neurobiologically inspired algorithms can be used in practical applications that can be deployed on neuromorphic hardware (Boahen, 2005; Markram, 2006; Nageswaran et al., 2010; Indiveri et al., 2011). Moreover, it has been difficult to construct SNNs large enough to describe the complex functionality and dynamics found in real nervous systems (Izhikevich and Edelman, 2008; Krichmar et al., 2011; Eliasmith et al., 2012). Foremost among these challenges are the tuning and stabilization of large-scale dynamical systems, which are characterized by many state values and open parameters (Djurfeldt et al., 2008). The task of tuning SNNs is becoming more difficult as neuroscientists move away from simpler models toward more realistic, but complex, models to describe the properties of network elements (van Geit et al., 2008). For example, many modelers have moved away from simple integrate and fire neuron models to models which capture a wider range of neuronal dynamics, but have more open parameters (Izhikevich, 2003; Brette and Gerstner, 2005).

A similar shift in complexity is occurring when simulating synaptic plasticity (Abbott and Nelson, 2000), as new types of plasticity models such as homeostatic synaptic scaling (Watt and Desai, 2010; Carlson et al., 2013), short-term plasticity (Mongillo et al., 2008), and spike-timing dependent plasticity (STDP) (Song et al., 2000; van Rossum et al., 2000) are being incorporated into SNNs. In addition, network topologies are shifting from conventional feed-forward connectivity to recurrent connectivity, which have more complex dynamics and require precise tuning of synaptic feedback for stable activity (Seung et al., 2000).

For these reasons, the process of hand-tuning SNNs is often extremely time-consuming and inefficient, which has led to interest among researchers in automating this process. To address these challenges, we present an automated tuning framework that utilizes the parallel nature of graphics processing units (GPUs) and the optimization capabilities of evolutionary algorithms (EAs) to tune open parameters of SNNs in a fast and efficient manner.

The present article describes a means to automate parameter tuning of spiking neural networks which are compatible with present and future neuromorphic hardware.


However, it is important to first examine the role SNN models play in the development of neuromorphic hardware. Recent neuromorphic science funding initiatives, such as the SyNAPSE project in the USA and the FACETS/BrainScaleS projects in Europe, have resulted in the construction of neuromorphic chips. Not surprisingly, the research groups involved in producing these neuromorphic hardware devices have also spent a great deal of time building software simulation and interface frameworks (Amir et al., 2013; Thibeault and Srinivasa, 2013). This is because the novel hardware requires new software environments and methodologies to run applications (Brüderle et al., 2011). There are two main software development tasks required to run neuromorphic applications on a hardware device. First, the neuromorphic application must be designed and tuned to perform a particular cognitive or computational function. This is the focus of our present study. Second, the model description of the neuromorphic application must be mapped onto the neuromorphic hardware device computing elements. There have been a number of recent studies that have applied various optimization techniques to solve this mapping problem with some success (Ehrlich et al., 2010; Sheik et al., 2011; Gao et al., 2012; Neftci et al., 2013). Although both tasks are integral to developing neuromorphic hardware applications, the latter is outside the scope of the present study.

There has been a great deal of work in the computational neuroscience community on automating the process of parameter tuning neuronal models. A variety of different methodologies have been used to automate parameter tuning in neural models, many of which are summarized in the review by van Geit et al. (2008). Svensson et al. (2012) fit a nine-parameter model of a filter-based visual neuron to experimental data using both gradient following (GF) methods and EAs. Some groups have used optimization techniques to tune ion channel kinetics for compartmental neurons (Hendrickson et al., 2011; Ben-Shalom et al., 2012), while other groups have used quantum optimization techniques and EAs to tune more abstract networks of neurons (Schliebs et al., 2009, 2010). Additionally, brute force methods of searching the parameter space were used to examine a three-neuron model of a lobster stomatogastric circuit by creating large databases of compartmental neurons with varying membrane conductance values and testing the resulting functional behavior of this circuit (Prinz et al., 2003, 2004). Some automated parameter-search tools have been developed as interfaces between neural simulators and the optimization methods used to tune them, such as Neurofitter (van Geit et al., 2008). Other tools allow for the automatic compilation of very large sets of simulation runs across a wide range of parameter values and experimental conditions (Calin-Jageman and Katz, 2006).

Unlike these parameter tuning methodologies, which have been applied to small neural circuits, single neurons, or networks of hundreds of neurons, our focus is the automated tuning of much larger neural systems (on the scale of 10^3-10^6 neurons). Neural networks at these scales become more useful for the description of cognitive models and closer to the scale of SNNs currently being designed to run on neuromorphic hardware (Esser et al., 2013; Thibeault and Srinivasa, 2013). Recent work by Rossant et al. (2011) and Eliasmith et al. (2012) has focused on tuning large-scale SNNs; we compare these approaches with our tuning framework in the discussion section.

A parallel line of research in automated parameter tuning has taken place where larger, more abstract artificial neural networks (ANNs) are constructed using EAs (Fogel et al., 1990). The building of ANNs using EAs can be broken into two basic methodologies: direct encoding and indirect encoding. Much work has been done using the direct encoding approach, where the genetic description of the individual, or the genotype, is directly mapped to the actual traits of the individual, or the phenotype (Hancock, 1992; Gomez and Miikkulainen, 1997; Stanley and Miikkulainen, 2002). An EA is said to use direct encoding when there is a one-to-one correspondence between parameter values, like synaptic weight values, and genotype values. Drawbacks of this approach include poor genotype scaling for large network encodings and very large parameter spaces due to the lack of geometrical constraints of the networks. Alternatively, indirect encoding allows the genotype to specify a rule or method for growing the ANN instead of specifying the parameter values directly (Husbands et al., 1998; Beer, 2000; Floreano and Urzelai, 2001; Stanley and Miikkulainen, 2003). NeuroEvolution of Augmented Topologies (NEAT) and HyperNEAT use indirect encoding to evolve network topologies, beginning with a small network and adding complexity to that network as evolution progresses (Stanley and Miikkulainen, 2002; Stanley et al., 2009; Clune et al., 2011; Risi and Stanley, 2012). HyperNEAT has been used to encode networks with as many as 8 million connections, and networks evolved using NEAT have been used in food-gathering tasks (Stanley et al., 2009), in a checkers-playing ANN that exhibits topographic mappings (Gauci and Stanley, 2010), and in evolving robot gaits in hardware (Yosinski et al., 2011). The present study utilizes the indirect encoding approach, in which the learning parameters are evolved, as opposed to the direct encoding approach where the synaptic weights are evolved. This allows for a large reduction in the parameter space. Although EAs are an effective tool for constructing ANNs, they often require long execution times to produce well-tuned networks. A number of parallel computing techniques can be used to reduce the execution time of EAs; however, this paper focuses mainly on parallelization via GPU computing.

With the development of mature GPU parallel computing platforms like CUDA (Nickolls et al., 2008) and OpenCL (Stone et al., 2010), GPU accelerated algorithms have been applied to a variety of tasks in scientific computing. GPU acceleration has been used to increase the throughput of EAs (Maitre et al., 2009), simulate neural field models of the primary visual cortex V1 (Baladron et al., 2012), and search parameter spaces in bio-inspired object-recognition models (Pinto et al., 2009). In addition to these applications, a number of research groups in the computational neuroscience community (Brette and Goodman, 2012) have developed and implemented parallel implementations of SNNs on GPUs (Bernhard and Keriven, 2006; Fidjeland et al., 2009; Nageswaran et al., 2009b; Bhuiyan et al., 2010; Han and Taha, 2010; Hoffmann et al., 2010; Yudanov et al., 2010; Ahmadi and Soleimani, 2011; Nowotny, 2011; Thibeault et al., 2011; de Ladurantaye et al., 2012; Mirsu et al., 2012; Pallipuram et al., 2012).


GPU-driven SNN simulators have already been used in SNN models of the basal forebrain (Avery et al., 2012), the basal ganglia (Igarashi et al., 2011), the cerebellum (Yamazaki and Igarashi, 2013), and the olfactory system (Nowotny, 2010).

Our present study drastically decreases the time it takes to tune SNN models by combining a freely available EA library with our previous work (Nageswaran et al., 2009b; Richert et al., 2011), which consisted of a parallelized GPU implementation of an SNN simulator. Although other research groups have used EAs and GPUs to tune SNNs (Rossant et al., 2011), our approach is more general as it tunes a variety of SNN parameters and utilizes fitness functions that can be broadly applied to the behavior of the entire SNN. As a proof of concept, we introduce a parameter tuning framework to evolve SNNs capable of producing self-organized receptive fields similar to those found in V1 simple cells in response to patterned inputs. An indirect encoding approach was utilized, as the parameters tuned in the SNN governed Hebbian learning, homeostasis, maximum input stimulus firing rates, and synaptic weight ranges. A performance analysis compared the parallelized GPU implementation of the tuning framework with the equivalent central processing unit (CPU) implementation and found a speedup of 65× (i.e., 0.35 h per generation vs. 23.5 h per generation) when SNNs were run concurrently on the GPU. Using affordable, widely-accessible GPU-powered video cards, the software package presented here is capable of tuning complex SNNs in a fast and efficient manner. The automated parameter tuning framework is publicly available and could be very useful for the implementation of large-scale SNNs on neuromorphic hardware or for the development of large-scale SNN simulations that describe complex brain functions.

METHODS
GPU ACCELERATED SNNs IN CARLsim
An important feature of the automated parameter tuning framework is the ability to run multiple SNNs in parallel on the GPU, allowing significant acceleration of the EA evaluation phase. We first briefly review the approaches CARLsim uses to run SNNs in parallel before describing the general layout of the automated parameter tuning framework and describing how a researcher would use the tool to tune SNNs. Figure 1 shows the basic CUDA GPU architecture, which consists of multiple streaming multiprocessors (SMs) and a global memory, accessible to all SMs. Each SM is comprised of multiple floating-point scalar processors (SPs), at least one special function unit (SFU), and a cache/shared memory. CUDA code is distributed and executed in groups of 32 threads called warps. Each SM has at least one warp scheduler that ensures maximum thread concurrency by switching from slower to faster executing warps. Our simulations utilized an NVIDIA Tesla M2090 GPU with 6 GB of global memory, 512 cores (each operating at 1.30 GHz) grouped into 16 SMs (32 SPs per SM), and a single precision compute power of 1331.2 GFLOPS.

The CARLsim parallel GPU implementation was written to optimize four main performance metrics: parallelism, memory bandwidth, memory usage, and thread divergence, which are discussed in greater detail in Nageswaran et al. (2009a). The term parallelism refers to both the degree to which the application data is mapped to parallel threads and the structure of the mapping itself. CARLsim utilizes both neuronal parallelism (N-parallelism), where individual neurons are mapped to processing elements and simulated in parallel, and synaptic parallelism (S-parallelism), where synaptic data are mapped to processing elements and simulated in parallel. Anytime a neuronal state variable is updated, N-parallelism is used, and anytime a weight update is necessary, S-parallelism is used. Sparse representation techniques, such as the storage of SNN data structures using the reduced Address Event Representation (AER) format and the use of a circular queue to represent firing event data, decrease both memory and memory bandwidth usage. GPUs execute many threads concurrently (1536 threads per SM in the Tesla M2090) and manage these threads by providing a thread scheduler for each SM which organizes groups of threads into warps.

FIGURE 1 | A simplified diagram of NVIDIA CUDA GPU architecture (adapted from Nageswaran et al., 2009a,b). Our simulations used an NVIDIA Tesla M2090 GPU that had 16 streaming multiprocessors (SMs), each made up of 32 scalar processors (SPs), and 6 GB of global memory.


Thread/warp divergence occurs when threads in a single warp execute different operations, forcing the faster executing threads to wait until the slower threads have completed. In CARLsim, thread/warp divergence is minimized during diverging loop executions by buffering the data until all threads can execute the diverging loop simultaneously.

AUTOMATED PARAMETER TUNING FRAMEWORK DESCRIPTION
To test the feasibility of an automated parameter tuning framework, our group used EAs to tune open parameters in SNNs running concurrently on a GPU. As a proof of concept, the SNNs were evolved to produce orientation-dependent stimulus responses similar to those found in simple cells of the primary visual cortex (V1) through the formation of self-organizing receptive fields (SORFs). The general evolutionary approach was as follows: (1) A population of neural networks was created, each with a unique set of neural parameter values that defined overall behavior. (2) Each SNN was then ranked based on a fitness value assigned by the objective function. (3) The highest ranked individuals were selected, recombined, and mutated to form the offspring for the next generation. (4) This process continued until a desired fitness was reached or until other termination conditions were met (Figure 2A).
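Steps (1) through (4) can be summarized in a compact sketch (Python/NumPy). The fitness function here is only a placeholder standing in for the GPU-evaluated objective function, and the selection and variation operators are simplified stand-ins rather than the EO configuration described later in the Methods; the population size and parameter count follow the values reported there.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PARAMS, POP, GENS = 14, 10, 50       # 14 tuned parameters, 10 SNNs per generation
LO, HI = 0.0, 1.0                      # illustrative parameter bounds

def evaluate(pop):
    # Placeholder fitness: in the real framework each row would configure one SNN,
    # all SNNs would be simulated concurrently on the GPU, and the objective
    # function would be computed from their spiking behavior.
    return -np.sum((pop - 0.5) ** 2, axis=1)

def tournament(pop, fit, k=2):
    """Pick each offspring's parent as the fitter of k randomly drawn individuals."""
    idx = rng.integers(0, len(pop), size=(len(pop), k))
    return pop[idx[np.arange(len(pop)), np.argmax(fit[idx], axis=1)]]

parents = rng.uniform(LO, HI, size=(POP, N_PARAMS))            # (1) initialize population
for g in range(GENS):
    fitness = evaluate(parents)                                 # (2) rank by objective function
    selected = tournament(parents, fitness)                     # (3) select ...
    cross = rng.random((POP, N_PARAMS)) < 0.5
    offspring = np.where(cross, selected, selected[::-1])       # ... recombine ...
    mutate = rng.random(offspring.shape) < 0.4
    offspring = np.clip(offspring + mutate * rng.normal(0, 0.1, offspring.shape), LO, HI)  # ... mutate
    parents = offspring                                         # (4) repeat until termination
best = parents[np.argmax(evaluate(parents))]
```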
The automated parameter tuning framework consisted of three software packages and is shown in Figure 2B. The framework includes: (1) the CARLsim SNN simulator (Richert et al., 2011), (2) the Evolving Objects (EO) computational framework, a publicly available evolutionary computation toolkit (Keijzer et al., 2002), and (3) a Parameter Tuning Interface (PTI), developed by our group, to provide an interface between CARLsim and EO. Evolving Objects is available at http://eodev.sourceforge.net/ and both CARLsim and the PTI are available at http://www.socsci.uci.edu/jkrichma/CARLsim/. The EO computational framework runs the evolutionary algorithm on the user-designated parameters of SNNs in CARLsim. The PTI allows the objective function to be calculated independent of the EO computation framework. Parameter values are passed from the EO computation framework through the PTI to the SNN in CARLsim, where the objective function is calculated. After the objective function is executed, the results are passed from the SNN in CARLsim through the PTI back to the EO computation framework for processing by the EA. With this approach, the fitness function calculation, which involves running each SNN in the population, can be run in parallel on the GPU while the remainder of the EA calculations can be performed using the CPU (Figure 2B).

FIGURE 2 | (A) Flow chart for the execution of an Evolutionary Algorithm (EA). A population of individuals (μ) is first initialized and then evaluated. After evaluation, the most successful individuals are selected to reproduce via recombination and mutation to create an offspring generation (λ). The offspring then become parents for a new generation of the EA. This continues until a termination condition is reached. The light blue boxes denote operations that are carried out serially on the CPU while the light brown box denotes operations carried out in parallel on the GPU. The operations inside the dotted gray box are described in greater detail in (B). (B) Description of the automated parameter tuning framework, which consists of the CARLsim SNN simulator (light brown), the EO computational framework (light blue), and the Parameter Tuning Interface (PTI) (light green). The PTI passes tuning parameters (PN) to CARLsim for evaluation in parallel on the GPU. After evaluation, fitness values (FN) are passed from CARLsim back to EO via the PTI.

USING THE PARAMETER TUNING INTERFACE

Frontiers in Neuroscience | Neuromorphic Engineering February 2014 | Volume 8 | Article 10 | 171


Carlson et al. Efficient spiking network parameter tuning

configuration. An example of an EO parameter file is shown in a multi-component objective function was used to evaluate an
Supplementary 1 of the supplementary materials. At execution, individuals ability to reproduce V1 simple-cell behavior. The
EO reads the parameter file to configure all aspects of the EA, training phase consisted of the presentation of 40 grayscale sinu-
including selection, recombination, mutation, population size, soidal grating patterns of varying orientation (from /40 to )
termination conditions, and many other EA properties. Beginners in random sequence to the SNN for approximately 100 min. Each
to EO can use the example EO parameter files included with the pattern was presented to the network for 2 s while 1 Hz Poisson
EO source code for the automated parameter tuning framework noise was applied to the network for 500 ms between pattern pre-
presented here. A sample program overview of the PTI and a sum- sentations. During the testing phase eight grating orientations
mary of the PTI-API are included in Supplementary materials (from /8 to ) were presented to the network and the fir-
sections 2 and 3. Additional EO examples and documentation can ing rate responses of the four output neurons in the Exc group
be found online at http://eodev.sourceforge.net/eo/tutorial/html/ were recorded. STDP and homeostasis were enabled during the
eoTutorial.html. After creating a valid EO parameter file, the user training phase but were disabled for the testing phase. The evo-
is ready to use the PTI and CARLsim to tune SNNs. lutionary algorithm began with the random initialization of the
parent population, consisting of 10 SNNs, and produced 10 off-
EVOLVING SNNs WITH V1 SIMPLE CELL RESPONSES AND SORF spring per generation. Ten SNN configurations ran in parallel. To
FORMATION evolve V1 simple cell responses, a real-valued optimization algo-
As a proof of concept, the ability of the automated parameter tun- rithm called Evolution Strategies (De Jong, 2002) was used with
ing network to construct an SNN capable of producing SORFs deterministic tournament selection, weak-elitism replacement,
and orientation-dependent stimulus responses was examined. 40% Gaussian mutation and 50% crossover. Weak-elitism ensures
This set of simulations was presented with grayscale counterphase the overall fitness monotonically increases each generation by
gratings of varying orientations. The EO computation framework replacing the worst fitness individual of the offspring population
evolved SNN parameters that characterized spike-timing depen- with the best fitness individual of the parent population. Fourteen
dent plasticity (STDP), homeostasis, the maximum firing rates of parameters were evolved: four parameters associated with E E
the neurons encoding the stimuli, and the range of weight values STDP, four parameters associated with E I STDP, the homeo-
for non-plastic connections. The network topology of the SNN, static target firing rates of the Exc and Inh groups, the strength of
shown in Figure 3, modeled the visual pathway from the lateral the fixed uniformly random On(Off)Buffer Exc group connec-
geniculate nucleus (LGN) to the primary visual cortex (V1). tions, the strength of the plastic Exc Inh group connections,
Each individual in the population participated in a train- the strength of the fixed uniformly random Inh Exc group
ing phase, where synaptic weights were modified according to connections, and the maximum firing rate response to the input
STDP and homeostatic learning rules, and a testing phase where stimuli. The range of allowable values for each parameter is shown

FIGURE 3 | Network architecture of the SNN tuned by the parameter STDP curves are included to describe plastic On(Off)Buffer Exc and
tuning framework to produce V1 simple cell response and SORFs. N Exc Inh connections. Tuned parameters are indicated with dashed arrows
represents the number of neurons used in different groups. E E and E I and boxes.

www.frontiersin.org February 2014 | Volume 8 | Article 10 | 172


Carlson et al. Efficient spiking network parameter tuning

in Table 1. The parameter ranges for the STDP time windows range from approximately 0 to 60 Hz, which is an important
were constrained by experimental data (Caporale and Dan, 2008) aspect of neuronal activity.
while the remaining parameter ranges were chosen to produce The fitnessdecorr component of the fitness function enforced
SNNs with biologically realistic firing rates. decorrelation in the Exc group neuronal firing rates so that each
The multi-component objective function was constructed neuron responded maximally to a different stimulus presenta-
by requiring output neurons to have desirable traits in neu- tion angle. Equation (2) ensured the angles of maximum response
ronal response activity, namely, decorrelation, sparseness, and imax for each neuron, i, were as far from one another as possible
a description of the stimulus that employs the entire response by minimizing the difference between the two closest maximum
range. The total fitness function to be maximized, fitnesstotal , is angles (Dimin ) and the maximum possible value of Dimin , called
described by Equation (1) and required each fitness component in
Dtarget . Dimin is described in Equation (3) and Dtarget had a value
the denominator to be minimized. Fitness values were normalized
of /4.
by the highest fitness value and ranged from 0 to 1. The fit-
ness function consisted of three fitness components, fitnessdecorr , =4 
N 
fitnessGauss , fitnessmaxRate , and a scaling factor K which had a  i 
fitnessdecorr = Dmin Dtarget  (2)
value of 4.4 in all simulations discussed here.
i=1
 
 j 
1 Dimin = min imax max  j  = i (3)
fitnesstotal =
fitnessdecorr + fitnessGauss + Kscaling factor fitnessmaxRate
(1) The next fitness component fitnessGauss ensured that each Exc
Here fitnessdecorr , described in Equation (2), was minimized if group neuron had a Gaussian tuning curve response similar to
each output neuron responded uniquely and preferentially to that found in V1 simple cells. The difference between the normal-
a grating orientation, causing the average firing rates of each ized firing rate Rij and a normalized Gaussian Gij was calculated
neuron to be decorrelated. The fitness component, fitnessGauss , for every presentation angle for each Exc group neuron and was
was minimized when each Exc group neuron had an idealized summed over all angles and neurons. This is shown in Equation
Gaussian tuning curve response and is defined in Equation (4). (4) while a description of the Gaussian is shown in Equation (5),
The fitness component, fitnessmaxRate , was minimized when the
i
where rmax is the maximum firing rate for the ith neuron, imax
maximum firing rate of the output neurons achieved a target fir- is the angle of maximum response of the ith neuron, j is the
ing rate, which helped neuronal activity remain stable and sparse, jth stimulus angle, and was chosen to be 15/180 to match
and is defined in Equation (6). A scaling term, Kscaling factor = 4.4, experimental observations (Henry et al., 1974).
was used to correctly balance the maximum firing rate require-
= 40 
= 4 M
N 
ment against the decorrelation and Gaussian tuning curve curve  i 
requirements. Taken together, both fitnessmaxRate and fitnessGauss fitnessGauss = Rj Gij  (4)
i=i j=1
result in the assignment of high fitness values to neurons that
 2
have a stimulus response that utilizes the entire neuronal response
1 j imax
Gij = rmax
i
exp (5)
2
Table 1 | Range of allowable values for parameters optimized by the
automated parameter tuning framework. The fitnessmaxRate component, in combination with the
Inh Exc group connections, helped to enforce the requirement
Parameters Range
that the Exc group neurons had sparse firing rates by limiting the
Max. Poiss. Rate 1040 Hz firing rate of each neuron to a maximum target firing rate Rmax
target
Buff Exc Wts 4.0e-31.6e-2 of 60 Hz. The difference between the maximum firing rate Rimax
Exc Inh Wts 0.11.0 of each Exc group neuron and the maximum target firing rate
Inh Exc Wts 0.10.5 was calculated and summed over all Exc group neurons as shown
Rtarget Exc 1030 Hz in Equation (6).
Rtarget Inh 40100 Hz
=4 
N 
A+ Exc 9.6e-64.8e-5  i 
A Exc 9.6e-64.8e-5 fitnessmaxRate = Rmax Rmax
target  (6)
+ Exc 1060 ms i=1
Exc 5100 ms
Each fitness component had a fitness constraint imposed on it
A+ Inh 9.6e-64.8e-5
which caused the individual to be assigned a poor overall fitness
A Inh 9.6e-64.8e-5
if it fell outside a particular range of values. Recall that the fitness
+ Inh 1060 ms
components are in the denominator of the total fitness equation
Inh 5100 ms
making lower fitness component values more fit than higher fit-
Weight ranges and STDP A+ and A parameters are dimensionless and their ness component values. The constraints are expressed as upper
relative magnitudes are important for creating a functional SNN. limits. Those individuals with fitness components larger than the

Frontiers in Neuroscience | Neuromorphic Engineering February 2014 | Volume 8 | Article 10 | 173


Carlson et al. Efficient spiking network parameter tuning

upper limit were assigned poor overall fitness values by adding of accuracy (Izhikevich, 2003). All excitatory neurons were mod-
240 to the denominator of Equation (1). The fitness component eled as RS neurons while all inhibitory neurons were modeled
fitnessdecorr had an upper limit constraint of 15, the fitness com- as FS neurons. The dynamics of Izhikevich neurons are shown
ponent fitnessGauss had an upper limit of 1300, and the fitness in Equations (10, 11) and consist of a 2D system of ordinary
component fitnessmaxRate had an upper limit of 160. differential equations.

NETWORK MODEL d
= 0.042 + 5 + 140 u + I (10)
The input to the network consisted of a 32 32 grid of grayscale dt
pixels, ranging from -1 to 1, which were connected to a pair of du
32 32 Poisson spiking neuron groups with one-to-one topol- = a(b u) (11)
dt
ogy to model the On/Off receptive fields found in the LGN.
One Poisson spiking neuron group, the OnPoiss group, had lin- Here, is the membrane potential of the neuron and u is the
ear spiking responses corresponding to Equation (7) while the recovery variable. The neuron dynamics for spiking are as follows:
OffPoiss group had responses corresponding to Equation (8). c
Here, ri,On (ri,Off ) represent the firing rate of neuron i, of the If 30 mv, then . The variables a, b, c, and d
uu+d
On(Off)Poiss group in response to the value of the input p,
are specific to the type of Izhikevich neuron being modeled. For
pixel i. The rates had maximum values of 1 and were scaled
RS neurons, a = 0.02, b = 0.2, c = 65.0, and d = 8.0. For FS
with the Max. Poiss. Rate parameter. Each On(Off)Poiss group
neurons, a = 0.1, b = 0.2, c = 65.0, and d = 2.0. The synap-
had fixed excitatory synapses with one-to-one connectivity to
tic input for the spiking neurons consisted of excitatory NMDA
a pair of 32 32 spiking neuron groups consisting of regular
and AMPA currents and inhibitory GABAA and GABAB cur-
spiking (RS) Izhikevich neurons (Izhikevich et al., 2004), called
rents (Izhikevich and Edelman, 2008) and has the form shown in
the On(Off)Buffer groups. The On(Off)Buffer group neurons
Equation (12). Each conductance has the general form of g(
have a refractory period and were included to produce more
0 ) where g is the conductance, is the membrane potential, and
realistic spike dynamics in response to the stimulus input. The
0 is the reversal potential.
On(Off) Buffer groups were included because Poisson spiking
neurons with a built-in delay period were not part of the standard  + 80 2
NVIDIA CUDA Random Number Generation (cuRAND) library I = gNMDA 60
 + 80 2 ( 0) + gAMPA ( 0)
and were therefore, more difficult to generate. On(Off)Buffer 1+ 60
neurons had plastic excitatory synapses with all-to-all connec-
tivity to an output group of RS neurons called the Exc group. + gGABAA ( + 70) + gGABAB ( + 90) (12)
Finally, to induce competition and encourage sparse firing, the
Exc group made plastic excitatory all-to-all connections to a fast- The conductances obey the first order dynamics shown in
spiking (FS) inhibitory neuron group (Izhikevich et al., 2004), Equation (13).
dgi gi
which made fixed inhibitory all-to-all connections back to the Exc = (13)
group. dt i

 Here i denotes a particular conductance (NMDA, AMPA, GABAA ,


pi , pi > 0 or GABAB ) and denotes the decay constant for the conduc-
ri,On pi = (7)
0, pi 0 tance. The decay constants are NMDA = 100 ms, AMPA = 5 ms,
 GABAA = 6 ms, and GABAB = 150 ms.
0, pi > 0 All plastic connections used a standard nearest-neighbor
ri,Off p =   (8)
pi  , pi 0 STDP implementation (Izhikevich and Desai, 2003) but dis-
tinct STDP rules were used for STDP occurring between
excitatory-to-excitatory (E E) neurons and STDP occurring
The mathematical description of the Poisson spiking neurons
between excitatory-to-inhibitory (E I) neurons. Excitatory-to-
used in the simulation is shown in Equation (9).
excitatory plastic connections had traditional STDP curves as
detailed in (Bi and Poo, 1998) while excitatory-to-inhibitory plas-
ti + 1 = ti ln (xi ) /r (9) tic connections used STDP curves where potentiation occurred
for pre-after-post pairings and depression occurred for pre-
The spike times were generated iteratively by generating inter- before-post pairings as found in experiments (Bell et al., 1997). A
spike intervals (ISIs) from an exponential distribution (Dayan model for homeostatic synaptic scaling (Carlson et al., 2013) was
and Abbott, 2001). Here ti is the spike time of the current spike, also included to prevent runaway synaptic dynamics that often
ti + 1 is the spike time of the next spike, r is the average firing arise in STDP learning rules.
rate, and xi is the current random number (uniformly distributed The STDP update rule used in our simulations is shown in
between 0 and 1) used to generate the next spike time. Equation (14).
The spiking neurons used in the simulation were Izhikevich-
type neurons and were chosen because they are computationally dwi,j
efficient and able to produce neural dynamics with a high degree = + LTPi,j + LTDi,j (14)
dt

www.frontiersin.org February 2014 | Volume 8 | Article 10 | 174


Carlson et al. Efficient spiking network parameter tuning

The synaptic weight from presynaptic neuron i to postsynaptic


neuron j is indicated by the variable wi,j . Additionally, is a bias
often set to zero or a positive number to push the network toward
positive weight increases for low synaptic input, while is the
learning rate. The weight changes were updated every time step
(1 ms) but the weights themselves are modified once every 1 s.
To model homeostatic synaptic plasticity the STDP update
rule was modified as shown in Equation (15) where = 0.1 and
= 1.0.
   
dwi,j R
= wi,j 1 + LTPi,j + LTDi,j K (15)
dt Rtarget

Here, is the homeostatic scaling factor while R and Rtarget are the
average and target firing rates, respectively, for the postsynaptic
neuron, j. A stability term denoted by K, damps oscillations in the
weight updates and speeds up learning. K is defined as:

R FIGURE 4 | Plot of best and average fitness vs. generation number for
K=   (16)
T 1 + 1 R/Rtarget 
entire simulation run (287 generations, 4104 neuron SNNs, 10 parallel
configurations). All values were normalized to the best fitness value. The
error bars denote the standard deviation for the average fitness at intervals
In Equation (16), is a tuning factor and T is the time scale over of once per 20 generations. Initially the standard deviation of the average
which the firing rate of the postsynaptic neuron is averaged. Here fitness is large as the EA explores the parameter space, but over time, the
= 50 and T = 10 s. standard deviation decreases as the EA finds better solutions.

SIMULATION DETAILS
The SORF formation and performance analysis simulations were Table 2 | Sorted fitness values (higher is better) for the initial and final
developed and implemented on a publically available neural sim- SNN populations.
ulator (Nageswaran et al., 2009b; Richert et al., 2011) and the
Initial population fitness values Final population fitness values
forward Euler method (Giordano and Nakanishi, 2006) was used
to integrate the difference equations with a step size of 1 ms for 0.1949 1.0000
plasticity equations and 0.5 ms for neuronal activity equations. 0.1780 0.9807
The CPU version of CARLsim was run on a system with an 0.1594 0.9481
Intel Core i7 2.67 GHz quad-core processor with 6 GB of mem- 0.1551 0.9454
ory. The GPU version of CARLsim was run on a NVIDIA Tesla 0.1444 0.9384
GPU M2090 card, with 6 GB of total memory and 512 cores. 0.1399 0.9294
The GPU was capable of 665 GFLOPS of double precision, 1.33 0.1212 0.9146
TFLOPs of single precision, and had a memory bandwidth of 0.1006 0.9107
117 GB/s. The GPU was in a 12-core CPU cluster with 24 GB of 0.0977 0.9105
memory and 4 GPU cards. Simulations executed on the CPU were 0.0913 0.9040
single-threaded, while simulations executed on the GPU were
parallelized, but only on a single GPU. Entries with a shaded background denote SNNs with V1 simple cell responses
and SORF formation for every Exc group neuron (4).
RESULTS
An SNN capable of SORF formation and V1 simple cell like
responses to counterphase grating stimuli presentation was SNNs at generation 52. Table 2 shows the fitness values of the 10
constructed using the automated parameter tuning framework initial SNNs and the fitness values after 287 generations. Shaded
described above. Using a configuration with 10 SNNs running table entries denote SNNs that produced SORFs and V1 sim-
simultaneously on the GPU, each having 4104 neurons, the auto- ple cell-like tuning curves for all four Exc group neurons while
mated parameter tuning framework took 127.2 h to complete 287 unshaded table entries indicate SNNs that failed to produce these
generations of the EA and used a stopping criterion that halted neural phenomena. All SNNs from the initial population had very
the EA after the completion of 100 generations without a change low fitness, produced no orientation selectivity, and had no SORF
in the fitness of the best individual or after the completion of 500 formation. All SNNs from the final population except for the last
generations. The average and best fitness values for every gener- individual (fitness = 0.9040) had high fitness values, produced
ation are shown in red and blue, respectively, in Figure 4. The V1 simple cell-like tuning curve responses, and produced SORFs.
automated parameter tuning framework constructed 128 SNNs The last individual in Table 2, had a high fitness, but only pro-
out of 2880 total SNNs (4.4%) that displayed SORF formation duced V1 simple-cell like tuning curve responses and SORFs in
and V1 simple cell like responses and produced the first of these three of the four Exc group neurons.

Frontiers in Neuroscience | Neuromorphic Engineering February 2014 | Volume 8 | Article 10 | 175


Carlson et al. Efficient spiking network parameter tuning

EVOLVING SNNs WITH V1 SIMPLE CELL RESPONSES AND SORF a dashed red line. The firing rate responses from the Exc group
FORMATION neurons qualitatively match the idealized V1 simple cell Gaussian
A single set of parameter values from the highest fitness tuning curves. The maximum firing rate responses of Exc group
individual (row 1, column 2 in Table 2) was used to gener- neurons were constrained by the sparsity requirement of the fit-
ate Figures 57A, 10, these parameter values can be found in ness function and peaked at an average value of 67 Hz. The firing
Supplementary 4 of the supplementary materials. Figure 5 shows rate responses were also decorrelated, another requirement of the
the firing rates of four output neurons from the Exc group in fitness function, which lead to different preferred orientations for
response to all 40 input stimulus grating orientations. Each plot each of the Exc group neurons.
represents the firing rate of an individual Exc group neuron, To examine the ability of the automated parameter tuning
denoted by a blue line, along with an idealized Gaussian tuning framework to construct SNNs capable of SORF formation, the
curve similar to those found in simple cell responses in visual cor- synaptic weights between the On(Off)Buffer groups and the
tical area V1 of the visual cortex (Henry et al., 1974), denoted by Exc group were visualized in Figure 6 for the highest fitness
SNN. Each plot is a composite of the connections between the
On(Off)Buffer group and a single Exc group neuron, where
light regions represent strong synaptic connections and dark
regions represent weak synaptic connections. Figure 6A shows

FIGURE 5 | Plot of the firing rate response of Exc group neurons vs.
grating presentation orientation angle. The blue lines indicate the firing
rate of a neuron in the simulation while the dotted red lines indicate
idealized Gaussian tuning curves. Together, the four excitatory neurons
cover the stimulus space of all the possible presentation angles.

FIGURE 7 | The responses of the Exc group neurons (identified by their


neuron id on the y-axis) were tested for all 40 grating orientations.
One orientation was presented per second and the test ran for 40 s (x-axis).
(A) Neuronal spike responses of 400 neurons trained with the highest
fitness SNN parameters found using the parameter tuning framework. (B)
Neuronal spike responses of 400 neurons trained using a single set of low
fitness parameters. The neurons were arranged such that those neurons
responding to similar orientations were grouped together for both (A,B).
This accounts for the strong diagonal pattern found in (A) and the very faint
diagonal pattern found in (B). Neuronal spike responses in (A) are sparse in
FIGURE 6 | Synaptic weights for the On(Off)Buffer Exc connections that relatively few neurons code for one orientation while neuronal spike
of a high fitness SNN individual. (A) Initial weight values before training. responses in (B) are not sparse. Additional, many of the neuronal spike
(B) After training for approximately 100 simulated minutes with STDP and responses in part (A) employ a wide range of firing rates to describe a
homeostasis, the synaptic weight patterns resemble Gabor filters. (C) Four subset of the orientation stimulus space while spike responses in (B) have
example orientation grating patterns are shown. similar firing responses across all angles in all cases.

www.frontiersin.org February 2014 | Volume 8 | Article 10 | 176


Carlson et al. Efficient spiking network parameter tuning

the initial randomized synaptic weights while Figure 6B shows SNNs (shown in Figure 8A) found using the parameter tuning
the final synaptic weights after 100 min of simulation time dur- framework along with the remaining low fitness parameter values
ing the training period. The synaptic connections between the (shown in Figure 8B). Each point represents a target Exc and Inh
On(Off)Buffer neurons and the Exc neurons in Figure 6B formed firing rate pair for a given SNN. The homeostatic target firing rate
receptive fields that resembled Gabor filters, which have been used parameter for Exc groups in high fitness SNNs is clustered around
extensively to model V1 simple cell responses (Jones and Palmer, a relatively small region (1014 Hz) when compared to the total
1987). Figure 6C shows four example counterphase sinusoidal allowed target firing rate ranges of the Exc and Inh groups which
grating orientations used as visual input into the SNN. are 1030 and 40100 Hz, respectively. The low fitness SNNs have
Figure 7 shows a raster plot of 400 Exc group neurons from Exc and Inh groups with target firing rates that have a much
100 SNNs that were trained using the highest fitness parame- wider range of values. It is interesting that successful SNNs clus-
ter values taken from row 2, column 1 of Table 2 and shown in ter around a low Exc group homeostatic firing rate (1014 Hz).
Figure 7A compared with a set of very low fitness parameters, This may be due to the interplay between STDP time windows or
fitness = 0.0978, shown in Figure 7B. Neurons that have sim- the maximum input Poisson firing rate. In high fitness SNNs, Inh
ilar preferred orientation angles have been placed close to one groups with higher homeostatic target firing rates are rare, but the
another. The high fitness neurons in Figure 7A have responses distribution of firing rates is broader.
that are sparse (only a small subset of the neurons respond to We next examined the relationship between STDP plastic-
any particular stimulus angle) and orthogonal (different neu- ity parameters among high fitness SNNs individuals exclusively.
rons respond to different stimulus orientations) while neurons in Figure 9A shows the LTD/LTP decay constant ratios, which dic-
Figure 7B do not have these properties. Although each high fit- tate the size of the LTP and LTD time windows, for Buffer to Exc
ness neuron responds to a small subset of stimulus orientations, group connections and Exc to Inh group connections. Figure 9B
taken together the high fitness neurons have responses that cover shows a comparison between LTD/LTP amplitude ratios for
all the possible stimulus orientations while low fitness neurons Buffer to Exc group connections and Exc to Inh group connec-
do not have responses that carry meaningful information in this tions. The overall parameter ranges can be found in Table 1. The
respect. Buffer to Exc decay constant ratio in Figure 9A is within close
Figure 8 compares the evolved parameters of high fitness range of experimental observations by (Bi and Poo, 1998), that
SNNs with low fitness SNNs. We judged an SNN to be high show the LTD decay constant as being roughly twice as large at
fitness if its three fitness component values met the following cut- the LTP decay constant. The Exc to Inh LTD/LTP decay constant
offs: fitnessdecorr had a cutoff value of 15, fitnessGauss had a cutoff ratio in Figure 9A has a broader distribution of values that ranged
value of 950, and fitnessmaxRate had a cutoff value of 50. We found from approximately 1 to 4. These values also fall within the range
these cutoffs produced SNNs with SORFs in the receptive fields of experimental measurements of the LTD/LTP decay constant
of at least 3 out of 4 of the Exc group neurons. There were 128 ratio of approximately one (Bell et al., 1997). High fitness SNNs
high fitness SNNs and 2752 low fitness SNNs out of the 2880 total in Figure 9B show a narrow distribution of LTD/LTP amplitude
SNNs constructed and tested by the parameter tuning framework. ratios that favor an LTD/LTP ratio less than one for Buffer to Exc
Figure 8 shows a comparison between homeostatic target fir- group connections while Exc to Inh group connections show sig-
ing rate parameters for Exc and Inh groups for high fitness nificantly broader LTD/LTP amplitude ratios with values ranging
from approximately 1 to 4.

FIGURE 8 | Plot of the target homeostatic firing rate parameters for


Exc group and Inh group for high fitness SNNs shown in (A) and low FIGURE 9 | The time windows in which STDP occurs are often
fitness SNNs shown in (B). The Exc group homeostatic target firing rate is modeled as decaying exponentials and each of the LTP and LTD
significantly more constrained (between the ranges of 1014 Hz) for the windows can be characterized by single decay constant. The degree to
high fitness SNNs as opposed to the corresponding parameters for the low which the weight is increased during LTP or decreased during LTD is often
fitness SNNs. There were 128 high fitness SNNs and 2752 low fitness called the LTP/LTD amplitude or magnitude. (A) Ratio of the STDP LTD/LTP
SNNs out of a total of 2880 individuals. EAs allow parent individuals to pass decay constant for the Buffer to Exc group connections (blue) and the Exc
high value parameter values directly to their offspring, because of this, to Inh group connections (red) for high fitness SNNs. (B) The ratio of the
there are many offspring with identical high fitness values. This explains STDP LTD/LTP amplitude for the Buffer to Exc group connections (blue) and
why there are not 128 distinct points distinguishable in (A). the Exc to Inh group connections (blue) for high fitness SNNs.

Frontiers in Neuroscience | Neuromorphic Engineering February 2014 | Volume 8 | Article 10 | 177


Carlson et al. Efficient spiking network parameter tuning

STABILITY ANALYSIS
To ensure that the solutions found by the automated tuning
framework were stable, the parameter set from the highest fit-
ness SNN was used to train and test an SNN for an additional
100 trials, allowing the SORFs to be learned through STDP and
tested as described in the previous section. That is, a single set
of parameters was tested to ensure that the ability of a nave
SNN to form SORFs was repeatable and independent of stimu-
lus presentation order. Thus, the order of stimulus presentations
was randomized between trials and each trial consisted of train-
ing and testing phases. Parameter values were deemed stable if
the SNNs consistently produced V1 simple cell-like responses and
SORFs for the majority of the trials. A robustness analysis on the
effect of small perturbations on the functional behavior of the
SNNs was not performed. To further analyze the stability of the
parameter values, firing rate responses from the Exc group over
all 100 trials were used to decode eight test angles presented to
the trained SNNs with a widely used population decoding algo-
rithm (Dayan and Abbott, 2001). At each presentation of the eight
orientation test angles, the neuronal firing rate and its preferred
stimulus orientation (i.e., the orientation for which the neuron
fired maximally) were used to create a population vector for all
the Exc neurons from the 100 trials (4 Exc neurons per trial 100
trials = 400 neurons in total). The neuronal population vec-
tors were averaged and the resultant vector was compared to the
stimulus orientation angle.
The results of the 100 training and testing trials for the iden-
tical set of parameters were as follows. 76% of the trials had
SNNs with tuning curves that qualitatively matched V1 simple
cell responses and produced Gabor filter-like SORFs. The remain-
ing 24% of the trials had SNNs with three Exc group neurons that
produced good behavior and a single Exc group neuron with a
bimodal tuning curve and a SORF that resembled two overlap-
ping Gabor filters at different angles. A population code from the
firing rate responses of the 400 Exc group neurons was used to
decode the orientation of the presented test stimuli. Figure 10
shows the population decoding results for eight presented test
angles. The smaller black arrows are neuronal responses from the
400 neurons which sum to the population vector, shown with a
blue arrow. The lengths of the individual neural response vectors
(black arrows) were normalized by dividing the mean firing rate
by 2. The length of the population vector (blue arrow) was nor-
malized by dividing the sum of the individual responses by the
magnitude of the vector. The population vector was very close to
the presented test stimulus orientation for every case with a mean
error of 3.4 and a standard deviation of 2.3 .

PERFORMANCE ANALYSIS FIGURE 10 | Population decoding of eight test presentation angles. The
To test the computational performance of the automated param- test presentation angle , is shown above each population decoding figure.
100 simulation runs, each with identical parameter values but different
eter tuning framework, three different sized SNNs were run using training presentation orders, were conducted and the firing rates of the Exc
either a serial CPU implementation or a parallel GPU implemen- group neurons were recorded. The individual responses of each of the 400
tation of CARLsim. Each SNN had identical topology except for neurons (4 Exc neurons 100 runs) are shown with solid black arrows.
the size of the On(Off)Poiss and On(Off)Buffer groups which These individuals were summed to give a population vector (shown with a
were either 16 16, 24 24, or 32 32 giving rise to networks blue arrow) that was compared to the correct presentation angle (shown
with 1032, 2312, and 4104 neurons, respectively. The number of with a red arrow). Both the population vectors and correct presentation
angle vectors were normalized while the component vectors were scaled
configurations executed in parallel on the GPU was varied from 5 down by a factor of 2 for display purposes (see text for details).
to 30 for all network sizes and execution times were recorded.

www.frontiersin.org February 2014 | Volume 8 | Article 10 | 178


Carlson et al. Efficient spiking network parameter tuning

The parallelized GPU SNN implementation showed impres- that can quickly and efficiently tune SNNs by utilizing inex-
sive speedups over the CPU SNN implementation (Figure 11). pensive, off-the-shelf GPU computing technology as a substitute
The largest speedup (65) was found when 30 SNN configura- for more expensive alternatives such as supercomputing clusters.
tions, each with 4104 neurons, were run in parallel, which took The automated parameter tuning framework consists solely of
21.1 min to complete a single generation, whereas 10 SNN con- freely available open source software. As a proof of concept, the
figurations with 4104 neurons required 26.4 min to complete a framework was used to tune 14 neural parameters in an SNN
single generation. In contrast, the CPU took 23.5 h for a single ranging from 1032 to 4104-neurons. The tuned SNNs evolved
generation. It would be interesting to compare the GPU perfor- STDP and homeostasis parameters that learned to produce V1
mance with a multi-threaded CPU simulation and there may be simple cell-like tuning curve responses and SORFs. We observed
gains in such an approach. However, in our experience SNNs on speedups of 65 using the GPU for parallelization over a CPU.
such systems do not optimize or scale as well as GPUs. Because Additionally, the solutions found by the automated parameter
the calculation of SNN neuronal and synaptic states can be cast tuning framework were shown to be stable.
as single instruction multiple data (SIMD), parallel computation There are a few research groups that have designed soft-
of SNNs is more suited to GPUs having thousands of simple ware frameworks capable of tuning large-scale SNNs. Eliasmith
cores, rather than multithreaded CPUs having many less, but et al. (2012) constructed a 2.5 million neuron simulation that
more powerful cores. demonstrated eight diverse behavioral tasks by taking a con-
As the number of concurrent SNN configurations grows, the trol theoretic approach called the Neural Engineering Framework
speedup increases slowly and nearly plateaus for 30 parallel SNN (NEF) to tune very large-scale models. The NEF is implemented
configurations. These speedup plateaus are mostly likely due to in a neural simulator called Nengo and can specify the connec-
the limitations of the GPU core number and clock-frequency, and tion weights between two neuronal populations given the input
not the GPU global memory size as 99% of the GPU was utilized population, the output population, and the desired computation
but less than 20% of the GPU memory was utilized for the largest to be performed on those representations. Our parameter tun-
simulation configurations. It should be noted that although the ing framework takes a different approach, allowing the user to
single SNN configuration was moderately sized, all 30 configu- tune not only individual synaptic weights but also parameters
rations together comprised a large-scale network (i.e., 123,120 related to plasticity rules, connection topology, and other biolog-
total neurons) that was running simultaneously. This parameter ically relevant parameters. Our framework does not require the
tuning approach can be scaled to tune larger SNNs by running user to specify the desired computations between two neuronal
fewer configurations in parallel or by spreading the computation populations but rather leaves it to the user to specify the exact
and memory usage across multiple GPUs with an MPI/CUDA fitness function. The popular SNN simulator, Brian (Goodman
implementation. and Brette, 2009), also has support for parameter tuning in the
form of a parallelized CPU/GPU tuning toolkit. Their toolkit has
DISCUSSION been used to match individual neuron models to electrophysi-
With the growing interest in large-scale neuromorphic applica- ological data and also to reduce complex biophysical models to
tions using spiking neural networks, the challenge of tuning the simple phenomenological ones (Rossant et al., 2011). Our tuning
vast number of open parameters is becoming increasingly impor- framework is focused more on tuning the overall SNN behav-
tant. We introduced an automated parameter tuning framework ior as opposed to tuning a spiking model neuron that captures
electrophysiological data.
SNNs constructed and tuned with our framework could be
converted to run on any neuromorphic device that incorporates
the AER format for spike events and supports basic connection
topologies. This is the case for many neuromorphic hardware
devices (Furber et al., 2012; Cruz-Albrecht et al., 2013; Esser et al.,
2013; Pfeil et al., 2013). Although the framework presented here
was run on the CARLsim simulator, which utilizes the Izhikevich
neuron and STDP, the automated tuning framework presented
here could readily be extended to support any spiking model, such
as the leaky integrate-and-fire neuron or the adaptive exponential
integrate-and-fire neuron (Brette and Gerstner, 2005).
SNNs with thousands of neurons, multiple plasticity rules,
homeostatic mechanisms, and feedback connections, similar to
the SNN presented here, are notoriously difficult to construct and
tune. The automated parameter tuning framework presented here
FIGURE 11 | Plot of GPU speedup over CPU vs. number of SNNs run in can currently be applied to much larger SNNs (on the scale of
parallel for different sized SNNs and different numbers of SNNs run in 106 neurons) with more complex network topologies but GPU
parallel. Three different SNN sizes were used, the blue line denotes SNNs memory constraints limit the tuning of larger SNNs. Currently,
with 1032 neurons, the green line denotes SNNs with 2312 neurons, and
CARLsim SNN simulations are limited to approximately 500 K
the red line denotes SNNs with 4104 neurons.
neurons and 100 M synapses on a single Tesla M2090 GPU, but a

Frontiers in Neuroscience | Neuromorphic Engineering February 2014 | Volume 8 | Article 10 | 179


Carlson et al. Efficient spiking network parameter tuning

version that allows SNN simulations to run across multiple GPUs Bi, G. Q., and Poo, M. M. (1998). Synaptic modifications in cultured hippocampal
is in development and will increase the size of SNNs that can be neurons: dependence on spike timing, synaptic strength, and postsynaptic cell
type. J. Neurosci. 18, 1046410472.
tuned using this framework. The combination of a multi-GPU
Boahen, K. (2005). Neuromorphic microchips. Sci. Am. 292, 5663. doi:
version of CARLsim and the implementation of more advanced 10.1038/scientificamerican0505-56
evolutionary computation principles, such as multi-objective fit- Brette, R., and Gerstner, W. (2005). Adaptive exponential integrate-and-fire model
ness functions and co-evolving populations, should allow the as an effective description of neuronal activity. J. Neurophysiol. 94, 36373642.
framework to be scalable and capable of tuning large-scale SNNs doi: 10.1152/jn.00686.2005
Brette, R., and Goodman, D. F. M. (2012). Simulating spiking neural networks on
on the scale of millions of neurons. The highly efficient automated
GPU. Network 23, 167182. doi: 10.3109/0954898X.2012.730170
parameter tuning framework presented here can reduce the time Brderle, D., Petrovici, M. A., Vogginger, B., Ehrlich, M., Pfeil, T., Millner, S.,
researchers spend constructing and tuning large-scale SNNs and et al. (2011). A comprehensive workflow for general-purpose neural modeling
could prove to be a valuable contribution to both the neuro- with highly configurable neuromorphic hardware systems. Biol. Cybern. 104,
morphic engineering and computational neuroscience research 263296. doi: 10.1007/s00422-011-0435-9
Calin-Jageman, R. J., and Katz, P. S. (2006). A distributed computing tool for
communities. generating neural simulation databases. Neural Comput. 18, 29232927. doi:
10.1162/neco.2006.18.12.2923
ACKNOWLEDGMENTS Caporale, N., and Dan, Y. (2008). Spike timing-dependent plastic-
This work was supported by the Defense Advanced Research ity: a Hebbian learning rule. Annu. Rev. Neurosci. 31, 2546. doi:
Projects Agency (DARPA) subcontract 801888-BS and by NSF 10.1146/annurev.neuro.31.060407.125639
Carlson, K. D., Richert, M., Dutt, N., and Krichmar, J. L. (2013). Biologically plau-
Award IIS/RI-1302125. We thank Micah Richert for his work sible models of homeostasis and STDP: stability and learning in spiking neural
developing the custom spiking neural network simulator and networks, in Proceedings of the 2013 International Joint Conference on Neural
homeostatic plasticity model. We also thank Michael Avery and Networks (IJCNN) (Dallas, TX). doi: 10.1109/IJCNN.2013.6706961
Michael Beyeler for valuable feedback and discussion on this Clune, J., Stanley, K. O., Pennock, R. T., and Ofria, C. (2011). On the perfor-
mance of indirect encoding across the continuum of regularity. IEEE Trans. Evol.
project. Finally, we thank the reviewers for their feedback which
Comput. 15, 346367. doi: 10.1109/TEVC.2010.2104157
greatly improved the accuracy and clarity of the manuscript. Cruz-Albrecht, J. M., Derosier, T., and Srinivasa, N. (2013). A scalable neural chip
with synaptic electronics using CMOS integrated memristors. Nanotechnology
SUPPLEMENTARY MATERIAL 24, 384011. doi: 10.1088/0957-4484/24/38/384011
The Supplementary Material for this article can be found online Dayan, P., and Abbott, L. F. (2001). Theoretical Neuroscience. Cambridge: MIT
press.
at: http://www.frontiersin.org/journal/10.3389/fnins.2014.
De Jong, K. A. (2002). Evolutionary Computation: A Unified Approach. Cambridge:
00010/abstract The MIT Press.
de Ladurantaye, V., Lavoie, J., Bergeron, J., Parenteau, M., Lu, H., Pichevar, R., et al.
REFERENCES (2012). A parallel supercomputer implementation of a biological inspired neu-
Abbott, L. F., and Nelson, S. B. (2000). Synaptic plasticity: taming the beast. Nat. ral network and its use for pattern recognition. J. Phys. Conf. Ser. 341, 012024.
Neurosci. 3, 11781183. doi: 10.1038/81453 doi: 10.1088/1742-6596/341/1/012024
Ahmadi, A., and Soleimani, H. (2011). A GPU based simulation of multilayer spik- Djurfeldt, M., Ekeberg, O., and Lansner, A. (2008). Large-scale modelinga tool
ing neural networks, in Proceedings of the 2011 Iranian Conference on Electrical for conquering the complexity of the brain. Front. Neuroinformatics 2:1. doi:
Engineering (ICEE) (Tehran), 15. 10.3389/neuro.11.001.2008
Amir, A., Datta, P., Risk, W. P., Cassidy, A. S., Kusnitz, J. A., Esser, S. K., et al. Ehrlich, M., Wendt, K., Zhl, L., Schffny, R., Brderle, D., Mller, E., et al. (2010).
(2013). Cognitive computing programming paradigm: a corelet language A software framework for mapping neural networks to a wafer-scale neu-
for composing networks of neurosynaptic cores, in Proceedings of the 2013 romorphic hardware system, in Proceedings of ANNIIP (Funchal, Madeira),
International Joint Conference on Neural Networks (IJCNN) (Dallas, TX). doi: 4352.
10.1109/IJCNN.2013.6707078 Eliasmith, C., Stewart, T. C., Choo, X., Bekolay, T., DeWolf, T., Tang, Y., et al.
Avery, M., Krichmar, J. L., and Dutt, N. (2012). Spiking neuron model of (2012). A large-scale model of the functioning brain. Science 338, 12021205.
basal forebrain enhancement of visual attention, in Proccedings of the 2012 doi: 10.1126/science.1225266
International Joint Conference on Neural Networks (IJCNN) (Brisbane, QLD), Esser, S. K., Andreopoulus, A., Appuswamy, R., Datta, P., Barch, D., Amir,
18. doi: 10.1109/IJCNN.2012.6252578 A., et al. (2013). Cognitive computing systems: algorithms and appli-
Baladron, J., Fasoli, D., and Faugeras, O. (2012). Three applications of cations for networks of neurosynaptic cores, in Proceedings of the 2013
GPU computing in neuroscience. Comput. Sci. Eng. 14, 4047. doi: International Joint Conference on Neural Networks (IJCNN) (Dallas, TX). doi:
10.1109/MCSE.2011.119 10.1109/IJCNN.2013.6706746
Beer, R. D. (2000). Dynamical approaches to cognitive science. Trends Cogn. Sci. 4, Fidjeland, A. K., Roesch, E. B., Shanahan, M. P., and Luk, W. (2009). NeMo: a
9199. doi: 10.1016/S1364-6613(99)01440-0 platform for neural modelling of spiking neurons using GPUs, in Application-
Bell, C. C., Han, V. Z., Sugawara, Y., and Grant, K. (1997). Synaptic plasticity in a specific Systems, Architectures and Processors, 2009 ASAP 2009. 20th IEEE
cerebellum-like structure depends on temporal order. Nature 387, 278281. doi: International Conference on, (Boston, MA), 137144.
10.1038/387278a0 Floreano, D., and Urzelai, J. (2001). Neural morphogenesis, synaptic plas-
Ben-Shalom, R., Aviv, A., Razon, B., and Korngreen, A. (2012). Optimizing ion ticity, and evolution. Theory Biosci. 120, 225240. doi: 10.1007/s12064-
channel models using a parallel genetic algorithm on graphical processors. 001-0020-1
J. Neurosci. Methods 206, 183194. doi: 10.1016/j.jneumeth.2012.02.024 Fogel, D. B., Fogel, L. J., and Porto, V. W. (1990). Evolving neural networks. Biol.
Bernhard, F., and Keriven, R. (2006). Spiking neurons on GPUs, in Computational Cybern. 63, 487493. doi: 10.1007/BF00199581
ScienceICCS 2006 Lecture Notes in Computer Science, eds V. Alexandrov, G. Furber, S. B., Lester, D. R., Plana, L. A., Garside, J. D., Painkras, E., Temple, S., et al.
Albada, P. A. Sloot, and J. Dongarra (Berlin; Heidelberg: Springer), 236243. (2012). Overview of the SpiNNaker system architecture. IEEE Trans. Comput.
Bhuiyan, M. A., Pallipuram, V. K., and Smith, M. C. (2010). Acceleration 62, 2454. doi: 10.1109/TC.2012.142
of spiking neural networks in emerging multi-core and GPU archi- Gao, P., Benjamin, B. V., and Boahen, K. (2012). Dynamical system guided
tectures, in Parallel Distributed Processing, Workshops and Phd Forum mapping of quantitative neuronal models onto neuromorphic hardware.
(IPDPSW), 2010 IEEE International Symposium on (Atlanta, GA), 18. doi: IEEE Trans. Circuits Syst. Regul. Pap. 59, 23832394. doi: 10.1109/TCSI.
10.1109/IPDPSW.2010.5470899 2012.2188956

www.frontiersin.org February 2014 | Volume 8 | Article 10 | 180


Carlson et al. Efficient spiking network parameter tuning

Gauci, J., and Stanley, K. O. (2010). Autonomous evolution of topographic reg- Nageswaran, J. M., Dutt, N., Krichmar, J. L., Nicolau, A., and Veidenbaum,
ularities in artificial neural networks. Neural Comput. 22, 18601898. doi: A. (2009a). Efficient simulation of large-scale spiking neural networks
10.1162/neco.2010.06-09-1042 using CUDA graphics processors, in Proceedings of the 2009 International
Giordano, N. J., and Nakanishi, H. (2006). Computational Physics. 2nd Edn. Upper Joint Conference on Neural Networks (IJCNN). (Piscataway, NJ: IEEE Press),
Saddle River, NJ: Pearson Prentice Hall. 32013208.
Gomez, F., and Miikkulainen, R. (1997). Incremental evolution of complex general Nageswaran, J. M., Dutt, N., Krichmar, J. L., Nicolau, A., and Veidenbaum, A. V.
behavior. Adapt. Behav. 5, 317342. doi: 10.1177/105971239700500305 (2009b). A configurable simulation environment for the efficient simulation of
Goodman, D. F. M., and Brette, R. (2009). The brian simulator. Front. Neurosci. large-scale spiking neural networks on graphics processors. Neural Netw. 22,
3:192197. doi: 10.3389/neuro.01.026.2009 791800. doi: 10.1016/j.neunet.2009.06.028
Han, B., and Taha, T. M. (2010). Neuromorphic models on a GPGPU cluster, Nageswaran, J. M., Richert, M., Dutt, N., and Krichmar, J. L. (2010). Towards
in Proceedings of the 2010 International Joint Conference on Neural Networks reverse engineering the brain: modeling abstractions and simulation frame-
(IJCNN) (Barcelona), 18. doi: 10.1109/IJCNN.2010.5596803 works, in VLSI System on Chip Conference (VLSI-SoC), 2010 18th IEEE/IFIP,
Hancock, P. J. B. (1992). Genetic algorithms and permutation problems: a (Madrid), 16. doi: 10.1109/VLSISOC.2010.5642630
comparison of recombination operators for neural net structure specifi- Neftci, E., Binas, J., Rutishauser, U., Chicca, E., Indiveri, G., and Douglas, R. J.
cation, in International Workshop on Combinations of Genetic Algorithms (2013). Synthesizing cognition in neuromorphic electronic systems. Proc. Natl.
and Neural Networks, 1992, COGANN-92, (Baltimore, MD), 108122. doi: Acad. Sci. U.S.A. 110:E3468E3476. doi: 10.1073/pnas.1212083110. Available
10.1109/COGANN.1992.273944 online at: http://www.pnas.org/content/early/2013/07/17/1212083110
Hendrickson, E. B., Edgerton, J. R., and Jaeger, D. (2011). The use of auto- Nickolls, J., Buck, I., Garland, M., and Skadron, K. (2008). Scalable parallel
mated parameter searches to improve ion channel kinetics for neural modeling. programming with CUDA. Queue 6, 4053. doi: 10.1145/1365490.1365500
J. Comput. Neurosci. 31, 329346. doi: 10.1007/s10827-010-0312-x Nowotny, T. (2010). Parallel implementation of a spiking neuronal network model
Henry, G., Dreher, B., and Bishop, P. (1974). Orientation specificity of cells in cat of unsupervised olfactory learning on NVidia CUDA, in The Proceedings of the
striate cortex. J. Neurophysiol. 37, 13941409. 2010 International Joint Conference on Neural Networks (IJCNN) (Barcelona),
Hoffmann, J., El-Laithy, K., Gttler, F., and Bogdan, M. (2010). Simulating 18. doi: 10.1109/IJCNN.2010.5596358
biological-inspired spiking neural networks with OpenCL, in Artificial Nowotny, T. (2011). Flexible neuronal network simulation framework using
Neural NetworksICANN 2010 Lecture Notes in Computer Science, eds code generation for NVidia(R) CUDATM. BMC Neurosci. 12:P239. doi:
K. Diamantaras, W. Duch, and L. Iliadis (Berlin; Heidelberg: Springer), 10.1186/1471-2202-12-S1-P239
184187. Pallipuram, V. K., Smith, M. C., Raut, N., and Ren, X. (2012). Exploring
Husbands, P., Smith, T., Jakobi, N., and OShea, M. (1998). Better living through multi-level parallelism for large-scale spiking neural networks, in Proceedings
chemistry: evolving GasNets for robot control. Connect. Sci. 10, 185210. doi: of the International Conference on Parallel and Distributed Techniques and
10.1080/095400998116404 Applications (PDPTA 2012) held in conjunction with WORLDCOMP 2012,
Igarashi, J., Shouno, O., Fukai, T., and Tsujino, H. (2011). Real-time simulation (Las Vegas, NV), 773779.
of a spiking neural network model of the basal ganglia circuitry using general Pfeil, T., Grbl, A., Jeltsch, S., Mller, E., Mller, P., Schmuker, M., et al. (2013). Six
purpose computing on graphics processing units. Neural Netw. 24, 950960. networks on a universal neuromorphic computing substrate. Front. Neurosci.
doi: 10.1016/j.neunet.2011.06.008 7:11. doi: 10.3389/fnins.2013.00011
Indiveri, G., Linares-Barranco, B., Hamilton, T. J., van Schaik, A., Etienne- Pinto, N., Doukhan, D., DiCarlo, J. J., and Cox, D. D. (2009). A high-throughput
Cummings, R., Delbruck, T., et al. (2011). Neuromorphic silicon neuron screening approach to discovering good forms of biologically inspired
circuits. Front. Neurosci. 5:73. doi: 10.3389/fnins.2011.00073 visual representation. PLoS Comput. Biol. 5:e1000579. doi: 10.1371/jour-
Izhikevich, E. M. (2003). Simple model of spiking neurons. IEEE Trans. Neural nal.pcbi.1000579
Netw. 14, 15691572. doi: 10.1109/TNN.2003.820440 Prinz, A. A., Billimoria, C. P., and Marder, E. (2003). Alternative to hand-tuning
Izhikevich, E. M., and Desai, N. S. (2003). Relating STDP to BCM. Neural Comput. conductance-based models: construction and analysis of databases of model
15, 15111523. doi: 10.1162/089976603321891783 neurons. J. Neurophysiol. 90, 39984015. doi: 10.1152/jn.00641.2003
Izhikevich, E. M., and Edelman, G. M. (2008). Large-scale model of mammalian Prinz, A. A., Bucher, D., and Marder, E. (2004). Similar network activity from
thalamocortical systems. Proc. Natl. Acad. Sci. U.S.A. 105, 35933598. doi: disparate circuit parameters. Nat. Neurosci. 7, 13451352. doi: 10.1038/nn1352
10.1073/pnas.0712231105 Richert, M., Nageswaran, J. M., Dutt, N., and Krichmar, J. L. (2011). An efficient
Izhikevich, E. M., Gally, J. A., and Edelman, G. M. (2004). Spike-timing dynamics simulation environment for modeling large-scale cortical processing. Front.
of neuronal groups. Cereb. Cortex 14, 933944. doi: 10.1093/cercor/bhh053 Neuroinform. 5:19. doi: 10.3389/fninf.2011.00019
Jones, J., and Palmer, L. (1987). An evaluation of the two-dimensional gabor fil- Risi, S., and Stanley, K. O. (2012). An enhanced hypercube-based encoding for
ter model of simple receptive-fields in cat striate cortex. J. Neurophysiol. 58, evolving the placement, density, and connectivity of neurons. Artif. Life 18,
12331258. 331363. doi: 10.1162/ARTL_a_00071
Keijzer, M., Merelo, J. J., Romero, G., and Schoenauer, M. (2002). Evolving objects: Rossant, C., Goodman, D. F., Fontaine, B., Platkiewicz, J., Magnusson, A. K., and
a general purpose evolutionary computation library, in Artficial Evolution, eds Brette, R. (2011). Fitting neuron models to spike trains. Front. Neurosci. 5:9. doi:
P. Collet, C. Fonlupt, J. K. Hao, E. Lutton, and M. Schoenauer (Berlin: Springer- 10.3389/fnins.2011.00009
Verlag), 231242. Schliebs, S., Defoin-Platel, M., Worner, S., and Kasabov, N. (2009). Integrated
Krichmar, J. L., Dutt, N., Nageswaran, J. M., and Richert, M. (2011). feature and parameter optimization for an evolving spiking neural network:
Neuromorphic modeling abstractions and simulation of large-scale exploring heterogeneous probabilistic models. Neural Netw. 22, 623632. doi:
cortical networks, in Proceedings of the 2011 IEEE/ACM International 10.1016/j.neunet.2009.06.038
Conference on Computer-Aided Design (ICCAD) (San Jose, CA), 334338. doi: Schliebs, S., Kasabov, N., and Defoin-Platel, M. (2010). On the probabilistic opti-
10.1109/ICCAD.2011.6105350 mization of spiking neural networks. Int. J. Neural Syst. 20, 481500. doi:
Maitre, O., Baumes, L. A., Lachiche, N., Corma, A., and Collet, P. (2009). Coarse 10.1142/S0129065710002565
grain parallelization of evolutionary algorithms on GPGPU cards with EASEA, Seung, H. S., Lee, D. D., Reis, B. Y., and Tank, D. W. (2000). Stability of the memory
in Proceedings of the 11th Annual conference on Genetic and evolutionary compu- of eye position in a recurrent network of conductance-based model neurons.
tation (Montreal, QC), 14031410. Neuron 26, 259271. doi: 10.1016/S0896-6273(00)81155-1
Markram, H. (2006). The blue brain project. Nat. Rev. Neurosci. 7, 153160. doi: Sheik, S., Stefanini, F., Neftci, E., Chicca, E., and Indiveri, G. (2011). Systematic
10.1038/nrn1848 configuration and automatic tuning of neuromorphic systems, in Circuits and
Mirsu, R., Micut, S., Caleanu, C., and Mirsu, D. B. (2012). Optimized simulation Systems (ISCAS), 2011 IEEE International Symposium on, (Rio de Janeiro),
framework for spiking neural networks using GPUs. Adv. Electr. Comp. Eng. 12, 873876. doi: 10.1109/ISCAS.2011.5937705
6168. doi: 10.4316/aece.2012.02011 Song, S., Miller, K. D., and Abbott, L. F. (2000). Competitive hebbian learning
Mongillo, G., Barak, O., and Tsodyks, M. (2008). Synaptic theory of working through spike-timing-dependent synaptic plasticity. Nat. Neurosci. 3, 919926.
memory. Science 319, 15431546. doi: 10.1126/science.1150769 doi: 10.1038/78829

Frontiers in Neuroscience | Neuromorphic Engineering February 2014 | Volume 8 | Article 10 | 181


Carlson et al. Efficient spiking network parameter tuning

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 26 August 2013; accepted: 17 January 2014; published online: 04 February 2014.

Citation: Carlson KD, Nageswaran JM, Dutt N and Krichmar JL (2014) An efficient automated parameter tuning framework for spiking neural networks. Front. Neurosci. 8:10. doi: 10.3389/fnins.2014.00010

This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.

Copyright © 2014 Carlson, Nageswaran, Dutt and Krichmar. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.