
Neural-network-based analysis of EEG data using the neuromorphic TrueNorth chip for brain-machine interfaces

B. S. Mashford, A. Jimeno Yepes, I. Kiral-Kornek, J. Tang, S. Harrer

Electroencephalography (EEG) is a noninvasive way to record brain
activity by means of measuring electrical fields arising from neural
activation. Being relatively inexpensive, safe, and readily available,
EEG-based techniques have been studied as potential methods for
controlling brain-machine interfaces. Previous attempts to analyze
EEG signals have focused on well-characterized sensorimotor data
features. However, the brain-machine interface field seems to have
stagnated in improving motor decoding using this method. One way
to overcome this hurdle is to use neural-network-based classification
methods to analyze brain-activity data. In this paper, we describe the
novel neural networks we created for analyzing existing EEG data.
Although these neural networks were programmed, trained, and
tested in a conventional central processing unit or graphics
processing unit environment, their novelty lies in their full
compatibility with IBM’s recently introduced ultralow-power
neuromorphic TrueNorth chip infrastructure, thus constituting the
analytical units in the next generation of neurobionic mobile devices.
We report on the development of a new EEG signal classifier built on
a spiking neural network that runs on the TrueNorth platform. Using
a modified back-propagation training method that employs trinary
weights, we demonstrate state-of-the-art classification accuracy.

Introduction
The human brain has the extraordinary ability of reliably finding patterns in noisy data. Deep-learning technology uses deep neural networks that are designed to mimic the human brain and its capabilities in this regard. Much like in an actual brain, information in artificial neural networks is passed through several layers, each contributing to the processing of the signal. Using back-propagation algorithms, connection strengths (weights) between single units of the layers are adjusted automatically in order to achieve a certain result.

With increasing computing power, the application space for neural networks has been growing vastly over the last few decades [1]. This has led to recent breakthroughs in a range of machine learning fields, including the classification of images [2], audio data [3], natural language processing [4], and robotic control [5]. If humans can find a pattern in data, then it is likely that well-designed deep neural networks can do so, too. In some cases, their performance exceeds ours [6].

Given these advances, deep neural networks have been proposed for addressing the technical challenge of classifying data obtained from the human brain directly for enabling novel types of brain-machine interfaces. However, decoding brain signals, such as those from electroencephalography (EEG) recordings, poses many challenges because of the large degree of variation in signals, arising from intersubject anatomical variations, inconsistent positioning of electrodes, and physiological contingencies such as perspiration.

The traditional pipeline for analyzing EEG signals first involves preprocessing the raw data to extract features.

Digital Object Identifier: 10.1147/JRD.2017.2663978

© Copyright 2017 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without
alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed
royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

0018-8646/17 © 2017 IBM

IBM J. RES. & DEV. VOL. 61 NO. 2/3 PAPER 7 MARCH/MAY 2017 B. S. MASHFORD ET AL. 7:1
Figure 1
Pipeline for moving EEG data from the laboratory, through pre-processing and training phases, before deployment onto TrueNorth hardware.
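
As a rough illustration of the feature-extraction step in the traditional EEG pipeline, the following minimal Python sketch estimates the signal intensity within two frequency bands from the FFT of a single-channel segment. It is not the method used in this paper. The function name band_power, the band edges (8 to 12 Hz and 18 to 26 Hz), and the synthetic test signal are assumptions chosen for illustration; only the 125 Hz sampling rate and the 375-sample segment length are taken from the dataset description.

```python
import numpy as np

def band_power(signal, fs, f_lo, f_hi):
    """Mean spectral power of `signal` between f_lo and f_hi (in Hz)."""
    signal = np.asarray(signal, dtype=float)
    # Power spectrum of the real-valued signal, normalized by its length.
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return float(spectrum[mask].mean())

fs = 125.0                      # sampling rate of the dataset (Hz)
t = np.arange(375) / fs         # 3 s segment, 375 samples
rng = np.random.default_rng(0)

# Synthetic stand-in for one EEG channel: a 10 Hz rhythm plus weak noise.
segment = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.standard_normal(t.size)

# Hand-crafted feature vector: power in two assumed bands.
features = np.array([
    band_power(segment, fs, 8.0, 12.0),
    band_power(segment, fs, 18.0, 26.0),
])
```

A vector such as features would then be passed to a conventional classifier. The choice of bands is exactly the kind of prior assumption that the neural-network approach aims to avoid.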

These features include the signal intensity within particular frequency bands that are fed into a classifier. From this information, predictions are then made about unknown measurement signals. The major limitation of this approach lies in the fact that it relies on assumptions about which features are the most relevant ones for the classification task at hand.

Recently, it has been shown that deep neural networks can effectively classify [7] EEG signals without requiring hand-crafted features. However, on a traditional computing platform [e.g., a desktop graphics processing unit (GPU)], running data through a deep neural network requires a substantial amount of electrical power and computational resources. Thus, to date, such computations are often done on supercomputers. Yet, in the area of healthcare, mobility and low power consumption are two of the most important features for any device. Even though deep neural networks show promise in handling brain-activity data, due to these two requirements, their usefulness for patient devices in this space has been limited.

With the development of the TrueNorth chip, we are now in a position where these concerns can be addressed. As background, note that IBM has recently introduced a novel brain-inspired hardware platform called TrueNorth. Being a so-called neuromorphic chip, TrueNorth is capable of performing a fundamentally new type of computing that replaces conventional processing steps with brain-inspired operations. This allows TrueNorth to analyze certain types of data much more efficiently than any conventional state-of-the-art platform [8]. After training a neural network on data in a conventional CPU/GPU environment, the obtained weights can be transferred onto the TrueNorth chip. Subsequent data can then be classified in near real time on the chip directly. During operation, TrueNorth requires less than 70 mW power while operating at maximum capacity, making it an ideal platform for assistive patient devices, such as brain-machine interfaces.

The complete processing pipeline for building a TrueNorth-compatible neural network is shown in Figure 1. Once deployed onto the TrueNorth chip, the network weights cannot change their value. For this reason, training takes place off-line on a traditional computing platform (GPU or CPU). The backpropagation training method used here is closely related to other machine learning methods but with a number of key technical differences. TrueNorth neuromorphic hardware implements a spiking network where information flows from neurons to axons via spikes (binary signals). Neurons and axons are grouped into neurosynaptic cores with 256 axons and 256 neurons each, which limits the topology of the network. Neurons and axons in a core are connected by a 256 × 256 synaptic crossbar, which indicates whether an axon is connected to a neuron. Each axon is assigned one of four available axon types. The strength of each axon signal in the neuron integration function is defined by the weight that the neuron assigns to each axon type. The output of each core can be connected to other cores in the TrueNorth chip. One complete cycle of computation by all the neurons in all the cores is denoted as one tick. Constraining weight values into binary signals leads to important differences in the required processing hardware, since many multiply-accumulate operations will be replaced by simple accumulations. Multiplier hardware components consume the largest fraction of power in digital neural networks, and so a spiking network of this type yields a large decrease in chip power consumption. Although there is an inevitable loss of precision when transforming complex signals to spiking representation, recent developments in training models constrained to trinary weights (values of −1, 0, or
+1) have demonstrated performance levels across a range of benchmark tasks that are comparable to state-of-the-art networks using floating point values [9].

In this work, a modified network training algorithm is utilized whereby the synaptic weights are constrained to values of −1, 1, or 0 (no crossbar connection) during the training phase. Previously, the conversion to trinary weighted synapses occurred at the conclusion of the training phase, leading to a discrepancy between the classification performance of the training phase and of deployment models [10]. We have described the full details of this development elsewhere [11].

Method
We apply TrueNorth-compatible neural networks to an EEG dataset that was recorded using a participant performing left-hand and right-hand motor imagery [12]. The dataset was recorded at the Institute for Biomedical Engineering, University of Technology Graz and was made available in the BCI (brain-computer interface) Competition III [13]. In this study, we focus on data from participant S4. The experiment involved seated participants watching an image on the monitor that showed a ball falling into one of two colored baskets. The subject was tasked with controlling the movement of the ball by imagination of either left or right hand movement. The order of left and right hand cues was random. For each participant, the data were collected over three experimental sessions, but we append all three sessions to ensure the dataset is sufficiently large for network training. The challenge originally specified in the BCI IIIb competition involves a time-varying EEG signal to be classified via a non-stationary classifier. In this current study, we limit our analysis to a static classifier trained on the full presentation window, which is a uniform 4- to 7-second period within each 8-second-long trial. The data recordings were made with two bipolar EEG channels at a sampling rate of 125 Hz and were then bandpass filtered between 0.5 and 30 Hz. Each of the two-channel samples consists of 242 data points per channel (measured electrical field at the location of the electrode) taken from the 375 possible data points that were recorded within the 3-second-long experimental window. In the current implementation of TrueNorth neural network architecture, the input dimensions must be a square, so the two channels of data were appended to form a 22 × 22 input signal (2 × 242 = 484 = 22 × 22 values). An illustration of the workflow is shown in Figure 2.

We compare the classification accuracy that may be reached by using two different neural network configurations. The configurations differ in their number of layers and parameters. The TrueNorth processor is composed of 4,096 cores, with each core having 256 input axons and 256 output neurons. In the two-layer network configuration, we use 8,704 neurons (out of a total of 1 million available on the TrueNorth processor), while the four-layer network uses 30,976 neurons. A summary of the number of cores used to implement each layer in the network is shown in Table 1.

The network layout follows a multilayer feedforward scheme (see [10] for a more detailed explanation). The first input layer receives the transduced input data while the following layers have as input the output from previous cores. Due to the TrueNorth neurosynaptic core design, the maximum number of inputs is 256, which means that only a part of the input data can be input to a core. To overcome this problem, a square-shaped subsection of the input data is mapped to each core in the first layer, with a sliding-window approach used to cover all the input data. In the following layers, the output from the previous layer is mapped to cores in the next layer in a similar fashion. The number of cores required by each network is calculated using the configuration of the network and the size of the EEG data examples.

In each network, we utilize only a small fraction of the 4,096 available cores on the TrueNorth processor. When operating at full capacity, the TrueNorth processor consumes approximately 70 mW. The networks described here occupy only a fraction of the processor capacity, and therefore the power consumption level is in the range of tens of milliwatts. A more precise measurement of the power level is problematic due to technical issues of separating power consumption of the TrueNorth processor from auxiliary electronic components. This power level is several orders of magnitude lower than that typically consumed by a desktop GPU [14]. For a wearable device, for which the capacity of the connected power source (e.g., lithium-ion battery) is severely constrained, power requirements become a major consideration. It is therefore desirable to use a network configuration that is as small as possible, while still delivering an adequate level of performance. We also must consider that although a larger network (containing more neurons) can lead to a better fit on complex datasets, it can also lead to overfitting, which is a major challenge whenever the dataset is of limited size. The TrueNorth-constrained networks were trained over 240 iterations using three learning rate steps (0.2, 0.02, and 0.002) of 80 iterations each. One common approach to minimizing overfitting in image analysis tasks is to implement dropout [15]. A dropout rate of 0.5 was applied between each layer, which was found to greatly decrease the accuracy gap between training and testing sets. A different technique to address the issue of network overfitting is to artificially expand the size of the dataset by applying transformations to the images (cropping, rotating, flipping, etc.). We implemented a related idea here by applying a fixed-length window that slides to random locations across the signal. Using the technique,
Figure 2
Workflow for our method. Top: Raw signal data showing a single sample converted to a 22 × 22 pixel image. Bottom: Network configuration for the
two-layer network used in this study.
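
The conversion shown in the top panel of Figure 2, together with the random-window augmentation described in the Method section, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The sample counts (375 and 242 per channel), the 22 × 22 target shape, and the factor of five come from the text. The helper names random_window, to_square, and augment, the uniform choice of window offsets, the channel order, and the row-major reshape are assumptions.

```python
import numpy as np

N_TRIAL = 375    # samples per channel in the 3 s window at 125 Hz (from the text)
N_WINDOW = 242   # samples per channel kept for one example (from the text)
SIDE = 22        # 2 channels * 242 samples = 484 = 22 * 22

def random_window(trial, rng):
    """Crop a fixed-length window at a random offset; trial has shape (2, N_TRIAL)."""
    # Assumption: offsets are drawn uniformly over all valid start positions.
    start = rng.integers(0, N_TRIAL - N_WINDOW + 1)
    return trial[:, start:start + N_WINDOW]

def to_square(window):
    """Append the two channels and reshape to SIDE x SIDE (row-major order assumed)."""
    return np.concatenate([window[0], window[1]]).reshape(SIDE, SIDE)

def augment(trial, n_copies, rng):
    """Build n_copies square inputs from one trial using random windows."""
    return np.stack([to_square(random_window(trial, rng)) for _ in range(n_copies)])

rng = np.random.default_rng(0)
trial = rng.standard_normal((2, N_TRIAL))   # synthetic stand-in for one recorded trial
images = augment(trial, n_copies=5, rng=rng)
```

Each trial here yields five 22 × 22 inputs, which mirrors the fivefold expansion of the training set. Whether the original, unshifted window counts as one of the five is not stated in the text, so that detail is left open.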

the number of samples in the training data was expanded by a factor of five.

Results and discussion
The classification accuracy results are as follows: the two-layer network achieved 78.1% classification accuracy when the original training set was used and 79.3% when applied to the temporal-window augmented dataset. The four-layer network achieved 79.8% accuracy with the original dataset and 84.2% with the augmented dataset. For the two-layer network, the gap in performance between the augmented and original datasets is relatively small (1.2 percentage points). The gap widens for the larger four-layer network, where data augmentation leads to an increase in accuracy of about 4 percentage points. This result can be understood by considering the higher likelihood of overfitting that is incurred by adding a higher number of parameters to a neural network.

Table 1 Structure of the two different neural network configurations used in this study, showing the number of hardware cores used. (n/a: not applicable.)

The S4 dataset has been analyzed via a variety of classification methods, including support vector machines and linear discriminant analysis [16]. Each of these methods requires some variety of feature engineering to be applied to the data. The resulting accuracy level we achieved by
purely applying the backpropagation-based training method we developed [11] exceeds previously reported accuracy levels as obtained by the above-mentioned conventional classification techniques.

Another important factor that determines how efficiently the TrueNorth platform can process real-time EEG signals is the representation of the input signal data. Data input into the TrueNorth network is represented via stochastically generated spikes. In this scheme, each floating-point input value is converted to a representation consisting of a sequence of spikes received within a temporal window of specified length. The length of this window can be adjusted, permitting a tradeoff between precision and input bandwidth. An examination of this relationship is given in Figure 3. Here, we see a considerable reduction in classification accuracy when the input window is shorter than 64 ticks. At longer windows, no significant increase is measured, indicating that sufficient precision has been reached to fully represent the input signal. The TrueNorth processor operates at 1,000 ticks per second; thus, an input sampling rate of 150 Hz would not be possible with the current implementation of data encoding while achieving maximum accuracy. This issue is addressed in later implementations of TrueNorth-compatible networks that permit far higher input bandwidth [17].

Figure 3
Classification accuracy (using the training result for the four-layer network configuration and augmented dataset) as a function of the input data rate. At higher input data rates, no further increases in accuracy were observed.

Conclusion
We have developed a classifier for two-channel EEG data based on a TrueNorth-constrained neural network. When an optimized four-layer network was used, a classification accuracy of 84% was demonstrated. Augmentation of the dataset via randomly sampling over the time domain was found to be useful in reducing overfitting of the model. These results demonstrate that a single TrueNorth processor can decode EEG signals with an accuracy that is equivalent to or exceeds that achievable on traditional computing platforms, opening up new possibilities for mobile, low-power devices for wearable healthcare and medical applications. Future work will focus on performance improvement of the developed EEG data classifiers and on exploring their applicability in neurobionics and in epileptic seizure monitoring and prediction.

References
1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
2. A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
3. H. Lee, P. Pham, Y. Largman, and A. Y. Ng, “Unsupervised feature learning for audio classification using convolutional deep belief networks,” in Proc. Adv. Neural Inf. Process. Syst., 2009, pp. 1096–1104.
4. R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proc. 25th Int. Conf. Mach. Learn., Jul. 2008, pp. 160–167.
5. S. Levine and P. Abbeel, “Learning neural network policies with guided policy search under unknown dynamics,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 1071–1079.
6. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, and S. Dieleman, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
7. E. S. Nurse, P. J. Karoly, D. B. Grayden, and D. R. Freestone, “A generalizable brain-computer interface (BCI) using machine learning for feature discovery,” PloS One, vol. 10, no. 6, 2015, Art. no. e0131328.
8. P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science, vol. 345, no. 6197, pp. 668–673, 2014.
9. M. Courbariaux, Y. Bengio, and J. P. David, “Binaryconnect: Training deep neural networks with binary weights during propagations,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 3123–3131.
10. S. K. Esser, R. Appuswamy, P. Merolla, J. V. Arthur, and D. S. Modha, “Backpropagation for energy-efficient neuromorphic computing,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 1117–1125.
11. A. J. Yepes and J. Tang, “Improving energy efficiency and classification accuracy of neuromorphic chips by learning binary synaptic crossbars,” arXiv:1605.07740, 2016.
12. N. Brodu, F. Lotte, and A. Lecuyer, “Exploring two novel features for EEG-based brain–computer interfaces: Multifractal cumulants and predictive complexity,” Neurocomputing, vol. 79, pp. 87–94, 2012.
13. C. Vidaurre, A. Schlögl, R. Cabeza, and G. Pfurtscheller, “A fully on-line adaptive brain computer interface,” Biomed. Tech. Band, vol. 49 (Special issue), pp. 760–761, 2004.
14. Y. H. Lu, A. M. Kadin, A. C. Berg, T. M. Conte, E. P. DeBenedictis, R. Garg, G. Gingade, B. Hoang, Y. Huang, B. Li, and J. Liu, “Rebooting computing and low-power image recognition challenge,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2015, pp. 927–932.
15. N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
16. BCI Competition 2005 data set IIIa and IIIb. [Online]. Available: http://www.bbci.de/competition/iii/
17. S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, D. R. Barch, C. di Nolfo, P. Datta, A. Amir, B. Taba, M. D. Flickner, and D. S. Modha, “Convolutional networks for fast, energy-efficient neuromorphic computing,” in Proc. Natl. Acad. Sci., p. 201604850, 2016.

Received May 20, 2016; accepted for publication June 15, 2016. Date of current version May 5, 2017.

Benjamin S. Mashford IBM Research, Melbourne, VIC 3053, Australia. Dr. Mashford received his Ph.D. degree in Nanoscience from the University of Melbourne in 2010, an honor’s degree in physics from Monash University in 2005, and a bachelor’s degree of applied physics from RMIT in 2003. He is a Research Staff Member in the Brain-Inspired Computing Research program at IBM Research - Australia. Previously, he has held positions of senior scientist at QD Vision, a Boston-based nanotechnology startup company, as a researcher at Swinburne University, and as a software developer. He is author or coauthor of ten technical publications and five patents.

Antonio Jimeno Yepes IBM Research, Melbourne, VIC 3053, Australia. Dr. Jimeno Yepes received his M.S. degree in computer science in 2001, an M.S. degree in intelligent systems in 2008, and his Ph.D. degree, in 2009, in computer science from the University Jaume I, Spain. He is a senior researcher working in the fields of brain-inspired computing and text analytics in the Healthcare Group at IBM Research - Australia. He previously worked as a software engineer at CERN from 2000 to 2006, as a software engineer at the European Bioinformatics Institute from 2006 to 2010, as researcher at the U.S. National Library of Medicine from 2010 to 2012, as a researcher at National ICT Australia from 2012 to 2014, and at the University of Melbourne in the CIS (Computing and Information Systems) department in 2014.

Isabell Kiral-Kornek IBM Research, Melbourne, VIC 3053, Australia. Dr. Kiral-Kornek is a postdoctoral researcher in the Computational Science Group at IBM Research - Australia. In 2009, she received a Dip.-Ing. degree in Electrical and Electronic Engineering from the University of Hanover, Germany. Dr. Kiral-Kornek then joined the startup company Trivacos. She joined IBM in 2015 after receiving her Ph.D. degree with Bionic Vision Australia at the University of Melbourne. Dr. Kiral-Kornek has authored or coauthored more than 11 technical publications.

Jianbin Tang IBM Research, Melbourne, VIC 3053, Australia. Mr. Tang is a Research Staff Member in the Brain-Inspired Computing Research program of IBM Research - Australia. He received his B.S. degree in information science and technology and his M.S. degree in information and communication systems in 1999 and 2002, respectively, from the Xi’an Jiaotong University, Shaanxi, China. He subsequently joined Huawei Technologies as an algorithm engineer where he worked until 2005. After that, he joined two startup companies: Adaptix Technologies, as manager of the digital signal processing group until 2008, and then Telepath Technologies, as manager of the algorithm group until 2010. Since 2010, Mr. Tang had been a Research Staff Member at IBM Research - China where he acted as team leader of a project concerning the physical layer of a wireless Internet of Things platform. In 2012, Mr. Tang transferred to IBM Research - Australia. He is author or coauthor of 19 patents and 2 technical papers.

Stefan Harrer IBM Research, Melbourne, VIC 3053, Australia. In 2015, Dr. Harrer holds a Ph.D. degree in electrical engineering and computer science from the Technical University Munich and an Honors Master-Level Degree in technology management from the Center for Digital Technology and Management. He co-founded the Brain-Inspired Computing Research program of IBM Research - Australia and now leads it as an IBM Research Staff Member and Honorary Principal Research Fellow at the Centre for Neural Engineering at the University of Melbourne. His team spearheads an effort to employ IBM’s cognitive TrueNorth chip to develop deep-learning-enabled biomedical and healthcare solutions at the intersection of neuroscience and neuromorphic computing. Since joining IBM Research in 2008, Dr. Harrer has worked on semiconductor, biotechnology, and nanotechnology research projects in nanofabrication, materials science, DNA sequencing, and biomedical engineering at IBM Albany Nanotech, the IBM T. J. Watson Research Center, and IBM Research - Australia. He has authored and coauthored more than 40 technical publications, is an inventor on 25 issued patents, and has more than 20 patents pending. He is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE), a member of the New York Academy of Sciences and the American Chemical Society, and an Associate Editor of the IEEE Transactions on Nanobioscience. Dr. Harrer has received a Research Scholarship from University of California–Berkeley, a Karl Chang Innovation Fund Grant from MIT, and Research Grants from the National Institutes of Health and the Australian Research Council.