SFN Short Course c1
© 2008 Pesaran
Spectral Analysis for Neural Signals 3
functions, one for each frequency. The process is represented by its amplitude and phase at each frequency. The time and frequency domains are equivalent, and we can transform signals between them using the Fourier transform. Fourier transforming a signal that is in the time domain, $x_t$, will give the values of the signal in the frequency domain, $\tilde{x}(f)$. The tilde denotes a complex number with amplitude and phase.

$$\tilde{x}(f) = \sum_{t=1}^{N} x_t \exp(-2\pi i f t)$$

Inverse Fourier transforming $\tilde{x}(f)$ transforms it to the time domain. To preserve all the features in the process, these transforms need to be carried out over an infinite time interval. However, this is never realized in practice. Performing Fourier transforms on finite duration data segments distorts features in the signal and, as we explain below, spectral estimation employs data tapers to limit these distortions.

Nyquist frequency, sampling theorem, and aliasing

Both point and continuous processes can be represented in the frequency domain. When we sample a process, by considering a sufficiently short interval in time, $\delta t$, and measuring the voltage or the presence or absence of a spike event, we are making an assumption about the highest frequency in the process. For continuous processes, the sampling theorem states that when we sample an analog signal that is band-limited, so that it contains no frequencies greater than the Nyquist rate (B Hz), we can perfectly reconstruct the original signal if we sample at a sampling rate, $F_s = 1/\delta t$, of at least 2B Hz. The original signal is said to be band-limited because it contains no energy outside the frequency band given by the Nyquist rate. Similarly, once we sample a signal at a certain sampling rate, $F_s$, the maximum frequency we can reconstruct from the sampled signal is called the Nyquist frequency, $\frac{1}{2}F_s$.

The Nyquist frequency is a central property of all sampled, continuous processes. It is possible to sample the signal more frequently than the bandwidth of the original signal, with a sampling rate greater than twice the Nyquist rate, without any problems. This is called oversampling. However, if we sample the signal at less than twice the Nyquist rate, we cannot reconstruct the original signal without errors. Errors arise because components of the signal that exist at a higher frequency than $\frac{1}{2}F_s$ become aliased into lower frequencies by the process of sampling the signal. Importantly, once the signal has been sampled at $F_s$, we can no longer distinguish between continuous processes that have frequency components greater than the Nyquist frequency of $\frac{1}{2}F_s$. To avoid the problem of aliasing signals from high frequencies into lower frequencies by digital sampling, an anti-aliasing filter is often applied to continuous analog signals before sampling. Anti-aliasing filters act to low-pass the signal at a frequency less than the Nyquist frequency of the sampling.

Sampling point processes does not lead to the same problems as sampling continuous processes. The main consideration for point processes is that the sampled point process be orderly. Orderliness is achieved by choosing a sufficiently short time interval so that each sampling interval has no more than one event. Since this means that there are only two possible outcomes to consider at each time step, 0 and 1, analyzing an orderly point process is simpler than analyzing a process that has multiple possible outcomes, such as 0, 1, and 2, at each time step.

Method of moments for stochastic processes

Neural signals are variable and stochastic owing to noise and the intrinsic properties of neural firing. Stochastic processes (also called random processes) can be contrasted with deterministic processes, which are perfectly predictable. Deterministic processes evolve in exactly the same way from a particular point. In contrast, stochastic processes are described by probability distributions that govern how they evolve in time. Stochastic processes evolve to give a different outcome, even if all the samples of the process originally started from the same point. This is akin to rolling dice on each trial to determine the neural activity. Each roll of the dice is a realization from a particular probability distribution, and it is this distribution that determines the properties of the signal. When we measure neural signals over repeated trials in an experiment, we assume that the signals we record on each trial are different realizations or outcomes of the same underlying stochastic process.

Another powerful simplification is to assume that the properties of the stochastic process generating the neural signals within each trial are stationary and that their statistical properties do not change with time, even within a trial. This is clearly not strictly true for neural signals because of the nonstationarities in behavior. However, under many circumstances, a reasonable procedure is to weaken the stationarity assumption to short-time stationarity.
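The two working assumptions of this section, that each trial is a realization of one underlying stochastic process and that the process is stationary, can be illustrated with simulated data. The sketch below is an illustration only: the first-order autoregressive model, its coefficient `phi`, and the function name `simulate_trials` are assumptions chosen for convenience, not a model of neural activity taken from this chapter.

```python
import numpy as np


def simulate_trials(n_trials, n_samples, phi=0.9, seed=0):
    """Draw independent realizations ("trials") of one stationary process.

    Hypothetical model: x[t] = phi * x[t-1] + e[t], with e[t] ~ N(0, 1).
    The first sample comes from the stationary distribution, so every
    time point has the same variance, 1 / (1 - phi**2).
    """
    rng = np.random.default_rng(seed)
    x = np.empty((n_trials, n_samples))
    x[:, 0] = rng.normal(scale=np.sqrt(1.0 / (1.0 - phi**2)), size=n_trials)
    for t in range(1, n_samples):
        x[:, t] = phi * x[:, t - 1] + rng.normal(size=n_trials)
    return x


phi = 0.9
trials = simulate_trials(n_trials=2000, n_samples=300, phi=phi)

# Individual trials differ, like separate rolls of the dice ...
trials_differ = not np.allclose(trials[0], trials[1])

# ... but a second moment estimated across trials matches the generating
# distribution and, because the process is stationary, it is the same
# early and late in the trial.
theory_var = 1.0 / (1.0 - phi**2)
early_var = trials[:, :100].var()
late_var = trials[:, 200:].var()
print(trials_differ, theory_var, early_var, late_var)
```

If the simulated process were not stationary (for example, if `phi` changed partway through the trial), the early and late variances would disagree, which is the situation that motivates restricting the analysis to short windows.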
Short-time stationarity assumes that the properties of the stochastic process are stationary for short time intervals, say 300-400 ms, but change on longer time scales. In general, the window for spectral analysis is chosen to be as short as possible to remain consistent with the spectral structure of the data; this window is then translated in time. Fundamental to time-frequency representations is the uncertainty principle, which sets the bounds for simultaneous resolution in time and frequency. If the time-frequency plane is "tiled" so as to provide time and frequency resolutions $\Delta t = N$ by $\Delta f = W$, then $NW \ge 1$. We can then estimate the statistical properties of the stochastic process by analyzing short segments of data and, if necessary and reasonable, averaging the results across many repetitions or trials. Examples of time-frequency characterizations are given below. Note that this presentation uses "normalized" units. This means that we assume the sampling rate to be 1 and the Nyquist frequency interval to range from –½ to ½. The chapter Application of Spectral Methods to Representative Data Sets in Electrophysiology and Functional Neuroimaging presents the relationships below in units of time and frequency.

Spectral analysis depends on another assumption: that the stochastic process which generates the neural signals has a spectral representation.

$$x_t = \int_{-1/2}^{1/2} \tilde{x}(f) \exp(2\pi i f t)\, df$$

Remarkably, the same spectral representation can be assumed for both continuous processes (like LFP activity) and point processes (like spiking activity), so the Fourier transform of the spike train, with spike times $t_n$, is as follows:

$$d\tilde{N}(f) = \sum_{n=1}^{N} \exp(-2\pi i f t_n)$$

The spectral representation assumes that underlying stochastic processes generating the data exist in the frequency domain, but that we observe their realizations as neural signals, in the time domain. As a result, we need to characterize the statistical properties of these signals in the frequency domain: This is the goal of spectral analysis.

The method of moments characterizes the statistical properties of a stochastic process by estimating the moments of the probability distribution. The first moment is the mean; the second moments are the variance and covariance (for more than one time series), and so on. If a stochastic process is a Gaussian process, the mean and variance or covariances completely specify it. For the spectral representation, we are interested in the second moments. The spectrum is the variance of the following process:

$$S_X(f)\,\delta(f - f') = E[\tilde{x}^*(f)\,\tilde{x}(f')]$$

$$S_{dN}(f)\,\delta(f - f') = E[d\tilde{N}^*(f)\,d\tilde{N}(f')]$$

The delta function indicates that the process is stationary in time. The asterisk denotes complex conjugation. The cross-spectrum is the covariance of two processes:

$$S_{XY}(f)\,\delta(f - f') = E[\tilde{x}^*(f)\,\tilde{y}(f')]$$

The coherence is the correlation coefficient between each process at each frequency and is simply the covariance of the processes normalized by their variances.

$$C_{XY}(f) = \frac{S_{XY}(f)}{\sqrt{S_X(f)\,S_Y(f)}}$$

This formula represents the cross-spectrum between the two processes, divided by the square root of the spectrum of each process. We have written this expression for two continuous processes; analogous expressions can be written for pairs of point-continuous processes or point-point processes by substituting the appropriate spectral representation. Also, we should note that the assumption of stationarity applies only to the time interval during which we carry out the expectation.

Multitaper spectral estimation

The simplest estimate of the spectrum, called the periodogram, is proportional to the squared magnitude of the Fourier transform of the data sequence, $|\tilde{x}_{tr}(f)|^2$. This spectral estimate suffers from two problems. The first is the problem of bias. This estimate does not equal the true value of the spectrum unless the data length is infinite. Bias arises because signals at different frequencies are mixed together and "blurred." This bias comes in two forms: narrow-band bias and broad-band bias. Narrow-band bias refers to bias in the estimate due to mixing signals at nearby frequencies. Broad-band bias refers to mixing signals at distant frequencies. The second problem is the problem of variance.
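The variance problem of the periodogram can be checked numerically. The following sketch makes its own assumptions: Gaussian white noise with unit variance as the test signal (so the true spectrum is flat at 1 in normalized units), a `1/N` scaling of the squared transform, and a hypothetical helper name `periodogram`. It compares the scatter of the estimate across frequencies for a short and a long record.

```python
import numpy as np


def periodogram(x):
    """Squared magnitude of the Fourier transform, scaled by the data length."""
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)


rng = np.random.default_rng(1)
summary = {}
for n in (1024, 16384):
    x = rng.normal(size=n)            # white noise: true spectrum is flat at 1
    s = periodogram(x)[1:-1]          # drop the DC and Nyquist bins
    summary[n] = (s.mean(), s.std())  # level of the estimate and its scatter
    print(n, summary[n])
```

For both record lengths the scatter of the estimate stays comparable to its mean: a longer record yields more frequency bins, not a less variable estimate at each bin.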
Even if the data length were infinite, the periodogram spectral estimate would simply square the data without averaging. As a result, it would never converge to the correct value and would remain inconsistent.

Recordings of neural signals are often sufficiently limited so that bias and variance can present major limitations in the analysis. Bias can be reduced, however, by multiplying the data by a data taper, $w_t$, before transforming to the frequency domain, as follows:

$$\tilde{x}(f) = \sum_{t=1}^{N} w_t\, x_t \exp(-2\pi i f t)$$

An elegant approach toward the solution of both the above problems has been offered by the multitaper spectral estimation method, in which the data are multiplied by not one, but several, orthogonal tapers and Fourier-transformed in order to obtain the basic quantity for further spectral analysis. The simplest example of the method is given by the direct multitaper estimate, $S_{MT}(f)$, defined as the average of individual tapered spectral estimates,

$$S_{MT}(f) = \frac{1}{K}\sum_{k=1}^{K} |\tilde{x}_k(f)|^2$$

$$\tilde{x}_k(f) = \sum_{t=1}^{N} w_t(k)\, x_t \exp(-2\pi i f t)$$

The $w_t(k)$ ($k = 1, 2, \ldots, K$) constitute K orthogonal taper functions with appropriate properties. A particular choice for these taper functions, with optimal spectral concentration properties, is given by the discrete prolate spheroidal sequences, which we will call "Slepian functions" (Slepian and Pollak, 1961). Let $w_t(k, W, N)$ be the kth Slepian function of length N and frequency bandwidth parameter W. The Slepians would then form an orthogonal basis set for sequences of length N, and be characterized by a bandwidth parameter W. The important feature of these sequences is that, for a given bandwidth parameter W and taper length N, there are K = 2NW – 1 sequences, out of a total of N, each having its energy effectively concentrated within a range [–W, W] of frequency space.

Consider a sequence $w_t$ of length N whose Fourier transform is given by the formula

$$U(f) = \sum_{t=1}^{N} w_t \exp(-2\pi i f t)$$

The usual strategy is to select the desired analysis half-bandwidth W to be a small multiple of the Rayleigh frequency 1/N, and then to take the leading 2NW – 1 Slepian functions as data tapers in the multitaper analysis. The remaining functions have progressively worsening spectral concentration properties. For illustration, in the left column of Figure 1, we show the first four Slepian functions for W = 5/N. In the right column, we show the time series example from the earlier subsection multiplied by each of the successive data tapers. In the left column of Figure 2, we show the spectra of the data tapers themselves, displaying the spectral concentration property. The vertical marker denotes the bandwidth parameter W. Figure 2 also shows the magnitude-squared Fourier transforms of the tapered time series presented in Figure 1. The arithmetic average of these spectra for k = 1, 2, . . . , 9 (note that only 4 of 9 are shown in Figs. 1 and 2) gives a direct multitaper estimate of the underlying process.

Figure 3A shows the periodogram estimate of the spectrum based on a single trial of LFP activity during the delayed look-and-reach task. The variability in the estimate is significant. Figure 3B presents the multitaper estimate of the spectrum on the same data with W = 10 Hz, averaged across 9 tapers.
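The direct multitaper estimate can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the implementation used for the figures: the Slepian tapers are obtained from a dense eigendecomposition of the sinc kernel (adequate for short windows; library routines use faster methods), the test signal is simulated white noise rather than LFP data, and the names `slepian_tapers` and `multitaper_spectrum` are hypothetical. It uses W = 5/N, so K = 2NW – 1 = 9 tapers, as in the illustration above.

```python
import numpy as np


def slepian_tapers(n, nw, k):
    """Return the k leading Slepian tapers (shape (k, n)) and their concentrations.

    Computed as eigenvectors of the sinc kernel sin(2*pi*W*(t-s)) / (pi*(t-s))
    with W = nw / n; each eigenvalue is the fraction of that taper's energy
    inside [-W, W].
    """
    w = nw / n
    t = np.arange(n)
    # np.sinc(x) is sin(pi*x) / (pi*x), so this equals the kernel above.
    kernel = 2 * w * np.sinc(2 * w * (t[:, None] - t[None, :]))
    eigvals, eigvecs = np.linalg.eigh(kernel)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]          # keep the k largest
    return eigvecs[:, order].T, eigvals[order]


def multitaper_spectrum(x, nw):
    """Direct multitaper estimate: average of K = 2NW - 1 tapered spectra."""
    k = int(round(2 * nw)) - 1
    tapers, concentrations = slepian_tapers(len(x), nw, k)
    xk = np.fft.rfft(tapers * x, axis=1)           # one tapered transform per row
    return np.mean(np.abs(xk) ** 2, axis=0), tapers, concentrations


rng = np.random.default_rng(2)
x = rng.normal(size=512)                           # white noise, flat spectrum at 1
s_mt, tapers, concentrations = multitaper_spectrum(x, nw=5)
s_pg = np.abs(np.fft.rfft(x)) ** 2 / len(x)        # periodogram for comparison

print("tapers:", len(tapers), "min concentration:", concentrations.min())
print("relative scatter, periodogram:", s_pg.std() / s_pg.mean())
print("relative scatter, multitaper: ", s_mt.std() / s_mt.mean())
```

Because the tapers are orthonormal, each tapered spectrum has the same expected level as the true spectrum, and averaging K of them reduces the scatter of the estimate relative to the periodogram.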
The 10 Hz multitaper estimate is much smoother and reveals the presence of two broad peaks in the spectrum, at 20 Hz and 60 Hz. Figure 3C shows the multitaper spectrum estimate on the same data with W = 20 Hz. This estimate is even smoother than the 10 Hz estimate, which reflects the increased number of tapers available.

Figure 2. Slepian functions in the frequency domain. Left panel: spectra of Slepian functions from left panels of Figure 1. Right panel: spectra of data from right panels of Figure 1.

Bandwidth selection

The choice of the time window length N and the bandwidth parameter W is critical for applications. No simple procedure can be given for these choices, which in fact depend on the data set at hand, and are best made iteratively using visual inspection and some degree of trial and error. 2NW gives the number of Rayleigh frequencies over which the spectral estimate is effectively smoothed, so that the variance in the estimate is typically reduced by a factor of 2NW. Thus, the choice of W is a choice of how much to smooth. In qualitative terms, the bandwidth parameter should be chosen to reduce variance while not overly distorting the spectrum by increasing narrow-band bias. This can be done formally by trading off an appropriate weighted sum of the estimated variance and bias. However, as a rule, we find fixing the time-bandwidth product NW at a small number (typically 3 or 4), and then varying the window length in time until sufficient spectral resolution is obtained, to be a reasonable strategy. It presupposes that the data are examined in the time-frequency plane so that N may be significantly smaller than the total data length.

Figure 4 illustrates these issues using two spectrogram estimates of the example LFP activity averaged across 9 trials. Each trial lasts approximately 3 s and consists of a 1 s baseline period, followed by a 1–1.5 s delay period, during which a movement is being planned. The look-reach movement is then executed. Each spectrogram is shown with time on the horizontal axis, frequency on the vertical axis, and power color-coded on a log base-10 scale. Figure 4A shows the spectrogram estimated using a 0.5 s duration analysis window and a 10 Hz bandwidth. The time-frequency tile this represents is shown in the white rectangle. This estimate clearly shows the sustained activity following the presentation of the spatial cue at 0 s that extends through the movement's execution. Figure 4B shows a spectrogram of the same data estimated using a 0.2 s duration analysis window and a 25 Hz bandwidth. The time-frequency tile for this estimate has the same area as Figure 4A, so each estimate has the same number of degrees of freedom. However, there is great variation in the time-frequency resolution trade-off between these estimates: Figure 4B better captures the transients in the signal, at the loss of significant frequency resolution that distorts the final estimate. Ultimately, the best choice of time-frequency resolution will depend on the frequency band of interest, the temporal dynamics in the signal, and the number of trials available for increasing the degrees of freedom of a given estimate.

Figure 4. Spectrogram of LFP activity in macaque LIP averaged across 9 trials of a delayed saccade-and-reach task. Each trial is aligned to cue presentation, which occurs at 0 s. Saccade and reach are made at around 1.2 s. A, Multitaper estimate with duration of 500 ms and bandwidth of 10 Hz. B, Multitaper estimate with duration of 200 ms and bandwidth 25 Hz. White rectangle shows the time-frequency resolution of each spectrogram. The color bar shows the spectral power on a log scale in arbitrary units.

Calculating error bars

The multitaper method confers one important advantage: It offers a natural way of estimating error bars corresponding to most quantities obtained in time series analysis, even if one is dealing with an individual instance within a time series. Error bars can be constructed using a number of procedures, but broadly speaking, there are two types. The fundamental notion common to both types of error bars is the local frequency ensemble. That is, if the spectrum of the process is locally flat over a bandwidth 2W, then the tapered Fourier transforms $\tilde{x}_k(f)$ constitute a statistical ensemble for the Fourier transform of the process at the frequency $f_0$. This locally flat assumption and the orthogonality of the data tapers mean that the $\tilde{x}_k(f)$ are uncorrelated random variables having the same variance. This provides one way of thinking about the direct multitaper estimate presented in the previous sections: The estimate consists of an average over the local frequency ensemble.

The first type of error bar is the asymptotic error bar. For large N, $\tilde{x}_k(f)$ may be assumed to be asymptotically normally distributed under some general circumstances. As a result, the estimate of the spectrum is asymptotically distributed according to a $\chi^2_{dof}$ distribution scaled by $S(f)/dof$. The number of degrees of freedom (dof) is given by the total number of data tapers averaged to estimate the spectrum. This would equal the number of trials multiplied by the number of tapers.

For the second type of error bar, we can use the local frequency ensemble to estimate jackknife error bars for the spectra and all other spectral quantities (Thomson and Chave, 1991; Wasserman, 2007). The idea of the jackknife is to create different estimates by, in turn, leaving out a data taper. This creates a set of spectral estimates that forms an empirical distribution. A variety of error bars can be constructed based on such a distribution. If we use a variance-stabilizing transformation, the empirical distribution can be well approximated using a Gaussian distribution. We can then calculate error bars according to the normal interval by estimating the variance of the distribution and determining critical values that set the error bars. Inverting the variance-stabilizing transformation gives us the error bars for the original spectral estimate. This is a standard tool in statistics and provides a more conservative error bar than the asymptotic error bar. Note that the degree to which the two error bars agree constitutes a test of how well the empirical distribution follows the asymptotic distribution.
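The leave-one-taper-out jackknife for the spectrum can be sketched as follows. This is a minimal illustration under several assumptions: the logarithm is used as the variance-stabilizing transformation, the critical value 1.96 of the normal interval gives nominal 95% error bars, the function name `jackknife_log_spectrum` is hypothetical, and the tapered spectral estimates are simulated stand-ins (exponentially distributed, as expected for Gaussian data with a locally flat spectrum) rather than recorded data.

```python
import numpy as np


def jackknife_log_spectrum(sk, z=1.96):
    """Leave-one-taper-out jackknife error bars for a multitaper spectrum.

    sk has shape (K, F): one tapered spectral estimate |x_k(f)|^2 per row.
    Returns the estimate and lower/upper bounds, computed on the log scale
    and mapped back by exponentiating.
    """
    k = sk.shape[0]
    estimate = sk.mean(axis=0)
    # Spectrum estimated with each taper left out in turn, shape (K, F).
    leave_one_out = (sk.sum(axis=0) - sk) / (k - 1)
    log_loo = np.log(leave_one_out)
    # Jackknife variance of the log spectrum at each frequency.
    var = (k - 1) / k * np.sum((log_loo - log_loo.mean(axis=0)) ** 2, axis=0)
    half_width = z * np.sqrt(var)
    log_s = np.log(estimate)
    return estimate, np.exp(log_s - half_width), np.exp(log_s + half_width)


# Stand-in data (an assumption): true spectrum S = 1 at every frequency,
# K = 9 tapers, 2000 independent frequencies.
rng = np.random.default_rng(3)
sk = rng.exponential(scale=1.0, size=(9, 2000))
estimate, lower, upper = jackknife_log_spectrum(sk)
coverage = np.mean((lower < 1.0) & (1.0 < upper))
print("fraction of intervals containing the true value:", coverage)
```

With only a handful of tapers the empirical coverage need not match the nominal level exactly; pooling tapers across trials increases the number of leave-one-out estimates and tightens the intervals.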
The variance-stabilizing transformation for the spectrum is the logarithm. The variance-stabilizing transformation for the coherence, the magnitude of the coherency, is the arc-tanh.

As an example, Figure 5A shows asymptotic and Figure 5B empirical jackknife error bars for the spectral estimate illustrated in Figure 3D. These are 95% confidence intervals and are largely the same between the two estimates. This similarity indicates that, for these data, the sampling distribution of the spectral estimate follows the asymptotic distribution across trials and data tapers. If we were to reduce the estimate's number of degrees of freedom by reducing the number of trials or data tapers, we might expect to see more deviations between the two estimates, with the empirical error bars being larger than the asymptotic error bars.

Figure 5. 95% confidence error bars for LFP spectrum shown in Figure 3D. A, Asymptotic error bars assuming chi-squared distribution. B, Empirical error bars using leave-one-out jackknife procedure.

Correlation functions

Neural signals are often characterized in terms of correlation functions. Computing correlation functions is equivalent to computing spectral quantities, but with important statistical differences. For stationary processes, local error bars can be imposed for spectral estimates in the frequency domain. This is not true for correlation functions, even assuming stationarity, because error bars for temporal correlation functions are nonlocal. Nonlocality in the error bars means that uncertainty about the correlation function at one lag is influenced by the value of the correlation function across other lags. The precise nature of the nonlocality relies on the temporal dependence within the underlying process. Consequently, correlation function error bars must be constructed by assuming there are no dependencies between different time bins. This is a far more restrictive assumption than the one holding that neighboring frequencies are locally flat, and it is rarely achieved in practice. Another problem associated with the use of correlation functions is that if the data contain oscillatory components, they are compactly represented in frequency space and lead to nonlocal effects in the correlation function. Similar arguments apply to the computation of correlation functions for point and continuous processes. One exception is for spiking examples in which there are sharp features in the time-domain correlation functions, e.g., owing to monosynaptic connections.

Figure 6 illustrates the difference between using spectral estimates and correlation functions. Figure 6A shows the spectrum of spiking activity recorded in macaque parietal cortex during a delay period before a coordinated look-and-reach. The duration of the spectral estimate is 500 ms, the bandwidth is 30 Hz, and the activity is averaged over nine trials. Thin lines show the empirical 95% confidence intervals. Figure 6B shows the auto-correlation function for the same data, revealing some structure around short lags and inhibition at longer lags. There is a hint of some ripples, but the variability in the estimate is too large to see them clearly. This is not too surprising, because the correlation function estimate is analogous to the periodogram spectral estimate, which also suffers from excess statistical variability. In contrast, the spectrum estimate clearly reveals the presence of significant spectral suppression and a broad spectral peak at 80 Hz. The dotted line shows the expected spectrum from a Poisson process having the same rate.

Coherence

The idea of a local frequency ensemble motivates multitaper estimates of the coherence between two point or continuous processes. Given two time series and the corresponding multiple tapered Fourier transforms $\tilde{x}_k(f)$, $\tilde{y}_k(f)$, the following direct estimates can be defined for the coherence function:

$$C_{XY}(f) = \frac{\frac{1}{K}\sum_{k} \tilde{x}_k^*(f)\,\tilde{y}_k(f)}{\sqrt{S_X(f)\,S_Y(f)}}$$

This definition allows us to estimate the coherence from a single trial. Estimating the coherence presents many of the same issues as estimating the spectrum, except that more degrees of freedom are needed to ensure a reasonable estimate. In common with spectrum estimates, the duration and bandwidth of the estimator need to be chosen to allow sufficient degrees of freedom in the estimator. Increasing the number of trials will increase the effective resolution of the estimate.

Figure 7 shows the coherence and correlations between two simultaneously recorded spike trains from macaque parietal cortex averaged over nine trials. Figure 7A shows the coherence estimated with 16 Hz bandwidth. The horizontal dotted line represents expected coherence for this estimator when there is no coherence between the spike trains. The coherence significantly exceeds this threshold, as shown by the 95% confidence intervals, in a broad frequency band. Figure 7B illustrates the coherence estimated with a 30 Hz bandwidth. The variability in the estimate is reduced, as is the noise floor of the estimator, as shown by the lower horizontal dotted line. Figure 7C shows the cross-correlation function for these data. Here, too, there is structure in the estimate, but the degree of variability lowers the power of the analysis.

Regression using spectral feature vectors

Detection of periodic signals is an important problem that occurs frequently in the analysis of neural data. Such signals can arise as a result of periodic stimulation and can manifest as 50/60 Hz line noise. We pursue the effects of periodic stimulation in the multivariate case in the next chapter Multivariate Neural Data Sets: Image Time Series, Allen Brain Atlas. As discussed therein, certain experiments that have no innate periodicity may also be cast into a form that makes them amenable to analysis as periodic stimuli. We now discuss how such components may be detected and modeled in the univariate time series by performing a regression on the spectral coefficients.

Periodic components are visible in preliminary estimates as sharp peaks in the spectrum, which, for multitaper estimation with Slepians, appear with flat tops owing to narrow-band bias. Consider one such sinusoid embedded in colored noise:

$$x(t) = A\cos(2\pi f_0 t + \phi) + \eta(t)$$

It is customary to apply a least-squares procedure to obtain $A$ and $\phi$, by minimizing the sum of squares $\sum_t |x(t) - A\cos(2\pi f_0 t + \phi)|^2$. However, this is a nonlinear procedure that must be performed numerically; moreover, it effectively assumes a white-noise spectrum. Thomson's F-test offers an attractive alternative within the multitaper framework by reducing the line-fitting procedure to a simple linear regression.
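A minimal sketch of this regression is given here, using the expressions for the line amplitude μ(f) and the F statistic that this section states. Several things are assumptions of the sketch rather than part of the chapter: $U_k(0)$ is taken to be the Fourier transform of the kth taper at zero frequency (the sum of its samples), the tapers come from a dense eigendecomposition of the sinc kernel, the test signal is a simulated sinusoid in white noise placed exactly on an FFT bin, and the names `slepian_tapers` and `harmonic_f_test` are hypothetical.

```python
import numpy as np


def slepian_tapers(n, nw, k):
    """k leading Slepian tapers of length n (rows), via the sinc-kernel eigenproblem."""
    w = nw / n
    t = np.arange(n)
    kernel = 2 * w * np.sinc(2 * w * (t[:, None] - t[None, :]))
    eigvals, eigvecs = np.linalg.eigh(kernel)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]].T


def harmonic_f_test(x, nw=4):
    """Regress tapered transforms on U_k(0); return mu(f) and the F statistic."""
    k = int(round(2 * nw)) - 1
    tapers = slepian_tapers(len(x), nw, k)
    xk = np.fft.rfft(tapers * x, axis=1)         # x_k(f), shape (K, F)
    u0 = tapers.sum(axis=1)                      # U_k(0) for each taper
    norm = np.sum(u0 ** 2)
    mu = (u0[:, None] * xk).sum(axis=0) / norm   # least-squares line amplitude
    resid = np.sum(np.abs(xk - mu * u0[:, None]) ** 2, axis=0)
    f_stat = (k - 1) * np.abs(mu) ** 2 * norm / resid
    return mu, f_stat


# Simulated test signal (an assumption): amplitude-2 sinusoid on FFT bin 64
# of a 512-sample record, embedded in unit-variance white noise.
rng = np.random.default_rng(4)
n, line_bin, amplitude = 512, 64, 2.0
t = np.arange(n)
x = amplitude * np.cos(2 * np.pi * line_bin / n * t + 0.3) + rng.normal(size=n)

mu, f_stat = harmonic_f_test(x)
peak = int(np.argmax(f_stat))
print("peak bin:", peak, "F:", f_stat[peak], "amplitude estimate:", 2 * np.abs(mu[peak]))
```

Because the complex amplitude μ equals (A/2)exp(iφ) for a real sinusoid, doubling |μ| at the peak recovers the amplitude A. In practice the F statistic would be compared against the critical value of an F distribution with (2, 2K − 2) degrees of freedom rather than simply maximized.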
The solution is given by

$$\mu(f_0) = \frac{\sum_{k=1}^{K} U_k(0)\,\tilde{x}_k(f_0)}{\sum_{k=1}^{K} |U_k(0)|^2}$$

The goodness of fit of this model may be tested using an F-ratio statistic with (2, 2K − 2) degrees of freedom, which is usually plotted as a function of frequency to determine the position of the significant sinusoidal peaks in the spectrum,

$$F(f) = \frac{(K-1)\,|\mu(f)|^2 \sum_{k=1}^{K} |U_k(0)|^2}{\sum_{k=1}^{K} |\tilde{x}_k(f) - \mu(f)\,U_k(0)|^2}$$

$$S_{\text{reshaped}}(f) = \frac{1}{K}\sum_{k=1}^{K} \Big|\tilde{x}_k(f) - \sum_i \mu_i\,U_k(f - f_i)\Big|^2$$

References

Brillinger DR (1978) Comparative aspects of the study of ordinary time series and of point processes. In: Developments in statistics vol. 11. Orlando, FL: Academic Press.

Buzsaki G (2006) Rhythms of the Brain. New York: Oxford UP.

Jarvis MR, Mitra PP (2001) Sampling properties of the spectrum and coherency of sequences of action potentials. Neural Comput 13:717-749.

Mitra PP, Bokil H (2007) Observed Brain Dynamics. New York: Oxford UP.

Percival DB, Walden AT (1993) Spectral analysis for physical applications. Cambridge, UK: Cambridge UP.

Pesaran B, Pezaris JS, Sahani M, Mitra PP, Andersen RA (2002) Temporal structure in neuronal activity during working memory in macaque parietal cortex. Nat Neurosci 5:805-811.

Slepian D, Pollak HO (1961) Prolate spheroidal wavefunctions: Fourier analysis and uncertainty I. Bell System Tech Journal 40:43-63.

Steriade M (2001) The intact and sliced brain. Cambridge, MA: MIT Press.

Thomson DJ (1982) Spectrum estimation and harmonic analysis. Proc IEEE 70:1055-1096.

Thomson DJ, Chave AD (1991) Jackknifed error estimates for spectra, coherences, and transfer functions. In: Advances in spectrum analysis and array processing, pp 58-113. Englewood Cliffs, NJ: Prentice Hall.

Wasserman L (2007) All of nonparametric statistics. New York: Springer-Verlag.