A Model-Based Spike Sorting Algorithm for RemovingCorrelation Artifacts in Multi-Neuron RecordingsJonathan W. Pillow1*., Jonathon Shlens2., E. J. Chichilnisky2, Eero P. Simoncelli3
1 Center for Perceptual Systems, Department of Psychology and Section of Neurobiology, The University of Texas at Austin, Austin, Texas, United States of America, 2 The
Salk Institute, La Jolla, California, United States of America, 3 Howard Hughes Medical Institute and Center for Neural Science and Courant Institute, NYU, New York, New
York, United States of America
Abstract
We examine the problem of estimating the spike trains of multiple neurons from voltage traces recorded on one or moreextracellular electrodes. Traditional spike-sorting methods rely on thresholding or clustering of recorded signals to identifyspikes. While these methods can detect a large fraction of the spikes from a recording, they generally fail to identifysynchronous or near-synchronous spikes: cases in which multiple spikes overlap. Here we investigate the geometry offailures in traditional sorting algorithms, and document the prevalence of such errors in multi-electrode recordings fromprimate retina. We then develop a method for multi-neuron spike sorting using a model that explicitly accounts for thesuperposition of spike waveforms. We model the recorded voltage traces as a linear combination of spike waveforms plus astochastic background component of correlated Gaussian noise. Combining this measurement model with a Bernoulli priorover binary spike trains yields a posterior distribution for spikes given the recorded data. We introduce a greedy algorithmto maximize this posterior that we call ‘‘binary pursuit’’. The algorithm allows modest variability in spike waveforms andrecovers spike times with higher precision than the voltage sampling rate. This method substantially corrects cross-correlation artifacts that arise with conventional methods, and substantially outperforms clustering methods on both realand simulated data. Finally, we develop diagnostic tools that can be used to assess errors in spike sorting in the absence ofground truth.
Citation: Pillow JW, Shlens J, Chichilnisky EJ, Simoncelli EP (2013) A Model-Based Spike Sorting Algorithm for Removing Correlation Artifacts in Multi-NeuronRecordings. PLoS ONE 8(5): e62123. doi:10.1371/journal.pone.0062123
Editor: Bart Krekelberg, Rutgers University, United States of America
Received January 31, 2012; Accepted March 19, 2013; Published May 3, 2013
Copyright: � 2013 Pillow et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permitsunrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by: Royal Society Society USA/Canada Research Fellowship (JWP) (http://royalsociety.org/grants/); Center for PerceptualSystems, startup funding (JP) (http://www.utexas.edu/cola/centers/cps/); Sloan Research Fellowship (JWP) (http://www.sloan.org/); Miller Institute for BasicResearch in Science (JS) (http://millerinstitute.berkeley.edu/); National Eye Institute (NEI) grant EY018003 (EJC, EPS); National Institutes of Health (NIH) GrantEY017736 (EJC); and Howard Hughes Medical Institute (EPS) (http://www.hhmi.org/). The funders had no role in study design, data collection and analysis,decision to publish, or preparation of the manuscript.
Competing Interests: The authors have the following interests to declare: Jonathon Shlens is now employed by Google Inc, but his contribution to the projectcame entirely before he began his employment at Google. There are no patents, products in development or marketed products to declare. This does not alterthe authors’ adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.
* E-mail: [email protected]
. These authors contributed equally to this work.
Introduction
Action potentials, often referred to as ‘‘spikes’’, are the
fundamental unit of communication in much of the nervous
system. The problem of estimating the timing and identity of
spikes from extracellular analog voltage recordings, generally
known as spike sorting, was originally studied for recordings of single
neurons on single electrodes. However, many newly developed
multi-electrode recording techniques aim to examine the simul-
taneous activity of populations of neurons in a neural circuit [1–6].
With a few notable exceptions, spike-sorting methodologies have
not kept up.
Most spike-sorting techniques rely on the observation that
individual neurons produce stereotyped spike waveforms. The
earliest methods, developed for single neurons recorded on single
electrodes, rely on the basic strategy of matched filtering: the
electrode waveform is compared against a temporally sliding
template and a spike is identified whenever the two are found to
match within some tolerance. This methodology predates the era
of digital computers, when the matching was done using hand-
adjusted threshold triggers on an oscilloscope [7]. A form of this
technique is still widely used in single-cell electrophysiology, where
the electrode position is adjusted to maximize the waveform
amplitude of one cell. In general, matched filtering is known to be
optimal for detecting isolated waveforms of known shape and
amplitude in a background of white noise [8]. However, this
optimality degrades quickly when waveforms of more than one
spike overlap, as is common in extracellular recordings. In fact,
much of the ‘‘background’’ noise in neural recordings is likely due
to spikes of other cells [9]; if those spikes are large enough, any
methodology based on template matching is likely to fail [10,11].
Moreover, because it typically requires hand-adjustment of
thresholding parameters, matched filtering is not practical for
sorting multi-electrode data from large electrode arrays.
Modern methods have extended the matched filtering strategy
to identify spikes from multiple cells, measured with multiple
electrodes, by first selecting short segments of the recorded
waveforms during which the voltage exceeds some threshold, and
then identifying individual neurons and their spikes by identifying
clusters within the space spanned by these segments [12,13]. A
PLOS ONE | www.plosone.org 1 May 2013 | Volume 8 | Issue 5 | e62123
variety of different clustering methodologies have been explored as
well as new methods for selecting appropriate waveform features
(e.g., [14–19]). But clustering methods, just like the matched
filtering methods that preceded them, exhibit failures when spikes
from two or more cells are superimposed [4,20–22]. Despite these
drawbacks, clustering methods are the current de facto standard;
they are distributed in analysis software by manufacturers of multi-
electrodes [23] and are considered adequate for most experiments
in which a relatively small number of neurons are recorded or
analyses in which a small fraction of errors are acceptable.
We suggest that the errors that occur when spikes are
superimposed are more severe than is commonly assumed. First,
these errors are not random, but highly systematic, and can
complicate conclusions regarding the occurrence of near-synchro-
nous spikes, and their role in network activity. Accumulating
evidence suggests that correlated or synchronized firing amongst
cells within a network is likely to be far more prevalent than
previously believed. For example, recent analysis of retinal
ganglion cells show that synchronous spikes constitute up to
60% of all spiking activity and can occur in events constituting a
large fraction of the neurons recorded [2,24]. Second, as recording
technology advances, increases in both the number of electrodes
and the recording fidelity of electrodes lead to ever more frequent
occurrences of spike superposition. Thus, spike sorting solutions
that directly address the superposition problem are clearly needed.
Several recent papers have addressed the problem of spike
sorting while explicitly addressing the problem of overlapping
spikes [4,25–28]. (See Discussion for a more detailed comparison).
Here we make several new contributions to the study of this
problem. First, we carefully examine the failure of clustering
methodologies in cases where spikes from multiple neurons
overlap. We examine how these failures lead to systematic artifacts
which can be used to diagnose any spike-sorting algorithm in the
absence of ground-truth. Second, we propose a framework for
spike sorting based on a simple generative model of extracellularly
recorded data. This model formalizes a set of prior beliefs and
assumptions about neural spike trains and waveforms and how
these signals combine to generate a noisy voltage waveform. In
particular, this model specifies that overlapping spikes from nearby
neurons superimpose linearly in the recorded voltage signal. We
introduce a greedy algorithm – ‘‘binary pursuit’’ – for obtaining
the approximate maximum a posteriori (MAP) estimate of the spike
trains given the voltage data under this model. We demonstrate
that in comparison to clustering methods, binary pursuit can
reduce both the number of missed spikes and the rate of false
positives. Finally, we develop a new method for assessing the spike
sorting error rate in the absence of ground truth, and we use this to
demonstrate the quality of our results on real data.
Results
Failures of Clustering MethodsWe begin by examining the geometry of extracellular spike
recordings in order to provide an intuitive illustration of the
limitations of clustering methods, and to motivate our proposed
methodology. Clustering methods for spike sorting follow several
generic steps. First, putative spike times and their associated
waveforms are isolated from an analog voltage trace. Then, the
voltage traces in the vicinity of these spike times are grouped into
clusters. The centroid of each cluster is identified as the spike
waveform of a neuron, and all traces that fall within a cluster are
then labelled as spikes of the corresponding neuron (see Methods).
Although the details vary, these steps constitute the primary
elements of most spike sorting algorithms described in the
literature [13,16,21] as well as most commercially available spike
sorting systems [23].
Clustering methods are generally successful when each neuron’s
spike waveform is sufficiently distinct from background noise and
from those of other neurons, or when spikes occur primarily in
isolation. However, these methods generally fail when spike
waveforms from multiple neurons are superimposed [4,10,11,30].
Specifically, if two neurons fire synchronously, the resulting
voltage trace will resemble the sum of the individual waveforms
[31]. The sum of the two spike waveforms forms a pattern that is
distinct from the waveforms considered separately, and clustering
methods will either assign the composite spike waveform to a
distinct cluster–thus ‘‘hallucinating’’ a fictitious neuron–or discard
the observation as an outlier that does not match any neuron.
Figure 1 demonstrates the systematic failure to identify the near-
synchronous spikes of two neurons recorded in primate retina
[29,32]. Figure 1A–B shows the superposition of synchronous
spike waveforms, which a clustering method fails to identify. The
problem is not limited to synchronous spikes, as shown in Fig. 1
C–D: any spikes whose waveforms exhibit non-zero dot product
can give rise to an unrecognizable composite waveforms when
superimposed. The feature-space trajectory of overlapping spikes
can trace out regions of feature space distinct from the waveforms
of each constituent neuron. These points will also typically be
discarded as outliers by traditional clustering methods.
The failure to correctly identify near-synchronous spikes in a
pair of neurons leads to an artifact that can be observed directly.
Figure 2 A shows the cross-correlation function (CCF) between
recovered spike trains of an adjacent pair of ON parasol retinal
ganglion cells (RGCs), which are known to exhibit some synchrony
in their spiking. The cross-correlation function provides an
estimate of the instantaneous spike rate of the second cell relative
to the time of a spike in the first cell. The plot in Figure 2 A shows
an increase in rate over the interval jtjv5 ms, which is typical for
the timescale of synchrony in these cells [33–35]. But one can also
see a pronounced CCF notch in the interval jtjv1 ms, which
corresponds to the most highly synchronized spikes. This notch
has been observed in extracellular but not intracellular recordings
[34,36], and its duration is matched to the interval over which the
clustering failures identified in Figure 1 occur. These two facts
suggest that the notch is an artifact that corresponds to spikes that
the clustering method has failed to identify.
This sharp notch in the cross-correlogram is quite common.
Figure 2 B shows a grayscale image whose rows are cross-
correlograms between pairs of simultaneously recorded adjacent
RGCs. The vertical black streak at t~0 corresponds to the notch,
and is seen to occur for many neuron pairs. Amongst synchronous
cells, we can further demonstrate that the strength of the notch
artifact is related to the geometry illustrated in Fig. 1. Intuitively,
the magnitude of the waveform of the secondary cell determines
how frequently spikes of the primary cell will fall outside of its
cluster (and thus be classified as outliers). Figure 2 C quantifies this
relationship, plotting a measure of the strength of the artifact
against the magnitude of the secondary neuron waveform, across
all pairs of adjacent RGCs. The significant correlation (r2~0:73)
supports the interpretation that the notch is an artifact arising from
failures of clustering for near-synchronous spikes.
Estimating Spike Trains with Binary PursuitWe formulate spike sorting as a statistical estimation problem.
Specifically, we develop a generative model that describes how the
measurements (extracellular voltage measurements) relate to the
quantities to be estimated (spike times and spike waveforms). We
also develop an algorithm for inferring spike times and waveforms
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 2 May 2013 | Volume 8 | Issue 5 | e62123
from measurements under this model [21]. We provide a
summary of our solution here; full details are provided in Methods.
Our model assumes that each neuron’s spikes give rise to a
characteristic space-time voltage pattern or ‘‘waveform’’ on the
recording electrodes. Spike waveforms may extend several
milliseconds in time and across many electrodes, depending on
the three-dimensional layout of electrodes and neurons, as well as
their electrical properties. We denote the waveform of the j’th
neuron with a vector-valued quantity, ~wwj(t), which has indices
across all electrodes at each time t relative to a spike time. We
Figure 2. Cross-correlation artifacts induced by failure of clustering method for temporally overlapping spikes. (A) The cross-correlation function (CCF), which expresses the firing rate of one neuron relative to the spike times of another neuron. The CCF shows a substantialelevation in the firing of the primary cell in a time window extending roughly 6 5 ms around the spike of the secondary cell, as well as a sharp notchat the origin (width roughly 6 1 ms). The timescale of this notch matches the range of times over which the waveforms interfere with each other, asshown in Figure 1. (B) Summary of pairwise cross-correlations for all adjacent ON retinal ganglion cells within a simultaneously recorded population(338 pairs). Each row of the image represents the CCF between a pair of cells (shade of grey represents firing rate relative to the mean). Rows aresorted according to the value of the center time bin. (C) For pairs of neurons with significant synchronized firing, the magnitude of the secondaryspike waveform (corresponding to the length of the red vector in Figure 1 ) provides a strong prediction of the strength of the CCF artifact (r2~0:73).We quantify the strength of the CCF artifact (index on vertical axis) as the difference between the average firing rate during the intervals of +(1,5) msand ({0:4,0:4) ms, divided by the baseline firing rate.doi:10.1371/journal.pone.0062123.g002
A
synchronous spiking superposition for various time shifts
C
PC
2 p
roje
ctio
n
B
+ =
PC 1 projection
PC
2 p
roje
ctio
n
+ =
D
PC 1 projection
0 ms
+0.1 ms
-0.15 ms+0.45 ms
Figure 1. Geometric picture of failures in clustering-based spike sorting, with multi-electrode retinal data [29]. (A) Synchronous spikewaveforms on a single extracellular electrode from two different neurons (black and red), which sum linearly to form a new waveform (blue) whenthese neurons fire synchronously. (B) Spike waveforms from these same two neurons projected into a two-dimensional linear feature space. Eachpoint in this space corresponds to a single recorded waveform. Black and red vectors indicate the waveforms shown in (A), and the correspondingclusters of colored points around each vector indicate the samples that were assigned to each neuron. Synchronous spikes from these two neuronsgive rise to voltage waveforms that lie near the sum of these two vectors (blue vector), and these points (gray) are generally discarded as outliers. (C–D) More generally, overlapping spikes with different temporal offsets produce different waveforms (example, with second waveform offset 20.45 msrelative to first, shown in (C). These summed waveforms lie along a trajectory in the feature space, parameterized by their temporal offset. Severalexamples (blue points) are shown in (D), along with their associated waveforms.doi:10.1371/journal.pone.0062123.g001
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 3 May 2013 | Volume 8 | Issue 5 | e62123
assume the voltages measured across electrodes during an
experiment are a sparse linear superposition of these spike
waveforms, contaminated with background noise, ~gg(t):
~vv(t)~Xnc
j~1
Xnt
t~0
xj(t{t)~wwj(t)z~gg(t), ð1Þ
where~vv(t) is a vector-valued function of time whose components
contain the raw voltage traces recorded on each electrode, and
xj(t) is a binary variable that indicates whether the jth neuron has
spiked at time t. Note that we have discretized time (i.e., t takes on
integer values corresponding to discretized time bins), in antici-
pation of a numerical optimization algorithm that will be
implemented on a digital computer. The sum over time represents
a convolution of each waveform ~wwj with the corresponding
neuron’s spike train xj . The constant nc is the number of neurons
in the population. The constant nt is the number of time steps in
the spike waveform (assumed the same on all electrodes for all
neurons).
We assume the probability distribution of the background noise
can be approximated as a multivariate Gaussian, which specifies
the conditional probabilistic relationship between the desired spike
times and waveforms, and the observed electrode voltages:
p VjX,Wð Þ! exp {1
2(V{W � X)TL{1(V{W � X)
� �, ð2Þ
Where V, W, and X are vectors containing the full content of~vv(t),~wwj(t), and xj(t) across space and time, and the bilinear term
W � X denotes the convolution expressed in Eq. 1. Note that Vdenotes the vector formed by taking the entire time|space matrix
of recorded electrode data and reshaping it into a single column
vector, while W � X denotes a vector of the same size, formed after
temporally convolving the waveform matrix ~wwj(t) with the binary
spike train xj(t) for each neuron and summing across neurons.
The covariance matrix L characterizes the spatiotemporal
covariance of the noise in the recorded voltage signal, which is
largely due to background electrical activity in the surrounding
neural tissue(some of which may be due to spikes that are too small
to reliably detect), and exhibits strong correlations in space and
time, particularly for dense arrays. We discuss estimation of L in
Methods.
To complete the generative model, we need to specify prior
probability distributions over the spike trains xj(t) and spike
waveforms ~wwj(t). For spike trains represented at high temporal
resolution, a natural prior is a Bernoulli distribution:
p(xj(t))~Pt
pxj (t)
j (1{pj)1{xj (t), ð3Þ
where xj is a binary variable representing a spike (or lack thereof)
for the jth neuron, in a single time bin t. The parameter pj
specifies the prior probability that a time bin contains a spike, and
is generally quite small. Given a voltage sampling rate of
20,000 Hz, for example, a neuron spiking at 40 Hz emits an
average of one spike per 500 bins, corresponding to a Bernoulli
prior with pj~0:002. This prior assumes that spikes in different
time bins, and for each neuron, occur independently. Finally, we
imposed a sparseness penalty on the spike waveforms ~wwj(t),
exploiting the fact that the waveforms tend to be localized across
electrodes, and to reduce the computational cost of inference (see
Methods for details).
This completes our generative model, consisting of a likelihood
P(VjX,W) and priors p(X) and p(W). From these ingredients, we
can use Bayes’ rule to obtain the posterior distribution over spikes
and waveforms given the data: p X,W jVð Þ!p(VjX,W)p(X)p(W).Our goal here is to develop a computational algorithm for
maximizing this posterior, that is, to obtain the maximum a posteriori
(MAP) estimate of the spikes and waveforms. (See [16] for a
discussion of more general Bayesian inference methods, which can
be made tractable for much lower-dimensional data). The negative
log-posterior provides a quadratic objective function that we will
seek to minimize for X and W:
L(X,W)~1
2(V{W � X)TL{1(V{W � X)
� �zcT X, ð4Þ
where c is a vector of constants that depend on the prior Bernoulli
spike rates fpjg. In essence, this objective function consists of two
terms that impose differing constraints on the solution. The first is
the squared error between the linear superposition of spike
waveforms and the voltage data (measured in the space of the
noise covariance). The second, which comes from the Bernoulli
prior, places a penalty on each spike, and thus serves to reduce the
number of spikes. The penalty (cost per spike) differs for each cell,
and is derived from the prior probability of spiking in that cell (see
Methods).
This is a hybrid discrete/continuous objective function (W is
continuous, X contains binary spikes), and there are no known
methods for finding unique global minimum apart from brute-
force search. Instead, we search for a local minimum using
coordinate descent, which involves alternating between solving for
each of these unknowns while holding the other fixed. Specifically,
the algorithm uses the following steps:
1. Initialize using a standard clustering algorithm to identify the
number of neurons and their approximate firing rates fpjg.2. Estimate the spike waveforms for all neurons across all
electrodes by minimizing the objective function L (Eq. 4 ) for
W; this is a simple least-squares linear regression problem.
3. Estimate the noise covariance L from the residual prediction
errors, then whiten the data by the inverse square root of L and
re-estimate waveforms WW.
4. Estimate spikes by minimizing L for X. This is a sparse binary
linear inverse problem [37], and the exact solution is
intractable. Instead, we develop a greedy method that we call
binary pursuit. Binary pursuit greedily inserts and removes spikes
so as to maximally decrease the objective function until a local
optimum is reached.
5. Return to step 1 and repeat until the estimated spike times and
waveforms change minimally.
We provide the full details of this algorithm, along with practical
and theoretical justification, in Methods.
Performance ComparisonTo evaluate our algorithm, we examined data recorded with a
multi-electrode array in primate retina [29]. The custom 512-
electrode array samples electrical activity at 20 kHz, providing
approximately 30 samples for each &1:5ms action potential (Fig. 1
a) [32]. This data set contains 364 identified retinal ganglion cells,
spiking at an average rate of 10 sp/s. This dataset is especially
challenging due to the high degree of multiplexing: each electrode
records spikes from many different neurons, and each neuron
projects to many (w50) electrodes. Spike superposition is
0.
1.
4.
3.
2.
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 4 May 2013 | Volume 8 | Issue 5 | e62123
exacerbated by the fact that mammalian retina exhibits substantial
synchronous spiking activity [33,38,39].
We compared spike train estimates obtained with traditional
clustering and with binary pursuit. The most immediate difference
was that binary pursuit identified a larger number of spikes for
every cell. These additional identified spikes generally overlapped
the spikes of other cells, as illustrated in Fig. 3. The left column
shows the spikes of four example cells obtained using a clustering
method.
For each example cell, the spikes of a second cell recorded on
similar electrodes are also shown (gray points). Binary pursuit
identifies a number of additional spikes, which are scattered in
multiple directions away from those identified by clustering (red
points). The red points are incorrectly classified as outliers by
clustering. (Note that some points do not appear to be outliers
within the two dimensions displayed, but are outliers along other
dimensions.) The middle column shows the predicted locations of
the superpositions of the spike waveforms of the two cells with
different temporal offsets. The right column shows additional
spikes of the primary cell identified by binary pursuit, color coded
according to whether they overlapped a spike of the second cell,
and if so, at what temporal offset. The estimate spike times are
consistent with the predicted superpositions in the middle panel.
Note that synchronous spikes (zero temporal offset) deviate furthest
from the cloud of isolated spikes.
We also compared the cross-correlations of spike trains
estimated with binary pursuit and clustering. Figure 4 A shows
examples for eight pairs of adjacent parasol cells (four ON, and
four OFF pairs). As shown in Fig. 2, the clustering method leads to
an artifact in the CCF (a notch at + 1 ms), but this artifact is
reduced or eliminated for the spike trains estimated using binary
pursuit. Figure 4 B summarizes this improvement across all pairs
of ON and OFF parasol cell in a single recording. Cells of opposite
polarity are known to exhibit weak anti-correlation [33,40], as can
be seen in cross-correlations of four example ON-OFF pairs,
shown in Fig. 4 C. Again, the clustering method produces an
artificial notch at the origin, indicating a failure to correctly
identify spikes that are near-synchronous, and this artifact is
systematically removed under binary pursuit. A summary across
the population is shown in Fig. 4 D. Curiously, on a small fraction
of cell pairs, a spurious peak in cross-correlation is observed even
for binary pursuit, which we believe reflects a lack of discrimina-
bility of the two waveforms (see Discussion).
The black curves in the panels of Fig. 5 summarize the relative
behavior of the two spike sorting methods. Figure 5 A shows that
binary pursuit identifies more spikes for every cell in our
population (N = 293 cells). Figure 5 B shows a comparison of the
magnitude of the CCF artifact. The spike trains obtained using
binary pursuit are seen to have little or no artifact. From these two
plots, one might be tempted to believe that binary pursuit has
solved the spike sorting problem. But further examination reveals a
new problem: an increase in refractory-period violations, which
provide another indicator of spike-sorting errors [4,15,24,41–43].
We quantify these errors in terms of the ‘‘contamination rate’’ for
each neuron, defined as the ratio of the frequency of occurrence of
spikes within the refractory period (v1:5 ms) to the baseline
frequency of spikes outside this window. (A contamination rate of
50% indicates that the rate of spikes detected during the refractory
window is equal to half the rate of spikes detected outside this
window). Figure 5 C shows a comparison of the contamination
rate for spikes sorted by clustering and binary pursuit. We see that
for a large proportion of the cells, binary pursuit has a significantly
higher contamination rate than clustering, and thus some of the
increase in spike rate seen in these cells is likely due to inclusion of
erroneous spikes.
Spike sorting is a type of signal detection problem, and it is well
known that failures in such problems come in two forms: misses (in
which a true spike is not detected), and false positives (in which an
artificial spike is inserted). The CCF artifact provides a measurable
indicator of misses, whereas the contamination rate is a
measurable indicator of false positives. In classical signal detection
theory, misses and false positive errors trade off against each other
as one adjusts the decision threshold [44]. In the context of a
Bayesian approach, one may accomplish this tradeoff by adjusting
the prior probability on signal occurrence. This idea may be used
directly with our spike sorter to trade off the CCF artifact against
the contamination rate, as shown in Fig. 5. Reducing the Bernoulli
spike rate decreases the number of estimated spikes, increases the
CCF artifact, and decreases the contamination rate (Fig. 5 A–C,
blue curves). However, a more moderate reduction in the
Bernoulli rate results in a contamination rate significantly below
that of clustering, while minimally increasing the CCF artifact
index (Fig. 5 B–C, purple curves). Thus, for these data, there exist
prior settings for which both types of errors occur less frequently
than with clustering.
Estimating Error Rates in the Absence of Ground TruthThe results of Fig. 5 show that spikes sorted with binary pursuit
depend significantly on the choice of prior spike rate, and suggest
that this value could be selected to simultaneously minimize both
the CCF artifacts (misses) and the refractory contamination (false
positives). These two measurable errors are only proxies for the true
errors that one would like to minimize. In general, one does not
know the true errors and we cannot assume that the true errors are
proportional to their corresponding measurable quantities.
We can use signal detection theory to develop a method for
assessing the error rate of individual neurons in the absence of
ground truth. This can be used both to select prior values for each
cell, and to determine which neurons have acceptable spike sorting
errors. The method is based on a simple observation: In a
Bayesian setting, if an estimate is well constrained by the data,
then the value of the prior parameter has little effect [45]. Thus, if
the spike waveform of a cell is easily distinguished from the
background noise and from the waveforms (or superpositions of
waveforms) of other cells, the number of spikes found for that cell
should be insensitive to the parameter value chosen. Fig. 6A
illustrates this effect by showing the sensitivity of spike count to the
Bernoulli prior parameter for two different RGCs. The well-
isolated cell shows a spike count that is stable with respect to
changes in threshold up to an order of magnitude in either
direction. In contrast, the poorly-isolated cell is highly sensitive to
the threshold value.
This behavior is nearly identical to that obtained from
simulation of a simple signal detection problem. Figure 6 B shows
results for detecting a scalar value from a scalar measurement
corrupted by additive Gaussian noise. Optimal detection (in the
sense of minimizing errors) is achieved by thresholding the
measurement at a value that depends on the prior probability of
occurrence of the signal [46]. For high SNR, the number of
detected events is stable over a broad range of thresholds, whereas
for low SNR, the number of detected events is highly sensitive to
the choice of threshold. This sensitivity provides an indication of
how cleanly the signal can be isolated from the noise, which is
directly related to the error rate in the two situations, as illustrated
in Fig. 6 C.
To make use of this relationship in spike sorting, we need to
estimate the relationship between the sensitivity and the error rate.
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 5 May 2013 | Volume 8 | Issue 5 | e62123
Figure 3. Comparison of spikes estimated using clustering and binary pursuit. Each row shows results for one example neuron. Each plotdepicts the 2D linear feature space used for clustering (see Fig. 1 ). Left column: Black and gray points indicate spikes obtained by clustering for twocells. Additional spikes obtained for the black cell by binary pursuit (but ignored as outliers by clustering), are scattered in various directions relativeto this ellipse (red points). Note that some points do not appear to be outliers within the two dimensions shown, but are outliers in other dimensions.Middle column: When the spikes of these two cells overlap in time, the resulting superimposed waveform is predicted to lie along a trajectory (seeFig. 1 ). Filled black and gray ellipses correspond to the location of isolated spikes for the primary and secondary cells, respectively. Size and shape ofellipses corresponds to the level curve (at one standard deviation) of the estimated (Gaussian) noise distribution. Colored ellipses indicate predictedlocations of noisy superimposed waveforms, with color indicating their temporal offset. Right column: Subset of spikes identified by binary pursuitthat were either isolated (black and gray points), or overlapping (colored points, with color indicating the temporal offset of the two spikes).doi:10.1371/journal.pone.0062123.g003
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 6 May 2013 | Volume 8 | Issue 5 | e62123
Figure 4. Cross-correlation artifacts introduced by clustering techniques are greatly reduced with binary pursuit. (A) Cross-correlationbetween four distinct pairs of adjacent ON parasol cells (left column) and OFF parasol cells (right column), for spike trains estimated using clustering(gray bars) and binary pursuit (red line). Dashed line indicates baseline firing rate. (B–C) Summary of cross-correlations between adjacent pairs ofneurons (338 ON and 369 OFF neuron pairs), with spike trains obtained from clustering (left column) and binary pursuit (right column). Within a singleimage, each row represents the cross-correlogram between a single pair of neurons, with intensity indicating firing rate relative to mean rate. Rowsare sorted by the firing rate of the bin at t~0. The artifactual notch at zero that arises from cluster-based sorting is now visible as a dark streak att~0, and largely disappears with binary pursuit sorting. (D) Cross-correlation between four distinct pairs of adjacent ON and OFF parasol cells. (E)Summary of cross-correlations between 225 pairs of adjacent ON and OFF parasol cells.doi:10.1371/journal.pone.0062123.g004
Figure 5. Comparison of spike trains estimated using binary pursuit and clustering. Three different summary statistics are computed andcompared for 293 retinal ganglion cells. For each statistic, the data are shown as ‘‘Q–Q’’ plots: Each line spans the range of quantiles from 5% to 95%,and points are plotted at corresponding deciles of the distributions from 10% to 90%. Different colored lines correspond to different Bernoulli spikerate priors: values in legend indicate a multiplicative factor on the log-prior, relative to the firing rate estimated from clustering. (A) Spike rate. (B)Cross-correlation function artifact index measures the depth of the ‘‘notch’’ at the origin of the cross-correlation function between a pair of cells, ameasure of missed spikes. (C) Refractory period contamination rate, which is a measure of false positives. Note that the purple curves (which arisefrom using a prior for each cell that is six times the firing rate of spikes estimated using clustering) show a reduction in both contamination and CCFartifacts relative to clustering.doi:10.1371/journal.pone.0062123.g005
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 7 May 2013 | Volume 8 | Issue 5 | e62123
We simulated 120 seconds of electrode data using the generative
model of Eq. 1 for 293 neurons, and estimated the spikes of each
neuron using binary pursuit. We recomputed these estimates while
varying the prior of each neuron individually. Figure 7 A shows a
scatter plot of the relationship between the sensitivity (quantified as
the derivative of the spike count with respect to the threshold for
each neuron), and the spike sorting error rate in the simulated
data. The data are reasonably well fit by a power law (straight line
fit on a log-log plot, r2~0:99).
As an example of the use of this relationship, suppose one
wanted to analyze only those neurons with a spike sorting error
rate less than 2%. Using the scatter plot of Fig. 7 A, we find that
that the estimated spike trains for 285 of the simulated neurons
had error rates v2% (Fig. 7 B). We then use the relationship
between sensitivity and error rate to estimate the error rates in the
real data. Figure 7 C suggests that the 49 neurons with a spike rate
sensitivity w0:07 are likely to have error rates w2%.
Discussion
We have formulated the spike-sorting problem in a statistical
estimation framework based on a generative model of extracellular
electrode data. The model, while extremely simple, provides an
explicit statement of the assumptions underlying our methodology:
the recorded voltage traces arise as a linear superposition of spike
waveforms from multiple neurons, along with additive correlated
Gaussian noise, with a prior on the frequency of each neuron’s
spikes.
We have shown that clustering methods, which are the current
de facto standard for sorting spikes, exhibit systematic failures,
arising from an implicit assumption that the spike waveforms
contained in the recorded voltage traces do not overlap. We
developed binary pursuit, an algorithm for finding a (local)
maximum of the posterior expressed by our model, and
demonstrated its capabilities in sorting multi-electrode data from
the retina, using refractory violations and cross-correlation
artifacts as measurable indicators of errors. In addition, we’ve
shown that a statistical formulation of the estimation problem
allows us to test the robustness of the spike sorting results to
perturbations in the prior parameters, providing a measure of the
quality of the results in the absence of ground truth.
Relationship to Previous WorkPrevious literature on spike sorting is quite extensive, but focuses
mostly on variants of matched filtering or clustering [13]. The
artifacts that can arise in these methods have been previously
documented [10,11,13,47], and a few authors have developed
post-processing algorithms for repairing them [4,11,26,28,48].
Such repairs can be effective in some situations, but since they are
generally not tied to any particular generative model, it can be
difficult to state the conditions under which they will succeed.
Several methods operate by identifying portions of the voltage
trace that are likely to contain spikes, and then searching
exhaustively for the combination of spikes (and temporal offsets)
that can best explain them. This type of method can be quite
effective for small numbers of cells, but the computational cost
scales exponentially with the number of cells, rendering it
intractable for large multi-electrode arrays.
One method closely related to our own uses a convex relaxation
of the discrete (binary) optimization problem [25]. Specifically, the
authors use an L1-norm (or ‘‘lasso’’) penalty on positive, real-
valued spike coefficients [49]. The resulting objective function is
identical to ours, but is convex on the augmented space of positive
(as opposed to binary) coefficients, meaning that a unique global
maximum can be obtained via quadratic programming. Spikes are
then obtained by thresholding these coefficients. We have
experimented with this approach on smaller datasets (using spike
trains from 27 neurons on 76 electrodes, published in [50]). We
found that the algorithm gave results of comparable quality to
binary pursuit, but required an order of magnitude more
computation time, making it impractical for datasets of the size
considered here.
Recent work from Prentice et al [26] describes a method for
Bayesian (MAP) spike train estimation that also has a number of
similarities to our own. In fact, that paper provides a more
complete method for spike-sorting, as it uses a clever method for
clustering multi-electrode data and estimating the number of
neurons (whereas we have relied a standard clustering method to
initialize our algorithm). However, [26] does not specifically
discuss cross-correlation artifacts or methods for assessing perfor-
mance in the absence of ground truth. The dataset examined in
[26] had a 30-electrode recording from 107 neurons with 1.5 sp/s
average spike rate; this differs substantially from our dataset, which
Figure 6. Sensitivity of number of spikes recovered to the prior on spike rate. (A) Results for two example cells, one well-isolated (blue),and one poorly isolated (red). Adjusting the Bernoulli prior parameter (for each cell individually) alters the threshold used for spike identification (seeMethods), which leads to an increase or decrease in the number of estimated spikes. (B) Simulation of detection of a scalar signal contaminated byGaussian noise, for two different SNRs. Insets indicate histograms of noise observations (black) and signal observations (gray). The number ofdetections (‘‘hits’’ plus ‘‘false positives’’) varies with the choice of threshold, and the shape of the curve depends on the SNR. (C) Error rates (‘‘misses’’plus ‘‘false positives’’) as a function of threshold for the simulations in (B).doi:10.1371/journal.pone.0062123.g006
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 8 May 2013 | Volume 8 | Issue 5 | e62123
had a 512-electrode recording from 298 neurons with 10 sp/s
average spike rate. Cross-correlation artifacts were likely a larger
problem in our dataset due to the higher spike rates and higher
density of neurons. Differences in the two algorithms reflect some
of the differences in datasets. For example, [26] used periods of
silence to estimate the noise covariance, and extracted isolated
(multi-neuron) firing events from the raw datastream before
sorting. By contrast, our recordings rarely exhibited total silence
across the array, and single-neuron waveforms often extended
across more than 100 electrodes. This high degree of temporal and
spatial overlap precluded the extraction of isolated ‘‘spiking event’’
data vectors, and required temporally traversing the entire raw
datastream to estimate spikes. In this sense, our algorithm more
closely resembles the methods of [4,28], which also involve greedy
subtraction of spikes from the raw data.
Taken together, it is clear that the recent literature has seen
the development of several closely-related methods, all involving
MAP inference under a generative model with Gaussian noise and
a sparse prior on spike trains. We believe there is much to be
gained by comparing, synthesizing and extending these algorithms
to improve speed, computational complexity, accuracy, and
robustness.
Sources of ErrorWe cannot provide guarantees on the absolute performance of
our algorithm, since performance is inherently limited by noise
level, the number of neurons, and the discriminability of their
waveforms. The presence of noise generally implies the possibility
of errors, and one should think in terms of understanding and
bounding the errors. In this regard, statistical formulation allows
us to partition errors into three categories, and to separately
consider improvements that might reduce each.
The first of these are irreducible model errors; that is, errors that
would be incurred if the data actually arose from the process
assumed in our model. One asks ‘‘what is the probability that any
particular spike or combination of spikes might be mistaken for
background noise, a different spike or combination of spikes.’’ This
is a multi-dimensional signal detection problem, and the error rate
will be a function of the amplitude and similarity of the spike
waveforms (at all relative temporal offsets), relative to the
amplitude of the noise. These errors can be examined through
simulations (i.e., by applying the spike sorter to artificial data
generated by drawing samples from the model), although it is
important to recognize that such simulations will also include the
effects of algorithmic errors (see next paragraph). Some authors
Figure 7. Quantifying the robustness of spike sorting in the absence of ground truth through a prior sensitivity analysis for allparasol cells ().N~293 (A) The sensitivity of the spike rate to the prior distribution plotted against the spike sorting error rate in simulated data (seetext for details). Note that both axes are plotted in logarithmic space. Dashed line is best fit line (r2~0:99). Gray box indicates spike rate sensitivitiesachieving v2% error. (B) Distribution of error rates across simulation. The solid blue line indicates 2% error rate in simulation. (C) The distribution ofspike rate sensitivities from simulation (top) indicate that 8 cells contain spike rate sensitivities which imply an w2% error rate. Distribution of spikerate sensitivities calculated from real data suggest that 49 cells contain w2% error rates.doi:10.1371/journal.pone.0062123.g007
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 9 May 2013 | Volume 8 | Issue 5 | e62123
have examined such errors (specifically, the tradeoff between hits
and false alarms) in the context of single neuron spikes [51].
The second type of error arises from failures of the optimization
algorithm. Our algorithm operates by taking iterative steps, each
of which decreases the negative log-posterior; we can therefore
guarantee that it will reach a local minimum. However, since the
objective function is not convex, this minimum is not guaranteed
to be a global minimum. The solution is also susceptible to
numerical approximation errors (e.g., Taylor series), although
careful implementation can ensure that these are not significant.
The sparse linear inverse problem has become a focus of intense
study over the past ten years, and the literature can be loosely
partitioned into two general classes: the greedy pursuit methods
(including iterative thresholding), and the convex relaxation
methods (e.g., basis pursuit). The greedy methods (such as the
one we have presented here) tend to make mistakes in which
overlapping spike waveforms are ‘‘explained’’ with an incorrectly
placed spike or combination of spikes. This could potentially be
improved with post-processing, in which one examines those
spikes or combinations of spikes that are most likely to generate
superposition errors (e.g., [48]). We have also begun to examine
relaxation methods [52].
The third type of error arises from incorrectness of the model.
The most common of these are likely to be errors in the assumed
waveforms or noise description. For example:
N the set of model waveforms might include a false waveform.
For example, if a clustering method is used to obtain initial
waveform estimates, two cells with a high degree of synchrony
can result in identification of a false neuron associated with the
combined waveform.
N the waveforms of real cells are variable, exhibiting slow drift or
systematic changes in amplitude or shape (e.g., during spike
bursts) [53,54].
N the electrode ‘‘noise’’ does not arise from a Gaussian process,
but primarily from the superposition of spikes of unsorted cells
[9]. Although it is generally intractable to fully incorporate this
into the model, some authors have modeled the non-
Gaussianity of these signals using heavy-tailed noise distribu-
tions [55,56].
We have deliberately designed our spike train model to be
simple, but the basic framework can be extended to incorporate
additional constraints on spike trains (e.g., refractoriness, joint
activity, stimulus dependencies) or variability (e.g., priors on the
waveform shapes, or on their drift in shape over time, [54]). In
general, additional constraints serve to further restrict the set of
possible solutions, which can improve the results if the constraints
correspond to true properties of the neurons, and assuming they
can be readily incorporated into the optimization algorithm. On
the other hand, over-constraining the solution can lead to
additional ‘‘Miss’’ errors. Similarly, the model could be relaxed
to allow more substantial variability in the spike waveforms, but if
this enlarges the set of possible solutions and can thus open the
door for additional ‘‘False Positive’’ errors.
Future DirectionsWe have focused on the problem of identifying spikes under the
assumption that the number of neurons is known. (Specifically, we
used a clustering analysis to estimate the number of neurons). A
full solution to the spike sorting problem should incorporate
uncertainty about the number of neurons as well. Recently
developed non-parametric Bayesian clustering methods based on
the Dirichlet process, which do not yet take account of
superposition but might be extended to do so, provide one
promising direction for future work [16]. Another important
direction is to improve the speed and computational efficiency of
our method, either through parallelization or perhaps through
greedy methods that employ binary pursuit only in restricted
spatio-temporal regions of the recording (i.e., where a region of
spike overlap can be identified through an increase in residual
error). Further improvements might be achieved by explicitly
modeling temporal dependencies in spike trains [43,53], tuning
information [58,59], non-stationarity of spike waveforms (due to
shifts in tissue or biophysical changes in the neurons themselves
[53,57]), and non-stationarities in the noise distribution. In our
view, the primary virtue of a model-based approach is that it
requires formalizing one’s assumptions about the statistical
structure of the data, making it possible to achieve improvements
either by identifying and replacing inaccurate assumptions, or by
observing new statistical features of the data that can make the
problem easier.
SummaryWe have provided a thorough analysis of superposition errors
that arise in clustering-based methods, a new spike-sorting
algorithm based on a generative model that allows for spike
overlap, and accompanying methods for assessing the robustness
of the estimated spike trains. These results provide a principled
and self-consistent formulation of the problem that can serve as a
substrate for the development of new model-based spike sorting
methods.
Methods
Mathematical Details of Sorting AlgorithmOur algorithm seeks to maximize the joint posterior L(X,W)
~ log p(X,WjV,L,fpjg) (Eq. 4 ) over spike trains X and spike
waveforms W given the voltage data V, the noise covariance L,
and prior spike probabilities fpjg. Our general inference strategy is
to maximize the log-posterior via coordinate ascent, which means
alternating between maximizing L for W and for X. This
procedure is guaranteed to converge to a local maximum of the
posterior.
The geometry of the log-posterior informs our optimization
strategy, and may in the future be exploited to design improved
spike train estimators. The expected voltage is a bilinear function
of X and W. Gaussian noise implies that maximizing L for Wgiven X is a linear least squares problem, which can be solved
efficiently by linear regression. Maximizing L for X given W is also
a linear least squares problem, due to the fact that the log
Bernoulli prior (Eq. 3 ) is linear in X. However, the discreteness of
X–each component must be zero or one–means that this
optimization is a non-convex problem. We therefore resorted to
a greedy algorithm for estimating X given W. However, the
convex relaxation that results from allowing scalar-valued X in the
interval ½0,1� does does have a unique global maximum. Spike
sorting methods that make use of this scalar solution for initializing
a search over binary spike trains may provide one promising
avenue for future research (see [25]). We implemented this method
but did not find any substantial improvement over the current
algorithm, suggesting that the additional computational cost of
such an approach is not justified for the recordings considered
here. We summarize the details of our algorithm below.
Waveform EstimationWe begin by estimating the spike waveforms W using an initial
estimate of the spike trains X(0), the latter of which is provided by a
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 10 May 2013 | Volume 8 | Issue 5 | e62123
clustering-based method (see Methods). The rationale for estimat-
ing W first is that the clustering-based method uses a low-
dimensional linear feature space derived from a small neighbor-
hood of nearby electrodes (depicted in Figs 1 and 3), and we would
like to learn each neuron’s full spatiotemporal spike waveform
across all electrodes to better identify spikes.
Given X(0), we maximize the posterior (Eq. 4 ) for W using an
initial assumption of independent noise (L equal to the identity
matrix). This yields the solution:
W(1)~ argminW
(V{W � X(0))TL{1(V{W � X(0))~
(MX(0)MX(0)
){1MTX(0)
V,ð5Þ
where MX(0)is a toeplitz matrix formed from the elements of
X(0) such that MX(0)W~W � X(0). This solution minimizes the
quadratic term in L.
We then prune W(1) by subset selection [60] on the vector norm
of Wij(1), the waveform of the i’th neuron on the j’th electrode.
That is, we set Wij(1) to zero if jjWij(1)jjva, where a was a
constant multiple of the noise on the j’th electrode. Subset
selection effectively induces sparsity on the estimate of W (see, e.g.,
[49,50,61–63]), which regularizes and reduces computational cost,
but does not bias estimates of large-amplitude waveforms.
Noise Covariance and WhiteningThe next step is to estimate the noise covariance L from initial
estimates of the spike trains X(0) and waveforms W(1). Knowledge
of this covariance will allow us to sphere the noise so that it is
independent in time and across electrodes [64]. This will
transform the first term in the log-posterior (Eq. 4 ) from a
weighted to an unweighted sum of squares, which reduces the
computational cost of spike train estimation.
We could in principle estimate L using the covariance of the
residual errors in predicting V, that is, LL~cov(V{W(1) � X(0)).
However, this matrix is far too large to estimate, or even to store in
memory. We therefore modeled the noise as having a separable
space-time correlation structure, with a limited extent in time. This
allowed us to whiten the data using a step-wise whitening
procedure: first, we estimated the temporal noise covariance Lt
on each electrode using a 16 time-bin (0.8 ms) window, and then
filtered the data from that electrode with the central column vector
of L{1
2t . Then, we estimated the instantaneous noise covariance Lx
across all 512 electrodes in the array (a 512|512 matrix) and
multiplied the vector of data in each time bin by whitening matrix
L{12
x .
Let ~VV denote the whitened electrode data obtained from this
two-stage whitening procedure. (The residuals of ~VV had approx-
imately flat autocorrelation in both time ans space, indicating that
the assumption of space-time separable noise was a reasonable
assumption). We then re-estimated and sparsified the waveforms
(as described above) to obtain ~WW(1), the whitened spike waveforms.
Spike Train EstimationThe most computationally intensive step in the algorithm is
estimating the set of spike trains X given W. This involves
maximizing the log-posterior in the space of whitened voltage
signals, which can be written:
~LL(X, ~WW)~{1
2(~VV{ ~WW � X)T (~VV{ ~WW � X) { cT X: ð6Þ
The final term cT X arises from the Bernoulli prior over each
neuron’s spike train. We initialize the prior probability of a spike in
each neuron using X(0), the spike train estimate returned by
clustering-based method (see Methods). We set ppj~nj=nT , where
nj is the number of spikes from the j’th neuron, and nT is the total
number of time bins in the experiment. The weights fccjgcomposing c are then given by
ccj~{ log (ppj)z log (1{ppj), ð7Þ
which follows from the fact that the log of the prior (Eq. 3 ) can be
written log p(xj)~xj( log pj{ log (1{pj))zc.
As noted above, maximizing ~LL for X is a quadratic optimization
problem on a binary lattice, since each element of X is 0 or 1. The
advantage of working in the whitened space is that the log-
posterior is just the sum of the residual errors plus the penalty term
from the Bernoulli prior; when inserting or a removing a particular
spike, we need only compute the change in residuals on the bins
where the expected voltage ~WW � X changes, i.e., the electrodes and
time bins affected by a particular spike waveform.
Greedy binary optimization procedes as follows. Let Xi denote
the ith bin of X and let X\i denote the vector X with the ith bin
removed. Let Mw denote the (highly sparse) toeplitz matrix for
convolution of waveforms with the spike trains, so MwX~W � X.
Let wi denote the ith column of the waveform matrix Mw, and M \iw
denote the same matrix with the ith column removed. We can now
evaluate ~LL with Xi~0 and Xi~1 in order to determine whether
the bin should contain a spike or not. Assuming that noise variance
s2~1 after whitening, we have, for all i:
~LL(Xi~0jX\i)~{1
2(V{M \i
wX\i)T (V{M \iwX\i){(c\i)T X\i
~LL(Xi~1jX\i)~{1
2(V{M \i
wX\i{wi)T (V{M \i
wX\i{wi){
(c\i)T X\i{ci
ð8Þ
The difference gives the change in the log-posterior for
changing Xi from 0 to 1:
D~LLi~VT wi{wTi M \i
wX\i{ci{1
2wT
i wi, ð9Þ
and {D~LL gives the change in the log-posterior for changing Xi
from 1 to 0. We can compute this difference for every bin i, with
initial setting X~X0. An obvious strategy for maximizing the
posterior is then to proceed greedily, selecting the bin i for which
(1{2Xi)DLi is largest, and flipping Xi from 0 to 1 or vice versa, as
determined by the sign of DLi. This strategy leads to a highly
efficient computational algorithm, since after flipping a bin Xi, we
only need to update DL in the bins j for which wTj wi is non-zero
(i.e., only bins nearby in time and space to neuron i). Moreover, we
can pre-compute wTi M\i
w for all i, making it extremely fast to
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 11 May 2013 | Volume 8 | Issue 5 | e62123
perform updates to DL (eq. 9) following a spike insertion or
deletion.
To reduce the computational cost of searching for the
maximum of D ~LL, we processed the data in 1 s blocks. This made
X a vector of length 512620,0000 = 10,240,000 for each block.
The algorithm terminates when ~LL can no longer be increased by
inserting or deleting a spike in any neuron in any time bin.
The full MAP inference algorithm (summarized in Algorithm 1
below) involves coordinate ascent, which involves cycling through
and re-estimating W, L, and fpjg in turn as described above,
repeating until the log-posterior cannot be increased further. In
practice, however, the high cost of running multiple rounds of
coordinate ascent, and the relatively good performance achieved
with a single round of updates led us to stop with X(1), the spikes
obtained from the first maximization of ~LL for X.
Empirically, we found that sorting with the sparsity penalty cdetermined from the ‘‘plugin’’ estimate for the Bernoulli param-
eter ppj (Eq. 7) led to an undesirably large increase in contami-
nation rate for many cells (see Fig. 5). For this reason, we
systematically varied c by a multiplicative factor, and found that a
reasonable tradeoff between CCF artifact and contamination rate
was obtained with penalty increased by a factor of 6, giving
ccj~6( log (1{ppj){ log ppj).
Accounting for Spike Waveform VariabilityThe algorithm described above assumes that spikes occur on a
fixed lattice of discrete time points (with 0.05 ms spacing, given the
20 KHz sampling of our data). One consequence of this
discretization is that ‘‘true’’ spike waveforms present in the analog
voltage trace may be shifted relative to the waveform templates
subtracted or added during binary pursuit. To address this form of
aliasing error, we used a local expansion of the waveform of each
neuron to account for shifts in the exact spike time and variations
in the spike amplitude and spike width. This additional flexibility
allows us to resolve spike times to a finer resolution than the
sampling rate of the analog trace, and to account for variability in
spike waveform height and amplitude that arises (for example)
during bursting activity.
We account for such variability by assuming that a spike
waveform ~ww can vary slightly in time t (relative to the discrete time
lattice), amplitude a or width s each time it appears in the data.
Specifically, we represent each spike in the data using a local
Taylor series approximation centered on the ‘‘canonical’’ wave-
form:
~ww~ww~~wwza1d~ww
dtza2
d~ww
daza3
d~ww
ds, ð10Þ
where the waveform derivatives can be computed numerically:
d~ww
dt~~ww(tzD){~ww(t)
D,
d~ww
da~~ww(t),
d~ww
ds~~ww((1zD)t){~ww(t)
D:
ð11Þ
For the derivative with respect to spike width s, we interpolate the
waveform and center it so that t~0 corresponds to the waveform
peak; this ensures that the time-dilation ~ww((1zD)t) increases the
width without shifting peak location. The basic intuition here is
that, for smooth waveforms ~ww(t), small shifts in spike time,
amplitude, or width can be closely approximated by adding a
small amount of the appropriate waveform derivative. (See [52]
for a more direct embedding of this idea in a convex relaxation
scheme known as continuous basis pursuit.).
For each observed spike in the dataset, the weights a1, a2 and a3
must be estimated in order to determine the exact spike time,
amplitude and width. We simplify the formula above by expressing
the ‘‘corrected’’ waveform ~ww~ww in matrix notation.
~ww~ww~~wwzGa
where G:d~ww
dt
d~ww
da
d~ww
ds
� �and a:½a1 a2 a3�T
If we assume that a single spike occurs, then we can express the
unknown a in terms of the voltage signal.
V~~ww~wwzj
~(~wwzGa)zj ð12Þ
where j is a zero-mean Gaussian noise. Given this, the least-
squares value of a may be obtained as: G:
aa~G{(V{~ww): ð13Þ
The pseudo-inverse G{ can produce large values of aa when the
data exhibits large deviations from the true waveform, causing
unrealistically large changes in spike width or amplitude. We can
keep the correction small by adding an L2 (‘‘ridge’’) penalty
ljjajj2, which shrinks aa toward zero and results in the formula:
aa~(GT GzlI){1GT (V{~ww)~G{{(V{~ww) ð14Þ
We set the regularization parameter l to minimize contamination
errors in cross-validation data. Note that the new matrix G{{ can
be pre-computed for each waveform ~ww and applied to any residual
(V{~ww) before maximizing the log-posterior to solve for the spike
times. We incorporated this update rule into the binary pursuit
algorithm described above, using it to update the residual error
between V and W � X whenever a spike was added to X.
Algorithm 1: MAP inference procedure
1. Estimate waveforms W by linear regression given voltagedata V and initial spike train estimate X(0).
2. Prune W, removing unnecessary electrodes from eachneuron’s spike waveform via subset selection (or otherfeature selection method).
3. Compute the residuals R~V{W � X(0) and estimate
noise covariance LL~cov(R).
4. Whiten by the square root of the inverse covariance:~VV~LL{1
2V and ~WW~LL{12WW
5. Estimate prior spike probabilities for each neuron:ppj~
Pt Xj(t)=nT
6. Estimate spike trains X via binary pursuit given ~VV, ~WW,pp.
7. Return to 1; Repeat until convergence.
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 12 May 2013 | Volume 8 | Issue 5 | e62123
In our dataset, we found that the temporal derivative term made
the largest contribution to performance, and that the resulting
estimates exhibited far fewer ‘‘doublets’’, where the algorithm
erroneously inserts two spikes from the same neuron in adjacent
time bins.
Clustering MethodIn our multi-electrode recordings from primate retina, each
electrode samples electrical activity at 20 kHz, providing approx-
imately 30 samples for each &1.5 ms action potential (Fig. 1 a),
and each spike typically elicits voltage signals occur across multiple
electrodes, reflecting electrical propagation through dendrites,
soma and axon (Fig. 1 b; see also [65]).
To obtain initial estimates of the spike waveforms present in a
recording, we use a standard clustering methodology. The basic
steps can be summarized as follows:
1. For each ‘‘center’’ electrode, identify candidate spikes via
thresholding, and create a vector of the voltage data from a
1.5 ms window of time and neighborhood of 6 immediately
neighboring electrodes.
2. Reduce dimensionality of the resulting collection of vectors
using PCA.
3. Cluster the resulting vectors and identify the points in each
cluster as the spikes of a single neuron, with human oversight to
determine the number of clusters and assess the reliability of
cluster assignment.
The spike sorting literature contains an extensive treatment of
such methods [13,20,21].
Acknowledgments
We thank Christophe Pouzat for helpful comments and suggestions, Alan
Litke, Alexander Sher, Matthew Grivich and Dumitru Petrusca for
technical development and Greg Field, Martin Greschner, Jeff Gauthier,
Clare Hulse, and Alexander Sher for experimental assistance.
Author Contributions
Conceived and designed the experiments: JWP JS EPS EJC. Performed the
experiments: JS EJC. Analyzed the data: JWP JS. Contributed reagents/
materials/analysis tools: JWP JS EPS EJC. Wrote the paper: JWP JS EPS
EJC.
References
1. Gerstein G, Clark W (1964) Simultaneous studies of firing patterns in several
neurons. Science 143: 1325.
2. Meister M, Pine J, A BD (1994) Multi-neuronal signals from the retina:
acquisition and analysis. J Neurosci Methods 51: 95–106.
3. Gray CM, Maldonado PE, Wilson M, McNaughton B (1995) Tetrodes markedly
improve the reliability and yield of multiple single-unit isolation from multi-unit
recordings in cat striate cortex. J Neurosci Methods 63: 43–54.
4. Segev R, Goodhouse J, Puchalla J, Berry MJ (2004) Recording spikes from a
large fraction of the ganglion cells in a retinal patch. Nat Neurosci 7: 1155–1162.
5. Blanche T, Spacek M, Hetke J, Swindale N (2005) Polytrodes: High-density
silicon electrode arrays for large-scale multiunit recording. Journal of
Neurophysiology 93: 2987–3000.
6. Brown E, Kass R, Mitra P (2004) Multiple neural spike train data analysis: state-
of-the-art and future challenges. Nature Neuroscience 7: 456–461.
7. Rodieck R (1967) Maintained activity of cat retinal ganglion cells. Journal of
Neurophysiology 30: 1043.
8. Turin G (1960) An introduction to matched filters. Information Theory, IRE
Transactions on 6: 311–329.
9. Sahani M (1999) Latent variable models for neural data analysis. Ph.D. thesis,
Cal Tech.
10. Bar-Gad I, Ritov Y, Vaadia E, Bergman H (2001) Failure in identification of
overlapping spikes from multiple neuron activity causes artificial correlations.
Journal of neuroscience methods 107: 1–13.
11. Takahashi S, Anzai Y, Sakurai Y (2003) Automatic sorting for multi-neuronal
activity recorded with tetrodes in the presence of overlapping spikes.
J Neurophysiol 89: 2245–2258.
12. Schmidt E (1984) Computer separation of multi-unit neuroelectric data.
J Neurosci Meth 12: 95–111.
13. Lewicki M (1998) A review of methods for spike sorting: the detection and
classification of neural action potentials. Network 9: R53–78.
14. Quiroga Q, Nadasdy Z, Ben-Shaul Y (2004) Unsupervised spike detection and
sorting with wavelets and superparamagnetic clustering. Neural Computation.
15. Harris K, Henze D, Csicsvari J, Hirase H, Buzsaki G (2000) Accuracy of tetrode
spike separation as determined by simultaneous intracellular and extracellular
measurements. J Neurophysiol 84: 401–414.
16. Wood F, Black M (2008) A nonparametric bayesian alternative to spike sorting.
Journal of Neuroscience Methods 173: 1–12.
17. Nguyen D, Frank L, Brown E (2003) An application of reversible-jump markov
chain monte carlo to spike classification of multi-unit extracellular recordings.
Network 14: 61–82.
18. Luczak A, Narayanan N (2005) Spectral representation–analyzing single-unit
activity in extracellularly recorded neuronal data without spike sorting. Journal
of neuroscience methods 144: 53–61.
19. Rutishauser U, Schuman E, Mamelak A (2006) Online detection and sorting of
extracellularly recorded action potentials in human medial temporal lobe
recordings, in vivo. Journal of neuroscience methods 154: 204–224.
20. Lewicki M (1994) Bayesian modeling and classification of neural signals. Neural
Computation 6: 1005–1030.
21. Sahani M, Pezaris J, Andersen R (1998) On the separation of signals from
neighboring cells in tetrode recordings. Advances in Neural Information
Processing Systems 10.
22. Pouzat C, Mazor O, Laurent G (2002) Using noise signature to optimize spike-
sorting and to assess neuronal classification quality. Journal of NeuroscienceMethods 122: 43–57.
23. Plexon (2006) Plexon user’s guide, version 2.0. 6500 Greenville Ave., Ste 730,Dallas, TX 75206.
24. Shlens J, Field GD, Gauthier JL, Greschner M, Sher A, et al. (2009) Thestructure of large-scale synchronized firing in primate retina. J Neurosci 29:
5022–5031.
25. Bickel Sahani (2006) Spike Sorting Using a convex relaxation. Master’s thesis,
UCL.
26. Prentice JS, Homann J, Simmons KD, Tkacik G, Balasubramanian V, et al.
(2011) Fast, scalable, bayesian spike identification for multi-electrode arrays.PLoS ONE 6: e19884.
27. Ge D, Le Carpentier E, Idier J, Farina D (2011) Spike sorting by stochasticsimulation. Neural Systems and Rehabilitation Engineering, IEEE Transactions
on 19: 249–259.
28. Marre O, Amodei D, Deshmukh N, Sadeghi K, Soo F, et al. (2012) Mapping a
complete neural population in the retina. The Journal of Neuroscience 32:14859–14873.
29. Shlens J, Field G, Gauthier J, Grivich M, Petrusca D, et al. (2006) The structureof multi-neuron firing patterns in primate retina. J Neurosci 26: 8254–8266.
30. McGill K (2002) Optimal resolution of superimposed action potentials.Biomedical Engineering, IEEE Transactions on 49: 640–650.
31. Johnston D, Wu SMS (1994) Foundations of Cellular Neurophysiology(Bradford Books). The MIT Press.
32. Litke A, Bezayiff N, Chichilnisky E, Cunningham W, Dabrowski W, et al. (2004)What does the eye tell the brain? development of a system for the large scale
recording of retinal output activity. IEEE Trans Nucl Sci : 1434–1440.
33. Mastronarde DN (1989) Correlated firing of retinal ganglion cells. Trends in
Neurosciences 12: 75–80.
34. Shlens J, Rieke F, Chichilnisky E (2008) Synchronized firing in the retina. Curr
Opin Neurobiol 18: 396–402.
35. Field G, Chichilnisky E (2007) Information processing in the primate retina:
circuitry and coding. Annu Rev Neurosci : 1–30.
36. Trong PK, Rieke F (2008) Origin of correlated activity between parasol retinal
ganglion cells. Nat Neurosci 11: 1343–1351.
37. Hastie TJ, Tibshirani RJ, Friedman J (2009) The Elements of Statistical
Learning: Data Mining, Inference and Prediction. Springer.
38. DeVries SH (1999) Correlated firing in rabbit retinal ganglion cells.
J Neurophysiol 81: 908–920.
39. Schnitzer M, Meister M (2003) Multineuronal firing patterns in the signal from
eye to brain. Neuron 37: 499–511.
40. Greschner M, Shlens J, Bakolitsa C, Field GD, Gauthier JL, et al. (2011)
Correlated firing among major ganglion cell types in primate retina. J Physiol589: 75–86.
41. Obeid I, Wolf P (2004) Evaluation of spike-detection algorithms fora brain-machine interface application. Biomedical Engineering, IEEE Transactions on
51: 905–911.
42. Delescluse M, Pouzat C (2006) Efficient spike-sorting of multi-state neurons
using inter-spike intervals information. Journal of neuroscience methods 150:16–29.
43. Gasthaus J, Wood F, Gorur D, Teh Y (2009) Dependent dirichlet process spikesorting. Advances in Neural Information Processing Systems 21: 497–504.
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 13 May 2013 | Volume 8 | Issue 5 | e62123
44. Green DM, Swets JA (1966) Signal Detection Theory and Psychophysics. New
York: Wiley.
45. Gelman A, Carlin J, Stern H (2004) Bayesian data analysis. CRC Press.
46. Green D, Swets J (1974) Signal detection theory and psychophysics. Robert E.
Krieger.
47. Pazienti A, Grn S (2006) Robustness of the significance of spike synchrony with
respect to sorting errors. J Comput Neurosci 21: 329–342.
48. Zhang P, Wu J, Zhou Y, Liang P, Yuan J (2004) Spike sorting based on
automatic template reconstruction with a partial solution to the overlapping
problem. Journal of neuroscience methods 135: 55–65.
49. Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of
the Royal Statistical Society Series B (Methodological) 58: 267–288.
50. Pillow JW, Shlens J, Paninski L, Sher A, Litke AM, et al. (2008) Spatio-temporal
correlations and visual signaling in a complete neuronal population. Nature 454:
995–999.
51. Herbst JA, Gammeter S, Ferrero D, Hahnloser RHR (2008) Spike sorting with
hidden markov models. J Neurosci Methods 174: 126–134.
52. Ekanadham C, Tranchina D, Simoncelli EP (2011) A blind sparse deconvolution
method for neural spike identification. In: Shawe-Taylor J, Zemel R, Bartlett P,
Pereira F, Weinberger K, editors, Adv. Neural Information Processing Systems
(NIPS*11). Cambridge, MA: MIT Press, volume 24, 1440–1448.
53. Pouzat C, Delescluse M, Viot P, Diebolt J (2004) Improved spike-sorting by
modeling firing statistics and burst-dependent spike amplitude attenuation: a
markov chain monte carlo approach. Journal of neurophysiology 91: 2910.
54. Calabrese A, Paninski L (2010) Kalman filter mixture model for spike sorting of
non-stationary data. Journal of neuroscience methods.
55. Shoham S, Fellows M, Normann R (2003) Robust, automatic spike sorting using
mixtures of multivariate t-distributions. Journal of neuroscience methods 127:111–122.
56. Takekawa T, Isomura Y, Fukai T (2010) Accurate spike sorting for multi-unit
recordings. Eur J Neurosci 31: 263–272.57. Calabrese A, Paninski L (2011) Kalman filter mixture model for spike sorting of
non-stationary data. Journal of neuroscience methods 196: 159–169.58. Ventura V (2009) Traditional waveform based spike sorting yields biased rate
code estimates. Proc Natl Acad Sci U S A 106: 6921–6926.
59. Ventura V (2009) Automatic spike sorting using tuning information. NeuralComput 21: 2466–2501.
60. John G, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selectionproblem. In: Proceedings of the eleventh international conference on machine
learning. Citeseer, volume 129, 121–129.61. Mackay D (1992) Information-based objective functions for active data selection.
Neural Computation 4: 589–603.
62. Stevenson I, Rebesco J, Hatsopoulos N, Haga Z, Miller L, et al. (2009) Bayesianinference of functional connectivity and network structure from spikes. IEEE
Transactions on Neural Systems and Rehabilitation Engineering 17: 203–213.63. Gerwinn S, Macke JH, Bethge M (2010) Bayesian inference for generalized
linear models for spiking neurons. Frontiers in Computational Neuroscience.
64. Rebrik S, Wright B, Emondi A, Miller K (1999) Cross-channel correlations intetrode recordings: implications for spike-sorting. Neurocomputing 26: 1033–
1038.65. Petrusca D, Grivich M, Sher A, Field G, Gauthier J, et al. (2007) Identification
and characterization of a y-like primate retinal ganglion cell type. The Journal ofNeuroscience 27: 11019–11027.
Spike Sorting for Removing Correlation Artifacts
PLOS ONE | www.plosone.org 14 May 2013 | Volume 8 | Issue 5 | e62123
Top Related