LETTER Communicated by Jonathan Victor

Pattern Filtering for Detection of Neural Activity, with Examples from HVc Activity During Sleep in Zebra Finches

Zhiyi Chi
[email protected]
Department of Statistics, Committee on Computational Neuroscience, University of Chicago, Chicago, IL 60637, U.S.A.

Peter L. Rauske
[email protected]
Daniel Margoliash
[email protected]
Department of Organismal Biology and Anatomy, Committee on Computational Neuroscience, University of Chicago, Chicago, IL 60637, U.S.A.

The detection of patterned spiking activity is important in the study of neural coding. A pattern filtering approach is developed for pattern detection under the framework of point processes, which offers flexibility in combining temporal details and firing rates. The detection combines multiple steps of filtering in a coarse-to-fine manner. Under some conditional Poisson assumptions on the spiking activity, each filtering step is equivalent to classifying by likelihood ratios all the data segments as targets or as background sequences. Unlike previous studies, where global surrogate data were used to evaluate the statistical significance of the detected patterns, a localized p-test procedure is developed, which better accounts for firing modulation and nonstationarity in spiking activity. Common temporal structures of patterned activity are learned using an entropy-based alignment procedure, without relying on metrics or pair-wise alignment. Applications of pattern filtering to single, presumptive interneurons recorded in the nucleus HVc of zebra finch are illustrated. These demonstrate a match between the auditory-evoked response to playback of the individual bird's own song and spontaneous activity during sleep. Small temporal compression or expansion, or both, is required for optimal matching of spontaneous patterns to stimulus-evoked activity.

Neural Computation 15, 2307–2337 (2003) © 2003 Massachusetts Institute of Technology

1 Introduction

In analysis of neural activity, an important problem is detection of spike sequences with certain temporal or spatiotemporal patterns. In some situations, the pattern of interest is prespecified, as exhibited, for example, by the activity of a neuron during sensory stimulation or motor behavior. In general, the goal here is to recognize spiking activity of the same neuron at a different time or in a different state that exhibits patterns similar to the prespecified one. This case is becoming increasingly important in the study of sleep, where recent results indicate that spiking activity during reinforced behavioral learning or behavior that requires sensory feedback was "replayed" in spontaneous activity during sleep (Nadasdy, Hirase, Czurko, Csicsvari, & Buzsaki, 1999; Dave & Margoliash, 2000; Louie & Wilson, 2001). Results of this sort provide evidence for the role of sleep in learning and memory consolidation that is complementary to studies of how sleep and behavior modify the correlations between neurons (Wilson & McNaughton, 1994; Skaggs & McNaughton, 1996). This letter focuses principally on detection with prespecified patterns.

In other situations, no pattern is prespecified. Instead, the goal is to recognize any activity pattern that occurs at a frequency above chance level. Detection of such patterns has been important in the study of the nature of the neural code, especially in assessing temporal and rate coding (Abeles & Gerstein, 1988; Abeles, Bergman, Margalit, & Vaadia, 1993; Riehle, Grun, Diesmann, & Aertsen, 1997; Date, Bienenstock, & Geman, 1998; Baker & Lemon, 2000). Detection of "excessive" spike patterns should be a stronger method than are correlational methods (Vaadia et al., 1995). In principle, all possible spike sequences have to be accounted for. In practice, this can result in a serious computational problem. To reduce the complexity of the problem, therefore, typically only sequences with a fairly small number of spikes have been considered. This computational constraint can have serious consequences in cases where coding of natural behavior is under consideration, since target sequences can extend for many seconds of behavior and dozens, if not hundreds, of spikes. The techniques derived in this letter are computationally efficient and may therefore also be of value for this second case. Here we do not directly address this problem, however.

To detect patterned spike sequences efficiently and reliably, template matching algorithms have been developed. In these algorithms, exemplar spike sequences observed during specific behaviors are used as templates. A commonly used algorithm is the "sliding sweeps" algorithm (Nadasdy et al., 1999; Abeles & Gerstein, 1988; Abeles et al., 1993; Dave & Margoliash, 2000). In this algorithm, for each iteration, the segment within a time frame is compared with the template by counting the number of matching spikes or interspike intervals (ISIs). This is slow when the template has many spikes, resulting in inefficient detection for large data sets. Because the detection treats matching spikes or ISIs equally, this technique is also insensitive to the temporal variability of spikes. To expedite detection, one may instead compare the firing-rate functions of the data and the template by cross-correlation (Louie & Wilson, 2001). This method requires several steps of normalization and kernel smoothing, which makes it also insensitive to the temporal details of spike sequences.


In this letter, we approach the detection of spiking activity as a special case of pattern detection for point processes. This framework provides an effective way to incorporate temporal detail and firing rates of spiking activity into pattern detection. It formulates the detection process as classification based on the likelihood ratio test. Under certain assumptions of the point processes, the classification is equivalent to linear convolution of the entire data set with a filter determined by the template. The filtering leads to substantial computational efficiency, as it allows detected patterns ("targets") to be located by above-threshold peaks in the filter response. The quality of the detection critically depends on the variability of the spiking activity. This is addressed using multiple filters and templates and at multiple timescales. The statistical significance of the targets is evaluated by tests using short processes to model the spiking activity around the targets. Such localized tests address different issues than the more commonly applied tests using long processes to model the entire data set (Abeles & Gerstein, 1988; Abeles et al., 1993; Riehle et al., 1997; Date et al., 1998; Baker & Lemon, 2000). Finally, in our approach, the detected targets and the templates can be compared in their common temporal structure, after learning the temporal structure of the templates.

We applied this approach to a small data set of single neurons recorded in the song control nucleus HVc of sleeping zebra finches. Sensorimotor matching had previously been reported for neurons in the nucleus robustus archistriatalis (RA), to which HVc projects (Dave & Margoliash, 2000). Here, we demonstrate a match between each HVc neuron's auditory-evoked response to playback of the individual bird's own song and the spontaneous activity of that neuron. Investigating such matching is important for theories that posit a role of sleep in birdsong learning (Margoliash, 2002).

Following is an overview of the main points of the filtering approach:

1. Conditional Poisson assumption: The filtering approach assumes conditional independence for the point processes. In terms of neural activity, this means that each finite spike train is randomly generated by a template or by the background, and for either case, the distribution is Poisson. By dividing spiking activity into different categories and approximating each with a Poisson process, the resulting "hybrid" process, which in general is not Poisson, can provide a better model of the spiking activity than Poisson processes. The approach is related to the generalized Hough transform (Rojer & Schwartz, 1992; Amit & Geman, 1999), with the key difference that it takes into account various models for the target sequences as well as the background.

2. Classification and filtering: A spike sequence is classified as a target instead of a background sequence only if the ratio of the corresponding likelihoods is above threshold in all tests. By the above model, the likelihood ratio is determined by a linear convolution of the sequence with a filter. The filter not only limits the amount of temporal discrepancy, or "jitter," between matching spikes, but also weighs spikes differentially, favoring spikes with small jitter. Henceforth, this convolution will be referred to as pattern filtering.

3. Multiple filters: Since each filter is based on a reasonable but fairly simple probability model of the spiking activity, it often detects targets that are not very similar to the template. This limitation is addressed by multiple filters, which implicitly induce a probability model subject to more constraints and hence offer a better description of the spiking activity. Computational efficiency is maintained by coarse-to-fine detection (Fleuret & Geman, 2001).

This approach is in contrast to using more sophisticated models. More sophisticated models reduce computational efficiency, especially when complicated patterns must be detected from large data sets. Also, more sophisticated models do not necessarily result in noticeable improvement in detection performance, yet they suffer from potential difficulty of parameter estimation.

4. Multiple templates: A challenge for spike pattern detection is variability in target sequences. Spike trains, even when associated with the same stimulus or behavior, often exhibit variability. To allow for this, multiple templates are used. Detection is first conducted using individual templates, independently of the other templates. Then the detected targets across the templates are combined.

5. Multiple scales: Systematic changes in spiking activity can also present problems. In some experimental designs, templates collected in one behavioral state are used to analyze the activity in a different state. There can be systematic differences between states in the rate that processes evolve; these differences are not accounted for by the templates (Wilson & McNaughton, 1994; Nadasdy et al., 1999; Dave & Margoliash, 2000; Louie & Wilson, 2001). To address this, filters are scaled in time by varying amounts, so that sequences with similar patterns as the templates up to a temporal scale can be detected.

6. Training: Some parameters in the detection process are selected by training. In some cases, such as the spontaneous activity during sleep, one has to account for the absence of reliably tagged spike sequences to be used as training sets. The solution here is to train the detector using simulated spike sequences based on biologically plausible parameters and then evaluate the sensitivity of the detection to the parameters.

7. Significance test using simulation: Assessing the p-values of targets poses a challenging problem, as it is difficult to model the probability distribution of the data. We note that the statistical significance of each target depends on only the properties of the data around it rather than the global properties of the entire data set. Therefore, we develop "local" tests, which use simulated spike trains based on the activity around the target. Using local processes, it is easier to account for the rate modulation and nonstationarity of the data.


The local tests, however, do not address questions related to the entirety of the data. Questions such as the statistical significance of the total number of detected targets require global methods, such as random shuffling tests (Abeles & Gerstein, 1988) and unitary event analysis (Grun, Diesmann, & Aertsen, 2002a, 2002b). Thus, the local and global tests can be considered complementary. In this letter, only local tests are considered.

8. Analysis of temporal structure of templates: If templates share a common temporal structure, this may have biological significance. Such structure can be easy to find in cases where neuronal activity is reliable and highly phasic (Dave & Margoliash, 2000), but it may be difficult to identify in cases where neuronal activity is variable and tonic. To deal with the variability in tonic spiking activity, an alignment procedure is developed from an information point of view. The idea is to minimize the uncertainty in predicting spiking events based on the empirical distributions induced by the temporal registries of the templates, hence forcing their common structure to be aligned. This approach is substantially different from the metric or cross-correlation-based methods, and it is based on considering the templates as a whole rather than pairwise comparisons (Victor & Purpura, 1996; Yu & Margoliash, 1996). In addition, the common temporal structure can provide useful cues for selection of filters.

2 Pattern Filtering

In this section, we develop pattern filtering for spiking activity. The same methodology can be established for general point processes.

2.1 Conditional Poisson Models for Spiking Activity. Since pattern filtering is based on a conditional Poisson assumption as described earlier, we need to specify the distributions of background sequences and target sequences.

For the background, the distribution is simply a Poisson process with a constant rate. To specify the distribution of target sequences, think of them as being generated by a template. Around each template spike, spikes are generated with random jitter. We assume that for all the template spikes, the spike-generating mechanism is the same, and therefore the distribution of the jitter is the same. In addition, "noise" spikes are generated outside the neighborhood of the template spikes. All these spikes considered together then consist of a spike train generated by the template.

More specifically, denote by $P(f, A)$ a Poisson process on $A$ with a rate function $f$. Let $S = \{s_1, \ldots, s_p\} \subset [0, \sigma]$ be a template and $I = [x, x + \sigma]$ be a time interval. A background sequence in $I$ is simply a sample from $P(q, I)$, with $q$ a constant. On the other hand, to generate a sequence in $I$ from $S$, first draw $Z_1, \ldots, Z_p$ independently and identically distributed (i.i.d.) $\sim P(f_1, (-\epsilon, \epsilon))$ for a given rate function $f_1$ and $\epsilon > 0$. $Z_n + s_n$ is a sequence in $J_n := (s_n - \epsilon, s_n + \epsilon)$. We take it as the set of jittered spikes generated by $s_n$. The union of all $J_n$, $J = \bigcup_{n=1}^{p} J_n$, is the neighborhood of the entire template. Draw $X \sim P(f_0, [0, \sigma] \setminus J)$, and take it as the set of noise spikes. Finally, "insert" $T_0 = X \cup \bigcup_{n=1}^{p} (Z_n + s_n)$ into the interval $I$. The resulting sequence $T = T_0 + x$ is a spike train generated by $S$ in $I$. We will refer to a collection of sets $X$, $Z_n$ as a "configuration" for $T$. The likelihood of the configurations, rather than the resulting spike trains, will be used in the classification.

The model incorporates the temporal detail of spiking activity around a template spike by $f_1$ and $\epsilon > 0$, with $\epsilon$ being the maximum jitter of a random spike generated by a template spike. In most cases, $f_1(x)$ is nonconstant, and we always assume it is nonincreasing in $|x|$ so that the probability decreases as the jitter increases. The above setup can be generalized to nonstationary noise within target spike trains by allowing $f_0$ to be nonconstant. In this article, however, we will use a constant $f_0$.
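To make the generative model concrete, the sketch below simulates one target spike train from a template under the conditional Poisson assumptions. This is an illustration only, not the authors' code: the triangular choice of $f_1$, the parameter values, and the function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_target(template, sigma, eps, f0, f1_peak):
    """Simulate one spike train generated by `template` on [0, sigma].

    Around each template spike s_n, jittered spikes are drawn from a Poisson
    process on (-eps, eps) whose rate decreases with |jitter| (a triangular
    profile is used here as one possible nonincreasing f1).  Noise spikes come
    from a homogeneous Poisson process with rate f0, kept only outside the
    union J of the (-eps, eps) neighborhoods of the template spikes.
    """
    template = np.asarray(template, dtype=float)
    spikes = []
    for s in template:
        # Expected number of jittered spikes = integral of the triangular f1 over (-eps, eps).
        for _ in range(rng.poisson(f1_peak * eps)):
            jitter = eps * (rng.uniform() - rng.uniform())   # triangular density, peaked at 0
            spikes.append(s + jitter)
    # Homogeneous noise spikes at rate f0, discarded inside the neighborhood J.
    for t in rng.uniform(0.0, sigma, rng.poisson(f0 * sigma)):
        if np.abs(template - t).min() >= eps:
            spikes.append(t)
    return np.sort(spikes)

# Hypothetical use: a 500 ms template with 20 spikes and a 4 ms jitter window.
example_template = np.sort(rng.uniform(0.0, 0.5, 20))
example_target = simulate_target(example_template, sigma=0.5, eps=0.004, f0=5.0, f1_peak=500.0)
```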

2.2 Classification by Likelihood Ratio and Pattern Filtering. Based on the above model, we now consider how to detect targets using one filter. Later we will discuss how to combine different filters. Target detection is achieved by classifying segments in a data spike sequence as either targets or background sequences. For a segment $T$, let $l_o(T)$ be the likelihood of $T$ given that it is a background sequence, and $l_a(T)$ the maximum likelihood (ML) among all configurations for $T$, given that it is a target. Then the following log-likelihood ratio is a natural criterion for the classification,

$L(T) = \log \dfrac{l_a(T)}{l_o(T)}.$

Note that $l_a(T)$ is not the likelihood of $T$ and often is smaller. As a consequence, $L$ is not the difference between two rate functions. The choice of $l_a(T)$ reduces the chance of detecting targets that are not similar to the template. Also, it can be shown that $l_a(T)$ has a simpler form than the likelihood of $T$ and hence is easier to compute.

To explicitly derive $L$ and see how the detection across a long data sequence $T$ can be accomplished by filtering, for each interval $[x, x + \sigma]$, let $T_x = T \cap [x, x + \sigma]$, where $\sigma$ is the duration of a template $S = \{s_1, \ldots, s_p\}$. It is well known for Poisson processes that $l_o(T_x) = q^n \exp\{-q\sigma\}$. On the other hand, by the independence assumption in the model, $l_a(T_x) = Q_0 \times Q$, where $Q_0$ is the likelihood of the noise spikes in $T_x$ and $Q$ the ML of the jittered spikes generated by the template spikes. With $J = \bigcup_k (s_k - \epsilon, s_k + \epsilon)$, let the background function $B$ and the time window function $K$ be

$B(x) = \begin{cases} \log(f_0(x)/q) & \text{if } x \notin J \\ 0 & \text{otherwise,} \end{cases} \qquad K(x) = \begin{cases} \log(f_1(x)/q) & \text{if } x \in (-\epsilon, \epsilon) \\ 0 & \text{otherwise,} \end{cases}$   (2.1)

respectively, and $F(x) = F(x; S)$ as a function on $[0, \sigma]$, such that (see Figure 1),

$F(x) = \begin{cases} \max_{s \in S} K(x - s) & \text{if } x \in J \\ B(x) & \text{otherwise.} \end{cases}$   (2.2)

Figure 1: Construction of a filter. $K(x)$ is a function defined on $(-\epsilon, \epsilon)$. $S$ is an exemplar spike sequence. In this case, it happens that $B(x) \equiv K(\epsilon) = K(-\epsilon)$. The filter $F$ is plotted as the dark curve.

Regard $T = \{t_n\}$ as a series of $\delta$ functions, that is, $T(t) = \sum \delta(t - t_n)$ (Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1997). Then

$L(T_x) = R(x) + C, \quad \text{for any } x,$   (2.3)

where $C$ is a constant independent of $S$ and $x$, and

$R(x) = R(x; S, T) = \int_0^{\sigma} F(\tau)\, T(x + \tau)\, d\tau.$

Therefore, as $x$ runs across $T$, up to a constant, $L(T_x)$ consists of a linear convolution of $T$. Thus, targets can be efficiently detected using filtering instead of classification segment by segment. The proof of equation 2.3 is given in the appendix.

In practice, we select $K$ and $B$ by hand. We use local peaks in the filter response $R(x)$ to detect targets. Given prespecified $\theta > 0$ and $r > 0$, find all temporal points $x_1, \ldots, x_L$ satisfying

(Local maximum) $\quad R(x_j) \geq R(x), \; x \in (x_j - r, x_j + r)$   (2.4)

(Above threshold) $\quad R(x_j) \geq \theta.$   (2.5)

The spike sequences in $[x_j, x_j + \sigma]$ are considered potential targets.

The filtering also applies to multiple point sequences, such as simultaneous activity of multiple neurons (Chi, 2003). Suppose $S = (S_1, \ldots, S_p)$ is a set of spike sequences simultaneously recorded from $p$ neurons in the time interval $[0, \sigma]$. Let $F_k(x) = F(x; S_k)$. Given a simultaneous recording $T = (T_1(x), \ldots, T_p(x))$ from the same set of neurons, segments in $T$ with spatiotemporal pattern similar to $S$ are located by equations 2.4 and 2.5, with

$R(x) = \sum_{k=1}^{p} \int_0^{\sigma} F_k(\tau)\, T_k(x + \tau)\, d\tau.$

It can be shown that both the sliding sweeps algorithm and cross-correlation are special cases of linear convolution. One can choose $K(x) = 1$ and $B(x) = -M$ for large $M > 0$ to achieve the same effect as the sliding sweeps algorithm. Cross-correlation is essentially convolution of the rectified spike counts of the data sequence with that of the template.
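As an illustration of the filtering step, the sketch below builds a discretized filter $F$ from a template using a Hamming-type window and a constant background value, computes the response $R(x)$ by correlating $F$ with a binned spike train, and reports local maxima above a threshold as candidates (equations 2.4 and 2.5). This is a schematic under assumed discretization and parameter choices, not the authors' implementation; the function and parameter names are ours.

```python
import numpy as np

def build_filter(template, sigma, eps, beta, dt):
    """Discretize F(x; S) of equation 2.2: a Hamming-type window K around each
    template spike (as in equation 3.2) and the constant background B(x) = -beta."""
    grid = (np.arange(int(round(sigma / dt))) + 0.5) * dt
    F = np.full(grid.size, -beta)
    for s in template:
        near = np.abs(grid - s) < eps
        k = 0.5 * (1 - beta) + 0.5 * (1 + beta) * np.cos(np.pi * (grid[near] - s) / eps)
        F[near] = np.maximum(F[near], k)      # max over template spikes, as in eq. 2.2
    return F

def filter_response(spike_times, F, duration, dt):
    """R(x): correlate the filter with the binned spike train, i.e., sum F over
    each spike's offset from the candidate onset x."""
    train = np.zeros(max(int(round(duration / dt)), 1))
    if len(spike_times):
        idx = np.clip((np.asarray(spike_times) / dt).astype(int), 0, train.size - 1)
        np.add.at(train, idx, 1.0)
    return np.correlate(train, F, mode="valid")

def detect_peaks(R, theta, r_bins):
    """Indices that are local maxima within +/- r_bins and exceed theta (eqs. 2.4, 2.5)."""
    hits = []
    for j in range(len(R)):
        window = R[max(0, j - r_bins): j + r_bins + 1]
        if R[j] >= theta and R[j] >= window.max():
            hits.append(j)
    return hits
```

In this discretized form the threshold comparison is done on the sampled response; a finer bin width `dt` trades computation for temporal precision, which is exactly the trade-off exploited by the coarse-to-fine scheme of section 3.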

3 Multiple Pattern Filtering Procedures

As we argue in section 1, to improve the detection of spiking activity, multiple pattern filters, templates, and temporal scales can be incorporated. Figure 2 illustrates how the detection process proceeds. For each template $S_n$, every segment of the data sequence $T$ is matched with it using several types of filters, one after another. If the response of the segment to one of the filters is subthreshold, it is classified as a background sequence and will not be treated by subsequent filters. To account for time scaling in the activity, the segment is matched with the template at different time scales. The outputs associated with the template, therefore, are segments that match the template at some time scale. Among the segments, only those with small p-values under a test $P$ at the end of the channel are output as targets. Finally, because a segment may be detected multiple times, by filters associated with different templates, the target outputs from the channels are combined, so that only one is chosen from overlapping targets.

Throughout the analysis, detection, training, and significance testing follow the same protocols. For all the templates, the associated filters are constructed with the same time windows and background functions and are scaled by the same set of factors. Likewise, training for different templates results in different parameter values. The variability of the templates thus is incorporated throughout the detection process. This makes the process more robust to variability, a desirable property for pattern detection.


Figure 2: Detection by multiple templates and filters, at multiple scales. $T$ is a data spike sequence. $S_k$ are templates, for all of which the same set of window and background functions are used to construct pattern filters. $F_{km}$ are filters constructed from $S_k$, using the $m$th window function and background function. $\theta_{km}$ are associated thresholds. Only segments in $T$ with above-threshold response to $F_{km}$ are considered potential targets. $c_s$ are scaling factors applied to all $S_k$. $P_k$ are significance tests on targets detected by filters for $S_k$. Only targets with high significance are kept. To avoid overcounting, $D$ is a procedure to choose only one target from each set of overlapping targets detected across the channels.

3.1 Detection at Multiple Levels

3.1.1 Multiple Filter Types. A pattern filter enforces some constraints on the statistical assumptions regarding spiking activity. As long as the filter is constructed based on empirical evidence, the constraints are biologically plausible. Multiple filters induce more constraints, which may lead to a more realistic probability model of spiking activity and, hence, better detection. The resulting model is only implicitly specified; however, it allows for efficient computation for detection.

Given a template, using different time window and background functions, multiple filters can be constructed from the template. The square window function,

$K(x) = 1_{(-\epsilon, \epsilon)}(x) = \begin{cases} 1 & \text{if } x \in (-\epsilon, \epsilon) \\ 0 & \text{otherwise,} \end{cases}$   (3.1)

and $B(x) \equiv 0$ are among the simplest choices. One may also apply other time window and background functions for a filter, such as

$K(x) = \begin{cases} \frac{1}{2}(1 - \beta) + \frac{1}{2}(1 + \beta) \cos \frac{\pi x}{\epsilon} & \text{if } x \in (-\epsilon, \epsilon) \\ -\beta & \text{otherwise,} \end{cases} \qquad B(x) \equiv -\beta, \; \beta \geq 0.$   (3.2)

$K$ essentially is a Hamming window. Filters made from equation 3.2 are continuous and nonconstant around template spikes. Another reasonable choice includes "cropped" versions of $K$,

$\bar{K}(x) = \min(\alpha, K(x)), \quad \alpha \in (0, 1),$   (3.3)

which can be thought of as mixtures of square windows and Hamming windows.
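Written out directly, the three window types are simple functions of the jitter $x$; the snippet below is an illustrative transcription of equations 3.1 through 3.3 (the function and parameter names are ours).

```python
import numpy as np

def square_window(x, eps):
    """Equation 3.1: K(x) = 1 on (-eps, eps), 0 elsewhere."""
    return np.where(np.abs(x) < eps, 1.0, 0.0)

def hamming_window(x, eps, beta):
    """Equation 3.2: a raised-cosine (Hamming-type) window, equal to -beta outside (-eps, eps)."""
    inside = 0.5 * (1 - beta) + 0.5 * (1 + beta) * np.cos(np.pi * x / eps)
    return np.where(np.abs(x) < eps, inside, -beta)

def cropped_window(x, eps, beta, alpha):
    """Equation 3.3: the Hamming-type window clipped at alpha in (0, 1)."""
    return np.minimum(alpha, hamming_window(x, eps, beta))
```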

Computational efficiency is maintained if the filters are applied in a coarse-to-fine manner (see Figure 2). To detect targets, the first filter is applied to the entire data spike sequence to detect segments with bare similarity to the template. In actual computation, this step is fast, requiring convolving the filter and the data, both discretized only at a coarse resolution. Subsequent filters are used only for the detected segments. In these steps, the filter and the segments are discretized at finer resolution. Any segment that yields a below-threshold response is classified as background and is excluded. Since the total duration of the segments is much shorter than the entire sequence, the subsequent steps of filtering are also fast.

3.1.2 Multiple Scales. Spiking activity often exhibits across-trial nonstationarity, which is different from across-time nonstationarity (Chi & Margoliash, 2001; Grun et al., 2002a). A remarkable type of across-trial nonstationarity is temporal scaling, which is observed across different states of the brain (Nadasdy et al., 1999; Dave & Margoliash, 2000; Louie & Wilson, 2001). To take this into account in the detection process, we use filters scaled in time at several levels.

To detect targets in $T$ at a time scale $c$, for each filter $F$, use $R_c(x) = \int_0^{\sigma} F(c\tau; S)\, T(x + \tau)\, d\tau$ in place of equation 2.3. To account for the scaling, the radius $r$ in equation 2.4 is changed to $cr$, while the threshold $\theta$ in equation 2.5 remains the same.

The coarse-to-fine multiscale procedure is summarized as follows. Given a template of duration $\sigma$, suppose $M$ filters are applied, with corresponding thresholds $\theta_1, \ldots, \theta_M$, and the detection is conducted at $D$ time scales. Then, for each scale $c_n$,

    Filter $T$ by $F_1(c_n t)$ to get $R_1$
    $L_n \leftarrow \{x : R_1(x) \geq \theta_1$, and $R_1(x) \geq R_1(t)$ for any $t \in (x - c_n r, x + c_n r)\}$
    for $m = 2, \ldots, M$
        for $x \in L_n$
            Filter $T \cap [x - c_n r, x + c_n \sigma + c_n r]$ by $F_m(c_n t)$ to get $R_m$
            Remove $x$ from $L_n$ if $R_m(t) < \theta_m$ for all $t$
    Output $L_n$   (3.4)
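A schematic rendering of procedure 3.4 in code is given below. It is a sketch under our own naming: `make_filter(m, c, dt)` is an assumed helper returning the $m$th discretized filter time-scaled by $c$, and `filter_response` and `detect_peaks` are the helpers from the section 2.2 sketch.

```python
def coarse_to_fine_detect(spike_times, make_filter, thresholds, scales,
                          sigma, r, duration, dt_coarse, dt_fine):
    """Sketch of procedure 3.4: a coarse scan with the first filter, then
    confirmation of survivors with the remaining filters at fine resolution,
    repeated for every time scale c_n."""
    detections = {}
    for c in scales:
        # Coarse pass: the first filter scans the entire data sequence.
        R1 = filter_response(spike_times, make_filter(0, c, dt_coarse), duration, dt_coarse)
        candidates = [j * dt_coarse
                      for j in detect_peaks(R1, thresholds[0], int(round(c * r / dt_coarse)))]
        # Fine passes: remaining filters are applied only around surviving candidates.
        for m in range(1, len(thresholds)):
            F_m = make_filter(m, c, dt_fine)
            survivors = []
            for x in candidates:
                lo, hi = x - c * r, x + c * (sigma + r)
                segment = [t - lo for t in spike_times if lo <= t <= hi]
                R_m = filter_response(segment, F_m, hi - lo, dt_fine)
                if R_m.size and R_m.max() >= thresholds[m]:
                    survivors.append(x)
            candidates = survivors
        detections[c] = candidates        # the list L_n for scale c_n
    return detections
```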

3.1.3 Multiple Templates. The conditional Poisson model describes the variability of spike sequences resulting from small random perturbations of a template. On a more global scale, there can be variability across the templates that is not accounted for by the Poisson model. When this happens, multiple templates should be incorporated.

Suppose $M$ pairs of time window and background functions $(K_m, B_m)$ are given. Then for template $S_n$, $n = 1, \ldots, N$, we repeat the detection, equation 3.4, using filters $F_m^{(n)}$ constructed from $S_n$ and $(K_m, B_m)$. The target sets associated with different templates are combined (see section 3.4). For each template, the thresholds are set relatively high, so only a small number of segments can be detected. Thus, the detection for each template is quite specific. However, with many templates being incorporated, the overall detection covers a reasonably wide range of target activity.

Detection is faster using mean filters $\tilde{F}_m = \frac{1}{N} \sum_{n=1}^{N} F_m^{(n)}$ for each pair $(K_m, B_m)$. However, since the average filters do not represent the variability of the templates, this approach has the same drawback as the detection using only a single template. Thus, we do not advocate using mean filters.

3.2 Training. The goal of training is to select thresholds for filter responses in order to calibrate the detection. For each filter, the threshold depends on the responses of two training sets: one consisting of background activity and the other target activity. In principle, both should be sampled under similar conditions (i.e., behavioral states); for example, if the filter is meant to detect replayed patterns in spontaneous activity during sleep, the training data should consist of such activity. The problem is that unlike in supervised learning, often there is no reliable way to tag the sample activity as background or targets. As an alternative, simulated sequences are substituted as training data.

To simulate the responses of background activity to a filter $F$, sequences sampled from a homogeneous Poisson process of the same duration as the corresponding template are convolved with $F$. The rate of the process is the average firing rate of a long neuronal trace recorded from the same state as the data spike sequence, which is reasonable when targets are rare events. For each sample, register the peak value of its convolution with $F$. Then let $\alpha(F)$ = the 99th percentile of the peak values.

To simulate the responses of target activity to $F$, randomly modified templates are used as simulated targets. Given a template $S$, a simulated target is generated in four steps:

1. Random deletion of each spike in $S$ with a small probability.

2. Random shift of each remaining spike in $S$ by distance $x \sim N(0, \delta)$.

3. Add a sample from a Poisson process with low density to $S$ as noise spikes.

4. Random scaling of $S$ by $e^{-\xi}$, with $\xi \sim \mathrm{Uniform}(-\epsilon, \epsilon)$.

Given a percentile $\pi \geq 50$ depending only on the time window and background functions used by $F$, let $S_F$ be the set of the peak responses to $F$ of the simulated targets. Let $\beta(F)$ = the $\pi$th percentile of $S_F$. Then the threshold for $F$ is chosen to be

$\theta = \theta(F) = \max\{\alpha(F), \beta(F)\}.$   (3.5)

This threshold is high enough to exclude most of the background activity and to ensure that the detected targets are similar to the template.
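The threshold rule of equation 3.5 can be sketched as follows. This is illustrative only: the simulation parameters (deletion probability, jitter standard deviation, noise rate, scaling range) are hypothetical placeholders, and `filter_response` is the helper from the section 2.2 sketch.

```python
import numpy as np

def train_threshold(F, template, sigma, background_rate, dt,
                    n_sim=1000, pi=50, rng=np.random.default_rng(1)):
    """Threshold rule of equation 3.5: theta(F) = max(alpha(F), beta(F)).

    alpha(F): 99th percentile of peak responses of homogeneous Poisson surrogates
    of the template's duration, at the measured background rate.
    beta(F):  pi-th percentile of peak responses of perturbed copies of the
    template (random deletion, Gaussian jitter, added noise, random time scaling).
    """
    bg_peaks, tg_peaks = [], []
    template = np.asarray(template, dtype=float)
    for _ in range(n_sim):
        # Background surrogate sequence (used for alpha(F)).
        bg = np.sort(rng.uniform(0.0, sigma, rng.poisson(background_rate * sigma)))
        bg_peaks.append(filter_response(bg, F, sigma, dt).max())
        # Simulated target: steps 1-4 of section 3.2 (all numeric values hypothetical).
        keep = template[rng.uniform(size=template.size) > 0.05]     # 1. random deletion
        jittered = keep + rng.normal(0.0, 0.002, size=keep.size)    # 2. Gaussian shifts
        noise = rng.uniform(0.0, sigma, rng.poisson(2.0 * sigma))   # 3. low-density noise
        scale = np.exp(-rng.uniform(-0.1, 0.1))                     # 4. random time scaling
        tg = np.sort(np.concatenate([jittered, noise])) * scale
        span = max(sigma, tg.max() + dt) if tg.size else sigma
        tg_peaks.append(filter_response(tg, F, span, dt).max())
    alpha = np.percentile(bg_peaks, 99)
    beta = np.percentile(tg_peaks, pi)
    return max(alpha, beta)
```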

3.3 Significance Test. It is often the case that the average firing rate around a target is significantly different from the average firing rate across the data sequence. This raises the possibility that some targets might be artifacts generated by short episodes, which have constant but significantly different firing rates from their surroundings. This possibility cannot be ruled out by pattern filtering. To control for it, some statistical test of the significance of the targets is needed.

Suppose the sequence in $[t_0, t_1]$ is detected. To test whether the target is just a segment of constant rate activity, the following procedure is conducted:

1. Calculate the average firing rate $r$ within an interval $J$ containing $[t_0, t_1]$.

2. Sample $N$ sequences from a Poisson process with density $r$ on $J$, with $N \gg 1$.

3. For each sampled sequence, repeat the same detection procedure across all timescales.

4. Count the number $n$ of sequences in which targets are detected at any of the scales.

5. Output $p = n/N$ as the p-value under the null hypothesis.

To account for the physiological limit of a neuron, in actual implementation, before step 3, spikes with temporal distance from preceding spikes less than a certain value are deleted from the sampled sequences. The modified sequences are not random samples from a Poisson process anymore, but still are derived from a process with a constant firing rate.
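A minimal sketch of the local p-test follows (our naming throughout): `run_detection` stands for the full multiscale detection applied to one surrogate sequence, and the refractory-period value is a hypothetical placeholder.

```python
import numpy as np

def local_p_value(data_spikes, J_lo, J_hi, run_detection,
                  n_surrogates=1000, refractory=0.001, rng=np.random.default_rng(2)):
    """Local p-test of section 3.3 for a target detected inside [J_lo, J_hi].

    Surrogates are homogeneous Poisson sequences on the surrounding interval J at
    the locally estimated rate, with spikes closer than `refractory` to the
    preceding spike removed.  `run_detection(spikes)` is assumed to rerun the full
    multiscale detection and return True if any target is found in the surrogate.
    """
    local = [t for t in data_spikes if J_lo <= t <= J_hi]
    rate = len(local) / (J_hi - J_lo)                  # step 1: local average firing rate
    hits = 0
    for _ in range(n_surrogates):                      # step 2: N >> 1 surrogate sequences
        s = np.sort(rng.uniform(J_lo, J_hi, rng.poisson(rate * (J_hi - J_lo))))
        kept = []
        for t in s:                                    # enforce an absolute refractory period
            if not kept or t - kept[-1] >= refractory:
                kept.append(t)
        if run_detection(kept):                        # steps 3-4: detection on the surrogate
            hits += 1
    return hits / n_surrogates                         # step 5: p = n / N
```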

3.4 Ambiguity Solving. If the templates are exactly the same, then at a given temporal scale, the detected targets should be the same across the templates. Due to the variability of the templates, targets detected with different templates often overlap in time but are not exactly the same. Thus, a short time interval can be associated with multiple targets, causing ambiguity of detection. Detection at similar temporal scales may also lead to ambiguity.

To avoid overcounting, it is sensible to choose only one from overlapping targets. A reasonable criterion is the p-value of a target obtained from the test in section 3.3, so that among overlapping targets, only those with the smallest p-value are kept. When two or more targets remain, one can adopt some ad hoc criteria to choose between them.
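One simple way to implement this disambiguation (a sketch with our own conventions; targets are assumed to be (start, end, p_value) tuples) is to greedily keep the smallest-p target in each group of mutually overlapping detections.

```python
def disambiguate(targets):
    """Keep one target per set of overlapping detections, preferring the smallest p-value.

    `targets` is a list of (start, end, p_value) tuples gathered across all
    templates and scales.  Ties are broken by earlier onset (an ad hoc choice).
    """
    kept = []
    for t in sorted(targets, key=lambda x: (x[2], x[0])):     # best p-value first
        overlaps = any(t[0] < k[1] and k[0] < t[1] for k in kept)
        if not overlaps:
            kept.append(t)
    return sorted(kept, key=lambda x: x[0])
```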

4 Information-Based Alignment of Samples

Given a sample of spike sequences, usually associated with the same stimulus or behavior, temporal alignment refers to adjustment of the temporal registry of spikes under certain constraints, so that a characteristic common temporal pattern of the spike sequences can be revealed, which otherwise is difficult to observe (see Figure 6).

There are several alignment methods. One is based on dynamic programming, which combines aspects of rate coding and temporal coding (Victor & Purpura, 1996). In this method, there are no "hard" constraints on the alignment. Instead, any adjustment of the temporal registry is allowed but with a certain cost. The optimal adjustment is the one that minimizes the total cost. Another method cross-correlates the spike sequences or the associated behaviors (such as spectrograms of vocalizations) and registers the spikes when the cross-correlation reaches maximum (Yu & Margoliash, 1996). In this method, the ISIs of the spike sequences are kept constant, and the only way to adjust the temporal registry of the spikes is rigid shifting of the entire sequences. Furthermore, both methods make pair-wise comparisons of templates. This leads to a series of local optima, which does not guarantee achieving a global optimum. A third method aligns templates by minimizing a global measure of distance. The method works well when the spiking activity is highly phasic (Chi & Margoliash, 2001). However, it is not as well suited for tonic activity.

The above alignment methods are based on the idea of reducing certain measures of distance among spike sequences. The alignment introduced here is from a different perspective. To illustrate this, we consider the following problem. Suppose we have the temporal registries of spike sequences $S_1, \ldots, S_N$, and we find that at each time $t$, a certain event is observed in some of the spike sequences but not in the others. Assume these spike sequences are collected under the same conditions. One may ask, If we are to collect another spike sequence $S$ under the same conditions, then, based on $S_1, \ldots, S_N$, how well can we predict that the same event will occur in $S$ at $t$?

Intuitively, at any time where common temporal structure among $S_n$ occurs, which can be excitation, inhibition, or something else, an event will have a distribution concentrated around 1 or 0. Based on the distribution, the uncertainty in predicting the event at that time is low. Consequently, the overall uncertainty in predicting events across time is a good indication of how well the common temporal structure of $S_n$ is expressed across time, which clearly depends on the temporal registries of $S_n$. The lower the uncertainty is, the more explicitly the common structure is expressed. In information theory, uncertainty is measured by entropy. The alignment can thus be achieved by reducing the total entropy of the events across time.

In actual computation, to deal with the problem of sparse data, it is important to choose events appropriately. For each time $t$, we define

$A_t = \{\text{At time } t, \text{ spiking occurs in } [t - \tfrac{\Delta}{2}, t + \tfrac{\Delta}{2})\}, \qquad X_t = X_t(S) = \begin{cases} 1 & \text{if } A_t \text{ happens to } S \\ 0 & \text{otherwise.} \end{cases}$   (4.1)

Note that $A_t$ does not involve the exact number of spikes. The advantage of using equation 4.1 is that $X_t$ is a Bernoulli random variable, so the estimation of $P(X_t = 1)$ does not require a large number of templates. The estimate of $P(X_t = 1)$ is

$\hat{P}_t = \hat{P}_t(S_1, \ldots, S_N) = \dfrac{\text{Number of } S_i \text{ having spikes in } [t - \frac{\Delta}{2}, t + \frac{\Delta}{2}]}{N} = \dfrac{1}{N} \sum_{n=1}^{N} X_t(S_n),$   (4.2)

with estimated std $\sqrt{\hat{P}_t(1 - \hat{P}_t)/(N - 1)}$. The entropy of the Bernoulli random variable is equal to

$\hat{h}_t = -\hat{P}_t \ln \hat{P}_t - (1 - \hat{P}_t) \ln(1 - \hat{P}_t),$   (4.3)

with $0 \ln 0$ defined to be 0. Define

$H(S_1, \ldots, S_N) = \int \hat{h}_t\, dt,$   (4.4)

which is an upper bound of the joint entropy of the process $X_t$, $t \in \mathbb{R}$ (theorem 2.6.6, Cover & Thomas, 1991). The information-based alignment of $S_1, \ldots, S_N$ is to find $t_1, \ldots, t_N$, such that

$H(S_1 + t_1, \ldots, S_N + t_N) = \min\{H(S_1 + \tau_1, \ldots, S_N + \tau_N) : \tau_1, \ldots, \tau_N \in (-\infty, \infty)\}.$   (4.5)

For actual data, $H(S_1 + \tau_1, \ldots, S_N + \tau_N)$ is a complex function in $\{\tau_n\}$. It can be minimized by stochastic annealing (Geman & Geman, 1984) rather slowly. Alternatively, we approximate its minimum by a randomized greedy procedure. For each cycle, in a random order, we shift $S_n$, one at a time, to minimize $g_n(t) = H(S_1, \ldots, S_n + t, \ldots, S_N)$ and update $S_n$ to $S_n + t$ if $t$ minimizes $g_n$. We then carry out multiple such cycles of minimization, choosing a different random order at each cycle, until $H(S_1, \ldots, S_N)$ cannot be reduced by shifting individual $S_n$.

Updating $S_n$ in a random order reduces the possibility of settling into a local minimum. Experience indicates that many cycles are not required. Usually 10 to 15 cycles suffice. Although it is possible to reduce the entropy further by annealing, as a practical matter for the data we analyzed, this did not seem to appreciably improve the alignment and was not implemented.
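The following sketch implements the entropy criterion of equations 4.1 through 4.4 and the randomized greedy minimization described above. It is our own construction: the time grid, the finite set of candidate shifts, and the numeric values in the example are simplifications of the continuous formulation and are hypothetical.

```python
import numpy as np

def total_entropy(seqs, delta, t_min, t_max, step):
    """H(S_1, ..., S_N): integral over t of the Bernoulli entropy of the event
    'some spike falls in [t - delta/2, t + delta/2)' (equations 4.2-4.4)."""
    grid = np.arange(t_min, t_max, step)
    H = 0.0
    for t in grid:
        p = np.mean([np.any((s >= t - delta / 2) & (s < t + delta / 2)) for s in seqs])
        if 0.0 < p < 1.0:
            H += -(p * np.log(p) + (1 - p) * np.log(1 - p)) * step
    return H

def align(seqs, delta, shifts, t_min, t_max, step, n_cycles=15,
          rng=np.random.default_rng(3)):
    """Randomized greedy minimization of H by rigid shifts of individual sequences."""
    seqs = [np.asarray(s, dtype=float) for s in seqs]
    offsets = np.zeros(len(seqs))
    for _ in range(n_cycles):
        for n in rng.permutation(len(seqs)):             # random order each cycle
            best = min(shifts, key=lambda d: total_entropy(
                seqs[:n] + [seqs[n] + d] + seqs[n + 1:], delta, t_min, t_max, step))
            seqs[n] = seqs[n] + best                      # accept the best rigid shift
            offsets[n] += best
    return seqs, offsets

# Hypothetical usage: delta = 10 ms, candidate shifts from -20 ms to 20 ms in 2 ms steps.
```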

We now remark on some features of the information-based alignment. The most significant difference between the information-based alignment and the previous pairwise alignment procedures is that information-based alignment is based on the templates as a whole, which naturally allows an information-theoretic interpretation. In contrast, pairwise alignment puts emphasis on the common structure between pairs of sequences, which may not be the common structure of all the sequences. This leads to a series of local minima. One can imagine that the firing-rate function induced from aligned templates should be "clumpy" instead of flat. Entropy is an objective measure of the clumpiness.

Second, the definition of the event $A_t$ involves a parameter $\Delta$. If $\Delta$ is too large, then by equation 4.2, $\hat{P}_t \approx 1$ for most $t$ over the time interval of $S_1, \ldots, S_N$, and hence $H(S_1, \ldots, S_N) \approx 0$. On the other hand, if $\Delta$ is too small, because of the variability in spike timing, no matter how the temporal registries of $S_1, \ldots, S_N$ are adjusted, for most $t$, the number of $S_i$ with spikes in $[t - \frac{\Delta}{2}, t + \frac{\Delta}{2}]$ is at most 1. It is then not hard to see that $H(S_1, \ldots, S_N) \approx \frac{C}{N} \log N$, where $C$ is the total number of spikes in $S_1, \ldots, S_N$. In either case, the associated entropy is difficult to reduce, and hence $\{S_n\}$ will not be aligned. For HVc activity, we got satisfactory results when $\Delta^{-1}$ was on the same order as the maximum firing rate. Presumably, the optimal $\Delta$ depends on the intrinsic temporal precision of spiking activity (Victor & Purpura, 1996). This is subject to further study.

Third, in the optimization, equation 4.5, only rigid shifts are allowed for $S_n$. The reason is twofold. Since the internal structure of $S_n$ is maintained, the alignment can reveal systematic change in spiking activity that otherwise would not be revealed by shifting individual parts (Chi & Margoliash, 2001). Presumably, by taking into account time scaling, alignment of temporally morphed templates may yield better results. However, if all the templates are collected under the same conditions, there is no compelling argument why morphing of the templates is more reasonable than not doing so. Nevertheless, depending on the problem at hand, even with rigid shifts, individual parts of $S_n$ can be shifted relative to the others (Yu & Margoliash, 1996).

5 Experiments

We tested pattern filtering on neuronal data collected from the HVc of the zebra finch (Taeniopygia guttata). The song system is a standard object in the study of neural mechanisms of vocal learning, production, and maintenance (Brenowitz, Margoliash, & Nordeen, 1997). HVc is a nucleus in the forebrain of the birdsong system. It plays an important role in vocal learning, auditory input integration, and higher-level motor commands for vocalization. HVc directly projects to another forebrain nucleus, robustus archistriatalis (RA). Recently, it was found that in sleeping zebra finches, the spontaneous spiking activity of RA neurons occasionally exhibited spike bursting patterns similar to the premotor activity patterns that the same neurons exhibit during singing and in response to auditory playback of the bird's own song (BOS) during sleep (Dave, Yu, & Margoliash, 1998; Dave & Margoliash, 2000). This "replay phenomenon" during sleep is hypothesized to play an important role in learning and memory consolidation of the birdsong system (Margoliash, 2001). Since RA receives inputs from HVc, it is natural to consider whether the replay phenomenon also occurs in the spontaneous activity of HVc and is potentially the source of the replay in the spontaneous activity of RA. Recent recordings in sleep-induced birds from RA-projecting HVc neurons support this interpretation (Hahnloser, Kozhevnikov, & Fee, 2002). Furthermore, the activity in HVc interneurons, which are the presumed class of neurons analyzed in this article, has far more variability than the activity in RA (Mooney, 2000; Shea, Rauske, & Margoliash, 2001), and therefore the application of pattern filtering to HVc data is both biologically interesting and a useful test of the effectiveness of the techniques developed in previous sections.

The steps of the detection are (1) template selection, (2) training of the detector as described in section 3.2, and (3) the multiple detection procedures in section 3.1.

During sleep, HVc auditory responses to BOS often exhibit considerable cross-trial variability. Some unobserved variable, such as phase of intrinsic bursting within the song system or different stages of sleep, presumably is related to the variability of auditory responses. From the raster plot in Figure 3, it can be seen that in many trials, the auditory responses were quite weak and less structured. To achieve effective detection, sample sequences associated with robust responses and clearer structure were chosen. First, sequences without enough spikes (< 20) were removed from the sample. Templates were then chosen from the remaining sequences. In general, if a sequence did not match well with the other sequences (for example, on average there were too many nonmatching spikes), then it was not chosen as a template.

5.1 Results. We analyzed the sleep activity of seven single units in HVc of four zebra finches (two units per bird for three birds and one unit for one bird; see Table 1). For each unit, the data consisted of activity that was collected in a large number of trials while the animal slept. At the beginning of each trial, a recording of BOS was broadcast to the animal, and neuronal responses to the stimulus were collected. The spontaneous activity of the neuron was collected in the intervals between presentations of BOS (approximately once every 20 ± 6.5 s). The purpose of the experiments was to investigate if auditory responses were replayed during spontaneous activity of sleep in HVc.

Figure 3: Auditory responses of HVc neuron unit 1 to multiple repetitions of the same motif in BOS. (Top) Raster plot of the spike sequences of auditory responses, starting at the onset of the motif. The spike sequences are aligned at the onset. The duration of each spike sequence is about 500 ms. (Bottom) The firing-rate function of the spike sequences computed using a disjoint time window of size 5 ms (Dayan & Abbott, 2001).

Table 1: Results of Detection.

Bird        Unit   T (ms)     N     n
hv31 1005    1     504 ± 5     5    0.68
             2     504 ± 5    30    4.08
hv33 1106    3     620 ± 0    31    1.27
             4     570 ± 0    21    0.86
hv39 1014    5     659 ± 4    21    0.46
             6     661 ± 4    50    3
hv40 0626    7     665 ± 3     3    0.82

Notes: T: Duration of template. N: Number of detected spike sequences. n: Average number of detected sequences per minute.

For each bird, the recording of BOS had one or two renditions of a motif (sequence of repeated syllables). Templates were selected from the auditory responses of the HVc units to the renditions. Template duration T is defined as the duration of the corresponding stimulus. For each unit, we chose one time interval during which its responses stayed relatively strong across the trials and selected templates from those responses. Thus, mean(T) can differ for units of the same bird.

Table 1 lists the total numbers and temporal densities of detected spike sequences from spontaneous activity that exhibited similar temporal patterns as the templates. Across the units, filters constructed with the following three types of window functions and background functions were applied sequentially: square window (see equation 3.1), Hamming-type window (see equation 3.2), and cropped versions of Hamming windows (see equation 3.3). The same set of parameter values for pattern filters was used for the results reported in Table 1. It is noteworthy that for each of the units analyzed, some spontaneous patterns were found to match templates drawn from auditory responses to BOS.

Next, we show the results for two units in more detail, each involving a single HVc neuron of a different zebra finch.

5.1.1 Unit 1. The BOS consisted of two renditions of a motif and was broadcast in 63 trials, leading to 126 spike sequences of auditory responses to the playbacks of the motif. Figure 3 displays all the spike sequences, which are aligned by the onset of the motif.

Of the 126 spike sequences, 48 were chosen as templates. Figure 4A shows the raster plot and estimated firing-rate functions for these trials. The templates were then aligned by minimizing entropy. The raster plot and the firing-rate function of the aligned templates are shown in Figure 4B. The raster plot of the aligned templates clearly reveals more temporal structure than for the onset-registered templates. The standard error of the shifts of the aligned templates was about 5 ms, with the difference between the maximum shift and the minimum shift equal to 30 ms. The average shift is not important, because only the relative shifts of the templates matter for the alignment.

To reduce the amount of computation, 30 of the 48 spike sequences were further selected. The constructed pattern filters were scaled at five discrete levels, c = 1, 0.8, 0.9, 1.1, and 1.25, and convolved with the data spike sequences, following the procedures described in section 3.1.

For each aligned template, detection was conducted for the recorded spontaneous activity across all 63 trials, each lasting 7 seconds. A total of 22 spike sequences, which yielded above-threshold responses to all the filters, were found. Among them, 8 were highly significant ($p \leq 0.01$) by the test described in section 3.3. After disambiguation, only 5 remained and were output as targets. In Figure 5, all of the targets are displayed along with the templates used to detect them. In order to be compared with the templates, the targets were scaled in time at the levels at which they were detected. The variability among the templates is obvious, which argues for detection based on individual templates instead of some form of average of the templates.

Figure 4: Raster plot and empirical firing-rate function of selected sample sequences from unit 1, before alignment (A) and after alignment (B). The firing rates were estimated by the same method as in Figure 3. Note the different scales of the firing-rate functions in A, B, and Figure 3.

5.1.2 Unit 3. Unlike the previous case, the BOS for this bird contained only one rendition of a motif lasting about 620 ms and was broadcast to the sleeping bird in 126 trials. Figure 6A is a raster plot of the 126 spike sequences of auditory responses to the motif, aligned by the onset of the playback of the motif. There is an obvious change in the latency of the responses. Of the 126 spike sequences, 76 were selected as templates. On average, there were 29.4 spikes in each template. Figures 6B and 6C are the raster plots of the selected sequences before and after they were aligned by minimizing entropy. As for the previous neuron, the information-based alignment reveals some noticeable common patterns of the responses.

The data of spontaneous activity during sleep were about 25 minutes in total duration. Using 30 of the 76 selected sequences as templates, 31 nonoverlapping targets with p-value $\leq 0.01$ were detected. The relatively large number of detected targets allowed us to have a more global view of the temporal pattern of the spontaneous activity, as compared to the auditory responses. First, for each target detected at scale c, its scaled version by factor 1/c was generated. Then the scaled targets were combined into a raster plot, with those detected in the earlier part of the spontaneous activity plotted on top of those detected in the later part. The raster plot is shown in Figure 6D, which clearly demonstrates the similarity between the responses to BOS and the spontaneous spike sequences.

Figure 5: Templates and targets (unit 1). In each group, the first trace is a template; the others are targets detected with filters built from the template. The targets are scaled in time in order to be compared with the templates (in A, by 1/0.9, 1/1.25, and 1/1.25, respectively; in B, by 1/1.1; in C, by 1/1.25). The line at the bottom represents an interval of 500 ms. No spikes occurred in the blank regions in the traces.

Figure 6: (A) Raster plot of 126 sample spike sequences of HVc neuron unit 3 in response to playbacks of the BOS. All the sequences are about 620 ms long and aligned at the onset of the BOS. (B) Raster plot of 76 selected sequences used as templates. The order of the sequences is the same as in the original sample. (C) Raster plot of the same 76 spike sequences, after they are aligned by entropy minimization. (D) Raster plot of 31 spike sequences in spontaneous activity of the same neuron during sleep. Among them, 1 is scaled by 1/0.9, 7 by 1/1.1, and 13 by 1/1.25. The others are unscaled.

5.2 Sensitivity to Parameters. To test how different values of the parameters affect the detection, we compared the results of the detection by changing the value of one of the parameters while fixing the others. This required the entire detection procedure, including training and the significance test, to be rerun.

The main parameters are those for the time window and background functions, specifically,

• The window sizes $\epsilon_1$ for the square window (see equation 3.1), $\epsilon_2$ for the Hamming window (see equation 3.2), and $\epsilon_3$ for the cropped Hamming window (see equation 3.3)

• The background values $\beta_1$, $\beta_2$, $\beta_3$, respectively, associated with the above window functions to construct the filters.

Because of the similarity between the Hamming window function and its cropped versions, we always set $\epsilon_2 = \epsilon_3$ and $\beta_2 = \beta_3$. In both cases reported in the previous two sections, the values of the parameters were $\epsilon_1 = 4$ ms, $\epsilon_2 = \epsilon_3 = 5$ ms, $\beta_1 = -0.3$, and $\beta_2 = \beta_3 = -0.4$.

Suppose targets in time intervals $J_k$, $k = 1, \ldots, K$ were detected with the above parameter values. To analyze the targets detected when one of the parameter values was changed, they were classified into three categories. A target detected in interval $I$ is classified as new if $I \cap J_k = \emptyset$ for all $k$. It is classified as already detected if it has a significantly large overlap with some $J_k$; specifically, the overlap between $I$ and $J_k$ covers more than $c = 4/5$ of $J_k$. Otherwise, it is classified as overlapping. In general, if the number of already detected targets is large, then it indicates that different parameters lead to consistent detection. The larger $c$ is, the harder it is to classify a target as already detected. We also tested $c = 9/10$, and the results were very similar.
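The classification into new, already detected, and overlapping targets follows directly from the overlap criterion; the sketch below uses our own names, with $c = 4/5$ as in the text.

```python
def classify_target(interval, reference_intervals, c=4/5):
    """Classify a detected interval against the reference detections J_k (section 5.2).

    Returns 'new' if it overlaps no J_k, 'already detected' if its overlap with
    some J_k exceeds a fraction c of that J_k, and 'overlapping' otherwise.
    """
    lo, hi = interval
    best_fraction = 0.0
    for j_lo, j_hi in reference_intervals:
        overlap = max(0.0, min(hi, j_hi) - max(lo, j_lo))
        if j_hi > j_lo:
            best_fraction = max(best_fraction, overlap / (j_hi - j_lo))
    if best_fraction == 0.0:
        return "new"
    return "already detected" if best_fraction > c else "overlapping"
```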

For the first HVc neuron analyzed here (unit 1), we first doubled $\epsilon_1$ to 8 ms while keeping the other parameters unchanged. Following the same procedures for training, detection, and significance testing, seven targets were detected. Among them, two had been previously detected, while the other five were new, shown in Figures 7A through 7E. When $\epsilon_1$ was reset to 4 ms, while $\epsilon_2$, $\beta_1$, or $\beta_2$ was doubled, all but one target were already detected either in Figure 5 or in Figures 7A through 7E. The new target is displayed in Figure 7F.

We applied the same procedure to data for unit 3. Table 2 collects the results. Each column summarizes the detection with the value of the corresponding parameter doubled, while keeping the others fixed.

Figure 7: Templates and corresponding targets detected from the same data as Figure 5, but with the window size of one of the time window functions doubled (see section 5.2). The targets are scaled by 1/1.1, 1/1.25, 1/1.1, 1/1.25, 1, and 1/1.31, respectively. As a window size increases, more jitter is allowed in detection, leading to less close matching of some of the detected targets and the corresponding templates.

Table 2: Breakdown of Targets Detected for the Second HVc Neuron by Doubling the Value of One Parameter, as Compared with Targets Detected When $\epsilon_1 = 4$ ms, $\epsilon_2 = \epsilon_3 = 5$ ms, $\beta_1 = -0.3$, and $\beta_2 = \beta_3 = -0.4$.

                    ε₁    ε₂    β₁    β₂
Already detected    18    25    16    14
Overlapping          5     4     4     1
New                  5     7     5     5
Total               28    36    25    20

The results show that different parameter values have some, but not a large, impact on detection. On the one hand, new targets may be detected. On the other, when the number of targets is relatively large, most of the targets detected using one set of parameters can also be detected using different sets of parameters, up to a relatively small difference in time. This is not unexpected, because different window sizes correspond to different statistical models on spiking activity. Furthermore, window sizes also affect the training and significance test on the targets. These changes, when combined, lead to different detection results. It is thus a good strategy to try different parameter values that are biologically plausible and report targets that are significant under a specific statistical test.

6 Discussion

We have described a novel approach to spike sequence detection: pattern filtering. Under the framework of temporal point processes, the approach incorporates (1) conditional Poisson processes to describe spiking activity, (2) detection by classification based on likelihood ratio, (3) classification by filtering, (4) multiple levels of detection, and (5) information-based temporal alignment of spike trains. Pattern filtering has significant computational advantages. First, it effectively detects spike sequences with a specific temporal pattern. Second, it allows the training and significance testing of the detection to be done efficiently, which otherwise would normally involve substantial computation. To improve pattern filtering, there are several issues to be addressed.

6.1 Statistical Significance of Targets. When a target is detected, it is necessary to make sure that it is not likely to have resulted by chance from background activity. One thus needs to test the p-value of the target: the likelihood of the target if it was generated by the background. Unfortunately, the probability distribution of the background activity is often poorly understood, making theoretical evaluation of the p-value infeasible.

The approach we take to address the problem is to use local processes as surrogates for the background activity (see section 3.3). The processes are modified homogeneous Poisson with short ISIs being removed to account for the neuronal refractory period. It is well known that a homogeneous Poisson process in general is not a good model for spiking activity (Riehle et al., 1997). At least, it does not account for rate modulations and possible nonstationarity of the spiking activity. This is a reason for using local processes, which can approximate the probability distributions around targets better than a global, homogeneous Poisson model does. For HVc, this choice of surrogates seems to work reasonably well. For other systems, however, different surrogate processes may be more appropriate. For example, in cases where neurons exhibit very low-frequency discharge but occasionally burst strongly, a Poisson distribution will not fit the data well, and a different distribution should be considered. In any case, localized processes will not only be sufficient, but will be better than a global surrogate process for the p-value tests of individual targets.

The detection considered here does not address another aspect of the statistical significance of targets: their global properties, such as the total number of targets. This has been a challenge for quite a long time, because it requires modeling of the entire data set (Abeles & Gerstein, 1988; Abeles et al., 1993; Date et al., 1998; Baker & Lemon, 2000; Grun et al., 2002a). In the light of the above comments, one possible approach to the global statistical


significance of targets is to use a hierarchy of processes. Intuitively, this means that for each data segment, which may have random onset and offset, we use a local process to model it, and the process at the top of the hierarchy specifies what type of process to use for the segment as well as when the segment starts and ends.

6.2 Multiple Filtering. The basic procedure in our multilevel detection is serial (see Figure 2), as the filters in each channel are arranged in the form $F_1 \to F_2 \to \cdots \to F_n$, with $F_1$ having a coarser temporal resolution (i.e., sampling rate) than the others. This feedforward arrangement of filters can be changed into a parallel one, whereby each $F_k$ outputs to a common integrator $G$ independently of the others. Since the coarser temporal resolution of $F_1$ makes it faster to compute its output, $G$ will first check the output of $F_1$, and only when that output is above threshold will $G$ check the outputs of the other $F_k$. If above-threshold outputs from $F_1, \ldots, F_n$ coincide at a location $x$ in the data, then $G$ outputs the data at $x$ as a target. This parallel-feedforward mechanism is biologically plausible, because it keeps the sensors in a system operating on a continuous basis, without interference from the others at the same level of the hierarchy.
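
A minimal sketch of this parallel-feedforward gating, with placeholder filter functions and thresholds (not the filters used in this study), is the following. The coarse filter acts as a cheap gate, so the finer filters are evaluated only at a small fraction of candidate locations.

```python
def integrator(x, data, filters, thresholds):
    """Parallel-feedforward gating: filters[0] is the coarse (cheap) filter.
    Finer filters are evaluated only if the coarse response exceeds its
    threshold; a target is reported at x only when every filter does."""
    coarse = filters[0](data, x)
    if coarse < thresholds[0]:
        return None
    responses = [coarse]
    for f, thr in zip(filters[1:], thresholds[1:]):
        r = f(data, x)
        if r < thr:
            return None
        responses.append(r)
    return x, responses  # above-threshold outputs coincide at x

# Toy usage with made-up filters on a spike-count "data" array.
data = [0, 3, 8, 2, 9, 1]
coarse_filter = lambda d, x: d[x]
fine_filter = lambda d, x: d[x] - d[x - 1] if x > 0 else 0
print([integrator(x, data, [coarse_filter, fine_filter], [5, 2])
       for x in range(len(data))])
```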

6.3 Estimation of Density Functions for Filters. In our study, the time window functions and the background functions were chosen by hand. It is possible to estimate both from a sample of spike sequences, which potentially may lead to a better probability model of the spiking activity.

First, from the discussion in section 2.2, both the time window and background functions of a filter depend on the average firing rate at background, which is easily estimated from data. From a theoretical point of view, the underlying assumption that the background spiking activity follows a homogeneous Poisson process over time is questionable. One possible modification is to divide the background activity into several categories and model each one with a Poisson process, which may be inhomogeneous. In the appendix, we suggest a non-Bayesian approach as well as a simple Bayesian approach to incorporate different categories of background. It is worth mentioning, however, that for the detection per se, the homogeneous Poisson model of background activity has worked well.

Second, the density functions of jittered spikes around template spikes and of noise spikes within a target can also be estimated. To pursue this, temporal alignment will play an important role. The appendix suggests a fairly simple but crude estimation procedure for the density functions. The point, however, is that estimating the density functions requires distinguishing between jittered spikes and noise spikes, which is hard to do without appropriate alignment.

6.4 Alignment. Our study demonstrates that alignment of spike sequences is useful for understanding the temporal discharge patterns of


higher-order neurons such as those in HVc. The responses of such neurons may exhibit internal structure yet vary in phase relative to the stimulus. This variability may manifest itself as a systematic change in response latency across consecutive trials, as exhibited in Figure 6. Besides systematic modulations, there can also be random or local perturbations of the system, resulting in magnified random changes in spike timing across trials.

Basically, the alignment aims to "undo" either the systematic modulations or the random changes in the spike sequences. The information-based alignment is meant for tonic neuronal activity, where the notion of distance is harder to define. The alignment takes into account only the entropy of a binary spiking event in each individual time bin and then sums these entropies over bins. Note that the binary spiking event is not directly related to the exact number of spikes. Indeed, the binary events can be considered a quantization of the actual spiking events. Simple spiking events such as binary events can be used for alignment and require far fewer samples.

In the alignment considered here, any correlation among spikes in different time bins is ignored. One could instead use joint binary events. For example, with $X_t$ defined as in equation 4.1, given time lags $\Delta_1, \ldots, \Delta_{n-1}$, let $Y_t = (X_t, X_{t+\Delta_1}, \ldots, X_{t+\Delta_{n-1}})$. Then, in equation 4.3, replace $\hat{h}_t$ with the entropy of $Y_t$, which has $2^n$ possible values. However, even for modest $n$, the alignment encounters the small-sample problem. One possible way to alleviate the difficulty is to quantize the set of all possible values of $Y_t$ and use the entropy derived from the quantization rather than from $Y_t$ itself.
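
To make the entropy criterion concrete, here is a minimal sketch assuming binary events obtained by binning each (shifted) spike train, with an exhaustive search over a small grid of candidate shifts for a single trial. The bin size, shift grid, and toy data are illustrative, and the procedure is a simplified stand-in for the alignment of section 4.

```python
import numpy as np

def binarize(spikes, shift, n_bins, bin_size):
    """Binary indicator of at least one spike per bin, after shifting the train."""
    edges = np.arange(n_bins + 1) * bin_size
    counts, _ = np.histogram(np.asarray(spikes) + shift, bins=edges)
    return (counts > 0).astype(float)

def total_entropy(trains, shifts, n_bins=100, bin_size=0.005):
    """Sum over bins of the entropy of the binary spiking event across trials."""
    X = np.array([binarize(s, d, n_bins, bin_size) for s, d in zip(trains, shifts)])
    p = np.clip(X.mean(axis=0), 1e-12, 1 - 1e-12)
    return float(np.sum(-(p * np.log2(p) + (1 - p) * np.log2(1 - p))))

def best_shift(trains, shifts, trial, grid=np.arange(-0.02, 0.021, 0.005)):
    """Choose the shift of one trial that minimizes the total entropy."""
    scores = []
    for d in grid:
        trial_shifts = list(shifts)
        trial_shifts[trial] = d
        scores.append(total_entropy(trains, trial_shifts))
    return grid[int(np.argmin(scores))]

# Toy example: the second train is a 10 ms delayed copy of the first,
# so the search should recover a shift of about -10 ms.
t0 = [0.050, 0.120, 0.200, 0.310]
trains = [t0, [t + 0.010 for t in t0]]
print(best_shift(trains, shifts=[0.0, 0.0], trial=1))
```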

6.5 Applications of Point Processes to Complex Data. Point processes can be applied not only to spiking activity but also to other complicated data, even continuous-valued data. Chosen appropriately, point representations not only greatly reduce the complexity of the data but also retain a significant amount of information. Such compact representations can be advantageous in analyzing the structure of data, such as behavior (Chi & Margoliash, 2001). In addition, as shown in this article, they can lead to efficient detection. Recent progress in computer vision and acoustics indicates that detection of complicated objects can be achieved by representing them with very simple discrete features, such as points (Amit & Geman, 1999; Amit & Murua, 2001; Amit, Koloydenko, & Niyogi, 2002; Chi, 2003; Roth, Yang, & Ahuja, 2002).

Although there are extensive possibilities for point representations of complicated data, several issues need to be addressed for point processes to be applied effectively. First, what types of features should be used, and how should these features be registered? While this may not be a problem for neural activity, it poses a challenge for other types of data. Second, how should the characteristic structure be learned from sample point representations, especially when fine structure needs to be learned? This issue may have important implications for neural coding (Victor &


Purpura, 1996). Third, in the context of detection, how should the statistical significance of detected targets be assessed? As discussed earlier, this is a particularly hard problem in neuroscience, given that the statistical properties of neural activity in many systems are poorly understood.

Appendix

A.1 Proof of Equation 2.3. We first show that

$$
L(T_x) = \sum_{t \in T_x \setminus (J+x)} B(t - x)
+ \sum_{t \in T_x \cap (J+x)} \max_{n=1,\ldots,p} K(t - x - s_n) + C,
\tag{A.1}
$$

where $C$ is the constant in equation 2.3. First, we need to find $l_a(T_x)$, the maximum likelihood of configurations that generate $T_x$, given that $T_x$ is generated by the template $S = \{s_1, \ldots, s_p\}$. Let $X$, $Z_1$, $\ldots$, $Z_p$ be a configuration for $T_x$. By well-known results for Poisson processes (Reich, Victor, & Bruce, 1998; Brown, Barbieri, Ventura, Kass, & Frank, 2001), the likelihood of $X + x = T_x \setminus (J + x)$ is

$$
Q_0 = \prod_{t \in T_x \setminus (J+x)} f_0(t - x) \times \exp\left\{-\int_{[0,\sigma] \setminus J} f_0(\tau)\, d\tau\right\}.
\tag{A.2}
$$

On the other hand, for each $n = 1, \ldots, p$, the likelihood of $Z_n$ is equal to

$$
Q_n = \prod_{t \in Z_n} f_1(t) \times \exp\left\{-\int_{-\epsilon}^{\epsilon} f_1(\tau)\, d\tau\right\}.
\tag{A.3}
$$

Because $X$ and the $Z_n$ are independent, their joint likelihood is

$$
\begin{aligned}
l(X, Z_1, \ldots, Z_p)
&= Q_0 \times \prod_{n=1}^{p} \prod_{t \in Z_n} f_1(t) \times \exp\left\{-p\int_{-\epsilon}^{\epsilon} f_1(\tau)\, d\tau\right\} \\
&= Q_0 \times \prod_{t \in T_x \cap (J+x)} f_1(t - s_{n(t)} - x) \times \exp\left\{-p\int_{-\epsilon}^{\epsilon} f_1(\tau)\, d\tau\right\},
\end{aligned}
$$

where $n(t)$ is the unique $n$ such that $t \in Z_n + s_n + x$. Maximizing $l(X, Z_1, \ldots, Z_p)$ over all possible configurations that generate $T_x$ yields

$$
l_a(T_x) = P(T_x \mid T_x \text{ generated by } \{s_1, \ldots, s_p\}) = Q_0 \times Q,
\tag{A.4}
$$

with

$$
Q := \prod_{t \in T_x \cap (J+x)} l(t) \times \exp\left\{-p\int_{-\epsilon}^{\epsilon} f_1(\tau)\, d\tau\right\},
\qquad
l(t) := \max_{n:\, t \in J_n + x} f_1(t - s_n - x).
$$

Recall that $l_o(T_x) = q^{n} \exp\{-q\sigma\}$. Then, letting

$$
C = -\int_{[0,\sigma] \setminus J} f_0(\tau)\, d\tau - p\int_{-\epsilon}^{\epsilon} f_1(\tau)\, d\tau + q\sigma,
$$

which is independent of $S$,

$$
L(T_x) = \log\frac{l_a(T_x)}{l_o(T_x)}
= \sum_{t \in T_x \setminus (J+x)} \log\frac{f_0(t - x)}{q}
+ \sum_{t \in T_x \cap (J+x)} \max_{n:\, t \in J_n + x}\left\{\log\frac{f_1(t - x - s_n)}{q}\right\} + C.
$$

Equation A.1 is proved by combining the above equations and equation 2.1. Now we can prove equation 2.3. By the definition of the $\delta$ functions, with $T_x = T \cap [x, x + \sigma]$, it is easy to see that $R(x) = \sum_{t \in T_x} F(t - x)$. Then, by equation 2.2,

$$
R(x) = \sum_{t \in T_x \cap (J+x)} F(t - x) + \sum_{t \in T_x \setminus (J+x)} F(t - x)
= \sum_{t \in T_x \cap (J+x)} \max_{1 \le k \le p} K(t - x - s_k) + \sum_{t \in T_x \setminus (J+x)} B(t - x).
$$

This, combined with equation A.1, proves $L(T_x) = R(x) + C$.
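
A direct computational reading of the right-hand side of equation A.1 (without the constant $C$) can be sketched as follows. The triangular jitter density $f_1$, the constant noise density $f_0 = q$, the template, and the spike times are illustrative assumptions, not quantities estimated in this study.

```python
import numpy as np

def make_filter(template, eps, sigma, f1, f0):
    """Build the target response R(x) of equation A.1 for a template
    {s_1, ..., s_p}, window length sigma, and jitter half-width eps."""
    template = np.asarray(template)

    def K(u):                       # log f1 inside (-eps, eps), 0 outside
        return np.log(f1(u)) if abs(u) < eps else 0.0

    def B(u):                       # log f0 outside the union of windows J
        in_J = np.any(np.abs(u - template) < eps)
        return 0.0 if in_J else np.log(f0(u))

    def R(spikes, x):
        total = 0.0
        for t in spikes:
            if not (x <= t < x + sigma):
                continue
            u = t - x
            if np.any(np.abs(u - template) < eps):   # spike in J + x
                total += max(K(u - s) for s in template)
            else:                                    # spike outside J + x
                total += B(u)
        return total

    return R

# Toy densities: triangular jitter density f1, constant noise density f0 = q.
eps, q = 0.004, 20.0
f1 = lambda u: max((eps - abs(u)) / eps**2, 1e-12)
f0 = lambda u: q
R = make_filter(template=[0.010, 0.035, 0.080], eps=eps, sigma=0.1, f1=f1, f0=f0)
print(R(spikes=[0.511, 0.536, 0.570, 0.581], x=0.5))
```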

A.2 Incorporation of Multiple Types of Background. Assume that a background spike sequence is a sample from one of $N$ Poisson processes with densities $q_k(t)$. The processes may be inhomogeneous, and thus $q_k$ may not be constant. For each $k$, let

$$
l_k(T_x) = l(T_x \mid T_x \text{ is generated from } q_k),
$$

still letting $l_a(T_x)$ be the maximum configuration likelihood for $T_x$. Then, following the argument that leads to equation A.1,

$$
L_k(T_x) = \log\frac{l_a(T_x)}{l_k(T_x)} = R_k(x) + C_k,
$$

with $R_k(x) = \int_0^{\sigma} F_k(\tau)\, T(x - \tau)\, d\tau$ and $F_k = F - \log q_k$, where $F$ is constructed from

$$
K(x) = \begin{cases} \log f_1(x) & \text{if } x \in (-\epsilon, \epsilon) \\ 0 & \text{otherwise,} \end{cases}
\qquad
B(x) = \begin{cases} \log f_0(x) & \text{if } x \notin J \\ 0 & \text{otherwise.} \end{cases}
$$

With the above changes, targets are detected at locations where the peak responses to $F_k$, $k = 1, \ldots, N$, all reach an above-threshold local maximum.


It is also possible to develop a Bayesian approach to the detection. Let $(\pi_0, \pi_1, \ldots, \pi_N)$ be a prior probability distribution over the categories "target" and "background $k$," $k = 1, \ldots, N$. The spike sequence $T_x$ at a location $x$ in the data has posteriors

$$
P(T_x \text{ is a target} \mid T_x) = \frac{\pi_0\, l_a(T_x)}{P(T_x)},
\qquad
P(T_x \text{ is from background } k \mid T_x) = \frac{\pi_k\, l_k(T_x)}{P(T_x)}.
$$

Then the Bayesian log-likelihood ratio of $T_x$ being a target versus a nontarget is

$$
L(T_x) = \log\frac{\pi_0\, l_a(T_x)}{\sum_{k=1}^{N} \pi_k\, l_k(T_x)}
= \log\frac{\pi_0}{1 - \pi_0} - \log\sum_{k=1}^{N} \bar{\pi}_k\, e^{-R_k(x)},
$$

with $\bar{\pi}_k = e^{-C_k}\pi_k / (1 - \pi_0)$.
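
For concreteness, the combined score can be evaluated numerically as below; the responses $R_k(x)$, constants $C_k$, and priors are made-up values for illustration only.

```python
import numpy as np

def bayes_log_ratio(R, C, pi0, pi):
    """Bayesian log-likelihood ratio of target vs. any background, given the
    per-background filter responses R_k(x), constants C_k, and priors."""
    R, C, pi = map(np.asarray, (R, C, pi))
    pi_bar = np.exp(-C) * pi / (1.0 - pi0)
    return np.log(pi0 / (1.0 - pi0)) - np.log(np.sum(pi_bar * np.exp(-R)))

# Two background categories with illustrative values.
print(bayes_log_ratio(R=[8.0, 3.5], C=[-1.0, -0.5], pi0=0.01, pi=[0.60, 0.39]))
```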

A.3 Estimation of Modulated Spike Densities Based on Alignment. Let $S_1, \ldots, S_N$ be aligned templates. To estimate the density $f_1$ of spikes generated by template spikes, for each $S_n$, think of the other templates as spike sequences generated by $S_n$. Given $\epsilon > 0$, for each $t \in S_n$, the histogram of $|t - s|$, for $s \in S_k \cap (t - \epsilon, t + \epsilon)$, $k \neq n$, can be used to estimate the density of jittered spikes around $t$. Since we assume the density is the same around each template spike, we can pool all the ISIs for all $t \in S_n$ and use the histogram of

$$
D_n = \bigcup_{t \in S_n} \left\{ |t - s| : s \in \bigcup_{k \neq n} S_k \cap (t - \epsilon, t + \epsilon) \right\}
$$

to estimate the density $f_1$. In fact, since there are $|S_n|$ time windows for the spikes in $S_n$, and $N - 1$ other templates, for any small interval $J \subset (-\epsilon, \epsilon)$ with duration $\sigma$,

$$
\frac{|D_n \cap J|}{(N - 1)\,\sigma\,|S_n|}
$$

is an estimate of the average of $f_1$ in $J$. One can also pool the data of all the $D_n$ and use $D = \bigcup_n D_n$ to estimate $f_1$.

Under the assumption that it is constant, the density $f_0$ of noise spikes within a target can also be estimated. Suppose the durations of the templates are all registered in $[0, \sigma]$. Then, for each $n = 1, \ldots, N$,

$$
\hat{f}_0^{(n)} = \frac{M_n}{(N - 1)\,L},
$$

with $L = |[0, \sigma] \setminus J|$, $J = \bigcup_{t \in S_n} (t - \epsilon, t + \epsilon)$, and

$$
M_n = \sum_{k \neq n} |\{ s \in S_k : s \notin J \}|,
$$

gives an estimate of $f_0$. Moreover, the $\hat{f}_0^{(n)}$ across $n$ may be combined for estimation of $f_0$.
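
The pooled estimate of $f_1$ described above can be sketched as follows. The templates, the value of $\epsilon$, and the bin width are illustrative assumptions; the code simply pools the absolute differences defining the $D_n$ and normalizes the histogram by the number of windows and of other templates.

```python
import numpy as np

def estimate_f1(templates, eps, n_bins=8):
    """Histogram estimate of the jitter density f1 from aligned templates:
    for each template spike t in S_n, collect |t - s| for spikes s of the
    other templates within (t - eps, t + eps), pool over n, and normalize so
    each bin height estimates the average of f1 over that small interval."""
    diffs, n_windows = [], 0
    N = len(templates)
    for n, Sn in enumerate(templates):
        others = np.concatenate([templates[k] for k in range(N) if k != n])
        for t in Sn:
            near = others[np.abs(others - t) < eps]
            diffs.extend(np.abs(near - t))
            n_windows += 1
    counts, edges = np.histogram(diffs, bins=n_bins, range=(0.0, eps))
    width = eps / n_bins
    # One window per template spike and N - 1 other templates per window.
    return counts / ((N - 1) * width * n_windows), edges

templates = [np.array([0.010, 0.030, 0.055]),
             np.array([0.011, 0.029, 0.056]),
             np.array([0.009, 0.031, 0.054])]
f1_hat, edges = estimate_f1(templates, eps=0.004)
print(f1_hat)
```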

Acknowledgments

We thank Michael L. Stein for valuable comments on the manuscript. This work was supported by NIH MH59831 and MH60276.

References

Abeles, M., Bergman, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629–1638.

Abeles, M., & Gerstein, G. M. (1988). Detecting spatiotemporal firing patterns among simultaneously recorded single neurons. J. Neurophysiol., 60, 909–924.

Amit, Y., & Geman, D. (1999). A computational model for visual selection. Neural Comput., 11(7), 1691–1715.

Amit, Y., Koloydenko, A., & Niyogi, P. (2002). Robust acoustic object detection (Tech. Rep. 520). Chicago: Department of Computer Science and Statistics, University of Chicago.

Amit, Y., & Murua, A. (2001). Speech recognition using randomized relational decision trees. IEEE Trans. on Speech and Audio Processing, 9(4), 333–341.

Baker, S. N., & Lemon, R. N. (2000). Precise spatiotemporal repeating patterns in monkey primary and supplementary motor areas occur at chance levels. J. Neurophysiol., 84, 1770–1780.

Brenowitz, E. A., Margoliash, D., & Nordeen, K. W. (1997). An introduction to birdsong and the avian song system. J. Neurobiol., 33, 495–500.

Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E., & Frank, L. M. (2001). The time-rescaling theorem and its application to neural spike train data analysis. Neural Comput., 14, 325–346.

Chi, Z. (2003). Feature representation, pattern filtering, and temporal alignment for acoustic detection (Tech. Rep. 531). Chicago: Department of Computer Science and Statistics, University of Chicago.

Chi, Z., & Margoliash, D. (2001). Temporal coding and temporal drift in brain and behavior of zebra finch song. Neuron, 32, 899–910.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

Date, A., Bienenstock, E., & Geman, S. (1998). On the temporal resolution of neural activity (Tech. Rep.). Providence, RI: Division of Applied Mathematics, Brown University.

Dave, A. S., & Margoliash, D. (2000). Song replay during sleep and computational rules for sensorimotor vocal learning. Science, 290, 812–816.

Dave, A. S., Yu, A. C., & Margoliash, D. (1998). Behavioral state modulation of auditory activity in a vocal motor system. Science, 282, 2250–2253.

Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press.

Fleuret, F., & Geman, D. (2001). Coarse-to-fine face detection. Intl. J. Comp. Vision, 41(1/2), 85–107.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intel., 6(6), 721–741.

Grun, S., Diesmann, M., & Aertsen, A. (2002a). Unitary events in multiple single-neuron spiking activity: I. Detection and significance. Neural Comput., 14(1), 43–80.

Grun, S., Diesmann, M., & Aertsen, A. (2002b). Unitary events in multiple single-neuron spiking activity: II. Nonstationary data. Neural Comput., 14(1), 81–120.

Hahnloser, R. H. R., Kozhevnikov, A. A., & Fee, M. S. (2002). An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature, 419, 65–70.

Louie, K., & Wilson, M. A. (2001). Temporally structured replay of awake hippocampal ensemble activity during rapid eye movement sleep. Neuron, 29, 145–156.

Margoliash, D. (2001). Do sleeping birds sing? Population coding and learning in the bird song system. Prog. Brain Res., 130, 319–331.

Margoliash, D. (2002). Evaluating theories of bird song learning: Implications for future directions. J. Comp. Physiol. A, 188, 851–866.

Mooney, R. (2000). Different subthreshold mechanisms underlie song selectivity in identified HVc neurons of the zebra finch. J. Neurosci., 20(14), 5420–5436.

Nadasdy, Z., Hirase, H., Czurko, A., Csicsvari, J., & Buzsaki, G. (1999). Replay and time compression of recurring spike sequences in the hippocampus. J. Neurosci., 19(21), 9497–9507.

Reich, D. S., Victor, J. D., & Bruce, W. K. (1998). The power ratio and the interval map: Spiking models and extracellular recordings. J. Neurosci., 18(23), 10090–10104.

Riehle, A., Grun, S., Diesmann, M., & Aertsen, A. (1997). Spike synchronization and rate modulation differentially involved in motor cortical function. Science, 278, 1950–1953.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rojer, A. S., & Schwartz, E. L. (1992). A quotient space Hough transform for space-variant visual attention. In G. A. Carpenter & S. Grossberg (Eds.), Neural networks for vision and image processing. Cambridge, MA: MIT Press.

Roth, D., Yang, M.-H., & Ahuja, N. (2002). Learning to recognize three-dimensional objects. Neural Comput., 14(5), 1071–1103.

Shea, S. D., Rauske, P. L., & Margoliash, D. (2001). Identification of HVc projection neurons in extracellular records by antidromic stimulation. Soc. Neurosci. Abstr., 27, 381–386.

Skaggs, W. E., & McNaughton, B. L. (1996). Replay of neuronal firing sequences in rat hippocampus during sleep following spatial experience. Science, 271, 1870–1873.

Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neuronal interactions in monkey cortex in relation to behavioural events. Nature, 373, 515–518.

Victor, J. D., & Purpura, K. P. (1996). Nature and precision of temporal coding in visual cortex: A metric-space analysis. J. Neurophysiol., 76(2), 1310–1326.

Wilson, M. A., & McNaughton, B. L. (1994). Reactivation of hippocampal ensemble memories during sleep. Science, 265, 676–679.

Yu, A. C., & Margoliash, D. (1996). Temporal hierarchical control of singing in birds. Science, 273, 1871–1875.

Received October 11, 2002; accepted April 3, 2003.