Modeling the impact of common noise inputs on the network activity of retinal ganglion cells


J Comput Neurosci, DOI 10.1007/s10827-011-0376-2


Michael Vidne · Yashar Ahmadian · Jonathon Shlens · Jonathan W. Pillow · Jayant Kulkarni · Alan M. Litke · E. J. Chichilnisky · Eero Simoncelli · Liam Paninski

Received: 14 June 2011 / Revised: 4 December 2011 / Accepted: 9 December 2011
© Springer Science+Business Media, LLC 2011

Abstract Synchronized spontaneous firing among retinal ganglion cells (RGCs), on timescales faster than visual responses, has been reported in many studies. Two candidate mechanisms of synchronized firing include direct coupling and shared noisy inputs. In neighboring parasol cells of primate retina, which exhibit rapid synchronized firing that has been studied extensively, recent experimental work indicates that direct electrical or synaptic coupling is weak, but shared synaptic input in the absence of modulated stimuli is strong. However, previous modeling efforts have not accounted for this aspect of firing in the parasol cell population. Here we develop a new model that incorporates the effects of common noise, and apply it to analyze the light responses and synchronized firing of a large, densely-sampled network of over 250 simultaneously recorded parasol cells. We use a generalized linear model in which the spike rate in each cell is determined by the linear combination of the spatio-temporally filtered visual input, the temporally filtered prior spikes of that cell, and unobserved sources representing common noise. The model accurately captures the statistical structure of the spike trains and the encoding of the visual stimulus, without the direct coupling assumption present in previous modeling work. Finally, we examined the problem of decoding the visual stimulus from the spike train given the estimated parameters. The common-noise model produces Bayesian decoding performance as accurate as that of a model with direct coupling, but with significantly more robustness to spike timing perturbations.

Action Editor: Brent Doiron

M. Vidne (B)
Department of Applied Physics & Applied Mathematics, Center for Theoretical Neuroscience, Columbia University, New York, NY, USA
e-mail: [email protected]

Y. Ahmadian · L. Paninski
Center for Theoretical Neuroscience, Columbia University, New York, NY, USA

J. Shlens · E. J. Chichilnisky
The Salk Institute for Biological Studies, La Jolla, CA, USA

J. W. Pillow
Center for Perceptual Systems, The University of Texas at Austin, Austin, TX, USA

J. Kulkarni
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA

A. M. Litke
Santa Cruz Institute for Particle Physics, University of California, Santa Cruz, CA, USA

E. Simoncelli
Howard Hughes Medical Institute, Center for Neural Science, and Courant Institute of Mathematical Sciences, New York University, New York, NY, USA

Keywords Retina · Generalized linear model · State-space model · Multielectrode recording · Random-effects model

1 Introduction

Advances in large-scale multineuronal recordings have made it possible to study the simultaneous activity of complete ensembles of neurons. Experimentalists now routinely record from hundreds of neurons simultaneously in many preparations: retina (Warland et al. 1997; Frechette et al. 2005), motor cortex (Nicolelis et al. 2003; Wu et al. 2008), visual cortex (Ohki et al. 2006; Kelly et al. 2007), somatosensory cortex (Kerr et al. 2005; Dombeck et al. 2007), parietal cortex (Yu et al. 2006), hippocampus (Zhang et al. 1998; Harris et al. 2003; Okatan et al. 2005), spinal cord (Stein et al. 2004; Wilson et al. 2007), cortical slice (Cossart et al. 2003; MacLean et al. 2005), and culture (Van Pelt et al. 2005; Rigat et al. 2006). These techniques in principle provide the opportunity to discern the architecture of neuronal networks. However, current technologies can sample only small fractions of the underlying circuitry; therefore, unmeasured neurons can have a large collective impact on network dynamics and coding properties. For example, it is well understood that common input plays an essential role in the interpretation of pairwise cross-correlograms (Brody 1999; Nykamp 2005). Inferring the correct connectivity and computations in the circuit requires modeling tools that account for unrecorded neurons.

Here, we investigate the network of parasol retinal ganglion cells (RGCs) of the macaque retina. Several factors make this an ideal system for probing common input. Dense multi-electrode arrays provide access to the simultaneous spiking activity of many RGCs, but do not provide systematic access to the nonspiking inner retinal layers. RGCs exhibit significant synchrony in their activity, on timescales faster than that of visual responses (Mastronarde 1983; DeVries 1999; Shlens et al. 2009; Greschner et al. 2011), yet the significance of this synchrony for information encoding is still debated (Meister et al. 1995; Nirenberg et al. 2002; Schneidman et al. 2003; Latham and Nirenberg 2005). In addition, the underlying mechanisms of these correlations in the primate retina remain under-studied: do correlations reflect direct electrical or synaptic coupling (Dacey and Brace 1992) or shared input (Trong and Rieke 2008)? Consequently, the computational role of the correlations remains uncertain: how does synchronous activity affect the information encoded by RGCs about the visual world?

Recent work using paired intracellular recordings revealed that neighboring parasol RGCs receive strongly correlated synaptic input; ON parasol cells exhibited weak direct reciprocal coupling, while OFF parasol cells exhibited none (Trong and Rieke 2008). In contrast with these empirical findings, previous work modeling the joint firing properties and stimulus encoding of parasol cells modeled their correlations with nearly instantaneous direct coupling effects, and did not account for the possibility of common input (Pillow et al. 2008). Here, we introduce a model for the joint firing of the parasol cell population that incorporates common noise effects. We apply this model to analyze the light responses and synchronized firing of a large, densely-sampled network of over 250 simultaneously recorded RGCs. Our main conclusion is that the common noise model captures the statistical structure of the spike trains and the encoding of visual stimuli accurately, without assuming direct coupling.

2 Methods

2.1 Recording and preprocessing

Recordings The preparation and recording methods were described previously (Litke et al. 2004; Frechette et al. 2005; Shlens et al. 2006). Briefly, eyes were obtained from deeply and terminally anesthetized Macaca mulatta used by other experimenters, in accordance with institutional guidelines for the care and use of animals. Pieces of peripheral retina 3–5 mm in diameter, isolated from the retinal pigment epithelium, were placed flat against a planar array of 512 extracellular microelectrodes covering an area of 1800 × 900 μm. The present results were obtained from 30–60 min segments of recording. The voltage on each electrode was digitized at 20 kHz and stored for off-line analysis. Details of recording methods and spike sorting have been given previously (Litke 2004; see also Segev et al. 2004). Clusters with a large number of refractory period violations (>10% estimated contamination) or spike rates below 1 Hz were excluded from additional analysis. Inspection of the pairwise cross-correlation functions of the remaining cells revealed an occasional unexplained artifact, in the form of a sharp and pronounced ‘spike’ at zero lag, in a few cell pairs. These artifactual coincident spikes were rare enough not to have any significant effect on our results; cell pairs displaying this artifact are excluded from the analysis illustrated in Figs. 4–6 and 11.

Stimulation and receptive field analysis An optically reduced stimulus from a gamma-corrected cathode ray tube computer display refreshing at 120 Hz was focused on the photoreceptor outer segments. The low photopic intensity was controlled by neutral density filters in the light path. The mean photon absorption rate for the long (middle, short) wavelength-sensitive cones was approximately equal to the rate that would have been caused by a spatially uniform monochromatic light of wavelength 561 (530, 430) nm and intensity 9200 (8700, 7100) photons/μm²/s incident on the photoreceptors.


The mean firing rate during exposure to a steady, spatially uniform display at this light level was 11 ± 3 Hz for ON cells and 17 ± 3.5 Hz for OFF cells. Spatiotemporal receptive fields were measured using a dynamic checkerboard (white noise) stimulus in which the intensity of each display phosphor was selected randomly and independently over space and time from a binary distribution. Root mean square stimulus contrast was 96%. The pixel size (60 μm) was selected to be of the same spatial scale as the parasol cell receptive fields. In order to outline the spatial footprint of each cell, we fit an elliptic two-dimensional Gaussian function to the spatial spike-triggered average of each of the neurons. The resulting receptive field outlines of the two cell types (104 ON and 173 OFF RGCs) formed a nearly complete mosaic covering a region of visual space (Fig. 11(D)), indicating that most parasol cells in this region were recorded. These fits were only used to outline the spatial footprint of the receptive fields, and were not used as the spatiotemporal stimulus filters k_i, as discussed in more detail below.

2.2 Model structure

We begin by describing our model in its full generality (Fig. 1(A)). Later, we will examine two models which are different simplifications of the general model (Fig. 1(B)–(C)). We used a generalized linear model augmented with a state-space model (GLSSM). This model was introduced in Kulkarni and Paninski (2007) and is similar to methods discussed in Smith and Brown (2003) and Yu et al. (2006); see Paninski et al. (2010) for a review. The conditional intensity function (instantaneous firing rate), λ^i_t, of neuron i at time t was modeled as:

$$\log \lambda_t^i = \mu^i + k_i \cdot x_t + h_i \cdot y_t^i + \sum_{j \neq i}^{n_{\mathrm{cells}}} L_{i,j} \cdot y_t^j + \sum_{r=1}^{n_q} M_{i,r} \cdot q_t^r. \qquad (1)$$

The right-hand side of this expression is organized in terms of input-filter pairs; we will describe the inputs first and then the corresponding filters. μ^i is a scalar offset, determining the cell's baseline log-firing rate; x_t is the spatiotemporal stimulus history vector at time t; y^i_t is a vector of the cell's own spike-train history in a short window of time preceding time t; y^j_t is the spike train of the jth cell preceding time t; q^r_t is the rth common noise component at time t. Correspondingly, k_i is the stimulus spatio-temporal filter of neuron i; h_i is the post-spike filter accounting for the ith cell's own post-spike effects; L_{i,j} are direct coupling filters from neuron j to neuron i which capture dependencies of the cell on the recent spiking of all other cells; and lastly, M_{i,r} is the mixing matrix which takes the rth noise source and ‘injects’ it into cell i.

We experimented with two different distributions to model the spike count in each bin, given the rate λ^i_t. First we used a Poisson distribution (with mean λ^i_t dt, where dt = 0.8 ms was the temporal resolution used to represent the spike times here) in the exploratory analyses described in Sections 2.2 and 3.1 below; note that since λ^i_t itself depends on the past spike times, this model does not correspond to an inhomogeneous Poisson process (in which the spiking in each bin would be independent). We also used the Bernoulli distribution with the same probability of not spiking, p(no spike) = exp(−λ^i_t dt).

The Poisson distribution constrains the mean firing rate to equal the variance of the firing rate, which is not the case in our data; the variance tends to be significantly smaller than the mean. In preliminary analyses, the Bernoulli model outperformed the Poisson model. Therefore, we used the Bernoulli model for the complete network with common noise and no direct coupling (described here and in Section 3.2). We did not explore the differences between the two models further, but in general expect them to behave fairly similarly due to the small dt used here.
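As a concrete illustration of the two per-bin count models, the short sketch below (our own toy setting with a constant rate; in the full model λ^i_t depends on the stimulus and spike history) draws spikes under both distributions and shows that they share the same probability of an empty bin:

```python
import numpy as np

def p_no_spike(lam, dt=0.0008):
    """Probability of an empty bin, identical under both models: exp(-lam*dt).
    lam is the conditional intensity in spikes/s; dt = 0.8 ms as in the text."""
    return np.exp(-lam * dt)

def simulate_bins(lam, n_bins, dt=0.0008, model="bernoulli", rng=None):
    """Draw per-bin spike counts for a constant intensity (illustration only)."""
    rng = np.random.default_rng(rng)
    if model == "poisson":
        # Poisson bins: counts 0, 1, 2, ... with mean lam*dt
        return rng.poisson(lam * dt, size=n_bins)
    # Bernoulli bins: at most one spike, p(spike) = 1 - exp(-lam*dt)
    return (rng.random(n_bins) < 1.0 - np.exp(-lam * dt)).astype(int)
```

With dt = 0.8 ms and physiological rates, λ dt ≪ 1 and the two models nearly coincide, consistent with the remark above that they behave fairly similarly at small dt.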

Now we will discuss the terms in Eq. (1) in more detail. The stimulus spatio-temporal filter k_i was modeled as a five-by-five-pixel spatial field with a 30-frame (250 ms) temporal extent. Each pixel in k_i is allowed to evolve independently in time. The history filter was composed of ten cosine “bump” basis functions, with 0.8 ms resolution and a duration of 188 ms, while the direct coupling filters were composed of 4 cosine-bump basis functions; for more details see Pillow et al. (2008).

The statistical model developed by Pillow et al. (2008) captured the joint firing properties of a complete subnetwork of 27 RGCs, but did not explicitly include common noise in the model. Instead, correlations were captured through direct reciprocal connections (corresponding to our L_{i,j} terms) between RGCs. (Similar models with no common noise term have been considered by many previous authors (Chornoboy et al. 1988; Utikal 1997; Keat et al. 2001; Paninski et al. 2004; Pillow et al. 2005; Truccolo et al. 2005; Okatan et al. 2005).) Because the cross-correlations observed in this network have a fast time scale and are peaked at zero lag, the direct connections in the model introduced by Pillow et al. (2008) were estimated to act almost instantaneously (with effectively zero delay), making their physiological interpretation somewhat uncertain. We therefore imposed a strict 3 ms delay on the initial rise of the coupling filters here to account for physiological delays in neural coupling. (However, the exact delay imposed on the cross-coupling filters did not change our results qualitatively, as long as some delay was imposed, down to 0.8 ms, our temporal resolution, as we discuss at more length below.) The delay in the cross-coupling filters effectively forces the common noise term to account for the instantaneous correlations which are observed in this network; see Fig. 5 and Section 3.1 below for further discussion.

Fig. 1 Model schemas. (A) Fully general model. Each cell is modeled independently using a generalized linear model augmented with a state-space model (GLSSM). The inputs to the cell are: the stimulus convolved with a linear spatio-temporal filter (k_i · x_t in Eq. (1)), past spiking activity convolved with a history filter (h_i · y^i_t), past spiking activity of all other cells convolved with corresponding cross-coupling filters (Σ_{j≠i} L_{i,j} · y^j_t), and a mixing matrix, M, that connects n_q common noise inputs q^r_t to the n_cells observed RGCs. (B) Pairwise model. We simplify the model by considering pairs of neurons separately; therefore, we have two cells and just one shared common noise. (C) Common-noise model, where we set all the cross-coupling filters to zero and have n_cells independent common-noise sources coupled to the RGC network via the mixing matrix M.

The last term in Eq. (1) is more novel in this context, and will be our focus in this paper. The term q^r_t is the instantaneous value of the rth common noise term at time t, and we use q^r = {q^r_t}_{t=1}^{T} to denote the time series of common noise inputs r. Each q^r is independently drawn from an autoregressive (AR) Gaussian process, q^r ∼ N(0, C_τ), with mean zero and covariance matrix C_τ. Since the inner layers of the retina are composed of non-spiking neurons, and since each RGC receives inputs from many inner-layer cells, restricting q^r to be a Gaussian process seems to be a reasonable starting point. To fix the temporal covariance matrix C_τ, recall that Trong and Rieke (2008) reported that RGCs share a common noise with a characteristic time scale of about 4 ms, even in the absence of modulations in the visual stimulus. In addition, the characteristic width of the fast central peak in the cross-correlograms in this network is of the order of 4 ms (see, e.g., Shlens et al. 2006; Pillow et al. 2008, and also Fig. 5 below). Therefore, we imposed a 4 ms time scale on our common noise by choosing appropriate parameters for C_τ; we did this using the autoregressive formulation

$$q_t = \sum_{\tau=1}^{n} \phi_\tau q_{t-\tau} + \varepsilon_t,$$

choosing the parameters φ_τ and the variance of the white noise process ε_t using the Yule–Walker method, which solves a linear set of equations given the observed data autocorrelation (Hayes 1996). We used the Matlab implementation of the Yule–Walker method.
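The Yule–Walker step can be sketched in a few lines of NumPy (the paper uses Matlab's built-in routine and the empirical autocorrelation; here we assume, purely for illustration, an exponentially decaying unit-variance autocovariance with the 4 ms timescale and 0.8 ms bins quoted in the text):

```python
import numpy as np

def yule_walker(acov, order):
    """Solve the Yule-Walker equations: given the autocovariance at lags
    0..order, return the AR coefficients phi and the innovation variance."""
    # Toeplitz system T phi = r, with T[i, j] = acov[|i - j|]
    T = np.array([[acov[abs(i - j)] for j in range(order)] for i in range(order)])
    phi = np.linalg.solve(T, acov[1:order + 1])
    sigma2 = acov[0] - phi @ acov[1:order + 1]
    return phi, sigma2

# Illustrative target: unit-variance noise with a ~4 ms correlation time,
# discretized at dt = 0.8 ms (the exponential shape is our assumption).
dt, tau = 0.0008, 0.004
acov = np.exp(-np.arange(6) * dt / tau)
phi, sigma2 = yule_walker(acov, order=2)
# For an exactly exponential autocovariance this recovers an AR(1) process:
# phi = [exp(-dt/tau), 0], sigma2 = 1 - exp(-2*dt/tau).
```

With a measured autocorrelation in place of the exponential target, the same solve yields the φ_τ and innovation variance used to build C_τ.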


The common noise inputs in our model are independent of the stimulus, to maintain computational tractability; recall that Trong and Rieke (2008) observed that the common noise was strong and correlated even in the absence of a modulated stimulus. While the stimulus-independent AR model for the common noise was chosen for computational convenience (see Appendix A for details), this model proved sufficient for modeling the data, as we discuss at more length below. Models of this kind, in which the linear predictor contains a random component, are common in the statistical literature and are referred to as ‘random effects models’ (Agresti 2002) or ‘generalized linear mixed models’ (McCulloch et al. 2008).

The mixing matrix, M, connects the n_q common noise terms to the n_cells observed neurons; this matrix induces correlations in the noise inputs impacting the RGCs. More precisely, the interneuronal correlations in the noise arise entirely from the mixing matrix; this reflects our assumption that the spatio-temporal correlations in the common noise terms are separable, i.e., that the correlation matrix can be written as a Kronecker product of two matrices, C = C_τ ⊗ C_s, where C_s = M^T M. Since the common noise sources q_t are independent, with identical distributions, and all the spatial correlations in the common noise are due to the mixing matrix M, we may compute the spatial covariance matrix of the vector of mixed common noise inputs as C_s = M^T M. (We have chosen the noise sources to have unit variance; this entails no loss of generality, since we can change the strength of any of the noise inputs to any cell by changing the mixing matrix M.) The number of noise sources in the model can vary to reflect modeling assumptions. In the pairwise model below we use one noise source for every pair of neurons, while in the full population model we use the same number of noise sources as the number of cells, in order to avoid imposing any further assumptions on the statistical structure of the noise. (One interesting question is how few common inputs are sufficient, i.e., how small we can make n_q while still obtaining an accurate model, but we have not yet pursued this direction systematically.)
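A small numerical check of the separability and identifiability points above (toy sizes and a random M, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_q = n_cells = 3                          # toy sizes; the paper sets n_q = n_cells
M = rng.normal(size=(n_q, n_cells))        # hypothetical mixing matrix

# With unit-variance, independent noise sources, the spatial covariance of
# the mixed common-noise inputs is Cs = M^T M.
Cs = M.T @ M

# Non-uniqueness of M: any unitary U yields the same spatial covariance,
# so only Cs (not M itself) is identifiable from the data.
U, _ = np.linalg.qr(rng.normal(size=(n_q, n_q)))
Cs_rotated = (U @ M).T @ (U @ M)

# Separability: the full space-time covariance is the Kronecker product.
rho = 0.8
Ctau = rho ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
C = np.kron(Ctau, Cs)                      # C = C_tau (x) C_s
```

The rotated mixing matrix UM reproduces Cs exactly, which is why (as noted in the footnote on M below) the estimation procedure targets Cs rather than M itself.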

Pairwise model (Fig. 1(B)) For simplicity, and for better correspondence with the experimental evidence of Trong and Rieke (2008), we began by restricting our model to pairs of RGCs. We fit our model to pairs of the same subset of 27 RGCs analyzed in the modeling paper by Pillow et al. (2008). The receptive fields of both ON and OFF cell types in this subset formed a complete mosaic covering a small region of visual space, indicating that every parasol cell in this region was recorded; see Fig. 1(b) of Pillow et al. (2008). We modeled each pair of neurons with the model described above, but we allowed one common noise to be shared by the cells. In other words, the two cells in the pair received the same time series, q, scaled by M = [m^1, m^2]^T (Fig. 1(B)). Therefore, the conditional intensity is:

$$\lambda_t^i = \exp\left(\mu^i + k_i \cdot x_t + h_i \cdot y_t^i + L_{i,j} \cdot y_t^j + m^i \cdot q_t\right), \qquad (2)$$

where we kept all the notation as above, but we have dispensed with the sums over the other cells and over the common noise inputs (since in this case there is only one other cell and one common noise term to consider). The probability of observing n spikes between time t and time t + dt for neuron i in this model is:

$$P(n_t^i) = \frac{(\lambda_t^i)^{n_t^i}\, e^{-\lambda_t^i}}{n_t^i!}. \qquad (3)$$

This model is conceptually similar to the model analyzed by de la Rocha et al. (2007), with a GLM including the spike history terms y^i_t and y^j_t in place of the integrate-and-fire spiking mechanism used in that paper.

Common-noise model with no direct coupling (Fig. 1(C)) Encouraged by the results of the pairwise model, we proceeded to fit the GLSS model to the entire observed RGC network. The pairwise model results (discussed below) indicated that the cross-coupling inputs are very weak compared to all other inputs. Also, the experimental results of Trong and Rieke (2008) indicate only weak direct coupling between ON cells and no direct coupling between the OFF cells. Thus, in order to obtain a more parsimonious model we abolished all cross-coupling filters (Fig. 1(C)). Hence, the conditional intensity function, λ^i_t, is:

$$\lambda_t^i = \exp\left(\mu^i + k_i \cdot x_t + h_i \cdot y_t^i + \sum_{r=1}^{n_q} M_{i,r} \cdot q_t^r\right). \qquad (4)$$

As in the pairwise model, each cell has a spatio-temporal filter, k_i, and a history filter, h_i. In contrast with the pairwise model, we now have no direct coupling terms L_{i,j}. As discussed above, the firing activity of all the cells was modeled here as a Bernoulli process, with probabilities

$$P(n_t^i = 0) = \exp(-\lambda_t^i\, dt), \qquad P(n_t^i = 1) = 1 - \exp(-\lambda_t^i\, dt). \qquad (5)$$

The formulation of the model as conditionally independent cells given the common noise q, with no cross-coupling filters, interacting only through the covariance structure of the common inputs, lends itself naturally to computational parallelization, since the expensive step, the maximum likelihood estimation of the model parameters, can be performed independently on each cell if certain simplifying approximations are made, as discussed in the next section.
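To make the generative model concrete, here is a minimal forward simulation of the no-direct-coupling variant (Eqs. (4) and (5)). All specifics (an AR(1) common noise, a precomputed stimulus drive, the roughly 15 Hz baseline, and the toy filter shapes in the test) are our illustrative assumptions, not the fitted values from the paper:

```python
import numpy as np

def simulate_population(T, M, k_drive, h, phi, sigma, dt=0.0008, rng=None):
    """Forward-simulate Bernoulli spiking for the no-direct-coupling model:
    log lam_i(t) = mu + (stimulus drive)_i(t) + (history term)_i(t) + (q_t M)_i,
    with each noise source a toy AR(1) Gaussian process.

    M: (n_q, n_cells) mixing matrix; k_drive: (T, n_cells) precomputed
    k_i . x_t values; h: post-spike filter, most recent bin first."""
    rng = np.random.default_rng(rng)
    n_q, n_cells = M.shape
    q = np.zeros((T, n_q))
    y = np.zeros((T, n_cells), dtype=int)
    mu = np.log(15.0)                       # ~15 Hz baseline (illustrative)
    n_h = len(h)
    for t in range(1, T):
        q[t] = phi * q[t - 1] + sigma * rng.normal(size=n_q)
        hist = y[max(0, t - n_h):t][::-1]   # recent spikes, most recent first
        h_term = h[:hist.shape[0]] @ hist   # (n_cells,) history contribution
        log_lam = mu + k_drive[t] + h_term + q[t] @ M
        # Bernoulli bins, Eq. (5): p(spike) = 1 - exp(-lam * dt)
        y[t] = rng.random(n_cells) < 1.0 - np.exp(-np.exp(log_lam) * dt)
    return y, q
```

With a strongly negative history filter (refractory-style) and no stimulus drive, the simulated cells fire near the baseline rate while sharing fast covariability injected through M.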

2.3 Model parameter estimation

Now that we have introduced the model structure, we will describe the estimation of the model parameters. We proceeded in three steps. First, we obtained a preliminary estimate of the “private” GLM parameters (μ^i, k_i, h_i) using a standard maximum-likelihood approach in which the common noise inputs q and direct coupling terms L_i were fixed at zero: we maximized the GLM likelihood p(y|μ^i, k_i, h_i, L_i = 0, q = 0), computed by forming products over all timebins t of either the Poisson likelihood (Eq. (3), in the pairwise model) or the Bernoulli likelihood (Eq. (5), in the population model with no direct coupling). In parallel, we obtained a rough estimate of the spatial noise covariance C_s directly from the data using the PSTH method (explained briefly below, and in Appendix C). Then, in the second step, each cell's GLSS model was fit independently given the rough estimate of C_s. Lastly, given the model parameters and data, we determined the covariance structure of the common noise effects with greater precision using the method of moments (explained below and in Appendix B).

We now discuss each step in turn. Maximizing the GLM likelihood p(y|μ^i, k_i, h_i, L_i = 0, q = 0) followed standard procedures (Truccolo et al. 2005; Pillow et al. 2008). To obtain a first estimate of the spatial noise covariance C_s, we used a peri-stimulus time histogram (PSTH) based approach that was similar to the cross-covariogram method introduced by Brody (1999). Specifically, we analyzed another dataset in which we observed these neurons' responses to a fixed repeating white noise stimulus with the same variance as the longer nonrepeating white noise stimulus used to fit the other model parameters. By averaging the neural responses to the repeated stimulus we obtained the PSTH. We subtracted each neuron's PSTH from its response to each trial to obtain the neurons' trial-by-trial deviations from the mean response to the stimulus. We then formed a covariance matrix of the deviations between the neurons for each trial, and estimated the spatial covariance C_s from this matrix via a singular value decomposition that exploits the spatiotemporally separable structure of the common noise in our model. (For more details see Appendix C.) It is important to note that even though the PSTH method gives a good estimate of the spatial covariance C_s, we only need a rough estimate of the magnitude of the common noise going into each cell in order to proceed to the next step in our estimation procedure.¹
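The PSTH-based rough estimate can be sketched in a few lines (binned responses to a repeated stimulus are assumed as input; the SVD step of the paper's Appendix C is omitted here):

```python
import numpy as np

def psth_noise_covariance(responses):
    """Rough spatial noise-covariance estimate from repeated-stimulus data.

    responses: (n_trials, T, n_cells) array of binned spike counts to a
    repeated stimulus. The trial average (PSTH) captures the stimulus-locked
    response; the covariance across cells of the trial-by-trial deviations
    estimates the spatial structure of the shared noise."""
    psth = responses.mean(axis=0)                      # (T, n_cells)
    dev = (responses - psth).reshape(-1, responses.shape[-1])
    return dev.T @ dev / (dev.shape[0] - 1)            # (n_cells, n_cells)
```

On synthetic trials in which two cells receive the same noise source, the off-diagonal entry of the returned matrix recovers the shared-noise variance.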

In the second stage of the estimation procedure, we used a maximum marginal likelihood approach to update the parameters for each cell in parallel, using the rough estimate of the spatial covariance C_s obtained in the previous step. We abbreviate Θ = {μ^i, ||k_i||_2, h_i, L_i}.² In our model, since the common noise inputs q are unobserved, to compute the likelihood of the spike train given the parameters Θ, we must marginalize over all possible q:

$$\Theta_{\mathrm{opt}} = \arg\max_\Theta \log p(y|\Theta, C_s) = \arg\max_\Theta \log \int p(y|q, \Theta; C_s)\, p(q, \Theta; C_s)\, dq. \qquad (6)$$

This marginal loglikelihood can be shown to be a concave function of Θ in this model (Paninski 2005). However, the integral over all common noise time series q is of very high dimension and is difficult to compute directly. Therefore, we proceeded by using the Laplace approximation (Kass and Raftery 1995; Koyama and Paninski 2010; Paninski et al. 2010):

$$\log \int p(y|q, \Theta; C_s)\, p(q, \Theta; C_s)\, dq \approx \log p(\hat{q}) + \log p(y|\hat{q}, \Theta; C_s) - \frac{1}{2} \log \det\left(\frac{\partial^2 p(y|q, \Theta; C_s)}{\partial q^2}\right), \qquad (7)$$

with

$$\hat{q} = \arg\max_q \left[\log p(q) + \log p(y|q, \Theta; C_s)\right]. \qquad (8)$$

While this approximation might look complicated at first sight, in fact it is quite straightforward: we have approximated the integrand p(y|q, Θ; C_s)p(q, Θ; C_s) with a Gaussian function, whose integral we can compute exactly, resulting in the three terms on the right-hand side of Eq. (7). The key is that this replaces the intractable integral with a much more tractable optimization problem: we need only compute q̂, which corresponds to the maximum a posteriori (MAP) estimate of the common noise input to the cell on a trial-by-trial basis.³ The Laplace approximation is accurate when the likelihood function is close to Gaussian or highly concentrated around the MAP estimate; in particular, Pillow et al. (2011) and Ahmadian et al. (2011) found that this approximation was valid in this setting.

¹It is also useful to recall the relationship between the spatial covariance C_s and the mixing matrix M here: as noted above, since the common noise terms q_t have unit variance and are independent from the stimulus and from one another, the spatial covariance C_s is given by M^T M. However, since C_s = (UM)^T(UM) for any unitary matrix U, we cannot estimate M directly; we can only obtain an estimate of C_s. Therefore, we can proceed with the estimation of the spatial covariance C_s and use any convenient decomposition of this matrix for our calculations below. (We emphasize the non-uniqueness of M because it is tempting to interpret M as a kind of effective connectivity matrix, and this over-interpretation should be avoided.)

²Estimating the spatio-temporal filters k_i simultaneously while marginalizing over the common noise q is possible but computationally challenging. Therefore we held the shape of k_i fixed in this second stage but optimized its Euclidean length ||k_i||_2. The resulting model explained the data well, as discussed in Section 3 below.

It should also be noted that the spatial covariance matrix, C_s, is treated as a fixed parameter and is not optimized in this second stage. The model parameters and the common noise were jointly optimized using a Newton–Raphson method on the log-likelihood, Eq. (7). Taking advantage of the fact that the second derivative matrix of the log-posterior in the AR model has a banded diagonal structure, we were able to fit the model in time linearly proportional to the length of the experiment (Koyama and Paninski 2010; Paninski et al. 2010). For more details see Appendix A. In our experience, the last term in Eq. (7) does not significantly influence the optimization, and was therefore neglected, for computational convenience.
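The banded Newton step can be illustrated on a toy version of the MAP problem in Eq. (8): a single cell with Poisson bins and an AR(1) common noise, so the log-posterior Hessian is tridiagonal and each iteration costs O(T). Everything here (the AR(1) order, the Poisson observation model, the baseline b) is a simplifying assumption for illustration, not the paper's full implementation:

```python
import numpy as np
from scipy.linalg import solveh_banded

def map_common_noise(y, b, phi, sigma2, dt=0.0008, n_iter=25):
    """MAP estimate of an AR(1) noise path q given one cell's spike counts y,
    under the toy observation model log lam_t = b + q_t with Poisson bins."""
    T = len(y)
    # Tridiagonal precision of the stationary AR(1) prior:
    d = np.full(T, (1 + phi ** 2) / sigma2)
    d[0] = d[-1] = 1 / sigma2
    off = np.full(T - 1, -phi / sigma2)
    q = np.zeros(T)
    for _ in range(n_iter):
        lam = np.exp(b + q) * dt
        # gradient of the log-posterior: y - lam - J q (J = prior precision)
        Jq = d * q + np.r_[0.0, off * q[:-1]] + np.r_[off * q[1:], 0.0]
        grad = y - lam - Jq
        # Hessian of the negative log-posterior is tridiagonal: diag(lam) + J;
        # store it in banded (upper) form and solve in O(T) per Newton step.
        ab = np.zeros((2, T))
        ab[0, 1:] = off
        ab[1] = d + lam
        step = solveh_banded(ab, grad)
        q = q + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return q
```

Because the log-posterior is concave and its Hessian stays banded, the solver converges in a handful of O(T) steps, which is the scaling property exploited in the text.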

Over-fitting is a potential concern for all parametricfitting problems. Here one might worry, for example,that our common noise is simply inferred to be instan-taneously high whenever we observe a spike. However,this does not happen, since we imposed that the com-mon noise is an AR process with a nonvanishing timescale, and this time correlation, in essence, penalizesinstantaneous changes in the common noise. Further-more, in all the results presented below, we generated

³We make one further simplifying approximation in the case of the population model with no direct coupling terms. Here the dimensionality of the common noise terms is extremely large: dim(q) = ncells × T, where T is the number of timebins in the experimental data. As discussed in Appendix A, direct optimization over q can be performed in O(n³cells T) time, i.e., computational time scaling linearly with T but unfortunately cubically in ncells (see Paninski et al. 2010 for further discussion). This cubic scaling becomes prohibitive for large populations. However, if we make the simplifying approximation that the common noise injected into each RGC is nearly conditionally independent given the observed spikes y when computing the marginal likelihood p(y|Θ, Cs), then the optimizations over the ncells independent noise terms q_i can be performed independently, with the total computation time scaling linearly in both ncells and T. We found this approximation to perform reasonably in practice (see Section 3 below), largely because the pairwise correlation in the common-noise terms was always significantly less than one (see Fig. 4 below). No such approximation was necessary in the pairwise case, where the computation always scales as O(T).

the predicted spike trains using a new realization of the AR process, not with the MAP estimate of the common noise obtained here, on a cross-validation set. Therefore, any possible over-fitting should only decrease our prediction accuracy.

In the third stage of the fitting procedure, we re-estimated the spatial covariance matrix, Cs, of the common noise influencing the cells. We use the method of moments to obtain Cs given the parameters obtained in the second step. In the method of moments, we approximated each neuron as a point process with a conditional intensity function, λ_t = exp(ΘX_t + q_t), where we have concatenated all the covariates, the stimulus and the past spiking activity of the cell and all other cells, into X. This allowed us to write analytic expressions for the different expected values of the model as a function of Cs given X_t. Then, we equated the analytic expressions for the expected values with the observed empirical expected values and solved for Cs. See details in Appendix B. The method of moments has the advantage that it permits an alternating optimization (iterating between the second and third steps described here). However, in practice we found that the two methods for estimating Cs (the PSTH approach and the method of moments) gave similar results.
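The flavor of the moment-matching step can be conveyed with a toy version in which bins are conditionally Poisson given a per-bin Gaussian common noise; matching first and second moments then gives Cs in closed form. This simplified estimator is our own construction; it ignores the temporal filtering and spike-history terms of the real model:

```python
import numpy as np

def moment_match_cov(y):
    """Method-of-moments estimate of the common-noise spatial covariance
    Cs from binned spike counts y (T x ncells). Toy setting: counts are
    conditionally Poisson with rate exp(mu_i + q_it) per bin, with
    q_t ~ N(0, Cs) drawn independently per bin. Then
      E[y_i y_j] / (E[y_i] E[y_j]) = exp(Cs_ij)   for i != j,
    and the diagonal follows after removing the Poisson self-term."""
    m1 = y.mean(axis=0)                                # E[y_i]
    m2 = (y[:, :, None] * y[:, None, :]).mean(axis=0)  # E[y_i y_j]
    Cs = np.log(m2 / np.outer(m1, m1))
    # Diagonal: E[y_i^2] - E[y_i] = E[(lambda_i dt)^2], so
    # (E[y_i^2] - E[y_i]) / E[y_i]^2 = exp(Cs_ii)
    diag2 = (m2.diagonal() - m1) / m1**2
    np.fill_diagonal(Cs, np.log(diag2))
    return Cs
```

Because the closed-form expressions only involve first and second empirical moments, the estimate can be recomputed cheaply inside the alternating optimization mentioned above.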

2.4 Decoding

Once we found the model parameters, we solved the inverse problem and estimated the stimulus given the spike train and the model parameters. We will consider the decoding of the filtered stimulus input into the cells, u_i = k_i · x. For simplicity, we performed the decoding only on pairs of neurons, using the pairwise model (Fig. 1(B)) fit of Section 2.2 on real spike trains that were left out during the fitting procedure. We adopted a Bayesian approach to decoding, where the stimulus estimate, u(y), is based on the posterior stimulus distribution conditioned on the observed spike trains y_i. More specifically, we used the maximum a posteriori (MAP) stimulus as the Bayesian estimate. The MAP estimate is approximately equal to the posterior mean, E[u|y, Θ, Cs], in this setting (Pillow et al. 2011; Ahmadian et al. 2011). According to Bayes' rule, the posterior p(u|y, Θ, Cs) is proportional to the product of the prior stimulus distribution p(u), which describes the statistics of the stimulus ensemble used in the experiment, and the likelihood p(y|u, Θ, Cs) given by the encoding model. The MAP estimate is thus given by

u(y) = argmax_u p(u|y, Θ, Cs)
     = argmax_u [log p(u|Θ) + log p(y|u, Θ, Cs)].   (9)

J Comput Neurosci

The prior p(u|Θ) depends on the model parameters through the dependence of u_i on the stimulus filters k_i. The marginal likelihood p(y|u, Θ, Cs) was obtained by integrating out the common noise terms as in Eq. (6), and as in Eq. (7), we used the Laplace approximation:

log p(y|u, Θ, Cs) = max_q [log p(q) + log p(y|u, q, Θ, Cs)]   (10)

where, again, we retained just the first two terms of the Laplace approximation (dropping the log-determinant term, as in Eq. (7)). With this approximation the posterior estimate is given by

u(y) = argmax_{u,q} [log p(u|Θ) + log p(q) + log p(y|u, q, Θ, Cs)].   (11)

By exploiting the bandedness properties of the priors and the model likelihood, we performed the optimization in Eq. (11) in computational time scaling only linearly with the duration of the decoded spike trains; for more details see Appendix A.

3 Results

We began by examining the structure of the estimated GLM parameters. Qualitatively, the resulting filters were similar for the pairwise model, the common-noise model, and the model developed by Pillow et al. (2008). The stimulus filters closely resembled a time-varying difference-of-Gaussians. The post-spike filters produced a brief refractory period and gradual recovery with a slight overshoot (data not shown; see Fig. 1 in Pillow et al. 2008).

We will first present our results based on the pairwise model. Then we will present the results of the analysis based on the common-noise model with no direct coupling; in particular, we will address this model's ability to capture the statistical structure that

Fig. 2 Relative contribution of self post-spike, stimulus inputs, cross-coupling inputs, and common noise inputs to a pair of ON cells. In panels (A) through (D) blue indicates cell 1 and green cell 2. Panel (A): The observed spike trains of these cells during this second of observation. Panel (B): Estimated refractory input from the cell, h_i · y^i_t. In panels (B) through (D) the traces are obtained by convolving the estimated filters with the observed spike trains. Panel (C): The stimulus input, k_i · x_t. Panel (D): Estimated cross-coupling input to the cells, L_{i,j} · y^j_t. Panel (E): MAP estimate of the common noise, q_t (black), with one standard deviation band (gray). Red trace indicates a sample from the posterior distribution of the common noise given the observed data. Note that the cross-coupling input to the cells is much smaller than the other three inputs to the cell. Panel (F): The cross-correlations of the two spike trains (black: true spike trains; red: simulated spike trains). Panel (G): The three-point correlation function of the common noise and the spike trains, C(τ1, τ2) = [⟨q(t)y1(t + τ1)y2(t + τ2)⟩ − ⟨q(t)⟩⟨y1(t)⟩⟨y2(t)⟩]/(⟨y1(t)⟩⟨y2(t)⟩dt). Note that when the two cells fire in synchrony the common noise tends to be high (as can be seen from the warm colors at the center of the figure)


is present in the full population data, and to predict the responses to novel stimuli. Finally, we will turn back to the pairwise model and analyze its decoding performance.

3.1 Pairwise model

The pairwise model (Fig. 1(B)) allowed for the trial-by-trial estimation of the different inputs to the cell, including the common noise, q_t. To examine the relative magnitude of the inputs provided by stimulus and coupling-related model components, we show in Fig. 2 the net linear input to an example pair of ON cells on a single trial. The top panel (panel A) shows the spike trains of two ON cells over 1 s. Below it are the different linear inputs. The post-spike filter input, h_i · y^i_t (panel B), imposes a relative refractory period after each spike via its fast negative input to the cell. The stimulus filter input, k_i · x_t (panel C), is the stimulus drive to the cell. In panel D we show the cross-coupling input to the cells, L_{i,j} · y^j_t. The MAP estimate of the common noise, m_i · q_t (panel E, black line), is positive when synchronous spikes occur during negative stimulus periods or when the stimulus input is not strong enough to explain the observed spike times. This can be seen quantitatively in panel G, where we show the three-point correlation function of the common noise and the two spike trains. In this case, m_i = m_j = 1 for simplicity, so m_i · q_t = q_t. The red line in panel E is an example of one possible realization of the common noise which is consistent with the observed spiking data; more precisely, it is a sample from the Gaussian distribution with mean given by the MAP estimate, q_t (black line), and covariance given by the estimated posterior covariance of q_t (gray band) (Paninski et al. 2010). Note that the cross-coupling input, L_{i,j} · y^j_t, is much smaller than the stimulus, self-history, and common noise inputs.

The relative magnitude of the different linear inputs is quantified in Fig. 3(A), where we show the root mean square (RMS) inputs to the ensemble of cells from q_t and L_{i,j} · y^j_t as a function of the cells' distance from each other. One can see that the common noise is significantly larger than the cross-coupling input in the majority of the cell pairs. It is important to note that we are plotting the input to the cells and not the magnitude of the filters; since the OFF population has a higher mean firing rate, the net cross-coupling input is sometimes larger than for the ON population. However, the magnitude of the cross-coupling filters is in agreement with Trong and Rieke (2008); we found that the cross-coupling filters between neighboring ON cells are stronger, on average, than the cross-coupling filters between neighboring OFF cells by about 25%

Fig. 3 Population summary of the inferred strength of the inputs to the model. (A) Summary plot of the relative contribution of the common noise (circles) and cross-coupling inputs (crosses) to the cells as a function of the cells' distance from each other in the pairwise model for the complete network of 27 cells. Note that the common noise is stronger than the cross-coupling input in the large majority of the cell pairs. Since the OFF cells have a higher spike rate on average, the OFF–OFF cross-coupling inputs have a larger contribution on average than the ON–ON cross-coupling inputs. However, the filter magnitudes agree with the results of Trong and Rieke (2008); the cross-coupling filters between ON cells are stronger, on average, than the cross-coupling filters between OFF cells (not shown). The gap under 100 μm is due to the fact that RGCs of the same type have minimal overlap (Gauthier et al. 2009). (B) Histograms showing the relative magnitude of the common noise and the stimulus plus post-spike filter induced inputs to each cell under the common-noise model. The X-axis is the ratio of the RMS of these inputs, √(var(q_t)/var(h_i · y^i_t + k_i · x_t)). Common inputs tend to be a bit more than half as strong as the stimulus and post-spike inputs combined


(data not shown). It is also clear that the estimated coupling strength falls off with distance, as also observed in Shlens et al. (2006). The gap under 100 μm in Fig. 3(A) reflects the minimum spacing between RGCs of the same type within a single mosaic.

Cells that are synchronized in this network have cross-correlations that peak at zero lag (as can be seen in Fig. 5 below and in Shlens et al. 2006 and Pillow et al. 2008). This is true even if we bin our spike trains with sub-millisecond precision. This is difficult to explain by direct coupling between the cells, since it would take some finite time for the influence of one cell to propagate to the other (though gap junction coupling can indeed act on a sub-millisecond time scale and can explain fast synchronization; Brivanlou et al. 1998). Pillow et al. (2008) used virtually instantaneous cross-coupling filters with sub-millisecond resolution to capture these synchronous effects. To avoid this non-physiological instantaneous coupling, as discussed in the methods we imposed a strict delay on our cross-coupling filters L_{i,j}, to force the model to distinguish between the common noise and the direct cross-coupling. Therefore, by construction, the cross-coupling filters cannot account for very short-lag synchrony (under 4 ms) in this model; the mixing matrix M handles these effects, while the cross-coupling filters and the receptive field overlap in the pairwise model account for synchrony on longer time scales. Indeed, the model assigns a small value to the cross-coupling and, as we will show below, we can in fact discard the cross-coupling filters while retaining an accurate model of this network's spiking responses.

In the pairwise model, since every two cells share one common noise term, we may directly maximize the marginal log-likelihood of observing the spike trains given M (which is just a scalar in this case), by evaluating the log-likelihood on a grid of different values of M. For each pair of cells, we checked that the value of M obtained by this maximum-marginal-likelihood procedure qualitatively matched the value obtained by the method-of-moments procedure (data not shown). We also repeated our analysis for varying lengths of the imposed delay on the direct interneuronal cross-coupling filters. We found that the length of this delay did not appreciably change the obtained value of M, nor the strength of the cross-coupling filters, across the range of delays examined (0.8–4 ms).
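The grid-based maximum-marginal-likelihood idea can be illustrated with a toy linear-Gaussian model in which the marginal likelihood of a scalar mixing weight M is available exactly (in the paper the marginal likelihood is instead computed via the Laplace approximation); everything here is an illustrative stand-in:

```python
import numpy as np

def marginal_loglik(y, M, sigma2=0.25):
    """Exact marginal log-likelihood for a toy model y_t = M*q_t + eps_t
    with q_t ~ N(0, 1) and eps_t ~ N(0, sigma2): after integrating out
    the latent q, y_t ~ N(0, M^2 + sigma2). Illustrative only."""
    v = M**2 + sigma2
    return -0.5 * np.sum(y**2) / v - 0.5 * len(y) * np.log(2 * np.pi * v)

def fit_M_by_grid(y, grid):
    """Maximize the marginal log-likelihood over a 1-D grid of M values,
    mirroring the grid search over the scalar mixing weight."""
    lls = [marginal_loglik(y, M) for M in grid]
    return grid[int(np.argmax(lls))]
```

Because M is a single scalar, an exhaustive grid evaluation is cheap and avoids any concerns about local optima in this one-dimensional search.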

3.2 Common-noise model with no direct coupling

We examined the effects of setting the cross-coupling filters to zero in the common-noise model in response to two considerations. The first consideration is the fact that the cross-coupling inputs to the cells were much smaller than all other inputs once we introduced the common noise effects and imposed a delay on the coupling terms. The second consideration is that Trong and Rieke (2008) reported that the ON RGCs are weakly coupled, and OFF cells are not directly coupled. This reduced the number of parameters in the model drastically, since we no longer have to fit ncells × (ncells − 1) cross-coupling filters, as in the fully general model. When we estimated the model parameters with the direct coupling effects removed, we found that the inputs to the cell are qualitatively the same as in the pairwise model. For this new common-noise model, across the population, the standard deviation of the total network-induced input is approximately 1/2 the standard deviation of the total input to the cells (Fig. 3(B)), in agreement with the results of Pillow et al. (2008). We also found that the inferred common noise strength shared by any pair of cells depended strongly on the degree of overlap between the receptive fields of the cells (Fig. 4), consistent with the distance-dependent coupling observed in Fig. 3(A).

In order to test the quality of the estimated model parameters, and to test whether the common noise model can account for the observed synchrony, we presented the model with a new stimulus and examined the resulting spike trains. As discussed in the methods, it is important to note that the common noise samples we used to generate spike trains are not the MAP estimate of the common noise obtained in the fitting, but rather new random realizations. This was done for two reasons. First, we used a cross-validation stimulus that was never presented to the model while fitting the parameters, and for which we have no estimate of the common noise. Second, we avoid over-fitting by not using the estimated common noise. Since we have a probabilistic model, and we are injecting noise into the cells, the generated spike trains should not be exact reproductions of the collected data; rather, the statistical structure between the cells should be preserved. Below, we show the two-point (Fig. 5) and three-point correlation (Fig. 6) functions of a set of randomly selected cells under the common-noise model. Both in the cross-correlation functions and in the three-point correlation functions, one can see that the correlations of the data are very well approximated by the model.⁴

⁴In many of the three-point correlation functions one can notice persistent diagonal structures. If we consider the three-point correlation of neuron 1 with the time-shifted firing of neurons 2 and 3, the diagonal structure is the sign of a correlation between the time-shifted neurons (2 and 3).
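The spike-train generation procedure described above, which draws a fresh AR(1) common-noise realization rather than reusing the MAP estimate, might be sketched as follows (all names, the Bernoulli-style binning, and parameter values are illustrative assumptions, not the paper's simulation code):

```python
import numpy as np

def simulate_population(stim_drive, Cs, rho, hist_filter, dt=0.001, seed=0):
    """Generate population spike trains with a *fresh* realization of the
    common noise. stim_drive: (T, n) filtered stimulus input k_i . x_t per
    cell; Cs: spatial covariance of the common noise; rho: AR(1)
    coefficient; hist_filter: post-spike filter (most recent bin first).
    Illustrative only."""
    rng = np.random.default_rng(seed)
    T, n = stim_drive.shape
    L = np.linalg.cholesky(Cs)
    H = len(hist_filter)
    innov = np.sqrt(1.0 - rho**2)   # keeps the marginal covariance ~ Cs
    q = np.zeros(n)
    y = np.zeros((T + H, n))        # first H rows are silent padding
    for t in range(T):
        q = rho * q + innov * (L @ rng.standard_normal(n))
        # hist_filter[0] weights the most recent bin
        hist = hist_filter @ y[t:t + H][::-1]
        lam = np.exp(stim_drive[t] + hist + q)
        y[t + H] = (rng.poisson(lam * dt) > 0).astype(float)  # 0/1 bins
    return y[H:]
```

A fresh seed per repeat yields the new noise realizations used for the cross-validated correlation and PSTH comparisons; the shared q induces zero-lag synchrony without any direct coupling terms.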


Fig. 4 Comparing the inferred common noise strength (right) and the receptive field overlap (left) across all ON–ON and OFF–OFF pairs. Note that these two variables are strongly dependent; Spearman rank correlation coefficient = 0.75 (computed on all pairs with a positive overlap, excluding the diagonal elements of the displayed matrices). Thus the strength of the common noise between any two cells can be predicted accurately given the degree to which the cells have overlapping receptive fields. The receptive field overlap was computed as the correlation coefficient of the spatial receptive fields of the two cells; the common noise strength was computed as the correlation value derived from the estimated common noise spatial covariance matrix Cs (i.e., Cs(i, j)/√(Cs(i, i)Cs(j, j))). Both of these quantities take values between −1 and 1; all matrices are plotted on the same color scale

The results for the pairwise model are qualitatively the same and were previously presented in Vidne et al. (2009).

To further examine the accuracy of the estimated common noise terms q, we tested the model with the following simulation. Using the estimated model parameters and the true stimulus, we generated a new population spike train from the model, using a novel sample from the common-noise input q. Then we estimated q given this simulated population spike train and the stimulus, and examined the extent to which the estimated q reproduces the true simulated q. We found that the estimated q tracks the true q well when a sufficient number of spikes are observed, but shrinks to zero when no spikes are observed. This effect can be quantified directly by plotting the true vs. the inferred q_t values; we find that the conditional expectation of the inferred q_t given the true q_t is a shrunk and half-rectified version of the true q_t (Fig. 7(A)), and when quantified across the entire observed population, we find that the correlation coefficient between the true and inferred simulated common-noise inputs in this model is about 0.5 (Fig. 7(B)).

We also tested the model by examining the peristimulus time histogram (PSTH). We presented to the model 60 repetitions of a novel 10 s stimulus, and compared the PSTHs of the generated spikes to the PSTHs of the recorded activity of the cells (Fig. 8). The model does very well in capturing the PSTHs of the ON population and captures the PSTHs of the OFF population even more accurately. In each case (PSTHs, cross-correlations, and triplet correlations), the results qualitatively match those presented in Pillow et al. (2008). Finally, we examined the dependence of the inferred common noise on the stimulus; recall that we assumed when defining our model that the common noise is independent of the stimulus. We computed the conditional expectation E[q|x, y] as a function of x for each cell, where we use y to denote the cell's observed


Fig. 5 Comparing real versus predicted cross-correlations. Example of 48 randomly selected cross-correlation functions of retinal responses and simulated responses of the common-noise model without direct cross-coupling (Fig. 1(C)). Each group of three panels shows the cross-correlation between a randomly selected reference ON/OFF (red/blue) cell and its three nearest-neighbor cells (see schematic receptive field inset and enlarged example). In black is the data, in blue/red is the generated data using the common input model, and in green is the data generated with the common input turned off. The baseline firing rates are subtracted; also, recall that cells whose firing rate was too low or which displayed spike-sorting artifacts were excluded from this analysis. Each cross-correlation is plotted for delays between −96 and 96 ms. The Y-axis in each plot is rescaled to maximize visibility. The cross-correlation between cells of the same type is always positive while opposite-type cells are negatively cross-correlated. Therefore, a blue positive cross-correlation indicates an OFF–OFF pair, a red positive cross-correlation indicates an ON–ON pair, and a blue/red negative cross-correlation indicates an OFF–ON/ON–OFF pair. Note that the cross-correlation at zero lag is captured by the common noise

spike train, q for the common noise influencing the cell, and x denotes the sum of all the other inputs to the cell (i.e., the stimulus input k_i · x_t plus the refractory input h_i · y^i_t). This function E[q|x, y] was well approximated by a linear function with a shallow slope over the effective range of the observed x (data not shown). Since we can reabsorb any linear dependence in E[q|x, y] into the generalized linear model (via a suitable rescaling of x), we conclude that the simple stimulus-independent noise model is sufficient here.

The common noise model suggests that noise in the spike trains has two sources: the "back-end" noise due to the stochastic nature of the RGC spiking mechanism, and the "front-end" noise due to the common noise term, which represents the lump sum of filtered noise sources presynaptic to the RGC layer. In order to estimate the relative contribution of these two terms, we used the "law of total variance":

E_x[var(y|x)] = E_x[var_{q|x}(E[y|q, x])] + E_x[E_{q|x}[var(y|q, x)]].   (12)

Here q, x, and y are as in the preceding paragraph. The first term on the right-hand side represents the "front-end" noise: it quantifies the variance in the spike train that is due only to variance in q for different values of x. The second term on the right represents the "back-end" noise: it quantifies the average conditional variance in the spike train, given x and q (i.e., the variance due only to the spiking mechanism). We found that the front-end noise is approximately 60% as large as the back-end noise in the ON population, on average, and approximately 40% as large as the back-end noise in the OFF population.
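For a single cell at a fixed input x, this decomposition can be checked by Monte Carlo; the toy sketch below uses a conditionally Poisson bin with a lognormal rate, for which both terms are also available analytically (illustrative numbers, not the paper's cells):

```python
import numpy as np

def noise_decomposition(mu_x, sigma_q, dt=0.001, nsamp=200_000, seed=0):
    """Monte Carlo version of the law-of-total-variance split in Eq. (12)
    at a fixed input x, for a toy cell: y | q ~ Poisson(exp(mu_x + q)*dt)
    with q ~ N(0, sigma_q**2). Illustrative parameterization only.

    Returns (front_end, back_end):
      front_end = var_q(E[y | q, x])  -- variance of the conditional mean
      back_end  = E_q[var(y | q, x)] -- mean conditional (Poisson) variance
    """
    rng = np.random.default_rng(seed)
    q = rng.normal(0.0, sigma_q, nsamp)
    lam = np.exp(mu_x + q) * dt   # conditional mean spike count per bin
    front = lam.var()             # variance of the conditional mean
    back = lam.mean()             # Poisson variance equals the mean
    return front, back
```

With lognormal rates both terms have closed forms (front = (e^{mu_x} dt)² e^{σ²}(e^{σ²} − 1), back = e^{mu_x} dt e^{σ²/2}), which makes the decomposition easy to sanity-check.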

3.3 Decoding

The encoding of stimuli to spike trains is captured well by the GLSS model, as quantified in Figs. 5–8. We will now consider the inverse problem, the decoding of the stimuli given the spike trains and the model parameters. This decoding analysis is performed using real spike trains that were left out during the fitting procedure of the pairwise model (Fig. 1(B)).


Fig. 6 Comparing real and predicted triplet correlations. 89 example third-order (triplet) correlation functions between a reference cell (a in the schematic receptive field) and its two nearest neighbors (cells 3 and 5 in the schematic receptive field). Triplet correlations were computed in 4-ms bins according to C(τ1, τ2) = [⟨y1(t)y2(t + τ1)y3(t + τ2)⟩ − ⟨y1(t)⟩⟨y2(t)⟩⟨y3(t)⟩]/(⟨y2(t)⟩⟨y3(t)⟩dt). Color indicates the instantaneous spike rate (in 4-ms bins) as a function of the relative spike time in the two nearest-neighbor cells for time delays between −50 and 50 ms. The left figure in each pair is the model (Fig. 1(C)) and the right is data. The color map for each of the two plots in each pair is the same. For different pairs, the color map is renormalized so that the maximum value of the pair is red. Note the model reproduces both the long-time-scale correlations and the peaks at the center corresponding to short-time-scale correlations

Fig. 7 Comparing simulated and inferred common-noise effects given the full population spike train. (A) The conditional expected inferred common-noise input q, averaged over all cells in the population, ± 1 s.d., versus the simulated common-noise input q. Note that the estimated input can be approximated as a rectified and shrunk linear function of the true input. (B) Population summary of the correlation coefficient between the estimated common-noise input and the simulated common-noise input for the entire population


Fig. 8 Comparing real versus predicted PSTHs. Example rasters of responses and PSTHs of recorded data and model-generated spike trains to 60 repeats of a novel 1 sec stimulus. (A) OFF RGC (black) and model cell (blue). (B) ON RGC (black) and model ON cell (red). (C)–(D) Correlation coefficient between the model PSTHs and the recorded PSTHs for all OFF (C) and ON cells (D). The model achieves high accuracy in predicting the PSTHs

Figure 9 shows an example of decoding the filtered stimulus, u_i, for a pair of cells. Figure 10(A) shows a comparison of the mean-square decoding errors for MAP estimates based on the pairwise common-noise model, and those based on a GLM with no common noise but with post-spike filters directly coupling the

Fig. 9 Stimulus decoding given a pair of spike trains. In panels (B) through (D) blue indicates cell 1 and green cell 2. (A) MAP estimate of the common noise going into the two cells (black) with one standard deviation of the estimate (gray). (B) and (C): The true stimulus filtered by the estimated spatio-temporal filter, k_i · x_t (black), and the MAP estimate of the filtered stimulus (green/blue) with one standard deviation (gray). Decoding was performed following the method described in Section 2.4. (D) The spike trains of the two cells used to perform the decoding


Fig. 10 Decoding jitter analysis. (A) Without any spike jitter, the two models decode the stimulus with nearly identical accuracy. Each point represents the decoding performance of the two models for a pair of cells. In red: ON–ON pairs. In blue: OFF–OFF pairs. (B) Jitter analysis of the common noise model versus the direct coupling model from Pillow et al. (2008). The X-axis denotes the maximum of the cross-correlation function between the neuron pair. Baseline is subtracted so that units are in spikes/s above (or below) the cells' mean rate. The Y-axis is the percent change, (S_common-noise − S_direct-coupling)/S_common-noise × 100, in the model's sensitivity to jitter (S = ‖u_jitter − u‖/|Δt|) compared to the sensitivity of the direct coupling model: negative values mean that the common noise model is less sensitive to jitter than a model with direct cross-coupling. Note that the common-noise model leads to more robust decoding in every cell pair examined, and pairs that are strongly synchronized are much less sensitive to spike-train jitter in the common noise model than in the model with coupling filters

two cells, as in Pillow et al. (2008). There is very little difference between the performance of the two models based on the decoding error.

However, the decoder based on the common noise model turned out to be significantly more robust than that based on the direct, instantaneous coupling model. We studied robustness by quantifying how jitter in the spike trains changes the decoded stimulus, u(y_jitter), in the two models, where y_jitter is the jittered spike train (Ahmadian et al. 2009). Specifically, we analyzed a measure of robustness which quantifies the sensitivity to precise spike timing. To calculate a spike's jitter sensitivity we compute the estimate u(y) for the original spike train and for a spike train in which the timing of that spike is jittered by a Gaussian random variable with standard deviation Δt. We defined the sensitivity to be the root mean square distance between the two stimulus estimates divided by |Δt|, S = ‖u_jitter − u‖/|Δt|, where u_jitter is the decoded stimulus of the jittered spike train, and S is computed in the limit of small jitter, Δt → 0. The average value of these spike sensitivities quantifies how sensitively the decoder depends on small variations in the spike trains. Conversely, the smaller these quantities are on average, the more robust is the decoder.
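A finite-difference version of this sensitivity measure is straightforward to sketch; here decode stands for any decoder mapping spike times to a stimulus estimate u, and a deterministic shift of size dt replaces the Gaussian jitter of the paper (illustrative only):

```python
import numpy as np

def jitter_sensitivity(decode, spike_times, idx, dt):
    """Finite-difference estimate of the spike-timing sensitivity
    S = ||u(y_jitter) - u(y)|| / |dt|. `decode` maps a vector of spike
    times to a stimulus estimate u; `idx` selects the spike to perturb.
    The paper jitters by a Gaussian of s.d. dt and takes dt -> 0; here
    we use a single deterministic shift of size dt. Illustrative only."""
    u0 = np.asarray(decode(spike_times))
    jittered = np.array(spike_times, dtype=float)
    jittered[idx] += dt
    u1 = np.asarray(decode(jittered))
    # Root mean square distance between estimates, per unit of jitter
    return np.sqrt(np.mean((u1 - u0) ** 2)) / abs(dt)
```

Averaging this quantity over spikes, cells, and small dt values gives the per-pair sensitivities compared across the two decoders in Fig. 10(B).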

Our intuition was that the direct coupling model has very precise spike-induced interactions, via the coupling filters L_{i,j}, but these interactions should be fragile, in the sense that if a spike is perturbed, it will make less sense to the decoder, because it is not at the right time relative to the spikes of other cells. In the common-noise model, on the other hand, the temporal constraints on the spike times should be less precise, since the common noise term acts as an unobserved noise source which serves to jitter the spikes relative to each other; therefore, adding small amounts of additional spike-time jitter should have less of an impact on the decoding performance. We found that this was indeed the case: the spike sensitivities for ON–ON (OFF–OFF) pairs decreased by about 10% (5%) when using the decoder based on the common noise model instead of the instantaneous coupling model with no common noise. Furthermore, the percentage decrease in mean spike sensitivity was directly proportional to the strength of the common noise input between the two cells (Fig. 10(B)).

4 Discussion

The central result of this study is that multi-neuron firing patterns in large networks of primate parasol retinal ganglion cells can be explained accurately by a common-noise model with no direct coupling interactions (Figs. 1(C), 5–8), consistent with the recent intracellular experimental results of Trong and Rieke (2008). The common noise terms in the model can


be estimated on a trial-by-trial basis (Figs. 2 and 7), and the scale of the common noise shared by any pair of cells depends strongly on the degree of overlap in the cells' receptive fields (Fig. 4). By comparing the magnitudes of noise- versus stimulus-driven effects in the model (Fig. 3), we can quantify the relative contributions of this common noise source, versus spike-train output variability, to the reliability of RGC responses. Finally, optimal Bayesian decoding methods based on the common-noise model perform just as well as (and in fact are more robust than) models that account for the correlations in the network by direct coupling (Fig. 10).

4.1 Modeling correlated firing in large networks

In recent years, many researchers have grappled with the problem of inferring the connectivity of a network from spike-train activity (Chornoboy et al. 1988; Utikal 1997; Martignon et al. 2000; Iyengar 2001; Paninski et al. 2004; Truccolo et al. 2005; Okatan et al. 2005; Nykamp 2005, 2008, 2009; Kulkarni and Paninski 2007; Stevenson et al. 2008, 2009). Modeling of correlated firing in the retina has been a special focus (Nirenberg et al. 2002; Schnitzer and Meister 2003; Schneidman et al. 2006; Shlens et al. 2006; Pillow et al. 2008; Shlens et al. 2009; Cocco et al. 2009). The task of disambiguating directed connectivity from common input effects involves many challenges, among them the fact that the number of model parameters increases with the size of the network, and therefore more data are required in order to estimate the model parameters. The computational complexity of the task also increases rapidly. In the Bayesian framework, integrating over higher- and higher-dimensional distributions of unobserved inputs becomes difficult. Here, we were aided by prior knowledge from previous physiological studies, in particular that the common noise in this network is spatially localized (Shlens et al. 2006; Pillow et al. 2008) and fast (with a timescale that was explicitly measured by Trong and Rieke 2008), to render the problem more manageable. The model is parallelizable, which made the computation much more tractable. Also, we took advantage of the fact that the common noise may be well approximated as an AR process with a banded structure to perform the necessary computations in time that scales linearly with T, the length of the experiment. Finally, we exploited the separation of time scales between the common-noise effects and those due to direct connectivity by imposing a strict time delay on the cross-coupling filters; this allowed us to distinguish between the effects of these terms in estimating the model structure (Figs. 2 and 3).

Our work relied heavily on the inference of the model's latent variables, the common noise inputs, and their correlation structure. If we fit our model parameters while ignoring the effect of the common noise, then the inferred coupling filters incorrectly attempt to capture the effects of the unmodeled common noise. A growing body of related work on the inference of latent variables given spike train data has emerged in the last few years, building on work both in neuroscience (Smith and Brown 2003) and statistics (Fahrmeir and Kaufmann 1991; Fahrmeir and Tutz 1994). For example, Yu et al. (2006, 2009) explored a dynamical latent variable model to explain the activity of a large number of observed neurons on a single-trial basis during motor behavior. Lawhern et al. (2010) proposed to include a multi-dimensional latent state in a GLM framework to account for external and internal unobserved states, also in a motor decoding context. We also inferred the effects of the common noise on a single-trial basis, but our method does not attempt to explain many observed spike trains on the basis of a small number of latent variables (i.e., our focus here is not on dimensionality reduction); instead, the model proposed here maintains the same number of common noise terms as the number of observed neurons, since these terms are meant to account for noise in the presynaptic network, and every RGC could receive a different and independent combination of these inputs.

Two important extensions of these methods should be pursued in the future. First, in our method the time scale of the common noise is static and preset. It will be important to relax this requirement, to allow dynamic time scales and to attempt to infer the correct time scales directly from the spiking data, perhaps via an extension of the moment-matching or maximum likelihood methods presented here. Another important extension would be to include common noise whose scale depends on the stimulus; integrating over such stimulus-dependent common noise inputs would entail significant statistical and computational challenges that we hope to pursue in future work (Pillow and Latham 2007; Mishchenko et al. 2011).

4.2 Possible biological interpretations of the common noise

One possible source of common noise is synaptic variability in the bipolar and photoreceptor layers; this noise would be transmitted and shared among nearby RGCs with overlapping dendritic footprints. Our results are consistent with this hypothesis (Fig. 4), and experiments are currently in progress to test this idea

J Comput Neurosci

more directly (Rieke and Chichilnisky, personal communication). It is tempting to interpret the mixing matrix M in our model as an effective connectivity matrix between the observed RGC layer and the unobserved presynaptic noise sources in the bipolar and photoreceptor layers. However, it is important to remember that M is highly underdetermined here: as emphasized in the Methods section, any unitary rotation of the mixing matrix will lead to the same inferred covariance matrix Cλ. Thus an important goal of future work will be to incorporate further biological constraints (e.g., on the sparseness of the cross-coupling filters, locality of the RGC connectivity, and average number of photoreceptors per RGC), in order to determine the mixing matrix M uniquely. One possible source of such constraints was recently described by Field et al. (2010), who introduced methods for resolving the effective connectivity between the photoreceptor and RGC layers. Furthermore, parasol ganglion cells are only one of many retinal ganglion cell types; many researchers have found the other cell types to be synchronized as well (Arnett 1978; Meister et al. 1995; Brivanlou et al. 1998; Usrey and Reid 1999; Greschner et al. 2011). An important direction for future work is to model these circuits together, to better understand the joint sources of their synchrony.

4.3 Functional roles of synchrony and common noise

Earlier work (Pillow et al. 2008) showed that proper consideration of synchrony in the RGC population significantly improves decoding accuracy. Here, we showed that explaining the synchrony as an outcome of common noise driving the RGCs, as opposed to a model where the synchrony is the outcome of direct post-spike coupling, leads to less sensitivity to temporal variability in the RGC output spikes. One major question remains open: are there any possible functional advantages of this (rather large) common noise source in the retina? Cafaro and Rieke (2010) recently reported that common noise increases the accuracy with which some retinal ganglion cell types encode light stimuli. An interesting analogy to image processing might provide another clue: "dithering" refers to the process of adding noise to an image before it is quantized, in order to randomize quantization error, helping to prevent visual artifacts which are coherent over many pixels (and therefore more perceptually obvious). It might be interesting to consider the common noise in the retina as an analog of the dithering process, in which nearby photoreceptors are dithered by the common noise before the analog-to-digital conversion implemented at the

ganglion cell layer (Masmoudi et al. 2010). Of course, we emphasize that this dithering analogy represents just one possible hypothesis, and further work is necessary to better understand the computational role of common noise in this network.

Acknowledgements We would like to thank F. Rieke for helpful suggestions and insight; K. Masmoudi for bringing the analogy to the dithering process to our attention; O. Barak, X. Pitkow and M. Greschner for comments on the manuscript; G. D. Field, J. L. Gauthier, and A. Sher for experimental assistance. A preliminary version of this work was presented in Vidne et al. (2009). In addition, an early version of Fig. 2 appeared previously in the review paper (Paninski et al. 2010), and Fig. 11(D) is a reproduction of a schematic figure from Shlens et al. (2006). This work was supported by: the Gatsby Foundation (M.V.); a Robert Leet and Clara Guthrie Patterson Trust Postdoctoral Fellowship (Y.A.); an NSF Integrative Graduate Education and Research Traineeship Training Grant DGE-0333451 and a Miller Institute Fellowship for Basic Research in Science, UC Berkeley (J.S.); US National Science Foundation grant PHY-0417175 (A.M.L.); NIH Grant EY017736 (E.J.C.); HHMI (E.P.S.); NEI grant EY018003 (E.J.C., L.P. and E.P.S.); and a McKnight Scholar award (L.P.). We also gratefully acknowledge the use of the Hotfoot shared cluster computer at Columbia University.

Appendix A: O(T) optimization for parameter estimation and stimulus decoding for models with common noise effects

The common-noise model was formulated as conditionally independent cells, with no cross-coupling filters, interacting through the covariance structure of the common noise inputs. Therefore, the model lends itself naturally to parallelization of the computationally expensive stage, the maximum likelihood estimation of the model parameters. For each cell, we need to estimate the model parameters and the common noise the cell receives. The common noise to the cell on a given trial is a time series of the same length as the experiment itself (T discrete bins), and finding the maximum a posteriori (MAP) estimate of the common noise is a computationally intensive task because of its high dimensionality, but it can be performed independently on each cell. We used the Newton-Raphson method for optimization over the joint vector, \nu = [\theta, q], of the model cell's parameters and the common-noise estimate. Each iteration of the Newton-Raphson method requires us to find the new stepping direction, \delta, by solving the linear system H\delta = \nabla, where \nabla denotes the gradient and H the Hessian matrix (the matrix of second derivatives) of the objective function F. We need to solve this matrix equation for \delta at every step of the optimization. In general, this requires O(T^3)


[Figure 11: four panels; axis labels include neuron ID, lag in ms, and relative power of the singular values.]

Fig. 11 PSTH-based method. (A) Approximate spatial covariance matrix of the inferred common noise, composed of the first two spatial singular vectors, C_{i,j} = \sum_{k=1}^{2} \sigma_k U_k (the vectors are reshaped into matrix form). Note the four distinct regions corresponding to the ON-ON, OFF-OFF, ON-OFF, and OFF-ON pairs. (B) First two temporal singular vectors, corresponding to the temporal correlations of the 'same-type' (ON-ON and OFF-OFF) pairs and the 'different-type' (ON-OFF) pairs; note the asymmetry of the ON-OFF temporal correlations. (C) Relative power of all the singular values. Note that the first two singular values are well separated from the rest, indicating that the correlation structure is well approximated by the first two singular vectors. (D) Receptive field centers and the numbering scheme. Top, in red: ON cells; bottom, in blue: OFF cells. The numbering scheme starts at the top left corner and proceeds column-wise to the right; the ON cells are numbered 1 through 104 and the OFF cells 105 through 277. As a result, cells that are physically close are usually closely numbered.

operations, which renders naive approaches inapplicable for long experiments such as the one discussed here. We therefore used an O(T) method for computing the MAP estimate, developed in Koyama and Paninski (2010).

Because of the autoregressive structure of the common noise, q, the log posterior density of q_t can be written as:

F = \log p(\{q_t\} \mid \{y_t\}) = \log p(q_0) + \sum_{t=1}^{T} \log p(q_t \mid q_{t-1}) + \sum_{t=1}^{T} \log p(y_t \mid q_t) + \mathrm{const},   (13)

where we denote the spike train as \{y_t\}, and for simplicity we have suppressed the dependence on all other model parameters and taken q to be a first-order autoregressive process. Since all terms are concave in q, the entire expression is concave in q.

The Hessian, H, has the form:

H = \begin{bmatrix} \frac{\partial^2 F}{\partial q^2} & \frac{\partial^2 F}{\partial q \partial \theta} \\ \frac{\partial^2 F}{\partial \theta \partial q} & \frac{\partial^2 F}{\partial \theta^2} \end{bmatrix};   (14)

note that the dimension of J = \partial^2 F / \partial q^2 is T \times T (in our case we have a 9.6-min recording and we use 0.8-ms bins, i.e., T = 720{,}000), while the dimension of H_{\theta\theta} = \partial^2 F / \partial \theta^2 is just N \times N, where N is the number of parameters in the model (N < 20 for all the models considered here). Using the Schur complement of H we get:

H^{-1} = \begin{bmatrix} J & H_{q\theta} \\ H_{\theta q} & H_{\theta\theta} \end{bmatrix}^{-1} = \begin{bmatrix} J^{-1} + J^{-1} H_{q\theta} S^{-1} H_{\theta q} J^{-1} & -J^{-1} H_{q\theta} S^{-1} \\ -S^{-1} H_{\theta q} J^{-1} & S^{-1} \end{bmatrix},   (15)

where S = H_{\theta\theta} - H_{\theta q} J^{-1} H_{q\theta} denotes the Schur complement of the block J in H. The dimension of \partial^2 F / \partial \theta^2 is small (N, the number of model parameters), and because of the autoregressive structure of the common noise, J = \partial^2 F / \partial q^2 is a tridiagonal matrix:

J = \begin{bmatrix} D_1 & B_{1,2} & 0 & \cdots & 0 & 0 \\ B_{1,2}^T & D_2 & B_{2,3} & \cdots & 0 & 0 \\ 0 & B_{2,3}^T & D_3 & \ddots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & B_{T-2,T-1}^T & D_{T-1} & B_{T-1,T} \\ 0 & 0 & 0 & \cdots & B_{T-1,T}^T & D_T \end{bmatrix},   (16)

where

D_t = \frac{\partial^2}{\partial q_t^2} \log p(y_t \mid q_t) + \frac{\partial^2}{\partial q_t^2} \log p(q_t \mid q_{t-1}) + \frac{\partial^2}{\partial q_t^2} \log p(q_{t+1} \mid q_t)   (17)

and

B_{t,t+1} = \frac{\partial^2}{\partial q_t \partial q_{t+1}} \log p(q_{t+1} \mid q_t).   (18)

In the case that we use higher-order autoregressive processes, D_t and B_{t,t+1} are replaced by d \times d blocks, where d is the order of the AR process. The (block-)tridiagonal structure of J allows us to obtain each Newton step direction, \delta, in O(d^3 T) time using standard methods (Rybicki and Hummer 1991; Paninski et al. 2010).
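For an AR(1) common noise with Poisson observations, the tridiagonal Newton step can be sketched in a few lines. The sketch below is a minimal single-cell illustration, with made-up rate and AR parameters rather than the paper's fitted values, using `scipy.linalg.solve_banded` for the O(T) tridiagonal solve:

```python
import numpy as np
from scipy.linalg import solve_banded

def newton_step_tridiag(q, y, rho=0.95, sig2=0.1, dt=8e-4, base_rate=50.0):
    """One Newton ascent step for the MAP common-noise path q under an
    AR(1) prior and Poisson observations y_t ~ Poisson(exp(b + q_t) dt).
    Solves (-H) delta = grad in O(T) via the tridiagonal Hessian structure.
    All parameter values here are illustrative, not the paper's fits."""
    T = len(q)
    lam = np.exp(np.log(base_rate) + q) * dt       # intensity per bin
    # AR(1) prior precision matrix P (tridiagonal)
    diag_P = np.full(T, 1.0 / sig2)
    diag_P[:-1] += rho**2 / sig2
    off_P = np.full(T - 1, -rho / sig2)
    # Gradient of F = log p(q) + sum_t log p(y_t | q_t)
    grad = (y - lam) - diag_P * q
    grad[:-1] -= off_P * q[1:]
    grad[1:] -= off_P * q[:-1]
    # Negative Hessian = P + diag(lam), stored in banded form
    ab = np.zeros((3, T))
    ab[0, 1:] = off_P                              # superdiagonal
    ab[1] = diag_P + lam                           # main diagonal
    ab[2, :-1] = off_P                             # subdiagonal
    return q + solve_banded((1, 1), ab, grad)      # O(T) tridiagonal solve

# Usage: iterate to convergence from q = 0
rng = np.random.default_rng(0)
T = 5000
y = rng.poisson(50.0 * 8e-4, size=T).astype(float)
q = np.zeros(T)
for _ in range(25):
    q = newton_step_tridiag(q, y)
```

Because the objective is concave in q, these undamped Newton iterations converge to the unique MAP path; each iteration touches only the three diagonals, so the cost per step is linear in T.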

We can also exploit similar bandedness properties for stimulus decoding. Namely, we can carry out the Newton-Raphson method for optimization over the joint vector \nu = (u, q), Eq. (11), in O(T) computational time. According to Eq. (11) (repeated here for convenience),

u(y) = \arg\max_{u,q} \left[ \log p(u) + \log p(q \mid \theta) + \log p(y \mid u, q, \theta) \right],   (19)

the Hessian of the log-posterior, J (i.e., the matrix of second partial derivatives of the log-posterior with respect to the components of \nu = (u, q)) is:

J = \frac{\partial^2 \log p(u)}{\partial \nu^2} + \frac{\partial^2 \log p(q \mid \theta)}{\partial \nu^2} + \frac{\partial^2 \log p(y \mid u, q, \theta)}{\partial \nu^2}.   (20)

Thus, the Hessian of the log-posterior is the sum of the Hessian of the log-prior for the filtered stimulus, A = \frac{\partial^2}{\partial \nu^2} \log p(u), the Hessian of the log-prior for the common noise, B = \frac{\partial^2}{\partial \nu^2} \log p(q \mid \theta), and the Hessian of the log-likelihood, D = \frac{\partial^2}{\partial \nu^2} \log p(y \mid u, q, \theta). Since we took the stimulus and common noise priors to be Gaussian with zero mean, A and B are constant matrices, independent of the decoded spike train.

We order the components of \nu = (u, q) according to (u^1_1, u^2_1, q_1, u^1_2, u^2_2, q_2, \ldots), where subscripts denote the time step, i.e., such that components corresponding to the same time step are adjacent. With this ordering, D is block-diagonal with 3 \times 3 blocks:

D = \begin{pmatrix} d_{1,1} & 0 & \cdots \\ 0 & d_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \quad d_{t,t} = \begin{pmatrix} \frac{\partial^2 L}{\partial u^1_t \partial u^1_t} & \frac{\partial^2 L}{\partial u^1_t \partial u^2_t} & \frac{\partial^2 L}{\partial u^1_t \partial q_t} \\ \frac{\partial^2 L}{\partial u^2_t \partial u^1_t} & \frac{\partial^2 L}{\partial u^2_t \partial u^2_t} & \frac{\partial^2 L}{\partial u^2_t \partial q_t} \\ \frac{\partial^2 L}{\partial q_t \partial u^1_t} & \frac{\partial^2 L}{\partial q_t \partial u^2_t} & \frac{\partial^2 L}{\partial q_t \partial q_t} \end{pmatrix},   (21)

where L \equiv \log p(y \mid u, q, \theta) is the GLM log-likelihood. The contribution of the log-prior for q is given by

B = \begin{pmatrix} b_{1,1} & b_{1,2} & \cdots \\ b_{2,1} & b_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \quad b_{t_1,t_2} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & B_{t_1,t_2} \end{pmatrix},   (22)

where, as above, the matrix [B_{t_1,t_2}] is the Hessian corresponding to the autoregressive process describing q (the zero entries in b_{t_1,t_2} involve partial differentiation with respect to the components of u, which vanish because the log-prior for q is independent of u). Since [B_{t_1,t_2}] is banded, B is also banded. Finally, the contribution of the log-prior term for u is given by

A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots \\ a_{2,1} & a_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \quad a_{t_1,t_2} = \begin{pmatrix} \bar{A}^{1,1}_{t_1,t_2} & \bar{A}^{1,2}_{t_1,t_2} & 0 \\ \bar{A}^{2,1}_{t_1,t_2} & \bar{A}^{2,2}_{t_1,t_2} & 0 \\ 0 & 0 & 0 \end{pmatrix},   (23)


where the matrix \bar{A} (formed by excluding from A all the zero rows and columns, corresponding to partial differentiation with respect to the common noise) is the inverse covariance matrix of the u^i. The covariance matrix of the u^i is given by C^{ij}_u = k_i C_x k_j^T, where C_x is the covariance of the spatio-temporally fluctuating visual stimulus, x, and k_j^T is cell j's receptive field transposed. Since a white-noise stimulus was used in the experiment, we use C_x = c^2 I, where I is the identity matrix and c is the stimulus contrast. Hence we have C^{ij}_u = c^2 k_i k_j^T, or more explicitly^5

[C_u]^{i,j}_{t_1,t_2} \equiv \mathrm{Cov}[u^i_{t_1}, u^j_{t_2}] = c^2 \sum_{t,n} k_i(t_1 - t, n)\, k_j(t_2 - t, n).   (24)

Notice that since the experimentally fit k_i have a finite temporal duration T_k, the covariance matrix C_u is banded: it vanishes when |t_1 - t_2| \geq 2T_k - 1. However, the inverse of C_u, and therefore A, are not banded in general. This complicates the direct use of banded-matrix methods to solve the set of linear equations, J\delta = \nabla, in each Newton-Raphson step. Still, as we will now show, we can exploit the bandedness of C_u (as well as B and D) to obtain the desired O(T) scaling.

In order to solve each Newton-Raphson step we need to recast our problem into an auxiliary space in which all our matrices are banded. Below we show how such a rotation can be accomplished. First, we calculate the (lower triangular) Cholesky decomposition, \bar{L}, of C_u, satisfying \bar{L}\bar{L}^T = C_u. Since C_u is banded, \bar{L} is itself banded, and its calculation can be performed in O(T) operations (and is performed only once, because C_u is fixed and does not depend on the vector (u, q)). Next, we form the 3T \times 3T matrix L:

L = \begin{pmatrix} l_{1,1} & 0 & \cdots \\ l_{2,1} & l_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \quad l_{t_1,t_2} = \begin{pmatrix} \bar{L}^{1,1}_{t_1,t_2} & \bar{L}^{1,2}_{t_1,t_2} & 0 \\ \bar{L}^{2,1}_{t_1,t_2} & \bar{L}^{2,2}_{t_1,t_2} & 0 \\ 0 & 0 & \delta_{t_1,t_2} \end{pmatrix}.   (25)

Clearly L is banded because \bar{L} is banded. Also notice that when L acts on a state vector it does not affect its q part: it corresponds to the T \times T identity matrix in the q-subspace. Let us define

G \equiv L^T J L.   (26)

Using the definitions of L and A, and L^T A L = I_u (which follows from \bar{A} = C_u^{-1} and the definition of \bar{L}), we then obtain

G = I_u + B + L^T D L,   (27)

where we defined

I_u = \begin{pmatrix} i_u & 0 & \cdots \\ 0 & i_u & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \quad i_u = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.   (28)

I_u is diagonal and B is banded, and since D is block-diagonal and L is banded, so is the third term in Eq. (27). Thus G is banded.

Now it is easy to see from Eq. (26) that the solution, \delta, of the Newton-Raphson equation, J\delta = \nabla, can be written as \delta = L\tilde{\delta}, where \tilde{\delta} is the solution of the auxiliary equation G\tilde{\delta} = L^T \nabla. Since G is banded, this equation can be solved in O(T) time, and since L is banded, the required matrix multiplications by L and L^T can also be performed in linear computational time. Thus the Newton-Raphson algorithm for solving Eq. (11) can be performed in computational time scaling only linearly with T.

Footnote 5: Since x is binary, strictly speaking, u^i is not a Gaussian vector solely described by its covariance. However, because the filters k_i have a relatively large spatiotemporal dimension, the components of u^i are weighted sums of many independent identically distributed binary random variables, and their prior marginal distributions can be well approximated by Gaussian distributions (see Pillow et al. 2011 and Ahmadian et al. 2011 for further discussion of this point). For this reason, we replaced the true (non-Gaussian) joint prior distribution of u^i with a Gaussian distribution with zero mean and covariance given by Eq. (24).

Appendix B: Method of moments

The identification of the correlation structure of a latent variable from the spike trains of many neurons, and the closely related converse problem of generating spike trains with a desired correlation structure (Niebur 2007; Krumin and Shoham 2009; Macke et al. 2009; Gutnisky and Josic 2010), has received much attention lately. Krumin and Shoham (2009) generate spike trains with the desired statistical structure by nonlinearly transforming the underlying nonnegative rate process to a Gaussian process using a few commonly used link functions. Macke et al. (2009) and Gutnisky and Josic (2010) both use thresholding of a Gaussian process with the desired statistical structure, though these works differ in the way the authors sample the resulting processes. Finally, Dorn and Ringach (2003) proposed a method to find the correlation structure of an underlying Gaussian process in a model where spikes are generated by simple threshold crossing. Here we take a similar route to find the correlation structure between our latent variables, the common noise inputs. However, the correlations in the spike trains in our model stem from both the receptive field overlap and the correlations among the latent variables (the common noise inputs), which we have to estimate from the observed spiking data. Moreover, the GLM framework used here affords some additional flexibility, since the spike train, including history effects, is not restricted to be a Poisson process given the latent variable.

Our model for the full cell population can be written as:

\lambda = \exp(k \cdot x + h \cdot y + Q),   (29)

where Q \sim \mathcal{N}(0, C_\tau \otimes C_s), since we assume the spatio-temporal covariance structure of the common noise is separable (as discussed in Section 2.2). Here, we separate the covariance of the common noise inputs into two terms: the temporal correlation of each of the common noise sources, which imposes the autoregressive structure, C_\tau, and the spatial correlation between the different common inputs to each cell, C_s. Correlations between cells have two sources in this model. First, the spatio-temporal filters overlap. Second, the correlation matrix C_s accounts for the common noise correlation. The spatio-temporal filters' overlap is insufficient to account for the observed correlation structure, as discussed in Pillow et al. (2008); the RF overlap does not account for the large sharp peak at zero lag in the cross-correlation function computed from the spike trains of neighboring cells. Therefore, we need to capture this fast remaining correlation through C_s.
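As a concrete, purely illustrative instance of Eq. (29), the following sketch samples spikes from a small population driven by spatially correlated AR(1) common noise with the separable covariance C_\tau \otimes C_s; the stimulus and history filters are replaced by scalar stand-ins, and all constants are invented:

```python
import numpy as np

def sample_common_noise_glm(T=2000, n_cells=3, rho=0.9, dt=1e-3, rng=None):
    """Toy sampler for Eq. (29): conditionally independent GLM cells driven
    by spatially correlated AR(1) common noise, so that Q has the separable
    covariance C_tau (x) C_s. Filters and constants are illustrative."""
    if rng is None:
        rng = np.random.default_rng(0)
    idx = np.arange(n_cells)
    Cs = 0.3 * np.exp(-np.abs(idx[:, None] - idx[None, :]))  # spatial cov C_s
    Ls = np.linalg.cholesky(Cs)
    Q = np.zeros((T, n_cells))
    innov_scale = np.sqrt(1.0 - rho**2)           # keeps Var(Q_t) = diag(C_s)
    for t in range(1, T):
        Q[t] = rho * Q[t - 1] + innov_scale * (Ls @ rng.standard_normal(n_cells))
    kx = 0.5 * rng.standard_normal(T)             # stand-in for k . x
    h = -2.0                                      # one-bin spike-history weight
    base = np.log(30.0)                           # ~30 Hz baseline
    y = np.zeros((T, n_cells))
    for t in range(T):
        hist = y[t - 1] if t > 0 else 0.0
        lam = np.exp(base + kx[t] + h * hist + Q[t]) * dt
        y[t] = rng.poisson(np.minimum(lam, 1.0))
    return y, Q
```

Because the innovations are drawn with spatial covariance (1 - \rho^2) C_s, the stationary covariance of Q is \rho^{|t-s|} (C_s)_{ij}, i.e., exactly the separable form with C_\tau the AR(1) correlation.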

We approximated our model as y^i \sim \mathrm{Poisson}(\exp(a_i z_i + q_i)\, dt), where z_i = k_i \cdot x_i + h_i \cdot y_i. Since the exponential nonlinearity is a convex, log-concave, increasing function of z, so is E_q[\exp(a_i z_i + q_i)] (Paninski 2005). This guarantees that if the distribution of the covariate is elliptically symmetric, then we may consistently estimate the model parameters via the standard GLM maximum likelihood estimator, even if the incorrect nonlinearity is used to compute the likelihood (Paninski 2004), i.e., even when the correct values of the correlation matrix, C_s, are unknown. Consistency of the model parameter estimation holds up to a scalar constant; therefore, we leave a scalar degree of freedom, a_i, in front of z_i. We will now write the moments of the distribution analytically, and afterwards we will equate them to the observed moments of the spike trains.

The average firing rate can be written as:

E[y_i] = E_{z,q}[\exp(a_i z_i + q_i)] = E_z[\exp(a_i z_i)]\, E_q[\exp(q_i)] = G_i(a_i)\, r_i,   (30)

and

E[z_i y_i] = E_{z,q}[z_i \exp(a_i z_i + q_i)] = E_z[z_i \exp(a_i z_i)]\, E_q[\exp(q_i)] = \frac{d}{da_i} G_i(a_i)\, r_i,   (31)

where G_i(a_i) = E[\exp(a_i z_i)] is the moment generating function (Bickel and Doksum 2001) of z_i evaluated at a_i, and r_i = E[\exp(q_i)] = \exp(\mu_i + \frac{1}{2}(C_s)_{ii}) may be computed analytically here (by solving the Gaussian integral over \exp(q)). Therefore we have:

\frac{E[z_i y_i]}{E[y_i]} = \frac{\frac{d}{da_i} G_i(a_i)\, r_i}{G_i(a_i)\, r_i} = \frac{d}{da_i} \log G_i(a_i),   (32)

and we can solve for the optimal a_i.
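As a sanity check of the matching condition (32): if z_i is Gaussian with variance s^2, then G_i(a) = \exp(a^2 s^2 / 2), so \frac{d}{da} \log G_i(a) = a s^2 and the matched gain is simply E[z_i y_i] / (E[y_i]\, s^2). The simulation below (with arbitrary test parameters, not the paper's values) recovers a known gain this way:

```python
import numpy as np

# Moment-matching sketch for the gain a_i of Eq. (32): for Gaussian z,
# G(a) = E[exp(a z)] = exp(a^2 s2 / 2), so E[z y]/E[y] = a s2 and hence
# a = E[z y] / (E[y] s2). All parameter values below are arbitrary.
rng = np.random.default_rng(2)
n, a_true, s2, dt = 1_000_000, 0.7, 1.0, 0.01
z = np.sqrt(s2) * rng.standard_normal(n)          # covariate z_i
q = 0.3 * rng.standard_normal(n)                  # common-noise input q_i
y = rng.poisson(np.exp(a_true * z + q) * dt)      # conditionally Poisson counts
a_hat = np.mean(z * y) / (np.mean(y) * s2)        # moment-matched gain
```

Note that the common noise q only rescales both moments through the same factor r = E[\exp(q)], which cancels in the ratio, which is exactly why the gain can be matched without knowing C_s.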

Now, we can proceed and solve for C_s. We first rewrite the observed covariance, \bar{C}_s, using the "law of total variance", which states that the total variance of a random variable is the sum of the expected conditional variance plus the variance of the conditional expectations:

(\bar{C}_s)_{i,j} = E[y_i y_j] - E[y_i]\, E[y_j] = E_{z,q}[\mathrm{Cov}(y \mid z, q)] + \mathrm{Cov}_{z,q}(E[y \mid z, q]).   (33)

The first term on the right-hand side can be written as:

E_{z,q}[\mathrm{Cov}(y \mid z, q)] = E_{z,q}[\mathrm{diag}[\exp(a_i z_i + q_i)]] = \mathrm{diag}[E[y_i]],   (34)

since the variance of a Poisson process equals its mean. The second term on the right-hand side of Eq. (33) is

\mathrm{Cov}_{z,q}(E[y \mid z, q]) = E_{z,q}[E[y_i \mid z_i, q_i]\, E[y_j \mid z_j, q_j]] - E_{z,q}[E[y_i \mid z_i, q_i]]\, E_{z,q}[E[y_j \mid z_j, q_j]]
= E_{z,q}[\exp(a_i z_i + q_i) \exp(a_j z_j + q_j)] - E_{z,q}[\exp(a_i z_i + q_i)]\, E_{z,q}[\exp(a_j z_j + q_j)]
= E_z[\exp(a_i z_i + a_j z_j)]\, E_q[\exp(q_i + q_j)] - E[y_i]\, E[y_j]
= G_{i,j}(a_i, a_j) \exp\left( \tfrac{1}{2} \left( (C_s)_{ii} + (C_s)_{jj} + 2 (C_s)_{ij} \right) \right) - G_i(a_i)\, r_i\, G_j(a_j)\, r_j.   (35)

Putting these back together, we have

(\bar{C}_s)_{i,j} = \delta_{i,j} E[y_i] + r_i r_j \left( G_{i,j}(a_i, a_j) \exp((C_s)_{i,j}) - G_i(a_i)\, G_j(a_j) \right).   (36)


Now we have all the terms for the estimation of C_s, and we can uniquely invert the relationship to obtain:

(C_s)_{i,j} = \log\left( \frac{(\bar{C}_s)_{i,j} - \delta_{i,j} E[y_i]}{r_i r_j\, G_{i,j}(a_i, a_j)} + \frac{G_i(a_i)\, G_j(a_j)}{G_{i,j}(a_i, a_j)} \right),   (37)

where we set G_{i,j}(a_i, a_j) = E[\exp(a_i z_i + a_j z_j)], and we denote the observed covariance of the time series as \bar{C}_s.

As a consequence of the law of total variance (recall Eq. (33) above), the Poisson mixture model (a Poisson model in which the underlying intensity is itself stochastic) constrains the variance of the spike count to be larger than the mean, while in fact our data are under-dispersed (i.e., the variance is smaller than the mean). This often results in negative eigenvalues in the estimate of C_s. We therefore need to find the closest approximant to the covariance matrix, by finding the positive semidefinite matrix that minimizes the 2-norm distance to C_s; this is a common approach in the literature, though other matrix norms are also possible. Here, so that the smallest eigenvalue is just above zero, we add a constant equal to the absolute value of the most negative eigenvalue to the diagonal, C_s^{psd} = C_s + |\min(\mathrm{eig}(C_s))| \cdot I. For more details see Higham (1988); a similar problem was discussed in Macke et al. (2009). Finally, note that our estimate of the mixing matrix, M, only depends on the estimate of the mean firing rates and correlations in the network. Estimating these quantities accurately does not require exceptionally long data samples; we found that the parameter estimates were stable, approaching their final values even with as little as half the data.
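The diagonal-shift repair described above can be sketched as follows. Note that this shift is a simple, conservative fix; the matrix that actually minimizes the 2-norm distance is obtained by zeroing out the negative eigenvalues (Higham 1988). Everything here is illustrative:

```python
import numpy as np

def nearest_psd_shift(C):
    """Diagonal-shift repair: if the moment-based estimate of C_s has
    negative eigenvalues, add |lambda_min| to the diagonal so the smallest
    eigenvalue becomes (just above) zero. A simple variant of the
    nearest-PSD problem treated more generally by Higham (1988)."""
    C = 0.5 * (C + C.T)                            # symmetrize first
    lam_min = np.linalg.eigvalsh(C).min()
    if lam_min < 0:
        C = C + (-lam_min + 1e-12) * np.eye(len(C))
    return C

# Usage: an indefinite "covariance" estimate gets repaired
C_bad = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.9],
                  [0.0, 0.9, 1.0]])
C_fix = nearest_psd_shift(C_bad)
```

The shift inflates all variances equally rather than touching the off-diagonal structure, which is why it is a natural choice when the correlations, not the variances, carry the information of interest.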

Appendix C: PSTH-based method

A different method to obtain the covariance structure of the common noise involves analyzing the covariations of the residual activity of the cells once the PSTHs are subtracted (Brody 1999). Let y^i_r(t) be the spike train of neuron i on repeat r, where the ensemble of cells is presented R times with a stimulus of duration T, and let s^i = E_r[y^i_r(t)] be the PSTH of neuron i. Let

\delta y^i_r(t) = y^i_r(t) - E_r[y^i_r(t)];   (38)

i.e., \delta y^i_r(t) is each neuron's deviation from its PSTH on each trial. This deviation is unrelated to the stimulus (since we removed the 'signal', the PSTH). We next form a matrix of cross-correlations between the deviations of every pair of neurons,

C_{i,j}(\tau) = E_r\left[ \left( y^i_r(t) - E_r[y^i_r(t)] \right) \cdot \left( y^j_r(t+\tau) - E_r[y^j_r(t+\tau)] \right) \right] = E_r\left[ \delta y^i_r(t) \cdot \delta y^j_r(t+\tau) \right];   (39)

we can cast this into a two-dimensional matrix, C_k(\tau), by denoting k = i \cdot N + j. The matrix C_k(\tau) contains the trial-averaged cross-correlation functions between all cells; it has both the spatial and the temporal information about the covariations.

Using the singular value decomposition one can always decompose the matrix as C_k(\tau) = U \Sigma V^T. Therefore, we can rewrite our matrix of cross-covariations as:

C_k(\tau) = \sum_i \sigma_i\, U_i \cdot V_i^T,   (40)

where U_i is the ith singular vector in the matrix U, V_i is the ith singular vector in the matrix V, and \sigma_i are the singular values. Each matrix \sigma_i U_i \cdot V_i^T in the sum is a spatio-temporally separable matrix. Examining the singular values, \sigma_i, in Fig. 11(C), we see a clear separation between the first two singular values and the rest; this provides some additional support for the separable nature of the common noise in our model. The first two singular values capture most of the structure of C_k(\tau). This means that we can approximate the matrix as C_k(\tau) \approx \bar{C}_k = \sum_{i=1}^{2} \sigma_i U_i \cdot V_i^T (Haykin 2001). In Fig. 11(A), we show the spatial part of the matrix of cross-covariations composed of the first two singular vectors reshaped into matrix form, and in Fig. 11(B) we show the first two temporal counterparts. Note the distinct separation of the different subpopulations (ON-ON, OFF-OFF, ON-OFF, and OFF-ON) composing the matrix of the entire population in Fig. 11(A).
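The separability test behind Fig. 11(C) can be illustrated on synthetic data: if the matrix of cross-correlation functions is a sum of two space-time separable terms plus noise, its singular value spectrum shows exactly two dominant values, and a rank-2 truncation captures nearly all of the structure. The construction below is purely illustrative (invented spatial patterns and temporal kernels, not the recorded data):

```python
import numpy as np

# Sketch of the SVD separability check: build C_k(tau) (pairs x lags) as a
# sum of two separable terms plus measurement noise, then inspect the
# singular value spectrum and the rank-2 reconstruction.
rng = np.random.default_rng(3)
n_pairs, n_lags = 300, 51
u1, u2 = rng.standard_normal((2, n_pairs))        # spatial patterns
v1 = np.exp(-np.abs(np.arange(n_lags) - 25) / 3)  # symmetric temporal kernel
v2 = np.gradient(v1)                              # asymmetric temporal kernel
Ck = np.outer(u1, v1) + 0.5 * np.outer(u2, v2)    # two separable terms
Ck += 0.01 * rng.standard_normal(Ck.shape)        # measurement noise
U, s, Vt = np.linalg.svd(Ck, full_matrices=False)
rel_power = s / s.sum()
# Rank-2 truncation, the analogue of the approximation in Eq. (40)
Ck2 = (U[:, :2] * s[:2]) @ Vt[:2]
```

In this synthetic setting `rel_power` drops sharply after the second singular value, mirroring the separation seen in Fig. 11(C), and the rank-2 reconstruction `Ck2` retains essentially all of the non-noise structure.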

References

Agresti, A. (2002). Categorical data analysis. Wiley Series in Probability and Mathematical Statistics.

Ahmadian, Y., Pillow, J., Shlens, J., Chichilnisky, E., Simoncelli, E., & Paninski, L. (2009). A decoder-based spike train metric for analyzing the neural code in the retina. In COSYNE09.

Ahmadian, Y., Pillow, J. W., & Paninski, L. (2011). Efficient Markov chain Monte Carlo methods for decoding neural spike trains. Neural Computation, 23, 46-96.

Arnett, D. (1978). Statistical dependence between neighboring retinal ganglion cells in goldfish. Experimental Brain Research, 32(1), 49-53.


Bickel, P., & Doksum, K. (2001). Mathematical statistics: Basicideas and selected topics. Prentice Hall.

Brivanlou, I., Warland, D., & Meister, M. (1998). Mechanisms ofconcerted firing among retinal ganglion cells. Neuron, 20(3),527–539.

Brody, C. (1999). Correlations without synchrony. Neural Com-putation, 11, 1537–1551.

Cafaro, J., & Rieke, F. (2010). Noise correlations improve re-sponse fidelity and stimulus encoding. Nature, 468(7326),964–967.

Chornoboy, E., Schramm, L., & Karr, A. (1988). Maximum like-lihood identification of neural point process systems. Biolog-ical Cybernetics, 59, 265–275.

Cocco, S., Leibler, S., & Monasson, R. (2009). Neuronal cou-plings between retinal ganglion cells inferred by efficient in-verse statistical physics methods. Proceedings of the NationalAcademy of Sciences, 106(33), 14058.

Cossart, R., Aronov, D., & Yuste, R. (2003). Attractor dynamicsof network up states in the neocortex. Nature, 423, 283–288.

Dacey, D., & Brace, S. (1992). A coupled network for parasolbut not midget ganglion cells in the primate retina. VisualNeuroscience, 9(3–4), 279–290.

DeVries, S. H. (1999). Correlated firing in rabbit retinal ganglioncells. Journal of Neurophysiology, 81(2), 908–920.

Dombeck, D., Khabbaz, A., Collman, F., Adelman, T., & Tank,D. (2007). Imaging large-scale neural activity with cellularresolution in awake, mobile mice. Neuron, 56(1), 43–57.

Dorn, J. D., & Ringach, D. L. (2003). Estimating membranevoltage correlations from extracellular spike trains. Journalof Neurophysiology, 89(4), 2271–2278.

Fahrmeir, L., & Kaufmann, H. (1991). On Kalman filtering, pos-terior mode estimation and fisher scoring in dynamic expo-nential family regression. Metrika, 38, 37–60.

Fahrmeir, L., & Tutz, G. (1994). Multivariate statistical modellingbased on generalized linear models. Springer.

Field, G., Gauthier, J., Sher, A., Greschner, M., Machado, T.,Jepson, L., et al. (2010). Mapping a neural circuit: A com-plete input-output diagram in the primate retina. Nature,467, 673–677.

Frechette, E., Sher, A., Grivich, M., Petrusca, D., Litke, A., &Chichilnisky, E. (2005). Fidelity of the ensemble code forvisual motion in primate retina. Journal of Neurophysiology,94(1), 119.

Gauthier, J., Field, G., Sher, A., Greschner, M., Shlens, J., Litke,A., et al. (2009). Receptive fields in primate retina are coor-dinated to sample visual space more uniformly. PLoS Biol-ogy, 7(4), e1000,063.

Greschner, M., Shlens, J., Bakolitsa, C., Field, G., Gauthier, J.,Jepson, L., et al. (2011). Correlated firing among major gan-glion cell types in primate retina. The Journal of Physiology,589(1), 75.

Gutnisky, D., & Josic, K. (2010). Generation of spatiotempo-rally correlated spike trains and local field potentials using amultivariate autoregressive process. Journal of Neurophysi-ology, 103(5), 2912.

Harris, K., Csicsvari, J., Hirase, H., Dragoi, G., & Buzsaki, G.(2003). Organization of cell assemblies in the hippocampus.Nature, 424, 552–556.

Hayes, M. H. (1996). Statistical digital signal processing and mod-eling. Wiley.

Haykin, S. (2001). Adaptive f ilter theory. Pearson EducationIndia.

Higham, N. (1988). Computing a nearest symmetric positivesemidefinite matrix. Linear Algebra and its Applications,103, 103–118.

Iyengar, S. (2001). The analysis of multiple neural spike trains.In Advances in methodological and applied aspects ofprobability and statistics, Gordon and Breach (pp. 507–524).

Kass, R., & Raftery, A. (1995). Bayes factors. Journal of theAmerican Statistical Association, 90, 773–795.

Keat, J., Reinagel, P., Reid, R., & Meister, M. (2001). Predictingevery spike: A model for the responses of visual neurons.Neuron, 30, 803–817.

Kelly, R., Smith, M., Samonds, J., Kohn, A., Bonds, J., Movshon,A. B., et al. (2007). Comparison of recordings from micro-electrode arrays and single electrodes in the visual cortex.Journal of Neuroscience, 27, 261–264.

Kerr, J. N. D., Greenberg, D., & Helmchen, F. (2005). Imaginginput and output of neocortical networks in vivo. PNAS,102(39), 14063–14068.

Koyama, S., & Paninski, L. (2010). Efficient computation of themap path and parameter estimation in integrate-and-fire andmore general state-space models. Journal of ComputationalNeuroscience, 29, 89–105.

Krumin, M., & Shoham, S. (2009). Generation of spike trainswith controlled auto-and cross-correlation functions. NeuralComputation, 21(6), 1642–1664.

Kulkarni, J., & Paninski, L. (2007). Common-input models formultiple neural spike-train data. Network: Computation inNeural Systems, 18, 375–407.

Latham, P., & Nirenberg, S. (2005). Synergy, redundancy, andindependence in population codes, revisited. The Journal ofNeuroscience, 25(21), 5195.

Lawhern, V., Wu, W., Hatsopoulos, N., & Paninski, L. (2010).Population decoding of motor cortical activity using a gen-eralized linear model with hidden states. Journal of Neuro-science Methods, 189(2), 267–280.

Litke, A., Bezayiff N., Chichilnisky E., Cunningham W.,Dabrowski W., Grillo A., et al. (2004). What does the eyetell the brain? Development of a system for the large scalerecording of retinal output activity. IEEE Transactions onNuclear Science, 51, 1434–1440.

Macke, J., Berens, P., Ecker, A., Tolias, A., & Bethge, M.(2009). Generating spike trains with specified correlationcoefficients. Neural Computation, 21, 397–423.

MacLean, J., Watson, B., Aaron, G., & Yuste, R. (2005). Internaldynamics determine the cortical response to thalamic stimu-lation. Neuron, 48, 811–823.

Martignon, L., Deco, G., Laskey, K., Diamond, M., Freiwald, W., & Vaadia, E. (2000). Neural coding: Higher-order temporal patterns in the neuro-statistics of cell assemblies. Neural Computation, 12, 2621–2653.

Masmoudi, K., Antonini, M., & Kornprobst, P. (2010). Encoding and decoding stimuli using a biologically realistic model: The non-determinism in spike timings seen as a dither signal. In Proc. of research in encoding and decoding of neural ensembles.

Mastronarde, D. (1983). Correlated firing of cat retinal ganglion cells. I. Spontaneously active inputs to X- and Y-cells. Journal of Neurophysiology, 49(2), 303.

McCulloch, C., Searle, S., & Neuhaus, J. (2008). Generalized, linear, and mixed models. In Wiley series in probability and statistics.

Meister, M., Lagnado, L., & Baylor, D. (1995). Concerted signaling by retinal ganglion cells. Science, 270(5239), 1207.

Mishchenko, Y., Vogelstein, J. T., & Paninski, L. (2011). A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. The Annals of Applied Statistics, 5(2B), 1229–1261.

Nicolelis, M., Dimitrov, D., Carmena, J., Crist, R., Lehew, G., Kralik, J., et al. (2003). Chronic, multisite, multielectrode recordings in macaque monkeys. PNAS, 100, 11,041–11,046.

Niebur, E. (2007). Generation of synthetic spike trains with defined pairwise correlations. Neural Computation, 19(7), 1720–1738.

Nirenberg, S., Carcieri, S., Jacobs, A., & Latham, P. (2001). Retinal ganglion cells act largely as independent encoders. Nature, 411, 698–701.

Nykamp, D. (2005). Revealing pairwise coupling in linear-nonlinear networks. SIAM Journal on Applied Mathematics, 65, 2005–2032.

Nykamp, D. (2008). Exploiting history-dependent effects to infer network connectivity. SIAM Journal on Applied Mathematics, 68(2), 354–391.

Nykamp, D. (2009). A stimulus-dependent connectivity analysis of neuronal networks. Journal of Mathematical Biology, 59(2), 147–173.

Ohki, K., Chung, S., Kara, P., Hübener, M., Bonhoeffer, T., & Reid, R. (2006). Highly ordered arrangement of single neurons in orientation pinwheels. Nature, 442(7105), 925–928.

Okatan, M., Wilson, M., & Brown, E. (2005). Analyzing functional connectivity using a network likelihood model of ensemble neural spiking activity. Neural Computation, 17, 1927–1961.

Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15, 243–262.

Paninski, L. (2005). Log-concavity results on Gaussian process methods for supervised and unsupervised learning. Advances in Neural Information Processing Systems 17.

Paninski, L., Pillow, J., & Simoncelli, E. (2004). Maximum likelihood estimation of a stochastic integrate-and-fire neural model. Neural Computation, 16, 2533–2561.

Paninski, L., Ahmadian, Y., Ferreira, D., Koyama, S., Rahnama Rad, K., Vidne, M., et al. (2010). A new look at state-space models for neural data. Journal of Computational Neuroscience, 29, 107–126.

Pillow, J., & Latham, P. (2007). Neural characterization in partially observed populations of spiking neurons. In NIPS.

Pillow, J., Paninski, L., Uzzell, V., Simoncelli, E., & Chichilnisky, E. (2005). Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model. Journal of Neuroscience, 25, 11,003–11,013.

Pillow, J., Shlens, J., Paninski, L., Sher, A., Litke, A., Chichilnisky, E., et al. (2008). Spatiotemporal correlations and visual signaling in a complete neuronal population. Nature, 454, 995–999.

Pillow, J., Ahmadian, Y., & Paninski, L. (2011). Model-based decoding, information estimation, and change-point detection in multi-neuron spike trains. Neural Computation, 23, 1–45.

Rigat, F., de Gunst, M., & van Pelt, J. (2006). Bayesian modelling and analysis of spatio-temporal neuronal networks. Bayesian Analysis, 1, 733–764.

de la Rocha, J., Doiron, B., Shea-Brown, E., Josic, K., & Reyes, A. (2007). Correlation between neural spike trains increases with firing rate. Nature, 448, 802–806.

Rybicki, G., & Hummer, D. (1991). An accelerated lambda iteration method for multilevel radiative transfer, Appendix B: Fast solution for the diagonal elements of the inverse of a tridiagonal matrix. Astronomy and Astrophysics, 245, 171.

Schneidman, E., Bialek, W., & Berry, M. (2003). Synergy, redundancy, and independence in population codes. The Journal of Neuroscience, 23(37), 11,539.

Schneidman, E., Berry, M., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440, 1007–1012.

Schnitzer, M., & Meister, M. (2003). Multineuronal firing patterns in the signal from eye to brain. Neuron, 37, 499–511.

Segev, R., Goodhouse, J., Puchalla, J., & Berry, M. (2004). Recording spikes from a large fraction of the ganglion cells in a retinal patch. Nature Neuroscience, 7, 1154–1161.

Shlens, J., Field, G., Gauthier, J., Grivich, M., Petrusca, D., Sher, A., et al. (2006). The structure of multi-neuron firing patterns in primate retina. The Journal of Neuroscience, 26(32), 8254.

Shlens, J., Field, G. D., Gauthier, J. L., Greschner, M., Sher, A., Litke, A. M., et al. (2009). The structure of large-scale synchronized firing in primate retina. Journal of Neuroscience, 29(15), 5022–5031. doi:10.1523/JNEUROSCI.5187-08.2009

Smith, A., & Brown, E. (2003). Estimating a state-space model from point process observations. Neural Computation, 15, 965–991.

Stein, R., Weber, D., Aoyagi, Y., Prochazka, A., Wagenaar, J., Shoham, S., et al. (2004). Coding of position by simultaneously recorded sensory neurones in the cat dorsal root ganglion. The Journal of Physiology, 560(3), 883–896.

Stevenson, I., Rebesco, J., Miller, L., & Körding, K. (2008). Inferring functional connections between neurons. Current Opinion in Neurobiology, 18(6), 582–588.

Stevenson, I., Rebesco, J., Hatsopoulos, N., Haga, Z., Miller, L., & Körding, K. (2009). Bayesian inference of functional connectivity and network structure from spikes. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17(3), 203.

Trong, P., & Rieke, F. (2008). Origin of correlated activity between parasol retinal ganglion cells. Nature Neuroscience, 11(11), 1343–1351.

Truccolo, W., Eden, U., Fellows, M., Donoghue, J., & Brown, E. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. Journal of Neurophysiology, 93, 1074–1089.

Usrey, W., & Reid, R. (1999). Synchronous activity in the visual system. Annual Review of Physiology, 61(1), 435–456.

Utikal, K. (1997). A new method for detecting neural interconnectivity. Biological Cybernetics, 76, 459–470.

Van Pelt, J., Vajda, I., Wolters, P., Corner, M., & Ramakers, G. (2005). Dynamics and plasticity in developing neuronal networks in vitro. Progress in Brain Research, 147, 173–188.

Vidne, M., Kulkarni, J., Ahmadian, Y., Pillow, J., Shlens, J., Chichilnisky, E., et al. (2009). Inferring functional connectivity in an ensemble of retinal ganglion cells sharing a common input. In Frontiers in systems neuroscience conference abstract: Computational and systems neuroscience 2009.

Warland, D., Reinagel, P., & Meister, M. (1997). Decoding visual information from a population of retinal ganglion cells. Journal of Neurophysiology, 78, 2336–2350.

Wilson, J. M., Dombeck, D. A., Diaz-Rios, M., Harris-Warrick, R. M., & Brownstone, R. M. (2007). Two-photon calcium imaging of network activity in XFP-expressing neurons in the mouse. Journal of Neurophysiology, 97(4), 3118–3125.

Wu, W., Kulkarni, J., Hatsopoulos, N., & Paninski, L. (2008). Neural decoding of goal-directed movements using a linear state-space model with hidden states. In Computational and systems neuroscience meeting.

Yu, B., Afshar, A., Santhanam, G., Ryu, S., Shenoy, K., & Sahani, M. (2006). Extracting dynamical structure embedded in neural activity. In NIPS.

Yu, B., Cunningham, J., Santhanam, G., Ryu, S., Shenoy, K., & Sahani, M. (2009). Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. In NIPS.

Zhang, K., Ginzburg, I., McNaughton, B., & Sejnowski, T. (1998). Interpreting neuronal population activity by reconstruction: Unified framework with application to hippocampal place cells. Journal of Neurophysiology, 79, 1017–1044.