
Jonathan Pillow, Princeton Neuroscience Institute

Statistical Models for Neural Data: from Regression / GLMs to Latent Variables

Tutorial, Cosyne 2018

Retinal responses to white noise

Shlens, Field, Gauthier, Greschner, Sher, Litke & Chichilnisky (2009).

(ON parasol cells)

stimulus

spikes

membrane potential

imaging

neural activity

neural coding problem

• How are stimuli and actions encoded in neural activity?
• What aspects of neural activity carry information?

stimulus

spikes

membrane potential

imaging

neural activity

neural coding problem

encoding models

• develop flexible statistical models of P(y|x)
• quantify information carried in neural responses

Approach:

spikes

membrane potential

imaging

neural activity / encoding models

“regression models”


Columns correspond to the activities at different times and for different movements. W contains the weights for the linear mapping from neurons to muscles. That is, W specifies the weighted sum of neurons’ firing rates that drives each muscle.

To build intuition about this model, consider the following extreme, unphysiologically simplified situation. Imagine that just two excitatory neurons synapsed directly on a muscle, and this muscle produced force proportional to the sum of its two inputs. As long as the sum of the two inputs remained constant, the muscle would produce a constant amount of force: no ‘gate’ or ‘switch’ is required. The activity of these two neurons can be represented as a point in a two-dimensional firing rate space. Their pattern of activity over time is a trajectory through this space (refs. 35–37). In the state space, the constant-sum line forms an ‘output-null’ dimension (Fig. 2). The muscle’s force output will change only if there is a change in the sum of the neurons’ firing rates; we term the direction in which that sum changes the ‘output-potent’ dimension (Fig. 2). This idea also generalizes to more complex cases: if one of these hypothetical neurons had a net inhibitory effect, the dimensions would be switched. With many neurons, we would expect multiple output-null dimensions. If there were multiple independent muscles, we would need multiple output-potent dimensions. This is all to say that activity in the output-potent dimensions would be read out by the target muscle or brain area, whereas activity in output-null dimensions would not be visible to the target. Formally, any activity changes in output-null dimensions fall in the null space of W. Conversely, activity changes in output-potent dimensions fall in the row space of W.

The existence of output-potent and output-null dimensions is likely inevitable, as there are more neurons than muscles. The key question is whether the brain exploits these dimensions to control when circuits communicate (as opposed to relying on nonlinear thresholds or a time-varying gain). The hypothesis that output-null dimensions are used to control communication leads to two predictions. First, if this mechanism operates between cortex and muscles, then during motor preparation changes in neural firing rates should occur in combinations that produce changes in output-null dimensions but do not produce changes in dimensions that are output-potent with respect to the muscles (Fig. 2). Second, if this same mechanism operates between cortical areas, we would expect PMd preparatory activity to preferentially occupy dimensions that are output-null with respect to M1.

If this latter prediction is correct, this could help produce the well-known reduction in preparatory activity between PMd and M1.

Exploitation of output-null dimensions is unlikely to leave any particular signature at the level of single neurons. Changing state along the output-null dimensions corresponds to activity changes in most of the relevant neurons (Fig. 2). Such activity cancels out only at the level of the population output. Intriguingly, though, this model tends to produce neurons with mismatches in tuning between the preparatory and movement periods, as has been observed previously (refs. 25–28). Thus, if one averages over neurons based on their preferred reach condition during movement, their preparatory tuning largely averages away (Supplementary Fig. 1). This mismatch is suggestive, but it forms only an indirect test and is neither necessary nor sufficient to demonstrate that such a model is correct (Online Methods). Testing this hypothesis requires both knowing the population response and estimating the output-null and output-potent dimensions.

To test our hypothesis, we used a variant of a standard delayed-reaching task with two monkeys, J and N (Fig. 1 and Online Methods). We recorded the population response using both single- and multiunit neural activity (using single moveable electrodes for data sets J and N, and silicon electrode arrays for data sets JA and NA) and muscle activity (using percutaneous electrodes). Trial-averaged data were used except where noted: the primary goal of these analyses was to explain how there can be preparatory tuning without movement, not to explain trial-by-trial variability. Thus, all repeats of the same condition were averaged to produce a single rate versus time. The same reaches were required every day and monkeys were highly practiced. Repeated reaches to the same targets were thus extremely similar to one another over the course of months (Supplementary Fig. 2). Data from different days were therefore combined.

As a basic test for the plausibility of exploiting output-null dimensions, we can search for neuron pairs whose preparatory activity

Figure 1 Task and typical data. (a) Layout of maze task. One typical trial shown. The same mazes were repeated many times; each maze is hereafter called a ‘condition’. (b) Top, task timeline. The monkey initially touched a central spot with a cursor projected slightly above his fingertip; then a target and (typically) barriers appeared. On some trials, two inaccessible distractor ‘targets’ also appeared. After the Go cue (cessation of slight target jitter, extinguishing of central spot), the monkey made a curved reach around the barriers to touch the accessible target, leaving a white trail on the screen. If no barriers were present, reaches were straight. Middle, trial-averaged deltoid EMG; a.u., arbitrary units. Bottom, firing rate of one PMd neuron. Target, target onset; Go, go cue; Move, movement onset. Flanking traces show s.e.m. Maze identifier 100, neuron J-PM48, EMG recording J-PD10.

Figure 2 Simplified output-null model. For illustration, assume a muscle receives input from two neurons and produces a response that is the linear sum of the inputs. If the sum is constant (output-null dimension), the muscle cannot distinguish between input 1 being high and 2 low, and vice versa. When the sum changes (output-potent dimension), muscle output will change. If preparatory neural activity changes only within the output-null dimension (two different reaches illustrated in darker and lighter shades), then the muscle’s activity remains constant; when neural activity changes in the output-potent dimension also, movement ensues. Insets: PSTHs for the neurons and PSTH-like views of output-potent and output-null dimensions. T, target onset; G, go cue; FR, firing rate.

hand position

[Kaufman et al 2014]

• not restricted to sensory variables

neural coding problem

spikes

membrane potential

imaging

neural activity / encoding models

“external covariates”

[Hardcastle et al 2015]

“regression models”

• not restricted to sensory variables

neural coding problem

spikes

membrane potential

imaging

neural activity

latent variable models

• capture hidden structure underlying neural activity

latent encoding models

latent variable (unobserved or “hidden”)
(e.g. low-dimensional or discrete states)


spikes

membrane potential

imaging

neural activity

latent variable models

latent variable

latent dynamics

latent dynamical encoding models

• capture hidden dynamics underlying neural activity

(unobserved or “hidden”)


model desiderata

fittability / tractability (can be fit to data)
richness / flexibility (capture realistic neural properties)

linear, Gaussian
multi-compartment
Hodgkin-Huxley
sweet spot: GLM

descriptive statistical models: What is the code?
normative theories (e.g. “efficient coding”): Why does the code take this form?
anatomy, biophysics: How is it implemented?

Outline

1. Spike count models & Maximum Likelihood
2. Spike train models (GLMs with spike history)
3. Multiple Spike Train Models (GLMs with coupling)
4. Regularization
5. Beyond GLM
6. Latent variable models

simple example #1: linear Poisson neuron

encoding model: spike rate λ = θ·x (stimulus x, parameter θ); spike count y | x ~ Poiss(λ)

important distributions

• Gaussian
• Poisson (λ = mean = variance)

others that may come up: Bernoulli, binomial, multinomial, exponential, gamma, ...

conditional distribution P(y|x)
(scatter plot: spike count y vs. stimulus contrast x, with the distribution of y shown for each slice of x)

Maximum Likelihood Estimation:

• given observed data (X, Y), find the θ that maximizes P(Y | X, θ)
  (X = all stimuli, Y = all spike counts, θ = parameters; each factor P(y_i | x_i, θ) is a single-trial probability)

Q: what assumption are we making about the responses?
A: conditional independence across trials!

Q: when do we call P(Y | X, θ) a likelihood?
A: when considering it as a function of θ!

Maximum Likelihood Estimation:

• given observed data, find the θ that maximizes P(Y | X, θ)
• could in theory do this by turning a knob (sweeping over values of θ; a grid-search sketch follows below)
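A minimal MATLAB sketch of the “turning a knob” idea (not from the tutorial code; the simulation setup and variable names are made up), using the linear-Poisson neuron of example #1:

% Simulate a linear-Poisson neuron and find theta by evaluating the
% log-likelihood on a grid ("turning a knob").
thetaTrue = 1.5;                        % true parameter (for simulation only)
x = 5 + 35*rand(200,1);                 % stimulus contrasts
y = poissrnd(thetaTrue*x);              % observed spike counts

thetaGrid = linspace(0.1, 3, 200);      % the "knob"
logli = zeros(size(thetaGrid));
for j = 1:numel(thetaGrid)
    lam = thetaGrid(j)*x;               % predicted rate on each trial
    logli(j) = sum(y.*log(lam) - lam);  % Poisson log-likelihood (up to a constant)
end
[~, jmax] = max(logli);
thetaML = thetaGrid(jmax);              % grid-based maximum likelihood estimate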

Likelihood function: P(Y | X, θ) considered as a function of θ

Because data are independent across trials:

P(Y | X, θ) = ∏_i P(y_i | x_i, θ)

log-likelihood:

log P(Y | X, θ) = Σ_i log P(y_i | x_i, θ)

Do it: solve for the maximizing θ

• closed-form solution when the model is in the “exponential family”

Properties of the MLE (maximum likelihood estimator)

• consistent (converges to the true θ in the limit of infinite data)

• efficient (converges as quickly as possible, i.e., achieves minimum possible asymptotic error)

simple example #2: linear Gaussian neuron

encoding model: spike count y | x ~ N(θ·x, σ²)  (stimulus x, parameter θ)

(scatter plot: spike count vs. stimulus contrast)

encoding distribution: all slices of P(y|x) have the same width (constant Gaussian variance)

Log-likelihood:  log P(Y | X, θ) = −(1/2σ²) Σ_i (y_i − θ x_i)² + const

Differentiate, set to zero, and solve for θ.

Maximum-likelihood estimator (the “least squares regression” solution):
θ̂ = (Σ_i x_i y_i) / (Σ_i x_i²)

(Recall that for Poisson, the mean equals the variance.)

example #3: unknown neuron

(scatter plot: spike count vs. stimulus contrast)

Be the computational neuroscientist: what model would you use?

Example 3: unknown neuron

More general setup:
• firing rate is a nonlinear function of the stimulus
• Poisson firing

This is a GLM!

“basic” Poisson generalized linear model (GLM), a.k.a. the Linear-Nonlinear-Poisson (LNP) model:

stimulus → stimulus filter (dimensionality reduction) → exponential nonlinearity f (nonlinear stretching) → Poisson spiking (noise)

spike rate λ(t) = f(filter output); spike count drawn from Poiss(λ(t))

• also known as a “cascade” model (a simulation sketch follows below)
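To make the cascade concrete, here is a minimal MATLAB sketch (illustrative filter shape and bin size, not the tutorial's code) that simulates responses from an LNP neuron:

% Simulate an LNP neuron: linear filter -> exponential nonlinearity -> Poisson.
dt = 0.01;                                  % time bin (s)
nT = 1000;  nk = 20;
k  = exp(-(1:nk)'/5).*sin((1:nk)'/2);       % made-up stimulus filter
x  = randn(nT,1);                           % white-noise stimulus
kx = filter(k, 1, x);                       % linear stage: filtered stimulus
lam = exp(kx);                              % nonlinear stage: spike rate
y   = poissrnd(lam*dt);                     % Poisson stage: spike count per bin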

What is a GLM?

Be careful about terminology:

General Linear Model (“GLM”)  ≠  Generalized Linear Model (“GLM”; Nelder 1972)

Moral: be careful when naming your model!

1. General Linear Model

y = θ · x + ε

Linear stage (“dimensionality reduction”) + noise from the exponential family. Examples: 1. Gaussian, 2. Poisson.

2. Generalized Linear Model

y = f(θ · x) + ε

Linear + nonlinear + noise (exponential family).

Terminology: the noise model is the “distribution function”, θ are the “parameters”, the inverse of f is the “link function”, and f is “the nonlinearity”.

Applying it to data

stimulus (a binary sequence) and response, binned in time:

y_t = k · x_t + noise

where x_t is the vector stimulus at time t, y_t is the response at time t, and k is the linear filter.

• walk through the data one time bin at a time (t = 1, 2, 3, …)

Build up to the following matrix version:

Y = X k + noise

where each row of the design matrix X is the stimulus vector for one time bin, and Y stacks the corresponding responses.

Computing the maximum likelihood estimate

1. “Linear-Gaussian” GLM:  Y = X k + noise

k̂ = (XᵀX)⁻¹ XᵀY

(XᵀX is the stimulus covariance; XᵀY is the spike-triggered average, STA)

2. Poisson GLM:

k = glmfit(X, Y, 'poisson');

maximum likelihood fit (assumes an exponential nonlinearity by default)

3. Bernoulli GLM (“logistic regression”; outputs are 0 and 1):

k = glmfit(X, Y, 'binomial');

(assumes a logistic nonlinearity by default)

GLM summary

1. Linear-Gaussian GLM (continuous responses):
   Y | X, k ~ N(Xk, σ²I)
   log-likelihood: −(1/2σ²) ‖Y − Xk‖² + const
   MLE: k̂ = (XᵀX)⁻¹ XᵀY

2. Poisson GLM (integer counts):
   y_t | x_t, k ~ Poiss(f(x_t · k))
   log-likelihood: Σ_t [ y_t log f(x_t · k) − f(x_t · k) ] + const

3. Bernoulli GLM (binary counts):
   y_t | x_t, k ~ Ber(f(x_t · k))
   “logistic regression” if f is the logistic function
   log-likelihood: Σ_t [ y_t log f(x_t · k) + (1 − y_t) log(1 − f(x_t · k)) ]

(a worked sketch of all three fits follows below)
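For concreteness, here is a minimal MATLAB sketch of all three fits on simulated data (sizes and variable names are illustrative; glmfit requires the Statistics and Machine Learning Toolbox):

% Simulate a design matrix and a true filter, then fit each GLM variant.
nT = 5000;  nk = 10;
X = randn(nT, nk);                          % design matrix (rows = time bins)
kTrue = randn(nk,1)/sqrt(nk);               % true filter

% 1. Linear-Gaussian GLM: closed-form MLE (least squares)
Yg = X*kTrue + 0.5*randn(nT,1);
kLinGauss = (X'*X) \ (X'*Yg);

% 2. Poisson GLM (exponential nonlinearity / log link by default)
Yp = poissrnd(exp(X*kTrue));
kPoisson = glmfit(X, Yp, 'poisson', 'constant', 'off');

% 3. Bernoulli GLM ("logistic regression"; logistic nonlinearity by default)
Yb = double(rand(nT,1) < 1./(1+exp(-X*kTrue)));
kBernoulli = glmfit(X, Yb, 'binomial', 'constant', 'off');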

stimuli x → spike trains y

NEXT:

GLMs with spike-history and coupling

Poisson GLM

stimulus → stimulus filter k → exponential nonlinearity f → spike rate λ(t) → Poisson spiking

• problem: assumes spiking depends only on the stimulus!

Poisson GLM with spike-history dependence

stimulus → stimulus filter k; the neuron's own spikes → post-spike filter h; their sum (+) → exponential nonlinearity f → probabilistic spiking

spike rate: λ(t) = f(k · stimulus + h · spike history)

(Truccolo et al 2004, Gerstner 2001)

• output: no longer a Poisson process
• interpretation: “soft-threshold” integrate-and-fire model

Poisson GLM with spike-history dependence (continued)

• traditional IF: “hard threshold”; the spike rate jumps from 0 to ∞ when the filter output crosses threshold
• “soft-threshold” IF: the spike rate is a smooth, increasing function of the filter output

• interpretation: “soft-threshold” integrate-and-fire model (a simulation sketch follows below)
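Because the post-spike filter feeds the neuron's own spikes back into the rate, simulation has to proceed one bin at a time. A minimal MATLAB sketch (made-up filter shapes and bin size):

% Simulate a Poisson GLM with a spike-history (post-spike) filter.
dt = 1e-3;  nT = 5000;
k = 0.5;                                    % scalar stimulus weight, for simplicity
h = -5*exp(-(1:50)'/10);                    % post-spike filter (refractory-like)
x = randn(nT,1);                            % stimulus
y = zeros(nT,1);  histInput = zeros(nT,1);
for t = 1:nT
    lam = exp(k*x(t) + histInput(t));       % conditional spike rate at time t
    y(t) = poissrnd(lam*dt);                % spike count in this bin
    if y(t) > 0                             % inject post-spike current into future bins
        idx = (t+1 : min(t+numel(h), nT))';
        histInput(idx) = histInput(idx) + y(t)*h(1:numel(idx));
    end
end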

GLM dynamic behaviors

Depending on the shape of the post-spike filter h(t), the filter outputs (“currents”) and p(spike) produce different behaviors:
• irregular spiking
• regular spiking
• adaptation
• bursting

GLM dynamic behaviors (from Izhikevich)

tonic spiking, phasic spiking, tonic bursting, phasic bursting, mixed mode, spike frequency adaptation, type I, type II, spike latency, resonator, integrator, rebound spike, rebound burst, threshold variability, bistability I, bistability II

Figure 6: Suite of dynamical behaviors of Izhikevich and GLM neurons. Each panel, top to bottom: stimulus (blue), Izhikevich neuron response (black), GLM responses on five trials (gray), stimulus filter (left, blue), and post-spike filter (right, red). Black line in each plot indicates a 50 ms scale bar for the stimulus and spike response. (Differing timescales reflect timescales used for each behavior in the original Izhikevich paper (Izhikevich, 2004).) Stimulus filter and post-spike filter plots all have 100 ms duration.

(Weber & Pillow 2017)

multi-neuron GLM

For each neuron (neuron 1, neuron 2, …): stimulus → stimulus filter; own spikes → post-spike filter; other neurons' spikes → coupling filters; summed (+) → exponential nonlinearity → probabilistic spiking.

GLM equivalent diagram: the spike rate at time t depends on the recent stimulus, the neuron's own spike history, and the spike histories of coupled neurons.

Example dataset

Uzzell et al (J Neurophys 04)
• stimulus = binary flicker
• parasol retinal ganglion cell spike responses

Design matrices for fitting (rows of the design matrix = time bins; Y = spike response; k = fitted weights):

• Stimulus-only GLM: the design matrix contains the stimulus values at a range of time lags.
• Stimulus + spike-history GLM: the design matrix has a stimulus portion plus a spike-history portion containing the neuron's own recent spikes.
• Stimulus + history + 3-neuron-coupling GLM: the design matrix additionally contains spike-history columns for neurons 1–4 (the neuron's own history plus the coupled neurons). (A construction sketch follows below.)
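A minimal MATLAB sketch of how such a design matrix can be assembled from a stimulus vector and a spike train (names and lag counts are illustrative; coupled neurons would contribute additional lagged-spike columns in the same way):

% Build a design matrix with lagged-stimulus and spike-history columns.
nT = 5000;
stim = randn(nT,1);                         % stimulus (one value per bin)
sps  = double(rand(nT,1) < 0.05);           % binary spike train
nkStim = 25;  nkHist = 20;                  % numbers of stimulus / history lags

Xstim = zeros(nT, nkStim);
for j = 1:nkStim
    Xstim(j:end, j) = stim(1:end-j+1);      % stimulus at lag j-1 bins
end
Xhist = zeros(nT, nkHist);
for j = 1:nkHist
    Xhist(j+1:end, j) = sps(1:end-j);       % own spikes at lag j bins (lag >= 1)
end
X = [Xstim, Xhist];                         % stimulus portion + spike-history portion
w = glmfit(X, sps, 'poisson');              % fit stimulus + spike-history GLM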

Fitting: Maximum Likelihood

• maximize the log-likelihood of the data under the GLM for the filters {k, h1, h2, …, hn}
• log-likelihood is concave
• no local maxima [Paninski 04]

log P(Y | X) = Σ_t [ y_t log λ_t − λ_t ]

firing rate: λ_t (the conditional intensity given the stimulus and spike-history filter outputs at time t)
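A minimal MATLAB sketch of this fitting step, minimizing the negative log-likelihood directly (assumed exponential nonlinearity; illustrative sizes; fminunc requires the Optimization Toolbox):

% Maximum-likelihood fitting of a Poisson GLM filter by direct optimization.
X = randn(2000, 12);  kTrue = randn(12,1)/4;
Y = poissrnd(exp(X*kTrue));

% negative log-likelihood: -sum_t [ y_t*log(lam_t) - lam_t ], with lam = exp(X*k)
negLogLi = @(k) -(Y'*(X*k) - sum(exp(X*k)));
k0 = zeros(12,1);                           % any starting point works (concave log-likelihood)
opts = optimoptions('fminunc', 'Display', 'off');
kML = fminunc(negLogLi, k0, opts);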

convexity and concavity

• concave: everywhere downward curvature
• convex: everywhere upward curvature
• maximizing a concave function ⟺ minimizing a convex function
• both preclude the existence of non-global local optima

receptive fields (figure: spatial receptive fields and temporal responses of 11 retinal ganglion cells; scale bars: 75 sp/s, 50 ms)

modeling correlation structure in neural spike trains

capturing dependencies in multi-neuron responses: retinal receptive fields and cell-by-cell cross-correlations, comparing data, coupled GLM, and uncoupled GLM

[Pillow et al 2008]

Decoding

• estimate stimuli from the observed spike times
• tool for comparing different encoding models

Frechette et al, 2005

Decode: response 1. Q: what was the stimulus?
Decode: response 2. Q: what was the stimulus?

Responses to Moving Bar (spike raster: cell vs. time (s))
Frechette et al, 2005

Bayesian Decoding

Bayes' rule:  posterior ∝ likelihood × prior,  i.e.  P(x | y) ∝ P(y | x) P(x)

Compare likelihoods from an “independent” encoding model (uncoupled GLM) vs. a “joint encoding” model (coupled GLM).
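A minimal MATLAB sketch of Bayesian decoding in the simplest possible setting: a single spike count, a known linear-Poisson encoding model, and a uniform prior over a handful of candidate stimuli (all values are made up):

% Bayesian decoding: posterior over candidate stimuli given one spike count.
theta = 1.5;                                % encoding-model parameter (assumed known)
candX = [5 10 20 40];                       % candidate stimulus values
xTrue = 20;
y = poissrnd(theta*xTrue);                  % observed spike count

logLi   = y*log(theta*candX) - theta*candX;               % log P(y|x), up to a constant
logPost = logLi + log(ones(size(candX))/numel(candX));    % + log prior (uniform)
post = exp(logPost - max(logPost));
post = post / sum(post);                    % posterior P(x|y)
[~, imax] = max(post);
xMAP = candX(imax);                         % MAP decoded stimulus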

Decoding Comparison

Bayesian decoding with coupling vs. w/o coupling: a 20% increase in decoding performance from including coupling (also compared against linear decoding).

[Pillow et al 2008]

Regularization

Modern statistics: often more dimensions than samples (N observations, D regressors, D > N)
• fewer equations than unknowns! → no unique solution

Y = X k + noise

Simulated Example

• 100-element filter (D = 100)
• 100 noisy samples (N = 100)

maximize the log-likelihood → maximum likelihood estimate (compare against the true w)

“overfitting”: parameters fit to details in the training data that are not useful for predicting new data

“ridge regression”: maximize the log-likelihood minus a penalty on big weights (“L2 shrinkage”)

• same simulated example (100-element filter, D = 100; 100 noisy samples, N = 100)
• biased, but gives improved performance for an appropriate choice of λ (James & Stein 1960)

(compare estimates: true w, maximum likelihood, ridge, Lasso (“L1 shrinkage”, sparse), automatic relevance determination (ARD, sparse), automatic smoothness determination (ASD))

“smoothed” estimate: maximize the log-likelihood minus a smoothness penalty on the weights

• same simulated example (100-element filter, D = 100; 100 noisy samples, N = 100); compare true w, maximum likelihood, and smoothed estimates

Q: how to set the regularization strength λ? Simplest answer: use cross-validation!
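A minimal MATLAB sketch of that recipe for the ridge penalty in the linear-Gaussian case (illustrative simulation; the fold assignment and λ grid are arbitrary choices):

% Choose the ridge penalty lambda by k-fold cross-validation.
D = 100;  N = 100;
wTrue = conv(randn(D,1), ones(10,1)/10, 'same');    % smooth "true" filter
X = randn(N, D);  Y = X*wTrue + 2*randn(N,1);

lambdas = 10.^(-1:0.5:4);
nFold = 5;
foldId = repmat(1:nFold, 1, ceil(N/nFold));  foldId = foldId(1:N)';
cvErr = zeros(size(lambdas));
for j = 1:numel(lambdas)
    for f = 1:nFold
        tr = foldId ~= f;  te = ~tr;
        wRidge = (X(tr,:)'*X(tr,:) + lambdas(j)*eye(D)) \ (X(tr,:)'*Y(tr));
        cvErr(j) = cvErr(j) + sum((Y(te) - X(te,:)*wRidge).^2);   % held-out error
    end
end
[~, jbest] = min(cvErr);
wHat = (X'*X + lambdas(jbest)*eye(D)) \ (X'*Y);     % refit with the selected lambda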

GLM tutorial (matlab):
code: https://github.com/pillowlab/GLMspiketraintutorial
data: available on request from [email protected]

• tutorial1_PoissonGLM.m - fitting of a linear-Gaussian GLM and Poisson GLM (aka LNP model) to RGC neurons stimulated with temporal white noise stimulus.

• tutorial2_spikehistcoupledGLM.m - fitting of a Poisson GLM with spike-history and coupling between neurons.

• tutorial3_regularization_linGauss.m - regularizing linear-Gaussian model parameters using maximum a posteriori (MAP) estimation under two kinds of priors:
  (1) ridge regression (aka "L2 penalty");
  (2) L2 smoothing prior (aka "graph Laplacian").

• tutorial4_regularization_PoissonGLM.m - MAP estimation of Poisson-GLM parameters using same two priors as in tutorial3.

GLM summary

• linear (“dim reduction”) + nonlinear + noise

• incorporate spike-history via “spike history” filter

• rich dynamical properties: refractoriness, bursting, adaptation

• incorporate correlations between neurons via “coupling” filters

• flexible tool for encoding & decoding analyses

• regularize to reduce overfitting (essential w/ correlated stimuli)

Beyond GLM

polynomial models: Taylor series expansion of a function f(x) in n dimensions
terms: constant, vector, matrix, 3-tensor, …
# parameters: 1, n (20), n² (400), n³ (8000), …

Volterra / Wiener Kernels
Lee & Schetzen 1965; Marmarelis & Naka 1972; Korenberg & Hunter 1986
• from the “systems identification” literature (1960s–70s)
• white noise stimuli
• estimate kernels using moments of spike-triggered stimuli

Why are Volterra/Wiener models (generally) bad?

• no output nonlinearity
• polynomials give a poor fit to neural nonlinearities (e.g., rectifying, saturating)

(figure: rate (sp/s) vs. stimulus projection; true response, linear model, quadratic model)

multi-filter LNP

• responses may depend on more than one projection of the stimulus! → emphasis on dimensionality reduction
• no longer technically a GLM if the nonlinearity f is also being fit

Estimators:
• Spike-triggered covariance (STC) [de Ruyter & Bialek 1998, Schwartz et al 2006]
• Generalized Quadratic Model (GQM) [Park & Pillow 2011; Park et al 2013; Rajan et al 2013]
• maximally informative dimensions (MID) / maximum likelihood [Sharpee et al 2004; Williamson et al 2015]

extending GLM to conductance-based model

Stimulus → excitatory filter and inhibitory filter → nonlinearity → conductances → membrane dynamics → post-spike filter → noise → spikes

conductances:  g_e(t) = f_c(k_e · x(t)),  g_i(t) = f_c(k_i · x(t))

membrane dynamics:  dV/dt = g_l(E_l − V) + g_e(E_e − V) + g_i(E_i − V)

inst. spike rate:  λ(t) = f(V(t))

• shunting inhibition
• adaptive changes in dynamics

[Latimer et al 2014]
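To show how these pieces fit together numerically, here is a minimal MATLAB sketch that forward-Euler integrates the membrane equation above (all parameter values, the stimulus drive, and the soft-rectifying f_c are made up for illustration):

% Forward-Euler integration of the conductance-based GLM dynamics.
dt = 1e-3;  nT = 2000;
El = -70;  Ee = 0;  Ei = -80;  gl = 10;     % leak conductance and reversal potentials
x  = randn(nT,1);                           % (already-filtered) stimulus drive
ge = log(1 + exp( 2*x));                    % g_e(t) = f_c(k_e . x(t)), soft-rectified
gi = log(1 + exp(-2*x));                    % g_i(t) = f_c(k_i . x(t))
V  = zeros(nT,1);  V(1) = El;
for t = 2:nT
    dV = gl*(El - V(t-1)) + ge(t-1)*(Ee - V(t-1)) + gi(t-1)*(Ei - V(t-1));
    V(t) = V(t-1) + dt*dV;
end
lam = exp((V + 55)/5);                      % inst. spike rate lambda(t) = f(V(t))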

Linear filters (gain vs. time, 0–200 ms): excitatory and inhibitory filters estimated from spikes vs. from conductances

• intracellular recordings in macaque parasol RGCs (Fred Rieke)

measured conductances vs. model fits: excitatory, fit to conductance R² = 0.83, fit to spikes R² = 0.63; inhibitory, R² = 0.63 and R² = 0.51 (scale bars: 250 ms, 10 nS)

extending GLM to conductance-based model [Latimer et al 2014]

many other biophysically oriented extensions

other aspects of neuronal processing into the linear stimulus-processing framework, and can be used to model nonlinear stimulus processing through predefined nonlinear transformations [31,32,58]. Importantly, this approach also provides a foundation for parameter estimation for the NIM.

A principal motivation for the NIM structure is that if the neuronal output at one level is well described by an LN model, downstream neurons will receive inputs that are already rectified (or otherwise nonlinearly transformed). Thus, we use LN models to represent the inputs to the neuron in question, and the neuron’s response is given by a summation over these LN inputs followed by the neuron’s own spiking nonlinearity (Fig. 2B). Importantly, this allows us to account for the rectification of a neuron’s inputs imposed by the spike-generation process. The NIM can thus be viewed as a ‘second-order’ generalization of the LN model, or an LNLN cascade [59,60]. Previous work from our lab [45] cast this model structure in a probabilistic form, and suggested several statistical innovations in order to fit the models using neural data [45,61,62]. Here, we present a general and detailed framework for NIM parameter estimation that greatly extends the applicability of the model. This model structure has also been suggested for applications outside of neuroscience in the form of projection pursuit regression [63], including generalizations to response variables with distributions from the exponential family [64].

The processing of the NIM is comprised of three stages (Fig. 2C): (a) the filters ki that define the stimulus selectivity of each input; (b) the static ‘upstream’ nonlinearities fi(.) and corresponding linear weights wi which determine how each input contributes to the overall response; and (c) the spiking nonlinearity F[.] applied to the linear sum over the neuron’s inputs. The predicted firing rate r(t) is then given as:

r(t) = F[ Σ_i w_i f_i(k_i · s(t)) + h · x(t) ]    (1)

where s(t) is the (vector-valued) stimulus at time t, x(t) represents any additional covariates (such as the neuron’s own spike history), and h is a linear filter operating on x. Note that equation (1) reduces to a GLM when the fi(.) are linear functions. The wi can also be extended to include temporal convolution of the subunit contributions to model the time course of post-synaptic responses associated with individual inputs [45], as well as ‘spatial’ convolutions to account for multiple spatially distributed inputs with similar stimulus selectivity [65]. Since equivalent models can be produced by rescaling the wi and fi(.) (see Methods), we constrain the subunit weights wi to be either +/−1. Because we generally assume the fi(.) are rectifying functions, the wi thus specify whether each subunit will have an ‘excitatory’ or ‘inhibitory’ influence on the neuron.
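To make equation (1) concrete, here is a minimal MATLAB sketch that evaluates the NIM firing rate for two made-up subunits (no extra covariates h·x(t); the spiking nonlinearity follows the parametric softplus form quoted later in the text, with arbitrary a, b, h):

% Evaluate the NIM firing rate r(t) = F[ sum_i w_i f_i(k_i . s(t)) ]  (eq. 1).
nT = 1000;
s  = randn(nT, 10);                         % stimulus (rows = time bins)
k1 = randn(10,1);  k2 = randn(10,1);        % subunit stimulus filters
f  = @(g) max(g, 0);                        % rectifying "upstream" nonlinearity
F  = @(g) 2*log(1 + exp(1.5*(g - 0.5)));    % spiking nonlinearity a*log(1+exp(b(x-h)))
w  = [1, -1];                               % excitatory / inhibitory subunit weights
r  = F( w(1)*f(s*k1) + w(2)*f(s*k2) );      % predicted firing rate r(t)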

Parameter estimation for the NIM is based on maximum likelihood (or maximum a posteriori) methods similar to those used with the GLM [6–8]. Assuming that the neuron’s spikes are described in discrete time by a conditionally inhomogeneous Poisson count process with rate function r(t), the log-likelihood (LL) of the model parameters given an observed set of spike counts Robs(t) is given (up to an overall constant) by:

LL = Σ_t [ Robs(t) log r(t) − r(t) ]    (2)

To find the set of parameters that maximize the likelihood (eq. 2), we adapt methods that allow for efficient parameter optimization of the GLM [7]. First, we use a parametric spiking nonlinearity given by F[x] = a·log[1+exp(b(x−h))], with scale a, shape b, and offset h. Other functions can be used, so long as they satisfy conditions specified in [7]. This ensures that the likelihood surface will be concave with respect to linear parameters inside the spiking nonlinearity [7], and in practice will be well-behaved for other model parameters (see Fig. S1; Methods).

Figure 2. Schematic of LN and NIM structures. A) Schematic diagram of an LN model, with multiple filters (k1, k2, …) that define the linear stimulus subspace. The outputs of these linear filters (g1, g2, …) are then transformed into a firing rate prediction r(t) by the static nonlinear function F[g1,g2,…], depicted at right for a two-dimensional subspace. Note that while the general LN model thus allows for a nonlinear dependence on multiple stimulus dimensions, estimation of the function F[.] is typically only feasible for low (one- or two-) dimensional subspaces. B) Schematic illustration of a generic neuron that receives input from a set of ‘upstream’ neurons that are themselves driven by the stimulus s. Each of the upstream neurons provides input to the model neuron that is generally rectified due to spike generation (inset at left), and thus is either excitatory or inhibitory. The model neuron then integrates its inputs and produces a spiking output. C) Block diagram illustrating the structure of the NIM, based on (B). The set of inputs are represented as (one-dimensional) LN models, with a corresponding stimulus filter ki, and ‘‘upstream nonlinearity’’ fi(.). These inputs are then linearly combined, with weights wi, and fed into the spiking nonlinearity F[.], resulting in the predicted firing rate r(t). The NIM thus has a ‘second-order LN’ structure (or LNLN), with the neuron’s own nonlinear processing shaped by the LN nature of its inputs. doi:10.1371/journal.pcbi.1003143.g002


to the system is analogous to a reaction rate that depends on the concentration of the reactants. For example, the change in the active state is described by

dA/dt = inflow − outflow = ka u(t) R(t) − kfi A(t)    (Equation 1)

where R(t) and A(t) are the occupancies of the resting and active states, ka and kfi are constants, and u(t) is the input that scales the activation rate constant, ka.

When a train of pulses of either small or large amplitude drives the four-state system, the larger input produces output pulses with a smaller gain and also increases the baseline (Figure 2A). To produce dynamics with both fast and slow timescales, the fourth state (I2) couples to the first inactivated state (I1), using slower rate constants. As a result, a slow shift in baseline occurs following a change in the amplitude of the input. The rate constants in the four-state model are the rates of activation (ka), fast inactivation (kfi), fast recovery (kfr), slow inactivation (ksi), and slow recovery (ksr).
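As a concrete illustration, a forward-Euler simulation of such a four-state kinetic block might look like the following sketch; the rate-constant values, the pulse-train input, and the exact wiring of the slow transitions are illustrative placeholders guided by the description above, not the parameters of the paper.

```python
import numpy as np

def simulate_kinetic_block(u, dt, ka, kfi, kfr, ksi, ksr):
    """Forward-Euler simulation of a four-state kinetic block (R, A, I1, I2).

    u : input signal that scales the activation rate constant ka (and, in the
        LNK model described below, the slow recovery rate from I2).
    Returns the occupancy of the active state A(t), the block's output.
    """
    R, A, I1, I2 = 1.0, 0.0, 0.0, 0.0           # state occupancies sum to 1
    out = np.zeros(len(u))
    for t in range(len(u)):
        act  = ka  * u[t] * R     # R -> A, activation scaled by the input
        fin  = kfi * A            # A -> I1, fast inactivation
        frec = kfr * I1           # I1 -> R, fast recovery
        slo  = ksi * I1           # I1 -> I2, slow inactivation
        srec = ksr * u[t] * I2    # I2 -> I1, slow recovery (input-scaled in the LNK model)
        R  += dt * (frec - act)
        A  += dt * (act - fin)
        I1 += dt * (fin + srec - frec - slo)
        I2 += dt * (slo - srec)
        out[t] = A
    return out

# a pulse train whose amplitude steps from low to high, as in Figure 2A
dt = 1e-3
u = np.zeros(5000)
u[::200] = 0.5
u[2500::200] = 2.0
A = simulate_kinetic_block(u, dt, ka=50.0, kfi=20.0, kfr=10.0, ksi=1.0, ksr=0.2)
```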

Although this four-state system can produce adaptive changes, it lacks the temporal filtering and selectivity of retinal neurons.

Figure 2. The LNK Model.
(A) A train of impulses that changed from low- to high-amplitude is shown as an input, u(t), presented to a first-order kinetic model with four states. Numbers indicate rate constants for transitions between the resting (R), active (A), and inactivated states (I1 and I2). The rate constant between resting and active states is modulated by u(t). The output is the occupancy of the active state (A(t)).
(B) The LNK model. The input, s(t), is convolved with a linear temporal filter, FLNK(t), and then passed through a static nonlinearity, NLNK(g), that does not change with contrast. The output of the nonlinearity, u(t), controls two rate constants in the kinetics block, one that leads to the active state and one that accelerates recovery from the inactivated state, I2. Other rate constants are fixed, and the output of the model r0(t) is the active state.
(C) The membrane potential of an adapting amacrine cell compared to the LNK model output for a transition to low contrast (left) and to high contrast (right).
(D) The LNK model compared to the amacrine cell response for three repeats of an identical stimulus sequence.
(E) The distribution of the absolute difference in membrane potential between responses to an identical stimulus compared to the distribution of the difference between the model output and membrane potential responses. Results are combined for six cells with three repeated responses across the entire recording.

At a fixed mean luminance, photoreceptors are nearly linear. Strong rectification first appears in amacrine and ganglion cells, coinciding with strong contrast adaptation (Baccus and Meister, 2002; Kim and Rieke, 2001; Rieke, 2001). This threshold

likely arises from voltage-dependent calcium channels in the bipolar cell synaptic terminal (Heidelberger and Matthews, 1992), a point that would occur prior to adaptive changes in sensitivity in the presynaptic terminal or postsynaptic membrane. Thus, we combined the adaptive system with a linear-nonlinear model, yielding a system with a linear temporal filter, a static nonlinearity, and an adaptive kinetics block (Figure 2B). In this linear-nonlinear-kinetic (LNK) model, the kinetics block contributes both to the overall temporal filtering and the sensitivity of the system, making these properties depend on the input. Thus, the linear filter (FLNK) and nonlinearity (NLNK) of the LNK model are not the same as the filter and nonlinearity, FLN and NLN, respectively, in an LN model fit to the entire response. To couple the initial linear-nonlinear system to the kinetics block, the output of the nonlinearity, u(t), scales one or two rate constants. Although this means that the transition rate is proportional to the nonlinearity output, a higher-order dependence, such as the dependence of vesicle release on a higher power of the calcium concentration, can be captured in the nonlinearity itself. We fit LNK models using a constrained optimization algorithm

(see Experimental Procedures). The filter and nonlinearity were


[Real, Asari, Gollisch & Meister 2017]

Nonlinear input model (NIM) [McFarland, Cui & Butts 2013]

Linear-Nonlinear-Kinetics (LNK) [Ozuysal & Baccus 2012]

We have developed a general model for V1 neurons (Fig. 1), along with a direct method for fitting it to spiking data. The model has two channels (one excitatory, one suppressive), each formed from a weighted sum of linear-nonlinear (LN) subunits, similar in concept to Hubel and Wiesel's original description of complex cell responses as resulting from a spatial combination of simple cells (Hubel and Wiesel, 1962), to Barlow and Levick's characterization of directionally selective retinal ganglion cells in the rabbit (Barlow and Levick, 1965), and to Victor and Shapley's subunit model for Y-type ganglion cells in cat retina (Victor and Shapley, 1979).

To make the fitting problem tractable, we assume that the subunit filters of each channel differ only in spatial position and that their nonlinearities are identical. The difference between the responses of the two channels is transformed with a rectifying nonlinearity to give the firing rate of the neuron. We have developed a method for directly and efficiently estimating the model, and have tested it on data from V1 neurons driven by spatiotemporal white noise. The results show that the fitted model outperforms previously published functional models (specifically, the LN, energy, and spike-triggered covariance [STC]-based models) for all cells, in addition to providing a more biologically reasonable account of the origins of cortical receptive fields. A brief account of some of this work has appeared previously (Vintch et al., 2012).
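A minimal sketch of one channel of such a convolutional subunit model is given below, assuming a half-squaring subunit nonlinearity for concreteness (the actual model estimates the subunit nonlinearity from data); all names and parameter values are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

def subunit_channel(stim, filt, pool_weights):
    """One channel: identical LN subunits that differ only in spatial position,
    followed by a weighted sum over space.

    stim         : (T, X) space-time stimulus (e.g. flickering bars)
    filt         : (tau, w) shared subunit filter
    pool_weights : (X - w + 1,) spatial pooling weights over the subunit map
    """
    g = convolve2d(stim, filt, mode='valid')   # spatially shifted copies of the same filter
    subunit_out = np.maximum(g, 0.0) ** 2      # illustrative half-squaring subunit nonlinearity
    return subunit_out @ pool_weights          # weighted pooling: one drive value per time bin

def subunit_model_rate(stim, f_exc, w_exc, f_sup, w_sup, baseline=0.0):
    """Excitatory minus suppressive channel, passed through a rectifying output nonlinearity."""
    drive = subunit_channel(stim, f_exc, w_exc) - subunit_channel(stim, f_sup, w_sup)
    return np.maximum(drive + baseline, 0.0)

# toy usage: 16-bar ternary noise, 5-bar-wide subunit filters for both channels
rng = np.random.default_rng(0)
stim = rng.choice([-1.0, 0.0, 1.0], size=(2000, 16))
f_exc, f_sup = rng.standard_normal((10, 5)), rng.standard_normal((10, 5))
w_exc, w_sup = rng.standard_normal(12), 0.3 * rng.standard_normal(12)
rate = subunit_model_rate(stim, f_exc, w_exc, f_sup, w_sup)
```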

Materials and Methods
Electrophysiology
We recorded from 38 well-isolated single neurons in the primary visual area (V1) of adult macaque monkeys (Macaca nemestrina and M. fascicularis; 6 males), using methods that are described in detail previously (Cavanaugh et al., 2002). Typical experiments spanned 5–7 d during which animals were maintained in an anesthetized and paralyzed state through a continuous intravenous infusion of sufentanil citrate and vecuronium bromide. Vital signs (temperature, heart rate, end-tidal PCO2

levels, blood pressure, EEG activity, and urine quantity, and specific

gravity) were continuously monitored and maintained within physiological limits. Eyes were treated with topical gentamicin, dilated with topical atropine, and protected with gas-permeable hard contact lenses. Additional corrective lenses were chosen via direct ophthalmoscopy to make the retinae conjugate with the experimental display. All experimental procedures and animal care were performed in accordance with protocols approved by the New York University Animal Welfare Committee, and in compliance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals.

We recorded neuronal signals with quartz-platinum-tungsten microelectrodes (Thomas Recording) lowered through a craniotomy and durotomy centered between 10 and 16 mm lateral to the midline and between 3 and 6 mm behind the lunate sulcus. We recorded across all cortical depths. Receptive fields were centered in the inferior quadrant of the visual field, between 2 and 5 degrees from the center of gaze. The amplified signal from the electrode was bandpassed (300 Hz to 8 kHz) and routed to a time-amplitude discriminator, which detected and time-stamped spikes at a resolution of 0.1 ms.

Visual stimulation
We presented pixellated (XYT) noise stimuli on a gamma-corrected CRT monitor (Eizo T966; mean luminance, 33 cd/m2), at a resolution of 1280 × 960 pixels, with a refresh rate of 120 Hz, positioned 114 cm from the animal's eyes. We generated stimuli pseudorandomly using Expo software (http://corevision.cns.nyu.edu) on an Apple Macintosh computer. The stimuli consisted of a pixel array (usually 16 × 16) filled with white, ternary noise that was continuously refreshed at 40 Hz. The width of the array was approximately double that of the receptive field (as measured with optimized drifting gratings) while still maintaining adequate pixel resolution to capture receptive field features. For a subset of cells, we measured responses to a repeated "frozen" sample of noise for cross-validation. Each sample of frozen noise had identical statistics to the main white-noise stimuli, lasted 25 s, and was repeated 20 times in succession.

Data for flickering bar (XT) noise stimuli were taken from Rust et al. (2005); these data were collected similarly, and details can be found in the original article. Stimuli were displayed using a Silicon Graphics Octane-2 workstation at 100 Hz. Each frame consisted of a square region containing 16 adjacent parallel bars of the neuron's preferred orientation, ar-


Figure 1. Subunit model for a single channel. a, A signal flow diagram describes how stimulus information is converted to a firing rate. The stimulus is passed through a bank of spatially shifted, but otherwise identical, linear-nonlinear subunits. The activity of these subunits is combined with a weighted sum over space (and optionally time; not shown), and passed through an output nonlinearity to generate a firing rate. b, Responses of intermediate model stages are depicted for an example input (for simplicity, a single frame is shown, rather than a temporal sequence). Each stage, except for the last, maps a spatially distributed array of inputs to a spatially distributed array of responses, depicted as pixel intensities in an image. The pooling stage sums over these responses and applies the final output nonlinearity.


LNFDSNF model
convolutional subunit model

[Vintch, Movshon & Simoncelli 2015, Wu, Park & Pillow 2014]

modeling. Repeated presentations of the same flicker sequence reliably evoked very similar spike trains (Figures 2A, 2B, and S2B), as expected from previous studies [9–11]. This suggests that essential features of the retina's light response can be captured by a deterministic model of the ganglion cell and its input circuitry [4]. In addition, we presented a long non-repeating flicker sequence to explore as many spatiotemporal patterns as possible. Thirty ganglion cells were selected for quantitative modeling based on the stability of their responses throughout the extended recording period.

Modeling Approach
We focused on predicting the firing rate of ganglion cells (GCs), namely the expected number of spikes fired in any given 1/60 s interval. Mathematical models were constructed that take the time course of the flicker stimulus as input and produce a time course of the firing rate at the output. The parameters of the model were optimized to fit the long stretch of non-repeating flicker (~80% of the data; the "training set"). Specifically, we maximized the fraction of variance in the firing rate that the model explains (Equation S10) [11]. Then the model performance was evaluated on the remaining data examined with the repeated

flicker (~20%; the "test set"). This performance metric was tracked across successive changes in the model structure.
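A sketch of this performance metric, under one standard definition of fraction of variance explained (the paper's exact form is given in its Equation S10):

```python
import numpy as np

def fraction_of_variance_explained(rate_obs, rate_model):
    """Fraction of variance in the observed firing rate captured by the model.
    One standard definition: rate_obs would be the (trial-averaged) rate on the
    repeated 'test set' flicker, rate_model the prediction on the same segment."""
    resid = rate_obs - rate_model
    return 1.0 - np.var(resid) / np.var(rate_obs)
```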

As a formalism, we chose so-called cascade models [4, 5]. These are networks of simple elements that involve either linear filtering (convolution in space and time) or a static nonlinear transform. They map naturally onto neural circuitry (Figure 1) and can be adjusted from a coarse-grained version (every neuron is an element) to arbitrarily fine-grained ones (multi-compartment models of every neuron and synapse). As a reference point, we chose the so-called LN model,

consisting of a single linear-nonlinear cascade (Figure 1B). This has been very popular in sensory neuroscience [12–14] and serves as a common starting point for fitting neural responses. This model was able to approximate the GC output (Figures 2A, 2B, and S2B), though with a wide range of performance for different neurons (Figures 2C and 2D). Even with optimized parameters, however, the LN model predicts firing at times when it should not, thus making the peaks of firing events wider and flatter than observed (Figures 2A, 2B, and S2B). Guided by knowledge of retinal anatomy, we then created a

sequence of four cascade models by systematically adding components to the circuits (Figures 1C–1F). Each model derives


Figure 1. A Progression of Circuit Models Constrained by Retinal Anatomy
(A) Schematic of the circuit upstream of a ganglion cell in the vertebrate retina. Photoreceptors (P) transduce the visual stimulus into electrical signals that propagate through bipolar cells (B) to the ganglion cell (G). At both synaptic stages, one finds both convergence and divergence, as well as lateral signal flow carried by horizontal (H) and amacrine (A) cells. The bipolar cell and its upstream circuitry are modeled by a spatiotemporal filter, a nonlinearity, and feedback (bipolar cell module [BCM]; blue). The amacrine cell introduces a delay in lateral propagation (amacrine cell module [ACM]; red). The ganglion cell was modeled by a weighted summation, another nonlinearity, and a second feedback function (ganglion cell module [GCM]; green). Drawings after Polyak, 1941.
(B) LN model. A different temporal filter is applied to the history of each bar in the stimulus. The outputs of all of these filters are summed over space. The resulting signal is passed through an instantaneous nonlinearity.
(C) LNSN model. The stimulus is first processed by partially overlapping, identical BCMs, each of which consists of its own spatiotemporal filter and nonlinearity. Their outputs are weighted and summed by the GCM, which then applies another instantaneous nonlinearity to give the model's output. For display purposes, the BCMs are shown here to span only three stimulus bars, but they spanned seven bars in the computations.
(D) LNSNF model. This is identical to the previous one, except that the GCM (depicted here) has an additional feedback loop around its nonlinearity.
(E) LNFSNF model. This is identical to the previous one, except that the BCMs (one of which is depicted here) have an additional feedback loop around their nonlinearities. This new feedback function is the same for all BCMs.
(F) LNFDSNF model. This is identical to the previous one, except that there is a delay inserted between each BCM and the GCM. These delays are allowed to vary independently for each BCM.
(G) A count of the free parameters in the LNFDSNF model, color coded as in the model diagram. Except for the total (108), the numbers here also apply to the LNSN, LNSNF, and LNFSNF models. The LN model has 186 free parameters in the linear filter (31 spatial positions, each with a six-parameter temporal filter as in Equations S3–S5) and one in the nonlinearity. See also Figures S1 and S3.


linear-nonlinear-feedback-delayed-sum-nonlinear-feedback

deep learning / deep neural networks (DNNs)

[Yamins et al 2014, McIntosh et al 2016, Maheswaranathan et al 2017, Benjamin et al 2017, …]


Figure 1: A schematic of the model architecture. The stimulus was convolved with 8 learned spatiotemporal filters whose activations were rectified. The second convolutional layer then projected the activity of these subunits through spatial filters onto 16 subunit types, whose activity was linearly combined and passed through a final soft rectifying nonlinearity to yield the predicted response.

retina, potentially simplifying the retinal response to such stimuli [11, 12, 2, 10, 13]. In contrast to the perceived linearity of the retinal response to coarse stimuli, the retina performs a wide variety of nonlinear computations including object motion detection [6], adaptation to complex spatiotemporal patterns [14], encoding spatial structure as spike latency [15], and anticipation of periodic stimuli [16], to name a few. However it is unclear what role these nonlinear computational mechanisms have in generating responses to more general natural stimuli.

To better understand the visual code for natural stimuli, we modeled retinal responses to natural image sequences with convolutional neural networks (CNNs). CNNs have been successful at many pattern recognition and function approximation tasks [17]. In addition, these models cascade multiple layers of spatiotemporal filtering and rectification, exactly the elementary computational building blocks thought to underlie complex functional responses of sensory circuits. Previous work utilized CNNs to gain insight into the neural computations of inferotemporal cortex [18], but these models have not been applied to early sensory areas where knowledge of neural circuitry can provide important validation for such models.

We find that deep neural network models markedly outperform previous models in predicting retinal responses both for white noise and natural scenes. Moreover, these models generalize better to unseen stimulus classes, and learn internal features consistent with known retinal properties, including sub-Poisson variability, feedforward inhibition, and contrast adaptation. Our findings indicate that CNNs can reveal both neural computations and mechanisms within a multilayered neural circuit under natural stimulation.

2 Methods

The spiking activity of a population of tiger salamander retinal ganglion cells was recorded in response to both sequences of natural images jittered with the statistics of eye movements and high resolution spatiotemporal white noise. Convolutional neural networks were trained to predict ganglion cell responses to each stimulus class, simultaneously for all cells in the recorded population of a given retina. For a comparison baseline, we also trained linear-nonlinear models [19] and generalized linear models (GLMs) with spike history feedback [2]. More details on the stimuli, retinal recordings, experimental structure, and division of data for training, validation, and testing are given in the Supplemental Material.

2.1 Architecture and optimization

The convolutional neural network architecture is shown in Figure 2.1. Model parameters were optimized to minimize a loss function corresponding to the negative log-likelihood under Poisson spike generation. Optimization was performed using ADAM [20] via the Keras and Theano software libraries [21]. The networks were regularized with an ℓ2 weight penalty at each layer and an ℓ1 activity penalty at the final layer, which helped maintain a baseline firing rate near 0 Hz.
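A minimal Keras sketch in the spirit of this architecture and training setup: two convolutional layers of rectified subunits, a dense readout with a soft rectifying output, a Poisson likelihood loss, ADAM, an ℓ2 weight penalty at each layer, and an ℓ1 activity penalty at the output. The input shape, filter sizes, and regularization strengths are placeholders, not the values used in the paper.

```python
from tensorflow.keras import layers, models, regularizers

n_cells = 10                   # number of simultaneously recorded ganglion cells
stim_shape = (50, 50, 40)      # space x space x temporal history (history as channels)

model = models.Sequential([
    layers.Conv2D(8, (15, 15), activation='relu', input_shape=stim_shape,
                  kernel_regularizer=regularizers.l2(1e-3)),     # 8 rectified subunit filters
    layers.Conv2D(16, (9, 9), activation='relu',
                  kernel_regularizer=regularizers.l2(1e-3)),     # 16 subunit types
    layers.Flatten(),
    layers.Dense(n_cells, activation='softplus',                 # soft rectifying output nonlinearity
                 kernel_regularizer=regularizers.l2(1e-3),
                 activity_regularizer=regularizers.l1(1e-3)),    # keeps baseline rates near 0 Hz
])

# 'poisson' is the negative Poisson log-likelihood (up to a constant)
model.compile(optimizer='adam', loss='poisson')
# model.fit(stimulus_batches, spike_count_batches, ...) on the training set
```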


If you understand GLMs … you understand Deep Nets!

• Stack multiple LNs on top of each other: LN LN LN LN-P
• Use Gradient Ascent to maximize log-likelihood
• Use Software Tools (Theano, Tensorflow, Autograd) to calculate gradients for you (no more high-school calculus needed!)

• Use a bunch of tricks (noise, adaptive gradients, …)
• Do NOT worry about local maxima!

Everything you need to know about DNNs, in one slide, without equations!

If you understand GLMs… you understand DNNs!

• stack many LNs on top of each other: LN LN LN LN P
• use gradient ascent to maximize likelihood
• use software (tensorflow, theano) to compute gradients (no more computing gradients by hand!)

• use a bunch of tricks (batches, noise, SGD, dropout, …)
• do NOT worry about local maxima!

[credit: Jakob Macke]
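The slide's recipe in code: a toy stacked LN-LN-LN-P model whose Poisson log-likelihood is maximized by plain gradient ascent, with an automatic-differentiation library (here the autograd package, one of the tools named above) supplying the gradients. Sizes, step size, and data are arbitrary placeholders.

```python
import autograd.numpy as np     # autograd's thin wrapper around NumPy
from autograd import grad

def predicted_rate(params, X):
    W1, W2, w3 = params
    h1 = np.maximum(X @ W1, 0.0)          # LN stage 1 (linear filter + rectification)
    h2 = np.maximum(h1 @ W2, 0.0)         # LN stage 2
    return np.log1p(np.exp(h2 @ w3))      # final LN stage with softplus -> Poisson rate

def log_likelihood(params, X, y):
    r = predicted_rate(params, X)
    return np.sum(y * np.log(r + 1e-12) - r)

rng = np.random.RandomState(0)
X = rng.randn(500, 20)
y = rng.poisson(1.0, size=500)
params = [0.1 * rng.randn(20, 16), 0.1 * rng.randn(16, 8), 0.1 * rng.randn(8)]

dLL = grad(log_likelihood)                # autodiff: no hand-derived gradients
for _ in range(200):                      # plain gradient ascent (SGD, dropout, etc. omitted)
    g = dLL(params, X, y)
    params = [p + 1e-3 * gi for p, gi in zip(params, g)]
```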

Modern machine learning far outperforms GLMs at predicting spikes
Ari S. Benjamin1, Hugo L. Fernandes2, Tucker Tomlinson3, Pavan Ramkumar2,4, Chris VerSteeg1, Lee Miller1,2,3, Konrad Paul Kording1,2,3

1. Department of Biomedical Engineering, Northwestern University, Evanston, IL, 60208, USA
2. Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, 60611, USA
3. Department of Physiology, Northwestern University, Chicago, IL, 60611, USA
4. Department of Neurobiology, Northwestern University, Evanston, IL, 60208, USA
Contact: Ari Benjamin, [email protected]

Abstract

Neuroscience has long focused on finding encoding models that effectively ask “what predicts neural spiking?” and generalized linear models (GLMs) are a typical approach. Modern machine learning techniques have the potential to perform better. Here we directly compared GLMs to three leading methods: feedforward neural networks, gradient boosted trees, and stacked ensembles that combine the predictions of several methods. We predicted spike counts in macaque motor (M1) and somatosensory (S1) cortices from reaching kinematics, and in rat hippocampal cells from open field location and orientation. In general, the modern methods produced far better spike predictions and were less sensitive to the preprocessing of features. XGBoost and the ensemble were the best-performing methods and worked well even on neural data with very low spike rates. This overall performance suggests that tuning curves built with GLMs are at times inaccurate and can be easily improved upon. Our publicly shared code uses standard packages and can be quickly applied to other datasets. Encoding models built with machine learning techniques more accurately predict spikes and can offer meaningful benchmarks for simpler models.

Introduction

A central tool of neuroscience is the tuning curve, which maps stimulus to neural response. The tuning curve asks what information in the external world a neuron encodes in its spikes. For a tuning curve to be meaningful it is important that it accurately predicts the neural response. Often, however, methods are chosen that sacrifice accuracy for simplicity. Predictive methods for tuning curves should instead be evaluated primarily by their ability to describe neural activity accurately.

A common predictive model is the Generalized Linear Model (GLM), occasionally referred to as a linear-nonlinear Poisson (LNP) cascade (1-4). The GLM performs a nonlinear operation upon a linear combination of the input features, which are often called external covariates. Typical covariates are stimulus features, movement vectors, or the animal’s location.

The nonlinear operation on the weighted sum of covariates is usually held fixed, though it can be learned (5, 6), and the linear weights of the combined inputs are chosen to maximize the agreement between the model fit and the neural recordings. This optimization problem of choosing weights is often convex and can be solved with efficient algorithms (7). The assumption of Poisson firing statistics can often be loosened (8) allowing the modeling of a broad range of neural responses. Due to its ease of use, perceived interpretability, and flexibility, the GLM has become a popular model of neural spiking.

The GLM’s central assumption of linearity in feature space may hold in certain cases (8, 9), but in general, neural responses can be very nonlinear (5, 10). When a neuron responds nonlinearly to stimulus features, it is common practice to mathematically transform the features to obtain a new set that meets the linearity requirements of the GLM and yields better spike


Fig 2


learning methods with their own approaches for encoding models.

To test that all methods work reasonably well in a trivial case, we trained each to predict spiking from a simple

and well-understood feature. Some neurons in M1 have been described as responding linearly to the exponentiated cosine of movement direction relative to a preferred angle (41). We therefore predicted the

Figure 2: Encoding models for M1 performed similarly when trained on the sine and cosine of hand velocity direction. (a) The pseudo-R2 for an example neuron was similar for all four methods. On this figure and in Figures 3-5 the example neuron is the same, and is not the neuron for which method hyperparameters were optimized. (b) The tuning curves of the neural net and XGBoost were similar to that of the GLM. The black points are the recorded responses, to which we added y-axis jitter for visualization. The tuning curve of the ensemble method was similar and is omitted here for clarity. (c) Plotting the pseudo-R2 of modern ML methods vs. that of the GLM indicates that the similarity of methods generalizes across neurons. The single neuron plotted at left is marked with black arrows. The mean scores, inset, indicate the overall success of the methods; error bars represent the 95% bootstrap confidence interval.

Figure 3: Modern ML models could learn the cosine nonlinearity when trained on only the direction of hand velocity, in radians. (a) For the same example neuron as in Figure 3, the neural net and XGBoost maintained the same predictive power, while the GLM was unable to extract a relationship between direction and spike rate. (b) XGBoost and neural nets displayed reasonable tuning curves, while the GLM reduced to the average spiking rate (with a small slope, in this case). (c) Most neurons in the population were poorly fit by the GLM, while the ML methods achieved the performance levels of Figure 2. The ensemble performed the best of the methods tested.



spiking of M1 neurons from the cosine and sine of the direction of hand movement in the reaching task. (The linear combination of a sine and cosine curve is a phase-shifted cosine, by identity, allowing the GLM to learn the proper preferred direction). We observed that each method identified a similar tuning curve (Fig. 2b), constructed by plotting the predictions of spike rate on the validation set against movement direction. The bulk of the neurons in the dataset were just as well predicted by each of the methods (Fig. 2a, c), though the ensemble was slightly better than the GLM (mean comparative pseudo-R2, defined in methods, of 0.06 [0.043 – 0.084], 95% bootstrapped confidence interval (CI)). The similar performance suggested that an exponentiated cosine is a nearly optimal approximating function of the neural response to movement direction alone, as was previously known (42). This classic example thus illustrated that all methods can in principle estimate tuning curves.
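For reference, one common way to compute a Poisson pseudo-R2 of the kind reported here (the paper's precise definition is in its methods) is to compare the model's log-likelihood with those of a null (mean-rate) model and a saturated model:

```python
import numpy as np
from scipy.special import gammaln

def poisson_log_likelihood(y, mu):
    """Poisson log-likelihood of counts y under mean mu (constant term included)."""
    return np.sum(y * np.log(mu + 1e-12) - mu - gammaln(y + 1))

def pseudo_r2(y, mu_model):
    """One common pseudo-R2 for count data: how far the model moves the log-likelihood
    from the null (mean-rate) model toward the saturated model."""
    ll_model = poisson_log_likelihood(y, mu_model)
    ll_null = poisson_log_likelihood(y, np.full(len(y), y.mean()))
    ll_sat = poisson_log_likelihood(y, np.maximum(y.astype(float), 1e-12))
    return 1.0 - (ll_sat - ll_model) / (ll_sat - ll_null)
```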

The exact form of the nonlinearity of the neural response to a given feature is rarely known, but this lack of knowledge need not impact our prediction ability. To illustrate the ability of modern machine learning to find the proper nonlinearity, we performed the same analysis as above but omitted the initial cosine feature engineering step. Trained on only the hand velocity direction, in radians, which are likely to be

discontinuous at ±π, the modern ML methods very nearly reproduced the predictive power they attained using the engineered feature (Fig. 3a). As expected, the GLM failed at generating a meaningful tuning curve (Fig. 3b). Both trends were consistent across the population of recorded neurons (Fig. 3c). The neural net, XGBoost, and ensemble methods thus perform well without feature engineering and the required prior knowledge or assumptions.
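The feature-engineering contrast can be reproduced in a few lines with a standard Poisson GLM; the sketch below uses simulated cosine-tuned data rather than the paper's recordings, and statsmodels in place of whatever GLM implementation the authors used.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
theta = rng.uniform(-np.pi, np.pi, 2000)               # hand movement direction (radians)
rate = np.exp(0.5 + 1.2 * np.cos(theta - 0.8))         # cosine-tuned ground-truth rate
y = rng.poisson(rate)

# engineered features (cos, sin) vs. the raw angle
X_eng = sm.add_constant(np.column_stack([np.cos(theta), np.sin(theta)]))
X_raw = sm.add_constant(theta)

glm_eng = sm.GLM(y, X_eng, family=sm.families.Poisson()).fit()
glm_raw = sm.GLM(y, X_raw, family=sm.families.Poisson()).fit()
print(glm_eng.llf, glm_raw.llf)   # the engineered-feature GLM attains a much higher likelihood
```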

Machine learning methods can also take advantage of information contained in combinations of inputs, and should perform better if given more inputs. We verified that this was true for our dataset by training on the four-dimensional set of hand position and velocity (x, y, ẋ, ẏ), which we call the set of original features. All

methods gained a significant amount of predictive power with these new features, though the GLM did not nearly match the other methods (Fig 4a, c). This set of neurons thus seemed to encode strongly for position and velocity in a potentially nonlinear fashion captured by machine learning methods.

While some amount of feature engineering can improve the performance of GLMs, it is not always simple to guess the optimal set of processed features. We demonstrated this by training all methods on features that have previously been successful at explaining spike

Figure 4: Training on the set of original features (x, y, ẋ, ẏ) increased the predictive power of all methods. Note the change in axes scales from Figures 2-3. (a) For the same example neuron as in Figure 3, all methods gained a significant amount of predictive power, indicating a strong encoding of position and speed or their correlates. The GLM showed less predictive power than the other methods on this feature set. (b) The spike rate in black, with jitter on the y-axis, again overlaid with the predictions of the three methods as a function of velocity direction. The neuron encodes for position and speed, as well, and the projection of the multidimensional tuning curve onto a 1D velocity direction dependence leaves the projected curve diffuse. (c) The ensemble method, neural network, and XGBoost performed consistently better than the GLM across the population. The mean pseudo-R2 scores show the hierarchy of success across methods.


rate in a similar center-out reaching task (6). These extra features included the sine and cosine of velocity direction (as in Figure 2), the speed, the radial distance of hand position, and the sine and cosine of position direction. The training set was thus 10-dimensional, though highly redundant, and was aimed at maximizing the predictive power of the GLM. Feature engineering improved the predictive power of all methods to variable degrees, with the GLM improving to the level of the neural network (Fig. 5). XGBoost and the ensemble still predicted spikes better than the GLM (Fig. 5c), with the ensemble scoring on average 1.8 times higher than the GLM (ratio of population means of 1.8 [1.4 – 2.2], 95% bootstrapped CI). The ensemble was significantly better than XGBoost (mean comparative pseudo-R2 of 0.08 [0.055 – 0.103], 95% bootstrapped CI) and was thus consistently the best predictor. Though standard feature engineering greatly improved the GLM, the ensemble and XGBoost still captured the neural response more accurately.

To ensure that these results are not specific to the motor cortex, we extended the same analyses to primary somatosensory cortex (S1) data. The ensemble was consistently the best predictor across all neurons, scoring almost twice as well as the GLM (ratio of 1.8 [1.2 – 2.2] of population means, 95% bootstrapped CI). XGBoost predicted spikes better than the GLM only for

neurons with significant effect sizes for any of the four methods (i.e., with cross-validated pseudo-R2 scores two standard deviations above 0; mean comparative pseudo-R2 was 0.002 [0.0006 – 0.0045], 95% bootstrapped CI). Interestingly, the neural network performed worse than all other methods. We speculated that this could be related to the small covariate effect size in the S1 dataset, as we observed similar scores for the neural network on the M1 dataset for regimes of similar effect sizes, as well as on simulated data with GLM structure, small effect size, and similar firing rates (Supp. Fig. 2). We also found that a much smaller network performed better (a single hidden layer with 20 nodes) but that max-norm or elastic-net regularization did not improve the results with the larger network. Neural networks may thus be poor choices for Poisson data with very small covariate effect sizes, though we see no theoretical reason why this should be the case. Overall, on this S1 dataset featuring generally low predictability, the tested methods displayed a range of performances, with the ensemble predicting the data nearly twice as well as the GLM alone.

We asked if the same trends of performance would hold for the rat hippocampus dataset, which was characterized by very low mean firing rates but strong effect sizes. All methods were trained on a list of features representing the rat position and orientation, as

Figure 5: Encoding models for M1 trained on all the original features plus the engineered features show that modern ML methods can outperform the GLM even with standard feature engineering. (a) For this example neuron, inclusion of the computed features increased the predictive power of the GLM to the level of the neural net. XGBoost and the ensemble method also increased in predictive power. (b) The tuning curves for the example neuron are diffuse when projected onto the movement direction, indicating a high-dimensional dependence. (c) Even with feature engineering, XGBoost and the ensemble consistently achieve pseudo-R2 scores higher than the GLM, though the neural net does not. The selected neuron at left is marked with black arrows.


Fig 3 Fig 4 Fig 5


described in methods. We found that many neurons were described much better by XGBoost and the ensemble method than by the GLM (Fig. 6b). On average, the ensemble was almost ten times more predictive than the GLM (ratio of population means of 9.8 [5.4 – 100.0], 95% bootstrapped CI), and many neurons shifted from being completely unpredictable by the GLM (pseudo-R2 near zero) to very predictable by XGBoost and the ensemble (pseudo-R2 above 0.2). The neural network performed poorly, here not due to effect size as in S1 but likely due to the very low firing rates of most hippocampal cells (Supp. Fig. 2). Out of the 58 neurons in the dataset, 54 had rates below 1 spike per second, and it was only on the four high-firing neurons that the neural network achieved pseudo-R2 scores comparable to the GLM. The relative success of XGBoost was interesting given the failure of the neural network, and supported the general observation that XGBoost can work well with smaller and sparser datasets than those neural networks generally require. Thus for hippocampal cells, a method leveraging decision trees such as XGBoost or the ensemble is able to capture far more structure in the neural response than the GLM or the neural network.
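A sketch of the corresponding tree-based approach: gradient-boosted trees fit with a Poisson objective to sparse spike counts. The features, counts, and hyperparameters below are placeholders, not the paper's settings.

```python
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(2)
X = rng.standard_normal((5000, 4))           # e.g. position and head-direction features
y = rng.poisson(np.exp(-2.0 + X[:, 0]))      # sparse counts, well under 1 spike per bin

xgb = XGBRegressor(objective='count:poisson', n_estimators=200,
                   max_depth=3, learning_rate=0.1)
xgb.fit(X[:4000], y[:4000])                  # train on one segment of the data
rate_hat = xgb.predict(X[4000:])             # predicted spike rate on held-out bins
```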

Discussion
We contrasted the performance of GLMs with recent machine learning techniques at the task of predicting spike rates in three brain regions. We found that the tested ML methods predicted spike rates far more accurately than the GLM. Typical feature engineering only partially bridged the performance gap. The ML methods performed comparably well with and without feature engineering, indicating they could serve as convenient performance benchmarks for improving simpler encoding models. The consistently best method was the ensemble, which was an instance of XGBoost stacked on the predictions of the GLM, neural network, XGBoost, and a random forest. The ensemble and XGBoost could fit the data well even for very low spike rates, as in the hippocampus dataset, and for very low covariate effect sizes, as in the S1 dataset. These findings indicate that GLMs are not the best choice as neuroscience's standard method of spike prediction.
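A generic stacked-ensemble sketch in the spirit of the description above, using scikit-learn's stacking with XGBoost as the combiner; this is not the authors' exact pipeline or hyperparameters.

```python
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import PoissonRegressor
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor

ensemble = StackingRegressor(
    estimators=[
        ('glm', PoissonRegressor()),                           # GLM baseline
        ('nn', MLPRegressor(hidden_layer_sizes=(64, 64))),     # feedforward neural network
        ('xgb', XGBRegressor(objective='count:poisson')),      # gradient-boosted trees
        ('rf', RandomForestRegressor(n_estimators=200)),       # random forest
    ],
    final_estimator=XGBRegressor(objective='count:poisson'),   # stacker combining the predictions
    cv=5,                                                       # out-of-fold predictions for stacking
)
# ensemble.fit(X_train, y_train); rate_hat = ensemble.predict(X_test)
```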

The ML methods we have put forward here have been implemented without substantial modification from methods that are already in wide use. We hope that this simple application might spur a wider adoption of these methods in the neurosciences, thereby increasing the power and efficiency of studies involving neural prediction without requiring complicated, application-

Figure 6: XGBoost and the ensemble method predicted the activity of neurons in S1 and the hippocampus better than a GLM. The diagonal dotted line in both plots is the line of equal predictive power with the GLM. (a) The ensemble predicted firing almost twice as well, on average, as the GLM for all neurons in the S1 dataset. XGBoost was better for neurons with higher effect sizes but poorly predicted neurons that were not predictable by any method. The neural network performed the worst of all methods. (b) Many neurons in the rat hippocampus were described well by XGBoost and the ensemble but poorly by the GLM and the neural network. The poor neural network performance in the hippocampus was due to the low rate of firing of most neurons in the dataset (Supp. Fig. 2). Note the difference in axes; hippocampal cells are generally more predictable than those in S1.


macaque S1 hippocampus

macaque M1

Fig 6

Fig 6


Fig 2

6

learning methods with their own approaches for encoding models.

To test that all methods work reasonably well in a trivial case, we trained each to predict spiking from a simple

and well-understood feature. Some neurons in M1 have been described as responding linearly to the exponentiated cosine of movement direction relative to a preferred angle (41). We therefore predicted the

Figure 2: Encoding models for M1 performed similarly when trained on the sine and cosine of hand velocity direction. (a) The pseudo-R2 for an example neuron was similar for all four methods. On this figure and in Figures 3-5 the example neuron is the same, and is not the neuron for which method hyperparameters were optimized. (b) The tuning curves of the neural net and XGBoost were similar to that of the GLM. The black points are the recorded responses, to which we added y-axis jitter for visualization. The tuning curve of the ensemble method was similar and is omitted here for clarity. (c) Plotting the pseudo-R2 of modern ML methods vs. that of the GLM indicates that the similarity of methods generalizes across neurons. The single neuron plotted at left is marked with black arrows. The mean scores, inset, indicate the overall success of the methods; error bars represent the 95% bootstrap confidence interval.

Figure 3: Modern ML models could learn the cosine nonlinearity when trained on only the direction of hand velocity, in radians. (a) For the same example neuron as in Figure 3, the neural net and XGBoost maintained the same predictive power, while the GLM was unable to extract a relationship between direction and spike rate. (b) XGBoost and neural nets displayed reasonable tuning curves, while the GLM reduced to the average spiking rate (with a small slope, in this case). (c) Most neurons in the population were poorly fit by the GLM, while the ML methods achieved the performance levels of Figure 2. The ensemble performed the best of the methods tested.


spiking of M1 neurons from the cosine and sine of the direction of hand movement in the reaching task. (The linear combination of a sine and cosine curve is a phase-shifted cosine, by identity, allowing the GLM to learn the proper preferred direction). We observed that each method identified a similar tuning curve (Fig. 2b), constructed by plotting the predictions of spike rate on the validation set against movement direction. The bulk of the neurons in the dataset were just as well predicted by each of the methods (Fig. 2a, c), though the ensemble was slightly better than the GLM (mean comparative pseudo-R2, defined in methods, of 0.06 [0.043 – 0.084], 95% bootstrapped confidence interval (CI)). The similar performance suggested that an exponentiated cosine is a nearly optimal approximating function of the neural response to movement direction alone, as was previously known (42). This classic example thus illustrated that all methods can in principle estimate tuning curves.
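The pseudo-R2 referred to here is defined in the paper’s methods; a common deviance-based version, which is not necessarily identical, can be sketched as follows (`y` are held-out spike counts, `y_hat` a model’s predicted rates; names illustrative):

```python
# Deviance-based pseudo-R2: 1 is perfect prediction, 0 is no better than a
# constant mean-rate model. One common definition, not necessarily the paper's.
import numpy as np

def poisson_log_lik(y, mu):
    mu = np.clip(mu, 1e-12, None)
    return np.sum(y * np.log(mu) - mu)

def pseudo_r2(y, y_hat):
    ll_model = poisson_log_lik(y, y_hat)
    ll_null = poisson_log_lik(y, np.full_like(y, y.mean(), dtype=float))
    ll_sat = poisson_log_lik(y, y)            # saturated model: predicted rate equals the data
    return 1.0 - (ll_sat - ll_model) / (ll_sat - ll_null)
```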

The exact form of the nonlinearity of the neural response to a given feature is rarely known, but this lack of knowledge need not impact our prediction ability. To illustrate the ability of modern machine learning to find the proper nonlinearity, we performed the same analysis as above but omitted the initial cosine feature engineering step. Trained on only the hand velocity direction, in radians, which is discontinuous at ±π, the modern ML methods very nearly reproduced the predictive power they attained using the engineered feature (Fig. 3a). As expected, the GLM failed at generating a meaningful tuning curve (Fig. 3b). Both trends were consistent across the population of recorded neurons (Fig. 3c). The neural net, XGBoost, and ensemble methods thus perform well without feature engineering and the prior knowledge or assumptions it requires.
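A rough sketch of this comparison on simulated cosine-tuned counts (scikit-learn and xgboost used as stand-ins for the paper’s implementations; hyperparameters are illustrative):

```python
# A GLM on the raw angle cannot express cosine tuning; a GLM on engineered
# sin/cos features can; a tree-based model learns it from the raw angle.
import numpy as np
import xgboost as xgb
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(1)
theta = rng.uniform(-np.pi, np.pi, 5000)                   # hand-velocity direction (radians)
y = rng.poisson(np.exp(1.5 * np.cos(theta - 0.8) - 0.5))   # cosine-tuned spike counts

glm_raw = PoissonRegressor().fit(theta[:, None], y)                        # nearly flat tuning
glm_eng = PoissonRegressor().fit(np.c_[np.sin(theta), np.cos(theta)], y)   # engineered features
gbm_raw = xgb.XGBRegressor(objective="count:poisson",
                           n_estimators=200, max_depth=3).fit(theta[:, None], y)
```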

Machine learning methods can also take advantage of information contained in combinations of inputs, and should perform better if given more inputs. We verified that this was true for our dataset by training on the four-dimensional set of hand position and velocity (x, y, ẋ, ẏ), which we call the set of original features. All methods gained a significant amount of predictive power with these new features, though the GLM did not nearly match the other methods (Fig 4a, c). This set of neurons thus seemed to encode strongly for position and velocity in a potentially nonlinear fashion captured by the machine learning methods.

While some amount of feature engineering can improve the performance of GLMs, it is not always simple to guess the optimal set of processed features. We demonstrated this by training all methods on features that have previously been successful at explaining spike

Figure 4: Training on the set of original features (x, y, ẋ, ẏ) increased the predictive power of all methods. Note the change in axes scales from Figures 2-3. (a) For the same example neuron as in Figure 3, all methods gained a significant amount of predictive power, indicating a strong encoding of position and speed or their correlates. The GLM showed less predictive power than the other methods on this feature set. (b) The spike rate in black, with jitter on the y-axis, again overlaid with the predictions of the three methods as a function of velocity direction. The neuron encodes position and speed as well, and the projection of the multidimensional tuning curve onto a 1D velocity-direction dependence leaves the projected curve diffuse. (c) The ensemble method, neural network, and XGBoost performed consistently better than the GLM across the population. The mean pseudo-R2 scores show the hierarchy of success across methods.


rate in a similar center-out reaching task (6). These extra features included the sine and cosine of velocity direction (as in Figure 2), the speed, the radial distance of hand position, and the sine and cosine of position direction. The training set was thus 10-dimensional, though highly redundant, and was aimed at maximizing the predictive power of the GLM. Feature engineering improved the predictive power of all methods to variable degrees, with the GLM improving to the level of the neural network (Fig. 5). XGBoost and the ensemble still predicted spikes better than the GLM (Fig. 5c), with the ensemble scoring on average 1.8 times higher than the GLM (ratio of population means of 1.8 [1.4 – 2.2], 95% bootstrapped CI). The ensemble was significantly better than XGBoost (mean comparative pseudo-R2 of 0.08 [0.055 – 0.103], 95% bootstrapped CI) and was thus consistently the best predictor. Though standard feature engineering greatly improved the GLM, the ensemble and XGBoost still captured the neural response more accurately.
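For concreteness, an engineered feature matrix of the kind described above might be assembled as follows (variable names are illustrative, not the paper’s code):

```python
# Build the ~10-dimensional engineered feature set from hand position (x, y)
# and velocity (vx, vy).
import numpy as np

def engineer_features(x, y, vx, vy):
    vdir = np.arctan2(vy, vx)                 # velocity direction
    pdir = np.arctan2(y, x)                   # position direction
    return np.column_stack([
        x, y, vx, vy,                                     # original features
        np.sin(vdir), np.cos(vdir), np.hypot(vx, vy),     # velocity direction, speed
        np.sin(pdir), np.cos(pdir), np.hypot(x, y),       # position direction, radial distance
    ])
```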

To ensure that these results are not specific to the motor cortex, we extended the same analyses to primary somatosensory cortex (S1) data. The ensemble was consistently the best predictor across all neurons, scoring almost twice as well as the GLM (ratio of population means of 1.8 [1.2 – 2.2], 95% bootstrapped CI). XGBoost predicted spikes better than the GLM only for neurons with significant effect sizes for any of the four methods (i.e., with cross-validated pseudo-R2 scores two standard deviations above 0; mean comparative pseudo-R2 of 0.002 [0.0006 – 0.0045], 95% bootstrapped CI). Interestingly, the neural network performed worse than all other methods. We speculated that this could be related to the small covariate effect size in the S1 dataset, as we observed similar scores for the neural network on the M1 dataset in regimes of similar effect size, as well as on simulated data with GLM structure, small effect size, and similar firing rates (Supp. Fig. 2). We also found that a much smaller network (a single hidden layer with 20 nodes) performed better, but that max-norm or elastic-net regularization did not improve the results with the larger network. Neural networks may thus be poor choices for Poisson data with very small covariate effect sizes, though we see no theoretical reason why this should be the case. Overall, on this S1 dataset featuring generally low predictability, the tested methods displayed a range of performances, with the ensemble predicting the data nearly twice as well as the GLM alone.
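As an illustration only, the “much smaller network” mentioned above (a single hidden layer of 20 nodes) could be written with scikit-learn’s MLPRegressor as a generic stand-in; this uses a squared-error loss and is not a reproduction of the paper’s Poisson network:

```python
# Single hidden layer of 20 units; X_train / y_train are placeholders.
from sklearn.neural_network import MLPRegressor

small_net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000)
# small_net.fit(X_train, y_train)
# rates = small_net.predict(X_test)
```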

We asked if the same trends of performance would hold for the rat hippocampus dataset, which was characterized by very low mean firing rates but strong effect sizes. All methods were trained on a list of features representing the rat position and orientation, as

Figure 5: Encoding models for M1 trained on all the original features plus the engineered features show that modern ML methods can outperform the GLM even with standard feature engineering. (a) For this example neuron, inclusion of the computed features increased the predictive power of the GLM to the level of the neural net. XGBoost and the ensemble method also increased in predictive power. (b) The tuning curves for the example neuron are diffuse when projected onto the movement direction, indicating a high-dimensional dependence. (c) Even with feature engineering, XGBoost and the ensemble consistently achieve pseudo-R2 scores higher than the GLM, though the neural net does not. The selected neuron at left is marked with black arrows.


Fig 3 Fig 4 Fig 5


described in methods. We found that many neurons were described much better by XGBoost and the ensemble method than by the GLM (Fig. 6b). On average, the ensemble was almost ten times more predictive than the GLM (ratio of population means of 9.8 [5.4 – 100.0], 95% bootstrapped CI), and many neurons shifted from being completely unpredictable by the GLM (pseudo-R2 near zero) to very predictable by XGBoost and the ensemble (pseudo-R2 above 0.2). The neural network performed poorly, here not due to effect size as in S1 but likely due to the very low firing rates of most hippocampal cells (Supp. Fig. 2). Of the 58 neurons in the dataset, 54 had rates below 1 spike per second, and it was only on the four high-firing neurons that the neural network achieved pseudo-R2 scores comparable to the GLM. The relative success of XGBoost was interesting given the failure of the neural network, and supported the general observation that XGBoost can work well with smaller and sparser datasets than neural networks generally require. Thus for hippocampal cells, a method leveraging decision trees such as XGBoost or the ensemble is able to capture far more structure in the neural response than the GLM or the neural network.

Discussion

We contrasted the performance of GLMs with recent machine learning techniques at the task of predicting spike rates in three brain regions. We found that the tested ML methods predicted spike rates far more accurately than the GLM. Typical feature engineering only partially bridged the performance gap. The ML methods performed comparably well with and without feature engineering, indicating they could serve as convenient performance benchmarks for improving simpler encoding models. The consistently best method was the ensemble, which was an instance of XGBoost stacked on the predictions of the GLM, neural network, XGBoost, and a random forest. The ensemble and XGBoost could fit the data well even for very low spike rates, as in the hippocampus dataset, and for very low covariate effect sizes, as in the S1 dataset. These findings indicate that GLMs are not the best choice as neuroscience’s standard method of spike prediction.
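The stacking construction described above can be sketched roughly as follows (base models, hyperparameters, and library choices are stand-ins, not the authors’ exact pipeline):

```python
# Stacked ensemble: a second-stage XGBoost model trained on out-of-fold
# predictions of the first-stage models, so the stacker never sees a base
# model's prediction for data that model was trained on.
import numpy as np
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import cross_val_predict

def fit_stacked_ensemble(X, y):
    base_models = [PoissonRegressor(),
                   RandomForestRegressor(n_estimators=200),
                   xgb.XGBRegressor(objective="count:poisson", n_estimators=200)]
    P = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_models])
    stacker = xgb.XGBRegressor(objective="count:poisson", n_estimators=200).fit(P, y)
    fitted_bases = [m.fit(X, y) for m in base_models]
    return stacker, fitted_bases
```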

The ML methods we have put forward here have been implemented without substantial modification from methods that are already in wide use. We hope that this simple application might spur a wider adoption of these methods in the neurosciences, thereby increasing the power and efficiency of studies involving neural prediction without requiring complicated, application-

Figure 6: XGBoost and the ensemble method predicted the activity of neurons in S1 and the hippocampus better than a GLM. The diagonal dotted line in both plots is the line of equal predictive power with the GLM. (a) The ensemble predicted firing almost twice as well, on average, as the GLM for all neurons in the S1 dataset. XGBoost was better for neurons with higher effect sizes but poorly predicted neurons that were not predictable by any method. The neural network performed the worst of all methods. (b) Many neurons in the rat hippocampus were described well by XGBoost and the ensemble but poorly by the GLM and the neural network. The poor neural network performance in the hippocampus was due to the low rate of firing of most neurons in the dataset (Supp. Fig. 2). Note the difference in axes; hippocampal cells are generally more predictable than those in S1.


macaque S1 hippocampus

GLMs vs. NNs?



(No of course not!)

GLM is a special case of NN!


stimulus

spikes

membrane potential

imaging

neural activity

encoding models

encoding models

What if there’s no stimulus?


spikes

membrane potential

imaging

neural activity

latent variable models

latent encoding models

latent variable (unobserved or “hidden”)


[Figure: spike responses (simulated data), plotted as neuron index vs. time; credit: Jakob Macke]

[Figure: spike responses (simulated data, neuron index vs. time) together with the inferred latent variables, shown as trajectories over time; credit: Jakob Macke]

[Figure: spike responses (neuron index vs. time); credit: Jakob Macke]

Generative model: a common input x drives the population through couplings C; each spike response y is a nonlinear function of this drive, plus noise.

latent variable models = GLMs where we don’t know the common input x?

goal: find shared structure underlying y

chicken and egg problem


Why are latent variable models hard to work with?

• hard to compute likelihood!

sensory encoding model (observed stimulus x, spike response y): fit using a Poisson GLM, whose log-likelihood is

\log P(Y \mid X) = \sum_t \left( y_t \log \lambda_t - \lambda_t \right)

latent variable model (x unobserved): the likelihood requires an integral over the latent!

\log P(Y) = \log \int P(Y \mid X)\, P(X)\, dX

latent dynamical model: latent states evolve over time, x_{t-1} \to x_t \to x_{t+1} (dynamics), and each x_t generates an observed response y_t. The likelihood

P(Y) = \int \prod_t P(y_t \mid x_t)\, P(x_t \mid x_{t-1})\, dx_{1:T}

combines the observation terms and the dynamics terms, and requires a high-dimensional integral over the entire latent trajectory.
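One standard concrete instance of such a model (a textbook form, not taken from these slides) is a linear dynamical system with Poisson observations:

x_{t+1} = A x_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, Q)

y_t \mid x_t \sim \mathrm{Poisson}\!\left(\exp(C x_t + b)\right)

With Poisson observations the integral over x_{1:T} has no closed form, which is why the fitting strategies below are needed.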


Fitting Latent Variable Models

1. Sampling (“MCMC”): fully Bayesian inference
• procedure for sampling the joint distribution:
1) sample latents from the conditional over latents
2) sample parameters from the conditional over parameters

2. Expectation maximization (EM)
• Alternate updating the parameters and the posterior over latents.

3. Variational inference
• Optimize a lower bound on the posterior over parameters.
• Easy with modern probabilistic programming languages (STAN, Edward).
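For reference, the lower bound mentioned in items 2 and 3 is the standard evidence lower bound (written here in its usual form, not copied from the slides): for any distribution q(X) over the latents,

\log P(Y \mid \theta) \;\ge\; \mathbb{E}_{q(X)}\!\left[\log P(Y, X \mid \theta)\right] - \mathbb{E}_{q(X)}\!\left[\log q(X)\right]

with equality when q(X) = P(X \mid Y, \theta). EM alternates between setting q to this posterior (E-step) and maximizing the bound over \theta (M-step); variational inference instead restricts q to a tractable family and optimizes the bound over both q and \theta.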

Latent variable models are defined by two quantities: a distribution over the latent, P(x), and a mapping from latent to observations, P(y|x).

latent                     | mapping           | Model
discrete                   | Gaussian          | Mixture of Gaussians (“clustering”)
Gaussian                   | linear, Gaussian  | Factor analysis (PCA is a special case)
linear Gaussian dynamics   | linear, Gaussian  | Linear Dynamical Systems (LDS) (“Kalman filter”)
discrete transitions       | any               | Hidden Markov Model (“HMM”)
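As a quick illustration of the second row of this table (Gaussian latent, linear-Gaussian mapping), factor analysis can be fit with off-the-shelf tools; a sketch on simulated Gaussian activity (not spike counts):

```python
# Factor analysis: Gaussian latents mapped linearly (plus Gaussian noise) to a
# 50-neuron "population"; the fitted model recovers a 2-D latent space.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
Z = rng.normal(size=(2000, 2))                   # two latent trajectories
C = rng.normal(size=(2, 50))                     # loadings onto 50 neurons
Y = Z @ C + 0.5 * rng.normal(size=(2000, 50))    # observed activity

fa = FactorAnalysis(n_components=2).fit(Y)
Z_hat = fa.transform(Y)                          # inferred (posterior mean) latents
```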

variational latent Gaussian process (vLGP) [Zhao & Park 2016]: Gaussian process latent, Poisson GLM mapping


variational latent Gaussian process (vLGP) [Zhao & Park 2016]

• 63 simultaneously-recorded V1 neurons [Graf et al 2011]
• stimuli: drifting sinusoidal gratings

https://www.youtube.com/watch?v=CrY5AfNH1ik

Summary

• descriptive statistical “encoding” models
• seek to capture structure in data
• formal tools for comparing models
• encoding and decoding analyses via Bayes rule
• models are modular, easy to build / extend / generalize

• large-scale recording technology advancing rapidly

• lots of interesting structure in high-D neural data

• big opportunities in computational / statistical neuroscience for developing new methods and models to find / exploit this structure!

Big Picture