Transcript of Statistical Models for Neural Data: from Regression / GLMs to ...
Jonathan Pillow, Princeton Neuroscience Institute
Statistical Models for Neural Data: from Regression / GLMs to Latent Variables
Tutorial, Cosyne 2018
Retinal responses to white noise
Shlens, Field, Gauthier, Greschner, Sher, Litke & Chichilnisky (2009).
(ON parasol cells)
stimulus
spikes
membrane potential
imaging
neural activity
neural coding problem
• How are stimuli and actions encoded in neural activity?
• What aspects of neural activity carry information?
stimulus
spikes
membrane potential
imaging
neural activity
neural coding problem
encoding models
• develop flexible statistical models of P(y|x)
• quantify information carried in neural responses
Approach:
spikes
membrane potential
imaging
neural activity
encoding models
“regression models”
Nature Neuroscience, volume 17, number 3, March 2014
Columns correspond to the activities at different times and for different movements. W contains the weights for the linear mapping from neurons to muscles. That is, W specifies the weighted sum of neurons’ firing rates that drives each muscle.
To build intuition about this model, consider the following extreme, unphysiologically simplified situation. Imagine that just two excitatory neurons synapsed directly on a muscle, and this muscle produced force proportional to the sum of its two inputs. As long as the sum of the two inputs remained constant, the muscle would produce a constant amount of force: no ‘gate’ or ‘switch’ is required. The activity of these two neurons can be represented as a point in a two-dimensional firing rate space. Their pattern of activity over time is a trajectory through this space35–37. In the state space, the constant-sum line forms an ‘output-null’ dimension (Fig. 2). The muscle’s force output will change only if there is a change in the sum of the neurons’ firing rates; we term the direction in which that sum changes the ‘output-potent’ dimension (Fig. 2). This idea also generalizes to more complex cases: if one of these hypothetical neurons had a net inhibitory effect, the dimensions would be switched. With many neurons, we would expect multiple output-null dimensions. If there were multiple independent muscles, we would need multiple output-potent dimensions. This is all to say that activity in the output-potent dimensions would be read out by the target muscle or brain area, whereas activity in output-null dimensions would not be visible to the target. Formally, any activity changes in output-null dimensions fall in the null space of W. Conversely, activity changes in output-potent dimensions fall in the row space of W.
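The null-space / row-space decomposition described in this paragraph can be computed directly with an SVD. A minimal numpy sketch of the two-neuron, one-muscle toy example (the function name and weights are illustrative, not from the paper):

```python
import numpy as np

def potent_null_spaces(W, tol=1e-10):
    """Split neural state space into output-potent (row space of W)
    and output-null (null space of W) dimensions via the SVD.
    W maps neural firing rates (n_neurons) to muscles (n_muscles)."""
    U, s, Vt = np.linalg.svd(W)
    rank = int(np.sum(s > tol))
    potent = Vt[:rank].T        # row-space basis: changes the muscle output
    null = Vt[rank:].T          # null-space basis: invisible to the muscles
    return potent, null

# Two excitatory neurons driving one muscle with equal weights:
W = np.array([[1.0, 1.0]])
potent, null = potent_null_spaces(W)

# Moving along the output-null direction leaves the output unchanged,
# while the output-potent direction changes it:
assert np.allclose(W @ null, 0)
assert not np.allclose(W @ potent, 0)
```

With equal weights the output-null direction is proportional to (1, −1): one neuron's rate can rise while the other falls with no change in force, exactly the constant-sum line in Figure 2.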
The existence of output-potent and output-null dimensions is likely inevitable, as there are more neurons than muscles. The key question is whether the brain exploits these dimensions to control when circuits communicate (as opposed to relying on nonlinear thresholds or a time-varying gain). The hypothesis that output-null dimensions are used to control communication leads to two predictions. First, if this mechanism operates between cortex and muscles, then during motor preparation changes in neural firing rates should occur in combinations that produce changes in output-null dimensions but do not produce changes in dimensions that are output-potent with respect to the muscles (Fig. 2). Second, if this same mechanism operates between cortical areas, we would expect PMd preparatory activity to preferentially occupy dimensions that are output-null with respect to M1.
If this latter prediction is correct, this could help produce the well-known reduction in preparatory activity between PMd and M1.
Exploitation of output-null dimensions is unlikely to leave any particular signature at the level of single neurons. Changing state along the output-null dimensions corresponds to activity changes in most of the relevant neurons (Fig. 2). Such activity cancels out only at the level of the population output. Intriguingly, though, this model tends to produce neurons with mismatches in tuning between the preparatory and movement periods, as has been observed previously25–28. Thus, if one averages over neurons based on their preferred reach condition during movement, their preparatory tuning largely averages away (Supplementary Fig. 1). This mismatch is suggestive, but it forms only an indirect test and is neither necessary nor sufficient to demonstrate that such a model is correct (Online Methods). Testing this hypothesis requires both knowing the population response and estimating the output-null and output-potent dimensions.
To test our hypothesis, we used a variant of a standard delayed-reaching task with two monkeys, J and N (Fig. 1 and Online Methods). We recorded the population response using both single- and multiunit neural activity (using single moveable electrodes for data sets J and N, and silicon electrode arrays for data sets JA and NA) and muscle activity (using percutaneous electrodes). Trial-averaged data were used except where noted: the primary goal of these analyses was to explain how there can be preparatory tuning without movement, not to explain trial-by-trial variability. Thus, all repeats of the same condition were averaged to produce a single rate versus time. The same reaches were required every day and monkeys were highly practiced. Repeated reaches to the same targets were thus extremely similar to one another over the course of months (Supplementary Fig. 2). Data from different days were therefore combined.
As a basic test for the plausibility of exploiting output-null dimensions, we can search for neuron pairs whose preparatory activity
[Figure 1 panels: maze layout and task timeline; vertical target and cursor position, trial-averaged deltoid EMG (a.u.), firing rate of one PMd neuron (spikes per s); event markers: Target, Go, Move]
Figure 1 Task and typical data. (a) Layout of maze task. One typical trial shown. The same mazes were repeated many times; each maze is hereafter called a ‘condition’. (b) Top, task timeline. The monkey initially touched a central spot with a cursor projected slightly above his fingertip; then a target and (typically) barriers appeared. On some trials, two inaccessible distractor ‘targets’ also appeared. After the Go cue (cessation of slight target jitter, extinguishing of central spot), the monkey made a curved reach around the barriers to touch the accessible target, leaving a white trail on the screen. If no barriers were present, reaches were straight. Middle, trial-averaged deltoid EMG; a.u., arbitrary units. Bottom, firing rate of one PMd neuron. Target, target onset; Go, go cue; Move, movement onset. Flanking traces show s.e.m. Maze identifier 100, neuron J-PM48, EMG recording J-PD10.
[Figure 2 schematic: firing rate of neuron 1 vs firing rate of neuron 2, with baseline, preparation, and go cue; darker and lighter trajectories for reach left and reach right; PSTH-style views of the output-potent and output-null projections; T = target onset, G = go cue]
Figure 2 Simplified output-null model. For illustration, assume a muscle receives input from two neurons and produces a response that is the linear sum of the inputs. If the sum is constant (output-null dimension), the muscle cannot distinguish between input 1 being high and 2 low, and vice versa. When the sum changes (output-potent dimension), muscle output will change. If preparatory neural activity changes only within the output-null dimension (two different reaches illustrated in darker and lighter shades), then the muscle’s activity remains constant; when neural activity changes in the output-potent dimension also, movement ensues. Insets: PSTHs for the neurons and PSTH-like views of output-potent and output-null dimensions. T, target onset; G, go cue; FR, firing rate.
hand position
[Kaufman et al. 2014]
• not restricted to sensory variables
neural coding problem
spikes
membrane potential
imaging
neural activity
encoding models
“external covariates”
[Hardcastle et al 2015]
“regression models”
• not restricted to sensory variables
neural coding problem
spikes
membrane potential
imaging
neural activity
latent variable models
• capture hidden structure underlying neural activity
latent encoding models
latent variable (unobserved or “hidden”)
(e.g. low-dimensional or discrete states)
spikes
membrane potential
imaging
neural activity
latent variable models
latent variable
latent dynamics
latent dynamical encoding models
• capture hidden dynamics underlying neural activity
(unobserved or “hidden”)
model desiderata
• fittability / tractability (can be fit to data)
• richness / flexibility (capture realistic neural properties)
On this spectrum: linear Gaussian → GLM (the “sweet spot”) → Hodgkin-Huxley → multi-compartment
descriptive statistical models: What is the code?
normative theories (e.g. “efficient coding”): Why does the code take this form?
anatomy, biophysics: How is it implemented?
Outline
1. Spike count models & Maximum Likelihood
2. Spike train models (GLMs with spike history)
3. Multiple Spike Train Models (GLMs with coupling)
4. Regularization
5. Beyond GLM
6. Latent variable models
simple example #1: linear Poisson neuron
encoding model: spike rate λ = θ·x (stimulus x, parameter θ); spike count y ~ Poiss(λ)
important distributions
• Gaussian (continuous, symmetric about its mean)
• Poisson (counts 0, 1, 2, …): λ = mean = variance
• others that may come up: Bernoulli, binomial, multinomial, exponential, gamma
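The Poisson property that the mean equals the variance is easy to verify by simulation; a short numpy sketch (the rate value is chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 5.0                              # Poisson rate parameter λ
y = rng.poisson(lam, size=200_000)     # draw many spike counts

# For a Poisson distribution, mean and variance both equal λ:
print(y.mean())   # close to 5
print(y.var())    # close to 5
```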
conditional distribution P(y|x)
[Plot: spike count y (0–60) vs contrast x (0–40), with P(y|x) shown for several slices]
Maximum Likelihood Estimation:
• given observed data (all spike counts and all stimuli), find the parameters that maximize the product of single-trial probabilities
Q: what assumption are we making about the responses?
A: conditional independence across trials!
Q: when do we call this probability a likelihood?
A: when considering it as a function of the parameters θ!
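The product over trials and the “function of θ” view can be made concrete for the linear Poisson neuron of example #1: evaluate the summed log probability on a grid of θ values and pick the maximizer. A simulation sketch (the data and grid are illustrative, not from the tutorial):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
theta_true = 1.5
x = rng.uniform(1, 40, size=500)       # stimulus contrasts
y = rng.poisson(theta_true * x)        # spike counts, rate = θ·x

# Conditional independence across trials: the data likelihood is a
# product of single-trial probabilities, i.e. a sum in log space.
thetas = np.linspace(0.5, 3.0, 251)    # evaluate the likelihood over θ
loglik = np.array([poisson.logpmf(y, th * x).sum() for th in thetas])

theta_hat = thetas[np.argmax(loglik)]
print(theta_hat)   # close to 1.5
```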
Maximum Likelihood Estimation:
[Plot: spike count vs contrast]
• given observed data, find the θ that maximizes the likelihood
• could in theory do this by turning a knob for θ
Likelihood function: the probability of the observed data, viewed as a function of θ.
Because data are independent across trials, the likelihood is a product of single-trial probabilities, so the log-likelihood is the corresponding sum.
[Plots: likelihood and log-likelihood as functions of θ]
Properties of the MLE (maximum likelihood estimator)
• consistent (converges to the true θ in the limit of infinite data)
• efficient (converges as quickly as possible, i.e., achieves the minimum possible asymptotic error)
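Consistency can be illustrated numerically for the linear Poisson neuron, whose MLE has the closed form θ̂ = Σy / Σx (set the log-likelihood derivative to zero). A simulation sketch; the sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true = 2.0

def mle(n):
    # Linear Poisson neuron y ~ Poiss(θ·x); closed-form MLE θ̂ = Σy / Σx.
    x = rng.uniform(1, 10, size=n)
    y = rng.poisson(theta_true * x)
    return y.sum() / x.sum()

# Consistency: the estimation error shrinks as the data set grows.
errs = [abs(mle(n) - theta_true) for n in (100, 10_000, 1_000_000)]
print(errs)
```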
[Plot: spike count vs contrast; all slices of the encoding distribution P(y|x) have the same width]
Log-Likelihood
Differentiate, set to zero, and solve for θ.
Maximum-Likelihood Estimator: (the “least squares regression” solution)
(Recall that for Poisson, mean = variance.)
example #3: unknown neuron
[Plot: spike count (0–100) vs contrast (−25 to 25)]
Be the computational neuroscientist: what model would you use?
Example 3: unknown neuron
More general setup:
[Plot: spike count (0–100) vs contrast (−25 to 25)]
firing rate is nonlinear, with Poisson firing: this is a GLM!
cascade: stimulus → stimulus filter (dimensionality reduction) → exponential nonlinearity f (nonlinear stretching) → spike rate λ(t) → Poisson spiking (noise) → spike count
“basic” Poisson generalized linear model (GLM), also known as the Linear-Nonlinear-Poisson (LNP) model
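The cascade can be simulated in a few lines. A minimal numpy sketch assuming an exponential nonlinearity; the filter shape, bin size, and dimensions are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
T, D = 5000, 8                       # time bins, stimulus dimensions
dt = 0.01                            # bin size in seconds

k = np.exp(-np.arange(D) / 3.0)      # stimulus filter (dimensionality reduction)
x = rng.normal(size=(T, D))          # white-noise stimulus

# LNP cascade: linear filter -> exponential nonlinearity -> Poisson spiking
rate = np.exp(x @ k)                 # λ(t), spikes per second
spikes = rng.poisson(rate * dt)      # spike count in each bin

print(spikes.sum())
```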
• also known as a “cascade” model
What is a GLM?
Be careful about terminology (Nelder 1972):
General Linear Model (GLM) ≠ Generalized Linear Model (GLM)

1. General Linear Model: Linear + Noise
y = ~θ · ~x + ε

2. Generalized Linear Model: Linear + Nonlinear + Noise
y = f(~θ · ~x) + ε

In both, the linear term performs “dimensionality reduction,” and the noise comes from an exponential-family distribution (examples: 1. Gaussian, 2. Poisson).
Terminology:
• the noise distribution: the “distribution function”
• ~θ: the “parameters”
• the inverse of f: the “link function”; f itself: “the nonlinearity”
Applying it to data

[binary white-noise stimulus and spike-train response over time]

y_t = k · x_t + noise

k = linear filter;  x_t = vector stimulus at time t;  y_t = response at time t

walk through the data one time bin at a time (t = 1, 2, 3, …)
Computing maximum likelihood estimate

1. “Linear-Gaussian” GLM:   Y = X k + noise

k̂ = (XᵀX)⁻¹ XᵀY
  = (stimulus covariance)⁻¹ × spike-triggered avg (STA)
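The normal-equations estimate above can be checked numerically. A minimal NumPy sketch (in Python rather than the slides' MATLAB; the filter shape, sizes, and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 time bins, 25-element stimulus filter
T, D = 500, 25
X = rng.standard_normal((T, D))                 # design matrix: row t = stimulus vector x_t
k_true = np.exp(-np.arange(D) / 5.0)            # an assumed "true" filter
Y = X @ k_true + 0.5 * rng.standard_normal(T)   # Y = X k + noise

# ML estimate: k_hat = (X'X)^{-1} X'Y
# = inverse stimulus covariance times the spike-triggered average
k_hat = np.linalg.solve(X.T @ X, X.T @ Y)
```

For white-noise stimuli, XᵀX is close to a multiple of the identity, so k̂ is close to the (rescaled) STA itself.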
2. Poisson GLM:   k = glmfit(X,Y,‘Poisson’);
maximum likelihood fit (assumes exponential nonlinearity by default)
3. Bernoulli GLM (“logistic regression”, outputs 0 and 1):   k = glmfit(X,Y,’binomial’);
(assumes logistic nonlinearity by default)
GLM summary

1. Linear-Gaussian GLM:   Y | X, k ~ N(Xk, σ²I)   (continuous y)
   MLE:  k̂ = (XᵀX)⁻¹XᵀY
   log-likelihood:  log P(Y|X,k) = −(1/2σ²) ‖Y − Xk‖² + const

2. Poisson GLM:   y_t | x_t, k ~ Poiss(f(x_t · k))   (integer counts)
   log-likelihood:  log P(Y|X,k) = Σ_t [ y_t log λ_t − λ_t ] + const,   λ_t = f(x_t · k)

3. Bernoulli GLM:   y_t | x_t, k ~ Ber(f(x_t · k))   (binary counts)
   log-likelihood:  log P(Y|X,k) = Σ_t [ y_t log p_t + (1 − y_t) log(1 − p_t) ],   p_t = f(x_t · k)
   (“logistic regression” if f is the logistic function)
Poisson GLM

stimulus → stimulus filter k → exponential nonlinearity f → spike rate λ(t) → Poisson spiking

• problem: assumes spiking depends only on stimulus!
Poisson GLM with spike-history dependence

stimulus → stimulus filter k → + → exponential nonlinearity f → spike rate → probabilistic spiking
(spikes feed back through post-spike filter h)

(Truccolo et al 2004, Gerstner 2001)

• output: no longer a Poisson process
• interpretation: “soft-threshold” integrate-and-fire model
spike rate vs. filter output:
• traditional IF: spike rate jumps to ∞ at a “hard threshold”
• “soft-threshold” IF: spike rate increases smoothly with filter output
• interpretation: “soft-threshold” integrate-and-fire model
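One way to see this interpretation is to simulate the model forward. A sketch under invented parameters (the filter shapes, baseline, and bin size are assumptions for illustration, not values from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(1)

dt = 0.001                                   # assumed 1 ms bins
T = 2000
stim = rng.standard_normal(T)                # white-noise stimulus

k = np.exp(-np.arange(20) / 5.0)             # assumed stimulus filter
h = -5.0 * np.exp(-np.arange(30) / 3.0)      # negative post-spike filter -> refractoriness
b = 2.0                                      # assumed baseline log-rate

stim_drive = np.convolve(stim, k)[:T]        # causal stimulus filtering
y = np.zeros(T, dtype=int)

for t in range(T):
    # feedback from recent spikes through the post-spike filter h
    hist_drive = sum(h[j - 1] * y[t - j] for j in range(1, min(t, len(h)) + 1))
    rate = np.exp(b + stim_drive[t] + hist_drive)   # exponential ("soft-threshold") nonlinearity
    y[t] = rng.poisson(rate * dt)                   # conditionally Poisson spike count
```

Spiking is conditionally Poisson in each bin, but the feedback through h makes the overall output a non-Poisson point process, as the slide notes.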
GLM dynamic behaviors

stimulus + post-spike filter h(t) → filter outputs (“currents”) → p(spike)

different post-spike filters produce:
• irregular spiking
• regular spiking
• adaptation
• bursting
GLM dynamic behaviors (from Izhikevich)

[4×4 panel grid A–P: tonic spiking, phasic spiking, tonic bursting, phasic bursting; mixed mode, spike frequency adaptation, type I, type II; spike latency, resonator, integrator, rebound spike; rebound burst, threshold variability, bistability I, bistability II]

Figure 6: Suite of dynamical behaviors of Izhikevich and GLM neurons. Each panel, top to bottom: stimulus (blue), Izhikevich neuron response (black), GLM responses on five trials (gray), stimulus filter (left, blue), and post-spike filter (right, red). Black line in each plot indicates a 50 ms scale bar for the stimulus and spike response. (Differing timescales reflect timescales used for each behavior in the original Izhikevich paper (Izhikevich, 2004).) Stimulus filter and post-spike filter plots all have 100 ms duration.
(Weber & Pillow 2017)
Izhikevich neuron
GLM spikes
stimulus
GLM parameters
multi-neuron GLM

stimulus → stimulus filter → + → exponential nonlinearity → probabilistic spiking   (neuron 1, neuron 2)
with post-spike filters and coupling filters between neurons
Example dataset
• stimulus = binary flicker
• parasol retinal ganglion cell spike responses
Uzzell et al (J Neurophys 04)
Stimulus-only GLM

design matrix X (rows = time bins, columns = time lags of the stimulus), filter k, spike response Y
Stimulus + Spike-History GLM

design matrix X = [stimulus portion | spike-history portion], filter k, spike response Y
Stimulus + History + 3-Neuron Coupling GLM

design matrix X = [stimulus portion | neuron 1 spike-hist | neuron 2 spike-hist | neuron 3 spike-hist | neuron 4 spike-hist], filter k, spike response Y
Fitting: Maximum Likelihood

• maximize log-likelihood for filters {k, h1, h2, …, hn} given the data

log P(Y|X) = Σ_t [ y_t log λ_t − λ_t ],   with firing rate λ_t

• log-likelihood is concave • no local maxima [Paninski 04]
convexity and concavity
concave convex
• everywhere downward curvature
• everywhere upward curvature
• maximizing concave function ⟺ minimizing a convex function
• preclude existence of non-global local optima
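Because the Poisson log-likelihood above is concave in the filter, a generic gradient-based optimizer finds the global maximum. A sketch with simulated data (the sizes and the choice of SciPy's BFGS are mine, not the tutorial's):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Hypothetical data for illustration
T, D = 1000, 10
X = rng.standard_normal((T, D))
k_true = rng.standard_normal(D) * 0.3
Y = rng.poisson(np.exp(X @ k_true))      # Poisson GLM with exponential nonlinearity

# negative log-likelihood: -sum_t [ y_t log(lambda_t) - lambda_t ],
# lambda_t = exp(x_t . k)  (dropping the constant log y_t! term)
def negloglik(k):
    return -(Y @ (X @ k) - np.exp(X @ k).sum())

def grad(k):
    return -X.T @ (Y - np.exp(X @ k))    # gradient of the negative log-likelihood

res = minimize(negloglik, np.zeros(D), jac=grad, method="BFGS")
k_hat = res.x
```

Concavity guarantees that any local optimum the optimizer reaches is the global maximum-likelihood estimate.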
receptive fields
[figure: spatial receptive fields and spike rates of 11 simultaneously recorded cells, numbered 1–11; scale bars: 75 sp/s, 50 ms]
modeling correlation structure in neural spike trains
data / coupled GLM / uncoupled GLM
Pillow et al 2008

capturing dependencies in multi-neuron responses
cross-correlations (cell # vs. cell #) [Pillow et al 2008]
retinal receptive fields
Decoding

spikes y → stimulus x = ?
• estimate stimuli from the observed spike times
• tool for comparing different encoding models

Bayesian Decoding
Bayes’ rule:  posterior ∝ likelihood × prior
compare: “independent” (uncoupled GLM) vs. “joint encoding” (coupled GLM)
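For the special case of a linear-Gaussian encoding model with a Gaussian prior on the stimulus, Bayes' rule gives the MAP decode in closed form. A toy sketch (all sizes and parameters invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical linear-Gaussian encoding model: y = K x + noise
Dx, Dy = 20, 40
K = rng.standard_normal((Dy, Dx)) / np.sqrt(Dx)   # assumed encoding weights
sig = 0.2                                         # assumed noise std
x_true = rng.standard_normal(Dx)                  # stimulus drawn from N(0, I) prior
y = K @ x_true + sig * rng.standard_normal(Dy)

# posterior ∝ likelihood × prior; for Gaussians the MAP (= posterior mean)
# solves (K'K/sig^2 + I) x = K'y/sig^2
A = K.T @ K / sig**2 + np.eye(Dx)                 # posterior precision
x_map = np.linalg.solve(A, K.T @ y / sig**2)
```

For the Poisson GLMs used in the slides the posterior has no closed form, but the same MAP principle applies via numerical optimization of log-likelihood plus log-prior.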
Decoding Comparison

[bar plot: decoding performance, linear decoding vs. Bayesian decoding, each with and without coupling — coupled model gives a 20% increase]
[Pillow et al 2008]
Modern statistics
• more dimensions than samples (D regressors, N observations, D > N)
• fewer equations than unknowns! • no unique solution

Y = Xw + noise
Simulated Example
• 100-element filter (D=100) • 100 noisy samples (N=100)
true w vs. maximum likelihood estimate (maximize log-likelihood)

“overfitting” - parameters fit to details in the training data that are not useful for predicting new data
Simulated Example
• 100-element filter (D=100) • 100 noisy samples (N=100)
true w vs. maximum likelihood estimate

“ridge regression”: maximize log-likelihood with a penalty on big weights (“L2 shrinkage”)
- biased, but gives improved performance for appropriate choice of λ (James & Stein 1960)

comparison of estimators: maximum likelihood; ridge (“L2 shrinkage”); Lasso (“L1 shrinkage”, sparse); automatic relevance determination (ARD, sparse); automatic smoothness determination (ASD)
Simulated Example
• 100-element filter (D=100) • 100 noisy samples (N=100)
true w vs. maximum likelihood estimate

“smoothed”: maximize log-likelihood with a smoothness penalty
Q: how to set the regularization strength λ?  Simplest answer: use cross-validation!
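A minimal sketch of that recipe, using ridge regression with a single held-out set (the sizes, noise level, and candidate λ grid are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Overfitting regime, as in the simulated example: D = 100 weights, N = 100 samples
N, D = 100, 100
X = rng.standard_normal((N, D))
w_true = np.sin(np.linspace(0, 3 * np.pi, D))     # an assumed smooth "true" filter
Y = X @ w_true + 2.0 * rng.standard_normal(N)

def ridge(X, Y, lam):
    # maximize log-likelihood minus (lam/2)*||w||^2  ->  closed-form solution
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

# pick lambda by holding out part of the data
Xtr, Ytr = X[:80], Y[:80]
Xte, Yte = X[80:], Y[80:]
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
errs = [np.mean((Xte @ ridge(Xtr, Ytr, lam) - Yte) ** 2) for lam in lams]
lam_best = lams[int(np.argmin(errs))]
```

K-fold cross-validation (averaging the held-out error over several splits) gives a less noisy choice of λ than a single split.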
GLM tutorial (matlab):code: https://github.com/pillowlab/GLMspiketraintutorial data: available on request from [email protected]
• tutorial1_PoissonGLM.m - fitting of a linear-Gaussian GLM and Poisson GLM (aka LNP model) to RGC neurons stimulated with temporal white noise stimulus.
• tutorial2_spikehistcoupledGLM.m - fitting of a Poisson GLM with spike-history and coupling between neurons.
• tutorial3_regularization_linGauss.m - regularizing linear-Gaussian model parameters using maximum a posteriori (MAP) estimation under two kinds of priors:◦ (1) ridge regression (aka "L2 penalty");◦ (2) L2 smoothing prior (aka "graph Laplacian”).
• tutorial4_regularization_PoissonGLM.m - MAP estimation of Poisson-GLM parameters using same two priors as in tutorial3.
GLM summary
• linear (“dim reduction”) + nonlinear + noise
• incorporate spike-history via “spike history” filter
• rich dynamical properties: refractoriness, bursting, adaptation
• incorporate correlations between neurons via “coupling” filters
• flexible tool for encoding & decoding analyses
• regularize to reduce overfitting (essential w/ correlated stimuli)
polynomial models

Taylor series expansion of a function f(x) in n dimensions:
const + vector + matrix + 3-tensor + …
# parameters: 1, n (20), n² (400), n³ (8000)

Volterra / Wiener Kernels
• from “systems identification” literature (1960s-70s)
• white noise stimuli
• estimate kernels using moments of spike-triggered stimuli
Lee & Schetzen 1965; Marmarelis & Naka 1972; Korenberg & Hunter 1986
Why are Volterra/Wiener models (generally) bad?

• no output nonlinearity
• polynomials give poor fit to neural nonlinearities (e.g., rectifying, saturating)

[plot: firing rate (sp/s) vs. stimulus projection — true response vs. linear and quadratic model fits]

• responses may depend on more than one projection of stimulus!
• emphasis on dimensionality reduction
• no longer technically a GLM if fitting nonlinearity f
multi-filter LNP

Estimators:
• Spike-triggered covariance (STC) [de Ruyter & Bialek 1998, Schwartz et al 2006]
• Generalized Quadratic Model (GQM) [Park & Pillow 2011; Park et al 2013; Rajan et al 2013]
• maximally informative dimensions (MID) / maximum likelihood [Sharpee et al 2004; Williamson et al 2015]
extending GLM to conductance-based model

stimulus → exc filter / inh filter → conductances → membrane dynamics → nonlinearity → noise → spikes (with post-spike filter)

conductances:  g_e(t) = f_c(k_e · x(t)),   g_i(t) = f_c(k_i · x(t))
membrane dynamics:  dV/dt = g_l(E_l − V) + g_e(E_e − V) + g_i(E_i − V)
inst. spike rate:  λ(t) = f(V(t))

• shunting inhibition • adaptive changes in dynamics
[Latimer et al 2014]
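The membrane equation above can be integrated with forward Euler to see how conductance inputs drive the voltage and hence the rate. A sketch with assumed constants (the reversal potentials, leak, nonlinearity, and stimulus drive are illustrative choices, not values fit to any data):

```python
import numpy as np

rng = np.random.default_rng(5)

dt = 0.1                      # assumed time step (ms)
T = 5000
El, Ee, Ei = -70.0, 0.0, -80.0   # assumed leak / excitatory / inhibitory reversals (mV)
gl = 0.1                      # assumed leak conductance

def fc(z):                    # soft-rectifying conductance nonlinearity
    return np.log1p(np.exp(z))

# conductance drives standing in for filtered stimulus k_e.x(t), k_i.x(t)
drive = 0.5 * rng.standard_normal(T)
ge = fc(drive)                # g_e(t) = f_c(k_e . x(t))
gi = fc(-drive)               # g_i(t) = f_c(k_i . x(t))

V = np.empty(T)
V[0] = El
for t in range(1, T):
    # dV/dt = g_l(E_l - V) + g_e(E_e - V) + g_i(E_i - V)
    dV = gl * (El - V[t-1]) + ge[t] * (Ee - V[t-1]) + gi[t] * (Ei - V[t-1])
    V[t] = V[t-1] + dt * dV

rate = np.exp((V + 60.0) / 5.0)   # inst. spike rate lambda(t) = f(V(t)), assumed exponential f
```

Because conductances divide as well as add (the g terms multiply V), inhibition here is shunting rather than purely subtractive, which is one thing the plain GLM cannot capture.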
Linear filters: excitatory and inhibitory, estimated from spikes vs. from measured conductances [gain vs. time (ms)]

• intracellular recordings in macaque parasol RGCs (Fred Rieke)

measured conductances vs. model fits (scale bars: 250 ms, 10 nS):
excitatory — fit to conductance R²=0.83, fit to spikes R²=0.63
inhibitory — fit to conductance R²=0.63, fit to spikes R²=0.51
extending GLM to conductance-based model
[Latimer et al 2014]
many other biophysically oriented extensions
other aspects of neuronal processing into the linear stimulus-processing framework, and can be used to model nonlinear stimulus processing through predefined nonlinear transformations [31,32,58]. Importantly, this approach also provides a foundation for parameter estimation for the NIM.

A principal motivation for the NIM structure is that if the neuronal output at one level is well described by an LN model, downstream neurons will receive inputs that are already rectified (or otherwise nonlinearly transformed). Thus, we use LN models to represent the inputs to the neuron in question, and the neuron’s response is given by a summation over these LN inputs followed by the neuron’s own spiking nonlinearity (Fig. 2B). Importantly, this allows us to account for the rectification of a neuron’s inputs imposed by the spike-generation process. The NIM can thus be viewed as a ‘second-order’ generalization of the LN model, or an LNLN cascade [59,60]. Previous work from our lab [45] cast this model structure in a probabilistic form, and suggested several statistical innovations in order to fit the models using neural data [45,61,62]. Here, we present a general and detailed framework for NIM parameter estimation that greatly extends the applicability of the model. This model structure has also been suggested for applications outside of neuroscience in the form of projection pursuit regression [63], including generalizations to response variables with distributions from the exponential family [64].
The processing of the NIM is comprised of three stages (Fig. 2C): (a) the filters k_i that define the stimulus selectivity of each input; (b) the static ‘upstream’ nonlinearities f_i(.) and corresponding linear weights w_i which determine how each input contributes to the overall response; and (c) the spiking nonlinearity F[.] applied to the linear sum over the neuron’s inputs. The predicted firing rate r(t) is then given as:

r(t) = F[ Σ_i w_i f_i(k_i · s(t)) + h · x(t) ],   (1)
where s(t) is the (vector-valued) stimulus at time t, x(t) represents any additional covariates (such as the neuron’s own spike history), and h is a linear filter operating on x. Note that equation (1) reduces to a GLM when the f_i(.) are linear functions. The w_i can also be extended to include temporal convolution of the subunit contributions to model the time course of post-synaptic responses associated with individual inputs [45], as well as ‘spatial’ convolutions to account for multiple spatially distributed inputs with similar stimulus selectivity [65]. Since equivalent models can be produced by rescaling the w_i and f_i(.) (see Methods), we constrain the subunit weights w_i to be either ±1. Because we generally assume the f_i(.) are rectifying functions, the w_i thus specify whether each subunit will have an ‘excitatory’ or ‘inhibitory’ influence on the neuron.
Parameter estimation for the NIM is based on maximum likelihood (or maximum a posteriori) methods similar to those used with the GLM [6–8]. Assuming that the neuron’s spikes are described in discrete time by a conditionally inhomogeneous Poisson count process with rate function r(t), the log-likelihood (LL) of the model parameters given an observed set of spike counts R_obs(t) is given (up to an overall constant) by:

LL = Σ_t [ R_obs(t) log r(t) − r(t) ].   (2)
To find the set of parameters that maximize the likelihood (eq. 2), we adapt methods that allow for efficient parameter optimization of the GLM [7]. First, we use a parametric spiking nonlinearity given by F[x] = a·log[1+exp(b(x−h))], with scale a, shape b, and offset h. Other functions can be used, so long as they satisfy conditions specified in [7]. This ensures that the likelihood surface will be concave with respect to linear parameters inside the spiking nonlinearity [7], and in practice will be well-behaved for other model parameters (see Fig. S1; Methods).
Figure 2. Schematic of LN and NIM structures. A) Schematic diagram of an LN model, with multiple filters (k1, k2, …) that define the linear stimulus subspace. The outputs of these linear filters (g1, g2, …) are then transformed into a firing rate prediction r(t) by the static nonlinear function F[g1, g2, …], depicted at right for a two-dimensional subspace. Note that while the general LN model thus allows for a nonlinear dependence on multiple stimulus dimensions, estimation of the function F[.] is typically only feasible for low (one- or two-) dimensional subspaces. B) Schematic illustration of a generic neuron that receives input from a set of ‘upstream’ neurons that are themselves driven by the stimulus s. Each of the upstream neurons provides input to the model neuron that is generally rectified due to spike generation (inset at left), and thus is either excitatory or inhibitory. The model neuron then integrates its inputs and produces a spiking output. C) Block diagram illustrating the structure of the NIM, based on (B). The set of inputs are represented as (one-dimensional) LN models, with a corresponding stimulus filter k_i and ‘upstream nonlinearity’ f_i(.). These inputs are then linearly combined, with weights w_i, and fed into the spiking nonlinearity F[.], resulting in the predicted firing rate r(t). The NIM thus has a ‘second-order LN’ structure (or LNLN), with the neuron’s own nonlinear processing shaped by the LN nature of its inputs. doi:10.1371/journal.pcbi.1003143.g002
A Nonlinear Neuronal Model of Sensory Processing | PLOS Computational Biology | www.ploscompbiol.org | July 2013 | Volume 9 | Issue 7 | e1003143
to the system is analogous to a reaction rate that depends on the concentration of the reactants. For example, the change in the active state is described by

dA/dt = inflow − outflow = k_a u(t) R(t) − k_fi A(t),   (Equation 1)

where R(t) and A(t) are the occupancies of the resting and active states, k_a and k_fi are constants, and u(t) is the input that scales the activation rate constant, k_a.
When a train of pulses of either small or large amplitude drives the four-state system, the larger input produces output pulses with a smaller gain and also increases the baseline (Figure 2A). To produce dynamics with both fast and slow timescales, the fourth state (I2) couples to the first inactivated state (I1), using slower rate constants. As a result, a slow shift in baseline occurs following a change in the amplitude of the input. The rate constants in the four-state model are the rates of activation (k_a), fast inactivation (k_fi), fast recovery (k_fr), slow inactivation (k_si), and slow recovery (k_sr).
Although this four-state system can produce adaptive changes, it lacks the temporal filtering and selectivity of retinal neurons. At

[Figure 2 schematic: stimulus s(t) → linear filter F(t) → nonlinearity N(g) → u(t) → kinetics block with states R, A, I1, I2 and rate constants k_a, k_fi, k_fr, k_si, k_sr → output r(t)]
Figure 2. The LNK Model. (A) A train of impulses that changed from low- to high-amplitude is shown as an input, u(t), presented to a first-order kinetic model with four states. Numbers indicate rate constants for transitions between the resting (R), active (A), and inactivated states (I1 and I2). The rate constant between resting and active states is modulated by u(t). The output is the occupancy of the active state (A(t)). (B) The LNK model. The input, s(t), is convolved with a linear temporal filter, F_LNK(t), and then passed through a static nonlinearity, N_LNK(g), that does not change with contrast. The output of the nonlinearity, u(t), controls two rate constants in the kinetics block, one that leads to the active state and one that accelerates recovery from the inactivated state, I2. Other rate constants are fixed, and the output of the model r′(t) is the active state. (C) The membrane potential of an adapting amacrine cell compared to the LNK model output for a transition to low contrast (left) and to high contrast (right). (D) The LNK model compared to the amacrine cell response for three repeats of an identical stimulus sequence. (E) The distribution of the absolute difference in membrane potential between responses to an identical stimulus compared to the distribution of the difference between the model output and membrane potential responses. Results are combined for six cells with three repeated responses across the entire recording.
a fixed mean luminance, photoreceptors are nearly linear. Strong rectification first appears in amacrine and ganglion cells, coinciding with strong contrast adaptation (Baccus and Meister, 2002; Kim and Rieke, 2001; Rieke, 2001). This threshold likely arises from voltage-dependent calcium channels in the bipolar cell synaptic terminal (Heidelberger and Matthews, 1992), a point that would occur prior to adaptive changes in sensitivity in the presynaptic terminal or postsynaptic membrane. Thus, we combined the adaptive system with a linear-nonlinear model, yielding a system with a linear temporal filter, a static nonlinearity, and an adaptive kinetics block (Figure 2B). In this linear-nonlinear-kinetic (LNK) model, the kinetics block contributes both to the overall temporal filtering and the sensitivity of the system, making these properties depend on the input. Thus, the linear filter (F_LNK) and nonlinearity (N_LNK) of the LNK model are not the same as the filter and nonlinearity, F_LN and N_LN, respectively, in an LN model fit to the entire response. To couple the initial linear-nonlinear system to the kinetics block, the output of the nonlinearity, u(t), scales one or two rate constants. Although this means that the transition rate is proportional to the nonlinearity output, a higher-order dependence, such as the dependence of vesicle release on a higher power of the calcium concentration, can be captured in the nonlinearity itself. We fit LNK models using a constrained optimization algorithm (see Experimental Procedures). The filter and nonlinearity were

Neuron | The Computational Structure of Variance Adaptation | Neuron 73, 1002–1015, March 8, 2012 © 2012 Elsevier Inc.
[Real, Asari, Gollisch & Meister 2017]
Nonlinear input model (NIM) [McFarland, Cui, & Butts 2013]
Linear-Nonlinear-Kinetic (LNK) [Ozuysal & Baccus 2014]
We have developed a general model for V1 neurons (Fig. 1), along with a direct method for fitting it to spiking data. The model has two channels (one excitatory, one suppressive), each formed from a weighted sum of linear-nonlinear (LN) subunits, similar in concept to Hubel and Wiesel’s original description of complex cell responses as resulting from a spatial combination of simple cells (Hubel and Wiesel, 1962), to Barlow and Levick’s characterization of directionally selective retinal ganglion cells in the rabbit (Barlow and Levick, 1965), and to Victor and Shapley’s subunit model for Y-type ganglion cells in cat retina (Victor and Shapley, 1979).

To make the fitting problem tractable, we assume that the subunit filters of each channel differ only in spatial position and that their nonlinearities are identical. The difference between the responses of the two channels is transformed with a rectifying nonlinearity to give the firing rate of the neuron. We have developed a method for directly and efficiently estimating the model, and have tested it on data from V1 neurons driven by spatiotemporal white noise. The results show that the fitted model outperforms previously published functional models (specifically, the LN, energy, and spike-triggered covariance [STC]-based models) for all cells, in addition to providing a more biologically reasonable account of the origins of cortical receptive fields. A brief account of some of this work has appeared previously (Vintch et al., 2012).
Figure 1. Subunit model for a single channel. a, A signal flow diagram describes how stimulus information is converted to a firing rate. The stimulus is passed through a bank of spatially shifted, but otherwise identical, linear-nonlinear subunits. The activity of these subunits is combined with a weighted sum over space (and optionally time; not shown), and passed through an output nonlinearity to generate a firing rate. b, Responses of intermediate model stages are depicted for an example input (for simplicity, a single frame is shown, rather than a temporal sequence). Each stage, except for the last, maps a spatially distributed array of inputs to a spatially distributed array of responses, depicted as pixel intensities in an image. The pooling stage sums over these responses and applies the final output nonlinearity.

J. Neurosci., November 4, 2015 • 35(44):14829–14841 | Vintch et al. • A Subunit Model of V1 Responses
LNFDSNF model / convolutional subunit model
[Vintch, Movshon & Simoncelli 2015, Wu, Park & Pillow 2014]
Repeated presentations of the same flicker sequence reliably evoked very similar spike trains (Figures 2A, 2B, and S2B), as expected from previous studies [9–11]. This suggests that essential features of the retina's light response can be captured by a deterministic model of the ganglion cell and its input circuitry [4]. In addition, we presented a long non-repeating flicker sequence to explore as many spatiotemporal patterns as possible. Thirty ganglion cells were selected for quantitative modeling based on the stability of their responses throughout the extended recording period.

Modeling Approach
We focused on predicting the firing rate of ganglion cells (GCs), namely the expected number of spikes fired in any given 1/60 s interval. Mathematical models were constructed that take the time course of the flicker stimulus as input and produce a time course of the firing rate at the output. The parameters of the model were optimized to fit the long stretch of non-repeating flicker (~80% of the data; the "training set"). Specifically, we maximized the fraction of variance in the firing rate that the model explains (Equation S10) [11]. Then the model performance was evaluated on the remaining data examined with the repeated flicker (~20%; the "test set"). This performance metric was tracked across successive changes in the model structure.

As a formalism, we chose so-called cascade models [4, 5]. These are networks of simple elements that involve either linear filtering (convolution in space and time) or a static nonlinear transform. They map naturally onto neural circuitry (Figure 1) and can be adjusted from a coarse-grained version (every neuron is an element) to arbitrarily fine-grained ones (multi-compartment models of every neuron and synapse).

As a reference point, we chose the so-called LN model, consisting of a single linear-nonlinear cascade (Figure 1B). This has been very popular in sensory neuroscience [12–14] and serves as a common starting point for fitting neural responses. This model was able to approximate the GC output (Figures 2A, 2B, and S2B), though with a wide range of performance for different neurons (Figures 2C and 2D). Even with optimized parameters, however, the LN model predicts firing at times when it should not, thus making the peaks of firing events wider and flatter than observed (Figures 2A, 2B, and S2B).

Guided by knowledge of retinal anatomy, we then created a sequence of four cascade models by systematically adding components to the circuits (Figures 1C–1F). Each model derives
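As a sketch of the reference point, here is a minimal numpy version of the LN model and of the variance-explained metric used above; the exact nonlinearity and Equation S10 are not reproduced in this excerpt, so both are assumptions on our part.

```python
import numpy as np

def ln_model(stimulus, k, nonlin=lambda g: np.maximum(g, 0.0)):
    """LN model: one linear filter k applied to each stimulus history
    vector, followed by a static nonlinearity (here a rectifier)."""
    return nonlin(stimulus @ k)

def fraction_of_variance_explained(rate, prediction):
    """Assumed form of the performance metric: 1 minus residual
    variance over total variance of the measured firing rate."""
    return 1.0 - np.var(rate - prediction) / np.var(rate)

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 31))   # 500 time bins x 31 stimulus bars
k = rng.standard_normal(31)
r = ln_model(X, k)
```

A perfect prediction scores 1; predicting the mean rate everywhere scores 0.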
[Figure 1, panels A–G (graphical content not reproduced): the retinal circuit schematic with photoreceptor (P), horizontal (H), bipolar (B), amacrine (A), and ganglion (G) cells; diagrams of the LN, LNSN, LNSNF, LNFSNF, and LNFDSNF models built from BCM, ACM, and GCM modules; and the free-parameter counts (total 108).]
Figure 1. A Progression of Circuit Models Constrained by Retinal Anatomy
(A) Schematic of the circuit upstream of a ganglion cell in the vertebrate retina. Photoreceptors (P) transduce the visual stimulus into electrical signals that propagate through bipolar cells (B) to the ganglion cell (G). At both synaptic stages, one finds both convergence and divergence, as well as lateral signal flow carried by horizontal (H) and amacrine (A) cells. The bipolar cell and its upstream circuitry are modeled by a spatiotemporal filter, a nonlinearity, and feedback (bipolar cell module [BCM]; blue). The amacrine cell introduces a delay in lateral propagation (amacrine cell module [ACM]; red). The ganglion cell was modeled by a weighted summation, another nonlinearity, and a second feedback function (ganglion cell module [GCM]; green). Drawings after Polyak, 1941.
(B) LN model. A different temporal filter is applied to the history of each bar in the stimulus. The outputs of all of these filters are summed over space. The resulting signal is passed through an instantaneous nonlinearity.
(C) LNSN model. The stimulus is first processed by partially overlapping, identical BCMs, each of which consists of its own spatiotemporal filter and nonlinearity. Their outputs are weighted and summed by the GCM, which then applies another instantaneous nonlinearity to give the model's output. For display purposes, the BCMs are shown here to span only three stimulus bars, but they spanned seven bars in the computations.
(D) LNSNF model. This is identical to the previous one, except that the GCM (depicted here) has an additional feedback loop around its nonlinearity.
(E) LNFSNF model. This is identical to the previous one, except that the BCMs (one of which is depicted here) have an additional feedback loop around their nonlinearities. This new feedback function is the same for all BCMs.
(F) LNFDSNF model. This is identical to the previous one, except that there is a delay inserted between each BCM and the GCM. These delays are allowed to vary independently for each BCM.
(G) A count of the free parameters in the LNFDSNF model, color coded as in the model diagram. Except for the total (108), the numbers here also apply to the LNSN, LNSNF, and LNFSNF models. The LN model has 186 free parameters in the linear filter (31 spatial positions, each with a six-parameter temporal filter as in Equations S3–S5) and one in the nonlinearity. See also Figures S1 and S3.
190 Current Biology 27, 189–198, January 23, 2017
LNFDSNF = linear-nonlinear-feedback-delayed-sum-nonlinear-feedback
deep learning / deep neural networks (DNNs)
[Yamins et al 2014, McIntosh et al 2016, Maheswaranathan et al 2017, Benjamin et al 2017, …]
Figure 1: A schematic of the model architecture. The stimulus was convolved with 8 learnedspatiotemporal filters whose activations were rectified. The second convolutional layer then projectedthe activity of these subunits through spatial filters onto 16 subunit types, whose activity was linearlycombined and passed through a final soft rectifying nonlinearity to yield the predicted response.
retina, potentially simplifying the retinal response to such stimuli [11, 12, 2, 10, 13]. In contrast to the perceived linearity of the retinal response to coarse stimuli, the retina performs a wide variety of nonlinear computations including object motion detection [6], adaptation to complex spatiotemporal patterns [14], encoding spatial structure as spike latency [15], and anticipation of periodic stimuli [16], to name a few. However, it is unclear what role these nonlinear computational mechanisms have in generating responses to more general natural stimuli.
To better understand the visual code for natural stimuli, we modeled retinal responses to natural image sequences with convolutional neural networks (CNNs). CNNs have been successful at many pattern recognition and function approximation tasks [17]. In addition, these models cascade multiple layers of spatiotemporal filtering and rectification: exactly the elementary computational building blocks thought to underlie complex functional responses of sensory circuits. Previous work utilized CNNs to gain insight into the neural computations of inferotemporal cortex [18], but these models have not been applied to early sensory areas where knowledge of neural circuitry can provide important validation for such models.
We find that deep neural network models markedly outperform previous models in predicting retinal responses both for white noise and natural scenes. Moreover, these models generalize better to unseen stimulus classes, and learn internal features consistent with known retinal properties, including sub-Poisson variability, feedforward inhibition, and contrast adaptation. Our findings indicate that CNNs can reveal both neural computations and mechanisms within a multilayered neural circuit under natural stimulation.
2 Methods
The spiking activity of a population of tiger salamander retinal ganglion cells was recorded in response to both sequences of natural images jittered with the statistics of eye movements and high-resolution spatiotemporal white noise. Convolutional neural networks were trained to predict ganglion cell responses to each stimulus class, simultaneously for all cells in the recorded population of a given retina. For a comparison baseline, we also trained linear-nonlinear models [19] and generalized linear models (GLMs) with spike history feedback [2]. More details on the stimuli, retinal recordings, experimental structure, and division of data for training, validation, and testing are given in the Supplemental Material.
2.1 Architecture and optimization
The convolutional neural network architecture is shown in Figure 2.1. Model parameters were optimized to minimize a loss function corresponding to the negative log-likelihood under Poisson spike generation. Optimization was performed using ADAM [20] via the Keras and Theano software libraries [21]. The networks were regularized with an ℓ2 weight penalty at each layer and an ℓ1 activity penalty at the final layer, which helped maintain a baseline firing rate near 0 Hz.
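The Keras/Theano code itself is not reproduced in this excerpt, but the loss being minimized can be sketched in plain numpy as follows; the small constant inside the log and the penalty weights are our choices.

```python
import numpy as np

def poisson_nll(rates, spikes):
    """Negative log-likelihood of spike counts under Poisson firing,
    dropping the log(y!) term that does not depend on the model."""
    return np.mean(rates - spikes * np.log(rates + 1e-8))

def training_loss(rates, spikes, weights, l2=1e-3, l1=1e-3):
    """Poisson NLL plus an l2 penalty on all weights and an l1 penalty
    on the output activity, which pushes baseline rates toward 0 Hz."""
    return (poisson_nll(rates, spikes)
            + l2 * sum(np.sum(w ** 2) for w in weights)
            + l1 * np.mean(np.abs(rates)))
```

The ℓ1 term on the output is what keeps the model from padding its predictions with a constant background rate.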
Everything you need to know about DNNs, in one slide, without equations!

If you understand GLMs… you understand DNNs!

• stack many LNs on top of each other: LN → LN → LN → … → LN → P
• use gradient ascent to maximize the log-likelihood
• use software tools (TensorFlow, Theano, Autograd) to compute gradients for you (no more computing gradients by hand!)
• use a bunch of tricks (batches, noise, adaptive gradients, SGD, dropout, …)
• do NOT worry about local maxima!
[credit: Jakob Macke]
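The recipe on this slide can be made concrete in a few lines: a two-stage LN cascade with a Poisson output, climbed by gradient ascent. We use finite-difference gradients as a stand-in for what Theano/TensorFlow/Autograd compute automatically; all sizes, seeds, and step sizes here are illustrative.

```python
import numpy as np

def rate(x, W1, w2):
    """LN -> LN -> P: linear stage, static nonlinearity, a second
    linear stage, and an exponential to give a positive Poisson rate."""
    h = np.tanh(W1 @ x)
    return np.exp(w2 @ h)

def log_lik(W1, w2, x, y):
    """Poisson log-likelihood of spike count y (up to log(y!))."""
    lam = rate(x, W1, w2)
    return float(y * np.log(lam) - lam)

def numeric_grad(f, W, eps=1e-5):
    """Finite-difference gradient, standing in for autograd."""
    g = np.zeros_like(W)
    for i in np.ndindex(W.shape):
        Wp = W.copy(); Wp[i] += eps
        Wm = W.copy(); Wm[i] -= eps
        g[i] = (f(Wp) - f(Wm)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
y = 2.0                                  # observed spike count
W1 = 0.1 * rng.standard_normal((4, 8))
w2 = 0.1 * rng.standard_normal(4)

ll_before = log_lik(W1, w2, x, y)
for _ in range(50):                       # gradient ascent on both stages
    g1 = numeric_grad(lambda W: log_lik(W, w2, x, y), W1)
    g2 = numeric_grad(lambda w: log_lik(W1, w, x, y), w2)
    W1 = W1 + 0.05 * g1
    w2 = w2 + 0.05 * g2
ll_after = log_lik(W1, w2, x, y)
```

In real use an autodiff library replaces `numeric_grad`, and the updates are done on mini-batches with the tricks listed above.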
Modern machine learning far outperforms GLMs at predicting spikes
Ari S. Benjamin1, Hugo L. Fernandes2, Tucker Tomlinson3, Pavan Ramkumar2,4, Chris VerSteeg1, Lee Miller1,2,3, Konrad Paul Kording1,2,3
1. Department of Biomedical Engineering, Northwestern University, Evanston, IL, 60208, USA
2. Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, 60611, USA
3. Department of Physiology, Northwestern University, Chicago, IL, 60611, USA
4. Department of Neurobiology, Northwestern University, Evanston, IL, 60208, USA
Contact: Ari Benjamin, [email protected]
Abstract
Neuroscience has long focused on finding encoding models that effectively ask “what predicts neural spiking?” and generalized linear models (GLMs) are a typical approach. Modern machine learning techniques have the potential to perform better. Here we directly compared GLMs to three leading methods: feedforward neural networks, gradient boosted trees, and stacked ensembles that combine the predictions of several methods. We predicted spike counts in macaque motor (M1) and somatosensory (S1) cortices from reaching kinematics, and in rat hippocampal cells from open field location and orientation. In general, the modern methods produced far better spike predictions and were less sensitive to the preprocessing of features. XGBoost and the ensemble were the best-performing methods and worked well even on neural data with very low spike rates. This overall performance suggests that tuning curves built with GLMs are at times inaccurate and can be easily improved upon. Our publicly shared code uses standard packages and can be quickly applied to other datasets. Encoding models built with machine learning techniques more accurately predict spikes and can offer meaningful benchmarks for simpler models.
Introduction
A central tool of neuroscience is the tuning curve, which maps stimulus to neural response. The tuning curve asks what information in the external world a neuron encodes in its spikes. For a tuning curve to be meaningful it is important that it accurately predicts the neural response. Often, however, methods are chosen that sacrifice accuracy for simplicity. Predictive methods for tuning curves should instead be evaluated primarily by their ability to describe neural activity accurately.
A common predictive model is the Generalized Linear Model (GLM), occasionally referred to as a linear-nonlinear Poisson (LNP) cascade (1-4). The GLM performs a nonlinear operation upon a linear combination of the input features, which are often called external covariates. Typical covariates are stimulus features, movement vectors, or the animal’s location.
The nonlinear operation on the weighted sum of covariates is usually held fixed, though it can be learned (5, 6), and the linear weights of the combined inputs are chosen to maximize the agreement between the model fit and the neural recordings. This optimization problem of choosing weights is often convex and can be solved with efficient algorithms (7). The assumption of Poisson firing statistics can often be loosened (8) allowing the modeling of a broad range of neural responses. Due to its ease of use, perceived interpretability, and flexibility, the GLM has become a popular model of neural spiking.
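A minimal sketch of this optimization, assuming an exponential nonlinearity so the analytic gradient is simple and the problem is convex; the simulation sizes, step size, and seed are our choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# simulate a Poisson GLM: rate = exp(X @ w_true)
n, d = 2000, 3
X = rng.standard_normal((n, d))
w_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ w_true))

# maximize the Poisson log-likelihood sum(y * (X@w) - exp(X@w));
# its gradient is X.T @ (y - exp(X@w)), so plain gradient ascent works
w = np.zeros(d)
for _ in range(2000):
    lam = np.exp(X @ w)
    w += 0.2 * X.T @ (y - lam) / n    # averaged gradient ascent step
```

In practice one would use an existing fitter (e.g. scikit-learn's `PoissonRegressor` or statsmodels' `GLM` with a Poisson family) rather than hand-rolled ascent; this sketch just shows why the problem is tractable.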
The GLM’s central assumption of linearity in feature space may hold in certain cases (8, 9), but in general, neural responses can be very nonlinear (5, 10). When a neuron responds nonlinearly to stimulus features, it is common practice to mathematically transform the features to obtain a new set that meets the linearity requirements of the GLM and yields better spike
bioRxiv preprint first posted online Feb. 24, 2017; doi: http://dx.doi.org/10.1101/111450. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
Fig 2
learning methods with their own approaches for encoding models.
To test that all methods work reasonably well in a trivial case, we trained each to predict spiking from a simple and well-understood feature. Some neurons in M1 have been described as responding linearly to the exponentiated cosine of movement direction relative to a preferred angle (41). We therefore predicted the
Figure 2: Encoding models for M1 performed similarly when trained on the sine and cosine of hand velocity direction. (a) The pseudo-R2 for an example neuron was similar for all four methods. On this figure and in Figures 3-5 the example neuron is the same, and is not the neuron for which method hyperparameters were optimized. (b) The tuning curves of the neural net and XGBoost were similar to that of the GLM. The black points are the recorded responses, to which we added y-axis jitter for visualization. The tuning curve of the ensemble method was similar and is omitted here for clarity. (c) Plotting the pseudo-R2 of modern ML methods vs. that of the GLM indicates that the similarity of methods generalizes across neurons. The single neuron plotted at left is marked with black arrows. The mean scores, inset, indicate the overall success of the methods; error bars represent the 95% bootstrap confidence interval.
Figure 3: Modern ML models could learn the cosine nonlinearity when trained on only the direction of hand velocity, in radians. (a) For the same example neuron as in Figure 2, the neural net and XGBoost maintained the same predictive power, while the GLM was unable to extract a relationship between direction and spike rate. (b) XGBoost and neural nets displayed reasonable tuning curves, while the GLM reduced to the average spiking rate (with a small slope, in this case). (c) Most neurons in the population were poorly fit by the GLM, while the ML methods achieved the performance levels of Figure 2. The ensemble performed the best of the methods tested.
spiking of M1 neurons from the cosine and sine of the direction of hand movement in the reaching task. (The linear combination of a sine and cosine curve is a phase-shifted cosine, by identity, allowing the GLM to learn the proper preferred direction). We observed that each method identified a similar tuning curve (Fig. 2b), constructed by plotting the predictions of spike rate on the validation set against movement direction. The bulk of the neurons in the dataset were just as well predicted by each of the methods (Fig. 2a, c), though the ensemble was slightly better than the GLM (mean comparative pseudo-R2, defined in methods, of 0.06 [0.043 – 0.084], 95% bootstrapped confidence interval (CI)). The similar performance suggested that an exponentiated cosine is a nearly optimal approximating function of the neural response to movement direction alone, as was previously known (42). This classic example thus illustrated that all methods can in principle estimate tuning curves.
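The parenthetical identity (a weighted sum of sine and cosine is a phase-shifted cosine) is easy to verify numerically; this is exactly why a GLM given cos and sin features can learn any preferred direction with linear weights.

```python
import numpy as np

# a*cos(t) + b*sin(t) == R*cos(t - phi),
# with amplitude R = hypot(a, b) and phase phi = atan2(b, a)
a, b = 1.3, -0.7
t = np.linspace(-np.pi, np.pi, 361)
lhs = a * np.cos(t) + b * np.sin(t)
R, phi = np.hypot(a, b), np.arctan2(b, a)
rhs = R * np.cos(t - phi)
```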
The exact form of the nonlinearity of the neural response to a given feature is rarely known, but this lack of knowledge need not impact our prediction ability. To illustrate the ability of modern machine learning to find the proper nonlinearity, we performed the same analysis as above but omitted the initial cosine feature engineering step. Trained on only the hand velocity direction, in radians, which is discontinuous at ±π, the modern ML methods very nearly reproduced the predictive power they attained using the engineered feature (Fig. 3a). As expected, the GLM failed at generating a meaningful tuning curve (Fig. 3b). Both trends were consistent across the population of recorded neurons (Fig. 3c). The neural net, XGBoost, and ensemble methods thus perform well without feature engineering or the prior knowledge and assumptions it requires.
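The pseudo-R2 used throughout is defined in the paper's methods, which are not reproduced in this excerpt; a commonly used Poisson version, which we assume here, compares the model's log-likelihood with those of a mean-rate (null) model and a saturated model.

```python
import numpy as np

def poisson_ll(y, lam):
    """Poisson log-likelihood up to the log(y!) constant."""
    lam = np.maximum(lam, 1e-12)
    return np.sum(y * np.log(lam) - lam)

def pseudo_r2(y, lam):
    """1 for a perfect model, 0 for one no better than the mean rate."""
    ll_model = poisson_ll(y, lam)
    ll_null = poisson_ll(y, np.full_like(y, y.mean()))
    ll_sat = poisson_ll(y, y)          # saturated model: lam = y
    return 1.0 - (ll_sat - ll_model) / (ll_sat - ll_null)
```

Unlike ordinary R2, this score can be negative when a model predicts worse than the mean rate.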
Machine learning methods can also take advantage of information contained in combinations of inputs, and should perform better if given more inputs. We verified that this was true for our dataset by training on the four-dimensional set of hand position and velocity (x, y, ẋ, ẏ), which we call the set of original features. All methods gained a significant amount of predictive power with these new features, though the GLM did not nearly match the other methods (Fig 4a, c). This set of neurons thus seemed to encode strongly for position and velocity in a potentially nonlinear fashion captured by machine learning methods.
While some amount of feature engineering can improve the performance of GLMs, it is not always simple to guess the optimal set of processed features. We demonstrated this by training all methods on features that have previously been successful at explaining spike
Figure 4: Training on the set of original features (x, y, ẋ, ẏ) increased the predictive power of all methods. Note the change in axes scales from Figures 2-3. (a) For the same example neuron as in Figure 3, all methods gained a significant amount of predictive power, indicating a strong encoding of position and speed or their correlates. The GLM showed less predictive power than the other methods on this feature set. (b) The spike rate in black, with jitter on the y-axis, again overlaid with the predictions of the three methods as a function of velocity direction. The neuron encodes for position and speed, as well, and the projection of the multidimensional tuning curve onto a 1D velocity direction dependence leaves the projected curve diffuse. (c) The ensemble method, neural network, and XGBoost performed consistently better than the GLM across the population. The mean pseudo-R2 scores show the hierarchy of success across methods.
rate in a similar center-out reaching task (6). These extra features included the sine and cosine of velocity direction (as in Figure 2), the speed, the radial distance of hand position, and the sine and cosine of position direction. The training set was thus 10-dimensional, though highly redundant, and was aimed at maximizing the predictive power of the GLM. Feature engineering improved the predictive power of all methods to variable degrees, with the GLM improving to the level of the neural network (Fig. 5). XGBoost and the ensemble still predicted spikes better than the GLM (Fig. 5c), with the ensemble scoring on average 1.8 times higher than the GLM (ratio of population means of 1.8 [1.4 – 2.2], 95% bootstrapped CI). The ensemble was significantly better than XGBoost (mean comparative pseudo-R2 of 0.08 [0.055 – 0.103], 95% bootstrapped CI) and was thus consistently the best predictor. Though standard feature engineering greatly improved the GLM, the ensemble and XGBoost still captured the neural response more accurately.
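The 10-dimensional engineered set described here can be constructed directly from the original features; the function below is our reconstruction of that list, not the paper's code.

```python
import numpy as np

def engineered_features(x, y, vx, vy):
    """Original features (x, y, vx, vy) plus sine/cosine of velocity
    direction, speed, radial distance of position, and sine/cosine of
    position direction: 10 redundant columns in total."""
    vdir = np.arctan2(vy, vx)
    pdir = np.arctan2(y, x)
    return np.column_stack([
        x, y, vx, vy,
        np.sin(vdir), np.cos(vdir),   # velocity direction, as in Fig. 2
        np.hypot(vx, vy),             # speed
        np.hypot(x, y),               # radial distance
        np.sin(pdir), np.cos(pdir),   # position direction
    ])

rng = np.random.default_rng(0)
F = engineered_features(*rng.standard_normal((4, 100)))
```

The redundancy is deliberate: it hands the GLM several plausible linearizations at once, which is the best case for it in this comparison.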
To ensure that these results are not specific to the motor cortex, we extended the same analyses to primary somatosensory cortex (S1) data. The ensemble was consistently the best predictor across all neurons, scoring almost twice as well as the GLM (ratio of 1.8 [1.2 – 2.2] of population means, 95% bootstrapped CI). XGBoost predicted spikes better than the GLM only for
neurons with significant effect sizes for any of the four methods (i.e., with cross-validated pseudo-R2 scores two standard deviations above 0; mean comparative pseudo-R2 was 0.002 [0.0006 – 0.0045], 95% bootstrapped CI). Interestingly, the neural network performed worse than all other methods. We speculated that this could be related to the small covariate effect size in the S1 dataset, as we observed similar scores for the neural network on the M1 dataset for regimes of similar effect sizes, as well as on simulated data with GLM structure, small effect size, and similar firing rates (Supp. Fig. 2). We also found that a much smaller network performed better (a single hidden layer with 20 nodes) but that max-norm or elastic-net regularization did not improve the results with the larger network. Neural networks may thus be poor choices for Poisson data with very small covariate effect sizes, though we see no theoretical reason why this should be the case. Overall, on this S1 dataset featuring generally low predictability, the tested methods displayed a range of performances, with the ensemble predicting the data nearly twice as well as the GLM alone.
We asked if the same trends of performance would hold for the rat hippocampus dataset, which was characterized by very low mean firing rates but strong effect sizes. All methods were trained on a list of features representing the rat position and orientation, as
Figure 5: Encoding models for M1 trained on all the original features plus the engineered features show that modern ML methods can outperform the GLM even with standard feature engineering. (a) For this example neuron, inclusion of the computed features increased the predictive power of the GLM to the level of the neural net. XGBoost and the ensemble method also increased in predictive power. (b) The tuning curves for the example neuron are diffuse when projected onto the movement direction, indicating a high-dimensional dependence. (c) Even with feature engineering, XGBoost and the ensemble consistently achieve pseudo-R2 scores higher than the GLM, though the neural net does not. The selected neuron at left is marked with black arrows.
Fig 3 Fig 4 Fig 5
described in methods. We found that many neurons were described much better by XGBoost and the ensemble method than by the GLM (Fig. 6b). On average, the ensemble was almost ten times more predictive than the GLM (ratio of population means of 9.8 [5.4 – 100.0], 95% bootstrapped CI), and many neurons shifted from being completely unpredictable by the GLM (pseudo-R2 near zero) to very predictable by XGBoost and the ensemble (pseudo-R2 above 0.2). The neural network performed poorly, here not due to effect size as in S1 but likely due to the very low firing rates of most hippocampal cells (Supp. Fig. 2). Out of the 58 neurons in the dataset, 54 had rates below 1 spike/second, and it was only on the four high-firing neurons that the neural network achieved pseudo-R2 scores comparable to the GLM. The relative success of XGBoost was interesting given the failure of the neural network, and supported the general observation that XGBoost can work well with smaller and sparser datasets than those neural networks generally require. Thus for hippocampal cells, a method leveraging decision trees such as XGBoost or the ensemble is able to capture far more structure in the neural response than the GLM or the neural network.
Discussion
We contrasted the performance of GLMs with recent
machine learning techniques at the task of predicting spike rates in three brain regions. We found that the tested ML methods predicted spike rates far more accurately than the GLM. Typical feature engineering only partially bridged the performance gap. The ML methods performed comparably well with and without feature engineering, indicating they could serve as convenient performance benchmarks for improving simpler encoding models. The consistently best method was the ensemble, which was an instance of XGBoost stacked on the predictions of the GLM, neural network, XGBoost, and a random forest. The ensemble and XGBoost could fit the data well even for very low spike rates, as in the hippocampus dataset, and for very low covariate effect sizes, as in the S1 dataset. These findings indicate that GLMs are not the best choice as neuroscience’s standard method of spike prediction.
The ML methods we have put forward here have been implemented without substantial modification from methods that are already in wide use. We hope that this simple application might spur a wider adoption of these methods in the neurosciences, thereby increasing the power and efficiency of studies involving neural prediction without requiring complicated, application-
Figure 6: XGBoost and the ensemble method predicted the activity of neurons in S1 and the hippocampus better than a GLM. The diagonal dotted line in both plots is the line of equal predictive power with the GLM. (a) The ensemble predicted firing almost twice as well, on average, as the GLM for all neurons in the S1 dataset. XGBoost was better for neurons with higher effect sizes but poorly predicted neurons that were not predictable by any method. The neural network performed the worst of all methods. (b) Many neurons in the rat hippocampus were described well by XGBoost and the ensemble but poorly by the GLM and the neural network. The poor neural network performance in the hippocampus was due to the low rate of firing of most neurons in the dataset (Supp. Fig. 2). Note the difference in axes; hippocampal cells are generally more predictable than those in S1.
. CC-BY-NC-ND 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/111450doi: bioRxiv preprint first posted online Feb. 24, 2017;
9
described in methods. We found that many neurons were described much better by XGBoost and the ensemble method than by the GLM (Fig. 6b). On average, the ensemble was almost ten times more predictive than the GLM (ratio of population means of 9.8 [5.4 – 100.0], 95% bootstrapped CI), and many neurons shifted from being completely unpredictable by the GLM (pseudo-R2 near zero) to very predictable by XGBoost and the ensemble (pseudo-R2 above 0.2). The neural network performed poorly, here not due to effect size as in S1 but likely due to the very low firing rates of most hippocampal cells (Supp. Fig. 2). Out of the 58 neurons in the dataset, 54 had rates below 1 spikes/ second, and it was only on the four high-firing neurons that the neural network achieved pseudo-R2 scores comparable to the GLM. The relative success of XGBoost was interesting given the failure of the neural network, and supported the general observation that XGBoost can work well with smaller and sparser datasets than those neural networks generally require. Thus for hippocampal cells, a method leveraging decision trees such as XGBoost or the ensemble is able to capture far more structure in the neural response than the GLM or the neural network.
Discussion We contrasted the performance of GLMs with recent
machine learning techniques at the task of predicting spike rates in three brain regions. We found that the tested ML methods predicted spike rates far more accurately than the GLM. Typical feature engineering only partially bridged the performance gap. The ML methods performed comparably well with and without feature engineering, indicating they could serve as convenient performance benchmarks for improving simpler encoding models. The consistently best method was the ensemble, which was an instance of XGBoost stacked on the predictions of the GLM, neural network, XGBoost, and a random forest. The ensemble and XGBoost could fit the data well even for very low spike rates, as in the hippocampus dataset, and for very low covariate effect sizes, as in the S1 dataset. These findings indicate that GLMs are not the best choice as neuroscience’s standard method of spike prediction.
The ML methods we have put forward here have been implemented without substantial modification from methods that are already in wide use. We hope that this simple application might spur a wider adoption of these methods in the neurosciences, thereby increasing the power and efficiency of studies involving neural prediction without requiring complicated, application-
Figure 6: XGBoost and the ensemble method predicted the activity of neurons in S1 and the hippocampus better than a GLM. The diagonal dotted line in both plots is the line of equal predictive power with the GLM. (a) The ensemble predicted firing almost twice as well, on average, as the GLM for all neurons in the S1 dataset. XGBoost was better for neurons with higher effect sizes but poorly predicted neurons that were not predictable by any method. The neural network performed the worst of all methods. (b) Many neurons in the rat hippocampus were described well by XGBoost and the ensemble but poorly by the GLM and the neural network. The poor neural network performance in the hippocampus was due to the low rate of firing of most neurons in the dataset (Supp. Fig. 2). Note the difference in axes; hippocampal cells are generally more predictable than those in S1.
. CC-BY-NC-ND 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/111450doi: bioRxiv preprint first posted online Feb. 24, 2017;
macaque S1 hippocampus
macaque M1
Fig 6
Fig 6
Modern machine learning far outperforms GLMs at predicting spikes

Ari S. Benjamin1, Hugo L. Fernandes2, Tucker Tomlinson3, Pavan Ramkumar2,4, Chris VerSteeg1, Lee Miller1,2,3, Konrad Paul Kording1,2,3
1. Department of Biomedical Engineering, Northwestern University, Evanston, IL, 60208, USA
2. Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, 60611, USA
3. Department of Physiology, Northwestern University, Chicago, IL, 60611, USA
4. Department of Neurobiology, Northwestern University, Evanston, IL, 60208, USA
Contact: Ari Benjamin, [email protected]
Abstract
Neuroscience has long focused on finding encoding models that effectively ask “what predicts neural spiking?” and generalized linear models (GLMs) are a typical approach. Modern machine learning techniques have the potential to perform better. Here we directly compared GLMs to three leading methods: feedforward neural networks, gradient boosted trees, and stacked ensembles that combine the predictions of several methods. We predicted spike counts in macaque motor (M1) and somatosensory (S1) cortices from reaching kinematics, and in rat hippocampal cells from open field location and orientation. In general, the modern methods produced far better spike predictions and were less sensitive to the preprocessing of features. XGBoost and the ensemble were the best-performing methods and worked well even on neural data with very low spike rates. This overall performance suggests that tuning curves built with GLMs are at times inaccurate and can be easily improved upon. Our publicly shared code uses standard packages and can be quickly applied to other datasets. Encoding models built with machine learning techniques more accurately predict spikes and can offer meaningful benchmarks for simpler models.
Introduction
A central tool of neuroscience is the tuning curve, which maps stimulus to neural response. The tuning curve asks what information in the external world a neuron encodes in its spikes. For a tuning curve to be meaningful it is important that it accurately predicts the neural response. Often, however, methods are chosen that sacrifice accuracy for simplicity. Predictive methods for tuning curves should instead be evaluated primarily by their ability to describe neural activity accurately.
A common predictive model is the Generalized Linear Model (GLM), occasionally referred to as a linear-nonlinear Poisson (LNP) cascade (1-4). The GLM performs a nonlinear operation upon a linear combination of the input features, which are often called external covariates. Typical covariates are stimulus features, movement vectors, or the animal’s location.
The nonlinear operation on the weighted sum of covariates is usually held fixed, though it can be learned (5, 6), and the linear weights of the combined inputs are chosen to maximize the agreement between the model fit and the neural recordings. This optimization problem of choosing weights is often convex and can be solved with efficient algorithms (7). The assumption of Poisson firing statistics can often be loosened (8) allowing the modeling of a broad range of neural responses. Due to its ease of use, perceived interpretability, and flexibility, the GLM has become a popular model of neural spiking.
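As a concrete illustration of the fitting procedure just described, the sketch below fits a Poisson GLM with an exponential nonlinearity by gradient ascent on the log-likelihood. This is a toy numpy implementation under our own assumptions (the function name and the simulated data are illustrative); in practice one would use a package such as statsmodels or pyglmnet.

```python
import numpy as np

def fit_poisson_glm(X, y, lr=0.05, n_iter=5000):
    """Fit a Poisson GLM with exponential link by gradient ascent.

    Maximizes the mean Poisson log-likelihood, whose gradient is
    X.T @ (y - exp(X @ w)) / n. The problem is concave, so a simple
    fixed-step ascent converges for a small enough step size.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        rate = np.exp(X @ w)          # predicted spike rate per bin
        grad = X.T @ (y - rate) / n   # gradient of mean log-likelihood
        w += lr * grad
    return w

# Simulate spike counts from a known GLM and recover the weights.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5000), rng.normal(size=(5000, 2))])
w_true = np.array([0.5, 1.0, -0.7])
y = rng.poisson(np.exp(X @ w_true))
w_hat = fit_poisson_glm(X, y)
```

With 5000 samples the recovered weights land close to the generating ones, which is the sense in which the optimization problem is well behaved.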
The GLM’s central assumption of linearity in feature space may hold in certain cases (8, 9), but in general, neural responses can be very nonlinear (5, 10). When a neuron responds nonlinearly to stimulus features, it is common practice to mathematically transform the features to obtain a new set that meets the linearity requirements of the GLM and yields better spike
bioRxiv preprint first posted online Feb. 24, 2017; doi: http://dx.doi.org/10.1101/111450. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
learning methods with their own approaches for encoding models.
To test that all methods work reasonably well in a trivial case, we trained each to predict spiking from a simple
and well-understood feature. Some neurons in M1 have been described as responding linearly to the exponentiated cosine of movement direction relative to a preferred angle (41). We therefore predicted the
Figure 2: Encoding models for M1 performed similarly when trained on the sine and cosine of hand velocity direction. (a) The pseudo-R2 for an example neuron was similar for all four methods. In this figure and in Figures 3-5 the example neuron is the same, and is not the neuron for which method hyperparameters were optimized. (b) The tuning curves of the neural net and XGBoost were similar to that of the GLM. The black points are the recorded responses, to which we added y-axis jitter for visualization. The tuning curve of the ensemble method was similar and is omitted here for clarity. (c) Plotting the pseudo-R2 of modern ML methods vs. that of the GLM indicates that the similarity of methods generalizes across neurons. The single neuron plotted at left is marked with black arrows. The mean scores, inset, indicate the overall success of the methods; error bars represent the 95% bootstrap confidence interval.
Figure 3: Modern ML models could learn the cosine nonlinearity when trained on only the direction of hand velocity, in radians. (a) For the same example neuron as in Figure 2, the neural net and XGBoost maintained the same predictive power, while the GLM was unable to extract a relationship between direction and spike rate. (b) XGBoost and neural nets displayed reasonable tuning curves, while the GLM reduced to the average spiking rate (with a small slope, in this case). (c) Most neurons in the population were poorly fit by the GLM, while the ML methods achieved the performance levels of Figure 2. The ensemble performed the best of the methods tested.
spiking of M1 neurons from the cosine and sine of the direction of hand movement in the reaching task. (The linear combination of a sine and cosine curve is a phase-shifted cosine, by identity, allowing the GLM to learn the proper preferred direction). We observed that each method identified a similar tuning curve (Fig. 2b), constructed by plotting the predictions of spike rate on the validation set against movement direction. The bulk of the neurons in the dataset were just as well predicted by each of the methods (Fig. 2a, c), though the ensemble was slightly better than the GLM (mean comparative pseudo-R2, defined in methods, of 0.06 [0.043 – 0.084], 95% bootstrapped confidence interval (CI)). The similar performance suggested that an exponentiated cosine is a nearly optimal approximating function of the neural response to movement direction alone, as was previously known (42). This classic example thus illustrated that all methods can in principle estimate tuning curves.
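The identity invoked in parentheses above (a weighted sum of sine and cosine is a phase-shifted cosine) can be checked numerically; the weights here are arbitrary illustrative values:

```python
import numpy as np

# a*cos(theta) + b*sin(theta) = R*cos(theta - phi),
# with R = sqrt(a^2 + b^2) and phi = arctan2(b, a).
# A GLM with weights (a, b) on (cos, sin) features therefore
# implicitly encodes a preferred direction phi and a modulation
# depth R, which is why this feature set suits the GLM so well.
a, b = 1.3, -0.6
theta = np.linspace(-np.pi, np.pi, 361)
lhs = a * np.cos(theta) + b * np.sin(theta)
R, phi = np.hypot(a, b), np.arctan2(b, a)
rhs = R * np.cos(theta - phi)
```

The two curves agree everywhere, so fitting linear weights on the (cos, sin) pair is equivalent to fitting a preferred direction directly.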
The exact form of the nonlinearity of the neural response to a given feature is rarely known, but this lack of knowledge need not impact our prediction ability. To illustrate the ability of modern machine learning to find the proper nonlinearity, we performed the same analysis as above but omitted the initial cosine feature engineering step. Trained on only the hand velocity direction, in radians (a representation that wraps discontinuously at ±π), the modern ML methods very nearly reproduced the predictive power they attained using the engineered feature (Fig. 3a). As expected, the GLM failed at generating a meaningful tuning curve (Fig. 3b). Both trends were consistent across the population of recorded neurons (Fig. 3c). The neural net, XGBoost, and ensemble methods thus perform well without feature engineering and the required prior knowledge or assumptions.
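This effect is easy to reproduce on synthetic data: a tree-based learner fit on the raw angle can recover a cosine-shaped rate that a linear fit cannot. The sketch below is our own illustration, using scikit-learn's GradientBoostingRegressor as a stand-in for XGBoost and a simulated cosine tuning curve rather than the paper's data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

# Simulated cosine-tuned firing rate as a function of the raw angle.
rng = np.random.default_rng(1)
theta = rng.uniform(-np.pi, np.pi, size=2000)
rate = np.exp(1.0 + 1.5 * np.cos(theta - 0.8))   # preferred direction 0.8 rad
y = rate + rng.normal(scale=0.3, size=theta.size)

X_train, X_test = theta[:1500, None], theta[1500:, None]
y_train, y_test = y[:1500], y[1500:]

linear = LinearRegression().fit(X_train, y_train)       # GLM-like linear fit
trees = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

r2_linear = linear.score(X_test, y_test)   # near chance: angle enters linearly
r2_trees = trees.score(X_test, y_test)     # high: trees carve out the bump
```

The boosted trees approximate the nonlinear tuning curve piecewise, while the linear model can only fit a line through a periodic function and captures little of its variance.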
Machine learning methods can also take advantage of information contained in combinations of inputs, and should perform better if given more inputs. We verified that this was true for our dataset by training on the four-dimensional set of hand position and velocity (x, y, ẋ, ẏ), which we call the set of original features. All methods gained a significant amount of predictive power with these new features, though the GLM did not nearly match the other methods (Fig 4a, c). This set of neurons thus seemed to encode strongly for position and velocity in a potentially nonlinear fashion captured by machine learning methods.
While some amount of feature engineering can improve the performance of GLMs, it is not always simple to guess the optimal set of processed features. We demonstrated this by training all methods on features that have previously been successful at explaining spike
Figure 4: Training on the set of original features (x, y, ẋ, ẏ) increased the predictive power of all methods. Note the change in axes scales from Figures 2-3. (a) For the same example neuron as in Figure 3, all methods gained a significant amount of predictive power, indicating a strong encoding of position and speed or their correlates. The GLM showed less predictive power than the other methods on this feature set. (b) The spike rate in black, with jitter on the y-axis, again overlaid with the predictions of the three methods as a function of velocity direction. The neuron encodes for position and speed, as well, and the projection of the multidimensional tuning curve onto a 1D velocity direction dependence leaves the projected curve diffuse. (c) The ensemble method, neural network, and XGBoost performed consistently better than the GLM across the population. The mean pseudo-R2 scores show the hierarchy of success across methods.
rate in a similar center-out reaching task (6). These extra features included the sine and cosine of velocity direction (as in Figure 2), the speed, the radial distance of hand position, and the sine and cosine of position direction. The training set was thus 10-dimensional, though highly redundant, and was aimed at maximizing the predictive power of the GLM. Feature engineering improved the predictive power of all methods to variable degrees, with the GLM improving to the level of the neural network (Fig. 5). XGBoost and the ensemble still predicted spikes better than the GLM (Fig. 5c), with the ensemble scoring on average 1.8 times higher than the GLM (ratio of population means of 1.8 [1.4 – 2.2], 95% bootstrapped CI). The ensemble was significantly better than XGBoost (mean comparative pseudo-R2 of 0.08 [0.055 – 0.103], 95% bootstrapped CI) and was thus consistently the best predictor. Though standard feature engineering greatly improved the GLM, the ensemble and XGBoost still captured the neural response more accurately.
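A 10-dimensional engineered set of this kind can be computed directly from the raw kinematics. The function below is our reconstruction from the description above, not the paper's exact preprocessing code; the column ordering is our choice:

```python
import numpy as np

def engineer_features(x, y, vx, vy):
    """Augment raw kinematics (x, y, x', y') with the engineered
    features described in the text: sine/cosine of velocity direction,
    speed, radial distance of position, and sine/cosine of position
    direction. Returns an (n, 10) design matrix."""
    vel_dir = np.arctan2(vy, vx)
    pos_dir = np.arctan2(y, x)
    return np.column_stack([
        x, y, vx, vy,                       # original features
        np.cos(vel_dir), np.sin(vel_dir),   # velocity direction
        np.hypot(vx, vy),                   # speed
        np.hypot(x, y),                     # radial distance
        np.cos(pos_dir), np.sin(pos_dir),   # position direction
    ])

# Two sample time bins of hand position and velocity.
F = engineer_features(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                      np.array([3.0, 0.0]), np.array([4.0, 2.0]))
```

The redundancy noted in the text is visible here: speed and direction are deterministic functions of the velocity components, so the engineered set adds no new information, only a representation the GLM can use linearly.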
To ensure that these results are not specific to the motor cortex, we extended the same analyses to primary somatosensory cortex (S1) data. The ensemble was consistently the best predictor across all neurons, scoring almost twice as well as the GLM (ratio of 1.8 [1.2 – 2.2] of population means, 95% bootstrapped CI). XGBoost predicted spikes better than the GLM only for
neurons with significant effect sizes for any of the four methods (i.e., with cross-validated pseudo-R2 scores two standard deviations above 0; mean comparative pseudo-R2 was 0.002 [0.0006 – 0.0045], 95% bootstrapped CI). Interestingly, the neural network performed worse than all other methods. We speculated that this could be related to the small covariate effect size in the S1 dataset, as we observed similar scores for the neural network on the M1 dataset for regimes of similar effect sizes, as well as on simulated data with GLM structure, small effect size, and similar firing rates (Supp. Fig. 2). We also found that a much smaller network performed better (a single hidden layer with 20 nodes) but that max-norm or elastic-net regularization did not improve the results with the larger network. Neural networks may thus be poor choices for Poisson data with very small covariate effect sizes, though we see no theoretical reason why this should be the case. Overall, on this S1 dataset featuring generally low predictability, the tested methods displayed a range of performances, with the ensemble predicting the data nearly twice as well as the GLM alone.
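For reference, a common deviance-based definition of the Poisson pseudo-R2 used throughout these comparisons is sketched below. The paper defines its exact variant in the methods; the comparative pseudo-R2 mentioned earlier replaces the mean-rate null model with a reference model's predictions. This implementation and its conventions are ours:

```python
import numpy as np

def poisson_pseudo_r2(y, rate_hat):
    """Deviance-based pseudo-R2 for Poisson spike counts:
    1 - D(y, rate_hat) / D(y, mean(y)), where D is the Poisson deviance.
    Equals 0 for the mean-rate (null) model and 1 for a perfect fit."""
    def deviance(y, mu):
        # by convention, 0 * log(0) contributes 0 for bins with no spikes
        with np.errstate(divide="ignore", invalid="ignore"):
            term = np.where(y > 0, y * np.log(y / mu), 0.0)
        return 2.0 * np.sum(term - (y - mu))
    null = np.full_like(y, y.mean(), dtype=float)
    return 1.0 - deviance(y, rate_hat) / deviance(y, null)
```

A model can score below zero on held-out data, which is why "pseudo-R2 near zero" is the natural floor for an uninformative encoding model.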
We asked if the same trends of performance would hold for the rat hippocampus dataset, which was characterized by very low mean firing rates but strong effect sizes. All methods were trained on a list of features representing the rat position and orientation, as
Figure 5: Encoding models for M1 trained on all the original features plus the engineered features show that modern ML methods can outperform the GLM even with standard feature engineering. (a) For this example neuron, inclusion of the computed features increased the predictive power of the GLM to the level of the neural net. XGBoost and the ensemble method also increased in predictive power. (b) The tuning curves for the example neuron are diffuse when projected onto the movement direction, indicating a high-dimensional dependence. (c) Even with feature engineering, XGBoost and the ensemble consistently achieve pseudo-R2 scores higher than the GLM, though the neural net does not. The selected neuron at left is marked with black arrows.
described in methods. We found that many neurons were described much better by XGBoost and the ensemble method than by the GLM (Fig. 6b). On average, the ensemble was almost ten times more predictive than the GLM (ratio of population means of 9.8 [5.4 – 100.0], 95% bootstrapped CI), and many neurons shifted from being completely unpredictable by the GLM (pseudo-R2 near zero) to very predictable by XGBoost and the ensemble (pseudo-R2 above 0.2). The neural network performed poorly, here not due to effect size as in S1 but likely due to the very low firing rates of most hippocampal cells (Supp. Fig. 2). Out of the 58 neurons in the dataset, 54 had rates below 1 spike per second, and it was only on the four high-firing neurons that the neural network achieved pseudo-R2 scores comparable to the GLM. The relative success of XGBoost was interesting given the failure of the neural network, and supported the general observation that XGBoost can work well with smaller and sparser datasets than those neural networks generally require. Thus for hippocampal cells, a method leveraging decision trees such as XGBoost or the ensemble is able to capture far more structure in the neural response than the GLM or the neural network.
Discussion

We contrasted the performance of GLMs with recent
machine learning techniques at the task of predicting spike rates in three brain regions. We found that the tested ML methods predicted spike rates far more accurately than the GLM. Typical feature engineering only partially bridged the performance gap. The ML methods performed comparably well with and without feature engineering, indicating they could serve as convenient performance benchmarks for improving simpler encoding models. The consistently best method was the ensemble, which was an instance of XGBoost stacked on the predictions of the GLM, neural network, XGBoost, and a random forest. The ensemble and XGBoost could fit the data well even for very low spike rates, as in the hippocampus dataset, and for very low covariate effect sizes, as in the S1 dataset. These findings indicate that GLMs are not the best choice as neuroscience’s standard method of spike prediction.
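The stacking construction described above can be sketched with scikit-learn components. This is an illustration of the stacking idea rather than the paper's exact ensemble: GradientBoostingRegressor stands in for XGBoost, LinearRegression for the GLM stage, the neural network base learner is omitted, and the data are simulated:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 4))
y = np.exp(0.3 * X[:, 0] + np.sin(X[:, 1])) + rng.normal(scale=0.1, size=600)

base_models = [
    LinearRegression(),                                   # GLM stand-in
    RandomForestRegressor(n_estimators=50, random_state=0),
    GradientBoostingRegressor(random_state=0),            # XGBoost stand-in
]
# Out-of-fold predictions keep training labels from leaking into the stacker.
meta_X = np.column_stack(
    [cross_val_predict(m, X, y, cv=5) for m in base_models]
)
stacker = GradientBoostingRegressor(random_state=0).fit(meta_X, y)

# At test time, each base model (refit on all data) feeds the stacker.
for m in base_models:
    m.fit(X, y)
X_new = rng.normal(size=(5, 4))
meta_new = np.column_stack([m.predict(X_new) for m in base_models])
y_new = stacker.predict(meta_new)
```

Because the stacker sees only held-out base predictions, it learns how to weight each method where it is reliable, which is why the ensemble can match or exceed its best component.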
The ML methods we have put forward here have been implemented without substantial modification from methods that are already in wide use. We hope that this simple application might spur a wider adoption of these methods in the neurosciences, thereby increasing the power and efficiency of studies involving neural prediction without requiring complicated, application-
Figure 6: XGBoost and the ensemble method predicted the activity of neurons in S1 and the hippocampus better than a GLM. The diagonal dotted line in both plots is the line of equal predictive power with the GLM. (a) The ensemble predicted firing almost twice as well, on average, as the GLM for all neurons in the S1 dataset. XGBoost was better for neurons with higher effect sizes but poorly predicted neurons that were not predictable by any method. The neural network performed the worst of all methods. (b) Many neurons in the rat hippocampus were described well by XGBoost and the ensemble but poorly by the GLM and the neural network. The poor neural network performance in the hippocampus was due to the low rate of firing of most neurons in the dataset (Supp. Fig. 2). Note the difference in axes; hippocampal cells are generally more predictable than those in S1.
. CC-BY-NC-ND 4.0 International licensepeer-reviewed) is the author/funder. It is made available under aThe copyright holder for this preprint (which was not. http://dx.doi.org/10.1101/111450doi: bioRxiv preprint first posted online Feb. 24, 2017;
9
described in methods. We found that many neurons were described much better by XGBoost and the ensemble method than by the GLM (Fig. 6b). On average, the ensemble was almost ten times more predictive than the GLM (ratio of population means of 9.8 [5.4 – 100.0], 95% bootstrapped CI), and many neurons shifted from being completely unpredictable by the GLM (pseudo-R2 near zero) to very predictable by XGBoost and the ensemble (pseudo-R2 above 0.2). The neural network performed poorly, here not due to effect size as in S1 but likely due to the very low firing rates of most hippocampal cells (Supp. Fig. 2). Out of the 58 neurons in the dataset, 54 had rates below 1 spikes/ second, and it was only on the four high-firing neurons that the neural network achieved pseudo-R2 scores comparable to the GLM. The relative success of XGBoost was interesting given the failure of the neural network, and supported the general observation that XGBoost can work well with smaller and sparser datasets than those neural networks generally require. Thus for hippocampal cells, a method leveraging decision trees such as XGBoost or the ensemble is able to capture far more structure in the neural response than the GLM or the neural network.
Discussion We contrasted the performance of GLMs with recent
GLMs &lt; NNs?
learning methods with their own approaches for encoding models.
To test that all methods work reasonably well in a trivial case, we trained each to predict spiking from a simple and well-understood feature. Some neurons in M1 have been described as responding linearly to the exponentiated cosine of movement direction relative to a preferred angle (41). We therefore predicted the
Figure 2: Encoding models for M1 performed similarly when trained on the sine and cosine of hand velocity direction. (a) The pseudo-R2 for an example neuron was similar for all four methods. On this figure and in Figures 3-5 the example neuron is the same, and is not the neuron for which method hyperparameters were optimized. (b) The tuning curves of the neural net and XGBoost were similar to that of the GLM. The black points are the recorded responses, to which we added y-axis jitter for visualization. The tuning curve of the ensemble method was similar and is omitted here for clarity. (c) Plotting the pseudo-R2 of modern ML methods vs. that of the GLM indicates that the similarity of methods generalizes across neurons. The single neuron plotted at left is marked with black arrows. The mean scores, inset, indicate the overall success of the methods; error bars represent the 95% bootstrap confidence interval.
Figure 3: Modern ML models could learn the cosine nonlinearity when trained on only the direction of hand velocity, in radians. (a) For the same example neuron as in Figure 2, the neural net and XGBoost maintained the same predictive power, while the GLM was unable to extract a relationship between direction and spike rate. (b) XGBoost and neural nets displayed reasonable tuning curves, while the GLM reduced to the average spiking rate (with a small slope, in this case). (c) Most neurons in the population were poorly fit by the GLM, while the ML methods achieved the performance levels of Figure 2. The ensemble performed the best of the methods tested.
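The cosine-tuning setup of Figures 2-3 is easy to reproduce: with (cos θ, sin θ) features the exponentiated-cosine model is exactly a Poisson GLM, so Newton's method recovers the tuning parameters. A minimal numpy sketch (all parameter values hypothetical, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tuning parameters, for illustration only
b0, kappa, theta_pref = 0.5, 1.0, np.pi / 4

# Simulate spike counts from an exponentiated-cosine tuning curve
theta = rng.uniform(0, 2 * np.pi, 2000)              # movement directions
rate = np.exp(b0 + kappa * np.cos(theta - theta_pref))
y = rng.poisson(rate)

# Design matrix: intercept, cos(theta), sin(theta). Since
# exp(b0 + k*cos(theta - p)) = exp(b0 + k*cos(p)*cos(theta) + k*sin(p)*sin(theta)),
# the model is exactly linear in these features.
X = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])

# Fit by Newton's method on the Poisson log-likelihood
w = np.zeros(3)
for _ in range(25):
    lam = np.exp(X @ w)
    grad = X.T @ (y - lam)                 # gradient of log-likelihood
    hess = X.T @ (lam[:, None] * X)        # negative Hessian
    w += np.linalg.solve(hess, grad)

# Estimated weights land close to the true reparameterized weights
w_true = np.array([b0, kappa * np.cos(theta_pref), kappa * np.sin(theta_pref)])
print(np.round(w, 2), np.round(w_true, 2))
```

Fitting the same data on the raw angle θ alone (as in Figure 3's GLM) has no linear relationship to exploit, which is why the engineered (cos, sin) basis matters so much for the GLM.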
bioRxiv preprint first posted online Feb. 24, 2017; doi: http://dx.doi.org/10.1101/111450. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
rate in a similar center-out reaching task (6). These extra features included the sine and cosine of velocity direction (as in Figure 2), the speed, the radial distance of hand position, and the sine and cosine of position direction. The training set was thus 10-dimensional, though highly redundant, and was aimed at maximizing the predictive power of the GLM. Feature engineering improved the predictive power of all methods to variable degrees, with the GLM improving to the level of the neural network (Fig. 5). XGBoost and the ensemble still predicted spikes better than the GLM (Fig. 5c), with the ensemble scoring on average 1.8 times higher than the GLM (ratio of population means of 1.8 [1.4 – 2.2], 95% bootstrapped CI). The ensemble was significantly better than XGBoost (mean comparative pseudo-R2 of 0.08 [0.055 – 0.103], 95% bootstrapped CI) and was thus consistently the best predictor. Though standard feature engineering greatly improved the GLM, the ensemble and XGBoost still captured the neural response more accurately.
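The engineered features listed above are simple deterministic functions of hand position and velocity; a sketch of how such a 10-dimensional design matrix might be assembled (the function name and layout are illustrative, not the authors' code):

```python
import numpy as np

def engineer_features(pos, vel):
    """Append the hand-crafted kinematic features described in the text
    (velocity-direction sine/cosine, speed, radial distance of position,
    and position-direction sine/cosine) to raw 2-D position and velocity."""
    speed = np.hypot(vel[:, 0], vel[:, 1])
    vdir = np.arctan2(vel[:, 1], vel[:, 0])
    radius = np.hypot(pos[:, 0], pos[:, 1])
    pdir = np.arctan2(pos[:, 1], pos[:, 0])
    return np.column_stack([
        pos, vel,                        # 4 raw covariates
        np.cos(vdir), np.sin(vdir),      # velocity direction
        speed, radius,                   # speed and radial distance
        np.cos(pdir), np.sin(pdir),      # position direction
    ])

rng = np.random.default_rng(1)
pos, vel = rng.normal(size=(500, 2)), rng.normal(size=(500, 2))
X = engineer_features(pos, vel)
print(X.shape)  # (500, 10)
```

As the text notes, these columns are highly redundant with the raw kinematics; the point of including them is to hand the GLM nonlinear features it cannot learn on its own.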
To ensure that these results are not specific to the motor cortex, we extended the same analyses to primary somatosensory cortex (S1) data. The ensemble was consistently the best predictor across all neurons, scoring almost twice as well as the GLM (ratio of 1.8 [1.2 – 2.2] of population means, 95% bootstrapped CI). XGBoost predicted spikes better than the GLM only for neurons with significant effect sizes for any of the four methods (i.e., with cross-validated pseudo-R2 scores two standard deviations above 0; mean comparative pseudo-R2 was 0.002 [0.0006 – 0.0045], 95% bootstrapped CI). Interestingly, the neural network performed worse than all other methods. We speculated that this could be related to the small covariate effect size in the S1 dataset, as we observed similar scores for the neural network on the M1 dataset for regimes of similar effect sizes, as well as on simulated data with GLM structure, small effect size, and similar firing rates (Supp. Fig. 2). We also found that a much smaller network performed better (a single hidden layer with 20 nodes) but that max-norm or elastic-net regularization did not improve the results with the larger network. Neural networks may thus be poor choices for Poisson data with very small covariate effect sizes, though we see no theoretical reason why this should be the case. Overall, on this S1 dataset featuring generally low predictability, the tested methods displayed a range of performances, with the ensemble predicting the data nearly twice as well as the GLM alone.
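The pseudo-R2 scores used throughout can be computed from Poisson log-likelihoods; here is a sketch using one common deviance-based definition (1 minus the ratio of model to null deviance), which is an assumption on my part rather than code from the paper:

```python
import numpy as np

def poisson_pseudo_r2(y, rate_hat):
    """Poisson pseudo-R2: 1 - deviance(model) / deviance(null).
    0 means no better than predicting the mean rate; 1 is a perfect fit."""
    eps = 1e-12
    def loglik(lam):
        # Poisson log-likelihood up to the log(y!) constant, which cancels
        return np.sum(y * np.log(lam + eps) - lam)
    ll_model = loglik(rate_hat)
    ll_null = loglik(np.full(y.shape, y.mean()))
    ll_sat = loglik(np.maximum(y, eps))        # saturated model: rate = y
    return 1.0 - (ll_sat - ll_model) / (ll_sat - ll_null)

rng = np.random.default_rng(7)
rate = rng.lognormal(size=1000)                # hypothetical true rates
y = rng.poisson(rate)
r2 = poisson_pseudo_r2(y, rate)
print(round(r2, 3))   # positive: the true rates beat the mean-rate model
```

With this definition a score "near zero" means the model adds nothing over a constant firing rate, which is how the unpredictable neurons above should be read.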
We asked if the same trends of performance would hold for the rat hippocampus dataset, which was characterized by very low mean firing rates but strong effect sizes. All methods were trained on a list of features representing the rat position and orientation, as
Figure 5: Encoding models for M1 trained on all the original features plus the engineered features show that modern ML methods can outperform the GLM even with standard feature engineering. (a) For this example neuron, inclusion of the computed features increased the predictive power of the GLM to the level of the neural net. XGBoost and the ensemble method also increased in predictive power. (b) The tuning curves for the example neuron are diffuse when projected onto the movement direction, indicating a high-dimensional dependence. (c) Even with feature engineering, XGBoost and the ensemble consistently achieve pseudo-R2 scores higher than the GLM, though the neural net does not. The selected neuron at left is marked with black arrows.
(No, of course not!)
GLM is a special case of NN!
described in Methods. We found that many neurons were described much better by XGBoost and the ensemble method than by the GLM (Fig. 6b). On average, the ensemble was almost ten times more predictive than the GLM (ratio of population means of 9.8 [5.4 – 100.0], 95% bootstrapped CI), and many neurons shifted from being completely unpredictable by the GLM (pseudo-R2 near zero) to very predictable by XGBoost and the ensemble (pseudo-R2 above 0.2). The neural network performed poorly, here not due to effect size as in S1 but likely due to the very low firing rates of most hippocampal cells (Supp. Fig. 2). Out of the 58 neurons in the dataset, 54 had rates below 1 spike/second, and it was only on the four high-firing neurons that the neural network achieved pseudo-R2 scores comparable to the GLM. The relative success of XGBoost was interesting given the failure of the neural network, and supported the general observation that XGBoost can work well with smaller and sparser datasets than those that neural networks generally require. Thus for hippocampal cells, a method leveraging decision trees, such as XGBoost or the ensemble, is able to capture far more structure in the neural response than the GLM or the neural network.
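The stacking idea behind the ensemble (a second-stage learner fit to held-out base-model predictions) can be sketched in a few lines. Here simple stand-in base models replace the paper's GLM/neural-net/XGBoost/random-forest stack, and a linear least-squares stacker replaces XGBoost:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 1200)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)   # toy target
train, val = slice(0, 800), slice(800, None)

# Base model 1: ordinary linear regression
A = np.column_stack([np.ones_like(x), x])
w = np.linalg.lstsq(A[train], y[train], rcond=None)[0]
pred_lin = A @ w

# Base model 2: binned-mean predictor (a crude nonlinear learner)
bins = np.linspace(-3, 3, 25)
idx = np.clip(np.digitize(x, bins) - 1, 0, len(bins) - 2)
bin_means = np.array([y[train][idx[train] == b].mean()
                      if np.any(idx[train] == b) else y[train].mean()
                      for b in range(len(bins) - 1)])
pred_bin = bin_means[idx]

# Stacker: fit weights on held-out base predictions (stacked generalization);
# in practice a third split would be reserved for final evaluation
P = np.column_stack([np.ones_like(x), pred_lin, pred_bin])
w_stack = np.linalg.lstsq(P[val], y[val], rcond=None)[0]
pred_stack = P @ w_stack

mse = lambda p: np.mean((p[val] - y[val]) ** 2)
print(mse(pred_lin), mse(pred_bin), mse(pred_stack))
```

Because the stacker can always fall back to weighting a single base model, its error never exceeds that of the best base model on the data it was fit to, which is why stacked ensembles are such safe default benchmarks.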
Discussion
We contrasted the performance of GLMs with recent
machine learning techniques at the task of predicting spike rates in three brain regions. We found that the tested ML methods predicted spike rates far more accurately than the GLM. Typical feature engineering only partially bridged the performance gap. The ML methods performed comparably well with and without feature engineering, indicating they could serve as convenient performance benchmarks for improving simpler encoding models. The consistently best method was the ensemble, which was an instance of XGBoost stacked on the predictions of the GLM, neural network, XGBoost, and a random forest. The ensemble and XGBoost could fit the data well even for very low spike rates, as in the hippocampus dataset, and for very low covariate effect sizes, as in the S1 dataset. These findings indicate that GLMs are not the best choice as neuroscience’s standard method of spike prediction.
The ML methods we have put forward here have been implemented without substantial modification from methods that are already in wide use. We hope that this simple application might spur a wider adoption of these methods in the neurosciences, thereby increasing the power and efficiency of studies involving neural prediction without requiring complicated, application-
Figure 6: XGBoost and the ensemble method predicted the activity of neurons in S1 and the hippocampus better than a GLM. The diagonal dotted line in both plots is the line of equal predictive power with the GLM. (a) The ensemble predicted firing almost twice as well, on average, as the GLM for all neurons in the S1 dataset. XGBoost was better for neurons with higher effect sizes but poorly predicted neurons that were not predictable by any method. The neural network performed the worst of all methods. (b) Many neurons in the rat hippocampus were described well by XGBoost and the ensemble but poorly by the GLM and the neural network. The poor neural network performance in the hippocampus was due to the low rate of firing of most neurons in the dataset (Supp. Fig. 2). Note the difference in axes; hippocampal cells are generally more predictable than those in S1.
stimulus
spikes
membrane potential
imaging
neural activity
encoding models
What if there’s no stimulus?
spikes
membrane potential
imaging
neural activity
latent variable models
latent encoding models
latent variable (unobserved or “hidden”)
[Figure: spike responses (neuron index × time, simulated data) and trajectories of inferred latent variables; credit: Jakob Macke]
[Diagram: common input x → couplings C → nonlinear function + noise → spike response y]
latent variable models = GLMs where we don’t know x
goal: find shared structure underlying y
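Under a linear-Gaussian stand-in for the nonlinear mapping, “shared structure underlying y” is exactly what an SVD/PCA of the population activity exposes; a toy sketch (all sizes and parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, K = 1000, 30, 2          # time bins, neurons, latent dimensions

x = rng.normal(size=(T, K))                  # common (latent) input
C = rng.normal(size=(K, N))                  # couplings from latents to neurons
Y = x @ C + 0.1 * rng.normal(size=(T, N))    # observed activity, linear-Gaussian mapping

# Shared structure: the variance spectrum collapses at the true latent dimension
s = np.linalg.svd(Y - Y.mean(0), compute_uv=False)
var = s**2 / np.sum(s**2)
print(np.round(var[:4], 3))    # first K components carry nearly all the variance
```

With spiking data the mapping is nonlinear and Poisson rather than linear-Gaussian, which is precisely why the latent variable models below are needed instead of plain PCA.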
chicken and egg problem
Why are latent variable models hard to work with?
• hard to compute likelihood!

sensory encoding model (x and y both observed), fit using Poisson GLM:
log P(Y|X) = Σ_t [ y_t log λ_t − λ_t ]

latent variable model (x unobserved), fit using the marginal likelihood:
log P(Y) = log ∫ P(Y|X) P(X) dX   (requires an integral!)
latent dynamical model:
dynamics:  … → x_{t−1} → x_t → x_{t+1} → …   (latent)
response:  each x_t emits an observation y_t
fit using: log P(Y) = log ∫ P(Y|X) P(X) dX
(now a high-dimensional integral over the whole latent trajectory!)
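To make the integral concrete, consider a hypothetical one-dimensional model with x ~ N(0,1) and y|x ~ Poisson(e^x): even here the marginal likelihood has no closed form, though plain Monte Carlo still works. For a whole latent trajectory the same integral becomes high-dimensional and this brute-force approach breaks down:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(4)

def log_marginal(y, n_samples=100_000):
    """Monte Carlo estimate of log P(y) = log E_x[ P(y | x) ],
    with x ~ N(0, 1) and y | x ~ Poisson(exp(x))."""
    x = rng.normal(size=n_samples)
    # Poisson log-density with log-rate x
    log_p_y_given_x = y * x - np.exp(x) - lgamma(y + 1)
    # log-mean-exp for numerical stability
    m = log_p_y_given_x.max()
    return m + np.log(np.mean(np.exp(log_p_y_given_x - m)))

print(round(float(log_marginal(2)), 3))   # a log-probability, so negative
```

A sanity check is that exp(log_marginal(y)) summed over y comes out near 1. For T time bins the latent is a T-dimensional vector, and naive sampling like this needs exponentially many draws, which motivates the MCMC, EM, and variational methods that follow.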
Fitting Latent Variable Models

1. Sampling (“MCMC”): fully Bayesian inference
• procedure for sampling the joint distribution:
1) sample latents from the conditional over latents
2) sample parameters from the conditional over parameters

2. Expectation maximization (EM)
• alternate updating the parameters and the posterior over latents

3. Variational inference
• optimize a lower bound on the posterior over parameters
• easy with modern probabilistic programming languages (STAN, Edward)
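A worked instance of EM for the simplest latent variable model on the menu below, a mixture of Gaussians, where the latent is the cluster label: the E-step computes the posterior over latents, the M-step updates the parameters. A numpy sketch with hypothetical toy data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy data from a two-component Gaussian mixture (latent = component label)
y = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

mu = np.array([-1.0, 1.0])                     # initial means
sigma, pi = np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibilities r[t, k] = P(z_t = k | y_t, params)
    logp = (-0.5 * ((y[:, None] - mu) / sigma) ** 2
            - np.log(sigma) + np.log(pi))
    r = np.exp(logp - logp.max(1, keepdims=True))
    r /= r.sum(1, keepdims=True)
    # M-step: weighted maximum-likelihood parameter updates
    nk = r.sum(0)
    pi = nk / len(y)
    mu = (r * y[:, None]).sum(0) / nk
    sigma = np.sqrt((r * (y[:, None] - mu) ** 2).sum(0) / nk)

print(np.round(np.sort(mu), 2))   # ≈ [-2, 3], the true component means
```

Each iteration is guaranteed not to decrease the data log-likelihood; the same alternate-and-update pattern carries over to factor analysis, LDS, and HMM fitting, with the E-step replaced by the appropriate posterior computation.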
Latent variable models are defined by two quantities: the latent distribution P(x) and the mapping P(y|x).

latent P(x)                  mapping P(y|x)      Model
• discrete                   • Gaussian          • Mixture of Gaussians (“clustering”)
• Gaussian                   • linear, Gaussian  • Factor analysis (PCA is a special case)
• linear Gaussian dynamics   • linear, Gaussian  • Linear Dynamical System (LDS) (“Kalman filter”)
• discrete transitions       • any               • Hidden Markov Model (“HMM”)
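For the linear-Gaussian dynamics row, the “Kalman filter” is the closed-form posterior over the latent at each time step. A one-dimensional sketch with hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(6)

# 1-D linear dynamical system: x_t = a x_{t-1} + w_t,  y_t = c x_t + v_t
a, c, q, r = 0.95, 1.0, 0.1, 0.5     # dynamics, mapping, noise variances
T = 300
x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t-1] + np.sqrt(q) * rng.normal()
    y[t] = c * x[t] + np.sqrt(r) * rng.normal()

# Kalman filter: recursive Gaussian posterior over the latent
m, P = 0.0, 1.0
m_filt = np.zeros(T)
for t in range(T):
    m_pred, P_pred = a * m, a * a * P + q          # predict step
    K = P_pred * c / (c * c * P_pred + r)          # Kalman gain
    m = m_pred + K * (y[t] - c * m_pred)           # update step
    P = (1 - K * c) * P_pred
    m_filt[t] = m

# Filtering tracks the latent better than the raw observations do
print(np.mean((m_filt - x)**2), np.mean((y - x)**2))
```

The same predict/update structure, with Poisson instead of Gaussian observations, is what makes fitting latent models of spike trains (such as vLGP below) require approximate inference rather than these closed-form recursions.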
variational latent Gaussian process (vLGP) [Zhao & Park 2016]
latent P(x): Gaussian Process
mapping P(y|x): Poisson GLM
variational latent Gaussian process (vLGP)
• 63 simultaneously recorded V1 neurons [Graf et al 2011]
• stimuli: drifting sinusoidal gratings
[Zhao & Park 2016]
https://www.youtube.com/watch?v=CrY5AfNH1ik
Summary
• descriptive statistical “encoding” models
• seek to capture structure in data
• formal tools for comparing models
• encoding and decoding analyses via Bayes’ rule
• models are modular, easy to build / extend / generalize