# Topics to discuss (23/May/2020)

## Inverting computational neuroscience models

I've read a few papers on inverting computational neuroscience models. There are mainly two kinds of models:

- Models that generate spikes, such as Hodgkin-Huxley and co.
    - The main example is the paper from Jakob Macke's lab ([biorxiv](https://www.biorxiv.org/content/10.1101/838383v1)). They only show examples with this kind of model. Also, the summary statistics are always decided upfront; there is no example where the features are learnt.
    - In general, the parameters of these models are easy to interpret and to relate to specific physiological quantities.
- Neural mass models
    - There are a few recent papers proposing to invert neural mass models. The one I prefer is this one ([arxiv](https://arxiv.org/abs/1903.01138)). There is also one from a group in Oxford ([biorxiv](https://www.biorxiv.org/content/10.1101/785568v2)), but it is a bit cryptic and not yet peer-reviewed. Also, the first paper has public code available (OK, it's in R, but anyway).

In all these works, the features to use are always decided upfront. This seems to me an important bottleneck. What would be a good way to learn features from time series, especially the kind of time series we are interested in (EEG)? I think it should start with a feature extractor that **learns the second-order statistics kind of "naturally"**.

Would it be worth comparing our results/methods with the DCM approach on a toy model?

I'm looking into ways to relate what one would obtain on a **neural mass model with some real physiological meaning**. This [paper](https://link.springer.com/article/10.1007/s004220000160) gives some ideas, but it is rather old. The first author is French and is based at the Université de Rennes.

> Check the works of Jonathan Touboul

## Learning features for time series analysis

I've read some papers that mix deep neural networks with AR modeling. There are a few connections with normalizing flows as well, which seems quite interesting.

- This paper seems promising, but I haven't had the time to go really deep into it: Wiqvist et al., "Partially Exchangeable Networks and Architectures for Learning Summary Statistics in Approximate Bayesian Computation" [[arxiv]](http://arxiv.org/abs/1901.10230)
    - They show an example with an AR(2) process where they learn the features. In fact, they say at the beginning of the paper: *"In particular, our network architecture is specifically designed to learn summary statistics for dynamic models"*
    - They apply the procedure within ABC
    - They have public code on GitHub, written in Julia: https://github.com/SamuelWiqvist/PENs-and-ABC
    - The second author is a CR at INRIA-Nice: https://pamattei.github.io/

## Some experiments

I did quite a few tests and experiments on **the simplest time series process** I could imagine: the univariate AR(1) process,
$$
x(n) = \theta x(n-1) + u(n)~,
$$
where $\theta \in (-1, +1)$ for the process to be stationary. The process $u(n)$ is a univariate Gaussian white noise process and, for simplicity, we will assume that $\mathbb{E}[u(n)] = 0$ and $\text{var}(u(n)) = 1$.

We suppose that we observe a certain realization of the AR process with $N$ samples, $\boldsymbol{x}_{N} = \{x(0), \dots, x(N-1)\}$, and **the goal is to infer the parameter $\theta$ that generated** these samples. We will assume that we may summarize the information carried by the $N$ samples via some vector of summary statistics, which we denote $S(\boldsymbol{x}_N)$.
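To make this setup concrete, here is a minimal sketch (plain NumPy; the function names are my own) of the AR(1) simulator and of the lag-1 sample ACF that will serve as summary statistic below.

```python
import numpy as np

def simulate_ar1(theta, n_samples, rng):
    """Simulate x(n) = theta * x(n-1) + u(n), with u(n) ~ N(0, 1)."""
    u = rng.standard_normal(n_samples)
    x = np.zeros(n_samples)
    # draw x(0) from the stationary distribution, whose variance is 1 / (1 - theta^2)
    x[0] = u[0] / np.sqrt(1.0 - theta ** 2)
    for n in range(1, n_samples):
        x[n] = theta * x[n - 1] + u[n]
    return x

def acf_lag1(x):
    """Sample lag-1 autocorrelation: gamma_hat(1) / gamma_hat(0), with the mean removed."""
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / len(x)
    gamma1 = np.dot(xc[:-1], xc[1:]) / len(x)
    return gamma1 / gamma0

rng = np.random.default_rng(0)
x = simulate_ar1(theta=0.5, n_samples=1000, rng=rng)
print(acf_lag1(x))  # should be close to theta = 0.5
```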
Writing Bayes' theorem, we have
$$
p(\theta|S(\boldsymbol{x}_N)) \propto p(S(\boldsymbol{x}_N)|\theta)\,p(\theta)~.
$$
In general, the expression for $p(S(\boldsymbol{x}_N)|\theta)$ may end up being very complicated, making the posterior impossible to obtain. Using the SNPE algorithm, we may estimate the posterior distribution without knowing how to write such a likelihood. However, for this **toy example**, we will consider a case where we know the expression for the likelihood function and its related posterior. This will allow us to compare the results obtained via the SNPE-C algorithm with the analytical expression for the posterior.

We will consider that our summary statistic is an estimate of the autocorrelation function (ACF) of the time series. Recall that the autocovariance function of $x(n)$ is
$$
\gamma(h) = \mathbb{E}[x(n)x(n+h)]
$$
for $h = 0, \pm 1, \pm 2, \dots$, and that the ACF is
$$
\rho(h) = \dfrac{\gamma(h)}{\gamma(0)}~.
$$

> Note that if we have an AR(1) process as defined above, then $\rho(1) = \theta$.

For simplicity of analysis/exposition/discussion, we will use just one lag of the ACF as summary statistic, i.e., $S(\boldsymbol{x}_N) = \hat{\rho}(1)$. Using the results from this [link](https://stats.libretexts.org/Bookshelves/Book%3A_Time_Series_Analysis_(Aue)/2%3A_The_Estimation_of_Mean_and_Covariances/2.2%3A_Estimation_of_the_Autocovariance_Function) we may write the statistical distribution of the estimator $\hat{\rho}(1)$ in terms of the parameter $\theta$. We have that
$$
p(\hat{\rho}(1)|\theta) = \mathcal{N}(\theta, \sigma_N^2)
$$
with $\sigma_N^2 = (1-\theta^2)/N$. Based on this result, if we assume that the parameter $\theta$ follows a uniform distribution on $(-1,+1)$, then the posterior distribution for $\theta$ given an observed summary statistic $\theta_0$ is
$$
p(\theta|\hat{\rho}(1)=\theta_0) = \frac{1}{Z}\,\mathcal{N}(\theta_0, \sigma^2_N)~\mathbf{1}_{\theta \in (-1,+1)}~,
$$
where $Z$ is just a normalization constant, easily obtained from the Gaussian CDF.

The figure below shows the pdf obtained using the SNPE-C algorithm with a neural density estimator that is an MDN with just one mixture component (i.e. a Gaussian distribution). We show the results for three values of $N$, with a fixed number of simulations per round in the SNPE iterative procedure. The data was generated using an AR process with $\theta = 0.5$ and we assume that we observe exactly the correct value of $\theta$, that is, $\theta_0 = 0.5$.

![](https://i.imgur.com/3ZZd0Vd.png)

The next figure gives a more quantitative summary of the results. We consider an increasing number of simulations per round and different values of $N$. We also consider two different AR processes: $\theta = 0.5$ and $\theta = 0.95$. We report the Wasserstein distance ($p = 2$) between the analytic distribution and the one estimated with SNPE-C (a sketch of how this comparison can be computed from samples is given after the next-steps list below).

![](https://i.imgur.com/rZmOWcd.png)

I've also considered what happens when taking more time lags into account.

![](https://i.imgur.com/KMXwzNW.png)

**Next steps**:

- Can we learn the ACF automatically from the data, i.e., learn the best features for the time series? In linear cases this is analytical, but what happens in more complicated settings?
- Compare the results of SNPE-C with an ABC procedure
    - Use mainly the method from this [paper](https://arxiv.org/abs/1903.01138).
- How do the results compare when we use the spectral domain as summary statistics?
- I think that an AR(2) process would probably give richer results and more interesting things to say.
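As a reference for the comparison above, here is a minimal sketch (my own code, using NumPy/SciPy) of the analytic truncated-Gaussian posterior and of the 2-Wasserstein distance between it and a set of posterior samples, computed via the usual quantile coupling for 1-D distributions. I assume here that $\sigma_N$ is evaluated at the observed value $\theta_0$; the commented line at the end stands for whatever posterior samples SNPE-C returns.

```python
import numpy as np
from scipy.stats import norm

def analytic_posterior_pdf(theta_grid, theta_0, n_obs):
    """Truncated-Gaussian posterior p(theta | rho_hat(1) = theta_0) on (-1, 1)."""
    sigma = np.sqrt((1.0 - theta_0 ** 2) / n_obs)   # sigma_N, evaluated at theta_0 (assumption)
    z = norm.cdf(1.0, theta_0, sigma) - norm.cdf(-1.0, theta_0, sigma)  # Z from the Gaussian CDF
    pdf = norm.pdf(theta_grid, theta_0, sigma) / z
    pdf[(theta_grid <= -1.0) | (theta_grid >= 1.0)] = 0.0
    return pdf

def analytic_posterior_samples(n_draws, theta_0, n_obs, rng):
    """Draw from the truncated Gaussian by inverse-CDF sampling."""
    sigma = np.sqrt((1.0 - theta_0 ** 2) / n_obs)
    lo, hi = norm.cdf(-1.0, theta_0, sigma), norm.cdf(1.0, theta_0, sigma)
    return norm.ppf(rng.uniform(lo, hi, n_draws), theta_0, sigma)

def wasserstein_2(samples_a, samples_b):
    """W2 between two 1-D empirical distributions of equal size (sorted-sample coupling)."""
    qa, qb = np.sort(samples_a), np.sort(samples_b)
    assert len(qa) == len(qb)
    return np.sqrt(np.mean((qa - qb) ** 2))

rng = np.random.default_rng(0)
ref = analytic_posterior_samples(5000, theta_0=0.5, n_obs=100, rng=rng)
# snpe_samples = ...  # 5000 posterior samples from the fitted SNPE-C posterior
# print(wasserstein_2(ref, np.asarray(snpe_samples).ravel()))
```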
On this last point: in the AR(1) experiments we did not see much difference in the results for different values of $\theta$. I found some analytical results that should help me derive the analytic form of the posterior in the AR(2) case.
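For the AR(2) direction, the lag-1 and lag-2 autocorrelations are linked to the two coefficients by the standard Yule-Walker relations, $\rho(1) = \theta_1/(1-\theta_2)$ and $\rho(2) = \theta_1\rho(1) + \theta_2$, which is what would make two ACF lags a natural summary statistic there. Below is a small sketch (my own code, checking these standard relations numerically; the parameter values are arbitrary but inside the stationarity region).

```python
import numpy as np

def simulate_ar2(theta1, theta2, n_samples, rng, burn_in=500):
    """Simulate x(n) = theta1 * x(n-1) + theta2 * x(n-2) + u(n), u(n) ~ N(0, 1)."""
    total = n_samples + burn_in
    u = rng.standard_normal(total)
    x = np.zeros(total)
    for n in range(2, total):
        x[n] = theta1 * x[n - 1] + theta2 * x[n - 2] + u[n]
    return x[burn_in:]  # drop the initial transient

def acf(x, lag):
    """Sample autocorrelation at a given lag, with the mean removed."""
    xc = x - x.mean()
    return np.dot(xc[:-lag], xc[lag:]) / np.dot(xc, xc)

theta1, theta2 = 0.5, -0.3  # arbitrary stationary choice
rng = np.random.default_rng(0)
x = simulate_ar2(theta1, theta2, n_samples=20_000, rng=rng)

# Yule-Walker: rho(1) = theta1 / (1 - theta2), rho(2) = theta1 * rho(1) + theta2
rho1_theory = theta1 / (1.0 - theta2)
rho2_theory = theta1 * rho1_theory + theta2
print(acf(x, 1), rho1_theory)
print(acf(x, 2), rho2_theory)
```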