# Improving expressivity of source posteriors
I propose to use autoregressive convolutional layers to model local correlations in the source parameters. A good discussion of the topic can be found here: https://arxiv.org/abs/1901.11137
Given some image $x$, the basic goal is to perform a convolution
$$
x' = C x
$$
with convolution layer $C$ such that the log determinant of this operation is simple to calculate. This is the case for masked kernels in which every entry that follows the central pixel in raster-scan order is strictly zero. Convolution with such a kernel leads to a triangular convolution matrix, with the central kernel value on the diagonal.
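For intuition, here is the structure in a minimal 1-D analogue (an illustrative assumption, not the 2-D setup itself): a length-4 signal convolved with a causal two-tap kernel $(a, K_{mm})$ under zero padding gives
$$
C = \begin{pmatrix}
K_{mm} & 0 & 0 & 0 \\
a & K_{mm} & 0 & 0 \\
0 & a & K_{mm} & 0 \\
0 & 0 & a & K_{mm}
\end{pmatrix},
$$
so $\det C = K_{mm}^4$; the 2-D case with raster-scan ordering is analogous.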
The determinant is then simply the product of the diagonal elements:
$$
\log\det C = h\cdot w\cdot\log K_{mm}
$$
where $K_{mm}$ denotes the central value of the convolution kernel, and $h$ and $w$ the height and width of the image.
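As a concrete sketch, here is a minimal PyTorch implementation of such a masked convolution together with its log determinant. The class name, the 3×3 kernel size, zero padding, and the softplus parametrization of the central value are illustrative assumptions, not fixed by the text:

```python
import torch
import torch.nn.functional as F


class MaskedAutoregressiveConv2d(torch.nn.Module):
    """Single-channel masked convolution x' = C x with tractable log-det.

    Sketch only: the 3x3 kernel, zero padding, and the softplus
    parametrization of the central value K_mm are assumptions.
    """

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        k, m = kernel_size, kernel_size // 2
        self.weight = torch.nn.Parameter(0.01 * torch.randn(1, 1, k, k))
        self.center_raw = torch.nn.Parameter(torch.zeros(1))  # K_mm = softplus(center_raw)
        # Keep only entries strictly *before* the center in raster-scan order.
        mask = torch.zeros(1, 1, k, k)
        mask[..., :m, :] = 1.0   # rows above the central pixel
        mask[..., m, :m] = 1.0   # central row, left of the central pixel
        center_mask = torch.zeros(1, 1, k, k)
        center_mask[..., m, m] = 1.0
        self.register_buffer("mask", mask)
        self.register_buffer("center_mask", center_mask)
        self.pad = m

    def forward(self, x):
        # x has shape (batch, 1, h, w)
        k_mm = F.softplus(self.center_raw)                # strictly positive
        kernel = self.weight * self.mask + k_mm * self.center_mask
        x_out = F.conv2d(x, kernel, padding=self.pad)
        h, w = x.shape[-2:]
        logdet = h * w * torch.log(k_mm).squeeze()        # = h * w * log K_mm
        return x_out, logdet
```

Calling `layer(x)` on a batch of shape `(batch, 1, h, w)` returns the transformed image together with the scalar $\log\det C = h\cdot w\cdot\log K_{mm}$.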
To account for the possibility that the correlation structure changes across the image, it makes sense to (1) introduce diagonal weight layers and (2) use multiple channels. The operation then becomes
$$
x' = \sum_c W_c C_c x
$$
where each $W_c$ is an $h\times w$ weight map acting pixel-wise (a diagonal operator), $c$ is the channel index, and $C_c$ denotes the corresponding masked convolution.
The log-det of this operation is easy to calculate and reads
$$
\log\det \frac{\partial x'}{\partial x} = \sum_{ij} \log \sum_c W_{ij,c} K_{mm,c}
$$
where the sum over $i, j$ runs over the image height and width.
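A hedged sketch of this multi-channel variant, reusing the `MaskedAutoregressiveConv2d` class from above; the channel count, image size, and the softplus positivity constraint on $W$ are assumptions for illustration:

```python
class MultiChannelMaskedConv(torch.nn.Module):
    """Sketch of x' = sum_c W_c * (C_c x) with per-pixel weight maps W_c."""

    def __init__(self, channels: int = 4, height: int = 32, width: int = 32):
        super().__init__()
        self.convs = torch.nn.ModuleList(
            [MaskedAutoregressiveConv2d() for _ in range(channels)]
        )
        # One raw h x w weight map per channel; softplus keeps W_{ij,c} > 0.
        self.w_raw = torch.nn.Parameter(torch.zeros(channels, height, width))

    def forward(self, x):
        # x has shape (batch, 1, h, w)
        weights = F.softplus(self.w_raw)                        # (c, h, w)
        outs = torch.stack([conv(x)[0] for conv in self.convs], dim=1)
        x_out = (weights[None, :, None] * outs).sum(dim=1)      # (b, 1, h, w)
        # Jacobian diagonal at pixel (i, j): sum_c W_{ij,c} * K_{mm,c}
        centers = torch.stack(
            [F.softplus(conv.center_raw) for conv in self.convs]
        ).view(-1, 1, 1)                                        # (c, 1, 1)
        logdet = torch.log((weights * centers).sum(dim=0)).sum()
        return x_out, logdet
```

Since each $C_c$ is triangular and the $W_c$ act diagonally, the combined Jacobian stays triangular, which is what makes the log-det reduce to the per-pixel sum above.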
### Notes
- The central value of the kernel needs to be strictly positive, otherwise the log-determinant is not well defined.
- Zero values have to be enforced by switching off gradients appropriately (see the sketch below this list and, e.g., https://discuss.pytorch.org/t/applying-custom-mask-on-kernel-for-cnn/87099 for technical details).
- The weight maps $W_c$ should likewise be strictly positive, so that the diagonal elements $\sum_c W_{ij,c} K_{mm,c}$ remain positive.
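One way to realize the gradient switch mentioned above, in the spirit of the linked forum thread: zero the masked entries once, then register a backward hook so the optimizer never moves them. A minimal sketch; `conv` and `mask` are placeholder names, and multiplying the weight by a mask buffer in the forward pass (as in the first sketch above) is an equivalent alternative:

```python
import torch

def apply_kernel_mask(conv: torch.nn.Conv2d, mask: torch.Tensor) -> None:
    """Pin masked kernel entries at zero and block their gradients."""
    with torch.no_grad():
        conv.weight.mul_(mask)                      # enforce the zeros once
    conv.weight.register_hook(lambda g: g * mask)   # keep them zero during training
```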