I propose using autoregressive convolutional layers to model local correlations in the source parameters. A good discussion of the topic can be found here: https://arxiv.org/abs/1901.11137
Given some image $x$, the basic goal is to perform a convolution
$$
x' = C x
$$
with convolution layer $C$ such that the log determinant of this operation is simple to calculate. This is in fact the case when the kernel is masked so that it is strictly zero in part of its support (the white regions of the kernel figure). With the pixels flattened in a fixed order, convolution with such a kernel leads to a triangular convolution matrix.
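To make the triangular structure concrete, here is a minimal NumPy sketch. It rests on assumptions that are not stated above: a single-channel image, a square odd-sized kernel, zero padding, and raster-scan pixel ordering. The helper names `masked_kernel` and `conv_matrix` are hypothetical and chosen only for illustration. The sketch builds the dense matrix $C$ for a small image, so that its triangular shape and log determinant can be checked directly.

```python
import numpy as np


def masked_kernel(k, rng):
    """Random k x k kernel, zeroed after the center in raster-scan order.

    Assumption: k is odd, so the kernel has a well-defined center.
    """
    w = rng.normal(size=(k, k))
    c = k // 2
    w[c, c + 1:] = 0.0  # entries to the right of the center in its row
    w[c + 1:, :] = 0.0  # all rows below the center
    return w


def conv_matrix(w, height, width):
    """Dense matrix C with C @ x.ravel() equal to the zero-padded,
    same-size cross-correlation of a (height x width) image x with w."""
    k = w.shape[0]
    c = k // 2
    n = height * width
    C = np.zeros((n, n))
    for i in range(height):
        for j in range(width):
            for a in range(k):
                for b in range(k):
                    ii, jj = i + a - c, j + b - c
                    if 0 <= ii < height and 0 <= jj < width:
                        # row: output pixel (i, j); column: input pixel (ii, jj)
                        C[i * width + j, ii * width + jj] = w[a, b]
    return C


rng = np.random.default_rng(0)
height, width = 4, 5
w = masked_kernel(3, rng)
C = conv_matrix(w, height, width)

# Under the assumptions above, C is lower triangular.
print("lower triangular:", np.allclose(C, np.tril(C)))

# Every diagonal entry equals the center weight, so the log determinant
# reduces to (number of pixels) * log|center weight|.
_, logdet = np.linalg.slogdet(C)
print("slogdet:    ", logdet)
print("closed form:", height * width * np.log(abs(w[1, 1])))
```

The dense matrix is built only for checking; in an actual layer one would apply the convolution directly and read the log determinant off the kernel's center weight.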
In this case the determinant is simply the product of the diagonal elements. It is given by
$$