# A strategy for a soft implementation of the algorithm

> **_Summary_:** The simplest description of the convnet, and how to generalize it to our case, i.e. a strategy for implementing a "soft" version of our algorithm in neural-network land.

### TLDR -- how to turn a classical algorithm into a "soft" neural network algorithm

In our current algorithm, replace all subsets of sensors $U \subset S$ with learned functions $F_U : S \to \mathbb{R}$, thinking of the latter as generalizing the former via the "indicator function" construction, $U \mapsto \chi_U$. Similarly, replace all functions $\phi: A \to B$ (e.g., our charts $\phi: U \to M$) by kernels $K_\phi: A \times B \to \mathbb{R}$, where the latter generalizes the former by applying the indicator function construction to the graph $\Gamma_\phi \subset A \times B$. Observe that all functional and set-theoretic operations are then replaced by operator-theoretic operations, which become linear-algebra operations if we think of our spaces as being approximated by sensors. Learn the various matrices by gradient descent on appropriate cost functions.

### Notation: input sensors

To fix notation, suppose that our "vanilla convolutional layer" (VCL) implements a function $f : X \to Y$. The cardinality of the set $S$ of pixels (i.e., input sensors) equals the number of dimensions of the input space, i.e.
$$X = \mathbb{R}^S = \mathbb{R}^{N_S},$$
where $|S| = N_S$. E.g., in a VCL with an $n \times n$ image, we'd have $N_S = n^2$.

### Notation: patches and filter

Traditionally, the convolutional filter has a fixed size and shape; e.g., it might be a square array of size $N_L = k^2$ (for some $k < n$), so that $N_L$ denotes the number of pixels in the filter, which is then iteratively "applied" to many shifted patches of the image. These shifted patches are in bijection with the output neurons of the VCL.
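The two replacements in the TLDR can be sketched in a few lines of numpy. This is a minimal illustration, not part of the algorithm itself: the particular subset $U$, the sigmoid parameterization of $F_U$, and the sizes of $A$ and $B$ are all illustrative choices of mine.

```python
import numpy as np

n = 8
N_S = n * n  # number of input sensors (pixels) for an n x n image

# A hard subset U of S, encoded as its 0/1 indicator vector chi_U in R^S.
U = [0, 1, 8, 9]
chi_U = np.zeros(N_S)
chi_U[U] = 1.0

# Soft generalization: a learned function F_U : S -> R. Here we take a
# sigmoid of unconstrained logits, so values lie in (0, 1); this is one
# choice among many.
logits = np.random.randn(N_S)
F_U = 1.0 / (1.0 + np.exp(-logits))

# A function phi : A -> B becomes the 0/1 kernel of its graph:
# K_phi(a, b) = 1 iff phi(a) = b. A "soft" version would replace this
# with any learned nonnegative matrix (e.g. a row-wise softmax).
A, B = 5, 7
phi = np.random.randint(0, B, size=A)   # a hard function A -> B
K_phi = np.zeros((A, B))
K_phi[np.arange(A), phi] = 1.0          # indicator of the graph of phi

# Applying phi to a function g : B -> R by precomposition is now a
# matrix-vector product with the kernel matrix.
g = np.random.randn(B)
g_pulled_back = K_phi @ g               # equals g[phi] entrywise
assert np.allclose(g_pulled_back, g[phi])
```

Note how the last line makes the TLDR's point concrete: once functions are kernels, composition becomes matrix multiplication, which gradient descent can differentiate through.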
Let
$$P = \{p_1, \dots, p_{N_P}\}$$
denote the set of patches, so that the output space can be written $Y = \mathbb{R}^P = \mathbb{R}^{N_P}$. E.g., in a VCL we might have $N_P \sim (n-k)^2$.

### "Model" sensors

To describe the filtering process, we first explicitly introduce a third set of "model" sensors $L$ with $|L| = N_L$, which should be viewed as the "abstract" or model pixels on which the filter will be defined. In the VCL, for each patch $p_i$, the $N_L$ elements of $L$ are identified one-to-one with a particular set of $N_L$ pixels/sensors in $S$. We can view this identification as a subset
$$\Gamma_i \subset S \times L,$$
where $(s,l) \in \Gamma_i$ if patch $p_i$ identifies the pixel $s \in S$ with the model pixel $l \in L$.

### Kernel operators

We will use the following "kernel construction" several times: suppose we have compact measure spaces $(X, \mu_X)$ and $(Y, \mu_Y)$, as well as a "kernel" $K: X \times Y \to \mathbb{R}$. We define the corresponding kernel integral operator
$$\mathbf{K}: \mathbb{R}^X \to \mathbb{R}^Y$$
by the formula
$$\mathbf{K}(f)(y) = \int_X K(x,y)\, f(x)\, d\mu_X(x) \quad \text{for } y \in Y, ~ f \in \mathbb{R}^X.$$

### Return to the filtering process in the VCL

The $i$th patch is equivalent to the correspondence
$$\Gamma_i \subset S \times L.$$
Using the indicator construction, view $\Gamma_i$ as a kernel
$$\Gamma_i: S \times L \to \mathbb{R},$$
i.e. $\Gamma_i(s,l) = 1$ if $(s,l) \in \Gamma_i$, and $\Gamma_i(s,l) = 0$ otherwise. Furthermore, our "filter" is defined on the model pixels $L$. Then the operator
$$\mathbf{\Gamma}_i : \mathbb{R}^S \to \mathbb{R}^L$$
given by the kernel construction restricts an input $f \in \mathbb{R}^S$ to the $i$th patch, expressed on the model pixels $L$.
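The correspondence $\Gamma_i$ and its operator $\mathbf{\Gamma}_i$ can be made concrete as a $0/1$ matrix acting on a flattened image. The row-major indexing of $S$ and $L$ below is one illustrative convention, not fixed by the text:

```python
import numpy as np

n, k = 6, 3                # image side n, filter side k
N_S, N_L = n * n, k * k    # input sensors |S|, model sensors |L|

def gamma_matrix(i, j):
    """0/1 kernel matrix for the patch with top-left corner (i, j):
    G[l, s] = 1 iff the patch identifies model pixel l with pixel s,
    so G acts as the operator Gamma_i : R^S -> R^L."""
    G = np.zeros((N_L, N_S))
    for di in range(k):
        for dj in range(k):
            l = di * k + dj              # model pixel index in L
            s = (i + di) * n + (j + dj)  # image pixel index in S
            G[l, s] = 1.0
    return G

image = np.arange(N_S, dtype=float)      # a test "image" f in R^S
G = gamma_matrix(2, 1)
patch = G @ image                        # Gamma_i applied to f

# Sanity check: the operator agrees with directly slicing the 2D image.
assert np.allclose(patch, image.reshape(n, n)[2:2 + k, 1:1 + k].ravel())
```

Since each row of `G` is one-hot, the integral in the kernel construction (with counting measure on $S$) reduces to picking out the pixel value identified with each model pixel, which is exactly what a convnet's patch extraction does.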