# Pseudo-pseudo code for MNIST analysis

Who is here? Roll call:

- [x] Cotton
- [x] Will
- [x] Sam

1. **Step 1**
    1. Ingest the MNIST data.

---

2. **Step 2**
    1. We start with a set of images, which are points $x_1, \dots, x_n$ in a space $X$.
    2. We have a set $S$ of pixel functions, or *sensors*. Each $s \in S$ is a function $s: X \to \mathbb{R}$ (or a more general measurement space), where $s(x)$ is the value of sensor $s$ on image $x$.
    3. Calculate the $L^2$/"correlation" metric $d = d_S$ on $S$.

---

3. **Step 3: Building transport maps $\mathbb{R}^{B_{r_1}(s_1)} \to \mathbb{R}^{B_{r_2}(s_2)}$**
    1. **Black box:** We assume our model space $M$, defined as the space where our charts take values, is equal to $\mathbb{R}^2$ (later we'll make this work more generally).
    2. **Simplifying assumption:** We can use a single point $s_0$ as the "origin" of all developing maps.
    3. Finding $s_0$:
        1. Find $s_0 = \arg\min_{s \in S} \max_{x \in S} d(s, x)$, the metric center of $S$.
        2. We now want to break $S$ up into a set of charts $\mathcal{C} = \{C_i\}$.
    4. Look at the set of metric balls $$\mathcal{B} = \{B_r(s) \subset S ~|~ r \in \mathbb{R}^+,\, s \in S\}$$
    5. For each $B \in \mathcal{B}$, find a "best" map $\phi_B: B \to \mathbb{R}^2$. Let $$\mathcal{C} = \{\phi_B ~|~ B \in \mathcal{B}\}$$
    6. Given two balls $B_1$ and $B_2$ from $\mathcal{B}$, we now want $\Gamma_{1,2}: \mathbb{R}^{B_1} \to \mathbb{R}^{B_2}$, a method of transferring functions from $B_1$ to $B_2$. We will use the developing map to do this. Assume the chart on $B_1$ is fixed. The cost function for the developing map is $$F(\phi_{B_2}) = \left(\sum_{b \in B_2,\, b' \in B_1 \cup B_2} |d(\phi(b), \phi(b')) - d_S(b, b')|\right)^{\frac{1}{2}}$$ where $\phi$ denotes whichever chart ($\phi_{B_1}$ or $\phi_{B_2}$) contains the given point in its domain.
    7. To make it symmetric: $$F(\phi_{B_1}, \phi_{B_2}) = \left(\sum_{b, b' \in B_1 \cup B_2} |d(\phi(b), \phi(b')) - d_S(b, b')|\right)^{\frac{1}{2}}$$

---

4. **Step 4: Build module to learn functions on $\mathbb{R}^2$**
    1. A kernel is a function $\psi: M = \mathbb{R}^2 \to \mathbb{R}$. This is what we are going to learn.
    2. We now use it as a filter, i.e. as a function $\Phi_\psi: X \to \mathbb{R}^S$ (think of this as a map from $X \to X$).
    3. Fix a chart $\phi: B \to M$ on a ball $B \subset S$.
    4. We pick a number $n$ of convolutional layers, kernels $$\psi_1, \dots, \psi_n$$ and charts $$\phi_1: B_1 \to \mathbb{R}^2, \dots, \phi_n: B_n \to \mathbb{R}^2$$ Each convolutional layer $S_i$ is a copy of $S$. The $i$-th copy of the sensor $s$ takes the value at $x \in X$ given by $$s_i(x) = \sum_{y \in B_r(s)} \Gamma_{B_i, B_r(s)}(\phi_i^\ast(\psi_i))(y) \cdot y(x)$$
    5. **Observation:** In this formalism, the "correct" view of the parameter space for our generalized convnet is that it is a subspace $W \subset \mathbb{R}^M$, where $M$ is the model space from Step 3 (so $M = \mathbb{R}^2$ in our image example).

---

![](https://i.imgur.com/pPO4B84.png)

---
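The metric, ball, chart, and transport machinery of Steps 2 and 3 can be made concrete. The sketch below is a hedged illustration, not an agreed implementation: sensors are rows of a `values` matrix, the "best" chart $\phi_B$ is approximated by classical MDS rather than by actually minimizing the stress $F$ above, the transport map $\Gamma_{1,2}$ is a crude nearest-neighbour lookup in chart coordinates, and all function names are invented for this sketch.

```python
import numpy as np

def sensor_metric(values):
    """L2 metric d_S(s, s') = ||s(.) - s'(.)||_2 over the image set.
    values: (num_sensors, num_images) array; row i is sensor i on every image."""
    diff = values[:, None, :] - values[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def metric_center(d):
    """s_0 = argmin_s max_x d(s, x): the metric center of S (Step 3.3.1)."""
    return int(np.argmin(d.max(axis=1)))

def ball(d, s, r):
    """B_r(s) = {s' in S : d(s, s') <= r}, as an array of sensor indices."""
    return np.flatnonzero(d[s] <= r)

def chart(d, idx):
    """A 'best' map phi_B: B -> R^2, here via classical MDS on the
    restricted metric -- a stand-in for minimizing the stress F."""
    sub = d[np.ix_(idx, idx)] ** 2
    n = len(idx)
    J = np.eye(n) - np.ones((n, n)) / n        # double-centering matrix
    G = -0.5 * J @ sub @ J                      # Gram matrix of the embedding
    w, V = np.linalg.eigh(G)                    # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:2]               # keep the two largest
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

def transport(f_on_B1, coords1, coords2):
    """Gamma_{1,2}: R^{B_1} -> R^{B_2} by nearest-neighbour lookup in chart
    coordinates -- one crude realisation of the transport map."""
    d2 = ((coords2[:, None, :] - coords1[None, :, :]) ** 2).sum(-1)
    return f_on_B1[np.argmin(d2, axis=1)]
```

When the restricted metric is exactly Euclidean in the plane, the MDS chart reproduces the pairwise distances; on real sensor metrics it only approximates them, which is exactly the residual that $F$ measures.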
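The layer formula $s_i(x) = \sum_{y \in B_r(s)} \Gamma(\phi^\ast(\psi))(y) \cdot y(x)$ in Step 4 can also be sketched. This is again only an illustration under loud assumptions: it pretends there is a single global chart `phi_coords` (collapsing the per-ball charts and transport maps of Step 3), and it realises the pulled-back, transported kernel as $\psi(\phi(y) - \phi(s))$, i.e. the kernel translated to be centred at $s$; the function name and signature are invented.

```python
import numpy as np

def generalized_conv_layer(values, d, psi, phi_coords, r):
    """One generalized convolutional layer over the sensor set S.

    values:     (num_sensors, num_images); values[y, k] = y(x_k).
    d:          (num_sensors, num_sensors) sensor metric d_S.
    psi:        kernel, a function R^2 -> R (the thing to be learned).
    phi_coords: (num_sensors, 2) chart coordinates for every sensor --
                a single global chart, a big simplification of the text.
    r:          ball radius.

    Returns out with out[s, k] = sum_{y in B_r(s)} psi(phi(y) - phi(s)) * y(x_k).
    """
    n = values.shape[0]
    out = np.zeros_like(values)
    for s in range(n):
        nbrs = np.flatnonzero(d[s] <= r)              # the ball B_r(s)
        # pull the kernel back through the chart, centred at s
        w = np.array([psi(phi_coords[y] - phi_coords[s]) for y in nbrs])
        out[s] = w @ values[nbrs]                     # weighted sum of sensor rows
    return out
```

With a constant kernel and a radius so small that every ball is a single sensor, the layer is the identity; enlarging the radius sums neighbouring sensors, which is the sanity check one would run first.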