---
tags: phone-notes
---
# Cotton-Will Notes August 2, 2020
We have a dataset $D\subset \mathcal{U}$, which we'll think of as a finite set of points in a "data universe" $\mathcal{U}.$
Let $Y$ be a space, thought of as the readout of a sensor.
[Is $Y$ fixed for all sensors? If yes:]
**Def:** A *sensor* is a function $f: \mathcal{U}\to Y$
[Is $Y$ fixed for all sensors? If no:]
**Def 2:** A *sensor* is an index set S, and for each $s \in S$, a target space $Y_s$ and a function $\mathcal{U}\to Y_s$.
**Def** A *measurement space* is a set of sensors.
**Alternate Def:** A *measurement space* for a given set of sensors $S$ is (1) $Y^{|S|}$, or (2) $\prod_s Y_s$.
With MNIST, the desired measurement space is the $28\times 28$ grid of pixels (which we can consider as a metric space in the obvious way).
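A minimal sketch of that picture in Python (the names `pixel_sensor` and `grid_metric` are illustrative, not from the notes): each pixel position is a sensor $\mathcal{U}\to Y$ with $Y=\{0,\dots,255\}$, and the grid carries the Euclidean metric on pixel coordinates.

```python
import math

# A minimal sketch (names are illustrative, not from the notes): each pixel
# position (i, j) of a 28x28 image defines a sensor f_{i,j}: U -> Y, where a
# data point u is an image given as a 28x28 nested list and Y = {0, ..., 255}.

def pixel_sensor(i, j):
    """Return the sensor that reads off pixel (i, j) of an image."""
    return lambda image: image[i][j]

# The measurement space is this set of sensors, indexed by grid positions;
# the "obvious" metric is Euclidean distance between grid positions.
def grid_metric(p1, p2):
    (i1, j1), (i2, j2) = p1, p2
    return math.hypot(i1 - i2, j1 - j2)

sensors = {(i, j): pixel_sensor(i, j) for i in range(28) for j in range(28)}
print(grid_metric((0, 0), (3, 4)))  # 5.0
```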
We want to think of our observation set as functions from $S\to Y$, so what we're looking for is a function $\Phi:\mathcal{U}\to Y^S$.
**Lemma:** There is a natural bijection between functions $\Phi: \mathcal{U}\to Y^S$ and families of sensors on $\mathcal{U}$ indexed by $S$.
A sensor $s$ is a function $f_s : \mathcal{U}\to Y$. Let $X$ be a $k$-element set. A choice of $k$ sensors is the same as an element of $\mathrm{Hom}(X, \mathrm{Hom}(\mathcal{U}, Y))$, which is the same as $\mathrm{Hom}(\mathcal{U}, \mathrm{Hom}(X, Y))$.
Once we've picked our measurement space $X$ (which is a set of sensors), we then want to put a metric on it.
The above set is also in bijection with $\mathrm{Hom}(\mathcal{U}\times S, Y)$.
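A small Python illustration of these bijections (currying and uncurrying); the function names are illustrative assumptions, not from the notes.

```python
# A sketch of the bijections above (currying), with illustrative names.
# An S-indexed family of sensors is a function S -> (U -> Y); currying and
# uncurrying give the equivalent forms U -> (S -> Y) and (U x S) -> Y.

def swap_arguments(family):
    """Hom(S, Hom(U, Y)) -> Hom(U, Hom(S, Y))."""
    return lambda u: (lambda s: family(s)(u))

def uncurry(phi):
    """Hom(U, Hom(S, Y)) -> Hom(U x S, Y)."""
    return lambda u, s: phi(u)(s)

# Example: S = {"s1", "s2"}, U = Y = integers.
family = {"s1": lambda u: u + 1, "s2": lambda u: 2 * u}.__getitem__
Phi = swap_arguments(family)        # Phi: U -> Y^S
print(Phi(10)("s2"))                # 20, i.e. sensor s2 read at data point 10
print(uncurry(Phi)(10, "s1"))       # 11
```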
Two sensors $s_1$ and $s_2$ give us a subset of $\mathcal{U}\times Y\times Y$.
What we need is a metric on $Y^\mathcal{U}$.
If you have a set of sensors $S$, the observed data is a matrix $X$ with entries $X_{d, s} = s(d)$ (where we're thinking of a sensor as a function on the data points).
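A sketch of building that observation matrix from a finite dataset and a list of sensors (names and toy data are illustrative):

```python
import numpy as np

# A sketch (illustrative names): given a finite dataset D and a list of sensors
# s: U -> Y, the observed data is the matrix X with X[d, s] = s(d).

def observation_matrix(dataset, sensors):
    """Rows indexed by data points d in D, columns by sensors s in S."""
    return np.array([[s(d) for s in sensors] for d in dataset])

D = [1.0, 2.0, 3.0]                      # toy data points
S = [lambda u: u, lambda u: u ** 2]      # two toy sensors
print(observation_matrix(D, S))
# [[1. 1.]
#  [2. 4.]
#  [3. 9.]]
```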
$D$ is the proto- or platonic ideal data.
Question: Should a (quasi-)metric on $S$ come from a metric on all of $Y^\mathcal{U}$, or is it sufficient to have a metric defined merely on $S \subset Y^D$, or on $Y^D$?
Answer: We want to be able to accommodate both more sensors, which requires having this defined on $Y^D$, and more data points, which requires this to be defined on $Y^\mathcal{U}$.
The model for data collection is a probability distribution $\mu$ on $\mathcal{U}$; an experimental observation is a sample drawn from $\mathcal{U}$ (according to $\mu$) together with its measurements.
**Question:** Should a metric on $Y^\mathcal{U}$ be of the form $d(s_1,s_2)=\phi\left(\int_\mathcal{U} \psi(s_1(x),s_2(x))\,d\mu\right)$, where $\psi$ is a function $Y\times Y \to R$ for some abelian monoid $R$, and $\phi: R\to \mathbb{R}^+$ is a function?
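One concrete instance of that shape, sketched in Python with the integral over $\mathcal{U}$ replaced by an empirical average over a sample (the names and the choice $\psi(y_1,y_2)=(y_1-y_2)^2$, $\phi=\sqrt{\cdot}$ are illustrative assumptions; this particular choice recovers the $L^2(\mu)$ distance between sensors):

```python
import math

# A sketch (illustrative names) of a metric of the proposed form
#   d(s1, s2) = phi( \int_U psi(s1(x), s2(x)) dmu ),
# with the integral over U replaced by an empirical average over a sample D
# drawn from mu. Taking psi(y1, y2) = (y1 - y2)**2 (R = nonnegative reals)
# and phi = sqrt recovers the L^2(mu) distance between sensors.

def sensor_distance(s1, s2, sample, psi=lambda a, b: (a - b) ** 2, phi=math.sqrt):
    integral = sum(psi(s1(x), s2(x)) for x in sample) / len(sample)
    return phi(integral)

D = [0.0, 1.0, 2.0, 3.0]
s1 = lambda u: u
s2 = lambda u: u + 1.0
print(sensor_distance(s1, s2, D))  # 1.0: the sensors differ by a constant offset of 1
```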
Does conditional entropy fit into this framework? Let's see.
The standard definition of conditional entropy is that for random variables $s_1, s_2:\mathcal{U}\to Y$,
$H(s_1|s_2)= -\sum_{y_1,y_2\in Y} p(s_1(u)=y_1 \land s_2(u)=y_2)\log\left( p(s_1(u)=y_1 | s_2(u)=y_2)\right)$
$=-\sum_{y_1,y_2\in Y} p(s_1(u)=y_1 \land s_2(u)=y_2)\log\left(\frac{p(s_1(u)=y_1 \land s_2(u)=y_2)}{p(s_2(u)=y_2)}\right)$
Let $I_{y}$ denote the indicator function of a point $y$.
$=-\sum_{y_1,y_2\in Y} \left(\int_\mathcal{U} I_{y_1}(s_1(u))\,I_{y_2}(s_2(u))\,d\mu\right)\log\left(\frac{p(s_1(u)=y_1 \land s_2(u)=y_2)}{p(s_2(u)=y_2)}\right)$
$=-\sum_{y_1,y_2\in Y} \left(\int_\mathcal{U} I_{y_1}(s_1(u))\,I_{y_2}(s_2(u))\,d\mu\right)\log\left(\frac{\int_\mathcal{U} I_{y_1}(s_1(u))\,I_{y_2}(s_2(u))\,d\mu}{\int_\mathcal{U} I_{y_2}(s_2(u))\,d\mu}\right)$
This nearly fits into the above framework, but it seems we need to generalize it somewhat, since we are summing over $Y$ as well as integrating over $\mathcal{U}$ here, and the integrands depend on the $Y$ values.
So I think the answer to the above question is no.
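As a quick empirical sanity check of the conditional-entropy expression above, the integrals against $\mu$ can be approximated by frequencies over a finite sample of $\mathcal{U}$. A minimal sketch, with illustrative names and toy sensors:

```python
import math
from collections import Counter

# A sketch (illustrative names): estimate H(s1 | s2) from a finite sample of U,
# replacing each integral of an indicator against mu by an empirical frequency.

def conditional_entropy(s1, s2, sample):
    n = len(sample)
    joint = Counter((s1(u), s2(u)) for u in sample)   # estimates p(s1=y1 and s2=y2)
    marginal = Counter(s2(u) for u in sample)         # estimates p(s2=y2)
    return -sum((c / n) * math.log((c / n) / (marginal[y2] / n))
                for (y1, y2), c in joint.items())

U_sample = list(range(8))
s1 = lambda u: u % 4     # residue mod 4
s2 = lambda u: u % 2     # parity; s2 is determined by s1 but not conversely
print(conditional_entropy(s1, s2, U_sample))  # log(2) ~ 0.693: knowing parity leaves one bit of s1 unknown
```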