# Rough notes - what is a local feature?

## Developing a notion of "same local feature"

Suppose we're given two different maps $f_1, f_2:X\to Y$. What does it mean to say that $f_1$ and $f_2$ share a local feature? The first thing we might mean by this is a pair of subsets $U_1$ and $U_2$ such that $f_1|_{U_1}$ "is equal to" $f_2|_{U_2}$. Of course these functions have different domains, so we need to be able to relate these domains via a function $\phi:U_1\to U_2$ in order to compare them. We also might not want to use all of the information available in $Y$ to compare these functions; e.g. if $Y$ is $\mathbb{R}^2$, we may be interested in functions that agree on the first coordinate. This suggests the following definition:

**Definition 1:** A *local feature correspondence* from $f_1$ to $f_2$ consists of

1. A pair of subsets $U_1$ and $U_2$ of $X$.
2. A function $\phi: U_1\to U_2$
3. A quotient map $q: Y\to Z$

such that $q \circ f_2 \circ \phi = q \circ f_1$.

Of course we may want to loosen this definition to allow near correspondences rather than strict correspondences. We then get the following definition:

**Definition 2:** An *$\epsilon$-close local feature correspondence* from $f_1$ to $f_2$ consists of

1. A pair of subsets $U_1$ and $U_2$ of $X$.
2. A function $\phi: U_1\to U_2$
3. A quotient map $q: Y\to Z$
4. A metric $d$ on $Z^{U_1}$

such that $d(q \circ f_2 \circ \phi , q \circ f_1)<\epsilon$.

There seem to be a number of natural choices for a metric on $Z^{U_1}$ whenever $Z$ itself is a metric space (e.g. pointwise sup, $L^2$, etc.).

So far, we've allowed for arbitrary subsets $U_1$ and $U_2$ and an arbitrary map $\phi: U_1\to U_2$; however, this does not reflect the local structure of $X$. It therefore seems that we should restrict to maps that are sufficiently continuous. When $X$ is a metric space, it seems that using $k$-Lipschitz maps gives a good definition, so we get:

**Definition 3:** An *$(\epsilon,k)$-close local feature correspondence* from $f_1$ to $f_2$ consists of

1. A pair of subsets $U_1$ and $U_2$ of $X$.
2. A $k$-Lipschitz function $\phi: U_1\to U_2$
3. A quotient map $q: Y\to Z$
4. A metric $d$ on $Z^{U_1}$

such that $d(q \circ f_2 \circ \phi , q \circ f_1)<\epsilon$.

As we've defined things so far, there's an interesting directionality to the correspondence, since having an $(\epsilon, k)$-feature correspondence from $f_1$ to $f_2$ doesn't imply there is an $(\epsilon,k)$-feature correspondence from $f_2$ to $f_1$. In particular, if $U_2$ is a quotient of $U_1$ such that the diagram in the definition commutes, then this will be a perfectly good correspondence in one direction, since distances will only contract. To highlight why this might be useful, let's extend our definition to functions defined on different spaces.

**Definition 4:** An *$(\epsilon,k)$-close local feature correspondence* from $f_1:X_1\to Y$ to $f_2:X_2\to Y$ consists of

1. A pair of subsets $U_1\subset X_1$ and $U_2\subset X_2$
2. A $k$-Lipschitz function $\phi: U_1\to U_2$
3. A quotient map $q: Y\to Z$
4. A metric $d$ on $Z^{U_1}$

such that $d(q \circ f_2 \circ \phi , q \circ f_1)<\epsilon$.

One place the asymmetry of this relationship might be useful is where $X_1$ is a pixel space and $X_2$ is given by lowering the resolution of $X_1$. We'd then expect $k$-Lipschitz maps $X_1\to X_2$, but not the other way around, so insisting on a symmetric relationship here might not be the way to go.
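Everything in Definition 4 becomes directly checkable once the spaces are finite. Here is a minimal sketch (the names and the choice of the pointwise-sup metric on $Z^{U_1}$ are my own; all maps and metrics are passed in as plain callables):

```python
from itertools import combinations

def is_k_lipschitz(phi, d_X1, d_X2, U1, k):
    """Check that d_X2(phi(u), phi(v)) <= k * d_X1(u, v) for all pairs u, v in U1."""
    return all(d_X2(phi(u), phi(v)) <= k * d_X1(u, v)
               for u, v in combinations(U1, 2))

def is_eps_k_correspondence(f1, f2, phi, q, d_X1, d_X2, d_Z, U1, eps, k):
    """Verify Definition 4 on finite data, using the pointwise-sup metric on Z^{U_1}."""
    if not is_k_lipschitz(phi, d_X1, d_X2, U1, k):
        return False
    sup_dist = max(d_Z(q(f2(phi(u))), q(f1(u))) for u in U1)
    return sup_dist < eps

# Toy usage: X_1 = X_2 = points on the real line, Y = Z = R, q = identity.
U1 = [0.0, 0.5, 1.0]
f1 = lambda x: x ** 2
f2 = lambda x: x ** 2 + 0.05
d = lambda a, b: abs(a - b)
print(is_eps_k_correspondence(f1, f2, phi=lambda x: x, q=lambda z: z,
                              d_X1=d, d_X2=d, d_Z=d, U1=U1, eps=0.1, k=1.0))  # True
```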
Now there's one more thing we might want, which is to loosen the requirement that the relationship between $U_1$ and $U_2$ be given by a function. For instance, if we can map $U_1$ and $U_2$ into the same space $C$ such that their images are close together, then we might hope that this correspondence is good enough to "transfer" the function from $U_2$ to $U_1$.

**Definition 5:** An *$(\epsilon,k)$-close local feature correspondence* from $f_1:X_1\to Y$ to $f_2:X_2\to Y$ consists of

1. A pair of subsets $U_1\subset X_1$ and $U_2\subset X_2$
2. A pair of $k$-Lipschitz functions $\phi_1: U_1\to C$ and $\phi_2: U_2\to C$
3. An extension $\tilde{f}_2:C \to Y$ such that $\tilde{f}_2\circ \phi_2 = f_2$.
4. A quotient map $q: Y\to Z$
5. A metric $d$ on $Z^{U_1}$

such that $d(q \circ \tilde{f}_2 \circ \phi_1 , q \circ f_1)<\epsilon$.

This shows that there is an extension problem to solve in general, which may or may not be solvable if $X_1$, $X_2$ and $C$ are topological spaces and we're working with continuous maps. Even though we're usually interested in working with finite spaces, this sort of issue should still come up in a coarse manifestation, i.e. a topological obstruction should prevent us from finding sufficiently continuous maps even in the finite setting.

So what is a local feature? It seems to me that the functions $\tilde{f}_2: C\to Y$ are the real "local features" here. I think this justifies thinking about "charts" as being the primary objects of interest. We can then refine our definition based on an "atlas" $\mathcal{A}$ of charts $\{C_1,\dots, C_n\}$ together with maps $\{\phi^i_j: U_j\subset X \to C_j \}$.

**Definition 6:** Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be $k$-Lipschitz atlases on $X_1$ and $X_2$ with the same set of underlying chart spaces $\{C_1,\dots,C_m\}$. An *$(\epsilon,k)$-close local feature correspondence restricted to $\mathcal{A}_1$ and $\mathcal{A}_2$* from $f_1:X_1\to Y$ to $f_2:X_2\to Y$ consists of

1. A pair of charts $\phi_1:U\subset X_1 \to C$ in $\mathcal{A}_1$ and $\phi_2:V\subset X_2 \to C$ in $\mathcal{A}_2$ with a common chart space $C$
2. A function $\tilde{f}:C\to Y$ (that has some sort of regularity constraints, more on this later)
3. A quotient map $q: Y\to Z$
4. Metrics $d_1$ on $Z^U$ and $d_2$ on $Z^V$

such that $d_1(q \circ f_1 , q \circ \tilde{f} \circ \phi_1 )<\epsilon$ and $d_2(q \circ f_2 , q \circ \tilde{f} \circ \phi_2 )<\epsilon$.

This definition seems nice, in that it says that $f_1$ and $f_2$ are "close to the same function" even though they are defined on different spaces, since they are both "close to factoring through" $\tilde{f}$. This definition is also symmetric. This is my favorite definition so far, though there is something to discuss about the continuity properties. To think about this, note that the above suggests a definition for the distance between functions defined on different open sets:

**Definition of distance between local functions:** Let $f_1:U\to Y$ and $f_2:V\to Y$ be functions (here I am thinking of $U$ and $V$ as both being subsets of $X$, but that's not important) and let $\phi_1:U\to C$ and $\phi_2:V\to C$ be charts. The distance between $(f_1,\phi_1)$ and $(f_2,\phi_2)$ is the minimum over $\tilde{f}:C\to Y$ of $d_1(f_1,\tilde{f}\circ\phi_1)+d_2(f_2,\tilde{f}\circ \phi_2)$.

Note that we can further minimize away the dependence on $\phi_1$ and $\phi_2$.
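For finite $U$, $V$, $C$ and $Y=\mathbb{R}^n$, this minimum is easy to compute explicitly if we take $d_1$ and $d_2$ to be sums of squared errors (an assumption made only for this sketch; with the sup metric the minimizer looks different): the objective decouples over points of $C$, and the optimal $\tilde{f}(c)$ is simply the mean of all values of $f_1$ and $f_2$ that map to $c$. A sketch, with hypothetical names:

```python
import numpy as np

def local_function_distance(f1, phi1, U, f2, phi2, V, C):
    """Distance between (f1, phi1) and (f2, phi2) through a common finite chart C.

    Uses sum-of-squared-errors for d1 and d2, so the minimizing f~ : C -> R^n
    can be computed pointwise as the mean of all values mapped to each chart point.
    """
    total = 0.0
    for c in C:
        vals1 = [np.asarray(f1(u)) for u in U if phi1(u) == c]
        vals2 = [np.asarray(f2(v)) for v in V if phi2(v) == c]
        if not (vals1 or vals2):
            continue
        f_tilde_c = np.mean(vals1 + vals2, axis=0)   # pointwise minimizer
        total += sum(np.sum((x - f_tilde_c) ** 2) for x in vals1 + vals2)
    return total
```

Note that if $\phi_1$ and $\phi_2$ are injective and have disjoint images in $C$, this returns $0$, which is exactly the degeneracy discussed next.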
To see that some sort of regularity constraint is necessary, note that if $\phi_1$ and $\phi_2$ have disjoint images, then we can choose $\tilde{f}$ so that the $d_1$ and $d_2$ terms vanish in the above definitions. However, if we constrain the amount that $\tilde{f}$ varies, we should be able to control this. This suggests that the additional thing we need is a norm on $Y^C$ to control the variation.

Alternatively (and this seems better) we could ask for an extension property, namely that functions to $Y$ defined on finite subsets of $C$ extend (naturally) to functions on all of $C$. If $Y$ is $\mathbb{R}^n$ and $C$ is any metric space, then we can extend a function $f$ defined on $x_1,\dots, x_n$ via something like
$$f(x)=\min_{i}d(x,x_i)\left(\frac{f(x_1)}{d(x,x_1)}+\dots+\frac{f(x_n)}{d(x,x_n)} \right).$$
We could then adjust the definition above so that $\tilde{f}$ is defined to be the extension of the map $f_1$ or $f_2$ pushed forward to $C$. I also prefer the asymmetric version of this metric. Note that two charts might have different numbers of points in them. Comparing a function defined at one point to a function defined on many points could yield two different (and interesting) pieces of data. A match in one direction would imply that the function is nearly constant, whereas a match in the other would just imply that the two functions match at a single value.

### The set of charts for a fixed neighborhood and comparison space

Given a subset $U$ of $X$ and a comparison space $C$, there are, in general, a number of different valid charts $\phi: U\to C$. For instance, suppose we're working in the category of metric spaces and $k$-Lipschitz maps. Then the set of valid $\phi$ will be the set of all $k$-Lipschitz maps from $U$ to $C$. This set factors as the set of all maps of the form $\psi_2\circ\phi\circ\psi_1$, where $\psi_1: U\to U$ is a $k_1$-Lipschitz map, $\phi: U\to C$ is a $k_2$-Lipschitz map, $\psi_2:C\to C$ is a $k_3$-Lipschitz map, and $k_1k_2k_3\le k$.

Generalizing this, we can see that for a given notion of structure group or monoid (note that $k$-Lipschitz maps aren't invertible, so we shouldn't expect more than a monoid, and if $k>1$ then $k$-Lipschitz maps aren't even closed under composition, so even a monoid might not always be possible), the set of charts can be identified with some quotient of $Struct(U,U)\times Struct(U,C)\times Struct(C,C)$. In the special case that we're dealing with isometries, we can generate all isometries using a fixed $\phi$, so there is a ($\phi$-dependent) surjection $Isom(U,U)\times Isom(C,C)\to Isom(U,C)$. These self-map structures $Struct(U,U)$ and $Struct(C,C)$ therefore measure something about the ambiguity of the charts. It therefore seems that to understand the set of charts, we should understand these self-mapping structures.

### Reduction of structure

I think substructures should play a key role here.

#### Reduction to point stabilizers

The most obvious reduction of structure to consider is looking at pointed maps, i.e. maps $(U,x)\to(C,o)$ where $o\in C$ is an origin point. This clearly reduces the available family of maps considerably, and allows us to think about where features are "centered." It also seems to come up in an essential way when thinking about filters. When thinking about strict isometries, the set $Isom((U,x),(C,o))$ can be identified with a quotient of the product of the stabilizers $Stab(x)\times Stab(o)$. If $X$ is a Riemannian manifold and $C$ is a symmetric space (e.g. $\mathbb{R}^n$), then generically we expect $Stab(x)$ to be trivial and $Stab(o)$ to be an orthogonal group.
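In the finite setting these stabilizers are directly computable. A small sketch (hypothetical names): enumerate the self-isometries of a finite metric space and the stabilizer of a chosen basepoint.

```python
import numpy as np
from itertools import permutations

def isometries(D):
    """All self-isometries of a finite metric space with distance matrix D,
    returned as permutations of the point indices."""
    n = len(D)
    return [p for p in permutations(range(n))
            if all(np.isclose(D[p[i]][p[j]], D[i][j])
                   for i in range(n) for j in range(n))]

def stabilizer(D, x):
    """Stab(x): the self-isometries fixing the point x."""
    return [p for p in isometries(D) if p[x] == x]

# Toy example: the four vertices of a unit square.
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(len(isometries(D)), len(stabilizer(D, 0)))   # 8 (dihedral group), 2
```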
#### Reduction to "nice" substructure

This is already implicit in the above, but, for instance, bi-Lipschitz maps offer a nice substructure of Lipschitz maps, and isometry groups offer a nice substructure within bi-Lipschitz groups, so we can try to restrict structure to get better properties with respect to compositionality. Similarly, we can filter Lipschitz maps by their Lipschitz constants to get a flexible "rating" system for maps.

#### Reduction to "orientation-like" substructure

I think reduction to substructure can also deal with things like edge detection. For this, suppose that $C$ is a chart space with structure (semi)-monoid $G$. Let $H$ be a substructure of $G$.

**Definition:** By an $H$-structure on a subset $V$ of $X$, we will mean a covering of $V$ by charts $U_i$ with invertible maps $\phi_i:U_i\to C$ such that $\phi_j\circ \phi_i^{-1}: \phi_i(U_i\cap U_j)\to C$ is the restriction of an element $h\in H$.

As an application, consider the case where $C$ is $\mathbb{R}^2$, $G$ is $Isom(\mathbb{R}^2)$, and $H<G$ is the group of translations. Then an $H$-structure on $X$ is exactly the same as a global notion of "horizontal" and "vertical." This is exactly what's needed to detect which direction an edge is pointing in, for instance, for edge detection in an image.

#### Using substructures to define similarity classes of features

So far, we've thought of "features" as being single functions $f:C\to Y$ where $C$ is a chart. Yet the structure object $G$ of $C$ seems to give a nice way of decomposing our feature space into $H$-orbits, where $H$ is a substructure of $G$. For example, suppose that we are interested in image space with $C=\mathbb{R}^2$ and $G=Sim(\mathbb{R}^2)$, the set of all similarities of $\mathbb{R}^2$ (including dilations). Then we can look at the dilation subgroup $\mathbb{R}^+<Sim(\mathbb{R}^2)$. Given a feature $f: C\to \mathbb{R}^n$ considered as an image and $g\in \mathbb{R}^+$, $g\cdot f$ is simply a rescaled version of the same image. At least for $g$ near the identity, this should give the "same image."

This seems like a general observation. If we're dealing with things like images, the orbit of a feature under a neighborhood of the identity in the relevant structure algebra should give a family of "similar" features. This gives a nice relative notion of the "feature correspondence" above. Note that we can modify the metric on the function spaces to take a minimum over $H$-translates. This would allow us to think about feature correspondences for images at different scales.

#### Using features to define hierarchical (pseudo)-metrics on $Y^X$

It would be nice to figure out a way to come up with a distance between functions in $Y^X$ that is based on optimal feature matching. Assume, for instance, that an image consists of a ball and a cat floating in empty space with fixed orientations. With a given ball image and a given cat image, the parameter space of such images should be $\mathbb{R}^2\times \mathbb{R}^2$, this parameter space arising as the quotient given by taking the center points of the cat and the ball. (Note that this quotient map also has a section, but I'm not sure this is important.) We can then imagine varying the cat or ball image but keeping its center point fixed. This moves us in the fiber of these parameters. Generalizing this, we might look for compatible decompositions of our space $X$ by features.
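Returning to the ball-and-cat picture, here is a toy rendering of the section of this quotient map (a purely hypothetical stand-in: the "cat" is just a square, and pixel coordinates play the role of $\mathbb{R}^2$). A point of $\mathbb{R}^2\times\mathbb{R}^2$, i.e. the two center points, determines an image:

```python
import numpy as np

def render(ball_center, cat_center, size=64, ball_radius=6, cat_half=4):
    """Toy section of the quotient map: send a pair of center points in R^2 x R^2
    to an image with a disk ("ball") and a square (stand-in for the "cat")
    stamped at those centers."""
    img = np.zeros((size, size))
    ys, xs = np.mgrid[0:size, 0:size]
    # ball: a filled disk
    img[(ys - ball_center[1]) ** 2 + (xs - ball_center[0]) ** 2 <= ball_radius ** 2] = 1.0
    # cat: a filled square (obviously not a cat, but it fixes the idea)
    cx, cy = cat_center
    img[max(cy - cat_half, 0):cy + cat_half, max(cx - cat_half, 0):cx + cat_half] = 2.0
    return img

frame = render(ball_center=(20, 20), cat_center=(45, 40))
```

Varying the two centers moves along the base $\mathbb{R}^2\times\mathbb{R}^2$; swapping in a different ball or cat image while keeping the centers fixed moves within the fiber.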
Here's an attempt to define such compatible decompositions:

**Definition:** Let $f_1$ and $f_2$ be elements of $Y^X$. Let $\mathcal{U}=\{U_1,U_2,\dots, U_n\}$ and $\mathcal{V}=\{V_1,\dots,V_n\}$ be coverings of $X$ with charts $\{\phi_i:U_i\to C\}$ and $\{\psi_i:V_i\to C\}$. We define the *partition distance vector* to be the vector $(d_{Haus}(U_1,V_1),\dots,d_{Haus}(U_n,V_n))$, and the $(\mathcal{U},\mathcal{V})$-*feature distance vector* between $f_1$ and $f_2$ to be the vector $(d(f_1|_{U_1},f_2|_{V_1}),\dots, d(f_1|_{U_n},f_2|_{V_n}))$, where the distance function here is the one defined above for measuring the distance between local functions with respect to the same chart.

We can then take the lengths of these vectors with respect to whatever norm we choose on $\mathbb{R}^n$ to get a pair of numbers $(d_1,d_2)$, where $d_1$ measures the distance between the chosen partitions and $d_2$ measures the distance between the features on corresponding pieces $U_i$ and $V_i$. We could then see how this varies over matchings. The first factor being zero would imply that the partitions are the same, the second that the features are the same, just appearing in different parts of the space.
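As a concrete sketch (assuming $X$ is finite, the metric on $X$ is given as a callable, and a chart-based local distance like the `local_function_distance` sketch above is available; all names here are hypothetical), the two vectors can be computed directly:

```python
import numpy as np

def hausdorff(A, B, d):
    """Hausdorff distance between finite subsets A, B of a metric space with metric d."""
    return max(max(min(d(a, b) for b in B) for a in A),
               max(min(d(a, b) for a in A) for b in B))

def distance_vectors(f1, f2, Us, Vs, d_X, local_dist):
    """Partition distance vector and (U, V)-feature distance vector.

    `local_dist(f1, U_i, f2, V_i)` is assumed to be the chart-based distance
    between local functions defined earlier (e.g. local_function_distance with
    the relevant charts plugged in)."""
    partition_vec = np.array([hausdorff(U, V, d_X) for U, V in zip(Us, Vs)])
    feature_vec = np.array([local_dist(f1, U, f2, V) for U, V in zip(Us, Vs)])
    return partition_vec, feature_vec

# The pair (d_1, d_2) is then just a pair of norms, e.g.
# d1, d2 = np.linalg.norm(partition_vec), np.linalg.norm(feature_vec)
```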
#### Dual notions to "features" in $Y^X$

In what we've talked about so far, we've considered a feature to be a quotient of a function on a local patch. A "co-feature" should therefore be a subset of (local) functions. Other than general nonsense, why would a co-feature be interesting? One reason is that it seems natural in image processing. Let's consider a family of images, considered as a set of elements of $\left(C\cup \{0\}\right)^{[0,1]\times[0,1]}$, where $C$ is color space and "0" denotes transparent. We often assume that a set of images arises as the image of some set under a quotient map $\left(C\cup \{0\}\right)^{\mathbb{R}^3}\to \left(C\cup \{0\}\right)^{[0,1]\times[0,1]}$, e.g. via taking a photograph.

Now consider the space of positions of a mug in $\mathbb{R}^3$. There is a 6-manifold of such positions, each of which generates an element of $(C\cup \{0\})^{\mathbb{R}^3}$. If the mug has a handle and doesn't have any symmetries, then this is an embedded 6-manifold. The image of this manifold under the quotient map is therefore a quotient of a 6-manifold. Interestingly, this quotient is locally injective if the mug is squarely in frame and the camera is fixed. So images of mugs generate a "co-feature."

We might also hope to be able to detect "co-features" via "features," i.e. a quotient map that detects them. That's presumably what the neural net will do. But it's interesting to ask the question: can we determine the structure of a "co-feature" given a feature, at least in this limited case where we're looking at a mug? Learning to "describe" this manifold, in my opinion, would represent a deep understanding of mugs.

### Piecing together features

#### Filters, feature maps, convolution layers

In the typical convolutional neural network setting, our convolution layer consists of feature maps. These are formed by taking each feature and asking the question: "how much is this feature present at this point?" This is certainly something we can do in this case, but it seems that it breaks into pieces. It's also a bit "pointy." In our setting, the first question to ask is, given a chart $C$, "how well adapted is this chart to the geometry at this point?" To answer this question, we can see how close we can come to finding a nice embedding of a neighborhood in $X$ into $C$.

We get an answer to this question for each $r$-ball, so we get some sort of monotone decreasing function of $r$ at each point (if $B_r(x)$ embeds nicely in $C$, then $B_{r'}(x)$ should embed at least as nicely for each $r'<r$). If we pick a reasonably sized $r$, we therefore get something like a feature map (perhaps a terrain map?) that is independent of the function on $X$. It measures how easy it is to "terraform" a given piece of $X$ to make it look like $C$. Presumably we only want to look for "$C$-features" on regions of $X$ that look like $C$. For instance, in our MNIST example, we have one model space, which is $\mathbb{R}^2$; however, if we learn the grid structure, it's pretty wonky at the edges. If we look at a heat map that shows how close $r$-balls are to being $C$-mappable for a fixed $r$, we'll therefore see a lot of nice regions in the middle and decay of fit near the edges.
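One way to make this heat map concrete, assuming $X$ is a finite metric space given by a distance matrix and $C=\mathbb{R}^2$ (all names hypothetical): score each $r$-ball by the distortion of its best 2-dimensional classical-MDS embedding, and read the per-point scores as the terrain map described above.

```python
import numpy as np

def ball_fit_to_plane(D, center, r):
    """How well the r-ball around `center` (an index into a finite metric space with
    symmetric distance matrix D, distinct points) embeds into C = R^2, measured by the
    mean relative distortion of a best 2-d classical-MDS embedding. Smaller is better."""
    idx = np.where(D[center] <= r)[0]
    if len(idx) < 3:
        return 0.0                      # one or two points embed isometrically
    Dsub = D[np.ix_(idx, idx)]
    n = len(idx)
    # classical MDS: double-center the squared distances, take the top 2 eigenpairs
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (Dsub ** 2) @ J
    w, V = np.linalg.eigh(B)
    pts = V[:, -2:] * np.sqrt(np.clip(w[-2:], 0, None))
    Demb = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    mask = ~np.eye(n, dtype=bool)
    return float(np.mean(np.abs(Demb[mask] - Dsub[mask]) / Dsub[mask]))

# heat_map[i] = how "R^2-like" the r-ball at point i is (0 = embeds isometrically):
# heat_map = np.array([ball_fit_to_plane(D, i, r) for i in range(len(D))])
```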