Yoneda Lemma - HackMD

# Yoneda Lemma (under construction) (part of a [short course on monads](https://hackmd.io/@alexhkurz/H1OxumxRP)) (all exercises are maths exercises, no programming exercises in this lecture) From the point of view of the [Short of Introduction to Monads](https://hackmd.io/@alexhkurz/H1OxumxRP) this is an excursion that one may want to skip if interested mainly in [monads](https://hackmd.io/@alexhkurz/HkH8SBxAw). On the other hand, the Yoneda lemma plays a central role in the structural approach to mathematics as formulated by [category theory](https://hackmd.io/@alexhkurz/BktmSSgCP) and should be worth a detour. Moreover, from a more technical perspective, the analogy between polymorphic functions in programming languages and natural transformations in category theory is important for both areas and I do not want to miss the opportunity to make some remarks on this. For example, the Yoneda lemma allows to give short and concise proofs of the following results. - The only natural transformation $\rm Id\to Id$, in other words, the only polymorphic function `a->a`, is the identity function. [^identity] - The natural transformations $\rm Id^3\to Id^2$, in other words, the polymorphic functions `(a,a,a)->(a,a)` are in bijection with the set of functions $2\to 3$. [^functions23] - The natural transformations ${\rm Id}^\ast \to {\rm Id}^\ast$, in other words the polymorphic functions `List a -> List a`can be enumerated bijectively by the following data. A function $f:\mathbb N\to \mathbb N$ and, for all $n\in\mathbb N$, a function $g:f(n)\to n$. [^functionlistlist] This answers some of the question that we raised in the [Exercise on Polymorphic Functions](https://hackmd.io/JGNOqD4aQHquNcqisW0glA?view#Polymorphic-Functions-as-Natural-Transformations). Before we can get explain the Yoneda lemma we need to familiarize ourselves with quite some new notation. One way to do this is to step back and make another excursion into the philosophical foundations of category theory. ## The Mathematical Structuralism of Category Theory In the previous lecture, we introduced category theory as an axiomatic theory of structure. Let us review this again at the hand of the example of monoids. ### Recap of Monoids A monoid is defined as a set $M$ with two operations satisfying some equations. All monoids and monoid homomorphism form a category $\sf Monoid$. While we used operations and equations to define the particular category $\sf Monoid$, we can now throw away this "ladder" and study the category $\sf Monoid$ just from the point of view of objects and arrows. How much of the original structure of monoids can be recovered from the structure preserved purely in terms of objects and arrows? To fully answer this question would easily fill a one semester course on category theory. But for what we want to do here only some general ideas are needed. ### Arrows as generalised elements and observations We want to explore the two basic ideas that allow us to extract the intrinsic structure of objects just from studying the available arrows between objects. Let $\mathcal C$ be any category. Let $A$ be an object in $\mathcal C$. There are two kind of arrows connecting $A$ with other objects: Arrows into $A$ and arrows from $A$. ![](https://i.imgur.com/rxOPIfW.png =650x300) The basic idea now is: - We know how to build $A$ if we know all arrows into $A$. Every arrow $X\to A$ can be thought of as a **generalized element** of $A$. - We know all observations that can me made of $A$ if we know all arrows out of $A$. Every arrow $A\to X$ can be thought of as a generalised observation of $A$. One of the recurring themes in category theory, is that such ideas *about* category theory can be formulated *in* category theory. Here is how we do this in this case. The class of all arrows into $A$ can be organised into a functor we denote by $\mathcal C (-,A)$. The $-$ is meant to range over all objects of $\mathcal C$. In set theory, I would represent the idea of $\mathcal C (-,A)$ as the set $\{\mathcal C(X,A)\mid X\in \mathcal C\}$, but this would miss part of the important structure that is contained in the idea of $\mathcal C (-,A)$. It is much more powerful to define $\mathcal C (-,A)$ as a functor. Here is the definiton of $\mathcal C (-,A)$. \begin{align} \mathcal C (-,A) : \mathcal C & \to Set\\ \end{align} maps an object $X\in\mathcal C$ to the set of arrows $\mathcal C (X,A)$ and an arrow $f:Y\to X$ to the function \begin{align} \mathcal C (f,A) : \mathcal C (X,A) & \to \mathcal C (Y,A) \\ g & \mapsto f\,; g \end{align} ![](https://i.imgur.com/YhnsVLh.png =520x180) We say that an arrow $f:Y\to X$ acts by **precomposition** on arrows $X\to A$. Similarly, generalised observations $A\to X$ organise themselves into a functor \begin{align} \mathcal C (A,-) : \mathcal C & \to Set\\ \end{align} that maps an object $X\in\mathcal C$ to the set of arrows $\mathcal C (A,X)$ and an arrow $f:X\to Y$ to the function \begin{align} \mathcal C (A,f) : \mathcal C (A,X) & \to \mathcal C (A,Y) \\ g & \mapsto g\,; f \end{align} ![](https://i.imgur.com/Zkr9y17.png =520x180) We say that an arrow $f:X\to Y$ acts by **postcomposition** on arrows $A\to X$. These functors $\mathcal C(A,-)$ and $\mathcal C(-,A)$ are called **representable functors**, or functors represented by $A$. Before we are going to have a look at representable functors, I want to say a bit more about generalised elements and isomorphism. **Remark on generalised elements in general categories:** Remember that we spoke of $X\to A$ as a generalised element of $A$. Now let us specialise to $\mathcal C=Set$ and let us take $X$ to be a one element set. In fact, it is common in category theory to write $1$ for any one-element set. And, similarly, to write $2$ for a two element set, etc. Coming back to arrows $1\to A$ in $Set$, we notice that arrows $1\to A$ are in [bijective](https://en.wikipedia.org/wiki/Bijection) correspondence with the elements of $A$. We write this as $$Set(1,A)\cong A.$$ Thus, in the category $Set$, every element is a generalised element. Conversely, in the category $Set$, every generalised element $X\to A$ can be seen as a sum of ordinary elements $1\to A$. [^ordinaryelements] [^neitherof] Category theory is of particular interest in situations where generalised elements cannot be reduced to ordinary elements in such an obvious way. It then often turns out that the category theoretic notion of generalised element is a mathematically more powerful way to describe structure than ordinary set-theoretic elements. [^grothendieck] [^grothendieck]: In mathematics, the paradigmatic example of this is the notion of a Grothendieck topos. Later these ideas became also important for semantics of programming languages. Already Church's untyped $\lambda$-calculus is better understood with the help of category theory than in traditional set-theoretic language. ### Structure up to isomorphism We have discussed that category theory investigates mathematical objects from a structural point of view in the sense that category theory abstracts away from how mathematical objects are constructed/defined/implemented and focusses on how they relate to all other mathematical objects in "the universe", or, rather, in their category. An important aspect of this structuralist approach to mathematics is that we as mathematicians should not distinguish between ismorphic objects. Any two isomorphic objects should be taken to be the same from the point of view of category theory. [^isomorphic] To emphasize the fundamental importance of the notion of isomorphism for category theory, I want to show you that we could have motivated the definition of a category as providing just what is needed to formalise the intuitive notion of "same structure". First let us recall the definition. Given a category $\mathcal C$ and two objects $A$ and $B$ in $\mathcal C$, we say that $A$ and $B$ are **isomorphic**, $A\cong B$, if there are arrows $f:A\to B$ and $g:B\to A$ such that $f\,;g = id_A$ and $g\,; f= id_B$ as summarised in the diagram ![](https://i.imgur.com/f2XAc9N.png =400x140) Such an arrow $f$ is called an **isomorphism** and $g$ is called its inverse. First, one should verify that in the category of sets and functions an isomorphism is precisely a bijective function. [^bijective] Second, one should prove that the axioms of a category are exactly what is needed to prove that $\cong$ is an equivalence relation.[^equivalence] To go back to the example $Set(1,A)\cong A$ above: - We like to write $n$ for a finite set with $n$ elements, because all sets with the same cardinality are isomorphic. So, from the point of view of category theory, it makes sense to use a notation that emphasises that only the cardinality of a set matters. [^vonneumannordinals] - $Set(1,A)$ is the set of functions from a one-element set to $A$ and, as such, different from $A$ itself. But the difference really doesn't matter and is just one of representing the same data differently. All that matter is that we have $Set(1,A)\cong A$. - Another important isomorphism that is well-known in Haskell under the name of "currying" is $$Set(A\times B,C)\cong Set(A,C^B)$$ Here, we write $C^B$ for the set of functions $B\to C$ and so all we did was rewriting in mathematical langauge the isomorphism between Haskell types `(a,b)->c` and `a->b->c`. [^isomorphic]: As useful as category theory is to problem solving, software engineering and programming, one should keep in mind that the costs of engineering solutions is not invariant under isomorphism. As always, the power of abstraction also means that one should always remember what one forgets when abstracting. ## Representable Functors The Yoneda lemma characterises the natural transformations from a representable functor to an arbitrary set-valued functor. Before stating the Yoneda lemma, therefore, it makes sense to better understand representable functors and set-valued functors. We say that a functor $F:\mathcal C\to Set$ is **representable** if there is an object $A\in\mathcal C$ such that $F$ is isomorphic to $\mathcal C(A,-)$, that is $F\cong\mathcal C(A,-)$ **Exercise:** Show that the following functors $Set\to Set$ are representable: \begin{align} X & \mapsto X\\ X & \mapsto X\times X\\ X & \mapsto X^2\\ X& \mapsto X^n \end{align} The key idea here is that $X\cong Set(1,X)$. This proves the first item. Note how this refers back to our discussion of arrows into a set as generalised elements. In fact, arrows $1\to X$ are exactly (up to isomorphism) the elements of $A$. The remaining items follow the same idea. The next exercise will follow from the Yoneda lemma, so feel free to skip. On the other hand, it is a good preparation. **Exercise:** Show that if a functor $F:\mathcal C\to Set$ is represented by $A$ and by $B$ then $A\cong B$. ## Set-valued Functors (Presheaves) (can be skipped) We motivated set-valued functors for the role they play in understanding polymorphic functions such as those of type `a->a->a`. But set-valued functors have many more uses and we briefly look at some. One way to understand set-valued functors is as "variable sets". After all, a functor $F:\mathcal C\to Set$ is nothing but a collection of sets $(FA)_{A\in\mathcal C}$ connected by functions as specified in $\mathcal C$. The first example, is meant to illustrate the name "variable set" as the indexing category can be understood as a model of time and the set-valued functor (aka presheaf) as a set changing over time. **Example:** Let the objects of $\mathcal C$ be the natural numbers and let there be an arrow $n\to m$ iff $n\le m$. If we draw a picture of such a variable set, we see how it takes into account that new elements can enter the picture, two old elements can be identified, but no elements can disappear. ![](https://i.imgur.com/mDeJQPk.png =600x350) The picture shows a typical presheaf $F$ with the action of the arrows $n\to m$ indicated in magenta. The upper horizontal line shows one generalised element of the presheaf and how it evolves over time. Similarly, by following the second element of $F0$ to the right we obtain another generalised element of $F$. These two generalised elements are different at time $0$ but equal from time $1$ onwards. The bottom horizontal line is a generalised element of $F$ that "remains different to all other rows forever". It is also interesting to turn around the direction of arrows: **Example:** Let the objects of $\mathcal C$ be the natural numbers and let there be an arrow $n\to m$ iff $n\ge m$. If we draw a picture of such a variable set, we see how it corresponds to a set of trees, with numbers indexing the nodes at the same level. ![](https://i.imgur.com/P1jEcUt.png =600x350) The picture shows a "forest" consisting of two trees. The arrows $m\leftarrow n$ act on a node by mapping it to its parent. [^forest] [^forest]: It is interesting to think of the possible variations of $F$. A node can have any number of children (including $0$ or even infinitely many). Similarly, a forest can consist of any number of trees. Finally, as opposed to the previous example, branches do not need to be infinite. Of course, there is no reason why one would have to think of the indices as ranging over time. In the next example, we have two indices, $V$ and $E$, reminding us of "vertices" and "edges", and two arrows $s,t:E\to V$ which we read as "source" and "target". This example has more of a spatial flavour. **Example:** Let $\mathcal C$ be the category that has two objects $V,E$ and two arrows $s,t:E\to V$. [^arrows] A presheaf $F:\mathcal C\to Set$ is a graph. [^arrows]: Of course, there are also two identity arrows. There are many more data structures that can be represented as presheaves. If that is possible, this is a good indication that category theoretic methods could prove useful. The most famous ones, perhaps, are the [simplicial sets](https://en.wikipedia.org/wiki/Simplicial_set) used in the study of topological spaces. But they are also applied in programming languages to build mathematical models of variable binding and higher-order abstract syntax. ## The Yoneda Lemma **Lemma:** Let $\mathcal C$ be a category and $F:\mathcal C\to Set$ a functor. Then natural transformations $\mathcal C(A,-)\to F$ are in bijective correspondence with elements of $FA$, which we also write as $${\sf Nat}(\mathcal C(A,-), F)\cong FA$$ Before we go into the proof, let us apply the Yoneda lemma to the items from the introduction. ### Characterising polymorphic functions - The only natural transformation $\rm Id\to Id$, in other words, the only polymorphic function `a->a`, is the identity function. *Proof:* \begin{align} Nat(Id,Id) & \cong Nat(Set(1,-),Set(1,-)) \\ & \cong Set(1,1)\\ & \cong 1 \end{align} - The natural transformations $\rm Id^3\to Id^2$, in other words, the polymorphic functions `(a,a,a)->(a,a)` are in bijection with the set of functions $2\to 3$. \begin{align} Nat(Id^3,Id^2) & \cong Nat(Set(3,-),Set(2,-)) \\ & \cong Set(2,3)\\ & \cong 3^2 \end{align} - The natural transformations ${\rm Id}^\ast \to {\rm Id}^\ast$, in other words the polymorphic functions `List a -> List a`can be enumerated bijectively by the following data. A function $f:\mathbb N\to \mathbb N$ and, for all $n\in\mathbb N$, a function $g:f(n)\to n$. *Proof:* This is an application of the previous item together with $X^\ast = \sum_n X^n$. [^polynomialfunctor] [^polynomialfunctor]: There is something which I didnt say, namely that a natural transformation $\sum_n X^n\to FX$ can be split into $n$ natural transformations $X^n\to FX$. And that a natural transformation into $G\to \sum_n X^n$ consists of a choice of $n$ and a natural transformation $G\to X^n$. So this is quite a bit more complicated in the sense that it requires more knowledge of how + works as a so-called colimit. At the current stage we can only do this with a lot of handwaving. ### Generic Figures (can be skipped) A wonderful introduction to category theory through presheaves is the book [Generic figures and their glueings](https://marieetgonzalo.files.wordpress.com/2004/06/generic-figures.pdf) by Reyes, Reyes, and Zolfaghari. Justified by pictures like we have seen in the section on presheaves above, they think of presheaves as "figures". The Yoneda lemma then allows us to identify objects of $\mathcal C$ with certain presheaves, which are then called generic figures. Let us just quickly go through the examples from the section of presheaves. **Example:** - In the example where a presheaf is a set throught time, the generic figure of $n\in\mathcal C$ is the presheaf that represent one point "starting at time $n$ and moving through time infinitely". - In the example where a presheaf is a set of trees, the generic figure of $n\in\mathcal C$ is a "one-branch-tree starting at the root and finishing at level $n$". - In the example where a presheaf is a graph, the generic figure of $V$ is the graph which has one vertex and no edge and the generic figure of $E$ is the graph that has one edge and two vertices. An important consequence of the Yoneda lemma is the following. (Its proof needs the notion of colimit and is beyond the scope of this lecture.) **Fact:** Every presheaf arises as a glueing of generic figures. Even without the technology, we can work out how this fact plays out in the three examples above. For example, it should be intuitively obvious how every graph can be "glued together" from arrows and vertices. ### Proof of the Yoneda Lemma The Yoneda lemma allows to combinatorially characterise the set of natural transformation from a representable functor to a presheaf. More precisely, $${\sf Nat}(\mathcal C(A,-),F)\cong FA.$$ The proof can be summarised in the following picture. We pick a natural transformation $\tau \in {\sf Nat}(\mathcal C(A,-),F)$ and write down what it means for $\tau$ to be natural: ![](https://i.imgur.com/9vqUgYq.png =500x300) The key to the proof is the idea that the action of $\tau$ on $id_A$ determines the action of $\tau$ on all elements at all types $X$. Chasing $id_A$ around the diagram in two different ways, shows that the naturality of $\tau$ implies that $$\tau_X(f) = Ff(\tau_A(id_A))$$ This defines a function \begin{align} {\sf Nat}(\mathcal C(A,-),F) & \to FA\\ \tau & \mapsto \tau_A(id_A) \end{align} Conversely, with a small modification of the diagram ![](https://i.imgur.com/HKZolBt.png =500x300) we read off the definition of a function in the other direction \begin{align} FA & \to {\sf Nat}(\mathcal C(A,-),F)\\[1ex] a & \mapsto \tau_X(f) \stackrel{\rm def}= Ff(a) \end{align} All that remains (exercise!) is to show that these two maps are inverse of each other. [^inverse] **Remark:** There is a dual version of the Yoneda lemma that is proved in the same way: $${\sf Nat}(\mathcal C(-,A),F)\cong FA.$$ To discuss duality properly we use the notation $\mathcal C^{op}$ for the category that is the same as $\mathcal C$ but has the direction of all arrows reversed. With this notation we can write the types of the representable presheaves as \begin{gather} \mathcal C(A,-):\mathcal C\to Set\\[1ex] \mathcal C(-,A):\mathcal C^{op}\to Set \end{gather} In a slogan, postcomposition is covariant and precomposition is contravariant. **Remark:** In category theory, one can usually avoid repeating the same proof for a dual category. If a theorem is proved for all categories $\mathcal C$, then it also holds for all categories $\mathcal C^{op}$. Now observe that what we called the "dual version of the Yoneda lemma" is really the same Yoneda lemma for $\mathcal C^{op}$ instead of for $\mathcal C$. **Remark:** Mathematicians usually call Yoneda lemma what I called the "dual version of the Yoneda lemma". One for this reason will become apparent in the next section. ### Consequences of the Yoneda Lemma In the [lecture on monoids](https://hackmd.io/@alexhkurz/B1oIuXxRv#Every-monoid-is-a-monoid-of-functions) we have seen that every monoid can be represented as a monoid of functions. This is a special case of the Yoneda lemma for one-object categories. **Exercise:** Justify the last sentence above. More importantly, the Yoneda lemma shows that every category of abstract objects and arrows can be interpreted in the category $Set$ where objects are sets and arrows are functions. More, precisely, writing $$[\mathcal C^{op},Set]$$ for the category of functors $\mathcal C^{op}\to Set$ and natural transformations, \begin{align} \mathcal C & \to [\mathcal C^{op},Set] \\ A & \mapsto \mathcal C(-,A) \end{align} is an embedding in the sense that every natural transformation \begin{align} \mathcal C(-,A) & \to \mathcal C(-,B) \end{align} is induced by an arrow \begin{align} A \to B \end{align} In math lingo, this result is expressed as follows. **Proposition:** The Yoneda embedding is full and faithful. **Proof:** ... ## References A good introduction to presheaves is - Reyes, Reyes, and Zolfaghari: [Generic figures and their glueings](https://marieetgonzalo.files.wordpress.com/2004/06/generic-figures.pdf). If that book has too many prerequisites Lawvere and Schamuel's "Conceptual Mathematics" is a good place to start. A more advanced textbook is Mac Lane and Moerdijk. The main reference to presheaves, sheaves, toposes, etc is the comprehensive "Elephant" by Peter Johnstone. [^identity]: We have to show that $\tau:Id\to Id$ natural implies that $\tau_A$ is the identity for all $A$. A good way to think about such problems is to assume for a contradiction that one has some $A$ and $a,a\in A$ such that $\tau_A(a)=a'\not= a$ and then find one naturality square that does not commute. Spelling out the details, this question reduces to the combinatorial problem of finding a function $f:A\to A$ such that $f(\tau_A(a))\not=\tau_A(f(a))$, which can be done by guessing and verifying. [^functions23]: Here $2,3$ stand in for any sets with exactly $2$, respectively $3$, elements. In Haskell, the different functions `(a,a,a)->(a,a)` can be defined in the following $3^2=9$ different ways ```haskell f (x,y,z) = (x,x) f (x,y,z) = (x,y) f (x,y,z) = (x,z) f (x,y,z) = (y,x) f (x,y,z) = (y,y) f (x,y,z) = (y,z) f (x,y,z) = (z,x) f (x,y,z) = (z,y) f (x,y,z) = (z,z) ``` Exercise: Figure out how each of the 9 lines above corresponds to exactly one function $2\to 3$. [^functionlistlist]: Such $f,g$ then define the polymorphic function which takes a list $l$ of length $n$ to a list of length $f(n)$ that has as its $i$-th element the $g(i)$-th element of $l$. [^ordinaryelements]: To talk about this in category theoretic language we should introduce coproducts and colimits, which we will not do in this course. [^neitherof]: Neither of the two statements is true for general categories. In fact, in a general category, there are do not need to be any ordinary elements, since an object does not need to be made from a set. [^bijective]: A function $f:A\to B$ is bijective if \begin{gather} \forall a,a'\in A\,. f(a)=f(a') \ \Rightarrow \ a = a'\\ \forall b\in B\,.\exists a\in A\,. f(a)=b\\ \end{gather} This gives us a first glimpse of two features of category theory. Abstract category theory properties - simplify combinatorial properties formulated in set-theory, - generalise to other categories than $Set$. [^equivalence]: In detail, one has to show \begin{gather} A\cong A\\ A\cong B \Rightarrow B\cong A\\ A\cong B, B\cong C \Rightarrow A\cong C\\ \end{gather} [^vonneumannordinals]: In set-theory the so-called von Neumann ordianls defined as $n=\{0,1,\dots n-1\}$ similarly identify sets with numbers. But this definition fixes a particular "implementation" whereas category theorists prefer to avoid this. [^inverse]: In detail, given $\tau$ and defining $\tau'_X(f)=Ff(\tau_A(id_A))$, show $\tau=\tau'$. Given $a$ and defining $a'=F\,id_A(id_A)$, show $a=a'$.