$\newcommand{\id}{\mathrm{id}} \newcommand{\Set}{\mathbf{Set}} \newcommand{\Fin}{\mathbf{Fin}} \newcommand{\Card}{\mathbf{Card}} \newcommand{\names}{\mathbf{N}} \newcommand{\hcomp}{\; ;\;}$

# On Variables and Substitution

(notes for a talk at Fullerton College, Sept 6, 2018 ... variables, substitutions, category theory, string diagrams and reporting on ongoing joint work with Samuel Balco ... for comments and questions [email me](mailto:akurz@chapman.edu))

(should be readable now, but some gaps remain ...)

## Abstract

Variables---and operations of substituting values for variables---play an important role in mathematics and computer science. Whenever we want to design programming languages or build tools such as compilers, a careful understanding of variables is necessary. The difficulties and subtleties that arise in a study of variables link up with two of the most beautiful mathematical theories currently under intense development, the theory of nominal sets and the theory of string diagrams, both of which are children of category theory. In this talk, I will give an overview of the area and briefly touch on some ongoing collaboration with Samuel Balco. I hope that the presentation will appeal in equal measure to students of mathematics and of computer science.

## Introduction

I try to make the talk interesting for students of maths and of computer science. I also want to show you some category theory, an area of mathematics that is of fundamental importance but not even taught at all universities. One of many reasons why category theory is important is that it links up mathematics and computer science; we can also add logic and physics, even though those will not play a big role in this talk. So let us start with a dictionary.
mathematics | computer science | category theory | logic | physics
------------|------------------|-----------------|-------|--------
set | data type | object | proposition | space
function | program | arrow | proof | transformation

I will illustrate the nature of this landscape with a research question on which I have worked recently. But the emphasis is to hint, with a concrete example, at the power of the dictionary above.

## Preliminaries: Substitution, Free and Bound Variables

### Substitution

Substitution of values for variables is familiar to all of us. For example, we may want to substitute $1$ for $x$ in $$ 0 = x^2 - 2x +1 $$ to check that $1$ is a solution of the equation.

Substitution of terms for variables is just as common. For example, if we have two functions $f,g$ then the composition $g\circ f$ is defined as $(g\circ f)(x)=g(f(x))$, that is, by substituting $f(x)$ for $y$ in $g(y)$.

### A Warning Example

Substitution is such a basic activity in maths and programming that we usually don't think much about it. But if it is not done carefully, some subtle mistakes can happen. For example, suppose we have two functions $$f(x) = x + y$$ and $$g(y) = 2*y$$ What is the function obtained from substituting the output of $g$ into the input of $f$? In other words, what is the correct formula for the function $f\circ g$?

The first idea is probably to replace the $x$ on the right-hand side of $f(x) = x + y$ by $g(y)$, which is $2*y$, and to write $$f(g(y)) = 2*y + y$$ but this is the wrong answer. [^correct]

Such mistakes arise from not properly taking care of the distinction between free and bound variables, which we review now.

### Free and Bound Variables

The following definitions are not completely precise, but they can be made precise in all relevant circumstances known to me.

A ***fresh variable*** is a variable that has not been used before.

A ***bound variable*** is a variable that can be renamed by a ***fresh variable*** without changing its meaning.
A variable is called ***free*** if it is not bound.

**Example 1:** If we define a function, for example $$f(x) = x^2 - 2x + 1$$ then $x$ is a bound variable. Indeed, $$f(y) = y^2 - 2y + 1$$ defines the same function $f$. To run through the definitions, $y$ is *fresh* in $f(x) = x^2 - 2x + 1$ since $y$ does not occur in $f(x) = x^2 - 2x + 1$. And if we substitute $y$ for $x$, then the meaning of the expression does not change: The function $f$ is the same, whether defined via $f(x) = x^2 - 2x + 1$ or via $f(y) = y^2 - 2y + 1$.

For the argument above, it is crucial that $y$ is fresh for $f(x) = x^2 - 2x + 1$. If instead we substituted $y$ for $x$ in $$\color{black}{y=x^2 - 2x + 1}$$ where $y$ is not fresh, the meaning of the expression would change.

**Question:** Can you explain in plain words how the meaning of $\color{black}{y=x^2 - 2x + 1}$ is different from the meaning of $\color{black}{y=y^2 - 2y + 1}$? [Hint: The challenge is to make precise what a good notion of meaning is in this example.]

The technical terminology in programming languages is that the expressions $f(x) = x^2 - 2x + 1$ and $f(y) = y^2 - 2y + 1$ are ***alpha-equivalent***. $\alpha$-equivalent expressions define the same functions/programs.

**Example 2:** If we fix a coordinate system with axes labelled $x$ and $y$ as usual, then in $$y=x^2-2x+1$$ the variables $x,y$ are free or bound, depending on whether we include the coordinate system as part of our description or whether the coordinate system is outside of our description, or, as we may say, part of the "environment".

- Suppose the coordinate system is part of the environment. Then $x,y$ are free. Indeed, suppose $x,y$ were bound. Because I can replace bound variables by fresh variables without changing the meaning, I can also permute (ie swap) bound variables without changing the meaning. But if you plot the graph corresponding to $x=y^2-2y+1$ you will see that it is different from the graph that solves $y=x^2-2x+1$.
  This shows that $x,y$ are not bound, hence free.
- Suppose the coordinate system is part of our description. Then, if I replace $x,y$ by fresh variables in the equation, I also have to do the corresponding replacement in the coordinate system. Consequently, the graph of the equation does not change. This shows that $x,y$ are bound.

**Remark:** The observation that the notion of free variable can be axiomatised in terms of permutations of variables has far-reaching consequences and is at the heart of the theory of ***nominal sets*** developed by Gabbay and Pitts.

**Example 3:** Typically, bound variables can be recognised by the presence of a so-called *binder* introducing the variable and a well-defined *scope* in which the variable has meaning. For example, in $$ \sum_{i=0}^\infty 2^i$$ the variable $i$ is bound, the binder is $\sum_{i=0}^\infty$ and the scope is everything that follows. Similarly, in $\int_0^\infty e^{-x}dx$ the variable $x$ is bound, the binder is $\int_0^\infty [\ldots]dx$ and the scope is $[\ldots]$.

**Example 4:** In a program such as

```
int f(int a, int b) { ... }
```

the binder is `(int a, int b)`, the variables `a`, `b` are bound and the scope is `{...}`.

**Example 5:** In the following programming example from [Khan Academy](https://www.khanacademy.org/computing/computer-programming/programming/looping/p/challenge-a-loopy-ruler), the first task is to display a ruler across the top of a window using a while loop; the code

```
fill(17, 0, 255);
// a handy dandy ruler across the top
var x = 0;
while(x < 400) {
    text(x, x, 10);
    x += 50;
}
```

gives indeed the [expected result](https://github.com/alexhkurz/talks/blob/master/Screenshot-2018-09-03-15.37.03.png). The next task was to add a ruler down the left. A first try is the following.
```
fill(17, 0, 255);
// a handy dandy ruler across the top
var x = 0;
while(x < 400) {
    text(x, x, 10);
    x += 50;
}

// a handy dandy ruler down the left
var y = 0;
while(y < 400) {
    text(y, y, 10);
    y += 50;
}
```

What is the output of the code above? Why?

Let us go back to

```
fill(17, 0, 255);
// a handy dandy ruler across the top
var x = 0;
while(x < 400) {
    text(x, x, 10);
    x += 50;
}
```

Is `x` a free or a bound variable in the code above? Do you see a binder that binds `x`? How does the answer explain the observation that both snippets of code produce the same visible output?

### Substitution and Bound Variables

In the presence of bound variables, we need to be careful with substitution. We have seen one [warning example](https://hackmd.io/s/BkLiMcFPQ#A-Warning-Example). Here are some more.

<I might skip this if I think we have enough for a 70min talk> <While important, it is not needed in the rest of the talk>

#### A maths example

The following is a well known equation $$(x-1)\cdot\sum_{i=0}^{n} x^i = x^{n+1}-1$$ Which of the variables $i,x,n$ are free/bound in $\sum_{i=0}^{n} x^i$? Does the equation remain true if we replace $x$ by any other variable such as $y,z,\dots$?

#### A programming example (JavaScript)

- In a JavaScript program such as

  ```
  var a=10;
  var f = function(a) { return a+3; };
  ```

  `f(2)` evaluates to `5`, because `a` in `a+3` is bound by the occurrence of `a` in `function(a)` and not by the occurrence of `a` in `var a`.
- The scope of the binder `function(a)` is the block demarcated by `{` and `}`

  ```
  var f = function(a) { return a+3; }
  ```

- If we change the code to

  ```
  var a=10;
  var f = function() { return a+3; };
  ```

  then `f()` evaluates to `13` because the `a` in `a+3` is now bound by `var a=10;`
- The code below has a 'hole' marked '??'.
  ```
  var a=10;
  var f = function() { var a = 2; return a+??; };
  ```

  If we want to substitute `a+3` for `??` in a capture-avoiding way, we need to first rename `a` using a fresh variable, as in

  ```
  var a=10;
  var f = function() { var b = 2; return b+??; };
  ```

  Then `f()` evaluates to `15`, whereas a textual (non-capture-avoiding) substitution would make `f()` evaluate to `7`.

#### A programming example (Prolog)

- The transitive closure `ancestor` of a relation `parent` can be defined as, reading ` :- ` as "if" and ` , ` as "and",

  ```
  ancestor(X,Y) :- parent(X,Y)
  ```

  and

  ```
  ancestor(X,Y) :- ancestor(X,Z) , parent(Z,Y)
  ```

- Note that
  - the names `ancestor` and `parent` are just chosen for illustration. `parent` can be an arbitrary binary relation and `ancestor` is then its transitive closure;
  - this is the same definition, just in different notation, as the one we have seen in [the lecture on induction](https://hackmd.io/s/H1panO_um) and which we repeat here in the following footnote; [^transitive]
  - the notation we have chosen here is the one of the programming language [Prolog](http://www.swi-prolog.org).
- Assume we know that

  ```
  parent(harry,william)
  parent(charles,harry)
  parent(philip,charles)
  ```

  and we want to know all ancestors of `william`, that is, we want to know all `X` such that `ancestor(X,william)`. Let us do the computation. First, we know that

  ```
  parent(harry,william)
  ```

  so we output `X=harry`. Next, we compute

  ```
  ancestor(X,william) if ancestor(X,Z) and parent(Z,william)
  ```

  upon which we can either output `X=charles` or continue with

  ```
  ancestor(X,william) if ancestor(X,Z), parent(Z,william)
                      if ancestor(X,-), parent(-,Z), parent(Z,william)
  ```

  where, for the last line above, we want to apply the rule

  ```
  ancestor(X,Y) :- ancestor(X,Z) , parent(Z,Y)
  ```

  to `ancestor(X,Z)`. To do this we need to replace the `Y` in `ancestor(X,Y)` with `Z`.
But this means that on the right-hand side we also need to replace `Y` by `Z`, resulting in

```
ancestor(X,-), parent(-,Z)
```

But now we have a question:

- **Exercise:** Which variable should replace the `-` in the above?

## A Research Question

Describe a calculus of substitutions that can be used to give a formal account of substitutions. Of course, that has been done before many times. But I am not aware of a 2-dimensional calculus for simultaneous substitutions. What is new here? String diagrams ... enter category theory ... so we need to put in some work on categories before coming back to substitutions.

## Categories

One of the many ways of looking at category theory is as a theory of compositionality. This is important to programming and software engineering, as we need to build big systems from small systems we understand in a compositional way. "Compositional" here refers to being able to deduce properties of the big system from known properties of the smaller components. Category theory has a lot to say about compositionality in many different ways. To start with, a category is a mathematical structure that has "arrows" and composition of arrows.

**Definition:** A category $\cal A$ consists of a set of objects, also denoted by $\cal A$, and for any two objects $A,B$ a set of arrows $\mathcal A(A,B)$. There are special arrows $\id_A$, called identity arrows or identities, for all $A$ in $\cal A$, and for any $f$ in $\mathcal A(A,B)$ and $g$ in $\mathcal A(B,C)$ there is an arrow $f;g$, called the composition of $f$ and $g$. This data is required to satisfy the laws of identity and associativity. To express that $f$ is an element of $\mathcal A(A,B)$, we also write $f:A\to B$ and call $A$ the domain or source of $f$ and $B$ the codomain or target. The composition $f;g$ is also written as $g\circ f$ or $gf$.

**Example:**

- Let $\Set$ be the category that has sets as objects and all functions as arrows.
- Given a countably infinite set $\names$, the category $\Fin$ is the category of finite subsets of $\names$. This is the leading example for us, as we think of the set $\names$ as the set of variables $\{x,y,z,\ldots\}$.
- The category $\Card$ consists of finite cardinals and all functions.

One could check that $\Set$, $\Fin$, $\Card$ are indeed categories. But we know anyway that we can compose functions and that there is an identity function.

**Quick Excursion:** The examples above are a bit boring, at least at first sight. So why is category theory interesting?

- Going back to our dictionary, there are many categories that are very different from **Set**.
  - Categories where objects are sets with extra structure and arrows are functions that preserve that structure. For example, there are categories of (ask the audience what structures they came across)
    - graphs
    - groups
    - commutative groups
    - monoids
    - ordered sets
    - vector spaces
    - metric spaces
    - topological spaces
    - ...
  - Categories where objects are data types and arrows are programs (*type theory*)
  - Categories where objects are sets (with extra structure or not) and arrows are not functions but, say, relations
  - Categories where objects are not sets but points in a space, and the arrows are paths connecting the points
  - Categories where the set of arrows *A(a,b)* is replaced by an object in another category (*enriched category theory*)
  - Categories where both the set of objects and the set of arrows are replaced by objects in another category (*internal category theory*)
  - Categories that have two or more ways of composing arrows (*monoidal categories, double categories, higher categories*)

  While it is the first item that is best known in the mathematical world, it is the other ones that are more interesting as topics of research inside category theory. The last item will play a role in the rest of the talk.
- Category theory is the way to go if you want to think "up to isomorphism".
  (Example: The skeleton of a category.)
- Category theory can be understood as a set theory in which the basic operation is not "is an element of" but "composition". This leads to a fantastic idea, namely *definition by universal property*. We have seen that there is a large variety of categories. If you make a definition by "is an element of", then you have to adapt the definition for each category and, by the way, it only works in case your category is of the "objects are sets plus structure" variety. If you make a definition by universal property, then it makes sense in all of these different examples. So category theory can be full of beautiful surprises where seemingly different constructions end up being instances of the same universal property. (Example: cartesian product.)
- (Maybe this point needs to be skipped as it would take too long and is not needed for the talk.) One of the most powerful notions of mathematics, comparable to the invention of the number 0, is that of a *natural transformation*. It is the notion category theory was created for, and it plays a central role in the [first paper on the subject](https://ncatlab.org/nlab/show/General+Theory+of+Natural+Equivalences). I will just explain the main idea. In mathematics, early on, you encounter indexed numbers, such as $x_i$ where $i$ runs over an infinite set, as e.g. in Taylor series. Later, one also needs indexed functions $f_i$. A natural transformation generalises such a family of functions in the case where the indices form not a mere set but a category. This happens all the time in mathematics and I give you just one example. We said that there is a category where objects are data types and arrows are programs. But some data types can be indexed, for example the type $List(X)$ of lists over a set $X$: The elements of $X$ are the elements you can put in the lists. Now functions $f_X: List(X) \to List(X)$ from lists to lists are also indexed by $X$.
So which families $f_X$ of indexed functions are natural in this example? The natural transformations are what is called in programming the *polymorphic functions*, that is, the functions that have the same definition for all $X$, or, the programs that would run correctly whatever $X$ is. For example, reversing a list is polymorphic (natural), but sorting a list of numbers is not. (Question: For which data type would sorting be polymorphic?)

## Monoidal Categories, String Diagrams, 2-Dimensional Algebra

Some comments on 2-dimensional algebra ...

A ***monoidal category*** is a category in which we have a second composition on objects and arrows, which we write as $A\otimes B$ and $f\otimes g$, and which satisfies the so-called ***interchange law*** $$ (f;g)\otimes (f';g') = (f\otimes f');(g\otimes g')$$ This law probably looks unfamiliar to you $\ldots$ so how should I explain it? First, let us note that, as in associativity, each variable appears exactly once $\ldots$ and also the order between the variables is preserved, if we read the interchange law in two dimensions. Let us explain this. It is really a kind of 2-dimensional associativity.

<draw the picture on the whiteboard> [or click here](https://github.com/alexhkurz/talks/blob/master/Screenshot-interchange-law.png), taken from [lecture 1, page 40, of Sobocinski, Graphical Linear Algebra, MGS 2017](http://www.cs.le.ac.uk/events/mgs2017/courses/graphical-linear-algebra.html).

With this we can build 2-dimensional algebra.
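For ordinary functions, the interchange law can be checked directly. The following is a minimal JavaScript sketch (the helper names `seq` and `tensor` are our own choices, not standard functions): `seq` is sequential composition $f;g$, and `tensor` runs two functions side by side on the two components of a pair.

```javascript
// Sequential composition: (f ; g)(x) = g(f(x))
const seq = (f, g) => x => g(f(x));

// Parallel (monoidal) composition: (f ⊗ f') acts componentwise on a pair
const tensor = (f, fp) => ([x, y]) => [f(x), fp(y)];

// four sample arrows on numbers
const f  = x => x + 1;
const g  = x => 2 * x;
const fp = x => x * x;
const gp = x => x - 3;

// interchange law: (f;g) ⊗ (f';g')  =  (f ⊗ f') ; (g ⊗ g')
const lhs = tensor(seq(f, g), seq(fp, gp));
const rhs = seq(tensor(f, fp), tensor(g, gp));

for (const pair of [[0, 0], [1, 2], [5, -4]]) {
  console.log(JSON.stringify(lhs(pair)) === JSON.stringify(rhs(pair))); // true
}
```

The two sides describe the same computation read in different orders: first along each wire and then in parallel, or first in parallel and then along each wire.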
Remember that 1-dimensional algebra is built from variables, constants and operations such as $+, -, /, \cdot, \dots$ To give an example of a 2-dimensional algebra, take as operations the so-called cup and the lollipop

<draw the picture on the whiteboard> [or click here](https://github.com/alexhkurz/talks/blob/master/Screenshot-cup-lollipop.png), taken from [lecture 2, page 17, of Sobocinski, Graphical Linear Algebra, MGS 2017](http://www.cs.le.ac.uk/events/mgs2017/courses/graphical-linear-algebra.html) $\ldots$ actually, in Sobocinski's lectures there are [two more operations](https://github.com/alexhkurz/talks/blob/master/Screenshot-id-twist.png), but from those I only want to take the identity (the left one) and for now ignore the "twist" (the right one) $\dots$

(**Exercise:** Show that every function $f:\{0,1,2,3\}\to\{0,1,2\}$ can be represented by a string diagram. Hint: Write down 4 input wires on the left and label them top down $0,\ldots,3$, write down 3 output wires on the right and label them top down again. Now link $0$ to $f(0)$ etc using cup and lollipop. Do you need twist? Which functions can be represented without twist?)

Moreover, much as there are equations in 1-dimensional algebra, there are in 2-dimensional algebra. For example,

<draw the picture> [or click here](https://github.com/alexhkurz/talks/blob/master/Screenshot-generators-equations.png), taken from [lecture 2, page 22, of Sobocinski, MGS2017](http://www.cs.le.ac.uk/events/mgs2017/courses/graphical-linear-algebra.html).

(**Exercise:** Show that the equations hold for functions. More precisely, if you transform a diagram $D$ using the equations into the diagram $D'$, then $D$ and $D'$ represent the same function.)

(**Exercise:** (More difficult) Show that if two diagrams represent the same function, then the diagrams can be proved equal using the equations.)

This example is just the simplest interesting one, but it suffices for what follows.
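To connect the exercises above with code: encode a function $f:\{0,\ldots,m-1\}\to\{0,\ldots,n-1\}$ as the array $[f(0),\ldots,f(m-1)]$. A diagram built only from identity, cup and lollipop is planar, so its wires cannot cross, which suggests that exactly the monotone functions are representable without the twist. A minimal JavaScript sketch (the helper name `needsTwist` is ours, and the monotonicity criterion is our reading of the exercise, not a claim taken from the lectures):

```javascript
// Encode f : {0,...,m-1} -> {0,...,n-1} as the array [f(0), ..., f(m-1)].

// Wires i and j (with i < j) cross in the diagram iff f(i) > f(j),
// so the twist is needed exactly when f fails to be monotone.
// Checking adjacent entries suffices for monotonicity.
const needsTwist = f => {
  for (let i = 0; i + 1 < f.length; i++) {
    if (f[i] > f[i + 1]) return true; // adjacent wires cross here
  }
  return false;
};

// f(0)=0, f(1)=0, f(2)=1, f(3)=2 is monotone: cup and lollipop suffice
console.log(needsTwist([0, 0, 1, 2])); // false

// the swap g(0)=1, g(1)=0 makes its two wires cross: twist required
console.log(needsTwist([1, 0])); // true
```

The swap on two wires is the simplest function that forces the twist.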
If you want to read more about the great number of different 2-dimensional algebras, I recommend the survey by Selinger or anything on the topic written by Bonchi, Sobocinski, Zanasi and collaborators. If you want to read about applications to quantum information and physics, try the book by Coecke and Kissinger.

**Remark:** If you know Turing machines, you will remember that Turing's analysis of computability [in Section 1 of his famous paper](https://londmathsoc.onlinelibrary.wiley.com/doi/epdf/10.1112/plms/s2-42.1.230) reduced computation to a 1-dimensional tape. In terms of computability, modern CPUs are still Turing machines, but for the sake of efficiency they involve massively parallel (= higher dimensional) computations. At least in some areas, the step from one-dimensional to higher-dimensional algebra gives you a similar gain in efficiency.

## A 2-Dimensional Calculus of Simultaneous Substitutions

#### Substitutions

Substitutions $[a{\mapsto}b]$ and $[b{\mapsto}c]$ can be composed sequentially, and we will have $$[a{\mapsto}b]\hcomp[b{\mapsto}c] = [a{\mapsto}c].$$ Additionally, we want to build simultaneous substitutions $$[a{\mapsto}b] \oplus [c{\mapsto}d] = [a{\mapsto}b, c{\mapsto}d].$$ We call $\oplus$ the tensor, or the monoidal, vertical or parallel composition. Semantically, this simultaneous substitution will correspond to the function $$f:\{a,c\}\to \{b,d\}$$ satisfying $f(a)=b$ and $f(c)=d$, as suggested by the syntax $[a{\mapsto}b, c{\mapsto}d]$.

Parallel composition of simultaneous substitutions is partial. For example, $$[a{\mapsto}b] \oplus [a{\mapsto}c]$$ is undefined, since there is no function $\{a\}\to\{b,c\}$ that maps $a$ simultaneously to both $b$ and $c$.

#### What is the advantage of the 2-dimensional calculus over a 1-dimensional one?
As we are setting out to represent the category of finite sets and functions, in a 1-dimensional calculus the operations $[a{\mapsto}b]$ will have to be indexed by a finite set $S$ and will be of type $[a{\mapsto}b]_S : S\uplus \{a\}\to S\uplus \{b\}$ for sets $S$ with $a,b\notin S$. The first observation is that the ability of the 2-dimensional calculus to represent set union by $\oplus$ makes indexing with subsets $S$ unnecessary.

More importantly, consider how to implement the swapping \begin{equation} [a{\mapsto}b] \oplus [b{\mapsto}a]:\{a,b\}\to\{a,b\} \end{equation} in the 1-dimensional calculus via \begin{gather} \{a,b\}\longrightarrow\{c,b\}\longrightarrow\{c,a\}\longrightarrow\{a,b\}\\ [a{\mapsto}c]_{\{b\}} \hcomp [b{\mapsto} a]_{\{c\}} \hcomp [c{\mapsto}b]_{\{a\}} \end{gather} While it is possible to write down the equations and rewrite rules for the 1-dimensional calculus, it does not appear particularly natural. In particular, only in the 2-dimensional calculus will the substitution $[a{\mapsto}c]_{\{b\}} \hcomp [b{\mapsto} a]_{\{c\}} \hcomp [c{\mapsto}b]_{\{a\}}$ have a simple normal form such as $[a{\mapsto}b] \oplus [b{\mapsto}a]$, which is unique up to commutativity of $\oplus$.

#### A graphical calculus for simultaneous substitutions

[With twists needed for diagrams in 2 dimensions](https://github.com/alexhkurz/talks/blob/master/Screenshot-nominal-string-diagrams-2.png)

[For diagrams in 3 dimensions](https://github.com/alexhkurz/talks/blob/master/Screenshot-nominal-string-diagrams-3.png)

## Summary and Outlook

... ... say sth about nominal sets ... ...

## References

If you are looking for a blog-style introduction to string diagrams, I recommend

**Pawel Sobocinski: [Graphical Linear Algebra](https://graphicallinearalgebra.net)**,

which is written with a general audience in mind. But don't be deceived, there is original mathematical research happening in it as well.
If you want to see how this material looks when turned into a 4-lecture course at a summer school for graduate students, consult the more concise

**Pawel Sobocinski: [Graphical Linear Algebra](http://www.cs.le.ac.uk/events/mgs2017/courses/graphical-linear-algebra.html)**. MGS 2017, Leicester, UK.

If you think that string diagrams look a lot like circuits, then you are right and you may be interested in

**Bonchi, Sobocinski, Zanasi: [The Calculus of Signal Flow Diagrams I: Linear Relations on Streams](https://www.southampton.ac.uk/~ps1a06/papers/sfg1.pdf)**. 2015.

and

**Ghica, Jung, Lopez: [Diagrammatic Semantics for Digital Circuits](http://www.cs.bham.ac.uk/~drg/papers/csl17b.pdf)**. CSL 2017.

A great place to learn about category theory and physics (and much more) is one of the first science blogs ever, the now legendary

**John Baez: [This week's finds in mathematical physics](http://math.ucr.edu/home/baez/twfshort.html)**.

Not an easy read, but the most complete reference on category theory is the

**nLab: [A wiki for collaborative work on Mathematics, Physics and Philosophy](https://ncatlab.org/nlab/show/HomePage)**,

where you find, for example, the full definition of a [monoidal category](https://ncatlab.org/nlab/show/monoidal+category).

For more relaxed blog-style discussions on current topics of interest in category theory,

**The n-Category Café: [A group blog on math, physics and philosophy](https://golem.ph.utexas.edu/category/2008/11/nlab.html)**

is a wonderful place.

For much of the 2-dimensional algebra that has graphical calculi, the standard reference is

**Peter Selinger: [A survey of graphical languages for monoidal categories. Springer Lecture Notes in Physics 813 (2011)](https://arxiv.org/abs/0908.3347)**.

A new book that applies this technology to quantum physics is

**Bob Coecke, Aleks Kissinger: [Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning.
Cambridge University Press (2017)](http://www.cambridge.org/us/academic/subjects/physics/quantum-physics-quantum-information-and-quantum-computation/picturing-quantum-processes-first-course-quantum-theory-and-diagrammatic-reasoning?format=HB)**.

Nominal sets were invented by Gabbay and Pitts (and the equivalent formalism of named sets by Montanari and Pistore) and a good place to start is the book

**Andrew Pitts: [Nominal Sets: Names and Symmetry in Computer Science. Cambridge University Press (2013)]()**.

The papers that elevated string diagrams from an informal technique associated with Feynman and Penrose to a mathematically rigorous research area are

**Joyal, Street: The geometry of tensor calculus I**. Advances in Math. 88 (1991)

and subsequent work. They show that geometric equality of diagrams in the plane or in space not only gives equations (such as the interchange law) for free, but is also complete for important equational theories. The introduction is also interesting from a historical point of view.

[^correct]: The correct answer is $$f(g(z)) = 2*z + y$$ where $z$ could be any variable different from the ‘global variable’ $y$.