Variables, Binding, Scope and Substitution

--- tags: programming languages, christopher strachey --- $\newcommand{\id}{\mathrm{id}} \newcommand{\Set}{\mathbf{Set}} \newcommand{\Fin}{\mathbf{Fin}} \newcommand{\Card}{\mathbf{Card}} \newcommand{\names}{\mathbf{N}} \newcommand{\hcomp}{\; ;\;}$ "It seems to me wiser not to make a distinction between the meaning of ‘name’ and that of ‘identifier’ and I shall use them interchangeably. The important feature of a name is that it has no internal structure at any rate in the context in which we are using it as a name. Names are thus atomic objects and the only thing we know about them is that given two names it is always possible to determine whether they are equal (i.e., the same name) or not." Christopher Strachey, [Fundamental Concepts in Programming Languages](https://www.cs.cmu.edu/~crary/819-f09/Strachey67.pdf) [^videogames] [^videogames]: [Christopher Strachey](https://en.wikipedia.org/wiki/Christopher_Strachey) is a pioneer of programming languages. [This article](https://theconversation.com/who-invented-video-games-169792) credits him as a pioneer of computer games. # Variables, Binding, Scope and Substitution (In the light of the quote, maybe I should have said "name" instead of "variable". Variables are names, but often have additional structure. $\lambda$-calculus variables are "pure" names in the sense of the quote.) ## Introduction Variables---and operations of substituting values for variables---play an important role in mathematics and computer science. Whenever we want to design programming languages or build tools such as interpreters and compilers, a careful understanding of variables, and in particular of binding and scope, is necessary. ## JavaScript After this lecture you should be able to explain the following. - The Javascript program ```javascript var a=10; var f = function(a) { return a+3; }; f(2); ``` evaluates to which value? [^f(2)] [^f(2)]: This evaluates to `5`, because `a` in `a+3` is bound by the occurrence of `a` in `function(a)` and not by the occurrence of `a` in `var a`. - The *scope* of the *binder* `function(a)` is the block demarcated by `{` and `}` ```javascript var f = function(a) { return a+3; } ``` - If we change the code to ```javascript var a=10; var f = function() { return a+3; }; ``` what happens? [^function()] [^function()]: `f()` evaluates to `13` because the `a` in `a+3` is now bound by `var a=10;` - The code below has a 'hole' marked '??'. ```javascript var a=10; var f = function() { var a = 2; return a+??; }; ``` If we substitute `a+3` for `??` what happens? [^aplusthree] [^aplusthree]: Imagine you are the designer of Javascript. What is the result you want? There are two possibilities and what you have to decide is whether you want macro-substitution (ie mere text replacement) or capture avoiding substitution. Substituting in a capture avoiding way, we need to first rename `a` using a fresh variable as in ```javascript var a=10; var f = function() { var b = 2; return b+??; }; ``` Then `f()` evaluates to `15`, whereas a textual (non-capture avoiding) substitution would make `f()` evaluate to `7`. Can we translate this to lambda calculus? Here is one (simplified) analysis of what happens using lambda calculus ``` (\code. \argument. code argument) (\x. var a=10; var f = function() {\a.(var a = 2;return a+x;)};) (a+3) = (\x. var a=10; var f = function() {\a.(var a = 2;return a+x;)};) (a+3) = var a=10; var f = function() {\b.(var b = 2;return b+(a+3);)}; ``` After the last `=`, we see how capture avoiding substitution renames the "inner `a`" to some fresh variable (here `b`) to give us the expected result. Compare this with ```javascript var a=10; var f = function(x) { var a = 2; return a+x; }; f(a+3); ``` Use this opportunity to discuss the difference of call by value (JavaSript) and call by name (lambda calculus). ## Substitution Sustitution of values for variables is familiar to all of us. For example, we may want to substitute $1$ for $x$ in $$ 0 = x^2 - 2x +1 $$ to check that $1$ is a solution of the equation. Substitution of terms for variables is as common. For example, if we have two functions $f,g$ then the composition $g\circ f$ is defined as $(g\circ f)(x)=g(f(x))$, that is, by substituting $f(x)$ for $y$ in $g(y)$. ### Function Composition An important method, in mathematics and software engineering, to build complex systems from simple ones is to compose functions. For example, suppose we have two functions $$f(x) = x + y$$ and $$g(y) = 2\cdot y$$ What is the function $f\circ g$ obtained from composing $f$ with $g$, that is, from substituting the output of g into the input of f? What is the correct formula for (or implementation of) the function $f\circ g$? **Definition:** The function $$f\circ g$$ (pronounced "f after g") is the function obtained from taking the output of $g$ as the input of $f$. In the example, we want to replace the $x$ in $f(x) = x + y$ by $g(y)=y\cdot 2$. It is tempting to write $$f\circ g\,(y)=f(g(y)) = 2\cdot y + y = 3y$$ but this gives the **wrong** formula for $f\circ g$. Such mistakes arise from not taking proper care of the distinction of free and bound variables. ### Exercise - What is the correct formula for the function $f\circ g$ in the example above? - Use the notions of free and bound variables and scope to explain your answer. - Using lambda calculus write down a general formula for $f\circ g$ that works for arbitrary $f$ and $g$. ## Free and Bound Variables The following definitions are not completely precise, but should suffice for now. A ***fresh variable*** is a variable that has not been used before. A ***bound variable*** is a variable that can be renamed by a ***fresh variable*** without changing the meaning. A variable is called ***free*** if it is not bound. Free variables receive their meaning from the **context** in which they appear. **Example 1:** If we define a function, for example $$f(x) = x^2 - 2x + 1$$ then $x$ is a bound variable. Indeed $$f(y) = y^2 - 2y + 1$$ defines the same function $f$. To run through the definitions, $y$ is *fresh* in $f(x) = x^2 - 2x + 1$ since $y$ does not occur in $f(x) = x^2 - 2x + 1$. And if we substitute $x$ by $y$, then the meaning of the expression does not change: The function $f:\mathbb Z\to \mathbb Z$ is the same, whether defined via $f(x) = x^2 - 2x + 1$ or via $f(y) = y^2 - 2y + 1$. **Remark:** We said that $x$ in $f(x) = x^2 - 2x + 1$ is a bound variable because the meaning of $f(x) = x^2 - 2x + 1$ does not depend on $x$. Another way to see this clearly is to translate $f(x) = x^2 - 2x + 1$ into English as "$f$ is the function that squares its input, then subtracts twice the input, then adds one". Clearly, the meaning of this sentence does not depend on $x$. **Question:** Can you use your knowledge of mathematics to define the meaning of $f(x) = x^2 - 2x + 1$ using set-theoretic notation? [^settheoretic] [^settheoretic]: $f=\{(x,x^2-2x+1) \mid x\in\mathbb R\}$. For an example that freshness is important think about $$\color{black}{y=x^2 - 2x + 1}$$ If we think of this equation as defining a parabola, then $x$ should be a bound variable. Which means that we should be able to replace $x$ by, say, $z$. And, indeed, it makes sense to say that the equation $$\color{black}{y=z^2 - 2z + 1}$$ defines the same parabola as the equation $\color{black}{y=x^2 - 2x + 1}$. But, while replacing $x$ by $z$ does not change the meaning of the equation, replacing $x$ by $y$ does: $$\color{black}{y=y^2 - 2y + 1}$$ **Question:** What would be an appropriate mathematical definition of meaning for equations such as $y=x^2 - 2x + 1$? Are $x,y$ bound variables then? Why? [^solutions] [^solutions]: We can define the meaning of an equation as its set of solutions. This definition does not mention any variable names such as $x$ and $y$. This shows that the variable names are, in this case, bound variables. **Remark/Reminder:** The technical terminology in programming languages is that the expressions $f(x) = x^2 - 2x + 1$ and $f(y) = y^2 - 2y + 1$ are ***alpha-equivalent***. $\alpha$-equivalent expressions define the same functions/programs. The following example shows that the notions of free and bound variables are relative to a distinction between "inside" and "outside", or "object" and "meta", or ... **Example 2:** If we fixed a coordinate system with axes labelled $x$ and $y$ as usual, then in $$y=x^2-2x+1$$ the variables $x,y$ are free or bound, depending on whether we include the coordinate system in our "object system" or whether it is part of the "context" or environment" or "meta system". - Suppose the coordinate system is part of the environment. Then $x,y$ are free. Indeed, suppose $x,y$ where bound. Because I can replace bound variables by fresh variables without changing the meaning, I can also permute (ie swap) bound variables without changing the meaning. But if you plot the graph corresponding to $x=y^2-2y+1$ you will see that it is different from the graph of $y=x^2-2x+1$. This shows that $x,y$ are not bound, hence free. - Suppose the coordinate system is part of the object system. Then, if I replace $x,y$ by fresh variables in the equation, I also have to do the corresponding replacement in the coordinate system. Consequently, the graph of the equation does not change. This shows that $x,y$ are bound. **Remark:** The observation that the notion of free variable can be axiomatised in terms of permutations of variables has far-reaching consequences and is at the heart of the theory of ***nominal sets*** developed by Gabbay and Pitts. **Summary:** The lesson to learn from the previous example is that free variables are bound in an environment. But they are not bound by the syntax itself. Unless we extend the syntax to include the environment. **Example 3:** Typically bound variables can be recognised by the presence of a so-called *binder* introducing the variable and a well-defined *scope* in which the variable has meaning. For example, in $$ \sum_{i=0}^\infty 2^i$$ the variable $i$ is bound, the binder is $\sum_{i=0}^\infty$ and the scope is $2^i$. Similarly, in $\int_0^\infty e^{-x}dx$ the variable $x$ is bound, the binder is $\int_0^\infty [\ldots]dx$ and the scope is $e^{-x}$. **Example 4:** In a program such as int f(int a, int b) { ... } the binder is `(int a, int b)`, the variables `a`, `b` are bound and the scope is `{...}`. [^scopef] [^scopef]: Use your knowledge of programming languages to think about what the scope of `f` is. **Question:** Is `int f` also a binder? What would be its scope? Discuss with examples of programming languages you know. **Example 5:** In the following programming example from [Khan Academy](https://www.khanacademy.org/computing/computer-programming/programming/looping/p/challenge-a-loopy-ruler), the first task is to display a ruler across the top of a window using a while loop; the code fill(17, 0, 255); // a handy dandy ruler across the top var x = 0; while(x < 400) { text(x, x, 10); x += 50; } gives indeed the [expected result](https://github.com/alexhkurz/talks/blob/master/Screenshot-2018-09-03-15.37.03.png). The next task was to add a ruler down the left. A first try is the following. fill(17, 0, 255); // a handy dandy ruler across the top var x = 0; while(x < 400) { text(x, x, 10); x += 50; } // a handy dandy ruler down the left var y = 0; while(y < 400) { text(y, y, 10); y += 50; } What is the output of the code above? Why? Let us go back to fill(17, 0, 255); // a handy dandy ruler across the top var x = 0; while(x < 400) { text(x, x, 10); x += 50; } Is `x` a free or a bound variable in the code above? Do you see a binder that binds `x`? How does the answer explain the observation that both snippets of code produce the same visible output? **Example:** The following quote is from the Russell and Whitehead's famous [Principia Mathematica](https://quod.lib.umich.edu/u/umhistmath/aat3201.0001.001/39), page 17, first published 1910. They write $(x)$ where we would write $\forall x$, meaning forall $x$. ![](https://i.imgur.com/oye88Er.png) They use "apparent variable" where we would say "bound variable", but otherwise their explanation is already using modern technical language. If you follow the link and read on, you will find that they already use "scope" in the same way as we do today. ## Capture Avoiding Substitution In the presence of bound variables, we need to be careful with substitution as explained in the examples above. Two more examples follow below. Let $M$ and $N$ be any expressions that may contain free and bound variables. If the set of bound variables of $M$ is disjoint from the set of free variables of $N$, then $$M[x\mapsto N]$$ is just the expression obtained from replacing all free occurrences of $x$ in $M$ by $N$. Else, if the set of bound variables of $M$ is not disjoint from the set of free variables of $N$, $M[x\mapsto N]$ denotes any $M'[x\mapsto N]$ where $M'$ is obtained from $M$ by renaming the bound variables of $M$ such that the set of bound variables of $M'$ is disjoint from the set of free variables of $N$. There is a subtle issue here: The term $M'$ is not uniquely determined as there are infinitely many ways of renaming bound variables. But they are all equivalent (following Church, we call this equivalence relation $\alpha$-equivalence) because, by their very nature, bound variables are variables that can be renamed without changing the meaning of an expression. #### A maths example The following is a well known equation $$(x-1)\cdot\sum_{i=0}^{n} x^i = x^{n+1}-1$$ Which of the variables $i,x,n$ are free/bound in $\sum_{i=0}^{n} x^i$ ? Does the equation remain true if we replace $x$ by any other variable such as $y,z,\dots$ ? What is an example of a term that can only be substituted for $x$ after renaming the bound variable $i$? #### A programming example The transitive closure `ancestor` of a relation `parent` can be defined as, reading ` :- ` as "if" and ` , ` as "and" ancestor(X,Y) :- parent(X,Y) and ancestor(X,Y) :- ancestor(X,Z) , parent (Z,Y) Note that - the names ancestor and parent are just chosen for illustration. `parent` can be an arbitrary binary relation and `ancestor`is then its transitive closure; - the notation we have chosen here is the one of the programming language [Prolog](http://www.swi-prolog.org). Assume we know that parent(harry,william) parent(charles,harry) parent(philip,charles) and we want to know all ancestors of `william`, that is, we want to know all `X` such that `ancestor(X,william)` Let us do the computation. First, we known that parent(harry,william) so we output `X=harry`. Next, we compute ancestor(X,william) if ancestor(X,Z) and parent(Z,william) upon which we can either output `X=charles` or continue with ancestor(X,william) if ancestor(X,Z), parent(Z,william) if ancestor(X,-), parent(-,Z), parent(Z,william) where, for the last line above, we want to apply the rule ancestor(X,Y) :- ancestor(X,Z) , parent (Z,Y) to ancestor(X,Z) To do this we need to replace the `Y` in `ancestor(X,Y)` with `Z`. But this means that on the right hand side we also need to replace `Y` by `Z`, resulting in ancestor(X,-), parent(-,Z) But now we have a **Question:** Which variable should replace the `-` in the above? ## Barendregt's Variable Convention We have seen that in the presence of bound variables, we have to be careful with substitutions. Roughly speaking, substituting $s$ for $x$ in $t$, we need to make sure that the free variables of $s$, which are implicitely bound in an environment, will not accidentally come into the scope of a binder in $t$. Substitution that avoids such accidental capture of free variables is called **capture avoiding substitution**. The reason we say "accidental capture" is that the bound variables in $t$ can always be renamed. In this sense, the actual name used for a bound variable is accidental (as opposed to essential, or as opposed to having independent significance.) The easiest way to implement capture avoiding substitution on the informal mathematical level is the following, known as Barendregt's Variable Convention. Assume that the set of bound variables is disjoint from the set of free variables. In case this assumption happens to be violated, rename the bound variables with fresh variables. ## L- and R-values of a variable The general features of free and bound variables we have discussed so far are shared by variables both as we know them from mathematics and from programming. So I should emphasise an important way in which both are different. In imperative programming languages we have assignements such as x := 5 or x = 5 in C-like languages. Then each variable has two values. On the left-hand side an address or location. On the right-hand side a value such as 5. It is convenient to distinguish them by talking about the L-value and the R-value of a variable. Mathematical variables only have R-values. Moreover, inside the same scope, the R-value of a mathematical variable never changes. There are no assignments in mathematics. When we say "let the value of $x$ be 5", then we are stating an assumption about the R-value of $x$. We are not assigning 5 to $x$. Mathmatical variables are names for values (or expressions), they do not have location. Programming variables do have location as they are storage into which we can write some data. In particular, writing into storage erases what was in that area of store before. So while the R-value of a mathematical variable never changes, for programming variables this is only true of the L-value. The above terminology of mathematical vs programming variables is not ideal as programming languages in fact do know both kind of variables. Rather, we should talk about **immutable variables** and **mutable variables**. Variables in declarative languages such as Haskell and Prolog are immutable while variables in imperative languages such as C and Java are mutable. [^transitive]: The inductive definition of transitive closure was written before as $$\frac{(x,y)\in R}{(x,y)\in R^+} \quad\quad\quad\quad \frac{(x,y)\in R \quad\quad (y,z)\in R^+}{(y,z)\in R^+}$$ Comparing with the programming notation above we see that $\quad$ ` :- ` represents the horizontal bar, $\quad$ ` , ` represents the space in the middle of "$(x,y)\in R \quad (y,z)\in R^+$" $\quad$ `parent(x,y)` represents "$(x,y)\in R$" $\quad$ `ancestor(x,y)` represents "$(x,y)\in R^+$"