(under construction)
(part of my course Programming Languages at Chapman University)
Normalisation by Evaluation is a technique in programming languages that is complicated by the presence of variables and higher-order functions. [1] But some of the main ideas can already be explained in a very simple situation without variables and function types. This is the purpose of this lecture.
As a concrete example, we use this method to fulfill several promises from previous lectures:
There are also interesting corollaries from the point of view of studying formal systems. For example, in the language of arithmetic with only 1, + it is the case that the law of associativity is enough to evaluate all possible computations. In particular, in that language, associativity implies commutativity.
Remark for mathematicians: In group theory, normalisation by evaluation is known as van der Waerden's trick, as explained in Bergman's course on universal algebra. [2]
Recall from the lecture Operational and Denotational Semantics the definition of the semantics

⟦-⟧ : exp → ℕ

of the simple language

e ::= 1 | e+e | e*e

In this lecture we will add a new ingredient, namely a function

⌜-⌝ : ℕ → exp
which we understand as providing a semantic definition of normal form. As our aim is to achieve a perfect match between syntax and semantics, we will require that "going around"

⟦⌜n⌝⟧ = n

is the identity, saying that the meaning of the expression encoding a number n is n itself. For the "other way round"

⌜⟦e⟧⌝ = e

we will see that we can achieve equality "up to some equivalence relation ≈". In fact, this requirement can be used to find the equivalence relation ≈ in the first place.
This is a general method that can be used to reconstruct the setting discussed in Operational and Denotational Semantics, where we showed how the meaning function ⟦-⟧ can be described purely syntactically by an equivalence relation ≈, ending up with a situation in which the syntax and the semantics match up perfectly. Mathematically, we expressed this by saying that the quotient of exp wrt ≈ is isomorphic to ℕ, in symbols:

exp/≈ ≅ ℕ
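To make the setup concrete, here is a small sketch of the semantics in Python. The representation of expressions as nested tuples and the name eval_exp are my own choices for illustration, not notation from the lecture.

```python
# Expressions e ::= 1 | e+e | e*e, represented as nested tuples:
# "1" is the literal, ("+", e1, e2) a sum, ("*", e1, e2) a product.

def eval_exp(e):
    """The meaning function: map an expression to the natural number it denotes."""
    if e == "1":
        return 1
    op, e1, e2 = e
    if op == "+":
        return eval_exp(e1) + eval_exp(e2)
    return eval_exp(e1) * eval_exp(e2)

# (1+1)*(1+1) denotes 4
print(eval_exp(("*", ("+", "1", "1"), ("+", "1", "1"))))  # → 4
```

Note that eval_exp computes on the semantic side only; the syntactic side (equations between expressions) comes later in the lecture.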
Students will learn
We go back to our lectures on Operational and Denotational Semantics.
One topic of these lectures was in what sense it is justified to say that the meaning of the (purely formal) language
exp ::= 1 | exp + exp | exp * exp
is captured by the (equally formal) equations

(e₁+e₂)+e₃ = e₁+(e₂+e₃)
e*1 = e
e*(e₁+e₂) = e*e₁ + e*e₂
On the one hand, it is important to emphasise that the language and the equations can be run on a machine without any human interference or intelligence. (For example, we can read the equations from right to left and consider them as defining a rewrite system that can be executed on a machine.)
On the other hand, the language and the equations make sense to us because they agree with our understanding of mathematics.
How do we convince ourselves that the equations capture exactly the familiar laws of arithmetic?
How do we know that we have enough equations (that is the question of completeness)? Can we drop some equations?
The answer will use ideas we learned about when we discussed abstract reduction systems (ARSs).
We defined a normal form in an ARS as an element a that cannot be reduced, that is, there is no b such that a → b. This is a purely syntactic notion (at least if the ARS can be described in a finite way).
In this lecture, we will see a semantic way of giving a notion of normal form. And we will also see how to match these two notions up.
As we explained above, it is important to separate syntax from semantics. Expressions have operations 1, +, * and numbers have operations 1, +, ×. This situation is very common in programming, where we also want to denote different operations by the same symbol. This is called overloading. Here we overloaded 1 and + and did not overload multiplication, for which we use * and ×, respectively. Is this a bad or a good idea? Here are some points to consider:
In my experience, there is no one good answer. In programming one tends not to overload and in maths one tends to use overloading a lot, but there are no general rules and one needs to decide on an ad hoc basis, taking into account aesthetic considerations, not only objective criteria such as correctness, efficiency, etc.
Recall that the meaning of our language of exp's is given by the semantics (meaning function)

⟦-⟧ : exp → ℕ

The new idea is to, conversely, define a function

⌜-⌝ : ℕ → exp

For example, ⌜1⌝ = 1, ⌜2⌝ = 1+1, ⌜3⌝ = 1+(1+1), etc. [4]
Exercise: Show that ⟦⌜n⌝⟧ = n for all numbers n.
We can now define the semantic normal form of an expression e as

nf(e) = ⌜⟦e⟧⌝
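The evaluation map, the reverse map from numbers to expressions, and the resulting semantic normal form can be sketched in Python as follows. The tuple representation of expressions and the names eval_exp, reify and nf are my own choices, not the lecture's notation.

```python
# Expressions: "1" is the literal 1, ("+", e1, e2) a sum, ("*", e1, e2) a product.

def eval_exp(e):
    """The meaning function exp -> N."""
    if e == "1":
        return 1
    op, e1, e2 = e
    return eval_exp(e1) + eval_exp(e2) if op == "+" else eval_exp(e1) * eval_exp(e2)

def reify(n):
    """The reverse map N -> exp, sending n >= 1 to the numeral 1+(1+(...+1))."""
    e = "1"
    for _ in range(n - 1):
        e = ("+", "1", e)
    return e

def nf(e):
    """Semantic normal form: evaluate, then read the number back as an expression."""
    return reify(eval_exp(e))

print(nf(("*", ("+", "1", "1"), ("+", "1", "1"))))  # the number 4 as a numeral
```

Going around ℕ → exp → ℕ is the identity: eval_exp(reify(n)) == n for every n >= 1.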
Example: Here is an example of how we normalise by evaluation. You see that instead of normalising by using the reduction relation given by some set of equations, we reduce to normal form by using the definition of the semantics ⟦-⟧.
Example: Here is the same example again, now reducing to normal form via equations. You see here that the semantics (meaning) of the numbers does not come in at all.
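Normalisation via equations can also be sketched in Python, by reading a plausible set of equations for this language (associativity, unit and distributivity) from left to right as rewrite rules. The representation and the names step and normalise are my own choices for illustration.

```python
def step(e):
    """Apply one rewrite rule somewhere in e; return None if e is in normal form.
    Rules (equations read from left to right):
      (x+y)+z -> x+(y+z),   x*1 -> x,   x*(y+z) -> x*y + x*z
    """
    if e == "1":
        return None
    op, a, b = e
    if op == "+" and isinstance(a, tuple) and a[0] == "+":
        return ("+", a[1], ("+", a[2], b))          # reassociate to the right
    if op == "*":
        if b == "1":
            return a                                 # multiply by one
        if isinstance(b, tuple) and b[0] == "+":
            return ("+", ("*", a, b[1]), ("*", a, b[2]))  # distribute
    s = step(a)                                      # otherwise rewrite a subterm
    if s is not None:
        return (op, s, b)
    s = step(b)
    if s is not None:
        return (op, a, s)
    return None

def normalise(e):
    """Rewrite until no rule applies."""
    while (s := step(e)) is not None:
        e = s
    return e

# (1+1)*(1+1) rewrites, step by step, to the numeral 1+(1+(1+1))
print(normalise(("*", ("+", "1", "1"), ("+", "1", "1"))))
```

Here no number is ever computed: the machine only shuffles syntax, which is the point of the example.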
Exercise: Replay the two examples above with the expression instead of .
Question: Which equations between exp's are needed to guarantee that

nf(e) ≈ e

can be derived for all expressions e?
For maximum effect pause the lecture and try to find out yourself.
We reason by induction on the three rules that make up the definition of
exp ::= 1 | exp + exp | exp * exp
If e is 1 then nf(e) = ⌜⟦1⟧⌝ = ⌜1⌝ = 1 = e. No equations needed. [5]
If e is e₁+e₂, we want to reason

nf(e₁+e₂) = ⌜⟦e₁⟧+⟦e₂⟧⌝ ≈ ⌜⟦e₁⟧⌝+⌜⟦e₂⟧⌝ = nf(e₁)+nf(e₂) ≈ e₁+e₂
Exercise: Justify all steps. Which equations are needed? Hint: For the first "≈", make an example such as ⌜3+2⌝ ≈ ⌜3⌝+⌜2⌝, justify all steps again and then generalise to other numbers than 3 and 2. (Formally, the argument would run by induction on the second number, but we don't need to go into that much detail here. A more informal argument would already be great to have.)
Answer to the exercise (only read the footnote after you attempted your own solution).[6]
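One can also check the first "≈" mechanically: a sum of two numerals reassociates to a single numeral using the associativity rule alone. Here is a self-contained Python sketch (the tuple representation and the names are my own choices for illustration); it rewrites only with (x+y)+z → x+(y+z).

```python
def reify(n):
    """The numeral 1+(1+(...+1)) with n ones."""
    e = "1"
    for _ in range(n - 1):
        e = ("+", "1", e)
    return e

def assoc_step(e):
    """One application of (x+y)+z -> x+(y+z) somewhere in e, or None."""
    if e == "1":
        return None
    op, a, b = e
    if op == "+" and isinstance(a, tuple) and a[0] == "+":
        return ("+", a[1], ("+", a[2], b))
    s = assoc_step(a)
    if s is not None:
        return (op, s, b)
    s = assoc_step(b)
    if s is not None:
        return (op, a, s)
    return None

def reassociate(e):
    """Rewrite with associativity only, until no redex remains."""
    while (s := assoc_step(e)) is not None:
        e = s
    return e

# the numeral for 3 plus the numeral for 2 becomes the numeral for 5,
# using nothing but associativity
print(reassociate(("+", reify(3), reify(2))) == reify(5))  # → True
```

This matches the observation that the + case of the induction needs associativity and nothing else.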
Similarly, if e is e₁*e₂, we reason

nf(e₁*e₂) = ⌜⟦e₁⟧×⟦e₂⟧⌝ ≈ ⌜⟦e₁⟧⌝*⌜⟦e₂⟧⌝ = nf(e₁)*nf(e₂) ≈ e₁*e₂
Exercise: Justify all steps. [Hint: This follows the same outline as the previous exercise.]
To conclude, we have proved by induction on the definition of exp that

nf(e) ≈ e

for all expressions e.
Exercise: Going back to the two previous exercises, can you confirm that the argument above only used the following equations?

(e₁+e₂)+e₃ = e₁+(e₂+e₃)
e*1 = e
e*(e₁+e₂) = e*e₁ + e*e₂
We summarise what we learned so far in
Theorem 1: The equations

(e₁+e₂)+e₃ = e₁+(e₂+e₃)
e*1 = e
e*(e₁+e₂) = e*e₁ + e*e₂

allow us to deduce by equational reasoning the equations

nf(e) ≈ e

for all expressions e in the language given by
exp ::= 1 | exp + exp | exp * exp
Remark: This theorem has both a logical interpretation and a computational interpretation, as we are going to see next.
In this section, we show that Theorem 1 has a logical interpretation. In fact, we can reformulate Theorem 1 as a so-called completeness result.
Theorem 2 below states that the equations

(e₁+e₂)+e₃ = e₁+(e₂+e₃)
e*1 = e
e*(e₁+e₂) = e*e₁ + e*e₂

are sound and complete for the language given by
exp ::= 1 | exp + exp | exp * exp
and the interpretation in the natural numbers given by

⟦1⟧ = 1
⟦e₁+e₂⟧ = ⟦e₁⟧+⟦e₂⟧
⟦e₁*e₂⟧ = ⟦e₁⟧×⟦e₂⟧
In symbolic notation we write this as

Theorem 2: e ≈ e' ⟺ ⟦e⟧ = ⟦e'⟧
Proof: Soundness means that, for all expressions, the equations above hold when read as statements about natural numbers. But these are just properties of numbers we are familiar with.
Completeness is the converse, namely that ⟦e⟧ = ⟦e'⟧ implies e ≈ e'.

This follows from

e ≈ nf(e) = nf(e') ≈ e'

where the "≈" are due to Theorem 1 and "=" is due to the assumption ⟦e⟧ = ⟦e'⟧.
This finishes the proof.
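The completeness argument can be illustrated computationally: if two expressions have the same meaning, their semantic normal forms are literally the same term. Here is a self-contained Python sketch (the tuple representation and the names eval_exp, reify, nf are my own choices for illustration).

```python
def eval_exp(e):
    """The meaning function exp -> N."""
    if e == "1":
        return 1
    op, a, b = e
    return eval_exp(a) + eval_exp(b) if op == "+" else eval_exp(a) * eval_exp(b)

def reify(n):
    """The numeral 1+(1+(...+1)) with n ones."""
    e = "1"
    for _ in range(n - 1):
        e = ("+", "1", e)
    return e

def nf(e):
    """Semantic normal form."""
    return reify(eval_exp(e))

# (1+1)*(1+1) and 1+(1+(1+1)) both denote 4 ...
e1 = ("*", ("+", "1", "1"), ("+", "1", "1"))
e2 = ("+", "1", ("+", "1", ("+", "1", "1")))
# ... so their normal forms coincide on the nose,
# which gives the chain e1 ≈ nf(e1) = nf(e2) ≈ e2
print(eval_exp(e1) == eval_exp(e2), nf(e1) == nf(e2))  # → True True
```

The middle "=" of the chain is an identity of terms, not a derivation, which is exactly the step supplied by the assumption.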
To summarise, for all expressions e, e' in the language above with ⟦e⟧ = ⟦e'⟧, there are derivations of the equation e ≈ e' from the equations

(e₁+e₂)+e₃ = e₁+(e₂+e₃)
e*1 = e
e*(e₁+e₂) = e*e₁ + e*e₂
One way in which our language is simple is that it does not contain variables. We will see how to add variables later. For now, the next question is an opportunity to think about this point. [Hint: The answer to the question can be yes or no, depending on the language chosen. In fact, we have shown that the answer is yes for the language of this lecture. But this does not mean that these equations are not needed for richer languages.]
Question: Can the equations of commutativity

e₁+e₂ = e₂+e₁
e₁*e₂ = e₂*e₁

be derived from the equations of Theorem 1 or can they not?
So far in this lecture all computation happened on the right-hand (semantic) side of

⟦-⟧ : exp → ℕ

On the left-hand (syntactic) side, we only have a formal language and a set of equations that guarantee, according to Theorem 2,

e ≈ e' ⟺ ⟦e⟧ = ⟦e'⟧
But we have seen many instances of the idea of turning equations into computation rules by reading them from left to right. We now pick this idea up again.
Exercise: You showed in a previous exercise that the proof of Theorem 1 only uses the equations

(e₁+e₂)+e₃ = e₁+(e₂+e₃)
e*1 = e
e*(e₁+e₂) = e*e₁ + e*e₂

Show that the proof uses these equations only from left to right.
The previous exercise can be extended to a proof of the following theorem.
Theorem 3 below says that the rewrite rules

(e₁+e₂)+e₃ → e₁+(e₂+e₃)
e*1 → e
e*(e₁+e₂) → e*e₁ + e*e₂

define an ARS → such that the equivalence relation induced by → and the equivalence relation ≈ induced by the equations are the same.

In symbolic notation we write this as

Theorem 3: e ↔* e' ⟺ e ≈ e'
Activity: Think about the similarity of Theorems 2 and 3. Relate this to what we learned about ARSs. Recall the example of the sorting algorithm, where this could be understood as saying that the implementation satisfied the specification, that is, that the rewrite system does indeed implement a sorting algorithm.
Remark: The second theorem shows that the semantic normal forms of the first theorem are also syntactic normal forms (ie normal forms of the ARS →).
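This remark can also be checked mechanically: numerals admit no rewrite step, and rewriting any expression ends in exactly its semantic normal form. Here is a self-contained Python sketch (the tuple representation and the function names are my own choices for illustration).

```python
def eval_exp(e):
    """The meaning function exp -> N."""
    if e == "1":
        return 1
    op, a, b = e
    return eval_exp(a) + eval_exp(b) if op == "+" else eval_exp(a) * eval_exp(b)

def reify(n):
    """The numeral 1+(1+(...+1)) with n ones."""
    e = "1"
    for _ in range(n - 1):
        e = ("+", "1", e)
    return e

def nf(e):
    """Semantic normal form."""
    return reify(eval_exp(e))

def step(e):
    """Rules read left to right: (x+y)+z -> x+(y+z), x*1 -> x, x*(y+z) -> x*y+x*z."""
    if e == "1":
        return None
    op, a, b = e
    if op == "+" and isinstance(a, tuple) and a[0] == "+":
        return ("+", a[1], ("+", a[2], b))
    if op == "*":
        if b == "1":
            return a
        if isinstance(b, tuple) and b[0] == "+":
            return ("+", ("*", a, b[1]), ("*", a, b[2]))
    s = step(a)
    if s is not None:
        return (op, s, b)
    s = step(b)
    if s is not None:
        return (op, a, s)
    return None

def normalise(e):
    while (s := step(e)) is not None:
        e = s
    return e

# numerals are syntactic normal forms: no rule applies to them
print(all(step(reify(n)) is None for n in range(1, 10)))  # → True
# rewriting reaches exactly the semantic normal form
e = ("*", ("+", "1", ("+", "1", "1")), ("+", "1", "1"))   # 3*2
print(normalise(e) == nf(e) == reify(6))                  # → True
```

So normalisation by rewriting and normalisation by evaluation agree on the examples, as the theorems predict.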
Project Questions:
While the example we considered is simple, the method is very general.
A typical situation is as follows.
We start with a semantic domain D, for example numbers, and the task is to represent it in a computational way.
First, one develops a syntax that allows us to denote all objects of D by a term in a formal language T. This denotation gives us a function

⟦-⟧ : T → D

If we have done our work well, ⟦-⟧ is defined by induction on the structure of T, which in turn must be discovered from structure already contained in D. [7] Typically, ⟦-⟧ will be surjective (all elements of D can be represented by some term), but not injective (some elements are denoted by more than one term).
This means that we do not yet fully capture D. What is lacking is to understand when two terms denote the same element of D.
To solve this problem, we identify special terms, sometimes called "canonical terms" or "semantic normal forms", that we are going to use as "canonical representatives" of elements of D. In other words, we identify a function

⌜-⌝ : D → T

which preserves the semantics in the sense that ⟦⌜d⌝⟧ = d for all d ∈ D.
The interesting, and sometimes difficult, step is now to identify a nice set of equations E on terms that allows us to prove (using the rules of equational logic)

⌜⟦t⟧⌝ ≈ t

for all terms t.
Since the equations E induce an equivalence relation ≈ on T, we obtain a bijection, or isomorphism,

T/≈ ≅ D

which, from left to right, is ⟦-⟧ and, from right to left, is ⌜-⌝.
This isomorphism establishes that, up to the naming of elements, the domain D is the same as the language of terms T modulo the equations E.
In particular, all computations that can be done in the structure D can be done by rewriting terms using the equations E.
A lot of graduate-level information on applications of NBE and related topics is available at the webpage of the Summer School on Metaprogramming. Already the page of abstracts is an interesting read, but there are also very good teaching materials. If you want to go deeper into this, the paper available from Wadler's homepage on how to apply these ideas to implement domain-specific languages could be a good starting point. Elsewhere, Dybjer presents an introduction and Abel gives an advanced treatment that also contains some historical and other remarks of general interest. ↩︎
I recommend his discussion starting after the proof on page 37 with "Well – are statements (a)-(d) true, or not??" to all mathematics students. I also paste in a screenshot from page 40. The first paragraph points out that the word problem is often undecidable (which means that there is no terminating algorithm computing normal forms). The second paragraph has the reference to van der Waerden's trick. Btw, van der Waerden was a student of Emmy Noether whose fascinating biography features in an In Our Time Podcast (btw, I highly recommend In Our Time for philosophy and history). Interestingly, some of the central topics we are discussing (algorithms, invariants) also show up in the podcast. ↩︎
This is the same reason why it is important to develop a prototype to elicit requirement specifications from customers. Trying to implement a computational model is often the best method to elicit from informal domain expertise the requirement specification necessary to build the computational model in the first place. To break this apparent logical paradox, one often starts from a rough specification, builds a first prototype and then iterates through a cycle of "specification -> implementation -> specification -> implementation -> …" ↩︎
The symbol "1" is overloaded as we use it both for a term of type exp and for the number 1; similarly for "+". ↩︎
The rule of logic used here is (refl). ↩︎
If you go through all the reasoning steps of the argument, you will find that only one of the equations is used, namely associativity, in the place where we write the first "≈". I agree that, if one is not careful in thinking and notation, it may be confusing that we have equations here on two different levels: We use "=" for the normal mathematical equations (feeling free to use equations such as 3+2=4+1 we know to be mathematically true) and we use "≈" for the equations that define our computational model. The possible confusion that arises here is the same that may arise in programming when, for example, writing a C compiler in C. ↩︎
We will learn later that this means that ⟦-⟧ is a homomorphism. ↩︎