Normalisation by Evaluation

(under construction)

(part of my course Programming Languages at Chapman University)

Normalisation by Evaluation is a technique in programming languages that is complicated through the presence of variables and higher-order functions. ^[1] But some of the main ideas can be already explained in a very simple situation without variables and function types. This is the purpose of this lecture.

As a concrete example, we use this method to fulfill several promises from previous lectures:

To prove the adequacy of the operational and denotational semantics of the calculator.
To prove the soundness and completeness of the equational logic of arithmetic.

There are also interesting corollaries from the point of view of studying formal systems. For example, in the language of arithemetic with only 1, + it is the case that the law of associativity is enough to evaluate all possible computations. In particular, in that language, associativity implies commutativity.

Remark for mathematicians: In group theory, normalization by evalution is known as van der Waerden's trick as explained in Bergman's course on universal algebra. ^[2]

Introduction

Recall from the lecture Operational and Denotational Semantics the definition of the semantics

[[-]] : L \to N

of a simple language e ::= 1 | e+e | e*e. In this lecture we will add a new ingredient, namely a function

nf : N \to L,

which we understand as providing a semantic definition of normal form. As our aim is to achieve a perfect match between syntax and semantics, we will require that "going around"

N \overset{nf}{⟶} L \overset{[[-]]}{⟶} N

is the identity, saying that the meaning of the expression encoding a number

n

n

. The "other way round"

L \overset{[[-]]}{⟶} N \overset{nf}{⟶} L

we will see that we can achieve equality "up to some equivalence relation

\approx

". In fact, this as a requirement can be used to find the equivalence

\approx

in the first place.

This is a general method that can be used reconstruct the setting discussed in Operational and Denotational Semantics, where we showed how the meaning function

[[-]]

can be described purely syntactically by an equivalence relation

\approx

, ending up with a situation in which the syntax and the semantics match up perfectly. Mathematically, we expressed this by saying that the quotient of

L

wrt

\approx

is isomorphic to

N

, in symbols:

L / \approx ≅ N

Learning Outcomes

Students will learn

the basic ideas the technique known in programming languages as normalisation by evaluation.
how to use this technique to prove completeness of an axiomatisation wrt a semantics.

Aims

Reinforce the relationship between the lectures about syntax and semantics on the one hand and abstract reduction systems on the other hand.
Show how normal forms can be used in completeness proofs. This is a general technique that is widely applicable.
Show how the method of proving completeness via normalisation by evaluation is actually a method to discover the equations from our informal knowledge of the problem domain. ^[3]
Prove an observation from Operational and Denotational Semantics that associativity is enough to prove all equations about expressions in the language given by
$1, +$ .
Extend this observation to the language
$1, +, *$ .
Link 'semantic' and 'syntactic' normal forms.
Discuss some subtleties about overloading notation.

Example

We go back to our lectures on Operational and Denotational Semantics.

One topic of these lectures was in what sense it is justified to say that the meaning of the (purely formal) language

    exp ::= 1 | exp + exp | exp * exp

is captured by the (equally formal) equations

\begin{aligned} X + (Y + Z) & \approx (X + Y) + Z \\ X * 1 & \approx X \\ X * (Y + Z) & \approx X * Y + X * Z \\ X * (Y * Z) & \approx (X * Y) * Z \\ X + Y & \approx Y + X \\ X * Y & \approx Y * X \end{aligned}

On the one hand, it is important to emphasise that the language and the equations can be run on a machine without any human interference or intelligence. (For example, we can read the equations from right to left

\to

and consider them as defining a rewrite system that can be executed on a machine.)

On the other hand, the language and the equations make sense to us because they agree with our understanding of mathematics.

How do we convince ourselves that the equations capture exactly the familiar

1, +, \cdot

of mathematics?

How do we know that we have enough equations (that is the question of completeness)? Can we drop some equations?

The answer will use ideas we learned about when we discussed abstract reduction systems (ARSs).

We defined a normal form in an ARS

(A, \to)

as an element

a \in A

that cannot be reduced, that is, there is no

b

such that

a \to b

. This is a purely syntactic notion (at least if the ARS can be described in a finite way).

In this lecture, we will see a semantic way of giving a notion of normal form. And we will also see how to match these two notions up.

Remark on Overloading

As we explained above, it is important to separate syntax from semantics. Expressions have operations 1,+,* and numbers have operations

1, +, \cdot

. This situation is very common in programming where we also want to denote different operations by the same symbol. This is called overloading. Here we overloaded 1 and + and did not overload multiplication for which we use * and

\cdot

, respectively. Is this a bad or a good idea? Here are some points to consider:

Making a distinction between * and
$\cdot$ helps making clear whether we talk about syntax or semantics.
Using different symbols for 1,+ would make the lecture more difficult to read.
Even using the different symbols * and
$\cdot$ and, similarly,
$\approx$ and
$=$ , has disadvantages as the equations holds for both syntax and semantics (indeed, that is the crux of this lecture, as we will see). So do we have to write the equations twice now, once with * and
$\approx$ and once with
$\cdot$ and
$=$ , or should we better avoid this?

In my experience, there is no one good answer. In programming one tends to not overload and in maths one tends to use overloading a lot, but there are no general rules and one needs to decide on an ad hoc basis, taking into account asthetic considerations, not only objective criteria such as correctness, efficiency, etc

Normalisation by Evaluation

Recall that the meaning of our language

L

of exp's is given by the semantics (meaning function)

\begin{aligned} [[-]] : L & \to N \\ [[1]] & = 1 \\ [[t + t^{'}]] & = [[t]] + [[t^{'}]] \\ [[t * t^{'}]] & = [[t]] \cdot [[t^{'}]] \end{aligned}

The new idea is to, conversely, define a function

\begin{aligned} nf : N & \to L \\ nf (1) & = 1 \\ nf (n + 1) & = nf (n) + 1 \end{aligned}

For example,

nf (4) = ((1 + 1) + 1) + 1)

, etc. ^[4]

Exercise: Show that

$[[-]]$ restricted to left-bracketed expressions such as
$(1 + 1) + 1$ is an inverse of
$nf$ .
$[[nf (n)]] = n$

We can now define the semantic normal form of an expression

e

nf ([[e]])

Example: Here is an example how we normalise by evaluation. You see that instead of normalising by using the reduction relation given by some set of equations, we reduce to normal form by using the definiton of the semantics

[[-]]

\begin{aligned} nf ([[1 * (1 + (1 + 1))]]) & = nf ([[1]] \cdot [[1 + (1 + 1)]]) \\ = nf (1 \cdot [[1 + (1 + 1)]]) \\ = nf ([[1 + (1 + 1)]]) \\ = nf ([[1]] + [[1 + 1]]) \\ = nf ([[1]] + [[1]] + [[1]]) \\ = nf (3) \\ = (1 + 1) + 1 \end{aligned}

Example: Here is the same example again, now by reducing to normal form via equations. You see here that the semantics (meaning) of the the numbers does not come in at all.

\begin{aligned} 1 * (1 + (1 + 1)) & \to (1 * 1) + 1 * (1 + 1)) \\ \to 1 + 1 * (1 + 1)) \\ \to 1 + (1 * 1 + 1 * 1) \\ \to 1 + (1 + 1) \\ \to (1 + 1) + 1 \end{aligned}

Exercise: Replay the two examples above with the expression

(1 + 1) * (1 + 1)

instead of

1 * (1 + (1 + 1))

Question: Which equations between exp's are needed to guarantee that

e \approx nf ([[e]])

for all expressions

e

For maximum effect pause the lecture and try to find out yourself.

We reason by induction on the three rules that make the definition of

    exp ::= 1 | exp + exp | exp * exp

e

is 1 then

1 = nf ([[1]])

. No equations needed. ^[5]

e

e_{1} + e_{2}

, we want to reason

\begin{aligned} nf ([[e_{1} + e_{2}]]) & = nf ([[e_{1}]] + [[e_{2}]]) \\ \approx nf ([[e_{1}]]) + nf ([[e_{2}]]) \\ \approx e_{1} + e_{2} \end{aligned}

Exercise: Justify all steps. Which equations are needed? Hint: For the first "

\approx

", make an example such as

\begin{aligned} nf (3 + 2) & = nf (4 + 1) = nf (4) + 1 = nf (3 + 1) + 1 = (nf (3) + 1) + 1 \\ \approx nf (3) + (1 + 1) = nf (3) + nf (2), \end{aligned}

justify all steps again and then generalise to other numbers than

3

and

2

. (Formally, the argument would run by induction on the second number, but we don't need to go into that much detail here. A more informal argument would be already great to have.)

Answer to the exercise (only read the footnote after you attempted your own solution).^[6]

Similarly, if

e

e_{1} * e_{2}

, we reason

\begin{aligned} nf ([[e_{1} * e_{2}]]) & = nf ([[e_{1}]] \cdot [[e_{2}]]) \\ \approx nf ([[e_{1}]]) * nf ([[e_{2}]]) \\ \approx e_{1} * e_{2} \end{aligned}

Exercise: Justify all steps. [Hint: This follows the same outline as the previous exercise.]

To conclude, we have proved by induction on the definition of

L

that

nf ([[e]]) \approx e

for all

e \in L

Exercise: Going back to the two previous exercises, can you confirm that the argument above only used the following equations?

\begin{aligned} X + (Y + Z) & \approx (X + Y) + Z \\ X \cdot 1 & \approx X \\ X \cdot (Y + Z) & \approx X \cdot Y + X \cdot Z \end{aligned}

We summarise what we learned so far in

Theorem 1: The equations

\begin{aligned} X + (Y + Z) & \approx (X + Y) + Z \\ X \cdot 1 & \approx X \\ X \cdot (Y + Z) & \approx X \cdot Y + X \cdot Z \end{aligned}

allow us deduce by equational reasoning the equations

nf ([[e]]) \approx e

for all expressions

e

in the language

L

given by

    exp ::= 1 | exp + exp | exp * exp

Remark: This theorem has both a logical interpretation and a computational interpretation, as we are going to see next.

Logical Interpretation: Completeness

In this section, we show that Theorem 1 has a logical interpretation. In fact, we can reformulate Theorem 1 as a so-called completeness result.

Theorem 2 below states that the equations

\begin{aligned} X + (Y + Z) & \approx (X + Y) + Z \\ X * 1 & \approx X \\ X * (Y + Z) & \approx X * Y + X * Z \end{aligned}

are sound and complete for the language given by

    exp ::= 1 | exp + exp | exp * exp

and the interpretation in the natural numbers given by

\begin{aligned} [[1]] & = 1 \\ [[t + t^{'}]] & = [[t]] + [[t^{'}]] \\ [[t * t^{'}]] & = [[t]] \cdot [[t^{'}]] \end{aligned}

In symbolic notation we write this as

Theorem 2:

e_{1} \approx e_{2} ⟺ [[e_{1}]] = [[e_{2}]]

Proof: Soundness means that for all expressions

X, Y, Z

, the following equations

\begin{aligned} [[X + (Y + Z)]] & = [[(X + Y) + Z]] \\ [[X * 1]] & = [[X]] \\ [[X * (Y + Z)]] & = [[X * Y + X * Z]] \end{aligned}

hold for natural numbers. But these are just properties of numbers we are familiar with.

Completeness is the converse, namely

[[e_{1}]] = [[e_{2}]] ⟹ e_{1} \approx e_{2}

This follows from

e_{1} \approx nf ([[e_{1}]]) = nf ([[e_{2}]]) \approx e_{2}

where the "

\approx

" are due to Theorem 1 and "

=

" is due to the assumption.

This finishes the proof.

To summarise, for all expressions

e_{1}, e_{2}, e_{3}

in the language above there are derivations of the equations

\begin{aligned} e_{1} * (e_{2} * e_{3}) & \approx (e_{1} * e_{2}) * e_{3} \\ e_{1} + e_{2} & \approx e_{2} + e_{1} \\ e_{1} * e_{2} & \approx e_{2} * e_{1} \end{aligned}

from the equations

\begin{aligned} X + (Y + Z) & \approx (X + Y) + Z \\ X * 1 & \approx X \\ X * (Y + Z) & \approx X * Y + X * Z \end{aligned}

One way in which our language

L

is simple is that it does not contain variables. We will see how to add variables later. For now, the next question is an opportunity to think about this point. [Hint: The answer to the question can be yes or no, depending on the chosen. In fact, we have shown that the answer is yes, for the language of this lecture. But, this does not mean that these equations are not needed for richer languages.]

Question: Can the equations

\begin{aligned} X * (Y * Z) & \approx (X * Y) * Z \\ X + Y & \approx Y + X \\ X * Y & \approx Y * X \end{aligned}

be derived from the equations

\begin{aligned} X + (Y + Z) & \approx (X + Y) + Z \\ X * 1 & \approx X \\ X * (Y + Z) & \approx X * Y + X * Z \end{aligned}

or can they not?

Computational Interpretation: Rewriting to Normal Form

So far in this lecture all computation happened on the right-hand (semantic) side of

[[-]] : L \to N

On the left-hand (syntactic) side, we only have a formal language and a set of equations that guarantee, according to Theorem 2,

e_{1} \approx e_{2} ⟺ [[e_{1}]] = [[e_{2}]]

But we have seen many instance of the idea of turning equations into computation rules by reading them from left to right. We pick this idea up now again.

Exercise: You showed in a previous exercise that the proof of Theorem 1 only uses the equations

\begin{aligned} X + (Y + Z) & \approx (X + Y) + Z \\ X * 1 & \approx X \\ X * (Y + Z) & \approx X * Y + X * Z \end{aligned}

Show that the proof that

e \approx nf (e)

uses the equations only from left to right.

The previous exercise can be extended to a proof of the following theorem.

Theorem 3 below says that the rewrite rules

\begin{aligned} X + (Y + Z) & \to (X + Y) + Z \\ X * 1 & \to X \\ X * (Y + Z) & \to X * Y + X * Z \end{aligned}

define an ARS

(L, \to)

such that the equivalence relations induced by

\to

and

[[-]]

are the same.

In symbolic notation we write this as

Theorem 3:

e_{1} \overset{*}{⟷} e_{2} \Leftrightarrow [[e_{1}]] = [[e_{2}]]

Activity: Think about the similarity of Theorem 2 and 3. Relate this to what we learned about ARSs. Recall the example of the sorting algorithm where

\approx = \overset{*}{⟷}

could be understood that the implementation satisfied the specification, that is, that

\to

does implement indeed a sorting algorithm.

Remark: The second theorem shows that the semantic normal forms of the first theorem are also syntactic normal forms (ie normal forms of

(L, \to)

Project Questions:

What is needed to turn the ARS
$(L, \to)$ into an algorithm evaluating arbitary expressions
$e$ ?
How does the algorithm given by
$\to$ compare to the one given by
$[[-]]$ ?

Summary

While the example we considered is simple, the method is very general.

A typical situation is as follows.

We start with a semantic domain

D

, for example, numbers and the task is to represent it in a computational way.

First, one develops a syntax that allows to denote all objects of

D

by a term in a formal language

L

. This denotation gives us a function

[[-]] : L \to D

If we have done our work well,

[[-]]

is defined by induction on the structure of

L

, which in turn must be discovered from structure already contained in

D

. ^[7] Typically,

[[-]]

will be surjective (all elements of

D

can be represented by some term), but not injective (some elements are denoted by more than one term).

This means that we do not yet fully capture

D

. What is lacking is to understand when two terms denote the element of

D

To solve this problem, we identify special terms, sometimes called "canonical terms" or "semantic normal forms", that we are going to use as "canonical representatives" of elements of

D

. In other words, we identify a function

nf : D \to L

which preserves the semantics in the sense that

[[n f (d)]] = d

for all

d \in D

The interesting, and sometimes difficult, step is now to identify a nice set of equations on terms that allows us to prove (using the rules of equational logic)

nf (d) \approx d

for all

d \in D

Since the equations form an equivalence relation on

L

, we obtain a bijection, or isomorphism,

L / \equiv ⟶ D

which, from left to right is

[[-]]

and from right to left is

n f

This isomorphism establishes that up to the naming of elements the domain

D

is the same the language of terms modulo the equations

\equiv

In particular, all computations that can be done in the structure

D

can be done by rewriting terms using the equations

\equiv

A lot of graduate-level information on application of NBE and related topics is available at the webpage of the Summerschol on Metaprogramming. Already the page of abstracts is an interesting read, but there are also very good teaching materials. If you want to go deeper into this the paper, available from Wadler's homepage on how to apply these ideas to implement domains specific languages could be a good starting point. Elsewhere, Dybier presents an introduction and Abel gives an advanced treatment that also contains some historical and other remarks of general interest. ↩︎
I recommend his discussion starting after the proof on page 37 with "Well – are statements (a)-(d) true, or not??" to all mathematics students. I also paste in a screenshot from page 40. The first paragraph points out that the word problem is often undecidable (which means that there is no terminating algorithm computing normal forms). The second paragraph has the reference to van der Waerden's trick. Btw, van der Waerden was a student of Emmy Noether whose fascinating biography features in an In Our Time Podcast (btw, I highly recommend In Our Time for philosophy and history). Interestingly, some of the central topics we are discussing (algorithms, invariants) also show up in the podcast.
↩︎
This is same reason why it is important to develop a prototype to elicit requirement specifications from customers. Trying to implement a computational model is often the best method to elicit from informal domain expertise the requirement specification necessary to build the computational model in the first place. To break this apparent logical paradox, one often starts from a rough specification, builds a first a prototype and then iterates through a cycle of "specification -> implementation -> specification ->implementation-> …" ↩︎
The symbol "1" is overloaded as we use it both for a term of type exp and for the number
$1 \in N$ ; similarly for "+". ↩︎
The rule of logic used here is (refl). ↩︎
If you go through all the reasoning steps of the argument, you will find that only one of the equations "
$\approx$ " is used, namely in the place where we write "
$\approx$ " in
$\begin{aligned} nf (3 + 2) & = nf (4 + 1) = nf (4) + 1 = (nf (3) + 1) + 1 \\ \approx nf (3) + (1 + 1) = nf (3) + nf (2), \end{aligned}$ Which of the equations below did we use?
$\begin{aligned} X + (Y + Z) & \approx (X + Y) + Z \\ X * 1 & \approx X \\ X * (Y + Z) & \approx X * Y + X * Z \\ X * (Y * Z) & \approx (X * Y) * Z \\ X + Y & \approx Y + X \\ X * Y & \approx Y * X \end{aligned}$ I agree that, if not careful in thinking and notation, it may be confusing that we have equations here on two different levels: We use "
$=$ " for the normal mathematical equations (feeling free to use equations such as 3+2=4+1 we know to be mathematically true) and we use "
$\approx$ " for the equations that define our computational model. The possible confusion that arise here is the same that may arise in programming when, for example, writing a C compiler in C. ↩︎
We will learn later that this means that
$[[-]]$ is a homomorphism. ↩︎

Normalisation by Evaluation

Introduction

Learning Outcomes

Aims

Example

Remark on Overloading

Normalisation by Evaluation

Logical Interpretation: Completeness

Computational Interpretation: Rewriting to Normal Form

Summary

Read more

Create, Detect, Lift, Preserve, Reflect (Co)Limits

A Very Short Introduction to Monads

Category Theory - Axiomatic Theory of Structure

Category Theory in Computer Science