(part of a course on Programming Languages)
"In a rough and ready sort of way it seems to me fair to think of the semantics as being what we want to say and the syntax as how we have to say it."
Christopher Strachey in his seminal paper Fundamental Concepts in Programming Languages (1967).
To be able to explain two concepts central to the field of Programming Languages, namely denotational and operational semantics.
To solve programming problems we need a programming language to represent data and define operations on data.
To reason about the correctness of our programs we need to also know what is being represented. To this end we define mathematical models of the real world. These models are sometimes also called domains or mathematical domains.
Example: In data science, mathematical models are typically statistical models built from probability distributions.
(Whether the mathematical models are accurate representations of the real world cannot be determined by purely scientific methods. For this we need to look at scientific consensus, which arises from a process that draws on science but also comprises social, historical and psychological factors. In the following we will not pursue this important point but I recommend the important book The Structure of Scientific Revolutions by Thomas Kuhn to everybody who is interested in this.) [1]
Once we have agreed on the mathematical model, the correctness of the program becomes the question of whether the program agrees with the model. We can picture this as a function
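(in symbols, writing Data for the set of data and Model for the mathematical model, names chosen here just for illustration)

semantics : Data -> Model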
Such a map is often called a denotational semantics because it associates to each datum its mathematical denotation (meaning). It is also sometimes called a 'meaning function' or just 'the semantics'.
Example: For shortest path algorithms, the mathematical model could be a graph with distances attached to the edges.
Activity: Add your own examples.
In the following, we are interested in the case where the elements of the set are programs. We write the semantics as a function from programs to the model to emphasize that we think of programs as syntax denoting elements of the model.
Programs are special data that can be executed. Thus, in addition to a denotational semantics we want an operational semantics that captures computations.
For this we need a model of computation.
To keep things simple, we will build on our previous work on rewriting. In other words, our model of computation will be rewriting to normal form.
We can now summarize our approach (that will be refined later) in the following diagram
Since the write-up of this lecture is longer than usual, I typeset the parts that can be skipped on first reading in grey.
This lecture will address the following questions.
What is meaning? [2] On the one hand, computation consists of the blind application of formal rules, on the other hand we write programs to perform meaningful tasks. Is it possible to give a mathematical account of the meaning of a program?
While these questions may seem rather philosophical, the attempt at an answer will lead to both a deeper understanding of programming and to practical tools that are relevant to everyday software engineering. [3]
For example:
Throughout, we will illustrate the concepts by the simplest example from arithmetic. But this methodology scales to complex software systems including compilers and operating systems.
We start with a language that has a constant 1 and two binary operations + and *. So far, nothing distinguishes + from *.
There are two ways to equip this language with meaning.
Operational Semantics adds computation rules (equations, rewrite rules). For example, after adding
X*(Y+Z) = X*Y+X*Z
the symmetry between + and * is broken: + begins to look like addition and * like multiplication. Moreover, we can read the equation from left to right and interpret it as a computation rule.
Denotational Semantics maps the language into a mathematical domain, sending 1 to the number one, + to addition, and * to multiplication.
Question: Can you think of examples of both methods in the context of software engineering? Review some of your favourite algorithms. How do you establish the correctness of the algorithms? [4]
Summary: The remainder of the lecture discusses the details of the sketch above:
Finally, we will indicate that the principles outlined for arithmetic translate to programming languages and software engineering in general and briefly look at the problem of full abstraction.
The following recap of arithmetic is not intended to introduce new material, but to clarify the structure of arithmetic in terms of syntax, computation and semantics. This structure repeats itself in countless examples in programming languages, algorithms, and logic and we will see more of it throughout Programming Languages and Compiler Construction.
The abstract syntax [5]
exp ::= 1 | exp + exp | exp * exp
allows us to form, for example, the expression
(1+1)*((1+1)+1)
We can read this as
((1+1)+1)+1
Remark (Haskell): Similarly to our implementation of the calculator, we can represent the abstract syntax of a simple language for arithmetic in Haskell as
data Exp = One | Add Exp Exp | Mult Exp Exp
The expression (1+1)*((1+1)+1)
would then be represented as
Mult (Add One One) (Add (Add One One) One)
What about computations?
Activity: How do we compute (1+1)*((1+1)+1) on a machine that has only one basic number, 1, and does not have times tables stored?
Answer. [6]
Activity: Which rules justify the computation steps above? Label each step of the computation by the rule that justifies it.
Hint. [7]
Further Question: Can you think of an example computation that needs the following equations?
X + Y = Y + X
X * Y = Y * X
X * ( Y * Z ) = ( X * Y ) * Z
Important Idea: We can use equations as rewrite rules to do computations. In order for computations to terminate, we orient equations and allow applications of equations only "from left to right". If we want to emphasise our intention to only apply equations from left to right, we say "rewrite rule" instead of "equation" and use -> instead of =. For example,
X + ( Y + Z ) -> ( X + Y ) + Z
X * 1 -> X
X * ( Y + Z ) -> X * Y + X * Z
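Remark (Haskell): To make this concrete, here is a small sketch implementing these three rules on the Exp type from the remark above and iterating them to normal form. The function names step and normalize are our own, not part of the calculator.

step :: Exp -> Maybe Exp
step (Add x (Add y z))  = Just (Add (Add x y) z)            -- X + (Y+Z) -> (X+Y) + Z
step (Mult x One)       = Just x                            -- X * 1 -> X
step (Mult x (Add y z)) = Just (Add (Mult x y) (Mult x z))  -- X * (Y+Z) -> X*Y + X*Z
step (Add x y)          = case step x of                    -- no rule at the root:
                            Just x' -> Just (Add x' y)      -- try to rewrite inside
                            Nothing -> Add x <$> step y     -- a subterm
step (Mult x y)         = case step x of
                            Just x' -> Just (Mult x' y)
                            Nothing -> Mult x <$> step y
step One                = Nothing                           -- 1 is a normal form

-- Rewrite to normal form: iterate step until no rule applies.
normalize :: Exp -> Exp
normalize e = maybe e normalize (step e)

For example, normalize (Mult (Add One One) One) returns Add One One, and normalize (Mult (Add One One) (Add (Add One One) One)) returns a left-nested sum of six 1s.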
Exercise: Check that the rewrite rules above define an abstract reduction system on the set of expressions.
Exercise: Why does every computation in the language given by the syntax exp ::= 1 | exp + exp
and the rewrite rule X + ( Y + Z ) -> ( X + Y ) + Z
terminate? Can you find an invariant?
Optional Exercise: Extend the exercise above to the larger language involving multiplication. Another extension would be to add zero and a unary operation for successor. Etc.
Preliminary summary: We reviewed abstract syntax, equations, rewrite rules. Moreover we raised the question of how to recognise whether we have enough equations/rewrite rules. "Enough" can refer to two distinct but related ideas. Do we have enough equations to prove all mathematical equalities? Do we have enough rewrite rules to do all relevant computations (in our example: to evaluate all arithmetic expressions)?
The big idea that is missing so far is how to give meaning to a language. We also say: What is the semantics of our formal language?
Meaning comes from talking about something that everybody has agreed upon or understands already (the pre-understanding). For us, there are two possibilities: we all understand, at least to some degree that should suffice for the following, mathematics on the one hand and (computing) machines on the other.
If we give the semantics of a language in terms of a mathematical model, we speak of mathematical or denotational semantics. If the semantics is in terms of a (virtual) computing machine, we speak of operational semantics. These are two different and important ways of explicating the meaning of a programming language.
In our running example of arithmetic, the denotational semantics will assign to every expression the number denoted by that expression. The operational semantics will describe instead the computation steps taken in order to evaluate the expression to normal form. [8]
The borderline between denotational semantics and operational semantics can be somewhat fluid if the semantic domain is a virtual machine. For example, if we take as our semantic domain for arithmetic the Integer
data type of Haskell, then, since Haskell can be executed, we have a semantics of arithmetic that is both denotational and operational.
But usually there is a distinct gap between operational and denotational semantics: the operational semantics is the specification of an interpreter for the programming language, whereas the denotational semantics tries to capture the mathematical essence of a program, abstracting away from computational details.
Let us write down the abstract syntax again:
exp ::= 1 | exp + exp | exp * exp
Let the semantic domain be the positive natural numbers 1, 2, 3, ... The (denotational) semantics of the language will involve three different languages (the syntax, the mathematics, and the metalanguage we write in) and we use, just for now, three different colours to help keep them apart.
We can now define the semantics by induction on the structure of expressions.
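In standard notation (we write eval for the semantic function, matching the Haskell below; on the left, 1, + and * are syntax, while on the right they are the number one, addition and multiplication of numbers):

eval(1) = 1
eval(e1 + e2) = eval(e1) + eval(e2)
eval(e1 * e2) = eval(e1) * eval(e2)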
Or the same definition in Haskell (now taking Haskell's Integer instead of the mathematician's natural numbers):
eval :: Exp -> Integer
eval One = 1
eval (Add n m) = (eval n) + (eval m)
eval (Mult n m) = (eval n) * (eval m)
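For example (a usage sketch; it assumes the definitions of Exp and eval above):

-- (1+1)*((1+1)+1) denotes 2*3 = 6
main :: IO ()
main = print (eval (Mult (Add One One) (Add (Add One One) One)))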
You can run this definition at repl.it. [9]
Remark: As a little excursion, important later in the course, note that the definition of the function eval explains the meaning of the "big system" (a compound expression) in terms of the meanings of its parts.
Remark: To get a feeling for how such an inductive definition works, we do an example computation in which we inductively/recursively decompose terms until we can apply the base case and finally add up in the natural numbers. (Exercise: fill in the dots in the equational reasoning below.)
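Here is such a computation on a small term (our own example, using the Haskell names):

  eval (Add (Add One One) One)
= eval (Add One One) + eval One
= ...
= (1 + 1) + 1
= 3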
If you now wonder why this denotational semantics is not also an operational semantics, you are on the right track. But we are in a special situation here where the mathematics (in red) is in fact computable. This is not always the case for mathematics. For example, most real numbers are not computable.
While the denotational semantics worked by mapping syntax to a semantic domain, we now describe a different way of giving meaning to syntax.
There are many different ways to set up an operational semantics. The simplest one, pursued in this section, is rewriting to normal form. We have studied rewriting to normal form before, so this section is largely a recap of things we have seen.
One advantage of operational semantics via rewriting to normal form is that rewriting to normal form does not require us to define another language or a virtual machine. Instead, we directly define computations on the already given syntax
exp ::= 1 | exp + exp
by adding rewrite rules, which, in our case, amounts to adding the single rewrite rule
X + ( Y + Z ) -> ( X + Y ) + Z
Question: What do we need to do in order to show that all computations terminate? If two computations from the same initial expression terminate, are the two final expressions the same? (Recall what we learned about confluence and unique normal forms.) [10]
Recall that a normal form is an expression to which no rewrite rule can be applied.
Exercise: Determine the normal forms of the rewrite system above.
A rewrite system is terminating if there are no infinite computations.
Exercise: Show that the rewrite system above is terminating.
Remark: Recall that a rewrite system is a non-deterministic algorithm. The input is the starting expression. The result is the normal form in which the computation terminates. For a full analysis of our rewrite system, it remains to show that the results of computations are unique, that is, that any two computations starting from the same input lead to the same normal form.
Recall that a rewrite system has unique normal forms if any two terminating computations starting from the same input lead to the same normal form.
Exercise: Show that the rewrite system above has unique normal forms.
Recall that two expressions are called equivalent if they can be connected by a finite sequence of (forward or backward) rewrite steps.
Exercise: Characterise equivalence of expressions.
Optional Exercise: Extend the exercises above to the language with multiplication.
Now that we have two different ways of explaining the meaning of a programming language, we want to know whether they amount to the same (in a suitable sense). Before we can explain the (positive) answer to this question, we need to learn/revise some notions from discrete mathematics.
In order to clarify the relationship between rewrite rules and equations, we need the mathematical notion of equivalence relations.
If you remember what this was about, feel free to skip.
Briefly, for any binary relation there is a smallest equivalence relation containing it.
One use of equivalence relations is to quotient by them; this turns equivalence into equality. For example, if one quotients the integers by the smallest equivalence relation containing a given relation, previously merely equivalent integers become equal in the quotient.
If you are familiar with these ideas you can go to the next section, otherwise read the notes on Discrete Mathematics.
To check your understanding consider again the equation
X + ( Y + Z ) = ( X + Y ) + Z
Remark: Note that the variables X, Y, Z are not part of our language of arithmetic. They are merely placeholders (metavariables) for the terms (expressions) of the language.
Exercise: Describe the equivalence relation induced by the equation above.
Answer.[13]
Question: Equations introduce a problem. On the one hand, 3+2=5. On the other hand, 3+2 and 5 are certainly different expressions.[14] How can we make sense of these different notions of equality?
For background on this question see the sections on Equivalence Relations and Quotienting by Equivalence Relations in my notes on relations.
Exercise: Answer the question above using the fact that one can quotient by an equivalence relation.
In the next section, we want to connect operational and denotational semantics. The following two exercises will be important.
Exercise: Show that the quotient of the language exp ::= 1 | exp + exp
by the equation X + ( Y + Z ) = ( X + Y ) + Z
is isomorphic to the positive integers.
Answer. [15]
Exercise: Show that the isomorphism respects the operation +, that is, it maps the syntactic + to addition of positive integers.
Optional Exercise: Extend all exercises above to the language with multiplication.
Now that we have built two different semantics, one mathematical and one computational, we want to know how they relate to each other.
To recap, the denotational semantics is a function from expressions to numbers (the function eval), and the operational semantics is a function from expressions to normal forms (rewriting to normal form), where the normal forms are themselves expressions, namely the left-nested sums of 1s.
How can we compare the two semantics?
Exercise: Find a map from normal forms to numbers that makes the diagram below commute. I will call that map the "meaning" map.
We say that the diagram commutes to express that going around both ways gives the same function. In one-dimensional algebraic notation, writing nf for the operational semantics and meaning for the map of the exercise, we write this as meaning ∘ nf = eval, or sometimes also as nf ; meaning = eval. Here, the ";" reminds us of the sequential composition in programming languages.
We say that the operational semantics is adequate if the diagram above commutes. If we allow ourselves the identifications
1 = 1
(1+1) = 2
((1+1)+1) = 3
...
then we can also formulate adequacy as follows.
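In symbols (writing e ->* n to mean that the expression e rewrites to the normal form identified with the number n; the notation is ours):

e ->* n   if and only if   eval(e) = n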
One direction holds because each rewrite step preserves the denotation of an expression, so if e rewrites to n then eval(e) = n. The converse direction then follows together with termination: e rewrites to some number m, the first direction gives eval(e) = m, and hence m = n.
The language of arithmetic is special since there are no functions and no recursion. All computations terminate and evaluate to a number. In particular two expressions are equivalent if and only if they evaluate to the same value.
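Remark (Haskell): For our arithmetic language, equivalence is therefore decidable by evaluation; a minimal sketch reusing eval from above (the name equivalent is our own):

-- two arithmetic expressions are equivalent iff they denote the same number
equivalent :: Exp -> Exp -> Bool
equivalent e1 e2 = eval e1 == eval e2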
But what would be a good notion of equivalence for functions?
Two functions, or programs, are observationally equivalent if they evaluate to the same values in all contexts.
I will write a dedicated notation (say, ≅) for observational equivalence. A context is a program with a hole into which a code fragment can be plugged. For example, the two code fragments that take an input x and return, respectively, x+1 and 1+x are observationally equivalent.
Exercise: Use the definition of observational equivalence to show that the code fragments x+x and 2*x are observationally equivalent.
Remark: Observational equivalence is the notion of equivalence needed for compiler optimisations: if an optimisation replaces a code fragment by an observationally equivalent one, then no program context can observe the difference.
Similarly to our discussion of adequacy, we can now ask again whether operational and denotational semantics agree, that is, whether two programs are observationally equivalent if and only if they have the same denotation. A denotational semantics with this property is called fully abstract.
Here, the "easy" direction is to show that programs with equal denotations are observationally equivalent; the difficult direction is the converse.
The problem of full abstraction was first introduced for the programming language PCF, which is the simply typed lambda calculus with natural numbers and recursion.
How can we try to understand why full abstraction is a difficult problem?
One can always construct a model from syntax by quotienting. But this does not constitute a "natural mathematical" model, that is, a model from which one could hope to gain insights into programming by using "ordinary" mathematics. On the other hand, when one constructs a mathematical model from mathematical (as opposed to computational) principles, this model may contain features that are natural from a mathematical point of view but happen not to be part of the programming language. If this happens, the mathematical model is not fully abstract for the operational semantics.
For PCF this is exactly what happened. Plotkin (1977) identified the feature that was missing from PCF to make it fully abstract with respect to Scott's denotational semantics: the "parallel or".
Obviously, this would need further explanation, but as this excursion on full abstraction already goes beyond what we can discuss in this course, I leave it at this for now and refer to the literature below for more on this story.
We can summarize the lecture in the following picture. We start with a language given by its abstract syntax.
On the left, we have an operational semantics rewriting expressions to normal form. On the right, we have a denotational semantics mapping expressions into a semantic domain.
In our example, the horizontal arrows are all isomorphisms, that is, we obtain a perfect match between operational and denotational semantics.[18] This perfect picture can be achieved in many situations and will appear, implicitly or explicitly, again and again later in this course.
In particular, we will see examples where the operational semantics (= an algorithm) is given and the question is to understand it (= find a denotational semantics) and, vice versa, where the specification of a problem (= a denotational semantics) is given and the task is to find an appropriate algorithm (= an operational semantics). Again, in most cases we will look at in this course, a perfect match between the two sides is what we aim for.
However, if we had time to study how this picture can be extended to programming languages with recursion (or while loops, for that matter), we would see that we run into the halting problem and matters get rather more complicated/interesting.
Read these notes and do the exercises.
Here are two further discussion questions:
Even though Church invented the lambda calculus as a calculus of mathematical functions, for a long time it only had an operational semantics. In fact, it was known that there cannot be a (naive) set-theoretic denotational semantics for the (untyped) lambda calculus.[20] This was a significant obstacle, since set theory was (and is) the most widely accepted foundation of mathematics. Then, in 1969, two things happened: while working on a paper (Scott, 1969/1993) arguing against the (untyped) lambda calculus of Church,[21] Dana Scott found a denotational semantics for it (Scott, 1972). With these discoveries, Scott not only solved one of the outstanding problems of theoretical computer science, he also initiated the fields of domain theory and denotational semantics, which in turn had ramifications in process algebra, concurrency, theorem proving, verification, and program analysis. For some pointers to the early literature, continue reading the historical references below.
Reading the introductions of the following historical references gives a sense of the developments. Many of the questions that arose at the time remain fundamental in the area of programming languages.
The early history of the subject is difficult to pin down. We are lucky that the following has been published (more than 20 years after it was written). It is the paper that argued against the untyped lambda calculus and introduced a typed lambda calculus known as the logic of computable functions (LCF):
In the 1993 preface to the above paper, Scott tells the story of how he finally found a mathematical model for the (untyped) lambda calculus. The earliest publication of this I could find is:
For a modern account on the field that was started by this work of Scott, see Abramsky and Jung's Domain Theory. The next paper emphasises the connections between the mathematical theory (domain theory) and its application to programming languages (denotational semantics).
Ironically, Scott's unpublished work on LCF was as important as his models of the untyped lambda calculus. Robin Milner turned LCF into one of the first modern interactive theorem provers. Moreover, in order to implement LCF, he invented one of the major functional programming languages, ML. Then, to make programming in ML practical, he created the type-inference algorithm described in A theory of type polymorphism in programming.[22] An early paper by Milner on LCF is the following. It is still surprisingly readable, already written in the notation we still use today.
The next in my list is one of the early papers that showed how to apply the nascent area of domain theory to the semantics of programming languages. It introduced PCF, which is to this day the paradigmatic language for introducing domain theory and denotational semantics.
As a direct response to Plotkin's work on PCF, Milner introduced the notion of full abstraction.
Milner provided a syntactic model (syntax quotiented by an equivalence relation). The question of finding a fully abstract mathematical model for PCF not arising as a quotient of syntax was one of the most famous and fruitful problems in Programming Languages. The best-known solutions were found around 1993 by Abramsky, Jagadeesan and Malacaria and by Hyland and Ong. For more on the history see the introduction of Full Abstraction for PCF and the conclusion of On Full Abstraction for PCF.
The work on full abstraction for PCF led to a third style of semantics, game semantics. These three styles of semantics, operational semantics, denotational semantics and game semantics continue to play a major role in Programming Languages research. A graduate level introduction to the topic can be found in Dan Ghica's videoed lectures on Game Semantics.
We followed some of the history of denotational semantics and domain theory. The modern approach to operational semantics was introduced in
For more on the history of the ideas sketched above see
(This is material typically covered in graduate courses on programming languages.)
The following lecture by one of the leading researchers in domain theory and denotational semantics explains some of the basic questions of the area in quite general philosophical terms that should be of interest to a wider computer science audience.
Next, I list some sources one would study in a graduate course on Semantics of Programming Languages.
The first 30 min of the first lecture of the videoed course
gives a good introduction to Semantics of Programming Languages in general. The rest of the first lecture explains the idea of functions as processes at the heart of game semantics.[23] After that, the introduction of Game-theoretic analysis of call-by-value computation, comparing call-by-name and call-by-value, should make sense.
Classic texts on domain theory and denotational semantics are:
The "canonical" content of a first course on denotational semantics is discussed in
While this outline of a course on denotational semantics is addressed to the teacher rather than the student, it may be a useful overview for students as well. The technical details can be found in (Streicher, 2006).
Thanks to Clarisse Bonang, Dan Haub, Drew Moshier for suggesting improvements.
Is truth objective or is truth relative to agreement by people? This question has been important ever since the invention of money, writing and propaganda, but is becoming ever more urgent now that more and more people spend much of their lives online in virtual worlds. ↩︎
"What is" questions are often what moves science forward. Typically, these questions do not have definite answers and every now and then a new answer is found, often moving science forward decisively. Examples are: What is the infinite? What is the continuum? What is space? What is time? What is causality? What is logic? What is a proof? What is an algorithm? What is communication? What is probability? What is causality? All these question have received mathematical answers which are important to engineering despite not being settled. ↩︎
Roughly speaking, the outcome of the discussion will be that we cannot fully escape the fact that meaning depends on "subjective" pre-scientific understanding (the pre-understanding) tied to our human existence, sometimes called thrownness and giving rise to the hermeneutic circle, the relevance of which to science was studied by Jean-Michel Salanskis in his book "L'herméneutique formelle". The relevance of the hermeneutic circle to computer science has been presented by well-known AI researcher (and PhD advisor of Google founder Larry Page) Terry Winograd (in collaboration with Fernando Flores) in the book Understanding Computers and Cognition: A New Foundation for Design. Importantly, from a software engineering point of view, while the philosophical problems should inform our work, they do not prevent us from creating ad hoc formalizations of meaning that can be turned into powerful software tools (such as verified compilers). ↩︎
Some comments:
Recall that by saying "abstract syntax", we are signalling that we think of syntax as trees. Parentheses are not part of the abstract syntax and are only used to unambiguously write trees in one-dimensional notation. ↩︎
We want to do a computation like the following.
(1+1)*((1+1)+1) = (1+1)*(1+1)+(1+1)*1
= (1+1)*(1+1)+(1+1)
= ((1+1)*1+(1+1)*1)+(1+1)
= ((1+1)*1+(1+1))+(1+1)
= ((1+1)+(1+1))+(1+1)
= (((1+1)+1)+1)+(1+1)
= ((((1+1)+1)+1)+1)+1
↩︎
Consider the equations/computation rules/rewrite rules
X + ( Y + Z ) = ( X + Y ) + Z
X * 1 = X
X * ( Y + Z ) = X * Y + X * Z
↩︎
These last two paragraphs may not be clear right now, but I will expand on them in the next two subsections. A good way to clarify these ideas for yourself is to make your own examples and discuss them with me. ↩︎
It is interesting to think about why the Haskell code runs. There is still a difference between source (blue, Exp
) and target (red, Integer
) language. But all are embedded into Haskell (black), which itself has an operational semantics. ↩︎
This proof is famous in category theory. If you look at page 167 of MacLane's Category Theory for the Working Mathematician, you may recognize the diamond shape we encountered when we discussed confluence. This is not a coincidence. If one peels away all the category theory, MacLane's famous proof boils down to a simple rewriting argument. Essentially, it just shows that the rewriting system given by the equation above has unique normal forms. ↩︎
We write
For our current purposes we can take isomorphic to mean bijective and isomorphism to mean bijective function. But isomorphism is the right concept if we want to compare not only sets but also take into account operations like + and *. So I prefer to use the terminology that will continue to be useful for a more thorough investigation we may turn to later (if time permits). ↩︎
We define the equivalence relation on the language exp ::= 1 | exp + exp as the smallest equivalence relation containing all instances of the equation X + ( Y + Z ) = ( X + Y ) + Z. ↩︎
3+2 and 5 are certainly different as strings. But in some sense they also differ in information content: If I tell you that I am thinking of two numbers the sum of which is 5, you won't be able to say which numbers I am thinking of. Computation destroys information. ↩︎
Let us sketch the answer. Lemma: Every expression is equivalent to exactly one left-nested normal form (...(1+1)+...)+1. Then the map sending each equivalence class to the number of 1s in its normal form is a bijection onto the positive integers. Exercise: Define this map formally and verify the lemma. ↩︎
"Independent of any notion of computation" is somewhat vague and one can debate in each particular case how well this aim was achieved. What is important is that the denotational and the operational model both shed different lights on the meaning of the language and complement each other. In the middle we have the model that is constructed from quotienting syntax. ↩︎
Moreover,
A relevant idea here is the universal Turing machine. ↩︎
We may come back to this if you are interested. ↩︎
And reading the introduction, one gets a real sense of the frustration caused by the inability to find a denotational semantics of the lambda calculus. ↩︎
This paper contains the famous slogan that well-typed programs cannot go "wrong". This slogan is usually quoted without the quotes and then has to be taken with a grain of salt. But it is also a theorem in the paper, see page 365. ↩︎
This will make more sense after you have learned about the lambda calculus. ↩︎