Universal Algebra 3: Variables, Free Algebras

"In order to find the value of an expression it is necessary to know the value of its components. Thus to find the value of a + 5 + b/a we need to know the values of a and b. Thus we speak of evaluating an expression in an environment (or sometimes relative to an environment) which provides the values of components."

Christopher Strachey, Fundamental Concepts in Programming Languages

Introduction

In the previous lecture we learned about

term-algebras
homomorphisms
initiality

In this lecture we will see how to add

variables
substitutions
equations

into the picture. We start with some general consideration about variables, then run again through our running example from the syntax and semantics lecture and think about how to extend it with variables. This then leads to the new notion of free algebras.

What are variables?

There are at least two possible answers.

Let us think of natural numbers with addition, that is, the algebra

(N, 0, +)

Variables as place holders

For example, going back in school to when we first encountered variables, we wanted to express, for example, that

+

is commutative, that is

\begin{aligned} 1 + 2 & = 2 + 1 \\ 1 + 3 & = 3 + 1 \\ \dots & = \dots \\ 2 + 3 & = 3 + 2 \\ \dots & = \dots \end{aligned}

and then found that it is much easier to write instead

x + y = y + x

So, in the first answer, variables are place-holders for the elements of the algebra and just a clever notation to avoid writing infinitely many equations.

Variables as first class citizens

What distinguishes high-school algebra from primary school computations with numbers?

One of the main ideas introduced in high-school algebra is that of variables as first-class citizens. For example, when we do computations such as

x^{2} - 1 = (x + 1) \cdot (x - 1)

to find the 0's of the quadratic equation

y = x^{2} - 1

we treat the variables as part of the data ("first class citizens"), not just as place-holders that represent numbers.

For example, we say that

y = x^{2} - 1

"is a parabola", which makes no sense when we replace

x, y

by individual numbers as in

y = 0

and

x = 1

Meta-variables vs object-variables

In programming languages we often call the programming language that is studied the object language, as opposed to the meta-language, which is the language that is used to study the programming language, typically mathematics or English or a mix of the two. We have seen this distinction of object-language and meta-language at work in the lecture on syntax and semantics where the language of expressions exp is the object language but, for example, the numbers

N

and the meaning function

[[-]]

are part of the meta-language.

In primary school the object languages consists of numbers and operations

+, *, -, /

. Later, when we learn first about variables, they are meta-variables, which we use to denote numbers in the object language. The step to algebra consists in adding variables to the object language. To distinguish these variables from meta-variables we call them object-variables.

For example, when we computed above

x^{2} - 1 = (x + 1) \cdot (x - 1)

we did not do this computation for all possible numbers, but we used rules of computation that did directly manipulate the symbols

x, y

. In other words, variables are "objects" of rules of computation.

We will encounter below further examples of meta-variables and object-variables.

Outlook on variables in programming languages

Of course, in programming languages variables also play a crucial role, both as mathematical variables and as names for memory cells and other resources.

Variables are sometimes called "names" or "nominals" or "identifiers" if the operation of interest is the equaltiy test
$x = y$ (which here is not a comparison of values but a comparison of the names themselves). ^[1]
Mathematical variables are names that can have a value. They also can be substituted.
Programming variables are variables the value of which can change over time.
Both mathematical and programming variables can be free or bound, and can have scope. We will hear much more about this when we come to
$λ$ -calculus.

In this lecture, we will only look at the simplest kind of mathematical variables.

To see what is at issue let us go back to the example of the first lecture.

Example: Arithmetic Expressions

Recall from the lecture on syntax and semantics that we started with an example of a formal language given by

    exp ::= 1 | exp + exp | exp * exp

and we defined its semantics by

\begin{aligned} [[-]] : L & \to N \\ [[1]] & = 1 \\ [[t + t^{'}]] & = [[t]] + [[t^{'}]] \\ [[t * t^{'}]] & = [[t]] \cdot [[t^{'}]] \end{aligned}

What happens if we add variabls to the language?

Pause the lecture and think about it.

First, we need to add variables to the syntax. We can add one variable x by defining

    exp ::= 1 | x | exp + exp | exp * exp

or we can have a set Var of variables all of which we may want to use in expressions (Question: How many variables are there in your favourite programming language?) and we write

    exp ::= 1 | v | exp + exp | exp * exp         v in Var

Remark: In the first definition of exp the symbol x is the variable we want to use in the expressions. In the second definition v is a place-holder that ranges over the set of variables Var.^[2] ^[3] For example, if Var is the set consisting of x,y,z, then the second definition of exp is a short hand for

    exp ::= 1 | x | y | z | exp + exp | exp * exp

In programming terminology we can think of the v in Var as a macro that is expanded to turn the second definition of exp into the third. In practice, the notational distinction between a variable x and the place-holder v that ranges over variables such as x is usually not made. So don't worry too much about it now. But distinctions such as these will become important again when we will look at compilers next semester. (End of Remark)

How can we modify the semantics to account for variables? We need to add to the three clauses of

\begin{aligned} [[-]] : L & \to N \\ [[1]] & = 1 \\ [[t + t^{'}]] & = [[t]] + [[t^{'}]] \\ [[t * t^{'}]] & = [[t]] \cdot [[t^{'}]] \end{aligned}

one clause for variables. But where do we map variables to? What should

[[x]]

be?

Something is missing here … so let us just add it!

Environments

The solution is to introduce an "environment", that is a function

ρ : Var \to N

(pronounced as "rho") that evaluates variables. We can then define the semantics

[[]]_{ρ}

depending on the parameter

ρ

as follows:

\begin{aligned} [[-]]_{ρ} : L & \to N \\ [[1]]_{ρ} & = 1 \\ [[t + t^{'}]]_{ρ} & = [[t]]_{ρ} + [[t^{'}]]_{ρ} \\ [[t * t^{'}]]_{ρ} & = [[t]]_{ρ} \cdot [[t^{'}]]_{ρ} \\ [[x]]_{ρ} & = ρ (x) \end{aligned}

Remark: Notice that

[[-]]_{ρ}

preserves

1, +, *

but not

x

. In other words,

[[-]]

is a homomorphism wrt the signature

{1, +, *}

but not wrt the signature

{1, +, *, x}

Remark: It is typical for universal algebra that the

ρ

in the definition above always stays the same. This will change as soon as we start applying this methodology to programming languages that have assigments or variable binding or higher-order functions. But for now it is good to explore in some depth the simpler situation. (Our programming language is now that of a calculator that can deal with (mathematical) variables but does not allow the user to define functions or write programs.)

The next step is to look at the above definition of

[[-]]

trough the eyes of universal algebra. One of the central notions of universal algebra is that of a free algebra.

Free algebras

The aim of this section is to show define so-called free algebras and to show how they are used to describe syntax and semantics of languages that inlcude variables.

There are at two different ways to introduce free algebras. A concrete one, via term-algebras, and an abstract one, generalising the universal property of initiality.

Term-algebras over a set of variables

The first one emphasises the similarity of variables and constants: The free algebra over a set of variables

X

, is the term algebra that is obtained when adding to the operations of the signature also the variables

x \in X

as new constants symbols. At this point, the only difference between constants and variables is that the variables are not part of the signature and, therefore, do not need to preserved by homomorphisms.

Example: Let

Σ = {1, +, *}

be the signature of our running example, that is, with one constant 1 and two binary operations

+, *

. The free algebra over the set

X = {x, y, z}

of variables is the language

L_{X}

given by

            exp ::= 1 | exp + exp | exp * exp | x | y | z

with operations

1^{L_{X}}

+^{L_{X}}

*^{L_{X}}

defined in exactly the same way as in the section on term-algebras in the previous lecture.

It is clear that the example above generalises to arbitrary signatures. This leads to new

Notation: We denote by

T e r m_{Σ} (X)

, or just

T e r m (X)

the term-algebra over a set

X

Substitution as a homomorphism

Here we want to show that substitutions are homomorphisms on the term-algebra.

For concreteness let us fix a set

X = {x, y, z}

of variables.

I claim that homomorphisms

h : T e r m (X) \to T e r m (X)

are precisely substitutions.

First, every homomorphism

h

defines a substitution

s : X \to T e r m s (X)

given by

s (x) \overset{d e f}{=} h (x)

So a substitution just specifies for each variable

x

a term

s (x)

, which is meant to be the term replacing

x

in another term

t

This means that we also have the operation of applying a substitution

s

to a term

t

. This is often written as

t [x \mapsto s (x)]

t [x \mapsto s_{x}]

t [x := s_{x}]

Activity: Make some concrete examples and think about how to implement the operation that takes a term and a substitution and applies the substitution to the term. Think about why this operation is a homomorphism and why it is important that the signature does not contain the variables as constants.

We summarise the activity by

Proposition: There is a bijective correspondence between functions (substitutions)

X \to T e r m (X)

and homomorphism

h : T e r m (X) \to T e r m (X)

. More precisely, for all functions

s : X \to T e r m (X)

there is a unique homomorphism

h : T e r m (X) \to T e r m (X)

such that

s (x) = h (x)

for all

x \in X

Notice the similarity of the proposition with the definition of initiality.

Free algebras defined

The second way to introduce free algebras directly leads to the official definition of free algebra. Recall the universal property of initiality:

An algebra

I

is initial if for all algebras

A

there is a unique homomorphism

I \to A

How should we modify this universal property if, moving back to our example, instead of the initial algebra

L

we want to characterise the algebras

L_{X}

given wrt a set

X

of variables?

Recall how we introduced the environments

ρ : X \to N

in our running example (back then we wrote Var instead of

X

). The semantics

[[-]]

was defined as the unique function such that

\begin{aligned} [[-]]_{ρ} : L & \to N \\ [[1]]_{ρ} & = 1 \\ [[t + t^{'}]]_{ρ} & = [[t]]_{ρ} + [[t^{'}]]_{ρ} \\ [[t * t^{'}]]_{ρ} & = [[t]]_{ρ} \cdot [[t^{'}]]_{ρ} \\ [[x]]_{ρ} & = ρ (x) \end{aligned}

The first three conditions say that

[[-]]

is a

Σ

-homomorphism. Hence

[[-]]

is the unique homomorphism satsifying

[[x]] = ρ (x)

To summarise, we have seen that

L_{X}

has the following property.

For all algebras

A

and for all environments

ρ : X \to A

there is a unique homomorphism

h : L_{X} \to A

such that

h (x) = ρ (x)

We turn this now into a

Definition: Let

Σ

be a signature and

X

a set. A

Σ

-algebra

F_{X}

is called a free
$Σ$ -algebra over
$X$ if for all

Σ

-algebras and all functions

ρ : X \to A

there is a unique homomorphism

h : F_{X} \to A

such that

h (x) = ρ (x)

Remark: The notation

F_{X}

is meant to suggest "free over

X

". Other common notations are

F X

F (X)

T e r m_{Σ} (X)

T (X)

T X

and many more. There is no standardized symbolic notation for free algebras, but the terminology "free algebra" is used uniformly in mathematics and computer science.

Remark on the difficulty of maintaining a uniform notation: Why do we change notation from Var to

X

, or from

L

I

, or from

L_{X}

T e r m (X)

F_{X}

Changing notation always make learning a subject more difficult. On the other hand, it is impossible to keep a consistent and good notation across different levels of abstraction. And as this is something we need to deal with all the time as software engineers anyway, we can as well get used to it now.

For example: On the concrete level one prefers notation that reflects the particularities of the current situation. For instance, in programming we name variables and functions in such a way that the code becomes self-documenting as far as possible; on an abstract level, though, one prefers a more neutral naming convention that is not biased towards a particular application or interpretation.

References

…

Assigning unique names is a more subtle problem then it may look like at first sight.
(1) For example, if we want to have unique postal addresses, we use house numbers to identify a house in a street. We use street names to identify streets in a city. Etc. But note how uniqueness is achieved by fixing a context such as "in a street", "in a city", etc.
(2) It is common to use numbers as addresses (Activity: List 3 examples of this.) One property that makes numbers suitable for this is that there infinitely many numbers. So if one needs a new name, we know that there must be one available.
(3) It is not always convenient or practical to keep a list of all names that have been assigned. (Activity: List examples.) A common solution then is to use hash-functions. Hashes can be used as names because they are (almost) unique and they can be compared for equality, which is the basic operation one needs on names.
(4) Activity: Think about, in the spirit of our discussion about abstract data types, what are the operations that should be supported by an abstract data type of names. (As in the case of sequences, there may be more than one answer.) ↩︎
x is an object-variable, v is a meta-variable. ↩︎
I want to counter the impression that meta-variables are simpler than object-variables. This impression may be created by our examples which suggests that meta-variables are merely placeholders. In fact, meta-variables can carry the same complicated structure as object variables. For example, both exp and Var are meta-variables, with exp being a bound variable and Var a free variable. We also see that the scope of Var changes in the remark: First, Var refers to any set, later Var is in a scope in which it has value
${x, y, z}$ . ↩︎