How does induction really work? (2018)

Another title of this note could have been

Theorem proving with Isabelle and Idris

(thanks to Samuel Balco for implementing the Isabelle and Idris code)

In this note we will again try to understand more deeply something we already know how to do, namely how to reason with equations. And again, this understanding will allow us to put on a machine something that used to be the preserve of humans: automated reasoning.

But before we can go there we need to understand mathematical induction better.

Examples of Induction

Induction can come in many different forms:

Numbers

We can show, to give just one example, that the equation

1 + 2 + \dots + n = \frac{1}{2} \cdot n \cdot (n + 1)

holds for all positive integers, as follows. If $n = 1$, then the LHS is 1 (you start and end at 1, so 1 is all you have) and the RHS is $\frac{1}{2} \cdot 1 \cdot 2 = 1$ as well. If $n = k + 1$, then

\begin{aligned} 1 + 2 + \dots + n & = 1 + 2 + \dots + (k + 1) \\ & = (1 + 2 + \dots + k) + (k + 1) \\ & = \frac{1}{2} \cdot k \cdot (k + 1) + (k + 1) \\ & = \left(\frac{1}{2} \cdot k + 1\right) \cdot (k + 1) \\ & = \frac{1}{2} \cdot (k + 2) \cdot (k + 1) \\ & = \frac{1}{2} \cdot (n + 1) \cdot n \\ & = \frac{1}{2} \cdot n \cdot (n + 1) \end{aligned}

Exercise: Go through the equational reasoning above. Can you justify/explain each step?
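A program cannot replace the induction argument, because it can only test finitely many cases, but it can catch slips in the algebra. The minimal Python sketch below (the function names `sum_upto` and `closed_form` are our own) checks the base case, the inductive step $\frac{1}{2} k (k + 1) + (k + 1) = \frac{1}{2} (k + 1)(k + 2)$, and the formula itself for small values, using exact fractions to avoid rounding.

```python
from fractions import Fraction

def sum_upto(n: int) -> int:
    """LHS of the equation: 1 + 2 + ... + n, by brute force."""
    return sum(range(1, n + 1))

def closed_form(n: int) -> Fraction:
    """RHS of the equation: 1/2 * n * (n + 1), computed exactly."""
    return Fraction(1, 2) * n * (n + 1)

# Base case n = 1.
assert sum_upto(1) == closed_form(1) == 1

# Inductive step: adding k + 1 to the formula for k gives the formula for k + 1.
for k in range(1, 100):
    assert closed_form(k) + (k + 1) == closed_form(k + 1)

# Direct comparison for n = 1..100 (evidence, not a proof).
for n in range(1, 101):
    assert sum_upto(n) == closed_form(n)
print("checked n = 1..100")
```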

Remark: Why is it correct to apply the equation we want to prove in the proof of the equation itself? Isn't circular reasoning unsound? It depends … some forms of circularity are justifiable, others are not. In the example above, we prove the equation for $n = k + 1$ by using the equation for $n = k$. This is ok because, roughly speaking, $k$ is smaller than $k + 1$. One way to understand this better is by making an analogy with programming. If I define, e.g., the factorial function f by f(n) = if n=0 then 1 else n*f(n-1), I have made a circular definition. But is this a definition? Does it actually define a function? Or, in other words, does f terminate on all inputs? Yes, f terminates on all natural numbers … because each recursive call f(n-1) calls f with a smaller argument n-1.
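The factorial example can be run as it stands. In this minimal Python sketch the guard against negative inputs is our own addition: starting from $n = -1$, the recursive calls $f(-2), f(-3), \dots$ never reach the base case, which shows that the argument has to shrink within the natural numbers, where it cannot decrease forever.

```python
def f(n: int) -> int:
    """Factorial: f(n) = if n = 0 then 1 else n * f(n - 1).

    The definition is circular, but every recursive call is on n - 1,
    a smaller natural number, so the chain of calls must reach n == 0.
    """
    if n < 0:
        # Outside the natural numbers the argument never reaches 0.
        raise ValueError("f is defined only on natural numbers")
    return 1 if n == 0 else n * f(n - 1)

print([f(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
```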

Expressions

We defined a class of arithmetic expressions in BNF as

    num ::= 1 | num +1
    exp ::= num | exp + exp | exp * exp

This is an inductive definition.
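One way to see why the grammar is an inductive definition is to turn each alternative into a constructor of a data type, so that every exp is built by finitely many rule applications. A minimal Python sketch (the class names `One`, `Succ`, `Plus`, `Times` and the function `value` are our own choices): the evaluator has exactly one case per grammar rule and terminates because each recursive call is on a smaller sub-expression, the same argument as for the factorial.

```python
from dataclasses import dataclass
from typing import Union

# num ::= 1 | num +1
@dataclass(frozen=True)
class One:
    pass

@dataclass(frozen=True)
class Succ:
    pred: "Num"

Num = Union[One, Succ]

# exp ::= num | exp + exp | exp * exp
@dataclass(frozen=True)
class Plus:
    left: "Exp"
    right: "Exp"

@dataclass(frozen=True)
class Times:
    left: "Exp"
    right: "Exp"

Exp = Union[One, Succ, Plus, Times]

def value(e: Exp) -> int:
    """Evaluate by structural recursion: one case per grammar rule."""
    if isinstance(e, One):
        return 1
    if isinstance(e, Succ):
        return value(e.pred) + 1
    if isinstance(e, Plus):
        return value(e.left) + value(e.right)
    if isinstance(e, Times):
        return value(e.left) * value(e.right)
    raise TypeError(f"not an exp: {e!r}")

two = Succ(One())
print(value(Times(Plus(two, One()), two)))  # (2 + 1) * 2 = 6
```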

Transitive Closure

We defined the transitive closure of a relation $R$ as the smallest relation $R^{+}$ such that

- $R \subseteq R^{+}$, and
- $x R^{+} y \wedge y R^{+} z \Rightarrow x R^{+} z$.

This is an inductive definition.
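For a finite relation, "the smallest relation closed under the two rules" can be computed directly: start with $R$ and keep applying the transitivity rule until no new pairs appear. A minimal Python sketch, assuming the relation is finite and given as a set of pairs (the function name `transitive_closure` is ours):

```python
def transitive_closure(r: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Smallest relation containing r and closed under x R+ y, y R+ z => x R+ z.

    Every pair added is forced by one of the two rules, so the result is
    the smallest such relation; for finite r the loop must stop.
    """
    closure = set(r)                      # first rule: R is a subset of R+
    while True:
        new_pairs = {
            (x, z)
            for (x, y) in closure
            for (y2, z) in closure
            if y == y2
        } - closure                       # second rule, applied everywhere once
        if not new_pairs:
            return closure
        closure |= new_pairs

edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(edges)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```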

Equational Reasoning

We added some equations to the exp above, for example

X + ( Y + Z ) = ( X + Y ) + Z

and then claimed that for all numbers n and m, the equation

n + m = m + n

follows already, even without having an equation for commutativity.

What do we mean when we say that an equation follows from some others?

Given a set of equations $E$, can we define the set $E^{'}$ of all equations that follow from $E$? An equation is in this set $E^{'}$ if you can derive it from the equations in $E$ using the usual rules of equational reasoning from high-school algebra.

This is an inductive definition.

What is Induction?

Numbers, context-free grammars, transitive closure, equational reasoning, … all look different.

Can we explain how they are all instances of the same general phenomenon?

In all four examples, we define a set of elements: $N$, $R^{+}$, exp, $E^{'}$. In each case the set is typically infinite, so writing it down with a finite number of rules is doing something clever.

Example: In the case of numbers, we can write the two rules as

\frac{}{0 \in N} \qquad \frac{n \in N}{S n \in N}

Of course, the trick is that the rule on the right has a free variable $n$ that can be instantiated with any element of $N$.

What is inductive about this definition?

This is the crucial point:

When we say that the two rules above "define $N$ inductively" or "are an inductive definition of $N$", we are really saying that $N$ consists of all the elements, and only those, that can be formed according to the rules.

In other words, $N$ is the smallest set closed under the two rules. ^[1]
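"The smallest set closed under the two rules" can be made concrete by building $N$ in stages: start from the empty set and apply both rules over and over. Every element appears at some finite stage, and nothing else ever appears. A minimal Python sketch, representing numerals as strings such as `"S S 0"` (an encoding chosen only for illustration; `apply_rules` and `stage` are hypothetical names):

```python
def apply_rules(current: frozenset[str]) -> frozenset[str]:
    """One round of both rules: the axiom yields 0, and each n yields S n."""
    return current | {"0"} | {"S " + n for n in current}

def stage(k: int) -> frozenset[str]:
    """All numerals derivable with at most k rounds of rule applications."""
    s: frozenset[str] = frozenset()
    for _ in range(k):
        s = apply_rules(s)
    return s

for k in range(4):
    print(k, sorted(stage(k), key=len))
# 0 []
# 1 ['0']
# 2 ['0', 'S 0']
# 3 ['0', 'S 0', 'S S 0']
```

$N$ is the union of all the stages. No single finite stage is all of $N$, which is why the definition is given by the rules rather than by a list.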

Exercise: What are the rules for the example of arithmetic expressions and for transitive closure? Write them in a form that resembles as much as possible the two rules for the natural numbers.

For a solution see the footnote. ^[2]

The case of equational reasoning is important for too many reasons to start making a list. We will discuss it in some detail in the next section.

Equational Reasoning

We define inductively the set of all equations $e_{1} \approx e_{2}$ that we want to study for expressions. We write $n, m, \dots$ to denote nums and $e$'s to denote exps.

\frac{}{1 \in \mathit{num}} \qquad \frac{n \in \mathit{num}}{S n \in \mathit{num}}

Here I write $S n$ instead of $n + 1$ to distinguish the $+ 1$ from addition. The rules for expressions are

\frac{n \in \mathit{num}}{n \in \mathit{exp}} \qquad \frac{e_{1} \in \mathit{exp} \quad e_{2} \in \mathit{exp}}{e_{1} + e_{2} \in \mathit{exp}}

I omit the rule for multiplication because it is not needed in the following.

As an axiom on expressions we want to have for now only

e_{1} + (e_{2} + e_{3}) \approx (e_{1} + e_{2}) + e_{3}

and we want to show that

n + m \approx m + n

follows already from associativity. Of course, this is only an example. What we are really interested in is to understand what it means for one equation to follow from other equations.

As indicated above, the idea is to inductively define the set of all equations that follow. For this, we need to write out the rules of equational reasoning. They are as follows:

\frac{}{e \approx e} \, (\mathit{refl}) \qquad \frac{e_{1} \approx e_{2}}{e_{2} \approx e_{1}} \, (\mathit{sym}) \qquad \frac{e_{1} \approx e_{2} \quad e_{2} \approx e_{3}}{e_{1} \approx e_{3}} \, (\mathit{trans})

and

\frac{e_{1} \approx e_{1}^{'} \quad e_{2} \approx e_{2}^{'}}{e_{1} + e_{2} \approx e_{1}^{'} + e_{2}^{'}} \, (\mathit{cong})

In our example, we add to these rules of equational reasoning, as announced, the axiom

\frac{}{e_{1} + (e_{2} + e_{3}) \approx (e_{1} + e_{2}) + e_{3}} \, (\mathit{assoc})
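The rules (refl), (sym), (trans), (cong) and (assoc) can be read as the constructors of the inductively defined set of derivable equations. A minimal Python sketch in the spirit of an LCF-style proof kernel (a swapped-in illustration, not the Isabelle or Idris code mentioned in this note): an equation counts as derived when it is the result of applying these rule functions. The class and function names are our own, `Var` stands for the metavariables $e_1, e_2, \dots$, and since Python cannot stop anyone from calling `Eq` directly, "built only by the rules" is a convention here, not a guarantee.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class One:
    """The numeral 1."""

@dataclass(frozen=True)
class Var:
    """A placeholder for an arbitrary exp, playing the role of e1, e2, e3."""
    name: str

@dataclass(frozen=True)
class Plus:
    left: "Exp"
    right: "Exp"

Exp = Union[One, Var, Plus]

@dataclass(frozen=True)
class Eq:
    """An equation lhs ≈ rhs; by convention built only by the rules below."""
    lhs: Exp
    rhs: Exp

# One function per rule: premises in, conclusion out.
def refl(e: Exp) -> Eq:
    return Eq(e, e)

def sym(p: Eq) -> Eq:
    return Eq(p.rhs, p.lhs)

def trans(p: Eq, q: Eq) -> Eq:
    if p.rhs != q.lhs:
        raise ValueError("trans: the middle expressions must agree")
    return Eq(p.lhs, q.rhs)

def cong(p: Eq, q: Eq) -> Eq:
    return Eq(Plus(p.lhs, q.lhs), Plus(p.rhs, q.rhs))

def assoc(e1: Exp, e2: Exp, e3: Exp) -> Eq:
    return Eq(Plus(e1, Plus(e2, e3)), Plus(Plus(e1, e2), e3))

def show(e: Exp) -> str:
    if isinstance(e, One):
        return "1"
    if isinstance(e, Var):
        return e.name
    return f"({show(e.left)} + {show(e.right)})"

# Example derivation: two instances of (assoc) chained by (trans).
a, b, c, d = (Var(x) for x in "abcd")
t = trans(assoc(a, b, Plus(c, d)), assoc(Plus(a, b), c, d))
print(show(t.lhs), "≈", show(t.rhs))
# (a + (b + (c + d))) ≈ (((a + b) + c) + d)
```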

To warm up, and to understand how equational reasoning is inductive, let us prove something even simpler than $n + m \approx m + n$, namely

1 + n \approx n + 1

The most important idea to understand in this note is the following.

To say that the equation $1 + n \approx n + 1$ follows from $(\mathit{assoc})$ by equational reasoning is to say that $1 + n \approx n + 1$ is an element of the set inductively defined by the rules $(\mathit{refl})$, $(\mathit{sym})$, $(\mathit{trans})$, $(\mathit{cong})$, $(\mathit{assoc})$.

If you have any doubts about this statement, stop and think …

How do we show that $1 + n \approx n + 1$ is an element of that set?

How would we do this in high-school algebra style? If $n = 1$, then

1 + 1 = 1 + 1

If $n = S k$, then

1 + S k = 1 + (k + 1) = (1 + k) + 1 = (k + 1) + 1 = S k + 1

Exercise: Can you justify all the steps in the chain of reasoning above?

Exercise: Write the chain of reasoning out in a way that shows, at each step, which of the rules $(\mathit{refl})$, $(\mathit{sym})$, $(\mathit{trans})$, $(\mathit{cong})$, $(\mathit{assoc})$ is applied.

If done properly, this last exercise shows that we understand in full detail how equational reasoning works. This should mean that we can implement it on a machine.
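As a first taste of putting this on a machine, here is a minimal Python sketch of one possible answer to the exercise (our own illustration, not the Isabelle or Idris code): rule applications are function calls that build equations, and the proof by induction on $n$ becomes a function that recurses on $n$, just as the factorial recurses on a smaller argument. Assumption: following the BNF `num ::= 1 | num +1`, the numeral $S k$ is represented literally as the expression $k + 1$, so $S k + 1$ is $(k + 1) + 1$ and no extra axiom is needed. The names `numeral` and `one_plus_comm` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class One:
    """The numeral 1."""

@dataclass(frozen=True)
class Plus:
    left: "Exp"
    right: "Exp"

Exp = Union[One, Plus]

@dataclass(frozen=True)
class Eq:
    lhs: Exp
    rhs: Exp

def refl(e: Exp) -> Eq:
    return Eq(e, e)

def trans(p: Eq, q: Eq) -> Eq:
    if p.rhs != q.lhs:
        raise ValueError("trans: the middle expressions must agree")
    return Eq(p.lhs, q.rhs)

def cong(p: Eq, q: Eq) -> Eq:
    return Eq(Plus(p.lhs, q.lhs), Plus(p.rhs, q.rhs))

def assoc(e1: Exp, e2: Exp, e3: Exp) -> Eq:
    return Eq(Plus(e1, Plus(e2, e3)), Plus(Plus(e1, e2), e3))

def numeral(m: int) -> Exp:
    """The numeral for m >= 1: 1, and S k written as the exp k + 1."""
    return One() if m == 1 else Plus(numeral(m - 1), One())

def one_plus_comm(n: Exp) -> Eq:
    """Derive 1 + n ≈ n + 1 for a numeral n, by recursion (induction) on n."""
    if n == One():
        return refl(Plus(One(), One()))       # n = 1: (refl)
    k = n.left                                # n = S k, i.e. the exp k + 1
    step1 = assoc(One(), k, One())            # 1 + (k + 1) ≈ (1 + k) + 1
    ih = one_plus_comm(k)                     # 1 + k ≈ k + 1 (induction hypothesis)
    step2 = cong(ih, refl(One()))             # (1 + k) + 1 ≈ (k + 1) + 1
    return trans(step1, step2)                # 1 + S k ≈ S k + 1

for m in range(1, 6):
    n = numeral(m)
    assert one_plus_comm(n) == Eq(Plus(One(), n), Plus(n, One()))
print("1 + n ≈ n + 1 derived for n = 1..5")
```

Each call to `one_plus_comm` produces a finite derivation for one particular numeral; the recursion itself mirrors the induction argument that covers all of them.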

The next section will look at two programming languages, Isabelle and Idris, in which these proofs can be implemented.

Excursion on Equational Logic

The topic of the current lecture is not logic but the more general phenomenon of induction. Nevertheless, while analysing familiar reasoning with equations from the point of view of induction, we discovered almost all the rules of equational logic. So we may as well give them here.

Equational logic is the part of logic that is only concerned with proving new equations from old ones. The rules of equational logic are the ones we have seen above (we now use $t$ for "term" where we used $e$ for "expression" before):

\frac{}{t \approx t} \, (\mathit{refl}) \qquad \frac{t_{1} \approx t_{2}}{t_{2} \approx t_{1}} \, (\mathit{sym}) \qquad \frac{t_{1} \approx t_{2} \quad t_{2} \approx t_{3}}{t_{1} \approx t_{3}} \, (\mathit{trans})

and, for every $n$-ary operation $f$, a congruence rule

\frac{t_{1} \approx t_{1}^{'} \quad \dots \quad t_{n} \approx t_{n}^{'}}{f (t_{1}, \dots, t_{n}) \approx f (t_{1}^{'}, \dots, t_{n}^{'})} \, (\mathit{cong})

and a rule for substituting terms for variables, which we won't explain now but will encounter again later:

\frac{t_{1} \approx t_{2}}{t_{1} [x \mapsto t] \approx t_{2} [x \mapsto t]} \, (\mathit{subst})
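With arbitrary $n$-ary operations, (cong) takes $n$ premises, and (subst) needs a function that replaces a variable by a term. A minimal Python sketch under our own assumptions: operation symbols are plain strings, constants such as 1 are 0-ary operations, one variable is substituted at a time, and all names are hypothetical. It does not check that `op` receives as many premises as its arity; (refl), (sym) and (trans) work exactly as for expressions and are omitted.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    """An operation symbol applied to a tuple of argument terms."""
    op: str
    args: tuple

Term = Union[Var, App]

@dataclass(frozen=True)
class Eq:
    lhs: Term
    rhs: Term

def cong(op: str, premises: list[Eq]) -> Eq:
    """From t_i ≈ t_i' (i = 1..n) derive op(t_1, ..., t_n) ≈ op(t_1', ..., t_n')."""
    return Eq(App(op, tuple(p.lhs for p in premises)),
              App(op, tuple(p.rhs for p in premises)))

def substitute(t: Term, x: str, s: Term) -> Term:
    """Compute t[x ↦ s]: replace every occurrence of variable x in t by s."""
    if isinstance(t, Var):
        return s if t.name == x else t
    return App(t.op, tuple(substitute(arg, x, s) for arg in t.args))

def subst(p: Eq, x: str, s: Term) -> Eq:
    """From t1 ≈ t2 derive t1[x ↦ s] ≈ t2[x ↦ s]."""
    return Eq(substitute(p.lhs, x, s), substitute(p.rhs, x, s))

def show(t: Term) -> str:
    if isinstance(t, Var):
        return t.name
    if not t.args:                            # constants are 0-ary operations
        return t.op
    return f"{t.op}({', '.join(show(arg) for arg in t.args)})"

x, y, z = Var("x"), Var("y"), Var("z")
one = App("1", ())
assoc = Eq(App("+", (x, App("+", (y, z)))), App("+", (App("+", (x, y)), z)))
inst = subst(assoc, "x", one)
print(show(inst.lhs), "≈", show(inst.rhs))   # +(1, +(y, z)) ≈ +(+(1, y), z)
```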

Summary of what we learned so far

An inductive definition defines the smallest set closed under a finite set of rules.
This accounts for examples as different as
- the set of natural numbers,
- the set of programs of a programming language specified in BNF (i.e., by a context-free grammar),
- the set of equations that can be derived from a given set of assumptions.

Thus we have a set of analogies

| mathematics | computer science | logic |
|---|---|---|
| numbers | programs | theorems |

with the sets of numbers/programs/theorems being defined inductively. ^[3] Of course, much more can be defined inductively; for example, we could also define the set of proofs inductively.

So can we compute with programs and theorems as we can compute with numbers?

Arithmetic Expressions in Isabelle and Idris

See the next lecture.

You could say that $\frac{}{0 \in N}$ does not look like a rule, because it has no premises. I'd rather say that it has an empty set of premises.
The context-free grammar
```
num ::= 1 | num +1
exp ::= num | exp + exp | exp * exp
```
can be written as a set of rules as follows.

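$\frac{}{1 \in \mathit{num}} \qquad \frac{n \in \mathit{num}}{n + 1 \in \mathit{num}} \qquad \frac{n \in \mathit{num}}{n \in \mathit{exp}} \qquad \frac{e_{1} \in \mathit{exp} \quad e_{2} \in \mathit{exp}}{e_{1} + e_{2} \in \mathit{exp}} \qquad \frac{e_{1} \in \mathit{exp} \quad e_{2} \in \mathit{exp}}{e_{1} * e_{2} \in \mathit{exp}}$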
The mathematical definition of transitive closure
- $R \subseteq R^{+}$ and
- $x R^{+} y \wedge y R^{+} z \Rightarrow x R^{+} z$

can be written as a set of rules as

$\frac{(x, y) \in R}{(x, y) \in R^{+}} \qquad \frac{(x, y) \in R \quad (y, z) \in R^{+}}{(x, z) \in R^{+}}$
As often in research, what starts out as a curious analogy can be turned into an important research programme. So let us take, for a moment, the analogy between programs and theorems seriously. If programs are like theorems, then parsing is like proving. This is the basic idea of two papers by Lambek (1958, 1961) that spawned whole research areas in both linguistics and logic.