Try   HackMD

CPSC-298 Logic in Software Engineering (Lean)

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

This course is taught by Alexander Kurz (email) in Spring 2024 and, while independent of, complements Jeff Turner's course Using Large Language Models for Coding Assistance. On the surface, it may not be evident how these two courses play together, but Jeff and I designed both courses with common future developments in mind. Roughly speaking, the connection is that neural AI and symbolic AI are coming together in an emerging field called neuro-symbolic AI.

This is a one-unit course open to all students and has no prerequisites.

In this course we will use the interactive theorem prover Lean to learn about logic in general and applications to software engineering in particular. The focus is on hands-on programming in Lean, but we will be able to glimpse at many advanced topics[1] such as algebraic data types, λ-calculus, type systems, parsing, functional programming, higher-order functions, recursion (dynamic programming), specification vs implementation, proofs by induction,

Make sure to join the Discord if you want to follow the course.

(This course ended now.)

Summary

In this course we will focus on learning from the online games available at the Lean Game Server, in particular:

See the appendices below if you want to study Lean beyond of what is required for this introductory course.

Under Propositional Intutionistic Logic, I link the full formal summary of what we are going to learn in terms of rules of logical reasoning.

Lecture 1 (Introduction)

  • Introduction to Interactive Theorem Proving. Neural and symbolic AI.
  • Natural Number Game (NNG)
    • Why successor numbers (no carrying, unbounded, induction, efficient for proofs, ).
    • Getting started is a bit like learning assembly. [2]
    • While working on the NNG make a list of all theorems. They will come in handy as the puzzles get more challenging.
  • Homework: How far in the Natural Number Game can everybody get? Make a list of what you learned.
  • Discord Rules for Homework:
    • Announce when you complete a "world".
    • If you get stuck ask an interesting question.
    • Post interesting answers to interesting questions. Dont post spoilers.
    • (If you are unsure what is meant by "interesting", dont worry, we will figure this out as we go along.)
    • I dont have to give you a grade, but for getting a pass I expect you to participate actively.

Lecture 2 (Natural Number Game)

Why Theorem Proving

Theorem proving can help answering the question "Is my program correct?"

Take the following Python program that squares a number:

def square_number(x):
    return x * x

You will maybe test it for one or two inputs but not spend much time worrying about its correctness because you know that it must be correct.

It is worth asking the question: How do you know that this program is correct?

Because you have a proof of its correctness: The equation x2=xx is true.

One aim of this course is to convince you that the idea of proving programs correct scales to much more complicated programs that are not obviously correct.

Theorem Proving as a Search Problem

Many common problems in programming can be understood as search problems. Think of finding your path out of a maze. You have a start state and a goal and you have to find a path from start to goal.

You search your way out by guessing the next step and, possibly, backtracking if you feel like ending in a cul-de-sac.

(Btw, what is a good way to do this in an actual maze? Have you heard of Ariadne's thread?)

Refinements of this basic idea are at the heart of many algorithms, including those for theorem proving.

Playing the NNG (Tutorial World)

In theorem proving, guessing the next step that brings you closer to the goal is guessing the so-called "tactic" you want to use.

When you work through the examples below, watch out how each time you apply a tactic the goal changes.

On the first level, you only have to make one step by choosing the tactic

rfl

On the second level, you first substitute y by x+7 using the tactic

rw [h]

where rw stands for "rewrite" and h is (a proof of) y=x+7.

On the third level you use

rw [two_eq_succ_one]
rw [one_eq_succ_zero]

to rewrite 2 as succ 1 and then to rewrite 1 as succ 0.

On the fourth level (which we didnt do in class) you learn how to rewrite an equation X=Y from right-to-left instead of from left-to-right (as we have done on the previous levels).

Natural Numbers

Going back to Level 3, to understand how natural numbers work, we should be asking the following question.

What are natural numbers?

We can find this out by clicking on the "N" symbol in

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

which leads us to

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

In English the bullet points tell us

  • 0 is a number
  • If n is a number then succ n is a number.

Note that in some sense this is more powerful than the data type of numbers you have in a programming language. For example, in C, the datatype int has only finitely many elements whereas the datatype N has infinitely many elements since the rule of the second bullet point can be applied arbitrarily often. However big your number, you can always to +1 to get a bigger one.

What About Negative Numbers?

Good question. We can add negative numbers. Let us postpone this for now, though.

Homework

Continue with the NNG.

Lecture 3 (Tutorial World, cont'd)

The aim of this lecture was to get everybody to finish the Tutorial World. Next lecture we will finally get to induction in more detail. If you are still on the Tutorial World, finish it this week and do not hesitate to use the Discord to get you unstuck.

In class, we talked in some detail about level 1/8 which asks you to prove

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

I write the proof out in mathematics style. Your task is then to figure out how to do this in Lean.

2+2=SSO+SSO def of 2=S(SSO+SO) def of +=SS(SSO+O) def of +=SSSSO def of +=SSS1 def of 1=SS2 def of 2=S3 def of 3=4 def of 4

In Lean, you program the proof above by using so-called tactics that name the steps in the right-hand column above. For example, to find the Lean syntax for the definitions of the numbers 1,2,3,4 use

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

and for the definition of + click on the "+" in

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Question: This is a long proof for 2+2=4. What do we do if the numbers get larger?

Answer: Induction.

Homework: For next lecture:

  • Try to get as far as possible with the Addition World.
  • Ask an interesting question on Discord in the channel "w3-question" by the end of next Tuesday. Here are the rules:
    • An interesting question consists of two parts. The first part (typically one or two sentences) sets the scene. The second part (typically one sentence) asks the question.
    • An interesting question ideally both has some technical content and is open-ended.
    • An interesting question doesnt have a yes/no answer.
    • A purely technical question that only aims to get you unstuck on a proof does not count as interesting[3]. But you can take a technical problem as a lead towards an interesting question. This should be an open ended question on the topic of the class. Purely technical questions to help you get unstuck should go in the "general" channel.
  • All other questions are of course also welcome, but they should go on the channel "general".

Lecture 4 (Addition World)

The aim of the lecture was to learn how to use mathematics as an intermediate language to go from English to Lean Tactics.

Introduction

I argued that the essence of programming is to describe infinitely many things by finitely many rules.[4]

For example, a program that sorts lists of numbers is finite, but it can sort any list in the infinite set of all lists of numbers.

Similarly, addition is finite but it allows us to add arbitrarily large numbers. In fact, addition is defined by two rules

add a 0 = a
add a (S b) = S(add a b)

Remark: The aim of the course is to "teach" machines how to reason. Whatever the model of computation we use to define "machine", all a machine is able to do is to push symbols around. It is therefore worth looking at the two equations (rules) above from the point of view of symbol pushing:

  • The first rule says that 0 can be removed.
  • The second rule says that an "inner" S can be moved to the "outside".
    Note how powerful the second equation is. If you wanted to implement it via a symbol pushing model of computation it could get quite complicated. For example, given SS0+SSSS0, we might first need a for-loop to process the string from left to right in order to find the relevant S in SS0+SSSS0 and then we need another for-loop to move it out so that we end up with S(SS0+SSS0).[5]

Mathematics as a Programming Language

I started with some general remarks on the stack of abstractions in computer science, mathematics and science more generally. The purpose of this lecture was to introduce an intermediary level of abstraction between a theorem and its implementation in Lean.

In our example (Level 1/5), the question was how to go from

image

to

induction n rw [add_zero] rfl rw [add_succ] rw [n_ih] rfl

The answer to the question is to write down the proof in mathematics first as follows.

Proof: We prove 0+n=n for all nN by induction on n.

Case n=0:
0+0=0add_zero

Case n=Sk:
0+Sk=S(0+k)add_succ=Skinduction hypothesis

Remark:

  • The proof comes in two columns. On the left, we have a sequence of equation. On the right, we have a justification for each equation.
  • Writing math, you first write the equations and then the justification.
  • Translating the math to Lean, you use the right-hand column to name the tactics and Lean will construct the equations.

Homework:

  • Go through the Lean proof and line it up against the math proof line by line. The Lean proof has more detail, but the same structure.
  • Choose a proof by induction on a level greater than 1/5 and post your math proof on Discord w4-question together with a question related to the topic of today.

Remark: If you want to use my method for typesetting a math proof in markdown and latex make a (free afaik) account with https://hackmd.io . You can use the code for my math proof above as a template:

*Proof:* We prove $0+n=n$ for all $n\in\mathbb N$ by induction on $n$.

Case $n=0$: 
\begin{align}
0+0 &= 0 & \text{add_zero}
\end{align}

Case $n=S\,k$: 
\begin{align}
0+ Sk &= S(0+k) & \text{add_succ}\\
&= Sk & \text{induction hypothesis}\\
\end{align}

The best way to learn how to adapt the above markdown/latex code for your own purposes is to experiment and talk to your favourite LLM. (Also ask on our Discord.) Here are some hints:

  • Markdown: * for italics, ** for bold, backquotes for code.
  • Latex:
    • $...$ for math
    • $$...$$ for displayed math
    • align for equations, & for separating columns
    • \in for , \mathbb N for N

Finally, you can take a screenshot of the proof in view mode and post it on Discord.

Further Reading

Goedel, Escher, Bach.

Lecture 5 (Addition, cont'd)

Lecture 6 (Logic)

Guest lecture by Dr Jonathan Weiberger.

The topic was the " Tutorial" of A Lean Intro to Logic.

Lecture 7 (Conjunction)

In the lecture I gave some more general background connecting what we have done so far with what Jonathan had done in the last lecture.

I spent some time on answering a question about what it means for a programming language to be Turing complete. I also made a remark connecting while loops with induction.

Another question we discussed was how to apply theorem proving to programming languages such as C which do not come equipped with a proof assistant.

Finally, we went over Answers to Week 5 Questions , paying close attention to the idea of literate programming and to writing programs simultaneously in code and in math.

Lecture 8 (Implication)

We started the lecture by discussing the w7-questions on Discord.

Then we started on the " Tutorial" in the logic game:

Recap on the logic of "and"

From the " Tutorial".
The logic of and is about lists. If you have names for all elements of a list then a proof is just a nested list (the elements of which can again be lists) as in

image


If you dont have names already, you can make them using have as in

image

The logic of "and" and "implication"

The " Tutorial" in the logic game.

In class we completed Level 1.

We also talked about the so-called Howard-Curry correspondence:

Type Theory Logic
Data Type Proposition
Program Proof

Homework: (See also on Canvas)

Lecture 9 (Implication, cont'd)

We continued with " Tutorial" in the logic game

Recall the following about how to typeset math

math source
λ \lambda
\mapsto
\langle
\rangle
\and or \wedge

Also note that applying a function f to an argument x is written f x with a space between f and x.

Lecture 10 (Disjunction)

We reviewed introduction and elimination rules for and .

Writing a Math Proof

Here is a template how to write out

image

in math style:

(1) Assume (SC)(SD)
(2) SC
(3) SD

(4) Assume S
(5) C
(6) D
(7) CD

(8) SCD

Exercise: Annotate each line in the proof above by the rule that is used to derive it. For example, line (5) is derived from lines (2) and (4) using -elimination (modus ponens).

Or-Tutorial

Then we started on the -tutorial. Essentially, all you need to know are how to do the first three levels.


image


image


image


From now on, everything in the -tutorial is a combination of things we have seen before. See whether you can get through the rest of the -tutorial now level 4 we did in class:

image

Here is Level 4 in mathematics:

(1) Assume CO

(2) Case C
(3) OC

(4) Case O
(5) OC

(6) OC

For short proofs this notation looks a bit heavy (too much bureaucracy). But as proofs gets longer this way of structuring proofs becomes a powerful method.

Remark: In the proof above, I write "Case" to label an assumption that arises from a disjunction (CO in this example).

Lecture 11 (Parsing and Type Inference)

In this lecture we took another look at the Level 4 proof above, both from a programming and from a math point of view. Specifically, we learned how Lean uses an algorithm for Type Inference to compute the math proof from the program. I go through this in detail again below.

We start from the Lean program

exact or_elim h (λ c ↦ or_inr c) (λ o ↦ or_inl o)

Exercise: Paste this into Lean.

This program doesn't show any types. We can put in all the types, if we want:

exact ( ( or_elim 
          (h : C∨O) 
          (λ c:C ↦ ((or_inr (c:C)):O∨C) : C→O∨C) 
          (λ o:O ↦ ((or_inl (o:O)):O∨C) : O→O∨C) 
        ) : O∨C
      )

Exercise: Paste this into Lean. Remove the types and see whether you can put them back in again.

Remark: We understand now why type inference is important. If we had to always write and read the types, both writing and reading would be too cumbersome.

Exercise: (This requires some time and patience.) Now we have the Lean program annotated with all the types, can you match the Lean proof with the math proof?

By now you should have some idea of how to obtain the math proof from Lean proof once the Lean proof has been annotated with all the types.

But how does Lean compute the types?

Short answer: Lean has a type inference algorithm.

For the longer answer, it is worth making a quick detour and explain how to translate the string

or_elim h (λ c ↦ or_inr c) (λ o ↦ or_inl o)

into a tree (the so-called abstract syntax tree (AST)).

The type inference algorithm does not work directly on the string, but rather on the AST.

Excursion on Parsing

In class we started with the example 1+2*3. Does this compute to 7 or 9? Why? How many ways are there to put parentheses into the expression?

One way to think about parsing is to put the parentheses back into an expression or program.

A better way to think about parsing is as translating strings into trees.

For example, the string

or_elim h (λ c ↦ or_inr c) (λ o ↦ or_inl o)

can be turned into a tree (where I indicate the tree structure using indentation)

or_elim 
    h 
    λ 
        c 
        or_inr 
            c 
    λ 
        o
        or_inl
            o

Exercise: Write this tree out using pen-and-paper connecting each parent to its children by drawing lines as in this picture.

Type Inference

We want to understand how to compute

exact ( ( or_elim 
          (h : C∨O) 
          (λ c:C ↦ ((or_inr (c:C)):O∨C) : C→O∨C) 
          (λ o:O ↦ ((or_inl (o:O)):O∨C) : O→O∨C) 
        ) : O∨C
      )

from

or_elim h (λ c ↦ or_inr c) (λ o ↦ or_inl o)

using the AST

or_elim 
    h 
    λ 
        c 
        or_inr 
            c 
    λ 
        o
        or_inl
            o

as the intermediate data structure.

The information we have is the first line of the program

  1. image

and the type of or_elim (available by finding the button image in the logic game) which is given as

  1. image

Exercise:

  • From 1. read off the type of or_elim at the root of the AST.
  • From 2. read off the type of h in the next line of the AST.
  • From 1. compute the type of (λ c ↦ or_inr c).
  • From the previous item, compute the type of c.
  • From the previous item and the type of or_inr given by image compute the type of or_inr c.
  • Etc

In class we went in detail through this and ended up with the following AST. Blue annotations refer to the symbols in 1. and green annotations refer to the symbols in 2. (I admit it takes some time and patience to puzzle this out.)

image

Exercise: Fill in the remaining question marks in the lower right of the picture.

Lecture 12 (Intuitionistic Propositional Logic)

Most of the lecture consisted in discussing parsing and type inference, but we also returned to proofs in the -Tutorial.

Parsing and Type Checking

We discussed the questions on Discord on Parsing and Type Checking/Inference.

I wrote a note on parsing on the example how parse trees explain the ambiguity of certain natural language sentences.

Proofs in the -Tutorial

We did Level 7 of the -Tutorial:

image

Or, in math notation:

Theorem: If G(HU) then (GH)(GU).

Here is the math proof (let me know if that doesnt help to find the Lean proof):

We assume that G(HU).

(2) G
(3) HU
(4) Assume: H

(5) GH

(6) HGH
(7) H(GH)(GU)

(8) Assume: U

(9) GU

(10) UGU
(11) U(GH)(GU)
(12) (GH)(GU)

We have proved (GH)(GU) as required.

Exercise: For each numbered line in the proof above, explain which rule was used to derive it. Here are some examples:

(2) follows by applying -elimination to (GH)(GU)
(6) follows from (4) and (5) using -introduction
(11) follows from (10) using -introduction

The complete list of rules available for the exercise is:

Intuitionistic Propositional Logic

-introduction

ABAB

-elimination

ABAABB

-introduction

Assume A, prove BAB

-elimination

AABB

-introduction

AABBAB

-elimination

ABACBCC

Lecture 13 (Negation)

We started with a general discussion about the difference of classical propositional logic, or Boolean logic as given by truth tables and the intuitionistic logic implemented in Lean and summarized by the introduction and elimination rules above.

Then we started on the ¬-Tutorial, which adds negation to our logic.

We write ¬A as the formal notation for "not A".

In the logic of Lean (intuitionistic logic), negation is defined via implication.

¬A is A in math notation and A -> False in Lean notation. or False is a proposition that does not have a proof.

In terms of the rules for Inutitionstic Propositional Logic from the previous section (Lecture 12), this simply means that there is no introduction rule for or False.

The elimination rule is sometimes known as ex falso quod libet, that is, False implies any formula.

¬-elimination

As

Exercise/Activity: Match up the rule of ex falso quodlibet with the truth table of implication.

Here are the proofs we did in class for the first to levels of the ¬-Tutorial:

image

image

If you are wondering how to find the proofs, rewrite ¬ by and use the knowledge from the previous tutorials:

  • For Level 1, ¬False becomes False → False, which, like any AA has λ f ↦ f as a proof (see -Tutorial, Level 2).
  • For Level 2, the assumption h is of type S → False. We also have an element s of type S. Hence we can apply h to s writing h s. From checking the types we know that h s is of type False, so we can use false_elim to cast h s to type B.

Now give levels 3 and 4 a try. I know negation is the most confusing connective, but if you remember to translate negation into implication you can get it to work.

Lecture 14 (Negation, cont'd)

In this lecture we look first again at negation.

Then we connect what we have done this semester with related topics in computer science and software engineering (see the epilogue).

Epilogue

Looking back, this semester we have done a lot of "low level" coding. But how does this hang together with abstract and high level concepts in computer science and software engineering?

Thanks to the Natural Number Game and the Lean Intro to Logic, we got some hands-on experience with a wide variety of important topics in theoretical computer science and software engineering. Each of this topics is a worthy topic of study in its own right. But if you come across these topics in the future, you will be able to use your knowledge of Lean to get yourself a good understanding.

In the last lecture, I will briefly go throught the topics and links below and explain how they tie in what we have done in class this semester.

There are a lot of links on the WWW but I is not easy to find ones that discuss the big ideas in non-technical but interesting and precise way. If you find links that help you understand these concepts better, please send them to me.

Appendix A: Running Lean Locally

If you want to work more seriously with Lean, you should install Lean on your computer and program Lean using an editor. Here are the instructions in case you use VSCode as your editor.

  1. In Vscode install the lean4 extension.
  2. Following https://lean-lang.org/lean4/doc/quickstart.html , in Vscode, clicking on the -symbol, create a new standalone project: Click New Project, then Project: Create Standalone Project.
  3. I am curious about LeanDojo. I followed steps 1 and 2 and then
    ​​​​lean --version
    ​​​​# update lakefile.lean
    ​​​​lake update LeanCopilot
    ​​​​lake exe LeanCopilot/download
    ​​​​lake build
    
    but this is not needed for the NNG.[6]
  4. You can have a look at Lean-Games to see how I implemented the beginnings of the natural number game.
  5. Once you start working on a Lean-file in VSCode, do not forget Infoview: Toggle Infoview to view the messages of the interactive proof assistant.

Appendix B: General Lean Resources

Further Lean Resources:

Books:


  1. See the Epilogue below for some links. ↩︎

  2. Btw, learning some assembly gives a much deeper understanding of how computers work. Similarly, learning low-level theorem proving teaches a lot about proofs. And, don't worry, we will be working our way up to higher levels. ↩︎

  3. Maybe I need to invent another term than "interesting" for what I want to express. ↩︎

  4. The technical term for this is recursion. If you want to learn more, I recommend the book Goedel, Escher, Bach. ↩︎

  5. If you wonder what role the parentheses play in symbol pushing, the answer is that I think of symbol pushing as manipulating trees, not strings. This is called "abstract syntax" in programming. Ask me if you want to know more. ↩︎

  6. In fact, LeanDojo didn't work for me. Maybe the reason is that for the NNG we define the natural numbers from scratch instead of using the Lean built-in numbers. Without LeanDojo, you can work with the lakefile.lean you get from 2. and bypass 3. See this video for an introduction to LeanDojo. More info on the homepage of Kaiyu Yang. ↩︎