Alternative modeling tools; Start SMT

# Alternative modeling tools Alloy is just one of a large number of modeling and formal methods tools! Today's assigned reading is a paper from Amazon about the investigations they took into applying Alloy, another tool called TLA+, and several other tools, circa 2014. # Discussion of [Why Amazon Chose TLA+ paper](https://link.springer.com/chapter/10.1007/978-3-662-43652-3_3), Newcombe - The conclusion of this paper speaks to why *this* class is in Alloy: the author agrees that it is the easiest formal methods tool to pick up from scratch. The visual display in the Alloy Analyzer also makes it one of the most engaging ways to learn formal methods. - The related course at Brown actually now uses a different, related tool called Forge that shares much of Alloy's syntax and semantics, but is not used in industry. - One interesting point of the paper is that Alloy works better than expected on moderately complex concurrent problems, even **before** the temporal logic extension. With the additional of temporal logic, it would be interesting to reasses/recompare Alloy and TLA+ in this context. :::info The paper makes a distinction between direct and inductive modeling, what is the difference? ::: In direct modeling, you explicitly have sigs and atoms for every step in the trace you want to model. Lasso traces in a temporal Alloy model are an example of this. In inductive modeling, you avoid explicitly modeling *every* step of a trace, and instead rely on inductive reasoning (similar to inductive proofs). If we simplify the universe to `good` or `bad` states, the intuition for what an inductive model must show is: 1. Every initial state is `good`. 2. Every possible transition *from* a `good` state results in another `good` state. :::success Why do we not worry about transitions *from* `bad` states? ::: <details> <summary>Think, then click! </summary> <br> We don't really care if a `bad` state transitions to another `bad` state, because in that trace we have already "lost". The initial state condition should ensure that if our system is correct, we never end up in a `bad` state to begin with. </details> <br> Inductive models are less successful for reasoning about liveness properties (eventually, a specific `good` thing happens) because they typically only consider a single transition at a time. :::info The paper talks about the importance of the distributed mode of TLA+, what is that? ::: The distributed mode of TLA+ is about how the tool itself works, rather than specifically about what the user is trying to model. As we'll see in the remainder of the semester, Alloy uses an underlying SAT solver. In 2014, most SAT solvers could not be run in a concurrent, distributed way. TLA+ circa 2014, however, supported distrubuting its backend solving across multiple concurrent servers. This allowed analyzering/solving itself to finish faster for large problems. In the next unit, we'll go entirely beyond the realm of Alloy and look to other tools commonly used to model computer systems. # Solving beyond booleans As we saw last week, SAT is an incredibly broadly applicable format in which to represent problems. However, the task of *encoding* a specific problem into SAT can be very difficult, depending on the domain. Consider the example of addition in Alloy: we discussed last week that Alloy needs to model `add` on integers with $n$ bits as an $n$-bit adder. For more complex int arithmetic, Alloy also needs to encode multiplication, etc. :::info An **Satisfiability Modulo Theories (SMT) solver** is a satisfiability solver that can handle domain-specific concepts (or "theories") beyond pure boolean logic. ::: The name "modulo theories" alludes to these solvers still answering the question of satisfiability, but with the addition of (for example) the theory of "bitvectors", which model machine integers (vectors of bits). The main theories supported by SMT solvers that we'll talk about today: 1. Bitvectors (machine integers) 2. Real numbers 3. Integers And some more we'll see some of later: 4. Uninterpreted functions 6. Floating pointers 7. Arrays 8. Strings 9. Data types ## Difference: design vs implementation A core difference from Alloy is whether we are modeling based on a separate design specification, or based on the implementation itself. To use Alloy, an engineer needs to write a new model of their design in the Alloy Analyzer. This is an example of design-level modeling. SMT solvers use a more flexibly, lower-level format. You can think of this like an assembly code for modeling tools: it's more flexible in how different tools build on top of it. Many SMT tools work on *the implementation of a system itself*, rather than a separate design specification (though you could make any choice as a modeler). ## Example For example of an application, consider this program in C, which tries to check the bounds of two variables and their sum: ```C #define MAX 16 void check_bounds(int a, int b) { if (a < MAX) { if (b < MAX) { if (a + b > 2*MAX) { printf("Error!\n"); exit(1); } } } } ``` :::success Does this program ever crash with an error? If so, say when. ::: <details> <summary>Think, then click! </summary> <br> Yes, this could hit the error! C's `int` type uses a two's complement signed representation. If both `a` and `b` are negative numbers, their sum can overflow to a position number that's greater than 32. </details> <br> We can use an SMT solver to find a specific instance of this bug for us. # The Z3 solver The most popular SMT solver is one put out by Microsoft, called the [Z3 solver](https://microsoft.github.io/z3guide/docs/logic/intro/). Note that SMT solvers are also often "theorem provers"; more on this later. z3 won [an award in 2015](https://www.microsoft.com/en-us/research/blog/z3-wins-2015-acm-sigplan-award/) for "developing a software system that has had a lasting influence, reflected in contributions to concepts, in commercial acceptance, or both". z3 can be used as a library within Python (and Rust, but we'll stick to Python for today). If you want to follow along in class today, install z3 with: ``` pip3 install z3-solver ``` And update to the latest with: ``` pip install z3-solver --upgrade ``` You can then use Z3 within a Python file: ``` from z3 import * x = Int('x') y = Int('y') solve(x > 2, y < 10, x + 2*y == 7) ``` # Continuing example: bitvector bounds We can model our previous example in Z3 by using the theory of bitvectors. First, we can model this directly in the `SMT-LIB` format. This is a low-level format (you can think of it almost like an assembly code for verification and modeling tools). It uses a syntax with lots of parenthesis, like many functional programming languages (e.g., Lisp). We'll only briefly show this low-level `SMT-LIB` format; after this, we'll instead work with SMT via libraries in Python or Rust. ```lisp ;; Create two new variables, that are 32-bit integers (declare-const a (_ BitVec 32)) (declare-const b (_ BitVec 32)) ;; Model if-statement 1: ;; Add a constraint that a < 16 (in hex) ;; "bvslt" is for "bitvector signed less than" (assert (bvslt a #x00000010)) ;; Model if-statement 2 ;; Add a constraint that b < 16 (in hex) (assert (bvslt b #x00000010)) ;; Model if-statement 3: ;; Add a constraint that the sum of a+b < 32 (in hex) (assert (bvsgt (bvadd a b) #x00000020)) ;; Check if our model has any satisfying instances (check-sat) ;; Print one such instance (see the note on terminology below) (get-model) ``` If we save this file as `ex.smt`, we can run this as: ``` z3 ex.smt ``` Note, you may need to install `z3` as a separate utility (e.g., `brew install z3`) to run on this file; however, this is not necessary for the assignments in this class where we will use z3 as a library in Python or Rust. :::info *Terminology note:* In Alloy, we used the word model to describe the definitions and constraints you use to model a system. This is generally what the software-engineering community means by the word. The logic community, on the other hand, uses model to mean the same thing that we call an *instance* in Alloy or "assignment" in SAT latest homework: the assignment of values to variables that either satisfies or does not satisfy a set of constraints. Z3's "get model" uses the word "model" like a logician, not a software engineer, and it means the concrete assignment. ::: # Python's z3 library To make it easier to use z3 without writing code in the low-level format, the remaining examples will use Python's library for z3. More next week!