Symbolic Execution

:::warning Announcements - Reminder: final presentation timing form. - Next workshop session: [final project](https://hackmd.io/@RbcpVXJZQJa0j9NcVfk5xg/SJLVI_7xA) group formation. ::: # Symbolic execution Unlike most of the modeling we've done so far on system *design*, **symbolic execution** is a strategy for reasoning about computer systems that takes as input actual *implementations* (e.g., programs). In Alloy, we took the core ideas of systems and hand-wrote models. In SMT 1, we looked at actual C code, and by-hand transcribed it to SMT queries. In symbolic execution, we'll write programs than consume other programs as input! The core idea of symbolic execution is to run programs on "symbolic" inputs rather than concrete values. At each operation and program branch, we update the symbolic state to formulas that represent possible concrete values along that program path. The examples of SMT that we have shown so far (and some that you are working on for current your homework) are essentially hard-coded versions of symbolic execution. A symbolic execution engine talks in program source code and produces the symbolic output, which can (often) tell us what program paths and what outcomes are possible. # Some brief history Symbolic execution really took root in the 1970s (see this [2025 retrospective](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10886972) by Lori Clarke, one of the first researchers to work on these ideas). But symbolic reasoning is computationally expensive, so compute was a limiting factor. With computation getting faster in the decades sense (see [Moore's Law](https://en.wikipedia.org/wiki/Moore%27s_law)), symbolic reasoning tools could solve bigger and bigger problems (see ["A Billion SMT Queries A Day", AWS](https://www.amazon.science/blog/a-billion-smt-queries-a-day)). This has made symbolic execution more practical for more use cases. ## Symbolic execution with SMT solvers Under-the-hood, most state-of-the-art symbolic execution engines use SMT solvers to reason about possible program paths. Benefits of symbolic execution: 1. Provide a concrete input on which the program fails to meet some specification. 2. Analysis is done automatically, via the underlying solver. Downsides: 1. Often, in the literature, symbolic execution is *unsound*: it's used even in cases where not all program paths are explored. :::success What is a case where symbolic execution might not explore a program path completely? ::: <details> <summary>Think, then click! </summary> <br> If a program has an loop that is not statically bounded(for example, a server always waiting for connections), then it wouldn't be possible for symbolic execution to reason about every possible path, since the number of paths is not statically knowable. </details> <br> ## Symbolic execution by example Consider the following program: ```C= void f (int x, int y) { if (x > y) { int tmp = x; x = y; y = tmp; if (x > 0 && y > 0 && x - y > 0) { exit(-1); } } int z = x << 7; int w = x & 0xFF; if (!w) { exit(-1); } } ``` In last week's lecture, we loosely went over the idea that we can model nested `if` statements with `and`. Full symbolic execution formalizes this process. At each program point, we track two data structures: (1) a mapping of variables to symbolic values, and (2) the path conditions to get to that program point. If we treat points in the above program as nodes in a graph, we can draw the path conditions as edges and the symbolic values as annotations on each node: Lines: 1. Add to Path: `true`. Mapping: `{x: M, y: N}`. `M` and `N` are "fresh" variables to represent the starting values of `x` and `y`. 2. Add to Path inside the `if`: `M > N`. `if` statements add their condition to the path, replacing variables with their current values from the mapping. We also use the *symbolic* version of the operator (for the Python bindings, `>` looks the same). 3. Add to Path: `true`. Mapping: `{x: M, y: N, tmp = M}`. Variable initialization adds to the mapping. For the right side of the `=`, we again look up variables in the mapping and use symbolic operators. 4. Add to Path: `true`. Mapping: `{x: N, y: N, tmp = M}`. 5. Add to Path: `true`. Mapping: `{x: N, y: M, tmp = M}`. 6. Add to path inside the if: `` 7. At this point, we hit `exit(-1)`, which indicates an error. We ask the solver if the accumulated conditions along the path up to this point are SAT: `And(true, M > N, true, true, true, N > 0, M > 0, N - M > 0)`. Via properties of `>`, the solver can show this condition path is `unsat`, meaning the error is *not* reachable. 8. At this point, control flow *merges*. Different symbolic execution engines use different strategies. For now, we'll say that after this point, the variable assignments are conditioned based on whether the branch was taken (shown in the next line). 9. Add to Path: `true`. Mapping: `{x: If(M>N, N, M) , y: If(M>N, M, N)}`. 10. Add to Path: `true`. Mapping: `{x: If(M>N, N, M) , y: If(M>N, M, N), z: If(M>N, N, M) << 7}`. We can think of `z` as having 32 - 7 = 25 bits from either `M` or `N`, then 7 low bits that must all be 0, based on properties of `<`. 11. Add to Path: `true`. Mapping: `{x: If(M>N, N, M) , y: If(M>N, M, N), z: (If(M>N, N, M) << 7) & 0xFF}`. `OxFF = 0b11111111`. We know now `z` is either all 0, or the lowest bit from either `M` or `N`, shifted left by 7. 12. Add to Path inside the if: `!((If(M>N, N, M) << 7) & 0xFF) != 0`. Mapping: `{x: If(M>N, N, M) , y: If(M>N, M, N), z: (If(M>N, N, M) << 7) & 0xFF}`. `OxFF = 0b11111111`. 13. Another exit with error. We ask the solver if the path, which simplifies to `!((If(M>N, N, M) << 7) & 0xFF) != 0`, is SAT. This is true if the value of `x` and line 10 was odd! So yes, this error is feasible, and the solver can give us concrete inputs demonstrating this! Notable points: 1. We start with a path condition of just `True`. 2. Most normal statements, like assignments (`var = ...`), do not modify the path condition. 3. If statements **do** update the path condition, for both the if- and else- branches. The condition is added to the if- brand with an `And(cond, ...)` and added to the else branch with an `And((Not(cond)), ...)`. 4. Assignments cause us to update our symbolic variable mapping. In class, the strategy we used was to look up whatever variables are in the right hand side of the assignment in the symbolic map, and combine them (with constants as applicable) to build a new symbolic value for the variable. At each point that could indicate a potential error (here, the `exit(1)` statements), a symbolic execution engine can use an SMT solver to check whether the path to get to that point is feasible. If these is some assignment of inputs such that the path is reachable, then we have an issue! If the SMT solver returns `UNSAT`, then the potential bug is unreachable, as desired. :::success Is it possible to hit the error case on line 14? ::: <details> <summary>Think, then click! </summary> <br> Yes, if the value of `x` at line 10 is odd. This is the case if the original value of `x` is odd and the if-statement is not taken, or if the original value of `y` is odd and the if-statement is taken. </details> <br> More on this next time.