# Pattern Types ⊆ Refinement Types

I've been working for the last four years on a tool called [Flux](https://github.com/flux-rs/flux) as part of my PhD. Flux is a refinement type checker for Rust that lets you prove correctness properties like the absence of out-of-bounds array accesses or, more generally, the absence of panics. I recently started looking more closely at the discussions around adding *pattern types* to Rust. The idea has evolved over time, but the most recent incarnation is this [pre-RFC](https://gist.github.com/joboet/0cecbce925ee2ad1ee3e5520cec81e30) by [@joboet](https://gist.github.com/joboet). The proposal hits very close to home. In fact, I'd say it is a subset of what Flux can do (or hopes to do). However, unlike Flux, which tries to explore the limits of refinement types, the pre-RFC proposes a small enough incremental change that I can see it being implemented in the compiler in the not so distant future.

Given what I've learned working with Flux, I thought it'd be good to share my thoughts on the proposal. But more than anything, this is an opportunity to get some of my ideas out in a way that will hopefully make sense to people. This is not a critique of the proposal, and I'm not proposing anything myself. My main goal is to provide useful vocabulary to reason and discuss about the kind of features being proposed.

In the following, I'll introduce a general framework for refinement types and then try to describe pattern types as an instance of this framework. I'll then elaborate on the challenges of implementing refinement types by extending the proposal with additional features supported by Flux. In framing it like this, my tendency will be to make the proposal more complex. I think that's the opposite of what we want. We should start with something simple we can all agree on. But once you give users basic pattern/refinement types they'll ask for more, and I think it's important to know why we can't give them more. Additionally, exploring the design space is always important to understand which future extensions will be backward compatible.

## Refinement types

So what are refinement types? Quickly and poorly explained, a refinement type describes the subset of values of some type that satisfy a constraint. For example, you can use a refinement type to describe the set of `i32` values that are also non-zero. This type is usually written as:

```
{v: i32 | v != 0}
```

More generally, a refinement type has the form `{v: B | p}`, where `v` is a *refinement variable*, `B` is a Rust *base type*, and `p` is a *refinement predicate* constraining the variable `v`. The intuitive reading of this type is "values `v` of type `B` that satisfy the constraint `p`"[^flux-syntax].

Now, what is `p` exactly? A key insight of refinement types as implemented in tools like Flux is that you can design a *practical* system by letting `p` be a formula drawn from an [SMT decidable logic](https://en.wikipedia.org/wiki/Satisfiability_modulo_theories). As I'll explain later, pattern types further restrict `p` to be a *pattern*[^predpattern].

Picking the *right logic* to draw predicates from is important for two major reasons.

First, subtyping needs to be decidable. For example, the type `{v: i32 | v > 0}` is intuitively a subtype of `{v: i32 | v != 0}`, because a function expecting a non-zero value can happily take a positive one. Checking this subtyping boils down to deciding whether `v > 0` implies `v != 0`. In this simple case, this is obviously true, but in general, the system needs to prove implications between arbitrary predicates. By choosing `p` to be a formula in a decidable theory we can offload the work to an SMT solver.
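To make this concrete, here's the example written out as a (hypothetical) program in the refinement syntax from above; none of this is valid Rust today, and the function names are made up for illustration:

```rust
// Hypothetical refinement syntax, not valid Rust today.
fn require_nonzero(d: {v: i32 | v != 0}) { /* ... */ }

fn client(x: {v: i32 | v > 0}) {
    // Accepted only if `{v: i32 | v > 0}` is a subtype of
    // `{v: i32 | v != 0}`, which reduces to the implication
    // `v > 0 ⟹ v != 0`. The type checker hands that implication
    // to an SMT solver.
    require_nonzero(x);
}
```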
Second, and more interestingly, we want the system to do *refinement inference*. That is, analogous to type inference, we should be able to omit refinement annotations and let the system infer them for us. For instance, suppose we write `let x = 5`. Without the need for extra annotations, the Rust type system can infer that `x` has type `i32`. A system doing refinement inference would also be able to infer, for example, that `x` can be given type `{v: i32 | v != 0}`. This is in contrast to forcing the user to annotate the type explicitly:

```rust
let x: {v: i32 | v != 0} = 5;
```

I believe that freeing users from this annotation burden is crucial to making the system practical. [A major result](https://dl.acm.org/doi/abs/10.1145/3110270) from research in the area is that picking formulas to be SMT decidable also lets us do refinement inference.

This is all very abstract, but the main point I want to drive home is that to have a practical refinement type system you need to choose predicates in a way that gives you (1) decidable subtyping and (2) decidable refinement inference. That is at least in theory; in practice, we also need to solve both problems *fast*. Efficiency is an extremely subtle concept, especially in this context. Many of the problems that arise in refinement type checking lie on the boundary of decidability or have [extremely high theoretical complexity bounds](https://en.wikipedia.org/wiki/Presburger_arithmetic#Computational_complexity), yet SMT solvers deal with them routinely. SMT solvers are living proof that we can solve hard problems in practice. I think it is important to keep this in mind.

Finally, I'm phrasing subtyping and inference as if they were two different problems. In reality, inference subsumes subtyping because picking the right refinements involves choosing them in a way that satisfies subtyping constraints. Still, I think the distinction is useful because it allows us to visualize how a system that doesn't do inference (where everything is annotated) would look.

## Pattern types

So what are pattern types? The pre-RFC proposes adding a new type written `T is p`, where `T` is a type and `p` a *pattern*. A pattern is what you would normally write in a `match` statement. The intuitive reading of `T is p` is that it represents values of type `T` that match the pattern `p`. Some examples taken from the proposal:

```rust
i32 is 1..16
Option<i32> is Some(_)
&[Ordering] is [Ordering::Less, ..]
```

This looks an awful lot like a refinement type. In fact, using the refinement type syntax from earlier, we can write the first example as `{v: i32 | matches!(v, 1..16)}`. Thus, you can say that pattern types are a restricted form of refinement types where predicates are always of the form `matches!(v, pattern)`.
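One nice consequence of this reading is that the refinement predicate is plain Rust you can already evaluate. A quick runnable sanity check of the three examples above, with the corresponding pattern type in each comment (note that exclusive range patterns like `1..16` require a fairly recent Rust):

```rust
use std::cmp::Ordering;

fn main() {
    let v: i32 = 7;
    assert!(matches!(v, 1..16)); // i32 is 1..16

    let o: Option<i32> = Some(3);
    assert!(matches!(o, Some(_))); // Option<i32> is Some(_)

    let s: &[Ordering] = &[Ordering::Less, Ordering::Equal];
    assert!(matches!(s, [Ordering::Less, ..])); // &[Ordering] is [Ordering::Less, ..]
}
```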
I like the idea of pattern types for two reasons. First, a pattern is a concept already familiar to Rust programmers, so we don't need to tell them things like "you can only use formulas drawn from the decidable theory of [uninterpreted functions](https://en.wikipedia.org/wiki/Uninterpreted_function) and [linear arithmetic](https://en.wikipedia.org/wiki/Presburger_arithmetic)". Second, patterns may be both useful and simple enough to be worth implementing in the compiler.

The proposal allows patterns on any *matchable* type. To keep things simple, I'll center most of the discussion around integers and range patterns. They are interesting and challenging enough to present most of my ideas.

## Can pattern types be practical?

Let's start by checking whether pattern types satisfy my criteria for a practical refinement type system (spoiler: I'm not quite sure).

1. *Decidable subtyping*: Checking if `T is p1` is a subtype of `T is p2` requires checking that the pattern `p1` is *included* in the pattern `p2`, that is, any value that matches `p1` also matches `p2`. I will notate this as `p1 ⊆ p2` and I'll call it *pattern inclusion*. For example, the type `i32 is 1..` (positive integers) is a subtype of `i32 is ..0 | 1..` (non-zero integers) because `1.. ⊆ ..0 | 1..` (see the sketch after this list). The details of how to implement this as an algorithm elude me, but the problem is certainly decidable. In fact, the compiler already implements most if not all of the required logic for [exhaustiveness checking](https://doc.rust-lang.org/stable/book/ch06-02-match.html?highlight=exhaustiveness#matches-are-exhaustive)[^arbitrary-sat]. We have already deemed that to be fast enough, so I'm hopeful.
2. *Pattern inference*: This is the part that is least clear and the main technical challenge, but I'm confident it can be done, at least for the simple form of pattern types being proposed. Ideally, I believe this should be implemented as a *pattern checking* phase whose only output is a set of constraints of the form `p1 ⊆ p2`[^constraint-format]. These constraints may include unknown patterns that need to be inferred. The inference then happens in a later and independent phase. This is how it works in Flux, and I think it is what makes the problem tractable[^traits]. The details of pattern checking will determine what constraints look like. I believe it's important to discuss possible extensions in terms of how constraints would look if we add them.
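Here's the pattern inclusion example from item 1 written out as a (hypothetical) program; the function names are made up, and the `is` syntax is from the pre-RFC:

```rust
// Hypothetical pattern-type syntax from the pre-RFC.
fn take_nonzero(x: i32 is ..0 | 1..) { /* ... */ }

fn client(y: i32 is 1..) {
    // Accepted only if `1.. ⊆ ..0 | 1..`: every value matching `1..`
    // also matches `..0 | 1..`. This is the pattern-inclusion analogue
    // of the SMT implication check from the refinement setting.
    take_nonzero(y);
}
```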
## On backward compatibility

Pattern types must be backward compatible; as a consequence, we need to give meaning to every type written so far. Old types don't contain patterns, so what does it mean when I write `i32`? In my mind, there are two ways to interpret it.

1. If there's no pattern, we assume the wildcard pattern, i.e., we interpret `i32` as `i32 is _`. This should undoubtedly be the behavior in function signatures, because (1) it means function signatures will continue to have the same meaning and (2) it makes pattern inference a local problem. This is the behavior in Flux, where an `i32` used in a function signature is interpreted as `{v: i32 | true}`.
2. We could also interpret `i32` as `i32 is ?`, which is made-up syntax to say the pattern needs to be inferred (sad that we can't use `_` here with the usual meaning of something being inferred). You could argue that this should be the behavior when annotating local variables. For example, if I write `let x: i32 = 5;` I would like the system to infer `x: i32 is 5`. This is the approach taken by Flux, where a variable annotated with `i32` is interpreted as `{v: i32 | ?}`. Alternatively, we could choose to interpret `i32` as `i32 is _` and only infer patterns if the variable has no annotation at all: `let x = 5`.

For backward compatibility, I think there's also an argument for not introducing patterns unless explicitly requested by the user, for example, by explicitly annotating one (`let x: i32 is 5 = 5`) or by explicitly saying that the pattern needs to be inferred (`let x: i32 is ? = 5`). If patterns are only introduced explicitly, it can never happen that old code infers a pattern that makes the code no longer compile. Even so, I think subtle backward incompatibilities can be introduced this way via mutable references, as I'll discuss below.

## Flow sensitivity

[Flow-sensitive typing](https://en.wikipedia.org/wiki/Flow-sensitive_typing) is when the type of an expression differs depending on where it appears in the program. Rust's type system is flow sensitive. Take for example:

```rust=
let x = vec![0, 1, 2];
let y = x;
```

The type of `x` in line 1 is `Vec<i32>`. After the assignment to `y`, the type of `x` is no longer `Vec<i32>` but *uninitialized*. Rust doesn't expose an uninitialized type; instead, the uninitialized state is part of the computations done internally by the borrow checker. Another aspect of Rust that's flow sensitive is lifetimes: the paths *blocked* by an active reference are different depending on the point of the program. Conceptually, you can interpret this as the type of references changing at different points in the program. Since flow sensitivity is restricted to these special cases, we don't typically present it as types changing after each statement, but rather in terms of variables being moved and borrows expiring.

However, once we add patterns, types can change in interesting ways. Consider the following:

```rust=
fn take_positive(x: i32 is 1..) { ... }
fn take_less_than_ten(x: i32 is ..10) { ... }

let mut x = 20;
take_positive(x);
x = 5;
take_less_than_ten(x);
```

Should the code be accepted? If the answer is yes, then pattern types must be flow sensitive. The type of `x` starts out as `i32 is 20`, which is a subtype of `i32 is 1..`, and consequently the call to `take_positive` is accepted. Later, the type of `x` changes to `i32 is 5`; this is typically called a *strong update*. Since `i32 is 5` is a subtype of `i32 is ..10`, the call to `take_less_than_ten` is also accepted.
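To make the flow-sensitive reading concrete, here's the same example annotated with the type the analysis would assign to `x` after each statement (the comments are hypothetical inference results, not proposed syntax):

```rust
let mut x = 20;        // x: i32 is 20
take_positive(x);      // OK: 20 ⊆ 1..
x = 5;                 // strong update: x: i32 is 5
take_less_than_ten(x); // OK: 5 ⊆ ..10
```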
Alternatively, we could decide that pattern types are flow insensitive. In this world, we need to assign `x` a single type that is compatible with all assignments. For the example, we could assign `x` the type `i32 is 5 | 20`, which is compatible with both assignments. This type is a subtype of `i32 is 1..` but not of `i32 is ..10`, so in this world the program would have to be rejected. A key question is whether it would be backward compatible to start flow insensitive and later move to flow sensitive; my intuition is that it should be, at least in principle.

Another subtle point I want to call out is that there's some tension between type annotations as they are typically understood in Rust and pattern types being flow sensitive. Consider the following:

```rust
let mut x: i32 is 20.. = 20;
take_positive(x);
x = 5;
take_less_than_ten(x);
```

Should the annotation `i32 is 20..` be understood as being valid only for the first assignment, with the second assignment later strongly updating the type, or should it be understood flow insensitively, rejecting the second assignment?

## Mutable references

Pushing further on the idea of flow sensitivity, an immediate question that comes to mind is what to do about mutable references. Consider the following modified version of the example from before:

```rust=
fn take_positive(x: i32 is 1..) { ... }
fn take_less_than_ten(x: i32 is ..10) { ... }

let mut x = 20;
take_positive(x);
let r = &mut x;
*r = 5;
take_less_than_ten(x);
```

The behavior is the same, but instead of mutating `x` directly we do it through a mutable reference. Should the code be accepted now? A major aspect of the design of Flux is how to handle mutable references in a way that allows us to accept this code. In a nutshell, we can verify this code by tracking the specific location the mutable reference points to. Flux can thus determine that the mutation in line 7 is actually mutating `x`, updating its type accordingly. So Flux is not only flow sensitive, but also location sensitive: it knows the locations references point to[^location-sensitive]. Tracking locations precisely can become intractable very quickly, for example if you have references pointing inside an array. To solve this issue, Flux takes a very simple approach: we track locations until it's too hard and then give up. At that point the location becomes flow insensitive. The key insight is that with a relatively simple design, we can be flow sensitive while we know exactly who can mutate a location.

This last idea can also be applied without location sensitivity. Effectively, we can have a simple design in which we are flow sensitive while we own a value and fall back to flow insensitivity when we take a mutable reference to it. To illustrate, let me modify the example again, changing the assignments a bit.

```rust=
fn take_positive(x: i32 is 1..) { ... }
fn take_less_than_ten(x: i32 is ..10) { ... }

let mut x = -1;
x = 5;
take_positive(x);
let r = &mut x;
*r = 0;
take_less_than_ten(x);
```

Let's see how we could analyze the code by applying the idea above.

1. We start by assigning `x` type `i32 is -1`.
2. In the next line, we strongly update the type of `x` to `i32 is 5`.
3. In the call to `take_positive`, we generate the constraint `5 ⊆ 1..` by relating the actual and formal arguments.
4. In line 7 we take a mutable reference to `x`. This is the crucial part. To proceed with the analysis, we generate a fresh inference variable `?0` and assign `r` the type `&mut i32 is ?0`. From now on, we can only do flow-insensitive updates through `r`. But mutations through `r` will also modify `x`, so we need to account for that. To do so, we also update the type of `x` to be `i32 is ?0` and generate the constraint `5 ⊆ ?0` relating the old and new types of `x`. In a sense, the new pattern `?0` captures any possible mutation that can happen to `x` via `r`.
5. In the assignment in line 8, we generate the constraint `0 ⊆ ?0`.
6. Finally, in the call to `take_less_than_ten`, we generate the constraint `?0 ⊆ ..10`.

This is the set of constraints we generated:

```
5 ⊆ 1..
5 ⊆ ?0
0 ⊆ ?0
?0 ⊆ ..10
```

We can satisfy them by picking `?0 := 0 | 5`:

```
5 ⊆ 1..
5 ⊆ 0 | 5
0 ⊆ 0 | 5
0 | 5 ⊆ ..10
```

For completeness, let's also discuss how the code would be checked flow insensitively. In this world, we would use the same inference variable for both `x` and `r`, generating the following constraints:

```
5 ⊆ ?0
?0 ⊆ 1..
0 ⊆ ?0
?0 ⊆ ..10
```

This set of constraints doesn't have a solution, so the program would be rejected.

Finally, I have one last thing to say about mutable references: they are invariant in the pointee type. We have to be careful about this for backward compatibility. Suppose old code is calling a function `fn(&mut i32)`. After adding pattern types, the signature will be interpreted as `fn(&mut i32 is _)`. If at the call site we infer the argument to have type `&mut i32 is p` for `p` anything other than `_`, the code will no longer compile.
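To see why invariance is needed in the first place, consider this made-up example: if `&mut i32 is 1..` could be passed where `&mut i32 is ..` is expected, a callee could silently break the caller's pattern.

```rust
// Hypothetical pattern-type syntax.
fn set_to_zero(r: &mut i32 is ..) {
    *r = 0; // fine from the callee's perspective: 0 matches `..`
}

fn client(x: &mut i32 is 1..) {
    // If `&mut` were covariant in the pointee, this call would type
    // check, yet after it returns `*x` would be 0, violating the
    // caller's type `i32 is 1..`.
    set_to_zero(x); // must be rejected
}
```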
## Arithmetic

Another important area to explore is how well pattern types behave when mixed with arithmetic. I think this will determine how useful they are in practice. Consider:

```rust=
fn take_positive(x: i32 is 1..) { ... }

fn do_some_arith(x: i32 is 1..10, y: i32 is 1..10) {
    let z = x + y;
    take_positive(z);
}
```

The code is perfectly fine because, given the bounds, `x + y` will never overflow and the result will remain positive. Can we make the analysis notice this? Something useful to remember here is that the behavior of addition in Rust is always defined, even when there's overflow. This means we can overapproximate it. One way to do this is via [interval arithmetic](https://en.wikipedia.org/wiki/Interval_arithmetic)[^union-of-ranges]. In the example above, this means we can overapproximate the result of the addition between `x` and `y` as `1..10 + 1..10 = 2..19`.

For this to work in general, we need to be a bit careful with overflow. For instance, if we have a value `a` of type `i32 is 1..=i32::MAX`, we cannot simply give `a + 1` the type `i32 is 2..=(i32::MAX + 1)`; that would be unsound (the pattern would also be malformed at type `i32`, is that a thing?). One easy solution is to give up when there's overflow and give `a + 1` the type `i32 is ..`; that would be sound but imprecise. I'm not too familiar with the area, but there's a lot of research trying to make [*range analysis*](https://en.wikipedia.org/wiki/Value_range_analysis) more precise. I believe it should work well with other basic arithmetic operators like subtraction, multiplication, and division (maybe modulo is harder).
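To sketch what this interval reasoning could look like, here's a toy interval-addition routine; it's my own illustration (the `Interval` type and its `add` method are made up), not code from the pre-RFC or Flux:

```rust
/// A closed interval of i32 values, lo..=hi.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Interval {
    lo: i32,
    hi: i32,
}

impl Interval {
    /// Overapproximate the addition of two intervals. If either bound
    /// overflows, give up and widen to the full i32 range (sound but
    /// imprecise, as discussed above).
    fn add(self, other: Interval) -> Interval {
        match (self.lo.checked_add(other.lo), self.hi.checked_add(other.hi)) {
            (Some(lo), Some(hi)) => Interval { lo, hi },
            _ => Interval { lo: i32::MIN, hi: i32::MAX },
        }
    }
}

fn main() {
    // 1..10 + 1..10, i.e. 1..=9 + 1..=9, gives 2..=18 (written 2..19 as a pattern).
    let a = Interval { lo: 1, hi: 9 };
    assert_eq!(a.add(a), Interval { lo: 2, hi: 18 });

    // 1..=i32::MAX plus 1 overflows the upper bound: widen to the full range.
    let b = Interval { lo: 1, hi: i32::MAX };
    let one = Interval { lo: 1, hi: 1 };
    assert_eq!(b.add(one), Interval { lo: i32::MIN, hi: i32::MAX });
}
```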
Interval arithmetic works great when we have known patterns we can evaluate, but keep in mind that in general types will contain inference variables. To handle this, one option would be to propagate variables *symbolically*. For instance, when adding `i32 is ?0` and `i32 is 1..10`, we could give the result the type `i32 is ?0 + 1..10`. The pattern `?0 + 1..10` may later show up in an inclusion constraint, and the inference algorithm would have to handle it. Alternatively, you could require patterns to be known at those points. I'm not sure how this would be implemented, but I'm imagining you could either force inference during pattern checking for the constraints accumulated so far, or maybe generate dependencies in the constraints such that some subsets have to be solved before others.

Finally, it'd be useful if the system could help us prove the absence of overflows. For that, we only need to generate an extra constraint. In the example above, we would generate `1..10 + 1..10 ⊆ i32::MIN..=i32::MAX`. Given the approximate nature of the analysis, there would be a lot of false positives, so this should be implemented as a lint.

[^flux-syntax]: The syntax in Flux is a bit different because we are very pedantic about separating the syntax of the language from the syntax of the logic, but the intuition is still the same.

[^predpattern]: How convenient that both predicate and pattern start with a `p`.

[^arbitrary-sat]: It's my understanding that exhaustiveness checking can solve arbitrary SAT problems, so if we are already doing that, maybe we should just distribute Rust with a copy of Z3 and throw away this document.

[^location-sensitive]: The example I'm using is not very illustrative: no one will mutate a value through a mutable reference when they can instead mutate the variable directly. The main reason to take a mutable reference is to pass it to another function. Thus, the real benefit of location sensitivity comes when you [can preserve it across function calls](https://flux-rs.github.io/flux/blog/02-ownership.html#borrowing-strong-references).

[^union-of-ranges]: I think in general we need unions of intervals to account for `|` patterns.

[^traits]: This has implications for trait resolution. Basically, you can't have a trait implementation depend on a pattern, or at the very least you can't have multiple implementations for types that only differ in their patterns.

[^constraint-format]: Constraints don't necessarily need to have this form, but the point remains. For example, Flux generates constraints in [Horn format](https://en.wikipedia.org/wiki/Horn_clause) because these kinds of implications/inclusions are contextual: they hold under certain assumptions.