owned this note
owned this note
Published
Linked with GitHub
# Salsa and Recursive Solver Ideas
In salsa, we could change how we handle multithreading and recursion. Currently, we guarantee that a single query instance only executes on one thread at a time. We do this by updating the state of the query (atomically) to "in-progress" and then again to "complete", and tracking cross-thread dependencies.
## Step one: idempotent queries by default
In the newer model, we would just have a HashMap that tracks Key -> Value. When executing a query, we would do:
1. Check if there is a Key present and return the Value if so.
2. If not, check if the Query(Key) instance is being computed on the current thread. If so, return a cycle error (but see below).
3. If not, push onto the stack and start computing.
4. Upon completion, store the Key -> Value result into the map -- note that it may also have been stored by some other thread concurrently, but the two results should be equivalent (we can decide which one to keep, doesn't really matter).
We are effectively requiring that salsa queries are idempotent (i.e., may be executed multiple times) by default.
(This does require us to find a way to contain side-effects like error reporting, but we need that anyway.)
## Required: some way to declare locked queries
We do probably want some way to declare a salsa query as "locked" or "at most once". There are two reasons you might do that:
* The query is really expensive and you would rather not have two threads executing it (e.g., perhaps type-check in rustc? unclear).
* The query may not always produce exactly the same result and if you allow two different results to co-mingle, you can get internal errors and other similar problems (e.g., procedural macros).
To accommodate **locked** queries, we would add a **separate** mechaism whereby the query can update its state. i.e., once it is pushed onto the stack, it can also check for other threads. This would basically be the same as the existing logic but separated and (I believe) clearer as a result. I'm going to ignore this for now.
## Fixed point queries
Building on the above, we could extend the system with "fixed-point" queries (which also must be idempotent). The idea of a fixed-point query is that, when it is first executed, it has an "intermediate" result R0.
We execute the query body once. If there is a recursive call, and the stack consists only of fixed-point queries (or perhaps even only of the same fixed-point query, that would suffice for now I think), then we can return this intermediate result. But we also track the dependency (much as chalk's recursive solver does today).
When the fixed point query completes, it will give back a new value R1. If there were recursive calls (i.e., if that result R1 depended on the intermediate result of R0) then we will re-execute the fixed point query. But this time, if there are recursive calls, we will yield R1. eventually this second iteration will complete, yielding R2. If R1 == R2, we are done, otherwise we keep going.
Obviously, this could go forever: part of the contract of a fixed point query is that eventually the results will stabilize (reach a fixed point).
Furthermore, in terms of dependency tracking, we will accumulate dependencies such that, when we do reach the fixed point, the result depends on all the queries that were accessed over all the iterations. (And all elements in the SCC will have the same dependency set.)
### API possibilities
The precise API and interface to fixed-point queries will require some experimentation. I am imagining a few possibilities.
One is a callback like
```
salsa_runtime.fixed_point(v0, || ...)
```
that can be invoked. The downside of this is that it ought to be the "only thing" you do in the query implementation, and I'm not sure how to guarantee that exactly, but I'm not sure how important that is.
Another is that we tag the query as `#[fixed_point]` (well that may be desirable regardless) and we then require that the result of the query implements `Default` or some such trait. The query can then be invoked as normal but `Default::default` will be used for recusive results on the first round.
One final possibility is that we may want to provide the previous result (initially, the default) to the query function as an input. For chalk at least, there is a "merging of results" that happens, and having the previous round's input would be helpful there.
I lean towards this rough design at present:
* tag query as `#[fixed_point]`
* for a fixed-point query, we generate an initial result using `FixedPointDefault::default`
* the query function takes `(db, key, value)` where `value` is the result from the previous round
## Refactoring chalk's recursive solver
To integrate these changes into chalk's recursive solver will require some work. In particular, it pushes some of the responsibility out from chalk to salsa, which means we need to update chalk's `RustIrDatabase` trait to manage that.