# Salsa fixed point sync
## General sync
* https://github.com/salsa-rs/salsa/pull/527
* Benchmark perf regression -- what's happening
* easing the ordering (6-9%)
## Dashmap
Three uses of dashmap
* tracked functions (memoized results)
* interned values
* tracked structs (get to a unique identity)
"validate function" -- have the ingredient index + key but still have to lookup the entry.
with a slab setup we could avoid that lookup
tracked functions:
```rust=
#[salsa::input]
struct Input { }
#[salsa::tracked]
fn foo(x: &dyn Db, input: Input) -> {
}
```
Micha's question: is it possible to avoid the hashmap lookups when validating derived tracked structs in: https://github.com/salsa-rs/salsa/blob/86f06d8485edd3ad829a186eadb9fee91dbcf32c/src/function/maybe_changed_after.rs#L195-L225 by directly storing the pointer into the Slab page instead of the `DependencyKeyIndex`?
- Going to mostly read-only after program startup. some mutation at the start, so a fast, read/lockfree-optimized data structure would be appropriate here.
Options:
1. (today) global map from input key
2. sparse vector from input key (kind of the same...?)
3. store a hashmap of what functions track an input on the input itself. this small hashmap; allows for some DICE-style eager invalidation with (ideally) a high degree of memory locality
a. This might be the best option.
## Fixed point sync
```rust
x = 0; flag = True
while flag:
x = x or 1
flag = check_flag()
```
Abstract interpretation on Python code, more or less.
### Option A
* Execute a query F
* Execute a query G
* Execute a query H
* Execute a query F
becomes
* Execute a query F (cycle-participant)
* Execute a query G (cycle-participant)
* Execute a query H (cycle-participant)
* Execute a query F -- return the current value (bottom)
* yields F0
* execute query F again, with F0 as current value
* Execute a query G (cycle-participant)
* Execute a query H (cycle-participant)
* Execute a query F -- return F0
* yields F1 -- we store LUB(F0, F1)
* execute query F again, with F0 as current value
* Execute a query G (cycle-participant)
* Execute a query H (cycle-participant)
* Execute a query F -- return F0
multiple threads would be annoying, it'd be slightly nicer if we knew F G and H are all on the same thread
### Option B
```rust
query_f(db)
.then(async {
query_g(db).then {
query_h(db).
}
})
async query_h(db) {
let x = query_f();
let y = independent_work();
join(, y).await;
}
```
## DICE -- incremental
Partially eager
PDF: https://github.com/facebook/buck2/blob/main/dice/dice/docs/DiceIncrementalityAlgorithms.pdf
- DavidB's understanding of Salsa today: incrementing a global revision count requires walking *all* deps, even if a changed value does not impact a dependent value.
- Modulo durabilities: if we see that a query only depends on inputs with durability {Medium, High}, we only invalidate the input if a Medium/High dependency is invalidated
- Possibly consider a bloom filter!
- DICE is designed around the idea/assumption that most changes will be pretty late in a topologically sorted dependency graph, as most of the build can be cached/unchanged since people are rebuilding a binary/library with few/little dependencies.
- DavidB: https://lord.io/spreadsheets/ helped me understand the different tradeoffs.
Asymptotic complexities of different incremental compuation systems:
- Dice: `O(changed_subset)` recomputation; (near) `O(invalidated_set)` graph traversals
- Adapton: `O(changed_subset)` recomputation; `O(invalidated_set)` traversals
- Salsa: `O(changed_subset)` recomputation; `O(mountain)` traversals if lazy
- `mountain` is defined, by the Buck2 folks, as "the set of nodes required for a given computation request"
- Salsa is less asymptoticly efficient than alternatives so that it can be more lazy: an IDE might not care!
- In other cases, however, we do: https://github.com/salsa-rs/salsa/issues/41