[toc]
# Salsa Concepts
## Tracked Structs
```rust
#[salsa::tracked]
struct MyStruct<'db> {
value: usize
}
```
Internally: A thin wrapper around a salsa ID (u64).
### Equality
Equality is soley based on the instance. The same as reference equality/ptr equality in a single revision. Slightly weaker across revisions or during fixpoint iteration.
In more detail, two tracked structs are only equal if:
* They're created by the:
* same query
* for the exact same query-arguments
* The values of all (untracked-) fields are equal
This tries to capture: This is the same `::new` call.
Two tracked structs created by different queries aren't equal even if their values are
```rust
#[salsa::tracked]
fn query_a(db: &dyn Db) -> MyStruct {
MyStruct::new(db, 10)
}
#[salsa::tracked]
fn query_b(db: &dyn Db) -> MyStruct {
MyStruct::new(db, 10)
}
assert_ne!(query_a(db, a), query_b(db, a));
```
Two tracked structs created by the same query but with different query-arguments aren't equal, even if their values are:
```rust
#[salsa::tracked]
fn query(db: &dyn Db, file: File) -> MyStruct {
MyStruct::new(db, 10)
}
let a: File = ...;
let b: File = ...;
# Two tracked structs aren't equal if they're created by the same query but
assert_ne!(query(db, a), query(db, b));
```
Two tracked structs created by the same query aren't equal , even if their values are:
```rust
#[salsa::tracked]
fn query(db: &dyn Db) -> (MyStruct, MyStruct) {
(MyStruct::new(db, 10), MyStruct::new(db, 10))
}
let (a, b) = query(db);
assert_ne!(a, b);
```
Two tracked structs created in different revisions but with different values aren't equal.
```rust
#[salsa::tracked]
fn query(db: &dyn Db, file: File) -> MyStruct {
MyStruct::new(db, file.size(db))
}
let file = File::new(db, 100);
let a = query(db, file);
file.set_size(&mut db).to(1000);
let b = query(db, file); // Note, this won't compile
assert_ne!(a, b);
```
The following structs are equal:
```rust
#[salsa::tracked]
fn query(db: &dyn Db) -> MyStruct {
MyStruct::new(db, 10)
}
assert_eq(query(db), query(db));
```
This is also true if the query re-executes after a change:
```rust
#[salsa::tracked]
fn query(db: &dyn Db, file: File) -> MyStruct {
let _size = file.size();
MyStruct::new(db, 10)
}
let file = ...;
let r1 = query(db, file);
file.set_size(&mut db).to(...);
let r2 = query(db, file);
// Note: This doesn't actually compile because of lifetimes
// but retaining the identity is important because `r1` could be the input to another
assert_eq(r1, r2);
```
### Garbage collection
A tracked struct gets deleted when:
It's creating query re-execute and it no longer creates the tracked struct:
```rust
#[salsa::tracked]
fn query(db: &dyn Db, file: File) -> Option<MyStruct> {
if file.size(db) < 1000 {
Some(MyStruct::new(db, 10))
} else {
None
}
}
let file = File::new(db, /* size */ 500);
let a = query(db, file);
assert!(a.is_some());
file.set_size(&mut db).to(10_000);
let a = query(db, file);
// ... the old `MyStruct` now gets deleted.
assert!(a.is_none());
```
This is also true if the tracked struct gets created with different values (because it's then a different tracked struct)
```rust
fn query(db: &dyn Db, file: File) -> MyStruct {
MyStruct::new(db, file.size())
}
let file = File::new(db, /* size */ 500);
let a = query(db, file);
file.set_size(&mut db).to(10_000);
query(db, file); // `a` is marked as ready for deletion
```
If the owning query gets garbage collected (e.g. because its input struct gets deleted).
### When to use
When you don't need global data sharing:
* Because all instances are created in a single query (and you know that the created instances never overlap when the query is called with different arguments). E.g., `ScopeId` is created during semantic indexing and no two `semantic_index` calls can create the same `ScopeId` because it has a `file: File` field (there are never two identical instances).
* Because structural equality isn't important or you actually want a new instance every time (I don't think we have this in ty)
### Advantages
* Cheaper than interned structs (see below) because it doesn't require coordination between threads
* More predictable lifetime. They're lifetime only depends on a single query
### Tracked fields
This is an advanced feature but we use it in ty. Tracked structs can have tracked fields. Tracked fields aren't considered when comparing if two tracked structs are equal because the tracked field is tracked separately.
This query compared non equal before but compares equal if `size` is a tracked struct
```rust!
#[salsa::tracked]
struct MyStruct<'db> {
name: String,
#[tracked]
size: usize
}
#[salsa::tracked]
fn query(db: &dyn Db, file: File) -> MyStruct {
MyStruct::new(db, file.name().clone(), file.size())
}
let file = File::new(db, /* name */ "test".to_string(), /* size */ 500);
let a = query(db, file);
file.set_size(&mut db).to(10_000);
let b = query(db, file); // this won't compile
assert_eq!(a, b);
assert_ne!(a.size(db), b.size(db));
```
We use this for the `Definition`'s' and `Expression`'s `node` fields. It's still the same `Definition` even if the AST node changed.
## Interned Structs
```rust
#[salsa::interned]
struct MyStruct<'db> {
value: usize
}
```
Very similar to tracked, but:
* Allow data sharing: Different queries interning the same value share the same underlying interned struct instance.
* As a result, the interned value compares equal regardless of which query it created the value
### Equality
All examples that didn't compare equal for tracked structs do compare equal for interned structs (because they're the same):
Interned structs with the same values created by different queries compare equal:
```rust
#[salsa::tracked]
fn query_a(db: &dyn Db) -> MyStruct {
MyStruct::new(db, 10)
}
#[salsa::tracked]
fn query_b(db: &dyn Db) -> MyStruct {
MyStruct::new(db, 10)
}
assert_eq!(query_a(db, a), query_b(db, a));
```
Interned structs created by the same query but with different arguments compares equal:
```rust
#[salsa::tracked]
fn query(db: &dyn Db, file: File) -> MyStruct {
MyStruct::new(db, 10)
}
let a: File = ...;
let b: File = ...;
assert_eq!(query(db, a), query(db, b));
```
Two interned structs created in the same query are equal
```rust
#[salsa::tracked]
fn query(db: &dyn Db) -> (MyStruct, MyStruct) {
(MyStruct::new(db, 10), MyStruct::new(db, 10))
}
let (a, b) = query(db);
assert_eq!(a, b);
```
### Garbage collection
The current version isn't technically releasing old values. It's mainly
that Salsa tries to reuse interned values if they haven't been used "in a while":
* LRU based garbage collection
* Values that haven't been used "in a while" can be reused
* Only available for low-durability values (more on durability later)
### When to use?
* If structural equality matters and the values are created in different queries
* If data reuse is important
## Input Structs
Inputs store the state that comes from outside of Salsa. It's the only Salsa struct that can be mutated.
```rust
#[salsa::input]
#[derive(PartialOrd, Ord)]
pub struct File {
/// The path of the file (immutable).
#[returns(ref)]
pub path: FilePath,
/// The unix permissions of the file. Only supported on unix systems. Always `None` on Windows
/// or when the file has been deleted.
#[default]
pub permissions: Option<u32>,
/// The file revision. A file has changed if the revisions don't compare equal.
#[default]
pub revision: FileRevision,
/// The status of the file.
///
/// Salsa doesn't support deleting inputs. The only way to signal dependent queries that
/// the file has been deleted is to change the status to `Deleted`.
#[default]
status: FileStatus,
}
```
* Inputs use referential equality. No two inputs compare equal.
* There's no garbage collection: Deleting (dropping) an input doesn't result in Salsa clearing all queries that take `File` as argument. (maybe something we should add someday?)
* Each field is tracked separately: Changing `revision` only invalidates queries that read `file.revision`.
:::warning
Salsa doesn't compare if the old and new values are equal. It always assumes that the value has changed when the setter is called! That's why we have this common pattern:
```rust
if file.status(db) != status {
tracing::debug!("Updating the status of `{}`", file.path(db));
file.set_status(db).to(status);
}
```
:::
#### Durabilities
Salsa supports three durabilities:
* `LOW` (default)
* `MEDIUM` (ty uses medium for third party dependencies)
* `HIGH` (ty uses high for anything in the vendored file system)
Each input struct field gets assigned a durability during construction:
```rust
let metadata = db.system().path_metadata(path);
let durability = self
.root(db, path)
.map_or(Durability::default(), |root| root.durability(db));
let builder = File::builder(FilePath::System(absolute))
.durability(durability)
.path_durability(Durability::HIGH);
let builder = match metadata {
Ok(metadata) if metadata.file_type().is_file() => builder
.permissions(metadata.permissions())
.revision(metadata.revision()),
Ok(metadata) if metadata.file_type().is_directory() => {
builder.status(FileStatus::IsADirectory)
}
_ => builder
.status(FileStatus::NotFound)
.status_durability(Durability::MEDIUM.max(durability)),
};
builder.new(db)
```
Durabilities are an optimization that allow Salsa to short-circuit validation after an input has changed. For example, ty uses the `HIGH` durability for typeshed and `LOW` for any first party file. If a first party file changes, Salsa can skip over all queries that only operate on typeshed because it knows, only an input with `LOW` durability has changed.
This is similar to a generational GC where objects that were recently created are scanned more often and long-living objects are in pools that are only scanned infrequently.
:::info
We're considering adding an `IMMORTAL` durability for inputs that are known to **never** change. It would allow Salsa to not track any dependencies for those queries (or much fewer). Again, typeshed would be an example where such a durability would be useful
:::
## Queries
Salsa queries are functions where Salsa adds caching. Most queries have the form
```rust
#[salsa::tracked]
fn query(db: &dyn db, salsa_struct: SalsaStruct) -> R {
...
}
```
Salsa caches the function's result and only re-executes it:
* when it participates in a fixpoint-iteration and a new iteration started.
* One of the function's inputs changed
### Inputs
The inputs of a query are queries that a query-function calls or structs from which the query function reads a value from:
```rust
#[salsa::tracked(returns(ref), no_eq)]
pub fn parsed_module(db: &dyn Db, file: File) -> ParsedModule {
let _span = tracing::trace_span!("parsed_module", ?file).entered();
let parsed = parsed_module_impl(db, file);
ParsedModule::new(file, parsed)
}
pub fn parsed_module_impl(db: &dyn Db, file: File) -> Parsed<ModModule> {
let source = source_text(db, file);
let ty = file.source_type(db);
let target_version = db.python_version();
let options = ParseOptions::from(ty).with_target_version(target_version);
parse_unchecked(&source, options)
.try_into_module()
.expect("PySourceType always parses into a module")
}
```
The `parsed_module` query's inputs are:
* `source_text(db, file)`
* `File::path` (called by `file.source_type(db)`)
* `Program::python_version` (called by `db.python_version`)
Salsa re-executes the query [if any of the query's inputs changed](https://salsa-rs.netlify.app/plumbing/maybe_changed_after). E.g., if a file gets updated, then `source_text` changes, which now requires that `parsed_module` needs to re-run.
:::info
Maybe surprising, but the query's argument isn't an input!
It only becomes an input if the query reads a value from the argument.
:::
:::info
Salsa ["backdates"](https://salsa-rs.netlify.app/plumbing/terminology/backdate) a query if it returns the same result after re-executing it as in the previous revision.
:::
### Arguments
Queries that take more than one argument are lowered to a single argument query by interning the arguments:
```rust
#[salsa::tracked]
fn query(db: &dyn db, type: Type, context: Context) -> R {
...
}
```
gets lowered to
```rust
#[salsa::interned]
struct Arguments<'db> {
type: Type,
context: Context
}
#[salsa::tracked]
fn query(db: &dyn db, arguments: Arguments) -> R {
}
```
## Storage
### Structs
* Are stored in pages of 1000 each (arena allocation)
* Each struct tracks additional metadata ([interned](https://github.com/MichaReiser/salsa/blob/4f627b202e55409ddf78f227f68c09972803017a/src/interned.rs#L107-L133), [input](https://github.com/MichaReiser/salsa/blob/4f627b202e55409ddf78f227f68c09972803017a/src/input.rs#L254-L273), [tracked](https://github.com/MichaReiser/salsa/blob/4f627b202e55409ddf78f227f68c09972803017a/src/tracked_struct.rs#L273-L317))
### Queries
Queries are stored on the salsa struct that they take as argument-input.
```rust
#[salsa::tracked]
fn query(db: &dyn db, salsa_struct: SalsaStruct) -> R {
...
}
```
The query results are stored on the `salsa_struct`.
Salsa stores [a non trivial amount of metadata](https://github.com/MichaReiser/salsa/blob/4f627b202e55409ddf78f227f68c09972803017a/src/function/memo.rs#L90-L101) with each cached query:
* The query inputs (reads of struct fields or queries) and outputs (created tracked structs)
* The revision in which the query last changed
* The ids of the tracked structs it created
* Whether it is provisional
* ...
# The Database
## Structure
```rust
pub struct Storage<Db> {
handle: StorageHandle<Db>,
/// Per-thread state
zalsa_local: zalsa_local::ZalsaLocal,
}
```
* `handle` is the shared state between all threads
* `zalsa_local` tracks the thread local state:
* Query stack
* Metadata about the current query
`StorageHandle` isn't very interesting on its own. The interesting bits of the shared state are in the `Zalsa` struct.
`Zalsa` stores:
* An index of all known ingredients (which Salsa structs and queries can be used with this database)
* The [`Runtime`](https://github.com/MichaReiser/salsa/blob/4f627b202e55409ddf78f227f68c09972803017a/src/runtime.rs#L14-L37)
```rust
pub struct Runtime {
/// Set to true when the current revision has been canceled.
/// This is done when we an input is being changed. The flag
/// is set back to false once the input has been changed.
revision_canceled: AtomicBool,
/// Stores the "last change" revision for values of each duration.
/// This vector is always of length at least 1 (for Durability 0)
/// but its total length depends on the number of durations. The
/// element at index 0 is special as it represents the "current
/// revision". In general, we have the invariant that revisions
/// in here are *declining* -- that is, `revisions[i] >=
/// revisions[i + 1]`, for all `i`. This is because when you
/// modify a value with durability D, that implies that values
/// with durability less than D may have changed too.
revisions: [Revision; Durability::LEN],
/// The dependency graph tracks which runtimes are blocked on one
/// another, waiting for queries to terminate.
dependency_graph: Mutex<DependencyGraph>,
/// Data for instances
table: Table,
}
```
## ty's Db
```rust
#[salsa::db]
#[derive(Clone)]
pub struct ProjectDatabase {
project: Option<Project>,
files: Files,
// IMPORTANT: Never return clones of `system` outside `ProjectDatabase` (only return references)
// or the "trick" to get a mutable `Arc` in `Self::system_mut` is no longer guaranteed to work.
system: Arc<dyn System + Send + Sync + RefUnwindSafe>,
// IMPORTANT: This field must be the last because we use `zalsa_mut` (drops all other storage references)
// to drop all other references to the database, which gives us exclusive access to other `Arc`s stored on this db.
// However, for this to work it's important that the `storage` is dropped AFTER any `Arc` that
// we try to mutably borrow using `Arc::get_mut` (like `system`).
storage: salsa::Storage<ProjectDatabase>,
}
```
## What you can do with a salsa Db
Not much besides reading the data or updating inputs.
### Cloning
You can clone a database to move a `Db` to another thread while sharing the same data
```rust
rayon::scope(move |scope| {
for file in &files {
let db = db.clone();
scope.spawn(move |_| {
let check_file_span =
tracing::debug_span!(parent: project_span, "check_file", ?file);
let _entered = check_file_span.entered();
let result = self.check_file_impl(&db, file);
file_diagnostics.lock().unwrap().extend(result);
reporter.report_file(&file);
});
}
});
```
Mutating a `Db` requires that all other `Db` instances referring the same storage are dropped first. This is accomplished by setting a cencallation token and then waiting for the Dbs to drop. A `Db` instance that sees that the cancellation token is set will panic with `salsa::Cancelled`.
### Untracked read
You can register an untracked read. This adds an input dependency to the current query which will force that query to re-execute on every revision.
### Trigger a synthetic write
Simulate a change of an input with a given `Durability`. Mainly useful for testing salsa.
### But it can be extended by having specific `Db` traits
```rust
#[salsa::db]
pub trait Db: SourceDb + Upcast<dyn SourceDb> {
fn is_file_open(&self, file: File) -> bool;
/// Resolves the rule selection for a given file.
fn rule_selection(&self, file: File) -> &RuleSelection;
fn lint_registry(&self) -> &LintRegistry;
}
```
The main purpose for methods on the `Db` trait are to provide extension points for functionality that's implemented in a higher up crate. For example, `rule_selection` requires accessing the project's settings, which are only known in `ty_project` (but this is the `Db` trait from `ty_python_semantic`).
# Why is it important to avoid cross-file query-dependencies
> I would be interested in an example of that cross-file/module boundary that we sometimes talk about. Where it is important to introduce additional queries and/or not keep references to things in other files. See, I can't even properly phrase the problem.
* It doesn't matter for a single run (e.g. CLI).
* It could result in doing some work multiple times but that might be fine.
* It's only important for incrementality.
Let's say we have:
**`a.py`**:
```python!
x: int = 10
```
**`b.py`**:
```python!
from a import x
reveal_type(x)
```
And the user now changes `a.py` to:
```python
x: int = 20 + 30
```
For as far as `b`'s concerned, nothing interesting changed about `a.py`: `x` is still typed as `int`. Because of that, ty shouldn't need to re-run any type inference inside `b.py` because nothing has changed.
This is accomplished by making `global_symbol` a Salsa query (or some function that it calls internally). `global_symbol` then acts as an invalidation boundary:
* Salsa reruns `global_symbol(a, "x")` because `a`'s AST has changed.
* `global_symbol(a, "X")` returns the same result as before the change (`int`)
* Salsa recognizes this and won't re-run the query infering `reveal_type(x)` because all that query's input are unchanged (the AST of `b` is unchanged and `x` is still `int`)
:::info
Resolving `x` calls into `global_symbol` (I think) which eventually calls into `place_by_id`. `place_by_id` is a Salsa query (`place_by_id` is too large to paste here).
:::
If we remove the `#[salsa::tracked]` from `global_symbol` (`place_by_id` to be precise), then the query infering `reveal_type(x)` would depend on `a.py`'s AST and it would have to re-run whenever `a.py` changes.
TLDR: It's only about containing changes for faster incremental type checking. This will also become relevant when we have persistent caching, in which case ty could even skip parsing `b.py` if only `a.py` changed.