# Ez Parallel Plan
## Goals
* Allow queries to be easily invoked across many items in parallel
## Code example
```rust
salsa::par_map(
    db,
    inputs,
    |input| ..., // <-- not allowed to capture surrounding state
);
```
where
```rust
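use rayon::prelude::*;

fn par_map<Db: Database, D, E, C>(
    db: &Db,
    inputs: impl IntoParallelIterator<Item = D>,
    op: fn(&Db, D) -> E,
) -> C
where
    E: Send,
    C: FromParallelIterator<E>,
{
    inputs
        .into_par_iter()
        // pseudocode: `map_with` clones `db` once per rayon job, so this only
        // works once `db` is wrapped in a type whose `Clone` forks the database
        .map_with(db, |db, element| op(db, element))
        .collect()
}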
```
## Technical architecture
Database types today can be *cloned* to create distinct handles (which must run on separate threads). One challenge is that we can't invoke `Clone` on a `&dyn Database`: `Clone::clone` returns `Self`, so `Clone` is not object-safe. But we can add a method to the `ZalsaDatabase` trait that clones into a `Box<dyn Database>` and then use upcasting.
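To make the object-safety point concrete, here is a minimal std-only sketch. The names `Database`, `MyDb` and `as_any` are hypothetical stand-ins, not Salsa's API; `as_any` plus `downcast_ref` plays roughly the role of the upcasting step that gets a concrete database type back out of a `Box<dyn Database>`.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for the database trait. `Clone::clone` returns `Self`
// (and `Clone` requires `Sized`), so it cannot be called through `dyn Database`.
// An object-safe `fork` that returns a boxed trait object works instead.
trait Database: Any {
    fn fork(&self) -> Box<dyn Database>;
    /// Lets callers recover the concrete type from a `&dyn Database`.
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical concrete database: `shared` models state shared by all handles,
// `local` models per-handle state that a fork starts fresh.
struct MyDb {
    shared: Arc<Vec<u32>>,
    local: u32,
}

impl Database for MyDb {
    fn fork(&self) -> Box<dyn Database> {
        Box::new(MyDb {
            shared: Arc::clone(&self.shared),
            local: 0,
        })
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}
```

Given `db: MyDb`, calling `(&db as &dyn Database).fork()` and then `forked.as_any().downcast_ref::<MyDb>()` yields a `&MyDb` whose `shared` points at the same allocation as the original.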
`par_map` can use rayon internally, in particular the `map_with` combinator. That combinator clones its `init` value as needed, so that each rayon job has its own copy of `db`. We'll actually need some kind of newtype wrapper around the `&Db`: `&Db` does implement `Clone`, but cloning it only copies the reference rather than creating a new database handle.
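A minimal std-only sketch of the newtype idea, assuming a hypothetical `Handle` type with an explicit `fork` method. It wraps an owned handle rather than a `&Db` to keep the example small. Because `ForkOnClone::clone` forks, every copy that `map_with` makes would be a distinct database handle instead of another copy of the same reference.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical database handle. `forks` is shared by all handles and counts
// how many forks have been made, so we can observe when `clone` runs.
struct Handle {
    forks: Arc<AtomicUsize>,
}

impl Handle {
    fn fork(&self) -> Handle {
        self.forks.fetch_add(1, Ordering::SeqCst);
        Handle {
            forks: Arc::clone(&self.forks),
        }
    }
}

// Newtype whose `Clone` creates a new database handle instead of copying a
// reference. rayon's `map_with` clones its init value as needed (one copy per
// job), so each job would end up with its own fork.
struct ForkOnClone(Handle);

impl Clone for ForkOnClone {
    fn clone(&self) -> Self {
        ForkOnClone(self.0.fork())
    }
}
```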
Some important points:
* The `par_map` above takes a `fn`, which guarantees that no state is captured in upvars. I *thought* this was important, because I didn't want state from the middle of a query to leak into the parallel operation, but on reflection I don't see why it matters. So it can probably be an `impl Fn`, which is more flexible.
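The difference between the two signatures, sketched with a plain `u32` standing in for the database (the function names are illustrative only): a non-capturing closure coerces to a `fn` pointer, while a closure that captures a local only fits the `impl Fn` version.

```rust
// `fn` pointer: only closures that capture nothing coerce to this type, so no
// state from the calling query can leak into `op`.
fn apply_fn(db: &u32, input: u32, op: fn(&u32, u32) -> u32) -> u32 {
    op(db, input)
}

// `impl Fn`: closures may capture surrounding state, which is more flexible.
fn apply_impl(db: &u32, input: u32, op: impl Fn(&u32, u32) -> u32) -> u32 {
    op(db, input)
}
```

Passing `|db, x| db + x + offset` (capturing a local `offset`) to `apply_fn` would not compile, while `apply_impl` accepts it. In a rayon-based `par_map`, the `impl Fn` version would also need `Send + Sync` bounds, since `map_with` shares the closure across worker threads.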
## Mentoring notes / suggested plan
Let's do this...
* Remove the `Clone` impl from `Storage` and add a `fn fork(&self) -> Self` method instead (so forking a `Storage<Db>` yields a new `Storage<Db>`).
* Add a `fn fork(&self) -> Box<dyn Database>` method to the `salsa::Database` trait (we may want to revise this signature later, but it's ok to start)
```rust
trait Database {
    fn fork(&self) -> Box<dyn Database>;
}
```
Users are then expected to implement it and fork their internal fields (if any)...
```rust
#[salsa::db]
struct MyStruct {
    storage: Storage<Self>,
    my_field: MyField,
}

#[salsa::db]
impl Database for MyStruct {
    fn fork(&self) -> Box<dyn Database> {
        Box::new(Self {
            storage: self.storage.fork(),
            my_field: self.my_field.clone(),
        })
    }
}
```
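A std-only mock of this shape, assuming a simplified, hypothetical `Storage` (non-generic, unlike the real `Storage<Self>`) that holds shared global state in an `Arc` and per-handle state in a `RefCell`. `Storage` deliberately has no `Clone` impl, so `fork` is the only way to get a new handle. The `storage` accessor on the trait is added only so the sketch can be inspected.

```rust
use std::cell::RefCell;
use std::sync::Arc;

// Hypothetical storage: `global` is shared by every handle, while `local` is
// per-handle state that a fork starts empty. No `Clone` impl on purpose.
struct Storage {
    global: Arc<String>,
    local: RefCell<Vec<String>>,
}

impl Storage {
    fn fork(&self) -> Self {
        Storage {
            global: Arc::clone(&self.global),
            local: RefCell::new(Vec::new()),
        }
    }
}

trait Database {
    fn fork(&self) -> Box<dyn Database>;
    // Accessor added for this sketch only.
    fn storage(&self) -> &Storage;
}

#[derive(Clone)]
struct MyField(u32);

struct MyStruct {
    storage: Storage,
    my_field: MyField,
}

impl Database for MyStruct {
    // Fork the storage, clone everything else.
    fn fork(&self) -> Box<dyn Database> {
        Box::new(Self {
            storage: self.storage.fork(),
            my_field: self.my_field.clone(),
        })
    }

    fn storage(&self) -> &Storage {
        &self.storage
    }
}
```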
* A simple `join` method would then look like this:
```rust
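fn join<Db, A, B>(
    db: &Db,
    a: impl FnOnce(&Db) -> A + Send,
    b: impl FnOnce(&Db) -> B + Send,
) -> (A, B)
where
    // `rayon::join` requires both closures to be `Send`; since each captures
    // a `&Db`, this sketch needs `Db: Sync`.
    Db: ?Sized + Database + Sync,
    A: Send,
    B: Send,
{
    // Fork a new handle for the first closure and view it as the concrete `Db`.
    let fork_db_a: Box<dyn Database> = db.fork();
    let fork_db_a: &Db = fork_db_a.as_view::<Db>().unwrap();
    rayon::join(|| a(fork_db_a), || b(db))
}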
```
...generalizing this to `par_map` is left as an exercise to the reader.
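One way the generalization could look, as a sketch only: it uses `std::thread::scope` with fixed-size chunks instead of rayon (swapped in to keep the example dependency-free; rayon's `map_with` would split work adaptively) and a hypothetical concrete `Db` type with an inherent `fork` method. Each chunk gets its own forked handle, created before its thread is spawned, and results come back in input order.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical database handle: forks share `global`, but each fork is an
// owned value that can be moved to another thread.
struct Db {
    global: Arc<Vec<u64>>,
}

impl Db {
    fn fork(&self) -> Db {
        Db {
            global: Arc::clone(&self.global),
        }
    }
}

// Split `inputs` into at most `threads` chunks; each chunk runs on a scoped
// thread with its own forked handle. Output order matches input order.
fn par_map<D, E>(db: &Db, inputs: Vec<D>, threads: usize, op: fn(&Db, D) -> E) -> Vec<E>
where
    D: Send,
    E: Send,
{
    let chunk_len = inputs.len().div_ceil(threads.max(1)).max(1);

    // Build owned chunks so each thread can take its inputs by value.
    let mut chunks: Vec<Vec<D>> = Vec::new();
    let mut iter = inputs.into_iter().peekable();
    while iter.peek().is_some() {
        chunks.push(iter.by_ref().take(chunk_len).collect());
    }

    thread::scope(|s| {
        // Collect the handles first so all threads are spawned before joining.
        let handles: Vec<_> = chunks
            .into_iter()
            .map(|chunk| {
                let forked = db.fork(); // one fork per chunk
                s.spawn(move || chunk.into_iter().map(|d| op(&forked, d)).collect::<Vec<E>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}
```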
### Clone alternative
* `Storage<Self>` already implements `Clone`
* Add `Clone` as a supertrait of `HasStorage`
* Add a method `fn fork_db(&self) -> Box<dyn Database>` to the `ZalsaDatabase` trait
  * This trait is implemented on line 93 of `storage.rs`; we would add an implementation that basically does `Box::new(self.clone())`
* Do the "upcasting" and `join` method as described above
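A sketch of this alternative with simplified, hypothetical versions of `HasStorage`, `ZalsaDatabase` and `Database` (not Salsa's real definitions). The blanket impl is the `Box::new(self.clone())` step; on the user side, `#[derive(Clone)]` is the only extra requirement, and the cloned `Storage` shares its backing state through an `Arc`.

```rust
use std::sync::Arc;

// Hypothetical, minimal database trait.
trait Database {
    fn value(&self) -> u32;
}

// Cloning `Storage` shares the same backing state.
#[derive(Clone)]
struct Storage {
    shared: Arc<u32>,
}

// `Clone` as a supertrait of `HasStorage`, per the plan.
trait HasStorage: Clone {
    fn storage(&self) -> &Storage;
}

trait ZalsaDatabase {
    fn fork_db(&self) -> Box<dyn Database>;
}

// Blanket impl: every `HasStorage` database (hence `Clone`) gets `fork_db`.
impl<T: HasStorage + Database + 'static> ZalsaDatabase for T {
    fn fork_db(&self) -> Box<dyn Database> {
        Box::new(self.clone())
    }
}

// User side: deriving `Clone` is all that is needed.
#[derive(Clone)]
struct MyDb {
    storage: Storage,
}

impl HasStorage for MyDb {
    fn storage(&self) -> &Storage {
        &self.storage
    }
}

impl Database for MyDb {
    fn value(&self) -> u32 {
        *self.storage.shared
    }
}
```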
**Impact for users:**
Every Salsa database *must* implement `Clone`, typically by deriving it (though not necessarily). `Clone` is required to create a separate database handle with the same backing global state.
## Questions
* Right now, the way to fork a database is to "clone" it. Maybe we should make a `fork` method or something? It's a nit, really.
* We could conceivably not have a `par_map` function but instead expose the wrappers that we'll use to make `map_with` work. But I'm sort of inclined to start this way.