# Ez Parallel Plan

## Goals

* Allow queries to be easily invoked across many items in parallel

## Code example

```rust
salsa::par_map(
    db,
    inputs,
    |input| ..., // <-- not allowed to capture surrounding state
);
```

where

```rust
fn par_map<Db: Database, D, E, C>(
    db: &Db,
    inputs: impl IntoParallelIterator<Item = D>,
    op: fn(&Db, D) -> E,
) -> C
where
    C: FromParallelIterator<E>,
{
    inputs
        .into_par_iter()
        .map_with(db, |db, element| op(db, element)) /* pseudocode */
        .collect()
}
```

## Technical architecture

Database types today can be *cloned* to create distinct handles (which must run on separate threads). One challenge is that we can't invoke `Clone` on a `&dyn Database`. But we can add a method to the `ZalsaDatabase` trait that clones to a `Box<dyn Database>` and then use upcasting.

`par_map` can internally use rayon and its `map_with` combinator. This combinator invokes `clone` when it needs a thread-local copy of `db`. We'll actually have to use some kind of newtype wrapper around the `&Db`, since that type doesn't implement `Clone`.

Some important points:

* The `par_map` above takes a `fn`, which ensures that no state is captured in upvars. I *thought* this was important, because I didn't want state from the middle of a query to leak, but as I think about it, I don't really see why it matters. So maybe it can be an `impl Fn`, which is better.

## Mentoring notes / suggested plan

Let's do this...

* Remove the `Clone` impl from `Storage` and add a `fn fork(&self) -> Storage<Self>` instead.
* Add a `fn fork(&self) -> Box<dyn Database>` method to the `salsa::Database` trait (we may want to revise this signature later, but it's ok to start):

```rust
trait Database {
    fn fork(&self) -> Box<dyn Database>;
}
```

Users are then expected to implement it and fork their internal fields (if any)...
```rust
#[salsa::db]
struct MyStruct {
    storage: Storage<Self>,
    my_field: MyField,
}

#[salsa::db]
impl Database for MyStruct {
    fn fork(&self) -> Box<dyn Database> {
        Box::new(Self {
            storage: self.storage.fork(),
            my_field: self.my_field.clone(),
        })
    }
}
```

* A simple "join" method would then look like:

```rust
fn join<Db, A, B>(db: &Db, a: impl FnOnce(&Db) -> A, b: impl FnOnce(&Db) -> B) -> (A, B)
where
    Db: ?Sized + Database,
{
    let fork_db_a: Box<dyn Database> = db.fork();
    let fork_db_a: &Db = fork_db_a.as_view::<Db>().unwrap();
    rayon::join(
        || a(fork_db_a),
        || b(db),
    )
}
```

...generalizing this to `par_map` is left as an exercise to the reader.

### Clone alternative

* `Storage<Self>` already implements `Clone`.
* Add `Clone` as a supertrait of `HasStorage`.
* Add a method `fn fork_db(&self) -> Box<dyn Database>` to the `ZalsaDatabase` trait.
  * This trait is implemented on line 93 of storage.rs; we would add an implementation that basically does `Box::new(self.clone())`.
* Do the "upcasting" and `join` method as described above.

**Impact for users:** Every Salsa database *must* implement `Clone`, typically by deriving it (but not necessarily). `Clone` is required to create a separate database handle with the same backing global state.

## Questions

* Right now, the way to fork a database is to "clone" it. Maybe we should make a `fork` method or something? It's a nit, really.
* We could conceivably not have a `par_map` function but instead expose the wrappers that we'll use to make `map_with` work. But I'm sort of inclined to start this way.
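## Appendix: std-only sketch of the fork-then-join pattern

To make the fork-then-join shape concrete without pulling in salsa or rayon, here is a minimal self-contained sketch. Everything in it is a simplified stand-in: the `Database` trait, `Storage`, `MyDb`, and `get` are hypothetical mock types (not the real salsa API), `Arc` plays the role of the shared global state behind `Storage<Self>`, and `std::thread::scope` substitutes for `rayon::join`. The point is only to show that forking yields a second handle onto the *same* backing state, which can then be moved to another thread.

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for the shared global state behind `Storage<Self>`:
// cloning the storage produces a new handle to the same data.
#[derive(Clone)]
struct Storage {
    shared: Arc<Vec<i32>>,
}

// Stand-in for `salsa::Database`. `Send` is required so a forked
// handle can be moved to another thread.
trait Database: Send {
    fn fork(&self) -> Box<dyn Database>;
    fn get(&self, i: usize) -> i32;
}

struct MyDb {
    storage: Storage,
}

impl Database for MyDb {
    fn fork(&self) -> Box<dyn Database> {
        Box::new(MyDb { storage: self.storage.clone() })
    }
    fn get(&self, i: usize) -> i32 {
        self.storage.shared[i]
    }
}

// `join` in the style of the plan: fork a handle for `a`, run `b`
// on the original handle, and execute both in parallel.
fn join<A: Send, B>(
    db: &dyn Database,
    a: impl FnOnce(&dyn Database) -> A + Send,
    b: impl FnOnce(&dyn Database) -> B,
) -> (A, B) {
    let fork = db.fork();
    thread::scope(|s| {
        // The forked handle is moved into the spawned thread.
        let handle = s.spawn(move || a(&*fork));
        let rb = b(db);
        (handle.join().unwrap(), rb)
    })
}

fn main() {
    let db = MyDb { storage: Storage { shared: Arc::new(vec![1, 2, 3]) } };
    let (x, y) = join(&db, |d| d.get(0) + d.get(1), |d| d.get(2) * 10);
    println!("{x} {y}"); // prints "3 30"
}
```

In the real design the fork would go through `ZalsaDatabase`/`as_view` upcasting rather than a hand-written trait, but the ownership story (one owned handle per thread, shared global state underneath) is the same.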