Design meeting 2025-01-29: Generators part 1

---
title: "Design meeting 2025-01-29: Generators part 1"
tags: ["T-lang", "design-meeting", "minutes"]
date: 2025-01-29
discussion: https://rust-lang.chat.com/#narrow/channel/410673-t-lang.2Fmeetings/topic/Design.20meeting.202025-01-29
url: https://hackmd.io/7O9IyhHvRmqaMd-NS6dYyw
---

# Self Referential Generators

*Why we need them, and can we live without them?*

Questions for Lang team:

- Should we support borrows across yields? (This requires changes to the `Iterator` trait, or creating a new trait.)
- Should we support both non-borrowing generators (that implement `Iterator` as it exists today) and borrowing generators?
- If we support both, what should the default be? Are we okay with the default between `gen` and `async` being different?

## What does self-referentiality enable?

Self-referentiality is Rust's way of allowing generators to hold borrows of variables on the generator's stack across yield points. Below is a trivial example.

```rust
gen {
    let x = 42;
    let y = &x;
    yield 0;
    yield *y;
}
```

This generator yields 0 and then 42. Because of the way we've written it, we store 42 on the stack, and keep a reference to it on the stack. We dereference `y` to yield 42, which means the borrow must live across the `yield 0` line. Rust's generator lowering converts this into a self-reference.

This example is obviously contrived and there are better ways to write this particular generator. But this pattern arises in a number of important use cases, some of which we'll discuss now.

For more examples, see [Self-borrowing generator examples](/DTSOVR4QRLyvaU1HQiQZvg).

### `RefCell` or `Mutex`

A common pattern that requires borrowing across yields is when you want to iterate over a collection that's held behind a `RefCell` or `Mutex` and yield some of its items.
For example:

```rust
gen fn interesting_items(items: Rc<RefCell<Vec<Item>>>) -> Item {
    let items = items.borrow();
    for item in items.iter() {
        if is_interesting(item) {
            // clone is needed so this is not a lending generator
            yield item.clone()
        }
    }
}
```

In this case, you could attempt to not self-borrow like this:

```rust
gen fn interesting_items(items: Rc<RefCell<Vec<Item>>>) -> Item {
    let len = items.borrow().len();
    for i in 0..len {
        // Borrow and clone in a single statement so that the `Ref` is
        // dropped before the yield instead of being held across it.
        let item = items.borrow()[i].clone();
        if is_interesting(&item) {
            yield item
        }
    }
}
```

This does not preserve all the semantics though. In between loop iterations, code in another place that has access to the same collection could add or remove values. That is prevented in our original formulation by holding the `RefCell` borrow for the duration of the iterator.

Another option is to put the burden of borrowing the `RefCell` on the caller.

```rust
gen fn interesting_items(items: Ref<'_, Vec<Item>>) -> Item {
    for item in items.iter() {
        if is_interesting(item) {
            // clone is needed so this is not a lending generator
            yield item.clone()
        }
    }
}

let items: Rc<RefCell<Vec<Item>>> = get_items();
let interesting = interesting_items(items.borrow());
for item in interesting {
    println!("Isn't {item} interesting?");
}
```

This persists the borrow for the whole lifetime of the generator, but unfortunately in this version the generator is no longer self-contained. The caller must make sure to keep `items` in place for the whole lifetime of the generator. In our original formulation, the generator owned `items`.
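To make the semantic difference of the index-based rewrite concrete, here is a minimal sketch on stable Rust. Since `gen` blocks are not available on stable, it uses `std::iter::from_fn` as a hand-written stand-in for the generator. The name `even_items`, the `u32` item type, and the "is even" filter are assumptions made purely for illustration, and it uses `get(i)` rather than `items[i]` so that a shrinking vector ends iteration instead of panicking.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the index-based generator: it re-borrows the
/// `RefCell` on every call instead of holding one borrow across "yields".
fn even_items(items: Rc<RefCell<Vec<u32>>>) -> impl Iterator<Item = u32> {
    let len = items.borrow().len(); // the length is captured once, up front
    let mut i = 0;
    std::iter::from_fn(move || {
        while i < len {
            // The temporary `Ref` is dropped at the end of this statement.
            let item = items.borrow().get(i).copied();
            i += 1;
            match item {
                Some(x) if x % 2 == 0 => return Some(x),
                Some(_) => continue,
                None => return None, // the vector shrank underneath us
            }
        }
        None
    })
}

fn main() {
    let shared = Rc::new(RefCell::new(vec![2, 4, 6, 8]));
    let mut iter = even_items(Rc::clone(&shared));

    assert_eq!(iter.next(), Some(2));
    // No borrow is held between calls, so other code may mutate the vector.
    shared.borrow_mut().remove(0);
    // Index 1 now holds 6, so the element 4 is silently skipped.
    assert_eq!(iter.next(), Some(6));
}
```

In the self-borrowing formulation, the `borrow_mut()` call in `main` would panic instead, because the generator would still be holding the shared borrow.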
### Turning a borrowing iterator into an owning one

```rust
gen fn chars(s: String) -> char {
    for c in s.chars() {
        //~^ ERROR: borrow may still be in use when `gen` fn body yields
        yield c;
    }
}
```

[Playground link](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&code=%23%21%5Bfeature%28gen_blocks%29%5D%0A%0Agen+fn+chars%28s%3A+String%29+-%3E+char+%7B%0A++++for+c+in+s.chars%28%29+%7B%0A++++++++%2F%2F~%5E+ERROR%3A+borrow+may+still+be+in+use+when+%60gen%60+fn+body+yields%0A++++++++yield+c%3B%0A++++%7D%0A%7D%0A%0Afn+main%28%29+%7B%7D%0A)

This particular case could instead be written as:

```rust
pub gen fn chars(s: String) -> char {
    let mut idx = 0;
    loop {
        let c = match s[idx..].chars().next() {
            Some(x) => x,
            None => return,
        };
        idx += c.len_utf8();
        yield c;
    }
}
```

[Playground link](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&code=%23%21%5Bfeature%28gen_blocks%29%5D%0A%0Apub+gen+fn+chars%28s%3A+String%29+-%3E+char+%7B%0A++++let+mut+idx+%3D+0%3B%0A++++loop+%7B%0A++++++++let+c+%3D+match+s%5Bidx..%5D.chars%28%29.next%28%29+%7B%0A++++++++++++Some%28x%29+%3D%3E+x%2C%0A++++++++++++None+%3D%3E+return%2C%0A++++++++%7D%3B%0A++++++++idx+%2B%3D+c.len_utf8%28%29%3B%0A++++++++yield+c%3B%0A++++%7D%0A%7D%0A%0Afn+main%28%29+%7B%7D%0A)

But that of course takes much more thought and care, and it loses much of the benefit of using generators in the first place. With types that do not support indexing or are more opaque than `String`, this kind of rewrite may not always be available.

### Iterating non-destructively over an owned value

Here we need to borrow a generator-owned object while iterating through part of it.

```rust
/// Yields the set of values for each key listed in `keys`.
gen fn vals_for_keys<K: Eq + Hash, T: Clone>(
    map: HashMap<K, Vec<T>>,
    keys: Vec<K>,
) -> T {
    for k in keys {
        if let Some(values) = map.get(&k) {
            for val in values {
                yield val.clone();
            }
        }
    }
}
```

[Playground link](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024&gist=612276051e6b95bf74b6ba86b7ed6959)

Versions of this example that do work:

- Accepting a `&HashMap<K, Vec<T>>`
    - Advantage: `T` does not need to be cloned and can be handed out by reference
    - Disadvantage: Cannot return this generator from another function that owns the containers
- Destructively accessing the owned HashMap via `map.remove(&k)`
    - Advantage: `T` does not need to be cloned and can be handed out by value
    - Disadvantage: In this example, cannot support repeated keys

Note that this also does not work in the shared ownership (`Mutex`/`RefCell`) case.

## What still works without self-referentiality?

Many, perhaps even most, generators will work just fine without self-referentiality. We'll discuss some broad categories of non-self-referential generators here.

### No borrowed state

```rust
let count_to_10 = gen {
    let mut i = 0;
    // Yields 1 through 10.
    while i < 10 {
        i += 1;
        yield i;
    }
};
```

This example does not hold any borrows across a `yield`, but it is still able to mutate stack variables and maintain state between yields. We suspect many generators will fall into this category.

### Borrowing, and yielding, externally owned state

```rust
fn iterate_things(things: &[Thing]) {
    let thingifier = gen {
        for thing in things {
            yield thing.thingify()
        }
    };
    for thing in thingifier {
        println!("{thing:?}");
    }
}
```

We can yield references to the externally owned state, too.

```rust
gen fn active_items(&self) -> &Item {
    for item in &self.items {
        if item.active() {
            yield item;
        }
    }
}
```

As you can see, this works when borrowing `self`.

### Nested iteration

While this category is probably a subset of the previous ones we've looked at, we call it out here because we were frankly surprised it works.
Here's an example:

```rust
gen fn hashmap_to_association_list(
    map: HashMap<String, Vec<usize>>
) -> (String, usize) {
    for (k, v) in map {
        for x in v {
            yield (k.to_string(), x);
        }
    }
}
```

We expected something to end up borrowing the outer iterator for the whole inner loop. This would happen if the outer loop were `for (k, v) in map.iter()`, but we suspect in most cases it is more desirable to use a value iterator than a borrowed iterator.

### Indexing into a Vec

Non-destructive iteration over a container can still be accomplished when the container supports indexing and the index domain can be derived cheaply. The simplest and most common case is indexing into a vec. For example:

```rust
gen fn repeat_vec<T: Clone>(v: Vec<T>) -> T {
    for i in 0..v.len() {
        yield v[i].clone();
    }
    for i in 0..v.len() {
        yield v[i].clone();
    }
}
```

### How to explain errors

Without self-referential generators, users will be prone to writing code that contains a self-borrow. The resulting errors are likely to be confusing to many users. We will need to invest in offering guidance on how to transform such generators so that they are not self-referential.

## Learning from `async`

The primary occurrence of self-referentiality in Rust today is in futures that result from `async` blocks or functions. In fact, this is the reason why `Pin` exists and why it's used in the `Future` trait. Async and generators share a lot in common in the implementation, as they are both enabled by the coroutine transform. Thus, some lessons from `async` might apply to generators as well.

Early futures in Rust were built around combinators, particularly a `.and_then` combinator. This meant a lot of IO APIs, for example, had to be designed in a way that felt very "un-Rusty."
For example, while today we might write:

```rust
let mut buf = [0; 1024];
let mut cursor = 0;
while cursor < 1024 {
    cursor += socket.read(&mut buf[cursor..]).await?;
}
```

With combinators you'd have to do something more like:

```rust
let mut buf = Buf {
    data: Box::new([0; 1024]),
    cursor: 0,
};
while buf.cursor < 1024 {
    match await!(socket.read(buf)) {
        Ok((new_socket, new_buf, n)) => {
            socket = new_socket;
            buf = new_buf;
            buf.cursor += n;
        }
        Err((new_socket, new_buf, e)) => {
            socket = new_socket;
            buf = new_buf;
            Err(e)?
        }
    }
}
```

(This example is taken from Aaron Turon's post [Borrowing in async code].)

In this example, `read` takes and returns both the socket and buffer by value. This allows the whole future to be `'static`, which enables spawning on an executor. It would have been possible to write `read` more like the synchronous version, which borrows a buffer to read into, but doing so would have made the future inherently non-`'static` and greatly limited its usefulness.

Because spawning onto global, often multithreaded, executors is so critical to futures, this limitation to the combinator approach was felt pervasively throughout future-based code. It was clear that `async` with self references was a huge improvement over what was possible with combinators.

It is interesting that in all the years we've had the `Iterator` trait and its associated combinators, we have not had the same kind of pervasive struggles with non-`'static` iterators. It seems rather rare to need to send iterators between threads, especially relative to how often that use case occurs with futures.

[Borrowing in async code]: https://aturon.github.io/tech/2018/04/24/async-borrowing/

## What are the tradeoffs of supporting self-borrowing?

The `Iterator` trait, as it exists today, does not support self references. At a minimum, supporting self-borrowing would require that a generator is immovable once iteration begins.
This means that signatures such as

```rust
fn foo<I>(
    iter: impl IntoIterator<Item = I>
) -> impl Iterator<Item = I> { ... }
```

would need to be rewritten as something equivalent to the following:

```rust
fn foo<I>(
    iter: impl IntoGenerator<Item = I>
) -> impl Generator<Item = I> { ... }
```

In argument position this would be a backwards-compatible change. We'll discuss in the next section how we could make this backward compatible in return position.

We believe that, in the fullness of time, we could make the needed changes mostly transparent to the user. That includes the pinning of the generator itself. A future lang meeting may dive into the details of this proposal more, including places where there are remaining rough edges.[^rough]

[^rough]: For example, when passing a generator to an interface that accepts an iterator, passing `&mut my_gen` or `&mut pinned my_gen` might be required instead of the usual passing by value.

Note that there are alternative ways of spelling the above, e.g. with a `Move` / `Overwrite` trait and emplacement, or a `pin` effect on the Iterator trait. We do not consider the various ways of spelling these to be in scope for today's discussion. Regardless of how they are spelled, the effect is the same: Existing code must update its signatures to support self-borrowing generators.

### Backward compatibility with `Generator + Unpin`

We could improve the backward compatibility story by making generators which do not borrow across a yield point implement `Unpin` or its equivalent. Considering the above signature again, we could write instead:

```rust
fn foo<I>(
    iter: impl IntoGenerator<Item = I>
) -> impl Generator<Item = I> + Unpin { ... }
```

If we had an `impl<G: Generator + Unpin> Iterator for G`, then it would *not* be a breaking change to upgrade to this signature.
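As a rough illustration of the bridging idea, the sketch below defines a hypothetical pinned `Generator` trait and drives an `Unpin` implementor through the existing `Iterator` interface. Everything here is an assumption for illustration: the trait shape, the `AsIter` wrapper, and the `Countdown` type are not part of any proposal in this document. A user crate cannot write the blanket `impl<G: Generator + Unpin> Iterator for G` itself because of coherence rules, so the sketch substitutes a wrapper type.

```rust
use std::pin::Pin;

/// Hypothetical stand-in for a pinned generator trait; not a real std API.
trait Generator {
    type Item;
    fn next(self: Pin<&mut Self>) -> Option<Self::Item>;
}

/// Wrapper that exposes any `Generator + Unpin` as an ordinary `Iterator`.
struct AsIter<G>(G);

impl<G: Generator + Unpin> Iterator for AsIter<G> {
    type Item = G::Item;

    fn next(&mut self) -> Option<G::Item> {
        // `Pin::new` is safe here precisely because `G: Unpin`.
        Generator::next(Pin::new(&mut self.0))
    }
}

/// A hand-written generator with no self-references, so it is `Unpin`.
struct Countdown(u32);

impl Generator for Countdown {
    type Item = u32;

    fn next(mut self: Pin<&mut Self>) -> Option<u32> {
        if self.0 == 0 {
            return None;
        }
        self.0 -= 1;
        Some(self.0 + 1)
    }
}

fn main() {
    // The wrapped generator works with existing iterator adapters.
    let v: Vec<u32> = AsIter(Countdown(3)).collect();
    assert_eq!(v, vec![3, 2, 1]);
}
```

A generator that does borrow across a yield would not be `Unpin`, so it could not be wrapped this way and would have to be pinned by its caller first.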
Assuming for the moment that this is possible, the primary drawback of this mitigation is that it creates two classes of generators: Those which are `Unpin` (or `Move`) and those which are not.

It may, or may not, be clearer to distinguish un-pinnable generators with syntax like `gen iter {}`, `gen unpin {}`, or similar.

Alternatively, in the other direction, we could make `gen {}` the unpinned variant and use `gen pinned {}`, `gen static {}`, or similar to opt in to pinning. (Note that this would be counter to `async {}` which always requires pinning, though we could conceivably change that to match.)

### Ecosystem migration

A lot of code is written using `Iterator` as it stands today. If we were to extend this in some way, such as by creating new traits or modifying `Iterator` in some other way, then some changes in the ecosystem will be needed to migrate.

We can minimize many of the changes needed by having a strong cross-compatibility story, using bridging impls and other adapters. The exact story here will depend on which path we take in modifying `Iterator`.

There are likely to be more migrations in the future. For example, generators make it easy to write lending iterators, so generators will probably increase the demand for lending iterators. Lending iterators are also not supported by the current `Iterator` trait, so in a few years we may be in a similar boat of asking the ecosystem to migrate to a new, more powerful iteration trait.

## Aside: What about lending generators?

A likely future extension to iterators or generators is lending. This is when the item type is a reference into the iterator or generator's internal state. As a trivial example written using generator syntax, we could have:

```rust
gen {
    let x = String::from("hello");
    yield &x;
    yield &x;
}
```

This case happens not to borrow across a yield, because none of the `&x` are used after yielding. Thus, this example does not create a self-reference.
But many examples involving self-references would benefit from lending. You may have noticed how many of the examples above involved cloning the item being yielded. A lending generator would allow us to rewrite all of them to avoid the clone; e.g., the first example could be rewritten as

```rust
gen fn interesting_items(items: Rc<RefCell<Vec<Item>>>) -> &Item {
    let items = items.borrow();
    for item in items.iter() {
        if is_interesting(item) {
            yield item; // <-- no `.clone()` needed!
            // Further processing can also be
            // done on `item`, if desired.
        }
    }
}
```

Conversely, we hypothesize that many real-world examples involving lending also involve self-references like this. A lending generator must lend from *something* owned by the generator, and it is often natural for that something to be an iterator which holds a reference into a container. Lending and self-references might then be "symbiotic" in a sense.

There is at least one exception to this: when a lending generator yields references into an indexed container, most notably a buffer.

```rust
/// Like the BufRead::lines method, but reuses the buffer
/// between calls.
gen fn lines(file: File) -> &str {
    let mut reader = BufReader::new(file);
    let mut line_buffer = String::with_capacity(1024);
    loop {
        if reader.read_line(&mut line_buffer).unwrap() == 0 {
            break;
        }
        yield line_buffer.as_str(); // yields an indexed range
        line_buffer.clear();
    }
}
```

### Dependencies of lending generators

oli-obk found no nice way to make non-lending coroutines implement `Iterator` without also making `Iterator` lending first. See [this zulip thread](https://rust-lang.zulipchat.com/#narrow/channel/481571-t-lang.2Fgen/topic/lending.20generators/near/496534696) for context.

Useful reading: https://rust-lang.github.io/generic-associated-types-initiative/explainer/required_bounds.html#workaround

Possible avenue for resolving this:

1. GAT improvements to allow avoiding `Self: 'me` bounds
    * possibly needs next solver? Unsure; there seemed to be a lot of uncertainty in https://github.com/rust-lang/rust/issues/95451 and related discussions.
2. followed by making it backwards compatible to change an assoc ty into a GAT by implying GAT generics (necessary to make `Iterator::Item` be a GAT without breaking the world)
3. followed by making it backwards compatible to change `Iterator::next` to link the `&'a mut self` lifetime to the `Self::Item<'a>` lifetime.
4. followed by making `Iterator` lending, then making coroutines lending

---

# Discussion

## Attendance

- People: TC, tmandry, eholk, tmandry, Josh, cramertj, Oli, yosh

## Meeting roles

- Minutes, driver: TC

## Decision tree

tmandry: Without getting into the weeds on any one thing, can we work out what the decision tree looks like for these questions?

Questions for Lang team:

- Should we support borrows across yields? (This requires changes to the `Iterator` trait, or creating a new trait.)
- Should we support both non-borrowing generators (that implement `Iterator` as it exists today) and borrowing generators?
- If we support both, what should the default be? Are we okay with the default between `gen` and `async` being different?
- What is the purpose of gen (see Yosh's question)? Make it easier to write `impl Iterator` implementations? Be its own, new thing with its own rules?

Preliminary answers:

nikomatsakis:

* Borrows across yields? Yes, that should be the default. I think that people should be able to take a random Rust function that returns `-> Vec<T>`, stick `gen` in front and `yield` in the middle, and get back something that yields up an iterator of `T`. That is my "experience goal".
* Support non-borrowing generators? Probably useful, my preference would be to infer whether something is `Unpin` or not as an "extra trait"; I've not thought hard about how feasible that is. I'd ideally like to do it for `async {}` blocks too though.
* Are we okay with the default between `gen` and `async` being different? No.
  I think we are going to want to move towards a common model of usage. I'd rather work to make working with pin easy.
* What is the purpose of gen? Making it easier to work with "streams of data" in a uniform way across multiple contexts (sync, async, etc). See above for my "litmus test" of converting a function that returns `Vec<T>` to a function that returns `impl Iterator<Item = T>` with ease. I'd be interested in other "litmus tests".
* Other questions that interest me:
    * how do we make things as optimal as iterators in terms of "expected length" and the rest
* Lending: I see it as an orthogonal expansion of sorts, but I'd like to re-read the section of the doc above =)

Josh:

* Note: my answers would change completely if self-referential generators didn't require pinning.
* Borrows across yields have value, and we should support them, but we shouldn't support *exclusively* those, if the cost is inflicting Pin on people. Let's not delay supporting generators on the basis of not supporting self-referential generators yet.
* Support non-borrowing generators: yes, they're a distinct thing, and they don't require pinning.
* Are we OK with the default being different: Yes. The default should not require pin.
* Would I be OK with changing the default for `async` to match `gen`? Also yes. :grin:
* I think self-referential generators and lending generators both seem useful, and both fall in the category of "things we could do later".

tmandry:

* Eventually we should support self-borrows *and* lending. I think they are a bit "symbiotic".
* We should support a subset of the full features this year. I think the purpose of this is to make writing iterators easier. You should be able to take an `impl Iterator` and rewrite it as a `gen` block.
* I think that subset could exclude both self-borrows and lending, or could include self-borrows.
* I would like to limit the number of migrations we need to do in the ecosystem, which pushes me in the direction of no self-borrows until we also have lending.
* Should the defaults differ? If async inherently has more of a need for self-borrowing I think the answer *can* be yes. I would like to converge toward a common set of capabilities.
* I also think we can reserve space for the default to be self-borrowing without shipping it now, with something like `gen iter {}` and/or `impl Generator + Unpin`.

TC: I think people are going to expect this to work:

```rust
gen fn chars(s: String) -> char {
    for c in s.chars() {
        yield c;
    }
}
```

And people will expect many similar things to work. And so I think it's just going to be too weird if we don't support this. I worry that, if we were to have the non-self-referential limitation, that it's the sort of thing that would make sense to us, given that we know how things work under the covers, but that it wouldn't really make sense to most people, and I don't know that this is the sort of thing we really want to have to teach carefully. It leaks the abstraction a bit.

At the same time, I think there probably is value in allowing some way for `gen` and `async` literals to return things that implement `Unpin`, whether that's through an automatic mechanism or via some keyword annotation.

In my view, both `async` and `gen` should work the same way (maybe after an edition migration, if needed) so as to support one unified mental model of the behavior of "closure-like" blocks, in the same way that the `move` keyword has a consistent meaning between them.

On lending, one challenge is that we don't currently know how to express non-lending traits as a strict subset of a broader lending trait, resulting in two hierarchies. We either need to accept that in a lending design or wait until we can (if we can) express that. But we don't necessarily need to solve this now, either way.
Eric: (not a lang team member, but sharing some thoughts anyway)

- Should we support borrowing across yields? Yes. Ideally people could yield pretty much wherever and not have to worry about it. I think the compiler errors would be annoying if we didn't support it.
- Should we support both? We kind of have to, since non-self-referential iterators already exist.
- What should the default be? Are we okay if `gen` and `async` are different? I'd make the default self-referential. I'm okay if `gen` and `async` are different. I think to language nerds they look the same, but I'm not sure that's true for most users.
- What is the point of `gen`? I think it's more its own thing, but obviously closely related to iterators.

While my first choice would be allowing self-borrows across yields, I think moving forward with `gen iter {}` is a reasonable compromise. I also think self references + lending are closely related, it might be worth trying to do them together, except that we don't quite have a viable path to lending yet.

Yosh (also not on T-Lang, sharing thoughts):

- Should we support borrows across yields? Eventually yes, but it's okay not to straight away. There are other features like lending and return types that we're not supporting straight away either. This would also allow us to ship `gen {}` blocks sooner.
- Should we support both non-borrowing generators (that implement `Iterator` as it exists today) and borrowing generators? Yes!
- If we support both, what should the default be? Are we okay with the default between `gen` and `async` being different? I think so - `Iterator + Send` is not the default either, we should make *all* variants like these easier to manage.
- What is the purpose of gen (see Yosh's question)? Make it easier to write `impl Iterator` implementations? Be its own, new thing with its own rules? I believe we should start with what we have today, and chart a path to incrementally supporting more features.
---

tmandry: In terms of hitting a wall, do we worry about people hitting the lending wall as much as the self-referential wall?

NM: That feels like a separate wall that we already have. What I see here is "lift `Iterator` up a bit, but not too far."

NM: The way I handle lending scenarios is with callbacks. I think a lot of things in Rust are designed to work that way. I think that external iteration is somewhat in tension with that. Not sure what to do there.

tmandry: Is that skepticism that we'll ever support lending generally?

NM: Not really. I do think we will want to but it feels like a thing we already struggle with, e.g., if you want to have internal references that live past a stack frame, you can't use `&[u8]` and you switch over to the `bytes` crate. I feel like this "fits" there in that it requires the same sorts of workarounds. Not amazing, but not end of the world for now.

Josh: I think that pinning isn't strictly needed in a lot of real-world examples because the pointers in question are actually referring into the heap, not the stack. E.g., this example from elsewhere:

```rust
gen fn chars(s: String) -> char {
    for c in s.chars() {
        //~^ ERROR: borrow may still be in use when `gen` fn body yields
        yield c;
    }
}
```

Josh: I think saying "you need `pin` if you want to avoid allocation" is much better than "you need pin for any internal borrow".

TC: We'd need to express this in the type system somehow. It represents a semver guarantee on the behavior of `chars`, as the returned iterator could change its representation to hold a reference to the length of the string, which is owned by the generator. It's also fragile because if the input becomes `SmallVec` or similar, this stops working.

NM (typing, not out loud): I've been thinking about this. I think it starts to work better if we move to place-based borrows as I've been exploring lately with Dada. You basically want a way to say "I borrow from a stable address referenced by self".
TM: Josh's point is part of what makes me feel we are contributing to technical debt by enshrining pin as part of the default generator requirement, when we could solve most of the same use cases without requiring it at all.

Yosh: Yes, it feels like that to me too.

NM: I'm curious what people think about *this* example

```rust
async gen fn logging_chars(s: String) -> char {
    for c in s.chars() {
        log_char(c).await;
        //~^ ERROR: borrow may still be in use when `gen` fn body yields
        yield c;
    }
}
```

In other words, if `async gen` also prohibited self-borrows, so it returns a future (which must be pinned) but otherwise doesn't permit borrows. I certainly feel better about `gen` and `async gen` both being limited than just one.

JT: If we can do non-self-referential now, and get surprisingly far with just that, without pinning, then I love the idea of deferring self-referential long enough to see if we can get it to work without pin.

NM: I think my take is that we just don't get very far with `gen` unless it handles borrows. It's not that much better. But I would like to pivot to the "path forward". Clearly we don't yet see eye to eye, how do we best make the case? How can we *explore* what it looks like? One thing I am interested in is a kind of canonical set of examples and the chance to play with things to see how the pin feels in practice.

---

tmandry: To answer your other question, Niko, I feel like maybe async things are longer-lived than gen things, and maybe that's the difference.

TC: It's these limitations of `Iterator` that force it to be short-lived, because you often end up holding a reference to data you can't internalize. This, I think, drives our perception of what is common. When I think about how I've used generators in other languages where they can usefully own data, I've often used them in long-lived ways.

tmandry: I think what you're saying is plausible, TC.

NM: I've been wondering "what would it take to convince me otherwise".
I think one of my "axioms" is that we'll be able to make pin feel decently nice. I'd like to explore that. I am also curious to see examples to see if I can be convinced that `async gen` is fundamentally more likely to encounter cases that require self-borrows. I would perhaps be more persuaded if we resolved the `gen` case to support anything heap-allocated.

Yosh: everyone agrees self-referential is useful eventually, right?

....yes....

TM: Even if we can get pinning nice, I still see the cost of updating bounds being significant and don't want to do that twice. For me, the biggest doubts arise around how analogous sync and async should be.

Jubilee (joining late): What is the exact "scope" of those costs? Nobody has mapped out "this is what we are going to pay".

NM: There isn't total incompatibility under this design. I would definitely want to measure out the impacted crates and get a sense of what in the ecosystem ultimately happens, but it's not like we have to build edition migrations and so forth.

TC: If we go with non-self-referential, we should, I think, start with a syntax like `gen iter` to save the `gen` syntax space for the "real" one. That's not cost-free. People will want to use this feature, and will rewrite things in terms of it, but will hit the boundaries and have to learn about the nature of this limitation and write weird things as we've seen in our examples. When we then add the full thing, people may reasonably want to rewrite again in terms of that. So there's still churn.

TC: In terms of the migration, I'm not that worried. The interop between `Iterator` and `Generator` can be handled reasonably well with blanket impls. It'll be a new language feature, and the ecosystem will migrate over time to support it, and that's fine. What would probably change my mind is if we found the migration would be more disruptive than we currently believe.
eholk: With the cost, we are being vague, I've got some other documents that I think do enumerate the cost on particular paths more fully. One of the challenges though is that it's not realistic to say we can enumerate exactly what it will cost ahead of time. The best we can do is an estimate. There is a decision tree and each path has different costs. If the outcome of this meeting is that we decide we won't ship self-referential generators at the start, then the path forward is that we write a stabilization report for gen blocks, quibble over the syntax, and then start looking at when to add self-referential ones, but if we've already stabilized gen blocks, we have to work with what is stabilized already. If we wait until we have support for self-referential generators, a lot of it is an ecosystem-wide thing, we have to relax a lot of bounds, the generator trait is "morally" a supertrait of iterator (not *exactly*) but there are bridging impls and conversions that basically work. What I'd like is to be able to make progress on the first nodes in the decision tree while having an intuition about the long-term costs.

tmandry: I've been saying I'm worried about the migration but really the thing I'm worried about is the complex space that we end up presenting to the user. If there's a knob for iterator vs generator, a further knob for lending and not lending, the differences between them and the specific use cases that they support are subtle. It's taken us quite a long time to sort through them in writing this doc. If we end up in a situation where every user has to decide whether the thing they return is going to be self-borrowing or not, lending or not, etc, when they just want to write an iterator, that's a big cost to Rust's perceived complexity. I do see value in the orthogonal capabilities, I get that, but I feel like if we are going to do this big change from one trait to another, it should be to the final state that lets you do all the things.
A 2x2 matrix just for iteration, not even layering in all the other effects.

JT: To some extent we already have this problem. I'm not sure we have an opportunity to make it wildly better.

tmandry: I agree, I just don't want there to be 4 traits for iterators before you even get to fallibility.

NM: I agree with what Tyler had to say but I think I came to it the other way. I feel like the added value of a syntax for today's iterators is not that great. I want you to be able to just write `gen fn` and not worry about it. On the other hand, the argument I find most convincing is that "hey you're not getting it with just self-referential borrows, you need lending too".

(The meeting ended here.)

---

## Thought on "generator" vs "iterator"

> It is interesting that in all the years we've had the `Iterator` trait and its associated combinators, we have not had the same kind of pervasive struggles with non-`'static` iterators. It seems rather rare to need to send iterators between threads, especially relative to how often that use case occurs with futures.

Josh: While this isn't necessarily enough to outweigh other concerns, I *do* think that there are use cases for generators that aren't what we'd ordinarily think of as iteration. For instance, in some cases it may make sense to think of the read end of a queue as a generator. Python generators are extremely general, and get used for some very *creative* applications.

TC: +1.

eholk: And people (like TC) are already finding creative ways to use `async` as a more general coroutine. I'm guessing with generators, that will just increase.

nikomatsakis: I agree the problems are not as pervasive with iterators, I think that's because they don't encode the "entirety" of what fn does in the same way; I do think the limitations crop up, particularly if you write code with `std::iter::from_fn`, which I do fairly regularly. The structure of having to manually "unwind" the iterator does sort of hide some of this.
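To make the "read end of a queue as a generator" idea concrete, here is a small sketch on stable Rust that uses `std::iter::from_fn` in place of a `gen` block. The function name `queue_items` is hypothetical, and a standard-library `mpsc` channel stands in for whatever queue is actually in use.

```rust
use std::sync::mpsc::Receiver;

/// Treats the read end of a channel as a stream of items.
/// The iterator ends once every sender has been dropped and the
/// queue has been drained.
fn queue_items<T>(rx: Receiver<T>) -> impl Iterator<Item = T> {
    // `recv` blocks until an item arrives; it returns `Err` only when
    // the channel is closed and empty, which maps to `None` here.
    std::iter::from_fn(move || rx.recv().ok())
}
```

The closure owns `rx` and holds no borrow between calls, so this case needs neither self-reference nor lending.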
## Observation: the "iterating non-destructively over an owned value" case often arises for me with ref-counted data

nikomatsakis: I have frequently hit the case of "iterating non-destructively over an owned value" in the following scenario. I have a widely shared `Vec`, say, that lives in an `Arc<Vec<T>>`. I will write something like

```rust
use std::sync::Arc;

fn process_data(bytes: Arc<Vec<u8>>) -> impl Iterator<Item = u8> {
    // Borrows from the `Arc` owned by this function, so the returned
    // closure cannot outlive the borrow.
    let mut bytes = bytes.iter();
    std::iter::from_fn(|| {
        loop {
            if let Some(&b) = bytes.next() {
                if b != 0 {
                    return Some(b);
                }
                continue;
            }
            return None;
        }
    })
}
```

and then I will get errors and I will rewrite it to

```rust
use std::sync::Arc;

fn process_data(bytes: Arc<Vec<u8>>) -> impl Iterator<Item = u8> {
    let mut i = 0;
    // `move` gives the closure ownership of `bytes` and `i`.
    std::iter::from_fn(move || {
        loop {
            if let Some(&b) = bytes.get(i) {
                // Advance before returning so the same byte is not yielded twice.
                i += 1;
                if b != 0 {
                    return Some(b);
                }
                continue;
            }
            return None;
        }
    })
}
```

and silently curse the world that we have brought into being.

tmandry: The first example talks about using RefCell or Mutex. It should probably talk about "shared ownership" more generally. I see these as part of the same general case.

nikomatsakis: Perhaps. I wanted to mention it mostly because I thought it might strike a chord of recognition from folks.

## What is the purpose of `gen`?

yosh: A bit of a philosophical one, but is the purpose of `gen {}` to:

1. Make it easier to write `impl Iterator` implementations?
2. Be its own, new thing with its own rules?

This answer may change over time too; in fact I expect it to. But it may help if we articulate what we believe the starting point should be - as well as what we see as our eventual goal. That way we can also articulate what we should prioritize now vs. what we want to get around to later.

Yosh: From my perspective the `Iterator` trait has between 5-12 extensions we may want to make. In this meeting we are discussing two (lending, address-sensitivity). We're not going to make all of these extensions at once.
Acknowledging we're not going to do it all in one go anyway, can we start with `Iterator` as our foundation and incrementally work our way up to adding more capabilities from there?

## Question: I'm interested in gathering data about how often iterators "escape" a stack

nikomatsakis: Pinning works well if we can identify a particular stack frame which fully contains the iteration. I believe that covers the vast majority of iterators in practice. Very occasionally however I will have a struct that owns an iterator, most often when writing a parser. In that case you would either have to have (1) a `G` where `G: Generator + Unpin`; (2) a `Pin<Box<dyn Generator>>`; or (3) a `G` where the wrapping struct uses `Pin<&mut Self>`.

I would be interested in measuring the frequency with which iterators are created and how much they get moved and transformed before being used. I do think that this fact is also why we've gotten away with iterators that take in borrowed references.

## Self-referential generators being transparent

Josh:

> We believe that, in the fullness of time, we could make the needed changes mostly transparent to the user. That includes the pinning of the generator itself.

I find myself skeptical of this, and I think it's load-bearing. In general, it does not seem self-evident to me that something requiring pinning could be made transparent to the user, without them having to care whether something needs pinning or not. I'd want to at least see a proposal for what that could look like, before treating it as something that seems likely.

If that is *not* transparent, then it seems likely that type signatures would have to distinguish "self-referential generator" from "non-self-referential generator". And that seems OK to me. Which would then mean it seems OK to start out having non-self-referential generators, and later add self-referential generators, and distinguish the two in type signatures and in the ways we handle them.
Relatedly:

> Assuming for the moment that this is possible, the primary drawback of this mitigation is that it creates two classes of generators: Those which are `Unpin` (or `Move`) and those which are not.

Why is this a drawback? It seems like there *are* two classes of generators.

eholk: Transparency is a bit of a spectrum here (maybe translucency is a better word?). We have a prototype where if you stick to for loops and combinators you basically don't see pin. The places where pinning comes up in the prototype are if you want to pass a generator to a context that's expecting an iterator, or if you want to manually advance a generator.

Josh: That's exactly my concern: I'd like anything accepting an `impl Iterator` to be able to accept an inline `gen` block passed in the argument list. And I do also want to be able to call `.next()` on one, transparently. (Or `.next().await`, or `.next().await?`, ...)

## Implicit or explicit lending

cramertj: As the examples point out, not all generators that return references are lending. However, the author of the generator is the one who usually knows whether they intend to make the generator lending or not, and thus that is the better place to create error messages. For example:

```rust
let some_vec = v;
let some_vec_ref = &some_vec;
let my_gen = gen {
    for elem_ref in some_vec_ref {
        yield elem_ref;
    }
};
// not lending -- the lifetimes of all of the references unify and don't depend on generator state
for elem_ref in my_gen { ... }
```

```rust
let some_vec = v;
let my_gen = gen {
    for elem_ref in &some_vec {
        yield elem_ref;
    }
};
// lending -- the lifetimes of all the references are the same, but they depend on
// the generator state because `&some_vec` is created inside the generator.
for elem_ref in my_gen { ... }
// ERROR: ... not an iterator because it's lending ...
```

The user probably wanted to receive the above error at the creation of the generator (ideally pointing to the `yield` and explaining that the lifetime of `elem_ref` is only as long as `&some_vec`). It would be nice to provide some explicit syntax so that they can request this.

The combinatorial explosion of lending/not lending and self-referential/non-self-referential is confusing, and IMO it would be helpful to either remove this distinction by defaulting to the maximally-permissive version (self-referential lending) or to make this choice explicit.

## Syntax for pinned generators

> Alternatively, in the other direction, we could make `gen {}` the unpinned variant and use `gen pinned {}`, `gen static {}`, or similar to opt-in to pinning.

Josh: Yes please. (Though I'd prefer something more semantic like `selfref`.) This doesn't seem like too much to ask, especially if the user gets a nice error message telling them to add it.

## Does supporting lending iterators actually require changes to the Iterator trait?

cramertj: AFAICT this only requires changes to the Iterator trait if we don't have a way to spell:

```rust
impl<G: Generator, Item> Iterator for Pin<&mut G>
where
    for<'a> <G as Generator>::Item<'a> = Item
{
    type Item = Item;
    ...
}
// same impl for `G` rather than `Pin<&mut G>` if `G: Unpin`
```

cramertj: We want to be able to write this kind of "GATs unify" requirement for many reasons. If our ideal version of generators uses this feature, IMO we should "just do it" (insert caveats that this is hard).

oli: I'm unsure; this may also be hard/only sound with next solver.

oli: We can avoid migrations by designing the `Generator` trait for lending, but just not supporting it yet from `gen` blocks.

## Can we make more things work without self-reference?

Josh: Copying an example that TC wrote above:

```rust
gen fn chars(s: String) -> char {
    for c in s.chars() {
    //~^ ERROR: borrow may still be in use when `gen` fn body yields
        yield c;
    }
}
```

The generator owns `s`.
`s.chars()` borrows `s`. However, in *theory*, the borrow is entirely self-contained within the generator, and the actual yielded values are `char` which is `Copy`.

I understand how, if we wanted to represent the type of the generator's state entirely within Rust, it would involve a borrow. If, hypothetically, generators were something entirely magic and internal to the compiler, is there any way we could support this *without* a self-referential generator? Or, perhaps more importantly, without requiring pinning?

(This may be a long discussion involving more language design rabbit holes than we want to go into today.)

nikomatsakis: It is possible. We would need to add some better notion of deref such that we could say -- we hold a reference to heap memory that is rooted in `x`, but that reference is not invalidated when it is moved. I think we will need to tackle this problem sooner or later. I've been pondering it a bit as I work through what safe internal references look like.

Josh: I'm extremely interested in understanding what that would look like and how we could make it work. Bearing in mind that we don't have to make it a user-visible thing in order to use it in the internal state of a generator. How feasible is this, in very coarse terms? Quarter, year, edition, decade?

Asking in particular because that seems like a common way to solve this problem: give ownership to the generator. Much as people often solve async issues with "stick it in an `Arc`". If people *can* do that, this would be much less painful.

TC: We'd need to express this in the type system somehow. It represents a semver guarantee on the behavior of `chars`, as the returned iterator could change its representation to hold a reference to the length of the string, which is owned by the generator. It's also still fragile because if the input becomes `SmallVec` or similar, this stops working.
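For comparison, here is a sketch of how the `chars` example can be written on stable Rust today without any self-reference, by storing a byte offset instead of a borrowing `Chars` iterator. It uses `std::iter::from_fn` rather than a `gen` block, and the offset-tracking approach is just one possible workaround, not something proposed in the meeting.

```rust
/// Yields the characters of an owned `String` without holding a borrow
/// of `s` between calls. The closure owns `s` and a byte offset.
fn chars(s: String) -> impl Iterator<Item = char> {
    let mut pos = 0; // byte offset of the next character; always on a char boundary
    std::iter::from_fn(move || {
        // Re-borrow `s` on each call; the borrow ends before the closure returns.
        let c = s[pos..].chars().next()?;
        pos += c.len_utf8();
        Some(c)
    })
}
```

The cost is that the iteration state is maintained by hand, which is the kind of manual rewriting the `gen` syntax is meant to avoid.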