- Feature Name: emissions_control
- Start Date:
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
# Summary
[summary]: #summary
Add `*_nonexhausting()` variants for every `drain()`. The `DrainNonexhausting` structs they return do not eagerly consume residual items on drop.
Add an adapter `.exhausting()` to `Iterator` that gives arbitrary iterators the same behaviour of running to their end on drop.
Until it is dropped, the adapter acts exactly like the source iterator.
# Motivation
[motivation]: #motivation
The `drain` API is a specialized operation that combines two unrelated tasks:
1. Moving elements out of a collection without consuming it
2. Clearing the range or the entire collection, regardless of iteration
You could call it `drain_clearing`. In cases where one does not know in advance
how many elements to remove, there is no efficient way of lazily moving
a subset of elements out while keeping the collection. The forced consumption isn't necessary for a safe drain, doesn't give you more control, and isn't necessarily faster.
For collections allowing range arguments in `drain`, a `drain_nonexhausting(..).nth(end - 1)` has no additional cost over a `.drain(..end)`.
However, the latter needs to know `end` ahead of time.
Other collections, like `BinaryHeap`, need to rebuild their structure afterwards, but this can still be cheaper than building a new
collection from scratch.
The `drain_filter` methods recognize the need for a more selective removal with on-the-fly decisions.
However, `DrainFilter`, too, will eagerly exhaust itself on drop with no way of stopping.
Excess elements can be kept by hacking some state awareness into the conditional closure and always returning `false` after some point,
but this is both unnecessary computation and tedious for the programmer.
More generally speaking, it's highly uncharacteristic for an iterator to behave (semi-)eagerly by default.
`drain()` is stable and therefore cannot be changed, but we should have a conforming iterator.
```rust
// take only what's needed
for element in dont_waste_me.drain_nonexhausting(..) {
    /* do stuff */
    if condition {
        break;
    }
}

let cherrypicked: Vec<_> = vec.drain_filter_nonexhausting(condition)
    .take(10)
    .collect();
```
## Exhausting Iterators
On the flip side, the current methods showcase the use of self-exhausting iterators. The principle can apply to any side-effecting iterator
where all side effects are needed but not (all) the elements it returns.
An `exhausting` adapter allows adding self-exhaustion on drop to arbitrary iterators.
The iter can then be passed to a function or returned from one. It also has better compatibility with method chaining.
Note that returning a self-exhausting iterator from a function should mostly be limited to callback situations.
Adding `exhausting` to the std library should make it easy for users to gain this behaviour where necessary
and avert more non-lazy iterator APIs in the std library and outside of it. Hardcoding self-exhaustion is mixing concerns and needlessly limiting.
Iteration through `.by_ref()` can achieve the same result, but only in some cases, namely when one is holding the iter.
```rust
// manual exhaustion with by-ref
let mut iter = iter.some()
    .adapter()
    .chain();
let val = iter.by_ref() // chain breaking indirection
    .map(func)
    .find(condition);
iter.for_each(|_| {}); // explicitly consume iterator
                       // must have access to iter

// chain with proposed method
let val = iter.some() // all of this
    .adapter()        // will run
    .chain()          // for all elements in iter
    .exhausting()
    .map(func)        // runs only until
    .find(condition); // an element is found or the iterator is exhausted

// pass side effecting iter away
iter_of_iters.flat_map(|iter| {
    iter.map(side_effects)
        .exhausting() // finish what you've started
        .take_while(condition)
});

// return self-exhausting iter from function
fn drain(&mut self) -> Drain {
    // wrapper can forward all iterator methods to internal iter
    Drain(
        self.drain_nonexhausting()
            .exhausting()
    )
}
```
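The manual `by_ref` pattern from the first snippet can be tried on stable Rust today. The following is a minimal sketch; the function name `find_then_exhaust` and the `inspect` call standing in for a side effect are illustrative assumptions, not part of the proposal.

```rust
/// Hypothetical helper: finds the first even number, but still lets the
/// side effect (recording every visited element) run for the whole input.
pub fn find_then_exhaust(input: &[i32]) -> (Option<i32>, Vec<i32>) {
    let mut visited = Vec::new();
    // `inspect` stands in for an arbitrary side-effecting adapter.
    let mut iter = input.iter().copied().inspect(|x| visited.push(*x));
    // `by_ref` lends the iterator out, so `find` stops early without consuming it.
    let found = iter.by_ref().find(|x| x % 2 == 0);
    // Explicitly run the remaining elements for their side effects.
    iter.for_each(drop);
    (found, visited)
}
```

This only works because the caller still holds `iter` after `find` returns, which is the limitation the `exhausting` adapter is meant to lift.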
# Implementation
## Non-Exhausting Drain
The non-exhausting drain adapters can be built from the regular `Drain` structs with small changes. The `drain`s that accept range arguments require minimal adaptation. They can use the range end only as an iteration limit and, for the collection repair, use the current position of the iterator instead of the range's end.
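To illustrate the intended semantics for a range-based drain, here is a safe but simplified sketch for `Vec`. It is not how `std` would implement it: instead of repairing the buffer in place, it splits the vector and re-appends whatever was not consumed. The free function `drain_nonexhausting` and the field names are assumptions made for this example.

```rust
use std::ops::Range;
use std::vec::IntoIter;

/// Illustrative stand-in for the proposed struct (safe, not in-place).
pub struct DrainNonexhausting<'a, T> {
    head: &'a mut Vec<T>, // elements before the range, left in the collection
    rest: IntoIter<T>,    // elements from the range start onwards
    remaining: usize,     // iteration limit taken from the range end
}

/// Hypothetical free function emulating `vec.drain_nonexhausting(range)`.
pub fn drain_nonexhausting<T>(vec: &mut Vec<T>, range: Range<usize>) -> DrainNonexhausting<'_, T> {
    assert!(range.start <= range.end && range.end <= vec.len());
    let rest = vec.split_off(range.start).into_iter();
    DrainNonexhausting { head: vec, rest, remaining: range.end - range.start }
}

impl<T> Iterator for DrainNonexhausting<'_, T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        if self.remaining == 0 {
            return None; // the range end only limits iteration
        }
        self.remaining -= 1;
        self.rest.next()
    }
}

impl<T> Drop for DrainNonexhausting<'_, T> {
    fn drop(&mut self) {
        // Repair from the current position: everything not yet yielded,
        // inside or after the range, goes back into the collection.
        self.head.extend(self.rest.by_ref());
    }
}
```

Dropping the iterator after two `next()` calls on the range `1..5` of a six-element vector removes only those two elements; the rest of the range stays in place.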
The `{Hash | B}{Map | Set}` collections and `BinaryHeap` need to rebuild in place. This RFC doesn't lay out a plan for how to do that. Many of these are still lacking `drain` and/or `drain_filter`, but there is a [desire to add them](https://github.com/rust-lang/rfcs/issues/2140). As mentioned above, `drain_filter_nonexhausting` can be emulated with `drain_filter(condition)` by returning `false` for everything from `condition` after some point. `drain` can be emulated with `drain_filter`. Therefore, any collection for which `drain_filter` can exist can also have a `drain_filter_nonexhausting` and, by extension, a `drain_nonexhausting`.
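The emulation trick of making the condition return `false` after some point can be sketched with a small closure wrapper. Because `drain_filter` is not assumed to be available here, the sketch uses the stable `Vec::retain` as a stand-in for an eager filter-removal; the names `limited` and `remove_up_to` are hypothetical.

```rust
/// Hypothetical wrapper: behaves like `condition` for the first `limit`
/// matches, then answers `false` for every further element.
pub fn limited<T>(
    mut condition: impl FnMut(&T) -> bool,
    limit: usize,
) -> impl FnMut(&T) -> bool {
    let mut hits = 0;
    move |item: &T| {
        if hits < limit && condition(item) {
            hits += 1;
            true
        } else {
            false
        }
    }
}

/// Removes at most `limit` even numbers and returns how many were removed.
/// `retain` keeps elements for which the closure returns `true`, hence the `!`.
pub fn remove_up_to(vec: &mut Vec<i32>, limit: usize) -> usize {
    let before = vec.len();
    let mut should_remove = limited(|x: &i32| x % 2 == 0, limit);
    vec.retain(|x| !should_remove(x));
    before - vec.len()
}
```

Note that the wrapped closure is still invoked for every element after the limit is reached, which is the unnecessary work a non-exhausting variant would avoid.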
## Exhausting
During iteration, the `Exhausting` adapter is a trivial wrapper that acts like `&mut Self`, meaning it implements all the Iterator traits that the contained iter implements and will always do external iteration. On drop, it runs `for _ in self {}`.
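A minimal sketch of such an adapter, written as an extension trait so that it compiles outside of `std`, could look as follows. The trait name `ExhaustingExt` and the `demo` function are assumptions for illustration; the sketch forwards only `next` and `size_hint`, not the other iterator traits, and has no panic or fusing guards.

```rust
/// Illustrative wrapper that runs the inner iterator to its end on drop.
pub struct Exhausting<I: Iterator> {
    iter: I,
}

impl<I: Iterator> Iterator for Exhausting<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.iter.next() // plain forwarding while the adapter is alive
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.iter.size_hint()
    }
}

impl<I: Iterator> Drop for Exhausting<I> {
    fn drop(&mut self) {
        // Equivalent of `for _ in self {}`: consume whatever is left.
        for _ in self.iter.by_ref() {}
    }
}

/// Hypothetical extension trait standing in for a method on `Iterator`.
pub trait ExhaustingExt: Iterator + Sized {
    fn exhausting(self) -> Exhausting<Self> {
        Exhausting { iter: self }
    }
}

impl<I: Iterator> ExhaustingExt for I {}

/// Usage: `find` stops at 2, but every element is still visited.
pub fn demo() -> (Option<i32>, Vec<i32>) {
    let mut visited = Vec::new();
    let found = vec![1, 2, 3, 4]
        .into_iter()
        .inspect(|x| visited.push(*x))
        .exhausting()
        .find(|x| *x == 2);
    (found, visited)
}
```

The temporary `Exhausting` value is dropped at the end of the `let found` statement, which is when the remaining elements are run.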
## Interactions
`drain_nonexhausting().exhausting()` is functionally equivalent to `drain()`, but the same is not true for `drain_filter`. If a panic occurs during `drop` of `DrainFilter` while it is self-exhausting, then all remaining elements will be leaked. An unwind during `drop` for `Exhausting(DrainFilterNonexhausting)` will still call the repair code in `DrainFilterNonexhausting`'s `Drop` and thus not leak.
Put another way, `drain_filter(condition)` guarantees that all elements for which `condition` holds are removed, at the cost of leaks and of losing elements that should have been kept. `drain_filter_nonexhausting(condition).exhausting()` guarantees that all elements for which `condition` doesn't hold are still there and that nothing leaks, at the cost of leaving behind elements that should have been removed. Of course, the state of the collection only matters when you're catching unwinds, at which point you should be expecting broken invariants.
# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation
`drain_nonexhausting` is like `drain` but does not remove items from the collection that were not consumed through the iterator.
The difference between `drain_filter_nonexhausting` and `drain_filter` is the same.
# Drawbacks
[drawbacks]: #drawbacks
* `.exhausting()` may be too niche a use case with `drain` already forcing the behaviour.
* The `Exhausting` adapter has a corner case on finished, non-fused iterators. On drop, it will attempt iteration again, which will result in implementation-dependent behaviour unless guarded against with a flag and a comparison on every `.next()`.
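The flag-based guard mentioned in the last point could be sketched like this. `GuardedExhausting`, `Flaky` and `calls_after_drop` are hypothetical names; the sketch assumes that "finished" means the source has returned `None` at least once, which is one possible reading of the corner case.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Illustrative adapter that skips exhaustion once the source reported `None`.
pub struct GuardedExhausting<I: Iterator> {
    iter: I,
    seen_none: bool,
}

impl<I: Iterator> GuardedExhausting<I> {
    pub fn new(iter: I) -> Self {
        GuardedExhausting { iter, seen_none: false }
    }
}

impl<I: Iterator> Iterator for GuardedExhausting<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let item = self.iter.next();
        if item.is_none() {
            self.seen_none = true; // the extra comparison on every call
        }
        item
    }
}

impl<I: Iterator> Drop for GuardedExhausting<I> {
    fn drop(&mut self) {
        if !self.seen_none {
            for _ in self.iter.by_ref() {}
        }
    }
}

/// Hypothetical non-fused source: `Some` on odd calls, `None` on even calls.
pub struct Flaky {
    calls: Rc<Cell<u32>>,
}

impl Iterator for Flaky {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let n = self.calls.get() + 1;
        self.calls.set(n);
        if n % 2 == 0 { None } else { Some(n) }
    }
}

/// Polls the adapter `polls` times, drops it, and reports how often
/// the underlying `next()` was called in total.
pub fn calls_after_drop(polls: usize) -> u32 {
    let calls = Rc::new(Cell::new(0));
    let mut it = GuardedExhausting::new(Flaky { calls: Rc::clone(&calls) });
    for _ in 0..polls {
        let _ = it.next();
    }
    drop(it);
    calls.get()
}
```

With the guard, dropping after the source has already returned `None` does not poll it again, while dropping earlier still runs it up to its first `None`.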
# Rationale and alternatives
[alternatives]: #alternatives
* Make `drain_filter` nonexhausting and don't add any `_nonexhausting` variants at all. This minimizes API surface, but the discrepancy between `drain` and `drain_filter` will probably be surprising to many.
* Leave either nonexhausting drains or the exhausting adapter out. They are proposed together because they are related, but there is no interdependency.
`exhausting` could be a part of itertools. `nonexhausting` deals with collection internals and needs to be in `std` (or `alloc`).
* The proposed names are chosen for their symmetry and familiarity. To exhaust an iterator is standard parlance; the -ing
hints that it doesn't do so immediately.
`nonexhausting` could be replaced by `lazy` or `lazy_drop`. The `lazy` part may be confusing because the iterator is already lazy apart from `Drop`. Bikeshedding is welcome
after the semantics are nailed down.
# Unresolved questions
[unresolved]: #unresolved-questions
* Self-exhausting iterators that can panic during `next()` can easily run into double panics. If `next()` panics during normal program execution,
then the drop of `Exhausting` will cause `next()` to be called again, which is not unlikely to produce another panic and abort the whole program.
The self-exhausting `drain_filter` is also subject to this.
We could guard against aborts by not self-exhausting when the panic occurred during iteration (communicated via a flag) or, alternatively, by not self-exhausting under any panic (with `std::thread::panicking`). This is a choice between possibly unnecessary leaks and a higher likelihood of accidentally tearing down a long-lived process.
An example of how this could look:
```rust
impl<T: Iterator> Iterator for Exhausting<T> {
    type Item = T::Item;

    fn next(&mut self) -> Option<T::Item> {
        self.currently_iterating = true; // no double iteration
        let next = self.iter.next();
        self.currently_iterating = false; // no double iteration
        next
    }
}

impl<T: Iterator> Drop for Exhausting<T> {
    fn drop(&mut self) {
        // if !std::thread::panicking() { // no panicking iteration at all
        if !self.currently_iterating { // no double iteration
            // iterate by reference: `self.iter` cannot be moved out of `&mut self`
            for _ in self.iter.by_ref() {}
        }
    }
}
```