Range Reform

tags: Libs RFCs RFC Draft

Summary

This RFC proposes adding new versions of the range types, and changing the range syntax desugaring to use those types in the 2021 edition. By implementing IntoIterator rather than Iterator, the interfaces of the new types avoid some of the awkward drawbacks of the existing types.

Motivation

The Rust language's range syntax (.., a..b, a.., ..b, a..=b, ..=b) desugars into types defined in the core::ops module (RangeFull, Range, RangeFrom, RangeTo, RangeInclusive, and RangeToInclusive respectively). Of those types, Range, RangeFrom, and RangeInclusive implement the Iterator trait. This implementation imposes a couple of constraints:

  • We do not allow iterator types to be Copy, as it can cause confusion when an implicit copy of an iterator is advanced rather than the original.
  • While most of the range types are simple POD types with public start and/or end fields, RangeInclusive is not, as it has to keep track of an extra bit of state to ensure its Iterator implementation behaves correctly.

However, these types have many uses other than as iterators. For example, slicing syntax is generic over range types to easily pull different bits out of the sequence (buf[..10], buf[..], etc). Similarly, the rand crate's Rng::gen_range method takes a range argument to support selection from both half-open and inclusive ranges. In other cases, a developer implementing a data structure may want to store a Range<usize> field to represent the start and end of some region of the structure, since it is more self-describing than just a pair of usize.

When working with ranges in these contexts, the Iterator implementation is superfluous, but the limitations it imposes on the types make them more awkward to work with and add unnecessary overhead. In the case of gen_range, RangeInclusive's bounds can only be accessed via methods that return references. In the case of a Range<usize> field, the field prevents the outer type from ever being Copy.
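
Both limitations can be reproduced with today's types. The following is a minimal sketch; the Region struct and the width helper are hypothetical names used only for illustration.

```rust
use std::ops::{Range, RangeInclusive};

// Hypothetical data structure that stores a region as a range.
// Adding `Copy` to the derive list would be rejected by the compiler,
// because `Range<usize>` is not `Copy`.
#[derive(Clone, Debug, PartialEq)]
struct Region {
    span: Range<usize>,
}

// `RangeInclusive` has no public fields; its bounds are only reachable
// through accessor methods that return references.
fn width(r: &RangeInclusive<u32>) -> u32 {
    *r.end() - *r.start() + 1
}

fn main() {
    let a = Region { span: 2..5 };
    // An explicit clone is needed to keep using `a` alongside a second value.
    let b = a.clone();
    assert_eq!(a, b);
    assert_eq!(b.span.len(), 3);
    assert_eq!(width(&(1..=10)), 10);
}
```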

Fortunately, it's not actually necessary for the types to implement Iterator in the first place! The most common way to work with iterators is through a for loop:

for i in 0..10 {
    println!("iteration {}", i);
}

However, for loop syntax does not require the expression on the right-hand side of the in keyword to implement Iterator itself. It only requires the IntoIterator trait, which allows a type to be converted into an iterator without itself being an iterator.
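
As an illustration of that distinction, here is a small sketch of a type that works in a for loop without being an iterator. Countdown and collect_twice are hypothetical names and not part of this proposal.

```rust
// Hypothetical type that is not an iterator itself, and can therefore be `Copy`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Countdown {
    from: u32,
}

impl IntoIterator for Countdown {
    type Item = u32;
    type IntoIter = std::iter::Rev<std::ops::Range<u32>>;

    // The iterator is built on demand; `Countdown` carries no iteration state.
    fn into_iter(self) -> Self::IntoIter {
        (0..self.from).rev()
    }
}

fn collect_twice(c: Countdown) -> (Vec<u32>, Vec<u32>) {
    let mut first = Vec::new();
    // `for` calls `IntoIterator::into_iter` on the expression after `in`.
    for i in c {
        first.push(i);
    }
    // `c` is `Copy`, so it is still usable after the loop.
    let second: Vec<u32> = c.into_iter().collect();
    (first, second)
}

fn main() {
    let (first, second) = collect_twice(Countdown { from: 3 });
    assert_eq!(first, vec![2, 1, 0]);
    assert_eq!(second, first);
}
```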

If range syntax were added today, these types would clearly implement the IntoIterator trait, but they were initially added before that existed and things were unfortunately not updated after IntoIterator's introduction. Luckily, we can use an edition boundary to fix this oversight.

Guide-level explanation

Explain the proposal as if it was already included in the language and you were teaching it to another Rust programmer. That generally means:

  • Introducing new named concepts.
  • Explaining the feature largely in terms of examples.
  • Explaining how Rust programmers should think about the feature, and how it should impact the way they use Rust. It should explain the impact as concretely as possible.
  • If applicable, provide sample error messages, deprecation warnings, or migration guidance.
  • If applicable, describe the differences between teaching this to existing Rust programmers and new Rust programmers.

For implementation-oriented RFCs (e.g. for compiler internals), this section should focus on how compiler contributors should think about the change, and give examples of its concrete impact. For policy RFCs, this section should provide an example-driven introduction to the policy, and explain its impact in concrete terms.

Reference-level explanation

For each of the iterable range types, a corresponding module will be added to core::ops. Taking Range as an example, the core::ops::range module will contain two types:

pub struct Range<T> {
    pub start: T,
    pub end: T,
}

pub struct IntoIter<T> {
    pub start: T,
    pub end: T,
}

The IntoIter type is just the original core::ops::Range type moved and renamed. It will continue to have exactly the same API and implement the same traits.

The new Range type will implement the same methods and traits as its original counterpart, with a few exceptions:

  • It will implement Copy for T: Copy.
  • It will not implement Iterator.
  • It will implement IntoIterator<Item = T, IntoIter = IntoIter<T>>.

Notably, both types will implement the traits required to be used in places like slice indexing.
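
The shape of the new type can be approximated in user code today. The sketch below is not the proposed implementation: new_range is a hypothetical stand-in for the core::ops::range module, today's std::ops::Range plays the role of IntoIter, and the sum helper exists only to exercise the type.

```rust
// Sketch only: `new_range` stands in for the proposed `core::ops::range`
// module, and the existing `std::ops::Range` plays the role of `IntoIter`.
mod new_range {
    pub type IntoIter<T> = std::ops::Range<T>;

    // Plain data with public fields; `Copy` whenever `T: Copy`.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct Range<T> {
        pub start: T,
        pub end: T,
    }

    // Available whenever the legacy range is itself an iterator over `T`.
    impl<T> IntoIterator for Range<T>
    where
        IntoIter<T>: Iterator<Item = T>,
    {
        type Item = T;
        type IntoIter = IntoIter<T>;

        fn into_iter(self) -> Self::IntoIter {
            self.start..self.end
        }
    }
}

fn sum(r: new_range::Range<u32>) -> u32 {
    let mut total = 0;
    for i in r {
        total += i;
    }
    total
}

fn main() {
    let r = new_range::Range { start: 0u32, end: 4 };
    let copy = r; // implicit copy; `r` stays usable
    assert_eq!(sum(r), 6);
    assert_eq!(copy.into_iter().collect::<Vec<_>>(), vec![0, 1, 2, 3]);
}
```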

Back in the core::ops module, an edition specific type alias will be defined for the name Range. When building a crate for the 2021 edition, the alias will map to core::ops::range::Range, and when building a crate for an earlier edition, the alias will map to core::ops::range::IntoIter. A prototype implementation of the compiler side of this exists in rust-lang/rust#82489:

#[rustc_per_edition]
pub type Range<T> = (
    range::IntoIter<T>, // in the 2015 edition
    range::IntoIter<T>, // in the 2018 edition
    range::Range<T>, // in the 2021 edition
);

The same pattern is used for RangeFrom and RangeInclusive.

The module structure and naming conventions here match the pattern established by the std::collections module, where e.g. HashMap is defined as std::collections::hash_map::HashMap and reexported at the std::collections level and its iterator types are named after their functionality and located in the std::collections::hash_map module.

The RangeFull, RangeTo, and RangeToInclusive types do not implement Iterator, and so can remain as they are.

Rustfix

Rustfix uses a two-pass approach to update crates to a new edition. In the first pass, the code is updated to compile properly in both the 2018 and 2021 editions. In this case, that will involve:

  • Any expression calling an Iterator method directly on a Range, RangeFrom, or RangeInclusive type will have an .into_iter() call inserted. Since all iterators already implement IntoIterator<IntoIter = Self>, this will compile on both editions.
  • Any explicit use of the core::ops::Range, core::ops::RangeFrom, or core::ops::RangeInclusive types will be changed to core::ops::range::IntoIter, core::ops::range_from::IntoIter, or core::ops::range_inclusive::IntoIter respectively. While the names in the core::ops module change based on the edition, the names in the respective submodule will be fixed across editions.
  • Any expression that passes a Range, RangeFrom, or RangeInclusive value to a method expecting either that exact type or a type implementing a certain trait will have an .into_iter() call inserted. As an exception, the compiler knows that the standard library APIs consuming range types (e.g. for slice indexing) work with both the old and new versions of the types, so those call sites can remain unchanged.

Examples

let x = (0..10).collect::<Vec<_>>();
// converts to
let x = (0..10).into_iter().collect::<Vec<_>>();

pub struct Foo {
    range: Range<usize>,
}
// converts to
pub struct Foo {
    range: core::ops::range::IntoIter<usize>,
}

let x = rand::thread_rng().gen_range(0..=10);
// converts to
let x = rand::thread_rng().gen_range((0..=10).into_iter());

let y = &slice[..5];
// is not changed
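
The claim that an inserted .into_iter() call is harmless on current editions can be checked directly. This is a small sketch; the total function is a hypothetical stand-in for a third-party API that expects an Iterator.

```rust
// Hypothetical third-party style API that demands an `Iterator` directly.
fn total(it: impl Iterator<Item = u32>) -> u32 {
    it.sum()
}

fn main() {
    // On an iterator, `into_iter()` is the identity conversion, so the
    // inserted call compiles today and does not change the result.
    let with_call: Vec<u32> = (0..4).into_iter().collect();
    let without_call: Vec<u32> = (0..4).collect();
    assert_eq!(with_call, without_call);
    assert_eq!(total((1..=3).into_iter()), 6);
}
```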

The second pass of rustfix changes code in ways that would stop it from compiling in the 2018 edition but that more closely match the intended idioms of the 2021 edition. In this case, a potential change could be to switch explicit uses of the range types back from IntoIters to the new definitions, but it may be difficult to ensure that the modified code will compile. In the worst case, this cleanup may be left to the developer.

Removing .into_iter() calls when passing a range type to a third party API will unfortunately be a manual action, since even if the compiler can check that the new range type implements the relevant trait bound, it cannot guarantee that the behavior of the implementations of those traits will be equivalent.

Examples

pub struct Foo {
    range: core::ops::range::IntoIter<usize>,
}
// maybe converts to (??)
pub struct Foo {
    range: Range<usize>,
}

Drawbacks

Having multiple versions of the range types that differ very slightly will almost certainly cause some amount of confusion down the line. No matter what name we choose for these new types, it will almost certainly be worse than the original names.

It may be difficult to make rustfix's update logic exactly precise, particularly in cases where a range expression is being passed to a method. However, hopefully we can make something "good enough" to catch the common cases, leaving a small number of errors that have to be manually fixed when migrating editions.

While crates are still updating to the 2021 edition, there will be a transitional period in which awkward conversions between the range types are required to pass them from 2021 edition crates to APIs defined in 2015/2018 edition crates.

Rationale and alternatives

To solve the Copy issue, @eddyb has proposed adding a #[must_clone] annotation. When applied to a Copy type, it would warn whenever the type is implicitly copied, telling users to call .clone() explicitly instead. This would then allow Range, RangeFrom, and RangeInclusive to implement Copy while avoiding the confusion around implicit copies during iteration.

However, this does not solve the other problems caused by the Iterator implementations. It also feels a bit unfortunate to say that a type is Copy, but that you shouldn't ever actually try to copy it! In the contexts where you aren't using the range type as an iterator at all, you wouldn't want to have to ever deal with manual .clone() calls.

Unresolved questions

N/A

Future possibilities

N/A
