RFC 3550 design document

Meeting 2024-03-20

🖥️ Rendered RFC

Premise

The Range, RangeInclusive, and RangeFrom types are generally considered to be flawed. RFC 3550 proposes resolving this by introducing three new types and, in Edition 2024, changing the range syntax to produce these new types.

However, this is an unprecedented kind of Edition change, and there are concerns that the implications for third-party libraries break the interoperability promise of the Edition process. Adding type inference for the range syntax could resolve those concerns.

RFC Motivation

The current iterable range types (Range, RangeFrom, RangeInclusive) implement Iterator directly. This is now widely considered to be a mistake, because it makes implementing Copy for those types hazardous due to how the two traits interact.
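The hazard can be reproduced with any type that is both Copy and an Iterator. The sketch below uses a hypothetical `CopyRange` type (not part of std or the RFC) to show a `for` loop silently advancing a copy while the original stays untouched.

```rust
// Hypothetical stand-in for a range type that is both `Copy` and an `Iterator`.
#[derive(Clone, Copy, Debug)]
struct CopyRange {
    cursor: u32,
    end: u32,
}

impl Iterator for CopyRange {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.cursor < self.end {
            let value = self.cursor;
            self.cursor += 1;
            Some(value)
        } else {
            None
        }
    }
}

/// Returns the sum computed by the loop and the item `next()` yields afterwards.
fn iterate_then_peek() -> (u32, Option<u32>) {
    let mut range = CopyRange { cursor: 0, end: 3 };
    let mut sum = 0;
    // `for` takes `range` by value; because the type is `Copy`, the loop
    // advances a silent copy and leaves `range` itself untouched.
    for value in range {
        sum += value;
    }
    // One might expect `None` here, but the original was never advanced.
    (sum, range.next())
}
```

Calling `iterate_then_peek()` yields the loop's sum together with `Some(0)`, which is the kind of surprise that keeps the legacy types from being Copy.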

However, there is considerable demand for Copy range types for multiple reasons:

  • ergonomic use without needing explicit .clone()s or rewriting the a..b syntax repeatedly
  • use in Copy types (currently people work around this by using a tuple instead)
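The tuple workaround mentioned above might look like the following sketch. `Span`, `as_range`, and `slice_twice` are illustrative names, not taken from any particular crate.

```rust
use std::ops::Range;

// A `Copy` struct cannot hold a `Range<usize>` field, because the legacy
// `Range` is not `Copy`. Storing the bounds as a tuple works around that.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    bounds: (usize, usize),
}

impl Span {
    fn new(range: Range<usize>) -> Self {
        Span { bounds: (range.start, range.end) }
    }

    // Rebuild a legacy range on demand, e.g. for slicing.
    fn as_range(&self) -> Range<usize> {
        self.bounds.0..self.bounds.1
    }
}

fn slice_twice(text: &str, span: Span) -> (&str, &str) {
    // `span` is `Copy`, so it can be duplicated without `.clone()`.
    let copy = span;
    (&text[span.as_range()], &text[copy.as_range()])
}
```

With a Copy range type, `Span` could store the range directly and the `as_range` round trip would disappear.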

Another primary motivation is the extra size of RangeInclusive. It uses an extra bool field to keep track of when the upper bound has been yielded by the iterator. This results in a size 50% larger than necessary to just store the bounds, but this extra size is useless when the type is not used as an iterator.
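The size difference can be checked directly. The sketch below compares the legacy `RangeInclusive<u64>` against a plain pair of bounds; on common 64-bit targets I would expect 24 bytes versus 16, though Rust does not guarantee the exact layout.

```rust
use std::mem::size_of;
use std::ops::RangeInclusive;

/// Returns (size of the legacy `RangeInclusive<u64>`, size of just its two bounds).
fn inclusive_sizes() -> (usize, usize) {
    (size_of::<RangeInclusive<u64>>(), size_of::<(u64, u64)>())
}
```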

Introducing new range types and new types for their iterators would allow us to correct the ExactSizeIterator impls. Currently, the ExactSizeIterator impls for Range and RangeInclusive are incorrect on platforms with usize of less than 64 bits.
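As a rough illustration of the problem, the sketch below asks two legacy ranges for their exact length. Both types implement ExactSizeIterator today, yet neither count would fit in a 16-bit usize, whose maximum is 65,535. The helper name `reported_lengths` is mine.

```rust
/// Lengths reported by two legacy `ExactSizeIterator` impls.
fn reported_lengths() -> (usize, usize) {
    // `Range<u32>` and `RangeInclusive<u16>` both implement `ExactSizeIterator`,
    // so `len()` has to return the exact item count as a `usize`.
    let exclusive = (0u32..u32::MAX).len();
    let inclusive = (0u16..=u16::MAX).len();
    (exclusive, inclusive)
}
```

On a 32-bit or 64-bit target both calls succeed; the point is only that the trait promises an exact `usize` count that a narrower `usize` cannot always represent.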

Finally, a new RangeInclusive iterator type could allow for performance optimizations that the current API does not (because it returns the bounds by reference).
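For reference, this is what the by-reference accessors of the legacy type look like in use. The helper `sum_of_bounds` is only an illustration.

```rust
use std::ops::RangeInclusive;

/// Sums the bounds of a legacy inclusive range.
fn sum_of_bounds(range: &RangeInclusive<u32>) -> u32 {
    // The fields are private; `start()` and `end()` hand out `&u32`,
    // so every read goes through a reference and needs a dereference.
    let start: &u32 = range.start();
    let end: &u32 = range.end();
    *start + *end
}
```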

RFC Proposal

RFC 3550 proposes to introduce three new range types:

// New range types in `std::ops::range` module
// (or maybe `std::range`)

pub struct Range<Idx> {
    pub start: Idx,
    pub end: Idx,
}
pub struct RangeInclusive<Idx> {
    pub start: Idx,
    pub end: Idx,
}
pub struct RangeFrom<Idx> {
    pub start: Idx,
}

Unlike the legacy range types, these new types will implement Copy and IntoIterator instead of Iterator.
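A minimal sketch of what "Copy and IntoIterator instead of Iterator" means in practice. `NewRange` is a hypothetical stand-in for the proposed type, and reusing the legacy range as its iterator is my simplification; the RFC introduces dedicated iterator types.

```rust
use std::ops;

// Hypothetical stand-in for the proposed range type; the name `NewRange`
// is chosen here to avoid clashing with `std::ops::Range`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NewRange<Idx> {
    pub start: Idx,
    pub end: Idx,
}

// Assumption: reuse the legacy range as the iterator to keep this short.
impl IntoIterator for NewRange<usize> {
    type Item = usize;
    type IntoIter = ops::Range<usize>;

    fn into_iter(self) -> Self::IntoIter {
        self.start..self.end
    }
}

/// Iterates the same range value twice, which `Copy` makes possible.
pub fn sum_twice(range: NewRange<usize>) -> (usize, usize) {
    let first: usize = range.into_iter().sum();
    let second: usize = range.into_iter().sum(); // `range` is still usable
    (first, second)
}
```

Because iteration always starts from a fresh iterator, copying the range value can no longer desynchronize iteration state.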

The range syntax would resolve to either these types or the legacy range types depending on edition:

| Syntax | Edition 2021 and prior | Edition 2024 and later |
|--------|------------------------|------------------------|
| a..b | std::ops::Range | std::ops::range::Range |
| a..=b | std::ops::RangeInclusive | std::ops::range::RangeInclusive |
| a.. | std::ops::RangeFrom | std::ops::range::RangeFrom |

Effects on Library Interoperability

There are cases where existing APIs specify a legacy range type explicitly or accept Iterators instead of IntoIterators:

pub fn takes_range(range: std::ops::Range<usize>) { ... }
pub fn takes_iter(range: impl Iterator<Item = usize>) { ... }
impl Index<std::ops::Range<usize>> for Bar { ... }

And current code can use range syntax directly in those APIs:

takes_range(5..11);
takes_iter(5..11);
bar[1..8];

Such code will result in errors in Edition 2024, where range syntax will resolve to new distinct types that do not implement Iterator. To gracefully migrate code between editions, cargo fix --edition will add explicit conversions where necessary:

takes_range((5..11).to_legacy());
takes_iter((5..11).into_iter());
bar[(1..8).to_legacy()];

Existing code can be easily migrated using cargo fix --edition, but these explicit conversions are a significant ergonomic downgrade when writing new code using legacy APIs. The explicit conversions also add visual noise that hurts the readability of the migrated code.
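To make the call-site cost concrete, here is a sketch with a hypothetical `NewRange` type standing in for what `5..11` would produce in Edition 2024. The body of `to_legacy` is my guess at an obvious implementation, not the RFC's code.

```rust
use std::ops;

// Hypothetical stand-in for the new range type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NewRange<Idx> {
    pub start: Idx,
    pub end: Idx,
}

impl<Idx> NewRange<Idx> {
    // Mirrors the `to_legacy()` conversion named above; the body is a guess.
    pub fn to_legacy(self) -> ops::Range<Idx> {
        self.start..self.end
    }
}

// A legacy API that names `std::ops::Range` explicitly.
pub fn takes_range(range: ops::Range<usize>) -> usize {
    range.end.saturating_sub(range.start)
}

pub fn call_with_new_range() -> usize {
    // Stands in for what `5..11` would produce in Edition 2024.
    let range = NewRange { start: 5, end: 11 };
    // Passing `range` directly would be a type mismatch; the explicit
    // conversion is what callers of un-updated APIs would have to write.
    takes_range(range.to_legacy())
}
```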

Mitigation via Library Updates

Libraries facing these problems are encouraged to issue updates which change their API to accept the new range types in a backwards-compatible way:

// Before
pub fn takes_range(range: std::ops::Range<usize>) { ... }
pub fn takes_iter(range: impl Iterator<Item = usize>) { ... }
impl Index<std::ops::Range<usize>> for Bar { ... }

// After
pub fn takes_range(range: impl Into<range::legacy::Range<usize>>) { ... }
pub fn takes_iter(range: impl IntoIterator<Item = usize>) { ... }
impl Index<range::legacy::Range<usize>> for Bar { ... }
impl Index<range::Range<usize>> for Bar { ... }

This allows users of the library to upgrade to Edition 2024 without cargo fix --edition adding explicit conversions, and new users of the library can use the exact same syntax regardless of which edition they are using.
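The library-side change can be sketched in compilable form as follows. `NewRange` is again a hypothetical stand-in for the new type, and the `From` impl is assumed to be the kind of conversion std would provide between the new and legacy types.

```rust
use std::ops;

// Hypothetical stand-in for the new range type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NewRange<Idx> {
    pub start: Idx,
    pub end: Idx,
}

// Assumed conversion from the new type to the legacy type.
impl<Idx> From<NewRange<Idx>> for ops::Range<Idx> {
    fn from(range: NewRange<Idx>) -> Self {
        range.start..range.end
    }
}

// Simplification: iterate via the legacy range.
impl IntoIterator for NewRange<usize> {
    type Item = usize;
    type IntoIter = ops::Range<usize>;

    fn into_iter(self) -> Self::IntoIter {
        self.start..self.end
    }
}

// Accepts the legacy range (via the reflexive `From`) and the new one.
pub fn takes_range(range: impl Into<ops::Range<usize>>) -> usize {
    let range: ops::Range<usize> = range.into();
    range.end.saturating_sub(range.start)
}

// Accepts anything iterable, including legacy ranges (which are iterators).
pub fn takes_iter(items: impl IntoIterator<Item = usize>) -> usize {
    items.into_iter().sum()
}
```

Both functions keep accepting legacy ranges, so the change is backwards compatible for existing callers while also admitting the new type without a conversion at the call site.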

The RFC recommends that the new types be stabilized as soon as possible so library authors have time to implement these changes.

Concerns

Although not the first to bring them up, Mara summed up the concerns well:

So we end up in a world where library authors have to manually make their public API compatible with both the old and new edition, which very much goes against the idea of editions being a package/crate local decision.

Basically, the edition change only works effortlessly if all libraries that take ranges in their API will update their API before the new edition. This is very unlike other edition changes.

Most importantly, I would like to avoid a world where library maintainers will have to deal with bug reports or feature requests like "please support 2024 ranges" or "please support legacy ranges". In the past, edition changes that resulted in extra work for some library maintainers were not received well, and I don't think we should repeat that mistake on even larger scale.

Alternative: Type Inference

The RFC recommends only a mitigation path where the new types are stabilized well before the new edition is released, allowing third-party libraries to publish updates supporting the new types in time for the new edition.

The best alternative I've seen is to make range syntax similar to integer literals, where the concrete range type is inferred based on context. This would prevent the most prominent ergonomic drawbacks:

  • fewer explicit conversions would be added by cargo fix --edition (possibly zero)
  • new code would not hit issues when interfacing with APIs expecting only the legacy range types, such as
    • takes_range(0..5) with fn takes_range(range: std::ops::Range<usize>)
    • custom_type[0..5] where custom_type: CustomType and CustomType: Index<std::ops::Range<usize>>
    • takes_iter(0..5) with fn takes_iter(iter: impl Iterator<Item = usize>)
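For comparison, this is how the existing inference for integer literals behaves: the same literal `5` takes a different concrete type depending on what the context expects, with a fallback when nothing constrains it. The helper names are illustrative.

```rust
use std::mem::size_of_val;

fn takes_u8(value: u8) -> u8 {
    value
}

/// Returns the byte sizes chosen for three uses of the same literal `5`.
fn literal_sizes() -> (usize, usize, usize) {
    let expected = takes_u8(5); // the parameter type forces `u8`
    let annotated: u64 = 5; // the annotation forces `u64`
    let fallback = 5; // no expectation: falls back to `i32`
    (
        size_of_val(&expected),
        size_of_val(&annotated),
        size_of_val(&fallback),
    )
}
```

Range type inference would follow the same pattern, with the edition choosing the fallback instead of a fixed `i32`-style default.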

Please note that I am not an expert on Rust's type system, so the following is a best-effort attempt at outlining how "range type inference" might work.

  1. Range syntax expressions start out with unknown type {range<_>}
  2. If a specific range type is expected where the value is used, then the type of the value resolves to that:
let lr: legacy::Range<_> = 0..5; // Uses legacy type
let nr: range::Range<_> = 0..5; // Uses new type

fn takes_new_range(r: range::RangeFrom<u8>) {}
takes_new_range(0..); // Uses new type
fn takes_legacy_range(r: legacy::RangeFrom<u8>) {}
takes_legacy_range(0..); // Uses legacy type
  3. If the value is passed as a trait-bound function argument, and only one (legacy xor new) of the range types fulfills that bound, then the type of the value resolves to that:
impl ops::Index<legacy::Range<usize>> for Thing { ... }
fn use_thing(t: Thing) {
    t[3..7]; // Uses legacy range
}

impl ops::Index<range::Range<usize>> for Stuff { ... }
fn use_stuff(s: Stuff) {
    s[3..7]; // Uses new range
}

We could choose to also apply this to method resolution, but that would probably not be worth it.

  4. If neither (2) nor (3) applies, then the type resolves to:
  • Edition 2024 and later: the new range type
  • Edition 2021 and earlier: the legacy range type
let no_expectation = 13..67; // Uses new range
ops::RangeBounds::contains(&no_expectation, &5); // (Both types impl `RangeBounds`)

impl ops::Index<legacy::Range<usize>> for Doohicky { ... }
impl ops::Index<range::Range<usize>> for Doohicky { ... }
fn use_doohicky(d: Doohicky) {
    d[3..7]; // Uses new range
}
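The four steps above can be modelled as a small decision function. This is only a toy model of the rules as written here, not how the compiler would implement them; `RangeKind`, `Expectation`, and `resolve` are names invented for the sketch.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RangeKind {
    Legacy,
    New,
}

/// What the surrounding code expects of a range expression (step 1: unknown).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Expectation {
    /// Step 2: a specific range type is expected at the use site.
    Concrete(RangeKind),
    /// Step 3: a trait bound, with flags for which range types satisfy it.
    Bound { legacy_ok: bool, new_ok: bool },
    /// Nothing at the use site constrains the type.
    Unconstrained,
}

pub fn resolve(expectation: Expectation, edition: u16) -> RangeKind {
    // Step 4: the edition decides whenever the context does not.
    let edition_default = if edition >= 2024 {
        RangeKind::New
    } else {
        RangeKind::Legacy
    };
    match expectation {
        Expectation::Concrete(kind) => kind,
        Expectation::Bound { legacy_ok: true, new_ok: false } => RangeKind::Legacy,
        Expectation::Bound { legacy_ok: false, new_ok: true } => RangeKind::New,
        // Both types satisfy the bound, or neither does (a real compiler
        // would report an error for the latter): use the edition default.
        Expectation::Bound { .. } | Expectation::Unconstrained => edition_default,
    }
}
```

Under this model the `Doohicky` case corresponds to `Bound { legacy_ok: true, new_ok: true }`, which resolves to the new type in Edition 2024 and to the legacy type in earlier editions.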

Based on a proposal from Niko, documented here.

Data Summary

I included only the regressed and test-pass categories in my analysis, excluding the 8,251 error, 170 spurious-fixed, 246 spurious-regressed, 2 fixed, and 3 unknown results.

Crate Sources

| Total | Crates.io | GitHub |
|-------|-----------|--------|
| 229,793 | 92,273 | 137,520 |

Range Syntax usage overall

| a..b (Range) | a..=b (RangeInclusive) | a.. (RangeFrom) |
|--------------|------------------------|-----------------|
| 1,312,526 | 159,341 | 284,161 |

a..b is used 8.2x more often than a..=b and 4.6x more often than a..

Some form of range syntax was used in 43% of the sample crates (98,483).
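The ratios and the percentage quoted above follow directly from the counts in the two tables. The sketch below just redoes that arithmetic; `round1` and `usage_ratios` are helper names of my own.

```rust
/// Rounds to one decimal place.
fn round1(value: f64) -> f64 {
    (value * 10.0).round() / 10.0
}

/// Returns (a..b vs a..=b ratio, a..b vs a.. ratio, percent of crates using ranges).
fn usage_ratios() -> (f64, f64, f64) {
    let range = 1_312_526.0_f64; // a..b
    let inclusive = 159_341.0_f64; // a..=b
    let from = 284_161.0_f64; // a..
    let crates_using = 98_483.0_f64;
    let total_crates = 229_793.0_f64;
    (
        round1(range / inclusive),
        round1(range / from),
        (100.0 * crates_using / total_crates).round(),
    )
}
```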

RangeBounds usage in public APIs

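| Total | fn | trait impl | struct def | struct impl | trait def | enum def | enum impl |
|-------|----|------------|------------|-------------|-----------|----------|-----------|
| 2,292 | 2,004 (87.4%) | 202 (8.8%) | 39 (1.7%) | 32 (1.4%) | 12 (0.52%) | 2 (0.09%) | 1 (0.04%) |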

RangeBounds was used as a trait bound (including impl RangeBounds) in 0.25% of the sample crates (584).

Range Type explicit usage in public APIs (excluding trait impls)

| | Total | fn | struct field | enum field | type alias | const item | static item |
|---|-------|----|--------------|------------|------------|------------|-------------|
| Range | 5,395 | 4,118 (76.3%) | 698 (12.9%) | 383 (7.1%) | 124 (2.3%) | 70 (1.3%) | 2 (0.04%) |
| RangeInclusive | 1,207 | 901 (74.7%) | 92 (7.6%) | 91 (7.5%) | 19 (1.6%) | 103 (8.5%) | 1 (0.08%) |
| RangeFrom | 882 | 864 (98.0%) | 3 (0.34%) | 10 (1.13%) | 4 (0.45%) | 1 (0.11%) | 0 |
| Total | 7,484 | 5,883 (78.6%) | 793 (10.6%) | 484 (6.5%) | 147 (2.0%) | 174 (2.3%) | 3 (0.04%) |

Range types were used in public APIs (excluding trait impls) in 0.79% of the sample crates (1,826).

Range Type explicit usage in public trait impls

| Total | Range | RangeInclusive | RangeFrom |
|-------|-------|----------------|-----------|
| 3,647 | 1,893 (52%) | 687 (19%) | 1,067 (29%) |

Range types were used in public trait impls in 0.33% of the sample crates (766).

Note: the following counts are likely too low, because the data extraction cannot see into macro invocations.

700 of these were impls for ops::Index or ops::IndexMut with one of the range types as the index type.

Another 741 were impls of one of the standard conversion traits (From, Into, TryFrom, TryInto).

Range Type explicit usage in public APIs (all inclusive)

Range types were explicitly used in any public API in 0.89% of sample crates (2,038). 554 crates have both (a) a public trait impl involving a range type and (b) some other API explicitly using a range type, which is why the total is 1,826 + 766 - 554 = 2,038 rather than the sum of the two previous counts.

These constitute the cases where less ergonomic library interop could apply.

I ranked each affected crate by the number of crates that depend on it ("dependents"), because I thought that was a good proxy for how often users in the wild will depend directly on such crates. Of the 1,449 total affected crates on crates.io, only 39 crates had more than 100 dependents, but those 39 had more than 93% of the total number of dependents.

Histogram: Dependents Breakdown

The 39 crates with over 100 dependents:

Name Latest Version # of Dependents Most Recent Release Recent Downloads Downloads of Latest Version All Time Downloads
serde 1.0.193 35043 2024-02-20 34268992 3741887 275397546
rand 0.8.5 13441 2024-02-18 29254072 4560 291883321
regex 1.10.2 8439 2024-01-21 28861662 7995426 227215737
bytes 1.5.0 5423 2023-09-07 21418791 25385538 182917006
rayon 1.8.0 3011 2024-02-27 13987087 761638 97231137
bincode 2.0.0-rc.3 2496 2023-03-30 7437296 1188981 57011490
indexmap 2.1.0 1785 2024-02-29 35930957 1229167 205879156
quickcheck 1.0.3 1067 2021-01-15 1366240 7467603 15756539
schemars 0.8.16 859 2023-11-11 3050968 2114567 14500776
ndarray 0.15.6 847 2022-07-30 1359994 5154517 10641179
pest 2.7.5 643 2024-03-02 7220140 234133 57121309
pyo3 0.20.0 551 2024-03-10 4158271 793 24956831
scale-info 2.10.0 443 2024-03-12 1126482 3159 5761728
rustyline 12.0.0 441 2024-03-06 1004925 3829 8560979
bitvec 1.0.1 439 2022-07-10 6590366 21419740 43913058
rocket 0.5.0 403 2023-11-17 394120 248754 4352752
miette 5.10.0 376 2024-03-07 1702824 8128 8179648
tower-http 0.5.0 356 2024-02-23 4727121 115135 28181117
arbitrary 1.3.2 334 2023-10-30 2027869 1861244 10472715
wgpu 0.17.2 331 2024-03-01 598911 32881 3921522
azure_core 0.17.0 318 2024-01-05 237609 50471 1146701
tree-sitter 0.20.10 308 2024-03-10 440632 870 3145906
egui 0.24.1 284 2024-02-14 374680 39643 2031028
bstr 1.8.0 276 2024-02-24 10631156 576274 77912405
logos 0.13.0 204 2024-02-07 752791 6665 5139345
regress 0.7.1 193 2024-02-26 664300 5246 2790599
rkyv 0.7.42 182 2024-02-23 3165527 341 13007693
wiremock 0.5.22 154 2024-02-11 1060791 49514 5201466
fancy-regex 0.12.0 142 2023-12-22 3155625 114428 14967513
glium 0.33.0 140 2024-01-03 119700 4949 1579022
codespan-reporting 0.11.1 137 2021-02-25 3629090 24338635 25631055
object 0.32.1 132 2024-03-11 15658574 685 107373131
sodiumoxide 0.2.7 122 2021-06-24 285641 1558541 2751305
aho-corasick 1.1.2 117 2023-10-09 29259349 25600773 214824438
amplify 4.5.0 113 2024-02-15 50524 7289 308644
bitcoin_hashes 0.13.0 111 2023-08-24 1040867 109601 5818899
smartstring 1.0.1 107 2022-03-24 1648792 6565392 8816787
wasmer 4.2.4 107 2024-03-04 297222 2706 3636787
similar 2.3.0 105 2023-12-29 2917823 1033808 15768494

32 of the 39 (82%) have issued a release within the last 7 months.

Months since Last Release

I analyzed each crate to categorize exactly what kind of explicit Range usage they have.


Here is a breakdown of those categories:

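| | derive | trait bound | trait method | index | from | fn | field or alias |
|---|--------|-------------|--------------|-------|------|----|----------------|
| total occurrences | 22 | 4 | 9 | 67 | 15 | 63 | 5 |
| affected crates | 6 | 1 | 4 | 10 | 8 | 13 | 5 |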
  • derive: range impl for trait primarily used for recursively derived traits, such as #[derive(serde::Serialize)]
  • trait bound: range impl for trait primarily used as the bound in a function, like fn gen_range(range: impl SampleRange)
  • trait method: impl for trait primarily used via a method call, like (0..11).into_par_iter()
  • index: Index<Range*> implementations
  • from: From<Range*> implementations
  • fn: free-standing or associated functions that take a range type as a parameter
  • field or alias: struct field or type alias containing a range type

Conclusion

Comparison

The following table gives an overview of which usage categories each option would support without library changes.

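| | derive | trait bound | trait method | index | from | fn | field or alias |
|---|--------|-------------|--------------|-------|------|----|----------------|
| Do Nothing / Reject RFC | OK | OK | OK | OK | OK | OK | OK |
| Accept Current RFC | X | X | X | X | X | X | X |
| Type Inference | X | OK | X | OK | X | OK | OK |
| Copy Trait Impls | OK | OK | OK | OK | OK | X | X |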
  • OK: can use new range types the same as old range types without library changes or explicit conversions
  • X: libraries must add support for the new range types

Note: "Copy Trait Impls" refers to a hypothetical option where a compiler hack automatically duplicates any third-party trait impls with the old range types to the new range types. I don't consider it a viable option, but included it as an example that would cover the cases that "Type Inference" would not.

My Position

I stand by the RFC as written: stabilize the new range types ASAP and encourage libraries to make changes to ergonomically support them. The changes libraries need to make are pretty straightforward, and most of the affected popular libraries are actively maintained.

Type inference alone would help significantly for cases like indexing, but it won't help cases like serde::Serialize.
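The derive-style case can be illustrated with a hypothetical `Encode` trait standing in for something like serde::Serialize. All names below are invented for the sketch; the point is that a field's type is fixed by its declaration, so there is no range expression left for inference to act on.

```rust
use std::ops;

// Hypothetical stand-in for the new range type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NewRange<Idx> {
    pub start: Idx,
    pub end: Idx,
}

// Hypothetical library trait standing in for something like `serde::Serialize`.
pub trait Encode {
    fn encode(&self) -> String;
}

// The library only knows about the legacy range type.
impl Encode for ops::Range<usize> {
    fn encode(&self) -> String {
        format!("{}..{}", self.start, self.end)
    }
}

// The field type is fixed by this declaration, so there is no range
// expression whose type could be inferred from the trait requirement.
pub struct Selection {
    pub window: NewRange<usize>,
}

impl Encode for Selection {
    fn encode(&self) -> String {
        // `self.window.encode()` would not compile: `NewRange` lacks the impl.
        // Until the library adds one, the user converts by hand.
        let legacy = self.window.start..self.window.end;
        format!("Selection({})", legacy.encode())
    }
}
```

A derive macro would generate the equivalent of the commented-out call, which is why this category needs either a library update or a user-written workaround.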

I think we can provide enough mitigations to minimize the issue without the need for special handling at the language level:

  • Forward-compatibility lint for trait impls involving a legacy range type without a corresponding impl with the new range type
  • Diagnostic suggesting new-to-legacy conversions where necessary
  • Lint unnecessary new-to-legacy conversions
  • Stabilize the new range types ASAP
  • Community initiative updating libraries to support the new range types
    (I personally plan on helping with this, starting with the most popular crates and moving down the list)

While this will place some burden on maintainers, I think most would agree that having better range types is worth it.

Finally, a quick reminder:

  • The issues presented here are not breaking changes: cargo fix --edition will 100% cover any interop between old and new range types
  • The new range types come with significant benefits:
    • Smaller RangeInclusive
    • Fixed ExactSizeIterator impls
    • More opportunities for optimizing the RangeInclusive iterator
    • Copy for all range types