
# optimize me baby

I recently came across this code sample:

```rust
pub fn aaa(v: Vec<usize>) -> Vec<usize> {
    v.into_iter().map(|a| a + 3).map(|a| a * 4).collect()
}
```

It looks very simple. It's idiomatic Rust code. But is it fast? It uses iterators, two `map` calls and a `collect` at the end. In many high-level languages, this code would be slower than the equivalent handwritten loop. But [not in Rust](https://rust.godbolt.org/z/888xG6Es5).

```assembly
.LCPI0_0:
        .quad   12
        .quad   12
example::aaa:
        mov     rax, rdi
        mov     rcx, qword ptr [rsi]
        mov     rdx, qword ptr [rsi + 8]
        mov     rsi, qword ptr [rsi + 16]
        test    rsi, rsi
        je      .LBB0_7
        cmp     rsi, 4
        jae     .LBB0_3
        xor     edi, edi
        jmp     .LBB0_6
.LBB0_3:
        mov     rdi, rsi
        and     rdi, -4
        xor     r8d, r8d
        movdqa  xmm0, xmmword ptr [rip + .LCPI0_0]
.LBB0_4:
        movdqu  xmm1, xmmword ptr [rcx + 8*r8]
        movdqu  xmm2, xmmword ptr [rcx + 8*r8 + 16]
        psllq   xmm1, 2
        psllq   xmm2, 2
        paddq   xmm1, xmm0
        paddq   xmm2, xmm0
        movdqu  xmmword ptr [rcx + 8*r8], xmm1
        movdqu  xmmword ptr [rcx + 8*r8 + 16], xmm2
        add     r8, 4
        cmp     rdi, r8
        jne     .LBB0_4
        cmp     rsi, rdi
        je      .LBB0_7
.LBB0_6:
        mov     r8, qword ptr [rcx + 8*rdi]
        lea     r8, [4*r8 + 12]
        mov     qword ptr [rcx + 8*rdi], r8
        lea     r8, [rdi + 1]
        mov     rdi, r8
        cmp     rsi, r8
        jne     .LBB0_6
.LBB0_7:
        mov     qword ptr [rax], rcx
        mov     qword ptr [rax + 8], rdx
        mov     qword ptr [rax + 16], rsi
        ret
```

If you don't understand the precise details of the assembly, that's fine; they don't really matter. What matters is that this code is free of both allocations and function calls. It's simple, straightforward code that reuses the allocation. Even better, it uses SIMD instructions in its hot loop (you can tell from the instruction mnemonics, i.e. their names, which look like someone smashed their head on the keyboard).

How do we go from that neat one-liner to this? Your Rust code goes through a long journey when you run `cargo build` or `rustc`. We will skip cargo here and jump straight into `rustc`.

For context, this was written in June 2023.
Some of the implementation details may have changed if you're reading this later, but the general idea remains.

# Early stages

At the beginning, there was source code. The source code gets tokenized and parsed into an AST, an abstract syntax tree. ASTs are something you will find in almost every compiler or interpreter. They're a simple way to represent source code in a structured way. If you want to learn more about ASTs, check out [the chapter](https://craftinginterpreters.com/representing-code.html) in the amazing Crafting Interpreters, a great book that I can recommend to everyone wanting to learn more about programming language implementations.

rustc expands macros on the AST. We don't use macros in our code, so that doesn't do much. We can dump the macro-expanded AST (using `-Zunpretty=expanded`).

```rust
use ::std::prelude::rust_2015::*;
extern crate std;
pub fn aaa(v: Vec<usize>) -> Vec<usize> {
    v.into_iter().map(|a| a + 3).map(|a| a * 4).collect()
}
```

The prelude import is injected, but nothing more.

Let's go a step further. After the AST, the code is lowered to HIR, the "high-level intermediate representation". HIR is similar to the AST, with some constructs "lowered" or "desugared". "Desugaring" is the process of taking a language feature and expressing it in terms of other, more basic constructs of the language. For example, here's how a `for` loop and range syntax are desugared.

```rust
fn main() {
    for i in 0..1 {}
}
```

```rust
fn main() {
    match IntoIterator::into_iter(Range { start: 0, end: 1 }) {
        mut iter => loop {
            match #[lang = "next"](&mut iter) {
                None => break,
                Some(i) => {}
            }
        },
    }
}
```

The HIR lowers many things, but in our case there's nothing interesting, other than the very interesting formatting of the HIR pretty-printer (PRs welcome).
```rust
use ::std::prelude::rust_2015::*;
extern crate std;
fn aaa(v: Vec<usize>) -> Vec<usize> {
    v.into_iter().map(|a| a + 3).map(|a| a * 4).collect()
}
```

Quickly skipping over THIR (typed HIR), which isn't important here and doesn't even have a proper pretty-printer, we land at MIR, the mid-level intermediate representation. It's where borrow checking and several optimizations happen. MIR is quite different from Rust: it's not a normal tree of expressions but a control flow graph. It's basically a flowchart; you've probably seen those when learning how to program.

This is the point in the post where I have to start lying slightly. While I will show the MIR (and the following IRs) in their real form, I will also try to represent them in normal Rust to better illustrate what exactly is happening. So without further ado, let's jump into the MIR.

# MIR

MIR has several phases that can all be accessed using `-Zdump-mir`. I will be highlighting the most important ones.

First, let's look at the `analysis` MIR phase, which is used by the borrow checker. This is just the MIR for the `aaa` function, not for the closures, which both have their own MIR. But since they are so simple, we'll leave them out for now. They will come in later.

MIR uses `StorageLive` and `StorageDead` statements to control the range of statements where a local variable is considered "live", that is, where it can be used. We will ignore these statements here; I removed them from the output as they are not important and mostly distracting for what we're trying to do here.
```rust
fn aaa(_1: Vec<usize>) -> Vec<usize> {
    let mut _0: std::vec::Vec<usize>;
    let mut _2: std::iter::Map<std::iter::Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>, [closure@src/lib.rs:2:36: 2:39]>;
    let mut _3: std::iter::Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>;
    let mut _4: std::vec::IntoIter<usize>;
    let mut _5: std::vec::Vec<usize>;
    let mut _6: [closure@src/lib.rs:2:21: 2:24];
    let mut _7: [closure@src/lib.rs:2:36: 2:39];

    bb0: {
        _5 = move _1;
        _4 = <Vec<usize> as IntoIterator>::into_iter(move _5) -> [return: bb1, unwind: bb9];
    }

    bb1: {
        _6 = [closure@src/lib.rs:2:21: 2:24];
        _3 = <std::vec::IntoIter<usize> as Iterator>::map::<usize, [closure@src/lib.rs:2:21: 2:24]>(move _4, move _6) -> [return: bb2, unwind: bb8];
    }

    bb2: {
        _7 = [closure@src/lib.rs:2:36: 2:39];
        _2 = <Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]> as Iterator>::map::<usize, [closure@src/lib.rs:2:36: 2:39]>(move _3, move _7) -> [return: bb3, unwind: bb7];
    }

    bb3: {
        _0 = <Map<Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>, [closure@src/lib.rs:2:36: 2:39]> as Iterator>::collect::<Vec<usize>>(move _2) -> [return: bb4, unwind: bb6];
    }

    bb4: {
        drop(_1) -> [return: bb5, unwind: bb11];
    }

    bb5: {
        return;
    }
}
```

The MIR consists of several "basic blocks" with control flow edges between them. Since there is no interesting control flow going on, each block simply ends in a call that returns into the next block. There is some nuance around unwinding, but that's for another time; we will ignore unwinding here.
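To make the "flowchart" idea a bit more concrete, here is a toy model of a control flow graph in the spirit of the MIR above. This is a hypothetical sketch, not rustc's real data structures: `Terminator`, `BasicBlock`, `trace_calls` and `aaa_cfg` are made-up, heavily simplified names. Each block ends in a terminator that either calls something and continues at a target block, or returns; statements and unwind edges are left out.

```rust
/// Hypothetical, heavily simplified stand-in for a MIR terminator.
/// Unwind edges are left out, just like in the text.
pub enum Terminator {
    /// Call `callee`; on normal return, continue at block index `target`.
    Call { callee: &'static str, target: usize },
    /// Leave the function.
    Return,
}

/// A basic block with its statements omitted: only the terminator
/// decides where control flows next.
pub struct BasicBlock {
    pub terminator: Terminator,
}

/// Walks the graph starting at bb0 and records every callee in the order
/// the calls happen, stopping at the first `Return`.
pub fn trace_calls(blocks: &[BasicBlock]) -> Vec<&'static str> {
    let mut calls = Vec::new();
    let mut current = 0; // execution always starts at bb0
    loop {
        match blocks[current].terminator {
            Terminator::Call { callee, target } => {
                calls.push(callee);
                current = target;
            }
            Terminator::Return => return calls,
        }
    }
}

/// The `aaa` MIR from above as a graph: bb0 -> bb1 -> ... -> bb5.
/// (In real MIR, `drop(_1)` is its own `Drop` terminator; it is modeled as a call here.)
pub fn aaa_cfg() -> Vec<BasicBlock> {
    vec![
        BasicBlock { terminator: Terminator::Call { callee: "into_iter", target: 1 } },
        BasicBlock { terminator: Terminator::Call { callee: "map", target: 2 } },
        BasicBlock { terminator: Terminator::Call { callee: "map", target: 3 } },
        BasicBlock { terminator: Terminator::Call { callee: "collect", target: 4 } },
        BasicBlock { terminator: Terminator::Call { callee: "drop", target: 5 } },
        BasicBlock { terminator: Terminator::Return },
    ]
}
```

Following the edges of `aaa_cfg()` visits the blocks strictly in order, which is exactly the "no interesting control flow" situation of our function; a loop or an `if` would show up as a terminator with more than one target.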
Translating this MIR to straightforward Rust would look something like the following:

```rust
fn aaa(_1: Vec<usize>) -> Vec<usize> {
    let _5 = _1;
    let _4 = Vec::into_iter(_5);
    let _6 = |a| a + 3;
    let _3 = <std::vec::IntoIter<usize> as Iterator>::map(_4, _6);
    let _7 = |a| a * 4;
    let _2 = <Map<std::vec::IntoIter<usize>, _> as Iterator>::map(_3, _7);
    let _0 = <Map<Map<std::vec::IntoIter<usize>, _>, _> as Iterator>::collect::<Vec<usize>>(_2);
    return _0;
}
```

All temporary values have been turned into their own local variables and all method calls are resolved into fully qualified syntax, but otherwise everything remains the same. We first create the iterator, map twice and collect the result. This code passes the borrow checker and we can move on.

Before we get into exciting compiler optimizations, let's go on a small tangent. As we saw at the beginning, the allocation was reused. This is not an optimization rustc performs today, so something else must be going on. But what?

# The wondrous world of iterator specializations

At the end of the function, we called `<Map<Map<std::vec::IntoIter<usize>, _>, _> as Iterator>::collect::<Vec<usize>>`. Where will we end up?

By default, [`Iterator::collect::<B>`](https://doc.rust-lang.org/1.69.0/src/core/iter/traits/iterator.rs.html#1887-1892) calls `<B as FromIterator>::from_iter`. `Map` does not override that method. So, forward to [`<Vec<T> as FromIterator>::from_iter`](https://doc.rust-lang.org/1.69.0/src/alloc/vec/mod.rs.html#2723-2725).

Its body is `<Self as SpecFromIter<T, I::IntoIter>>::from_iter(iter.into_iter())`, which is a great start. So there is indeed specialization happening here.

The (internal) documentation for `SpecFromIter` (which can be viewed on https://stdrs.dev or in your local rust-lang/rust checkout) shows a nice diagram of what's happening here.
```
+-------------+
|FromIterator |
+-+-----------+
  |
  v
+-+-------------------------------+  +---------------------+
|SpecFromIter                  +---->+SpecFromIterNested   |
|where I:                      |  |  |where I:             |
|  Iterator (default)----------+  |  |  Iterator (default) |
|  vec::IntoIter                  |  |  TrustedLen         |
|  SourceIterMarker---fallback-+  |  +---------------------+
+---------------------------------+
```

We're looking for the most specific implementation that applies to `SpecFromIter<T, Map<Map<std::vec::IntoIter<usize>, _>, _>>`, or to keep it short, `SpecFromIter<T, Map<_, _>>`. There are several implementations of `SpecFromIter`:

- `impl<T, I: Iterator<Item = T>> SpecFromIter<T, I> for Vec<T>` - the most general fallback
- `impl<T> SpecFromIter<T, IntoIter<T>> for Vec<T>` - a specific special case for `into_iter().collect()`
- `impl<T, I> SpecFromIter<T, I> for Vec<T> where I: Iterator<Item = T> + SourceIter<Source: AsVecIntoIter> + InPlaceIterableMarker` - the specialization we are interested in here.

This specialization uses some private unsafe traits to obtain more direct access to the iterator and reuse the allocation. This is as deep as we will go here, but for anyone brave enough to jump deeper, here's a [starting point](https://github.com/rust-lang/rust/blob/eda41addfcf8112e69531f56ca8c478509be0135/library/alloc/src/vec/in_place_collect.rs#L1).

# Back to MIR

There are many interesting MIR changes between our first look at the MIR and now, but they don't significantly change the MIR for our code example, so they are left out. Now, however, we get to something that does change it: inlining. Inlining replaces a call to a function with a copy of that function's body. It's an optimization that's very simple to understand conceptually, but it is pretty much the most important optimization done in optimizing compilers, especially in abstraction-heavy languages like Rust. Inlining is what gets us the "zero" in "zero cost". Inlining at the MIR level is still relatively new and not very powerful, so it's not too exciting yet. We will come back to inlining later.
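The in-place specialization can be observed from ordinary safe code by comparing the buffer address before and after the `collect`. Here's a minimal sketch; `aaa_manual` and `reuses_allocation` are hypothetical helpers for illustration. Keep in mind that the reuse is an implementation detail of the standard library rather than a documented guarantee, so treat this as an observation, not something to rely on.

```rust
/// The function from the post: an iterator chain collected back into a `Vec`.
pub fn aaa(v: Vec<usize>) -> Vec<usize> {
    v.into_iter().map(|a| a + 3).map(|a| a * 4).collect()
}

/// Hypothetical hand-written equivalent: update every element in place
/// and hand the same buffer back, which is what the optimized assembly does.
pub fn aaa_manual(mut v: Vec<usize>) -> Vec<usize> {
    for a in v.iter_mut() {
        *a = (*a + 3) * 4;
    }
    v
}

/// Hypothetical helper: runs `aaa` and reports whether the output `Vec`
/// points at the same heap buffer as the input did.
pub fn reuses_allocation(v: Vec<usize>) -> bool {
    // The old pointer is only compared, never dereferenced after the move.
    let before = v.as_ptr();
    let after = aaa(v);
    after.as_ptr() == before
}
```

With current standard library versions, `reuses_allocation` should return `true` for a non-empty `Vec<usize>`, since `usize` maps to `usize` and the buffer fits exactly. For an empty `Vec` the comparison says nothing, because no buffer was allocated in the first place.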
After MIR inlining, our function looks like this:

```rust
fn aaa(_1: Vec<usize>) -> Vec<usize> {
    let mut _0: std::vec::Vec<usize>;
    let mut _2: std::iter::Map<std::iter::Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>, [closure@src/lib.rs:2:36: 2:39]>;
    let mut _3: std::iter::Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>;
    let mut _4: std::vec::IntoIter<usize>;
    let mut _5: std::vec::Vec<usize>;
    let mut _6: [closure@src/lib.rs:2:21: 2:24];
    let mut _7: [closure@src/lib.rs:2:36: 2:39];

    bb0: {
        _5 = move _1;
        _4 = <Vec<usize> as IntoIterator>::into_iter(move _5) -> [return: bb1, unwind unreachable];
    }

    bb1: {
        _6 = [closure@src/lib.rs:2:21: 2:24];
        _3 = Map::<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]> { iter: move _4, f: move _6 };
        _7 = [closure@src/lib.rs:2:36: 2:39];
        _2 = Map::<Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>, [closure@src/lib.rs:2:36: 2:39]> { iter: move _3, f: move _7 };
        _8 = <Map<Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>, [closure@src/lib.rs:2:36: 2:39]> as IntoIterator>::into_iter(move _2) -> [return: bb2, unwind unreachable];
    }

    bb2: {
        _0 = <Vec<usize> as vec::spec_from_iter::SpecFromIter<usize, Map<Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>, [closure@src/lib.rs:2:36: 2:39]>>>::from_iter(move _8) -> [return: bb3, unwind unreachable];
    }

    bb3: {
        return;
    }
}
```

Translated to Rust, this still looks similar to the previous version, but a little different.

```rust
fn aaa(_1: Vec<usize>) -> Vec<usize> {
    let _5 = _1;
    let _4 = Vec::into_iter(_5);
    let _6 = |a| a + 3;
    let _3 = Map { iter: _4, f: _6 };
    let _7 = |a| a * 4;
    let _2 = Map { iter: _3, f: _7 };
    let _8 = <Map<Map<std::vec::IntoIter<usize>, _>, _> as IntoIterator>::into_iter(_2);
    let _0 = <Vec<usize> as vec::spec_from_iter::SpecFromIter<usize, Map<Map<std::vec::IntoIter<usize>, _>, _>>>::from_iter(_8);
    return _0;
}
```

`Iterator::map` and `Iterator::collect` have been inlined.
`Map` is constructed right in our function, and the internals of `collect` that we looked at before now lie exposed. After a few more cleanups, we end up with the final MIR:

```rust
fn aaa(_1: Vec<usize>) -> Vec<usize> {
    let mut _0: std::vec::Vec<usize>;
    let mut _2: std::iter::Map<std::iter::Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>, [closure@src/lib.rs:2:36: 2:39]>;
    let mut _3: std::iter::Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>;
    let mut _4: std::vec::IntoIter<usize>;

    bb0: {
        StorageLive(_2);
        StorageLive(_3);
        StorageLive(_4);
        _4 = <Vec<usize> as IntoIterator>::into_iter(move _1) -> [return: bb1, unwind unreachable];
    }

    bb1: {
        _3 = Map::<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]> { iter: move _4, f: const ZeroSized: [closure@src/lib.rs:2:21: 2:24] };
        StorageDead(_4);
        _2 = Map::<Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>, [closure@src/lib.rs:2:36: 2:39]> { iter: move _3, f: const ZeroSized: [closure@src/lib.rs:2:36: 2:39] };
        StorageDead(_3);
        StorageLive(_5);
        _5 = <Map<Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>, [closure@src/lib.rs:2:36: 2:39]> as IntoIterator>::into_iter(move _2) -> [return: bb2, unwind unreachable];
    }

    bb2: {
        _0 = <Vec<usize> as vec::spec_from_iter::SpecFromIter<usize, Map<Map<std::vec::IntoIter<usize>, [closure@src/lib.rs:2:21: 2:24]>, [closure@src/lib.rs:2:36: 2:39]>>>::from_iter(move _5) -> [return: bb3, unwind unreachable];
    }

    bb3: {
        StorageDead(_5);
        StorageDead(_2);
        return;
    }
}
```

```rust
fn aaa(_1: Vec<usize>) -> Vec<usize> {
    let _4 = Vec::into_iter(_1);
    let _3 = Map { iter: _4, f: const { |a| a + 3 } };
    let _2 = Map { iter: _3, f: const { |a| a * 4 } };
    let _5 = <Map<Map<std::vec::IntoIter<usize>, _>, _> as IntoIterator>::into_iter(_2);
    let _0 = <Vec<usize> as vec::spec_from_iter::SpecFromIter<usize, Map<Map<std::vec::IntoIter<usize>, _>, _>>>::from_iter(_5);
    return _0;
}
```

The `const` blocks are purely illustrative and don't represent the actual semantics of `const` blocks. Since the closures don't capture anything, they are zero-sized types, for which MIR has special handling to simplify how they can be used.

And this ends our journey with MIR; it's now time to go to the next step.

# [LLVM IR](https://llvm.org/docs/LangRef.html)

LLVM is the codegen backend used by rustc and many other compilers like clang or the Swift compiler. While there are other rustc backends in development (mainly [`rustc_codegen_cranelift`](https://github.com/bjorn3/rustc_codegen_cranelift) and [`rustc_codegen_gcc`](https://github.com/rust-lang/rustc_codegen_gcc)), LLVM is the main backend that is used by default, so we will exclusively look at it. GCC will be similar, but Cranelift does fewer optimizations, so it will not achieve the same level of code quality.

I will be using the [LLVM Opt Pipeline Viewer on godbolt.org](https://rust.godbolt.org/z/WzTeooEra). The initial LLVM IR looks like a straightforward lowering of the MIR.

```llvm-ir
define void @aaa(ptr noalias nocapture noundef sret(%"alloc::vec::Vec<usize>") dereferenceable(24) %0, ptr noalias nocapture noundef dereferenceable(24) %v) unnamed_addr {
start:
  %_5 = alloca %"core::iter::adapters::map::Map<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<usize>, [closure@<source>:3:21: 3:24]>, [closure@<source>:3:36: 3:39]>", align 8
  %self2 = alloca %"alloc::vec::into_iter::IntoIter<usize>", align 8
  %self1 = alloca %"core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<usize>, [closure@<source>:3:21: 3:24]>", align 8
  %self = alloca %"core::iter::adapters::map::Map<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<usize>, [closure@<source>:3:21: 3:24]>, [closure@<source>:3:36: 3:39]>", align 8
  call void @"_ZN90_$LT$alloc..vec..Vec$LT$T$C$A$GT$$u20$as$u20$core..iter..traits..collect..IntoIterator$GT$9into_iter17h43b0bafd8dfc0f5fE"(ptr noalias nocapture noundef sret(%"alloc::vec::into_iter::IntoIter<usize>") dereferenceable(32) %self2, ptr noalias nocapture noundef dereferenceable(24) %v)
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %self1, ptr align 8 %self2, i64 32, i1 false)
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %self, ptr align 8 %self1, i64 32, i1 false)
  call void @"_ZN63_$LT$I$u20$as$u20$core..iter..traits..collect..IntoIterator$GT$9into_iter17hd3a5ce23ac56a9bbE"(ptr noalias nocapture noundef sret(%"core::iter::adapters::map::Map<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<usize>, [closure@<source>:3:21: 3:24]>, [closure@<source>:3:36: 3:39]>") dereferenceable(32) %_5, ptr noalias nocapture noundef dereferenceable(32) %self)
  call void @"_ZN5alloc3vec16in_place_collect108_$LT$impl$u20$alloc..vec..spec_from_iter..SpecFromIter$LT$T$C$I$GT$$u20$for$u20$alloc..vec..Vec$LT$T$GT$$GT$9from_iter17h4cc42e53518e2ce4E"(ptr noalias nocapture noundef sret(%"alloc::vec::Vec<usize>") dereferenceable(24) %0, ptr noalias nocapture noundef dereferenceable(32) %_5)
  ret void
}
```

There is one important bit that can be seen immediately. Some of the symbols mentioned here are _mangled_, meaning their full paths have been encoded using only characters that are valid in "normal" symbol names. Some of these symbols have been demangled by godbolt, but others haven't. We can still roughly understand which paths they correspond to by looking at the alphabetical characters inside of them. Alternatively, a tool like [rustfilt](https://crates.io/crates/rustfilt) can be used to demangle them.

Just like MIR, LLVM IR contains statements to control the liveness of variables (in fact, the reason MIR has them is mostly because they lower to the LLVM ones). In LLVM IR, they are calls to the `@llvm.lifetime.start` and `@llvm.lifetime.end` intrinsics.

Translating this code to Rust leads to similar code.
```rust
fn aaa(_1: Vec<usize>) -> Vec<usize> {
    let self2: vec::IntoIter<usize> = Vec::into_iter(_1);
    let self1: Map<vec::IntoIter<usize>, _> = self2;
    let self: Map<Map<vec::IntoIter<usize>, _>, _> = self1;
    let _5 = IntoIterator::into_iter(self);
    let _0 = alloc::vec::in_place_collect::<impl alloc::vec::spec_from_iter::SpecFromIter<T,I> for Vec<T>>::from_iter(_5);
    return _0;
}
```

This translation is fairly approximate. Several redundant-looking assignments can be seen; these are leftovers from MIR, where the types actually mattered.

Now, the inliner is run across this function. This generates a lot of LLVM IR that's hard to read, so feel free to skim it; a rough Rust equivalent follows right after.

```llvm-ir
define void @aaa(ptr noalias nocapture noundef sret(%"alloc::vec::Vec<usize>") dereferenceable(24) %0, ptr noalias nocapture noundef dereferenceable(24) %v) unnamed_addr #0 personality ptr @rust_eh_personality {
start:
  %_5 = alloca %"core::iter::adapters::map::Map<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<usize>, [closure@<source>:3:21: 3:24]>, [closure@<source>:3:36: 3:39]>", align 8
  %self2 = alloca %"alloc::vec::into_iter::IntoIter<usize>", align 8
  %self1.sroa.0 = alloca %"alloc::vec::into_iter::IntoIter<usize>", align 8
  %self = alloca %"core::iter::adapters::map::Map<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<usize>, [closure@<source>:3:21: 3:24]>, [closure@<source>:3:36: 3:39]>", align 8
  %me.sroa.0.0.copyload.i = load ptr, ptr %v, align 8
  %me.sroa.4.0.self.sroa_idx.i = getelementptr inbounds i8, ptr %v, i64 8
  %me.sroa.4.0.copyload.i = load i64, ptr %me.sroa.4.0.self.sroa_idx.i, align 8
  %me.sroa.5.0.self.sroa_idx.i = getelementptr inbounds i8, ptr %v, i64 16
  %me.sroa.5.0.copyload.i = load i64, ptr %me.sroa.5.0.self.sroa_idx.i, align 8
  %_14.i = getelementptr inbounds i64, ptr %me.sroa.0.0.copyload.i, i64 %me.sroa.5.0.copyload.i
  store ptr %me.sroa.0.0.copyload.i, ptr %self2, align 8
  %1 = getelementptr inbounds %"alloc::vec::into_iter::IntoIter<usize>", ptr %self2, i64 0, i32 3
  store i64 %me.sroa.4.0.copyload.i, ptr %1, align 8
  %2 = getelementptr inbounds %"alloc::vec::into_iter::IntoIter<usize>", ptr %self2, i64 0, i32 4
  store ptr %me.sroa.0.0.copyload.i, ptr %2, align 8
  %3 = getelementptr inbounds %"alloc::vec::into_iter::IntoIter<usize>", ptr %self2, i64 0, i32 5
  store ptr %_14.i, ptr %3, align 8
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(32) %self1.sroa.0, ptr noundef nonnull align 8 dereferenceable(32) %self2, i64 32, i1 false)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(32) %self, ptr noundef nonnull align 8 dereferenceable(32) %self1.sroa.0, i64 32, i1 false)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(32) %_5, ptr noundef nonnull align 8 dereferenceable(32) %self, i64 32, i1 false)
  %self1.i = load ptr, ptr %_5, align 8
  %4 = getelementptr inbounds %"alloc::vec::into_iter::IntoIter<usize>", ptr %_5, i64 0, i32 3
  %_29.i = load i64, ptr %4, align 8
  %5 = getelementptr i8, ptr %_5, i64 16
  %iterator.val.i = load ptr, ptr %5, align 8
  %6 = getelementptr i8, ptr %_5, i64 24
  %iterator.val4.i = load ptr, ptr %6, align 8
  %7 = ptrtoint ptr %iterator.val4.i to i64
  %8 = ptrtoint ptr %iterator.val.i to i64
  %9 = sub nuw i64 %7, %8
  %10 = lshr i64 %9, 3
  %.not.i.i = icmp eq ptr %iterator.val4.i, %iterator.val.i
  br i1 %.not.i.i, label %"_ZN5alloc3vec16in_place_collect108_$LT$impl$u20$alloc..vec..spec_from_iter..SpecFromIter$LT$T$C$I$GT$$u20$for$u20$alloc..vec..Vec$LT$T$GT$$GT$9from_iter17h4cc42e53518e2ce4E.exit", label %bb6.preheader.i.i

bb6.preheader.i.i:                                ; preds = %start
  %umax.i.i = call i64 @llvm.umax.i64(i64 %10, i64 1)
  br label %bb6.i.i

bb6.i.i:                                          ; preds = %bb6.i.i, %bb6.preheader.i.i
  %iter.sroa.0.04.i.i = phi i64 [ %11, %bb6.i.i ], [ 0, %bb6.preheader.i.i ]
  %11 = add nuw nsw i64 %iter.sroa.0.04.i.i, 1
  %src.i.i.i.i.i.i.i = getelementptr inbounds i64, ptr %iterator.val.i, i64 %iter.sroa.0.04.i.i
  %12 = load i64, ptr %src.i.i.i.i.i.i.i, align 8
  %13 = shl i64 %12, 2
  %14 = add i64 %13, 12
  %dst.i.i = getelementptr inbounds i64, ptr %self1.i, i64 %iter.sroa.0.04.i.i
  store i64 %14, ptr %dst.i.i, align 8
  %exitcond.not.i.i = icmp eq i64 %11, %umax.i.i
  br i1 %exitcond.not.i.i, label %"_ZN5alloc3vec16in_place_collect108_$LT$impl$u20$alloc..vec..spec_from_iter..SpecFromIter$LT$T$C$I$GT$$u20$for$u20$alloc..vec..Vec$LT$T$GT$$GT$9from_iter17h4cc42e53518e2ce4E.exit", label %bb6.i.i

"_ZN5alloc3vec16in_place_collect108_$LT$impl$u20$alloc..vec..spec_from_iter..SpecFromIter$LT$T$C$I$GT$$u20$for$u20$alloc..vec..Vec$LT$T$GT$$GT$9from_iter17h4cc42e53518e2ce4E.exit": ; preds = %start, %bb6.i.i
  store i64 0, ptr %4, align 8
  store ptr inttoptr (i64 8 to ptr), ptr %_5, align 8
  store ptr inttoptr (i64 8 to ptr), ptr %5, align 8
  store ptr inttoptr (i64 8 to ptr), ptr %6, align 8
  store ptr %self1.i, ptr %0, align 8
  %vec.sroa.4.0..sroa_idx.i = getelementptr inbounds i8, ptr %0, i64 8
  store i64 %_29.i, ptr %vec.sroa.4.0..sroa_idx.i, align 8
  %vec.sroa.5.0..sroa_idx.i = getelementptr inbounds i8, ptr %0, i64 16
  store i64 %10, ptr %vec.sroa.5.0..sroa_idx.i, align 8
  ret void
}
```

```rust
fn aaa(v: Vec<usize>) -> Vec<usize> {
    let self2: alloc::vec::into_iter::IntoIter<usize>;

    let v_ptr = v.ptr;          // me.sroa.0.0.copyload.i
    let v_cap = v.second_field; // me.sroa.4.0.copyload.i
    let v_len = v.third_field;  // me.sroa.5.0.copyload.i
    let _14_i = v_ptr.add(v_len);

    // Build IntoIter
    self2.buf = v_ptr;
    self2.cap = v_cap; // cap
    self2.ptr = v_ptr; // start
    self2.end = _14_i; // end

    let self1_sroa_0: alloc::vec::into_iter::IntoIter<usize> = self2;
    let self: core::iter::adapters::map::Map<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<usize>, _>, _> = self1_sroa_0;
    let _5: core::iter::adapters::map::Map<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<usize>, _>, _> = self;

    let self1_i = _5.buf;         // %self1.i: the buffer pointer (field 0)
    let _29_i = _5.cap;           // %_29.i: field 3
    let iterator_val_i = _5.ptr;  // %iterator.val.i: byte offset 16
    let iterator_val4_i = _5.end; // %iterator.val4.i: byte offset 24
    let len = (iterator_val4_i as usize - iterator_val_i as usize) >> 3; // %10
    let is_empty = iterator_val4_i == iterator_val_i;                    // %.not.i.i
    // ... the copy loop (bb6.i.i) and the construction of the returned Vec follow in the IR above
}
```

For reference, here is the layout of `std::vec::IntoIter`. The numbers are the LLVM field indices, as used by the `getelementptr inbounds %"alloc::vec::into_iter::IntoIter<usize>"` instructions above; `_` marks fields that are zero-sized here.

```rust
pub struct IntoIter<T, A: Allocator = Global> {
    /* 0 */ buf: NonNull<T>,
    /* _ */ phantom: PhantomData<T>,
    /* 3 */ cap: usize,
    /* _ */ alloc: ManuallyDrop<A>,
    /* 4 */ ptr: *const T,
    /* 5 */ end: *const T,
}
```
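The `sub nuw`/`lshr` pair in the inlined IR computes how many elements are left from the `ptr` and `end` fields: the byte distance between the two pointers, shifted right by 3, i.e. divided by 8, the size of a `usize` on the 64-bit target used here. Below is a minimal safe-Rust sketch of the same arithmetic, using a slice's pointer range as a stand-in for `IntoIter`'s `ptr` and `end`; `len_from_ptrs` and `is_empty_from_ptrs` are made-up names for illustration, not standard library functions.

```rust
/// Hypothetical helper mirroring `%9 = sub nuw i64 %7, %8` and
/// `%10 = lshr i64 %9, 3`: element count = byte distance / element size.
pub fn len_from_ptrs(slice: &[usize]) -> usize {
    // `start` and `end` play the roles of IntoIter's `ptr` and `end` fields.
    let range = slice.as_ptr_range();
    let bytes = range.end as usize - range.start as usize;
    // On a 64-bit target size_of::<usize>() is 8, so this division is the
    // `lshr ..., 3` from the IR; dividing by the size keeps the sketch portable.
    bytes / std::mem::size_of::<usize>()
}

/// Hypothetical helper mirroring `%.not.i.i = icmp eq ptr %iterator.val4.i, %iterator.val.i`:
/// there is nothing left to iterate exactly when both pointers are equal.
pub fn is_empty_from_ptrs(slice: &[usize]) -> bool {
    let range = slice.as_ptr_range();
    range.start == range.end
}
```

This is also why the IR can branch straight past the loop when the two pointers compare equal: an empty iterator has a byte distance of zero, so there is nothing to copy.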