# Watching a Vec Grow Up and Move: A Debugging Adventure

So you want to see what *actually* happens when a Rust `Vec` runs out of room and needs to grow? Excellent. You've come to the right place.

We're going to use LLDB to watch the entire reallocation dance, step by step, byte by byte. Once you see the details, your intuition sharpens; this is the kind of understanding that pays off everywhere.

## Prerequisites

### 1. The Stdlib Opacity Problem

Here's a fun surprise: Rust's standard library ships **pre-compiled with optimizations**, even when your own code is built in debug mode. This means when you try to step through `Vec::push` or `realloc`, LLDB will cheerfully inform you that variables are `[opt]`: optimized away, unavailable, gone. The code becomes opaque precisely when you want transparency.

This is, shall we say, *suboptimal* for learning.

The fix is to rebuild the standard library from source with debug info. Yes, really. It's not as scary as it sounds:

```bash
# Add rust-src component (you'll need the source to rebuild it)
rustup component add rust-src

# Create .cargo/config.toml
mkdir -p .cargo
cat > .cargo/config.toml << 'EOF'
[unstable]
build-std = ["std", "panic_unwind"]

[build]
target = "aarch64-apple-darwin"  # adjust for your platform

[profile.dev]
opt-level = 0
debug = 2

[profile.dev.package."*"]
opt-level = 0
debug = 2
EOF

# Build with nightly (build-std is unstable, hence nightly)
cargo +nightly build
```

The `[profile.dev.package."*"]` bit is the secret sauce: it applies debug settings to *all* dependencies, including the freshly-built stdlib.

### 2. The Program We'll Dissect

```rust
// src/main.rs

#[inline(never)]
fn make_box() -> Box<i32> {
    Box::new(5)
}

fn main() {
    let mut victor = vec![9_u32, 8, 7, 6]; // cap=4, len=4
    let sixers = &victor[2..];
    let boxed = make_box();

    println!("boxed: {:?}", boxed);
    println!("victor: {:?}", victor);
    println!("sixers: {:?}", sixers);

    victor.push(255); // <-- The moment of truth: cap=4, but we need 5

    let sixers = &victor[2..];
    println!("victor: {:?}", victor);
    println!("sixers: {:?}", sixers);
    println!("This is the end.");
}
```

A few notes on the setup:

- **`#[inline(never)]`** on `make_box()`: Without this, the compiler will inline `Box::new(5)` directly into `main`, and you'll never see it as a separate stack frame. We want to watch the allocation, so we tell the compiler to keep its hands off.
- **`vec![9_u32, 8, 7, 6]`**: Creates a Vec with a capacity of exactly 4. Not 5, not 8. Exactly 4. This is important.
- **`victor.push(255)`**: Here's where things get interesting. We have 4 elements, capacity for 4, and we're asking for room for a 5th. Something's gotta give.

I named the Vec `victor`. (Narrator: Sherley, he didn't!)

## The LLDB Session

### Launch the Debugger

```bash
lldb target/aarch64-apple-darwin/debug/rs-hours
```

### Set a Breakpoint on the Growth Function

We want to catch the moment Vec realizes it needs more space. The function responsible is `grow_amortized`, and yes, "amortized" is a hint about the strategy we're about to witness.

```lldb
(lldb) breakpoint set -r grow_amortized
(lldb) run
```

### We've Hit grow_amortized

When Vec needs more capacity, it calls `RawVecInner::grow_amortized`.
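
Before looking at the frame, here is a minimal sketch of why this particular push has no choice but to grow. It is not how `Vec::push` is written internally; `spare_slots` is a hypothetical helper name of my own, not a std API. It relies only on the documented guarantee that `vec![a, b, c, d]` produces a Vec with exactly the requested capacity.

```rust
/// Hypothetical helper (not a std API): how many more elements fit
/// before the Vec has to ask the allocator for a bigger buffer.
fn spare_slots(v: &Vec<u32>) -> usize {
    v.capacity() - v.len()
}

fn main() {
    // Same starting state as `victor` in the program above.
    let mut victor = vec![9_u32, 8, 7, 6];
    println!(
        "before push: len={} cap={} spare={}",
        victor.len(),
        victor.capacity(),
        spare_slots(&victor)
    );

    // With no spare slot, this push has to grow the buffer first.
    victor.push(255);
    println!(
        "after push:  len={} spare={}",
        victor.len(),
        spare_slots(&victor)
    );
}
```

The first line should report `spare=0`, which is exactly the condition that sends us into the growth path. What the second line reports depends on the growth strategy, and that is what the debugger is about to show us.
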
Let's see what we're working with:

```lldb
Process stopped
* frame #0: rs-hours`grow_amortized at mod.rs:679

(lldb) frame variable
(RawVecInner *) self = 0x16fdfd7a0
(unsigned long) len = 4           # current length (== capacity: we're full)
(unsigned long) additional = 1    # we just need ONE more slot
(Layout) elem_layout:
    size = 4                      # sizeof(u32) = 4 bytes
    align = 4
```

So: we have capacity for 4 elements, all 4 are in use, and we're asking for 1 more. Seems reasonable. Surely we'll just allocate space for 5, right?

*Right?*

### The Doubling Strategy Revealed

```lldb
(lldb) next
(lldb) next # oooh, twice.. unrelated!
```

At line 692, we see the growth formula in all its glory:

```rust
let cap = cmp::max(self.cap.as_inner() * 2, required_cap);
```

Here `required_cap` is `len + additional = 4 + 1 = 5`, so this computes: `cap = max(4 × 2, 5) = max(8, 5) = 8`

Vec doesn't grow to 5. It grows to **8**. It *doubles*.

This is amortized growth in action. By doubling instead of incrementing, Vec ensures that a sequence of N pushes takes O(N) total time, not O(N²). Each individual push might occasionally trigger an expensive reallocation, but averaged out, it's constant time per push.

(Victor, having grown, is now twice the vec he used to be. Who knew he had such potential?)

### Into the Allocator

```lldb
(lldb) b realloc
(lldb) continue
```

We stop at `alloc::alloc::realloc`:

```lldb
(lldb) frame variable
(unsigned char *) ptr = 0x100a49e40   # the old heap pointer
(Layout) layout:
    size = 16                         # old size: 4 elements × 4 bytes
    align = 4
(unsigned long) new_size = 32         # new size: 8 elements × 4 bytes
```

There it is: we're reallocating from 16 bytes to 32 bytes. The old buffer could hold 4 `u32`s; the new one will hold 8.

### The Pointer Changes Address

```lldb
(lldb) finish
Return value: (unsigned char *) $0 = 0x100a49290
```

Notice something?

- **Old pointer:** `0x100a49e40`
- **New pointer:** `0x100a49290`

They're different. The allocator couldn't extend the buffer in place (perhaps something else was using the adjacent memory), so it had to:

1. Allocate a fresh 32-byte buffer elsewhere
2. Copy the existing 16 bytes of data
3. Free the old buffer

This is why you can't hold a reference into a Vec across a push that might reallocate: the reference would be pointing to freed memory. The borrow checker protects you from this particular footgun, but now you know *why* it's a footgun.

### The Final State

After `push(255)` completes, let's admire our handiwork:

```lldb
(lldb) frame variable victor
victor: Vec<unsigned int>
    ptr = 0x100a49290   # NEW address
    len = 5
    cap = 8

(lldb) memory read 0x100a49290 -c 20 -f x
0x100a49290: 0x00000009 0x00000008 0x00000007 0x00000006
0x100a492a0: 0x000000ff
```

| Index | Value | Notes |
|-------|-------|-------|
| 0 | 9 | original |
| 1 | 8 | original |
| 2 | 7 | original |
| 3 | 6 | original |
| 4 | 255 | **the new arrival** |
| 5-7 | (unused) | room for 3 more before the next reallocation |

Victor has grown from a capacity of 4 to a capacity of 8, moved to a new home in memory, and welcomed 255 to the family.

## The Complete Flow

```
push(255)
    │
    v
Vec::push() checks: len(4) == cap(4)? ─── YES, we're full
    │
    v
RawVecInner::grow_amortized(len=4, additional=1)
    │
    v
new_cap = max(4 × 2, 5) = 8    ← DOUBLING, not incrementing
    │
    v
alloc::realloc(old_ptr, old_size=16, new_size=32)
    │
    v
System allocator returns NEW pointer (old is freed)
    │
    v
Vec updates its internal state: ptr=new, cap=8
    │
    v
Write 255 at ptr[4]
    │
    v
len = 5
```

## Things to Take Away

1. **Amortized growth is the whole point.** Vec doubles capacity (not +1) to keep `push()` at O(1) amortized. You pay for occasional reallocations, but averaged over many pushes, each one is constant time.

2. **Reallocation can move your data.** The pointer may change. This is *why* the borrow checker won't let you hold a `&vec[i]` across a potentially-reallocating operation. Now you've seen the danger with your own eyes.

3. **Vec is three words on the stack.** `[ptr, cap, len]`, that's it.
The actual data lives on the heap, addressed by that pointer.

4. **Debug stdlib is essential for learning.** Without `-Z build-std`, stdlib functions appear as `[opt]` with no variable visibility. You can't see `len`, `cap`, `new_size`, or any of the interesting values. Rebuilding with debug info makes the internals transparent.

5. **`#[inline(never)]` is your friend.** When the compiler inlines a function, it disappears as a discrete stack frame. If you want to observe an allocation (or any small function), tell the compiler to back off.

## Useful LLDB Commands

| Command | Purpose |
|---------|---------|
| `breakpoint set -r <regex>` | Break on functions matching a pattern |
| `frame variable` | Show local variables in current frame |
| `register read x0 x1 x2` | Check function arguments (ARM64 calling convention) |
| `memory read <addr> -c <n>` | Dump n items (bytes in the default format) starting at address |
| `finish` | Run until current function returns |
| `bt` | Show backtrace |
| `next` | Step over (don't enter function calls) |
| `step` | Step into (enter function calls) |

## Custom LLDB Commands for Rust

We used a custom `rvec` command that pretty-prints Rust Vec internals:

```lldb
(lldb) rvec victor
victor: Vec<unsigned int> @ 0x16fdfd7a0 -> ptr=0x100a49290 len=5 cap=8
  0000000100a49290  0x00000009 0x00000008 0x00000007 0x00000006
  0000000100a492a0  0x000000ff
```

(Building your own LLDB commands for Rust types is a rabbit hole worth exploring, but that's a tutorial for another day.)

---

*Now go forth and debug. And remember: every time you call `push()`, somewhere in memory, a Vec might be packing up and moving to a bigger apartment.*
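
P.S. If you'd like to watch the moving trucks without a debugger, here is a minimal sketch that does it from plain Rust. It only uses `Vec::as_ptr`, `len`, and `capacity`; `growth_stats` is a hypothetical helper name of my own. The growth factor is an implementation detail that the std docs don't promise, so treat the numbers you see as observations from your toolchain, not guarantees.

```rust
/// Hypothetical helper (not a std API): push `n` values into an empty Vec
/// and return (capacity_changes, pointer_moves) observed along the way.
fn growth_stats(n: u32) -> (usize, usize) {
    let mut v: Vec<u32> = Vec::new();
    let mut cap_changes = 0;
    let mut ptr_moves = 0;

    for i in 0..n {
        // Snapshot the buffer address and capacity before the push.
        let (old_ptr, old_cap) = (v.as_ptr(), v.capacity());
        v.push(i);

        if v.capacity() != old_cap {
            cap_changes += 1;
            println!("len={:>5}  cap {:>5} -> {:>5}", v.len(), old_cap, v.capacity());
        }
        // The very first push also counts here: an empty Vec holds a
        // dangling placeholder pointer until it allocates.
        if v.as_ptr() != old_ptr {
            ptr_moves += 1;
        }
    }
    (cap_changes, ptr_moves)
}

fn main() {
    let (cap_changes, ptr_moves) = growth_stats(1000);
    println!(
        "1000 pushes: {} capacity changes, {} pointer moves",
        cap_changes, ptr_moves
    );
}
```

The thing to look for is how few capacity changes there are compared to the number of pushes, which is the amortized story from earlier in numbers. A capacity change doesn't always come with a pointer move, since the allocator is free to extend a buffer in place when it can.
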