## The Panic

Lighthouse is panicking with an out-of-bounds slice access at position `144115188075905177`. Let's call that position `p`.

## Investigation
In binary, `p` is rather interesting: it sits just barely above a power of two:
```python
>>> import math
>>> p = 144115188075905177
>>> math.log2(p)
57.00000000000049
>>> bin(p)
'0b1000000000000000000000000000000000000000001100000010011001'
```
If the slice contained `u8`, a slice of length `p` would be 144 petabytes. So, it seems that we're getting a *wildly* incorrect slice index.
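That bit pattern is worth dwelling on. A quick sanity check (my own arithmetic, not part of the original report) shows that `p` is exactly `2^57` plus the small value 49,305; in other words, it looks like a plausible small index with a single high bit flipped:

```rust
fn main() {
    let p: u64 = 144_115_188_075_905_177;

    // `p` decomposes into a power of two plus a small remainder.
    assert_eq!(p, (1u64 << 57) + 49_305);

    // Equivalently: take the plausible small index 49,305 and
    // flip bit 57 of its binary representation.
    assert_eq!(49_305u64 | (1u64 << 57), p);

    println!("p = 2^57 + 49305");
}
```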
The panic is the last line of [this function](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L184):
```rust
/// Mutably iterate through all values in some allocation.
fn iter_mut(&mut self, alloc_id: usize) -> Result<impl Iterator<Item = &mut T>, Error> {
    let range = self.range(alloc_id)?;
    Ok(self.backing[range].iter_mut())
}
```
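The panic itself comes from the `self.backing[range]` indexing. In isolation (a minimal sketch with hypothetical values, not Lighthouse code), slicing with a range whose end exceeds the vector's length panics, whereas the checked `get` accessor would return `None`:

```rust
fn main() {
    let backing: Vec<u64> = vec![0; 8];

    // A range derived from sane offsets slices fine.
    assert_eq!(backing[2..5].len(), 3);

    // A range derived from a corrupt offset is out of bounds:
    // the checked accessor returns `None`...
    let corrupt = 2..1_000_000;
    assert!(backing.get(corrupt.clone()).is_none());

    // ...while direct indexing panics, which is what we observe.
    let panicked = std::panic::catch_unwind(|| backing[corrupt].len()).is_err();
    assert!(panicked);

    println!("out-of-bounds range panics on direct indexing");
}
```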
It must be [`Self::range`](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L161C1-L174C6) that's giving us `p`:
```rust
/// Returns the range in `self.backing` that is occupied by some allocation.
fn range(&self, alloc_id: usize) -> Result<Range<usize>, Error> {
    let start = *self
        .offsets
        .get(alloc_id)
        .ok_or(Error::UnknownAllocId(alloc_id))?;
    let end = self
        .offsets
        .get(alloc_id + 1)
        .copied()
        .unwrap_or(self.backing.len());
    Ok(start..end)
}
```
Therefore, `p` must be either:
1. A value in `self.offsets`.
2. The length of `self.backing`.
The type of `self.backing` is `Vec<Hash256>`, so the user would need about 4,611 *petabytes* of RAM to allocate a slice of length `p`. The Blue Waters supercomputer at the University of Illinois boasts 1.5 petabytes of RAM, so let's assume the user *is not* running a supercomputer. That rules out (2): a slice of that length could never have been allocated in the first place.
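The arithmetic behind that figure, using the fact that a `Hash256` occupies 32 bytes:

```rust
fn main() {
    let p: u128 = 144_115_188_075_905_177;

    // Each `Hash256` is 32 bytes, so a `Vec<Hash256>` of length `p`
    // would occupy:
    let bytes = p * 32;
    let petabytes = bytes / 1_000_000_000_000_000; // decimal petabytes

    assert_eq!(petabytes, 4_611);
    println!("{petabytes} petabytes");
}
```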
Therefore, `p` must be a value in `self.offsets`. There are three places where `self.offsets` is set:
1. It is [instantiated](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L43) in `Self::alloc` based on the length of `self.backing`.
2. It is [mutated](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L58) in `Self::grow`.
3. It is [mutated](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L67-L83) in `Self::shrink`.
We can rule out (1) for the same reason we ruled out (2) in the previous list; the user has (unfortunately) not hijacked the world's largest supercomputer and installed Windows and Lighthouse on it.
Subsequently, we must look at the [only use](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L113) of `Self::grow`:
```rust
Ordering::Less => self.grow(alloc_id, self.backing.len() - prev_len)?,
```
Ah ha! Unchecked subtraction! It's an underflow!
Oh wait, if we [zoom out](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L111-L115) we can see there's a `std::cmp::Ord::cmp` preventing underflow:
```rust
match prev_len.cmp(&self.backing.len()) {
    Ordering::Greater => self.shrink(alloc_id, prev_len - self.backing.len())?,
    Ordering::Less => self.grow(alloc_id, self.backing.len() - prev_len)?,
    Ordering::Equal => {}
}
```
To get an underflow we'd need `self.backing.len()` to be *less than* `prev_len`. But the `match` statement prevents us from subtracting when that's the case ([example playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ca456252e7a6e17c622cfd11293d2df1) showing `Ord::cmp` behaviour).
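To convince ourselves, the guard can be reproduced in isolation (hypothetical lengths, not Lighthouse code): whichever arm fires, the larger value always ends up on the left of the subtraction, so neither path can underflow:

```rust
use std::cmp::Ordering;

fn main() {
    for (prev_len, backing_len) in [(10_usize, 3_usize), (3, 10), (7, 7)] {
        let delta = match prev_len.cmp(&backing_len) {
            // Shrink path: `prev_len > backing_len`, so no underflow here.
            Ordering::Greater => prev_len - backing_len,
            // Grow path: `backing_len > prev_len`, so no underflow here.
            Ordering::Less => backing_len - prev_len,
            Ordering::Equal => 0,
        };
        // The difference is always bounded by the larger length.
        assert!(delta <= prev_len.max(backing_len));
    }
    println!("no underflow in either arm");
}
```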
Therefore, `Self::grow` can only be called with a value no greater than `self.backing.len()`. The not-a-supercomputer argument rules out `Self::grow` as the source of `p`.
Now, let us look at the [only use](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L112) of `Self::shrink`:
```rust
Ordering::Greater => self.shrink(alloc_id, prev_len - self.backing.len())?,
```
Spoiler, it's from the previous `match` statement. I'll repeat it here for readability:
```rust
match prev_len.cmp(&self.backing.len()) {
    Ordering::Greater => self.shrink(alloc_id, prev_len - self.backing.len())?,
    Ordering::Less => self.grow(alloc_id, self.backing.len() - prev_len)?,
    Ordering::Equal => {}
}
```
The `Ord::cmp` means that there is no underflow in `prev_len - self.backing.len()`. If we scroll up a few lines we can see `prev_len` being [instantiated](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L107C1-L107C1):
```rust
let prev_len = self.backing.len();
```
As long as we believe the `match` statement prevents underflow, the value provided to `Self::shrink` must be *less than* `prev_len`, which was itself the length of `self.backing` at an earlier point in time. The not-a-supercomputer argument saves us again.
## Conclusion
In my analysis, I can't see how it's possible to index `self.backing` with a value greater than its length.
Although my analysis could be wrong, I also note that we haven't meaningfully modified this code in years and we've never seen this panic before.
Therefore, I am presently concluding that there is an issue with the user's hardware causing some sort of memory or compute corruption.