## The Panic

Lighthouse is panicking with an out-of-bounds slice access at position `144115188075905177`. Let's call that position `p`.

## Investigation
In binary, `p` is rather interesting: it sits just barely above a power of two:
```python
>>> import math
>>> p = 144115188075905177
>>> math.log2(p)
57.00000000000049
>>> bin(p)
'0b1000000000000000000000000000000000000000001100000010011001'
```
If the slice contained `u8`, a slice of length `p` would be 144 petabytes. So, it seems that we're getting a *wildly* incorrect slice index.
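That bit pattern is worth dwelling on. A quick sanity check (my own arithmetic, not part of the original report) shows that `p` is exactly `2^57` plus the small value 49,305; in other words, it looks like a plausible small index with a single high bit flipped:

```rust
fn main() {
    let p: u64 = 144_115_188_075_905_177;

    // `p` decomposes into a power of two plus a small remainder.
    assert_eq!(p, (1u64 << 57) + 49_305);

    // Equivalently: take the plausible small index 49,305 and
    // flip bit 57 of its binary representation.
    assert_eq!(49_305u64 | (1u64 << 57), p);

    println!("p = 2^57 + 49305");
}
```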
The panic is the last line of [this function](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L184):
```rust
/// Mutably iterate through all values in some allocation.
fn iter_mut(&mut self, alloc_id: usize) -> Result<impl Iterator<Item = &mut T>, Error> {
    let range = self.range(alloc_id)?;
    Ok(self.backing[range].iter_mut())
}
```
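The panic itself comes from the `self.backing[range]` indexing. In isolation (a minimal sketch with hypothetical values, not Lighthouse code), slicing with a range whose end exceeds the vector's length panics, whereas the checked `get` accessor would return `None`:

```rust
fn main() {
    let backing: Vec<u64> = vec![0; 8];

    // A range derived from sane offsets slices fine.
    assert_eq!(backing[2..5].len(), 3);

    // A range derived from a corrupt offset is out of bounds:
    // the checked accessor returns `None`...
    let corrupt = 2..1_000_000;
    assert!(backing.get(corrupt.clone()).is_none());

    // ...while direct indexing panics, which is what we observe.
    let panicked = std::panic::catch_unwind(|| backing[corrupt].len()).is_err();
    assert!(panicked);

    println!("out-of-bounds range panics on direct indexing");
}
```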
It must be [`Self::range`](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L161C1-L174C6) that's giving us `p`:
```rust
/// Returns the range in `self.backing` that is occupied by some allocation.
fn range(&self, alloc_id: usize) -> Result<Range<usize>, Error> {
    let start = *self
        .offsets
        .get(alloc_id)
        .ok_or(Error::UnknownAllocId(alloc_id))?;
    let end = self
        .offsets
        .get(alloc_id + 1)
        .copied()
        .unwrap_or(self.backing.len());
    Ok(start..end)
}
```
Therefore, `p` must be either:
1. A value in `self.offsets`.
2. The length of `self.backing`.
The type of `self.backing` is `Vec<Hash256>`, so the user would need about 4,611 *petabytes* of RAM to allocate a slice of length `p`. The Blue Waters supercomputer at the University of Illinois boasts 1.5 petabytes of RAM, so let's assume the user *is not* running a supercomputer. That rules out (2): a slice of that length could never have been allocated in the first place.
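The arithmetic behind that figure, using the fact that a `Hash256` occupies 32 bytes:

```rust
fn main() {
    let p: u128 = 144_115_188_075_905_177;

    // Each `Hash256` is 32 bytes, so a `Vec<Hash256>` of length `p`
    // would occupy:
    let bytes = p * 32;
    let petabytes = bytes / 1_000_000_000_000_000; // decimal petabytes

    assert_eq!(petabytes, 4_611);
    println!("{petabytes} petabytes");
}
```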
Therefore, `p` must be a value in `self.offsets`. There are three places where `self.offsets` is set:
1. It is [instantiated](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L43) in `Self::alloc` based on the length of `self.backing`.
2. It is [mutated](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L58) in `Self::grow`.
3. It is [mutated](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L67-L83) in `Self::shrink`.
We can rule out (1) for the same reason we ruled out (2) in the previous list; the user has (unfortunately) not hijacked the world's largest supercomputer and installed Windows and Lighthouse on it.
Subsequently, we must look at the [only use](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L113) of `Self::grow`:
```rust
Ordering::Less => self.grow(alloc_id, self.backing.len() - prev_len)?,
```
Ah ha! Unchecked subtraction! It's an underflow!
Oh wait, if we [zoom out](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L111-L115) we can see there's a `std::cmp::Ord::cmp` preventing underflow:
```rust
match prev_len.cmp(&self.backing.len()) {
    Ordering::Greater => self.shrink(alloc_id, prev_len - self.backing.len())?,
    Ordering::Less => self.grow(alloc_id, self.backing.len() - prev_len)?,
    Ordering::Equal => {}
}
```
To get an underflow we'd need `self.backing.len()` to be *less than* `prev_len`. But the `match` statement prevents us from subtracting when that's the case ([example playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ca456252e7a6e17c622cfd11293d2df1) showing `Ord::cmp` behaviour).
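To convince ourselves, the guard can be reproduced in isolation (hypothetical lengths, not Lighthouse code): whichever arm fires, the larger value always ends up on the left of the subtraction, so neither path can underflow:

```rust
use std::cmp::Ordering;

fn main() {
    for (prev_len, backing_len) in [(10_usize, 3_usize), (3, 10), (7, 7)] {
        let delta = match prev_len.cmp(&backing_len) {
            // Shrink path: `prev_len > backing_len`, so no underflow here.
            Ordering::Greater => prev_len - backing_len,
            // Grow path: `backing_len > prev_len`, so no underflow here.
            Ordering::Less => backing_len - prev_len,
            Ordering::Equal => 0,
        };
        // The difference is always bounded by the larger length.
        assert!(delta <= prev_len.max(backing_len));
    }
    println!("no underflow in either arm");
}
```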
Therefore, `Self::grow` can only be called with a value no greater than `self.backing.len()`. The not-a-supercomputer argument rules out `Self::grow` as the source of `p`.
Now, let us look at the [only use](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L112) of `Self::shrink`:
```rust
Ordering::Greater => self.shrink(alloc_id, prev_len - self.backing.len())?,
```
Spoiler, it's from the previous `match` statement. I'll repeat it here for readability:
```rust
match prev_len.cmp(&self.backing.len()) {
    Ordering::Greater => self.shrink(alloc_id, prev_len - self.backing.len())?,
    Ordering::Less => self.grow(alloc_id, self.backing.len() - prev_len)?,
    Ordering::Equal => {}
}
```
The `Ord::cmp` means that there is no underflow in `prev_len - self.backing.len()`. If we scroll up a few lines we can see `prev_len` being [instantiated](https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/consensus/cached_tree_hash/src/cache_arena.rs#L107C1-L107C1):
```rust
let prev_len = self.backing.len();
```
As long as we believe the `match` statement prevents underflow, the value provided to `Self::shrink` must be *less than* `prev_len`, which was itself the length of `self.backing` at an earlier point in time. The not-a-supercomputer argument saves us again.
## Conclusion
In my analysis, I can't see how it's possible to index `self.backing` with a value greater than its length.
Although my analysis could be wrong, I also note that we haven't meaningfully modified this code in years and we've never seen this panic before.
Therefore, I am presently concluding that there is an issue with the user's hardware causing some sort of memory or compute corruption.