owned this note
owned this note
Published
Linked with GitHub
# FVM (M1) Audit Dashboard
:::info
This is a working doc for the audit that Alex Wade is performing on the ref-fvm repo, starting April 11, 2022. The primary purpose of the audit is to assess the security of the new FVM in preparation for the network v16 upgrade (aka the M1 upgrade) planned for late June.
:::
Total time alloted: 7 weeks.
**Projects:**
* Audit of the new Filecoin Virtual Machine, its usage of WASM, and its integration into Lotus via the FFI
* DESCOPED: Security assessment of the Rust-based Builtin actors (previously examined for interoperability)
**Other considerations**
* Prep work for programmable actors on Filecoin
* Consideration of builtin actor changes that will be needed for the introduction of programmable actors
## Contents
[TOC]
## Schedule
:::info
This schedule should NOT be considered rigid. Alex is encouraged to take longer than scoped on any of the areas of focus if he deems it necessary, and to move on more quickly than planned if the work feels unnecessary or unproductive. The schedule is constructed in order of priority, the latter areas of focus are easiest to descope.
Additionally, new areas of concern might emerge that warrant more immediate attention.
:::
| Date | Time allocated | Primary Focus | Notes |
|-|-|-|-|
| April 11 - April 22 | 2 weeks | - Understand the FVM architecture <br> - Audit the message execution pipeline in the FVM <br> - Audit the syscalls, especially `Send` & the ABI <br>- Audit the integration of the FVM into Lotus, through the FFI <br> - Audit the SDK and FvmRuntime|
| April 25 - April 29 | 1 weeks |- FVM usage of WASM, especially fuel and perf | |
| May 2 - May 13 | 2 weeks | - Audit the [FVM IPLD stack] |
| May 16 - May 27 | 2 weeks | - Builtin actors security, especially the hooks into the FVM |
## Findings
### F1: `InvocationData<Kernel>::charge_gas_for_exec_units` does not calculate exec unit delta
:::success
Addressed by [`ref-fvm/#501`](https://github.com/filecoin-project/ref-fvm/pull/501).
:::
`InvocationData<Kernel>::charge_gas_for_exec_units` should charge gas corresponding to the exec units consumed based on the last snapshot, then update the snapshot with the new total exec units consumed. [Code here.](https://github.com/filecoin-project/ref-fvm/blob/bed676b7559f0c2a4cc0606dea996890a454807f/fvm/src/syscalls/mod.rs#L75-L78)
However, the gas charge applied is the [full cost for the passed-in `exec_units_consumed`](https://github.com/filecoin-project/ref-fvm/blob/bed676b7559f0c2a4cc0606dea996890a454807f/fvm/src/syscalls/mod.rs#L83).
As a result, gas charges for exec units will monotonically increase as an actor makes syscalls, rather than accrue according to actual work done.
### F2: `InvocationData<Kernel>` uses `gas_available` instead of `gas_used`
:::success
Addressed by [`ref-fvm/#501`](https://github.com/filecoin-project/ref-fvm/pull/501).
:::
`InvocationData<Kernel>` uses `GasTracker::gas_available()` to calculate gas deltas in between gas and exec unit charges. However, `gas_available` is the total gas allocated to the high-level call, and does not change. Instead, gas charges [update `GasTracker::gas_used`](https://github.com/filecoin-project/ref-fvm/blob/70e76973c41d2678049828eb200a43c2eee07457/fvm/src/gas/mod.rs#L26-L48).
The following locations use `gas_available()` incorrectly:
* [`InvocationData::new`](https://github.com/filecoin-project/ref-fvm/blob/70e76973c41d2678049828eb200a43c2eee07457/fvm/src/syscalls/mod.rs#L46-L51)
* [`calculate_exec_units_for_gas`](https://github.com/filecoin-project/ref-fvm/blob/70e76973c41d2678049828eb200a43c2eee07457/fvm/src/syscalls/mod.rs#L62-L66)
* [`charge_gas_for_exec_units`](https://github.com/filecoin-project/ref-fvm/blob/70e76973c41d2678049828eb200a43c2eee07457/fvm/src/syscalls/mod.rs#L86)
* [`charge_exec_units_for_gas`](https://github.com/filecoin-project/ref-fvm/blob/70e76973c41d2678049828eb200a43c2eee07457/fvm/src/syscalls/bind.rs#L109)
* [`DefaultCallManager::send_resolved`](https://github.com/filecoin-project/ref-fvm/blob/70e76973c41d2678049828eb200a43c2eee07457/fvm/src/call_manager/default.rs#L334)
### F3: Panics in proof libraries can cause fatal error during tipset processing
:::success
Addressed in [`ref-fvm/#574`](https://github.com/filecoin-project/ref-fvm/pull/574).
:::
Several syscalls pass user-provided proofs to filecoin proof libraries without checking proof lengths. Typically, these proof libraries also do not check proof lengths before performing reads via array access. As a result, users can cause certain kernel methods to panic by providing proofs of insufficient size.
The following methods are vulnerable to these panics:
* [`verify_seal`](https://github.com/filecoin-project/ref-fvm/blob/6a42089d717bc5545acf2b6483016cbe68184536/fvm/src/kernel/default.rs#L546)
* _Note:_ this method is unique in that [the passed-in proof may not be empty](https://github.com/filecoin-project/rust-fil-proofs/blob/07696ba24e11e1642602b0db73dbfe013288cd25/filecoin-proofs/src/api/seal.rs#L931). However, a proof of length 1 will cause a panic.
* [`verify_post`](https://github.com/filecoin-project/ref-fvm/blob/6a42089d717bc5545acf2b6483016cbe68184536/fvm/src/kernel/default.rs#L552)
* [`verify_replica_update`](https://github.com/filecoin-project/ref-fvm/blob/6a42089d717bc5545acf2b6483016cbe68184536/fvm/src/kernel/default.rs#L729)
**Note**: I'm not 100% sure how wasmtime handles a panic in the wrapped-host-function portion of a syscall. After reading through their code a bit, it seems likely that:
1. Panics are caught ([reference](https://github.com/bytecodealliance/wasmtime/blob/7f69a7d9e0f598b53616a799718ef65854aaabbf/crates/wasmtime/src/func.rs#L1885-L1893))
2. Panics are "resumed" at the site where wasm invocation occured ([reference](https://github.com/bytecodealliance/wasmtime/blob/7f69a7d9e0f598b53616a799718ef65854aaabbf/crates/wasmtime/src/func.rs#L1930))
I wasn't able to confirm this in any wasmtime documentation, though; it may be that a Trap is ultimately raised.
**As a result**: Regardless of whether host function panics result in a Trap or a panic, the end result is that an error is returned from [`fil_fvm_machine_execute_message`](https://github.com/filecoin-project/filecoin-ffi/blob/8b322529c858dc5dd22a906f711efa7206cd8c56/rust/src/fvm/machine.rs#L168-L175), which will lead to an error in [lotus' `fvm::ApplyMessage`](https://github.com/filecoin-project/lotus/blob/1162b02e609560a6ffcd4d07211100ea18554602/chain/vm/fvm.go#L302-L305), halting tipset processing.
**Mitigation**: The two changes below should directly address the issue.
1. **Check proof lengths before performing reads.** This change likely belongs in the proof libraries, though it may make sense to include checks at actors' level as well. Currently, [there are similar checks in actors code](https://github.com/filecoin-project/builtin-actors/blob/41640c7468a3a09a04bbe1109a8a6583e477810b/actors/miner/src/lib.rs#L463-L471), though these cover "max size" restrictions rather than "min size."
2. **Catch panics returned from proof verification.** This change likely belongs in the proof libraries, though it may make sense to include panic catching in the kernel. Currently, [`batch_verify_seals` catches panics when verifying seals](https://github.com/filecoin-project/ref-fvm/blob/7bf6731c85a2c33e207214e90f407d891edff2cb/fvm/src/kernel/default.rs#L621).
**Further work**: Additional hardening around fatal errors and panics is detailed in recommendations [#R2](#R2-Produce-a-consensus-outcome-for-fatal-errors) and [#R3](#R3-Catch-potential-panics-in-FVM-kernel).
### F4: `ipld::read` does not use `offset` properly
:::success
Addressed in [`ref-fvm/#555`](https://github.com/filecoin-project/ref-fvm/pull/555).
:::
Given a block id, [`ipld::read`](https://github.com/filecoin-project/ref-fvm/blob/17072f1db3602021a8e66e7081cd51f4734d78a9/sdk/src/sys/ipld.rs#L75) reads a block from the kernel's `BlockRegistry` into a provided buffer. `ipld::read` also accepts an `offset` into the block where the read should begin.
`offset` is passed into [`Kernel::block_read`](https://github.com/filecoin-project/ref-fvm/blob/297a7694ff9a584880632dde8826d9134d976fd9/fvm/src/kernel/default.rs#L365), which first calculates the length of the read to perform before copying block data to the output buffer:
```rust=
let len = if offset as usize >= data.len() {
0
} else {
buf.len().min(data.len())
};
// ...
if len != 0 {
buf.copy_from_slice(&data[offset as usize..][..len]);
}
```
Neither operation correctly handles nonzero offsets:
* The read length should be `buf.len().min(data.len() - offset)`
* The copy operation should be `buf[..len].copy_from_slice(...)`
*(Please double-check these are correct!)*
**As a result**: reads with nonzero offsets may panic for some output buffers. (From Steven: the offset feature is unused.)
### F5: `ipld` syscalls return incorrect `ErrorNumbers`
:::warning
Partially addressed in [`ref-fvm/#533`](https://github.com/filecoin-project/ref-fvm/pull/533).
All implementation changes have been made, but syscall documentation needs to be updated to reflect the actual error numbers returned by each method.
**See subsections below** for specific changes needed.
*Original issue text has been hidden to avoid taking up too much room; see "Show hidden" below:*
:::
:::spoiler *(Show hidden)*
`ipld` syscalls should return various `ErrorNumbers` as documented [here](https://github.com/filecoin-project/ref-fvm/blob/297a7694ff9a584880632dde8826d9134d976fd9/sdk/src/sys/ipld.rs#L13). Currently, the only `ErrorNumber` returned by these syscalls is `ErrorNumber::IllegalArgument`.
:::
#### `ipld::block_open`
:::warning
Update syscall documentation to reflect the new error numbers:
* `IllegalArgument`
* `LimitExceeded` - *missing from docs*
* `IllegalCodec` - *missing from docs*
**Followup**: remove `NotFound` and add missing errors.
:::
:::spoiler *(Show hidden)*
```rust=
/// | Error | Reason |
/// |---------------------|---------------------------------------------|
/// | [`NotFound`] | the target block isn't in the reachable set |
/// | [`IllegalArgument`] | there's something wrong with the CID |
```
* `ipld::open` should return `NotFound` if the target block isn't in the reachable set. [Instead, this is a fatal error](https://github.com/filecoin-project/ref-fvm/blob/297a7694ff9a584880632dde8826d9134d976fd9/fvm/src/kernel/default.rs#L299-L308).
* Every other non-fatal error uses `IllegalArgument`.
* [`self.blocks.put`](https://github.com/filecoin-project/ref-fvm/blob/297a7694ff9a584880632dde8826d9134d976fd9/fvm/src/kernel/default.rs#L320-L321) only returns an error if "too many blocks" have been placed in the `BlockRegistry`. While this shouldn't be possible, `LimitExceeded` may be a more suitable error for this case.
:::
#### `ipld::block_create`
:::warning
Update syscall documentation to reflect the new error numbers:
* `IllegalArgument`
* `LimitExceeded`
* `IllegalCodec`
**Followup**: remove `Serialization` and `NotFound`.
:::
:::spoiler *(Show hidden)*
```rust=
/// | Error | Reason |
/// |---------------------|---------------------------------------------------------|
/// | [`LimitExceeded`] | the block is too big |
/// | [`NotFound`] | one of the blocks's children isn't in the reachable set |
/// | [`IllegalCodec`] | the passed codec isn't supported |
/// | [`Serialization`] | the passed block doesn't match the passed codec |
/// | [`IllegalArgument`] | the block isn't in memory, etc. |
```
* `ipld::create` lists several errors; only `IllegalArgument` is used.
* `IllegalArgument` is used for an error returned by `self.blocks.put`. `LimitExceeded` may be more suitable in this case. (See `ipld::open` above).
* The comment for `LimitExceeded` is "the block is too big". There is **no** check for block size.
* The comment for `NotFound` implies some check should be made against the blockstore. **No** such check is made.
* The comments for `IllegalCodec` and `Serialization` imply that checks should be made that the codec is allowed, and that the data in the block uses said codec. **No** such checks are made.
:::
#### `ipld::block_read`
:::success
No followup needed.
:::
:::spoiler *(Show hidden)*
```rust=
/// | Error | Reason |
/// |---------------------|---------------------------------------------------|
/// | [`InvalidHandle`] | if the handle isn't known. |
/// | [`IllegalArgument`] | if the passed buffer isn't valid, in memory, etc. |
```
* `ipld::read` should use `InvalidHandle` if the handle in question isn't in the `BlockRegistry`. However, any output from `self.blocks.get(id)` is [overwritten by `IllegalArgument`](https://github.com/filecoin-project/ref-fvm/blob/297a7694ff9a584880632dde8826d9134d976fd9/fvm/src/kernel/default.rs#L366).
:::
#### `ipld::block_stat`
:::success
No followup needed.
:::
:::spoiler *(Show hidden)*
```rust=
/// | Error | Reason |
/// |-------------------|----------------------------|
/// | [`InvalidHandle`] | if the handle isn't known. |
```
* `ipld::stat` should only ever return `InvalidHandle`. However, [only `IllegalArgument` is ever returned](https://github.com/filecoin-project/ref-fvm/blob/297a7694ff9a584880632dde8826d9134d976fd9/fvm/src/kernel/default.rs#L388).
:::
#### `ipld::block_link`
:::warning
Update syscall documentation to reflect the new error numbers:
* `IllegalArgument`
* `IllegalCid`
* `InvalidHandle`
* `BufferTooSmall` - *missing from docs*
**Followup**: Add missing error.
:::
:::spoiler *(Show hidden)*
```rust=
/// | Error | Reason |
/// |---------------------|---------------------------------------------------|
/// | [`InvalidHandle`] | if the handle isn't known. |
/// | [`IllegalCid`] | hash code and/or hash length aren't supported. |
/// | [`IllegalArgument`] | if the passed buffer isn't valid, in memory, etc. |
```
* `ipld::cid` should return `InvalidHandle` if the block handle isn't known. However, [this is overwritten by `IllegalArgument`](https://github.com/filecoin-project/ref-fvm/blob/297a7694ff9a584880632dde8826d9134d976fd9/fvm/src/kernel/default.rs#L338).
* `ipld::cid` should return `IllegalCid` if the provided hash function and length aren't supported. **No** check against an allow-list is made, and any errors resulting from the use of these inputs is converted to `IllegalArgument`.
* ([ref: `hash_fun`](https://github.com/filecoin-project/ref-fvm/blob/297a7694ff9a584880632dde8826d9134d976fd9/fvm/src/kernel/default.rs#L339-L341))
* ([ref: `hash_len`](https://github.com/filecoin-project/ref-fvm/blob/297a7694ff9a584880632dde8826d9134d976fd9/fvm/src/kernel/default.rs#L350-L354))
:::
### F6: Conflicting Cid max length constants used
:::warning
Partially addressed in [`ref-fvm/#579`](https://github.com/filecoin-project/ref-fvm/pull/579): all constants now agree on a value of `100`.
Further improvement in [`ref-fvm/#590`](https://github.com/filecoin-project/ref-fvm/pull/590), which unifies `MAX_CID_LEN` in the FVM.
**Followup**: Filecoin FFI still has its own const, `EST_MAX_CID_LEN` -- this should be pulled from the FVM instead.
UPDATE: Done in https://github.com/filecoin-project/filecoin-ffi/pull/283
:::
`filecoin-ffi` and `ref-fvm` use three different constants describing the max length of a Cid:
* [`filecoin-ffi/blockstore`](https://github.com/filecoin-project/filecoin-ffi/blob/d3b7d26e55942363335baf427b1800006e79c332/rust/src/fvm/blockstore/cgo.rs#L15-L17) - `EST_MAX_CID_LEN = 100`
* [`ref-fvm/sdk`](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/sdk/src/lib.rs#L18-L19) - `MAX_CID_LEN = 256`
* [`ref-fvm/syscalls`](https://github.com/filecoin-project/ref-fvm/blob/425375673af5049fbf13d81ca0c76a403960788d/fvm/src/syscalls/mod.rs#L95-L96) - `MAX_CID_LEN = 256`
**Recommendation**:
* Determine which of the constants is valid for M1. From the comment [here](https://github.com/filecoin-project/ref-fvm/pull/555/commits/ec3086d38601426e9cb219ad4301ac3b6cb6248d#r872509685), the value should be 100 for now.
* Avoid redefining constants - remove two of these definitions, and import from the third.
### F7: Resolve similar constants in `ref-fvm::sdk`
As of June 13 2022, still TODO, but not concerning.
Four constants now exist to describe the nil `BlockId` in the kernel's `BlockRegistry`. All are defined as `0u32`:
* `ipld::get_block` uses [`sys::ipld::UNIT`](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/sdk/src/ipld.rs#L46-L49)
* Several other locations use either:
* [`sdk::lib::NO_DATA_BLOCK_ID`](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/sdk/src/lib.rs#L24-L25)
* [`sdk::vm::NO_DATA_BLOCK_ID`](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/sdk/src/vm.rs#L7-L8)
* [`fvm::call_manager::NO_DATA_BLOCK_ID`](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/fvm/src/call_manager/mod.rs#L22-L23)
**Recommendation**:
* Use one constant in all locations.
* This constant may be better suited living in the `BlockRegistry` itself, as it's closely related to [`blocks::FIRST_ID`](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/fvm/src/kernel/blocks.rs#L20-L21).
* ... that said, this might pose a risk of introducing a fifth constant!
### F8: Enabling debug may cause clients to fork
:::info
Known issue.
:::
`debug::log` checks if the kernel has "debug mode" enabled, returning early if it does:
```rust=
pub fn log(context: Context<'_, impl Kernel>, msg_off: u32, msg_len: u32) -> Result<()> {
// No-op if disabled.
if context.kernel.debug_enabled() {
return Ok(());
}
let msg = context.memory.try_slice(msg_off, msg_len)?;
let msg = String::from_utf8(msg.to_owned()).or_illegal_argument()?;
context.kernel.log(msg);
Ok(())
}
```
Especially in M1, this method is expected never to produce an error, but they are possible.
If a client has debug-mode enabled and a transaction calls `debug::log` with a payload that causes the method to return an error, the debugging client will fork away from the network.
**Recommendation**: Move the `debug_enabled()` check into `kernel::log`, which can return early with no side effects.
### F9: `ipld::get_block` may incorrectly set output length
:::success
Addressed in [`ref-fvm/#584`](https://github.com/filecoin-project/ref-fvm/pull/584/).
:::
The unsafe function `Vec::set_len(new_len: usize)` forces the length of a vector to `new_len`. [From the docs](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.set_len), two properties of `new_len` must be observed for this operation to be considered safe:
1. `new_len` must be less than or equal to `capacity()`.
2. The elements at `old_len..new_len` must be initialized.
`ipld::get_block`, accepts a `size_hint: Option<u32>`, which allows the caller to shortcut slightly if they know the length of the data to be read. A buffer for the first read is allocated using `size_hint`: the value itself if provided, or `1024` as a default if not:
```rust=
let mut buf = Vec::with_capacity(size_hint.unwrap_or(1024) as usize);
// (unsafe)
let mut remaining = sys::ipld::block_read(id, 0, buf.as_mut_ptr(), buf.capacity() as u32)?;
if remaining > 0 {
// ... second read here
}
buf.set_len(buf.capacity() + (remaining as usize));
```
Assuming the initial allocated capacity is enough to hold the contents of the block, `remaining` will be zero, and the final length of the buffer will be equal to the initial capacity.
However, if the initial allocated capacity is *more than enough*, the returned `remaining` will *still* be zero. Then, the final `set_len` operation will set the length of the output buffer to its capacity - beyond the range of initialized elements.
Using this buffer may result in undefined behavior.
**Recommendation**:
* Have `block_read` return *both* `amount_read` and `remaining`.
* Require:
* `remaining == 0`
* `amount_read <= buf.capacity()`
* Set buffer length to `amount_read`
**See related:** [F10](#F10-Unsafe-usage-of-set_len-in-IPLD-syscalls)
### F10: Unsafe usage of `set_len` in IPLD syscalls
:::success
Addressed in [`ref-fvm/#584`](https://github.com/filecoin-project/ref-fvm/pull/584/).
:::
In `ipld::get_block`, `set_len` is used to manually set the length of the output buffer after a call to `ipld::block_read`. From [sdk/ipld.rs](https://github.com/filecoin-project/ref-fvm/blob/ec80f27d723d187d762cc217698697e921fc3b37/sdk/src/ipld.rs#L53-L65):
```rust=
let mut remaining = sys::ipld::block_read(id, 0, buf.as_mut_ptr(), buf.capacity() as u32)?;
if remaining > 0 {
buf.set_len(buf.capacity());
buf.reserve_exact(remaining as usize);
remaining = sys::ipld::block_read(
id,
buf.len() as u32,
buf.as_mut_ptr_range().end,
(buf.capacity() - buf.len()) as u32,
)?;
debug_assert!(remaining <= 0, "should have read whole block");
}
buf.set_len(buf.capacity() + (remaining as usize));
```
The first use of `set_len` is safe, as it sets the length to a maximum of the buffer's capacity.
The [second use of `set_len`](https://github.com/filecoin-project/ref-fvm/blob/ec80f27d723d187d762cc217698697e921fc3b37/sdk/src/ipld.rs#L65) is valid for one case - where `remaining == 0` after either read. Otherwise,
* If `remaining > 0` after the second read, `set_len` is clearly dangerous, setting the buffer's length beyond its capacity. Additionally, no error is thrown, even though we expected to read the entire block.
* If `remaining < 0` after either read, `set_len` will overflow as the negative value is cast to `usize`.
Neither of these two cases should be possible, but the implementation could be much more defensive.
**Recommendation**:
* Before the final call to `set_len`, check that `remaining == 0`, and return an error if this is not the case. This will ensure we get an error if the entire block is not read.
* Remove the addition of the `remaining` parameter in the final call to `set_len`.
```rust=
if remaining != 0 { /* return error here */ }
buf.set_len(buf.capacity());
```
* It's probably also worth it to perform a similar check for `remaining` in `ipld::get`, as this [currently also uses `debug_assert_eq`](https://github.com/filecoin-project/ref-fvm/blob/ec80f27d723d187d762cc217698697e921fc3b37/sdk/src/ipld.rs#L37), which is a no-op in an optimized build.
### F11: `Amt::batch_delete` returns incorrect value
`Amt::batch_delete` deletes Amt indices from a passed-in iterator. It should return whether the Amt was modified: if none of the input indices are present in the Amt, no modification occured and the root Cid will be unchanged.
`batch_delete` calls `Amt::delete` in a loop. `Amt::delete` returns `Ok<None>` if the index was not present ([src](https://github.com/filecoin-project/ref-fvm/blob/4100dba80fec4495e101bedc9d5df418cad40977/ipld/amt/src/amt.rs#L185-L193)):
```rust=
// Delete node from AMT
let deleted =
self.root
.node
.delete(&self.block_store, self.height(), self.bit_width(), i)?;
if deleted.is_none() {
return Ok(None);
}
```
In `batch_delete`, if the result from `is_delete` is `None`, we set `modified` to `true` ([src](https://github.com/filecoin-project/ref-fvm/blob/4100dba80fec4495e101bedc9d5df418cad40977/ipld/amt/src/amt.rs#L247-L258)):
```rust=
// TODO: optimize this
let mut modified = false;
// Iterate sorted indices. Sorted to safely optimize later.
for i in sorted(iter) {
let found = self.delete(i)?.is_none();
if strict && found {
return Err(anyhow!("no such index {} in Amt for batch delete", i).into());
}
modified |= found;
}
Ok(modified)
```
**As a result**: the returned value from `batch_delete` is almost never correct. However, the deletion should occur as intended; `strict` should return an error if the index was not present.
* I checked for usage of `batch_delete`, and it doesn't look like any uses in `builtin-actors` actually check the return value.
## Recommendations
### R1: Perform `MessageValidForBlockInclusion` checks in FVM
Currently, message [preflight checks](https://github.com/filecoin-project/ref-fvm/blob/bed676b7559f0c2a4cc0606dea996890a454807f/fvm/src/executor/default.rs#L193-L199) perform very [limited syntactic validation](https://github.com/filecoin-project/ref-fvm/blob/bed676b7559f0c2a4cc0606dea996890a454807f/shared/src/message.rs#L40).
**Recommendation**: perform [`MessageValidForBlockInclusion`](https://github.com/filecoin-project/lotus/blob/1922e763498ab246bdf6704a0c9bcfe325ef5439/chain/types/message.go#L149) checks in the FVM.
### R2: Produce a consensus outcome for fatal errors
:::success
Partly addressed in [`ref-fvm/#548`](https://github.com/filecoin-project/ref-fvm/pull/548), but changes did not handle the case where a syscall paniced FVM-side.
Fully addressed in [`ref-fvm/#557`](https://github.com/filecoin-project/ref-fvm/pull/557) by wrapping wasm invocation with a panic handler.
:::
This recommendation is already thoroughly outlined in Raul's issue [here](https://github.com/filecoin-project/ref-fvm/issues/508). Note that the changes described should be extended to handle not just Traps, but panics resulting from wasm invocation as well.
### R3: Catch potential panics in FVM kernel
:::success
Addressed in [`ref-fvm/#574`](https://github.com/filecoin-project/ref-fvm/pull/574).
:::
A few locations in the kernel call methods or perform operations that may panic:
* `verify_post` [assumes that passed-in randomness is exactly 32 bytes in length](https://github.com/filecoin-project/ref-fvm/blob/7bf6731c85a2c33e207214e90f407d891edff2cb/fvm/src/kernel/default.rs#L565-L566).
* Anywhere `bytes_32` is called assumes that the passed-in buffer is exactly 32 bytes in length:
* [`verify_post`](https://github.com/filecoin-project/ref-fvm/blob/7bf6731c85a2c33e207214e90f407d891edff2cb/fvm/src/kernel/default.rs#L582)
* [`verify_aggregate_seals`](https://github.com/filecoin-project/ref-fvm/blob/7bf6731c85a2c33e207214e90f407d891edff2cb/fvm/src/kernel/default.rs#L688-L689)
* [`verify_seal`](https://github.com/filecoin-project/ref-fvm/blob/7bf6731c85a2c33e207214e90f407d891edff2cb/fvm/src/kernel/default.rs#L997-L998)
Though the output of randomness syscalls should only ever be 32 bytes, the burden of checking this is left to actors code and the rand syscalls themselves. The kernel assumes the input it receives is valid.
**Recommendation**: Explicitly validate lengths prior to calling these methods, or wrap calls with `panic::catch_unwind`.
### R4: Ensure syscall cbor-unmarshal errors are highly visible to implementers
:::warning
Partially addressed in [`ref-fvm/#574`](https://github.com/filecoin-project/ref-fvm/pull/574).
However, the logging only covers cases where deserialization panics. If cbor returns an error, it is converted to `IllegalArgument` and passed through.
:::
Syscall execution often follows this pattern:
* In the SDK, a struct is serialized and passed to the FVM as a pointer / length pair. ([example - `verify_replica_update`](https://github.com/filecoin-project/ref-fvm/blob/7bf6731c85a2c33e207214e90f407d891edff2cb/sdk/src/crypto.rs#L147-L152))
* In the FVM, the same pointer / length pair are used as offsets into the actor's memory from which the same struct is deserialized. ([example - `verify_replica_update`](https://github.com/filecoin-project/ref-fvm/blob/7bf6731c85a2c33e207214e90f407d891edff2cb/fvm/src/syscalls/crypto.rs#L203-L205))
If deserialization fails, the actor is returned [`Err(ErrorNumber::IllegalArgument)`](https://github.com/filecoin-project/ref-fvm/blob/7bf6731c85a2c33e207214e90f407d891edff2cb/fvm/src/syscalls/context.rs#L77-L80), the same error used to indicate that the user has provided invalid parameters to the syscall.
In M1, actors will only consist of trusted code. In this context, errors during the latter deserialization portion of this pattern are highly irregular and likely indicate a serious problem. The use of `IllegalArgument` may mask these issues, as it is also used to signify any number of more mundane problems.
**Recommendation**: Ensure logging methods capture and highlight cbor deserialization issues.
### R5: Expand actors' `UnvalidatedBitField` pattern to Cids
:::info
*After discussion, this isn't worth the effort.*
*The original text has been hidden so this doesn't take up too much room. See "Show hidden" below:*
:::
In `builtin-actors`, external method parameters use the type `UnvalidatedBitField` to represent raw, CBOR-decoded RLE. Before calling any methods on these fields, each `UnvalidatedBitField` must first be converted to a `BitField` via `UnvalidatedBitField::validate()`, which checks that the raw RLE+ encoding is sound.
This conversion process allows the latter type, `BitField`, to make more guarantees about its behavior (compared to go-actors, where each method may potentially validate RLE and so must return an error).
This pattern may be useful for Cids in both `builtin-actors` and `ref-fvm`.
:::spoiler **(Show hidden)**
#### Expanding to Cids
Whether passed in as params to actors methods or generated as part of FVM/actors execution, Cids permeate every part of the Filecoin stack. Crucially, `Cid` may represent a broad spectrum of objects - some valid, most not. Each part of the stack must either validate passed-in Cids for its specific context, or trust that this validation was performed elsewhere. Reading through isolated portions of code, it's nearly impossible to determine what variants of `Cid` are allowed at a given point.
Especially with the `ref-fvm's` frequent use of FFI with low-level reads from memory, the potential variety `Cid` represents requires messy, error-prone code - particularly at syscall boundaries:
* From [`fvm_sdk::ipld::get`](https://github.com/filecoin-project/ref-fvm/blob/049ff89f0e7e3d5470b1f5c3c7435b2d304bcbe6/sdk/src/ipld.rs#L34-L41):
```rust=
pub fn get(cid: &Cid) -> SyscallResult<Vec<u8>> {
unsafe {
// TODO: Check length of cid?
let mut cid_buf = [0u8; MAX_CID_LEN];
cid.write_bytes(&mut cid_buf[..])
.expect("CID encoding should not fail");
let fvm_shared::sys::out::ipld::IpldOpen { id, size, .. } =
sys::ipld::open(cid_buf.as_mut_ptr())?;
```
* From [`syscalls::context::Memory::read_cid`](https://github.com/filecoin-project/ref-fvm/blob/049ff89f0e7e3d5470b1f5c3c7435b2d304bcbe6/fvm/src/syscalls/context.rs#L53-L62):
```rust=
pub fn read_cid(&self, offset: u32) -> Result<Cid> {
// NOTE: Be very careful when changing this code.
//
// We intentionally read the CID till the end of memory. We intentionally do not "slice"
// with a fixed end.
// - We _can't_ slice MAX_CID_LEN because there may not be MAX_CID_LEN addressable memory
// after the offset.
// - We can safely read from an "arbitrary" sized slice because `Cid::read_bytes` will never
// read more than 4 u64 varints and 64 bytes of digest.
Cid::read_bytes(
```
<!-- #### Recommendation
* Take important checks like [`is_sealed_sector`](https://github.com/filecoin-project/builtin-actors/blob/78deeb98bce0b8054778d697c42258775538fd2a/actors/miner/src/policy.rs#L43-L49) and encode them at the type level. -->
:::
### R6: Amt / Hamt Hardening
Both the Amt and Hamt frequently make assumptions that various invariants are maintained. If these invariants aren't maintained, the code panics. (As example, the `unreachable!` macros and direct array access used frequently in both libraries)
There probably isn't a better way to handle this, as (in this case) throwing errors would add a lot of extra code for dubious benefit. Instead, it may be worth it to ensure that Amt/Hamt serde methods are as strict as possible to ensure that these data structures are valid both when they are loaded and stored.
The following is a starting point for this (and all numbers should be double-checked!):
1. Both: Add `MIN_BIT_WIDTH`/`MAX_BIT_WIDTH` constants:
* Amt: [2, 63] - there may be a better, lower max, but beyond 63, various calculations start to panic (e.g. `1 << 64`)
* Hamt: [2, 8] - these values need double-checking.
2. Amt: Add validity checks in `Amt::Root::deserialize` for fields:
* `bit_width ∈ [MIN_BIT_WIDTH, MAX_BIT_WIDTH]`
* `height < MAX_HEIGHT` (64)
* `count <= nodes_for_height`
3. Hamt: Remove unused parameter: `depth`. Passed around internally in `Hamt::Node` methods - only incremented, never read. (ex: [Node::get](https://github.com/filecoin-project/ref-fvm/blob/4100dba80fec4495e101bedc9d5df418cad40977/ipld/hamt/src/node.rs#L194))
* Alternatively, `depth` could be used as a check to safeguard against infinite recursion, by asserting that `depth < (32 / bit_width)`, as this is the max height of the Hamt.
## Changes required before M2 release
:::info
This section contains items that are "suitable for an M1 release, but should be changed before M2."
Please feel free to add to this list.
:::
#### Params
* Bump `MAX_CID_LEN` to 256
* Set upper bound on [`MAX_BLOCKS`](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/fvm/src/kernel/blocks.rs#L21) in kernel `BlockRegistry`
* ([From Steb's comment](https://github.com/filecoin-project/ref-fvm/pull/555/commits/cf15572f48c66f182aedef6afaabf12aa91b194f#r871977666): "We can set a limit in M2")
* Update `sdk::send` to handle different return data codecs.
* ([From code comment](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/sdk/src/send.rs#L38))
#### Wasmtime config
* Add "advanced support" for `wasm_bulk_memory` config
* ([From code comment](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/fvm/src/machine/engine.rs#L87-L90))
* Determine whether `wasm_multi_value` can be enabled
* ([From code comment](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/fvm/src/machine/engine.rs#L92-L95))
* Ensure `cranelift_opt_level(Speed)` is guaranteed to run in linear time
* ([From code comment](https://github.com/filecoin-project/ref-fvm/blob/c4b01e2964291b12cfd8b1bf2e41cd3f9daaa927/fvm/src/machine/engine.rs#L135-L137))
#### User code permissions
* Restrict user actor writes to Globals (e.g. `avail_gas_global`)
* Validate callers of priveleged syscalls (e.g. only Init actor may call `new_actor_address`)
* Custom actors probably should not be able to call `charge_gas` directly
* Ensure the FVM asserts that an Aborting actor's exitcode is nonzero (currently this does not happen)
#### Misc
* Update blockstore to write to a "write set," only flushing when updating the root.
* ([From code comment](https://github.com/filecoin-project/ref-fvm/blob/425375673af5049fbf13d81ca0c76a403960788d/fvm/src/kernel/default.rs#L306-L307))
* Review wasm instrumentation in context of user-supplied bytecode
* Can users cause this to crash / loop indefinitely?
* Can users manipulate the instrumentation process to get favorable exec costs?
* Does the underlying code panic, and can this be triggered by malicious input?
* Current pattern in `Engine::load_raw` [deserializes raw wasm before instrumentation](https://github.com/filecoin-project/ref-fvm/blob/b1b35d04b9c6adce33ab732ebe14a70b1e5aae79/fvm/src/machine/engine.rs#L228). Internally, this calls `Module::deserialize`, which warns against exposing this method to arbitrary user input ([source](https://docs.rs/wasmtime/latest/wasmtime/struct.Module.html#method.deserialize)).
## Notes
* `buffered::scan_for_links` uses a `Cursor` around read blocks. When a `MajByteString` or `MajTextString` tag is encountered, `buf.seek` advances the cursor position by `extra`, skipping the field. [ref](https://github.com/filecoin-project/ref-fvm/blob/049ff89f0e7e3d5470b1f5c3c7435b2d304bcbe6/fvm/src/blockstore/buffered.rs#L129-L131)
* `Cursor::seek` does NOT return an error if this advance places the position beyond the bounds of the underlying buffer. Subsequent reads will cause an error, but this will not be caught if it is the final read of the buffer.
* [ref - cursor.rs](https://doc.rust-lang.org/src/std/io/cursor.rs.html#295-299)
* Another good pattern for this method would be to check that the cursor is in the final position after all links have been scanned (before returning), as the inverse suggests this is invalid CBOR. Ex (from parity-wasm):
```rust=
pub fn deserialize_buffer<T: Deserialize>(contents: &[u8]) -> Result<T, T::Error> {
let mut reader = io::Cursor::new(contents);
let result = T::deserialize(&mut reader)?;
if reader.position() != contents.len() {
// It's a TrailingData, since if there is not enough data then
// UnexpectedEof must have been returned earlier in T::deserialize.
return Err(io::Error::TrailingData.into())
}
Ok(result)
}
```
* `BufferedBlockstore::flush` can be made to overflow the program stack if `bs.put_keyed(cid, data)` is called, where `data` is the serialized form of `cid`:
```rust=
#[test]
fn stack_overflow() {
let mem = MemoryBlockstore::default();
let buf_store = BufferedBlockstore::new(&mem);
let cid = buf_store.put_cbor(&0u8, Code::Blake2b256).unwrap();
let bytes = crate::to_vec(&cid).unwrap();
println!("{:?}", bytes);
println!("Placing malicious object in store...");
buf_store.put_keyed(&cid, &bytes).unwrap();
println!("Flushing store...");
buf_store.flush(&cid).unwrap();
println!("Flushed!"); // won't reach this - stack overflow
}
```
* Risk: `BufferedBlockstore::flush` allows malicious writes to block entire cache.
* Write methods (`put`, `put_keyed`, etc) don't perform checks on provided Cids / Blocks / links; these are performed in `flush`.