# RFB: Detecting Ethereum RPC censorship programmatically.
**UPDATE**: I was wrong, Infura doesn't censor RPC reads! This post caught the eye of Infura, and they wrote [a blog post](https://blog.infura.io/post/how-to-use-ethereum-proofs-2) which you should check out. The party isn't over yet, though: the option to censor is still there unless we use secure RPC, which is why I ended up building this. Check it out: https://github.com/liamzebedee/eth-verifiable-rpc
[Liam Zebedee](https://glissblog.vercel.app/) (@liamzebedee).
This is a spec, a request for either (1) grants or (2) builders. Please reach out on the Twitter thread or over DMs if you're interested in either.
## Introduction.
Recently, Ethereum node providers like Infura/Alchemy started censoring parts of the Ethereum database from being read via the [JSON-RPC APIs](https://eth.wiki/json-rpc/API).
This proposal is to programmatically detect this, by building a local EVM shim that **verifiably** loads state from a remote node during execution.
## Problem.
**Example**: the [ENS](https://ens.domains) entry for `tornadocash.eth`.
On a censoring provider like Infura, the contenthash key for `tornadocash.eth` returns 0, where in fact we know it to be nonzero.
You can verify this using `cast` from the [Foundry](https://github.com/foundry-rs/foundry) toolbelt:
```sh!
(base) ➜ lib git:(main) ✗ ETH_RPC_URL=https://mainnet.infura.io/v3/84842078b09946638c03157f83405213 cast call 0x226159d592E2b063810a10Ebf6dcbADA94Ed68b8 "contenthash(bytes32 node)" tornadocash.eth
0x00000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000000
```
As part of my work on [Dappnet](https://twitter.com/liamzebedee/status/1578127982173908992), I know that it's being censored. But this isn't being talked about.
**What is worse** is that we don't know how to detect it. I'm implementing an RPC provider marketplace, and I'm unable to tell which providers will, at a moment's notice, block users from accessing their money/dapps.
## How can we detect censorship?
This section outlines **(1) how Ethereum works** and then **(2) how we can detect censorship**.
### (1) How Ethereum works.
What is happening when we call `cast call 0x2261... "contenthash(bytes32 node)" tornadocash.eth`?
* Ethereum is a database with a microservices layer called smart contracts, which run on the EVM.
* To write to the database, we send transactions. To read from the database, we call these smart contracts and get data.
* The read/write messages are sent over an RPC protocol, called [JSON-RPC](https://ethereum.org/en/developers/docs/apis/json-rpc/) to an Ethereum node.
* The Ethereum node tracks two things - consensus (the hash of the latest block of transactions in the database) and execution (the world state and processing of txs).
* `cast call` translates to an `eth_call` RPC, which translates to running the EVM with the following message (as EVM is a message-passing model):
* ENS is the domain name system, mapping `(name => (key => value))`. To track this, we call a contract called the resolver. The resolver's address is `0x226159d592E2b063810a10Ebf6dcbADA94Ed68b8`, which we'll call `ENS_RESOLVER`.
* We are calling `contenthash(bytes32 node)` ([impl](https://github.com/ensdomains/resolvers/blob/master/contracts/profiles/ContentHashResolver.sol)), a function on the resolver contract.
* Our call data is encoded according to the EVM [calling convention](https://en.wikipedia.org/wiki/Calling_convention), wherein we concat the 4-byte function selector with its ABI-encoded arguments.
* `cast calldata "contenthash(bytes32)" $(cast namehash tornadocash.eth)`
* This creates our message for the EVM to execute - `Message(from=0x0, to=$ENS_RESOLVER, data=0xbc1c58d1 ++ namehash("tornadocash.eth"), value=0 ether)`, where `0xbc1c58d1` is the function selector and the `bytes32 node` argument is the ENS namehash of the name.
* When `eth_call` is run, the EVM executes the bytecode of the contract, and returns data from the storage.
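To make the calling convention concrete, here is a minimal sketch in pure Python of how the call data and the `eth_call` request could be assembled. It assumes the `bytes32 node` argument is the ENS namehash of the name (per EIP-137, and skipping name normalization), and it carries its own small Keccak-256 because Python's `hashlib.sha3_256` uses different padding from the Keccak variant Ethereum uses. The helper names (`keccak256`, `namehash`, `selector`, `build_eth_call`) are made up for illustration.

```python
import json

MASK = (1 << 64) - 1

def _rol(v, n):
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK

def _keccak_f(a):
    """Keccak-f[1600] permutation on a 5x5 grid of 64-bit lanes, a[x][y]."""
    r = 1
    for _ in range(24):
        # theta
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y, cur = 1, 0, a[1][0]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, a[x][y] = a[x][y], _rol(cur, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [a[x][y] for x in range(5)]
            for x in range(5):
                a[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constant generated by an LFSR)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                a[0][0] ^= 1 << ((1 << j) - 1)
    return a

def keccak256(data: bytes) -> bytes:
    rate = 136  # bytes absorbed per block for a 256-bit digest
    padded = bytearray(data) + b"\x01" + bytes(rate - 1 - len(data) % rate)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            word = int.from_bytes(padded[off + 8 * i: off + 8 * i + 8], "little")
            lanes[i % 5][i // 5] ^= word
        lanes = _keccak_f(lanes)
    return b"".join(lanes[i][0].to_bytes(8, "little") for i in range(4))

def namehash(name: str) -> bytes:
    """ENS namehash (EIP-137); assumes the name is already normalized."""
    node = bytes(32)
    labels = name.split(".") if name else []
    for label in reversed(labels):
        node = keccak256(node + keccak256(label.encode()))
    return node

def selector(signature: str) -> bytes:
    """First 4 bytes of the hash of the canonical function signature."""
    return keccak256(signature.encode())[:4]

def build_eth_call(to: str, signature: str, node: bytes, block: str = "latest") -> dict:
    data = selector(signature) + node  # selector ++ ABI-encoded bytes32
    return {"jsonrpc": "2.0", "id": 1, "method": "eth_call",
            "params": [{"to": to, "data": "0x" + data.hex()}, block]}

ENS_RESOLVER = "0x226159d592E2b063810a10Ebf6dcbADA94Ed68b8"
request = build_eth_call(ENS_RESOLVER, "contenthash(bytes32)", namehash("tornadocash.eth"))
print(json.dumps(request, indent=2))
```

The printed object is what would be POSTed to the node. Nothing in it is authenticated, which is the whole problem: the node can answer with whatever it likes.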
How does storage work?
* Traditional databases use SQL to represent data, and we write SQL in order to read/write it. In Ethereum, the language is EVM bytecode, and operates purely on a key-value basis (`sstore`, `sload` opcodes), no relational model (joins, etc). Smart contracts are like writing programs that natively use the database for storing their data structures.
* EVM has two notions of memory locations - `memory` aka RAM, and `storage` aka disk.
* Every contract has its own private namespace for `storage`. Other contracts cannot read it directly; they must use contract calls to interface with each other.
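A toy model of this, as a rough sketch rather than how any real client stores things: one key-value namespace per contract address, 32-byte keys and values, and unset slots reading as zero. `ToyStorage` and `slot` are hypothetical names, and the addresses are placeholders.

```python
from collections import defaultdict

WORD = 32            # EVM storage keys and values are 32-byte words
ZERO = bytes(WORD)   # unset slots read as zero

class ToyStorage:
    """Hypothetical model: one private key-value namespace per contract."""

    def __init__(self):
        self._slots = defaultdict(dict)

    def sstore(self, contract: str, key: bytes, value: bytes) -> None:
        if len(key) != WORD or len(value) != WORD:
            raise ValueError("storage keys and values must be 32 bytes")
        self._slots[contract][key] = value

    def sload(self, contract: str, key: bytes) -> bytes:
        # A contract only ever reads its own namespace.
        return self._slots[contract].get(key, ZERO)

def slot(n: int) -> bytes:
    return n.to_bytes(WORD, "big")

storage = ToyStorage()
storage.sstore("0xResolver", slot(0), slot(42))
print(storage.sload("0xResolver", slot(0)) == slot(42))  # True
print(storage.sload("0xOther", slot(0)) == ZERO)         # True: separate namespace
```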
How does consensus work?
* Ethereum is a blockchain, meaning the latest block hash represents the state of your entire system - all of the transactions it has processed, the latest state of the database, the balances, the smart contract programs, etc.
* An easy way to think about it - each block represents a tick of the system, and the block hash is like the time.
* In Ethereum 1.0, the clock was based on proof-of-work. But since September 2022 (the Merge), it's been upgraded to a new protocol, proof-of-stake.
* **You can track the clock without tracking the rest of the database**. This is called a _light node_, but since Eth 2.0, it just means running a "consensus node" - since the consensus layer has been split from the execution layer.
What does the block hash represent?
* The block hash represents the cryptographically authenticated state of Ethereum. It commits to the block header, which contains the state root - which is a fancy way of saying, it's a big fat merkle tree, and you can prove anything in the database by revealing a path from the root to the leaf.
* The [**_seminal_ diagram for the Ethereum world state is here**](https://ethereum.stackexchange.com/questions/268/ethereum-block-architecture/6413#6413). Seriously, this was made in 2018 and is just _that_ fucking good.
* Simply put, Ethereum's world state is split into 3 tries - accounts, code, and storage.
* This looks like:
* `state => (Accounts(address => balance), Code(address => bytes), Storage(contract_address => (bytes32 => bytes)))`
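The "reveal a path from the root to the leaf" idea can be sketched with a toy binary Merkle tree. This is a stand-in under loud assumptions: Ethereum actually uses Merkle-Patricia tries with Keccak-256 and RLP-encoded nodes, whereas this sketch uses SHA-256 and a plain binary tree, and the leaf strings are invented. The principle of checking a leaf against a trusted root is the same.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every level of the tree, from leaf hashes up to [root]."""
    level = [h(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last node
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(root, leaf, proof):
    """Recompute the root from a leaf and its proof; compare to the trusted root."""
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        node = h(b"\x01" + (sibling + node if sibling_is_left else node + sibling))
    return node == root

# Made-up leaves standing in for entries of the world state.
leaves = [b"accounts/0xaa=1", b"code/0xbb=6080", b"storage/0xbb/0=42"]
levels = build_levels(leaves)
root = levels[-1][0]
proof = prove(levels, 2)
print(verify(root, b"storage/0xbb/0=42", proof))  # True
print(verify(root, b"storage/0xbb/0=0", proof))   # False: value was tampered with
```

A client that only knows `root` can check any single leaf with a proof whose size grows with the depth of the tree, not with the size of the database.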
### (2) How we can detect censorship.
Concept:
* If we have a consensus node, we know the block hash.
* If we know the block hash, we can verify proof of anything in the database.
* Looking up the `contenthash` for `tornadocash.eth` is simply running a very small amount of EVM code, that interacts with a very small amount of state.
* `state.Code[ENSResolver]`
* `state.Code[ContentHashResolver]`
* `state.Storage[ContentHashResolver][hashes][tornadocash.eth]`
* If we ask the RPC node for this state using `eth_getStorageAt`, we can **trivially** verify whether it was censored. How?
* By requesting a merkle proof of the path (this is what `eth_getProof` returns): `(block_hash, storage, ContentHashResolver, hashes, tornadocash.eth)`
* If the hash check fails, then we know the state leaf isn't authentic.
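As a rough sketch of the request side, the proof can be fetched with the `eth_getProof` JSON-RPC method (EIP-1186), which takes an address, a list of 32-byte storage keys, and a block. The helper names below are hypothetical, slot `0` is a placeholder rather than the real storage key for the contenthash, and the actual trie walk that checks the returned nodes against the state root is deliberately not shown.

```python
import json
import urllib.request

def build_get_proof_request(address, slots, block="latest", request_id=1):
    """Build an eth_getProof payload. In practice `block` should be pinned
    to a block whose hash the consensus client has already verified."""
    keys = ["0x" + format(s, "064x") for s in slots]
    return {"jsonrpc": "2.0", "id": request_id, "method": "eth_getProof",
            "params": [address, keys, block]}

def send_rpc(url, payload):
    """POST a JSON-RPC payload (defined here for completeness, never called)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def storage_claim(result, slot):
    """Pull the claimed value and the proof nodes for one slot out of the
    `result` object of an eth_getProof response."""
    for entry in result["storageProof"]:
        if int(entry["key"], 16) == slot:
            return int(entry["value"], 16), entry["proof"]
    raise KeyError(f"no storage proof returned for slot {slot}")

ENS_RESOLVER = "0x226159d592E2b063810a10Ebf6dcbADA94Ed68b8"
payload = build_get_proof_request(ENS_RESOLVER, [0])  # slot 0 is a placeholder
print(json.dumps(payload, indent=2))
```

The value from `storage_claim` is still only a claim until the proof nodes are hashed up to `storageHash`, and the account proof up to the state root of a block we trust.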
Ideation:
* A lightweight consensus node like [Helios](https://github.com/a16z/helios).
* Requesting state directly from the RPC node, with proofs, using [`eth_getProof`](https://docs.alchemy.com/reference/eth-getproof)
* Executing EVM `eth_call` client side (ie. something like [Wei/FUCory's work](https://twitter.com/fucory/status/1608193056725139456?s=61&t=boSMYnkV-3i-5YN_FeTIoQ)), and [lazily loading](https://en.wikipedia.org/wiki/Lazy_loading) the storage from the remote execution node.
* Load the `msg.to` contract's code.
* Execute a local EVM.
* When we encounter `CALL`, load the corresponding contract's code.
* When we encounter `SLOAD`, load the corresponding storage key.
* Verify both of these through Merkle proofs, so we can detect inauthentic state.
* Return the value of the call like normal, ie. `contenthash(xx)`
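The loop above can be sketched as a lazily loaded, verified state cache. Everything here is a toy under stated assumptions: `ToyNode`, `LazyVerifiedState` and `InauthenticState` are made-up names, the "EVM" is a single storage read, and the commitment is a flat SHA-256 hash over all leaves instead of Ethereum's Merkle-Patricia trie. What it does show is the shape of the idea: every value fetched from the remote node is checked against a root we already trust before execution is allowed to use it.

```python
import hashlib

def leaf_hash(key, value):
    contract, slot = key
    return hashlib.sha256(f"{contract}:{slot}:{value}".encode()).digest()

def commit(leaf_hashes):
    """Toy commitment: hash of all leaf hashes, sorted."""
    return hashlib.sha256(b"".join(sorted(leaf_hashes))).digest()

class InauthenticState(Exception):
    """Raised when a value from the remote node fails its proof check."""

class ToyNode:
    """Stand-in for a remote RPC node; optionally lies about some keys."""

    def __init__(self, state, censored=()):
        self.state = dict(state)
        self.censored = set(censored)

    def get(self, key):
        # Toy proof: the hashes of every other leaf in the state.
        proof = [leaf_hash(k, v) for k, v in self.state.items() if k != key]
        value = 0 if key in self.censored else self.state[key]
        return value, proof

class LazyVerifiedState:
    """Loads storage on demand and verifies it against a trusted root."""

    def __init__(self, trusted_root, node):
        self.root, self.node, self.cache = trusted_root, node, {}

    def sload(self, contract, slot):
        key = (contract, slot)
        if key not in self.cache:
            value, proof = self.node.get(key)
            if commit(proof + [leaf_hash(key, value)]) != self.root:
                raise InauthenticState(key)
            self.cache[key] = value
        return self.cache[key]

def contenthash(state, resolver, slot):
    """Stand-in for local EVM execution: one SLOAD on the resolver."""
    return state.sload(resolver, slot)

world = {("0xResolver", 7): 1234, ("0xOther", 0): 1}
trusted_root = commit([leaf_hash(k, v) for k, v in world.items()])

honest = LazyVerifiedState(trusted_root, ToyNode(world))
print(contenthash(honest, "0xResolver", 7))  # 1234

censoring = LazyVerifiedState(trusted_root, ToyNode(world, censored=[("0xResolver", 7)]))
try:
    contenthash(censoring, "0xResolver", 7)
except InauthenticState as err:
    print("censorship detected for", err.args[0])
```

In the real thing, `trusted_root` would come from the consensus client and `ToyNode.get` would be an `eth_getProof` call against the provider being tested.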
Next steps:
* Sanity check this could work.
* Build this.
* Run it against every publicly available RPC provider - Infura, QuickNode, Alchemy, POKT.
**Why?** Because while we know which nodes censor transactions to Tornado, we don't know which nodes censor read-access to Tornado.