![image](https://hackmd.io/_uploads/SJH-rmWQR.png)

# Search for ERC20 balance storage slots

## Introduction

This post explores my journey of trying to change ERC20 token balances on a forked network, which led to the development of a tool that identifies balance storage slots for arbitrary tokens. You can find the repository [here](https://github.com/halo3mic/token-bss).

It will cover:

* the use case for finding the slot of the ERC20 token balance;
* how ERC20 balance is stored;
* different ways of finding the balance storage slot, including from call traces;
* and how to leverage REVM and REVM-inspector to streamline the latter.

Let's start by understanding why anyone would want to change the ERC20 token balance on a forked network.

## Streamlining YakSwap testing

Three years ago, I had an opportunity to create a fully on-chain DEX aggregator: [YakSwap](https://github.com/yieldyak/yak-aggregator). The aggregator uses a single router contract that compares rates across different pools in an attempt to find the optimal path through various markets and intermediate assets. To effectively compare markets, the router needs to interact with them using a consistent interface. This is where adapters come into play - contracts unifying different DEXes under the same interface, ranging from simple to moderately complex designs.

While YakSwap was already serving its customers, new DEXes with unique logic were popping up left and right, each requiring a new adapter to integrate the DEX into our system. Any carelessly added adapter risked affecting the whole system by returning an error, unrealistically high quotes, or, worse, bad quotes that could lead to users losing money. Thus, each adapter had to be rigorously tested before integration.

Testing an adapter involved ensuring that its quotes matched the ones offered by the underlying DEX and that swaps worked as expected. This meant that you needed to somehow mimic the underlying system.
With enough time and patience, the best strategy would be to fork the code of the underlying DEX, create your own tokens, supply the liquidity, and test the adapter. However, there was time pressure associated with adding support for new DEXes, and some of them were too complex to set up locally - another approach was needed.

This other approach was to fork the state of the blockchain and test the functionality on a fork with existing (real) liquidity. This is simple for testing the quoting mechanism, but for swapping, you need the tokens associated with the pool. I resorted to the simplest approach, which was using Hardhat's `hardhat_impersonateAccount` to imitate an account with a large balance of the desired asset. For some time, this worked well, but it had a downside: for each test, a token holder with a sufficiently large balance of the desired asset had to be found, and it was difficult to streamline the process.

An alternative soon presented itself. Around that time, Hardhat released the method `hardhat_setStorageAt`, which offered a way to modify storage on a forked network. Given the balance storage slot, this made it possible to set a holder's token balance to an arbitrary value, as presented in [this article](https://kndrck.co/posts/local_erc20_bal_mani_w_hh/).

By automating the search for the token balance storage slot, which we will get into very shortly, the testing process became streamlined and trivial. Less time was required for testing, and it was easy for other developers to contribute without having to learn a complex set of instructions.

But how can finding the storage slot be automated? Let's first look into how token balances are represented in the EVM.

## ERC20 Token balance representation

In an ERC20 token contract, you need to track the balances of many token holders. A common way to represent this is through a mapping of key-value pairs.
Solidity and Vyper, the most commonly used languages for smart contract development, both support mappings for this purpose. In both languages, a mapping entry is located under the hood by hashing the key together with the storage slot assigned to the mapping. For token balances, the key is typically the address of the token holder.

* Solidity: The storage location is calculated as `keccak256(key + slot)`
* Vyper: The storage location is calculated as `keccak256(slot + key)`

Here `+` denotes concatenation of the two values, each padded to 32 bytes. The resulting hash represents the storage location where the value (i.e., the balance) associated with a specific key (i.e., the holder's address) is stored. By knowing the storage slot of the mapping, you can find the specific storage location for a particular holder's balance and read or modify it as needed.

So, how can we find the balance storage slot?

## Search for the storage slot

In Solidity and Vyper, storage slots are allocated sequentially, starting from zero and going up to 2^256 - 1. This sequential allocation is determined by the position of the variables in the smart contract, following a "first come, first served" order.

#### Example

For smart contract:

```solidity
pragma solidity ^0.8.9;

contract Example {
    string public name;
    mapping(address => uint) public balance;
}
```

The value of `name` will be associated with storage slot 0 and `balance` with storage slot 1.

This leads us to the first approach to finding the balance storage slot for a token.

### The storage layout 🔖

Both Solidity and Vyper can export a contract's storage layout, mapping variable names to their storage slots. This method works well when the contract source code and variable names are known, though it's less effective when dealing with proxy contracts or external balance storage.

<details>
<summary>Toggle for longer version</summary>

Both the Solidity and Vyper compilers provide a way to export the storage layout associated with a smart contract. The storage layout maps each variable name to its storage slot.
* Solidity: `solc --storage-layout ExampleContract.sol`
* Vyper: `vyper -f layout ExampleContract.vy`

For our use case, we could leverage this to obtain the storage slot of the `balances` variable, the most common choice for naming the balance mapping.

While this is quite straightforward, it suffers from some limitations, especially in terms of automating the process. For one, this approach only works if you know the source code of the smart contract and the exact name of the variable holding token balances. Although this is not a given, almost all relevant tokens have their source code published and name the balance variable `balances`. The biggest issues here would be the edge cases and, if it were needed, obtaining a large amount of contract source code. For example, USDC on Ethereum stores balances in a variable named `balanceAndBlacklistStates`.

Another issue with this approach is that tokens whose balances are stored in a different contract than the token contract itself complicate the process. Similar complications can occur with proxies.

</details>

### Guess and pray 🍀

For tokens using low-indexed storage slots, guessing can be quick and effective. However, this method falls short for higher-indexed slots or when balances are managed in separate contracts, as with the SNX token.

<details>
<summary>Toggle for longer version</summary>

If the storage slot of the token you want to set the balance for is near 0, then guessing is an amazing strategy - simple and effective. Just loop through the first N storage slots, and you are done. That said, token contracts differ and can use any storage slot to represent their balance. It might be easy to find storage slot 3, but good luck looping to find the storage slot at 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff.

As with the previous approach, the issue with this method is that the balance is not always stored on the token contract.
For example, the SNX token contract is `0xc011a73ee8576fb46f5e1c5751ca3b9fe0af2a6f`, but the balance is stored in the contract `0x5b1b5fea1b99d83ad479df0c222f0492385381dd`.

</details>

### Forge cheat codes ⚒️

Forge's state recording cheat codes can capture the effects of balance-related transactions, highlighting potential storage slots. This method is efficient within the Forge environment but can be cumbersome to scale.

<details>
<summary>Toggle for longer version</summary>

Forge offers a way to record the state of the blockchain with its [cheatcode `startStateDiffRecording`](https://book.getfoundry.sh/cheatcodes/stop-and-return-state-diff). Here, one could start recording the state, then call `balanceOf` on the selected token and pick the appropriate touched storage slot. The last part is straightforward if the call only touches a single storage slot, but it becomes more ambiguous when there are many.

Overall, the approach works great if you already have the contract in the Forge environment, but it is difficult to streamline this process. It does, however, give us a hint that investigating the effects and/or the process of the `balanceOf` call could reveal the storage slot we are searching for.

</details>

### Investigating the call trace 🔎

Tracing provides a granular view of EVM execution, showing each opcode and its effect. By analyzing `balanceOf` calls and tracking `SLOAD` and `KECCAK256` opcodes, we can pinpoint the storage slot used for balances. This method considers contract interactions and can accurately identify relevant storage slots, even across contract boundaries.

<details>
<summary>Toggle for longer version</summary>

Tracing refers to the detailed recording of the execution of a single call or a whole transaction within the EVM. In practice, this means that every opcode instruction is recorded together with the state of the stack, memory, and storage at the time of its execution.
This is helpful for debugging and could also help us discover the storage slot associated with a token's balance.

Calling the `balanceOf` method returns the latest balance of the token for a specified holder. This balance is retrieved from storage, and the only way the EVM can do that is with the opcode `SLOAD`. `SLOAD` takes the storage location as an argument from the stack and returns the value at that position. So, finding `SLOAD` in the call trace and taking the top value from the stack would give us the mapping location for the holder. This is the storage location where the balance for one particular holder is stored, but not the general balance storage slot.

Remember that a mapping in Solidity stores the value at the location `keccak256(concat(key, slot))`, while Vyper swaps the order of the two arguments. We are interested in the slot that went into the hash, as from it we can derive the balance storage location for any holder.

In the EVM, hashing is done with the opcode `KECCAK256`, so we know this opcode needs to be called before the storage is accessed. Finding the `KECCAK256` instruction that preceded the `SLOAD` can help us identify the parameters used in hashing, and thus the storage slot associated with the token balance. In theory, there could be many unrelated hashing operations before `SLOAD`, and `SLOAD` doesn't have to directly follow `KECCAK256`; there could be other operations in between. As a solution, we can hash the parameters found in memory at each `KECCAK256` and compare the result with the location passed to `SLOAD`.

As mentioned earlier, there are tokens that store the holder's balance in a different contract than the token contract itself. Therefore, we need to know which contract the `SLOAD` is executing on. Traces normally include a `depth` field, which increases each time execution steps into a new contract and decreases when execution returns. During a `balanceOf` call, this happens at call opcodes such as `CALL`, `STATICCALL`, and `DELEGATECALL`.
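To make the matching step concrete, here is a minimal Python sketch that runs over a toy trace. The `Step` layout (stack as a list with the top element last, memory as raw bytes) and the names `find_balance_slots`, `TOKEN`, `STORAGE`, `HOLDER`, and `FAKE_HASH` are assumptions made for illustration; this is not the output format of any particular node, nor the actual code of the tool. Instead of re-hashing memory, the sketch reads the hash result from the top of the stack on the step after `KECCAK256`, and it assumes that `DELEGATECALL` keeps using the caller's storage while `CALL` and `STATICCALL` switch to the callee's.

```python
from typing import NamedTuple


class Step(NamedTuple):
    """One traced instruction (toy format, loosely modelled on struct logs)."""
    op: str
    depth: int
    stack: list          # ints, top of the stack is the LAST element
    memory: bytes = b""  # memory contents before the instruction runs


def find_balance_slots(trace, token, holder):
    """Return (storage_address, slot, layout) candidates for `holder`."""
    holder_word = holder.to_bytes(32, "big")
    storage_at = {trace[0].depth: token}  # depth -> contract whose storage is used
    preimages = {}                        # hash result -> 64-byte hash input
    found = []
    for step, nxt in zip(trace, trace[1:] + [None]):
        if step.op == "KECCAK256":
            offset, size = step.stack[-1], step.stack[-2]
            # KECCAK256 pushes its result, so it tops the next step's stack.
            if size == 64 and nxt is not None and nxt.stack:
                preimages[nxt.stack[-1]] = step.memory[offset:offset + 64]
        elif step.op in ("CALL", "STATICCALL"):
            # Stack top is gas; the word below it is the callee address.
            storage_at[step.depth + 1] = step.stack[-2]
        elif step.op == "DELEGATECALL":
            # Callee code runs against the caller's storage.
            storage_at[step.depth + 1] = storage_at[step.depth]
        elif step.op == "SLOAD" and step.stack[-1] in preimages:
            preimage = preimages[step.stack[-1]]
            first, second = preimage[:32], preimage[32:]
            address = storage_at[step.depth]
            if first == holder_word:    # keccak256(key ++ slot)
                found.append((address, int.from_bytes(second, "big"), "solidity"))
            elif second == holder_word:  # keccak256(slot ++ key)
                found.append((address, int.from_bytes(first, "big"), "vyper"))
    return found


# Toy trace: the token forwards balanceOf to a separate storage contract.
TOKEN, STORAGE, HOLDER = 0xAA, 0xBB, 0xC0FFEE
FAKE_HASH = 0x1234  # placeholder for keccak256(holder ++ slot) in a real trace
TRACE = [
    Step("STATICCALL", 1, [0, 0, 0, 0, STORAGE, 50_000]),
    Step("KECCAK256", 2, [64, 0],
         HOLDER.to_bytes(32, "big") + (3).to_bytes(32, "big")),
    Step("SLOAD", 2, [FAKE_HASH]),
    Step("RETURN", 2, [0, 32]),
]

candidates = find_balance_slots(TRACE, TOKEN, HOLDER)
```

For this toy trace, `candidates` holds a single entry pointing at slot 3 of the storage contract rather than the token itself. A real trace may yield several candidates, which then need to be verified separately.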
Thus, by mapping the contract entered at each call opcode (such as `CALL` or `DELEGATECALL`) to the corresponding execution depth, we can keep track of the contract address each opcode executes on.

YieldYak used an earlier version of this method that relied on many more assumptions. The code can be found [here](https://github.com/yieldyak/yak-aggregator/blob/686f66706569f72d92018455077d2045c49bebeb/src/test/helpers.js#L16-L47).

</details>

## Quality assurance

Another thing to consider is that several different mappings could be accessed during the `balanceOf` call, and the balance mapping could be just one of them. For example, there could be a mapping access that checks whether the holder is blacklisted. Thus, we could get multiple candidate slots, and we need to find the one that works. To do that, we can call `balanceOf` with state overrides that set the desired balance at the candidate slot. If the call returns the desired balance, we can be confident that the storage slot is associated with the token balance.

Sometimes, the stored value doesn't directly map to the holder's balance. For example, stETH stores each holder's shares and only calculates the actual balance from them when `balanceOf` is called. Thus, changing the holder's storage won't change the holder's balance 1:1.

## Let's streamline it

Overall, I found tracing to be the most effective method among the ones listed. But how do we get traces?

### How to get traces?

#### Run your own node

The simplest option to obtain traces is to run your own node and expose the `debug` API. With this, you can trace with `debug_traceCall` using any of the supported tracing types.

#### Most paid providers are not an option

Running your own node isn't always possible, or it could be too much of a hassle for the expected output - e.g., you would just like to do a quick experiment. In those cases, it is possible to resort to RPC providers like Alchemy and Quicknode, which offer node access on demand.
While both services do offer the `debug_traceCall` method for paid subscriptions, neither of them supports struct traces - traces that show all instructions involved during the call. In short, remote providers do not provide the service needed to find the balance storage slot.

#### Local tracing

Fortunately, there is another way. Both Hardhat and Anvil support "local tracing." By local tracing, I mean fetching the state and executing the trace locally. Thus, if you fork a network with Anvil or Hardhat, you can call `debug_traceCall` even if the underlying provider doesn't support this method. The gist of it is that the blockchain state is pulled on demand during tracing. This makes it very convenient to trace even with public providers like the ones from [chainlist](https://chainlist.org/).

### Rust tool

In Autumn 2023, this approach led me to write a Rust library with a CLI tool that helps you find the balance storage slot of an arbitrary token. A Twitter post describing it can be found [here](https://twitter.com/MihaLotric/status/1737515503835553927).

But still, we have to spin up an Anvil or Hardhat node to do this. Is there a better way?

## Poor man's tracer

How does Anvil trace locally? Could we just extract the relevant parts and do it ourselves without spinning up a node? Under the hood, Anvil traces using REVM with REVM-inspector, both standalone libraries, which I bundled together to create a lightweight tracer. The library, Poor Man's Tracer, can be found [here](https://github.com/halo3mic/poor-mans-tracer).

## API please

I went even further and decided it would be useful for frontends and other services to query the storage slot from an API. Think of Rivet, which allows users to set an arbitrary token balance but tries to find the storage slot for it by optimistically checking the first ten slots of the token.
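Such an optimistic check, like the guess-and-pray approach described earlier, boils down to deriving the mapping location for a holder at each candidate slot and comparing the stored value with a known balance. The sketch below shows that idea in Python against a mocked storage dictionary. Python's standard library has no Keccak-256 (`hashlib.sha3_256` uses different padding), so a compact pure-Python version is included. The names `mapping_key`, `guess_slot`, `read_storage`, and the sample values are hypothetical and chosen for illustration only; this is not Rivet's code or the code of the tool described here.

```python
MASK = (1 << 64) - 1


def _rol(value, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((value << n) | (value >> (64 - n))) & MASK


def _keccak_f(lanes):
    """Keccak-f[1600] permutation on a 5x5 matrix of 64-bit lanes."""
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        cur = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, lanes[x][y] = lanes[x][y], _rol(cur, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constants generated by an LFSR)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by the EVM (0x01 padding byte, 136-byte rate)."""
    rate = 136
    padded = bytearray(data) + b"\x01" + b"\x00" * (-(len(data) + 1) % rate)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        block = padded[off:off + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def mapping_key(holder: int, slot: int, layout: str = "solidity") -> bytes:
    """Storage location of mapping[holder] for a mapping declared at `slot`."""
    h, s = holder.to_bytes(32, "big"), slot.to_bytes(32, "big")
    return keccak256(h + s if layout == "solidity" else s + h)


def guess_slot(read_storage, holder, balance, max_slots=10):
    """Try the first `max_slots` slots in both argument orders."""
    for slot in range(max_slots):
        for layout in ("solidity", "vyper"):
            if read_storage(mapping_key(holder, slot, layout)) == balance:
                return slot, layout
    return None


# Mocked storage: the holder's balance of 1000 lives in a mapping at slot 3.
HOLDER = 0xC0FFEE
storage = {mapping_key(HOLDER, 3): 1_000}
result = guess_slot(lambda key: storage.get(key, 0), HOLDER, 1_000)
```

In a real setup, `read_storage` would wrap an `eth_getStorageAt` call against the token contract, and the holder should have a non-zero balance so that empty slots do not produce false matches.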
### API

Run the following command to get the storage slot for SNX on Optimism, or modify it to query for your favorite token.

```bash
curl http://token-bss.xyz/opt/0x8700daec35af8ff88c16bdf0418774cb3d7599b4 | jq
```

The API supports the following networks:

* Ethereum with key `eth`
* Optimism with key `opt`
* Arbitrum with key `arb`
* Avalanche with key `avax`

You can also spin up your own server by using the codebase: https://github.com/halo3mic/token-bss.

---

That is it; thanks for stopping by! For any further questions or discussion, feel free to reach out on [Telegram](https://t.me/themiha) or [X](https://x.com/MihaLotric).