What is the best way to debug low level EVM?

Disclaimer: grammar fixed by AI. I was having fun with a Solidity CTF. Since writeups are not allowed, I am not going to share details about it. After reading through the code, I narrowed down the suspect to inline assembly, specifically, memory corruption problems. ## Contemplating debugging But how exactly do I approach the exploit? Googling "EVM memory exploit" did not return anything useful. How does anyone approach this type of exploit? What tools do they use? Maybe most people aren't doing low-level debugging. OK, even though the world does not have what I need, I will still get what I want. That is what you call a hacker. hehe! Although I know that memory corruption can affect some pointers, the details are missing. I did have some fun with stepping through EVM bytecode one by one and figuring out what it is doing. But that was a long time ago. Since then, I have lived comfortably within the abstraction of high-level programming languages. So I might want to do it again. ## Stepping through bytecode with Foundry Writing a quick test in Foundry and running it in debug mode. Messy. Intelligible after I really try to understand it. But Foundry debug mode feels very limited, unlike GDB. After struggling for half an hour, I finally get what I need. Now, I just need to change my code a little bit, run the test again, and debug again. It should only take...forever! In hindsight, there is a breakpoint feature, but you have to use the cheatcode inside the code, which is not the most convenient way. Also, I wonder why the interface is so different from GDB, as I could not do things like changing memory. Is this something about the EVM? Anyway, this simply does not feel like the correct way to debug hardcore low-level stuff. Then I recall there is a new cool tool called Simbolik from Runtime Verification, which has good integration with VS Code. I did play with it before, but nothing meaningful yet. As I am always skeptical about new tools, I approach it with caution. Well yeah, it is somewhat intuitive, but you have to write the whole test in one function, and I'm not sure if it works with different callers like `vm.startPrank(caller)` in Foundry. I will have to tweak my Foundry test cases just so I can run them in Simbolik. And...it worked! There are some bugs, so I can't always get it running correctly, but I kind of get what I want. I see how my payload is used in memory and now understand the challenge better. But then, this has similar functions to Foundry testing, with a better UI and probably less versatility. So this also does not feel like the solution. Why don't I simply try to figure it out with symbolic execution like Kontrol? Maybe this is exactly the kind of thing it can solve! Remember the awesome symbolic execution engine called "angr"? Basically, symbolic execution does not use concrete values but symbolic (abstract and undetermined) values, which is why it considers all cases. In the end, you can evaluate the symbols in angr. This could be awesome! ## Trying symbolic execution But as always, skeptical. I went to my home lab, but Kontrol is not running for me. Why does God hate me? After debugging for a bit, I find it is probably my Linux kernel experiments that messed things up. So I did a reboot, which somehow took an hour before the lab went online by itself. Now, Kontrol, be a good boy, and tell me the magic spell! Alohomora! After setting it up, I hit run. However, it seems to be taking its time. Kind of worried because there are things not meant to be handled by symbolic execution due to path explosion, e.g., loops. But I will still leave it running. The way I understand symbolic execution, this should be fine. In the meantime, I tinkered with other tools, then did other things, and then it is the next day. I connect to my home lab, attach to that tmux session, and then yes! It did find something after 1.5 hours! However, the path constraints (symbolic and abstract descriptions of the solution) are so messy. I could understand partly, but that is it. Did it not give me a counterexample? If it did, then that should do the trick. Carefully read the documentation again, and it really looks like it should give me a counterexample. Tried going over the path constraints in K for another hour, and then finally decided not to self-torture. Let's do it the hard way. I divided the dynamic bytes data into many `bytes32` variables, hardcoded some of the bytes to save it some work, and let it run again. This time, it gives me something in 8 minutes! Also, concrete values! Then that is the answer: only dynamic types don't have concrete values (also kindly confirmed by Runtime Verification Discord). But wait a minute. My skeptical hacker intuition tells me that is not the solution. It does not seem to contain what I want it to contain. Moreover, when I use those values in Foundry tests, it simply fails. Then I tried to use the concrete values in Kontrol, and it also fails to find an exception (by showing it passes). Why would it lie to me? I thought we were friends! Let's try to figure out whether that solution makes sense by going with the hardcore way of Foundry debugging. Yes, the solution makes no sense at all. Just when I was ready to smash my keyboard as a scapegoat, I recalled that Kontrol sometimes does [this](https://docs.runtimeverification.com/kontrol/guides/advancing-proofs): "**Invalid execution paths:** It is not always easy for **Kontrol** to deduce that a particular execution path is infeasible. If an invalid execution path is traversed, it means that **Kontrol** failed to identify a contradiction from the path condition.". How to solve this problem? By manually telling it in K, a language that I don't speak! But I am not sure if that is always possible even for experienced professionals, since the path constraints can be really messy! Maybe I can get something out of it if it didn't "fail fast". There should be multiple failures, where one of them is the solution while others are non-obvious infeasible paths. But I do not find a way to disable fail fast. I think it did fail fast because there are still some pending: ``` test%BridgeTest.testKontrol(uint256,uint256,uint256,uint256,uint256,uint256,uint256,uint256,uint256,uin… F… FAILED: 69 nodes: 3 pending|7 passed|1 failing|0 vacuous|0 refuted|… ❌ PROOF FAILED ❌ test%BridgeTest.testKontrol(uint256,uint256,uint256,uint256,uint256,uint256,uint256,uint256,uint256,uint256):0 ``` Ready to smash my keyboard again, but a half-brilliant idea came to my mind: I should be able to figure out whether the path constraint is possible by swapping in the concrete values? Anyway, enough with Kontrol experiments for now. Then I wanted to give Certora a chance, but the second I created the spec file, I immediately gave up. Harness, havoc, vacuity, and weird things. Just thinking how to express the proof in CVL gives me a headache. I know some professionals can find real bugs with Certora, but I don't feel like doing it today. So conclusion: Kontrol does not always help with low-level debugging, especially with dynamic symbolic types. I also tried to adjust the number of symbolic `bytes32`, just in case that was the problem, but none worked. ## Bug with Foundry and Simbolik I went back to Foundry to guess the solution one more time. Maybe it is faster to solve it this way, and probably that is what others did. Somehow, this gives me the weirdest bug I have ever seen. Since my guess is wrong, the test fails due to a memory error. However, when I debug, it starts to use an insane amount of memory, and finally the process gets murdered by a spy agency called kernel OOM. It does not run, and I am blind! Simbolik fails to run as well. I could keep guessing blindly, which will probably give me the answer. But no, my goal is not to solve this challenge but to think philosophically, like what is the best way to debug low-level bugs. As much as I love a genie to give me a magic spell, I still have to have a way to debug it manually. How do I do it? Interrogating GPT did not give me anything useful—they start to admit crimes that they definitely did not commit, like killing the emperor. Yeah, although surrounded by high tech, I am still on my own now. If I could stop using all that high tech and go back to basics... Py-EVM! This could be a very good way to do it! I know it is not "built for this", so normal users won't be able to do it, but a challenge? I laugh at challenge! HAHA! I welcome technical difficulties, which only make me stronger. The idea is to debug it like Python, which is super nice and intuitive, and infinitely hackable! This is like a walk in the park for CTF rev challenges! Let's see if there are some good resources on this...None. But is it actually reliable or like one of those academic experiments that is no longer maintained? Oh yeah, it is probably cool since it is developed by the official Ethereum team! The old documentation scared me, but after checking the source code, I believe it is up to date. Let's freaking go! The documentation is as detailed as a bald man's hair. It is tricky to figure out how to bend the power creature to my will. But since I have a PhD in Python, I quickly get a grasp of the codebase. I decided the best documentation for me would be the test cases. If you are not familiar with development, basically think of testing as what it is trying to achieve and how it is achieving it. Of course, as a virtual machine, it will have to test its bytecode running, which should be what I am looking for. ## Tinkering with Py-EVM Since it is a Python package, I install it with `pip install .`. I see it is using pytest as the test framework and has a `Makefile`. I simply run `make test`, and it simply fails to run. Damn it! It seems to be missing requirements. Upon closer look, it is only including some of the libraries by default. In `setup.py`: ```python extras_require["dev"] = ( extras_require["dev"] + extras_require["docs"] + extras_require["eth"] + extras_require["test"] ) install_requires = extras_require["dev"] # change from eth to dev, to run tests ``` Now, `pip install .` again, and test again... Still no. ``` =========================================================================== short test summary info ============================================================================ ERROR tests/json-fixtures/blockchain/test_blockchain.py - AssertionError: Suspiciously found zero fixtures: ['/home/kali/Desktop/tools/py-evm/fixtures/LegacyTests/Cancun/BlockchainTests', '/home/kali/Desktop/tools/py-evm/fixtures... ERROR tests/json-fixtures/test_difficulty.py - AssertionError: Suspiciously found zero fixtures: ['/home/kali/Desktop/tools/py-evm/fixtures/DifficultyTests'] ERROR tests/json-fixtures/test_transactions.py - AssertionError: Suspiciously found zero fixtures: ['/home/kali/Desktop/tools/py-evm/fixtures/LegacyTests/Cancun', '/home/kali/Desktop/tools/py-evm/fixtures_eest'] ERROR tests/json-fixtures/test_virtual_machine.py - AssertionError: Suspiciously found zero fixtures: ['/home/kali/Desktop/tools/py-evm/fixtures/LegacyTests/Constantinople/VMTests'] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 4 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ========================================================================= 1 skipped, 4 errors in 0.77s ========================================================================= make: *** [Makefile:41: test] Error 2 ``` Fixture problems! What is it? A fixture in pytest is a function decorated with `@pytest.fixture` that provides reusable setup/teardown logic for tests. It exists to centralize test dependencies (like database connections or mock data), eliminating boilerplate code and enabling consistent, modular test environments. This is very easy to understand. When you do tests, you need to test a certain feature inside a certain scenario, so you have to build that scenario as well. Fixtures are the building blocks of the scenarios. [Somehow](https://github.com/ethereum/tests/tree/9201075490807f58811078e9bb5ec895b4ac01a5?tab=readme-ov-file#clients-using-the-library) the developers decided to make the fixtures a git submodule. It is more modular so other projects can reuse them. I did not pull them. Now `git submodule update --init --recursive` to pull them. And oh yeah, it finally starts running. There are so many tests in the fixture that it looks like it will not end in a short period of time. Most of them are irrelevant here, so I will just stop the tests or simply focus on the one that I am interested in. I picked test case `tests/core/vm/test_computation.py`. It does exactly what I think it does and lets me debug! Exciting! Maybe I can use a similar method to debug JVM as well? Sadly PyJVM is not a full implementation. ![Pasted image 20250725152047](https://hackmd.io/_uploads/HyzDn2Svlg.png) Now we know how to run a single tx, but that is far from what we can easily do in Foundry, e.g., creating contracts, mocking addresses, etc. Maybe what I am looking for is web3.py and EthTester? Basically web3 is the interface, EthTester is a framework, and Py-EVM does the actual work. I can confirm that web3 is using my experimental Py-EVM codebase. ``` Python 3.13.2 (main, Mar 13 2025, 14:29:07) [GCC 14.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from web3 import Web3, EthereumTesterProvider >>> w3 = Web3(EthereumTesterProvider()) ... ... w3.is_connected() ... True >>> e= EthereumTesterProvider() >>> e <web3.providers.eth_tester.main.EthereumTesterProvider object at 0x7f456ad2ce10> >>> dir(e) ['__annotations__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', '_batch_request_func_cache', '_batching_context', '_current_request_id', '_is_batching', '_middleware', '_request_cache', '_request_cache_lock', '_request_func_cache', 'api_endpoints', 'cache_allowed_requests', 'cacheable_requests', 'ccip_read_max_redirects', 'ethereum_tester', 'global_ccip_read_enabled', 'has_persistent_connection', 'is_async', 'is_connected', 'logger', 'make_request', 'request_cache_validation_threshold', 'request_func'] >>> e.ethereum_tester <eth_tester.main.EthereumTester object at 0x7f456a8a5590> >>> dir(e.ethereum_tester) ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', '_account_passwords', '_account_unlock', '_add_all_to_pending_block', '_add_log_entries_to_filter', '_add_transaction_to_pending_block', '_block_filters', '_filter_counter', '_get_pending_transaction_by_hash', '_handle_filtering_for_transaction', '_log_filters', '_normalize_pending_transaction', '_pop_pending_transactions_to_pending_block', '_process_block_logs', '_reset_local_state', '_revert_block_filter', '_revert_log_filter', '_revert_pending_transaction_filter', '_snapshot_counter', '_snapshots', 'add_account', 'auto_mine_transactions', 'backend', 'call', 'create_block_filter', 'create_log_filter', 'create_pending_transaction_filter', 'delete_filter', 'disable_auto_mine_transactions', 'enable_auto_mine_transactions', 'estimate_gas', 'fork_blocks', 'get_accounts', 'get_all_filter_logs', 'get_balance', 'get_block_by_hash', 'get_block_by_number', 'get_code', 'get_fee_history', 'get_logs', 'get_nonce', 'get_only_filter_changes', 'get_storage_at', 'get_transaction_by_hash', 'get_transaction_receipt', 'lock_account', 'mine_block', 'mine_blocks', 'normalizer', 'reset_to_genesis', 'revert_to_snapshot', 'send_raw_transaction', 'send_transaction', 'take_snapshot', 'time_travel', 'unlock_account', 'validator'] >>> e.ethereum_tester.backend <eth_tester.backends.pyevm.main.PyEVMBackend object at 0x7f456a8a56d0> >>> dir(e.ethereum_tester.backend) ['__abstractmethods__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', '_abc_impl', '_create_type_aware_signed_transaction', '_create_type_aware_unsigned_transaction', '_generate_genesis_params', '_generate_genesis_state', '_get_normalized_and_signed_evm_transaction', '_get_normalized_and_unsigned_evm_transaction', '_key_lookup', '_max_available_gas', '_normalize_transaction', 'account_keys', 'add_account', 'apply_withdrawals', 'call', 'chain', 'estimate_gas', 'from_mnemonic', 'generate_genesis_params', 'generate_genesis_state', 'get_accounts', 'get_balance', 'get_base_fee', 'get_block_by_hash', 'get_block_by_number', 'get_code', 'get_fee_history', 'get_nonce', 'get_storage', 'get_transaction_by_hash', 'get_transaction_receipt', 'is_eip838_error', 'mine_blocks', 'reset_to_genesis', 'revert_to_snapshot', 'send_raw_transaction', 'send_signed_transaction', 'send_transaction', 'take_snapshot', 'time_travel'] >>> from eth.vm import forks >>> forks.__file__ '/home/kali/Desktop/tools/py-evm/eth/vm/forks/__init__.py' ``` Now, everything should just work. I simply use the web3 library normally and set breakpoints in Py-EVM, then it should all work. I write a simple script like this: ```python from web3 import Web3, EthereumTesterProvider from solcx import compile_source w3 = Web3(EthereumTesterProvider()) assert w3.is_connected() source_code = ''' pragma solidity ^0.8.0; contract Vault { uint256 public num; constructor() public payable { num = msg.value; } function setNum(uint256 n) public { num = n; } } ''' compiled = compile_source(source_code, output_values=['abi', 'bin']) compiled = list(compiled.values())[0] vault = w3.eth.contract(abi=compiled['abi'], bytecode=compiled['bin']) # create contract construct_tx = vault.constructor().transact() tx_receipt = w3.eth.wait_for_transaction_receipt(construct_tx) print(tx_receipt) vault = w3.eth.contract(abi=compiled['abi'], address=tx_receipt.contractAddress) # call setNum vault.functions.setNum(123).transact() print(vault.functions.num().call()) ``` But it is not hitting the breakpoint in Py-EVM. Even when I rename the entire Py-EVM folder, which should break the code, the code runs normally. Later I found that the only reason it used my cloned codebase instead of the library under `site-packages` is that the script was under the root of the cloned codebase. Putting it anywhere else will use the one under `site-packages`. Do not forget to set `justMyCode` to false inside VS Code. Finally debugging baby! Now I feel the power! POWERRRRRR! ![Pasted image 20250725165750](https://hackmd.io/_uploads/SykB32HPle.png) Of course, since we are debugging, we probably want to change the code for Py-EVM. Therefore, it is a good idea to use the cloned codebase, not the library. After a simple sanity check with a toy protocol, I begin to set up my experiment. Basically, rewrite the CTF Foundry logic in Python. There are some naive and ugly hacks... Now I find my exploit reverts, which is expected, but due to lack of gas. I wonder if there is some mechanism that consumes all the gas on error. Since I am the god of this VM, I can print remaining gas to my log. After inspecting the log, I see that the gas is actually being used normally, and it used up! Once I give it more gas, it just keeps running, kind of like an infinite loop. It keeps expanding memory and consuming a lot of gas. My idea was to first see the exception and then figure out what is wrong. Maybe it is a bad idea. Now try to cut in from the function where it starts to deal with memory stuff. I just need to know the program counter for that function and break there. But that is kind of hard since I don't know what program counter corresponds to what source code. I have no experience with an EVM disassembler, so I will just get the PC from Foundry debug or Simbolik for now. ### Foundry and Py-EVM Since the normal PoC test will kill Foundry debug mode, I will have to tweak the test case, then get the source code to bytecode mapping. Before I do that, I think Simbolik might be faster. But no, it refuses to start up for unknown reasons. Then I thought maybe "Remix" is better since you get to have a GUI breakpoint, but my bytecode seems to be inconsistent with its bytecode, and I also know that Remix has some bugs that once put me in a huge rabbit hole. I will stick with Foundry for now. From the `output` folder in the Foundry project, I found the `json` file containing the compiled bytecode. Then I swapped that into my Python code. After navigating in Foundry debug mode for a minute, I found what I was looking for. Comparing it with my log for Py-EVM...yes that is consistent. I got confused for a second, but keep in mind that in Py-EVM, stack top is at -1, while Foundry debug has the stack top visually on top. Now I think I know what is partially going on. Foundry debug shows a place where it is checking the length of an array. I break at the same program counter in Py-EVM, and it takes the wrong value, which is 1 ether. It is looping 10 to the power of 18 times, which explains the behavior that I saw earlier. I need to restructure my `calldata` to make it loop only once. Guess what? Since I am in Py-EVM, I am simply going to rewrite the memory and go on. You know what I am talking about? And...mission accomplished. CTF solved. Debug tool found. Although right now it is very dumb and clunky, since it is infinitely hackable, I can develop upon it. ## Getting answers - Why doesn't Foundry debug allow a GDB-like experience, like changing memory? Is it something about EVM? Definitely not. I can do it in Py-EVM. Foundry debug seems to need the TX to finish first and then debug that TX in read-only mode. - Is Foundry or Simbolik the best tool for low-level debugging? No. They are mostly good, but low-level debugging can use more powerful features. - Is Kontrol good for low-level debugging? No. It can probably do what you want it to do, but it's complex. It might give you infeasible paths, and you will have to manually convince Kontrol that it is infeasible in K, then it can go on. Also, Kontrol does formal verification, and cannot replace normal debugging. - Is Py-EVM good for normal users? Definitely no. It is somewhat complex. But if you are a Python developer or know reverse engineering, then this is the best option. - How does others approach this type of low level debugging? I can't say for sure. Py-EVM is the standard reverse engineer way of doing it. I also know that some elite hackers just understand the low level stuff really good, and just do the math in their brain without the need of powerful tools. - Why debug in Simbolik and Foundry is read-only? They first have a TX done, and then trace that TX. This is not necessary. ## Template for web3.py + Py-EVM testing `test.py` ```python from web3 import Web3, EthereumTesterProvider from solcx import compile_source def assert_no_revert(tx_hash): r = w3.eth.wait_for_transaction_receipt(tx_hash) assert r.status == 1 w3 = Web3(EthereumTesterProvider()) assert w3.is_connected() source_code = ''' your source code here... ''' # create accounts deployer = w3.eth.account.create() w3.eth.send_transaction({'from':w3.eth.accounts[0], 'to':deployer.address, 'value':w3.to_wei(200, 'ether')}) print(f'deployer key={w3.to_hex(deployer.key)}, account={deployer.address}, value={w3.eth.get_balance(deployer.address)}') user = w3.eth.account.create() w3.eth.send_transaction({'from':w3.eth.accounts[0], 'to':user.address, 'value':w3.to_wei(2, 'ether')}) print(f'user key={w3.to_hex(user.key)}, account={user.address}, value={w3.eth.get_balance(user.address)}') w3.provider.ethereum_tester.backend.add_account(deployer.key) w3.provider.ethereum_tester.backend.add_account(user.key) compiled = compile_source(source_code, output_values=['abi', 'bin']) compiled = list(compiled.values())[0] compiled['bin'] = 'copy from foundry...' bridge = w3.eth.contract(abi=compiled['abi'], bytecode=compiled['bin']) print('bytecode:', compiled['bin']) # create contract with deployer, register validator w3.eth.default_account = deployer.address tx = bridge.constructor().transact() assert_no_revert(tx) r = w3.eth.wait_for_transaction_receipt(tx) bridge = w3.eth.contract(abi=compiled['abi'], address=r.contractAddress) assert bridge.functions.owner().call() == deployer.address tx = bridge.functions.registerValidator( "0x" + "0" * 40, b'Some Tag'.ljust(32, b'\0') ).transact({'value':w3.to_wei(100, 'ether')}) assert_no_revert(tx) # user register validator and vote w3.eth.default_account = user.address calldata = bytes.fromhex('your calldata hex...') + zero + zero + deadbeef + one + memory_pointer + one + bytes.fromhex(user.address[2:]) tx = { 'to': bridge.address, 'data': calldata, 'nonce': w3.eth.get_transaction_count(user.address), 'gas': 100000, 'gasPrice': w3.to_wei('20', 'gwei'), } signed_tx = user.sign_transaction(tx) tx = w3.eth.send_raw_transaction(signed_tx.raw_transaction) assert_no_revert(tx) ``` `eth/vm/computation.py` for breakpoints and printing: ```python @classmethod def apply_computation( cls, state: StateAPI, message: MessageAPI, transaction_context: TransactionContextAPI, parent_computation: Optional[ComputationAPI] = None, ) -> ComputationAPI: ... opcode_lookup = computation.opcodes for opcode in computation.code: try: opcode_fn = opcode_lookup[opcode] except KeyError: opcode_fn = InvalidOpcode(opcode) if show_debug2 or 1: # We dig into some internals for debug logs base_comp = cast(BaseComputation, computation) try: mnemonic = opcode_fn.mnemonic except AttributeError: mnemonic = opcode_fn.__wrapped__.mnemonic # type: ignore print( f"OPCODE: 0x{opcode:x} ({mnemonic}) | " f"pc: 0x{max(0, computation.code.program_counter - 1):x} | " f"stack: {base_comp._stack} | " f"Gas: {computation.get_gas_remaining()}" ) try: opcode_fn(computation=computation) except Halt: break except Exception as e: print(e) raise e return computation ```