MedGa: Make Ethereum Debugger Great Again

# Introduction

MedGa (**M**ake **E**thereum **D**ebugger **G**reat **A**gain, previously *EtherDebug*) is a team dedicated to enhancing the debugging experience in smart contract development. Our team currently consists of two members, each with over ten years of software development experience: [Dr. Zhuo Zhang](https://zzhang.xyz) and [Dr. William Cheung (Wuqi Zhang)](https://troublor.xyz/). The goal of MedGa is to **develop a robust debugger for Solidity akin to GDB for C/C++ programming, offering functionalities that surpass existing solutions in the market**.

# Problem Statement

Many typical (yet critical) debugging functionalities adopted by traditional debuggers are still missing from smart contract debuggers, e.g., arbitrary expression watchers and conditional breakpoints.

# Roadmap

We plan to start developing the debugger based on [Foundry](https://github.com/foundry-rs/foundry), which the majority of Web3 projects are currently using. We will gradually add support for [Hardhat](https://hardhat.org/) following the development of the Foundry version, albeit with a slight delay.

Note that our core technique is standalone and does not depend on specific development toolkits. However, developers tend to prefer using existing toolkits over adopting new tools. As a result, we propose to integrate our debugger into existing development toolkits to better serve the ecosystem and achieve a broader impact.

## Task I: On-chain Contract Tweak

### Objective

The on-chain contract tweak aims to change the behavior of an already deployed on-chain contract during debugging. It represents the first step in enhancing and streamlining the debugging experience. Through this feature, users can collect valuable information and modify the contract's behavior, e.g., by integrating `printf` functions into on-chain contracts as if they were local.
Besides enabling numerous possibilities for a better debugging experience, this _tweak_ feature also serves as the foundational building block for other tasks in this proposal.

### Subtasks

#### Subtask 1: forge clone

Implement a Foundry plugin that clones on-chain contracts into a Foundry project, allowing users to easily modify the source code and compile it using the same compiler settings.

#### Subtask 2: forge tweak

Using the user-modified code, generate the corresponding deployed code that follows the exact deployment process of the original contract. During this process, multiple subtle issues may need to be overcome. The deployed code will then be embedded in the forking environment used by debuggers, supporting the on-chain tweak feature.

### Current Status

We have already completed this stage by implementing plugins for Foundry, such as the `forge clone` and `forge replay` commands, which allow developers to tweak an on-chain contract (e.g., print the value of a variable using `console.log`). **The `forge clone` command has been officially merged into the Foundry toolkit!**

[Our project](https://github.com/EtherDebug/foundry/blob/tweak/TWEAK.md) has attracted considerable attention on Twitter.

![twitter recognition](https://hackmd.io/_uploads/SJsKpVv-C.png)

We have also been recognized by [Week in Ethereum News](https://twitter.com/WeekInEthNews) twice.
![week in ethereum news](https://hackmd.io/_uploads/S1pJpVvbR.png)

### Traction

![traction](https://hackmd.io/_uploads/SyP3WrDZA.png)

## Task II: Implement Arbitrary Watchers in Solidity for Enhanced Debugging

### Objective

**Arbitrary watchers are a feature not supported by existing smart contract debuggers, yet they are a common component in traditional debuggers and have high demand from developers.**

Developers often wish to monitor the real-time values of specific Solidity expressions during debugging sessions, such as checking if borrowing is allowed on an account during a collateral deposit transaction. However, this task is complex, even with the successful development of a new debugging format from the [EthDebug](https://github.com/ethdebug/format) group. Inspecting variable values alone is often insufficient for thorough debugging. Developers may need to evaluate expressions to gain deeper insights, which requires not only interpreting complex Solidity expressions but also handling function invocations to external contracts. Supporting this feature requires dynamic evaluation of Solidity expressions based on the current contract state and execution context.

### Task Description

In this task, we will implement arbitrary watchers by dynamically instrumenting Solidity expressions into the original contract and collecting evaluated results at runtime. This approach leverages our "forge tweak" technique developed in **Task I**. We will also introduce a new panel in the user interface of the current Foundry debugger to allow users to specify arbitrary watchers during debugging.

### Note

This initiative is orthogonal to the ongoing [EthDebug](https://github.com/ethdebug/format) project. While EthDebug aims to devise a proper debugging format that facilitates better inspection of variable values, evaluating arbitrary Solidity expressions dynamically requires additional development efforts and is beyond their scope.
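To make the instrumentation idea behind watchers more concrete, the following is a minimal sketch in Python of inserting a probe for a watched expression into Solidity source text. It is an illustration only and not the actual `forge tweak` implementation: the helper `instrument_watch`, the sample `Vault` contract, and the choice of `console.log` as the probe are all assumptions made for this example. It also omits the `console` import and the recompilation and redeployment steps.

```python
# Hypothetical sketch: source-level instrumentation of a watched expression.
# None of the names below come from Foundry; they are illustrative only.

SAMPLE = """contract Vault {
    mapping(address => uint256) public collateral;

    function deposit(uint256 amount) external {
        collateral[msg.sender] += amount;
    }
}"""


def instrument_watch(source: str, line_no: int, expr: str) -> str:
    """Return a copy of `source` with a logging probe for `expr`
    inserted right after line `line_no` (1-indexed).

    Assumption: `expr` is valid at that location, contains no double
    quotes, and has a type that `console.log` can print.
    """
    lines = source.splitlines()
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"line_no must be between 1 and {len(lines)}")

    # Reuse the indentation of the target line so the probe blends in.
    target = lines[line_no - 1]
    indent = target[: len(target) - len(target.lstrip())]

    # The probe labels the value with the watched expression itself.
    probe = f'{indent}console.log("watch[{expr}]", {expr});'
    return "\n".join(lines[:line_no] + [probe] + lines[line_no:])


if __name__ == "__main__":
    # Watch the caller's collateral right after it is updated (line 5).
    print(instrument_watch(SAMPLE, 5, "collateral[msg.sender]"))
```

In this sketch the instrumented source would still need to be compiled and swapped in for the deployed code, which is the part that the tweak technique from Task I is meant to handle.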
## Task III: Refine Source Code Integration and Execution in Debugger

### Objective

Existing debuggers struggle to integrate source code in several respects. First, most debuggers perform step-by-step execution at the EVM instruction level. Although useful, this does not align well with the debugging practices commonly employed in traditional Web2 development. For instance, GDB supports step-by-step execution at the source code level, which is more intuitive for developers. Additionally, in the Foundry debugger, when code execution moves into an on-chain contract, no source code is displayed, significantly degrading the user experience. Lastly, as observed by the [EthDebug group](https://ethdebug.github.io/format/), the current source map is inaccurate. For instance, some mappings from EVM instructions to source code may be missing, leading to misinformation during debugging.

In this task, we aim to enhance the user experience of the Foundry debugger by resolving these issues.

### Task Description

#### Subtask 1: step-by-step execution at source code level

Support step-by-step execution (both forward and backward) at the source code level. This will entail designing effective snapshots during EVM execution. The optimization of resource overhead may be left as future work.

#### Subtask 2: display source code downloaded from Etherscan

Implement a side-by-side display of contract source code not only for local Foundry projects but also for external on-chain contracts (if any are invoked). When execution steps into the call of an on-chain contract, use the `forge clone` command, developed in Task I, to fetch the source code of these external contracts and display it in the debugger TUI.

#### Subtask 3 (next proposal): introduce a workaround to overcome imprecision in source map

Improve the resolution of source code location during debugging.
The source map produced by the Solidity compiler is imprecise, i.e., some mappings from EVM instructions to source code may be missing, leading to misinformation during debugging. We will enhance the source map by analyzing the contract's control flow (at the source code level) to determine the proper source location.

This subtask is optional for two reasons:

1. The _EthDebug_ team is actively working on inventing a proper debugging format to replace the source map. Once developed, this issue should be resolved.
2. It is a fundamentally challenging task, requiring considerable effort which is beyond the time and monetary budget of this proposal.

However, supporting this subtask would be beneficial since, even after the new debugging format from EthDebug is implemented, on-chain contracts deployed previously will still suffer from the same problem.

#### Subtask 4 (next proposal): display decompiled code for on-chain contracts without verified source code

Investigating transactions involving unverified contracts is crucial, especially when conducting attack investigations. This subtask requires embedding a decompiler into the debugger, a task whose research and engineering efforts significantly exceed the current budget. Therefore, it remains optional for this proposal. However, it is expected to be supported in the near future due to its high demand.

## Task IV: Advanced Conditional Breakpoints Written in Solidity

### Objective
Conditional breakpoints, which are breakpoints with a condition, are another typical debugging feature for traditional debuggers and are also in high demand according to our developer interviews. We aim to provide debugging capabilities that temporarily halt execution only when a specified condition, expressed in Solidity, is met. **It's important to note that no existing debuggers currently support conditional breakpoints written in Solidity.**

Conditional breakpoints are particularly useful when a piece of contract code executes multiple times in a transaction and developers need to inspect variable values at a specific contract state. For example, they can halt execution when the liquidity of a specific Uniswap pool falls below 100,000 during a swap transaction.

### Task Description (Next Proposal)

This feature can be built upon the watchers developed in **Task II**, and we aim to provide greater flexibility and enhance the efficiency of developers by allowing them to debug transaction executions with conditional breakpoints.

## Task V: Migrate Debugging Functionalities into Hardhat

### Objective

Considering that Hardhat remains a vital toolkit in Web3 development and comprises a significant portion of Web3 projects, it is crucial for us to ensure that Hardhat developers have the same debugging experience as those using Foundry. Therefore, we will continuously migrate developments from Foundry (Tasks I to V) to the Hardhat ecosystem.

However, it is important to note that some features might be challenging to directly migrate to Hardhat (without additional efforts), as **Hardhat does not yet have a well-developed interactive debugger**. Fortunately, the Nomic team (the creators of Hardhat) already has plans to build a debugger for Hardhat. As a result, in this proposal, we will first create the wrapper code for our debugger to be used with Hardhat.
In the next proposal, we will discuss our future development plan for building a well-designed debugger in collaboration with the Nomic team.

### Task Description

#### Subtask 1: Support `clone` and `tweak` features for Hardhat

Given our successful development of the `clone` and `tweak` features in Foundry, we aim to integrate these two features into Hardhat. This is also a good starting point for Hardhat migration, since both features are keystones of our debugging functionalities. Additionally, as far as we know, the only debugging functionality Hardhat natively supports is adding `console.log`. These two features will significantly (and natively) enhance the current Hardhat debugging experience.

We also aim to attract more open-source contributors to this Hardhat debugging development, possibly with support from the Ecosystem Support Program and the Hardhat core team.

#### Subtask 2 (partially): adding support for our new REVM debugger layer in Hardhat

Since Hardhat is now working on REVM, it aligns better with our original design. In this subtask, we will add connecting code in Hardhat to communicate with our debugger layer in REVM. Hardhat would be able to get all the information from our debugger layer, which can be helpful for future debugger development.

#### Subtask 3 (next proposal): build a Hardhat debugger prototype

Given that building a debugger is not a short-term goal for the Nomic team, we would like to communicate with them to first build a prototype to benefit the ecosystem. During this process, we will keep in mind that Slang may become the major compiler in Hardhat for compatibility.

This task may require more discussion, especially regarding the Nomic team's own debugger plan. We are willing to remove this subtask per suggestion.

## Task VI: Improve Debugger UI/UX

### Objective

Given the new functionalities we've added, it’s crucial to modify and enhance the current debugger's UI/UX.
This could include adding new panels for the newly added features, creating more user-friendly debugging commands, and more.

### Task Description (Partially)

We will continuously integrate the functionalities of other popular debuggers (e.g., [GDB](https://web.eecs.umich.edu/~sugih/pointers/summary.html)'s `break`, `delete`, `clear`, `continue`, `step`, `next`, `until`, `list`, and `print` commands) into the Foundry and Hardhat debuggers, improve the UI, and provide a user experience as smooth as that of debuggers for other programming languages. This task will be developed continually throughout the whole project.

## Timeline

The time range of this proposal spans from January 2024 to April 2025. Tasks scheduled after April 2025 are part of a follow-up project, for which we will submit a separate proposal upon successful completion of tasks in this proposal.

Please note that while development of Task I began in early 2024, it has been slightly delayed by one quarter due to a constrained funding situation (which is the main reason we are requesting a small grant from the Ethereum Foundation). We are making efforts to catch up.

![EtherDebug (EDB)](https://hackmd.io/_uploads/BkcbFl-EA.svg)

## Budget

We intend to use the grant as compensation for two team members in this project. Specifically, referring to the timeline:

- By the completion of Task V.1 (end of Q2 2024), we will spend $7,500 for team member compensation.
- By the completion of Task III.1 and Task V.1 (end of Q3 2024), we will spend $11,250 for team member compensation.
- By the completion of Task II, Task IV, and Task III.2 (end of Q1 2025), we will spend $11,250 for team member compensation.

# Our Team

Our team consists of two members who have over 10 years of software development experience, Dr. Zhuo Zhang and Dr. Wuqi Zhang.

Dr. Zhuo Zhang, currently a Post-doc researcher at Purdue University, specializes in compilation theory with a broad interest in software engineering.
His research encompasses software testing and security, blockchain security, and reverse engineering. Academically, Dr. Zhang has been recognized for his contributions, receiving the SIGPLAN OOPSLA 2019 Distinguished Paper Award for his advanced analysis technique on post-compilation bytecode. He was also a CSAW 2021 Best Applied Security Paper Award Top-10 Finalist, acknowledging his work in general software security.

Beyond academia, Dr. Zhang is an active participant in the blockchain community, notably as a whitehat hacker. He has been awarded over $200K in bug bounties for identifying critical vulnerabilities in foundational projects, including the Ethereum Name Service (ENS). His expertise was further highlighted by winning the Paradigm CTF 2023.

Dr. Zhang's skills in compiler and bytecode analysis are invaluable to our team, particularly in the development of a high-quality Solidity compiler add-on. This tool is integral to our state-of-the-art smart contract debugger, leveraging his extensive knowledge and experience.

Dr. Zhuo Zhang's reference links:

+ Homepage: https://zzhang.xyz
+ Github: https://github.com/ZhangZhuoSJTU
+ Twitter: https://twitter.com/i2huer

Dr. Wuqi Zhang, a research scholar at Purdue University and a research fellow at the Hong Kong University of Science and Technology, is deeply committed to advancing the fields of software engineering and security. His focus is particularly honed on blockchain technology and its myriad applications. Dr. Zhang's scholarly contributions have garnered recognition within the academic community, as evidenced by accolades such as the HKUST Redbird Academic Excellence Award. His prolific research output includes over five publications in leading software engineering and security journals, underscoring his expertise in DApp bug detection, contract bytecode analysis, and vulnerability detection.

Before his current academic pursuits, Dr.
Zhang honed his practical skills as a Solidity developer at the Hong Kong Applied Science and Technology Institute (ASTRI). This blend of theoretical knowledge and hands-on experience positions him uniquely for the proposed project. Dr. Zhang's profound understanding of smart contract development and bytecode analysis will be instrumental in driving the project forward, particularly in the critical areas of designing and developing an innovative smart contract debugger.

Dr. Wuqi Zhang's reference links:

+ Homepage: https://troublor.xyz
+ Github: https://github.com/Troublor
+ Twitter: https://twitter.com/troublor

## Previous work

We have extensively researched and contributed to the open-source community in the fields of smart contract and blockchain analysis. Our development of EtherDebug builds on a multitude of our prior works. Below, we have outlined some of our most relevant projects and publications. For comprehensive details on our full range of works and contributions, please refer to our homepages: https://zzhang.xyz and https://troublor.xyz.

A. Analysis of Smart Contract Code: EtherDebug's development necessitates a comprehensive analysis of verified Solidity code (sourced from Etherscan), which is supported by our extensive prior research:

1. "Characterizing Transaction-Reverting Statements in Ethereum Smart Contracts" (https://arxiv.org/pdf/2108.10799.pdf) - Published in the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE 2021).
2. "ÐArcher: Detecting On-Chain-Off-Chain Synchronization Bugs in Decentralized Applications" (https://arxiv.org/abs/2106.09440) - Published in the Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021). This work introduces a testing framework for DApps, specifically addressing on-chain-off-chain synchronization issues. (Project link: https://github.com/Troublor/darcher)
3. "Demystifying Exploitable Bugs in Smart Contracts" (https://www.cs.purdue.edu/homes/zhan3299/res/ICSE23.pdf) - Published in the Proceedings of the 45th ACM/IEEE International Conference on Software Engineering (ICSE 2023). It offers an in-depth analysis of exploitable bugs in smart contracts. (Project link: https://github.com/ZhangZhuoSJTU/Web3Bugs with over 1.3K stars)
4. "Combatting Front-Running in Smart Contracts: Attack Mining, Benchmark Construction, and Vulnerability Detector Evaluation" (https://arxiv.org/abs/2212.12110) - Published in Transactions on Software Engineering (TSE) 2023. This research involves mining historical front-running attacks and evaluating the vulnerability detection capabilities of state-of-the-art tools. (Project link: https://github.com/Troublor/erebus-redgiant)
5. "Your Exploit is Mine: Instantly Synthesizing Counterattack Smart Contract" - Published in the 32nd USENIX Security Symposium (Security 2023). It discusses a technique for on-the-fly blocking of smart contract attacks.
6. "Nyx: Detecting Exploitable Front-Running Vulnerabilities in Smart Contracts" - Published in the 45th IEEE Symposium on Security and Privacy (S&P 2024). This paper introduces a sound static analysis approach to narrow down the vulnerability detection search space and identify exploitable front-running opportunities in smart contracts.

B. Post-Compilation Analysis: EtherDebug's unique features, such as local variable inspection for on-chain contracts, require analysis of post-compilation bytecode. This involves code analysis without access to the source code.

1. "Revamping Binary Analysis with Sampling and Probabilistic Inference" (https://hammer.purdue.edu/articles/thesis/Revamping_Binary_Analysis_with_Sampling_and_Probabilistic_Inference/23542014) - This is the Ph.D. thesis of Dr. Zhuo Zhang. It elaborates on various state-of-the-art bytecode analysis techniques for general programs, including smart contracts as well.
2. "OSPREY: Recovery of Variable and Data Structure via Probabilistic Analysis for Stripped Binary" (https://www.cs.purdue.edu/homes/zhan3299/res/SP21a.pdf) - Published in the 42nd IEEE Symposium on Security and Privacy (S&P 2021), this paper introduces a novel probabilistic approach for recovering data structures from compiled bytecode.
3. "BDA: Practical Dependence Analysis for Binary Executables by Unbiased Whole-Program Path Sampling and Per-Path Abstract Interpretation" - Featured in the Proceedings of the ACM on Programming Languages Volume 3, Issue OOPSLA (OOPSLA 2019), this research presents a practical method for data dependence analysis in bytecode.