# GREEN (GREEdy optimizatioN of EVM bytecode) ## Project Abstract ### What problem are you trying to solve? We are trying to build advanced EVM optimization technology amalgamating three key features that, until now, are not unified in any other tool: 1. achieving important optimizations in the use of the memory/storage and detecting inefficient code patterns generated by the compiler, 2. inferring inter-block information and using it for optimizing the EVM sequences going beyond superoptimization, 3. overcoming SAT/SMT performance limitations when optimizing medium-size sequences of EVM instructions. We expect to achieve gains of 6% bytes-size reduction and 2% of gas reduction (on contracts already optimized by the solc compiler) and require less than 3 seconds to optimize a medium-size contract. Using the memory analysis, we will also detect inefficient uses of the memory in the EVM code generated by the solc compiler and propose ways to optimize them. The outcome of the GREEN project will be an advanced EVM optimization technology that can be used by means of a standalone optimization tool, but also, its components will be prepared to be integrated within an EVM compiler, as a whole or individually. --- ## Objectives ### What are you hoping to accomplish with this grant? How do you define and measure success for this project? Overall, what we are hoping to accomplish is the development of precise and scalable EVM optimization technology, which requires achieving the following objectives: * [O1] Development of a precise heap analysis to represent references and accesses to data structures stored in memory or in storage. * [O2] Development of a scalable and efficient bytecode optimizer by combining the results obtained in O1 and a greedy algorithm to overcome the limitations of the current intra-block optimizers. * [O3] Identification of optimization opportunities when accessing memory or storage elements, and proposal of concrete optimization actions. We will measure the success for the project by applying it on real contracts and experimentally proving the gains mentioned in the project summary. --- ## Outcomes ### How does this project benefit the greater Ethereum ecosystem? Our basic memory analysis, in spite of its limited precision, has already detected some cases in which the Solidity compiler generates bytecode whose memory usage can be greatly optimized (e.g., the compiler creates memory that is never used in low-level plain calls for currency transfer, namely even though the contract code does not use the second result returned by the low-level call, the compiler generates code for retrieving it). Existing optimization tools do not have the technology to detect these fragments of improvable code. The framework developed in this project will allow the detection and optimization of this kind of cases. Besides this type of improvable patterns, our advanced optimization technology can be used to optimize a considerable amount of gas and bytes-size of smart contracts in few seconds. Based on our experience on EVM bytecode optimization, we believe that the GREEN project will achieve gains of 6% bytes-size reduction and 2% of gas reduction, what is a measurable benefit for the Ethereum ecosystem, that results in reducing both the execution and deployment costs. Finally, the efficiency of the developed technology and tool make the approach fully compatible with a real-world compiler. In fact, we are aware that there are future plans to include more complex memory optimizations in the solc compiler. In all these aspects, we believe the greater Ethereum ecosystem can benefit from this project. --- ## Relevant past publications [1] The basic memory/storage analysis has been accepted to the TACAS'23 conference (to be presented in April 2023 in Paris at ETAPS 2023): Elvira Albert, Jesús Correas, Pablo Gordillo, Guillermo Román-Díez, Albert Rubio Inferring Needless Write Memory Accesses on Ethereum Bytecode https://arxiv.org/pdf/2301.04757.pdf [2] An intra-block greedy algorithm is part of a (blind) conference submission: Combining Greedy and SAT Techniques for the Synthesis of Superoptimized Stack-Bytecode https://costa.fdi.ucm.es/papers/costa/greedy.pdf [3] Elvira Albert, Pablo Gordillo, Alejandro Hernández-Cerezo, Albert Rubio, Maria Anna Schett Super-optimization of Smart Contracts. ACM Trans. Softw. Eng. Methodol. 31(4): 70:1-70:29 (2022) https://dl.acm.org/doi/10.1145/3506800 [4] Elvira Albert, Pablo Gordillo, Alejandro Hernández-Cerezo, Albert Rubio A Max-SMT Superoptimizer for EVM handling Memory and Storage. TACAS (1) 2022: 201-219 https://costa.fdi.ucm.es/papers/costa/AlbertGHR22.pdf [5] Elvira Albert, Pablo Gordillo, Albert Rubio, Maria Anna Schett Synthesis of Super-Optimized Smart Contracts Using Max-SMT. CAV(1) 2020: 177-200 http://costa.fdi.ucm.es/papers/costa/AlbertGRS20.pdf --- ## Grant Scope ### What are you going to research? What is the expected output? While intra-block optimization tools for EVM bytecode (like GASOL and reference [2]) have achieved significant optimizations (0.11% of gas reduction and 1.57% of bytes-size reduction in [2], on contracts optimized with solc 0.8.17), they suffer from two limitations: (a) They miss important optimization opportunities due to their intra-block boundary. This restriction especially affects storage/memory optimization, which can greatly benefit from a global (inter-block) analysis. (b) Optimizing code by doing an exhaustive search for finding the optimal code (e.g. using an SMT solver) is prohibitively expensive for large sequences of instructions (hence it can only be applied at an intra-block level in practice). This greatly limits the impact of these tools, as working on larger sequences (e.g., merging blocks) has more potential gains due to reusing computations. The GREEN project aims at investigating and developing new EVM optimization technology that overcomes these two limitations by: 1. Inferring global information at an inter-block level that can be exploited when optimizing each of the blocks. This global information may refer to both (i) information regarding the references used to access memory/storage, and (ii) relational properties over the elements of the stack. 2. Developing a very efficient greedy algorithm for bytecode optimization. This algorithm reconstructs a sequence of EVM instructions (from possibly several blocks) given the stack and memory/storage layout before and after its execution and basic information about the computations performed. Hence, information from phase (1) can be incorporated to produce valid bytecode. This transformation is sound and aims at minimizing the number of instructions by making the best local decisions. The expected output is a scalable and precise optimization tool that outperforms the state-of-the-art optimizers (multiplying the gains by at least 10 times wrt them) and that, due to its efficiency, can be integrated within the compilation pipeline of Ethereum smart contracts (optimization times will be less than few seconds for medium-sized contracts). --- ## Challenges ### Have you come across any obstacles thus far? If so, how have you attempted to tackle these issues? Have you been successful in overcoming them? The challenges and problems that we had so far are: 1. We have developed a memory analysis which can detect memory slot reservations (array and structs) that are never read in the subsequent instructions of the program (reference [1] above). The next challenge consists in identifying, not only the base reference of the slot, but also the different offsets used for accessing the slot elements. With this extension, we will be able to achieve more precision to identify improvable uses of the memory. 2. The reconstruction of the CFG might require cloning some of the basic blocks of EVM bytecode if they might be reached with different stack-sizes. It raises a challenge in the analysis and optimization phase as the analyses proposed have to guarantee that the relations and properties inferred are maintained in all the copies of the same basic block. In addition cloning has to be taken into account when the optimized bytecode is reconstructed. In order to finally build the complete executable bytecode, we will have to recompute the jump addresses to keep the original flow of the contract, as the optimization often reduces the instructions. 3. We have designed an efficient greedy algorithm that obtains significant gains while ensuring soundness and termination (reference [2] above). Our next challenge is to further refine this algorithm, as the current version always prioritizes duplicating already generated values and can miss some optimization gains. Besides, the algorithm can fail to produce a solution when managing a stack with more than 16 elements, we will adapt it in order to mitigate these situations. Finally, changing the scope of optimization from intra-block to inter-block also poses other challenges, as ensuring correctness becomes harder. --- ## Methodology and Timeline ### Stage 1 Brief description: Prototype (GREEN 0.1) In this stage we will develop a first prototype of the heap analysis and generalize the greedy algorithm to take into account the global information provided - Expected roadmap * T0: Development of CFG reconstructor (O1, O2). In order to infer global properties over the stack, memory and storage layout, the heap analysis requires a model of the execution flow of the EVM bytecode by means of an enriched CFG. The enriched CFG gives us precise information that will be used in the following steps. * T1: Development of heap analysis (O1,O2,O3). This analysis allows us to infer a coarse-grained model of the structures stored in memory/storage. * T2: Development of a prototype of a greedy optimization algorithm (O2) Starting from a jump-free sequence of instructions, this algorithm will generate an equivalent sequence aiming at reducing its size and gas consumption. It outperforms previous algorithms as it does not rely in any external solver. - Expected result: Release of GREEN 0.1, an optimization tool that includes a memory analysis that detects needless memory/storage accesses and uses the greedy algorithm instead of SMT solvers for optimization and a corresponding deliverable. * Ballpark Grant ask 20.000 USD 300 hours of the 5 team members 4 months part-time contract of PhD student ### Stage 2 Brief description: System (GREEN 0.2) In this stage we will improve the optimizer by (i) enhancing the precision of the heap analysis and integrating the global information into the greedy algorithm, and (ii) considering larger sequences of bytecode by merging multiple basic blocks generated by the compiler. - Expected roadmap * T3: Extension of heap analysis to handle more precise allocation information (O1, O3) This analysis will produce a finer-grained model of the memory/storage by obtaining the particular memory locations accessed at each program point. * T4: Integration of global information into the greedy algorithm (O2) The greedy algorithm will be adapted to take advantage of the analysis results produced in T3. It will increase the potential gains obtained by the greedy algorithm. * T5: Analysis to join multiple basic blocks (O2) This analysis will identify sets of blocks that can be soundly merged together into larger sequences of instructions. It will improve the results of the greedy algorithm by (i) reusing intermediate computations, and (ii) removing jump instructions. - Expected result: Release of GREEN 0.2, an optimizer that improves the precision and savings of the previous version and the corresponding deliverable * Ballpark Grant ask 20.000 USD 300 hours of the 5 team members 4 months part-time contract of PhD student ### Stage 3 Brief description: Release of optimization tool (GREEN 0.3) In this stage we will develop an EVM bytecode reconstructor to integrate it with the solc compiler and evaluate the final outcome of the project. - Expected roadmap * T6: Development of EVM bytecode reconstructor (O2) Reconstruct the optimized EVM bytecode to make it compatible with the solc compiler, maintaining the semantics and flow of the original one. * T7: Testing and experimental evaluation (O2) - Expected result: Release of GREEN 0.3, an EVM bytecode optimization tool and a final project deliverable - Ballpark Grant ask 20.000 USD 300 hours of the 5 team members 4 months part-time contract of PhD student --- ## Project Team We are all members of the research group, COSTA https://costa.fdi.ucm.es/web/. The COSTA team devises formal techniques and models for optimizing, verifying and understanding programs, and develops related implementation technology. Since 2018, the group is applying this technology on EVM bytecode. Detailed information on team members, publications, projects, research activities, etc. is accessible from: <p style="text-align: center;"><a href="https://costa.fdi.ucm.es/web/"> https://costa.fdi.ucm.es/web/ </a></p> Elvira Albert is a Full Professor at the Universidad Complutense de Madrid (Spain). Elvira has ample experience in project leading: she has been principal investigator of the FORVES and GASOL projects funded by the Ethereum Foundation, and of the Envisage and HATS projects funded by the European Commission, as well as principal investigator of several national projects. Her work on optimizing and verifying security and safety properties of Ethereum smart contracts has been published at prestigious international conferences such as CAV, OOPSLA, ISSTA, TACAS and ATVA, and she has been invited to give keynote talks about her work on EVM at recognized events: IJCAR'22 (main conference), VPT'22 (ETAPS workshop) and SOAP'22 (PLDI workshop). Elvira is author of around 150 publications in international journals and volumes strongly related to the topics of this project. More info at: https://costa.fdi.ucm.es/~elvira/ Pablo Gordillo is an Assistant Professor at the Universidad Complutense de Madrid since October 2021. He took part in the GASOL project (funded by the Ethereum Foundation) as researcher and is one of the main developers of the GASTAP, SAFEVM and GASOL tools for the analysis and optimization of smart contracts. More info at: https://costa.fdi.ucm.es/~pabgordi/ Alejandro Hernández-Cerezo is a PhD student at the Universidad Complutense de Madrid (Spain) under the supervision of Prof. Elvira Albert and Prof. Albert Rubio. His thesis focuses on optimization of EVM bytecode, combining static analysis and encodings with soft constraints. Alejandro spent 3 months in the Formal Verification Team at the EF in Berlin, under the supervision of Leo Alt. More info at: https://costa.fdi.ucm.es/~ahernandez/ Jesús Correas is an Associate Professor at the Universidad Complutense de Madrid (Spain). Since 2018, he is working on the analysis and verification of Ethereum smart contracts, with several publications in journals and international conferences such as TACAS, ISSTA and ICST. More info at: https://costa.fdi.ucm.es/~jcorreas/ Guillermo Román is an Associate Professor at the Universidad Politécnica de Madrid (Spain). Currently, he is working on the analysis and verification of Ethereum smart contracts, with several publications in journals, like the Journal of Systems and Software, and in international conferences such as TACAS, ISSTA and ICST. More info at: https://costa.fdi.ucm.es/groman/ Albert Rubio is a Full Professor at the Universidad Complutense de Madrid (Spain). Besides his work on the GASOL project and his long experience and research impact in the application of formal methods in computer science, he is currently leading the development of the circom programming language and compiler. Circom is a DSL for designing arithmetic circuits for ZK protocols whose compiler is open source and fully written in Rust (https://github.com/iden3/circom). His experience in the design of compilers will be very valuable for the success of the GREEN project. More info at: https://costa.fdi.ucm.es/~arubio/ --- ## Budget 60.000 USD