Code chunking is a strategy for breaking up bytecode into smaller chunks, which helps reduce witness size. However, this approach is only effective if a relatively small portion of the bytecode is accessed during execution. This analysis explores byte and chunk access patterns to evaluate the utility of code chunking.
The complete repository which includes the data collection and analysis code can be found here.
This section examines the proportion of bytecode accessed during contract execution:
Core Finding: On average, only 22.8% of the contract bytecode was accessed in a block.
Detailed Insights:
Interpretation: For contracts larger than 1 KiB, only a small fraction of the bytecode is accessed. Contracts under 1 KiB tend to have more of their bytecode accessed. This is expected, as small contracts usually contain fewer functions that are repeatedly invoked.
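To make the metric concrete, here is a minimal sketch of how a bytes accessed ratio could be computed for one contract. It assumes the set of accessed byte offsets has already been extracted from an execution trace; the function name `byte_access_ratio` and the toy data are hypothetical and are not taken from the analysis repository.

```python
from typing import Iterable

def byte_access_ratio(code_size: int, accessed_offsets: Iterable[int]) -> float:
    """Fraction of a contract's bytecode bytes that were accessed."""
    if code_size == 0:
        return 0.0
    # Deduplicate offsets and ignore any that fall outside the bytecode.
    in_range = {o for o in accessed_offsets if 0 <= o < code_size}
    return len(in_range) / code_size

# Toy example (made-up data): a 100-byte contract where offsets
# 0..19 and 50..52 were accessed during a block.
toy_offsets = set(range(0, 20)) | {50, 51, 52}
toy_ratio = byte_access_ratio(100, toy_offsets)
print(f"{toy_ratio:.1%}")  # 23.0%
```

Averaging this value over the contracts touched in each block would give a figure comparable to the core finding above, assuming the same counting rules are used.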
Referencing EIP-2926 as a code chunking solution, we split the contract bytecode into 31-byte chunks and assess the proportion of 31-byte chunks accessed:
Core Finding: On average, only 29.6% of 31-byte chunks were accessed in a block.
Detailed Insights:
Interpretation: The results are similar to the bytes accessed ratio. Chunk access is also low for contracts over 1 KiB. However, the overall chunk accessed ratios are slightly higher than byte accessed ratios, suggesting that not all bytes in the accessed chunks were used.
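The relationship between the two ratios can be illustrated with a small sketch. It assumes a chunk counts as accessed when at least one of its bytes is accessed, and it uses 31 code bytes per chunk as in EIP-2926; the function name and the toy data are hypothetical.

```python
from typing import Iterable

CHUNK_SIZE = 31  # code bytes per chunk, following EIP-2926

def chunk_access_ratio(code_size: int, accessed_offsets: Iterable[int],
                       chunk_size: int = CHUNK_SIZE) -> float:
    """Fraction of chunks that contain at least one accessed byte."""
    if code_size == 0:
        return 0.0
    total_chunks = -(-code_size // chunk_size)  # ceiling division
    touched_chunks = {o // chunk_size for o in accessed_offsets
                      if 0 <= o < code_size}
    return len(touched_chunks) / total_chunks

# Toy example (made-up data): a 100-byte contract has 4 chunks.
# Offsets 0..19 fall in chunk 0 and offsets 50..52 fall in chunk 1.
toy_offsets = set(range(0, 20)) | {50, 51, 52}
toy_chunk_ratio = chunk_access_ratio(100, toy_offsets)
print(f"{toy_chunk_ratio:.1%}")  # 50.0%
```

In this toy case only 23 of 100 bytes are accessed (23%), yet 2 of 4 chunks are touched (50%). The chunk ratio can never be lower than the byte ratio, because every touched chunk is counted in full.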
In this section, we explore how efficient chunks are, i.e., how many bytes in the chunks are actually used:
Core Finding: On average, 68.9% of the bytes in the 31-byte chunks were accessed. That is roughly 21 bytes in every 31-byte chunk.
This indicates that roughly two-thirds of the bytes in 31-byte chunks were accessed, leaving about a third of each chunk unused. To improve chunk efficiency, we may consider smaller chunk sizes (e.g., 16-byte chunks). However, this comes at the cost of increased hashing overhead.
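A minimal sketch of a chunk efficiency metric follows, assuming efficiency is defined as accessed bytes divided by the total bytes in the chunks that were touched. That definition, the function name, and the toy data are assumptions made for illustration.

```python
from typing import Iterable

def chunk_efficiency(code_size: int, accessed_offsets: Iterable[int],
                     chunk_size: int = 31) -> float:
    """Accessed bytes divided by the bytes held in the touched chunks."""
    in_range = {o for o in accessed_offsets if 0 <= o < code_size}
    touched = {o // chunk_size for o in in_range}
    if not touched:
        return 0.0
    # The final chunk of a contract may be shorter than chunk_size.
    bytes_in_touched = sum(min(chunk_size, code_size - c * chunk_size)
                           for c in touched)
    return len(in_range) / bytes_in_touched

# Toy example (made-up data): 23 accessed bytes spread over two full chunks.
toy_offsets = set(range(0, 20)) | {50, 51, 52}
toy_efficiency = chunk_efficiency(100, toy_offsets)  # 23 / 62

# The "roughly 21 bytes" figure follows from the reported average:
avg_bytes_used = 0.689 * 31
print(round(avg_bytes_used, 1))  # 21.4
```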
This section evaluates the impact of opcodes that access the entire code, namely:
- EXTCODESIZE
- EXTCODECOPY
- CODECOPY
- CODESIZE
When one of these opcodes is executed, it requires access to all of the bytes in the bytecode. In the previous sections, we excluded these opcodes when computing the access ratios. Here, we assess how including them changes the access ratios. We split them into two categories:
- Code size opcodes (EXTCODESIZE, CODESIZE)
- Code copy opcodes (EXTCODECOPY, CODECOPY)
We use two categories because EIP-2926 adds the code size to the account fields. Therefore, once it is implemented, code size opcodes will no longer require access to the entire bytecode.
In total, 46.6% of contracts per block contain either the code size or code copy opcodes. Among these, 40.7% contain code size opcodes, while only 10.6% contain the code copy opcodes.
Avg Bytes Access Ratio
Avg Chunks Access Ratio
After including the code-access instructions, we see a moderate increase in the access ratios. However, most of that increase comes from code size opcodes. As mentioned above, adding the code size to the account fields would leave code copy opcodes as the only instructions that access the entire bytecode. Since code copy instructions are far less common, the overall access ratios would be lower in that case.
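The effect of the two opcode categories can be sketched as follows. This is an illustrative model, not the method used in the analysis: it assumes each contract record carries its measured ratio plus two hypothetical flags marking whether its code was read by a code size or a code copy opcode, and it treats any whole-code access as a ratio of 1.0.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ContractAccess:
    measured_ratio: float    # ratio measured without code-access opcodes
    has_code_size_op: bool   # code read via EXTCODESIZE or CODESIZE
    has_code_copy_op: bool   # code read via EXTCODECOPY or CODECOPY

def effective_ratio(c: ContractAccess, code_size_in_account: bool) -> float:
    """Access ratio once whole-code opcodes are taken into account."""
    if c.has_code_copy_op:
        return 1.0  # copying is modeled as touching the whole bytecode
    if c.has_code_size_op and not code_size_in_account:
        return 1.0  # without a stored code size, the full code is needed
    return c.measured_ratio

def average_ratio(contracts: List[ContractAccess],
                  code_size_in_account: bool) -> float:
    total = sum(effective_ratio(c, code_size_in_account) for c in contracts)
    return total / len(contracts)

# Toy block with made-up values.
toy_block = [
    ContractAccess(0.2, False, False),
    ContractAccess(0.3, True, False),
    ContractAccess(0.1, False, True),
    ContractAccess(0.4, False, False),
]
ratio_today = average_ratio(toy_block, code_size_in_account=False)
ratio_with_code_size = average_ratio(toy_block, code_size_in_account=True)
```

In this toy block the average drops from 0.65 to about 0.475 once the code size is assumed to live in the account, because only the code copy contract still counts as fully accessed.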
According to EIP-2926, the main point of code chunking is to reduce witness size, since at present the whole bytecode must be included in the code proof.
Our analysis has shown that not all of the bytes in a contract's bytecode are used. In fact, only a relatively small proportion of the bytes and chunks are used. Based on the current access patterns, implementing code chunking would significantly reduce the number of bytes included in the code witness.
The addition of the code size to the account fields in EIP-2926 would effectively make code copy opcodes the only instructions that require access to the entire bytecode. In addition, as shown in our findings, code copy opcodes are significantly less common than code size opcodes. Therefore, we would further reduce the average code witness size based on the current access pattern.
One additional exploration that we can conduct is to determine the optimal chunk size. EIP-2926 uses 31-byte chunks. We may want to explore smaller chunk sizes, such as 16 bytes, to maximize the number of bytes utilized per chunk. However, this comes at the cost of additional hashing overhead. Therefore, we need to experiment with different chunk sizes to find the optimal balance.
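Such an experiment could start from a sweep like the sketch below. It replays one set of accessed offsets against several chunk sizes and reports the number of touched chunks (a rough proxy for hashing and proof overhead) alongside the code bytes that would enter the witness. The function name, the candidate sizes, and the toy data are assumptions, and a real evaluation would also need to account for per-chunk proof costs.

```python
from typing import Dict, Iterable

def chunk_stats(code_size: int, accessed_offsets: Iterable[int],
                chunk_size: int) -> Dict[str, float]:
    """Touched chunk count, witness code bytes, and efficiency for one size."""
    in_range = {o for o in accessed_offsets if 0 <= o < code_size}
    touched = {o // chunk_size for o in in_range}
    # The final chunk of a contract may be shorter than chunk_size.
    witness_bytes = sum(min(chunk_size, code_size - c * chunk_size)
                        for c in touched)
    efficiency = len(in_range) / witness_bytes if witness_bytes else 0.0
    return {"chunks": len(touched),
            "witness_bytes": witness_bytes,
            "efficiency": efficiency}

# Toy example (made-up data): a 100-byte contract, 23 accessed bytes.
toy_offsets = set(range(0, 20)) | {50, 51, 52}
sweep = {size: chunk_stats(100, toy_offsets, size) for size in (8, 16, 31)}
for size, stats in sweep.items():
    print(size, stats["chunks"], stats["witness_bytes"])
```

For this toy input, 8-byte chunks need 4 chunks covering 32 bytes, 16-byte chunks need 3 chunks covering 48 bytes, and 31-byte chunks need 2 chunks covering 62 bytes. Smaller chunks waste fewer bytes but increase the number of chunks that must be hashed and proven, which is the trade-off described above.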