# CDAP Development Update 6

During the past month, I spent most of my time testing, fixing bugs in, and optimizing the tool, then running it against thousands of historical mainnet blocks and analyzing the collected data together with the tracing results.

## Statistics and charts

I ran [this tool](https://github.com/alexchenzl/predict_al) against an archive Geth node for blocks 13580500 to 13585500 (exclusive) to get the prediction results. I then ran [an access list tracer](https://github.com/alexchenzl/TraceHelper) over the same blocks to get the access list recorded during each transaction's real execution. Finally, I loaded all of the results into a PostgreSQL database and analyzed the data with SQL and pandas.

#### Summary of data

* The average Geth state access RPC time is about 250ms in the testing environment
* Some constraints were set for the prediction tool:
    * the maximum number of rounds is 20
    * the prediction timeout is 3 seconds
* 5000 blocks in total, containing 1043603 transactions:
    * 372033 are simple Ether transfers; they are excluded from further analysis
    * 671570 are contract calls or contract creations; the analysis below is based on them

#### Time of tracing

Here we record the time taken to trace each transaction as an estimate of its real execution time. The time unit in the chart is milliseconds.

![](https://i.imgur.com/P79eSKi.png)

The average time is 40.84ms and the median is 9.06ms; 90% of transactions take less than 105ms and 99% take less than 300ms. Because tracing adds overhead, the real execution time of a transaction on the archive node should be lower than this.

#### Rounds/total-state-accesses

Here we define total-state-accesses as the number of accounts accessed plus the number of storage slots accessed during a transaction's execution. Rounds is the number of execution rounds needed to predict the transaction's access list: every transaction needs at least one round, and the maximum was limited to 20 in this test. The chart shows the ratio rounds/total-state-accesses for each transaction.
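A minimal sketch of this metric in plain Python, assuming (hypothetically) that each transaction's traced access list is available as a dict mapping an account address to the set of storage slots accessed in it; the tracer's real output format may differ, and the helper names here are illustrative only. `summarize` computes the same kind of figures reported for the chart: the mean ratio and the shares of transactions below 0.8 and at or above 1.

```python
from statistics import mean
from typing import Dict, List, Set

# Hypothetical layout: account address -> storage slots accessed in that account.
AccessList = Dict[str, Set[str]]


def total_state_accesses(access_list: AccessList) -> int:
    """Number of accounts accessed plus number of storage slots accessed."""
    return len(access_list) + sum(len(slots) for slots in access_list.values())


def rounds_ratio(rounds: int, access_list: AccessList) -> float:
    """rounds / total-state-accesses for one transaction."""
    total = total_state_accesses(access_list)
    if total == 0:
        raise ValueError("transaction accessed no state; ratio is undefined")
    return rounds / total


def summarize(ratios: List[float]) -> Dict[str, float]:
    """Mean ratio plus the shares of transactions below 0.8 and at or above 1."""
    n = len(ratios)
    return {
        "mean": mean(ratios),
        "share_below_0_8": sum(r < 0.8 for r in ratios) / n,
        "share_at_least_1": sum(r >= 1 for r in ratios) / n,
    }


# Usage: one account with two slots plus one account with no slots -> 4 accesses.
example = {"0xA": {"0x0", "0x1"}, "0xB": set()}
print(total_state_accesses(example))  # 4
print(rounds_ratio(2, example))       # 0.5
```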
![](https://i.imgur.com/8sbUBwZ.png)

The average ratio is 0.3, 99% of transactions have a ratio below 0.8, and only 0.68% have a ratio of 1 or greater.

#### Account-hit-rate

Here we define accounts-accessed as the set of all accounts accessed during a transaction's real execution, accounts-predicted as the set of all accounts predicted by the tool, and accounts-matched as the accounts that appear in both sets. The account-hit-rate is then number-of-accounts-matched / number-of-accounts-accessed, i.e. the percentage of accessed accounts that the tool predicted correctly.

![](https://i.imgur.com/YV05owK.png)

The average hit rate is 96.2%, and 86.6% of transactions have a hit rate of exactly 1 (100%).

#### Slot-hit-rate

The definition is the same as for account-hit-rate, except that the items analyzed are contract storage slots instead of accounts.

![](https://i.imgur.com/Lo92pR1.png)

The average hit rate is 81.56%, and 79.67% of transactions have a hit rate of exactly 1 (100%).

#### Accounts most frequently accessed

The accounts accessed by a transaction form a set without duplicate addresses, but the same account can appear in the accessed sets of many different transactions.

* 2422725 account access records in total
* 293705 unique accounts
* The top 5 accounts make up 18.78% of all account access records
* The top 10 accounts make up 25.47%
* The top 20 accounts make up 30.55%
* The top 50 accounts make up 38.73%
* The top 100 accounts make up 45.17%

#### Contract methods most frequently accessed

A transaction may make many contract method calls, but each contract method is counted only once per transaction, even if it is called several times. Here we use a "contract-method signature" of the form `<contract address>-<method selector>` as the key for this statistic, for example `0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2-0xa9059cbb`.
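The hit rates and the top-N shares in this section reduce to set and counting operations. The sketch below is a plain-Python equivalent, not the SQL/pandas code actually used; it assumes, hypothetically, that accessed and predicted items are sets of strings (addresses, or address/slot pairs encoded as strings) and that each transaction's method calls arrive as a list of keys that may contain repeats. `method_key` and the lowercase normalization are illustrative choices.

```python
from collections import Counter
from typing import Iterable, Set


def hit_rate(accessed: Set[str], predicted: Set[str]) -> float:
    """matched / accessed: share of really accessed items that were predicted."""
    if not accessed:
        return 1.0  # hypothetical convention: nothing to predict counts as a full hit
    return len(accessed & predicted) / len(accessed)


def method_key(contract: str, selector: str) -> str:
    """'<contract address>-<method selector>', lowercased to normalize hex case."""
    return f"{contract.lower()}-{selector.lower()}"


def top_n_share(per_tx_keys: Iterable[Iterable[str]], n: int) -> float:
    """Share of all records held by the n most frequent keys.

    Each key is counted at most once per transaction, matching the rule above.
    """
    counts: Counter = Counter()
    for keys in per_tx_keys:
        counts.update(set(keys))  # deduplicate within one transaction
    total = sum(counts.values())
    top = sum(count for _, count in counts.most_common(n))
    return top / total if total else 0.0


# Usage
print(hit_rate({"a", "b", "c", "d"}, {"a", "b", "x"}))  # 0.5
calls = [["k1", "k1", "k2"], ["k1"], ["k3"]]
print(top_n_share(calls, 1))  # 0.5 (k1 is counted twice out of 4 records)
```

The same `top_n_share` helper serves both frequency statistics: pass per-transaction account sets for the account figures, or per-transaction `method_key` lists for the contract-method figures.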
* 2245330 contract method call records in total
* 89647 unique contract methods
* The top 5 contract methods make up 20.09% of all records
* The top 10 contract methods make up 27.06%
* The top 20 contract methods make up 34.0%
* The top 50 contract methods make up 45.24%
* The top 100 contract methods make up 53.79%

## Conclusion and thoughts

* The execution time of a transaction on a Geth archive node with local SSD storage is much shorter than the time needed to access data over the network: the median tracing time above is 9.06ms, while a single state access RPC takes about 250ms on average in the testing environment. A portal client will need to retrieve state from the network to execute a transaction, so access list prediction should help reduce the number of rounds needed to fetch that data.
* This prototype demonstrates that it is possible to predict the access list, at least partially, in most cases.
* There are some potential ways to improve this tool:
    * Currently the tool often stops only when a round finds no new state. A better way is needed to avoid this extra final round in some cases.
    * Some of the most frequently accessed accounts and contract methods should be investigated; they could perhaps serve as a local dictionary to speed up prediction.
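This post does not describe the tool's internals, so the following is only a hypothetical Python sketch of a round-based prediction loop under the constraints listed earlier (at most 20 rounds, a 3-second timeout). `execute` is an assumed stand-in for running the transaction against the state fetched so far and returning every state key it touched, and the toy executor reveals a chain of keys one per round. The sketch illustrates the extra final round described above: the loop can only stop after a round that discovers nothing new.

```python
import time
from typing import Callable, List, Set, Tuple

StateKey = str  # hypothetical encoding, e.g. "acct:0xA" or "slot:0xA:0"


def predict_access_list(
    execute: Callable[[Set[StateKey]], Set[StateKey]],
    max_rounds: int = 20,
    timeout_s: float = 3.0,
) -> Tuple[Set[StateKey], int]:
    """Re-execute until a round touches no new state, or a limit is hit."""
    known: Set[StateKey] = set()
    deadline = time.monotonic() + timeout_s
    rounds = 0
    # In this sketch the timeout is only checked between rounds.
    while rounds < max_rounds and time.monotonic() < deadline:
        rounds += 1
        new = execute(known) - known
        if not new:
            break  # this last round only confirmed that nothing new was needed
        # A real client would fetch the new keys from the network here.
        known |= new
    return known, rounds


# Toy executor: execution cannot get past the first key whose state is missing.
CHAIN: List[StateKey] = ["acct:0xA", "slot:0xA:0", "acct:0xB"]


def toy_execute(known: Set[StateKey]) -> Set[StateKey]:
    touched: Set[StateKey] = set()
    for key in CHAIN:
        touched.add(key)
        if key not in known:
            break
    return touched


keys, rounds = predict_access_list(toy_execute)
print(sorted(keys), rounds)  # ['acct:0xA', 'acct:0xB', 'slot:0xA:0'] 4
```

Three dependent keys take four rounds here; the fourth round fetches nothing and exists only to confirm that the access list is complete.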