# Make script to define optimal batch size and concurrency of blocks import

## Motivation

Blockscout requests the archive node with batched JSON RPC requests, and the batch size is crucial to the response time. Moreover, different Ethereum clients can have different optimizations for batched JSON requests. Before starting Blockscout, it would be awesome to have an idea of the optimal batch size and concurrency for the JSON RPC requests that Blockscout utilizes.

## Description

Write a script which measures the response times of successful JSON RPC requests to the node and suggests the optimal batch size and concurrency.

Other quirks aside, the request part of block import in Blockscout basically consists of:

- batched requests of block data (`eth_getBlockByNumber`) with batch size `block_batch_size` and concurrency `block_concurrency`;
- then, inside each block batch, batched requests of transaction receipts (`eth_getTransactionReceipt`) with their own batch size `tx_batch_size` and concurrency `tx_concurrency`.

The task of the script is to find the optimal numbers `block_batch_size`, `block_concurrency`, `tx_batch_size`, `tx_concurrency` for a concrete archive node endpoint of a given blockchain. Optimal means that the cumulative response time needed to process the same amount of blocks is minimal.

### Suggested input parameters

- `node_endpoint` - the archive node endpoint.
- `block_num_total` - the total number of blocks for which the script should get the data.
- `block_range` - a range of valid block numbers of the blockchain within which the script should request the data. It could be optional; the default range could be [0, `max_block_number`], where `max_block_number` is the response of an `eth_blockNumber` request to the archive node.

### Script phases

1. First, the script should generate `block_num_total` random block numbers in the range `block_range` and use them in all runs inside a single execution of the script.
2. Then, the script should send batched `eth_getBlockByNumber` JSON RPC requests to the archive node's endpoint `node_endpoint` for those block numbers. The number of parallel requests, aka concurrency, is `block_concurrency`. For each given `block_batch_size`, `block_concurrency` is found from:

   `block_num_total` <= `block_batch_size` * `block_concurrency`

   and

   `block_num_total` > `block_batch_size` * (`block_concurrency` - 1)

   _The product shouldn't be strictly `block_num_total` across runs, since `block_num_total` may not be a multiple of `block_batch_size`._
3. In each block batch the script should request transaction receipts (`eth_getTransactionReceipt` requests) in separate batches with batch size `tx_batch_size` and concurrency `tx_concurrency`.

### Expected results

The script should find the 4 optimal integers `block_batch_size`, `block_concurrency`, `tx_batch_size`, `tx_concurrency` for which the cumulative response time of all requests over the `block_num_total` block numbers is minimal.

As a bonus, it could also return a log of the responses:

- all block numbers which have been tried;
- the number of runs;
- which combinations of `block_batch_size`, `block_concurrency`, `tx_batch_size`, `tx_concurrency` have been tried in each run;
- the cumulative response time for each run;
- min/max/avg response times for each kind of request in the run.
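The phases above can be sketched as follows. This is a minimal illustration, not the required implementation: the helper names, the parameter grids, and the simple exhaustive search are assumptions, and a real script would also collect the min/max/avg statistics per run.

```python
# Sketch of the benchmark script (illustrative; names and grids are assumptions).
import itertools
import json
import math
import random
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def pick_block_numbers(block_num_total, block_range, seed=0):
    """Phase 1: a fixed random sample of block numbers, reused across all runs."""
    rng = random.Random(seed)
    lo, hi = block_range
    return [rng.randint(lo, hi) for _ in range(block_num_total)]


def concurrency_for(total, batch_size):
    """Smallest concurrency c with batch_size * c >= total > batch_size * (c - 1)."""
    return math.ceil(total / batch_size)


def chunks(items, size):
    """Split a list of calls into batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def rpc_batch(endpoint, calls):
    """Send one batched JSON RPC request; return (elapsed seconds, decoded body)."""
    payload = json.dumps([
        {"jsonrpc": "2.0", "id": i, "method": method, "params": params}
        for i, (method, params) in enumerate(calls)
    ]).encode()
    req = urllib.request.Request(
        endpoint, data=payload, headers={"Content-Type": "application/json"})
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return time.monotonic() - start, body


def run_once(endpoint, blocks, block_batch_size, block_concurrency,
             tx_batch_size, tx_concurrency):
    """Phases 2-3: fetch blocks in batches, then receipts inside each block batch."""
    cumulative = 0.0
    block_calls = [("eth_getBlockByNumber", [hex(n), True]) for n in blocks]
    with ThreadPoolExecutor(max_workers=block_concurrency) as pool:
        for elapsed, body in pool.map(lambda c: rpc_batch(endpoint, c),
                                      chunks(block_calls, block_batch_size)):
            cumulative += elapsed
            tx_hashes = [tx["hash"] for item in body
                         for tx in item["result"]["transactions"]]
            receipt_calls = [("eth_getTransactionReceipt", [h]) for h in tx_hashes]
            with ThreadPoolExecutor(max_workers=tx_concurrency) as tx_pool:
                for tx_elapsed, _ in tx_pool.map(lambda c: rpc_batch(endpoint, c),
                                                 chunks(receipt_calls, tx_batch_size)):
                    cumulative += tx_elapsed
    return cumulative


def find_optimum(endpoint, blocks, batch_grid=(1, 10, 50, 100), tx_grid=(1, 10, 50)):
    """Exhaustively try parameter combinations; return the fastest one."""
    best = None
    for bb, tb, tc in itertools.product(batch_grid, tx_grid, tx_grid):
        bc = concurrency_for(len(blocks), bb)
        total = run_once(endpoint, blocks, bb, bc, tb, tc)
        if best is None or total < best[0]:
            best = (total, bb, bc, tb, tc)
    return best
```

Note that `concurrency_for` is just the two inequalities from phase 2 solved for `block_concurrency` (a ceiling division), and the same fixed list from `pick_block_numbers` is passed to every run so that the runs are comparable.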