# Problem
We should index values that builders receive behind relays. Also, signed values from all configured relays should be indexed.
## Amount of data should be stored
Relays can receive a lot of data from connected builders. Consider the following endpoints of Relays where `relay-m` collects the data:
- `/relay/v1/data/bidtraces/builder_blocks_received?slot=X` returns upto 200KB of JSON document for each slot
- `/relay/v1/data/bidtraces/proposer_payload_delivered?slot=X` returns upto 2KB of JSON document for each slot
## Relays availability
Sometimes relays can be unavailable. However, we need to keep indexing bids and payloads from all available relays. Keep in mind that relays have restrictions on their API for the request count per hour. So we should consider scenarios when one or more relays were unavailable and become available. In this case `relay-m` may exceed the API limits. For more accuracy setting limits consider the option when API limit can be set for each relay separately.
## Configurable relays indexing
Consider the option, for some relays we want to collect bids/payloads, but do not include their data into main statistics.
# Solution
## Data volume estimation
- Data volume for `/relay/v1/data/bidtraces/builder_blocks_received` for 1 year for 5 relays: `200KB * ((24h * 60m * 60sec) / 12 sec_per_slot) * 365days * 1 year * 5 relays = 2.628TB`
- Data volume for `/relay/v1/data/bidtraces/proposer_payload_delivered` for 1 year for 5 relays: `2KB * 7200 slots * 365days * 1 year * 5 relays = 26.28GB`
- if we want to store bids values separately to count them: `24bytes for each value * 150 bids * 7200 slots * 365days * 5 relays = 47.304GB`
- signed bids from 5 relays stored in the table `~ 30GB`
To summarize, the data volume needed to store all data per year for 5 relays:
`2628GB + 30GB + 50GB + 30GB + 50GB (relation DB metadata) ~ 2788GB ~ 2.8TB`
Probably, data can be archived/reduced that was already indexed, or older than 4 month.
## Statistics (stats) groups of relays
To implement counting statistics by groups of relays for builder blocks bids, new entities must be introduced: `statistics` and `relays`.
Once a new relay added, the relay not included to any group. We can configure to gather stats for this relay only or groups of relays.
Also, relays can be added into existing groups. There can be two below options for including:
- recalc statistics of recipient group from `fromSlot` option with newly added relay(-s)
- start counting statistics with newly added relay(-s)
These new entities will be reflected in API and DB schema.
## Gather data from relays as much as possible
Consider the case when relays was unavailable couple of hours. `relay-m` cannot retrieve `signed-bids` and/or `builder-blocks` immediately due to request limit.
The following steps are assumed to gather data:
- once `relay-m` understood that the relay became available, `relay-m` performs `REQUEST_LIMIT_PER_HOUR` queries to gather all missed data for slots
- in case when count of missed slots greater than `REQUEST_LIMIT_PER_HOUR`, `relay-m` tries to receive as many of possible slots, the rest slots will be skipped for now
- when `relay-m` consumes fewer requests than `REQUEST_LIMIT_PER_HOUR`, `relay-m` comes back to retrieving data for skipped slots
- once data for skipped slot are fully fetched, indexing will be applied
`mev-m` retrieves data for missed slots as soon as they appear.
## API for `relay-m` and `mev-m`
### Statistics
- `/stats` - returns list of stats
- `/stats/:statId` - returns statistic by `:statId`
- `/stats/:statId/builder-blocks` - returns bids stats for `:statId`
- slot - filter by slot number
- fromSlot - filter only slots which gt than
- toSlot - filter only slots which lte than
- `/stats/:statId/signed_value` - returns signed values for `:statId` of included relays
- slot - filter by slot number
- fromSlot - filter only slots which gt than
- toSlot - filter only slots which lte than
### Relays
- `/relays` - returns list of relays, output format:
- id
- pubkey
- url
- requestLimitPerMinute
- signedBidsStats:
- firstFetchedSlot
- lastFetchedSlot
- firstIndexedSlot
- lastIndexedSlot
- builderBlocksStats:
- firstFetchedSlot
- lastFetchedSlot
- firstIndexedSlot
- lastIndexedSlot
- `/relays/:relayId` - returns data by `:relayId`, output format the same as `/relays`
- `/relays/signed-bids/:slot` - returns data of signed values for all relays
- slot - filter by slot number
- fromSlot - filter only slots which gt than
- toSlot - filter only slots which lte than
- `/relays/:relayId/signed-bids` - returns data of signed values for the `:relayId` with filters:
- slot - filter by slot number
- fromSlot - filter only slots which gt than
- toSlot - filter only slots which lte than
## Create and alter tables in `relay-m` DB
Create a table `relay` with below fields:
- id PK INT - auto-incremental counter
- relay_pubkey string
- relay_url string
### Stat groups
Create a table `statistic` with the fields:
- id PK INT - auto-incremental counter
- name string
- proposer_payload_work_mode SMALLINT default 0
- 0 - disabled
- 1 - pulling data only
- 2 - counting stats
- bids_payload_work_mode SMALLINT default 0
- 0 - disabled
- 1 - pulling data only
- 2 - counting stats
Create a table `relay_in_stat` with the fields:
- id PK INT - auto-incremental counter
- stat_id INT
- relay_id INT
- proposer_payload_start_slot INT
- NULL - do not recalc, start from the current
- slot number - try to get data and calc stats from slot number
- bids_payload INT
- NULL - do not recalc, start from the current
- slot number - try to get data and calc stats from slot number
### Indexing Bids
Keep table `bids_range_storage` as is and store stat for all relays.
Create a table `stat_bids_range` with fields:
- id PK BIGINT - auto-incremental counter
- stat_id INT
- slot INT
- min_value numeric(36,0)
- max_value numeric(36,0)
- median_value numeric(36,0)
- count integer
Create a table `relay_builder_blocks_received` with fields:
- id PK BIGINT - auto-incremental counter
- slot INT
- relay_id INT
- bids_raw JSONB ?
- status SMALLINT default 0
- 0 - not started
- 1 - in process of pulling
- 2 - fetched/not included to stats
- 3 - included to stats
- 4 - skipped
- 5 - archived
Create a table `builder_blocks_received` with fields:
- id PK BIGINT - auto-incremental counter
- relay_slot_id BIGINT -> relay_builder_blocks_received.id
- signed_value numeric(36,0)
### Indexing proposers' payload
Create a table `relay_proposer_payload_delivered` with fields:
- id PK BIGINT - auto-incremental counter
- slot INT -> submitted_payload.slot
- relay_id INT -> relay.id
- status SMALLINT default 0
- 0 - not started
- 1 - in process of pulling
- 2 - fetched/not included to stats
- 3 - included to stats
- 4 - skipped
- 5 - archived
Create a table `proposer_payload_delivered` with fields:
- slot INT
- block_hash string
- parent_hash string
- builder_pubkey string
- proposer_pub_key string
- proposer_fee_recipient string
- signed_value numeric(36,0)
- gas_limit numeric(36,0)
- gas_used numeric(36,0)
Create a view `submitted_payload_storage` with the same fields to provide backward compability.