Problem
=======

The main goal of RM (relay monitoring) is to index all bids from relays by slot. When one or more relays are unavailable, RM fails, and the next job execution may fail again for the same reason. As a result, mev-monitoring stops receiving data.

Solution
========

fetch-bids (every 5 sec):
---------

1. Create a request record for each slot and relay with `status = 0`.
2. Processing: if the request succeeds, set `status = 1` and write the bids to the bids table with the `request_id`; otherwise, set `status = 2`.
3. Select all relay requests that are unprocessed (`status = 0`) or failed (`status = 2`) and no older than 120 slots, and perform processing (2) on them again.

```sql!
select request_id, pubkey, url, slot_number
from relay_request_status rs
where rs.status in (0, 2)
  and rs.slot_number - lastIndexedSlotInDB > 120
```

indexing-bids (every 12 sec):
----------

1. Index all slots placed no older than 120 slots ago whose relays are all healthy:

```sql!
insert into bids_range_storage (slot, min_value, max_value, median_value, count)
select
    rb.slot_number,
    MIN(rb.value) AS min_value,
    MAX(rb.value) AS max_value,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY rb.value) AS median_value,
    COUNT(*) AS bid_count
from relay_request_status rs
-- join on request_id so each bid is counted once, not once per relay request of the slot
inner join slot_relay_bids rb on rb.request_id = rs.request_id
-- do not index slots with unhealthy relays
where rs.slot_number not in (
    select unhealthy.slot_number
    from relay_request_status unhealthy
    where unhealthy.status in (0, 2)
      and unhealthy.slot_number - lastIndexedSlotInDB > 120
)
  and rb.slot_number - lastIndexedSlotInDB > 120
group by rb.slot_number
```

2. Get the last indexed slot in the DB:

```sql!
select min(slot_number) as lastIndexedSlotInDB
from relay_request_status rs
where rs.slot_number not in (
    select done.slot_number
    from relay_request_status done
    where done.status not in (0, 2)
)
```

3. Skip unhealthy relays from aggregation by marking their remaining unprocessed or failed requests with `status = 3`:

```sql!
update relay_request_status
set status = 3
where slot_number - lastIndexedSlotInDB > 80
  and status in (0, 2)
```

Example: `cur = 1000`, `last = 800`, `forprocess = 1000 - 120 = 880`.

purge-bids (every 1 min):
----------

1. Remove all failed or unprocessed requests with bids placed more than 200 slots ago (200 * 12 sec = 2400 sec = 40 min).
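The fetch-bids steps can be sketched as follows. This is a minimal illustration, not the RM implementation: it assumes a simplified schema, `relay_request_status(request_id, url, slot_number, status)` and `slot_relay_bids(request_id, slot_number, value)` (the real table also carries `pubkey`). It runs against in-memory SQLite through Python's standard `sqlite3` module instead of the production database. `fetch` stands in for a hypothetical relay client that returns bid values or raises when the relay is unavailable. The retry filter follows the wording of step 3, keeping only requests no older than 120 slots relative to the current slot.

```python
import sqlite3
from typing import Callable, Iterable

# Status codes from the fetch-bids steps.
PENDING, OK, FAILED = 0, 1, 2
RETRY_WINDOW = 120  # retry requests no older than this many slots

# Simplified, hypothetical schema; the real tables may have more columns (e.g. pubkey).
SCHEMA = """
create table relay_request_status (
    request_id  integer primary key,
    url         text    not null,
    slot_number integer not null,
    status      integer not null default 0
);
create table slot_relay_bids (
    request_id  integer not null,
    slot_number integer not null,
    value       integer not null
);
"""

# Hypothetical relay client: returns bid values for (url, slot) or raises if the relay is down.
FetchBids = Callable[[str, int], list]

SET_STATUS = "update relay_request_status set status = ? where request_id = ?"


def create_requests(conn: sqlite3.Connection, slot: int, urls: Iterable[str]) -> None:
    """Step 1: one request record per (relay, slot) with status = 0."""
    conn.executemany(
        "insert into relay_request_status (url, slot_number, status) values (?, ?, ?)",
        [(url, slot, PENDING) for url in urls],
    )


def process(conn: sqlite3.Connection, request_id: int, url: str, slot: int, fetch: FetchBids) -> None:
    """Step 2: on success store the bids and set status = 1, otherwise set status = 2."""
    try:
        values = fetch(url, slot)
    except Exception:  # any relay error marks only this request as failed
        values = None
    with conn:  # the bids and the status change are committed together
        if values is None:
            conn.execute(SET_STATUS, (FAILED, request_id))
        else:
            conn.executemany(
                "insert into slot_relay_bids (request_id, slot_number, value) values (?, ?, ?)",
                [(request_id, slot, v) for v in values],
            )
            conn.execute(SET_STATUS, (OK, request_id))


def run_pending(conn: sqlite3.Connection, current_slot: int, fetch: FetchBids) -> None:
    """Step 3: (re)process unprocessed or failed requests no older than RETRY_WINDOW slots."""
    rows = conn.execute(
        "select request_id, url, slot_number from relay_request_status "
        "where status in (?, ?) and ? - slot_number <= ?",
        (PENDING, FAILED, current_slot, RETRY_WINDOW),
    ).fetchall()
    for request_id, url, slot in rows:
        process(conn, request_id, url, slot, fetch)
```

Because every request is retried on its own, an unavailable relay only leaves its own rows at `status = 2`; the job keeps running and the other relays' bids are still stored.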
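The health check and aggregation of indexing-bids (steps 1 and 3) can be sketched in plain Python. The `Request`, `Bid` and `BidRange` types are hypothetical stand-ins for rows of `relay_request_status`, `slot_relay_bids` and `bids_range_storage`. The `slots` argument is the indexing window, assumed to be computed by the caller from `lastIndexedSlotInDB`. `statistics.median` interpolates between the two middle values, which matches `PERCENTILE_CONT(0.5)`, and bids are matched to requests by `request_id`, so each bid is counted once.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from statistics import median

# Status codes used by relay_request_status.
PENDING, OK, FAILED, SKIPPED = 0, 1, 2, 3


@dataclass(frozen=True)
class Request:  # hypothetical row of relay_request_status
    request_id: int
    slot_number: int
    status: int


@dataclass(frozen=True)
class Bid:  # hypothetical row of slot_relay_bids
    request_id: int
    slot_number: int
    value: int


@dataclass(frozen=True)
class BidRange:  # hypothetical row of bids_range_storage
    slot: int
    min_value: int
    max_value: int
    median_value: float
    count: int


def index_bids(requests: list[Request], bids: list[Bid], slots: set[int]) -> list[BidRange]:
    """Step 1: aggregate bids per slot, leaving out slots that still have unhealthy relays."""
    # A slot is unhealthy while any of its relay requests is unprocessed (0) or failed (2).
    unhealthy = {r.slot_number for r in requests if r.status in (PENDING, FAILED)}
    known = {r.request_id for r in requests}
    values: dict[int, list[int]] = defaultdict(list)
    for b in bids:
        # Match on request_id: each bid belongs to exactly one request, so it is counted once.
        if b.request_id in known and b.slot_number in slots and b.slot_number not in unhealthy:
            values[b.slot_number].append(b.value)
    return [
        BidRange(slot, min(v), max(v), median(v), len(v))
        for slot, v in sorted(values.items())
    ]


def skip_unhealthy(requests: list[Request], slots: set[int]) -> list[Request]:
    """Step 3: mark unprocessed or failed requests for the given slots as skipped (3)."""
    return [
        replace(r, status=SKIPPED)
        if r.slot_number in slots and r.status in (PENDING, FAILED)
        else r
        for r in requests
    ]
```

For example, if slot 900 has one successful and one failed relay request, `index_bids` leaves it out until `skip_unhealthy(requests, {900})` marks the failed request with `status = 3`; after that the slot is indexed from the successful relay's bids only, instead of blocking indefinitely.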
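The purge-bids step could look like the following sketch, again assuming a simplified `relay_request_status(request_id, slot_number, status)` table in SQLite. Reading "placed 200 slots ago" as "strictly older than `current_slot - 200`" is an assumption, and `purge_requests` is a hypothetical name.

```python
import sqlite3

SECONDS_PER_SLOT = 12
PURGE_AGE_SLOTS = 200  # 200 * 12 s = 2400 s = 40 min


def purge_requests(conn: sqlite3.Connection, current_slot: int) -> int:
    """Delete unprocessed (0) or failed (2) requests older than PURGE_AGE_SLOTS.

    Returns the number of deleted request rows.
    """
    cutoff = current_slot - PURGE_AGE_SLOTS
    with conn:  # commit the delete as one transaction
        cur = conn.execute(
            "delete from relay_request_status where status in (0, 2) and slot_number < ?",
            (cutoff,),
        )
    return cur.rowcount


if __name__ == "__main__":
    print(PURGE_AGE_SLOTS * SECONDS_PER_SLOT // 60, "minutes")  # prints "40 minutes"
```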