# Problem

As the number of staking modules and node operators grows rapidly, mission-critical services should scale accordingly. Let's consider the KAPI service. Its main goal is to gather data from the Lido contracts (staking modules, node operators and submitted keys) and spread it among consumers.

The current data workflow is the following:

- KAPI collects data from a node via API
- KAPI stores it in the memory of the Node.js VM
- KAPI pushes it into the DB
- Service "A" requests data from KAPI endpoints
- KAPI extracts data from the DB into VM memory
- KAPI sends data to Service "A"

For a small volume of data this workflow looks fine. However, it may become a problem for a huge volume of data. For example:

- Memory and storage space cannot be increased indefinitely; there are limits
- A heavy service gives no incentive to run KAPI on relatively weak machines
- It requires maintaining a DB

# Solution

KAPI can use a slightly different workflow than the one described above. The new workflow assumes no DB at all. Instead, KAPI keeps only the latest EL/CL metadata (nonces, sync timestamps and so on) in Node.js memory. That approach allows KAPI to figure out when the contract data has changed and send the newer data to consumers.

WebSockets (WS) may fit that purpose. Consumers subscribe to data-change events and receive the data itself via WS. For example, the events below can cover the streaming updates:

- `operators_were_changed`
- `staking_modules_were_changed`
- `operators_batch_received`
- `keys_batch_received`
- `staking_module_received`
- `sync_finished`

## How to verify data received via stream?

`sync_finished` will contain a cumulative checksum over all data packages that were sent, like:

```js!
let commonChecksum = checksum(`${commonChecksum}_${packageSent}`);
```

## What about services that are costly to modify?

Of course, the `storageless` mode is a separate mode. However, KAPI can adopt the proposed approach in half steps: it keeps storing the data in the DB, but uses streams to save it:

- KAPI collects data from a node via API
- KAPI streams data chunks to a DB storage consumer and a WS consumer

As a benefit, the proposed workflow will dramatically decrease memory consumption.

## What about data consistency during writing data into DB?

Since KAPI starts streaming data instead of accumulating it in memory, a question arises: which data should be served via the REST API while new data is being stored?

KAPI will keep two snapshots of the data. The first snapshot always contains a full, consistent set of data. The second snapshot serves as an accumulator while new data is being stored. Once the data is successfully stored, the snapshots can be safely rotated.

The data update steps will be the following:

- remove all unfinished snapshots with status = 0
- write a new meta record with status = 0
- write new keys, operators and staking modules with a reference to the meta record
- update the meta record status to 1, meaning the snapshot has been successfully completed
- remove the first snapshot

All of the above operations can be performed without a transaction. If a write operation is interrupted, the next sync iteration removes the broken snapshot with status = 0, while the correct snapshot (status = 1) still exists. Therefore, KAPI guarantees that a full snapshot of the data is available at all times.
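Below is a minimal sketch of the write sequence described above, assuming a hypothetical async `db.query` SQL helper, an assumed `db.insertKeys` bulk-insert helper, and illustrative table and column names (`meta`, `keys`, `status`); the actual schema and data-access layer may differ:

```js
// Hypothetical sketch of the two-snapshot update sequence.
// `db.query` and `db.insertKeys` are assumed helpers; all names are illustrative only.
async function writeSnapshot(db, meta, keysStream) {
  // 1. Remove unfinished snapshots (status = 0) left over from interrupted runs
  await db.query('DELETE FROM meta WHERE status = 0');

  // 2. Open a new snapshot: write a meta record with status = 0
  const [{ id: metaId }] = await db.query(
    'INSERT INTO meta (nonce, block_number, status) VALUES ($1, $2, 0) RETURNING id',
    [meta.nonce, meta.blockNumber],
  );

  // 3. Stream keys (and, likewise, operators and staking modules)
  //    into rows that reference the new meta record
  for await (const batch of keysStream) {
    await db.insertKeys(metaId, batch);
  }

  // 4. Mark the snapshot as complete; readers can now switch to it
  await db.query('UPDATE meta SET status = 1 WHERE id = $1', [metaId]);

  // 5. Drop the previous complete snapshot
  await db.query('DELETE FROM meta WHERE status = 1 AND id <> $1', [metaId]);
}
```

If the process crashes before step 4, the partially written snapshot keeps status = 0 and is cleaned up in step 1 of the next sync iteration, so REST API consumers always read from the last snapshot with status = 1.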