# Data stream transport layer replacement

## Introduction

The datastream has three parts:

- Data Stream Producer API (direct calls to the library):
  - This [library](https://github.com/0xPolygonHermez/zkevm-data-streamer) is embedded in the producer. It exposes an API to produce the contents (the blue items on the diagram) and a TCP port that serves clients using a proprietary protocol.
- Data Stream Client library (using the proprietary TCP connection):
  - This is the client library for any client that wants to consume the produced events.
- Relay: a piece that connects to a datastream, synchronizes from it, and publishes a new stream.

![zkEVM DataStreamer diagram](https://raw.githubusercontent.com/0xPolygonHermez/zkevm-data-streamer/main/doc/data-streamer.png)

### Actors

- **Datastream trusted**: the main datastream, holding the source-of-truth data
- **Datastream replicas**: replicated datastreams (currently served over the proprietary TCP protocol)
- Producer **Sequencer**: executes transactions, produces L2 batch data, and publishes the result through the **datastream trusted**
- Consumer **SequenceSender**: uses the data to sequence batches
- Consumer **Aggregator**: uses the data to generate the proofs

## Goals / Requirements

- Evaluate the cost of changing the datastream transport layer to gRPC, or to another protocol from the current stack
- Evaluate the options available for making the datastream highly available and consistent across replicas
- Evaluate the migration cost

## Change transport layer to gRPC

There are two possible separate tasks to achieve this. The tasks are independent but not incompatible.

### Part A: Change the API between the sequencer (producer) and the datastream to gRPC

The goal is to run the datastream server as an independent service. To do that we need to change the way the producer connects to the service: instead of direct library calls, it would use a gRPC client (see the producer sketch at the end of this chapter).

```plantuml
[Sequencer] ..> endpoint_gRPC: connect
[Datastream Trusted Server] --> endpoint_gRPC: provide
[Datastream Trusted Server] --> file
```

#### Related tasks

- Datastream: create a client library for the producer, compatible with the current API
- Datastream: change the API calls to gRPC (9 calls)
- Datastream: set up a new gRPC endpoint
- Sequencer: integrate the new client into the sequencer
- Infra: create a new Docker image and deployment for the datastream server

### Migration effort

- The migration is easy because we only need to update the **sequencer** and the **datastream** at the same time
- The current datastream file must be copied

### Pros

* Standard transport layer between producer and datastream
* Allows hot-swapping the sequencer without restarting the datastream

### Cons

* More points of failure
* More deployment complexity

### Part B: Change the client datastream API for consumers to gRPC

The current datastream exposes a TCP endpoint with a proprietary protocol. The idea is to replace this endpoint with gRPC + protobuf (see the consumer sketch at the end of this chapter).

#### Related tasks

- Datastream: create a new gRPC endpoint (6 calls)
- Datastream: create a new client library that uses the gRPC endpoint
- Clients: switch to the new library

### Migration effort

- The datastream could temporarily expose both endpoints, proprietary TCP and the new gRPC, to ease the migration
- Clients could then migrate progressively to the new endpoint
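To make Part A concrete, here is a minimal Go sketch of a producer-side wrapper that keeps the shape of the current embedded API (start an atomic operation, add entries, commit) but performs each call over gRPC. The `pb` package and all service, method, and message names are assumptions for illustration; no such proto definition exists yet.

```go
// Minimal sketch, assuming a hypothetical datastream.proto: a producer-side
// wrapper that keeps the current embedded API shape (StartAtomicOp /
// AddStreamEntry / CommitAtomicOp) but sends each call over gRPC.
package producer

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/datastream/proto" // hypothetical generated stubs
)

// GrpcProducer replaces the direct library calls the sequencer makes today.
type GrpcProducer struct {
	client pb.StreamProducerClient // hypothetical service
}

func NewGrpcProducer(target string) (*GrpcProducer, error) {
	conn, err := grpc.Dial(target, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return nil, err
	}
	return &GrpcProducer{client: pb.NewStreamProducerClient(conn)}, nil
}

// StartAtomicOp opens an atomic operation, mirroring the embedded API.
func (p *GrpcProducer) StartAtomicOp(ctx context.Context) error {
	_, err := p.client.StartAtomicOp(ctx, &pb.StartAtomicOpRequest{})
	return err
}

// AddStreamEntry appends one entry and returns its assigned entry number.
func (p *GrpcProducer) AddStreamEntry(ctx context.Context, entryType uint32, data []byte) (uint64, error) {
	resp, err := p.client.AddStreamEntry(ctx, &pb.AddStreamEntryRequest{EntryType: entryType, Data: data})
	if err != nil {
		return 0, err
	}
	return resp.EntryNumber, nil
}

// CommitAtomicOp makes the entries added since StartAtomicOp durable.
func (p *GrpcProducer) CommitAtomicOp(ctx context.Context) error {
	_, err := p.client.CommitAtomicOp(ctx, &pb.CommitAtomicOpRequest{})
	return err
}
```

Each of the nine existing producer calls would get an RPC of this shape, so the sequencer code would stay unchanged apart from constructing the client.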
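For Part B, a server-streaming RPC maps naturally onto the proprietary protocol's "start streaming from entry N" command. The following Go sketch is again illustrative only: the service and message names are assumptions.

```go
// Minimal sketch, assuming a hypothetical server-streaming RPC that replaces
// the proprietary TCP "start streaming from entry N" command.
package consumer

import (
	"context"
	"io"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/datastream/proto" // hypothetical generated stubs
)

// ConsumeFrom connects to the datastream and processes entries starting at
// fromEntry, the same resume semantics the current TCP client offers.
func ConsumeFrom(ctx context.Context, target string, fromEntry uint64) error {
	conn, err := grpc.Dial(target, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return err
	}
	defer conn.Close()

	client := pb.NewStreamConsumerClient(conn) // hypothetical service
	stream, err := client.StreamEntries(ctx, &pb.StreamEntriesRequest{FromEntry: fromEntry})
	if err != nil {
		return err
	}
	for {
		entry, err := stream.Recv()
		if err == io.EOF {
			return nil // server closed the stream
		}
		if err != nil {
			return err
		}
		// A real consumer (SequenceSender, Aggregator) would decode and
		// handle the entry here, as the current client callback does.
		log.Printf("entry %d, type %d, %d bytes", entry.EntryNumber, entry.EntryType, len(entry.Data))
	}
}
```

Running this endpoint alongside the proprietary TCP one during the transition is what would let consumers migrate one by one, as proposed in the migration notes above.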
## Other alternatives to explore

- Pub/sub / event store
- WebSockets
- P2P

# High availability

There are several ways to achieve this, depending on which solution we adopt:

## Replica

The first part is a mechanism that replicates the **trusted datastream**.

### Using the relay of the datastream library

The system currently has a piece called the **relay** that acts as a replicator and server of a datastream.

```plantuml
[sequencer|datastream] --> publish_stream: stream
[relay] --> publish_stream: connects and synchronizes
[relay] --> publish_replicated_stream: stream
note right of "sequencer|datastream": primary DS
note right of relay: replicated DS
```

<!-- ![Datastream relay](https://raw.githubusercontent.com/0xPolygonHermez/zkevm-data-streamer/main/doc/data-streamer-relay.png) -->

### Replicate internal data using a tool (e.g. rsync)

This option requires syncing the internal datastream file between replicas.

## Failover

Failover can be manual or automatic.

### Manual

- Stop the main datastream + sequencer, change the sequencer configuration, and launch the new one

### Automatic

- Implement a primary/secondary pattern in the datastream

# Conclusion

After reviewing the options, we agree:

- Changing the producer API to gRPC and splitting **sequencer** and **datastream** into independent processes enables HA strategies.
- Changing the client API to gRPC is easy, but the change may not significantly improve the product.
- Datastream HA can be achieved with the current pieces (the relay) with a bit of adaptation.
- For the failover switch we suggest the manual option, because it is the solution currently adopted on zkEVM; a smarter one implies additional development.
- We propose to go deeper and analyze replacing the full datastream with a standard event-sourcing solution, because our use case looks like an exact match for that kind of tool. See the appendix below.

# Appendix: replace the full datastream with an event store

The datastream is acting as a custom event store, and there are many off-the-shelf solutions for that (a publishing sketch follows at the end of this appendix):

- Kafka
- RabbitMQ
- Redis
- ...

Pros:

- Standard solution with a lot of tooling
- Scalability and high availability out of the box
- Reduces the amount of CDK code

Cons:

- Performance? The current datastream is a tailored solution and presumably faster for the current use case
- Learning curve for the new tool
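To give a feel for this option, the sketch below shows what publishing one batch's data could look like with Kafka through the `github.com/segmentio/kafka-go` client. The topic name and message layout are assumptions for illustration, not a design decision.

```go
// Minimal sketch, assuming Kafka via github.com/segmentio/kafka-go and a
// hypothetical single-partition topic (the stream needs total ordering).
package eventstore

import (
	"context"
	"encoding/binary"

	kafka "github.com/segmentio/kafka-go"
)

// PublishBatch writes one batch's data as a single message.
func PublishBatch(ctx context.Context, brokers []string, batchNumber uint64, payload []byte) error {
	w := &kafka.Writer{
		Addr:  kafka.TCP(brokers...),
		Topic: "zkevm.l2batches", // hypothetical topic name
	}
	defer w.Close()

	// Key by batch number so consumers can detect duplicates on replay.
	key := make([]byte, 8)
	binary.BigEndian.PutUint64(key, batchNumber)

	// Consumers (SequenceSender, Aggregator) would read the topic from their
	// own committed offsets, replacing the current "start from entry N"
	// resume semantics.
	return w.WriteMessages(ctx, kafka.Message{Key: key, Value: payload})
}
```

Whether a general-purpose broker can match the tailored file-based datastream on read throughput is exactly the performance question raised in the cons above.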