# Rollback banana
This doc presents a design proposal for implementing the rollback feature introduced in the banana protocol upgrade.
## Implementation guide
### Data Stream
- Add unwind batches: an instruction that tells consumers to drop state until batch N
- Mark batches as invalid: those batches shouldn't be added to the state. This is useful to avoid processing batches that will be unwound later on
### Erigon
- CLI option to trigger a rollback. The command should:
    1. Accept the following params:
        - First invalid batch
        - List of {L1 info tree index -> GER}
    2. Dump the txs that are going to be unwound back to the pool
    3. Add an instruction to unwind the state on the data stream so other clients syncing from it mimic the same actions
    4. Unwind state
    5. Inject the provided GERs to a new block

    Note that this should be done while the erigon sequencer is stopped, and the sequencer should be resumed after the command is executed
- Be able to switch from RPC mode to sequencer mode and back. In particular, this case should work:
    1. While in sequencer mode, erigon generates the data stream
    2. Switch to RPC mode and consume a different data stream, add a bunch of blocks and batches to the state, ...
    3. Switch back to sequencer mode and keep producing blocks
    4. The resulting data stream produced by the sequencer should include all the blocks, including the ones added to the state while synchronizing from a different data stream in RPC mode
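
The rollback command steps listed above could be sketched roughly as follows. This is a toy in-memory model, not Erigon code: `Node`, `Batch`, `rollback`, and the `("unwind", N)` / `("ger_block", ...)` stream entries are hypothetical names, and it assumes that "first invalid batch" means that batch and every later one are dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    number: int
    txs: list[str]


@dataclass
class Node:
    # Toy stand-in for the sequencer; hypothetical, not Erigon's data model.
    batches: list[Batch] = field(default_factory=list)
    pool: list[str] = field(default_factory=list)
    stream: list[tuple] = field(default_factory=list)
    sequencer_running: bool = False


def rollback(node: Node, first_invalid_batch: int, gers: dict[int, str]) -> None:
    """Apply steps 2 to 5 of the command; `gers` maps L1 info tree index -> GER."""
    # The command must only run while the sequencer is stopped.
    if node.sequencer_running:
        raise RuntimeError("stop the sequencer before rolling back")

    keep = [b for b in node.batches if b.number < first_invalid_batch]
    drop = [b for b in node.batches if b.number >= first_invalid_batch]

    # Step 2: dump the txs of the unwound batches back to the pool, in order.
    for batch in drop:
        node.pool.extend(batch.txs)

    # Step 3: tell clients syncing from the data stream to unwind as well.
    node.stream.append(("unwind", first_invalid_batch))

    # Step 4: unwind the local state.
    node.batches = keep

    # Step 5: inject the provided GERs (modelled here as one stream entry).
    node.stream.append(("ger_block", sorted(gers.items())))
```

Calling `rollback(node, 2, {5: "0xaa"})` on a node holding batches 1 to 3 would leave only batch 1 in the state and move the txs of batches 2 and 3 back to the pool.
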
### CDK
Build `the tool` outlined in `Option B: edit the stream`
## Flow
```mermaid
sequenceDiagram
admin->>erigon: stop sequencer
admin->>sequence-sender: stop sequence-sender
admin->>L1: trigger rollback
admin->>rollback tool: start rollback process
rollback tool->>erigon: detect rollback
Note left of rollback tool: Option A:
rollback tool->>erigon: start sequencing
rollback tool->>erigon: inject txs sequentially
Note left of rollback tool: Option B:
rollback tool->>erigon: inject modified sequence
rollback tool-->>admin: done
admin->>erigon: start sequencer
admin->>sequence-sender: start sequence-sender
```
---
1. Stop erigon acting as sequencer & sequence sender
2. Trigger rollback on-chain
    - :question: Needs tooling? Probably not, it can be done through etherscan, but double check with Carlos :question:
3. Use the rollback tool (more details below in this doc)
4. Start erigon acting as sequencer & sequence sender
## Rollback tool
Before triggering a rollback, there should be an investigation to understand what went wrong and why a rollback is needed. This investigation will lead to the decision to roll back starting at a particular batch. Once the rollback is triggered on-chain, it's time to fix the sequence to avoid the problem.
The goal is to re-sequence with the least possible impact on end users, ideally reaching the same state root presented by the RPCs before the rollback happened.
### Option A: re-send txs
> **NOTE:**
> - In this scenario "the tool" should just be a CLI option of Erigon that handles all the steps.
> - If we completely discard option B, should this happen automatically? (This assumes that Erigon will be stopped before sending the rollback on-chain, and that the code will be patched, if needed, before restarting.)
This option is suitable when the problem can be fixed by re-sequencing the txs. This may need some fixes in Erigon or the sequence sender to avoid making the same mistakes again. Once the software is patched, the tool should:
1. Request Erigon to detect the rollback event, unwind the state and add the rollback indicator on the Data Stream
2. Send the txs that have been reorged back to the pool one by one (waiting for each to be added to the state before sending the next one) so the sequence is as close to the original as possible
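
Step 2 above, sending each reorged tx only after the previous one has landed in the state, might look like the sketch below. `send` and `is_included` are hypothetical callbacks standing in for whatever the tool uses to submit a tx and to check inclusion; the polling interval and retry limit are illustrative assumptions.

```python
import time
from typing import Callable, Iterable


def resend_sequentially(
    txs: Iterable[str],
    send: Callable[[str], None],
    is_included: Callable[[str], bool],
    poll_interval: float = 0.5,
    max_polls: int = 100,
) -> None:
    """Re-send txs in their original order, one at a time."""
    for tx in txs:
        send(tx)
        # Wait until this tx is part of the state before sending the next one,
        # so the new sequence keeps the original ordering.
        for _ in range(max_polls):
            if is_included(tx):
                break
            time.sleep(poll_interval)
        else:
            # Give up instead of waiting forever on a tx that never lands.
            raise TimeoutError(f"tx {tx} was not included after {max_polls} polls")
```

Failing loudly on a tx that is never included seems preferable to skipping it silently, since a skipped tx would change the resulting state root.
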
Once this process is completed, restart everything.
This design has a big problem: the injection of Global Exit Roots (GERs). By default the sequencer will just pick the latest GER from L1. This means that it will inject different GERs than the rolled back sequence did, causing bridge claims made against the non-sequenced GERs to be reverted. To prevent this:
- Erigon should pick the GERs injected in the rolled back sequence and add them before starting to re-sequence the reorged txs
- What if the problem was that the sequencer picked the wrong GER / L1 info tree index in the first place? A flag to avoid re-injecting GERs could be an option. Maybe the list of GERs needs to be passed manually as a parameter ("the tool" could handle that)?
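
The three possibilities discussed above (re-inject the original GERs, skip re-injection via a flag, or take a manually provided list) can be captured in one small selection function. The function name, its parameters, and the precedence of the manual list over the flag are assumptions made for illustration; the proposal leaves this open.

```python
from typing import Optional

# (L1 info tree index, GER) pairs, in injection order.
GerList = list[tuple[int, str]]


def gers_to_inject(
    rolled_back: GerList,
    manual: Optional[GerList] = None,
    reinject: bool = True,
) -> GerList:
    """Pick the GERs to add before re-sequencing the reorged txs."""
    # A manually provided list wins: the admin knows which GERs were wrong.
    if manual is not None:
        return list(manual)
    # The flag disables re-injection, e.g. when the original GERs caused the issue.
    if not reinject:
        return []
    # Default: replay exactly what the rolled back sequence injected.
    return list(rolled_back)
```
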
### Option B: edit the stream
This option is suitable when the problem is well understood, but fixing the code may need some time (keep in mind that the network is halted, and we will be in a hurry to recover it). In this case the tool should:
1. Detect the rollback event and add the rollback indicator on the Data Stream
2. Add the rolled back sequence as it is after the rollback indicator
3. Enter an interactive mode where the admin can edit the data stream:
    - Split a given batch into two
    - Modify timestamps
    - Modify GERs/indexes
    - Remove a tx
    - ...
4. The admin will amend the problem by modifying the sequence
5. The tool will serve the amended Data Stream
6. Erigon will be started in RPC mode, and consume the Data Stream from the tool
7. Once (RPC) Erigon has finished synchronizing the amended Data Stream, it will be restarted in sequencer mode
8. Restart sequence sender
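
Two of the edit operations from step 3, splitting a batch and removing a tx, are sketched below on a simplified model of the rolled back sequence. `StreamBatch` and both functions are hypothetical; the sketch assumes batches are ordered by number and that a split renumbers every later batch by one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StreamBatch:
    number: int
    timestamp: int
    txs: tuple[str, ...]


def split_batch(batches: list[StreamBatch], number: int, at: int) -> list[StreamBatch]:
    """Split batch `number` in two: its first `at` txs stay, the rest move to number + 1."""
    out: list[StreamBatch] = []
    shift = 0
    for b in batches:
        if b.number == number:
            if not 0 < at < len(b.txs):
                raise ValueError("split point must leave txs on both sides")
            out.append(replace(b, txs=b.txs[:at]))
            out.append(replace(b, number=b.number + 1, txs=b.txs[at:]))
            shift = 1
        else:
            # Batches after the split are renumbered to keep numbers contiguous.
            out.append(replace(b, number=b.number + shift))
    if shift == 0:
        raise KeyError(f"batch {number} not found")
    return out


def remove_tx(batches: list[StreamBatch], tx: str) -> list[StreamBatch]:
    """Drop every occurrence of `tx`, leaving the batch layout untouched."""
    return [replace(b, txs=tuple(t for t in b.txs if t != tx)) for b in batches]
```

Both functions return a new list instead of mutating the input, which would make it easy for an interactive tool to keep an undo history of the admin's edits.
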
:question: Can this work from the PoV of Erigon: sync from a DS in RPC mode, then jump into sequencer mode while offering the full Data Stream (including the part that has been consumed as RPC)? :question:
## Data Stream
The data stream needs to be able to indicate reorgs:
- An indicator stating that the last N batches need to be rewound (or rewind until batch M)
- (optional) Ideally add some indicator that prevents executing reorged batches if not already synced
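
To illustrate the optional point, a client that is still catching up and already has the later stream entries available could scan ahead and skip batches that a later indicator rewinds. The entry shapes `("batch", number, payload)` and `("unwind", first_invalid, None)` are hypothetical, and the sketch assumes an unwind indicator drops every batch with a number greater than or equal to its target.

```python
import math


def batches_to_execute(entries: list[tuple]) -> list[tuple]:
    """Return (number, payload) for batches that survive all later unwinds."""
    surviving = []
    cutoff = math.inf  # lowest unwind target seen so far, scanning backwards
    for kind, number, payload in reversed(entries):
        if kind == "unwind":
            # Everything earlier in the stream with number >= this target is reorged.
            cutoff = min(cutoff, number)
        elif kind == "batch" and number < cutoff:
            surviving.append((number, payload))
    surviving.reverse()  # restore stream order
    return surviving
```

Scanning backwards means a batch re-added after an unwind is kept, because the indicator only affects entries that precede it in the stream.
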