# Thoughts on Slush and Fractal Scaling
Luke Pearson sent me a [hackmd outlining the design of Slush](https://hackmd.io/@kalmanlajko/rkgg9GLG5#Security-via-L2-consensus-mechanism), a "fractal scaling" approach that uses ZK proofs to bypass L1 when bridging ZKRs at L2, L3, ..., LN (I will refer to these as "higher-order rollups"). Collected here are my thoughts and questions about it so far.
My thoughts fall into two categories: first, thoughts and questions about "fractal scaling" as a concept, and second, thoughts and questions about the approach outlined by the Suez team. Separating the two lets me judge the tech fairly in the context of its goals.
## Thoughts about "Fractal Scaling"
### Why even bother?
I've always been quite torn between the "multi-chain, not cross-chain" (fractal scaling) stance and the "separate consensus for separate domains" (cross-chain) stance. On the one hand, the argument given by Vitalik makes a lot of sense to me - someone can break consistency by "bridging" an asset from one chain to another during a 51% attack, moving a bunch of "stolen" assets to the destination chain. The source chain can reorg (the community gets together and decides they don't like the attacker who stole all of their assets), so according to that new version of history, the attacker's assets that still exist on the destination chain shouldn't exist. The whole point of a "single source of truth" is that there is a single version of the truth.
At the same time, I recognize that, from a practical standpoint, simply going cross-chain makes sense. Whether or not cross-chain is a good idea from a normative ("*de jure*") standpoint, it's *de facto* a good idea. Why? Because there is so much value in doing it that people will do it anyways. It seems to me just as much a fact of life as "world peace would be nice, but it's never going to happen".
Another way to say this is that the "wordcel" in me agrees with the rollup-centric approach and sees the "normative" impossibility of preventing conflicts between different L1s' versions of history.
At the same time, the "shape rotator" in me accepts that cross-chain is inevitable even though it's scuffed, and that the best thing to do is to make it as un-scuffed as possible. It's now a "fact of life", just like how LUNAtics will continue to wreck themselves and endanger the Cosmos ecosystem by dumping money into Terra, and Jump will continue to bail out Wormhole every time it gets hacked.
### Fractal scaling requires pushing DA up to higher-order rollups
The "should we even bother trying to scale a single source of truth" stuff aside, there's also the tech problem: rollups do not solve data availability - they explicitly rely on L1 for that. This *cannot* scale, because we know that consensus has scalability limits, and the more rollups we have, the more L1 blockspace we need.
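To make that concrete, here's a crude back-of-the-envelope in Python. Every constant below is my own assumption for illustration (rough Ethereum-calldata-scale throughput, the oft-cited ~12 bytes per compressed rollup transfer), not a number from the Slush writeup:

```python
# Crude estimate: how many rollups can an L1's data bandwidth support?
# All constants are illustrative assumptions, not measured figures.

L1_DA_BYTES_PER_SEC = 125_000   # ~30M gas per 15 s block, all spent on
                                # calldata at 16 gas/byte
BYTES_PER_TX = 12               # optimistic compressed-transfer size
TX_PER_SEC_PER_ROLLUP = 100     # assumed sustained load per rollup

max_rollups = L1_DA_BYTES_PER_SEC / (BYTES_PER_TX * TX_PER_SEC_PER_ROLLUP)
print(max_rollups)  # ~104 - two orders of magnitude short of "thousands"
```

However you tweak the constants, L1 blockspace caps the rollup count far below the thousands a fractal world imagines.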
Thus the *only* option to scale the rollup-centric approach is to push data availability into the higher-order rollups themselves. Indeed, this is what Slush does. In theory, this is still much better than the cross-chain approach, as, ultimately, every higher-order rollup is looking at a subset of the same state. Unlike the cross-chain approach, a higher-order rollup has a "root state" to exit into if its parent rollup fails, so the "attacker duplicates some coins on the other chain" situation Vitalik mentions can't happen.
At the same time, if we're putting data availability into the higher-order rollups, a rollup's only recourse when its parent rollup is malicious is to exit. Since we're using ZKPs, we have safety guarantees about the state the rollup is exiting into, but liveness and DA get much more complicated - if we anticipate that fractal scaling would mean thousands of rollups, how thin would security be spread?
### Would inter-rollup politics make governance hard?
Another thing that comes to mind for fractal scaling is inter-rollup politics/governance. While politics is inevitable, since in the end people are running nodes, I think it's important to consider how complicated it could be in a system of 1000s of rollups.
In particular, the "exit" process could get quite ugly - the exiting rollup's nodes would want to "agree" on where to exit to, lest the rollup fork. I can imagine that each application and stakeholder in said rollup might have a different opinion on where to go, and the rollups into which the exiting rollup is considering exiting might have opinions about that too.
It's also unclear that a "cross-chain" world wouldn't have just as much of a politics problem. That said, I think it's likely that the following will end up being the case, reducing the impact:
1. We'll end up with a small number of highly-scalable L1s as every ounce of consensus performance gets squeezed out, so there won't be nearly as many parties as in a "fractal world" where DA is being split across thousands of rollups.
2. More and more workloads will move to separate "trustless cloud services" (e.g. compute, indexers, bridges, storage, CDNs, etc.) that are verifiable with ZKPs and act less like scaling replicas of L1 and more like infrastructure that uses L1 for DA (i.e. stuff like Maru). Many of them will likely have their own DA solutions as well, each optimized for the specific protocol.
3. We'll start seeing consensus/DA protocols that attempt domain separation / ["causal consistency"](https://jepsen.io/consistency/models/causal) without fragmenting security. If I'm recording kills for an FPS, all I care about is who shot first - I don't care which frontrunning bot managed to get in first on Uniswap.
In general, it's unclear how much of a problem this will actually pose, and either way, governance innovation will be very important. It's just something I think about, that I haven't seen too many others talk about, and that's genuinely worth considering.
## Thoughts about Slush
I actually think Slush is a pretty neat way to do fractal scaling compared to other designs - it's the only one I've seen that accepts the fact that it can't scale without pushing responsibility for DA up to higher-order rollups. Their approach to VM bridging seems quite interesting to me - in particular, routing through the rollup tree is slow, so avoiding that as much as possible is a good idea I haven't seen before.
That said, I have the following questions:
1. When we're "freezing" the state of the src VM when bridging, that means freezing the whole rollup, right?
* This seems like the logical conclusion to me, because otherwise you'd need a parallel runtime (a la Solana) and would have to accept the consequent loss of composability and the extra complexity of deterministically resolving read-write conflicts at runtime.
* If transactions are going to be bridging a lot, how much would this block progress and/or negate some of the parallelism benefit?
* Does this only need to happen when bridging via a bot, or does it also need to happen when "routing" through the tree?
* Suppose a block for the rollup contained, say, 20 transactions that each bridge to a different rollup:
* Would we have to freeze and unfreeze for each bridged TX?
* If we fired them all off at once, froze the VM, and waited for all of the bots to send response proofs, that would be much better. However, if the bots for the 2nd and 7th TXs don't respond, causing them to be routed through the tree, could that violate atomicity?
2. If the bridging relies on an external bot to move data and submit both proofs, how easy would it be to DDoS the bots and force transactions to "route" through the tree?
* Potential solution: design an auxiliary mechanism that incentivizes a multitude of bots to always be online to service txs?
3. Sure, routing through L1 is logarithmic in the number of rollups - but how big would the constant be when the state of each rollup along the route needs to be updated all the way up and all the way down?
* If we assume each rollup's block time is 15s (the number used in the article), would a 10-hop route (not infrequent in a world with 1000s of rollups) take more than two minutes?
* The number of hops would be reduced if the tree were shallow-and-wide, but this is the opposite of what's most optimal in terms of DA costs (storage and consensus) according to the writeup (see the latency sketch after this list).
4. Even if the bridging is highly efficient and seamless, how "close" to a function call could it be?
* Even if it's just a little bit worse than a function call, would the sheer number of rollups introduce pain points?
* If a rollup "exits" or the topology of the chain changes, do developers need to change the "destination" in their smart contract when bridging?
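To put rough numbers on question 3, here's a minimal latency sketch. The tree shapes, the worst-case hop count, and the 1000-rollup figure are my assumptions; only the 15s block time comes from the article:

```python
# Rough latency of routing through the rollup tree: a route between two
# leaves climbs to their common ancestor and back down, so it costs up to
# 2 * depth hops, and each hop waits roughly one block time.
import math

def route_latency_s(num_rollups: int, branching: int, block_time_s: float) -> float:
    depth = math.ceil(math.log(num_rollups, branching))
    hops = 2 * depth  # worst case: all the way up, then all the way down
    return hops * block_time_s

print(route_latency_s(1000, 2, 15.0))   # binary tree: 20 hops -> 300 s
print(route_latency_s(1000, 10, 15.0))  # wider tree:   6 hops ->  90 s
```

Even the shallow-and-wide shape that the writeup says is worse for DA costs still leaves a cross-tree call at a minute-plus, which is what makes the bot path so important.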
And the following uncategorized thoughts:
1. The use of multidimensional timestamps for child proofs reminds me a lot of how differential dataflow uses them to iterate over collection deltas (see the sketch below). I wonder if there's any fundamental connection there.
2. I could totally imagine that some idea of "domain separation" could inform the structure of the rollup tree in a similar vein to the "causally-consistent" consensus idea I mentioned above. That is, rollups that are causally "close" to each other are "close" to each other in the tree.
* This might drastically reduce the number of hops that most txs would need to take in the case of routing through the tree.
* It might also drastically reduce the impact of having to "freeze" VM state, but only if freezing the src VM's state doesn't block the parent.
3. All things considered, freezing the VM when bridging via a bot seems to get in the way of parallelism, but it's hard to estimate exactly what kind of impact it would have in practice. Is it apt to say it's a (probably favorable) latency/throughput tradeoff?
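For thought 1, here's the differential dataflow side of the analogy as I understand it - a minimal sketch of the product partial order on multidimensional timestamps (my own illustration, not differential dataflow's actual API and not anything from the Slush writeup):

```python
# Multidimensional timestamps are only partially ordered: t <= u iff every
# coordinate of t is <= the corresponding coordinate of u.
def leq(t: tuple, u: tuple) -> bool:
    return all(a <= b for a, b in zip(t, u))

# (epoch=1, iteration=2) and (epoch=2, iteration=1) are incomparable:
print(leq((1, 2), (2, 1)), leq((2, 1), (1, 2)))  # False False

# Differential dataflow stores a collection as deltas indexed by such
# timestamps; the collection's contents at time u accumulate every delta
# whose timestamp is <= u.
deltas = {(0, 0): +1, (1, 0): +1, (0, 1): -1}

def accumulate(u: tuple) -> int:
    return sum(d for t, d in deltas.items() if leq(t, u))

print(accumulate((1, 1)))  # 1
```

If Slush's child proofs accumulate state over a similar partial order, that might be the "fundamental connection" I'm gesturing at.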