# Sparse blobpool adversarial peers

## Sparse blobpool

The sparse blobpool is one approach to scaling the EL blobpool that relies primarily on probabilistic fetching and availability signaling. A peer that receives an announcement for a blob transaction fetches the full blob with probability *p* and fetches a partial blob (cells) with probability *1 − p*. In the partial-fetch case, to mitigate data-withholding attacks, the peer checks for availability signals from at least *k* neighbor peers within a given timeout, and only then stores the partial blob in the blobpool along with the transaction.
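To make the rule concrete, here is a minimal Go sketch of the announcement-handling logic described above. All names (`Config`, `Fetcher`, `OnAnnouncement`) are hypothetical assumptions for illustration, not taken from any client implementation.

```go
// Hypothetical sketch of the sparse blobpool fetch decision: with probability
// p fetch the full blob; otherwise fetch cells and store them only after at
// least k distinct neighbors signal availability within the timeout.
package sparsepool

import (
	"context"
	"errors"
	"math/rand"
	"time"
)

// Config holds the protocol parameters from the text.
type Config struct {
	FullFetchProb float64       // p: probability of fetching the full blob
	SignalQuorum  int           // k: signals required before storing a partial blob
	SignalTimeout time.Duration // how long to wait for those signals
}

// Fetcher abstracts the network operations; implementations are out of scope.
type Fetcher interface {
	FetchFull(ctx context.Context, txHash [32]byte) error
	FetchCells(ctx context.Context, txHash [32]byte) error
	// AvailabilitySignals delivers, one peer ID at a time, the neighbors
	// that have signaled availability for txHash.
	AvailabilitySignals(ctx context.Context, txHash [32]byte) <-chan string
}

var ErrInsufficientSignals = errors.New("sparsepool: fewer than k availability signals before timeout")

// OnAnnouncement returns nil when the blob (full or partial) may be stored.
func OnAnnouncement(ctx context.Context, cfg Config, f Fetcher, txHash [32]byte) error {
	if rand.Float64() < cfg.FullFetchProb {
		return f.FetchFull(ctx, txHash) // full fetch: no quorum needed
	}
	if err := f.FetchCells(ctx, txHash); err != nil {
		return err
	}
	// Partial fetch: wait for k distinct availability signals.
	sctx, cancel := context.WithTimeout(ctx, cfg.SignalTimeout)
	defer cancel()
	seen := make(map[string]struct{})
	signals := f.AvailabilitySignals(sctx, txHash)
	for {
		select {
		case peer := <-signals:
			seen[peer] = struct{}{}
			if len(seen) >= cfg.SignalQuorum {
				return nil // quorum reached: safe to store the partial blob
			}
		case <-sctx.Done():
			return ErrInsufficientSignals // withholding suspected: do not store
		}
	}
}
```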
## Malicious peers

In the sparse blobpool, acceptance decisions depend on neighbor peers that signal availability. Could those peers behave maliciously and deceive a peer about availability? What impact would introducing such malicious peers have?

The worst-case attack is a blobpool DoS, in which an attacker fills a victim's blobpool with transactions that will never be included in a block. Current blobpool implementations include several mitigations against this. For example, nonce gaps in the blobpool are disallowed, because a gapped tx cannot be included in a block until the gap resolves. Thus, for better DoS resilience, it is safest to store only transactions that are likely to be included in the near future and then removed from the mempool.

The scenarios below describe malicious behavior that can be executed locally against a specific node to which the malicious nodes are connected. Further study is needed to estimate the probability and repeatability of these behaviors, but the attacker's goal is to waste some or all of the victim's blobpool space. The primary impact would be degradation of the victim's blobpool service (block building and serving the getBlobs API). I do not expect this to affect chain finality or liveness. From an engineering perspective, however, I think it is undesirable to require operators to delete all on-disk txs and restart whenever this occurs, and this risk needs to be considered.

## Scenario

1. The attacker connects *k* nodes to the victim node.
   - Can an attacker realistically create this situation?
     - The calculated table suggests *k* will be set between 2 and 4.
     - Geth nodes typically allocate two-thirds of connections to inbound. With MaxPeers = 50, that is 34 inbound and 16 outbound.
     - Additionally, one inbound or outbound connection is randomly dropped every 3–7 minutes.
     - How feasible is it for an attacker to occupy 2–4 inbound connections of a victim?
     - Intentionally increasing *k* makes such a setup harder to achieve, but would that stall propagation, or require increasing *p* and thus reduce the scaling factor?
2. The attacker creates blob tx *A0: {address: A, nonce: 0}* and announces it to the victim node. Using the attacker nodes from step 1, the attacker signals availability of *A0* from its *k* peers, exclusively to the victim. The attacker does not make this availability signaling visible to the rest of the network.
   - 2-1. (with prob. *p*) If the victim requests a full fetch, the attacker nodes do not respond, and the attacker returns to step 1 to find another victim.
     - If it is affordable, the attacker could instead respond, spend the cost of *A0*, and keep sending *B0, C0, …* until the situation of 2-2 arises.
   - 2-2. (with prob. *1 − p*) If the victim requests a partial fetch, the attacker nodes respond.
     - The victim then announces *A0* to its neighbor peers.
     - However, because the attacker does not signal availability to the rest of the network, *A0* fails to propagate.
     - As a result, *A0* is not propagated to the rest of the network and is stored only in the victim's blobpool.
3. The attacker creates blob tx *A1: {address: A, nonce: 1}* and announces it to the victim node.
   - 3-1. (with prob. *p*) If the victim performs a full fetch, the attacker nodes respond.
     - The victim announces *A1* to its neighbors.
     - However, the neighbor peers, unaware of *A0* (because it failed to propagate previously), treat *A1* as a gapped tx and reject it.
   - 3-2. (with prob. *1 − p*) If the victim performs a partial fetch, the attacker nodes respond, and the same outcome as 2-2 occurs.
   - Regardless of the victim's fetch decision, *A1* will not propagate to the rest of the network and will be stored only in the victim's blobpool.
   - Therefore, once the attacker succeeds in storing a non-propagating blob tx (*A0*) in the victim's blobpool (case 2-2), the attacker can subsequently accumulate additional blob txs from the same address (*A1, A2, A3, …*) in the victim's blobpool.
     - Those transactions will never be included in a block, since they were never disclosed to the rest of the network.
   - From the victim's perspective, the attacker nodes cannot be identified as malicious after 2-2, because they have responded to all of the victim's requests.

Fortunately, Geth's blobpool caps stored transactions per account at 16, so a single stuck tx will not immediately lead to a blobpool DoS. However, since the attacker cannot be identified, the attacker's txs cannot be identified either, and the up-to-16 txs inserted per account cannot be removed by the victim node. To prevent this waste, we need a method to remove those txs.

Another remaining question is to what extent the attacker can waste the blobpool. Can an attacker perform this attack repeatedly to slowly fill it? This ties back to question 1 above, but it is unclear to me how to quantify this possibility.
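For illustration, here is a minimal sketch of the two insertion-time checks just discussed: nonce-gap rejection and the per-account cap. The cap value matches Geth's, but the structure and names are hypothetical assumptions, not Geth's code. Note that the attack above passes both checks, since *A0, A1, …* form a gapless run from the victim's point of view; the cap only bounds the damage.

```go
// A minimal sketch, not Geth's actual code: the per-account cap value matches
// Geth's, but all types and names here are illustrative assumptions.
package sparsepool

import (
	"errors"
	"fmt"
)

const maxTxsPerAccount = 16 // the per-account cap Geth's blobpool enforces

type account = [20]byte

// pool tracks, per sender, the nonces currently stored in the blobpool.
type pool struct {
	stateNonce func(account) uint64 // next expected nonce from chain state
	stored     map[account][]uint64 // stored nonces, kept contiguous and sorted
}

var errNonceGap = errors.New("blobpool: nonce gap")

// add accepts a tx only if its nonce extends the sender's contiguous run and
// the per-account cap is not exceeded. The attack above passes both checks,
// because A0, A1, ... look gapless to the victim; the cap merely bounds the
// damage at 16 txs per attacker address.
func (p *pool) add(from account, nonce uint64) error {
	nonces := p.stored[from]
	next := p.stateNonce(from)
	if n := len(nonces); n > 0 {
		next = nonces[n-1] + 1
	}
	if nonce != next {
		return errNonceGap // a gapped tx can never be mined, so never store it
	}
	if len(nonces) >= maxTxsPerAccount {
		return fmt.Errorf("blobpool: account already has %d txs", maxTxsPerAccount)
	}
	p.stored[from] = append(nonces, nonce)
	return nil
}
```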
## Future work

- Document the blobpool's 16-tx cap per account in the sparse blobpool EIP.
  - Clients that do not enforce this will be vulnerable to DoS.
  - To my knowledge, this limit has not been specified in the EIP.
  - Considering that the current blobpool discourages re-propagation, it would be good to specify this condition.
- Evict blobs from the blobpool that have not been included in a block after a reasonable period.
  - This requires further specification.
  - Current blobpool implementations effectively prohibit eviction of blob txs.
    - Deleting a blob tx that has already been propagated could itself be a DoS vector.
    - (I need to study the rationale for this behavior further.)
  - Note that this does not eliminate the possibility of malicious peers wasting blobpool space. Those peers can still try to occupy the space, but we can also keep removing their txs.
    - Therefore, if the attacker can repeat this kind of attack in a short period, this method will not mitigate it.
- The whole concept of the sparse blobpool is that we can identify malicious peers with some probability, depending on their behavior. In my view, if there is no method to actually verify whether **each response** is helpful to us, peers can always find a way to stay under our radar. For example, if we verify them probabilistically, they can also attack us probabilistically.
- Can we intentionally lengthen the period between successive attacks? Can we combine the three strategies below? (A sketch combining strategies 1 and 2 follows the list.)
  1. Prioritize signals from peers whose peer score is high enough.
     - To accrue peer score, a peer must maintain a stable connection for a certain period and contribute to our blobpool (Csaba proposed a similar idea).
     - This can prevent multiple attacks from being initiated in a short period.
  2. Delete old transactions from our blobpool when it is full.
     - This can remove txs used by attackers, but also some cheap txs from honest peers.
     - We might need a transaction tracker for local blob transactions to prevent our clients' transactions from disappearing.
  3. Enforce the 16-tx cap per account.
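The following is a minimal sketch of how strategies 1 and 2 might combine, under assumed thresholds: availability signals count toward the *k*-quorum only from peers with sufficient score, and the oldest non-local tx is evicted when the pool is full. All names and numbers are hypothetical, not from any client.

```go
// A minimal sketch combining strategies 1 and 2, under assumed thresholds;
// all names and numbers here are hypothetical, not from any client.
package sparsepool

import "time"

// peerScore accrues with connection stability and blobpool contributions.
type peerScore struct {
	connectedAt   time.Time
	contributions int // blobs this peer served that we verified and used
}

const (
	minConnectedFor  = 30 * time.Minute // assumed stability threshold
	minContributions = 4                // assumed contribution threshold
)

// trusted reports whether a peer's availability signal should count toward
// the k-quorum (strategy 1). Ignoring fresh connections forces an attacker
// to invest time before each attempt, lengthening the gap between attacks.
func (s peerScore) trusted(now time.Time) bool {
	return now.Sub(s.connectedAt) >= minConnectedFor &&
		s.contributions >= minContributions
}

// storedTx is a blobpool entry annotated with its arrival time.
type storedTx struct {
	hash    [32]byte
	local   bool // tracked local txs are never evicted (strategy 2 caveat)
	addedAt time.Time
}

// evictOldest drops the oldest non-local tx when the pool is full (strategy
// 2). This may also discard cheap honest txs, the trade-off noted above.
func evictOldest(txs []storedTx) []storedTx {
	oldest := -1
	for i, tx := range txs {
		if tx.local {
			continue
		}
		if oldest == -1 || tx.addedAt.Before(txs[oldest].addedAt) {
			oldest = i
		}
	}
	if oldest == -1 {
		return txs // nothing evictable: every stored tx is local
	}
	return append(txs[:oldest], txs[oldest+1:]...)
}
```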