# Handling Storage, Bandwidth and Eclipse Attacks

*Thanks to @pfh, @adityapk, @sds and @kc for conversations that contributed to this design*

Farcaster Hubs promise to store [~20,000 messages](https://hackmd.io/@farcasterxyz/BJj3zuVVj) for every fid perpetually. Registering an fid has a one-time gas cost, but hubs must pay ongoing cloud storage and bandwidth costs. A tragedy of the commons ensues: users have no incentive to minimize data transfer or clean up unused data, which increases the burden on hub operators.

Our goal is to allocate limited storage and bandwidth between users to maximize network value. Our constraints are that Hubs must fit into a single cloud instance (< 64 TB) and that we must support at least 10M users. We also believe that:

1. **Increasing costs may align incentives**. While a one-time gas fee prevents rampant abuse, a recurring fee would result in useless data being flushed more often.
2. **Storage demand could exceed supply**. Dynamic pricing or first-come, first-served allocation will be needed to enforce the 64 TB limit.
3. **Premature optimization should be avoided**. Charging too much, adding complexity or delaying launch may make the solution worse than the problem.
4. **Network costs are also a problem**. Network throughput also places an economic burden on hubs, and a separate design is needed to handle it.

## Allocating Bandwidth

A user sends messages to a hub, which in turn broadcasts them to all its peers, which then repeat the process. On average, users send out 1,200 messages a year, which is easy to handle. But an errant script written by a developer may accidentally send thousands of messages in a second, which can clog the network or take it down entirely. This is likely to be a problem very soon after hubs are launched.

We assume that at 10M users we are producing about 400 messages/second on average, and may burst up to 40,000 messages/second during peak times.
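The throughput estimate follows from the per-user figure above, and the same inputs show how the 64 TB constraint and the 20,000-message promise pull against each other. A back-of-the-envelope check in Python, assuming the 1 KB message size used under *Allocating Storage* (taken as 1,000 bytes) and a 100x peak-to-average burst factor inferred from the 400 vs 40,000 messages/second figures:

```python
# Back-of-the-envelope capacity check. Inputs come from this document;
# the 100x burst factor is an assumption inferred from 400 vs 40,000 msg/s.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

USERS = 10_000_000
MESSAGES_PER_USER_PER_YEAR = 1_200
BURST_FACTOR = 100                    # assumption: peak rate / average rate
MESSAGES_STORED_PER_USER = 20_000
MESSAGE_SIZE_BYTES = 1_000            # 1 KB, as assumed under "Allocating Storage"
INSTANCE_LIMIT_BYTES = 64 * 10**12    # 64 TB single-instance constraint


def average_rate(users: int, per_user_per_year: int) -> float:
    """Network-wide average messages per second."""
    return users * per_user_per_year / SECONDS_PER_YEAR


def storage_demand(users: int, messages_per_user: int, message_size: int) -> int:
    """Total bytes a hub needs if every user fills their full allowance."""
    return users * messages_per_user * message_size


avg = average_rate(USERS, MESSAGES_PER_USER_PER_YEAR)
peak = avg * BURST_FACTOR
demand = storage_demand(USERS, MESSAGES_STORED_PER_USER, MESSAGE_SIZE_BYTES)
users_that_fit = INSTANCE_LIMIT_BYTES // (MESSAGES_STORED_PER_USER * MESSAGE_SIZE_BYTES)

print(f"average: {avg:.0f} msg/s, peak: {peak:,.0f} msg/s")
print(f"full allowance for {USERS:,} users: {demand / 10**12:.0f} TB")
print(f"users whose full allowance fits in 64 TB: {users_that_fit:,}")
```

This gives roughly 381 messages/second on average, in line with the ~400 estimate, but about 200 TB if every one of 10M users filled a 20,000-message allowance, so only about 3.2M full allowances fit in 64 TB. That gap illustrates belief 2 above.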
More research and benchmarking is needed to see what our current system is capable of handling.

Flooding can be prevented by imposing rate limits at three different layers:

1. Client APIs - to prevent a single client from breaking the rules
2. Sending Layer - to prevent clients in aggregate from breaking the rules
3. Receiving Layer - to protect against malicious hubs that ignore the rules

The most efficient way to protect the network is to block flooding at the client APIs, before it affects any hubs. But this may not be possible in all cases, and a defense-in-depth implementation may be needed.

There are three types of rate limits that can be imposed:

- **Per Hub Limits** - a hub will only receive N messages in a duration D from each peer. Violations are met with a temporary mute, leading to an eventual disconnect. A corresponding sending limit can be implemented to prevent hubs from accidentally triggering this.
- **Per User Limits** - a hub will only receive N messages in a duration D per user, after which further messages from that user are ignored for a period. Repeat violations are met with exponentially longer ignore periods.
- **Timestamp Limits** - a hub will only accept N messages in timestamp-duration T per user. This limit is based on the user-reported timestamp and not the produced timestamp. Additional messages will result in earlier messages in that period being dropped. This is a consensus-layer change that is more complicated to implement, but it may have benefits when it comes to speeding up eventual consistency.

There are a few things to consider when implementing such rate limits:

- Rotating signers may cause a large volume of messages to be issued, triggering rate limits
- Rate limits can delay eventual consistency even further and trigger more out-of-band syncs
- Rate limits make state less deterministic, making it harder to detect eclipse attacks

### Recommendation

Implement per-hub sending and receiving rate limits to prevent accidental flooding during phase 4.
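Per-hub and per-user limits can share one mechanism: count recent messages per key and escalate the penalty on each violation. A minimal Python sketch, assuming a sliding-window log and caller-supplied timestamps; `SlidingWindowLimiter`, `Verdict` and every parameter value are hypothetical, and timestamp limits (a consensus-layer change) are not modeled:

```python
from collections import defaultdict, deque
from collections.abc import Hashable
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ACCEPT = "accept"
    DROP = "drop"              # over the limit, or still muted/ignored
    DISCONNECT = "disconnect"  # too many violations: drop the peer


@dataclass
class _KeyState:
    arrivals: deque = field(default_factory=deque)  # accepted arrival times in the window
    violations: int = 0
    muted_until: float = 0.0


class SlidingWindowLimiter:
    """Accept at most `limit` messages per `window` seconds for each key.

    Key by peer id for a per-hub limit, or by fid for a per-user limit.
    Each violation mutes the key for base_mute * 2**(violations - 1) seconds;
    if max_violations is set, reaching it returns DISCONNECT instead.
    """

    def __init__(self, limit: int, window: float, base_mute: float,
                 max_violations: Optional[int] = None) -> None:
        self.limit = limit
        self.window = window
        self.base_mute = base_mute
        self.max_violations = max_violations
        self._states = defaultdict(_KeyState)

    def check(self, key: Hashable, now: float) -> Verdict:
        state = self._states[key]
        if now < state.muted_until:
            return Verdict.DROP
        # Forget arrivals that have slid out of the window.
        while state.arrivals and now - state.arrivals[0] >= self.window:
            state.arrivals.popleft()
        if len(state.arrivals) < self.limit:
            state.arrivals.append(now)
            return Verdict.ACCEPT
        # Limit exceeded: escalate the penalty.
        state.violations += 1
        if self.max_violations is not None and state.violations >= self.max_violations:
            return Verdict.DISCONNECT
        state.muted_until = now + self.base_mute * 2 ** (state.violations - 1)
        state.arrivals.clear()
        return Verdict.DROP


# Hypothetical per-hub receive limit: 3 messages/second per peer,
# disconnect on the second violation.
per_hub = SlidingWindowLimiter(limit=3, window=1.0, base_mute=10.0, max_violations=2)
burst = [per_hub.check("peer-a", now=0.1 * i) for i in range(5)]
after_mute = per_hub.check("peer-a", now=11.0)
```

Keying the limiter by peer id with `max_violations` set gives the mute-then-disconnect behavior of per-hub limits; keying it by fid with `max_violations=None` gives per-user limits with exponentially longer ignore periods. A sending-side limit could reuse the same class so a hub throttles itself before its peers mute it.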
Re-evaluate other strategies as problems emerge.

## Allocating Storage

Over five years, a user who spends $2 in gas fees can make hubs spend $10 in storage costs. The calculation is based on current gas prices and a target of 100 hubs:

```
registration = ~60,000 gas, ~$2.00 at today's prices
storing 20 MB = $0.02/year per hub * 100 hubs = $2/year
over 5 years = 5 * $2/year = $10
```

This math is based on some assumptions that may change:

- Today's gas prices are used, but future costs could be 100x higher.
- Hub network size is set to 100, but it could be 10x higher.
- Message size is set to 1 KB, but it may grow as we add features.
- Storage costs are kept static, but they will decrease marginally over time.

We can introduce a system to help hubs scale by adding a new contract called the Farcaster Storage Registry. The registry defines a yearly rent for a unit of storage and a total number of units available for rent. Users can store 200 messages with an fid, and may rent more storage by paying a fee to this contract. Hubs listen for events and update the allocated storage accordingly. The yearly rent has a base price which increases as the number of available units decreases, ensuring that the network does not get overwhelmed.

Such a system can be implemented at any point in the future without affecting the design or decentralization of existing contracts. Fees collected through the system can be sent to a protocol treasury. It is unlikely that such fees can be passed directly to hubs in a fair manner, but they can indirectly help hubs by incentivizing the development of more efficient hub software.

### Recommendation

We propose making no immediate change and allowing users to store up to 20k messages today. The improvements described above can be made at a later date, once usage of the network begins to place a load on hubs. This should be revisited before the open-access launch on mainnet.
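If the Storage Registry is built later, its pricing and allocation logic is small. A minimal off-chain Python model, assuming a hyperbolic price curve (`base_price * total_units / units_available`); the class, its fields, the unit size and the curve are all hypothetical, since the proposal only says rent rises as available units decrease. A real version would live in a contract, with hubs reacting to its rent events:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StorageRegistry:
    """Toy, off-chain model of the proposed Farcaster Storage Registry.

    All parameters and the pricing curve are illustrative assumptions.
    """
    total_units: int
    base_price: float              # yearly rent per unit while every unit is free
    messages_per_unit: int         # hypothetical size of one storage unit
    free_messages: int = 200       # storage every fid gets without renting
    rented: Dict[int, int] = field(default_factory=dict)  # fid -> units rented

    @property
    def units_available(self) -> int:
        return self.total_units - sum(self.rented.values())

    def unit_price(self) -> float:
        """Assumed curve: price grows as total / available, climbing steeply near capacity."""
        available = self.units_available
        if available == 0:
            raise ValueError("no storage units available")
        return self.base_price * self.total_units / available

    def rent(self, fid: int, units: int) -> float:
        """Rent `units` and return the fee, pricing each unit at the availability it sees."""
        if units > self.units_available:
            raise ValueError("not enough storage units available")
        fee = 0.0
        for _ in range(units):
            fee += self.unit_price()
            self.rented[fid] = self.rented.get(fid, 0) + 1
        return fee

    def message_limit(self, fid: int) -> int:
        """Messages a hub would keep for `fid` after processing the registry's rent events."""
        return self.free_messages + self.rented.get(fid, 0) * self.messages_per_unit


registry = StorageRegistry(total_units=100, base_price=1.0, messages_per_unit=1_000)
first_fee = registry.rent(fid=1, units=1)    # priced with all 100 units available
later_price = registry.unit_price()          # 99 of 100 units left, so slightly higher
```

Pricing each unit at the current availability means one large rental cannot buy the last units at the base price. Rentals in this sketch never expire; a yearly rent would also need a per-unit expiry, which is omitted here.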
## Preventing Eclipse Attacks

A sophisticated attacker can spin up thousands of hubs and “eclipse” a user by refusing to sync their messages. Legitimate hubs that peer with the attacker’s hubs will behave the same way, making the user disappear from the network. It's desirable to quickly eliminate bad actors from the network by detecting such behavior and dropping them as peers. We have a few tools at our disposal:

- **Home Hubs** - a contract can let fids register a hub URL where they upload content. A hub suspecting a peer of eclipsing the user can compare the user's state on the home hub with the state on the peer and make a decision.
- **State Comparison** - a hub can check peers periodically by requesting the Merkle root for a range of historical data, and punish peers who have fewer messages than expected by disconnecting from them.

More research is required to come up with a peering system robust enough to identify bad actors quickly.

### Recommendation

We expect there to be little incentive to attack the network early on and want to avoid premature optimization. However, when developing our syncing system, we should build it in a way that allows data sampling and state comparison in a deterministic manner.

## Appendix

* [KC's Response](https://gist.github.com/kcchu/88c7220f4c9f90de33fb8479d5f0610d)
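The state comparison idea from *Preventing Eclipse Attacks* only works if both hubs hash a range of a user's messages in the same order, which is what the deterministic-sync recommendation asks for. A minimal Python sketch, assuming messages are identified by a hypothetical `(timestamp, hash)` pair and using a simple SHA-256 binary Merkle tree, not necessarily the structure a hub's sync layer would use:

```python
import hashlib
from typing import Iterable, List, Tuple

Message = Tuple[int, bytes]  # hypothetical shape: (timestamp, message hash)


def merkle_root(leaves: List[bytes]) -> bytes:
    """Binary Merkle root over SHA-256 leaf hashes; odd levels duplicate the last node."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def range_root(messages: Iterable[Message], start: int, end: int) -> Tuple[bytes, int]:
    """Root and count for messages with start <= timestamp < end, in a deterministic order."""
    in_range = sorted(m for m in messages if start <= m[0] < end)
    leaves = [ts.to_bytes(8, "big") + h for ts, h in in_range]
    return merkle_root(leaves), len(in_range)


def suspect_peer(local: Iterable[Message], peer_root: bytes, peer_count: int,
                 start: int, end: int) -> bool:
    """Flag a peer whose root differs from ours and who reports fewer messages."""
    root, count = range_root(local, start, end)
    return peer_root != root and peer_count < count


# Usage: a peer that withheld two of the five messages in [0, 5) is flagged.
msgs = [(t, hashlib.sha256(str(t).encode()).digest()) for t in range(10)]
peer_root, peer_count = range_root(msgs[:3], 0, 5)
flagged = suspect_peer(msgs, peer_root, peer_count, 0, 5)
```

A mismatch alone does not prove an eclipse: a peer may simply be behind or rate-limited, so a hub would likely re-check, or consult the user's home hub, before disconnecting.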