Farcaster
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Owners
        • Signed-in users
        • Everyone
        Owners Signed-in users Everyone
      • Write
        • Owners
        • Signed-in users
        • Everyone
        Owners Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
      • Invitee
    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Sharing URL Help
Menu
Options
Versions and GitHub Sync Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Owners
  • Owners
  • Signed-in users
  • Everyone
Owners Signed-in users Everyone
Write
Owners
  • Owners
  • Signed-in users
  • Everyone
Owners Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
Invitee
Publish Note

Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

Your note will be visible on your profile and discoverable by anyone.
Your note is now live.
This note is visible on your profile and discoverable online.
Everyone on the web can find and read all notes of this public team.
See published notes
Unpublish note
Please check the box to agree to the Community Guidelines.
View profile
Engagement control
Commenting
Permission
Disabled Forbidden Owners Signed-in users Everyone
Enable
Permission
  • Forbidden
  • Owners
  • Signed-in users
  • Everyone
Suggest edit
Permission
Disabled Forbidden Owners Signed-in users Everyone
Enable
Permission
  • Forbidden
  • Owners
  • Signed-in users
Emoji Reply
Enable
Import from Dropbox Google Drive Gist Clipboard
   owned this note    owned this note      
Published Linked with GitHub
1
Subscribed
  • Any changes
    Be notified of any changes
  • Mention me
    Be notified of mention me
  • Unsubscribe
Subscribe
# Pruned Sets *Contributors: Varun Srinivasan, Paul Fletcher-Hill, Sagar Dhawan, Sanjay Prabhu, Shane da Silva, Dan Romero* ## 1. Problem A problem with the current Hub design is that there is no limit on how much data a user can add to the network. Unbounded storage will introduce two new problems that apply a centralizing force on the network. First, most developers will not be able to afford to keep a copy of the network's data if it keeps growing indefinitely. A few "large hub operators" will emerge as intermediaries between users and developers, offering to maintain a full copy of all data in exchange for some fees. Second, users might accidentally or maliciously add billions of messages to the network. Hub operators would solve this by developing tools to ignore specific users and content. Some operators might wield such tools more aggressively than others and have different opinions on what constitutes spam on the network. The cumulative effect will be to push the network towards centralization. A few hub operators become the intermediaries between developers, users and their audiences. They can collude to block users and developers (or be forced to do so). We believe Farcaster can become [sufficiently decentralized](https://www.varunsrinivasan.com/2022/01/11/sufficient-decentralization-for-social-networks) if **all user data is replicated across 100+ hubs** owned by different entities. We also believe that developers building companies on Farcaster have a strong incentive to run hubs and keep the network decentralized. However, they are only likely to run hubs if they are: 1. **Full copies of the network** - a Hub must have all data needed for a useful client. 1. **Easy to set up** - a Hub can be run locally or in the cloud on a single machine. 2. **Cheap to run** - a Hub can store a single user's data for < $0.01 / year. ## 2. Solution Hubs enforce a limit on the total number of messages within each Set, dropping the oldest known message if the limit is exceeded. They also enforce an age limit, dropping messages older than a rolling window for each Set. The size and age limit maximize the number of relevant messages stored on a network. A size limit alone would result in old data being kept around forever even if it is never viewed. Reallocating this space to data from a more active, recent user is more desirable for the network. A future update will explore further changes to allow users to preserve specific messages or pay for increased storage. These are out of scope for the upcoming v2 release since they increase complexity and most users will not be affected by limits for a while. ### 2.1 Set Limits Limits are intentionally **very conservative** and we expect them to increase in the near future as we develop certainty in our models. | Collection | Max Size | Max Age | | -------- | -------- | -------- | | Casts | 10,000 | 1 year | | Reactions | 5,000 | 3 months | | Follows | 5,000 | None | | Signers | 100 | None | | Verifications | 50 | None | | Meta | 100 | None | *See [Appendix A](#Appendix-A:-Estimated-Message-Volumes) and [Appendix B](#Appendix-B:-Estimated-Message-Sizes) for a detailed estimate on volume and sizing of messages* Message types have different limits based on their frequency, periodicity, value to the author and value to the network: - **Casts** - moderate volume and medium-term useful, so have high size and age limits - **Reactions** - frequent and medium-term useful, given medium size limits and low age limits - **Follows** - low volume and important, given medium size limits with no age limits - **Signers, Verifications, Meta** - rare but important, given low size limits and no age limits Based on estimates, only a tiny percentage of users (<1%) will run into size limits. At 10M users, the projected size of the network data is ~ 10TB which costs approximately $11k / year on major cloud providers. This comes out to a cost of $ 0.001 / user, 10x below our target, giving us space to increase limits and add new message types. ### 2.2 Known Issues 1. **A malicious actor can bloat the network since fids are cheap** Fids have a one-time registration cost but guarantee storage of data on the network. A malicious actor can register thousands of fids and bloat the network with dummy data. This will have to be specifically addressed before mainnet launch, but it is not urgent during testnet when registrations are restricted. 2. **A compromised signer can wipe a user's history without a trace** Previously, a compromised signer could delete earlier messages but it would always leave a tombstone. Now, a signer can issue a large volume of dummy messages to wipe older messages out without a trace. Both attacks can be reversed by revoking the signer and re-issuing the valid messages from a backup. 3. **Hubs can be subject to a Network DDOS by sending millions of valid messages** A malicious user can spam the network, and intelligent rate limits and a peer reputation system will be needed to prevent this. In the short term, this can be mitigated by only peering with known good actors until a more robust solution is developed. 4. **Users can accidentally wipe their own history** Similar to (2), a user can accidentally generate many messages that wipe their own history. Such a problem would only affect advanced users and can be handled with checks at the RPC client or server. 5. **A malicious actor can erase another user's messages by issuing signers** Signers can be submitted from invalid custody addresses, an optimization introduced to smooth custody address migrations. With a limited signer set, this can be exploited by a malicious user to flood the set with addresses causing the older ones to be pruned. A fix will need to be made to modify this behavior. ### 2.3 FAQ **Can we incentivize more hubs by giving them a share of the network fees?** While many mechanisms can prove that a Hub is hosting data, there is no practical way to prove that it is serving the data to callers with reasonable latency and a low error rate. **Can we let users choose how many messages to store?** Unbounded data growth leads to two problems. First, the network size will grow large enough that the average developer cannot afford to run a hub that maintains a full copy of the network. Second, some users will publish spam requiring hubs to have the ability to block specific users. The network will end up being controlled by a small oligopoly of hub operators have the power to remove people from the network. **Can we have super hubs that store more data?** See above **Can we use sharding to scale the network instead of a limit on data?** See above **Can we let users pick specific messages to keep forever?** We plan to explore this in a future upgrade to the protocol. While technically feasible today this adds significant complexity to the sync model and would significantly delay the launch of v2. **Can network-rate limits be used instead of a storage limit?** Any form of network-based rate limit to enforce a storage limit is bound to fail since it can be sidestepped in several ways in an untrusted, eventually consistent network. **Can we implement limits on actual bytes instead of the number of messages?** Counting bytes might lead to a more predictable outcome for "bytes-on-disk", but it is much harder for people to reason about and more work for Hubs to keep track of. **Can we evict reactions when a parent message is pruned?** To do this, we must be able to guarantee that all data will always live on a single logical Hub. While we have a clear path to this until 10M users, we do not yet have a path to our target of 1B and wish to hold off on this optimization for now. ## 3. Consensus Changes All Sets now have a "prune clause" that triggers when the a message when the size or age limits are exceeded. The message with the lowest farcaster identifier is removed first until the age and size clause are satisfied. This changes an important property of our sets — a member can now revert to a null state from a non-null state. The prune rules can be defined more formally on a set S which contains some messages, the lowest ordered of which is n. Now assume a new message m is added: - If the `set_size` < `max_set_size` , do nothing - If the `set_size` = `max_set_size`, and - `order(m)` =< `order(n)`, reject m - `order(m)` > `order(n)`, prune n and add m The prune rule must also trigger every second on every message n in the set S: - If `time.now()` - `timestamp(n)` > `age limit`, prune n - Else, do nothing ### 3.1 Last Write Wins Sets Reactions, Follows, Verifications, Signers and Meta use LWW sets with an add-set and a rem-set to track state. Adding the prune rules into the LWW rules adds a few new state transitions that were not possible before: 1. rem → null when a later message is added or the clock ticks forward 2. add → null when a later message is added or the clock ticks forward 3. messages may be rejected when order(message) < order of lowest element in set S ![](https://i.imgur.com/fEhDGTi.png) Users and applications perceive null and rem states similarly, so the only semantic change is that adds sometimes do not win even if they are the last write. An add must be both the last write for a value and within the validity range for the set's messages to win. We call this a *LastValidWriteWins* or LVWW set which has slightly different guarantees to the user. **Reactions**: LVWW sets works well with reactions where the value of a reaction decays with age and purging older ones is desirable. **Follows**: LVWW sets are less ideal for follows since the value of a follow does not necessarily change with age. Applications must take care to track follow set state and warn users when a new follow might expire an older one. **Signers**: LVWW sets with Signers suffer the same problem as follows since older signers are not necessarily less valuable. However, signer actions are rare and most users are not expected to cross the limits. The purging of a signer does not violate the consistency model since SignerRemoves have the same effect on other sets. ### 3.2 Remove Wins Sets Casts use RW sets with an add-set and a rem-set to track state. Adding the prune rules into the RW rules adds a few new state transitions that were not possible before: 1. rem → null when a later message is added or the clock ticks forward 2. add → null when a later message is added or the clock ticks forward 3. messages may be rejected when order(message) < order of lowest element in set S ![](https://i.imgur.com/3kQW3QN.png) If the remove message has an earlier order than the add message through user error, then a prune may cause an add to win after remove has won. Similar to LVWW sets, adds sometimes do not win if they are outside the validity range. We define this as *CausalRemovesWinIfValid* or CRWV sets. **Casts**: CRWV sets work perfectly fine except in the case where the timestamps are set incorrectly. The user can remedy this by waiting for the add message to expire or publishing a remove message with a later timestamp. ## 4. Sync Changes Hubs maintain a merkle trie with the id of every known message to compare state with each other. Previously, new messages were most likely to insert messages in the right hand side of the tree. The prefix-exclusion sync algorithm was chosen to optimize for this case. The introduction of pruned sets makes left-hand side changes very likely, and mid-tree changes somewhat likely. This reduces the efficiency of our sync algorithm though it is almost always better than a naive traversal of the tree. No changes are needed for now, but this will need to be upgraded sooner rather than later. ## 5. Appendices ### Appendix A: Estimated Per-User Message Volumes Shows the average, median and maximum number of messages generated by users of Farcaster over a year. Estimates were obtained by sampling the 30-day rates and extrapolating them to a year. | | **1yr Avg** | **1yr Median** | **1yr Maximum** | |---------------|-------------|----------------|-----------------| | **Casts** | 348 | 72 | 28,968 | | **Reactions** | 864 | 132 | 103,008 | | **Follows** | 828* | 168* | 47,964* | *Follows cannot be estimated accurately by extrapolation from a 30 day data set, since users follow often when they sign up and slow down thereafter* ### Appendix B: Estimated Message Sizes Shows the estimated sizes for three largest set types by volume on Farcaster. Estimates are based on the add-type which is usually larger, and factor both the message size as well the size of any indexes need to look up the message in the hub. | | **Avg (bytes)** | **Max (bytes)** | |---------------|-----------------|-----------------| | **Casts** | 700 | 1364 | | **Reactions** | 348 | 348 | | **Follows** | 332 | 332 |

Import from clipboard

Paste your markdown or webpage here...

Advanced permission required

Your current role can only read. Ask the system administrator to acquire write and comment permission.

This team is disabled

Sorry, this team is disabled. You can't edit this note.

This note is locked

Sorry, only owner can edit this note.

Reach the limit

Sorry, you've reached the max length this note can be.
Please reduce the content or divide it to more notes, thank you!

Import from Gist

Import from Snippet

or

Export to Snippet

Are you sure?

Do you really want to delete this note?
All users will lose their connection.

Create a note from template

Create a note from template

Oops...
This template has been removed or transferred.
Upgrade
All
  • All
  • Team
No template.

Create a template

Upgrade

Delete template

Do you really want to delete this template?
Turn this template into a regular note and keep its content, versions, and comments.

This page need refresh

You have an incompatible client version.
Refresh to update.
New version available!
See releases notes here
Refresh to enjoy new features.
Your user state has changed.
Refresh to load new user state.

Sign in

Forgot password

or

By clicking below, you agree to our terms of service.

Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
Wallet ( )
Connect another wallet

New to HackMD? Sign up

Help

  • English
  • 中文
  • Français
  • Deutsch
  • 日本語
  • Español
  • Català
  • Ελληνικά
  • Português
  • italiano
  • Türkçe
  • Русский
  • Nederlands
  • hrvatski jezik
  • język polski
  • Українська
  • हिन्दी
  • svenska
  • Esperanto
  • dansk

Documents

Help & Tutorial

How to use Book mode

Slide Example

API Docs

Edit in VSCode

Install browser extension

Contacts

Feedback

Discord

Send us email

Resources

Releases

Pricing

Blog

Policy

Terms

Privacy

Cheatsheet

Syntax Example Reference
# Header Header 基本排版
- Unordered List
  • Unordered List
1. Ordered List
  1. Ordered List
- [ ] Todo List
  • Todo List
> Blockquote
Blockquote
**Bold font** Bold font
*Italics font* Italics font
~~Strikethrough~~ Strikethrough
19^th^ 19th
H~2~O H2O
++Inserted text++ Inserted text
==Marked text== Marked text
[link text](https:// "title") Link
![image alt](https:// "title") Image
`Code` Code 在筆記中貼入程式碼
```javascript
var i = 0;
```
var i = 0;
:smile: :smile: Emoji list
{%youtube youtube_id %} Externals
$L^aT_eX$ LaTeX
:::info
This is a alert area.
:::

This is a alert area.

Versions and GitHub Sync
Get Full History Access

  • Edit version name
  • Delete

revision author avatar     named on  

More Less

Note content is identical to the latest version.
Compare
    Choose a version
    No search result
    Version not found
Sign in to link this note to GitHub
Learn more
This note is not linked with GitHub
 

Feedback

Submission failed, please try again

Thanks for your support.

On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

Please give us some advice and help us improve HackMD.

 

Thanks for your feedback

Remove version name

Do you want to remove this version name and description?

Transfer ownership

Transfer to
    Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

      Link with GitHub

      Please authorize HackMD on GitHub
      • Please sign in to GitHub and install the HackMD app on your GitHub repo.
      • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
      Learn more  Sign in to GitHub

      Push the note to GitHub Push to GitHub Pull a file from GitHub

        Authorize again
       

      Choose which file to push to

      Select repo
      Refresh Authorize more repos
      Select branch
      Select file
      Select branch
      Choose version(s) to push
      • Save a new version and push
      • Choose from existing versions
      Include title and tags
      Available push count

      Pull from GitHub

       
      File from GitHub
      File from HackMD

      GitHub Link Settings

      File linked

      Linked by
      File path
      Last synced branch
      Available push count

      Danger Zone

      Unlink
      You will no longer receive notification when GitHub file changes after unlink.

      Syncing

      Push failed

      Push successfully