Let us continue this discussion at swarmresear.ch: https://swarmresear.ch/

# Garbage Collection, Syncing, and Proof of Burn

# Syncing, Retrieval, and the GC queue

The Process of Syncing
--

Syncing is the process by which the chunks of data in Swarm are delivered to those nodes at which retrieval requests for the chunks will terminate - i.e. the process of moving data to where it can later be found and retrieved from. It is in the interest of the entire network that syncing happens reliably.

The Retrieval Process
--

"Retrieval" is the process in which a chunk is requested from the Swarm and served to the requesting node. Nodes serving a retrieval get paid via SWAP and as such have an incentive to be able to serve as many retrieval requests as possible. However, this is not simply a question of storing 'as many chunks as possible' but also a question of 'which chunks'. The Kademlia routing topology determines which nodes will be asked first to serve which retrieval request, and so nodes are incentivised to store more of those chunks to which they are 'closer' in address, because these are the ones they are likely to be asked to serve.

## The Garbage Collection Queue

This section presents a simple method for prioritising chunks based on expected profitability.

### The Queue

All chunks are stored in an ordered list (technically we have several orderings, but only one of them concerns us here). The idea is that profitable chunks are at the head of the queue and garbage collection happens at the tail end.

### Retrieved chunks

This section deals with chunks that were requested by a peer. It does not concern local requests.

**Serving a retrieval request puts the chunk at the top of the queue**

Whenever a chunk is served _to another peer_ in response to a retrieval request, that chunk is moved to the top of the queue.
_This is the only way chunks enter the top of the queue._ In order to avoid a flood of spam chunks flushing out these popular chunks from the queue, chunks newly arriving through the syncing process are inserted lower down in the queue. Garbage collection happens at the tail end of the queue.

**Ranking the head of the queue by popularity (alternative)**

There is an alternative to inserting any chunk served to a retrieval request at the top of the queue. This alternative seeks to remember how frequently a chunk has recently been requested and gives very popular chunks precedence in the queue. The outline of this idea is the following:

* When a chunk is requested, store the current unix timestamp along with the chunk.
* When a chunk is requested again, add a fixed amount of time to the timestamp: T -> T+X.
* Set the timestamp to max(now, T+X).
* Move the chunk towards the top of the queue, keeping chunks ordered by timestamp.

Frequently requested chunks will have a timestamp that is far in the future. A chunk that is requested for the first time will be inserted at the 'now' mark instead of at the very top of the queue.

### Synced chunks

This section deals with how to insert chunks into the database that are first encountered through the syncing protocol and have not been subject to any retrieval request.

**Most Proximate Chunks**

When a node receives a chunk that falls within its radius of responsibility, this chunk is inserted into the queue 1/3 of the way down (meaning 1/3 of the maximum queue length).

**[Note:** 1/3 is an arbitrary choice here.
It leaves sufficient room above for popular chunks to be cached independent of any syncing, while leaving sufficient room below for proximate synced chunks to be stored for quite a while before being flushed out.**]**

**More Distant Chunks**

During the syncing process, a node will encounter chunks to which it is not the most proximate node, but rather a facilitator in routing the chunk from the uploader to the nodes most proximate. Whether such a chunk should be cached depends on its expected profitability, and this in turn (for a new chunk) is based only on its distance. Choose a formula such that closer chunks are inserted higher up than more distant ones.

**Example 1:** If a chunk falls N Kademlia bins above the most proximate, it could be inserted into the queue at N/(N+1). So the bin before the most proximate is inserted at 1/2, the row above that at 2/3, then at 3/4, and so on.

**Example 2:** Alternatively, choose an exponential dropoff in which the first bin is inserted at 1/2, the next at 3/4, the next at 7/8, and so on.

**Vik comment:** it is not obvious to me how you insert into an index at quantiles. This is totally impractical if we need to count.

**Dan response:** You just track the quantile and update with each change. No counting is necessary. For example, if you want to point to the 1/2 mark of the queue, then with every two items added above it you move the pointer one element up, and with every two items added below it you move it one element down. Similarly for deletions. More generally, you keep track of the number of elements above and below the specified quantile, and if the fraction prescribed by the quantile differs by one whole element, you move the pointer in the direction that keeps the difference below 1.

It is important that newly synced chunks never enter the top of the queue. This is a form of spam protection. Further protection against a flood of bad data could be provided by proof-of-burn requirements (see below).
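As a rough illustration of the queue mechanics described above, here is a minimal sketch (hypothetical names, not Swarm's actual implementation): retrieved chunks enter at the top, synced chunks enter at an incrementally tracked quantile pointer as in Dan's response, and `bump_timestamp` implements the popularity-ranking alternative.

```python
# Hypothetical sketch of the GC queue mechanics.
# Index 0 is the top (most profitable); garbage collection pops the tail.

def bump_timestamp(stored_ts, now, bonus):
    """Popularity ranking (alternative scheme): a repeat request adds a
    fixed bonus X to the stored timestamp T, capped below by `now`,
    i.e. the new timestamp is max(now, T + X)."""
    return max(now, stored_ts + bonus)

class GCQueue:
    """Ordered chunk queue with a pointer maintained at a fixed quantile.

    Per Dan's response, the quantile pointer is updated incrementally on
    every insertion/deletion; no full count of the queue is needed
    (len() is O(1) in Python and used here only for the sanity check).
    """

    def __init__(self, quantile):
        self.quantile = quantile   # e.g. 1/3 for most-proximate synced chunks
        self.items = []            # a real store would use an on-disk index
        self.above = 0             # number of elements above the pointer

    def insert_retrieved(self, chunk):
        # Chunks served to a peer's retrieval request go to the top of the
        # queue; this is the only way chunks enter the top.
        self.items.insert(0, chunk)
        self.above += 1
        self._rebalance()

    def insert_synced(self, chunk):
        # Newly synced chunks enter at the tracked quantile, never the top.
        self.items.insert(self.above, chunk)
        self._rebalance()

    def collect_garbage(self):
        # GC happens at the tail end of the queue.
        chunk = self.items.pop()
        self._rebalance()
        return chunk

    def _rebalance(self):
        # Keep `above` within one element of quantile * queue length,
        # moving the pointer one step at a time.
        target = self.quantile * len(self.items)
        while self.above > target + 1:
            self.above -= 1
        while self.above < target - 1:
            self.above += 1
```

Deletions at the tail and insertions at either side each adjust the pointer by at most one step, which is what makes the quantile maintainable without counting the queue.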
Garbage Collection
--

Garbage collection happens at the bottom of the chunk queue. Although the ordering in the queue is meant only as a heuristic for which chunks to delete, in the absence of proof-of-burn or a payment lottery it is the best data to go on. As such, the tail end of the queue should be garbage collected as is.

---

## Locally requested Data

All of the above concerns chunks that arrive due to syncing and chunks that arrive due to retrieval requests originating at another node. We have not yet talked about chunks that are retrieved locally. This section is about local retrieval requests - i.e. data that you request through localhost:8500.

### No caching of local retrievals

The chunk store is designed to maximise the profitability of chunks sold through SWAP. Locally requested chunks do not need to be cached at all. We can insert them at the end of the queue, where they may be garbage collected next. Consider for example:

**Dapp data:** If a user loads a particular dapp frequently, it is up to the browser to do the caching, not the Swarm node.

**Streaming data:** If a user consumes a lot of streaming data through Swarm, this should not "flush out" other data from the Swarm database.

A Swarm node that is run as a 'server' only and is never used for local consumption should not differ in its functionality from a node run on a desktop computer at home that consumes a lot of data.

### A separate local cache?

There is an argument to be made for local caching if Swarm is to be used as a private dropbox, with a Swarm manifest mounted locally. This is not a measure to increase the profitability of chunks, but a measure to reduce the cost of retrieval.

**The local cache** We feel that in this case there should be a whole separate DB/cache dedicated to this local data, which should not interfere with the rest, which is organised by profitability.
If this is to be instituted, there must be a separate chunk queue of local data, and every time a retrieval request is served we must keep track of whether the request originated locally or on the network and act accordingly. These chunks are not subject to syncing.

**Vik's comment:** which chunks are not subject to syncing? In fact I don't see a point in this extra cache at all. Just pin content with a whitelisted proof-of-burn address and that will pin it.

**Dan's response:** Pinned data also needs to be removed from the GC queue. I believe that they are not subject to GC, but still subject to syncing, as that is what makes them retrievable.

**Pinning data?**

Often we are asked whether a node can 'pin' data to be kept locally. This terminology comes from IPFS. It is true that we could tweak this local cache so that it does not contain all (most recent) locally retrieved chunks, but instead only chunks belonging to datasets that the user has explicitly 'pinned'. This would however still be functionally different from IPFS, in that pinning does not guarantee that the data remains retrievable by the rest of the Swarm; it only guarantees that there is a local copy.

---

Syncing and GC with Proof-of-Burn
===

What is Proof-of-Burn
--

**How does proof-of-burn work?**

We describe here a basic prototype of proof-of-burn. It requires uploaders to have registered on-chain, and it has important privacy concerns that future revisions might seek to address.

Every uploader must register on-chain and deposit a stake that is locked forever / burned. The registration also contains a date range indicating when this stake is set to expire. When uploading new chunks to the Swarm, each chunk is given a signed receipt indicating which registered deposit the chunk belongs to.

Nodes want to compare the rate of burn (wei / second) for each chunk that they encounter. Since it is not feasible to burn a deposit for every chunk, we proceed probabilistically.
Upon receiving a chunk, a node checks how many other chunks it knows about that have the same signature (Q: and maybe what the radius of that set is?) and thereby estimates the total number of chunks in the Swarm that are signed by that key. This is possible under the assumption that chunks are evenly distributed by hash. (This is not attackable, because generating chunks which cluster in a particular hash range only makes the resulting calculation less favourable to the uploader.) Knowing the approximate total number of chunks signed by a given key, together with the total deposit and timeframe registered on chain, allows the node to calculate the wei/s burned on this chunk. [Note: there could be multiple signatures for each chunk. In this case we add the wei/s.]

**vik:** I don't get this. What you need to calculate is how many chunk-seconds are bought with an account relative to another. For this you count your chunk-seconds for the account, i.e.,

`POBRate(addr) := Balance(addr) / \Sum_{i, POB(c_i)=addr}(now - storedAt(c_i))`

For a start, for this we need an index by POB address. It is enough to store with the address:

- the number of chunks currently (n)
- the last time it changed (t)
- the cumulative chunk-seconds (s)

If we receive a new chunk with address a, we retrieve `<n,t,s>_a` and then define:

- n' = n+1
- t' = now
- s' = s + (now-t)*n

Now `POBRate(addr) := Balance(addr)/s`.

Plus I still maintain that for users to have individual addresses this simply will not have a big enough sample size.

**dan:** I do not understand the objection. The value of n is not knowable/verifiable locally.

**What does proof-of-burn achieve?**

Spam protection: during syncing, nodes only accept chunks above some minimum of wei/s, starting at 0 but rising as chunks are encountered...

Changes to SWAP: Dani asks: do we need to change the SWAP rules to enforce proof-of-burn as a prerequisite for getting paid?

**vik:** ?? I don't get this. Why is proof of burn a prerequisite for getting paid?
**dan:** Only for syncing. Otherwise, syncing is not incentivized.

Garbage Collection in light of proof-of-burn
--

**How does all of the above affect Garbage Collection?**

During garbage collection, still process only the tail of the queue and never delete from the first half of the queue. From the part of the queue that is considered for GC, prioritise chunks that have a low proof-of-burn.

Garbage Collection -> look at the bottom half of the queue and delete any chunks whose proof of burn has expired... with the rest... delete the lowest wei/s?

We probably need some formula that combines position in the queue with rate of burn (and maybe even proximity) to decide which chunks to delete and which to keep.

**vik:** yes indeed this is rather vague. Also I don't understand how expiry is relevant here. You just need to order by POBRate, but that depends on time, so really unclear.

**dan:** this needs further discussion. Expiry means that the chunk is no longer paid for; its uploader no longer considers it valuable.

---

# Just some notes - ignore

Could we pay for syncing?
---

**Short Answer: No.**

The problem in general: Who should pay whom? How would we make sure that the data reaches its destination?

**The lottery method**

One idea might be the following: Deposit a bounty on the blockchain. When you upload a collection, sign each chunk with the corresponding key. Then there is some randomised probabilistic lottery that you can win if you have the chunk (submit a proof of custody), where the probability of winning increases if the chunk is closer to your node address (somehow this has to be verified / registered nodes?).

In this scenario, during syncing, closer nodes compensate upstream nodes for the incoming sync. They are incentivised to do this based on the expected profitability of having the chunk as a result of this lottery.

During garbage collection, look at the bottom half of the queue, calculate probable lottery winnings, and delete the least profitable.

Registration does not have to reveal the node address...
We must carefully figure out what to reveal during the lottery. zk?
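The `<n,t,s>` bookkeeping that vik proposes in the proof-of-burn section above can be sketched as follows (hypothetical names; a sketch of the described accounting, not actual Swarm code):

```python
# Hypothetical sketch of vik's per-POB-address bookkeeping: store
# (n, t, s) = (current chunk count, time of last change, cumulative
# chunk-seconds) per proof-of-burn address.

class POBAccounting:
    def __init__(self):
        self.accounts = {}  # addr -> (n, t, s)

    def add_chunk(self, addr, now):
        # On a new chunk: accrue (now - t) * n chunk-seconds for the n
        # chunks held since t, then bump the count: <n', t', s'>.
        n, t, s = self.accounts.get(addr, (0, now, 0))
        self.accounts[addr] = (n + 1, now, s + (now - t) * n)

    def pob_rate(self, addr, balance, now):
        # POBRate(addr) = Balance(addr) / cumulative chunk-seconds,
        # accrued up to `now`.
        n, t, s = self.accounts[addr]
        total = s + (now - t) * n
        return balance / total if total else float("inf")
```

For example, an address holding one chunk from t=0 and a second from t=10 has accrued 30 chunk-seconds by t=20; with a burned balance of 60 wei its rate is 2 wei per chunk-second.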
