# Garbage Collection and Syncing and Proof of Burn

Let us continue this discussion at swarmresear.ch: https://swarmresear.ch/

# Syncing, Retrieval, and the GC queue

The Process of Syncing
--

Syncing is the process by which the chunks of data in Swarm are delivered to those nodes at which retrieval requests for the chunks will terminate - i.e. the process of moving data to where it can later be found and retrieved. It is in the interests of the entire network that syncing happens reliably.

The Retrieval Process
--

"Retrieval" is the process in which a chunk is requested from the Swarm and served to the requesting node. Nodes serving a retrieval get paid via SWAP and as such have an incentive to be able to serve as many retrieval requests as possible. However, this is not simply a question of storing 'as many chunks as possible' but also of 'which chunks'. The Kademlia routing topology determines which nodes will be asked first to serve which retrieval request, so nodes are incentivised to store more of those chunks to which they are 'closer' in address, because these are the ones they are most likely to be asked to serve.

## The Garbage Collection Queue

This section presents a simple method for prioritising chunks based on expected profitability.

### **The Queue**

All chunks are stored in an ordered list (technically we have several orderings, but only one of them concerns us here). The idea is that profitable chunks are at the head of the queue and garbage collection happens at the tail end.

### Retrieved chunks

This section deals with chunks that were requested by a peer. It does not concern local requests.

**Serving a retrieval request puts the chunk at the top of the queue**

Whenever a chunk is served _to another peer_ in response to a retrieval request, that chunk is moved to the top of the queue. _This is the only way chunks enter the top of the queue._ To avoid a flood of spam chunks flushing these popular chunks out of the queue, chunks newly arriving through the syncing process are inserted lower down. Garbage collection happens at the tail end of the queue.

**Ranking the head of the queue by popularity (alternative)**

There is an alternative to inserting every chunk served to a retrieval request at the top of the queue. This alternative seeks to remember how frequently a chunk has recently been requested and to give very popular chunks precedence in the queue. The outline of the idea is the following:

* When a chunk is requested, store the current unix timestamp along with the chunk.
* When a chunk is requested again, add a fixed amount of time X to the stored timestamp: T -> T + X.
* Set the timestamp to max(now, T + X).
* Move the chunk towards the top of the queue, keeping chunks ordered by timestamp.

Frequently requested chunks will have a timestamp that is far in the future. A chunk that is requested for the first time will be inserted at the 'now' mark instead of at the very top of the queue.
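To make the bookkeeping concrete, here is a minimal sketch of this timestamp-bump ordering in Go. The increment `bumpX`, the string chunk addresses, and the in-memory slice standing in for the chunk index are illustrative assumptions, not the actual Swarm implementation:

```go
// Minimal sketch of the popularity ordering: chunks sorted by a "virtual"
// timestamp that repeat requests bump into the future.
package main

import (
	"fmt"
	"sort"
	"time"
)

const bumpX = 10 * time.Minute // fixed increment X per repeat request (assumed value)

type entry struct {
	addr string    // chunk address (hex string for illustration)
	ts   time.Time // virtual timestamp; popular chunks drift into the future
}

// queue keeps entries sorted by ts, newest (head of queue) first.
type queue struct{ items []entry }

// requested records a retrieval request served for addr and reorders the queue.
func (q *queue) requested(addr string, now time.Time) {
	ts := now // first request: insert at the 'now' mark
	for i, e := range q.items {
		if e.addr == addr {
			// repeat request: T -> max(now, T+X)
			if bumped := e.ts.Add(bumpX); bumped.After(now) {
				ts = bumped
			}
			q.items = append(q.items[:i], q.items[i+1:]...)
			break
		}
	}
	q.items = append(q.items, entry{addr, ts})
	sort.Slice(q.items, func(i, j int) bool { return q.items[i].ts.After(q.items[j].ts) })
}

func main() {
	q := &queue{}
	now := time.Now()
	q.requested("aa", now)
	q.requested("bb", now)
	q.requested("bb", now) // bb is bumped into the future and moves above aa
	for _, e := range q.items {
		fmt.Println(e.addr, e.ts.Unix())
	}
}
```

Since the new timestamp is max(now, T + X), requests arriving faster than once per X push a chunk's timestamp further and further into the future, so genuinely popular chunks outrank one-off retrievals, while a first-time chunk enters only at the 'now' mark.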
### Synced chunks

This section deals with how to insert chunks into the database that are first encountered through the syncing protocol and have not been subject to any retrieval request.

**Most Proximate Chunks**

When a node receives a chunk that falls within its radius of responsibility, the chunk is inserted into the queue 1/3 of the way down (meaning 1/3 of the way down the maximum queue length). **[Note:** 1/3 is an arbitrary choice here. It leaves sufficient room above for popular chunks to be cached independently of any syncing, while leaving sufficient room below for proximate synced chunks to be stored for quite a while before being flushed out.**]**

**More Distant Chunks**

During the syncing process, a node will encounter chunks to which it is not the most proximate node, but rather a facilitator in routing the chunk from the uploader to those most proximate. Whether such a chunk should be cached depends on its expected profitability, which (for a new chunk) is based only on its distance. Choose a formula such that closer chunks are inserted higher up than more distant ones.

**Example 1:** If a chunk falls N Kademlia bins above the most proximate, it could be inserted into the queue at N/(N+1). So the bin just above the most proximate is inserted at 1/2, the next at 2/3, then 3/4, then 4/5, and so on.

**Example 2:** Or you might choose an exponential dropoff, i.e. $1 - 2^{-N}$, in which the first bin is inserted at 1/2, the next at 3/4, the next at 7/8, and so on.

**Vik's comment:** It is not obvious to me how you insert into an index at quantiles. This is totally impractical if we need to count.

**Dan's response:** You just track the quantile and update with each change. No counting is necessary. For example, if you want to point to the 1/2 mark of the queue: with each two items added above it, you move the pointer one element up; with each two items added below it, you move it one element down. Similarly for deletions. More generally, you keep track of the number of elements above and below the specified quantile, and if it differs from the fraction prescribed by the quantile by a whole element, you move in the direction that keeps the difference below 1.
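A minimal sketch of this quantile-pointer bookkeeping in Go, under the assumption that the queue is a doubly linked list; the sentinel starting element and the method names are illustrative, not part of any proposal here. The marker stays within one element of quantile q with O(1) amortized work per insertion and no counting:

```go
// Minimal sketch of Dan's quantile tracking: a marker element kept at a
// fixed quantile of a linked list as items are inserted around it.
package main

import (
	"container/list"
	"fmt"
)

// quantileTracker keeps marker positioned so that roughly a fraction q of
// the list lies above it (towards the front).
type quantileTracker struct {
	l      *list.List
	marker *list.Element
	above  int // elements in front of marker
	q      float64
}

func (t *quantileTracker) rebalance() {
	total := t.l.Len()
	for float64(t.above)-t.q*float64(total) >= 1 && t.marker.Prev() != nil {
		t.marker = t.marker.Prev() // too many above: move marker up
		t.above--
	}
	for t.q*float64(total)-float64(t.above) >= 1 && t.marker.Next() != nil {
		t.marker = t.marker.Next() // too few above: move marker down
		t.above++
	}
}

// insertAbove / insertBelow add an element relative to the marker and then
// nudge the marker so it stays at quantile q.
func (t *quantileTracker) insertAbove(v interface{}) {
	t.l.InsertBefore(v, t.marker)
	t.above++
	t.rebalance()
}

func (t *quantileTracker) insertBelow(v interface{}) {
	t.l.InsertAfter(v, t.marker)
	t.rebalance()
}

func main() {
	l := list.New()
	m := l.PushBack("marker") // sentinel starting element
	t := &quantileTracker{l: l, marker: m, q: 1.0 / 3.0}
	for i := 0; i < 6; i++ {
		t.insertAbove(i) // marker drifts up to stay at the 1/3 mark
	}
	fmt.Println("elements above marker:", t.above, "of", l.Len())
}
```

For the distance-based insertion rules above, one marker per insertion point would be kept, e.g. at q = 1/3 for most proximate chunks and at q = N/(N+1) per Example 1 for more distant ones.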
It is important that newly synced chunks never enter the top of the queue. This is a form of spam protection. Further protection against a flood of bad data could be provided by proof-of-burn requirements (see below).

Garbage Collection
--

Garbage collection happens at the bottom of the chunk queue. Although the ordering in the queue is meant only as a heuristic for which chunks to delete, in the absence of proof-of-burn or a payment lottery it is the best data to go on. As such, the tail end of the queue should be garbage collected as is.

---

## Locally requested Data

In all of the above, we have been talking about chunks that arrive due to syncing and chunks that arrive due to retrieval requests originating at another node. We have not talked about chunks that are retrieved locally. This section is about local retrieval requests - i.e. data that you request through localhost:8500.

### No caching of local retrievals

The chunk store is designed to maximise the profitability of chunks sold through SWAP. Locally requested chunks do not need to be cached at all. We can insert them at the end of the queue, where they may be garbage collected next. Consider for example:

**Dapp data:** If a user loads a particular dapp frequently, it is up to the browser to do the caching, not the Swarm node.

**Streaming data:** If a user consumes a lot of streaming data through Swarm, this should not "flush out" other data from the Swarm database.

A Swarm node that is run as a 'server' only and is never used for local consumption should not differ in its functionality from a node run on a desktop computer at home that consumes a lot of data.

### A separate local cache?

There is an argument to be made for local caching if Swarm is to be used as a private dropbox, with a Swarm manifest mounted locally. This is not a measure for increasing the profitability of chunks, but for reducing the cost of retrieval.

**The local cache**

We feel that in this case there should be a wholly separate DB/cache dedicated to this local data, one that does not interfere with the rest of the store, which is organised by profitability. If this is to be instituted, there must be a separate chunk queue for local data, and every time a retrieval request is served we must keep track of whether the request originated locally or on the network and act accordingly. These chunks are not subject to syncing.

**Vik's comment:** Which chunks are not subject to syncing? In fact I don't see the point of this extra cache at all. Just pin content with a whitelisted proof-of-burn address and that will pin it.

**Dan's response:** Pinned data also needs to be removed from the GC queue. I believe it is not subject to GC, but still subject to syncing, as that is what makes it retrievable.

**Pinning data?**

We often get asked whether a node can 'pin' data to be kept locally. This terminology comes from IPFS. It is true that we could tweak this local cache so that it contains not all (most recent) locally retrieved chunks, but only chunks belonging to datasets that the user has explicitly 'pinned'. This would still be functionally different from IPFS, however, in that pinning does not guarantee that the data remains retrievable by the rest of the Swarm; it only guarantees that there is a local copy.

---

Syncing and GC with Proof-of-Burn
===

What is Proof-of-Burn
--

**How does proof-of-burn work?**

We describe here a basic prototype of proof-of-burn. It requires uploaders to have registered on-chain, and it has important privacy concerns that future revisions might seek to address.

Every uploader must register on-chain and deposit a stake that is locked forever / burned. The registration also contains a date range indicating when this stake is set to expire. When uploading new chunks to the Swarm, each chunk is given a signed receipt indicating which registered deposit the chunk belongs to.

Nodes want to compare the rate of burn (wei/second) for each chunk that they encounter. Since it is not feasible to burn a deposit for every chunk, we proceed probabilistically. Upon receiving a chunk, a node checks how many other chunks it knows about that carry the same signature (Q: and maybe what the radius of that set is?) and thereby estimates the total number of chunks in the Swarm signed by that key. This is possible under the assumption that chunks are evenly distributed by hash. (This is not attackable, because generating chunks which cluster in a particular hash range only makes the resulting calculation less favourable to the uploader.) Knowing the approximate total number of chunks signed by a given key, together with the total deposit and timeframe registered on-chain, allows the node to calculate the wei/s burned on the chunk. [Note: there could be multiple signatures for each chunk. In this case we add the wei/s.]

**Vik:** I don't get this. What you need to calculate is how many chunk-seconds are bought with one account relative to another. For this you count your chunk-seconds for the account, i.e.

$$POBRate(addr) := \frac{Balance(addr)}{\sum_{i:\,POB(c_i)=addr}(now - storedAt(c_i))}$$

For a start, we need an index by POB address for this. It is enough to store with the address:

- the number of chunks currently held ($n$)
- the last time it changed ($t$)
- the cumulative chunk-seconds ($s$)

If we receive a new chunk with POB address $a$, we retrieve $\langle n, t, s \rangle_a$ and then define:

- $n' = n + 1$
- $t' = now$
- $s' = s + (now - t) \cdot n$

Now $POBRate(addr) := Balance(addr)/s$. Plus, I still maintain that with users having individual addresses this simply will not have a big enough sample size.

**Dan:** I do not understand the objection. The value of n is not knowable/verifiable locally.
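As a minimal sketch, this is what Vik's per-address chunk-seconds accounting could look like in Go. The in-memory map, the string POB addresses, and the balance passed in as a plain argument are illustrative assumptions; a real node would keep a persistent index and read the burned deposit from the chain:

```go
// Minimal sketch of the <n, t, s> chunk-seconds accounting per
// proof-of-burn address, following the update rule in Vik's comment.
package main

import (
	"fmt"
	"time"
)

// pobEntry is the <n, t, s> record stored per proof-of-burn address.
type pobEntry struct {
	n int64     // number of chunks currently attributed to the address
	t time.Time // last time the record changed
	s float64   // cumulative chunk-seconds accrued up to t
}

type pobIndex map[string]*pobEntry

// addChunk applies the update rule: s' = s + (now-t)*n, t' = now, n' = n+1.
func (idx pobIndex) addChunk(addr string, now time.Time) {
	e, ok := idx[addr]
	if !ok {
		e = &pobEntry{t: now}
		idx[addr] = e
	}
	e.s += now.Sub(e.t).Seconds() * float64(e.n)
	e.t = now
	e.n++
}

// rate returns POBRate(addr) = Balance(addr) / s; the balance lookup is
// stubbed out as an argument here.
func (idx pobIndex) rate(addr string, balanceWei float64, now time.Time) float64 {
	e, ok := idx[addr]
	if !ok {
		return 0
	}
	// Bring s up to date before dividing (an assumption on our part;
	// Vik's formula divides by the stored s, which only covers time up to t).
	s := e.s + now.Sub(e.t).Seconds()*float64(e.n)
	if s == 0 {
		return 0
	}
	return balanceWei / s
}

func main() {
	idx := pobIndex{}
	t0 := time.Now()
	idx.addChunk("0xabc", t0)
	idx.addChunk("0xabc", t0.Add(time.Hour)) // 1 chunk * 3600s accrued so far
	fmt.Printf("rate: %.4f wei per chunk-second\n",
		idx.rate("0xabc", 1e6, t0.Add(2*time.Hour)))
}
```

Each update touches a single record, so the index stays O(1) per synced chunk regardless of how many chunks an address has uploaded.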
**What does proof-of-burn achieve?**

*Spam protection:* During syncing, nodes only accept chunks above some minimum wei/s, starting at 0 but rising as chunks are encountered...

*Changes to SWAP:* Dani asks: do we need to change the SWAP rules to enforce proof-of-burn as a prerequisite for getting paid?

**Vik:** ?? I don't get this. Why is proof-of-burn a prerequisite for getting paid?

**Dan:** Only for syncing. Otherwise, syncing is not incentivized.

Garbage Collection in light of proof-of-burn
--

**How does all of the above affect garbage collection?**

During garbage collection, still process only the tail of the queue and never delete from the first half. From the part of the queue that is considered for GC, prioritise chunks that have a low proof-of-burn. Garbage collection -> look at the bottom half of the queue and delete any chunk whose proof-of-burn has expired... with the rest... delete the lowest wei/s? We probably need some formula that combines position in the queue with rate of burn (and maybe even proximity) to decide which chunks to delete and which to keep.

**Vik:** Yes indeed, this is rather vague. Also, I don't understand how expiry is relevant here. You just need to order by POBRate, but that depends on time, so it's really unclear.

**Dan:** This needs further discussion. Expiry means that the chunk is no longer paid for; its uploader no longer considers it valuable.

---

# Just some notes - ignore

Could we pay for syncing?
--

**Short answer: No.**

The problem in general: Who should pay whom? How would we make sure that the data reaches its destination?

**The lottery method**

One idea might be the following: Deposit a bounty on the blockchain. When you upload a collection, sign each chunk with the corresponding key. Then there is some randomised probabilistic lottery that you can win if you have the chunk (submit a proof of custody), with the probability of winning increasing the closer the chunk is to your node address (somehow this has to be verified / registered nodes?). In this scenario, during syncing, closer nodes compensate upstream nodes for the incoming sync. They are incentivised to do this based on the expected profitability of holding the chunk as a result of the lottery. During garbage collection, look at the bottom half of the queue and calculate probable lottery winnings; delete the least profitable. Registration does not have to reveal the node address... we must carefully figure out what to reveal during the lottery. ZK?
