Viktor Trón
---
title: DB refactor and push syncing
tags: swarm
---

# DB refactor and push syncing

This is the initial design document for achieving better storage, capacity management and efficient syncing.

Requirements for optimising correctness with efficient resource utilisation:

- *fast syncing*
  - newly uploaded chunks arrive at their nearest neighbourhood fast
  - lean on bandwidth, but also resilient to node dropouts or eclipse attacks
  - chunks cached on the delivery route do not enter the syncpool
- *garbage collection*
  - items that are not yet synced never get purged
  - items which are least profitable are purged first
  - items that are more distant are purged first
  - a first approximation to reconcile these is neighbourhood updates sent from the caching node, later POC payments
- *total local replication*
  - nearest neighbours always synchronise all chunks historically within their area of responsibility
  - optionally start filling up higher bins
  - for security there might be a need to do (pull) syncing between each pair of peers

## DB structure

### Counters

The database for chunks relies on the DB abstractions of shed. Shed provides ways to define global fields and indexes over `IndexItem`.
An index item has the following fields:

- accessTimestamp
  - time of latest chunk access
  - represents recency of access
  - mutable
  - for collision resistance, should be combined with the hash
  - 0 is the null value
- storedTimestamp
  - time of chunk storage
  - represents chunk age
  - immutable
  - for collision resistance, should be combined with the hash
  - 0 is the null value
- address
  - the chunk's address (either content address or feed update address)
  - to be validated against the chunk data by localstore validators
  - empty slice is the null value
- data
  - chunk data
  - to be validated against the address by localstore validators, unless switched off for testing with globalstore
  - empty slice is the null value
- useMockStore
  - pointer to bool to indicate to the encode/decode functions
  - nil is the null value

`shed.Get(ctx, item)` merges the fields read from the stored value into `item`. Therefore all values of `IndexItem` are treated as nullable.

### Indexes

The new localstore's main component is a dbstore using shed. We will not use a memory store in the beginning: first benchmark, then use cache size settings if needed.

The db defines the following shed indexes:

- RETRIEVAL index: `hash` -> `storedTimestamp|chunk`
- ACCESS index: `hash` -> `accessTimestamp`
- PULL syncing index: `po|storedTimestamp|hash` -> `nil`
- PUSH syncing index: `storedTimestamp|hash` -> `nil`
- GC index: `accessTimestamp|storedTimestamp|hash` -> `nil`

### Operations

Using shed's indexes, the localstore dbstore will provide several modes of update and access. Practically, calling a `mode` on the localstore will return a `ChunkStore` interface:

```golang
type ChunkStore interface {
	Put(context.Context, storage.Chunk) error
	Get(context.Context, storage.Address) (storage.Chunk, error)
}
```

The importance of modes of update/access becomes clear once we consider that, depending on their origin, newly added chunks require slightly different operations. For generality, we define updates of existing entries and removal of entries as modes of update.
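To make the index layouts concrete, here is a minimal sketch of how the composite keys above might be encoded. Big-endian uint64 timestamps sort lexicographically in chronological order, so a raw key iteration walks the PUSH and GC indexes oldest-first. The function names are illustrative, not the actual shed API.

```golang
package main

import (
	"encoding/binary"
	"fmt"
)

// pushKey encodes storedTimestamp|hash for the PUSH syncing index.
func pushKey(storedTimestamp uint64, hash []byte) []byte {
	key := make([]byte, 8+len(hash))
	binary.BigEndian.PutUint64(key[:8], storedTimestamp)
	copy(key[8:], hash)
	return key
}

// gcKey encodes accessTimestamp|storedTimestamp|hash for the GC index,
// so the least recently accessed (and then oldest) chunks sort first.
func gcKey(accessTimestamp, storedTimestamp uint64, hash []byte) []byte {
	key := make([]byte, 16+len(hash))
	binary.BigEndian.PutUint64(key[:8], accessTimestamp)
	binary.BigEndian.PutUint64(key[8:16], storedTimestamp)
	copy(key[16:], hash)
	return key
}

func main() {
	hash := []byte{0xde, 0xad, 0xbe, 0xef}
	fmt.Printf("push key: %x\n", pushKey(42, hash))
	fmt.Printf("gc key:   %x\n", gcKey(7, 42, hash))
}
```

The big-endian encoding is what makes "purge least recently accessed first" a simple forward iteration over the GC index.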
- UPDATE
  - **SYNCING**: when a chunk is received via syncing, it is put in RETRIEVAL and PULL
  - **UPLOAD**: when a chunk is created by local upload, it is put in RETRIEVAL, PULL and PUSH
  - **REQUEST**: when a chunk is received as the result of a retrieve request and delivery, it is put only in RETRIEVAL and the GC index; i.e., it does not enter the syncpool
  - **ACCESS**: when an update request is received for a chunk, or a chunk is retrieved for delivery, PULL and GC are updated
  - **SYNCED**: when a push sync receipt is received, delete from PUSH and put into GC
  - **REMOVAL**: when garbage collected, delete from RETRIEVAL, PULL and GC
- ACCESS
  - **REQUEST**: when accessed for retrieval, update the GC index with a new accessTimestamp (asynchronously), only if the entry exists
  - **SYNCING**: when accessed for syncing or a proof-of-custody request, no update of accessTimestamp

### Garbage collection

GC is discussed at length in https://hackmd.io/t-OQFK3mTsGfrpLCqDrdlw

## Current syncing

### Peer connections and syncing

Currently syncing is a protocol using the stream package. When two nodes are connected, they start syncing both ways: on each peer connection there is bidirectional chunk traffic. The two directions of syncing are managed by distinct and independent stream subscriptions. In the context of a subscription, the subscriber or client is called the downstream peer, while the server is the upstream peer.

Subscriptions are per proximity order, requested depending on whether the downstream peer is considered to belong to the nearest neighbours. Nearest neighbours are the connected peers that belong to a kademlia proximity order greater than or equal to the depth. Depth is defined as the proximity order such that there are at least R nearest neighbours at any one time. R is a redundancy parameter (defaults to 2, i.e. neighbourhoods contain at least 3 nodes).

### Subscriptions to PO bins

Nearest neighbours share an area of responsibility and replicate the contents of this area for redundancy.
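The update modes above can be summarised as a mapping from mode to index operations. The following sketch (illustrative names, not the actual localstore API) makes the table explicit:

```golang
package main

import "fmt"

type Mode int

const (
	ModeSyncing Mode = iota // chunk received via syncing
	ModeUpload              // chunk created by local upload
	ModeRequest             // chunk delivered on a retrieve request
	ModeSynced              // push sync receipt received
	ModeRemoval             // chunk garbage collected
)

// indexOps returns which indexes a chunk is added to (put) and removed
// from (del) for a given mode of update, per the design above.
func indexOps(m Mode) (put, del []string) {
	switch m {
	case ModeSyncing:
		return []string{"RETRIEVAL", "PULL"}, nil
	case ModeUpload:
		return []string{"RETRIEVAL", "PULL", "PUSH"}, nil
	case ModeRequest: // bypasses the syncpool
		return []string{"RETRIEVAL", "GC"}, nil
	case ModeSynced: // now eligible for garbage collection
		return []string{"GC"}, []string{"PUSH"}
	case ModeRemoval:
		return nil, []string{"RETRIEVAL", "PULL", "GC"}
	}
	return nil, nil
}

func main() {
	put, del := indexOps(ModeSynced)
	fmt.Println("put:", put, "del:", del)
}
```

Note the invariant this encodes: a chunk only enters the GC index once it no longer needs syncing (REQUEST or SYNCED), which is what guarantees unsynced items are never purged.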
Therefore nearest neighbours subscribe to all PO bins greater than or equal to the depth. Peers that are not nearest neighbours subscribe only to the bin they belong to, and as a result only receive chunks which are closer to them than to the upstream peer.

### Historical and live syncing

Currently each stream (subscription) allows two parallel substreams: history and live. When the subscription is started, the highest storedTimestamp in the PO bin is taken as the division point (called `sessionAt`). Already stored chunks whose storedTimestamp is less than `sessionAt` are synced via the history substream, whereas newly arriving chunks are synced as part of the live substream. The streamer protocol uses priority queues and prioritises live sync over history. When throughput is reasonable, live sync is idle, i.e. the latest stored chunk has been synced downstream. Historical syncing keeps intervals: the subscriber makes sure all data is synced and no intervals are synced twice. The intervals are based on the upstream peer's storedTimestamp and are therefore specific to a peer connection.

### The OfferedHashes--WantedHashes roundtrip

As a result of syncing streams on each peer connection, a chunk can and would be synced to a peer from multiple upstream peers. In order to save bandwidth by not sending chunk data to peers that already have it, the stream protocol implements a roundtrip: before sending chunks, the upstream peer offers a batch of hashes, to which the downstream peer responds by stating which hashes in the offered batch it actually needs.

### Caveats

Live and history streams are separate primarily to give a faster track to newly uploaded chunks so that they become available for retrieval sooner. This however breaks down if a peer disconnects before the live stream is fully synced. Upon reestablishing the connection, or when connecting to another peer, the chunks that were previously in the live stream now become history.
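The downstream half of the roundtrip can be sketched as follows: given an offered batch, the downstream peer answers with a bit vector marking the hashes it is missing locally. The wire format here is illustrative, not the stream protocol's actual encoding.

```golang
package main

import "fmt"

// wantedBits returns a bit vector with bit i set iff offered[i] is not
// already stored locally, i.e. its data should be sent by the upstream peer.
func wantedBits(offered []string, have map[string]bool) []byte {
	bits := make([]byte, (len(offered)+7)/8)
	for i, h := range offered {
		if !have[h] {
			bits[i/8] |= 1 << uint(i%8)
		}
	}
	return bits
}

func main() {
	offered := []string{"a", "b", "c", "d"}
	have := map[string]bool{"b": true, "d": true} // already stored locally
	fmt.Printf("wanted: %b\n", wantedBits(offered, have))
}
```

The roundtrip trades one extra message exchange per batch for never transmitting chunk data the downstream peer already holds.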
What's more, since they are recent history, they will only be synced after the entire history before them completes.

Pull syncing is node centric as opposed to chunk centric, i.e., it makes sure you fill each connected peer's storage with items closer to them. It does not, however, make sure that any particular chunk has arrived in its local neighbourhood.

## Push Syncing

Push syncing is applied to rectify these shortcomings of pull/stream syncing. It is a protocol followed by the uploader to make sure each newly uploaded chunk reaches its destination and gets replicated across the nodes in whose area of responsibility the chunk belongs. Push sync is a proactive protocol in which a chunk is sent to its local neighbourhood and a statement of custody message is expected as a response.

### Retries

In pull syncing, a chunk travels through intermediate nodes which store it and pass it on asynchronously. Push syncing, however, uses neighbourhood addressing, i.e., the chunk is sent directly to the storer nodes and is not even cached by relaying nodes. Uploaders keep monitoring responses to the chunks they sent to the network. After some time without a response, a chunk is pushed to its local neighbourhood again.

### Tracking progress

Since a message is expected from storers to indicate that a chunk has been synced, the uploader can make sure the file/collection is fully retrievable before publishing, without having to poll and actually retrieve the data.

### Security, Attacks

Since responses are statements of custody, there is a possibility of a griefing attack where a malicious node eclipses the route for some chunks and responds with statements reassuring the uploader while not forwarding the message. This is best mitigated by having pull syncing in place, which channels chunks to every closer peer and is much harder to attack.
Alternatively, nodes could be punished with a fingerpointing mechanism, but that requires stake and registration.

### POC protocol

The uploader uses the PUSH syncing index (`storedTimestamp|hash` -> `nil`). They iterate over the index from the beginning and send the chunks to the nodes in whose area of responsibility each chunk falls. For this they use `pss` neighbourhood addressing with no encryption on the `SYNC` topic. Storer nodes subscribe to the `SYNC` topic and pick up messages addressed to a destination that falls within their area of responsibility.

Once a storer picks up a chunk in its area of responsibility, it sends a receipt back to the uploader using unencrypted direct messaging on the `RECEIPT` topic. When the uploader (subscribed to the `RECEIPT` topic) receives a receipt, they delete the item from the PUSH index and enter it into the GC index; as a result the chunk can now be deleted. Anonymous uploaders lose the ability to track synced status through receipts.

### Components

The current WIP implementation implements push syncing using the following components.
The code is found here: https://github.com/ethersphere/go-ethereum/pull/971

- dispatcher
  - is fed chunks and sends them to the neighbourhood using pub on the SYNC topic
  - receives receipts using sub on the RECEIPT topic
- storer
  - receives chunks using sub on the SYNC topic
  - stores the chunks
  - sends receipts to the uploader/sender using pub on the RECEIPT topic
- pubsub
  - provides pub/sub
  - pss adapter using raw (unencrypted) send
  - uses neighbourhood addressing for the SYNC topic
  - uses direct addressing for the RECEIPT topic
- db
  - interfaces with localstore/netstore
  - subscribes to PUSH syncing index iteration (forever)
  - feeds the dispatcher
- tags
  - chunk state machine SPLIT|STORED|SENT|SYNCED
  - tags associated with chunks (to indicate file/collection)
  - each tag has states with a counter associated with them: as chunks change state, the count of chunks reaching that state for the tag is incremented
  - counts are monotonic, allowing for progress-bar functionality

This needs to be refactored:

- db should be just a feed implemented by netstore
- tags should move to the storage package and be handled there
- the push sync 'protocol' could be part of the storage package, on a par with fetcher. It should be part of netstore and could be called pusher.
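The tag bookkeeping described above can be sketched as one monotonic counter per state. A sketch under these assumptions (types and method names are illustrative, not the PR's API):

```golang
package main

import (
	"fmt"
	"sync/atomic"
)

// State follows the chunk state machine SPLIT -> STORED -> SENT -> SYNCED.
type State int

const (
	StateSplit State = iota
	StateStored
	StateSent
	StateSynced
)

// Tag groups the chunks of one file/collection and keeps a monotonic
// counter per state, which is what enables progress-bar functionality.
type Tag struct {
	name   string
	counts [4]int64 // one monotonic counter per state
}

// Inc records that one more chunk of this tag reached state s.
func (t *Tag) Inc(s State) { atomic.AddInt64(&t.counts[s], 1) }

// Progress reports how many chunks reached state s out of all split chunks.
func (t *Tag) Progress(s State) (done, total int64) {
	return atomic.LoadInt64(&t.counts[s]), atomic.LoadInt64(&t.counts[StateSplit])
}

func main() {
	t := &Tag{name: "photo.jpg"}
	for i := 0; i < 4; i++ {
		t.Inc(StateSplit) // four chunks produced by the splitter
	}
	t.Inc(StateStored)
	t.Inc(StateStored)
	done, total := t.Progress(StateStored)
	fmt.Printf("stored %d/%d\n", done, total)
}
```

Since the counters only ever increase, an upload is fully synced exactly when the SYNCED count reaches the SPLIT count.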