David Irvine
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # Churn FIXME: the sequence diagrams don't render on discourse. Replace them with offline-generated images or with pseudocode. This document describes in detail how churn (nodes joining and leaving the network) is handled in routing. ## Glossary ### Address See [Name](#name). ### Agreement Any action that affect the section must be agreed on by a supermajority (> 2/3) of the section elders. This is implemented by the section elders *proposing* the action by creating a message containing the proposal and signing it with their *BLS secret key share*. They then broadcast the signed proposal to the rest of the elders. Elders then *aggregate* the *signature shares* of these proposal and once they receive enough of the shares (> 2/3), they combine them into a full *BLS signature* which together with the corresponding [public key](#section-key) form an undeniable and independently verifiable cryptographic proof of the action and they can the perform it. ### ELDER_SIZE The number of elders in a section. This is a whole-network constant so every section has the same number of elders. This number is currently `7` and is chosen as a compromise between two opposing requirements: 1. It should be small because the more elders there are the more computation and message traffic is needed which affects performance. 2. It should be large to make the section resilient to failure and malice. With `ELDERS_SIZE` of 7 the number of elders is still relatively small and at the same time sections can tolerate up to 2 faulty elders which seems like a reasonable fault tolerance. This number might change in the future. ### Message signatures Messages sent over the network are cryptographically signed to prevent tampering. There are two types of signatures we use: 1. Messages sent from a single node are signed by their *ED-25519* keys 2. Messages sent from a whole section (see [Signature aggregation](#signature-aggregation)) are signed by the *BLS* secret key share and the resulting *signature shares* are aggregates into a full *BLS signature*. ### Name A way to identify objects (nodes, data, ...) on the network. It is a 256-bit long number that is typically randomly generated or obtained from a hash of some data. Thus the probability of collision is very low and so names can be considered to be globally unique. Names form a [metric space](https://en.wikipedia.org/wiki/Metric_space) where the distance function is the `XOR` operator. This is also called **XOR space** and names are called **XOR names**. Note [the XOR distance](https://medium.com/@maidsafe/structuring-network-with-xor-431e785b5ee7) is purely mathematical concept and has no relation to geographical distance. So nodes with names that are "close" according to the XOR metric can still be on the opposite sides on the globe. Name is sometimes referred to as **Address** in some contexts. It is implemented as `XorName` struct in the code. See also [Prefix](#prefix). ### Node state Each node holds a state that contains information about the node itself (`Node` structure), about the section it's a member of (`Section` structure) and about the other sections in the network (`Network` structure). ### Prefix The first `N` bits of a [Name](#name) where `N` is the prefix length. For example `(011)` is a prefix of length 3. A prefix defines a subset of the name space which contains all names that start with the prefix. An empty prefix covers the whole name space. Prefixes are used to identify [Sections](#section). ### Proposal When a node want's to perform some action that affects the section it must first propose this action to the rest of the section elders and wait for the proposal to reach agreement. See [Agreement](#agreement). ### RECOMMENDED_SECTION_SIZE A constant that determines when a section splits. A section can split when both subsection would have at least `RECOMMENDED_SECTION_SIZE` members. This also indirectly affects maximum average section size - it's twice the recommended section size. Currently this number is set to 2 * [`ELDER_SIZE`](#elder_size) which is a compromise between two opposing requirements: 1. We want the sections to be big, because that makes them more resilient to failure (even if lot of nodes are dropped, there is still enough to cover for them) 2. We want the sections to be small so that sections are responsible for smaller chunks of the address space which means less data and less network traffic per section. ### Relocation The process by which a node is moved ("relocated") from one section to another. Explained in detail in the [Node ageing](#node-ageing) section. ### Section Group of nodes on the network whose names have the same [prefix](#prefix). Initially the network has only one section with the empty prefix (covering all names). As the network grows, the sections split into two subsections whose prefixes are one bit longer than the original section. For example, section with prefix `(00)` splits into two with prefixes `(000)` and `(001)`. ### Section key The *BLS* public key that corresponds ("correlates") to the secret key shares held by the current section elders. This key correlates to a hypothetical *secret key* which is the *aggregation* of all those *secret key shares*, however this secret key doesn't physically exists as the shares are generated through a *DKG*. Every section key is signed by the previous section key (the key of the previous set of elders) to form a *Section chain*. The only exception is the very first section key called *genesis key* which is the key of the section consisting of only one node - the *genesis node*, or the first node of the whole network. ### Section structure Structure that contains information about the section the node is a member of. It contains: - The list of current section elders - The section [prefix](#prefix) - The [section chain](#1-section-chain-and-proof-chain) (the last key of the chain is the [section key](#section-key)) - The container of all the members of the section The current elders + prefix are grouped into a structure called `EldersInfo` which is signed using the current section key. The individual entries in the members container are also signed using the respective section key that was current at the time the entry was created. ### Signature aggregation We use the [BLS](https://en.wikipedia.org/wiki/BLS_digital_signature) signature scheme (implemented by the [threshold_cyrpto](https://github.com/poanetwork/threshold_crypto) library) which is a [Threshold scheme](https://en.wikipedia.org/wiki/Threshold_cryptosystem). A BLS signature is created by *aggregating* multiple *signature shares*. Only the elders of a section hold the secret key shares and we set the BLS threshold to 2/3 of the number of elders. This means that each BLS signature is a proof that at least a supermajority of elders signed the corresponding messages. This is how sections reach agreement on any actions without the need for centralized authority. See also [Agreement](#agreement). ### Sync message A message that contains a copy of the `Section` and `Network` structures of the sending node. This message is used to synchronize the node state among the section members. All the data in the message is already signed with BLS signatures so it can be already actioned without needing any additional agreement or aggregation. Actioning a `Sync`, means *merging* the `Section` and `Network` contained in the message into the `Section` and `Network` the node has. ### Upper layer Also called *downstream* / *user* / *consumer* / ... - the library or binary that uses the sn-routing library. ### XorName See [Name](#name). ## Pseudocode conventions The pseudocode samples in this document use a syntax based on a simplified subset of [rust](https://www.rust-lang.org/). Additionally, the following constructs are used: - `send(dst, message)` - send `message` to the peer at the given `dst` socket address (not this is a socket address (IP address + port), not a "XOR address"). - `broadcast(message)` - broadcast `message` to all the elders in the section. - `receive()` - wait for an incoming message and returns it. - `emit(event)` - emit the given event to the [upper layer](#upper-layer). - `error(error)` - signal that the given `error` has occurred. Errors are usually returned to the caller or logged. - `proposal(proposal)` - create a [proposal](#proposal) which is a message signed by the BLS secret key share of the sender. ----- ## Join When a node wants to join the network, it must first be provided with a [ED25519](https://en.wikipedia.org/wiki/EdDSA#Ed25519) keypair. This can either be auto-generated or assigned by the user (loaded from config file, ...). The public key also serves as the node [name](#name). Second, it needs a list of bootstrap contacts (socket addresses) of which at least one must belong to a currently active node in the network. It then bootstraps into the network by trying to connect to all of those contacts in parallel and the first connection that succeeds is then used as the initial bootstrap node. Then it requests the information about the section to join which is the section whose [prefix](#Prefix) matches the node name. To do this the node sends a `GetSectionQuery` to the bootstrap node and awaits a response. When the bootstrap node is already an elder in the matching section, it replies with `GetSectionResponse::Success` which contains the necessary information about the section: ```rust struct SectionInfo { // Prefix of the section prefix: Prefix, // BLS public key set of the section pk_set: bls::PublicKeySet, // Current section elders elders: BTreeMap<XorName, SocketAddr>, } ``` > [name=Gabriel] don't we need the key chain? and if so, we can also explain that the joining node would have the genesis key hard-coded so it can verify is joining the expected network and genuines Elders/section?? Otherwise, the bootstrap node replies with `GetSectionResponse::Redirect` which contains a list of socket addresses to repeat the query to. These should be "closer" to the matching section than the current bootstrap node is: - When the bootstrap node is not an elder, it replies with addresses of its elders. - When the bootstrap node is an elder, but not in the right section, it replies with addresses of elders of the section that is closest to the joining node name among all the sections it knows of. When the joining node receives a `GetSectionResponse::Redirect`, it updates the list of bootstrap nodes and repeats the `GetSectionQuery` to all of them in parallel. This process keeps going until it eventually receives a `GetSectionResponse::Success`. When that happens, it sends a `JoinRequest` with an empty resource proof to all the elders in the section it is joining. The section elders will respond with a `ResourceChallenge` message containing the data for the resource challenge the node must pass. This is to verify the node has enough resources (CPU, memory, storage and network bandwidth) to be useful to the network. The joining node must solve the challenges from all the elders and send back a `JoinRequest` which contains the respective solution to each of them. NOTE: The resource proof generator/solver is implemented in the [resource_proof](https://github.com/maidsafe/resource_proof) crate. When an elder receives this message, it performs all the following checks: - Validates the [message signature](#Message-signatures) - Checks the node isn't or wasn't already a member of this section (under this name) - Checks the node name matches the prefix of this section - Checks the node has the latest section key - Checks whether joins are allowed (`joins_allowed` flag is `true`) - Checks whether it passed the resource proof If all those checks pass, it proposes `Online` for the new node. This proposal contains the `MemberInfo` entry for the node with the `state` field set to `Joined`. When the proposal reaches [agreement](#agreement), the elders insert the contained `MemberInfo` entry into the section members container and send a `NodeApproval` message to the new node (each elder sends one). Then they raise a `MemberJoined` event to the upper layers. From the point of view of the elder, the new node is now a valid member of the section. When the new node receives the first valid `NodeApproval` message, it uses the information contained in it to initialize the [`Section` structure](#section-structure) and becomes a member of the section as adult. Any other `NodeApproval` messages received afterwards are discarded. ### Pseudocode From the point of view of the joining node ```rust fn join() { let section_info = get_section_info() // get_section_info defined below for addr in section_info.elders { send(addr, JoinRequest { resource_proof_response: None }) } loop { let message = receive() if message == ResourceChallenge { let solution = solve_resource_challenge(message.challenge) send( message.sender_addr, JoinRequest { resource_proof_response: Some(solution) } ) } if message == NodeApproval { return create_node_state(message) } } } fn get_section_info() { for addr in hardcoded_contacts { connect_to(addr) // asynchronous } let bootstrap_addr = wait_for_first_successful_connection() let bootstrap_addrs = [bootstrap_addr] loop { for addr in bootstrap_addrs { send(addr, GetSectionQuery) } let message = receive() if message == GetSectionResponse::Success { return message } if message == GetSectionResponse::Redirect { bootstrap_addrs.remove(message.sender_addr) bootstrap_addrs.append(message.new_addrs) } } } ``` From the point of view of a section elder ```rust // Called when a `JoinRequest` message is received. fn handle_join_request(request) { if !validate_signature(request) { error(InvalidSignature) } if !section.prefix.matches(request.sender_name) { error(PrefixMismatch) } if section.members.contains(request.sender_name) { error(AlreadyMember) } if request.resource_proof_response == None { send(request.sender_addr, ResourceChallenge) } else { if validate_resource_proof(request.resource_proof_response) { broadcast(proposal(Online(MemberInfo { peer: Peer { name: request.sender_name, addr: request.sender_addr, // MIN_AGE is 'Infant', MIN_AGE + 1 is the minimal 'Adult' age age: MIN_AGE + 1, }, state: Joined }))) } else { error(InvalidResourceProofResponse) } } } // Called when an `Online` proposal reaches agreement. fn handle_agreement_on_online(member_info) { section.members.update(member_info) send(member_info.peer.addr, NodeApproval) emit(MemberAdded(member_info.peer.name)) } ``` ## Leave When an elder tries to send a message to another member and that send fails, it declares the node as lost and proposes it `Offline`. This proposal contains the same `MemberInfo` entry as the corresponding `Online` proposal except the `state` field is set to `Left`. When this proposal reaches agreement, the elder inserts the contained `MemberInfo` entry with `State::Left` into the section members container, overwriting the previous entry which had `State::Joined` state. Then they raise a `MemberLeft` event to the [upper layer](#Upper-layer). From the point of view of the elder, the node is no longer a member of the section (except when the node is an elder undergoing delayed relocation, which will be explained later). NOTE: to speed the lost peers detection process, when the qp2p layer notifies us about a peer being disconnected, we immediately send them a `Ping` message. When this message gets sent successfully that means the connection was re-established and we carry on. Otherwise we declare the peer as lost the same as for any other unsuccessfully delivered message as already described. ### Pseudocode Graceful disconnect: ```rust fn handle_disconnect(peer) { if send(peer.addr, Ping) == Error { broadcast(proposal(Offline { peer, state: Left })) } } fn handle_agreement_on_offline(member_info) { section.members.update(member_info) emit(MemberLeft(member_info.peer.name)) } ``` Non-graceful disconnect: ```rust // ... if send(peer.addr, any_message) == Error { broadcast(proposal(Offline(MemberInfo { peer, state: Left }))) } // ... ``` > [name=Gabriel] I'm thinking we can briefly mention the possible states of the nodes somewhere above, before exploring the joining and leaving flows, i see some appendix/sections exist further down, so at least a link or somehow re-orginise the sections? ## Incoming relocation A [relocated](#relocation) node joining its new section goes through almost the same process as newly joining node, except there is no resource proof and instead the node must provide a `RelocatePayload` which proves that it's been relocated by the source section. The elders of the joining section perform the same steps as described in the [Join](#Join) part of this document and additionally they verify the `RelocatePayload` and then accept the node with its new age. ## Elder churn After every churn event (join or leave), the elders perform a check whether the current [`ELDER_SIZE`](#ELDER_SIZE) [oldest](#node-ageing) members of the section are different from the current set of elders and if so, trigger an elder churn which is a process where the section transitions to the new set of elders. While this process is ongoing, the current elders still retain their status and perform their duties in the section as normally. We will call the `ELDER_SIZE` oldest members of the section when they differ from the current elders the **elder candidates**. NOTE: the elder candidates usually (but not necessarily) overlap with the current elders. For example: - If a node that is older than the youngest current elder join the section, the elder candidates will consist of the new node plus all the current elders except the youngest one. - If one of the current elders goes offline, the elder candidates will consist of all the remaining current elders + the oldest adult of the section. However, in rare cases the elder candidates might be completely different from the current elders. This most likely happens during [split](#split) where all the current elders end up in only one of the subsections and the candidates for the other one consist of only adults. ### Determining elder candidates Because multiple nodes can have the same age, it's not enough to use just the age to decide the elder candidates. Instead the current members are sorted according to the following comparison rules: 1. The member with higher age wins over the one with lower age. 2. If they have the same age, the one that is currently elder wins. 3. If both are currently elders or both are currently not elders, the BLS signatures of their `Online` proposal are compared numerically and the smaller one wins. The final rule is just an arbitrary one to deterministically break ties which is also hard to game. After being sorted according to these rules, the first `ELDER_SIZE` ones are taken and become the elder candidates. ### DKG > [name=Gabriel Viganotti] nothing is said of how many new elder candidates at a time can be part of this process per candidates set, when and how are the candidates picked? > [name=Adam Cigánek] > > how many new elder candidates at a time can be part of this process > > All of them actually, but that's a detail of the DKG implementation we use which is the bls-dkg crate. But maybe it should be mentioned here as well? > > when and how are the candidates picked > > This is described in the previous section. Or Is that not clear enough? The elder candidates need to go through a Distributed Key Generation (DKG) process to get their section keys before they can become the new elders. Once the elder candidates are determined, the current elders send them a `DKGStart` message (each elder sends one to each elder candidate): ```rust struct DKGStart { elders_info: EldersInfo, dkg_key: DkgKey, // Unique identifier of the DKG session } struct EldersInfo { elders: BTreeSet<Peer>, prefix: Prefix, } ``` (Note these structures are slightly simplified compared to how they actually are defined in the code, but the differences are not significant here) The elder candidates accumulate these messages and once they get it from a supermajority of the current elders, they start the DKG. The reason to wait for a supermajority is to prevent a single malicious node from triggering spurious DKGs in order to spam the network. The actual DKG is implemented in the [bls-dkg](https://github.com/maidsafe/bls_dkg) crate and won't be described here in detail. But in summary, it consists of multiple rounds of messages being exchanged among the elder candidates until either the process completes successfully and the new keys are generated, or it fails. > [name=Gabriel Viganotti] here the candidates in these rounds are only the new ones or this actually means the existing Elders plus the new candidates?? ...I guess the later, as they all (current Elders plus new candidates) need to agree on the same pk set?? it seems we are sometimes calling candidates to current Eleders plus new candidates, and sometime we call candidates only to the new ones ?? #### Successful DKG completion When the DKG completes, it produces the new `PublicKeySet` (which is the same for everyone) and `SecretKeyShare` (which is unique per candidate and they must keep it secret). The candidates then sign the `EldersInfo` from the `DKGStart` message using this new `SecretKeyShare` and broadcast it back to the current elders in a `SectionInfo` message. When the current elders receive a supermajority of these `SectionInfo` messages, they check whether the [current elder candidates](#Determining-elder-candidates) are the same as the elders set in the message and if so, they create an `OurElders` proposal containing the new key in order to append it to the section chain. Otherwise they discard it. There are two rounds of proposals: one is the elder candidates proposing the new `EldersInfo` using the new key and one is the current elders proposing the new key using their current section key. This is necessary because we need two pieces of information and they both need to be signed: 1. The new section key (signed by the old section key) and 2. The new `EldersInfo` (signed by the new section key) Once this completes, the new key is inserted into the section chain and the new `EldersInfo` replaces the old `EldersInfo` in the [`Section` struct](#Section-structure). It's important that these two things happen as a single atomic step to maintain the invariant that the current `EldersInfo` is signed by the current last key in the section chain. Finally, the old elders broadcast a [`Sync` message](#Sync-message) to all the members of the section to update them about the new state of the section. #### Failed DKG When the candidate detects a DKG failure, it creates a failure proof and broadcasts it to the other candidates (not the current elders!). When a sufficient number of these proofs are accumulated, they are combined into a DKG failure agreement which is only then broadcast to the current elders. The reason the failures are accumulated by the candidates and not the current elders is to have the bulk of the work (including keeping any necessary state) be done by the candidates so there is less burden on the current elders who have enough other responsibilities already. NOTE: It's not necessary to accumulate a supermajority of these proofs. A `N - supermajority(N) + 1` (where `N` is the number of candidates) of them is enough because in that case a successful completion is no longer possible. When the current elders receive this failure agreement, they check whether the [current elder candidates](#Determining-elder-candidates) are still the same as the ones reporting the failure and if so, send them the `DKGStart` message again to restart the DKG. Otherwise they ignore it. ### Transition to the new elders A node can detect that the current elders changed either through observing an agreement on the `OurElders` proposal or by receiving a `Sync` message. In both cases the subsequent process is the same. When a node observes an elder change of its section, it raises a `EldersChanged` event to the upper layers: ```rust struct EldersChanged { prefix: Prefix, key: bls::PublicKey, sibling_key: Option<bls::PublicKey>, elders: BTreeSet<XorName>, self_status_change: NodeElderChange, } ``` When the node was not an elder before but is an elder now, the `self_status_change` is set to `Promoted`. When the node was an elder and is no longer one, it is set to `Demoted`. Otherwise it is set to `None`. When the section split, `prefix` contains the new post-split prefix and `sibling_key` contains the section key of the sibling section. When there is no split, `prefix` is the same as before the elders change and `sibling_key` is `None`. Splits will be explained in more detail later. This concludes the elder change. #### Lagging DKG Only a supermajority of the elder candidates are necessary to trigger the elders change because only a supermajority is needed to aggregate the `SectionInfo` proposal. This means that some of the candidates (at most `N - supermajority(N)`) might observe the elders' change before they observe the completion of the DKG. In this case they are unable to sign section messages with the new key until they observe the DKG completion. This is not a problem because at least a supermajority of the elders have the new keys and so any section messages sent by them can already be aggregated. ### Multiple sets of elder candidates In times of heavy churn there can be multiple sets of elder candidates being considered at the same time. Because different nodes might process the events in different order, there is no guaranteed way to tell which of the candidates are going to be processed first. Some nodes might see one set first but others might see a different one. Due to eventual consistency, everybody should eventually see all the sets, however the order this happens is unspecified. We don't use any ordering algorithm and instead let all the sets be processed concurrently. This means that there can be multiple DKGs (some nodes might even be participating in more than one DKG at a time), multiple different `SectionInfo` agreements and multiple `OurElders` agreements. The conflicts are only eventually resolved by the section chain which - being a CRDT - produces a consistent total order of the section keys without requiring any consensus. See the [Section chain appendix](https://hackmd.io/ERglUv22SrW_c2EjuGz-Cg?both#1-Section-chain-and-proof-chain) for more details. This means that forks (conflicts) are **not prevented**, but they are **immediately resolved**. The difference is subtle, but important. It means there is no way to commit in advance to a particular set of elder candidates as a "better" set of elder candidates might come in at any time. ### Sequence diagram ```mermaid sequenceDiagram participant A as Current Elders participant B as Elder Candidates participant C as Other members Note over A: Churn event A->>B: DKGStart Note over B: Receives DKGStart from a supermajority of elders B->>B: DKG alt DKG completes B->>A: SectionInfo Note over A: SectionInfo reaches agreement A->>A: Vote OurElders Note over A: OurElders reaches agreement Note over A: Update Section Note over A: Raise EldersChanged A->>A: Sync A->>B: Sync Note over B: Update Section Note over B: Raise EldersChanged A->>C: Sync Note over C: Update Section Note over C: Raise EldersChanged else DKG fails B->>B: DKGFailureObservation Note over B: DKGFailureObservation accumulate B->>A: DKGFailureAgreement opt elder candidates still the same Note over A: Trigger DKG restart A->>B: DKGStart end end ``` ## Split When a section grows past a certain size, it splits into two (roughly) equaly sized subsections. The splits are processed essentially as two concurrent Elder changes - one for each subsection. ### Split detection To trigger a split, the section must have at least [`RECOMMENDED_SECTION_SIZE`](#RECOMMENDED_SECTION_SIZE) members for each subsection. That is, given `N` is the length of the current section prefix (note the genesis section has pefix length of 0), there must be at least `RECOMMENDED_SECTION_SIZE` members whose name has `0` as the bit at index `N`, and at least the same number whose name has `1` as the bit at index `N`. ### Split processing The split is processed the same as normal elders change except twice - one for each subsection. The two subsections are processed concurrently. This means there are two DKGs, then two `SectionInfo` agreements and finally two `OurElders` agreements. Each elder however applies only the change that pertains to their own subsection. They will however send the `Sync` message for both subsections. This way even if all the pre-split elders end up in only one of the subsections, the other one still receives the `Sync`s which causes their new elders to be promoted. NOTE: as with normal elders change, there can be multiple splits being processed concurrently and they are only eventually resolved after being applied to the section chain. ### Split barrier A split is only driven to completion if the pre-split elders observe the `OurElders` agreement for both subsections. If they observe only one, they cache the `OurElders` proposal in a data structure called `SplitBarrier` and only when they observe the other one they finalize the split. The reason this is done is to avoid forgetting the sibling section info before the elders send the `Sync` message to them. ### Sequence diagram ```mermaid sequenceDiagram participant E as Current Elders participant C0 as Elder candidates 0 participant C1 as Elder candidates 1 Note over E: Split triggered par E->>C0: DKGStart(0) C0->>C0: DKG C0->>E: SectionInfo(0) E->>E: Vote OurElders(0) Note over E: OurElders(0) reaches agreement and E->>C1: DKGStart(1) C1->>C1: DKG C1->>E: SectionInfo(1) E->>E: Vote OurElders(1) Note over E: OurElders(1) reaches agreement end Note over E: Update section par E->>C0: Sync(0) Note over C0: Update section and E->>C1: Sync(1) Note over C1: Update section end ``` ## Node ageing Each node in the network is assigned an age which is roughly a measure of how long the node spent in the network doing useful work. The older a node is, the more likely is it to become an elder and thus get important responsibilities. This is a [sybil attack](https://en.wikipedia.org/wiki/Sybil_attack) prevention because if an attacker tried to spawn a large number of zombie nodes, they would all start with low ages and thus have little effect on the network. ### Initial age Nodes that are in the process of joining the network are called **infants** and they start with age 4. This gets incremented to 5 immediately when they join a section and become **adults**. The seemingly arbitrary number 4 was chosen to avoid too frequent relocations of young nodes (as would happen if it was 0, 1, ...). ### Relocations Node increases its age when it gets relocated from one section to the next. When the source section decides to relocate one of its members, it provides a proof of its new age which the relocated node presents to the destination section when it joins it. Based on this proof the node will join the destination section with this new age. ### Relocation trigger Relocations are triggered by churn events (joins and leaves). On each churn event, we perform the following check with each member of the section: ``` signature % 2^age == 0 ``` Where `signature` is the BLS signature of the churn event and `age` is the age of the member being checked. If this returns true, that member will be relocated. This means that assuming the signature distribution is roughly uniform, the probability of a node being relocated on any churn event is `1/2^age`. NOTE: if the check passes for an age `X` it also passes for any age < `X`. To avoid excessive relocations we limit them to only the oldest matching nodes. ### Relocation destination After a node is chosen for relocation, the destination is calculated by obtaining the `sha3-256` hash of the relocated node name concatenated with the name of the node from the churn event: ``` let destination_addr = sha3_256(relocated-node-name <> churn-event-node-name); ``` This destination address determines the next section the node will join - it's the section that is [closest in XOR distance](https://medium.com/@maidsafe/structuring-network-with-xor-431e785b5ee7) to the destination address. ### Outgoing relocations The elders of the section then create `RelocateDetails`: ```rust struct RelocateDetails { // The pre-relocation name of the node. name: XorName, // The destination to relocate the node to. destination: XorName, // A known section key of the destination section. destination_key: bls::PublicKey, // The post-relocation age of the node. age: u8, } ``` sign it with their BLS secret key shares and send it to the relocated node in a `Relocate` message. Then they propose the node `Offline` but with the `state` field set to `Relocated` instead of `Left`. The reason for this is to prevent relocated nodes from rejoining the source section (rejoining is explained in more details later). When the relocated node accumulates a supermajority of those `Relocate` messages, it combines the signature into a full BLS signature, stores it, leaves the section and re-joins the network. The `destination_key` field is set to the latest section key of the destination section that the source section knows and is used by the relocated node to check trust of messages sent to it by the destination section. ### Relocation joins The relocated node's process of rejoining the network starts similarly as a normal join - the node first requests information about the section to join. However this time it requests the section matching the `destination` name instead of its own name. When it receives the response, it first generates a new `ed25519` keypair such that the name derived from this keypair matches the destination section prefix. It then signs this new name using the old keypair as a proof that this is the node that's relocated and not some impostor that just somehow got hold of the `RelocateDetails`. The BLS-signed `RelocateDetails` + the signature of the new name are then packed into `RelocatePayload` structure and this is included in the `JoinRequest` message sent to the destination section. There is no resource proof challenge this time, as the `RelocatePayload` itself is an implicit proof that the node did already pass one when it joined for the first time. ### Incoming relocations When the elders of a section receive a `JoinRequest` from a relocated node, they verify the included `RelocatePayload` (check the `ed25519` signature of the new name and the `BLS` signature of the `RelocateDetails`) in addition to the verification steps of a normal join. They use the contained `destination_key` to provide a sufficiently long proof chain for the `NodeApproval` message so the joining node would trust it. ### Relocation of an elder (delayed relocation) The relocation process is slightly different when the relocated node is currently an elder. The reason for this is to avoid relocating too many elders before their replacements are finalized. This is because any section agreement requires a supermajority of elders and so losing `N - supermajority(N) + 1` or more elders would mean the section would become incapable of reaching any agreement, including agreement on the next elder candidates. There are mechanisms for dealing with such situations, but they are meant as an emergency measure in case of a catastrophic node loss (due to large network outages, etc...). There is no reason for the network to cause such a situation itself if it can be avoided. So, when an elder is selected for relocation, the actual relocation gets delayed until the elder gets demoted. The elders don't send them a `Relocate` message like they do to non-elders, but they send a `RelocatePromise` instead: ```rust struct RelocatePromise { // Pre-relocation name of the node name: XorName, // The destination to relocate the node to. destination: XorName, } ``` This message is also BLS signed by the section so can be used as a proof that the node was at some point selected for relocation. Then the elders propose the node `Offline` as well (with `state` set to `Relocated` too). When the relocated elders receives this message, it stores it locally and continues performing its elder duties as normal. They continue doing that even after the `Offline` proposal reaches agreement. They are incentivized to do so by the promise that they will be eventually relocated and will have their age increased. Eventually that `Offline` agreement triggers an elder change where the elder candidates no longer include the relocated node(s). Once that change gets finalized and the relocated former elder observes its own demotion, it sends the `RelocatePromise` back to the new elders. They verify its BLS signature and send back a regular `Relocate` message. From that point on everything is handled just like the regular relocation as already described. ### Rejoins When a node leaves the network for any other reason except relocation, it in theory loses all its age. In case the leave is just temporary (network outage, device restarts, ...) this would be too harsh, especially if the node has accumulated substantial age up to this point. To make things more fair, a node is allowed to rejoin the network under its existing name and will retain a portion of its age. This is possible because even nodes that left are still kept in the members container (with `state` set to `Left`) and so the section is always able to tell whether a node is joining for the first time or rejoining. If a rejoin is detected, the node is immediately relocated to the same section (`destination` is the current node name) with its age halved. However, if the halved age is less or equal to the initial age (4) then there would be no benefit in doing this as the resulting age would be the same as the age of a newly joined node. So to avoid pointless work, the join request is ignored and the node must join as an infant again. #### Abuse prevention Let's say a node was relocated to a new section and it misbehaves there and is punished by being kicked out of it. It might try to circumvent this punishment by rejoining the original section, hoping to retain at least half of its age. To prevent this type of abuse, relocated nodes cannot rejoin their original section. This is achieved by setting their `state` to `Relocated` instead of `Left` when they are being proposed `Offline`. Only nodes with `Left` are allowed to rejoin. ### Sequence diagrams Relocation of an adult: ```mermaid sequenceDiagram participant R as Relocated Node (R) participant S as Src Elders participant D as Dst Elders Note over S: Churn event Note over S: Relocation or R triggered par S->>R: Relocate Note over R: Relocate accumulates Note over R: Leaves src section R->>D: GetSectionQuery D->>R: GetSectionResponse Note over R: Changes name to match dst section R->>D: JoinRequest { relocate_payload: Some(..) } Note over D: validates the JoinRequest D->>D: Vote Online(R) Note over D: Mark R as Joined D->>R: NodeApproval Note over R: Becomes member of dst section and S->>S: Vote Offline(R) Note over S: Vote reaches agreement Note over S: Mark R as Relocated end ``` Relocation of an elder: ```mermaid sequenceDiagram participant R as Relocated elder (R) participant O as Other Src Elders participant N as New Src Elders Note over R,O: Churn event Note over R,O: Relocation of R triggered par O->>R: RelocatePromise(R) Note over R: RelocatePromise accumulates Note over R: RelocatePromise stored and Note over R,O: Vote Offline(R) end Note over R,O: Offline reaches agreement Note over R,N: Elder change Note over R: Observe own demotion R->>N: RelocatePromise(R) N->>R: Relocate(R) Note over R,N: Continue like when relocating an adult... ``` ## Appendix ### 1. Section chain and proof chain The section chain is a data structure which holds an uninterrupted and independently verifiable chain of section keys ([BLS](https://en.wikipedia.org/wiki/BLS_digital_signature) public keys) all the way from the **genesis key**. It is used as a means to verify trust. The **genesis key** is the first key in the chain and it forms the root of trust. It can be thought of as the identifier of the whole network and might be hardcoded in the node and client libraries/binaries. Alternatively it can be published in some reasonably trusted location. The genesis key is the only key that is not signed. All the other keys in the chain are signed by some previous keys. The chain does not prevent forks, but it does automatically resolve them. This means that it's possible the chain contains two or more keys that are all signed with the same parent key, but the chain still presents the keys in a single linear sequence to the outside (thus the "chain" in the name). This fork resolution is completely local (doesn't require any coordination with the other nodes). The order is established using this algorithm: 1. Arrange the keys in a tree where a key that signs some other key is its parent. 2. Sort the keys having the same parent numerically (using the `Ord` relation of the `PublicKey` type) 3. Walk the resulting three in a breadth-first fashion Example (assuming the lexicographical order corresponds to the numerical order here): ``` A->B->C->D | +->E->F->G | +->H ``` Would be sorted as: ``` A->B->C->E->D->F->H->G ``` All the mutable operations on the section chain are commutative, associative and idempotent which means the section chain is a CRDT and thus achieves consistency without consensus. A **proof chain** is a part of the section chain (could be the whole chain as well) which is attached to some signed data (e.g. a message) in order to provide a means to verify its trust. The complete verification process looks like this: 1. Verify the signature against the public key 2. Check the public key is present in the proof chain 3. Verify the proof chain against a set of already trusted keys. When passed all those checks, the data is considered to be trusted as it means : 1. the data was signed by some group of nodes 2. the group of nodes was approved by some previous group of nodes which in turn was approved by yet another group of nodes and so on all the way to a group we already trust. The genesis key is the root of trust so even if one doesn't have any other trusted keys they can always verify a proof chain that starts with the genesis key.

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully