Radicle Overview (old)

# Radicle Overview (Old)

[TOC]

## Resources

- RIPs: https://app.radicle.xyz/nodes/seed.radicle.xyz/rad:z3trNYnLWS11cJWC6BbxDs5niGo82
- Lars arch: https://app.radicle.xyz/nodes/seed.radicle.xyz/rad:z3gqcJUoA1n9HaHKufZs5FCSGazv5/tree/07e455dc597b7c368ce658ba65b866a92b9a88c4/architecture/arch.md
    - https://app.radicle.xyz/nodes/seed.rhizoma.dev/rad:z21RZuyuZwmmqB72Lq5A7hF8NGUF1 under slides/
- man pages: https://app.radicle.xyz/nodes/seed.radicle.xyz/rad:z3gqcJUoA1n9HaHKufZs5FCSGazv5/tree/rad.1.adoc
- setting up seed node (outdated): https://github.com/radicle-dev/radicle-client-services
- Radicle 1.0 feature set: https://hackmd.io/CP_hhnhuTby-f_O6_E-Zew?both
- COBs: https://github.com/radicle-dev/radicle-link/blob/master/docs/rfc/0662-collaborative-objects.adoc
- Onboarding experience: https://hackmd.io/5_VS0D1DR_GTCyEhd39wTA

Docs Inspiration

- https://socketsupply.co/guides/#quick-start
- https://ssbc.github.io/scuttlebutt-protocol-guide/#keys-and-identities

## Protocol Doc Notes

- Protocol 1 Pager should be ~5 minute read. Just because it's abridged doesn't mean we shouldn't be specific - it is important to still be specific.
- Protocol Detail Docs should have even more specifics, and be a 15 minute read
- Tone feedback: some aspects of the language are too colorful. be as matter of fact as possible. don’t use adjectives. instead of saying something is “secure” - say it is secured by cryptographic keys.
    - a good example is satoshi's bitcoin paper, it explains things very precisely, but not using complicated language or lots of color
- [ ] diagram descriptions for all of the main sections & a few other important sub sections
- [ ] **COBs - need to be expanded with more details.**
    - radicle comes with three collaborative objects out of the box
        - issue
        - patch
        - id
    - COBs are namespaced to keep them extensible, e.g.
`xyz.radicle.issue` and `xyz.radicle.patch`
    - it is extensible thanks to a bunch of things:
        - the reason they are named like that
            - it ensures they are unique names
            - someone else can create `com.google.issue`
    - This section is a bit weak, we don't say much. I think we should say:
        * what are cobs and how they are stored (you have some of that)
        * what cobs are defined already by radicle (issue, patch, id)
        * how cobs can be extended, and new cobs can be created
        * how are cobs used to implement things like discussions etc.
- [ ] what is a git repo? a bunch of objects and a head - a master branch
- [ ] RIDs are URNs - and explain why
- [ ] Add more references that can enrich the reading experience. e.g. Git, DID, CRDT, Bootstrap nodes, TUF, Noise protocol, Tor
- [ ] It is nice how SSB's guide has supplementary info in side bar: https://ssbc.github.io/scuttlebutt-protocol-guide/#discovery
- [x] two topics to expand:
    1. self-certifying aspect of the repository identity
    2. the idea of canonicity.
- [x] issues and patches aren't metadata - they are “social artifacts”. issues, patches, and cobs in general are “social artifacts”. id cob - parts of it are metadata, parts of it are social artifacts
- [ ] STORAGE
    - your upstream gets pushed locally - then that gets propagated to other nodes
- [x] Add something about how Radicle and Git tie together (Repositories).
`git push rad`
    - https://git-scm.com/docs/gitremote-helpers
    - it invokes a git remote helper that invokes the rad protocol
- [ ] Likes how this resource explains p2p vs federation https://www.eff.org/deeplinks/2023/12/meet-spritely-and-veilid
- [ ] *** censorship resistance / noise protocol / tor:
    - [ ] "I think we need to formulate this differently, because: (1) noise is just the handshake protocol, and (2) radicle doesn't really leverage tor, it's more that radicle has a tor integration that users can leverage for privacy"
    - [ ] "I think this is a bit light, the noise protocol is just a handshake protocol for establishing secure connections. This section needs more thinking"
- [ ] *** We should explain what makes a node a public seed node, and also say that they are running the regular radicle client with a potentially different seeding policy. (you can ask me about this)
- [ ] *** So perhaps here there is a confusion that we should talk about. `sigrefs` are only used to sign the heads of all branches, not the individual commits, so let's talk about this so you understand it better
- [ ] *** git fetch tunneling / multiplexing "Here we can say that the fetch is tunneled over the same secure connection established between the nodes. It is effectively multiplexed over the physical connection"
- [ ] *** repository: settings terminology
    - [ ] cloudhead: not a fan of the word 'settings', maybe 'metadata'? or 'configuration'?
    - [ ] stellar: Why not the word settings? Imagine you gave user a UI for editing the details in the repository identity document. You wouldn't call it settings?
- [ ] easier to just say that they seed whatever is in the policy
- [ ] A few general notes/impressions:
    - I think the order of things flows nicely, I didn't notice any hiccups until perhaps the very last part which felt duplicated
    - The main weakness of the document is that it is vague in certain parts
    - The diagrams will help a lot when we have them
    - I think that we need to talk a bit more about source code, somehow; after all that's the main thing that people will publish on it; and we barely talk about how that specifically works
    - We could talk more generally about the network, eg. by comparing it to the ssb network, saying that nodes maintain connections with other nodes, etc.
    - I think we want to talk a bit about the storage layout, eg. how peer refs are laid out in storage
    - The tone is generally good, the few issues that exist are usually because the language used is a bit heavy, eg. words like 'utilizing' vs. 'using', stuff like that
- [ ] I feel like the doc is mostly complete in the sense of covering all the major areas, though some need more clarification and specificity
    - I think one thing that might be missing is an overview of the protocol in 1-2 paragraphs that explains the reason for it existing and the general ideas, which are roughly: (1) public keys, (2) everything must be verifiable, (3) p2p, (4) repo identities, (5) git interop
        - basically a summary of all summaries that gives the rationale, ie. "what is this trying to achieve"
        - because the sections at the moment can feel a bit out of context; they explain something, but not necessarily why
        - the "why" could just be at the very beginning though and then it's easy to refer to it always
    - The local first section doesn't really do it justice I think; we need to explain how all actions are local, and then are synced in the background/propagated asynchronously with the network
        - ie.
all actions are on your local storage
    - Generally let's think of what is exciting about radicle and make sure that stuff is in there
        - For me I think the exciting stuff is:
            1) it's Git all the way down, so I don't have to relearn everything and it interops very well with a bunch of existing systems
            2) it's all verifiable and I control my identity and repository identity
            3) I can self-host while still being connected with the wider network
            4) it's extensible, so I can implement my own workflows, datatypes etc.
        - I think most of that is in there somehow, but might be worth thinking about it and making sure we really drive the point

## Writing guidelines

- Network vs. Protocol
    - cloudhead: I think just 'Radicle' when talking about the whole thing and the product / user facing stuff
    - cloudhead: And 'Radicle protocol' when talking about the underlying protocol
    - cloudhead: 'Radicle network' is the physical network of nodes
- Node vs. Client
    - How to talk about the client software

# Radicle Protocol Overview (Heartwood Release)

The Heartwood release of the Radicle protocol establishes a sovereign data network for code collaboration and publishing, built on top of Git. Radicle nodes, which are designed to run on standard laptops or home computers, host Git repositories and synchronize them across a peer-to-peer network. Nodes use a custom gossip protocol for peer and repository discovery, alongside the Git protocol for data replication. In combination, this enables peers to locate, replicate, and verify any repository published to the network, provided at least one other peer seeding the repository is online. All user actions, from code commits to issue comments, are cryptographically signed, allowing peers to verify authenticity and data provenance without reliance on a centralized authority.
Additionally, Radicle's architecture is local-first: users always have access to their own repositories on their own machine, regardless of internet connectivity.

Radicle's protocol is designed to support multiple use cases, such as code collaboration, publishing, knowledge sharing, project coordination, and data set collaboration. This guide describes the functionality of the Heartwood release, with a focus on code collaboration and publishing.

## The Radicle Network

Radicle is a peer-to-peer network that locates, replicates, and verifies the authenticity of data or code stored in Git repositories. It is designed to keep repositories accessible regardless of their location or replica count, as long as at least one node seeding them is online. Radicle uses its own gossip protocol to exchange peer and repository information between nodes and the Git protocol for data replication. Nodes, identified by Decentralized Identifiers (DIDs), gossip repository information to build routing tables that aid in discovery and replication. Connections between nodes are encrypted, using the Noise protocol for the handshake, and Radicle is designed to integrate with Tor so that users can improve the privacy of their network interactions.

::: info
show high level network diagram
:::

### Radicle Nodes

Each user in the Radicle network runs a node. A node's primary responsibility is to *seed repositories*, which includes both hosting the data and synchronizing changes with other nodes. All nodes run the same code, regardless of their specific role or activity within the network. Nodes have a `NodeId` (NID), which is a stable identifier independent of their network address, as the network address may change frequently (e.g. if the user moves from home to a coffee shop). Users configure nodes with a *seeding policy*, which specifies the list of repositories they are interested in seeding, in addition to retention rules. Nodes can be run on a home computer or laptop, without the need for an always-on server.
No specialized equipment is required.

::: info
show 3 nodes with different seeding projects and whether they are connected to one another or not based on shared project set.
:::

### Gossip Protocol

The Radicle networking layer is a gossip protocol that uses three types of messages for its core functionality:

1. **Inventory Announcements**: Used for sharing repository inventories and constructing routing tables.
2. **Reference Announcements**: Used for broadcasting updates to repositories, relayed only to nodes seeding the relevant repository.
3. **Node Announcements**: Used for discovering peers within the network.

To prevent endless propagation, nodes drop any message they have already seen. However, for the benefit of nodes that join later, gossip messages may be stored temporarily and relayed to them. Each message contains the `NodeId`, which is the originating node's public key, in addition to a `Signature`, which allows network participants to verify the authenticity of each message they receive.

### Replication via Git

While gossip is used to exchange metadata, the actual repository data, i.e. the Git objects, is transferred through replication. To do this, the node establishes a connection with one or more of the repository's seeds and initiates a `git fetch` via the Git protocol, which is tunneled over the same connection established between the nodes. This fetch operation downloads the relevant Git objects into the node’s storage, making them available to other interested nodes.

> Here we can say that the fetch is tunneled over the same secure connection established between the nodes. It is effectively multiplexed over the physical connection

### Bootstrap Nodes

A node joining the network for the first time will not know of any peers.
Hence, the reference implementation of the network client software is pre-configured with two *Bootstrap Nodes*: `seed.radicle.garden` and `seed.radicle.xyz`, which are registered DNS names that resolve to node addresses on the network. During bootstrapping, a node resolves these names to obtain a set of addresses to connect to initially and, once it has established a connection with a peer, uses the regular peer discovery process to find more peers.

### Public Seed Nodes

In Radicle, all nodes hosting a repository contribute to *seeding data* to the network, but the term "Public Seed Node" is reserved for those that take on the larger role of ensuring data availability and redundancy. Public Seed Nodes run the same software as any other node, potentially with a different seeding policy. Anyone interested in providing stable and reliable points for data replication can set up a Public Seed Node and contribute to Radicle's resilience.

### Permissionless Access

Peer connections in the Radicle network are encrypted, with the Noise protocol used for the handshake that establishes each secure connection. Additionally, Radicle is designed to integrate with anonymity networks like Tor and Nym to improve IP address privacy, although this feature is still under development.

## Repositories

A Radicle repository is a Git repository, supplemented with a unique repository identifier (RID) and the metadata needed to validate the authenticity of its data. Repositories can be either public or private and can hold diverse content, including source code, documentation, and data sets. Repositories are managed by **delegates**, who can be individuals, groups, or bots, and are responsible for critical tasks such as merging patches, addressing issues, and modifying repository settings.

::: info
show some diagram that displays information architecture of a repo + the additional radicle metadata
:::

### Delegates

Delegates are the set of maintainers and automatons entrusted with key responsibilities such as merging patches, managing issues, and updating the repository’s settings (also known as the **identity document**). A repository always begins with one delegate, its creator, and can remain at that size for smaller projects, or can eventually involve multiple delegates.

### Identity Document

The repository Identity Document, stored under `refs/rad/id`, is the basis for establishing trust and authenticity within a repository. This canonical JSON file contains the repository's name, description, and information about its delegates, including their public keys and the approval threshold. The threshold states how many delegate approvals are necessary for key actions, for instance requiring two out of three delegates for a change to be accepted into the default branch. Only delegates can update this document, and each update must meet the approval threshold.

### Collaborative Objects

Collaborative Objects (COBs) provide an API for building collaborative features such as identity, issue tracking, and code review. COBs are stored as Git objects under the `refs/cobs` directory. Using Conflict-Free Replicated Data Types (CRDTs), COBs can synchronize various social artifacts, including user-defined ones, which makes the system adaptable to different use cases. While initially focused on code collaboration, COBs are the root of Radicle's extensibility, opening avenues for future functionality such as knowledge management, project management, data set collaboration, and publishing.
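
To make the CRDT idea concrete, here is a minimal sketch of a grow-only set of comments, one of the simplest CRDTs. It is not Radicle's COB implementation: the `Comment` and `IssueState` types, their fields, and the use of a logical timestamp are all assumptions made for illustration. It only shows why replicas that merge by set union converge without coordination.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Comment:
    """One immutable entry in a hypothetical issue thread."""
    author: str     # assumed: the author's node identifier
    timestamp: int  # assumed: a logical clock value
    body: str


@dataclass(frozen=True)
class IssueState:
    """Replica state: a grow-only set of comments (a G-Set CRDT)."""
    comments: frozenset = field(default_factory=frozenset)

    def add(self, comment: Comment) -> "IssueState":
        return IssueState(self.comments | {comment})

    def merge(self, other: "IssueState") -> "IssueState":
        # Set union is commutative, associative and idempotent, so replicas
        # converge no matter in which order they exchange state.
        return IssueState(self.comments | other.comments)

    def thread(self) -> list:
        # Display order is derived from the data, so every replica agrees.
        return sorted(self.comments, key=lambda c: (c.timestamp, c.author, c.body))


# Two peers comment on the same issue while offline.
alice = IssueState().add(Comment("alice", 1, "Build fails on main"))
bob = IssueState().add(Comment("bob", 2, "Confirmed on my machine"))

# After exchanging state in either direction, both see the same thread.
assert alice.merge(bob) == bob.merge(alice)
```

A real collaborative object needs more than this (edits, reactions, state changes such as closing an issue), but the convergence argument has the same shape: merging must not depend on the order in which updates arrive.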

::: info
show how these varying COBs all fit together and the api / data flows wrt to the protocol / network
:::

### Private Repositories

Radicle's private repositories restrict access to a designated group of trusted peers, defined in an "allow list" within the repository's settings. Only nodes in this privacy set can replicate and access the data. The data is not encrypted at rest, so privacy relies entirely on selective replication through the allow list, which makes these repositories invisible and inaccessible to other nodes in the Radicle network.

## Identification within the Radicle Network

In Radicle, both peer connections and repository management rely on node identifiers (NIDs) and repository identifiers (RIDs), which are based on Decentralized Identifiers (DIDs), adhering to W3C standards to support interoperability.

### Node Identifier (NID)

A node identifier (NID) is the public key of an Ed25519[^id01] key pair, encoded as a DID using the `did:key` method[^did]. A changeable, non-unique `alias` can also be associated with a node for easier recognition across the network. This key-based scheme supports a range of user types, from individuals to organizations and automated bots.

::: info
show something like this, yet have a DID and alias on it
![image](https://hackmd.io/_uploads/SyxmI0yIT.png)
:::

[^id01]: https://ed25519.cr.yp.to/
[^did]: https://w3c-ccg.github.io/did-method-key/

### Repository Identifier (RID)

Each repository is assigned a stable, globally unique identifier, known as the Repository Identifier (RID). The RID is deterministically derived from the initial version of the Repository Identity Document: the document is hashed using Git’s `hash-object` command to produce a SHA-1 hash.
The hash is then encoded using `multibase`[^mb] encoding with the `base-58-btc` alphabet, the same encoding used by the `did:key` method, and prefixed with `rad:`, creating a valid URN[^urn]. For example, here's the RID for the Heartwood repo: `rad:z3gqcJUoA1n9HaHKufZs5FCSGazv5`.

[^urn]: https://datatracker.ietf.org/doc/html/rfc8141
[^mb]: https://w3c-ccg.github.io/multibase/

## Integrity and Trust (needs better intro)

Radicle adapts traditional data storage and integrity verification methods to a peer-to-peer environment. The network uses a decentralized storage model in which each user maintains their own version of a repository, together with mechanisms such as cryptographically signed references that establish data authenticity.

### Distributed Storage with Local Forks

In Radicle's architecture, storage is partitioned: users individually maintain their own *local forks* of the repositories they are interested in. These forks are shared across the network, and each fork has a single owner and writer; only that user can modify it. This structure underpins Radicle's local-first design, which allows offline work and removes the need for centralized servers.

#### Working & Stored Copies

Users will typically have *two* copies of a repository: one local *working copy* and one in network storage, called the *stored* copy. The working copy is set up so that it is linked to storage via a *git remote helper*[^grh] named `git-remote-rad`. Publishing code is then a matter of running `git push rad`, for example.

```
┌╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┐          ┌╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┐
┆ ┌───────────────────┐ ┌──────┐ ┆          ┆ ┌──────┐ ┌─────────────────┐ ┆
┆ │      Storage      │ │      │ ┆   Git    ┆ │      │ │     Storage     │ ┆
┆ │                   ├╸┆╸╸╸╸╸╸┆╸╸╸╸╸╸╸╸╸╸╸╸╸╸┆╸╸╸╸╸╸┆╸┤                 │ ┆
┆ │ ┌──────┐ ┌─────┐ ┌│ │      │ ┆ protocol ┆ │      │ │ ┌─────┐ ┌─────┐ │ ┆
┆ │ │repo  │ │repo │ ││ │      │ ┆          ┆ │      │ │ │repo │ │repo │ │ ┆
┆ │ ├──────┤ ├─────┤ ├│ │      │ ┆          ┆ │      │ │ ├─────┤ ├─────┤ │ ┆
┆ └─┴───╿──┴─┴───┬─┴─┴┘ │      │ ┆          ┆ │      │ └─┴───┬─┴─┴───╿─┴─┘ ┆
┆       │        │      │      │ ┆  gossip  ┆ │      │       │       │     ┆
┆       │        │      │ Node ├╸╸╸╸╸╸╸╸╸╸╸╸╸╸┤ Node │       │       │     ┆
┆       │        │      │      │ ┆ protocol ┆ │      │       │       │     ┆
┆     push     pull     │      │ ┆          ┆ │      │     pull    push    ┆
┆       │        │      │      │ ┆          ┆ │      │       │       │     ┆
┆       │        │      │      │ ┆          ┆ │      │       │       │     ┆
┆       │        │      │      │ ┆          ┆ │      │       │       │     ┆
┆  ┌────┴───┐ ┌──╽─────┐│      │ ┆          ┆ │      │ ┌─────╽──┐ ┌──┴────┐┆
┆  │working │ │working ││      │ ┆          ┆ │      │ │working │ │working│┆
┆  │copy    │ │copy    ││      │ ┆          ┆ │      │ │copy    │ │copy   │┆
┆  └────────┘ └────────┘└──────┘ ┆          ┆ └──────┘ └────────┘ └───────┘┆
└╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┘          └╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┘
```

[^grh]: https://git-scm.com/docs/gitremote-helpers

#### Repository State Changes

Based on this soft-forking design, there are no shared branches in the Radicle model. Instead, each branch is owned by one user, and branches are partitioned by public key in the repository hierarchy. This means that each delegate can write only to their own branches. In a project with multiple delegates, for example Alice, Bob and Eve, each would have their own `main` branch as follows:

```
<alice>/refs/heads/main
<bob>/refs/heads/main
<eve>/refs/heads/main
```

The canonical, or 'authoritative', version of the repository is established dynamically, based on the delegate approval threshold defined in the identity document.
For example, if a threshold of two out of three delegate approvals is set, and both Alice and Bob have the same commit in their `main` branches, that commit is recognized as the authoritative, current state of the repository.

::: info
show push quorum image
:::

### Ensuring Trust with Self-Certifying Repositories

Radicle's approach to data integrity and trust in its distributed network rests on repositories being self-certifying: each update within a repository is cryptographically signed, establishing a *verifiable record of changes*. This system is inspired by The Update Framework (TUF)[^tuf], a framework for securing software update systems.

In Radicle, a repository must be initialized with an **identity document** before being published to the network. This document contains metadata such as the repository’s name, description, and the public keys of its delegates. The initial version of this identity document is the input from which the unique Repository Identifier (RID) is deterministically generated. Together, the RID and the identity document form the repository's identity and the *ownership proof* for verifying all subsequent updates within the repository. Simply put, with an RID, anyone can retrieve the initial identity document, confirm its match with the RID, and then authenticate all following updates to the repository.

To ensure self-certification, delegates in Radicle authenticate changes by cryptographically signing over repository heads, tags, and pertinent Git references, as well as essential metadata like the repository name and description. These signatures are termed *signed refs* and are stored under `refs/rad/sigrefs`. They allow each change to be traced back to a delegate's public key, as defined in the repository's identity document. Signed refs are updated with each approved change and so maintain the repository's authenticated, canonical state.
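
The first step of that check, confirming that a retrieved identity document matches its RID, can be sketched in Python from the derivation this document describes: a Git blob SHA-1, multibase base-58-btc encoding, and the `rad:` prefix. This is a sketch under assumptions rather than the reference implementation: the function names are hypothetical, the sample document is made up, and the exact bytes that Radicle hashes (its canonical JSON serialization) are not specified here.

```python
import hashlib

BASE58_BTC = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def git_blob_sha1(content: bytes) -> bytes:
    """SHA-1 that `git hash-object` computes for a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).digest()


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base-58 alphabet."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_BTC[remainder] + encoded
    # Each leading zero byte is represented by the first alphabet character.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return BASE58_BTC[0] * leading_zeros + encoded


def derive_rid(identity_doc: bytes) -> str:
    """'rad:' + multibase prefix 'z' (base-58-btc) + encoded blob hash."""
    return "rad:z" + base58btc(git_blob_sha1(identity_doc))


def verify_rid(rid: str, identity_doc: bytes) -> bool:
    """Check that a retrieved initial identity document matches the RID."""
    return derive_rid(identity_doc) == rid


# Hypothetical document bytes; a real identity document has more fields.
doc = b'{"name":"example","description":"demo","threshold":1}'
rid = derive_rid(doc)
assert verify_rid(rid, doc)
assert not verify_rid(rid, doc + b" ")  # any change to the bytes breaks the match
```

Because the identifier is a hash of the document, a peer does not need to trust whoever served the document: a mismatch is detected locally.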

::: info
show something like this:
![image](https://hackmd.io/_uploads/Sk6sSCyU6.png)
:::

[^tuf]: https://theupdateframework.github.io/specification/latest/
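
As a closing illustration, the threshold rule from Repository State Changes can be sketched as a small function. This is a simplification under assumptions, not the protocol's algorithm: `canonical_head` is a hypothetical name, the commit identifiers are made up, and only exactly matching heads are counted, whereas a real implementation would presumably also consider commit ancestry.

```python
from collections import Counter
from typing import Dict, Optional


def canonical_head(delegate_heads: Dict[str, str], threshold: int) -> Optional[str]:
    """Return the commit that at least `threshold` delegates agree on, if any."""
    votes = Counter(delegate_heads.values())
    agreed = [commit for commit, count in votes.items() if count >= threshold]
    # This sketch declines to choose when no commit, or more than one commit,
    # reaches the threshold.
    return agreed[0] if len(agreed) == 1 else None


# Each delegate's own `main` branch, keyed by delegate (made-up commit ids).
heads = {"alice": "a1b2c3", "bob": "a1b2c3", "eve": "d4e5f6"}

assert canonical_head(heads, threshold=2) == "a1b2c3"  # Alice and Bob agree
assert canonical_head(heads, threshold=3) is None      # no unanimous commit
```

With a threshold of two out of three, Eve's diverging branch does not affect the result, which mirrors the Alice and Bob example given earlier.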