# Save Dweb Backend Privacy Overview

## Core Assumptions

- Veilid's cryptographic systems can be trusted
- IP privacy is mostly between peers and isn't guarding against an adversary with full network visibility
- Group members only invite those they trust

## Protocol Description

### Group Creation / Membership

Groups form the core of the backend, allowing users to create or join them to discover each other. Groups list Data Repositories. Data Repositories contain the hashes of file lists which can be downloaded from peers using the replication protocol.

```graphviz
digraph {
    backend
    group1
    group2
    backend -> group1
    backend -> group2
    group1 -> repo1
    group1 -> repo2
    group1 -> repo3
    group2 -> repo4
    group2 -> repo5
    group2 -> repo6
}
```

#### Group Creation

- A Veilid DHT record gets created which uses an [ED25519 keypair](https://gitlab.com/veilid/veilid/-/blob/main/veilid-core/src/crypto/vld0/mod.rs?ref_type=heads#L39), and some [signed metadata](https://gitlab.com/veilid/veilid/-/blob/main/veilid-core/src/storage_manager/types/signed_value_descriptor.rs?ref_type=heads#L6) is used to generate the DHT key
- We then generate a separate [shared secret](https://gitlab.com/veilid/veilid/-/blob/main/veilid-core/src/crypto/vld0/mod.rs?ref_type=heads#L139) key to hide the DHT value from node operators using [chacha20poly1305](https://gitlab.com/veilid/veilid/-/blob/main/veilid-core/src/crypto/vld0/mod.rs?ref_type=heads#L9)

#### Group DHT Values

- When writing a value to a DHT subkey, we [encrypt](https://gitlab.com/veilid/veilid/-/blob/main/veilid-core/src/crypto/vld0/mod.rs?ref_type=heads#L323) it with the shared secret; when reading, we decrypt the value.
- The first subkey in the DHT record is reserved for the "group name"
- Subsequent DHT subkeys are used by group members to identify their Data Repository using its DHT key

#### Group Joining

- A group is identified using its DHT record key, and it can be joined by sharing the DHT key, owner keypair, and encryption shared secret.
- When a new member joins a group, they first initialize a Veilid DHT record from the record ID and owner keypair
- They then use the shared secret to read the group name and get the list of Data Repositories
- If they want to share their membership with others, they can add their data repo to the group for others to read. They must scan the subkeys starting at `1` and find the first empty one to which they can write their key.

#### Group Deletion

- Any member can delete the data at a subkey, but they should only do that for their own data repo if they wish to remove themselves.
- Deleting another member's key isn't advised since it doesn't actually remove the member. Proper member removal will be developed in a future iteration of the group system using proper consensus mechanisms.

### Data Repositories (Repo)

- A Data Repository represents a participant in a Group that is sharing some data and participating in replication.

#### Repo Creation

- After loading a group, users can create a new DHT record whose keypair they will not share with anyone else.
- After creation, the DHT record key should be added to the Group's subkeys

#### Repo DHT Values

- The same encryption shared secret as the group is used to encrypt all values
- The first subkey (0) is reserved for the repository name. This can be thought of as the user name of the participant
- The next subkey is used for storing the hash of the repo's file list. This should be updated any time data changes
- The third subkey is used for advertising a route ID that other members can use to contact this peer. It should be updated every time the route is deemed to be dead.
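
The subkey layouts described above for group and repo records can be summarized in a small sketch. The constant names and the `first_empty_member_subkey` helper are hypothetical and not taken from the backend's code. The sketch also assumes that subkey values have already been fetched and decrypted, and that a deleted slot shows up as either missing or zero-length.

```rust
/// Subkey indices inside a group's DHT record (names are illustrative).
pub const GROUP_NAME_SUBKEY: u32 = 0;
pub const FIRST_MEMBER_SUBKEY: u32 = 1;

/// Subkey indices inside a repo's DHT record (names are illustrative).
pub const REPO_NAME_SUBKEY: u32 = 0;
pub const REPO_FILE_LIST_HASH_SUBKEY: u32 = 1;
pub const REPO_ROUTE_ID_SUBKEY: u32 = 2;

/// Hypothetical helper: `subkeys[i]` is the decrypted value of group subkey
/// `i`, or `None` if nothing is stored there. Returns the first member slot
/// (starting at subkey 1) that a joining member could write their repo key to.
/// Assumption: an empty byte string also counts as a free slot.
pub fn first_empty_member_subkey(subkeys: &[Option<Vec<u8>>]) -> Option<u32> {
    for (index, value) in subkeys
        .iter()
        .enumerate()
        .skip(FIRST_MEMBER_SUBKEY as usize)
    {
        let is_free = match value {
            Some(bytes) => bytes.is_empty(),
            None => true,
        };
        if is_free {
            return Some(index as u32);
        }
    }
    // Every member slot that was scanned is occupied.
    None
}
```

A joining member would call the helper with the subkeys it has read so far and write its repo's DHT key to the returned index.
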
### File List Format

Files are listed as a CBOR encoded `HashMap<String, Hash>` where the string is a file path within the dataset and the value is a 32 byte Blake3 [iroh hash](https://www.iroh.computer/docs/components/blobs) encoded as raw bytes.

### Repo File Operations

Updating the state of the files in the repo involves the following steps:

- get the current hash or make an empty file list
- load + parse the HashMap state from Iroh
- mutate the HashMap
- encode to bytes with CBOR and save to Iroh to get the latest hash
- save the latest hash on the DHT

### Tunnels

Veilid provides us with a way to create routes that can receive messages. However, creating many tunnels is computationally expensive, so it is better to reuse a single route ID. On top of this foundation we built a way to multiplex several connections from several peers. Tunnels are identified using the route ID of the sender and an unsigned 32 bit integer. This allows peers to open multiple tunnels to others by increasing the 32 bit integer.

#### Tunnels - Wire Protocol

Messages are sent using Veilid AppMessages to routes. Each message is prefixed with a 32 bit unsigned integer, followed by 32 bytes for the route ID. The rest of the AppMessage is the actual contents of the packet for that tunnel.

The first message (PING) sent through a tunnel contains the bytes [0x07, 0x02, 0x08, 0x03] (SAVE on a phone dial pad) followed by the route ID blob needed to register the route with Veilid. When a peer gets a tunnel ID it has not seen before, it should check whether the message contains the PING and, if not, ignore the tunnel. If it does, the peer registers the route ID blob, notifies the application of a new tunnel, and listens for subsequent messages. The Route ID from the Tunnel ID is where responses must be sent.
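
The framing described above can be sketched as a pair of encode/decode helpers. This is a minimal illustration, not the backend's actual code: `TunnelId`, `encode_message`, `decode_message`, and `parse_ping` are hypothetical names, and the big-endian byte order of the 32 bit integer is an assumption because the text does not specify one.

```rust
/// Length of the route ID field in bytes, per the wire format above.
pub const ROUTE_ID_LEN: usize = 32;
/// PING marker: "SAVE" on a phone dial pad.
pub const PING: [u8; 4] = [0x07, 0x02, 0x08, 0x03];

/// A tunnel is identified by a u32 plus the sender's route ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TunnelId {
    pub id: u32,
    pub route_id: [u8; ROUTE_ID_LEN],
}

/// Build an AppMessage payload: u32 id, 32 byte route ID, then the body.
/// Assumption: the u32 is written big-endian.
pub fn encode_message(tunnel: &TunnelId, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + ROUTE_ID_LEN + body.len());
    out.extend_from_slice(&tunnel.id.to_be_bytes());
    out.extend_from_slice(&tunnel.route_id);
    out.extend_from_slice(body);
    out
}

/// Split a payload into its tunnel ID and body.
/// Returns `None` when the payload is shorter than the fixed header.
pub fn decode_message(payload: &[u8]) -> Option<(TunnelId, &[u8])> {
    if payload.len() < 4 + ROUTE_ID_LEN {
        return None;
    }
    let mut id_bytes = [0u8; 4];
    id_bytes.copy_from_slice(&payload[..4]);
    let mut route_id = [0u8; ROUTE_ID_LEN];
    route_id.copy_from_slice(&payload[4..4 + ROUTE_ID_LEN]);
    let tunnel = TunnelId {
        id: u32::from_be_bytes(id_bytes),
        route_id,
    };
    Some((tunnel, &payload[4 + ROUTE_ID_LEN..]))
}

/// If `body` starts with the PING marker, return the route ID blob after it.
/// A receiver would call this only for tunnel IDs it has not seen before.
pub fn parse_ping(body: &[u8]) -> Option<&[u8]> {
    if body.starts_with(&PING) {
        Some(&body[PING.len()..])
    } else {
        None
    }
}
```

A receiver would run `decode_message` on every incoming AppMessage, look the resulting `TunnelId` up in its table of known tunnels, and fall back to `parse_ping` when the ID is new, dropping the message if that returns `None`.
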
```mermaid
sequenceDiagram
    AppA->>TunnelsA: Open New Tunnel to RouteIDB-Blob
    Note right of TunnelsA: Register RouteIDB-Blob with Veilid and get RouteIDB
    Note right of TunnelsA: Create Tunnel ID (u32,RouteIDA)
    TunnelsA ->> TunnelsB: (u32,RouteIDA)(PING)(RouteIDA-Blob)
    TunnelsA ->> AppA: New Tunnel (u32,RouteIDB)
    Note right of TunnelsB: Verify PING
    Note right of TunnelsB: Register RouteIDA-Blob
    TunnelsB ->> AppB: New Tunnel (u32,RouteIDA)
    AppB ->> TunnelsB: Send BYTES to (u32,RouteIDA)
    TunnelsB ->> TunnelsA: (u32,RouteIDB)(BYTES)
    TunnelsA ->> AppA: New data (u32,RouteIDB): BYTES
```

### Data Replication Protocol

Data replication works on top of a basic Request/Response protocol. The first byte sent in a message is the "type" of message, followed by any data needed. The other end will then respond to a message and end the tunnel once it's finished.

The currently defined messages are:

```
NO = 0x00
YES = 0x01
HAS = 0x10
ASK = 0x11
DATA = 0x20
DONE = 0x22
ERR = 0xF0
```

There are currently two types of requests: ASK and HAS.

HAS contains the BLAKE3 hash for some data. The response is YES or NO depending on whether the peer has the data locally, or ERR if there was a problem.

ASK does the same as HAS except that after a YES, the peer will start sending messages starting with DATA followed by some bytes for the requested data. The initiating peer must ingest the data one chunk at a time until getting the DONE message, after which it must verify the bytes and discard them if the hash does not match. A future iteration of the protocol could verify data as it comes in.
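
The request/response flow above can be sketched with the standard library only. The names `MessageType`, `encode_request`, and `collect_ask_response` are hypothetical. The sketch assumes a request is the type byte followed directly by a 32 byte hash, and it takes the hash check as a `verify` closure standing in for a real BLAKE3 comparison.

```rust
/// Message type bytes, matching the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum MessageType {
    No = 0x00,
    Yes = 0x01,
    Has = 0x10,
    Ask = 0x11,
    Data = 0x20,
    Done = 0x22,
    Err = 0xF0,
}

impl MessageType {
    /// Map the first byte of a message to its type, if it is a known one.
    pub fn from_byte(byte: u8) -> Option<MessageType> {
        match byte {
            0x00 => Some(MessageType::No),
            0x01 => Some(MessageType::Yes),
            0x10 => Some(MessageType::Has),
            0x11 => Some(MessageType::Ask),
            0x20 => Some(MessageType::Data),
            0x22 => Some(MessageType::Done),
            0xF0 => Some(MessageType::Err),
            _ => None,
        }
    }
}

/// Build a HAS or ASK request. Assumption: type byte, then the raw hash.
pub fn encode_request(kind: MessageType, hash: &[u8; 32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + hash.len());
    out.push(kind as u8);
    out.extend_from_slice(hash);
    out
}

/// Consume the replies to an ASK: expect YES, then DATA chunks until DONE.
/// `verify` is a placeholder for checking the bytes against the BLAKE3 hash;
/// on a mismatch the collected bytes are dropped and an error is returned.
pub fn collect_ask_response<I, F>(messages: I, verify: F) -> Result<Vec<u8>, String>
where
    I: IntoIterator<Item = Vec<u8>>,
    F: Fn(&[u8]) -> bool,
{
    let mut messages = messages.into_iter();
    let first = messages.next().ok_or("tunnel closed before a reply")?;
    match first.first().copied().and_then(MessageType::from_byte) {
        Some(MessageType::Yes) => {}
        Some(MessageType::No) => return Err("peer does not have the data".to_string()),
        _ => return Err("peer sent ERR or an unexpected reply".to_string()),
    }
    let mut data = Vec::new();
    for message in messages {
        match message.first().copied().and_then(MessageType::from_byte) {
            Some(MessageType::Data) => data.extend_from_slice(&message[1..]),
            Some(MessageType::Done) => {
                return if verify(&data) {
                    Ok(data)
                } else {
                    Err("hash mismatch, data discarded".to_string())
                };
            }
            _ => return Err("unexpected message during transfer".to_string()),
        }
    }
    Err("tunnel closed before DONE".to_string())
}
```

Because the whole payload is buffered before `verify` runs, this sketch shares the limitation noted in the text: a bad stream is only detected after DONE arrives.
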
#### Data Replication in Group

To download a Hash from the group we use the following algorithm:

- list repos in the group
- get their route IDs
- shuffle the list of routes
- iterate through each and use ASK to try to download+verify the data
- on failure try the next peer
- if all peers failed, return an error to the application

Due to the high latency of Veilid tunnels, it's faster to ask random peers for data than it is to get the list of peers that have data and select from them.

## Risks

With our stated assumptions and protocol design, we foresee some risk areas that users of the system should be aware of.

### Group Invite Leak

The current group invitation gives anyone with a copy full and irrevocable access to the group because it contains all the cryptographic secrets needed to interact with the DHT. There are a few ways that the link can be compromised:

- The side channel used to invite a new member is public and bystanders get access (forums, email threads, physical stickers)
- A malicious peer might intentionally share the link publicly
- A device without a PIN can get stolen and the link extracted from the group

#### Current Mitigation:

- Only onboard devices by physically meeting and getting the other person to scan your group QR code directly
- Only share the URL over secure encrypted messengers like Signal and make sure to have disappearing messages enabled so that an attacker can't get the link from your history
- Keep a secure password on your phone and make sure to reset it before handing it off to third parties in order to force PIN mode and temporarily disable fingerprint-based login.

#### Future Mitigations:

- Instead of sharing all the group secrets, we can share a "rendezvous" link to a temporary Veilid tunnel through which a peer can ask for the full group information from another member. This link will expire after a given timeframe or when the peer goes offline.
- Change the group membership protocol to have potential members generate their repo *before* joining the group and pass the repo info to an existing member to be added into the group using some sort of consensus mechanism like [Consento](https://github.com/consento-org/).

### Group Member Removal

Because group membership consists of knowing a shared secret, there is currently no way to "un-know" the secret on other devices. Members can be removed if they voluntarily leave the group, but peers that should no longer participate have no clear path to being removed since they can re-add themselves at any time. Lost devices that are unable to go back online are easy to remove by any member since they cannot re-add themselves.

#### Current Mitigation:

- The number one step is to only share groups with trusted members and to keep membership small when possible
- If you wish to make sure to exclude a member, create a new group and only add trusted peers, then remove yourself from the old group

#### Future Mitigations:

- Group membership could be changed from knowing a secret to being consensus driven like [Consento](https://github.com/consento-org/), with encrypted DHT values being shared via group encryption schemes like [ssb private-group](https://github.com/ssbc/private-group-spec).
- New group creation could be triggered by sending messages to the subset of trusted peers via group gossip, and the move to the new group could be automated.

### Malicious Peer -> Invalid Data

Our replication protocol currently requires a peer to fully download a file before verifying it. This leads to some issues when a member of a group sends malicious data.

1. A peer can send some amount of invalid data, leading to an invalid hash.
2. A peer can waste another's resources by sending an endless stream of data that can fill up device storage.

The first case adds latency, but the peer can ask another device for the data.
This case can also occur when tunnels break due to network churn mid-download. The second case can be more problematic since it can use up all of a phone's storage and interfere with its functionality.

#### Current Mitigations:

- Case 1 is guarded against by deleting the junk data and retrying the download.
- For case 2 we rely on group members being trustworthy.

#### Future Mitigations:

- One start would be to add a content-size to the File List Format so users can guarantee not to download more than they expect even if data is invalid.
- Later, we could update the replication protocol to take advantage of blake3 merkle proofs to verify data as it streams in instead of once at the end, so we can drop the replication stream sooner.

### Malicious Peer -> Overwrite Others' IDs

Since any group member can write to any DHT key, they can accidentally or purposefully overwrite the repo IDs of other members in the DHT. This can lead to members being hidden and can make the group unusable. This can especially be an issue if they fill every subkey of the DHT record with garbage data that cannot resolve to actual data repositories.

#### Current Mitigations:

- Members will attempt to re-add themselves to the DHT when loading up.

#### Future Mitigations:

- Changes to how group membership is calculated can reduce the risk of malicious peers overwriting others
- We can also have more advanced DHT authentication mechanisms as Veilid adds them

### Veilid Network Traffic Alerts Internet Surveillance

Although Veilid comes with a built-in mix network that makes it harder to find the origin and destination of traffic, it does not yet disguise the traffic as something innocuous. This means that in environments with passive surveillance or strict firewalls, users of the app can become more visible as targets for active surveillance on the network.
#### Current Mitigations:

- Using public Wi-Fi can sometimes help increase your anonymity, although in very strict networks it might be best to forgo the internet entirely.

#### Future Mitigations:

- Veilid has ["lets make Veilid traffic not look like Veilid traffic"](https://gitlab.com/veilid/veilid/-/issues/329) on their roadmap, which would mask it behind standard TLS streams.