Holochain Security Review & Audits

tags: holochain security audit

Reviewer: Least Authority

Series of Scopes in Security Review Process

SCOPE 1 – Lair Keystore

Goal: Verify that this key management tool set does not leak private keys, that the API exposes no attack vectors for accessing private keys, and that permissions are enforced.

Overview of Lair Keystore

Holochain's secret lair is where secrets are kept. It is the secure keystore which holds private keys and seeds. Private keys should be encrypted at rest, loaded only into crypto-protected memory, and never leave lair. Lair therefore performs all cryptographic functions that use private keys, on behalf of the various Holochain subsystems which need decryption, encryption, or signing.

The main Lair Keystore crate provides the lair-keystore executable allowing initialization, configuration, and running of a Lair keystore.

Direct access to lair functions requires an understanding of the subtleties of cryptography and proper methods of use. In general these crates and docs are NOT intended to be used directly by hApp developers, who should just be using the HDK functions which get routed to lair under the hood by the Holochain conductor.

Composition of Lair Keystore

  1. sodoken [Code Repo] [Docs]
    Ergonomic wrappers around the rust libsodium-sys ffi crate
    libsodium wrapper providing tokio safe memory secure api access.
    sodoken is used by the following 3 crates to provide the libsodium cryptography api

  2. hc_seed_bundle [Code Repo] [Docs]
    SeedBundle parsing and generation library.
    Lair stores 32 byte seeds used to derive Ed25519 signature, and X25519 encryption key pairs. These seeds can be encrypted and passed around using the hc_seed_bundle encoding. The following two crates use hc_seed_bundle to manage seed bundle encoding and decoding. hc_seed_bundle uses sodoken for the libsodium api to derive keypairs from the seeds.

  3. lair_keystore_api [Code repo] [Docs]
    Secret lair client private keystore API library.
    lair_keystore_api contains the bulk of the logic for running a lair keystore server, connecting to it with a lair keystore client, and performing operations using private keys such as generating signatures and decrypting data. lair_keystore_api uses hc_seed_bundle to manage seeds, and sodoken for libsodium cryptography.

  4. lair_keystore (or lair_keystore_lib if used as a library) [Code repo] [Docs]
    Lair server lib + 'lair-keystore' cli server binary
    This crate mostly provides the lair-keystore executable, which allows initialization, configuration, and running of a Lair keystore. lair_keystore uses lair_keystore_api for all the logic of running a lair keystore server, and adds the canonical sqlcipher/sqlite store implementation for key persistence as well as the cli executable itself. If you want to run an in-process keystore, this is the crate that provides that canonical store.

SCOPE 2 – Holochain Deterministic Integrity (HDI for HDK v0.1.+)

We believe this scope is probably best taken in two parts:

  1. Holochain's Integrity Model: This is the complex network mathy part to confirm Holochain's approach to data integrity. State changes are local. Agents' actions are signed to their local hash chains, which establishes an immutable sequential history. Headers and public entries are then published to the validating DHT.
  2. Deterministic Validation: Every node performing validation (the author and any validators in that shard of the DHT) must eventually arrive at the same validity status (and until then identify that they are missing dependencies). We've limited the range of Holochain functions available in integrity zomes to a subset that produces deterministic results. All DHT nodes that must store and serve any piece of data will arrive at the same validation state for that data.
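As a rough illustration of the hash-chain idea in item 1, here is a minimal Python sketch. It is not Holochain's implementation: the record fields (`seq`, `prev`, `payload`), the JSON encoding, and the helper names are all assumptions made for the example, and signatures are omitted entirely. It only shows why linking each action to the hash of its predecessor makes the history tamper-evident back to genesis.

```python
import hashlib
import json


def action_hash(action: dict) -> str:
    """Hash a canonical JSON encoding of an action (hypothetical format)."""
    encoded = json.dumps(action, sort_keys=True).encode("utf-8")
    return hashlib.blake2b(encoded, digest_size=32).hexdigest()


def append_action(chain: list, payload: str) -> None:
    """Append an action that points at the hash of the previous one."""
    prev = action_hash(chain[-1]) if chain else None
    chain.append({"seq": len(chain), "prev": prev, "payload": payload})


def verify_chain(chain: list) -> bool:
    """Check sequence numbers and back-links from every action to genesis."""
    for i, action in enumerate(chain):
        expected_prev = action_hash(chain[i - 1]) if i > 0 else None
        if action["seq"] != i or action["prev"] != expected_prev:
            return False
    return True


chain = []
for payload in ["genesis", "create entry A", "update entry A"]:
    append_action(chain, payload)
```

Changing any earlier action changes its hash, so the `prev` link of the following action no longer matches and `verify_chain` returns `False`.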

Part 1: Holochain Data Integrity Model

Holochain functions as a kind of data integrity engine for distributed apps. It's a bit like the era when databases were accessed via stored procedures: UI clients call functions which enforce business logic and ensure universal data integrity.

This phase is a high-level review of Holochain's data integrity model for app state and the convergence of network nodes to expected state – specifically, the integrity guarantees we deliver in the HDI 0.1.x+ versions. (There will be future guarantees of finality, and validation of served data, etc.)

Holochain makes specific integrity guarantees:

  1. State: Agents' actions are unambiguously ordered from any given action back to genesis, unforgeable, non-repudiable, and immutable (accomplished via local hash chains called Source Chains, because all data within the network is sourced from these chains).
  2. Self-Validating Data: Because all DHT data is stored at the hash of its content, if the data returned from a request does not hash to the address you requested, you know you've received altered data.
  3. Self-Validating Keys: Agents declare their address on the network as their public key. You can confirm any network communication is valid by checking the signature using the from address as the pubkey.
  4. Termination of Execution: No node can be hijacked into infinite loops by non-terminating application code in either remote zome calls or validation callbacks. We use WASM metering to guarantee a max execution budget to address the Halting Problem.
  5. Deterministic Validation: Ensure that only deterministic functions (ones that will always get the same result no matter who calls them on what computer) are available in integrity zomes. An interim result of "missing dependency" is also acceptable, but the final evaluation of valid/invalid status for each datum must be consistent across all nodes.
  6. Strong Eventual Consistency: Despite network partitions, all nodes that are an authority (or become one at any point) will arrive at the same state (validation result) for each data element. This is ensured by the DHT functioning as a CRDT (via operational transforms, "Ops").
  7. "0 of N" Trust Model: Holochain is immune to "majority attacks" because any node can always validate data for themself independent of what any other nodes say. See this Levels of Trust Diagram
  8. Data Model Scalability: Because of the sharding of DHT storage and validation, the computing power and overall throughput for an application scales linearly as more users join the app. Global consensus systems can't utilize the compute power of new nodes to increase throughput / work done.
  9. Atomic zome calls: Multiple writes in a single zome call will either all be committed in a single SQL transaction or all fail together. If they fail, the zome call will report an error, and if the zome call errors, the writes will be rolled back.
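The all-or-nothing behaviour in item 9 can be sketched with Python's built-in `sqlite3` module. This is only an analogy under stated assumptions: the `entries` table, the `zome_call` function, and the "a `None` value is an invalid write" rule are invented for the example and do not reflect Holochain's actual schema or code.

```python
import sqlite3


def zome_call(conn: sqlite3.Connection, writes: list) -> str:
    """Apply all writes in one transaction; roll back everything on any error."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            for key, value in writes:
                if value is None:  # hypothetical validation failure
                    raise ValueError(f"invalid write for {key!r}")
                conn.execute(
                    "INSERT INTO entries (key, value) VALUES (?, ?)",
                    (key, value),
                )
        return "ok"
    except (ValueError, sqlite3.Error) as err:
        return f"error: {err}"


def count_entries(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
conn.commit()
```

A call whose second write fails leaves no trace of its first write, and the caller sees an error rather than a partial result.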

The codebase at the heart of this is the Holochain Deterministic Integrity crate inside the main Holochain repository. The HDI crate contains all the functions that devs can use in their apps, but they are supported by other code beneath them.

Code components involved in the above guarantees:

  1. State: This may be the most difficult one to narrow down. Devs write their apps (which change source chain states by calling CRUD operations) using the Holochain Developer Kit [Docs]. They compile their app to WASM and distribute it in that form. The WASM is run by wasmer and calls into Holochain to change Holochain state. These changes are flushed at the end of each zome call as a single SQL transaction using sqlite.
  2. Self-Validating Data: We use standard Blake2b hashing for content addressable storage in the DHT.
  3. Self-Validating Keys: The one exception to the Blake2b hash function is that the hash function for Agent keys is the identity function (the hash = the content). You can just switch between entry and hash types with no change for AgentKeys. Since this key is also the agent's network address in our P2P networking, it's a self-validating key.
  4. Termination of Execution: We place a max ops limit on app calls using wasmer's built-in metering in our guest/host interface. The hApp is guest WASM executing inside a Holochain conductor as the host.
    [Docs]
  5. Deterministic Validation: The integrity zome defines entry structures and validates their integrity. All functions available in its validation callback are constrained to the deterministic functions available in the Holochain Deterministic Integrity crate.
  6. Strong Eventual Consistency: Local chain state changes are transformed to DHT operations to update particular address space authorities with the CRDT transformations required to converge to consistency. (Later scopes will look at gossip/networking.)
  7. "0 of N" Trust Model: Since every node/user of a Holochain app has the DNA (as a hash of wasm code and config) of that app as the first/genesis entry of their source chain, every node can run validation on ANY data at any time. It never needs to trust validation outcomes reported by other nodes. (Scope 3 will look at the use of "1 of N" trust when retrieving data from the DHT that others have validated.)
  8. Data Model Scalability: I don't know what code to point you to except to show a dev can configure a target level of data redundancy. Once the number of nodes on the network exceeds the redundancy target, node workloads do not grow. The work is just divided among more nodes leading to an overall decrease in the percentage of work performed per node.
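Items 2 and 3 can be illustrated with a small Python sketch. It assumes a plain 32-byte Blake2b digest as the address and a dictionary as the store; Holochain's real hash encoding and storage are not modeled, and `ToyDht`, `content_address`, and `agent_address` are hypothetical names used only here.

```python
import hashlib


def content_address(data: bytes) -> bytes:
    """Address of a piece of data: the Blake2b hash of its content."""
    return hashlib.blake2b(data, digest_size=32).digest()


def agent_address(public_key: bytes) -> bytes:
    """For agent keys the 'hash' is the identity function: address == key."""
    return public_key


class ToyDht:
    """In-memory stand-in for content-addressed storage (illustration only)."""

    def __init__(self) -> None:
        self._store = {}

    def put(self, data: bytes) -> bytes:
        addr = content_address(data)
        self._store[addr] = data
        return addr

    def get_verified(self, addr: bytes) -> bytes:
        """Return data only if it hashes back to the requested address."""
        data = self._store[addr]
        if content_address(data) != addr:
            raise ValueError("returned data does not match requested address")
        return data
```

The requester does not have to trust whoever served the data: re-hashing the response and comparing it with the requested address is enough to detect alteration.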

Note: We believe that Holochain is fully Byzantine Fault Tolerant even though it does not focus on managing consensus. See this paper for related proof of BFT in eventually consistent systems. But I'm not sure if we're making that one of the guarantees, yet.

Part 2: Implementation of Deterministic Validation

This is a review of our implementation of the above theoretical framework, specifically the HDI functions (HDI = Holochain Deterministic Integrity: the subset of our HDK functions available to integrity zomes which perform validation and guarantee data integrity).

Parts of this review may be quite simple (e.g. standard hash algorithm for self-validating data, standard crypto lib for self-validating keys, built-in wasm metering, etc.), while other parts may require a deeper review of our code (e.g. CRDT consistency, trust model, capabilities, etc.).

Keep in mind that there is a later scope for auditing network functionality. For this phase, you can treat the network as a black box connecting nodes with ongoing gossip, where messages may be temporarily lost, misordered, and so on (any Byzantine Fault), but where the structure of chains and validation ensures convergence to a consistent validation outcome of Valid, Invalid, or an interim state of Missing Dependencies.
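The convergence property described here can be sketched in Python with a toy validation rule. Everything in this example is an assumption made for illustration: the op format (`id`, `value`, `deps`), the rule that an op is valid when its value exceeds the value of every dependency, and the `run_node` loop are not taken from Holochain. The point is only that a validation function depending solely on the op and its dependencies gives every node the same final answer, whatever the arrival order.

```python
from enum import Enum


class Outcome(Enum):
    VALID = "valid"
    INVALID = "invalid"
    MISSING_DEPENDENCIES = "missing_dependencies"


def validate(op: dict, store: dict) -> Outcome:
    """Deterministic toy rule: the result depends only on op and its deps."""
    if any(dep not in store for dep in op["deps"]):
        return Outcome.MISSING_DEPENDENCIES
    ok = all(op["value"] > store[dep]["value"] for dep in op["deps"])
    return Outcome.VALID if ok else Outcome.INVALID


def run_node(arrival_order: list) -> dict:
    """Simulate one node receiving ops in some order and re-validating."""
    store, results = {}, {}
    for op in arrival_order:
        store[op["id"]] = op
        # Re-check every op seen so far; interim results may be replaced.
        for seen in store.values():
            results[seen["id"]] = validate(seen, store)
    return results


op_a = {"id": "a", "value": 1, "deps": []}
op_b = {"id": "b", "value": 2, "deps": ["a"]}
op_c = {"id": "c", "value": 0, "deps": ["b"]}
```

A node that receives `op_c` first reports Missing Dependencies for it, then settles on the same Valid/Invalid verdicts as a node that received the ops in causal order.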

SCOPE 3 – Capabilities, Warrants, Countersigning

  • Capabilities model w/ signed calls
  • Warrants:
  • "1 of N" Network Trust Model: It takes only one honest node in a neighborhood to detect fraud and circulate warrants for the fraudster. Holochain's DHT 1 of N from network, and can be reduced to 0 of N for any
  • Countersigning (Atomic changes across Source Chains):
  • Micro-Consensus (M of N Countersigning):
  • Scheduling?
  • rust-backed formal interface that is mockable with host

Capabilities: Local state can only be changed by commands signed by authorized keys or secrets granting the capabilities to perform that change. Note: the chain's "agent" is like root, granted all local state change capabilities. In addition, the provenance of the zome call itself is signed so that it cannot be forged, replayed (e.g. it has a nonce), or delayed (e.g. it has an expiry).
Capabilities: Every function call into a zome checks cryptographic capabilities for execution.
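The checks named above (authenticity, nonce, expiry, granted function) can be sketched in Python. This is a simplified stand-in, not Holochain's capability system: it uses an HMAC with a shared key in place of a public-key signature because the standard library has no Ed25519, and `CallGate`, `sign_call`, and the message layout are hypothetical.

```python
import hashlib
import hmac


def sign_call(key: bytes, fn_name: str, nonce: str, expires_at: int) -> bytes:
    """Authenticate the call parameters (HMAC stands in for a signature)."""
    message = f"{fn_name}|{nonce}|{expires_at}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).digest()


class CallGate:
    """Checks every incoming call before it may run (illustration only)."""

    def __init__(self, key: bytes, granted_fns: set) -> None:
        self._key = key
        self._granted_fns = granted_fns
        self._seen_nonces = set()

    def authorize(self, fn_name: str, nonce: str, expires_at: int,
                  tag: bytes, now: int) -> bool:
        expected = sign_call(self._key, fn_name, nonce, expires_at)
        if not hmac.compare_digest(expected, tag):
            return False  # forged or altered call
        if now >= expires_at:
            return False  # delayed past its expiry
        if nonce in self._seen_nonces:
            return False  # replayed call
        if fn_name not in self._granted_fns:
            return False  # no capability granted for this function
        self._seen_nonces.add(nonce)
        return True
```

Because the nonce and expiry are covered by the authentication tag, an attacker who captures a valid call cannot change either field without invalidating the tag.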

Subsequent Reviews

  • SCOPE 4: Deepkey (decentralised key management)
  • SCOPE 5: Networking (QUIC networking implementation and E2E encryption with proxy)
  • SCOPE 6: HoloFuel

Things we could also do

  • Serialization and wasmer crates

Functional internal APIs to audit?

  • Host / Guest interface – Fuzz testing handling this?
  • Lair / Holochain interface – Fuzz testing?
  • Zome Call Interface (Capabilities) – covered above?
  • Network Calls – Fuzz testing?