
---
CIP: ?
Title: KES Agent
Status: Proposed
Category: Tools
Authors:
  - Tobias Dammers <tobias@well-typed.com>
Implementors: N/A
Discussions:
  - https://github.com/input-output-hk/kes-agent/issues
Created: 2023-06-29
License: CC-BY-4.0
---

## Abstract

Implement a KES Agent service to provide complete forward security for block-forging signatures.

## Motivation: why is this CIP necessary?

As outlined in CPS-???? (KES Forward Security), Key Evolving Signatures (KES) in Cardano impose two seemingly conflicting requirements:

1. KES sign keys must never be written to persistent storage, in order to guarantee secure deletion ("forgetting").
2. A Node process must be able to restart without losing its current KES sign key.

The proposed solution is to run an "agent" process, which keeps a copy of the current key in secure memory and sends it to a Node process on demand. When the Node restarts, the agent keeps running, and the Node process can re-fetch the current key from the agent.

As long as IPC is done such that no swappable RAM and no persistent storage is involved at any point, this prevents KES sign keys from accidentally leaking to persistent storage, while still allowing Node processes to restart autonomously without losing their KES sign keys.

## Specification

### Architectural Overview

The KES Agent system will consist of three main parts:

1. The **KES Agent**, the program responsible for storing KES keys in RAM, evolving them locally as each new KES period is reached, pushing them to any connected service clients (Nodes), and receiving new keys from any connected control clients.
2. The **Control Client**, a command-line program that can be used to push KES keys to a running KES Agent process.
3. A **Node**, which uses the **Service Client API** to connect to a KES Agent and receive KES keys from it.

### Protocol

The three parts of a KES Agent system will use the KES Agent Protocol to communicate with each other.

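
As a rough mental model of how the three parts interact, the Python sketch below simulates the Agent's fan-out logic in a single process: a Control Client hands it key bundles, and it forwards the current bundle to every connected Service Client, both when a Node first connects and when a new key arrives (the two situations listed under *Starting Up A Node*). It is not the planned Haskell implementation. `KeyBundle`, `Agent`, and the method names are hypothetical; key material is plain `bytes` rather than mlocked memory; sockets, OpCert verification, and key evolution are left out; and rejecting any bundle whose serial number is not strictly higher than the current one is an assumed reading of "newer keys are always preferred".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class KeyBundle:
    """Hypothetical stand-in for a key bundle, mirroring the fields of the KES Agent key bundle."""
    sign_key: bytes   # placeholder only: real key material must live in mlocked memory
    kes_period: int   # KES period for which the key is valid
    serial: int       # serial number of the KES series
    opcert: bytes     # operational certificate, treated as opaque bytes here


Deliver = Callable[[KeyBundle], None]


class Agent:
    """In-process model of the Agent's fan-out logic (no sockets, no key evolution)."""

    def __init__(self) -> None:
        self.current: Optional[KeyBundle] = None
        self.service_clients: List[Deliver] = []

    def connect_service_client(self, deliver: Deliver) -> None:
        # A newly connected Node immediately receives the current key, if there is one.
        self.service_clients.append(deliver)
        if self.current is not None:
            deliver(self.current)

    def receive_from_control_client(self, bundle: KeyBundle) -> bool:
        # Assumption: only a strictly higher serial number replaces the current key,
        # so an older bundle that arrives late is ignored.
        if self.current is not None and bundle.serial <= self.current.serial:
            return False
        self.current = bundle
        # An entirely new key is pushed to every connected Node.
        for deliver in self.service_clients:
            deliver(bundle)
        return True


if __name__ == "__main__":
    agent = Agent()
    received: List[KeyBundle] = []
    agent.receive_from_control_client(KeyBundle(b"key-1", 10, 1, b"cert-1"))
    agent.connect_service_client(received.append)  # receives serial 1 on connect
    agent.receive_from_control_client(KeyBundle(b"key-2", 12, 2, b"cert-2"))  # pushed out
    agent.receive_from_control_client(KeyBundle(b"key-0", 8, 0, b"cert-0"))   # ignored
    print([b.serial for b in received])  # [1, 2]
```

Running it prints `[1, 2]`: the Node gets serial 1 when it connects and serial 2 when that bundle is uploaded, while the late-arriving serial 0 is dropped.
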
A detailed specification of this protocol will be published as part of the implementation effort. The core operations of the protocol will be:

1. **Connect**
2. **Handshake**: verify that both sides use the same version of the KES Agent Protocol, to ensure compatibility.
3. **SendKey**: send a key bundle (see below).
4. **Disconnect**: implicitly end the connection.

A key bundle consists of:

- A raw **KES SignKey**
- A **KES Period number**, indicating the KES Period for which this key is valid
- A **serial number** for the KES series, used to make sure that newer keys are always preferred, even if keys are sent out of order
- An operational certificate (**OpCert**) for the key, to verify that the key is legitimate and may be used by the Node

The same protocol is used for both the Control Client and the Service Client (Node). The KES Agent itself acts as the receiver for Control Clients and as the sender for Service Clients, and the order of interactions is always the same:

1. Connect/accept
2. Perform handshake
3. Send/receive first key
4. Wait until the next key becomes available, or until either peer disconnects
5. Send/receive next key
6. Repeat steps 4 and 5

### Control Client

The Control Client will consist of a CLI program which supports the following operations:

- `keygen`: generate a fresh KES key pair and store it (in a connected Agent instance)
- `export-verkey`: export the KES VerKey from a previously generated key pair to a file
- `import-opcert`: import an operational certificate (OpCert) from a file
- `upload`: once a KES key pair and matching OpCert are available, activate them, causing the Agent to push them out to connected Node processes

TBD: There are two possible ways to hold the intermediate data between these steps:

1. Store intermediate data (key pair, OpCert) in the CLI process until the `upload` step. This requires the CLI process to keep running while the OpCert is being created on the machine holding the Cold Key, so the CLI will most likely need to be interactive, and users will need a separate terminal to copy the VerKey and OpCert files to and from removable storage media.
2. Store intermediate data (key pair, OpCert) in the Agent process, keeping them in separate slots until the `upload` step. This allows each step to be a one-shot CLI command: the same terminal can be used to copy the VerKey and OpCert files around, and the terminal session can even be closed between steps. However, it complicates the protocol and the Agent code somewhat.

### Node

On the `cardano-node` side, a few changes are needed as well. Currently, `cardano-node` reads a KES0 key from a local file, which provides no forward security in practice. With an Agent in place, the Node will instead connect to an Agent, wait for it to push a KES key, and then use that key.

Required changes:

- Integrate the KES Agent library
- Extend the configuration to support fetching keys from a KES Agent

### Use Case Scenarios

#### Starting Up KES Agent

On startup, a KES Agent will not hold any keys. To get the first key into a KES Agent, the following steps must be performed:

1. Generate a key.
2. Sign the key to create an OpCert, and create a KES bundle from the key, the OpCert, and metadata.
3. Upload the bundle to the KES Agent.

#### Starting Up A Node

1. When a Node that is configured to use a KES Agent starts up, it will attempt to connect to a KES Agent, retrying as necessary. Meanwhile, the Node remains in a non-block-forging state.
2. Once a connection is established, the KES Agent will send the current key.
3. The Node will verify the OpCert, check the serial number, and store the key locally in secure memory.
4. The Node checks the key's KES Period:
   - If the key's KES Period is the current KES Period, the Node will start using it and transition into a block-forging state immediately.
   - If it is in the future, the Node will remain in a non-block-forging state until it becomes current.
   - If it is in the past, the Node will evolve the key up to the current KES Period ("fast-forward"). If this exhausts the key's evolutions, the key is discarded and the Node remains in a non-block-forging state; otherwise, the Node starts using the key and transitions into a block-forging state.
5. The Node keeps the connection open in order to receive new keys from the KES Agent as they arrive. It also keeps evolving keys locally as needed, transitioning into a non-block-forging state when a key's evolutions are exhausted, and into a block-forging state when a key becomes current.

The KES Agent will not push out evolutions of KES keys to Nodes that have already received an earlier evolution of the same key; keys are only sent in two situations:

- When a Node first connects.
- When a Control Client pushes an entirely new key.

The exact same start-up procedure will also be used when a Node restarts.

#### Uploading A New Key

While the KES Agent is running, new KES keys can be generated and installed using the same steps as when first starting the Agent. When a new key arrives, the Agent will handle it as follows:

1. Store the new key locally.
2. If the new key's KES Period is in the past, evolve it to the current KES Period.
3. Send out the new key to all connected Nodes.

#### Cold Key Protection

Cold Keys (used to sign KES keys to create OpCerts for them) must be maximally secured, because once a Cold Key is compromised, an attacker can use it to create new KES keys and OpCerts at will. Cold Keys should not be stored on the same machine as the Node or the Agent; ideally, they should live on an air-gapped machine that is not connected to the internet and is only used for storing the Cold Key and making OpCerts.

At the same time, the KES sign key cannot be taken to the air-gapped machine for signing without storing it on some kind of persistent storage medium (such as a USB flash drive). This "wolf-goat-cabbage" problem can be solved as follows:

1. Use the Control Client to generate a KES key pair.
2. Export the *verification key* for that KES key and store it on a persistent medium for transfer. Keep the Control Client running, holding on to the KES SignKey.
3. Move the transfer medium to the air-gapped machine, and import the verification key there.
4. Generate an OpCert on the air-gapped machine, and store it on the persistent transfer medium.
5. Move the transfer medium back to the "hot" machine, and import the OpCert into the Control Client.
6. Make a bundle (KES key, OpCert, etc.) and push it to the KES Agent.

This way, the Cold Key never leaves the air-gapped machine, the KES sign key never leaves the Control Client / Agent / Node system, and the only things that travel via the persistent transfer medium are the KES VerKey and the OpCert, neither of which is secret information.

### Technical Implementation

#### Toolchain

The KES Agent and KES Agent Library will be implemented in Haskell, making use of existing Cardano libraries such as `ouroboros-network-framework` and `cardano-base`.

#### IPC

Communication between KES Agents, Control Clients, and Service Clients will use Unix Domain Sockets. The advantage of this approach is that we will not need to implement authentication in the protocol itself; instead, access to the KES Agent is controlled through the Domain Sockets themselves (for example, via the filesystem permissions on the socket file). If we ever want to run the KES Agent and the Node on separate servers, a possible solution would be to tunnel Domain Sockets over SSH, which secures access with strong keys, again without having to implement our own custom authentication protocol.

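
The following Python sketch (Unix-only, and again not the planned Haskell implementation) illustrates letting the socket itself carry access control: an Agent-side listener binds a Unix domain socket, restricts the socket file to its owner with mode `0600` (on Linux, connecting to a Unix stream socket requires write permission on the socket file), and then exchanges a protocol version with a client as a stand-in for the Handshake operation. The socket file name, `PROTOCOL_VERSION`, and the 4-byte big-endian version encoding are hypothetical; the real wire format is left to the forthcoming protocol specification.

```python
import os
import socket
import struct
import tempfile
import threading

PROTOCOL_VERSION = 1  # hypothetical; the real version encoding is not yet specified


def send_u32(sock: socket.socket, value: int) -> None:
    sock.sendall(struct.pack(">I", value))


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # A stream socket may return fewer bytes than requested, so loop until done.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection during handshake")
        buf += chunk
    return buf


def recv_u32(sock: socket.socket) -> int:
    return struct.unpack(">I", recv_exact(sock, 4))[0]


def agent_accept_once(listener: socket.socket, result: dict) -> None:
    # Agent side: read the client's version, answer with our own, and compare.
    conn, _ = listener.accept()
    with conn:
        client_version = recv_u32(conn)
        send_u32(conn, PROTOCOL_VERSION)
        result["agent_ok"] = client_version == PROTOCOL_VERSION


def handshake_demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kes-agent.socket")  # hypothetical socket name
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(path)
        # On Linux, connect() needs write permission on the socket file, so 0600
        # restricts access to the user that owns the Agent process.
        os.chmod(path, 0o600)
        listener.listen(1)
        result: dict = {}
        agent = threading.Thread(target=agent_accept_once, args=(listener, result))
        agent.start()
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            send_u32(client, PROTOCOL_VERSION)
            client_ok = recv_u32(client) == PROTOCOL_VERSION
        agent.join()
        listener.close()
        mode = oct(os.stat(path).st_mode & 0o777)
    return client_ok and result["agent_ok"], mode


if __name__ == "__main__":
    print(handshake_demo())  # (True, '0o600')
```

Calling `handshake_demo()` returns `(True, '0o600')`. Some platforms ignore permissions on socket files, so a deployment might instead (or additionally) place the socket in a directory that only the Agent's user can access.
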
One disadvantage of Unix Domain Sockets is that they have only recently become available on Windows, and our Haskell networking code does not currently support them there. However, no block-forging nodes currently run on Windows, so this is not a major concern; if we do want to support Windows at some point, it should be possible to use Unix Domain Sockets at least on current versions of Windows.

#### Serialization / Deserialization

Existing serialization / deserialization solutions, such as the CBOR libraries used throughout the Cardano software stack, are unsuitable for this purpose, because they use intermediate values to hold serialized data before it is passed to a networking layer. Those intermediate values live on the regular GHC heap and can be swapped out, which would violate the secure forgetting requirement.

Instead, we add two APIs:

- A "direct serialization" API to the data structures used to store KES keys, which gives access to the raw secured memory in which the key data is stored.
- A "raw bearer" API to the networking layer, which exposes functionality for transferring data directly between (secured) memory and file descriptors.

The underlying assumption is that writing data into a file descriptor's network buffer will not cause it to reach persistent storage via swap.

## Rationale: how does this CIP achieve its goals?

1. KES sign keys are never written to persistent storage:
   - Node processes will store KES keys in mlocked memory, and receive them from a KES Agent.
   - Agent processes will store KES keys in mlocked memory, receive them from Control Clients, and pass them to Nodes.
   - Control Clients will store KES keys in mlocked memory, and pass them to Agent processes.
   - The IPC mechanism will be chosen such that none of the communication is written to disk.
2. KES sign keys will remain available through Node restarts:
   - Whenever a Node process restarts, the Agent keeps running, and the Node can simply fetch the key from the Agent again.

## Path to Active

### Acceptance criteria

- [ ] All major concerns or feedback have been addressed.

### Implementation Plan

- [ ] Add support for KES with Secure Forgetting to `cardano-base`
- [ ] Integrate Secure Forgetting support with `cardano-node`
- [ ] Publish a KES Agent Protocol specification
- [ ] Implement a KES Agent suite (Service Client Library, Agent Program, Control Client Program)
- [ ] Add support for KES Agent connectivity (via the Service Client Library) to `cardano-node`
- [ ] Document the process of setting up and using a KES Agent installation
- [ ] Trial the KES Agent setup on a set of block-forging nodes
- [ ] Promote use of KES Agent

## Copyright

This CIP is licensed under [CC-BY-4.0][].

[CC-BY-4.0]: https://creativecommons.org/licenses/by/4.0/legalcode
