---
CIP: ?
Title: KES Agent
Status: Proposed
Category: Tools
Authors:
- Tobias Dammers <tobias@well-typed.com>
Implementors: N/A
Discussions:
- https://github.com/input-output-hk/kes-agent/issues
Created: 2023-06-29
License: CC-BY-4.0
---
## Abstract
Implement a KES Agent service to provide complete forward security for
block-forging signatures.
## Motivation: why is this CIP necessary?
As outlined in CPS-???? (KES Forward Security), Key Evolving Signatures in
Cardano dictate two seemingly conflicting requirements:
1. KES Sign Keys must never be written to persistent storage, in order to
guarantee secure deletion ("forgetting")
2. A Node process must be able to restart without losing its current KES sign
key.
The proposed solution is to run an "agent" process, which keeps a copy of the
current key in secure memory, and sends it to a node process on demand. When
the node restarts, the agent keeps running, and the node process can re-fetch
the current key from the agent. As long as IPC is done such that no swappable
RAM and no persistent storage is involved at any point, this will prevent
the accidental leaking of KES sign keys to persistent storage, while allowing
Node processes to restart autonomously without losing their KES sign keys.
## Specification
### Architectural Overview
The KES Agent system will consist of 3 main parts:
1. The **KES Agent**, a program responsible for storing KES keys in RAM, evolving
them locally as the next KES period is reached, pushing them to any
connected service clients (nodes), and receiving new keys from any connected
control clients.
2. The **Control Client**, a command-line program that can be used to push KES
keys to a running KES Agent process.
3. A **Node**, using the **Service Client API** to connect to a KES Agent and
receive KES keys from it.
### Protocol
The 3 parts of a KES Agent system will use the KES Agent Protocol to
communicate between them. A detailed specification of this protocol shall be
published as part of the implementation effort.
The core operations of the protocol will be:
1. **Connect**
2. **Handshake** - this will verify the version of the KES Agent Protocol used
by both sides, to assure compatibility.
3. **SendKey** - send a key bundle (see below).
4. **Disconnect** - implicitly end the connection.
A key bundle consists of:
- A raw **KES SignKey**
- A **KES Period number**, indicating the KES Period for which this key is
valid
- A **serial number** for the KES series, used to make sure that newer keys are
always preferred, even if keys are sent out-of-order
- An operational certificate (**OpCert**) for the key, to verify that the key is
legitimate and may be used by the node.
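
As a rough illustration of the key bundle and the role of the serial number, the following Python sketch models a bundle as a plain record. The names `KeyBundle` and `prefer` are hypothetical, and the rule that only a strictly greater serial number replaces the current bundle is an assumption, not part of this proposal. A real implementation would hold the sign key in secured memory rather than in an ordinary byte string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyBundle:
    """Illustrative stand-in for the key bundle described above."""
    sign_key: bytes     # raw KES SignKey (placeholder bytes in this sketch)
    kes_period: int     # KES Period for which the key is valid
    serial_number: int  # serial number of the KES series
    opcert: bytes       # operational certificate, treated as opaque here


def prefer(current: Optional[KeyBundle], incoming: KeyBundle) -> KeyBundle:
    """Keep whichever bundle has the higher serial number.

    Assumption: an incoming bundle replaces the current one only if its
    serial number is strictly greater, so an older bundle that arrives
    late never displaces a newer one.
    """
    if current is None or incoming.serial_number > current.serial_number:
        return incoming
    return current
```

With this rule, receiving bundles in the order "new, old" leaves the receiver holding the same bundle as receiving them in the order "old, new".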
The same protocol is used for both the Control Client and the Service Client
(Node). The KES Agent itself acts as the receiver for Control Clients, and
as the sender for Service Clients, and the order of interactions is always the
same:
1. Connect/accept
2. Perform handshake
3. Send/receive first key
4. Wait until the next key becomes available or either peer disconnects
5. Send/receive next key
6. Repeat steps 4 and 5.
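
The order of interactions above can be sketched with a pair of connected sockets. This is only an illustration: the wire format (a 4-byte big-endian version number for the handshake, then length-prefixed payloads), the version value, and all function names are assumptions, since the actual protocol specification is still to be published. The payloads are opaque placeholder bytes, not real keys, and this sketch does not attempt to keep them out of swappable memory.

```python
import socket
import struct
import threading

PROTOCOL_VERSION = 1  # hypothetical version number


def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer disconnected."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf.extend(chunk)
    return bytes(buf)


def handshake(sock):
    """Exchange version numbers; both sides must agree."""
    sock.sendall(struct.pack(">I", PROTOCOL_VERSION))
    peer = recv_exact(sock, 4)
    if peer is None or struct.unpack(">I", peer)[0] != PROTOCOL_VERSION:
        raise ConnectionError("incompatible protocol version")


def run_sender(sock, bundles):
    """Sending side: handshake, then send each bundle as it becomes available."""
    handshake(sock)
    for payload in bundles:
        sock.sendall(struct.pack(">I", len(payload)) + payload)
    sock.close()  # disconnecting is implicit: the connection is simply closed


def run_receiver(sock):
    """Receiving side: handshake, then read bundles until the peer disconnects."""
    handshake(sock)
    received = []
    while True:
        header = recv_exact(sock, 4)
        if header is None:  # peer disconnected
            break
        (length,) = struct.unpack(">I", header)
        payload = recv_exact(sock, length)
        if payload is None:
            break
        received.append(payload)
    sock.close()
    return received


def demo():
    a, b = socket.socketpair()  # stands in for connect/accept
    sender = threading.Thread(
        target=run_sender, args=(a, [b"bundle-1", b"bundle-2"])
    )
    sender.start()
    result = run_receiver(b)
    sender.join()
    return result
```

In this sketch the sender plays the role the KES Agent takes towards a Node, or a Control Client takes towards the KES Agent; the receiver code is the same in both cases.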
### Control Client
The Control Client will consist of a CLI program which supports the following
operations:
- `keygen`: generate a fresh KES key pair and store it (in a connected Agent
instance)
- `export-verkey`: export the KES VerKey from a previously generated key pair
to a file
- `import-opcert`: import an operational certificate (OpCert) from a file
- `upload`: once a KES key pair and matching OpCert are available, activate
them, causing the Agent to push them out to connected Node processes
TBD: how intermediate data is held between these steps.
There are two possible ways this can be done:
1. Store intermediate data (key pair, opcert) in the CLI process until the
"upload" step; this requires the CLI process to keep running while the
opcert is being created on the machine holding the Cold Key, so most likely
this will require the CLI to be interactive, and users will need a separate
terminal to copy the verkey and opcert files to/from removable storage media
2. Store intermediate data (key pair, opcert) in the Agent process, keeping
them in separate slots until the "upload" step. This allows each step to be
a one-shot CLI command: the same terminal can be used to copy the verkey
and opcert files around, and the terminal session can even be closed between
steps. However, it does complicate the protocol and the Agent code a bit.
### Node
On the `cardano-node` side, a few changes need to be made, too.
Currently, `cardano-node` will read a KES0 key from a local file, which
provides no forward security in practice.
With an Agent in place, the Node will instead connect to an Agent, wait for it
to push a KES key, and then use that.
Required changes:
- Integrate the KES agent library
- Extend configuration to support fetching keys from a KES agent
### Use Case Scenarios
#### Starting Up KES Agent
On startup, a KES Agent will not hold any keys. In order to get the first key
into a KES Agent, the following steps must be performed:
1. Generate a key
2. Sign the key to create an OpCert, and create a KES Bundle from the key, the
OpCert, and metadata.
3. Upload the bundle to the KES Agent.
#### Starting Up A Node
1. When a Node starts up, and is configured to use a KES Agent, it will attempt
to connect to a KES Agent, retrying as necessary. The Node will remain in a
non-block-forging state.
2. Once a connection is established, KES Agent will send the current key.
3. Node will verify the OpCert, check the serial number, and store the key
locally in secure memory.
4. Check KES Period.
- If the key's KES Period is the current KES Period, the Node will start
using it and transition into a block-forging state immediately.
- If it is in the future, the Node will remain in a non-block-forging state
until it becomes current.
- If it is in the past, then the Node will evolve the key up to the current
KES Period ("fast-forward"). If this exhausts the key's evolutions, then
the key is discarded, and the Node remains in a non-block-forging state;
otherwise, it will start using the key and transition into a block-forging
state.
5. The Node keeps the connection open in order to receive new keys from the KES
Agent as they arrive. It also keeps evolving keys locally as needed,
transitioning into a non-block-forging state when a key's evolutions are
exhausted, and into a block-forging state when a key becomes current.
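
The KES Period check in step 4 can be summarized as a small decision function. The sketch below is illustrative only: the names `Action` and `decide` are hypothetical, and `evolutions_left` (the number of evolutions the received key can still undergo) is an assumed parameter; how the Node actually tracks this is not specified here.

```python
from enum import Enum


class Action(Enum):
    FORGE = "forge"      # key is usable: transition to block-forging
    WAIT = "wait"        # key is for a future period: stay non-forging
    DISCARD = "discard"  # fast-forwarding would exhaust the key


def decide(key_period: int, current_period: int, evolutions_left: int):
    """Return (action, number of evolutions to apply first).

    Assumption: a key may be evolved at most `evolutions_left` more
    times; needing more steps than that counts as exhausting it.
    """
    if key_period == current_period:
        return (Action.FORGE, 0)
    if key_period > current_period:
        return (Action.WAIT, 0)
    steps = current_period - key_period  # "fast-forward" distance
    if steps > evolutions_left:
        return (Action.DISCARD, 0)
    return (Action.FORGE, steps)
```

For example, a key for period 3 received during period 5 would be evolved twice and then used, provided it has at least two evolutions left.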
The KES Agent will not push out evolutions of KES keys to Nodes that have
already received an earlier evolution of the same key; keys are only sent in
two situations:
- When a Node first connects.
- When a Control Client pushes an entirely new key.
The exact same procedure will also be used when a Node restarts.
#### Uploading A New Key
While the KES Agent is running, new KES keys can be generated and installed,
using the same steps as when first starting the Agent. When a new key arrives,
the Agent will handle it as follows:
1. Store the new key locally.
2. If the new key's KES Period is in the past, evolve it to the current KES
Period.
3. Send out the new key to all connected Nodes.
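
The three steps above might look roughly as follows on the Agent side. Everything in this sketch is hypothetical: the `Agent` class, the `evolve_once` callback standing in for a single KES key evolution, and the use of plain callables in place of Node connections. Exhausted keys and serial number checks are deliberately left out.

```python
class Agent:
    """Toy model of the Agent's reaction to a newly uploaded key."""

    def __init__(self, current_period, evolve_once):
        self.current_period = current_period
        self.evolve_once = evolve_once  # hypothetical one-step evolution
        self.key = None
        self.period = None
        self.nodes = []  # callables standing in for connected Nodes

    def on_new_key(self, key, period):
        # 1. Store the new key locally.
        self.key, self.period = key, period
        # 2. If its KES Period is in the past, evolve it to the current one.
        while self.period < self.current_period:
            self.key = self.evolve_once(self.key)
            self.period += 1
        # 3. Send the (possibly evolved) key to all connected Nodes.
        for send in self.nodes:
            send(self.key, self.period)
```

A key for a future KES Period is stored and sent unchanged; per the Node start-up procedure, it is then up to each Node to wait until the key becomes current.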
#### Cold Key Protection
Cold Keys (used to sign KES keys to create OpCerts for them) must be maximally
secured, because once a Cold Key is compromised, an attacker can use it to
create new KES keys and OpCerts at will. Cold Keys should not be stored on the
same machine that runs the Node, or the Agent; ideally, they should live on an
air-gapped machine that is not connected to the internet and only used for
storing the Cold Key and making OpCerts.
At the same time, the KES key cannot be transferred to the air-gapped machine
without storing it on some kind of persistent storage medium (like a USB Flash
drive). This "wolf-goat-cabbage" problem can be solved as follows:
1. Use the Control Client to generate a KES key pair.
2. Export the *verification key* for that KES key and store it on a persistent
medium for transfer. Keep the Control Client running, holding on to the
KES SignKey.
3. Move the transfer medium to the air-gapped machine, and import the
verification key there.
4. Generate an OpCert on the air-gapped machine, and store it on the persistent
transfer medium.
5. Move the transfer medium back to the "hot" machine, and import the OpCert
into the Control Client.
6. Make a bundle (KES key, OpCert, etc.) and push it to the KES Agent.
This way, the Cold Key never leaves the air-gapped machine, the KES Key never
leaves the Control Client / Agent / Node system, and the only things that
travel via the persistent transfer medium are the KES VerKey and the OpCert,
neither of which is secret information.
### Technical Implementation
#### Toolchain
The KES Agent and KES Agent Library will be implemented in Haskell, making use
of existing Cardano libraries such as `ouroboros-network-framework` and
`cardano-base`.
#### IPC
Communication between KES Agents, Control Clients, and Service Clients will use
Unix Domain Sockets.
The advantage of this approach is that we will not need to implement
authentication in the protocol itself; instead, we use the Domain Sockets
themselves to control access to the KES Agent.
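
As an illustration of using the socket itself for access control, the sketch below binds a Unix Domain Socket whose file is only accessible to the owning user. It is written in Python for brevity and is not the planned Haskell implementation; the function name `listen_private` and the socket file name are hypothetical. It relies on the socket file's permissions being derived from the process umask at bind time, which is the behaviour on Linux.

```python
import os
import socket
import stat
import tempfile


def listen_private(path: str) -> socket.socket:
    """Bind a Unix Domain Socket that only the owning user can connect to."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old_umask = os.umask(0o177)  # socket file is created with mode 0600
    try:
        server.bind(path)
    finally:
        os.umask(old_umask)  # restore the previous umask
    server.listen(1)
    return server


def demo():
    directory = tempfile.mkdtemp()  # private directory (mode 0700)
    path = os.path.join(directory, "kes-agent.sock")  # hypothetical name
    server = listen_private(path)
    mode = stat.S_IMODE(os.stat(path).st_mode)

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    conn, _ = server.accept()
    conn.sendall(b"hello")
    data = client.recv(5)

    for s in (conn, client, server):
        s.close()
    os.unlink(path)
    os.rmdir(directory)
    return mode, data
```

Setting the umask before `bind` avoids the short window that a `chmod` after `bind` would leave, during which the socket could be reachable with broader permissions.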
If, at any point, we want to run KES Agent and Node on separate servers, then a
possible solution would be to tunnel Domain Sockets over SSH, which allows us
to secure access with strong keys, again without having to implement our own
custom authentication protocol.
One disadvantage of Unix Domain Sockets is that they have only become
available on Windows relatively recently, and our Haskell networking code does
not currently support them on Windows. However, there are currently no
block-forging nodes running on Windows, so this is not a major concern. Should
Windows support become necessary, it should be possible to use Unix Domain
Sockets at least on current versions of Windows.
#### Serialization / Deserialization
Existing serialization / deserialization solutions, such as the CBOR libraries
we use throughout the Cardano software stack, are unsuitable for this purpose,
because they use intermediate values to hold serialized data before it is
passed to a networking layer. Those intermediate values live on the regular GHC
heap, and they can be swapped out, which would violate the secure forgetting
requirement. Instead, we add two APIs: a "direct serialization" API on the data
structures used to store KES keys, which gives access to the raw secured memory
in which the key data is stored; and a "raw bearer" API on the networking
layer, which exposes functionality for transferring data directly between
(secured) memory and file descriptors. The underlying assumption is that
writing data into a file descriptor's network buffer will not cause it to reach
persistent storage via swap.
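
The general shape of such a direct transfer can be illustrated as follows: the receiver reads from the socket straight into a buffer it allocated up front, and overwrites that buffer when the key is no longer needed. This Python sketch only conveys the idea. Python offers no control over memory locking or over copies made by the runtime, so it gives none of the guarantees required here, and the names `recv_into_exact` and `wipe` are hypothetical.

```python
import socket


def recv_into_exact(sock: socket.socket, buf: bytearray) -> None:
    """Fill `buf` completely from `sock`, writing directly into it."""
    view = memoryview(buf)
    filled = 0
    while filled < len(buf):
        n = sock.recv_into(view[filled:])
        if n == 0:
            raise ConnectionError("peer disconnected mid-transfer")
        filled += n


def wipe(buf: bytearray) -> None:
    """Overwrite the buffer in place once the data is no longer needed."""
    for i in range(len(buf)):
        buf[i] = 0


def demo():
    a, b = socket.socketpair()
    placeholder = bytes(range(32))  # stands in for key material
    a.sendall(placeholder)

    buf = bytearray(32)  # pre-allocated by the receiver
    recv_into_exact(b, buf)
    matched = bytes(buf) == placeholder

    wipe(buf)
    a.close()
    b.close()
    return matched, bytes(buf)
```

In the proposed design, the pre-allocated buffer would be the mlocked memory that already backs the KES key, so no copy of the key would ever exist on the regular heap.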
## Rationale: how does this CIP achieve its goals?
1. KES sign keys are never written to persistent storage:
- Node processes will store KES keys in mlocked memory, and receive them
from a KES Agent
- Agent processes will store KES keys in mlocked memory, receive them from
Control Clients, and pass them to Nodes
- Control Clients will store KES keys in mlocked memory, and pass them to
Agent processes
- The IPC mechanism will be picked such that none of the communications
will be written to disk
2. KES sign keys will remain available through Node restarts:
- Whenever a Node process restarts, the Agent will keep running, and the
Node can just fetch the key from the Agent again.
## Path to Active
### Acceptance criteria
- [ ] All major concerns or feedback have been addressed.
### Implementation Plan
- [ ] Add support for KES with Secure Forgetting to cardano-base
- [ ] Integrate Secure Forgetting support with cardano-node
- [ ] Publish a KES Agent Protocol specification
- [ ] Implement a KES Agent suite (Service Client Library, Agent Program,
Control Client Program)
- [ ] Add support for KES Agent connectivity (via the Service Client Library)
to cardano-node.
- [ ] Document the process of setting up and using a KES Agent installation
- [ ] Trial the KES Agent setup on a set of block-forging nodes
- [ ] Promote use of KES Agent
## Copyright
This CIP is licensed under [CC-BY-4.0][].
[CC-BY-4.0]: https://creativecommons.org/licenses/by/4.0/legalcode