Try   HackMD

Efficient Distributed Secure Memory with Migratable Merkle Tree

IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023

Introduction

Many hardware assisted enclaves exists, such as:

  • Intel SGX
  • AMD SEV
  • ARM CCA
  • RISC-V Penglai

Problems

  • Existing enclaves are designed to protect memory on a single node
  • Inter-node network is considerd untrusted, thus software secure channel (e.g., TLS/SSL) is required for data tranfer
  • Crypto-based approaches incur significant overhead
    • Netword connection bandwith — 400-800 Gbps
    • AES/HS3 algorithm w/ AES/GCM accelerator throughput — 2-40 Gbps

Proposed Solution: Scale current single-node memory protection engine todeal with multiple nodes

Challenges

  • Root of integrity / one-time pad is sealed in CPU and connot be trasferred by untrusted network
  • Middle-man attacks are easy, such as luanching replays, re-order attacks

Background Knowkedge

Hardware-based Memory Protection

  1. Counter-Based integrity Tree:
    A hash tree also includes a counter value when the tree is modified. It guarantees confidentiality of secure memory, and is resistant to replay attacks.
  2. One-Time-Pad Encryption:
    The code address and counter value are used to generated a OTP (one time pad) to perform XOR with some Plaintext generating a cyphertext.

Demand of Distributed Confidential Computing

  1. Security gurantee for both code and data:
    Hardware enclaves CAN protect confidentiality of code and data, but CANNOT guarantee the integrity, requiring users to verify integrity of input and check the mesurements of the executive code. Despite the better performance of hardware enclaves, security checks brings high overhead
  2. Large-Scale Data Transfer:
    Communication overhead between nodes is one of the key metrics for distributed computation.

Inefficiency of Secure Channel over Untrusted Interconnection

Method Throughput Connection
PCI-E 5.0 32 GT/s CPU-Device
UCI-E 32 GT/s Chiplets
RDMA 400 Gb/s Remote Memory
NVLINK 900 GB/s GPU
  1. Limited throughput over secure channel mechanism:
    Interconnection latencies are low (above table), but also has no protection (untrusted). Secure channel can resist tanpering during transmission but has low throughput and reduces performance by orders of mangnitude.
  2. Establish the secure channel between enclaves:
    Enclave design uses a untrusted buffer to copu I/O messages out of enclave memory, requiring itself to encrypt the message. Due two the 2 extra memory copies, it is considered time-comsuming.

Design Overview

Design Goals

  1. The system should eliminate re-encryption overhead when transferring secure memory, and saturate the maximum throughput of the interconnection.
  2. The system should achieve same level of security gurasntee as secure channel mechanisms, with concern of the following properties:
    • Confidentiality
    • Integerity
    • Freshness of transferred data
  3. The system should hide hardware details for userapplications and inherit programming paradigm of the distributed confidential computation.

Threat Model

The TCB (Trusted Computing Base) only contains the CPU and the most previleged software.

  • Physical Attacks:
    Off-Chip hardwares plugged in are possible to perform attacks such as:
    • Spying
    • Slicing
    • Replay Attacks
  • Privileged Software Attacks:
    • Despite attacks triggering software attacks from REE, the firmware running most-privileged mode CANNOT be comprised.
    • TEEOS is trusted but optional
  • Side-Channel attacks and DoS attacks are not considered.

Challenges

To send secure memory to others directly, the following problems must be solved:

How to decouple the memory protection from the CPU-bound secret ina single node?

Unique physical address is used to generate the OTP for memory encryption, but the key is sealed within the CPU.

How to securely transfer the secure memory to the remote node without re-encryption?

If the secure memory is sent, the receiver wouldn't be able to decrypt and authenticate the transferred memory. However, we must defend against attacks during transfer.

How to hide the hardware details and provide the suitable primitive for distributed conputation?

The new hardware extension cannot complicate the implementation of user applications or break the paradigm in distributed confidential computation.

Important Terms

  • Integrity Forest — Spans among multiple nodes. To join, each node must be verified via a global attestation mechanism.
  • MMT Closure — A transfer unit, integrating both data and metadata. A remote node would be able to decrypt and authenticate the transfer memory according to it.
  • MMT Closure Delegation &edash; Communication protocol which allows transferring MMT closures.
  • MMT Monitor &edash; A module in the most privileged mode, works to:
    • Origanize secure hardware
    • Establish connection between remote nodes

Detailed Design of the MMT (Migratable Merkle Tree)

Multiple-nodes Integrity Tree

Two steps are used to set up a integrity forest:

  1. A global authority node will verify each participant node joining the distributed computing. Three phases are included for a participant attest itself to the global authority node:

    1. The attested node generate a key agreement message to attestation server, discuss a session key.
    2. The attested node sends a manufacturer certificate sealed with machine key to attestaation server.
      The attestation server verify the certificate wity manufacturer's public kay and respond a CA report.
    3. The attested node sends node-related message (e.g., software mesurement, node meta-info).
      The attestation server checks the info and responds a global-unique node id to node.
  2. A integrity forest is constructed to protect the memory among nodes.







G



IF

Integrity Forest



A_1

1



IF--A_1




if11




IF--if11




if12




IF--if12




B_1

2



IF:s--B_1




if21




IF--if21




if22




IF--if22




C_1

3



IF--C_1




A_1--if11




A_2




A_1--A_2




A_3




A_1--A_3




if11--if12




if12--B_1




B_1--if21




B_2




B_1--B_2




B_3




B_1--B_3




if21--if22




if22--C_1




C_2




C_1--C_2




C_3




C_1--C_3




A_2--A_3




A_4




A_2--A_4




A_5




A_2--A_5




A_3--A_5




A_6




A_3--A_6




if31




A_3--if31




LM1

Local Memory



A_5--LM1




B_2--B_3




B_4




B_2--B_4




B_5




B_2--B_5




B_3--B_5




B_6




B_3--B_6




if32




B_3--if32




LM2

Local Memory



B_5--LM2




C_2--C_3




C_4




C_2--C_4




C_5




C_2--C_5




C_3--C_5




C_6




C_3--C_6




LM3

Local Memory



C_5--LM3




if31--B_2




if32--C_2




N1

Node 1



LM1:sw--N1:nw




LM1--N1




LM2:se--N1:ne




LM2--N1




N2

Node 2



LM3--N2




In traditional hardware-supported memory protection, and OTP can only be used once, and is generated using a counter and the memory address. However, in multi-node scenario, there will exist same addresses.

To solve the problem, the system assigns a global memory address, used only in integrity checks, comprising two parts:

  1. Global unique node id — generated from the attestation server
  2. Monotonic number — generated from hardware to map into physical memory

MMT Scheme

  • An MMT is a subtree in the integrity forest.
  • (Compared with a normal merkle tree,) The root is expand with fout kinds of metadata:
    • MMT state
      An MMT could be in one of four states &edash;
      • valid: The MMT is active and performs security check for memory access
      • invalid: MMT is unallocated / reclaimed, memory is non-secure memory
      • sending: MMT is transferring memory, read-only
      • waiting: Works with MMT in sending state, waits for remote transferred memory
    • MMT key
      A user defined key for memory encryption / authentication. Processess holding the same key can modify the same secure memory
    • MMT counter
    • MMT global-unique address
      Initialized after global attestation, re-assigned if state is changed to valid