Notes:

- Language of functions may change with placement type? I.e. TFF and JAX placements.
- Application of a function may turn into a computation/protocol call (say for an MPC placement).
- Programs have logical typed placements (roles?); these are replaced by physical placements at runtime (i.e. values of these types).
- Remember that we want to support high-performance EL (e.g. dataflow operations that perform add and mul on tensors).
- `LogicalGraph` and `PhysicalGraph`, both with `FunctionGraph`s?
- To invoke a (sub-)protocol it is not enough to simply send a value to an endpoint on a host: the host must agree to invoke the session as well; this is how we remain in control of how data is used. In the future we may want to build this as a streaming layer on top, but not as a basic compute unit. Separate involvement of hosts means separating the scheduling of kernels from the receiving of values, e.g. having the host *enter* the computation.
- Computations are what we want; functions are just logical representations that are compiled by placements into computations (`ApplyFunction` operations). Functions may not contain placements, and in general remain oblivious to the fact that we are in a distributed setting.
```rust
let runtime = Runtime::new();

let inputter0 = HostPlacement::new(runtime, "10.0.0.1");
let inputter1 = HostPlacement::new(runtime, "10.0.0.2");
let aggregator = HostPlacement::new(runtime, "10.0.0.3");
let outputter = HostPlacement::new(runtime, "10.0.0.4");

let main_comp = MainComputation::new(
    runtime,
    inputter0,
    inputter1,
    aggregator,
    outputter,
);
main_comp.apply();
```

```rust
struct MainComputation {
    runtime: Runtime,
    inputter0: HostPlacement,
    inputter1: HostPlacement,
    paillier_aggregator: PaillierPlacement,
    outputter: HostPlacement,
}

impl MainComputation {
    fn new(
        runtime: Runtime,
        inputter0: HostPlacement,
        inputter1: HostPlacement,
        aggregator: HostPlacement,
        outputter: HostPlacement,
    ) -> Self {
        // we want to use a fresh set of Paillier keys each time
        // this computation is instantiated (with potentially
        // different placements); this corresponds to using a
        // new ideal instance in the UC framework (as opposed
        // to joint state ideal functionalities)
        //
        // note that we could have accepted this as input instead;
        // however, this way we can consider the use of Paillier
        // an implementation detail that is not leaked; in larger
        // systems this will be important since sub-computations
        // may in general be implemented using several ideal
        // functionalities that we should not know about at this
        // level, including how they should be created in order to
        // satisfy any requirements the computation may have on
        // them; in short, the only thing we need to know about
        // here is that main is a computation among four placements
        //
        // note that we are actually leaking some implementation
        // details in that a separate aggregator host placement
        // is needed.

        // TODO
        let paillier_aggregator = PaillierPlacement::new(
            runtime,
            aggregator,
            outputter,
        );

        // we will be using the sum sub-computation, and we
        // create it here to reuse the instance across runs;
        //
        // note that this computation should somehow have been
        // created by the Paillier placement
        let sum_comp = SumComputation::new(
            runtime,
            paillier_aggregator,
        );

        MainComputation {
            runtime,
            inputter0,
            inputter1,
            paillier_aggregator,
            outputter,
        }
    }

    fn apply(&self) {
        let inputter0 = self.inputter0;
        let inputter1 = self.inputter1;
        let paillier_aggregator = self.paillier_aggregator;

        // TODO reformulate as graph; invocation means supplying
        // values for all symbols used in the graph, including
        // graph symbols used for recursion? somehow the symbols
        // must have a reference to e.g. the keys and placements used,
        // so maybe what we need is that when a symbol is created
        // in the runtime, this stores all mappings needed,
        // so that these are available if called recursively
        // from inside the graph

        let x0 = inputter0.apply(self.runtime, load_data);
        let x1 = inputter1.apply(self.runtime, load_data);

        // can we do recursion the following way?
        let y = paillier_aggregator.apply(self.runtime, sum, x0, x1);
        let _ = self.outputter.apply(self.runtime, save, y);
    }
}
```

```rust
struct PaillierPlacement {
    key_owner: HostPlacement,
    computer: HostPlacement,
    ek: EncryptionKey,
    dk: DecryptionKey,
}

impl PaillierPlacement {
    fn new(
        runtime: Runtime,
        key_owner: HostPlacement,
        computer: HostPlacement,
    ) -> Self {
        let keypair_comp = PaillierKeypairComputation::new(runtime, key_owner);
        let (ek, dk) = keypair_comp.apply();
        PaillierPlacement { key_owner, computer, ek, dk }
    }

    fn compile(&self, func, args) {
        // assume that func is "sum" with signature
        // "(Tensor@inputter0, Tensor@inputter1) -> Tensor@outputter"
        // and body "fn(x0, x1) = x0 + x1"; note that we have the
        // extra hint of the output's placement
        let (x0, x1) = args;
        let inputter0 = x0.placement;
        let inputter1 = x1.placement;
        let outputter = ???;
        PaillierSumComputation::new(
            self.runtime,
            self,
            [inputter0, inputter1],
            [outputter],
        )
    }
}
```

```rust
// computation responsible for generating a keypair
// using an external lib on the designated key owner;
//
// note that this computation can be fixed and internal
// to the PaillierPlacement
struct PaillierKeypairComputation {
    runtime: Runtime,
    key_owner: HostPlacement,
}

impl PaillierKeypairComputation {
    fn new(runtime: Runtime, key_owner: HostPlacement) -> Self {
        PaillierKeypairComputation { runtime, key_owner }
    }

    fn apply(&self) -> (EncryptionKey, DecryptionKey) {
        let key_owner = self.key_owner;
        // could execution here be driven by a graph?
        key_owner.apply(self.runtime, paillier.generate_keypair)
    }
}
```

```rust
// sum computation compiled by the Paillier placement;
//
// note that this computation is re-using previously
// generated keys through the Paillier placement, unlike
// the main computation which created and controls the
// scope of these keys
struct PaillierSumComputation {
    runtime: Runtime,
    placement: PaillierPlacement,
    inputters: [HostPlacement; 2],
    outputters: [HostPlacement; 1],
}

impl PaillierSumComputation {
    fn new(
        runtime: Runtime,
        placement: PaillierPlacement,
        inputters: [HostPlacement; 2],
        outputters: [HostPlacement; 1],
    ) -> Self {
        // here we assume that the output is delivered to the key owner
        assert_eq!(outputters[0], placement.key_owner);
        PaillierSumComputation { runtime, placement, inputters, outputters }
    }

    fn apply(&self, x0: Tensor, x1: Tensor) -> Tensor {
        assert_eq!(x0.placement, self.inputters[0]);
        assert_eq!(x1.placement, self.inputters[1]);

        let runtime = self.runtime;
        let ek = self.placement.ek;
        let dk = self.placement.dk;

        let inputter0 = self.inputters[0];
        let inputter1 = self.inputters[1];
        let computer = self.placement.computer;
        let outputter = self.outputters[0];

        // could execution here be driven by a graph?
        // this is graph execution where some placements input
        // previously generated values, i.e. this is loading a graph
        // into the runtime and invoking it with the inputs
        // relevant to the local runtime
        let c0 = inputter0.apply(runtime, paillier.encrypt, ek, x0);
        let c1 = inputter1.apply(runtime, paillier.encrypt, ek, x1);
        let c = computer.apply(runtime, paillier.add, ek, c0, c1);
        let y = outputter.apply(runtime, paillier.decrypt, dk, c);
        y
    }
}
```

# Reusable Setup in the Secure Runtime

All cryptographic protocols rely on some form of setup before they can be executed, including loading keys from a PKI or generating correlated randomness. In some cases, this setup may be reused across different sessions for efficiency reasons.
This document describes a possible design for how this may be accomplished in the secure runtime without compromising our other goals as detailed below.

## Motivating example

We use secure aggregation as a motivating example. Expressed in a fictional language, we may imagine something like the following. Note that we assume that types may carry additional data that can be used to assign a placement to every operation. For instance, the type of the plaintext values `ps` may include information that can be used to place each `encrypt` operation, the type of the encryption key `ek` may carry data about the placement of `sum`, and the type of the decryption key `dk` about the placement of `decrypt`.

```rust
fn secure_sum(ek, dk, ps) {
    let cs = map(|p| { encrypt(ek, p) }, ps);
    let c = sum(ek, cs);
    decrypt(dk, c)
}

let (ek, dk) = setup(2048);
let x = secure_sum(ek, dk, xs);
let y = secure_sum(ek, dk, ys);
```

While the above works, it also does a less-than-optimal job of containing complexity, and for instance leaks to the user that this particular way of performing secure aggregation requires correlated values in the form of a key pair. We could of course use a struct to keep these as a single value, but as applications scale this becomes more complex, and we still need to pass along *something* to every invocation. An alternative is to pass them along implicitly using a closure as seen below.

```rust
fn gen_secure_sum(key_length) {
    let (ek, dk) = setup(key_length);
    |ps| {
        let cs = map(|p| { encrypt(ek, p) }, ps);
        let c = sum(ek, cs);
        decrypt(dk, c)
    }
}

let secure_sum = gen_secure_sum(2048);
let x = secure_sum(xs);
let y = secure_sum(ys);
```

Another issue is that we may want to use the same setup for multiple computations, say both sum and mean. While the above mixes both logical and physical properties of the computation (that we are computing a sum, and that we are using homomorphic encryption, respectively), we can at least take a first step in separating them as seen below.
```rust
fn gen_secure_f(key_length) {
    let (ek, dk) = setup(key_length);
    |f| {
        |ps| {
            let cs = map(|p| { encrypt(ek, p) }, ps);
            let c = f(ek, cs);
            decrypt(dk, c)
        }
    }
}

let secure_f = gen_secure_f(2048);
let secure_sum = secure_f(|ek, cs| { sum(ek, cs) });
let secure_mean = secure_f(|ek, cs| { mean(ek, cs) });

let x = secure_sum(xs);
let y = secure_mean(ys);
```

Of course, what we really want is to pass in a logical representation of for instance `sum`, and then have it automatically compiled to its physical representation `sum(ek, cs)`. Leaving out the details of how this compilation process can be done, we obtain something like the following.

```rust
fn gen_secure_compiler(key_length) {
    let (ek, dk) = setup(key_length);
    |f| {
        let secure_f = compile(f);
        |ps| {
            let cs = map(|p| { encrypt(ek, p) }, ps);
            let c = secure_f(ek, cs);
            decrypt(dk, c)
        }
    }
}

let secure_compiler = gen_secure_compiler(2048);
let secure_sum = secure_compiler(sum);
let secure_mean = secure_compiler(mean);

let x = secure_sum(xs);
let y = secure_mean(ys);
```

And of course, we can avoid explicitly defining `secure_sum` and `secure_mean` without an impact on performance, for instance by introducing an internal cache.

```rust
let secure_compiler = gen_secure_compiler(2048);
let x = secure_compiler(sum)(xs);
let y = secure_compiler(mean)(ys);
```

At this point we can argue that "compiler" is not really the right term, and that a new concept would offer a better mental model. Concretely, we propose to introduce the notion of *instantiable virtual placements* to capture exactly what we have been creating here.

```rust
let placement = SecurePlacement::init(2048);
let x = placement(sum)(xs);
let y = placement(mean)(ys);
```

Using something like Python context managers allows us to re-express this as follows.
```rust
let placement = SecurePlacement::init(2048);
placement.with(|plc| {
    let x = plc.apply(sum, xs);
    let y = plc.apply(mean, ys);
});
```

However, while we have now abstracted away many physical implementation details, in some cases we should be able to infer placements from the types of values and obtain the following, where parts of the program are expressed in a purely logical representation.

```rust
let xs = plc.apply(load_values, ());
let ys = plc.apply(load_values, ());

let x = sum(xs);
let y = mean(ys);
```

In summary: we used a protocol to compute the initial setup values, and we encapsulated them inside a placement instance with an attached compiler. Operations applied at this placement are passed through the compiler to be turned into protocols, and the values generated during initialization may be implicitly passed along when executing these. Compiling the same logical computation to a different physical computation is simply a question of using a different placement, including for instance a plaintext placement for debugging purposes.

In the rest of this document we detail how this maps to dataflow DAGs and their execution.

## Placements

**TODO: how do we do sub-protocol invocation/application?**

### Lifetimes

The scope of values generated during initialization, including cryptographic setup, is tied to the lifetime of a placement. This makes for easy reasoning, even at higher levels.

### Generality

The concept of placements and placement initialization using protocols is not limited to virtual placements employing advanced cryptographic techniques, but also captures for instance physical hosts and groups of these.

In the former example, initialization of physical hosts is an essential part of bootstrapping a cluster, where each participant in the cluster specifies the host's endpoint and prepares material needed for validating the endpoint when (gRPC) connections are subsequently established (including TLS certificates).
Note that this means that connection failures do not require us to reinitialize the placement.

```rust
let host = HostPlacement::init(endpoint);
```

In the latter example, TensorFlow Federated's `CLIENTS` may be expressed as a placement that simply replicates all operations across a fixed group of sub-placements. While initialization may be done without interaction, this step still generates distributed state and can hence be expressed as a protocol (concretely, we may use it to assign indices to members).

```rust
let host0: HostPlacement = ...;
let host1: HostPlacement = ...;
let clients = ReplicatedPlacement::init([host0, host1]);
```

### Levels of abstraction

**We clearly need to be able to specify programs in different languages (especially Python) and reason about them, and hence need to introduce some common intermediate format that can capture the concepts introduced here; however, for communication purposes we use Rust and its concepts to illustrate these ideas. These could in fact be used as an implementation as a first approximation and, once settled, be re-expressed in a DSL.**

```rust
struct ReplicatedPlacement {
    sub_placements: Vec<Placement>,
}

impl Train<ReplicatedModel, ReplicatedTensor> for ReplicatedPlacement {
    fn train(&self, model: ReplicatedModel, batch: ReplicatedTensor) -> ReplicatedModel {
        let gradients = self.sub_placements.iter().map(|plc| plc.train(model, batch));
        // ...
    }
}
```

```rust
type FederatedPlacement = ReplicatedPlacement;

impl Train for FederatedPlacement {
    fn train(&self, model, batch) {
        // ...
    }
}
```

### Associated types

We will likely need that value types can somehow depend on the placement type. For instance, when a replicated placement produces a tensor, this may in fact be backed by a set of tensors, one from each member.
```rust
client.with(|plc| {
    let gradients: ReplicatedTensor = plc.train(model, batch);
    let gradient: Tensor = plc.mean(gradients);
    let new_model: Model = plc.update(model, gradient);
})
```

Although using a virtual placement, the above logical program still leaks some physical details, namely the fact that the training step generates more than one gradient that must be aggregated as part of updating the model. This makes the representation less suitable for reuse, for instance for local execution. One way to improve this is through abstraction, where we may simply wrap the above steps into a new `train_on_batch` protocol.

```rust
client.with(|plc| {
    let new_model: Model = plc.train(model, batch);
})
```

These tensor kinds should behave similarly and be interchangeable, in keeping with the idea that all the placement-specific parts of a logical program are stored in its types. Of course, the types should still help catch programming bugs, and using a `ReplicatedTensor` on for instance a host placement should raise a compile-time error.

### Inferring placements

```rust
let placement = SecurePlacement::init(2048);
placement.with(|plc| {
    let x = plc.sum(xs);
    let y = plc.mean(ys);
});
```

```rust
impl Sum for FederatedTensor {
    fn sum(xs) {
        // extract placement from `xs`
    }
}

impl Mean for FederatedTensor {
    fn mean(xs) {
        // extract placement from `xs`
    }
}

let x = sum(xs);
let y = mean(ys);
```

### Runtime

The runtime has built-in support for protocols and placements, meaning these are first-class objects in the API of the runtime.

### Iteration

Iteration and looping are supported through recursion by any placement that supports application.

### Universal placement

## Prerequisites

We here summarise certain design decisions upon which this proposal is based.

### Functions

Functions are purely logical expressions formally defined in a small functional language, and their uses range from capturing simple arithmetic operators to iterative learning processes.
Functions and protocols are tightly related in that the latter are typically used to implement the former: functions are placement-neutral and define *what* a computation does, whereas protocols are placement-aware and define *how* it is done. Functions are turned into protocols by running them through a placement-aware compiler. In cryptographic theory, functions are the `f` while protocols are the interactive machines `M_f` implementing those functions.

At the end of the day, we only know how to execute protocols (computations), so the goal is to eventually end up with them. However, initially, parts of the computation may be specified as applications of functions on virtual placements that are compiled down to computations using the placements' associated compilers.

`FunctionGraph` vs `ProtocolGraph`

### Protocols

Protocols are at the core of our distributed computations as the only mechanism by which parties interact, and are formally defined as stateless dataflow DAGs where nodes represent operations to be evaluated at their assigned placement, such as `map` and `call`.

**Protocols are implementations of functions (functionality `F` vs function `f`), and can be seen as the composition and coordination of local function applications.**

Execution is done in isolated sessions by tagging values with a unique session id, which implies that values from one session cannot mix with values from another. This isolation is required for both correctness and security since several sessions, including of the same protocol, may be running concurrently. The values produced at output nodes are stripped of session ids and stored at the node's associated placement under opaque handles that are safe to share.

Protocols may invoke other protocols by letting values flow into input nodes and out of output nodes of the corresponding DAGs. This includes recursive invocations, which is how we express for instance iterative processes.
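The session tagging described above can be sketched minimally as follows. All names here (`SessionId`, `TaggedValue`, `sub_session`) are hypothetical illustrations, not part of the runtime API: the point is only that operations refuse to combine values carrying different session ids, and that sub-protocol sessions derive their id from the parent's.

```rust
// a session-scoped value: the id tag prevents values from one
// session from mixing with values from another
#[derive(Debug, Clone, PartialEq)]
struct SessionId(String);

#[derive(Debug, Clone)]
struct TaggedValue {
    session: SessionId,
    payload: u64,
}

// operations only combine values carrying the same session id;
// cross-session mixing is rejected rather than silently allowed
fn add(x: &TaggedValue, y: &TaggedValue) -> Option<TaggedValue> {
    if x.session != y.session {
        return None;
    }
    Some(TaggedValue {
        session: x.session.clone(),
        payload: x.payload + y.payload,
    })
}

// sub-protocol sessions are derived by prefixing the parent's id,
// so nested invocations remain isolated from each other as well
fn sub_session(parent: &SessionId, child: &str) -> SessionId {
    SessionId(format!("{}/{}", parent.0, child))
}
```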
The session ids of sub-protocols are prefixed by their parent's session id as in [[CCL'14](https://eprint.iacr.org/2014/553)].

Expressing distributed computations as dataflow graphs has been useful for both efficient execution and reasoning, as illustrated by for instance TensorFlow Distributed, Naiad, Apache Spark, and Apache Flink.

### Layered representations

We separate logical and physical representations of distributed computations, where the former is left mostly independent of for instance cryptographic implementation details and may in fact be compiled to several different physical representations using different cryptographic techniques.

This independence is achieved by storing all information needed to compile from logical to physical representation in the types of the logical computation, which may often be automatically inferred by the compiler given relatively few type hints. A good analogy of this approach is the use of type inference and let-polymorphism in (functional) programming languages, and it has been partially inspired by TensorFlow Federated.

### Modular systems

Our goal of modularity goes beyond sub-protocols, and also includes making it easy for users and developers to reason about implementations: users should be aided in vetting out application-level bugs, including violations of intended security and privacy policies; and developers from different domains should be empowered to approach the code base with confidence. This applies at each layer, where we aim to support programmers in expressing concepts in a modular way in order to manage complexity.

### Functional language

While our underlying computational model is based on dataflow DAGs, at especially the logical level we may express computations using a functional language that allows for easy compilation to DAGs.
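The compilation from functional expressions to dataflow DAGs can be sketched for a toy expression language as follows. This is a hypothetical illustration, not the runtime's actual IR: `Expr`, `Node`, and `lower` are invented names, and the DAG is simplified to a vector of nodes referring to operands by index.

```rust
// a tiny functional expression language: inputs and addition
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Input(String),
    Add(Box<Expr>, Box<Expr>),
}

// a dataflow node; edges are encoded as operand indices
#[derive(Debug, PartialEq)]
enum Node {
    Input(String),
    Add(usize, usize),
}

// lower an expression into the graph, returning the index of the
// node that produces its value; the post-order traversal emits
// operands before operations, so the Vec is already a valid
// topological ordering of the DAG
fn lower(expr: &Expr, graph: &mut Vec<Node>) -> usize {
    match expr {
        Expr::Input(name) => {
            graph.push(Node::Input(name.clone()));
            graph.len() - 1
        }
        Expr::Add(lhs, rhs) => {
            let l = lower(lhs, graph);
            let r = lower(rhs, graph);
            graph.push(Node::Add(l, r));
            graph.len() - 1
        }
    }
}
```

In the real system each node would additionally carry its assigned placement, which is precisely the information the type-driven compilation described above is meant to supply.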
## Reuse values across sessions

The above non-interference of values raises questions about how values generated in one setup session can be used across subsequent sessions of other protocols, and how this can be reasoned about by all participants.

One line of thought suggests that we need something beyond protocols that can hold untagged state, perhaps similar to closures in functional programming. Concretely, encapsulated key setup could be expressed using closures as in the motivating example above, where the values produced during setup (`ek` and `dk`) are reused by several subsequent sessions and in principle are fully encapsulated in the closure. This would also let us automatically derive where for instance encryption happens.

The state needs to be attached to something, and ideally something that allows for encapsulation to control complexity, like the closure above. But instead of a closure, we could also attach it to a *placement*, which additionally makes it natural to associate a compilation process.

## Dump

The following program is meant to capture an iterative process as used in for instance federated learning. It offers an alternative for how control flow, including iteration and branching, may be distributed across a subset of the participants instead of leaving this task up to a single party.

One may think of the high-level language as a layer that adds control flow on top of a dataflow layer; in other words, the high-level language consists of primitives for control flow and for calling dataflow graphs expressed in a lower-level language.

The iterative behaviour is expressed via recursive calls, in this case to `train`. This differs from how iteration is dealt with in for instance TensorFlow 1.x and Naiad, where the more classical approach of cyclic graphs is taken.
However, allowing cyclic graphs seems to lead to a lot of complexity in order to obtain performance optimizations that may be less relevant given our domain; some evidence for this may be found in TensorFlow 2.x, where the loop driver has (mostly) been moved back into Python and efforts are instead focused on optimizing the loop body. On the other hand, as mentioned above, we do not wish to adopt their notion of a single loop driver (which is also found in TensorFlow Federated) in the secure runtime, as that introduces additional trust assumptions.

The pseudo-code below loosely follows Rust syntax; however, no attention has been paid to ownership and mutability since all values are assumed to be immutable.

Notes:

- Some operations, such as the loading of training data, have a group placement similar to `tff.CLIENTS`. Operations assigned to this placement will later be copied onto all group members, which allows us to use the same high-level logic independently of the actual number of for instance data owners; concretely, the program could just as easily be executed locally (with different type assignments).
- Following Rust, operations can be overloaded on an inferred type. For instance, different implementations of `load_dataset<P>` for type `P` can exist, and `P` can often be inferred automatically.
- The dataflow graphs are responsible for all data communication needed, including for instance broadcasting the model to clients; this avoids the need for communication primitives in the higher-level language and allows the language to stay true to the idea of reusing the same logic for both local and distributed execution beyond the, typically inferred, types.

```rust
// TODO a setup step might be needed before this is really interesting,
// especially when keys are captured by closure;
// we want to prevent the "driver" from being free to mix and match as desired
fn setup() {
    // TODO
}

fn main() {
    // TODO is the REPL case the only interesting one?

    // load distributed datasets and model
    let validation_data: DataSet<ModelOwner> = load_dataset(uri="datasetABC");
    let training_data: DataSet<Clients> = load_dataset(uri="dataset123");
    let model: Model<ModelOwner> = load_model(uri="old_model");

    // train model iteratively
    let new_model = train(model, validation_data, training_data);

    // save newly trained model
    save_model(uri="new_model", new_model);
}

fn load_dataset<ModelOwner>(uri: String) -> DataSet<ModelOwner> {
    // implemented as dataflow graph calling out to
    // for instance TensorFlow and its DataSet API
}

fn load_dataset<Client>(uri: String) -> DataSet<Client> {
    // implemented as dataflow graph calling out to
    // for instance TensorFlow and its DataSet API
}

fn load_dataset<Clients>(uri: String) -> DataSet<Clients> {
    // implemented by replicating `load_dataset` to all members
}

fn load_model<ModelOwner>(uri: String) -> Model<ModelOwner> {
    // implemented as dataflow graph calling out to
    // for instance TensorFlow and its Keras API
}

fn save_model<ModelOwner>(uri: String, model: Model<ModelOwner>) {
    // implemented as dataflow graph calling out to
    // for instance TensorFlow and its Keras API
}

fn train(
    model: Model<ModelOwner>,
    validation_data: DataSet<ModelOwner>,
    training_data: DataSet<Clients>,
) -> Model<ModelOwner> {
    if accuracy(model, validation_data) <= 10.0 {
        model
    } else {
        let (batch, remaining_training_data) = next_batch(training_data);
        let new_model = train_step(model, batch);
        train(new_model, validation_data, remaining_training_data)
    }
}

fn accuracy<P>(model: Model<P>, data: DataSet<P>) -> Scalar<P, f64> {
    // implemented as dataflow graph calling out to
    // for instance TensorFlow; all nodes placed at P
}

fn next_batch<P>(data: DataSet<P>) -> (Tensor<P>, DataSet<P>) {
    // implemented as dataflow graph calling out to
    // for instance TensorFlow; all nodes placed at P
}

fn train_step(
    model: Model<ModelOwner>,
    batch: Tensor<Clients>,
) -> Model<ModelOwner> {
    let gradient = grad(model, batch);
    update(model, gradient)
}

fn grad(
    model: Model<ModelOwner>,
    batch: Tensor<Clients>,
) -> Tensor<ModelOwner> {
    // implemented as a multi-party dataflow graph that
    // may call a sub-graph to perform secure aggregation
}

fn update<P>(
    model: Model<P>,
    gradient: Tensor<P>,
) -> Model<P> {
    // implemented as dataflow graph calling out to
    // for instance TensorFlow; all nodes placed at P
}
```

One of the compilation steps for the above program is to turn the `if` expression into a `switch` expression that makes it easier to capture as a dataflow graph.

Notes:

- The use of closures instead of functions may be arbitrary, but at least seems easier since `switch` can simply expect two closures of type `() -> T`; it also seems to match how closures are implemented in functional languages [[Appel, chapter 15.2](https://www.cs.princeton.edu/~appel/modern/ml/)]. However, it implies that the dataflow graphs must be executed in an *environment* where values for the captured variables can be found, as well as node types for defining and referencing/accessing these.

```rust
fn train(
    model: Model<ModelOwner>,
    validation_data: DataSet<ModelOwner>,
    training_data: DataSet<Clients>,
) -> Model<ModelOwner> {
    let cond = accuracy(model, validation_data) <= 10.0;

    let true_fn = || {
        model
    };

    let false_fn = || {
        let (batch, remaining_training_data) = next_batch(training_data);
        let new_model = train_step(model, batch);
        train(new_model, validation_data, remaining_training_data)
    };

    switch(cond, true_fn, false_fn)
}
```

As per the above comments, passing values through environments may be expressed more explicitly as in the following. This may or may not prevent type checking at lower levels.
```rust
fn true_fn(env: Environment) -> Model<ModelOwner> {
    let model = env["model"];

    // below as before
    model
}

fn false_fn(env: Environment) -> Model<ModelOwner> {
    let model = env["model"];
    let training_data = env["training_data"];
    let validation_data = env["validation_data"];

    // below as before
    let (batch, remaining_training_data) = next_batch(training_data);
    let new_model = train_step(model, batch);
    train(new_model, validation_data, remaining_training_data)
}

fn train(
    model: Model<ModelOwner>,
    validation_data: DataSet<ModelOwner>,
    training_data: DataSet<Clients>,
) -> Model<ModelOwner> {
    let cond = accuracy(model, validation_data) <= 10.0;

    let env = {
        "model": model,
        "training_data": training_data,
        "validation_data": validation_data,
    };

    switch(cond, true_fn, false_fn, env)
}
```
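The switch-with-environment idea above can be made concrete in a minimal, runnable sketch. This is only an illustration under simplifying assumptions: values are plain `i64`s rather than models and tensors, `Environment` is a hypothetical stand-in (here a `HashMap`), and the branch bodies fake the training step.

```rust
use std::collections::HashMap;

// hypothetical environment type: captured variables passed explicitly
type Environment = HashMap<String, i64>;

// `switch` evaluates exactly one of the two branch functions,
// supplying the environment in which captured values are looked up
fn switch(
    cond: bool,
    true_fn: fn(&Environment) -> i64,
    false_fn: fn(&Environment) -> i64,
    env: &Environment,
) -> i64 {
    if cond { true_fn(env) } else { false_fn(env) }
}

fn true_branch(env: &Environment) -> i64 {
    // the "model" is returned unchanged
    env["model"]
}

fn false_branch(env: &Environment) -> i64 {
    // stand-in for a training step on the captured values
    env["model"] + env["batch"]
}
```

Note that, as the document observes, executing the branches this way requires the graph runtime to support nodes for defining and referencing environment entries.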