After writing up this spec, I would also like to question/justify the idea as to whether we need this. My concerns are around file access for things like state management and if there are any cases where a user may, due to ignorance allow one to execute arbitrary code in a non-sandboxed way
Introduction
Noir is a domain specific language for writing circuits. Non-deterministic behaviour is useful as they allow you to prove statements in a more efficient way. For example, when doing an inverse, one can either deterministically use a inversion algorithm, or non-deterministically supply the inverse and verify that it is the inverse, since we know that the inverse of a number multiplied by that number equals 1, except 0. The same applies for other operations like square root.
Another form of non-determinism is state fetching.
In general we characterise non-determinism as a method to fetch some value without needing that value to be constrained to any other value in the circuit.
This is ofcourse dangerous because I (Prover) could fetch a random value and if I do not apply the correct constraints to said value to ensure that it indeed the honest value, then this value would be known as unconstrained or mis-constrained.
Noir
In Noir, we would like to introduce this concept of non-determinism by allowing users to manipulate values and or fetch random state values. This document will specify a strategy for solving this problem.
We will split this document into two sections:
Section 1: How we will modify ACIR/ACVM
Section 2: How we will modify Noir
Directives in ACVM allow us to do very specific defined non-deterministic behaviour.
Lets look at the inversion directive as an example.
Directive::Inverse{input_index : WitnessIndex, output_index : WitnessIndex}
Since ACVm holds a map from witness indices to field elements. Processing this directive would involve
input_index
in the map and getting the concrete value.output_index
key.We have a few other directives which ACVM apriori knows how to deal with, we call these known directives. Another example of a known directive, is our bit decomposition directive.
This document does not propose to change these, instead we will add a new directive type.
These are directives which generalise the operation being done. We could remove known directives and only use general directives however, this way allows you to quickly figure out if an ACIR uses only known directives.
The proposed ACVM format for a general directive is:
Directive::general(name: String, input_indices : Vec<WitnessIndex>, output_indices : Vec<WitnessIndex>)
It is easy to see how these generalise known directives as one could replace the inversion directive by passing name = "inversion" and then one element each to the input and output indices.
In the case of known directives, ACVM knew how to handle it. However, for general directives, the implementation of them is not known. So if I pass the name compute_foo
as a name to the general directive, it is not clear where one should get the implementation details of compute_foo
.
For this we suggest adding a new map to ACVM, namely one that maps strings to the location of the implementation of the directive.
ie. Map<String, LocationOnDisk>
Hence, we map function names to the location of their source code on disk. This map can be called the general directive binary map.
There is a somewhat open question here as to what that source code is written in.
We have initially chosen C code due to C having a stable ABI. After conversations with Mike, WASM is also a viable option. WASM is a good option because we can sandbox code and simply disallow operations like file access.
We expect the binary to expose one method, namely:
fn call_functions(name : String, inputs : Vec<Field>, outputs : Vec<Field>)
Notice the similarities with the definition of the general directives. Now, this call_functions
method may not look exactly like this in C code, due to us needing to pass a pointer and the length of the vector, but it is close enough and that is an implementation detail.
Note that we have multiple call_functions methods. For each General::Directive we encounter, we need to go into the general directives map and check for the location of the corresponding binary.
General directives are expressive, however they come with friction since their implementations need to be in some external file in C/WASM.
We therefore propose another strategy to inline code into blocks. A new directive is needed:
Directive::code(inputs : Vec<WitnessIndex>, outputs : Vec<WitnessIndex>, code_block: String)
The question is then, what is the contents of code_block. Theoretically, we can make this any language and have acvm interpret it. We need to devise a strategy to copy the inputs into the parts of the code that needs it. The naive approach is as follows:
_0
would reference the first input.We have now defined two ways that allow ACVM to execute code blocks which do not apply constraints.
C Code/WASM
In one strategy, the code block is an external C file, which is extremely dangerous because one could download code that does arbitrary code execution. Using a WASM file mitigates most of this since it is sandboxed. I would personally lobby for only using wasm or sandboxed environments, however this means that one cannot easily do database fetches.
Lets investigate the last claim.
We should also investigate in what context will one need these functions.
Inline Code blocks
The second strategy is not as powerful, however it is also less dangerous, because one can see the code that is being ran inside of the Noir function; there is no external dependency. Also depending on how we do it, we may also have a lot of control over what is allowed in those blocks.
Now we specify what changes are needed on the Noir level in order to allow developers to use the newly available ACVM directive types.
In order to access known directives, we use a builtin_directive
attribute:
#[builtin_directive("inverse")]
fn inverse(input: Field) -> Field;
How it is compiled
The above is used when we want to use a directive that ACVM knows about. In the compiler, noir will simply create a Directive::Inverse
directive. This is similar to what is already happening in Noir. The difference being that, previously developers were not able to access these builtin directives.
An external library will declare a set of non-deterministic functions in a noir header file, alongside a C library which will specify how acvm should handle these non-deterministic functions in Noir.
These external libraries will be what we pass to acvm when it tries to find the implementation for a general directive.
We enforce the folder structure to be the following:
As mentioned, the header.nr file will hold extern functions, which are declared using an extern keyword.
An example of this is:
extern fn foo(a : Field, b : [Field;10]) -> Field;
The extern keyword is used to denote that this functions implementation is external to the noir language and that it is code that does not create constraints.
// Nargo.toml
my_extern_lib = {git = "github.com/foo/extern", extern = true}
First we add this external library as a dependency in our nargo.toml, with a boolean flag to state that it is indeed an extern library. We will see that this flag will then allow us to collect metadata that we will need later on.
In ones source file, you would write the following:
// main.nr
use my_extern_lib;
fn main(a : Field) {
let c = unsafe {my_extern_lib::foo(a)};
}
Note the usage of the unsafe keyword, since calling this method does not apply any constraints, we want this to be a deliberate choice by the user.
How it is compiled
Directive::general{
name: "my_extern_lib::foo"
inputs : vec![a, b[0], b[1], b[2], .., b[9]]
outputs : vec![c]
}
In the above code, the directive name is standardised to be the package name concatenated with the method name. We will only allow a single header extern file, so its not possible to have {package_name}::{module_name}::{function_name}
.
Linking external libraries to locations on disk
We now focus on the fact that we need to supply a map that links the names in general directives to locations on Disk.
The idea is as follows, in our nargo.toml, we have specified whether a library is either extern or not. Since nargo also knows where the packages are stored, we can create a map from the package name to the location on Disk.
The overall communication between nargo and acvm would look like the following:
nargo_meta = Map<String, LocationOnDisk>
location = nargo_meta[{package_name}]
call_library(field_type, lib_path, inputs, len_input,&mut outputs, len_output);
The rough draft that we used to play around with ideas is stored here for prosperity: https://hackmd.io/uaMDShRYRtCVQeOwOY-syg?both