This doc outlines the high-level technical design of implementing part of the larger on-chain dark pool, specifically the orderbook and matching engine, inside a secure enclave, a hardware-based trusted execution environment (TEE). While the exact taxonomy is out of the scope of this doc, there are subtle differences between secure enclaves and the generalized notion of TEEs; the major difference is that enclaves offer application isolation through hardware guarantees.
The key feature our on-chain dark pool offers is open-order price shielding: the prices of open orders are shielded by design, so malicious entities/traders who want to manipulate the token price cannot gauge the impact that trades from their treasury would have on it.
As such, we make use of a number of primitives and techniques to achieve this:
Orders consist of two portions, transparent and shielded, with the price of the order comprising part of the shielded portion. Whenever the shielded portion is used for public, on-chain commitments, it is committed to in hashed form, so as not to expose its underlying value while maintaining cryptographic integrity.
All circuits that deal with raw orders take them in as private inputs.
The public output of the circuit does not expose the price, since it is a free variable both in the case of asks and of bids (see here for more details).
The raw client order is exposed only inside the secure enclave, where it is used for matching against other open orders from the orderbook. In essence, all the open orders in the orderbook are exposed only in the enclave, keeping the price of an order visible to just two entities: the user who placed the order and the enclave.
The enclave is hence a crucial part of keeping prices shielded.
Our enclave of choice is AWS Nitro. Nitro enclaves run on dedicated CPU and memory that is not shared with untrusted code, which mitigates the threat of side-channel attacks, one of the major security issues in Intel SGX and other commodity enclaves such as ARM TrustZone and AMD SEV that are based on shared CPU resources. This is not without tradeoffs, however, as is any system that offers significant advantages over others: unlike SGX et al., Nitro enclaves do not support memory encryption, which makes recovery in the case of a crash fault (i.e., data availability) a non-trivial problem to solve.
Before getting into the details of the matching engine itself, here is a quick recap on how the client-enclave dynamics for placing, matching and filling an order work. For details on what exactly the terms 'commitment', 'backing ZKP' and 'enclave signature' imply, please refer to the "Client functions for interacting with the dark pool" section of this doc.
As is evident from the above, the matching engine has on-chain dependencies: it matches an order only once the commitment for that order has been published on-chain. Again, since we want to minimize trust assumptions as much as possible, only matching is done by the enclave; the actual filling of the order is done client-side. For this reason, the enclave returns to the client up to 3x the client's order volume in crossing orders. The client side picks the best orders out of these to fill the order and provides them as private inputs to the circuit. The exact client-side process is out of the scope of this doc; we now describe how the matching engine works on the enclave's end. All code described is in Rust.
Recall that an order consists of a transparent and a shielded structure. We define it as:
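For illustration, here is a minimal sketch of what this split could look like in Rust. The struct and field names, and the use of SHA-256 for the commitment (the real protocol may well use a ZK-friendly hash such as Poseidon), are assumptions rather than the protocol's actual definitions:

```rust
// Illustrative sketch only; names and types are assumptions.
use sha2::{Digest, Sha256};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Side {
    Bid,
    Ask,
}

/// Publicly visible portion of an order.
#[derive(Clone, Debug)]
pub struct TransparentOrder {
    pub token_pair: (String, String), // e.g. ("TOKEN-A", "USDC")
    pub side: Side,
    pub volume: u64, // order size in base units
}

/// Shielded portion; `phi` (the limit price) never appears on-chain in the clear.
#[derive(Clone, Debug)]
pub struct ShieldedOrder {
    pub phi: u64,        // limit price
    pub nonce: [u8; 32], // randomness so the commitment does not leak the price
}

#[derive(Clone, Debug)]
pub struct Order {
    pub transparent: TransparentOrder,
    pub shielded: ShieldedOrder,
}

impl Order {
    /// Hash-based commitment to the shielded portion. SHA-256 is a stand-in
    /// here for whatever commitment hash the protocol actually uses.
    pub fn commitment(&self) -> [u8; 32] {
        let mut hasher = Sha256::new();
        hasher.update(self.shielded.phi.to_le_bytes());
        hasher.update(self.shielded.nonce);
        hasher.finalize().into()
    }
}
```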
Now, since our actual matching engine depends on what happens on-chain, specifically whether an order placed by a user is committed on-chain, we need a 'staging area' consisting of orders and their commitments, which the enclave then uses to verify on-chain commitments:
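A correspondingly minimal sketch of the staging area, again with illustrative names, keyed by commitment so that on-chain events can be matched against pending orders:

```rust
use std::collections::HashMap;

/// An order waiting for its commitment to appear on-chain.
/// `submitted_at` records the block at which the enclave signed the order,
/// so stale entries can be pruned (see the block-limit discussion below).
pub struct StagedOrder {
    pub order: Order,
    pub commitment: [u8; 32],
    pub submitted_at: u64,
}

/// Staging area keyed by commitment, so an on-chain OrderCommitted event
/// can be matched against a pending order in O(1).
pub type StagingQueue = HashMap<[u8; 32], StagedOrder>;
```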
We then define the actual matching engine. Specifically, we employ a Red-Black Tree to organize orders by their price. The structure bifurcates these trees into bid and ask sides, ensuring the best buy and sell prices are quickly accessible. The levels of the LOB are indexed by price, and each level contains a hashmap of orders keyed by their commitments.
We have carefully considered the data structures used to construct the 'orderbook' that the matching engine operates on. Since the orderbook will be multithreaded, with many concurrent users/threads accessing it at once, it is imperative to make sure that it is thread-safe. As such, we structure the book into two parts, Bid and Ask, each with its own independent RwLock. RwLocks are especially useful since they allow multiple readers but only a single writer, and are made shareable across threads using Arc. Each of the bid and ask sides is a red-black tree (RBTree), and hence writing to either side involves "locking" the whole tree. This means that once a lock is acquired, we can operate/write on the tree as if it were single-threaded. We define Node as the node of the tree, with the LimitNodePtr type used for the parent, left and right fields of the node. LimitNodePtr is of type Option<LimitNode>, and LimitNode is itself of type Rc<RefCell<Node>>. Using Option<Rc<RefCell<Node>>> for the parent, left, and right fields of a Node facilitates shared ownership and interior mutability within a single-threaded context. The Rc wrapper allows multiple references to a node, ensuring its longevity as long as it is required within the RBTree, while RefCell enables mutable access to the node, allowing the modifications needed during operations such as insertions or deletions.
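A sketch of these types is given below, reusing the Order type sketched earlier; fields beyond those named above (the node colour, the per-level order map) are assumptions for illustration. (Note that Rc is not Send, so in a real multithreaded build the node links would need a Send-able representation; the types here simply mirror the description above.)

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
use std::sync::{Arc, RwLock};

pub type LimitNode = Rc<RefCell<Node>>;
pub type LimitNodePtr = Option<LimitNode>;

#[derive(Clone, Copy, PartialEq, Eq)]
pub enum Color {
    Red,
    Black,
}

/// One price level of the book: a red-black tree node keyed by price,
/// holding the open orders at that level indexed by their commitments.
pub struct Node {
    pub price: u64,
    pub orders: HashMap<[u8; 32], Order>,
    pub color: Color,
    pub parent: LimitNodePtr,
    pub left: LimitNodePtr,
    pub right: LimitNodePtr,
}

/// The red-black tree built from `Node`s (rebalancing logic elided).
pub struct RBTree {
    pub root: LimitNodePtr,
}

/// The order book: bid and ask sides behind independent RwLocks, shared via
/// Arc, so readers can proceed concurrently while a writer locks one side.
pub struct OrderBook {
    pub bids: Arc<RwLock<RBTree>>,
    pub asks: Arc<RwLock<RBTree>>,
}
```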
The main advantage that RBTs offer in terms of efficiency is that insertion, deletion and searching within the tree are done in O(log n) time, where n is the number of nodes in the tree. We now provide a high-level overview of the data structures and functions used in the system:
- staging_queue: Holds orders temporarily before on-chain confirmation.
- order_book: The main structure, with two Red-Black Trees (RBTs) for bids and asks.
- Event handling: On an OrderCommitted event, the engine verifies whether the commitment matches any order in the staging_queue; matched orders are removed from the staging_queue and added to the order_book. On an OrderFilled event, it checks for the order commitments of the filled orders and removes those orders from the book if present, using the removal function described below. On an OrderCancelled event, it checks for the order commitment of the cancelled order and removes that order from the book if present, likewise using the removal function.
- Order insertion: Adds an incoming order to the order_book at the price level given by its phi value.
- Matching: Continually scans the order_book to find matches. Note – "continually" here means as and when a user order is received and matching is invoked on it, not like a CRON job.
- Order removal: Removes an order from the order_book using its commitment.
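For concreteness, here is a minimal sketch of how these pieces could fit together, building on the types sketched above. The event and method names are illustrative assumptions rather than the system's actual API, and the tree operations are elided:

```rust
/// On-chain events the enclave listens for (payload shapes are assumptions).
pub enum ChainEvent {
    OrderCommitted { commitment: [u8; 32], block: u64 },
    OrderFilled { commitment: [u8; 32] },
    OrderCancelled { commitment: [u8; 32] },
}

impl OrderBook {
    /// Insert an order at the price level given by its shielded phi value.
    pub fn insert(&self, _order: Order) {
        // Tree insertion and rebalancing elided for brevity.
    }

    /// Remove an order from whichever side it lives on, by commitment.
    pub fn remove_by_commitment(&self, _commitment: &[u8; 32]) {
        // Tree lookup and deletion elided for brevity.
    }
}

/// Apply a single on-chain event to the enclave's state.
pub fn handle_event(event: ChainEvent, staging: &mut StagingQueue, book: &OrderBook) {
    match event {
        ChainEvent::OrderCommitted { commitment, .. } => {
            // Promote the order from the staging queue into the book once
            // its commitment is observed on-chain.
            if let Some(staged) = staging.remove(&commitment) {
                book.insert(staged.order);
            }
        }
        ChainEvent::OrderFilled { commitment } | ChainEvent::OrderCancelled { commitment } => {
            // Filled and cancelled orders are dropped from the book.
            book.remove_by_commitment(&commitment);
        }
    }
}
```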
We analyze the scenario where a Whale holding token A (ticker: TOKEN-A) wants to dump his holdings on unknowing followers on Twitter by tweeting a bullish stance and waiting for the delayed market reaction. We analyze both scenarios: one where there is a block limit δ for an order commitment to be submitted (i.e., if the order is submitted and the enclave signature is received at block n, then the commitment must be received by block n + δ), and one where there is no such block limit (i.e., δ → ∞ in the previous case).
As evidenced by the above scenario, it becomes more difficult for The Whale to dump on followers with precision as δ (the block limit for submitting order commitments) decreases. However, without a block limit (δ → ∞), The Whale can easily gauge the delayed market reaction and take a calculated move to dump on his followers. In practice, since order commitments can very well be delayed for reasons beyond maliciousness (e.g., transactions failing), we need to set a reasonable value for δ. In most cases 10 minutes, which translates to 50 blocks with Ethereum's 12-second block time, is a reasonable limit. However, we can also explore setting custom limits based on the market cap or centralization of the token (for example, a limit derived from the token's market cap, or from its Herfindahl-Hirschman Index).
If this limit is crossed, the order is removed from the staging_queue. Likewise, we check the staging_queue for stale orders and remove them if the order commitment has not been received. This check runs at a fixed interval, say every m blocks, and the 'staleness' (the time from order submission to removal) of a removed order should be at least δ. Consider a sweep at block N: an order submitted at block b is removed at this sweep if N − b ≥ δ, i.e., if b ≤ N − δ. Since the previous sweep happened at block N − m, the stale orders removed at block N are exactly those submitted in the block range (N − m − δ, N − δ]. We can therefore set m = δ and check for stale orders every δ blocks; as the inequality shows, m can be increased without violating the minimum staleness if this approach becomes slow/inefficient. Say there are a total of S stale orders in this range, where S = Σ s_i and each s_i is the number of stale orders submitted at block i. The average staleness is then (Σ s_i · (N − i)) / S. It is easy to see that the minimum value of this is δ, and the maximum value is δ + m − 1.
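A minimal sketch of this sweep, under the assumptions above (block limit δ, staging entries carrying the block at which the enclave signed the order; names are illustrative):

```rust
/// Prune stale orders from the staging queue at a sweep over block `n`.
/// `delta` is the block limit for an order commitment to land on-chain
/// (e.g. 50 blocks, roughly 10 minutes at Ethereum's 12-second block time).
/// Any order whose commitment has not appeared within `delta` blocks of
/// submission is dropped; running this every `m` blocks gives each removed
/// order a staleness between `delta` and `delta + m - 1`.
pub fn prune_stale_orders(staging: &mut StagingQueue, n: u64, delta: u64) {
    staging.retain(|_commitment, staged| n.saturating_sub(staged.submitted_at) < delta);
}
```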
As seen in the above design, the matching engine is heavily dependent on what happens on-chain (order commitments, filling/cancelling of orders to update the book), since the chain is the source of truth in the protocol. The matching engine can then be said to have an internal 'belief system' that is completely dependent on on-chain activity. A breaking of the chain will hence break this belief system and, consequently, the entire protocol. It therefore becomes of paramount importance to have safeguards in place to protect against the 'breaking' of the chain, which in most cases can be attributed to the chain undergoing a reorg. We explore one potential approach using light clients to handle reorgs (a sketch of the header-tracking steps follows the list below):
1. Initialize the Light Client
2. Store Recent Block Headers
3. Listen for New Headers
4. Check Parent Header
5. Determine the Depth of the Re-org
6. Request Confirmations
7. Order Commitment Verification
8. Move to Staging Area: move the affected orders back into the staging_queue.
9. Update Order Book: remove the affected orders from the order_book.
10. Notification
11. Re-processing: if orders in the staging_queue are committed on-chain again, process them as usual and add them back to the order_book.
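Below is a minimal sketch of steps 2–5 (storing recent headers, checking each new header's parent, and determining the re-org depth). The types and field names are assumptions for illustration, not a finished light-client implementation:

```rust
use std::collections::VecDeque;

/// Minimal view of a block header for re-org detection.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Header {
    pub number: u64,
    pub hash: [u8; 32],
    pub parent_hash: [u8; 32],
}

/// Rolling window of recently seen headers, oldest first.
pub struct HeaderStore {
    headers: VecDeque<Header>,
    capacity: usize,
}

impl HeaderStore {
    pub fn new(capacity: usize) -> Self {
        Self { headers: VecDeque::new(), capacity }
    }

    /// Ingest a new header. Returns `Some(depth)` when the header does not
    /// extend our current tip, i.e. a re-org of roughly `depth` blocks is
    /// detected (approximate if the fork point is older than the window).
    pub fn ingest(&mut self, header: Header) -> Option<u64> {
        // Copy out the current tip so the store can be mutated below.
        let tip = self.headers.back().cloned();

        let depth = match tip {
            // Nothing stored yet: treat this header as the starting point.
            None => None,
            // The header extends our tip: no re-org.
            Some(ref t) if t.hash == header.parent_hash => None,
            // The header forks off an older block: drop the orphaned headers
            // and report how many blocks were re-orged out.
            Some(ref t) => {
                let tip_number = t.number;
                while self
                    .headers
                    .back()
                    .map_or(false, |h| h.hash != header.parent_hash)
                {
                    self.headers.pop_back();
                }
                let fork_point = self
                    .headers
                    .back()
                    .map(|h| h.number)
                    .unwrap_or_else(|| header.number.saturating_sub(1));
                Some(tip_number.saturating_sub(fork_point))
            }
        };

        self.headers.push_back(header);
        if self.headers.len() > self.capacity {
            self.headers.pop_front();
        }
        depth
    }
}
```

A detected depth would then drive steps 6 onwards: request additional confirmations and move any order whose commitment sat in a re-orged block back into the staging_queue.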
Reorg resistance is one of the most critical aspects of the protocol, and as such requires extensive testing before being deployed to production. The approach to take for testing is running a local Ethereum network (through tools like Anvil or Hardhat) and simulating a reorg. However, the tooling for this has unfortunately not been built yet. There are two issues for this functionality (this one in Hardhat and this one in Anvil/Foundry); however, they didn't receive attention and were hence closed.
We would need to build this tooling ourselves in either framework in order to test our approach to handling reorgs.
Say we want to start off with this simple approach (as described initially by Krasimir):
Head block is 1; calling the method would create block 2 with block 1 as an uncle, including all transactions from block 1.
Matthias outlines the steps to do this in Foundry:
We need:
And we can build arbitrary depth reorgs once we have this basic primitive in place.
As is evident, the enclave is a critical component of our on-chain dark pool protocol: it is the only trusted entity that handles "raw" orders (trusted in the sense that it is not client-side), so it is paramount to ensure the resilience and safety of this sensitive order data against failures.
The above diagram (courtesy this paper) succinctly encapsulates the four broad types of failures we can expect to see within our system:
Crash failures: When the internal state of the system is lost. In our case, this is when an enclave terminates when we do not expect it/want it to.
Omission failures: These usually happen when the system fails to receive or send messages. In essence, this is tantamount to a timing failure with an "infinite" delay.
Timing failures: These happen when the system takes longer than expected to respond (say the limit to receive a response is t, and the system takes some time t′ > t to respond).
Byzantine failures: This is the broadest class of failures, and can occur due to any arbitrary reason, including the system node being actively malicious.
Handling all of these requires a dependable distributed system, and since we deal with raw, sensitive data which is serviced to and from a client, we need a generalized mechanism for efficiently storing and retrieving this data in case of a failure within the system. To demonstrate why this is important, we consider the scenario of a crash failure:
Say there are initially 4 enclaves, Enclaves A through D, servicing clients, and each enclave is servicing the maximum possible number of clients it can. Now, enclave A experiences a crash failure. To handle this, a new enclave, E, is immediately spun up. However, since A has crashed, E cannot communicate with A and hence cannot retrieve the data A was operating on before it crashed.
It is now clear that in such a scenario a reliable data availability mechanism is needed, one through which E can effectively and efficiently retrieve the data A had been operating on before crashing and then start servicing A's erstwhile clients.
We explore the Semi-AVID-PR scheme (paper, video) for data availability in our system. Say we have n storage nodes (just to be clear, these are dedicated storage nodes and not enclaves) to which we disperse and from which we retrieve our data. On a high level, the protocol provides primitives for dispersing the data to the nodes and for later retrieving and verifying it.
The protocol also ensures privacy by enabling the client to add a blinding factor when dispersing the data to the nodes, so that a node learns nothing about the dispersed information. A detailed construction and proof can be found in the paper.
The protocol is also fairly cheap and Byzantine fault tolerant (up to t of the n total storage nodes may be Byzantine, as stated in the paper): distributing a file of 22 MB among 256 storage nodes, up to 85 of which may be adversarial, requires a total of ≈ 70 MB of communication and storage, and ≈ 41 s of single-thread runtime (< 3 s on 16 threads) on an AMD Opteron 6378 processor when using the BLS12-381 curve.
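To make the role of this scheme in crash recovery concrete, here is a hypothetical Rust interface a replacement enclave might program against. It is not the paper's API or any existing library, just a sketch of the dispersal/retrieval surface:

```rust
/// One erasure-coded chunk held by a storage node, with the commitment
/// material a retriever uses to verify it. (Hypothetical types; not from
/// the Semi-AVID-PR paper or any particular library.)
pub struct Chunk {
    pub node_index: u32,
    pub data: Vec<u8>,
    pub commitment: Vec<u8>,
}

/// Dispersal/retrieval primitives a replacement enclave would rely on after
/// a crash failure of its predecessor.
pub trait DataAvailability {
    type Error;

    /// Erasure-code `payload` (optionally blinded so nodes learn nothing
    /// about it) and send one verifiable chunk to each storage node.
    fn disperse(&self, payload: &[u8]) -> Result<(), Self::Error>;

    /// Collect chunks from enough honest storage nodes, verify them against
    /// their commitments, and decode the original payload.
    fn retrieve(&self) -> Result<Vec<u8>, Self::Error>;
}
```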
We use websockets as the preferred communication protocol between the clients and our system. The main reason for this is that websockets allow for a persistent, bi-directional connection between client and server, which is exactly what our protocol demands. However, using websockets raises the issue of scaling: long-running client connections limit the server's ability to scale up or down. We hence need a solution that allows for auto-scaling even in the worst-case scenario (long-running clients).
We containerize our application and deploy it using Amazon EKS, with EC2 instances running Nitro enclaves as worker nodes. The naive approach would have been to write the application to natively handle websocket connections. Instead, we offload this responsibility from EKS to another service, Amazon API Gateway. This provides multiple benefits:
Below is the high-level overview of how this would work (pic credits: AWS Blog):
Here, the websocket logic and connection handling are abstracted away from EKS to the API Gateway. Clients connect to the gateway via a websocket API. The gateway in turn connects to the EKS services using REST APIs (specifically through the enclaves' vsock-proxy, which allows external traffic to be processed securely without passing through the larger EC2 instance), which ensures that the pods hold no open connections that would interfere with their termination.
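As an illustration, here is a minimal sketch of such an in-enclave REST endpoint, assuming axum 0.7; the route, payload shape, and port are made up for the example, and real traffic would arrive via vsock-proxy rather than a plain TCP listener:

```rust
// Sketch only: assumes the `axum`, `serde`, and `tokio` crates.
use axum::{routing::post, Json, Router};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct PlaceOrderRequest {
    /// Hex-encoded order payload forwarded by API Gateway.
    order: String,
}

#[derive(Serialize)]
struct PlaceOrderResponse {
    /// Hex-encoded enclave signature over the order.
    enclave_signature: String,
}

async fn place_order(Json(req): Json<PlaceOrderRequest>) -> Json<PlaceOrderResponse> {
    // In the real system this would decode the order, run it through the
    // staging/matching logic described above, and sign it inside the enclave.
    let _ = req.order;
    Json(PlaceOrderResponse {
        enclave_signature: "0x...".to_string(),
    })
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let app = Router::new().route("/orders", post(place_order));
    // API Gateway invokes this REST endpoint; no long-lived websocket
    // connections are held by the pod itself.
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await?;
    axum::serve(listener, app).await?;
    Ok(())
}
```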
Throttling:
Since we use API Gateway to manage client websocket connections, we can implement rate limiting by setting custom throttling limits.