The design of blockchain-compatible cache for Tezos

# The design of blockchain-compatible cache for Tezos ## Context To validate a block, a node needs data stored on the chain. Turning stored data from its on-disk representation to an in-memory value can be expensive: typically, a well-typed contract needs to be parsed and typechecked each time it is called ; reads and writes to a persistent storage is an order of magnitude slower than in-memory reads and writes, etc. We observed several bottlenecks related to the cost of maintaining persistent data: contract code loading, stake distributions, bigmaps updates, etc. Previous attempts to solve these issues were based on indexes or smart ways of representing data on the disk. These techniques add more complexity to the system. Another way to tackle this problem consists in storing frequently used data in an in-memory cache that *persists across blocks*. In this document, we discuss the requirements for such a cache mechanism system to be compatible with the security and safety constraints of the Tezos blockchain and we present a design that satisfies these requirements. ## Requirements ### Functional requirements The main feature of a cache is to store a set of frequently-used values in a fixed amount of memory. In the specific case of a blockchain, one difficulty lies in the existence of multiple parallel branches: the notion of frequently-used values is local to a branch, hence there is morally one cache per branch (even though a node may optimistically maintain a single cache to reduce memory consumption and update it when a reorganization occurs). **Invariant 1:** the cache at the exit of some block must match the block at the entry of its successors, however the cache between these successors can evolve into distinct caches. Another important feature of a cache is to properly identify values. In particular, if a cached value $V$ is the result of a computation $C$ that depends on the blockchain context, it must be preserved along the branches, as long as it is not updated: **Invariant 2:** the cache-representation of a value must be preserved through the "successor" relation as long as $C$ does not change, i.e., $V$ is up-to-date. Since the cache at the entry of the block determines the execution of its validation, a freshly started node should be able to reconstruct this cache in a deterministic way (and quickly): **Invariant 3:** The content of an in-memory cache is uniquely determined by its stored representation. Of course, a cache must be faithful to the storage. **Invariant 4:** The content of an in-memory cache must be consistent with the values stored in the context (i.e., the persistent representation of the chain on the disk). The Tezos blockchain has a self-updating mechanism. The cache should not persist accross the protocol updates because the values it holds are likely not to make sense in general once the protocol code has been updated. **Invariant 5:** A cache must be invalidated during protocol upgrades. Some parts of the code must not depend on the cache. For instance, in the implementation of the partial application of operations -- a checker for the well-formedness of operations -- it would not make sense to take the cache into account because these operations are checked in a checkpoint context that can be two weeks old. **Invariant 6:** A cache should be locally deactivable. ### Security requirements In addition to the functional correctness, the operations over caches should not be exploitable by an attacker. As usual, a node must be protected against undercarbonated computations. As a consequence, in the case of a cache miss, the gas consumption must take the reconstruction of the value into account. In the case of a cache hit, the gas consumption should be significantly lower but it must take the cache lookup into account. When the cache is reconstructed because the node has shutdown, rebuilding the cache cannot be carbonated because node rebooting is a maintainance-related event. The existence of non-carbonated operations increases the attack surface. Fortunately, preheating a cache because of a reboot should be a seldom event, difficult to exploit by an attacker. Reorganization, that is, the situation where a given branch has not been chosen as the new head for the chain, is however a very common event. In that case, the cost of synchronizing the in-memory cache with this new head must not be exploitable by an attacker. Again, this operation cannot be carbonated since it is a side-effect of the consensus, hence with no specific responsible individual. ### Predictability Users must be able to determine the gas consumption of their operations. The cache complicates this task. Indeed, as said earlier, if an operation requires some data for its execution, the presence of this piece of data in the cache has a significant impact on the gas consumption. As the system may ultimately use many caches (for contract code and contract storage, for bigmaps, consensus-related values, etc), predicting gas consumption will be out of reach to normal users. Indeed, because of the cache, this consumption depends on implementation details while we can only assume that the user has a knowledge of entrypoint specifications or of basic operations. Moreover, the state of the cache depends on the other operations considered by the node, which is impossible to predict in general. For a good user experience, the user interface must provide an evaluation of the gas consumption that clearly exposes the dependency between this evaluation and the age of the pieces of data assumed to be in the cache. One can even imagine an advanced simulation mode where the evaluation of gas consumption produces a distribution of possible gas consumptions depending on the number of blocks before operation injection. One important requirement for predictability is to update the in-memory cache in-between blocks. That way, caches of the next block to be baked can be observed by the client (through an RPC). Not only this reduces a bit the discrepancy between the block at operation-creation time and the block at operation-injection time but it also prevents operation reordering to have too much influence on the gas consumption. ### Performance requirements The cache must be an in-memory data structure. Even though the cache must also be reconstructible from scratch after a reboot, this property cannot be achieved through a continuous synchronization between the in-memory cache and the storage because I/O operations to persistent storage are too costly. As expressed earlier, the performance of cache synchronization after reorganization is probably important from a security point of view. In terms of RAM usage, the cache should have enough capacity to deal with the actual (and foreseen for the next protocol) activity on the chain but it must not prevent the node to run on machines with relatively small memory capacity. In terms of storage usage, the cache should not duplicate data already stored in the context. In terms of responsivity, cache preheating must not significantly increase node booting time. This responsivity is critical for bakers because an important latency would make them miss blocks, hence opportunities to get rewards. ### Expressivity It should be possible to cache any value manipulated by the protocol. In particular, there is no point in restricting the cache to values that can be serializable. Indeed, since the cache is an in-memory store, it can contain any type of value. The important constraint is to have enough persistent information to rebuild the cache efficiently. ## Proposed design The requirements exposed in the previous section raise the following problems: 1. Which components of the system should be responsible for each aspects of the cache implementation? 2. How should the cache be implemented in terms of data structures? 3. How should it be used? ### Design choice for problem D1: A modular cache mechanism in the shell Since the cache mechanism must be aware of the chain blocks and branches, the shell is naturally the component responsible for providing the cache service to economic protocols. The shell must maintain the in-memory cache consistent with the current head and with the currently active context. The economic protocol must usually provide protocol-dependent information to the shell. In the case of the cache, the protocol must provide enough information to establish the validity of a cache entry in some specific context. The economic protocol uses the cache mechanism to define new caches, to put or to update some value in a cache, and to lookup for cached values. The domain of the cache is stored in-between blocks. This domain is accessible through RPCs. ### Design choice for problem D2: A block-sensitive LRU data structure When the cache is full and a new entry must enter the cache, one must decide how to proceed to enforce the size limit. A simple but reasonable strategy consists in removing the Least Recently Used (LRU) entry. Many other strategies exist but they are more complex and maybe less predictable. We propose to start with the LRU strategy until we have factual elements (e.g., benchmarks or new requirements) to move to another strategy. As said earlier, the data structure must allow for efficient resynchronization with the current context in case of reorganization. It is critical to ensure that all nodes reconstruct the same in-memory cache. We propose the following data model: - An in-memory cache is a partial function from hashes to entries. - An entry is made of: - a function to get the cached value ; - metadata : - a size to evaluate the space consumed by this value in the cache ; - a counter level to determine if the entry is the least recent used ; - a cache nonce to determine whether an entry is valid in the current context. - Each context in the storage contains the domain of the cache and the metadata of each entry. Notice that the cached values are never serialized, only the domains are. Next, the operations over in-memory caches are: - `load ctxt mode load_value`: In a given context `ctxt`, this function makes sure that the in-memory cache of `ctxt` is correctly populated with the entries described in the stored domain of `ctxt`. There are two modes of loading: - `Immediate`: Every entry in the domain is inserted into the in-memory cache if it is not already present. Missing cached values are immediately loaded with `load_value`. - `Lazy`: Same as `Immediate` except that missing cached values are loaded on demand. This mode reduces the time taken by cache preheating. Since `load_value` is not carbonated, this mode may increase the attack surface. - `update ctxt k (Some e | None)` updates, inserts, or removes the entry for `k`. Notice that, in case of insertion in a full cache, we choose NOT TO REMOVE the LRU entry at this point. The point is to improve the predictability of gas consumption by having the same cache misses for all the operations of a given block. As mentioned earlier, this policy has also good properties in terms of security. - `clear ctxt` clears the cache of `ctxt`. - `resize ctxt limit` resizes the in-memory cache of `ctxt` to a size of `limit`. - `sync ctxt` saves the in-memory cache domain in the storage of `ctxt`. This function also removes the Least Recently Used entries in a full cache to enforce the space limit. ### Design choice for problem D3: a policy to use the cache in the protocol One important consequence of the two previous design choices is: **it is the responsability of the cache mechanism client to make sure that the cache and the context stay synchronized**. Indeed, the fact that the storage is still directly accessible to the client makes the values of the context potentially modifiable with no corresponding update in the in-memory cache. For this reason, the API will be documented and designed to properly separate two use cases: 1. A safe use-case: a cache for immutable values. 2. An unsafe use-case: a cache for mutable values. It will be emphasized that the second category of use-cases must be designed very carefully: we will highly recommend to properly encapsulate read/write accessors behind a unique interface to avoid any risk of inconsistency between the values in the storage and the values in the in-memory cache. ## Implementation space In addition to the high-level design choices described earlier, we now enumerate lower-level implementation problems. ### Problem L1: monolithic cache or multiple caches? Given that distinct modules of the protocol will cache different kinds of values, one could wonder if the cache should not be made of several caches, typically by assigning one cache to each kind of values. On one hand, this separation would minimize the interferences between different parts of the protocol: if there is only one cache, the addition of an entry by module $A$ could induce the removal of an entry by module $B$. Besides, having multiple subcaches could help allocating a precise amount of memory to specific kind of entries and probably simplify the simulation of gas consumption. Having multiple caches allows for introducing subcache-dependent strategies in the future. For instance, we could introduce a notion of "guarantee to stay in the cache" that can be paid by users. On the other hand, if a subcache is regularly full, this may create incentives to rephrase smart contracts to use other subcaches through dangerous hacks. **Decision**: Implementing multiple caches seems the right choice in spite of the disadvantages described in this section. To avoid the pitfalls mentionned above, we propose to have two main caches, one for user-level modules (contract, storage, bigmaps, etc) and another for consensus-level modules (stake distribution, votes, etc). ### Problem L2: a robust type for keys Independently from the previous problem, the type for keys must ensure that no collision is possible between keys used in different parts of the protocol. For this reason, a notion of key namespace could help. **Status**: Implemented in this [commit](https://gitlab.com/nomadic-labs/tezos/-/commit/83a5ce95f6ee76f996d436bc112111351511d8d8). ### Problem L3: how to evaluate the life expectancies of cache entries? As explained earlier, the client must enhance its simulation mechanism to predict gas consumption for some operation. Of course, a worst-case evaluation does not make sense: considering that there are no cache hits would make the cache completely useless as its purpose is to reduce gas consumption, hence fees. **Decision**: We propose to compute life expectancies of cache entries dynamically by maintaining the median A of the number of entries removed per block during the $M$ last cache synchronizations. Then, we define the life expectancy $E$ of the $K$th oldest entry as: $$ E = \left\lfloor {K \over A} \right\rfloor $$ During gas simulation, we perform the operation simulation with a cache with only entries of life expectancy $E > L$ and $L$ is a parameter corresponding to the assumed number of blocks between simulation time and injection time. If $A = 0$, the cache is currently not full. In that case, $E$ can be considered infinite. Pathological cases where the cache status alternates between full and not full at each block should be handled by the fact that we compute $A$ as the median on the last $M$ blocks. ### Problem L4: how to compute the size of cache entries? Each time the cache accepts a new entry, it must update its size. In a high-level language, determining the memory usage of some value is not easy because value representation is not exposed to the programmer. In the specific case of OCaml, sharing between value representations complicates this process even further. It is therefore unrealistic to try to determine the exact size of the cache. However, for security reason, we must ensure that the computed size an over-approximation of the actual cache size. `Obj.reachable_words` is low-level utility function that returns the heap-allocated words reachable from a given value. ### Problem L5: how to make sure that cache size does not explode? As said in the predictability section of the requirements, the system must enforce cache size limit by removing entries only at block finalization time. There is therefore a risk that the cache grows arbitrarily during block validation. Fortunately, the limit on the storage consumed by an operation should prevent the cache size to be increased by more than a constant during block validation. **Decision**: To be implemented correctly, the storage cost of inserting an entry in the cache must be checked **before** the insertion. ## Validation plan ### Unit tests #### The cache data structure behaves as an LRU cache. To reduce the complexity of testing, the in-memory cache would be initialized with size as `15` bytes and all cached values would be `4` bytes. The LRU would happen only during the `sync`, which means only `3` of the most resently used keys will be kept afterwards. Notice that, bewteen `sync`s, the size of cache can grow beyond its size limitation. Given that a cache is illustrated as `[a, .. z]` where a letter represents a key and the leftmost key is the oldest, namely the least recently used key, the LRU testing scenario is a seqence of cache operations described as follows: | operation | cache (keys only) | | :-------- | -----: | | init | `[]` | | add 5 new value | `[a, b, c, d, e]` | | sync | `[c, d, e]` | | update `a` | `[c, d, e, a]` | | update `c` | `[d, e, a, c]` | | sync | `[e, a, c]` | ### Integration tests - During a protocol update, the cache is cleared. - During a reorganization, the cache is the same on every node. ### Replay Replay the chain, validate that the cache size and the cache strategy indeed reduces gas consumption. Perform several adjustments of the parameters based on this benchmark, the gas reduction being the measure to optimize. ### Robustness tests Try to find attacks based on the worst case complexity of non carbonated operations. ## Implementation plan We propose: 1. to develop a first MR that implements the caching mechanism at the level of the shell as well as an illustration of the API usage in the protocol. https://gitlab.com/tezos/tezos/-/merge_requests/3234 3. to develop an MR for each kind of caches we want to add in the protocol. ## Project management The branch for the first MR is [nomadic-labs:klakplok@cache_cache](https://gitlab.com/nomadic-labs/tezos/-/tree/klakplok@cache_cache). Issues to solve to reach this milestone are listed here: https://gitlab.com/tezos/tezos/-/milestones/30 The project board is here: ## Resources - [Some thoughts on the cache and the context](https://hackmd.io/6JGQStvfSheTbf3q4mfzqw) - [Branch for the MR](https://gitlab.com/nomadic-labs/tezos/-/tree/klakplok@cache_cache)