Accumulators & Pyth
================================================================================

Pyth currently provides pricing information by aggregating price feeds on PythNet and submitting the prices to other chains over Wormhole. In our original design, Pyth would constantly push all prices to a target chain multiple times a second, but this was costly. Today we instead offer an on-demand model: prices can be requested from PythNet and sent to target contracts to be used. This remains near real-time but no longer requires pushing many potentially unused prices everywhere.

Many protocols, however, require multiple prices at once; for example, a pair of prices may be needed during a token swap. To facilitate this, the on-demand model allows requesting a batch of prices at once, which we currently limit to 5 due to various chain limitations. This batching introduces a new cost: the more prices a user needs, the more expensive the transfer becomes to verify.

To allow Pyth to fully scale, this document presents a way to commit to the current Pyth price set with constant cost using a cryptographic primitive called an accumulator. Target chains can then verify as many prices as needed with a constant (or at least sub-linear) cost.

Basic Definition
================================================================================

Accumulators are very similar to hash functions. Where a hash function takes an arbitrary number of bits and compresses the result into a fixed-size bitstring, an accumulator takes an arbitrarily sized set and compresses the result into a single element. Unlike classic hash functions, however, accumulators retain the ability to test for set membership, which makes them attractive for our use case. Using this primitive we can compress and send our entire price set cross-chain as a constant-sized value. Target contracts can then simply check for the prices they wish to use.
Accumulators from the Abstract
================================================================================

Accumulators are based on quasi-commutative functions. These are two-argument functions that commute over their second argument; in other words, any function where this equality holds:

$$ f(f(x,y_1), y_2) = f(f(x,y_2), y_1) $$

A very simple example is our good old `multiply` operation over the integers. If we define `mul(x,y) = x*y` then we can see that the above property holds:

$$ mul(mul(x,y_1),y_2) = mul(mul(x,y_2),y_1) $$

We can already build an accumulator out of this function. If we multiply a set of numbers together, we can think of the result as the accumulation of the different factors. For example, given a set of primes `{3,5,11,23}`, we can accumulate it with recursive calls to `mul`:

$$ mul(mul(mul(3, 5), 11), 23) = 3795 $$

No matter how many elements we accumulate, we always end up with a single result. We call this result a _commitment_ to the set. Note also that it doesn't matter in what order we accumulate: we could just as easily accumulate `{11,3,5,23}` with the same result. This scheme also retains the membership test of the original set, which is the core selling point of accumulators. In our model we test membership by checking for division without remainder:

```javascript
assert(3795 % 3 == 0);  // Member!
assert(3795 % 5 == 0);  // Member!
assert(3795 % 11 == 0); // Member!
assert(3795 % 23 == 0); // Member!
assert(3795 % 17 != 0); // Non-Member!
```

> To understand why we accumulate primes, imagine instead accumulating `{4, 8, 9}`. The membership test `Acc % 2` would report 2 as a member because 2 is a factor of both 4 and 8, even though 2 was never accumulated. The way to make sure we don't accidentally accumulate factors by mistake is to use numbers that are guaranteed not to share factors with each other, which primes suit perfectly.

We can take this single accumulated `3795` value and send it cross-chain, where the target contract can verify as many members as needed.
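To make this concrete, the toy `mul` accumulator fits in a few lines of Rust (function names here are illustrative, not from any existing codebase):

```rust
/// Accumulate a set of primes by repeated multiplication. Because
/// `mul` is quasi-commutative, the order of the elements does not
/// affect the resulting commitment.
fn accumulate(primes: &[u64]) -> u64 {
    primes.iter().product()
}

/// Membership test: an element is in the set if and only if it
/// divides the commitment without remainder.
fn is_member(commitment: u64, element: u64) -> bool {
    commitment % element == 0
}
```

Accumulating `{3,5,11,23}` in any order yields the same `3795` commitment, and each membership check is a single modulo operation.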
See [this paper][] for details.

[this paper]: https://link.springer.com/content/pdf/10.1007/3-540-48285-7_24.pdf

### Why is quasi-commutativity important?

In the example above we performed our membership test by checking for a remainder after division. Not all accumulator schemes have as easy a membership test as our `mul` function, but as long as the accumulator function is quasi-commutative we can employ a trick. Let's say we accumulate the following set:

$$ acc = f(f(x_1, x_2), x_3) $$

And we wish to prove $x_2$ is part of the original set. We can do this in a generic way by _removing_ $x_2$ from the set and sending it along with the new accumulator. We call the new set with a hole in it the _witness_:

$$
\begin{aligned}
commit &= f(f(x_1, x_2), x_3) \\
witness &= f(x_1, x_3) \\
element &= x_2
\end{aligned}
$$

The receiver can prove membership by simply adding the element back into the witness and checking that the result matches the commitment:

$$ commit == f(witness, x_2) $$

This fundamentally re-orders the elements that have been accumulated, so the trick only works if $f$ is quasi-commutative. One problem that comes up often in the accumulator space, however, is that deleting elements is not always easy or even possible. In our $mul$ example, deletion is performed by dividing by the prime we wish to remove. In accumulators where a deletion mechanism does not exist, the only way to produce the witness is to **redo the entire accumulation minus the elements we wish to prove, which requires knowing the original set.** As we will see, this has design implications.

Accumulator Terminology
================================================================================

Before looking at some accumulator choices, let's quickly cover the terminology used when discussing accumulators.
There are several types of accumulator depending on which abilities the scheme supports:

Term      | Meaning
----------|--------
Static    | Single elements can be added in $O(1)$; can't delete. Witness in $O(n)$.
Dynamic   | Single elements can also be deleted in $O(1)$. Witness in $O(1)$.
Universal | Accumulator can also prove non-membership.

A great place for definitions of the various kinds of accumulators is the trait hierarchy defined in [this Rust accumulator library](https://github.com/dignifiedquire/rust-accumulators/blob/master/src/traits.rs).

For Pyth, we would ideally want a dynamic accumulator for efficient single-member proofs. This is important because price requests are currently served by a component we call the price service, which does not have access to the full set. If we used a static accumulator we would have to modify the price service to request the full set in order to produce witnesses. Here is an overview of available accumulators:

Accumulator         | Type
--------------------|---------------------------------
mul                 | Dynamic, Universal
RSA                 | Static, Universal
RSA (Trapdoor)      | Dynamic, Universal
Merkle              | Static
Merkle (Ordered)    | Static, Universal
Merkle (Sparse)     | Static, Universal
Bilinear            | Static, Universal
Bilinear (Trapdoor) | Dynamic, Universal

With that in mind, let's cover each of the available accumulators and their implementation considerations in more detail.

RSA Accumulators
================================================================================

**Libraries of interest:**

[oleiba/RSA-accumulator](https://github.com/oleiba/RSA-accumulator)
[accumulators-rs](https://github.com/mikelodder7/accumulator-rs/tree/master/accumulator-rsa)

While the `mul` example is technically a valid accumulator construction, there is a problem with this simple model: it is very easy to reverse the process and discover the factors. One of the key properties of accumulators is that no attacker should be able to recover the original set given only the accumulator.
This is easily solved by using a different quasi-commutative function that is harder to break apart than our simple `mul`: modular exponentiation in the RSA setting. This gives a non-dynamic but zero-knowledge accumulator.

*Note: The zero-knowledge property isn't actually valuable to Pyth, as we do not need to hide the price set from anyone. However, the difficulty of factoring the RSA modulus prevents an attacker from forging prices, which we **do** value.*

### Modular Exponentiation as an Accumulator

As $(x^a)^b = (x^b)^a$ shows, modular exponentiation is quasi-commutative, so we can build our new accumulator on the famous [Strong RSA Assumption](https://en.wikipedia.org/wiki/Strong_RSA_assumption), which states that as long as the factors of $N$ are unknown the following accumulator is easy to compute but difficult to reverse:

$$ modexp(a, b) = a^b \mod N $$

We can chain this accumulator and re-order it just as in our $mul$ example. Only now, while it is easy to add elements and test membership, it is hard to _discover_ members. Our witness trick continues to work in this scheme.

### Validator Calculation Time

As mentioned earlier, RSA accumulators can only accumulate primes, otherwise we expose ourselves to collision attacks. In order to accumulate arbitrary data, RSA accumulators are combined with a mapping function known as `hash_to_prime` that converts that data into a unique prime, which is inserted into the accumulator in place of the full data:

```rust
// Given an arbitrary data blob, find a hash of that data
// which is prime, to represent the original data in the
// set. This pure brute-force approach repeatedly hashes
// with an incrementing counter until a prime is found; no
// fundamentally faster method is known.
fn hash_to_prime(data: &[u8]) -> U256 {
    let mut counter: u64 = 0;
    loop {
        let hash = sha256(&[data, &counter.to_le_bytes()].concat());
        let candidate = U256::from(hash) | 1; // force odd
        if candidate.is_prime() {
            return candidate;
        }
        counter += 1;
    }
}
```

In Pyth, validators are the entity that needs to perform this operation.
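For illustration, here is a runnable toy version of the same loop. It substitutes a non-cryptographic standard-library hasher for SHA2 and trial division for a real primality test, and truncates candidates to 32 bits so the demo stays fast; none of these choices are suitable for production.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Trial-division primality check. Fine for a 32-bit demo; a real
/// implementation would use a probabilistic test like Miller-Rabin.
fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// Hash `data || counter` with an incrementing counter until the
/// digest (truncated to 32 bits and forced odd) is prime.
fn hash_to_prime(data: &[u8]) -> u64 {
    let mut counter: u64 = 0;
    loop {
        let mut h = DefaultHasher::new();
        h.write(data);
        h.write_u64(counter);
        let candidate = (h.finish() & 0xFFFF_FFFF) | 1;
        if is_prime(candidate) {
            return candidate;
        }
        counter += 1;
    }
}
```

Since only roughly one candidate in a few dozen is prime, the loop hashes many times per element on average, which is exactly why `hash_to_prime` dominates accumulation cost.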
In our Solana fork, block time is every 200-300 milliseconds, so it is important that this has a low cost. The following criterion benchmark shows the cost of a single call to `hash_to_prime` using a hardware SHA2 instruction:

```
primes/primes           time:   [2.2480 ms 2.2481 ms 2.2482 ms]
                        change: [-27.919% -27.879% -27.842%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5494 outliers among 100000 measurements (5.49%)
  12 (0.01%) low severe
  53 (0.05%) low mild
  4110 (4.11%) high mild
  1319 (1.32%) high severe
```

Thus a single call costs roughly 2-3ms. Benchmarking a full implementation of RSA accumulators from [this paper](https://eprint.iacr.org/2018/1188.pdf), adding 100 primes gives the following, which matches my own benchmark:

```
accumulators::Accumulator::add took 212 milliseconds.
```

This gives us a problem. In order to accumulate 100 prices linearly we incur roughly 200ms of overhead, nearly doubling our block time. Even if we were to parallelize the prime search over a 32-core validator, we would still only reduce the search for 100 primes to 7-8ms, and if we want to scale to 10,000 prices we run into the same problem again. There are a few ways this can be optimized:

- SIMD to parallelize the search even further (8x perf, so 0.7ms for 100 prices).
- Use a cheaper collision-resistant non-cryptographic hash.
- Use a faster probabilistic primality test.

Unfortunately I could not find an alternative approach to the `hash_to_prime` primitive itself.

### Distributed Primes

One side-note idea that could potentially make this scale indefinitely is to have the publishers run `hash_to_prime` as part of their price component submission. When the aggregation is complete we can pair the aggregate with the product of the primes:

```javascript
prices = [[comp1, prime1], [comp2, prime2], ...]
median = aggregate(prices.map(p => p[0]))
prime  = prices.map(p => p[1]).reduce((a, b) => a * b)
```

We already rely on the assumption that the hash-to-prime function is pre-image resistant, so an attacker cannot use the primes that make up the product to find a collision without breaking the hash function. The result is a new product of primes that shares no prime factors with any other price and so should be safe to use as the accumulator value.

In the original scheme we generate $n$ primes for $n$ aggregations, where all found factors are co-prime with one another. Intuitively, using a hash function to find primes means it should be hard to find any collision at all; for SHA2 the probability is around $1.47 \times 10^{-29}$. In the new model there are $nm$ primes, where $m$ is the number of price components. A slightly more rigorous analysis of the probability of an attacker finding a set of primes with a shared factor can be done using the approach [described in this MathOverflow answer](https://mathoverflow.net/questions/119416/probability-of-all-combinations-of-k-numbers-among-n-being-coprime). [This evaluation from Anirush](https://www.wolframalpha.com/input?i=%28+%28+%28N+%2F+ln%28N%29%29%21+%2F+%28N%2Fln%28N%29+-+n*m%29%21+%29+%2F+%28N%2Fln%28N%29%29%5E%28n*m%29+%29+%2F.+%7Bn+-%3E+100%2C+m+-%3E+32%2C+N+-%3E+2%5E51%7D) gives a probability with eight nines that the chosen primes are coprime for numbers up to $2^{51}$; with 256-bit hashes it should be infeasible. I am not a rigorous mathematician or cryptographer, however, so I am _very_ hesitant to use any strange rolled-at-home schemes, and would appreciate feedback on this idea.

### On-Chain Costs

On Ethereum there is a precompile capable of performing modular exponentiation, with [the gas cost described here][1]. Of note is the mention that the worst-case gas cost of a single $modexp$ is `22,853,376` gas, a couple of orders of magnitude over our current `300K` gas cost for a verification.
However, given sane choices of exponent, the maximum gas cost looks to be around `80K` per check, potentially as low as `4095` gas when choosing a very small exponent. Compared to our current `300K` gas check for 5 prices, this would give us a gas cost as low as `25K` to prove the same 5 prices. This cost can be reduced further, as RSA accumulator proofs can be batched together into a single proof that proves all the individual elements at once. For example, a single witness proves both $x_2$ and $x_4$:

$$
\begin{aligned}
commit &= f(f(f(x_1, x_2), x_3), x_4) \\
witness &= f(x_1, x_3) \\
commit &== f(f(witness, x_2), x_4)
\end{aligned}
$$

Note that we also need to perform a single hash on-chain to verify that the prime being checked actually corresponds to the price. This has a small cost, however: around $30 + (bytes / 32 * 32)$.

### Choosing an N

In standard RSA, $N$ is chosen by multiplying two large random primes: $N = pq$. While it is safe to share $N$ itself, $p$ and $q$ must be kept private because they can be used by an attacker to create membership proofs for elements that were never accumulated. This would allow attackers to forge prices. This means we need some way of choosing an $N$ without anyone knowing $p$ and $q$, which can be done using multiparty computation in a setup phase. If we do this, it's important that the ceremony be public, otherwise trust in Pyth could be affected.

### UFO Primes

There is an alternative, offline way to find an $N$ with unknown factors that does not require a public setup phase. This idea is used in Zerocoin, where such numbers are known as RSA UFOs (Unknown Factorisation Objects). Calculating them can be done by a single party and verified to be safe. Some research on these can be found in the Zerocoin paper as well as [here](https://ethresear.ch/t/generating-rsa-ufos/3401), where they are investigated for use in VDFs, which have similar requirements for large unfactorable numbers.
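Pulling the RSA sections together, here is a toy end-to-end sketch of the scheme: accumulation by modular exponentiation, plus (batched) witness verification. The modulus is a tiny hard-coded semiprime purely for demonstration; a real deployment needs an $N$ of 2048+ bits with unknown factorisation, and all names here are illustrative.

```rust
/// Toy modulus 101 * 103. In production the factors of N must be
/// unknown to everyone (see "Choosing an N").
const N: u64 = 10403;
/// Initial accumulator (generator) value.
const G: u64 = 2;

/// Modular exponentiation by repeated squaring.
fn modexp(mut base: u64, mut exp: u64, n: u64) -> u64 {
    let mut result = 1;
    base %= n;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % n;
        }
        base = base * base % n;
        exp >>= 1;
    }
    result
}

/// Accumulate a set of primes: g^(p1 * p2 * ...) mod N. Order does
/// not matter because (x^a)^b = (x^b)^a.
fn accumulate(primes: &[u64]) -> u64 {
    primes.iter().fold(G, |acc, &p| modexp(acc, p, N))
}

/// Verify a (possibly batched) membership proof. The witness is the
/// accumulation of every element *except* those being proven, so
/// raising it to the claimed elements must reproduce the commitment.
fn verify(commitment: u64, witness: u64, elements: &[u64]) -> bool {
    elements.iter().fold(witness, |acc, &e| modexp(acc, e, N)) == commitment
}
```

A single witness proves any subset of the accumulated primes at once, which is the batching property discussed above.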
### Universal RSA Accumulators

It turns out it is possible to design RSA accumulators that can delete, making them efficient again in cases where we wish to prove thousands of prices. The work discussing this can be found [in this paper](https://eprint.iacr.org/2018/1188.pdf), which extends the traditional RSA accumulator with a scheme that allows for both dynamic deletion and efficient batching. This, however, requires a trusted party to know the trapdoor ($pq$) in order to produce the proofs, and so isn't a viable option for Pyth as it would centralise trust.

[1]: https://github.com/ethereum/EIPs/blob/f4179980751f7a3434928181c8950184ab357cf5/EIPS/bigint_modexp.md

Merkle Trees
================================================================================

**Libraries of interest:**

[merkle](docs.rs/merkle)
[merkletree](docs.rs/merkletree)
[solana-merkle-tree](docs.rs/solana-merkle-tree)
[merkle-cbt](docs.rs/merkle-cbt)

An alternative to _any_ accumulator is a merkle tree. We can simply create a merkle root of all the prices in a given block and allow verification against that root on target chains. The main cost here is that generating the proofs requires an algorithm known as `merkle tree traversal`, which is unexpectedly expensive and has to be done once per proof. There is [some literature](https://link.springer.com/content/pdf/10.1007/978-3-540-24676-3_32.pdf) on doing this traversal in $O(2\log_2(N))$ time, which would allow us to implement this efficiently, a design consideration if we choose this data structure.
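Before comparing libraries, a minimal sketch of the data structure itself may help. The version below uses a standard-library hasher as a stand-in for SHA2 and assumes a power-of-two leaf count; a real implementation additionally needs leaf/node domain separation to resist second-preimage attacks.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in for SHA2; a production tree must use a cryptographic hash.
fn hash_pair(l: u64, r: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(l);
    h.write_u64(r);
    h.finish()
}

fn hash_leaf(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

/// Build every level of the tree bottom-up. `leaves.len()` must be
/// a power of two in this sketch.
fn build(leaves: &[u64]) -> Vec<Vec<u64>> {
    let mut levels = vec![leaves.to_vec()];
    while levels.last().unwrap().len() > 1 {
        let next: Vec<u64> = levels
            .last()
            .unwrap()
            .chunks(2)
            .map(|c| hash_pair(c[0], c[1]))
            .collect();
        levels.push(next);
    }
    levels
}

/// A proof is the sibling hash at each level, leaf to root. Knowing
/// the leaf index up front is the same shortcut solana-merkle-tree
/// takes to sidestep the traversal problem.
fn prove(levels: &[Vec<u64>], mut index: usize) -> Vec<u64> {
    let mut proof = Vec::new();
    for level in &levels[..levels.len() - 1] {
        proof.push(level[index ^ 1]);
        index /= 2;
    }
    proof
}

/// Recompute the root from a leaf and its proof: one hash per level.
fn verify(root: u64, mut node: u64, mut index: usize, proof: &[u64]) -> bool {
    for &sibling in proof {
        node = if index % 2 == 0 {
            hash_pair(node, sibling)
        } else {
            hash_pair(sibling, node)
        };
        index /= 2;
    }
    node == root
}
```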
There are several pre-existing libraries; here are benchmarks adding 100,000 elements and proving 10,000 using the most viable candidates:

Library        |  Add | Prove                            |  Verify | Size
---------------|-----:|---------------------------------:|--------:|--------:
merkle         | 72ms | 32ms                             | 4ms     | 128KiB
solana-merkle  |  9ms | 0.1ms                            | 0.2ms   | 24KiB
merkle-cbt     |  1ms | 615ms (merkle traversal problem) | 128ms   | 48KiB

The Solana library is the most attractive for several reasons:

- It comes with the validator, so no new dependencies.
- It's fast, very fast, using their optimized `hashv` function based on SHA2.
- Ethereum supports SHA2 natively.
- It uses the standard node-duplication technique for unbalanced trees.
- It uses an efficient array layout for the tree and proofs, similar to other chain implementations.

It dodges the merkle traversal problem by assuming the prover knows the indices of all relevant elements ahead of time, which makes the API slightly weaker but is easy to work around by keeping a price/index mapping during generation.

There are a couple more general libraries that aren't quite as fast as Solana's, but they require providing your own hashing primitives. I don't think they're worth covering, as Solana already takes care of easy layout for on-chain verification as well as proper [pre-image resistance](https://en.wikipedia.org/wiki/Merkle_tree#Second_preimage_attack) using prefixed SHA2.

Note that none of these is usable as an _actual_ accumulator by the strict definition, but for Pyth a standard merkle tree will work, as we don't need many of the properties of cryptographic accumulators (i.e. zero-knowledge or non-membership).

### On-Chain Costs

Assuming the use of `solana-merkle-tree` or a similar implementation, verifying membership requires $O(\log_2 n)$ hashes per proof. The [ethereum gas cost](https://eips.ethereum.org/EIPS/eip-2666) for SHA2 is 60 gas plus 6 gas per 32-byte chunk.
Each verification step hashes a 64-byte pair of nodes (two 32-byte chunks), costing $60 + 2 \cdot 6 = 72$ gas. For a tree of 128 leaves (reasonable for our current price list), a proof contains the leaf hash plus $\log_2 128 = 7$ levels, giving roughly:

$$ 72 \cdot 8 \approx 600 \text{ gas} $$

Submitting the data itself has a gas cost of 68 per non-zero byte, which for a single proof containing 8 hashes, each 32 bytes in size, is $68 * 32 * 8 = 17408$ in the worst case where none of the hashes contain a zero byte. This gives us a total of roughly $18000$ gas per price, or around $90000$ gas to verify a batch of 5 prices, which matches our current batch capability. For reference, an RSA proof for an accumulator with a key size of 3072 bits would be 384 bytes, for a total of $26112$ for the calldata and roughly $80K$ gas for the $modexp$ call. Therefore RSA starts to become more efficient as soon as we want to prove more than 5 prices, due to RSA proof batching. It should be noted, however, that there are few protocols at the moment that need more than a handful of prices, and so the main benefit of the accumulator, namely efficient batching of the whole set, is still present.

### Sparse Merkle Trees

Merkle trees can be converted into a similar data structure that _can_ be used as a base for an accumulator. As mentioned, we probably don't need this, but this document covers the implementation anyway for future use cases. To understand the problem, first let's see what happens with a normal merkle tree. In a normal merkle tree, changing the order of insertion results in a different root hash, meaning the operation is **not** quasi-commutative:

$$ \text{insert}(\text{insert}(tree, a), b) \ne \text{insert}(\text{insert}(tree, b), a) $$

We can see this visually by accumulating 3 elements in different orders, resulting in two different merkle roots despite representing the same set of items.
The top and bottom trees below insert the same elements in different orders, resulting in two different roots:

![](https://pape.rs/NFr4tjzD)

Both trees represent the same set of elements but produce different tree roots. This prevents merkle trees from being quasi-commutative; in other words, we cannot use the generalized approach to creating set-membership proofs seen above. The solution is to use a merkle tree with deterministic placement of leaves.

In a sparse merkle tree, we pre-allocate a giant tree with a leaf for every possible value we might want to store. If our inputs were letters of the alphabet, then `a` would go to index 0, `b` to 1, and `z` to 25, regardless of the order we insert the items. This also gives us the benefit of being able to prove membership in only $O(\log N)$ time, as the tree in question is a perfectly balanced binary tree. Of course, if we want to store hashes of prices instead, we'd need a tree with $2^{256}$ leaves to pre-allocate all possible hashes, which is impossible. However, because the majority of nodes in the tree remain null, we can optimise the tree by replacing any large null subtree with a single null value to compress the tree.

**Other explanations of this concept can be found [here][2], [here][3] and [here][4]. The third link, from Lisk, is recommended reading as it explains the data structure succinctly and compares it to their own needs and alternatives.**

[2]: https://medium.com/newcryptoblock/sparse-merkle-tree-introduction-a267f3a29223
[3]: https://ethresear.ch/t/optimizing-sparse-merkle-trees/3751
[4]: https://github.com/LiskHQ/lips/blob/main/proposals/lip-0031.md#sparse-merkle-trees

Elliptic Accumulators
================================================================================

Elliptic-curve accumulators are most similar to RSA accumulators, but instead of using the quasi-commutative property of modular exponentiation they rely on addition over a pair of curves known as a bilinear pairing.
Before digging into these accumulators, it is important to mention that chain support for ECC accumulators is very flaky. They require on-chain primitives for an elliptic curve that supports a bilinear pairing, and different chains support different curves, which would require us to implement several. Ethereum, for example, only provides precompiles for $BN254$, a curve many other chains do not support, so we run into support issues on a large class of chains. To work around this, most implementations build their accumulators over a curve known as the JubJub curve, which is bilinear-friendly and can be embedded in $BN254$. This, however, means using zero-knowledge proofs.

These accumulators therefore come with several drawbacks:

- Expensive to compute (despite being cheap to verify).
- Complex.
- Flaky chain support.
- May require embedding in a SNARK scheme.
- Being non-dynamic, they have no feature advantage over RSA.

If we're going to resort to SNARKs, we may as well start proving the entire aggregation process Pyth performs instead, which brings a whole host of benefits. As they are the most complex accumulator I leave out the implementation details, as I am also uncertain myself about some of the math.

Chain Support
================================================================================

To make a decision on which accumulator to use, an important requirement is which chains can support which accumulator. The following grid gives an overview of possible chain support for each:

Accumulator | EVM    | SOL    | CW | Algo   | ADA | DOT | Move   | NEAR | Mina
------------|--------|--------|----|--------|-----|-----|--------|------|------
RSA         | ✅     | ✅     | ✅ | ❌^[1] | ✅  | ✅  | ✅^[2] | ✅   | ✅
Merkle      | ✅     | ✅     | ✅ | ✅     | ✅  | ✅  | ✅     | ✅   | ✅
Bilinear    | 🟠^[3] | ✅^[4] | ✅ | ❌     | ❌  | ✅  | ❌     | ✅   | ✅

^[1]: Team consideration for `expmod` is already there; we could push them to include it.
^[2]: Would have to hand-roll, but it can be done.
^[3]: Compute budget may impact this one; needs more research.
^[4]: This requires the $BLS12\text{-}381$ curve, which is not ubiquitous.

Conclusion & Design Steps
================================================================================

### Accumulator Choice

The best choice here would be all three. However, the merkle tree option has the widest support, the simplest implementation, and average costs across all chains. As a starting point I believe we should implement merkle accumulators, followed potentially by RSA accumulators due to their cheap witness generation and wide chain support.

Both the RSA and ECC accumulators can be modified to act as threshold schemes, so it is possible to extend these accumulator implementations such that they can be verified _without Wormhole_. With this extension the price service can be removed and proofs can be requested directly from PythNet for extremely fast price requests. This is another reason a follow-up addition of RSA accumulators should be considered.

Given the generality of the accumulator scheme, I suggest we provide and fulfill the following traits within our stack, even with merkle trees as our initial choice:

```rust
trait Accumulator<T: AsRef<[u8]>> {
    type Proof: Proof<T>;

    fn add(self, v: T) -> Self;
    fn del(self, v: T) -> Self;
    fn prove(&self, v: T) -> Self::Proof;
    fn verify(&self, v: T, p: Self::Proof) -> Result<()>;
}

// Requiring this trait allows us to limit ourselves to accumulator
// schemes that allow for batching of proofs.
trait Proof<T> {
    fn combine(l: Self, r: Self) -> Self;
}
```

This means we can start by implementing the simplest protocol and extend the validator to produce accumulators of any kind. The price service can then return any available accumulator proof depending on the chain, allowing people to always pick the best proof possible in each case.
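As a smoke test that this trait shape is implementable, here is a self-contained toy instantiation over the `mul` accumulator from the introduction (signatures lightly adapted so the sketch compiles on its own; the `Proof` bound and error type are elided):

```rust
trait Accumulator<T> {
    type Proof;

    fn add(self, v: T) -> Self;
    fn del(self, v: T) -> Self;
    fn prove(&self, v: T) -> Self::Proof;
    fn verify(&self, v: T, p: &Self::Proof) -> bool;
}

/// The toy `mul` accumulator: the commitment is the product of the
/// accumulated primes and a proof is the witness (the product of
/// every other element).
#[derive(Clone, Copy, PartialEq, Debug)]
struct MulAccumulator(u64);

impl Accumulator<u64> for MulAccumulator {
    type Proof = u64;

    fn add(self, v: u64) -> Self {
        MulAccumulator(self.0 * v)
    }

    // `mul` is dynamic: deletion is a single division.
    fn del(self, v: u64) -> Self {
        MulAccumulator(self.0 / v)
    }

    fn prove(&self, v: u64) -> u64 {
        self.0 / v
    }

    fn verify(&self, v: u64, p: &u64) -> bool {
        p * v == self.0
    }
}
```

A merkle implementation of the same trait would carry a sibling path as its `Proof`, and an RSA implementation its witness; the calling code stays identical.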
This design also means we can incrementally improve our accumulator designs. A good way to go about this would be to design our payload in such a way that we can continually extend it with new versions of proof:

```rust
enum Proof {
    MerkleV1(MerkleProof),
    RSAV1(RSAProof),
    RSAV2(RSATrapdoorlessProof),
    ...
}
```

It is then entirely up to each destination chain to implement the proof system that chain best supports and reject any other proof type.

In general, **non-dynamic** accumulators introduce a problem for us. With a static accumulator, the flow for a request to the price service takes these steps:

1. One request from a user to the price service for a set of prices.
2. One request from the price service to Wormhole for an accumulator.
3. One request to Pyth for the _entire_ set of prices in this accumulator.
4. Accumulation of all elements minus the requested price to produce a witness.
5. Return the result for the request at (1).

As there are no viable dynamic accumulators that can be fully decentralized, this section assumes that we will have to eat the cost of transferring the price set to the price service. Until we wish to scale to thousands of prices this should be reasonably efficient.

### Design Components

In order to implement accumulators we need several components and changes.

### Validator

The validator needs to be updated to produce accumulators; this can be done with a sysvar, which is outlined in a separate document. This work, along with feature flagging, should be done first to de-risk any complications from having a forked Solana implementation.

### Price Service

The price service needs to become accumulator-aware. On a request for a set of prices it will need to pull the latest accumulator VAA from Wormhole and, for each price requested, produce a witness (or a batched witness). Whether or not the accumulator is dynamic, this will require a request to a validator to pull price accounts.
These can be batch requested, however, because the accumulator VAA is likely to be stale compared to the RPC.

_NOTE: As an aside, through writing this document it is becoming more and more attractive to implement the price service as a Geyser plugin. It would be close to the validator, able to track accounts nearly in real time, and would remove the latency of requests to and from the existing price service. As a side benefit it would also fully distribute the price service to the validator set, which is a property we have already considered wanting._

### Geyser Layer (Future?)

In order to reduce the cost of accumulation, the price service can be replaced by a Geyser plugin that watches accumulator updates and provides an RPC serving witnesses. The plugin could expose an endpoint that looks roughly like so:

```json
{
  "height": 1282381237,
  "prices": [
    "111111111111",
    "111111111112",
    "111111111113"
  ]
}
```

With the following response:

```json
{
  "height": 1282381237,
  "witness": <bytes>
}
```

### Storing Price/Proof for Accumulator+Height

The validator sysvar can populate PDAs that are keyed by accumulator and height, so that when the price service requests prices the historical proofs are available for some amount of time before being purged. This requires a design for accounts such that we do not accumulate account state on the validator indefinitely, as we would otherwise end up generating thousands of accounts per second. One attractive option here is to allocate a range of accounts (Solana provides checks to make sure any address range we choose is safe) to act as a ring buffer. This would allow us to store only a fixed range of accounts to act as accumulator history.
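For illustration, the ring-buffer bookkeeping could be as simple as indexing by height modulo the buffer size (the constant and helper names here are hypothetical, not a finalised account layout):

```rust
/// Number of historical accumulator slots to retain (illustrative).
const RING_SIZE: u64 = 10_000;

/// Map a slot height to an index into the pre-allocated account
/// range. Heights RING_SIZE apart collide, which is the intended
/// purge behaviour: new proofs overwrite the oldest entries.
fn ring_index(height: u64) -> u64 {
    height % RING_SIZE
}

/// A stored proof is still live only if it has not yet been lapped
/// by the write cursor.
fn is_live(current_height: u64, stored_height: u64) -> bool {
    current_height.saturating_sub(stored_height) < RING_SIZE
}
```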