
# PVM State Serialization

This document develops a canonical PVM state serialization that JAM implementers can use to make their implementations maximally GP compliant.

### Motivation

To compare large numbers of PVM traces between teams precisely, JAM implementers can use an easily implemented hashing function to hash PVM state. Teams can then quickly verify that they arrive at the same answer at the end of a PVM execution and, if they do not, quickly determine at which PC in the PVM trace their results diverged. This applies to any PVM trace, whether for authorizers or for a service's refine/accumulate/transfer entry points.

JAM implementers may log a *PVM Hash* in their PVM traces *after* each instruction execution:

```
DEBUG[02-26|01:05:52.162] 196: PC 703 LOAD_IND_U64 g=8991 pvmHash=1521..a9d3 reg="[8 4278058800 0 128 0 1065941251 4278124591 1065941251 500000 100 4278058824 4278058952 0]"
```

or host function call:

```
DEBUG[02-26|01:05:52.162] TRANSFER pvmHash=bd39..8a77 sender=0 receiver=1065941251 amount=500000 gaslimit=100
```

If there is a disagreement about the correct execution of either an ordinary instruction or a host function call, teams can then request a dump of the PVM state and isolate which component of a PVM interpreter implementation is at fault.

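
As an illustration of the comparison workflow described here, the following is a minimal sketch of how two teams' traces could be checked for the first diverging step. It assumes each trace is a list of log lines carrying a `pvmHash=` field as in the examples above; the function names `extract_hashes` and `first_divergence` are hypothetical and not part of any existing tool.

```python
import re
from itertools import zip_longest

# Matches the pvmHash field as it appears in the example log lines.
PVM_HASH = re.compile(r"pvmHash=(\S+)")


def extract_hashes(lines):
    """Return the pvmHash value of every log line that carries one."""
    return [m.group(1) for line in lines if (m := PVM_HASH.search(line))]


def first_divergence(lines_a, lines_b):
    """Index of the first step whose hashes differ, or None if the traces agree.

    A trace that ends early counts as diverging at the first missing step.
    """
    hashes_a = extract_hashes(lines_a)
    hashes_b = extract_hashes(lines_b)
    for step, (a, b) in enumerate(zip_longest(hashes_a, hashes_b)):
        if a != b:
            return step
    return None
```

The returned index identifies the instruction or host function call for which a full state dump is worth requesting.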
### Two stage execution

To make this easy for JAM implementers to implement quickly, we recommend proceeding in two stages:

* v1 (0.6-0.8/Spring 2025):
  - [13 64-bit registers](https://graypaper.fluffylabs.dev/#/5f542d7/235700235700) $\omega_{0 \ldots +13} \in \mathbb{N}_R$
  - 64-bit gas counter $\varrho$
  - all PVM memory pages $\mu$ (4-byte page index and 4096-byte page) that have been [initialized (see A.7)](https://graypaper.fluffylabs.dev/#/5f542d7/2b27022b2702), read, or written to
* v2 (0.9-1.0/Summer 2025): all of the above _and_ at least the following, which go beyond PVM state per se:
  - both contexts ${\bf X}$ and ${\bf Y}$
  - all exceptions $\varepsilon$ and others (see [here](https://graypaper.fluffylabs.dev/#/5f542d7/239c00239c00))
  - treatment of inner PVM invocation
  - memory accessibility per page (?)
  - program counter $\iota$ (?)

The idea for v1 is to have something implementable in a few hours while teams attempt to converge on the effects of host functions in an M1/M2 time frame. The idea for v2 is to extend the hash beyond PVM state alone.

### Status

This PVM Hash concept is not part of the JAM protocol, is not intended for inclusion in the GP, and exists purely to help JAM implementers detect issues in their PVM host function implementations. As of late Feb 2025, this is just a draft, but it's believed that a PoC with agreement between multiple JAM implementers can be achieved in March 2025.

If you are a JAM implementer and would like to refine this PVM Hash concept, feel free to join [JAM Testnet on TG](https://t.me/jamtestnet) and suggest your changes or add comments to the doc.

### v1 PVM Hash

The basic components that form a *v1 PVM Hash* are:

* Register encoding $\omega_i$ : ${\cal E}_8(\omega_i)$
* Gas counter encoding: ${\cal E}_8(\varrho)$
* Page $\nu_p$ of Memory $\mu_{o\ldots+4096}$: ${\cal E}_4(\nu_p) \frown {\cal E}(\mu_{o\ldots+4096})$

It would clearly be wasteful for the PVM Hash to hash the contents of all $2^{32}$ bytes.
The page encoding above assumes JAM implementers organize memory in 4096-byte pages, so the $2^{32}$ bytes of $\mu$ span $2^{20}=1,048,576$ pages. (Example: the contents of $\mu$ from `0xFFFF0000` to `0xFFFF0FFF` are at page $\nu_p$ = `0xFFFF0`.) Since only a tiny fraction of the pages are initialized, read, or written to with non-zero values, a PVM implementation has only those pages to hash, and the hash of the memory should involve just those pages.

An implementation can simply order these pages to obtain the encoding of the relevant subset of memory $\mu$:

${\cal E}(\nu) = {\cal E}(\nu_{p_i}) \frown \ldots \frown {\cal E}(\nu_{p_j})$

The PVM Hash is then simply:

${\cal H}({\cal E}_8(\omega_0) \frown \ldots \frown {\cal E}_8(\omega_{12}) \frown{\cal E}_8(\varrho) \frown {\cal E}(\nu))$

For any PVM hash generated in a reproducible PVM execution (as part of a JAM State Transition), PVM implementations should be able to dump the full contents of the above in hex form to enable reasoning about the effects of an instruction or host function call. The hex content of the dump should be copy-paste-able into a [blake2b hash tool](https://emn178.github.io/online-tools/blake2b/) to compute the PVM hash, and a simple [text comparison](https://text-compare.com/) should enable identifying the key page difference and the precise memory difference (an additional or missing page, etc.). In contrast, registers and gas counters may not need the above text comparison, since they are already visible in a PVM execution trace (see [jamtestnet/assurances.txt](https://github.com/jam-duna/jamtestnet/blob/main/assurances.txt#L58)).

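
The v1 construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes ${\cal E}_8$ and ${\cal E}_4$ are fixed-width little-endian encodings, that a page's bytes are encoded as-is, that pages are ordered by ascending page index, that the gas counter is non-negative, and that ${\cal H}$ is Blake2b with a 32-byte digest. The names `page_index`, `encode_state` and `pvm_hash` are hypothetical.

```python
import hashlib
import struct

PAGE_SIZE = 4096      # bytes per page, as stated in the text
NUM_REGISTERS = 13    # omega_0 .. omega_12


def page_index(address: int) -> int:
    """Page index of a 32-bit address, e.g. 0xFFFF0000 -> 0xFFFF0."""
    return address // PAGE_SIZE


def encode_state(registers, gas, pages) -> bytes:
    """Concatenate the encoded registers, gas counter and touched pages.

    `pages` maps page index -> 4096 bytes and holds only the pages that
    were initialized, read or written to.
    """
    if len(registers) != NUM_REGISTERS:
        raise ValueError(f"expected {NUM_REGISTERS} registers")
    out = bytearray()
    for value in registers:
        out += struct.pack("<Q", value)    # assumption: E_8 is 8-byte little-endian
    out += struct.pack("<Q", gas)          # assumption: gas is non-negative
    for index in sorted(pages):            # assumption: ascending page order
        page = pages[index]
        if len(page) != PAGE_SIZE:
            raise ValueError(f"page {index:#x} is not {PAGE_SIZE} bytes")
        out += struct.pack("<I", index) + page
    return bytes(out)


def pvm_hash(registers, gas, pages) -> bytes:
    """Blake2b digest of the encoded state (32-byte digest is an assumption)."""
    encoded = encode_state(registers, gas, pages)
    return hashlib.blake2b(encoded, digest_size=32).digest()
```

Calling `encode_state(...).hex()` yields the hex dump described above, and `pvm_hash(...).hex()` the value to log. Sorting by page index keeps the result independent of the order in which an implementation happened to touch its pages.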
#### Test case

[Instruction test case](https://github.com/jam-duna/jamtestnet/blob/main/assurances.txt#L59) - Sourabh to provide

### v2 PVM Hash

#### Context Encoding (for accumulate)

For both the ${\bf x}$ and ${\bf y}$ contexts, we require a full encoding specification:

* ${\cal E}({\bf x} \in {\bf X}) = \ldots$

![image](https://hackmd.io/_uploads/rJPl203q1g.png)

This includes a complex partial state ${\bf u}$:

![image](https://hackmd.io/_uploads/Hyl26An9Jg.png)

* ${\cal E}({\bf u} \in \mathbb{U}) = \ldots$

#### Memory Accessibility

TBD

#### Exception handling

TBD