# Understanding the PVM: A Step-by-Step Guide ## Overview This guide explains the internal workings of the PVM. We will break down its core components, how instructions are structured, and how execution works. Finally, we will build a simple program step by step. So Let's go <iframe src="https://giphy.com/embed/0LuGjkmTRa2jfDpOvR" width="200" height="200" style="" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/firstwefeast-uncle-roger-hot-ones-versus-vs-nick-digiovanni-0LuGjkmTRa2jfDpOvR">via GIPHY</a></p> --- ## 1. What is the PVM? The PVM (Polkadot Virtual Machine) is defined as a function: ``` pvm(binary_blob, counter, gas, registers, memory) => (exit_reason, counter_, gas_, registers_, memory_) ``` - `registers` is a list of 13 64-bit integers. - `memory` with simplification can be tought about as a list of bytes (values up to 255) with a length of `2^32` (0x_1000_0000). The binary input to the PVM follows this structure: ``` E(∣j∣) ⌢ E1 (z) ⌢ E(∣c∣) ⌢ Ez (j) ⌢ E(c) ⌢ E(k), ∣k∣ = ∣c∣ ``` From the GP: > *"The program blob p is split into a series of octets which make up the instruction data c and the opcode bitmask k as well as the dynamic jump table, j..."* --- ## 2. Instruction Data, Bitmask and Jump Table ### C: Instruction Data This is the sequence of instructions the PVM executes. Examples include: - Load a number into register 9. - Read `n` bytes from memory and load them into register 3. - Add the values in registers 9 and 3 and store the result in register 2. ### K: Bitmask The bitmask tells us where instructions and arguments are located. Example: ```elixir <<1,0,0,0,1,0,0,1,1,0,0,0,0,1>> ``` The `1`s mark instruction start points, while the `0`s mark arguments. From this, we can see: - The program has 14 bytes. - Instructions are at indices: `[0,4,7,8,13]`. - After executing the instruction at index 0, we `skip` by 4 in order to reach the next instruction ### J: Jump Table The jump table contains predefined jump locations. Example loop: ```elixir program = [load, 7, 10, load, 8, 5, reduce, 7, 1, jump_gt, 7, 8, 6] bitmask = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0] ``` How it works: 1. Load `10` into register 7 (`w7 = 10`). 2. Load `5` into register 8 (`w8 = 5`). 3. Decrease `w7` by 1 (`w7 -= 1`). 4. If `w7 > w8`, jump to index 6. This is equivalent to: ```js const w7 = 10; const w8 = 5; while (w7 > w8) { w7 -= 1; } ``` --- ## 3. How the PVM Executes Instructions ### 3.a Single Step Transition ### ![image](https://hackmd.io/_uploads/BkQloKT5yg.png) This step executes a single instruction: The function takes: 1. Instruction sequence (`c`). 2. Bitmask (`k`). 3. Jump table (`j`). 4. Counter (`i` - index of the current instruction). 5. Gas. 6. Registers. 7. Memory. The result includes: 1. **Exit reason** (continue, halt, or error). 2. **Updated state** (counter, gas, registers, memory). **Unless** specified otherwise: - The counter moves to the next instruction (`next 1 bit in k`). - Gas is deducted. - Registers and memory remain unchanged. ![image](https://hackmd.io/_uploads/SkllnYT91g.png) ### 3.b Basic PVM Definition ### ![image](https://hackmd.io/_uploads/BJQ7k5Tc1e.png) The function takes in the program blob, counter, gas, registers, and memory. It executes Ψ1, the single-step transition, to yield: `{ı', ϱ', ω', µ'} - (counter, gas, registers, memory)` we will mark this as **`state_`** and then: - If the exit reason is ▸, recursively call itself with the updated state. - If ϱ′ (gas after Ψ1) < 0, return `(∞(:out_of_gas), state_)`. - If the exit reason is panic or halt, return `(exit_reason, state_)`, except `i' = 0`. - Otherwise, return `(exit_reason, state_)`. And that's almost all there is to know about the basic PVM The rest, is the table of all the supported operations, in section [A.5](https://graypaper.fluffylabs.dev/#/5f542d7/255700255700) ## 4. Building a Sample Program ## To continue our understanding, let's build a most basic program that: 1. Loads two numbers into registers. 2. Sums them. 3. Writes the result into memory. Let's Begin: In order to load a number into a register, we use instruction 51: ![image](https://hackmd.io/_uploads/HkZfMq651x.png) Look at 51. It says that the instruction should place the value `v_x` into `w_a`, which is the value of the register whose index is `r_a`. Let's say we want to put 10 into register 7. We need: `r_a == 7` `v_x == 10` So our instructions start as <<51>>. How would we specify that we want to load into register 7? Like this: ``<<51,7>>`` Now let's look at `l_x` to undersatnd `l_x` let take a little detour and go over `skip` ### 4.a Skip ### ![image](https://hackmd.io/_uploads/ByckWlR5yx.png) in pseudo-code it would look like this: ```elixir! def skip(i, bitmask) do n = first j such that bitmask[i + 1 +j] == 1 min(24, n) end ``` #### Examples: `bitmask = <<1,0,1,0,0,0,1,0>>` `skip(2, bitmask) == 3` what about `skip(6, bitmask)`? this is why we append 1's in the end `skip(6, bitmask) == 1` for me the simplest way to think about it is that skip counts the number of 0 until the next instuction, up to 24. Now we return to our sample program ### 4 - continue ### so we had `l_x = min(4, max(0, l - 1)` l is the skip distance [A.19](https://graypaper.fluffylabs.dev/#/5f542d7/255c00255c00) (remember, number of 0 until the next instruction) so if we build our load op like this: ```elixir! c = <<51, 7, 10>> k = <<1,0,0>> # and then: # l == 2 => l_x == 1 => v_x = c[1..+1] |> decode_little_endian(1) |>sign_extend_to_64bits == 10 (0x0A) |> sign_extend == 0x0000_0000_0000_000A ``` withi this instruciton we have loaded the value `10` (in 64 bits) into register 7 what about larger numbers you may ask? no problem, up to 32 bits (l_x can take maximum value of 4) so we could have: `<<51, 7, 0x01,0x20,0x0A,0x00>>` with bitmask (you guessed it) `<<1,0,0,0,0,0>>` skip will be 5 => `l_x` will be 4 and so ```elixir v_x = c[1...+4] |> decode_liitle_integer(4) |> extend_to_64_bits => v_x = 0x0000_0000_000A2001 ``` --- Great! Now let's load 5 into register 8! We know how to do that already Now our program will become ```elixir! c = <<51,7,10,51,8,5>> k = <<1, 0, 0, 1, 0, 0>> # Hex: c = <<0x33, 0x07, 0x0A, 0x33,0x08, 0x05>> ``` Next, we sum them and save the result into register 1. How to do that? Going through the instruction table we find the instruction add_32: going through the instruction table we find the instruction *add_32* ![image](https://hackmd.io/_uploads/BJke8cpcye.png) See how `r_a` and `r_b` are both specifed using the first byte after the instruction code `r_b` will use the top 4 bits and `r_a` the bottom 4 bits i.e the number `0x87` will yield `r_a = 7`, `r_b = 8` let's save the result into `r_1` so our instruction is: `<<190, 0x87, 1>>` and in total our program is now ```elixir! c = <<51,7,10,51,8,5,190, 0x87, 1>> k = <<1, 0, 0, 1, 0, 0, 1, 0, 0>> ``` finally lets save it to memory: ![image](https://hackmd.io/_uploads/ryoyms6qye.png) ![image](https://hackmd.io/_uploads/H1Ox7spc1x.png) the instruction will be: `<<59, 1, 0x00,0x10,0x02`>> this will store `w_1` into address `0x2_1000` so our final program is ```elixir! c = <<51, 7, 10, 51, 8, 5, 190, 135, 1, 59, 1, 0, 16 ,2>> #hex c_hex = <<0x33, 0x7, 0xA0, 0x33, 0x8, 0x5, 0xBE, 0x87, 0x1, 0x3A, 0x1,0x0, 0x10, 0x2>> k = <<1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0>> ``` ## 5. Finalizing the Executable Binary ## We don’t need a jump table, so we structure the final binary as: `E(∣j∣) ⌢ E1 (z) ⌢ E(∣c∣) ⌢ Ez (j) ⌢ E(c) ⌢ E(k) , ∣k∣ = ∣c∣` : ```elixir <<0, 1, 14, 51, 7, 10, 51, 8, 5, 190, 135, 1, 59, 1, 0, 16 ,2>> ^ e(k) ``` Before adding k, pad it to the closest multiple of 8: we then concat it to the rest of the bianry as **Bytes** not **Bits** ```elixir! k_padded = <<1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0>> <> <<0, 0>> 8bits_chunks = <<1, 0, 0, 1, 0, 0, 1, 0>> <> <<0, 1, 0, 0, 0, 0, 0, 0>> bimmask_byte == <<146, 64>> ``` and the *(almost)* ready to go binary is: ```elixir <<0, 1, 14, 51, 7, 10, 51, 8, 5, 190, 135, 1, 59, 1, 0, 16 ,2, 146, 164>> ``` ### 5.a Why *(almost)* ?? althogh not specified in the GP, the accepted way to encode the bitmask is to reverse the bits of each byte ```elixir! bitmask = 0b10010010_01000000 revresed = 0b01001001_00000010 revereed_bytes = <<73, 2>> ``` The Final executable program is : ```elixir <<0, 1, 14, 51, 7, 10, 51, 8, 5, 190, 135, 1, 59, 1, 0, 16 ,2, 73, 2>> #as hex: <<00, 01,0e, 33,07, 0a, 33, 08, 05, be, 87, 01, 3b, 01,00 10, 02, 49,02 ``` Run it here:[PVM Execution](https://pvm.fluffylabs.dev/?program=0x00010e33070a330805be87013b010010024902#/ With this, you've now successfully built and executed a basic PVM program! You should have a solid grasp of how instructions are structured, how they interact with registers and memory, and how the PVM processes execution step by step. But this is just the beginning. The Rabbit Hole goes deeper... ## Up Next: Expanding PVM Functionality ## On the Next Post, we will explore - Host functions invocation - Argument invocation - and finally the functions that the JAM runtime actully uses: a. is-authorized b. refine c. accumualte d on transfer --- Stay tuned for more insights! If you're enjoying this deep dive, let's continue the discussion on Twitter([@luke_fishman](https://x.com/luke_fishman)) and share your thoughts and questions. 📢 Follow [@jamixir](https://x.com/jamixir) for the latest updates, discussions in the journy to build JAM on Elixir