Try   HackMD

Building a Blockchain from Scratch

NOTE: work in progress, submitted as unfinished, for the finished article please check the linked source. This will be left as it was submitted.

Motivations for Building a Blockchain from Scratch:

  • Understanding
  • Explaining to non-technical people why blockchain development takes so much time (abstraction layers)
  • Explaining to technical people

Planned parts of this series

  • PART 1 - Building the vm
  • PART 2 - Building the extended vm (storage, transactions)
  • PART 3 - Consensus algorithm component
  • PART 4 - Compiler to translate to bytecode that can run on vm
  • PART 5 - Peer to peer compononent for transactions and block propagations
  • PART 6 - Client component to interact with the chain
  • PART 7 - Transaction explorer
  • PART 8 - Wallet primitives as library
  • PART 9 - Blockchain principles and characteristics, how well done (opennnes, decentralization, global, immutability, finality, etc)

Part 1 - Building the VM

The VM (Virtual Machine) component in blockchain systems. Its main purpose is to execute the code that triggers state transitions within the blockchain. It acts as a computational engine, responsible for determining whether the code can run successfully or not.

Think of the VM as a machine that enforces the rules and restrictions of the blockchain. When code is executed by the VM, it checks if the code is allowed to run based on predefined conditions. If the code complies with the rules, the state of the blockchain is updated accordingly, reflecting the changes caused by the code execution.

However, if the code violates any of the predefined rules or runs out of gas (meaning it exceeds the allocated computational resources), the VM halts the execution. In such cases, the code may revert, meaning the changes made to the state are undone, and an error or exceptional condition is triggered.

For this part, the VM will the following attributes:

  • single threaded (no time slice)
  • no scheduling
  • deterministic machine, (no random value, finite operations through the amount of gas available)
  • atomic (either fails or succeeds)
  • stack-based architecture (word size of 256 bits)
  • immutable program code, loaded with the bytecode of the smart contract to be executed
  • volatile memory
  • set of environment variables and data

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Note: In this context will be meaning Extended VM and not Ethereum VM. (Though it is building an Ethereum like virtual machine)

Example of what we are trying to build

to write a simple interpreter that does simple arithmetic and bitwise logic operations is pretty straightforward

main.rs

enum Bytecode {
    // Define the bytecode instructions
    Add,
    Sub,
    Mul,
    Div,
    Push(i32),
    Halt,
}

fn interpreter(bytecode: Vec<Bytecode>, gas_limit: u32) {
    let mut stack: Vec<i32> = Vec::new();
    let mut pc: usize = 0;
    let mut gas_left = gas_limit;

    while pc < bytecode.len() && gas_left > 0 {
        match bytecode[pc] {
            Bytecode::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
                gas_left -= 1;
            }
            Bytecode::Sub => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a - b);
                gas_left -= 1;
            }
            Bytecode::Mul => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a * b);
                gas_left -= 1;
            }
            Bytecode::Div => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a / b);
                gas_left -= 1;
            }
            Bytecode::Push(value) => {
                stack.push(value);
                gas_left -= 1;
            }
            Bytecode::Halt => {
                break;
            }
        }
        pc += 1;
    }

    if gas_left == 0 {
        println!("Out of gas!");
    } else if let Some(result) = stack.pop() {
        println!("Result: {}", result);
    } else {
        println!("Stack is empty!");
    }
}

fn compile(bytecode_str: &str) -> Vec<Bytecode> {
    let bytecode = bytecode_str
        .split(',')
        .map(|opcode| match opcode.trim() {
            "ADD" => Bytecode::Add,
            "SUB" => Bytecode::Sub,
            "MUL" => Bytecode::Mul,
            "DIV" => Bytecode::Div,
            "HALT" => Bytecode::Halt,
            inst => {
                let value = inst.replace("PUSH ", "");
                if let Ok(num) = value.parse::<i32>() {
                    Bytecode::Push(num)
                } else {
                    panic!("Invalid bytecode instruction: {}", value);
                }
            }
        })
        .collect();
    bytecode
}

fn main() {
    // Example bytecode as a string
    let bytecode_str = "PUSH 5, PUSH 3, ADD, HALT";

    let gas_limit = 100;
    let bytecode = compile(bytecode_str);
    interpreter(bytecode, gas_limit);
}

In this example we wrote a machine that allows you do do arithmetic operations: addition, sub, divisions, multiply without stack overflow or underflow in a manner that each operation is metered through the gas. This minimal machine does not have

  • minimal error handling
  • required instructions set
  • control flow operations, we cannot branch or call code with this yet
  • there is no blockchain specific functions
  • though it's deterministic, it's very minimal

Supported Operations

Our blockchain is going to support the following operations. We go through and add support for each group as needed.

The VM instruction set offers most of the operations you might expect, including:

  • Arithmetic and bitwise logic operations
  • Execution context inquiries
  • Stack, memory access
  • Control flow operations
  • Logging, calling, and other operators

Arithmetic operations

  • ADD //Add the top two stack items
  • MUL //Multiply the top two stack items
  • SUB //Subtract the top two stack items
  • DIV //Integer division
  • SDIV //Signed integer division
  • MOD //Modulo (remainder) operation
  • SMOD //Signed modulo operation
  • ADDMOD //Addition modulo any number
  • MULMOD //Multiplication modulo any number
  • EXP //Exponential operation
  • SIGNEXTEND //Extend the length of a two's complement signed integer
  • KECCAK256 // compute the hash of a block of memory

The operations: ADD, MUL, SUB, DIV, SDIV, MOD, SMOD common stack operations that most people are familiar, the only difference is that is these operations performed on 256 bit stack.

ADDMOD, MULMOD, KECCACK256 operations to support cryptograpy.

System operations

  • LOGx
  • CREATE
  • CALL
  • CALLCODE
  • RETURN
  • STATICCALL
  • REVERT
  • INVALID
  • SELFDESTRUCT

Environmental operations

  • GAS
  • ADDRESS
  • BALANCE
  • ORIGIN
  • CALLER
  • CALLVALUE
  • CALLDATALOAD
  • CALLDATASIZE
  • CALLDATACOPY
  • CODESIZE
  • CODECOPY
  • GASPRICE
  • EXTCODECOPY
  • EXTCODESIZE
  • RETURNDATASIZE
  • RETURNDATACOPY

Block operations

  • BLOCKHASH
  • COINBASE
  • TIMESTAMP
  • NUMBER
  • DIFFICULTY
  • GASLIMIT

Logic operations

  • LT //Less-than comparison
  • GT //Greater-than comparison
  • SLT //Signed less-than comparison
  • SGT //Signed greater-than comparison
  • EQ //Equality comparison
  • ISZERO
  • AND //Bitwise AND operation
  • OR //Bitwise OR operation
  • XOR //Bitwise XOR operation
  • BYTE //Retrieve a single byte from a full-width 256-bit word

Common vm logic operations. Please check the source code for implementations details.

Stack, memory operations

  • POP //Remove the top item from the stack
  • MLOAD //Load a word from memory
  • MSTORE //Save a word to memory
  • MSTORE8 //Save byte to memory
  • SLOAD // Load a word from storage
  • SSTORE // Save a word to storage
  • MSIZE // Get the size of the active memory in bytes
  • PUSHx // Place x byte item on the stack, where x cab be any integer from 1 to 32
  • DUPx // Duplicate x-th stack item, where x cab e any integer from 1 to 16
  • SWAPx // Exchange 1st and (x+1)-th item, where x can be any integer

These are also common vm operations.

Process flow operations

  • STOP
  • JUMP
  • JUMPI
  • PC
  • JUMPDEST

Common process flow operations. Check source code.

Plan

To implement the above operations we need to introduce and refactor our minimal applications and add the following

  • Stack: we need to introduce a Stack data structure with 256bit of word size
  • Memory: we need to abstract the memory and define what it is
  • Instructions: we have a long list of instruction set to support, abstracting them would be nice
  • Contract: volatile ROM for accessing contract code
  • Access to the world state, account/contract storage, and execution environment variables
  • we are not going into opcodes, the current implementation still going to use string based compilation
  • Add control flow ready interpreter, current doesn't allow jumps

interpreter

the looop

instruction calling

Operations will fall into the following categories stack operations

pub fn operation( pc: &mut u64, interpreter: &mut interpreter::Interpreter, scope: &mut interpreter::ScopeContext, ) -> Result<Vec<u8>, Error> { let x = match scope.stack.pop() { Some(value) => value, None => return Err(Error::StackUnderflow), }; let y = match scope.stack.peek() { Some(value) => value, None => return Err(Error::StackUnderflow), }; if let Some(result) = do_something_with_operands(x, y) { *y = result; Ok(Vec::new()) } else { Err(Error::Overflow) } }

memory operation interpreterer.statedb operation (accessing world state) scope.contract scope.memory other parts that use the ScopeContext

gas

In this implementation gas is static for all call, so this is not providing the right economic security model, and it's gameable. We update the gas implemenation in another part of this series.

context

  • stack
  • memory
  • contract

extended vm context

the following operation will access the extended context

contract

Implementation

TODO

Notes about VM

  • security considerations when it comes to vm implementation
  • Stack-based architecthure to hybrid register based ones
  • how to handle EIPs (practice)
  • how to handle backward and forward compatibility
  • supporting different chains and their specific instructions sets
  • constant gas vs dynamic gas
  • error handling, error codes