# Can LLMs be PROVABLE computers?
Percepta published a paper recently. Most people missed it.
The title was "Can LLMs Be Computers?" The answer was yes. Not in the hand-wavy "LLMs can do math sometimes" sense. In the literal sense. They compiled a WASM interpreter into transformer weights. The forward pass became a CPU. Attention became memory access. Feed-forward layers became instruction execution. The transformer stopped predicting tokens and started computing deterministically. Same input, same output, every time.
Karpathy praised it. Hacker News debated it. The AI community moved on.
But nobody asked the follow-up question.
If a transformer is a computer, how do you know it computed correctly?
## I. Trust the compiler
The uncomfortable truth about Percepta's work.
The weights are not learned. They are compiled from a WASM interpreter. The model does not train on data. It executes programs by mechanical application of its compiled weights.
This is brilliant. It is also a trust assumption.
You are trusting the compiler, the runtime, the attention mechanism, the feed-forward layers. If any of these are wrong, the output is wrong. And you have no way to check except by re-running the program yourself.
This is the oldest problem in computing. You either trust the machine or you verify the output. "Trust" has never scaled.
When a transformer claims to compute Fibonacci(8) = 21, we just... believe it?
## II. What verification actually means
There are three levels of confidence you can have that a computation happened correctly.
**Level 1: It ran.** The program executed inside the system and produced an output. This is what Percepta demonstrated. The transformer is the computer. It runs programs. You see the result.
But seeing a result is not the same as knowing it is correct.
**Level 2: It ran the same way everywhere.** The same compiled program produces the same result across four independent runtimes: transformer, native interpreter, Burn tensors, ONNX. Four different implementations of the same semantics, all agreeing.
Stronger. But still a proof by consensus, not a proof by math.
**Level 3: A verifier checked it without re-running it.** The execution produces a cryptographic proof. A standalone verifier, which never ran the program, checks that proof. It does not trust anyone. It checks the math.
This is a STARK proof. Transparent (no trusted setup), post-quantum (hash-based, no elliptic curves), and verification is O(log² n): exponentially cheaper than re-execution.
Level 3 is what we built.
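A rough sense of the gap between re-execution and verification, ignoring constants (illustrative arithmetic only, not the prover's actual cost model):

```python
# Rough arithmetic behind "O(log^2 n) verification": for a trace of n steps,
# re-execution costs on the order of n, while checking a STARK proof costs
# on the order of log2(n)^2 (up to constants). Illustration, not a benchmark.
import math

for n in [103, 10**6, 10**9]:
    check = math.log2(n) ** 2
    print(f"n={n:>10}: ~{n} steps to re-execute vs ~{check:.0f} units to verify")
```

At a billion steps, re-execution costs a billion units of work while verification stays under a thousand. That is what "exponentially cheaper" buys.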
## III. The smallest possible demo
The conventional way an LLM handles "what is 3 + 5":
```
LLM writes "python -c 'print(3+5)'" → external interpreter returns 8
```
The model specified the computation. It did not execute it.
The transformer-as-computer way:
```asm
.memory 4
LOADI 5
ADD 3
HALT
```
Three instructions, compiled into the transformer's weights and executed there. The forward pass is the computation.
```
engine: transformer
steps: 3
halted: true
acc: 8
```
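To make the semantics concrete, here is a minimal accumulator machine running the three-instruction listing above. This is an illustrative Python reconstruction of the ISA semantics, not the project's Rust runtime; the `run` function and tuple encoding are invented for the sketch.

```python
# Minimal sketch of the addition program's semantics.
# Hypothetical reconstruction of the ISA; not the project's runtime.

def run(program):
    """Execute a list of (opcode, operand) pairs on an accumulator machine."""
    acc, pc, steps = 0, 0, 0
    while True:
        op, arg = program[pc]
        steps += 1
        if op == "LOADI":        # load immediate into the accumulator
            acc = arg
        elif op == "ADD":        # add immediate to the accumulator
            acc += arg
        elif op == "HALT":       # stop; report final state
            return {"steps": steps, "halted": True, "acc": acc}
        pc += 1

print(run([("LOADI", 5), ("ADD", 3), ("HALT", None)]))
# → {'steps': 3, 'halted': True, 'acc': 8}
```

Same three steps, same final accumulator as the transformer output above.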
The provable computer way. Same program. Same execution. One more step:
```
prove-stark addition.tvm → proof (70,057 bytes)
verify-stark proof → verified: true
```
70 KB of cryptographic evidence that the execution was correct. A verifier that never ran the program, never saw the internal state, checks the proof and confirms: yes, this is right.
Three regimes. External computation, internal computation, externally verifiable computation.
## IV. Fibonacci: where it gets real
Addition is a toy. No loops. No memory mutation.
So let us run a real program. Fibonacci(8) = 21. A loop that iterates, mutates memory cells, compares a counter to a target, and halts when the sequence is complete.
```asm
.memory 5
.init 0 0 ; F(n-2)
.init 1 1 ; F(n-1)
.init 3 0 ; loop counter
.init 4 7 ; iteration target
loop:
LOAD 3
SUBM 4
JZ done
LOAD 0
ADDM 1
STORE 2
LOAD 1
STORE 0
LOAD 2
STORE 1
LOAD 3
ADD 1
STORE 3
JMP loop
done:
LOAD 1
HALT
```
This is a program. Loop, conditional branching, memory reads and writes. It runs for 103 steps inside the transformer.
```
engine: transformer
steps: 103
halted: true
acc: 21
memory: [13, 21, 21, 7, 7]
throughput: 10,517 steps/sec
```
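The loop can be traced with the same kind of sketch. The interpreter below is a hypothetical Python reconstruction of the ISA, not the project's runtime; it resolves the listing's labels to addresses and reproduces the reported final state.

```python
# Sketch: reference interpreter for the Fibonacci listing above.
# Hypothetical reconstruction of the ISA semantics, not the project's runtime.

def run(program, memory):
    acc, pc, steps = 0, 0, 0
    while True:
        op, arg = program[pc]
        steps += 1
        pc += 1
        if op == "LOAD":    acc = memory[arg]           # acc <- mem[arg]
        elif op == "STORE": memory[arg] = acc           # mem[arg] <- acc
        elif op == "ADDM":  acc += memory[arg]          # acc += mem[arg]
        elif op == "SUBM":  acc -= memory[arg]          # acc -= mem[arg]
        elif op == "ADD":   acc += arg                  # immediate add
        elif op == "JZ":    pc = arg if acc == 0 else pc
        elif op == "JMP":   pc = arg
        elif op == "HALT":
            return {"steps": steps, "acc": acc, "memory": memory}

# The listing with labels resolved to addresses (loop = 0, done = 14).
LOOP, DONE = 0, 14
fib = [
    ("LOAD", 3), ("SUBM", 4), ("JZ", DONE),   # counter reached target?
    ("LOAD", 0), ("ADDM", 1), ("STORE", 2),   # F(n) = F(n-2) + F(n-1)
    ("LOAD", 1), ("STORE", 0),                # shift the window
    ("LOAD", 2), ("STORE", 1),
    ("LOAD", 3), ("ADD", 1), ("STORE", 3),    # counter += 1
    ("JMP", LOOP),
    ("LOAD", 1), ("HALT", None),              # done: acc = F(8)
]
print(run(fib, [0, 1, 0, 0, 7]))
# → {'steps': 103, 'acc': 21, 'memory': [13, 21, 21, 7, 7]}
```

Seven iterations of a 14-instruction loop body, a final loop check, and two closing instructions: 103 steps, matching the transformer run above.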
The transformer did not predict 21. It computed it, instruction by instruction, through its forward pass.
Four independent engines confirm it:
```
verified_all: true
verified_all_steps: 103
verified_all_engines: transformer, native, burn, onnx
```
Same program. Same 103 steps. Same answer. Lockstep differential verification.
Now prove it:
```
prove-stark fibonacci.tvm → proof (159,007 bytes)
verify-stark proof → verified: true
```
159 KB. A standalone cryptographic proof that a 103-step Fibonacci computation was executed correctly.
## V. This is a real transformer
Obvious objection: "You wrote a Rust program that executes instructions. Where is the transformer?"
The execution loop calls `model.step()` at every cycle. The model encodes the current machine state into a token of dimension d = 36. Attention heads, backed by a hull-based KV cache, perform memory reads over 2D point sets. Each memory address maintains a write history as `(step, value)` pairs. The query direction `[1, 0]` selects the most recent write via argmax over the convex hull. Feed-forward layers implement compiled instruction logic as gated linear transformations.
Same architecture Percepta describes. 2D attention over convex hulls. Compiled weights in the FFN. The difference is that we then take the execution trace and prove it.
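The "read the latest write" trick can be sketched in a few lines. This is an illustration of the idea, not the project's hull-backed cache: the argmax of a linear functional over a point set is always attained at a vertex of its convex hull, which is why storing only the hull of the `(step, value)` history suffices.

```python
# Sketch of attention-as-memory-read: one cell's write history is a set of
# 2D points (step, value); a query direction picks a point by argmax of the
# dot product. Illustrative reconstruction, not the project's KV cache.

def attention_read(history, query):
    """Return the value of the history point maximizing <query, point>."""
    best = max(history, key=lambda p: query[0] * p[0] + query[1] * p[1])
    return best[1]

# Hypothetical write history for one memory cell:
# value 0 at step 0, 13 at step 40, 21 at step 90.
history = [(0, 0), (40, 13), (90, 21)]

# Query direction [1, 0] weights only the step coordinate, so the argmax
# is the most recent write -- a hard "read latest value" operation.
print(attention_read(history, (1, 0)))   # → 21
```

A different query direction would select a different extreme point of the same hull, which is what makes the cache a general read mechanism rather than a special case.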
The four-engine verification exists precisely to prevent the "this is just Rust" objection. If the result came only from our transformer runtime, you could argue it is cosplaying as a transformer. But the same compiled program runs through:
1. The transformer runtime (encode, attend, FFN loop with hull-backed memory)
2. A native interpreter (direct ISA semantics, no transformer structure)
3. Burn tensors (same compiled weights, different tensor framework)
4. ONNX/Tract (exported model in a portable neural network format)
All four agree at every step. The computation is not trapped inside one set of custom Rust structs.
## VI. Recursion: this is really a computer
Loops and memory are necessary but not sufficient. A real computer handles subroutines. Function calls. Stack frames. Recursion.
Recursive factorial. 5! = 120.
```asm
.memory 11
LOADI 5
CALL fact
HALT
fact:
JZ fact_base
PUSH
SUB 1
CALL fact
STORE 0
POP
MULM 0
RET
fact_base:
LOADI 1
RET
```
CALL pushes a return address onto the stack and jumps. RET pops it and returns. PUSH and POP save and restore the accumulator across calls. The function calls itself recursively, unwinding the stack as it multiplies 5 * 4 * 3 * 2 * 1.
```
engine: transformer
steps: 46
halted: true
acc: 120
```
46 steps. Five recursive calls. Stack frames pushed and popped. All inside the forward pass.
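The call/return mechanics can be sketched the same way. This is a hypothetical Python reconstruction, not the project's runtime or stack layout: a single stack interleaves return addresses (CALL/RET) and saved accumulators (PUSH/POP), which works because the two nest properly.

```python
# Sketch: interpreter extended with a stack for CALL/RET and PUSH/POP.
# Hypothetical reconstruction; the real runtime's stack layout inside the
# 11-cell memory may differ, but the semantics match the listing above.

def run(program, memory):
    acc, pc, steps, stack = 0, 0, 0, []
    while True:
        op, arg = program[pc]
        steps += 1
        pc += 1
        if op == "LOADI":   acc = arg
        elif op == "SUB":   acc -= arg
        elif op == "MULM":  acc *= memory[arg]         # acc *= mem[arg]
        elif op == "STORE": memory[arg] = acc
        elif op == "PUSH":  stack.append(acc)          # save accumulator
        elif op == "POP":   acc = stack.pop()          # restore accumulator
        elif op == "CALL":  stack.append(pc); pc = arg # push return address
        elif op == "RET":   pc = stack.pop()
        elif op == "JZ":    pc = arg if acc == 0 else pc
        elif op == "HALT":
            return {"steps": steps, "halted": True, "acc": acc}

FACT, BASE = 3, 11   # label addresses in the flattened listing
prog = [
    ("LOADI", 5), ("CALL", FACT), ("HALT", None),
    # fact:
    ("JZ", BASE), ("PUSH", None), ("SUB", 1), ("CALL", FACT),
    ("STORE", 0), ("POP", None), ("MULM", 0), ("RET", None),
    # fact_base:
    ("LOADI", 1), ("RET", None),
]
print(run(prog, [0] * 11))
# → {'steps': 46, 'halted': True, 'acc': 120}
```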
```
prove-stark factorial_recursive.tvm → proof (206,581 bytes)
verify-stark proof → verified: true
```
Recursive function calls inside a transformer, with a cryptographic proof. You can check it yourself.
## VII. Why the trace is a natural proof witness
A STARK proof takes an execution trace, a table of machine states where each row follows deterministically from the previous one, and proves that this table satisfies a set of polynomial constraints called an Algebraic Intermediate Representation (AIR).
Look at what a transformer-computer's execution trace already is. A table of `(PC, ACC, SP, flags, memory)` rows. Each row is the machine state after one step. Each row follows from the previous by the semantics of whatever instruction the program counter points to.
The transition constraints of the AIR are the instruction semantics. The boundary constraints are the initial and final states. The trace is not something you retrofit. It falls directly out of the computation.
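The row-by-row relation can be sketched directly. The checker below is illustrative Python over the three-instruction addition program; the real AIR expresses the same relation as polynomial constraints over a field, and the STARK proves it without the verifier re-running anything. Here we just re-execute, to show what the constraints say.

```python
# Sketch: the execution trace as a proof witness. Each row is a machine
# state; the checker verifies every transition against the instruction
# semantics -- the relation an AIR encodes as polynomial constraints.
# Plain re-execution check, not the STARK itself.

def transition_ok(program, row, nxt):
    """Does state `nxt` follow from state `row` under the instruction at pc?"""
    pc, acc = row
    op, arg = program[pc]
    if op == "LOADI": return nxt == (pc + 1, arg)
    if op == "ADD":   return nxt == (pc + 1, acc + arg)
    if op == "HALT":  return nxt == (pc, acc)   # halted state repeats
    return False

program = [("LOADI", 5), ("ADD", 3), ("HALT", None)]
trace = [(0, 0), (1, 5), (2, 8), (2, 8)]        # rows of (pc, acc)

# Boundary constraints: initial and final state.
assert trace[0] == (0, 0) and trace[-1][1] == 8
# Transition constraints: every adjacent pair of rows.
assert all(transition_ok(program, a, b) for a, b in zip(trace, trace[1:]))
print("trace satisfies all constraints")
```

Change any cell of the trace and some transition check fails, which is exactly the property the polynomial constraints enforce.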
```
question / task
→
compiled .tvm program
→
transformer execution (encode → attend → FFN)
→
equivalence checks (native / burn / onnx)
→
canonical VM trace
→
AIR (polynomial constraints from instruction semantics)
→
STARK proof (FRI + Merkle commitments)
→
external verifier (O(log² n), no re-execution)
```
The computation happens inside the transformer. The trace becomes the witness. The STARK proves the witness satisfies the constraints. The verifier checks the proof without re-executing anything.
Provability was not bolted on. It falls out of how execution works.
## VIII. What this does not prove
The STARK does not prove transformer activations directly. It proves the VM execution relation: that state transitions follow instruction semantics. The connection to the transformer goes through lockstep equivalence across four engines. The canonical execution is then proven with the STARK.
Engineering proof: transformer matches canonical semantics. Mathematical proof: canonical execution is cryptographically correct. Two links in a chain. I am not claiming we prove neural activations. That is a different problem.
The ISA is custom, not WASM. Twenty-six instructions. Arithmetic, memory, stack, control flow. Enough for real programs. Not a general-purpose frontend.
The proof is transparent, not zero-knowledge. The claim is public: program, attention mode, step count, final state. You cannot prove computation over private inputs yet.
Eighteen of twenty-six instructions are proof-backed today. The `average-hard` attention mode is the only one the STARK supports. Bitwise and compare instructions execute but are not yet in the AIR.
## IX. Where this goes
Today this is a research system. A custom ISA. A vanilla STARK prover. Programs that fit in 256 instructions. 128-bit field, Blake2b Merkle trees, FRI for the low-degree test.
The vanilla STARK prover gets replaced by STWO, StarkWare's production Circle STARK over Mersenne prime fields. Faster proving.
The proof-supported instruction set expands by targeting a general-purpose zkVM such as Cairo.
Zero-knowledge becomes possible. Private inputs. You prove you computed the right answer without revealing what the inputs were.
Transformers become computers. Execution traces become proof witnesses. STARKs verify the computation. The gap between "the model says so" and "the math proves it" closes.
## X. The punchline
A computer that cannot be verified is just a black box with extra steps. You gain determinism. You do not gain trust.
Testing tells you "it worked this time." Proofs tell you "it is correct."
When a transformer executes a program and produces a STARK proof, that proof is a permanent, independently verifiable certificate. Anyone can check it on any machine without the program, the transformer, or your word. They verify the math and they know.
Percepta asked: can LLMs be computers?
We are asking: can those computers be provable?
For a supported deterministic subset today, yes. The program executes inside the transformer. The trace becomes a STARK witness. The proof verifies outside the model.
A transformer that computes. A STARK that proves it.
That is the foundation. Everything else is engineering.
---
**Repository:** [github.com/AbdelStark/llm-provable-computer](https://github.com/AbdelStark/llm-provable-computer)