# Computing the square root on Ben Eater's breadboard computer (upgraded RAM)

###### tags: `8bit`

It's surprising how much power a simple computer architecture such as the SAP-1 has. This is a report of my journey to create a program that computes the square root of any integer number $0 \leq S \leq 255$. Unfortunately this is impossible in just 16 bytes of RAM, so I will be using a version of Ben Eater's breadboard computer that has had its memory expanded to 256 bytes, following the guide of [/u/MironV](https://www.reddit.com/r/beneater/comments/h8y28k). However, we will not be using any extra instructions, flags or control lines.

The code and an emulator to run it are available here: https://github.com/wmvanvliet/8bit/tree/ext_memory.

## The Babylonian method for computing the square root

The method we will use to compute the square root is known as the [Babylonian method](https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method). In C, it goes like this:

```c
#include <stdint.h>

uint8_t S = 49;       // number we want to compute the sqrt of
uint8_t x = S / 10;   // initial guess of the sqrt
if(x == 0) x = 1;     // to prevent divide-by-zero errors
uint8_t x_prev = 0;   // our previous guess, used to track convergence
while(x != x_prev) {  // check for convergence
    x_prev = x;
    x = (x + S / x) / 2;  // this brings `x` closer to the sqrt
}
// `x` now contains the square root
```

The algorithm is easy enough to wrap your head around. However, there is a challenge: our computer does not have a native divide instruction. We will have to make one.

## Program to divide a number by another number

Ben Eater [demonstrated](https://youtu.be/Zg1NdPKoosU?t=1971) a program to multiply two numbers. In order to compute $a \times b$, we start with 0, then loop $b$ times, each time adding $a$ to the result.
In C:

```c
uint8_t a = 2;  // compute a * b
uint8_t b = 4;
uint8_t answer = 0;  // answer is stored here
while(b > 0) {
    answer += a;
    b--;
}
// `answer` now contains a * b
```

Division can be performed by [reversing the polarity](https://youtu.be/k1prJr9VIaY) of the multiplication algorithm. To compute $a / b$, we start with $a$, then enter a loop, each time subtracting $b$ from $a$ until $b$ no longer fits in $a$, that is, until $b > a$. The number of times we iterated the loop is the answer, and whatever is left of $a$ is the remainder of the division. In C:

```c
uint8_t a = 14;  // compute a / b
uint8_t b = 2;
uint8_t answer = 0;
while(b <= a) {
    a -= b;
    answer++;
}
// `answer` now contains a / b
// `a` now contains the remainder
```

The tricky part here is determining whether $b \leq a$. We will do this by performing the subtraction and checking whether the result is negative. How do we check for a negative result? Many CPUs have a `neg` flag for this. Ours doesn't, but we do have a `carry` flag! Remember that subtraction is implemented in our ALU as addition of the two's complement. For example, 200 - 100 is:

```
  11001000   binary representation of 200
  10011100   twos-complement of 100 (01100100 in binary)
---------- +
1 01100100   carry flag is set!
```

And 100 - 200 is:

```
  01100100   binary representation of 100
  00111000   twos-complement of 200 (11001000 in binary)
---------- +
0 10011100   carry flag is not set!
```

Whenever we tell our ALU to subtract two numbers, the `carry` flag will be set if the result was *positive*. Let's also check the edge case of the result being zero:

```
  01100100   binary representation of 100
  10011100   twos-complement of 100 (01100100 in binary)
---------- +
1 00000000   carry flag is set! (and zero flag is set)
```

There's one final edge case: $0 - 0$. From the examples above, you might expect that the `carry` flag will not be set (the two's complement of 0 is 0, and 0 + 0 produces no carry), but I've been simplifying things a bit. Our ALU computes the two's complement by XOR-ing all the bits with 1, and setting the `carry-in` pin of the first adder.
So the computation is actually:

```
  00000000   binary representation of 0
  11111111   0 fed through the XOR gates
  00000001   carry-in pin is set, which adds 1
---------- +
1 00000000   carry flag is set! (and zero flag is set)
```

So, whenever our ALU computes $a - b$, the `carry` flag indicates $b \leq a$, and the absence of the flag indicates $b > a$. With this knowledge, we can write our division program:

```
;
; Program that computes a / b
;
loop:
    lda a
    sub b
    jc  iter   ; b <= a, iterate the loop
    jmp end    ; b > a, end the loop
iter:
    sta a
    ldi 1
    add ans
    sta ans
    jmp loop
end:
    lda ans
    out
    hlt
a:
    db 14
b:
    db 2
ans:
    db 0
```

This program takes up 15 bytes, so it also fits in the memory of the original breadboard computer architecture.

## Implementing subroutines

If you scroll back up to the C version of the square root program, you'll see that we need to divide three times. It would therefore be nice if we could implement our division program as a subroutine. Unfortunately, our breadboard computer does not have `call` and `ret` instructions. Or a stack. We'll have to get creative.

The lack of a hardware stack is not really a problem. Normally, it is used to save/restore the state of the registers, but we only have the "A" register accessible from code, and we might as well put the return value of our subroutine in there. The stack is also used to pass parameters to the subroutine, but we can just put them somewhere in memory where our subroutine can find them.

Our problem is the return address. At the end of the subroutine, the program needs to jump back to where it was called from. But we only have a `jmp` instruction that takes a hardcoded jump address as its parameter. We'd like to perform an [indirect jump](https://en.wikipedia.org/wiki/Indirect_branch). We're going to have to resort to something that is normally considered a bad idea: overwriting the hardcoded parameter of the `jmp` instruction at runtime by writing to its memory location.
In the original architecture, a `jmp` instruction only takes one byte of memory:

```
0110 1010  jmp 10
```

In the upgraded memory architecture, we need a full byte for the memory address, so the `jmp` instruction takes two bytes of memory:

```
0000 0110  jmp
0000 1010  10
```

Conveniently, the parameter now has a memory address of its own. Here is an example of a subroutine in action in the extended memory architecture ([example on the original architecture](https://github.com/wmvanvliet/8bit/blob/main/example_programs/subroutine.asm)):

```
;
; Demonstrate calling a subroutine from two different locations.
;
    ldi cont1        ; set return address (cont1)
    sta sub_ret + 1  ; overwrite parameter of return jump instruction of subroutine
    ldi 1            ; setup argument for the subroutine
    jmp sub          ; call the subroutine
cont1:
    ldi cont2        ; set return address (cont2)
    sta sub_ret + 1  ; overwrite parameter of return jump instruction of subroutine
    ldi 2            ; setup argument for the subroutine
    jmp sub          ; call the subroutine
cont2:
    ldi 3            ; subroutine returns to here
    out
    hlt

; A subroutine that displays the current value of the A register
sub:
    out
sub_ret:             ; this label indicates the address of the jmp instruction
    jmp 0            ; jump to the return address (param set at runtime)
```

The assembler is clever enough to compute `sub_ret + 1` when resolving the labels.

## Putting it all together

Now we can write a subroutine to perform division, and with that write the square root program. Here we compute $\sqrt{49} = 7$:

```
;
; Compute the square root of a number, based on the Babylonian method.
;
; Compute initial guess of the sqrt (S / 10)
    lda S            ; prepare for computing S / 10
    sta numer
    ldi 10
    sta denom
    ldi 0
    sta ans
    ldi a            ; setup return address
    sta div_ret + 1  ; address of the param of the jmp instruction
    jmp div          ; call division subroutine
a:
    lda ans          ; subroutine returns here
    sta x            ; this is our initial guess for the sqrt
    jz  set_to_one   ; guess should not be zero to prevent divide-by-zero errors
    jmp iter         ; if not zero, start refining the guess
set_to_one:
    ldi 1
    sta x

; Refine the guess of the sqrt
iter:
    lda S            ; prepare for computing S / x
    sta numer
    lda x
    sta denom
    ldi 0
    sta ans
    ldi b            ; setup return address
    sta div_ret + 1  ; address of the param of the jmp instruction
    jmp div          ; call division subroutine
b:
    lda ans          ; subroutine returns here
    add x            ; prepare for computing (x + S / x) / 2
    sta numer
    ldi 2
    sta denom
    ldi 0
    sta ans
    ldi c            ; setup return address
    sta div_ret + 1  ; address of the param of the jmp instruction
    jmp div          ; call division subroutine
c:
    lda ans          ; subroutine returns here
    sta x
    out              ; display our refined guess
    sub x_prev       ; check for convergence
    jz  end
    lda x            ; not converged yet. x_prev = x and loop
    sta x_prev
    jmp iter
end:
    hlt

; Subroutine for division
div:
    lda numer        ; start computing numer / denom
    sub denom
    jc  div_iter
div_ret:
    jmp 0            ; we are done, jump to return address
div_iter:
    sta numer
    ldi 1            ; increment `ans` by 1
    add ans
    sta ans
    jmp div          ; loop

; Variables
S:
    db 49            ; number to compute the sqrt of
x:
    db 0             ; current guess of the square root
x_prev:
    db 0             ; used to track convergence
numer:
    db 0             ; first parameter for division subroutine
denom:
    db 0             ; second parameter for division subroutine
ans:
    db 0             ; result of the division subroutine
```
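To sanity-check the logic without the hardware, the program above can be mirrored in C. The following is only a rough sketch under a few assumptions: the function names `alu_sub`, `div_u8` and `sqrt_u8` and the `max_iter` parameter are hypothetical and do not appear in the repository, and `alu_sub` models the subtract path as described earlier (operand XOR-ed with all ones, plus the carry-in).

```c
#include <stdint.h>

/* Hypothetical model of the ALU subtract path: a + (b XOR 0xFF) + 1.
 * Returns the 8-bit result and stores the carry-out (0 or 1) in *carry. */
static uint8_t alu_sub(uint8_t a, uint8_t b, int *carry)
{
    unsigned sum = (unsigned)a + (unsigned)(uint8_t)~b + 1u;
    *carry = (int)((sum >> 8) & 1u);  /* set exactly when b <= a */
    return (uint8_t)sum;
}

/* Mirrors the `div` subroutine: subtract denom until the carry clears.
 * Like the assembly, this never terminates for denom == 0. */
static uint8_t div_u8(uint8_t numer, uint8_t denom)
{
    uint8_t ans = 0;
    for (;;) {
        int carry;
        uint8_t diff = alu_sub(numer, denom, &carry);
        if (!carry)
            return ans;   /* denom > numer: done */
        numer = diff;     /* corresponds to `sta numer` */
        ans++;
    }
}

/* Mirrors the main program. `max_iter` is an assumption of this sketch,
 * not part of the assembly: it stops the loop if the convergence test
 * is never satisfied. */
static uint8_t sqrt_u8(uint8_t S, int max_iter)
{
    uint8_t x = div_u8(S, 10);  /* initial guess */
    if (x == 0)
        x = 1;                  /* avoid dividing by zero */
    uint8_t x_prev = 0;
    for (int i = 0; i < max_iter; i++) {
        x = div_u8((uint8_t)(x + div_u8(S, x)), 2);
        if (x == x_prev)
            break;              /* converged */
        x_prev = x;
    }
    return x;
}
```

Calling `sqrt_u8(49, 16)` walks through the guesses 4, 8, 7, 7 and returns 7, the same sequence the `out` instruction would display after the initial guess. The iteration cap matters in this model because the integer iteration can alternate between two neighbouring values for some inputs (for `S = 3` it alternates between 2 and 1), in which case `x == x_prev` is never reached.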