# Computing the square root on Ben Eater's breadboard computer (upgraded RAM)
###### tags: `8bit`
It's surprising how much power a computer architecture as simple as the SAP-1 has.
This is a report of my journey to create a program that computes the square root of any integer number $0 \leq S \leq 255$.
Unfortunately this is impossible in just 16 bytes of RAM, so I will be using a version of Ben Eater's breadboard computer that has had its memory expanded to 256 bytes, following the guide of [/u/MironV](https://www.reddit.com/r/beneater/comments/h8y28k).
However, we will not be using any extra instructions, flags or control lines.
The code and an emulator to run it are available here: https://github.com/wmvanvliet/8bit/tree/ext_memory.
## The Babylonian method for computing the square root
The method we will use to compute the square root is known as the [Babylonian method](https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method).
In C, it goes like this:
```c
#include <stdint.h>

uint8_t S = 49;      // number we want to compute the sqrt of
uint8_t x = S / 10;  // initial guess of the sqrt
if (x == 0)
    x = 1;           // to prevent divide-by-zero errors
uint8_t x_prev = 0;  // our previous guess, used to track convergence
while (x != x_prev) {     // check for convergence
    x_prev = x;
    x = (x + S / x) / 2;  // this brings `x` closer to the sqrt
}
// `x` now contains the square root
```
The algorithm is easy enough to wrap your head around.
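To see the loop in action, here is a hand trace for $S = 49$, assuming every division truncates to a whole number as it does for unsigned integers in C. The subscripted names $x_0 \ldots x_3$ are only labels for this trace and do not appear in the code:

$$
\begin{aligned}
x_0 &= \lfloor 49 / 10 \rfloor = 4 \\
x_1 &= \lfloor (4 + \lfloor 49 / 4 \rfloor) / 2 \rfloor = \lfloor (4 + 12) / 2 \rfloor = 8 \\
x_2 &= \lfloor (8 + \lfloor 49 / 8 \rfloor) / 2 \rfloor = \lfloor (8 + 6) / 2 \rfloor = 7 \\
x_3 &= \lfloor (7 + \lfloor 49 / 7 \rfloor) / 2 \rfloor = \lfloor (7 + 7) / 2 \rfloor = 7
\end{aligned}
$$

Since $x_3 = x_2$, the loop stops with $x = 7$.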
However, there is a challenge.
Our computer does not have a native divide instruction.
We will have to make one.
## Program to divide a number by another number
Ben Eater [demonstrated](https://youtu.be/Zg1NdPKoosU?t=1971) a program to multiply two numbers.
In order to compute $a \times b$, we start with 0, then loop $b$ times, each time adding $a$ to the result.
In C:
```c
#include <stdint.h>

uint8_t a = 2;       // compute a * b
uint8_t b = 4;
uint8_t answer = 0;  // answer is stored here
while (b > 0) {
    answer += a;
    b--;
}
// `answer` now contains a * b
```
Division can be performed by [reversing the polarity](https://youtu.be/k1prJr9VIaY) of the multiplication algorithm.
To compute $a / b$, we start with $a$, then enter a loop, each time subtracting $b$ from $a$ until $b$ no longer fits in $a$, that is $b > a$.
The number of times we iterated the loop is the answer, and whatever is left of $a$ is the remainder of the division.
In C:
```c
#include <stdint.h>

uint8_t a = 14;      // compute a / b
uint8_t b = 2;
uint8_t answer = 0;
while (b <= a) {
    a -= b;
    answer++;
}
// `answer` now contains a / b
// `a` now contains the remainder
```
The tricky part here is determining whether $b \leq a$.
We will do this by performing the subtraction and seeing whether the result is negative.
How do we check for a negative result?
Many CPUs have a `neg` flag for this.
Ours doesn't, but we do have a `carry` flag!
Remember that subtracting is implemented in our ALU as addition with the twos-complement.
For example, 200 - 100 is:
```
11001000 binary representation of 200
10011100 twos-complement of 100 (01100100 in binary)
---------- +
1 01100100 carry flag is set!
```
And 100 - 200 is:
```
01100100 binary representation of 100
00111000 twos-complement of 200 (11001000 in binary)
---------- +
0 10011100 carry flag is not set!
```
Whenever we tell our ALU to subtract two numbers, the `carry` flag will be set if the result was *positive*.
Let's also check the edge case of the result being zero:
```
01100100 binary representation of 100
10011100 twos-complement of 100 (01100100 in binary)
---------- +
1 00000000 carry flag is set! (and zero flag is set)
```
There's one final edge case: $0 - 0$.
From the examples above, you might expect that the `carry` flag will not be set, but I've been simplifying things a bit.
Our ALU computes the twos-complement by XOR-ing all the bits with 1, and setting the `carry-in` pin of the first adder.
So the computation is actually:
```
00000000 binary representation of 0
11111111 0 fed through the XOR gates
00000001 carry-in pin is set, which adds 1
---------- +
1 00000000 carry flag is set! (and zero flag is set)
```
So, whenever our ALU computes $a - b$, the `carry` flag indicates $b \leq a$, and the absence of the flag indicates $b > a$.
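This rule can be checked with a small C model of the adder. It is only a sketch of the arithmetic described above, not of the actual hardware, and the function name `sub_carry` is made up for illustration:

```c
#include <stdint.h>

/* Returns the carry-out (0 or 1) of the 8-bit addition a + ~b + 1,
   which is how the ALU described above computes a - b. */
int sub_carry(uint8_t a, uint8_t b) {
    unsigned inverted = (uint8_t)~b;   /* b fed through the XOR gates */
    unsigned sum = a + inverted + 1u;  /* the carry-in pin adds the 1 */
    return (sum >> 8) & 1u;            /* bit 8 is the carry flag */
}
```

Calling `sub_carry(200, 100)`, `sub_carry(100, 100)` and `sub_carry(0, 0)` each gives 1, while `sub_carry(100, 200)` gives 0, matching the four worked examples.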
With this knowledge, we can write our division program:
```
;
; Program that computes a / b
;
loop: lda a
sub b
jc iter ; b <= a, iterate the loop
jmp end ; b > a, end the loop
iter: sta a
ldi 1
add ans
sta ans
jmp loop
end: lda ans
out
hlt
a: db 14
b: db 2
ans: db 0
```
This program takes up 15 bytes, so it also fits in the 16 bytes of RAM of the original breadboard computer.
## Implementing subroutines
If you scroll back up to the C version of the square root program, you'll see that we need to divide three times.
It would therefore be nice if we could implement our division program as a subroutine.
Unfortunately, our breadboard computer does not have `call` and `ret` instructions.
Or a stack.
We'll have to get creative.
The lack of a hardware stack is not really a problem.
Normally, it is used to save/restore the state of the registers, but we only have the "A" register accessible from code, and we might as well put the return value of our subroutine in there.
The stack is also used to pass parameters to the subroutine, but we can just put them somewhere in memory where our subroutine can find them.
Our problem is the return address.
At the end of the subroutine, the program needs to jump back to where it was called from.
But we only have a `jmp` instruction that takes a hardcoded jump address as parameter.
We'd like to perform an [indirect jump](https://en.wikipedia.org/wiki/Indirect_branch).
We're going to have to resort to something that is normally considered a bad idea:
overwrite the hardcoded parameter of the `jmp` instruction at runtime by writing to its memory location.
In the original architecture, a `jmp` instruction only takes one byte of memory:
```
0110 1010 jmp 10
```
In the upgraded memory architecture, we need a full byte for the memory address, so the `jmp` instruction takes two bytes of memory:
```
0000 0110 jmp
0000 1010 10
```
Conveniently, the parameter is now on a memory address of its own.
Here is an example of a subroutine in action in the extended memory architecture ([example on the original architecture](https://github.com/wmvanvliet/8bit/blob/main/example_programs/subroutine.asm)):
```
;
; Demonstrate calling a subroutine from two different locations.
;
ldi cont1 ; set return address (cont1)
sta sub_ret + 1 ; overwrite parameter of return jump instruction of subroutine
ldi 1 ; setup argument for the subroutine
jmp sub ; call the subroutine
cont1: ldi cont2 ; set return address (cont2)
sta sub_ret + 1 ; overwrite parameter of return jump instruction of subroutine
ldi 2 ; setup argument for the subroutine
jmp sub ; call the subroutine
cont2: ldi 3 ; subroutine returns to here
out
hlt
; A subroutine that displays the current value of the A register
sub:
out
sub_ret: ; this label indicates the address of the jmp instruction
jmp 0 ; jump to the return address (param set at runtime)
```
The assembler is clever enough to be able to compute `sub_ret + 1` when resolving the labels.
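To make the trick concrete, here is a rough C sketch of a tiny interpreter running the same demo. It rests on several assumptions that are mine rather than taken from the emulator linked above: only the `jmp` opcode (`0000 0110`) comes from this article, the other opcode values are borrowed from the original 16-byte instruction set, every instruction is assumed to occupy two bytes (opcode, then operand), and the names `run` and `subroutine_demo` are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Opcode values: JMP is from the article, the rest are assumptions. */
enum { OP_STA = 0x04, OP_LDI = 0x05, OP_JMP = 0x06, OP_OUT = 0x0E, OP_HLT = 0x0F };

/* Runs the program in mem and records every displayed value in out.
   Returns the number of values displayed. */
size_t run(uint8_t mem[256], uint8_t *out, size_t max_out) {
    uint8_t pc = 0, a = 0;
    size_t n = 0;
    for (int steps = 0; steps < 1000; steps++) {  /* guard against runaway code */
        uint8_t op = mem[pc];
        uint8_t arg = mem[(uint8_t)(pc + 1)];
        pc = (uint8_t)(pc + 2);
        switch (op) {
        case OP_LDI: a = arg; break;
        case OP_STA: mem[arg] = a; break;  /* may overwrite an instruction operand */
        case OP_JMP: pc = arg; break;      /* operand is read from memory on every jump */
        case OP_OUT: if (n < max_out) out[n++] = a; break;
        default: return n;                 /* hlt or unknown opcode */
        }
    }
    return n;
}

/* Hand-assembled version of the demo program above. */
size_t subroutine_demo(uint8_t *out, size_t max_out) {
    enum { CONT1 = 8, CONT2 = 16, SUB = 22, SUB_RET = 24 };
    uint8_t mem[256] = {
        /*  0 */ OP_LDI, CONT1,
        /*  2 */ OP_STA, SUB_RET + 1,  /* patch the operand of the return jump */
        /*  4 */ OP_LDI, 1,
        /*  6 */ OP_JMP, SUB,
        /*  8 */ OP_LDI, CONT2,        /* cont1 */
        /* 10 */ OP_STA, SUB_RET + 1,
        /* 12 */ OP_LDI, 2,
        /* 14 */ OP_JMP, SUB,
        /* 16 */ OP_LDI, 3,            /* cont2 */
        /* 18 */ OP_OUT, 0,
        /* 20 */ OP_HLT, 0,
        /* 22 */ OP_OUT, 0,            /* sub */
        /* 24 */ OP_JMP, 0,            /* sub_ret: operand set at runtime */
    };
    return run(mem, out, max_out);
}
```

Under these assumptions, `subroutine_demo` displays 1, 2 and 3 in that order: the same `jmp` at `sub_ret` returns to `cont1` the first time and to `cont2` the second time, because its operand byte was rewritten in between.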
## Putting it all together
Now we can write a subroutine to perform division, and with that write the square root program.
Here we compute $\sqrt{49} = 7$:
```
;
; Compute the square root of a number, based on the Babylonian method.
;
; Compute initial guess of the sqrt (S / 10)
lda S ; prepare for computing S / 10
sta numer
ldi 10
sta denom
ldi 0
sta ans
ldi a ; setup return address
sta div_ret + 1 ; address of the param of the jmp instruction
jmp div ; call division subroutine
a: lda ans ; subroutine returns here
sta x ; this is our initial guess for the sqrt
jz set_to_one ; guess should not be zero to prevent divide-by-zero errors
jmp iter ; if not zero, start refining the guess
set_to_one:
ldi 1
sta x
; Refine the guess of the sqrt
iter:
lda S ; prepare for computing S / x
sta numer
lda x
sta denom
ldi 0
sta ans
ldi b ; setup return address
sta div_ret + 1 ; address of the param of the jmp instruction
jmp div ; call division subroutine
b: lda ans ; subroutine returns here
add x ; prepare for computing (x + S / x) / 2
sta numer
ldi 2
sta denom
ldi 0
sta ans
ldi c ; setup return address
sta div_ret + 1 ; address of the param of the jmp instruction
jmp div ; call division subroutine
c: lda ans ; subroutine returns here
sta x
out ; display our refined guess
sub x_prev ; check for convergence
jz end
lda x ; not converged yet. x_prev = x and loop
sta x_prev
jmp iter
end: hlt
; Subroutine for division
div: lda numer ; start computing numer / denom
sub denom
jc div_iter
div_ret:
jmp 0 ; we are done, jump to return address
div_iter:
sta numer
ldi 1 ; increment `ans` by 1
add ans
sta ans
jmp div ; loop
; Variables
S: db 49 ; number to compute the sqrt of
x: db 0 ; current guess of the square root
x_prev: db 0 ; used to track convergence
numer: db 0 ; first parameter for division subroutine
denom: db 0 ; second parameter for division subroutine
ans: db 0 ; result of the division subroutine
```
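As a cross-check, the assembly above can be mirrored step by step in C. This is a sketch of my reading of the listing, not output from the emulator: `div8` stands in for the `div` subroutine (including the carry-based comparison), `sqrt_program` stands in for the main loop, and both names as well as the `max_iter` parameter are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Mirrors the `div` subroutine: repeated subtraction, using the carry-out
   of numer + ~denom + 1 as the "denom <= numer" test.
   denom must be non-zero, otherwise the loop never ends. */
static uint8_t div8(uint8_t numer, uint8_t denom) {
    uint8_t ans = 0;
    for (;;) {
        unsigned sum = (unsigned)numer + (uint8_t)~denom + 1u;
        if (!(sum & 0x100u))
            return ans;        /* no carry: denom > numer, we are done */
        numer = (uint8_t)sum;  /* keep the result of the subtraction */
        ans++;
    }
}

/* Mirrors the main program. Every value sent to the display is stored in
   shown[], which must have room for max_iter entries.
   Returns the number of values displayed. */
size_t sqrt_program(uint8_t S, uint8_t *shown, size_t max_iter) {
    uint8_t x = div8(S, 10);   /* initial guess */
    if (x == 0)
        x = 1;
    uint8_t x_prev = 0;
    size_t n = 0;
    while (n < max_iter) {
        x = div8((uint8_t)(div8(S, x) + x), 2);  /* (x + S / x) / 2 */
        shown[n++] = x;        /* out */
        if (x == x_prev)
            break;             /* converged */
        x_prev = x;
    }
    return n;
}
```

For `S = 49` this sketch displays 8, 7 and 7 before stopping. The `max_iter` cap is a precaution of mine and has no counterpart in the assembly: with truncating division the guesses can alternate between two neighbouring values instead of settling (for `S = 3` the sketch keeps alternating between 2 and 1), in which case `x` never equals `x_prev`.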