Computing the square root on Ben Eater's breadboard computer (upgraded RAM)

tags: `8bit`

It's surprising how much power a simple computer architecture such as the SAP-1 has. This is a report of my journey to create a program that computes the square root of any integer $0 \leq S \leq 255$. Unfortunately, this is impossible in just 16 bytes of RAM, so I will be using a version of Ben Eater's breadboard computer that has had its memory expanded to 256 bytes, following the guide of /u/MironV. However, we will not be using any extra instructions, flags or control lines.

The code and an emulator to run it are available here: https://github.com/wmvanvliet/8bit/tree/ext_memory.

The Babylonian method for computing the square root

The method we will use to compute the square root is known as the Babylonian method. In C, it goes like this:

uint8_t S = 49;           // number we want to compute the sqrt of
uint8_t x = S / 10;       // initial guess of the sqrt
if(x == 0)
    x = 1;                // to prevent divide-by-zero errors
uint8_t x_prev = 0;       // our previous guess, used to track convergence
while(x != x_prev) {      // check for convergence
    x_prev = x;
    x = (x + S / x) / 2;  // this brings `x` closer to the sqrt
}
// `x` now contains the square root
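
For anyone who wants to try the loop above on a desktop machine first, here is a minimal sketch that wraps it in a function. The name `babylonian_sqrt` and the two extra checks are my own assumptions, not part of the breadboard program. They are there because with integer division the guess can bounce between two neighbouring values (for S = 3 it alternates between 1 and 2, so `x` never equals `x_prev`), and because S = 0 drives the guess to 0, after which `S / x` would divide by zero.

```c
#include <stdint.h>

// Integer Babylonian square root for 0 <= S <= 255.
// Hypothetical helper for experimenting on a PC.
uint8_t babylonian_sqrt(uint8_t S) {
    if (S == 0) {
        return 0;               // assumption: handle 0 up front to avoid 0 / 0
    }
    uint8_t x = S / 10;         // initial guess of the sqrt
    if (x == 0) {
        x = 1;                  // to prevent divide-by-zero errors
    }
    uint8_t x_prev = 0;         // previous guess, used to track convergence
    while (x != x_prev) {
        uint8_t next = (uint8_t)((x + S / x) / 2);
        if (next == x_prev) {
            // assumption: the guesses alternate between two values,
            // so stop and report the smaller one
            return next < x ? next : x;
        }
        x_prev = x;
        x = next;
    }
    return x;
}
```

With these checks the function returns the square root rounded down, for example 7 for S = 49 and 15 for S = 255.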

The algorithm is easy enough to wrap your head around. However, there is a challenge. Our computer does not have a native divide instruction. We will have to make one.

Program to divide a number by another number

Ben Eater demonstrated a program to multiply two numbers. In order to compute $a \times b$, we start with 0, then loop $b$ times, each time adding $a$ to the result. In C:

uint8_t a = 2;       // compute a * b
uint8_t b = 4;
uint8_t answer = 0;  // answer is stored here
while(b > 0) {
    answer += a;
    b--;
}
// `answer` now contains a * b

Division can be performed by reversing the polarity of the multiplication algorithm. To compute $a / b$, we start with $a$, then enter a loop, each time subtracting $b$ from $a$ until $b$ no longer fits in $a$, that is $b > a$. The number of times we iterated the loop is the answer, and whatever is left of $a$ is the remainder of the division. In C:

uint8_t a = 14;
uint8_t b = 2;
uint8_t answer = 0;
while(b <= a) {
    a -= b;
    answer++;
}
// `answer` now contains a / b
// `a` now contains the remainder
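
The same loop can be sketched as a function that hands back both the quotient and the remainder. The names `DivResult` and `divide_by_subtraction` are hypothetical, chosen for this example only. The sketch also makes one thing explicit: with `b` equal to 0 the condition `b <= a` is always true for unsigned numbers, so the loop would never end. That is the form a "divide-by-zero error" takes with this algorithm, and the early return for `b == 0` is my own assumption about how to handle it.

```c
#include <stdint.h>

typedef struct {
    uint8_t quotient;
    uint8_t remainder;
} DivResult;

// Divide a by b using only comparison and subtraction.
DivResult divide_by_subtraction(uint8_t a, uint8_t b) {
    DivResult r = {0, a};       // quotient 0, remainder starts as a
    if (b == 0) {
        return r;               // assumption: give up instead of looping forever
    }
    while (b <= r.remainder) {  // b still fits in what is left of a
        r.remainder -= b;
        r.quotient++;
    }
    return r;
}
```

For example, `divide_by_subtraction(17, 5)` gives a quotient of 3 and a remainder of 2.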

The tricky part here is determining whether $b \leq a$. We will do this by performing the subtraction and seeing if the result is negative. How do we check for a negative result? Many CPUs have a neg flag for this. Ours doesn't, but we do have a carry flag! Remember that subtracting is implemented in our ALU as addition with the twos-complement. For example, 200 - 100 is:

  11001000       binary representation of 200
  10011100       twos-complement of 100 (01100100 in binary)
---------- +
1 01100100       carry flag is set!

And 100 - 200 is:

  01100100       binary representation of 100
  00111000       twos-complement of 200 (11001000 in binary)
---------- +
0 10011100       carry flag is not set!

Whenever we tell our ALU to subtract two numbers, the carry flag will be set if the result is positive. Let's also check the edge case of the result being zero:

  01100100       binary representation of 100
  10011100       twos-complement of 100 (01100100 in binary)
---------- +
1 00000000       carry flag is set! (and zero flag is set)

There's one final edge case: $0 - 0$. From the examples above, you might expect that the carry flag will not be set, but I've been simplifying things a bit. Our ALU computes the twos-complement by XOR-ing all the bits with 1, and setting the carry-in pin of the first adder. So the computation is actually:

  00000000       binary representation of 0
  11111111       0 fed through the XOR gates
  00000001       carry-in pin is set, which adds 1
---------- +
1 00000000       carry flag is set! (and zero flag is set)

So, whenever our ALU computes $a - b$, the carry flag indicates $b \leq a$, and the absence of the flag indicates $b > a$.
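
The rule can be checked on a PC with a small model of the subtraction. This is only a sketch: `AluResult` and `alu_sub` are hypothetical names, and the model assumes nothing more than what is described above (invert the bits of `b`, add it to `a`, and add 1 through the carry-in).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t result;  // the 8 bits that end up on the bus
    bool carry;      // carry-out of the adder (the 9th bit)
    bool zero;       // set when all 8 result bits are 0
} AluResult;

// Model of a - b as performed by the ALU: a + (b XOR 0xFF) + carry-in.
AluResult alu_sub(uint8_t a, uint8_t b) {
    uint16_t sum = (uint16_t)a + (uint8_t)(b ^ 0xFF) + 1u;
    AluResult r;
    r.result = (uint8_t)(sum & 0xFF);
    r.carry = ((sum >> 8) & 1) != 0;
    r.zero = (r.result == 0);
    return r;
}
```

The sum works out to a - b + 256, so the 9th bit is set exactly when a - b is not negative. Looping over all 65536 combinations of `a` and `b` confirms that `carry` is set exactly when `b <= a`.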

With this knowledge, we can write our division program:

;
; Program that computes a / b
;
loop:	lda a
	sub b
	jc iter		; b <= a, iterate the loop
	jmp end		; b > a, end the loop

iter:	sta a
	ldi 1
	add ans
	sta ans
	jmp loop

end:	lda ans
	out
	hlt

a:	db 14
b:	db 2
ans:	db 0

This program takes up 15 bytes on the original architecture (12 one-byte instructions plus 3 bytes of data), so it can also run on the original breadboard computer.

Implementing subroutines

If you scroll back up to the C version of the square root program, you'll see that we need to divide three times. It would therefore be nice if we could implement our division program as a subroutine. Unfortunately, our breadboard computer does not have call and ret instructions. Or a stack. We'll have to get creative.

The lack of a hardware stack is not really a problem. Normally, it is used to save/restore the state of the registers, but we only have the "A" register accessible from code, and we might as well put the return value of our subroutine in there. The stack is also used to pass parameters to the subroutine, but we can just put them somewhere in memory where our subroutine can find them.

Our problem is the return address. At the end of the subroutine, the program needs to jump back to where it was called from. But we only have a jmp instruction that takes a hardcoded jump address as its parameter. We'd like to perform an indirect jump. We're going to have to resort to something that is normally considered a bad idea: overwriting the hardcoded parameter of the jmp instruction at runtime by writing to its memory location.

In the original architecture, a jmp instruction only takes one byte of memory:

0110 1010      jmp 10

In the upgraded memory architecture, we need a full byte for the memory address, so the jmp instruction takes two bytes of memory:

0000 0110      jmp
0000 1010      10

Conveniently, the parameter is now on a memory address of its own. Here is an example of a subroutine in action in the extended memory architecture (example on the original architecture):

;
; Demonstrate calling a subroutine from two different locations.
;
	ldi cont1       ; set return address (cont1)
	sta sub_ret + 1 ; overwrite parameter of return jump instruction of subroutine
	ldi 1           ; setup argument for the subroutine
	jmp sub         ; call the subroutine

cont1:  ldi cont2       ; set return address (cont2)
	sta sub_ret + 1 ; overwrite parameter of return jump instruction of subroutine
	ldi 2           ; setup argument for the subroutine
	jmp sub         ; call the subroutine

cont2:	ldi 3           ; subroutine returns to here
	out
	hlt

; A subroutine that displays the current value of the A register
sub:
	out
sub_ret:                ; this label indicates the address of the jmp instruction
	jmp 0           ; jump to the return address (param set at runtime)

The assembler is clever enough to be able to compute sub_ret + 1 when resolving the labels.
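
To see the patched jump at work without the hardware, the example can be hand-assembled and run through a tiny interpreter. This is a rough sketch under stated assumptions: only the `jmp` opcode (0000 0110) is taken from the encoding shown earlier. The other opcode values, the one-byte encoding of `out` and `hlt`, the resulting addresses, and the name `run_subroutine_demo` are hypothetical and may not match the real machine or the emulator.

```c
#include <stddef.h>
#include <stdint.h>

// Opcodes: OP_JMP is from the text, the others are assumed for illustration.
enum { OP_STA = 4, OP_LDI = 5, OP_JMP = 6, OP_OUT = 14, OP_HLT = 15 };

// Addresses of the labels after hand-assembling the example.
enum { CONT1 = 8, CONT2 = 16, SUB = 20, SUB_RET = 21 };

// Runs the example; every `out` is appended to out[]. Returns the count.
size_t run_subroutine_demo(uint8_t *out, size_t out_cap) {
    uint8_t mem[256] = {
        /*  0 */ OP_LDI, CONT1,        // set return address (cont1)
        /*  2 */ OP_STA, SUB_RET + 1,  // patch the operand of the return jmp
        /*  4 */ OP_LDI, 1,            // argument for the subroutine
        /*  6 */ OP_JMP, SUB,          // call
        /*  8 */ OP_LDI, CONT2,        // cont1: set return address (cont2)
        /* 10 */ OP_STA, SUB_RET + 1,
        /* 12 */ OP_LDI, 2,
        /* 14 */ OP_JMP, SUB,
        /* 16 */ OP_LDI, 3,            // cont2
        /* 18 */ OP_OUT,
        /* 19 */ OP_HLT,
        /* 20 */ OP_OUT,               // sub: display the A register
        /* 21 */ OP_JMP, 0,            // sub_ret: operand is set at runtime
    };
    uint8_t pc = 0, a = 0;
    size_t n = 0;
    for (int steps = 0; steps < 1000; steps++) {  // safety limit for the sketch
        uint8_t op = mem[pc++];
        switch (op) {
        case OP_LDI: a = mem[pc++]; break;
        case OP_STA: { uint8_t addr = mem[pc++]; mem[addr] = a; break; }
        case OP_JMP: pc = mem[pc]; break;
        case OP_OUT: if (n < out_cap) { out[n++] = a; } break;
        default: return n;             // hlt or an unknown opcode stops the run
        }
    }
    return n;
}
```

Running it displays 1, 2 and 3 in that order: the subroutine returns to `cont1` the first time and to `cont2` the second time, even though it always ends with the same `jmp` instruction.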

Putting it all together

Now we can write a subroutine to perform division, and with that write the square root program. Here we compute $\sqrt{49} = 7$:

;
; Compute the square root of a number, based on the Babylonian method.
;

; Compute initial guess of the sqrt (S / 10)
	lda S            ; prepare for computing S / 10
	sta numer
	ldi 10
	sta denom
	ldi 0
	sta ans
	ldi a            ; setup return address
	sta div_ret + 1  ; address of the param of the jmp instruction
	jmp div          ; call division subroutine
a:	lda ans          ; subroutine returns here
	sta x            ; this is our initial guess for the sqrt
	jz set_to_one    ; guess should not be zero to prevent divide-by-zero errors
	jmp iter         ; if not zero, start refining the guess
set_to_one:
	ldi 1
	sta x

; Refine the guess of the sqrt
iter:
	lda S            ; prepare for computing S / x
	sta numer
	lda x
	sta denom
	ldi 0
	sta ans
	ldi b            ; setup return address
	sta div_ret + 1  ; address of the param of the jmp instruction
	jmp div          ; call division subroutine
b:	lda ans          ; subroutine returns here

	add x            ; prepare for computing (x + S / x) / 2
	sta numer
	ldi 2
	sta denom 
	ldi 0
	sta ans
	ldi c            ; setup return address
	sta div_ret + 1  ; address of the param of the jmp instruction
	jmp div          ; call division subroutine
c:	lda ans          ; subroutine returns here
	sta x
	out              ; display our refined guess

	sub x_prev       ; check for convergence
	jz end
	lda x            ; not converged yet. x_prev = x and loop
	sta x_prev
	jmp iter

end:    hlt

; Subroutine for division
div:	lda numer         ; start computing numer / denom
	sub denom 
	jc div_iter
div_ret:
	jmp 0            ; we are done, jump to return address
div_iter:
	sta numer
	ldi 1            ; increment `ans` by 1
	add ans
	sta ans
	jmp div          ; loop

; Variables
S:	db 49            ; number to compute the sqrt of
x:	db 0             ; current guess of the square root
x_prev:	db 0             ; used to track convergence
numer:	db 0             ; first parameter for division subroutine
denom:	db 0             ; second parameter for division subroutine
ans:	db 0             ; result of the division subroutine
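
As a cross-check of the assembly, here is a rough C port that follows the same steps and records every value the program would display. It is a sketch, not a replacement for the emulator: `div_by_subtraction` and `sqrt_trace` are hypothetical names, and the `cap` parameter is my own addition so that the sketch always stops, even for an input whose guesses never settle on a single value.

```c
#include <stddef.h>
#include <stdint.h>

// Mirrors the `div` subroutine: numer / denom by repeated subtraction.
// denom must not be 0, just like in the assembly.
static uint8_t div_by_subtraction(uint8_t numer, uint8_t denom) {
    uint8_t ans = 0;
    while (denom <= numer) {
        numer -= denom;
        ans++;
    }
    return ans;
}

// Mirrors the main program. Each `out` is appended to trace[].
// Returns the number of displayed values (at most cap).
size_t sqrt_trace(uint8_t S, uint8_t *trace, size_t cap) {
    uint8_t x = div_by_subtraction(S, 10);  // initial guess
    if (x == 0) {
        x = 1;                              // set_to_one
    }
    uint8_t x_prev = 0;
    size_t n = 0;
    while (n < cap) {
        uint8_t q = div_by_subtraction(S, x);            // S / x
        x = div_by_subtraction((uint8_t)(q + x), 2);     // (x + S / x) / 2
        trace[n++] = x;                                  // out
        if (x == x_prev) {
            break;                                       // converged
        }
        x_prev = x;
    }
    return n;
}
```

For S = 49 the recorded values are 8, 7 and 7, so the display ends on the expected answer.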
