8bit
It's surprising how much power a simple computer architecture as the SAP-1 has. This is a report of my journey to create a program that computes the square root of any integer number . Unfortunately this is impossible in just 16 bytes of RAM, so I will be using a version of Ben Eater's breadboard computer that has had its memory expanded to 256 bytes, following the guide of /u/MironV. However, we will not be using any extra instructions, flags or control lines.
The code and an emulator to run it are available here: https://github.com/wmvanvliet/8bit/tree/ext_memory.
To method we will use to compute the square root is known as the Babylonian method. In C, it goes like this:
The algorithm is easy enough to wrap your head around. However, there is a challenge. Our computer does not have a native divide instruction. We will have to make one.
Ben Eater demonstrated a program to multiply two numbers. In order to compute , we start with 0, then loop times, each time adding to the result. In C:
Division can be performed by reversing the polarity of the multiplication algorithm. To compute , we start with , then enter a loop, each time subtracting from until no longer fits in , that is . The number of times we iterated the loop is the answer, and whatever is left of is the remainder of the division. In C:
The tricky part here is determining whether .
We will do this by performing the subtraction and see if the result is negative.
How do we check for a negative result?
Many CPUs have a neg
flag for this.
Ours doesn't, but we do have a carry
flag!
Remember that subtracting is implemented in our ALU as addition with the twos-complement.
For example, 200 - 100 is:
And 100 - 200 is:
Whenever we tell our ALU to subtract two numbers, the carry
flag will be set if the result was positive.
Let's also check the edge case of the result being zero:
There's one final edge case: .
From the examples above, you might expect that the carry
flag will not be set, but I've been simplifying things a bit.
Our ALU computes the twos-complement by XOR-ing all the bits with 1, and setting the carry-in
pin of the first adder.
So the computation is actually:
So, whenever our ALU computes , the carry
flag indicates , and the absence of the flag indicates .
With this knowledge, we can write our division program:
This program takes up 15 bytes, so can also be performed by the original breadboard computer architecture.
If you scroll back up to the C version of the square root program, you'll see that we need to divide three times.
It would therefore be nice if we could implement our division program as a subroutine.
Unfortunately, our breadboard computer does not have call
and ret
instructions.
Or a stack.
We'll have to get creative.
The lack of a hardware stack is not really a problem. Normally, it is used to save/restore the state of the registers, but we only have the "A" register accessible from code, and we might as well put the return value of our subroutine in there. The stack is also used to pass parameters to the subroutine, but we can just put them somewhere in memory where our subroutine can find them.
Our problem is the return address.
At the end of the subroutine, the program needs to jump back to where it was called from.
But we only have a jmp
instruction that takes a hardcoded jump address as parameter.
We'd like to perform an indirect jump.
We're going to have to resort to something that is normally considered a bad idea:
override the hardcoded parameter of the jmp
instruction at runtime by writing to its memory location.
In the original architecture, a jmp
instruction only takes one byte of memory:
In the upgraded memory architecture, we need a full byte for the memory address, so the jmp
instruction takes two bytes of memory:
Conveniently, the parameter is now on a memory address of its own. Here is an example of a subroutine in action in the extended memory architecture (example on the original architecture):
The assembler is clever enough to be able to compute sub_ret + 1
when resolving the labels.
Now we can write a subroutine to perform division, and with that write the square root program. Here we compute :